CN110263643A - A fast video crowd counting method based on temporal relationships - Google Patents

A fast video crowd counting method based on temporal relationships

Info

Publication number
CN110263643A
Authority
CN
China
Prior art keywords
model
neural network
image
network model
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910417972.3A
Other languages
Chinese (zh)
Other versions
CN110263643B (en)
Inventor
Zhou Zhao
Zheng Yingbin
Ye Hao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Chengguan Information Technology Co Ltd
Original Assignee
Shanghai Chengguan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Chengguan Information Technology Co Ltd filed Critical Shanghai Chengguan Information Technology Co Ltd
Priority to CN201910417972.3A priority Critical patent/CN110263643B/en
Publication of CN110263643A publication Critical patent/CN110263643A/en
Application granted granted Critical
Publication of CN110263643B publication Critical patent/CN110263643B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a fast video crowd counting method based on temporal relationships, comprising the following steps: construct a lightweight neural network model; load the video images into the lightweight neural network model; construct a crowd density estimation model and load it into the lightweight neural network model, so that the lightweight neural network model obtains density maps of consecutive image frames according to the crowd density estimation model; the lightweight neural network model converts the density maps of consecutive frames into density features; construct a dynamic temporal model; load the video-image density features and the crowd density estimation model into the dynamic temporal model, which analyzes adjacent video frames through a weight computing formula to obtain the head-count value. By constructing a lightweight neural network model and a dynamic temporal model, the invention solves the problems of traditional crowd counting methods, in which the neural network model responds slowly and is applicable only to individual pictures.

Description

A fast video crowd counting method based on temporal relationships
Technical field
The invention belongs to the field of computer vision, and in particular relates to a fast video crowd counting method based on temporal relationships.
Background art
Crowd counting aims to estimate the number of people in a scene; it mainly comprises density estimation and head counting. By estimating crowd density, the overall state of a crowd can be roughly known, so that crowd behavior can be judged and crowds can be managed more safely and effectively. In venues prone to short-term dense crowds, such as stadiums, entertainment venues, conference centers and shopping centers, counting the number of people yields accurate crowd-flow results.
At present, mainstream crowd counting methods fall into two categories. The first extracts a density map of the image through a network and then counts the crowd by computing over the density map; this approach is applicable only to individual pictures and ignores the contextual relationships within video. The second borrows an existing large network as the backbone, first performing feature extraction and then computing the density map to produce the current head-count estimate; this approach has a large number of network parameters and cannot respond in time, sacrificing response speed to preserve accuracy.
Summary of the invention
The object of the invention is to provide a fast video crowd counting method based on temporal relationships, so as to solve the problems that traditional crowd counting methods are suitable only for individual pictures, ignore the contextual relationships within video, have large network parameter counts, cannot respond in time, and sacrifice response speed while preserving accuracy.
The invention is realized as follows: the present invention provides a fast video crowd counting method based on temporal relationships, comprising the following steps:
Step S1: construct a lightweight neural network model;
Step S2: load the video images into the lightweight neural network model, which pre-processes the video images;
Step S3: construct a crowd density estimation model and load it into the lightweight neural network model, which obtains density maps of consecutive image frames according to the crowd density estimation model;
Step S4: the lightweight neural network model converts the density maps of consecutive frames into crowd-image density features;
Step S5: construct a dynamic temporal model;
Step S6: load the video-image density features and the crowd density estimation model into the dynamic temporal model;
Step S7: the dynamic temporal model analyzes adjacent video frames through a weight computing formula, obtaining the weight of each image frame and the head count of the current frame;
Step S8: obtain the head-count value according to the respective weights of the respective frames.
Preferably, in step S2 the lightweight neural network model pre-processes the video images by applying a matrix transformation to them.
Preferably, the lightweight neural network model comprises convolution operations and pooling operations.
Preferably, the convolution operations are carried out in N-channel network layers with a convolution kernel of 3, and they obtain the density features of the image.
Preferably, the pooling operations store the density features of the image.
Preferably, the feature vectors of moving people are trained by the crowd density estimation model; the feature vectors of a moving person are the person's movement, gender, height and appearance.
Preferably, the dynamic temporal model is composed of multiple stages.
Preferably, each stage comprises a dilated convolution and a convolutional layer; the dilated convolution is a one-dimensional dilated convolution, and the convolutional layer comprises an input layer and an output layer, the output layer comprising multiple hidden layers.
Preferably, a non-linear transformation is applied between the input layer and the output layer through the ReLU activation function, and the input layer takes the output-layer information of the previous stage as its input.
Preferably, the weight computing formula is
where V_ij denotes the feature-vector group, j denotes the index of consecutive video frames, i denotes the feature dimension, and W_j denotes the video-information weight of the j-th frame.
Compared with existing methods, the beneficial effects of the present invention are as follows. By constructing a lightweight neural network model containing pooling and convolution operations with a small parameter count, the invention solves the problem that current crowd counting methods cannot respond in time and sacrifice response speed while preserving accuracy. By constructing a dynamic temporal model composed of multiple stages connected in series to associate contextual feature information, it solves the problem that current crowd counting methods are applicable only to individual pictures and ignore the contextual relationships within video.
Brief description of the drawings
Fig. 1 is a flow diagram of the present invention;
Fig. 2 shows the lightweight neural network model of the present invention;
Fig. 3 shows the dynamic temporal model of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. It is apparent that the described embodiments are only a part of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Figs. 1-3, the present invention provides a fast video crowd counting method based on temporal relationships, comprising the following steps:
Step S1: construct a lightweight neural network model;
Step S2: load the video images into the lightweight neural network model, which pre-processes the video images;
Step S3: construct a crowd density estimation model and load it into the lightweight neural network model, which obtains density maps of consecutive image frames according to the crowd density estimation model;
Step S4: the lightweight neural network model converts the density maps of consecutive frames into crowd-image density features;
Step S5: construct a dynamic temporal model;
Step S6: load the video-image density features and the crowd density estimation model into the dynamic temporal model;
Step S7: the dynamic temporal model analyzes adjacent video frames through a weight computing formula, obtaining the weight of each image frame and the head count of the current frame;
Step S8: obtain the head-count value according to the respective weights of the respective frames.
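The eight steps above can be sketched as a short pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: `backbone`, `density_model` and `temporal_model` are hypothetical stand-ins for the lightweight neural network model, the crowd density estimation model and the dynamic temporal model, and the weighted average over per-frame counts is one plausible reading of step S8.

```python
import numpy as np

def count_people(frames, backbone, density_model, temporal_model):
    """Sketch of steps S1-S8: per-frame density maps -> density features
    -> temporal frame weights -> final head-count value."""
    features, counts = [], []
    for frame in frames:                          # S2: load each frame
        x = backbone(frame)                       # lightweight CNN features
        density = density_model(x)                # S3: density map
        features.append(density.ravel())          # S4: density feature vector
        counts.append(density.sum())              # per-frame count estimate
    weights = temporal_model(np.stack(features))  # S7: one weight per frame
    return float(np.dot(weights, counts))         # S8: weighted head count
```

With dummy stand-in models, the function simply weight-averages the per-frame density sums.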
In this embodiment, constructing a lightweight neural network model solves the problems of traditional crowd counting methods, in which the neural network model has a large parameter count, cannot respond in time, and sacrifices response speed while preserving accuracy.
In this embodiment, a dynamic temporal model is constructed; it obtains the respective video frames and the head count of the current frame, and derives the head-count value from the respective weights of the respective frames. This solves the problem that traditional crowd counting methods can only perform a simple cumulative summation of the density map, are applicable only to individual pictures, and ignore the contextual relationships within video.
Further, in step S2 the lightweight neural network model pre-processes the video images by applying a matrix transformation to them.
In this embodiment, video-image information can be represented as a matrix, so matrix theory and matrix algorithms can be used to analyze and process the data. The video images are processed and analyzed by the lightweight neural network model, and a two-dimensional array is generated to store the image data.
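A minimal sketch of the matrix transformation described above: a frame is represented as a two-dimensional array and its pixel values are normalized. The function name, the channel-averaging step and the normalization range are illustrative assumptions, not details given in the patent.

```python
import numpy as np

def preprocess(frame):
    """Represent a video frame as a 2-D matrix and normalize it."""
    m = np.asarray(frame, dtype=np.float32)
    if m.ndim == 3:              # collapse RGB channels to one plane
        m = m.mean(axis=2)
    return m / 255.0             # scale pixel values into [0, 1]
```

The resulting two-dimensional array is what downstream convolution and pooling operate on.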
Further, the lightweight neural network model comprises convolution operations and pooling operations.
In this embodiment (see Fig. 2), M2 denotes a pooling operation, and N@K×K denotes an N-channel network layer with a convolution kernel of size K×K.
In this embodiment, the lightweight neural network model contains convolution and pooling operations with a small parameter count, so it responds quickly to image density features without losing response speed while maintaining accuracy.
Further, the convolution operations are carried out in N-channel network layers with a convolution kernel of 3, and they obtain the density features of the image.
In this embodiment, with a convolution kernel of 3 in the N-channel network layers, the lightweight neural network model achieves very good response: for video of size 240×320, the processing speed reaches 120 frames per second on a GeForce GTX TITAN XP GPU and 25 frames per second on an i5-8500 CPU @ 3.00 GHz.
Further, the pooling operations store the density features of the image.
In this embodiment, the pooling operation compresses the input video-image features. On the one hand, this reduces the features and hence the parameters, simplifying the computation performed by the lightweight neural network model; on the other hand, it maintains feature invariance, discarding redundant information while storing the key information.
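The convolution and pooling operations described above can be illustrated as follows. This is a didactic single-channel sketch, not the patent's network: a real model applies many learned 3×3 kernels per layer, while here one valid 3×3 convolution extracts a local response and a 2×2 max pool keeps the salient value per block, shrinking the map.

```python
import numpy as np

def conv3x3(img, kernel):
    """Valid 3x3 convolution (cross-correlation, as in most DL frameworks)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kernel)
    return out

def maxpool2x2(feat):
    """2x2 max pooling: keep the strongest response in each block."""
    h, w = feat.shape
    return feat[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

Pooling halves each spatial dimension, which is how the parameter and computation reduction mentioned above comes about.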
Further, the feature vectors of moving people are trained by the crowd density estimation model; the feature vectors of a moving person are the person's movement, gender, height and appearance.
In this embodiment, individuals in a crowd image differ considerably, so multiple image features need to be fused to estimate the crowd count. By having the crowd density estimation model fuse multiple image features for these widely differing individuals, the difficulty the lightweight neural network model faces in handling crowd density features is reduced.
Further, the dynamic temporal model is composed of multiple stages.
In this embodiment (see Fig. 3), Ft denotes the current video frame, Ft-1 its previous frame and Ft+1 its next frame; Block denotes the lightweight neural network model, and stage denotes one of the multiple stages of the dynamic temporal model.
In this embodiment, connecting the multiple stages of the dynamic temporal model in series associates the contextual frame information of the image features.
Further, each stage comprises a dilated convolution and a convolutional layer; the dilated convolution is a one-dimensional dilated convolution, and the convolutional layer comprises an input layer and an output layer, the output layer comprising multiple hidden layers.
In this embodiment, the dilated convolution enlarges the receptive field of the convolution kernel while keeping the parameter count constant and keeping the size of the output feature map unchanged, and the convolutional layer compresses the image information, reducing model complexity.
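The one-dimensional dilated convolution can be sketched as follows: with k taps and dilation d, the receptive field spans (k-1)·d+1 samples while the parameter count stays at k, which is exactly the property described above. The function is an illustrative stand-in, not the patent's layer.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """1-D dilated convolution: taps are spaced `dilation` apart, so the
    receptive field grows without adding parameters."""
    k = len(kernel)
    span = (k - 1) * dilation + 1        # effective receptive field
    out = np.zeros(len(x) - span + 1)
    for t in range(len(out)):
        out[t] = sum(kernel[i] * x[t + i * dilation] for i in range(k))
    return out
```

With a two-tap kernel and dilation 2, each output mixes samples two steps apart, linking non-adjacent frames with only two parameters.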
Further, a non-linear transformation is applied between the input layer and the output layer through the ReLU activation function, and the input layer takes the output-layer information of the previous stage as its input.
In this embodiment, ReLU is a non-linear function. Compared with linear functions, its expressive power is stronger; compared with other non-linear functions, its gradient is constant over the non-negative interval, so there is no vanishing-gradient problem and the convergence rate of the model remains stable. Meanwhile, the one-sided suppression of the ReLU function gives the lightweight neural network model sparse activations.
Further, the weight computing formula is
where V_ij denotes the feature-vector group, j denotes the index of consecutive video frames, i denotes the feature dimension, and W_j denotes the video-information weight of the j-th frame.
In this embodiment, the dynamic temporal model obtains the weight value of each frame in the video according to the weight computing formula and obtains the head count of each frame, then derives the head-count value from the respective weights of the respective frames.
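The patent's weight computing formula over the feature-vector group V_ij is given only as a figure and is not reproduced here, so the sketch below substitutes a hypothetical stand-in: a softmax over per-frame feature norms yields the weights W_j, which then form the weighted head count of step S8.

```python
import numpy as np

def frame_weights(V):
    """Hypothetical W_j: softmax over the norm of each frame's feature
    vector (row j of V). NOT the patent's actual formula."""
    scores = np.linalg.norm(V, axis=1)   # one score per frame j
    e = np.exp(scores - scores.max())    # numerically stable softmax
    return e / e.sum()                   # weights sum to 1

def weighted_count(per_frame_counts, weights):
    """Step S8: head count as the weight-averaged per-frame counts."""
    return float(np.dot(weights, per_frame_counts))
```

Any concrete weighting that sums to one over the consecutive frames would slot into the same place in the pipeline.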
In this embodiment, a lightweight neural network model is constructed and the image information is loaded into it; the model applies a matrix transformation to the image information and stores the image data in a two-dimensional array. A crowd density estimation model is constructed and loaded into the lightweight neural network model, which obtains density maps of consecutive image frames according to the crowd density estimation model and converts the density maps of consecutive frames into density features. A dynamic temporal model is constructed, and the density features and the crowd density estimation model are loaded into it; the dynamic temporal model obtains the weight of each frame through the weight computing formula, obtains the head count of the current frame according to the crowd density estimation features, and derives the head-count value from the frame weights and the current-frame counts.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention, the scope of which is defined by the appended claims.

Claims (10)

1. A fast video crowd counting method based on temporal relationships, characterized by comprising the following steps:
Step S1: construct a lightweight neural network model;
Step S2: load the video images into the lightweight neural network model, which pre-processes the video images;
Step S3: construct a crowd density estimation model and load it into the lightweight neural network model, which obtains density maps of consecutive image frames according to the crowd density estimation model;
Step S4: the lightweight neural network model converts the density maps of consecutive frames into crowd-image density features;
Step S5: construct a dynamic temporal model;
Step S6: load the video-image density features and the crowd density estimation model into the dynamic temporal model;
Step S7: the dynamic temporal model analyzes adjacent video frames through a weight computing formula, obtaining the weight of each image frame and the head count of the current frame;
Step S8: obtain the head-count value according to the respective weights of the respective frames.
2. The fast video crowd counting method based on temporal relationships according to claim 1, characterized in that in step S2 the lightweight neural network model pre-processes the video images by applying a matrix transformation to them.
3. The fast video crowd counting method based on temporal relationships according to claim 1, characterized in that the lightweight neural network model comprises convolution operations and pooling operations.
4. The fast video crowd counting method based on temporal relationships according to claim 3, characterized in that the convolution operations are carried out in N-channel network layers with a convolution kernel of 3 and are used to obtain the density features of the image.
5. The fast video crowd counting method based on temporal relationships according to claim 3, characterized in that the pooling operations are used to store the density features of the image.
6. The fast video crowd counting method based on temporal relationships according to claim 1, characterized in that the feature vectors of moving people are trained by the crowd density estimation model, the feature vectors of a moving person being the person's movement, gender, height and appearance.
7. The fast video crowd counting method based on temporal relationships according to claim 1, characterized in that the dynamic temporal model is composed of multiple stages.
8. The fast video crowd counting method based on temporal relationships according to claim 7, characterized in that each stage comprises a dilated convolution and a convolutional layer, the dilated convolution being a one-dimensional dilated convolution, and the convolutional layer comprising an input layer and an output layer, the output layer comprising multiple hidden layers.
9. The fast video crowd counting method based on temporal relationships according to claim 8, characterized in that a non-linear transformation is applied between the input layer and the output layer through the ReLU activation function, and the input layer takes the output-layer information of the previous stage as its input.
10. The fast video crowd counting method based on temporal relationships according to claim 1, characterized in that the weight computing formula is
where V_ij denotes the feature-vector group, j denotes the index of consecutive video frames, i denotes the feature dimension, and W_j denotes the video-information weight of the j-th frame.
CN201910417972.3A 2019-05-20 2019-05-20 Quick video crowd counting method based on time sequence relation Active CN110263643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910417972.3A CN110263643B (en) 2019-05-20 2019-05-20 Quick video crowd counting method based on time sequence relation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910417972.3A CN110263643B (en) 2019-05-20 2019-05-20 Quick video crowd counting method based on time sequence relation

Publications (2)

Publication Number Publication Date
CN110263643A true CN110263643A (en) 2019-09-20
CN110263643B CN110263643B (en) 2023-05-16

Family

ID=67914841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910417972.3A Active CN110263643B (en) 2019-05-20 2019-05-20 Quick video crowd counting method based on time sequence relation

Country Status (1)

Country Link
CN (1) CN110263643B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001274A (en) * 2020-08-06 2020-11-27 腾讯科技(深圳)有限公司 Crowd density determination method, device, storage medium and processor
CN112052833A (en) * 2020-09-27 2020-12-08 苏州科达科技股份有限公司 Object density monitoring system, method, video analysis server and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201215678D0 (en) * 2012-09-03 2012-10-17 Vision Semantics Ltd Crowd density estimation
CN102750710A (en) * 2012-05-31 2012-10-24 信帧电子技术(北京)有限公司 Method and device for counting motion targets in images
US20150310275A1 (en) * 2012-11-28 2015-10-29 Zte Corporation Method and device for calculating number and moving direction of pedestrians
WO2018059408A1 (en) * 2016-09-29 2018-04-05 北京市商汤科技开发有限公司 Cross-line counting method, and neural network training method and apparatus, and electronic device
CN108615027A (en) * 2018-05-11 2018-10-02 常州大学 A method of video crowd is counted based on shot and long term memory-Weighted Neural Network
CN108717528A (en) * 2018-05-15 2018-10-30 苏州平江历史街区保护整治有限责任公司 A kind of global population analysis method of more strategies based on depth network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750710A (en) * 2012-05-31 2012-10-24 信帧电子技术(北京)有限公司 Method and device for counting motion targets in images
GB201215678D0 (en) * 2012-09-03 2012-10-17 Vision Semantics Ltd Crowd density estimation
US20150310275A1 (en) * 2012-11-28 2015-10-29 Zte Corporation Method and device for calculating number and moving direction of pedestrians
WO2018059408A1 (en) * 2016-09-29 2018-04-05 北京市商汤科技开发有限公司 Cross-line counting method, and neural network training method and apparatus, and electronic device
CN108615027A (en) * 2018-05-11 2018-10-02 常州大学 A method of video crowd is counted based on shot and long term memory-Weighted Neural Network
CN108717528A (en) * 2018-05-15 2018-10-30 苏州平江历史街区保护整治有限责任公司 A kind of global population analysis method of more strategies based on depth network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
K. SIMONYAN et al.: "Very deep convolutional networks for large-scale image recognition", arXiv preprint arXiv:1409.1556 *
X. WU et al.: "Adaptive scenario discovery for crowd counting", ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
S. ZHANG et al.: "FCN-rLSTM: Deep spatio-temporal neural networks for vehicle counting in city cameras", Proceedings of the IEEE International Conference on Computer Vision *
LI Haifeng et al.: "Crowd counting algorithm based on density classification and combined features", Application Research of Computers *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001274A (en) * 2020-08-06 2020-11-27 腾讯科技(深圳)有限公司 Crowd density determination method, device, storage medium and processor
CN112001274B (en) * 2020-08-06 2023-11-17 腾讯科技(深圳)有限公司 Crowd density determining method, device, storage medium and processor
CN112052833A (en) * 2020-09-27 2020-12-08 苏州科达科技股份有限公司 Object density monitoring system, method, video analysis server and storage medium

Also Published As

Publication number Publication date
CN110263643B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
WO2021035807A1 (en) Target tracking method and device fusing optical flow information and siamese framework
Zhang et al. An improved deep computation model based on canonical polyadic decomposition
CN104424235B Method and apparatus for clustering user profiles
CN110378844A Blind image motion deblurring method based on a multi-scale cyclic generative adversarial network
CN110111256A Image super-resolution reconstruction method based on a residual distillation network
CN104778224B Target-object social-relationship recognition method based on video semantics
CN107729819A Face annotation method based on a sparse fully convolutional neural network
CN112040222B Visual saliency prediction method and device
CN110458038A Small-data cross-domain action recognition method based on a double-chain deep two-stream network
CN112653899A Network live-broadcast video feature extraction method based on joint-attention ResNeSt in complex scenes
CN110263643A Fast video crowd counting method based on temporal relationships
CN109934881A Image encoding method, action recognition method and computer device
CN109598732A Medical image segmentation method based on three-dimensional spatial weighting
CN110147699A Image recognition method, apparatus and related device
CN109886391A Neural network compression method based on spatial positive-and-negative diagonal convolution
CN110020639A Video feature extraction method and related device
CN110321805A Dynamic expression recognition method based on temporal-relationship reasoning
Dai et al. Video scene segmentation using tensor-train faster-RCNN for multimedia IoT systems
CN110060286A Monocular depth estimation method
Lu et al. Heterogeneous model fusion federated learning mechanism based on model mapping
CN113610046B Behavior recognition method based on deep video linkage features
CN111882516B Image quality evaluation method based on visual saliency and deep neural network
CN109086707A Expression tracking method based on a DCNNs-LSTM model
Wang et al. Basketball shooting angle calculation and analysis by deeply-learned vision model
Zhang et al. Fchp: Exploring the discriminative feature and feature correlation of feature maps for hierarchical dnn pruning and compression

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 15202, 15201 and 15203, Building 2, 498 Guoshoujing Road, Pudong New Area, Shanghai, 201203

Applicant after: SHANGHAI DUIGUAN INFORMATION TECHNOLOGY Co.,Ltd.

Address before: Room 2595, Building 6, No. 5995, Daye Highway, Fengxian District, Shanghai, 2010

Applicant before: SHANGHAI DUIGUAN INFORMATION TECHNOLOGY Co.,Ltd.

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Wu Xingjiao

Inventor after: Zheng Yingbin

Inventor after: Ye Hao

Inventor before: Zhou Zhao

Inventor before: Zheng Yingbin

Inventor before: Ye Hao

GR01 Patent grant
GR01 Patent grant