CN110263643A

CN110263643A - A fast video crowd counting method based on sequential relationships

Info

Publication number: CN110263643A
Application number: CN201910417972.3A
Authority: CN
Inventors: 周钊; 郑莹斌; 叶浩
Original assignee: Shanghai Chengguan Information Technology Co Ltd
Current assignee: Shanghai Chengguan Information Technology Co Ltd
Priority date: 2019-05-20
Filing date: 2019-05-20
Publication date: 2019-09-20
Anticipated expiration: 2039-05-20
Also published as: CN110263643B

Abstract

The present invention provides a fast video crowd counting method based on sequential relationships, comprising the following steps: constructing a lightweight neural network model; loading video images into the lightweight neural network model; constructing a crowd density estimation model and loading it into the lightweight neural network model, so that the lightweight neural network model obtains density maps of consecutive image frames according to the crowd density estimation model; converting, by the lightweight neural network model, the density maps of the consecutive frames into density features; constructing a dynamic time series model; and loading the video image density features and the crowd density estimation model into the dynamic time series model, which analyzes adjacent frames of the video by means of a weight calculation formula to obtain the people count. By constructing the lightweight neural network model and the dynamic time series model, the invention solves the problems that, in traditional crowd counting methods, the neural network model responds slowly and is applicable only to single images.

Description

A fast video crowd counting method based on sequential relationships

Technical field

The invention belongs to the field of computer vision, and in particular relates to a fast video crowd counting method based on sequential relationships.

Background art

Crowd counting aims to determine the number of people in a scene; it mainly comprises density estimation and head counting. Estimating the crowd density gives a rough picture of the overall state of the crowd, from which crowd behavior can be judged, so that crowds can be managed more safely and effectively. In places where dense crowds readily form within a short time, such as sports grounds, entertainment venues, conference centers and shopping centers, counting the number of people yields accurate crowd flow results.

At present, mainstream crowd counting methods at home and abroad fall into two categories. The first extracts a density map of an image through a network and then counts the crowd in the image by computing on the density map; this approach is applicable only to single images and ignores the contextual relationship within video information. The second borrows an existing large network as the backbone, first extracts features and then computes the density map to produce an estimate of the current number of people; this approach has a large number of network parameters and cannot respond in time, sacrificing response speed to guarantee accuracy.

Summary of the invention

The object of the invention is to provide a fast video crowd counting method based on sequential relationships, so as to solve the problems that traditional crowd counting methods are applicable only to single images and ignore the contextual relationship within video information, and that their large number of network parameters prevents a timely response, sacrificing response speed to guarantee accuracy.

The invention is realized as follows. The present invention provides a fast video crowd counting method based on sequential relationships, comprising the following steps:

Step S1: constructing a lightweight neural network model,

Step S2: loading the video image into the lightweight neural network model, the lightweight neural network model preprocessing the video image,

Step S3: constructing a crowd density estimation model and loading the crowd density estimation model into the lightweight neural network model, the lightweight neural network model obtaining the density maps of consecutive image frames according to the crowd density estimation model,

Step S4: converting, by the lightweight neural network model, the density maps of the consecutive image frames into crowd image density features,

Step S5: constructing a dynamic time series model,

Step S6: loading the video image density features into the dynamic time series model, and loading the crowd density estimation model into the dynamic time series model,

Step S7: analyzing, by the dynamic time series model, adjacent frames of the video by means of a weight calculation formula to obtain the weight of each corresponding frame and the people count of the current frame,

Step S8: obtaining the final people count according to the respective weights of the corresponding frames.

Preferably, in step S2 the lightweight neural network model preprocesses the video image by applying a matrix transformation to it.

Preferably, the lightweight neural network model comprises convolution operations and pooling operations.

Preferably, the convolution operation is performed in an N-channel network layer with a convolution kernel size of 3, and the convolution operation obtains the density features of the image.

Preferably, the pooling operation stores the density features of the image.

Preferably, the crowd density estimation model is trained on feature vectors of moving people, the feature vectors being the movement, gender, height and appearance of the moving people.

Preferably, the dynamic time series model consists of multiple stages.

Preferably, each stage comprises a dilated convolution and a convolutional layer; the dilated convolution is a one-dimensional dilated convolution, the convolutional layer comprises an input layer and an output layer, and the output layer comprises multiple hidden layers.

Preferably, a nonlinear transformation is applied between the input layer and the output layer by a ReLU activation function, and the input layer takes the output-layer information of the previous stage as its input information.

Preferably, the weight calculation formula is

where V_ij denotes the feature vector group, j indexes the consecutive video frames, and i denotes the feature dimension. W_j denotes the video information weight of the j-th frame.

Compared with existing methods, the beneficial effects of the invention are as follows. By constructing a lightweight neural network model that comprises pooling and convolution operations and has a small number of parameters, the invention solves the problem that current crowd counting methods cannot respond in time and sacrifice response speed to guarantee accuracy. By constructing a dynamic time series model that comprises multiple stages connected in series, the contextual information of the features is associated, which solves the problem that current crowd counting methods are applicable only to single images and ignore the contextual relationship within video information.

Brief description of the drawings

Fig. 1 is a flow diagram of the invention;

Fig. 2 shows the lightweight neural network model of the invention;

Fig. 3 shows the dynamic time series model of the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the protection scope of the invention.

Referring to Figs. 1-3, the present invention provides a fast video crowd counting method based on sequential relationships, comprising the following steps:

Step S1: constructing a lightweight neural network model,

Step S5: constructing a dynamic time series model,

In this embodiment, constructing the lightweight neural network model solves the problem that, in traditional crowd counting methods, the neural network model has a large number of parameters and cannot respond in time, sacrificing response speed to guarantee accuracy.

In this embodiment, a dynamic time series model is constructed; the dynamic time series model obtains the corresponding frames of the video and the people count of the current frame, and obtains the final people count according to the respective weights of the corresponding frames. This solves the problem that traditional crowd counting methods can only sum the density map, are applicable only to single images, and ignore the contextual relationship within video information.
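For illustration, the single-image baseline mentioned above (summing a density map to obtain a count) might be sketched as follows in Python with NumPy. The function name `count_from_density_map` and the toy density map are assumptions introduced here, not part of the source; the sketch only shows that such a count uses one frame and no temporal context.

```python
import numpy as np

def count_from_density_map(density_map: np.ndarray) -> float:
    """Sum a density map to get the estimated number of people in one frame."""
    return float(density_map.sum())

# Toy density map (hypothetical): two people, each spread over a 2 x 2 blob
# whose values add up to 1.
density = np.zeros((6, 8), dtype=np.float64)
density[1:3, 1:3] = 0.25
density[3:5, 5:7] = 0.25
single_frame_count = count_from_density_map(density)
```

Because each frame is summed independently, neighbouring frames cannot correct one another, which is the limitation the dynamic time series model is said to address.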

Further, in step S2 the lightweight neural network model preprocesses the video image by applying a matrix transformation to it.

In this embodiment, video image information can be represented as a matrix, so matrix theory and matrix algorithms can be used to analyze and process it. The lightweight neural network model processes and analyzes the video image and generates a two-dimensional array that stores the image data.
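A minimal sketch of the matrix representation described above, in Python with NumPy. The source does not specify the transformation, so the channel averaging, the scaling to [0, 1] and the name `frame_to_matrix` are assumptions made purely for illustration.

```python
import numpy as np

def frame_to_matrix(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 frame into an H x W float32 matrix in [0, 1]."""
    if frame_rgb.ndim != 3 or frame_rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB frame")
    # Assumption: a plain channel average serves as the grayscale conversion.
    gray = frame_rgb.astype(np.float32).mean(axis=2)
    return gray / 255.0

# A synthetic all-white 240 x 320 frame stands in for a decoded video frame.
frame = np.full((240, 320, 3), 255, dtype=np.uint8)
matrix = frame_to_matrix(frame)
```

The result is a two-dimensional array with one value per pixel, which is the form the later convolution and pooling sketches would consume.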

Further, the lightweight neural network model comprises convolution operations and pooling operations.

In this embodiment, M2 denotes a pooling operation, and N@K × K denotes an N-channel network layer with a convolution kernel of size K.

In this embodiment, the lightweight neural network model comprises convolution and pooling operations and has a small number of parameters, so it responds quickly when extracting image density features and maintains accuracy without sacrificing response speed.
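To make the "small number of parameters" claim concrete, the parameter count of a stack of N@K × K layers can be computed as below. The channel widths are hypothetical, since the source does not list them; only the counting rule (kernel area times input channels times output channels, plus one bias per output channel) is standard.

```python
def conv_params(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights plus biases of one standard 2-D convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

# Hypothetical layer widths (in_channels, out_channels, kernel size);
# the source does not give the actual ones. Pooling layers add no parameters.
layers = [(1, 16, 3), (16, 32, 3), (32, 32, 3), (32, 1, 3)]
total = sum(conv_params(c_in, c_out, k) for c_in, c_out, k in layers)
```

With these assumed widths the whole stack has 14,337 parameters, several orders of magnitude fewer than the large backbone networks criticized in the background section.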

Further, the convolution operation is performed in an N-channel network layer with a convolution kernel size of 3, and the convolution operation obtains the density features of the image.

In this embodiment, with N-channel network layers whose convolution kernel size is 3, the lightweight neural network model responds well: for video of size 240*320, the processing speed reaches 120 frames per second on a GeForce GTX TITAN XP GPU and 25 frames per second on an i5-8500 CPU @ 3.00 GHz.
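The N-channel 3 × 3 convolution described above might look like the following NumPy sketch. The zero padding, the value N = 8 and the random weights are assumptions for illustration; the source does not state them, and a real implementation would use a deep learning framework rather than explicit loops.

```python
import numpy as np

def conv3x3(x: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Zero-padded 3 x 3 convolution (cross-correlation) that keeps H and W.

    x:       (C_in, H, W) input feature maps
    weights: (N, C_in, 3, 3) kernels, one per input/output channel pair
    bias:    (N,)
    returns: (N, H, W)
    """
    n_out, c_in, kh, kw = weights.shape
    assert (kh, kw) == (3, 3) and c_in == x.shape[0]
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding on H and W only
    out = np.zeros((n_out, h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + w]  # shifted view, (C_in, H, W)
            # Mix input channels into each of the N output channels.
            out += np.einsum("nc,chw->nhw", weights[:, :, dy, dx], window)
    return out + bias[:, None, None]

rng = np.random.default_rng(0)
frame = rng.random((1, 240, 320))             # one-channel 240 x 320 input
kernels = rng.standard_normal((8, 1, 3, 3))   # N = 8 is a hypothetical width
features = conv3x3(frame, kernels, np.zeros(8))
```

Each of the N output channels is a feature map of the same spatial size as the input, which is what allows a pooling step to follow.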

Further, the pooling operation is used to store the density features of the image.

In this embodiment, the pooling operation compresses the input video image features. On the one hand, this reduces the number of features and hence the number of parameters, which simplifies the computation of the lightweight neural network model; on the other hand, it preserves feature invariance: the pooling operation discards redundant information and keeps the key information.
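The compression performed by pooling can be sketched as follows. The source only says that M2 denotes a pooling operation; reading it as 2 × 2 max pooling with stride 2 is an assumption, as is the function name `max_pool_2x2`.

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2 x 2 max pooling with stride 2 over (C, H, W); H and W must be even."""
    c, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Split each spatial axis into (blocks, 2) and take the maximum per block.
    blocks = x.reshape(c, h // 2, 2, w // 2, 2)
    return blocks.max(axis=(2, 4))

x = np.array([[[1, 2, 0, 0],
               [3, 4, 0, 5],
               [0, 0, 7, 0],
               [0, 6, 0, 0]]], dtype=np.float32)
pooled = max_pool_2x2(x)  # [[[4, 5], [6, 7]]]
```

The 4 × 4 map shrinks to 2 × 2 while the strongest response in each block survives, which illustrates both the reduction in data and the retention of key information.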

Further, the crowd density estimation model is trained on feature vectors of moving people, the feature vectors being the movement, gender, height and appearance of the moving people.

In this embodiment, people in crowd images differ considerably from one another, so multiple image features must be fused to estimate the number of people. The crowd density estimation model fuses multiple image features to cope with these differences, which makes it easier for the lightweight neural network model to process crowd density features.

Further, the dynamic time series model consists of multiple stages.

In this embodiment, Ft denotes the current video frame, Ft-1 the previous frame, and Ft+1 the next frame; Block denotes the lightweight neural network model, and stage denotes one of the multiple stages of the dynamic time series model.

In this embodiment, the multiple stages of the dynamic time series model are connected in series so that the image features of preceding and following frames are associated.

Further, each stage comprises a dilated convolution and a convolutional layer; the dilated convolution is a one-dimensional dilated convolution, the convolutional layer comprises an input layer and an output layer, and the output layer comprises multiple hidden layers.

In this embodiment, the dilated convolution enlarges the receptive field of the convolution kernel while keeping the number of parameters unchanged, and at the same time keeps the size of the output feature map unchanged; the convolutional layer compresses the image information and reduces model complexity.
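The two properties claimed for the dilated convolution (a larger receptive field with the same number of parameters, and an unchanged output size) can be checked with a small NumPy sketch. The kernel, the dilation rates 1, 2, 4 and the helper names are hypothetical; the source gives no concrete values.

```python
import numpy as np

def dilated_conv1d(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """1-D dilated convolution over the time axis with zero padding.

    x:      (T,) one feature value per frame
    kernel: (K,) with K odd; the output also has length T
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    pad = dilation * (k // 2)
    padded = np.pad(x, pad)
    out = np.zeros(len(x), dtype=np.float64)
    for i, w in enumerate(kernel):
        # Tap i reads the frame dilation * (i - k // 2) steps from the centre.
        out += w * padded[i * dilation:i * dilation + len(x)]
    return out

def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Frames seen by one output value after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

signal = np.arange(10, dtype=np.float64)
mixed = dilated_conv1d(signal, np.array([1.0, 0.0, 1.0]), dilation=2)
rf = receptive_field(3, [1, 2, 4])  # 15 frames from three 3-tap layers
```

With a 3-tap kernel, each layer still has three weights regardless of the dilation, yet three stacked layers with the assumed rates cover 15 frames, and the output keeps the same length as the input sequence.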

Further, a nonlinear transformation is applied between the input layer and the output layer by a ReLU activation function, and the input layer takes the output-layer information of the previous stage as its input information.

In this embodiment, the ReLU function is a nonlinear function. Compared with linear functions, ReLU has stronger expressive power; compared with other nonlinear functions, its gradient is constant over the non-negative interval, so the vanishing gradient problem does not arise and the convergence rate of the model stays stable. ReLU also exhibits one-sided suppression, which gives the lightweight neural network model sparse activations.
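The properties of ReLU mentioned above can be seen in a few lines of NumPy. The sample values are arbitrary, and the derivative at exactly zero is set to 0 by convention, which is an assumption of this sketch.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """ReLU: pass positive values through, suppress everything else to zero."""
    return np.maximum(x, 0.0)

def relu_grad(x: np.ndarray) -> np.ndarray:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(x.dtype)

pre_activation = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
activated = relu(pre_activation)              # [0, 0, 0, 0.5, 3]
gradient = relu_grad(pre_activation)          # [0, 0, 0, 1, 1]
sparsity = float(np.mean(activated == 0.0))   # fraction of silenced units
```

The gradient is exactly 1 wherever a unit is active, so it does not shrink as it passes through stacked stages, and the units with non-positive input are silenced, which is the one-sided suppression that produces sparse activations.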

Further, the weight calculation formula is

In this embodiment, the dynamic time series model obtains the weight of each frame of the video according to the weight calculation formula and, at the same time, the people count of each frame; the final people count is then obtained according to the respective weights of the corresponding frames.
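The weight calculation formula itself does not appear in this text, so the following sketch does not reproduce it. It only illustrates, under an assumed fusion rule (a weighted average), how per-frame weights W_j and per-frame counts could be combined into a single count. The function name, the counts and the weights are all hypothetical.

```python
import numpy as np

def fuse_counts(frame_counts: np.ndarray, frame_weights: np.ndarray) -> float:
    """Weighted average of per-frame counts.

    Assumed fusion rule for illustration only; not the patent's formula.
    """
    weights = np.asarray(frame_weights, dtype=np.float64)
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return float(np.dot(weights, frame_counts) / weights.sum())

# Hypothetical counts for frames Ft-1, Ft, Ft+1 and hypothetical weights W_j.
counts = np.array([40.0, 44.0, 42.0])
weights = np.array([0.25, 0.5, 0.25])
estimate = fuse_counts(counts, weights)
```

Under this assumed rule, the current frame contributes most and its neighbours smooth out frame-to-frame noise, which is one way the temporal context could influence the result.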

In this embodiment, a lightweight neural network model is constructed and image information is loaded into it. The lightweight neural network model applies a matrix transformation to the image information and stores the image data in a two-dimensional array. A crowd density estimation model is constructed and loaded into the lightweight neural network model, which obtains the density maps of consecutive image frames according to the crowd density estimation model and converts the density maps of the consecutive frames into density features. A dynamic time series model is constructed, and the density features and the crowd density estimation model are loaded into it. The dynamic time series model obtains the weight of each corresponding frame by the weight calculation formula, obtains the people count of the current frame according to the crowd density estimation features, and obtains the final people count from the corresponding frame weights and the current-frame count.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the appended claims.

Claims

1. A fast video crowd counting method based on sequential relationships, characterized by comprising the following steps:

Step S1: constructing a lightweight neural network model,

Step S2: loading the video image into the lightweight neural network model, the lightweight neural network model preprocessing the video image,

Step S3: constructing a crowd density estimation model and loading the crowd density estimation model into the lightweight neural network model, the lightweight neural network model obtaining the density maps of consecutive image frames according to the crowd density estimation model,

Step S4: converting, by the lightweight neural network model, the density maps of the consecutive image frames into crowd image density features,

Step S5: constructing a dynamic time series model,

Step S6: loading the video image density features into the dynamic time series model, and loading the crowd density estimation model into the dynamic time series model,

Step S7: analyzing, by the dynamic time series model, adjacent frames of the video by means of a weight calculation formula to obtain the weight of each corresponding frame and the people count of the current frame,

2. The fast video crowd counting method based on sequential relationships according to claim 1, characterized in that: in step S2, the lightweight neural network model preprocesses the video image by applying a matrix transformation to the video image.

3. The fast video crowd counting method based on sequential relationships according to claim 1, characterized in that: the lightweight neural network model comprises convolution operations and pooling operations.

4. The fast video crowd counting method based on sequential relationships according to claim 3, characterized in that: the convolution operation is performed in an N-channel network layer with a convolution kernel size of 3, and the convolution operation is used to obtain the density features of the image.

5. The fast video crowd counting method based on sequential relationships according to claim 3, characterized in that: the pooling operation is used to store the density features of the image.

6. The fast video crowd counting method based on sequential relationships according to claim 1, characterized in that: the crowd density estimation model is trained on feature vectors of moving people, the feature vectors being the movement, gender, height and appearance of the moving people.

7. The fast video crowd counting method based on sequential relationships according to claim 1, characterized in that: the dynamic time series model consists of multiple stages.

8. The fast video crowd counting method based on sequential relationships according to claim 7, characterized in that: each stage comprises a dilated convolution and a convolutional layer, the dilated convolution is a one-dimensional dilated convolution, the convolutional layer comprises an input layer and an output layer, and the output layer comprises multiple hidden layers.

9. The fast video crowd counting method based on sequential relationships according to claim 8, characterized in that: a nonlinear transformation is applied between the input layer and the output layer by a ReLU activation function, and the input layer takes the output-layer information of the previous stage as its input information.

10. The fast video crowd counting method based on sequential relationships according to claim 1, characterized in that: the weight calculation formula is

where V_ij denotes the feature vector group, j indexes the consecutive video frames, and i denotes the feature dimension. W_j denotes the video information weight of the j-th frame.