Summary of the invention
The object of the invention is to provide a fast video people counting method based on sequential relationship, in order to solve the problems that traditional crowd counting methods are suitable only for single images, ignore the contextual relationship between video frames, have a large number of network parameters, cannot respond in time, and sacrifice response speed in order to guarantee accuracy.
The invention is realized as follows. The present invention provides a fast video people counting method based on sequential relationship, comprising the following steps:
Step S1: building a lightweight neural network model;
Step S2: loading the video images into the lightweight neural network model, which preprocesses the video images;
Step S3: building a crowd density estimation model and loading it into the lightweight neural network model, which obtains the density maps of consecutive image frames according to the crowd density estimation model;
Step S4: converting, by the lightweight neural network model, the density maps of the consecutive frames into crowd image density features;
Step S5: building a dynamic time series model;
Step S6: loading the video image density features and the crowd density estimation model into the dynamic time series model;
Step S7: analyzing, by the dynamic time series model, adjacent frames of the video by means of a weight calculation formula, to obtain the weight of each corresponding frame and the people count of the current frame;
Step S8: obtaining the final people count according to the weights of the corresponding frames.
Preferably, in step S2 the lightweight neural network model preprocesses the video images by converting them into matrix form.
Preferably, the lightweight neural network model includes a convolution operation and a pooling operation.
Preferably, the convolution operation is carried out in N-channel network layers with a convolution kernel size of 3, and the convolution operation extracts the density features of the image.
Preferably, the pooling operation stores the density features of the image.
Preferably, the crowd density estimation model is trained on feature vectors of moving people, the feature vectors being the motion, gender, height, and appearance of the moving people.
Preferably, the dynamic time series model is made up of multiple stages.
Preferably, each stage includes a dilated convolution and a convolutional layer; the dilated convolution is a one-dimensional dilated convolution, the convolutional layer includes an input layer and an output layer, and the output layer includes multiple hidden layers.
Preferably, a nonlinear transformation is applied between the input layer and the output layer by a ReLU activation function, and the input layer takes the output-layer information of the previous stage as its input.
Preferably, the weight calculation formula is
where Vij denotes the feature vector group, j indexes the consecutive video frames, and i indexes the feature dimension. Wj denotes the video information weight of the j-th frame.
Compared with existing methods, the beneficial effects of the present invention are as follows. By building a lightweight neural network model that contains pooling and convolution operations and has a small number of parameters, the invention solves the problem that current people counting methods cannot respond in time and sacrifice response speed in order to guarantee accuracy. By building a dynamic time series model that comprises multiple stages connected in series, contextual information is associated across features, which solves the problem that current people counting methods are applicable only to single images and ignore the contextual relationship between video frames.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Figs. 1-3, the present invention provides a fast video people counting method based on sequential relationship, comprising the following steps:
Step S1: building a lightweight neural network model;
Step S2: loading the video images into the lightweight neural network model, which preprocesses the video images;
Step S3: building a crowd density estimation model and loading it into the lightweight neural network model, which obtains the density maps of consecutive image frames according to the crowd density estimation model;
Step S4: converting, by the lightweight neural network model, the density maps of the consecutive frames into crowd image density features;
Step S5: building a dynamic time series model;
Step S6: loading the video image density features and the crowd density estimation model into the dynamic time series model;
Step S7: analyzing, by the dynamic time series model, adjacent frames of the video by means of a weight calculation formula, to obtain the weight of each corresponding frame and the people count of the current frame;
Step S8: obtaining the final people count according to the weights of the corresponding frames.
In the present embodiment, building a lightweight neural network model solves the problem that the neural network models in traditional people counting methods have a large number of parameters, cannot respond in time, and sacrifice response speed in order to guarantee accuracy.
In the present embodiment, a dynamic time series model is built. The dynamic time series model obtains the corresponding frames of the video and the people count of the current frame, and then obtains the final people count according to the weights of the corresponding frames. This solves the problem that traditional crowd counting methods can only sum up a density map, are applicable only to single images, and ignore the contextual relationship between video frames.
Further, in step S2 the lightweight neural network model preprocesses the video images by converting them into matrix form.
In the present embodiment, the video image information can be represented as a matrix, so that matrix theory and matrix algorithms can be used to analyze and process it. The lightweight neural network model processes and analyzes the video images and generates a two-dimensional array that stores the image data.
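The matrix conversion described above can be illustrated with a minimal sketch. The source does not specify how a frame is turned into a two-dimensional array; the function name `frame_to_matrix`, the grayscale conversion with ITU-R BT.601 luma weights, and the scaling to [0, 1] are all assumptions made for illustration.

```python
def frame_to_matrix(frame_rgb):
    """Convert an H x W frame of (r, g, b) tuples (0-255) into an
    H x W two-dimensional array of floats in [0, 1].

    Assumption: grayscale conversion with BT.601 luma weights; the
    source only states that a frame is stored as a 2-D array.
    """
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
        for row in frame_rgb
    ]


# A hypothetical 2 x 2 frame: black, white, pure red, pure blue.
frame = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (0, 0, 255)],
]
matrix = frame_to_matrix(frame)
```

Each entry of `matrix` is one pixel intensity, so later steps can treat a frame as an ordinary matrix.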
Further, the lightweight neural network model includes a convolution operation and a pooling operation.
In the present embodiment, M2 denotes a pooling operation, and N@K×K denotes an N-channel network layer with a convolution kernel of size K.
In the present embodiment, the lightweight neural network model contains convolution and pooling operations and has a small number of parameters. It therefore responds quickly when extracting image density features, maintaining accuracy without sacrificing response speed.
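To make the "small number of parameters" concrete, the following sketch counts the learnable parameters of a stack written in the N@K×K / M2 notation. The layer widths used in the example (16, 32, 1) are hypothetical and are not taken from the source; the counting rule itself (K·K·C_in·N weights plus N biases per convolution layer, none for pooling) is the standard one.

```python
def conv_params(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of one N@KxK convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def count_params(in_channels, layers):
    """Total parameters of a stack of layers.

    `layers` is a list whose items are either the string "M2" (pooling)
    or a tuple (N, K) standing for an N@KxK convolution layer.
    """
    total = 0
    channels = in_channels
    for layer in layers:
        if layer == "M2":
            continue  # pooling has no learnable parameters
        n, k = layer
        total += conv_params(channels, n, k)
        channels = n
    return total


# Hypothetical stack: 16@3x3, M2, 32@3x3, M2, 1@3x3 on a 1-channel input.
layers = [(16, 3), "M2", (32, 3), "M2", (1, 3)]
total = count_params(1, layers)  # 160 + 4640 + 289 = 5089
```

A stack of this kind has only a few thousand parameters, which is the kind of budget that makes real-time inference plausible.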
Further, the convolution operation is carried out in N-channel network layers with a convolution kernel size of 3, and the convolution operation extracts the density features of the image.
In the present embodiment, with N-channel network layers whose convolution kernel size is 3, the lightweight neural network model achieves a fast response. For video of size 240×320, the processing speed reaches 120 frames per second in a GPU environment with a GeForce GTX TITAN XP, and 25 frames per second in a CPU environment with an i5-8500 CPU @ 3.00GHz.
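A convolution with kernel size 3 can be sketched as follows for a single channel. This is an illustration only: the zero padding that keeps the output the same size as the input is an assumption, and, as in most deep learning frameworks, the operation shown is a cross-correlation (the kernel is not flipped).

```python
def conv2d_3x3(image, kernel):
    """Apply a 3x3 kernel to a 2-D array with zero padding, so the
    output has the same height and width as the input."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    # Positions outside the image contribute zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out


# A 3x3 image of ones filtered with a 3x3 kernel of ones: each output
# value equals the number of in-bounds neighbours (including itself).
image = [[1.0] * 3 for _ in range(3)]
kernel = [[1.0] * 3 for _ in range(3)]
feature_map = conv2d_3x3(image, kernel)
```

An N-channel layer repeats this with N different kernels and sums over input channels; the single-channel case is shown to keep the sketch short.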
Further, the pooling operation stores the density features of the image.
In the present embodiment, the pooling operation compresses the input video image features. On the one hand, this reduces the features and hence the number of parameters, which simplifies the computation of the lightweight neural network model. On the other hand, it maintains feature invariance: the pooling operation discards redundant information and retains the key information.
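The compression performed by pooling can be sketched as below. The source denotes pooling only as M2 and does not state the pooling type; 2×2 max pooling with stride 2 is assumed here for illustration.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 (assumed pooling type).

    Each non-overlapping 2x2 block is replaced by its maximum; a
    trailing odd row or column is dropped.
    """
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[y][x], feature_map[y][x + 1],
                feature_map[y + 1][x], feature_map[y + 1][x + 1],
            )
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


fm = [
    [1, 2, 0, 1],
    [3, 4, 1, 0],
    [0, 0, 5, 6],
    [0, 1, 7, 8],
]
pooled = max_pool_2x2(fm)  # [[4, 1], [1, 8]]
```

The 4×4 input becomes 2×2, so later layers process a quarter of the values while the strongest response in each block is kept.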
Further, the crowd density estimation model is trained on feature vectors of moving people, the feature vectors being the motion, gender, height, and appearance of the moving people.
In the present embodiment, the people in crowd images differ considerably from one another, so multiple image features need to be fused in order to estimate the number of people. The crowd density estimation model fuses multiple image features to account for these differences, which reduces the difficulty for the lightweight neural network model of processing crowd density features.
Further, the dynamic time series model is made up of multiple stages.
In the present embodiment, Ft denotes the current video frame, Ft-1 denotes the frame preceding the current video frame, Ft+1 denotes the frame following the current video frame, Block denotes the lightweight neural network model, and stage denotes one of the multiple stages in the dynamic time series model.
In the present embodiment, connecting the multiple stages of the dynamic time series model in series associates the image features of preceding and following frames.
Further, each stage includes a dilated convolution and a convolutional layer; the dilated convolution is a one-dimensional dilated convolution, the convolutional layer includes an input layer and an output layer, and the output layer includes multiple hidden layers.
In the present embodiment, the dilated convolution enlarges the receptive field of the convolution kernel while keeping the number of parameters unchanged, and at the same time keeps the size of the output feature map unchanged. The convolutional layer compresses the image information and reduces the model complexity.
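The effect of dilation on the receptive field can be illustrated with a short one-dimensional sketch. The zero padding used to keep the output length equal to the input length is an assumption; the receptive-field relation (k − 1)·d + 1 for kernel size k and dilation d is the standard one.

```python
def receptive_field(kernel_size, dilation):
    """Number of input positions covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated convolution (cross-correlation form) with zero
    padding so that the output has the same length as the input."""
    k = len(kernel)
    span = (k - 1) * dilation          # receptive field minus one
    pad_left = span // 2
    padded = [0.0] * pad_left + list(signal) + [0.0] * (span - pad_left)
    return [
        sum(kernel[i] * padded[t + i * dilation] for i in range(k))
        for t in range(len(signal))
    ]


signal = [1, 2, 3, 4, 5]     # e.g. one feature value per video frame
kernel = [1, 1, 1]           # three parameters regardless of dilation
out = dilated_conv1d(signal, kernel, dilation=2)  # [4, 6, 9, 6, 8]
```

With the same three kernel weights, dilation 1 covers 3 consecutive frames while dilation 2 covers 5, and the output still has one value per input frame.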
Further, a nonlinear transformation is applied between the input layer and the output layer by a ReLU activation function, and the input layer takes the output-layer information of the previous stage as its input.
In the present embodiment, the ReLU function is a nonlinear function. Compared with linear functions, the ReLU function has stronger expressive power. Compared with other nonlinear functions, the gradient of the ReLU function is constant on the non-negative interval, so there is no vanishing gradient problem and the convergence rate of the model remains stable. The ReLU function also has a one-sided suppression property, which gives the lightweight neural network model sparse activations.
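The two properties mentioned above, a constant gradient for positive inputs and one-sided suppression, can be seen in a minimal sketch. Setting the gradient at exactly zero to 0 is a common convention and is an assumption here; the input values are arbitrary.

```python
def relu(x):
    """ReLU activation: passes positive values, suppresses the rest."""
    return x if x > 0 else 0.0


def relu_grad(x):
    """Derivative of ReLU; the value at x == 0 is set to 0 by convention."""
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
activations = [relu(v) for v in inputs]    # [0.0, 0.0, 0.0, 1.5, 3.0]
gradients = [relu_grad(v) for v in inputs]  # [0.0, 0.0, 0.0, 1.0, 1.0]

# Fraction of units that are switched off (sparse activation).
sparsity = sum(1 for a in activations if a == 0.0) / len(activations)
```

Positive inputs all receive gradient 1 regardless of magnitude, while non-positive inputs are zeroed, which is what produces the sparse activation pattern.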
Further, the weight calculation formula is
where Vij denotes the feature vector group, j indexes the consecutive video frames, and i indexes the feature dimension. Wj denotes the video information weight of the j-th frame.
In the present embodiment, the dynamic time series model obtains the weight of each frame of the video according to the weight calculation formula and, at the same time, obtains the people count of each frame. The final people count is then obtained according to the weights of the corresponding frames.
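The final combination step can be sketched as follows. The weight calculation formula itself is not reproduced in this text, so the weights Wj are treated as given inputs; summing a density map to obtain a per-frame count is standard practice in density-based counting, while the normalized weighted sum used to fuse the frames and all numeric values are assumptions made for illustration.

```python
def count_from_density(density_map):
    """People count of one frame: the sum of its density map."""
    return sum(sum(row) for row in density_map)


def weighted_count(frame_counts, frame_weights):
    """Fuse per-frame counts with per-frame weights W_j.

    Assumption: the final count is the weighted sum of the frame
    counts, normalized by the sum of the weights.
    """
    total_weight = sum(frame_weights)
    if total_weight <= 0:
        raise ValueError("frame weights must sum to a positive value")
    fused = sum(w * c for w, c in zip(frame_weights, frame_counts))
    return fused / total_weight


# Hypothetical density maps for frames Ft-1, Ft, Ft+1.
density_maps = [
    [[2.5, 2.5], [2.5, 2.5]],      # sums to 10.0
    [[3.0, 3.0], [3.0, 3.0]],      # sums to 12.0
    [[2.75, 2.75], [2.75, 2.75]],  # sums to 11.0
]
counts = [count_from_density(d) for d in density_maps]
weights = [0.25, 0.5, 0.25]        # placeholder values for W_j
final_count = weighted_count(counts, weights)  # 11.25
```

Giving the current frame the largest weight lets neighbouring frames smooth the estimate without dominating it; in the method described here the weights would come from the dynamic time series model.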
In the present embodiment, a lightweight neural network model is built and the image information is loaded into it. The lightweight neural network model converts the image information into matrix form and stores the image data in a two-dimensional array. A crowd density estimation model is built and loaded into the lightweight neural network model, which obtains the density maps of consecutive image frames according to the crowd density estimation model and converts the density maps of the consecutive frames into density features. A dynamic time series model is built, and the density features and the crowd density estimation model are loaded into it. The dynamic time series model obtains the weight of each corresponding frame by the weight calculation formula, obtains the people count of the current frame according to the crowd density estimation features, and obtains the final people count according to the corresponding frame weights and the current-frame people count.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the present invention is defined by the appended claims.