CN109697435B - People flow monitoring method and device, storage medium and equipment - Google Patents

People flow monitoring method and device, storage medium and equipment

Info

Publication number
CN109697435B
CN109697435B
Authority
CN
China
Prior art keywords
crowd
people
area
target image
residual error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910012764.5A
Other languages
Chinese (zh)
Other versions
CN109697435A (en)
Inventor
周曦
姚志强
周翔
李夏凤
李继伟
张庭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Zhongke Yuncong Technology Co ltd
Original Assignee
Chongqing Zhongke Yuncong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Zhongke Yuncong Technology Co ltd filed Critical Chongqing Zhongke Yuncong Technology Co ltd
Publication of CN109697435A
Application granted granted Critical
Publication of CN109697435B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a people flow monitoring method, device, storage medium and equipment, applicable to the technical field of image processing. The method comprises the following steps: acquiring a target image of pedestrians to be monitored from a video image; extracting image features using a model trained on a deep residual network; classifying and locating the unmanned areas, single-person areas and crowd areas in the target image with a crowd detection model; obtaining a head distribution density map for each crowd area classified by crowd detection using a density regression model, and calculating the number of people in the crowd area from the head distribution density map; and counting the number of people in the single-person areas and crowd areas to obtain the total number of people in the target image. When monitoring people flow, the number of people in the video image is counted with a deep residual network combined with crowd detection and density analysis, so the number of people in the image can be counted accurately and quickly, with good robustness.

Description

People flow monitoring method and device, storage medium and equipment
Technical Field
The invention relates to the field of information technology, and in particular to a people flow monitoring method, device, storage medium and equipment.
Background
In recent years, people counting technology has become a research hotspot in industry and is gradually being applied in chain stores, supermarkets in shopping malls, hotels, airports, subways, scenic spots and the like; the people flow data generated in these scenes can provide valuable information for many fields. For chain stores and supermarkets in shopping malls, which face constant pressure on offline sales from popular online e-commerce platforms such as Jingdong, Taobao, Tianmao and Amazon, scientific management is clearly an effective means of improving competitiveness. People flow data for different time periods and different areas of a commercial venue play an important role in improving the soundness of operating decisions, the rationality of resource scheduling and the comfort of the shopping environment, and are of great significance for performance assessment, commodity conversion rate, store site selection, merchandise display and advertising value. In addition, in public places such as exhibition halls, gymnasiums, subway stations, bus stations and airports, people flow data can show real-time and accurate regional headcounts and crowd density; through data analysis, managers can dynamically adjust staffing plans and control the number of people in each area, so that resources are used more reasonably and safety precautions are strengthened.
However, the accuracy of traditional people flow statistics is affected by dense crowds and by various occlusions and lighting conditions; for dense crowds under complex background lighting, counting accuracy is markedly worse than in sparse scenes.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide a people flow monitoring method, device, storage medium and equipment, to solve the prior-art problem that, when people flow is detected and analysed together with density, the count is inaccurate for dense crowds under complex background lighting and shadows.
To achieve the above and other related objects, in a first aspect of the present application, the present invention provides a people flow rate monitoring method, including:
acquiring a target image of pedestrians to be monitored from a video image;
extracting features of the target image based on a model trained with a deep residual network;
classifying and locating the unmanned areas, single-person areas and crowd areas in the target image with a crowd detection model;
obtaining a head distribution density map for the crowd areas given by the crowd detection module using a density regression model, and calculating the number of people in each crowd area from the head distribution density map;
and counting the number of people in the single-person areas and the crowd areas, and calculating the total number of people in the target image.
In a second aspect of the present application, there is provided a people flow monitoring device, comprising:
an image acquisition module, configured to acquire a target image of pedestrians to be monitored from a video image;
a feature extraction module, configured to extract features of the target image using a deep residual network;
a crowd detection module, configured to classify and locate the unmanned areas, single-person areas and crowd areas in the target image with a crowd detection model;
a density regression module, configured to obtain a head distribution density map for the crowd areas given by the crowd detection module using a density regression model and to calculate the number of people in each crowd area from the head distribution density map;
and a people counting module, configured to count the number of people in the single-person areas and the crowd areas and to calculate the total number of people in the target image.
In a third aspect of the present application, a storage medium is provided, storing computer-readable instructions, which can cause at least one processor to execute the method described above.
In a fourth aspect of the present application, there is provided a people flow monitoring device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, execution of the instructions by the one or more processors causing the electronic device to perform the people flow monitoring method described above.
As described above, the people flow rate monitoring method, device, storage medium and apparatus of the present invention have the following beneficial effects:
according to the invention, when people flow is monitored, the number of people in the video image is counted based on the depth residual error network in combination with crowd detection and density analysis, and since the feature extraction based on the depth residual error network is not easily influenced by the outside, the accuracy of extracting the head and shoulder features of people is ensured, meanwhile, the people in the image is classified and identified by using the crowd detection mode, and different people counting modes are adopted aiming at different density areas, so that the number of people in the image can be accurately and rapidly counted, and the robustness is better.
Drawings
FIG. 1 is a flow chart of a people flow monitoring method according to the present invention;
FIG. 2 is a flowchart of training a crowd detection model in the people flow monitoring method according to the present invention;
FIG. 3 is a structural block diagram of a people flow monitoring device according to the present invention;
FIG. 4 is a structural block diagram of the crowd detection module in the people flow monitoring device according to the present invention;
FIG. 5 is a schematic structural diagram of people flow monitoring equipment according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Before explaining the embodiments of the present invention in detail, some terms related to the embodiments of the present invention are explained.
Deep learning: this concept stems from the study of artificial neural networks. For example, a multi-layer perceptron with multiple hidden layers is a deep learning structure. Deep learning forms more abstract high-level features by combining low-level features, in order to discover distributed feature representations of the data.
Put another way, deep learning is a class of methods based on representation learning of data. An observation (for example, an image) can be represented in many ways, such as a vector of intensity values for each pixel, or more abstractly as a series of edges and regions of particular shapes. Using certain representations makes it easier to learn tasks from examples, such as face recognition or facial-expression recognition. The advantage of deep learning is that efficient unsupervised or supervised feature learning and hierarchical feature extraction replace manual feature engineering.
Deep residual network (ResNet): the depth of a neural network is very important to its performance, so ideally the network should be as deep as possible, as long as it does not overfit. In practice, however, as the depth of a neural network keeps increasing, the gradients of later layers tend to vanish (gradient dispersion), the model becomes difficult to optimize, and the accuracy of the network may actually drop. In other words, as depth keeps increasing, a degradation problem appears: accuracy first rises and then saturates, and continuing to add depth causes accuracy to fall.
It follows that once the number of layers reaches a certain point, network performance saturates, and adding more layers causes performance to degrade. This degradation is not caused by overfitting, because both training and testing accuracy decrease; it indicates that a neural network becomes hard to train beyond a certain depth. ResNet was proposed to solve this degradation of performance as depth increases: specifically, it introduces a deep residual learning framework to address the degradation problem caused by increased depth.
If a shallower network has reached saturation accuracy, adding several identity mapping layers after it should at least not increase the error; in other words, a deeper network should not raise the error on the training set. This idea of using identity mappings to pass a layer's output directly to a later layer is the inspiration behind ResNet.
The following describes an implementation environment related to the people flow monitoring method provided by the embodiments of the present invention.
Example 1
Referring to fig. 1, a flow chart of a people flow rate monitoring method according to the present invention is shown, which is detailed as follows:
step S101, acquiring a target image of a pedestrian to be monitored in a video image;
the source of the obtained video image may be a camera installed at each place, for example, image information of a corresponding area collected by a camera installed at a public place such as a shopping mall, a station, or some video images.
Step S102, extracting features of the target image using a deep residual network;
Feature extraction is performed on the target image using the sequentially connected residual blocks of the deep residual network, and the single-person head-and-shoulder boxes, crowd-area boxes, confidence scores and density map information of the target image are obtained according to the network structure. Each residual block comprises an identity mapping and at least two convolutional layers, and the identity mapping of a residual block points from the input of that block to its output.
Specifically, the target image is input into the first residual block of the deep residual network. Each residual block receives the output of the previous block and extracts features from it through its first, second and third convolutional layers; the output of the third convolutional layer, together with the output of the previous block, is then passed on to the next residual block. The output of the final residual block of the deep residual network gives the head-and-shoulder features of the target image.
Specifically, because ResNet introduces a residual structure, it can be used as the feature extraction algorithm to learn people's head-and-shoulder features with a deeper network, yielding more accurate monitoring. At the same time, the gradient dispersion problem caused by an excessively deep network is resolved, so a deeper network structure can be used to learn features of the target image while ensuring counting accuracy.
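As a rough sketch of the residual block just described (an identity mapping plus three convolutional layers, with the shortcut added to the output of the third convolution), a minimal PyTorch implementation might look as follows. The channel widths, kernel sizes and normalization layers are assumptions made for illustration; the patent does not fix them.

import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Minimal residual block: three conv layers plus an identity shortcut.
    Channel widths and kernel sizes are illustrative assumptions."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // 4, kernel_size=1, bias=False)
        self.conv2 = nn.Conv2d(channels // 4, channels // 4, kernel_size=3, padding=1, bias=False)
        self.conv3 = nn.Conv2d(channels // 4, channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels // 4)
        self.bn2 = nn.BatchNorm2d(channels // 4)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                      # identity mapping from the block input
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out = out + identity              # shortcut added to the third conv output
        return self.relu(out)

# Example: a small stack of such blocks used as a feature extractor
# features = nn.Sequential(*[BottleneckBlock(64) for _ in range(4)])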
Step S103, classifying and locating the unmanned areas, single-person areas and crowd areas in the target image with a crowd detection model;
Single-person head-and-shoulder areas and dense crowd areas are obtained through the crowd detection model: when the original image is input, a fully convolutional network classifies and detects each region of the image as a single-person head-and-shoulder area, a crowd area or an unmanned area.
Specifically, the target image is partitioned according to crowd density, converting the input image into several regions of different types, which makes the subsequent per-region counting easier.
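As a hedged illustration of this fully convolutional classification step, the sketch below maps backbone features to a per-cell score map over three assumed classes (0: unmanned, 1: single person, 2: crowd), so that every spatial location of the target image receives a region label. The channel widths and class ordering are assumptions for illustration, not details taken from the patent.

import torch
import torch.nn as nn

class RegionClassifierHead(nn.Module):
    """Fully convolutional head: per-cell scores for {0: unmanned, 1: single, 2: crowd}.
    The 256-channel input width is an illustrative assumption."""
    def __init__(self, in_channels: int = 256, num_classes: int = 3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, num_classes, kernel_size=1),
        )

    def forward(self, feature_map):
        scores = self.head(feature_map)   # (B, 3, H, W) class scores
        return scores.argmax(dim=1)       # per-cell region label map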
Step S104, obtaining a head distribution density map for each crowd area using a density regression model, and calculating the number of people in the crowd area from the head distribution density map;
A Gaussian distribution is placed at each head-and-shoulder position in the crowd region of the target image to obtain a two-dimensional head distribution density map, and summing the density map gives the number of people in the crowd area. The head distribution density map is represented as impulse functions convolved with a Gaussian kernel.
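As a minimal sketch of this density-map construction (an impulse at every annotated head position convolved with a Gaussian kernel, then summed to give the count), assuming a fixed kernel width sigma, one could write:

import numpy as np
from scipy.ndimage import gaussian_filter

def head_density_map(head_points, height, width, sigma=4.0):
    """Convolve an impulse at each head position with a Gaussian kernel.
    The kernel width sigma is an illustrative assumption."""
    impulses = np.zeros((height, width), dtype=np.float32)
    for x, y in head_points:                 # head position points x_i
        impulses[int(y), int(x)] = 1.0       # delta(x - x_i)
    return gaussian_filter(impulses, sigma)  # D = sum_i delta(. - x_i) * G_sigma

def people_count(density_map):
    """The count is the integral (sum) of the density map."""
    return float(density_map.sum())

Because each Gaussian integrates to approximately one, summing the resulting map recovers the number of annotated heads, which is why the per-region count can be read off the density map directly.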
Step S105, counting the number of people in the single-person areas and the crowd areas, and calculating the total number of people in the target image.
Different counting modes are used for dense areas and sparse areas: the number of people in each area is counted according to the region division, and finally the counts of all areas are added to obtain the total number of people in the target image.
In this embodiment, the number of people is counted separately for each area delimited in the whole image (target image or original image). The density map is obtained from the position information (position coordinates) detected in the original image; that is, each crowd area is enclosed in a rectangular box and the number of people in that box is obtained by summing the density inside it. The benefit is that the number of people can be counted accurately even for dense crowds with lighting and shadows, the crowd estimation capability is improved, the current headcount is reflected more truthfully, and by using different counting modes for different crowd densities in the image, people flow can be monitored quickly and accurately.
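For the per-region counting described above, a small helper along these lines (hypothetical, not taken from the patent) could combine the two counting modes: every single-person box contributes one person, and every crowd box contributes the density summed inside its rectangle.

def total_people(single_boxes, crowd_boxes, density_map):
    """Each single-person box counts one person; each crowd box contributes
    the sum of the density map restricted to that rectangular region."""
    count = float(len(single_boxes))
    for x1, y1, x2, y2 in crowd_boxes:
        count += float(density_map[y1:y2, x1:x2].sum())
    return count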
Example 2
Please refer to fig. 2, which is a flowchart of training the crowd detection model in the people flow monitoring method according to the present invention, detailed as follows:
Step S201, labeling the crowd areas of a plurality of sample images: the head and shoulders are labeled when a sample image contains single persons, and a crowd box is labeled when it contains a crowd; a crowd detection model is constructed according to the labeled areas of the sample images;
Step S202, training the crowd detection model with a plurality of training samples, to generate a crowd detection model that can classify and locate regions in a target image according to crowd features.
Specifically, when training the model, the data to be detected must be labeled, that is, sample data (sample images of various densities) is prepared. For example, clearly visible head-and-shoulder areas in the input sample images are annotated with bounding boxes and given category 1; areas of the sample image where the crowd is dense are annotated and given category 2; the remaining areas with no heads or shoulders correspond to category 0 and are left unlabeled.
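One possible way to store such labels, given the three categories described above, is sketched below; the field names and coordinate convention are assumptions made only for illustration.

# Hypothetical annotation layout for one training sample; field names are assumptions.
sample_annotation = {
    "image": "sample_0001.jpg",
    "boxes": [
        {"category": 1, "bbox": [120, 80, 160, 140]},   # single head-and-shoulder region
        {"category": 2, "bbox": [300, 60, 620, 400]},   # dense crowd region
    ],
    # remaining area (no head or shoulder visible) is category 0 and left unlabeled
}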
When training the density regression model, the whole image can be used for end-to-end training: a Gaussian distribution is placed at each head region of the whole image to obtain a two-dimensional head distribution density map, and all values of the density map are summed to obtain the number of people in the whole image. In addition, the ground-truth density map can be generated offline or online at a preset downsampling factor of the network structure, and the head density distribution map learned by training is generated accordingly. The density map is defined by convolving impulse functions with a Gaussian kernel:
D(x) = Σ_{i=1}^{N} δ(x - x_i) * G(x)
where D denotes the final density map, N the number of heads, x an image point, x_i a head position point, δ the impulse (pulse) function, and G the Gaussian kernel function.
For training, the Caffe framework may be used to train the crowd detection model and the density regression model separately; for example, the crowd detection model is trained first, the corresponding network part is then fixed, and the density regression model is trained afterwards (one loss may be used for the supervised regression training and another loss for the classification).
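A rough analogue of this two-stage scheme is sketched below in PyTorch, assuming a shared backbone, a classification loss for the detection branch and a pixel-wise regression loss for the density branch; the patent's own implementation used Caffe, so every detail here is an assumption.

import torch.nn as nn
import torch.optim as optim

def train_two_stage(backbone, detect_head, density_head, detect_loader, density_loader):
    # Stage 1: train the backbone and detection head with a classification loss.
    cls_loss = nn.CrossEntropyLoss()
    opt1 = optim.SGD(list(backbone.parameters()) + list(detect_head.parameters()), lr=0.01)
    for images, labels in detect_loader:
        opt1.zero_grad()
        loss = cls_loss(detect_head(backbone(images)), labels)
        loss.backward()
        opt1.step()

    # Stage 2: freeze the shared backbone, train only the density-regression head
    # with a regression (MSE) loss against the ground-truth density maps.
    for p in backbone.parameters():
        p.requires_grad = False
    reg_loss = nn.MSELoss()
    opt2 = optim.SGD(density_head.parameters(), lr=0.01)
    for images, density_gt in density_loader:
        opt2.zero_grad()
        loss = reg_loss(density_head(backbone(images)), density_gt)
        loss.backward()
        opt2.step()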
In addition, the head-and-shoulder features in the target image may also be extracted as follows:
Establishing a training sample set: video surveillance frame images are acquired, various preprocessing is applied to the acquired images, and the number of people within the image range is determined manually;
Building the network structure from ResNet blocks: the ResNet network is a residual structure of stacked blocks; the network is fully convolutional, so no fully connected layers are needed. The publicly available ResNet structure has 18 layers, and the overall fully convolutional structure contains many convolutional layers, so the ResNet network has at least one block structure;
Training the convolutional neural network model: after initialization, the constructed convolutional neural network model is trained iteratively with stochastic gradient descent; the gradient and the value of the loss function are checked at every iteration so as to obtain the optimal solution for each weight W and bias b in the network structure, and after many iterations the optimal convolutional neural network model of this training run is obtained (see the sketch after this list);
Finally, the crowd density of the whole area is estimated according to the detection and classification strategy, using the convolutional neural network classification models obtained for the far and near partitions.
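A minimal sketch of the iterative stochastic gradient descent loop mentioned above, in which the loss value and gradient norm are inspected at each iteration, could look like this; the learning rate, momentum and number of epochs are illustrative assumptions.

import torch
import torch.optim as optim

def sgd_train(model, loss_fn, data_loader, epochs=10, lr=0.01):
    """Iterative SGD training; loss and gradient norm are checked every iteration."""
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        for step, (images, targets) in enumerate(data_loader):
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            # Inspect the overall gradient magnitude before the weight/bias update.
            grad_norm = torch.sqrt(sum((p.grad.detach() ** 2).sum()
                                       for p in model.parameters() if p.grad is not None))
            optimizer.step()
            print(f"epoch {epoch} step {step}: loss={loss.item():.4f} grad_norm={grad_norm.item():.4f}")
    return model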
Example 3
Referring to fig. 3, a structural block diagram of a people flow rate monitoring device provided by the present invention is detailed as follows:
the image acquisition module 31 is used for acquiring a target image of a pedestrian to be monitored in the video image;
a feature extraction module 32, configured to extract features of the target image using a deep residual network;
feature extraction is performed on the target image using the sequentially connected residual blocks of the deep residual network, where each residual block comprises an identity mapping layer and at least two convolutional layers, and the identity mapping of a residual block points from the input of that block to its output.
a crowd detection module 33, configured to classify and locate the unmanned areas, single-person areas and crowd areas in the target image with a crowd detection model;
sample regions of a plurality of sample images are labeled, and a crowd detection model is constructed from the sample regions of the sample images;
the crowd detection model is trained with a plurality of training samples to generate a crowd detection model that can classify regions in a target image according to head-and-shoulder features.
a density regression module 34, configured to obtain a head distribution density map for the crowd areas using a density regression model and to calculate the number of people in each crowd area from the head distribution density map;
a Gaussian distribution is placed at each head-and-shoulder position in the crowd region of the target image to obtain a two-dimensional head distribution density map, and summing the density map gives the number of people in the crowd area; the head distribution density map is represented as impulse functions convolved with a Gaussian kernel.
and a people counting module 35, configured to count the number of people in the single-person areas and the crowd areas and to calculate the total number of people in the target image.
Referring to fig. 4, a block diagram of a crowd detection module in a people flow rate monitoring device according to the present invention is detailed as follows:
a model establishing unit 331, configured to label the crowd areas of a plurality of sample images, labeling the head and shoulders when a sample image contains single persons and a crowd box when it contains a crowd, and to construct a crowd detection model from the labeled areas of the sample images;
a model training unit 332, configured to train the crowd detection model with a plurality of training samples, generating a crowd detection model that can classify and locate regions in a target image according to crowd features.
In the embodiment of the present invention, each unit of the device for monitoring the number of people may be implemented by a corresponding hardware or software unit, and each unit may be an independent software or hardware unit, or may be integrated into a software or hardware unit, which is not limited herein. The detailed implementation of each unit can refer to the description of the first embodiment, and is not repeated herein.
Example 4
Fig. 5 shows a structure of a people flow monitoring device according to a fourth embodiment of the present invention, and for convenience of description, only the parts related to the embodiment of the present invention are shown.
The people flow monitoring device 5 of the embodiment of the invention comprises a processor 50, a memory 51 and a computer program 52 stored in the memory 51 and executable on the processor 50. The processor 50 executes the computer program 52 to implement the steps in the embodiment of the human traffic monitoring method, such as the steps S101 to S105 shown in fig. 1. Alternatively, the processor 50, when executing the computer program 52, implements the functions of the units in the above-described device embodiments, such as the functions of the units 31 to 35 shown in fig. 3.
In the embodiment of the invention, the number of people is counted separately for each region delimited in the whole image (target image), so the number of people can be counted accurately even for dense crowds with lighting and shadows; the crowd estimation capability is improved, the current headcount is reflected more truthfully, and by using different counting modes for different crowd densities in the image, people flow can be monitored quickly and accurately.
The computing equipment of the embodiment of the invention may be a personal computer, a smartphone or a tablet. For the steps implemented when the processor 50 in the computing device 5 executes the computer program 52 to carry out the people flow monitoring method, reference may be made to the description of the foregoing method embodiments, which is not repeated here.
Example 5
In an embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program, and the computer program, when executed by a processor, implements the steps in the embodiment of the human traffic monitoring method described above, for example, steps S101 to S105 shown in fig. 1. Alternatively, the computer program may be adapted to perform the functions of the units of the above-described device embodiments, such as the functions of the units 31 to 35 shown in fig. 3, when executed by the processor.
In the embodiment of the invention, the number of people is counted separately for each region delimited in the whole image (target image), so the number of people can be counted accurately even for dense crowds with lighting and shadows; the crowd estimation capability is improved, the current headcount is reflected more truthfully, and by using different counting modes for different crowd densities in the image, people flow can be monitored quickly and accurately.
The computer readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, a recording medium, such as a ROM/RAM, a magnetic disk, an optical disk, a flash memory, or the like.
In addition, each embodiment of the present application can be realized by a data processing program executed by a data processing device such as a computer; clearly, such a data processing program constitutes the present application. Further, a data processing program is usually stored in a storage medium and is executed either by reading the program directly from the storage medium or by installing or copying it into a storage device (such as a hard disk and/or memory) of the data processing device. Such a storage medium therefore also constitutes the present application, which also provides a non-volatile storage medium storing a data processing program that can be used to carry out any one of the above method embodiments of the present application.
In summary, when monitoring people flow, the number of people in the video image is counted with a deep residual network combined with crowd detection and density analysis. Because feature extraction based on a deep residual network is not easily affected by external disturbances, the accuracy of extracting head-and-shoulder features is ensured; at the same time, the image is classified and identified by crowd detection, and different counting modes are used for areas of different density, so the number of people in the image can be counted accurately and quickly, with good robustness. The invention therefore effectively overcomes various defects of the prior art and has high industrial value.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall be covered by the claims of the present invention.

Claims (8)

1. A people flow monitoring method, comprising the steps of:
acquiring a target image of a pedestrian to be monitored in a video image;
extracting features of the target image based on a model trained with a deep residual network, wherein the features of the target image are extracted using the sequentially connected residual blocks of the deep residual network, each residual block comprises an identity mapping and at least two convolutional layers, and the identity mapping of a residual block points from the input of that block to its output;
classifying and locating the unmanned areas, single-person areas and crowd areas in the target image with a crowd detection model, wherein different convolutional neural network classification models are used to detect crowds at far and near distances in the image;
obtaining a head distribution density map for the crowd areas given by the crowd detection module using a density regression model, and calculating the number of people in each crowd area in different modes according to the different head distribution density maps;
and counting the number of people in the single-person areas and the crowd areas, and calculating the total number of people in the target image.
2. The method of claim 1, wherein the training of the crowd detection model comprises:
labeling the crowd areas of a plurality of sample images, wherein the head and shoulders are labeled when a sample image contains single persons and a crowd box is labeled when it contains a crowd, and constructing a crowd detection model according to the labeled areas of the sample images;
and training the crowd detection model with a plurality of training samples to generate a crowd detection model capable of classifying and locating regions in a target image according to crowd features.
3. The people flow monitoring method according to claim 1, wherein the step of obtaining a head distribution density map for the crowd area using a density regression model and calculating the number of people in the crowd area from the head distribution density map comprises:
calibrating the head-and-shoulder positions in the crowd region of the target image and computing a two-dimensional head distribution density map for model training; based on the crowd region detected by the crowd detection model, obtaining a head distribution density map with the trained density regression model; and calculating the number of people from the head distribution density map of the region; wherein the head distribution density map is represented as impulse functions convolved with a Gaussian kernel, and the number of people is counted by summing the density map over the region.
4. A people flow monitoring device, the device comprising:
the image acquisition module is used for acquiring a target image of a pedestrian to be monitored in the video image;
a feature extraction module, configured to extract features of the target image using a deep residual network, wherein feature extraction is performed on the target image using the sequentially connected residual blocks of the deep residual network, each residual block comprises an identity mapping and at least two convolutional layers, and the identity mapping of a residual block points from the input of that block to its output;
a crowd detection module, configured to classify and locate the unmanned areas, single-person areas and crowd areas in the target image with a crowd detection model, wherein different convolutional neural network classification models are used to detect crowds at far and near distances in the image;
a density regression module, configured to obtain a head distribution density map for the crowd areas given by the crowd detection module using a density regression model, and to calculate the number of people in each crowd area in different modes according to the different head distribution density maps;
and a people counting module, configured to count the number of people in the single-person areas and the crowd areas and to calculate the total number of people in the target image.
5. The human flow monitoring device of claim 4, wherein the training of the crowd detection model comprises:
a model establishing unit, configured to label the crowd areas of a plurality of sample images, labeling the head and shoulders when a sample image contains single persons and a crowd box when it contains a crowd, and to construct a crowd detection model according to the labeled areas of the sample images;
and a model training unit, configured to train the crowd detection model with a plurality of training samples, generating a crowd detection model capable of classifying and locating regions in the target image according to crowd features.
6. The human flow monitoring device of claim 4, wherein the density regression module further comprises:
calibrating the head-and-shoulder positions in the crowd region of the target image and computing a two-dimensional head distribution density map for model training; based on the crowd region detected by the crowd detection model, obtaining a head distribution density map with the trained density regression model; and calculating the number of people from the head distribution density map of the region; wherein the head distribution density map is represented as impulse functions convolved with a Gaussian kernel, and the number of people is counted by summing the density map over the region.
7. A people flow monitoring device, characterized in that the device comprises:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, execution of the instructions by the one or more processors causing the device to perform the people flow monitoring method of any one of claims 1-3.
8. A storage medium having stored thereon machine readable instructions for causing at least one processor to perform the method of traffic monitoring according to any of claims 1-3.
CN201910012764.5A 2018-12-14 2019-01-07 People flow monitoring method and device, storage medium and equipment Active CN109697435B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018115292443 2018-12-14
CN201811529244 2018-12-14

Publications (2)

Publication Number Publication Date
CN109697435A CN109697435A (en) 2019-04-30
CN109697435B true CN109697435B (en) 2020-10-23

Family

ID=66232642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910012764.5A Active CN109697435B (en) 2018-12-14 2019-01-07 People flow monitoring method and device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN109697435B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287929B (en) * 2019-07-01 2023-09-05 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for determining number of targets in group area
CN112446406A (en) * 2019-09-05 2021-03-05 中移信息技术有限公司 Emergency point determination method and device and storage medium
CN110855937A (en) * 2019-09-25 2020-02-28 上海鑫湘机电设备工程有限公司 Error control method of subway station hall people counting system
CN110852208B (en) * 2019-10-29 2023-06-02 贵州民族大学 Crowd density estimation method and readable storage medium
CN110874573B (en) * 2019-10-30 2022-05-13 汇纳科技股份有限公司 Density graph generation method and device based on residual error operation, electronic terminal and medium
CN112990517A (en) * 2019-12-12 2021-06-18 中移雄安信息通信科技有限公司 Crowd distribution prediction method and system
CN111178235A (en) * 2019-12-27 2020-05-19 卓尔智联(武汉)研究院有限公司 Target quantity determination method, device, equipment and storage medium
CN111178276B (en) * 2019-12-30 2024-04-02 上海商汤智能科技有限公司 Image processing method, image processing apparatus, and computer-readable storage medium
CN111405239B (en) * 2020-02-17 2021-08-31 浙江大华技术股份有限公司 Monitoring method, server, monitoring system, and computer-readable storage medium
CN111639585A (en) * 2020-05-21 2020-09-08 中国科学院重庆绿色智能技术研究院 Self-adaptive crowd counting system and self-adaptive crowd counting method
CN111652161A (en) * 2020-06-08 2020-09-11 上海商汤智能科技有限公司 Crowd excess density prediction method and device, electronic equipment and storage medium
CN111652168B (en) * 2020-06-09 2023-09-08 腾讯科技(深圳)有限公司 Group detection method, device, equipment and storage medium based on artificial intelligence
CN112001274B (en) * 2020-08-06 2023-11-17 腾讯科技(深圳)有限公司 Crowd density determining method, device, storage medium and processor
CN112115900B (en) * 2020-09-24 2024-04-30 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN112232236B (en) * 2020-10-20 2024-02-06 城云科技(中国)有限公司 Pedestrian flow monitoring method, system, computer equipment and storage medium
CN112364788B (en) * 2020-11-13 2021-08-03 润联软件系统(深圳)有限公司 Monitoring video crowd quantity monitoring method based on deep learning and related components thereof
CN113052039B (en) * 2021-03-16 2022-12-02 北京邮电大学 Method, system and server for detecting pedestrian density of traffic network
CN113255430A (en) * 2021-03-31 2021-08-13 中交第二公路勘察设计研究院有限公司 Method for detecting and counting crowd distribution in video based on deep learning
CN113869269B (en) * 2021-10-13 2024-09-27 平安银行股份有限公司 Method and device for detecting crowdedness degree of activity site, electronic equipment and storage medium
CN113963318B (en) * 2021-12-22 2022-03-25 北京的卢深视科技有限公司 People flow statistical method and device, electronic equipment and storage medium
CN116012776B (en) * 2022-12-09 2024-02-23 北京数原数字化城市研究中心 Method and device for monitoring number of people, electronic equipment and readable storage medium
CN118097553A (en) * 2024-03-01 2024-05-28 北京数原数字化城市研究中心 Pedestrian number determining method and device and related equipment
CN118212585A (en) * 2024-03-26 2024-06-18 北京数原数字化城市研究中心 Crowd counting method and related equipment
CN118334585A (en) * 2024-05-06 2024-07-12 贝塔智能科技(北京)有限公司 Intelligent analysis method, system and computer equipment for people flow

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108717528A (en) * 2018-05-15 2018-10-30 苏州平江历史街区保护整治有限责任公司 A kind of global population analysis method of more strategies based on depth network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049787B (en) * 2011-10-11 2015-11-25 汉王科技股份有限公司 A kind of demographic method based on head shoulder feature and system
CN106778502B (en) * 2016-11-21 2020-09-22 华南理工大学 Crowd counting method based on deep residual error network
CN107506692A (en) * 2017-07-21 2017-12-22 天津大学 A kind of dense population based on deep learning counts and personnel's distribution estimation method
CN107909044B (en) * 2017-11-22 2020-04-28 天津大学 People counting method combining convolutional neural network and track prediction

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108717528A (en) * 2018-05-15 2018-10-30 苏州平江历史街区保护整治有限责任公司 A kind of global population analysis method of more strategies based on depth network

Also Published As

Publication number Publication date
CN109697435A (en) 2019-04-30

Similar Documents

Publication Publication Date Title
CN109697435B (en) People flow monitoring method and device, storage medium and equipment
CN106897670B (en) Express violence sorting identification method based on computer vision
Yang et al. Automated extraction of street-scene objects from mobile lidar point clouds
Li et al. A multi-scale cucumber disease detection method in natural scenes based on YOLOv5
Ahmad et al. Overhead view person detection using YOLO
Yadav et al. An improved deep learning-based optimal object detection system from images
Cao et al. EFFNet: Enhanced feature foreground network for video smoke source prediction and detection
CN110175528B (en) Human body tracking method and device, computer equipment and readable medium
CN114241511B (en) Weak supervision pedestrian detection method, system, medium, equipment and processing terminal
CN113780270B (en) Target detection method and device
CN116448019B (en) Intelligent detection device and method for quality flatness of building energy-saving engineering
Despotovic et al. Prediction and analysis of heating energy demand for detached houses by computer vision
Mithun et al. ODDS: real-time object detection using depth sensors on embedded GPUs
CN115187772A (en) Training method, device and equipment of target detection network and target detection method, device and equipment
Cheng et al. Water quality monitoring method based on TLD 3D fish tracking and XGBoost
Yang et al. Toward country scale building detection with convolutional neural network using aerial images
Ma et al. Dynamic gesture contour feature extraction method using residual network transfer learning
Kirkland et al. Imaging from temporal data via spiking convolutional neural networks
Zhong et al. Background subtraction driven seeds selection for moving objects segmentation and matting
Deng et al. Deep learning in crowd counting: A survey
Lu et al. Multimode Gesture Recognition Algorithm Based on Convolutional Long Short‐Term Memory Network
CN117953581A (en) Method and device for identifying actions, electronic equipment and readable storage medium
Tian et al. Video object segmentation with shape cue based on spatiotemporal superpixel neighbourhood
Li et al. Flexible heterogeneous data fusion strategy for object positioning applications in edge computing environment
CN116502700A (en) Skin detection model training method, skin detection device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant