CN110276264A - A kind of crowd density estimation method based on foreground segmentation figure - Google Patents


Info

Publication number
CN110276264A
CN110276264A
Authority
CN
China
Prior art keywords
crowd
segmentation
network
foreground segmentation
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910446452.5A
Other languages
Chinese (zh)
Other versions
CN110276264B (en)
Inventor
徐浩
夏思宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201910446452.5A priority Critical patent/CN110276264B/en
Publication of CN110276264A publication Critical patent/CN110276264A/en
Application granted granted Critical
Publication of CN110276264B publication Critical patent/CN110276264B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS; G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — Electric digital data processing
        • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
        • G06F18/25 — Pattern recognition; fusion techniques
    • G06T — Image data processing or generation, in general
        • G06T7/136 — Image analysis; segmentation or edge detection involving thresholding
        • G06T7/194 — Image analysis; segmentation or edge detection involving foreground-background segmentation
        • G06T2207/20081 — Special algorithmic details; training; learning
        • G06T2207/20084 — Special algorithmic details; artificial neural networks [ANN]
    • G06V — Image or video recognition or understanding
        • G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
        • G06V20/53 — Scenes; surveillance or monitoring of activities; recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a crowd density estimation method based on a foreground segmentation map. The method comprises the following steps: mark the head of each person in every image to obtain a point marker map; obtain a crowd density map from the point marker map using a Gaussian smoothing method; obtain a crowd foreground segmentation map from the density map using a threshold segmentation method; design a neural network for image feature extraction, a neural network branch for crowd density regression, and a neural network branch for crowd foreground segmentation, and fuse the outputs of the two branches to obtain the final output; make a training set and train the model; test an input image with the trained model to obtain the crowd density estimation result. The method lets the neural network learn foreground segmentation information without any additional annotation, and effectively avoids false detections on complex backgrounds.

Description

Crowd density estimation method based on foreground segmentation map
Technical Field
The invention belongs to the field of computer vision and image processing, and particularly relates to a crowd density estimation method based on a foreground segmentation map.
Background
Given a picture, crowd density estimation uses computer vision techniques to detect the total number of people and their spatial distribution in the image. Existing crowd density estimation methods fall mainly into two categories: detection-based methods (pedestrian detection or head-and-shoulder detection) and density map regression. Detection-based methods convert crowd density estimation into a pedestrian detection problem and count the detected targets to obtain the final number of people. They work well on sparse crowds, but in high-density regions accuracy drops sharply due to factors such as occlusion and scale variation. Density-map-regression methods treat an image block as a whole and regress the number of people in the region directly from local image features; they handle occlusion and small targets effectively and achieve high regression accuracy on high-density crowd images, but they perform poorly on sparse crowds, are prone to background false detection, and cannot give the position of a specific target.
The invention provides a novel crowd density estimation method based on a foreground segmentation map, which effectively suppresses background false detection, improves the stability of the algorithm, and improves its accuracy.
Disclosure of Invention
Purpose of the invention: aiming at the above problems, the invention provides a crowd density estimation method based on a foreground segmentation map, which predicts the foreground segmentation map and the crowd density map simultaneously and finally fuses the two to effectively remove background false detections.
Technical scheme: to achieve this purpose, the invention provides a crowd density estimation method based on a foreground segmentation map, comprising the following steps:
Step 1: mark the head of each person in every image containing a crowd to obtain a point marker map;
Step 2: obtain a crowd density map from the point marker map using a Gaussian smoothing method;
Step 3: obtain a crowd foreground segmentation map from the crowd density map using a threshold segmentation method;
Step 4: design a neural network for image feature extraction (backbone network), a neural network branch for crowd density regression (regression network branch), and a neural network branch for crowd foreground segmentation (foreground segmentation network branch); input an image into the backbone network to obtain image features; then feed the image features into the regression network branch and the foreground segmentation network branch simultaneously to obtain a crowd density map and a crowd foreground segmentation map respectively; finally fuse the outputs of the two branches to obtain the final output;
Step 5: make a data set using the methods of steps 1-3 and train the network model;
Step 6: test an input image with the trained model to obtain the crowd density estimation result.
Further, the step 1 comprises the following steps:
Mark each crowd picture in the training set with point annotations, i.e. mark a point on each person's head to represent that person. This yields a point marker map: a gray-scale map with 1 channel whose size matches the crowd picture, with value 1 at head positions and 0 elsewhere.
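As an illustration of step 1, building the point marker map can be sketched in numpy (the coordinate format and helper name are assumptions for illustration; the patent only specifies one marked point per head, value 1 at head positions and 0 elsewhere):

```python
import numpy as np

def make_point_map(head_coords, height, width):
    """Build a single-channel point marker map: 1 at each head position, 0 elsewhere."""
    point_map = np.zeros((height, width), dtype=np.float32)
    for r, c in head_coords:
        if 0 <= r < height and 0 <= c < width:
            point_map[r, c] = 1.0
    return point_map

# Toy example: a 6x8 image with three annotated heads.
pm = make_point_map([(1, 2), (3, 5), (4, 4)], 6, 8)
```

The sum of the map equals the number of annotated heads, which is what makes the subsequent density map count-preserving.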
Further, the step 2 comprises the following steps:
Determine the Gaussian kernel size: observe the distribution of head sizes in the data set and choose the Gaussian filter kernel size accordingly; ideally the kernel size matches the head size.
Perform Gaussian kernel convolution: convolve the point marker map obtained in step 1 with the Gaussian kernel to obtain the crowd density map, which serves as the learning target of the whole network's final output during training. Referring to fig. 2, the gray value of the crowd density map represents the crowd density: the higher the density, the brighter the color.
Further, the step 3 comprises the following steps:
Determine the density map segmentation threshold: determine a segmentation threshold according to the distribution of the crowd density map. By analyzing the crowd density map obtained in step 2, it is clear that the crowd region roughly coincides with the region where pixel values exceed a certain threshold, so a threshold can be selected and threshold segmentation applied to the density map to obtain the crowd foreground segmentation map.
Obtain the crowd foreground segmentation map: threshold the crowd density map obtained in step 2 with the selected threshold. Positions whose pixel value exceeds the threshold belong to the foreground region and the corresponding position in the foreground segmentation map is set to 1; positions below the threshold belong to the background region and are set to 0. The resulting foreground segmentation map serves as the learning target of the foreground segmentation network branch during training.
Further, the step 4 comprises the following steps:
backbone network: the first 10 layers of the VGG16 network are adopted, the maximum pooling layer comprises 3 layers and 7 convolution layers, and the size of each convolution kernel is 3. The parameters of the backbone network are represented by thetabkdTo indicate.
Branching a regression network: the design of the stacked cavity convolution layers is adopted, the number of the stacked layers is 6, the expansion rate of each cavity convolution layer is 2, and the size of a convolution kernel is 3. The parameter of regression network branch is represented by thetaregTo indicate.
Foreground segmentation network branching: by adopting a stacked cavity convolution design, the number of stacked layers is 3, the expansion rate of the cavity convolution layer is 2, and the size of a convolution kernel is 3. Splitting network branching parameters using ΘsegTo indicate.
The integral structure is as follows: firstly inputting an original image containing the crowd into a backbone network to obtain image characteristics, then simultaneously inputting the image characteristics into a regression network and a foreground segmentation network branch to respectively obtain a crowd density graph and a crowd foreground segmentation graph, and finally fusing the results of the two network branches to obtain a final model output result, namely a crowd density estimation result. The fusion is a masking operation, as shown in the following equation:
Dfinal(x)=Dmask(x,Θsegbkd)*Dreg(x,Θregbkd)
where x denotes input image data, Dmask(x,Θsegbkd) Representing the crowd prospect segmented network branch output result, Dreg(x,Θregbkd) Representing the regression network branch output result, Dfinal(x) And representing the final output result of the model.
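A minimal PyTorch sketch of this two-branch structure, under stated assumptions: the layer counts (7 convs + 3 max-pools in the backbone, 6 and 3 dilated convs in the branches), kernel size 3, and dilation rate 2 come from the text, while the branch channel widths, the sigmoid on the mask, and the exact layer arrangement are illustrative choices not specified in the patent:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, n, dilation=1):
    """n conv3x3 + ReLU layers; padding=dilation keeps spatial size."""
    layers = []
    for _ in range(n):
        layers += [nn.Conv2d(cin, cout, 3, padding=dilation, dilation=dilation),
                   nn.ReLU(inplace=True)]
        cin = cout
    return layers

class CrowdNet(nn.Module):
    def __init__(self):
        super().__init__()
        # VGG16-style backbone: first 10 layers (7 convs, 3 max-pools).
        self.backbone = nn.Sequential(
            *conv_block(3, 64, 2), nn.MaxPool2d(2),
            *conv_block(64, 128, 2), nn.MaxPool2d(2),
            *conv_block(128, 256, 3), nn.MaxPool2d(2),
        )
        # Regression branch: 6 stacked dilated convs (dilation 2), then 1x1 conv.
        self.reg_branch = nn.Sequential(*conv_block(256, 256, 6, dilation=2),
                                        nn.Conv2d(256, 1, 1))
        # Segmentation branch: 3 stacked dilated convs (dilation 2), then 1x1 conv.
        self.seg_branch = nn.Sequential(*conv_block(256, 256, 3, dilation=2),
                                        nn.Conv2d(256, 1, 1))

    def forward(self, x):
        feat = self.backbone(x)
        d_reg = self.reg_branch(feat)                  # density regression D_reg
        d_mask = torch.sigmoid(self.seg_branch(feat))  # foreground probability D_mask
        return d_mask * d_reg, d_mask                  # fused D_final, plus the mask

net = CrowdNet()
d_final, d_mask = net(torch.randn(1, 3, 64, 64))
```

Because both branches read the same backbone features, the segmentation head adds little compute, matching the patent's claim of a small training/prediction overhead.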
Further, the step 5 comprises the following steps:
Acquire a data set using the methods of steps 1-3, then train the neural network model designed in step 4 on it, using the SGD algorithm as the training method.
Referring to fig. 2, making the data set can be described as follows:
First, obtain the corresponding point marker map of every picture in the crowd image set according to step 1. Then, for each crowd image x_i and its corresponding point marker map:
(1) determine the Gaussian kernel size σ from the overall head-size distribution of the data set;
(2) compute a normalized Gaussian kernel from σ;
(3) convolve the point marker map with the Gaussian kernel to obtain the crowd density map D_gt(x_i);
(4) threshold the crowd density map obtained in (3) to obtain the crowd foreground segmentation map S_gt(x_i).
The model training process is as follows:
Design the loss function. The loss function contains two parts: the mean squared error (MSE) loss between the final model output and the crowd density map generated from the point marker map, shown in formula (1), and the binary cross-entropy (Bce) loss between the output of the foreground segmentation branch and the crowd foreground segmentation map generated from the point marker map, shown in formula (2):
L_reg = (1/2N) Σ_{i=1..N} ||D_final(x_i) − D_gt(x_i)||²   (1)
L_seg = (1/N) Σ_{i=1..N} Bce(D_mask(x_i, Θ_seg, Θ_bkd), S_gt(x_i))   (2)
where L_reg and L_seg are the regression loss of the whole network and the segmentation loss of the foreground segmentation branch respectively, N is the training batch size, x_i is the input image data, D_final(x_i) is the final model output, D_mask(x_i, Θ_seg, Θ_bkd) is the foreground segmentation map output by the segmentation branch, and S_gt(x_i) and D_gt(x_i) are the foreground segmentation map and crowd density map generated from the point marker map of image x_i. The Bce function is given in formula (3):
Bce(ŷ, y) = −[ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ]   (3)
where ŷ_i is the model's estimated output and y_i is the target.
Train the model with the SGD algorithm: train the whole network model end to end with the standard SGD optimizer, jointly optimizing the crowd density regression loss of the whole network, formula (1), and the foreground segmentation loss of the segmentation branch, formula (2). The training batch size is set to 1, the learning rate is fixed at 1e-7, the momentum parameter is set to 0.005, and training stops when both losses converge; the criterion for convergence is the loss function falling below a given value. During training, the model is validated on the validation set after every epoch (one pass over all samples in the training set), and the model with the highest validation accuracy is saved as the final model.
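The two-part loss can be sketched in numpy on toy tensors (an illustrative example; the normalization constants and the epsilon clipping are assumptions, the patent only names MSE and binary cross-entropy):

```python
import numpy as np

def mse_loss(d_final, d_gt):
    # Formula (1): mean squared error between the fused density output
    # and the ground-truth density map.
    return np.mean((d_final - d_gt) ** 2)

def bce_loss(p, y, eps=1e-7):
    # Formula (3): binary cross-entropy between predicted foreground
    # probabilities p and the 0/1 foreground segmentation target y.
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy 2x2 maps standing in for one training sample (batch size N = 1, as in the patent).
d_final  = np.array([[0.1, 0.0], [0.3, 0.0]])   # fused density output
d_gt     = np.array([[0.1, 0.0], [0.2, 0.0]])   # ground-truth density map
seg_prob = np.array([[0.9, 0.1], [0.8, 0.2]])   # segmentation branch output
seg_gt   = np.array([[1.0, 0.0], [1.0, 0.0]])   # ground-truth foreground map

# Both losses are optimized jointly during end-to-end SGD training.
total = mse_loss(d_final, d_gt) + bce_loss(seg_prob, seg_gt)
```

Summing the two terms mirrors the joint optimization of formulas (1) and (2); in a real training loop each would be backpropagated through its respective branch and the shared backbone.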
Further, the step 6 includes the following steps:
Test the input image with the model trained in step 5 to obtain the crowd density estimation result.
Advantageous effects: compared with the prior art, the technical scheme of the invention has the following benefits:
The invention provides a density estimation method based on a crowd foreground segmentation map: the model predicts the foreground segmentation map and the crowd density regression map simultaneously, and by combining the advantages of segmentation learning and density regression it effectively suppresses the background false detection problem of density-map-regression methods without loss of accuracy.
In addition, the foreground segmentation labels used in the invention need no manual annotation; they are obtained directly from the crowd density map, which is simple and efficient. Moreover, the newly added foreground segmentation branch shares the backbone's feature extraction with the regression branch, reducing the amount of computation; training and prediction time increases little compared with the regression framework alone.
Drawings
FIG. 1 is a detailed flow diagram of the method of the present invention;
FIG. 2 is a schematic diagram of a crowd density estimation framework based on foreground segmentation;
FIG. 3 is a schematic diagram of a process for generating a crowd density map and a crowd foreground segmentation map from a point label map;
FIG. 4 is a graph showing the comparison of the present algorithm with the results of the regression network alone.
Detailed Description
The details of the present invention will be further described with reference to examples.

Claims (6)

1. A crowd density estimation method based on a foreground segmentation map, characterized by comprising the following steps:
Step 1: mark the head of each person in every image containing a crowd to obtain a point marker map;
Step 2: obtain a crowd density map from the point marker map using a Gaussian smoothing method;
Step 3: obtain a crowd foreground segmentation map from the crowd density map using a threshold segmentation method;
Step 4: design a neural network for image feature extraction, a neural network branch for crowd density regression, and a neural network branch for crowd foreground segmentation; input an image into the backbone network to obtain image features; then feed the image features into the regression network branch and the foreground segmentation network branch simultaneously to obtain a crowd density map and a crowd foreground segmentation map respectively; finally fuse the outputs of the two branches to obtain the final output;
Step 5: make a data set using the methods of steps 1-3 and train the network model of step 4;
Step 6: test an input image with the trained model to obtain the crowd density estimation result.
2. The crowd density estimation method based on a foreground segmentation map according to claim 1, characterized in that the point marker map is obtained by marking the head of each person in every image containing a crowd as follows: mark a point on each person's head to represent that person, obtaining a point marker map, which is a gray-scale map with 1 channel whose size matches the crowd picture, with value 1 at head positions and 0 elsewhere.
3. The crowd density estimation method based on a foreground segmentation map according to claim 1 or 2, characterized in that the crowd density map is obtained from the point marker map using a Gaussian smoothing method as follows:
Determine the Gaussian kernel size: determine the Gaussian filter kernel size from the distribution of head sizes in the image;
Perform Gaussian kernel convolution: convolve the point marker map obtained in step 1 with the Gaussian kernel to obtain the crowd density map, which serves as the learning target of the whole network's final output during training.
4. The crowd density estimation method based on a foreground segmentation map according to claim 3, characterized in that the crowd foreground segmentation map is obtained from the crowd density map using a threshold segmentation method as follows:
Determine the density map segmentation threshold: determine a segmentation threshold according to the distribution of the crowd density map;
Obtain the crowd foreground segmentation map: threshold the crowd density map obtained in step 2 with the selected threshold; positions whose pixel value exceeds the threshold belong to the foreground region and the corresponding position in the foreground segmentation map is set to 1 to represent the foreground, while positions below the threshold belong to the background region and are set to 0 to represent the background. The resulting foreground segmentation map serves as the learning target of the foreground segmentation network branch during training.
5. The crowd density estimation method based on the foreground segmentation map as claimed in claim 4, wherein in step (4) the neural network for image feature extraction, the neural network branch for crowd density regression, and the neural network branch for crowd foreground segmentation are designed as follows:
neural network for image feature extraction: the front 10 layers of the VGG16 network are adopted, comprising 3 max-pooling layers and 7 convolution layers, each convolution layer with kernel size 3; the network parameters are denoted by Θbkd;
neural network branch for crowd density regression: a stack of dilated convolution layers is adopted; the number of stacked layers is 6, each dilated convolution layer has dilation rate 2 and kernel size 3; the branch parameters are denoted by Θreg;
neural network branch for crowd foreground segmentation: a stack of dilated convolution layers is adopted; the number of stacked layers is 3, each dilated convolution layer has dilation rate 2 and kernel size 3; the branch parameters are denoted by Θseg;
the whole network structure: an original image containing a crowd is first input into the neural network for image feature extraction to obtain image features; the image features are then fed simultaneously into the crowd density regression branch and the crowd foreground segmentation branch to obtain the crowd density map and the crowd foreground segmentation map respectively; finally the outputs of the two branches are fused by a mask operation to obtain the final model output, i.e., the crowd density estimation result, as shown in the following formula:
Dfinal(x) = Dmask(x, Θseg, Θbkd) * Dreg(x, Θreg, Θbkd)
where x denotes the input image data, Dmask(x, Θseg, Θbkd) denotes the output of the crowd foreground segmentation network branch, Dreg(x, Θreg, Θbkd) denotes the output of the density regression network branch, and Dfinal(x) denotes the final model output.
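The claim-5 architecture can be sketched in PyTorch as follows. The VGG16 front end (7 conv + 3 pool layers), the 6- and 3-layer dilated stacks with dilation rate 2 and kernel size 3, and the multiplicative mask fusion follow the claim; the branch channel widths, the ReLU activations, the sigmoid on the segmentation branch, and the 1x1 output heads are our assumptions:

```python
import torch
import torch.nn as nn

class CrowdNet(nn.Module):
    """Sketch of the claim-5 network (an assumption-laden reimplementation,
    not the patented model itself)."""
    def __init__(self):
        super().__init__()
        # Front end: first 10 layers of VGG16 (7 conv + 3 max-pool),
        # all 3x3 kernels; channel widths follow the standard VGG16 layout.
        cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M']
        layers, in_ch = [], 3
        for v in cfg:
            if v == 'M':
                layers.append(nn.MaxPool2d(2))
            else:
                layers += [nn.Conv2d(in_ch, v, 3, padding=1), nn.ReLU(inplace=True)]
                in_ch = v
        self.backbone = nn.Sequential(*layers)          # parameters Θbkd

        def dilated_stack(n, in_ch, mid_ch):
            # n stacked dilated convolution layers: dilation 2, kernel 3;
            # padding 2 keeps the spatial size unchanged.
            blocks, c = [], in_ch
            for _ in range(n):
                blocks += [nn.Conv2d(c, mid_ch, 3, padding=2, dilation=2),
                           nn.ReLU(inplace=True)]
                c = mid_ch
            blocks.append(nn.Conv2d(c, 1, 1))           # 1-channel output head
            return nn.Sequential(*blocks)

        self.reg_branch = dilated_stack(6, 256, 128)    # Θreg: density regression
        self.seg_branch = dilated_stack(3, 256, 128)    # Θseg: foreground segmentation

    def forward(self, x):
        feat = self.backbone(x)
        d_reg = self.reg_branch(feat)
        d_mask = torch.sigmoid(self.seg_branch(feat))   # foreground probability
        return d_mask * d_reg                           # mask fusion: Dfinal = Dmask * Dreg
```

Because the three pooling layers each halve the resolution, the fused output is 1/8 of the input size in each spatial dimension.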
6. The crowd density estimation method based on the foreground segmentation map as claimed in claim 5, wherein a data set is produced by the methods of steps 1-3, and the network model of step 4 is trained as follows:
(4.1) designing a loss function combining a regression loss and a segmentation loss:
wherein Lreg and Lseg respectively denote the regression loss of the whole network and the segmentation loss of the foreground segmentation network branch, N is the training batch size, xi denotes the input image data, Dfinal(xi) denotes the final model output, Dmask(xi, Θseg, Θbkd) denotes the foreground segmentation map output by the foreground segmentation network branch, and Sgt(xi) and Dgt(xi) respectively denote the foreground segmentation map and the crowd density map generated from the point annotation map corresponding to image xi; the Bce function (binary cross-entropy) is expressed as shown in formula 3:
Bce(ŷ, y) = -(1/N) Σi [ yi·log ŷi + (1 - yi)·log(1 - ŷi) ]
wherein ŷi is the estimate output by the model and yi is the target;
(4.2) performing end-to-end training of the whole network model with the standard SGD optimization algorithm; a loss threshold is set, and training stops when the loss function falls below this threshold, yielding the final network model.
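A sketch of the claim-6 training objective; the mean-squared-error form of Lreg and the weight combining the two losses are our assumptions, while the Bce term follows formula 3:

```python
import numpy as np

def bce(y_hat, y, eps=1e-7):
    # Binary cross-entropy of formula 3: y_hat is the model estimate,
    # y the target; eps guards the logarithms against log(0).
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def total_loss(d_final, d_gt, mask_pred, seg_gt, lam=1.0):
    # Lreg: mean squared error between the fused density output and the
    # ground-truth density map (assumed form). Lseg: Bce between the
    # segmentation-branch output and the ground-truth foreground map.
    # The combination weight lam is an assumption; the claim only states
    # that both losses are used.
    l_reg = np.mean((d_final - d_gt) ** 2)
    l_seg = bce(mask_pred, seg_gt)
    return l_reg + lam * l_seg
```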
CN201910446452.5A 2019-05-27 2019-05-27 Crowd density estimation method based on foreground segmentation graph Active CN110276264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910446452.5A CN110276264B (en) 2019-05-27 2019-05-27 Crowd density estimation method based on foreground segmentation graph

Publications (2)

Publication Number Publication Date
CN110276264A true CN110276264A (en) 2019-09-24
CN110276264B CN110276264B (en) 2023-04-07

Family

ID=67960149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910446452.5A Active CN110276264B (en) 2019-05-27 2019-05-27 Crowd density estimation method based on foreground segmentation graph

Country Status (1)

Country Link
CN (1) CN110276264B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447458A (en) * 2015-11-17 2016-03-30 深圳市商汤科技有限公司 Large scale crowd video analysis system and method thereof
CN108876774A (en) * 2018-06-07 2018-11-23 浙江大学 A kind of people counting method based on convolutional neural networks
US20190147584A1 (en) * 2017-11-15 2019-05-16 NEC Laboratories Europe GmbH System and method for single image object density estimation

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378608A (en) * 2020-03-10 2021-09-10 顺丰科技有限公司 Crowd counting method, device, equipment and storage medium
CN113378608B (en) * 2020-03-10 2024-04-19 顺丰科技有限公司 Crowd counting method, device, equipment and storage medium
CN111507183A (en) * 2020-03-11 2020-08-07 杭州电子科技大学 Crowd counting method based on multi-scale density map fusion cavity convolution
CN111582778A (en) * 2020-04-17 2020-08-25 上海中通吉网络技术有限公司 Operation site cargo accumulation measuring method, device, equipment and storage medium
CN111582778B (en) * 2020-04-17 2024-04-12 上海中通吉网络技术有限公司 Method, device, equipment and storage medium for measuring accumulation of cargos in operation site
CN111723664A (en) * 2020-05-19 2020-09-29 烟台市广智微芯智能科技有限责任公司 Pedestrian counting method and system for open type area
CN111652168A (en) * 2020-06-09 2020-09-11 腾讯科技(深圳)有限公司 Group detection method, device and equipment based on artificial intelligence and storage medium
CN111652168B (en) * 2020-06-09 2023-09-08 腾讯科技(深圳)有限公司 Group detection method, device, equipment and storage medium based on artificial intelligence
CN112001274A (en) * 2020-08-06 2020-11-27 腾讯科技(深圳)有限公司 Crowd density determination method, device, storage medium and processor
CN112001274B (en) * 2020-08-06 2023-11-17 腾讯科技(深圳)有限公司 Crowd density determining method, device, storage medium and processor
CN112115862B (en) * 2020-09-18 2023-08-29 广东机场白云信息科技有限公司 Congestion scene pedestrian detection method combined with density estimation
CN112115862A (en) * 2020-09-18 2020-12-22 广东机场白云信息科技有限公司 Crowded scene pedestrian detection method combined with density estimation
CN113515990A (en) * 2020-09-28 2021-10-19 阿里巴巴集团控股有限公司 Image processing and crowd density estimation method, device and storage medium
CN112802020A (en) * 2021-04-06 2021-05-14 中国空气动力研究与发展中心计算空气动力研究所 Infrared dim target detection method based on image inpainting and background estimation
CN112801063A (en) * 2021-04-12 2021-05-14 广东众聚人工智能科技有限公司 Neural network system and image crowd counting method based on neural network system
CN113239743A (en) * 2021-04-23 2021-08-10 普联国际有限公司 Crowd density detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110276264B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN114782691B (en) Robot target identification and motion detection method based on deep learning, storage medium and equipment
CN108319972B (en) End-to-end difference network learning method for image semantic segmentation
CN110163110B (en) Pedestrian re-recognition method based on transfer learning and depth feature fusion
CN108550161B (en) Scale self-adaptive kernel-dependent filtering rapid target tracking method
CN103971386B (en) A kind of foreground detection method under dynamic background scene
CN105139395B (en) SAR image segmentation method based on small echo pond convolutional neural networks
CN108198201A (en) A kind of multi-object tracking method, terminal device and storage medium
CN110119728A (en) Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network
CN102999901B (en) Based on the processing method after the Online Video segmentation of depth transducer and system
WO2019071976A1 (en) Panoramic image saliency detection method based on regional growth and eye movement model
CN106709901B (en) Simulation mist drawing generating method based on depth priori
CN104978567B (en) Vehicle checking method based on scene classification
CN108960404B (en) Image-based crowd counting method and device
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN106815563B (en) Human body apparent structure-based crowd quantity prediction method
CN107657625A (en) Merge the unsupervised methods of video segmentation that space-time multiple features represent
CN106981068A (en) A kind of interactive image segmentation method of joint pixel pait and super-pixel
CN109919053A (en) A kind of deep learning vehicle parking detection method based on monitor video
CN109948593A (en) Based on the MCNN people counting method for combining global density feature
CN104657980A (en) Improved multi-channel image partitioning algorithm based on Meanshift
CN116258608B (en) Water conservancy real-time monitoring information management system integrating GIS and BIM three-dimensional technology
CN106991686A (en) A kind of level set contour tracing method based on super-pixel optical flow field
CN114677323A (en) Semantic vision SLAM positioning method based on target detection in indoor dynamic scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant