CN110276264A - A kind of crowd density estimation method based on foreground segmentation figure - Google Patents


Info

Publication number
CN110276264A
CN110276264A
Authority
CN
China
Prior art keywords
crowd
segmentation
network
foreground segmentation
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910446452.5A
Other languages
Chinese (zh)
Other versions
CN110276264B (en)
Inventor
徐浩
夏思宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201910446452.5A priority Critical patent/CN110276264B/en
Publication of CN110276264A publication Critical patent/CN110276264A/en
Application granted granted Critical
Publication of CN110276264B publication Critical patent/CN110276264B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS; G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — Electric digital data processing
        • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
        • G06F18/25 — Pattern recognition; fusion techniques
    • G06T — Image data processing or generation, in general
        • G06T7/136 — Image analysis; segmentation or edge detection involving thresholding
        • G06T7/194 — Image analysis; segmentation or edge detection involving foreground-background segmentation
        • G06T2207/20081 — Special algorithmic details; training; learning
        • G06T2207/20084 — Special algorithmic details; artificial neural networks [ANN]
    • G06V — Image or video recognition or understanding
        • G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
        • G06V20/53 — Scenes; surveillance or monitoring of activities; recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a crowd density estimation method based on a foreground segmentation map. The method comprises the following steps: mark the head of each person in every image to obtain a point marker map; obtain a crowd density map from the point marker map using a Gaussian smoothing method; obtain a crowd foreground segmentation map from the density map using a threshold segmentation method; design a neural network for image feature extraction, a neural network branch for crowd density regression, and a neural network branch for crowd foreground segmentation, and fuse the outputs of the two branches to obtain the final output; make a training set and train the model; test an input image with the trained model to obtain the crowd density estimation result. The method lets the neural network learn foreground segmentation information without any additional annotation, and effectively avoids false detections on complex backgrounds.

Description

Crowd density estimation method based on foreground segmentation map
Technical Field
The invention belongs to the field of computer vision and image processing, and particularly relates to a crowd density estimation method based on a foreground segmentation map.
Background
Given a picture, crowd density estimation uses computer vision techniques to detect the total number of people and their spatial distribution in the image. Existing crowd density estimation methods fall mainly into two categories: detection-based methods (pedestrian detection or head-and-shoulder detection) and density map regression. Detection-based methods convert crowd density estimation into a pedestrian detection problem and count the detected targets to obtain the final number of people. They work well on sparse crowds, but in high-density regions accuracy drops sharply due to factors such as occlusion and scale variation. Density-map-regression methods treat an image block as a whole and regress the number of people in the region directly from local image features; they handle occlusion and small targets effectively and achieve high regression accuracy on high-density crowd images, but they perform poorly on sparse crowds, are prone to background false detection, and cannot give the position of a specific target.
The invention provides a novel crowd density estimation method based on a foreground segmentation map, which effectively suppresses background false detection, improves the stability of the algorithm, and improves its accuracy.
Disclosure of Invention
Purpose of the invention: aiming at the above problems, the invention provides a crowd density estimation method based on a foreground segmentation map, which predicts the foreground segmentation map and the crowd density map simultaneously and finally fuses the two to effectively remove background false detections.
Technical scheme: to achieve this purpose, the invention provides a crowd density estimation method based on a foreground segmentation map, comprising the following steps:
Step 1: mark the head of each person in every image containing a crowd to obtain a point marker map;
Step 2: obtain a crowd density map from the point marker map using a Gaussian smoothing method;
Step 3: obtain a crowd foreground segmentation map from the crowd density map using a threshold segmentation method;
Step 4: design a neural network for image feature extraction (backbone network), a neural network branch for crowd density regression (regression network branch), and a neural network branch for crowd foreground segmentation (foreground segmentation network branch); input an image into the backbone network to obtain image features; then feed the image features into the regression network branch and the foreground segmentation network branch simultaneously to obtain a crowd density map and a crowd foreground segmentation map respectively; finally fuse the outputs of the two branches to obtain the final output;
Step 5: make a data set using the methods of steps 1-3 and train the network model;
Step 6: test an input image with the trained model to obtain the crowd density estimation result.
Further, the step 1 comprises the following steps:
Mark each crowd picture in the training set with point annotations, i.e. mark a point on each person's head to represent that person. This yields a point marker map: a gray-scale map with 1 channel whose size matches the crowd picture, with value 1 at head positions and 0 elsewhere.
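As an illustration of step 1, building the point marker map can be sketched in numpy (the coordinate format and helper name are assumptions for illustration; the patent only specifies one marked point per head, value 1 at head positions and 0 elsewhere):

```python
import numpy as np

def make_point_map(head_coords, height, width):
    """Build a single-channel point marker map: 1 at each head position, 0 elsewhere."""
    point_map = np.zeros((height, width), dtype=np.float32)
    for r, c in head_coords:
        if 0 <= r < height and 0 <= c < width:
            point_map[r, c] = 1.0
    return point_map

# Toy example: a 6x8 image with three annotated heads.
pm = make_point_map([(1, 2), (3, 5), (4, 4)], 6, 8)
```

The sum of the map equals the number of annotated heads, which is what makes the subsequent density map count-preserving.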
Further, the step 2 comprises the following steps:
Determine the Gaussian kernel size: observe the distribution of head sizes in the data set and choose the Gaussian filter kernel size accordingly; ideally the kernel size matches the head size.
Perform Gaussian kernel convolution: convolve the point marker map obtained in step 1 with the Gaussian kernel to obtain the crowd density map, which serves as the learning target of the whole network's final output during training. Referring to fig. 2, the gray value of the crowd density map represents the crowd density: the higher the density, the brighter the color.
Further, the step 3 comprises the following steps:
Determine the density map segmentation threshold: determine a segmentation threshold according to the distribution of the crowd density map. By analyzing the crowd density map obtained in step 2, it is clear that the crowd region roughly coincides with the region where pixel values exceed a certain threshold, so a threshold can be selected and threshold segmentation applied to the density map to obtain the crowd foreground segmentation map.
Obtain the crowd foreground segmentation map: threshold the crowd density map obtained in step 2 with the selected threshold. Positions whose pixel value exceeds the threshold belong to the foreground region and the corresponding position in the foreground segmentation map is set to 1; positions below the threshold belong to the background region and are set to 0. The resulting foreground segmentation map serves as the learning target of the foreground segmentation network branch during training.
Further, the step 4 comprises the following steps:
backbone network: the first 10 layers of the VGG16 network are adopted, the maximum pooling layer comprises 3 layers and 7 convolution layers, and the size of each convolution kernel is 3. The parameters of the backbone network are represented by thetabkdTo indicate.
Branching a regression network: the design of the stacked cavity convolution layers is adopted, the number of the stacked layers is 6, the expansion rate of each cavity convolution layer is 2, and the size of a convolution kernel is 3. The parameter of regression network branch is represented by thetaregTo indicate.
Foreground segmentation network branching: by adopting a stacked cavity convolution design, the number of stacked layers is 3, the expansion rate of the cavity convolution layer is 2, and the size of a convolution kernel is 3. Splitting network branching parameters using ΘsegTo indicate.
The integral structure is as follows: firstly inputting an original image containing the crowd into a backbone network to obtain image characteristics, then simultaneously inputting the image characteristics into a regression network and a foreground segmentation network branch to respectively obtain a crowd density graph and a crowd foreground segmentation graph, and finally fusing the results of the two network branches to obtain a final model output result, namely a crowd density estimation result. The fusion is a masking operation, as shown in the following equation:
Dfinal(x)=Dmask(x,Θsegbkd)*Dreg(x,Θregbkd)
where x denotes input image data, Dmask(x,Θsegbkd) Representing the crowd prospect segmented network branch output result, Dreg(x,Θregbkd) Representing the regression network branch output result, Dfinal(x) And representing the final output result of the model.
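A minimal PyTorch sketch of this two-branch structure, under stated assumptions: the layer counts (7 convs + 3 max-pools in the backbone, 6 and 3 dilated convs in the branches), kernel size 3, and dilation rate 2 come from the text, while the branch channel widths, the sigmoid on the mask, and the exact layer arrangement are illustrative choices not specified in the patent:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, n, dilation=1):
    """n conv3x3 + ReLU layers; padding=dilation keeps spatial size."""
    layers = []
    for _ in range(n):
        layers += [nn.Conv2d(cin, cout, 3, padding=dilation, dilation=dilation),
                   nn.ReLU(inplace=True)]
        cin = cout
    return layers

class CrowdNet(nn.Module):
    def __init__(self):
        super().__init__()
        # VGG16-style backbone: first 10 layers (7 convs, 3 max-pools).
        self.backbone = nn.Sequential(
            *conv_block(3, 64, 2), nn.MaxPool2d(2),
            *conv_block(64, 128, 2), nn.MaxPool2d(2),
            *conv_block(128, 256, 3), nn.MaxPool2d(2),
        )
        # Regression branch: 6 stacked dilated convs (dilation 2), then 1x1 conv.
        self.reg_branch = nn.Sequential(*conv_block(256, 256, 6, dilation=2),
                                        nn.Conv2d(256, 1, 1))
        # Segmentation branch: 3 stacked dilated convs (dilation 2), then 1x1 conv.
        self.seg_branch = nn.Sequential(*conv_block(256, 256, 3, dilation=2),
                                        nn.Conv2d(256, 1, 1))

    def forward(self, x):
        feat = self.backbone(x)
        d_reg = self.reg_branch(feat)                  # density regression D_reg
        d_mask = torch.sigmoid(self.seg_branch(feat))  # foreground probability D_mask
        return d_mask * d_reg, d_mask                  # fused D_final, plus the mask

net = CrowdNet()
d_final, d_mask = net(torch.randn(1, 3, 64, 64))
```

Because both branches read the same backbone features, the segmentation head adds little compute, matching the patent's claim of a small training/prediction overhead.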
Further, the step 5 comprises the following steps:
Acquire a data set using the methods of steps 1-3, then train the neural network model designed in step 4 on it, using the SGD algorithm as the training method.
Referring to fig. 2, making the data set can be described as follows:
First, obtain the corresponding point marker map of every picture in the crowd image set according to step 1. Then, for each crowd image x_i and its corresponding point marker map:
(1) determine the Gaussian kernel size σ from the overall head-size distribution of the data set;
(2) compute a normalized Gaussian kernel from σ;
(3) convolve the point marker map with the Gaussian kernel to obtain the crowd density map D_gt(x_i);
(4) threshold the crowd density map obtained in (3) to obtain the crowd foreground segmentation map S_gt(x_i).
The model training process is as follows:
Design the loss function. The loss function contains two parts: the mean squared error (MSE) loss between the final model output and the crowd density map generated from the point marker map, shown in formula (1), and the binary cross-entropy (Bce) loss between the output of the foreground segmentation branch and the crowd foreground segmentation map generated from the point marker map, shown in formula (2):
L_reg = (1/2N) Σ_{i=1..N} ||D_final(x_i) − D_gt(x_i)||²   (1)
L_seg = (1/N) Σ_{i=1..N} Bce(D_mask(x_i, Θ_seg, Θ_bkd), S_gt(x_i))   (2)
where L_reg and L_seg are the regression loss of the whole network and the segmentation loss of the foreground segmentation branch respectively, N is the training batch size, x_i is the input image data, D_final(x_i) is the final model output, D_mask(x_i, Θ_seg, Θ_bkd) is the foreground segmentation map output by the segmentation branch, and S_gt(x_i) and D_gt(x_i) are the foreground segmentation map and crowd density map generated from the point marker map of image x_i. The Bce function is given in formula (3):
Bce(ŷ, y) = −[ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ]   (3)
where ŷ_i is the model's estimated output and y_i is the target.
Train the model with the SGD algorithm: train the whole network model end to end with the standard SGD optimizer, jointly optimizing the crowd density regression loss of the whole network, formula (1), and the foreground segmentation loss of the segmentation branch, formula (2). The training batch size is set to 1, the learning rate is fixed at 1e-7, the momentum parameter is set to 0.005, and training stops when both losses converge; the criterion for convergence is the loss function falling below a given value. During training, the model is validated on the validation set after every epoch (one pass over all samples in the training set), and the model with the highest validation accuracy is saved as the final model.
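The two-part loss can be sketched in numpy on toy tensors (an illustrative example; the normalization constants and the epsilon clipping are assumptions, the patent only names MSE and binary cross-entropy):

```python
import numpy as np

def mse_loss(d_final, d_gt):
    # Formula (1): mean squared error between the fused density output
    # and the ground-truth density map.
    return np.mean((d_final - d_gt) ** 2)

def bce_loss(p, y, eps=1e-7):
    # Formula (3): binary cross-entropy between predicted foreground
    # probabilities p and the 0/1 foreground segmentation target y.
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy 2x2 maps standing in for one training sample (batch size N = 1, as in the patent).
d_final  = np.array([[0.1, 0.0], [0.3, 0.0]])   # fused density output
d_gt     = np.array([[0.1, 0.0], [0.2, 0.0]])   # ground-truth density map
seg_prob = np.array([[0.9, 0.1], [0.8, 0.2]])   # segmentation branch output
seg_gt   = np.array([[1.0, 0.0], [1.0, 0.0]])   # ground-truth foreground map

# Both losses are optimized jointly during end-to-end SGD training.
total = mse_loss(d_final, d_gt) + bce_loss(seg_prob, seg_gt)
```

Summing the two terms mirrors the joint optimization of formulas (1) and (2); in a real training loop each would be backpropagated through its respective branch and the shared backbone.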
Further, the step 6 includes the following steps:
Test the input image with the model trained in step 5 to obtain the crowd density estimation result.
Advantageous effects: compared with the prior art, the technical scheme of the invention has the following benefits:
The invention provides a density estimation method based on a crowd foreground segmentation map: the model predicts the foreground segmentation map and the crowd density regression map simultaneously, and by combining the advantages of segmentation learning and density regression it effectively suppresses the background false detection problem of density-map-regression methods without loss of accuracy.
In addition, the foreground segmentation labels used in the invention need no manual annotation; they are obtained directly from the crowd density map, which is simple and efficient. Moreover, the newly added foreground segmentation branch shares the backbone's feature extraction with the regression branch, reducing the amount of computation; training and prediction time increases little compared with the regression framework alone.
Drawings
FIG. 1 is a detailed flow diagram of the method of the present invention;
FIG. 2 is a schematic diagram of a crowd density estimation framework based on foreground segmentation;
FIG. 3 is a schematic diagram of a process for generating a crowd density map and a crowd foreground segmentation map from a point label map;
FIG. 4 is a graph showing the comparison of the present algorithm with the results of the regression network alone.
Detailed Description
The details of the present invention will be further described with reference to examples.

Claims (6)

1. A crowd density estimation method based on a foreground segmentation map, characterized by comprising the following steps:
Step 1: mark the head of each person in every image containing a crowd to obtain a point marker map;
Step 2: obtain a crowd density map from the point marker map using a Gaussian smoothing method;
Step 3: obtain a crowd foreground segmentation map from the crowd density map using a threshold segmentation method;
Step 4: design a neural network for image feature extraction, a neural network branch for crowd density regression, and a neural network branch for crowd foreground segmentation; input an image into the backbone network to obtain image features; then feed the image features into the regression network branch and the foreground segmentation network branch simultaneously to obtain a crowd density map and a crowd foreground segmentation map respectively; finally fuse the outputs of the two branches to obtain the final output;
Step 5: make a data set using the methods of steps 1-3 and train the network model of step 4;
Step 6: test an input image with the trained model to obtain the crowd density estimation result.
2. The crowd density estimation method based on a foreground segmentation map according to claim 1, characterized in that the point marker map is obtained by marking the head of each person in every image containing a crowd as follows: mark a point on each person's head to represent that person, obtaining a point marker map, which is a gray-scale map with 1 channel whose size matches the crowd picture, with value 1 at head positions and 0 elsewhere.
3. The crowd density estimation method based on a foreground segmentation map according to claim 1 or 2, characterized in that the crowd density map is obtained from the point marker map using a Gaussian smoothing method as follows:
Determine the Gaussian kernel size: determine the Gaussian filter kernel size from the distribution of head sizes in the image;
Perform Gaussian kernel convolution: convolve the point marker map obtained in step 1 with the Gaussian kernel to obtain the crowd density map, which serves as the learning target of the whole network's final output during training.
4. The crowd density estimation method based on a foreground segmentation map according to claim 3, characterized in that the crowd foreground segmentation map is obtained from the crowd density map using a threshold segmentation method as follows:
Determine the density map segmentation threshold: determine a segmentation threshold according to the distribution of the crowd density map;
Obtain the crowd foreground segmentation map: threshold the crowd density map obtained in step 2 with the selected threshold; positions whose pixel value exceeds the threshold belong to the foreground region and the corresponding position in the foreground segmentation map is set to 1 to represent the foreground, while positions below the threshold belong to the background region and are set to 0 to represent the background. The resulting foreground segmentation map serves as the learning target of the foreground segmentation network branch during training.
5. The crowd density estimation method based on the foreground segmentation map as claimed in claim 4, wherein in step (4) the neural network for image feature extraction, the neural network branch for crowd density regression, and the neural network branch for crowd foreground segmentation are designed as follows:
neural network for image feature extraction: the front 10 layers of the VGG16 network are adopted, comprising 3 max-pooling layers and 7 convolution layers, each convolution layer with kernel size 3; the network parameters are denoted by Θbkd;
neural network branch for crowd density regression: a stack of dilated convolution layers is adopted; the number of stacked layers is 6, each dilated convolution layer has dilation rate 2 and kernel size 3; the branch parameters are denoted by Θreg;
neural network branch for crowd foreground segmentation: a stack of dilated convolution layers is adopted; the number of stacked layers is 3, each dilated convolution layer has dilation rate 2 and kernel size 3; the branch parameters are denoted by Θseg;
the whole network structure: an original image containing a crowd is first input into the neural network for image feature extraction to obtain image features; the image features are then fed simultaneously into the crowd density regression branch and the crowd foreground segmentation branch to obtain the crowd density map and the crowd foreground segmentation map respectively; finally the outputs of the two branches are fused by a mask operation to obtain the final model output, i.e., the crowd density estimation result, as shown in the following formula:
Dfinal(x) = Dmask(x, Θseg, Θbkd) * Dreg(x, Θreg, Θbkd)
where x denotes the input image data, Dmask(x, Θseg, Θbkd) denotes the output of the crowd foreground segmentation network branch, Dreg(x, Θreg, Θbkd) denotes the output of the density regression network branch, and Dfinal(x) denotes the final model output.
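The claim-5 architecture can be sketched in PyTorch as follows. The VGG16 front end (7 conv + 3 pool layers), the 6- and 3-layer dilated stacks with dilation rate 2 and kernel size 3, and the multiplicative mask fusion follow the claim; the branch channel widths, the ReLU activations, the sigmoid on the segmentation branch, and the 1x1 output heads are our assumptions:

```python
import torch
import torch.nn as nn

class CrowdNet(nn.Module):
    """Sketch of the claim-5 network (an assumption-laden reimplementation,
    not the patented model itself)."""
    def __init__(self):
        super().__init__()
        # Front end: first 10 layers of VGG16 (7 conv + 3 max-pool),
        # all 3x3 kernels; channel widths follow the standard VGG16 layout.
        cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M']
        layers, in_ch = [], 3
        for v in cfg:
            if v == 'M':
                layers.append(nn.MaxPool2d(2))
            else:
                layers += [nn.Conv2d(in_ch, v, 3, padding=1), nn.ReLU(inplace=True)]
                in_ch = v
        self.backbone = nn.Sequential(*layers)          # parameters Θbkd

        def dilated_stack(n, in_ch, mid_ch):
            # n stacked dilated convolution layers: dilation 2, kernel 3;
            # padding 2 keeps the spatial size unchanged.
            blocks, c = [], in_ch
            for _ in range(n):
                blocks += [nn.Conv2d(c, mid_ch, 3, padding=2, dilation=2),
                           nn.ReLU(inplace=True)]
                c = mid_ch
            blocks.append(nn.Conv2d(c, 1, 1))           # 1-channel output head
            return nn.Sequential(*blocks)

        self.reg_branch = dilated_stack(6, 256, 128)    # Θreg: density regression
        self.seg_branch = dilated_stack(3, 256, 128)    # Θseg: foreground segmentation

    def forward(self, x):
        feat = self.backbone(x)
        d_reg = self.reg_branch(feat)
        d_mask = torch.sigmoid(self.seg_branch(feat))   # foreground probability
        return d_mask * d_reg                           # mask fusion: Dfinal = Dmask * Dreg
```

Because the three pooling layers each halve the resolution, the fused output is 1/8 of the input size in each spatial dimension.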
6. The crowd density estimation method based on the foreground segmentation map as claimed in claim 5, wherein a data set is produced by the methods of steps 1-3, and the network model of step 4 is trained as follows:
(4.1) designing a loss function combining a regression loss and a segmentation loss:
wherein Lreg and Lseg respectively denote the regression loss of the whole network and the segmentation loss of the foreground segmentation network branch, N is the training batch size, xi denotes the input image data, Dfinal(xi) denotes the final model output, Dmask(xi, Θseg, Θbkd) denotes the foreground segmentation map output by the foreground segmentation network branch, and Sgt(xi) and Dgt(xi) respectively denote the foreground segmentation map and the crowd density map generated from the point annotation map corresponding to image xi; the Bce function (binary cross-entropy) is expressed as shown in formula 3:
Bce(ŷ, y) = -(1/N) Σi [ yi·log ŷi + (1 - yi)·log(1 - ŷi) ]
wherein ŷi is the estimate output by the model and yi is the target;
(4.2) performing end-to-end training of the whole network model with the standard SGD optimization algorithm; a loss threshold is set, and training stops when the loss function falls below this threshold, yielding the final network model.
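A sketch of the claim-6 training objective; the mean-squared-error form of Lreg and the weight combining the two losses are our assumptions, while the Bce term follows formula 3:

```python
import numpy as np

def bce(y_hat, y, eps=1e-7):
    # Binary cross-entropy of formula 3: y_hat is the model estimate,
    # y the target; eps guards the logarithms against log(0).
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def total_loss(d_final, d_gt, mask_pred, seg_gt, lam=1.0):
    # Lreg: mean squared error between the fused density output and the
    # ground-truth density map (assumed form). Lseg: Bce between the
    # segmentation-branch output and the ground-truth foreground map.
    # The combination weight lam is an assumption; the claim only states
    # that both losses are used.
    l_reg = np.mean((d_final - d_gt) ** 2)
    l_seg = bce(mask_pred, seg_gt)
    return l_reg + lam * l_seg
```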
CN201910446452.5A 2019-05-27 2019-05-27 Crowd density estimation method based on foreground segmentation graph Active CN110276264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910446452.5A CN110276264B (en) 2019-05-27 2019-05-27 Crowd density estimation method based on foreground segmentation graph

Publications (2)

Publication Number Publication Date
CN110276264A true CN110276264A (en) 2019-09-24
CN110276264B CN110276264B (en) 2023-04-07

Family

ID=67960149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910446452.5A Active CN110276264B (en) 2019-05-27 2019-05-27 Crowd density estimation method based on foreground segmentation graph

Country Status (1)

Country Link
CN (1) CN110276264B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447458A (en) * 2015-11-17 2016-03-30 深圳市商汤科技有限公司 Large scale crowd video analysis system and method thereof
CN108876774A (en) * 2018-06-07 2018-11-23 浙江大学 A kind of people counting method based on convolutional neural networks
US20190147584A1 (en) * 2017-11-15 2019-05-16 NEC Laboratories Europe GmbH System and method for single image object density estimation

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378608A (en) * 2020-03-10 2021-09-10 顺丰科技有限公司 Crowd counting method, device, equipment and storage medium
CN113378608B (en) * 2020-03-10 2024-04-19 顺丰科技有限公司 Crowd counting method, device, equipment and storage medium
CN111507183A (en) * 2020-03-11 2020-08-07 杭州电子科技大学 Crowd counting method based on multi-scale density map fusion cavity convolution
CN111582778A (en) * 2020-04-17 2020-08-25 上海中通吉网络技术有限公司 Operation site cargo accumulation measuring method, device, equipment and storage medium
CN111582778B (en) * 2020-04-17 2024-04-12 上海中通吉网络技术有限公司 Method, device, equipment and storage medium for measuring accumulation of cargos in operation site
CN111723664A (en) * 2020-05-19 2020-09-29 烟台市广智微芯智能科技有限责任公司 Pedestrian counting method and system for open type area
CN111652168A (en) * 2020-06-09 2020-09-11 腾讯科技(深圳)有限公司 Group detection method, device and equipment based on artificial intelligence and storage medium
CN111652168B (en) * 2020-06-09 2023-09-08 腾讯科技(深圳)有限公司 Group detection method, device, equipment and storage medium based on artificial intelligence
CN112001274A (en) * 2020-08-06 2020-11-27 腾讯科技(深圳)有限公司 Crowd density determination method, device, storage medium and processor
CN112001274B (en) * 2020-08-06 2023-11-17 腾讯科技(深圳)有限公司 Crowd density determining method, device, storage medium and processor
CN112115862B (en) * 2020-09-18 2023-08-29 广东机场白云信息科技有限公司 Congestion scene pedestrian detection method combined with density estimation
CN112115862A (en) * 2020-09-18 2020-12-22 广东机场白云信息科技有限公司 Crowded scene pedestrian detection method combined with density estimation
CN113515990A (en) * 2020-09-28 2021-10-19 阿里巴巴集团控股有限公司 Image processing and crowd density estimation method, device and storage medium
CN112802020A (en) * 2021-04-06 2021-05-14 中国空气动力研究与发展中心计算空气动力研究所 Infrared dim target detection method based on image inpainting and background estimation
CN112801063A (en) * 2021-04-12 2021-05-14 广东众聚人工智能科技有限公司 Neural network system and image crowd counting method based on neural network system
CN113239743A (en) * 2021-04-23 2021-08-10 普联国际有限公司 Crowd density detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110276264B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN114782691B (en) Robot target identification and motion detection method based on deep learning, storage medium and equipment
CN108319972B (en) End-to-end difference network learning method for image semantic segmentation
CN110163110B (en) Pedestrian re-recognition method based on transfer learning and depth feature fusion
CN108550161B (en) Scale self-adaptive kernel-dependent filtering rapid target tracking method
CN103971386B (en) A kind of foreground detection method under dynamic background scene
CN105139395B (en) SAR image segmentation method based on small echo pond convolutional neural networks
CN108198201A (en) A kind of multi-object tracking method, terminal device and storage medium
CN110119728A (en) Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network
CN102999901B (en) Based on the processing method after the Online Video segmentation of depth transducer and system
WO2019071976A1 (en) Panoramic image saliency detection method based on regional growth and eye movement model
CN106709901B (en) Simulation mist drawing generating method based on depth priori
CN104978567B (en) Vehicle checking method based on scene classification
CN108960404B (en) Image-based crowd counting method and device
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN106815563B (en) Human body apparent structure-based crowd quantity prediction method
CN107657625A (en) Merge the unsupervised methods of video segmentation that space-time multiple features represent
CN106981068A (en) A kind of interactive image segmentation method of joint pixel pait and super-pixel
CN109919053A (en) A kind of deep learning vehicle parking detection method based on monitor video
CN109948593A (en) Based on the MCNN people counting method for combining global density feature
CN104657980A (en) Improved multi-channel image partitioning algorithm based on Meanshift
CN116258608B (en) Water conservancy real-time monitoring information management system integrating GIS and BIM three-dimensional technology
CN106991686A (en) A kind of level set contour tracing method based on super-pixel optical flow field
CN114677323A (en) Semantic vision SLAM positioning method based on target detection in indoor dynamic scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant