CN111815665B - Single image crowd counting method based on depth information and scale perception information - Google Patents

Single image crowd counting method based on depth information and scale perception information

Info

Publication number
CN111815665B
CN111815665B (application number CN202010662406.1A)
Authority
CN
China
Prior art keywords
density
density map
map
depth
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010662406.1A
Other languages
Chinese (zh)
Other versions
CN111815665A (en)
Inventor
田玲
朱大勇
张栗粽
罗光春
邬丹丹
董文琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202010662406.1A priority Critical patent/CN111815665B/en
Publication of CN111815665A publication Critical patent/CN111815665A/en
Application granted granted Critical
Publication of CN111815665B publication Critical patent/CN111815665B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image

Abstract

The invention relates to a computer vision technology, and discloses a single image crowd counting method based on depth information and scale perception information, which improves the prediction capability and reduces the calculation complexity. The method comprises the following steps: s1, carrying out Gaussian mapping on head center coordinate data corresponding to an input sample picture to generate a preliminary truth density map, and correcting the preliminary truth density map based on depth information obtained by a depth estimation algorithm to obtain a truth density map; s2, predicting the crowd density map of the input sample picture by adopting a density estimation network to generate a predicted density map, calculating loss errors according to the predicted density map and the true density map, adjusting network parameters through gradient back propagation, and generating a density prediction model through iteration; and S3, when counting the crowd of a single image, generating a predicted density map of the image by using a density prediction model, and calculating to obtain the total number of people in the image.

Description

Single image crowd counting method based on depth information and scale perception information
Technical Field
The invention relates to a computer vision technology, in particular to a single image crowd counting method based on depth information and scale perception information.
Background
Crowd counting aims to take an input picture, process it with a network model, and output the corresponding crowd density map; summing the per-pixel values of the density map then yields the final total head count. The task is challenging because of occlusion, viewpoint change, variation in crowd scale, and the diversity of crowd distributions.
Early methods mainly located each pedestrian in the crowd with a target detector and took the number of detections as the counting result. However, these methods train classifiers on hand-crafted features and perform poorly in highly crowded scenes. To count crowds in complex scenes, later work generates the crowd density map with a convolutional neural network and improves counting performance by capturing scale changes.
In 2016, Zhang et al. proposed the MCNN algorithm to cope with scale changes. MCNN consists of three branch networks, each sampling features with receptive fields of a different size. A given picture is processed by the three branches separately, the results are fused along the channel dimension, and a final 1×1 convolution produces the density map. Because the design covers only three scales, however, each branch can serve only a certain density level; real scenes exhibit continuous density variation and uneven crowd distribution, so a crowd picture cannot be strictly assigned to one category, and the effectiveness of the MCNN algorithm is limited by the number of branches.
In 2018, Cao et al. proposed the SANet algorithm to improve the scale-aware structure. SANet integrates scale information with Inception-style blocks: each convolutional layer applies several convolution kernels, fuses the partial results, and shares information all the way from the bottom layer to the top. The network contains four such blocks, each followed by a transposed convolution for scale restoration, so that the generated density map has the same size as the input picture and pixel-level supervision becomes possible. However, in crowd counting scenes, pedestrians far from the camera appear as small targets because of the camera angle; such small targets are numerous in the images and are the main objects of study. Although the Inception-style blocks integrate multi-scale information, features become highly abstract as they propagate forward through the network and the detail features of small targets are lost, reducing the final prediction ability on small targets. In addition, scale restoration by transposed convolution is computationally expensive, and within a certain range of training batches the performance of the method shows no outstanding advantage.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: a single image crowd counting method based on depth information and scale perception information is provided, prediction capability is improved, and calculation complexity is reduced.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the single image crowd counting method based on the depth information and the scale perception information comprises the following steps:
s1, carrying out Gaussian mapping on head center coordinate data corresponding to an input sample picture to generate a preliminary truth density map, and correcting the preliminary truth density map based on depth information obtained by a depth estimation algorithm to obtain a truth density map;
s2, predicting a crowd density map of the input sample picture by adopting a density estimation network to generate a predicted density map, calculating loss errors according to the predicted density map and a true density map, adjusting network parameters through gradient back propagation, and generating a density prediction model through iteration;
and S3, when counting the crowd of a single image, generating a predicted density map of the image by using a density prediction model, and calculating to obtain the total number of people in the image.
As a further optimization, step S1 specifically includes:
S11, carrying out Gaussian distribution mapping on the head coordinate points in the sample picture label data with a Gaussian kernel of fixed size, and superposing the mapping values at all positions of the image to form a preliminary truth density map F_1(x);
S12, carrying out Gaussian distribution mapping on the head coordinate points in the sample picture label data with a geometry-adaptive Gaussian kernel, and superposing the mapping values at all positions of the image to form a preliminary truth density map F_2(x);
S13, extracting the depth information at each pixel position of the input sample picture with a monocular depth estimation algorithm to form a depth estimation map Depth(x);
S14, determining the final truth density map with a threshold segmentation algorithm based on the information of the depth estimation map Depth(x):

M(i, j) = F_1(i, j), if Depth(i, j) < δ
M(i, j) = F_2(i, j), if Depth(i, j) ≥ δ

where δ is a preset segmentation threshold, F_1(i, j) is the value at coordinate (i, j) of the preliminary truth density map F_1(x), F_2(i, j) is the value at coordinate (i, j) of the preliminary truth density map F_2(x), Depth(i, j) is the depth value at coordinate (i, j) in Depth(x), and M(i, j) is the value at coordinate (i, j) of the final truth density map.
As a further optimization, in step S2 the density estimation network includes a basic feature extraction module, a multi-scale capture module and a scale transfer module. The basic feature extraction module extracts low-level features of the picture such as textures; the multi-scale capture module further extracts picture features, fusing multi-scale information while preserving the detail features of small targets; the scale transfer module restores the scale of the feature map, raising it to the size of the input picture.
As a further optimization, the basic feature extraction module is composed of the convolutional layers before conv4_3 in the VGG16 network; the multi-scale capture module adopts four densely connected layers, each of which extracts features with a 3×3 convolution kernel and keeps the feature map resolution unchanged by edge padding, with the growth rate of the convolution set to 256; the scale transfer module adopts sub-pixel convolution to restore the scale of the feature map, raising its resolution to the size of the input picture.
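As an illustration of the scale transfer step, the following minimal sketch (PyTorch is assumed; the patent names no framework) shows how sub-pixel convolution rearranges a feature map with 64 channels, 64 being the square of the upsampling factor 8, into a single-channel map enlarged 8× per side:

```python
import torch
import torch.nn as nn

# Sub-pixel convolution ("scale transfer"): a feature map whose channel count
# equals the square of the upsampling factor (8 * 8 = 64) is rearranged into
# a single-channel map that is 8x larger along each spatial side.
shuffle = nn.PixelShuffle(upscale_factor=8)

features = torch.randn(1, 64, 32, 32)   # (batch, channels, H/8, W/8)
density = shuffle(features)
print(density.shape)                    # torch.Size([1, 1, 256, 256])
```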
As a further optimization, in step S2, the calculating of the loss error according to the predicted density map and the true density map specifically includes:
The error between the predicted density map and the truth density map is measured with the Euclidean distance as the loss function, expressed as:

L(Θ) = 1/(2N) · Σ_{i=1}^{N} ||F(X_i; Θ) − F_i^GT||_2^2

where F(X_i; Θ) is the predicted density map output by the network, Θ denotes the learnable parameters of the network, X_i denotes the i-th input picture, F_i^GT denotes the truth density map of the i-th picture, and N is the number of training pictures.
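A minimal sketch of this loss as a batch-level function (PyTorch assumed; density map tensors of shape (N, 1, H, W) are an assumed convention):

```python
import torch

def euclidean_loss(pred, gt):
    """L(Theta) = 1/(2N) * sum_i ||F(X_i; Theta) - F_i^GT||_2^2 over a batch
    of N predicted and ground-truth density maps of identical shape."""
    n = pred.size(0)
    return ((pred - gt) ** 2).sum() / (2 * n)
```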
The invention has the beneficial effects that:
(1) The supervision information is more accurate:
the invention utilizes depth information to guide the generation of the truth density map, and the generated truth density map is more accurate than the truth density map generated by the traditional single mode. The information is used for guiding network training, and the predicted density graph is closer to a true value.
(2) A wide range of dimensional changes can be captured:
the invention constructs the multi-scale capture module suitable for the current scene by utilizing dense connection, fuses multi-scale information and reserves more detailed characteristics of small targets, thereby being beneficial to improving the prediction performance of the network on the multi-scale targets.
(3) Scale reduction is performed with low computational complexity:
the invention utilizes the sub-pixel convolution module to carry out scale reduction, avoids the problem that bilinear interpolation upsampling is used to ignore the self characteristics of the image, and simultaneously avoids the computational complexity of using transposition convolution upsampling.
Drawings
FIG. 1 is a flow chart of a crowd counting algorithm based on depth information and scale perception information according to the present invention;
FIG. 2 is a diagram of a process for generating a truth density map;
FIG. 3 is a diagram of a process for generating a predicted density map for population counting by a density estimation network.
Detailed Description
The purpose of the invention is to provide a single image crowd counting method based on depth information and scale perception information that improves prediction capability and reduces computational complexity. The core idea is as follows: (1) train a prediction model: first generate a preliminary truth density map, then correct it with the depth information obtained by a depth estimation algorithm to obtain the truth density map; the truth density map supervises, point by point, the predicted density map generated by the density estimation network; the network parameters are adjusted by back-propagating the gradient of the error between the truth density map and the predicted density map, and the final prediction model is produced by iteration; (2) with the trained prediction model, predict the density map of an input picture and compute the total number of people in it.
In the invention, the truth density map is generated neither by a single fixed Gaussian kernel mapping nor by a geometry-adaptive Gaussian kernel mapping alone. Analyzing the source of crowd pictures shows that targets close to the camera appear large, with large distances between them, while targets far from the camera are affected by the viewing angle, appearing smaller and closer together. In view of this, the depth information of the picture is introduced to guide the generation of the truth density map, yielding a more accurate density map with which to supervise the generation of the predicted density map.
In the process of obtaining the predicted density map, a densely connected structure is used, which fully preserves the detail features of small targets while fusing multi-scale features, overcoming the loss of small-target detail that occurs when existing methods capture multi-scale features. To raise the resolution of the prediction map, channel information is used to fill in the spatial dimensions, making full use of the image's own information; this avoids the hand-crafted artifacts introduced by bilinear interpolation upsampling in existing methods, while also avoiding the computational complexity of transposed convolution.
In a specific implementation, as shown in fig. 1, a crowd counting algorithm flow based on depth information and scale perception information in the present invention includes the following steps:
s1: obtaining a truth density map of an input sample picture:
To obtain the truth density map label of an input sample picture, Gaussian mapping is first applied to the head center coordinate data corresponding to the input picture to generate a preliminary truth density map. The preliminary truth density map is then corrected with the depth information obtained by a depth estimation algorithm, and the resulting truth density map is used to supervise, point by point, the predicted density map generated by the density estimation network.
Here, two Gaussian mapping methods are used: a fixed Gaussian kernel function and a geometry-adaptive Gaussian kernel function. The preliminary truth density maps generated by the two methods are fused with depth information to generate the final truth density map, as shown in fig. 2.
S11, fixing a Gaussian kernel mapping mode:
let the coordinate of a head label point be x i Using delta (x-x) i ) Indicating a gaussian distribution position, so a picture with N persons' heads can be represented as
Figure BDA0002579100810000041
The corresponding population density map can be expressed as F (x) = sheet (x) × G σ (x) In that respect Wherein G is σ (x) The coordinate is closer to the central point, the value is larger, and sigma represents the size of the region range acted by the function. This density function assumes that each head is marked with a point x i The distribution in the image space is independent of each other, but the range of the region involved by different samples is different in size in the three-dimensional space due to the influence of perspective distortion.
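A minimal sketch of this fixed-kernel mapping (NumPy and SciPy assumed; the bandwidth sigma=4.0 is an illustrative value, the text only states that the kernel size is fixed):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixed_kernel_density(points, shape, sigma=4.0):
    """Preliminary truth density map F_1(x): place a unit impulse at every
    head coordinate and convolve with a Gaussian kernel of fixed bandwidth.
    sigma=4.0 is an illustrative value, not taken from the patent."""
    h, w = shape
    impulses = np.zeros((h, w), dtype=np.float64)
    for x, y in points:                        # head coordinates as (col, row)
        impulses[min(int(y), h - 1), min(int(x), w - 1)] += 1.0
    # the Gaussian blur preserves the total mass, so the map integrates
    # (up to border effects) to len(points)
    return gaussian_filter(impulses, sigma=sigma)
```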
S12, geometry-adaptive Gaussian kernel mapping:
The kernel parameter is determined by the average distance from each person to neighboring targets. For each head marker x_i in the picture, denote the distances to its m nearest targets as {d_1^i, d_2^i, …, d_m^i}; their average is

d̄_i = (1/m) · Σ_{j=1}^{m} d_j^i

The crowd distribution in the image is then mapped with a Gaussian kernel G_{σ_i}(x) whose bandwidth σ_i correlates with d̄_i. The density map generated by this method can be expressed as

F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_{σ_i}(x), with σ_i = β·d̄_i

where β is a hyperparameter.
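A minimal sketch of this geometry-adaptive mapping (NumPy and SciPy assumed; m=3 and β=0.3 are illustrative values in the spirit of MCNN, the text leaves β as a hyperparameter):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import KDTree

def adaptive_kernel_density(points, shape, m=3, beta=0.3):
    """Preliminary truth density map F_2(x): blur each head impulse with its
    own bandwidth sigma_i = beta * mean distance to its m nearest neighbors."""
    h, w = shape
    density = np.zeros((h, w), dtype=np.float64)
    if len(points) == 0:
        return density
    tree = KDTree(points)
    k = min(m + 1, len(points))            # neighbor 0 is the point itself
    dists, _ = tree.query(points, k=k)
    for (x, y), d in zip(points, np.atleast_2d(dists)):
        sigma = beta * d[1:].mean() if k > 1 else 15.0  # lone head: assumed fallback
        impulse = np.zeros((h, w))
        impulse[min(int(y), h - 1), min(int(x), w - 1)] = 1.0
        density += gaussian_filter(impulse, sigma=sigma)
    return density
```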
S13, extracting a depth estimation image:
The method computes the depth map corresponding to the input sample picture with a monocular depth estimation algorithm, corrects the Gaussian mapping value at each position of the picture with a threshold segmentation algorithm based on the depth map, and fuses the information of the two density maps into the final density map. Specifically, the monodepth algorithm may be used to estimate the depth information of the input sample picture: the picture is passed through the monodepth model to obtain a grayscale map in which each pixel value represents the distance from the camera to the object surface.
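A minimal sketch of this extraction step (NumPy assumed; depth_model stands in for any monocular depth estimator such as the monodepth model named above, and the min-max normalization is an assumption made so that one threshold δ can serve different pictures):

```python
import numpy as np

def extract_depth(image, depth_model):
    """Step S13 sketch: `depth_model` is any callable monocular depth
    estimator returning one value per pixel (a grayscale map of distances
    from the camera). Normalizing to [0, 1] is an assumption."""
    depth = np.asarray(depth_model(image), dtype=np.float64)  # (H, W)
    span = depth.max() - depth.min()
    return (depth - depth.min()) / (span + 1e-12)
```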
S14, fusing the density maps generated in S11 and S12 by using the depth information:
Suppose the input picture is X ∈ R^{h×h×c}, where h is the picture size and c the number of channels. Mapping the input picture with the fixed Gaussian kernel function gives the truth density map F_1(x); mapping it with the geometry-adaptive Gaussian kernel function gives the truth density map F_2(x); and processing it with the monodepth model gives the depth information map Depth(x). Based on the depth information, the two truth density maps F_1(x) and F_2(x) are segmented and fused according to the preset depth threshold δ:

M(i, j) = F_1(i, j), if Depth(i, j) < δ
M(i, j) = F_2(i, j), if Depth(i, j) ≥ δ

where F_1(i, j) is the value at coordinate (i, j) of the density map F_1(x), F_2(i, j) is the value at coordinate (i, j) of the density map F_2(x), Depth(i, j) is the depth value at coordinate (i, j) in Depth(x), and M(i, j) is the value at coordinate (i, j) of the final truth density map.
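A minimal sketch of this fusion (NumPy assumed; which preliminary map serves the near range and which the far range follows the piecewise form above and is an assumption, since only the thresholding scheme itself is fixed by the text):

```python
import numpy as np

def fuse_by_depth(f1, f2, depth, delta):
    """Final truth density map M: at each pixel, take the value from the
    fixed-kernel map f1 or the adaptive-kernel map f2 depending on whether
    the estimated depth is below or above the threshold delta."""
    return np.where(depth < delta, f1, f2)
```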
S2, acquiring a predicted density map (density estimation map) of the input sample picture based on the density estimation network:
The density estimation network employed in the invention consists of three main components: a basic feature extraction module, a multi-scale capture module and a scale reduction (scale transfer) module. The basic feature extraction module mainly extracts low-level features of the picture such as textures; the multi-scale capture module further extracts features, fusing multi-scale information while preserving the detail features of small targets; the scale reduction module mainly restores the scale of the feature map, raising it to the size of the input picture.
S21, a basic feature extraction module:
The model may reuse the layers of a pre-trained VGG module. Taking a 256×256 picture as input and analyzing the convolutional layers of VGG16, the receptive field of the conv4_3 layer reaches 172, which is far beyond the scale of a large target; in the current scene, the scale of a large target occupies less than half of the picture. The basic feature extraction module adopted by the method is therefore composed of the convolutional layers before conv4_3 in VGG16.
S22, a multi-scale capturing module:
To retain the detail features of small targets in the current scene, the feature information output by the basic feature extraction module is passed backwards layer by layer through the multi-scale module, avoiding the performance bottleneck caused by the loss of detail information in conventional methods. Unlike the skip connections of ResNet, dense connections guarantee the greatest degree of information sharing between layers. Receptive field analysis shows that with four densely connected layers the receptive field range suffices to extract semantic information for targets of all sizes. To ensure that the module extracts enough context information while avoiding an excessive growth rate, each layer extracts features with a 3×3 convolution kernel, keeps the resolution of the feature map unchanged with edge padding, and sets the growth rate of the convolution to 256. Since the output of the base network is 512-dimensional, the features must be converted to 256 channels before entering the scale capture module.
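A minimal sketch of such a densely connected capture module (PyTorch assumed; the ReLU activations and class/attribute names are assumptions, while the four layers, 3×3 kernels, edge padding and growth rate 256 follow the text):

```python
import torch
import torch.nn as nn

class MultiScaleCapture(nn.Module):
    """Four densely connected 3x3 conv layers with growth rate 256 and edge
    padding: every layer sees the channel-wise concatenation of all earlier
    feature maps at full resolution."""
    def __init__(self, in_channels=256, growth=256, layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            channels += growth            # dense connection widens the input
        self.out_channels = channels      # 256 + 4 * 256 = 1280

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

x = torch.randn(1, 256, 32, 32)
print(MultiScaleCapture()(x).shape)       # torch.Size([1, 1280, 32, 32])
```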
S23, a scale reduction module:
The module raises the resolution of the feature map with sub-pixel convolution. Because the basic feature extraction part downsamples the image by a factor of 8 through three pooling operations, the feature map is 1/8 of the original size; the multi-scale capture module keeps this size unchanged with edge padding so that the multi-layer feature maps can be concatenated along the channel dimension. Scale restoration therefore requires upsampling the feature map by a factor of 8. Since the number of low-resolution feature maps in a sub-pixel convolution operation must equal the square of the upsampling factor, a 1×1 convolution is added after the multi-scale capture module to adjust the number of channels of the feature map to 64, the square of the upsampling factor. Finally, the spatial dimensions of the feature map are filled from the channel features.
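A minimal sketch of this scale reduction step (PyTorch assumed; the 1280 input channels match the dense-block sketch above and are otherwise an assumption):

```python
import torch
import torch.nn as nn

# A 1x1 convolution squeezes the dense block's output down to 64 channels
# (the square of the upsampling factor 8); sub-pixel convolution then
# rearranges those channels into a single-channel map at 8x resolution.
scale_reduction = nn.Sequential(
    nn.Conv2d(1280, 64, kernel_size=1),   # 1280 = dense block output (assumed)
    nn.PixelShuffle(upscale_factor=8),    # (N, 64, H/8, W/8) -> (N, 1, H, W)
)

x = torch.randn(1, 1280, 32, 32)
print(scale_reduction(x).shape)           # torch.Size([1, 1, 256, 256])
```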
The process of generating the predicted density map with a density estimation network of this structure is shown in fig. 3: the input sample picture first passes through the basic feature extraction part to extract basic features, then enters the multi-scale feature capture module to fuse scale information, and finally undergoes scale restoration to generate the predicted density map.
After a prediction density map of an input sample picture is generated, calculating loss errors according to the prediction density map and a true density map, adjusting network parameters through gradient back propagation, and generating a density prediction model through iteration;
In the training process, the crowd density estimation algorithm is trained with the Euclidean distance as the loss function. The Euclidean loss measures the estimation error at the pixel level and is expressed as

L(Θ) = 1/(2N) · Σ_{i=1}^{N} ||F(X_i; Θ) − F_i^GT||_2^2

where F(X_i; Θ) is the density estimation map output by the network, Θ denotes the learnable parameters of the network, X_i denotes the i-th input picture, F_i^GT denotes the truth label map of the i-th picture, and N is the number of training pictures.
Input of the density prediction model: the crowd pictures and label data {(X_i, GT_i)} used for training.
Output: the predicted density map F(X_i; Θ).
The training process is as follows:
1. Data preprocessing: obtain the preliminary truth density maps F_1(X_i) and F_2(X_i) of each picture and its depth map information Depth(X_i); determine the depth segmentation threshold δ; based on the depth information, obtain the final truth density map M(X_i).
2. Initialize the model parameters, then train the model until it converges: load pictures in batches; extract basic features, updating the feature map F_i^1 ∈ R^{h1×h1×c1} ← X_i ∈ R^{h×h×c}; apply a channel transform F_i^1 ∈ R^{h1×h1×c2} ← F_i^1 ∈ R^{h1×h1×c1}; capture multi-scale features, updating F_i^2 ← F_i^1; apply a channel transform F_i^3 ← F_i^2; restore the scale to obtain the prediction map F(X_i; Θ); compute M(X_i) and the loss L(Θ); update the model parameters.
For the initialization of the model parameters, apart from the VGG part that participates in training, the convolution kernel parameters of the remaining parts are initialized with a Gaussian function whose standard deviation is set to 0.01. The model is optimized with the Adam algorithm in place of the conventional stochastic gradient descent algorithm, and to make the model converge quickly a fixed learning rate of 1e-5 is set.
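A minimal sketch of this training procedure (PyTorch assumed; model, loader, new_modules and the epoch count are assumed names and values, and the Euclidean loss from above is inlined):

```python
import torch
from torch import nn

def train(model, loader, new_modules, epochs=100, lr=1e-5):
    """Training loop sketch: Adam with the fixed learning rate 1e-5 from the
    text; convolution kernels of the newly added (non-VGG) modules are
    initialized from a Gaussian with standard deviation 0.01."""
    for module in new_modules:                  # e.g. capture + scale parts
        for m in module.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, std=0.01)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, gt_density in loader:       # gt_density = M(X_i)
            pred = model(images)
            loss = ((pred - gt_density) ** 2).sum() / (2 * images.size(0))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```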
After a stable density prediction model has been trained, it can be used in practical applications to generate the predicted density map of an input image. Once the density map is obtained, the total number of people in the image is obtained by summing over its pixels; this is a routine computation and is not repeated here.
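A minimal inference sketch under the same assumptions:

```python
import torch

@torch.no_grad()
def count_people(model, image):
    """Predict the density map of a single (1, 3, H, W) image tensor and
    sum its pixels to obtain the total head count."""
    model.eval()
    density = model(image)
    return density.sum().item()
```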

Claims (4)

1. The single image crowd counting method based on the depth information and the scale perception information is characterized by comprising the following steps of:
s1, carrying out Gaussian mapping on head center coordinate data corresponding to an input sample picture to generate a preliminary truth density map, and correcting the preliminary truth density map based on depth information obtained by a depth estimation algorithm to obtain a truth density map;
s2, predicting the crowd density map of the input sample picture by adopting a density estimation network to generate a predicted density map, calculating loss errors according to the predicted density map and the true density map, adjusting network parameters through gradient back propagation, and generating a density prediction model through iteration;
s3, when the crowd of a single image is counted, generating a predicted density map of the image by using a density prediction model, and calculating to obtain the total number of people in the image;
the step S1 specifically includes:
S11, carrying out Gaussian distribution mapping on the head coordinate points in the sample picture label data with a Gaussian kernel of fixed size, and superposing the mapping values at all positions of the image to form a preliminary truth density map F_1(x);
S12, carrying out Gaussian distribution mapping on the head coordinate points in the sample picture label data with a geometry-adaptive Gaussian kernel, and superposing the mapping values at all positions of the image to form a preliminary truth density map F_2(x);
S13, extracting the depth information at each pixel position of the input sample picture with a monocular depth estimation algorithm to form a depth estimation map Depth(x);
S14, determining the final truth density map with a threshold segmentation algorithm based on the information of the depth estimation map Depth(x):

M(i, j) = F_1(i, j), if Depth(i, j) < δ
M(i, j) = F_2(i, j), if Depth(i, j) ≥ δ

where δ is a preset segmentation threshold, F_1(i, j) is the value at coordinate (i, j) of the preliminary truth density map F_1(x), F_2(i, j) is the value at coordinate (i, j) of the preliminary truth density map F_2(x), Depth(i, j) is the depth value at coordinate (i, j) in Depth(x), and M(i, j) is the value at coordinate (i, j) of the final truth density map.
2. The single image crowd counting method based on depth information and scale perception information according to claim 1,
wherein, in step S2, the density estimation network includes: a basic feature extraction module, a multi-scale capture module and a scale transfer module; the basic feature extraction module extracts low-level features of the picture such as textures; the multi-scale capture module further extracts picture features, fusing multi-scale information while preserving the detail features of small targets; the scale transfer module restores the scale of the feature map, raising it to the size of the input picture.
3. The single image crowd counting method based on depth information and scale perception information according to claim 2, wherein
the basic feature extraction module is composed of the convolutional layers before conv4_3 in the VGG16 network; the multi-scale capture module adopts four densely connected layers, each of which extracts features with a 3×3 convolution kernel and keeps the feature map resolution unchanged by edge padding, with the growth rate of the convolution set to 256; the scale transfer module adopts sub-pixel convolution to restore the scale of the feature map, raising its resolution to the size of the input picture.
4. The single image crowd counting method based on depth information and scale perception information according to any one of claims 1 to 3, wherein
in step S2, calculating the loss error according to the predicted density map and the truth density map specifically includes:
The error between the predicted density map and the truth density map is measured with the Euclidean distance as the loss function, expressed as:

L(Θ) = 1/(2N) · Σ_{i=1}^{N} ||F(X_i; Θ) − F_i^GT||_2^2

where F(X_i; Θ) is the predicted density map output by the network, Θ denotes the learnable parameters of the network, X_i denotes the i-th input picture, F_i^GT denotes the truth density map of the i-th picture, and N is the number of training pictures.
CN202010662406.1A 2020-07-10 2020-07-10 Single image crowd counting method based on depth information and scale perception information Active CN111815665B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010662406.1A CN111815665B (en) 2020-07-10 2020-07-10 Single image crowd counting method based on depth information and scale perception information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010662406.1A CN111815665B (en) 2020-07-10 2020-07-10 Single image crowd counting method based on depth information and scale perception information

Publications (2)

Publication Number Publication Date
CN111815665A CN111815665A (en) 2020-10-23
CN111815665B true CN111815665B (en) 2023-02-17

Family

ID=72841731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010662406.1A Active CN111815665B (en) 2020-07-10 2020-07-10 Single image crowd counting method based on depth information and scale perception information

Country Status (1)

Country Link
CN (1) CN111815665B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767451B (en) * 2021-02-01 2022-09-06 福州大学 Crowd distribution prediction method and system based on double-current convolutional neural network
CN112861718A (en) * 2021-02-08 2021-05-28 暨南大学 Lightweight feature fusion crowd counting method and system
CN113436239A (en) * 2021-05-18 2021-09-24 中国地质大学(武汉) Monocular image three-dimensional target detection method based on depth information estimation
CN113688747B (en) * 2021-08-27 2024-04-09 国网浙江省电力有限公司双创中心 Method, system, device and storage medium for detecting personnel target in image
CN113807274B (en) * 2021-09-23 2023-07-04 山东建筑大学 Crowd counting method and system based on image anti-perspective transformation
CN113869285B (en) * 2021-12-01 2022-03-04 四川博创汇前沿科技有限公司 Crowd density estimation device, method and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295557A (en) * 2016-08-05 2017-01-04 浙江大华技术股份有限公司 A kind of method and device of crowd density estimation
CN107301387A (en) * 2017-06-16 2017-10-27 华南理工大学 A kind of image Dense crowd method of counting based on deep learning
CN107862261A (en) * 2017-10-25 2018-03-30 天津大学 Image people counting method based on multiple dimensioned convolutional neural networks
CN109145708A (en) * 2018-06-22 2019-01-04 南京大学 A kind of people flow rate statistical method based on the fusion of RGB and D information
WO2019084854A1 (en) * 2017-11-01 2019-05-09 Nokia Technologies Oy Depth-aware object counting
CN109858424A (en) * 2019-01-25 2019-06-07 佳都新太科技股份有限公司 Crowd density statistical method, device, electronic equipment and storage medium
CN110765817A (en) * 2018-07-26 2020-02-07 株式会社日立制作所 Method, device and equipment for selecting crowd counting model and storage medium thereof
CN111126177A (en) * 2019-12-05 2020-05-08 杭州飞步科技有限公司 People counting method and device
CN111242036A (en) * 2020-01-14 2020-06-05 西安建筑科技大学 Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8861816B2 (en) * 2011-12-05 2014-10-14 Illinois Tool Works Inc. Method and apparatus for prescription medication verification

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295557A (en) * 2016-08-05 2017-01-04 浙江大华技术股份有限公司 A kind of method and device of crowd density estimation
CN107301387A (en) * 2017-06-16 2017-10-27 华南理工大学 A kind of image Dense crowd method of counting based on deep learning
CN107862261A (en) * 2017-10-25 2018-03-30 天津大学 Image people counting method based on multiple dimensioned convolutional neural networks
WO2019084854A1 (en) * 2017-11-01 2019-05-09 Nokia Technologies Oy Depth-aware object counting
CN109145708A (en) * 2018-06-22 2019-01-04 南京大学 A kind of people flow rate statistical method based on the fusion of RGB and D information
CN110765817A (en) * 2018-07-26 2020-02-07 株式会社日立制作所 Method, device and equipment for selecting crowd counting model and storage medium thereof
CN109858424A (en) * 2019-01-25 2019-06-07 佳都新太科技股份有限公司 Crowd density statistical method, device, electronic equipment and storage medium
CN111126177A (en) * 2019-12-05 2020-05-08 杭州飞步科技有限公司 People counting method and device
CN111242036A (en) * 2020-01-14 2020-06-05 西安建筑科技大学 Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Desen Zhou et al.; "Cascaded Multi-Task Learning of Head Segmentation and Density Regression for RGBD Crowd Counting"; IEEE Access; 2020-05-29 *
Chen Peng et al.; "Crowd Density Estimation with Multi-level Feature Fusion" (多层次特征融合的人群密度估计); Journal of Image and Graphics (中国图象图形学报); 2018-08-31 *

Also Published As

Publication number Publication date
CN111815665A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
CN111815665B (en) Single image crowd counting method based on depth information and scale perception information
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN109886121B (en) Human face key point positioning method for shielding robustness
CN108648161B (en) Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network
CN111612807B (en) Small target image segmentation method based on scale and edge information
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN111259945B (en) Binocular parallax estimation method introducing attention map
CN114863573B (en) Category-level 6D attitude estimation method based on monocular RGB-D image
Zhou et al. Scale adaptive image cropping for UAV object detection
CN113052835B (en) Medicine box detection method and system based on three-dimensional point cloud and image data fusion
CN111524135A (en) Image enhancement-based method and system for detecting defects of small hardware fittings of power transmission line
CN112862792B (en) Wheat powdery mildew spore segmentation method for small sample image dataset
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN110674704A (en) Crowd density estimation method and device based on multi-scale expansion convolutional network
CN112465021B (en) Pose track estimation method based on image frame interpolation method
CN114724120A (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN114898284B (en) Crowd counting method based on feature pyramid local difference attention mechanism
CN113065546A (en) Target pose estimation method and system based on attention mechanism and Hough voting
CN113673590A (en) Rain removing method, system and medium based on multi-scale hourglass dense connection network
CN112084952B (en) Video point location tracking method based on self-supervision training
CN112509021A (en) Parallax optimization method based on attention mechanism
CN111414931A (en) Multi-branch multi-scale small target detection method based on image depth
CN116097307A (en) Image processing method and related equipment
CN114429555A (en) Image density matching method, system, equipment and storage medium from coarse to fine
CN116563682A (en) Attention scheme and strip convolution semantic line detection method based on depth Hough network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant