CN108804992B - Crowd counting method based on deep learning - Google Patents

Crowd counting method based on deep learning

Info

Publication number
CN108804992B
CN108804992B · CN201710318219.XA
Authority
CN
China
Prior art keywords
human body
image
deep learning
new
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710318219.XA
Other languages
Chinese (zh)
Other versions
CN108804992A (en)
Inventor
雷航 (Lei Hang)
杨铮 (Yang Zheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201710318219.XA
Publication of CN108804992A
Application granted
Publication of CN108804992B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation


Abstract

The method first extracts the motion foreground from the video, then uses a human body region model to guarantee invariance to the camera's viewing angle and perspective, and finally counts the crowd from the human body regions obtained through preprocessing, extraction, and detection. The method reduces the sliding-window search area and improves search efficiency, overcomes deformation in the surveillance video caused by viewing angle, distance to the monitored scene, and the like, and keeps system installation and deployment simple. A detection model based on a deep-learning convolutional neural network improves human detection accuracy, and non-maximum suppression eliminates redundant sub-regions and reduces repeated counting, so that the results of detecting, localizing, and counting people are more accurate.

Description

Crowd counting method based on deep learning
Technical Field
The invention belongs to the field of intelligent video surveillance, and in particular relates to a crowd counting method based on deep learning.
Background
With the popularization of video surveillance systems, cameras now cover every corner of our cities. First, given the enormous number of cameras and the volume of surveillance video, manually judging the behavior and attributes of people in a monitored scene is impractical. Second, under complicated conditions such as rain, snow, night scenes, or ultra-dense crowds, it is difficult even to identify individual people by eye, let alone count them.
At present, crowd counting methods used in video surveillance systems fall into three main categories: the first slides a detector over the image and judges and counts human bodies one by one; the second extracts crowd motion-trajectory features from the images and clusters them, the clustering result giving the crowd count; the third uses statistical methods to estimate the crowd distribution, computes the crowd density, and derives the count from it. However, these methods all rely on hand-crafted features and cope poorly with complex scenes. Some do not address invariance to perspective and observation angle, so they cannot handle object deformation caused by viewing angle and perspective and are ill-suited to wide-field scenes; others do address it, but their accuracy depends heavily on parameters such as the camera's shooting angle and observation distance being measured manually by the user, which complicates system installation and configuration. Detector-based processing is limited by the quality of the detector, and sliding a window over the entire image is computationally enormous, making real-time operation hard to guarantee.
Disclosure of Invention
The invention provides a method that minimizes the sliding-window search area to improve detection efficiency and reduce the interference of complex scenes with human detection, while achieving perspective and observation-angle invariance with only simple configuration.
In order to achieve the above purpose, the invention provides a crowd counting method based on deep learning, which comprises the following steps:
step 1, performing white balance preprocessing on the input image using a gray world algorithm;
step 2, extracting the motion foreground from the preprocessed image using a background segmentation method based on the K-nearest-neighbor algorithm;
step 3, traversing the extracted image pixels with a method that guarantees viewing-angle and perspective invariance, and feeding each pixel coordinate (x, y) into a trained linear model to obtain the human body region size;
step 4, using a convolutional neural network as the human body detection model;
step 5, counting the final number of human bodies.
Further, the gray world algorithm in step 1 performs white balance preprocessing on the image through the following steps:
1) averaging each of the three channels of the input image;
2) computing the gain of each channel and applying the gain to the original image;
3) normalizing the result;
the formula is as follows:
Figure BDA0001289054130000021
Figure BDA0001289054130000022
Figure BDA0001289054130000023
Figure BDA0001289054130000024
Figure BDA0001289054130000025
I out =(R new ,G new ,B new )
wherein M is R 、M G 、M B Representing the mean of the three channels of the input image R, G, B, respectively, alpha representing the global mean of the three channels, K representing the gain value of each channel, R new 、G new 、B new Representing the three channels behind the superimposed gain, I out Representing the image after gain superposition; for the above process, there may be overflow (>255, no less than 0) appears, experiments show that if it is directly going to do so>Setting 255 pixels to 255 may cause the image to be entirely whitish, so calculating all R is used new 、G new 、B new And then using the maximum value to linearly map the calculated data back to [0,255 []And (4) the following steps. The image is subjected to white balance preprocessing to automatically equalize the gray values of the pixels.
Furthermore, the extraction in step 2 uses a background segmentation method based on the K-nearest-neighbor algorithm: it traverses every pixel of the input image, finds the K pixel points closest to that pixel within a certain neighborhood, takes a majority vote over the categories of those points, and assigns the winning category to the current pixel. The classification decision rule is:

y = argmax_{c_j} Σ_{x_i ∈ N_K(x)} I(y_i = c_j)

where N_K(x) is the set of the K nearest neighbors of x and I(·) is the indicator function, i.e. it evaluates to 1 when y_i = c_j and to 0 otherwise.
Further, dilation and erosion operations are performed on the foreground image extracted in step 2.
Further, in step 3, every pixel of the foreground region is traversed with a method that guarantees viewing-angle and perspective invariance; each pixel coordinate (x, y) is taken as the center of a sub-region and fed into the trained linear model to obtain the human body region size. The region size is computed as:

w = θ₀ + θ₁·x + θ₂·y
h = ω₀ + ω₁·x + ω₂·y
J(θ) = (1/2m) Σ_{i=1..m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
θ_j := θ_j − (α/m) Σ_{i=1..m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x_j⁽ⁱ⁾

where w and h denote the width and height of the body region at coordinates (x, y), and θ and ω denote the weights of the linear models for the region width and height respectively. The learnable weights θᵢ and ωᵢ are obtained by manually cropping human body regions from the detection scene and training with a linear regression algorithm; the last two formulas are the regression objective and its gradient-descent weight update, in which h_θ denotes the linear estimate, y the ground-truth value, and α the learning rate.
Further, in step 4, all the computed human body region sub-images cropped from the original image are input to the convolutional neural network to judge whether each is a human body. The body size computation is based on the linear-regression human body region model. All regions judged to contain a human body are sorted by the network output value, i.e. the confidence of the judgment; taking the region with the highest confidence as the reference, every region whose overlap exceeds a set threshold is removed. The formulas are:

o = S_over / S
f(o) = 1 if o ≤ σ, 0 if o > σ

where S_over denotes the area of the overlapping portion of the two regions being compared, and S denotes the sum of the areas of the two regions. Regions with f(o) = 0 are removed; the remaining regions form the final result.
Furthermore, the convolutional neural network of step 4 serves as the human body detection model; its network structure follows the cifar10 network in the Caffe deep learning framework, with the parameters of each layer simplified.
Because the invention adopts the above technical scheme, it has the following beneficial effects:
the method comprises the steps of extracting a motion prospect from a video, ensuring the visual angle and the perspective invariance of a camera by using a human body region model, and finally determining the statistical population of the human body region through preprocessing, extracting and detecting. The method can reduce the search area of the sliding window and improve the search efficiency, overcomes the deformation of the monitoring video caused by the visual angle, the distance from the monitoring scene and the like, is simple in system installation and deployment, improves the human body detection accuracy rate based on the detection model of the deep learning convolutional neural network, eliminates redundant sub-areas by using a non-maximum inhibition method and reduces repeated counting, so that the results of detecting the human body, positioning the human body and counting the number of people are more accurate.
Drawings
FIG. 1 is a flow chart of a people counting method;
FIG. 2 is a flowchart of a human body region model training process;
FIG. 3 is a human detection model training flow diagram;
fig. 4 is a diagram of a human body detection convolutional neural network structure.
Detailed Description
The invention is further described below with reference to the following figures and examples.
As shown in figs. 1-3, a crowd counting method based on deep learning includes the following steps:
step 1, performing white balance preprocessing on the input image using a gray world algorithm;
furthermore, the white balance preprocessing method of the gray scale world algorithm firstly averages the three channels of the preprocessed image, then obtains the gain of each channel and superposes the gain value on the original image, and finally plans the result. The image subjected to white balance processing can automatically balance the gray value of the pixels, prevent the whole image from being slightly bright or dark, and remove the interference of illumination to a certain extent.
The formulas are as follows:

M_R = (1/(m·n)) Σ_{i,j} R(i, j),  M_G = (1/(m·n)) Σ_{i,j} G(i, j),  M_B = (1/(m·n)) Σ_{i,j} B(i, j)
α = (M_R + M_G + M_B) / 3
K_R = α / M_R,  K_G = α / M_G,  K_B = α / M_B
R_new = K_R · R,  G_new = K_G · G,  B_new = K_B · B
I_out = (R_new, G_new, B_new)

where M_R, M_G, M_B denote the means of the R, G, B channels of the input image, α the global mean of the three channels, K the gain of each channel, R_new, G_new, B_new the three channels after the gain is applied, and I_out the image after gain superposition. Overflow can occur in this process (values above 255; values below 0 cannot arise); experiments show that directly setting pixels above 255 to 255 can make the whole image whitish, so instead the maximum of all R_new, G_new, B_new values is computed and used to linearly map the results back into [0, 255].
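The white-balance procedure above can be sketched in a few lines of NumPy (an illustrative sketch; the patent gives no implementation, and the mapping of overflow back into [0, 255] follows the strategy described in the text):

```python
import numpy as np

def gray_world_white_balance(img):
    """Gray-world white balance: equalize the per-channel means, then map
    any overflow back into [0, 255] using the global maximum (instead of
    clipping, which would whiten the image).  img: H x W x 3, R/G/B, uint8."""
    img = img.astype(np.float64)
    means = img.reshape(-1, 3).mean(axis=0)   # M_R, M_G, M_B
    alpha = means.mean()                      # global mean of the three channels
    gains = alpha / means                     # per-channel gains K_R, K_G, K_B
    out = img * gains                         # apply (superimpose) the gains
    peak = out.max()
    if peak > 255:                            # overflow: linearly map back to [0, 255]
        out *= 255.0 / peak
    return out.astype(np.uint8)
```

On a uniformly tinted image the three channel means collapse to the same gray value, which is exactly the equalization the text describes.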
Step 2, extracting the motion foreground from the preprocessed image using a background segmentation method based on the K-nearest-neighbor algorithm;
further, the video motion foreground extraction technology based on the K Nearest Neighbor (KNN) algorithm traverses each pixel of the input image, finds K pixel points closest to the pixel in a certain neighborhood, performs majority voting on the categories of the points, determines the category of the current pixel, and updates the background. Dividing each framing of the video into a background or a foreground; the classification decision rule is as follows:
Figure BDA0001289054130000046
wherein I (-) is an indicator function, i.e. when y i =c j The time function goes to 1, otherwise 0.
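The decision rule can be illustrated with a toy sketch of the k-nearest-neighbor vote (a simplification for illustration: a production system would normally use OpenCV's `cv2.createBackgroundSubtractorKNN`, which also maintains and updates the per-pixel background model; the sample data in the usage below are hypothetical):

```python
import numpy as np

def knn_classify(sample, neighbors, labels, k=5):
    """Majority vote over the k nearest neighbors, i.e. return the class c_j
    that maximizes the sum of I(y_i = c_j) over the k closest points.
    sample: 1-D feature vector; neighbors: N x d array of stored samples;
    labels: N array of classes (0 = background, 1 = foreground)."""
    dists = np.linalg.norm(neighbors - sample, axis=1)
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = np.bincount(labels[nearest], minlength=2)
    return int(np.argmax(votes))                  # winning category
```

For example, a pixel value close to the stored background samples is voted background, and one close to the stored foreground samples is voted foreground.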
Further, dilation and erosion operations are performed on the extracted motion foreground to eliminate noise and obtain the final foreground region.
Dilation merges all background points in contact with an object into the object, expanding the boundary outward, using for example a 3×3 structuring element as the dilation template. Each pixel of the scanned image is ORed with the binary image covered by the structuring element: if all covered values are 0 the output pixel is 0, otherwise it is 1. The result is that the binary region grows by one ring.
Erosion eliminates boundary points and shrinks the boundary inward; it can be used to remove small, meaningless objects, again using for example a 3×3 template. Each pixel of the scanned image is ANDed with the binary image covered by the structuring element: if all covered values are 1 the output pixel is 1, otherwise it is 0. The result is that the binary region shrinks by one ring.
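The two 3×3 template operations described above can be sketched directly on a binary mask (illustrative NumPy only; in practice `cv2.dilate` and `cv2.erode` would be used):

```python
import numpy as np

def dilate3x3(mask):
    """3x3 dilation: a pixel becomes 1 if any pixel under the template is 1
    (the OR described above), growing the binary region by one ring."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def erode3x3(mask):
    """3x3 erosion: a pixel stays 1 only if every pixel under the template
    is 1 (the AND described above), shrinking the region by one ring."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= p[dy:dy + h, dx:dx + w]
    return out
```

A single foreground pixel dilates into a 3×3 block, and eroding that block recovers the single pixel, which is the noise-removal behavior used on the extracted foreground.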
Step 3, traversing the extracted image pixels with a method that guarantees viewing-angle and perspective invariance, and feeding each pixel coordinate (x, y) into the trained linear model to obtain the human body region size;
furthermore, each pixel of the foreground area is traversed by a sliding window method, then each traversed pixel point coordinate (x, y) is input into a human body area model to obtain the size of the human body area, and the relationship between the pixel space coordinate of the fixed scene image and the size of the human body area is established by adopting a linear regression model and ensuring the invariance of the visual angle and the perspective. Before training, the human body areas at various positions are manually intercepted from the scene, and all coordinates from far to near are covered as far as possible. And then training the model by using linear regression to obtain a human body region model. The formula is as follows:
Figure BDA0001289054130000051
Figure BDA0001289054130000052
where equation (1) is the objective function, h θ (x) A linear estimation function for the target problem is represented, and y represents a real value of the target problem; equation (2) is a weight update function, θ represents the weight of the linear model, and α represents the learning rate.
Sub-images are then cropped from I_out as candidate images to be detected, taking each traversed pixel coordinate (x, y) as the center of a sub-region whose size is given by the region model. The region size is computed as:

w = θ₀ + θ₁·x + θ₂·y
h = ω₀ + ω₁·x + ω₂·y

where w and h denote the width and height of the body region at coordinates (x, y), and θ and ω denote the weights of the linear models for the region width and height respectively; the learnable weights θᵢ and ωᵢ are obtained by manually cropping body regions from the detection scene and training with linear regression, each fitted with the objective and update rule of equations (1) and (2). Because the perspective relationship between distance and object size is linear, the body size computation rests on a linear-regression region model. Using machine learning to guarantee viewing-angle and perspective invariance means the system can be recalibrated simply, whether at installation and deployment or after the camera's shooting angle or distance changes.
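The region model can be sketched as follows. The patent fits θ and ω by gradient descent (equations (1) and (2)); this sketch solves the same least-squares problem in closed form with `numpy.linalg.lstsq`, and the training samples in the usage below are synthetic stand-ins for manually cropped regions:

```python
import numpy as np

def fit_region_model(coords, sizes):
    """Fit w = θ0 + θ1·x + θ2·y and h = ω0 + ω1·x + ω2·y.
    coords: N x 2 array of region centers (x, y);
    sizes:  N x 2 array of measured (w, h) for the cropped regions."""
    X = np.hstack([np.ones((len(coords), 1)), coords])  # design matrix [1, x, y]
    theta = np.linalg.lstsq(X, sizes[:, 0], rcond=None)[0]
    omega = np.linalg.lstsq(X, sizes[:, 1], rcond=None)[0]
    return theta, omega

def region_size(theta, omega, x, y):
    """Width and height of the body region centered at pixel (x, y)."""
    w = theta[0] + theta[1] * x + theta[2] * y
    h = omega[0] + omega[1] * x + omega[2] * y
    return w, h
```

Because the model is linear in (x, y), recalibrating after the camera moves only requires cropping a new handful of body regions and refitting.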
Step 4, adopting a convolutional neural network as a human body detection model;
the convolutional neural network is used as a human body detection model, and the network structure of the convolutional neural network refers to a cifar10 network in a cafe deep learning framework, so that parameters of each layer of the network are simplified. In the training process of the convolutional neural network model based on deep learning, firstly, human body samples are collected from a large number of monitoring videos, namely a human body database, a human body sample database with 1600 positive samples and 1600 negative samples is finally obtained, and the human body sample database is used as a training sample network to obtain a human body detection model.
The convolutional neural network shown in fig. 4 contains two convolutional layers, two max pooling layers, two local response normalization layers, two fully connected layers, and a softmax classifier. The convolutional layers extract features; the pooling layers compress the input feature maps, reducing their scale to obtain more generalizable features; the local normalization layers, somewhat like activation layers, normalize the input features to speed up training; the fully connected layers summarize the input features and map them into a high-order space for classification; the softmax layer classifies the feature vectors. The model's input is a 24×24 image with 3 channels. The first convolutional layer Conv1 and activation layer ReLU1 produce a 16-channel 24×24 feature map; Max Pooling1 downsamples it to 16 channels of 12×12; a local normalization layer leaves the size unchanged; Conv2, ReLU2, another local normalization layer, and Max Pooling2 then yield a 16-channel 6×6 feature map; finally, two fully connected layers and the softmax classifier convert it to a 2-dimensional vector. A 3-channel 24×24 input image is thus classified by the network into one of 2 classes: human or non-human.
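As a sanity check on the layer sizes quoted above, the feature-map shapes can be traced in a few lines (the text does not state kernel sizes or padding, so a size-preserving convolution is assumed, which matches the stated 24×24 map after Conv1):

```python
def detector_shapes():
    """Trace feature-map sizes through the detection network described above."""
    h, w, c = 24, 24, 3        # input: 3-channel 24x24 image
    c = 16                     # Conv1 + ReLU1 -> 16 channels, 24x24 ("same" conv)
    h, w = h // 2, w // 2      # Max Pooling1  -> 16 channels, 12x12
    c = 16                     # LRN1 (size unchanged), Conv2 + ReLU2 -> 16 ch, 12x12
    h, w = h // 2, w // 2      # LRN2 (size unchanged), Max Pooling2 -> 16 ch, 6x6
    flat = h * w * c           # features entering the two fully connected layers
    return (h, w, c), flat     # the FC layers and softmax then map to the 2 classes
```

The trace ends at a 16-channel 6×6 map, i.e. 576 features feeding the fully connected stage, consistent with the architecture described in the text.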
Step 5, counting the final number of human bodies.
Further, a non-maximum suppression algorithm is applied to all sub-regions judged in step 4 to contain human bodies, removing redundant regions. All such regions are sorted by the network output value, i.e. the confidence of the judgment; taking the region with the highest confidence as the reference, every region whose overlap exceeds a set threshold is removed. The formulas are:

o = S_over / S
f(o) = 1 if o ≤ σ, 0 if o > σ

where S_over denotes the area of the overlapping portion of the two regions being compared, S denotes the sum of the areas of the two regions, and o denotes the fraction of the whole that overlaps. Regions with f(o) = 0 are removed, and the remaining regions form the final result. Experiments show the method performs best when σ = 0.2.
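The suppression step can be sketched as follows (note that, as the text defines it, the denominator S is the summed area of the two regions, not the IoU union used by most modern detectors; σ = 0.2 as reported best, and the boxes in the usage below are hypothetical):

```python
def non_max_suppression(boxes, scores, sigma=0.2):
    """Keep the highest-confidence region, remove every region whose overlap
    ratio o = S_over / S with it exceeds sigma (f(o) = 0), and repeat.
    boxes: list of (x1, y1, x2, y2); scores: confidences; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)                   # highest remaining confidence
        keep.append(i)
        survivors = []
        for j in order:
            ox = max(0, min(boxes[i][2], boxes[j][2]) - max(boxes[i][0], boxes[j][0]))
            oy = max(0, min(boxes[i][3], boxes[j][3]) - max(boxes[i][1], boxes[j][1]))
            s_over = ox * oy               # area of the overlapping portion
            s = ((boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1])
                 + (boxes[j][2] - boxes[j][0]) * (boxes[j][3] - boxes[j][1]))
            if s_over / s <= sigma:        # f(o) = 1: region survives
                survivors.append(j)
        order = survivors
    return keep
```

Two heavily overlapping detections collapse to the single higher-confidence one, while a distant detection is kept, which is how repeated counting of the same person is avoided.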

Claims (8)

1. A crowd counting method based on deep learning, characterized by comprising the following steps:
step 1, performing white balance preprocessing on the input image using a gray world algorithm;
step 2, extracting the motion foreground from the preprocessed image using a background segmentation method based on the K-nearest-neighbor algorithm;
step 3, traversing each pixel of the foreground region with a method that guarantees viewing-angle and perspective invariance, taking each pixel coordinate (x, y) as the center of a sub-region and feeding it into the trained linear model to obtain the human body region size, computed as:

w = θ₀ + θ₁·x + θ₂·y
h = ω₀ + ω₁·x + ω₂·y
J(θ) = (1/2m) Σ_{i=1..m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
θ_j := θ_j − (α/m) Σ_{i=1..m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x_j⁽ⁱ⁾

where w and h denote the width and height of the body region at coordinates (x, y), and θ and ω denote the weights of the linear models for the region width and height respectively; the learnable weights θᵢ and ωᵢ are obtained by manually cropping human body regions from the detection scene and training with a linear regression algorithm, the last two formulas being the regression objective and its gradient-descent update, in which h_θ denotes the linear estimate, y the ground-truth value, and α the learning rate;
step 4, using a convolutional neural network as the human body detection model, and inputting all the computed human body region sub-images cropped from the original image into the convolutional neural network to judge whether each is a human body;
step 5, counting the final number of human bodies.
2. The deep learning based crowd counting method of claim 1, wherein the gray world algorithm of step 1 performs white balance preprocessing on the image through the following steps:
1) averaging each of the three channels of the input image;
2) computing the gain of each channel and applying the gain to the original image;
3) normalizing the result;
the formulas being:

M_R = (1/(m·n)) Σ_{i,j} R(i, j),  M_G = (1/(m·n)) Σ_{i,j} G(i, j),  M_B = (1/(m·n)) Σ_{i,j} B(i, j)
α = (M_R + M_G + M_B) / 3
K_R = α / M_R,  K_G = α / M_G,  K_B = α / M_B
R_new = K_R · R,  G_new = K_G · G,  B_new = K_B · B
I_out = (R_new, G_new, B_new)

where M_R, M_G, M_B denote the means of the R, G, B channels of the input image, α the global mean of the three channels, K the gain of each channel, R_new, G_new, B_new the three channels after the gain is applied, and I_out the image after gain superposition; overflow above 255 can occur in this process (values below 0 cannot arise), and since directly setting pixels above 255 to 255 can make the whole image whitish, the maximum of all R_new, G_new, B_new values is instead used to linearly map the results back into [0, 255].
3. The deep learning based crowd counting method of claim 1 or 2, wherein the gray values of the pixels of the image white-balance-preprocessed in step 1 are automatically equalized.
4. The deep learning based crowd counting method of claim 1, wherein the extraction in step 2 uses a background segmentation method based on the K-nearest-neighbor algorithm: every pixel of the input image is traversed, the K pixel points closest to that pixel within a certain neighborhood are found, a majority vote is taken over the categories of those points, and the winning category is assigned to the current pixel; the classification decision rule is:

y = argmax_{c_j} Σ_{x_i ∈ N_K(x)} I(y_i = c_j)

where N_K(x) is the set of the K nearest neighbors of x and I(·) is the indicator function, evaluating to 1 when y_i = c_j and to 0 otherwise.
5. The deep learning based crowd counting method of claim 1 or 4, wherein dilation and erosion operations are performed on the foreground image extracted in step 2.
6. The deep learning based crowd counting method of claim 1, wherein all regions judged to contain a human body are sorted by the network output value, i.e. the confidence of the judgment, and, taking the region with the highest confidence as the reference, every region whose overlap exceeds a set threshold is removed; the formulas are:

o = S_over / S
f(o) = 1 if o ≤ σ, 0 if o > σ

where S_over denotes the area of the overlapping portion of the two regions being compared and S denotes the sum of the areas of the two regions; regions with f(o) = 0 are removed, and the remaining regions form the final result.
7. The deep learning based crowd counting method of claim 1, wherein step 4 uses the convolutional neural network as the human body detection model, its network structure following the cifar10 network in the Caffe deep learning framework with the parameters of each layer simplified.
8. The deep learning based crowd counting method of claim 1, wherein the human body size calculation is based on a linear-regression human body region model.
CN201710318219.XA 2017-05-08 2017-05-08 Crowd counting method based on deep learning Active CN108804992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710318219.XA CN108804992B (en) 2017-05-08 2017-05-08 Crowd counting method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710318219.XA CN108804992B (en) 2017-05-08 2017-05-08 Crowd counting method based on deep learning

Publications (2)

Publication Number Publication Date
CN108804992A CN108804992A (en) 2018-11-13
CN108804992B true CN108804992B (en) 2022-08-26

Family

ID=64094203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710318219.XA Active CN108804992B (en) 2017-05-08 2017-05-08 Crowd counting method based on deep learning

Country Status (1)

Country Link
CN (1) CN108804992B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276779A (en) * 2019-06-04 2019-09-24 East China Normal University A dense crowd image generation method based on foreground-background segmentation
CN110562810B (en) * 2019-08-01 2020-10-23 Gree Electric Appliances, Inc. of Zhuhai Elevator dispatching method, device, computer equipment and storage medium
CN111882555B (en) * 2020-08-07 2024-03-12 China Agricultural University Deep learning-based netting detection method, device, equipment and storage medium
CN112580616B (en) * 2021-02-26 2021-06-18 Tencent Technology (Shenzhen) Co., Ltd. Crowd quantity determination method, device, equipment and storage medium
CN114926467B (en) * 2022-07-22 2022-10-21 Xinhenghui Electronics Co., Ltd. (新恒汇电子股份有限公司) Full-automatic lead frame counting method and device
CN116385969B (en) * 2023-04-07 2024-03-12 Jinan University Personnel gathering detection system based on multi-camera cooperation and human feedback
CN116805337B (en) * 2023-08-25 2023-10-27 Tianjin Normal University Crowd positioning method based on trans-scale visual transformation network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211411A (en) * 2007-12-21 2008-07-02 Vimicro Corporation Human body detection method and device
CN102521582A (en) * 2011-12-28 2012-06-27 Zhejiang University Human upper body detection and segmentation method applied to low-contrast video
CN103051905A (en) * 2011-10-12 2013-04-17 Apple Inc. Use of noise-optimized selection criteria to calculate scene white points
CN103049788A (en) * 2012-12-24 2013-04-17 Nanjing University of Aeronautics and Astronautics Computer-vision-based system and method for detecting the number of pedestrians waiting to cross a crosswalk
CN103226701A (en) * 2013-04-24 2013-07-31 Tianjin University Modeling method for video semantic events
CN105160313A (en) * 2014-09-15 2015-12-16 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences Method and apparatus for crowd behavior analysis in video monitoring
CN106027787A (en) * 2016-06-15 2016-10-12 Vivo Mobile Communication Co., Ltd. White balance method of mobile terminal, and mobile terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160315682A1 (en) * 2015-04-24 2016-10-27 The Royal Institution For The Advancement Of Learning / Mcgill University Methods and systems for wireless crowd counting


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Crowd counting algorithm for surveillance video based on convolutional neural networks" (基于卷积神经网络的监控视频人数统计算法); Ma Haijun (马海军); Journal of Anhui University (安徽大学学报); 2016-05-30; Vol. 40, No. 3; pp. 22-28 *

Also Published As

Publication number Publication date
CN108804992A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108804992B (en) Crowd counting method based on deep learning
US9846946B2 (en) Objection recognition in a 3D scene
CN106683119B (en) Moving vehicle detection method based on aerial video image
CN109598794B (en) Construction method of three-dimensional GIS dynamic model
CN108304808A (en) A kind of monitor video method for checking object based on space time information Yu depth network
CN104978567B (en) Vehicle checking method based on scene classification
CN106845364B (en) Rapid automatic target detection method
CN107330390B (en) People counting method based on image analysis and deep learning
CN106780560B (en) Bionic robot fish visual tracking method based on feature fusion particle filtering
Rout A survey on object detection and tracking algorithms
CN104517095B (en) A kind of number of people dividing method based on depth image
CN114266977B (en) Multi-AUV underwater target identification method based on super-resolution selectable network
CN101923637B (en) A kind of mobile terminal and method for detecting human face thereof and device
CN108734200B (en) Human target visual detection method and device based on BING (building information network) features
CN109410248B (en) Flotation froth motion characteristic extraction method based on r-K algorithm
Su et al. A new local-main-gradient-orientation HOG and contour differences based algorithm for object classification
CN112733711A (en) Remote sensing image damaged building extraction method based on multi-scale scene change detection
Ali et al. Vehicle detection and tracking in UAV imagery via YOLOv3 and Kalman filter
CN106056078A (en) Crowd density estimation method based on multi-feature regression ensemble learning
Ghahremannezhad et al. Automatic road detection in traffic videos
CN109215059B (en) Local data association method for tracking moving vehicle in aerial video
CN110636248B (en) Target tracking method and device
Angelo A novel approach on object detection and tracking using adaptive background subtraction method
KR101690050B1 (en) Intelligent video security system
CN111476314B (en) Fuzzy video detection method integrating optical flow algorithm and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant