CN111709300B - Crowd counting method based on video image - Google Patents

Crowd counting method based on video image

Info

Publication number
CN111709300B
Authority
CN
China
Prior art keywords
image
density
background
pedestrian
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010430583.7A
Other languages
Chinese (zh)
Other versions
CN111709300A (en)
Inventor
韩铠宇
翁立
王建中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202010430583.7A priority Critical patent/CN111709300B/en
Publication of CN111709300A publication Critical patent/CN111709300A/en
Application granted granted Critical
Publication of CN111709300B publication Critical patent/CN111709300B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a crowd counting method based on video images. The input data are consecutive video-frame images; redundant information is separated out by pixel-wise subtraction between each captured frame and a given background image, yielding a preprocessed input image. The preprocessed image is fed into a density-classification-based encoding-decoding network model: a backbone network extracts multi-scale features, a density-regression branch fuses these features and assigns weights by density regression, and the same multi-scale features are upsampled into per-scale density estimation maps, which are finally weighted to obtain the final density estimation map. Aimed at crowd counting in video images, the method exploits the similarity between pedestrians to a certain extent and filters out redundant information, so that it both obtains pedestrian counts in real time and keeps the background image current in real time.

Description

Crowd counting method based on video image
Technical Field
The invention belongs to the field of crowd image processing in computer vision, and particularly relates to a method for counting crowds and segmenting a pedestrian background in an image.
Background
Crowd counting is the task of counting the number of pedestrians in an image or a sequence of video frames. In real life, effective pedestrian counting matters in fields such as safety control, area planning, and behavior analysis; for example, it provides data support for stampede prevention, traffic route design, advertisement placement, and building site selection.
Current pedestrian counting methods fall mainly into three categories: early detection-based approaches, regression-based approaches, and today's density-map regression approaches. Detection-based methods detect pedestrians through a sliding window using features such as edges; they are limited by pedestrian occlusion and suit scenes with dispersed targets. Regression-based methods improve counting accuracy in occluded crowds to a certain extent, but cannot capture the spatial distribution of pedestrians well.
With the continuous development of computer vision, pedestrian counting has turned to density-map regression. Compared with the two methods above, density-map regression handles the occlusion problem while also providing the distribution of pedestrians, and thus yields concrete spatial-distribution information.
Pedestrian counting still faces difficulties shared by many computer vision tasks. For example, the distortion introduced by perspective transformation makes it harder to detect people at different scales. Most existing counting methods combine deep learning with multi-scale feature extraction: multi-layer or multi-column convolutions extract pedestrian features at different scales, which alleviates the perspective problem to a certain extent, but room for improvement remains.
In fact, pedestrian counting in a fixed scene involves a large amount of redundant information: surrounding buildings and parked vehicles, for instance, often remain unchanged over a given period. Existing deep-learning density-map methods spend resources computing over this interfering data, which slows down computation. Such background interference can be filtered out in advance by online background updating and background segmentation within the video-stream processing.
Combining the above ideas, the invention provides a crowd counting method based on video images and online background segmentation.
Disclosure of Invention
To address the problems in the existing pedestrian counting field, the invention provides a crowd counting method based on video images. The method has the following advantages:
In the model training stage: 1) a mature small multi-layer convolutional neural network (CNN), such as a VGG-16 structure, is selected for primary feature extraction, which gives the image a strong representation while reducing parameters, keeping the model simple and widely applicable; 2) density estimation is performed on the image using the obtained multi-scale features. When pedestrians resemble one another and the density is high, counting is performed more effectively mainly from low-level features; in less dense cases, high-level pedestrian features make the count more accurate. Classifying by density therefore lets the statistics adapt to different occlusion conditions and improves counting accuracy.
In the application stage, a background segmentation method separates out environmental interference while retaining the key information; the portion of the image participating in computation is reduced to a sparse matrix, which speeds up the subsequent pedestrian-count regression. The background is continuously updated using the spatial information from pedestrian detection together with the information retained by the background segmentation, until the complete background is finally separated out.
A crowd counting method based on video images comprises the following steps:
Step one: select a pedestrian image data set with annotation information, splitting it into a test set and a training set at a ratio of 6:4 (the proportion may be adjusted to the actual data set); then apply Gaussian-function processing to the head-annotation pixels of each image to generate an initial ground-truth density map corresponding to the original image;
Step two: build a density-classification-based encoding-decoding convolutional network model.
The density-classification-based encoding-decoding convolutional network model is divided into a backbone network and two branches. A VGG-16 network serves as the backbone, and its layers extract features at the corresponding scales. The density-regression branch takes the fused multi-scale features as input and performs density classification by regression to obtain the weights for the decoding branches; each decoding branch upsamples and decodes the features of one scale back toward the image resolution, generating a crowd-density estimation map for that scale, and the per-scale maps are weighted with the density-regression branch's weights to obtain the final density estimation map.
Step three: train the density-classification-based encoding-decoding convolutional network model built in step two on the training set, optimizing the parameters by stochastic gradient descent and measuring the loss between the density estimation map and the ground-truth density map with the Euclidean distance. The complete model with the best performance is retained for actual detection;
Step four: reduce the input image with the background-separation preprocessing method to complete the generation of a sparse matrix, then obtain the final counting result with the density-classification-based encoding-decoding convolutional network model from step three.
Background-separation method: perform pixel-wise subtraction between each captured video frame and a given background image, and use thresholding to retain only the image content that does not belong to the background, reducing the input image content and improving convolution efficiency. The pedestrian-containing regions are then extracted from the final density estimation map generated by the encoding-decoding convolutional network model, and the remaining regions are merged into the background layer, so the background is updated in real time.
The specific content of the first step is as follows:
and converting the pedestrian image with the head position mark in the data set into a true value density map by utilizing a two-dimensional Gaussian convolution kernel for loss difference calculation. Selecting a density map based on a geometrically adapted Gaussian kernel, and formulating as follows:
$$F(x)=\sum_{i=1}^{N}\delta(x-x_i)\ast G_{\sigma_i}(x),\qquad \sigma_i=\beta\,\bar{d}_i$$

The ground-truth density map is obtained by convolving a delta impulse function with a Gaussian function, convolving first and then summing over all heads. Here $x_i$ denotes the pixel position of the $i$-th head in the image; $\delta(x-x_i)$ is the impulse function at that head position; $N$ is the total number of heads in the image; $G_{\sigma_i}(x)$ is a Gaussian kernel of width $\sigma_i$; $\bar{d}_i$ is the average distance from $x_i$ to its $m$ nearest neighboring heads; and $\beta$ is a fixed value, the width parameter used to generate the Gaussian function.
Further, β is 0.3.
The above operation converts the pedestrian images with head annotations into ground-truth density maps, which serve as the comparison targets for the output of the convolutional neural network in subsequent training.
The third step comprises the following specific contents:
and (4) training the coding-decoding convolution network model which is built in the step two and is based on density classification by using the test set image as input, and reserving model parameters. The loss between the final density estimate map and the true density map is calculated using the euclidean distance. The parameters are optimized using a random gradient descent algorithm until the loss values converge to the expected values.
With the Euclidean distance measuring the distance between the generated density map and the ground truth, the loss function is defined as follows:
$$L(\Theta)=\frac{1}{2N}\sum_{i=1}^{N}\left\|Z(X_i;\Theta)-Z_i^{GT}\right\|_2^2$$

where $N$ denotes the number of images input to the encoding-decoding convolutional network model, $Z(X_i;\Theta)$ is the final density estimation map corresponding to the $i$-th input image, $Z_i^{GT}$ is the corresponding ground-truth density map, and $\Theta$ represents the network parameters to be learned.
The encoding-decoding convolutional network model is evaluated with the mean squared error (MSE) and the mean absolute error (MAE). MSE describes the accuracy of the model (the smaller the MSE, the higher the accuracy), while MAE reflects the error of the predicted values.
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|C_i-C_i^{GT}\right|,\qquad \mathrm{MSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(C_i-C_i^{GT}\right)^2}$$

where $C_i$ denotes the predicted count for the $i$-th image and $C_i^{GT}$ denotes the actual count.
Testing process: select the test set, feed it into the trained model, output the final crowd-density maps, and tally the counting results. The parameters that give the best result are retained and packaged as the model parameters.
The concrete content of the fourth step is as follows:
and subtracting the background image from the collected continuous video frames by using a background separation method, namely obtaining a difference image by performing pixel subtraction on the input initial image and the background image. The difference map contains information of all irrelevant backgrounds, including the change of shadows caused by pedestrians, vehicles and light irradiation. And performing threshold division on the difference map to filter out small interference such as illumination and the like to obtain a region of interest (ROI) separating the background. And (4) reserving the ROI image, namely the effective image in the model in the input step three. In the process, the filtering of redundant information is realized, and the convolution rate of the coding-decoding convolution network model is improved in a sparse matrix form.
After the final density estimation map of the ROI image is obtained, a pedestrian mask template is constructed by manual calibration (set according to the actual conditions). A dilation operation from morphological image processing is applied between the pedestrian mask template and the final density estimation map (each highlighted point in the density map is convolved with the mask template to produce a dilated region indicating that pedestrians are present there), yielding a pedestrian map; inverting the pixel values of the pedestrian map gives the background-update mask. Point-wise multiplication of the background-update mask with the initial image produces an updated background image, which replaces the background image used in the background subtraction, realizing online updating of the background image.
The captured information is preprocessed by step four, and pedestrians are then detected and counted with the optimal model selected in step three, realizing efficient pedestrian counting and spatial-information feedback.
The invention has the following beneficial effects:
the invention adopts a coding-decoding network based on density classification to generate a final density estimation graph; and the preprocessing of the image is realized by utilizing a background separation method, and the generation of a final density estimation graph is accelerated.
The input data are consecutive video-frame images; the preprocessed input image is obtained by pixel-wise subtraction between each captured frame and a given background image, which separates out the redundant information. The preprocessed image is fed into the density-classification-based encoding-decoding network. Exploiting the similar appearance of pedestrians at a given density, the network extracts multi-scale features with the backbone, fuses them, and assigns weights by density regression; the extracted multi-scale features are simultaneously upsampled into per-scale density estimation maps, which are finally weighted to obtain the final density estimation map. Compared with existing crowd counting techniques, the method targets crowd counting in video images, exploits the similarity between pedestrians to a certain extent, and filters out redundant information, so it both obtains pedestrian counts in real time and keeps the background image current in real time. In addition, the density-classification-based encoding-decoding network can be used on its own for pedestrian counting on single images.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a model of a density classification based encoding-decoding convolutional network;
FIG. 3 is a network model training flow diagram of the present invention;
FIG. 4 is a schematic diagram of a background separation process;
FIG. 5 is a flow chart of the present invention.
Detailed Description
The method of the invention is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1, a method for counting people based on video images includes the following steps:
Step one: select a pedestrian image data set with annotation information, splitting it into a test set and a training set at a ratio of 6:4 (the proportion may be adjusted to the actual data set); then apply Gaussian-function processing to the head-annotation pixels of each image to generate an initial ground-truth density map corresponding to the original image;
the concrete content is as follows:
and converting the pedestrian image with the head position mark in the data set into a true value density map by utilizing a two-dimensional Gaussian convolution kernel for loss difference calculation. In order to make the density map correspond to the image with different visual angles and dense crowd better, the density map based on the geometric adaptive Gaussian kernel is selected, and the formula is as follows:
$$F(x)=\sum_{i=1}^{N}\delta(x-x_i)\ast G_{\sigma_i}(x),\qquad \sigma_i=\beta\,\bar{d}_i$$

The ground-truth density map is obtained by convolving a delta impulse function with a Gaussian function, convolving first and then summing over all heads. Here $x_i$ denotes the pixel position of the $i$-th head in the image; $\delta(x-x_i)$ is the impulse function at that head position; $N$ is the total number of heads in the image; $G_{\sigma_i}(x)$ is a Gaussian kernel of width $\sigma_i$; $\bar{d}_i$ is the average distance from $x_i$ to its $m$ nearest neighboring heads; and $\beta$ is a fixed value, the width parameter used to generate the Gaussian function.
Further, β is 0.3.
The above operation converts the pedestrian images with head annotations into ground-truth density maps, which serve as the comparison targets for the output of the convolutional neural network in subsequent training.
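For illustration only, the following is a minimal Python sketch of the geometry-adaptive ground-truth generation described above. The single-head fallback value of σ and the use of SciPy's KDTree and gaussian_filter are implementation assumptions, not details fixed by the invention.

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.ndimage import gaussian_filter

def geometry_adaptive_density(shape, heads, beta=0.3, m=3):
    """Ground-truth density map: one Gaussian per annotated head.

    shape: (H, W) of the image; heads: (N, 2) array of (x, y) head pixels.
    The kernel width is sigma_i = beta * d_bar_i, where d_bar_i is the
    average distance from head i to its m nearest neighboring heads.
    """
    density = np.zeros(shape, dtype=np.float32)
    if len(heads) == 0:
        return density
    # Distances to the m nearest neighbors (k = m+1: the first hit is the point itself).
    dists, _ = KDTree(heads).query(heads, k=min(m + 1, len(heads)))
    for i, (x, y) in enumerate(heads):
        impulse = np.zeros(shape, dtype=np.float32)
        impulse[min(int(y), shape[0] - 1), min(int(x), shape[1] - 1)] = 1.0
        # A single-head image has no neighbors; fall back to a fixed sigma (assumed value).
        sigma = beta * float(dists[i][1:].mean()) if len(heads) > 1 else 15.0
        density += gaussian_filter(impulse, sigma, mode='constant')
    return density
```

Since each Gaussian integrates to approximately one, summing the resulting map approximates the annotated head count N.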
And step two, building a coding-decoding convolution network model based on density classification.
As shown in fig. 2, the density-classification-based encoding-decoding convolutional network model is divided into a backbone network and two branches. A VGG-16 network serves as the backbone, and its layers extract features at the corresponding scales. The density-regression branch takes the fused multi-scale features as input and performs density classification by regression to obtain the weights for the decoding branches; each decoding branch upsamples and decodes the features of one scale back toward the image resolution, generating a crowd-density estimation map for that scale, and the per-scale maps are weighted with the density-regression branch's weights to obtain the final density estimation map.
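A possible PyTorch sketch of such a model is given below. The exact tap points on VGG-16, the number of upsampling stages per decoding branch, and the use of three density classes are illustrative assumptions; the description fixes only the overall structure (a VGG-16 backbone, a density-regression branch that yields weights, and per-scale decoding branches whose density maps are combined by those weights).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class DensityClassEncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        feats = vgg16(weights="IMAGENET1K_V1").features  # torchvision >= 0.13
        # Split the VGG-16 backbone so multi-scale feature maps can be tapped.
        self.stage1 = feats[:16]    # through conv3_3: 256 ch at 1/4 resolution
        self.stage2 = feats[16:23]  # through conv4_3: 512 ch at 1/8 resolution
        self.stage3 = feats[23:30]  # through conv5_3: 512 ch at 1/16 resolution
        # Decoding branches: upsample each scale back to a density map.
        self.dec = nn.ModuleList([self._decoder(256, 2),
                                  self._decoder(512, 3),
                                  self._decoder(512, 4)])
        # Density-regression branch: fused features -> one weight per branch.
        self.cls = nn.Sequential(nn.Flatten(),
                                 nn.Linear(256 + 512 + 512, 64),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(64, 3))

    @staticmethod
    def _decoder(in_ch, n_up):
        layers, ch = [], in_ch
        for _ in range(n_up):  # double the resolution n_up times
            layers += [nn.Conv2d(ch, ch // 2, 3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.Upsample(scale_factor=2, mode='bilinear',
                                   align_corners=False)]
            ch //= 2
        return nn.Sequential(*layers, nn.Conv2d(ch, 1, 1))

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        # Fuse globally pooled multi-scale features for density classification.
        pooled = torch.cat([F.adaptive_avg_pool2d(f, 1) for f in (f1, f2, f3)], 1)
        w = torch.softmax(self.cls(pooled), dim=1)  # (B, 3) branch weights
        maps = [F.interpolate(dec(f), size=x.shape[2:], mode='bilinear',
                              align_corners=False)
                for dec, f in zip(self.dec, (f1, f2, f3))]
        # Final density map: weighted sum of the per-scale estimates.
        return sum(w[:, i].view(-1, 1, 1, 1) * m for i, m in enumerate(maps))
```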
Step three: train the density-classification-based encoding-decoding convolutional network model built in step two on the training set, optimizing the parameters by stochastic gradient descent and measuring the loss between the density estimation map and the ground-truth density map with the Euclidean distance. The complete model with the best performance is retained for actual detection;
as shown in fig. 3, the specific content is:
Train the density-classification-based encoding-decoding convolutional network model built in step two using the training-set images as input, and retain the model parameters. The loss between the final density estimation map and the ground-truth density map is computed with the Euclidean distance, and the parameters are optimized by stochastic gradient descent until the loss value converges to the expected value.
With the Euclidean distance measuring the distance between the generated density map and the ground truth, the loss function is defined as follows:
$$L(\Theta)=\frac{1}{2N}\sum_{i=1}^{N}\left\|Z(X_i;\Theta)-Z_i^{GT}\right\|_2^2$$

where $N$ denotes the number of images input to the encoding-decoding convolutional network model, $Z(X_i;\Theta)$ is the final density estimation map corresponding to the $i$-th input image, $Z_i^{GT}$ is the corresponding ground-truth density map, and $\Theta$ represents the network parameters to be learned.
The encoding-decoding convolutional network model is evaluated with the mean squared error (MSE) and the mean absolute error (MAE). MSE describes the accuracy of the model (the smaller the MSE, the higher the accuracy), while MAE reflects the error of the predicted values.
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|C_i-C_i^{GT}\right|,\qquad \mathrm{MSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(C_i-C_i^{GT}\right)^2}$$

where $C_i$ denotes the predicted count for the $i$-th image and $C_i^{GT}$ denotes the actual count.
The testing process: select the test set, feed it into the trained model, output the final crowd-density maps, and tally the counting results. The parameters that give the best result are retained and packaged as the model parameters.
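Assuming a data set that yields (image, ground-truth density map) tensor pairs, the training and evaluation described above could be sketched as follows; the batch size, learning rate, momentum, and epoch count are illustrative values rather than settings specified by the invention.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, epochs=200, lr=1e-6, device="cuda"):
    """SGD training with the Euclidean loss L(Θ) = 1/(2N) Σ ||Z(X;Θ) - Z_GT||²."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loader = DataLoader(train_set, batch_size=1, shuffle=True)
    for _ in range(epochs):
        for img, gt in loader:
            est = model(img.to(device))
            # Per-sample squared L2 distance, averaged over the batch, halved.
            loss = 0.5 * (est - gt.to(device)).pow(2).sum(dim=(1, 2, 3)).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()

@torch.no_grad()
def evaluate(model, test_set, device="cuda"):
    """MAE and MSE over a test set; counts are integrals of the density maps."""
    model.to(device).eval()
    abs_err, sq_err, n = 0.0, 0.0, 0
    for img, gt in DataLoader(test_set, batch_size=1):
        pred = model(img.to(device)).sum().item()  # predicted count C_i
        true = gt.sum().item()                     # ground-truth count
        abs_err += abs(pred - true)
        sq_err += (pred - true) ** 2
        n += 1
    return abs_err / n, (sq_err / n) ** 0.5        # MAE, MSE
```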
Step four: reduce the input image with the background-separation preprocessing method to complete the generation of a sparse matrix, then obtain the final counting result with the density-classification-based encoding-decoding convolutional network model from step three.
As shown in fig. 4, the background-separation method: perform pixel-wise subtraction between each captured video frame and a given background image, and use thresholding to retain only the image content that does not belong to the background, reducing the input image content and improving convolution efficiency. The pedestrian-containing regions are then extracted from the final density estimation map generated by the encoding-decoding convolutional network model, and the remaining regions are merged into the background layer, so the background is updated in real time.
The concrete contents are as follows:
and subtracting the background image from the collected continuous video frames by using a background separation method, namely obtaining a difference image by performing pixel subtraction on the input initial image and the background image. The difference map contains information of all irrelevant backgrounds, including the change of shadows caused by pedestrians, vehicles and light irradiation. And performing threshold division on the difference map to filter out small interference such as illumination and the like to obtain a region of interest (ROI) separating the background. And (4) reserving the ROI image, namely the effective image in the model in the input step three. In the process, the filtering of redundant information (background interference) is realized, and the convolution rate of the coding-decoding convolution network model is improved in a sparse matrix form.
After the final density estimation map of the ROI image is obtained, a pedestrian mask template is constructed by manual calibration (set according to the actual conditions). A dilation operation from morphological image processing is applied between the pedestrian mask template and the final density estimation map (each highlighted point in the density map is convolved with the mask template to produce a dilated region indicating that pedestrians are present there), yielding a pedestrian map (it contains only pedestrians, each replaced by one mask template, and is best understood as a pedestrian mask rather than the template itself); inverting the pixel values of the pedestrian map (after binarization, 0 becomes 1 and 1 becomes 0) gives the background-update mask. Point-wise multiplication of the background-update mask with the initial image produces an updated background image, which replaces the background image used in the background subtraction, realizing online updating of the background image.
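The online background update could be sketched as follows; the elliptical structuring element standing in for the manually calibrated pedestrian mask template, and the density threshold, are illustrative assumptions.

```python
import cv2
import numpy as np

def update_background(frame, background, density_map,
                      template_size=15, density_thresh=1e-3):
    """Update the background layer from the regions without pedestrians."""
    # Highlighted points of the density map mark pedestrian locations.
    ped = (density_map > density_thresh).astype(np.uint8)
    # Dilation with the pedestrian mask template expands each point into a region.
    template = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                         (template_size, template_size))
    ped = cv2.dilate(ped, template)
    keep = (1 - ped).astype(bool)   # inverted mask: the background-update mask
    updated = background.copy()
    updated[keep] = frame[keep]     # merge pedestrian-free pixels into the layer
    return updated
```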
As shown in fig. 5, the captured information is preprocessed by step four, and pedestrians are then detected and counted with the optimal model selected in step three, realizing efficient pedestrian counting and spatial-information feedback.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (5)

1. A crowd counting method based on video images is characterized by comprising the following steps:
step one, selecting a pedestrian image data set with annotation information, wherein the test set and the training set are split at a ratio of 6:4, and performing Gaussian-function processing according to the head-annotation pixels of the image to generate an initial ground-truth density map corresponding to the original image;
step two, building a density-classification-based encoding-decoding convolutional network model;
the density-classification-based encoding-decoding convolutional network model being divided into a backbone network and two branches: a VGG-16 network serves as the backbone, and its layers extract features at the corresponding scales; the density-regression branch takes the fused multi-scale features as input and performs density classification by regression to obtain the weights for the decoding branches; each decoding branch upsamples and decodes the features of one scale back toward the image resolution, generating a crowd-density estimation map for that scale, and the per-scale maps are weighted with the weights obtained by the density-regression branch to obtain a final density estimation map;
step three, training the density-classification-based encoding-decoding convolutional network model built in step two on the training set, optimizing the parameters by stochastic gradient descent, and measuring the loss between the density estimation map and the ground-truth density map with the Euclidean distance; the complete model with the best performance is retained for actual detection;
step four, reducing the input image with the background-separation preprocessing method to complete the generation of a sparse matrix, and obtaining the final counting result through the density-classification-based encoding-decoding convolutional network model from step three;
method of background separation: the pixel subtraction is carried out on the collected continuous video frames and a given background image, and the image content of all irrelevant background information is reserved in a threshold dividing mode, so that the reduction of the input image content is realized, and the convolution efficiency is improved; and extracting the pedestrian-containing part through a final density estimation image generated by the coding-decoding convolutional network model, and updating the rest part to a background image layer in a background mode to realize the real-time updating of the background.
2. The method according to claim 1, wherein the step one comprises the following steps:
converting the pedestrian images with head-position annotations in the data set into ground-truth density maps using a two-dimensional Gaussian convolution kernel, for use in the loss computation; a density map based on a geometry-adaptive Gaussian kernel is selected, formulated as follows:
$$F(x)=\sum_{i=1}^{N}\delta(x-x_i)\ast G_{\sigma_i}(x),\qquad \sigma_i=\beta\,\bar{d}_i$$

the ground-truth density map is obtained by convolving a delta impulse function with a Gaussian function, convolving first and then summing; $x_i$ denotes the pixel position of the $i$-th head in the image; $\delta(x-x_i)$ is the impulse function at that head position; $N$ is the total number of heads in the image; $G_{\sigma_i}(x)$ is a Gaussian kernel of width $\sigma_i$; $\bar{d}_i$ is the average distance from $x_i$ to its $m$ nearest neighboring heads; $\beta$ is a fixed value, the width parameter used to generate the Gaussian function;
the above operation converts the pedestrian images with head annotations into ground-truth density maps, which serve as the comparison targets for the output of the convolutional neural network in subsequent training.
3. The method according to claim 2, wherein the third step comprises:
training the density-classification-based encoding-decoding convolutional network model built in step two using the training-set images as input, and retaining the model parameters; computing the loss between the final density estimation map and the ground-truth density map with the Euclidean distance; optimizing the parameters by stochastic gradient descent until the loss value converges to the expected value;
with the Euclidean distance measuring the distance between the generated density map and the ground truth, the loss function is defined as follows:
$$L(\Theta)=\frac{1}{2N}\sum_{i=1}^{N}\left\|Z(X_i;\Theta)-Z_i^{GT}\right\|_2^2$$

where $N$ denotes the number of images input to the encoding-decoding convolutional network model, $Z(X_i;\Theta)$ is the final density estimation map corresponding to the $i$-th input image, $Z_i^{GT}$ represents the corresponding ground-truth density map, and $\Theta$ represents the network parameters to be learned;
evaluating the encoding-decoding convolutional network model with the mean squared error (MSE) and the mean absolute error (MAE); MSE describes the accuracy of the model, the smaller the MSE the higher the accuracy, while MAE reflects the error of the predicted values;
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|C_i-C_i^{GT}\right|,\qquad \mathrm{MSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(C_i-C_i^{GT}\right)^2}$$

where $C_i$ denotes the predicted count for the $i$-th image and $C_i^{GT}$ denotes the actual count;
the testing process: selecting the test set, feeding it into the trained model, outputting the final crowd-density maps, and tallying the counting results; the parameters giving the best result are retained and packaged as the model parameters.
4. The method according to claim 3, wherein the detailed contents of the fourth step are as follows:
subtracting the background image from each captured video frame using the background-separation method, namely obtaining a difference map by pixel-wise subtraction between the input initial image and the background image; the difference map contains everything that does not belong to the background, including pedestrians, vehicles, and shadow changes caused by lighting; thresholding the difference map filters out small illumination disturbances, yielding a region of interest (ROI) with the background separated; the retained ROI image is the effective image fed into the model of step three; this process filters out the redundant information and, through the sparse-matrix form, raises the convolution rate of the encoding-decoding convolutional network model;
after the final density estimation map of the ROI image is obtained, constructing a pedestrian mask template by manual calibration, applying a dilation operation from morphological image processing between the pedestrian mask template and the final density estimation map to obtain a pedestrian map, and inverting the pixel values of the pedestrian map to obtain a background-update mask, wherein each highlighted point in the density map is convolved with the mask template to produce a dilated region indicating that pedestrians are present there; performing point-wise multiplication of the background-update mask with the initial image to obtain an updated background image, which replaces the background image participating in the background subtraction and realizes online updating of the background image;
preprocessing the captured information through step four, and detecting and counting pedestrians with the optimal model selected in step three, realizing efficient pedestrian counting and spatial-information feedback.
5. The method of claim 2, wherein β is 0.3.
CN202010430583.7A 2020-05-20 2020-05-20 Crowd counting method based on video image Active CN111709300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010430583.7A CN111709300B (en) 2020-05-20 2020-05-20 Crowd counting method based on video image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010430583.7A CN111709300B (en) 2020-05-20 2020-05-20 Crowd counting method based on video image

Publications (2)

Publication Number Publication Date
CN111709300A CN111709300A (en) 2020-09-25
CN111709300B true CN111709300B (en) 2022-08-12

Family

ID=72538030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010430583.7A Active CN111709300B (en) 2020-05-20 2020-05-20 Crowd counting method based on video image

Country Status (1)

Country Link
CN (1) CN111709300B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515990A (en) * 2020-09-28 2021-10-19 阿里巴巴集团控股有限公司 Image processing and crowd density estimation method, device and storage medium
CN112632601B (en) * 2020-12-16 2024-03-12 苏州玖合智能科技有限公司 Crowd counting method for subway carriage scene
CN112767316A (en) * 2020-12-31 2021-05-07 山东师范大学 Crowd counting method and system based on multi-scale interactive network
CN112699848B (en) * 2021-01-15 2022-05-31 上海交通大学 Counting method and system for dense crowd of image
CN112597985B (en) * 2021-03-04 2021-07-02 成都西交智汇大数据科技有限公司 Crowd counting method based on multi-scale feature fusion
CN116703904A (en) * 2023-08-04 2023-09-05 中建八局第一数字科技有限公司 Image-based steel bar quantity detection method, device, equipment and medium
CN117854191A (en) * 2024-01-10 2024-04-09 北京中航智信建设工程有限公司 Airport isolation remote self-help checking system and method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130188A1 (en) * 2017-10-26 2019-05-02 Qualcomm Incorporated Object classification in a video analytics system
CN109101930B (en) * 2018-08-18 2020-08-18 华中科技大学 Crowd counting method and system
CN109815867A (en) * 2019-01-14 2019-05-28 东华大学 A kind of crowd density estimation and people flow rate statistical method
CN110781780B (en) * 2019-10-11 2023-04-07 浙江大华技术股份有限公司 Vacancy detection method and related device

Also Published As

Publication number Publication date
CN111709300A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
CN111709300B (en) Crowd counting method based on video image
CN106874894B (en) Human body target detection method based on regional full convolution neural network
CN108427920B (en) Edge-sea defense target detection method based on deep learning
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN108805015B (en) Crowd abnormity detection method for weighted convolution self-coding long-short term memory network
CN110555368B (en) Fall-down behavior identification method based on three-dimensional convolutional neural network
CN108764085B (en) Crowd counting method based on generation of confrontation network
CN111723693B (en) Crowd counting method based on small sample learning
CN111401144B (en) Escalator passenger behavior identification method based on video monitoring
CN110956094A (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-current network
CN110135296A (en) Airfield runway FOD detection method based on convolutional neural networks
CN111191667B (en) Crowd counting method based on multiscale generation countermeasure network
CN108038846A (en) Transmission line equipment image defect detection method and system based on multilayer convolutional neural networks
CN110427839A (en) Video object detection method based on multilayer feature fusion
CN107563345A (en) A kind of human body behavior analysis method based on time and space significance region detection
CN110765833A (en) Crowd density estimation method based on deep learning
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
CN101971190A (en) Real-time body segmentation system
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN109255326B (en) Traffic scene smoke intelligent detection method based on multi-dimensional information feature fusion
CN105957356B (en) A kind of traffic control system and method based on pedestrian's quantity
CN108416780B (en) Object detection and matching method based on twin-region-of-interest pooling model
Patil et al. Motion saliency based generative adversarial network for underwater moving object segmentation
CN106650617A (en) Pedestrian abnormity identification method based on probabilistic latent semantic analysis
Wang et al. Removing background interference for crowd counting via de-background detail convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant