CN109858389B - Vertical ladder people counting method and system based on deep learning - Google Patents

Vertical ladder people counting method and system based on deep learning

Info

Publication number
CN109858389B
CN109858389B CN201910023341.3A CN201910023341A CN109858389B CN 109858389 B CN109858389 B CN 109858389B CN 201910023341 A CN201910023341 A CN 201910023341A CN 109858389 B CN109858389 B CN 109858389B
Authority
CN
China
Prior art keywords
target
image
dbls
algorithm
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910023341.3A
Other languages
Chinese (zh)
Other versions
CN109858389A (en)
Inventor
陈国特
王超
施行
王伟
蔡巍伟
吴磊磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Xinzailing Technology Co ltd
Original Assignee
Zhejiang Xinzailing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Xinzailing Technology Co ltd filed Critical Zhejiang Xinzailing Technology Co ltd
Priority to CN201910023341.3A priority Critical patent/CN109858389B/en
Publication of CN109858389A publication Critical patent/CN109858389A/en
Application granted granted Critical
Publication of CN109858389B publication Critical patent/CN109858389B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based method and system for counting the number of people in a vertical ladder (elevator), comprising the following steps: when the elevator monitoring camera determines through a camera presence/absence algorithm that a person is present, a door open/close algorithm determines that the elevator door is closed, and the car is in a running state, an analysis request is initiated, an image is triggered, and an elevator pedestrian detection algorithm is called to perform target detection; after the analysis request is received, an image is taken from the main video stream according to the time node of the triggered image, and the YOLOv3 algorithm is called for analysis to obtain the final detection boxes; and the number of people in the elevator is obtained from the detection boxes, written into a database, and reported to the aerial ladder platform.

Description

Vertical ladder people counting method and system based on deep learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a vertical ladder people counting method and system based on deep learning.
Background
During use of a vertical ladder (elevator), passengers inevitably become trapped as a result of faults or human factors. With this detection method the number of trapped people can be determined and the property management or the aerial ladder platform notified, so that rescuers can reach the elevator in time. An elevator is fitted with a gravity sensor that has a limited load range, and exceeding that range can easily cause the device to fail; from a safety standpoint, detecting the number of passengers allows the elevator load to be kept within a reasonable range. Counting the daily passenger traffic of a single elevator has further uses: in residential communities, headcounts feed big data analysis and provide a degree of early warning for safety management; in high-traffic settings such as schools and hospitals, analyzing the traffic data allows elevator scheduling to be optimized and operating efficiency improved; and in shopping malls and shopping centers, traffic statistics help place advertising sensibly and increase the return on operating assets.
The prior art includes Chinese patent application No. CN201710157587.0, which relates to a people counting method and device and an elevator dispatching method and system. In that method, connected-domain labeling is applied to the head target pixel blocks in a two-dimensional projection image. A head target region is obtained by taking the target maximum as the circle center with a preset radius, and the number of heads is obtained by comparing the area of the connected region with a preset area threshold. The image processing algorithm in that invention is a traditional pattern recognition algorithm in which some features must be designed by hand. For example, a circle of preset radius is used to cover the candidate region; because the environment differs between elevator scenes, or because the camera position may shift through human factors, the preset values can deviate, and so the method is not generally applicable.
Disclosure of Invention
The invention aims to provide a vertical ladder people counting method and system based on deep learning.
In order to solve the technical problems, the invention adopts the following technical scheme:
An embodiment of the invention provides a vertical ladder people counting method based on deep learning, comprising the following steps:
when the elevator monitoring camera determines through a camera presence/absence algorithm that a person is present, a door open/close algorithm determines that the elevator door is closed, and the car is in a running state, an analysis request is initiated, an image is triggered, and an elevator pedestrian detection algorithm is called to perform target detection;
after the analysis request is received, an image is taken from the main video stream according to the time node of the triggered image, and the YOLOv3 algorithm is called for analysis to obtain the final detection boxes;
and the number of people in the elevator is obtained from the detection boxes, written into a database, and reported to the aerial ladder platform.
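The trigger condition in the first step can be sketched as a simple conjunction of three signals. This is a minimal illustration only: the `CarState` type, its field names, and `should_request_analysis` are hypothetical and do not come from the patent, which does not describe how the presence, door, and running-state signals are represented.

```python
from dataclasses import dataclass


@dataclass
class CarState:
    """Hypothetical snapshot of the three signals named in the first step."""
    person_present: bool  # result of the camera presence/absence algorithm
    door_closed: bool     # result of the door open/close algorithm
    car_running: bool     # whether the car is in a running state


def should_request_analysis(state: CarState) -> bool:
    """Request analysis only when all three trigger conditions hold."""
    return state.person_present and state.door_closed and state.car_running
```

Under this sketch an empty car, an open door, or a stationary car each suppresses the request, so the detector runs only on frames in which the passengers are settled inside a closed, moving car.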
Preferably, calling the YOLOv3 algorithm for analysis to obtain the final detection boxes specifically comprises:
First, the elevator pedestrian image is normalized and resized to 416 × 416, and the image is then divided into a grid; if the center of a detected target falls within a grid cell, that cell is responsible for predicting the target. YOLOv3 uses K-means clustering to obtain 9 prior boxes (anchors) and adopts feature map prediction at three scales; when the input is 416 × 416, the feature maps are 13 × 13, 26 × 26, and 52 × 52. Each YOLO layer uses three prior boxes: the nine prior boxes are divided among the feature maps of the 3 scales according to their sizes, and the larger-scale feature maps use the smaller prior boxes. Each grid cell can predict multiple bounding boxes; during training, if a bounding box predicted by a cell has the largest overlap with the annotated ground truth, the target is judged to lie in that cell, and that cell alone is responsible for predicting the target. In some cases one target is detected multiple times, producing duplicate target boxes; non-maximum suppression is applied to the highly overlapping bounding boxes, removing every prediction box except the one with the highest confidence, so that only one detection box is output for the same target. The specific coordinates and class of every prediction box are obtained by regression. A target confidence threshold is set, where the confidence is the probability of the target class multiplied by the overlap between the predicted target and the annotated ground truth; the prediction boxes above the threshold are the final detection boxes.
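The grid and prior-box bookkeeping described above can be illustrated with a little arithmetic. In the sketch below, the strides 32, 16, and 8 are the downsampling factors that turn a 416 × 416 input into 13 × 13, 26 × 26, and 52 × 52 grids. The nine anchors are the publicly known YOLOv3 defaults for the COCO data set and are used purely as example values, not the anchors of the patent; all function names are hypothetical.

```python
INPUT_SIZE = 416
STRIDES = (32, 16, 8)  # downsampling factors for the three scales

# Example anchors only: the default YOLOv3 COCO anchors as (width, height).
YOLOV3_COCO_ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                       (59, 119), (116, 90), (156, 198), (373, 326)]


def grid_sizes(input_size=INPUT_SIZE, strides=STRIDES):
    """Cells per side of each feature map for a square input."""
    return [input_size // s for s in strides]


def responsible_cell(cx, cy, grid, input_size=INPUT_SIZE):
    """(column, row) of the grid cell that contains the box centre, in pixels."""
    cell = input_size / grid
    col = min(int(cx // cell), grid - 1)
    row = min(int(cy // cell), grid - 1)
    return col, row


def assign_anchors(anchors):
    """Split 9 anchors by area: smallest to the 52 grid, largest to the 13 grid."""
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    return {52: ordered[0:3], 26: ordered[3:6], 13: ordered[6:9]}
```

For example, `responsible_cell(208, 100, 13)` locates a head centred at pixel (208, 100) on the coarsest grid, and `assign_anchors(YOLOV3_COCO_ANCHORS)` shows the three smallest anchors going to the 52 × 52 map.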
Preferably, feature map prediction at three scales is adopted; when the input is 416 × 416, the 13 × 13, 26 × 26, and 52 × 52 feature maps are obtained with the YOLOv3 detection network, specifically as follows. After the first group of DBLs, the YOLOv3 detection network uses 23 residual units, followed by 6 groups of DBLs and a convolutional layer, to obtain the 13 × 13 feature output used for detecting large targets. A branch taken after the 23 residual units and 5 groups of DBLs passes through a DBL and upsampling to give 26 × 26 features, which are tensor-spliced with the 26 × 26 features obtained from 19 residual units, where concat is used to splice tensors of the same scale; the spliced tensor passes through 5 groups of DBLs and a convolutional layer to obtain the 26 × 26 feature output. A branch taken after those 5 groups of DBLs again passes through a DBL and upsampling to give 52 × 52 features, which are tensor-spliced with the 52 × 52 features obtained from 11 residual units; the spliced tensor passes through 6 groups of DBLs and a convolutional layer to obtain the 52 × 52 feature output used for detecting small targets. Finally, the three feature outputs are used jointly to detect elevator pedestrians.
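The branch, upsample, and concat pattern above is easiest to follow by tracking tensor shapes. The NumPy sketch below uses zero-filled placeholders in (channels, height, width) layout. The channel counts (256, 512, 128, 256) are those of the standard Darknet-53 based YOLOv3 and are an assumption here, since the patent gives only the spatial sizes; nearest-neighbour upsampling is likewise assumed, and the function names are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) tensor."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fuse(deep, skip):
    """Upsample the deeper branch and concatenate it with the skip features."""
    up = upsample2x(deep)
    # concat only works when both tensors have the same spatial scale
    assert up.shape[1:] == skip.shape[1:], "spatial sizes must match"
    return np.concatenate([up, skip], axis=0)


# 13 x 13 branch (after DBL) fused with the 26 x 26 backbone features
branch_13 = np.zeros((256, 13, 13), dtype=np.float32)
skip_26 = np.zeros((512, 26, 26), dtype=np.float32)
fused_26 = fuse(branch_13, skip_26)

# 26 x 26 branch (after DBL) fused with the 52 x 52 backbone features
branch_26 = np.zeros((128, 26, 26), dtype=np.float32)
skip_52 = np.zeros((256, 52, 52), dtype=np.float32)
fused_52 = fuse(branch_26, skip_52)
```

Concatenation adds channels while leaving the spatial size unchanged, which is why each branch must be upsampled to the skip tensor's resolution first.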
Another aspect of the embodiments of the present invention provides a vertical ladder people counting system based on deep learning, comprising:
an analysis request module, which initiates an analysis request, triggers an image, and calls an elevator pedestrian detection algorithm to perform target detection when the elevator monitoring camera determines through a camera presence/absence algorithm that a person is present, a door open/close algorithm determines that the elevator door is closed, and the car is in a running state;
an elevator people counting module, which, after receiving the analysis request, takes an image from the main video stream according to the time node of the triggered image and calls the YOLOv3 algorithm for analysis to obtain the final detection boxes;
and a write-and-upload module, which obtains the number of people in the elevator from the detection boxes, writes the number into the database, and reports it to the aerial ladder platform.
Preferably, the YOLOv3 analysis that yields the final detection boxes, and the YOLOv3 detection network that produces the 13 × 13, 26 × 26, and 52 × 52 feature maps, are the same as in the preferred forms of the method set out above.
The invention has the following beneficial effects: applied to the aerial ladder platform, it improves the detection of elevator pedestrian targets by annotating a large amount of elevator pedestrian data and training the improved YOLOv3 network on that data set; and the model trained on big data samples generalizes well and is suitable for scenes such as communities, schools, hospitals, hotels, and shopping malls.
Drawings
FIG. 1 is a flowchart illustrating steps of a method for counting the number of people on a vertical ladder based on deep learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a skeleton structure of a YOLOv3 detection network according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a system for counting people on a vertical ladder based on deep learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, a flowchart of the steps of a vertical ladder people counting method based on deep learning according to an embodiment of the present invention is shown. The method includes the following steps:
S1, when the elevator monitoring camera determines through a camera presence/absence algorithm that a person is present, a door open/close algorithm determines that the elevator door is closed, and the car is in a running state, an analysis request is initiated, an image is triggered, and an elevator pedestrian detection algorithm is called to perform target detection;
S2, after the analysis request is received, an image is taken from the main video stream according to the time node of the triggered image, and the YOLOv3 algorithm is called for analysis to obtain the final detection boxes;
Calling the YOLOv3 algorithm for analysis to obtain the final detection boxes specifically comprises the following:
First, the elevator pedestrian image is normalized and resized to 416 × 416, and the image is then divided into a grid; if the center of a detected target falls within a grid cell, that cell is responsible for predicting the target. YOLOv3 uses K-means clustering to obtain 9 prior boxes (anchors) and adopts feature map prediction at three scales; when the input is 416 × 416, the feature maps are 13 × 13, 26 × 26, and 52 × 52. Each YOLO layer uses three prior boxes: the nine prior boxes are divided among the feature maps of the 3 scales according to their sizes, and the larger-scale feature maps use the smaller prior boxes. Each grid cell can predict multiple bounding boxes; during training, if a bounding box predicted by a cell has the largest overlap with the annotated ground truth, the target is judged to lie in that cell, and that cell alone is responsible for predicting the target. In some cases one target is detected multiple times, producing duplicate target boxes; non-maximum suppression is applied to the highly overlapping bounding boxes, removing every prediction box except the one with the highest confidence, so that only one detection box is output for the same target. The specific coordinates and class of every prediction box are obtained by regression. A target confidence threshold is set, where the confidence is the probability of the target class multiplied by the overlap between the predicted target and the annotated ground truth; the prediction boxes above the threshold are the final detection boxes.
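The last two operations in this paragraph, confidence thresholding and non-maximum suppression, can be sketched in plain Python. This is a generic greedy NMS rather than the patent's exact implementation; the threshold values 0.5 and 0.45 and the function names are assumptions chosen for illustration, and boxes are given as (x1, y1, x2, y2) corner coordinates.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Return indices of boxes kept after thresholding and greedy NMS."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not heavily overlap a box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

With head boxes as input, `len(nms(boxes, scores))` is the people count for the frame, since each surviving box corresponds to one detected head.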
The method counts people by detecting the head region of elevator pedestrians. Because pedestrians occlude one another severely when traffic is heavy and the car is crowded, detection of the head and shoulders or of the whole torso is not used. FIG. 2 is a schematic diagram of the YOLOv3 detection network framework; the network model size is 246M. The input layer is the image to be detected, and a DBL is a convolutional layer + Batch Normalization layer + Leaky ReLU activation layer; fusing the three layers accelerates computation. The convolutional layers are used to extract features. Convolution is a mathematical operation on two real-valued functions: in general, the input image data is convolved with a kernel function, and the output is called a feature map. In deep learning, image feature information is extracted through the convolution operation. For a two-dimensional image I as input and a two-dimensional kernel K, the formula is as follows:
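$$S(i,j) = (I * K)(i,j) = \sum_{m=0}^{M_r-1} \sum_{n=0}^{M_c-1} I(m,n)\, K(i-m,\, j-n)$$
where $M_r$, $M_c$ are the numbers of rows and columns of I, $K_r$, $K_c$ are the numbers of rows and columns of K, m and n are the summation indices, terms in which the index of K falls outside its range are taken as zero, and i, j satisfy the condition: $0 \le i \le M_r + K_r - 1$, $0 \le j \le M_c + K_c - 1$.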
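As an illustration of the two-dimensional convolution just described, the following NumPy sketch adds a shifted, scaled copy of the kernel K for every input pixel I(m, n), which is equivalent to the full discrete convolution of I with K. The function name `conv2d_full` is hypothetical, zero-based indexing is assumed (so the output has M_r + K_r - 1 rows and M_c + K_c - 1 columns), and this is a readable reference loop, not the optimized routine a detection framework would use.

```python
import numpy as np


def conv2d_full(I, K):
    """Full 2-D convolution: S(i, j) = sum_m sum_n I(m, n) * K(i - m, j - n)."""
    Mr, Mc = I.shape
    Kr, Kc = K.shape
    out = np.zeros((Mr + Kr - 1, Mc + Kc - 1))
    for m in range(Mr):
        for n in range(Mc):
            # Pixel I(m, n) contributes I(m, n) * K(i - m, j - n) to S(i, j),
            # i.e. a copy of K shifted so its origin sits at (m, n).
            out[m:m + Kr, n:n + Kc] += I[m, n] * K
    return out
```

Convolutional layers in practice usually compute only the positions where the kernel fully overlaps the (optionally padded) image, which is a cropped version of this full result.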
To address the vanishing-gradient problem during deep network training, the Batch Normalization layer standardizes the mapped values of the nonlinear function of the hidden-layer neurons:
$$\hat{x}^{(k)} = \frac{x^{(k)} - E\left[x^{(k)}\right]}{\sqrt{\mathrm{Var}\left[x^{(k)}\right]}}$$
where $x^{(k)}$ is the mapped value of the neuron after the activation transform, k is a constant, $E(x)$ is the mean, and $\mathrm{Var}(x)$ is the variance.
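The standardization performed by the Batch Normalization layer can be sketched in a few lines of NumPy: subtract the per-feature mean over the batch and divide by the per-feature standard deviation. The small `eps` term is an assumption added for numerical stability, and the learnable scale and shift that a full Batch Normalization layer also applies are omitted; `batch_normalize` is a hypothetical name.

```python
import numpy as np


def batch_normalize(x, eps=1e-5):
    """Standardize each feature (column) of x over the batch (rows)."""
    mean = x.mean(axis=0)  # E(x) per feature
    var = x.var(axis=0)    # Var(x) per feature
    return (x - mean) / np.sqrt(var + eps)
```

After this step each feature has approximately zero mean and unit variance across the batch, which keeps activations in a range where gradients do not collapse.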
The Leaky ReLU activation layer performs a nonlinear mapping on the input neurons:
$$y_i = \begin{cases} x_i, & x_i \ge 0 \\ a_i x_i, & x_i < 0 \end{cases}$$
where $x_i$ is the input value of the neuron, $y_i$ is the mapped output, and $a_i$ is a constant.
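A leaky activation of this kind passes non-negative inputs unchanged and scales negative inputs by a constant. The sketch below assumes the common form in which negative inputs are multiplied by a small slope, with 0.1 (the value used by the Darknet implementation of YOLOv3) as the default; both the form and the default are assumptions, since the patent does not state the value of the constant.

```python
import numpy as np


def leaky_relu(x, slope=0.1):
    """Identity for x >= 0, slope * x for x < 0 (slope is an assumed constant)."""
    return np.where(x >= 0, x, slope * x)
```

Unlike a plain ReLU, the negative side keeps a nonzero gradient, so neurons with negative pre-activations can still be updated during training.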
In addition, residual units are introduced so that the network can be made deeper. ResN denotes a residual unit, where x is the value output to the unit by the previous layer, ω is the weight applied to x within the unit, and y is the output value obtained for x through the activation function plus the shortcut input x:
y=F(x,ω)+x
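The residual relation y = F(x, ω) + x can be illustrated with a toy unit in which F is two small matrix multiplications, each followed by a leaky activation. These dense layers merely stand in for the convolutional DBL blocks of the real network, and the names `leaky` and `residual_unit` as well as the slope 0.1 are assumptions made for this sketch.

```python
import numpy as np


def leaky(v, slope=0.1):
    """Leaky activation used inside the toy residual branch."""
    return np.where(v >= 0, v, slope * v)


def residual_unit(x, w1, w2):
    """y = F(x, w) + x, where F is two weighted layers with leaky activations."""
    f = leaky(w2 @ leaky(w1 @ x))  # residual branch F(x, w)
    return f + x                   # shortcut adds the input back


x = np.array([1.0, -2.0, 3.0])
zero = np.zeros((3, 3))
y_identity = residual_unit(x, zero, zero)  # F contributes nothing here
```

When the weights are zero the unit reduces to the identity, which is the property that lets many such units be stacked without making the network harder to train.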
After the first group of DBLs, the YOLOv3 network uses 23 residual units, followed by 6 groups of DBLs and a convolutional layer, to obtain the 13 × 13 feature output used for detecting large targets. A branch taken after the 23 residual units and 5 groups of DBLs passes through a DBL and upsampling to give 26 × 26 features, which are tensor-spliced with the 26 × 26 features obtained from 19 residual units, where concat is used to splice tensors of the same scale; the spliced tensor passes through 5 groups of DBLs and a convolutional layer to obtain the 26 × 26 feature output. A branch taken after those 5 groups of DBLs again passes through a DBL and upsampling to give 52 × 52 features, which are tensor-spliced with the 52 × 52 features obtained from 11 residual units; the spliced tensor passes through 6 groups of DBLs and a convolutional layer to obtain the 52 × 52 feature output used for detecting small targets. Finally, the three feature outputs are used jointly to detect elevator pedestrians. Overall, the three scales 13 × 13, 26 × 26, and 52 × 52 are used to detect targets of different sizes. Because the YOLOv3 network detects some small target regions poorly, smaller anchors are obtained by re-clustering on the elevator pedestrian data set, which improves the detection performance of YOLOv3.
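Re-clustering anchors on a head data set can be sketched with K-means over the (width, height) pairs of the annotated boxes. The sketch assumes the distance commonly used for YOLO anchors, 1 - IoU between boxes aligned at a common corner, with clusters updated by their mean; the patent states only that K-means is used, so the distance, the random initialization, and the names `wh_iou` and `kmeans_anchors` are assumptions.

```python
import numpy as np


def wh_iou(boxes, centroids):
    """IoU between (N, 2) and (K, 2) width/height pairs sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster box sizes into k anchors, returned sorted by area."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Highest IoU is the same as smallest (1 - IoU) distance.
        assign = wh_iou(boxes, centroids).argmax(axis=1)
        new = np.array([boxes[assign == c].mean(axis=0) if np.any(assign == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]
```

Because heads seen from an elevator camera are small compared with generic objects, clustering on such data would be expected to yield smaller anchors than generic defaults, which matches the motivation given above.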
S3, the number of people in the elevator is obtained from the detection boxes, written into the database, and reported to the aerial ladder platform.
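Step S3 amounts to counting the final detection boxes, persisting the count, and forwarding it. The sketch below uses an in-memory SQLite database and a caller-supplied `report` callable as stand-ins; the table name, column names, elevator identifier, and the reporting interface are all hypothetical, since the patent does not specify the database or the platform API.

```python
import sqlite3
import time


def record_count(conn, elevator_id, boxes, report):
    """Store the number of detection boxes and pass it to a reporting callable."""
    count = len(boxes)  # one final detection box per detected head
    conn.execute("CREATE TABLE IF NOT EXISTS occupancy "
                 "(elevator_id TEXT, ts REAL, people INTEGER)")
    conn.execute("INSERT INTO occupancy VALUES (?, ?, ?)",
                 (elevator_id, time.time(), count))
    conn.commit()
    report({"elevator_id": elevator_id, "people": count})
    return count


conn = sqlite3.connect(":memory:")
sent = []  # stands in for the upload to the platform
n = record_count(conn, "E1", [(0, 0, 10, 10), (20, 20, 30, 30)], sent.append)
```

Passing the reporter as a callable keeps the counting and storage logic independent of however the platform upload is actually implemented.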
In the vertical ladder people counting method based on deep learning provided by the embodiment of the invention, 39 thousand elevator pedestrian samples from different scenes are annotated and used to train the YOLOv3 network model. The model trained on this data set generalizes well in application: it adapts to pedestrian recognition under various illumination intensities, different elevator types, and different elevator scenes, and has high accuracy.
Corresponding to the method of the embodiment of the invention, an embodiment of the invention also provides a vertical ladder people counting system based on deep learning, whose functional block diagram is shown in FIG. 3 and which comprises the following modules:
an analysis request module, which initiates an analysis request, triggers an image, and calls an elevator pedestrian detection algorithm to perform target detection when the elevator monitoring camera determines through a camera presence/absence algorithm that a person is present, a door open/close algorithm determines that the elevator door is closed, and the car is in a running state;
an elevator people counting module, which, after receiving the analysis request, takes an image from the main video stream according to the time node of the triggered image and calls the YOLOv3 algorithm for analysis to obtain the final detection boxes;
wherein the YOLOv3 analysis that yields the final detection boxes, including the detection network structure and its convolution, Batch Normalization, leaky activation, and residual units, is identical to that described above for step S2 of the method;
and a write-and-upload module, which obtains the number of people in the elevator from the detection boxes, writes the number into the database, and reports it to the aerial ladder platform.
In the vertical ladder people counting system based on deep learning provided by the embodiment of the invention, 39 thousand elevator pedestrian samples from different scenes are annotated and used to train the YOLOv3 network model. The model trained on this data set generalizes well in application: it adapts to pedestrian recognition under various illumination intensities, different elevator types, and different elevator scenes, and has high accuracy.
It is to be understood that the exemplary embodiments described herein are illustrative and not restrictive. Although one or more embodiments of the present invention have been described with reference to the accompanying drawings, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.

Claims (2)

1. A vertical ladder people counting method based on deep learning, characterized by comprising the following steps:
when the elevator monitoring camera determines through a camera presence/absence algorithm that a person is present, a door open/close algorithm determines that the elevator door is closed, and the car is in a running state, an analysis request is initiated, an image is triggered, and an elevator pedestrian detection algorithm is called to perform target detection;
after the analysis request is received, an image is taken from the main video stream according to the time node of the triggered image, and the YOLOv3 algorithm is called for analysis to obtain the final detection boxes;
the number of people in the elevator is obtained from the detection boxes, written into a database, and reported to the aerial ladder platform,
wherein calling the YOLOv3 algorithm for analysis to obtain the final detection boxes specifically comprises:
first, normalizing the elevator pedestrian image, resizing it to 416 × 416, and then dividing the image into a grid, wherein if the center of a detected target falls within a grid cell, that cell is responsible for predicting the target; YOLOv3 uses K-means clustering to obtain 9 prior boxes (anchors) and adopts feature map prediction at three scales, and when the input is 416 × 416, the feature maps are 13 × 13, 26 × 26, and 52 × 52; each YOLO layer uses three prior boxes, the nine prior boxes are divided among the feature maps of the 3 scales according to their sizes, and the larger-scale feature maps use the smaller prior boxes; each grid cell can predict multiple bounding boxes, and during training, if a bounding box predicted by a cell has the largest overlap with the annotated ground truth, the target is judged to lie in that cell, and that cell alone is responsible for predicting the target; in some cases one target is detected multiple times, producing duplicate target boxes, so non-maximum suppression is applied to the highly overlapping bounding boxes, removing every prediction box except the one with the highest confidence and ensuring that only one detection box is output for the same target; the specific coordinates and class of every prediction box are obtained by regression; a target confidence threshold is set, where the confidence is the probability of the target class multiplied by the overlap between the predicted target and the annotated ground truth, and the prediction boxes above the threshold are the final detection boxes,
wherein feature-map prediction at three scales is adopted, and when the input is 416 × 416 the YOLOv3 detection network obtains feature maps of 13 × 13, 26 × 26 and 52 × 52, specifically: after the first group of DBLs, the YOLOv3 detection network uses 23 residual units followed by 6 groups of DBLs and a convolution layer to obtain the 13 × 13 feature output, which is used for detecting large targets; after the 23 residual units and 5 groups of DBLs, a branch applies a DBL and upsampling to obtain 26 × 26 features, which are tensor-spliced with the 26 × 26 features obtained from 19 residual units, concat being used to splice tensors of the same scale, and the spliced tensor passes through 5 groups of DBLs and a convolution layer to obtain the 26 × 26 feature output; after those 5 groups of DBLs, a further branch applies a DBL and upsampling to obtain 52 × 52 features, which are tensor-spliced with the 52 × 52 features obtained from 11 residual units, and the spliced tensor passes through 6 groups of DBLs and a convolution layer to obtain the 52 × 52 feature output, which is used for detecting small targets; finally, the three feature outputs are used jointly to detect the vertical-elevator passengers.
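The post-processing recited in claim 1 (keeping prediction boxes above a confidence threshold, suppressing highly overlapping duplicates, then counting the surviving boxes) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `iou` and `count_people`, the corner-format boxes, and the threshold values 0.5 and 0.45 are assumptions chosen for illustration, and the sketch filters by confidence before suppressing overlaps, which is only one common ordering.

```python
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) in pixels; this layout is an assumption of the sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_people(boxes: Sequence[Box], scores: Sequence[float],
                 conf_thresh: float = 0.5, iou_thresh: float = 0.45) -> List[Box]:
    """Return the final detection boxes; their number is the people count.

    conf_thresh and iou_thresh are illustrative values, not taken from the patent.
    """
    # Keep only predictions whose confidence is above the threshold,
    # ordered from most to least confident.
    candidates = sorted(
        ((s, b) for b, s in zip(boxes, scores) if s > conf_thresh),
        key=lambda item: item[0],
        reverse=True,
    )
    kept: List[Box] = []
    for _, box in candidates:
        # Non-maximum suppression: drop a box that overlaps heavily with an
        # already kept, more confident box, so each person yields one box.
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept


# Toy input: two near-duplicate boxes on one person, one separate person,
# and one low-confidence false positive.
toy_boxes = [(10, 10, 110, 210), (12, 14, 112, 208), (200, 20, 300, 220), (0, 0, 5, 5)]
toy_scores = [0.9, 0.8, 0.75, 0.2]
final_boxes = count_people(toy_boxes, toy_scores)
print(len(final_boxes))  # 2 for this toy input
```

In a deployment the length of the returned list would be the value written to the database and reported upstream; those steps are omitted here because the source does not describe their interfaces.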
2. A vertical elevator (vertical ladder) people counting system based on deep learning, characterized by comprising:
an analysis request module, configured to initiate an analysis request, trigger an image, and call the vertical-elevator pedestrian detection algorithm for target detection when the vertical-elevator surveillance camera determines via a camera person-presence algorithm that a person is present, a door open/close algorithm determines that the elevator door is closed, and the elevator car is in a running state;
a vertical elevator people counting module, configured to, after receiving the analysis request, take an image from the main video stream according to the time node at which the image was triggered, and call the YOLOv3 algorithm to analyze it and obtain the final detection boxes;
a write and upload module, configured to obtain the specific number of people in the vertical elevator from the detection boxes, write the number into the database, and simultaneously report it to the aerial ladder platform,
wherein calling the YOLOv3 algorithm for analysis to obtain the final detection boxes specifically comprises:
first, normalizing the image of the vertical-elevator passengers and resizing it to 416 × 416, then dividing the image into a grid, wherein if the center of a target to be detected falls into a grid cell, that cell is responsible for predicting the target; YOLOv3 uses K-means clustering to obtain 9 prior boxes and adopts feature-map prediction at three scales, the feature maps being 13 × 13, 26 × 26 and 52 × 52 when the input is 416 × 416; each YOLO layer uses three prior boxes, the 9 prior boxes being assigned to the feature maps of the 3 scales according to their sizes, with the larger-scale feature map using the smaller prior boxes; each grid cell can predict a plurality of bounding boxes, and during training, if a bounding box predicted by a grid cell has the largest overlap with the annotated ground truth, the target is judged to be in that cell, and that cell is solely responsible for predicting the target; in some cases one target is detected multiple times, producing duplicate target boxes, so non-maximum suppression is applied to the highly overlapping bounding boxes, removing all prediction boxes except the one with the highest confidence, to ensure that only one detection box is output for each target; the specific coordinate information and classification category of every prediction box are obtained by regression; a target confidence threshold is set, wherein the confidence is the probability of the target class multiplied by the overlap between the predicted target and the annotated ground truth, and the prediction boxes above the threshold are the final detection boxes,
wherein feature-map prediction at three scales is adopted, and when the input is 416 × 416 the YOLOv3 detection network obtains feature maps of 13 × 13, 26 × 26 and 52 × 52, specifically: after the first group of DBLs, the YOLOv3 detection network uses 23 residual units followed by 6 groups of DBLs and a convolution layer to obtain the 13 × 13 feature output, which is used for detecting large targets; after the 23 residual units and 5 groups of DBLs, a branch applies a DBL and upsampling to obtain 26 × 26 features, which are tensor-spliced with the 26 × 26 features obtained from 19 residual units, concat being used to splice tensors of the same scale, and the spliced tensor passes through 5 groups of DBLs and a convolution layer to obtain the 26 × 26 feature output; after those 5 groups of DBLs, a further branch applies a DBL and upsampling to obtain 52 × 52 features, which are tensor-spliced with the 52 × 52 features obtained from 11 residual units, and the spliced tensor passes through 6 groups of DBLs and a convolution layer to obtain the 52 × 52 feature output, which is used for detecting small targets; finally, the three feature outputs are used jointly to detect the vertical-elevator passengers.
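The three-scale layout recited in the claims (a 416 × 416 input, feature maps of 13 × 13, 26 × 26 and 52 × 52, nine prior boxes split three per scale, and the grid cell containing a target's center being responsible for it) can be sketched in a few lines of Python. Several things here are assumptions rather than content of the patent: the strides 32, 16 and 8 are inferred from 416/13, 416/26 and 416/52; the nine anchor sizes are the publicly distributed YOLOv3 defaults and stand in only as placeholders for the K-means result, which the source does not list; and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

INPUT_SIZE = 416
STRIDES = (32, 16, 8)  # assumed downsampling factors: 416/13, 416/26, 416/52

# Placeholder prior boxes (width, height) in pixels. These are the default
# YOLOv3 anchors, NOT the K-means result of the patent, which is not given.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def feature_map_sizes(input_size: int = INPUT_SIZE,
                      strides: Sequence[int] = STRIDES) -> List[int]:
    """Side length of each square feature map for a square input."""
    return [input_size // s for s in strides]


def responsible_cell(cx: float, cy: float, grid: int,
                     input_size: int = INPUT_SIZE) -> Tuple[int, int]:
    """(row, col) of the grid cell containing the target center (cx, cy)."""
    cell = input_size / grid
    col = min(int(cx // cell), grid - 1)  # clamp centers on the far edge
    row = min(int(cy // cell), grid - 1)
    return row, col


def split_anchors(anchors: Sequence[Tuple[int, int]]) -> Dict[int, List[Tuple[int, int]]]:
    """Assign nine prior boxes to the three scales by area.

    The largest feature map (52 x 52) gets the three smallest boxes and the
    smallest feature map (13 x 13) gets the three largest.
    """
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    return {52: ordered[0:3], 26: ordered[3:6], 13: ordered[6:9]}


print(feature_map_sizes())             # [13, 26, 52]
print(responsible_cell(100, 300, 13))  # (9, 3)
print(split_anchors(ANCHORS)[13])      # the three largest prior boxes
```

The sketch covers only the bookkeeping around the network; the DBL groups, residual units and concat operations themselves would require a deep learning framework and are not reproduced here.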
CN201910023341.3A 2019-01-10 2019-01-10 Vertical ladder people counting method and system based on deep learning Active CN109858389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910023341.3A CN109858389B (en) 2019-01-10 2019-01-10 Vertical ladder people counting method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN109858389A (en) 2019-06-07
CN109858389B (en) 2021-06-04

Family

ID=66894403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910023341.3A Active CN109858389B (en) 2019-01-10 2019-01-10 Vertical ladder people counting method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN109858389B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222673B (en) * 2019-06-21 2021-04-06 杭州宇泛智能科技有限公司 Passenger flow statistical method based on head detection
CN110414391B (en) * 2019-07-15 2021-05-04 河北工业大学 Active movable vehicle bottom dangerous goods detection device based on deep learning algorithm
CN110738148A (en) * 2019-09-29 2020-01-31 浙江新再灵科技股份有限公司 Cloud target detection algorithm based on heterogeneous platform
CN110765904A (en) * 2019-10-11 2020-02-07 浙江新再灵科技股份有限公司 Device and method for predicting crowd consumption index based on elevator scene
CN111353377A (en) * 2019-12-24 2020-06-30 浙江工业大学 Elevator passenger number detection method based on deep learning
CN111476600B (en) * 2020-03-23 2023-09-19 浙江新再灵科技股份有限公司 Statistical analysis method for audience numbers of direct ladder advertisement
CN111986253B (en) * 2020-08-21 2023-09-15 日立楼宇技术(广州)有限公司 Method, device, equipment and storage medium for detecting elevator crowding degree
CN112926500B (en) * 2021-03-22 2022-09-20 重庆邮电大学 Pedestrian detection method combining head and overall information
CN113012335A (en) * 2021-03-22 2021-06-22 上海工程技术大学 Subway platform guide queuing system based on YOLOv3 face detection
CN114495003A (en) * 2022-01-24 2022-05-13 上海申视信科技有限公司 People number identification and statistics method and system based on improved YOLOv3 network
CN114333120A (en) * 2022-03-14 2022-04-12 南京理工大学 Bus passenger flow detection method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108069307A (en) * 2016-11-14 2018-05-25 杭州海康威视数字技术股份有限公司 Method and device for counting the number of people in an elevator
CN108792853B (en) * 2018-06-26 2020-08-04 潍坊学院 Elevator dispatching system and method

Also Published As

Publication number Publication date
CN109858389A (en) 2019-06-07

Similar Documents

Publication Publication Date Title
CN109858389B (en) Vertical ladder people counting method and system based on deep learning
Li et al. Road network extraction via deep learning and line integral convolution
CN103390164B (en) Method for checking object based on depth image and its realize device
CN107016357A (en) A kind of video pedestrian detection method based on time-domain convolutional neural networks
CN110188807A (en) Tunnel pedestrian target detection method based on cascade super-resolution network and improvement Faster R-CNN
CN104063719A (en) Method and device for pedestrian detection based on depth convolutional network
CN107944450A (en) A kind of licence plate recognition method and device
CN104134364B (en) Real-time traffic sign identification method and system with self-learning capacity
CN114283469B (en) Improved YOLOv4-tiny target detection method and system
CN110197152A (en) A kind of road target recognition methods for automated driving system
Jain et al. Performance analysis of object detection and tracking algorithms for traffic surveillance applications using neural networks
CN116824335A (en) YOLOv5 improved algorithm-based fire disaster early warning method and system
CN109190488A (en) Front truck car door opening detection method and device based on deep learning YOLOv3 algorithm
CN114842208A (en) Power grid harmful bird species target detection method based on deep learning
CN106682600A (en) Method and terminal for detecting targets
CN111259736B (en) Real-time pedestrian detection method based on deep learning in complex environment
CN114202803A (en) Multi-stage human body abnormal action detection method based on residual error network
CN113221804A (en) Disordered material detection method and device based on monitoring video and application
CN113378668A (en) Method, device and equipment for determining accumulated water category and storage medium
CN117612249A (en) Underground miner dangerous behavior identification method and device based on improved OpenPose algorithm
Chaganti et al. Predicting Landslides and Floods with Deep Learning
CN110956156A (en) Deep learning-based red light running detection system
CN110163081A (en) SSD-based real-time regional intrusion detection method, system and storage medium
CN114943873A (en) Method and device for classifying abnormal behaviors of construction site personnel
CN115563652A (en) Track embedding leakage prevention method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant