CN109858389B

CN109858389B - Vertical ladder people counting method and system based on deep learning

Info

Publication number: CN109858389B
Application number: CN201910023341.3A
Authority: CN
Inventors: 陈国特; 王超; 施行; 王伟; 蔡巍伟; 吴磊磊
Original assignee: Zhejiang Xinzailing Technology Co ltd
Current assignee: Zhejiang Xinzailing Technology Co ltd
Priority date: 2019-01-10
Filing date: 2019-01-10
Publication date: 2021-06-04
Anticipated expiration: 2039-01-10
Also published as: CN109858389A

Abstract

The invention discloses a deep-learning-based method and system for counting the number of people in an elevator. The method comprises the following steps: the elevator surveillance camera determines whether anyone is present using a camera person-presence algorithm, and a door opening/closing algorithm determines that the elevator door is closed; when the car is running, an analysis request is initiated, an image is triggered, and the elevator pedestrian detection algorithm is called to perform target detection. After the analysis request is received, an image is taken from the main video stream at the time node at which the image was triggered, and the YOLOv3 algorithm is called to analyze it and obtain the final detection boxes. The number of people in the elevator is obtained from the detection boxes, written into a database, and reported to the elevator platform.

Description

Vertical ladder people counting method and system based on deep learning

Technical Field

The invention belongs to the technical field of computer vision, and in particular relates to a deep-learning-based method and system for counting the number of people in a vertical (passenger) elevator.

Background

During elevator operation, passengers may become trapped because of equipment faults or human factors. The detection method can determine how many people are currently trapped and notify the property management or the elevator platform, so that rescuers can respond in time. An elevator is fitted with a load (weight) sensor and has a rated load range; exceeding that range can easily cause the equipment to fail, so for safety reasons the elevator load can be kept within reasonable limits by detecting the number of passengers. Counting the daily passenger traffic of a single elevator has further uses: in residential communities, the number of people can be counted for big-data analysis, which provides a degree of early warning for safety management; in high-traffic settings such as schools and hospitals, analyzing traffic data allows elevator operation to be optimized and scheduled, improving operating efficiency; and in shopping malls and shopping centers, traffic data helps place advertisements sensibly, increasing the returns on operating assets.

A related prior-art Chinese patent, application number CN201710157587.0, discloses a people counting method and device and an elevator dispatching method and system. It labels connected domains of human-head target pixel blocks in a two-dimensional projection image, takes a circle of preset radius centered on each target's maximum value as the head target area, and obtains the number of heads by comparing the area of each connected region with a preset area threshold. This image processing belongs to traditional pattern recognition and relies on hand-designed features, such as the preset-radius circle that covers each candidate region. Because environments differ between elevator scenes, and the camera position may be shifted by human factors so that the preset values deviate, the method does not generalize.

Disclosure of Invention

The object of the invention is to provide a deep-learning-based method and system for counting the number of people in an elevator.

In order to solve the technical problems, the invention adopts the following technical scheme:

An embodiment of the invention provides a deep-learning-based method for counting the number of people in an elevator, comprising the following steps:

the elevator surveillance camera determines whether anyone is present using a camera person-presence algorithm, and a door opening/closing algorithm determines that the elevator door is closed; when the car is running, an analysis request is initiated, an image is triggered, and the elevator pedestrian detection algorithm is called to perform target detection;

after the analysis request is received, an image is taken from the main video stream according to the time node at which the image was triggered, and the YOLOv3 algorithm is called to analyze it and obtain the final detection boxes;

and the number of people in the elevator is obtained from the detection boxes, written into a database, and reported to the elevator platform.
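The gating in the first step can be sketched as a simple predicate. This is a minimal illustration only: the class `ElevatorState` and the functions `should_request_analysis` and `trigger_time` are hypothetical names, and the three boolean inputs stand in for the outputs of the person-presence, door opening/closing and car-running checks, whose internals the text does not describe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ElevatorState:
    """Hypothetical snapshot of the three signals that gate an analysis request."""
    person_present: bool  # output of the camera person-presence algorithm
    door_closed: bool     # output of the door opening/closing algorithm
    car_running: bool     # the car is moving between floors
    timestamp: float      # time (seconds) at which the snapshot was taken


def should_request_analysis(state: ElevatorState) -> bool:
    """All three conditions must hold before an image is triggered."""
    return state.person_present and state.door_closed and state.car_running


def trigger_time(states: List[ElevatorState]) -> Optional[float]:
    """Return the timestamp of the first snapshot that passes the gate, i.e.
    the time node later used to pick a frame from the main video stream."""
    for s in states:
        if should_request_analysis(s):
            return s.timestamp
    return None
```

Scanning the snapshots in time order and returning the first one that passes yields the trigger time node; a real deployment would probably also suppress repeated requests within a single trip, which this sketch does not attempt.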

Preferably, calling the YOLOv3 algorithm to perform the analysis and obtain the final detection boxes specifically comprises:

first, the elevator pedestrian image is normalized and resized to 416 × 416, and the image is then divided into a grid; if the center of a target falls into a grid cell, that cell is responsible for predicting the target. YOLOv3 uses K-means clustering to obtain 9 prior (anchor) boxes and predicts on feature maps at three scales: with a 416 × 416 input, the feature maps are 13 × 13, 26 × 26 and 52 × 52. Each YOLO layer uses three prior boxes; the nine prior boxes are divided by size among the three scales, with the larger-scale feature maps using the smaller prior boxes. Each grid cell predicts several bounding boxes; during training, if the overlap between a bounding box predicted by a grid cell and the labeled ground truth is the largest, the target is assigned to that cell, which is then solely responsible for predicting it. Because a single target may be detected several times, producing duplicate boxes, non-maximum suppression is applied to highly overlapping bounding boxes: all of them except the one with the highest confidence are removed, so that each target yields only one detection box. Regression over the prediction boxes gives their coordinates and classification categories. Finally, a target confidence threshold is set, where the confidence is the target-class probability multiplied by the overlap (IoU) between the predicted box and the labeled ground truth; the predicted boxes above this threshold are the final detection boxes.
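The final filtering step (confidence threshold plus non-maximum suppression) might look like the following plain-Python sketch. The box format (x1, y1, x2, y2), the function names `iou` and `final_boxes`, and the default thresholds 0.5 and 0.45 are assumptions for illustration, not values given in the text.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def final_boxes(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Drop boxes at or below conf_thresh, then run greedy NMS.

    Returns the indices of the kept boxes, highest confidence first."""
    candidates = sorted((i for i, s in enumerate(scores) if s > conf_thresh),
                        key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

Because greedy NMS only lets a higher-scoring box suppress a lower-scoring one, thresholding before or after NMS leaves the same surviving boxes; thresholding first simply means fewer IoU comparisons.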

Preferably, feature maps at three scales are used for prediction: with a 416 × 416 input, the YOLOv3 detection network produces 13 × 13, 26 × 26 and 52 × 52 feature maps, specifically: after the first DBL group, the network uses 23 residual units followed by 6 DBL groups and a convolution layer to obtain the 13 × 13 feature output used to detect large targets; after the 23 residual units and 5 DBL groups, a branch applies a DBL and upsampling to obtain 26 × 26 features, which are concatenated with the 26 × 26 features obtained after 19 residual units (concat joins tensors of the same scale), and the concatenated tensor passes through 5 DBL groups and a convolution layer to obtain the 26 × 26 feature output; a branch after these 5 DBL groups applies a further DBL and upsampling to obtain 52 × 52 features, which are concatenated with the 52 × 52 features obtained after 11 residual units, and the concatenated tensor passes through 6 DBL groups and a convolution layer to obtain the 52 × 52 feature output used to detect small targets; finally, the three feature outputs are used jointly to detect elevator pedestrians.
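The residual-unit counts 11, 19 and 23 in this paragraph line up with the standard Darknet-53 backbone of the original YOLOv3, whose five stages contain 1, 2, 8, 8 and 4 residual units and each begin with a stride-2 convolution. That stage layout comes from the YOLOv3 paper rather than from this text, so the following arithmetic check is an illustration under that assumption.

```python
INPUT_SIZE = 416
# Assumed Darknet-53 layout: residual units per stage; each stage starts with a
# stride-2 convolution that halves the spatial size of the feature map.
STAGES = [1, 2, 8, 8, 4]
ANCHORS_PER_SCALE = 3


def stage_outputs(input_size=INPUT_SIZE, stages=STAGES):
    """Return (cumulative residual units, feature-map side) after each stage."""
    side, total, out = input_size, 0, []
    for n_units in stages:
        side //= 2        # stride-2 downsampling at the start of the stage
        total += n_units  # residual units accumulated so far
        out.append((total, side))
    return out


layout = stage_outputs()
# Feature-map side -> number of residual units after which it is tapped.
taps = {side: total for total, side in layout if side in (13, 26, 52)}
# Boxes predicted per image: three anchors per cell on each of the three grids.
total_predictions = sum(ANCHORS_PER_SCALE * s * s for s in taps)

print(layout)             # [(1, 208), (3, 104), (11, 52), (19, 26), (23, 13)]
print(taps)               # {52: 11, 26: 19, 13: 23}
print(total_predictions)  # 10647
```

Under this layout the 13 × 13 grid (stride 32) sits after all 23 units, the 26 × 26 grid (stride 16) taps the output after 19 units and the 52 × 52 grid (stride 8) taps the output after 11 units, matching the routes described in the text.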

Another aspect of the embodiments of the invention provides a deep-learning-based system for counting the number of people in an elevator, comprising:

an analysis request module, used for: the elevator surveillance camera determining whether anyone is present using a camera person-presence algorithm, a door opening/closing algorithm determining that the elevator door is closed, and, when the car is running, initiating an analysis request, triggering an image and calling the elevator pedestrian detection algorithm for target detection;

an elevator people counting module, used for taking an image from the main video stream according to the time node of the triggered image after the analysis request is received, and calling the YOLOv3 algorithm to analyze it and obtain the final detection boxes;

and a write-and-upload module, used for obtaining the number of people in the elevator from the detection boxes, writing the number into the database and reporting it to the elevator platform.
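The counting module takes an image from the main video stream at the time node of the trigger. One hedged way to do that, assuming decoded main-stream frames are buffered together with timestamps sorted in ascending order, is to pick the frame nearest the trigger time. The function name `pick_frame` and the nearest-frame policy are assumptions; the text does not say how the frame is chosen.

```python
import bisect


def pick_frame(frame_times, trigger_t):
    """Return the index of the buffered frame whose timestamp is closest to
    trigger_t. frame_times must be sorted in ascending order."""
    if not frame_times:
        raise ValueError("frame buffer is empty")
    pos = bisect.bisect_left(frame_times, trigger_t)
    if pos == 0:
        return 0                      # trigger precedes every buffered frame
    if pos == len(frame_times):
        return len(frame_times) - 1   # trigger follows every buffered frame
    before, after = frame_times[pos - 1], frame_times[pos]
    # Ties go to the earlier frame.
    return pos if after - trigger_t < trigger_t - before else pos - 1
```

A system that must use a frame captured at or after the trigger would instead return index `pos` directly.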

The invention has the following beneficial effects: the method is applied on the elevator platform; by labeling a large amount of elevator pedestrian data and training on this dataset with an improved YOLOv3 network, the detectability of elevator pedestrian targets is improved; and because the model is trained on a large sample set, it generalizes well and is suitable for many scenes such as residential communities, schools, hospitals, hotels and shopping malls.

Drawings

FIG. 1 is a flowchart of the steps of a deep-learning-based method for counting the number of people in an elevator according to an embodiment of the invention;

FIG. 2 is a schematic diagram of the architecture of the YOLOv3 detection network according to an embodiment of the invention;

FIG. 3 is a schematic block diagram of a deep-learning-based system for counting the number of people in an elevator according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIG. 1, which shows a flowchart of the steps of a deep-learning-based method for counting the number of people in an elevator according to an embodiment of the invention, the method comprises the following steps:

S1: the elevator surveillance camera determines whether anyone is present using a camera person-presence algorithm, and a door opening/closing algorithm determines that the elevator door is closed; when the car is running, an analysis request is initiated, an image is triggered, and the elevator pedestrian detection algorithm is called for target detection;

S2: after the analysis request is received, an image is taken from the main video stream according to the time node at which the image was triggered, and the YOLOv3 algorithm is called to analyze it and obtain the final detection boxes;

calling the YOLOv3 algorithm to perform the analysis and obtain the final detection boxes specifically comprises the following:

The method detects the head region of elevator passengers for counting. Because passengers are severely occluded when traffic is heavy and the car is crowded, detecting the head and shoulders or the whole body is not considered. FIG. 2 shows a schematic diagram of the YOLOv3 detection network architecture; the network model size is 246 MB. The input layer is the image to be detected, and a DBL is a convolution layer + Batch Normalization layer + Leaky activation layer; fusing these three layers into one accelerates computation. The convolution layers extract features. Convolution is a mathematical operation on two functions of a real variable; in general, the input image data are convolved with a kernel function, and the output is called a feature map. In deep learning, image feature information is extracted by convolution: if a two-dimensional image I is the input and K is a two-dimensional kernel, the convolution is:

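$$S(i, j) = (I * K)(i, j) = \sum_{m=0}^{M_r - 1} \sum_{n=0}^{M_c - 1} I(m, n)\, K(i - m, j - n)$$

where M_r and M_c are the numbers of rows and columns of I, K_r and K_c are the numbers of rows and columns of K (K is taken as zero outside this support), m and n are the summation indices, and i and j satisfy 0 ≤ i ≤ M_r + K_r − 1 and 0 ≤ j ≤ M_c + K_c − 1.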
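Assuming the convolution here is the standard full two-dimensional discrete convolution S(i, j) = Σ_m Σ_n I(m, n) K(i − m, j − n), with K taken as zero outside its K_r × K_c support, a direct NumPy sketch follows; the function name `conv2d_full` is illustrative. Its output array has M_r + K_r − 1 rows and M_c + K_c − 1 columns (indices up to M_r + K_r − 2 and M_c + K_c − 2); at the inclusive upper bounds M_r + K_r − 1 and M_c + K_c − 1 the sum is empty, so S is zero there.

```python
import numpy as np


def conv2d_full(I, K):
    """Full 2-D discrete convolution S(i, j) = sum_m sum_n I(m, n) K(i-m, j-n)."""
    Mr, Mc = I.shape
    Kr, Kc = K.shape
    S = np.zeros((Mr + Kr - 1, Mc + Kc - 1), dtype=np.result_type(I, K))
    for m in range(Mr):
        for n in range(Mc):
            # Input pixel (m, n) adds a copy of K, scaled by I(m, n), at offset (m, n).
            S[m:m + Kr, n:n + Kc] += I[m, n] * K
    return S


I = np.array([[1, 2], [3, 4]])
K = np.array([[1, 1]])
print(conv2d_full(I, K).tolist())  # [[1, 3, 2], [3, 7, 4]]
```

Convolution layers in deep-learning frameworks usually compute cross-correlation (the kernel is not flipped) over a padded, strided window rather than this full convolution; the difference is a kernel flip plus the choice of output region and stride, and learned kernels absorb the flip.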

To mitigate vanishing gradients during deep network training, the Batch Normalization layer standardizes the activation values of the hidden-layer neurons:

$$\hat{x}^{(k)} = \frac{x^{(k)} - E[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}$$

where x^{(k)} is the output of the k-th neuron after the excitation transformation, k indexes the neurons, E[x^{(k)}] is its mean and Var[x^{(k)}] is its variance over the mini-batch.
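Assuming the Batch Normalization step is the usual per-neuron standardization x̂^(k) = (x^(k) − E[x^(k)]) / sqrt(Var[x^(k)]), the sketch below computes it over a mini-batch and then shows one common way to realize the "three-layer fusion" mentioned earlier: once the statistics are fixed, the standardization can be folded into the weights and bias of the preceding linear (or convolution) layer. The small constant `eps`, the function names and the use of a dense layer in place of a convolution are assumptions; the text does not describe how its fusion is implemented, and trained BN layers normally also carry a learned scale and shift, omitted here as in the text's formula.

```python
import numpy as np


def batch_standardize(x, eps=1e-5):
    """Standardize each column k of a mini-batch x (rows are samples)."""
    mean = x.mean(axis=0)  # E[x^(k)]
    var = x.var(axis=0)    # Var[x^(k)]
    return (x - mean) / np.sqrt(var + eps), mean, var


def fold_into_linear(W, b, mean, var, eps=1e-5):
    """Fold a fixed standardization into y = x @ W + b, per output column."""
    scale = 1.0 / np.sqrt(var + eps)
    return W * scale, (b - mean) * scale


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
W, b = rng.normal(size=(3, 4)), rng.normal(size=4)
y_hat, mean, var = batch_standardize(x @ W + b)
W_fused, b_fused = fold_into_linear(W, b, mean, var)
# One fused layer reproduces "linear layer followed by standardization".
print(np.allclose(x @ W_fused + b_fused, y_hat))  # True
```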

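The Leaky activation layer applies a nonlinear mapping to the neuron inputs:

$$y_i = \begin{cases} x_i, & x_i \ge 0 \\ \dfrac{x_i}{a_i}, & x_i < 0 \end{cases}$$

where x_i is the input to the neuron, y_i is the mapped output and a_i is a fixed constant (a_i > 1).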
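Assuming the Leaky layer uses the form y_i = x_i for x_i ≥ 0 and y_i = x_i / a_i for x_i < 0, an element-wise NumPy version is below. The default a = 10 (a negative-side slope of 0.1, the value used by Darknet's leaky activation) is an assumption; the text only says a_i is a constant.

```python
import numpy as np


def leaky(x, a=10.0):
    """Leaky activation: x where x >= 0, x / a where x < 0 (a > 1 is fixed)."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, x / a)


print(leaky([-2.0, 0.0, 3.0]).tolist())  # [-0.2, 0.0, 3.0]
```

Keeping a small negative slope lets units with negative inputs still pass a gradient, unlike a plain ReLU.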

In addition, residual units are introduced to deepen the network. Res_n denotes a residual unit; x is the value output to the neuron by the previous layer, ω is the weight with which x is passed to the neuron, F(x, ω) is the residual mapping, and y is the output of the neuron as determined by the activation function:

$$y = F(x, \omega) + x$$
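A residual unit adds its input back to the output of a learned branch, y = F(x, ω) + x. The sketch below uses a toy branch of two dense layers with a leaky activation, loosely standing in for the 1 × 1 and 3 × 3 DBL pair inside a Darknet-53 residual unit (that internal structure is taken from the YOLOv3 backbone, not from this text); the names `make_branch` and `residual_unit` are illustrative. With all branch weights at zero the unit reduces to the identity, which is one intuition for why stacking many residual units stays trainable.

```python
import numpy as np


def make_branch(W1, W2, slope=0.1):
    """Toy F(x, w): two linear maps, each followed by a leaky activation."""
    def act(z):
        return np.where(z >= 0, z, slope * z)

    def F(x):
        return act(act(x @ W1) @ W2)

    return F


def residual_unit(x, F):
    """y = F(x, w) + x: the branch only has to learn a correction to x."""
    return F(x) + x


x = np.array([[1.0, -2.0, 0.5, 3.0]])
channels = x.shape[1]
# Zero weights: the branch outputs 0, so each unit passes x through unchanged.
F_zero = make_branch(np.zeros((channels, channels // 2)),
                     np.zeros((channels // 2, channels)))
y = x
for _ in range(23):  # stack 23 units, the count used in the backbone
    y = residual_unit(y, F_zero)
print(np.array_equal(y, x))  # True
```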

The YOLOv3 network uses 23 residual units after the first DBL group, followed by 6 DBL groups and a convolution layer, to obtain the 13 × 13 feature output used to detect large targets. After the 23 residual units and 5 DBL groups, a branch applies a DBL and upsampling to obtain 26 × 26 features, which are concatenated with the 26 × 26 features obtained after 19 residual units (concat joins tensors of the same scale); the concatenated tensor passes through 5 DBL groups and a convolution layer to obtain the 26 × 26 feature output. A branch after these 5 DBL groups applies a further DBL and upsampling to obtain 52 × 52 features, which are concatenated with the 52 × 52 features obtained after 11 residual units; the concatenated tensor passes through 6 DBL groups and a convolution layer to obtain the 52 × 52 feature output used to detect small targets. Finally, the three feature outputs are used jointly to detect elevator pedestrians. Together, the three scales of 13 × 13, 26 × 26 and 52 × 52 cover targets of different sizes. Because the YOLOv3 network performs poorly on some small target regions, its detection performance is improved by re-clustering the anchors on the elevator pedestrian dataset to obtain smaller anchors.
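The re-clustering of anchors on the elevator pedestrian dataset can be sketched as K-means over labeled box widths and heights, using 1 − IoU as the distance so that large and small boxes are treated comparably (the distance introduced with YOLOv2). The mean as cluster center, the random initialization and the function names are assumptions; the text does not give its clustering details. Sorting nine resulting anchors by area would give the three smallest to the 52 × 52 grid, the middle three to 26 × 26 and the largest three to 13 × 13.

```python
import numpy as np


def wh_iou(boxes, anchors):
    """IoU of (w, h) pairs, treating every box and anchor as sharing one center."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_anchors = anchors[:, 0] * anchors[:, 1]
    return inter / (area_boxes[:, None] + area_anchors[None, :] - inter)


def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """K-means on (w, h) with distance 1 - IoU; returns anchors sorted by area."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # Smallest 1 - IoU distance is the same as largest IoU.
        assign = np.argmax(wh_iou(boxes, anchors), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else anchors[j] for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]


# Toy (w, h) labels in pixels: three small heads and three large ones.
demo = [[10, 12], [11, 12], [9, 11], [100, 120], [98, 118], [102, 121]]
print(kmeans_anchors(demo, k=2).round(2))
```

On real labels one would call `kmeans_anchors(all_wh, k=9)` and split the sorted result into three groups of three.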

S3: the number of people in the elevator is obtained from the detection boxes, written into the database, and reported to the elevator platform.
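Step S3 can be sketched with Python's built-in `sqlite3` module: because only the head region is detected, the count is taken as the number of final detection boxes. The table name `elevator_counts`, its columns and the function `record_count` are hypothetical, and reporting to the elevator platform is left out because its interface is not described.

```python
import sqlite3


def record_count(conn, elevator_id, ts, head_boxes):
    """Store one count row: each final head box is taken to be one person."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS elevator_counts "
        "(elevator_id TEXT, ts REAL, people INTEGER)")
    people = len(head_boxes)
    conn.execute("INSERT INTO elevator_counts VALUES (?, ?, ?)",
                 (elevator_id, ts, people))
    conn.commit()
    # Reporting to the elevator platform would follow here (interface unknown).
    return people


conn = sqlite3.connect(":memory:")
record_count(conn, "car-1", 2.0, [(0, 0, 10, 10), (20, 20, 30, 30)])
print(conn.execute("SELECT * FROM elevator_counts").fetchall())  # [('car-1', 2.0, 2)]
```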

In this deep-learning-based elevator people counting method, 39 thousand elevator pedestrian samples from different scenes were labeled and used to train a YOLOv3 network model. The model trained on this dataset generalizes well in application: it adapts to pedestrian recognition under various illumination intensities, elevator types and elevator scenes, with high accuracy.

Corresponding to the method of the embodiment, an embodiment of the invention further provides a deep-learning-based system for counting the number of people in an elevator, whose functional block diagram is shown in FIG. 3; the system comprises:


a write-and-upload module, used for obtaining the number of people in the elevator from the detection boxes, writing the number into the database and reporting it to the elevator platform.

In the deep-learning-based elevator people counting system provided by the embodiment of the invention, 39 thousand elevator pedestrian samples from different scenes were labeled and used to train a YOLOv3 network model. The model trained on this dataset generalizes well in application: it adapts to pedestrian recognition under various illumination intensities, elevator types and elevator scenes, with high accuracy.

It is to be understood that the exemplary embodiments described herein are illustrative and not restrictive. Although one or more embodiments of the present invention have been described with reference to the accompanying drawings, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.

Claims

1. A deep-learning-based method for counting the number of people in an elevator, characterized by comprising the following steps:

the number of people in the elevator is obtained from the detection boxes, written into a database and simultaneously reported to the elevator platform,

first, the elevator pedestrian image is normalized and resized to 416 × 416, and the image is then divided into a grid; if the center of a target falls into a grid cell, that cell is responsible for predicting the target; YOLOv3 uses K-means clustering to obtain 9 prior (anchor) boxes and predicts on feature maps at three scales: with a 416 × 416 input, the feature maps are 13 × 13, 26 × 26 and 52 × 52; each YOLO layer uses three prior boxes, the nine prior boxes are divided by size among the three scales, and the larger-scale feature maps use the smaller prior boxes; each grid cell predicts several bounding boxes, and during training, if the overlap between a bounding box predicted by a grid cell and the labeled ground truth is the largest, the target is assigned to that cell, which is then solely responsible for predicting it; because a single target may be detected several times, producing duplicate boxes, non-maximum suppression is applied to highly overlapping bounding boxes, all of them except the one with the highest confidence are removed, and each target yields only one detection box; regression over the prediction boxes gives their coordinates and classification categories; a target confidence threshold is set, where the confidence is the target-class probability multiplied by the overlap (IoU) between the predicted box and the labeled ground truth, and the predicted boxes above this threshold are the final detection boxes,

and feature maps at three scales are used for prediction: with a 416 × 416 input, the YOLOv3 detection network produces 13 × 13, 26 × 26 and 52 × 52 feature maps, specifically: after the first DBL group, the network uses 23 residual units followed by 6 DBL groups and a convolution layer to obtain the 13 × 13 feature output used to detect large targets; after the 23 residual units and 5 DBL groups, a branch applies a DBL and upsampling to obtain 26 × 26 features, which are concatenated with the 26 × 26 features obtained after 19 residual units (concat joins tensors of the same scale), and the concatenated tensor passes through 5 DBL groups and a convolution layer to obtain the 26 × 26 feature output; a branch after these 5 DBL groups applies a further DBL and upsampling to obtain 52 × 52 features, which are concatenated with the 52 × 52 features obtained after 11 residual units, and the concatenated tensor passes through 6 DBL groups and a convolution layer to obtain the 52 × 52 feature output used to detect small targets; finally, the three feature outputs are used jointly to detect elevator pedestrians.

2. A deep-learning-based system for counting the number of people in an elevator, characterized by comprising:

a write-and-upload module, used for obtaining the number of people in the elevator from the detection boxes, writing the number into the database and simultaneously reporting it to the elevator platform,