CN116229376A - Crowd early warning method, counting system, computing device and storage medium - Google Patents
Crowd early warning method, counting system, computing device and storage medium
- Publication number
- CN116229376A (application number CN202310499633.0A)
- Authority
- CN
- China
- Prior art keywords
- crowd
- detection
- target detection
- early warning
- warning method
- Prior art date
- Legal status
- Granted
Classifications
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53 — Recognition of crowd images, e.g. recognition of crowd congestion
- G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/06393 — Score-carding, benchmarking or key performance indicator [KPI] analysis
- G06Q50/40
- G06V10/10 — Image acquisition
- G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V2201/07 — Target detection
- G08B21/02 — Alarms for ensuring the safety of persons
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a crowd early warning method, a counting system, a computing device and a storage medium, belonging to the technical field of machine learning, and comprising the following steps: collecting image data with a monitoring camera of the subway station equipment; performing head detection on passengers in the subway station with a target detection algorithm; evaluating the crowd crowding degree of the subway station image acquisition area according to the passenger head detection result; setting a threshold for the crowding degree evaluation index according to the actual crowd flow of each subway station, and sounding an alarm once the crowding degree exceeds the set threshold. The invention can directly detect crowd head information in a subway station under crowded commuting-period conditions and count the people present; by constructing an evaluation index of crowd crowding degree, it effectively overcomes the one-sidedness of judging crowding by absolute head count alone, making the judgment of passenger crowding in subway stations more scientific.
Description
Technical Field
The invention relates to the technical field of machine learning, in particular to a crowd early warning method, a counting system, computing equipment and a storage medium.
Background
With economic development and growing populations, cities continue to expand and the pressure on urban subway systems keeps increasing. Subway stations are especially crowded during commuting hours; to avoid accidents such as crowd crushes caused by excessive density, stations are generally equipped with corresponding alarm devices.
In daily practice, the inventor has found that the prior technical solutions have the following problems:
The traditional way of ensuring crowd safety relies mainly on manual video monitoring: an operator subjectively judges the crowd state and issues a manual warning when crowd density becomes too great. Another widely used approach counts people with photoelectric sensors or millimeter-wave radar; it is likewise traditional and cannot accurately capture the spatial distribution of the crowd, and there is no mature solution for advanced demands such as counting people in irregular areas. A further approach counts people from density maps; it can output a crowd density map providing key information on crowd spatial distribution and performs well in ultra-dense areas, but subway station crowds change greatly in number and quickly in position, and such models carry too many parameters, which hinders deployment on subway station equipment and prevents real-time output of the crowd count.
In view of the foregoing, it is necessary to provide a new solution to the above-mentioned problems.
Disclosure of Invention
In order to solve the above technical problems, the application provides a crowd early warning method, a counting system, a computing device and a storage medium, which can directly detect crowd head information in a subway station under crowded commuting-period conditions and count the people present.
A crowd early warning method comprising:
collecting image data by using a monitoring camera of subway station equipment;
performing head detection on passengers in the subway station by adopting a target detection algorithm;
evaluating crowd crowding degree of the subway station image acquisition area according to the passenger head detection result;
setting thresholds of crowd crowding degree evaluation indexes according to actual crowd flowing conditions of different subway stations, and alarming by an alarm after the crowd crowding degree exceeds the set thresholds;
wherein, evaluation subway station image acquisition region's crowd crowded degree includes:
calculating the total pixel sum of all detection frames in each frame of the monitoring video, and counting the total occupied area of passengers in the detection area;
judging whether the passenger head detection frame is overlapped with the adjacent detection frame according to the position coordinates of the passenger head detection frame, and if the passenger head detection frame is overlapped with the adjacent detection frame, calculating the overlapping area of the passenger head detection frame;
and taking the ratio of the total occupied area of passengers to the overlapping area in the detection area as an evaluation index of crowd crowding degree.
Preferably, before the head of the subway station passenger is detected by adopting the target detection algorithm, training of the target detection algorithm is finished; the training of the target detection algorithm comprises the following steps:
creating a pedestrian head detection data set;
performing data enhancement on the data set;
writing a yaml configuration file of a data set;
performing head detection on the pictures in the enhanced data set by adopting a target detection algorithm;
calculating a loss function of the target detection algorithm, judging whether the loss function meets the requirement, and completing training of the target detection algorithm after the loss function meets the requirement;
and when the value of the loss function does not meet the requirement, adopting a cosine learning-rate (CosineLR) scheduler to dynamically adjust the learning rate.
Preferably, the performing header detection on the picture in the enhanced data set by using the target detection algorithm includes:
adjusting the picture size to 960 pixels at the input;
before entering a backbone network module of a target detection network, the picture firstly enters a Focus module for slicing;
and carrying out convolution operation on the sliced picture to obtain a double downsampling characteristic diagram under the condition of no information loss.
Preferably, before the picture enters the backbone network module of the target detection network, the picture enters the Focus module for slicing, and the step of entering the Focus module for slicing includes:
and taking a value from every other pixel in each picture to obtain four complementary sampled pictures, so that the width and height of each sampled picture are reduced to half of the original image while the input channels are expanded 4-fold, the spliced picture having 12 channels in place of the original three RGB channels.
Preferably, an adaptive anchor-box calculation strategy is adopted to adjust the width and the height of the detection frame:

$b_w = W \cdot \sigma(t_w)$

wherein $b_w$ represents the width of the detection frame, $W$ represents the width of the whole image, $t_w$ represents the width component of the prediction tensor, and $\sigma$ represents an activation function;

$b_h = H \cdot \sigma(t_h)$

wherein $b_h$ represents the height of the detection frame, $H$ represents the height of the whole image, $t_h$ represents the height component of the prediction tensor, and $\sigma$ represents an activation function.
Preferably, a warm-up training strategy is used before training the target detection algorithm; the warm-up training strategy comprises: first training for 5 iterations with a learning rate smaller than the preset learning rate, and then restoring the preset learning rate for the remaining training.
Preferably, the network layer part of the target detection network adopts a structure of combining a characteristic pyramid network and a path aggregation network; and the output end of the detection head of the target detection network uses the CIOU loss function as the loss function of the bounding box.
According to another aspect of the application, a counting system is further provided, applicable to the crowd early warning method, comprising a monitoring camera, an industrial computer, an alarm, surveillance-video storage and a display screen; the target detection algorithm is deployed in the industrial computer, which comprises an algorithm processing module; the industrial computer can call the video stream of the monitoring camera and process it through the algorithm processing module to obtain detection frames of passenger heads.
According to another aspect of the present application, there is also provided a computing device, comprising: the system comprises a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, when the computing device is running, the processor and the memory are communicated through the bus, and the machine-readable instructions are executed by the processor to perform the steps of the crowd early warning method.
According to another aspect of the present application, there is also provided a computer storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the crowd early warning method.
Compared with the prior art, the application has the following beneficial effects:
1. Aiming at the poor detection performance of traditional pedestrian detection when people overlap in crowded places, the invention directly detects passenger heads in the subway station from the monitoring viewpoint, so head information can be detected and people counted even in the crowded commuting period.
2. The invention constructs an evaluation index of crowd crowding degree, effectively overcoming the one-sidedness of judging crowding by the absolute number of people alone and making the judgment of passenger crowding in subway stations more scientific.
3. The model designed by the invention has a small structure and low compute requirements, which favours deployment on equipment in actual subway station scenes.
4. The model applies multiple data enhancement methods during dataset preparation, greatly increasing the data volume, enhancing the generalization performance of the model and stabilizing its effect.
5. The invention realizes the crowd counting function by upgrading the existing cameras, simplifying the system and effectively reducing cost.
Drawings
Some specific embodiments of the invention will be described in detail hereinafter by way of example and not by way of limitation with reference to the accompanying drawings. The same reference numbers will be used throughout the drawings to refer to the same or like parts or portions. It will be appreciated by those skilled in the art that the drawings are not necessarily drawn to scale. In the accompanying drawings:
FIG. 1 is a schematic overall flow chart of the present invention.
Detailed Description
For the purposes, technical solutions and advantages of the present application, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
As shown in FIG. 1, the crowd early warning method comprises the following steps:
and S1, acquiring image data by using monitoring cameras of subway station equipment.
And S2, performing head detection on passengers in the subway station by adopting a trained target detection algorithm.
And S3, evaluating the crowd crowding degree of the subway station image acquisition area according to the passenger head detection result.
And marking a head detection frame of the personnel in the video image, and displaying the number of the personnel in the frame in the video image in real time. And defining a detection area according to the requirements, and determining pixel coordinates of four vertexes of the area.
Wherein, evaluation subway station image acquisition region's crowd crowded degree includes:
step S31, calculating the total pixel sum of all detection frames in each frame of the monitoring video, and counting the total occupied area of passengers in the detection area.
And S32, judging whether the passenger head detection frame is overlapped with the adjacent detection frame according to the position coordinates of the passenger head detection frame, and if the passenger head detection frame is overlapped with the adjacent detection frame, calculating the overlapping area of the passenger head detection frame.
And S33, taking the ratio of the total occupied area of passengers in the detection area to the overlapping area as an evaluation index of crowd crowding degree.
Namely, the evaluation index of crowd crowding degree is S1/S2, wherein S1 is the total area occupied by passengers in the detection area and S2 is the overlapping area. Different index values may be set as thresholds according to the construction conditions and management capabilities of different stations.
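As a minimal illustration of the index described above, the following sketch computes S1 (total detection-box area) and S2 (summed pairwise overlap) from corner-format boxes. The function names and the (x1, y1, x2, y2) box format are assumptions for illustration, not part of the patented method:

```python
def box_area(box):
    """Area in pixels of an (x1, y1, x2, y2) detection box."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def overlap_area(a, b):
    """Intersection area of two (x1, y1, x2, y2) boxes, 0 if disjoint."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def crowding_index(boxes):
    """Ratio S1/S2 of total head-box area to total pairwise overlap area.

    Returns float('inf') when no boxes overlap at all."""
    s1 = sum(box_area(b) for b in boxes)
    s2 = sum(overlap_area(boxes[i], boxes[j])
             for i in range(len(boxes))
             for j in range(i + 1, len(boxes)))
    return s1 / s2 if s2 > 0 else float('inf')
```

The index is then compared against the per-station threshold chosen in step S4.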
And S4, setting thresholds of crowd crowding degree evaluation indexes according to actual crowd flowing conditions of different subway stations, and giving an alarm after the crowd crowding degree exceeds the set thresholds.
According to the actual space size of different subway stations, a threshold value of the crowd crowding degree evaluation index is set; when the index exceeds the threshold, the alarm sounds and the people count shown in the video changes color, reminding staff that there are too many people in the station and the crowd needs to be evacuated; meanwhile, the background records the current crowded scene for later review.
Furthermore, before the head detection of the subway station passenger by adopting the target detection algorithm, the method further comprises the following steps:
step S10, training a target detection algorithm.
The training of the target detection algorithm comprises the following steps:
and S100, manufacturing a pedestrian head detection data set.
Specifically, the HT21 dataset is collected from the Internet; it is captured from a surveillance viewpoint, covers subway station scenes, and contains both sparse and dense crowds. The dataset is initially processed and each label is converted to the form (x, y, w, h, cls), wherein x represents the x-axis coordinate of the center point of the labeling frame, y the y-axis coordinate of the center point, w the width of the labeling frame, h the height of the labeling frame, and cls the category.
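Assuming the raw annotations are corner-coordinate boxes (the source format is not specified here), a label could be converted to the normalized (x, y, w, h, cls) center format roughly as follows; the helper name and the normalization by image size are illustrative assumptions:

```python
def to_center_label(x1, y1, x2, y2, cls, img_w, img_h):
    """Convert a corner-format box to a normalized (x, y, w, h, cls) label."""
    x = (x1 + x2) / 2 / img_w   # center x, normalized to [0, 1]
    y = (y1 + y2) / 2 / img_h   # center y, normalized to [0, 1]
    w = (x2 - x1) / img_w       # box width, normalized
    h = (y2 - y1) / img_h       # box height, normalized
    return (x, y, w, h, cls)
```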
Step S200, data enhancement is carried out on the data set.
Specifically, image data is enhanced mainly in the following ways: random image rotation, adding Coarse Dropout noise, applying color perturbation, and adding Gaussian noise. The enhanced dataset is divided into a training set and a test set.
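Two of the listed enhancements can be sketched in plain Python as follows; real pipelines would typically use an augmentation library, and these minimal implementations (a single dropout hole, per-pixel noise on a grayscale grid) are illustrative assumptions only:

```python
import random

def add_gaussian_noise(img, sigma=10.0, seed=None):
    """Add per-pixel Gaussian noise, clipped to [0, 255]; img is rows of ints."""
    rng = random.Random(seed)
    return [[min(255, max(0, int(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]

def coarse_dropout(img, hole=4, seed=None):
    """Zero out one random hole×hole square (CoarseDropout with one hole)."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the input is untouched
    top = rng.randrange(max(1, h - hole + 1))
    left = rng.randrange(max(1, w - hole + 1))
    for r in range(top, min(h, top + hole)):
        for c in range(left, min(w, left + hole)):
            out[r][c] = 0
    return out
```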
Step S300, writing a yaml configuration file of the data set.
The yaml configuration file of the data set contains addresses of a training set and a testing set, names of data categories and numbers of the data categories.
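Such a dataset yaml might look as follows; the field names follow the common YOLOv5-style convention, and the paths and category name are hypothetical:

```yaml
# Hypothetical dataset configuration (YOLOv5-style field names assumed)
train: data/heads/train/images   # address of the training set
val: data/heads/test/images      # address of the test set
nc: 1                            # number of data categories
names: ['head']                  # names of the data categories
```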
Step S400, performing head detection on the pictures in the enhanced data set by adopting a target detection algorithm.
The target detection algorithm is established in a fusion mode and comprises a detection head of a target detection network and a network layer of the target detection network.
The detection head part of the target detection network can be one of the ThunderNet, YOLO, SSD, DETR, CenterNet, TTFNet, FCOS and NanoDet target detection algorithms.
The network layer (neck) portion of the object detection network employs a combined Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) architecture. The FPN layer conveys strong semantic features from top to bottom, while the PAN structure conveys strong localization features from bottom to top; combined, they aggregate parameters for the different detection layers from different backbone layers.
The detection-head output of the target detection network uses the CIOU loss function as the loss function of the bounding box:

$L_{CIOU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$

wherein $\rho^2(b, b^{gt})$ represents the squared distance between the center points of the predicted frame and the target frame, $c$ represents the diagonal length of the smallest enclosing rectangle, $v$ is the aspect-ratio influence factor, $\alpha$ its trade-off weight, and $IoU$ is the intersection-over-union, i.e. the overlap ratio of the predicted and ground-truth bounding boxes.
First, the picture size is adjusted to 960 pixels at the input to cope with the small size of human-head objects. The picture then enters the Focus module for slicing before entering the backbone module of the object detection network. Specifically, every other pixel in the picture is sampled, similarly to adjacent downsampling, yielding four complementary pictures with no information lost: the width and height are reduced to half of the original while the input channels are expanded 4-fold, i.e. the spliced picture has 12 channels instead of the original three RGB channels. Finally, the new picture undergoes a convolution operation, producing a double-downsampled feature map with no information loss.
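The Focus slicing operation described above can be sketched as follows for an image stored as an H×W×C nested list; this is an illustrative reimplementation, not the patent's code:

```python
def focus_slice(img):
    """Focus-style slicing: sample every other pixel in four phases.

    img: H×W×C nested list (each pixel a list of channel values).
    Returns an (H/2)×(W/2)×(4C) image with the four complementary
    sub-images concatenated along the channel axis, so an RGB image
    (C=3) becomes 12 channels with no pixel discarded."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            # four complementary samples: (r,c), (r+1,c), (r,c+1), (r+1,c+1)
            px = (img[r][c] + img[r + 1][c]
                  + img[r][c + 1] + img[r + 1][c + 1])
            row.append(px)
        out.append(row)
    return out
```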
An SGD optimizer with momentum is selected to optimize the target detection network, with the momentum value set to 0.9 and the Nesterov flag set to true. The number of training iterations is set to 100; in experiments the loss function converged and training was stopped early at about 60 iterations.
In the post-processing stage of target detection, a non-maximum suppression (NMS) operation is usually required to screen the candidate target frames. Because the CIOU loss function involves the influence factor v computed from the ground truth, and no ground truth exists at test-time inference, the network combines the CIOU loss function with DIOU-based non-maximum suppression as weighted NMS (Weighted NMS) to screen the best detection box from multiple candidate boxes.
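A greedy DIOU-based NMS of the kind referred to above might be sketched as follows; the threshold value and function names are illustrative assumptions:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda q: (q[2] - q[0]) * (q[3] - q[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def diou(a, b):
    """DIoU = IoU minus the normalized squared center distance."""
    cx = lambda q: ((q[0] + q[2]) / 2, (q[1] + q[3]) / 2)
    (ax, ay), (bx, by) = cx(a), cx(b)
    ex_w = max(a[2], b[2]) - min(a[0], b[0])
    ex_h = max(a[3], b[3]) - min(a[1], b[1])
    c2 = ex_w ** 2 + ex_h ** 2              # enclosing-box diagonal squared
    d2 = (ax - bx) ** 2 + (ay - by) ** 2    # center distance squared
    return iou(a, b) - (d2 / c2 if c2 else 0.0)

def diou_nms(boxes, scores, thresh=0.5):
    """Greedy NMS that suppresses candidates by DIoU instead of plain IoU."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if diou(boxes[i], boxes[j]) <= thresh]
    return keep
```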
The loss function of the target detection network consists of three parts: classification loss, localization loss and confidence loss. The classification loss and the confidence loss are calculated using the binary cross-entropy loss function:

$BCEWithLogitsLoss(x, y) = -w \left[ y \cdot \log \sigma(x) + (1 - y) \cdot \log\left(1 - \sigma(x)\right) \right]$

wherein $BCEWithLogitsLoss$ represents the binary cross-entropy loss function with logits, $w$ represents the weight of the current factor, $x$ the predicted value, $y$ the target value, and $\sigma$ an activation (sigmoid) function.
The confidence target is calculated with the CIOU function, while the classification loss uses the BCE loss function, wherein only the classification loss of positive samples is calculated:

$L_{cls} = -w \left[ y \cdot \log \sigma(x) + (1 - y) \cdot \log\left(1 - \sigma(x)\right) \right]$

wherein $w$ represents the weight of the current factor, $x$ the predicted classification score, $y$ the target value, and $\sigma$ an activation (sigmoid) function.
The network predicts three prediction frames for each cell of the 80 × 80 grid; since only passenger heads are detected, the total number of classes is 1, so the prediction information of each prediction frame includes only 1 classification probability, finally forming a probability tensor of [3 × 80 × 80 × 1].
The confidence target is the CIOU between the network-predicted bounding box and the real bounding box, and the BCE loss function is used; here the confidence loss is calculated for all samples.
The positioning loss is calculated by using CIOU loss function.
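The CIOU computation used for the localization loss can be sketched as follows; this is a hedged reimplementation of the standard CIoU formulation, not the patent's own code:

```python
import math

def ciou_loss(pred, target):
    """CIoU loss = 1 - IoU + rho²/c² + alpha*v for (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(pred) + area(target) - inter
    iou = inter / union if union else 0.0
    # squared center distance rho² and enclosing-box diagonal c²
    cx = lambda b: ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    (px, py), (tx, ty) = cx(pred), cx(target)
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    ex_w = max(pred[2], target[2]) - min(pred[0], target[0])
    ex_h = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = ex_w ** 2 + ex_h ** 2
    # aspect-ratio influence factor v and its trade-off weight alpha
    wp, hp = pred[2] - pred[0], pred[3] - pred[1]
    wt, ht = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(wt / ht) - math.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v) if (1 - iou + v) else 0.0
    return 1 - iou + (rho2 / c2 if c2 else 0.0) + alpha * v
```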
And calculating a loss function of the target detection algorithm, judging whether the loss function meets the requirement, and completing training of the target detection algorithm after the loss function meets the requirement.
When the value of the loss function does not meet the requirement, a cosine learning-rate (CosineLR) scheduler is adopted to dynamically adjust the learning rate:

$\eta_{t+1} = \eta_{min} + (\eta_0 - \eta_{min}) \cdot \frac{1}{2}\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\,\pi\right)\right)$

wherein $\eta_t$ represents the learning rate of the present period, $\eta_{t+1}$ the learning rate of the next period, $\eta_0$ the initial learning rate, $\eta_{min}$ the minimum learning rate (defaulting to 1e-5), and $T_{max}$ the learning period; the period constant $K$ is an integer.
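The cosine schedule, together with the warm-up strategy mentioned later, might be sketched as follows; the default values for lr0 and warmup_lr are illustrative assumptions:

```python
import math

def cosine_lr(step, total_steps, lr0=0.01, lr_min=1e-5):
    """Cosine-annealed learning rate decaying from lr0 down to lr_min."""
    cos = (1 + math.cos(math.pi * step / total_steps)) / 2
    return lr_min + (lr0 - lr_min) * cos

def warmup_cosine_lr(step, total_steps, warmup=5, lr0=0.01,
                     lr_min=1e-5, warmup_lr=1e-3):
    """First `warmup` iterations at a small fixed rate, then cosine annealing."""
    if step < warmup:
        return warmup_lr
    return cosine_lr(step - warmup, total_steps - warmup, lr0, lr_min)
```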
A test picture, after feature extraction and feature-map fusion by the target detection network, yields predicted coordinates and categories of passenger head frames, all detections sharing the same single category.
In addition, a warm-up training strategy can be used before target detection network training starts: 5 iterations are trained with a learning rate not exceeding 1e-3, after which the learning rate is changed to the preset value for further training. This helps the model avoid descending the gradient into premature overfitting in the initial stage and maintains the stability of the distribution and of the deep layers of the model.
The width and height of the detection frame are adjusted with an adaptive anchor-box calculation strategy; adjusting the width and height of the predicted target frame avoids gradient explosion and unstable training. Before target detection network training starts, the labeling information in the dataset is checked and the best possible recall of the default anchor boxes on the dataset labels is calculated; when this recall is greater than or equal to 0.98, the anchor boxes need not be updated; if it is less than 0.98, anchor boxes fitting this dataset are recalculated.
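The best-possible-recall check described above might be sketched as follows, assuming a YOLOv5-style width/height ratio test (the exact matching rule is not given in the text, so the threshold of 4.0 and the function names are assumptions):

```python
def best_possible_recall(label_whs, anchor_whs, thresh=4.0):
    """Fraction of labels whose (w, h) is within `thresh`× of some anchor.

    A label matches an anchor when max(ratio, 1/ratio) < thresh for both
    width and height (the YOLOv5-style aspect test assumed here)."""
    matched = 0
    for lw, lh in label_whs:
        for aw, ah in anchor_whs:
            rw, rh = lw / aw, lh / ah
            if max(rw, 1 / rw) < thresh and max(rh, 1 / rh) < thresh:
                matched += 1
                break
    return matched / len(label_whs) if label_whs else 1.0

def anchors_need_update(label_whs, anchor_whs):
    """Recompute anchors only when best possible recall drops below 0.98."""
    return best_possible_recall(label_whs, anchor_whs) < 0.98
```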
$b_w = W \cdot \sigma(t_w)$

wherein $b_w$ represents the width of the detection frame, $W$ the width of the whole image, $t_w$ the width component of the prediction tensor, and $\sigma$ an activation function.

$b_h = H \cdot \sigma(t_h)$

wherein $b_h$ represents the height of the detection frame, $H$ the height of the whole image, $t_h$ the height component of the prediction tensor, and $\sigma$ an activation function.
Spatially relative terms, such as "above," "over," "on the upper surface of," "on," and the like, may be used herein for ease of description to describe one device's or feature's spatial location relative to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "above" or "over" other devices or structures would then be oriented "below" or "beneath" the other devices or structures. Thus, the exemplary term "above" may include both the "above" and "below" orientations. The device may also be positioned in other different ways (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein are interpreted accordingly.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or described herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A crowd early warning method, characterized by comprising the following steps:
collecting image data by using a monitoring camera of subway station equipment;
performing head detection on passengers in the subway station by adopting a target detection algorithm;
evaluating crowd crowding degree of the subway station image acquisition area according to the passenger head detection result;
setting thresholds of crowd crowding degree evaluation indexes according to actual crowd flowing conditions of different subway stations, and alarming by an alarm after the crowd crowding degree exceeds the set thresholds;
wherein evaluating the crowd crowding degree of the subway station image acquisition area comprises:
calculating the total pixel sum of all detection frames in each frame of the monitoring video, and counting the total occupied area of passengers in the detection area;
judging whether the passenger head detection frame is overlapped with the adjacent detection frame according to the position coordinates of the passenger head detection frame, and if the passenger head detection frame is overlapped with the adjacent detection frame, calculating the overlapping area of the passenger head detection frame;
and taking the ratio of the total occupied area of passengers to the overlapping area in the detection area as an evaluation index of crowd crowding degree.
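The evaluation steps of claim 1 can be sketched as follows; the pairing rule for "adjacent" detection boxes is an assumption (every pair is checked here), and boxes are taken as (x1, y1, x2, y2) pixel coordinates:

```python
def box_area(box):
    """Pixel area of an axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def overlap_area(a, b):
    """Area of the intersection rectangle of two boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x1, y1, x2, y2))

def congestion_index(boxes):
    """Total occupied pixel area of all head boxes divided by their
    summed pairwise overlap area, per the claimed evaluation index.
    Checking all pairs (not only 'adjacent' ones) is an assumption."""
    total = sum(box_area(b) for b in boxes)
    overlap = sum(overlap_area(boxes[i], boxes[j])
                  for i in range(len(boxes))
                  for j in range(i + 1, len(boxes)))
    # Guard against frames with no overlapping heads.
    return total / overlap if overlap > 0 else float("inf")
```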
2. The crowd early warning method of claim 1, further comprising training a target detection algorithm before head detection of subway station passengers using the target detection algorithm; the training of the target detection algorithm comprises the following steps:
creating a pedestrian head detection data set;
performing data enhancement on the data set;
writing a yaml configuration file of a data set;
performing head detection on the pictures in the enhanced data set by adopting a target detection algorithm;
calculating a loss function of the target detection algorithm, judging whether the loss function meets the requirement, and completing training of the target detection algorithm after the loss function meets the requirement;
and when the value of the loss function does not meet the requirement, adopting a Cosine LR scheduler to dynamically adjust the learning rate.
3. The crowd early warning method of claim 2, wherein the employing a target detection algorithm to perform head detection on pictures in the enhanced dataset comprises:
adjusting the picture size to 960 pixels at the input;
before entering a backbone network module of a target detection network, the picture firstly enters a Focus module for slicing;
and carrying out convolution operation on the sliced picture to obtain a double downsampling characteristic diagram under the condition of no information loss.
4. The crowd early warning method of claim 3, wherein the slicing performed in the Focus module before the picture enters the backbone network module of the target detection network comprises:
and taking a value from every other pixel in each picture to obtain four complementary sampling pictures, so that the channel width and the channel height of the sampling pictures are reduced to half of the original image, but the input channels are expanded by 4 times, and the spliced pictures form 12 channels relative to the original RGB three channels.
5. The crowd early warning method of claim 2, wherein an adaptive calculation anchor frame strategy is adopted to adjust the width and the height of the detection frame;
in the method, in the process of the invention,representing the width of the detection frame,/-, and>representing the width of the overall image, +.>Representing the width of tensor +.>Representing an activation function; />
6. The crowd early warning method of claim 2, wherein a warm-up training strategy is used prior to training the target detection algorithm; the warm-up training strategy comprises: first training for 5 iterations with a learning rate smaller than the preset learning rate, and then changing the learning rate to the preset value for training.
7. The crowd early warning method of claim 3, wherein the network layer part of the target detection network adopts a structure of combining a characteristic pyramid network and a path aggregation network; and the output end of the detection head of the target detection network uses the CIOU loss function as the loss function of the bounding box.
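A minimal sketch of the CIOU bounding-box loss named in claim 7, following the commonly published CIoU definition (1 − IoU plus center-distance and aspect-ratio penalties); boxes as (x1, y1, x2, y2):

```python
import math

def ciou_loss(box_a, box_b):
    """CIoU loss between two boxes: 1 - IoU + center-distance term
    + aspect-ratio term. eps guards the divisions."""
    eps = 1e-9
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)
    # Squared center distance over the squared diagonal of the
    # smallest enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw * cw + ch * ch + eps
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2
            + (ay1 + ay2 - by1 - by2) ** 2) / 4
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (
        math.atan((bx2 - bx1) / (by2 - by1 + eps))
        - math.atan((ax2 - ax1) / (ay2 - ay1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```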
8. A counting system which is characterized by being applicable to the crowd early warning method of any one of claims 1-7, comprising a monitoring camera, an engineering machine, an alarm, a monitoring video storage and a display screen; the engineering machine is internally provided with a target detection algorithm; the engineering machine comprises an algorithm processing module; the engineering machine can call the video information of the monitoring camera and process the video information through the algorithm processing module to obtain a detection frame of the head of the passenger.
9. A computing device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the computing device is running, the machine-readable instructions when executed by the processor performing the steps of the crowd early warning method of any one of claims 1 to 7.
10. A computer storage medium, wherein a computer program is stored on the computer storage medium, which computer program, when being executed by a processor, performs the steps of the crowd early warning method as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310499633.0A CN116229376B (en) | 2023-05-06 | 2023-05-06 | Crowd early warning method, counting system, computing device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116229376A true CN116229376A (en) | 2023-06-06 |
CN116229376B CN116229376B (en) | 2023-08-04 |
Family
ID=86585868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310499633.0A Active CN116229376B (en) | 2023-05-06 | 2023-05-06 | Crowd early warning method, counting system, computing device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116229376B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180005071A1 (en) * | 2013-06-25 | 2018-01-04 | University Of Central Florida Research Foundation, Inc. | Multi-Source, Multi-Scale Counting in Dense Crowd Images |
US20200193628A1 (en) * | 2018-12-17 | 2020-06-18 | Microsoft Technology Licensing, Llc | Detecting objects in crowds using geometric context |
CN111832489A (en) * | 2020-07-15 | 2020-10-27 | 中国电子科技集团公司第三十八研究所 | Subway crowd density estimation method and system based on target detection |
CN112115862A (en) * | 2020-09-18 | 2020-12-22 | 广东机场白云信息科技有限公司 | Crowded scene pedestrian detection method combined with density estimation |
CN112232316A (en) * | 2020-12-11 | 2021-01-15 | 科大讯飞(苏州)科技有限公司 | Crowd gathering detection method and device, electronic equipment and storage medium |
CN112801018A (en) * | 2021-02-07 | 2021-05-14 | 广州大学 | Cross-scene target automatic identification and tracking method and application |
CN114627502A (en) * | 2022-03-10 | 2022-06-14 | 安徽农业大学 | Improved YOLOv 5-based target recognition detection method |
EP4033399A1 (en) * | 2021-01-25 | 2022-07-27 | Bull Sas | Computer device and method for estimating the density of a crowd |
CN115424209A (en) * | 2022-09-15 | 2022-12-02 | 华东交通大学 | Crowd counting method based on spatial pyramid attention network |
CN115527270A (en) * | 2022-10-10 | 2022-12-27 | 杭州电子科技大学 | Method for identifying specific behaviors in intensive crowd environment |
CN115713731A (en) * | 2023-01-10 | 2023-02-24 | 武汉图科智能科技有限公司 | Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method |
CN116071696A (en) * | 2022-11-23 | 2023-05-05 | 中通服和信科技有限公司 | Building stair congestion detection method and device based on YOLOv7 |
Non-Patent Citations (3)
Title |
---|
XU Shoukun; NI Chuhan; JI Chenchen; LI Ning: "Research on an Image Description Method Based on Safety Helmet Wearing Detection", Journal of Chinese Computer Systems, no. 04 *
SHEN Shoujuan; ZHENG Guanghao; PENG Yixuan; WANG Zhanqing: "Classroom Student Detection and Counting Method Based on the YOLOv3 Algorithm", Software Guide, no. 09 *
TAN Zhiyong; YUAN Jiazheng; LIU Hongzhe; LI Qing: "Crowd Density Estimation Method Based on Deep Convolutional Neural Networks", Computer Applications and Software, no. 07 *
Also Published As
Publication number | Publication date |
---|---|
CN116229376B (en) | 2023-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020173226A1 (en) | Spatial-temporal behavior detection method | |
CN103824070B (en) | A kind of rapid pedestrian detection method based on computer vision | |
CN112257609B (en) | Vehicle detection method and device based on self-adaptive key point heat map | |
CN106778540B Parking event detection method based on a double-layer background model for accurate parking detection
CN111915583B (en) | Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene | |
CN106778633B (en) | Pedestrian identification method based on region segmentation | |
CN107944403A (en) | Pedestrian's attribute detection method and device in a kind of image | |
CN113469278B (en) | Strong weather target identification method based on deep convolutional neural network | |
CN115205264A (en) | High-resolution remote sensing ship detection method based on improved YOLOv4 | |
CN111738336A (en) | Image detection method based on multi-scale feature fusion | |
CN109785288A (en) | Transmission facility defect inspection method and system based on deep learning | |
CN109815798A (en) | Unmanned plane image processing method and system | |
CN116052026B (en) | Unmanned aerial vehicle aerial image target detection method, system and storage medium | |
CN115147745A (en) | Small target detection method based on urban unmanned aerial vehicle image | |
CN110087041A (en) | Video data processing and transmission method and system based on the base station 5G | |
KR101874968B1 (en) | Visibility measuring system base on image information and method for using the same | |
CN115546763A (en) | Traffic signal lamp identification network training method and test method based on visual ranging | |
Lin et al. | Small object detection in aerial view based on improved YoloV3 neural network | |
CN115937796A (en) | Event classification method, system, equipment and medium based on pre-training model | |
CN109271904A (en) | A kind of black smoke vehicle detection method based on pixel adaptivenon-uniform sampling and Bayesian model | |
CN116311084A (en) | Crowd gathering detection method and video monitoring equipment | |
CN116485885A (en) | Method for removing dynamic feature points at front end of visual SLAM based on deep learning | |
CN114399734A (en) | Forest fire early warning method based on visual information | |
CN113095404B (en) | X-ray contraband detection method based on front-back background convolution neural network | |
CN113936299A (en) | Method for detecting dangerous area in construction site |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20240107 Address after: Building A, Building 1102, Yeda Zhigu, No. 300 Changjiang Road, Yantai Area, China (Shandong) Pilot Free Trade Zone, Yantai City, Shandong Province, 264000 Patentee after: Yantai Jiuyuan Technology Service Co.,Ltd. Address before: Room 522, Plant 2, No. 32, the Pearl River Road, Yantai Economic and Technological Development Zone, Shandong Province, 264000 Patentee before: Shandong Yishi Intelligent Technology Co.,Ltd. |