CN112991399A - Bus passenger number detection system based on RFS - Google Patents

Bus passenger number detection system based on RFS

Info

Publication number
CN112991399A
Authority
CN
China
Prior art keywords
video data
bus
rfs
passenger
prediction
Prior art date
Legal status
Granted
Application number
CN202110308023.9A
Other languages
Chinese (zh)
Other versions
CN112991399B (en)
Inventor
汪景
吕军威
刘志钢
彭威
Current Assignee
Shanghai University of Engineering Science
Original Assignee
Shanghai University of Engineering Science
Priority date
Filing date
Publication date
Application filed by Shanghai University of Engineering Science filed Critical Shanghai University of Engineering Science
Priority to CN202110308023.9A priority Critical patent/CN112991399B/en
Publication of CN112991399A publication Critical patent/CN112991399A/en
Application granted granted Critical
Publication of CN112991399B publication Critical patent/CN112991399B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T7/223 Analysis of motion using block-matching
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30196 Human being; Person
    • G06T2207/30232 Surveillance
    • G06T2207/30241 Trajectory
    • G06T2207/30242 Counting objects in image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Air-Conditioning For Vehicles (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to an RFS-based bus passenger number detection system, comprising: a video data acquisition and preprocessing module, which films the boarding and alighting door areas with a vehicle-mounted camera device, acquires video data of passengers getting on and off, and preprocesses the video data; a head detection module, which performs head detection on the preprocessed video data with an SSD deep convolutional neural network algorithm; an RFS-based GM-PHD head tracking module, which tracks heads in the video data with an RFS-based GM-PHD filtering algorithm; and a people counting output module, which obtains motion track information from the detected head information and counts the number of passengers from it. Compared with the prior art, the invention detects stably and accurately and requires no modification of the bus, among other advantages.

Description

Bus passenger number detection system based on RFS
Technical Field
The invention relates to the field of electronic information, in particular to a bus passenger number detection system based on RFS.
Background
With China's rapid economic growth, deepening urbanization and ever-growing number of motor vehicles, traffic congestion has become increasingly serious and greatly hinders sustainable urban development. Public transport is an important component of the urban passenger transport system, touches many areas of social life and production, and carries most residents' travel. The best way to relieve severe urban traffic is to develop public transport preferentially, design the transit network scientifically and reasonably, and improve the comfort and attractiveness of public transport travel.
Real-time data on the number of passengers in a bus is an important part of bus system data. For bus planning and operation, it supports transit network optimization; on the passenger demand side, publishing the real-time occupancy can cut unnecessary waiting time by letting passengers choose other routes. Existing detection methods fall into three groups. The first is pressure sensing: the occupancy is obtained by statistically analyzing the total weight and the weight components at boarding and alighting. The second is infrared detection: infrared devices at the front and rear doors detect and count passengers getting on and off. The third is image processing: the number of people is detected directly from panoramic video of the cabin. The first two can only be realized after modifying the bus; the third has large detection errors in areas repeatedly covered by the cabin cameras, so the statistics are inaccurate.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a bus passenger number detection system based on RFS.
The purpose of the invention can be realized by the following technical scheme:
An RFS-based bus passenger number detection system, comprising:
a video data acquisition and preprocessing module, which films the boarding and alighting door areas with a vehicle-mounted camera device, acquires video data of passengers getting on and off, and preprocesses the video data;
a head detection module, which performs passenger head detection on the preprocessed video data with an SSD deep convolutional neural network algorithm;
an RFS-based GM-PHD head tracking module, which tracks passenger heads in the video data with an RFS-based GM-PHD filtering algorithm;
and a people counting output module, which obtains motion track information from the detected passenger head information and counts the number of passengers from the motion track information.
Specifically, the video data acquiring and preprocessing module includes:
a boarding and alighting video acquisition submodule, which films the door areas with the vehicle-mounted camera device to acquire video data, obtains the numbers of passengers getting on and off when the bus arrives at a stop by detecting people in the boarding and alighting video data, and wirelessly transmits the video data to the video data preprocessing submodule;
and the video data preprocessing submodule is used for performing frame cutting processing on the video and standardizing the processed data.
The human head detection module includes:
the training submodule is used for marking the picture sequence of the video data processed by the video data acquisition and preprocessing module and training the SSD deep convolutional neural network by using the marked picture sequence;
and the prediction sub-module is used for inputting the video data to be detected into the trained SSD deep convolutional neural network, detecting the head of a passenger in the picture sequence to be detected, and transmitting the centroid position information of the detected head of the passenger to the GM-PHD head tracking module based on the RFS.
The GM-PHD human head tracking module based on RFS comprises:
a prediction submodule, which performs new-birth target prediction, spawned target prediction and survival target prediction in sequence from the centroid position information of passenger heads;
an update submodule, which updates the Gaussian component parameters using the measurement set, the observation matrix and the measurement noise;
a pruning submodule, which prunes the updated Gaussian component parameters, merging similar Gaussian components and discarding the components with the smallest weights;
a state extraction submodule, which extracts the expected values of the Gaussian components whose weights exceed a threshold;
and a track identification submodule, which detects head movement with an auction-based track identification algorithm and outputs the track information of passenger heads.
Furthermore, the people counting output module counts boarding passengers with a line-crossing crowd counting method and alighting passengers with a region-crossing crowd counting method. The line-crossing method counts boarding passengers as follows:
first a decision line is determined; whether a detected target moves downward across the line is judged from the motion track information transmitted by the RFS-based GM-PHD head tracking module; if the target crosses the line, the boarding count is incremented by one; if it does not, the passenger is judged not to have boarded and is not counted.
The region-crossing method counts alighting passengers as follows:
1) divide the video frame from top to bottom into three regions of interest I, II and III, adapting to passengers of different heights; judge which region a passenger occupies from the current position, and obtain the passenger's movement displacement;
2) if the passenger is in region I, take the movement displacement directly from the motion track information as N pixels; if in region II, obtain from the motion track information the M pixels moved within region II, then judge whether region I has been reached: if so, update the displacement to the N pixels moved in region I; if not, the displacement is M pixels; if the target is in region III, perform the counting judgment only after it enters region II. Finally, judge false detections: if the movement displacement (N or M) exceeds 60 pixels, increment the alighting count; if it is 60 pixels or less, judge that the passenger has not alighted and do not count.
Further, the training submodule includes:
the prior frame matching unit, which finds for each real target the prior frame with the maximum IOU (intersection over union), guaranteeing that each real target corresponds to at least one prior frame, then tries to match each remaining unmatched prior frame with the real targets, matching whenever the IOU exceeds a threshold; the real targets are passenger heads;
a loss function selection unit that calculates a weighted sum of the position error and the confidence error;
the data augmentation unit is used for augmenting the data by a data augmentation method;
a fine adjustment unit: based on the Hole algorithm, fine tuning is carried out on the model trained by the training submodule, and a network structure is changed to obtain a denser score map;
and a filtering unit for deleting wrong, overlapped and inaccurate bounding boxes based on the NMS algorithm.
Further, the prediction sub-module includes:
the prediction frame filtering unit, which determines each prediction frame's category from the maximum category confidence, filters out the frames classified as background, and then filters out the frames whose confidence falls below the confidence threshold;
the prediction frame decoding unit, which decodes the remaining prediction frames, obtains their true position parameters from the prior frames, sorts them in descending order of confidence and keeps the first k prediction frames;
and the filtering unit, which filters out heavily overlapping prediction frames based on the NMS algorithm and takes the remaining frames as the detection result.
Further, the track identification submodule detects head movement with an auction-based track identification algorithm optimized by a correction mechanism.
Compared with the prior art, the RFS-based bus passenger number detection system obtains head detections through the SSD deep convolutional neural network algorithm, supplying more stable and accurate measurement data to the tracking algorithm and improving counting accuracy. Compared with other bus passenger counting methods, counting passengers as they get on and off makes the results more reliable and accurate, is convenient to implement, and requires no modification of the bus. The resulting real-time passenger counts can support transit network planning, bus dispatching, stop-level passenger flow research, evacuation and so on.
Drawings
FIG. 1 is a schematic structural diagram of a bus people number detection system based on RFS in an embodiment;
FIG. 2 is a schematic flow chart of the auction-based track identification algorithm in the embodiment;
FIG. 3 is a flow chart of the statistics of the number of people getting on and off the vehicle in the embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
As shown in FIG. 1, the invention relates to a bus people number detection system based on RFS, which comprises the following modules:
A. a video data acquisition and preprocessing module;
B. a human head detection module;
C. a GM-PHD head tracking module based on RFS;
D. and a people counting output module.
The head detection module is connected to the RFS-based GM-PHD head tracking module and transmits the centroid position information of detected heads. The RFS-based GM-PHD head tracking module is connected to the people counting output module and transmits the motion track information of detected heads.
The video data acquisition and preprocessing module specifically comprises two sub-modules:
A1, boarding and alighting video acquisition submodule: the vehicle-mounted camera device films the door areas to obtain boarding and alighting videos, and the video data are transmitted to the next submodule over 4G wireless communication, providing raw video data for the subsequent modules.
A2, video data preprocessing submodule: the method is used for performing frame cutting processing on the video and standardizing the processed data. The module can be remotely completed by a computer. As a preferred embodiment, the computer may use OpenCV to perform video framing and then normalize the input picture size to 300 x 300 with tensflow backup.
In the head detection module, the invention detects targets by computer programming with the SSD deep convolutional neural network algorithm, which has good detection performance. Because full-body detection suffers severe occlusion, and considering the camera angle and the comparatively light occlusion of heads, the passenger's head is chosen as the detection target. The module mainly comprises two submodules:
B1, training submodule: the data processed by the video data preprocessing submodule are sent remotely to the training submodule; an annotator manually labels the picture sequences of existing video data, processed by the video data acquisition and preprocessing module, with the VoTT labeling tool, and the manually labeled picture sequences are used to train the neural network into a head detection model better suited to this scenario.
B2, prediction submodule: and inputting the video data to be detected into a neural network, and performing human head detection on the picture sequence to be detected through the trained neural network. And transmitting the information of the centroid position of the detected head to the next module.
The training submodule specifically comprises the following steps:
B11, prior frame matching: prior frames are rectangular boxes of different predefined sizes and aspect ratios at each position of the feature map, used to match real objects. The first step of training matches each real target, i.e. a passenger's head, with prior frames; the target is then predicted through the bounding boxes corresponding to the matched prior frames. Prior frame matching in SSD proceeds as follows: first, each real target is matched with the prior frame of maximum intersection over union (IOU), guaranteeing that every real target corresponds to at least one prior frame; then each remaining unmatched prior frame is matched to any real target with which its IOU exceeds a threshold (0.5 for SSD 300). Prior frames matched to a real target are called positive samples, and those matched to background are negative samples. Because negatives far outnumber positives, SSD selects negatives in decreasing order of confidence error for training, keeping the positive-to-negative ratio at about 1:3. The IOU measures the overlap of two rectangles: the area of their intersection divided by the area of their union.
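A sketch of this two-stage matching rule (corner-form boxes and all helper names are assumptions, not from the patent):

```python
def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_priors(priors, targets, threshold=0.5):
    """Two-stage SSD matching: best prior per target, then IOU > threshold."""
    matches = {}
    for j, t in enumerate(targets):      # stage 1: every head gets its best prior
        best = max(range(len(priors)), key=lambda i: iou(priors[i], t))
        matches[best] = j
    for i, p in enumerate(priors):       # stage 2: remaining priors above threshold
        if i in matches:
            continue
        j = max(range(len(targets)), key=lambda k: iou(p, targets[k]))
        if iou(p, targets[j]) > threshold:
            matches[i] = j
    return matches
```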
The feature map is first divided into S × S grid cells; when a potential target's center point falls inside a cell, that cell generates B boundary prediction boxes to predict the target, each with its own confidence. The loss function is selected next:
B12, loss function selection: the loss function is the weighted sum of the position error and the confidence error:

$$L(x,c,l,g) = \frac{1}{N}\left(L_{conf}(x,c) + \alpha L_{loc}(x,l,g)\right)$$

where N is the number of positive prior frames (if N = 0 the loss is set to 0); c is the predicted category confidence; l is the predicted position of the bounding box corresponding to a prior frame; g is the position parameter of the real target; and the weight coefficient α is set to 1 by cross validation.

The position error is computed with the Smooth L1 loss:

$$L_{loc}(x,l,g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx,cy,w,h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_i^{m} - \hat{g}_j^{m}\right)$$

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \qquad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \qquad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \qquad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

where $x_{ij} \in \{0,1\}$, and $x_{ij} = 1$ means the ith prior frame is matched to the jth real target; (cx, cy) is the center of the prior frame d, w its width and h its height; $g_j^{cx}$ is the central abscissa of the jth real target and $d_i^{cx}$ the central abscissa of the ith prior frame.

The confidence error is computed with the softmax loss:

$$L_{conf}(x,c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\hat{c}_i^{p} - \sum_{i \in Neg} \log\hat{c}_i^{0}, \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p}\exp(c_i^{p})}$$

where p is the class of the real target; $x_{ij}^{p} = 1$ indicates that the ith prior frame is matched to the jth real target whose class is p.
B13, data augmentation: the number of training samples is increased by various data enhancement methods such as horizontal turning, cutting, amplifying and reducing, and the robustness of the algorithm to the input targets with different sizes and different shapes is improved.
B14, Atrous Algothirm: and (3) carrying out fine adjustment on the model trained by the B1 training submodule by using a Hole algorithm, and changing the network structure to obtain a denser score map.
B15, NMS algorithm filtering: the multiple feature level features of the SSD may generate more bounding boxes, and there may still be more false, overlapping, and inaccurate bounding boxes after the IOU processing optimization. Therefore, optimization processing is carried out by using non-maximum suppression so as to improve the speed and the precision of target detection. The principle is as follows: when a plurality of bounding boxes contain the same real target and the IOU of the bounding box is higher, the bounding box with the highest score is selected, and the rest bounding boxes are deleted.
In the prediction submodule B2, the specific implementation steps are as follows:
B21, prediction frame filtering: for each prediction frame, determine its category (the one with maximum confidence) from the category confidences, filter out the frames classified as background, and then filter out the frames whose confidence falls below the confidence threshold (0.5);
B22, prediction frame decoding: decode the remaining prediction frames, obtain their true position parameters from the prior frames, sort them in descending order of confidence, and keep only the first n (e.g. 400);
B23, NMS filtering: as in B15, filter out heavily overlapping prediction frames; the remaining frames are the detection result.
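An illustrative sketch of the B22 decoding step: it inverts the $\hat{g}$ offset encoding given in B12 (SSD's variance scaling factors are omitted for simplicity; all names are assumptions):

```python
import numpy as np

def decode(priors, loc):
    """priors: (n, 4) boxes as (cx, cy, w, h); loc: predicted offsets."""
    cx = priors[:, 0] + loc[:, 0] * priors[:, 2]   # cx = d_cx + t_cx * d_w
    cy = priors[:, 1] + loc[:, 1] * priors[:, 3]   # cy = d_cy + t_cy * d_h
    w = priors[:, 2] * np.exp(loc[:, 2])           # w = d_w * exp(t_w)
    h = priors[:, 3] * np.exp(loc[:, 3])           # h = d_h * exp(t_h)
    # convert center form to corner form (x1, y1, x2, y2)
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
```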
In module C, the RFS-based GM-PHD head tracking module, a Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter from the Random Finite Set (RFS) family of filtering algorithms realizes head tracking on the video data. The following submodules implement the algorithm at time k+1.
the Gaussian element parameter of the posterior PHD at the k moment is assumed to be
Figure BDA0002988664720000071
The Gaussian element parameter of the PFS posterior PHD of the new target at the k +1 moment is
Figure BDA0002988664720000072
The Gaussian element parameter of the incubation target PFS posterior PHD is
Figure BDA0002988664720000073
Wherein the content of the first and second substances,
Figure BDA0002988664720000074
is the mean of the ith Gaussian at time k;
Figure BDA0002988664720000075
is the weight of the ith Gaussian element at the moment k;
Figure BDA0002988664720000076
is the covariance of the ith gaussian element at time k; j. the design is a squarekIs the number of gaussian elements of the check PHD after time k.
C1, prediction submodule:
(1) New-birth target prediction: the Gaussian parameters of the new-birth target PHD at time k+1 are taken directly as the predicted PHD parameters. With $J_{\gamma,k+1}$ new-birth targets, for $i = 1, \ldots, J_{\gamma,k+1}$:

$$w_{k+1|k}^{(i)} = w_{\gamma,k+1}^{(i)}, \qquad m_{k+1|k}^{(i)} = m_{\gamma,k+1}^{(i)}, \qquad P_{k+1|k}^{(i)} = P_{\gamma,k+1}^{(i)}$$

(2) Spawned target prediction: with $J_{\beta,k+1}$ spawned targets at time k+1, for $j = 1, \ldots, J_{\beta,k+1}$, $l = 1, \ldots, J_k$ and $i = J_{\gamma,k+1}+1, \ldots, J_{\gamma,k+1}+J_k J_{\beta,k+1}$:

$$w_{k+1|k}^{(i)} = w_k^{(l)}\, w_{\beta,k+1}^{(j)}$$

$$m_{k+1|k}^{(i)} = F_{\beta,k}^{(j)} m_k^{(l)} + d_{\beta,k}^{(j)}$$

$$P_{k+1|k}^{(i)} = Q_{\beta,k}^{(j)} + F_{\beta,k}^{(j)} P_k^{(l)} \left(F_{\beta,k}^{(j)}\right)^{T}$$

where $F_{\beta,k}^{(j)}$ is the state transition matrix, $Q_{\beta,k}^{(j)}$ is the spawned-target state noise covariance, $w_{\beta,k+1}^{(j)}$ is the weight of the jth spawn component at time k, $m_{k+1|k}^{(i)}$ is the predicted mean of the ith Gaussian component at time k+1, and $P_{k+1|k}^{(i)}$ is the predicted covariance of the ith Gaussian component from time k to k+1.

(3) Survival target prediction: let the survival probability be $p_S$; for $j = 1, \ldots, J_k$ and $i = J_{\gamma,k+1}+J_k J_{\beta,k+1}+1, \ldots, J_{\gamma,k+1}+J_k J_{\beta,k+1}+J_k$:

$$w_{k+1|k}^{(i)} = p_S\, w_k^{(j)}, \qquad m_{k+1|k}^{(i)} = F_k m_k^{(j)}, \qquad P_{k+1|k}^{(i)} = Q_k + F_k P_k^{(j)} F_k^{T}$$
C2, update submodule: the predicted components are updated with the measurement random set $Z_{k+1}$, the observation matrix H and the measurement noise covariance R. Let the detection probability be $p_D$ and $J_{k+1|k} = J_{\gamma,k+1}+J_k J_{\beta,k+1}+J_k$.

Undetected head targets are updated by, for $i = 1, \ldots, J_{k+1|k}$:

$$w_{k+1}^{(i)} = (1 - p_D)\, w_{k+1|k}^{(i)}, \qquad m_{k+1}^{(i)} = m_{k+1|k}^{(i)}, \qquad P_{k+1}^{(i)} = P_{k+1|k}^{(i)}$$

For detected head targets, the centroid coordinates produced by the head detection module serve as the measurement random set $Z_{k+1}$ for updating the PHD. For each $z \in Z_{k+1}$ and $i = 1, \ldots, J_{k+1|k}$:

$$w_{k+1}^{(i)}(z) = \frac{p_D\, w_{k+1|k}^{(i)}\, q^{(i)}(z)}{\kappa_{k+1}(z) + p_D \sum_{l=1}^{J_{k+1|k}} w_{k+1|k}^{(l)}\, q^{(l)}(z)}, \qquad q^{(i)}(z) = N\!\left(z;\; H m_{k+1|k}^{(i)},\; R + H P_{k+1|k}^{(i)} H^{T}\right)$$

$$m_{k+1}^{(i)}(z) = m_{k+1|k}^{(i)} + K^{(i)}\left(z - H m_{k+1|k}^{(i)}\right)$$

$$P_{k+1}^{(i)} = \left(I - K^{(i)} H\right) P_{k+1|k}^{(i)}, \qquad K^{(i)} = P_{k+1|k}^{(i)} H^{T}\left(H P_{k+1|k}^{(i)} H^{T} + R\right)^{-1}$$

where $\kappa_k(z)$ is the intensity of the Poisson-distributed clutter RFS and $N(\mu, \sigma)$ denotes a Gaussian density with mean μ and variance σ.
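A corresponding sketch of the update step, with the same (w, m, P) component representation; `multivariate_normal` evaluates the likelihood $q^{(i)}(z)$ defined above (all parameter names are assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmphd_update(predicted, Z, H, R, p_d, kappa):
    """One GM-PHD update; Z holds head-centroid measurements from the detector."""
    updated = [((1 - p_d) * w, m, P) for w, m, P in predicted]  # missed detections
    pre = []
    for w, m, P in predicted:                  # per-component Kalman quantities
        S = H @ P @ H.T + R                    # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        pre.append((w, m, P, S, K))
    for z in Z:
        likes = [w * multivariate_normal.pdf(z, H @ m, S) for w, m, P, S, K in pre]
        denom = kappa + p_d * sum(likes)       # clutter + total detection mass
        for (w, m, P, S, K), wq in zip(pre, likes):
            updated.append((p_d * wq / denom,
                            m + K @ (z - H @ m),
                            (np.eye(len(m)) - K @ H) @ P))
    return updated
```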
C3, trimming submodule: in order to reduce clutter and improve algorithm speed, the updated Gaussian element parameters need to be pruned, similar Gaussian elements are combined, and the Gaussian element with the minimum weight is pruned;
C4, state extraction submodule: extracts the expected values of the Gaussian components whose weights exceed the threshold;
C5, track identification submodule: FIG. 2 shows the operation flow of the auction-based track identification algorithm. (1) Extract color histograms of each confirmed target state from the previous frame and of each state-estimate region in the current frame, compute the cost function matching every confirmed state with every current estimated state region, and generate the association matrix; (2) initialize a track price of 0 for every state estimate not yet successfully assigned; (3) perform "optimal track" matching; (4) judge the result: if all matches succeed, output the tracks; otherwise, remove the last cycle's matching result, update the track prices, and repeat the optimal track matching until all matches succeed. In the invention, the color histogram describes the proportion of each color in the whole image region; the cost function computes the association cost of two states; the association matrix is a two-dimensional matrix of association costs, obtained by traversing all current-frame state estimates against all previous-frame confirmed states; and the "optimal track" is the motion track of the target.
Because the auction-based track identification algorithm applies only to state-estimate random sets whose number of targets is constant, a correction mechanism must be added when tracking head targets in boarding and alighting video so that correct tracks are obtained. The correction mechanism works as follows. Let N(k+1) be the number of state estimates in the current frame's random set and L(k) the number of confirmed estimates in the previous frame. If N(k+1) ≥ L(k), the track identification algorithm generates tracks for L(k) head targets; the remaining N(k+1) − L(k) state estimates are matched with the known new-birth target states by the similarity of the Bhattacharyya distance between their color histogram features, the successfully matched estimates become the tracks of new targets in the current frame, and the rest are deleted as clutter. If N(k+1) < L(k), the algorithm generates N(k+1) tracks, representing the disappearance of head targets.
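A sketch of the association step under stated assumptions: OpenCV's histogram comparison supplies the Bhattacharyya cost, and scipy's Hungarian solver stands in for the auction iteration (both yield an optimal assignment; the auction method reaches it through the price updates described above):

```python
import cv2
import numpy as np
from scipy.optimize import linear_sum_assignment

def color_hist(patch, bins=16):
    """L1-normalized hue histogram of a BGR image patch."""
    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    h = cv2.calcHist([hsv], [0], None, [bins], [0, 180])
    return cv2.normalize(h, h, 1.0, 0.0, cv2.NORM_L1).flatten()

def associate(prev_patches, curr_patches):
    """Association matrix from Bhattacharyya distances, then optimal matching."""
    cost = np.zeros((len(prev_patches), len(curr_patches)))
    for i, a in enumerate(prev_patches):
        for j, b in enumerate(curr_patches):
            cost[i, j] = cv2.compareHist(color_hist(a), color_hist(b),
                                         cv2.HISTCMP_BHATTACHARYYA)
    rows, cols = linear_sum_assignment(cost)   # one-to-one minimum-cost matching
    return list(zip(rows.tolist(), cols.tolist()))
```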
The prediction submodule is connected to the update submodule and transmits the predicted Gaussian parameters; the update submodule is connected to the pruning submodule and transmits the updated Gaussian parameters; the pruning submodule is connected to the state extraction submodule and transmits the merged and pruned Gaussian parameters; the state extraction submodule is connected to the track identification submodule and transmits the random sets of target state estimates and target number estimates; and the track identification submodule is connected to the people counting output module and transmits the motion track information of detected heads.
In module D, the people counting output module is implemented as shown in the flow chart of FIG. 3. For boarding, a simpler line-crossing crowd counting method reduces system complexity and improves efficiency; for alighting, a region-crossing crowd counting method with three regions of interest adapts the counting to passengers of different heights and improves accuracy.
Counting boarding passengers: first a decision line is defined from the head-motion boundary of boarding passengers in historical video data; whether a detected target moves downward across the line is judged from the motion track information transmitted by the RFS-based GM-PHD head tracking module; if the target crosses the line, the boarding count is incremented by one; if it does not, the passenger has not boarded and is not counted.
Counting alighting passengers: two horizontal cuts first divide the video frame into three regions of interest, named region I, region II and region III from top to bottom, easing the judgment for passengers of different heights. The region a passenger occupies is judged from the current position, and the movement displacement is obtained: if the passenger is in region I, the displacement is taken directly from the motion track information as N pixels; if in region II, the M pixels moved within region II are obtained from the motion track information and it is judged whether region I has been reached: if so, the displacement is updated to the N pixels moved in region I; if not, the displacement is M pixels; if the target is in region III, the counting judgment waits until it enters region II. Finally, false detections are judged: if the movement displacement (N or M) exceeds 60 pixels, the alighting count is incremented; if it is 60 pixels or less, the passenger has not alighted and is not counted.
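A sketch of both counting rules under illustrative geometric assumptions: image y grows downward, a track is a list of head-centroid (x, y) points, and regions I, II, III are split by two horizontal cuts y1 < y2 with region I on top:

```python
def count_boarding(track, line_y):
    """Boarding: the head track crosses the decision line moving downward."""
    return track[0][1] < line_y <= track[-1][1]

def count_alighting(track, y1, y2, min_disp=60):
    """Alighting: displacement measured in region I if reached, else region II;
    only displacements above 60 pixels count, filtering false detections."""
    in_i = [p for p in track if p[1] < y1]            # points in region I
    in_ii = [p for p in track if y1 <= p[1] < y2]     # points in region II
    if in_i:
        disp = abs(in_i[-1][1] - in_i[0][1])          # N pixels moved in region I
    elif in_ii:
        disp = abs(in_ii[-1][1] - in_ii[0][1])        # M pixels moved in region II
    else:
        return False                                  # still in region III
    return disp > min_disp
```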
The stops are numbered in sequence, and the above operations yield the number of boarding passengers n(t) and alighting passengers m(t) at stop t. Let X(t) be the number of passengers on the bus after it passes stop t; then:
X(t) = X(t-1) + n(t) - m(t),  t = 1, 2, 3, …,  X(0) = 0
While the invention has been described with reference to specific embodiments, it is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. The protection scope of the invention shall therefore be subject to the claims.

Claims (10)

1. An RFS-based bus passenger number detection system is characterized by comprising:
the video data acquisition and preprocessing module, which films the boarding and alighting door areas with a vehicle-mounted camera device, acquires video data of passengers getting on and off, and preprocesses the video data;
the head detection module, which performs passenger head detection on the preprocessed video data with an SSD deep convolutional neural network algorithm;
the RFS-based GM-PHD head tracking module, which tracks passenger heads in the video data with an RFS-based GM-PHD filtering algorithm;
and the people counting output module, which obtains motion track information from the detected passenger head information and counts the number of passengers from the motion track information.
2. The RFS-based bus passenger number detection system as claimed in claim 1, wherein said video data acquisition and pre-processing module comprises:
a boarding and alighting video acquisition submodule, which films the door areas with the vehicle-mounted camera device to acquire video data, obtains the numbers of passengers getting on and off when the bus arrives at a stop by detecting people in the boarding and alighting video data, and wirelessly transmits the video data to the video data preprocessing submodule;
and the video data preprocessing submodule is used for performing frame cutting processing on the video and standardizing the processed data.
3. The RFS-based bus passenger number detection system as claimed in claim 1, wherein the head detection module comprises:
the training submodule is used for marking the picture sequence of the video data processed by the video data acquisition and preprocessing module and training the SSD deep convolutional neural network by using the marked picture sequence;
and the prediction sub-module is used for inputting the video data to be detected into the trained SSD deep convolutional neural network, detecting the head of a passenger in the picture sequence to be detected, and transmitting the centroid position information of the detected head of the passenger to the GM-PHD head tracking module based on the RFS.
4. The RFS-based bus passenger number detection system as claimed in claim 3, wherein said RFS-based GM-PHD head tracking module comprises:
a prediction submodule, which performs new-birth target prediction, spawned target prediction and survival target prediction in sequence from the centroid position information of passenger heads;
an update submodule, which updates the Gaussian component parameters using the measurement set, the observation matrix and the measurement noise;
a pruning submodule, which prunes the updated Gaussian component parameters, merging similar Gaussian components and discarding the components with the smallest weights;
a state extraction submodule, which extracts the expected values of the Gaussian components whose weights exceed a threshold;
and a track identification submodule, which detects head movement with an auction-based track identification algorithm and outputs the track information of passenger heads.
5. The RFS-based bus passenger number detection system according to claim 1, wherein the people counting output module counts the number of boarding passengers with a line-crossing crowd counting method and the number of alighting passengers with a region-crossing crowd counting method.
6. The RFS-based bus passenger number detection system as claimed in claim 3, wherein said training sub-module comprises:
the prior frame matching unit, which finds for each real target the prior frame with the maximum IOU (intersection over union), guaranteeing that each real target corresponds to at least one prior frame, then tries to match each remaining unmatched prior frame with the real targets, matching whenever the IOU exceeds a threshold; the real targets are passenger heads;
a loss function selection unit that calculates a weighted sum of the position error and the confidence error;
the data augmentation unit is used for augmenting the data by a data augmentation method;
a fine adjustment unit: based on the Hole algorithm, fine tuning is carried out on the model trained by the training submodule, and a network structure is changed to obtain a denser score map;
and a filtering unit for deleting wrong, overlapped and inaccurate bounding boxes based on the NMS algorithm.
7. The RFS-based bus passenger number detection system according to claim 3, wherein said prediction submodule comprises:
the prediction frame filtering unit, which determines each prediction frame's category from the maximum category confidence, filters out the frames classified as background, and then filters out the frames whose confidence falls below the confidence threshold;
the prediction frame decoding unit, which decodes the remaining prediction frames, obtains their true position parameters from the prior frames, sorts them in descending order of confidence and keeps the first k prediction frames;
and the filtering unit, which filters out heavily overlapping prediction frames based on the NMS algorithm and takes the remaining frames as the detection result.
8. The RFS-based bus passenger number detection system as claimed in claim 4, wherein the track identification submodule detects head movement with an auction-based track identification algorithm optimized by a correction mechanism.
9. The RFS-based bus passenger number detection system as claimed in claim 5, wherein the people counting output module counts boarding passengers with the line-crossing crowd counting method as follows:
first a decision line is determined; whether a detected target moves downward across the line is judged from the motion track information transmitted by the RFS-based GM-PHD head tracking module; if the target crosses the line, the boarding count is incremented by one; if it does not, the passenger is judged not to have boarded and is not counted.
10. The RFS-based bus passenger number detection system as claimed in claim 5, wherein the people counting output module counts alighting passengers with the region-crossing crowd counting method as follows:
1) divide the video frame from top to bottom into three regions of interest I, II and III, adapting to passengers of different heights; judge which region a passenger occupies from the current position, and obtain the passenger's movement displacement;
2) if the passenger is in region I, take the movement displacement directly from the motion track information as N pixels; if in region II, obtain from the motion track information the M pixels moved within region II, then judge whether region I has been reached: if so, update the displacement to the N pixels moved in region I; if not, the displacement is M pixels; if the target is in region III, perform the counting judgment only after it enters region II; finally, judge false detections: if the movement displacement (N or M) exceeds 60 pixels, increment the alighting count; if it is 60 pixels or less, judge that the passenger has not alighted and do not count.
CN202110308023.9A 2021-03-23 2021-03-23 Bus passenger number detection system based on RFS Active CN112991399B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110308023.9A CN112991399B (en) 2021-03-23 2021-03-23 Bus passenger number detection system based on RFS

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110308023.9A CN112991399B (en) 2021-03-23 2021-03-23 Bus passenger number detection system based on RFS

Publications (2)

Publication Number Publication Date
CN112991399A true CN112991399A (en) 2021-06-18
CN112991399B CN112991399B (en) 2022-08-23

Family

ID=76333096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110308023.9A Active CN112991399B (en) 2021-03-23 2021-03-23 Bus passenger number detection system based on RFS

Country Status (1)

Country Link
CN (1) CN112991399B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104637070A (en) * 2014-12-15 2015-05-20 江南大学 Probability hypothesis density based variable target number video tracking algorithm
CN104835323A (en) * 2015-05-19 2015-08-12 银江股份有限公司 Multi-target public transport passenger flow detection method combining with electronic fence
CN108805252A (en) * 2017-04-28 2018-11-13 西门子(中国)有限公司 A kind of passenger's method of counting, device and system
CN108446611A (en) * 2018-03-06 2018-08-24 深圳市图敏智能视频股份有限公司 A kind of associated binocular image bus passenger flow computational methods of vehicle door status
CN109325404A (en) * 2018-08-07 2019-02-12 长安大学 A kind of demographic method under public transport scene
CN111881749A (en) * 2020-06-24 2020-11-03 北京工业大学 Bidirectional pedestrian flow statistical method based on RGB-D multi-modal data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张雪芹等: "基于深度学习的驾驶场景关键目标检测与提取", 《华东理工大学学报(自然科学版)》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926422A (en) * 2022-05-11 2022-08-19 西南交通大学 Method and system for detecting boarding and alighting passenger flow
CN116895047A (en) * 2023-07-24 2023-10-17 北京全景优图科技有限公司 Rapid people flow monitoring method and system
CN116895047B (en) * 2023-07-24 2024-01-30 北京全景优图科技有限公司 Rapid people flow monitoring method and system

Also Published As

Publication number Publication date
CN112991399B (en) 2022-08-23

Similar Documents

Publication Publication Date Title
CN110487562B (en) Driveway keeping capacity detection system and method for unmanned driving
CN110796168B (en) Vehicle detection method based on improved YOLOv3
JP6570731B2 (en) Method and system for calculating passenger congestion
CN111401144B (en) Escalator passenger behavior identification method based on video monitoring
CN112991399B (en) Bus passenger number detection system based on RFS
CN114299417A (en) Multi-target tracking method based on radar-vision fusion
CN105844229A (en) Method and system for calculating passenger crowdedness degree
CN110222767B (en) Three-dimensional point cloud classification method based on nested neural network and grid map
CN110633671A (en) Bus passenger flow real-time statistical method based on depth image
CN114023062A (en) Traffic flow information monitoring method based on deep learning and edge calculation
CN107145819A (en) A kind of bus crowding determines method and apparatus
CN115861772A (en) Multi-scale single-stage target detection method based on RetinaNet
CN111738114A (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN114926422B (en) Method and system for detecting passenger flow of getting on and off vehicles
CN105930855A (en) Vehicle detection method based on deep convolution neural network
CN111540203B (en) Method for adjusting green light passing time based on fast-RCNN
CN113034378A (en) Method for distinguishing electric automobile from fuel automobile
CN103679128B (en) A kind of Aircraft Targets detection method of anti-interference of clouds
CN103605960B (en) A kind of method for identifying traffic status merged based on different focal video image
CN112668441B (en) Satellite remote sensing image airplane target identification method combined with priori knowledge
CN117292322A (en) Deep learning-based personnel flow detection method and system
CN117218855A (en) Method and system for evaluating side-impact accident risk
CN112085101A (en) High-performance and high-reliability environment fusion sensing method and system
CN108985197A (en) The automatic testing method of taximan's cigarette smoking based on more algorithm fusions
CN114664080B (en) Intersection bus signal priority effect evaluation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant