CN114092866A - Method for predicting space-time distribution of airport passenger flow - Google Patents

Method for predicting space-time distribution of airport passenger flow

Info

Publication number
CN114092866A
CN114092866A (application CN202010754090.9A / CN202010754090A)
Authority
CN
China
Prior art keywords
passenger flow
time
space
people
predicting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010754090.9A
Other languages
Chinese (zh)
Inventor
万竞军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hongqiao Hub Transportation Center Construction And Development Co ltd
Original Assignee
Shanghai Hongqiao Hub Transportation Center Construction And Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hongqiao Hub Transportation Center Construction And Development Co ltd
Priority to CN202010754090.9A
Publication of CN114092866A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for predicting the space-time distribution of airport passenger flow. At the data end, people are counted from video shot by security cameras: the original video record is obtained from each camera, and a video crowd counting algorithm extracts the number of passengers in the video, which is stored at the algorithm end as the passenger flow of a certain area at a certain time point. Historical passenger flow data are then modeled with a spatio-temporal graph convolution model, and finally the space-time distribution of future passenger flow is predicted. The system is simple and convenient to use: counting people from security-camera video at the data end replaces wifi probe data; at the algorithm end, to avoid manually constructing a large number of features, data modeling introduces the mainstream method of traffic network flow prediction, namely graph convolutional network deep learning, which can automatically extract and combine temporal and spatial features, improve the prediction capability, and finally predict the space-time distribution of future passenger flow.

Description

Method for predicting space-time distribution of airport passenger flow
Technical Field
The invention relates to the field of computer vision, in particular to a method for predicting space-time distribution of airport passenger flow.
Background
Existing airport passenger flow volume prediction schemes can be divided into two broad categories: dynamic modeling simulation and data-driven modeling.
Dynamic modeling starts from the generation process of the passenger flow. It generally assumes that the travel or inter-arrival times of people moving between positions in the terminal building, such as departure gates and security checkpoints, follow some random distribution or can be described by a certain difference equation; suitable distribution parameters are estimated from historical data, and the passenger flow is then simulated. G. Kovacs et al. (2012) abstract the check-in counters, security check, waiting areas, gates and other areas of an airport into nodes of a directed graph, model the transfer of people among them with a store-and-forward model and differential equations, and finally obtain the passenger flow distribution by simulation. Because such models require simplifying theoretical assumptions, they often do not match the actual situation, the prediction effect is difficult to guarantee, and they can only be applied to small areas, generally only for the check-in or arrival links; for example, F. Jaweb (2018) models passenger waiting time with a stochastic process and a queuing theory model to evaluate the service level of an airport. Dynamic modeling requires complex mathematical tools and domain knowledge, so as data storage and processing technologies develop, researchers increasingly prefer data-driven modeling schemes.
The data-driven modeling scheme treats the passenger flow volume as a time series and performs modeling analysis without considering the specific generation process of the passenger flow. Typical methods fall into two classes: conventional time-series modeling methods and machine learning methods. A typical time-series modeling method is the Autoregressive Integrated Moving Average (ARIMA) model and its variants. Such linear models exploit the correlation of passenger flow over similar time periods for modeling and prediction. One 2018 study established an ARIMA model on monthly airport passenger flow data from January 2008 to December 2016 for short-term prediction of airport passenger throughput. The typical representative among machine learning models is the artificial neural network: through the activation applied by hidden-layer neurons to the input, a nonlinear component is introduced into the model, so that it can fit complex nonlinear data well. Liao et al. (2002) analyzed and predicted annual throughput for the years from 1970 onward; Wang Cui (2008) likewise predicted throughput by combining grey theory with neural networks.
In the above, whether the traditional ARIMA model or the neural network model, the aim is to predict monthly and annual passenger flow data, which is not suitable for short-term prediction, because such aggregated data fluctuate relatively little, while passenger flow over short periods such as 5 or 15 minutes is greatly influenced by random factors. In the last two years, research has shifted toward short-term prediction of passenger flow in different regions of airports; the Gradient Boosting Decision Tree (GBDT) based on feature extraction has become mainstream, and the data used have changed from monthly and annual data to real-time wifi probe data. Wifi probe data means that when passengers connect to the wireless network (wifi) in an airport, every connected device can be located and counted on the server side, so the number of people connected to wifi at different times and places can reflect the passenger flow distribution inside the airport. When GBDT is used for data modeling, temporal and spatial features need to be extracted manually. Temporal features capture the relationship or periodicity between the number of connected people at a certain point in time and the number of connected people over a past period; typical temporal features are the number of connected people (mean, maximum, minimum, standard deviation) over the past 1 hour / 1 day / 2 days / 3 days / 1 week / 2 weeks / 1 month. Spatial features arise because, due to the movement of passengers inside the airport, the number of connected people at a certain place at a certain time has some correlation with the past number of connected people at adjacent places; a typical spatial feature is the past number of connected people at adjacent places. In addition, information on aircraft take-offs and landings can also enter the model as features. Because the GBDT model based on feature extraction pays attention to the space-time distribution of passenger flow for the first time, it performs well on short-term prediction. For example, a 2017 study analyzed the space-time distribution of Guangzhou Baiyun airport passenger flow using GBDT.
Traditional ARIMA or neural network models tend to be suitable only for coarse-grained traffic data predictions, such as daily, monthly and yearly data, and they do not take into account spatial information inside airports.
The GBDT model based on temporal and spatial features improves on this point by manually constructing features, but its effect is greatly influenced by those features, and considerable manpower and resources are needed to find out which features in the data benefit passenger flow prediction. When extracting spatial features, this method requires a small number of spatial divisions, for example dividing by floor of the terminal building; when the number of divided regions is too large, the number of extracted spatial features becomes too large, the model prediction effect deteriorates, the required computation time grows greatly, and a curse-of-dimensionality situation occurs.
In addition, current models are built on statistical data from wifi probes in airports; however, not every passenger connects to wifi. A 2018 study of September 2016 data from Guangzhou Baiyun airport found that there is no strong correlation between the number of wifi connections and the number of passengers, as shown in FIG. 1.
as shown in FIG. 1, it can be found that the number of passengers and the number of devices connected with wifi all present a certain periodicity every day, the number of passengers and the area where the peak value of the number of devices connected with wifi appears are close, but the proportion of the number of passengers and the number of devices connected with wifi appears differently in different time periods every day, for example, the number of passengers often floats 2-3 times the number of devices connected with wifi when the number of passengers reaches the peak value, and the proportion is less than 1 in the valley (because part of the devices connected with wifi comes from airport staff), because of the floating characteristic of the proportion, the number of people connected with wifi is predicted first, and then the number of people in each area at the predicted time (patent application No. 201811385401.8) is obtained according to the mapping of the predicted number of the connection terminals in each area and the proportion of the number of the real people, and the result is not accurate. Therefore, improvements in the prior art are needed.
Disclosure of Invention
In order to overcome the defects of the prior art, a method for predicting the space-time distribution of airport passenger flow is provided, which can effectively predict the number of passengers in each region in subsequent time periods.
In order to achieve the above purpose, the invention provides a method for predicting the space-time distribution of airport passenger flow: at the data end, people are counted from video shot by security cameras, the original video record is obtained from each camera, and a video crowd counting algorithm extracts the number of passengers in the video as the passenger flow of a certain area at a certain time point, which is stored at the algorithm end; historical passenger flow data are modeled with a spatio-temporal graph convolution model, and finally the space-time distribution of future passenger flow is predicted.
According to the airport passenger flow space-time distribution prediction method, the crowd counting method comprises area counting and line counting.
According to the method for predicting the space-time distribution of the airport passenger flow, two schemes are provided for people counting: the first counts detected human bodies based on target detection to obtain the predicted value; the second treats counting as a regression problem, marks the positions of human heads, establishes a mapping from picture pixels to the number of people, and performs regression with the extracted features so that the number of people corresponding to the regressed pixels is close to the true value.
In the second scheme, the head position of each person in the picture is identified by a cross symbol, a mapping from picture pixels to the corresponding number of people is established according to the marked positions, and the number of people is counted indirectly through a density map. For each head x_i marked in the image, its distances to the k nearest heads are denoted d_{i1}, d_{i2}, ..., d_{ik}, and the pixels associated with x_i correspond to an area on the ground in the scene whose radius is approximately proportional to the average distance d̄_i = (1/k) Σ_{j=1}^{k} d_{ij}. The density F is written as

F(x) = Σ_{i=1}^{M} δ(x − x_i) * G_{σ_i}(x),  with σ_i = β · d̄_i,

wherein M represents the total number of people in the labeled picture, G_{σ_i}(x) represents a Gaussian kernel with variance σ_i, δ(·) is the Dirac function, and β represents a scaling parameter with an empirical value of 0.3.
In the method for predicting the space-time distribution of the airport passenger flow, the spatio-temporal graph convolution model is realized with a residual network built from residual modules, and the output of the residual network can be represented by the following formulas:

y_l = x_l + F(x_l, W_l)
x_{l+1} = f(y_l)

wherein x_l and x_{l+1} represent the input of the current module and of the next module, y_l represents the output of the current module, and W_l represents all parameters of the current module; a fitted density map is obtained by the method described above.
According to the method for predicting the space-time distribution of the airport passenger flow, the network maps the input picture to a fitted density map; the closer the fitted density map is to the true one, the better the model works. The Euclidean distance is used to measure the distance between density maps, so the optimization target of the residual network is:

L(Θ) = (1/2N) Σ_{i=1}^{N} ||F(X_i; Θ) − F_i||²

where Θ denotes the parameters of the neural network, N is the number of samples, F(X_i; Θ) represents the fitted density map pixels and F_i represents the true density map pixels. After the predicted density map is obtained, it must be converted into the corresponding number of people; since the Gaussian kernel transform is used, it suffices to sum the density map pixels directly, that is

Ĉ_i = Σ_{pixels} F(X_i; Θ)

where Ĉ_i is the predicted number of people and F(X_i; Θ) is the density map estimate corresponding to the i-th picture.
In order to map the extracted temporal and spatial features to the predicted passenger flow, the output of the last spatio-temporal convolution block is passed into an output layer consisting of two temporal convolutions and a fully connected layer; the output of this layer is the prediction one step into the future, which is combined with the original data to reconstruct the model input, from which the model produces the prediction of the next step, and by analogy the predictions of multiple future steps are obtained.
Compared with the prior art, the technical scheme adopted by the invention has the beneficial effects that:
the system is simple and convenient to use, and people can be counted by shooting videos of the security camera at the data end, so that wifi probe data is replaced; at an algorithm end, in order to avoid manually constructing a large number of features, data modeling is carried out by introducing a mainstream method of traffic network flow prediction, namely graph volume network deep learning, and the method can automatically extract and combine features on time and space, improve the prediction capability and finally predict the time-space distribution of future passenger flow.
Drawings
FIG. 1 is a schematic diagram of the number of departing passengers and the number of wifi-connected devices at an airport in September 2016 in the prior art;
FIG. 2 is a diagram of the basic module of the residual error network of the present invention;
FIG. 3 is a schematic view of passenger flow data according to the present invention;
FIG. 4 is a schematic view of the overall structure of the model.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
The drawings attached to this specification and the structures, ratios and sizes depicted therein are only used to match the disclosure of the specification so that those skilled in the art can understand and read it; they do not limit the conditions under which the invention can be implemented and thus carry no substantive technical meaning. Any structural modification, change of ratio or adjustment of size that does not affect the efficacy or purpose of the invention still falls within the scope of this disclosure. In addition, terms such as "upper", "lower", "left", "right", "middle" and "one" used in this specification are for clarity of description only and are not intended to limit the scope of the invention; changes or adjustments of their relative relationships, without substantive change of the technical content, are also regarded as within the scope in which the invention can be implemented.
The embodiment of the invention discloses a method for predicting the space-time distribution of airport passenger flow: at the data end, people are counted from video shot by security cameras, the original video record is obtained from each camera, and a video crowd counting algorithm extracts the number of passengers in the video as the passenger flow of a certain area at a certain time point, which is stored at the algorithm end; historical passenger flow data are modeled with a spatio-temporal graph convolution model, and finally the space-time distribution of future passenger flow is predicted.
People counting is a typical problem in the computer vision field; the task is to count people from pictures or videos. It can generally be divided into two types: area counting and line counting. Area counting solves the problem of counting the number of people in a whole region, while line counting is mainly concerned with the number of people crossing a user-defined line in space, for example in scenes of passengers boarding or alighting from buses. In this problem we care about how many passengers a certain area contains at a certain point in time, so it belongs to the area counting problem.
The first method is based on target detection: the human body or some characteristic parts of it (such as the head or shoulders) are detected directly, and the detected bodies are then counted to obtain the predicted value. The second solution treats counting as a regression problem: the positions of human heads are first marked, a mapping from picture pixels to the number of people is established, and regression is performed with the extracted features so that the number of people corresponding to the regressed pixels is as close as possible to the true value; this is the current mainstream solution.
There are many public research data sets for the crowd counting problem, covering annotation through to density maps, such as ShanghaiTech, Mall and UCF_CC_50. The head position of each person is identified by a cross symbol, and from the labeled positions a mapping from image pixels to the corresponding number of people can be established. Since large areas of an image contain no pedestrians, the mapping contains a large number of zero elements; to avoid the influence of this sparsity and keep the data continuous, we do not directly establish a mapping from pixels to the number of people, but instead use density maps to count the crowd indirectly.
Therefore, a method of converting the head annotations into a high-quality density map is adopted: a geometry-adaptive Gaussian kernel suitable for dense crowds. Each head x_i labeled in the image is represented by a Dirac function δ(x − x_i), and its distances to the k nearest heads are denoted d_{i1}, d_{i2}, ..., d_{ik}. The pixels associated with x_i correspond to an area on the ground in the scene whose radius is approximately proportional to the average distance d̄_i = (1/k) Σ_{j=1}^{k} d_{ij}. Therefore, to estimate the crowd density around pixel x_i, the Dirac function needs to be convolved with a Gaussian kernel whose variance σ_i is proportional to d̄_i, and the density F can be written as

F(x) = Σ_{i=1}^{M} δ(x − x_i) * G_{σ_i}(x),  with σ_i = β · d̄_i,

wherein M represents the total number of people in the labeled picture, G_{σ_i}(x) represents a Gaussian kernel with variance σ_i, and β represents a scaling parameter with an empirical value of 0.3.
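As an illustration of the geometry-adaptive kernel just described, the following is a minimal Python sketch (assuming NumPy and SciPy are available; all function and variable names are our own rather than the patent's) that converts a list of annotated head coordinates into a density map whose pixel sum approximates the head count.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import KDTree

def heads_to_density(points, shape, k=3, beta=0.3):
    """points: (M, 2) array of (row, col) head positions; shape: (H, W) of the image.
    Returns a density map whose pixels sum to approximately M."""
    density = np.zeros(shape, dtype=np.float32)
    if len(points) == 0:
        return density
    tree = KDTree(points)
    # query distances to the k nearest heads; the first column is the head itself (distance 0)
    dists, _ = tree.query(points, k=min(k + 1, len(points)))
    for (r, c), d in zip(points, dists):
        impulse = np.zeros(shape, dtype=np.float32)
        impulse[int(min(r, shape[0] - 1)), int(min(c, shape[1] - 1))] = 1.0
        # average distance to the k nearest other heads (arbitrary fallback for a single head)
        d_bar = d[1:].mean() if len(points) > 1 else np.mean(shape) / 4.0
        sigma = beta * d_bar          # geometry-adaptive kernel width, sigma_i = beta * d_bar_i
        density += gaussian_filter(impulse, sigma, mode='constant')
    return density
```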
Specifically, the spatio-temporal graph convolution model is realized through a residual network built from residual modules. As shown in FIG. 2, the input of a module first passes through a 1 × 1 convolutional layer that reduces the number of channels, then a 3 × 3 convolutional layer that extracts features, and finally a 1 × 1 convolutional layer that restores the number of channels, giving the output F(x_l) of the current layer, i.e., the residual. The original input and the residual are added and re-activated to give the final output x_{l+1} of the current layer. Arranging the three middle convolutional layers in this way greatly reduces the number of training parameters, and introducing the identity mapping alleviates the problem of vanishing gradients.
The general expression is:

y_l = x_l + F(x_l, W_l)
x_{l+1} = f(y_l)

wherein x_l and x_{l+1} represent the input of the current module and of the next module, y_l represents the output of the current module, and W_l represents all parameters of the current module.
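A minimal PyTorch sketch of the bottleneck residual module described above (1 × 1 reduce, 3 × 3 extract, 1 × 1 restore, identity shortcut); the class name, channel arguments and use of batch normalization are our assumptions rather than the patent's exact configuration.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual module: x_{l+1} = relu(x_l + F(x_l, W_l)), where F is
    1x1 conv (reduce channels) -> 3x3 conv -> 1x1 conv (restore channels)."""
    def __init__(self, channels, mid_channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, mid_channels, kernel_size=1),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = x + self.residual(x)   # y_l = x_l + F(x_l, W_l)
        return self.act(y)         # x_{l+1} = f(y_l)
```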
The training data are the pixel information of a picture, and the density map corresponding to the picture needs to be predicted; the model is expected to fully extract the information in the original picture so that the difference between the predicted density map and the real density map is as small as possible, achieving accurate prediction of the number of people. The model we use is a 101-layer residual network (ResNet-101). For this problem, the structure of the first four convolutional layer groups of the residual network is borrowed, and the fifth convolutional layer group and the output layer are modified. The features extracted by the first four convolutional layer groups are reduced in dimension by 1 × 1 convolutions in the fifth group and mapped to a single channel, and in the output layer a fitted density map is obtained from the feature map by upsampling. Table 1 compares the structure of the residual network and of the network in this scheme.
TABLE 1 (structure comparison of the original residual network and the network in this scheme; table images not reproduced)
Note: a module count of 3 indicates the stacking of 3 identical modules, i.e., 9 convolutional layers; 1 × 1 etc. indicates the size of the convolution kernel, and the following number, e.g., 128, indicates the number of output channels of the convolutional layer.
Since our network maps the input picture to a fitted density map, the closer the fitted density map is to the true one, the better the model works. FIG. 3 shows the structure of the whole model. The Euclidean distance is used to measure the distance between density maps, so the optimization goal of the residual network is:

L(Θ) = (1/2N) Σ_{i=1}^{N} ||F(X_i; Θ) − F_i||²

where Θ denotes the parameters of the neural network, N is the number of samples, F(X_i; Θ) represents the fitted density map pixels and F_i represents the true density map pixels.
After the predicted density map is obtained, it must be converted into the corresponding number of people; since the Gaussian kernel transform is used, it suffices to sum the density map pixels directly, that is

Ĉ_i = Σ_{pixels} F(X_i; Θ)

where Ĉ_i is the predicted number of people and F(X_i; Θ) is the density map estimate corresponding to the i-th picture.
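The head of the counting network and the two formulas above can be sketched as follows; the backbone is omitted, and the layer sizes, names and the bilinear upsampling choice are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf

class DensityHead(nn.Module):
    """Maps backbone features to a single-channel density map at the input resolution."""
    def __init__(self, in_channels):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, 1, kernel_size=1)  # 1x1 conv to one channel

    def forward(self, feats, out_size):
        density = self.reduce(feats)
        return nnf.interpolate(density, size=out_size, mode='bilinear', align_corners=False)

def density_loss(pred, target):
    # L(Theta) = 1/(2N) * sum_i || F(X_i; Theta) - F_i ||^2, estimated over the batch
    return 0.5 * ((pred - target) ** 2).sum(dim=(1, 2, 3)).mean()

def count_people(pred):
    # the density map integrates to the head count, so the predicted number of
    # people is simply the sum of its pixels
    return pred.sum(dim=(1, 2, 3))
```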
The above are all prediction methods for a single picture; for video, one frame needs to be taken out at regular intervals for people counting. Considering the walking speed of passengers, in this problem a scheme of taking one frame every 5 seconds is adopted, and passenger flow statistics are produced at a granularity of 5 seconds.
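A sketch of this frame-sampling step, assuming OpenCV is used to read the camera recordings; `count_people_in_frame` stands in for the crowd counting model described above.

```python
import cv2

def count_stream(video_path, count_people_in_frame, period_s=5):
    """Take one frame every `period_s` seconds and return (timestamp, count) pairs."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(int(round(fps * period_s)), 1)
    counts, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            counts.append((idx / fps, count_people_in_frame(frame)))
        idx += 1
    cap.release()
    return counts
```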
Passenger flow prediction algorithm based on graph convolution network
Using the people counting algorithm described in the previous section, the original video records can be converted into structured passenger flow data. As shown in fig. 4, each node in the graph represents the region corresponding to one camera, and an edge represents connectivity between camera regions, i.e., a passenger can walk from one node to the next; the graph thus represents the spatial distribution of the camera regions. v_t ∈ R^n is an n-dimensional vector (where n is the total number of camera regions) whose i-th component represents the number of passengers observed by the i-th camera at time t. The upper and lower levels in fig. 4 represent continuity in time: v_{t−M+1}, v_{t−M+2}, ..., v_t is the history from time t−M+1 to time t, i.e., the passenger flow data of the past M time periods. Because the cameras are not distributed at equal intervals, the distances between different cameras may differ, and some camera regions may not be directly reachable from another region; all this information is expressed by the weights between nodes, where w_ij is the connection weight between nodes i and j, forming W ∈ R^{n×n}, the weight matrix of G_t. The whole graph can be represented as G_t = (V_t, E, W), where V_t holds the attributes of the n vertices, corresponding to the n observation regions in the airport space; E is the edge set representing the connectivity between the regions; and W ∈ R^{n×n} is the connection matrix of G_t.
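The graph construction could be sketched as follows; the thresholded Gaussian kernel that turns pairwise walking distances into edge weights is one common choice for such weight matrices, not necessarily the one intended by the patent, and all names and parameter values are assumptions.

```python
import numpy as np

def build_weight_matrix(dist, connected, sigma2=10.0, eps=0.5):
    """dist: (n, n) walking distances between camera regions; connected: (n, n) boolean
    reachability (can a passenger walk directly from region i to region j?).
    Returns the weighted adjacency matrix W of the graph G_t = (V_t, E, W)."""
    w = np.exp(-dist ** 2 / sigma2)   # closer regions get larger weights
    w[~connected] = 0.0               # unreachable pairs carry no edge
    w[w < eps] = 0.0                  # drop very weak edges to keep W sparse
    np.fill_diagonal(w, 0.0)
    return w
```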
Passenger flow prediction can be regarded as a time series prediction problem, i.e., given the previous M observations, predict the most likely passenger flow over the next H timestamps:

v̂_{t+1}, ..., v̂_{t+H} = argmax_{v_{t+1}, ..., v_{t+H}} log P(v_{t+1}, ..., v_{t+H} | v_{t−M+1}, ..., v_t)
Owing to the way traffic flow forms, the passenger flow of a specific area is usually strongly correlated over adjacent times; in addition, because passengers move over a spatial topology, the passenger flows of adjacent areas also show related trends. The model therefore needs to capture the main temporal and spatial characteristics of the passenger flow distribution in order to improve the prediction effect.
The spatio-temporal graph convolution model comprises temporal convolution layers and spatio-temporal convolution blocks. The temporal convolution layer contains a one-dimensional convolution whose kernel width is K_t, followed by a gated linear unit (GLU) as activation. For each vertex of graph G, similar to a sliding window, the temporal convolution operates on K_t neighbors of the input sequence, so the sequence length is shortened by K_t − 1. The input to the temporal convolution at each vertex can therefore be viewed as a sequence of length M with C_i channels (i denoting the input, i.e., the number of channels of the input vector), written Y ∈ R^{M×C_i}. A convolution kernel Γ ∈ R^{K_t×C_i×2C_o} maps Y to the result [P Q] ∈ R^{(M−K_t+1)×2C_o}. The temporal gated convolution can be defined as:

Γ *_τ Y = P ⊙ σ(Q) ∈ R^{(M−K_t+1)×C_o}

wherein P and Q are the inputs of the GLU gates; for each of the C_o output channels (o denoting the output, i.e., the number of channels of the output vector, that is, how many different features are extracted) a different set of convolution kernel parameters W_i, b_i performs a linear transformation on the input Y; ⊙ represents the Hadamard (element-wise) product; σ denotes the sigmoid activation function commonly used in neural networks, and the nonlinear gate σ(Q) controls which information in P enters the model. Compared with directly applying an activation function to a linear transformation, this structure can mine more complex temporal features.
Similarly, by employing the same convolution kernel Γ on every node Y_i ∈ R^{M×C_i} of graph G (e.g., every monitored camera region), the temporal convolution can be generalized to three-dimensional tensors.
Because the temporal convolution only extracts traffic flow features along the time dimension and the spatial graph convolution only extracts features along the spatial dimension, they need to be stacked to realize complex combinations of simple features and thereby fit complex spatio-temporal distributions. To this end, we design spatio-temporal convolution blocks, each consisting of two temporal convolutions and one graph convolution, with the graph convolution module located between the two temporal convolution modules. This sandwich-like composition allows a bottleneck strategy: compared with the number of output channels of the first temporal convolution module, the number of output channels of the graph convolution module is much smaller, and the number of channels is restored in the second temporal convolution module, which greatly reduces the dimensionality to be fitted, facilitates training, and reduces the possibility of overfitting. We also add a batch normalization layer at the end of each spatio-temporal convolution block to make the distribution of parameters more stable and further reduce overfitting.
The input and output of a spatio-temporal convolution block are both 3-D tensors. For the l-th spatio-temporal convolution block, its input v^l ∈ R^{M×n×C^l} and output v^{l+1} ∈ R^{(M−2(K_t−1))×n×C^{l+1}} are calculated by the following equation:

v^{l+1} = Γ_1^l *_τ ReLU(Θ^l *_G (Γ_0^l *_τ v^l))

wherein Γ_0^l and Γ_1^l are the upper and lower temporal convolution kernels in the l-th spatio-temporal convolution block, Θ^l *_G is the graph convolution in the middle, and ReLU denotes the rectified linear unit activation function.
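A sketch of this "sandwich" spatio-temporal block, reusing the TemporalGatedConv sketch above; since the patent does not spell out the graph convolution variant, a simple first-order graph convolution (multiplication by a normalized adjacency matrix followed by a per-node linear map) is used here as a stand-in for Θ *_G, and all names are assumptions.

```python
import torch
import torch.nn as nn

class STConvBlock(nn.Module):
    """Temporal conv -> graph conv -> temporal conv -> batch norm.
    x: (batch, C, n_nodes, M); adj_norm: (n_nodes, n_nodes) normalized adjacency."""
    def __init__(self, c_in, c_mid, c_out, kt):
        super().__init__()
        self.t1 = TemporalGatedConv(c_in, c_mid, kt)        # bottleneck: c_mid < c_in
        self.graph_theta = nn.Conv2d(c_mid, c_mid, kernel_size=1)  # per-node linear map Theta
        self.t2 = TemporalGatedConv(c_mid, c_out, kt)       # restore the channel count
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x, adj_norm):
        h = self.t1(x)
        # first-order graph convolution: aggregate neighbours, then apply Theta
        h = torch.einsum('ij,bcjm->bcim', adj_norm, h)
        h = torch.relu(self.graph_theta(h))
        h = self.t2(h)
        return self.bn(h)
```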
To map the extracted temporal and spatial features to the predicted passenger flow, we pass the output of the last spatio-temporal convolution block into an output layer, which consists of two temporal convolutions and one fully connected layer. The output of this layer is the prediction one step into the future, v̂_{t+1}. This prediction is appended to the original data to form the new input sequence (v_{t−M+2}, ..., v_t, v̂_{t+1}), from which the model produces the next prediction, and by analogy predictions for multiple future steps are obtained.
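The output layer and the iterative multi-step scheme just described might look as follows; shapes, names and the way the horizon loop slides the input window are assumptions, with `model` standing for the full network including this output layer.

```python
import torch
import torch.nn as nn

class OutputLayer(nn.Module):
    """Maps the last ST block's output (batch, C, n_nodes, T) to a one-step prediction (batch, n_nodes)."""
    def __init__(self, c_in, t_len, kt):
        super().__init__()
        self.t1 = TemporalGatedConv(c_in, c_in, kt)
        self.t2 = TemporalGatedConv(c_in, c_in, t_len - kt + 1)  # collapses the time axis to 1
        self.fc = nn.Linear(c_in, 1)

    def forward(self, x):
        h = self.t2(self.t1(x)).squeeze(-1)             # (batch, C, n_nodes)
        return self.fc(h.transpose(1, 2)).squeeze(-1)   # (batch, n_nodes)

def predict_multi_step(model, history, horizon):
    """history: (batch, 1, n_nodes, M) past passenger counts; returns H future steps."""
    preds, window = [], history
    for _ in range(horizon):
        v_next = model(window)                            # one-step prediction v_hat_{t+1}
        preds.append(v_next)
        step = v_next.unsqueeze(1).unsqueeze(-1)          # (batch, 1, n_nodes, 1)
        window = torch.cat([window[..., 1:], step], dim=-1)  # slide the input window forward
    return torch.stack(preds, dim=-1)                     # (batch, n_nodes, H)
```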
The optimization objective, or loss function, of the entire model is:

L(v̂; W_θ) = Σ_t ||v̂(v_{t−M+1}, ..., v_t; W_θ) − v_{t+1}||²

wherein W_θ are all the trainable parameters of the model, v_{t+1} is the true future value, and v̂(·) represents the prediction of the model.
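Finally, a minimal training sketch for this optimization target, using mean squared error between the predicted and true next-step passenger counts; the optimizer choice, learning rate and data loader are assumptions.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=50, lr=1e-3):
    """loader yields (window, target): window (batch, 1, n_nodes, M), target (batch, n_nodes) = v_{t+1}."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for epoch in range(epochs):
        for window, target in loader:
            pred = model(window)          # v_hat_{t+1}
            loss = mse(pred, target)      # || v_hat - v_{t+1} ||^2
            opt.zero_grad()
            loss.backward()
            opt.step()
```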
The system is simple and convenient to use: counting people from security-camera video at the data end replaces wifi probe data. At the algorithm end, to avoid manually constructing a large number of features, data modeling introduces the mainstream method of traffic network flow prediction, namely graph convolutional network deep learning, which can automatically extract and combine temporal and spatial features, improve the prediction capability, and finally predict the space-time distribution of future passenger flow.
Specific embodiments of the invention have been described above. It is to be understood that the invention is not limited to the particular embodiments described above; devices and structures not described in detail are understood to be implemented in a manner common in the art. Various changes or modifications may be made by one skilled in the art within the scope of the claims without departing from the spirit of the invention, and these do not affect the substance of the invention.

Claims (7)

1. A method for predicting the spatial and temporal distribution of airport passenger flow, characterized by comprising the following steps: counting the number of people from video shot by security cameras at the data end, obtaining the original video record from each camera, extracting the number of people in the video with a video crowd counting algorithm as the passenger flow of a certain area at a certain time point and storing it at the algorithm end, modeling historical passenger flow data with a spatio-temporal graph convolution model, and finally predicting the space-time distribution of future passenger flow.
2. The method for predicting the spatial-temporal distribution of the passenger flow volume of the airport according to claim 1, characterized in that: the people counting method comprises two types of area counting and line counting.
3. The method for predicting the spatial-temporal distribution of the passenger flow volume of the airport according to claim 2, characterized in that: there are two schemes for crowd counting, the first counts the detected human bodies based on target detection to obtain the predicted value, and the second treats counting as a regression problem, marks the positions of human heads, establishes a mapping from picture pixels to the number of people, and performs regression with the extracted features so that the number of people corresponding to the regressed pixels is close to the true value.
4. The method for predicting the spatial-temporal distribution of the passenger flow volume of the airport according to claim 3, wherein: in the second scheme, the head position of each person in the picture is identified by a cross symbol, a mapping from picture pixels to the corresponding number of people is established according to the marked positions, and the number of people is counted indirectly through a density map; for each head x_i marked in the image, its distances to the k nearest heads are denoted d_{i1}, d_{i2}, ..., d_{ik}, and the pixels associated with x_i correspond to an area on the ground in the scene whose radius is approximately proportional to the average distance d̄_i = (1/k) Σ_{j=1}^{k} d_{ij}; the density F is written as

F(x) = Σ_{i=1}^{M} δ(x − x_i) * G_{σ_i}(x),  with σ_i = β · d̄_i,

wherein M represents the total number of people in the labeled picture, G_{σ_i}(x) represents a Gaussian kernel with variance σ_i, and β represents a scaling parameter with an empirical value of 0.3.
5. The method for predicting the spatial-temporal distribution of the passenger flow volume of the airport according to claim 1, characterized in that: the spatio-temporal graph convolution model is realized through a residual network built from residual modules, and the output of the residual network can be represented by the following formulas:

y_l = x_l + F(x_l, W_l)
x_{l+1} = f(y_l)

wherein x_l and x_{l+1} represent the input of the current module and of the next module, y_l represents the output of the current module, W_l represents all parameters of the current module, and a fitted density map is obtained by the method described above.
6. The method for predicting the spatial-temporal distribution of the passenger flow volume of the airport according to claim 5, wherein: the network maps the input picture to a fitted density map; the closer the fitted density map is to the true one, the better the model works; the Euclidean distance is used to measure the distance between density maps, so the optimization target of the residual network is:

L(Θ) = (1/2N) Σ_{i=1}^{N} ||F(X_i; Θ) − F_i||²

where Θ denotes the parameters of the neural network, N is the number of samples, F(X_i; Θ) represents the fitted density map pixels, and F_i represents the true density map pixels; after the predicted density map is obtained, it must be converted into the corresponding number of people, and since the Gaussian kernel transform is used, it suffices to sum the density map pixels directly, that is

Ĉ_i = Σ_{pixels} F(X_i; Θ)

wherein Ĉ_i is the predicted number of people and F(X_i; Θ) is the density map estimate corresponding to the i-th picture.
7. The method for predicting the spatial and temporal distribution of the passenger flow volume of the airport according to claim 6, wherein: the spatio-temporal graph convolution model comprises temporal convolution layers and spatio-temporal convolution layers; in order to map the extracted temporal and spatial features into the predicted passenger flow, the output of the last spatio-temporal convolution block is passed into an output layer, wherein the output layer consists of two temporal convolution layers and one fully connected layer; the output of the output layer is the predicted value one step into the future, which is combined with the original data to reconstruct the model input, from which the model produces the predicted value of the next step, and by analogy the predicted values of multiple future steps are obtained.
CN202010754090.9A 2020-07-30 2020-07-30 Method for predicting space-time distribution of airport passenger flow Pending CN114092866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010754090.9A CN114092866A (en) 2020-07-30 2020-07-30 Method for predicting space-time distribution of airport passenger flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010754090.9A CN114092866A (en) 2020-07-30 2020-07-30 Method for predicting space-time distribution of airport passenger flow

Publications (1)

Publication Number Publication Date
CN114092866A true CN114092866A (en) 2022-02-25

Family

ID=80295057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010754090.9A Pending CN114092866A (en) 2020-07-30 2020-07-30 Method for predicting space-time distribution of airport passenger flow

Country Status (1)

Country Link
CN (1) CN114092866A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114390079A (en) * 2022-03-24 2022-04-22 成都秦川物联网科技股份有限公司 Smart city public place management method and Internet of things system
CN114390079B (en) * 2022-03-24 2022-06-03 成都秦川物联网科技股份有限公司 Smart city public place management method and Internet of things system
US11868926B2 (en) 2022-03-24 2024-01-09 Chengdu Qinchuan Iot Technology Co., Ltd. Systems and methods for managing public place in smart city
CN117218604A (en) * 2023-11-07 2023-12-12 北京城建智控科技股份有限公司 Method and device for supplementing missing passenger flow information, electronic equipment and storage medium
CN117218604B (en) * 2023-11-07 2024-04-12 北京城建智控科技股份有限公司 Method and device for supplementing missing passenger flow information, electronic equipment and storage medium
CN117952287A (en) * 2024-03-27 2024-04-30 飞友科技有限公司 Prediction method and system for number of passengers in terminal building waiting area
CN117952287B (en) * 2024-03-27 2024-06-18 飞友科技有限公司 Prediction method and system for number of passengers in terminal building waiting area


Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220225