WO2023029234A1

WO2023029234A1 - Method for bus arrival time prediction when lacking data

Info

Publication number: WO2023029234A1
Application number: PCT/CN2021/132828
Authority: WO
Inventors: 马佳曼; 罗喜伶; 蒋淑园; 金晨
Original assignee: 北京航空航天大学杭州创新研究院
Priority date: 2021-09-01
Filing date: 2021-11-24
Publication date: 2023-03-09
Also published as: CN113470365B; CN113470365A

Abstract

A method for bus arrival time prediction when lacking data, comprising: finding important geographic locations by using a density-based clustering method; using the found geographic locations to represent a geographic structure of each route to form node information; according to geographic structure similarity, constructing weighted edge information provided with traffic importance so as to form a public transportation network graph; and according to the public transportation network graph, establishing multi-head spatial and temporal graph attention network prediction models to learn the correlation between bus routes from the perspective of space and time, wherein a spatial attention module is a multi-head graph attention module provided with a mask, and a temporal attention module is built by an LSTM layer and a transformer layer. The described method may effectively learn and predict the travel time for each route in a public transportation network, which may not only improve the accuracy of arrival time prediction for bus routes that have few current travel records, but also provide the estimated travel time for bus routes which are in the design stage and do not have historical records.

Description

A bus arrival time prediction method for missing data

Technical field

The invention relates to the field of bus arrival time prediction, and in particular to a bus arrival time prediction method for missing data, which provides accurate arrival time forecasts for bus lines whose records are incomplete because of uncertain departure times, faulty GPS equipment, and similar problems.

Background

The bus network is very important to the rapid development of urban transportation. Because buses are economical and environmentally friendly, they remain the main solution for urban travel. The main factors that discourage passengers from choosing public transport are long waiting times and uncertain journey times, so accurate prediction of bus arrival time is very important for solving this problem. It can also help reduce traffic congestion and be used in other intelligent transportation applications, such as trip planning.

Existing bus arrival or travel time prediction methods rely on a large amount of historical travel records and extensive coverage of public transport routes. As a result, they cannot predict, or have difficulty accurately predicting, the arrival time of buses with missing data. The main reasons include:

1) Data sparsity: The main difficulty in predicting travel times for suburban routes (sparse records) and for routes still being planned or designed (no records) is that a large amount of data is missing, so accurate operating patterns cannot be learned from historical records and the arrival time cannot be predicted;

2) Independent travel patterns: Spatially, to expand service coverage, the bus lines in an urban bus network are designed with little repetition or overlap, which makes it difficult for one line to learn from another. Even where lines share parts of the same road, routes in the network have relatively different and independent travel patterns because of different demand, different passenger numbers, and different stop sequences and locations;

3) Complex traffic conditions: The travel patterns of public transport lines are more complex than those of private vehicles, because they carry additional, line-specific traffic information. Besides road length, direction and number of intersections, public transport routes are also affected by departure timing and by the number and location of stops.

Therefore, a bus arrival time prediction method that works when historical data is missing is needed.

Summary of the invention

To solve the problem of predicting bus arrival times when data is missing, the present invention provides a bus arrival time prediction method for missing data that can accurately and effectively learn and predict the travel time of each route in the public transportation network. It can not only improve the accuracy of arrival time prediction for bus lines with few current travel records, but also provide estimated travel times for bus lines that are in the design stage and have no historical records.

The technical solution of the present invention is as follows:

The invention provides a bus arrival time prediction method for missing data, which comprises the following steps:

1) Integrate the GPS information of historical bus trajectories, the location information of public transport stations, and urban point-of-interest (POI) data (including the longitude and latitude of each building and its urban function category); extract important geographic locations in the bus network from the integrated data with a density-based clustering method; use the obtained geographic locations to represent the geographic structure of each bus route; and perform node extraction and edge extraction to construct a public transportation network graph;

2) Based on the constructed public transportation network graph, build a multi-head spatio-temporal graph attention network prediction model that learns the correlation between bus routes from the perspectives of space and time. The model comprises a spatial attention module and a temporal attention module connected in sequence. The spatial attention module is a masked multi-head graph attention module: it applies a graph attention network to learn the spatial dependencies between different nodes, and uses masked multi-head graph attention blocks to learn the global and local spatial dependencies between bus lines in different situations. The temporal attention module comprises an LSTM layer and a transformer layer, used for local and global temporal dependency learning, respectively;

3) Using a multi-head spatio-temporal graph attention network prediction model to predict bus arrival times with missing data.

Further, extracting important geographic locations in the bus network from the integrated data with the density-based clustering method is specifically:

Setting the parameters of the density-based clustering method according to the number of GPS points whose speed is 0 in the integrated data, obtaining the important geographic locations in the bus network with the density-based clustering method, and determining the weight of each geographic location according to the number of GPS points it contains.

Further, using the obtained important geographic locations to represent the geographic structure of each route is specifically: using weights to represent the positions of intersections and stations, so as to represent the geographic structure of each bus route.

Further, the node extraction is specifically: selecting a geographic location between every two adjacent stations of each bus route, and using the set consisting of the two stations' information and the selected geographic location between them as the information of node s, which represents the road segment between the two adjacent stations.

Further, selecting a geographic location between every two adjacent stations to form the node information is specifically: selecting, on the bus route, the intersection that is farthest from the two adjacent stations and has the largest flow of people; the information of this intersection together with the geographic location information of the two adjacent stations forms the node information.

Further, the edge extraction is specifically:

Creating an edge graph to represent the relationships between road sections, where each edge weight encodes the strength of spatial correlation or similarity; and constructing three adjacency matrices A, representing respectively the geographic structure similarity relationship, the distance relationship between bus routes, and the urban functional area division relationship, so as to obtain three public transportation network graphs representing different relationships.

Preferably, the adjacency matrices A are constructed as follows. First, an edge graph of geographic structure similarity is established: the location information and length information of each node are extracted from the three important geographic locations contained in the node, similarity is compared with the DTW (dynamic time warping) algorithm, and the geographic structure similarity adjacency matrix $A_g$ between nodes is established. Then, the urban function category of each node is extracted from the building category information in the urban POI data near the three geographic locations in the node, and the urban functional area adjacency matrix $A_f$ between nodes is established according to the similarity of urban functions. Finally, a third, geographic distance adjacency matrix $A_d$ is designed according to the distance relationship between bus routes. The edge weights in the adjacency matrices are normalized to the range 0 to 1.

Further, step 3) is specifically: for bus lines lacking data, the multi-head spatio-temporal graph attention network prediction model learns, according to the similarity of bus operation patterns, the operation patterns of buses that have complete historical data in the preceding h time periods, and then predicts the bus arrival times for the bus lines lacking data.

Compared with the prior art, the present invention uses a density-based clustering method to locate important travel locations (such as stations and intersections) on each route, according to the geographic structure of each public transport route and the spatio-temporal dependencies between routes, and constructs a weighted public transportation network graph carrying traffic importance from the mined locations. Based on the constructed network graph, a multi-attention graph neural network is provided to learn the correlation between bus routes from both spatial and temporal perspectives. In the spatial attention module, a masked multi-head graph attention module (multi-head GAT) is established, which learns the global and local importance of routes under three views (urban function, route distance, and geographic structure similarity) and effectively combines these influencing factors when learning travel patterns. In the temporal attention module, long short-term memory (LSTM) and transformer layers are used to learn bus operation patterns from the distant and recent past. Combining the spatial and temporal attention modules allows travel (arrival) times to be inferred accurately for bus routes with sparse data or no history. The invention can not only improve the prediction accuracy of arrival times for bus lines with few current travel records, but also provide estimated travel times between stations for lines in the design stage without historical records.

Description of drawings

Fig. 1 is a flow chart of the bus arrival time prediction method for missing data;

Fig. 2 is a schematic diagram of the pseudocode of the density-based clustering method in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the pseudocode of the method for selecting a geographic location between stations in an embodiment of the present invention.

Detailed description

The present invention will be further elaborated and described below in combination with specific embodiments.

The flow chart of the method of the present invention is shown in Fig. 1. It consists of two main parts: the construction of the public transportation network graph, and the construction of the prediction model based on the multi-head attention graph network. First, the GPS information of historical bus trajectories, the location information of public transport stations, and urban point-of-interest (POI) data are integrated. The present invention then proposes a density-based clustering method to automatically discover important geographic locations (including stations and intersections), and uses the discovered locations to represent the geographic structure of each route. Three kinds of public transportation network graphs are constructed according to the similarity of geographic structure (such as the distance between bus stops and the number of intersections), the distance between bus routes, and the division of urban functional areas.

Based on the constructed public transportation network graphs, the present invention builds a multi-head spatio-temporal graph attention network prediction model, comprising spatial and temporal attention modules, to predict arrival time. In the spatial attention module, the present invention designs masked multi-head graph attention blocks to learn the global and local spatial dependencies between bus lines in different situations. The multi-head mechanism performs ensemble learning on the travel times of neighboring bus line segments that have complete historical travel records. The masking mechanism masks the weighted adjacency matrix formed by the different neighboring bus line segments, so that the attention blocks focus on the selected neighbors and computation time is reduced. For example, when a target route has only a few, unstable historical records, its own historical travel time input can be ignored.

In the temporal attention module, to accurately model the temporal influence of past data under different conditions (such as normal conditions and abnormal conditions caused by traffic jams or weather), the present invention builds the module from an LSTM layer and a transformer layer, obtaining global (distant) and local (most recent) temporal patterns for travel time prediction. Through spatio-temporal attention, the model learns the bus operation patterns $(X_{t-h}, \ldots, X_t)$ of lines with complete historical data in the preceding h time periods and, according to the similarity of the operation patterns, predicts the arrival time $X_{t+1}$ of buses that lack historical data.

The construction of the public transport network graph and the prediction model of the multi-attention graph network of the present invention are described below.

1. Construction of the public transportation network graph

Since the structure of public transportation is usually represented by a succession of connected road segments or stops, existing grid-based map segmentation is ineffective for travel time prediction. Therefore, considering the spatial and temporal characteristics of public transportation, the present invention constructs a transportation network G from a new graph view, and proposes a density-based clustering method (DBSCAN) to automatically discover important geographic locations (stations and intersections) and use them to represent the geographic structure (Geo-structure) of each route. Then, based on the discovered geographic structure of the bus routes, the present invention constructs a public transportation network G in which each node s represents the travel segment between two adjacent stops.

(1) Mining the geographical structure of public transportation based on DBSCAN

First, the present invention uses a density-based clustering algorithm (DBSCAN) to extract the geographic structure of public transportation, including the exact location and density of intersections and stops. The process does not use the labelled station information directly but extracts the locations with DBSCAN, for two reasons. First, since the proposed travel segments are station-based, two routes may have the same stations but follow different paths, so extracting the geographic structure of a route from station locations alone is inaccurate. Second, the waiting time at each station and intersection is usually affected by passenger numbers and traffic lights, which can cause large differences in travel time between routes. Therefore, the present invention uses DBSCAN to find dense areas (Geo-structure discovery) without having to fix the number of clusters or assume a cluster shape. The specific steps are shown in Figure 2 (Algorithm 1). In Figure 2, a GPS point p contains longitude and latitude information (long, lat), and c is an important point in the bus network obtained after clustering, containing longitude, latitude, number of GPS points, and station information (long, lat, num, st). The density-based clustering algorithm needs two parameters: the scanning radius eps and the minimum number of points minPts. These parameters are set according to the number of GPS points in the database with speed values around 0, so as to find all important geographic locations along the route, including stations and intersections. Stations and intersections are then weighted according to the number of GPS points within each discovered geographic location.
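The clustering step described above can be sketched as follows. Figure 2 (Algorithm 1) is not reproduced in this text, so this is a generic textbook DBSCAN rather than the patent's exact pseudocode. The haversine distance, the sample coordinates, and the values of `eps` and `min_pts` are illustrative assumptions, and the station flag `st` of each cluster is omitted.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point (-1 means noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels

def summarise(points, labels):
    """Collapse each cluster to (long, lat, num): centroid plus GPS-point count."""
    groups = {}
    for p, lab in zip(points, labels):
        if lab >= 0:
            groups.setdefault(lab, []).append(p)
    return [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g), len(g))
            for g in groups.values()]

# Hypothetical near-zero-speed GPS fixes: two dense spots and one stray point.
stops = [(120.1000, 30.2000), (120.1001, 30.2000), (120.1000, 30.2001),
         (120.1500, 30.2500), (120.1501, 30.2500), (120.1500, 30.2501),
         (120.3000, 30.4000)]
labels = dbscan(stops, eps=30.0, min_pts=3)
locations = summarise(stops, labels)
```

The `num` field of each summarised location plays the role of the weight described above: the more stationary GPS points a cluster contains, the more important the location.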

(2) Node extraction and edge extraction of public transportation network graph

2.1) Node extraction

After extracting the important geographic locations of the bus network, the present invention proposes a node information representation method. The method first selects an important geographic location between every two adjacent stations of each bus route, and uses the set consisting of the two stations' information and the selected geographic location between them as the information of node s, which represents the travel segment from one station to the next. The method for selecting the geographic location between stations is shown in Figure 3 (Algorithm 2). In Figure 3, step 3 traverses the set of clusters C obtained from DBSCAN. In steps 4-6, between two adjacent clusters c that contain stations, the cluster $c_{i+1}$ that is farthest from the stations and has the largest flow of people is found and becomes part of the information in s. Step 7 gathers the two stations and $c_{i+1}$ as the information of s.
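A minimal sketch of the node construction follows. Figure 3 (Algorithm 2) is not reproduced in this text, so how "farthest distance" and "largest flow of people" are combined is an assumption here: candidates are ranked by GPS-point count first (used as a proxy for flow) and ties are broken by the distance to the nearer of the two stations. The function names and the planar distance are also illustrative.

```python
import math

def pick_midpoint(station_a, station_b, candidates):
    """Choose one cluster between two adjacent stations.

    Every location is a (long, lat, num) tuple. Assumed ranking: highest
    GPS-point count first, then largest distance to the nearer station.
    """
    def dist(p, q):
        # Planar distance in degrees; adequate for ranking nearby points.
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def key(c):
        return (c[2], min(dist(c, station_a), dist(c, station_b)))

    return max(candidates, key=key)

def build_node(station_a, station_b, candidates):
    """Node s = (station, chosen intermediate location, next station)."""
    return (station_a, pick_midpoint(station_a, station_b, candidates), station_b)

# Hypothetical clusters: two stations and three intersections between them.
a = (120.100, 30.200, 40)
b = (120.110, 30.200, 35)
between = [(120.102, 30.200, 12), (120.105, 30.200, 20), (120.108, 30.200, 20)]
node = build_node(a, b, between)
```

In this example the two busiest intersections tie on count, so the one in the middle of the segment, which is farther from both stations, is chosen.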

2.2) Adjacency matrix (edge) establishment

The invention establishes an edge graph from the node information to represent the relationships between road sections (nodes). Since many factors influence the travel patterns of a public transport system, the present invention constructs three adjacency matrices A to represent different relationships. First, an edge graph of geographic structure similarity is established: the location and length of each road section (node) are extracted from the three important geographic locations contained in the node, similarity is compared with the DTW (dynamic time warping) algorithm, and the geographic structure similarity adjacency matrix $A_g$ between nodes is established. Then, the urban function category of each road section (node) is extracted from the building category information in the urban POI data near the three locations in the node (taken as the set of building urban function categories within a radius of 100 meters of the node's longitude and latitude), and the urban functional area adjacency matrix $A_f$ between nodes is established according to the similarity of urban functions. Finally, because traffic conditions on roads within a certain geographic range tend to be similar, the present invention designs a third, geographic distance adjacency matrix $A_d$ according to the distance relationship between bus routes. The edge weights in the adjacency matrices are normalized to the range 0 to 1.
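As an illustration of the geographic structure similarity matrix, the sketch below computes pairwise DTW distances between nodes and maps them to weights in [0, 1]. The text does not specify how DTW distances are turned into similarities, so the `1 - d / d_max` normalisation is an assumption, as is the representation of each node as three points expressed relative to its first station.

```python
import math

def dtw(a, b):
    """Dynamic-time-warping distance between two sequences of (x, y) points."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def similarity_adjacency(nodes):
    """Turn pairwise DTW distances into similarity weights in [0, 1]."""
    n = len(nodes)
    dist = [[dtw(nodes[i], nodes[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in dist) or 1.0  # avoid dividing by zero
    return [[1.0 - dist[i][j] / d_max for j in range(n)] for i in range(n)]

# Hypothetical nodes: (station, intermediate location, next station), each
# given as offsets from the first station so that only the shape is compared.
nodes = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],   # straight eastbound segment
    [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0)],   # almost the same shape
    [(0.0, 0.0), (0.0, 3.0), (0.0, 6.0)],   # longer northbound segment
]
A_g = similarity_adjacency(nodes)
```

The first two segments receive a weight close to 1, while the northbound segment is weighted much lower against both. The same normalisation pattern could be reused for the distance and urban function matrices with a different pairwise measure.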

2. Construction of multi-head spatio-temporal graph attention network prediction model

After constructing the bus network graph, the present invention proposes a multi-head spatio-temporal graph attention network prediction model for predicting city-wide bus travel times with limited data. The model can effectively learn and predict the travel time of the bus lines in a city, especially suburban lines and routes without any historical bus travel records. It can help update existing public transport systems, adjust outdated timetables for routes in developing areas, and support the selection and design of new routes by providing travel times for each route under normal and irregular traffic conditions. The model contains two modules: the spatial attention module and the temporal attention module.

2.1) Spatial attention module

In the spatial attention module, the present invention applies a graph attention network (GAT) to learn the spatial dependencies between different travel segments. Compared with a graph convolutional network (GCN), GAT's ability to handle dynamic graphs and to learn inductively makes it more suitable for city-wide travel time prediction with limited data. The graph attention layer is the fundamental part of GAT; it learns the correlation between nodes and updates the hidden features of each pair of nodes. The features of node $s_i$ in time interval t are denoted $h_i^t$. In the first layer, $h_i^t$ is the input travel time record and embedded temporal information for segment $s_i$. The attention coefficient $e_{ij}^t$ of $s_i$ and $s_j$ can be expressed as:

where W is the learnable parameter of layer l, and a(·) is the function that calculates the correlation. The present invention uses a feed-forward neural network with the LeakyReLU activation function. For each layer, the output is normalized to [0,1] by the softmax function:

To obtain a richer combination of travel patterns, so that bus lines with missing data can accurately learn the operation patterns of bus lines with complete historical data, the present invention extends spatial attention to a masked multi-head attention mechanism, in which K independent attention heads with learnable parameters are concatenated to give the final spatial attention result:

where σ is the softmax function. In each attention head, the present invention adds a mask attention mechanism to the adjacency matrix, which focuses attention on the bus lines with complete historical operation data so that their operation patterns can be learned. The mask m at layer l is expressed as:

where γ is the threshold on the attention a between node i and node j. The output X after the mask is applied at layer l is expressed as:

where $X$ is the input bus travel time data, $X'$ is the output after the attention mechanism is applied, $X^{l\prime}$ is the output of layer $l$ after the mask is applied, and the product term is the Hadamard product of the adjacency matrix and the mask matrix.
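The equations of this subsection are not reproduced in the text, so the sketch below falls back on the standard GAT formulation: a LeakyReLU score on the transformed features of a node pair, normalised by softmax over the neighbours. Several details are assumptions made for illustration: node features are scalars, the mask keeps neighbours whose attention is at least γ (`gamma`), the kept attention is multiplied element-wise by the adjacency weight, and the result is renormalised. Setting a target node's self-weight to 0 mimics ignoring its own missing history.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention_head(h, adj, w, a, gamma):
    """One masked graph-attention head over scalar node features.

    h: travel-time feature per node; adj: weighted adjacency in [0, 1];
    w: scalar weight; a: pair (a_src, a_dst); gamma: mask threshold.
    """
    n = len(h)
    z = [w * x for x in h]
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] > 0]
        e = [leaky_relu(a[0] * z[i] + a[1] * z[j]) for j in nbrs]
        alpha = softmax(e)
        # Mask: keep neighbours whose attention reaches gamma, weight by adjacency.
        kept = [(j, al * adj[i][j]) for j, al in zip(nbrs, alpha) if al >= gamma]
        norm = sum(wt for _, wt in kept)
        out.append(sum(wt * z[j] for j, wt in kept) / norm if norm else 0.0)
    return out

def multi_head(h, adj, heads, gamma):
    """Concatenate the outputs of K independent heads for every node."""
    per_head = [masked_attention_head(h, adj, w, a, gamma) for w, a in heads]
    return [[head[i] for head in per_head] for i in range(len(h))]

# Hypothetical graph: nodes 0 and 1 have history, node 2 is the target without any.
h = [10.0, 12.0, 0.0]
adj = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.6],
       [0.7, 0.9, 0.0]]   # zero self-weight: the target ignores its own input
heads = [(1.0, (0.5, 0.5)), (0.5, (1.0, -1.0))]
out = multi_head(h, adj, heads, gamma=0.2)
```

For the target node, each head returns a weighted mix of the transformed features of its two neighbours, which is how a line without records can borrow the travel pattern of similar lines.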

2.2) Temporal attention module

After learning the spatial dependencies, the present invention connects a temporal attention module, because the travel time of each bus trip is strongly affected by real-time traffic conditions. For example, when traffic conditions are normal, the travel time of the target route in the current time period may be highly similar to its travel times in the same time period further back in its history. When traffic congestion occurs, however, the travel pattern may be erratic but may still resemble that of the most recent time periods. Therefore, for different traffic situations, both global (distant) and local (most recent) temporal travel patterns need to be considered.

2.2.1 Local time-dependent learning

A recurrent neural network (RNN) is a type of artificial neural network that is particularly well suited to capturing temporal dependencies in sequence learning. However, previous studies have shown that RNNs are often difficult to train on long sequences because of vanishing and exploding gradients. To overcome these shortcomings, LSTM (Long Short-Term Memory) automatically determines the optimal time lag by introducing an input gate and a forget gate. Therefore, the present invention builds an LSTM-based model to focus on local temporal information, where $x_t$ is the input vector of the LSTM unit, $h_{t-1}$ is the hidden state of the previous time step, $W_{ix}$, $W_{ih}$ and $b_i$ are the learnable parameter matrices and bias vector of the recurrent layer, and σ is the standard sigmoid function. The input gate $i_t$ of the LSTM can be expressed as:

$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i) \tag{7}$$
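Equation (7) can be checked numerically with a few lines of plain Python. The matrices and vectors below are made-up values chosen only to make the arithmetic easy to follow; the helper names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def input_gate(x_t, h_prev, w_ix, w_ih, b_i):
    """i_t = sigmoid(W_ix x_t + W_ih h_{t-1} + b_i), applied element-wise."""
    wx = matvec(w_ix, x_t)
    wh = matvec(w_ih, h_prev)
    return [sigmoid(a + b + c) for a, b, c in zip(wx, wh, b_i)]

# Hypothetical values: identity input weights and a zero previous hidden state.
x_t = [0.5, -1.0]
h_prev = [0.0, 0.0]
w_ix = [[1.0, 0.0], [0.0, 1.0]]
w_ih = [[0.1, 0.2], [0.3, 0.4]]
b_i = [0.0, 0.0]
i_t = input_gate(x_t, h_prev, w_ix, w_ih, b_i)
```

Each component of `i_t` lies strictly between 0 and 1 and controls how much of the new input is written into the cell state.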

2.2.2 Global time-dependent learning

To discover temporal bus travel pattern information from a global perspective, the present invention introduces a transformer layer in the temporal attention module. For a single-head attention layer, each node in the public transportation network graph has three types of vectors: query Q, key K, and value V. The hidden subspace learning process can be expressed as:

$$Q = X_i W^Q, \quad K = X_i W^K, \quad V = X_i W^V \tag{8}$$

$W^Q$, $W^K$ and $W^V$ are learnable parameters. The output Attention of the global temporal attention is calculated by scaled dot-product attention, where $d_K$ is the scaling factor, expressed as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_K}}\right) V$$
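A minimal single-head sketch of the projection in equation (8) followed by scaled dot-product attention is given below. It uses the standard formulation from the transformer literature; the input sequence and the identity projection matrices are placeholder values, not parameters from the patent.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(x, w_q, w_k, w_v):
    """Scaled dot-product attention over a sequence x of shape (T, d)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]   # one distribution per time step
    return matmul(weights, v), weights

# Hypothetical features for three time steps, with identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out, weights = attention(x, eye, eye, eye)
```

Every row of `weights` sums to 1 and states how strongly one time step attends to every other time step, which is what lets the layer pick up distant as well as recent patterns.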

2.3) Bus arrival time prediction layer

After obtaining the high-dimensional spatio-temporal features, the present invention uses a linear layer for prediction. The multi-head spatio-temporal graph attention network prediction model is trained by minimizing the mean square error (MSE) loss L between the predicted value $X'_{t+1}$ and the true value $X_{t+1}$, which can be expressed as:

where θ denotes the learnable parameters of the model.
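The prediction layer and its training objective can be illustrated with a linear layer fitted by gradient descent on the MSE loss. The feature rows, targets, learning rate and step count are made-up values, and plain gradient descent stands in for whatever optimiser the inventors used, which the text does not state.

```python
def predict(features, weights, bias):
    """Linear prediction layer: one output per feature row."""
    return [sum(w * f for w, f in zip(weights, row)) + bias for row in features]

def mse_loss(predicted, actual):
    """Mean square error between predictions and true values."""
    if len(predicted) != len(actual):
        raise ValueError("length mismatch")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)

def gradient_step(features, targets, weights, bias, lr):
    """One gradient-descent update of the linear layer on the MSE loss."""
    n = len(targets)
    errs = [p - t for p, t in zip(predict(features, weights, bias), targets)]
    grad_w = [2.0 / n * sum(e * row[k] for e, row in zip(errs, features))
              for k in range(len(weights))]
    grad_b = 2.0 / n * sum(errs)
    return [w - lr * g for w, g in zip(weights, grad_w)], bias - lr * grad_b

# Hypothetical spatio-temporal features and true travel times (minutes).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [2.0, 3.0, 5.0]
weights, bias = [0.0, 0.0], 0.0
loss_before = mse_loss(predict(features, weights, bias), targets)
for _ in range(500):
    weights, bias = gradient_step(features, targets, weights, bias, lr=0.1)
loss_after = mse_loss(predict(features, weights, bias), targets)
```

In the full model the gradient would flow back through the temporal and spatial attention modules as well; here only the final layer is trained to keep the example short.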

In the experimental verification of the method, the present invention uses bus trajectory and POI data, all obtained from the traffic department of a certain city. Bus trajectories include location, timestamp, speed, and bus ID information. The average sampling interval is 30 seconds per point, with a daily volume of about 300,000 points generated by 278 individual lines. The POI dataset consists of building locations and categories (social functions). Using cross-validation, three lines were selected as target lines to evaluate the performance and robustness of the method, and the results were averaged. The three test lines are located in different areas of the city: the developed central area, a remote area, and a path connecting the center and the suburbs. To test prediction accuracy for lines with different degrees of record sparsity, 40%, 60%, and 80% of the recorded GPS points were randomly deleted from each trajectory of the three test lines. The present invention also deletes all of their history and treats them as three routes under design (station locations and paths designed, but without any historical travel time records) to test travel time estimation for new routes. In the evaluation, the method of the present invention is named MAGTTE.
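The sparsification used in the experiment can be reproduced in spirit with a short helper that removes a given fraction of points from a trajectory while keeping the remaining points in time order. The function name, the seed handling and the synthetic trajectory are assumptions for illustration; the text does not describe the exact sampling procedure.

```python
import random

def drop_points(trajectory, fraction, seed=0):
    """Randomly delete `fraction` of the GPS points, preserving time order."""
    rng = random.Random(seed)
    keep = len(trajectory) - round(len(trajectory) * fraction)
    kept_idx = sorted(rng.sample(range(len(trajectory)), keep))
    return [trajectory[i] for i in kept_idx]

# Hypothetical trajectory: one (timestamp_s, lon, lat) fix every 30 seconds.
trajectory = [(30 * i, 120.1 + 0.0001 * i, 30.2) for i in range(100)]
sparse = {f: drop_points(trajectory, f) for f in (0.4, 0.6, 0.8)}
```

Deleting every point (a fraction of 1.0) corresponds to the route-under-design case, in which the model has to rely entirely on neighbouring lines.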

Comparative experimental methods:

● Historical average model (HA): travel time is predicted by calculating the historical average travel time of bus lines in each time period (15 minutes).

● Spatial-temporal artificial neural network (ST-ANN)

● Kalman filter (KF)

● Support vector regression (SVR)

● E-knn: a model based on the weighted enhanced k-NN method, which uses the k records most similar to the current traffic condition to identify the traffic condition and predict the travel time. Here, the travel pattern similarity threshold is set to above 90%, and k neighbors are used for the target road segment.

● RnnTTE: a model based on an LSTM neural network, containing a fully connected LSTM layer with 128 hidden units.

● DeepTTE: a model that combines geo-conv layers and LSTM layers to predict travel time.
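Of the baselines listed above, the historical average model is simple enough to sketch in full. The record format (minutes since midnight, travel time in seconds) and the function names are assumptions; only the 15-minute time period comes from the text.

```python
from collections import defaultdict

def fit_historical_average(records, slot_minutes=15):
    """Average travel time per time slot.

    records: iterable of (minutes_since_midnight, travel_time_seconds).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for minute, travel_time in records:
        slot = int(minute // slot_minutes)
        sums[slot][0] += travel_time
        sums[slot][1] += 1
    return {slot: total / count for slot, (total, count) in sums.items()}

def predict_ha(model, minute, slot_minutes=15):
    """Return the slot average, or None when the slot has no history."""
    return model.get(int(minute // slot_minutes))

# Hypothetical records: two trips in the 08:00 slot and one in the 08:15 slot.
records = [(480, 300.0), (485, 320.0), (500, 400.0)]
model = fit_historical_average(records)
```

The `None` result for an unseen slot shows why this baseline cannot serve sparse or newly designed lines, which is the gap the proposed method targets.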

The experimental results are shown in Fig. 4. They show that the proposed method outperforms existing state-of-the-art travel time prediction methods, both for routes with sparse records and for routes still in the design process. This demonstrates that the proposed MAGTTE can effectively predict city-wide public transit travel time with the lowest MAPE, and can be used to update and develop public transit routes.
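For reference, MAPE (mean absolute percentage error) can be computed as in the sketch below. Skipping zero-valued ground truth is an implementation choice made here to avoid division by zero, not something stated in the text, and the sample values are made up.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent, skipping zero ground truth."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero ground-truth values")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

# Hypothetical travel times in seconds: both predictions are off by 10%.
error = mape([100.0, 200.0], [110.0, 180.0])
```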

The above-mentioned embodiments only express several implementation modes of the present invention, and the descriptions thereof are relatively specific and detailed, but should not be construed as limiting the patent scope of the present invention. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present invention, and these all belong to the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the appended claims.

Claims

1. A bus arrival time prediction method for missing data, characterized in that it comprises the following steps:

1) integrating the GPS information of historical bus trajectories, the location information of public transport stations, and urban point-of-interest data; extracting important geographic locations in the bus network from the GPS information of historical bus trajectories with a density-based clustering method; using the obtained geographic locations to represent the geographic structure of each bus route; designing the node extraction and edge weight representation methods of the public transportation network graph according to the location information of public transport stations and the urban point-of-interest data; and constructing the public transportation network graph;

2) based on the constructed public transportation network graph, building a multi-head spatio-temporal graph attention network prediction model that learns the correlation between bus routes from the perspectives of space and time; wherein the multi-head spatio-temporal graph attention network prediction model comprises a spatial attention module and a temporal attention module connected in sequence; the spatial attention module is a masked multi-head graph attention module, in which the multi-head design is used to learn the global and local spatial dependencies between bus lines in different situations, and the mask is designed to focus attention on the important travel patterns of bus lines with complete historical data; the temporal attention module comprises an LSTM layer and a transformer layer, used for local temporal dependency learning and global temporal dependency learning, respectively;

3) using the multi-head spatio-temporal graph attention network prediction model to predict the arrival times of buses with missing data;

wherein step 3) is specifically: for bus lines lacking data, the multi-head spatio-temporal graph attention network prediction model learns, according to the similarity of bus operation patterns, the operation patterns of buses that have complete historical data in the preceding h time periods, and then predicts the bus arrival times for the bus lines lacking data.
2. The bus arrival time prediction method for missing data according to claim 1, characterized in that extracting important geographic locations in the bus network from the GPS information of historical bus trajectories with the density-based clustering method is specifically:

setting the parameters of the density-based clustering method according to the number of GPS points whose speed is 0 in the integrated data, obtaining the important geographic locations in the bus network with the density-based clustering method, and determining the weight of each geographic location according to the number of GPS points it contains.
3. The bus arrival time prediction method for missing data according to claim 1, characterized in that using the obtained important geographic locations to represent the geographic structure of each route is specifically: using weights to represent the positions of intersections and stations, so as to represent the geographic structure of each bus route.
4. The bus arrival time prediction method for missing data according to claim 1, wherein the node extraction is specifically: selecting a geographic location between every two adjacent stations of each bus route, and using the set consisting of the two stations' information and the selected geographic location between them as the information of node s, which represents the road section between the two adjacent stations.
5. The bus arrival time prediction method for missing data according to claim 4, characterized in that selecting a geographic location between every two adjacent stations to form the node information is specifically: selecting, on the bus route, the intersection that is farthest from the two adjacent stations and has the largest flow of people, the information of this intersection together with the geographic location information of the two adjacent stations forming the node information.
6. The bus arrival time prediction method for missing data according to claim 5, characterized in that the edge weight representation method is specifically:

creating an edge graph to represent the relationships between road sections, where each edge weight encodes the strength of spatial correlation or similarity; and constructing three adjacency matrices A, representing respectively the geographic structure similarity relationship, the distance relationship between bus routes, and the urban functional area division relationship, so as to obtain three public transportation network graphs representing different relationships.
7. The bus arrival time prediction method for missing data according to claim 6, characterized in that the adjacency matrices A are constructed as follows: first, an edge graph of geographic structure similarity is established, the location information and length information of each node are extracted from the three important geographic locations contained in the node, similarity is compared with the DTW algorithm, and the geographic structure similarity adjacency matrix $A_g$ between nodes is established; then, the urban function category of each node is extracted from the building category information in the urban point-of-interest data near the three geographic locations in the node, and the urban functional area adjacency matrix $A_f$ between nodes is established according to the similarity of urban functions; finally, a third, geographic distance adjacency matrix $A_d$ is designed according to the distance relationship between bus routes; the edge weights in the adjacency matrices are normalized to the range 0 to 1.