CN116050640A - Short-time passenger flow prediction method of multi-mode traffic system based on self-adaptive multi-graph convolution

Info

Publication number: CN116050640A
Application number: CN202310108449.9A
Authority: CN
Inventors: 张金雷; 杨立兴; 杨咏杰; 阴佳腾; 戚建国; 高自友
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2023-02-01
Filing date: 2023-02-01
Publication date: 2023-05-02
Anticipated expiration: 2043-02-01
Also published as: CN116050640B

Abstract

The invention discloses a short-time passenger flow prediction method for a multi-mode traffic system based on self-adaptive multi-graph convolution. The method comprises the following steps: for a multi-mode traffic system, acquiring a historical passenger flow sequence, an autocorrelation graph and a cross-correlation graph; and taking the historical passenger flow sequence, the autocorrelation graph and the cross-correlation graph as inputs, and outputting the predicted future passenger flow of each traffic mode by using a trained short-time passenger flow prediction model. The invention jointly predicts the future passenger flows of multiple traffic modes across different regions of an urban multi-mode traffic system, addresses the heterogeneity of multi-mode passenger flow data, enables information interaction between different traffic modes and improves computational efficiency.

Description

Short-time passenger flow prediction method of multi-mode traffic system based on self-adaptive multi-graph convolution

Technical Field

The invention relates to the technical field of traffic passenger flow prediction, in particular to a short-time passenger flow prediction method of a multi-mode traffic system based on self-adaptive multi-graph convolution.

Background

Short-term passenger flow prediction in urban traffic systems captures the spatiotemporal characteristics of multi-mode passenger flows and predicts the future passenger flow of each traffic mode in each region of the city. However, short-term inflow prediction for multiple traffic modes faces several difficulties: the information interaction mechanisms between different traffic modes in a multi-mode traffic system are difficult to obtain; the complex, dynamic spatiotemporal features of multi-mode passenger flow are difficult to capture; and the heterogeneity of multi-mode passenger flow data makes the data difficult to organize and the model difficult to construct.

Currently, deep learning models are widely used for traffic prediction, including long short-term memory networks (LSTM), convolutional neural networks (CNN), graph convolutional networks (GCN), and the like. However, existing short-term passenger flow prediction schemes for multi-mode traffic have the following problems. Existing studies typically focus only on the target traffic mode and some external factors, such as weather conditions, while ignoring the effects of other traffic modes. Existing research on short-time passenger flow prediction for multi-mode traffic seldom considers the interaction mechanisms within a multi-mode traffic system. Because different traffic modes have different spatial characteristics, multi-mode passenger flow data are heterogeneous, which makes the data difficult to organize and a prediction model difficult to construct. Because different areas of the urban network serve different functions, the passenger flow patterns of different traffic modes differ considerably, so the city-wide passenger flow patterns of the different traffic modes are difficult to capture. In addition, the spatiotemporal relationships between different traffic modes may vary over time, whereas existing studies characterize them with static spatiotemporal relationship matrices, which makes dynamic spatiotemporal characteristics difficult to capture.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a short-time passenger flow prediction method of a multi-mode traffic system based on self-adaptive multi-graph convolution. The method comprises the following steps:

for a multi-mode traffic system, acquiring a historical passenger flow sequence $X_{(t-L) \to t}$, an autocorrelation graph $G_s$ and a cross-correlation graph $G_c$;

learning a mapping function F(·) by training a short-time passenger flow prediction model to predict the future passenger flow of each traffic mode, expressed as:

$$X_{t+1} = F(X_{(t-L) \to t}, G_s, G_c)$$

where L represents the length of the history period and $X_{t+1}$ represents the inbound passenger flow of the multi-mode traffic system at time t+1.

Compared with the prior art, the invention has the advantages that a novel short-time passenger flow prediction model based on multi-task learning is provided, future incoming passenger flows of different traffic modes in the multi-mode traffic system can be accurately predicted, and an information interaction mechanism between the different traffic modes is extracted, so that a reliable method and profound insight are provided for managing and understanding the multi-traffic mode system.

Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a flow chart of a method for short-term passenger flow prediction for a multi-mode traffic system based on adaptive multi-graph convolution in accordance with one embodiment of the present invention;

FIG. 2 is a schematic illustration of a short-term passenger flow prediction model according to one embodiment of the invention;

FIG. 3 is a diagram of a multiple time relationship attention mechanism according to one embodiment of the invention;

FIG. 4 is a schematic diagram of an influence coefficient matrix visualization according to one embodiment of the invention;

FIG. 5 is a schematic diagram of a feature aggregation layer based on an attention mechanism in accordance with one embodiment of the invention;

FIG. 6 is a causal convolutional layer diagram in accordance with one embodiment of the present invention;

FIG. 7 is a diagram of a TaxiBJ data visualization in accordance with one embodiment of the present invention;

FIG. 8 is a schematic diagram of subway and bus data processing according to one embodiment of the invention;

FIG. 9 is a schematic illustration of a partial area multimode traffic flow in accordance with one embodiment of the invention;

FIG. 10 is a graph illustrating the result of hyper-parameter adjustment, according to an embodiment of the invention;

FIG. 11 is a diagram showing the prediction results of the M2-former in different regions according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of prediction results of a short-term passenger flow prediction model in different regions according to an embodiment of the present invention.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.

Referring to fig. 1, the provided method for predicting short-term passenger flow of a multi-mode traffic system based on adaptive multi-graph convolution generally comprises the following steps: step S110, acquiring a historical passenger flow sequence, an autocorrelation graph and a cross correlation graph aiming at a multi-mode traffic system; step S120, taking the historical passenger flow sequence, the autocorrelation graph and the cross correlation graph as inputs, and outputting predicted future passenger flows of each traffic mode by using the trained short-time passenger flow prediction model. In the following, the related concepts will be defined first, and the short-term passenger flow prediction problem of the multi-mode traffic system will be defined in detail, so as to introduce the related short-term passenger flow prediction model and experimental verification result.

1. Related concepts and problem definition

Definition 1 (multi-mode traffic system): a multi-mode traffic system T consists of M (M > 1) traffic modes, such as subway, bus and taxi. For the k-th traffic mode, $X_t^k \in \mathbb{R}^{N_k}$ represents the passenger flow of traffic mode k at time t, where $N_k$ is the number of nodes of traffic mode k. For the multi-mode traffic system T, the passenger flow at time t can be expressed as $X_t = \{X_t^k, k = 1, \dots, M\}$.

Definition 2 (autocorrelation graph): for traffic mode k, the autocorrelation graph is defined as $G_k = (V_k, A_k)$, where $V_k$ is the node set with $|V_k| = N_k$, and $A_k \in \mathbb{R}^{N_k \times N_k}$ is a weight matrix that characterizes the correlation between each pair of nodes in the node set. For the multi-mode traffic system T, the autocorrelation graphs of all traffic modes are denoted $G_s = \{G_k, k = 1, \dots, M\}$.

Definition 3 (cross-correlation graph): for traffic mode m and traffic mode n, the cross-correlation graph is defined as $G_{mn} = (V_m, V_n, A_{mn})$, where $V_m$ and $V_n$ are the node sets of traffic mode m and traffic mode n, respectively, with $|V_m| = N_m$ and $|V_n| = N_n$, and $A_{mn} \in \mathbb{R}^{N_m \times N_n}$ is a weight matrix that characterizes the correlation between traffic mode m and traffic mode n. Specifically, $[A_{mn}]_{i,j}$ represents the correlation between node i in traffic mode m and node j in traffic mode n. For the multi-mode traffic system T, the cross-correlation graphs of all pairs of traffic modes are denoted $G_c = \{G_{mn}, m \neq n\}$.

Definition 4 (static spatial correlation matrix): to comprehensively capture the spatial characteristics of the multi-mode traffic system, the invention defines two static spatial correlation matrices, namely a distance correlation matrix $A^d$ and a functional correlation matrix $A^f$. Specifically, given traffic mode m and traffic mode n, the distance correlation matrix $A^d_{mn}$ is defined by equation (1), in which $[A^d_{mn}]_{i,j}$ represents the distance correlation between node i in traffic mode m and node j in traffic mode n, and $(lng_i, lat_i)$ and $(lng_j, lat_j)$ are the longitude and latitude of node i in traffic mode m and of node j in traffic mode n, respectively. The function dist(·) calculates the Euclidean distance between two points.

For the functional correlation matrix, given traffic mode m and traffic mode n, the functional correlation matrix $A^f_{mn}$ is expressed as:

$$[A^f_{mn}]_{i,j} = \mathrm{Corr}(p_{m,i}, p_{n,j}) = \frac{\mathrm{Cov}(p_{m,i}, p_{n,j})}{\sigma_{p_{m,i}} \, \sigma_{p_{n,j}}} \quad (2)$$

where $[A^f_{mn}]_{i,j}$ is the functional correlation between node i in traffic mode m and node j in traffic mode n, $p_{m,i}$ and $p_{n,j}$ are the inbound passenger flow sequences of node i in traffic mode m and node j in traffic mode n, respectively, Corr(·) calculates the correlation coefficient, and σ denotes the standard deviation.
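As a minimal sketch of the functional correlation described above, the following dependency-free Python computes a correlation matrix between two traffic modes that have different numbers of nodes. It assumes the correlation coefficient is the Pearson coefficient; the function names, the toy series and the handling of constant series are illustrative assumptions, not taken from the patent.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient: Cov(a, b) / (sigma_a * sigma_b)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def functional_correlation(flows_m, flows_n):
    """Return an N_m x N_n matrix; entry (i, j) correlates node i of mode m
    with node j of mode n. Each node is given as its inflow time series."""
    return [[pearson(p_i, p_j) for p_j in flows_n] for p_i in flows_m]


# Toy data: 2 subway nodes and 3 bus nodes, 4 time steps each.
subway = [[1, 2, 3, 4], [4, 3, 2, 1]]
bus = [[2, 4, 6, 8], [1, 1, 2, 2], [8, 6, 4, 2]]
A_f = functional_correlation(subway, bus)  # shape 2 x 3
```

The result is rectangular whenever the two modes have different node counts, which is the dimensional mismatch the later multi-graph convolution has to handle.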

Problem definition: consider a multi-mode traffic system T comprising M (M > 1) traffic modes. Given the historical passenger flow sequence $X_{(t-L) \to t}$, the autocorrelation graph $G_s$ and the cross-correlation graph $G_c$ of the system, short-term passenger flow prediction aims to find a function F(·) that predicts the future passenger flow of each traffic mode within the system:

$$X_{t+1} = F(X_{(t-L) \to t}, G_s, G_c) \quad (3)$$
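To make the input and output of F(·) concrete, the sketch below slices a flow series into (history, target) training pairs. It assumes that the history window holds exactly L consecutive steps ending just before the predicted step; the helper name `make_windows` and the toy data are hypothetical.

```python
def make_windows(series, history_len):
    """Split a flow series into (history, target) pairs.

    series: list of per-time-step flow vectors (one value per node).
    history_len: number of past steps L fed to the model (assumed window size).
    """
    samples = []
    for t in range(history_len, len(series)):
        history = series[t - history_len:t]  # the L steps before step t
        target = series[t]                   # the step to be predicted
        samples.append((history, target))
    return samples


# Toy series: 6 time steps, 2 nodes.
flows = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
pairs = make_windows(flows, history_len=3)
```

In a multi-mode setting the same slicing would be applied to each traffic mode separately, since the modes have different node counts.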

2. Introduction to related models

1) ProbSparse self-attention mechanism

Vaswani et al. first proposed the Transformer model for natural language processing. The Transformer is composed of multi-head attention mechanisms and feed-forward neural networks, where each multi-head attention mechanism consists of several self-attention mechanisms. Zhou et al. built on the Transformer and proposed the Informer model for long time-series prediction tasks; its innovations are the ProbSparse self-attention layer and the distilling layer.

The ProbSparse self-attention mechanism mainly involves three matrices, namely a query matrix Q, a key matrix K and a value matrix V. For traffic mode i, assume that the history period length is L and the total number of nodes is $S_i$. The ProbSparse self-attention mechanism is expressed as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{\bar{Q} K^{\top}}{\sqrt{d}}\right) V$$

where Softmax(·) is the activation function, $\sqrt{d}$ scales the dot-product result, and $\bar{Q}$ is the processed query matrix, which contains only the top-u queries selected by the sparsity measurement $M(q_i, K)$; u is controlled by the sampling factor c through $u = c \cdot \ln L_Q$. Based on the ProbSparse self-attention mechanism, the multi-head ProbSparse self-attention mechanism is expressed as follows:

where $n_p$ denotes the number of attention heads, * denotes the convolution operation, and the projection weights are the learnable parameters of the i-th attention head. Furthermore, the invention sets $d_q = d_k = d_v = d_{model}/n_p$, where $d_{model}$ is a hyperparameter. Because the ProbSparse self-attention mechanism screens only the query matrix Q, the value matrix V remains redundant and the resulting feature matrix is difficult to process further; for this reason, Zhou et al. use a distilling layer to eliminate the redundancy of V. Assuming that the output of the multi-head ProbSparse attention mechanism in the i-th encoder layer is $R_i$, the distilling layer is expressed as follows:

$$\mathrm{Distill}(R_i) = \mathrm{MaxPool}(\mathrm{ELU}(\mathrm{Conv1D}(R_i)))$$

where ELU(·) is the activation function, Conv1D(·) is the one-dimensional convolution operation and MaxPool(·) is the max pooling operation.
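The query screening step of the ProbSparse self-attention mechanism can be sketched as follows. This is an illustration only: it uses the max-minus-mean approximation of the sparsity measurement from the Informer model, evaluates it on all keys rather than on a sampled subset, and rounds $u = c \cdot \ln L_Q$ up to an integer; the rounding and the function names are assumptions.

```python
import math


def sparsity_scores(Q, K):
    """M(q_i, K) ~ max_j(s_ij) - mean_j(s_ij), with s_ij = q_i . k_j / sqrt(d)."""
    d = len(Q[0])
    scores = []
    for q in Q:
        s = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        scores.append(max(s) - sum(s) / len(s))
    return scores


def select_active_queries(Q, K, c=1.0):
    """Return the indices of the top-u queries, u = ceil(c * ln(L_Q))."""
    u = max(1, min(len(Q), math.ceil(c * math.log(len(Q)))))
    scores = sparsity_scores(Q, K)
    ranked = sorted(range(len(Q)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:u])


# Toy example: 4 queries, 3 keys, feature size 2.
Q = [[0.0, 0.0], [4.0, 0.0], [0.1, 0.1], [0.0, 3.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
active = select_active_queries(Q, K, c=1.0)  # indices kept in the processed query matrix
```

Queries whose attention distribution is close to uniform receive a low score and are dropped, so only the dominant queries enter the dot product with the keys.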

2) Multi-graph convolution (MGC)

Many studies have used multi-graph convolution (MGC) to fully characterize the spatial characteristics of traffic networks. In the MGC formulation, $A^u$ represents the u-th correlation matrix, $D^u$ is the corresponding degree matrix, and $W_u$ represents the learnable parameters. Compared with ordinary graph convolution, the MGC can capture a variety of spatial relationships. However, it is difficult to apply directly to multi-mode traffic systems. In particular, the number of parameters in the MGC increases significantly as the number of traffic modes increases, which makes the model difficult to train. Furthermore, in a multi-mode traffic system the correlation matrices between different traffic modes typically differ in dimension, so the various features cannot be aggregated directly by summation. To solve these problems, the invention proposes an MGC for multi-mode traffic systems. Specifically, consider a multi-mode traffic system T that includes M (M > 1) traffic modes and a target traffic mode P. Let $A^d$ and $A^f$ denote the distance correlation matrix set and the functional correlation matrix set, respectively. The MGC for the target traffic mode P is expressed as follows:

where u indexes the type of correlation matrix, the two matrix terms are the cross-correlation matrix and the autocorrelation matrix, respectively, and $W^{u,c}$ and $W^{u,s}$ are learnable parameters.
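One way to read the multi-mode MGC is that rectangular cross-correlation matrices project the features of the other traffic modes onto the nodes of the target mode, so that all terms can be summed. The sketch below illustrates this reading only; it assumes the correlation matrices are already normalized, that one self weight and one cross weight are shared per matrix type u, and that no activation is applied. All names are hypothetical and the exact formula of the patent is not reproduced here.

```python
def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def multi_graph_conv(X_target, X_others, A_self, A_cross, W_self, W_cross):
    """Aggregate features onto the nodes of the target mode.

    X_target: N_P x F features of the target mode.
    X_others: list of N_k x F feature matrices of the other modes.
    A_self:   A_self[u] is the N_P x N_P autocorrelation matrix of type u.
    A_cross:  A_cross[u][k] is the N_P x N_k cross-correlation matrix of type u.
    W_self, W_cross: W_*[u] is an F x F' weight matrix for type u.
    """
    out = None
    for u in range(len(A_self)):
        term = matmul(matmul(A_self[u], X_target), W_self[u])
        for k, X_k in enumerate(X_others):
            # A_cross maps N_k source nodes to N_P target nodes.
            term = add(term, matmul(matmul(A_cross[u][k], X_k), W_cross[u]))
        out = term if out is None else add(out, term)
    return out


# Toy example: target mode with 2 nodes, one other mode with 3 nodes, F = F' = 1.
X_p = [[1], [2]]
X_k = [[2], [4], [6]]
out = multi_graph_conv(
    X_p, [X_k],
    A_self=[[[1, 0], [0, 1]]],
    A_cross=[[[[1, 1, 0], [0, 0, 1]]]],
    W_self=[[[1.0]]],
    W_cross=[[[0.5]]],
)
```

Because the cross weights depend only on the matrix type, adding a traffic mode adds a correlation matrix but no new weight matrix in this sketch.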

3. The short-time passenger flow prediction model provided by the invention

The invention provides a novel short-time passenger flow prediction model M2-former based on multi-task learning. In particular, the model consists of a plurality of branches with codec structures, each branch corresponding to a particular traffic pattern. For a particular traffic pattern, the encoder is used to learn and capture explicit and implicit spatiotemporal correlations between multiple traffic patterns; the decoder is used for further extracting inflow characteristics of the target traffic pattern and generating future inflow. The overall framework of the model is shown in fig. 2. The M2-former consists of E encoder layers and D decoder layers, with task specific layers for obtaining future inflow for each traffic pattern.

The encoder learns the complex correlations among the traffic modes and mainly comprises two parts. The multiple time relationship attention mechanism (MTR-A) extracts the temporal correlation between multiple traffic modes and is composed of the multi-mode traffic system ProbSparse attention mechanism and an attention-based feature aggregation layer (AAB). The multi-space adaptive multi-graph convolution (MSR-MGC) captures explicit and implicit spatial correlations between different traffic modes. Finally, a fusion layer sums the outputs of the two parts.

1) Multiple time relationship attention mechanism (MTR-A)

The MTR-A architecture is shown in FIG. 3. The MTR-A is composed of the multi-mode traffic system ProbSparse attention mechanism and an attention-based feature aggregation layer (AAB), and captures the temporal dependence of multi-mode traffic.

Multi-mode traffic system ProbSparse attention mechanism

Based on the multi-head ProbSparse self-attention mechanism, the invention provides a multi-mode traffic system ProbSparse attention mechanism. The mechanism calculates the influence coefficients of the various traffic modes on a target traffic mode and thereby obtains the temporal correlation. Specifically, consider a multi-mode traffic system T that includes M (M > 1) traffic modes and a target traffic mode P. Assuming that the history period length is L, the historical inbound passenger flow of the multi-mode traffic system is denoted $X^O = \{X^k, k = 1, \dots, M\}$, where each $X^k$ covers the L historical time steps and the $S_k$ nodes of the k-th traffic mode, and $X^P$ is the historical passenger flow sequence of the target traffic mode.

$X^P$ is taken as the key matrix K and the value matrix V, and $X^O$ is taken as the query matrix set $Q = \{Q_1, \dots, Q_m, Q_P\}$.

For the target traffic mode P, the multi-mode traffic system ProbSparse attention mechanism is built from a distilling layer and learnable parameters, including a projection $W_k$ for each traffic mode k. Because different traffic modes have different data structures, $W_k$ is used to unify the dimensions of all temporal correlation results. Furthermore, all traffic modes share the same multi-head ProbSparse self-attention layer.

The key step of this mechanism is to calculate the product of the query matrix and the key matrix. The invention defines this product as the influence coefficient matrix of traffic mode k on the target traffic mode; each element of the matrix is computed from $q_k$, a component of the processed query matrix of traffic mode k, and $k$, a component of $K^{\top}$. The influence coefficient matrix can be visualized as shown in FIG. 4. Through the product of these elements, the influence coefficient matrix describes the influence of the historical passenger flow of traffic mode k on the target traffic mode. The influence coefficient matrix is then multiplied by the value matrix to obtain the temporal correlation feature map between traffic mode k and the target traffic mode. Finally, the dimensions of the temporal correlations between all traffic modes and the target traffic mode are unified, which yields the overall temporal correlation feature map set $H_P$.

To fully exploit $H_P$, the invention proposes an attention-based feature aggregation layer that further processes $H_P$.

The attention-based feature aggregation layer (AAB) provided by the invention is shown in FIG. 5. The input feature map $H_P$ is processed sequentially by the traffic-mode-level attention layer and the node-level attention layer, which generate two attention matrices, $A^{TL} \in \mathbb{R}^{m \times 1 \times 1}$ and $A^{NL}$, respectively. Each attention matrix is multiplied with its corresponding input for feature refinement. After feature refinement, the feature map $H'_P$ is obtained; the feature maps of all traffic modes are then summed, and the temporal feature map of the target traffic mode is obtained through a residual connection. In the formulation of the AAB, ⊙ denotes the Hadamard product.

The AAB mainly includes a traffic-mode-level attention layer and a node-level attention layer. The traffic-mode-level attention layer focuses on the influence of different traffic modes. Given the feature map $H_P$, this layer first aggregates the features of each traffic mode by max pooling, average pooling and node-level feature extraction (NLE), generating three vectors of the same dimension. The three vectors are fed to the same fully connected layer to obtain the traffic-mode-level attention matrix $A^{TL}$. In its expression, σ denotes the Sigmoid activation function, FC(·) denotes the fully connected layer, $W_{in} \in \mathbb{R}^{m \times (m/r)}$ and $W_{re} \in \mathbb{R}^{(m/r) \times m}$ are learnable parameters, and r is a given parameter.

Max pooling and average pooling capture only part of the characteristics of all traffic modes. The invention therefore uses NLE to supplement them, so that the model captures the characteristics of all traffic modes comprehensively. Specifically, the NLE processes the feature map of each traffic mode first along the time axis and then along the node axis, and the values of the different traffic modes are concatenated to obtain the final result. For traffic mode k, the NLE first processes the time axis of the corresponding feature map with a learnable parameter to obtain a representation value for each node. The NLE then processes these representation values along the node axis with a second learnable parameter to obtain the result for traffic mode k. Finally, the results of all traffic modes are concatenated.

The node-level attention layer focuses on the influence of different nodes. The traffic-mode-level attention layer and the node-level attention layer are therefore complementary and together capture the information of multi-mode traffic comprehensively. Specifically, given the feature map $H'_P$ produced by the traffic-mode-level attention layer, the node-level attention layer aggregates features by applying max pooling and average pooling to the feature map of each traffic mode, generating two feature matrices of the same dimension. The two feature matrices are then passed through a two-dimensional convolution to generate the node-level attention matrix $A^{NL}$.

2) Multi-space adaptive multi-graph convolution (MSR-MGC)

The invention provides an adaptive multi-graph convolution, MSR-MGC, for extracting the dynamic spatial characteristics of a multi-mode traffic system. Unlike the multi-graph convolution of equation (8), the MSR-MGC uses an adaptive cross-correlation matrix to capture the implicit spatial relationships between different traffic modes and thereby characterize the dynamic spatial features of the multi-mode traffic system. Each traffic mode is given an adaptive implicit relation node matrix $E^{adp}$ that describes the implicit spatial features of that traffic mode. Based on $E^{adp}$, the adaptive spatial relationship matrix from traffic mode i to traffic mode j is defined from the spatial relationship matrix of the corresponding type and two learnable parameters. C is a hyperparameter that represents the number of hidden states in the adaptive implicit relation node matrix. ReLU(·) is an activation function used to eliminate weak connections.
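A common way to build an adaptive relation matrix from node embeddings is a row-wise Softmax over ReLU-filtered inner products. The sketch below assumes that form, which may differ from the patent's exact definition (in particular, the two learnable parameters mentioned above are omitted); `adaptive_relation` and the toy embeddings are hypothetical. Each embedding matrix has C columns, matching the number of hidden states.

```python
import math


def adaptive_relation(E_i, E_j):
    """Row-wise Softmax(ReLU(E_i @ E_j^T)) for embeddings of shape N_i x C and N_j x C."""
    rows = []
    for e_i in E_i:
        # ReLU removes negative (weak) connections before normalization.
        raw = [max(0.0, sum(a * b for a, b in zip(e_i, e_j))) for e_j in E_j]
        peak = max(raw)
        exps = [math.exp(v - peak) for v in raw]  # subtract the max for numerical stability
        total = sum(exps)
        rows.append([v / total for v in exps])
    return rows


# Toy embeddings with C = 2 hidden states: 2 subway nodes, 3 bus nodes.
E_subway = [[1.0, 0.0], [0.0, 1.0]]
E_bus = [[2.0, 0.0], [-1.0, 0.0], [0.0, 0.0]]
A_adp = adaptive_relation(E_subway, E_bus)  # shape 2 x 3
```

Because the embeddings would be trained with the rest of the model, the resulting matrix can change during training, unlike the static distance and functional matrices.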

For the first encoder layer, given the input of the target traffic mode, the spatial correlation matrix sets $A^d$ and $A^f$, and the set of adaptive implicit relation node matrices, the MSR-MGC is computed with two learnable parameters.

Finally, the output $H_{MTR}$ of the MTR-A and the output $H_{MSR}$ of the MSR-MGC are fed to the fusion layer to generate the output of the first encoder layer:

$$X_{en} = H_{MSR} + H_{MTR} \quad (19)$$

In addition, because of the many connections between different traffic modes, the number of model parameters increases significantly with the number of traffic modes. In one embodiment, a regularization term is designed to avoid an excessive number of parameters. In this term, ε is a predefined parameter that balances the weights of the autocorrelation parameters and the cross-correlation parameters. Typically, the historical inflow of the target traffic mode has the greatest impact on its future inflow, while the other traffic modes have a smaller impact. Thus, ε is set to less than 1.

The decoder is used for knowledge sharing among the multi-mode traffic systems and extracting the characteristics of the target traffic mode. For example, the decoder is mainly composed of two parts, namely: the self-time relation attention mechanism (STR-A) is used for extracting the self-time correlation of the target traffic mode, and is formed by superposing two attention mechanisms, namely a causal ProbSparse attention mechanism and a convolution multi-head attention mechanism; the self-space multi-graph convolution (SSR-MGC) is used for extracting the self-space correlation, so that knowledge sharing among multiple traffic modes is realized.

1) Self-time relationship attention mechanism (STR-A)

The self-time relationship attention mechanism consists of a causal ProbSparse attention mechanism and a convolution multi-head attention mechanism.

The causal ProbSparse attention mechanism (CPS-A) uses a one-dimensional causal convolution to process the historical passenger flow of the corresponding traffic mode. In addition, the invention uses dilated convolution so that the convolution kernel obtains a larger receptive field. Specifically, as shown in FIG. 6, given the historical passenger flow data $X^P$ of the target traffic mode and a convolution kernel f of size K, the one-dimensional causal convolution extracts and aggregates historical passenger flow information from a specific time step. Here $*_D$ denotes the dilated convolution operation and D is the dilation factor, for example $D = 2^i$, where i denotes the i-th decoder layer. FIG. 6 shows one-dimensional causal convolutions with dilation factors 1, 2 and 4. Furthermore, the invention adds a residual connection to form the output of the CPS-A. This output is then fed into the multi-head ProbSparse self-attention mechanism to obtain the hidden state of the target traffic mode.
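The dilated causal convolution described above can be sketched for a single scalar series as follows. The sketch assumes zero padding before the start of the series and omits the residual connection; the function name and toy kernel are illustrative.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before the start."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # only current and past steps are read, never future ones
                acc += w * x[idx]
        y.append(acc)
    return y


# Stack three layers with dilation factors 1, 2 and 4 and a kernel of size 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
h = x
for d in (1, 2, 4):
    h = causal_dilated_conv(h, kernel=[1.0, 1.0], dilation=d)
```

With a kernel of size 2, the three stacked layers give the last output a receptive field of 8 steps, which is why doubling the dilation per layer widens the view without enlarging the kernel.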

Convolution multi-head attention mechanism: the original multi-head attention mechanism uses fully connected layers to process the query, key and value matrices. However, fully connected layers have difficulty capturing the characteristics of different nodes in a traffic network and consume a large amount of computing resources. The invention therefore provides a convolution multi-head attention mechanism that replaces the fully connected layers with two-dimensional convolution operations. In its expression, * denotes the convolution operation, the convolution kernels are learnable parameters, and $n_c$, the number of attention heads in the convolution multi-head attention mechanism, is a hyperparameter.

2) Self-space multi-graph convolution (SSR-MGC)

Since the decoder layer considers only the target traffic mode, there is no explicit connection between the different branches. Therefore, to realize knowledge sharing among the branches of different traffic modes, the invention provides the SSR-MGC, which constructs implicit connections with a multilinear relationship learning framework to share knowledge across the multi-mode traffic. Unlike equation (8), the SSR-MGC considers only the autocorrelation matrices and modifies the learnable parameters. For the target traffic mode P, the SSR-MGC of the i-th decoder layer has its own learnable parameters and is expressed as follows:

where the two matrix terms are the distance autocorrelation matrix and the functional autocorrelation matrix of traffic mode P, respectively. The invention aggregates the learnable parameters of all branches in the i-th decoder layer into one set, where L' = 2 × L. Given training data {X, Y}, the maximum a posteriori estimate of this parameter set consists of two terms. The first term is the prior distribution, under the assumption that the parameter sets of the decoder layers are independent of each other. The second term is the maximum likelihood estimate of the network. The parameter set is assumed to follow a tensor normal distribution, characterized by a mean tensor and by an input covariance matrix, an output covariance matrix and a traffic-mode covariance matrix that are combined through the Kronecker product ⊗. Knowledge sharing among different traffic modes is achieved by imposing the same distribution on the parameters of the different branches.

Substituting equation (25) into equation (24) and taking the negative logarithm gives the regularization term corresponding to the SSR-MGC in the i-th decoder layer, in which the remaining symbols denote the dimension sizes of the corresponding categories. Note that during training only one covariance matrix is updated, while the remaining covariance matrices are set to identity matrices and are not updated. The invention performs the update with the algorithm proposed by Ohlson et al.

Finally, the output $H_{SSR}$ of the SSR-MGC and the output $H_{STR}$ of the STR-A are summed and fed into the fully connected layer to generate the future passenger flow of the corresponding traffic mode.

Overall, the invention uses the mean square error (MSE) together with the two regularization terms as the loss function of the model, where α and β are predefined parameters that weight the two regularization terms.
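The structure of such a loss can be sketched as below. The sketch assumes that the data term is the sum of the per-mode MSE values and that the two regularization terms (one from the encoder, one from the decoder) are supplied as precomputed scalars; the function names and the numbers are illustrative, not values from the patent.

```python
def mse(pred, target):
    """Mean squared error between two flat lists of flows."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)


def total_loss(preds, targets, reg_encoder, reg_decoder, alpha, beta):
    """Sum of per-mode MSE plus weighted regularization terms.

    preds, targets: dicts mapping a traffic mode name to a flat list of flows.
    reg_encoder, reg_decoder: precomputed regularization values (assumed scalars).
    """
    data_term = sum(mse(preds[m], targets[m]) for m in preds)
    return data_term + alpha * reg_encoder + beta * reg_decoder


preds = {"subway": [1.0, 2.0], "bus": [3.0]}
targets = {"subway": [1.0, 4.0], "bus": [2.0]}
loss = total_loss(preds, targets, reg_encoder=0.5, reg_decoder=2.0, alpha=0.1, beta=0.01)
```

Summing the per-mode errors reflects the multi-task setup, in which every branch contributes to one shared objective.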

4. Experimental results

The data set, evaluation index and loss function, reference model, model parameter set, and result analysis used for experimental verification will be described in detail below.

1) Data set

Three traffic modes are selected: subway, taxi and bus. The experiment is based on real multi-mode traffic system data from Beijing, China, from February 29 to April 1, 2016 (about one month). Only weekday data are considered, namely the subway inflow and the pick-up and drop-off demand of taxis and buses. Because the service hours of the traffic modes differ, inflow data between 5:00 am and 11:00 pm, that is, the service hours of the subway, are selected. The time granularity is 30 minutes, which gives 36 time steps per day. The data of each traffic mode are described below.

Taxi demand data: the taxi demand data is taken from TaxiBJ. As shown in fig. 7, the left diagram shows the TaxiBJ raw data, which divides part of Beijing into 32×32 grid cells; the color of a cell represents its inflow, and the darker the color, the higher the inflow. As shown in the right diagram of fig. 7, each block of 4×4 grid cells is defined as one region, giving 64 regions in total. The inflow data of each region is then extracted to form the taxi inflow data set. In addition, since taxis have no fixed stations, the center of each region is taken as a taxi station.
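The aggregation of 32×32 grid cells into 64 regions can be sketched as below. This is a minimal sketch assuming that the inflow of a region is the sum of the inflows of its 4×4 grid cells; the function name `aggregate_grid` and the all-ones example grid are hypothetical.

```python
def aggregate_grid(grid, block=4):
    """Sum a square grid of inflow values over non-overlapping block x block cells."""
    n = len(grid)
    if n % block != 0 or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with a side divisible by block")
    size = n // block
    regions = [[0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            regions[r // block][c // block] += grid[r][c]
    return regions


# Hypothetical input: one inflow unit in every grid cell.
grid = [[1] * 32 for _ in range(32)]
regions = aggregate_grid(grid)
```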

Bus passenger flow data: as shown in the left diagram of fig. 8, 1269 bus stops (light dots) are selected according to the regions defined for the taxis. The inflow data of each stop is obtained from the automatic fare collection (AFC) data of that stop. Because there are many bus stops and the passenger flow of each stop fluctuates strongly, the passenger flows of the bus stops in the same region are summed, and the result represents the passenger flow of that region. The inflow data of buses and taxis therefore have the same structure. In addition, in order to represent the bus stops of each region by a single location, a virtual bus stop is generated for each region. As shown in the right diagram of fig. 8, for a region containing S bus stops, the larger the average daily passenger flow of a stop, the more representative that stop is of the region. Therefore, in the region with coordinates (i, j), the position of the virtual bus stop is given by:

wherein lng_virtual and lat_virtual respectively represent the longitude and latitude of the virtual stop, lng_i and lat_i respectively represent the longitude and latitude of stop i, and

represents the average passenger flow of stop i.
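The text above states that stops with a larger average daily passenger flow are more representative of their region. One reading consistent with this, assumed here rather than taken from the formula of the invention, is a flow-weighted average of the stop coordinates. The function name `virtual_stop` and the example coordinates are hypothetical.

```python
def virtual_stop(stops):
    """Flow-weighted average position of the bus stops in one region.

    stops is a list of (longitude, latitude, average_daily_flow) tuples.
    Assumption: the weight of a stop is its average daily passenger flow.
    """
    total_flow = sum(flow for _, _, flow in stops)
    if total_flow <= 0:
        raise ValueError("total average flow must be positive")
    lng = sum(lng_i * flow for lng_i, _, flow in stops) / total_flow
    lat = sum(lat_i * flow for _, lat_i, flow in stops) / total_flow
    return lng, lat


# Hypothetical region with two stops; the busier stop pulls the virtual stop toward it.
lng_virtual, lat_virtual = virtual_stop([
    (116.0, 39.0, 100.0),
    (116.4, 39.2, 300.0),
])
```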

Subway passenger flow data: as shown in fig. 8 (dark dots), some regions contain no subway station. Therefore, all subway stations (dark dots) within the study area are selected to form the subway inflow data set, 174 stations in total (right diagram of fig. 8). The subway inflow data thus has a different structure from the inflow data of taxis and buses.

As shown in the right diagram of fig. 8, two regions associated with subway stations and bus stops are selected for illustration, namely region (a) and region (b). Region (a) contains all three traffic modes. Region (b) contains no subway station, so the nearest subway station (circled) is used for illustration. As shown in fig. 9, the selected data covers Monday to Friday. Overall, both the inflow volumes and the inflow patterns of the different traffic modes differ significantly between regions. In terms of passenger flow volume, the three traffic modes differ only slightly in region (a) but differ greatly in region (b), where subway and taxi passenger flows dominate and bus passenger flow is clearly lower. In terms of passenger flow patterns, the subway, bus and taxi flows in region (a) all show a bimodal distribution, whereas in region (b) all three traffic modes are unimodal but peak at different times.

2) Evaluation index and loss function

Root mean square error (RMSE), weighted mean absolute percentage error (WMAPE) and mean absolute error (MAE) are selected as the evaluation indices of model performance, and are defined as follows.

wherein ŷ_i,k and y_i,k are respectively the predicted value and the true value of traffic mode k, and M is the total length of the input passenger flow sequence.
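For reference, the three evaluation indices can be sketched with their commonly used definitions. The exact form used by the invention is not restated here, so the sketch below is an assumption: WMAPE is taken as the sum of absolute errors divided by the sum of true values, and the function names are introduced for illustration.

```python
import math


def rmse(pred, true):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def wmape(pred, true):
    """Weighted mean absolute percentage error (assumed form: sum|error| / sum(true))."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / sum(true)


# Hypothetical predicted and true inflows for one traffic mode.
pred = [2.0, 4.0, 6.0]
true = [1.0, 4.0, 8.0]
```

Unlike a plain percentage error, this form of WMAPE weights each error by the true passenger flow, so stations or regions with very small flows do not dominate the index.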

3) Reference model

In the experimental verification, the proposed M2-former model is compared with the following reference models. All models were run on a desktop computer with an i7-8700K processor (12 MB cache, frequency up to 4.7 GHz), 32 GB of memory, and an NVIDIA GeForce RTX 3070 graphics card.

Long short-term memory network (LSTM): an LSTM with a fully connected layer is used to model the inflow of each traffic mode.

Two-dimensional convolutional neural network (CNN-2D): the inflow of each traffic mode is modeled with a CNN-2D model, in which each traffic mode has a CNN layer and a fully connected layer. For all CNN-2D models, the kernel size is 3×3, the padding is 1, and the stride is 1.

ConvLSTM: the model combines the convolution operation with LSTM and has strong spatiotemporal modeling capability for time-series data. The model is used to predict the future passenger flow of each traffic mode.

ST-ResNet: the model uses 2D-CNN and residual connection, so that the space-time characteristics of the passenger flow in the network range can be captured, and the future passenger flow can be predicted.

MIX-MGC: a model based on multi-graph convolution with different branches, among which knowledge is shared. The model is capable of cooperatively predicting a plurality of traffic modes. Specifically, the model includes two parts: the first part learns shared knowledge across tasks by regularization, and the second part learns shared knowledge through multilinear relationships.

STGCN: a GCN-based deep learning model that models spatiotemporal features with a spatial graph convolution layer and a temporal gated causal convolution layer.

Informer: a model based on the multi-head ProbSparse attention mechanism, used to predict the future passenger flow of each traffic mode separately.
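For the CNN-2D reference model above, the stated settings (3×3 kernel, padding 1, stride 1) leave the spatial size of the input unchanged. This can be checked with the standard convolution output-size formula; the function name `conv_output_size` is an assumption introduced for illustration.

```python
def conv_output_size(size, kernel=3, padding=1, stride=1):
    """Output length along one spatial axis of a 2D convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# With the CNN-2D settings, an 8x8 region grid stays 8x8 after the convolution.
same_size = conv_output_size(8)
# Without padding, the same kernel would shrink each side by 2.
shrunk_size = conv_output_size(8, padding=0)
```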

4) Model parameter settings and hyperparameter tuning

In the experiment, the M2-former model is built in a PyTorch environment, the batch size is set to 16, and the ratio of the training set, validation set and test set is 7:1:2. In the M2-former model, the number of encoding layers and the number of decoding layers are both 2, and the sampling coefficient c is set to 6. In the loss function, α and β are set to 0.0005 and 0.0001, respectively. In addition, four hyperparameters need to be set, including, in

the number of attention heads and d_model, together with the number of attention heads in the convolutional multi-head attention mechanism and the length of the historical time steps.

For the number of attention heads in

the search interval is set to [2, 10] with a step of 1; for d_model in

the search set is {5, 10, 15, 20, 25, 30}; for the number of attention heads in the convolutional multi-head attention mechanism, the search interval is set to [2, 10] with a step of 1; and for the length of the history period, the search interval is set to [2, 10] with a step of 1. The hyperparameters are tuned with the control variable method, that is, one hyperparameter is varied while the others are held fixed; the details are not repeated here. RMSE and MAE are used as the evaluation indices of model performance, and the full-network data set is used for hyperparameter tuning. The tuning results are shown in fig. 10. It can be seen from the figure that, in

the optimal number of attention heads is 2 and the optimal d_model is 10; the optimal number of attention heads in the convolutional multi-head attention mechanism is 4; and the optimal length of the history period is 4.
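The control variable method mentioned above can be sketched as a one-at-a-time search. This is a minimal sketch under stated assumptions: the hyperparameter names, the starting values and the function `toy_error` are hypothetical, and `toy_error` is a synthetic stand-in for training and validating a model, not a result of the invention. Whether the untuned hyperparameters are held at their starting values or at the best values found so far is not stated in the text; the sketch uses the best values found so far.

```python
def control_variable_search(search_space, defaults, evaluate):
    """Tune one hyperparameter at a time while holding the others fixed."""
    best = dict(defaults)
    for name, candidates in search_space.items():
        errors = {}
        for value in candidates:
            trial = dict(best)
            trial[name] = value
            errors[value] = evaluate(trial)
        best[name] = min(errors, key=errors.get)
    return best


# Search ranges taken from the text; the key names are hypothetical.
SEARCH_SPACE = {
    "n_heads": list(range(2, 11)),
    "d_model": [5, 10, 15, 20, 25, 30],
    "conv_heads": list(range(2, 11)),
    "history_len": list(range(2, 11)),
}


def toy_error(params):
    """Synthetic error surface used only to make the sketch checkable."""
    target = {"n_heads": 2, "d_model": 10, "conv_heads": 4, "history_len": 4}
    return sum((params[key] - target[key]) ** 2 for key in target)


# Hypothetical starting point.
DEFAULTS = {"n_heads": 4, "d_model": 20, "conv_heads": 2, "history_len": 6}
best_params = control_variable_search(SEARCH_SPACE, DEFAULTS, toy_error)
```

A one-at-a-time search evaluates 9 + 6 + 9 + 9 = 33 configurations here, compared with 9 × 6 × 9 × 9 = 4374 for an exhaustive grid, at the cost of ignoring interactions between hyperparameters.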

5. Discussion of experimental results

The experimental results are shown in table 1. It can be seen that the M2-former has the lowest error in short-term passenger flow prediction of the multi-mode traffic system, and the highest prediction accuracy, compared with all the reference models. The prediction results of the M2-former model in the region (a) and the region (b) are shown in fig. 11, in which the left side view corresponds to the region (a) and the right side view corresponds to the region (b).

Table 1 Comparison of short-term passenger flow prediction results for multi-mode traffic

Table 2 Ablation experiment results

Furthermore, the effectiveness of the M2-former structure is verified by ablation experiments, in which parts of the structure and framework of the M2-former are changed one at a time following the control variable principle. RMSE, MAE and WMAPE are used as evaluation indices, and the results are shown in table 2. Since the main components of the M2-former model can be divided into an attention-mechanism-based module and a graph-based module, they are discussed in two parts. First, for the attention-mechanism-based module, the model variants are as follows.

The multi-mode traffic ProbSparse attention mechanism in the M2-former model is removed, and the rest remains unchanged.

AAB: the attention-based feature aggregation layer in the M2-former model is replaced by an addition operation, and the rest remains unchanged.

CPS-A: the causal ProbSparse attention mechanism layer in the M2-former model is replaced by a multi-head ProbSparse self-attention mechanism layer, and the rest remains unchanged.

Conv-A: the convolutional multi-head attention mechanism layer in the M2-former model is replaced by an ordinary multi-head attention mechanism, and the rest remains unchanged.

Columns 3-6 of Table 2 show the evaluation indices of the different models. The variant without the multi-mode traffic ProbSparse attention mechanism considers only the autocorrelation of the target traffic mode, and its prediction error is the highest. This result suggests that the multiple correlations between traffic modes must be considered together, rather than autocorrelation alone. The AAB variant directly adds the feature maps of the multiple traffic modes without assigning different weights to different traffic modes, so its prediction error is high. This is mainly because the addition operation treats all features as equally important, resulting in redundant features for certain regions. The proposed attention-based aggregation block can balance all features and find the best trade-off among them, which addresses this problem. In addition, the prediction errors of CPS-A and Conv-A are also high.

Next, for the graph-based module, the model variants are as follows.

MSR-MGC: the multi-element spatial adaptive multi-graph convolution in the M2-former model is removed, and the rest remains unchanged.

SSR-MGC: the self-space multi-graph convolution in the M2-former model is removed, and the rest remains unchanged.

Columns 7-8 of Table 2 show the evaluation index for the different models. Specifically, if no graph-based module is added, the prediction error of the model in each traffic mode is higher than M2-former, indicating that the graph-based module helps to improve the prediction accuracy. Furthermore, the prediction error of the model with MSR-MGC removed is much higher than that of the model with SSR-MGC removed, and this result indicates that it is crucial to obtain multiple spatial cross-correlations between multiple traffic patterns.

In order to clarify the advantage of considering multiple traffic modes cooperatively, the M2-former is simplified into a model suitable for single-mode short-term passenger flow prediction, and the model performance is shown in Table 3. The results show that the prediction error of the M2-former on each traffic mode is smaller than that of the single-mode model, which demonstrates the advantage of considering multiple traffic modes cooperatively.

Table 3 Comparison of results between the single-mode and multi-mode traffic prediction models

In addition, the M2-former model can extract a space-time information interaction mechanism in the multi-mode traffic system, and the space-time information interaction mechanism is subjected to visual analysis in an experiment, so that the multi-mode traffic system can be understood conveniently.

For the time interaction mechanism, the M2-former extracts the time interaction mechanism of the multi-mode traffic system through the MTR-A. To better understand the underlying principles of this mechanism, the MTR-A in the M2-former was modified, and an MTM-CA model was designed that uses a multi-head attention mechanism to obtain multiple time correlations in a multi-mode traffic system. Referring to Table 4, it can be seen that the M2-former prediction error is lower than that of MTM-CA.

TABLE 4 comparison of MTM-CA and M2-former model test results

Further, the time correlation of the subway-taxi and the subway-bus is visually analyzed, as shown in fig. 12, wherein fig. 12 (a) corresponds to the subway-taxi and fig. 12 (b) corresponds to the subway-bus. For the time correlation of subway-taxi, MTM-CA and M2-former obtain similar results; for the time dependence of subway-bus, MTM-CA is quite different from the M2-former results. In general, M2-former and MTM-CA are used for extracting historical passenger flow information of different areas and distributing corresponding weights for the different areas, so that the historical passenger flow information of each area is filtered, and the extraction of time correlation is realized.

For the space interaction mechanism, the M2-former extracts the space interaction mechanism of the multi-mode traffic system through the MSR-MGC. To explore the effect of the adaptive multi-graph convolution and of the different spatial relationship graphs on the results, the M2-former model was modified. Specifically, "No Adapt" denotes replacing the MSR-MGC with the original MGC, that is, not using the adaptive multi-graph convolution, and "Adapt" denotes using the adaptive multi-graph convolution. (D) and (F) denote using only the distance correlation matrix or only the functional correlation matrix, respectively, and (DF) denotes using both the distance correlation matrix and the functional correlation matrix. The results are shown in Table 5, which demonstrates the effectiveness of the proposed model.

Table 5 Comparison of results for models with different spatial interaction mechanisms

Further, the different spatial relation matrices are visualized, and the spatial information interaction mechanisms among the different traffic modes are analyzed. The analysis shows that, for the functional correlation matrix, the adaptive spatial correlation matrix preserves the basic features of the static spatial correlation matrix and refines the features of the different regions. For the distance correlation matrix, the static spatial correlation matrix can capture only local distance correlation features among different stations and cannot obtain global correlation features, whereas the adaptive spatial correlation matrix can obtain global distance correlation features. In general, the spatial information interaction mechanism assigns different weights to the different regions of the different traffic modes, thereby selecting the most relevant regions and filtering out useless information to realize information interaction.

In summary, the present invention can cooperatively consider future passenger flows of a plurality of different areas for a plurality of traffic modes within the range of the urban multi-mode traffic system. The constructed M2-former model can extract dynamic space-time characteristics of various traffic modes, wherein the model can capture a space-time information interaction mechanism between multi-mode traffic and simultaneously obtain the dynamic space-time characteristics of different traffic modes through an encoder; through the decoder, the model can extract the characteristics of the target traffic mode and accurately predict the future passenger flow of the corresponding traffic mode. In addition, the model comprises a multi-branch structure, and connection is built among different branches, so that the problem of heterogeneous traffic passenger flows in multiple modes is solved, information interaction in different traffic modes is realized, and the calculation efficiency is improved.

The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.

Each block in the flowchart or block diagrams in the figures may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures.

The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims

1. A multimode traffic system short-time passenger flow prediction method based on self-adaptive multi-graph convolution comprises the following steps:

X_{t+1} = F(X_{(t-L)→t}, G_s, G_c)

wherein L represents the length of the history period, and X_{t+1} represents the inbound passenger flow sequence of the multi-mode traffic system at time t+1;

wherein the autocorrelation graph is expressed as

represents the autocorrelation graph corresponding to traffic mode k, V_k is a node set with |V_k| = N_k,

is a weight matrix describing the correlation between the nodes in the node set, M represents the number of traffic modes, and N_k represents the number of nodes of traffic mode k;

wherein the cross-correlation graph is expressed as

represents the cross-correlation graph of traffic mode m and traffic mode n, V_m and V_n are the node sets of traffic mode m and traffic mode n, respectively, with |V_m| = N_m and |V_n| = N_n,

is a weight matrix characterizing the correlation between traffic mode m and traffic mode n, N_m represents the number of nodes of traffic mode m, and N_n represents the number of nodes of traffic mode n.

2. The method of claim 1, wherein the short-term passenger flow prediction model comprises a plurality of branches, each branch containing an encoder, a decoder, and a task-specific layer, and each branch corresponding to a particular traffic pattern, wherein for a particular traffic pattern, the encoder is to learn and capture explicit and implicit spatiotemporal correlations between multiple traffic patterns; the decoder is used for extracting inflow characteristics of the target traffic mode and generating future inflow; the task-specific layer is used to obtain future inflow of the corresponding traffic pattern.

3. The method of claim 2, wherein for each branch:

the encoder comprises a multi-mode traffic system ProbSparse attention mechanism, a characteristic aggregation layer based on the attention mechanism, a multi-element space self-adaptive multi-graph convolution and a fusion layer, wherein the multi-mode traffic system ProbSparse attention mechanism is used for calculating influence coefficients of various traffic modes on a target traffic mode and obtaining time correlation; the feature aggregation layer based on the attention mechanism comprises a traffic mode level attention layer and a node level attention layer, wherein the traffic mode level attention layer focuses on the influence of different traffic modes, and the node level attention layer focuses on the influence of different nodes; the multi-element space self-adaptive multi-graph convolution is used for extracting dynamic space characteristics of the multi-mode traffic system; the fusion layer gathers the output of the feature aggregation layer based on the attention mechanism and the output of the multi-element space self-adaptive multi-graph convolution;

the decoder comprises a self-time relation attention mechanism and a self-space multi-graph convolution, wherein the self-time relation attention mechanism is overlapped with a causal ProbSparse attention mechanism and a convolution multi-head attention mechanism and is used for extracting the self-time correlation of the target traffic mode; the self-space multi-graph convolution is used for extracting the self-space correlation, so that knowledge sharing among multiple traffic modes is realized.

4. A method according to claim 3, characterized in that for the target traffic mode P, the multi-mode traffic system ProbSparse attention mechanism is expressed as:

where Q represents the query matrix, K represents the key matrix, V represents the value matrix,

is a distillation layer, W_k and b are learnable parameters, and W^o is a weight coefficient.

5. A method according to claim 3, characterized in that the attention mechanism based feature aggregation layer is expressed as:

H'_P = A^TL ⊙ H_P

H″_P = A^NL ⊙ H'_P

wherein ⊙ represents the Hadamard product, H_P represents the input feature map, and H_P is processed sequentially by the traffic mode level attention layer and the node level attention layer, which generate the two attention matrices A^TL and A^NL, respectively.

6. A method according to claim 3, wherein the causal ProbSparse attention mechanism processes the historical passenger flow of the traffic modes using one-dimensional causal convolution, and wherein the convolutional multi-head attention mechanism processes the query matrices, key matrices and value matrices using two-dimensional convolution operations.

7. A method according to claim 3, characterized in that, for the target traffic mode P, the output of the self-space multi-graph convolution of the i-th decoding layer is expressed as:

wherein

and

respectively represent the distance autocorrelation matrix and the functional autocorrelation matrix of the target traffic mode P, X^P is the historical passenger flow data of the target traffic mode,

is a learnable parameter, u represents the category of the correlation matrix, and

is an autocorrelation matrix.

8. The method of claim 1, wherein the loss function for training the short-term passenger flow prediction model is a mean square error.

9. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor realizes the steps of the method according to any of claims 1 to 8.

10. A computer device comprising a memory and a processor, on which memory a computer program is stored which can be run on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 8 when the computer program is executed.