CN116911460A - Traffic flow prediction method combining multi-head attention with self-adaptive graph convolution - Google Patents

Traffic flow prediction method combining multi-head attention with self-adaptive graph convolution

Info

Publication number
CN116911460A
CN116911460A (application CN202310894728.2A)
Authority
CN
China
Prior art keywords
traffic flow
time
attention
adaptive
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310894728.2A
Other languages
Chinese (zh)
Inventor
张红
阚苏南
曹洁
张玺君
王红燕
巩蕾
朱思雨
Current Assignee
Lanzhou University of Technology
Original Assignee
Lanzhou University of Technology
Priority date
Filing date
Publication date
Application filed by Lanzhou University of Technology filed Critical Lanzhou University of Technology
Priority to CN202310894728.2A
Publication of CN116911460A
Legal status: Pending


Classifications

    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N 3/042 Knowledge-based neural networks; logical representations of neural networks
    • G06N 3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06Q 50/26 Government or public services
    • G08G 1/0125 Traffic data processing
    • Y02T 10/40 Engine management systems


Abstract

The invention discloses a traffic flow prediction method combining multi-head attention with self-adaptive graph convolution, applied in the technical field of traffic flow prediction. The method comprises the following steps: S1, constructing a model; S2, modeling the spatial correlation of traffic flow with graph convolution, capturing the neighboring spatial information of each node on the graph, and extracting the spatial features of the traffic flow; S3, discovering hidden node connections in the graph structure from the data through training, thereby capturing the spatial relationships of traffic flow hidden in the data; S4, capturing the time-dependent information of adjacent time slices of the traffic flow; S5, extracting spatio-temporal information of different spatial dimensions at different historical moments by introducing multiple attention heads, and capturing the dynamic spatio-temporal correlation of traffic flow to obtain high-dimensional spatio-temporal features; S6, fusing the features by two fusion methods: point-wise addition and vector concatenation. The method improves the accuracy of the model's traffic flow prediction.

Description

Traffic flow prediction method combining multi-head attention with self-adaptive graph convolution
Technical Field
The invention relates to the technical field of traffic flow prediction, in particular to a traffic flow prediction method combining multi-head attention with self-adaptive graph convolution.
Background
Current traffic flow prediction methods mainly fall into three categories: statistical models, machine learning models, and deep learning models.
Statistical methods build models from historical traffic data and predict future traffic flow by analyzing patterns and trends in the data. These models are relatively easy to implement and perform well for simple traffic networks and steady traffic flows. However, they often fail to accurately capture the effects of complex traffic features and unexpected incidents.
Machine learning models make predictions using the features and patterns in large-scale datasets. They can handle complex traffic conditions and provide more accurate predictions, with strong generalization capability and good adaptability to nonlinear relationships and complex traffic networks. However, training and inference require large amounts of data and computational resources, and the models' interpretability is poor, making it difficult to understand their internal decision process.
A deep learning model is a machine learning model characterized by a multi-layer neural network structure. In traffic flow prediction, deep learning models can improve prediction accuracy by learning the complex relationships and nonlinear features of the traffic network. In particular, deep learning methods based on graph convolutional networks (Graph Convolutional Network, GCN) excel at traffic flow prediction. A GCN can effectively process the topological structure and spatial dependencies of the traffic network and capture the interaction information among nodes during prediction. Such methods predict traffic flow more accurately and are particularly suitable for complex urban traffic networks.
Although deep-learning-based approaches have made significant progress in traffic flow prediction, challenges remain. Training deep learning models requires large amounts of labeled data and computational resources and is very sensitive to hyper-parameter selection and model tuning. In addition, the models' interpretability is poor, making it difficult to explain the reasons behind model predictions and the decision process.
Therefore, a traffic flow prediction method combining multi-head attention with adaptive graph convolution is needed to address these difficulties in the prior art; this is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a traffic flow prediction method combining multi-head attention with adaptive graph convolution to solve the problems in the prior art.
In order to achieve the above object, the present invention provides the following technical solutions:
a traffic flow prediction method combining multi-head attention with self-adaptive graph convolution comprises the following steps:
S1, constructing an MSTA-GCN model;
S2, according to the model obtained in S1, modeling the spatial correlation of traffic flow using graph convolution, capturing the neighboring spatial information of each node on the graph, and extracting the spatial features of the traffic flow;
S3, according to the model obtained in S1, using an adaptive graph convolution method so that hidden node connections in the graph structure can be discovered from the data through training, thereby capturing the spatial relationships of traffic flow hidden in the data;
S4, capturing the time-dependent information of adjacent time slices of the traffic flow in the time dimension;
S5, extracting spatio-temporal information of different spatial dimensions at different historical moments by introducing multiple attention heads, and capturing the dynamic spatio-temporal correlation of traffic flow to obtain high-dimensional spatio-temporal features;
S6, according to the high-dimensional spatio-temporal features obtained in S5, fusing by two spatio-temporal feature fusion methods: point-wise addition and vector concatenation.
Optionally, the MSTA-GCN model in S1 is composed of a multi-head spatio-temporal attention mechanism layer, a graph convolution layer, a temporal convolution layer and a spatio-temporal fusion layer.
Optionally, S2 is specifically:
for the input signal x filtered in the spectral domain by a convolution kernel g_θ, the spectral graph convolution is defined as:

g_θ *G x = U g_θ(Λ) Uᵀ x

wherein U is the matrix of eigenvectors of the graph Laplacian, the function g_θ(Λ) is the frequency response function of the convolution kernel, Λ is the diagonal matrix of eigenvalues, θ is the parameter to be learned, and ᵀ denotes the matrix transpose.
Optionally, the adaptive graph convolution method in S3 uses an adaptive adjacency matrix instead of the conventional adjacency matrix in order to implement an adaptive graph convolution network.
Optionally, the adaptive adjacency matrix is obtained by randomly initializing two learnable node-embedding parameters E1, E2 ∈ R^{N×F}; the calculation formula is:

A_adp = softmax(ReLU(E1 E2ᵀ))

wherein A_adp is the calculated adaptive adjacency matrix, softmax is the normalization function, ReLU is the activation function, and E2ᵀ is the transpose of the learnable parameter E2.
Optionally, a dilated causal convolution is used as the temporal convolution layer in S4 to capture the time-dependent features of the nodes.
Optionally, a multi-head spatio-temporal attention mechanism is used in S5, in combination with the adaptive graph convolution, to capture the dynamic spatio-temporal correlation of traffic flow.
Optionally, in the multi-head spatio-temporal attention mechanism, the spatial attention is computed as:

SAt = A_s · σ((XW1)W2(W3X)ᵀ + b_s)

The Chebyshev polynomial T_k(L̃) is multiplied element-wise with the spatial attention matrix SAt to obtain the Chebyshev polynomial with attention T_k(L̃) ⊙ SAt, wherein ⊙ is the Hadamard product;

the temporal attention is computed as:

TAt = A_t · σ((XV1)V2(V3X)ᵀ + b_t)

wherein X = (X1, X2, ..., XT) ∈ R^{N×F×T} is the input data of the model, N is the number of graph nodes, F is the number of channels of the data, T is the length of the time sequence, and A_s, A_t, b_s, b_t ∈ R^{N×N}, W1, V1 ∈ R^T, W3, V3 ∈ R^F, W2, V2 ∈ R^{F×T} are learnable parameters; σ is the sigmoid activation function. The spatio-temporal attention matrices SAt and TAt are dynamically calculated from the current input: SAt_{i,j} is the strength of the spatial correlation between node i and node j, and TAt_{i,j} is the strength of the temporal correlation between times i and j. TAt′_{i,j} denotes the attention matrix normalized with softmax, where exp is the exponential function and LeakyReLU is the activation function; Out_t is the extracted temporal feature, TCN is the temporal convolution network, and TAt^(h) is the attention matrix of the h-th head.
Optionally, in S6, the fusion is performed by two spatio-temporal feature fusion methods, point-wise addition and vector concatenation, specifically:

Let h_t^i and h_s^i be the temporal and spatial features of node v_i to be fused;

point-wise addition: h^i = h_t^i + h_s^i. This method directly adds the corresponding elements of the feature vectors, preserving the original spatio-temporal feature information as far as possible;

vector concatenation: h^i = h_t^i ⊕ h_s^i. This method concatenates the feature vectors along a specified dimension to fuse the spatio-temporal features; the dimensionality increases after concatenation, exposing richer feature information.
Compared with the prior art, the invention discloses a traffic flow prediction method combining multi-head attention with self-adaptive graph convolution, with the following beneficial effects: the model combines a multi-head spatio-temporal attention mechanism and a spatio-temporal convolution network to extract the dynamic spatio-temporal features in traffic flow data, and the influence of the spatio-temporal attention mechanism, the adaptive graph convolution network, and the multi-head mechanism on model performance is studied through ablation experiments. The multi-head mechanism can capture richer spatio-temporal features, the spatio-temporal attention mechanism can effectively mine the dynamic spatio-temporal features of traffic flow data, and the adaptive graph convolution network can effectively capture hidden spatial dependencies; together these modules improve the prediction performance of the model. Experiments show that the proposed traffic flow prediction model MSTA-GCN can effectively model traffic flow data and improves the accuracy of traffic flow prediction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a traffic flow prediction method of multi-head attention combined with adaptive graph convolution provided by the invention;
FIG. 2 is a MSTA-GCN model framework diagram provided by the invention;
FIG. 3 is a graph convolution in spatial dimensions provided by the present invention;
FIG. 4 is a graph of the predicted results of a model provided by an embodiment of the present invention on a PEMS04 dataset;
FIG. 5 is a graph of the predicted results of a model provided by an embodiment of the present invention on a PEMS08 dataset;
fig. 6 is an ablation experimental study chart of MSTA-GCN provided by the embodiment of the present invention on two data sets, wherein 6a is an ablation experimental study of PEMS04 data set, and 6b is an ablation experimental study of PEMS08 data set;
fig. 7 is a diagram of prediction results of different spatial information fusion methods according to an embodiment of the present invention, where 7a is a point-wise bitwise addition fusion method and 7b is a vector concatenation fusion method.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the invention discloses a traffic flow prediction method combining multi-head attention with self-adaptive graph convolution, which comprises the following steps:
S1, constructing an MSTA-GCN model;
S2, according to the model obtained in S1, modeling the spatial correlation of traffic flow using graph convolution, capturing the neighboring spatial information of each node on the graph, and extracting the spatial features of the traffic flow;
S3, according to the model obtained in S1, using an adaptive graph convolution method so that hidden node connections in the graph structure can be discovered from the data through training, thereby capturing the spatial relationships of traffic flow hidden in the data;
S4, capturing the time-dependent information of adjacent time slices of the traffic flow in the time dimension;
S5, extracting spatio-temporal information of different spatial dimensions at different historical moments by introducing multiple attention heads, and capturing the dynamic spatio-temporal correlation of traffic flow to obtain high-dimensional spatio-temporal features;
S6, according to the high-dimensional spatio-temporal features obtained in S5, fusing by two spatio-temporal feature fusion methods: point-wise addition and vector concatenation.
Further, referring to FIG. 2, the MSTA-GCN model in S1 is composed of a multi-head spatio-temporal attention mechanism layer, a graph convolution layer, a temporal convolution layer and a spatio-temporal fusion layer.
Further, S2 is specifically:
for the input signal x filtered in the spectral domain by a convolution kernel g_θ, the spectral graph convolution is defined as:

g_θ *G x = U g_θ(Λ) Uᵀ x

wherein U is the matrix of eigenvectors of the graph Laplacian, the function g_θ(Λ) is the frequency response function of the convolution kernel, Λ is the diagonal matrix of eigenvalues, θ is the parameter to be learned, and ᵀ denotes the matrix transpose.
In particular, the above is the general form of spectral convolution, but it has two major drawbacks. Referring to fig. 3: first, the complexity of the graph convolution operation is high; second, the graph convolution kernel is global and has a large number of parameters. This embodiment adopts a ChebNet graph convolution network based on spectral graph theory to overcome these problems, using a polynomial expansion to approximate the graph convolution and avoid an oversized convolution kernel:
the number of parameters of the convolution kernel can be reduced to k by using a polynomial approximation of highest degree k. In the process of carrying out the graph convolution operation, the complexity can be reduced to O (K|E|) by using an iteratively defined Chebyshev polynomial as an approximation, wherein K is the order of the polynomial, and E is the number of edges in the graph. The convolution kernel defined based on chebyshev polynomials is called chebyshev convolution kernel, and the corresponding convolution operation is called chebyshev convolution. Their definitions are given in the following formula:
lambda in the formula max Representing the maximum eigenvalue of the Laplace matrix, I N Is an identity matrix, and L represents a laplace matrix. The chebyshev polynomial matrix operation is fixed, can be completed in a preprocessing stage, and the Laplacian matrix is generally sparse, and can be accelerated by using sparse tensor multiplication, so that the computational complexity of graph convolution is greatly reduced.
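As an illustrative sketch (a NumPy reimplementation, not the patented code; the function and variable names here are ours), the Chebyshev graph convolution above can be computed with the recurrence T_k(L̃) = 2 L̃ T_{k-1}(L̃) − T_{k-2}(L̃), assuming an undirected (symmetric) adjacency matrix:

```python
import numpy as np

def cheb_graph_conv(x, adj, theta):
    """K-order Chebyshev graph convolution, K = len(theta) (illustrative sketch)."""
    n = adj.shape[0]
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                                 # combinatorial Laplacian L = D - A
    lam_max = np.linalg.eigvalsh(lap).max()         # assumes symmetric adj
    lap_tilde = 2.0 * lap / lam_max - np.eye(n)     # scaled Laplacian, spectrum in [-1, 1]
    t_prev, t_curr = np.eye(n), lap_tilde           # T_0 = I, T_1 = L~
    out = theta[0] * (t_prev @ x)
    if len(theta) > 1:
        out += theta[1] * (t_curr @ x)
    for k in range(2, len(theta)):
        t_next = 2.0 * lap_tilde @ t_curr - t_prev  # T_k = 2 L~ T_{k-1} - T_{k-2}
        out += theta[k] * (t_next @ x)
        t_prev, t_curr = t_curr, t_next
    return out
```

In practice the T_k(L̃) matrices would be precomputed once and stored sparse, as the description notes.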
Further, the adaptive graph convolution method in S3 uses an adaptive adjacency matrix instead of the conventional adjacency matrix in order to implement the adaptive graph convolution network.
Specifically, the hidden global spatial features in the data are calculated from the adaptive adjacency matrix with the ChebNet graph convolution network and then fused with the graph convolution result. In the fusion process, a gating mechanism is used to selectively update and forget the hidden spatial features between nodes.
Further, the adaptive adjacency matrix is obtained by randomly initializing two learnable node-embedding parameters E1, E2 ∈ R^{N×F}; the calculation formula is:

A_adp = softmax(ReLU(E1 E2ᵀ))

wherein A_adp is the calculated adaptive adjacency matrix, softmax is the normalization function, ReLU is the activation function, and E2ᵀ is the transpose of the learnable parameter E2.
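A minimal sketch of this learned-graph computation (NumPy, names ours; in the actual model E1 and E2 would be trainable tensors updated by backpropagation):

```python
import numpy as np

def softmax_rows(z):
    """Numerically stable row-wise softmax."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def adaptive_adjacency(e1, e2):
    """A_adp = softmax(ReLU(E1 @ E2^T)): a row-normalized learned adjacency matrix."""
    scores = np.maximum(e1 @ e2.T, 0.0)  # ReLU keeps only positive node affinities
    return softmax_rows(scores)          # each row sums to 1
```

Because softmax normalizes each row, A_adp can be used directly as a transition matrix in the graph convolution without a separate degree normalization.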
Further, a dilated causal convolution is used as the temporal convolution layer in S4 to capture the time-dependent features of the nodes.
Specifically, a dilated causal convolution network achieves an exponentially growing receptive field by increasing the number of layers. Compared with recurrent neural network (RNN) based methods, the dilated causal convolution network can process long sequences non-recursively, which facilitates parallel computation and alleviates the gradient explosion problem. The dilated causal convolution preserves the temporal causal order by zero-padding the inputs, so that predictions at the current time step involve only historical information;
gating mechanisms are a technique that can control information flow, which has been demonstrated to control information flow well in the time-convolved network layer, helping TCNs to handle long sequences and long-term dependencies better.
Given the input X ∈ R^{N×D×S}, the TCN is calculated as:

TCN = g(θ1 * X + a) ⊙ σ(θ2 * X + b)

wherein θ1, θ2, a and b are model parameters, ⊙ is the element-wise product, g(·) is the output activation function, and σ(·) is the sigmoid function, which determines the proportion of information passed to the next layer.
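A single-channel sketch of the gated dilated causal convolution (NumPy, names ours; tanh is assumed as the output activation g(·), which the text does not specify):

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """1-D dilated causal convolution: y[t] = sum_j w[j] * x[t - j*dilation],
    with left zero-padding so the output never sees future inputs."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

def gated_tcn(x, w_f, w_g, dilation=1):
    """Gated TCN layer: tanh filter branch modulated by a sigmoid gate branch."""
    f = np.tanh(dilated_causal_conv(x, w_f, dilation))               # g(theta1 * X)
    g = 1.0 / (1.0 + np.exp(-dilated_causal_conv(x, w_g, dilation))) # sigma(theta2 * X)
    return f * g                                                     # element-wise product
```

Stacking such layers with dilations 1, 2, 4, ... yields the exponentially growing receptive field described above.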
Further, a multi-head spatio-temporal attention mechanism is used in S5, in combination with the adaptive graph convolution, to capture the dynamic spatio-temporal correlation of traffic flow.
In particular, in the spatial dimension, traffic conditions at different geographic locations influence one another and change continuously over time, exhibiting strong dynamics. In the temporal dimension, the traffic condition at a given moment depends on historical traffic conditions, and this dependency varies nonlinearly over time.
Further, in the multi-head spatio-temporal attention mechanism, the spatial attention is computed as:

SAt = A_s · σ((XW1)W2(W3X)ᵀ + b_s)

The Chebyshev polynomial T_k(L̃) is multiplied element-wise with the spatial attention matrix SAt to obtain the Chebyshev polynomial with attention T_k(L̃) ⊙ SAt, wherein ⊙ is the Hadamard product;

the temporal attention is computed as:

TAt = A_t · σ((XV1)V2(V3X)ᵀ + b_t)

wherein X = (X1, X2, ..., XT) ∈ R^{N×F×T} is the input data of the model, N is the number of graph nodes, F is the number of channels of the data, T is the length of the time sequence, and A_s, A_t, b_s, b_t ∈ R^{N×N}, W1, V1 ∈ R^T, W3, V3 ∈ R^F, W2, V2 ∈ R^{F×T} are learnable parameters; σ is the sigmoid activation function. The spatio-temporal attention matrices SAt and TAt are dynamically calculated from the current input: SAt_{i,j} is the strength of the spatial correlation between node i and node j, and TAt_{i,j} is the strength of the temporal correlation between times i and j. TAt′_{i,j} denotes the attention matrix normalized with softmax, where exp is the exponential function and LeakyReLU is the activation function; Out_t is the extracted temporal feature, TCN is the temporal convolution network, and TAt^(h) is the attention matrix of the h-th head.
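A shape-level sketch of one spatial-attention head (NumPy, names ours; shapes follow the symbol definitions in the text, with X ∈ R^{N×F×T}):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x, w1, w2, w3, bs, a_s):
    """SAt = A_s * sigmoid((X W1) W2 (W3 X)^T + b_s).
    Shapes: x (N, F, T), w1 (T,), w2 (F, T), w3 (F,), bs and a_s (N, N)."""
    lhs = (x @ w1) @ w2                   # (N, F, T) @ (T,) -> (N, F); @ (F, T) -> (N, T)
    rhs = np.einsum('f,nft->nt', w3, x)   # contract the channel axis -> (N, T)
    return a_s * sigmoid(lhs @ rhs.T + bs)  # (N, N) node-to-node attention scores
```

With H heads, H independent parameter sets would produce H such matrices SAt^(h), each modulating its own Chebyshev convolution term.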
Further, in S6, the fusion is performed by two spatio-temporal feature fusion methods, point-wise addition and vector concatenation, specifically:

Let h_t^i and h_s^i be the temporal and spatial features of node v_i to be fused;

point-wise addition: h^i = h_t^i + h_s^i. This method directly adds the corresponding elements of the feature vectors, preserving the original spatio-temporal feature information as far as possible;

vector concatenation: h^i = h_t^i ⊕ h_s^i. This method concatenates the feature vectors along a specified dimension to fuse the spatio-temporal features; the dimensionality increases after concatenation, exposing richer feature information.
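The two fusion methods can be sketched as follows (NumPy, names ours): addition keeps the feature dimensionality unchanged, while concatenation doubles it along the chosen axis.

```python
import numpy as np

def fuse_add(h_t, h_s):
    """Point-wise addition: element-wise sum of temporal and spatial features."""
    return h_t + h_s

def fuse_concat(h_t, h_s, axis=-1):
    """Vector concatenation: join the two feature maps along the feature axis."""
    return np.concatenate([h_t, h_s], axis=axis)
```

The choice matters downstream: a layer after fuse_concat must accept twice as many input features as one after fuse_add.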
In a specific embodiment, the MSTA-GCN model is implemented with the PyTorch framework, and model construction is completed in the PyCharm development environment. Experiments were run on an Intel(R) Core(TM) i5-6300HQ CPU@2.30GHz and an NVIDIA TESLA T4 GPU with 15 GB of memory. During model training, the learning rate is set to 0.001, the batch size to 32, num_heads to 8, and the number of convolution kernels to 64. In addition, since the model predicts traffic flow 1 hour into the future, the prediction window T_p of the model has a size of 12.
The MSTA-GCN model was compared to the following six baseline models:
LSTM: a recurrent neural network for solving the long-term dependence problem, which selectively forgets or stores information by introducing a gating mechanism;
DCRNN: the diffusion convolution recurrent neural network uses a diffusion graph convolution network and a seq2seq to encode space information and time information respectively to capture the space-time correlation of traffic flow;
STGCN: the STGCN model consists of two ST-Conv modules and a fully connected layer; each ST-Conv module comprises two temporal gated convolution layers and a graph convolution layer for capturing spatio-temporal features;
ASTGCN(r): the attention-based spatio-temporal graph convolutional network integrates three components that model the periodicity of highway traffic data over three different periods for traffic flow prediction. To ensure the fairness of the comparison experiment, only the most recent period component is used;
STG2Seq: STG2Seq uses multiple gated graph convolution modules, a Seq2Seq architecture, and an attention mechanism to make multi-step predictions of traffic flow;
Graph WaveNet: Graph WaveNet models the spatio-temporal correlation of traffic flow by combining graph convolution with dilated causal convolution.
Table 1 comparison of the performance of different traffic flow prediction methods
Table 1 shows the 1-hour (12-step) prediction performance of the MSTA-GCN model and the different baseline models on the different datasets. As can be seen from Table 1, the MSTA-GCN model exhibits the best performance on both the PEMS04 and PEMS08 datasets.
The LSTM model considers only the time dependence of traffic flow and ignores its spatial dependence. The Graph WaveNet, ASTGCN(r), DCRNN, STGCN and MSTA-GCN models take both time dependence and spatial correlation into account and thus have better predictive performance, illustrating that the spatial correlation of traffic flow plays a vital role in traffic flow prediction.
DCRNN, STGCN, ASTGCN(r) and STG2Seq all model spatial correlation with graph convolution, but the graph convolution network considers only local spatial proximity when aggregating neighbor information, ignoring the hidden global spatial correlation in the data. Graph WaveNet, while accounting for hidden spatial correlation, does not account for the dynamic randomness of the spatio-temporal information in traffic flow data. The MSTA-GCN model herein uses the adaptive graph convolution network and a multi-head spatio-temporal attention mechanism to model the hidden spatial correlation and the dynamic randomness of the spatio-temporal information in the data, captures richer spatio-temporal features, and improves the accuracy of traffic flow prediction.
Referring to fig. 4 and fig. 5, the true traffic flow values in the test data and the model's predicted values are plotted. The figures show that the model's predictions are relatively stable and capture the overall trend of traffic flow. Traffic flow in the early morning and late evening is light and easy to predict, and the model's predictions in those periods are close to the true values. Daytime traffic flow is heavy and highly random, making it harder to predict, and the model cannot fully capture the traffic flow trend in those periods.
Ablation experiments were performed to further study the effects of the different modules in the model. Compared with the full MSTA-GCN model, five variants were designed, with the following differences:
remove_tat: removing a time attention module in the MSTA-GCN model;
remove_sat: removing a spatial attention module in the MSTA-GCN model;
remove_stat: removing a time and space attention module in the MSTA-GCN model;
remove_adp: removing an adaptive graph rolling module in the MSTA-GCN model;
remove_multi-heads: the multi-head mechanism in the MSTA-GCN model is removed, and only the space-time attention mechanism is saved.
Referring to fig. 6, the MSTA-GCN model and its five variants are compared by the MAE of each prediction step over the next hour on the PEMS04 and PEMS08 datasets; fig. 6a shows the ablation study on PEMS04 and fig. 6b the ablation study on PEMS08. MSTA-GCN consistently outperforms the variant models, demonstrating the effectiveness of the temporal and spatial attention mechanisms, the multi-head mechanism and the adaptive graph convolution network in modeling the complex spatiotemporal correlation of traffic flow. Referring to fig. 7, fig. 7a shows the element-wise addition fusion method and fig. 7b the vector concatenation fusion method, comparing the prediction results of the two spatiotemporal information fusion methods.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; identical and similar parts of the embodiments may be cross-referenced. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points may be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A traffic flow prediction method of multi-head attention combined with adaptive graph convolution, comprising the steps of:
S1, constructing an MSTA-GCN model;
S2, according to the model obtained in S1, modeling the spatial correlation of traffic flow with graph convolution, capturing the neighboring spatial information of each node on the graph and extracting the spatial features of the traffic flow;
S3, according to the model obtained in S1, using an adaptive graph convolution method that discovers hidden node connections in the graph structure from the data through training, thereby capturing the spatial relationships of traffic flow hidden in the data;
S4, capturing the time-dependent information of adjacent time slices of the traffic flow along the time dimension;
S5, introducing multiple attention heads to extract spatiotemporal information of different spatial dimensions at different historical moments, capturing the dynamic spatiotemporal correlation of traffic flow and obtaining high-dimensional spatiotemporal features;
S6, fusing the high-dimensional spatiotemporal features obtained in S5 by the two spatiotemporal feature fusion methods of element-wise addition and vector concatenation.
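The flow of steps S2 to S4 and S6 can be sketched end to end in NumPy. This is a minimal illustrative sketch under simplifying assumptions, not the claimed implementation: toy dimensions, random arrays in place of trained weights, a kernel-size-2 causal convolution as the temporal layer, and element-wise addition as the fusion method.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, T = 4, 2, 8                       # nodes, channels, time steps (toy sizes)
X = rng.standard_normal((N, F, T))      # stand-in for historical traffic data

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# S3: adaptive adjacency built from two randomly initialized node embeddings
E1 = rng.standard_normal((N, 3))
E2 = rng.standard_normal((N, 3))
A_adp = softmax(np.maximum(E1 @ E2.T, 0.0), axis=1)

# S2: one-hop graph convolution: aggregate neighbour information, then mix channels
W_g = rng.standard_normal((F, F))
H_s = np.einsum('ij,jft,fg->igt', A_adp, X, W_g)

# S4: causal temporal convolution (kernel size 2) along the time axis
k = rng.standard_normal(2)
X_pad = np.concatenate([np.zeros((N, F, 1)), X], axis=2)  # left pad: no future leak
H_t = k[0] * X_pad[:, :, :-1] + k[1] * X_pad[:, :, 1:]

# S6: fuse the spatial and temporal features by element-wise addition
H = H_s + H_t
```

The fused tensor H keeps the input shape (N, F, T); a prediction head would map it to the forecast horizon.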
2. The traffic flow prediction method of multi-head attention-combining adaptive graph convolution according to claim 1, wherein the MSTA-GCN model in S1 consists of a multi-head spatiotemporal attention mechanism layer, a graph convolution layer, a temporal convolution layer and a spatiotemporal fusion layer.
3. The traffic flow prediction method of multi-head attention-coupled adaptive graph convolution according to claim 1, wherein S2 is specifically:
for an input signal x and a parameterized spectral filter on the graph G, the spectral convolution is defined as:
g_θ ⋆_G x = U g_θ(Λ) U^T x
wherein U is the matrix composed of the eigenvectors of the graph Laplacian, g_θ(Λ) is the frequency response function of the convolution kernel, Λ is the diagonal matrix formed by the eigenvalues, θ is the parameter to be learned, and T denotes the matrix transpose.
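The spectral definition can be checked numerically. The sketch below (illustrative, not from the patent) builds the combinatorial Laplacian of a 4-node cycle graph, eigendecomposes it, and applies a filter with frequency response g_θ(λ) = λ², which must coincide with applying L twice in the vertex domain:

```python
import numpy as np

# adjacency of a small undirected 4-node cycle graph
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                            # combinatorial graph Laplacian

# eigendecomposition L = U Lambda U^T (U orthonormal since L is symmetric)
lam, U = np.linalg.eigh(L)

x = np.array([1.0, 2.0, 3.0, 4.0])   # a one-channel graph signal
g_theta = lam ** 2                   # example frequency response g_theta(Lambda)

# spectral convolution: U g_theta(Lambda) U^T x
y = U @ np.diag(g_theta) @ U.T @ x

# sanity check: with g_theta(lam) = lam^2 this equals L @ L @ x
assert np.allclose(y, L @ L @ x)
```

Practical models avoid the explicit eigendecomposition by approximating g_θ(Λ) with Chebyshev polynomials, as the later claims do.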
4. The traffic flow prediction method of multi-headed attention combined with adaptive graph convolution according to claim 1, wherein the adaptive graph convolution method in S3 uses an adaptive adjacency matrix instead of a conventional adjacency matrix for implementing an adaptive graph convolution network.
5. The traffic flow prediction method based on multi-head attention combined with adaptive graph convolution according to claim 4, wherein the adaptive adjacency matrix is generated by randomly initializing two learnable node-embedding matrices E_1, E_2 ∈ R^{N×F}, and is calculated as:
A_adp = softmax(ReLU(E_1 E_2^T))
wherein A_adp is the computed adaptive adjacency matrix, softmax is the normalization function, ReLU is the activation function, and E_2^T is the transpose of the learnable parameter E_2.
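A minimal NumPy sketch of this calculation (the node count, embedding width and random initialization are illustrative assumptions; in training E_1 and E_2 would be updated by gradient descent):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5, 4                                 # graph nodes, embedding width
E1 = rng.standard_normal((N, d))            # learnable source embedding
E2 = rng.standard_normal((N, d))            # learnable target embedding

def softmax_rows(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# A_adp = softmax(ReLU(E1 @ E2^T)): ReLU prunes weak links, softmax row-normalizes
A_adp = softmax_rows(np.maximum(E1 @ E2.T, 0.0))

assert A_adp.shape == (N, N)
assert np.allclose(A_adp.sum(axis=1), 1.0)  # each row is a probability distribution
```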
6. The traffic flow prediction method of multi-head attention combined with adaptive graph convolution of claim 1, wherein in S4 a dilated causal convolution is used as the temporal convolution layer to capture the time dependence of the nodes.
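A dilated causal convolution left-pads the series so that the output at time t depends only on the present and past, while the dilation factor spaces out the taps to widen the receptive field. A small illustrative sketch (the kernel, dilation and series are toy values, not the patent's configuration):

```python
import numpy as np

def dilated_causal_conv(x, kernel, dilation):
    """1-D causal convolution: output at t sees only x[t], x[t-d], x[t-2d], ..."""
    T, K = len(x), len(kernel)
    pad = dilation * (K - 1)
    xp = np.concatenate([np.zeros(pad), x])   # left-pad so no future value leaks in
    return np.array([sum(kernel[k] * xp[pad + t - k * dilation] for k in range(K))
                     for t in range(T)])

x = np.arange(1.0, 9.0)                       # toy per-node traffic series
y = dilated_causal_conv(x, kernel=np.array([0.5, 0.5]), dilation=2)

# at t=3 the output averages x[3] and x[1]: 0.5*4 + 0.5*2 = 3
assert y[3] == 3.0
```

Stacking such layers with doubling dilations (1, 2, 4, ...) grows the receptive field exponentially with depth, which is what lets a temporal convolution layer cover long histories.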
7. The traffic flow prediction method of multi-head attention combined with adaptive graph convolution of claim 1, wherein in S5 the multi-head spatiotemporal attention mechanism is used to adaptively capture the dynamic spatiotemporal correlation of traffic flow.
8. The traffic flow prediction method of multi-head attention combined with adaptive graph convolution of claim 7, wherein the spatial attention formula of the multi-head spatiotemporal attention mechanism is as follows:
SAt = A_s · σ((X W_1) W_2 (W_3 X)^T + b_s)
the Chebyshev polynomial T_k(L̃) is multiplied element-wise with the spatiotemporal attention matrix SAt to obtain the Chebyshev polynomial with attention T_k(L̃) ⊙ SAt, wherein ⊙ denotes the Hadamard product;
the temporal attention formula is as follows:
TAt = A_t · σ((X V_1) V_2 (V_3 X)^T + b_t)
wherein X = (X_1, X_2, ..., X_T) ∈ R^{N×F×T} is the input data of the model, N is the number of graph nodes, F is the number of channels of the data, and T is the length of the time sequence; A_s, A_t, b_s, b_t ∈ R^{N×N}, W_1, V_1 ∈ R^T, W_3, V_3 ∈ R^F and W_2, V_2 ∈ R^{F×T} are learnable parameters; σ is the sigmoid activation function; the spatiotemporal attention matrices SAt and TAt are computed dynamically from the current input, SAt_{i,j} being the strength of the spatial correlation between node i and node j and TAt_{i,j} the strength of the temporal correlation between time i and time j; TAt'_{i,j} is the normalized attention matrix, exp is the exponential function with base e, and LeakyReLU is the activation function; Out_t is the extracted temporal feature, TCN is the temporal convolution network, and TAt^(h) is the attention matrix of the h-th head.
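An attention computation of the form A · σ((X U_1) U_2 (U_3 X)^T + b) followed by row-wise normalization can be sketched for a single spatial head as follows. This is an illustrative sketch, not the patent's implementation: shapes are toy values, parameters are random in place of learned ones, the product with A is taken element-wise (a Hadamard-product reading), and the temporal head would be analogous with its own parameters V_1, V_2, V_3, b_t.

```python
import numpy as np

rng = np.random.default_rng(2)
N, F, T = 4, 2, 6                       # toy sizes: nodes, channels, time steps
X = rng.standard_normal((N, F, T))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax_rows(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def attention_head(X, U1, U2, U3, b, A):
    lhs = np.einsum('nft,t->nf', X, U1) @ U2   # (N, T)
    rhs = np.einsum('f,nft->nt', U3, X).T      # (T, N)
    scores = A * sigmoid(lhs @ rhs + b)        # (N, N) raw attention scores
    return softmax_rows(scores)                # row-normalize into attention weights

# one spatial head; a multi-head variant repeats this with separate parameter sets
SAt = attention_head(X,
                     rng.standard_normal(T),        # W_1 in R^T
                     rng.standard_normal((F, T)),   # W_2 in R^{F x T}
                     rng.standard_normal(F),        # W_3 in R^F
                     rng.standard_normal((N, N)),   # b_s in R^{N x N}
                     rng.standard_normal((N, N)))   # A_s in R^{N x N}
```

Each row of SAt sums to one, so row i gives the attention that node i pays to every other node under the current input.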
9. The traffic flow prediction method of multi-head attention combined with adaptive graph convolution according to claim 1, wherein the fusion in S6 by the two spatiotemporal feature fusion methods of element-wise addition and vector concatenation is specifically:
let h_i^T and h_i^S be the temporal and spatial features of node v_i to be fused;
element-wise addition: h_i = h_i^T + h_i^S; this method directly adds the corresponding elements of the feature vectors, so that the original spatiotemporal feature information is preserved as far as possible;
vector concatenation: h_i = [h_i^T ; h_i^S]; this method concatenates the feature vectors along a specified dimension to fuse the spatiotemporal features; the dimensionality increases after concatenation, exposing richer feature information.
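The two fusion operations reduce to a one-liner each; the sketch below contrasts them on a toy pair of feature vectors (values are illustrative):

```python
import numpy as np

h_t = np.array([1.0, 2.0, 3.0])     # temporal feature of node v_i (toy values)
h_s = np.array([0.5, 0.5, 0.5])     # spatial feature of node v_i (toy values)

# element-wise addition: same dimensionality, blends the two signals in place
h_add = h_t + h_s

# concatenation: doubles the dimension, keeps the two features separate
h_cat = np.concatenate([h_t, h_s])

assert h_add.shape == (3,)
assert h_cat.shape == (6,)
```

Addition keeps the downstream layer width unchanged, while concatenation hands the next layer both feature sets intact at the cost of a wider input.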
CN202310894728.2A 2023-07-20 2023-07-20 Traffic flow prediction method combining multi-head attention with self-adaptive graph convolution Pending CN116911460A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310894728.2A CN116911460A (en) 2023-07-20 2023-07-20 Traffic flow prediction method combining multi-head attention with self-adaptive graph convolution


Publications (1)

Publication Number Publication Date
CN116911460A true CN116911460A (en) 2023-10-20

Family

ID=88366393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310894728.2A Pending CN116911460A (en) 2023-07-20 2023-07-20 Traffic flow prediction method combining multi-head attention with self-adaptive graph convolution

Country Status (1)

Country Link
CN (1) CN116911460A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556379A (en) * 2024-01-12 2024-02-13 西南石油大学 Photovoltaic power generation power prediction method based on depth feature fusion under domain knowledge constraint
CN117556379B (en) * 2024-01-12 2024-04-09 西南石油大学 Photovoltaic power generation power prediction method based on depth feature fusion under domain knowledge constraint

Similar Documents

Publication Publication Date Title
CN114492992B (en) Transformer-based adaptive space-time diagram neural network traffic flow prediction method and system
Ruiz et al. Gated graph convolutional recurrent neural networks
Jin et al. Adaptive dual-view wavenet for urban spatial–temporal event prediction
Wang et al. A multitask learning-based network traffic prediction approach for SDN-enabled industrial Internet of Things
CN116911460A (en) Traffic flow prediction method combining multi-head attention with self-adaptive graph convolution
Jin et al. Deep multi-view graph-based network for citywide ride-hailing demand prediction
Jin et al. GSEN: An ensemble deep learning benchmark model for urban hotspots spatiotemporal prediction
CN110163052B (en) Video action recognition method and device and machine equipment
Tang et al. Spatio-temporal latent graph structure learning for traffic forecasting
Liu et al. Spatial-temporal interactive dynamic graph convolution network for traffic forecasting
Ozyegen et al. Experimental results on the impact of memory in neural networks for spectrum prediction in land mobile radio bands
CN116434569A (en) Traffic flow prediction method and system based on STNR model
Jin et al. Urban hotspot forecasting via automated spatio-temporal information fusion
CN115691129A (en) Traffic flow prediction method of depth residual error space-time diagram convolution network based on attention
Tang et al. Spatio-temporal meta contrastive learning
Chian et al. Learning through structure: towards deep neuromorphic knowledge graph embeddings
Prabowo et al. Traffic forecasting on new roads using spatial contrastive pre-training (SCPT)
Wang et al. TANGO: A temporal spatial dynamic graph model for event prediction
CN117392846A (en) Traffic flow prediction method for space-time self-adaptive graph learning fusion dynamic graph convolution
Prabowo et al. Traffic forecasting on new roads unseen in the training data using spatial contrastive pre-training
CN112396166A (en) Graph convolution neural network training method and device based on mixed granularity aggregator
CN115456314B (en) Atmospheric pollutant space-time distribution prediction system and method
US20240054335A1 (en) Ai-based pattern identification transformer
Zhao et al. MCAGCN: Multi‐component attention graph convolutional neural network for road travel time prediction
Yang et al. Multistep traffic speed prediction: A sequence-to-sequence spatio-temporal attention model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination