CN115409276A - Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model


Info

Publication number
CN115409276A
CN115409276A
Authority
CN
China
Prior art keywords
time, space, matrix, attention, layer
Prior art date
2022-09-14
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211116536.0A
Other languages
Chinese (zh)
Inventor
姜佳伟 (Jiang Jiawei)
韩程凯 (Han Chengkai)
王静远 (Wang Jingyuan)
吴俊杰 (Wu Junjie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2022-09-14
Publication date
2022-11-29
Application filed by Beihang University
Priority to CN202211116536.0A
Publication of CN115409276A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G08G1/0129 Traffic data processing for creating historical data or processing based on historical data


Abstract

The invention discloses a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model, comprising the following steps: converting historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through a data embedding layer; converting the high-dimensional spatio-temporal representation vectors, through a stack of spatio-temporal encoders, into spatio-temporal feature vectors encoded by spatio-temporal self-attention blocks, and combining the outputs of spatio-temporal encoder layers 1 through L via skip connections to obtain the final spatio-temporal feature vector; and inputting the final spatio-temporal feature vector to an output layer to obtain the predicted traffic data. The method captures short-range and long-range spatial correlations in the traffic network simultaneously, models the spatial correlation of traffic data dynamically, integrates temporal and spatial information, and realizes deep spatio-temporal prediction tasks across cities.

Description

Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model
Technical Field
The invention relates to the technical field of deep learning, and in particular to a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model.
Background
As modern cities develop into smart cities, the number of vehicles keeps growing and traffic congestion becomes increasingly severe, placing enormous pressure on urban traffic management. Intelligent transportation systems, an important component of modern smart cities, analyze and process traffic conditions to help avoid congestion.
Traffic prediction is one of the main functions of an intelligent transportation system; its purpose is to predict future traffic conditions from historical traffic data. Accurate traffic prediction helps vehicles plan routes, helps the relevant authorities dispatch vehicles, and effectively relieves congestion. For example, accurately predicting taxi demand in a city allows vehicles to be pre-allocated and dispatched so that passenger demand is better met while unnecessary waste of resources and waiting time is avoided.
Traffic prediction is difficult because traffic conditions are affected by complex spatial correlations, dynamic temporal correlations, and external factors such as weather. Spatial correlation arises because geographic entities interact: for example, the traffic flow on an upstream road strongly influences the conditions on downstream roads, and different functional areas may exhibit their own distinctive traffic patterns. From the temporal perspective, traffic conditions at a location show short-term trends and long-term periodic regularities. For example, on consecutive weekdays the morning rush-hour patterns at the same location tend to be similar, repeating every 24 hours, while weekday and weekend patterns may differ significantly. Finally, external factors such as extreme weather and traffic control measures clearly affect travel behavior and, in turn, road traffic conditions.
Currently, mainstream traffic prediction methods fall into two categories: traditional knowledge-driven methods and data-driven methods. Traditional methods include classical statistical methods and machine learning methods, but their limited ability to model nonlinear data means they generally perform poorly in practice. With continuing urbanization, traffic data have accumulated steadily, laying a data foundation for the field and offering researchers a new perspective on the problem: data-driven methods. In recent years, with the rapid development of deep learning, researchers have proposed a large number of deep learning methods for this challenging problem. In particular, graph neural networks (GNNs) can model non-Euclidean data and therefore better match the structure of a traffic network, so GNN-based methods have been studied extensively for traffic prediction. These data-driven methods perform well because they can model and extract complex features from traffic flow data, but they still face several limitations.
First, in most existing GNN-based studies, the spatial structure of the road network is represented by a static adjacency matrix, either predefined or self-learned. However, because traffic conditions change dynamically (rush hours, weekends, traffic accidents, congestion), modeling the dynamic spatial and temporal correlations of traffic data is a key challenge, and a static adjacency matrix limits the ability to learn urban traffic dynamics. Second, conventional methods are designed around the local road network and struggle to capture long-range spatial correlations: RNN-based models suffer from vanishing or exploding gradients on long sequences, and GNN-based models aggregate information only from local neighborhoods. In a real urban road network, not only are the flows of adjacent links (e.g., upstream and downstream) correlated, but non-adjacent links with the same function also exhibit similar traffic patterns, so short-range and long-range correlations must be considered together when predicting traffic flow. Finally, existing methods pay little attention to data transfer between different cities and are therefore difficult to apply to cities with little traffic data. Because cities differ in development level, a small city may be unable to collect enough data to support training a complex deep learning model. One approach to this problem is transfer learning: performing the deep spatio-temporal prediction task across cities so that knowledge learned from data-rich cities is transferred to data-poor cities.
Therefore, how to provide a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model that solves at least one of the above technical problems is an urgent problem for those skilled in the art.
Disclosure of Invention
In view of this, the invention provides a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model, which captures short-range and long-range spatial correlations in the traffic network simultaneously, models the spatial correlation of traffic data dynamically, integrates temporal and spatial information, and realizes deep spatio-temporal prediction tasks across cities.
In order to achieve the above purpose, the invention adopts the following technical solution:
a traffic prediction transfer learning method based on a space-time diagram self-attention model comprises the following steps:
s1: converting historical traffic data and an urban traffic network structure into high-dimensional space-time expression vectors through a data embedding layer;
s2: inputting the high-dimensional space-time expression vector to a first layer space-time encoder, and specifically comprising the following steps:
carrying out layer normalization on the high-dimensional space-time expression vector to obtain a space-time expression vector after the layer normalization;
respectively inputting the space-time expression vectors after layer normalization into a time perception space self-attention mechanism and a trend perception time self-attention mechanism to correspondingly obtain multi-head space characteristic vectors and multi-head time characteristic vectors;
splicing the multi-head space eigenvector and the multi-head time eigenvector, and adding the multi-head space eigenvector and the high-dimensional space-time expression vector when the layer is not normalized to obtain a space-time eigenvector;
carrying out layer normalization on the space-time characteristic vector to obtain the space-time characteristic vector after the layer normalization;
inputting the space-time feature vector after layer normalization into a fully-connected feedforward neural network, and adding the output and the space-time feature vector when the layer normalization is not performed to obtain the space-time feature vector after space-time self-attention block coding;
s3: inputting the space-time characteristic vector after the space-time self-attention block coding output by the first layer of space-time coder as a high-dimensional space-time expression vector to the second layer of space-time coder, repeating the operation of S2, and so on until the output of the L-th layer of space-time coder is obtained;
s4: connecting the outputs of the space-time encoders from the first layer to the L layer through jumping to obtain a final space-time feature vector;
s5: inputting the final space-time characteristic vector to an output layer to obtain a space-time prediction model;
s6: the method comprises the steps of simultaneously training a space-time prediction model on a source data set through an autoregressive task and an autorecoding task to obtain a space-time diagram self-attention model, initializing through pre-trained parameters when the space-time diagram self-attention model is applied to a target city data set, and then finely adjusting model parameters through the target city data set to realize transfer learning among different cities.
Preferably, converting the historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through the data embedding layer specifically comprises:
converting the historical traffic data into traffic data embedding vectors through a traffic data embedding module;
converting the day-of-week and time-of-day information of the historical traffic data into a day-of-week embedding vector and a time-of-day embedding vector through a periodic information embedding module;
encoding the position information of the historical traffic data sequence into a sequence position encoding vector through a sequence position encoding module;
performing eigenvalue decomposition of the Laplacian matrix of the adjacency matrix of the urban traffic network structure through a node position embedding module to obtain the graph Laplacian eigenvectors, which are passed through a fully connected layer to obtain the node position embedding vector;
adding the traffic data embedding vector, the day-of-week embedding vector, the time-of-day embedding vector, the sequence position encoding vector, and the node position embedding vector to obtain the high-dimensional spatio-temporal representation vector.
Preferably, feeding the layer-normalized spatio-temporal representation vector into the time-aware spatial self-attention mechanism to obtain the multi-head spatial feature vector specifically comprises:
the time-aware spatial self-attention mechanism comprises h_s spatial attention heads;
in each spatial attention head, converting the layer-normalized spatio-temporal representation vector into a spatial key matrix K_S and a spatial value matrix V_S by causal convolution;
converting the layer-normalized spatio-temporal representation vector into a spatial query matrix Q_S by a fully connected operation;
multiplying the spatial query matrix Q_S by the spatial key matrix K_S and scaling the result to obtain the raw spatial attention matrix A_S;
computing the Hadamard product of the raw spatial attention matrix A_S and the spatial mask matrix M_S and applying the softmax operation to obtain the final spatial attention matrix;
multiplying the final spatial attention matrix by the spatial value matrix V_S to obtain the spatial feature vector SSA;
concatenating the spatial feature vectors SSA output by all spatial attention heads to obtain the multi-head spatial feature vector.
Preferably, feeding the layer-normalized spatio-temporal representation vector into the trend-aware temporal self-attention mechanism to obtain the multi-head temporal feature vector specifically comprises:
the trend-aware temporal self-attention mechanism comprises h_t temporal attention heads;
in each temporal attention head, converting the layer-normalized spatio-temporal representation vector into a temporal query matrix Q_T and a temporal key matrix K_T by causal convolution;
converting the layer-normalized spatio-temporal representation vector into a temporal value matrix V_T by a fully connected operation;
multiplying the temporal query matrix Q_T by the temporal key matrix K_T and scaling the result to obtain the raw temporal attention matrix A_T;
applying the softmax operation to the raw temporal attention matrix A_T to obtain the final temporal attention matrix;
multiplying the final temporal attention matrix by the temporal value matrix V_T to obtain the temporal feature vector TSA;
concatenating the temporal feature vectors TSA output by all temporal attention heads to obtain the multi-head temporal feature vector.
Preferably, the spatial key matrix K_S and the spatial value matrix V_S are given by:
K_S = Φ_SK * X, V_S = Φ_SV * X
where X is the layer-normalized spatio-temporal representation vector, * denotes the causal convolution operation, and Φ_SK and Φ_SV are convolution kernel parameters;
the spatial query matrix Q_S is given by:
Q_S = X W_SQ
where W_SQ is a learnable parameter matrix.
Preferably, the raw spatial attention matrix A_S is given by:
A_S = Q_S K_S^T / sqrt(d_k)
where d_k is the feature dimension of the spatial query matrix Q_S and the spatial key matrix K_S, and T denotes the matrix transpose.
Preferably, the multi-head spatial feature vector SSA is computed as:
SSA = softmax(A_S ⊙ M_S) V_S
where ⊙ denotes the Hadamard product.
Preferably, the temporal query matrix Q_T and the temporal key matrix K_T are given by:
Q_T = Φ_TQ * X, K_T = Φ_TK * X
where X is the layer-normalized spatio-temporal representation vector, * denotes the causal convolution operation, and Φ_TQ and Φ_TK are convolution kernel parameters;
the temporal value matrix V_T is given by:
V_T = X W_TV
where W_TV is a learnable parameter matrix.
Preferably, the raw temporal attention matrix A_T is given by:
A_T = Q_T K_T^T / sqrt(d_k)
where d_k is the feature dimension of the temporal query matrix Q_T and the temporal key matrix K_T;
the temporal feature vector TSA is given by:
TSA = softmax(A_T) V_T
preferably, the core idea of the autoregressive task is to use data at past time to generate data at future time, so as to model the context dependence of the traffic data;
the core idea of the self-encoding task is to use the perturbed data to restore the original data, thereby generating a more efficient data representation of the input data.
The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model disclosed by the invention has the following advantages:
(1) A time-aware spatial graph self-attention model is designed, which introduces historical temporal information to model the spatial correlation of traffic data dynamically.
(2) A dedicated mask mechanism combined with a dynamic time warping algorithm is designed to model both long-range and short-range spatial relationships in the traffic network.
(3) Several spatio-temporal embedding and encoding schemes are designed, enabling more accurate traffic prediction and facilitating city management and planning.
(4) A pre-training method for model transfer between different cities is designed: the model is pre-trained and then migrated to other cities' data sets, alleviating the problem of insufficient data in those cities.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic block diagram of the traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model, comprising the following steps:
S1: converting historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through a data embedding layer;
S2: inputting the high-dimensional spatio-temporal representation vector to the first-layer spatio-temporal encoder, which specifically comprises:
applying layer normalization to the high-dimensional spatio-temporal representation vector to obtain a layer-normalized spatio-temporal representation vector;
feeding the layer-normalized spatio-temporal representation vector into a time-aware spatial self-attention mechanism and a trend-aware temporal self-attention mechanism to obtain, respectively, a multi-head spatial feature vector and a multi-head temporal feature vector;
concatenating the multi-head spatial feature vector and the multi-head temporal feature vector and adding the result to the high-dimensional spatio-temporal representation vector before layer normalization to obtain a spatio-temporal feature vector;
applying layer normalization to the spatio-temporal feature vector to obtain a layer-normalized spatio-temporal feature vector;
feeding the layer-normalized spatio-temporal feature vector into a fully connected feed-forward neural network and adding the output to the spatio-temporal feature vector before layer normalization to obtain the spatio-temporal feature vector encoded by the spatio-temporal self-attention block;
S3: feeding the encoded spatio-temporal feature vector output by the first-layer spatio-temporal encoder into the second-layer spatio-temporal encoder as its high-dimensional spatio-temporal representation vector, repeating the operations of S2, and so on until the output of the L-th-layer spatio-temporal encoder is obtained;
S4: combining the outputs of spatio-temporal encoder layers 1 through L via skip connections to obtain the final spatio-temporal feature vector;
S5: feeding the final spatio-temporal feature vector into the output layer to obtain the spatio-temporal prediction model;
S6: pre-training the spatio-temporal prediction model on a source data set with an autoregressive task and a self-encoding task simultaneously to obtain the spatio-temporal graph self-attention model; when the model is applied to a target city's data set, initializing it with the pre-trained parameters and then fine-tuning the model parameters on the target city's data set, thereby realizing transfer learning between different cities.
Specifically, in S1, the data embedding layer should preserve as much of the spatial structure information and time-series information in the original data as possible while converting the raw input into high-dimensional spatio-temporal representation vectors. To this end, the data embedding layer contains the following four modules:
(1) Traffic data embedding module: similar to the original Transformer model, the input of this module is the historical traffic data, which is projected into traffic data embedding vectors through a fully connected layer.
(2) Periodic information embedding module: traffic data are generated by daily human activity and therefore usually show clear periodic regularities. The input of this module is the day-of-week and time-of-day information of the historical traffic data; this information is retained through two learnable periodic embedding tables, and the module outputs a day-of-week embedding vector and a time-of-day embedding vector.
(3) Sequence position encoding module: because the self-attention mechanism used in the spatio-temporal encoder cannot by itself preserve the order of a sequence, the input of this module is the position information of the traffic data sequence, which is encoded by sine and cosine functions of different frequencies into a sequence position encoding vector.
(4) Node position embedding module: this module preserves the position of each node in the graph. Graph Laplacian eigenvectors are a spectral technique for embedding a graph into Euclidean space; the eigenvectors form a local coordinate system that preserves the global graph structure and describes the distances between nodes well. The module therefore performs eigenvalue decomposition of the Laplacian matrix of the adjacency matrix of the urban traffic network structure to obtain the graph Laplacian eigenvectors, which are passed through a fully connected layer to obtain the node position embedding vector.
Finally, the outputs of these modules are added together to obtain the high-dimensional spatio-temporal representation vector X of the original traffic data. A minimal sketch of this embedding layer is given below.
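By way of illustration only, the following PyTorch sketch shows one possible implementation of the four-module embedding layer; all tensor layouts, module names, and hyper-parameters (d_model, steps_per_day, lap_dim, an even d_model, etc.) are assumptions for the example and are not specified by the patent.

```python
import torch
import torch.nn as nn

class DataEmbedding(nn.Module):
    """Sum of the four embedding modules described above (illustrative)."""
    def __init__(self, in_dim, d_model, steps_per_day=288, lap_dim=8):
        super().__init__()
        self.value_proj = nn.Linear(in_dim, d_model)         # (1) traffic data embedding
        self.dow_emb = nn.Embedding(7, d_model)              # (2) day-of-week table
        self.tod_emb = nn.Embedding(steps_per_day, d_model)  # (2) time-of-day table
        self.lap_proj = nn.Linear(lap_dim, d_model)          # (4) node position embedding

    @staticmethod
    def sinusoidal(seq_len, d_model):
        # (3) classic sine/cosine sequence-position encoding (assumes even d_model)
        pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe  # (T, d_model)

    def forward(self, x, dow, tod, lap_vec):
        # x: (B, T, N, in_dim) traffic data; dow, tod: (B, T) integer indices;
        # lap_vec: (N, lap_dim) graph Laplacian eigenvectors of the road network
        B, T, N, _ = x.shape
        out = self.value_proj(x)
        out = out + self.dow_emb(dow)[:, :, None, :]    # broadcast over nodes
        out = out + self.tod_emb(tod)[:, :, None, :]
        out = out + self.sinusoidal(T, out.size(-1)).to(x.device)[None, :, None, :]
        out = out + self.lap_proj(lap_vec)[None, None, :, :]  # broadcast over batch/time
        return out  # high-dimensional spatio-temporal representation X
```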
Specifically, modeling the spatial and temporal correlations of traffic data is the key technical difficulty of the traffic prediction task. The invention proposes a new spatio-temporal encoder to better learn spatio-temporal representations of traffic data. The spatio-temporal encoder in S2 contains two sublayers, a spatio-temporal self-attention block and a fully connected feed-forward neural network, with layer normalization and residual connections applied around each sublayer. Unlike the conventional multi-head self-attention mechanism, however, the model decomposes the multi-head dot-product attention operation in the spatio-temporal encoder: some heads perform the time-aware spatial self-attention mechanism (spatial attention heads) and the others perform the trend-aware temporal self-attention mechanism (temporal attention heads). The outputs of these heads are concatenated and projected again to obtain the final output of the spatio-temporal self-attention block, which allows the model to integrate spatial and temporal information simultaneously.
Time-aware spatial self-attention mechanism: if the conventional self-attention operation were applied along the spatial dimension, each node would attend only to the information of other nodes within the same time step, ignoring the propagation delay of traffic conditions. For example, when a traffic accident occurs in one area, it takes some time to affect the traffic conditions of neighboring areas. To solve this problem, the invention replaces the conventional fully connected operation with a causal convolution, introducing the influence of temporal information so that attention between nodes is computed dynamically, i.e., each node attends to other nodes differently at different times. In addition, the invention introduces a spatial mask matrix to highlight spatial correlations from both short-range and long-range perspectives. From the short-range perspective, the matrix filters out attention between pairs of nodes whose distance exceeds a threshold. From the long-range perspective, the invention first computes a similarity matrix over the historical time series of the nodes with a dynamic time warping (DTW) algorithm and, for each node, selects and retains the nodes whose traffic patterns are most similar to it. Thus, for each node, the spatial mask matrix retains not only its short-range neighbors but also the nodes that are far away yet have similar traffic patterns, as illustrated by the sketch below.
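As a concrete illustration, the sketch below builds such a spatial mask matrix M_S from a distance threshold plus a DTW similarity ranking; the threshold value, the number of retained similar nodes, and the plain O(T^2) DTW routine are assumptions for the example rather than the patent's prescribed choices.

```python
import numpy as np

def dtw_distance(a, b):
    # plain dynamic time warping between two 1-D series (illustrative, O(n*m))
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def build_spatial_mask(dist, history, dist_threshold, k_similar):
    # dist: (N, N) pairwise road-network distances
    # history: (N, T) historical traffic series per node, used for DTW similarity
    N = dist.shape[0]
    mask = (dist <= dist_threshold).astype(np.float32)  # short-range neighbours
    dtw = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            dtw[i, j] = dtw[j, i] = dtw_distance(history[i], history[j])
    for i in range(N):  # keep the k nodes with the most similar traffic patterns
        for j in np.argsort(dtw[i])[:k_similar]:
            mask[i, j] = 1.0
    return mask  # M_S: 1 keeps an attention entry, 0 filters it out
```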
The time-aware spatial self-attention mechanism is implemented as follows. The module comprises h_s spatial attention heads. In each spatial attention head, the model first uses causal convolutions to convert the layer-normalized spatio-temporal representation vector of the input into the spatial key matrix K_S and the spatial value matrix V_S, and uses a fully connected operation to convert the same input into the spatial query matrix Q_S, computed as:

Q_S = X W_SQ, K_S = Φ_SK * X, V_S = Φ_SV * X

where X is the layer-normalized spatio-temporal representation vector, W_SQ is a learnable parameter matrix, * denotes the causal convolution operation, and Φ_SK and Φ_SV are convolution kernel parameters.

Multiplying the spatial query matrix Q_S by the spatial key matrix K_S and scaling the result yields the raw spatial attention matrix A_S:

A_S = Q_S K_S^T / sqrt(d_k)

where d_k is the feature dimension of the spatial query matrix Q_S and the spatial key matrix K_S.

The Hadamard product of the raw spatial attention matrix A_S and the spatial mask matrix M_S is computed, and the softmax operation is applied to the result to obtain the final spatial attention matrix.

Multiplying the final spatial attention matrix by the spatial value matrix V_S yields the spatial feature vector SSA:

SSA = softmax(A_S ⊙ M_S) V_S

where ⊙ denotes the Hadamard product.

Finally, the outputs SSA of all spatial attention heads are concatenated to obtain the multi-head spatial feature vector.
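The single-head sketch below illustrates these equations under an assumed (batch, time, nodes, features) tensor layout; the literal softmax(A_S ⊙ M_S) masking follows the formulas above, while the kernel size and everything else are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeAwareSpatialAttention(nn.Module):
    """One spatial attention head: causal convolutions for K_S/V_S, a fully
    connected projection for Q_S, masked scaled dot-product over nodes."""
    def __init__(self, d_model, d_k, kernel=3):
        super().__init__()
        self.q = nn.Linear(d_model, d_k)               # Q_S = X W_SQ
        self.k = nn.Conv2d(d_model, d_k, (kernel, 1))  # K_S = Phi_SK * X
        self.v = nn.Conv2d(d_model, d_k, (kernel, 1))  # V_S = Phi_SV * X
        self.kernel, self.d_k = kernel, d_k

    def causal(self, conv, x):
        # pad only the "past" side of the time axis so the convolution is causal
        h = F.pad(x.permute(0, 3, 1, 2), (0, 0, self.kernel - 1, 0))  # (B, d, T+k-1, N)
        return conv(h).permute(0, 2, 3, 1)                            # (B, T, N, d_k)

    def forward(self, x, mask):
        # x: (B, T, N, d_model); mask M_S: (N, N), 1 keeps a node pair, 0 filters it
        Q, K, V = self.q(x), self.causal(self.k, x), self.causal(self.v, x)
        A = Q @ K.transpose(-2, -1) / self.d_k ** 0.5  # A_S: (B, T, N, N)
        # literal form of the patent's softmax(A_S ⊙ M_S); a common variant
        # instead fills filtered entries with -inf before the softmax
        return torch.softmax(A * mask, dim=-1) @ V     # SSA: (B, T, N, d_k)
```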
Trend-aware temporal self-attention mechanism: the model uses a trend-aware temporal self-attention mechanism to mine the temporal patterns of traffic data. A naive point-wise self-attention operation cannot take the local context of the traffic data into account and may therefore misjudge the trend of traffic changes. Hence, the invention uses causal convolution instead of the conventional fully connected layer to introduce the trend over the history of the time series.
The trend-aware temporal self-attention mechanism proceeds as follows. The module comprises h_t temporal attention heads. In each temporal attention head, the model first uses causal convolutions to convert the input spatio-temporal representation vector into the temporal query matrix Q_T and the temporal key matrix K_T, and uses a fully connected operation to convert the same input into the temporal value matrix V_T:

Q_T = Φ_TQ * X, K_T = Φ_TK * X, V_T = X W_TV

where X is the input (layer-normalized) spatio-temporal representation vector, * denotes the causal convolution operation, Φ_TQ and Φ_TK are convolution kernel parameters, and W_TV is a learnable parameter matrix.

Multiplying the temporal query matrix Q_T by the temporal key matrix K_T and scaling the result yields the raw temporal attention matrix A_T:

A_T = Q_T K_T^T / sqrt(d_k)

where d_k is the feature dimension of the temporal query matrix Q_T and the temporal key matrix K_T. The softmax operation is then applied to A_T to obtain the final temporal attention matrix.

Multiplying the final temporal attention matrix by the temporal value matrix V_T yields the temporal feature vector TSA:

TSA = softmax(A_T) V_T

Finally, the outputs TSA of all temporal attention heads are concatenated to obtain the multi-head temporal feature vector.
In S5, the output layer uses two fully connected layers: one realizes multi-step prediction over the time dimension, and the other converts the feature dimension into the required output dimension. A sketch of the full encoder stack and output layer follows.
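The sketch below assembles the pieces: L pre-norm encoder layers (S2-S3), skip connections summing all layer outputs (S4), and the two fully connected output layers (S5). The spatial and temporal heads are assumed to output d_model/2 features each so that their concatenation matches d_model; t_in, horizon, and out_dim are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatioTemporalEncoderLayer(nn.Module):
    """One encoder layer: pre-norm, parallel spatial/temporal attention whose
    outputs are concatenated, then a pre-norm feed-forward sublayer; both
    sublayers are wrapped in residual connections (step S2)."""
    def __init__(self, d_model, spatial, temporal):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.spatial, self.temporal = spatial, temporal  # each maps d_model -> d_model // 2
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, mask):
        h = self.norm1(x)
        h = torch.cat([self.spatial(h, mask), self.temporal(h)], dim=-1)
        x = x + h                               # residual around the attention block
        return x + self.ffn(self.norm2(x))      # residual around the feed-forward net

class SpatioTemporalPredictor(nn.Module):
    """L stacked encoder layers, skip-connected outputs (S4), and the two
    fully connected output layers (S5)."""
    def __init__(self, layers, d_model, t_in, horizon, out_dim):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.time_fc = nn.Linear(t_in, horizon)      # multi-step prediction over time
        self.feat_fc = nn.Linear(d_model, out_dim)   # feature dim -> output dim

    def forward(self, x, mask):
        # x: (B, T, N, d_model) from the data embedding layer
        skip = torch.zeros_like(x)
        for layer in self.layers:                    # encoder layers 1 .. L
            x = layer(x, mask)
            skip = skip + x                          # skip connections over all layers
        h = self.time_fc(skip.permute(0, 3, 2, 1))   # (B, d, N, T) -> (B, d, N, horizon)
        h = h.permute(0, 3, 2, 1)                    # (B, horizon, N, d)
        return self.feat_fc(h)                       # (B, horizon, N, out_dim)
```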
Further, pre-training and transfer learning in S6: for cities with insufficient data, pre-training the model and applying transfer learning is an effective solution. Although data distributions and road network topologies may differ greatly between cities, traffic patterns show significant similarities across cities. Migrating a model pre-trained on a large data set to a small data set improves its prediction performance there, transferring traffic prediction knowledge learned from a data-rich source city to a data-sparse target city. Because the proposed model is entirely based on the self-attention mechanism and contains no graph convolution operation, the pre-trained traffic Transformer model can be migrated directly and conveniently to other data sets. To learn transferable traffic prediction knowledge from the source data set and further improve performance on the target data set, the invention designs two pre-training tasks: (a) the autoregressive task, whose core idea is to generate data at future times from data at past times, thereby modeling the contextual dependencies of traffic data; and (b) the self-encoding task, whose core idea is to restore the original data from noise-perturbed data, thereby producing a more effective representation of the input data. A concrete perturbation method is to randomly select 15% of the data and set it to 0, letting the model recover the zeroed values from the unperturbed traffic data. To avoid the weaknesses of either pre-training task alone, the self-encoding and autoregressive tasks are fused in the pre-training stage, i.e., both tasks are performed simultaneously to pre-train the model on the source data set. When the model is applied to the data set of a target city, it is initialized with the pre-trained parameters and the model parameters are then fine-tuned on the target city's data set, realizing transfer learning between different cities. A sketch of one joint pre-training step follows.
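One possible joint pre-training step is sketched below; the MAE criterion, the equal loss weighting, and the checkpoint file name are assumptions, while the 15% zeroing rate follows the description above.

```python
import torch
import torch.nn.functional as F

def pretrain_step(model, x_past, y_future, mask):
    # (a) autoregressive task: generate future-time data from past-time data
    ar_loss = F.l1_loss(model(x_past, mask), y_future)

    # (b) self-encoding task: zero a random 15% of input entries and restore them
    # (assumes the model is configured so its output shape matches x_past here)
    corrupt = torch.rand_like(x_past) < 0.15
    recon = model(x_past.masked_fill(corrupt, 0.0), mask)
    ae_loss = F.l1_loss(recon[corrupt], x_past[corrupt])

    return ar_loss + ae_loss  # both tasks are trained simultaneously

# Transfer to a data-sparse target city: initialize from the pre-trained
# parameters, then fine-tune on the target city's data set (file name assumed):
# target_model.load_state_dict(torch.load("pretrained_source_city.pt"))
```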
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is brief; for the relevant points, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A traffic prediction transfer learning method based on a spatio-temporal graph self-attention model, characterized by comprising the following steps:
S1: converting historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through a data embedding layer;
S2: inputting the high-dimensional spatio-temporal representation vector to the first-layer spatio-temporal encoder, which specifically comprises:
applying layer normalization to the high-dimensional spatio-temporal representation vector to obtain a layer-normalized spatio-temporal representation vector;
feeding the layer-normalized spatio-temporal representation vector into a time-aware spatial self-attention mechanism and a trend-aware temporal self-attention mechanism to obtain, respectively, a multi-head spatial feature vector and a multi-head temporal feature vector;
concatenating the multi-head spatial feature vector and the multi-head temporal feature vector and adding the result to the high-dimensional spatio-temporal representation vector before layer normalization to obtain a spatio-temporal feature vector;
applying layer normalization to the spatio-temporal feature vector to obtain a layer-normalized spatio-temporal feature vector;
feeding the layer-normalized spatio-temporal feature vector into a fully connected feed-forward neural network and adding the output to the spatio-temporal feature vector before layer normalization to obtain the spatio-temporal feature vector encoded by the spatio-temporal self-attention block;
S3: feeding the encoded spatio-temporal feature vector output by the first-layer spatio-temporal encoder into the second-layer spatio-temporal encoder as its high-dimensional spatio-temporal representation vector, repeating the operations of S2, and so on until the output of the L-th-layer spatio-temporal encoder is obtained;
S4: combining the outputs of spatio-temporal encoder layers 1 through L via skip connections to obtain the final spatio-temporal feature vector;
S5: feeding the final spatio-temporal feature vector into the output layer to obtain the spatio-temporal prediction model;
S6: pre-training the spatio-temporal prediction model on a source data set with an autoregressive task and a self-encoding task simultaneously to obtain the spatio-temporal graph self-attention model; when the model is applied to a target city's data set, initializing it with the pre-trained parameters and then fine-tuning the model parameters on the target city's data set, thereby realizing transfer learning between different cities.
2. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 1, characterized in that converting the historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through the data embedding layer specifically comprises:
converting the historical traffic data into traffic data embedding vectors through a traffic data embedding module;
converting the day-of-week and time-of-day information of the historical traffic data into a day-of-week embedding vector and a time-of-day embedding vector through a periodic information embedding module;
encoding the position information of the historical traffic data sequence into a sequence position encoding vector through a sequence position encoding module;
performing eigenvalue decomposition of the Laplacian matrix of the adjacency matrix of the urban traffic network structure through a node position embedding module to obtain the graph Laplacian eigenvectors, which are passed through a fully connected layer to obtain the node position embedding vector;
adding the traffic data embedding vector, the day-of-week embedding vector, the time-of-day embedding vector, the sequence position encoding vector, and the node position embedding vector to obtain the high-dimensional spatio-temporal representation vector.
3. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 1, characterized in that feeding the layer-normalized spatio-temporal representation vector into the time-aware spatial self-attention mechanism to obtain the multi-head spatial feature vector specifically comprises:
the time-aware spatial self-attention mechanism comprises h_s spatial attention heads;
in each spatial attention head, converting the layer-normalized spatio-temporal representation vector into a spatial key matrix K_S and a spatial value matrix V_S by causal convolution;
converting the layer-normalized spatio-temporal representation vector into a spatial query matrix Q_S by a fully connected operation;
multiplying the spatial query matrix Q_S by the spatial key matrix K_S and scaling the result to obtain the raw spatial attention matrix A_S;
computing the Hadamard product of the raw spatial attention matrix A_S and the spatial mask matrix M_S and applying the softmax operation to obtain the final spatial attention matrix;
multiplying the final spatial attention matrix by the spatial value matrix V_S to obtain the spatial feature vector SSA;
concatenating the spatial feature vectors SSA output by all spatial attention heads to obtain the multi-head spatial feature vector.
4. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 1, characterized in that feeding the layer-normalized spatio-temporal representation vector into the trend-aware temporal self-attention mechanism to obtain the multi-head temporal feature vector specifically comprises:
the trend-aware temporal self-attention mechanism comprises h_t temporal attention heads;
in each temporal attention head, converting the layer-normalized spatio-temporal representation vector into a temporal query matrix Q_T and a temporal key matrix K_T by causal convolution;
converting the layer-normalized spatio-temporal representation vector into a temporal value matrix V_T by a fully connected operation;
multiplying the temporal query matrix Q_T by the temporal key matrix K_T and scaling the result to obtain the raw temporal attention matrix A_T;
applying the softmax operation to the raw temporal attention matrix A_T to obtain the final temporal attention matrix;
multiplying the final temporal attention matrix by the temporal value matrix V_T to obtain the temporal feature vector TSA;
concatenating the temporal feature vectors TSA output by all temporal attention heads to obtain the multi-head temporal feature vector.
5. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 3, characterized in that the spatial key matrix K_S and the spatial value matrix V_S are given by:
K_S = Φ_SK * X, V_S = Φ_SV * X
where X is the layer-normalized spatio-temporal representation vector, * denotes the causal convolution operation, and Φ_SK and Φ_SV are convolution kernel parameters;
the spatial query matrix Q_S is given by:
Q_S = X W_SQ
where W_SQ is a learnable parameter matrix.
6. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 5, characterized in that the raw spatial attention matrix A_S is given by:
A_S = Q_S K_S^T / sqrt(d_k)
where d_k is the feature dimension of the spatial query matrix Q_S and the spatial key matrix K_S, and T denotes the matrix transpose.
7. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 6, characterized in that the multi-head spatial feature vector SSA is computed as:
SSA = softmax(A_S ⊙ M_S) V_S
where ⊙ denotes the Hadamard product.
8. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 4, characterized in that the temporal query matrix Q_T and the temporal key matrix K_T are given by:
Q_T = Φ_TQ * X, K_T = Φ_TK * X
where X is the layer-normalized spatio-temporal representation vector, * denotes the causal convolution operation, and Φ_TQ and Φ_TK are convolution kernel parameters;
the temporal value matrix V_T is given by:
V_T = X W_TV
where W_TV is a learnable parameter matrix.
9. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 8, characterized in that the raw temporal attention matrix A_T is given by:
A_T = Q_T K_T^T / sqrt(d_k)
where d_k is the feature dimension of the temporal query matrix Q_T and the temporal key matrix K_T;
the temporal feature vector TSA is given by:
TSA = softmax(A_T) V_T
10. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 1, characterized in that
the core idea of the autoregressive task is to generate data at future times from data at past times, thereby modeling the contextual dependencies of the traffic data;
the core idea of the self-encoding task is to restore the original data from perturbed data, thereby producing a more effective representation of the input data.
CN202211116536.0A 2022-09-14 2022-09-14 Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model Pending CN115409276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211116536.0A CN115409276A (en) 2022-09-14 2022-09-14 Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211116536.0A CN115409276A (en) 2022-09-14 2022-09-14 Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model

Publications (1)

Publication Number Publication Date
CN115409276A (en) 2022-11-29

Family

ID=84165519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211116536.0A Pending CN115409276A (en) 2022-09-14 2022-09-14 Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model

Country Status (1)

Country Link
CN (1) CN115409276A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383391A (en) * 2023-06-06 2023-07-04 深圳须弥云图空间科技有限公司 Text classification method and device
CN116383391B (en) * 2023-06-06 2023-08-11 深圳须弥云图空间科技有限公司 Text classification method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination