CN115409276A - Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model


Info

Publication number
CN115409276A
CN115409276A
Authority
CN
China
Prior art keywords
time, space, matrix, attention, layer
Prior art date
2022-09-14
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211116536.0A
Other languages
Chinese (zh)
Inventor
姜佳伟 (Jiang Jiawei)
韩程凯 (Han Chengkai)
王静远 (Wang Jingyuan)
吴俊杰 (Wu Junjie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2022-09-14
Publication date
2022-11-29
Application filed by Beihang University
Priority to CN202211116536.0A
Publication of CN115409276A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G08G1/0129 Traffic data processing for creating historical data or processing based on historical data


Abstract

The invention discloses a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model, comprising the following steps: converting historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through a data embedding layer; converting the high-dimensional spatio-temporal representation vectors, through a stack of spatio-temporal encoders, into spatio-temporal feature vectors encoded by spatio-temporal self-attention blocks, and combining the outputs of spatio-temporal encoder layers 1 through L via skip connections to obtain the final spatio-temporal feature vector; and inputting the final spatio-temporal feature vector to an output layer to obtain the predicted traffic data. The method captures short-range and long-range spatial correlations in the traffic network simultaneously, models the spatial correlation of traffic data dynamically, integrates temporal and spatial information, and realizes deep spatio-temporal prediction tasks across cities.

Description

Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model
Technical Field
The invention relates to the technical field of deep learning, and in particular to a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model.
Background
As modern cities develop into smart cities, the number of vehicles keeps growing and traffic congestion becomes increasingly severe, placing enormous pressure on urban traffic management. Intelligent transportation systems, an important component of modern smart cities, analyze and process traffic conditions to help avoid congestion.
Traffic prediction is one of the main functions of an intelligent transportation system; its purpose is to predict future traffic conditions from historical traffic data. Accurate traffic prediction helps vehicles plan routes, helps the relevant authorities dispatch vehicles, and effectively relieves congestion. For example, accurately predicting taxi demand in a city allows vehicles to be pre-allocated and dispatched so that passenger demand is better met while unnecessary waste of resources and waiting time is avoided.
Traffic prediction is difficult because traffic conditions are affected by complex spatial correlations, dynamic temporal correlations, and external factors such as weather. Spatial correlation arises because geographic entities interact: for example, the traffic flow on an upstream road strongly influences the conditions on downstream roads, and different functional areas may exhibit their own distinctive traffic patterns. From the temporal perspective, traffic conditions at a location show short-term trends and long-term periodic regularities. For example, on consecutive weekdays the morning rush-hour patterns at the same location tend to be similar, repeating every 24 hours, while weekday and weekend patterns may differ significantly. Finally, external factors such as extreme weather and traffic control measures clearly affect travel behavior and, in turn, road traffic conditions.
Currently, mainstream traffic prediction methods fall into two categories: traditional knowledge-driven methods and data-driven methods. Traditional methods include classical statistical methods and machine learning methods, but their limited ability to model nonlinear data means they generally perform poorly in practice. With continuing urbanization, traffic data have accumulated steadily, laying a data foundation for the field and offering researchers a new perspective on the problem: data-driven methods. In recent years, with the rapid development of deep learning, researchers have proposed a large number of deep learning methods for this challenging problem. In particular, graph neural networks (GNNs) can model non-Euclidean data and therefore better match the structure of a traffic network, so GNN-based methods have been studied extensively for traffic prediction. These data-driven methods perform well because they can model and extract complex features from traffic flow data, but they still face several limitations.
First, in most existing GNN-based studies, the spatial structure of the road network is represented by a static adjacency matrix, either predefined or self-learned. However, because traffic conditions change dynamically (rush hours, weekends, traffic accidents, congestion), modeling the dynamic spatial and temporal correlations of traffic data is a key challenge, and a static adjacency matrix limits the ability to learn urban traffic dynamics. Second, conventional methods are designed around the local road network and struggle to capture long-range spatial correlations: RNN-based models suffer from vanishing or exploding gradients on long sequences, and GNN-based models aggregate information only from local neighborhoods. In a real urban road network, not only are the flows of adjacent links (e.g., upstream and downstream) correlated, but non-adjacent links with the same function also exhibit similar traffic patterns, so short-range and long-range correlations must be considered together when predicting traffic flow. Finally, existing methods pay little attention to data transfer between different cities and are therefore difficult to apply to cities with little traffic data. Because cities differ in development level, a small city may be unable to collect enough data to support training a complex deep learning model. One approach to this problem is transfer learning: performing the deep spatio-temporal prediction task across cities so that knowledge learned from data-rich cities is transferred to data-poor cities.
Therefore, how to provide a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model that solves at least one of the above technical problems is an urgent problem for those skilled in the art.
Disclosure of Invention
In view of this, the invention provides a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model, which captures short-range and long-range spatial correlations in the traffic network simultaneously, models the spatial correlation of traffic data dynamically, integrates temporal and spatial information, and realizes deep spatio-temporal prediction tasks across cities.
In order to achieve the above purpose, the invention adopts the following technical solution:
a traffic prediction transfer learning method based on a space-time diagram self-attention model comprises the following steps:
s1: converting historical traffic data and an urban traffic network structure into high-dimensional space-time expression vectors through a data embedding layer;
s2: inputting the high-dimensional space-time expression vector to a first layer space-time encoder, and specifically comprising the following steps:
carrying out layer normalization on the high-dimensional space-time expression vector to obtain a space-time expression vector after the layer normalization;
respectively inputting the space-time expression vectors after layer normalization into a time perception space self-attention mechanism and a trend perception time self-attention mechanism to correspondingly obtain multi-head space characteristic vectors and multi-head time characteristic vectors;
splicing the multi-head space eigenvector and the multi-head time eigenvector, and adding the multi-head space eigenvector and the high-dimensional space-time expression vector when the layer is not normalized to obtain a space-time eigenvector;
carrying out layer normalization on the space-time characteristic vector to obtain the space-time characteristic vector after the layer normalization;
inputting the space-time feature vector after layer normalization into a fully-connected feedforward neural network, and adding the output and the space-time feature vector when the layer normalization is not performed to obtain the space-time feature vector after space-time self-attention block coding;
s3: inputting the space-time characteristic vector after the space-time self-attention block coding output by the first layer of space-time coder as a high-dimensional space-time expression vector to the second layer of space-time coder, repeating the operation of S2, and so on until the output of the L-th layer of space-time coder is obtained;
s4: connecting the outputs of the space-time encoders from the first layer to the L layer through jumping to obtain a final space-time feature vector;
s5: inputting the final space-time characteristic vector to an output layer to obtain a space-time prediction model;
s6: the method comprises the steps of simultaneously training a space-time prediction model on a source data set through an autoregressive task and an autorecoding task to obtain a space-time diagram self-attention model, initializing through pre-trained parameters when the space-time diagram self-attention model is applied to a target city data set, and then finely adjusting model parameters through the target city data set to realize transfer learning among different cities.
Preferably, converting the historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through the data embedding layer specifically comprises:
converting the historical traffic data into traffic data embedding vectors through a traffic data embedding module;
converting the day-of-week and time-of-day information of the historical traffic data into a day-of-week embedding vector and a time-of-day embedding vector through a periodic information embedding module;
encoding the position information of the historical traffic data sequence into a sequence position encoding vector through a sequence position encoding module;
performing eigenvalue decomposition of the Laplacian matrix of the adjacency matrix of the urban traffic network structure through a node position embedding module to obtain the graph Laplacian eigenvectors, which are passed through a fully connected layer to obtain the node position embedding vector;
adding the traffic data embedding vector, the day-of-week embedding vector, the time-of-day embedding vector, the sequence position encoding vector, and the node position embedding vector to obtain the high-dimensional spatio-temporal representation vector.
Preferably, feeding the layer-normalized spatio-temporal representation vector into the time-aware spatial self-attention mechanism to obtain the multi-head spatial feature vector specifically comprises:
the time-aware spatial self-attention mechanism comprises h_s spatial attention heads;
in each spatial attention head, converting the layer-normalized spatio-temporal representation vector into a spatial key matrix K_S and a spatial value matrix V_S by causal convolution;
converting the layer-normalized spatio-temporal representation vector into a spatial query matrix Q_S by a fully connected operation;
multiplying the spatial query matrix Q_S by the spatial key matrix K_S and scaling the result to obtain the raw spatial attention matrix A_S;
computing the Hadamard product of the raw spatial attention matrix A_S and the spatial mask matrix M_S and applying the softmax operation to obtain the final spatial attention matrix;
multiplying the final spatial attention matrix by the spatial value matrix V_S to obtain the spatial feature vector SSA;
concatenating the spatial feature vectors SSA output by all spatial attention heads to obtain the multi-head spatial feature vector.
Preferably, feeding the layer-normalized spatio-temporal representation vector into the trend-aware temporal self-attention mechanism to obtain the multi-head temporal feature vector specifically comprises:
the trend-aware temporal self-attention mechanism comprises h_t temporal attention heads;
in each temporal attention head, converting the layer-normalized spatio-temporal representation vector into a temporal query matrix Q_T and a temporal key matrix K_T by causal convolution;
converting the layer-normalized spatio-temporal representation vector into a temporal value matrix V_T by a fully connected operation;
multiplying the temporal query matrix Q_T by the temporal key matrix K_T and scaling the result to obtain the raw temporal attention matrix A_T;
applying the softmax operation to the raw temporal attention matrix A_T to obtain the final temporal attention matrix;
multiplying the final temporal attention matrix by the temporal value matrix V_T to obtain the temporal feature vector TSA;
concatenating the temporal feature vectors TSA output by all temporal attention heads to obtain the multi-head temporal feature vector.
Preferably, the spatial key matrix K_S and the spatial value matrix V_S are given by:
K_S = Φ_SK * X, V_S = Φ_SV * X
where X is the layer-normalized spatio-temporal representation vector, * denotes the causal convolution operation, and Φ_SK and Φ_SV are convolution kernel parameters;
the spatial query matrix Q_S is given by:
Q_S = X W_SQ
where W_SQ is a learnable parameter matrix.
Preferably, the raw spatial attention matrix A_S is given by:
A_S = Q_S K_S^T / sqrt(d_k)
where d_k is the feature dimension of the spatial query matrix Q_S and the spatial key matrix K_S, and T denotes the matrix transpose.
Preferably, the multi-head spatial feature vector SSA is computed as:
SSA = softmax(A_S ⊙ M_S) V_S
where ⊙ denotes the Hadamard product.
Preferably, the temporal query matrix Q_T and the temporal key matrix K_T are given by:
Q_T = Φ_TQ * X, K_T = Φ_TK * X
where X is the layer-normalized spatio-temporal representation vector, * denotes the causal convolution operation, and Φ_TQ and Φ_TK are convolution kernel parameters;
the temporal value matrix V_T is given by:
V_T = X W_TV
where W_TV is a learnable parameter matrix.
Preferably, the raw temporal attention matrix A_T is given by:
A_T = Q_T K_T^T / sqrt(d_k)
where d_k is the feature dimension of the temporal query matrix Q_T and the temporal key matrix K_T;
the temporal feature vector TSA is given by:
TSA = softmax(A_T) V_T
preferably, the core idea of the autoregressive task is to use data at past time to generate data at future time, so as to model the context dependence of the traffic data;
the core idea of the self-encoding task is to use the perturbed data to restore the original data, thereby generating a more efficient data representation of the input data.
The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model disclosed by the invention has the following advantages:
(1) A time-aware spatial graph self-attention model is designed, which introduces historical temporal information to model the spatial correlation of traffic data dynamically.
(2) A dedicated mask mechanism combined with a dynamic time warping algorithm is designed to model both long-range and short-range spatial relationships in the traffic network.
(3) Several spatio-temporal embedding and encoding schemes are designed, enabling more accurate traffic prediction and facilitating city management and planning.
(4) A pre-training method for model transfer between different cities is designed: the model is pre-trained and then migrated to other cities' data sets, alleviating the problem of insufficient data in those cities.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic block diagram of the traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a traffic prediction transfer learning method based on a spatio-temporal graph self-attention model, comprising the following steps:
S1: converting historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through a data embedding layer;
S2: inputting the high-dimensional spatio-temporal representation vector to the first-layer spatio-temporal encoder, which specifically comprises:
applying layer normalization to the high-dimensional spatio-temporal representation vector to obtain a layer-normalized spatio-temporal representation vector;
feeding the layer-normalized spatio-temporal representation vector into a time-aware spatial self-attention mechanism and a trend-aware temporal self-attention mechanism to obtain, respectively, a multi-head spatial feature vector and a multi-head temporal feature vector;
concatenating the multi-head spatial feature vector and the multi-head temporal feature vector and adding the result to the high-dimensional spatio-temporal representation vector before layer normalization to obtain a spatio-temporal feature vector;
applying layer normalization to the spatio-temporal feature vector to obtain a layer-normalized spatio-temporal feature vector;
feeding the layer-normalized spatio-temporal feature vector into a fully connected feed-forward neural network and adding the output to the spatio-temporal feature vector before layer normalization to obtain the spatio-temporal feature vector encoded by the spatio-temporal self-attention block;
S3: feeding the encoded spatio-temporal feature vector output by the first-layer spatio-temporal encoder into the second-layer spatio-temporal encoder as its high-dimensional spatio-temporal representation vector, repeating the operations of S2, and so on until the output of the L-th-layer spatio-temporal encoder is obtained;
S4: combining the outputs of spatio-temporal encoder layers 1 through L via skip connections to obtain the final spatio-temporal feature vector;
S5: feeding the final spatio-temporal feature vector into the output layer to obtain the spatio-temporal prediction model;
S6: pre-training the spatio-temporal prediction model on a source data set with an autoregressive task and a self-encoding task simultaneously to obtain the spatio-temporal graph self-attention model; when the model is applied to a target city's data set, initializing it with the pre-trained parameters and then fine-tuning the model parameters on the target city's data set, thereby realizing transfer learning between different cities.
Specifically, in S1, the data embedding layer should preserve as much of the spatial structure information and time-series information in the original data as possible while converting the raw input into high-dimensional spatio-temporal representation vectors. To this end, the data embedding layer contains the following four modules:
(1) Traffic data embedding module: similar to the original Transformer model, the input of this module is the historical traffic data, which is projected into traffic data embedding vectors through a fully connected layer.
(2) Periodic information embedding module: traffic data are generated by daily human activity and therefore usually show clear periodic regularities. The input of this module is the day-of-week and time-of-day information of the historical traffic data; this information is retained through two learnable periodic embedding tables, and the module outputs a day-of-week embedding vector and a time-of-day embedding vector.
(3) Sequence position encoding module: because the self-attention mechanism used in the spatio-temporal encoder cannot by itself preserve the order of a sequence, the input of this module is the position information of the traffic data sequence, which is encoded by sine and cosine functions of different frequencies into a sequence position encoding vector.
(4) Node position embedding module: this module preserves the position of each node in the graph. Graph Laplacian eigenvectors are a spectral technique for embedding a graph into Euclidean space; the eigenvectors form a local coordinate system that preserves the global graph structure and describes the distances between nodes well. The module therefore performs eigenvalue decomposition of the Laplacian matrix of the adjacency matrix of the urban traffic network structure to obtain the graph Laplacian eigenvectors, which are passed through a fully connected layer to obtain the node position embedding vector.
Finally, the outputs of these modules are added together to obtain the high-dimensional spatio-temporal representation vector X of the original traffic data. A minimal sketch of this embedding layer is given below.
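By way of illustration only, the following PyTorch sketch shows one possible implementation of the four-module embedding layer; all tensor layouts, module names, and hyper-parameters (d_model, steps_per_day, lap_dim, an even d_model, etc.) are assumptions for the example and are not specified by the patent.

```python
import torch
import torch.nn as nn

class DataEmbedding(nn.Module):
    """Sum of the four embedding modules described above (illustrative)."""
    def __init__(self, in_dim, d_model, steps_per_day=288, lap_dim=8):
        super().__init__()
        self.value_proj = nn.Linear(in_dim, d_model)         # (1) traffic data embedding
        self.dow_emb = nn.Embedding(7, d_model)              # (2) day-of-week table
        self.tod_emb = nn.Embedding(steps_per_day, d_model)  # (2) time-of-day table
        self.lap_proj = nn.Linear(lap_dim, d_model)          # (4) node position embedding

    @staticmethod
    def sinusoidal(seq_len, d_model):
        # (3) classic sine/cosine sequence-position encoding (assumes even d_model)
        pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe  # (T, d_model)

    def forward(self, x, dow, tod, lap_vec):
        # x: (B, T, N, in_dim) traffic data; dow, tod: (B, T) integer indices;
        # lap_vec: (N, lap_dim) graph Laplacian eigenvectors of the road network
        B, T, N, _ = x.shape
        out = self.value_proj(x)
        out = out + self.dow_emb(dow)[:, :, None, :]    # broadcast over nodes
        out = out + self.tod_emb(tod)[:, :, None, :]
        out = out + self.sinusoidal(T, out.size(-1)).to(x.device)[None, :, None, :]
        out = out + self.lap_proj(lap_vec)[None, None, :, :]  # broadcast over batch/time
        return out  # high-dimensional spatio-temporal representation X
```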
Specifically, modeling the spatial and temporal correlations of traffic data is the key technical difficulty of the traffic prediction task. The invention proposes a new spatio-temporal encoder to better learn spatio-temporal representations of traffic data. The spatio-temporal encoder in S2 contains two sublayers, a spatio-temporal self-attention block and a fully connected feed-forward neural network, with layer normalization and residual connections applied around each sublayer. Unlike the conventional multi-head self-attention mechanism, however, the model decomposes the multi-head dot-product attention operation in the spatio-temporal encoder: some heads perform the time-aware spatial self-attention mechanism (spatial attention heads) and the others perform the trend-aware temporal self-attention mechanism (temporal attention heads). The outputs of these heads are concatenated and projected again to obtain the final output of the spatio-temporal self-attention block, which allows the model to integrate spatial and temporal information simultaneously.
Time-aware spatial self-attention mechanism: if the conventional self-attention operation were applied along the spatial dimension, each node would attend only to the information of other nodes within the same time step, ignoring the propagation delay of traffic conditions. For example, when a traffic accident occurs in one area, it takes some time to affect the traffic conditions of neighboring areas. To solve this problem, the invention replaces the conventional fully connected operation with a causal convolution, introducing the influence of temporal information so that attention between nodes is computed dynamically, i.e., each node attends to other nodes differently at different times. In addition, the invention introduces a spatial mask matrix to highlight spatial correlations from both short-range and long-range perspectives. From the short-range perspective, the matrix filters out attention between pairs of nodes whose distance exceeds a threshold. From the long-range perspective, the invention first computes a similarity matrix over the historical time series of the nodes with a dynamic time warping (DTW) algorithm and, for each node, selects and retains the nodes whose traffic patterns are most similar to it. Thus, for each node, the spatial mask matrix retains not only its short-range neighbors but also the nodes that are far away yet have similar traffic patterns, as illustrated by the sketch below.
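As a concrete illustration, the sketch below builds such a spatial mask matrix M_S from a distance threshold plus a DTW similarity ranking; the threshold value, the number of retained similar nodes, and the plain O(T^2) DTW routine are assumptions for the example rather than the patent's prescribed choices.

```python
import numpy as np

def dtw_distance(a, b):
    # plain dynamic time warping between two 1-D series (illustrative, O(n*m))
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def build_spatial_mask(dist, history, dist_threshold, k_similar):
    # dist: (N, N) pairwise road-network distances
    # history: (N, T) historical traffic series per node, used for DTW similarity
    N = dist.shape[0]
    mask = (dist <= dist_threshold).astype(np.float32)  # short-range neighbours
    dtw = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            dtw[i, j] = dtw[j, i] = dtw_distance(history[i], history[j])
    for i in range(N):  # keep the k nodes with the most similar traffic patterns
        for j in np.argsort(dtw[i])[:k_similar]:
            mask[i, j] = 1.0
    return mask  # M_S: 1 keeps an attention entry, 0 filters it out
```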
The time-aware spatial self-attention mechanism is implemented as follows. The module comprises h_s spatial attention heads. In each spatial attention head, the model first uses causal convolutions to convert the layer-normalized spatio-temporal representation vector of the input into the spatial key matrix K_S and the spatial value matrix V_S, and uses a fully connected operation to convert the same input into the spatial query matrix Q_S, computed as:

Q_S = X W_SQ, K_S = Φ_SK * X, V_S = Φ_SV * X

where X is the layer-normalized spatio-temporal representation vector, W_SQ is a learnable parameter matrix, * denotes the causal convolution operation, and Φ_SK and Φ_SV are convolution kernel parameters.

Multiplying the spatial query matrix Q_S by the spatial key matrix K_S and scaling the result yields the raw spatial attention matrix A_S:

A_S = Q_S K_S^T / sqrt(d_k)

where d_k is the feature dimension of the spatial query matrix Q_S and the spatial key matrix K_S.

The Hadamard product of the raw spatial attention matrix A_S and the spatial mask matrix M_S is computed, and the softmax operation is applied to the result to obtain the final spatial attention matrix.

Multiplying the final spatial attention matrix by the spatial value matrix V_S yields the spatial feature vector SSA:

SSA = softmax(A_S ⊙ M_S) V_S

where ⊙ denotes the Hadamard product.

Finally, the outputs SSA of all spatial attention heads are concatenated to obtain the multi-head spatial feature vector.
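The single-head sketch below illustrates these equations under an assumed (batch, time, nodes, features) tensor layout; the literal softmax(A_S ⊙ M_S) masking follows the formulas above, while the kernel size and everything else are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeAwareSpatialAttention(nn.Module):
    """One spatial attention head: causal convolutions for K_S/V_S, a fully
    connected projection for Q_S, masked scaled dot-product over nodes."""
    def __init__(self, d_model, d_k, kernel=3):
        super().__init__()
        self.q = nn.Linear(d_model, d_k)               # Q_S = X W_SQ
        self.k = nn.Conv2d(d_model, d_k, (kernel, 1))  # K_S = Phi_SK * X
        self.v = nn.Conv2d(d_model, d_k, (kernel, 1))  # V_S = Phi_SV * X
        self.kernel, self.d_k = kernel, d_k

    def causal(self, conv, x):
        # pad only the "past" side of the time axis so the convolution is causal
        h = F.pad(x.permute(0, 3, 1, 2), (0, 0, self.kernel - 1, 0))  # (B, d, T+k-1, N)
        return conv(h).permute(0, 2, 3, 1)                            # (B, T, N, d_k)

    def forward(self, x, mask):
        # x: (B, T, N, d_model); mask M_S: (N, N), 1 keeps a node pair, 0 filters it
        Q, K, V = self.q(x), self.causal(self.k, x), self.causal(self.v, x)
        A = Q @ K.transpose(-2, -1) / self.d_k ** 0.5  # A_S: (B, T, N, N)
        # literal form of the patent's softmax(A_S ⊙ M_S); a common variant
        # instead fills filtered entries with -inf before the softmax
        return torch.softmax(A * mask, dim=-1) @ V     # SSA: (B, T, N, d_k)
```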
Trend-aware temporal self-attention mechanism: the model uses a trend-aware temporal self-attention mechanism to mine the temporal patterns of traffic data. A naive point-wise self-attention operation cannot take the local context of the traffic data into account and may therefore misjudge the trend of traffic changes. Hence, the invention uses causal convolution instead of the conventional fully connected layer to introduce the trend over the history of the time series.
The trend-aware temporal self-attention mechanism proceeds as follows. The module comprises h_t temporal attention heads. In each temporal attention head, the model first uses causal convolutions to convert the input spatio-temporal representation vector into the temporal query matrix Q_T and the temporal key matrix K_T, and uses a fully connected operation to convert the same input into the temporal value matrix V_T:

Q_T = Φ_TQ * X, K_T = Φ_TK * X, V_T = X W_TV

where X is the input (layer-normalized) spatio-temporal representation vector, * denotes the causal convolution operation, Φ_TQ and Φ_TK are convolution kernel parameters, and W_TV is a learnable parameter matrix.

Multiplying the temporal query matrix Q_T by the temporal key matrix K_T and scaling the result yields the raw temporal attention matrix A_T:

A_T = Q_T K_T^T / sqrt(d_k)

where d_k is the feature dimension of the temporal query matrix Q_T and the temporal key matrix K_T. The softmax operation is then applied to A_T to obtain the final temporal attention matrix.

Multiplying the final temporal attention matrix by the temporal value matrix V_T yields the temporal feature vector TSA:

TSA = softmax(A_T) V_T

Finally, the outputs TSA of all temporal attention heads are concatenated to obtain the multi-head temporal feature vector.
In S5, the output layer uses two fully connected layers: one realizes multi-step prediction over the time dimension, and the other converts the feature dimension into the required output dimension. A sketch of the full encoder stack and output layer follows.
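The sketch below assembles the pieces: L pre-norm encoder layers (S2-S3), skip connections summing all layer outputs (S4), and the two fully connected output layers (S5). The spatial and temporal heads are assumed to output d_model/2 features each so that their concatenation matches d_model; t_in, horizon, and out_dim are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatioTemporalEncoderLayer(nn.Module):
    """One encoder layer: pre-norm, parallel spatial/temporal attention whose
    outputs are concatenated, then a pre-norm feed-forward sublayer; both
    sublayers are wrapped in residual connections (step S2)."""
    def __init__(self, d_model, spatial, temporal):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.spatial, self.temporal = spatial, temporal  # each maps d_model -> d_model // 2
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, mask):
        h = self.norm1(x)
        h = torch.cat([self.spatial(h, mask), self.temporal(h)], dim=-1)
        x = x + h                               # residual around the attention block
        return x + self.ffn(self.norm2(x))      # residual around the feed-forward net

class SpatioTemporalPredictor(nn.Module):
    """L stacked encoder layers, skip-connected outputs (S4), and the two
    fully connected output layers (S5)."""
    def __init__(self, layers, d_model, t_in, horizon, out_dim):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.time_fc = nn.Linear(t_in, horizon)      # multi-step prediction over time
        self.feat_fc = nn.Linear(d_model, out_dim)   # feature dim -> output dim

    def forward(self, x, mask):
        # x: (B, T, N, d_model) from the data embedding layer
        skip = torch.zeros_like(x)
        for layer in self.layers:                    # encoder layers 1 .. L
            x = layer(x, mask)
            skip = skip + x                          # skip connections over all layers
        h = self.time_fc(skip.permute(0, 3, 2, 1))   # (B, d, N, T) -> (B, d, N, horizon)
        h = h.permute(0, 3, 2, 1)                    # (B, horizon, N, d)
        return self.feat_fc(h)                       # (B, horizon, N, out_dim)
```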
Further, pre-training and transfer learning in S6: for cities with insufficient data, pre-training the model and applying transfer learning is an effective solution. Although data distributions and road network topologies may differ greatly between cities, traffic patterns show significant similarities across cities. Migrating a model pre-trained on a large data set to a small data set improves its prediction performance there, transferring traffic prediction knowledge learned from a data-rich source city to a data-sparse target city. Because the proposed model is entirely based on the self-attention mechanism and contains no graph convolution operation, the pre-trained traffic Transformer model can be migrated directly and conveniently to other data sets. To learn transferable traffic prediction knowledge from the source data set and further improve performance on the target data set, the invention designs two pre-training tasks: (a) the autoregressive task, whose core idea is to generate data at future times from data at past times, thereby modeling the contextual dependencies of traffic data; and (b) the self-encoding task, whose core idea is to restore the original data from noise-perturbed data, thereby producing a more effective representation of the input data. A concrete perturbation method is to randomly select 15% of the data and set it to 0, letting the model recover the zeroed values from the unperturbed traffic data. To avoid the weaknesses of either pre-training task alone, the self-encoding and autoregressive tasks are fused in the pre-training stage, i.e., both tasks are performed simultaneously to pre-train the model on the source data set. When the model is applied to the data set of a target city, it is initialized with the pre-trained parameters and the model parameters are then fine-tuned on the target city's data set, realizing transfer learning between different cities. A sketch of one joint pre-training step follows.
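One possible joint pre-training step is sketched below; the MAE criterion, the equal loss weighting, and the checkpoint file name are assumptions, while the 15% zeroing rate follows the description above.

```python
import torch
import torch.nn.functional as F

def pretrain_step(model, x_past, y_future, mask):
    # (a) autoregressive task: generate future-time data from past-time data
    ar_loss = F.l1_loss(model(x_past, mask), y_future)

    # (b) self-encoding task: zero a random 15% of input entries and restore them
    # (assumes the model is configured so its output shape matches x_past here)
    corrupt = torch.rand_like(x_past) < 0.15
    recon = model(x_past.masked_fill(corrupt, 0.0), mask)
    ae_loss = F.l1_loss(recon[corrupt], x_past[corrupt])

    return ar_loss + ae_loss  # both tasks are trained simultaneously

# Transfer to a data-sparse target city: initialize from the pre-trained
# parameters, then fine-tune on the target city's data set (file name assumed):
# target_model.load_state_dict(torch.load("pretrained_source_city.pt"))
```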
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is brief; for the relevant points, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A traffic prediction transfer learning method based on a spatio-temporal graph self-attention model, characterized by comprising the following steps:
S1: converting historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through a data embedding layer;
S2: inputting the high-dimensional spatio-temporal representation vector to the first-layer spatio-temporal encoder, which specifically comprises:
applying layer normalization to the high-dimensional spatio-temporal representation vector to obtain a layer-normalized spatio-temporal representation vector;
feeding the layer-normalized spatio-temporal representation vector into a time-aware spatial self-attention mechanism and a trend-aware temporal self-attention mechanism to obtain, respectively, a multi-head spatial feature vector and a multi-head temporal feature vector;
concatenating the multi-head spatial feature vector and the multi-head temporal feature vector and adding the result to the high-dimensional spatio-temporal representation vector before layer normalization to obtain a spatio-temporal feature vector;
applying layer normalization to the spatio-temporal feature vector to obtain a layer-normalized spatio-temporal feature vector;
feeding the layer-normalized spatio-temporal feature vector into a fully connected feed-forward neural network and adding the output to the spatio-temporal feature vector before layer normalization to obtain the spatio-temporal feature vector encoded by the spatio-temporal self-attention block;
S3: feeding the encoded spatio-temporal feature vector output by the first-layer spatio-temporal encoder into the second-layer spatio-temporal encoder as its high-dimensional spatio-temporal representation vector, repeating the operations of S2, and so on until the output of the L-th-layer spatio-temporal encoder is obtained;
S4: combining the outputs of spatio-temporal encoder layers 1 through L via skip connections to obtain the final spatio-temporal feature vector;
S5: feeding the final spatio-temporal feature vector into the output layer to obtain the spatio-temporal prediction model;
S6: pre-training the spatio-temporal prediction model on a source data set with an autoregressive task and a self-encoding task simultaneously to obtain the spatio-temporal graph self-attention model; when the model is applied to a target city's data set, initializing it with the pre-trained parameters and then fine-tuning the model parameters on the target city's data set, thereby realizing transfer learning between different cities.
2. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 1, characterized in that converting the historical traffic data and the urban traffic network structure into high-dimensional spatio-temporal representation vectors through the data embedding layer specifically comprises:
converting the historical traffic data into traffic data embedding vectors through a traffic data embedding module;
converting the day-of-week and time-of-day information of the historical traffic data into a day-of-week embedding vector and a time-of-day embedding vector through a periodic information embedding module;
encoding the position information of the historical traffic data sequence into a sequence position encoding vector through a sequence position encoding module;
performing eigenvalue decomposition of the Laplacian matrix of the adjacency matrix of the urban traffic network structure through a node position embedding module to obtain the graph Laplacian eigenvectors, which are passed through a fully connected layer to obtain the node position embedding vector;
adding the traffic data embedding vector, the day-of-week embedding vector, the time-of-day embedding vector, the sequence position encoding vector, and the node position embedding vector to obtain the high-dimensional spatio-temporal representation vector.
3. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 1, characterized in that feeding the layer-normalized spatio-temporal representation vector into the time-aware spatial self-attention mechanism to obtain the multi-head spatial feature vector specifically comprises:
the time-aware spatial self-attention mechanism comprises h_s spatial attention heads;
in each spatial attention head, converting the layer-normalized spatio-temporal representation vector into a spatial key matrix K_S and a spatial value matrix V_S by causal convolution;
converting the layer-normalized spatio-temporal representation vector into a spatial query matrix Q_S by a fully connected operation;
multiplying the spatial query matrix Q_S by the spatial key matrix K_S and scaling the result to obtain the raw spatial attention matrix A_S;
computing the Hadamard product of the raw spatial attention matrix A_S and the spatial mask matrix M_S and applying the softmax operation to obtain the final spatial attention matrix;
multiplying the final spatial attention matrix by the spatial value matrix V_S to obtain the spatial feature vector SSA;
concatenating the spatial feature vectors SSA output by all spatial attention heads to obtain the multi-head spatial feature vector.
4. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 1, characterized in that feeding the layer-normalized spatio-temporal representation vector into the trend-aware temporal self-attention mechanism to obtain the multi-head temporal feature vector specifically comprises:
the trend-aware temporal self-attention mechanism comprises h_t temporal attention heads;
in each temporal attention head, converting the layer-normalized spatio-temporal representation vector into a temporal query matrix Q_T and a temporal key matrix K_T by causal convolution;
converting the layer-normalized spatio-temporal representation vector into a temporal value matrix V_T by a fully connected operation;
multiplying the temporal query matrix Q_T by the temporal key matrix K_T and scaling the result to obtain the raw temporal attention matrix A_T;
applying the softmax operation to the raw temporal attention matrix A_T to obtain the final temporal attention matrix;
multiplying the final temporal attention matrix by the temporal value matrix V_T to obtain the temporal feature vector TSA;
concatenating the temporal feature vectors TSA output by all temporal attention heads to obtain the multi-head temporal feature vector.
5. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 3, characterized in that the spatial key matrix K_S and the spatial value matrix V_S are given by:
K_S = Φ_SK * X, V_S = Φ_SV * X
where X is the layer-normalized spatio-temporal representation vector, * denotes the causal convolution operation, and Φ_SK and Φ_SV are convolution kernel parameters;
the spatial query matrix Q_S is given by:
Q_S = X W_SQ
where W_SQ is a learnable parameter matrix.
6. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 5, characterized in that the raw spatial attention matrix A_S is given by:
A_S = Q_S K_S^T / sqrt(d_k)
where d_k is the feature dimension of the spatial query matrix Q_S and the spatial key matrix K_S, and T denotes the matrix transpose.
7. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 6, characterized in that the multi-head spatial feature vector SSA is computed as:
SSA = softmax(A_S ⊙ M_S) V_S
where ⊙ denotes the Hadamard product.
8. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 4, characterized in that the temporal query matrix Q_T and the temporal key matrix K_T are given by:
Q_T = Φ_TQ * X, K_T = Φ_TK * X
where X is the layer-normalized spatio-temporal representation vector, * denotes the causal convolution operation, and Φ_TQ and Φ_TK are convolution kernel parameters;
the temporal value matrix V_T is given by:
V_T = X W_TV
where W_TV is a learnable parameter matrix.
9. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 8, characterized in that the raw temporal attention matrix A_T is given by:
A_T = Q_T K_T^T / sqrt(d_k)
where d_k is the feature dimension of the temporal query matrix Q_T and the temporal key matrix K_T;
the temporal feature vector TSA is given by:
TSA = softmax(A_T) V_T
10. The traffic prediction transfer learning method based on a spatio-temporal graph self-attention model according to claim 1, characterized in that
the core idea of the autoregressive task is to generate data at future times from data at past times, thereby modeling the contextual dependencies of the traffic data;
the core idea of the self-encoding task is to restore the original data from perturbed data, thereby producing a more effective representation of the input data.
CN202211116536.0A 2022-09-14 2022-09-14 Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model Pending CN115409276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211116536.0A CN115409276A (en) 2022-09-14 2022-09-14 Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211116536.0A CN115409276A (en) 2022-09-14 2022-09-14 Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model

Publications (1)

Publication Number Publication Date
CN115409276A (en) 2022-11-29

Family

ID=84165519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211116536.0A Pending CN115409276A (en) 2022-09-14 2022-09-14 Traffic prediction transfer learning method based on a spatio-temporal graph self-attention model

Country Status (1)

Country Link
CN (1) CN115409276A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383391A (en) * 2023-06-06 2023-07-04 深圳须弥云图空间科技有限公司 Text classification method and device
CN116383391B (en) * 2023-06-06 2023-08-11 深圳须弥云图空间科技有限公司 Text classification method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination