CN116011638A

CN116011638A - Urban space-time prediction method based on space-time attention

Info

Publication number: CN116011638A
Application number: CN202211715849.8A
Authority: CN
Inventors: 王静远; 韩程凯; 姜佳伟; 李超
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2023-04-25

Abstract

The invention discloses an urban space-time prediction method based on space-time attention, comprising constructing an urban space-time prediction model based on space-time attention and using it to carry out urban space-time prediction. The model comprises a space-time encoder, which comprises a semantic space self-attention module, a geographic space self-attention module, a temporal self-attention module and a heterogeneous attention fusion module, and which is used for obtaining space-time feature vectors from high-dimensional space-time representation vectors. The disclosed method can capture long-range semantic neighborhood information, short-range geographic neighborhood information and long-range temporal dependencies in historical data, and achieves accurate urban space-time prediction by modeling the complex, dynamic space-time dependencies of the historical data.

Description

Urban space-time prediction method based on space-time attention

Technical Field

The invention relates to the technical field of urban space-time prediction, in particular to a method for urban space-time prediction based on space-time attention.

Background

Urban space-time prediction is a central application area where artificial intelligence meets smart cities. Common urban data prediction tasks include traffic condition prediction, population density prediction, ride-hailing demand prediction and air quality prediction.

Owing to their higher accuracy, deep neural network methods, in particular models based on graph neural networks (GNNs), are currently the mainstream approach to dynamic urban space-time prediction in the field of urban data mining. However, GNN-based models have three main limitations in urban space-time prediction:

(1) Affected by travel patterns and accidents in the city, the spatial correlation between locations in space-time data is time-varying rather than static. For example, the correlation between two spatial nodes becomes stronger during the morning peak and weaker at other times. Existing methods mostly model spatial correlation statically, whether the graph is predefined or learned adaptively, which limits their ability to learn dynamic urban space-time patterns;

(2) Because of the functional zoning of cities, two distant locations may exhibit similar space-time patterns, which means that spatial dependencies between locations can be long-range. However, existing methods are usually designed around local neighborhoods and cannot capture such long-range dependencies. For example, GNN-based models suffer from over-smoothing, which makes it difficult for them to capture long-range spatial dependencies;

(3) In a space-time system, the propagation of spatial information between locations may be subject to time delays. For example, when a traffic accident occurs at one place, it takes several minutes before traffic conditions at neighboring places are affected. The instantaneous message-passing mechanism of typical GNN models, however, ignores such delays.

To this end, the present invention proposes a propagation delay-aware dynamic long-range urban space-time prediction model and uses it to perform urban space-time prediction, thereby overcoming the above drawbacks of the prior art.

Disclosure of Invention

In view of this, the present invention provides an urban space-time prediction method based on space-time attention, implemented by constructing an urban space-time prediction model. The model includes a space-time encoder based on the self-attention mechanism, which captures short-range and long-range dynamic spatial correlations through a geospatial self-attention module and a semantic space self-attention module, and captures dynamic, long-range temporal patterns through a temporal self-attention module, thereby accurately modeling the complex and dynamic space-time dependencies of historical data.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

An urban space-time prediction method based on space-time attention comprises constructing an urban space-time prediction model based on space-time attention and carrying out urban space-time prediction using the model; the model comprises a space-time encoder, which comprises a semantic space self-attention module, a geographic space self-attention module, a temporal self-attention module and a heterogeneous attention fusion module, and which is used for obtaining a space-time feature vector from a high-dimensional space-time representation vector, wherein

the semantic space self-attention module is used for acquiring long-distance semantic neighborhood information according to the high-dimensional space-time representation vector and the binary semantic mask matrix;

the geographic space self-attention module is used for acquiring short-distance geographic neighborhood information according to the high-dimensional space-time representation vector and the binary geographic mask matrix;

the time self-attention module is used for acquiring a long-distance time dependency relationship according to the high-dimensional space-time representation vector;

and the heterogeneous attention fusion module is used for concatenating the outputs of the semantic space self-attention module, the geographic space self-attention module and the temporal self-attention module and projecting the result to obtain the space-time feature vector.

Preferably, the high-dimensional space-time representation vector is obtained by converting historical space-time data and urban traffic road network structures through a data embedding layer.

Preferably, the high-dimensional space-time representation vector is split according to the number of attention heads and then fed into the semantic space self-attention module, the geographic space self-attention module and the temporal self-attention module.

Preferably, the semantic space self-attention module, the geospatial self-attention module and the temporal self-attention module map the high-dimensional space-time representation vector into a query matrix, a key matrix and a value matrix when calculating.

Preferably, the semantic space self-attention module and the geospatial self-attention module are computed as follows:

$$\mathrm{SemSAtt}(X_t) = \mathrm{softmax}\left(\frac{Q_t^{(S)} (K_t^{(S)})^{\top}}{\sqrt{d'}} \odot M_{sem}\right) V_t^{(S)}$$

$$\mathrm{GeoSAtt}(X_t) = \mathrm{softmax}\left(\frac{Q_t^{(S)} (K_t^{(S)})^{\top}}{\sqrt{d'}} \odot M_{geo}\right) V_t^{(S)}$$

where $Q_t^{(S)}$, $K_t^{(S)}$ and $V_t^{(S)}$ are the query, key and value matrices mapped from the high-dimensional space-time representation vector, t denotes time, the superscript (S) denotes space, $\odot$ denotes the Hadamard product, $M_{sem}$ is the binary semantic mask matrix, $M_{geo}$ is the binary geographic mask matrix, and d' is the dimension of the query, key and value matrices.

Preferably, before the geospatial self-attention module performs its computation, its key matrix $K_t^{(S)}$ is first passed through a delay-aware feature transformation module, which yields a modified key matrix $\tilde{K}_t^{(S)}$ that models the time delay of spatial information propagation.

The modified key matrix is obtained through the following steps:

S1: acquiring short-term space-time patterns from the input historical space-time data and converting them into memory vectors $m_i$;

S2: obtaining a high-dimensional representation $u_{t,n}$ of the S-step historical space-time data sequence of node n from time slice (t-S+1) to t, and comparing it with the memory vectors $m_i$ to obtain a similarity vector with components $w_i$;

S3: obtaining the integrated historical sequence representation $r_{t,n}$ from the similarity vector according to the following formula:

$$r_{t,n} = \sum_{i=1}^{N_p} w_i \,(p_i W^c)$$

where $p_i$ is the i-th short-term space-time pattern, $N_p$ is the number of patterns, and $W^c$ is a learnable parameter matrix;

S4: concatenating the integrated historical sequence representations $r_{t,n}$ of the N nodes into $R_t$ and adding the result to the key matrix to obtain the modified key matrix:

$$\tilde{K}_t^{(S)} = K_t^{(S)} + R_t$$

Preferably, the step of acquiring the short-term space-time patterns comprises:

slicing the historical space-time data with a sliding window to obtain historical space-time data sequences;

performing k-Shape clustering on the historical space-time data sequences to obtain the short-term space-time patterns.

Preferably, the binary geographic mask matrix is constructed by comparing the distance between two nodes with a preset threshold; the binary semantic mask matrix is constructed as follows:

selecting the k nodes with the highest similarity to the current node as its semantic neighbors;

setting the weight between the current node and each of its semantic neighbors to 1 and all other weights to 0.

Preferably, the space-time encoder has multiple layers, and the space-time feature vector is obtained by combining the outputs of all space-time encoder layers through skip connections.

Preferably, the urban space-time prediction model further comprises an output layer for obtaining predicted urban space-time data from the space-time feature vector.

Preferably, the output layer comprises two fully connected layers.

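Preferably, the heterogeneous attention fusion module performs concatenation followed by projection, expressed as:

$$\mathrm{STAttn} = \left(Z_1^{geo} \,\|\, \cdots \,\|\, Z_{h_{geo}}^{geo} \,\|\, Z_1^{sem} \,\|\, \cdots \,\|\, Z_{h_{sem}}^{sem} \,\|\, Z_1^{t} \,\|\, \cdots \,\|\, Z_{h_t}^{t}\right) W^O$$

where $\|$ denotes concatenation, $Z^{geo}$, $Z^{sem}$ and $Z^{t}$ are the outputs of the geographic, semantic and temporal attention heads, $h_{geo}$, $h_{sem}$ and $h_t$ are the numbers of heads of the three attention modules, $W^O \in \mathbb{R}^{d\times d}$ is a learnable projection matrix, and d is the dimension of the high-dimensional space-time representation vector.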

According to the above technical solution, the invention provides an urban space-time prediction method based on space-time attention that addresses the problems caused by the complex characteristics of space-time data, namely dynamics, long-range dependencies and time delays.

Compared with the prior art, the invention constructs a space-time encoder based on the self-attention mechanism, in which a geospatial self-attention module and a semantic space self-attention module model local geographic neighborhoods and global semantic neighborhoods through different masking schemes, so as to capture short-range and long-range dynamic spatial correlations, and a temporal self-attention module captures dynamic, long-range temporal dependencies;

Another beneficial effect of the invention is that a delay-aware feature transformation module is placed before the geospatial self-attention module and extends it, so as to explicitly model the time delay of spatial information propagation.

The urban space-time prediction model provided by the invention can perform both multi-step and single-step prediction, overcomes the above limitations of existing GNN models in urban space-time prediction, and offers accurate prediction performance and good interpretability.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a space-time encoder according to the present invention;

FIG. 2 is a schematic diagram of a city space-time prediction model structure provided by the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The embodiment of the invention discloses an urban space-time prediction method based on space-time attention, which comprises the following steps: constructing an urban space-time prediction model based on space-time attention, and carrying out urban space-time prediction using the constructed model. The model's space-time encoder, which is based on the self-attention mechanism, can model complex and dynamic space-time dependencies.

Structurally, as shown in FIG. 1, the space-time encoder comprises a semantic space self-attention module, a geospatial self-attention module, a temporal self-attention module and a heterogeneous attention fusion module, and is used for obtaining space-time feature vectors from high-dimensional space-time representation vectors.

The heterogeneous attention fusion module concatenates the outputs of the semantic space self-attention module, the geospatial self-attention module and the temporal self-attention module and projects the result to obtain the space-time feature vector.

First, the high-dimensional space-time representation vector is obtained by converting the historical space-time data and the urban traffic road network structure through a data embedding layer. When converting the raw input into a high-dimensional representation, the data embedding layer retains as much of the spatial structure information and temporal information in the raw data as possible. Specifically, the input of the embedding layer is the raw space-time data, which is projected through a fully connected layer into a space-time data embedding $X \in \mathbb{R}^{T\times N\times d}$, where N is the number of spatial nodes, T is the number of time steps and d is the feature dimension.

In one embodiment, after the high-dimensional space-time representation vector is obtained, it is split according to the number of attention heads and fed into the semantic space self-attention module, the geospatial self-attention module and the temporal self-attention module respectively. Let $h_{geo}$, $h_{sem}$ and $h_t$ denote the numbers of heads of the three attention modules; the high-dimensional space-time representation vector is then split into $(h_{geo}+h_{sem}+h_t)$ parts, and the dimension d' of each head is:

$$d' = \frac{d}{h_{geo} + h_{sem} + h_t}$$
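The split can be illustrated with a minimal NumPy sketch, assuming the embedding $X$ is stored as a `(T, N, d)` array and that the chunks are assigned in the order geographic, semantic, temporal. The function name `split_heads` and that ordering are hypothetical; the source only fixes the per-head dimension d'.

```python
import numpy as np

def split_heads(x, h_geo, h_sem, h_t):
    """Split the feature axis of x (shape T x N x d) into per-head chunks of size d'."""
    num_heads = h_geo + h_sem + h_t
    d = x.shape[-1]
    if d % num_heads != 0:
        raise ValueError("d must be divisible by the total number of heads")
    d_prime = d // num_heads
    chunks = np.split(x, num_heads, axis=-1)  # list of (T, N, d') arrays
    return {
        "geo": chunks[:h_geo],                # assumed order: geographic heads first
        "sem": chunks[h_geo:h_geo + h_sem],   # then semantic heads
        "t": chunks[h_geo + h_sem:],          # then temporal heads
        "d_prime": d_prime,
    }

x = np.random.default_rng(0).normal(size=(12, 5, 64))  # T=12 steps, N=5 nodes, d=64
heads = split_heads(x, h_geo=4, h_sem=2, h_t=2)        # d' = 64 / 8 = 8
```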

further, the semantic space self-attention module, the geospatial self-attention module, and the temporal self-attention module map the high-dimensional spatio-temporal representation vectors into a query matrix Q, a key matrix K, and a value matrix V prior to computation.

The spatial self-attention modules designed by the invention, namely the semantic space self-attention module and the geospatial self-attention module, are used for capturing dynamic spatial correlations in space-time data. Formally, at time t, the query, key and value matrices of the self-attention operation are first obtained from the high-dimensional space-time representation vector:

$$Q_t^{(S)} = X_{t,:,:} W_Q^{(S)}, \quad K_t^{(S)} = X_{t,:,:} W_K^{(S)}, \quad V_t^{(S)} = X_{t,:,:} W_V^{(S)}$$

where $W_Q^{(S)}, W_K^{(S)}, W_V^{(S)} \in \mathbb{R}^{d\times d'}$ are learnable parameters and d' is the dimension of the query, key and value matrices.

Then, the self-attention operation is applied along the spatial dimension to model interactions between nodes and obtain the spatial correlations (attention scores) among all nodes at time t:

$$A_t^{(S)} = \frac{Q_t^{(S)} (K_t^{(S)})^{\top}}{\sqrt{d'}}$$

It follows that the spatial correlation between nodes, $A_t^{(S)} \in \mathbb{R}^{N\times N}$, differs across time slices, i.e., it is dynamic. The spatial self-attention module can therefore capture dynamic spatial correlations. The softmax-normalized attention scores are multiplied by the value matrix to obtain the output of the spatial self-attention module:

$$\mathrm{SSA}(X_t) = \mathrm{softmax}\left(A_t^{(S)}\right) V_t^{(S)}$$

In the spatial self-attention above, each node interacts with all nodes, which is equivalent to treating the spatial graph as a fully connected graph. However, only interactions between a few node pairs are necessary, namely nearby node pairs and node pairs that are far apart but functionally similar. The present application therefore introduces two graph mask matrices, $M_{geo}$ and $M_{sem}$, to capture short-range and long-range spatial dependencies in the space-time data, respectively.

Specifically, the binary geographic mask matrix $M_{geo}$ is constructed by comparing the distance between two nodes with a preset threshold: the weight is set to 1 only if the distance between the two nodes (the number of hops in the graph) is smaller than the threshold λ, and to 0 otherwise. In this way, attention between more distant node pairs is masked out.
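A minimal sketch of the geographic mask, assuming the road network is given as adjacency lists and that hop counts are computed with breadth-first search. The helper names `hop_counts` and `geo_mask` are hypothetical. Each node is 0 hops from itself, so the diagonal of $M_{geo}$ is always 1 for λ > 0.

```python
from collections import deque

import numpy as np

def hop_counts(adj):
    """All-pairs shortest hop counts via BFS; unreachable pairs stay at np.inf."""
    n = len(adj)
    hops = np.full((n, n), np.inf)
    for src in range(n):
        hops[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if hops[src, v] == np.inf:  # first visit is the shortest path in BFS
                    hops[src, v] = hops[src, u] + 1
                    queue.append(v)
    return hops

def geo_mask(adj, lam):
    """M_geo[i, j] = 1 if node j is fewer than lam hops from node i, else 0."""
    return (hop_counts(adj) < lam).astype(np.int8)

# Path graph 0-1-2-3 given as adjacency lists.
adj = [[1], [0, 2], [1, 3], [2]]
m_geo = geo_mask(adj, lam=2)  # keeps only pairs at 0 or 1 hops
```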

Further, the binary semantic mask matrix $M_{sem}$ is constructed as follows.

From a long-range perspective, the present application uses the Dynamic Time Warping (DTW) algorithm to compute the similarity of historical space-time data between nodes, selects the k nodes most similar to each node as its semantic neighbors, sets the corresponding weights to 1 and all other weights to 0. In this way, node pairs that exhibit similar space-time patterns because of similar urban functions are found.
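The semantic mask can be sketched as follows, assuming each node's history is a univariate series and using the classic O(T²) DTW recurrence with absolute-difference cost (a smaller DTW distance means higher similarity). The names `dtw_distance` and `semantic_mask`, and the choice to exclude a node from its own neighbor list, are assumptions; the source does not say whether the node itself counts among its k neighbors.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def semantic_mask(series, k):
    """series: (N, T) history per node. Returns a binary (N, N) M_sem with k neighbors per row."""
    n = series.shape[0]
    dist = np.array([[dtw_distance(series[i], series[j]) for j in range(n)] for i in range(n)])
    np.fill_diagonal(dist, np.inf)  # assumption: a node is not its own semantic neighbor
    mask = np.zeros((n, n), dtype=np.int8)
    for i in range(n):
        nearest = np.argsort(dist[i], kind="stable")[:k]  # k smallest DTW distances
        mask[i, nearest] = 1
    return mask

series = np.array([
    [0, 1, 2, 1, 0],
    [5, 5, 5, 5, 5],
    [0, 1, 2, 2, 1],
    [5, 5, 6, 5, 5],
], dtype=float)
m_sem = semantic_mask(series, k=1)  # nodes 0/2 and 1/3 pick each other
```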

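Based on the two mask matrices, the computation of the semantic space self-attention module and the geospatial self-attention module is defined as:

$$\mathrm{SemSAtt}(X_t) = \mathrm{softmax}\left(\frac{Q_t^{(S)} (K_t^{(S)})^{\top}}{\sqrt{d'}} \odot M_{sem}\right) V_t^{(S)}$$

$$\mathrm{GeoSAtt}(X_t) = \mathrm{softmax}\left(\frac{Q_t^{(S)} (K_t^{(S)})^{\top}}{\sqrt{d'}} \odot M_{geo}\right) V_t^{(S)}$$

where $\odot$ denotes the Hadamard product.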
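A minimal NumPy sketch of one masked spatial attention head at a single time slice. The formulas above write the mask as a Hadamard product; this sketch instead realizes it by setting disallowed scores to −∞ before the softmax, a common implementation choice that gives masked node pairs exactly zero weight. The function name and shapes are illustrative assumptions, and every mask row must contain at least one 1 (true for $M_{geo}$, whose diagonal is 1).

```python
import numpy as np

def softmax(scores, axis=-1):
    scores = scores - scores.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=axis, keepdims=True)

def masked_spatial_attention(x_t, w_q, w_k, w_v, mask):
    """One spatial attention head at time slice t.

    x_t: (N, d) node representations; w_q, w_k, w_v: (d, d'); mask: (N, N) binary.
    """
    q, k, v = x_t @ w_q, x_t @ w_k, x_t @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])        # A_t^(S), shape (N, N)
    scores = np.where(mask == 1, scores, -np.inf)  # masked pairs get zero attention weight
    return softmax(scores, axis=-1) @ v            # (N, d')

rng = np.random.default_rng(0)
x_t = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 2)) for _ in range(3))
m_geo = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
out = masked_spatial_attention(x_t, w_q, w_k, w_v, m_geo)
```

With an identity mask every node attends only to itself, so the output reduces to $V_t^{(S)}$; passing $M_{sem}$ instead of $M_{geo}$ gives the semantic head.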

Meanwhile, real-world space-time data propagate with a delay. For traffic data, for example, when a traffic accident occurs in one region, it may take several minutes to affect traffic conditions in neighboring regions. Therefore, in one embodiment provided by the present invention, before the geospatial self-attention module performs its computation, its key matrix $K_t^{(S)}$ is processed by a delay-aware feature transformation module.

This module captures propagation delays from the short-term historical space-time data of each node and incorporates the delay information into the key matrix of the geospatial self-attention module, thereby explicitly modeling the time delay of spatial information propagation.

Further, the modified key matrix $\tilde{K}_t^{(S)}$ is obtained through the following steps:

S1: acquiring short-term space-time patterns from the input historical space-time data, and converting them into memory vectors $m_i$ using the embedding matrix $W^m$ of the delay-aware feature transformation module;

In one embodiment, representative short-term space-time patterns are derived from the input historical space-time data as follows: the input historical space-time data $X \in \mathbb{R}^{T\times N\times C}$, where C is the dimension of the space-time features (e.g., C = 2 for inflow and outflow), are sliced along the time axis T with a sliding window of length S to obtain a set of historical space-time data sequences, i.e., a number of time series of length S;

k-Shape clustering is then performed on these historical space-time data sequences to obtain the short-term space-time patterns.

The k-Shape clustering algorithm used here is a time-series clustering method that preserves the shape of the time series and is invariant to scaling and shifting. The present application represents each cluster by its centroid $p_i$, which is also a time series of length S. The clustering result is denoted $\mathcal{P} = \{p_i \mid i \in [1, N_p]\}$, where $N_p$ is the total number of clusters; $\mathcal{P}$ can be regarded as the set of short-term space-time patterns.
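The slicing step can be sketched with NumPy's `sliding_window_view`, assuming the history is stored as a `(T, N, C)` array and that every node and every window start contributes one length-S sequence. The function names are hypothetical, and the clustering itself is not shown; the centroids $p_i$ would come from a k-Shape implementation (for example, the `KShape` class in the tslearn library). The z-normalization helper reflects the fact that k-Shape compares z-normalized series.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def slice_history(x, s):
    """Slice X (T, N, C) along time into every length-s window of every node.

    Returns an array of shape ((T - s + 1) * N, s, C) ready for time-series clustering.
    """
    windows = sliding_window_view(x, window_shape=s, axis=0)  # (T-s+1, N, C, s)
    windows = np.moveaxis(windows, -1, 2)                     # (T-s+1, N, s, C)
    return windows.reshape(-1, s, x.shape[-1])

def z_normalize(seqs, eps=1e-8):
    """Remove per-sequence offset and scale so only the shape of each series remains."""
    mean = seqs.mean(axis=1, keepdims=True)
    std = seqs.std(axis=1, keepdims=True)
    return (seqs - mean) / (std + eps)

x = np.arange(24, dtype=float).reshape(6, 2, 2)  # T=6, N=2 nodes, C=2 (inflow, outflow)
seqs = slice_history(x, s=3)                     # 4 window starts x 2 nodes = 8 sequences
z = z_normalize(seqs)
```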

Further, the embedding matrix $W^m$ of the delay-aware feature transformation module converts each pattern in the space-time pattern set $\mathcal{P}$ into a memory vector:

$$m_i = p_i W^m$$

S2: the S-step historical space-time data sequence of node n from time slice (t-S+1) to t, denoted $x_{(t-S+1:t),n}$, is mapped by the embedding matrix $W^u$ of the delay-aware feature transformation module to a high-dimensional representation $u_{t,n}$:

$$u_{t,n} = x_{(t-S+1:t),n} W^u$$

Similar space-time patterns may have similar effects on neighboring nodes, especially abnormal patterns such as congestion. The present application therefore compares the historical space-time data sequence of each node with the extracted pattern set and fuses the information of similar patterns into the historical sequence representation of each node. Specifically, the similarity is computed as:

$$w_i = \mathrm{softmax}\left(u_{t,n} \, m_i^{\top}\right)$$

where the softmax is taken over the patterns $i \in [1, N_p]$.

S3: the pattern set is weighted by the similarity vector and summed to obtain the integrated historical sequence representation $r_{t,n}$:

$$r_{t,n} = \sum_{i=1}^{N_p} w_i \,(p_i W^c)$$

where $W^c$ is a learnable parameter matrix. The integrated historical sequence representation $r_{t,n}$ encodes the history of node n from time slice (t-S+1) to t.

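S4: the representations of all N nodes are concatenated and added to the key matrix:

$$\tilde{K}_t^{(S)} = K_t^{(S)} + R_t$$

where $R_t \in \mathbb{R}^{N\times d'}$ is the concatenation of the integrated historical sequence representations $r_{t,n}$ of the N nodes, and d' is the dimension of the key matrix.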

The new key matrix $\tilde{K}_t^{(S)}$ integrates, at time slice t, the historical space-time data of all nodes from time slice (t-S+1) to t. Therefore, when the product of the query matrix and the new key matrix is computed to obtain the spatial dependencies at time slice t, each query also takes into account the recent history of the other nodes. This process explicitly models the time delay of spatial information propagation. The module is not added to the semantic space self-attention module, because the short-term space-time data of distant nodes have little influence on the current node.
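Steps S1 to S4 can be sketched in NumPy as below, assuming univariate histories (C = 1; for C > 1 the S×C window could be flattened) and that the pattern centroids are already available. The function name `delay_aware_keys` and the argument layout are hypothetical; the matrices play the roles of $W^m$, $W^u$ and $W^c$ in the formulas above.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def delay_aware_keys(k_t, x_hist, patterns, w_m, w_u, w_c):
    """Return the modified key matrix K~_t = K_t + R_t.

    k_t:      (N, d')   key matrix of the geospatial attention head at time t
    x_hist:   (N, S)    last S steps of each node, i.e. x_{(t-S+1:t),n}
    patterns: (N_p, S)  short-term pattern centroids p_i
    w_m, w_u, w_c: (S, d') learnable matrices W^m, W^u, W^c
    """
    memory = patterns @ w_m             # m_i = p_i W^m, shape (N_p, d')
    u = x_hist @ w_u                    # u_{t,n} = x W^u, shape (N, d')
    w = softmax(u @ memory.T, axis=-1)  # w_i per node, softmax over the N_p patterns
    r = w @ (patterns @ w_c)            # r_{t,n} = sum_i w_i (p_i W^c), stacked as R_t
    return k_t + r

rng = np.random.default_rng(0)
n, s, n_p, d_prime = 5, 6, 3, 4
k_t = rng.normal(size=(n, d_prime))
x_hist = rng.normal(size=(n, s))
patterns = rng.normal(size=(n_p, s))
w_m, w_u, w_c = (rng.normal(size=(s, d_prime)) for _ in range(3))
k_tilde = delay_aware_keys(k_t, x_hist, patterns, w_m, w_u, w_c)
```

With a single pattern the softmax weight is 1, so every node receives the same offset $p_1 W^c$, which makes a convenient sanity check.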

For temporal self-attention: there are dependencies (e.g., periodicity and trend) between space-time data in different time slices, and these dependencies differ from case to case. The present application therefore uses a temporal self-attention module to discover dynamic temporal patterns. Formally, for node n, the query, key and value matrices are first obtained:

$$Q_n^{(T)} = X_{:,n,:} W_Q^{(T)}, \quad K_n^{(T)} = X_{:,n,:} W_K^{(T)}, \quad V_n^{(T)} = X_{:,n,:} W_V^{(T)}$$

where $W_Q^{(T)}, W_K^{(T)}, W_V^{(T)} \in \mathbb{R}^{d\times d'}$ are learnable parameters and the superscript (T) denotes time. Then, as before, the self-attention operation is applied along the time dimension to obtain the temporal dependencies among all time slices of node n:

$$A_n^{(T)} = \frac{Q_n^{(T)} (K_n^{(T)})^{\top}}{\sqrt{d'}}$$

it can be seen that the temporal self-attention can discover the dynamic temporal patterns of different nodes in the spatiotemporal data. In addition, the self-attention in time has global information modeling capability, and long-distance time dependency relationship among all time slices can be modeled.

In one embodiment, the output of the temporal self-attention module is:

$$\mathrm{TSA}(X_n) = \mathrm{softmax}\left(A_n^{(T)}\right) V_n^{(T)}$$
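A minimal NumPy sketch of one temporal attention head applied to every node at once, assuming $X$ is a `(T, N, d)` array. The source does not mention a causal mask, so none is applied here; the function name is hypothetical. Because each node attends only over its own time slices, changing one node's series leaves the other nodes' outputs unchanged.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(x, w_q, w_k, w_v):
    """Temporal self-attention applied independently to every node.

    x: (T, N, d); w_q, w_k, w_v: (d, d'). Returns (T, N, d').
    """
    xn = np.transpose(x, (1, 0, 2))                           # (N, T, d): one sequence per node
    q, k, v = xn @ w_q, xn @ w_k, xn @ w_v                    # each (N, T, d')
    scores = q @ np.swapaxes(k, 1, 2) / np.sqrt(q.shape[-1])  # A_n^(T), shape (N, T, T)
    out = softmax(scores, axis=-1) @ v                        # (N, T, d')
    return np.transpose(out, (1, 0, 2))                       # back to (T, N, d')

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 5, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 4)) for _ in range(3))
out = temporal_attention(x, w_q, w_k, w_v)
```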

Once the three types of attention are defined, the present application merges the heterogeneous attention into a single multi-head self-attention block to reduce the computational complexity of the model. Specifically, the heterogeneous attention fusion module is placed at the end; it concatenates the outputs of the geographic, semantic and temporal heads and then projects them, so that spatial and temporal information are integrated simultaneously. Formally, this process is expressed as:

$$\mathrm{STAttn} = \left(Z_1^{geo} \,\|\, \cdots \,\|\, Z_{h_{geo}}^{geo} \,\|\, Z_1^{sem} \,\|\, \cdots \,\|\, Z_{h_{sem}}^{sem} \,\|\, Z_1^{t} \,\|\, \cdots \,\|\, Z_{h_t}^{t}\right) W^O$$

where $\|$ denotes concatenation, $Z^{geo}$, $Z^{sem}$ and $Z^{t}$ are the outputs of the geographic, semantic and temporal attention heads, $h_{geo}$, $h_{sem}$ and $h_t$ are the numbers of heads of the three attention modules, and $W^O \in \mathbb{R}^{d\times d}$ is a learnable projection matrix.
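The fusion step amounts to a feature-axis concatenation followed by one matrix product. A minimal NumPy sketch, assuming each head output is a `(T, N, d')` array and the heads are concatenated in the order geographic, semantic, temporal; the name `fuse_heads` is hypothetical.

```python
import numpy as np

def fuse_heads(geo_heads, sem_heads, t_heads, w_o):
    """Concatenate all head outputs along the feature axis, then project with W^O.

    Each head output has shape (T, N, d'); w_o has shape (d, d) with d = d' * total heads.
    """
    z = np.concatenate(list(geo_heads) + list(sem_heads) + list(t_heads), axis=-1)  # (T, N, d)
    return z @ w_o

rng = np.random.default_rng(1)
geo = [rng.normal(size=(12, 5, 8)) for _ in range(4)]
sem = [rng.normal(size=(12, 5, 8)) for _ in range(2)]
tmp = [rng.normal(size=(12, 5, 8)) for _ in range(2)]
out = fuse_heads(geo, sem, tmp, rng.normal(size=(64, 64)))  # d = 8 * 8 = 64
```

With $W^O$ set to the identity, the output is simply the concatenated heads, which shows where each head's features land before mixing.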

In one embodiment, a position-wise fully connected feed-forward network follows the heterogeneous attention fusion module to produce the output $X_o \in \mathbb{R}^{T\times N\times d}$; this feed-forward network includes layer normalization and a residual connection.
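A minimal sketch of the position-wise feed-forward sublayer, assuming pre-layer normalization, a ReLU activation and a layer norm without learnable gain and bias; the source only states that layer normalization and a residual connection are included, so these choices and the function names are assumptions. The same weights are applied at every (t, n) position.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize over the feature axis (no learnable gain/bias in this sketch)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def ffn_block(x, w1, b1, w2, b2):
    """Position-wise feed-forward sublayer with pre-layer-norm and a residual connection.

    x: (T, N, d); w1: (d, d_ff); w2: (d_ff, d).
    """
    hidden = np.maximum(layer_norm(x) @ w1 + b1, 0.0)  # ReLU (assumed activation)
    return x + hidden @ w2 + b2                        # residual keeps shape (T, N, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 5, 16))
w1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
w2, b2 = rng.normal(size=(32, 16)), np.zeros(16)
x_o = ffn_block(x, w1, b1, w2, b2)
```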

In one embodiment, the space-time encoder has multiple layers, and the space-time feature vector is obtained by combining the outputs of all space-time encoder layers through skip connections.

Furthermore, the urban space-time prediction model disclosed by the invention further comprises an output layer, wherein the output layer is used for obtaining predicted urban space-time data according to the space-time feature vector, namely, the input of the output layer is the space-time feature vector obtained by the space-time encoder, and the output is the prediction of the space-time data under a plurality of time steps in the future.

The output layer comprises two fully connected layers: the first performs multi-step prediction, and the second converts the feature dimension into the required output dimension.
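A minimal NumPy sketch of the skip connections and the two-layer output head, assuming the skip connection simply sums the per-layer outputs, the first fully connected layer maps the T_in history steps to T_out future steps, and a ReLU sits between the two layers. All three choices and the function names are assumptions; the source only states that skip connections and two fully connected layers are used.

```python
import numpy as np

def skip_connect(layer_outputs):
    """Combine the outputs of all encoder layers (assumed: elementwise sum)."""
    return np.sum(np.stack(layer_outputs), axis=0)  # (T_in, N, d)

def output_layer(skip, w_time, b_time, w_feat, b_feat):
    """skip: (T_in, N, d); w_time: (T_in, T_out); w_feat: (d, C_out). Returns (T_out, N, C_out)."""
    h = np.einsum("tnd,to->ond", skip, w_time) + b_time[:, None, None]  # multi-step: T_in -> T_out
    h = np.maximum(h, 0.0)                                              # assumed ReLU
    return h @ w_feat + b_feat                                          # features: d -> C_out

rng = np.random.default_rng(0)
layers = [rng.normal(size=(12, 5, 16)) for _ in range(3)]  # outputs of L = 3 encoder layers
y = output_layer(skip_connect(layers),
                 rng.normal(size=(12, 12)), np.zeros(12),  # 12 history steps -> 12 future steps
                 rng.normal(size=(16, 2)), np.zeros(2))    # 16 features -> 2 (inflow, outflow)
```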

In one embodiment, the overall urban space-time prediction model, shown in FIG. 2, predicts future space-time data from the input historical space-time data and the structure of the urban traffic network; structurally, the model comprises a data embedding layer, multiple stacked space-time encoders and an output layer.

The execution steps of the model are as follows:

S1: converting the historical space-time data and the urban traffic road network structure into high-dimensional space-time representation vectors through the data embedding layer;

S2: splitting the high-dimensional space-time representation vector, in proportion to dimension, among the three types of attention heads of the layer-1 space-time encoder;

S3: each attention head mapping its input high-dimensional space-time representation vector into query, key and value matrices (Q, K, V);

S4: feeding the original space-time data and the query, key and value matrices into the delay-aware feature transformation module, feeding the module's output into the geospatial self-attention module, and obtaining short-range geographic neighborhood information in combination with the binary geographic mask matrix;

S5: feeding the query, key and value matrices into the semantic space self-attention module and obtaining long-range semantic neighborhood information in combination with the binary semantic mask matrix;

S6: feeding the query, key and value matrices into the temporal self-attention module and obtaining long-range temporal dependencies;

S7: concatenating and projecting the outputs of the three types of self-attention modules in the heterogeneous attention fusion module to obtain the space-time feature vector;

S8: feeding the space-time feature vector into the position-wise fully connected feed-forward network;

S9: feeding the space-time feature vector output by the layer-1 space-time encoder into the layer-2 space-time encoder as its high-dimensional space-time representation vector, repeating S2 to S8, and so on until the output of the layer-L space-time encoder is obtained;

S10: combining the outputs of the layer-1 to layer-L space-time encoders through skip connections to obtain the final space-time feature vector;

S11: feeding the final space-time feature vector into the output layer to obtain the predicted space-time data.

To verify the effect of the technical solution of the invention, urban space-time prediction experiments were carried out on PeMS04, PeMS07 and PeMS08, which are U.S. traffic datasets with a single flow dimension, and on three datasets with two dimensions (inflow and outflow): the Beijing taxi flow dataset T-Drive, the New York City taxi flow dataset NYCTaxi and the Chicago bike flow dataset BikeCHI. Mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE) are used as evaluation metrics. The experimental results are shown in the following tables, where the method proposed by the invention is named PDFormer and the other rows are existing space-time prediction methods.
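The three evaluation metrics can be written as follows. This is a generic sketch of the standard definitions, not the evaluation code behind the reported tables; skipping near-zero targets in MAPE (via the hypothetical `null_threshold` parameter) is a common convention in traffic benchmarks that is assumed here, and MAPE is reported in percent.

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_pred - y_true)))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mape(y_true, y_pred, null_threshold=0.0):
    """Mean absolute percentage error in percent, skipping targets with |y| <= null_threshold."""
    mask = np.abs(y_true) > null_threshold
    return float(np.mean(np.abs((y_pred[mask] - y_true[mask]) / y_true[mask])) * 100.0)

y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([2.0, 2.0, 2.0])  # errors: +1, 0, -2
```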

Table 1: Comparison results on the U.S. datasets

Table 2: Comparison results on the Beijing, New York and Chicago datasets (inflow)

Table 3: Comparison results on the Beijing, New York and Chicago datasets (outflow)

The experiments show that the proposed model outperforms multiple existing methods and achieves accurate urban space-time prediction.

The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A city space-time prediction method based on space-time attention is characterized in that a city space-time prediction model based on space-time attention is constructed, and city space-time prediction is carried out by utilizing the city space-time prediction model; the city space-time prediction model comprises a space-time encoder which comprises a semantic space self-attention module, a geographic space self-attention module, a time self-attention module and a heterogeneous attention fusion module, and is used for obtaining a space-time characteristic vector according to a high-dimensional space-time representation vector,

2. The method for urban space-time prediction based on space-time attention according to claim 1, wherein the high-dimensional space-time representation vector is obtained by converting historical space-time data and urban traffic road network structures through a data embedding layer.

3. The method according to claim 1, wherein the high-dimensional space-time representation vector is divided according to the number of attention modules and input to the semantic space self-attention module, the geospatial self-attention module and the temporal self-attention module, respectively.

4. The method of claim 1, wherein the semantic spatial self-attention module and the geospatial self-attention module are calculated as follows:

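$$\mathrm{SemSAtt}(X_t) = \mathrm{softmax}\left(\frac{Q_t^{(S)} (K_t^{(S)})^{\top}}{\sqrt{d'}} \odot M_{sem}\right) V_t^{(S)}$$

$$\mathrm{GeoSAtt}(X_t) = \mathrm{softmax}\left(\frac{Q_t^{(S)} (K_t^{(S)})^{\top}}{\sqrt{d'}} \odot M_{geo}\right) V_t^{(S)}$$

where $Q_t^{(S)}$, $K_t^{(S)}$ and $V_t^{(S)}$ are the query, key and value matrices, $\odot$ denotes the Hadamard product, $M_{sem}$ and $M_{geo}$ are the binary semantic and geographic mask matrices, and d' is the dimension of the query, key and value matrices;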

5. The urban space-time prediction method based on space-time attention according to claim 4, wherein, before computation, the key matrix of the geospatial self-attention module is first transformed by a delay-aware feature transformation module into a modified key matrix;

the acquisition steps comprise the following:

where $W^c$ is a learnable parameter matrix;

6. The method of spatiotemporal attention-based urban spatiotemporal prediction of claim 5, wherein the step of acquiring short term spatiotemporal patterns comprises:

slicing the input historical space-time data by utilizing a sliding window to obtain a historical space-time data sequence;

7. The method for urban space-time prediction based on space-time attention according to claim 1, wherein the binary geographic mask matrix is constructed according to the relation between the distance between two nodes and a preset threshold value; the binary semantic mask matrix is constructed according to the following steps,

8. The method of claim 1, wherein the space-time encoder has a plurality of layers, and the space-time feature vector is obtained by combining the outputs of the space-time encoder layers through skip connections.

9. The method of claim 1, wherein the model further comprises an output layer for obtaining predicted urban spatiotemporal data based on the spatiotemporal feature vectors.

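10. The urban space-time prediction method based on space-time attention according to claim 1, wherein the heterogeneous attention fusion module performs concatenation followed by projection, expressed as:

$$\mathrm{STAttn} = \left(Z_1^{geo} \,\|\, \cdots \,\|\, Z_{h_{geo}}^{geo} \,\|\, Z_1^{sem} \,\|\, \cdots \,\|\, Z_{h_{sem}}^{sem} \,\|\, Z_1^{t} \,\|\, \cdots \,\|\, Z_{h_t}^{t}\right) W^O$$

where $\|$ denotes concatenation, $Z^{geo}$, $Z^{sem}$ and $Z^{t}$ are the outputs of the geographic, semantic and temporal attention heads, $h_{geo}$, $h_{sem}$ and $h_t$ are the numbers of heads of the three attention modules, $W^O \in \mathbb{R}^{d\times d}$ is a learnable projection matrix, and d is the dimension of the high-dimensional space-time representation vector.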