CN116682271A - Traffic flow prediction method based on U-shaped multi-scale space-time graph convolutional network - Google Patents


Info

Publication number: CN116682271A
Application number: CN202310674025.9A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: time, space, gating, network, traffic flow
Legal status: Pending
Inventors: 张帅, 余望之, 宋小波, 姚家渭, 张文宇
Current and original assignees: Hangzhou Half Cloud Technology Co. Ltd; Zhejiang University of Finance and Economics
Application filed by Hangzhou Half Cloud Technology Co. Ltd and Zhejiang University of Finance and Economics

Classifications

    • G08G1/065 — Traffic control systems for road vehicles: counting the vehicles in a section of the road or in a parking area
    • G06N3/0464 — Neural networks: convolutional networks [CNN, ConvNet]
    • G06N3/048 — Neural networks: activation functions
    • G06N3/08 — Neural networks: learning methods
    • G08G1/0129 — Traffic data processing for creating historical data or processing based on historical data
    • G08G1/0137 — Measuring and analyzing of parameters relative to traffic conditions for specific applications
    • Y02T10/40 — Engine management systems (climate change mitigation technologies related to transportation)

Abstract

The application discloses a traffic flow prediction method based on a U-shaped multi-scale space-time graph convolutional network comprising a space-time encoder and a space-time decoder. Raw feature data comprising a channel dimension, a node dimension and a time dimension are constructed from the acquired historical traffic flow data of each node in the traffic network over a preset time period. The raw feature data are input into the space-time encoder to extract space-time features, and the extracted features are then input into the space-time decoder, each decoding layer of which is skip-connected to the corresponding encoding layer in the space-time encoder, finally yielding the prediction result. The method comprehensively captures space-time dependencies at different scales and obtains better prediction performance over multiple prediction time-point steps.

Description

Traffic flow prediction method based on U-shaped multi-scale space-time graph convolutional network
Technical Field
The application belongs to the technical field of traffic management, and in particular relates to a traffic flow prediction method based on a U-shaped multi-scale space-time graph convolutional network.
Background
In recent years, intelligent traffic systems (ITS) have become an indispensable component of traffic networks. Accurate traffic flow prediction enables efficient and reliable traffic management and helps travelers plan optimal routes, and is therefore of great significance. However, traffic flow data have complex space-time dependencies, which are typically captured by space-time modules and stacked in the channel dimension. When capturing implicit patterns, implicit fine-grained features and multi-scale space-time dependencies, existing models often ignore channel semantics, making it difficult to represent the space-time dependencies comprehensively. Accurate traffic flow prediction therefore remains a challenge.
Currently, deep-learning-based models are widely used for traffic prediction, and they mainly fall into three branches: models based on convolutional neural networks (CNN), models based on recurrent neural networks (RNN), and models based on graph convolutional networks (GCN). In the spatial dimension, CNN-based models treat the traffic prediction problem as an image-learning problem, thereby capturing the spatial relationships between sensors. RNN-based models capture temporal features iteratively in the time dimension to handle time-series tasks. However, CNN-based models tend to ignore the non-Euclidean structure of traffic networks, while RNN-based models suffer from time-consuming iterative propagation and gradient explosion/vanishing problems when capturing long-distance sequences. GCN-based models treat the traffic network as a graph structure, so the non-Euclidean structure of the traffic network can be captured effectively, and CNNs can still be used to extract temporal features efficiently. Nevertheless, GCN-based models still have some drawbacks. First, although some GCN-based studies have addressed traffic prediction with graph-based approaches, most GCN-based models capture space-time dependencies that are insufficiently multi-scale to fully represent the complex relationships between sensors. Second, although attention mechanisms have been widely used in GCN-based traffic prediction models, they have not been used to capture the implicit patterns and implicit fine-grained features stacked in the channel dimension.
Disclosure of Invention
The application aims to provide a traffic flow prediction method based on a U-shaped multi-scale space-time graph convolutional network, so as to overcome the problems encountered by existing deep learning models in traffic prediction.
In order to achieve the above purpose, the technical scheme of the application is as follows:
a traffic flow prediction method based on a U-shaped multi-scale space-time diagram convolutional network, the U-shaped multi-scale space-time diagram convolutional network comprising a space-time encoder and a space-time decoder, the traffic flow prediction method based on the U-shaped multi-scale space-time diagram convolutional network comprising:
acquiring historical traffic flow data of each node in a traffic network in a preset time period, and constructing original characteristic data comprising channel dimension, node dimension and time dimension;
inputting original characteristic data into a space-time encoder, wherein the space-time encoder comprises M coding layers which are sequentially connected, in each coding layer, capturing time characteristics by a time gating encoder, and further capturing space characteristics by a static image gating image rolling network and a self-adaptive image gating image rolling network sequentially to obtain first characteristics; the other path of the input feature passes through a channel modeling module, the attention scores of different channels are calculated in the channel modeling module, and normalization and random discarding processing are carried out to obtain a second feature; then adding the first feature and the second feature to obtain an output feature of the coding layer;
inputting the output of the space-time encoder into a space-time decoder to obtain a prediction result, wherein the space-time decoder comprises M decoding layers which are sequentially connected, each decoding layer is in jump connection with a corresponding encoding layer in the space-time encoder, in each encoding layer, one path of input characteristics is decoded by a time gating decoder, additional fine granularity time characteristics are captured, and then adaptive spatial characteristic decoding is realized by an adaptive graph gating graph rolling network to capture fine granularity spatial characteristics, so that a third characteristic is obtained; the other path of the input features passes through a channel modeling module to effectively capture the implicit mode and the implicit fine granularity features among channels to obtain fourth features; and then adding the third feature and the fourth feature to obtain the output feature of the decoding layer.
Further, the channel dimension of the features includes four features: the traffic flow, the timestamp, the longitude of the node, and the latitude of the node.
Further, the time-gated encoder adopts the following formulas:
e = Γ(Cconv(X_e) ⊙ g(X_e) + r(X_e))
Cconv(X_e) = X_e W_ec + b_ec
g(X_e) = σ(X_e W_eg + b_eg)
r(X_e) = X_e W_er + b_er
where Cconv(X_e) is a one-dimensional causal convolution, and W_ec and b_ec are its parameters; g(X_e) is the gate controlling the output of the causal convolution, and W_eg and b_eg are the gating parameters; r(X_e) is a residual connection used to avoid network degradation, and W_er and b_er are its parameters; e is the output of the time-gated encoder; Γ is the activation function; σ is the Sigmoid function; ⊙ is the element-wise Hadamard product; and X_e is the input to the time-gated encoder.
Further, the static-graph gated graph convolutional network adopts the following formula:
GGCN-FG(X) = BN((A_FG X W_1) ⊙ σ(A_FG X W_2)) + X
where GGCN-FG(X) is the output of the static-graph gated graph convolutional network, W_1 and W_2 are linear layers, σ and ⊙ are the Sigmoid function and the element-wise Hadamard product used to form the gate, BN is batch normalization, X is the input to the static-graph gated graph convolutional network, and A_FG is the static graph.
Further, the adaptive-graph gated graph convolutional network adopts the following formula:
GGCN-AG(X) = BN((A_AG X W_1) ⊙ σ(A_AG X W_2)) + X
where GGCN-AG(X) is the output of the adaptive-graph gated graph convolutional network, W_1 and W_2 are linear layers, σ and ⊙ are the Sigmoid function and the element-wise Hadamard product used to form the gate, BN is batch normalization, X is the input to the adaptive-graph gated graph convolutional network, and A_AG is the adaptive graph.
Further, the channel modeling module adopts the following formulas:
AS = SoftMax(Q K^T / √C_d) V
CSAM = Dropout(LN(AS))
where AS is the attention score, CSAM is the output of the channel modeling module, Dropout is random dropout, LN is layer normalization, C_d is the channel dimension of the self-attention mechanism, Q is the query vector obtained from the input through the query mapping, K is the key vector obtained from the input through the key mapping, and V is the value vector obtained from the input through the value mapping.
Further, the time-gated decoder adopts the following formulas:
d = Γ(Tconv(X_d) ⊙ g(X_d) + r(X_d))
Tconv(X_d) = X_d W_dc + b_dc
g(X_d) = σ(X_d W_dg + b_dg)
r(X_d) = X_d W_dr + b_dr
where Tconv(X_d) is a one-dimensional transposed convolution, and W_dc and b_dc are its parameters; g(X_d) is the gate controlling the output of the transposed convolution, and W_dg and b_dg are the gating parameters; r(X_d) is a residual connection used to avoid network degradation, and W_dr and b_dr are its parameters; d is the output of the time-gated decoder; Γ is the ReLU activation function; σ is the Sigmoid activation function; ⊙ is the element-wise Hadamard product; and X_d is the input to the time-gated decoder.
According to the traffic flow prediction method based on the U-shaped multi-scale space-time graph convolutional network provided by the application, the novel time encoder-decoder module with causal convolution and transposed convolution effectively alleviates the gradient explosion/vanishing problem caused by capturing long-distance sequences during encoding and decoding. A new channel self-attention mechanism effectively captures the implicit patterns and implicit fine-grained features among channels, strengthening the channel semantics and the capability to represent space-time dependencies. Through the fully convolutional, skip-connected U-shaped multi-scale space-time graph convolutional network, space-time dependencies at different scales are captured comprehensively, and good prediction performance is obtained over multiple prediction time-point steps.
Drawings
FIG. 1 is a schematic diagram of the U-shaped multi-scale space-time graph convolutional network model of the present application.
Fig. 2 is a flow chart of the traffic flow prediction method based on the U-shaped multi-scale space-time graph convolutional network.
Fig. 3 is a schematic diagram of an encoding-layer structure of the space-time encoder according to an embodiment of the present application.
FIG. 4 is a schematic diagram of a time-gated encoder according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a static-graph gated graph convolutional network according to an embodiment of the present application.
Fig. 6 is a schematic diagram of an adaptive-graph gated graph convolutional network according to an embodiment of the present application.
FIG. 7 is a schematic diagram of a channel modeling module according to an embodiment of the present application.
Fig. 8 is a schematic diagram of a decoding layer structure of a space-time decoder according to an embodiment of the present application.
FIG. 9 is a schematic diagram of a time-gated decoder according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The traffic network G = (V, E, A) is a directed or undirected graph representing the spatial relationships between different nodes, where V represents the N nodes observed on traffic paths, E represents the set of edges describing the connectivity between nodes, and A ∈ R^(N × N) is a weighted adjacency matrix representing the degree of connectivity between nodes. Traffic flow refers to the flow detected by the sensor at a node per unit of time. The purpose of traffic flow prediction is to find a function that uses T historical time-point steps of the space-time signal and the traffic network graph G to predict the space-time signal at T′ future time-point steps. This mapping relationship can be expressed as formula (1):
[Y_(t+1), Y_(t+2), ..., Y_(t+T′)] = f([X_(t−T+1), X_(t−T+2), ..., X_t]; G) (1)
where T and T′ are the historical and future time-point steps respectively, X ∈ R^(N × D) and Y ∈ R^(N × D) are the historical and future space-time signals respectively, and D is the number of input features of the model. The application considers four features, namely the traffic flow sequence, the timestamp, the longitude of the sensor and the latitude of the sensor, where the timestamp is a temporal feature with periodic attributes (daily, monthly and yearly periods) used to enhance the capability to represent the periodic attributes of the traffic flow.
The application provides a U-shaped multi-scale space-time graph convolutional network model for traffic prediction. As shown in FIG. 1, the proposed model (GSTC-Unet) comprises a space-time encoder and a space-time decoder with skip connections between them, so that multi-scale space-time dependencies can be captured effectively.
The space-time encoder comprises a time-gated encoder (TGE), a spatial modeling module and a channel modeling module; the space-time decoder comprises a time-gated decoder (TGD), an adaptive-graph gated graph convolutional network (GGCN-AG) and a channel modeling module.
The application adopts the time-gated encoder (TGE) and the time-gated decoder (TGD) to extract temporal features efficiently and to alleviate the gradient explosion/vanishing problem arising when capturing long-distance sequences. The spatial modeling module captures spatial features using a static-graph gated graph convolutional network (GGCN-FG) and an adaptive-graph gated graph convolutional network (GGCN-AG). The channel modeling module captures the implicit patterns and implicit fine-grained features among channels using a channel self-attention mechanism (CSAM), thereby strengthening the capability to represent the space-time dependencies of the traffic network.
In one embodiment, as shown in fig. 2, the traffic flow prediction method based on the U-shaped multi-scale space-time graph convolutional network provided by the application comprises the following steps:
Step S1, acquiring historical traffic flow data of each node in the traffic network over a preset time period, and constructing raw feature data comprising a channel dimension, a node dimension and a time dimension.
The application predicts traffic flow at T′ future time points using traffic flow at T historical time points and the traffic network graph G. For the proposed U-shaped multi-scale space-time graph convolutional network model, the raw input feature data of the network model are constructed first.
The raw feature data constructed in this embodiment are denoted X ∈ R^(C_ie × N × T_ie), where C_ie is the channel dimension, N is the number of nodes of the traffic network, and T_ie is the input time dimension. The channel dimension includes features such as the traffic flow, the timestamp, the longitude of the node and the latitude of the node.
Step S2, inputting the raw feature data into the space-time encoder, which comprises M sequentially connected encoding layers. In each encoding layer, one path of the input features is passed through the time-gated encoder to capture temporal features, and then through the static-graph gated graph convolutional network and the adaptive-graph gated graph convolutional network in sequence to further capture spatial features, yielding a first feature; the other path of the input features is passed through the channel modeling module, in which the attention scores of the different channels are calculated and normalization and random-dropout processing are applied, yielding a second feature; the first feature and the second feature are then added to obtain the output features of the encoding layer.
In a specific embodiment, the space-time encoder comprises 4 sequentially connected encoding layers. As shown in fig. 3, each encoding layer comprises two branches, and the output features of the two branches are added to obtain the output features of the encoding layer.
Specifically, the first branch of the encoding layer comprises the time-gated encoder TGE, the static-graph gated graph convolutional network GGCN-FG and the adaptive-graph gated graph convolutional network GGCN-AG; the second branch is the channel modeling module CSAM.
In a specific embodiment, the time-gated encoder (TGE) is shown in fig. 4. The TGE employs one-dimensional causal convolution to fully capture the temporal features of the traffic network and stacks them in the channel dimension, thereby downsampling the temporal features. In addition, the TGE uses a gating module that effectively captures time-dependent outputs to adaptively control the effective output. The TGE thus constructs the causal relationships of time-series prediction.
Given an input X_e ∈ R^(C_ie × N × T_ie), where C_ie is the input channel dimension, N is the number of nodes and T_ie is the input time dimension, the TGE can be expressed as follows:
e = Γ(Cconv(X_e) ⊙ g(X_e) + r(X_e)) (2)
Cconv(X_e) = X_e W_ec + b_ec (3)
g(X_e) = σ(X_e W_eg + b_eg) (4)
r(X_e) = X_e W_er + b_er (5)
where Cconv(X_e) is a one-dimensional causal convolution used to encode the temporal features, and W_ec and b_ec are its parameters; g(X_e) is the gate controlling the output of the causal convolution, and W_eg and b_eg are the gating parameters; r(X_e) is a residual connection used to avoid network degradation, and W_er and b_er are its parameters; e ∈ R^(C_oe × N × T_oe) is the output of the TGE, where C_oe is the output channel dimension and T_oe the output time dimension; Γ is the activation function; ⊙ is the element-wise Hadamard product; and the gate adaptively controls the output.
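As an illustration, the gated causal convolution underlying the TGE can be sketched in numpy for a single node's sequence (the 2-D time-by-channel layout, the parameter shapes and the choice Γ = ReLU are assumptions of this sketch, not taken from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def causal_conv1d(x, w, b):
    """1-D causal convolution along the time axis.

    x: (T, C_in) sequence; w: (K, C_in, C_out) kernel; b: (C_out,).
    Left-pads K-1 zeros so the output at step t depends only on steps <= t.
    """
    K, C_in, C_out = w.shape
    T = x.shape[0]
    xp = np.vstack([np.zeros((K - 1, C_in)), x])   # causal left padding
    out = np.zeros((T, C_out))
    for t in range(T):
        # window covers original steps t-K+1 .. t
        out[t] = np.tensordot(xp[t:t + K], w, axes=([0, 1], [0, 1])) + b
    return out

def time_gated_encoder(x, w_c, b_c, w_g, b_g, w_r, b_r):
    """TGE sketch: Gamma(Cconv(x) * sigmoid-gate(x) + residual(x))."""
    conv = causal_conv1d(x, w_c, b_c)              # Cconv branch, eq. (3)
    gate = sigmoid(causal_conv1d(x, w_g, b_g))     # gating branch, eq. (4)
    res = x @ w_r + b_r                            # residual branch, eq. (5)
    return np.maximum(conv * gate + res, 0.0)      # Gamma taken as ReLU
```

Because the convolution is causal, perturbing the input at some step t leaves all outputs before t unchanged, which is the causal relationship the TGE constructs.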
In a specific embodiment, the static-graph gated graph convolutional network (GGCN-FG) is shown in FIG. 5, where the static graph is derived from the distances between different nodes. Generally, different nodes have different geographical location distributions, while neighboring nodes have higher similarity. The application therefore combines the longitude and latitude positions of the nodes with a threshold-based Gaussian kernel, calculating the Euclidean distances between nodes in the traffic network to represent the static graph. The static graph (FG) is given by formula (6):
A_FG[i][j] = exp(−d(i,j)² / σ²) if exp(−d(i,j)² / σ²) ≥ ε, and 0 otherwise, with d(i,j) = √((X_i − X_j)² + (Y_i − Y_j)²) (6)
where X_i and Y_i are the latitude and longitude of node i, X_j and Y_j are the latitude and longitude of node j, σ is the standard deviation of the Euclidean distances, and ε is the threshold of the Gaussian kernel.
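A minimal numpy sketch of building the static graph from node coordinates with a thresholded Gaussian kernel, as described above (the parameter values are illustrative, and σ is passed in directly rather than computed from the distances):

```python
import numpy as np

def static_graph(coords, sigma, eps):
    """Static graph A_FG from node coordinates via a thresholded Gaussian kernel.

    coords: (N, 2) array of (latitude, longitude) per node.
    Entries exp(-d_ij^2 / sigma^2) below the threshold eps are zeroed;
    in the patent sigma is the standard deviation of the pairwise distances.
    """
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 2) pairwise offsets
    d2 = (diff ** 2).sum(axis=-1)                    # squared Euclidean distances
    A = np.exp(-d2 / sigma ** 2)
    A[A < eps] = 0.0                                 # Gaussian-kernel threshold
    return A
```

The result is symmetric with unit diagonal, and node pairs farther apart than the threshold allows contribute no edge.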
Assume A_FG ∈ R^(N × N) represents the generated static graph; X ∈ R^(C_in × N × T_in) represents the input data, where C_in is the input channel dimension and T_in the input time dimension; and W ∈ R^(C_in × C_out) represents a linear layer, where C_in is the input channel dimension and C_out the output channel dimension. The GCN can therefore be defined as formula (7):
g = A_FG X W (7)
As shown in the GGCN-FG of fig. 5, on the basis of formula (7), the output of the GCN is further adaptively controlled through gating, more statistical information is obtained through a regularization method, and network degradation is avoided through a residual connection. Based on FG, the GGCN-FG expressed by formula (8) can be obtained:
GGCN-FG(X) = BN((A_FG X W_1) ⊙ σ(A_FG X W_2)) + X (8)
where GGCN-FG(X) ∈ R^(C_out × N × T_out) is the GGCN output, with C_out the output channel dimension and T_out the time dimension; W_1 and W_2 are linear layers; σ and ⊙ are the Sigmoid function and the element-wise Hadamard product used to form the gate; and BN is batch normalization.
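A numpy sketch of the gated graph convolution described above: the GCN output g = A X W is gated by a Sigmoid branch, batch-normalized, and given a residual connection. The exact composition order and the 2-D node-by-channel layout are assumptions of this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_norm(x, eps=1e-5):
    # per-feature standardization, standing in for a trained BatchNorm layer
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def gated_gcn(X, A, W1, W2):
    """Gated graph convolution sketch: gate the GCN output, then BN + residual.

    X: (N, C) node features; A: (N, N) adjacency (static or adaptive);
    W1, W2: (C, C) linear layers.
    """
    g1 = A @ X @ W1                    # GCN branch, g = A X W
    g2 = sigmoid(A @ X @ W2)           # Sigmoid gating branch
    return batch_norm(g1 * g2) + X     # gated output, BN, residual connection
```

The same function serves for both GGCN-FG and GGCN-AG, with A set to the static graph A_FG or the adaptive graph A_AG respectively.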
In a specific embodiment, the adaptive-graph gated graph convolutional network (GGCN-AG) is shown in FIG. 6. In addition to the static graph generated from the geographical location distribution, the adaptive graph is also very important, as it can represent the hidden spatial dependencies between nodes. The adaptive graph (AG) is independent of prior knowledge and can be defined according to formula (9):
A_AG = SoftMax(ReLU(E_1 E_2^T)) (9)
where E_1, E_2 ∈ R^(N × C) are randomly initialized vectors that automatically capture the hidden spatial dependencies between nodes during model training, C is the length of the vectors, and the ReLU and SoftMax functions are used to improve the speed and effect of model training.
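The adaptive graph can be sketched directly in numpy (here the embeddings are random stand-ins for what training would learn):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_graph(E1, E2):
    """Adaptive graph A_AG = SoftMax(ReLU(E1 @ E2.T)).

    E1, E2: (N, C) randomly initialized node embeddings (learned during
    training); each row of the result is normalized to sum to 1.
    """
    return softmax(np.maximum(E1 @ E2.T, 0.0), axis=1)
```

The SoftMax row-normalizes the hidden adjacency, so every node distributes a unit of attention over all nodes.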
As shown in the GGCN-AG of fig. 6, on the basis of formula (7), the output of the GCN is further adaptively controlled through gating, more statistical information is obtained through a regularization method, and network degradation is avoided through a residual connection. Based on the adaptive graph AG, the GGCN-AG expressed by formula (10) can be obtained:
GGCN-AG(X) = BN((A_AG X W_1) ⊙ σ(A_AG X W_2)) + X (10)
where GGCN-AG(X) ∈ R^(C_out × N × T_out) is the GGCN output, with C_out the output channel dimension and T_out the time dimension; W_1 and W_2 are linear layers; σ and ⊙ are the Sigmoid function and the element-wise Hadamard product used to form the gate; and BN is batch normalization.
In a specific embodiment, the channel modeling module is shown in FIG. 7. When capturing the space-time dependencies of traffic flow data, different channels represent different space-time semantics, so channel conversion is often required. To improve the capability to represent space-time dependencies among channels and to capture the implicit patterns and implicit fine-grained features among channels, the application proposes a novel CSAM to realize semantic interaction among channels. As shown in the channel modeling module of fig. 7, the CSAM employs a self-attention mechanism to calculate the attention score (AS) of the different channels, as shown in formula (14). The CSAM can comprehensively represent the implicit patterns and implicit fine-grained features of the space-time dependencies stacked in the channel dimension. In addition, this embodiment also employs layer normalization (LN) and random dropout (Dropout) to accelerate network convergence and prevent overfitting. The CSAM can thus be expressed as formula (15):
AS = SoftMax(Q K^T / √C_d) V (14)
CSAM = Dropout(LN(AS)) (15)
where Q ∈ R^(C_in × C_d), K ∈ R^(C_in × C_d) and V ∈ R^(C_in × T_in) represent the query, key and value vectors, obtained from the input through the query, key and value mappings respectively, and serve as the inputs of the self-attention mechanism; C_in is the input channel dimension, C_d is the channel dimension of the self-attention mechanism, T_in is the input time dimension, and SoftMax is the activation function; CSAM ∈ R^(C_out × T_out) is the output of the CSAM, with C_out the output channel dimension and T_out the output time dimension.
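A numpy sketch of the channel self-attention for a single node, with channels as rows and time as columns; the mapping shapes and the √C_d scaling are assumptions of this sketch, and Dropout is omitted since it is inactive at inference:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def channel_self_attention(X, Wq, Wk, Wv):
    """CSAM sketch: self-attention over the channel axis.

    X: (C_in, T) features with channels as rows; Wq, Wk: (T, C_d) query/key
    mappings; Wv: (T, T) value mapping (shapes are illustrative assumptions).
    """
    Q = X @ Wq                                        # (C_in, C_d) queries
    K = X @ Wk                                        # (C_in, C_d) keys
    V = X @ Wv                                        # (C_in, T) values
    AS = softmax(Q @ K.T / np.sqrt(Q.shape[1]), axis=1) @ V
    return layer_norm(AS)                             # LN of the attention score
```

Each output channel is thus a weighted mixture of all channels, which is how the channel semantics interact.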
Step S3, inputting the output of the space-time encoder into the space-time decoder to obtain the prediction result. The space-time decoder comprises M sequentially connected decoding layers, each skip-connected to the corresponding encoding layer in the space-time encoder. In each decoding layer, one path of the input features is decoded by the time-gated decoder to capture additional fine-grained temporal features, and adaptive spatial decoding is then performed by the adaptive-graph gated graph convolutional network to capture fine-grained spatial features, yielding a third feature; the other path of the input features is passed through the channel modeling module to effectively capture the implicit patterns and implicit fine-grained features among channels, yielding a fourth feature; the third feature and the fourth feature are then added to obtain the output features of the decoding layer.
In traffic flow prediction, the extraction of time-dependent relationships is indispensable, especially when dealing with long-distance sequences, which are prone to gradient explosion/vanishing problems during training. Accordingly, the application proposes a novel time encoder-decoder module comprising the TGE and the TGD, thereby effectively extracting the temporal features of traffic flow data.
The space-time decoder of this embodiment also comprises four decoding layers. As shown in fig. 8, each decoding layer likewise has two branches, and the output features of the two branches are added to obtain the output features of the decoding layer.
Specifically, the first branch of the decoding layer comprises the time-gated decoder TGD and the adaptive-graph gated graph convolutional network GGCN-AG; the second branch is the channel modeling module CSAM.
In a specific embodiment, the time-gated decoder (TGD) is shown in fig. 9. To upsample the temporal features captured by the TGE and restore them to the original time scale, the TGD employs one-dimensional transposed convolution to capture additional temporal features and map them to a higher-dimensional space, thereby alleviating the gradient explosion/vanishing problem caused by the conventional zero-filling method. The TGD also employs gating to adaptively control the model output. The TGD thus alleviates the problem of under-utilization of features.
In this embodiment, each decoding layer of the space-time decoder is skip-connected to the corresponding encoding layer of the space-time encoder, so that the input of the first decoding layer is obtained by splicing the output of the space-time encoder with the output of the fourth encoding layer; the input of the second decoding layer is obtained by splicing the output of the first decoding layer with the output of the third encoding layer, and so on.
For any decoding layer, the given input features X_d ∈ R^(C_id × N × T_id) and the output features of the corresponding encoding layer X_e ∈ R^(C_ie × N × T_id) are spliced to obtain X ∈ R^((C_id + C_ie) × N × T_id), where C_id is the channel dimension of the input features, C_ie is the channel dimension of the encoding-layer output features, and T_id is the input time dimension.
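The splicing step is simply a concatenation along the channel axis, illustrated in numpy (the shapes below are illustrative, not taken from the patent):

```python
import numpy as np

# Skip-connection splice: concatenate the decoder-path features with the
# matching encoding-layer output along the channel axis of the
# (channel, node, time) layout.
X_d = np.zeros((32, 170, 3))   # decoder-path input     (C_id, N, T_id)
X_e = np.zeros((32, 170, 3))   # encoding-layer output  (C_ie, N, T_id)
X_cat = np.concatenate([X_d, X_e], axis=0)   # (C_id + C_ie, N, T_id)
```

Only the channel dimension grows; the node and time dimensions must already agree, which is why each decoding layer pairs with the encoding layer at the same scale.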
The time-gated decoder TGD can be defined as follows:
d = Γ(Tconv(X_d) ⊙ g(X_d) + r(X_d)) (16)
Tconv(X_d) = X_d W_dc + b_dc (17)
g(X_d) = σ(X_d W_dg + b_dg) (18)
r(X_d) = X_d W_dr + b_dr (19)
where Tconv(X_d) is a one-dimensional transposed convolution used to decode the temporal features, and W_dc and b_dc are its parameters; g(X_d) is the gate controlling the output of the transposed convolution, and W_dg and b_dg are the gating parameters; r(X_d) is a residual connection used to avoid network degradation, and W_dr and b_dr are its parameters; d ∈ R^(C_od × N × T_od) is the output of the TGD, where C_od is the output channel dimension and T_od the output time dimension; Γ is the ReLU activation function; σ is the Sigmoid activation function; and ⊙ is the element-wise Hadamard product.
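A numpy sketch of the transposed-convolution upsampling behind the TGD; as an assumption of this sketch, all three paths (Tconv, gate and residual) are implemented as transposed convolutions so that their upsampled output shapes agree:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def transposed_conv1d(x, w, b, stride=2):
    """1-D transposed convolution along time, upsampling T to (T-1)*stride + K.

    x: (T, C_in); w: (K, C_in, C_out); b: (C_out,).
    """
    T = x.shape[0]
    K, C_in, C_out = w.shape
    T_out = (T - 1) * stride + K
    out = np.tile(b, (T_out, 1)).astype(float)
    for t in range(T):
        for k in range(K):
            out[t * stride + k] += x[t] @ w[k]   # scatter each input step
    return out

def time_gated_decoder(x, w_c, b_c, w_g, b_g, w_r, b_r, stride=2):
    """TGD sketch: ReLU(Tconv(x) * sigmoid-gate(x) + residual(x))."""
    conv = transposed_conv1d(x, w_c, b_c, stride)            # eq. (17)
    gate = sigmoid(transposed_conv1d(x, w_g, b_g, stride))   # eq. (18)
    res = transposed_conv1d(x, w_r, b_r, stride)             # eq. (19)
    return np.maximum(conv * gate + res, 0.0)                # Gamma = ReLU
```

With stride 2 the time dimension roughly doubles per layer, mirroring the factor-2 compression on the encoder side.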
The application also verifies the model through experiments, testing the performance of the proposed model on two datasets, PeMSD4 and PeMSD8, both from the Caltrans Performance Measurement System (PeMS). The PeMSD4 dataset has 180 sensors located in San Francisco. The PeMSD8 dataset has 175 sensors located in San Bernardino. Both datasets contain traffic flow data from June 1, 2017 to August 1, 2017, with 5 minutes as the time-point step. Following the study by Li et al. (2017), the traffic flow data were normalized with the Z-Score method, and the data were partitioned into training, validation and test sets at a 6:2:2 ratio.
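The normalization and 6:2:2 split can be sketched as follows (computing the Z-Score statistics on the training portion only is an assumption of this sketch, following common practice):

```python
import numpy as np

def zscore_split(data, ratios=(0.6, 0.2, 0.2)):
    """Z-Score normalize, then split 6:2:2 along the first (time) axis."""
    n = len(data)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    mean = data[:n_train].mean()        # statistics from the training portion
    std = data[:n_train].std()
    norm = (data - mean) / std
    return norm[:n_train], norm[n_train:n_train + n_val], norm[n_train + n_val:]
```

The training portion then has zero mean by construction, while the validation and test portions keep whatever offset they have relative to the training statistics.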
Traffic flow over 12 historical time steps (60 minutes) was used to predict traffic flow over the next 12 consecutive time steps, and three statistical metrics were used to evaluate model performance: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). Let y_p and ŷ_p (1 ≤ p ≤ P) denote the p-th true value and predicted value, respectively, and let P be the size of the test sample; the three metrics are computed as follows:

MAE = (1/P) Σ_{p=1}^{P} |y_p − ŷ_p|
RMSE = sqrt((1/P) Σ_{p=1}^{P} (y_p − ŷ_p)²)
MAPE = (100%/P) Σ_{p=1}^{P} |y_p − ŷ_p| / y_p
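These three standard metrics can be computed directly from the prediction and ground-truth arrays; a minimal sketch:

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root Mean Square Error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y, y_hat):
    """Mean Absolute Percentage Error, in percent.
    Assumes no zero ground-truth values."""
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)
```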
adam was chosen as the optimizer for the experiment, with an initial learning rate of 0.001, training round number of 150, and batch size of 64. In TGE and TGD, one-dimensional causal convolution, one-dimensional transpose convolution, gating, and residual concatenation all use a convolution kernel with a filter size of 3 and a time point step of 1 as the convolution kernel. In the spatial modeling module, σ and ε are both set to 0.5, E 1 and E2 The length of (2) is set to 10. In the channel modeling module, the channel dimension C of the self-attention mechanism d Let 8 be the number. The number of coding layers and decoding layers/is set to 4. Input time dimension T in Let 12 be the output time dimension T out Set to 12.t represents the time feature scale of each compression of TGE, set to 2. Input channel dimension C in Channel dimension C of 32, output out 32.
Table 1 lists the average prediction results of the proposed GSTC-Unet and the other baseline models for 3 time steps (15 minutes), 6 time steps (30 minutes) and 12 time steps (60 minutes) into the future on the PeMSD4 and PeMSD8 datasets. The comparative experimental results show that:
(1) The deep learning models (LSTM, STGCN, DCRNN, ASTGNN, MTGNN, Graph WaveNet and GSTC-Unet) generally outperform the traditional statistical model (HA). It follows that complex, self-learning deep models are better suited to modeling traffic flow data with spatiotemporal relationships.
(2) Among the deep learning models, the GCN-based models (STGCN, DCRNN, ASTGNN, MTGNN, Graph WaveNet, and GSTC-Unet) perform better than the model without a GCN structure (LSTM), which shows that GCN-based models are better at handling traffic flow data with non-Euclidean features.
(3) Compared with STGCN, DCRNN, ASTGNN, MTGNN and Graph WaveNet, GSTC-Unet performs better because the U-MSSGCN it adopts can effectively capture the multi-scale spatiotemporal dependencies of traffic flow data.
(4) Compared to the RNN-based models (i.e., LSTM and DCRNN), the CNN-based models (i.e., STGCN, MTGNN, Graph WaveNet and GSTC-Unet) perform better, especially for predictions 30 and 60 minutes into the future, because the exploding/vanishing-gradient problem of RNNs limits their ability to capture long-range sequences. Furthermore, GSTC-Unet performs best among all CNN-based models, because its temporal encoder-decoder module captures temporal features with a CNN-based TGE, which constructs causal relationships for time-series prediction, and a TGD, which alleviates the under-utilization of features. GSTC-Unet can therefore effectively avoid the exploding/vanishing-gradient problem when capturing long-range sequences, achieving excellent performance in long-range prediction.
(5) ASTGNN adopts a trend-aware self-attention mechanism in the spatiotemporal dimensions and can adaptively learn the dynamic spatiotemporal characteristics of traffic flow data, but it ignores features in the channel dimension. Compared with ASTGNN, GSTC-Unet adopts the novel CSAM in the channel dimension, which captures the implicit patterns and implicit fine-grained features between channels, strengthening the model's representation of spatiotemporal dependencies; GSTC-Unet therefore performs better than ASTGNN.
(6) In summary, the GSTC-Unet proposed by the application obtains the best prediction results on most metrics, especially for long-range prediction 30 and 60 minutes into the future.
Table 1 Results of comparative experiments of different models on the two datasets
The GSTC-Unet proposed by the application can comprehensively capture long-range sequences, implicit patterns, implicit fine-grained features, and multi-scale spatiotemporal dependencies, thereby achieving accurate traffic flow prediction. First, the application proposes a novel temporal encoder-decoder based on causal convolution and transposed convolution, which constructs causal relationships for time-series prediction and alleviates the under-utilization of features, benefiting long-range sequence prediction. Second, the application proposes a new channel self-attention mechanism that enhances channel semantics, effectively captures the implicit patterns and implicit fine-grained features between channels, and strengthens the model's representation of spatiotemporal dependencies. Finally, the application proposes a novel U-shaped multi-scale spatiotemporal graph convolutional network with full convolution and skip connections, which comprehensively captures spatiotemporal dependencies at different scales. Extensive experimental results show that GSTC-Unet is significantly superior in traffic flow prediction.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (7)

1. The traffic flow prediction method based on the U-shaped multi-scale space-time diagram convolution network is characterized in that the U-shaped multi-scale space-time diagram convolution network comprises a space-time encoder and a space-time decoder, and the traffic flow prediction method based on the U-shaped multi-scale space-time diagram convolution network comprises the following steps:
acquiring historical traffic flow data of each node in a traffic network in a preset time period, and constructing original characteristic data comprising channel dimension, node dimension and time dimension;
inputting the original characteristic data into the space-time encoder, wherein the space-time encoder comprises M sequentially connected encoding layers; in each encoding layer, one path of the input features has its temporal features captured by a time gating encoder, and then its spatial features captured sequentially by a static graph gating graph convolutional network and an adaptive graph gating graph convolutional network, obtaining a first feature; the other path of the input features passes through a channel modeling module, in which the attention scores of different channels are calculated and normalization and random-discarding processing are performed, obtaining a second feature; the first feature and the second feature are then added to obtain the output feature of the encoding layer;
inputting the output of the space-time encoder into the space-time decoder to obtain a prediction result, wherein the space-time decoder comprises M sequentially connected decoding layers, each decoding layer being in skip connection with the corresponding encoding layer of the space-time encoder; in each decoding layer, one path of the input features is decoded by a time gating decoder to capture additional fine-grained temporal features, and adaptive spatial feature decoding is then performed by an adaptive graph gating graph convolutional network to capture fine-grained spatial features, obtaining a third feature; the other path of the input features passes through a channel modeling module to effectively capture the implicit patterns and implicit fine-grained features between channels, obtaining a fourth feature; the third feature and the fourth feature are then added to obtain the output feature of the decoding layer.
2. The traffic flow prediction method based on the U-shaped multi-scale space-time diagram convolutional network according to claim 1, wherein the channel dimension of the features comprises four characteristics: traffic flow, time stamp, node longitude, and node latitude.
3. The traffic flow prediction method based on the U-shaped multi-scale space-time diagram convolutional network according to claim 1, wherein the time gating encoder adopts the following formula:
e = Γ(Cconv(X_e) ⊙ g(X_e) + r(X_e))
Cconv(X_e) = X_e W_ec + b_ec
g(X_e) = σ(X_e W_eg + b_eg)
r(X_e) = X_e W_er + b_er
wherein Cconv(X_e) is a one-dimensional causal convolution, and W_ec and b_ec are its parameters; g(X_e) is the gating that controls the output of the one-dimensional causal convolution, with gating parameters W_eg and b_eg; r(X_e) is a residual connection used to avoid network degradation, with parameters W_er and b_er; e is the output of the time gating encoder; Γ is the ReLU activation function; σ is the Sigmoid activation function; ⊙ denotes the element-wise Hadamard product; and X_e is the input to the time gating encoder.
4. The traffic flow prediction method based on the U-shaped multi-scale space-time diagram convolutional network according to claim 1, wherein the static graph gating graph convolutional network is expressed by the following formula:
wherein the left-hand side is the output of the static graph gating graph convolutional network; W_1 and W_2 are linear layers; σ and ⊙ are the Sigmoid function and the element-wise Hadamard product used to form the gating; BN is batch normalization; X is the input to the static graph gating graph convolutional network; and A_FG is the static graph.
5. The traffic flow prediction method based on the U-shaped multi-scale space-time diagram convolutional network according to claim 1, wherein the adaptive graph gating graph convolutional network is expressed by the following formula:
wherein the left-hand side is the output of the adaptive graph gating graph convolutional network; W_1 and W_2 are linear layers; σ and ⊙ are the Sigmoid function and the element-wise Hadamard product used to form the gating; BN is batch normalization; X is the input to the adaptive graph gating graph convolutional network; and A_AG is the adaptive graph.
6. The traffic flow prediction method based on the U-shaped multi-scale space-time diagram convolutional network according to claim 1, wherein the channel modeling module is expressed by the following formula:
CSAM=Dropout(LN(AS))
wherein AS is the attention score; CSAM is the output of the channel modeling module; Dropout is random discarding; LN is normalization; Q is the query vector obtained from the input through a query mapping; K is the key vector obtained from the input through a key mapping; and V is the value vector obtained from the input through a value mapping.
7. The traffic flow prediction method based on the U-shaped multi-scale space-time diagram convolutional network according to claim 1, wherein the time gating decoder adopts the following formula:
d = Γ(Tconv(X_d) ⊙ g(X_d) + r(X_d))
Tconv(X_d) = X_d W_dc + b_dc
g(X_d) = σ(X_d W_dg + b_dg)
r(X_d) = X_d W_dr + b_dr
wherein Tconv(X_d) is a one-dimensional transposed convolution, and W_dc and b_dc are its parameters; g(X_d) is the gating that controls the output of the one-dimensional transposed convolution, with gating parameters W_dg and b_dg; r(X_d) is a residual connection used to avoid network degradation, with parameters W_dr and b_dr; d is the output of the time gating decoder; Γ is the ReLU activation function; σ is the Sigmoid activation function; ⊙ denotes the element-wise Hadamard product; and X_d is the input to the time gating decoder.
CN202310674025.9A 2023-06-07 2023-06-07 Traffic flow prediction method based on U-shaped multi-scale space-time diagram convolutional network Pending CN116682271A (en)

Publications (1)

CN116682271A, published 2023-09-01
