CN113487856B

CN113487856B - Traffic flow combination prediction model based on graph convolution network and attention mechanism

Info

Publication number: CN113487856B
Application number: CN202110621902.7A
Authority: CN
Inventors: 张红; 陈林龙; 曹洁; 阚苏南; 赵天信
Original assignee: Lanzhou University of Technology
Current assignee: Lanzhou University of Technology
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2022-10-14
Anticipated expiration: 2041-06-04
Also published as: CN113487856A

Abstract

The invention relates to a traffic flow combination prediction model based on a graph convolution network and an attention mechanism. The model comprises three parts: a graph convolution network (GCN), a gated recurrent unit (GRU) and a soft attention mechanism (Soft Attention). The model can directly process traffic flow data on the original traffic network and effectively capture the spatio-temporal characteristics of traffic flow. It uses the GCN to capture the spatial correlation of traffic flow on the network and automatically distinguishes the importance of each traffic flow sequence to the final prediction performance, so as to improve prediction accuracy.

Description

Traffic flow combination prediction model based on graph convolution network and attention mechanism

Technical Field

The invention relates to the technical field of traffic flow prediction, in particular to a traffic flow combination prediction model based on a graph convolution network and an attention mechanism.

Background

Traffic flow prediction is an important component of the Intelligent Transportation System (ITS). It not only gives traffic managers a scientific basis for anticipating congestion and restricting vehicle movement, but also suggests suitable travel routes to urban travelers and improves travel efficiency. Traffic prediction is the process of analyzing urban road traffic conditions, including flow, speed and density, mining traffic patterns, and predicting road traffic trends. However, because traffic flow has complex spatio-temporal dependencies and is influenced by external factors such as the road environment, accurate and efficient traffic flow prediction has always been a difficult task.

Up to now, various methods have been proposed to predict traffic flow, and related research methods may be classified into a method based on a conventional statistical theory and a machine learning method based on an intelligent calculation. Firstly, a method based on the traditional statistical theory mainly forms traffic flow data into a single time sequence and converts a traffic flow prediction problem into a time sequence prediction problem. In fact, traffic flow data is affected by many factors, it is difficult to obtain an accurate traffic flow prediction model, and existing models cannot accurately describe complex traffic flow data changes in a real-world environment. Secondly, machine learning methods based on intelligent computing are increasingly taking an important position in traffic flow prediction tasks.

In recent years, due to rapid development of deep learning, more and more researchers use deep neural networks to predict traffic flow with high accuracy. Many deep learning methods for traffic flow prediction have been proposed, such as SAE, DNN, DBN, LSTM, CNN-LSTM. Some methods only consider the time dependence and ignore the spatial correlation of traffic flow, so that the change of traffic conditions is not restricted by a road network, and the traffic state cannot be accurately predicted. Some prediction methods take into account spatial correlation for short-term passenger ride demand prediction.

Although some studies introduce CNNs for spatial correlation modeling and have made great progress in traffic flow prediction, CNNs are generally designed for Euclidean data such as images and regular grids. Such models cannot work on urban road networks with complex topologies and therefore cannot truly describe the spatial correlation of the road network, which limits these methods. In recent years, graph convolution networks (GCNs) have developed rapidly. Because GCNs can process data with arbitrary graph structure, they offer a good solution to the above problem, and they have achieved good results on several different types of graph-based tasks, such as emotion classification, unsupervised learning and image classification.

Attention mechanisms have been widely used in tasks such as natural language processing, image captioning and speech recognition. With their rapid development, existing attention models can be divided into self-attention mechanisms, soft attention mechanisms, hard attention mechanisms and others. The goal of an attention mechanism is to select, from all inputs, the information that is relatively important to the current task.

Accurate traffic flow prediction is a precondition for realizing intelligent transportation, but owing to the complex spatio-temporal characteristics of traffic flow, prediction has always been a difficult problem. To capture the complex spatio-temporal correlation of traffic flow, the invention provides a traffic flow combination prediction model based on a graph convolution network and an attention mechanism.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a traffic flow combination prediction model based on a graph convolution network and an attention mechanism. The model can directly process traffic flow data on the original traffic network and effectively capture the spatio-temporal characteristics of traffic flow. It captures the spatial correlation of traffic flow on the network using a GCN, learns the temporal dependence of traffic flow through a GRU, and introduces a soft attention mechanism that adaptively assigns different degrees of attention to the traffic flow sequences at different moments and automatically distinguishes the importance of each sequence to the final prediction performance, so as to improve prediction accuracy.

To solve the above technical problem, the technical scheme provided by the invention is as follows. The traffic flow combination prediction model comprises three parts: a graph convolution network (GCN), a gated recurrent unit (GRU) and a soft attention mechanism (Soft Attention). In the resulting GGCN-SA traffic flow combination prediction model, the GCN captures the topological structure of the graph to obtain spatial correlation, the GRU captures the dynamic change of node attributes to obtain temporal correlation, and the soft attention mechanism adaptively assigns different degrees of attention to the traffic flow sequences at different moments and automatically distinguishes the importance of each sequence to the final prediction performance, so as to improve prediction accuracy. The GGCN-SA model is constructed by combining the GCN and the GRU; n historical time series of traffic data are input into the model to obtain n hidden states with spatio-temporal characteristics.

Further, the traffic flow combination prediction model is constructed by the following steps: the GCN maps the spatial characteristics and relations of traffic flow between observation stations into a graph, and the output of the GCN is input into a GRU model, which captures the temporal correlation of the traffic data; the hidden states are input into an attention model to determine a feature vector covering the global changes in traffic information; a weight is calculated for each hidden state using a multi-layer perceptron followed by a Softmax function; the feature vector covering the global changes in traffic information is calculated as a weighted sum; and the prediction result is output by a fully connected layer.

The invention has the following advantages. Deep learning can learn deep spatio-temporal characteristics of traffic flow from large amounts of traffic flow data, and a novel deep-learning-based traffic flow combination prediction model, GGCN-SA, is established to capture these characteristics effectively. The invention uses a graph convolution network (GCN) to capture the spatial correlation of the road network and a gated recurrent unit (GRU) to capture the temporal dependency, and further introduces a soft attention mechanism (Soft Attention) to aggregate information from different neighborhood ranges so as to enhance the prediction performance of the model. Extensive experiments on the METR-LA and SZ-taxi data sets show that the proposed GGCN-SA model has better prediction performance than the baseline methods.

1. The invention provides a novel deep learning model (GGCN-SA), which uses a graph convolution network and a gated recurrent unit to capture complex spatial correlation and temporal dependence from traffic flow data, and is used for the traffic flow prediction task on an urban road network.

2. The invention designs a new soft attention mechanism (SoftAttention) to adaptively allocate different degrees of attention to traffic flow sequences at different times, automatically distinguish the importance of each traffic flow sequence on final prediction performance, aggregate information in different neighborhood ranges, and automatically learn the importance of the different neighborhood ranges on traffic flow.

3. Six-group comparison experiments are respectively carried out on two groups of traffic data sets, and the experimental results show that compared with the existing baseline method, the model has the best prediction performance on different data sets.

Drawings

Fig. 1 is the road network graph structure G of the present invention.

FIG. 2 is a schematic diagram of the GGCN-SA model structure of the invention.

FIG. 3 is a schematic diagram of the GCN structure of the present invention.

Fig. 4 is an internal structure of a GRU network unit U of the present invention.

Fig. 5 is a GRU network structure of the present invention.

Fig. 6 is a schematic diagram of the soft attention mechanism of the present invention.

Fig. 7 shows the study areas of the present invention. Left panel: Los Angeles highways; right panel: Luohu District, Shenzhen.

Fig. 8 shows the convergence of the GGCN-SA model of the present invention after 500 iterations.

Fig. 9 shows the traffic time series prediction results of the present invention on two road sections, in panels 9(a) and 9(b).

FIG. 10 shows the accuracy prediction results of the GGCN-SA model of the invention and other models.

Detailed Description

The present invention will be described in further detail with reference to examples.

1 model

The traffic flow combination prediction model based on the graph convolution network and the attention mechanism comprises three parts: a graph convolution network (GCN), a gated recurrent unit (GRU) and a soft attention mechanism (Soft Attention). The network structure of the model is shown in Fig. 2. In the GGCN-SA model, the GCN captures the topology of the graph to obtain spatial correlation, the GRU captures the dynamic change of node attributes to obtain temporal correlation, and the soft attention mechanism adaptively assigns different degrees of attention to the traffic flow sequences at different times and automatically distinguishes the importance of each sequence to the final prediction performance, so as to improve prediction accuracy. The GGCN-SA model is constructed by combining the GCN and the GRU; n historical time series of traffic data are input into the model to obtain n hidden states with spatio-temporal characteristics.

2 problem definition

In this study, the goal of traffic flow prediction is to predict traffic information over a certain period of time from historical traffic information on roads. In general, traffic conditions may refer to traffic flow, speed, and density. Without loss of generality, the present study represents traffic conditions in terms of traffic speed.

Definition 1: road network $G$. As shown in Fig. 1, an unweighted graph $G = (V, E)$ is used to describe the topology of the road network. Each sensor detection point is regarded as a node, and the connection between any two sensors is regarded as an edge between the two corresponding nodes. Here $V = \{v_1, v_2, \dots, v_N\}$ is the set of road nodes, $N$ is the number of nodes, and $E$ is the set of edges. The traffic flow observed on $G$ is expressed as a graph signal $X \in \mathbb{R}^{N \times P}$, where $P$ is the number of node attribute features.

Definition 2: feature matrix $X \in \mathbb{R}^{N \times P}$. The traffic information on the road network is regarded as the attribute of the nodes in the network and is represented by the feature matrix $X \in \mathbb{R}^{N \times P}$, where $P$ is the number of node attribute features, i.e., the length of the historical time series, and $X_t \in \mathbb{R}^{N \times i}$ represents the traffic speed on each road at time $i$.

Definition 3: adjacency matrix $A \in \mathbb{R}^{N \times N}$. The adjacency matrix $A$ represents the connections between roads and contains only the elements 0 and 1: an element is 0 if there is no link between the two roads and 1 otherwise.

Thus, let $X^{(t)}$ denote the graph signal observed at time $t$. The traffic prediction problem aims to learn a function $f$ that, given a graph $G$, maps the $T'$ historical graph signals to the next $T$ graph signals, so that the traffic speed over the next $T$ time steps is calculated as follows:

$$[X^{(t+1)}, \dots, X^{(t+T)}] = f\left(G; \left(X^{(t-T'+1)}, \dots, X^{(t)}\right)\right)$$

where $T'$ is the length of the historical traffic speed series and $T$ is the length of the traffic speed series to be predicted.
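The mapping from $T'$ historical graph signals to $T$ future ones is usually trained on sliding windows cut from the recorded speed series. The following is a minimal sketch of that windowing, assuming the speeds are stored as a (time, nodes) NumPy array; the function name `make_windows` and the toy data are illustrative and do not come from the patent.

```python
import numpy as np

def make_windows(speeds, hist_len, pred_len):
    """Slice a (time, nodes) speed array into (history, future) pairs.

    hist_len plays the role of T' and pred_len the role of T in the text.
    """
    inputs, targets = [], []
    last_start = speeds.shape[0] - hist_len - pred_len
    for start in range(last_start + 1):
        split = start + hist_len
        inputs.append(speeds[start:split])               # T' past graph signals
        targets.append(speeds[split:split + pred_len])   # T future graph signals
    return np.stack(inputs), np.stack(targets)

# Toy data: 10 time steps, 3 sensors (values are arbitrary).
speeds = np.arange(30, dtype=float).reshape(10, 3)
x, y = make_windows(speeds, hist_len=4, pred_len=2)
print(x.shape, y.shape)  # (5, 4, 3) (5, 2, 3)
```

Each input window is followed immediately by its target window, so the model never sees future values when predicting.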

3 model construction

The GCN maps the spatial characteristics and relations of traffic flow between observation stations into a graph, and the output of the GCN is input into a GRU model, which captures the temporal correlation of the traffic data. The hidden states are then input into the attention model to determine a feature vector that covers the global changes in traffic information. The weight of each hidden state is calculated using a multi-layer perceptron followed by a Softmax function, and the feature vector covering the global changes in traffic information is calculated as a weighted sum. Finally, the prediction result is output by a fully connected layer.

3.1 spatial correlation modeling

Given the feature matrix X and the adjacency matrix A, the GCN replaces the convolution operation of a conventional CNN with a spectral convolution over each graph node and its first-order neighborhood, and thereby captures the spatial features of the graph. Furthermore, the layer-wise propagation rule allows multiple layers to be stacked. Therefore, the GCN model is used here to learn spatial features from traffic data.

The GCN structure is shown in Fig. 3. The 2-layer GCN model can be expressed as:

$$f(X, A) = \sigma\left(\hat{A}\,\mathrm{ReLU}\left(\hat{A} X W_0\right) W_1\right)$$

where $X$ is the feature matrix, $A$ is the adjacency matrix, $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ is the pre-processing step, $\tilde{A} = A + I_N$ is the adjacency matrix of the graph with self-connections, and $\tilde{D}$ is the degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. $W_0 \in \mathbb{R}^{P \times H}$ and $W_1 \in \mathbb{R}^{H \times T}$ are the weights of the first and second layers, respectively, where $P$ is the length of the time series and $H$ is the number of hidden units. $f(X, A) \in \mathbb{R}^{N \times T}$ is the output with prediction length $T$, and $\sigma(\cdot)$ and $\mathrm{ReLU}(\cdot)$ are activation functions.

By determining the topological relationship between the central road segment and the peripheral road segments, the GCN can encode the topological structure and the road segment attributes of the road network at the same time. On the basis, the study learns the spatial correlation of the road network through a GCN model.
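To make the shapes in this section concrete, here is a minimal NumPy sketch of a two-layer graph convolution with the symmetric normalization of the self-connected adjacency matrix. It is an illustration under stated assumptions, not the patented implementation: the sigmoid output activation, the random weights and the 3-node toy graph are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a 0/1 adjacency matrix."""
    adj_self = adj + np.eye(adj.shape[0])     # add self-connections
    degree = adj_self.sum(axis=1)             # node degrees, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return adj_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_two_layer(x, adj, w0, w1):
    """Two graph convolutions with ReLU in between.

    The sigmoid on the output is an assumption of this sketch.
    """
    a_hat = normalize_adjacency(adj)
    hidden = np.maximum(a_hat @ x @ w0, 0.0)  # (N, H)
    return sigmoid(a_hat @ hidden @ w1)       # (N, T)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path
x = rng.normal(size=(3, 4))    # N=3 nodes, P=4 history steps
w0 = rng.normal(size=(4, 8))   # P x H
w1 = rng.normal(size=(8, 2))   # H x T
out = gcn_two_layer(x, adj, w0, w1)
print(out.shape)  # (3, 2)
```

Because each row of the normalized matrix mixes a node with its direct neighbors, stacking two layers lets every node see information from nodes up to two hops away.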

3.2 time correlation modeling

The internal structure of the GRU network unit is shown in fig. 4.

The inputs to the current GRU unit are the output of the previous GRU unit and the current observation. After internal processing, the output features are obtained and passed to the next GRU unit. Here $h_{t-1}$ is the hidden state at time $t-1$, $x_t$ is the traffic information at time $t$, $r_t$ is the reset gate, which controls how much of the previous state information is ignored, $u_t$ is the update gate, which controls how much of the previous state information is carried into the current state, $c_t$ is the memory content stored by the unit at time $t$, and $h_t$ is the output state at time $t$. The GRU model takes the hidden state at time $t-1$ and the current traffic information as input to obtain the traffic state at time $t$. While capturing the traffic information at the current moment, the model still retains the trend of the historical traffic information and is therefore able to capture temporal correlation.

The calculation formula of the GRU network unit is as follows

Here $r_t$ is the reset gate: the smaller its value, the less information from the previous state is written in. $u_t$ is the update gate: the larger its value, the more state information from the previous moment is brought in. $x_t$ and $h_{t-1}$ are the input vector at the current time $t$ and the output vector at time $t-1$, respectively, and $y_t$ is the output vector at time $t$. $[\,]$ denotes the concatenation of two vectors. $\sigma$ is the sigmoid activation function, which controls the opening and closing of the reset gate and the update gate. By connecting a series of network units U, a complete GRU neural network can be constructed, as shown in Fig. 5.
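The gate behavior described above can be sketched as a single GRU step in NumPy. This is a hedged illustration only: the weight names (`W_r`, `W_u`, `W_c`), the bias terms and the tanh candidate are the common GRU formulation and are assumptions here, chosen so that a larger update gate keeps more of the previous state, as the text describes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU step; a larger u_t keeps more of the previous state."""
    concat = np.concatenate([x_t, h_prev])                  # [x_t, h_{t-1}]
    r_t = sigmoid(params["W_r"] @ concat + params["b_r"])   # reset gate
    u_t = sigmoid(params["W_u"] @ concat + params["b_u"])   # update gate
    gated = np.concatenate([x_t, r_t * h_prev])             # reset applied to old state
    c_t = np.tanh(params["W_c"] @ gated + params["b_c"])    # candidate memory content
    return u_t * h_prev + (1.0 - u_t) * c_t                 # new hidden state h_t

def init_params(input_dim, hidden_dim, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    shape = (hidden_dim, input_dim + hidden_dim)
    params = {}
    for gate in ("r", "u", "c"):
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=shape)
        params[f"b_{gate}"] = np.zeros(hidden_dim)
    return params

rng = np.random.default_rng(0)
params = init_params(input_dim=3, hidden_dim=5, rng=rng)
h = np.zeros(5)
sequence = rng.normal(size=(4, 3))   # 4 time steps, 3 features each
hidden_states = []
for x_t in sequence:
    h = gru_step(x_t, h, params)     # chain the units as in a full GRU network
    hidden_states.append(h)
print(np.stack(hidden_states).shape)  # (4, 5)
```

Chaining `gru_step` over the sequence corresponds to connecting a series of network units, and the collected `hidden_states` are the per-step hidden states that a later attention layer can weight.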

3.3 Soft attention mechanism

Motivated by research on soft attention mechanisms, and considering both the graph structure of the traffic network and the spatio-temporal correlation of traffic data, a soft attention mechanism is introduced into the combined graph convolution network (GCN) and gated recurrent unit (GRU) model to model traffic data on the network structure. In the proposed framework, the output of the GCN is input into the GRU module and a soft attention mechanism is applied to the output of the GRU. The soft attention mechanism aggregates information from different neighborhood ranges, and a feature vector that expresses the trend of traffic state change is then calculated and used to predict future traffic.

The structure of the soft attention mechanism is shown in Fig. 6. Suppose a time series $X_i$ $(i = 1, 2, \dots, n)$, where $n$ is the length of the time series. First, the hidden states $H_i$ $(i = 1, 2, \dots, n)$ at the different times are calculated using the GRU model and collected as $H = \{H_1, H_2, \dots, H_n\}$. Then, a scoring function is designed to calculate a weight for each hidden state. Finally, the output $H_o$ is calculated as a weighted average.

The score of each hidden state ($score_i$) is normalized by the Softmax function to obtain the final weight ($\alpha_i$). Here $w_{(1)}$ and $w_{(2)}$ are the weights of the first and second layers, and $b_{(1)}$ and $b_{(2)}$ are the biases of the first and second layers, respectively.

$$score_i = w_{(2)}\left(w_{(1)} H + b_{(1)}\right) + b_{(2)} \qquad (4)$$

$$\alpha_i = \frac{\exp(score_i)}{\sum_{k=1}^{n} \exp(score_k)}$$

Finally, the output $H_o$ is calculated as a weighted average:

$$H_o = \sum_{i=1}^{n} \alpha_i H_i$$

The attention mechanism can thus be regarded as computing adaptive weights $\alpha_i$ that turn the input sequence $H_i$ into a fixed-length output $H_o$.
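The scoring, Softmax normalization and weighted aggregation can be sketched in a few lines of NumPy. This is a minimal illustration, assuming each hidden state is scored independently by the two linear layers of equation (4); the array shapes, the random parameters and the function name `soft_attention` are assumptions of the example.

```python
import numpy as np

def soft_attention(hidden, w1, b1, w2, b2):
    """hidden: (n, d) array of n hidden states. Returns (context, weights)."""
    scores = (hidden @ w1 + b1) @ w2 + b2    # (n,), one score per hidden state
    scores = scores - scores.max()           # shift for numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum()  # Softmax over the time steps
    context = weights @ hidden               # (d,), weighted sum of hidden states
    return context, weights

rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))             # n=6 time steps, d=4 hidden units
w1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
w2, b2 = rng.normal(size=3), 0.0
context, weights = soft_attention(hidden, w1, b1, w2, b2)
print(context.shape, weights.round(3))
```

Since the weights are positive and sum to one, the output has the same dimension as a single hidden state regardless of how many time steps are fed in.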

3.4 loss function

The aim of training is to minimize the error between the actual and predicted traffic speeds in the road network. The actual and predicted traffic speeds of the different road sections are denoted by $Y$ and $\hat{Y}$, respectively. The loss function of the GGCN-SA model is therefore:

$$loss = \lVert Y_t - \hat{Y}_t \rVert + \lambda \lVert w \rVert_2$$

where $Y_t$ and $\hat{Y}_t$ are the actual and predicted traffic speeds, respectively, $\lambda$ is the regularization parameter and $w$ denotes the weights. The first term minimizes the error between the actual and predicted speeds. The second term, $\lVert w \rVert_2$, is the L2 regularization term, which prevents parameters with excessively large values from appearing in the model and helps avoid overfitting.
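As a small worked example of the two terms, the sketch below computes a prediction-error norm plus an L2 penalty on the weights. It assumes unsquared norms and treats all weight tensors as one flattened vector; both choices, and the function name `l2_regularized_loss`, are assumptions made for illustration.

```python
import numpy as np

def l2_regularized_loss(y_true, y_pred, weights, lam):
    """Prediction error plus lam times the L2 norm of all weights."""
    error = np.linalg.norm(y_true - y_pred)                    # first term
    penalty = np.sqrt(sum(np.sum(w ** 2) for w in weights))    # second term
    return error + lam * penalty

y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[1.0, 2.0], [3.0, 1.0]])
weights = [np.array([3.0, 0.0]), np.array([[0.0, 4.0]])]
print(l2_regularized_loss(y_true, y_pred, weights, lam=0.1))  # 3.5
```

Here the error term is 3 and the weight norm is 5, so with a regularization parameter of 0.1 the loss is 3 + 0.5 = 3.5.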

4 Experiments

4.1 data description

Two traffic data sets, the Los Angeles loop detector data set (METR-LA) and the Shenzhen taxi trajectory data set (SZ-taxi), were used to verify the performance of the proposed GGCN-SA model. The real traffic data used in the experiments contains attributes such as location, date, time period, speed and flow. The details of the experimental data sets are shown in Table 1:

Table 1: Description of the experimental data sets

Data set	METR-LA	SZ-taxi
Data type	Time series	Time series
Location	Los Angeles highways	Luohu District, Shenzhen
Interval	5 minutes	15 minutes
Time period	March 1, 2012 to March 7, 2012	January 1, 2015 to January 31, 2015
Attribute	Speed	Speed
Records	207 sensors	156 roads

The METR-LA data set comes from loop detectors on Los Angeles highways. It spans March 1, 2012 to March 7, 2012 and contains historical traffic speeds collected by 207 sensors, aggregated every 5 minutes. The SZ-taxi data set comes from the Luohu District of Shenzhen and spans January 1, 2015 to January 31, 2015. In this study, 156 major roads in Luohu District were selected as the study area, and the driving speed on each road was aggregated every 15 minutes.

The experimental data mainly consists of two parts. One is an adjacency matrix (156 by 156 for the SZ-taxi data set) that describes the spatial relationship between roads: each row represents a road, and the values in the matrix represent the connectivity between roads. The other is a feature matrix that describes how the traffic speed on each road changes over time: each row represents a road, and each column holds the traffic speed on the roads in a different time period.

Since the METR-LA data set contains some missing data, linear interpolation is used to fill in the missing values. Before the data is fed into the prediction model, it is scaled to the range [0, 1] using min-max normalization. The normalization formula is

$$x_i' = \frac{x_i - x_{min}}{x_{max} - x_{min}}$$

where $x_i$ is the $i$-th original data point, $x_{max}$ and $x_{min}$ are the maximum and minimum values of the original data, respectively, and $x_i'$ is the normalized input data.
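The two preprocessing steps can be sketched as follows. This is a minimal example, assuming missing readings are marked as NaN in a one-dimensional speed series (the source does not say how missing values are encoded); the helper names are illustrative.

```python
import numpy as np

def fill_missing_linear(series):
    """Fill NaN gaps in a 1-D series by linear interpolation."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    known = ~np.isnan(series)
    return np.interp(idx, idx[known], series[known])

def min_max_normalize(data):
    """Scale data to [0, 1] using its own minimum and maximum."""
    x_min, x_max = data.min(), data.max()
    return (data - x_min) / (x_max - x_min)

raw = np.array([60.0, np.nan, 40.0, 20.0, np.nan, np.nan, 50.0])
filled = fill_missing_linear(raw)
print(filled)                     # gaps become 50, then 30 and 40
print(min_max_normalize(filled))  # values between 0 and 1
```

To report errors in the original speed units, the same minimum and maximum can be reused to invert the scaling after prediction.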

4.2 Experimental Environment and parameter settings

The experiments were compiled and run on a Linux server (CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, GPU: NVIDIA GeForce GTX 1080). The traffic flow prediction model was built and trained in the PyCharm development environment using the TensorFlow deep learning framework.

The GGCN-SA model was trained using the Adam optimizer, with the initial learning rate manually set to 0.001 and L2 regularization in the loss function to prevent overfitting. The model uses the rectified linear unit (ReLU) as the activation function, which speeds up neural network computation while avoiding the vanishing gradient problem. In the experiments, each data set was split into training and test sets at a ratio of 8:2. Traffic speeds were predicted for horizons of 15, 30, 45 and 60 minutes.
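A possible form of the 8:2 split is sketched below. It assumes the split is made along the time axis without shuffling, which is a common choice for time series but is not stated in the source; the function name `chronological_split` is illustrative.

```python
import numpy as np

def chronological_split(data, train_ratio=0.8):
    """Split a (time, nodes) array along the time axis without shuffling."""
    cut = int(data.shape[0] * train_ratio)
    return data[:cut], data[cut:]

speeds = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 roads
train, test = chronological_split(speeds)
print(train.shape, test.shape)  # (8, 2) (2, 2)
```

Keeping the test period strictly after the training period avoids leaking future observations into training.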

4.3 results of the experiment

In this study, the prediction results of the GGCN-SA model were compared with those of a historical average (HA) model, an autoregressive integrated moving average (ARIMA) model, a support vector regression (SVR) model, a graph convolution network (GCN) model and a gated recurrent unit (GRU) model.

(1) Historical average model (HA): average traffic information over a historical period is used as a prediction.

(2) Support vector regression model (SVR): support vector regression uses a linear support vector machine to train a model to obtain the relationship between input and output for traffic flow prediction.

(3) Autoregressive integrated moving average model (ARIMA): ARIMA is one of the most widely used models for time series prediction. It fits the observed time series to a parametric model in order to predict future traffic data.

(4) Graph convolutional network model (GCN): the topological structure of the urban road network is captured by using the graph convolution network to obtain the spatial characteristics of the traffic data.

(5) Gated recurrent unit model (GRU): the RNN is a classical deep learning method for sequence learning tasks. The GRU is one of the most prevalent RNN variants and can be used for time series modeling.

On the METR-LA data set, the GGCN-SA model was trained for 500 iterations with a 15-minute time series; the change in its error as the number of iterations increases is shown in Fig. 8. The predicted and actual traffic speeds of the GGCN-SA model and the comparison models on two different road sections of the METR-LA data set over one day are shown in Fig. 9.
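Of the baselines listed above, the historical average is simple enough to sketch directly. The version below assumes the prediction for every future step is the per-road mean over the history window; the window layout and the function name are assumptions of this example.

```python
import numpy as np

def historical_average(history, pred_len):
    """history: (T', N) speeds. Repeat the per-road mean for pred_len steps."""
    mean_speed = history.mean(axis=0)             # (N,)
    return np.tile(mean_speed, (pred_len, 1))     # (pred_len, N)

history = np.array([[10.0, 20.0], [30.0, 40.0]])  # 2 time steps, 2 roads
print(historical_average(history, pred_len=3))
```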

Fig. 9 (a) and (b) show the predicted performance of the various models as the prediction interval increases. In general, as the prediction time interval becomes longer, the prediction error also increases due to error propagation. As can be seen from the figure, a method that only considers the temporal correlation can obtain good prediction accuracy in short-term prediction, such as a GRU model. However, as the prediction time interval increases, errors are continuously transmitted, and the prediction accuracy of the GRU model is sharply reduced. In contrast, the GCN-GRU model has a slower rate of performance degradation, mainly because the GCN-GRU can simultaneously capture the spatio-temporal characteristics of traffic flow, which is more important in long-term prediction. However, the prediction error of the GCN-GRU model increases as more time series are considered in the model. In contrast, the GGCN-SA model provided by the method achieves better prediction performance in almost all time steps, and the strategy of combining the graph convolution network and the gated recursion unit with the attention mechanism can better enhance the characterization capability of the model on the space-time characteristics of the traffic flow.

4.4 model evaluation

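To better analyze the experimental results, the prediction performance of the model is evaluated by measuring the error between the actual traffic speed and the prediction with the following metrics:

Root mean square error (RMSE):

$$RMSE = \sqrt{\frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} \left(y_i^j - \hat{y}_i^j\right)^2}$$

Coefficient of determination ($R^2$):

$$R^2 = 1 - \frac{\sum_{j=1}^{M} \sum_{i=1}^{N} \left(y_i^j - \hat{y}_i^j\right)^2}{\sum_{j=1}^{M} \sum_{i=1}^{N} \left(y_i^j - \bar{Y}\right)^2}$$

Mean absolute error (MAE):

$$MAE = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} \left| y_i^j - \hat{y}_i^j \right|$$

Accuracy:

$$Accuracy = 1 - \frac{\lVert Y - \hat{Y} \rVert_F}{\lVert Y \rVert_F}$$

Explained variance score (Var):

$$Var = 1 - \frac{\mathrm{Var}\{Y - \hat{Y}\}}{\mathrm{Var}\{Y\}}$$

In these formulas, $y_i^j$ and $\hat{y}_i^j$ are the actual and predicted speeds of time sample $j$ on road section $i$, $M$ is the number of time samples, $N$ is the number of nodes on the road network, $Y$ and $\hat{Y}$ are the sets of $y_i^j$ and $\hat{y}_i^j$, respectively, and $\bar{Y}$ is the average of $Y$.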

Specifically, RMSE and MAE measure the prediction error: the smaller their values, the better the prediction. Accuracy measures the prediction precision: the larger its value, the better the prediction. $R^2$ and Var measure how well the prediction fits the actual data: the larger their values, the better the prediction.

Tables 2 and 3 show the prediction results of the GGCN-SA model and the baseline methods on the METR-LA and SZ-taxi data sets, respectively, for horizons of 15, 30, 45 and 60 minutes. The baseline methods do not combine spatial correlation and temporal dependency; they model either the time series or the spatial topology in a coarse-grained manner. In contrast, the GGCN-SA model models the spatial topology of the observation stations and mines spatio-temporal characteristics more effectively, which strengthens its ability to represent the spatio-temporal characteristics of traffic flow and yields more accurate predictions. Its advantage is more pronounced on the METR-LA data set than on the SZ-taxi data set.
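The five metrics can be computed together as in the sketch below. It assumes the definitions commonly used for this family of traffic models (Frobenius-norm accuracy and explained variance computed over all samples and roads); since the formula images are not reproduced in this text, those definitions are an assumption rather than a quotation.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """y_true, y_pred: (M, N) arrays of M time samples on N roads."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    accuracy = 1.0 - np.linalg.norm(err) / np.linalg.norm(y_true)  # Frobenius norms
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    var = 1.0 - np.var(err) / np.var(y_true)
    return {"RMSE": rmse, "MAE": mae, "Accuracy": accuracy, "R2": r2, "Var": var}

y_true = np.array([[10.0, 20.0], [30.0, 40.0]])
y_pred = np.array([[12.0, 18.0], [30.0, 40.0]])
for name, value in evaluate(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```

In this toy case the errors are -2, 2, 0 and 0, so MAE is 1 and RMSE is the square root of 2, while the three fit scores stay close to 1.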

(1) Effect of prediction Algorithm on accuracy

From Tables 2 and 3, the neural-network-based methods, including the MLP, GCN and GRU models, achieve better prediction accuracy than the other methods (e.g., the HA, ARIMA and SVR models) by modeling temporal characteristics. Table 2 shows that, on the METR-LA data set, for the 15-minute traffic flow prediction, the MAE of the GGCN-SA, GCN and GRU models is about 24.82%, 22.86% and 21.00% lower than that of the HA model, respectively, and the accuracy is about 4.47%, 4.35% and 4.35% higher. Compared with the ARIMA model, the RMSE of the GGCN-SA, GCN and GRU models is about 49.85%, 48.17% and 47.93% lower, respectively, and the accuracy is about 10.52%, 10.16% and 10.16% higher. Compared with the SVR model, the MAE of the GGCN-SA, GCN and GRU models is about 15.50%, 17.83% and 17.06% lower, respectively, and the accuracy is about 1.78%, 1.45% and 1.45% higher. This is mainly because the HA, ARIMA and SVR methods have difficulty capturing the spatio-temporal characteristics of traffic flow. The GCN model predicts less well because it considers only spatial features and ignores the temporal correlation of traffic data.

On the SZ-taxi data set, as shown in Table 3, for the 15-minute traffic flow prediction, the RMSE of the GGCN-SA, GCN and GRU models is about 5.68%, 0.77% and 0.93% lower than that of the HA model, respectively. The accuracy of the GGCN-SA model is about 2.57% higher, while the accuracy of the GCN and GRU models is slightly lower than that of the HA model. Compared with the ARIMA model, the RMSE of the GGCN-SA, GCN and GRU models is 40.40%, 37.29% and 37.40% lower, respectively, and the accuracy is 89.95%, 85.98% and 86.24% higher. This is mainly because ARIMA has difficulty capturing the spatio-temporal characteristics of traffic flow, and because the ARIMA error is computed per node and then averaged, so fluctuations in some of the data increase the final total error. ARIMA therefore has the lowest prediction accuracy. Across the different time series, the GGCN-SA model achieves higher prediction accuracy on both data sets and is more robust, which demonstrates its accuracy and effectiveness in traffic flow prediction.
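The percentages quoted in this comparison appear to be relative changes with respect to the baseline model. A minimal sketch of that arithmetic follows; the numeric inputs are hypothetical values chosen only to illustrate the calculation and are not taken from Tables 2 or 3.

```python
def relative_reduction(baseline, model):
    """Percentage by which `model` is lower than `baseline` (for errors)."""
    return (baseline - model) / baseline * 100.0

def relative_improvement(baseline, model):
    """Percentage by which `model` is higher than `baseline` (for accuracy)."""
    return (model - baseline) / baseline * 100.0

# Hypothetical values, for illustration only.
print(round(relative_reduction(baseline=4.0, model=3.0), 2))      # 25.0
print(round(relative_improvement(baseline=0.40, model=0.76), 2))  # 90.0
```

Under this reading, a large relative improvement in accuracy, such as the figures quoted against ARIMA, reflects a low baseline accuracy rather than an absolute gain of the same size.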

(2) Effect of spatio-temporal correlation on prediction accuracy

In order to verify the influence of the space-time characteristics of the traffic flow on the traffic flow prediction result, the GGCN-SA model is compared with the GCN model and the GRU model. As shown by the SZ-taxi data set in the table 3, compared with the GCN model only considering the spatial characteristics, the RMSE of the GGCN-SA model is reduced by about 1.71 percent, the accuracy is improved by about 0.99 percent under 30 minutes, and the GGCN-SA model has better representation capability on the spatial characteristics of traffic flow. Compared with the GRU model only considering the time characteristics, the RMSE of the GGCN-SA model is reduced by about 4.66%, the accuracy is improved by about 2.00%, and the GGCN-SA model has better characterization capability on the time characteristics of the traffic flow. In summary, as can be seen from tables 2 and 3, the overall effect of the GGCN-SA model is better than that of the GCN model and the GRU model in the traffic flow predictions of 15, 30, 45 and 60 minutes on the two data sets, thereby proving that the GGCN-SA model can capture the spatiotemporal characteristics of the traffic flow at the same time and has better representation capability on the spatiotemporal characteristics of the traffic flow.

Table 2: Prediction results of the GGCN-SA model and other baseline methods on the METR-LA data set (None means that the value is too small; best results are shown in bold in the table)

Table 3: Prediction results of the GGCN-SA model and other baseline methods on the SZ-taxi data set

(3) Effect of attention mechanism on prediction results

The GGCN-SA model is compared with a model without the attention mechanism (GCN-GRU) to verify the effect of the attention mechanism on the traffic flow prediction results. The results are shown in Table 4. For the 15-minute prediction, the MAE of the GGCN-SA model on the METR-LA and SZ-taxi data sets is about 5.89% and 3.26% lower, respectively, than that of the GCN-GRU model, and the accuracy is about 0.55% and 1.27% higher, respectively. For the 30-minute prediction, the MAE is reduced by about 1.51% and 4.11%, respectively, and the accuracy is improved by about 0.11% and 1.28%, respectively. For the 45-minute prediction, the MAE is reduced by about 3.23% and 4.29%, respectively, and the accuracy is improved by about 0.34% and 1.13%, respectively. For the 60-minute prediction, the MAE of the GGCN-SA model on the METR-LA data set is worse than that of the GCN-GRU model, whereas on the SZ-taxi data set the MAE of the GGCN-SA model is reduced by about 6.31% and the accuracy is improved by about 1.28%.

Table 4: Comparison of the GGCN-SA model with the GCN-GRU model on the two data sets, METR-LA and SZ-taxi

Therefore, as can be seen from the data in Table 4 and Fig. 10, the prediction error of the proposed GGCN-SA model is smaller than that of the model without the attention mechanism (GCN-GRU), and its prediction accuracy is higher across the different traffic data sets and prediction horizons, so the GGCN-SA model represents the spatio-temporal characteristics of traffic flow better.

In conclusion, the GGCN-SA model can always obtain the best result at different time intervals, which shows that the GGCN-SA model has better representation capability on the space-time characteristics of traffic flow. The model can also capture the variation trend of the traffic speed and predict the starting time and the ending time of the traffic flow peak period. The GGCN-SA model can better capture the space-time characteristics of traffic flow, thereby proving the accuracy and the effectiveness of the GGCN-SA model in real-time traffic prediction.

Although the invention has been described in detail hereinabove with respect to a general description and specific embodiments thereof, it will be apparent to those skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims

1. A traffic flow combination prediction model based on a graph convolution network and an attention mechanism, characterized in that: the traffic flow combination prediction model comprises three parts, namely a graph convolution network (GCN), a gated recurrent unit (GRU) and a soft attention mechanism (SoftAttention), wherein, in the resulting GGCN-SA traffic flow combination prediction model, the GCN is used for capturing the topological structure of the graph to obtain spatial correlation, the GRU is used for capturing the dynamic change of node attributes to obtain temporal correlation, and the soft attention mechanism aggregates information in different neighborhood ranges so as to adaptively assign different degrees of attention to the traffic flow sequences at different moments and automatically distinguish the importance of each traffic flow sequence to the final prediction performance, thereby improving the accuracy of prediction; the GGCN-SA model is constructed by combining the GCN and the GRU, and n historical time-series traffic data are input into the GGCN-SA model to obtain n hidden states with spatio-temporal characteristics;

the construction and training of the traffic flow combination prediction model combining the graph convolution network GCN, the gated recurrent unit GRU and the soft attention mechanism comprise the following steps:

1) Describing the topological structure of the road network by using an unweighted graph $G = (V, E)$, regarding each sensor detection point as a node, and regarding the connection relation of any two sensors as an edge between the two corresponding nodes; where $V$ is the set of road nodes, $V = \{v_1, v_2, \ldots, v_N\}$, $N$ is the number of nodes, and $E$ is the set of edges; representing the traffic flow observed on $G$ as a graph signal $X \in \mathbb{R}^{N \times P}$, wherein $P$ represents the number of node attribute features;

2) Regarding the traffic information on the road network as the attribute of the nodes in the network, represented by the feature matrix $X \in \mathbb{R}^{N \times P}$, where $P$ represents the number of node attribute features, i.e., the length of the historical time series, and $X_i$ represents the traffic information on each road at time $i$;

3) Using an adjacency matrix $A \in \mathbb{R}^{N \times N}$ to represent the connections between roads, the adjacency matrix containing only the elements 0 and 1; an element is 0 if there is no link between the two roads, and 1 otherwise;

4) By determining the topological relation between the central road section and the peripheral road sections, the GCN encodes the topological structure and the road section attributes of the road network at the same time, and maps the spatial characteristics and the relation of the traffic flow between the observation stations into a graph;

5) Inputting the output of the GCN into the GRU model, which is used for capturing the temporal correlation of the traffic data; for a time sequence $X_i$ ($i = 1, 2, \ldots, n$), where $n$ is the time series length, the hidden states $H_i$ ($i = 1, 2, \ldots, n$) at different times are calculated using the GRU model and denoted $H = \{H_1, H_2, \ldots, H_n\}$; then, a scoring function is designed to calculate the weight of each hidden state; finally, the output $H_o$ of the soft attention mechanism model is calculated as a weighted average; the calculation formula of the GRU network unit is as follows:

wherein $r_t$ represents the reset gate, and the smaller the value of the reset gate, the less information of the previous state is admitted; $u_t$ represents the update gate, and the larger the value of the update gate, the more state information of the previous moment is carried in; $x_t$ and $h_{t-1}$ respectively represent the input vector at the current time $t$ and the output vector at time $t-1$; $y_t$ represents the output vector at time $t$; $[\,]$ denotes the concatenation of two vectors; $\sigma$ represents the sigmoid activation function, which controls the opening or closing of the reset gate and the update gate; $c_t$ represents the memory content stored in the unit at time $t$; $W_u$ represents the weight of the update gate; $W_r$ represents the weight of the reset gate; $b_r$ represents the bias of the reset gate; $b_u$ represents the bias of the update gate; $W_c$ represents the weight of the cell storage; $b_c$ represents the bias of the cell storage; $W_o$ represents the weight of the output; $x$ represents the input vector;

inputting the hidden states into the soft attention model, aggregating information in different neighborhood ranges by using the soft attention mechanism, and then calculating a feature vector capable of expressing the traffic state change trend, for use in the future traffic prediction task; designing a scoring function to calculate the weight of each hidden state, wherein $\mathrm{score}_i$ of each feature is normalized by the Softmax function to obtain the final weight $\alpha_i$ of each hidden state:

$\mathrm{score}_i = w_2 (w_1 H_i + b_1) + b_2$;

wherein $w_1$ and $w_2$ are the weights of the first and second layers of the two-layer GCN model, and $b_1$ and $b_2$ are the biases of the first layer and the second layer, respectively;

computing, as a weighted sum, a feature vector covering the global traffic information changes, wherein the output $H_o$ of the soft attention mechanism model is calculated as a weighted average as follows:

finally, outputting a prediction result by using a fully connected layer;

the loss function $loss$ of the GGCN-SA model is as follows:

wherein $Y_t$ and $\hat{Y}_t$ respectively represent the actual traffic state and the predicted traffic state; $\lambda$ is a regularization parameter; $w$ is the weight of the graph, $m = 2$, and $w_p$ takes $w_1$ and $w_2$ respectively.
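The soft attention step recited in the claim can be sketched in plain Python. This is only an illustration under stated assumptions, not the claimed implementation: the function names (`matvec`, `score`, `softmax`, `soft_attention`), the shapes chosen for `w1`, `b1`, `w2` and `b2`, and the toy inputs are hypothetical. The score follows the stated formula score_i = w2 (w1 H_i + b1) + b2. The softmax normalization and the weighted average H_o = sum_i alpha_i H_i follow the textual description.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def score(h, w1, b1, w2, b2):
    """score_i = w2 (w1 H_i + b1) + b2, one scalar per hidden state."""
    inner = [x + b for x, b in zip(matvec(w1, h), b1)]
    return sum(a * b for a, b in zip(w2, inner)) + b2


def softmax(scores):
    """Normalize the scores into weights alpha_i that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(hidden_states, w1, b1, w2, b2):
    """Return (alpha, H_o), where H_o is the alpha-weighted average of H_i."""
    scores = [score(h, w1, b1, w2, b2) for h in hidden_states]
    alpha = softmax(scores)
    dim = len(hidden_states[0])
    h_o = [sum(a * h[k] for a, h in zip(alpha, hidden_states))
           for k in range(dim)]
    return alpha, h_o


# Toy example: n = 3 hidden states of dimension 2 (values are made up).
H = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w1 = [[1.0, 0.0], [0.0, 1.0]]  # assumed shape: 2 x 2
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]               # assumed shape: vector of length 2
b2 = 0.0
alpha, H_o = soft_attention(H, w1, b1, w2, b2)
print("attention weights:", alpha)
print("context vector H_o:", H_o)
```

In the model described, `H_o` would then be passed to the fully connected layer to produce the prediction. That layer and the GCN and GRU stages are omitted from this sketch.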