CN112767682A

CN112767682A - Multi-scale traffic flow prediction method based on graph convolution neural network

Info

Publication number: CN112767682A
Application number: CN202011513907.XA
Authority: CN
Inventors: 张美越; 王森章; 缪浩; 杜金龙
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2020-12-18
Filing date: 2020-12-18
Publication date: 2021-05-07

Abstract

The invention discloses a multi-task urban traffic flow prediction method based on a graph neural network, and studies for the first time the new problem of predicting multi-scale (fine-grained and coarse-grained) traffic flow. First, the topological proximity and the traffic flow similarity between road links are used to construct a coarse-grained road graph. A cross-scale graph convolution (Cross-Scale GCN) is then proposed to extract fine-grained and coarse-grained traffic flow features and fuse them. An LSTM with Intra-Attention and Inter-Attention extracts the temporal features, and structural constraints are introduced to keep the prediction results at the two scales consistent. The scheme performs well in both fine-grained and coarse-grained traffic forecasting and improves forecasting accuracy.

Description

Multi-scale traffic flow prediction method based on graph convolution neural network

Technical Field

The invention provides a multi-scale traffic flow prediction method based on a graph convolution neural network, relates to the field of space-time big data prediction, and is mainly used for predicting traffic flows of different granularities in cities so as to help traffic departments to alleviate traffic congestion.

Background

With the acceleration of urbanization in China, the conflict between a growing urban population and limited space resources is becoming increasingly serious, and traffic congestion has become a major obstacle to urban development. Countries around the world have studied urban traffic planning and urban traffic control since the 1960s, but with the continuous expansion of cities and the increasing complexity of traffic conditions, these two measures alone can no longer manage traffic effectively, and Intelligent Transportation Systems (ITS) have therefore emerged. An intelligent transportation system combines advanced physical communication equipment with intelligent computer technology to build an information prediction and management system for the whole traffic network, and is currently the best way to comprehensively and effectively address problems in transportation, including traffic congestion.

Urban traffic flow prediction is an important component of an intelligent transportation system and covers quantities such as traffic speed and traffic density. It has important research and application value in many fields. Most traditional prediction methods are statistical, including ARIMA and VAR. Statistical methods typically learn a linear mapping from historical traffic data to predict future trends. While such methods may achieve acceptable performance in road-level traffic prediction, their performance may degrade significantly when they are applied to a whole road network, where the correlations between roads are highly non-linear and dynamic, and none of these methods captures the spatiotemporal nature of the data well. With advances in technology, improvements in hardware, and the collection of large amounts of data, neural networks have become widely used because of their excellent performance, and with the introduction of convolutional neural networks, recurrent neural networks, and their variants, various deep learning models have been applied to traffic flow prediction. Researchers have proposed a series of new approaches, such as DCRNN, STGCN, and TGCN; these neural-network-based methods learn features from large amounts of data, make good use of the spatiotemporal characteristics of the data, and achieve excellent performance. The above studies explore and further optimize existing techniques for urban traffic flow prediction, but they have some limitations. First, most previous studies have focused on predicting traffic conditions on each road, which can be regarded as fine-grained prediction.
However, in many cases, coarse-grained predictions are also needed, such as predicting traffic flow between different urban areas covering multiple road connections, to help governments better understand traffic conditions from a macroscopic perspective. This is particularly useful in applications of city planning and public traffic planning.

In summary, existing urban traffic flow prediction models usually ignore the mutual influence between data of different granularities, are deficient in region-level prediction, and produce imprecise predictions. As a result, existing methods often suffer from low prediction accuracy and efficiency.

Disclosure of Invention

Purpose of the invention: in view of the shortcomings of the prior art described in the background, the invention aims to provide a multi-scale urban traffic flow prediction method based on a graph neural network. With the disclosed method, traffic flow can be predicted at different scales across a whole city by effectively exploiting the spatiotemporal correlation of traffic flow data, and higher prediction accuracy can be maintained under different conditions.

The technical scheme is as follows: a multi-scale urban traffic flow prediction method based on a graph neural network comprises the following specific steps:

Step one: data pre-processing

1) Obtain the graph-structure-based assignment matrix A_fc1. The raw data are first processed to remove outliers. The fine-grained data are then processed with the Louvain community detection algorithm to obtain the graph-structure-based mapping matrix A_fc1 from fine granularity to coarse granularity.

2) Obtain the node-feature-based assignment matrix A_fc2. The fine-grained data of all moments are stacked into a new matrix, and spectral clustering is applied to this matrix to obtain the node-feature-based mapping matrix A_fc2 from fine granularity to coarse granularity.

3) Fuse the two mapping matrices to obtain the final mapping matrix. Specifically, the element-wise (Hadamard) product ⊙ of the two mapping matrices is taken to obtain the final mapping matrix A_fc:

A_fc = softmax(A_fc1 ⊙ A_fc2)

4) Aggregate the fine-grained data by summing or averaging to obtain the coarse-grained data X_c:

X_c = Agg(v₁, v₂, ..., v_n)

The fine-grained and coarse-grained data together form the required historical urban traffic flow tensors.
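The fusion and aggregation in items 3) and 4) can be illustrated with a minimal NumPy sketch. It assumes that each assignment matrix has shape (number of fine-grained nodes, number of coarse-grained nodes), that the softmax is taken over the coarse-grained axis, and that Agg is implemented as a product with the transposed assignment matrix. None of these details is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_assignments(a_fc1, a_fc2):
    # A_fc = softmax(A_fc1 ⊙ A_fc2); softmax over the coarse axis is an assumption.
    return softmax(a_fc1 * a_fc2, axis=1)

def aggregate(x_fine, a_fc, mode="sum"):
    # x_fine: (n_fine, d), a_fc: (n_fine, n_coarse) -> (n_coarse, d).
    x_coarse = a_fc.T @ x_fine
    if mode == "mean":
        # Divide by the total assignment weight received by each coarse node.
        x_coarse = x_coarse / a_fc.sum(axis=0)[:, None]
    return x_coarse

# Toy example: 4 road links assigned to 2 regions by two clusterings.
a_fc1 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # e.g. from Louvain
a_fc2 = np.array([[1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)  # e.g. from spectral clustering
a_fc = fuse_assignments(a_fc1, a_fc2)

x_fine = np.array([[10.0], [20.0], [30.0], [40.0]])  # flow on each road link
x_coarse = aggregate(x_fine, a_fc, mode="sum")       # flow in each region
```

Under these assumptions every row of A_fc sums to one, so sum aggregation preserves the total flow; a road link on which the two clusterings disagree (the second link here) is split evenly between the regions.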

Step two: training neural networks

The whole network is trained using the fine-grained and coarse-grained urban traffic flow data constructed in step one. The model has two parts: a spatial feature extraction module and a temporal feature extraction module. The spatial module comprises an ordinary graph convolution (GCN) and a cross-scale graph convolution (Cross-Scale GCN): the GCN extracts the spatial features of each granularity separately, and the Cross-Scale GCN fuses the spatial features of the different granularities. The temporal module comprises Intra-Attention and Inter-Attention: Intra-Attention strengthens the ability to capture temporal correlation within one granularity, and Inter-Attention captures the temporal correlation between data of different granularities.

The inputs of the network are the fine-grained historical traffic feature matrix, the coarse-grained historical traffic feature matrix, and the adjacency matrix corresponding to each. An entry of an adjacency matrix indicates whether two nodes are connected: it is 1 if they are connected and 0 otherwise. The fine-grained and coarse-grained data are first convolved by a GCN; the Cross-Scale GCN then uses the mapping matrix to fuse the features of the two granularities, with information flowing from fine to coarse and, at the same time, from coarse to fine. Next, the features pass through an LSTM with Intra-Attention, the hidden representations of the two granularities are fused by Inter-Attention, and finally a fully connected layer maps the hidden representations to the two granularities to obtain the final predictions.
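As an illustration of the ordinary GCN step applied to a 0/1 adjacency matrix, the following NumPy sketch performs one graph convolution. The symmetric normalization with self-loops and the ReLU activation are assumptions borrowed from the commonly used GCN formulation; the text does not specify which variant is used, and the function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    # Assumed normalization: D^{-1/2} (A + I) D^{-1/2}, with self-loops added.
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, x, weight):
    # One graph convolution: aggregate neighbour features, project, apply ReLU.
    return np.maximum(adj_norm @ x @ weight, 0.0)

# Toy road graph: three links in a chain (1 = connected, 0 = not connected).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
adj_norm = normalize_adjacency(adj)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 2))       # 3 nodes, 2 input features
weight = rng.normal(size=(2, 4))  # project to 4 hidden features
h = gcn_layer(adj_norm, x, weight)
```

In the described model, one such layer would be applied to the fine-grained graph and another to the coarse-grained graph before the cross-scale fusion.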

In addition, to keep the fine-grained and coarse-grained predictions consistent, a structural constraint is added between the predicted values of each coarse-grained node and those of its corresponding fine-grained nodes, so that the two sets of predictions agree. Let X^t denote the fine-grained ground truth; the fine-grained prediction, the coarse-grained ground truth, and the coarse-grained prediction are denoted analogously (their symbols are not reproduced in the source text). Using the mean squared error loss, the objective function can finally be written in the following form:

[equation not reproduced in the source text]

where λ is a hyper-parameter. The loss function is optimized with the Adam algorithm and back-propagation, and the solution is obtained when the algorithm converges.
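Because the objective function itself is not reproduced in the text, the following NumPy sketch is only one plausible form, not the patented formula. It assumes the loss is the sum of a fine-grained MSE term, a coarse-grained MSE term, and λ times a structural term that penalizes the gap between the coarse-grained prediction and the aggregated fine-grained prediction. The function name `multi_scale_loss` and the default value of `lam` are hypothetical.

```python
import numpy as np

def mse(a, b):
    # Mean squared error between two arrays of the same shape.
    return float(np.mean((a - b) ** 2))

def multi_scale_loss(x_true, x_pred, xc_true, xc_pred, a_fc, lam=0.1):
    fine_loss = mse(x_pred, x_true)
    coarse_loss = mse(xc_pred, xc_true)
    # Assumed structural constraint: a coarse prediction should match the
    # aggregate of the predictions of its corresponding fine-grained nodes.
    structural = mse(xc_pred, a_fc.T @ x_pred)
    return fine_loss + coarse_loss + lam * structural

# Toy example: 4 road links, 2 regions, hard (one-hot) assignment.
a_fc = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
x_true = np.array([[10.0], [20.0], [30.0], [40.0]])
xc_true = a_fc.T @ x_true  # [[30.], [70.]]

loss_perfect = multi_scale_loss(x_true, x_true, xc_true, xc_true, a_fc)
# A coarse prediction that is off by 5 is penalized twice: once by the
# coarse MSE term and once by the structural term.
loss_shifted = multi_scale_loss(x_true, x_true, xc_true, xc_true + 5.0, a_fc)
```

In practice such a loss would be minimized with Adam in an automatic-differentiation framework; the NumPy version only makes the structure of the terms explicit.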

Step three: generating a prediction result

The fine-grained and coarse-grained urban traffic flow matrices of the first t moments, {X^i | i = 1, ..., t} and {X_c^i | i = 1, ..., t}, together with the corresponding adjacency matrices, are input into the trained network model to obtain the predictions of urban traffic flow at both granularities for the next moment, namely the fine-grained flow {X^i | i = t+1} and the coarse-grained flow {X_c^i | i = t+1}.

As a further preferable aspect of the present invention, in the second step, the spatial convolution module and the temporal feature extraction module are specifically designed as follows:

Two independent graph neural networks are used for the fine-grained and coarse-grained data. The spatial convolution module first contains an ordinary GCN, which aggregates the feature information of local nodes. It also contains a Cross-Scale GCN: the result of the coarse-grained convolution is added to the fine-grained features, so that information is passed between the two scales. A better node representation is learned by passing information within each graph, and information is exchanged between the two representations through coarse-to-fine and fine-to-coarse information flows, which allows nodes in the graph to capture the features of distant nodes.

The temporal feature extraction module uses an LSTM with attention. When the input sequence is very long, it is difficult for the model to learn a reasonable vector representation. Intra-Attention is therefore added, so that the last moment can attend to all previous moments with different weights. The input history data are {X^1, X^2, ..., X^t} and the data to be predicted are {X^{t+1}}; with Intra-Attention, the model can learn which historical moments are more important for predicting the next moment. To model the temporal correlation between the data at the two scales, an Inter-Attention mechanism is also designed. Since the feature dimensions of the two scales differ, the features must first be converted into the same feature space: the coarse-grained features can be upsampled into the fine-grained feature space. Inter-Attention is then applied, and finally the representations at the two granularities are mapped through an MLP to obtain the final predictions.

Beneficial effects: the invention provides an urban traffic flow prediction method based on a graph neural network for the problem of urban traffic flow prediction. Compared with the prior art, the invention with the above technical scheme has the following technical effects:

1) The invention is the first to study the multi-scale traffic prediction problem, and provides a multi-task spatiotemporal network model to predict urban traffic flow at different scales.

2) A cross-scale space-time feature learning mechanism is provided, and the mechanism comprises a cross-scale GCN layer which effectively fuses cross-scale space features and a layered attention mechanism which captures cross-scale time correlation, so that model prediction is more accurate.

3) In order to meet the consistency of multi-scale traffic data prediction results, structural constraints are designed for the objective function.

Drawings

FIG. 1 is a method flow diagram;

FIG. 2 is a detailed design block diagram of a model;

FIG. 3 is a schematic diagram of the Intra-Attention and Inter-Attention modules;

Detailed Description

The technical scheme of the invention is further explained in detail with reference to the attached drawings.

The overall flow of the urban traffic flow prediction method based on the graph neural network is shown in FIG. 1. The preprocessed data are input into a model containing spatial feature extraction and temporal feature extraction modules to generate the coarse-grained and fine-grained urban traffic flow at a future moment. The data of the two granularities can interact. Specifically, the invention constructs two sets of data as input:

X^t: the fine-grained traffic flow data at the n moments before the prediction time point, X^t = {x^t | t = 1, ..., n};

X_c^t: the coarse-grained traffic flow data at the n moments before the prediction time point.

The specific process of the multi-scale urban traffic flow prediction method based on a graph neural network is as follows:

Step one: data pre-processing

1) Obtain the graph-structure-based assignment matrix A_fc1. The raw data are first processed to remove outliers. The fine-grained data are then processed with the Louvain community detection algorithm to obtain the graph-structure-based mapping matrix A_fc1 from fine granularity to coarse granularity.

2) Obtain the node-feature-based assignment matrix A_fc2. The fine-grained data of all moments are stacked into a new matrix, and spectral clustering is applied to this matrix to obtain the node-feature-based mapping matrix A_fc2 from fine granularity to coarse granularity.

3) Fuse the two mapping matrices to obtain the final mapping matrix. Specifically, the element-wise (Hadamard) product ⊙ of the two mapping matrices is taken to obtain the final mapping matrix A_fc:

A_fc = softmax(A_fc1 ⊙ A_fc2)

4) Aggregate the fine-grained data by summing or averaging to obtain the coarse-grained data X_c:

X_c = Agg(v₁, v₂, ..., v_n)

Step two: training neural networks

The whole network is trained using the fine-grained and coarse-grained urban traffic flow data constructed in step one. As shown in FIG. 2, the model has two parts: a spatial feature extraction module and a temporal feature extraction module. The spatial module comprises a GCN and a Cross-Scale GCN: the GCN extracts the spatial features of each granularity separately, and the Cross-Scale GCN fuses the features of the different granularities. The temporal module comprises Intra-Attention and Inter-Attention: Intra-Attention strengthens the ability to capture temporal correlation within one granularity, and Inter-Attention captures the temporal correlation between data of different granularities.

The inputs of the network are the fine-grained historical traffic feature matrices {X^1, X^2, ..., X^t}, the coarse-grained historical traffic feature matrices, and their corresponding adjacency matrices.

In addition, to keep the fine-grained and coarse-grained predictions consistent, a structural constraint is added between the predicted values of each coarse-grained node and those of its corresponding fine-grained nodes, so that the two sets of predictions agree.

Let X^t denote the fine-grained ground truth; the fine-grained prediction, the coarse-grained ground truth, and the coarse-grained prediction are denoted analogously (their symbols are not reproduced in the source text). Using the mean squared error loss, the objective function can finally be written as follows:

[equation not reproduced in the source text]

Two independent graph neural networks are used for the fine-grained and coarse-grained data. The spatial convolution module first contains an ordinary GCN, which aggregates the feature information of local nodes. Because distant traffic flow can also influence the node at the current position, a Cross-Scale GCN is added so that nodes can capture information from distant nodes: the result of the coarse-grained convolution is added to the fine-grained features, so that information is passed between the two scales. A better node representation is learned by passing information within each graph, and information is exchanged between the two representations through coarse-to-fine and fine-to-coarse information flows, which allows nodes in the graph to capture the features of distant nodes. The method can thus overcome some known limitations of classical graph neural networks, for example by capturing information from distant nodes while still training effectively. The formula of the Cross-Scale GCN is as follows:

[equation not reproduced in the source text]
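Since the Cross-Scale GCN formula is not reproduced in the text, the following NumPy sketch shows only one way the described two-way exchange could look. It assumes the hidden features at both scales have the same width, that the coarse-to-fine flow multiplies by the mapping matrix A_fc, and that the fine-to-coarse flow multiplies by its transpose. The function name `cross_scale_exchange` is hypothetical.

```python
import numpy as np

def cross_scale_exchange(h_fine, h_coarse, a_fc):
    # h_fine: (n_fine, d) features after the fine-grained GCN.
    # h_coarse: (n_coarse, d) features after the coarse-grained GCN.
    # a_fc: (n_fine, n_coarse) fine-to-coarse mapping matrix.
    fine_out = h_fine + a_fc @ h_coarse      # coarse -> fine information flow
    coarse_out = h_coarse + a_fc.T @ h_fine  # fine -> coarse information flow
    return fine_out, coarse_out

# Toy example: links 0 and 1 belong to region 0, link 2 belongs to region 1.
a_fc = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
h_fine = np.array([[1.0], [2.0], [3.0]])
h_coarse = np.array([[10.0], [20.0]])

fine_out, coarse_out = cross_scale_exchange(h_fine, h_coarse, a_fc)
# fine_out   -> [[11.], [12.], [23.]]
# coarse_out -> [[13.], [23.]]
```

Each fine-grained node receives the summary of its whole region, which is how features of distant nodes in the same region can reach it in a single step.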

The temporal feature extraction module uses an improved LSTM, as shown in FIG. 3. When the input sequence is very long, it is difficult for the model to learn a reasonable vector representation. Intra-Attention is therefore added, so that the last moment can attend to all previous moments with different emphasis. Its core is a set of weight parameters: the importance of each element is learned from the sequence, and the elements are then combined according to their importance. Each weight parameter is an attention allocation coefficient that determines how much attention an element receives. The attention mechanism is implemented by retaining the intermediate outputs of the LSTM encoder over the input sequence, then training the model to attend selectively to these outputs and to associate the output sequence with them when producing output. The input history data are {X^1, X^2, ..., X^t} and the data to be predicted are {X^{t+1}}; with Intra-Attention, the model can learn which historical moments are more important for predicting the next moment. The Intra-Attention formulas are as follows:

f_t = σ(W_f [h_{t-1}, H_t] + b_f)

i_t = σ(W_i [h_{t-1}, H_t] + b_i)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c [h_{t-1}, H_t] + b_c)

o_t = σ(W_o [h_{t-1}, H_t] + b_o), h_t = o_t ⊙ tanh(c_t)

[α₁, α₂, ..., α_m] = align(h_t, h_m)

where σ denotes an activation function, ⊙ denotes the Hadamard product, and f_t, i_t, o_t denote the forget gate, the input gate, and the output gate, respectively. c_t and h_t are the memory cell and the hidden feature, respectively. align denotes the similarity function computed by Intra-Attention, and s denotes the final hidden representation.
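The gate equations and the attention weights can be traced with a minimal NumPy sketch. It uses the standard LSTM cell update, in which the input gate scales the candidate state. The align function is assumed to be a dot product with the last hidden state followed by a softmax, and s is assumed to be the attention-weighted sum of the hidden states; neither detail is given in the text, and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t, params):
    # One LSTM step on the concatenated vector [h_{t-1}, H_t].
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(params["W_f"] @ z + params["b_f"])  # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])  # input gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])  # output gate
    c = f * c_prev + i * np.tanh(params["W_c"] @ z + params["b_c"])
    h = o * np.tanh(c)
    return h, c

def intra_attention(hidden_states):
    # Assumed align: dot product of every hidden state with the last one,
    # normalized by a softmax; s is the weighted sum of the hidden states.
    scores = hidden_states @ hidden_states[-1]
    weights = np.exp(scores - scores.max())
    alpha = weights / weights.sum()
    s = alpha @ hidden_states
    return alpha, s

rng = np.random.default_rng(0)
d_in, d_h, steps = 3, 4, 5
params = {f"W_{g}": rng.normal(scale=0.5, size=(d_h, d_h + d_in)) for g in "fioc"}
params.update({f"b_{g}": np.zeros(d_h) for g in "fioc"})

h, c = np.zeros(d_h), np.zeros(d_h)
hidden = []
for x_t in rng.normal(size=(steps, d_in)):  # toy input sequence H_1..H_5
    h, c = lstm_step(h, c, x_t, params)
    hidden.append(h)
hidden = np.stack(hidden)

alpha, s = intra_attention(hidden)  # one weight per historical moment
```

The vector alpha then indicates how strongly each historical moment contributes to the representation used for predicting the next moment.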

To model the temporal correlation between the data at the two scales, an Inter-Attention mechanism is also designed, as shown on the right side of FIG. 3. Since the feature dimensions of the two scales differ, they must first be converted into the same feature space. The coarse-grained features are first upsampled into the fine-grained feature space, with s'_c denoting the resulting hidden feature; the fine-grained features are then mapped into the coarse-grained feature space, with s' denoting the resulting hidden feature. The Inter-Attention formulas are as follows:

Z = β₁ s + β₂ s'_c

Z_c = β_{c,1} s' + β_{c,2} s_c

β represents the coefficient of Inter-Attention.
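A minimal NumPy sketch of this fusion follows. It assumes that upsampling multiplies the coarse-grained hidden features by the mapping matrix A_fc, that the reverse mapping multiplies by its transpose, and that each pair of β coefficients comes from a softmax over two learnable scores. The text specifies none of these choices, and the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def inter_attention(s_fine, s_coarse, a_fc, beta_scores, beta_c_scores):
    # s_fine: (n_fine, d), s_coarse: (n_coarse, d), a_fc: (n_fine, n_coarse).
    s_coarse_up = a_fc @ s_coarse  # coarse features upsampled to the fine space
    s_fine_down = a_fc.T @ s_fine  # fine features mapped to the coarse space
    b1, b2 = softmax(beta_scores)
    bc1, bc2 = softmax(beta_c_scores)
    z = b1 * s_fine + b2 * s_coarse_up      # fused fine-grained representation
    z_c = bc1 * s_fine_down + bc2 * s_coarse  # fused coarse-grained representation
    return z, z_c

# Toy example: links 0 and 1 belong to region 0, link 2 belongs to region 1.
a_fc = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
s_fine = np.array([[2.0], [4.0], [6.0]])
s_coarse = np.array([[10.0], [20.0]])

# Equal scores give beta = 0.5 for every term.
z, z_c = inter_attention(s_fine, s_coarse, a_fc, np.zeros(2), np.zeros(2))
# z   -> [[6.], [7.], [13.]]
# z_c -> [[8.], [13.]]
```

In a trained model the scores would be learned, so the network can decide how much each scale contributes to the prediction at the other scale.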

Step three: generating a prediction result

The fine-grained traffic flow matrices {X^i | i = 1, ..., t} and the coarse-grained traffic flow matrices {X_c^i | i = 1, ..., t} of the first t moments, together with the corresponding adjacency matrices, are input into the trained network model to obtain the predictions of urban traffic flow at both granularities for the next moment, namely the fine-grained flow {X^i | i = t+1} and the coarse-grained flow {X_c^i | i = t+1}.
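The pairing of t historical moments with the next moment can be sketched as a sliding window over a traffic tensor. The tensor layout (time, nodes, features) and the helper name `make_windows` are assumptions made for illustration; the same helper would be applied to the fine-grained and the coarse-grained series.

```python
import numpy as np

def make_windows(series, t):
    # series: (T, n_nodes, d). Returns histories of shape (T - t, t, n_nodes, d)
    # and the matching next-moment targets of shape (T - t, n_nodes, d).
    histories = np.stack([series[i:i + t] for i in range(len(series) - t)])
    targets = series[t:]
    return histories, targets

# Toy series: 6 moments, 2 nodes, 1 feature.
series = np.arange(12, dtype=float).reshape(6, 2, 1)
histories, targets = make_windows(series, t=3)
# histories[0] holds moments 0..2 and targets[0] is moment 3.
```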

The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims

1. A multi-scale urban traffic flow prediction method based on a graph neural network, characterized by comprising the following steps:

(1) Preprocessing the observed data:

Two assignment matrices A_fc1 and A_fc2 are obtained based on the graph structure and the node features, respectively. The two mapping matrices are fused to obtain the final mapping matrix. Specifically, the element-wise (Hadamard) product ⊙ of the two mapping matrices is taken to obtain the final mapping matrix A_fc:

A_fc = softmax(A_fc1 ⊙ A_fc2)

The fine-grained data are then aggregated by summing or averaging to obtain the coarse-grained data X_c:

X_c = Agg(v₁, v₂, ..., v_n)

(2) Problem definition: given a road sensor network G, the fine-grained traffic flow observations {X^1, X^2, ..., X^t}, and the corresponding coarse-grained traffic flow observations, the goal is to simultaneously predict the multi-scale traffic flow at the next moment;

(3) Two independent graph neural networks are used for the coarse-grained and fine-grained data. The spatial convolution module first includes an ordinary GCN, which aggregates the feature information of local nodes. Because long-distance traffic flow may also affect the current position, a Cross-Scale GCN is added to convolve across the two scales so that nodes can capture information from distant nodes: the convolution result of the coarse-grained features is added to the fine-grained features, so that information is passed between the two. Better node representations are learned by passing information within each graph, and information is exchanged between the two representations through coarse-to-fine and fine-to-coarse information flows, which allows nodes in the graph to capture the features of distant nodes and lets the two scales assist each other in prediction;

(4) The data obtained after graph convolution are input into an LSTM with attention. When the input time series is very long, it is difficult for the model to learn a reasonable vector representation. Intra-Attention is added for this purpose, so that the last moment can attend to all previous moments with different emphasis: the importance of each moment is learned from the time series, and the elements are then combined according to their importance. Each weight parameter is an attention allocation coefficient, which assigns attention weights of different sizes to the elements. The attention mechanism is implemented by retaining the intermediate outputs of the LSTM encoder over the input sequence, and then training the model to attend selectively to these outputs and to associate the output sequence with them when producing output;

(5) To model the temporal correlation between the data at the two scales, an Inter-Attention mechanism is also designed. Since the feature dimensions of the two scales differ, they must first be converted into the same feature space. The coarse-grained features are first upsampled into the fine-grained feature space and Inter-Attention is applied to obtain the fine-grained prediction; the fine-grained features are then mapped into the coarse-grained feature space and Inter-Attention is applied to obtain the coarse-grained prediction;

(6) Stochastic gradient descent is used, and the model is optimized by back-propagation, so that the predictions become more accurate.

2. The multi-scale urban traffic flow prediction method based on a graph neural network according to claim 1, wherein an ordinary graph convolutional neural network (GCN), a cross-scale graph convolutional neural network (Cross-Scale GCN), and an LSTM with Intra-Attention and Inter-Attention are used to learn the fine-grained and coarse-grained spatiotemporal features of traffic flow. Information can be passed between the traffic flow data of the two granularities: better node representations are learned by passing information within each graph, and information is exchanged between the two representations through coarse-to-fine and fine-to-coarse information flows, which allows nodes in the graph to capture the features of distant nodes. Intra-Attention and Inter-Attention help the model capture temporal features better and attend to the data of all historical moments with different emphasis. The method improves prediction accuracy and provides a more powerful, convenient, and accurate auxiliary tool for urban traffic planning, route selection, and traffic risk prediction.