Multi-scale traffic flow prediction method based on a graph convolutional neural network
Technical Field
The invention provides a multi-scale traffic flow prediction method based on a graph convolutional neural network. It relates to the field of spatiotemporal big-data prediction and is mainly used to predict urban traffic flow at different granularities, helping traffic departments alleviate traffic congestion.
Background
With the acceleration of urbanization in China, the conflict between a growing urban population and limited space resources has become increasingly serious, and traffic congestion has become a major obstacle to urban development. Countries around the world have studied urban traffic planning and urban traffic control since the 1960s, but as cities keep expanding and traffic conditions grow more complex, these two measures alone can no longer manage traffic effectively. Intelligent Transportation Systems (ITS) have emerged in response. An ITS combines advanced physical communication equipment with intelligent computing technology to build an information prediction and management system for the whole traffic network, and it is currently considered the best way to address transportation problems, including traffic congestion, comprehensively and effectively.
Urban traffic flow prediction, which covers quantities such as traffic speed and traffic density, is an important component of an intelligent transportation system and has significant research and application value in many fields. Most traditional prediction methods are statistical, including ARIMA and VAR. Statistical methods typically fit a linear mapping to historical traffic data in order to predict future trends. They can perform adequately for road-level prediction, but their performance can degrade significantly when they are applied to a whole road network, where the correlations between roads are highly non-linear and dynamic; none of these methods captures the spatiotemporal nature of the data well. With advances in technology, improvements in hardware, and the collection of large amounts of data, neural networks have come into wide use because of their strong performance. Following the introduction of convolutional neural networks, recurrent neural networks, and their variants, a variety of deep learning models have been applied to traffic flow prediction. Researchers have proposed approaches such as DCRNN, STGCN, and TGCN. These neural-network-based methods learn features from large amounts of data, make good use of its spatiotemporal characteristics, and achieve excellent performance. These studies explore and further optimize existing techniques for urban traffic flow prediction, but they have limitations. First, most previous studies focus on predicting traffic conditions on individual roads, which can be regarded as fine-grained prediction.
However, coarse-grained predictions are also needed in many cases, such as predicting the traffic flow between urban areas that each cover multiple connected roads, to help governments understand traffic conditions from a macroscopic perspective. This is particularly useful for city planning and public transport planning.
In summary, existing urban traffic flow prediction models usually ignore the mutual influence between data of different granularities, are weak at region-level prediction, and produce imprecise predictions. As a result, existing methods often suffer from low prediction accuracy and efficiency.
Disclosure of Invention
Purpose of the invention: to address the shortcomings of the prior art described in the Background, the invention provides a multi-scale urban traffic flow prediction method based on a graph neural network. By effectively exploiting the spatiotemporal correlation of traffic data, the method predicts traffic flow at different scales across the whole city and maintains high prediction accuracy under different conditions.
Technical scheme: a multi-scale urban traffic flow prediction method based on a graph neural network, comprising the following specific steps:
Step one: data pre-processing
1) Obtain the assignment matrix $A_{fc1}$ based on the graph structure. The raw data is first cleaned to remove outliers. The fine-grained data is then processed with the Louvain community-detection algorithm to obtain a graph-structure-based mapping matrix $A_{fc1}$ from fine granularity to coarse granularity.
2) Obtain the assignment matrix $A_{fc2}$ based on node features. The fine-grained data from all time steps is stacked into a new matrix, and spectral clustering is applied to this matrix to obtain a node-feature-based mapping matrix $A_{fc2}$ from fine granularity to coarse granularity.
3) Fuse the two mapping matrices to obtain the final mapping matrix. Specifically, the element-wise (Hadamard) product of the two mapping matrices is taken and passed through a softmax, giving the final mapping matrix $A_{fc}$:
$A_{fc} = \mathrm{softmax}(A_{fc1} \odot A_{fc2})$
4) Aggregate the fine-grained data by summing or averaging to obtain the coarse-grained data $X_c$:
$X_c = \mathrm{Agg}(v_1, v_2, \ldots, v_n)$
Together, the fine-grained and coarse-grained data form the historical urban traffic flow tensors required by the method.
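The fusion and aggregation in items 3) and 4) can be illustrated with a minimal sketch in plain Python. Several details are assumptions that the text does not fix: the softmax is taken along each row (over coarse regions), each fine-grained node is assigned to the region with its largest fused weight, and the function names `fuse_mappings` and `aggregate` are hypothetical.

```python
import math


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def row_softmax(matrix):
    """Softmax along each row (assumed axis: over coarse regions)."""
    result = []
    for row in matrix:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def fuse_mappings(a_fc1, a_fc2):
    """A_fc = softmax(A_fc1 ⊙ A_fc2)."""
    return row_softmax(hadamard(a_fc1, a_fc2))


def aggregate(x_fine, a_fc, mode="sum"):
    """Aggregate per-node fine-grained values into coarse-grained values.

    Assumption: each fine node belongs to the region with its largest weight.
    """
    groups = [[] for _ in a_fc[0]]
    for value, weights in zip(x_fine, a_fc):
        groups[weights.index(max(weights))].append(value)
    if mode == "sum":
        return [sum(g) for g in groups]
    if mode == "mean":
        return [sum(g) / len(g) if g else 0.0 for g in groups]
    raise ValueError("mode must be 'sum' or 'mean'")


# Three road segments mapped to two regions by both clusterings.
A_FC1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
A_FC2 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
A_FC = fuse_mappings(A_FC1, A_FC2)
X_C = aggregate([10.0, 20.0, 30.0], A_FC, mode="sum")
```

In this toy case the first two road segments fall into the first region and the third into the second, so summation gives one coarse value per region.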
Step two: training the neural network
The whole network is trained on the fine-grained and coarse-grained urban traffic flow data constructed in step one. The model has two parts: a spatial feature extraction module and a temporal feature extraction module. The spatial module comprises an ordinary graph convolution (GCN) and a cross-scale graph convolution (Cross-Scale GCN): the GCN extracts the spatial features of each granularity separately, and the Cross-Scale GCN fuses the spatial features of the different granularities. The temporal module comprises Intra-attention and Inter-attention: Intra-attention strengthens the capture of temporal correlation within a single granularity, and Inter-attention captures temporal correlation between data of different granularities.
The inputs of the network are the fine-grained historical traffic feature matrices, the coarse-grained historical traffic feature matrices, and the adjacency matrix corresponding to each. An entry of an adjacency matrix indicates whether two nodes are connected: it is 1 if they are connected and 0 otherwise. The fine-grained and coarse-grained data are first convolved by a GCN. The Cross-Scale GCN then uses the mapping matrix to fuse the features of the two granularities, so that fine-grained information flows to the coarse granularity while coarse-grained information also flows to the fine granularity. Next, an LSTM with Intra-attention models the temporal dependence within each granularity, and Inter-attention fuses the hidden representations of the two granularities. Finally, a fully connected layer maps the hidden representations to each granularity to produce the final predictions.
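As a rough illustration of the ordinary graph convolution applied to a 0/1 adjacency matrix, the sketch below implements one layer in plain Python. The symmetric normalization with self-loops and the ReLU activation follow the widely used GCN formulation and are assumptions here, since the text does not specify the propagation rule; the function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a 0/1 adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in projected]


# Three roads in a chain (0-1-2), one flow feature per road, identity weight.
ADJ = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
FEATURES = [[1.0], [2.0], [3.0]]
HIDDEN = gcn_layer(ADJ, FEATURES, [[1.0]])
```

Each output row mixes a road's own feature with those of its directly connected neighbours, which is the local aggregation the ordinary GCN provides.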
In addition, to keep the fine-grained and coarse-grained predictions accurate and mutually consistent, a structural constraint is added so that the predicted value of each coarse-grained node corresponds to the predicted values of the fine-grained nodes it contains. Let $X_T$ denote the fine-grained ground truth, $\hat{X}_T$ the fine-grained prediction output by the model, $X_T^c$ the coarse-grained ground truth, and $\hat{X}_T^c$ the coarse-grained prediction output by the model. The objective function is built on the mean squared error loss, and $\lambda$ is a hyper-parameter of the objective. The loss function is optimized with the Adam algorithm and backpropagation, and an optimal solution is obtained when the algorithm converges.
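One objective consistent with this description can be sketched as follows. This is an assumption rather than the formula of the invention: it sums the mean squared errors of the two granularities and adds a penalty, weighted by `lam`, on the gap between each coarse-grained prediction and the sum of the fine-grained predictions of its member nodes. The `membership` list (region index per fine node) and all function names are hypothetical.

```python
def mse(pred, true):
    """Mean squared error between two equally long lists of values."""
    diffs = [(p - t) ** 2 for p, t in zip(pred, true)]
    return sum(diffs) / len(diffs)


def structural_penalty(pred_fine, pred_coarse, membership):
    """Gap between each coarse prediction and the summed fine predictions
    of the nodes assigned to it (assumed form of the structural constraint)."""
    sums = [0.0] * len(pred_coarse)
    for value, region in zip(pred_fine, membership):
        sums[region] += value
    return mse(sums, pred_coarse)


def objective(pred_fine, true_fine, pred_coarse, true_coarse, membership, lam=0.1):
    """Fine MSE + coarse MSE + lam * structural penalty."""
    return (mse(pred_fine, true_fine)
            + mse(pred_coarse, true_coarse)
            + lam * structural_penalty(pred_fine, pred_coarse, membership))


# Roads 0 and 1 belong to region 0, road 2 to region 1.
MEMBERSHIP = [0, 0, 1]
CONSISTENT = objective([1.0, 2.0, 3.0], [1.0, 2.0, 3.0],
                       [3.0, 3.0], [3.0, 3.0], MEMBERSHIP)
INCONSISTENT = objective([1.0, 2.0, 3.0], [1.0, 2.0, 3.0],
                         [4.0, 3.0], [3.0, 3.0], MEMBERSHIP)
```

In an actual training loop the same quantity would be computed on tensors and minimized with Adam; the plain-list version only shows how the constraint term penalizes coarse predictions that disagree with their fine-grained members.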
Step three: generating a prediction result
The fine-grained and coarse-grained urban traffic flow matrices of the first $t$ time steps, $\{X_i \mid i = 1, \ldots, t\}$ and $\{X_i^c \mid i = 1, \ldots, t\}$, together with the corresponding adjacency matrices, are input into the trained network model to obtain the urban traffic flow predictions of both granularities at the next time step, namely the fine-grained flow $\{X_i \mid i = t+1\}$ and the coarse-grained flow $\{X_i^c \mid i = t+1\}$.
As a further preferable aspect of the present invention, in the second step, the spatial convolution module and the temporal feature extraction module are specifically designed as follows:
Two independent graph neural networks are used for the coarse-grained and fine-grained data. The spatial convolution module first contains an ordinary GCN, which aggregates the feature information of local nodes. It also contains a Cross-scale GCN: the result of the coarse-grained convolution is added to the fine-grained features, so that information is passed between the coarse-grained and fine-grained representations. A better node representation is learned by passing information within each graph and by exchanging information between the two representations through coarse-to-fine and fine-to-coarse information flows, which allows nodes in the graph to capture the features of distant nodes. For the temporal feature extraction module, an LSTM with attention is used. When the input sequence is very long, it is difficult for the model to learn a reasonable vector representation, so Intra-attention is added, allowing the representation at the last time step to weight the data of all previous time steps differently. The input historical data is $\{X_1, X_2, \ldots, X_T\}$ and the data to be predicted is $\{X_{T+1}\}$; with Intra-attention, the model can learn which historical time steps matter most for predicting the next one. To model the temporal correlation between the data of the two scales, an Inter-attention mechanism is also designed. Since the feature dimensions of the two scales differ, they must first be converted into the same feature space, for example by upsampling the coarse-grained features into the fine-grained feature space. Inter-attention is then applied, and finally an MLP maps the result to each granularity to obtain the final predictions.
Beneficial effects: the invention provides an urban traffic flow prediction method based on a graph neural network for the urban traffic flow prediction problem. Compared with the prior art, the invention with the above technical scheme has the following technical effects:
1) The invention is the first to study the multi-scale traffic prediction problem, and it provides a multi-task spatiotemporal network model that predicts urban traffic flow at different scales.
2) A cross-scale spatiotemporal feature learning mechanism is provided. It comprises a Cross-scale GCN layer that effectively fuses cross-scale spatial features and a hierarchical attention mechanism that captures cross-scale temporal correlation, making the model's predictions more accurate.
3) To keep the multi-scale traffic prediction results consistent, a structural constraint is designed for the objective function.
Drawings
FIG. 1 is a method flow diagram;
FIG. 2 is a detailed design block diagram of a model;
FIG. 3 is a schematic diagram of the Intra-attention and Inter-attention modules.
Detailed Description
The technical scheme of the invention is further explained in detail with reference to the attached drawings.
The overall flow of the urban traffic flow prediction method based on the graph neural network is shown in FIG. 1. The preprocessed data is input into a model containing spatial feature extraction and temporal feature extraction to generate the coarse-grained and fine-grained urban traffic flow at a future time. The data of the two granularities interact with each other. Specifically, the invention constructs two sets of data as input:
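$X_T$: the fine-grained traffic flow data at the $n$ time steps before the prediction time point, $X_T = \{x_t \mid t = 1, \ldots, n\}$
$X_T^c$: the coarse-grained traffic flow data at the $n$ time steps before the prediction time point, $X_T^c = \{x_t^c \mid t = 1, \ldots, n\}$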
The multi-scale urban traffic flow prediction method based on a graph neural network proceeds as follows:
Step one: data pre-processing
1) Obtain the assignment matrix $A_{fc1}$ based on the graph structure. The raw data is first cleaned to remove outliers. The fine-grained data is then processed with the Louvain community-detection algorithm to obtain a graph-structure-based mapping matrix $A_{fc1}$ from fine granularity to coarse granularity.
2) Obtain the assignment matrix $A_{fc2}$ based on node features. The fine-grained data from all time steps is stacked into a new matrix, and spectral clustering is applied to this matrix to obtain a node-feature-based mapping matrix $A_{fc2}$ from fine granularity to coarse granularity.
3) Fuse the two mapping matrices to obtain the final mapping matrix. Specifically, the element-wise (Hadamard) product of the two mapping matrices is taken and passed through a softmax, giving the final mapping matrix $A_{fc}$:
$A_{fc} = \mathrm{softmax}(A_{fc1} \odot A_{fc2})$
4) Aggregate the fine-grained data by summing or averaging to obtain the coarse-grained data $X_c$:
$X_c = \mathrm{Agg}(v_1, v_2, \ldots, v_n)$
Together, the fine-grained and coarse-grained data form the historical urban traffic flow tensors required by the method.
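Both clustering steps end with one cluster label per fine-grained node, which has to be turned into a fine-to-coarse mapping matrix. The sketch below assumes the labels are already available (for example from a Louvain or spectral clustering run, which is not shown) and builds a one-hot assignment matrix. The way the coarse-grained adjacency matrix is derived here, connecting two regions whenever any road links them, is an assumption, since the text does not state how it is constructed; both function names are hypothetical.

```python
def assignment_matrix(labels, n_clusters):
    """One-hot matrix of shape (n_fine, n_clusters) from per-node cluster labels."""
    return [[1.0 if label == c else 0.0 for c in range(n_clusters)]
            for label in labels]


def coarse_adjacency(adj_fine, labels, n_clusters):
    """Assumed rule: two regions are adjacent if any fine-grained edge joins them."""
    adj_coarse = [[0] * n_clusters for _ in range(n_clusters)]
    for i, row in enumerate(adj_fine):
        for j, connected in enumerate(row):
            if connected and labels[i] != labels[j]:
                adj_coarse[labels[i]][labels[j]] = 1
    return adj_coarse


# Four roads in a chain (0-1-2-3); roads 0,1 form region 0 and roads 2,3 region 1.
LABELS = [0, 0, 1, 1]
ADJ_FINE = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]
A_FC1 = assignment_matrix(LABELS, 2)
ADJ_COARSE = coarse_adjacency(ADJ_FINE, LABELS, 2)
```

The same helper can be applied to the labels of each clustering, giving the two matrices that are fused in item 3).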
Step two: training the neural network
The whole network is trained on the fine-grained and coarse-grained urban traffic flow data constructed in step one. As shown in FIG. 2, the model has two parts: a spatial feature extraction module and a temporal feature extraction module. The spatial module comprises a GCN and a Cross-Scale GCN: the GCN extracts the spatial features of each granularity separately, and the Cross-Scale GCN fuses the features of the different granularities. The temporal module comprises Intra-attention and Inter-attention: Intra-attention strengthens the capture of temporal correlation within a single granularity, and Inter-attention captures temporal correlation between data of different granularities.
The input of the network consists of the fine-grained historical traffic feature matrices $\{X_1, X_2, \ldots, X_T\}$ and the coarse-grained historical traffic feature matrices $\{X_1^c, X_2^c, \ldots, X_T^c\}$, together with the corresponding adjacency matrices.
In addition, to keep the fine-grained and coarse-grained predictions accurate, a structural constraint is added between the predicted value of each coarse-grained node and those of its corresponding fine-grained nodes, so that the two sets of predictions correspond to each other.
Let $X_T$ denote the fine-grained ground truth, $\hat{X}_T$ the fine-grained prediction output by the model, $X_T^c$ the coarse-grained ground truth, and $\hat{X}_T^c$ the coarse-grained prediction output by the model. The objective function is built on the mean squared error loss, and $\lambda$ is a hyper-parameter of the objective. The loss function is optimized with the Adam algorithm and backpropagation, and an optimal solution is obtained when the algorithm converges.
Two independent graph neural networks are used for the coarse-grained and fine-grained data. The spatial convolution module first contains an ordinary GCN, which aggregates the feature information of local nodes. However, traffic flow at distant locations can also affect the node at the current position. To let nodes capture information from distant nodes, a Cross-scale GCN is added: the result of the coarse-grained convolution is added to the fine-grained features, so that information is passed between the coarse-grained and fine-grained representations. A better node representation is learned by passing information within each graph and by exchanging information between the two representations through coarse-to-fine and fine-to-coarse information flows, which allows nodes in the graph to capture the features of distant nodes. In this way the method can overcome some known limitations of classical graph neural networks, such as the difficulty of capturing information from distant nodes while still training effectively. The Cross-scale GCN implements this bidirectional exchange between the two graphs.
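The bidirectional exchange described above can be sketched in plain Python. This is an illustrative assumption, not the Cross-scale GCN formula of the invention: coarse-grained features (taken here as already convolved) are mapped to the fine-grained nodes through the mapping matrix $A_{fc}$ and added to the fine-grained features, while fine-grained features are mapped to the regions through its transpose. Both granularities are assumed to share the same feature width, and the function name is hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def cross_scale_exchange(h_fine, h_coarse, a_fc):
    """h_fine: (n_fine, d), h_coarse: (n_coarse, d), a_fc: (n_fine, n_coarse)."""
    coarse_to_fine = matmul(a_fc, h_coarse)           # region features sent to roads
    fine_to_coarse = matmul(transpose(a_fc), h_fine)  # road features sent to regions
    return add(h_fine, coarse_to_fine), add(h_coarse, fine_to_coarse)


# Roads 0 and 1 belong to region 0, road 2 to region 1.
A_FC = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
NEW_FINE, NEW_COARSE = cross_scale_exchange(
    [[1.0], [2.0], [3.0]], [[10.0], [20.0]], A_FC)
```

Because every road in a region receives that region's features, two roads that are far apart in the fine-grained graph but share a region exchange information in a single step.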
For the temporal feature extraction module, an improved LSTM is used, as shown in FIG. 3. When the input sequence is very long, it is difficult for the model to learn a reasonable vector representation. Intra-attention is therefore added so that the representation at the last time step can weight the data of all previous time steps differently. Its core operation is a set of weight parameters: the importance of each element is learned from the sequence, and the elements are then combined according to their importance. Each weight parameter is an attention coefficient that determines how much attention an element receives. The attention mechanism is implemented by retaining the intermediate outputs of the LSTM encoder over the input sequence and then training the model to attend selectively to these outputs and relate the output sequence to them when producing its output. The input historical data is $\{X_1, X_2, \ldots, X_T\}$ and the data to be predicted is $\{X_{T+1}\}$; with Intra-attention, the model can learn which historical time steps matter most for predicting the next one. The Intra-attention formulas are as follows:
$f_t = \sigma(W_f [h_{t-1}, H_t] + b_f)$
$i_t = \sigma(W_i [h_{t-1}, H_t] + b_i)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c [h_{t-1}, H_t] + b_c)$
$o_t = \sigma(W_o [h_{t-1}, H_t] + b_o)$
$h_t = o_t \odot \tanh(c_t)$
$[\alpha_1, \alpha_2, \ldots, \alpha_m] = \mathrm{align}(h_t, h_m)$
where $\sigma$ denotes an activation function, $\odot$ denotes the Hadamard product, and $f_t$, $i_t$, $o_t$ denote the forget gate, the input gate, and the output gate, respectively. $c_t$ and $h_t$ are the memory cell and the hidden feature, respectively. $\mathrm{align}$ is the similarity function computed by Intra-attention, and $s$ denotes the final hidden representation.
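The attention step on top of the LSTM hidden states can be illustrated with a small sketch. Two choices are assumptions, because the text leaves them open: `align` is taken to be a dot product between the last hidden state and every hidden state, followed by a softmax, and the final representation $s$ is taken to be the attention-weighted sum of the hidden states. The function name `intra_attention` is hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def intra_attention(hidden_states):
    """hidden_states: LSTM hidden vectors h_1..h_m; the last one acts as the query.

    Returns the attention coefficients and the weighted hidden representation s.
    """
    query = hidden_states[-1]
    scores = [dot(query, h) for h in hidden_states]  # assumed dot-product align
    alphas = softmax(scores)
    s = [sum(a * h[k] for a, h in zip(alphas, hidden_states))
         for k in range(len(query))]
    return alphas, s


# Three time steps with two hidden units each.
HIDDEN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ALPHAS, S = intra_attention(HIDDEN)
```

Time steps whose hidden state is more similar to the last one receive larger coefficients, which is how the model expresses that some historical moments matter more for the next prediction.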
To model the temporal correlation between the data of the two scales, an Inter-attention mechanism is also designed, as shown on the right side of FIG. 3. Since the feature dimensions of the two scales differ, they must first be converted into the same feature space. The coarse-grained features are first upsampled into the fine-grained feature space, with $s'_c$ denoting the resulting hidden feature; the fine-grained features are then mapped into the coarse-grained feature space, with $s'$ denoting the resulting hidden feature. The Inter-attention formulas are as follows:
$Z = \beta_1 s + \beta_2 s'_c$
$Z_c = \beta_{c,1} s' + \beta_{c,2} s_c$
where the $\beta$ terms are the Inter-attention coefficients.
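The two Inter-attention formulas can be traced with a minimal sketch. The conversion between feature spaces is an assumption: the mapping matrix $A_{fc}$ is used to upsample coarse-grained features to the fine-grained nodes, and its transpose is used in the other direction. The coefficients are passed in as fixed numbers here, whereas in the model they would be learned; the function name `inter_attention` is hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


def weighted_sum(w_a, a, w_b, b):
    """Element-wise w_a * a + w_b * b for equally shaped matrices."""
    return [[w_a * x + w_b * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def inter_attention(s, s_c, a_fc, beta=(0.5, 0.5), beta_c=(0.5, 0.5)):
    """s: (n_fine, d) fine hidden features, s_c: (n_coarse, d) coarse ones."""
    s_c_up = matmul(a_fc, s_c)           # s'_c: coarse features in the fine space
    s_down = matmul(transpose(a_fc), s)  # s': fine features in the coarse space
    z = weighted_sum(beta[0], s, beta[1], s_c_up)
    z_c = weighted_sum(beta_c[0], s_down, beta_c[1], s_c)
    return z, z_c


# Roads 0 and 1 belong to region 0, road 2 to region 1.
A_FC = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
Z, Z_C = inter_attention([[2.0], [4.0], [6.0]], [[10.0], [20.0]], A_FC)
```

The outputs `Z` and `Z_C` keep the shapes of the fine-grained and coarse-grained features, so each can be passed to its own output layer.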
Step three: generating a prediction result
The fine-grained traffic flow matrices of the first $t$ time steps, $\{X_i \mid i = 1, \ldots, t\}$, the coarse-grained traffic flow matrices $\{X_i^c \mid i = 1, \ldots, t\}$, and the corresponding adjacency matrices are input into the trained network model to obtain the urban traffic flow predictions of both granularities at the next time step, namely the fine-grained flow $\{X_i \mid i = t+1\}$ and the coarse-grained flow $\{X_i^c \mid i = t+1\}$.
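The pairing of the first $t$ time steps with the next one can be made concrete with a sliding-window sketch. It only shows how history and target slices might be cut from aligned fine-grained and coarse-grained series; the dictionary keys and the function name `make_samples` are hypothetical, and the trained model itself is not shown.

```python
def make_samples(fine_series, coarse_series, t):
    """Cut aligned series into (first t steps, step t+1) pairs for both granularities."""
    if len(fine_series) != len(coarse_series):
        raise ValueError("fine and coarse series must cover the same time steps")
    samples = []
    for start in range(len(fine_series) - t):
        end = start + t
        samples.append({
            "fine_history": fine_series[start:end],
            "coarse_history": coarse_series[start:end],
            "fine_target": fine_series[end],
            "coarse_target": coarse_series[end],
        })
    return samples


# Five time steps; each entry is the flow vector of all nodes at that step.
FINE = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
COARSE = [[3, 3], [5, 4], [7, 5], [9, 6], [11, 7]]
SAMPLES = make_samples(FINE, COARSE, t=3)
```

With five time steps and a window of three, two samples are produced, and each history is fed to the model together with the adjacency matrices to predict its target.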
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.