CN112562312B

CN112562312B - GraphSAGE traffic network data prediction method based on fusion features

Info

Publication number: CN112562312B
Application number: CN202011129295.4A
Authority: CN
Inventors: 徐东伟; 商学天; 魏臣臣; 林臻谦; 丁加丽; 彭航
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2022-10-28
Anticipated expiration: 2040-10-21
Also published as: CN112562312A

Abstract

A GraphSAGE traffic network data prediction method based on fusion features. First, correlation coefficients are calculated from the historical traffic flow data of the road network to build a road network correlation matrix; the connectivity between nodes is then redefined according to the correlation between road network nodes, yielding a topological road network based on temporal correlation. Next, GraphSAGE is used to extract road network feature information from the original traffic road network and from the reconstructed topological road network separately, and the future traffic state of the road network is predicted by fusing the spatio-temporal feature information extracted from the two road networks. By fusing the spatio-temporal features of two different road networks, the invention improves the accuracy of traffic network state data prediction.

Description

GraphSAGE traffic network data prediction method based on fusion features

Technical Field

The invention relates to a GraphSAGE traffic network data prediction method based on fusion features, and belongs to the field of intelligent transportation.

Background

With the rapid development of modern cities, the numbers of people and vehicles have grown rapidly, so urban road congestion has become increasingly severe and troubles both individuals and society. Predicting future traffic state data is therefore of great significance for keeping road traffic flowing and for better regulating the traffic state.

Current road traffic prediction methods mainly include graph convolutional neural networks, denoising autoencoders, support vector machines, feedback neural networks and the like. Most of these methods, however, are transductive and cannot be directly generalized to unseen roads.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a method for predicting the GraphSAGE traffic network data based on fusion characteristics.

The technical scheme adopted by the invention for solving the technical problem is as follows:

A GraphSAGE traffic network data prediction method based on fusion features comprises the following steps:

1) Constructing a topological road network based on temporal correlation: correlation coefficients between road network nodes are calculated from the historical traffic state data of the road network, the connectivity between nodes is redefined according to these coefficients, and a logically correlated road network based on temporal correlation is constructed;

2) Extracting spatio-temporal road network features with GraphSAGE and fusing them: GraphSAGE is used to extract spatio-temporal features from the original road network and from the reconstructed logically correlated road network separately, and the extracted features are fused;

3) Defining the network model loss function, training and adjusting the model parameters with the goal of minimizing the loss, and finally predicting the road network traffic state: a model loss function is defined, the loss is reduced through iterative training with the back-propagation algorithm, and the optimal model parameters are saved so that future traffic state data of the road network can be predicted from its historical traffic state data.

The technical concept of the invention is as follows: first, correlation coefficients are calculated from the historical traffic flow data of the road network to build a road network correlation matrix, and the connectivity between nodes is redefined according to the correlation between road network nodes, yielding a topological road network based on temporal correlation; then GraphSAGE is used to extract road network feature information from the original traffic road network and from the reconstructed topological road network separately, and the future traffic state of the road network is predicted by fusing the spatio-temporal feature information extracted from the two road networks. The method is valuable in the field of intelligent transportation: it extracts the spatio-temporal features of the traffic flow state and improves the accuracy of traffic network state prediction.

The invention has the following beneficial effects: (1) The GraphSAGE graph aggregation algorithm is used to fully mine the spatio-temporal features of the road network. (2) Spatio-temporal features are extracted from the original road network and the reconstructed topological road network separately and then fused, which effectively improves the prediction accuracy for the traffic road network.

Drawings

FIG. 1 is a diagram of a GraphSAGE network model architecture.

FIG. 2 is the prediction result of GraphSAGE traffic network model based on fusion features (3, 10, 2017).

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1 and FIG. 2, a GraphSAGE traffic network data prediction method based on fusion features comprises the following steps:

1) Constructing a topological road network based on temporal correlation, as follows:

1.1) Constructing the original traffic road network

Construct the original traffic road network $G = (V, E)$, where $V = \{v_1, v_2, v_3, \ldots, v_N\}$ and $|V| = N$. $N$ is the number of detector nodes in the traffic network; $E$ is the adjacency matrix of the traffic network, i.e. the spatial relationship between the traffic network nodes; $v_i$ ($i \in \{1, 2, 3, \ldots, N\}$) denotes the traffic node monitored by the $i$-th detector. The set of nodes spatially connected to node $v_i$ is denoted $\mathcal{N}_E(v_i)$. If the road node represented by the $i$-th detector $v_i$ is adjacent to the road node represented by the $j$-th detector $v_j$, then $e_{ij} = 1$; otherwise $e_{ij} = 0$;
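
As an illustrative sketch only, the adjacency definition of step 1.1 can be written in a few lines of plain Python. The function names `build_adjacency` and `neighbors`, the zero-based node indices, the toy four-detector road and the choice to treat adjacency as undirected are all assumptions made for the example; none of them come from the patent text.

```python
def build_adjacency(num_nodes, adjacent_pairs):
    """Build the 0/1 adjacency matrix E of the original road network."""
    E = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in adjacent_pairs:
        E[i][j] = 1
        E[j][i] = 1  # assumption: adjacency is treated as undirected
    return E


def neighbors(E, i):
    """Return the indices of the nodes adjacent to node i."""
    return [j for j, e in enumerate(E[i]) if e == 1]


# Hypothetical toy network: four detectors along one road, 0-1-2-3.
E = build_adjacency(4, [(0, 1), (1, 2), (2, 3)])
print(neighbors(E, 1))
```

In a real road network the pairs would come from the physical layout of the detectors.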

1.2) Computing correlation coefficients between different road network nodes

The Pearson correlation coefficient is used to calculate the correlation between road network nodes. For the road node $v_i$ ($i \in \{1, 2, 3, \ldots, N\}$) of each detector, the historical road state data are $x_i = [x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iT}]$, where $T$ is the number of samples in the historical data. The Pearson correlation coefficient $r_{ij}$ between the road node represented by the $i$-th detector $v_i$ and the road node represented by the $j$-th detector $v_j$ is calculated as:

$$r_{ij} = \frac{\sum_{k=1}^{K} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{K} (x_{ik} - \bar{x}_i)^2}\ \sqrt{\sum_{k=1}^{K} (x_{jk} - \bar{x}_j)^2}}$$

where $K$ is the length of the traffic state data series selected for each detector node when calculating the Pearson correlation coefficient, and $\bar{x}_i$ is the mean of the $K$ selected samples of $x_i$. Calculating the Pearson correlation coefficients between all pairs of detectors yields the $N \times N$ Pearson correlation coefficient matrix of the road network $G$.
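
The Pearson calculation of step 1.2 can be sketched as follows. This is a minimal, dependency-free illustration: the helper names `pearson` and `correlation_matrix` and the three short speed series are hypothetical, and the sketch assumes no series is constant (a constant series would make the denominator zero).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumption: neither series is constant, so the denominator is non-zero.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(series):
    """N x N matrix of pairwise Pearson coefficients."""
    n = len(series)
    return [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]


# Hypothetical speed readings for three detectors (K = 5 samples each).
speeds = [
    [60.0, 58.0, 55.0, 50.0, 52.0],
    [61.0, 59.0, 54.0, 49.0, 53.0],
    [30.0, 35.0, 40.0, 45.0, 41.0],
]
R = correlation_matrix(speeds)
```

In this toy data the first two detectors slow down together and are strongly positively correlated, while the third speeds up over the same period and is negatively correlated with both.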

1.3) Constructing a logically correlated road network based on temporal correlation from the Pearson correlation coefficient matrix

For each detector node $v_i \in V$, using the calculated Pearson correlation coefficients between detectors, the $m$ detectors with the largest correlation coefficients with node $v_i$ (denoted $\{v_{i1}, v_{i2}, \ldots, v_{im}\}$) are selected as its connected-edge neighbors, and the coefficient matrix of the temporal correlation is constructed, where $v_{ik}$ denotes the $k$-th ($k = 1, 2, \ldots, m$) node that has a connected edge with node $v_i$, $v_{im} \in V$, $m = \lfloor p \cdot N \rfloor$, $\lfloor \cdot \rfloor$ denotes rounding down, and $p \in (0, 1)$ is the proportion of more strongly correlated detector nodes selected. The constructed traffic road network is $H = (V, A)$, where $a_{ij}$ denotes the connected-edge relationship between the $i$-th detector $v_i$ and the $j$-th detector $v_j$:

$$a_{ij} = \begin{cases} 1, & v_j \in \{v_{i1}, v_{i2}, \ldots, v_{im}\} \\ 0, & \text{otherwise} \end{cases}$$
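
The neighbor selection of step 1.3 might look like the sketch below. It rests on several assumptions that the text does not state explicitly: that $m$ is obtained by rounding down $p \cdot N$, that "larger coefficient" refers to the signed value rather than the absolute value, and that a node is never chosen as its own neighbor. The function name `build_logical_adjacency` and the 4 x 4 correlation matrix are hypothetical.

```python
import math


def build_logical_adjacency(R, p):
    """Connect each node to the m = floor(p * N) most correlated other nodes."""
    n = len(R)
    m = math.floor(p * n)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        # Assumption: a node is not its own neighbor, so index i is excluded.
        candidates = [j for j in range(n) if j != i]
        # Assumption: rank by signed correlation value, largest first.
        candidates.sort(key=lambda j: R[i][j], reverse=True)
        for j in candidates[:m]:
            A[i][j] = 1
    return A


# Hypothetical 4 x 4 Pearson correlation matrix.
R = [
    [1.0, 0.9, 0.2, 0.5],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.8],
    [0.5, 0.1, 0.8, 1.0],
]
A = build_logical_adjacency(R, p=0.5)  # m = floor(0.5 * 4) = 2
```

Because each row is ranked independently, the resulting matrix is not guaranteed to be symmetric in general, even though it happens to be symmetric for this toy input.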

2) Extracting the spatio-temporal features of the road network based on GraphSAGE and fusing the features, as follows:

For both the original road network and the constructed logically correlated road network based on temporal correlation, the features of each detector node's neighbor nodes are aggregated by mean aggregation to capture spatial information. With $T$ layers of mean aggregation, the aggregation at layer $t$ is calculated as:

$$h^{t}_{\mathcal{N}_E(v_i)} = \mathrm{MEAN}\left(\left\{h^{t-1}_{E,v_j},\ \forall v_j \in \mathcal{N}_E(v_i)\right\}\right)$$

$$h^{t}_{E,v_i} = \sigma_1\left(W_1^{t} \cdot \mathrm{CONCAT}\left(h^{t-1}_{E,v_i},\ h^{t}_{\mathcal{N}_E(v_i)}\right)\right)$$

$$h^{t}_{\mathcal{N}_A(v_i)} = \mathrm{MEAN}\left(\left\{h^{t-1}_{A,v_j},\ \forall v_j \in \mathcal{N}_A(v_i)\right\}\right)$$

$$h^{t}_{A,v_i} = \sigma_2\left(W_2^{t} \cdot \mathrm{CONCAT}\left(h^{t-1}_{A,v_i},\ h^{t}_{\mathcal{N}_A(v_i)}\right)\right)$$

where $h^{t}_{E,v_i}$ and $h^{t}_{A,v_i}$ denote the layer-$t$ features of node $v_i$ extracted by the GraphSAGE traffic network model on the original road network and on the logically correlated road network, respectively; $h^{t}_{\mathcal{N}_E(v_i)}$ and $h^{t}_{\mathcal{N}_A(v_i)}$ denote the corresponding neighbor features obtained at layer $t$ by mean aggregation; $\mathcal{N}_E(v_i)$ and $\mathcal{N}_A(v_i)$ denote the sets of nodes that have a connected edge with node $v_i$, i.e. its neighbor node sets, in the original road network and in the logically correlated road network, respectively; MEAN takes the mean of the features of different nodes over each feature attribute; CONCAT denotes feature concatenation; $\sigma_1$ and $\sigma_2$ are activation functions; $W_1^{t}$ and $W_2^{t}$ are the weight parameters to be trained by the model.
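
One layer of the mean aggregation described above (MEAN over the neighbor set, CONCAT with the node's own feature, a weight matrix, then an activation) can be sketched as follows. The sketch follows the standard GraphSAGE mean aggregator and is not taken from the patent's own formula; the names `mean_aggregate_layer`, `matvec` and `relu`, the zero vector used for isolated nodes, and the toy weights are all assumptions. The same function would be applied once with the original adjacency matrix and once with the logical one.

```python
def relu(v):
    """Element-wise ReLU on a feature vector."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Multiply weight matrix W (list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def mean_aggregate_layer(H, adjacency, W, activation=relu):
    """One GraphSAGE mean-aggregation layer.

    H: feature vectors of all nodes from the previous layer.
    adjacency: 0/1 matrix of the road network being processed.
    W: weight matrix of shape (out_dim, 2 * in_dim).
    """
    dim = len(H[0])
    H_next = []
    for i, h_i in enumerate(H):
        nbrs = [j for j, a in enumerate(adjacency[i]) if a == 1]
        if nbrs:
            # MEAN: average each feature attribute over the neighbor set.
            h_nbr = [sum(H[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        else:
            h_nbr = [0.0] * dim  # assumption: isolated nodes aggregate to zeros
        # CONCAT own and neighbor features, apply W, then the activation.
        H_next.append(activation(matvec(W, h_i + h_nbr)))
    return H_next


# Hypothetical toy input: three nodes on a line, two features each.
E = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H0 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[1.0, 0.0, 1.0, 0.0],  # output 0 = own feature 0 + neighbor-mean feature 0
     [0.0, 1.0, 0.0, 1.0]]  # output 1 = own feature 1 + neighbor-mean feature 1
H1 = mean_aggregate_layer(H0, E, W)
```

Stacking the layer $T$ times, with a separate weight matrix per layer, gives each node a receptive field of $T$ hops.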

After $T$ layers of GraphSAGE mean aggregation have been applied to all nodes in the road network, the aggregated features based on the adjacency matrix and on the correlation coefficient matrix are obtained, respectively. Feature fusion is then performed on these two aggregated features to obtain the fused feature. In the fusion calculation, $W^T$ and $b$ are the parameters to be learned by the model, $\sigma_1$ is the ReLU function and $\sigma_2$ is the Sigmoid function, with the expressions:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
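
The two activation functions named above are standard and can be written directly. The `fuse` function that follows them is only an assumed illustration of feature fusion (concatenate the two aggregated features, apply a learned weight matrix and bias, then a Sigmoid); the exact fusion formula is not recoverable from the text, so this particular form, along with the names `fuse`, `W` and `b`, should be read as hypothetical.

```python
import math


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Sigmoid activation: 1 / (1 + exp(-x)); fine for moderate |x|."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(h_orig, h_logic, W, b):
    """Assumed fusion: sigmoid(W . CONCAT(h_orig, h_logic) + b).

    This form is an illustration only and is not confirmed by the source.
    """
    z = h_orig + h_logic  # CONCAT of the two aggregated feature vectors
    return [sigmoid(sum(w * a for w, a in zip(row, z)) + b_k)
            for row, b_k in zip(W, b)]


# Hypothetical aggregated features from the two road networks.
fused = fuse([0.5, 1.0], [1.5, 0.0], W=[[1.0, -1.0, 0.0, 0.0]], b=[0.5])
```

A Sigmoid output lies in (0, 1), which matches a target that has been scaled to that range before training.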

3) Defining the network model loss function and predicting the traffic network state data, as follows:

Define the network model loss function $L_G$, in which $\alpha$ is the reconstruction error coefficient and $X$ is the real data of the future traffic state. The loss function is minimized by iterating the back-propagation algorithm, and the optimal model parameters of the fusion-feature GraphSAGE traffic network model are retained. The model output is then de-normalized to obtain the predicted traffic network state data. The de-normalization formula is:

$$F_{i(t+q)} = \hat{F}_{i(t+q)} \left(x_i^{\max} - x_i^{\min}\right) + x_i^{\min}$$

where $x_i^{\min}$ and $x_i^{\max}$ are the minimum and maximum speed of the $i$-th road section, respectively, $\hat{F}_{i(t+q)}$ is the normalized model output, and $F_{i(t+q)}$ is the predicted speed of the $i$-th road at time $(t+q)$.
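
Since the de-normalization is described in terms of each road section's minimum and maximum speed, it is presumably the inverse of min-max scaling. Under that assumption it can be sketched as below; the function names and the speed values are hypothetical.

```python
def normalize(v, v_min, v_max):
    """Min-max scale a speed into [0, 1] (assumed normalization scheme)."""
    return (v - v_min) / (v_max - v_min)


def denormalize(y_norm, v_min, v_max):
    """Invert min-max scaling: map a model output back to a speed."""
    return y_norm * (v_max - v_min) + v_min


# Hypothetical road section with speeds between 20 and 60.
predicted_speed = denormalize(0.5, v_min=20.0, v_max=60.0)
```

The minimum and maximum would be taken per road section from the training data, so each section is mapped back to its own speed range.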

Example: an experiment on real data was carried out as follows:

1) Selecting experimental data

The experimental data set consists of speed data from 323 detectors on the Seattle freeway network for the whole of 2017, sampled at 5-minute intervals.

2) Parameter determination

The number of detector nodes in the traffic network is N = 323, and the number of features per node is F = 12. The training set and test set are split at a ratio of a = 0.8. When constructing the road network structure based on temporal correlation, the length of the historical traffic state data selected for each detector node to calculate the Pearson correlation coefficient is K = 288 × 12 = 3456, and the selection proportion of detector nodes with larger Pearson correlation coefficients is p = 0.25. The number of GraphSAGE mean aggregation layers is T = 3, with 128, 64 and 32 hidden units in the respective layers, and the activation function σ is the ReLU activation function. The reconstruction error coefficient is α = 100. The Adam optimizer is used to optimize the model parameters.
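
Two of the quantities above follow from simple arithmetic, shown here as a short check. The factor 288 equals the number of 5-minute intervals in a day, so K = 288 × 12 is read here as twelve days of samples; that reading, and the rule m = floor(p × N) for the number of logical neighbors per node, are assumptions.

```python
import math

SAMPLES_PER_DAY = 24 * 60 // 5  # number of 5-minute intervals in one day
N = 323                         # detector nodes
p = 0.25                        # selection proportion

K = SAMPLES_PER_DAY * 12        # assumed reading: twelve days of samples
m = math.floor(p * N)           # assumed rule for neighbors per node

print(SAMPLES_PER_DAY, K, m)
```

Under these assumptions each detector would be linked to 80 other detectors in the logically correlated road network.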

3) Results of the experiment

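The model evaluation metrics are the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE), defined respectively as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(y_k - \hat{y}_k\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{k=1}^{n}\left|y_k - \hat{y}_k\right|, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{k=1}^{n}\left|\frac{y_k - \hat{y}_k}{y_k}\right|$$

where $y_k$ is the real traffic state data at the $k$-th time, $\hat{y}_k$ is the predicted traffic state data at the $k$-th time, and $n$ is the number of time points evaluated.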
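
The three evaluation metrics can be computed as in the following sketch, which uses their standard definitions. The function names and the three sample values are hypothetical, and MAPE assumes that no real value is zero.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero in y_true)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical real and predicted speeds at three time points.
y_true = [50.0, 60.0, 40.0]
y_pred = [48.0, 63.0, 40.0]
scores = (rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE expresses the error relative to the real speed.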

In the result analysis, the prediction error over the full year of data is evaluated. The experimental results are shown in Table 1, which gives the prediction results on the 2017 full-year Seattle traffic road data:

Time	RMSE	MAE	MAPE (%)
2017	3.72	2.51	5.59

Table 1.

The embodiments described in this specification are merely exemplary of implementations of the inventive concepts and are provided for illustrative purposes only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the embodiments, but is to be accorded the widest scope consistent with the principles and equivalents thereof as contemplated by those skilled in the art.

Claims

1. A GraphSAGE traffic network data prediction method based on fusion features, characterized by comprising the following steps:

1) Constructing a topological road network based on temporal correlation: correlation coefficients between road network nodes are calculated from the historical traffic state data of the road network, the connectivity between nodes is redefined according to these coefficients, and a logically correlated road network based on temporal correlation is constructed;

2) Extracting spatio-temporal road network features with GraphSAGE and fusing them: GraphSAGE is used to extract spatio-temporal features from the original road network and from the reconstructed logically correlated road network separately, and the extracted features are fused;

3) Defining the network model loss function, training and adjusting the model parameters with the goal of minimizing the loss, and finally predicting the road network traffic state: a model loss function is defined, the loss is reduced through iterative training with the back-propagation algorithm, and the optimal model parameters are saved so that future traffic state data of the road network can be predicted from its historical traffic state data;

the process of step 1) is as follows:

1.1) Constructing the original traffic road network

Construct the original traffic road network $G = (V, E)$, where $V = \{v_1, v_2, v_3, \ldots, v_N\}$ and $|V| = N$. $N$ is the number of detector nodes in the traffic network; $E$ is the adjacency matrix of the traffic network, i.e. the spatial relationship between the traffic network nodes; $v_i$ ($i \in \{1, 2, 3, \ldots, N\}$) denotes the traffic node monitored by the $i$-th detector. The set of nodes spatially connected to node $v_i$ is denoted $\mathcal{N}_E(v_i)$. If the road node represented by the $i$-th detector $v_i$ is adjacent to the road node represented by the $j$-th detector $v_j$, then $e_{ij} = 1$; otherwise $e_{ij} = 0$;

1.2) Computing correlation coefficients between different road network nodes

The Pearson correlation coefficient is used to calculate the correlation between road network nodes. For the road node $v_i$ ($i \in \{1, 2, 3, \ldots, N\}$) of each detector, the historical road state data are $x_i = [x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iT}]$, where $T$ is the number of samples in the historical data. The Pearson correlation coefficient $r_{ij}$ between the road node represented by the $i$-th detector $v_i$ and the road node represented by the $j$-th detector $v_j$ is calculated as:

$$r_{ij} = \frac{\sum_{k=1}^{K} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{K} (x_{ik} - \bar{x}_i)^2}\ \sqrt{\sum_{k=1}^{K} (x_{jk} - \bar{x}_j)^2}}$$

where $K$ is the length of the traffic state data series selected for each detector node when calculating the Pearson correlation coefficient, and $\bar{x}_i$ is the mean of the $K$ selected samples of $x_i$; calculating the Pearson correlation coefficients between all pairs of detectors yields the $N \times N$ Pearson correlation coefficient matrix of the road network $G$;

For each detector node $v_i \in V$, using the calculated Pearson correlation coefficients between detectors, the $m$ detectors with the largest correlation coefficients with node $v_i$ are selected as its connected-edge neighbors and are denoted $\{v_{i1}, v_{i2}, \ldots, v_{im}\}$. The coefficient matrix of the temporal correlation is constructed, where $v_{ik}$ denotes the $k$-th ($k = 1, 2, \ldots, m$) node that has a connected edge with node $v_i$, $v_{im} \in V$, $m = \lfloor p \cdot N \rfloor$, $\lfloor \cdot \rfloor$ denotes rounding down, and $p \in (0, 1)$ is the proportion of more strongly correlated detector nodes selected. The constructed traffic road network is $H = (V, A)$, where

$$a_{ij} = \begin{cases} 1, & v_j \in \{v_{i1}, v_{i2}, \ldots, v_{im}\} \\ 0, & \text{otherwise} \end{cases}$$

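the process of step 2) is as follows:

for both the original road network and the constructed logically correlated road network based on temporal correlation, the features of each detector node's neighbor nodes are aggregated by mean aggregation; with $T$ layers of mean aggregation, the aggregation at layer $t$ is calculated as:

$$h^{t}_{\mathcal{N}_E(v_i)} = \mathrm{MEAN}\left(\left\{h^{t-1}_{E,v_j},\ \forall v_j \in \mathcal{N}_E(v_i)\right\}\right), \qquad h^{t}_{E,v_i} = \sigma_1\left(W_1^{t} \cdot \mathrm{CONCAT}\left(h^{t-1}_{E,v_i},\ h^{t}_{\mathcal{N}_E(v_i)}\right)\right)$$

$$h^{t}_{\mathcal{N}_A(v_i)} = \mathrm{MEAN}\left(\left\{h^{t-1}_{A,v_j},\ \forall v_j \in \mathcal{N}_A(v_i)\right\}\right), \qquad h^{t}_{A,v_i} = \sigma_2\left(W_2^{t} \cdot \mathrm{CONCAT}\left(h^{t-1}_{A,v_i},\ h^{t}_{\mathcal{N}_A(v_i)}\right)\right)$$

where $h^{t}_{E,v_i}$ and $h^{t}_{A,v_i}$ denote the layer-$t$ features of node $v_i$ extracted by the GraphSAGE traffic network model on the original road network and on the logically correlated road network, respectively; $h^{t}_{\mathcal{N}_E(v_i)}$ and $h^{t}_{\mathcal{N}_A(v_i)}$ denote the corresponding neighbor features obtained at layer $t$ by mean aggregation; $\mathcal{N}_E(v_i)$ and $\mathcal{N}_A(v_i)$ denote the sets of nodes that have a connected edge with node $v_i$, i.e. its neighbor node sets, in the original road network and in the logically correlated road network, respectively; MEAN takes the mean of the features of different nodes over each feature attribute; CONCAT denotes feature concatenation; $\sigma_1$ and $\sigma_2$ are activation functions; $W_1^{t}$ and $W_2^{t}$ are the weight parameters to be trained by the model;

after $T$ layers of GraphSAGE mean aggregation have been applied to all nodes in the road network, the aggregated features based on the adjacency matrix and on the correlation coefficient matrix are obtained, respectively, and feature fusion is performed on the two aggregated features to obtain the fused feature;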

the process of step 3) is as follows:

define the network model loss function $L_G$, in which $\alpha$ is the reconstruction error coefficient and $X$ is the real data of the future traffic state; the loss function is minimized by iterating the back-propagation algorithm, and the optimal model parameters of the fusion-feature GraphSAGE traffic network model are retained; the model output is then de-normalized to obtain the predicted traffic network state data, and the de-normalization formula is:

$$F_{i(t+q)} = \hat{F}_{i(t+q)} \left(x_i^{\max} - x_i^{\min}\right) + x_i^{\min}$$

where $x_i^{\min}$ and $x_i^{\max}$ are the minimum and maximum speed of the $i$-th road section, respectively, $\hat{F}_{i(t+q)}$ is the normalized model output, and $F_{i(t+q)}$ is the predicted speed of the $i$-th road at time $(t+q)$.