CN115700628A

CN115700628A - Traffic flow prediction method and system containing missing data

Info

Publication number: CN115700628A
Application number: CN202211301263.7A
Authority: CN
Inventors: 金雅妮; 刘彩苹; 谢鲲; 文吉刚; 张大方
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2022-10-24
Filing date: 2022-10-24
Publication date: 2023-02-07

Abstract

The invention discloses a traffic flow prediction method containing missing data, which comprises the following steps: acquiring a traffic data set of a certain area, wherein the traffic data set contains missing data, and reconstructing the traffic data set into a traffic flow data matrix X; inputting the traffic flow data matrix X into the orthogonal non-negative matrix factorization (ONMF) module of a trained spatio-temporal prediction model to form K clusters, and filling the missing data in each cluster with the generalized matrix factorization (GMF) module of the spatio-temporal prediction model to obtain a filled traffic flow data matrix; standardizing the filled traffic flow data matrix, and modeling the standardized traffic flow data matrix as a three-dimensional tensor according to the history step length H and the prediction window W; and inputting the three-dimensional tensor into the graph convolutional recurrent neural network (GCRNN) of the trained spatio-temporal prediction model to obtain prediction data Y'. The invention is generally applicable to traffic prediction with missing data, and learns spatio-temporal features at a finer granularity to achieve more effective traffic flow prediction.

Description

Traffic flow prediction method and system containing missing data

Technical Field

The invention belongs to the technical field of deep learning and intelligent traffic in artificial intelligence, and particularly relates to a traffic flow prediction method and system with missing data, which are realized by using a Fine-grained filling Graph Convolution neural Network (FCGCRN).

Background

In recent years, as sensors and monitoring systems collect massive amounts of data, prediction tasks have been widely studied in fields such as climate, finance and traffic. Traffic prediction is a classic application and an indispensable component of an Intelligent Transportation System (ITS for short); it plays an important role in alleviating traffic congestion, reducing traffic accidents and improving the quality of urban traffic services. The task of traffic flow prediction is to predict future states given historical traffic flow and existing road information. However, the future value of each traffic flow depends not only on the historical values of that flow but also on other traffic flows. Meanwhile, traffic data may be lost while sensors collect traffic flow, owing to network jitter, equipment failure and the like. Therefore, accurately predicting the future state of traffic flow that contains missing data is a challenging problem.

Existing research on traffic flow prediction falls mainly into three types of algorithms. The first type is statistical methods, which assume that each traffic flow is a stationary sequence and fit the traffic data with a linear algorithm, such as Historical Average (HA), Autoregressive Integrated Moving Average (ARIMA) and Gaussian Process (GP); the second type is single neural network methods, which adopt a Recurrent Neural Network (RNN) and its variants, the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), and can process long-range time-series traffic data in a short time; the third type is hybrid neural network methods, which combine a Convolutional Neural Network (CNN) or a Graph Convolutional Network (GCN) with a Recurrent Neural Network (RNN) to capture, respectively, the complex spatial dependencies between traffic flows and the long-term temporal dependencies within a single traffic sequence.

However, the existing traffic flow prediction methods above all have non-negligible technical problems: first, the statistical methods and the single neural network methods consider the characteristics of traffic flow data only in the time dimension and do not explicitly model the interdependence between different time series, so their prediction performance is low; second, the CNN in the hybrid neural network methods encapsulates the interaction between traffic flows into a global hidden state and can capture spatial correlation only on regular grid structures, so its representation capability is weak for non-grid spatial relationships, which degrades prediction accuracy; third, the GCN in the hybrid neural network methods relies on a predefined graph, which makes the model less general; fourth, the hybrid methods lack a suitable parameter learning scheme, so the spatio-temporal correlation cannot be represented at fine granularity, which further degrades prediction accuracy; fifth, all three types of methods are highly sensitive to missing traffic data, which makes the model susceptible to noise while learning features and thereby degrades prediction performance.

Disclosure of Invention

In view of the above defects or improvement needs of the prior art, the invention provides a traffic flow prediction method and system containing missing data, and aims to solve the following technical problems of existing traffic flow prediction methods: the future state of the traffic flow cannot be faithfully reflected because the spatial correlation of traffic data is not captured; the spatial representation is limited to grid data because complex spatial correlation in non-Euclidean space cannot be represented; the hybrid neural network methods lack generality because the predefined graph limits the spatial representation; the traffic flow prediction accuracy suffers because the traffic flow cannot be represented at fine granularity and node-specific patterns cannot be captured; and the ability of the feature learning module to learn temporal and spatial correlation is impaired by data loss caused by network or equipment faults.

To achieve the above object, according to one aspect of the present invention, there is provided a traffic flow prediction method including missing data, including the steps of:

(1) Acquiring a traffic data set of a certain area, wherein the traffic data set comprises missing data, and reconstructing the traffic data set into a traffic flow data matrix;

(2) Inputting the traffic flow data matrix X obtained in step (1) into the orthogonal non-negative matrix factorization (ONMF) module of the trained spatio-temporal prediction model to form K clusters, filling the missing data in each cluster with the generalized matrix factorization (GMF) module of the spatio-temporal prediction model to obtain a filled traffic flow data matrix, standardizing the filled traffic flow data matrix to obtain a standardized traffic flow data matrix, and modeling the standardized traffic flow data matrix as a three-dimensional tensor according to the history step length H and the prediction window W;

(3) Inputting the three-dimensional tensor obtained in step (2) into the graph convolutional recurrent neural network (GCRNN) of the trained spatio-temporal prediction model to obtain prediction data Y'.

Preferably, the traffic data is three-dimensional tensor data { time, node, traffic characteristics }, wherein the time refers to the time when the node acquires the traffic characteristics, the node refers to a single sensor, and the traffic characteristics comprise a vehicle speed characteristic, a traffic flow characteristic and a person number characteristic;

Preferably, in step (1), the traffic data set Z is a non-negative three-dimensional tensor that records the traffic data, at T moments, of each node n among all nodes (namely sensors) arranged on all streets in the area, where $n \in [1, N]$, $t \in [1, T]$, T is any positive integer, and N is the total number of sensors arranged on all streets in the area;

wherein each entry of Z is the c-th feature value of the n-th node at the t-th moment, $c \in [1, C]$, and C represents the number of traffic feature categories.

Preferably, the process of inputting the traffic flow data matrix X obtained in the step (1) into the ONMF module to form K clusters in the step (2) specifically includes:

(2-1) initializing matrix factors F and G of the traffic flow data matrix X to random values within (0,1);

(2-2) starting from the matrix factors F and G initialized in step (2-1), applying the update rules for F and G to the observable data in the matrix, and stopping the iteration when the error $\|X - FG^{T}\|$ converges, thereby obtaining the updated matrix factor G;

(2-3) clustering the traffic flow data matrix X into K clusters according to the updated matrix factor G obtained in the step (2-2);
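Steps (2-1) to (2-3) can be sketched in NumPy as follows. The update-rule formulas are not reproduced in this text, so the sketch assumes a common masked multiplicative update for F and an orthogonality-encouraging multiplicative update for G; the patented rules may differ. The names `onmf_cluster`, `mask`, `n_iter`, `tol` and `seed` are hypothetical.

```python
import numpy as np


def onmf_cluster(X, mask, K, n_iter=200, tol=1e-6, seed=0):
    """Cluster the N columns (nodes) of X into K groups from observed entries only.

    X    : (T, N) non-negative matrix; missing entries may hold any value or NaN
    mask : (T, N) boolean matrix, True where the entry of X was observed
    """
    rng = np.random.default_rng(seed)
    T, N = X.shape
    # Step (2-1): random initial factors drawn from the unit interval.
    F = rng.uniform(0.0, 1.0, size=(T, K))
    G = rng.uniform(0.0, 1.0, size=(N, K))
    M = mask.astype(float)
    Xo = np.where(mask, X, 0.0)  # missing entries contribute nothing
    eps = 1e-12
    prev_err = np.inf
    # Step (2-2): multiplicative updates (assumed form, see the lead-in).
    for _ in range(n_iter):
        F *= (Xo @ G) / ((M * (F @ G.T)) @ G + eps)
        XtF = Xo.T @ F
        G *= XtF / (G @ (G.T @ XtF) + eps)
        err = np.linalg.norm(Xo - M * (F @ G.T))
        if abs(prev_err - err) < tol:  # stop once the error has converged
            break
        prev_err = err
    # Step (2-3): each node joins the cluster with the largest entry in its row of G.
    labels = G.argmax(axis=1)
    return F, G, labels
```

Because both updates only multiply by non-negative ratios, F and G stay non-negative throughout, which is what allows the rows of G to be read as cluster memberships.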

Preferably, the process in step (2) of filling the traffic flow data matrix X in each cluster with the GMF module to obtain the filled traffic flow data matrix, standardizing the filled traffic flow data matrix and modeling it as a three-dimensional tensor comprises the following sub-steps:

(2-4) in each cluster obtained in step (2-3), dividing the traffic flow data matrix X into an observable data set and an unobservable data set, reconstructing the observable data set into a time vector $v_p \in \mathbb{R}^{m}$, a node vector $v_q \in \mathbb{R}^{m}$ and a traffic flow vector $v_y \in \mathbb{R}^{m}$, and reconstructing the unobservable data set into a time vector $v'_p \in \mathbb{R}^{m'}$ and a node vector $v'_q \in \mathbb{R}^{m'}$, wherein m represents the total number of observable data in the observable data set and m' represents the total number of unobservable data in the unobservable data set;

(2-5) inputting the two vectors $v_p$ and $v_q$ obtained in step (2-4) into the embedding layer of the GMF module to obtain the output of the embedding layer, i.e. the time matrix factor P and the node matrix factor Q:

$P = e_1(v_p)$,

$Q = e_2(v_q)$,

wherein $P \in \mathbb{R}^{m \times a}$ and $Q \in \mathbb{R}^{m \times a}$ are the time matrix factor and the node matrix factor, respectively, a = 16 is the latent factor dimension, and $e_1(\cdot)$ and $e_2(\cdot)$ represent embedding functions, both of which are the torch.nn.Embedding() function in the PyTorch framework;

(2-6) inputting the time matrix factor P and the node matrix factor Q obtained in step (2-5) into the decomposition layer of the GMF module to obtain an output result f(P, Q):

$f(P, Q) = P \odot Q$,

wherein $\odot$ denotes the element-wise product.

(2-7) inputting the output result f(P, Q) obtained in step (2-6) into the filling layer of the GMF module, the output of which is the filling result g(P, Q):

$g(P, Q) = a_{out}(W^{T}(P \odot Q) + b)$,

wherein $a_{out}$ is the ReLU activation function, and W and b represent the learnable weight and bias parameters in the GMF module, respectively;

(2-8) evaluating the filling result of step (2-7) with the mean square error (MSE), i.e. computing the error value MSE between the traffic flow vector $v_y$ of step (2-4) and the filling result g(P, Q) of step (2-7), which is calculated by the formula:

$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\bigl(v_{y,i} - g(P, Q)_i\bigr)^{2}$;

(2-9) updating the weight W and the bias parameter b which can be learned in the GMF module through an Adam optimizer;

(2-10) repeating the steps (2-8) to (2-9) until the MSE is smaller than the threshold value or the training times reach a preset turn, thereby obtaining a trained GMF module;

(2-11) inputting the unobservable data set obtained in step (2-4) into the GMF module trained in step (2-10) to obtain a traffic flow vector $v'_y \in \mathbb{R}^{m'}$;

(2-12) obtaining the filled traffic flow data matrix from the time vector $v_p$, node vector $v_q$ and traffic flow vector $v_y$ of the observable data set and the time vector $v'_p$ and node vector $v'_q$ of the unobservable data set obtained in step (2-4), together with the traffic flow vector $v'_y$ obtained in step (2-11);

(2-13) standardizing the filled traffic flow data matrix of step (2-12) to obtain a standardized traffic flow data matrix, wherein the standardization uses the mean μ and the standard deviation of the filled traffic flow data matrix;

(2-14) reconstructing the standardized traffic flow data matrix obtained in step (2-13), according to the history step length H and the prediction window W, into a three-dimensional input tensor and a three-dimensional target tensor $Y \in \mathbb{R}^{(T-H-W+1) \times N \times W}$.
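Steps (2-13) and (2-14) can be illustrated with the following minimal sketch. It assumes that standardization is the usual z-score (subtract the mean, divide by the standard deviation) and that the input tensor has shape (T-H-W+1) × N × H; only the shape of Y is stated in the text. The function names `standardize` and `make_windows` are hypothetical.

```python
import numpy as np


def standardize(X_filled):
    """Z-score the filled (T, N) matrix with its global mean and standard deviation."""
    mu = X_filled.mean()
    sigma = X_filled.std()
    return (X_filled - mu) / sigma, mu, sigma


def make_windows(X_std, H, W):
    """Slide a window over time: H history steps in, W future steps out.

    Returns inputs of shape (S, N, H) and targets of shape (S, N, W),
    where S = T - H - W + 1.
    """
    T, N = X_std.shape
    S = T - H - W + 1
    inputs = np.stack([X_std[s:s + H].T for s in range(S)])
    targets = np.stack([X_std[s + H:s + H + W].T for s in range(S)])
    return inputs, targets
```

For example, with T = 10, H = 4 and W = 2 the sketch yields S = 5 samples, and the first target window starts at the time step immediately following the first history window.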

Preferably, the GCRNN network is trained by the following steps:

(3-1) dividing the three-dimensional input tensor obtained in step (2-14) and the tensor Y into a training set and a test set at a ratio of 6:4;

(3-2) performing adaptive graph learning in the GCRNN through the parameter E to obtain the adjacency matrix, wherein $E \in \mathbb{R}^{N \times e}$ is a learnable parameter matrix initialized with the torch.FloatTensor() function in the PyTorch framework, and the best values of e found by parameter tuning are 2 and 10;

(3-3) inputting the data at time t of the training set of the three-dimensional tensor obtained in step (3-1), together with the adjacency matrix obtained in step (3-2), into the graph convolutional neural network to obtain the graph convolution result $H_K \in \mathbb{R}^{N \times h}$ of the K-th cluster, wherein $G_K \in \mathbb{R}^{N \times N}$ is the Laplacian matrix of the K-th cluster, $\theta_K \in \mathbb{R}^{H \times h}$ and $b_K \in \mathbb{R}^{h}$ are the learnable parameters of the K-th cluster, and h is the number of hidden-layer neurons in the graph convolutional recurrent neural network;

(3-4) inputting the data at time t of the training set of the three-dimensional tensor obtained in step (3-1) into the recurrent neural network to obtain the characterization result $h_t$ at time t;

(3-5) inputting the output $h_t$ obtained in step (3-4-4) into the two-dimensional convolution layer of the GCRNN network (as shown in FIG. 3) to obtain the final traffic flow prediction result $Y' \in \mathbb{R}^{N \times W}$;

The specific implementation of this step is as follows:

$Y' = h_t \star f_{1 \times h}$,

wherein the number of channels of the two-dimensional convolution layer is 1, the output is W = 12, and $f_{1 \times h}$ denotes a convolution kernel of size (1, h) in the two-dimensional convolution layer, where h = 64;

(3-6) calculating the loss value O(Y, Y') between the prediction result Y' obtained in step (3-5) and the tensor Y obtained in step (2-14) with an L1 loss function, which measures the absolute error between Y and Y';

(3-7) iteratively updating the learnable parameter E of step (3-2) and the learnable parameters of steps (3-4-1) to (3-4-4) using the loss function of step (3-6) and the Adam optimizer in the PyTorch framework;

(3-8) repeating the training process of steps (3-6) to (3-7) until the number of iterations of step (3-7) reaches the preset value (100 in the invention) or the loss value O(Y, Y') of step (3-6) is less than a set threshold, at which point training ends and a preliminarily trained GCRNN model is obtained;

and (3-9) verifying the GCRNN model preliminarily trained in the step (3-8) by using the test set obtained in the step (3-1) until the prediction error is optimal, thereby obtaining a trained space-time prediction model.
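The data split of step (3-1), the output layer of step (3-5) and the loss of step (3-6) can be sketched as below. The sketch assumes a chronological split and a mean-reduced L1 loss, neither of which is spelled out in the text, and it expresses the (1, h) convolution as a matrix product, which gives the same result when each of the W output channels has one 1 × h kernel. All function names are hypothetical.

```python
import numpy as np


def split_6_4(inputs, targets):
    """Split samples 6:4 along the first axis, keeping their original order."""
    cut = (len(inputs) * 6) // 10
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


def predict_from_hidden(h_t, kernels):
    """Map the hidden state h_t (N, h) to a forecast Y' (N, W).

    kernels has shape (W, h): one 1 x h kernel per output step, so sliding
    each kernel over the rows of h_t reduces to a matrix product.
    """
    return h_t @ kernels.T


def l1_loss(Y, Y_pred):
    """Mean absolute error between the target Y and the prediction Y'."""
    return np.abs(Y - Y_pred).mean()
```

With h = 64 and W = 12 as in the text, `kernels` would have shape (12, 64) and every node would receive a 12-step forecast.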

Preferably, step (3-4) comprises the sub-steps of:

(3-4-1) inputting the data at time t of the training set of the three-dimensional tensor obtained in step (3-1), the adjacency matrix obtained in step (3-2) and the characterization result $h_{t-1}$ at the previous time t-1 into the GCRNN to obtain the update gate $z_t \in \mathbb{R}^{N \times h}$ at time t, wherein $h_0 \in \mathbb{R}^{N \times h}$ represents the initial state and is an all-zero matrix, $[\cdot, \cdot]$ denotes the concatenation operation, the update gate $z_t$ at time t has its own learnable parameters in the K-th cluster, and $\sigma(\cdot)$ is the sigmoid activation function;

(3-4-2) inputting the data at time t of the training set of the three-dimensional tensor obtained in step (3-1), the adjacency matrix obtained in step (3-2) and the characterization result $h_{t-1}$ at the previous time t-1 into the GCRNN to obtain the reset gate $r_t \in \mathbb{R}^{N \times h}$ at time t, wherein the reset gate $r_t$ at time t has its own learnable parameters in the K-th cluster;

(3-4-3) inputting the data at time t of the training set of the three-dimensional tensor obtained in step (3-1), the reset gate $r_t$ obtained in step (3-4-2), the adjacency matrix obtained in step (3-2) and the characterization result $h_{t-1}$ at the previous time t-1 into the GCRNN to obtain the transmission state at time t, wherein $\odot$ denotes the element-wise product and the transmission state at time t has its own learnable parameters in the K-th cluster;

(3-4-4) inputting the update gate $z_t$ obtained in step (3-4-1), the characterization result $h_{t-1}$ at the previous time t-1 and the transmission state obtained in step (3-4-3) into the GCRNN to obtain the characterization result $h_t \in \mathbb{R}^{N \times h}$ at the current time t, wherein $z_t \odot h_{t-1}$ selectively forgets the information $h_{t-1}$ of the previous time t-1, and the remaining term selectively memorizes the transmission state, which contains the information of the current time t.
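One recurrent step covering sub-steps (3-4-1) to (3-4-4) might look like the sketch below. Since the gate formulas are not reproduced in this text, it rests on several assumptions: the adjacency matrix is a row-wise softmax of ReLU(E Eᵀ), the transmission state uses a tanh activation, a single first-order graph propagation stands in for the Chebyshev approximation, and one parameter set is used instead of cluster-specific parameters. The names `adaptive_adjacency`, `gcgru_step` and the `params` keys are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def adaptive_adjacency(E):
    """Row-normalised adjacency from a learnable node embedding E of shape (N, e)."""
    scores = np.maximum(E @ E.T, 0.0)                  # ReLU(E E^T)
    scores = scores - scores.max(axis=1, keepdims=True)
    expo = np.exp(scores)
    return expo / expo.sum(axis=1, keepdims=True)      # softmax over each row


def gcgru_step(x_t, h_prev, A, params):
    """One graph-convolutional GRU step for inputs x_t (N, H) and state h_prev (N, h)."""
    xh = np.concatenate([x_t, h_prev], axis=1)                 # [x_t, h_{t-1}]
    z_t = sigmoid(A @ xh @ params["Wz"] + params["bz"])        # update gate
    r_t = sigmoid(A @ xh @ params["Wr"] + params["br"])        # reset gate
    xrh = np.concatenate([x_t, r_t * h_prev], axis=1)
    h_tilde = np.tanh(A @ xrh @ params["Wh"] + params["bh"])   # transmission state
    # Selective forgetting of h_{t-1} plus selective memory of the new state.
    return z_t * h_prev + (1.0 - z_t) * h_tilde
```

Starting from the all-zero initial state, the first output is a gated fraction of the tanh state, so every entry stays strictly inside (-1, 1).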

According to another aspect of the present invention, there is provided a traffic flow prediction system including missing data, comprising:

a first module, configured to acquire a traffic data set of a certain area, wherein the traffic data set contains missing data, and to reconstruct the traffic data set into a traffic flow data matrix;

a second module, configured to input the traffic flow data matrix X obtained by the first module into the orthogonal non-negative matrix factorization (ONMF) module of the trained spatio-temporal prediction model to form K clusters, to fill the missing data in each cluster with the generalized matrix factorization (GMF) module of the spatio-temporal prediction model to obtain a filled traffic flow data matrix, and to model the filled traffic flow data matrix as a three-dimensional tensor;

a third module, configured to input the three-dimensional tensor obtained by the second module into the graph convolutional recurrent neural network (GCRNN) of the trained spatio-temporal prediction model to obtain prediction data Y'.

In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:

(1) Because the invention adopts step (3), GCN and GRU are fused into a Graph Convolutional Recurrent Neural Network (GCRNN for short). The module approximates the GCN with Chebyshev polynomials to capture spatial correlation, which improves computational performance; meanwhile, the GRU model selectively memorizes key features so as to learn long-range temporal features. Therefore, the technical problem that existing methods represent only the temporal relationship and thus cannot faithfully reflect the future state of the traffic flow can be solved;

(2) Since the step (3) is adopted in the invention, when the spatial correlation is captured, the adopted GCN depends on graph Laplacian decomposition to process irregular graph data, so that the technical problem that the spatial representation is limited to regular data can be solved;

(3) Because the invention adopts the step (3), the graph learning network is designed, and the graph structure can be learned in a data-driven mode instead of relying on the predefined graph structure, thereby solving the technical problem of poor universality of the hybrid neural network method;

(4) Because the invention adopts step (2), a cluster parameter learning mechanism based on Orthogonal Non-negative Matrix Factorization (ONMF) is designed and, combined with step (3), learns parameters that are shared within a cluster and specific to each cluster, capturing the spatio-temporal dependencies of traffic flow sequences at fine granularity. With this mechanism, traffic flows from the same cluster share a common parameter space and traffic flows from different clusters have independent parameter spaces, which solves the technical problem that the lack of fine-grained representation degrades traffic flow prediction accuracy;

(5) Because the invention adopts step (2), a Generalized Matrix Factorization filling module (GMF for short) is designed. It is a simple and efficient deep learning method that fills the missing traffic flow sequences by learning implicit cross-correlations. The module learns the time-node interaction function in each traffic flow cluster through an element-wise product rather than an inner product, which both inherits the advantages of matrix factorization and fully mines the nonlinear intrinsic correlation of the traffic flows in different clusters, thereby mitigating the technical problem that data loss caused by network or equipment faults degrades feature learning performance.

(6) Because the invention adopts the cluster parameter learning mechanism in the step (2), the parameter quantity of the parameter learning mode is greatly reduced compared with the node-specific mode by controlling the number of clusters, namely the relationship between the fine-grained characterization learning and the parameter quantity is balanced; the clustering method in the module is well suitable for missing traffic data, and the number of nodes in each cluster is relatively balanced.

(7) Because the invention adopts step (2), the influence of missing data on the model is avoided by updating only the observable data; meanwhile, the ONMF module in step (2) offers uniqueness of the solution and interpretability of the clusters: the uniqueness of the solution ensures the stability of the algorithm, and the clustering module in step (2) is interpretable because of its equivalence to the K-means method.

(8) The ONMF module, the GMF module and the GCRNN module are independent and can be used independently or jointly to adapt to the existing space-time data prediction model so as to improve the prediction performance.

Drawings

FIG. 1 illustrates a multi-modal characterization of a traffic data set;

FIG. 2 is a framework of the spatio-temporal prediction model of the present invention;

FIG. 3 is a block diagram of a GCRNN module designed by the present invention;

FIG. 4 shows an ablation experiment of the spatio-temporal prediction model FCGCRN of the present invention on the PEMS04 data set with a missing rate of 10%, wherein FIG. 4 (a), FIG. 4 (b), and FIG. 4 (c) are the results of the experiment on three error measures, mean absolute error MAE, root mean square error RMSE, and mean absolute percentage error MAPE, respectively;

FIG. 5 shows an ablation experiment of the spatio-temporal prediction model FCGCRN of the present invention on the PEMS08 data set with a missing rate of 10%, wherein FIG. 5 (a), FIG. 5 (b), and FIG. 5 (c) are the results of the experiment on three error measures, mean absolute error MAE, root mean square error RMSE, and mean absolute percentage error MAPE, respectively;

FIG. 6 shows an ablation experiment of the spatio-temporal prediction model FCGCRN of the present invention on the PEMS04 data set with a missing rate of 30%, wherein FIG. 6 (a), FIG. 6 (b), and FIG. 6 (c) are the results of the experiment on three error measures, mean absolute error MAE, root mean square error RMSE, and mean absolute percentage error MAPE, respectively;

FIG. 7 shows an ablation experiment of the spatio-temporal prediction model FCGCRN of the present invention on the PEMS08 data set with a missing rate of 30%, wherein FIG. 7 (a), FIG. 7 (b), and FIG. 7 (c) are the results of the experiment on three error measures, mean absolute error MAE, root mean square error RMSE, and mean absolute percentage error MAPE, respectively;

fig. 8 is a parametric analysis experiment of the spatio-temporal prediction model FCGCRN of the present invention on the cluster parameter K, wherein fig. 8 (a), fig. 8 (b), fig. 8 (c) and fig. 8 (d) are analysis experiments of the FCGCRN on four data sets PEMS04 (MR = 10), PEMS08 (MR = 10), PEMS04 (MR = 30) and PEMS08 (MR = 30), respectively, measured in terms of mean absolute percentage error MAPE and root mean square error RMSE;

fig. 9 is a flowchart of a traffic flow prediction method including missing data according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The basic idea of the invention is to provide a traffic flow prediction method and system containing missing data that clusters the traffic data set by orthogonal non-negative matrix factorization; fills the missing traffic data in each cluster by generalized matrix factorization; learns the graph adaptively and, through a parameter learning mechanism that shares parameters within a cluster and keeps them specific between clusters, combines it with a graph convolutional recurrent neural network to represent the temporal and spatial features of the traffic data at fine granularity; and finally completes the prediction for the whole traffic data set with a two-dimensional convolution layer.

As shown in fig. 9, the present invention provides a traffic flow prediction method containing missing data, which specifically includes the following steps:

(1) Acquiring a traffic data set of a certain area, wherein the traffic data set contains missing data, and reconstructing the traffic data set into a traffic flow data matrix;

Specifically, in this step, traffic data (three-dimensional tensor data) containing missing data is acquired for each street in a certain area by the sensors arranged on that street, the traffic data of all streets forms the traffic data set, and data dimension reduction is then performed on the traffic data set to obtain the traffic flow data matrix.

The three-dimensional tensor data refers to time, nodes, traffic characteristics. The time refers to the time when the node acquires the traffic characteristics, the node refers to a single sensor, and the traffic characteristics comprise a vehicle speed characteristic, a traffic flow characteristic and a person number characteristic.

In this step, the traffic data set Z is a non-negative three-dimensional tensor that records the traffic data, at T moments, of each node n among all nodes (i.e. sensors) arranged on all streets in the area, where $n \in [1, N]$, $t \in [1, T]$, T is any positive integer, and N is the total number of sensors arranged on all streets in the area;

wherein each entry of Z is the c-th feature value of the n-th node at the t-th moment, $c \in [1, C]$, and C represents the number of traffic feature categories; the invention has 3 traffic features (vehicle speed, traffic flow and number of people), and thus C = 3;

After obtaining the traffic data set Z, the invention sets C to 1 and performs data dimension reduction through the numpy.squeeze() function of the NumPy library in Python; the obtained traffic flow data matrix X is a non-negative real matrix of T rows and N columns.
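The dimension reduction described above can be sketched as follows. The tensor layout (time, node, feature) and the index of the traffic flow feature, `flow_index`, are assumptions made for illustration.

```python
import numpy as np

T, N, C = 4, 3, 3
# Toy traffic data set Z with layout (time, node, feature).
Z = np.arange(T * N * C, dtype=float).reshape(T, N, C)

flow_index = 1                                  # hypothetical position of the traffic flow feature
Z_flow = Z[:, :, flow_index:flow_index + 1]     # keep one feature, so C becomes 1: shape (T, N, 1)
X = np.squeeze(Z_flow, axis=-1)                 # drop the length-1 axis: shape (T, N)
```

Passing `axis=-1` makes `numpy.squeeze` remove only the feature axis, so a data set with a single time step or a single node would not lose those axes by accident.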

(2) Inputting the traffic flow data matrix X obtained in step (1) into the Orthogonal Non-negative Matrix Factorization (ONMF) module of the trained spatio-temporal prediction model to form K clusters (see steps (2-1) to (2-3) below), filling the data in each cluster with the Generalized Matrix Factorization (GMF) module of the spatio-temporal prediction model to obtain the filled traffic flow data matrix (see steps (2-4) to (2-12) below), standardizing the filled traffic flow data matrix (see step (2-13) below), and modeling the standardized traffic flow data matrix as a three-dimensional tensor according to the history step length H and the prediction window W (see step (2-14) below);

Specifically, the ONMF module mentioned in step (2) is the first part shown in FIG. 2. The module is a clustering module used for clustering the traffic flow data matrix obtained in step (1) into K clusters. ONMF adds an orthogonality constraint to matrix factorization and is formally expressed as $\min_{F \ge 0,\, G \ge 0} \|X - FG^{T}\|^{2}$, s.t. $G^{T}G = I$, wherein X is the traffic flow data matrix (a non-negative real matrix of T rows and N columns), F and G are the two matrix factors of the traffic flow data matrix X, I is an identity matrix, and the value of K ranges from 2 to 5, preferably 2 or 3;

The GMF module is the second part shown in FIG. 2, i.e. the filling module of the invention, and comprises an embedding layer, a decomposition layer and a filling layer. Based on the clustering result obtained in step (2-3), the GMF module fills the missing data in each cluster according to the observable data.

The process of inputting the traffic flow data matrix X obtained in the step (1) into the ONMF module to form K clusters in the step (2) specifically comprises the following steps:

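(2-1) initializing the matrix factors F and G of the traffic flow data matrix X to random values within (0,1);

(2-2) starting from the matrix factors F and G initialized in step (2-1), applying the update rules for F and G to the observable data in the matrix, and stopping the iteration when the error $\|X - FG^{T}\|$ converges, thereby obtaining the updated matrix factor G. Because the traffic flow data matrix obtained in step (1) contains missing data, the ONMF module uses only the observable data in the traffic flow data matrix.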

The advantage of this step is that only using observable data in the traffic stream data matrix can avoid missing data (i.e. unobservable data) from affecting the clustering result. Meanwhile, the matrix factors F and G obtained by adopting the updating rule of the step have uniqueness of solution, namely, the clustering results of each time are kept consistent under the condition that the parameters are consistent.

(2-3) clustering the traffic flow data matrix X into K clusters according to the updated matrix factor G obtained in step (2-2).

The advantage of this step is that it is interpretable, i.e. it is equivalent to the K-means clustering algorithm. Concretely, if the k-th probability of a node is the largest, the node belongs to the k-th cluster; in this way, the traffic flow data matrix is clustered into K clusters. Meanwhile, the data volumes of the K clusters obtained in this step are balanced, which provides favorable conditions for better representing the spatio-temporal correlation later. Steps (2-1) to (2-3) above are the detailed process by which the ONMF module clusters the traffic flow data matrix; subsequently, the parameter K is tuned continually so that, combined with the GCRNN module, the model is optimal, i.e. the prediction error is minimized;
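The assignment rule described above (a node joins the cluster whose probability is largest) can be illustrated with a small hand-made factor G. Normalizing each row of G so that it sums to one is an assumption made here to read the entries as probabilities; the function name `assign_clusters` is hypothetical.

```python
import numpy as np


def assign_clusters(G):
    """Turn a non-negative (N, K) factor into per-node cluster labels and cluster sizes."""
    row_sums = G.sum(axis=1, keepdims=True)
    probs = G / np.maximum(row_sums, 1e-12)     # each row read as membership probabilities
    labels = probs.argmax(axis=1)               # largest probability wins
    sizes = np.bincount(labels, minlength=G.shape[1])
    return probs, labels, sizes


G = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.7, 0.0],
              [0.05, 0.6]])
probs, labels, sizes = assign_clusters(G)
```

Here the four nodes split evenly into two clusters, which is the kind of balance the paragraph above describes; `sizes` makes that balance easy to check when tuning K.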

The process in step (2) of filling the traffic flow data matrix X in each cluster with the GMF module to obtain the filled traffic flow data matrix, standardizing the filled traffic flow data matrix and modeling it as a three-dimensional tensor comprises the following sub-steps:

(2-4) in each cluster obtained in step (2-3), dividing the traffic flow data matrix X into an observable data set (i.e. the set of non-missing data in the traffic flow data matrix X) and an unobservable data set (i.e. the set of missing data in the traffic flow data matrix X), reconstructing the observable data set into a time vector $v_p \in \mathbb{R}^{m}$, a node vector $v_q \in \mathbb{R}^{m}$ and a traffic flow vector $v_y \in \mathbb{R}^{m}$, and reconstructing the unobservable data set into a time vector $v'_p \in \mathbb{R}^{m'}$ and a node vector $v'_q \in \mathbb{R}^{m'}$, wherein m represents the total number of observable data in the observable data set and m' represents the total number of unobservable data in the unobservable data set;

(2-5) inputting the two vectors $v_p$ and $v_q$ obtained in step (2-4) into the embedding layer of the GMF module to obtain the output of the embedding layer, i.e., the time matrix factor P and the node matrix factor Q (as shown in fig. 2):

$P = e_1(v_p),$

$Q = e_2(v_q),$

wherein $P \in \mathbb{R}^{m \times a}$ and $Q \in \mathbb{R}^{m \times a}$ are the time matrix factor and the node matrix factor, respectively, a = 16 is the latent factor dimension, and $e_1(\cdot)$ and $e_2(\cdot)$ denote embedding functions; both are implemented with torch.nn.Embedding() in the PyTorch framework, with different parameters for the two functions in the specific implementation;

$f(P, Q) = P \odot Q,$

wherein $\odot$ denotes the element-wise product.

The advantage of this step is that replacing the matrix inner product with the element-wise product in the decomposition layer retains the advantages of matrix factorization while fully exploiting the nonlinear intrinsic correlation of the data within the different clusters.

$g(P, Q) = a_{out}(W^T (P \odot Q) + b),$

wherein $a_{out}$ is the ReLU activation function, and W and b denote the learnable weight and bias parameters of the GMF module, respectively;

The advantage of this step is that the filling layer is a simple and efficient neural network that can fill in the missing traffic data.
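The embedding, decomposition and filling layers described above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: the embedding tables `E_time` and `E_node` are plain arrays standing in for the PyTorch embedding layers, W is taken to be a vector of length a so that each entry yields one scalar, and all example values are random.

```python
import numpy as np

def gmf_forward(v_p, v_q, E_time, E_node, W, b):
    """Forward pass g(P, Q) = ReLU(W^T (P * Q) + b) for index vectors v_p, v_q."""
    P = E_time[v_p]                       # (m, a) time factors, role of e_1
    Q = E_node[v_q]                       # (m, a) node factors, role of e_2
    f = P * Q                             # decomposition layer: element-wise product
    return np.maximum(f @ W + b, 0.0)     # filling layer with ReLU, shape (m,)

rng = np.random.default_rng(0)
a = 16                                    # latent factor dimension from the text
E_time = rng.normal(size=(5, a))          # hypothetical: 5 time steps
E_node = rng.normal(size=(3, a))          # hypothetical: 3 nodes
W = rng.normal(size=a)
b = 0.1
filled = gmf_forward(np.array([0, 4]), np.array([2, 1]), E_time, E_node, W, b)
```

Each output entry is the predicted traffic flow for one (time, node) pair; because of the ReLU, the predictions are non-negative.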

(2-8) evaluating the filling result of step (2-7) using the mean square error (MSE), i.e., computing the error value MSE between the traffic flow vector $v_y$ of step (2-4) and the filling result g(P, Q) of step (2-7), which is calculated by the formula:
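Under the standard definition of the mean square error, and assuming the average is taken over the m observable entries of the cluster, this error can be written as below; the index i and the component notation are introduced here only for illustration.

```latex
\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \bigl( v_{y,i} - g(P, Q)_i \bigr)^2
```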

(2-10) repeating steps (2-8) to (2-9) until the MSE is less than the threshold ($10^{-6}$ in the present invention) or the number of training iterations reaches the preset number of epochs (100 in the present invention), thereby obtaining the trained GMF module;

At this point, the data fill job is complete.
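How the missing positions are located and how predicted values are written back into the matrix can be sketched as follows. The sketch assumes that missing entries are marked with NaN and that the matrix is laid out as time by node; `predict` is a hypothetical callable standing in for the trained GMF module.

```python
import numpy as np

def fill_missing(X_cluster, predict):
    """Fill NaN entries of a (time x node) matrix with values from `predict`."""
    X = np.asarray(X_cluster, dtype=float)
    missing = np.isnan(X)
    v_p_miss, v_q_miss = np.nonzero(missing)   # time and node indices of missing data
    X_filled = X.copy()                        # observed entries are kept unchanged
    X_filled[v_p_miss, v_q_miss] = predict(v_p_miss, v_q_miss)
    return X_filled

X_demo = np.array([[1.0, np.nan],
                   [3.0, 4.0]])
# Hypothetical predictor that returns 2.5 for every missing entry.
X_filled = fill_missing(X_demo, lambda p, q: np.full(len(p), 2.5))
```

Only the missing positions are overwritten, so the observed measurements survive the filling step untouched.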

(2-12) obtaining the filled traffic flow data matrix from the time vector $v_p$, the node vector $v_q$ and the traffic flow vector $v_y$ of the observable data set obtained in step (2-4), the time vector $v'_p$ and the node vector $v'_q$ of the unobservable data set, and the traffic flow vector $v'_y$ obtained in step (2-11);

(2-13) standardizing the filled traffic flow data matrix of step (2-12) to obtain the standardized traffic flow data matrix, wherein μ is the mean of the filled traffic flow data matrix and σ is its standard deviation;

The advantages of this step are, first, that removing the scale of the data speeds up the convergence of the GCRNN module and thus reduces the computational cost; and second, that gradient explosion during module training is prevented.
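A minimal sketch of this standardization, assuming the usual z-score form (subtract the mean, divide by the standard deviation) and that both statistics are computed over the whole filled matrix. The function name and the example values are illustrative only.

```python
import numpy as np

def standardize(X_filled):
    """Z-score standardization using the global mean and standard deviation."""
    X_filled = np.asarray(X_filled, dtype=float)
    mu = X_filled.mean()
    sigma = X_filled.std()
    return (X_filled - mu) / sigma, mu, sigma

X_norm, mu, sigma = standardize([[1.0, 2.0], [3.0, 4.0]])
```

Keeping `mu` and `sigma` makes it possible to map predictions back to the original scale later.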

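(2-14) according to the historical step length H and the prediction window W, reconstructing the standardized traffic flow data matrix into an input three-dimensional tensor and the three-dimensional tensor $Y \in \mathbb{R}^{(T-H-W+1) \times N \times W}$;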
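The sliding-window construction implied by the tensor shape above can be sketched as follows. The shape of Y follows the text; the shape of the input tensor, (T-H-W+1) x N x H, is an assumption, as are the function name and the toy data.

```python
import numpy as np

def make_windows(X_norm, H, W):
    """Turn a (T, N) matrix into inputs (S, N, H) and targets (S, N, W), S = T-H-W+1."""
    X_norm = np.asarray(X_norm, dtype=float)
    T, N = X_norm.shape
    S = T - H - W + 1
    # Each sample uses H historical steps as input and the next W steps as target.
    inputs = np.stack([X_norm[s:s + H].T for s in range(S)])
    targets = np.stack([X_norm[s + H:s + H + W].T for s in range(S)])
    return inputs, targets

X_demo = np.arange(20, dtype=float).reshape(10, 2)   # T = 10 steps, N = 2 nodes
inputs, targets = make_windows(X_demo, H=3, W=2)
```

With T = 10, H = 3 and W = 2 this yields 6 samples, matching T-H-W+1.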

The spatio-temporal prediction model is a Fine-grained filling Graph Convolution Recurrent Network (FCGCRN for short), and comprises an ONMF module, a GMF module and a GCRNN module.

(3) Inputting the three-dimensional tensor obtained by the modeling in step (2) into the Graph Convolution Recurrent Neural Network (GCRNN for short) of the trained spatio-temporal prediction model to obtain the prediction data Y', which has a low prediction error.

As shown in fig. 3, the GCRNN of the present invention fuses a graph learning neural network, a graph convolution neural network, and a recurrent neural network under a cluster parameter learning mechanism.

The GCRNN network is obtained by training the following steps:

(3-1) dividing the three-dimensional tensors obtained in step (2-14), i.e., the input tensor and Y, into a training set and a test set at a ratio of 6:4;

The calculation formula of the step is as follows:

wherein $E \in \mathbb{R}^{N \times e}$ is a learnable parameter matrix, which is initialized using the torch.FloatTensor() function in the PyTorch framework; in the present invention, parameter tuning found the best values of the parameter e to be 2 and 10;
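The formula that maps E to the adjacency matrix is not reproduced in this text. One common form used in adaptive graph learning is a row-wise softmax of ReLU(E E^T); the sketch below assumes that form purely for illustration and should not be read as the formula of the invention.

```python
import numpy as np

def adaptive_adjacency(E):
    """Assumed form: row-wise softmax of ReLU(E E^T), giving an N x N matrix."""
    scores = np.maximum(E @ E.T, 0.0)                      # ReLU(E E^T)
    scores = scores - scores.max(axis=1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)    # each row sums to 1

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 2))    # hypothetical: N = 5 nodes, e = 2
A = adaptive_adjacency(E)
```

Because the graph is derived from E alone, it is learned from data rather than fixed by a predefined road topology.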

(3-3) taking the training-set data at time t of the three-dimensional tensor obtained in step (3-1), together with the adjacency matrix obtained in step (3-2), as the input of the graph convolution. The calculation formula of this step is as follows:

wherein $G_K \in \mathbb{R}^{N \times N}$ is the Laplace matrix of the K-th cluster, $\theta_K \in \mathbb{R}^{H \times h}$ and $b_K \in \mathbb{R}^{h}$ are the learnable parameters of the K-th cluster, and h is the number of neurons of the hidden layer in the graph convolution recurrent neural network.

Further, the invention approximates the Laplace matrix $G_K$ by a Chebyshev polynomial expansion, wherein, in the concrete implementation, $T_0 = 1$.

In the present invention, parameter tuning showed that the best prediction performance is obtained when the polynomial parameter d is 2 and the cluster parameter K is 2 or 3.

The advantages of this step are, first, that approximating the Laplace matrix with Chebyshev polynomials allows high-dimensional features to be represented at a lower computational cost; and second, that the graph convolution neural network adopted can characterize spatial correlation effectively (in both Euclidean and non-Euclidean space).
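The Chebyshev expansion mentioned above relies on the standard recurrence T_k(L) = 2 L T_{k-1}(L) - T_{k-2}(L). The sketch below assumes the matrix form with T_0 equal to the identity and T_1 equal to L, and treats d as the highest polynomial order; the matrix `L_demo` is a toy example, not data from the invention.

```python
import numpy as np

def chebyshev_basis(L, d):
    """Return [T_0(L), ..., T_d(L)] using T_k = 2 L T_{k-1} - T_{k-2}."""
    N = L.shape[0]
    basis = [np.eye(N)]                 # T_0 is the identity in matrix form
    if d >= 1:
        basis.append(L.copy())          # T_1 = L
    for _ in range(2, d + 1):
        basis.append(2.0 * L @ basis[-1] - basis[-2])
    return basis

L_demo = np.array([[0.0, 0.5],
                   [0.5, 0.0]])
T0, T1, T2 = chebyshev_basis(L_demo, d=2)
```

Each additional order costs only one more matrix product, which is why a small d keeps the graph convolution cheap.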

(3-4) inputting the training-set data at time t of the three-dimensional tensor obtained in step (3-1) into a recurrent neural network (which is used to characterize temporal dependence) to obtain the characterization result $h_t$ at time t;

Specifically, the invention fuses the graph convolution neural network into the recurrent neural network to form the GCRNN, so as to characterize temporal and spatial correlation jointly; that is, the multi-layer perceptron (MLP) in the recurrent neural network GRU is replaced by the graph convolution network (GCN). The concrete steps are as follows:

(3-4-1) inputting the training-set data at time t of the three-dimensional tensor obtained in step (3-1), the adjacency matrix obtained in step (3-2), and the characterization result $h_{t-1}$ at the previous time t-1 (see step (3-4-4) below for details) into the GCRNN to obtain the update gate $z_t \in \mathbb{R}^{N \times h}$ at time t;

Specifically, the calculation formula in this step is:

wherein $h_0 \in \mathbb{R}^{N \times h}$ represents the initial state and is an all-zero matrix, and [·,·] denotes the concatenation operation;

(3-4-2) inputting the training-set data at time t of the three-dimensional tensor obtained in step (3-1), the adjacency matrix obtained in step (3-2), and the characterization result $h_{t-1}$ at the previous time t-1 (see step (3-4-4) below for details) into the GCRNN to obtain the reset gate $r_t \in \mathbb{R}^{N \times h}$ at time t;

The calculation formula of the step is as follows:

wherein the two parameters appearing in the formula are the learnable parameters of the reset gate $r_t$ at time t in the K-th cluster;

(3-4-3) inputting the training-set data at time t of the three-dimensional tensor obtained in step (3-1), the reset gate $r_t$ obtained in step (3-4-2), the adjacency matrix obtained in step (3-2), and the characterization result $h_{t-1}$ at the previous time t-1 (see step (3-4-4) below) into the GCRNN to obtain the transmission state at time t.

The calculation formula of the step is as follows:

wherein $\odot$ denotes the element-wise product, and the two parameters appearing in the formula are the learnable parameters of the transmission state at time t in the K-th cluster;

The calculation formula of the step is as follows:

wherein the formula selectively memorizes the transmission state, which contains the information of the current time t;

Steps (3-4-1) to (3-4-4) describe feature learning at a single time t; in the specific implementation, long-term correlation is characterized by iterating over T time steps, and the future traffic flow is predicted from the accumulated characterization result at the last time step.
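The gate computations of steps (3-4-1) to (3-4-4) can be sketched as a GRU cell whose dense layers are replaced by graph convolutions. This is a simplified illustration under several assumptions: a single parameter set is used (one cluster), the graph convolution is a single-hop product A X θ + b rather than a Chebyshev expansion, and the state update follows a common GRU convention, h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ candidate, which may differ from the formula of the invention. All names, sizes and values are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_conv(A, X, theta, b):
    """Single-hop graph convolution: aggregate neighbours with A, then project."""
    return A @ X @ theta + b

def gcgru_step(A, x_t, h_prev, params):
    xh = np.concatenate([x_t, h_prev], axis=1)              # [x_t, h_{t-1}]
    z = sigmoid(graph_conv(A, xh, params["theta_z"], params["b_z"]))   # update gate
    r = sigmoid(graph_conv(A, xh, params["theta_r"], params["b_r"]))   # reset gate
    xrh = np.concatenate([x_t, r * h_prev], axis=1)         # reset applied to h_{t-1}
    h_cand = np.tanh(graph_conv(A, xrh, params["theta_h"], params["b_h"]))
    return (1.0 - z) * h_prev + z * h_cand                  # assumed GRU convention

rng = np.random.default_rng(0)
N, f_in, h = 4, 3, 8                     # nodes, input features, hidden units
A = np.full((N, N), 1.0 / N)             # toy adjacency matrix
params = {}
for gate in ("z", "r", "h"):
    params[f"theta_{gate}"] = rng.normal(scale=0.1, size=(f_in + h, h))
    params[f"b_{gate}"] = np.zeros(h)
h_state = np.zeros((N, h))               # h_0 is an all-zero matrix
for x_t in rng.normal(size=(5, N, f_in)):   # iterate over 5 time steps
    h_state = gcgru_step(A, x_t, h_state, params)
```

The final `h_state` plays the role of the accumulated characterization result at the last time step.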

In addition, due to the excellent characterization capability of the graph convolution neural network, the number of layers of the graph neural network in the invention is only 2, and a lower error result can be achieved, namely the number of layers of the GCRNN is 2.

The advantages of steps (3-1) to (3-4) above are as follows. First, temporal and spatial correlations in the traffic data are captured in a fine-grained manner, achieving high-precision prediction. In existing implementations, the parameter matrix θ is shared by all nodes (i.e., roads); in traffic prediction, however, not all nodes follow the same pattern. As shown in fig. 1, road 1 exhibits an early peak mode, roads 2 and 4 exhibit a late peak mode, and road 3 exhibits an early-late peak mode. The invention therefore adopts a parameter learning scheme in which parameters are shared within a cluster and specific to each cluster, so as to characterize the spatio-temporal correlation at a fine granularity, i.e., the GCRNN module is implemented over K clusters. Second, compared with other recurrent neural networks, the GRU adopted here can learn features over a long time range with fewer parameters and lower computational cost, while its learning capability is comparable to theirs.

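(3-5) inputting the output $h_t$ obtained in step (3-4) into a two-dimensional convolutional layer of the GCRNN network (as shown in fig. 3) to obtain the final traffic flow prediction result $Y' \in \mathbb{R}^{N \times W}$;

The specific implementation of this step is as follows:

$Y' = h_t \star f_{1 \times h},$

wherein the number of channels of the two-dimensional convolutional layer is 1, the output is W = 12, $f_{1 \times h}$ denotes the convolution kernel of the two-dimensional convolutional layer, whose size is (1, h), and the prediction effect is best when h = 64 in the present invention;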
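Because the kernel of size (1, h) spans the full hidden dimension, convolving it with $h_t$ reduces to one dot product per node and per output channel. The sketch below assumes one such kernel per prediction step (W kernels in total) and omits any bias term; the names and the toy values are illustrative.

```python
import numpy as np

def output_layer(h_t, kernels):
    """h_t: (N, h); kernels: (W, h), one (1, h) kernel per output step -> (N, W)."""
    return h_t @ kernels.T

h_t = np.ones((3, 4))                                  # N = 3 nodes, h = 4
kernels = np.arange(8, dtype=float).reshape(2, 4)      # W = 2 prediction steps
Y_pred = output_layer(h_t, kernels)
```

Each row of `Y_pred` holds the W predicted values for one node.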

(3-6) calculating a loss value O (Y, Y ') between the prediction result Y' obtained in the step (3-5) and the tensor Y (which refers to the training set) obtained in the step (2-14) by using an L1 loss function;

specifically, the L1 loss function used in this step is:

(3-7) iteratively updating the learnable parameter E of step (3-2) and the learnable parameters of steps (3-4-1) to (3-4-4) using the loss function of step (3-6) and the Adam optimizer in the PyTorch framework;

(3-8) repeating the training process of steps (3-6) to (3-7) until the number of iterations of step (3-7) reaches the preset number (100 in the present invention) or the loss value O(Y, Y') of step (3-6) is less than the set threshold ($10^{-6}$ in the present invention), at which point training ends and a preliminarily trained GCRNN model is obtained;

(3-9) validating the GCRNN model preliminarily trained in step (3-8) using the test set obtained in step (3-1) until the prediction error is optimal, thereby obtaining the trained spatio-temporal prediction model.
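The stopping rule of step (3-8), stop after 100 iterations or once the loss drops below $10^{-6}$, can be sketched independently of any framework. Here `step_fn` is a hypothetical callable that performs one optimizer update and returns the current loss; the loss sequence in the example is made up.

```python
def train_until_converged(step_fn, max_epochs=100, tol=1e-6):
    """Run step_fn until the loss is below tol or max_epochs is reached."""
    epoch, loss = 0, float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = step_fn()          # one training iteration, returns the loss value
        if loss < tol:
            break                 # loss threshold reached before the epoch limit
    return epoch, loss

# Hypothetical loss sequence standing in for real training.
losses = iter([1.0, 1e-3, 1e-7, 0.0])
epochs_run, final_loss = train_until_converged(lambda: next(losses))
```

In this toy run the threshold is crossed at the third iteration, so the fourth value is never consumed.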

In summary, through the above description of the present invention, the main advantages of the present invention include:

1. The traffic flow prediction method containing missing data can fill in traffic flow data that contains missing values, characterizes the complex, long-range spatio-temporal correlation of the historical traffic flow at a fine granularity, and achieves high-precision prediction of the future traffic flow state.

2. By dividing the traffic flow data into clusters, the GCRNN can extract the features of the traffic flow data with a parameter learning mechanism in which parameters are shared within a cluster and specific to each cluster. Clustering uses the orthogonally constrained non-negative matrix factorization algorithm ONMF, which adapts well to traffic flow data with missing values because it iterates only on the observable data. Moreover, the nodes are distributed evenly among the clusters in the clustering result, which provides favorable conditions for characterizing the spatio-temporal relationship later. In addition, clustering is performed before data filling, so that the clustering structure is accurate and reliable.

3. The generalized matrix factorization module GMF is used to fill in the missing data within each cluster, which makes effective use of the high correlation within a cluster, while a simple neural network learns the nonlinear relationships among the data.

4. The invention adopts the graph convolution neural network to learn the spatial correlation among data, the graph structure is not limited by Euclidean space, and the spatial relationship can be well represented. Meanwhile, the graph convolution neural network adopts a Chebyshev polynomial to approximate the traditional graph convolution, and the calculation cost is reduced under the condition of ensuring the representation effect. The invention replaces the predefined graph with adaptive graph learning in the GCRNN module, so that the graph depends on data and not on a predefined structure.

5. The traffic flow data is modeled into tensors and is subjected to standardization processing, so that the difference between data scales can be effectively eliminated, and the influence on the representation is eliminated.

Results of the experiment

The invention is evaluated on two real traffic flow data sets, PEMS04 and PEMS08, with missing rates (MR) of 10% and 30%. The PEMS04 data set has 307 nodes and 16992 time steps, covering 59 days in total; the PEMS08 data set has 170 nodes and 17856 time steps, covering 62 days in total. Both data sets use a time step of 5 minutes.
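The stated durations follow directly from the number of time steps and the 5-minute sampling interval, as the short check below shows; the function name is illustrative.

```python
def duration_days(time_steps, minutes_per_step=5):
    """Convert a number of fixed-length time steps into days."""
    return time_steps * minutes_per_step / (60 * 24)

pems04_days = duration_days(16992)   # PEMS04
pems08_days = duration_days(17856)   # PEMS08
```

Both values come out as whole numbers of days, 59 and 62 respectively.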

The validity and accuracy of the proposed traffic flow prediction method with missing data are verified through comparative experiments on real data sets, as shown in Tables 1 and 2 and figs. 4 to 7 below. The spatio-temporal prediction model FCGCRN is compared with eight baseline methods, namely HA, ARIMA, Logistic Regression (LR for short), LSTM, DCRNN, MTGNN, DMSTGCN and STGODE (the method names follow the English abbreviations used in the literature), using three metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). The smaller the error metric, the better the prediction performance on data with missing values. Tables 1 and 2 show the experimental results for the two missing rates on the two real data sets. The tables show that FCGCRN outperforms the other baselines on MAE and RMSE and outperforms most baselines on MAPE, and that its advantage is more pronounced at the high missing rate. Table 1 also shows that the deep learning methods (LR, LSTM, DCRNN, DMSTGCN and STGODE) predict better than the conventional statistical methods (HA and ARIMA). Furthermore, the adaptive graph learning models (MTGNN and STGODE) are strongly affected by missing data; in contrast, the model of the present invention, although also a graph learning model, performs more stably when data are missing and the missing rate is large.
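The three metrics named above have standard definitions, sketched here in plain Python. The example values are made up and are not experimental results; MAPE as written assumes that no true value is zero.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; requires non-zero true values."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100.0, 200.0, 400.0]   # hypothetical observed flows
y_pred = [110.0, 180.0, 400.0]   # hypothetical predictions
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE expresses the error relative to the true flow, which is why the three can rank methods differently.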

The ablation experiments of figs. 4 to 7 demonstrate the effectiveness of the key modules of the present invention. The GMF module ensures the integrity of the data by filling in the missing data; the cluster parameter learning mechanism ensures the fine granularity of the model by sharing parameters within a cluster and keeping them specific to each cluster; and the data-driven graph learning in the GCRNN module makes the model more general for spatio-temporal sequence prediction tasks. Fig. 8 shows an analysis of the key parameter K of the present invention, the number of clusters, which determines the diversity of the parameters in the GCRNN module and affects the number of parameters. As can be seen, the model works best when K is 2 or 3, indicating that the GCRNN learns 2 to 3 classes of specific parameters. This result corresponds to traffic sequences with early peak, late peak and early-late peak patterns. In conclusion, the method offers high stability, fine granularity and generality on spatio-temporal prediction tasks containing missing data.

TABLE 1 Comparative experimental results of the spatio-temporal prediction model FCGCRN of the present invention and the eight baseline methods on the PEMS04 and PEMS08 data sets (missing rate 10%)

TABLE 2 Comparative experimental results of the spatio-temporal prediction model FCGCRN of the present invention and the eight baseline methods on the PEMS04 and PEMS08 data sets (missing rate 30%)

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A traffic flow prediction method containing missing data is characterized by comprising the following steps:

(2) inputting the traffic flow data matrix X obtained in step (1) into the orthogonal non-negative matrix factorization (ONMF) module of a trained spatio-temporal prediction model to form K clusters, filling data in each cluster using the generalized matrix factorization (GMF) module of the spatio-temporal prediction model to obtain the filled traffic flow data matrix, and modelling the filled traffic flow data matrix as a three-dimensional tensor;

(3) inputting the three-dimensional tensor obtained by the modeling in step (2) into the graph convolution recurrent neural network GCRNN of the trained spatio-temporal prediction model to obtain prediction data Y'.

2. The method for predicting the traffic flow containing the missing data according to claim 1, wherein the traffic data is three-dimensional tensor data { time, node, traffic characteristics }, wherein the time refers to the time when the node acquires the traffic characteristics, the node refers to a single sensor, and the traffic characteristics comprise a vehicle speed characteristic, a traffic flow characteristic and a people number characteristic.

3. The method for predicting the traffic flow containing the missing data according to claim 1 or 2, wherein in step (1) the traffic data set is a non-negative real-valued tensor, each element of which is the c-th feature value of the n-th node at time t, with c ∈ [1, C], wherein C denotes the number of classes of traffic features.

4. The method for predicting the traffic flow containing the missing data according to any one of claims 1 to 3, wherein the process of inputting the traffic flow data matrix X obtained in the step (1) into the ONMF module to form K clusters in the step (2) specifically comprises:

the matrix factors are updated on the observable data in the matrix, and the iteration stops when the error of $X - FG^T$ converges, thereby obtaining the updated matrix factor G;

(2-3) clustering the traffic flow data matrix X into K clusters according to the updated matrix factor G obtained in step (2-2).

5. The method for predicting the traffic flow containing the missing data according to claim 4, wherein, in step (2), filling the traffic flow data matrix X in each cluster using the GMF module to obtain the filled traffic flow data matrix, and modelling the filled traffic flow data matrix as a three-dimensional tensor, comprises the following sub-steps:

$P = e_1(v_p),$

$Q = e_2(v_q),$

$f(P, Q) = P \odot Q,$

wherein $\odot$ denotes the element-wise product.

$g(P, Q) = a_{out}(W^T (P \odot Q) + b),$

(2-12) obtaining the filled traffic flow data matrix from the time vector $v_p$, the node vector $v_q$ and the traffic flow vector $v_y$ of the observable data set obtained in step (2-4), the time vector $v'_p$ and the node vector $v'_q$ of the unobservable data set, and the traffic flow vector $v'_y$ obtained in step (2-11);

wherein μ is the mean of the filled traffic flow data matrix and σ is its standard deviation;

the standardized traffic flow data matrix is reconstructed into an input three-dimensional tensor and the three-dimensional tensor $Y \in \mathbb{R}^{(T-H-W+1) \times N \times W}$.

6. The method for predicting the traffic flow containing the missing data according to claim 5, wherein the GCRNN network is obtained by training through the following steps:

(3-1) dividing the three-dimensional tensors obtained in step (2-14), i.e., the input tensor and Y, into a training set and a test set at a ratio of 6:4;

The calculation formula of the step is as follows:

(3-3) taking the training-set data at time t of the three-dimensional tensor obtained in step (3-1), together with the adjacency matrix obtained in step (3-2), as the input of the graph convolution. The calculation formula of this step is as follows:

wherein $G_K \in \mathbb{R}^{N \times N}$ is the Laplace matrix of the K-th cluster, $\theta_K \in \mathbb{R}^{H \times h}$ and $b_K \in \mathbb{R}^{h}$ are the learnable parameters of the K-th cluster, and h is the number of neurons of the hidden layer in the graph convolution recurrent neural network;

(3-4) inputting the training-set data at time t of the three-dimensional tensor obtained in step (3-1) into a recurrent neural network to obtain the characterization result $h_t$ at time t;

(3-5) inputting the output $h_t$ obtained in step (3-4) into a two-dimensional convolutional layer of the GCRNN network (as shown in FIG. 3) to obtain the final traffic flow prediction result $Y' \in \mathbb{R}^{N \times W}$;

The specific implementation of this step is as follows:

$Y' = h_t \star f_{1 \times h},$

specifically, the L1 loss function used in this step is:

And

carrying out iterative updating;

7. The traffic flow prediction method containing missing data according to claim 6, characterized in that step (3-4) includes the following substeps:

(3-4-1) inputting the training-set data at time t of the three-dimensional tensor obtained in step (3-1), the adjacency matrix obtained in step (3-2), and the characterization result $h_{t-1}$ at the previous time t-1 into the GCRNN to obtain the update gate $z_t \in \mathbb{R}^{N \times h}$ at time t;

Specifically, the calculation formula in this step is:

and

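(3-4-2) inputting the training-set data at time t of the three-dimensional tensor obtained in step (3-1), the adjacency matrix obtained in step (3-2), and the characterization result $h_{t-1}$ at the previous time t-1 into the GCRNN to obtain the reset gate $r_t \in \mathbb{R}^{N \times h}$ at time t;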

The calculation formula of the step is as follows:

wherein the two parameters appearing in the formula are the learnable parameters of the reset gate $r_t$ at time t in the K-th cluster;

(3-4-3) inputting the training-set data at time t of the three-dimensional tensor obtained in step (3-1), the reset gate $r_t$ obtained in step (3-4-2), the adjacency matrix obtained in step (3-2), and the characterization result $h_{t-1}$ at the previous time t-1 into the GCRNN to obtain the transmission state at time t.

The calculation formula of the step is as follows:

wherein $\odot$ denotes the element-wise product, and the two parameters appearing in the formula are the learnable parameters of the transmission state at time t in the K-th cluster;

The calculation formula of the step is as follows:

wherein the formula selectively memorizes the transmission state, which contains the information of the current time t.

8. A traffic flow prediction system including missing data, comprising:

a second module for inputting the traffic flow data matrix X obtained by the first module into the orthogonal non-negative matrix factorization (ONMF) module of the trained spatio-temporal prediction model to form K clusters, filling data in each cluster using the generalized matrix factorization (GMF) module of the spatio-temporal prediction model to obtain the filled traffic flow data matrix, and modelling the filled traffic flow data matrix as a three-dimensional tensor;

a third module for inputting the three-dimensional tensor obtained by the modeling of the second module into the graph convolution recurrent neural network GCRNN of the trained spatio-temporal prediction model to obtain prediction data Y'.