CN114822025B

CN114822025B - Traffic flow combined prediction method

Info

Publication number: CN114822025B
Application number: CN202210417752.2A
Authority: CN
Inventors: 殷礼胜; 吴洋洋; 刘攀
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2022-04-20
Filing date: 2022-04-20
Publication date: 2023-04-18
Anticipated expiration: 2042-04-20
Also published as: CN114822025A

Abstract

The invention discloses a combined traffic flow prediction method comprising the following steps: 1. decomposing the traffic flow time series of the prediction node and of its first-order neighborhood nodes into a series of relatively stationary high- and low-frequency component sequences, using a variational mode decomposition (VMD) algorithm improved by mutual information entropy; 2. using a traffic flow spatial correlation model based on the graph attention network, taking each decomposed component sequence as a feature input of the graph attention network, weighting each component sequence by its attention coefficient, and outputting the weighted component sequences; 3. using a traffic flow temporal dependence model based on the gated recurrent unit (GRU) network, taking the weighted component sequences of each node as the feature input of the GRU network, iteratively training the model parameters to their optimum with an improved RMSProp traffic flow optimization algorithm, predicting each component with the model, and summing the predicted values to obtain the final prediction result. The invention can effectively predict traffic flow and improve prediction accuracy.

Description

Traffic flow combined prediction method

Technical Field

The invention relates to a combined traffic flow prediction method, in particular to one based on improved variational mode decomposition, a graph attention network and a gated recurrent unit network, and belongs to the field of intelligent traffic prediction.

Background

With the development of urbanization and digitization, building intelligent transportation systems, developing travel services based on autonomous driving and vehicle-road cooperation, and popularizing intelligent road management and coordinated traffic signal control have become key fields. Accurate and timely traffic flow prediction is one of the foundations for managing and controlling an intelligent transportation system.

Current traffic flow prediction methods fall mainly into two types. The first comprises traditional statistical learning methods, such as autoregressive moving average models, support vector regression, Kalman filtering and hidden Markov models; most of these assume a stationary time series and rely on complex manual feature processing, whereas a traffic flow time series is nonlinear and non-stationary. The second comprises feedforward neural network methods, such as the multilayer perceptron and the radial basis function neural network, which can only roughly model the nonlinear changes in traffic data, cannot accurately and effectively capture its short-term and long-term temporal dependence, have low prediction accuracy, and train slowly when processing large amounts of traffic flow data. Moreover, neither type considers how the spatial correlation of traffic flow in a real, complex traffic network affects prediction accuracy, so the expressive power of these models is limited.

Based on the above analysis, traffic flow is characterized by complex non-stationarity, spatial correlation and temporal dependence, and a single prediction model or method is limited in its applicable conditions and prediction accuracy.

Disclosure of Invention

The invention aims to overcome the defects in the prior art by providing a combined traffic flow prediction method that takes the complex non-stationarity, spatial correlation and temporal dependence of traffic flow into account, so that both the accuracy and the speed of traffic flow prediction can be effectively improved.

In order to solve the technical problems, the invention adopts the following technical scheme:

The invention relates to a combined traffic flow prediction method, characterized by comprising the following steps:

step 1: traffic flow variational mode decomposition improved by mutual information entropy:

step 1.1: the mutual information entropy I (Y; Z) is defined by formula (1):

I(Y；Z)＝H(Y)+H(Z)-H(Y,Z) (1)

in formula (1): $Y$ represents one traffic flow time series, $Y=(y_1,y_2,\cdots,y_i,\cdots,y_N)$, where $y_i$ is the $i$-th traffic flow datum; $Z$ represents another traffic flow time series, $Z=(z_1,z_2,\cdots,z_i,\cdots,z_N)$, where $z_i$ is the $i$-th traffic flow datum; $N$ is the time series length; $H(\cdot)$ denotes the marginal entropy and $H(\cdot,\cdot)$ the joint entropy; $p(\cdot)$ denotes the marginal probability density function and $p(\cdot,\cdot)$ the joint probability density function, with:

in formula (2): $p(y_i)$ and $p(z_i)$ are the marginal probability density functions of the $i$-th traffic flow data $y_i$ and $z_i$, respectively; $p(y_i,z_i)$ is the joint probability density function of $y_i$ and $z_i$;
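Formula (2) itself does not appear in the text above. As a sketch only, assuming it follows the usual Shannon definitions (the logarithm base and the second summation index $j$ are assumptions, since the text names only $p(y_i)$, $p(z_i)$ and $p(y_i,z_i)$), the entropies used in formula (1) could be written as:

```latex
\begin{aligned}
H(Y)   &= -\sum_{i=1}^{N} p(y_i)\,\log p(y_i),\\
H(Z)   &= -\sum_{i=1}^{N} p(z_i)\,\log p(z_i),\\
H(Y,Z) &= -\sum_{i=1}^{N}\sum_{j=1}^{N} p(y_i,z_j)\,\log p(y_i,z_j).
\end{aligned}
```

With these definitions, $I(Y;Z)$ in formula (1) is non-negative and equals zero exactly when the two series are independent.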

step 1.2: an augmented Lagrangian objective function L (u) is defined using equation (3) _k ,ω _k ,λ)：

In formula (3): u. of _k (t) represents the k-th modal component of the traffic flow time series in the time domain t, and u _k (t)＝A _k (t)cos(φ _k (t)), wherein A _k (t) denotes the k-th modal component u _k (t) amplitude in the time domain t, [ phi ] _k (t) denotes the k-th modal component u _k (t) instantaneous phase in the time domain t; omega _k (t) denotes the k-th modal component u _k (t) instantaneous frequency in the time domain t, and ω _k (t)＝φ _k ' (t); λ (t) represents the lagrange multiplier in the time domain t; alpha represents a secondary penalty term;

represents a partial differential over the time domain t; δ (t) represents a unit impulse function in the time domain t; j represents an imaginary number; omega _k Representing the k-th modal component u in the time domain t _k (t) transforming to a center frequency in the frequency domain ω; * Representing a convolution operator; f (t) represents a traffic flow time sequence Y or Z in a time domain t;
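For orientation, the augmented Lagrangian of standard variational mode decomposition can be written with the symbols defined above as follows. This is a sketch that assumes formula (3) follows the standard VMD formulation; the exact form used in the source may differ.

```latex
L(\{u_k\},\{\omega_k\},\lambda)
= \alpha \sum_{k=1}^{K}
  \left\| \partial_t\!\left[\left(\delta(t)+\frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2
+ \left\| f(t)-\sum_{k=1}^{K} u_k(t) \right\|_2^2
+ \left\langle \lambda(t),\; f(t)-\sum_{k=1}^{K} u_k(t) \right\rangle
```

The first term penalizes the bandwidth of each mode around its center frequency $\omega_k$, the second enforces reconstruction of $f(t)$ in the least-squares sense, and the third enforces it strictly through the multiplier $\lambda(t)$.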

step 1.3: defining n as the number of search iterations and initializing to n =0;

step 1.4: initialize the $k$-th modal component $u_k^{n+1}(t)$ of the $(n+1)$-th iteration of the traffic flow time series $Y$ or $Z$ in the time domain $t$, whose amplitude $A_k^{n+1}(t)$ is non-negative, whose phase $\phi_k^{n+1}(t)$ is non-decreasing, and whose instantaneous frequency $\omega_k^{n+1}(t)$ is the derivative of the phase $\phi_k^{n+1}(t)$; initialize the Lagrange multiplier $\lambda^{n+1}(t)$ of the $(n+1)$-th iteration; define the number of modes as $K$;

step 1.5: using formula (4) and formula (5), respectively obtain the $k$-th modal component $\hat{u}_k^{n+1}(\omega)$ of the $(n+1)$-th iteration of the traffic flow time series $Y$ or $Z$ in the frequency domain $\omega$ and its center frequency $\omega_k^{n+1}$:

In formulas (4) and (5), $\hat{f}(\omega)$, $\hat{u}_{k'}^{n+1}(\omega)$, $\hat{u}_{k''}^{n+1}(\omega)$ and $\hat{\lambda}^{n}(\omega)$ respectively denote $f(t)$, $u_{k'}^{n+1}(t)$, $u_{k''}^{n+1}(t)$ and $\lambda^{n}(t)$ transformed from the time domain $t$ into the frequency domain $\omega$ by the Fourier transform, where $u_{k'}^{n+1}(t)$ represents the $k'$-th modal component of the $(n+1)$-th iteration in the time domain $t$ and $u_{k''}^{n+1}(t)$ represents the $k''$-th modal component of the $(n+1)$-th iteration in the time domain $t$; $k'$ indexes the modal components before the $k$-th modal component, and $k''$ indexes those after it; $\omega_k^{n}$ represents the center frequency of the $k$-th modal component in the frequency domain $\omega$ at the $n$-th iteration;

step 1.6: update the Lagrange multiplier of the $(n+1)$-th iteration using formula (6):

In formula (6), $\tau$ is the noise tolerance;

step 1.7: judge whether formula (7) holds; if so, execute step 1.8; otherwise, assign $n+1$ to $n$ and return to step 1.5;

in formula (7), $\varepsilon$ is a preset convergence precision.
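The iteration in steps 1.3 to 1.8 can be sketched as below. Formulas (4) to (7) are not reproduced in the text, so this sketch assumes the standard VMD updates (Wiener-filter mode update, power-weighted center frequency, dual ascent on the multiplier, relative-change stopping rule). The function name `vmd_sketch`, the initial center frequencies, the default `alpha` and the omission of the usual mirror extension are all illustrative assumptions, not the patented procedure.

```python
import numpy as np

def vmd_sketch(f, K=2, alpha=2000.0, tau=0.0, eps=1e-7, max_iter=500):
    """Simplified VMD loop on the one-sided spectrum (no mirror extension)."""
    f = np.asarray(f, dtype=float)
    N = f.size
    freqs = np.fft.fftfreq(N)              # normalized frequency, cycles/sample
    f_hat = np.fft.fft(f)
    f_hat[freqs < 0] = 0.0                 # keep the non-negative half only
    u_hat = np.zeros((K, N), dtype=complex)
    omega = 0.5 * np.arange(K) / K         # assumed initial center frequencies
    lam_hat = np.zeros(N, dtype=complex)   # Lagrange multiplier spectrum

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Sum of all other modes; modes k' < k are already updated.
            others = u_hat.sum(axis=0) - u_hat[k]
            # Assumed form of formula (4): Wiener-filter style mode update.
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Assumed form of formula (5): power-weighted mean frequency.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Assumed form of formula (6): dual ascent with noise tolerance tau.
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Assumed form of formula (7): summed relative change of the modes.
        num = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
        den = np.sum(np.abs(u_prev) ** 2, axis=1) + 1e-12
        if np.sum(num / den) < eps:
            break

    # Step 1.8: inverse transform and keep the real part. The factor 2
    # restores the discarded negative half (valid for zero-mean signals).
    modes = 2.0 * np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega

n = np.arange(500)
signal = np.cos(2 * np.pi * 0.05 * n) + np.cos(2 * np.pi * 0.3 * n)
modes, omega = vmd_sketch(signal, K=2)
```

On this two-tone test signal the two center frequencies should settle near 0.05 and 0.3 cycles per sample, and the two modes should add back up to the input.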

Step 1.8: apply the inverse Fourier transform to the frequency-domain modal component $\hat{u}_k^{n+1}(\omega)$ and take the real part of the result as the $k$-th traffic flow component sequence $vimf_k$ of the traffic flow time series $Y$ or $Z$;

Step 1.9: calculate the marginal entropy $H(vimf_k)$ of the $k$-th traffic flow component sequence $vimf_k$ using formula (8):

In formula (8), the probability term denotes the marginal probability density function of the $i$-th traffic flow datum of the $k$-th traffic flow component sequence $vimf_k$;

step 1.10: calculate the mutual information entropy $I(vimf_k, vimf_{k+1})$ of the adjacent traffic flow component sequences $vimf_k$ and $vimf_{k+1}$ using formula (9):

$I(vimf_k, vimf_{k+1}) = H(vimf_k) + H(vimf_{k+1}) - H(vimf_k, vimf_{k+1})$ (9)

In formula (9), $vimf_{k+1}$ represents the $(k+1)$-th traffic flow component sequence of the traffic flow time series $Y$ or $Z$; $H(vimf_k)$ and $H(vimf_{k+1})$ are the marginal entropies of $vimf_k$ and $vimf_{k+1}$; $H(vimf_k, vimf_{k+1})$ is their joint entropy;

step 1.11: use formula (10) to determine the subscript $k^*$ that divides the components into the traffic flow low-frequency component sequence set $\{vimf_1, vimf_2, \cdots, vimf_{k^*}\}$ and the high-frequency component sequence set $\{vimf'_{k^*+1}, vimf'_{k^*+2}, \cdots, vimf'_K\}$:

in formula (10), $vimf_{k^*}$ represents the first traffic flow component sequence of the adjacent pair at which the mutual information entropy $I(\cdot,\cdot)$ attains its minimum; $vimf'_{k^*+1}$ represents the second traffic flow component sequence of that pair; $I(vimf_{k^*}, vimf'_{k^*+1})$ represents the minimum value of the mutual information entropy;
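Steps 1.9 to 1.11 can be illustrated with a histogram-based estimate. This is a minimal sketch: the histogram estimator, the bin count and the helper names `entropy`, `mutual_information` and `split_index` are assumptions, since the source does not say how the probability densities are estimated.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_information(y, z, bins=16):
    """I(Y;Z) = H(Y) + H(Z) - H(Y,Z), estimated from a 2-D histogram."""
    joint, _, _ = np.histogram2d(y, z, bins=bins)
    h_y = entropy(joint.sum(axis=1))   # marginal entropy of y
    h_z = entropy(joint.sum(axis=0))   # marginal entropy of z
    h_yz = entropy(joint.ravel())      # joint entropy
    return h_y + h_z - h_yz

def split_index(components, bins=16):
    """Return k* (1-based) minimizing I(vimf_k, vimf_{k+1}) over adjacent pairs."""
    mi = [mutual_information(components[k], components[k + 1], bins)
          for k in range(len(components) - 1)]
    return int(np.argmin(mi)) + 1, mi

rng = np.random.default_rng(0)
y = rng.normal(size=2000)
z = rng.normal(size=2000)
# Toy "components": the first two are strongly related, the third is independent.
k_star, mi = split_index([y, 2.0 * y, z])
```

Components 1 to `k_star` would then be treated as the low-frequency set and the remaining ones as the high-frequency set to be denoised.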

step 1.12: calculate the threshold $TV_{k^*+1}$ of the $(k^*+1)$-th traffic flow component sequence $vimf'_{k^*+1}$ using formula (11):

In formula (11), median represents the median operation; $\Delta$ represents a scaling factor constant;

step 1.13: apply threshold noise reduction to the $(k^*+1)$-th high-frequency component sequence $vimf'_{k^*+1}$ according to the threshold $TV_{k^*+1}$: values of $vimf'_{k^*+1}$ whose absolute amplitude is less than $TV_{k^*+1}$ are set to zero, and values whose absolute amplitude is greater than $TV_{k^*+1}$ are retained, yielding the threshold-denoised $(k^*+1)$-th traffic flow high-frequency component sequence $vimf_{k^*+1}$;
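The thresholding in steps 1.12 and 1.13 can be sketched as follows. Formula (11) is not reproduced in the text; this sketch assumes the common universal threshold, `median(|x|) / delta * sqrt(2 ln N)`, with `delta = 0.6745` as the empirical value given in the detailed description. The function name `threshold_denoise` is hypothetical.

```python
import numpy as np

def threshold_denoise(component, delta=0.6745):
    """Hard-threshold a high-frequency component (assumed form of formula (11))."""
    x = np.asarray(component, dtype=float)
    scale = np.median(np.abs(x)) / delta      # robust noise-scale estimate
    tv = scale * np.sqrt(2.0 * np.log(x.size))  # assumed base threshold
    # Zero out small amplitudes, keep the rest unchanged.
    return np.where(np.abs(x) < tv, 0.0, x), tv

denoised, tv = threshold_denoise([0.1, -0.2, 0.3, 5.0, -0.1])
```

In this toy input only the large spike survives, because the other four values fall below the computed threshold.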

Step 1.14: following steps 1.12 to 1.13, apply threshold noise reduction to the $(k^*+2)$-th to $K$-th high-frequency component sequences $\{vimf'_{k^*+2}, \cdots, vimf'_K\}$, so that $K-k^*$ denoised traffic flow high-frequency component sequences $\{vimf_{k^*+1}, vimf_{k^*+2}, \cdots, vimf_K\}$ are obtained; meanwhile, retain the traffic flow low-frequency component sequences $\{vimf_1, vimf_2, \cdots, vimf_{k^*}\}$ divided in step 1.11; the $K$ component sequences $\{vimf_1, \cdots, vimf_{k^*}, vimf_{k^*+1}, \cdots, vimf_K\}$ of the traffic flow time series $Y$ or $Z$ are finally obtained;

Step 2: traffic flow spatial correlation capture based on the graph attention network:

step 2.1: select a traffic flow node in the traffic network as the central prediction node and denote it as point $p$; decompose the traffic flow time series of point $p$ and apply threshold noise reduction through the process of step 1 to obtain the traffic flow time series $x_p$ containing $K$ component sequences, as shown in formula (12):

In formula (12), the entries are the 1st to $k^*$-th low-frequency component sequences of point $p$, followed by the $(k^*+1)$-th to $K$-th high-frequency component sequences of point $p$;

step 2.2: select several traffic flow nodes in the traffic network as the first-order neighborhood nodes of point $p$ and denote them as the set $q$, which contains $m$ first-order neighborhood nodes, $q=\{q_1,q_2,\cdots,q_n,\cdots,q_m\}$, where $q_n$ represents the $n$-th first-order neighborhood node of point $p$; decompose the traffic flow time series of the $n$-th first-order neighborhood node $q_n$ and apply threshold noise reduction through the process of step 1 to obtain the traffic flow time series $x_{q_n}$ containing $K$ component sequences, as shown in formula (13):

In formula (13), the entries are the 1st to $k^*$-th low-frequency component sequences of point $q_n$, followed by the $(k^*+1)$-th to $K$-th high-frequency component sequences of point $q_n$;

step 2.3: the graph attention network uses formula (14) to calculate the attention coefficient $\alpha_{pq_n}$ of the traffic flow time series $x_p$ of the central prediction node $p$ with respect to the traffic flow time series $x_{q_n}$ of the $n$-th first-order neighborhood node $q_n$:

In formula (14), exp denotes the exponential function with the natural base; the vector applied to the concatenated features is a learnable weight vector; $W$ represents a learnable weight matrix; $\|$ represents the concatenation operation; $\sigma(\cdot)$ denotes the LeakyReLU activation function;
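The attention computation described for formula (14) can be sketched in NumPy. This assumes the standard graph attention form, a softmax over LeakyReLU scores of concatenated transformed features, and includes the node $p$ itself in the softmax so that a self-attention coefficient is produced as well. The names `attention_coefficients`, `a` and `slope`, the LeakyReLU slope of 0.2 and the toy shapes are illustrative assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_coefficients(x_p, neighbors, W, a):
    """Softmax attention of node p over itself and its first-order neighbors.

    x_p: (F,) features of p; neighbors: (m, F); W: (F_out, F); a: (2 * F_out,).
    Returns (m + 1,) coefficients; index 0 is the self-attention coefficient.
    """
    nodes = np.vstack([x_p, neighbors])            # row 0 is p itself
    h_p = W @ x_p                                  # transformed features of p
    h_all = nodes @ W.T                            # transformed features of all nodes
    # Concatenate [W x_p || W x_q] for every node q.
    pairs = np.hstack([np.tile(h_p, (len(nodes), 1)), h_all])
    scores = leaky_relu(pairs @ a)
    scores = scores - scores.max()                 # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(0)
x_p = rng.normal(size=4)
neighbors = rng.normal(size=(3, 4))
W = rng.normal(size=(2, 4))
a = rng.normal(size=4)
alpha = attention_coefficients(x_p, neighbors, W, a)
```

The resulting coefficients are positive and sum to one; multiplying each node's component sequences by its coefficient gives the weighted series used in the later steps.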

step 2.4: use formula (14) to calculate the attention coefficients of the traffic flow time series $x_p$ of the central prediction node $p$ with respect to the traffic flow time series of all first-order neighborhood nodes in the set $q$, obtaining the attention coefficient vector $\alpha_{pq}$ shown in formula (15):

In formula (15), $\alpha_{pq_1}$, $\alpha_{pq_2}$, $\alpha_{pq_n}$ and $\alpha_{pq_m}$ respectively represent the attention coefficients of the traffic flow time series $x_p$ of the central prediction node $p$ with respect to the traffic flow time series of the 1st, 2nd, $n$-th and $m$-th first-order neighborhood nodes;

step 2.5: use formula (14) to calculate the self-attention coefficient $\alpha_{pp}$ of the traffic flow time series $x_p$ of the central prediction node $p$, and multiply it with each component sequence of $x_p$ to obtain the weighted traffic flow time series $x_p$ of point $p$ shown in formula (16):

Step 2.6: multiply each component sequence of the traffic flow time series $x_{q_n}$ of the $n$-th first-order neighborhood node $q_n$ by the attention coefficient $\alpha_{pq_n}$ to obtain the weighted traffic flow time series of the $n$-th first-order neighborhood node $q_n$ shown in formula (17):

Step 2.7: following the process of step 2.6, obtain the weighted traffic flow time series produced by multiplying the traffic flow time series of all first-order neighborhood nodes in $q$ by the attention coefficient vector $\alpha_{pq}$, as shown in formula (18):

in formula (18), $x_{q_m}$ represents the traffic flow time series, containing $K$ component sequences, of the $m$-th first-order neighborhood node $q_m$; the corresponding weighted term represents the weighted traffic flow time series obtained by multiplying $x_{q_m}$ by the attention coefficient $\alpha_{pq_m}$;

step 3: traffic flow temporal dependence capture based on the gated recurrent unit network:

step 3.1: take the component sequences of the weighted traffic flow time series $x_p$ and $x_{q_1}$ to $x_{q_m}$ of each node as the feature inputs of the gated recurrent unit network, and use the improved RMSProp traffic flow optimization algorithm to optimize the gated recurrent unit network model and obtain each parameter value of the model;
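The source does not spell out the gated recurrent unit equations, so the following is only a generic sketch of how one component sequence could be fed through a standard GRU cell. The gate convention `(1 - z) * h_prev + z * h_cand`, the hidden size, the random initialization and the names `gru_step`, `init_params` and `run_gru` are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One step of a standard GRU cell."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)             # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)             # reset gate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)  # candidate state
    return (1.0 - z) * h_prev + z * h_cand

def init_params(input_dim, hidden_dim, rng):
    """Small random weights in the order Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh."""
    shapes = [(hidden_dim, input_dim), (hidden_dim, hidden_dim), (hidden_dim,)] * 3
    return [0.1 * rng.standard_normal(s) for s in shapes]

def run_gru(sequence, hidden_dim=8, seed=0):
    """Feed a 1-D component sequence through the cell, one value per step."""
    rng = np.random.default_rng(seed)
    params = init_params(1, hidden_dim, rng)
    h = np.zeros(hidden_dim)
    for value in sequence:
        h = gru_step(np.array([value]), h, params)
    return h

h_final = run_gru(np.sin(np.linspace(0.0, 6.0, 50)))
```

In a full model the final hidden state would pass through an output layer to give the predicted value of the component, and the parameters would be trained rather than left random.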

step 3.1.1: define $e$ as the iteration number, initialize $e=0$, and set the maximum number of iterations $e_{max}$;

Step 3.1.2: define the learning rate hyperparameter $\eta$, the state variable hyperparameter $\gamma$, the regularization penalty parameter $\rho$ and the real constant $\varepsilon$, and initialize them randomly; define the state variable of the $e$-th iteration as $s_e$ and initialize $s_e=0$;

Step 3.1.3: define the loss function $f(u_e)$ of the $e$-th iteration, where $u_e$ represents the traffic flow independent variable (the argument of the loss function) at the $e$-th iteration;

step 3.1.4: use formula (19) to calculate the mini-batch stochastic gradient $g_{e+1}$ corresponding to the mini-batch $b_{e+1}$ drawn at the $(e+1)$-th iteration from the $K$ component sequences of the weighted traffic flow time series $x_p$ of point $p$:

In formula (19), $b_{e+1}$ represents the mini-batch of samples drawn from the $K$ component sequences of the weighted traffic flow time series $x_p$ at the $(e+1)$-th iteration; $|b|$ represents the number of samples in the mini-batch $b_{e+1}$; the gradient term represents the stochastic gradient of the loss function $f(u_e)$ computed with the $i$-th sample of the mini-batch $b_{e+1}$;

step 3.1.5: update the state variable $s_{e+1}$ of the $(e+1)$-th iteration using formula (20):

$s_{e+1} = \gamma s_e + (1-\gamma)\, g_{e+1} \odot g_{e+1}$ (20)

In formula (20), $\odot$ denotes the element-wise product operator;

step 3.1.6: apply bias correction to the state variable $s_{e+1}$ of the $(e+1)$-th iteration using formula (21) to obtain the corrected state variable of the $(e+1)$-th iteration:

Step 3.1.7: calculate the updated argument $u_{e+1}$ of the loss function for the $(e+1)$-th iteration using formula (22):

In formula (22), division, square root and multiplication are element-wise operations; $\varepsilon$ is a real constant added to maintain numerical stability;
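The optimizer loop of steps 3.1.4 to 3.1.7 can be sketched as below. Only formula (20) is given in the text; the bias correction `s / (1 - gamma**(e + 1))`, the update `u - eta * g / (sqrt(s_hat) + eps)` and the treatment of the penalty `rho` as an L2 term added to the gradient are assumptions about formulas (21) and (22). The name `improved_rmsprop` and its default values are hypothetical.

```python
import numpy as np

def improved_rmsprop(grad_fn, u0, eta=0.01, gamma=0.9, rho=0.0,
                     eps=1e-8, e_max=50):
    """RMSProp with bias correction and an optional L2 penalty (sketch)."""
    u = np.asarray(u0, dtype=float).copy()
    s = np.zeros_like(u)                        # state variable, s_0 = 0
    for e in range(e_max):
        g = grad_fn(u) + rho * u                # assumed regularized gradient
        s = gamma * s + (1.0 - gamma) * g * g   # formula (20)
        s_hat = s / (1.0 - gamma ** (e + 1))    # assumed form of formula (21)
        u = u - eta * g / (np.sqrt(s_hat) + eps)  # assumed form of formula (22)
    return u

# Toy quadratic loss standing in for the network loss f(u).
target = np.array([1.0, -2.0])
loss = lambda u: float(np.sum((u - target) ** 2))
grad = lambda u: 2.0 * (u - target)
u_final = improved_rmsprop(grad, np.zeros(2), e_max=500)
```

Without bias correction the first steps would be inflated, because `s` starts at zero; dividing by `1 - gamma**(e + 1)` removes that start-up bias, which is one plausible reading of the claimed speed-up.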

step 3.1.8: judge whether $e$ has reached the maximum number of iterations $e_{max}$; if so, the trained gated recurrent unit network model and its optimal parameter values have been obtained, and step 3.2 is executed; otherwise, assign $e+1$ to $e$ and return to step 3.1.4;

step 3.2: apply the trained gated recurrent unit network model to obtain the set of prediction outputs for the $K$ component sequences of the weighted traffic flow time series $x_p$ of point $p$, whose elements respectively represent the predicted values of the $k^*$-th, $(k^*+1)$-th, $k'$-th and $K$-th weighted traffic flow component sequences of point $p$;

Take the component sequences of the weighted traffic flow time series $x_{q_1}$ to $x_{q_m}$ of points $q_1$ to $q_m$ as the feature inputs of the gated recurrent unit network and execute steps 3.1.1 to 3.1.8; then apply the trained gated recurrent unit network models to obtain, for each of the nodes $q_1$ to $q_m$, the set of prediction outputs for the $K$ component sequences of its weighted traffic flow time series; the elements of the set for $q_n$ respectively represent the predicted values of the $k^*$-th, $(k^*+1)$-th, $k$-th and $K$-th weighted traffic flow component sequences of point $q_n$, and likewise for point $q_m$;

step 3.3: sum the predicted values in the prediction output sets of the weighted traffic flow time series $x_p$ and $x_{q_1}$ to $x_{q_m}$, so that the final prediction output $h_p$ of the central prediction node $p$ is obtained using formula (23):

In formula (23), the first term is the $k'$-th predicted value in the prediction output set of point $p$; the second term is the $k$-th predicted value in the prediction output set of the $n$-th first-order neighborhood node $q_n$.
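Formula (23) is not reproduced in the text. Read literally, step 3.3 adds up every predicted weighted component of point $p$ and of its $m$ neighbors, which can be sketched as follows; the function name `combine_predictions` and the array layout are assumptions.

```python
import numpy as np

def combine_predictions(pred_p, pred_neighbors):
    """Sum all predicted weighted components of p and of its neighbors.

    pred_p: (K,) predicted components at p; pred_neighbors: (m, K).
    """
    return float(np.sum(pred_p) + np.sum(pred_neighbors))

# Toy values: K = 2 components, m = 2 neighbors.
h_p = combine_predictions(np.array([1.0, 2.0]),
                          np.array([[0.5, 0.5], [1.0, 0.0]]))
```

Because every component was already scaled by its attention coefficient in step 2, this plain sum acts as an attention-weighted combination of the node and its neighborhood.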

Compared with the prior art, the technical scheme adopted by the invention has the following technical effects:

Because it addresses all three characteristics of a traffic flow time series (complex non-stationarity, spatial correlation and temporal dependence), the method can obtain better prediction accuracy and speed than traditional prediction methods that consider only one or two of them. Specifically:

1. The invention considers the complex non-stationarity of the traffic flow time series. The variational mode decomposition algorithm improved by mutual information entropy adaptively matches the optimal center frequency and limited bandwidth of each mode during the search, effectively separates the modal components, and applies threshold noise reduction to the high-frequency components, so that the traffic flow time series is decomposed into a series of relatively stationary high- and low-frequency component sequences; this reduces the non-stationarity of the time series and improves traffic flow prediction accuracy;

2. The invention considers the spatial correlation of the traffic flow time series. Using the attention mechanism of the graph attention network, each component sequence is weighted by its attention coefficient, which captures the differing degrees to which the traffic flow of neighboring nodes in the traffic network influences the traffic flow of the central prediction node, improving prediction accuracy;

3. The invention considers the temporal dependence of the traffic flow time series. The gated recurrent unit network effectively captures the short-term and long-term temporal dependence of the time series data; the weighted traffic flow component sequences serve as its feature inputs, and it is trained iteratively with the improved RMSProp traffic flow optimization algorithm, whose bias correction and regularization strategies increase the iteration speed and further improve traffic flow prediction accuracy.

Drawings

FIG. 1 is a structural diagram of the combined traffic flow prediction method of the present invention;

FIG. 2 is a model diagram of spatial correlation of traffic flow;

FIG. 3 is a traffic flow time-dependent model diagram;

FIG. 4 is a flow chart of a traffic flow combination prediction method according to the present invention;

FIG. 5 is a result graph of traffic flow after improved variational modal decomposition;

FIG. 6 is a graph of the actual traffic flow and the prediction results starting from 8:00 on May 30, 2021.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples.

Based on the three characteristics of a traffic flow time series (complex non-stationarity, spatial correlation and temporal dependence), and in order to improve the prediction accuracy and speed of the traffic flow prediction model, the invention provides a combined traffic flow prediction method based on improved variational mode decomposition, a graph attention network and a gated recurrent unit network. As shown in fig. 1, in this embodiment, the combined traffic flow prediction method includes the following steps:

step 1: traffic flow variational mode decomposition improved by mutual information entropy:

step 1.1: the mutual information entropy I (Y; Z) is defined by formula (1):

I(Y；Z)＝H(Y)+H(Z)-H(Y,Z) (1)

in formula (1): $Y$ represents one traffic flow time series, $Y=(y_1,y_2,\cdots,y_i,\cdots,y_N)$, where $y_i$ is the $i$-th traffic flow datum; $Z$ represents another traffic flow time series, $Z=(z_1,z_2,\cdots,z_i,\cdots,z_N)$, where $z_i$ is the $i$-th traffic flow datum; $N$ is the time series length; $H(\cdot)$ denotes the marginal entropy and $H(\cdot,\cdot)$ the joint entropy; $p(\cdot)$ denotes the marginal probability density function and $p(\cdot,\cdot)$ the joint probability density function, with:

in formula (2): $p(y_i)$ and $p(z_i)$ are the marginal probability density functions of the $i$-th traffic flow data $y_i$ and $z_i$, respectively; $p(y_i,z_i)$ is the joint probability density function of $y_i$ and $z_i$. The larger the mutual information entropy between two traffic flow modal component sequences, the stronger their correlation; the smaller it is, the weaker the correlation. When the mutual information is zero, the two sequences are completely independent of each other;

step 1.2: define the augmented Lagrangian objective function $L(u_k,\omega_k,\lambda)$ using formula (3):

In formula (3): $u_k(t)$ represents the $k$-th modal component of the traffic flow time series in the time domain $t$, with $u_k(t)=A_k(t)\cos(\phi_k(t))$, where $A_k(t)$ is the amplitude and $\phi_k(t)$ the instantaneous phase of $u_k(t)$; $\omega_k(t)$ is the instantaneous frequency of $u_k(t)$, with $\omega_k(t)=\phi_k'(t)$; $\lambda(t)$ represents the Lagrange multiplier in the time domain $t$, used to enforce the constraint strictly; $\alpha$ represents the quadratic penalty factor, used to reduce Gaussian noise interference in the time series; its weight is derived from a Bayesian prior and is inversely proportional to the noise level;

The amplitude $A_k^{n+1}(t)$ is non-negative, the phase $\phi_k^{n+1}(t)$ is non-decreasing, and the instantaneous frequency $\omega_k^{n+1}(t)$ is the derivative of the phase $\phi_k^{n+1}(t)$; the Lagrange multiplier of the $(n+1)$-th iteration is $\lambda^{n+1}(t)$; the number of modes is defined as $K=5$;

and its center frequency $\omega_k^{n+1}$, that is, search for the saddle point of the augmented Lagrangian objective function of formula (3), where the gradient is zero and the Hessian matrix has both positive and negative eigenvalues:

in formulas (4) and (5), $\hat{f}(\omega)$, $\hat{u}_{k'}^{n+1}(\omega)$, $\hat{u}_{k''}^{n+1}(\omega)$ and $\hat{\lambda}^{n}(\omega)$ respectively denote $f(t)$, $u_{k'}^{n+1}(t)$, $u_{k''}^{n+1}(t)$ and $\lambda^{n}(t)$ transformed by the Fourier transform, so that the traffic flow time series in the time domain $t$ is brought into the frequency domain $\omega$ for iteration, where $u_{k'}^{n+1}(t)$ represents the $k'$-th modal component of the $(n+1)$-th iteration in the time domain $t$ and $u_{k''}^{n+1}(t)$ represents the $k''$-th modal component of the $(n+1)$-th iteration in the time domain $t$; $k'$ indexes the modal components before the $k$-th modal component, and $k''$ indexes those after it; $\omega_k^{n}$ represents the center frequency of the $k$-th modal component in the frequency domain $\omega$ at the $n$-th iteration;

step 1.6: update the Lagrange multiplier of the $(n+1)$-th iteration using formula (6):

In formula (6), $\tau$ is the noise tolerance; there is no strict requirement on it, and the empirical value $\tau=0.005$ is taken;

in formula (7), $\varepsilon$ is a constant of given precision that controls the error and determines the precision and the number of iterations; there is no strict requirement on it, and the empirical value $\varepsilon=10^{-6}$ can be taken;

step 1.8: apply the inverse Fourier transform to the frequency-domain modal component $\hat{u}_k^{n+1}(\omega)$ to convert the frequency-domain signal into a time-domain signal; disregarding the imaginary part, the real part obtained is the $k$-th traffic flow component sequence $vimf_k$ of the traffic flow time series $Y$ or $Z$;

In formula (8), the probability term denotes the marginal probability density function of the $i$-th traffic flow datum of the $k$-th traffic flow component sequence $vimf_k$;

step 1.10: calculate the mutual information entropy $I(vimf_k, vimf_{k+1})$ of the adjacent traffic flow component sequences $vimf_k$ and $vimf_{k+1}$ using formula (9):

$I(vimf_k, vimf_{k+1}) = H(vimf_k) + H(vimf_{k+1}) - H(vimf_k, vimf_{k+1})$ (9)

step 1.11: use formula (10) to determine the subscript $k^*$ that divides the components into the traffic flow low-frequency component sequence set $\{vimf_1, vimf_2, \cdots, vimf_{k^*}\}$ and the high-frequency component sequence set $\{vimf'_{k^*+1}, vimf'_{k^*+2}, \cdots, vimf'_K\}$:

in formula (10), $vimf_{k^*}$ represents the first traffic flow component sequence of the adjacent pair at which the mutual information entropy $I(\cdot,\cdot)$ attains its minimum; $vimf'_{k^*+1}$ represents the second traffic flow component sequence of that pair; $I(vimf_{k^*}, vimf'_{k^*+1})$ represents the minimum value of the mutual information entropy;

step 1.12: calculate the threshold $TV_{k^*+1}$ of the $(k^*+1)$-th traffic flow component sequence $vimf'_{k^*+1}$ using formula (11):

In formula (11), median represents the median operation; $\Delta$ represents a scaling factor constant, for which the empirical value $\Delta=0.6745$ may be taken; the left-hand factor of the threshold $TV_{k^*+1}$ calculates the scale factor, and the right-hand factor calculates the base threshold;

step 1.13: apply threshold noise reduction to the $(k^*+1)$-th high-frequency component sequence $vimf'_{k^*+1}$ according to the threshold $TV_{k^*+1}$: values of $vimf'_{k^*+1}$ whose absolute amplitude is less than $TV_{k^*+1}$ are set to zero, and values whose absolute amplitude is greater than $TV_{k^*+1}$ are retained, yielding the threshold-denoised $(k^*+1)$-th traffic flow high-frequency component sequence $vimf_{k^*+1}$;

Step 1.14: following steps 1.12 to 1.13, apply threshold noise reduction to the $(k^*+2)$-th to $K$-th high-frequency component sequences $\{vimf'_{k^*+2}, \cdots, vimf'_K\}$, so that $K-k^*$ denoised traffic flow high-frequency component sequences $\{vimf_{k^*+1}, vimf_{k^*+2}, \cdots, vimf_K\}$ are obtained; meanwhile, retain the traffic flow low-frequency component sequences $\{vimf_1, vimf_2, \cdots, vimf_{k^*}\}$ divided in step 1.11; the $K$ component sequences $\{vimf_1, \cdots, vimf_{k^*}, vimf_{k^*+1}, \cdots, vimf_K\}$ of the traffic flow time series $Y$ or $Z$ are finally obtained;

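In formula (12), the entries are the 1st to $k^*$-th low-frequency component sequences of point $p$, followed by the $(k^*+1)$-th to $K$-th high-frequency component sequences of point $p$;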

step 2.2: select several traffic flow nodes in the traffic network as the first-order neighborhood nodes of point $p$ and denote them as the set $q$, which contains $m$ first-order neighborhood nodes, $q=\{q_1,q_2,\cdots,q_n,\cdots,q_m\}$, where $q_n$ represents the $n$-th first-order neighborhood node of point $p$; decompose the traffic flow time series of the $n$-th first-order neighborhood node $q_n$ and apply threshold noise reduction through the process of step 1 to obtain the traffic flow time series $x_{q_n}$ containing $K$ component sequences, as shown in formula (13):

In formula (13), the first entries respectively represent the 1st and $k^*$-th low-frequency component sequences of point $q_n$;

In formula (14), exp denotes the exponential function with the natural base; the vector applied to the concatenated features is a learnable weight vector; $W$ represents a learnable weight matrix; $\|$ represents the concatenation operation; $\sigma(\cdot)$ denotes the LeakyReLU activation function;

step 2.4: use formula (14) to calculate the attention coefficients of the traffic flow time series $x_p$ of the central prediction node $p$ with respect to the traffic flow time series of all first-order neighborhood nodes in the set $q$, obtaining the attention coefficient vector $\alpha_{pq}$ shown in formula (15):

In formula (15), the entries are the attention coefficients of $x_p$ with respect to the traffic flow time series of the individual first-order neighborhood nodes;

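step 2.5: use formula (14) to calculate the self-attention coefficient $\alpha_{pp}$ of the traffic flow time series $x_p$ of the central prediction node $p$, and multiply it with each component sequence of $x_p$ to obtain the weighted traffic flow time series $x_p$ of point $p$ shown in formula (16):

Step 2.6: multiply each component sequence of the traffic flow time series $x_{q_n}$ of the $n$-th first-order neighborhood node $q_n$ by the attention coefficient $\alpha_{pq_n}$ to obtain the weighted traffic flow time series of $q_n$ shown in formula (17):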

Step 2.7: following the process of step 2.6, obtain the weighted traffic flow time series produced by multiplying the traffic flow time series of all first-order neighborhood nodes in $q$ by the attention coefficient vector $\alpha_{pq}$, as shown in formula (18):

in formula (18), the weighted term for the $m$-th first-order neighborhood node $q_m$ represents the weighted traffic flow time series obtained by multiplying its traffic flow time series $x_{q_m}$ by the attention coefficient $\alpha_{pq_m}$;

step 3.1: take the component sequences of the weighted traffic flow time series $x_p$ and $x_{q_1}$ to $x_{q_m}$ of each node as the feature inputs of the gated recurrent unit network, and use the improved RMSProp traffic flow optimization algorithm to optimize the gated recurrent unit network model and obtain each parameter value of the model; steps 3.1.1 to 3.1.8 take the $K$ component sequences of the weighted traffic flow time series $x_p$ of point $p$ as the example input;

Step 3.1.1: Define e as the iteration number, initialize e = 0, and set the maximum number of iterations e_max = 50;

Step 3.1.2: Define the learning rate hyperparameter η, the state variable hyperparameter γ, the regularization penalty parameter ρ, and the real constant ε; initialize η = 0.01 and randomly initialize γ ∈ (0.9, 1), ρ ∈ (0, 1), and ε ∈ (0, 1); define the state variable of the e-th iteration as s_e and initialize s_e = 0;

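Step 3.1.4: Using equation (19), update the mini-batch stochastic gradient g_{e+1} corresponding to the mini-batch sample b_{e+1}, at iteration e+1, of the k-th component sequence of the weighted traffic flow time series x_p of point p: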

In formula (19), b_{e+1} denotes the mini-batch of samples drawn from the k-th component sequence of the weighted traffic flow time series x_p at iteration e+1; its elements are indexed by i, the i-th sample of the mini-batch b_{e+1}; |b| denotes the number of samples in the mini-batch b_{e+1} and is a hyperparameter independent of the iteration number e; the gradient term denotes the stochastic gradient of the loss function f(u_e) computed using the i-th sample of iteration e+1;

s_{e+1} = γ s_e + (1 − γ) g_{e+1} ⊙ g_{e+1}    (20)

In equation (20), ⊙ denotes the element-wise product operator;

Step 3.1.6: Using equation (21), apply bias correction to the state variable s_{e+1} of iteration e+1 to obtain the corrected state variable of iteration e+1:

Step 3.1.7: Using equation (22), update the argument u_{e+1} of the loss function for iteration e+1:

Step 3.1.8: Determine whether e has reached the maximum number of iterations e_max. If so, the trained GRU network model and its optimal parameter values have been obtained, and step 3.2 is executed; otherwise, assign e + 1 to e and return to step 3.1.4 to continue in sequence;
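The loop of steps 3.1.1 to 3.1.8 can be sketched in Python on a toy problem. Only equation (20) is taken directly from the text. The bias correction s/(1 − γ^(e+1)) and the update u − η·g/(√ŝ + ε) are assumed, commonly used forms standing in for equations (21) and (22), which are not reproduced here. The regularization penalty ρ is omitted because the text does not show where it enters. The scalar parameter, the squared-error loss, and the sample data are hypothetical.

```python
import math


def minibatch_gradient(w, batch):
    """Average per-sample gradient of the toy loss (w*x - y)^2, in the spirit of equation (19)."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def rmsprop_step(u, g, s, e, eta=0.01, gamma=0.95, eps=1e-6):
    """One update of a scalar parameter u; returns the new parameter and state variable."""
    s_next = gamma * s + (1.0 - gamma) * g * g         # equation (20), scalar case
    s_hat = s_next / (1.0 - gamma ** (e + 1))          # assumed form of equation (21)
    u_next = u - eta / (math.sqrt(s_hat) + eps) * g    # assumed form of equation (22)
    return u_next, s_next


def train_toy(e_max=50):
    """Run e_max iterations on hypothetical samples that satisfy y = 3x."""
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
    u, s = 0.0, 0.0                                    # step 3.1.2: s_e starts at 0
    for e in range(e_max):                             # step 3.1.8: stop at e_max
        batch = data[:2] if e % 2 == 0 else data[2:]   # deterministic mini-batches of size 2
        g = minibatch_gradient(u, batch)               # step 3.1.4
        u, s = rmsprop_step(u, g, s, e)                # steps 3.1.5 to 3.1.7
    return u
```

With η = 0.01 and only 50 iterations the parameter moves toward the optimum of 3 without reaching it, which is why a real model would normally make many such updates per pass over the data.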

Step 3.2: Apply the trained GRU network model to obtain the set of prediction outputs of the K component sequences of the weighted traffic flow time series x_p of point p;

The component sequences of the weighted traffic flow time series of points q_1 to q_m are likewise used as feature inputs of the GRU network, and steps 3.1.1 to 3.1.8 are executed; the trained GRU network models are then applied to obtain the sets of prediction outputs of the K component sequences of the respective weighted traffic flow time series of q_1 to q_m. The elements of these sets are the predicted values of the weighted traffic flow component sequences of each point q_n, up to and including q_m;

Step 3.3: Superpose and sum the predicted values in the respective prediction output sets of the weighted traffic flow time series x_p, x_q1 to x_qm, so that the final prediction output h_p of the central prediction node p is obtained using equation (23):

In formula (23), the terms are the k′-th predicted value in the prediction output set of point p and the k″-th predicted value in the corresponding prediction output set of the neighborhood nodes.

Fig. 4 is a flow chart of the traffic flow combined prediction method of the present invention.
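The superposition in step 3.3 can be sketched in Python as a plain sum over every predicted component of point p and of each neighborhood node. Equation (23) is not reproduced in this text, so reading it as an unweighted sum over all nodes and components is an assumption, as are the function name and the data layout.

```python
def combine_predictions(pred_p, pred_neighbours):
    """Sum the predicted component values of node p and of all neighbourhood nodes.

    pred_p: predicted values of the K weighted components of node p (one time step).
    pred_neighbours: one such list per first-order neighbourhood node.
    """
    total = sum(pred_p)
    for predictions in pred_neighbours:
        total += sum(predictions)
    return total
```

For example, `combine_predictions([1.0, 2.0], [[0.5], [0.25, 0.25]])` adds 3.0 from point p and 1.0 from the two neighbors.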

Examples of the applications

1) RTMC traffic data set and pre-processing

The data set used in the embodiment of the present invention was collected by the Regional Transportation Management Center (RTMC) of the Minnesota Department of Transportation (MnDOT) and includes flow data at 30-second, hourly, and daily resolution.

The selected traffic node region, around I-35W and TH 62, comprises one central prediction node and four neighborhood nodes. Monitoring point S50 is the central prediction node, and S49, S329, and the two Exit monitoring points are the first-order neighborhood nodes.

The sampling period is 8:00 to 20:00 each day from April 1, 2021 to May 30, 2021. The traffic flow time series is re-aggregated to a 5-minute sampling interval, giving 8640 data points per node after aggregation (60 days × 144 intervals per day). The data of the first 45 days are used as training set samples, and the data of the last 15 days as test set samples. A sliding window of width T + 1 is used on the training set to generate training samples, where T is the length of the historical timestamp data and the (T + 1)-th timestamp value is the label; the window is shifted by one time unit (5 min) at a time.
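The sliding-window sample generation described above can be sketched in Python as follows. The function name is hypothetical, and the default T = 7 is the value used later in this embodiment.

```python
def make_windows(series, T=7):
    """Slide a window of width T + 1 over the series, one step at a time.

    Each sample is (history of length T, label), where the label is the value
    at the (T + 1)-th position of the window.
    """
    samples = []
    for start in range(len(series) - T):
        history = series[start:start + T]
        label = series[start + T]
        samples.append((history, label))
    return samples
```

A series of N points yields N − T samples, so the 45 training days (45 × 144 = 6480 points per node) give 6473 windows when T = 7.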

2) Performance index

To evaluate the prediction performance of the combined prediction method of the present invention, the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) are used as performance indicators. They are defined as follows:

(1) Mean absolute error (MAE): the average of the absolute errors.

(2) Mean absolute percentage error (MAPE): the prediction error expressed as a percentage of the true value.

(3) Root mean square error (RMSE): the square root of the mean squared difference between the predicted and true values.

In equations (24), (25), and (26), |b| is the number of mini-batch samples of the traffic flow training set, and x_i and its predicted counterpart are respectively the true value and the predicted value of the i-th traffic flow sample in the test set. The smaller the MAE, MAPE, and RMSE values, the better the prediction result.
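The three indicators can be sketched in Python using their standard definitions. Equations (24) to (26) are not reproduced in this text, so the exact form, in particular the factor of 100 in MAPE, is an assumption, as are the function names.

```python
import math


def mae(true, pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def mape(true, pred):
    """Mean absolute percentage error, in percent; true values must be non-zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(true, pred)) / len(true)


def rmse(true, pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))
```

MAPE is undefined when a true flow value is zero, so intervals with zero flow would need to be excluded or handled separately.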

3) Model training and experimental analysis

Figs. 2 and 3 show the traffic flow spatial correlation capture model and the traffic flow time-dependency capture model, respectively.

The main hyperparameters are set as follows: the penalty factor α of the improved traffic flow variational mode decomposition algorithm is 2000, the noise tolerance parameter τ is 0.005, the historical timestamp length in the training set is T = 7, the number of hidden-layer neurons of the GAT network is 128, the number of hidden-layer neurons of the GRU network is 64, the batch size is 32, and the maximum number of training iterations (epochs) of the improved RMSProp traffic flow optimization algorithm is 50.

The traffic flow time series is decomposed into 5 stationary component subsequences with different frequencies and amplitudes by the improved traffic flow variational mode decomposition algorithm. The normalized modal component sequences vimf, ordered from low frequency to high frequency, are shown in Fig. 5.

Table 1 shows the mutual information entropy between adjacent vimf components. As the table shows, the mutual information entropy I(vimf₂, vimf₃) between vimf₂ and vimf₃ decreases, while the mutual information entropy I(vimf₃, vimf₄) between vimf₃ and vimf₄ begins to increase, so a minimum point occurs; the subscript of the vimf₃ component is therefore taken as the subscript k* of the boundary between the low-frequency and high-frequency components. The low-frequency components vimf₁ and vimf₂ are retained, and threshold noise reduction is applied to the high-frequency components vimf₃ to vimf₅, which preserves the information of the original traffic flow time series to the greatest possible extent.
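The search for the minimum point described above can be sketched in Python. The function locates the first adjacent pair (vimf_k, vimf_{k+1}) whose mutual information entropy is lower than the preceding pair's and no higher than the following pair's. The function name and the sample values are hypothetical (the contents of Table 1 are not reproduced here), and how the located pair maps to k* is governed by equation (10), which this sketch does not attempt to reproduce.

```python
def first_local_minimum(mi_values):
    """Return the 1-based k of the first local minimum of I(vimf_k, vimf_{k+1}).

    mi_values[j] holds I(vimf_{j+1}, vimf_{j+2}); returns None if no interior
    minimum exists.
    """
    for j in range(1, len(mi_values) - 1):
        if mi_values[j] < mi_values[j - 1] and mi_values[j] <= mi_values[j + 1]:
            return j + 1
    return None


# Hypothetical values for I(1,2), I(2,3), I(3,4), I(4,5) with 5 components
example = [0.9, 0.4, 0.6, 0.7]
pair_index = first_local_minimum(example)  # minimum at the pair (vimf_2, vimf_3)
```

With these illustrative values the minimum falls on the pair (vimf₂, vimf₃), which matches the pattern described for this embodiment, where vimf₁ and vimf₂ are kept and vimf₃ to vimf₅ are denoised.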

TABLE 1 mutual information entropy between VIMF components

Fig. 6 shows the original traffic flow and the final prediction results for 8:00 to 20:00 on May 30, 2021, in the test set. It can be seen that the traffic flows in the prediction periods of 8.

Claims

1. A traffic flow combination prediction method is characterized by comprising the following steps:

step 1.1: mutual information entropy I (Y; Z) is defined by formula (1):

I(Y；Z)＝H(Y)+H(Z)-H(Y,Z) (1)

In formula (1): Y represents one traffic flow time series, with Y = (y₁, y₂, ···, y_i, ···, y_N), where y_i represents the i-th traffic flow data; Z represents another traffic flow time series, with Z = (z₁, z₂, ···, z_i, ···, z_N), where z_i represents the i-th traffic flow data; N is the length of the time series; H(·) represents the marginal information entropy, and H(·,·) represents the joint information entropy; p(·) represents the marginal probability density function, p(·,·) represents the joint probability density function, and:

In formula (2): p(y_i) and p(z_i) respectively represent the marginal probability density functions of the i-th traffic flow data y_i and z_i; p(y_i, z_i) represents the joint probability density function of the i-th traffic flow data y_i and z_i;

In formula (3): u_k(t) represents the k-th modal component of the traffic flow time series in the time domain t, with u_k(t) = A_k(t) cos(φ_k(t)), where A_k(t) denotes the amplitude of the k-th modal component u_k(t) in the time domain t and φ_k(t) denotes its instantaneous phase; ω_k(t) denotes the instantaneous frequency of u_k(t) in the time domain t, with ω_k(t) = φ_k′(t); λ(t) represents the Lagrange multiplier in the time domain t; α represents the quadratic penalty term;

The amplitude A_k(t) is non-negative, the phase φ_k(t) is non-decreasing, and the instantaneous frequency ω_k(t) is the derivative of the phase φ_k(t); λⁿ⁺¹(t) denotes the Lagrange multiplier of the (n+1)-th iteration; the number of modes is defined as K;

In formulas (4) and (5), which involve the modal components and their center frequencies, the frequency-domain terms are obtained by Fourier-transforming the corresponding time-domain quantities (the modal components, f(t), and λⁿ(t)) into the frequency domain ω. The terms cover the k′-th modal component of the (n+1)-th iteration in the time domain t, the center frequency of the k-th modal component in the frequency domain ω at the n-th iteration, and the k-th modal component of the traffic flow time series at the (n+1)-th iteration in the time domain t; λⁿ(t) represents the Lagrange multiplier of the n-th iteration in the time domain t;

Step 1.6: Update the Lagrange multiplier of the (n+1)-th iteration using equation (6):

In formula (6), τ is the noise tolerance;

In formula (7), ε is the given precision;
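For orientation, the multiplier update and stopping test of conventional variational mode decomposition can be sketched in Python. Equations (6) and (7) are not reproduced in this text, so the dual-ascent update λ ← λ + τ·(f − Σ u_k) in the frequency domain and the relative-change criterion below are the standard forms and may differ from the improved algorithm claimed here. The function names and the list-based spectra are assumptions.

```python
def update_multiplier(lam_hat, f_hat, u_hat_modes, tau=0.005):
    """Dual ascent: add tau times the reconstruction residual at every frequency bin."""
    residual = [
        f - sum(mode[i] for mode in u_hat_modes)
        for i, f in enumerate(f_hat)
    ]
    return [lam + tau * r for lam, r in zip(lam_hat, residual)]


def has_converged(u_new, u_old, eps=1e-7):
    """Sum over modes of ||u_new - u_old||^2 / ||u_old||^2, compared with eps."""
    total = 0.0
    for new, old in zip(u_new, u_old):
        num = sum(abs(a - b) ** 2 for a, b in zip(new, old))
        den = sum(abs(b) ** 2 for b in old)
        if den == 0.0:
            # all-zero previous iterate: converged only if nothing changed
            total += 0.0 if num == 0.0 else float("inf")
        else:
            total += num / den
    return total < eps
```

The default τ = 0.005 mirrors the noise tolerance parameter quoted for the embodiment; the default ε is an arbitrary placeholder.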

step 1.8: to pair

In formula (8), p(·) denotes the marginal probability density function;

I(vimf_k, vimf_{k+1}) = H(vimf_k) + H(vimf_{k+1}) − H(vimf_k, vimf_{k+1})    (9)

In formula (9), vimf_{k+1} represents the (k+1)-th traffic flow component sequence of the traffic flow time series Y or Z; H(vimf_k) denotes the marginal information entropy of the k-th traffic flow component sequence vimf_k; H(vimf_{k+1}) denotes the marginal information entropy of the (k+1)-th traffic flow component sequence vimf_{k+1}; H(vimf_k, vimf_{k+1}) denotes the joint information entropy of vimf_k and vimf_{k+1};

Step 1.11: Using equation (10), determine the subscript k* that divides the traffic flow component sequences into the low-frequency component sequence set {vimf₁, vimf₂, ···, vimf_{k*}} and the high-frequency component sequence set {vimf′_{k*+1}, vimf′_{k*+2}, ···, vimf′_K}:

Step 1.12: Using equation (11), calculate the threshold TV_{k*+1} of the (k*+1)-th traffic flow component sequence vimf′_{k*+1}:

Step 1.13: Apply threshold noise reduction to the (k*+1)-th high-frequency component sequence vimf′_{k*+1} according to the threshold TV_{k*+1}: values in vimf′_{k*+1} whose absolute amplitude is less than TV_{k*+1} are set to zero, and values whose absolute amplitude is greater than TV_{k*+1} are kept, yielding the (k*+1)-th high-frequency traffic flow component sequence vimf_{k*+1} after threshold noise reduction;
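The thresholding rule of step 1.13 can be sketched in Python as a hard threshold. The threshold itself comes from equation (11), which is not reproduced in this text, so it is passed in as a given number. The function name is hypothetical, and keeping values whose magnitude exactly equals the threshold is an assumption, since the text only covers the strictly smaller and strictly greater cases.

```python
def hard_threshold(component, tv):
    """Zero out values whose absolute amplitude is below tv; keep the rest unchanged."""
    return [value if abs(value) >= tv else 0.0 for value in component]


def denoise_high_frequency(components, thresholds):
    """Apply hard_threshold to each high-frequency component with its own threshold."""
    return [hard_threshold(c, tv) for c, tv in zip(components, thresholds)]
```

For instance, `hard_threshold([0.1, -0.5, 0.3, -0.05], 0.2)` keeps only the two entries whose magnitude is at least 0.2.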

Step 2: traffic flow spatial correlation capture based on graph attention network:

In formula (12), the leading terms respectively represent the 1st and k*-th low-frequency component sequences of point p;

Step 2.2: Select several traffic flow nodes in the traffic network as the first-order neighborhood nodes of point p and denote them as the set q; the set contains m first-order neighborhood nodes, written q = {q₁, q₂, ···, q_n, ···, q_m}, where q_n represents the n-th first-order neighborhood node of point p. The traffic flow time series of the n-th first-order neighborhood node q_n is decomposed and threshold-denoised by the process of step 1 to obtain the traffic flow time series containing K component sequences shown in equation (13):

In formula (13), the leading terms respectively represent the 1st and k*-th low-frequency component sequences of point q_n;

In formula (14), exp denotes the exponential function with natural base e; the formula also contains a learnable weight vector; W denotes a learnable weight matrix; || denotes the concatenation operation; σ(·) denotes the LeakyReLU activation function;

Step 2.4: Using equation (14), calculate the attention coefficients of the traffic flow time series x_p of the central prediction node p with respect to the traffic flow time series of all first-order neighborhood nodes in the set q, obtaining the attention coefficient vector α_pq shown in equation (15):

In formula (15), the entries of α_pq are the attention coefficients of point p with respect to the individual first-order neighborhood nodes in the set q;

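Step 2.6: Multiply each component sequence of the traffic flow time series of the n-th first-order neighborhood node q_n by the corresponding attention coefficient to obtain its weighted traffic flow time series, as shown in equation (17);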

Step 2.7: Following the process of step 2.6, obtain the weighted traffic flow time series produced by multiplying the traffic flow time series of all first-order neighborhood nodes q by the attention coefficient vector α_pq, as shown in equation (18):

In formula (18), the last entry represents the weighted traffic flow time series obtained by multiplying the traffic flow time series of the m-th first-order neighborhood node q_m by its attention coefficient;

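Step 3.1: The component sequences of the weighted traffic flow time series x_p, x_q1 to x_qm of the nodes are each used as feature inputs of the gated recurrent unit (GRU) network, and the improved RMSProp traffic flow optimization algorithm is used to optimize the GRU network model and obtain the value of each model parameter;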

Step 3.1.2: Define the learning rate hyperparameter η, the state variable hyperparameter γ, the regularization penalty parameter ρ, and the real constant ε, and initialize them randomly; define the state variable of the e-th iteration as s_e and initialize s_e = 0;

Step 3.1.4: Using equation (19), update the mini-batch stochastic gradient g_{e+1} corresponding to the mini-batch sample b_{e+1}, at iteration e+1, of the k-th component sequence of the weighted traffic flow time series x_p of point p:

In formula (19), b_{e+1} denotes the mini-batch of samples drawn from the k-th component sequence of the weighted traffic flow time series x_p at iteration e+1; its elements are indexed by i, the i-th sample of the mini-batch b_{e+1}; |b| denotes the number of samples in the mini-batch b_{e+1}; the gradient term denotes the stochastic gradient of the loss function f(u_e) computed using the i-th sample of iteration e+1;

s_{e+1} = γ s_e + (1 − γ) g_{e+1} ⊙ g_{e+1}    (20)

In equation (20), ⊙ denotes the element-wise product operator;

Step 3.1.8: Determine whether e has reached the maximum number of iterations e_max. If so, the trained GRU network model and its optimal parameter values have been obtained, and step 3.2 is executed; otherwise, assign e + 1 to e and return to step 3.1.4 to continue in sequence;


The component sequences of the weighted traffic flow time series of points q_1 to q_m are likewise used as feature inputs of the GRU network, and steps 3.1.1 to 3.1.8 are executed, so that the trained GRU network models are respectively applied to obtain, for the respective weighted traffic flow time series of nodes q_1 to q_m,