CN115762147A - Traffic flow prediction method based on adaptive graph attention neural network - Google Patents
Publication number: CN115762147A · Application number: CN202211386613.4A · Authority: CN (China) · Legal status: Granted
Abstract
The invention discloses a traffic flow prediction method based on an adaptive graph attention neural network, aims to predict medium- and long-term traffic flow, and belongs to the technical field of urban traffic planning and flow prediction. The method comprises the following steps. Step 1: extract road flow data and preprocess it with an attention mechanism and a spatio-temporal data embedding method to obtain a preprocessed data sequence. Step 2: extract spatio-temporal features from the resulting data sequence. Step 3: after extraction through multiple network layers, aggregate the features with an improved multi-head attention mechanism and obtain the prediction through a fully connected layer. The method adopts multi-module parallel processing and an improved convolution scheme, which reduces training time. The method of the invention predicts traffic flow more accurately and completes the prediction task better.
Description
Technical Field
The invention belongs to traffic flow prediction in the field of spatio-temporal sequence prediction, and relates to a traffic flow prediction method based on an adaptive graph attention neural network, intended specifically for medium- and long-term flow prediction tasks in a traffic system.
Background
With the continuous development of the national economy, living standards keep rising and the number of private cars keeps growing, so the pressure borne by roads increases continuously; intelligent transportation systems were proposed to address this problem. Traffic flow prediction is a very important part of an intelligent transportation system: it greatly assists traffic scheduling, is indispensable for traffic management departments to allocate road resources reasonably or to provide more effective travel strategies to the public, and is one of the effective means of addressing current traffic efficiency problems.
At present, with the wide application of intelligent transportation systems (ITS), massive traffic data can be obtained in a timely manner, which further promotes research on traffic speed prediction. Fixed-position sensors on the road record traffic data, including speed, flow and position information. A close spatio-temporal relationship exists among these traffic characteristics; the key to traffic prediction is therefore capturing the dynamic spatio-temporal correlations of the data. However, this task is challenging due to the complexity and non-linearity of traffic data.
First, the spatial dependency of the nodes is dynamic. Complex dependencies exist between nodes, and the spatial relationships between nodes are not independent but change dynamically over time; however, several existing methods fail to model traffic data dynamically in both space and time. Second, the non-linearity of traffic speed changes and the propagation of errors during training make traditional deep learning methods insufficient for long-term prediction. Most importantly, these methods are based on a predefined graph structure matrix, which limits their ability to exploit the spatial dependencies in traffic data: they extract spatio-temporal features while ignoring the dynamic correlations of the traffic data.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing a traffic flow prediction method based on an adaptive graph attention neural network. The technical scheme of the invention is as follows:
a traffic flow prediction method based on an adaptive graph attention neural network comprises the following steps:
step 1: setting time slices, collecting and counting the historical traffic flow information in each time slice through detection devices installed at traffic intersections, and forming a two-dimensional traffic flow matrix; dividing the obtained historical traffic flow information into a training set and a test set;
step 2: regarding the traffic flow detection device installed at each traffic intersection as a node, where the connections among the nodes within a single time slice, together with the traffic flow data, form a road network topology graph; connecting each node in adjacent time slices with its counterparts in the preceding and following time steps, and dynamically adjusting the corresponding node weights with an attention mechanism according to the road network flow information, so as to construct a spatio-temporal network sequence and obtain a local spatio-temporal connection graph;
step 3: expanding the convolution kernel with dilated convolution to enlarge the receptive field;
step 4: constructing a spatio-temporal graph convolution network model for predicting traffic flow based on an attention mechanism, wherein the model stacks multiple modules, processes and outputs data over a time period, and uses a multi-head attention mechanism with residual connections;
step 5: after splicing the outputs, obtaining the output of the gating mechanism block through a fully connected layer, wherein a residual structure is added when passing through the fully connected layer to prevent overfitting;
step 6: testing the trained adaptive graph attention neural network model with the test set and evaluating its error; if the error is larger than a set threshold, returning to step 2 and retraining the model;
step 7: inputting the traffic flow data of the preceding N time slices of the road section to be predicted into the trained adaptive graph attention neural network model, and predicting the traffic flow of that road section for the next N time slices.
Further, step 1 specifically comprises: setting 5 minutes as a time slice, collecting and counting the historical traffic flow information in each time slice through detection devices installed at traffic intersections, and forming a two-dimensional traffic flow matrix.
The historical traffic flow data comprise the license plate numbers of the motor vehicles passing the road section in a time slice, the passing time, the passing speed, the traffic flow information and the weather conditions of the day. Repeated and invalid data are cleaned, and the remaining data are subjected to z-score standardization: x*_i = (x_i − μ_i) / σ_i, where x_i is the original datum, x*_i the new datum, μ_i the mean, and σ_i the standard deviation; n is the number of stations in the road section.
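The z-score standardization above can be sketched in a few lines of numpy; the array shape (time slices × stations) and the toy values are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def z_score_normalize(flow):
    """Standardize each station's series: x* = (x - mu) / sigma."""
    mu = flow.mean(axis=0)       # mean per station
    sigma = flow.std(axis=0)     # standard deviation per station
    return (flow - mu) / sigma

# 3 time slices x 2 stations of hypothetical flow counts
flow = np.array([[10., 20.], [12., 24.], [14., 28.]])
normalized = z_score_normalize(flow)
```

After standardization each station's series has zero mean and unit standard deviation, which keeps stations with very different traffic volumes on a comparable scale during training.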
Further, the road network topology graph in step 2 comprises: the set V of all nodes, with |V| = N the number of nodes; the edge set E between nodes; and the weighted adjacency matrix A of the edges between nodes, giving a road network topology graph G = (V, E, A). According to the topological structure of the local spatio-temporal graph, the correlation between each node and its spatio-temporal neighbors can be captured directly. The construction of the spatio-temporal network sequence uses A_S ∈ R^(N×N) as the adjacency matrix of the spatial graph and A_ST ∈ R^(3N×3N) as the adjacency matrix of a local spatio-temporal graph built on three consecutive spatial graphs. For a node i in the spatial graph, its new index in the local spatio-temporal graph is computed as (t − 1)N + i, where t (0 < t ≤ 3) denotes the time step within the local spatio-temporal graph. If two nodes are connected to each other in this local spatio-temporal graph, the corresponding value in the adjacency matrix is set to 1.
Here v_i denotes node i in the local spatio-temporal graph; the adjacency matrix A_ST covers 3N nodes, its block diagonal consists of the adjacency matrices of the spatial network at three consecutive time steps, and the blocks on either side of the diagonal represent the connectivity of each node to itself at the adjacent time step.
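The construction described above can be sketched as follows; the exact block layout is an assumption inferred from the text (spatial adjacency on the block diagonal, self-connections to the adjacent time step beside it).

```python
import numpy as np

def build_local_st_adjacency(A):
    """Build the 3N x 3N local space-time adjacency from an N x N spatial one."""
    N = A.shape[0]
    A_st = np.zeros((3 * N, 3 * N))
    I = np.eye(N)
    for t in range(3):                                # spatial edges at each step
        A_st[t*N:(t+1)*N, t*N:(t+1)*N] = A
    for t in range(2):                                # node i at step t <-> i at t+1
        A_st[t*N:(t+1)*N, (t+1)*N:(t+2)*N] = I
        A_st[(t+1)*N:(t+2)*N, t*N:(t+1)*N] = I
    return A_st

A = np.array([[0., 1.], [1., 0.]])   # toy 2-node spatial graph
A_st = build_local_st_adjacency(A)
# node i at (1-based) time step t gets index (t-1)*N + i, matching the text
```

Note that only adjacent time steps are linked; a node at step 1 has no direct edge to its copy at step 3.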
Further, in step 2 the attention mechanism dynamically adjusts the corresponding node weights; the specific steps are as follows:
Each node block represents the current flow state at time step t, and different colors represent different influence weights. The channels are divided along the time dimension, with one time step per channel; the aim is to dynamically adjust the spatio-temporal correlations by assigning dynamic weights to the features at different time steps. A channel attention mechanism is used to mine the dynamic spatio-temporal correlations in the data.
Feature compression of X is first performed by a global average pool, which converts each temporal channel into a single number so that each channel has a global receptive field in the spatial dimension: X_p = f_pool(X), where T denotes the historical time steps and N the total number of sensors. Since only the speed feature is considered here, C = 1 and X_p ∈ R^T. To learn the non-linear correlations in the data, X_p is passed through two fully connected layers:
x_att = W_2 δ(W_1 X_p)
where x_att is the attention coefficient, W_1 and W_2 are trainable parameters, r is the channel reduction ratio, and δ is the ReLU activation function. Furthermore, to obtain weight values between 0 and 1, x_att is recalibrated with the sigmoid activation function:
x′_att = σ(x_att)
Then the Hadamard product of x′_att and X yields the dynamically adjusted spatio-temporal feature data:
X_att = X ⊙ x′_att
Subsequently, X_att is sent to the gated dilated convolution module and the spatial convolution module to further capture spatio-temporal features.
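A minimal numpy sketch of this channel (time-step) attention follows. The squeeze-and-excitation weight shapes with reduction ratio r, and the random initialization, are assumptions for illustration; in the model W_1 and W_2 would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
relu = lambda z: np.maximum(z, 0.0)

T, N, r = 12, 4, 4                       # time steps, sensors, reduction ratio
X = rng.normal(size=(T, N))              # traffic features with C = 1

X_p = X.mean(axis=1)                     # squeeze: global average pool over space -> R^T
W1 = rng.normal(size=(T // r, T))        # first FC layer (compression by factor r)
W2 = rng.normal(size=(T, T // r))        # second FC layer (expansion back to T)
x_att = W2 @ relu(W1 @ X_p)              # excitation: x_att = W2 * delta(W1 * X_p)
x_att_prime = sigmoid(x_att)             # recalibrate weights into (0, 1)
X_att = X * x_att_prime[:, None]         # Hadamard product: reweight each time step
```

Each time step (channel) is scaled by a learned scalar in (0, 1), which is how the module emphasizes or suppresses individual historical time slices.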
Further, step 3 expands the convolution kernel with dilated convolution to enlarge the receptive field, specifically:
In this module, convolutions with different dilation rates are applied to the input to achieve short-, medium- and long-term prediction goals. X_att ∈ R^(T×N×C) is the input of the module, where T denotes the time steps of the input sequence. To mine short-, medium- and long-term features, dilated convolutions with dilation rates D = 1, 2, 5 and 11 are applied to X_att. After convolution along the time dimension, the results are concatenated as follows:
X_cat = concat(X_att ∗ f_D=1, X_att ∗ f_D=2, X_att ∗ f_D=5, X_att ∗ f_D=11)
where X_att ∗ f_D=i (i = 1, 2, 5, 11) denotes the dilated convolution and f ∈ R^(1×2) is the convolution kernel. To keep the dimensions consistent with the spatial features, the concatenated vector is dimension-converted through a fully connected layer:
X_D = FC(X_cat)
FC denotes the fully connected layer and X_D the resulting vector. Finally, a gating mechanism consisting of two parallel activation functions is used to control the transmission of temporal information: the tanh activation function is used to overcome the vanishing-gradient problem, while the sigmoid activation function maps the data between 0 and 1 and acts as the message-transmission control. X_D passes through the gating mechanism as:
H_T = g(X_D1) ⊙ σ(X_D2)
where g(X_D1) and σ(X_D2) denote the two activation branches and H_T is the resulting output.
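The gated dilated convolution can be sketched for a single sensor as below. Causal padding, the size-2 kernel, and random weights are assumptions for illustration; the four dilation rates, the concatenation, the FC projection, and the tanh/sigmoid gate follow the text.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def dilated_conv1d(x, kernel, dilation):
    """Causal dilated convolution along time with a size-2 kernel f in R^(1x2)."""
    out = np.zeros(len(x))
    for t in range(len(x)):
        past = t - dilation
        out[t] = kernel[1] * x[t] + (kernel[0] * x[past] if past >= 0 else 0.0)
    return out

T = 24
x = rng.normal(size=T)                          # one sensor's input series
# four dilation branches, D = 1, 2, 5, 11, then concatenation (X_cat)
branches = [dilated_conv1d(x, rng.normal(size=2), D) for D in (1, 2, 5, 11)]
X_cat = np.concatenate(branches)

# fully connected layers restore the time dimension for the two gate branches
W1 = rng.normal(size=(T, 4 * T))
W2 = rng.normal(size=(T, 4 * T))
X_D1, X_D2 = W1 @ X_cat, W2 @ X_cat

# gate: the tanh branch carries content, the sigmoid branch controls transmission
H_T = np.tanh(X_D1) * sigmoid(X_D2)
```

The dilation rate multiplies the temporal reach of each kernel tap without adding parameters, which is what lets the D = 11 branch see long-term context.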
Further, step 4 specifically comprises:
A spatial convolution module is adopted, whose convolution operator is computed as:
H_S = (D̃^(−1/2) Ã D̃^(−1/2)) X Θ, with Ã = A + I_N
where Θ is a learnable parameter; A and Ã denote the adjacency matrix and its augmented matrix with self-loops (I_N), and D̃^(−1/2) Ã D̃^(−1/2) is its normalized form.
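The text above describes a spatial graph convolution built from the augmented, normalized adjacency matrix. The patent's exact operator is shown only as an image, so the following is a sketch of the standard first-order graph convolution this appears to describe; the toy graph and weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def graph_conv(X, A, Theta):
    """First-order graph convolution: (D~^-1/2 (A + I) D~^-1/2) X Theta."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                      # augmented adjacency with self-loops
    d = A_tilde.sum(axis=1)                      # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # symmetric normalization
    return A_hat @ X @ Theta                     # aggregate neighbors, mix features

N, C_in, C_out = 4, 3, 2
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)        # toy 4-node path graph
X = rng.normal(size=(N, C_in))                   # node features
Theta = rng.normal(size=(C_in, C_out))           # learnable weight
H = graph_conv(X, A, Theta)
```

The symmetric normalization keeps the aggregation from being dominated by high-degree intersections, which matters on real road networks where node degrees vary.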
A self-learning graph structure matrix is generated with an attention-based method; this matrix can learn hidden spatial correlations between nodes from the input data.
Here X ∈ R^(T×N×C) is the input of the model, and V_s ∈ R^(N×N), U_1 ∈ R^C, U_2 ∈ R^(T×C) and U_3 ∈ R^T are learnable parameters. Subsequently, all values are normalized with a Softmax function to obtain the adaptive graph structure matrix, which is fed to the graph convolution layer.
Finally, the temporal and spatial units are fused by a fusion mechanism, where H_T and H_S denote the outputs of the gating mechanism and of the graph convolution layer respectively, to obtain the output of each dynamic spatio-temporal block:
Y = Z_1 H_T + Z_2 H_S
where Z_1 and Z_2 are learnable parameter matrices.
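A sketch of the adaptive graph structure and the fusion step follows. The patent's attention formula is shown only as an image, so the bilinear form below is an assumption modeled on common spatial-attention designs; only the parameter shapes (V_s ∈ R^(N×N), U_1 ∈ R^C, U_2 ∈ R^(T×C), U_3 ∈ R^T) and the final Softmax normalization and fusion Y = Z_1 H_T + Z_2 H_S are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, N, C = 6, 4, 2
X = rng.normal(size=(T, N, C))           # model input
V_s = rng.normal(size=(N, N))            # learnable parameters, shapes per the text
U1 = rng.normal(size=(C,))
U2 = rng.normal(size=(T, C))
U3 = rng.normal(size=(T,))

# hypothetical bilinear attention score between every pair of nodes
left = np.einsum('tnc,c->tn', X, U1).T @ U2      # (N, C)
right = np.einsum('t,tnc->nc', U3, X)            # (N, C)
score = left @ right.T                           # (N, N) raw attention
A_hat = softmax(V_s * np.tanh(score), axis=1)    # adaptive adjacency, rows sum to 1

# fusion of the temporal and spatial branch outputs (stand-in values)
H_T = rng.normal(size=(N,))                      # gated temporal module output
H_S = A_hat @ H_T                                # spatial output via adaptive graph
Z1 = rng.normal(size=(N, N))
Z2 = rng.normal(size=(N, N))
Y = Z1 @ H_T + Z2 @ H_S
```

Because A_hat is recomputed from X, the effective graph structure changes with the traffic conditions instead of being fixed in advance, which is the point of the adaptive design.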
Further, step 5 specifically comprises:
The output layer converts the output of the last graph convolution layer into a traffic information sequence for T′ future time steps; the input of the output layer is obtained by transposing the input and reshaping it into X_T ∈ R^(T×N×C), and fully connected layers are then used to generate the prediction.
Here F(x_1, x_2, …, x_t) denotes the prediction result at the i-th time step, and the weights of the fully connected layers are learnable parameters.
Here Y denotes the ground truth, Ŷ the prediction of the model, and δ a threshold parameter that controls the range over which the squared-error loss applies.
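The Huber loss used for training, whose δ threshold is mentioned above, is standard and can be written directly; the toy target/prediction values are illustrative.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss: squared error within delta of the target, linear beyond it."""
    err = np.abs(y_true - y_pred)
    quadratic = 0.5 * err ** 2                   # small residuals: squared error
    linear = delta * err - 0.5 * delta ** 2      # large residuals: linear penalty
    return np.where(err <= delta, quadratic, linear).mean()

y_true = np.array([1.0, 2.0, 10.0])
y_pred = np.array([1.5, 2.0, 2.0])
loss = huber_loss(y_true, y_pred, delta=1.0)
```

The third sample (residual 8) contributes only linearly, so a single anomalous flow reading, e.g. from a faulty sensor, does not dominate the gradient the way it would under pure squared error.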
Further, step 7 inputs the traffic flow data of the preceding N time slices of the road section to be predicted into the trained adaptive graph attention neural network model and predicts the traffic flow of that road section for the next N time slices, specifically:
In this step, the historical data are represented as x = (x_t, x_{t−1}, …, x_{t−T+1}), an input traffic sequence of length T, and x′ = (x_{t+1}, x_{t+2}, …, x_{t+p}) is the predicted flow data for the following time steps; the mapping from x to x′ is defined by the prediction formula, whose weights are learnable parameters.
The invention has the following advantages and beneficial effects:
First, spatio-temporal heterogeneity is fully taken into account, and the spatial dependencies of the nodes are treated as dynamic when extracting the spatial features of the graph: complex dependencies exist between nodes, and the spatial relationships between nodes are not independent but change dynamically over time. Most importantly, existing methods are based on a predefined graph structure matrix, which limits their ability to exploit the spatial dependencies in traffic data. In the experiments, a time graph with richer features is obtained by combining a spatio-temporal embedding method with an attention mechanism, and during extraction the dynamic adjustment module adjusts the structure of the relevant graph in time to extract more complete spatio-temporal features, so a better prediction effect can be achieved.
Drawings
FIG. 1 is a flow diagram of an adaptive neural network prediction module provided by the present invention;
FIG. 2 is a schematic diagram of a dynamic adjustment module;
FIG. 3 is a schematic diagram of a feature extraction module.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
As shown in fig. 1, a traffic flow prediction method based on an adaptive graph attention neural network includes the following steps:
Step 1: set 5 minutes as a time slice, collect and count the historical traffic flow information in each time slice through detection devices installed at traffic intersections, and form a two-dimensional traffic flow matrix.
The historical traffic flow data comprise the license plate numbers of the motor vehicles passing the road section in a time slice, the passing time, the passing speed, the traffic flow information and the weather conditions of the day. Repeated and invalid data are cleaned, and the remaining data are subjected to z-score standardization: x*_i = (x_i − μ_i) / σ_i, where x_i is the original datum, x*_i the new datum, μ_i the mean, and σ_i the standard deviation.
Step 2: regard the traffic flow detection device installed at each traffic intersection as a node; the connections among the nodes within a single time slice, together with the traffic flow data, form a road network topology graph. Connect each node in adjacent time slices with its counterparts in the preceding and following time steps, and, considering that the weight of each node differs across time steps in the road network, adjust the corresponding node weights with an attention mechanism according to the road network flow information, so as to construct a spatio-temporal network sequence and obtain a local spatio-temporal connection graph.
The road network topology graph comprises: the set V of all nodes, with |V| = N the number of nodes; the edge set E between nodes; and the weighted adjacency matrix A of the edges between nodes, giving a road network topology graph G = (V, E, A). According to the topological structure of the local spatio-temporal graph, the correlation between each node and its spatio-temporal neighbors can be captured directly. The spatio-temporal network sequence is constructed using A_S ∈ R^(N×N) as the adjacency matrix of the spatial graph and A_ST ∈ R^(3N×3N) as the adjacency matrix of a local spatio-temporal graph built on three consecutive spatial graphs. For a node i in the spatial graph, its new index in the local spatio-temporal graph is computed as (t − 1)N + i, where t (0 < t ≤ 3) denotes the time step within the local spatio-temporal graph. If two nodes are connected to each other in this local spatio-temporal graph, the corresponding value in the adjacency matrix is set to 1.
Here v_i denotes node i in the local spatio-temporal graph. The adjacency matrix A_ST covers 3N nodes; its block diagonal consists of the adjacency matrices of the spatial network at three consecutive time steps, and the blocks on either side of the diagonal represent the connectivity of each node to itself at the adjacent time step.
The node-weight dynamic adjustment module in this step works as follows:
Referring to fig. 2, each node block represents the current traffic state at time step t, and different colors represent different influence weights. The channels are divided along the time dimension, with one time step per channel; the purpose is to dynamically adjust the spatio-temporal correlations by assigning dynamic weights to the features at different time steps. We use a channel attention mechanism to mine the dynamic spatio-temporal correlations in the data.
Feature compression of X is first performed by a global average pool, which converts each temporal channel into a single number so that each channel has a global receptive field in the spatial dimension: X_p = f_pool(X), with X_p ∈ R^T. To learn the non-linear correlations in the data, X_p is passed through two fully connected layers:
x_att = W_2 δ(W_1 X_p)
where W_1 and W_2 are trainable parameters, r is the channel reduction ratio, and δ is the ReLU activation function. Furthermore, to obtain weight values between 0 and 1, x_att is recalibrated with the sigmoid activation function:
x′_att = σ(x_att)
Then the Hadamard product of x′_att and X yields the dynamically adjusted spatio-temporal feature data:
X_att = X ⊙ x′_att
Then X_att is sent to the gated dilated convolution module and the spatial convolution module to further capture spatio-temporal features.
Step 3: we use an improved gated dilated convolution to aggregate long-term features. It is difficult for a convolutional network to obtain a large field of view because it is limited by the size of the convolution kernel. To enlarge the receptive field, convolution operations generally employ one of three methods: larger convolution kernels, deeper networks, or aggregation operations before convolution. Here we use dilated convolution, which expands the convolution kernel by inserting "dilation" gaps to obtain a larger receptive field.
In this block, we apply convolutions with different dilation rates to the input to achieve short-, medium- and long-term prediction goals. X_att ∈ R^(T×N×C) is the input of the module, where T denotes the time steps of the input sequence. To mine short-, medium- and long-term features, dilated convolutions with dilation rates D = 1, 2, 5 and 11 are applied to X_att. After convolution along the time dimension, the results are concatenated as follows:
X_cat = concat(X_att ∗ f_D=1, X_att ∗ f_D=2, X_att ∗ f_D=5, X_att ∗ f_D=11)
where X_att ∗ f_D=i (i = 1, 2, 5, 11) denotes the dilated convolution and f ∈ R^(1×2) is the convolution kernel. To keep the dimensions consistent with the spatial features, the concatenated vector is dimension-converted through a fully connected layer:
X_D = FC(X_cat)
Finally, a gating mechanism consisting of two parallel activation functions is used to control the transmission of temporal information: the tanh activation function is used to overcome the vanishing-gradient problem, while the sigmoid activation function maps the data between 0 and 1 and acts as the message-passing control. X_D passes through the gating mechanism as:
H_T = g(X_D1) ⊙ σ(X_D2)
and 4, step 4: and (3) extracting the space-time characteristics of the data sequence based on the preprocessed data sequence output in the step (2). And constructing a graph convolution network model for predicting traffic flow based on an attention mechanism, wherein the network processes and outputs data in a time period in a mode of overlapping a plurality of modules, and the attention mechanism is adopted to reduce the loss of characteristics as much as possible.
In this module, in order to make the model lightweight and reduce excessive overhead, we use a spatial convolution module, and the convolution operator uses the following calculation formula:
As can be seen from the evolution of GCN, the process of graph convolution is essentially determined by the adjacency matrix. A self-learning graph structure matrix is generated by adopting a method based on attention. The matrix may learn hidden spatial correlations between nodes from the input data.
Wherein X ∈ R T*N*C Is the input of the model, V ∈ R N*N ,U 1 ∈R C ,U 2 ∈R T*C And U 3 ∈R T Are learnable parameters. Subsequently, all data are normalized by using a Softmax function to obtain an adaptive graph structure matrixThe following graph is presented to curl the layers:
and finally, fusing the time-space units through a fusion mechanism to obtain the output of each dynamic time-space block.
Y=Z 1 H T +Z 2 H S
Wherein Z 1 And Z 2 Is a learnable parameter matrix.
Step 5: splice the obtained outputs and pass them through a fully connected layer to obtain the output of the gating mechanism block, where a residual structure is added when passing through the fully connected layer to prevent overfitting.
In this step, the output layer converts the output of the last graph convolution layer into a traffic information sequence for T′ future time steps; the input of the output layer is obtained by transposing the input and reshaping it into X_T ∈ R^(T×N×C), and fully connected layers are then used to generate the prediction.
Here F(x_1, x_2, …, x_t) denotes the prediction result at the i-th time step, and the weights of the fully connected layers are learnable parameters.
Step 6: test the trained adaptive graph attention neural network model with the test set and evaluate its error; if the error is greater than a set threshold, return to step 2 and retrain the model.
In this step, we select the Huber loss (Huber 1992) as the loss function; it is less sensitive to outliers than the squared-error loss:
L(Y, Ŷ) = ½(Y − Ŷ)² if |Y − Ŷ| ≤ δ, and δ|Y − Ŷ| − ½δ² otherwise,
where Y denotes the ground truth, Ŷ the prediction of the model, and δ a threshold parameter that controls the range over which the squared-error loss applies.
Step 7: input the traffic flow data of the preceding N time slices of the road section to be predicted into the trained adaptive graph attention neural network model, and predict the traffic flow of that road section for the next N time slices.
In this step, the historical data are represented as x = (x_t, x_{t−1}, …, x_{t−T+1}), an input traffic sequence of length T, and x′ = (x_{t+1}, x_{t+2}, …, x_{t+p}) is the flow data we predict for the following time steps; the mapping is defined by the prediction formula, whose weights are learnable parameters.
the systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.
Claims (9)
1. A traffic flow prediction method based on a self-adaptive graph attention neural network is characterized by comprising the following steps:
step 1: setting time slices; collecting and counting, via detection devices installed at traffic intersections, the historical traffic flow information within each time slice, and forming a two-dimensional traffic flow matrix; dividing the obtained historical traffic flow information into a training set and a test set;
step 2: regarding the traffic flow detection device installed at each traffic intersection as a node, where the connections among nodes within a single time slice, together with the traffic flow data, form a road network topology graph; connecting each node in adjacent time slices to its counterpart nodes in the preceding and following time steps, and dynamically adjusting the corresponding node weights with an attention mechanism according to the road network flow information, so as to construct a space-time network sequence and obtain a local space-time graph;
step 3: enlarging the receptive field by expanding the convolution kernel with dilated convolution;
step 4: constructing an attention-based space-time graph convolutional network model for predicting traffic flow, wherein the model stacks a plurality of modules, processes and outputs data over a time period, and combines a multi-head attention mechanism with residual connections;
step 5: after the outputs are spliced, obtaining the output of the gating mechanism block through a fully connected layer, wherein a residual structure is added when the spliced output passes through the fully connected layer to prevent overfitting;
step 6: testing the trained adaptive graph attention neural network model with the test set and evaluating the model error; if the error is larger than a set threshold, returning to step 2 and retraining the model;
step 7: inputting the traffic flow data of the preceding N set time slices of the road section to be predicted into the trained adaptive graph attention neural network model, and predicting the traffic flow of that road section for the next N time slices.
2. The traffic flow prediction method based on the adaptive graph attention neural network according to claim 1, wherein step 1 specifically comprises: setting 5 minutes as one time slice, collecting and counting, via detection devices installed at traffic intersections, the historical traffic flow information within each time slice, and forming a two-dimensional traffic flow matrix;
the historical traffic flow data comprise the license plate numbers of motor vehicles passing the road section within a time slice, the passing time, the passing speed, the traffic flow information, and the weather conditions of the day; repeated data and invalid data are cleaned, and the remaining data are subjected to z-score standardization: x̃_i = (x_i − μ_i) / σ_i, where x_i is the original data, x̃_i is the new data, μ_i is the mean, σ_i is the standard deviation, and n is the number of stations in the road section.
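The z-score step above can be sketched in a few lines of numpy; the per-detector (column-wise) statistics, the toy data, and the zero-variance guard are illustrative assumptions, not part of the claim:

```python
import numpy as np

def z_score_normalize(flow):
    """Per-detector z-score normalization of a (time slices, detectors) matrix.

    A sketch of the standardization in claim 2; column-wise statistics and
    the guard for constant columns are assumptions of this example.
    """
    flow = np.asarray(flow, dtype=float)
    mu = flow.mean(axis=0)           # mean mu_i per detector
    sigma = flow.std(axis=0)         # standard deviation sigma_i per detector
    sigma[sigma == 0] = 1.0          # avoid division by zero for flat columns
    return (flow - mu) / sigma

# Toy example: 4 time slices x 2 detectors
X = np.array([[10., 100.], [12., 110.], [14., 120.], [16., 130.]])
Xn = z_score_normalize(X)
```

After normalization each detector's column has zero mean and unit variance, which is what allows flows of very different magnitudes to share one model.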
3. The traffic flow prediction method based on the adaptive graph attention neural network according to claim 1, wherein the road network topology graph in step 2 comprises: the set V of all nodes, where |V| = N denotes the number of nodes; the edge set E between nodes; and the weighted adjacency matrix A of the edges between nodes, giving the road network topology graph G = (V, E, A); the correlation between each node and its spatio-temporal neighbors is captured directly from the topological structure of the local space-time graph; the construction of the space-time network sequence uses A ∈ R^{N×N} to denote the adjacency matrix of the spatial graph and A′ ∈ R^{3N×3N} to denote the adjacency matrix of the local space-time graph constructed over three consecutive spatial graphs; for a node i in the spatial graph, its new index in the local space-time graph is computed as (t − 1)N + i, where t (0 < t ≤ 3) denotes the time step within the local space-time graph; if two nodes are connected in the local space-time graph, the corresponding entry of the adjacency matrix is set to 1; the adjacency matrix of the local space-time graph can thus be written in block form as:

A′ = [[A, I_N, 0], [I_N, A, I_N], [0, I_N, A]]

where the adjacency matrix A′ covers 3N nodes, its diagonal blocks are the adjacency matrices of the spatial network at three consecutive time steps, and the blocks beside the diagonal represent the connection of each node to itself at the adjacent time step.
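The block structure of the local space-time adjacency matrix described above can be assembled as follows; the block-tridiagonal layout with identity off-diagonal blocks is inferred from the claim's description and is an assumption of this sketch, not the patent's exact code:

```python
import numpy as np

def local_st_adjacency(A):
    """Assemble the 3N x 3N adjacency matrix of a local space-time graph.

    Diagonal blocks: the spatial adjacency A at three consecutive time steps.
    Off-diagonal blocks: identities connecting each node to itself at the
    adjacent time step, as described in claim 3.
    """
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    I = np.eye(N)
    Z = np.zeros((N, N))
    return np.block([
        [A, I, Z],
        [I, A, I],
        [Z, I, A],
    ])

# Node i at time step t (1 <= t <= 3) receives the new index (t - 1) * N + i.
A = np.array([[0., 1.], [1., 0.]])   # toy 2-node spatial graph
A_st = local_st_adjacency(A)
```

For example, node 0 at t = 1 (index 0) is linked to itself at t = 2 (index (2 − 1)·2 + 0 = 2), but not to itself at t = 3, since only adjacent time steps are connected.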
4. The traffic flow prediction method based on the adaptive graph attention neural network according to claim 3, wherein the step 2 dynamically adjusts the corresponding weights of the nodes by adopting an attention mechanism, and comprises the following specific steps:
each node block represents the current flow state at the time step t, and different colors represent different influence weights; the channels are divided in the time dimension, wherein one time step is a channel, and the aim is to dynamically adjust the space-time correlation by distributing dynamic weights to the characteristics at different time steps. Mining dynamic spatiotemporal correlations between data using a channel attention mechanism;
the feature compression of X is first performed by a global averaging pool, which converts each temporal channel into a number such that each channel has a global acceptance field in the spatial dimension;
X p 、f pool (X) indicates that each channel has a global receptive field in the spatial dimension, where T represents the historical time step and N represents the total number of sensors. Since the present study only considers the speed characteristics, C =1. Wherein X p ∈R T To learn the non-linear correlation between data, this equation is passed through two fully connected layers;
x_att = W_2 δ(W_1 X_p)
x_att denotes the attention coefficient, where W_1 ∈ R^{(T/r)×T} and W_2 ∈ R^{T×(T/r)} are trainable parameters, r denotes the scaling ratio of the channel, and δ denotes the ReLU activation function; furthermore, to obtain a weight value between 0 and 1, x_att is recalibrated using the sigmoid activation function as follows:
x′_att = σ(x_att)
then, the Hadamard product of x′_att and X is used to obtain the dynamically adjusted spatio-temporal feature data as follows:
X_att = X ⊙ x′_att
then, X_att is sent to the gated dilated convolution module and the spatial convolution module to further capture spatio-temporal features.
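A minimal numpy sketch of the channel attention of claim 4, for C = 1 so the input is a (T, N) matrix; the squeeze-and-excitation shapes of W1 and W2 and the random toy data are assumptions of this example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_channel_attention(X, W1, W2):
    """SE-style attention over the T time channels, for C = 1.

    X: (T, N); W1 of shape (T/r, T) and W2 of shape (T, T/r) stand in for
    the trainable parameters of the two fully connected layers.
    """
    Xp = X.mean(axis=1)                    # global average pool -> X_p in R^T
    x_att = W2 @ np.maximum(W1 @ Xp, 0.0)  # two FC layers with ReLU between
    x_att = sigmoid(x_att)                 # recalibrate weights into (0, 1)
    return X * x_att[:, None]              # Hadamard product: X_att = X . x'_att

rng = np.random.default_rng(0)
T, N, r = 4, 3, 2
X = rng.normal(size=(T, N))
W1 = rng.normal(size=(T // r, T))
W2 = rng.normal(size=(T, T // r))
X_att = temporal_channel_attention(X, W1, W2)
```

Because the sigmoid output lies in (0, 1), each time step's features are attenuated rather than amplified, which is the dynamic re-weighting the claim describes.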
6. The traffic flow prediction method based on the adaptive graph attention neural network according to claim 4, wherein step 3, enlarging the receptive field by expanding the convolution kernel with dilated convolution, specifically comprises:
in this module, convolutions with different dilation rates are applied to the input to achieve short-, medium-, and long-term prediction goals; X_att ∈ R^{T×N×C} is the input of the module, where T denotes the time steps of the input sequence; to mine short-, medium-, and long-term features, dilated convolution with dilation rates D = 1, 2, 5, 11 is applied to X_att; then, after convolution along the time dimension, the results are concatenated as follows:
X_cat = concat(X_att ∗ f_{D=1}, X_att ∗ f_{D=2}, X_att ∗ f_{D=5}, X_att ∗ f_{D=11})
where X_att ∗ f_{D=i} (i = 1, 2, 5, 11) denotes dilated convolution with kernel f ∈ R^{1×2}; to remain consistent with the dimensions of the spatial features, the concatenated vector is dimension-converted through a fully connected layer:
X_D = FC(X_cat)
FC denotes the fully connected layer and X_D the resulting vector; finally, a gating mechanism is used to control the transmission of temporal information; it consists of two parallel activation functions: the tanh activation function is used to overcome the vanishing-gradient problem, and the sigmoid activation function maps the data between 0 and 1 as message-passing control; X_D passes through the gating mechanism in the following form:
H_T = g(X_{D1}) ⊙ σ(X_{D2})
where g(X_{D1}) and σ(X_{D2}) respectively denote the two activation functions, and H_T is the output.
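The dilated-convolution and gating steps of claim 5 can be sketched for a single node's flow series; the causal zero padding, the averaging kernel f = [0.5, 0.5], and the random FC stand-in are illustrative assumptions of this example:

```python
import numpy as np

def dilated_conv_1d(x, f, d):
    """Causal dilated convolution along time with a length-2 kernel f:
    y[t] = f[0] * x[t - d] + f[1] * x[t], with zero padding at the front."""
    x_pad = np.concatenate([np.zeros(d), x])
    return f[0] * x_pad[:-d] + f[1] * x_pad[d:]

def gated_dilated_block(x, kernels):
    """Concatenate dilation rates D = 1, 2, 5, 11, mix with an FC stand-in,
    then apply the tanh/sigmoid gate H_T = g(X_D1) . sigma(X_D2)."""
    feats = np.stack([dilated_conv_1d(x, f, d) for d, f in kernels.items()])
    rng = np.random.default_rng(1)
    W = rng.normal(size=(2, len(kernels)))   # FC stand-in: 4 -> 2 channels
    X_D = W @ feats                          # rows play the roles of X_D1, X_D2
    return np.tanh(X_D[0]) * (1.0 / (1.0 + np.exp(-X_D[1])))

x = np.arange(16, dtype=float)                           # one node's series
kernels = {d: np.array([0.5, 0.5]) for d in (1, 2, 5, 11)}
H_T = gated_dilated_block(x, kernels)
```

Each dilation rate widens the temporal receptive field without adding parameters, and the tanh/sigmoid product keeps the gated output bounded.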
6. The traffic flow prediction method based on the adaptive graph attention neural network according to claim 5, wherein the step 4 specifically comprises:
the signal x on the graph G is filtered with a kernel G θ using a spatial convolution module as follows:
wherein,θ、I N are learnable parameters; A.respectively representing the adjacency matrix and the augmentation matrix thereof,the matrix is normalized for it.
an attention-based method is adopted to generate a self-learned graph structure matrix; this matrix can learn the hidden spatial correlations between nodes from the input data;
where X ∈ R^{T×N×C} is the input of the model and V_s ∈ R^{N×N}, U_1 ∈ R^C, U_2 ∈ R^{T×C}, and U_3 ∈ R^T are learnable parameters; subsequently, all entries are normalized with a Softmax function to obtain the adaptive graph structure matrix, which the graph convolution layer then uses to produce the spatial output;
finally, the temporal and spatial units are fused by a fusion mechanism, where H_T and H_S respectively denote the outputs of the gating mechanism and the graph convolution layer, yielding the output of each dynamic space-time block:
Y = Z_1 H_T + Z_2 H_S
where Z_1 and Z_2 are learnable parameter matrices.
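A loose numpy sketch of the adaptive graph of claim 6 and the fusion Y = Z_1 H_T + Z_2 H_S; the simplified bilinear score (using only V_s and one time projection) is an assumption of this example and omits the claim's U_2 and U_3 terms:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_adjacency(X, Vs, U1):
    """Self-learned graph structure from the input, loosely after claim 6.

    X: (T, N) input with C = 1; Vs: (N, N) learnable mask; U1: (T,) learnable
    time projection. The score below is a simplified stand-in for the claim's
    V_s / U_1 / U_2 / U_3 bilinear form.
    """
    left = X.T @ U1                  # (N,): per-node aggregate over time
    S = Vs * np.outer(left, left)    # (N, N) raw correlation scores
    return softmax(S, axis=1)        # row-wise Softmax -> adaptive graph

def fuse(H_T, H_S, Z1, Z2):
    """Fusion of temporal and spatial outputs: Y = Z1 * H_T + Z2 * H_S."""
    return Z1 * H_T + Z2 * H_S

rng = np.random.default_rng(2)
T, N = 4, 3
X = rng.normal(size=(T, N))
A_adp = adaptive_adjacency(X, rng.normal(size=(N, N)), rng.normal(size=T))
Y = fuse(rng.normal(size=N), rng.normal(size=N),
         rng.normal(size=N), rng.normal(size=N))
```

The Softmax normalization makes each row of the adaptive matrix a distribution over neighbors, so it can be plugged in wherever a normalized adjacency is expected.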
7. The traffic flow prediction method based on the adaptive graph attention neural network according to claim 6, wherein the step 5 specifically comprises:
the output layer converts the output of the last graph convolution layer into a traffic information sequence for T′ future time steps; the input of the output layer is obtained by transposing the input and reshaping it into X^T ∈ R^{T×N×C}, and fully connected layers are then used to generate the prediction.
8. The traffic flow prediction method based on the adaptive graph attention neural network according to claim 7, wherein step 6 selects the Huber loss as the loss function:

L_δ(y, ŷ) = ½(y − ŷ)² if |y − ŷ| ≤ δ, and δ|y − ŷ| − ½δ² otherwise,

where y is the observed value, ŷ the predicted value, and δ the threshold at which the loss changes from quadratic to linear.
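The Huber loss named in claim 8 has the textbook form sketched below; the default threshold δ = 1.0 is an assumed value for illustration, not one stated in the claim:

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic within delta of the target, linear outside.

    This is the standard definition; delta = 1.0 is an assumed default.
    """
    err = np.abs(y_true - y_pred)
    quad = 0.5 * err ** 2                      # small-error branch
    lin = delta * err - 0.5 * delta ** 2       # large-error branch
    return np.where(err <= delta, quad, lin).mean()

loss_small = huber_loss(np.array([1.0]), np.array([1.5]))   # |err| <= delta
loss_large = huber_loss(np.array([1.0]), np.array([4.0]))   # |err| > delta
```

The linear branch caps the gradient for outliers, which is why Huber loss is a common choice over MSE for noisy traffic counts.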
9. The traffic flow prediction method based on the adaptive graph attention neural network according to claim 8, wherein step 7, inputting the traffic flow data of the preceding N set time slices of the road section to be predicted into the trained adaptive graph attention neural network model and predicting the traffic flow of that road section for the next N time slices, specifically comprises:
in this step, the historical data are represented as x = (x_t, x_{t−1}, ..., x_{t−T+1}), an input traffic sequence of length T, and x′ = (x_{t+1}, x_{t+2}, …, x_{t+p}) is the predicted flow data for the following time steps; the defining formula, where θ is a learnable parameter, is: (x_{t+1}, x_{t+2}, …, x_{t+p}) = F_θ(x_t, x_{t−1}, ..., x_{t−T+1}).
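The input/output mapping of claim 9 implies a sliding-window data preparation, which might look like the sketch below; the window builder and toy series are illustrative assumptions, not the model F_θ itself:

```python
import numpy as np

def make_windows(series, T, p):
    """Build (input, target) pairs from a 1-D flow series.

    Each input holds T consecutive time slices and each target the next p
    slices, mirroring x = (x_t, ..., x_{t-T+1}) -> (x_{t+1}, ..., x_{t+p}).
    """
    series = np.asarray(series, dtype=float)
    inputs, targets = [], []
    for start in range(len(series) - T - p + 1):
        inputs.append(series[start:start + T])        # history window
        targets.append(series[start + T:start + T + p])  # future window
    return np.array(inputs), np.array(targets)

flow = np.arange(10, dtype=float)     # toy flow counts per time slice
X_in, Y_out = make_windows(flow, T=4, p=2)
```

A trained model F_θ would then map each row of X_in to the corresponding row of Y_out.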
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211386613.4A CN115762147B (en) | 2022-11-07 | 2022-11-07 | Traffic flow prediction method based on adaptive graph attention neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115762147A true CN115762147A (en) | 2023-03-07 |
CN115762147B CN115762147B (en) | 2023-11-21 |
Family
ID=85357215
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211386613.4A Active CN115762147B (en) | 2022-11-07 | 2022-11-07 | Traffic flow prediction method based on adaptive graph attention neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115762147B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116153089A (en) * | 2023-04-24 | 2023-05-23 | 云南大学 | Traffic flow prediction system and method based on space-time convolution and dynamic diagram |
CN117275215A (en) * | 2023-05-04 | 2023-12-22 | Changjiang Spatial Information Technology Engineering Co., Ltd. (Wuhan) | Urban road congestion space-time prediction method based on graph process neural network |
CN118014135A (en) * | 2024-02-02 | 2024-05-10 | 哈尔滨工业大学 | Urban peak-to-time demand prediction method and system based on dynamic space-time hypergraph representation learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111223301A (en) * | 2020-03-11 | 2020-06-02 | 北京理工大学 | Traffic flow prediction method based on graph attention convolution network |
CN111540199A (en) * | 2020-04-21 | 2020-08-14 | 浙江省交通规划设计研究院有限公司 | High-speed traffic flow prediction method based on multi-mode fusion and graph attention machine mechanism |
CN112508173A (en) * | 2020-12-02 | 2021-03-16 | 中南大学 | Traffic space-time sequence multi-step prediction method, system and storage medium |
CN112785848A (en) * | 2021-01-04 | 2021-05-11 | 清华大学 | Traffic data prediction method and system |
US20210209938A1 (en) * | 2020-09-25 | 2021-07-08 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method, apparatus, system, and computer-readable medium for traffic pattern prediction |
CN114818515A (en) * | 2022-06-24 | 2022-07-29 | 中国海洋大学 | Multidimensional time sequence prediction method based on self-attention mechanism and graph convolution network |
Non-Patent Citations (1)
Title |
---|
JIANG Shan et al.: "A Graph Neural Network Model for Road-Network Traffic Flow Situation Prediction", Journal of Frontiers of Computer Science and Technology, vol. 15, no. 6, pages 1084-1091 *
Also Published As
Publication number | Publication date |
---|---|
CN115762147B (en) | 2023-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Trafficgan: Network-scale deep traffic prediction with generative adversarial nets | |
Mallick et al. | Transfer learning with graph neural networks for short-term highway traffic forecasting | |
Chen et al. | A graph convolutional stacked bidirectional unidirectional-LSTM neural network for metro ridership prediction | |
CN115762147B (en) | Traffic flow prediction method based on adaptive graph attention neural network | |
Xiao et al. | Predicting urban region heat via learning arrive-stay-leave behaviors of private cars | |
CN114519932B (en) | Regional traffic condition integrated prediction method based on space-time relation extraction | |
CN110728317A (en) | Training method and system of decision tree model, storage medium and prediction method | |
Faro et al. | Evaluation of the traffic parameters in a metropolitan area by fusing visual perceptions and CNN processing of webcam images | |
CN111242395B (en) | Method and device for constructing prediction model for OD (origin-destination) data | |
CN111242292A (en) | OD data prediction method and system based on deep space-time network | |
Ren et al. | Spatio-temporal spectrum load prediction using convolutional neural network and ResNet | |
CN116011684A (en) | Traffic flow prediction method based on space-time diagram convolutional network | |
CN111047078A (en) | Traffic characteristic prediction method, system and storage medium | |
CN117116048A (en) | Knowledge-driven traffic prediction method based on knowledge representation model and graph neural network | |
Rahman et al. | A deep learning approach for network-wide dynamic traffic prediction during hurricane evacuation | |
CN116089875A (en) | Traffic flow prediction method, device and storage medium integrating multisource space-time data | |
Feng et al. | A hybrid model integrating local and global spatial correlation for traffic prediction | |
CN117636626A (en) | Heterogeneous map traffic prediction method and system for strengthening road peripheral space characteristics | |
CN112529294B (en) | Training method, medium and equipment for individual random trip destination prediction model | |
CN117593877A (en) | Short-time traffic flow prediction method based on integrated graph convolution neural network | |
CN117610734A (en) | Deep learning-based user behavior prediction method, system and electronic equipment | |
CN117392846A (en) | Traffic flow prediction method for space-time self-adaptive graph learning fusion dynamic graph convolution | |
Radi et al. | Enhanced Implementation of Intelligent Transportation Systems (ITS) based on Machine Learning Approaches | |
Xue et al. | Urban population density estimation based on spatio‐temporal trajectories | |
CN115565370A (en) | Local space-time graph convolution traffic flow prediction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||