CN115423189A - Rail transit passenger flow prediction model and method of adaptive graph convolution recurrent neural network combined with attention mechanism - Google Patents
- Publication number: CN115423189A (application number CN202211069744.XA)
- Authority
- CN
- China
- Prior art keywords
- passenger flow
- time
- adaptive
- graph
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Learning methods
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a rail transit passenger flow prediction model and method based on an adaptive graph convolution recurrent neural network combined with an attention mechanism. An AEAGRU module serves as the main network module of an encoder-decoder framework and jointly models spatio-temporal correlations, and an attention-enhanced adaptive graph convolution module (AEAGCN) is proposed to model spatial dynamics and node specificity. This addresses the weak generalization capability of existing rail prediction methods and their inability to achieve accurate rail passenger flow prediction.
Description
Technical Field
The invention relates to the technical field of traffic flow prediction, and in particular to a rail transit passenger flow prediction method based on an adaptive graph convolution recurrent neural network combined with an attention mechanism.
Background
With economic, technological and social development, urbanization has accelerated. Urban traffic systems have become ever more convenient, but they also bring huge challenges such as traffic congestion, energy consumption and environmental pollution. Traffic supervision based on intelligent transportation systems can improve traffic efficiency to a certain extent and relieve congestion.
Accurate passenger flow prediction is one of the core research topics of intelligent transportation systems. For urban rail transit, operation and schedule planning become increasingly complex as the rail network is built out. Accurate traffic flow prediction helps optimize train scheduling, formulate reasonable operation schemes, inform passengers' trip planning, provide early warning for large-scale crowd-intensive activities, and guide the future planning and layout of subway lines and stations. This reduces the transportation pressure on rail transit, improves passenger convenience, and greatly assists the management of the entire subway network and even of urban traffic as a whole.
Traditional methods treat traffic flow prediction as a multivariate time series prediction task and therefore apply classical time series models such as ARIMA. However, such methods ignore the spatial correlation among different time series, and most of them cannot model nonlinear sequences. Although some grid-based methods achieve good results, the natural topological connectivity of the road network means that traffic flow data are non-Euclidean, which further hinders the effective application of traditional methods to the traffic flow prediction task.
Inspired by recent spatio-temporal sequence prediction studies in which many researchers model non-Euclidean data with graph neural networks, some traffic flow prediction work has also begun to model complex spatial correlations with graph-neural-network-based approaches, such as diffusion graph convolution models and spatio-temporal graph convolution models. In terms of spatial correlation modeling, most existing graph neural network methods for traffic flow prediction rely on a predefined static road network, which cannot capture spatial dynamics, and most treat all stations as identical or only manually label attributes such as transfer stations and origin/destination stations. In short, existing rail prediction methods generalize poorly, cannot achieve accurate rail passenger flow prediction, and still leave large room for improvement. The rail prediction task faces three challenges:
(1) Node specificity
Most existing methods treat subway stations as identical nodes and rarely consider the specificity of different nodes. Approaches that model node specificity by manually labeling attributes (e.g., origin/terminal or transfer stations), embedding OD transfer matrices, embedding POI similarity, or embedding time-series-similarity (DTW) temporal graphs remain conventional GCN methods: they model the correlations among nodes rather than explicitly modeling node-specific attributes. Although such methods (applying the same channel-dimension transformation to all spatial nodes after neighborhood aggregation) greatly reduce the number of parameters and increase speed, in real scenarios even directly adjacent roads, or stations with fully consistent functional attributes, do not necessarily share the same traffic flow patterns. Modeling approaches that rely entirely on structural or functional similarity assumptions have a natural deficiency: they lack consideration of node specificity. In summary, it is necessary to explicitly model each node's specificity, but doing so while avoiding the parameter explosion caused by modeling every node individually is a valuable challenge.
(2) Spatial dynamic correlation
Much prior work models spatial correlation based on predefined static adjacency matrices, e.g., a static graph built from topological connections: if node i is adjacent to node j, the element A_ij of adjacency matrix A is set to 1, otherwise 0, and the spatial neighborhood is then aggregated via the graph Laplacian transformation. This static spatial correlation modeling neglects that, in real scenarios, stations are affected not only by adjacent upstream and downstream passenger flows but also by factors such as time, weather, and large-scale events. Modeling spatially dynamic correlations is therefore a very challenging task.
(3) Time step difference.
It is well known that time series prediction accumulates error as the horizon grows: the longer the predicted time step, the larger the error. Meanwhile, RNN-based models share all weight parameters across time steps and do not account for the differences that emerge among time steps as the prediction horizon increases.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a rail transit passenger flow prediction model of an adaptive graph convolution recurrent neural network combined with an attention mechanism.
One of the purposes of the invention is realized by the following technical scheme:
a rail transit passenger flow prediction model of an adaptive graph convolution recurrent neural network combined with an attention mechanism is built by the following steps:
step S1: learn the long-term and short-term dynamic spatial correlations in rail transit passenger flow with an adaptive graph convolution network and an attention mechanism respectively, combine the two through the graph convolution operation for long- and short-term spatial correlation modeling, and build the overall AEAGCRN model with an encoder-decoder framework;
step S2: use the AEAGRU module as the main network module of the encoder-decoder architecture, i.e., take the GRU as the backbone framework and replace the MLP operations in the GRU with the attention-enhanced adaptive graph convolution module (AEAGCN) for joint modeling of spatio-temporal correlations;
and step S3: model spatial dynamics and node specificity with the attention-enhanced adaptive graph convolution module (AEAGCN).
Further, in step S1, during training the historical time series is input into the encoder, the encoder state h_1 is initialized as an all-zero matrix, and the decoder state is initialized with the encoder's final state; a separation layer with time-step specificity and a projection layer whose parameters are shared within the decoder's time steps together constitute the prediction layer of the decoder.
Further, in step S2, the attention-enhanced graph convolution (AEAGCN) module obtained in step S1 is embedded into the GRU to replace the MLP operation, and finally an attention-enhanced adaptive graph convolution loop network is obtained;
z_t = σ(AEAGCN([X_{:,t}, h_{t-1}]))
r_t = σ(AEAGCN([X_{:,t}, h_{t-1}])).
further, the specific operation of step S2 is:
firstly, the current-time input and the current state are concatenated and fed as input into the AEAGCN module; a nonlinear transformation is applied through the sigmoid activation function, where σ denotes the sigmoid activation function, yielding the update-gate and reset-gate activations respectively; finally, the gating operations are performed to obtain the hidden state and the subsequent layer output:

h̃_t = tanh(AEAGCN([X_{:,t}, r_t ⊙ h_{t-1}]))
h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t

where tanh is the nonlinear activation function mapping to the (-1, 1) interval; X_{:,t} and h_t are the input and output at time step t respectively; [·] denotes the matrix concatenation operation and ⊙ the Hadamard product; z_t and r_t are the update gate and reset gate respectively, and h̃_t is the hidden state computed from the reset gate.
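The gated update described above can be sketched in code. This is a minimal illustration under stated assumptions, not the patent's implementation: `simple_gcn` is a hypothetical one-hop graph convolution standing in for the full AEAGCN module, and all weight names and shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simple_gcn(x, A, W):
    """Stand-in for the AEAGCN module: one graph-convolution step A @ x @ W."""
    return A @ x @ W

def aeagru_cell(x_t, h_prev, A, Wz, Wr, Wh):
    """One AEAGRU step: a GRU cell whose MLPs are replaced by graph convolutions.
    x_t: (N, C) node inputs, h_prev: (N, H) hidden state, A: (N, N) adjacency."""
    xh = np.concatenate([x_t, h_prev], axis=-1)        # [X_t, h_{t-1}]
    z = sigmoid(simple_gcn(xh, A, Wz))                 # update gate z_t
    r = sigmoid(simple_gcn(xh, A, Wr))                 # reset gate r_t
    xh_r = np.concatenate([x_t, r * h_prev], axis=-1)  # reset-gated concatenation
    h_cand = np.tanh(simple_gcn(xh_r, A, Wh))          # candidate hidden state
    return z * h_prev + (1.0 - z) * h_cand             # gated combination h_t
```

Stacking this cell over time steps, with the encoder's final state initializing the decoder, gives the recurrent backbone that the encoder-decoder framework uses.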
Further, in step S3, first, two adaptive globally shared node parameter matrices E_1, E_2 (E_1, E_2 ∈ R^{N×D}) are used to model the spatial correlation of the global time domain, where D is the feature dimension of the parameter matrices; E_1 and E_2 represent the inbound and outbound passenger flow attributes of all stations respectively, and by training these parameter matrices a long-time-scale global adaptive spatial correlation matrix A_static can be constructed to learn the hidden associations between nodes; the specific formula is:

A_static = softmax(ReLU(E_1 · E_2^T))

where softmax denotes nonlinear normalization and ReLU denotes a nonlinear activation function; E_1, E_2 are two globally shared correlation parameter matrices obtained by training on the full training set, so the matrix produced by this matrix operation and nonlinear transformation is itself adaptive;
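The adaptive static correlation matrix built from the two node embedding matrices can be sketched as follows, assuming the softmax(ReLU(E_1·E_2ᵀ)) construction common to adaptive-adjacency graph networks; the embedding dimensions in the usage are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax_rows(x):
    """Row-wise softmax, numerically stabilized."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adaptive_adjacency(E1, E2):
    """A_static = softmax(ReLU(E1 @ E2.T)): a learned, globally shared
    spatial correlation matrix built from two (N, D) node embeddings."""
    return softmax_rows(relu(E1 @ E2.T))
```

Because E_1 and E_2 are trained end to end, the resulting matrix is data-driven rather than fixed by the road topology; each row is a normalized distribution over all stations.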
secondly, a dynamic graph structure based on an attention mechanism is used to capture, from recent historical data, the dynamic changes of spatial correlation over the near term; the specific formula is:

A_dyn^l = softmax((H^l W^l) × (H^l W^l)^T)

where A_dyn^l denotes the short-term spatial correlation matrix; l denotes a parameter in the l-th layer graph convolution operation; W^l is a linear transformation weight parameter matrix; the × operator denotes matrix multiplication. By combining the two, the model can effectively capture the characteristics of dynamic spatial correlation at different time scales, modeling long-term static and real-time dynamic spatial correlation; the formula is:

Ã^l = A_static + A_dyn^l

where l denotes the parameters of the l-th layer graph convolution operation, distinguished from the globally shared parameters, and Ã^l denotes the spatial correlation adaptive matrix combining global parameter sharing and local spatial features;
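The attention-based short-term correlation can be sketched as dot-product self-attention over the current node features. The exact projection layout is not fully recoverable from the text, so the separate query/key weights `Wq`, `Wk` and the 1/√d scaling are assumptions.

```python
import numpy as np

def softmax_rows(x):
    """Row-wise softmax, numerically stabilized."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_adjacency(H, Wq, Wk):
    """Self-attention over node features H (N, C): a data-dependent
    short-term correlation matrix A_dyn = softmax(Q @ K.T / sqrt(d))."""
    Q, K = H @ Wq, H @ Wk          # (N, d) queries and keys
    d = Q.shape[-1]
    return softmax_rows(Q @ K.T / np.sqrt(d))
```

Unlike the globally trained static matrix, this adjacency is recomputed from the current hidden features at every step, so it tracks near-term shifts in inter-station correlation.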
finally, with reference to the ordinary graph convolution model, a node channel-domain feature matrix E_c ∈ R^{N×d} is introduced to characterize the features of different nodes, defining the dynamic graph convolution model; the specific formula is:

H^{l+1} = (Ã^l H^l)(E_c W_c) + E_c b_c

where W_c ∈ R^{d×C_in×C_out} denotes a trainable weight matrix and b_c ∈ R^{d×C_out} denotes a bias term; H^l, the input of the l-th graph convolution network layer AEAGCN, is also the output of the previous layer's graph convolution operation.
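The node-specific channel transform can be sketched as follows: instead of learning one (C_in × C_out) weight matrix per node, per-node weights are generated from the low-rank node embedding E_c and a shared weight pool, which is the parameter-saving idea described above. The bias shape (d × C_out) is an assumption made so the dimensions compose.

```python
import numpy as np

def node_specific_gconv(H, A, Ec, Wc, bc):
    """One AEAGCN-style layer: neighborhood aggregation A @ H followed by a
    node-specific channel transform. Per-node weights are generated from the
    low-rank node embedding Ec (N, d) and the weight pool Wc (d, C_in, C_out),
    avoiding a full (N, C_in, C_out) parameter tensor learned per node."""
    W = np.einsum('nd,dio->nio', Ec, Wc)   # per-node weights, (N, C_in, C_out)
    b = Ec @ bc                            # per-node bias, bc: (d, C_out)
    agg = A @ H                            # spatial aggregation, (N, C_in)
    return np.einsum('ni,nio->no', agg, W) + b
```

The parameter count is d·C_in·C_out + N·d rather than N·C_in·C_out, which is the parameter-explosion relief claimed for the low-rank factorization.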
The invention also aims to provide a rail transit passenger flow prediction method of the adaptive graph convolution cyclic neural network combined with the attention mechanism, which utilizes the prediction model and comprises the following steps:
firstly, a rail traffic network graph G = (V, E, A) is defined, where V denotes the set of rail stations, V = {v_1, ..., v_N} with |V| = N, N being the number of rail stations and also the number of graph nodes; E denotes the edges connecting pairs of rail stations in the graph; and A denotes the adjacency matrix of the time-series passenger flow data nodes (measured by node distance or time series similarity). The rail time-series passenger flow graph signal tensor can be expressed as X = {X_{:,1}, X_{:,2}, ..., X_{:,t}}, X ∈ R^{N×T×C}, where X_{:,t} = {X_{1,t}, ..., X_{i,t}, ..., X_{N,t}}, X_{:,t} ∈ R^{N×C}, X_{N,t} ∈ R^{1×C}; X_{i,t} denotes the passenger flow data of the i-th node at time t, and C denotes the channel dimension, C = 2, representing inbound and outbound passenger flow. Given the passenger flow tensor data X on the traffic network G, the multi-time-step passenger flow prediction problem can be defined as solving a mapping function f_θ; f_θ maps the passenger flow data of the past p time slices to the passenger flow data of the future Δ time slices, expressed by the following formula:

(X_{:,t+1}, X_{:,t+2}, ..., X_{:,t+Δ}) = f_θ(X_{:,t-p+1}, X_{:,t-p+2}, ..., X_{:,t}; G);

data preparation is then carried out: the card-swipe data are cleaned and anomalies repaired, the time slice size is set to S minutes, the inbound and outbound passenger flow volumes within every S-minute interval are counted and taken as one group of data, and the graph signal tensor X ∈ R^{B×N×C} is obtained, where B is the sequence length ordered in time, N is the number of rail stations, and C is the node attribute; finally, according to the sequence window sizes required for input and output, a sliding-window operation is performed on the rail passenger flow time series to produce the training and test datasets X_t ∈ R^{N×T×C};
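The sliding-window operation on the time-sliced passenger counts can be sketched as follows; the window sizes p (history) and Δ (horizon) are illustrative.

```python
import numpy as np

def make_windows(series, p, delta):
    """Slide a window over time-sliced passenger counts series (B, N, C) to
    build (input, target) pairs of p past and delta future time slices."""
    xs, ys = [], []
    for t in range(p, series.shape[0] - delta + 1):
        xs.append(series[t - p:t])       # past p slices
        ys.append(series[t:t + delta])   # future delta slices
    return np.stack(xs), np.stack(ys)
```

Each returned sample pair is one (history, future) instance of the mapping f_θ defined above; splitting the stacked samples chronologically yields the training and test sets.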
And completing the construction of a prediction model, performing simulation training, and predicting the passenger flow of the rail transit by using the model.
Further, the time slice size is set to 10 minutes, i.e., S equals 10.
Further, in the simulation training, the models are trained with the teacher-forcing method, all model parameters are optimized with the Adam optimizer, and the learning rate is adjusted by cosine annealing; for the model architecture, stacks of AEAGRU modules with the same number of layers are used as the main networks of the encoder and decoder respectively.
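Cosine annealing, as used for the learning-rate schedule above, can be sketched as follows; `lr_max`, `lr_min`, and the step counts are illustrative hyperparameters.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing: decay the learning rate from lr_max to lr_min
    along half a cosine period over total_steps."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)
```

The schedule starts at lr_max, stays relatively flat early, decays fastest mid-training, and flattens out again near lr_min at the end of training.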
The invention has the beneficial effects that:
(1) The invention builds the overall AEAGCRN model with an encoder-decoder framework and treats the whole problem as a sequence prediction problem, where a separation layer added before the decoder's final shared layer assigns differentiated weights to different time steps, solving the time-step difference problem;
(2) The method uses multiple node-adaptive parameters to explicitly model the attributes of different nodes, and uses a low-rank node-adaptive matrix to perform tensor decomposition of the nodes' channel-dimension transformation, alleviating the parameter explosion problem.
(3) The adaptive parameter graph captures the spatial dynamic associations and is decomposed into two low-rank node-adaptive parameter matrices, reducing the number of parameters and aiding model convergence; all node-adaptive parameters are shared globally, i.e., across all network modules in the encoder and decoder, which in effect adds direct connection channels to the network (the idea of the Highway Network) and offers a new solution to the problem that graph convolutions cannot be made deep.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as follows.
Drawings
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings, in which:
FIG. 1 is a diagram of the overall model Seq2Seq framework of the present invention;
FIG. 2 is a block diagram of an attention enhancing GRU of the present invention;
FIG. 3 is a diagram of an attention enhancing adaptive map convolutional neural network of the present invention.
Detailed Description
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be understood that the preferred embodiments are illustrative of the invention only and are not limiting upon the scope of the invention.
As shown in the figures, the invention provides an attention-enhanced adaptive graph convolution recurrent neural network learning method, with the following specific implementation steps:
First, a rail traffic network graph G = (V, E, A) is defined, where V denotes the set of rail stations, V = {v_1, ..., v_N} with |V| = N, N being the number of rail stations and also the number of graph nodes. E denotes the edges connecting pairs of rail stations in the graph. A denotes the adjacency matrix of the time-series passenger flow data nodes (measured by node distance or time series similarity). The rail time-series passenger flow graph signal tensor can be expressed as X = {X_{:,1}, X_{:,2}, ..., X_{:,t}}, X ∈ R^{N×T×C}, where X_{:,t} = {X_{1,t}, ..., X_{i,t}, ..., X_{N,t}}, X_{:,t} ∈ R^{N×C}, X_{N,t} ∈ R^{1×C}. We use X_{i,t} to denote the passenger flow data of the i-th node at time t. C denotes the channel dimension, C = 2, representing inbound and outbound passenger flow. Given the passenger flow tensor data X on the traffic network G, the multi-time-step passenger flow prediction problem can be defined as solving a mapping function f_θ; f_θ maps the passenger flow data of the past p time slices to the passenger flow data of the future Δ time slices, expressed by the following formula:

(X_{:,t+1}, X_{:,t+2}, ..., X_{:,t+Δ}) = f_θ(X_{:,t-p+1}, X_{:,t-p+2}, ..., X_{:,t}; G).
And (II) data preparation. First, the card-swipe data are cleaned and anomalies repaired. For example, a trip in which a passenger enters and exits at the same station is defined as an invalid cycle trip, and a trip in which a passenger enters a station on one day and exits on the next day is defined as abnormal data; invalid cycle trips and abnormal data are deleted. Second, the time slice size is set to S minutes, the inbound and outbound passenger flow data within every S-minute interval are counted and taken as one group of data, and the graph signal tensor X ∈ R^{B×N×C} is obtained, where B is the sequence length in time order, N is the number of rail stations, and C is the node attribute. Finally, according to the sequence window sizes required for input and output, a sliding-window operation is performed on the rail passenger flow time series to produce the training and test datasets X_t ∈ R^{N×T×C}. The time slice size can be chosen according to actual needs; this embodiment uses 10 minutes, i.e., S = 10.
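The cleaning rules of this step (dropping same-station cycle trips and trips spanning two service days) can be sketched as a record filter; the field names and function signature are assumptions for illustration.

```python
from datetime import datetime

def is_valid_trip(entry_station, exit_station, entry_time, exit_time):
    """Filter rules from the data-preparation step: drop 'cycle' trips that
    enter and exit at the same station, and trips spanning two service days."""
    if entry_station == exit_station:
        return False  # invalid cycle trip
    if entry_time.date() != exit_time.date():
        return False  # overnight / abnormal record
    return True
```

Records failing either rule are deleted before the S-minute inbound/outbound counts are accumulated.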
And (III) model building. First, the long-term and short-term dynamic spatial correlations in rail transit passenger flow are learned with an adaptive graph convolution network and an attention mechanism respectively, and fused through the graph convolution operation for long- and short-term spatial correlation modeling; then, the graph convolution is embedded into a gated recurrent unit (GRU) for joint dynamic spatio-temporal correlation modeling; finally, the overall model is built with an encoder-decoder framework. Specifically, the construction of the prediction model comprises the following steps:
Step S1: learn the long-term and short-term dynamic spatial correlations in rail transit passenger flow with an adaptive graph convolution network and an attention mechanism respectively, combine the two through the graph convolution operation for long- and short-term spatial correlation modeling, and build the overall AEAGCRN model with an encoder-decoder framework. In step S1, during training the historical time series is input into the encoder, the encoder state h_1 is initialized as an all-zero matrix, and the decoder state is initialized with the encoder's final state; a separation layer with time-step specificity and a projection layer whose parameters are shared within the decoder's time steps together constitute the prediction layer of the decoder.
Step S2: use the AEAGRU module as the main network module of the encoder-decoder architecture, i.e., take the GRU as the backbone framework and replace the MLP operations in the GRU with the attention-enhanced adaptive graph convolution module (AEAGCN) for joint modeling of spatio-temporal correlations. In step S2, the attention-enhanced graph convolution (AEAGCN) module obtained in step S1 is embedded into the GRU to replace the MLP operation, finally yielding the attention-enhanced adaptive graph convolution recurrent network:

z_t = σ(AEAGCN([X_{:,t}, h_{t-1}]))
r_t = σ(AEAGCN([X_{:,t}, h_{t-1}])).
the specific operation is as follows:
firstly, splicing the current time input and the current state, and then putting the current time input and the current state as input into an AEAGCN module; nonlinear transformation is carried out through a sigmoid activation function, wherein sigma refers to the sigmoid activation function, and the inputs of an update gate and a reset gate are respectively obtained; finally, carrying out gating operation on the two layers to obtain a hidden state and subsequent layer output:
wherein tanh is a nonlinear activation function mapped to the (-1, 1) interval; x :,t ,h t Input and output at the t time step, respectively; [. The]Representing a matrix splicing operation, denoted Hadamard product, z t ,r t Respectively an update gate and a reset gate,is based on the hidden state of the reset gate calculation.
And step S3: spatial dynamics and node specificity are modeled using an attention-enhancing adaptive graph convolution module (AEAGCN). The method specifically comprises the following steps:
First, two adaptive, globally shared node parameter matrices E_1, E_2 ∈ R^{N×D} are used to model the spatial correlation of the global time domain, where D is the feature dimension of the parameter matrices. E_1 and E_2 represent the inbound and outbound passenger flow attributes of all stations, respectively; by training these parameter matrices, a long-time-scale globally adaptive spatial correlation matrix A_static can be constructed for learning the hidden association relationships between nodes. The specific formula is as follows:

A_static = softmax(ReLU(E_1 E_2^T))

where softmax denotes a nonlinear normalization and ReLU a nonlinear activation function. E_1 and E_2 are two globally parameter-shared association matrices obtained by training on the full training set; A_static is produced from them by matrix operations and nonlinear transformation, i.e., A_static is adaptive.
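Assuming the lost formula is the embedding-product construction the surrounding text describes (a ReLU followed by a softmax over the trained matrices E_1, E_2), A_static can be sketched as:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def static_adjacency(E1, E2):
    """Long-term adaptive spatial correlation matrix:
    A_static = softmax(ReLU(E1 @ E2.T)), built only from the two
    learned (N, D) node parameter matrices."""
    return softmax(np.maximum(E1 @ E2.T, 0.0), axis=-1)
```

Because A_static depends only on trainable embeddings, no predefined station-distance adjacency is needed; the hidden associations are learned end to end.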
Secondly, a dynamic graph structure based on the attention mechanism is used to capture, from the recent historical data, the dynamic changes of spatial correlation in the near term. The specific formula is as follows:

A_dyn^l = softmax(ReLU((H^l W_1^l) × (H^l W_2^l)^T))

where A_dyn^l denotes the short-term spatial correlation matrix, l indexes the parameters of the l-th graph convolution layer, W_1^l and W_2^l are linear transformation weight parameter matrices, and the × operator denotes matrix multiplication. By combining the two, the model can effectively capture the characteristics of dynamic spatial correlation at different time scales and realize simultaneous modeling of long-term static and real-time dynamic spatial correlation. The formula is as follows:

Ã^l = A_static + A_dyn^l

where l indexes the parameters of the l-th graph convolution layer (distinguished from the globally shared parameters), and Ã^l denotes the spatial correlation adaptive matrix combining global parameter sharing with local spatial features.
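The patent's exact attention formula is not recoverable from the extracted text; the sketch below uses standard scaled dot-product attention as an assumed stand-in for the short-term matrix, and simple summation as the fusion step:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_adjacency(H, Wq, Wk):
    """Short-term correlation matrix from recent node features H (N, F):
    entry (i, j) scores how strongly node i currently attends to node j."""
    q, k = H @ Wq, H @ Wk
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)

def fused_adjacency(A_static, A_dyn):
    """Combine the long-term static and short-term dynamic correlation
    matrices (summation chosen here as the simplest fusion)."""
    return A_static + A_dyn
```

Because `dynamic_adjacency` is recomputed from the current layer features at every step, the fused matrix changes over time, while `A_static` contributes the slowly learned global structure.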
Finally, following the common graph convolution model, a node channel-domain feature matrix E_c ∈ R^{N×d} is introduced to characterize the features of different nodes, and the graph convolution model on the dynamic graph is defined by the following formula:

H^{l+1} = (Ã^l H^l)(E_c W_c) + E_c b_c

where W_c ∈ R^{d×C_in×C_out} denotes a trainable weight matrix and b_c ∈ R^{d×C_out} a bias term (E_c W_c and E_c b_c yield node-specific weights and biases via tensor contraction over d); H^l, the input of the l-th AEAGCN graph convolution layer, is also the output of the previous layer's graph convolution operation.
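A sketch of the node-adaptive graph convolution, under the assumption that E_c W_c and E_c b_c are contracted over the embedding dimension d to give per-node weights and biases (the bias pool shape (d, C_out) is an assumption made so the dimensions compose):

```python
import numpy as np

def aeagcn_layer(A, H, Ec, Wc, bc):
    """Graph convolution with node-specific parameters:
    A: (N, N) fused correlation matrix; H: (N, C_in) layer input;
    Ec: (N, d) node channel-domain embeddings;
    Wc: (d, C_in, C_out) shared weight pool; bc: (d, C_out) shared bias pool.
    Each node i draws its own weight matrix Ec[i] @ Wc from the pool."""
    W = np.einsum('nd,dio->nio', Ec, Wc)   # per-node weights (N, C_in, C_out)
    b = Ec @ bc                            # per-node biases (N, C_out)
    # aggregate neighbors via A, then apply each node's own weight matrix
    return np.einsum('nm,mi,nio->no', A, H, W) + b
```

The shared pools keep the parameter count at O(d · C_in · C_out) rather than O(N · C_in · C_out), while still letting every station specialize.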
(IV) Model training. In this embodiment, two real data sets are used for the experiments: Beijing Subway and Chongqing Subway. First, the model is constructed with the Seq2Seq framework, treating input and output as sequences of variable length. In terms of training technique, this embodiment trains the model with a teacher-forcing method, in which the aforementioned attention-enhanced adaptive graph convolution network module (AEAGCN) replaces the MLP module in the GRU, and the improved GRU, AEAGRU, serves as the backbone network. In the first training epochs, the output of the previous state is not used as the input of the next state; instead, the entry of the real target data (ground truth) corresponding to the previous step is fed directly as the input of the next state, i.e., the decoder first learns a criterion with the real target data as input. As training progresses, the probability of using the real target data decays at an exponential rate, and the output of the previous state is increasingly used as the input of the next state.
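The scheduled-sampling schedule described above can be sketched as follows; the decay constant `gamma` is an illustrative assumption, since the text only states that the probability decays exponentially:

```python
import random

def ground_truth_prob(step, gamma=0.999):
    """Probability of feeding the decoder the real target (ground truth)
    instead of its own previous output; decays exponentially with the
    training step. gamma is an illustrative value, not from the patent."""
    return gamma ** step

def next_decoder_input(step, ground_truth, prev_output, rng=random.random):
    """Scheduled sampling: pick ground truth early in training,
    the model's own previous output later."""
    return ground_truth if rng() < ground_truth_prob(step) else prev_output
```

Early on the decoder is anchored to real sequences, and as `ground_truth_prob` shrinks it is gradually exposed to its own (possibly erroneous) predictions, matching test-time conditions.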
In terms of training parameters, all models are optimized with the Adam optimizer, and the learning rate is adjusted by cosine annealing with period T = 10. The maximum number of epochs is 150, the initial learning rate is 0.005, the minimum learning rate is 1e-6, and the batch size is 32.
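With the stated hyperparameters, the cosine-annealing schedule can be written directly:

```python
import math

def cosine_annealing_lr(epoch, lr_max=0.005, lr_min=1e-6, T=10):
    """Cosine-annealed learning rate restarting every T epochs, using the
    hyperparameters reported in the text (lr0 = 0.005, floor 1e-6, T = 10)."""
    t = epoch % T  # position within the current annealing period
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / T))
```

The rate glides from 0.005 toward 1e-6 over each 10-epoch period and then warm-restarts, which helps the optimizer escape shallow minima between periods.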
In terms of model parameters, first, this embodiment stacks the same number of AEAGRU layers (layers = 2) as the main networks of the encoder and the decoder, respectively. Second, for the encoder, this embodiment uses an all-zero tensor h_1 ∈ R^{B×C_in×C_out} as the initial state tensor, where B denotes the batch size and C_in, C_out denote the channel dimensions of input and output, with C_in = 64 and C_out = 64. Finally, the dimension d of the AEAGCN adaptive node parameter matrix E is uniformly set to 32.
To illustrate the effectiveness of the invention, a series of experimental verifications was carried out; baseline models widely used in traffic flow prediction at the present stage and the most advanced models were selected for comparison experiments. They include:
ARIMA: kalman filtering autoregressive comprehensive moving average model
FC-LSTM: a long and short term memory network. A recurrent neural network of fully connected computational hidden units is used.
DCRNN: the diffusion map is convolved with a recurrent neural network. It combines an image convolution network with a recurrent neural network using an encoder-decoder approach.
STGCN: and (3) convoluting the neural network by the space-time diagram. The method carries out space-time dependence modeling by combining graph convolution and a one-dimensional convolution neural network.
AGCRN: an adaptive graph convolution recurrent neural network. It combines adaptive graph convolution with a recurrent neural network.
GWN: and (5) mapping a wavelet neural network. The method only uses self-adaptive graph convolution to carry out spatial modeling, and carries out time correlation modeling through a one-dimensional diffusion convolution neural network and a gate control mechanism.
Whether a time-series prediction algorithm performs well depends on sound evaluation metrics; in this example the common MAE, RMSE and MAPE were selected for effect evaluation. Because the 10-minute inbound or outbound flow X_i of a station can be very small, samples with flow less than or equal to δ (here δ = 1) are excluded from the MAPE computation, reducing the disturbance of low-flow data to that metric.
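The three metrics, with the low-flow mask δ = 1 applied to MAPE, can be sketched as:

```python
import math

def metrics(y_true, y_pred, delta=1.0):
    """MAE, RMSE, and a thresholded MAPE that skips targets <= delta,
    so near-zero 10-minute station flows do not inflate the percentage
    error (delta = 1, as chosen in the text)."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    kept = [(t, p) for t, p in zip(y_true, y_pred) if t > delta]
    mape = 100.0 * sum(abs(t - p) / t for t, p in kept) / len(kept) if kept else 0.0
    return mae, rmse, mape
```

MAE and RMSE use every sample; only MAPE is masked, since dividing by a near-zero flow would otherwise dominate the average.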
Through experimental verification on the subway data set BJ_Metro of one city and the subway data set CQ_Metro of a municipality, the results are shown in Tables 1 and 2. Comparing prediction horizons of 1, 3 and 6 steps gives the experimental results of each model on the test set for station inbound/outbound flow 10, 30 and 60 minutes into the future. AEAGCN-64 and AEAGCN-32 denote the model with hidden-unit dimension set to 64 and 32, respectively. Bold indicates the best performance on the data set; underlining indicates the second best.
Table 1: experimental results on CQ _ Metro dataset
Table 2: experimental results on BJ _ Metro dataset
The final results show that on the BJ_Metro data set every metric is superior to existing methods, and on the CQ_Metro data set some metrics exceed the latest existing methods, achieving a near-optimal effect. AEAGCN-64 and AEAGCN-32 denote the model with hidden-unit dimension set to 64 and 32; the results show that the learning capability of the model improves considerably as the hidden dimension grows, but subsequent experiments also found that the ratio of benefit to space cost diminishes as the hidden dimension is enlarged, making 64 a suitable parameter value. The experiments used a hardware platform with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40 GHz and a 2080Ti GPU, built on the PyTorch training framework under Python on a Windows 10 system.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable connection, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, or the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media includes instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein.
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A rail transit passenger flow prediction model of an adaptive graph convolution recurrent neural network combined with an attention mechanism is characterized in that: the model is built by the following steps:
step S1: an adaptive graph convolution network and an attention mechanism are used to learn, respectively, the long-term and short-term dynamic spatial correlations in rail transit passenger flow; the two are fused to model long- and short-term spatial correlation, and the overall AEAGCRN model is built using an encoder-decoder framework;
step S2: an AEAGRU module is used as the main network module of the encoder-decoder framework; that is, a GRU serves as the backbone, and the MLP operation in the GRU is replaced by an attention-enhanced adaptive graph convolution module (AEAGCN) for joint modeling of spatio-temporal correlations;
step S3: spatial dynamics and node specificity are modeled using the attention-enhanced adaptive graph convolution module (AEAGCN).
2. The rail transit passenger flow prediction model of the adaptive graph convolution recurrent neural network combined with an attention mechanism according to claim 1, characterized in that: in step S1, during training, the historical time sequence is input into the encoder, the encoder state h_1 is initialized to an all-zero matrix, the decoder state is initialized with the encoder's final state, and the prediction layer of the decoder is composed of a time-step-specific separation layer and a projection layer whose parameters are shared across decoder time steps.
3. The rail transit passenger flow prediction model of the adaptive graph convolution recurrent neural network combined with an attention mechanism according to claim 2, characterized in that: in step S2, the attention-enhanced adaptive graph convolution (AEAGCN) module obtained in step S1 is embedded into the GRU in place of the MLP operation, finally obtaining an attention-enhanced adaptive graph convolution recurrent network:

z_t = σ(AEAGCN([X_{:,t}, h_{t-1}]))
r_t = σ(AEAGCN([X_{:,t}, h_{t-1}])).
4. The rail transit passenger flow prediction model of the adaptive graph convolution recurrent neural network combined with an attention mechanism according to claim 3, characterized in that the specific operation of step S2 is:
firstly, the input at the current time step is concatenated with the previous hidden state and fed into the AEAGCN module; a nonlinear transformation is applied through the sigmoid activation function σ, yielding the update-gate and reset-gate values, respectively; finally, the gating operations are applied to the two gates to obtain the hidden state and the output of the subsequent layer:

h̃_t = tanh(AEAGCN([X_{:,t}, r_t ⊙ h_{t-1}]))
h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t

where tanh is a nonlinear activation function mapping to the (−1, 1) interval; X_{:,t} and h_t are the input and output at time step t; [·] denotes the matrix concatenation operation and ⊙ the Hadamard product; z_t and r_t are the update gate and the reset gate, respectively; and h̃_t is the candidate hidden state computed from the reset gate.
5. The rail transit passenger flow prediction model of the adaptive graph convolution recurrent neural network combined with an attention mechanism according to any one of claims 1 to 4, characterized in that: in step S3, first, two adaptive, globally shared node parameter matrices E_1, E_2 ∈ R^{N×D} are used to model the spatial correlation of the global time domain, where D is the feature dimension of the parameter matrices; E_1 and E_2 represent the inbound and outbound passenger flow attributes of all stations, respectively; by training these parameter matrices, a long-time-scale globally adaptive spatial correlation matrix A_static can be constructed for learning the hidden association relationships between nodes, with the specific formula:

A_static = softmax(ReLU(E_1 E_2^T))

where softmax denotes a nonlinear normalization and ReLU a nonlinear activation function; E_1 and E_2 are two globally parameter-shared association matrices obtained by training on the full training set; A_static is produced from them by matrix operations and nonlinear transformation, i.e., A_static is adaptive;
secondly, a dynamic graph structure based on the attention mechanism is used to capture, from the recent historical data, the dynamic changes of spatial correlation in the near term, with the specific formula:

A_dyn^l = softmax(ReLU((H^l W_1^l) × (H^l W_2^l)^T))

where A_dyn^l denotes the short-term spatial correlation matrix, l indexes the parameters of the l-th graph convolution layer, W_1^l and W_2^l are linear transformation weight parameter matrices, and the × operator denotes matrix multiplication; by combining the two, the model can effectively capture the characteristics of dynamic spatial correlation at different time scales and realize modeling of long-term static and real-time dynamic spatial correlation, with the formula:

Ã^l = A_static + A_dyn^l

where l indexes the parameters of the l-th graph convolution layer (distinguished from the globally shared parameters), and Ã^l denotes the spatial correlation adaptive matrix combining global parameter sharing with local spatial features;
finally, following the common graph convolution model, a node channel-domain feature matrix E_c ∈ R^{N×d} is introduced to characterize the features of different nodes, and the graph convolution model on the dynamic graph is defined by:

H^{l+1} = (Ã^l H^l)(E_c W_c) + E_c b_c

where W_c ∈ R^{d×C_in×C_out} denotes a trainable weight matrix and b_c ∈ R^{d×C_out} a bias term; H^l, the input of the l-th AEAGCN graph convolution layer, is also the output of the previous layer's graph convolution operation.
6. A rail transit passenger flow prediction method of the adaptive graph convolution recurrent neural network combined with the attention mechanism, using a prediction model according to any one of claims 1 to 5, characterized in that:
firstly, a rail transit network graph G = (V, E, A) is defined, where V represents the set of rail stations, V = {v_1, ..., v_N} with |V| = N, N being the number of rail stations and of graph nodes; E represents the edges connecting two rail stations in the graph; and A represents the adjacency matrix of the time-series passenger flow nodes (measured by node distance or time-series similarity). The rail time-series passenger flow graph signal tensor can be expressed as X = {X_{:,1}, X_{:,2}, ..., X_{:,t}}, X ∈ R^{N×T×C}, where X_{:,t} = {X_{1,t}, ..., X_{i,t}, ..., X_{N,t}}, X_{:,t} ∈ R^{N×C}, X_{N,t} ∈ R^{1×C}; X_{i,t} denotes the passenger flow data of the i-th node at time t, and C denotes the channel dimension, C = 2, representing inbound and outbound passenger flow. The multi-time-step passenger flow prediction problem can be defined as solving a mapping function f_θ given the passenger flow tensor data X on the traffic network G; f_θ maps the passenger flow data of the past p time slices to that of the future Δ time slices, expressed by the following formula:

(X_{:,t+1}, X_{:,t+2}, ..., X_{:,t+Δ}) = f_θ(X_{:,t−p+1}, X_{:,t−p+2}, ..., X_{:,t}; G);
then data preparation is carried out: the card-swiping data are cleaned and abnormal records are repaired; the time slice size is set to S minutes, and the inbound and outbound passenger flow volumes within each S-minute interval are counted as one group of data, giving the graph signal tensor X ∈ R^{B×N×C}, where B is the sequence length ordered by time, N is the number of rail stations, and C is the node attribute; finally, according to the sequence window sizes required for input and output, a sliding-window operation is performed on the rail passenger flow time-series data to produce the training and testing data sets X_t ∈ R^{N×T×C};
Completing the construction of a prediction model according to any one of claims 1 to 5, carrying out simulation training, and carrying out rail transit passenger flow prediction by using the model.
7. The method of claim 6 for predicting rail transit passenger flow in conjunction with an adaptive graph convolution recurrent neural network in an attention mechanism, comprising: the time slice size is set to 10 minutes, i.e. S equals 10.
8. The method for predicting the passenger flow in rail transit of the adaptive graph convolution recurrent neural network combined with the attention mechanism as recited in claim 6, characterized in that: the simulation training trains the model with a teacher-forcing method; in terms of training parameters, all models are optimized with the Adam optimizer and the learning rate is adjusted by a cosine annealing method; in terms of model parameters, the same number of stacked AEAGRU layers is used as the main network of the encoder and of the decoder, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211069744.XA CN115423189A (en) | 2022-09-02 | 2022-09-02 | Rail transit passenger flow prediction model and method of adaptive graph convolution recurrent neural network combined with attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115423189A true CN115423189A (en) | 2022-12-02 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116128122A (en) * | 2023-01-03 | 2023-05-16 | 北京交通大学 | Urban rail transit short-time passenger flow prediction method considering burst factors |
CN116432868A (en) * | 2023-06-12 | 2023-07-14 | 深圳大学 | Subway passenger flow prediction method and device based on node query set and storage medium |
CN117475638A (en) * | 2023-12-26 | 2024-01-30 | 北京建筑大学 | Traffic OD passenger flow prediction method and system based on multichannel hypergraph convolutional network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |