CN115423189A - Rail transit passenger flow prediction model and method of adaptive graph convolution recurrent neural network combined with attention mechanism - Google Patents


Info

Publication number
CN115423189A
CN115423189A
Authority
CN
China
Prior art keywords: passenger flow, time, adaptive, graph, data
Prior art date
Legal status: Pending
Application number
CN202211069744.XA
Other languages
Chinese (zh)
Inventor
Zheng Linjiang (郑林江)
Li Peng (李鹏)
Liu Weining (刘卫宁)
Sun Dihua (孙棣华)
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202211069744.XA
Publication of CN115423189A

Classifications

    • G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 — Neural networks; Learning methods
    • G06Q50/26 — Government or public services


Abstract

The invention discloses a rail transit passenger flow prediction model and method based on an adaptive graph convolution recurrent neural network combined with an attention mechanism. An AEAGRU module serves as the main network module of an encoder-decoder framework and jointly models spatio-temporal correlation, and an attention-enhanced adaptive graph convolution module (AEAGCN) is proposed to model spatial dynamics and node specificity. This addresses the weak generalization ability of existing rail prediction methods and their inability to achieve accurate rail passenger flow prediction.

Description

Rail transit passenger flow prediction model and method of adaptive graph convolution cyclic neural network combined with attention mechanism
Technical Field
The invention relates to the technical field of traffic flow prediction, in particular to a rail transit passenger flow prediction method of an adaptive graph convolution recurrent neural network combined with an attention mechanism.
Background
With economic, scientific, technological and social development, urbanization has accelerated. Urban traffic systems have become ever more convenient, but they also bring huge challenges such as traffic congestion, energy consumption and environmental pollution. Traffic supervision based on intelligent transportation systems can improve traffic efficiency to a certain extent and relieve congestion.
Accurate passenger flow prediction is one of the core research topics of intelligent transportation systems. For urban rail transit, as the rail network keeps growing, operation and schedule planning become increasingly complex. Accurate passenger flow prediction helps optimize train scheduling, formulate reasonable operation schemes, inform passengers' trip planning, give early warning of large crowd-intensive events, and plan future subway lines and stations. It thus reduces the transportation pressure on rail transit, improves travel convenience for passengers, and greatly assists the management of the whole subway network and even of urban traffic.
Traditional approaches treat traffic flow prediction as a multivariate time-series prediction task and therefore use classical time-series models such as ARIMA. However, these methods ignore the spatial correlation among different time series, and most of them cannot model nonlinear sequences. Although some grid-based methods achieve good results, the natural topological connectivity of the road network makes traffic flow data non-Euclidean, which further hinders the effective application of conventional methods to the traffic flow prediction task.
Inspired by recent spatio-temporal sequence prediction studies in which many researchers model non-Euclidean data with graph neural networks, some traffic flow prediction work has also begun to model complex spatial correlation with graph-neural-network-based approaches, such as diffusion graph convolution models and spatio-temporal graph convolution models. In terms of spatial correlation modeling, most existing graph-neural-network methods rely on a predefined static road network, which cannot capture spatial dynamics, and most of them treat all stations as identical or only manually label attributes such as transfer stations and origin/destination stations. In short, existing rail prediction methods generalize poorly, cannot achieve accurate rail passenger flow prediction, and leave substantial room for improvement. The rail prediction task faces three challenges:
(1) Node specificity
In most existing methods, subway stations are treated as identical nodes, and the specificity of different nodes is rarely considered. Approaches that model node specificity by manually labeling attributes (origin/terminal or transfer stations), embedding an OD transfer matrix, embedding POI similarity, or embedding a time-series-similarity (DTW) temporal graph remain conventional GCN methods: they model the association between nodes rather than explicitly modeling node-specific attributes. Although such methods (applying the same channel-dimension transformation to all spatial nodes after neighborhood aggregation) greatly reduce the parameter count and speed up computation, in real scenes the traffic flow patterns of two stations are not necessarily identical even when their roads are directly adjacent or their functional attributes coincide. Modeling approaches that rely entirely on structural or functional similarity assumptions are inherently deficient because they ignore node specificity. In summary, it is necessary to explicitly model different nodes individually; the valuable challenge is to do so while avoiding the parameter explosion caused by modeling each node separately.
(2) Spatial dynamic correlation
In prior work, many approaches model spatial correlation with predefined static adjacency matrices, for example a static graph based on topological connections: if node i is adjacent to node j, the corresponding element A_ij of the adjacency matrix A is set to 1, otherwise 0, and spatial neighborhoods are then aggregated through the graph Laplacian. Such static spatial correlation modeling neglects that, in real scenes, different stations are affected not only by adjacent upstream and downstream passenger flows but also by factors such as time, weather and large-scale events. Modeling dynamic spatial correlation is therefore a very challenging task.
(3) Time step difference.
Time-series prediction tasks are known to accumulate error as the time step increases: the longer the prediction horizon, the larger the error. Moreover, RNN-based models share all weight parameters across time steps and do not account for the differences that emerge between time steps as the prediction horizon grows.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a rail transit passenger flow prediction model of an adaptive graph convolution cyclic neural network in combination with an attention mechanism.
One of the purposes of the invention is realized by the following technical scheme:
a rail transit passenger flow prediction model of an adaptive graph convolution recurrent neural network combined with an attention mechanism is built by the following steps:
step S1: respectively learning long-term and short-term dynamic spatial correlation in rail transit passenger flow by using a self-adaptive graph convolution network and an attention mechanism, performing long-term and short-term spatial correlation modeling by combining the two through graph convolution operation, and performing overall AEAGCRN model building through a coder-decoder framework;
step S2: using an AEAGRU module as a main network module of an encoder-decoder architecture, namely using a GRU as a main body framework, replacing MLP operation in the GRU with an attention-enhancing adaptive graph convolution module (AEAGCN) for joint modeling of spatio-temporal correlations;
and step S3: spatial dynamics and node specificity are modeled using an attention-enhancing adaptive graph convolution module (AEAGCN).
Further, in step S1, during training, a historical time series is input into the encoder, the encoder state h_1 is initialized as an all-zero matrix, and the decoder state is initialized with the encoder's final state; a separation layer with time-step specificity and a projection layer whose parameters are shared across decoder time steps together constitute the prediction layer of the decoder.
Further, in step S2, the attention-enhanced adaptive graph convolution (AEAGCN) module obtained in step S1 is embedded into the GRU to replace the MLP operation, finally yielding the attention-enhanced adaptive graph convolution recurrent network;
z_t = σ(AEAGCN([X_{:,t}, h_{t-1}]))
r_t = σ(AEAGCN([X_{:,t}, h_{t-1}]))
further, the specific operation of step S2 is:
Firstly, the current time input and the previous hidden state are concatenated and fed as input into the AEAGCN module; a nonlinear transformation through the sigmoid activation function (σ denotes sigmoid) yields the update gate and the reset gate respectively; finally, the gating operations on the two gates produce the hidden state and the output of the subsequent layer:

h̃_t = tanh(AEAGCN([X_{:,t}, r_t ⊙ h_{t-1}]))
h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t

where tanh is a nonlinear activation function mapping to the (−1, 1) interval; X_{:,t} and h_t are the input and output at time step t; [·] denotes matrix concatenation and ⊙ the Hadamard product; z_t and r_t are the update gate and the reset gate respectively; and h̃_t is the candidate hidden state computed with the reset gate.
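As a concrete illustration, the gated update above can be sketched in plain numpy, with the AEAGCN module abstracted as a callable (the linear stand-ins used when calling it are placeholders, not the patent's actual graph convolution):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aeagru_cell(x_t, h_prev, gconv_z, gconv_r, gconv_h):
    """One step of the attention-enhanced adaptive graph-conv GRU (AEAGRU).

    x_t    : (N, C)  input graph signal at time step t
    h_prev : (N, H)  previous hidden state
    gconv_*: callables standing in for the AEAGCN module; each maps the
             concatenated (N, C + H) tensor to an (N, H) tensor.
    """
    xh = np.concatenate([x_t, h_prev], axis=-1)        # [X_{:,t}, h_{t-1}]
    z = sigmoid(gconv_z(xh))                           # update gate z_t
    r = sigmoid(gconv_r(xh))                           # reset gate r_t
    xh_r = np.concatenate([x_t, r * h_prev], axis=-1)  # reset-gated input
    h_cand = np.tanh(gconv_h(xh_r))                    # candidate state
    return z * h_prev + (1.0 - z) * h_cand             # gated output h_t
```

Three separate callables are passed because in a standard GRU the update gate, reset gate, and candidate state each carry their own parameters; whether the patent shares AEAGCN parameters across gates is not stated.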
Further, in step S3, first, two adaptive, globally shared node parameter matrices E_1, E_2 (E_1, E_2 ∈ R^{N×D}) are used to model the spatial correlation of the global time domain, where D is the feature dimension of the parameter matrices; E_1 and E_2 represent the inbound and outbound passenger-flow attributes of all stations respectively. By training the parameter matrices E_1, E_2, a long-time-scale global adaptive spatial correlation matrix A_static can be constructed to learn the hidden associations between nodes, with the specific formula:

A_static = softmax(ReLU(E_1 E_2^⊤))

where softmax denotes row-wise nonlinear normalization and ReLU the nonlinear activation function; E_1 and E_2 are trained on the full training set to give two globally shared association matrices, so A_static, obtained through matrix operations and nonlinear transformation, is itself adaptive;
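The long-term adaptive adjacency can be sketched in a few lines of numpy; the embeddings E_1 and E_2 passed in when using it are random stand-ins for the learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_static_adj(E1, E2):
    """A_static = softmax(ReLU(E1 @ E2.T)): a learned, globally shared
    spatial-correlation matrix built from two (N, D) node embeddings."""
    scores = np.maximum(E1 @ E2.T, 0.0)  # ReLU keeps non-negative affinities
    return softmax(scores, axis=-1)      # normalise rows into edge weights
```

Each row of the result sums to 1, so it can be used directly as an aggregation matrix in a graph convolution.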
Secondly, a dynamic graph structure based on the attention mechanism captures from recent historical data the short-term dynamic changes of spatial correlation, with the specific formula:

A_att^l = softmax((H^l W_1^l)(H^l W_2^l)^⊤ / √d)

where A_att^l denotes the short-term spatial correlation matrix; l indexes the parameters of the l-th graph convolution layer; W_1^l and W_2^l are linear transformation weight parameter matrices; and the products are matrix multiplications. Combining the two matrices lets the model effectively capture the characteristics of dynamic spatial correlation at different time scales, modeling long-term static and real-time dynamic spatial correlation simultaneously:

Ã^l = A_static + A_att^l

where l distinguishes the layer-specific parameters from the globally shared ones, and Ã^l denotes the spatial correlation adaptive matrix combining globally shared parameters and local spatial features;
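Since the patent's attention equation survives only as an image reference, the sketch below assumes a standard scaled dot-product self-attention over station features; Wq and Wk are hypothetical stand-ins for the linear transformation weight parameter matrices:

```python
import numpy as np

def attention_adj(H, Wq, Wk):
    """Short-term dynamic adjacency via self-attention over node features.

    H      : (N, C) current station features
    Wq, Wk : (C, C) linear transformation weight matrices (assumed form)
    Returns a row-stochastic (N, N) attention matrix.
    """
    q, k = H @ Wq, H @ Wk
    scores = (q @ k.T) / np.sqrt(q.shape[-1])          # scaled dot product
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)           # row-wise softmax
```

Because both this matrix and A_static are row-normalised, summing (or averaging) them, as in the combination step above, keeps the aggregation well scaled.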
Finally, following the ordinary graph convolution model, a node channel-domain feature matrix E_c ∈ R^{N×d} is introduced to characterize the individual features of different nodes, and the dynamic graph convolution model is defined by the specific formula:

H^{l+1} = Ã^l H^l E_c W_c + E_c b_c

where W_c ∈ R^{d×C_in×C_out} denotes a trainable weight pool and b_c ∈ R^{d×C_out} a bias pool, so that E_c W_c and E_c b_c generate node-specific channel transformations; H^l, the input of the l-th AEAGCN graph convolution layer, is also the output of the previous layer's graph convolution operation.
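A numpy sketch of this node-adaptive graph convolution layer; the bias-pool shape (d, C_out) is an assumption chosen so that the term E_c b_c typechecks against E_c ∈ R^{N×d}:

```python
import numpy as np

def node_adaptive_gconv(A, H, Ec, Wc, bc):
    """One node-adaptive graph-convolution layer:
    H_out = A @ H @ (Ec @ Wc) + Ec @ bc.

    Per-node channel transforms are generated from a small node embedding
    Ec (N, d) instead of storing N full weight matrices.

    A  : (N, N)            combined spatial-correlation matrix
    H  : (N, C_in)         layer input
    Wc : (d, C_in, C_out)  shared weight pool
    bc : (d, C_out)        shared bias pool (assumed shape)
    """
    W = np.einsum('nd,dio->nio', Ec, Wc)  # node-specific weights (N, C_in, C_out)
    b = Ec @ bc                           # node-specific bias (N, C_out)
    HA = A @ H                            # spatial neighbourhood aggregation
    return np.einsum('ni,nio->no', HA, W) + b
```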
The invention also aims to provide a rail transit passenger flow prediction method of the adaptive graph convolution cyclic neural network combined with the attention mechanism, which utilizes the prediction model and comprises the following steps:
Firstly, a rail transit network graph G = (V, E, A) is defined, where V denotes the set of rail stations, V = {v_1, ..., v_N} with |V| = N, N being the number of stations and also the number of graph nodes; E denotes the edges connecting two stations in the graph; and A denotes the adjacency matrix of the time-series passenger-flow nodes (measured by node distance or time-series similarity). The rail passenger-flow graph signal tensor can be expressed as X = {X_{:,1}, X_{:,2}, ..., X_{:,t}}, X ∈ R^{N×T×C}, where X_{:,t} = {X_{1,t}, ..., X_{i,t}, ..., X_{N,t}}, X_{:,t} ∈ R^{N×C}, X_{N,t} ∈ R^{1×C}; X_{i,t} denotes the passenger-flow data of the i-th node at time t, and C = 2 is the channel dimension, representing inbound and outbound flow. Given the passenger-flow tensor X on the network G, the multi-step passenger flow prediction problem can be defined as solving a mapping function f_θ that maps the passenger-flow data of the past p time slices to that of the future Δ time slices, expressed by the following formula:

(X_{:,t+1}, X_{:,t+2}, ..., X_{:,t+Δ}) = f_θ(X_{:,t−p+1}, X_{:,t−p+2}, ..., X_{:,t}; G);
and then data preparation is carried out, the card swiping data is cleaned and abnormal repairing is carried out, the time slice size is set to be S minutes, the statistics is carried out on the incoming and outgoing passenger flow volume data in the interval of every S minutes, the incoming and outgoing passenger flow volume data are used as a group of data, and the tensor X of the graph signal belongs to R B×N×C (ii) a Wherein B is the sequence length ordered in time series, N is the number of track sites, and C is the attribute of a node; finally, according to the sequence window size required by input and output, carrying out window sliding operation on the time sequence data of the rail passenger flow, and making a training and testing data set X t ∈R N×T×C
And completing the construction of a prediction model, performing simulation training, and predicting the passenger flow of the rail transit by using the model.
Further, the time slice size is set to 10 minutes, i.e., S equals 10.
Further, in the simulation training, the models are trained with the teacher-forcing method, all models are optimized with the Adam optimizer, and the learning rate is adjusted by cosine annealing; for the model parameters, stacks of the same number of AEAGRU layers are used as the main networks of the encoder and the decoder, respectively.
The invention has the beneficial effects that:
(1) The invention builds the overall AEAGCRN model on an encoder-decoder framework and treats the whole task as a sequence prediction problem; a separation layer added before the decoder's last shared layer assigns differentiated weights to different time steps, solving the time-step difference problem;
(2) The method uses multiple node-adaptive parameters to explicitly model the attributes of different nodes, and applies a low-rank node-adaptive matrix to factorize the channel-dimension transformation of the nodes, relieving the parameter explosion problem.
(3) Spatial dynamic associations are captured with an adaptive parameter graph, which is decomposed into two low-rank node-adaptive parameter matrices; this reduces the number of parameters and aids model convergence. All node-adaptive parameters are globally shared, i.e., shared across every network module in the encoder and decoder, which in effect adds direct connection channels to the network (the idea of the Highway Network) and offers a new solution to the problem that graph convolutions cannot be made deep.
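The parameter savings claimed for the low-rank decomposition can be checked with a small counting sketch: storing a full C_in×C_out weight matrix per node costs N·C_in·C_out parameters, while the factorization through a d-dimensional node embedding costs N·d + d·C_in·C_out (the concrete N, d, C values below are illustrative, not from the patent):

```python
def param_counts(n_nodes, d, c_in, c_out):
    """Compare per-node full weight matrices against the low-rank
    factorisation E (N x d) times a shared pool W (d x C_in x C_out)."""
    full = n_nodes * c_in * c_out
    low_rank = n_nodes * d + d * c_in * c_out
    return full, low_rank

# e.g. 300 stations, embedding dim 32, 64-channel hidden layers
full, low = param_counts(300, 32, 64, 64)
```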
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as follows.
Drawings
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings, in which:
FIG. 1 is a diagram of the overall model Seq2Seq framework of the present invention;
FIG. 2 is a block diagram of an attention enhancing GRU of the present invention;
FIG. 3 is a diagram of an attention enhancing adaptive map convolutional neural network of the present invention.
Detailed Description
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be understood that the preferred embodiments are illustrative of the invention only and are not limiting upon the scope of the invention.
As shown in the figures, the invention provides an attention-enhanced adaptive graph convolution recurrent neural network learning method, implemented in the following steps:
First, a rail transit network graph G = (V, E, A) is defined, where V denotes the set of rail stations, V = {v_1, ..., v_N} with |V| = N, N being the number of stations and also the number of graph nodes; E denotes the edges connecting two stations in the graph; and A denotes the adjacency matrix of the time-series passenger-flow nodes (measured by node distance or time-series similarity). The rail passenger-flow graph signal tensor can be expressed as X = {X_{:,1}, X_{:,2}, ..., X_{:,t}}, X ∈ R^{N×T×C}, where X_{:,t} = {X_{1,t}, ..., X_{i,t}, ..., X_{N,t}}, X_{:,t} ∈ R^{N×C}, X_{N,t} ∈ R^{1×C}; X_{i,t} denotes the passenger-flow data of the i-th node at time t, and C = 2 is the channel dimension, representing inbound and outbound flow. Given the passenger-flow tensor X on the network G, the multi-step passenger flow prediction problem can be defined as solving a mapping function f_θ that maps the passenger-flow data of the past p time slices to that of the future Δ time slices:

(X_{:,t+1}, X_{:,t+2}, ..., X_{:,t+Δ}) = f_θ(X_{:,t−p+1}, X_{:,t−p+2}, ..., X_{:,t}; G).
(II) Data preparation. First, the card-swiping data are cleaned and anomalies repaired. For example, a trip in which a passenger enters and exits at the same station is defined as an invalid cycle trip, and a trip in which a passenger enters on one day and exits on the next is defined as anomalous data; invalid cycle trips and anomalous data are deleted. Second, the time slice size is set to S minutes, and inbound and outbound passenger volumes are counted over every S-minute interval as one group of data, giving the graph signal tensor X ∈ R^{B×N×C}, where B is the sequence length in time order, N the number of stations, and C the node attributes. Finally, according to the sequence window sizes required for input and output, a sliding-window operation is applied to the rail passenger-flow time series to produce training and test data sets X_t ∈ R^{N×T×C}. The time slice size can be chosen as needed; this embodiment uses 10 minutes, i.e., S = 10.
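A minimal cleaning-and-binning sketch under the rules above (same-station cycle trips and cross-day trips dropped, flows counted per 10-minute slice); the record layout (station_in, t_in, station_out, t_out) is an assumption about the card-swiping data:

```python
from datetime import datetime
import numpy as np

def clean_and_bin(trips, n_stations, slice_min=10):
    """Drop invalid cycle trips and cross-day trips, then count inbound and
    outbound flow per station per slice_min-minute slice of one day.
    Returns an array of shape (slices_per_day, n_stations, 2)."""
    per_day = 24 * 60 // slice_min
    flow = np.zeros((per_day, n_stations, 2), dtype=int)
    for s_in, t_in, s_out, t_out in trips:
        if s_in == s_out:                  # invalid cycle trip
            continue
        if t_in.date() != t_out.date():    # cross-day trip: anomalous
            continue
        k_in = (t_in.hour * 60 + t_in.minute) // slice_min
        k_out = (t_out.hour * 60 + t_out.minute) // slice_min
        flow[k_in, s_in, 0] += 1           # inbound count at entry station
        flow[k_out, s_out, 1] += 1         # outbound count at exit station
    return flow
```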
(III) Model construction. First, an adaptive graph convolution network and an attention mechanism are used to learn the long-term and short-term dynamic spatial correlation in rail transit passenger flow respectively, and the two are fused via graph convolution for joint long- and short-term spatial correlation modeling; then the graph convolution is embedded into a gated recurrent unit (GRU) for joint dynamic spatio-temporal modeling; finally, the overall model is built on an encoder-decoder framework. Specifically, constructing the prediction model comprises the following steps:
step S1: respectively learning long-term and short-term dynamic spatial correlation in rail transit passenger flow by using a self-adaptive graph convolution network and an attention mechanism, performing long-term and short-term spatial correlation modeling by combining the two through graph convolution operation, and performing overall AEAGCRN model building through a coder-decoder framework; in step S1, an encoder state h is initialized by inputting a historical time sequence into an encoder during training 1 The decoder state is initialized for the full 0 matrix by using the encoder final state, and the prediction layer of the decoder is formed by using a separation layer with time step specificity and a projection layer with parameter sharing in the decoder time step.
Step S2: use the AEAGRU module as the main network module of the encoder-decoder architecture, i.e., take the GRU as the backbone and replace the MLP operation in the GRU with the attention-enhanced adaptive graph convolution module (AEAGCN) for joint modeling of spatio-temporal correlation. In step S2, the AEAGCN module obtained in step S1 is embedded into the GRU in place of the MLP operation, finally yielding the attention-enhanced adaptive graph convolution recurrent network;
z_t = σ(AEAGCN([X_{:,t}, h_{t-1}]))
r_t = σ(AEAGCN([X_{:,t}, h_{t-1}]))
the specific operation is as follows:
Firstly, the current time input and the previous hidden state are concatenated and fed as input into the AEAGCN module; a nonlinear transformation through the sigmoid activation function (σ denotes sigmoid) yields the update gate and the reset gate respectively; finally, the gating operations on the two gates produce the hidden state and the output of the subsequent layer:

h̃_t = tanh(AEAGCN([X_{:,t}, r_t ⊙ h_{t-1}]))
h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t

where tanh is a nonlinear activation function mapping to the (−1, 1) interval; X_{:,t} and h_t are the input and output at time step t; [·] denotes matrix concatenation and ⊙ the Hadamard product; z_t and r_t are the update gate and the reset gate respectively; and h̃_t is the candidate hidden state computed with the reset gate.
And step S3: spatial dynamics and node specificity are modeled using an attention-enhancing adaptive graph convolution module (AEAGCN). The method specifically comprises the following steps:
First, two adaptive, globally shared node parameter matrices E_1, E_2 (E_1, E_2 ∈ R^{N×D}) are used to model the spatial correlation of the global time domain, where D is the feature dimension of the parameter matrices; E_1 and E_2 represent the inbound and outbound passenger-flow attributes of all stations respectively. By training the parameter matrices E_1, E_2, a long-time-scale global adaptive spatial correlation matrix A_static can be constructed to learn the hidden associations between nodes, with the specific formula:

A_static = softmax(ReLU(E_1 E_2^⊤))

where softmax denotes row-wise nonlinear normalization and ReLU the nonlinear activation function; E_1 and E_2 are trained on the full training set to give two globally shared association matrices, so A_static, obtained through matrix operations and nonlinear transformation, is itself adaptive;
Secondly, a dynamic graph structure based on the attention mechanism captures from recent historical data the short-term dynamic changes of spatial correlation, with the specific formula:

A_att^l = softmax((H^l W_1^l)(H^l W_2^l)^⊤ / √d)

where A_att^l denotes the short-term spatial correlation matrix; l indexes the parameters of the l-th graph convolution layer; W_1^l and W_2^l are linear transformation weight parameter matrices; and the products are matrix multiplications. Combining the two matrices lets the model effectively capture the characteristics of dynamic spatial correlation at different time scales, modeling long-term static and real-time dynamic spatial correlation simultaneously:

Ã^l = A_static + A_att^l

where l distinguishes the layer-specific parameters from the globally shared ones, and Ã^l denotes the spatial correlation adaptive matrix combining globally shared parameters and local spatial features;
Finally, following the ordinary graph convolution model, a node channel-domain feature matrix E_c ∈ R^{N×d} is introduced to characterize the individual features of different nodes, and the dynamic graph convolution model is defined by the specific formula:

H^{l+1} = Ã^l H^l E_c W_c + E_c b_c

where W_c ∈ R^{d×C_in×C_out} denotes a trainable weight pool and b_c ∈ R^{d×C_out} a bias pool, so that E_c W_c and E_c b_c generate node-specific channel transformations; H^l, the input of the l-th AEAGCN graph convolution layer, is also the output of the previous layer's graph convolution operation.
(IV) Model training. In this embodiment, experiments are conducted on two real data sets, Beijing Subway and Chongqing Subway. First, the model is constructed under the Seq2Seq framework, treating input and output as sequences, with the encoder and decoder handling variable-length inputs and outputs. In terms of training technique, this embodiment trains the model with the teacher-forcing method, in which the aforementioned attention-enhanced adaptive graph convolutional network module (AEAGCN) replaces the MLP module in a GRU, upgrading the GRU to AEAGRU as the backbone network. During the early training epochs, the output of the previous state is not used as the input of the next state; instead, the corresponding previous entry of the real target data (ground truth) of the training data is fed directly, i.e. the decoder learns a reference by taking real target data as input. As the training steps progress, the probability of adopting the real target data decreases at an exponential decay rate, and the model's own previous-state output Ŷ is used more and more often as the input of the next state.
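The exponentially decaying teacher-forcing schedule described above can be sketched as follows. The `decay ** step` form and the decay constant are illustrative assumptions; the text only states that the probability of using the real target data decreases exponentially:

```python
import random

def teacher_forcing_prob(global_step, decay=0.999):
    """Probability of feeding the decoder the ground-truth previous frame;
    decays exponentially with the global training step (decay = 0.999 is an
    illustrative choice, not specified in the text)."""
    return decay ** global_step

def next_decoder_input(global_step, ground_truth_prev, model_output_prev):
    """Scheduled sampling: early in training the real target data is used,
    later the model's own previous output is used more and more."""
    if random.random() < teacher_forcing_prob(global_step):
        return ground_truth_prev
    return model_output_prev
```

At step 0 the probability is 1, so the decoder always sees ground truth; as training proceeds the coin flip increasingly selects the model's own output.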
In terms of training parameters, all models are optimized with the Adam optimizer, and the learning rate is adjusted by cosine annealing with period T = 10. The maximum number of epochs is 150, the initial learning rate lr is 0.005, the minimum learning rate is 1e-6, and the batch size is 32.
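With those settings, the optimizer and scheduler configuration can be sketched in PyTorch as follows. The `Linear` module is a stand-in for the actual AEAGCRN network, and the training-pass body is elided:

```python
import torch

# stand-in network; in the embodiment this would be the full AEAGCRN model
model = torch.nn.Linear(64, 64)

optimizer = torch.optim.Adam(model.parameters(), lr=0.005)   # initial lr 0.005
# cosine annealing with period T = 10 and learning-rate floor 1e-6
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=10, eta_min=1e-6)

MAX_EPOCHS, BATCH_SIZE = 150, 32
for epoch in range(MAX_EPOCHS):
    # ... one pass over the training set with batches of BATCH_SIZE ...
    scheduler.step()
```

`CosineAnnealingLR` oscillates the learning rate between 0.005 and 1e-6 with a half-period of `T_max` epochs.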
In terms of model parameters, first, this embodiment stacks the same number of AEAGRU layers (layers = 2) as the main networks of the encoder and decoder respectively. Second, for the encoder, an all-zero tensor h_1 ∈ R^{B×C_in×C_out} is used as the initial state, where B denotes the batch size and C_in, C_out denote the input and output channel dimensions, with C_in = 64 and C_out = 64. Finally, the dimension d of the AEAGCN adaptive parameter matrix E is uniformly set to 32 for all nodes.
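The zero-state initialization above can be sketched as follows; the shapes come from the text, while the variable names are illustrative:

```python
import torch

B, C_IN, C_OUT, NUM_LAYERS = 32, 64, 64, 2

# all-zero initial encoder state h_1 ∈ R^{B × C_in × C_out}, as in the text
h1 = torch.zeros(B, C_IN, C_OUT)

# one initial state per layer of the 2-layer AEAGRU stack (layers = 2)
init_states = [h1.clone() for _ in range(NUM_LAYERS)]
```

The decoder would then be seeded with the encoder's final states rather than zeros, per the description of the Seq2Seq framework.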
To illustrate the effectiveness of the invention, a series of experiments are carried out, comparing against baseline models widely used in traffic flow prediction as well as current state-of-the-art models. The baselines are:
ARIMA: auto-regressive integrated moving average model, implemented with Kalman filtering.
FC-LSTM: long short-term memory network; a recurrent neural network with fully connected hidden units.
DCRNN: diffusion convolutional recurrent neural network; combines graph convolution with a recurrent neural network in an encoder-decoder framework.
STGCN: spatio-temporal graph convolutional network; models spatio-temporal dependence by combining graph convolution with one-dimensional convolutional neural networks.
AGCRN: adaptive graph convolutional recurrent network; combines adaptive graph convolution with a recurrent neural network.
GWN: Graph WaveNet; uses only adaptive graph convolution for spatial modeling, and models temporal correlation through one-dimensional dilated convolutions and a gating mechanism.
How well a time-series prediction algorithm performs must be judged by sound evaluation metrics. In this embodiment, the common MAE, RMSE and MAPE are selected for evaluation:
MAE = (1/n) · Σ_{i=1}^{n} |X_i − X̂_i|

RMSE = sqrt( (1/n) · Σ_{i=1}^{n} (X_i − X̂_i)² )

MAPE = (100%/n) · Σ_{i=1}^{n} |(X_i − X̂_i) / X_i|, computed only over samples with X_i > δ
In MAPE, samples whose 10-minute inbound or outbound station flow X_i is no greater than δ (δ = 1 is chosen here) are excluded, which reduces the disturbance that low-flow data cause to the MAPE metric.
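A minimal sketch of the three metrics, including the low-flow mask applied to MAPE (function names are illustrative):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean squared error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat, delta=1.0):
    """MAPE in percent, with the low-flow mask described above: 10-minute
    counts X_i <= delta (delta = 1 here) are excluded so near-zero flows
    do not blow up the percentage error."""
    mask = y > delta
    return np.mean(np.abs((y[mask] - y_hat[mask]) / y[mask])) * 100
```

For example, a ground-truth flow of 0.5 passengers in a 10-minute slice is dropped from the MAPE average rather than contributing a huge relative error.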
The experiments are validated on the subway data set BJ_Metro of one city and the subway data set CQ_Metro of a municipality; the results are shown in Tables 1 and 2. Prediction horizons of 1, 3 and 6 steps are compared, corresponding to predicting 10, 30 and 60 minutes into the future, and the tables report each model's results for inbound and outbound station flow on the test set. AEAGCN-64 and AEAGCN-32 denote the model with its hidden-unit dimension set to 64 and 32 respectively. Bold indicates the best performance on the data set; underlining indicates the second-best.
Table 1: experimental results on CQ _ Metro dataset
(table rendered as an image in the original publication)
Table 2: experimental results on BJ _ Metro dataset
(table rendered as an image in the original publication)
The final results show that on the BJ_Metro data set all metrics are superior to existing methods, and on the CQ_Metro data set some metrics exceed the latest existing methods, achieving a near-optimal effect. AEAGCN-64 and AEAGCN-32 denote hidden-unit dimensions of 64 and 32; the results show that the model's learning capability improves substantially as the hidden dimension grows, but subsequent experiments also found that the ratio of benefit to space cost from enlarging the hidden dimension diminishes gradually, making 64 the more appropriate parameter value. The experiments use a hardware platform with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz and a 2080Ti GPU; the software platform is built with the PyTorch training framework in Python on Windows 10.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable connection, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, or the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media includes instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein.
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A rail transit passenger flow prediction model of an adaptive graph convolution recurrent neural network combined with an attention mechanism is characterized in that: the model is built by the following steps:
step S1: respectively learning long-term and short-term dynamic spatial correlation in rail transit passenger flow by using an adaptive graph convolution network and an attention mechanism, carrying out long-term and short-term spatial correlation modeling by fusing the long-term and short-term dynamic spatial correlation and the short-term dynamic spatial correlation, and carrying out integral AEAGCRN model building by using a coder-decoder framework;
step S2: an AEAGRU module is used as a main network module of an encoder-decoder framework, namely a GRU is used as a main body framework, MLP operation in the GRU is replaced by an attention-enhanced adaptive graph convolution module (AEAGCN) for carrying out joint modeling on spatio-temporal correlation;
and step S3: spatial dynamics and node specificity are modeled using an attention-enhancing adaptive graph convolution module (AEAGCN).
2. The rail transit passenger flow prediction model of the adaptive graph convolution recurrent neural network in combination with an attention mechanism of claim 1, characterized in that: in step S1, during training, the historical time sequence is input into the encoder, the encoder state h1 is initialized to an all-zero matrix, the decoder state is initialized with the encoder's final state, and the prediction layer of the decoder is composed of a time-step-specific separation layer and a projection layer whose parameters are shared across decoder time steps.
3. The rail transit passenger flow prediction model of the adaptive graph convolution recurrent neural network in combination with an attention mechanism of claim 2, characterized in that: in step S2, embedding the attention-enhanced graph convolution (AEAGCN) module obtained in step S1 into a GRU to replace MLP operation, and finally obtaining an attention-enhanced-based adaptive graph convolution loop network;
z t =σ(AEAGCN([X :,t ,h t-1 ]))
r t =σ(AEAGCN([X :,t ,h t-1 ]))。
4. the rail transit passenger flow prediction model of the adaptive graph convolution recurrent neural network in combination with an attention mechanism of claim 3, characterized in that: the specific operation of the step S2 is:
firstly, splicing the current time input and the current state, and then putting the current time input and the current state as input into an AEAGCN module; nonlinear transformation is carried out through a sigmoid activation function, wherein sigma refers to the sigmoid activation function, and the inputs of an update gate and a reset gate are respectively obtained; finally, performing gating operation on the two to obtain a hidden state and subsequent layer output:
h̃_t = tanh(AEAGCN([X_{:,t}, r_t ⊙ h_{t−1}]))

h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ h̃_t

where tanh is a nonlinear activation function mapped to the (−1, 1) interval; X_{:,t} and h_t are the input and output at the t-th time step respectively; [·] denotes the matrix concatenation operation and ⊙ the Hadamard product; z_t and r_t are the update gate and reset gate respectively; and h̃_t is the candidate hidden state computed with the reset gate.
5. The rail transit passenger flow prediction model of the adaptive graph convolution recurrent neural network in combination with an attention mechanism of any one of claims 1 to 4, characterized in that: in step S3, first, two adaptive globally shared node parameter matrices E_1, E_2 (E_1 ∈ R^{N×D}) are used to model the spatial correlation of the global time domain, where D is the feature dimension of the parameter matrices; E_1 and E_2 represent the attributes of inbound and outbound passenger flow respectively; from the trainable parameter matrices E_1, E_2, a long-time-scale globally adaptive spatial correlation matrix A_static is constructed to learn the hidden association relationships between nodes, with the specific formula:

A_static = softmax(ReLU(E_1 · E_2^T))

where softmax denotes a nonlinear normalization and ReLU a nonlinear activation function; E_1, E_2 are trained over the full training-set data to obtain two globally parameter-shared association matrices, so the matrix A_static obtained through this matrix operation and nonlinear transformation is adaptive;
secondly, a dynamic graph structure based on an attention mechanism is used to capture, from the recent historical data, the dynamic changes of spatial correlation in the near term, with the specific formula:

A_dyn^l = softmax((H^l W_1^l) × (H^l W_2^l)^T)

where A_dyn^l ∈ R^{N×N} represents the short-term spatial correlation matrix; l indexes the parameters of the l-th layer graph convolution operation; W_1^l, W_2^l are linear transformation weight parameter matrices; and the × operator denotes matrix multiplication; by combining the two, the model can effectively capture the characteristics of dynamic spatial correlation at different time scales, modeling both long-term static and real-time dynamic spatial correlation, with the formula:

Ã^l = A_static + A_dyn^l
where l indexes the parameters of the l-th layer graph convolution operation, distinguishing them from the globally shared parameters, and Ã^l denotes the spatial correlation adaptive matrix combining global parameter sharing with local spatial features;
finally, referring to the ordinary graph convolution model, a node channel-domain feature matrix E_c ∈ R^{N×d} is introduced to characterize the distinct features of different nodes, with the specific formula:

H^{l+1} = Ã^l H^l (E_c W_c) + b_c

where W_c ∈ R^{d×C_in×C_out} is a trainable weight matrix and b_c ∈ R^{C_in×C_out} is a bias term; H^l, the input of the l-th AEAGCN graph convolution layer, is also the output of the previous layer's graph convolution operation.
6. A method for rail transit passenger flow prediction in conjunction with an adaptive graph convolution cyclic neural network of the attention mechanism, using a prediction model according to any one of claims 1 to 5, characterized by:
firstly, a rail transit network graph G = (V, E, A) is defined, where V denotes the set of rail stations, V = {v_1, ..., v_N}, with |V| = N the number of rail stations and of graph nodes; E denotes the edges connecting pairs of rail stations in the graph; and A denotes the adjacency matrix of the time-series passenger-flow nodes (measured by node distance or time-series similarity); the rail time-series passenger-flow graph signal tensor can be expressed as X = {X_{:,1}, X_{:,2}, ..., X_{:,t}}, X ∈ R^{N×T×C}, where X_{:,t} = {X_{1,t}, ..., X_{i,t}, ..., X_{N,t}}, X_{:,t} ∈ R^{N×C}, X_{N,t} ∈ R^{1×C}; X_{i,t} represents the passenger-flow data of the i-th node at time t, and C represents the channel dimension, C = 2, for the passenger flow; given the passenger-flow tensor data X on the traffic network G, the multi-time-step passenger-flow prediction problem can be defined as solving a mapping function f_θ that maps the passenger-flow data of the past p time slices to the passenger-flow data of the future Δ time slices, expressed by the following formula:
(X_{:,t+1}, X_{:,t+2}, …, X_{:,t+Δ}) = f_θ(X_{:,t−p+1}, X_{:,t−p+2}, …, X_{:,t}; G);
then data preparation is carried out: the card-swiping data are cleaned and anomalies repaired, the time-slice size is set to S minutes, and the inbound and outbound passenger-flow volumes are counted over every S-minute interval as one group of data, giving the graph signal tensor X ∈ R^{B×N×C}, where B is the sequence length ordered in time series, N is the number of rail stations, and C is the node attribute dimension; finally, according to the sequence window sizes required for input and output, a sliding-window operation is applied to the rail passenger-flow time-series data to produce the training and test data sets X_t ∈ R^{N×T×C};
Completing the construction of a prediction model according to any one of claims 1 to 5, carrying out simulation training, and carrying out rail transit passenger flow prediction by using the model.
7. The method of claim 6 for predicting rail transit passenger flow in conjunction with an adaptive graph convolution recurrent neural network with an attention mechanism, characterized in that: the time slice size is set to 10 minutes, i.e. S = 10.
8. The method for predicting the passenger flow in rail transit of the adaptive graph-convolution recurrent neural network in combination with the attention mechanism as recited in claim 6, wherein: the simulation training trains the models using the teacher-forcing method; in terms of training parameters, all models are optimized with the Adam optimizer and the learning rate is adjusted using a cosine annealing method; in terms of model parameters, the same number of stacked AEAGRU layers is used as the main network of the encoder and the decoder respectively.
CN202211069744.XA 2022-09-02 2022-09-02 Rail transit passenger flow prediction model and method of adaptive graph convolution recurrent neural network combined with attention mechanism Pending CN115423189A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211069744.XA CN115423189A (en) 2022-09-02 2022-09-02 Rail transit passenger flow prediction model and method of adaptive graph convolution recurrent neural network combined with attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211069744.XA CN115423189A (en) 2022-09-02 2022-09-02 Rail transit passenger flow prediction model and method of adaptive graph convolution recurrent neural network combined with attention mechanism

Publications (1)

Publication Number Publication Date
CN115423189A true CN115423189A (en) 2022-12-02

Family

ID=84202330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211069744.XA Pending CN115423189A (en) 2022-09-02 2022-09-02 Rail transit passenger flow prediction model and method of adaptive graph convolution recurrent neural network combined with attention mechanism

Country Status (1)

Country Link
CN (1) CN115423189A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116128122A (en) * 2023-01-03 2023-05-16 北京交通大学 Urban rail transit short-time passenger flow prediction method considering burst factors
CN116128122B (en) * 2023-01-03 2023-09-12 北京交通大学 Urban rail transit short-time passenger flow prediction method considering burst factors
CN116432868A (en) * 2023-06-12 2023-07-14 深圳大学 Subway passenger flow prediction method and device based on node query set and storage medium
CN116432868B (en) * 2023-06-12 2023-09-19 深圳大学 Subway passenger flow prediction method and device based on node query set and storage medium
CN117475638A (en) * 2023-12-26 2024-01-30 北京建筑大学 Traffic OD passenger flow prediction method and system based on multichannel hypergraph convolutional network
CN117475638B (en) * 2023-12-26 2024-03-08 北京建筑大学 Traffic OD passenger flow prediction method and system based on multichannel hypergraph convolutional network

Similar Documents

Publication Publication Date Title
CN109697852B (en) Urban road congestion degree prediction method based on time sequence traffic events
Wan et al. CTS-LSTM: LSTM-based neural networks for correlated time series prediction
CN109887282B (en) Road network traffic flow prediction method based on hierarchical timing diagram convolutional network
CN115423189A (en) Rail transit passenger flow prediction model and method of adaptive graph convolution recurrent neural network combined with attention mechanism
CN111210633B (en) Short-term traffic flow prediction method based on deep learning
CN112489426B (en) Urban traffic flow space-time prediction scheme based on graph convolution neural network
CN111242292B (en) OD data prediction method and system based on deep space-time network
CN114692984B (en) Traffic prediction method based on multi-step coupling graph convolution network
CN111242395B (en) Method and device for constructing prediction model for OD (origin-destination) data
Liu et al. Smart environment design planning for smart city based on deep learning
Liu et al. Fedgru: Privacy-preserving traffic flow prediction via federated learning
CN111652425A (en) River water quality prediction method based on rough set and long and short term memory network
CN110991607B (en) Subway passenger flow prediction method and device, electronic equipment and storage medium
CN113112791A (en) Traffic flow prediction method based on sliding window long-and-short term memory network
Xu et al. AGNP: Network-wide short-term probabilistic traffic speed prediction and imputation
CN117116048A (en) Knowledge-driven traffic prediction method based on knowledge representation model and graph neural network
CN118297775A (en) Urban planning management and control system based on digital twin technology
Xu et al. A taxi dispatch system based on prediction of demand and destination
Wu et al. Learning spatial–temporal pairwise and high-order relationships for short-term passenger flow prediction in urban rail transit
Miao et al. A queue hybrid neural network with weather weighted factor for traffic flow prediction
Yang et al. MGSTCN: a multi-graph spatio-temporal convolutional network for metro passenger flow prediction
Ye et al. Demand forecasting of online car‐hailing by exhaustively capturing the temporal dependency with TCN and Attention approaches
Yang et al. Short‐Term Forecasting of Dockless Bike‐Sharing Demand with the Built Environment and Weather
Jiao et al. Multi-step traffic flow prediction method based on the Conv1D+ LSTM
Mou et al. Short-term traffic flow prediction: A long short-term memory model enhanced by temporal information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination