CN116403397A - Traffic prediction method based on deep learning - Google Patents
- Publication number
- CN116403397A (application number CN202211651167.5A)
- Authority
- CN
- China
- Prior art keywords
- time
- attention
- code
- spatial
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0125—Traffic data processing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a traffic prediction method based on deep learning, which comprises the following steps: acquiring a historical characterization representing the traffic state information and spatiotemporal information of a first number of historical time steps, and a future characterization representing the spatiotemporal information of a second number of future time steps; processing the historical characterization with a first BERT model to obtain a first state code; adding the first state code to the future characterization to obtain a predictive characterization; and processing the predictive characterization with a second BERT model to obtain a predicted traffic state. In this way, the method can effectively capture the spatio-temporal dependencies hidden in traffic data and improve the accuracy of long-term prediction.
Description
Technical Field
The invention belongs to the technical field of intelligent traffic, and particularly relates to a traffic prediction method based on deep learning.
Background
With accelerating urbanization and rapid economic development, urban populations and the number of motor vehicles continue to grow. To maximize the efficiency of urban operation, cities are deploying intelligent traffic systems, in which traffic prediction plays an important role. Accurate prediction results can effectively relieve urban traffic congestion and provide a more meaningful basis for traffic management decisions. Traffic prediction faces two main challenges: temporal dependence and spatial dependence. Temporal dependence means that the current traffic state is affected by previous traffic states; it exhibits proximity, periodicity, and trend characteristics. Spatial dependence refers to the effect of the surrounding environment on the traffic state of a region. Different neighboring regions exert different influences; generally, the closer the distance, the greater the impact. Temporal and spatial dependencies are always interleaved, resulting in even more complex correlations.
Following the great success of deep learning methods in fields such as computer vision and natural language processing, many researchers have attempted to introduce deep learning methods into traffic prediction. Convolutional neural networks (Convolutional Neural Networks, CNN) and graph neural networks (Graph Neural Network, GNN) are used to learn the spatial correlations hidden in traffic data with grid structure and graph structure, respectively. The recurrent neural network (Recurrent Neural Network, RNN) is instructive for modeling temporal correlation. The RNN variants, the long short-term memory (LSTM) model and the gated recurrent unit (GRU), can be applied to predict short-term traffic flow, as they solve the gradient explosion and gradient vanishing problems of the conventional RNN model.
The traditional RNN model still has a disadvantage in capturing temporal dependence. In traffic prediction, the traffic state of the current time period may be affected by the traffic state long before. However, conventional RNN models have difficulty remembering traffic states from the distant past; that is, a long-term dependency problem exists. In addition, existing machine learning methods can only model dependencies in time and cannot capture dependencies in space.
Disclosure of Invention
The invention provides a traffic prediction method based on deep learning, which aims to solve the problem of low accuracy in existing long-term traffic state prediction.
In order to solve the technical problems, the invention provides a traffic prediction method based on deep learning, which comprises the following steps: acquiring historical representations of traffic state information and spatiotemporal information representing a first number of historical time steps and future representations of spatiotemporal information representing a second number of future time steps; processing the history characterization by using a first BERT model to obtain a first state code; adding the first state code to the future representation to obtain a predictive representation; and processing the predictive representation by using a second BERT model to obtain a predicted traffic state.
Optionally, the applying a first BERT model to process the history characterization to obtain a first state code includes: performing time attention calculation on the history characterization to acquire a first time attention code; performing spatial attention calculation on the first time attention code to acquire a first spatial attention code; and carrying out layer normalization processing on the first space attention code to obtain a first state code.
Optionally, the performing a time attention calculation on the history characterization to obtain a first time attention code includes: decomposing the history characterization into time steps and node granularity, and calculating a time input vector of the time attention of the current layer of any time step of any node in the first BERT model according to the history characterization, wherein the time input vector comprises a time query vector, a time key vector and a time value vector; calculating a first time attention weight of the current layer by applying an activation function according to the time query vector and the time key vector; and carrying out weighted summation on the first time attention weight of the current layer and the time value vector, and carrying out residual connection with the first time attention code of the previous layer to obtain the first time attention code of the current layer.
Optionally, the performing spatial attention calculation on the first temporal attention code to obtain a first spatial attention code includes: calculating a spatial input vector of the spatial attention of the current layer of any time step of any node in the first BERT model according to the first spatial attention code, wherein the spatial input vector comprises a spatial query vector, a spatial key vector and a spatial value vector; calculating a first spatial attention weight of the current layer by applying an activation function according to the spatial query vector and the spatial key vector; and carrying out weighted summation on the first spatial attention weight of the current layer and the spatial value vector, and carrying out residual connection with the first time attention code of the current layer to obtain the first spatial attention code of the current layer.
Optionally, the performing layer normalization processing on the first spatial attention code to obtain a first state code includes: processing the first spatial attention code using a feed forward network; and superposing the first space attention code with the output of the feedforward network to obtain a first state code.
Optionally, the adding the first state code to the future representation to obtain a predicted representation includes: if the first number is less than the second number, performing random number filling on the first state code of the first number of time steps, and expanding to the second number of time steps; if the first number is greater than the second number, zero padding the future representation of the second number of time steps to the first number of time steps.
Optionally, the applying a second BERT model to process the predictive representation to obtain a predicted traffic state includes: performing time attention calculation on the predictive representation to obtain a second time attention code; and performing spatial attention calculation on the second time attention code, obtaining a second spatial attention code, performing layer normalization processing on the second spatial attention code, and obtaining a second state code, wherein the second state code is the predicted traffic state.
Optionally, the performing a temporal attention calculation on the predictive representation, and acquiring the second temporal attention code includes: decomposing the predictive representation into time steps and node granularity, and calculating a time input vector of the time attention of the current layer of any time step of any node in the second BERT model according to the predictive representation, wherein the time input vector comprises a time query vector, a time key vector and a time value vector; calculating a second time attention weight of the current layer by applying an activation function according to the time query vector and the time key vector; and carrying out weighted summation on the second time attention weight of the current layer and the time value vector, and carrying out residual connection with the second time attention code of the previous layer to obtain the second time attention code of the current layer.
Based on the same inventive concept, the embodiment of the invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the method according to any one of the previous claims.
Based on the same inventive concept, the embodiment of the invention also provides a computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to execute the method of any one of the foregoing claims.
From the above, the present invention provides a traffic prediction method based on deep learning, comprising: acquiring historical representations of traffic state information and spatiotemporal information representing a first number of historical time steps and future representations of spatiotemporal information representing a second number of future time steps; processing the history characterization by using a first BERT model to obtain a first state code; adding the first state code to the future representation to obtain a predictive representation; and processing the prediction characterization by using a second BERT model to obtain a predicted traffic state, so that the hidden space-time dependence in traffic data can be effectively captured, and the accuracy of long-term prediction is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a traffic prediction method based on deep learning in an embodiment of the invention;
FIG. 2 is a schematic diagram of the input representation in a deep learning-based traffic prediction method in an embodiment of the present invention;
FIG. 3 is a schematic diagram of separation spatiotemporal attention in an embodiment of the invention;
FIG. 4 is a flowchart of a method for obtaining a first state code according to an embodiment of the present invention;
FIG. 5 is a schematic view of traffic prediction based on deep learning in an embodiment of the invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.
It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present invention should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in embodiments of the present invention, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
The embodiment of the invention provides a traffic prediction method based on deep learning, as shown in fig. 1, the traffic prediction method based on deep learning comprises the following steps:
step S1: a historical representation of traffic state information and spatiotemporal information characterizing a first number P of historical time steps is obtained, and a future representation of spatiotemporal information characterizing a second number Q of future time steps is obtained.
Bidirectional Encoder Representations from Transformers (BERT) is a milestone model in natural language processing. It includes an attention mechanism that reduces the distance between any two time steps in the time window to 1, effectively solving the long-term dependency problem. Furthermore, BERT is a pre-trained model that can be equipped with different lightweight output heads for different tasks, without the need to design a custom model for each specific task. Accordingly, BERT is expected to become a generic model of traffic states usable for a number of downstream tasks, such as traffic state classification and traffic state clustering. Considering the shortcomings of traditional methods in capturing spatial and temporal dependencies, and the advantages of the BERT model, the embodiment of the invention modifies the BERT model into a model suitable for traffic prediction scenarios, called TPBERT for short.
Before step S1, the road network is represented as a directed graph G = (V, E, A), where V is the set of nodes, E is the set of edges between nodes, and A is the adjacency matrix. In particular, N = |V| is the number of nodes, and A_ij ∈ A represents the physical distance between v_i and v_j. The traffic state of all vertices on the directed graph G at time step t is represented by a vector X_t ∈ R^(N×C), where C is the number of traffic state observations. Based on the directed graph G and the traffic state data observed over P historical time steps, the traffic prediction task can be expressed as learning a function f to predict the traffic states of the future Q time steps: f(G, (X_(t-P+1), ..., X_t)) = (Y_(t+1), ..., Y_(t+Q)).
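As a concrete illustration of this formulation, the following minimal numpy sketch sets up a toy road network G = (V, E, A) and the shape contract of the function f. The sizes, the distance values, and the naive last-value predictor are illustrative stand-ins only, not the invention's model.

```python
import numpy as np

# Hypothetical toy road network: N = 4 sensor nodes, C = 1 observation
# (e.g. speed) per node. A[i, j] holds the physical distance between nodes.
N, C = 4, 1
A = np.array([[0.0, 1.2, 0.0, 3.5],
              [1.2, 0.0, 2.1, 0.0],
              [0.0, 2.1, 0.0, 1.8],
              [3.5, 0.0, 1.8, 0.0]])  # adjacency / distance matrix

P, Q = 12, 12  # historical and future time steps

def predict(history: np.ndarray) -> np.ndarray:
    """Placeholder for the learned function f: maps a (P, N, C) history
    to (Q, N, C) future states. Here: naive last-value repetition."""
    assert history.shape == (P, N, C)
    return np.repeat(history[-1:], Q, axis=0)

history = np.random.rand(P, N, C)
future = predict(history)
```

The real f is the TPBERT model described below; only the input/output shapes carry over.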
since traffic prediction is subject to time and space dependencies, it is important to encode and incorporate time and space information into the model. In addition, in consideration of the positional relationship between the historical time step and the future time step, traffic data and time information, space information and position information thereof need to be encoded to obtain traffic state embedded information, time embedded information, space embedded information and position embedded information. And all embedded sizes are set to D. The description of each information is as follows:
Traffic state embedding: the original traffic state observations at time step t are represented as X_t ∈ R^(N×C). To keep the dimension consistent with the other embeddings of size D, X_t is passed through a fully connected network to obtain the final representation E^S_t ∈ R^(N×D).
Time embedding: periodicity is an important feature of temporal dependence in traffic prediction. The time embedding mainly comprises daily periodicity and weekly periodicity: daily periodicity means that traffic states are more similar at the same time of day, and weekly periodicity means that traffic states on the same day of the week follow the same pattern. For example, with seven days in a week, 7 different embedding vectors are needed to represent the weekly periodicity. The representation of the daily periodicity depends on the time interval of data collection. Assuming a time interval of 5 minutes, there are 24 × 60 / 5 = 288 time steps in a day, so 288 different embedding vectors are used to represent the daily periodicity. The daily-periodicity and weekly-periodicity embeddings of the present invention are randomly initialized. The time embedding E^T_t ∈ R^D is obtained by adding the daily-periodicity and weekly-periodicity embeddings, and can be updated continuously during training.
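The daily/weekly lookup described above can be sketched as follows. The table names, the embedding size D, and the random initialization scheme are illustrative assumptions; in the invention these tables would be learned parameters.

```python
import numpy as np

D = 8  # embedding size (the description sets all embedding sizes to D)
STEPS_PER_DAY = 24 * 60 // 5   # 5-minute interval -> 288 steps per day
rng = np.random.default_rng(0)

# Randomly initialized embedding tables, updated during training.
daily_table = rng.normal(size=(STEPS_PER_DAY, D))   # 288 daily slots
weekly_table = rng.normal(size=(7, D))              # 7 day-of-week slots

def time_embedding(step_of_day: int, day_of_week: int) -> np.ndarray:
    """Time embedding = daily-periodicity + weekly-periodicity vectors."""
    return daily_table[step_of_day] + weekly_table[day_of_week]

e = time_embedding(step_of_day=100, day_of_week=3)
```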
Spatial embedding: BERT can encode the elements in a sequence and the relationships between elements, but cannot model spatial dependencies. To solve this problem, spatial embedding is proposed based on graph embedding, which retains the key information of a node in a vector. The node representations are learned with a node embedding algorithm based on biased random walks, in which the hyper-parameters p and q control the walking strategy. All node representation vectors are pre-trained to serve as the spatial embedding, expressed as E^G ∈ R^(N×D).
Position embedding information: there are two options for position coding, absolute position coding and relative position coding. The relative position codes are chosen in the embodiments of the invention because absolute position codes require known positions in the whole time sequence, whereas relative position codes are not. For consecutive P historical time steps and Q future time steps, their relative positions may be encoded by different embeddings of P+Q. As with the time embedded information, the location embedded informationIs also randomly initialized and may be updated in training.
As shown in fig. 2, the historical characterization comprises the historical traffic state embedding, historical time embedding, historical space embedding and historical position embedding, and the future characterization comprises the future time embedding, future space embedding and future position embedding. That is, the historical characterization of historical time step t is E^h_t = E^S_t + E^T_t + E^G + E^L_t, and the future characterization of future time step t is E^f_t = E^T_t + E^G + E^L_t.
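A minimal sketch of how the four embeddings sum into the two characterizations. Shapes and variable names are illustrative, and broadcasting the shared time/position vectors over the N nodes is an assumption about the layout.

```python
import numpy as np

N, D = 4, 8  # nodes, embedding size
rng = np.random.default_rng(1)

state_emb = rng.normal(size=(N, D))    # E^S_t: traffic state (history only)
time_emb = rng.normal(size=(D,))       # E^T_t: daily + weekly periodicity
space_emb = rng.normal(size=(N, D))    # E^G:   pre-trained node embeddings
pos_emb = rng.normal(size=(D,))        # E^L_t: relative position embedding

# The history token carries the observed state; the future token omits it.
history_repr = state_emb + time_emb + space_emb + pos_emb   # (N, D)
future_repr = time_emb + space_emb + pos_emb                # (N, D)
```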
Step S2: and processing the history characterization by using a first BERT model to obtain a first state code.
To capture the temporal and spatial dependencies hidden in traffic state data, temporal attention and spatial attention are computed one after the other using a separated spatio-temporal attention mechanism. As shown in fig. 3, the input is first passed to the temporal attention to capture the temporal dependency, and then to the spatial attention to capture the spatial dependency, producing the final output. Note that attention is calculated for each node in the road network; that is, each node can act as a query. As shown in fig. 4, step S2 includes:
step S21: and performing time attention calculation on the history characterization to acquire a first time attention code.
Decompose the historical characterization by time step t and node granularity v, and calculate from the historical characterization the time input vector for the temporal attention of the current layer l, for any time step t of any node v in the first BERT model. The time input vector comprises a time query vector, a time key vector and a time value vector, computed (for attention head a) as linear projections of the layer-normalized output of the previous layer:

q_t^(l,a) = LN(h_t^(l-1)) W_Q^(l,a), k_t^(l,a) = LN(h_t^(l-1)) W_K^(l,a), u_t^(l,a) = LN(h_t^(l-1)) W_V^(l,a)

where LN denotes the layer normalization operation and a denotes the a-th attention head. Assuming the total number of attention heads is A, the dimension of each attention head is D_h = D / A.
Calculate the first temporal attention weight of the current layer l from the time query vector and the time key vector by applying an activation function:

α_(t,t')^(l,a) = SM( ⟨q_t^(l,a), k_(t')^(l,a)⟩ / √D_h )

where SM is an activation function; the preferred embodiment of the present invention uses the softmax activation function.
Form the weighted sum of the time value vectors with the first temporal attention weights of the current layer l, and make a residual connection with the first temporal attention code of the previous layer to obtain the first temporal attention code of the current layer l:

h_t^(l) = h_t^(l-1) + Σ_(t') α_(t,t')^(l,a) u_(t')^(l,a)
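The temporal attention step can be sketched for a single attention head (A = 1, so D_h = D) as follows. The projection matrices, sizes, and weight scaling are illustrative, and this is a simplified stand-in for the multi-head formulation above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

P, D = 6, 8  # time steps, model size (single head, D_h = D)
rng = np.random.default_rng(2)
W_q, W_k, W_v = (0.1 * rng.normal(size=(D, D)) for _ in range(3))

def temporal_attention(h: np.ndarray) -> np.ndarray:
    """One node's (P, D) sequence: attention across its own time steps,
    then a residual connection back to the previous layer's code."""
    x = layer_norm(h)
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    weights = softmax(q @ k.T / np.sqrt(D))   # (P, P) attention weights
    return h + weights @ v                    # residual connection

h = rng.normal(size=(P, D))
out = temporal_attention(h)
```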
step S22: encoding the first time attentionPerforming spatial attention calculation to obtain the first spatial attention code +.>
In an embodiment of the invention, temporal attention encodingIs the input to calculate the spatial attention. That is, the new spatial query vector, spatial key vector and spatial value vector are composed of +.>Obtained, still used hereinThe representation is not described in detail.
In step S22, calculate from the first temporal attention code the spatial input vector for the spatial attention of the current layer l, for any time step t of any node v in the first BERT model. The spatial input vector comprises a spatial query vector q_v^(l,a), a spatial key vector k_v^(l,a) and a spatial value vector u_v^(l,a); the calculation formulas are the same as in step S21 and are not repeated here.
Calculate the first spatial attention weight of the current layer l from the spatial query vector and the spatial key vector by applying an activation function:

β_(v,v')^(l,a) = SM( ⟨q_v^(l,a), k_(v')^(l,a)⟩ / √D_h )
Form the weighted sum of the spatial value vectors with the first spatial attention weights of the current layer l, and make a residual connection with the first temporal attention code of the current layer l to obtain the first spatial attention code of the current layer l:

s_v^(l) = h_v^(l) + Σ_(v') β_(v,v')^(l,a) u_(v')^(l,a)
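Spatial attention follows the same pattern, except that the (N, N) weights are computed across nodes at one time step, and the residual is taken against the temporal attention code of the current layer. Again a single-head, illustrative sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

N, D = 5, 8  # nodes, model size (single head, so D_h = D)
rng = np.random.default_rng(6)
W_q, W_k, W_v = (0.1 * rng.normal(size=(D, D)) for _ in range(3))

def spatial_attention(t_code: np.ndarray) -> np.ndarray:
    """One time step's (N, D) temporal attention codes: every node
    attends over all nodes; residual back to the temporal code."""
    x = layer_norm(t_code)
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    weights = softmax(q @ k.T / np.sqrt(D))  # (N, N) node-to-node weights
    return t_code + weights @ v              # residual connection

t_code = rng.normal(size=(N, D))
s_code = spatial_attention(t_code)
```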
step S23: encoding the first spatial attentionPerforming layer normalization to obtain a first state code +.>
Optionally, the first spatial attention is first encoded using a feed forward networkAnd (5) processing. Then the first spatial attention code is superimposed with the output of the feed forward network to obtain a first state code +.>The calculation formula is as follows:
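The feed-forward superposition of step S23 can be sketched as follows; the hidden width D_ff and the ReLU choice are illustrative assumptions not specified in the description.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

D, D_ff = 8, 32  # model size and hidden width (D_ff is illustrative)
rng = np.random.default_rng(3)
W1, b1 = 0.1 * rng.normal(size=(D, D_ff)), np.zeros(D_ff)
W2, b2 = 0.1 * rng.normal(size=(D_ff, D)), np.zeros(D)

def feed_forward(x):
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2  # two layers, ReLU

def first_state_code(s):
    """Superpose the spatial attention code with the feed-forward output."""
    return s + feed_forward(layer_norm(s))

s = rng.normal(size=(5, D))
out = first_state_code(s)
```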
step S3: and adding the first state code and the future representation to obtain a prediction representation.
Of the two inputs, the historical characterization and the future characterization, the time, space and position information of the future characterization is known in advance. Therefore, the future information must be incorporated into the model through this second input, completing the transition from history to future. The first state code has a first number P of time steps, and the future characterization has a second number Q of time steps. When P is not equal to Q, that is, when the dimensions of the future characterization and the first state code are inconsistent: if the first number is smaller than the second number, random number filling is performed on the first state code of the first number of time steps, expanding it to the second number of time steps; if the first number is greater than the second number, zero padding is performed on the future characterization of the second number of time steps, expanding it to the first number of time steps. In this way, the dimension of the historical characterization and the dimension of the future characterization are guaranteed to be consistent.
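A sketch of this dimension-alignment rule, under the stated assumption that padding happens along the time-step axis; the function name and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 3, 4

def align_steps(state_code, future_repr):
    """Match time-step dimensions: the (P, N, D) state code gets
    random-number padding, the (Q, N, D) future representation gets
    zero padding, whichever is shorter along axis 0."""
    P, Q = state_code.shape[0], future_repr.shape[0]
    if P < Q:
        pad = rng.standard_normal((Q - P,) + state_code.shape[1:])
        state_code = np.concatenate([state_code, pad], axis=0)
    elif P > Q:
        pad = np.zeros((P - Q,) + future_repr.shape[1:])
        future_repr = np.concatenate([future_repr, pad], axis=0)
    return state_code, future_repr

sc, fr = align_steps(rng.random((6, N, D)), rng.random((9, N, D)))
```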
Step S4: and processing the predictive representation by using a second BERT model to obtain a predicted traffic state.
The structure of the TPBERT model is shown in fig. 5; the entire TPBERT model is built up from 2L layers. The first L layers are the operation layers of the first BERT model, the last L layers are the operation layers of the second BERT model, and the second BERT model is identical in structure to the first BERT model. The front layers extract abstract information from the historical characterization, and the rear layers combine it with the future characterization to make the corresponding prediction. The historical characterization is denoted E^h and the future characterization is denoted E^f. E^h is fed into the front L layers, generating an output H_L. When P = Q, or once the dimension of the output H_L has been expanded to match that of the future characterization E^f, H_L and E^f are added to obtain E^f'. E^f' is fed into the rear L layers, producing an output E^p, i.e. the predictive characterization. To obtain the final prediction Y, E^p is input into a fully connected neural network.
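The end-to-end flow of this 2L-layer pipeline can be sketched as follows, with a trivial stand-in for the L encoder layers. Only the wiring (first stack → add future characterization → second stack → fully connected head) mirrors the description; every function body and size here is illustrative.

```python
import numpy as np

P, Q, N, D, C = 6, 6, 3, 8, 1   # P == Q, so dimensions already match
rng = np.random.default_rng(5)

def bert_stack(x):
    """Stand-in for L encoder layers (temporal + spatial attention + FFN)."""
    return x + 0.1 * np.tanh(x)

W_out = 0.1 * rng.normal(size=(D, C))  # fully connected output head

def tpbert(history_repr, future_repr):
    h = bert_stack(history_repr)   # front L layers: first state code H_L
    e = h + future_repr            # add the future characterization: E^f'
    p = bert_stack(e)              # rear L layers: predictive code E^p
    return p @ W_out               # final prediction Y

pred = tpbert(rng.normal(size=(P, N, D)), rng.normal(size=(Q, N, D)))
```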
In step S4, perform the temporal attention calculation on the predictive characterization to obtain a second temporal attention code; perform the spatial attention calculation on the second temporal attention code to obtain a second spatial attention code; and perform layer normalization processing on the second spatial attention code to obtain a second state code, wherein the second state code is the predicted traffic state. The analysis and calculation process is the same as in step S2, with the historical characterization data in the original formulas replaced by the predictive characterization data.
The following experiments were conducted on the deep learning-based traffic prediction method of the present embodiment. As shown in Table 1, the TPBERT model of this embodiment was evaluated on two common real-world data sets, METR-LA and PeMS-BAY. The time step of both data sets is 5 minutes, and short-, medium-, and long-term predictions correspond to 3, 6, and 12 steps, respectively. METR-LA and PeMS-BAY are traffic data sets of different scales; Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) are the three measures of model performance.
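For reference, the three evaluation metrics can be computed as follows (a standard formulation; the patent itself does not give code):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.
    Assumes y_true has no zero entries (traffic speeds/flows are positive)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```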
Table 1 test comparative results
HA, ARIMA, SVR, FNN, FC-LSTM, DCRNN, STGCN, MRA-BGCN, Graph WaveNet, STAWnet, MTGNN, and GMAN in Table 1 are other types of predictive models. HA is a predictive model that uses a weighted average of the historical time series as the prediction result; ARIMA with a Kalman filter is a statistical model for predicting and analyzing time series; SVR treats traffic prediction as a regression task and predicts with the help of a support vector machine; FNN is a prediction model consisting of two dense layers and L2 regularization; FC-LSTM is an encoder-decoder prediction model; DCRNN captures spatial and temporal correlations using bipartite-graph random walks and RNNs; STGCN is built on spatio-temporal convolution blocks and integrates graph convolution with gated temporal convolution; MRA-BGCN introduces bipartite graph convolution and a multi-range attention mechanism to integrate traffic information from different neighbors; Graph WaveNet learns long-sequence information using an adaptive dependency matrix and one-dimensional convolution; STAWnet uses self-learned node embeddings to represent potential spatial relationships; MTGNN is a multivariate time-series prediction model consisting of graph structure learning, graph convolution, and temporal convolution; GMAN is an encoder-decoder prediction model equipped with various attention mechanisms, such as spatial attention, temporal attention, and transform attention. The experimental results show that the deep learning-based traffic prediction method improves accuracy in traffic prediction. In short-term prediction, MRA-BGCN performs best on both data sets; in medium- and long-term prediction, TPBERT performs better than the other models on both data sets.
Comparing the data sets, the prediction errors on METR-LA are larger than those on PeMS-BAY, indicating that METR-LA traffic conditions are more complex than those of the BAY area. TPBERT nevertheless performs well on the more challenging METR-LA, indicating that TPBERT has significant modeling capability for complex traffic data.
The embodiment of the invention obtains a historical representation characterizing the traffic state information and spatio-temporal information of a first number of historical time steps, and a future representation characterizing the spatio-temporal information of a second number of future time steps; processes the historical representation with a first BERT model to obtain a first state code; adds the first state code to the future representation to obtain a predictive representation; and processes the predictive representation with a second BERT model to obtain the predicted traffic state. This improves the accuracy of long-term prediction and facilitates the capture of hidden spatio-temporal dependencies in traffic data.
Based on the same inventive concept, an embodiment of the invention also provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of the foregoing embodiments.
Fig. 6 shows a more specific hardware architecture of an electronic device according to this embodiment, where the device may include: a processor 601, a memory 602, an input/output interface 603, a communication interface 604, and a bus 605. Wherein the processor 601, the memory 602, the input/output interface 603 and the communication interface 604 are communicatively coupled to each other within the device via a bus 605.
The processor 601 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, for executing relevant programs to implement the technical solutions provided by the embodiments of the present invention.
The memory 602 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 602 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present invention are implemented in software or firmware, the relevant program code is stored in the memory 602 and invoked by the processor 601 for execution.
The input/output interface 603 is used for connecting with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The communication interface 604 is used to connect a communication module (not shown in the figure) to enable the present device to interact with other devices for communication. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
The bus 605 includes a path to transfer information between the various components of the device, such as the processor 601, memory 602, input/output interfaces 603, and communication interfaces 604.
It should be noted that although the above device only shows the processor 601, the memory 602, the input/output interface 603, the communication interface 604, and the bus 605, in the implementation, the device may further include other components necessary for realizing normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary for implementing the embodiments of the present invention, and not all the components shown in the drawings.
Based on the same inventive concept, the embodiments of the present invention also provide a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform the method of any one of the foregoing.
Those of ordinary skill in the art will appreciate that the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples. Technical features of the above embodiments, or of different embodiments, may also be combined under the idea of the present disclosure; the steps may be implemented in any order; and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity.
The present embodiments are intended to embrace all such alternatives, modifications, and variations which fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements, and the like which are within the spirit and principles of the embodiments of the invention are intended to be included within the scope of the present disclosure.
Claims (10)
1. A traffic prediction method based on deep learning, characterized by comprising the following steps:
acquiring historical characterization of traffic state information and spatiotemporal information characterizing a first number of historical time steps and future characterization of spatiotemporal information characterizing a second number of future time steps;
processing the history characterization by using a first BERT model to obtain a first state code;
adding the first state code to the future representation to obtain a predictive representation;
and processing the predictive representation by using a second BERT model to obtain a predicted traffic state.
2. The deep learning-based traffic prediction method according to claim 1, wherein the applying a first BERT model to process the history characterization to obtain a first state code includes:
performing time attention calculation on the history characterization to acquire a first time attention code;
performing spatial attention calculation on the first time attention code to acquire a first spatial attention code;
and carrying out layer normalization processing on the first space attention code to obtain a first state code.
3. The deep learning based traffic prediction method according to claim 2, wherein the performing a temporal attention calculation on the history characterization to obtain a first temporal attention code comprises:
decomposing the history characterization into time steps and node granularity, and calculating a time input vector of the time attention of the current layer of any time step of any node in the first BERT model according to the history characterization, wherein the time input vector comprises a time query vector, a time key vector and a time value vector;
calculating a first time attention weight of the current layer by applying an activation function according to the time query vector and the time key vector;
and carrying out weighted summation on the first time attention weight of the current layer and the time value vector, and carrying out residual connection with the first time attention code of the previous layer to obtain the first time attention code of the current layer.
4. The deep learning based traffic prediction method according to claim 2, wherein the performing spatial attention calculation on the first temporal attention code to obtain a first spatial attention code includes:
calculating a spatial input vector of the spatial attention of the current layer of any time step of any node in the first BERT model according to the first time attention code, wherein the spatial input vector comprises a spatial query vector, a spatial key vector and a spatial value vector;
calculating a first spatial attention weight of the current layer by applying an activation function according to the spatial query vector and the spatial key vector;
and carrying out weighted summation on the first spatial attention weight of the current layer and the spatial value vector, and carrying out residual connection with the first time attention code of the current layer to obtain the first spatial attention code of the current layer.
5. The traffic prediction method based on deep learning of claim 2, wherein the performing layer normalization processing on the first spatial attention code to obtain a first state code includes:
processing the first spatial attention code using a feed forward network;
and superposing the first space attention code with the output of the feedforward network to obtain a first state code.
6. The deep learning based traffic prediction method according to claim 1, wherein said adding the first state code to the future representation to obtain a predicted representation comprises:
if the first number is less than the second number, performing random number filling on the first state code of the first number of time steps, and expanding to the second number of time steps;
if the first number is greater than the second number, zero padding the future representation of the second number of time steps to the first number of time steps.
7. The deep learning-based traffic prediction method according to claim 1, wherein the applying the second BERT model to process the predicted representation to obtain the predicted traffic state includes:
performing time attention calculation on the predictive representation to obtain a second time attention code;
and performing spatial attention calculation on the second time attention code, obtaining a second spatial attention code, performing layer normalization processing on the second spatial attention code, and obtaining a second state code, wherein the second state code is the predicted traffic state.
8. The deep learning based traffic prediction method according to claim 7, wherein said performing a temporal attention calculation on the predicted representation, obtaining a second temporal attention code comprises:
decomposing the predictive representation into time steps and node granularity, and calculating a time input vector of the time attention of the current layer of any time step of any node in the second BERT model according to the predictive representation, wherein the time input vector comprises a time query vector, a time key vector and a time value vector;
calculating a second time attention weight of the current layer by applying an activation function according to the time query vector and the time key vector;
and carrying out weighted summation on the second time attention weight of the current layer and the time value vector, and carrying out residual connection with the second time attention code of the previous layer to obtain the second time attention code of the current layer.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 8 when executing the program.
10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211651167.5A CN116403397A (en) | 2022-12-21 | 2022-12-21 | Traffic prediction method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116403397A true CN116403397A (en) | 2023-07-07 |
Family
ID=87008129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211651167.5A Pending CN116403397A (en) | 2022-12-21 | 2022-12-21 | Traffic prediction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116403397A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117094451A (en) * | 2023-10-20 | 2023-11-21 | 邯郸欣和电力建设有限公司 | Power consumption prediction method, device and terminal |
CN117094451B (en) * | 2023-10-20 | 2024-01-16 | 邯郸欣和电力建设有限公司 | Power consumption prediction method, device and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||