CN110610232A - Long-term and short-term traffic flow prediction model construction method based on deep learning - Google Patents

Long-term and short-term traffic flow prediction model construction method based on deep learning

Info

Publication number
CN110610232A
CN110610232A
Authority
CN
China
Prior art keywords
term
model
lstnet
layer
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910855977.4A
Other languages
Chinese (zh)
Inventor
沈强儒
施佺
曹阳
曹慧
葛文璇
汤天培
刘志杰
邱礼平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN201910855977.4A priority Critical patent/CN110610232A/en
Publication of CN110610232A publication Critical patent/CN110610232A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a long-term and short-term traffic flow prediction model construction method based on deep learning, namely a long- and short-term time series network (LSTNet) that uses a convolutional neural network and a recurrent neural network to extract short-term local dependency patterns among variables and to discover long-term patterns of time series trends. By combining the strengths of convolutional and recurrent neural networks with an autoregressive component, the invention significantly improves on the state-of-the-art results of time series prediction on multiple benchmark datasets. Through in-depth analysis and empirical study, the efficiency of the LSTNet model architecture is demonstrated: it successfully captures both short-term and long-term repetitive patterns in data, and combines linear and nonlinear models for robust prediction.

Description

Long-term and short-term traffic flow prediction model construction method based on deep learning
Technical Field
The invention belongs to the field of traffic models, and particularly relates to a long-term and short-term traffic flow prediction model construction method based on deep learning.
Background
Multivariate time series prediction is an important machine learning problem in many fields, including prediction of solar power plant energy output, power consumption, and traffic congestion. The temporal data that occurs in these real-world applications typically involves a mixture of long-term and short-term patterns, for which traditional methods such as autoregressive models and Gaussian processes may fail.
Deep neural networks have received increasing attention in time series analysis. Previous work has focused on time series classification, i.e. the task of automatically assigning class labels to time series entries. For example, RNN architectures have been studied to extract informative patterns from healthcare sequence data and classify the data according to diagnostic categories. RNN architectures have also been applied to mobile data for classifying actions or activities from input sequences. CNN models have likewise been used for action and activity recognition, extracting shift-invariant local patterns from input sequences as features for classification models.
Deep neural networks have also been investigated for time series prediction, i.e., the task of using time series observed in the past to predict the unknown time series at a future horizon; the larger the horizon, the harder the problem. This line of work ranges from early efforts with RNN models and hybrid models combining ARIMA and multilayer perceptrons (MLP) to the recent application of vanilla RNNs and dynamic Boltzmann machines to time series prediction.
One of the most prominent univariate time series models is the autoregressive integrated moving average (ARIMA) model. The popularity of the ARIMA model is due to its statistical properties and to the well-known Box-Jenkins methodology for model selection. ARIMA models not only subsume various exponential smoothing techniques but are also flexible enough to encompass other types of time series models, including autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) models. However, ARIMA models, including their variants for modeling long-term temporal dependencies, are rarely used for high-dimensional multivariate time series prediction because of their high computational cost.
On the other hand, the vector autoregressive (VAR) model is the most widely used model for multivariate time series. The VAR model naturally extends the AR model to the multivariate setting, but it ignores the dependencies between output variables. Significant advances have been made in recent years on various VAR models, including the elliptical VAR model for heavy-tailed time series and the structured VAR model for better interpreting the dependencies between high-dimensional variables. However, the model capacity of VAR grows linearly with the temporal window size and quadratically with the number of variables, which means the inherently large model tends to overfit when dealing with long-term temporal patterns. To alleviate this problem, the original high-dimensional signal is reduced to a low-dimensional hidden representation, to which VAR is then applied for prediction with a variety of regularization choices.
The time series prediction problem can also be treated as a standard regression problem with time-varying parameters. It is therefore not surprising that various regression models with different loss functions and regularization terms have been applied to the time series prediction task. For example, linear support vector regression (SVR) learns a maximum-margin hyperplane based on the regression loss, where the hyper-parameter ε controls the threshold of the prediction error. Ridge regression is another example, which can be recovered from the SVR model by setting ε to zero. Finally, the LASSO model is applied to encourage sparsity in the model parameters so that interesting patterns between different input signals can be revealed. These linear methods are practically efficient for multivariate time series prediction thanks to high-quality off-the-shelf solvers in the machine learning community. However, like VAR, these linear models may fail to capture the complex nonlinear relationships of multivariate signals, resulting in degraded performance.
The Gaussian process (GP) is a nonparametric method for modeling distributions over a continuous domain of functions. This contrasts with models defined by parameterized function classes such as VAR and SVR. GP can be applied to the multivariate time series prediction task described above, and can be used as a prior over the function space in Bayesian inference. For example, a fully Bayesian approach has been proposed with a GP prior on a nonlinear state-space model, which can capture complex dynamic phenomena. However, the power of the Gaussian process comes at the cost of high computational complexity: a straightforward implementation of the Gaussian process for multivariate time series prediction has cubic complexity in the number of observations, due to the matrix inversion of the kernel matrix.
Disclosure of Invention
The purpose of the invention is as follows: in order to overcome the defects of the prior art, the invention provides a long-term and short-term traffic flow prediction model construction method based on deep learning.
The technical scheme is as follows: a long-term and short-term traffic flow prediction model construction method based on deep learning comprises the following steps:
1. problem formulation:
the multivariate time series prediction task is formulated as follows:
a series of fully observed time series signals is given: $Y = \{y_1, y_2, \ldots, y_T\}$, where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension; to predict $y_{T+h}$, where $h$ is the desired horizon ahead of the current timestamp, it is assumed that $\{y_1, y_2, \ldots, y_T\}$ is available; likewise, to predict the value of the next timestamp $y_{T+h+1}$, it is assumed that $\{y_1, y_2, \ldots, y_T, y_{T+1}\}$ is available; the input matrix at timestamp $T$ is formulated as $X_T = \{y_1, y_2, \ldots, y_T\} \in \mathbb{R}^{n \times T}$; the horizon of the prediction task is selected according to the requirements of the environment setting;
2. convolution component:
the LSTNet framework is a deep learning framework designed specifically for the multivariate time series prediction task with a mixture of long-term and short-term patterns; the first layer of the LSTNet architecture is a convolutional network without pooling, which extracts short-term patterns in the time dimension as well as local dependencies between variables; the convolutional layer consists of multiple filters of width $\omega$ and height $n$, the height being set equal to the number of variables; the $k$-th filter sweeps the input matrix $X$ and generates: $h_k = \mathrm{RELU}(W_k * X + b_k)$, where $*$ denotes the convolution operation, the output $h_k$ is a vector, and the RELU function is $\mathrm{RELU}(x) = \max(0, x)$; each vector $h_k$ is made of length $T$ by zero-padding on the left of the input matrix $X$; the size of the output matrix of the convolutional layer is $d_c \times T$, where $d_c$ denotes the number of filters;
3. Recurrent component:
the output of the convolutional layer is simultaneously fed into the Recurrent component and the Recurrent-skip component; the Recurrent component is a recurrent layer with gated recurrent units (GRU) that uses the RELU function as the hidden update activation function; the hidden state of the recurrent unit at time $t$ is computed as:

$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-1} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-1} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$

where $\odot$ is the element-wise product, $\sigma$ is the sigmoid function, and $x_t$ is the input of this layer at time $t$; the output of this layer is the hidden state at each timestamp;
4. Recurrent-skip component:
a recurrent structure with temporal skip connections is developed to extend the temporal span of the information flow and thereby ease the optimization process; specifically, skip connections are added between the current hidden cell and the hidden cells in the same phase of adjacent periods; the update process can be expressed as:

$r_t = \sigma(x_t W_{xr} + h_{t-p} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-p} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-p} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-p} + u_t \odot c_t$
the input of this layer is the output of the convolutional layer; $p$ is the number of hidden cells skipped through; the value of $p$ can be determined easily for datasets with a clear periodic pattern and has to be tuned otherwise; a well-tuned $p$ can significantly improve model performance, and LSTNet can moreover be easily extended to contain variants of the skip length $p$;
a dense layer is used to combine the outputs of the Recurrent and Recurrent-skip components; the inputs of the dense layer include the hidden state of the Recurrent component at timestamp $t$, denoted $h^R_t$, and the $p$ hidden states of the Recurrent-skip component from timestamp $t-p+1$ to $t$, denoted $h^S_{t-p+1}, \ldots, h^S_t$; the output of the dense layer is computed as:

$h^D_t = W^R h^R_t + \sum_{i=0}^{p-1} W^S_i h^S_{t-i} + b$

where $h^D_t$ denotes the prediction of the (upper) neural network part at timestamp $t$;
5. temporal attention layer:
an attention mechanism is added that learns a weighted combination of the hidden representations at each window position of the input matrix; specifically, the attention weights $\alpha_t \in \mathbb{R}^q$ at the current timestamp $t$ are computed as:

$\alpha_t = \mathrm{AttnScore}(H^R_t, h^R_{t-1})$

where $H^R_t = [h^R_{t-q}, \ldots, h^R_{t-1}]$ is a matrix stacking the hidden representations of the RNN column-wise, and AttnScore is some similarity function such as the dot product, cosine, or a function parameterized by a simple multilayer perceptron;
the final output of the temporal attention layer is the concatenation of the weighted context vector $c_t = H_t \alpha_t$ and the last window hidden representation $h^R_{t-1}$, followed by a linear projection:

$h^D_t = W[c_t; h^R_{t-1}] + b$
6. autoregressive component:
the final prediction of LSTNet is decomposed into a linear part, which mainly focuses on the local scaling issue, plus a nonlinear part containing the recurring patterns; in the LSTNet architecture, the classical autoregressive (AR) model is adopted as the linear component; all dimensions share the same set of linear parameters, and the AR model is:

$h^L_{t,i} = \sum_{k=0}^{q^{ar}-1} W^{ar}_k\, y_{t-k,i} + b^{ar}$

where $h^L_t$ is the forecast of the AR component and $q^{ar}$ is the size of the input window over the input matrix; then, by integrating the outputs of the neural network part and the AR component, the final prediction of LSTNet is obtained:

$\hat{Y}_t = h^D_t + h^L_t$

where $\hat{Y}_t$ denotes the final prediction of the model at timestamp $t$;
7. establishing an objective function:
the squared error is the default loss function for many prediction tasks, and the corresponding optimization objective is expressed as:

$\min_\Theta \sum_{t \in \Omega_{Train}} \| Y_t - \hat{Y}_{t-h} \|_F^2$

where $\Theta$ denotes the parameter set of the model, $\Omega_{Train}$ is the set of timestamps used for training, and $\|\cdot\|_F$ is the Frobenius norm;
the objective function of linear support vector regression is:

$\min_\Theta \frac{1}{2}\|\Theta\|_F^2 + C \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} \xi_{t,i}$

subject to $|\hat{Y}_{t-h,i} - Y_{t,i}| \le \xi_{t,i} + \epsilon$ and $\xi_{t,i} \ge 0$;

where $C$ and $\epsilon$ are hyper-parameters; motivated by the remarkable performance of the linear support vector machine model, its objective function is incorporated into the LSTNet model as a substitute for the squared loss; setting $\epsilon = 0$, the above objective function reduces to the following absolute loss (L1) function:

$\min_\Theta \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} |Y_{t,i} - \hat{Y}_{t-h,i}|$

the advantage of the absolute loss function is that it is more robust to anomalies in real time series data; in the experimental part, a validation set is used to decide which objective function to use, the squared loss Eq. 7 or the absolute loss Eq. 9;
As an optimization, the optimization strategy is as follows:
suppose the input time series is $Y_t = \{y_1, y_2, \ldots, y_t\}$; define a tunable window size $q$ and reformulate the input at timestamp $t$ as $X_t = \{y_{t-q+1}, y_{t-q+2}, \ldots, y_t\}$; the problem then becomes a regression task over the feature-value pairs $\{X_t, Y_{t+h}\}$, which can be solved by stochastic gradient descent (SGD) or its variants.
Has the advantages that: the invention proposes a new deep learning framework, the long- and short-term time series network (LSTNet), which uses a convolutional neural network (CNN) and a recurrent neural network (RNN) to extract short-term local dependency patterns between variables and to discover long-term patterns of time series trends. In addition, a traditional autoregressive model is used to address the scale-insensitivity problem of the neural network model. In evaluations on real data with complex mixtures of repetitive patterns, LSTNet achieves significant performance improvements over several state-of-the-art baseline methods. All data and experiment code are available online.
Drawings
FIG. 1 is a schematic diagram of the long and short term time series network (LSTNet) of the present invention;
FIG. 2 is a schematic representation of the autocorrelation of sampled variables forming four data sets in accordance with the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below so that those skilled in the art can better understand the advantages and features of the present invention, and thus the scope of the present invention will be more clearly defined. The embodiments described herein are only a few embodiments of the present invention, rather than all embodiments, and all other embodiments that can be derived by one of ordinary skill in the art without inventive faculty based on the embodiments described herein are intended to fall within the scope of the present invention.
Examples
In the present invention, we propose a deep learning framework for multivariate time series prediction, the long- and short-term time series network (LSTNet), as shown in fig. 1. It exploits the advantage of convolutional layers at discovering local dependency patterns between multidimensional input variables, and of recurrent layers at capturing complex long-term dependencies. A novel recurrent structure, Recurrent-skip, is designed to capture very long-term dependency patterns and to ease optimization by exploiting the periodic property of the input time series signals. Finally, LSTNet incorporates a traditional autoregressive linear model in parallel with the nonlinear neural network part, which makes the nonlinear deep learning model more robust for time series that violate scale changes. In experiments on real-world seasonal time series datasets, our model is consistently superior to traditional linear models and to the GRU recurrent neural network.
The invention is constructed as follows:
in the present invention, the time series prediction problem is first formulated, and then the details of the proposed LSTNet architecture are discussed in the following section, as shown in fig. 1. Finally, an objective function and an optimization strategy are proposed.
1 problem formulation
In this section, the multivariate time series prediction task is formulated. Given a series of fully observed time series signals $Y = \{y_1, y_2, \ldots, y_T\}$, where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension, our goal is to predict a series of future signals in a rolling forecasting fashion. That is, to predict $y_{T+h}$, where $h$ is the desired horizon ahead of the current timestamp, we assume $\{y_1, y_2, \ldots, y_T\}$ is available. Likewise, to predict the value of the next timestamp $y_{T+h+1}$, we assume $\{y_1, y_2, \ldots, y_T, y_{T+1}\}$ is available. Thus, we formulate the input matrix at timestamp $T$ as $X_T = \{y_1, y_2, \ldots, y_T\} \in \mathbb{R}^{n \times T}$.
In most cases, the scope of the predictive task is selected according to the requirements of the environmental settings, for example, for traffic usage, the range of interest is from a few hours to a day; for stock market data, even second/minute advance predictions are meaningful for generating returns.
Fig. 1 outlines the proposed LSTNet architecture. LSTNet is a deep learning framework designed specifically for the multivariate time series prediction task, mixing long-term and short-term patterns. In the following sections, the building blocks of LSTNet are described in detail.
2 convolution component
The first layer of LSTNet is a convolutional network without pooling, which aims to extract short-term patterns in the time dimension as well as local dependencies between variables. The convolutional layer consists of multiple filters of width $\omega$ and height $n$ (the height is set equal to the number of variables). The $k$-th filter sweeps the input matrix $X$ and generates: $h_k = \mathrm{RELU}(W_k * X + b_k)$, where $*$ denotes the convolution operation, the output $h_k$ is a vector, and the RELU function is $\mathrm{RELU}(x) = \max(0, x)$. Each vector $h_k$ is made of length $T$ by zero-padding on the left of the input matrix $X$. The size of the output matrix of the convolutional layer is $d_c \times T$, where $d_c$ denotes the number of filters.
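By way of non-limiting illustration, the convolutional component described above may be sketched as follows (a minimal sketch assuming a PyTorch implementation; the names ConvComponent, n_vars, window, d_c and omega are illustrative choices of this sketch, not part of the patent text):

```python
import torch
import torch.nn as nn

class ConvComponent(nn.Module):
    # Convolutional layer without pooling: d_c filters of height n_vars and width omega.
    def __init__(self, n_vars: int, d_c: int = 100, omega: int = 6):
        super().__init__()
        self.conv = nn.Conv2d(1, d_c, kernel_size=(n_vars, omega))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, n_vars) -> (batch, 1, n_vars, window)
        x = x.permute(0, 2, 1).unsqueeze(1)
        # zero-pad on the left in time so each output vector keeps length T
        x = nn.functional.pad(x, (self.conv.kernel_size[1] - 1, 0))
        h = torch.relu(self.conv(x))   # (batch, d_c, 1, window)
        return h.squeeze(2)            # output matrix of size d_c x T per sample
```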
3 Recurrent component
The output of the convolutional layer is simultaneously fed into the Recurrent component and the Recurrent-skip component. The Recurrent component is a recurrent layer with gated recurrent units (GRU) that uses the RELU function as the hidden update activation function. The hidden state of the recurrent unit at time $t$ is computed as:

$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-1} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-1} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$

where $\odot$ is the element-wise product, $\sigma$ is the sigmoid function, and $x_t$ is the input of this layer at time $t$. The output of this layer is the hidden state at each timestamp. While the tanh function is customarily used as the hidden update activation function, RELU was empirically found to give more reliable performance, as gradients propagate backwards more easily through it.
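For illustration only, a single step of the GRU update with the RELU hidden activation given above can be sketched as follows (assuming PyTorch tensors; packaging the weight matrices and biases in the dictionaries W and b is an illustrative choice):

```python
import torch

def gru_relu_step(x_t, h_prev, W, b):
    # W holds W_xr, W_hr, W_xu, W_hu, W_xc, W_hc; b holds b_r, b_u, b_c
    r = torch.sigmoid(x_t @ W["xr"] + h_prev @ W["hr"] + b["r"])      # reset gate
    u = torch.sigmoid(x_t @ W["xu"] + h_prev @ W["hu"] + b["u"])      # update gate
    c = torch.relu(x_t @ W["xc"] + r * (h_prev @ W["hc"]) + b["c"])   # candidate state
    return (1 - u) * h_prev + u * c                                   # new hidden state h_t
```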
4 Recurrent-skip component
Recurrent layers with GRU or LSTM units are carefully designed to remember historical information and hence to capture relatively long-term dependencies. Due to gradient vanishing, however, GRU and LSTM usually fail to capture very long-term correlations in practice. The present invention proposes a novel Recurrent-skip component that exploits the periodic patterns of real-world datasets to alleviate this problem. A recurrent structure with temporal skip connections is developed to extend the temporal span of the information flow and thereby ease the optimization process. Specifically, skip connections are added between the current hidden cell and the hidden cells in the same phase of adjacent periods. The update process can be expressed as:
$r_t = \sigma(x_t W_{xr} + h_{t-p} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-p} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-p} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-p} + u_t \odot c_t$
the input to this layer is the output of the convolutional layer. p is the number of skipped hidden units, and the value of p can be easily determined for a well-defined data set of periodic patterns, otherwise it will be adjusted. In experiments it was found that even in the latter case, good tuning p significantly improves the performance of the model. Furthermore, LSTNet can be easily extended to variables containing the hop length p.
A dense layer is used to combine the outputs of the Recurrent and Recurrent-skip components. The inputs of the dense layer include the hidden state of the Recurrent component at timestamp $t$, denoted $h^R_t$, and the $p$ hidden states of the Recurrent-skip component from timestamp $t-p+1$ to $t$, denoted $h^S_{t-p+1}, \ldots, h^S_t$. The output of the dense layer is computed as:

$h^D_t = W^R h^R_t + \sum_{i=0}^{p-1} W^S_i h^S_{t-i} + b$

where $h^D_t$ denotes the prediction result of the (upper) neural network part in fig. 1 at timestamp $t$.
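As a non-authoritative sketch, the Recurrent-skip component is commonly realized by regrouping the convolutional output so that cells $p$ steps apart become consecutive in time before running an ordinary GRU (assuming PyTorch; SkipGRU, d_skip and the shape conventions are assumptions of this sketch):

```python
import torch
import torch.nn as nn

class SkipGRU(nn.Module):
    def __init__(self, d_c: int = 100, d_skip: int = 50, p: int = 24):
        super().__init__()
        self.p = p
        self.gru = nn.GRU(d_c, d_skip, batch_first=True)

    def forward(self, c: torch.Tensor) -> torch.Tensor:
        # c: (batch, d_c, T), the output of the convolutional layer
        batch, d_c, T = c.shape
        t = (T // self.p) * self.p
        x = c[:, :, -t:]                                   # trim T to a multiple of p
        # regroup so that hidden cells p steps apart become consecutive
        x = x.reshape(batch, d_c, -1, self.p).permute(0, 3, 2, 1)
        x = x.reshape(batch * self.p, -1, d_c)
        _, h = self.gru(x)                                 # final hidden state per phase
        return h.squeeze(0).reshape(batch, -1)             # p hidden states, concatenated
```

The $p$ concatenated hidden states returned here play the role of $h^S_{t-p+1}, \ldots, h^S_t$ in the dense-layer combination above.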
5 Temporal attention layer
However, the Recurrent-skip layer requires a predefined hyper-parameter $p$, which is unfavorable for non-seasonal time series or for series whose period length changes dynamically over time. To alleviate this issue, another approach is considered: an attention mechanism that learns a weighted combination of the hidden representations at each window position of the input matrix. Specifically, the attention weights $\alpha_t \in \mathbb{R}^q$ at the current timestamp $t$ are computed as:

$\alpha_t = \mathrm{AttnScore}(H^R_t, h^R_{t-1})$

where $H^R_t = [h^R_{t-q}, \ldots, h^R_{t-1}]$ is a matrix stacking the hidden representations of the RNN column-wise, and AttnScore is some similarity function such as the dot product, cosine, or a function parameterized by a simple multilayer perceptron.

The final output of the temporal attention layer is the concatenation of the weighted context vector $c_t = H_t \alpha_t$ and the last window hidden representation $h^R_{t-1}$, followed by a linear projection:

$h^D_t = W[c_t; h^R_{t-1}] + b$
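A minimal sketch of this attention computation with the dot-product choice of AttnScore might read (assuming PyTorch; the shapes of H, W and b are assumptions of this sketch):

```python
import torch

def temporal_attention(H: torch.Tensor, W: torch.Tensor, b: torch.Tensor):
    # H: (batch, q, d) stacks the RNN hidden states h_{t-q} ... h_{t-1}
    h_last = H[:, -1, :]                                   # h_{t-1}: (batch, d)
    scores = torch.bmm(H, h_last.unsqueeze(2)).squeeze(2)  # dot-product AttnScore
    alpha = torch.softmax(scores, dim=1)                   # attention weights alpha_t
    context = torch.bmm(alpha.unsqueeze(1), H).squeeze(1)  # c_t = H_t @ alpha_t
    return torch.cat([context, h_last], dim=1) @ W + b     # W[c_t; h_{t-1}] + b
```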
6 autoregressive component
One major drawback of the neural network model is that the output scale is insensitive to the input scale, due to the nonlinear nature of the convolutional and recurrent components. Unfortunately, in specific real datasets the scale of the input signals constantly changes in a non-periodic manner, which greatly lowers the prediction accuracy of the neural network model. To remedy this deficiency, similar in spirit to the highway network, we decompose the final prediction of LSTNet into a linear part, which mainly focuses on the local scaling issue, plus a nonlinear part containing the recurring patterns. In the LSTNet architecture, the classical autoregressive (AR) model is adopted as the linear component. In the model of the invention, all dimensions share the same set of linear parameters. The AR model is:

$h^L_{t,i} = \sum_{k=0}^{q^{ar}-1} W^{ar}_k\, y_{t-k,i} + b^{ar}$

where $h^L_t$ is the forecast of the AR component and $q^{ar}$ is the size of the input window over the input matrix.
then, by integrating the outputs of the neural network part and the AR component, the final prediction of LSTNet is obtained:
wherein the content of the first and second substances,representing the final prediction of the model at time stamp t.
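For illustration, the AR component and the additive fusion of the linear and nonlinear parts may be sketched as follows (assuming PyTorch; ARComponent and q_ar are illustrative names of this sketch):

```python
import torch
import torch.nn as nn

class ARComponent(nn.Module):
    def __init__(self, q_ar: int = 24):
        super().__init__()
        self.q_ar = q_ar
        self.linear = nn.Linear(q_ar, 1)   # one weight set shared by all dimensions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, n_vars); use the last q_ar steps of every variable
        z = x[:, -self.q_ar:, :].permute(0, 2, 1)   # (batch, n_vars, q_ar)
        return self.linear(z).squeeze(2)            # h^L_t: (batch, n_vars)

# final prediction: the sum of the neural part h_D and the linear part, e.g.
# y_hat = h_D + ARComponent(q_ar=24)(x)
```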
7 objective function
The squared error is the default loss function for many prediction tasks; the corresponding optimization objective is expressed as:

$\min_\Theta \sum_{t \in \Omega_{Train}} \| Y_t - \hat{Y}_{t-h} \|_F^2$

where $\Theta$ denotes the parameter set of the model, $\Omega_{Train}$ is the set of timestamps used for training, and $\|\cdot\|_F$ is the Frobenius norm. The traditional linear regression model with the squared loss function is named Linear Ridge, which is equivalent to the vector autoregressive model with ridge regularization. However, experiments show that linear support vector regression dominates the Linear Ridge model on certain datasets. The only difference between linear support vector regression and Linear Ridge is the objective function; the objective function of linear support vector regression is:

$\min_\Theta \frac{1}{2}\|\Theta\|_F^2 + C \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} \xi_{t,i}$

subject to $|\hat{Y}_{t-h,i} - Y_{t,i}| \le \xi_{t,i} + \epsilon$ and $\xi_{t,i} \ge 0$

where $C$ and $\epsilon$ are hyper-parameters. Motivated by the remarkable performance of the linear support vector machine model, its objective function is incorporated into the LSTNet model as a substitute for the squared loss. For convenience, setting $\epsilon = 0$, the above objective function reduces to the following absolute loss (L1) function:

$\min_\Theta \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} |Y_{t,i} - \hat{Y}_{t-h,i}|$

The advantage of the absolute loss function is that it is more robust to anomalies in real time series data. In the experimental part, a validation set is used to decide which objective function to use, the squared loss Eq. 7 or the absolute loss Eq. 9.
8 optimization strategy
The optimization strategy provided by the invention is the same as for traditional time series prediction models. Suppose the input time series is $Y_t = \{y_1, y_2, \ldots, y_t\}$. Define a tunable window size $q$ and reformulate the input at timestamp $t$ as $X_t = \{y_{t-q+1}, y_{t-q+2}, \ldots, y_t\}$. The problem then becomes a regression task over the feature-value pairs $\{X_t, Y_{t+h}\}$, which can be solved by stochastic gradient descent (SGD) or its variants.
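A sketch of this windowing, producing the feature-value pairs {X_t, Y_{t+h}} used for training, might look as follows (assuming PyTorch; make_pairs is an illustrative name):

```python
import torch

def make_pairs(Y: torch.Tensor, q: int, h: int):
    # Y: (T, n) full multivariate series; q = window size, h = horizon
    X, target = [], []
    for t in range(q - 1, Y.shape[0] - h):
        X.append(Y[t - q + 1 : t + 1])   # window X_t = {y_{t-q+1}, ..., y_t}
        target.append(Y[t + h])          # label Y_{t+h}
    return torch.stack(X), torch.stack(target)
```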
Comparative experimental evaluation of the invention:
the invention has performed 9 methods of extensive experiments on the time series prediction task on 4 reference datasets. All data and experimental codes are available from the web.
a. Comparison methods

The methods compared in the evaluation of the invention are as follows:
(1) AR represents an autoregressive model, equivalent to a one-dimensional Vector Autoregressive (VAR) model.
(2) The LRidge model is a vector auto-regression (VAR) model with L2 regularization, and is widely used in multivariate time series prediction.
(3) LSVR is a Vector Autoregressive (VAR) model with a support vector regression objective function.
(4) TRMF is the autoregressive model using temporal regularized matrix factorization.
(5) GP is the Gaussian process model for time series modeling.
(6) VAR-MLP is the model combining the multilayer perceptron (MLP) with the autoregressive model, as proposed in the literature.
(7) RNN-GRU is a recurrent neural network model based on GRU units.
(8) LSTNet-Skip is our proposed LSTNet model with skipped RNN layers.
(9) LSTNet-Attn is our proposed LSTNet model with temporal attention layers.
For the single-output methods above, such as AR, LRidge, LSVR and GP, $n$ models are simply trained independently, i.e. one model for each of the $n$ output variables.
b. Evaluation metrics

The invention uses conventional evaluation metrics, defined as follows:
Root relative squared error (RSE):

$\mathrm{RSE} = \dfrac{\sqrt{\sum_{(i,t) \in \Omega_{Test}} (Y_{it} - \hat{Y}_{it})^2}}{\sqrt{\sum_{(i,t) \in \Omega_{Test}} (Y_{it} - \mathrm{mean}(Y))^2}}$

Empirical correlation coefficient (CORR):

$\mathrm{CORR} = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{\sum_t (Y_{it} - \mathrm{mean}(Y_i))(\hat{Y}_{it} - \mathrm{mean}(\hat{Y}_i))}{\sqrt{\sum_t (Y_{it} - \mathrm{mean}(Y_i))^2 \sum_t (\hat{Y}_{it} - \mathrm{mean}(\hat{Y}_i))^2}}$

where $Y$ and $\hat{Y}$ denote the true signal and the signal predicted by the system, respectively. RSE is a scaled version of the widely used root mean square error (RMSE), designed to make the evaluation readable regardless of the data scale. For RSE, lower values are better; for CORR, higher values are better.
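Under the standard definitions of these metrics (an assumption here, since the formulas are reconstructed rather than quoted from the original filing), they can be computed as:

```python
import torch

def rse(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    # root relative squared error: RMSE scaled by the spread of the ground truth
    num = torch.sqrt(torch.sum((y_true - y_pred) ** 2))
    den = torch.sqrt(torch.sum((y_true - y_true.mean()) ** 2))
    return num / den

def corr(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    # empirical correlation coefficient, averaged over the n variables
    yt = y_true - y_true.mean(dim=0)
    yp = y_pred - y_pred.mean(dim=0)
    num = (yt * yp).sum(dim=0)
    den = torch.sqrt((yt ** 2).sum(dim=0) * (yp ** 2).sum(dim=0))
    return (num / den).mean()
```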
c. Data

The present invention uses four publicly available benchmark datasets.
Traffic: traffic hour data was collected from the California department of transportation for 48 months (2015-2016). These data describe road occupancy (between 0 and 1) measured by different sensors on the highway in the gulf of san francisco.
Solar energy: record of solar power generation in 2006 with 137 photovoltaic plants in alabama sampled every 10 minutes.
Electric power: from 2012 to 2014, electricity usage (kWh) was recorded every 15 minutes for n-321 customers. We convert the data to reflect the power consumption per hour;
exchange rate: a summary of daily rates from 1990 to 20 years in 8 foreign countries australia, uk, canada, switzerland, china, japan, new zealand and singapore.
All data sets were chronologically divided into a training set (60%), a validation set (20%) and a test set (20%). To facilitate future multivariate time series prediction studies, we published all raw datasets and web-preprocessed datasets.
Table 1: data set statistical table
In the data set statistical table of table 1, T is the time series length, D is the number of variables, and L is the sampling rate.
To examine the existence of long-term and/or short-term repetitive patterns in the time series data, the autocorrelation of randomly selected variables from the four datasets is plotted in fig. 2. Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of the delay, defined as follows:
$R(\tau) = \dfrac{E[(X_t - \mu)(X_{t+\tau} - \mu)]}{\sigma^2}$

where $X_t$ is the time series signal, $\mu$ its mean and $\sigma^2$ its variance. In practical applications, the empirical unbiased estimator is used to calculate the autocorrelation.
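A minimal sketch of this computation (in numpy; max_lag is an illustrative parameter of the sketch):

```python
import numpy as np

def autocorrelation(x: np.ndarray, max_lag: int) -> np.ndarray:
    # R(tau) = E[(X_t - mu)(X_{t+tau} - mu)] / sigma^2 for tau = 0 .. max_lag
    x = x - x.mean()
    var = np.sum(x ** 2)
    return np.array([np.sum(x[: len(x) - tau] * x[tau:]) / var
                     for tau in range(max_lag + 1)])
```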
It can be seen from graphs (a), (b) and (c) of fig. 2 that there are repetitive patterns with high autocorrelation in the traffic, solar and electricity datasets, but not in the exchange rate dataset. Furthermore, short-term daily patterns (every 24 hours) and long-term weekly patterns (every 7 days) can be observed in the graphs of the traffic and electricity datasets, which perfectly reflects the expected regularity of road traffic conditions and of electricity consumption. On the other hand, in graph (d) for the exchange rate dataset, hardly any repetitive long-term pattern can be seen, apart from some short-term local continuity. These observations are important for the later analysis of the empirical results of the different methods: methods that can properly model and successfully exploit both short-term and long-term repetitive patterns in the data should perform better when the data contains such patterns (e.g., electricity, traffic and solar energy). Conversely, if the dataset does not contain such patterns (e.g., exchange rate), the advantage of these methods may not yield better performance than that of less powerful methods.
d. Experimental details

A grid search is performed over all tunable hyper-parameters on the held-out validation set for each method and dataset. Specifically, where applicable, all methods share the same grid search range for the window size $q$, varied over $\{2^0, 2^1, \ldots, 2^9\}$. For LRidge and LSVR, the regularization coefficient $\lambda$ is chosen from $\{2^{-10}, 2^{-8}, \ldots, 2^8, 2^{10}\}$. For GP, the RBF kernel bandwidth $\sigma$ and the noise level $\alpha$ are chosen from $\{2^{-10}, 2^{-8}, \ldots, 2^8, 2^{10}\}$. For TRMF, the hidden dimension is chosen from $\{2^2, \ldots, 2^6\}$ and the regularization coefficient $\lambda$ from $\{0.1, 1, 10\}$. For LST-Skip and LST-Attn, the training strategy described above is used. The hidden dimension of the recurrent and convolutional layers is chosen from $\{50, 100, 200\}$, and that of the recurrent-skip layer from $\{20, 50, 100\}$. The skip length $p$ of the recurrent-skip layer is set to 24 for the traffic and electricity datasets and tuned from $2^1$ to $2^6$ for the solar and exchange rate datasets. The regularization coefficient of the AR component is chosen from $\{0.1, 1, 10\}$ to achieve the best performance. Dropout is performed after every layer except input and output, with the rate usually set to 0.1 or 0.2. The Adam algorithm is used to optimize the parameters of the model.
e. Main results

Setting $h \in \{3, 6, 12, 24\}$ means predicting electricity and traffic data from 3 to 24 hours ahead, solar data from 30 to 240 minutes ahead, and exchange rate data from 3 to 24 days ahead, respectively. The larger $h$ is, the harder the prediction task. The best result for each (data, metric) pair is highlighted in bold in the results table. The total count of bold results is 17 for LSTNet-Skip (one version of the proposed LSTNet) and 7 for LSTNet-Attn (the other version of LSTNet); the counts for the remaining methods range between 0 and 3.
Clearly, the two proposed models, LSTNet-Skip and LSTNet-Attn, consistently improve the state of the art on the datasets with periodic patterns, especially at large horizons. Furthermore, when the horizon is 24, LSTNet outperforms the strong neural baseline RNN-GRU by 9.2%, 11.7% and 22.2% in RSE on the solar, traffic and electricity datasets respectively, demonstrating the effectiveness of the framework design for complex repetitive patterns. More importantly, users may consider LSTNet-Attn as an alternative to LSTNet-Skip when the periodic pattern $q$ is not known in advance for the application, given that the former still delivers considerable improvements over the baselines. On the exchange rate dataset, however, the proposed LSTNet is slightly worse than AR and LRidge. The autocorrelation curves of these datasets show that there are repetitive patterns in the solar, traffic and electricity datasets but not in the exchange rates. The present results give empirical evidence of the success of the LSTNet model in modeling long-term and short-term dependency patterns when they do occur in the data; otherwise, LSTNet performs comparably to the better baselines among the representative baselines (AR and LRidge).
Comparing the results of the univariate AR with those of the multivariate baseline methods (LRidge, LSVR and RNN), we find that the multivariate methods are stronger on some datasets, such as solar energy and traffic, but weaker on others, which means the richer input information can cause overfitting in traditional multivariate methods. In contrast, LSTNet shows robust performance under different situations, partly thanks to its autoregressive component.
Ablation study
To demonstrate the effectiveness of our framework design, a careful ablation study was conducted. Specifically, one component at a time is removed from the LSTNet framework. First, the LSTNet variants without the different components are named as follows.
LSTw/oskip: LSTNet model without the Recurrent-skip component and attention component.
LSTw/oCNN: the LSTNet-Skip model without the convolutional component.
LSTw/oAR: the LSTNet-Skip model without the AR component.
For the different baselines, the hidden dimensions of the models are adjusted so that they have numbers of parameters similar to the complete LSTNet model, thereby eliminating performance gains caused merely by model complexity.
In the test results using the RSE and CORR metrics, the best result on each dataset is obtained by either LST-Skip or LST-Attn. Removing the AR component from the full model (in LSTw/oAR) caused the most significant performance drop on most of the datasets, showing the crucial role of the AR component in general. Removing the skip or CNN components (in LSTw/oskip or LSTw/oCNN) caused a large performance drop on some datasets, but not all. All components of LSTNet together lead to robust performance across all datasets. The conclusion is that the architecture design of the present invention is most robust across all experiment settings, especially at large horizons.
The reason the AR component plays such an important role is that AR is generally robust to scale-changing data. To empirically verify this intuition, one dimension (one variable) of the time series signal in the electricity consumption dataset was plotted over the duration of hours 1 to 5000; true consumption surges around hour 1000, and LSTNet-Skip successfully captures this sudden change while LSTw/oAR fails to react properly. The ablation study clearly demonstrates the efficiency of the architecture design: all components contribute to the excellent and robust performance of LSTNet.
The invention provides a novel deep learning framework (LSTNet) for the multivariate time series prediction problem. By combining the strengths of convolutional and recurrent neural networks with an autoregressive component, the proposed method significantly improves on the state-of-the-art results of time series prediction on multiple benchmark datasets. Through in-depth analysis and empirical study, the efficiency of the LSTNet model architecture is demonstrated: it successfully captures both short-term and long-term repetitive patterns in data, and combines linear and nonlinear models for robust prediction.

Claims (2)

1. A long-term and short-term traffic flow prediction model construction method based on deep learning is characterized in that: comprises the following steps:
1. problem formulation:
the multivariate time series prediction task is formulated as follows:
a series of fully observed time series signals is given: $Y = \{y_1, y_2, \ldots, y_T\}$, where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension; to predict $y_{T+h}$, where $h$ is the desired horizon ahead of the current timestamp, it is assumed that $\{y_1, y_2, \ldots, y_T\}$ is available; likewise, to predict the value of the next timestamp $y_{T+h+1}$, it is assumed that $\{y_1, y_2, \ldots, y_T, y_{T+1}\}$ is available; the input matrix at timestamp $T$ is formulated as $X_T = \{y_1, y_2, \ldots, y_T\} \in \mathbb{R}^{n \times T}$; the horizon of the prediction task is selected according to the requirements of the environment setting;
2. convolution component:
the LSTNet framework is a deep learning framework designed specifically for the multivariate time series prediction task with a mixture of long-term and short-term patterns; the first layer of the LSTNet architecture is a convolutional network without pooling, which extracts short-term patterns in the time dimension as well as local dependencies between variables; the convolutional layer consists of multiple filters of width $\omega$ and height $n$, the height being set equal to the number of variables; the $k$-th filter sweeps the input matrix $X$ and generates: $h_k = \mathrm{RELU}(W_k * X + b_k)$, where $*$ denotes the convolution operation, the output $h_k$ is a vector, and the RELU function is $\mathrm{RELU}(x) = \max(0, x)$; each vector $h_k$ is made of length $T$ by zero-padding on the left of the input matrix $X$; the size of the output matrix of the convolutional layer is $d_c \times T$, where $d_c$ denotes the number of filters;
3. Recurrent component:
the output of the convolutional layer is simultaneously fed into the Recurrent component and the Recurrent-skip component; the Recurrent component is a recurrent layer with gated recurrent units (GRU) that uses the RELU function as the hidden update activation function; the hidden state of the recurrent unit at time $t$ is computed as:

$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-1} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-1} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$

where $\odot$ is the element-wise product, $\sigma$ is the sigmoid function, and $x_t$ is the input of this layer at time $t$; the output of this layer is the hidden state at each timestamp;
4. Recurrent-skip component:
a recurrent structure with temporal skip connections is developed to extend the temporal span of the information flow and thereby ease the optimization process; specifically, skip connections are added between the current hidden cell and the hidden cells in the same phase of adjacent periods; the update process can be expressed as:

$r_t = \sigma(x_t W_{xr} + h_{t-p} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-p} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-p} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-p} + u_t \odot c_t$
the input of this layer is the output of the convolutional layer; $p$ is the number of hidden cells skipped through; the value of $p$ can be determined easily for datasets with a clear periodic pattern and has to be tuned otherwise; a well-tuned $p$ can significantly improve model performance, and LSTNet can moreover be easily extended to contain variants of the skip length $p$;
a dense layer is used to combine the outputs of the Recurrent and Recurrent-skip components; the inputs of the dense layer include the hidden state of the Recurrent component at timestamp $t$, denoted $h^R_t$, and the $p$ hidden states of the Recurrent-skip component from timestamp $t-p+1$ to $t$, denoted $h^S_{t-p+1}, \ldots, h^S_t$; the output of the dense layer is computed as:

$h^D_t = W^R h^R_t + \sum_{i=0}^{p-1} W^S_i h^S_{t-i} + b$

where $h^D_t$ denotes the prediction of the (upper) neural network part at timestamp $t$;
5. temporal attention layer:
an attention mechanism is added that learns a weighted combination of the hidden representations at each window position of the input matrix; specifically, the attention weights $\alpha_t \in \mathbb{R}^q$ at the current timestamp $t$ are computed as:

$\alpha_t = \mathrm{AttnScore}(H^R_t, h^R_{t-1})$

where $H^R_t = [h^R_{t-q}, \ldots, h^R_{t-1}]$ is a matrix stacking the hidden representations of the RNN column-wise, and AttnScore is some similarity function such as the dot product, cosine, or a function parameterized by a simple multilayer perceptron;
the final output of the temporal attention layer is the concatenation of the weighted context vector $c_t = H_t \alpha_t$ and the last window hidden representation $h^R_{t-1}$, followed by a linear projection:

$h^D_t = W[c_t; h^R_{t-1}] + b$
6. autoregressive component:
the final prediction of LSTNet is decomposed into a linear part, which mainly focuses on the local scaling issue, plus a nonlinear part containing the recurring patterns; in the LSTNet architecture, the classical autoregressive (AR) model is adopted as the linear component; all dimensions share the same set of linear parameters, and the AR model is:

$h^L_{t,i} = \sum_{k=0}^{q^{ar}-1} W^{ar}_k\, y_{t-k,i} + b^{ar}$

where $h^L_t$ is the forecast of the AR component and $q^{ar}$ is the size of the input window over the input matrix; then, by integrating the outputs of the neural network part and the AR component, the final prediction of LSTNet is obtained:

$\hat{Y}_t = h^D_t + h^L_t$

where $\hat{Y}_t$ denotes the final prediction of the model at timestamp $t$;
7. establishing an objective function:
the squared error is the default loss function for many prediction tasks, and the corresponding optimization objective is expressed as:

$\min_\Theta \sum_{t \in \Omega_{Train}} \| Y_t - \hat{Y}_{t-h} \|_F^2$

where $\Theta$ denotes the parameter set of the model, $\Omega_{Train}$ is the set of timestamps used for training, and $\|\cdot\|_F$ is the Frobenius norm;
the objective function of linear support vector regression is:

$\min_\Theta \frac{1}{2}\|\Theta\|_F^2 + C \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} \xi_{t,i}$

subject to $|\hat{Y}_{t-h,i} - Y_{t,i}| \le \xi_{t,i} + \epsilon$ and $\xi_{t,i} \ge 0$;

where $C$ and $\epsilon$ are hyper-parameters; motivated by the remarkable performance of the linear support vector machine model, its objective function is incorporated into the LSTNet model as a substitute for the squared loss; setting $\epsilon = 0$, the above objective function reduces to the following absolute loss (L1) function:

$\min_\Theta \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} |Y_{t,i} - \hat{Y}_{t-h,i}|$

the advantage of the absolute loss function is that it is more robust to anomalies in real time series data; in the experimental part, a validation set is used to decide which objective function to use, the squared loss Eq. 7 or the absolute loss Eq. 9.
2. The deep learning-based long-term and short-term traffic flow prediction model construction method according to claim 1, characterized in that the optimization strategy is as follows: suppose the input time series is $Y_t = \{y_1, y_2, \ldots, y_t\}$; define a tunable window size $q$ and reformulate the input at timestamp $t$ as $X_t = \{y_{t-q+1}, y_{t-q+2}, \ldots, y_t\}$; the problem then becomes a regression task over the feature-value pairs $\{X_t, Y_{t+h}\}$, which can be solved by stochastic gradient descent (SGD) or its variants.
CN201910855977.4A 2019-09-11 2019-09-11 Long-term and short-term traffic flow prediction model construction method based on deep learning Pending CN110610232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910855977.4A CN110610232A (en) 2019-09-11 2019-09-11 Long-term and short-term traffic flow prediction model construction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910855977.4A CN110610232A (en) 2019-09-11 2019-09-11 Long-term and short-term traffic flow prediction model construction method based on deep learning

Publications (1)

Publication Number Publication Date
CN110610232A true CN110610232A (en) 2019-12-24

Family

ID=68892533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910855977.4A Pending CN110610232A (en) 2019-09-11 2019-09-11 Long-term and short-term traffic flow prediction model construction method based on deep learning

Country Status (1)

Country Link
CN (1) CN110610232A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179596A (en) * 2020-01-06 2020-05-19 南京邮电大学 Traffic flow prediction method based on group normalization and gridding cooperation
US11681914B2 (en) 2020-05-08 2023-06-20 International Business Machines Corporation Determining multivariate time series data dependencies
CN111696345A (en) * 2020-05-08 2020-09-22 东南大学 Intelligent coupled large-scale data flow width learning rapid prediction algorithm based on network community detection and GCN
CN112465054A (en) * 2020-12-07 2021-03-09 深圳市检验检疫科学研究院 Multivariate time series data classification method based on FCN
CN112465054B (en) * 2020-12-07 2023-07-11 深圳市检验检疫科学研究院 FCN-based multivariate time series data classification method
CN112818033A (en) * 2021-01-28 2021-05-18 河北工业大学 Bag breaking intelligent detection method of bag type dust collector based on neural network
CN113053113A (en) * 2021-03-11 2021-06-29 湖南交通职业技术学院 PSO-Welsch-Ridge-based anomaly detection method and device
CN113052214A (en) * 2021-03-14 2021-06-29 北京工业大学 Heat exchange station ultra-short term heat load prediction method based on long and short term time series network
CN113052214B (en) * 2021-03-14 2024-05-28 北京工业大学 Heat exchange station ultra-short-term heat load prediction method based on long-short-term time sequence network
CN113283667A (en) * 2021-06-15 2021-08-20 广东工业大学 Marine industry development trend analysis method
CN113569479A (en) * 2021-07-27 2021-10-29 天津大学 Long-term multi-step control method, device and storage medium for rock fracture development of stone cave temple
CN113569479B (en) * 2021-07-27 2023-11-10 天津大学 Long-term multi-step control method, device and storage medium for rock mass crack development of stone cave temple
CN113553772A (en) * 2021-08-09 2021-10-26 贵州电网有限责任公司 Icing tension prediction method based on deep hybrid modeling
CN114374653A (en) * 2021-12-28 2022-04-19 同济大学 Variable bit rate service scheduling method based on flow prediction
CN114374653B (en) * 2021-12-28 2024-02-27 同济大学 Variable bit rate service scheduling method based on flow prediction
CN115146842A (en) * 2022-06-24 2022-10-04 沈阳建筑大学 Multivariate time series trend prediction method and system based on deep learning

Similar Documents

Publication Publication Date Title
CN110610232A (en) Long-term and short-term traffic flow prediction model construction method based on deep learning
Liang et al. A novel wind speed prediction strategy based on Bi-LSTM, MOOFADA and transfer learning for centralized control centers
Wang et al. Multiple convolutional neural networks for multivariate time series prediction
Zhou et al. A review on global solar radiation prediction with machine learning models in a comprehensive perspective
Lai et al. Modeling long-and short-term temporal patterns with deep neural networks
Xie et al. SNAS: stochastic neural architecture search
Ma et al. A hybrid attention-based deep learning approach for wind power prediction
Liang Bayesian neural networks for nonlinear time series forecasting
Goh et al. A multimodal approach to chaotic renewable energy prediction using meteorological and historical information
CN114004338A (en) Mixed time period mode multivariable time sequence prediction method based on neural network
CN114065996A (en) Traffic flow prediction method based on variational self-coding learning
Wang A combined model for short-term wind speed forecasting based on empirical mode decomposition, feature selection, support vector regression and cross-validated lasso
Dong et al. Accurate combination forecasting of wave energy based on multiobjective optimization and fuzzy information granulation
Sriramulu et al. Adaptive dependency learning graph neural networks
CN114970946A (en) PM2.5 pollution concentration long-term space prediction method based on deep learning model and empirical mode decomposition coupling
Surakhi et al. On the ensemble of recurrent neural network for air pollution forecasting: Issues and challenges
Al-Ja’afreh et al. An enhanced CNN-LSTM based multi-stage framework for PV and load short-term forecasting: DSO scenarios
Cui et al. A VMD-MSMA-LSTM-ARIMA model for precipitation prediction
Zhong et al. Online prediction of noisy time series: Dynamic adaptive sparse kernel recursive least squares from sparse and adaptive tracking perspective
Utku et al. Deep learning based prediction model for the next purchase
Shan et al. A deep‐learning based solar irradiance forecast using missing data
Qu et al. Short-term wind farm cluster power prediction based on dual feature extraction and quadratic decomposition aggregation
CN116434531A (en) Short-time traffic flow prediction method based on Conv1D-LSTM model
CN115564466A (en) Double-layer day-ahead electricity price prediction method based on calibration window integration and coupled market characteristics
Suhartono et al. Hybrid Machine Learning for Forecasting and Monitoring Air Pollution in Surabaya

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20191224)