CN114912666A

CN114912666A - Short-time passenger flow volume prediction method based on CEEMDAN algorithm and attention mechanism

Info

Publication number: CN114912666A
Application number: CN202210434929.XA
Authority: CN
Inventors: 王嘉旋; 王睿
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2022-04-24
Filing date: 2022-04-24
Publication date: 2022-08-16

Abstract

The invention belongs to the field of short-time passenger flow prediction for rail transit. It provides a short-time passenger flow prediction method using a CNN-LSTM fusion neural network based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and an attention mechanism (Attention). The method comprises the following steps: S1, data preprocessing; S2, input layer processing; S3, hidden layer processing; S4, output layer processing; and S5, model training. The method offers strong peak prediction capability, takes the noise component of the data into account in its predictions, and is robust to irregularly shaped training data.

Description

Short-time passenger flow volume prediction method based on CEEMDAN algorithm and attention mechanism

Technical Field

The invention belongs to the field of short-time passenger flow prediction for rail transit, and in particular relates to a short-time passenger flow prediction method using a CNN-LSTM fusion neural network based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and an attention mechanism (Attention).

Background

Short-time passenger flow prediction for rail transit works on a span of several minutes: the number of passengers in a future time period is predicted from the historical passenger flow data of subway stations. Accurate short-time predictions give stations early warning, so that staff can take targeted measures to relieve load pressure. In image recognition, convolutional neural networks (CNNs) are widely used because they accept fixed-scale input and eliminate the large number of redundant pixel-to-neuron connections that a fully connected network would need. In machine translation and speech recognition, recurrent neural networks (RNNs) and their variant, the long short-term memory network (LSTM), learn sequence prediction through an internal state mechanism that carries information across earlier and later time steps, which lets them predict words and sentences well. Deep learning is broadly applicable: given enough training data, a model can be trained for the task at hand. For short-time passenger flow prediction, a suitable model can be built and trained on the historical passenger flow of a subway line to predict future changes. However, a single LSTM network makes poor use of the raw data and extracts only temporal features. Passenger flow data is two-dimensional in time and space, and existing methods extract spatial features by fusing in a CNN, which yields some performance improvement.
However, these methods still have shortcomings: (1) insufficient peak prediction capability; (2) the noise component cannot be predicted properly, so the predictions are overly smooth and deviate from the real data; and (3) poor robustness to small amounts of irregularly shaped raw data.

Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is a signal decomposition algorithm proposed by Torres et al. as an extension of empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD). EMD decomposes a time series step by step into several intrinsic mode functions (IMFs) and a residual term res. Because of mode mixing, however, the time-frequency distribution can be wrong, so the IMFs are only formally correct and lose their physical meaning. Wu and Huang proposed EEMD, which counters mode mixing caused by factors such as intermittent high-frequency components by adding white noise. However, the amplitude of the added white noise cannot be quantified precisely, and mode mixing persists when the initial value is set incorrectly. Moreover, when the original signal is reconstructed, the added white noise cannot be removed effectively, which introduces a large error. Unlike EEMD, which adds white noise to the whole signal directly, CEEMDAN adds EMD-decomposed white noise components during the decomposition, which avoids excess white noise that cannot be removed when the decomposed signals are finally summed. CEEMDAN also averages each IMF over the ensemble as soon as it is extracted, which prevents an inaccurate IMF from propagating large errors into the subsequent decomposition. In the prior art, CEEMDAN has not been applied to short-time passenger flow prediction for rail transit.

Disclosure of Invention

To address these problems, the invention builds on the existing CNN-LSTM model by adding the CEEMDAN algorithm to its input layer to separate the main data from the noise data. The noise and the main data are trained and predicted separately, and the two predictions are finally combined into a more accurate passenger flow prediction. To remedy the insufficient peak prediction capability, the invention adds an attention mechanism over the time dimension to the CEEMDAN-CNN-LSTM structure and removes the pooling layer from the CNN to reduce information loss. The result is the CEEMDAN-ConvLSTM-Attention model, which improves passenger flow prediction performance.

Technical scheme

A short-time passenger flow prediction method based on a CEEMDAN algorithm and an attention mechanism is characterized by comprising the following steps:

S1, data preprocessing

First, the raw data is preprocessed into rail transit passenger flow data along two dimensions, time and space. The raw data consists of the transaction records collected by the rail transit system. Preprocessing proceeds as follows: the raw data is cleaned and aggregated using fields such as transaction time, transaction station and transaction type. For each station, non-subway transaction records (such as bus and ferry records) are filtered out and only subway station transactions are kept; the total number of passengers entering each subway station within each fixed time interval is counted and used as that station's passenger flow at the current time point.
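The aggregation in S1 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patent's implementation: records are taken to be `(timestamp, station, txn_type)` tuples, the label `"subway_entry"` marking subway entries is hypothetical, and `build_inflow_matrix` is an illustrative name.

```python
from collections import defaultdict
from datetime import datetime


def build_inflow_matrix(records, stations, start, interval_minutes=10, n_intervals=144):
    """Count subway entries per station per fixed time interval.

    records: iterable of (timestamp, station, txn_type) tuples (hypothetical layout).
    Assumption: txn_type == "subway_entry" marks subway entries; bus, ferry and
    other records carry different labels and are discarded.
    Returns an n_intervals x len(stations) list of lists:
    rows = time intervals, columns = stations.
    """
    col = {s: j for j, s in enumerate(stations)}
    counts = defaultdict(int)
    for ts, station, txn_type in records:
        if txn_type != "subway_entry" or station not in col:
            continue  # drop non-subway and unknown-station records
        idx = int((ts - start).total_seconds() // (interval_minutes * 60))
        if 0 <= idx < n_intervals:  # ignore records outside the study window
            counts[(idx, col[station])] += 1
    return [[counts[(t, j)] for j in range(len(stations))] for t in range(n_intervals)]


# Usage with a few hypothetical records
start = datetime(2015, 1, 1, 7, 0)
records = [
    (datetime(2015, 1, 1, 7, 5), "A", "subway_entry"),
    (datetime(2015, 1, 1, 7, 12), "A", "subway_entry"),
    (datetime(2015, 1, 1, 7, 15), "B", "subway_entry"),
    (datetime(2015, 1, 1, 7, 3), "A", "bus"),
]
matrix = build_inflow_matrix(records, ["A", "B"], start, interval_minutes=10, n_intervals=3)
# -> [[1, 0], [1, 1], [0, 0]]
```

The bus record is dropped, and the 07:12 and 07:15 entries both fall into the second 10-minute interval.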

The time-space two-dimensional matrix obtained after the raw data preprocessing is as follows:

where S denotes the station index, ranging from 1 to m, and t denotes the time interval index, ranging from 1 to n.

The passenger flow data of a given subway station k over the intervals from t-n to the last interval t-1 can be expressed as:

The passenger flow data of all subway stations in a given time interval i can be expressed as:
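Assuming rows index time intervals and columns index stations (S2.1 treats each column as one station's signal), the two views described above are a column slice and a row slice of the matrix. A minimal NumPy sketch with made-up toy values; the helper names are hypothetical and indices are 0-based rather than the 1-based indices of the text:

```python
import numpy as np

# Hypothetical toy matrix: 4 time intervals (rows) x 3 stations (columns)
P = np.array([[5, 2, 0],
              [8, 3, 1],
              [12, 6, 2],
              [9, 4, 1]])


def station_history(P, k, t, n):
    """Inflow of station k (column) over intervals t-n .. t-1 (rows)."""
    return P[t - n:t, k]


def interval_snapshot(P, i):
    """Inflow of all stations in interval i (one row)."""
    return P[i, :]


print(station_history(P, k=1, t=4, n=3))  # [3 6 4]
print(interval_snapshot(P, 2))            # [12  6  2]
```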

S2, input layer processing

In the input layer, the CEEMDAN algorithm decomposes the time-space two-dimensional passenger flow data into a main-part data matrix and a noise-part data matrix, which are then divided into a training set and a test set.

S2.1 CEEMDAN algorithm procedure

Each column of the original input matrix is regarded as a continuous-time signal X(t), the signal to be decomposed. The procedure is as follows:

(1) Gaussian white noise is added to the signal to be decomposed, where X(t) is the original signal, $n^i(t)$ is normally distributed Gaussian white noise, N is the number of noise realizations added, and $\xi_0$ is the noise standard deviation:

$X_i(t) = X(t) + \xi_0 n^i(t),\quad i = 1, 2, \dots, N$

(2) Each noisy signal is decomposed with the EMD algorithm; below, EMD(·) denotes the first IMF it extracts, giving N first-order IMF components:

$imf_1^i(t) = \mathrm{EMD}(X_i(t))$

(3) The average of these IMF components is taken as the first-order IMF component of the CEEMDAN decomposition:

$\overline{imf}_1(t) = \frac{1}{N}\sum_{i=1}^{N} imf_1^i(t)$

(4) The first-order residual is calculated from the first-order IMF component:

$res_1(t) = X(t) - \overline{imf}_1(t)$

(5) The first-order residual $res_1(t)$ is treated as a new signal and the above process is repeated to obtain the second-order IMF component and the second-order residual. After white noise is added, the new input signals are:

$res_1(t) + \xi_0 n^i(t),\quad i = 1, 2, \dots, N$

The second-order IMF component obtained by EMD decomposition is:

$\overline{imf}_2(t) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{EMD}\big(res_1(t) + \xi_0 n^i(t)\big)$

and removing it gives the second-order residual:

$res_2(t) = res_1(t) - \overline{imf}_2(t)$

(6) For k = 1, 2, …, K, the k-th residual is calculated as:

$res_k(t) = res_{k-1}(t) - \overline{imf}_k(t),\quad res_0(t) = X(t)$

Step (5) is repeated with each new residual as the new signal, up to order K, until the residual can no longer be decomposed (it is monotonic or has no more than two extreme points). The final, non-decomposable residual is denoted res(t).

(7) The final decomposition result is:

$X(t) = \sum_{k=1}^{K} \overline{imf}_k(t) + res(t)$

where res(t) is the residual left after the CEEMDAN decomposition into K orders of IMF components.

S2.2 The high-frequency, small-amplitude components obtained by the decomposition are summed to form the noise-part data, and the smoother low-frequency components are summed to form the main-part data. Combining the decomposition results of the time series of all stations yields the main-part data matrix and the noise-part data matrix.
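Assuming the decomposition returns the IMFs ordered from high to low frequency plus a residual, S2.2 reduces to two sums per station. In this sketch `n_high` (how many leading IMFs count as noise) is a hypothetical parameter; Example 1 below treats only IMF1 as noise, i.e. `n_high=1`. `decompose_matrix` applies any decomposition callable returning `(imfs, res)` column by column, for instance a CEEMDAN routine.

```python
import numpy as np


def split_noise_main(imfs, res, n_high=1):
    """Noise part = sum of the first n_high IMFs; main part = remaining IMFs + residual."""
    imfs = np.asarray(imfs, dtype=float)
    noise = imfs[:n_high].sum(axis=0)
    main = imfs[n_high:].sum(axis=0) + res
    return main, noise


def decompose_matrix(P, decompose, n_high=1):
    """Apply `decompose` to each station column of the (time x station) matrix P
    and stack the results into a main-part matrix and a noise-part matrix."""
    parts = [split_noise_main(*decompose(P[:, j]), n_high=n_high) for j in range(P.shape[1])]
    main = np.column_stack([m for m, _ in parts])
    noise = np.column_stack([nz for _, nz in parts])
    return main, noise
```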

S2.3 The decomposed data is normalized so that it lies within a fixed range, which avoids problems such as non-convergence caused by outlying sample values during training.

Because passenger flow is very unevenly distributed over time, with very large values at peaks and zero flow at some times during the night, normalization is indispensable. This scheme uses min-max normalization:

$Y = \frac{X - X_{min}}{X_{max} - X_{min}}$

where X is the value being normalized, Y is the normalized output, $X_{max}$ is the maximum of all data points and $X_{min}$ is the minimum of all data points. After normalization, all data points lie in the interval [0, 1].
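A minimal NumPy sketch of the min-max formula and its inverse (the inverse is needed again in model testing). The function names are illustrative. The text computes $X_{min}$ and $X_{max}$ over all data points; fitting them on the training portion only is a common alternative that keeps test-set information out of training.

```python
import numpy as np


def minmax_fit(X):
    """Return (X_min, X_max) over all data points."""
    X = np.asarray(X, dtype=float)
    return X.min(), X.max()


def minmax_transform(X, x_min, x_max):
    # Y = (X - X_min) / (X_max - X_min); assumes X_max > X_min
    return (np.asarray(X, dtype=float) - x_min) / (x_max - x_min)


def minmax_inverse(Y, x_min, x_max):
    # X = Y * (X_max - X_min) + X_min
    return np.asarray(Y, dtype=float) * (x_max - x_min) + x_min


X = np.array([[0.0, 50.0], [100.0, 25.0]])
lo, hi = minmax_fit(X)
Y = minmax_transform(X, lo, hi)  # [[0.0, 0.5], [1.0, 0.25]]
```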

S2.4 After normalization, the passenger flow data, which is continuous in the time dimension, is converted into the supervised learning sequence shape that the LSTM network accepts.

The input to the LSTM network must have this supervised learning sequence shape, so the original data is reshaped according to the prediction step size chosen for training. The prediction step size is the maximum length of historical passenger flow information that each target value can carry during training. Taking the input sequence $X_t = [x_1\ x_2\ \dots\ x_n]$ as an example with a prediction step of k, the converted data format is:

The main-part data matrix and the noise-part data matrix are each converted into supervised learning sequences and passed to S3 and S5.
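One common way to build these supervised sequences is a sliding window of length k: each sample holds k consecutive intervals and its target is the interval that follows. The sketch below assumes that layout, a (time x station) input matrix and a one-step-ahead target; `to_supervised` is a hypothetical name, and the series must be longer than k.

```python
import numpy as np


def to_supervised(data, k):
    """Sliding-window conversion.

    data: (T, m) matrix (rows = time, columns = stations) or a 1-D series.
    Returns X with shape (T - k, k, m) and targets y with shape (T - k, m),
    where X[s] = data[s : s + k] and y[s] = data[s + k].
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]  # treat a single series as one station
    X = np.stack([data[i:i + k] for i in range(len(data) - k)])
    y = data[k:]
    return X, y


X, y = to_supervised(np.arange(12).reshape(6, 2), k=3)
print(X.shape, y.shape)  # (3, 3, 2) (3, 2)
```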

S3, hidden layer processing

According to the invention, the hidden layer adds an attention mechanism to the CNN-LSTM model and removes the pooling layer, forming a ConvLSTM-Attention model that is trained separately on the main-part data and the noise-part data produced by the input layer. The process is as follows:

First, a convolutional layer extracts the spatial features of the two-dimensional matrix, and the resulting sequence of spatial features along the time dimension is fed into the LSTM network for temporal feature extraction. A Dropout layer deactivates part of the neurons to prevent overfitting (i.e., the model fits the training set but generalizes poorly to the test set). The output sequence of the LSTM network is then fed into the attention layer, which computes a weight for each element of the sequence and multiplies the weight by that element. Finally, a Flatten layer flattens the output matrix into a one-dimensional sequence that the output layer can accept.

The method uses a one-dimensional CNN for spatial feature extraction, with the main-part data matrix and the noise-part data matrix produced in S2 as the inputs of the two separate models. The convolutional layer is implemented as:

$Y = \sigma(W \ast X + b)$

where X is the input of the convolutional layer (the main-part or the noise-part data matrix), Y is the output of the convolutional layer, W is the weight learned during training, b is the bias learned during training, $\ast$ denotes convolution, and σ is the ReLU activation function.
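The text does not say along which axis the 1-D kernel slides. Assuming it slides across the station axis at each time step (so temporal order is untouched), a layer of the form Y = ReLU(W ∗ X + b) can be sketched as below. As in most deep-learning libraries, the "convolution" is computed as cross-correlation, stations are treated as neighbours in column order, and no pooling layer follows. Shapes and names are assumptions of this sketch.

```python
import numpy as np


def conv1d_spatial(X, W, b):
    """Pooling-free 1-D convolution over stations with ReLU activation.

    X: (k, m) window of k time steps x m stations.
    W: (F, w) bank of F kernels of width w;  b: (F,) biases.
    Returns (k, (m - w + 1) * F): one spatial feature vector per time step.
    """
    k, m = X.shape
    F, w = W.shape
    out = np.empty((k, m - w + 1, F))
    for j in range(m - w + 1):
        patch = X[:, j:j + w]           # (k, w) block of neighbouring stations
        out[:, j, :] = patch @ W.T + b  # (k, F) responses of all kernels
    return np.maximum(out, 0.0).reshape(k, -1)  # ReLU, then flatten per time step


Y = conv1d_spatial(np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]),
                   W=np.array([[1.0, -1.0]]), b=np.array([0.0]))
print(Y)  # [[0. 0.] [0. 1.]]
```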

The matrix after CNN feature extraction is $X_t = [x_{t-n}\ x_{t-(n-1)}\ \dots\ x_{t-1}]^T$, the distribution of the spatial feature values over time; this feature matrix serves as the input of the LSTM network for model training.

The long short-term memory network (LSTM) is a recurrent neural network model built on the LSTM concept proposed by Juergen Schmidhuber et al. The LSTM network includes an input layer, an LSTM layer, a fully connected layer and an output layer. The LSTM layer contains a forget gate, an input gate and an output gate, computed as follows:

$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i)$

$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f)$

$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o)$

$\tilde{C}_t = \tanh(w_C \cdot [h_{t-1}, x_t] + b_C)$

$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$

$h_t = o_t * \tanh(C_t)$

where $i_t$ is the input gate computation, $f_t$ and $\tilde{C}_t$ are the forget-gate and candidate-memory computations, $o_t$ is the output gate computation, $C_t$ and $h_t$ are the long-term and short-term memory states respectively, σ denotes the sigmoid activation function, * denotes the element-wise matrix product, and w and b are weights and biases. $h_t$ is the final output after each supervised learning sequence is fed into the model; the weights and biases are the parameters the model learns.
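A single LSTM step can be sketched in NumPy as follows. It implements the standard LSTM cell: the input, forget and output gates, the candidate memory $\tilde{C}_t = \tanh(w_C \cdot [h_{t-1}, x_t] + b_C)$ and the cell update $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$. Storing the parameters in a dict is a hypothetical layout chosen for readability; in practice a framework layer (for example Keras `LSTM`) would be used.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params: weights "w_i", "w_f", "w_o", "w_c" of shape (H, H + D) and
    biases "b_i", "b_f", "b_o", "b_c" of shape (H,) (hypothetical layout).
    """
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    i_t = sigmoid(params["w_i"] @ z + params["b_i"])       # input gate
    f_t = sigmoid(params["w_f"] @ z + params["b_f"])       # forget gate
    o_t = sigmoid(params["w_o"] @ z + params["b_o"])       # output gate
    c_tilde = np.tanh(params["w_c"] @ z + params["b_c"])   # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde                     # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def lstm_sequence(X, params, H):
    """Run the cell over a (k, D) sequence from zero state; return all h_t as (k, H)."""
    h, c = np.zeros(H), np.zeros(H)
    outs = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, params)
        outs.append(h)
    return np.stack(outs)
```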

In neural networks, attention is quantified as specific weight values. The attention mechanism of the invention is implemented by adding a dense layer with a softmax activation function after the LSTM network. This fully connected layer takes the LSTM output as input and computes the corresponding weight matrix through the softmax activation, so the weights are learned automatically during training. Softmax maps neuron outputs into the interval (0, 1), and the result can be regarded as a probability:

$\beta_k = \frac{e^{x_k}}{\sum_{i} e^{x_i}}$

where $x_k$ is the element whose weight is being computed and $x_i$ ranges over the historical data used to compute the current prediction. In the LSTM network, $\beta_t$ is the weight of the output $h_t$ at a given time; multiplying the weight by the data gives the enhanced output $h_t' = \beta_t h_t$, which serves as input to the next computation. Because the LSTM output is a two-dimensional matrix formed from multiple input sequences, a Flatten layer is added after the LSTM network to flatten it into a one-dimensional sequence that the output layer can accept.
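The softmax weighting and Flatten step can be sketched as below. How the dense layer produces its scores is not fully specified in the text; this sketch assumes a dense layer that maps each LSTM output $h_t$ to one score, so that softmax over the time axis yields one weight $\beta_t$ per time step. `w` and `b` stand in for learned parameters and the names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention(Hs, w, b=0.0):
    """Hs: (k, H) LSTM outputs; w: (H,) dense-layer weights; b: scalar bias.

    Returns the flattened (k * H,) vector of h_t' = beta_t * h_t and the weights beta.
    """
    scores = Hs @ w + b               # one score per time step
    beta = softmax(scores, axis=0)    # weights in (0, 1) that sum to 1
    weighted = beta[:, None] * Hs     # h_t' = beta_t * h_t
    return weighted.reshape(-1), beta  # Flatten layer
```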

S4, output layer processing

The output layer uses a fully connected layer to receive the output of the hidden layer and output the prediction result, i.e., the predicted value at each time point under the current model parameters.

S5, model training

In each iteration, the error between the prediction from the output layer and the true sequence is computed, and the model parameters are updated by an optimization algorithm.

After each training round, the loss value is computed first and the parameters are then updated by the optimization algorithm, so the loss, and with it the prediction error, decreases round by round. This scheme uses the Adam optimizer, which converges effectively and helps overcome problems such as sudden gradient drops. The loss function is the mean squared error (MSE):

$MSE = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$

where N is the total number of input samples, $y_i$ is the target value and $\hat{y}_i$ is the predicted value. The model parameters are updated with Adam to reduce the loss until the model converges.
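MSE and one Adam update can be written out in NumPy as follows, to make the training loop concrete. The Adam step follows Kingma and Ba's published update with its usual default hyperparameters; in a real model a framework optimizer (for example Keras `Adam`) would apply it to every weight tensor. The `state` dict layout is an assumption of this sketch.

```python
import numpy as np


def mse(y_true, y_pred):
    """MSE = (1/N) * sum_i (y_i - y_hat_i)^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


def adam_update(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam step to `param` given its gradient.

    state: dict with running moments "m", "v" and step counter "t"
    (hypothetical layout; initialise as {"m": 0.0, "v": 0.0, "t": 0}).
    """
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)
```

For example, fitting a single weight a in y = a·x by repeatedly calling `adam_update` with the analytic MSE gradient `np.mean(2 * (a * x - y) * x)` drives a towards the true slope.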

Model testing

The test set is fed into the trained models for prediction, and the predictions of the two models are combined: the noise prediction and the main-part prediction at the same time point are added to obtain the final passenger flow prediction. The predictions are then inverse-normalized, and the MAE and RMSE errors are calculated to compare model performance. The inverse normalization formula is:

$Y = \hat{y}(X_{max} - X_{min}) + X_{min}$

where Y is the inverse-normalized result, $\hat{y}$ is the current predicted value, and $X_{max}$ and $X_{min}$ are the maximum and minimum of the input data used during normalization.
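Combining the two models' outputs and scoring them can be sketched as follows. Because the main part and the noise part are normalized separately in S2.3, this sketch assumes each part is inverse-normalized with its own $(X_{min}, X_{max})$ before the two are added, which is one consistent way to carry out the combination and inverse normalization described above. Function names are hypothetical.

```python
import numpy as np


def inverse_minmax(Y, x_min, x_max):
    # X = Y * (X_max - X_min) + X_min
    return np.asarray(Y, dtype=float) * (x_max - x_min) + x_min


def combine_predictions(main_pred, noise_pred, main_range, noise_range):
    """Undo each part's normalization with its own (min, max), then add the parts."""
    return inverse_minmax(main_pred, *main_range) + inverse_minmax(noise_pred, *noise_range)


def mae(y, y_hat):
    return np.mean(np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)))


def rmse(y, y_hat):
    return np.sqrt(np.mean((np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)) ** 2))


pred = combine_predictions([0.5, 1.0], [0.0, 0.5], (0.0, 100.0), (-10.0, 10.0))
print(pred)  # [ 40. 100.]
```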

The invention has the beneficial effects that:

(1) The CEEMDAN-ConvLSTM-Attention model designed by the invention greatly improves the accuracy of short-time passenger flow prediction. The ConvLSTM fusion model makes full use of both the temporal and the spatial features in raw subway passenger flow data, so its predictions are far more accurate than those of a single LSTM network.

(2) The method incorporates an attention mechanism over the time dimension, which addresses the insufficient peak prediction capability of traditional models in short-time passenger flow prediction and makes the predictions more useful in practice.

(3) The model incorporates the CEEMDAN algorithm so that noise data and main data are predicted separately, which effectively addresses the overly smooth predictions of earlier methods and their poor fit to the real data where the data fluctuates.

Drawings

FIG. 1 is a diagram of the CEEMDAN-CNN-LSTM-Attention model structure

FIG. 2 shows the decomposition results of CEEMDAN algorithm

FIG. 3 shows the prediction results of CEEMDAN-ConvLSTM-Attention model

FIG. 4 shows the predicted results of the CNN-LSTM model

Detailed Description

The technical solutions provided in the present application will be further described with reference to the following specific embodiments and accompanying drawings. The advantages and features of the present application will become more apparent in conjunction with the following description.

As shown in fig. 1, a short-time passenger flow prediction method based on a CEEMDAN algorithm and an attention mechanism is characterized by comprising the following steps:

S1, data preprocessing

S2, input layer processing: the CEEMDAN algorithm decomposes the time-space passenger flow data into a main-part data matrix and a noise-part data matrix, and the training set and the test set are divided.

S3, hidden layer processing

The hidden layer adds an attention mechanism to the CNN-LSTM model and removes the pooling layer, establishing a ConvLSTM-Attention model that is trained separately on the main-part data and the noise-part data obtained by the input layer.

S4, output layer processing

S5, model training

Example 1

In this embodiment, a 2015 backrush rail transit transaction-record data set is used as the raw data. Data cleaning and related processing turn it into two-dimensional (space × time) data for multiple subway stations at 10-minute intervals, which is then divided into a training set and a test set.

In the actual experiments, since the LSTM model predicts smooth signals better, the high-frequency components are summed as the noise component and all other components are summed as the trend term, so that the main shape of the passenger flow signal is not lost through excessive decomposition. The decomposition is:

$X = \sum imf_{High} + \sum imf_i$

In step S2, the CEEMDAN decomposition result is shown in Fig. 2, where signal is the original passenger flow data of a single day, IMF1 is the high-frequency noise part obtained by the decomposition, and IMF2 is the main part. In practice, the data is not decomposed one day at a time; instead the whole data set is decomposed at once, so the decomposition does not break the continuity of the data. The decomposed data is then split along continuous time, which avoids unnecessary information loss and data alignment errors. The same splitting method is used for the working-day data set and the all-dates data set to keep the comparison fair.

In the CNN-LSTM module, a one-dimensional CNN with its pooling layer removed extracts the spatial features of the passenger flow data. Because the passenger flow data is the distribution over time of the flows at multiple subway stations, a one-dimensional CNN can extract spatial features across stations without destroying the temporal features. The CNN output is used as the input of the LSTM network for temporal feature extraction, so both temporal and spatial features are extracted. A softmax layer added after the LSTM network serves as the attention layer; it automatically learns the weight of each time point in every training round, further improving the passenger flow prediction.

Comparing Fig. 3 with Fig. 4 shows that the single CNN-LSTM network predicts poorly at the peaks, whereas the CEEMDAN-ConvLSTM-Attention model predicts the peaks better and fits the real data more closely. The following table compares the prediction errors of the CEEMDAN-ConvLSTM-Attention model with those of other models, using mean absolute error (MAE) and root mean squared error (RMSE) on both the working-day and the all-dates data sets.

The above description is only illustrative of the preferred embodiments of the present application and is not intended to limit the scope of the present application in any way. Any changes or modifications made by those skilled in the art based on the above disclosure should be considered as equivalent effective embodiments, and all the changes or modifications should fall within the protection scope of the technical solution of the present application.

Claims

1. A short-time passenger flow prediction method based on a CEEMDAN algorithm and an attention mechanism is characterized by comprising the following steps:

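S1, data preprocessing

S2, input layer processing

the CEEMDAN algorithm decomposes the time-space passenger flow data into a main-part data matrix and a noise-part data matrix, and a training set and a test set are divided;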

S3, hidden layer processing

the hidden layer adds an attention mechanism to the CNN-LSTM model and removes the pooling layer, establishing a ConvLSTM-Attention model that is trained separately on the main-part data and the noise-part data obtained by the input layer;

S4, output layer processing

a fully connected layer in the output layer receives the output of the hidden layer and outputs the prediction result, i.e., the prediction for each time point under the current model parameters;

S5, model training

2. The short-time passenger flow prediction method based on the CEEMDAN algorithm and the attention mechanism as claimed in claim 1, wherein step S1 is as follows:

the raw data is preprocessed into rail transit passenger flow data along two dimensions, time and space, the raw data being transaction records collected by the rail transit system;

the preprocessing is as follows: the raw data is cleaned and aggregated using fields such as transaction time, transaction station and transaction type, i.e., for each station, non-subway transaction records such as bus and ferry records are filtered out and only subway station transactions are kept; the total number of passengers entering each subway station within each fixed time interval is counted and used as that station's passenger flow at the current time point;

wherein S represents the station index, ranging from 1 to m, and t represents the time interval index, ranging from 1 to n; the passenger flow data of a given subway station k over the intervals from t-n to the last interval t-1 can be expressed as:

3. The short-time passenger flow prediction method based on the CEEMDAN algorithm and the attention mechanism as claimed in claim 1, wherein step S2 is as follows:

S2.1 CEEMDAN algorithm procedure

each column of the original input matrix is regarded as a continuous-time signal X(t), the signal to be decomposed; the procedure is as follows:

(1) Gaussian white noise is added to the signal to be decomposed, where X(t) is the original signal, $n^i(t)$ is normally distributed Gaussian white noise, N is the number of noise realizations added, and $\xi_0$ is the noise standard deviation:

$X_i(t) = X(t) + \xi_0 n^i(t),\quad i = 1, 2, \dots, N$

(2) each noisy signal is decomposed with the EMD algorithm to obtain N first-order IMF components:

$imf_1^i(t) = \mathrm{EMD}(X_i(t))$

(5) the first-order residual $res_1(t)$ is treated as a new signal and the above process is repeated to obtain the second-order IMF component and the second-order residual; after white noise is added, the new input signals are:

$res_1(t) + \xi_0 n^i(t),\quad i = 1, 2, \dots, N$

the second-order IMF component obtained by EMD decomposition is:

$\overline{imf}_2(t) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{EMD}\big(res_1(t) + \xi_0 n^i(t)\big)$

(6) for k = 1, 2, …, K, the k-th residual is calculated as:

$res_k(t) = res_{k-1}(t) - \overline{imf}_k(t),\quad res_0(t) = X(t)$

step (5) is repeated with each new residual as the new signal, up to order K, until the residual can no longer be decomposed (it is monotonic or has no more than two extreme points); the final, non-decomposable residual is denoted res(t);

(7) the final decomposition result is:

$X(t) = \sum_{k=1}^{K} \overline{imf}_k(t) + res(t)$

where res(t) is the residual left after the CEEMDAN decomposition into K orders of IMF components;

S2.2 the high-frequency, small-amplitude components obtained by the decomposition are summed to form the noise-part data, and the smoother low-frequency components are summed to form the main-part data; the decomposition results of the time series of all stations are combined to obtain the main-part data matrix and the noise-part data matrix;

S2.3 the decomposed data is normalized so that it lies within a fixed range, which avoids problems such as non-convergence caused by outlying sample values during training;

the scheme uses min-max normalization:

$Y = \frac{X - X_{min}}{X_{max} - X_{min}}$

where X is the value being normalized, Y is the normalized output, $X_{max}$ is the maximum of all data points and $X_{min}$ is the minimum of all data points;

S2.4 after normalization, the passenger flow data, which is continuous in the time dimension, is converted into the supervised learning sequence shape accepted by the LSTM network;

4. The short-time passenger flow prediction method based on the CEEMDAN algorithm and the attention mechanism as claimed in claim 1, wherein step S3 is as follows:

the hidden layer adds an attention mechanism to the CNN-LSTM model and removes the pooling layer, building a ConvLSTM-Attention model that is trained separately on the main-part data and the noise-part data obtained by the input layer; the process is as follows:

first, a convolutional layer extracts the spatial features of the two-dimensional matrix, and the resulting sequence of spatial features along the time dimension is fed into the LSTM network for temporal feature extraction; a Dropout layer deactivates part of the neurons to prevent overfitting; the output sequence of the LSTM network is fed into the attention layer, which computes a weight for each element of the sequence and multiplies the weight by that element; finally, a Flatten layer flattens the output matrix into a one-dimensional sequence that the output layer can accept;

a one-dimensional CNN is used for spatial feature extraction, with the main-part data matrix and the noise-part data matrix produced in S2 as the inputs of the two separate models; the convolutional layer is implemented as:

$Y = \sigma(W \ast X + b)$

where X is the input of the convolutional layer (the main-part or the noise-part data matrix), Y is the output of the convolutional layer, W is the weight learned during training, b is the bias learned during training, $\ast$ denotes convolution, and σ is the ReLU activation function;

the matrix after CNN feature extraction is $X_t = [x_{t-n}\ x_{t-(n-1)}\ \dots\ x_{t-1}]^T$, the distribution of the spatial feature values over time, and this feature matrix is the input of the LSTM network for model training;

the LSTM network comprises an input layer, an LSTM layer, a fully connected layer and an output layer; the LSTM layer comprises a forget gate, an input gate and an output gate, computed as follows:

$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i)$

$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f)$

$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o)$

$\tilde{C}_t = \tanh(w_C \cdot [h_{t-1}, x_t] + b_C)$

$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$

$h_t = o_t * \tanh(C_t)$

where $i_t$ is the input gate computation, $f_t$ and $\tilde{C}_t$ are the forget-gate and candidate-memory computations, $o_t$ is the output gate computation, $C_t$ and $h_t$ are the long-term and short-term memory states respectively, σ denotes the sigmoid activation function, * denotes the element-wise matrix product, and w and b denote weights and biases; $h_t$ is the final output after each supervised learning sequence is fed into the model, and the weights and biases are the parameters learned by the model;

the attention mechanism is implemented by adding a dense layer with a softmax activation function after the LSTM network; this fully connected layer takes the LSTM output as input and computes the corresponding weight matrix through the softmax activation, so the weights are learned automatically during training; softmax maps neuron outputs into the interval (0, 1), and the result can be regarded as a probability:

$\beta_k = \frac{e^{x_k}}{\sum_{i} e^{x_i}}$

where $x_k$ is the element whose weight is being computed and $x_i$ ranges over the historical data used to compute the current prediction; in the LSTM network, $\beta_t$ is the weight of the output $h_t$ at a given time, and multiplying the weight by the data gives the enhanced output $h_t' = \beta_t h_t$, which serves as input to the next computation; because the LSTM output is a two-dimensional matrix formed from multiple input sequences, a Flatten layer is added after the LSTM network to flatten it into a one-dimensional sequence that the output layer can accept.

5. The short-time passenger flow prediction method based on the CEEMDAN algorithm and the attention mechanism as claimed in claim 1, wherein step S5 is as follows:

after each training round, the loss value is computed first and the parameters are then updated by the optimization algorithm, so the loss, and with it the prediction error, decreases round by round; the scheme uses the Adam optimizer, which converges more effectively and helps overcome problems such as sudden gradient drops; the loss function is the mean squared error (MSE):

$MSE = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$

where N is the total number of input samples, $y_i$ is the target value and $\hat{y}_i$ is the predicted value; the model parameters are updated with Adam to reduce the loss until the model converges.