CN116797274A

CN116797274A - Shared bicycle demand prediction method based on Attention-LSTM-LightGBM

Info

Publication number: CN116797274A
Application number: CN202310855865.5A
Authority: CN
Inventors: 张瑞轩; 张宇航; 何佳磊; 刘冰慈; 李郅; 孙健
Original assignee: Changan University
Current assignee: Changan University
Priority date: 2023-07-12
Filing date: 2023-07-12
Publication date: 2023-09-22

Abstract

The invention belongs to the technical field of intelligent transportation and information, and in particular relates to a shared bicycle demand prediction method based on Attention-LSTM-LightGBM. The method comprises the following steps: acquiring historical characteristic data and historical demand data and merging them to obtain a time series data set; preprocessing the time series data set and performing correlation analysis on it; and dividing the time series data set into a training set, a verification set and a test set. The data of the test set are input into the Attention-LSTM model to obtain a first predicted value, and into the LightGBM model to obtain a second predicted value. The first predicted value and the second predicted value are fused by a BP neural network, the fused result is evaluated, and the evaluated fused result is taken as the final predicted result of the shared bicycle demand. Because the final result is predicted by combining the two models, the complex relationships in the data are captured more accurately and the prediction accuracy is improved.

Description

Shared bicycle demand prediction method based on Attention-LSTM-LightGBM

Technical Field

The invention belongs to the technical field of intelligent traffic and information, and particularly relates to a shared bicycle demand prediction method based on Attention-LSTM-LightGBM.

Background

With the continuing development of the sharing economy, cycling, an environmentally friendly and low-carbon way of travelling, has become increasingly widely accepted. Shared bicycles have become an important choice for short-distance urban travel: they make travel convenient, are low-carbon and environmentally friendly, and promote the construction of a green travel system. However, with the rapid development of the shared bicycle industry in China in recent years, the deployment and dispatching of shared bicycles face several challenges, such as imbalance between bicycle supply and demand caused by fluctuating demand, waste of resources caused by excessive deployment, and occupation of urban infrastructure. Demand forecasting can help shared bicycle companies plan vehicle distribution and deployment reasonably and improve utilization and service levels. Accurate prediction of shared bicycle demand is therefore of great significance: it allows the system to adapt better to spatio-temporal fluctuations in demand, reduces the piling up and disorderly parking of shared bicycles, lowers the operating cost of the platform, and improves user convenience and bicycle utilization.

At present, research on shared bicycle demand prediction mostly relies on conventional statistical methods, machine learning methods and deep learning methods. Because noisy data points may reduce the prediction accuracy of conventional statistical methods such as the ARIMA model, research on deep learning prediction has gradually increased in recent years. However, shared bicycle data exhibit seasonality, trend and periodicity, and a single model may not accurately capture the complex relationships in the data.

Disclosure of Invention

The invention aims to overcome the defects and provide a shared bicycle demand prediction method based on the Attention-LSTM-LightGBM, which utilizes a combination model to more accurately capture the complex relation of data and improves the prediction precision.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

the invention discloses a shared bicycle demand prediction method based on Attention-LSTM-LightGBM, which comprises the following steps:

acquiring historical characteristic data and historical demand data of a shared bicycle in a designated area;

combining the historical demand data and the historical characteristic data to obtain a time sequence data set, preprocessing and performing correlation analysis on the time sequence data set, and dividing the preprocessed and correlation analyzed time sequence data set into a training set, a verification set and a test set;

inputting data of the test set into an Attention-LSTM model to obtain a first predicted value, and inputting data of the training set and data of the verification set into a LightGBM model to obtain a second predicted value;

and fusing the first predicted value and the second predicted value by using the BP neural network, evaluating the fused result, and taking the evaluated fused result as a final predicted result of the shared bicycle demand.

The historical demand of the shared bicycle comprises the demand of a common user, the demand of a temporary user and the total demand, and the historical characteristic data comprises seasons, dates, holidays, weather categories, body surface temperature, temperature and humidity.

Weather data are weather data of the nearest weather observation station in the appointed area, and recording time intervals of shared bicycle demand data, body surface temperature, temperature and humidity are all 1 hour.

The data preprocessing and the data correlation analysis of the time sequence data set are specifically as follows:

combining the historical demand data with the historical characteristic data according to the time scales to construct a time sequence data set for sharing the demand of the bicycle;

converting season, holiday and weather category data in the time series data set into category data, and converting other characteristic data in the time series data set into floating point data;

checking whether the historical characteristic data in the converted time series data set contain missing values and, if so, either filling the missing values by cubic spline interpolation or deleting the data for the corresponding time from the time series data set;

examining every feature of the time series data set with a box plot, identifying data outside the inner and outer fences as outliers, and deleting the outliers.
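The box-plot rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the fence multiplier `k=1.5` (inner fence), the quantile method, the helper names `box_plot_fences` and `drop_outliers`, and the sample humidity readings are all assumptions introduced here.

```python
import statistics

def box_plot_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(rows, column, k=1.5):
    """Keep only the rows whose `column` value lies inside the fences."""
    lower, upper = box_plot_fences([r[column] for r in rows], k)
    return [r for r in rows if lower <= r[column] <= upper]

# Hypothetical hourly humidity readings with one implausible spike.
readings = [40, 42, 41, 43, 44, 42, 41, 400, 43, 42]
rows = [{"hour": h, "humidity": v} for h, v in enumerate(readings)]
clean = drop_outliers(rows, "humidity")
```

Calling the helper with `k=3.0` would apply the wider outer fence instead of the inner one.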

The data correlation analysis of the time series data set subjected to data preprocessing is specifically as follows:

performing correlation analysis on each feature of the preprocessed time series data set using the Spearman correlation coefficient to identify the features with a larger influence on shared bicycle demand, and deleting the features with a smaller correlation.
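A rank-based feature filter of this kind might look like the sketch below. The selection threshold, the function names and the toy feature columns are hypothetical; the text does not state which cut-off separates "larger" from "smaller" correlation.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman coefficient computed as the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def select_features(features, demand, threshold):
    """Keep the features whose |rho| with demand reaches the assumed threshold."""
    return [name for name, column in features.items()
            if abs(spearman(column, demand)) >= threshold]

# Toy columns (assumed data): two monotone features and one unrelated one.
demand = [10, 20, 30, 40, 50]
features = {
    "temperature": [1, 2, 3, 4, 5],
    "humidity": [5, 4, 3, 2, 1],
    "noise": [4, 1, 3, 5, 2],
}
kept = select_features(features, demand, threshold=0.5)
```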

The Attention-LSTM model is specifically as follows:

initializing the attention weights, multiplying the multivariate weather input data of time step length L by the initial attention weights, and weighting the importance of the time steps and of the multivariate weather input data according to the following formulas:

$$X = (x_1, x_2, \ldots, x_{24})$$

$$W = (w_1, w_2, \ldots, w_{24})$$

$$X' = (x_1 w_1, x_2 w_2, \ldots, x_{24} w_{24})$$

wherein X is the multivariate weather input data, W is the attention weight of the multivariate weather input data, and X' is the weighted weather feature;

inputting the weather feature $X' = (x_1 w_1, x_2 w_2, \ldots, x_{24} w_{24})$ into an LSTM, setting the LSTM network parameters, determining the number of hidden layers, the number of neurons per layer and the time step of the input features, initializing the weights and biases of the network layers, adopting the mean square error as the loss function of the model, and selecting Adam as the model optimizer;

the Attention weight is learned through a firework algorithm, the Attention weight is generated based on iteration of the firework algorithm to obtain a predicted value of the Attention-LSTM model, and the Attention weight is continuously optimized according to root mean square error between the predicted value and historical demand data.

The firework algorithm consists of the following steps:

initializing a weight set;

calculating a fitness function;

generating explosion sparks and Gaussian sparks;

the attention weights are iteratively updated.

The LightGBM model construction process is as follows:

selecting basic learners, the number of learners and the learning rate;

adjusting the sample and feature sampling parameters and setting the node splitting; applying feature engineering, feature encoding, feature normalization and vectorization to the valid data in the preprocessed time series data set, the processed data serving as the input data of the LightGBM algorithm; and optimizing with the histogram algorithm and the GOSS algorithm to obtain predicted values;

training a LightGBM model by data of the training set, and validating the LightGBM model by data of the validation set.

The method for fusing the first predicted value and the second predicted value by using the BP neural network specifically comprises the following steps:

the BP neural network has three hidden layers, each with 64 neurons; the dimension of the input layer is defined as 2 and the dimension of the output layer as 1. During training, the training set is divided into five folds by cross-validation; in each round the neural network is trained on four folds and its performance is validated on the remaining fold, thereby preventing overfitting. In each training iteration, the MSE is used as the loss function and the weights of the neural network are updated by the Adam optimizer.
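The five-fold scheme can be illustrated with a small index generator. This sketch assumes contiguous, unshuffled folds, which the text does not specify; `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; every fold is held out exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

# Ten samples split into five folds: four folds train, one fold validates.
folds = list(k_fold_indices(10, k=5))
```

Each pair would drive one round of training of the fusion network, with the validation fold used only to monitor overfitting.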

The error checking of the fusion result is specifically as follows:

taking the prediction result output by the BP neural network as the final predicted value of the shared bicycle demand, and selecting the mean absolute error and the root mean square error for evaluation during error checking, calculated as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$$

wherein MAE is the mean absolute error, RMSE is the root mean square error, n is the data set size, $y_i$ is the real data, and $\hat{y}_i$ is the predicted data.
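The two error measures named above can be computed as in this short sketch; the demand values in the example are made up for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between real and predicted demand."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error between real and predicted demand."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical hourly demand and fused predictions.
y_true = [100, 150, 200, 250]
y_pred = [110, 140, 205, 245]
errors = (mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because RMSE squares the residuals before averaging, it penalizes the two 10-bicycle misses more heavily than MAE does.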

Compared with the prior art, the invention has the following beneficial effects:

The method of the invention acquires historical characteristic data and historical demand data of shared bicycles in a designated area as the data basis for subsequent prediction. The historical demand data and the historical characteristic data are merged to obtain a time series data set, the time series data set is preprocessed and subjected to correlation analysis, and the resulting data set is divided into a training set, a verification set and a test set. The data of the test set are input into the Attention-LSTM model to obtain a first predicted value; through the attention mechanism, the LSTM model can adaptively attend to and learn the input features most relevant to the prediction target, which improves the prediction accuracy and generalization performance of the model. The data of the training set and the data of the verification set are input into the LightGBM model to obtain a second predicted value; LightGBM trains faster and uses less memory when processing large amounts of data, and performs well on high-dimensional data. The first predicted value and the second predicted value are fused by the BP neural network, the fused result is evaluated, and the evaluated fused result is taken as the final predicted result of the shared bicycle demand. Because the final result is predicted by combining the two models, the complex relationships in the data are captured more accurately and the prediction accuracy is improved.

Further, in the Attention-LSTM model, the Attention weight is learned through a firework algorithm, the Attention weight is generated based on iteration of the firework algorithm to obtain the predicted value of the Attention-LSTM model, and the Attention weight is continuously optimized according to the root mean square error between the predicted value and the historical demand data. The generation of gaussian sparks can further improve training efficiency.

Drawings

FIG. 1 is a flow chart of a shared bicycle demand prediction method of the present invention;

FIG. 2 is a heat map of the correlation analysis between the various indicators using Spearman correlation coefficients in accordance with the present invention;

FIG. 3 is a graph comparing the predicted effects of a combined model and a single model;

FIG. 4 is a graph of average absolute error analysis of the prediction results of different models;

FIG. 5 is a graph of root mean square error analysis of the predictions of different models.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

As shown in FIG. 1, a method for predicting the demand for shared bicycles based on Attention-LSTM-LightGBM is characterized by comprising the following steps:

The historical demand of the shared bicycle includes the common user demand, the temporary user demand and the total demand, and the historical characteristic data include seasons, dates, holidays, weather categories, body surface temperature, temperature and humidity.

Preferably, the recording time intervals of the historical demand data, the body surface temperature, the temperature and the humidity of the shared bicycle are all 1 hour, and the meteorological data of the nearest meteorological observation station in the appointed area is selected as the weather characteristic data of the shared bicycle in the area.

Combining the historical demand data and the historical characteristic data to obtain a time sequence data set, carrying out data preprocessing and data correlation analysis on the time sequence data set, and dividing the preprocessed and correlation analyzed time sequence data set into a training set, a verification set and a test set;

the data preprocessing and the data correlation analysis of the time series data set are specifically as follows:

The data correlation analysis of the data-preprocessed time-series data set is specifically as follows:

the Attention-LSTM model is specifically as follows:

initializing the attention weights, multiplying the multivariate weather input data of time step length L by the initial attention weights, the time step length L being one hour, and weighting the importance of the time steps and of the multivariate weather input data according to the following formulas:

$$X = (x_1, x_2, \ldots, x_{24})$$

$$W = (w_1, w_2, \ldots, w_{24})$$

$$X' = (x_1 w_1, x_2 w_2, \ldots, x_{24} w_{24})$$

where X is the multivariate weather input data, W is the attention weight of the multivariate weather input data, and X' is the weighted weather feature.

The weather feature $X' = (x_1 w_1, x_2 w_2, \ldots, x_{24} w_{24})$ is input into an LSTM; the LSTM network parameters are set, the number of hidden layers, the number of neurons per layer and the time step of the input features are determined, the weights and biases of the network layers are initialized, the mean square error is adopted as the loss function of the model, and Adam is selected as the model optimizer.
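The element-wise weighting that produces X' from X and W can be sketched as below. The two features per hour and the uniform starting weights are assumptions made for the example; in the method itself the weights come from the firework algorithm.

```python
def apply_attention(window, weights):
    """Scale every feature of time step i by that step's attention weight w_i."""
    if len(window) != len(weights):
        raise ValueError("one weight per time step is required")
    return [[w * value for value in step] for step, w in zip(window, weights)]

# Hypothetical 24-hour window with two features per hour (temperature, humidity).
window = [[20.0 + hour, 50.0] for hour in range(24)]
weights = [1.0 / 24] * 24  # uniform starting weights (an assumption)
weighted = apply_attention(window, weights)
```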

The LSTM adds a gating structure to the RNN, comprising a forget gate, an input gate and an output gate, so that the LSTM can memorize short-term weather features and gradually forget long-term weather features. In particular, the forget gate, a key operator in the gating structure, uses a sigmoid function to discard long-term features;

the Attention weight is learned through the firework algorithm, the Attention weight is generated based on iteration of the firework algorithm to obtain the predicted value of the Attention-LSTM model, the Attention weight is continuously optimized according to the root mean square error between the predicted value and the historical demand data, and the training efficiency can be further improved through Gaussian spark generation.

The firework algorithm consists of the following steps:

initializing a weight set;

calculating a fitness function;

generating explosion sparks and Gaussian sparks;

the attention weights are iteratively updated.

The LightGBM model building process is as follows:

selecting basic learners, the number of learners and the learning rate;

adjusting the sample (row) and feature (column) sampling parameters and setting the node splitting; processing the valid data in the preprocessed time series data set through feature engineering, feature encoding, feature normalization and vectorization, the processed data serving as the input data of the LightGBM algorithm; and then optimizing with the histogram algorithm and the GOSS algorithm to obtain a predicted value.

The first predicted value and the second predicted value are fused by the BP neural network, the fused result is evaluated, and the evaluated fused result is taken as the final predicted result of the shared bicycle demand. A comparison of the prediction performance of the combined model and the single models is shown in FIG. 3.

The BP neural network has three hidden layers, each with 64 neurons. The input data set contains two predicted values, namely the first predicted value and the second predicted value, so the dimension of the input layer is defined as 2 and the dimension of the output layer as 1. During training, the training set is divided into five folds by cross-validation; in each round the neural network is trained on four folds and its performance is validated on the remaining fold, so that overfitting is avoided. In each training iteration, the mean square error (MSE) is used as the loss function and the neural network weights are updated by the Adam optimizer. FIG. 4 shows the mean absolute error analysis of the prediction results of the different models, and FIG. 5 shows the root mean square error analysis of the prediction results of the different models.

The error checking of the fusion result is specifically as follows:

Preferably, the invention combines the advantages of the Attention-LSTM and LightGBM models, makes full use of the historical characteristic data and historical demand data of the shared bicycles, and improves the prediction accuracy. The attention weights in the Attention-LSTM model can be learned effectively by the firework algorithm, and the generation of Gaussian sparks can further improve training efficiency.

Through the attention mechanism, the LSTM model can adaptively attend to and learn the input features most relevant to the prediction target, which improves the prediction accuracy and generalization performance of the model.

The data set adopted by the invention is large and multidimensional; LightGBM trains faster and uses less memory when processing large amounts of data, and performs well on high-dimensional data.

First preferred embodiment:

S1, acquiring historical demand data of shared bicycles in a certain city and historical characteristic data near the operating sites of the shared bicycles, wherein the data set contains 10886 shared bicycle rental records for the city over the two years 2011 to 2012;

S2, merging the historical demand data and the historical characteristic data of the shared bicycle to obtain a time sequence data set, preprocessing and correlation analysis are carried out on the obtained time sequence data set, and the time sequence data set subjected to preprocessing and correlation analysis is divided into a training set, a verification set and a test set, wherein the time sequence data set is specifically as follows:

The season, holiday and weather category data in the time series data set are converted into categorical data, and the other data are converted into floating point data. Specifically, for the season data, the values 1, 2, 3 and 4 represent spring, summer, autumn and winter respectively; for the holiday data, 0 represents a working day and 1 represents a holiday; for the weather data, the values 1, 2, 3 and 4 represent sunny, cloudy, rainy or snowy, and severe weather respectively. The actual temperature and the body surface temperature are in degrees Celsius. The data of temporary users and member users of the shared bicycles are the original integer data.

Outliers are deleted, and missing values in the data set are filled by cubic spline interpolation, which imposes continuity conditions between adjacent small segments of data so that the whole data series is fitted smoothly.

The historical demand data and the historical characteristic data are combined into multivariate time series data to predict the shared bicycle rental demand of the region. The proposed model is evaluated using the 23 data points of 2012-12-19 in the data set as the test set.

The demand for shared bicycles is affected by various factors such as time, season, holidays, weather, temperature, humidity and wind speed. Within a day, daytime demand is clearly higher than night-time demand and shows pronounced morning and evening peaks, so that the daily curve has two peaks and two troughs and varies by time period; within a year, demand from April to October is higher than in the other months. To obtain model inputs that are strongly correlated with demand, Spearman correlation coefficient analysis is performed on the preprocessed data using the 10 influencing factors and the demand in the existing data, the correlation coefficient of any two factor variables X and Y being denoted $\rho_{X,Y}$. Data highly correlated with shared bicycle demand are obtained through the Spearman analysis and visualized as a heat map; the results are shown in FIG. 2.

Here $\rho_{X,Y}$ represents the Spearman correlation coefficient between features, N represents the number of observations, $X_i$ represents the i-th observation of one feature, and $Y_i$ represents the i-th observation of the other feature.
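The defining formula of the coefficient is not reproduced in this text. As a hedged reconstruction, the textbook form of the Spearman coefficient for N paired observations without tied ranks is given below; the rank difference $d_i$ and the rank operator are notation introduced here and are not taken from the original.

```latex
\[
\rho_{X,Y} = 1 - \frac{6 \sum_{i=1}^{N} d_i^{2}}{N\,(N^{2}-1)},
\qquad
d_i = \operatorname{rank}(X_i) - \operatorname{rank}(Y_i)
\]
```

When ties occur, the coefficient is instead computed as the Pearson correlation of the two rank sequences.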

As can be seen from FIG. 2, shared bicycle demand is related to holidays, seasons, weather category, air temperature, body surface temperature, humidity, temporary users and common users. However, in view of the inherent correlation between air temperature and body surface temperature, and between temporary user data and common user data, only the body surface temperature data and the common user data are retained. The multidimensional input therefore consists of the holiday flag, season, weather category, body surface temperature, humidity and common user data.

The preprocessed and correlation analyzed time series data set is divided into a training set, a validation set and a test set.

S3, learning the attention weights through the firework algorithm. A predicted value of the Attention-LSTM model is obtained based on the attention weights iteratively generated by the firework algorithm, and the attention weights are continuously optimized according to the root mean square error between the predicted value and the true value. Finally, the test set data are input into the trained Attention-LSTM model to obtain the first predicted value;

Past weather data are used as the input data. For time t, the sliding window length is 24, the sliding window data are $X = (x_1, x_2, \ldots, x_{24})$, and the corresponding past shared bicycle demand is $y = (y_1, y_2, \ldots, y_t)$.
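A sliding window of length 24 over hourly records could be built as in the sketch below. Pairing each window with the demand of the hour immediately after it is an assumption made here, since the text does not state the prediction horizon; `make_windows` and the toy data are hypothetical.

```python
def make_windows(features, demand, length=24):
    """Pair each run of `length` consecutive feature rows with the next demand value."""
    samples = []
    for start in range(len(features) - length):
        x = features[start:start + length]
        y = demand[start + length]
        samples.append((x, y))
    return samples

# Thirty hypothetical hours with one feature per hour.
features = [[float(hour)] for hour in range(30)]
demand = [hour * 10 for hour in range(30)]
samples = make_windows(features, demand)
```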

The idea of the Attention-based LSTM is to add an attention layer to the basic LSTM network. To avoid being trapped in a local optimum and to find globally optimal parameters as far as possible, the attention weights are learned through the firework algorithm and continuously optimized according to the root mean square error (RMSE) between the predicted value and the true value, while the Adam optimizer is used to learn and optimize the Attention-LSTM network parameters.

First, the attention layer weights are initialized, the initial weights being generated by the firework algorithm:

$$W = (w_1, w_2, \ldots, w_{24})$$

Based on the initial attention weights, the weather features at each time point in the sliding window are attended to according to their importance, giving the attention-weighted time window features:

$$X' = (x_1 w_1, x_2 w_2, \ldots, x_{24} w_{24})$$

Then X' is input into the Attention-LSTM network. Each LSTM unit applies a nonlinear mapping to the input features according to the following formulas:

$$i_t = \sigma(W_i \cdot [h_{t-1}, X'_t] + b_i)$$

$$f_t = \sigma(W_f \cdot [h_{t-1}, X'_t] + b_f)$$

$$c_t = \tanh(W_C \cdot [h_{t-1}, X'_t] + b_C)$$

$$C_t = f_t * C_{t-1} + i_t * c_t$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, X'_t] + b_o)$$

$$h_t = o_t * \tanh(C_t)$$

In these formulas, $\sigma(\cdot)$ represents the sigmoid activation function, $X'_t$ is the current input, $h_{t-1}$ is the output of the previous time step, and $[h_{t-1}, X'_t]$ represents the concatenation of the two vectors. In addition, $i_t$ is the output value of the input gate, $W_i$ is the weight of the input gate and $b_i$ is the bias term of the input gate; $f_t$ represents the output value of the forget gate in the LSTM unit, $W_f$ is the weight of the forget gate and $b_f$ is the bias term of the forget gate; $c_t$ represents the candidate cell state of the LSTM unit, tanh is the tanh activation function, $W_C$ is the weight of the tanh layer and $b_C$ is the bias term of the tanh layer; $C_t$ is the cell state at time t and $C_{t-1}$ is the cell state at time t-1; $o_t$ represents the output value of the output gate, $W_o$ is the weight of the output gate and $b_o$ is the bias term of the output gate; and $h_t$ is the output at the current time. The final predicted output value is obtained by concatenation:
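Returning to the six gate equations of the LSTM unit, a single time step can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the network used in the embodiment: the parameter dictionary layout, the helper names and the all-zero example weights are introduced here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W·v + b for a weight matrix W (list of rows) and a bias vector b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM step: gates i, f, o, candidate c, new cell state C and output h."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, X'_t]
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]
    c_t = [math.tanh(a) for a in affine(params["W_C"], params["b_C"], z)]
    C_t = [f * cp + i * c for f, cp, i, c in zip(f_t, C_prev, i_t, c_t)]
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]
    h_t = [o * math.tanh(C) for o, C in zip(o_t, C_t)]
    return h_t, C_t

# Hidden size 2, input size 1, all-zero parameters (assumed, for a checkable example).
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {name: zeros for name in ("W_i", "W_f", "W_C", "W_o")}
params.update({name: [0.0, 0.0] for name in ("b_i", "b_f", "b_C", "b_o")})
h_t, C_t = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params)
```

With zero weights every gate outputs 0.5 and the candidate state is 0, so the forget gate simply halves the previous cell state.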

The optimal attention weight combination in the attention layer of the Attention-LSTM network is generated through the firework algorithm. The firework algorithm consists of 6 parts, described in detail as follows:

S3131, generating an initial set of attention weights according to the dimension of the input data and the time step of 24, wherein each attention weight combination serves as one firework, the number of initial fireworks is set to 20, and the initial attention weights may be generated randomly; inputting the initial attention weight set into the Attention-LSTM model to obtain initial shared bicycle prediction results, and calculating the root mean square error (RMSE) between the prediction results and the historical demand data;

S3132, taking the root mean square error (RMSE) between the prediction result and the historical demand data as the fitness function of a firework, and determining the quality of each firework from its fitness function value;

S3133, the fireworks generate sparks by explosion, and the number of sparks generated by each firework and the magnitude of its explosion displacement need to be controlled; a firework with a smaller fitness function value generates more sparks, and the number of sparks and the magnitude of the explosion displacement are controlled by the following formulas:

wherein s is _i 、a _i Respectively represent fireworks W _i The number of sparks and the magnitude of the explosive displacement, k and l represent the preset maximum number of sparks and the magnitude of the explosive displacement, f, respectively _max And f _min Respectively representing the maximum value and the minimum value of the fitness function, wherein xi is a non-zero bias to prevent the number of sparks and the size of explosion displacement from sinking to zero;

S3134, generating Gaussian sparks from Gaussian random numbers with a mean of 0 and a variance of 1, which increases the diversity of the spark population and thereby speeds up convergence;

S3135, selecting, according to the fitness function, the best firework or spark among all fireworks and sparks to enter the next iteration; among the remaining fireworks or sparks, those farther from the others are more likely to enter the next iteration, the aim being to increase the diversity of the population, and the selection probability formula is as follows:

wherein $R(W_i)$ represents the sum of the distances between the i-th firework or spark individual and the other firework or spark individuals in the same iteration, and $p(W_i)$ is the probability that the i-th firework or spark individual is selected;
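The selection probability formula is likewise not reproduced in this text. Assuming the usual distance-based selection rule of the fireworks algorithm, it can be written as below, where K denotes the set of all fireworks and sparks in the current iteration (a symbol introduced here) and the choice of norm is left open.

```latex
\[
R(W_i) = \sum_{j \in K} \lVert W_i - W_j \rVert,
\qquad
p(W_i) = \frac{R(W_i)}{\sum_{j \in K} R(W_j)}
\]
```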

S3136, the number of iterations of the firework algorithm is 50, and iteration terminates when this preset number is reached.
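Steps S3131 to S3136 can be sketched end to end as follows. This is a simplified illustration and not the patented procedure: a toy RMSE against a fixed target vector stands in for the RMSE of the Attention-LSTM model, weights are confined to [0, 1], Gaussian sparks are added with an assumed step size, and survivors are drawn with replacement. All parameter names and default values other than the 20 fireworks and 50 iterations are assumptions.

```python
import random

def fireworks_search(fitness, dim, n_fireworks=20, iterations=50,
                     max_sparks=40, max_amplitude=0.5, n_gaussian=5,
                     gaussian_step=0.1, seed=0):
    """Minimise `fitness` over weight vectors in [0, 1]^dim; return (best, score)."""
    rng = random.Random(seed)
    eps = 1e-12  # plays the role of the non-zero offset

    def clip(v):
        return min(1.0, max(0.0, v))

    fireworks = [[rng.random() for _ in range(dim)] for _ in range(n_fireworks)]
    best, best_score = None, float("inf")

    for _ in range(iterations):
        scores = [fitness(w) for w in fireworks]
        f_max, f_min = max(scores), min(scores)
        spark_den = sum(f_max - s for s in scores) + eps
        amp_den = sum(s - f_min for s in scores) + eps
        candidates = [list(w) for w in fireworks]

        # Explosion sparks: a lower fitness gives more sparks and a smaller displacement.
        for w, s in zip(fireworks, scores):
            n_sparks = max(1, round(max_sparks * (f_max - s + eps) / spark_den))
            amplitude = max_amplitude * (s - f_min + eps) / amp_den
            for _ in range(n_sparks):
                candidates.append([clip(v + rng.uniform(-amplitude, amplitude)) for v in w])

        # Gaussian sparks: perturb a randomly chosen firework with N(0, 1) draws.
        for _ in range(n_gaussian):
            w = rng.choice(fireworks)
            candidates.append([clip(v + gaussian_step * rng.gauss(0.0, 1.0)) for v in w])

        cand_scores = [fitness(c) for c in candidates]
        elite = min(range(len(candidates)), key=cand_scores.__getitem__)
        if cand_scores[elite] < best_score:
            best, best_score = candidates[elite], cand_scores[elite]

        # Keep the elite; draw the rest with probability proportional to the
        # summed distance to all other candidates (with replacement, for brevity).
        rest = [i for i in range(len(candidates)) if i != elite]
        spread = [sum(sum(abs(a - b) for a, b in zip(candidates[i], c))
                      for c in candidates) + eps for i in rest]
        chosen = rng.choices(rest, weights=spread, k=n_fireworks - 1)
        fireworks = [candidates[elite]] + [candidates[i] for i in chosen]

    return best, best_score

# Toy fitness (assumed): RMSE between the weights and a fixed target vector.
target = [0.1, 0.4, 0.3, 0.2]

def toy_rmse(w):
    return (sum((a - b) ** 2 for a, b in zip(w, target)) / len(target)) ** 0.5

weights, score = fireworks_search(toy_rmse, dim=4, n_fireworks=5, iterations=30)
```

In the actual method, each fitness evaluation would require running the Attention-LSTM with the candidate weights, which makes the search far more expensive than this toy objective suggests.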

The hyperparameters of the Attention-LSTM model are set as follows:

Time step: 24;

Drop_out: 0.3;

Batch_size: 32;

Hidden_numbers: 128;

Optimizer: Adam.

And training a LightGBM model through data of the training set, verifying the LightGBM model through data of the verification set, and inputting data of the test set into the LightGBM model to obtain a second predicted value.

LightGBM is a tree-based gradient boosting method that adopts a histogram algorithm to reduce memory usage and computational cost. The predictive ability of the model is improved step by step by combining multiple trees over repeated iterations.

In each iteration, LightGBM weights the training data according to the previous residuals, discretizes the data using the histogram algorithm, and then finds the optimal split point to construct a decision tree. To further reduce time and space complexity, LightGBM uses the GOSS algorithm to sample the data and the EFB algorithm to bundle mutually exclusive features, thereby reducing the dimensionality.

The objective function of the LightGBM is:

wherein $l(y_i, F(x_i; \theta))$ is the loss function, $F(x_i; \theta)$ is the prediction function of the model, $\theta$ denotes the model parameters, and $\Omega(f_k)$ is a regularization term used to avoid overfitting.
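The objective function itself is not reproduced in this text. A hedged reconstruction consistent with the symbols defined above, using the usual regularized form for gradient-boosted trees, is given below; n (number of samples) and K (number of trees) are notation introduced here.

```latex
\[
\mathrm{Obj}(\theta) = \sum_{i=1}^{n} l\bigl(y_i, F(x_i; \theta)\bigr) + \sum_{k=1}^{K} \Omega(f_k)
\]
```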

After hyperparameter tuning, the hyperparameters of LightGBM in this embodiment are:

the base learner is GBDT, the number of base learners is 100, and the learning rate is 0.1;

sample (row) and feature (column) sampling: the sample sampling rate is 1.0, and the per-tree feature sampling rate is 1.0;

node splitting settings: the node splitting threshold is 0.001, the minimum number of samples in a leaf node is 10, the minimum weight of the samples in a leaf node is 0.002, and the maximum depth of the tree is 10;

the L1 regularization term is 1;

the L2 regularization term is 0.
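For readers who want to reproduce these settings, the sketch below writes them as a parameter dictionary using the parameter names of LightGBM's scikit-learn interface (`LGBMRegressor`). The mapping between the settings listed above and these parameter names is an assumption; the embodiment does not state which LightGBM interface was used.

```python
# Hypothetical mapping of the embodiment's settings onto the parameter names
# of LightGBM's scikit-learn interface; the patent does not name the API used.
lgbm_params = {
    "boosting_type": "gbdt",    # base learner: GBDT
    "n_estimators": 100,        # number of base learners
    "learning_rate": 0.1,       # learning rate
    "subsample": 1.0,           # sample (row) sampling rate
    "colsample_bytree": 1.0,    # per-tree feature (column) sampling rate
    "min_split_gain": 0.001,    # node splitting threshold
    "min_child_samples": 10,    # minimum number of samples in a leaf node
    "min_child_weight": 0.002,  # minimum sum of sample weights in a leaf node
    "max_depth": 10,            # maximum depth of the tree
    "reg_alpha": 1.0,           # L1 regularization term
    "reg_lambda": 0.0,          # L2 regularization term
}

# With LightGBM installed, the dictionary could be used as:
#   from lightgbm import LGBMRegressor
#   model = LGBMRegressor(**lgbm_params)
```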

S4: The first predicted value and the second predicted value are fused using the BP neural network, the fused result is evaluated, and the evaluated fused result is taken as the final predicted result of the shared bicycle demand. Because the final result is predicted by combining the two models, the complex relationships in the data are captured more accurately and the prediction precision is improved, which avoids the problem that a single model cannot accurately capture such relationships.

The first predicted value and the second predicted value are input into the BP neural network to train a BP neural network model. A BP neural network with three hidden layers is constructed, each hidden layer having 64 neurons. The input data set contains two predictors, namely the predicted values obtained by the Attention-LSTM and LightGBM models respectively, so the dimension of the input layer is 2 and the dimension of the output layer is 1. The Sigmoid function is used as the activation function; it maps any input value into the range between 0 and 1 and thereby limits the output values to that range. During training of the neural network, a cross-validation technique is used to divide the training set into five different folds.

In each training round, four folds are used to train the neural network and the remaining fold is used to verify the performance of the model, which helps avoid overfitting. In each training iteration, the mean square error (MSE) is used as the loss function, and the maximum number of iterations is 2000.
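The five-fold split described above can be sketched as follows. The function name `k_fold_indices` is hypothetical, and the use of contiguous, unshuffled folds is an assumption; the embodiment does not state whether the samples are shuffled before splitting.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, n_folds: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each fold, in order."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


# Toy example: 12 samples split into 5 folds of sizes 3, 3, 2, 2, 2.
folds = list(k_fold_indices(n_samples=12, n_folds=5))
```

In each of the five rounds, the network would be trained on `train` and checked on `val`, so every sample is used for verification exactly once.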

The neural network weights are updated using the Adam optimizer with a learning rate of 0.001; β₁, the exponential decay rate of the first-moment estimate, is 0.9, and β₂, the exponential decay rate of the second-moment estimate, is 0.999.
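As an illustration of how these optimizer settings enter a weight update, the following sketch implements the standard Adam update rule in plain Python and applies it to a toy one-parameter problem. The value of `EPSILON`, the function name `adam_step`, and the toy objective are assumptions; they are not taken from the embodiment.

```python
import math
from typing import List, Tuple

LEARNING_RATE = 0.001  # learning rate from the embodiment
BETA_1 = 0.9           # decay rate of the first-moment estimate
BETA_2 = 0.999         # decay rate of the second-moment estimate
EPSILON = 1e-8         # assumed; not stated in the embodiment


def adam_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
) -> Tuple[List[float], List[float], List[float]]:
    """Apply one Adam update at step t (1-based); return new params, m and v."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = BETA_1 * m_i + (1.0 - BETA_1) * g      # first-moment estimate
        v_i = BETA_2 * v_i + (1.0 - BETA_2) * g * g  # second-moment estimate
        m_hat = m_i / (1.0 - BETA_1 ** t)            # bias correction
        v_hat = v_i / (1.0 - BETA_2 ** t)
        new_params.append(p - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPSILON))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy objective f(w) = (w - 3)^2 starting from w = 0; its gradient is 2 * (w - 3).
w, m, v = [0.0], [0.0], [0.0]
for t in range(1, 2001):  # 2000 iterations, matching the stated maximum
    grad = [2.0 * (w[0] - 3.0)]
    w, m, v = adam_step(w, grad, m, v, t)
```

Because Adam normalizes the gradient by its running magnitude, each step moves the weight by roughly the learning rate, regardless of the raw gradient size.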

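The error checking of the fusion result is specifically as follows:

the predicted result output by the BP neural network is taken as the predicted value of the final shared bicycle demand and compared with the historical demand data, with the mean square error (MSE) and the root mean square error (RMSE) selected as evaluation indexes. The specific formulas are as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where n is the number of samples, y_i is the actual demand, and ŷ_i is the predicted demand.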
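The two evaluation indexes can be computed as in the following minimal sketch, which uses the standard definitions of MSE and RMSE. The function names and the toy demand values are assumptions for illustration only and are not experimental data.

```python
import math
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error between actual and predicted demand."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, i.e. the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Toy values (hypothetical): errors are -2, 2 and -3, so MSE = 17 / 3.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(mse(actual, predicted), rmse(actual, predicted))
```

RMSE is expressed in the same unit as the demand itself (bicycles per hour), which makes it easier to interpret than MSE.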

Experiment platform:

processor: Intel i5-11400H, memory: 16.0 GB;

operating system: Windows 10 (64-bit);

programming language version: Python 3.8;

integrated development environment: Anaconda and PyCharm.

For comparison, the experiment selects classical prediction models used in traffic flow prediction: support vector regression (SVR), AdaBoost regression, XGBoost regression, LightGBM, and the attention-based long short-term memory network (Attention-LSTM). Their prediction performance is compared with that of the Attention-LSTM-LightGBM model. The experimental results are compared in Table 1:

Table 1. Comparison of the prediction results of the classical prediction models and the Attention-LSTM-LightGBM model:

The results show that, when applied to shared bicycle demand prediction, the method provided by the invention obtains more accurate predicted values, and its prediction error is clearly lower than that of a traditional single machine learning or single deep learning model. The proposed shared bicycle demand prediction model can therefore predict the demand in each period of the day for shared bicycle operators, who can then adjust their distribution and scheduling strategies and improve the utilization rate and profitability of shared bicycles. It also provides an effective reference for traffic management departments in urban traffic planning and public transportation system optimization.

In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A shared bicycle demand prediction method based on Attention-LSTM-LightGBM, characterized by comprising the following steps:

2. The method for predicting the demand of a shared bicycle based on the Attention-LSTM-LightGBM according to claim 1, wherein the historical demand of the shared bicycle comprises a common user demand, a temporary user demand and a total demand, and the historical characteristic data comprises seasons, dates, holidays, weather categories, body surface temperature, temperature and humidity.

3. The method for predicting the demand of the shared bicycle based on the Attention-LSTM-LightGBM according to claim 2, wherein the weather data is weather data of the nearest weather observation station in the designated area, and recording time intervals of the demand data, the body surface temperature, the temperature and the humidity of the shared bicycle are all 1 hour.

4. The method for predicting the demand of the shared bicycle based on the Attention-LSTM-LightGBM according to claim 1, wherein the data preprocessing and the data correlation analysis of the time series data set are specifically as follows:

5. The method for predicting the demand of a shared bicycle based on the Attention-LSTM-LightGBM as set forth in claim 1, wherein the data correlation analysis of the time-series data set subjected to the data preprocessing in S2 is specifically as follows:

6. The shared bicycle demand prediction method based on the Attention-LSTM-LightGBM of claim 1, wherein the Attention-LSTM model is specifically as follows:

X = (x₁, x₂, ..., x₂₄)

W = (w₁, w₂, ..., w₂₄)

X' = (x₁w₁, x₂w₂, ..., x₂₄w₂₄)

7. The shared bicycle demand prediction method based on the Attention-LSTM-LightGBM of claim 6, wherein the firework algorithm consists of the following steps:

initializing a weight set;

calculating a fitness function;

generating explosion sparks and Gaussian sparks;

iteratively updating the attention weights.

8. The shared bicycle demand prediction method based on Attention-LSTM-LightGBM of claim 1, wherein the LightGBM model construction process is as follows:

S321, selecting the base learner, the number of learners and the learning rate;

S322, adjusting the sample and feature sampling parameters and setting node splitting; processing the valid data in the preprocessed time series data set through feature engineering, feature encoding, feature normalization and vectorization to serve as input data of the LightGBM algorithm; and optimizing through the histogram algorithm and the GOSS algorithm to obtain a predicted value;

S323, training the LightGBM model through the data of the training set, and verifying the LightGBM model through the data of the verification set.

9. The method for predicting the demand of the shared bicycle based on the Attention-LSTM-LightGBM of claim 1, wherein the fusing of the first predicted value and the second predicted value by the BP neural network in S4 is specifically as follows:

10. The method for predicting the demand of the shared bicycle based on the Attention-LSTM-LightGBM as set forth in claim 1, wherein the error checking of the fusion result in S4 is specifically as follows: