Disclosure of Invention
The invention aims to provide a neural network LSTM model-based multi-factor traffic flow prediction method, which is used for solving the problem that the prediction effect of an LSTM prediction model in the prior art is poor.
To achieve this aim, the invention adopts the following technical solution:
A multi-factor short-term traffic flow prediction method based on a neural network LSTM comprises the following steps:
Step 1: acquiring traffic flow data within a period of time, and preprocessing the traffic flow data to obtain short-time traffic flow data;
Step 2: screening the short-time traffic flow data obtained in step 1 according to weather records and holiday records, taking short-time traffic flow data that fall only in abnormal weather periods as a first data set, taking short-time traffic flow data that fall only in holiday periods as a second data set, taking short-time traffic flow data that fall in neither abnormal weather periods nor holiday periods as a third data set, and taking short-time traffic flow data that fall in both abnormal weather periods and holiday periods as a fourth data set;
Step 3: performing data cleaning, data reconstruction and normalization on the first data set, the second data set, the third data set and the fourth data set obtained in step 2;
Step 4: establishing an LSTM neural network model, selecting one of the normalized first, second, third and fourth data sets obtained in step 3 according to the weather condition and the holiday condition of the date to be predicted, training the LSTM neural network model with the selected data set, adjusting the LSTM parameters, and obtaining the traffic flow of the predicted date from the established LSTM neural network model.
Further, step 1 comprises the following substeps:
acquiring traffic flow data from a database of the highway toll station and preprocessing the traffic flow data by counting the number of vehicles every 15 min to obtain short-time traffic flow data, wherein the traffic flow data comprise the starting time of the toll booth, the entrance network number, the entrance shift, the entrance vehicle type, and the vehicle entry date and time.
Further, the step 2 of screening the short-term traffic flow data obtained in the step 1 according to the weather record and the holiday record comprises the following steps:
(1) inquiring weather records, screening the short-term traffic flow data obtained in the step 1 according to the weather records, obtaining weather influence intensity factors of different weathers according to a formula I, dividing a time period in which the weather influence intensity factor is greater than a weather threshold value into abnormal weather time periods,
wherein V_i represents the traffic flow under different weather conditions, T represents the total traffic flow in the month, i denotes the weather type, C_i represents the weather influence coefficient of each weather type, θ_i represents the weather influence intensity factor, and the weather threshold is the value of θ_i on sunny days;
(2) inquiring holiday records, screening the short-term traffic flow data obtained in the step 1 according to the holiday records, obtaining holiday influence coefficients according to a formula II, dividing the time period in which the holiday influence coefficients are larger than holiday threshold values into holiday time periods,
wherein E_j represents the traffic flow during different holidays, β_j represents the holiday influence coefficient, j denotes the holiday, and the holiday threshold is the value of β_j for non-holiday periods.
Further, in step 3, the MinMaxScaler method is adopted for data cleaning and data reconstruction.
Further, in step 4, training the LSTM neural network model with the selected data set, adjusting the LSTM parameters, and obtaining the traffic flow of the predicted date from the established LSTM neural network model specifically comprises:
establishing a forget gate and an input gate in the LSTM neural network model, using the forget gate to control how much of the selected data set is forgotten and the input gate to control the information of the predicted date, back-propagating gradient values by gradient descent, obtaining the optimal LSTM parameters after multiple iterations and thereby the optimal LSTM neural network model, and obtaining the traffic flow of the predicted date with the optimal LSTM neural network model.
Further, the traffic flow data acquired in the step 1 within a period of time is the traffic flow data within one month.
Compared with the prior art, the invention has the following technical characteristics:
1. Unlike the conventional use of an LSTM model for traffic flow prediction, the method takes a finer-grained approach: it removes the influence of other factors, such as weather and holidays, on the traffic flow, which improves prediction accuracy and makes traffic flow prediction for a future period more accurate and effective.
2. The invention provides a data processing method that makes the data cleaner by classifying the time periods corresponding to special weather and holidays.
3. The invention classifies the data sets and selects the better-matched data set for each prediction scenario, which improves prediction accuracy.
Detailed Description
The embodiment discloses a multi-factor prediction method based on a neural network LSTM, which specifically includes the following steps as shown in fig. 1:
Step 1: acquiring traffic flow data within a period of time, and preprocessing the traffic flow data to obtain short-time traffic flow data;
Step 2: screening the short-time traffic flow data obtained in step 1 according to weather records and holiday records, taking short-time traffic flow data that fall only in abnormal weather periods as a first data set, taking short-time traffic flow data that fall only in holiday periods as a second data set, taking short-time traffic flow data that fall in neither abnormal weather periods nor holiday periods as a third data set, and taking short-time traffic flow data that fall in both abnormal weather periods and holiday periods as a fourth data set;
Step 3: performing data cleaning, data reconstruction and normalization on the first data set, the second data set, the third data set and the fourth data set obtained in step 2;
Step 4: establishing an LSTM neural network model, selecting one of the normalized first, second, third and fourth data sets obtained in step 3 according to the weather condition and the holiday condition of the date to be predicted, training the LSTM neural network model with the selected data set, adjusting the LSTM parameters, and obtaining the traffic flow of the predicted date from the established LSTM neural network model.
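The data set selection in steps 2 and 4 can be sketched as a small lookup on two flags. This is only an illustration: the names `DataSet` and `select_data_set` are hypothetical, and the third data set is read here as covering periods that are neither abnormal-weather nor holiday periods.

```python
from enum import Enum


class DataSet(Enum):
    """The four data sets of step 2 (names are illustrative, not from the source)."""
    ABNORMAL_WEATHER_ONLY = 1  # first data set
    HOLIDAY_ONLY = 2           # second data set
    NEITHER = 3                # third data set (assumed: normal weather, non-holiday)
    BOTH = 4                   # fourth data set


def select_data_set(abnormal_weather: bool, holiday: bool) -> DataSet:
    """Pick the data set that matches the conditions of the date to be predicted."""
    if abnormal_weather and holiday:
        return DataSet.BOTH
    if abnormal_weather:
        return DataSet.ABNORMAL_WEATHER_ONLY
    if holiday:
        return DataSet.HOLIDAY_ONLY
    return DataSet.NEITHER
```

The same function can be applied to each 15 min record during screening, so that the training partition and the prediction-time selection use one rule.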
Specifically, step 1 includes the following substeps:
acquiring traffic flow data from a database of the highway toll station and preprocessing the traffic flow data by counting the number of vehicles every 15 min to obtain short-time traffic flow data, wherein the traffic flow data comprise the starting time of the toll booth, the entrance network number, the entrance shift, the entrance vehicle type, and the vehicle entry date and time.
Preferably, a SELECT statement is used to retrieve the traffic flow data from the database of the highway toll station. After preprocessing, only the useful information is kept, namely the date and time at which each vehicle enters the toll station. data1.resample(period).sum() is then used to downsample the data into 15 min intervals and count the vehicles, which yields the short-time traffic flow data for every 15 min.
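The 15 min counting step can be illustrated without any third-party library. The sketch below is an assumption-laden stand-in for the resampling call mentioned above: the function name `count_per_interval` is hypothetical, bins are aligned to midnight, and, unlike a typical resampling call, intervals with no vehicles are simply absent from the result.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_per_interval(entry_times, minutes=15):
    """Count vehicle entries per fixed-width interval, keyed by interval start."""
    width = timedelta(minutes=minutes)
    counts = Counter()
    for t in entry_times:
        midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
        # Integer number of whole intervals elapsed since midnight.
        bin_index = (t - midnight) // width
        counts[midnight + bin_index * width] += 1
    # Return the bins in chronological order.
    return dict(sorted(counts.items()))


# Made-up entry timestamps, for illustration only.
sample_times = [
    datetime(2018, 6, 1, 8, 0, 5),
    datetime(2018, 6, 1, 8, 14, 59),
    datetime(2018, 6, 1, 8, 15, 0),
    datetime(2018, 6, 1, 9, 7, 0),
]
flow = count_per_interval(sample_times)
```

Here the first two timestamps fall into the 08:00 bin, the third into the 08:15 bin, and the last into the 09:00 bin.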
Specifically, the step 2 of screening the short-term traffic flow data obtained in the step 1 according to the weather record and the holiday record comprises the following steps:
(1) inquiring weather records, screening the short-term traffic flow data obtained in the step 1 according to the weather records, obtaining weather influence intensity factors of different weathers according to a formula I, dividing a time period in which the weather influence intensity factor is greater than a weather threshold value into abnormal weather time periods,
wherein V_i represents the traffic flow under different weather conditions, T represents the total traffic flow in the current month, i denotes the weather type (for example sunny, cloudy, light rain, moderate rain, heavy rain, fog, etc.), C_i represents the weather influence coefficient of each weather type, θ_i represents the weather influence intensity factor, and the weather threshold is the value of θ_i on sunny days;
Analysis shows, for example, that the traffic flow is markedly reduced in heavy rain; the value of C_i (where i denotes a heavy-rain day) is then smaller than in normal weather. Hence the smaller the value of C_i, the greater the influence of the corresponding weather condition on the traffic flow.
The larger θ_i is, the greater the influence of the weather condition on the traffic flow. For example, taking part of the toll station exit data from June 2018, the total traffic flow in June is T = 855418, and the table is as follows:
From the above table, the daily traffic volume V_i of each day, the weather conditions and the total traffic flow of the month, T = 855418, are known, from which the influence coefficients C_i are determined as 0.032606281, 0.032351435, 0.029995862 and 0.030846908 respectively. Taking the weather conditions into account, analysis shows that the worse the weather (for example, light rain turning to moderate rain), the smaller the influence coefficient C_i. The intensity factors θ_i are then determined as 30.66893768, 30.91052993, 33.33793175 and 32.41816003 respectively, and analysis shows that the smaller the influence coefficient C_i, the larger the intensity factor θ_i. Finally, the weather is divided into different influence levels according to the intensity factor. For example, the intensity factor in light rain is 30.91052993, and its impact can be graded as 1; the intensity factor when light rain turns to moderate rain is 33.33793175, and its impact can be graded as 2.
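The formula linking the two quantities is not reproduced in this text, but the quoted numbers are consistent with the intensity factor being the reciprocal of the influence coefficient, θ_i = 1 / C_i. The short check below treats that relation as an assumption inferred from the values, not as the stated formula; `intensity_factor` is a hypothetical helper name.

```python
# Influence coefficients C_i and intensity factors theta_i as quoted in the text.
C = [0.032606281, 0.032351435, 0.029995862, 0.030846908]
THETA = [30.66893768, 30.91052993, 33.33793175, 32.41816003]


def intensity_factor(c):
    """Assumed relation theta_i = 1 / C_i (inferred from the quoted values)."""
    return 1.0 / c


computed = [intensity_factor(c) for c in C]

# Largest absolute difference between the recomputed and the quoted factors.
max_gap = max(abs(a - b) for a, b in zip(computed, THETA))

# Indices of the four days ordered from least to most affected (increasing theta_i).
ranking = sorted(range(len(C)), key=lambda k: computed[k])
```

Under this reading, a smaller share of the monthly flow gives a larger θ_i, which matches the statement that a smaller C_i corresponds to a larger intensity factor.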
Therefore, the value of θ_i is used to divide the influence of weather i (where i may be sunny, cloudy, rainy, foggy, etc.) on the short-time traffic flow into 4 grades:
(1) 0 represents no effect;
(2) 1 represents a slight effect;
(3) 2 represents an effect;
(4) 3 represents a greater effect.
This ensures the purity of the traffic flow data under normal weather conditions: only the data recorded under weather that has an influence are screened out and stored by category.
| Weather condition | Influence | Screened | Impact grade |
|---|---|---|---|
| Sunny | No | No | 0 |
| Cloudy | No | No | 0 |
| Light rain (cloudy turning to light rain) | Yes | Yes | 1 |
| Moderate rain (light rain turning to moderate rain) | Yes | Yes | 2 |
| Heavy rain (moderate rain turning to heavy rain) | Yes | Yes | 3 |
| Heavy fog | Yes | Yes | 3 |
| Haze | Yes | Yes | 3 |
If severe weather such as moderate snow, heavy snow, sandstorms, rainstorms or typhoons occurs, the data recorded under that weather can be deleted to ensure the purity of the data.
(2) Inquiring holiday records, screening the short-term traffic flow data obtained in the step 1 according to the holiday records, obtaining holiday influence coefficients according to a formula II, dividing the time period in which the holiday influence coefficients are larger than holiday threshold values into holiday time periods,
wherein E_j represents the traffic flow during different holidays, β_j represents the holiday influence coefficient, j denotes the holiday, and the holiday threshold is the value of β_j for non-holiday periods.
Analysis shows, for example, that the traffic flow during the National Day Golden Week in October increases markedly; E_j (where j here denotes the National Day holiday) is then larger than the non-holiday value. Since the total traffic flow T of the month is constant, the larger E_j is, the larger the resulting β_j, and the greater the influence of the holiday on the traffic flow. According to the number of days of the statutory holiday, its season, and its β_j, the impact can be divided into 3 levels:
(1) level 1 represents a slight effect;
(2) level 2 represents influence;
(3) level 3 represents a greater impact;
To make the traffic flow prediction more accurate, any holiday data contained in the data are screened out.
Further, in step 3, the MinMaxScaler method is adopted for data cleaning and data reconstruction.
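The scaler named here most likely refers to the min-max scaler of a common machine-learning library; that is an assumption, since the text does not say which implementation is used. The rescaling itself is simple enough to sketch with the standard library. The helper names `min_max_scale` and `inverse_min_max` are hypothetical, and mapping a constant series to the lower bound is a design choice of this sketch.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale values so that min -> lower bound and max -> upper bound."""
    lo, hi = min(values), max(values)
    a, b = feature_range
    if hi == lo:
        # Constant series: map every value to the lower bound to avoid dividing by zero.
        return [a for _ in values]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def inverse_min_max(scaled, lo, hi, feature_range=(0.0, 1.0)):
    """Map scaled predictions back to vehicle counts, given the original min and max."""
    a, b = feature_range
    return [lo + (s - a) * (hi - lo) / (b - a) for s in scaled]


# Made-up 15 min counts, for illustration only.
counts = [10, 20, 30]
scaled = min_max_scale(counts)
restored = inverse_min_max(scaled, min(counts), max(counts))
```

The inverse mapping matters in practice because the network is trained on scaled values, so its outputs have to be converted back before error metrics in vehicle counts are computed.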
Further, in step 4, training the LSTM neural network model with the selected data set, adjusting the LSTM parameters, and obtaining the traffic flow of the predicted date from the established LSTM neural network model specifically comprises:
establishing a forget gate and an input gate in the LSTM neural network model, using the forget gate to control how much of the selected data set is forgotten and the input gate to control the information of the predicted date, back-propagating gradient values by gradient descent, obtaining the optimal LSTM parameters after multiple iterations and thereby the optimal LSTM neural network model, and obtaining the traffic flow of the predicted date with the optimal LSTM neural network model.
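To make the role of the gates concrete, the following is a minimal sketch of one time step of a standard single-unit LSTM cell with scalar input and state. It is the textbook formulation, not necessarily the exact network used here: the output gate is part of the standard cell although the text names only the forget and input gates, the parameter names in `p` are hypothetical, and training by gradient descent is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell; p holds the scalar weights and biases."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: share of old state kept
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: share of new information admitted
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate cell state
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    c = f * c_prev + i * g                                   # updated cell state
    h = o * math.tanh(c)                                     # updated hidden state
    return h, c


# With all parameters at zero every gate outputs 0.5 and the candidate is 0,
# so the cell state is simply halved at each step.
zero_params = {k: 0.0 for k in
               ("wf", "uf", "bf", "wi", "ui", "bi", "wg", "ug", "bg", "wo", "uo", "bo")}
h1, c1 = lstm_step(x=0.3, h_prev=0.0, c_prev=2.0, p=zero_params)
```

In a real model the state and weights are vectors and matrices, and a sequence of scaled 15 min counts is fed through `lstm_step` one value at a time before a final layer produces the forecast.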
The invention also compares several model evaluation indexes to assess the quality of the prediction results, including the mean absolute percentage error (MAPE), the mean absolute error (MAE), the mean square error (MSE) and R squared.
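Wherein, R squared is:
$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$
where y_i denotes the raw data, \hat{y}_i the predicted data and \bar{y} the mean of the raw data; the numerator represents the sum of the squares of the differences between the predicted data and the mean of the raw data, and the denominator represents the sum of the squares of the differences between the raw data and the mean.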
For the mean absolute percentage error, the mean absolute error and the mean square error, the smaller the error, the better the prediction. R squared is the ratio of the regression sum of squares to the total sum of squares; it is a statistic that measures the goodness of fit of a multiple regression equation and reflects the proportion of the variation in the dependent variable that is explained by the estimated regression equation. The closer R squared is to 1, the better the fit of the regression. In the invention, R squared is used as the main basis, and the other evaluation indexes are used as references.
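The evaluation indexes can be computed directly from the actual and predicted series. The sketch below follows the definition given in the text for R squared (regression sum of squares over total sum of squares); note that many libraries instead compute one minus the residual sum of squares over the total, and the two agree only in special cases such as a least-squares fit. The function name `evaluate` is hypothetical, and MAPE assumes no actual value is zero.

```python
def evaluate(actual, predicted):
    """Return MAE, MSE, RMSE, MAPE (in percent) and R squared as defined in the text."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the actual values, so they must all be non-zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    ssr = sum((p - mean_actual) ** 2 for p in predicted)  # regression sum of squares
    sst = sum((a - mean_actual) ** 2 for a in actual)     # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": mse ** 0.5, "MAPE": mape, "R2": ssr / sst}


# Made-up values, for illustration only.
metrics = evaluate(actual=[100, 200, 300], predicted=[110, 190, 300])
```

RMSE is included because it is reported alongside MSE and MAE when prediction results are compared.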
Example 1:
In this example, the data of June 2018 are aggregated at 15 min intervals. The data are provided by a toll station in Xi'an, which has a temperate, semi-humid continental monsoon climate with four distinct seasons, mild temperatures and moderate rainfall, with most of the rainy season concentrated in spring and summer. The weather was observed to be light rain turning overcast on three consecutive days, from June 16 to June 18.
First, the data from June 1 to June 18 are used as the training set and June 19 as the test set to predict the traffic flow on June 19. Past weather data show that June 19 was cloudy, so its weather differed from that of the preceding three days; the test results are shown in FIG. 2. Next, the data from June 16 to June 18 are deleted to exclude the three consecutive days of light rain turning overcast; the remaining data are used as the training set and June 19 as the test set to predict the traffic flow on June 19, and the prediction results are shown in FIG. 3(a) and FIG. 3(b).
Comparing FIG. 2(a) and 2(b) with FIG. 3(a) and 3(b), the evaluation indexes MSE, RMSE and MAE in FIG. 2(b) are significantly larger than those in FIG. 3(b), and R squared in FIG. 3(b) is about 1% larger than in FIG. 2(b); the closer R squared is to 1, the better the fit of the regression. The comparison shows that weather factors affect traffic flow prediction, and that the prediction improves after the weather-affected data are removed.
Example 2:
In this example, to examine the effect of holidays on traffic flow prediction, the data of February 2018 are selected as the training set; given the local climate, there is little rain or snow in winter, which minimizes the influence of weather factors. Because the Spring Festival in February is the most important holiday of the year, and the 15 days before and after it are the peak of the Spring Festival travel rush, the February data are well suited to illustrating the influence of holidays on traffic flow. The February data are used as the training set for the LSTM model, and the data of March 1 are used as the test set. The results are shown in FIG. 4(a) and FIG. 4(b): MSE and RMSE are relatively large and R squared is below 0.92, whereas R squared is generally above 0.93 when the LSTM model predicts well. The reason is that the whole of February is the busiest period of the Spring Festival travel rush, which has ended by March, so using the holiday data of February to predict March 1 degrades the prediction. This example illustrates the influence of holiday factors on traffic flow prediction with the LSTM model.