CN107610469B

CN107610469B - Day-dimension area traffic index prediction method considering multi-factor influence

Info

Publication number: CN107610469B
Application number: CN201710955116.4A
Authority: CN
Inventors: 翁剑成; 邸小建; 林鹏飞; 王晶晶; 付宇; 毛力增
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2017-10-13
Filing date: 2017-10-13
Publication date: 2021-02-02
Anticipated expiration: 2037-10-13
Also published as: CN107610469A

Abstract

The invention discloses a day-dimension regional traffic index prediction method considering multi-factor influence, which comprises the following steps: dividing and aggregating traffic areas; preprocessing the original regional traffic index data; and predicting the regional traffic index in the day dimension while considering multi-factor influence. The specific technical scheme is as follows: on the basis of traffic cell division, traffic cells with the same aggregation property are aggregated and a regional traffic index is calculated; the prediction time period and prediction period are determined based on the early-warning requirements of road network operation; the regional traffic data are preprocessed by extraction, missing-value compensation and outlier elimination, and a historical data factor attribute set is constructed from several perspectives; the congestion state of the regional road network is predicted based on decision tree theory; and the final predicted regional traffic index is determined using the squared Euclidean distance. On the one hand, the method deepens the monitoring application of the running state of the urban road network; on the other hand, it provides technical support for early warning and forecasting of the road network running state.

Description

Day-dimension area traffic index prediction method considering multi-factor influence

Technical Field

The invention relates to a day-dimension area traffic index prediction method considering multi-factor influence, and belongs to the field of traffic data mining application and traffic information prediction.

Background

With the improvement of traffic informatization and intelligence, cities and regions have implemented traffic operation monitoring of varying scope and content, providing strong support for the safe, efficient and green operation of the traffic system. Given massive monitoring data, how to move from passive monitoring of the traffic running state to more active early warning, prediction and corresponding control measures has become a core problem of growing concern to industry authorities. If the road network runs inefficiently, the normal functioning of the city and the travel of its citizens are inevitably seriously affected. Research on and practice with early-warning and prediction models for urban road networks therefore provide strong data support for the active prevention and control of abnormal traffic states, and strongly promote the management work of industry authorities and the improvement of operation scheduling.

Research at home and abroad on traffic prediction has mainly focused on short-term prediction, that is, predicting at time t the traffic flow at the next decision time t+1 or at several subsequent times. The prediction span between t and t+1 is generally considered not to exceed 15 min. Short-term traffic flow prediction models mainly include statistical models, Kalman filtering models, nonparametric regression models, neural network models and models based on chaos theory, and these models all perform well in short-term traffic flow prediction. However, the literature shows that traffic flow prediction research is dominated by short-term prediction, chiefly dynamic prediction for the next few hours or the next day. Medium- and long-term prediction is rarely applied, so it cannot help industry managers grasp the prospective operating condition of the road network over a longer horizon. Meanwhile, the factors influencing the road network state have not been finely divided, and the various factors that may affect the traffic flow running state have not been fully considered.

The method first divides traffic cells according to traffic planning principles, reduces the number of regional evaluation objects through spatial autocorrelation analysis, and then obtains the regional traffic index. A historical sample database of traffic state evolution series is established by combining attribute data such as adverse weather data, large-scale activity records, traffic control and time events, through data screening, elimination and discrimination. Through numerical tests, a regional traffic index prediction model considering multi-factor influence is constructed, and regional traffic index prediction in the day dimension is realized. The method helps management decision makers identify in advance the areas and time periods where high-risk congestion is likely to occur in the coming week. Combined with a corresponding traffic operation early-warning mechanism, it lays a foundation for guiding and reasonably distributing traffic demand and for keeping traffic smooth, so that traffic operation is safe, green and efficient.

Disclosure of Invention

The invention aims to provide a day-dimension regional traffic index prediction method considering multi-factor influence, which is used for acquiring the regional traffic index change trend in a period in advance so as to realize advanced prevention and control and early warning and forecast of the road network operation condition. The method provides support for improving the operation efficiency of the road network, reducing the congestion condition and the accident occurrence probability and improving the operation safety service level of the road network during peak traveling.

In order to achieve the purpose, the technical scheme adopted by the invention is a daily dimension area traffic index prediction method considering multi-factor influence, and the method specifically comprises the following steps:

Step 1, dividing and aggregating traffic areas;

Step 1.1, dividing traffic cells based on the road network structure;

Factors such as land use, administrative divisions, natural landforms and road network structure are comprehensively considered, and the analysis area is divided into a number of traffic cells. Because traffic demand differs greatly between the inner and outer ring areas of a city, areas with high traffic demand are divided into smaller cells and areas with low traffic demand into larger cells.

Step 1.2, aggregating traffic districts based on spatial autocorrelation analysis;

To make the evaluation of the regional road network running state more targeted and accurate, trivial traffic cells are merged, and regions with similar road network running states are aggregated using a spatial autocorrelation division method. The local Moran index (LISA index for short) is used as the local spatial autocorrelation test index to identify the clustering property of the operating state within the region, that is, traffic cells are spatially clustered according to a property-similarity criterion.
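As an illustration of the local spatial autocorrelation test in step 1.2, the following Python sketch computes a local Moran statistic for each traffic cell and labels each cell by comparing its value and the average of its neighbors with the overall mean. The function name local_moran, the adjacency dictionary, the row-standardized binary weights and the sample values are all assumptions made for illustration; the source does not specify the spatial weight matrix or the scaling convention.

```python
def local_moran(values, neighbors):
    """Return a list of (I_i, quadrant) tuples, one per traffic cell.

    values    -- attribute value of each cell (e.g. a congestion measure)
    neighbors -- dict mapping a cell index to the indices of adjacent cells
                 (hypothetical adjacency; equal weights per neighbor assumed)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]      # deviation of each cell from the mean
    m2 = sum(d * d for d in z) / n      # second moment, used for scaling
    results = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Spatial lag: average deviation of the adjacent cells.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local_i = z[i] * lag / m2 if m2 else 0.0
        # First letter: cell vs. mean; second letter: spatial lag vs. mean.
        quadrant = ("H" if z[i] >= 0 else "L") + ("H" if lag >= 0 else "L")
        results.append((local_i, quadrant))
    return results


cells = [8.0, 7.5, 2.0, 1.5]                        # made-up cell values
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # made-up chain of cells
result = local_moran(cells, adjacency)
```

Cells that share a label such as "HH" or "LL" and are adjacent would be candidates for aggregation into one region; a significance test, which a full LISA analysis normally includes, is omitted here.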

Step 2, determining relevant prediction parameters of the regional traffic index;

The prediction time interval and the prediction period are important parameters in traffic prediction. The prediction time interval represents the minimum time unit of the traffic state change data series. Regional traffic index prediction aims to predict in advance the overall trend of the road network running state of a region in the coming week, and in particular to accurately identify the regions with high road network operating pressure during traffic peak periods so that corresponding relief measures can be prepared in advance. Therefore, the prediction time interval and the prediction period of the regional traffic index should be determined by comprehensively considering the efficiency and accuracy requirements of the prediction model in practical application.

Step 3, preprocessing the original data of the regional traffic index;

Step 3.1, calculating a regional traffic index;

The specific calculation steps are as follows:

S1, calculating the initial regional traffic index R_m: with a statistical interval of not more than 15 minutes, calculate the ratio of the free-flow speed to the actual average travel speed for the road sections of each grade in region m. With reference to the road section traffic grade division standard, count the road mileage at the severe congestion level for each road grade in the whole road network and in the road network of region m, take the proportion of severely congested mileage in the road network of region m as the weight, and calculate the initial regional traffic index R_m according to formula (1).

In formula (1), α represents the time period; m represents the region number; p represents the number of road sections in region m; L_{αm} represents the road mileage at the severe congestion level in the road network of region m in period α; and the two remaining terms represent, respectively, the free-flow speed on the p road sections in region m in period α and the actual average speed on the p road sections in region m in period α.

S2, calculating the regional traffic index RTI: after R_m data have been accumulated over a period of time, the initial regional traffic index is normalized according to formula (2) to obtain the regional traffic index RTI with a value range of [0, 10].

RTI = 10 × (R − R_min) / (R_max − R_min) (2)

In the formula, RTI represents the regional traffic index; R represents the initial regional traffic index; R_min represents the minimum value of the initial regional traffic index in the historical data series; and R_max represents the maximum value of the initial regional traffic index in the historical data series.
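The normalization in S2 can be illustrated with a short Python sketch. It assumes that formula (2) is ordinary min-max scaling of the initial index onto [0, 10], which is what the stated value range and the definitions of the historical minimum and maximum suggest. The function name normalize_rti and the clipping of values outside the historical range are illustrative assumptions, not part of the method as disclosed.

```python
def normalize_rti(r, r_min, r_max):
    """Scale an initial regional index r onto [0, 10] (assumed min-max form).

    r_min, r_max -- minimum and maximum of the initial index in the
                    historical data series
    """
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    rti = 10.0 * (r - r_min) / (r_max - r_min)
    # ASSUMPTION: a new value outside the historical range is clipped.
    return min(10.0, max(0.0, rti))


print(normalize_rti(1.5, 1.0, 2.0))  # midpoint of the historical range
```

Clipping keeps the index inside the grading scale when a new observation exceeds the historical extremes; recomputing the minimum and maximum as data accumulate would be an alternative.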

Step 3.2, making up missing values by using original data;

The rules for compensating missing values in the original data are as follows:

S1, extracting from the original data the data series whose missing proportion is less than or equal to 15%, and compensating the discontinuous parts of those series;

S2, when the data of a single time point are missing, using the arithmetic mean of the data at the two adjacent time points as the recovered data;

S3, when the data of several consecutive time points are missing, extracting the historical data RTI_i of the same period in the previous i weeks, with w_i denoting the weight corresponding to RTI_i; the missing data RTI are calculated as follows:

RTI = Σ_i w_i · RTI_i (3)

where 0 < w_i < 1; according to temporal distance and degree of temporal correlation, the weights satisfy w_{i+1} < w_i and Σ_i w_i = 1;

i does not exceed 3.
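The two compensation rules can be sketched in Python as follows. The sketch assumes that a missing value is stored as None, that the historical values are ordered from the most recent week backwards, and that the weights decrease with temporal distance and sum to 1; the function names and the sample weights 0.5, 0.3 and 0.2 are hypothetical.

```python
def fill_single_gaps(series):
    """Replace an isolated missing point by the mean of its two neighbors."""
    filled = list(series)
    for t in range(1, len(series) - 1):
        if (series[t] is None and series[t - 1] is not None
                and series[t + 1] is not None):
            filled[t] = (series[t - 1] + series[t + 1]) / 2
    return filled


def fill_from_history(history, weights):
    """Weighted sum of same-period values from the previous weeks.

    history -- RTI at the same time slot 1, 2, ... weeks earlier (at most 3)
    weights -- strictly decreasing weights in (0, 1) that sum to 1
    """
    if len(history) != len(weights) or not 1 <= len(history) <= 3:
        raise ValueError("need matching history and weights for 1 to 3 weeks")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(later >= earlier for earlier, later in zip(weights, weights[1:])):
        raise ValueError("weights must decrease with temporal distance")
    return sum(w * h for w, h in zip(weights, history))


print(fill_single_gaps([6.0, None, 7.0]))
print(fill_from_history([6.0, 7.0, 8.0], [0.5, 0.3, 0.2]))
```

Runs of consecutive missing points are left untouched by fill_single_gaps, so the two functions can be applied in sequence.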

Step 3.3, removing abnormal values from the original data;

The rules for rejecting abnormal values from the original data are as follows:

S1, calculating the front difference and the back difference of the index value at each time in the data series;

B_{1_t} = RTI_t − RTI_{t−1} (4)

B_{2_t} = RTI_{t+1} − RTI_t (5)

In the formulas, B_{1_t} represents the front difference of the index value at a given time; B_{2_t} represents the back difference of the index value at that time; RTI_t represents the index data at the current time; RTI_{t−1} represents the index data at the previous time; and RTI_{t+1} represents the index data at the next time.

S2, calculating the fluctuation index of the index value at each time;

In the formula, Z represents the fluctuation index of the index value at a given time; B_{1_t} represents the front difference of the index value at that time; B_{2_t} represents the back difference of the index value at that time; and RTI_t represents the regional traffic index at the current time.

S3, judging whether a value is a singular value according to the Z value calculated in S2 of step 3.3, with 15% as the decision limit: if Z is greater than 15%, the value is regarded as a singular value and removed.
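The rejection procedure can be outlined in Python as below. The front and back differences follow formulas (4) and (5) and the 15% limit follows S3, but the expression for the fluctuation index Z is not reproduced in this text, so the sketch takes it as a caller-supplied function. The example_fluctuation function shown is a purely hypothetical stand-in chosen only to make the sketch runnable; it should not be read as the formula of the method.

```python
def reject_outliers(series, fluctuation, limit=0.15):
    """Mark singular values as None using front and back differences.

    fluctuation -- callable (b1, b2, rti_t) -> Z, supplied by the caller
    limit       -- decision limit for Z (15% in the text)
    """
    cleaned = list(series)
    for t in range(1, len(series) - 1):
        b1 = series[t] - series[t - 1]   # front difference, formula (4)
        b2 = series[t + 1] - series[t]   # back difference, formula (5)
        if fluctuation(b1, b2, series[t]) > limit:
            cleaned[t] = None            # singular value is removed
    return cleaned


def example_fluctuation(b1, b2, rti_t):
    # HYPOTHETICAL stand-in for Z: large when the index jumps up and
    # immediately back down (or the reverse), relative to its level.
    return abs(b1 - b2) / (2 * rti_t) if rti_t else float("inf")


raw = [7.0, 7.1, 8.9, 7.2, 7.3]          # made-up series with one spike
print(reject_outliers(raw, example_fluctuation))
```

The first and last points have only one neighbor and are left unchanged in this sketch.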

Step 3.4, carrying out regional traffic index grading treatment;

The regional traffic index is divided into 5 classes using the lower-threshold division principle. The classification result is used by the decision tree to predict the congestion state level, and once classification is complete the index data are used to predict the traffic index by Euclidean distance.

TABLE 1 Road traffic operation level division

Regional Traffic Index (RTI)	0≤RTI＜2	2≤RTI＜4	4≤RTI＜6	6≤RTI＜8	8≤RTI≤10
Road network operation level	Clear	Basically clear	Light congestion	Moderate congestion	Severe congestion
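The grading in Table 1 amounts to a simple lookup, sketched here in Python. The function name congestion_level and the use of a 0-based level number are illustrative assumptions; the interval boundaries are those of Table 1, with each class including its lower bound and the last class also including 10.

```python
# Level names in the order of Table 1 (translated labels).
LEVELS = ["Clear", "Basically clear", "Light congestion",
          "Moderate congestion", "Severe congestion"]


def congestion_level(rti):
    """Return the 0-based operation level for an RTI value in [0, 10]."""
    if not 0 <= rti <= 10:
        raise ValueError("RTI must lie in [0, 10]")
    # Each class spans 2 index units; RTI = 10 belongs to the last class.
    return min(int(rti // 2), 4)


print(congestion_level(7.3), LEVELS[congestion_level(7.3)])
```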

Step 3.5, constructing a historical data factor attribute set;

Since the change in the regional traffic index is affected by a variety of factors, a factor attribute set must first be determined for the training sample set. The factor attribute set is divided into region attributes, date attributes, weather attributes and event attributes. The date and weather attributes are global factors influencing the road network running state, while the region and event attributes are local factors that may occur in a specific area.

TABLE 2 Factor attribute selection

Step 4, constructing a regional traffic index prediction model

Step 4.1, predicting the grade of the running congestion state of the regional road network;

The regional traffic index decision tree is generated from the training sample set constructed in step 3.5. The process mainly comprises division and selection, updating of the regional traffic index decision tree, and prediction of the regional road network running state level.

(1) Recursively dividing the regional index samples to build the tree

First, let the training data set at a node be D, and calculate the Gini index of every factor, including the region, date, weather and event attributes. For each feature attribute A and each value a that it may take, D is divided into two parts D_1 and D_2 according to whether a sample point tests "yes" or "no" for A = a, and the Gini index for A = a is calculated using formula (7) and formula (8).

Gini(D) = 1 − Σ_{k=1}^{K} p_k^2 (7)

where Gini(D) represents the uncertainty of set D; K represents the total number of categories; k represents the category index; and p_k represents the probability that a sample point belongs to class k.

Gini(D, A) = (|D_1| / |D|) · Gini(D_1) + (|D_2| / |D|) · Gini(D_2) (8)

where Gini(D, A) represents the uncertainty of set D after division by A = a.

Among all possible features A and all their possible split points a, the feature with the smallest Gini index and its corresponding split point are selected as the optimal feature and the optimal split point. Two child nodes are generated from the current node, and the training data set is distributed to the two child nodes according to the feature.

The procedure is called recursively on the two child nodes until a stopping condition is met.

The CART decision tree is thus generated.

To avoid overfitting, the minimum number of samples required at a leaf node or the maximum depth of the tree is set.
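The split selection described in (1) can be sketched in Python using the standard CART definitions of the Gini index, which match the symbol definitions given for formulas (7) and (8). The sketch only picks the best binary split of one node; recursion and the stopping rules are omitted. The function names, the attribute names "weather" and "workday" and the four sample records are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini index of a set of class labels: 1 minus the sum of squared shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(rows, labels, attr, value):
    """Weighted Gini index after splitting on the test rows[attr] == value."""
    d1 = [y for r, y in zip(rows, labels) if r[attr] == value]
    d2 = [y for r, y in zip(rows, labels) if r[attr] != value]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


def best_split(rows, labels):
    """Return the (attribute, value) pair with the smallest Gini index."""
    candidates = {(a, r[a]) for r in rows for a in r}
    # repr() breaks ties deterministically between equally good splits.
    return min(candidates,
               key=lambda c: (gini_split(rows, labels, *c), repr(c)))


# Made-up samples: attribute records and congestion levels (classes).
rows = [{"weather": "rain", "workday": True},
        {"weather": "rain", "workday": True},
        {"weather": "clear", "workday": True},
        {"weather": "clear", "workday": False}]
levels = [4, 4, 2, 2]
print(best_split(rows, levels))
```

In this toy data the weather attribute separates the two congestion levels perfectly, so it is chosen over the workday attribute.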

(2) Updating regional traffic index decision trees

The accuracy of the model is strongly affected by the accuracy of the weather forecast, and timely updating of historical data, particularly historical weather factors, helps to improve model accuracy. The method therefore provides a refinement mechanism for dynamically updating the historical training library of the regional traffic index. In the training library, on the one hand, only the historical data of the n months before the prediction period are retained, in order to improve the running speed of the algorithm; on the other hand, before the i-th period is predicted, the actual weather conditions of period i−1 are updated.

(3) Inputting the attribute values of the prediction period to predict the congestion state level

Attribute information for the coming week, such as tail-number traffic restrictions, weather conditions, large-scale activities and traffic control, is collected, and the generated regional traffic index decision tree is used to obtain a coarse classification of the traffic operation state level in the prediction period. In the division and selection process, the division criterion, namely the critical value of each attribute variable, needs to be determined.

Step 4.2, using the squared Euclidean distance to predict the regional traffic index;

The squared Euclidean distance is used to find the regional traffic indexes of the historical states most similar to the current prediction state. Define Y = {y_1, y_2, …, y_Q} as the current prediction state vector, and let the historical state vectors with the same coarse classification form the set X, whose s-th vector is X_s = {x_{s1}, x_{s2}, …, x_{sQ}}. The squared Euclidean distance between a historical state vector and the prediction state vector is calculated as follows:

C_s = Σ_{q=1}^{Q} (x_{sq} − y_q)^2 (9)

In the formula, C_s represents the squared Euclidean distance between the s-th historical state and the prediction state with the same coarse classification result; x_{sq} represents the value of the q-th attribute in the s-th historical state vector in the data set X with the same coarse classification result; y_q represents the value of the q-th attribute in the prediction state vector Y; q = 1, 2, …, Q, and Q is a positive integer.

The regional traffic indexes whose squared Euclidean distance is smaller than the threshold c form the set V = {V_1, V_2, …, V_Z}. The threshold c is taken as the c-th percentile of the Euclidean distances, chosen so that the mean absolute error between the predicted and actual values of the regional traffic index is minimized.

The final predicted regional traffic index is:

P_f = (1/Z) · Σ_{z=1}^{Z} V_z (10)

In the formula, P_f represents the predicted index value, and Z is the number of data items in set V.
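Step 4.2 can be sketched in Python as follows. The sketch assumes that all attributes have already been encoded as numbers, that the threshold is taken at a caller-chosen percentile of the distances, and that the final index is the arithmetic mean of the selected historical indexes; the last two points are interpretations of the text rather than details it states outright. The function names and the sample vectors are hypothetical.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length state vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def predict_index(history, y, percentile):
    """Average the RTI of the historical states closest to y.

    history    -- list of (state_vector, rti) pairs that share the coarse
                  congestion class predicted by the decision tree
    percentile -- share of distances (0-100] used to set the threshold
    """
    distances = sorted((squared_distance(x, y), rti) for x, rti in history)
    k = max(1, math.ceil(len(distances) * percentile / 100))
    threshold = distances[k - 1][0]
    # ASSUMPTION: states exactly at the threshold are kept, so ties count.
    selected = [rti for d, rti in distances if d <= threshold]
    return sum(selected) / len(selected)


# Made-up numerically encoded states and their historical RTI values.
history = [([1, 0, 3], 7.0), ([1, 0, 4], 7.4), ([0, 1, 9], 5.0),
           ([1, 1, 3], 7.2), ([0, 0, 0], 4.0)]
print(predict_index(history, [1, 0, 3], percentile=40))
```

In practice the percentile would be tuned on held-out weeks so that the mean absolute error is smallest, as the text describes for the threshold c.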

When the regional traffic index calculation model is constructed, the proportion of road mileage at the severe congestion level in the regional road network is used as the weight.

The historical data factor attribute set constructed in step 3.5 is divided into region, date, weather and event attributes. The date and weather attributes are global factors influencing the road network running state, while the region and event attributes are local factors that may occur in a specific area. Specifically, the factors are: region, month, time period, workday, holiday, day of the week, student vacation, tail-number restriction, weather, special event, major event and traffic control. The various factors that may affect the road network running state are comprehensively considered, and the set supports continuous expansion and updating.

In step 4.1, a refinement mechanism for dynamically updating the regional traffic index historical training library is established. The attribute information of the historical data is updated in a timely manner while a high running speed of the algorithm is maintained, and the error caused by weather factor information in the historical data is minimized.

After the congestion level is determined using the regional traffic index decision tree, the squared Euclidean distance method is further applied to find the traffic index of the historical state closest to the prediction period, which serves as the traffic index of that period.

Compared with the prior art, the invention has the following obvious advantages and beneficial effects:

(1) The invention fully considers the various factors that influence the road network running state, such as region, date, weather and events, and proposes a regional traffic index prediction method based on decision tree theory. Taking prediction requirements and application feasibility into account, it achieves regional traffic index prediction at the day dimension for each region. It overcomes the shortcomings of previous research, which focused only on short-term prediction of traffic information and therefore made it difficult to evaluate the overall operating condition of the road network in advance and to take active prevention and control measures early.

(2) The invention refines the prediction object from the network-wide traffic index to the regional traffic index, so that the prediction result is more practical and describes the operating characteristics of the regional road network more accurately. The prediction process is easy to operate; moreover, as historical data accumulate, the factor attribute set can be further updated and improved so that the various influencing factors are considered in detail, providing data support for urban road network forecasting and early warning.

(3) Iterative updating of the historical data effectively improves model accuracy. The method establishes a refinement mechanism for dynamically updating the regional traffic index historical training library. The attribute information of the historical data is updated in a timely manner while a high running speed of the algorithm is maintained, and the error caused by weather factor information in the historical data is minimized.

(4) Testing and analysis of model accuracy show that the mean absolute error between the predicted and actual values of the regional traffic index is essentially kept within 0.6, and the mean relative error stays between 4% and 10%. The method predicts well during peak periods on both working days and non-working days, and is therefore feasible for day-dimension regional traffic index prediction.

Drawings

Fig. 1 is a schematic diagram of traffic cell aggregation based on spatial autocorrelation analysis;

FIG. 2 is a flow chart of the raw data preprocessing of regional traffic indexes;

FIG. 3 is a flow chart of regional traffic index prediction based on decision tree theory;

FIG. 4 shows the predicted morning-peak traffic index in the Guomao area for April 17-23, 2017;

FIG. 5 shows the predicted evening-peak traffic index in the Guomao area for April 17-23, 2017;

FIG. 6 is a flow chart of the method of the present invention.

Detailed Description

The traffic index of the Guomao area of Beijing is selected as the prediction object. The traffic index of the area for April 17 to 23, 2017 is predicted using the medium- and long-term regional traffic index prediction method based on decision tree theory, and the model accuracy is verified on the morning-peak and evening-peak indexes.

The specific implementation steps are as follows:

Step 1, dividing key areas of interest;

Comprehensively considering land use, administrative divisions, natural landforms and road network structure, and on the premise that administrative districts are not split and natural dividers such as rivers and railways serve as traffic cell boundaries, Beijing is divided into 1911 traffic cells. Because traffic demand differs greatly between the inner and outer ring areas of the city, the cells are divided at different granularity: cells inside the Fifth Ring Road are smaller, and cells in the outer ring areas are larger. This reduces the workload as far as possible and makes investigation and analysis more practicable while still meeting the accuracy requirement.

On the basis of the above division, the local Moran index is used to test local spatial autocorrelation and to measure the degree of autocorrelation between region m and its adjacent regions. For areas exhibiting spatial autocorrelation, spatial clustering is performed by comparing, in turn, the cell attribute value x_m and its corresponding spatial lag x_{m,-1} with the mean of the attribute variable. Traffic cells with the same aggregation property are then further aggregated.

Step 2, determining relevant prediction parameters of the regional traffic index;

Traffic state changes over consecutive 5 to 15 minute intervals usually show a certain stability and regularity. Regional traffic index prediction from a medium- and long-term perspective aims to predict in advance the overall trend of the road network running state of a region in the coming week. Therefore, based on the operating characteristics of the road network and the prediction requirements, 30 minutes is taken as the prediction time interval. In addition, the method only predicts the periods with strong prediction demand and pronounced traffic flow changes, and sets the prediction time period to the 18 hours from 5:00 in the morning to 23:00 in the evening.
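The prediction parameters chosen here translate directly into a list of daily time slots, as the following Python sketch shows. The function name prediction_slots and the representation of each slot by its start time are illustrative assumptions; the 30-minute interval and the 5:00 to 23:00 window are taken from the text.

```python
from datetime import datetime, timedelta


def prediction_slots(start="05:00", end="23:00", minutes=30):
    """List the start times of the prediction intervals within one day."""
    current = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    slots = []
    while current < stop:
        slots.append(current.strftime("%H:%M"))
        current += timedelta(minutes=minutes)
    return slots


slots = prediction_slots()
print(len(slots), slots[0], slots[-1])
```

With these parameters each region has 36 intervals to predict per day, so a one-week prediction period covers 252 intervals per region.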

Step 3, preprocessing the original data of the regional traffic index;

The raw data are preprocessed by screening, compensation and elimination according to the data preprocessing flow shown in FIG. 2. The preprocessed data set is shown in the following table:

TABLE 3 Preprocessed regional traffic index data (excerpt)

Area name	Date and time	Traffic index	Congestion level
Guomao area	201703251800	7.3	3
Guomao area	201703251805	7.5	3
Guomao area	201703251810	7.6	3
Guomao area	201703251815	7.8	3
Guomao area	201703251820	7.6	3

Then a historical data factor attribute set is constructed. Taking the Guomao area as an example, its region ID is 18. Sample data are shown in the following table:

TABLE 4 Training sample data example

Step 4, constructing a regional traffic index prediction model;

The factor attribute set and the preprocessed regional traffic index data are integrated to form the training sample library required for prediction. The date attributes, weather conditions, large-scale activities and other relevant information for the prediction week are looked up, and the regional traffic index is predicted according to the prediction flow shown in FIG. 3.

TABLE 5 Basic information for April 17-23, 2017

TABLE 6 Predicted peak-period traffic index in the Guomao area on weekdays

Time period	April 17	April 18	April 19	April 20	April 21
7:00	5.3	6.9	5.2	5.2	3.1
7:30	7.0	6.9	7.8	6.9	6.9
8:00	6.9	6.9	6.9	6.3	5.4
8:30	6.2	6.9	6.9	6.9	6.8
9:00	5.2	7.3	5.9	6.4	6.8
17:00	7.0	7.6	7.0	7.1	7.1
17:30	7.1	7.1	8.3	8.3	8.3
18:00	8.4	7.7	8.3	8.3	8.2
18:30	6.8	8.3	7.1	7.1	7.2
19:00	7.0	7.0	5.0	5.0	6.1

To evaluate the prediction model, the mean absolute error, mean relative error, root mean square error and error distribution probability (the proportion of data whose absolute error is less than 0.5) are used as evaluation indexes, and the accuracy of the decision-tree-based medium- and long-term regional traffic index prediction model is verified for peak and off-peak periods on working days and non-working days. The results are shown in the following table:

TABLE 7 Prediction results of the peak traffic index in the Guomao area on weekdays

The statistics show that the mean absolute error between the predicted and actual values of the regional traffic index is kept within 0.6 and the mean relative error stays between 4% and 10%. Prediction accuracy is good in every period, and the results for peak periods are better than those for off-peak periods. The root mean square error in each test period is about 0.5, which indicates that the dispersion of the error is small and reflects, to some extent, the error stability of the prediction model. The error distribution probability shows that the absolute error of more than 80% of the data is essentially within 0.5, and that in peak periods the absolute error of more than 90% of the data series is below 0.5, which basically meets the requirements of prediction work.
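The four evaluation indexes named above can be computed as in the following Python sketch. The function name evaluate, the dictionary keys and the sample numbers are hypothetical and serve only to exercise the code; they are not the verification data of the embodiment. The relative error is taken with respect to the actual value, which is an assumption.

```python
from math import sqrt


def evaluate(predicted, actual, tolerance=0.5):
    """Return MAE, mean relative error, RMSE and the share of small errors."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    n = len(errors)
    return {
        "mae": sum(errors) / n,
        # ASSUMPTION: relative error is measured against the actual value.
        "mre": sum(e / a for e, a in zip(errors, actual)) / n,
        "rmse": sqrt(sum(e * e for e in errors) / n),
        # Error distribution probability: share with absolute error < 0.5.
        "within_tolerance": sum(e < tolerance for e in errors) / n,
    }


# Invented numbers, used only to show the call.
scores = evaluate([7.0, 6.0, 8.0, 5.0], [7.5, 6.0, 8.0, 4.0])
print(scores)
```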

Claims

1. A day-dimension regional traffic index prediction method considering multi-factor influence, characterized in that the method specifically comprises the following steps:

step 1, dividing and aggregating traffic areas;

step 1.1, dividing traffic cells based on a road network structure;

comprehensively considering land use, administrative divisions, natural landforms and road network structure, dividing the analysis area into a plurality of traffic cells; when dividing traffic cells, considering that traffic demand differs greatly between the inner and outer ring areas of a city, areas with high traffic demand are divided into smaller cells and areas with low traffic demand into larger cells;

in order to enhance the pertinence and accuracy of the evaluation of the regional road network running state, trivial traffic cells are merged, and regions with similar road network running states are aggregated by a spatial autocorrelation division method; the local Moran index is used as the local spatial autocorrelation test index to identify the aggregation property of the operating state within the region, that is, traffic cells are spatially clustered according to a property-similarity criterion;

the prediction time interval and the prediction period are important parameters in traffic prediction; the prediction time interval represents the minimum time unit of the traffic state change data series; regional traffic index prediction aims to predict in advance the overall trend of the road network running state of a region in the coming week and to accurately identify the regions with high road network operating pressure during traffic peak periods so that corresponding relief measures can be prepared in advance; therefore, the prediction time interval and the prediction period of the regional traffic index are determined by comprehensively considering the efficiency and accuracy requirements of the prediction model in practical application;

step 3, preprocessing the original data of the regional traffic index;

step 3.1, calculating a regional traffic index;

the specific calculation steps are as follows:

s1, calculating the initial regional traffic index R_m: with a statistical interval of not more than 15 minutes, calculating the ratio of the free-flow speed to the actual average travel speed for the road sections of each grade in region m; with reference to the road section traffic grade division standard, counting the road mileage at the severe congestion level for each road grade in the whole road network and in the road network of region m, taking the proportion of severely congested mileage in the road network of region m as the weight, and calculating the initial regional traffic index R_m according to formula (1);

in formula (1), the speed term represents the actual average speed on road section p passing through region m during period α;

s2, calculating the regional traffic index RTI: after R_m data have been accumulated over a period of time, the initial regional traffic index is normalized according to formula (2) to obtain the regional traffic index RTI with a value range of [0, 10];

in the formula, RTI represents the regional traffic index; R_m represents the initial regional traffic index; R_min represents the minimum value of the initial regional traffic index in the historical data series; R_max represents the maximum value of the initial regional traffic index in the historical data series;
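Formula (2) itself is not reproduced in this text. The sketch below assumes it is an ordinary min-max normalization scaled to [0, 10], which is consistent with the symbol definitions above; the function name `normalize_rti` and the clamping of out-of-range values are illustrative assumptions.

```python
def normalize_rti(r_m, r_min, r_max):
    """Scale an initial index R_m to [0, 10].

    Assumed form of formula (2): RTI = 10 * (R_m - R_min) / (R_max - R_min).
    """
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    scaled = 10.0 * (r_m - r_min) / (r_max - r_min)
    # Assumption: a new R_m outside the historical range is clamped to [0, 10].
    return min(10.0, max(0.0, scaled))


print(normalize_rti(1.5, 1.0, 2.0))
```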

step 3.2, filling in missing values in the original data;

the rules for filling in missing values in the original data are as follows:

s3, when data at several consecutive time points are missing, extracting the historical data RTI_i of the same period in the previous i weeks; w_i represents the weight corresponding to RTI_i; the missing regional traffic index data RTI_d is calculated as follows:

i is not more than 3;
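The fill-in formula is not reproduced in this text. A minimal sketch, assuming RTI_d is the weighted combination of the same-period values from up to three previous weeks; the function name `fill_missing_rti` and the normalization by the weight sum are assumptions.

```python
def fill_missing_rti(history, weights):
    """Estimate a missing RTI_d from same-period values of previous weeks.

    history: RTI_i for the previous i weeks (at most 3 entries).
    weights: w_i for each RTI_i.
    Assumption: RTI_d = sum(w_i * RTI_i) / sum(w_i).
    """
    if not 1 <= len(history) <= 3:
        raise ValueError("between 1 and 3 previous weeks are expected")
    if len(history) != len(weights):
        raise ValueError("history and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, history)) / total


print(fill_missing_rti([4.0, 6.0], [1.0, 1.0]))
```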

step 3.3, removing abnormal values from the original data;

the rule for rejecting abnormal values from the original data is as follows:

B_{1_t} = RTI_t - RTI_{t-1} (4)

B_{2_t} = RTI_{t+1} - RTI_t (5)

in the formula, B_{1_t} represents the preceding difference of the index value at a given moment; B_{2_t} represents the following difference of the index value at that moment; RTI_t represents the regional traffic index at the current moment; RTI_{t-1} represents the regional traffic index at the previous moment; RTI_{t+1} represents the regional traffic index at the next moment;

s2, calculating the fluctuation index of the index value at each moment;

wherein Z represents the fluctuation index of the index value at a given moment; B_{1_t} represents the preceding difference of the index value at that moment; B_{2_t} represents the following difference of the index value at that moment;

s3, judging whether the value is a singular value according to the Z value calculated in s2 of step 3.3, taking 15% as the judgment limit; if Z is greater than 15%, the value is determined to be a singular value and is removed;
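The removal rule can be sketched as follows. The differences follow formulas (4) and (5), but the definition of Z is not reproduced in this text, so the fluctuation index is passed in as a function. The `example_z` shown here is a purely hypothetical placeholder and should not be read as the patent's definition; all function names are illustrative.

```python
def differences(series, t):
    """Return (B1, B2) at position t, per formulas (4) and (5)."""
    b1 = series[t] - series[t - 1]
    b2 = series[t + 1] - series[t]
    return b1, b2


def remove_singular_values(series, z_func, limit=0.15):
    """Drop interior points whose fluctuation index exceeds the 15% limit.

    Endpoints are kept because they lack one of the two differences.
    """
    kept = []
    for t, value in enumerate(series):
        if 0 < t < len(series) - 1:
            b1, b2 = differences(series, t)
            if z_func(b1, b2, value) > limit:
                continue
        kept.append(value)
    return kept


def example_z(b1, b2, rti_t):
    """Hypothetical Z: smaller absolute difference relative to RTI_t."""
    if rti_t == 0:
        return 0.0
    return min(abs(b1), abs(b2)) / rti_t


print(remove_singular_values([5.0, 5.0, 8.0, 5.0, 5.0], example_z))
```

With this placeholder, only the isolated spike of 8.0 is removed; a different Z would flag different points.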

step 3.4, carrying out regional traffic index grading treatment;

the regional traffic index is divided into 5 grades using the following thresholds: smooth, 0 ≤ RTI < 2; basically smooth, 2 ≤ RTI < 4; light congestion, 4 ≤ RTI < 6; moderate congestion, 6 ≤ RTI < 8; severe congestion, 8 ≤ RTI ≤ 10; the grading result is used by the decision tree to predict the congestion state grade, and once the grade has been determined, the index data are used to predict the traffic index by means of the Euclidean distance;
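The five thresholds map directly onto a small lookup. This is a minimal sketch; the function name `grade_rti` and the English grade labels are illustrative choices.

```python
GRADES = [
    "smooth",               # 0 <= RTI < 2
    "basically smooth",     # 2 <= RTI < 4
    "light congestion",     # 4 <= RTI < 6
    "moderate congestion",  # 6 <= RTI < 8
    "severe congestion",    # 8 <= RTI <= 10
]


def grade_rti(rti):
    """Return the congestion grade label for an RTI value in [0, 10]."""
    if not 0 <= rti <= 10:
        raise ValueError("RTI must lie in [0, 10]")
    # Each grade spans 2 index units; RTI = 10 belongs to the last grade.
    return GRADES[min(int(rti // 2), 4)]


print(grade_rti(7.9))
```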

step 3.5, constructing a training sample set;

because changes in the regional traffic index are influenced by many factors, a factor attribute set is determined for the training sample set; the factor attribute set is divided into area attributes, date attributes, weather attributes and event attributes; the date attributes and weather attributes are global factors influencing the running state of the road network, while the area attributes and event attributes are local factors that may occur in a specific area; the date attribute factors comprise month, time period, working day, holiday, day of the week, student holiday and tail number restriction; the weather attribute factors mainly include rain, snow, haze and the like; the event attribute factors comprise special events, large-scale activities and traffic control; the collected basic data of the influence factors are formatted in a standard way, and the influence factor attribute set is integrated with the preprocessed traffic index data set to form the training sample set;
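One way to picture the integration step is a join of factor records and index records on a shared key. The sketch below assumes both data sets are keyed by (region, date, period); the key layout, the field names and the function `build_training_set` are hypothetical.

```python
def build_training_set(factors, indices, grade):
    """Join factor attributes with preprocessed RTI values.

    factors: {(region, date, period): {attribute: value}}
    indices: {(region, date, period): RTI}
    grade:   function mapping an RTI value to a congestion grade label
    Only keys present in both inputs produce a training sample.
    """
    samples = []
    for key in sorted(factors.keys() & indices.keys()):
        row = dict(factors[key])
        row["region"] = key[0]
        row["rti"] = indices[key]
        row["grade"] = grade(indices[key])
        samples.append(row)
    return samples


factors = {
    ("A", "2017-10-09", "am_peak"): {"workday": True, "weather": "rain"},
    ("A", "2017-10-10", "am_peak"): {"workday": True, "weather": "clear"},
}
indices = {("A", "2017-10-09", "am_peak"): 6.4}
print(build_training_set(factors, indices, lambda r: "high" if r >= 6 else "low"))
```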

step 4, constructing a regional traffic index prediction model

generating a regional traffic index decision tree from the training sample set constructed in step 3.5, wherein the process mainly comprises a split selection process, an updating process of the regional traffic index decision tree and a prediction process of the regional road network running state grade;

letting the training data set at a node be D, and calculating the Gini index of every factor, including the area attributes, date attributes, weather attributes and event attributes; for each feature attribute A and each value a that it may take, D is split into D₁ and D₂ according to whether a sample point tests "yes" or "no" for A = a, and the Gini index for A = a is calculated using formula (7) and formula (8);

where Gini(D) represents the uncertainty of set D; K represents the total number of categories; k represents the category sequence number; p_k represents the probability that a sample point belongs to the k-th category;

where Gini(D, A) represents the uncertainty of set D after it is split by A = a;

selecting the feature with the minimum Gini index and the corresponding segmentation point as the optimal feature and the optimal segmentation point from all possible feature attributes A and all possible segmentation points a thereof; generating two sub-nodes from the current node, and distributing the training data set to the two sub-nodes according to the characteristics;

recursively applying the above steps to the two child nodes until a stopping condition is met;

generating a CART decision tree;

setting the minimum sample number required by a leaf node or the maximum depth of the tree to avoid overfitting;
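Formulas (7) and (8) are not reproduced in this text. The sketch below uses the standard CART definitions, Gini(D) = 1 - Σ p_k² and Gini(D, A) = |D₁|/|D|·Gini(D₁) + |D₂|/|D|·Gini(D₂), on the assumption that they match. The function names and the toy attributes are illustrative, and only a single split is selected (no recursion or pruning).

```python
from collections import Counter


def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2 over the class labels in D."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(samples, labels, attr, value):
    """Weighted Gini index of D after the binary split A = a versus A != a."""
    if not labels:
        raise ValueError("empty data set")
    yes = [l for s, l in zip(samples, labels) if s[attr] == value]
    no = [l for s, l in zip(samples, labels) if s[attr] != value]
    n = len(labels)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


def best_split(samples, labels):
    """Return the (attribute, value) pair with the minimum Gini index."""
    candidates = sorted({(a, s[a]) for s in samples for a in s},
                        key=lambda c: (c[0], repr(c[1])))
    return min(candidates, key=lambda c: gini_split(samples, labels, *c))


samples = [
    {"rain": True, "workday": True},
    {"rain": True, "workday": False},
    {"rain": False, "workday": True},
    {"rain": False, "workday": False},
]
labels = ["congested", "congested", "smooth", "smooth"]
print(best_split(samples, labels))
```

In this toy data the rain attribute separates the two grades perfectly, so its split has Gini index 0 and is selected.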

(2) updating regional traffic index decision trees

the accuracy of the model is strongly affected by the accuracy of the weather forecast, and timely updating of historical data helps improve model accuracy, so the method provides an improvement mechanism that dynamically updates the historical training library of the regional traffic index; in the training library, on the one hand, only the historical data of the n months preceding the prediction period are retained, in order to improve the running speed of the algorithm; on the other hand, before predicting period j, the weather attributes of period j-1 are updated to the actually observed weather conditions;
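The update mechanism can be sketched as a sliding window plus a weather correction. The record layout (dictionaries with `date` and `weather` fields), the function name `refresh_library` and the approximation of one month as 30 days are all assumptions made for illustration.

```python
from datetime import date, timedelta


def refresh_library(library, period_start, n_months, observed_weather):
    """Keep only the last n months and overwrite forecast weather with observations.

    library:          list of records, each a dict with "date" and "weather"
    period_start:     first day of the period about to be predicted
    observed_weather: {date: actual weather} for the period just ended
    Assumption: one month is approximated as 30 days.
    """
    cutoff = period_start - timedelta(days=30 * n_months)
    refreshed = []
    for rec in library:
        if not (cutoff <= rec["date"] < period_start):
            continue  # outside the retained window
        rec = dict(rec)  # copy so the caller's records are not mutated
        if rec["date"] in observed_weather:
            rec["weather"] = observed_weather[rec["date"]]
        refreshed.append(rec)
    return refreshed


library = [
    {"date": date(2017, 1, 2), "weather": "snow", "rti": 7.1},
    {"date": date(2017, 9, 20), "weather": "clear", "rti": 5.0},
    {"date": date(2017, 10, 6), "weather": "clear", "rti": 6.2},
]
observed = {date(2017, 10, 6): "rain"}
print(refresh_library(library, date(2017, 10, 9), 3, observed))
```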

collecting the attribute information for the following week, including tail number restrictions, weather conditions, large-scale activities and traffic control, and predicting with the generated regional traffic index decision tree to obtain a coarse classification of the traffic running state grade in the prediction period; in the split selection process, the splitting criterion, namely the critical value of the attribute variable, needs to be determined;

screening, by means of the squared Euclidean distance, the regional traffic index of the historical state most similar to the current prediction state; defining Y = {y₁, y₂, ..., y_q} as the current prediction state vector, and letting the historical state vectors with the same coarse classification form the set X, whose s-th element is X_s = {X_s1, X_s2, ..., X_sq}; the squared Euclidean distance between a historical state vector and the prediction state vector is then calculated as follows:

in the formula, C_s represents the squared Euclidean distance between the s-th historical state and the prediction state with the same coarse classification result; X_sq represents the value of the q-th attribute in the s-th historical state vector in the data set X with the same coarse classification result; y_q represents the value of the q-th attribute in the prediction state vector Y; q = 1, 2, ..., Q, where Q is a positive integer;

taking the regional traffic indexes whose squared Euclidean distance is smaller than the threshold c to form the set V = {V₁, V₂, ..., V_L}; the threshold c is the c-th percentile of the Euclidean distances, chosen such that the mean absolute error between the predicted and actual values of the regional traffic index is minimized;

the final predicted regional traffic index is:

in the formula, P_f represents the predicted index value; L is the number of data items in set V.
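The distance and final-prediction formulas are not reproduced in this text. A minimal sketch, assuming C_s = Σ_q (X_sq - y_q)² and that P_f is the arithmetic mean of the L values in V. The nearest-rank percentile, the use of "less than or equal to" at the threshold (so that V is never empty) and all function names are assumptions; the percentile itself would be tuned to minimize the mean absolute error.

```python
import math


def squared_euclidean(x, y):
    """C_s = sum over q of (X_sq - y_q)^2."""
    if len(x) != len(y):
        raise ValueError("state vectors must have equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y))


def percentile_value(values, pct):
    """Nearest-rank percentile of a non-empty list (0 < pct <= 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def predict_index(history, y, pct):
    """Average the RTI of historical states closest to the prediction state.

    history: list of (state_vector, rti) sharing the predicted coarse grade.
    """
    dists = [squared_euclidean(x, y) for x, _ in history]
    threshold = percentile_value(dists, pct)
    selected = [rti for d, (_, rti) in zip(dists, history) if d <= threshold]
    return sum(selected) / len(selected)


history = [([0, 0], 4.0), ([1, 0], 5.0), ([3, 4], 9.0)]
print(predict_index(history, [0, 0], 50))
```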

2. The method for predicting the traffic index of the daily dimension area in consideration of the multi-factor influence as claimed in claim 1, wherein: when the regional traffic index calculation method model is constructed, the road mileage ratio of the road network in the region at the serious congestion level is used as a weight value.

3. The method for predicting the traffic index of the daily dimension area in consideration of the multi-factor influence as claimed in claim 1, wherein: the historical data factor attribute set constructed in step 3.5 is divided into area attributes, date attributes, weather attributes and event attributes; the date attributes and weather attributes are global factors influencing the running state of the road network, while the area attributes and event attributes are local factors that may occur in a specific area; the set specifically comprises: region, month, time period, working day, holiday, day of the week, student holiday, tail number restriction, weather, special events, large-scale activities and traffic control; the various factors that may affect the running state of the road network are comprehensively considered, and continuous expansion and updating are supported.

4. The method for predicting the traffic index of the daily dimension area in consideration of the multi-factor influence as claimed in claim 1, wherein: in step 4.1, an improvement mechanism for dynamically updating the regional traffic index historical training library is established; while efficient algorithm running speed is ensured, the attribute information of the historical data is updated in a timely manner, so that the error introduced by the historical data is reduced to a minimum.

5. The method for predicting the traffic index of the daily dimension area in consideration of the multi-factor influence as claimed in claim 1, wherein: after the congestion level is determined by using the regional traffic index decision tree, a squared Euclidean distance method is selected, and the traffic index of the historical state closest to the predicted time period is searched to serve as the traffic index of the time period.