CN107610469B - Day-dimension area traffic index prediction method considering multi-factor influence - Google Patents
- Publication number
- CN107610469B (application CN201710955116.4A)
- Authority
- CN
- China
- Prior art keywords
- index
- traffic
- regional
- attribute
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Traffic Control Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a day-dimension regional traffic index prediction method considering multi-factor influence, which comprises the following steps: dividing and aggregating areas; preprocessing the raw regional traffic index data; and predicting the regional traffic index in the day dimension with multi-factor influence taken into account. The specific technical scheme is as follows: on the basis of traffic cell division, traffic cells with the same aggregation property are aggregated and a regional traffic index is calculated; the prediction time interval and prediction period are determined from the early-warning requirements of road network operation; the regional traffic data are preprocessed by extraction, missing-value compensation and outlier elimination, and a historical-data factor attribute set is constructed from several angles; the congestion state of the regional road network is predicted based on decision tree theory; and the final predicted regional traffic index is determined using the squared Euclidean distance. The method deepens the monitoring application of urban road network operation states and provides technical support for early warning and forecasting of road network operation.
Description
Technical Field
The invention relates to a day-dimension area traffic index prediction method considering multi-factor influence, and belongs to the field of traffic data mining application and traffic information prediction.
Background
With the advance of traffic informatization and intelligent systems, cities and regions have implemented traffic operation monitoring of varying scope and content, which provides strong support for the safe, efficient and green operation of the traffic system. Given the massive monitoring data now available, the core problem of growing concern to industry authorities is how to move from passive monitoring of traffic operation to active early warning and prediction, with corresponding control measures. If the road network operates inefficiently, the normal functioning of the city and the travel of its citizens are inevitably seriously affected. Research on and practice with early-warning and prediction models for urban road networks therefore provides strong data support for the active prevention and control of abnormal traffic states, and strongly promotes better management and operational scheduling by the industry authorities.
Research at home and abroad on traffic prediction focuses mainly on short-term prediction, that is, predicting at time t the traffic flow at the next decision time t+1, or at several later times, in real time. The prediction time span between t and t+1 is generally considered not to exceed 15 min. Short-term traffic flow prediction models mainly include models based on statistical methods, Kalman filtering models, nonparametric regression models, neural network models and models based on chaos theory, and these models perform well in short-term traffic flow prediction. However, the literature shows that traffic flow prediction research is dominated by short-term prediction, mainly dynamic prediction for the coming hours or the coming day. Medium- and long-term prediction is rarely applied, so it cannot help industry managers grasp the prospective operating condition of the road network over a longer future period. In addition, the factors influencing the road network state are not finely divided, and the various factors that may affect the traffic flow operation state are not fully considered.
The method first divides traffic cells according to traffic planning principles, reduces the number of regional evaluation objects through spatial autocorrelation analysis, and then obtains the regional traffic index. A historical sample database of traffic state evolution series is established by combining attribute data such as adverse weather data, large-scale event records, traffic control and time events, through data screening, elimination and discrimination. Through numerical tests, a regional traffic index prediction model considering multi-factor influence is constructed, and regional traffic index prediction in the day dimension is realized. The method helps management decision makers identify in advance the areas and time periods at high risk of congestion in the coming week; combined with a corresponding traffic operation early-warning mechanism, it lays a foundation for guiding and reasonably distributing traffic demand and keeping traffic flowing, so that traffic operation is safe, green and efficient.
Disclosure of Invention
The invention aims to provide a day-dimension regional traffic index prediction method considering multi-factor influence, which is used for acquiring the regional traffic index change trend in a period in advance so as to realize advanced prevention and control and early warning and forecast of the road network operation condition. The method provides support for improving the operation efficiency of the road network, reducing the congestion condition and the accident occurrence probability and improving the operation safety service level of the road network during peak traveling.
In order to achieve the purpose, the technical scheme adopted by the invention is a daily dimension area traffic index prediction method considering multi-factor influence, and the method specifically comprises the following steps:
step 1, dividing and aggregating traffic areas;
step 1.1, dividing traffic cells based on a road network structure;
Land use, administrative divisions, natural landforms, road network structure and other factors are considered together, and the analysis area is divided into a number of traffic cells. Because traffic demand differs greatly between the inner and outer ring areas of a city, areas with high traffic demand are divided into smaller cells and areas with low traffic demand into larger cells.
Step 1.2, aggregating traffic districts based on spatial autocorrelation analysis;
To make the evaluation of the regional road network operation state more targeted and accurate, fragmentary traffic cells are merged, and regions whose road networks operate in similar states are aggregated using a spatial autocorrelation division method. The local Moran index (a LISA statistic) is used as the local spatial autocorrelation test index to identify how the operation state clusters within the region, that is, the traffic cells are spatially clustered according to a property-similarity criterion.
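The local Moran index used for this test can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the common Anselin form with row-standardized weights, and the `neighbors` adjacency list and function name are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Return the local Moran's I of every traffic cell.

    values    -- attribute value of each cell (e.g. an operation-state measure)
    neighbors -- neighbors[i] lists the indices of cells adjacent to cell i
                 (hypothetical structure; weights are row-standardized)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # deviations from the mean
    m2 = sum(d * d for d in z) / n            # population variance
    if m2 == 0:
        return [0.0] * n                      # no variation, no autocorrelation
    result = []
    for i in range(n):
        adj = neighbors[i]
        # Spatial lag: average deviation of the adjacent cells.
        lag = sum(z[j] for j in adj) / len(adj) if adj else 0.0
        result.append(z[i] * lag / m2)
    return result
```

A positive value indicates that a cell resembles its neighbors (a candidate for aggregation); a value near zero or below indicates no clustering or a spatial outlier.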
Step 2, determining relevant prediction parameters of the regional traffic index;
The prediction time interval and the prediction period are important parameters in traffic prediction. The prediction time interval is the minimum time unit of the traffic state change data series. Regional traffic index prediction aims to predict in advance the overall trend of the road network operation state of a region in the coming week, and in particular to accurately identify regions under high road network operation pressure during traffic peak periods so that relief measures can be prepared in advance. The prediction time interval and the prediction period of the regional traffic index should therefore be determined by weighing the efficiency and the accuracy of the prediction model in practical application.
Step 3, preprocessing the original data of the regional traffic index;
step 3.1, calculating a regional traffic index;
the specific calculation steps are as follows:
S1. Calculate the initial regional traffic index $R_m$: with a statistical interval of no more than 15 minutes, calculate the ratio of the free-flow speed to the actual average travel speed on the road sections of each grade in region m. With reference to the road-section traffic operation level classification standard, count the mileage of roads of each grade at the severe congestion level in the whole road network and in the road network of region m, take the proportion of severely congested mileage in the road network of region m as the weight, and calculate the initial regional traffic index $R_m$ according to formula (1).
Wherein α represents the time period; m represents the region number; p represents the number of road sections in region m; $L_{\alpha m}$ represents the road mileage at the severe congestion level in the road network of region m in period α; the two remaining symbols of formula (1) represent, respectively, the free-flow speed and the actual average speed on the p road sections of region m in period α.
S2. Calculate the regional traffic index RTI: after $R_m$ data have been accumulated over a period of time, normalize the initial regional traffic index according to formula (2) to obtain the regional traffic index RTI with value range [0, 10].
In the formula, RTI represents the regional traffic index; R represents the initial regional traffic index; $R_{min}$ represents the minimum and $R_{max}$ the maximum of the initial regional traffic index in the historical data series.
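The normalization in S2 can be illustrated with a short sketch. It assumes that formula (2) is ordinary min-max scaling onto [0, 10], which is what the definitions of R, the historical minimum and the historical maximum suggest; the function name and the clamping of out-of-range values are illustrative choices, not taken from the patent.

```python
def normalize_rti(r, r_min, r_max):
    """Min-max scale an initial index r onto [0, 10].

    r_min, r_max -- minimum and maximum of the initial index in the
                    historical data series
    """
    if r_max <= r_min:
        raise ValueError("r_max must exceed r_min")
    scaled = 10.0 * (r - r_min) / (r_max - r_min)
    # Assumption: a new value outside the historical range is clamped.
    return max(0.0, min(10.0, scaled))
```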
Step 3.2, filling missing values in the original data;
The rules for filling missing values in the original data are as follows:
S1. Extract from the original data the data series whose missing proportion is less than or equal to 15%, and fill the discontinuous parts of these series;
S2. Where the data of a single time point is missing, use the arithmetic mean of the data at the two adjacent time points as the recovered value;
S3. Where the data of several consecutive time points is missing, extract the historical data $RTI_i$ for the same time in each of the previous i weeks; with $w_i$ denoting the weight corresponding to $RTI_i$, the missing data RTI is calculated according to formula (3).
Wherein $0 < w_i < 1$; according to the time distance and the degree of temporal correlation, the weights satisfy $w_{i+1} < w_i$; and i does not exceed 3.
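The three filling rules can be sketched as below. This is an illustration under stated assumptions: formula (3) is taken to be a weighted mean of the same time point in earlier weeks, and the weights (0.5, 0.3, 0.2), the `history` layout and the function name are hypothetical. The weights are renormalized when fewer than three weeks of history are supplied.

```python
def fill_missing(series, history, weights=(0.5, 0.3, 0.2)):
    """Fill None entries in one index series.

    series  -- list of floats or None (None marks a missing time point)
    history -- history[i][t] is the index at time point t, i + 1 weeks earlier
    weights -- hypothetical decreasing weights for 1, 2 and 3 weeks back
    Returns the filled list, or None if more than 15% of points are missing.
    """
    n = len(series)
    if sum(v is None for v in series) > 0.15 * n:
        return None                      # S1: too many gaps, series not used
    filled = list(series)
    for t, v in enumerate(series):
        if v is not None:
            continue
        isolated = (0 < t < n - 1
                    and series[t - 1] is not None
                    and series[t + 1] is not None)
        if isolated:
            # S2: single gap, mean of the two adjacent time points.
            filled[t] = (series[t - 1] + series[t + 1]) / 2
        else:
            # S3: run of gaps, weighted mean of the same time in earlier weeks.
            total = sum(weights[:len(history)])
            filled[t] = sum(w * h[t] for w, h in zip(weights, history)) / total
    return filled
```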
Step 3.3, removing abnormal values from the original data;
The rules for removing abnormal values from the original data are as follows:
S1. Calculate the front difference and the rear difference of the index value at each time in the data series;
$B_{1\_t} = RTI_t - RTI_{t-1}$ (4)
$B_{2\_t} = RTI_{t+1} - RTI_t$ (5)
In the formulas, $B_{1\_t}$ represents the front difference of the index value at a given time; $B_{2\_t}$ represents the rear difference of the index value at that time; $RTI_t$ represents the index data at the current time; $RTI_{t-1}$ represents the index data at the previous time; $RTI_{t+1}$ represents the index data at the next time.
S2. Calculate the fluctuation index Z of the index value at each time;
wherein Z represents the fluctuation index of the index value at a given time; $B_{1\_t}$ represents the front difference of the index value at that time; $B_{2\_t}$ represents the rear difference of the index value at that time; $RTI_t$ represents the regional traffic index at the current time.
S3. Judge whether a value is a singular value from the Z value calculated in S2 of step 3.3, taking 15% as the limit: if Z is greater than 15%, the value is regarded as a singular value and is removed.
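The rejection procedure can be sketched as follows. Because the expression for Z is not reproduced in this text, the sketch takes it as a caller-supplied function; `example_z` is a purely hypothetical stand-in used to exercise the code and should not be read as the patent's formula.

```python
def remove_singular(rti, fluctuation, limit=0.15):
    """Drop interior points whose fluctuation index exceeds `limit`.

    rti         -- index values in time order
    fluctuation -- callable (b1, b2, rti_t) -> Z, supplied by the caller
    """
    if len(rti) < 3:
        return list(rti)                 # no interior points to test
    kept = [rti[0]]
    for t in range(1, len(rti) - 1):
        b1 = rti[t] - rti[t - 1]         # front difference, formula (4)
        b2 = rti[t + 1] - rti[t]         # rear difference, formula (5)
        if fluctuation(b1, b2, rti[t]) <= limit:
            kept.append(rti[t])
    kept.append(rti[-1])
    return kept


def example_z(b1, b2, rti_t):
    """Hypothetical fluctuation index: smaller jump relative to the value."""
    return min(abs(b1), abs(b2)) / rti_t
```

With this stand-in, only a point that jumps away from both of its neighbours is flagged, so an isolated spike is removed while the points on either side of it are kept.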
Step 3.4, carrying out regional traffic index grading treatment;
The regional traffic index is divided into 5 levels using the lower-threshold division principle. The level is used by the decision tree to predict the congestion state level; once the level has been predicted, the index data are used to predict the traffic index with the Euclidean distance.
Table 1 Road traffic operation level division
Regional Traffic Index (RTI) | 0≤RTI<2 | 2≤RTI<4 | 4≤RTI<6 | 6≤RTI<8 | 8≤RTI≤10 |
Road network operation level | Clear | Basically clear | Light congestion | Moderate congestion | Severe congestion |
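The grading in Table 1 maps directly to a small function. In this sketch the levels are numbered 0 to 4 from the least to the most congested band, which is an assumption, although it agrees with the sample data in Table 3 (an index of 7.3 carries level 3); the function name is illustrative.

```python
def congestion_level(rti):
    """Map an RTI in [0, 10] to a level 0..4 using the bands of Table 1."""
    if not 0 <= rti <= 10:
        raise ValueError("RTI must lie in [0, 10]")
    # Each band is 2 wide and closed on its lower threshold;
    # the top band also includes RTI = 10.
    return min(int(rti // 2), 4)
```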
Step 3.5, constructing a historical data factor attribute set;
since the change in the regional traffic index is affected by a variety of factors, a set of factor attributes needs to be first determined for a set of training samples. The set of factor attributes is divided into a region attribute, a date attribute, a weather attribute, and an event attribute. The date attribute and the weather attribute are global factors influencing the running state of the road network, and the area attribute and the event attribute are local factors which are possible to occur in a specific area.
Table 2 factor attribute selection
Step 4, predicting the regional traffic index in the day dimension considering multi-factor influence;
Step 4.1, predicting the grade of the running congestion state of the regional road network;
A regional traffic index decision tree is generated from the training sample set constructed in step 3.5. The process mainly comprises the partition-and-selection (tree-building) process, the updating of the regional traffic index decision tree, and the prediction of the regional road network operation state level.
(1) Building the tree by recursively partitioning the regional index samples
First, let the training data set of a node be D, and calculate the Gini index of every factor, including the region, date, weather and event attributes. For each feature attribute A and each value a it may take, D is divided into two parts $D_1$ and $D_2$ according to whether a sample point tests "yes" or "no" for A = a, and the Gini index for A = a is calculated using formulas (7) and (8).
$Gini(D) = 1 - \sum_{k=1}^{K} p_k^2$ (7)
Where Gini(D) represents the uncertainty of set D; K represents the total number of categories; k represents the category index; $p_k$ represents the probability that a sample point belongs to class k.
$Gini(D, A) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)$ (8)
In the formula, Gini(D, A) represents the uncertainty of set D after the division by A = a.
From all possible features A and all their possible split points a, select the feature with the smallest Gini index and its split point as the optimal feature and optimal split point. Generate two child nodes from the current node and distribute the training data set to them according to that feature.
Apply the procedure recursively to the two child nodes until a stopping condition is met.
Generate the CART decision tree.
Set the minimum number of samples required at a leaf node, or the maximum depth of the tree, to avoid overfitting.
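The split-selection step of the tree-building process can be sketched in a few lines. This is a minimal illustration using the standard CART Gini definitions; the dictionary-based sample layout, the attribute names in the usage note and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(rows, labels, attr, value):
    """Weighted Gini index after splitting on the test rows[attr] == value."""
    d1 = [y for x, y in zip(rows, labels) if x[attr] == value]
    d2 = [y for x, y in zip(rows, labels) if x[attr] != value]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


def best_split(rows, labels):
    """Return the (attribute, value) pair with the smallest Gini index."""
    candidates = {(a, x[a]) for x in rows for a in x}
    # Ties are broken by the textual form of the candidate, for determinism.
    return min(candidates,
               key=lambda c: (gini_split(rows, labels, *c), str(c)))
```

For instance, with samples described by hypothetical `weather` and `workday` attributes and congestion levels as labels, `best_split` picks the attribute test that separates the levels most cleanly; a full tree would apply it recursively to the two resulting subsets until the leaf-size or depth limit is reached.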
(2) Updating regional traffic index decision trees
The accuracy of the model depends heavily on the accuracy of the weather forecast, and updating the historical data promptly, especially the historical weather factors, helps improve model accuracy. The method therefore provides a refinement mechanism that dynamically updates the historical training library of the regional traffic index. On the one hand, only the historical data of the n months preceding the prediction period is retained in the training library, to keep the algorithm fast; on the other hand, before the i-th period is predicted, the weather recorded for period i-1 is updated to the actual observed weather.
(3) Inputting each attribute value in the prediction time period to predict the congestion state grade
Collect the attribute information for the coming week, such as license-plate tail-number restrictions, weather conditions, large-scale events and traffic control, and use the generated regional traffic index decision tree to obtain a coarse classification of the traffic operation state level in the prediction period. In the partition-and-selection process, the division criterion, that is, the critical value of the attribute variable, needs to be determined.
Step 4.2, using the squared Euclidean distance to predict the regional traffic index;
The regional traffic index of the historical state most similar to the current prediction state is screened using the squared Euclidean distance. Define $Y = \{y_1, y_2, \ldots, y_Q\}$ as the current prediction state vector, and let the historical state vectors with the same coarse classification form the set $X$, whose s-th member is $X_s = \{X_{s1}, X_{s2}, \ldots, X_{sQ}\}$. The squared Euclidean distance between a historical state vector and the prediction state vector is then calculated as follows:
$C_s = \sum_{q=1}^{Q} (X_{sq} - y_q)^2$
In the formula, $C_s$ represents the squared Euclidean distance between the s-th historical state and the prediction state with the same coarse classification result; $X_{sq}$ represents the value of the q-th attribute in the s-th historical state vector of the data set X with the same coarse classification result; $y_q$ represents the value of the q-th attribute in the prediction state vector Y; q = 1, 2, …, Q, where Q is a positive integer.
The regional traffic indexes whose squared Euclidean distance is smaller than the threshold c form the set $V = \{V_1, V_2, \ldots, V_Z\}$. The threshold c is taken as the c-th percentile of the Euclidean distances, chosen so that the mean absolute error between the predicted and actual regional traffic index is smallest.
The final predicted regional traffic index is:
In the formula, $P_f$ represents the predicted index value; Z is the number of data in set V.
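The distance-based selection can be sketched as below. Several points are assumptions rather than statements of the patent: the final prediction is taken to be the arithmetic mean of the selected set V, the percentile is computed by the nearest-rank rule, records at exactly the threshold distance are included so that V is never empty, and categorical attributes are assumed to be numerically encoded beforehand. The function and parameter names are hypothetical.

```python
import math


def predict_index(history, target, percentile=10.0):
    """Predict an index from the historical states closest to `target`.

    history    -- list of (state_vector, rti) pairs that share the
                  congestion level predicted by the decision tree
    target     -- prediction state vector (same length as each state_vector)
    percentile -- percentile of the distances used as threshold c
    """
    if not history:
        raise ValueError("history must not be empty")
    # Squared Euclidean distance of every historical state to the target.
    scored = sorted(
        (sum((x - y) ** 2 for x, y in zip(vec, target)), rti)
        for vec, rti in history
    )
    # Nearest-rank percentile of the distances serves as threshold c.
    rank = max(1, math.ceil(percentile / 100.0 * len(scored)))
    c = scored[rank - 1][0]
    selected = [rti for dist, rti in scored if dist <= c]
    return sum(selected) / len(selected)
```

In practice the percentile would be tuned on held-out weeks by choosing the value that gives the smallest mean absolute error.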
When the regional traffic index calculation method model is constructed, the road mileage ratio of the road network in the region at the serious congestion level is used as a weight value.
The historical-data factor attribute set constructed in step 3.5 is divided into region, date, weather and event attributes. The date and weather attributes are global factors affecting the road network operation state, while the region and event attributes are local factors that may occur in a specific area. The attributes specifically include: region, month, time period, working day, holiday, week, student vacation, tail-number restriction, weather, special event, major event and traffic control. The factors that may affect the road network operation state are considered comprehensively, and the set supports continuous expansion and updating.
In step 4.1, a refinement mechanism for dynamically updating the regional traffic index historical training library is established. The attribute information of the historical data is updated promptly while the algorithm remains fast, which minimizes the errors introduced by the weather information in the historical data.
After the congestion level is determined by using the regional traffic index decision tree, a squared Euclidean distance method is further selected, and the traffic index of the historical state closest to the predicted time period is searched to serve as the traffic index of the time period.
Compared with the prior art, the invention has the following obvious advantages and beneficial effects:
(1) The invention fully considers the factors that affect the road network operation state, such as region, date, weather and events, and provides a regional traffic index prediction method based on decision tree theory. Balancing prediction requirements against practical feasibility, it achieves regional traffic index prediction refined to the day dimension for each cell. It overcomes the shortcomings of earlier research, which focused only on short-term prediction of traffic information and therefore made it difficult to evaluate the overall condition of the road network ahead of time and to take active prevention and control measures in advance.
(2) The invention refines the prediction object from the traffic index of the whole road network to the regional traffic index, so that the prediction result is more practical and describes the operating characteristics of the regional road network more accurately. The prediction process is easy to operate; as historical data accumulate, the factor attribute set can be further updated and refined so that the various influencing factors are considered in detail, providing data support for urban road network prediction and early warning.
(3) Iteratively updating the historical data effectively improves model accuracy. The method establishes a refinement mechanism for dynamically updating the regional traffic index historical training library: the attribute information of the historical data is updated promptly while the algorithm remains fast, which minimizes the errors introduced by the weather information in the historical data.
(4) The inspection and analysis of model precision shows that the average absolute error of the predicted value and the actual value of the regional traffic index is basically controlled within 0.6, and the average relative error can be kept between 4% and 10%. The method has better prediction accuracy in the peak period of working days and non-working days. The method is more feasible when being applied to the regional traffic index prediction work of daily dimension.
Drawings
Fig. 1 is a schematic diagram of traffic cell aggregation based on spatial autocorrelation analysis;
FIG. 2 is a flow chart of the raw data preprocessing of regional traffic indexes;
FIG. 3 is a flow chart of regional traffic index prediction based on decision tree theory;
FIG. 4 shows the predicted morning-peak traffic index for the Guomao area for April 17-23, 2017;
FIG. 5 shows the predicted evening-peak traffic index for the Guomao area for April 17-23, 2017;
FIG. 6 is a flow chart of the method of the present invention.
Detailed Description
The traffic index of the Guomao area of Beijing is selected as the prediction object. The traffic index of this area for April 17 to 23, 2017 is predicted using the medium- and long-term regional traffic index prediction method based on decision tree theory, and model accuracy is verified for the morning-peak and evening-peak indexes.
The specific implementation steps are as follows:
step 1, dividing key attention areas;
Considering land use, administrative divisions, natural landforms, road network structure and other factors together, Beijing is divided into 1911 traffic cells, on the premise that administrative districts are not split and that natural dividing features such as rivers and railways serve as traffic cell boundaries. Because traffic demand differs greatly between the inner and outer ring areas, the cells are divided at different levels of fineness: cells inside the Fifth Ring Road are small, and cells in the outer area are larger. This reduces the workload as far as possible and makes survey and analysis more practicable while still meeting the accuracy requirement.
On the basis of this division, a local spatial autocorrelation test is performed using the local Moran index, which measures the degree of autocorrelation between region m and its adjacent regions. For areas showing spatial autocorrelation, the grid cell attribute value $x_m$ and the corresponding spatial lag $x_{m,-1}$ are each compared with the mean of the attribute variable, and the cells are spatially clustered according to these magnitude relationships. Traffic cells with the same aggregation property are then aggregated.
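The comparison of a cell's value and its spatial lag against the mean can be sketched as a quadrant labelling. This is an illustrative reading of the clustering rule: the H/L labels, the adjacency list and the treatment of values equal to the mean are assumptions, and the spatial lag is taken as the plain average of the adjacent cells.

```python
def moran_quadrants(values, neighbors):
    """Label each cell by (own value, spatial lag) relative to the mean.

    Returns labels such as "HH" (high cell among high neighbours) or
    "LH" (low cell among high neighbours).
    """
    mean = sum(values) / len(values)
    labels = []
    for i, v in enumerate(values):
        adj = neighbors[i]
        # Spatial lag: average attribute value of the adjacent cells.
        lag = sum(values[j] for j in adj) / len(adj) if adj else mean
        own = "H" if v >= mean else "L"
        near = "H" if lag >= mean else "L"
        labels.append(own + near)
    return labels
```

Adjacent cells that share an "HH" or an "LL" label are candidates for merging into one aggregated region.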
Step 2, determining relevant prediction parameters of the regional traffic index;
Traffic state changes over 5 to 15 consecutive minutes usually show a certain stability and regularity. Regional traffic index prediction from a medium- and long-term perspective aims to predict in advance the overall trend of the road network operation state of a region in the coming week. Therefore, once the operating characteristics of the road network and the prediction requirements have been determined, 30 minutes is taken as the prediction time interval for predicting the traffic state at future times. In addition, the method predicts only the period in which prediction demand is strong and traffic flow changes markedly, and sets the prediction period to the 18 hours from 5:00 to 23:00.
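The prediction parameters above imply a fixed set of time slots per day, which a short sketch makes concrete. The function name is hypothetical, and treating 23:00 as the end of the last slot rather than the start of a further one is an assumption.

```python
from datetime import date, datetime, timedelta


def prediction_slots(day, start_hour=5, end_hour=23, step_minutes=30):
    """Return the start times of the prediction intervals for one day."""
    t = datetime(day.year, day.month, day.day, start_hour)
    end = datetime(day.year, day.month, day.day, end_hour)
    slots = []
    while t < end:
        slots.append(t)
        t += timedelta(minutes=step_minutes)
    return slots
```

Under these settings each day has 36 slots, from 05:00 to 22:30, so a one-week prediction covers 252 index values per region.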
The raw data are preprocessed by screening, missing-value compensation and outlier elimination according to the data preprocessing flow shown in Fig. 2. The preprocessed data set is shown in the following table:
Table 3 Preprocessed regional traffic index data (excerpt)
Area name | Date and time | Traffic index | Congestion level |
Guomao area | 201703251800 | 7.3 | 3 |
Guomao area | 201703251805 | 7.5 | 3 |
Guomao area | 201703251810 | 7.6 | 3 |
Guomao area | 201703251815 | 7.8 | 3 |
Guomao area | 201703251820 | 7.6 | 3 |
Then the historical-data factor attribute set is constructed. Taking the Guomao area as an example, its region ID is 18. Sample data are shown in the following table:
Table 4 Training sample data example
The factor attribute set and the preprocessed regional traffic index data are integrated to form the training sample library required for prediction. The date attributes, weather conditions, large-scale events and other relevant information for the prediction week are looked up, and the regional traffic index is predicted according to the prediction flow shown in Fig. 3.
Table 5 Basic information for April 17-23, 2017
Table 6 Predicted peak-period traffic index for the Guomao area on working days
Time | April 17 | April 18 | April 19 | April 20 | April 21 |
7:00 | 5.3 | 6.9 | 5.2 | 5.2 | 3.1 |
7:30 | 7.0 | 6.9 | 7.8 | 6.9 | 6.9 |
8:00 | 6.9 | 6.9 | 6.9 | 6.3 | 5.4 |
8:30 | 6.2 | 6.9 | 6.9 | 6.9 | 6.8 |
9:00 | 5.2 | 7.3 | 5.9 | 6.4 | 6.8 |
17:00 | 7.0 | 7.6 | 7.0 | 7.1 | 7.1 |
17:30 | 7.1 | 7.1 | 8.3 | 8.3 | 8.3 |
18:00 | 8.4 | 7.7 | 8.3 | 8.3 | 8.2 |
18:30 | 6.8 | 8.3 | 7.1 | 7.1 | 7.2 |
19:00 | 7.0 | 7.0 | 5.0 | 5.0 | 6.1 |
To evaluate the prediction model, the mean absolute error, the mean relative error, the root mean square error and the error distribution probability (the proportion of data whose absolute error is less than 0.5) are used as evaluation indexes, and the accuracy of the medium- and long-term regional traffic index prediction model based on decision tree theory is verified in the peak and off-peak periods of working days and non-working days, respectively. The results are shown in the following table:
Table 7 Prediction results of the peak traffic index for the Guomao area on working days
The statistics show that the mean absolute error between the predicted and actual regional traffic index is kept within 0.6 and the mean relative error stays between 4% and 10%. Prediction accuracy is good in every period, and the peak-period predictions are better than the off-peak ones. The root mean square error in each test period is about 0.5, which shows that the errors are not widely dispersed and reflects, to a certain degree, the error stability of the prediction model. The error distribution probability shows that the absolute error of more than 80% of the data is within 0.5, and that of more than 90% of the peak-period data series is below 0.5, which basically meets the operational requirements of prediction work.
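The four evaluation indexes named above can be computed as in the following sketch. The function name and the returned dictionary keys are illustrative; the relative error is taken with respect to the actual value, which is assumed to be non-zero.

```python
import math


def evaluate(predicted, actual, tolerance=0.5):
    """Return MAE, mean relative error, RMSE and the share of small errors."""
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    # Relative error is measured against the actual index value.
    mre = sum(abs(e) / a for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Error distribution probability: share of absolute errors below tolerance.
    within = sum(abs(e) < tolerance for e in errors) / n
    return {"mae": mae, "mre": mre, "rmse": rmse, "within": within}
```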
Claims (5)
1. A day-dimension regional traffic index prediction method considering multi-factor influence, characterized in that the method specifically comprises the following steps:
step 1, dividing and aggregating traffic areas;
step 1.1, dividing traffic cells based on a road network structure;
comprehensively considering the factors of land property, administrative division, natural landform and road network structure, dividing the analysis area into a plurality of traffic cells; when dividing traffic cells, the fact that the difference of traffic demands of inner and outer ring areas of a city is large, the divided area of the area with large traffic demand is small, and the area of the area with small traffic demand is increased is considered;
step 1.2, aggregating traffic districts based on spatial autocorrelation analysis;
in order to enhance the pertinence and the accuracy of the evaluation of the running state of the regional road network, trivial traffic cells are merged, and regions with similar running states of the road network are subjected to region aggregation by adopting a spatial autocorrelation division method; identifying the aggregation property of the operation state in the region by using the local Moran index as a local spatial autocorrelation test index, namely realizing spatial clustering of traffic cells according to a property similarity criterion;
step 2, determining relevant prediction parameters of the regional traffic index;
the prediction time interval and the prediction period are important parameters in traffic prediction; the prediction time interval represents the minimum time unit of the traffic state change data series; the regional traffic index prediction aims to predict in advance the overall trend of the road network operation state of a region in the coming week, and to accurately identify regions under high road network operation pressure during traffic peak periods so that relief measures can be prepared in advance; therefore, the prediction time interval and the prediction period of the regional traffic index are determined by comprehensively considering the efficiency and accuracy requirements of the prediction model in practical application;
step 3, preprocessing the original data of the regional traffic index;
step 3.1, calculating a regional traffic index;
the specific calculation steps are as follows:
s1, calculating the initial regional traffic index $R_m$: using a statistical interval of no more than 15 minutes, calculating the ratio of the free-flow speed to the actual average running speed on each grade of road section in region m; with reference to the road-section traffic grade classification standard, counting the road mileage at the severe congestion level in the whole road network and in the road network of region m respectively; taking the proportion of severely congested mileage in the road network of region m as the weight, calculating the initial regional traffic index $R_m$ according to formula (1);
wherein α represents a time period; m represents the region number; p represents the number of road sections in region m; $L_{\alpha m}$ represents the road mileage at the severe congestion level in the road network of region m in period α; the two speed terms of formula (1) represent, respectively, the free-flow speed and the actual average speed over the p road sections of region m in period α;
s2, calculating the regional traffic index RTI: after $R_m$ data have been accumulated over a period of time, normalizing the initial regional traffic index according to formula (2) to finally obtain the regional traffic index RTI with a value range of [0, 10];
in the formula, RTI represents the regional traffic index; $R_m$ represents the initial regional traffic index; $R_{min}$ represents the minimum value of the initial regional traffic index in the historical regional traffic index data series; $R_{max}$ represents the maximum value of the initial regional traffic index in the historical data series;
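The normalization step can be sketched as a min-max scaling. This form is an assumption that is consistent with the stated value range [0, 10] and with the definitions of the historical minimum and maximum; the function name `normalize_rti` and the clamping of values outside the historical range are illustrative choices, not part of the claim.

```python
def normalize_rti(r_m, r_min, r_max):
    """Scale an initial regional index R_m onto [0, 10].

    Assumption: min-max scaling against the historical extremes R_min and R_max.
    Values outside the historical range are clamped so the result stays in [0, 10].
    """
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    scaled = 10.0 * (r_m - r_min) / (r_max - r_min)
    return min(10.0, max(0.0, scaled))


rti = normalize_rti(1.5, 1.0, 2.0)
```

With a historical range of 1.0 to 2.0, an initial index of 1.5 sits halfway and maps to the middle of the scale.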
step 3.2, making up missing values by using original data;
the rule for the original data to compensate for missing values is as follows:
s1, extracting from the original data the data series whose missing proportion is less than or equal to 15%, and filling the discontinuous parts of those data series;
s2, when the data of a single time point is missing, using the arithmetic mean of the data at the two adjacent time points as the recovered data;
s3, when the data of several consecutive time points is missing, extracting the corresponding historical data $RTI_i$ of the same period in the previous i weeks, where $w_i$ represents the weight corresponding to $RTI_i$; the missing regional traffic index data $RTI_d$ is calculated by the following formula:
wherein $0 < w_i < 1$, and the weights satisfy the following relationship according to time distance and degree of temporal correlation: $w_{i+1} < w_i$, and i is not more than 3;
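The three filling rules can be sketched as follows. The weighted sum over same-period history and the requirement that the weights sum to 1 are assumptions inferred from the description of $w_i$; the function names and sample numbers are hypothetical.

```python
def missing_ratio(series):
    """Share of missing points (None) in a data series; rule s1 keeps series at or below 0.15."""
    return sum(1 for value in series if value is None) / len(series)


def fill_single_gap(previous_value, next_value):
    """Rule s2: arithmetic mean of the two adjacent time points."""
    return (previous_value + next_value) / 2.0


def fill_from_history(same_period_history, weights):
    """Rule s3: weighted combination of the same period in earlier weeks.

    same_period_history[0] is the value one week earlier, [1] two weeks earlier, and so on.
    Assumption: the result is the weighted sum and the weights add up to 1.
    """
    if len(same_period_history) != len(weights) or len(weights) > 3:
        raise ValueError("need one weight per week and at most 3 weeks")
    if any(not 0.0 < w < 1.0 for w in weights):
        raise ValueError("each weight must lie strictly between 0 and 1")
    if any(later >= earlier for earlier, later in zip(weights, weights[1:])):
        raise ValueError("weights must decrease with time distance")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return sum(w * v for w, v in zip(weights, same_period_history))


recovered = fill_from_history([4.0, 5.0, 6.0], [0.5, 0.3, 0.2])
```

The most recent week receives the largest weight, so the recovered value stays closest to the latest observed behaviour of the same period.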
step 3.3, removing abnormal values from the original data;
the rule for rejecting abnormal values from the original data is as follows:
s1, calculating the front difference and the back difference of each index value in the data series;
$B_{1\_t} = RTI_t - RTI_{t-1}$ (4)
$B_{2\_t} = RTI_{t+1} - RTI_t$ (5)
in the formula, $B_{1\_t}$ represents the front difference of the index value at a given time; $B_{2\_t}$ represents the back difference of the index value at that time; $RTI_t$ represents the regional traffic index at the current time; $RTI_{t-1}$ represents the regional traffic index at the previous time; $RTI_{t+1}$ represents the regional traffic index at the next time;
s2, calculating the fluctuation index of the index value at each moment;
wherein Z represents the fluctuation index of the index value at a given time; $B_{1\_t}$ represents the front difference of the index value at that time; $B_{2\_t}$ represents the back difference of the index value at that time;
s3, judging whether the value is a singular value according to the Z value calculated in s2 of step 3.3, taking 15% as the judgment limit: if Z is greater than 15%, the value is regarded as a singular value and removed;
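The rejection procedure can be sketched as below. The front and back differences follow formulas (4) and (5); the fluctuation index itself is supplied by the caller as a function, because its expression is not assumed here. The function names, and the choice to keep the first and last points (which lack one of the two differences), are assumptions for illustration.

```python
def front_back_differences(series):
    """Return (B1, B2) for every interior point t, per formulas (4) and (5)."""
    return [
        (series[t] - series[t - 1], series[t + 1] - series[t])
        for t in range(1, len(series) - 1)
    ]


def remove_singular_values(series, fluctuation_index, limit=0.15):
    """Drop interior points whose fluctuation index Z exceeds the limit.

    fluctuation_index -- caller-supplied function (b1, b2, value) -> Z
    Assumption: the first and last points are kept, since one difference is undefined.
    """
    if len(series) < 3:
        return list(series)
    kept = [series[0]]
    differences = front_back_differences(series)
    for t, (b1, b2) in enumerate(differences, start=1):
        if fluctuation_index(b1, b2, series[t]) <= limit:
            kept.append(series[t])
    kept.append(series[-1])
    return kept
```

Differences are taken on the original series rather than on the partially cleaned one, so a removed spike does not change the judgment of its neighbours.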
step 3.4, carrying out regional traffic index grading treatment;
the regional traffic index is divided into 5 grades using the following thresholds: smooth, 0 ≤ RTI < 2; basically smooth, 2 ≤ RTI < 4; light congestion, 4 ≤ RTI < 6; moderate congestion, 6 ≤ RTI < 8; severe congestion, 8 ≤ RTI ≤ 10; the grading result is used by the decision tree to predict the congestion state grade, and once the grade has been predicted the index data are used to predict the traffic index by means of the Euclidean distance;
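The five-grade thresholds translate directly into a lookup. The grade labels and the function name `grade_rti` below are illustrative English renderings, not identifiers from the patent.

```python
GRADES = [
    "smooth",               # 0 <= RTI < 2
    "basically smooth",     # 2 <= RTI < 4
    "light congestion",     # 4 <= RTI < 6
    "moderate congestion",  # 6 <= RTI < 8
    "severe congestion",    # 8 <= RTI <= 10
]


def grade_rti(rti):
    """Map a regional traffic index in [0, 10] to one of the five grades."""
    if not 0.0 <= rti <= 10.0:
        raise ValueError("RTI must lie in [0, 10]")
    # Each grade spans 2 index units; the top grade also includes RTI == 10.
    return GRADES[min(int(rti // 2), 4)]


label = grade_rti(7.9)
```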
step 3.5, constructing a training sample set;
because changes in the regional traffic index are influenced by many factors, a factor attribute set is determined for the training sample set; the factor attribute set is divided into an area attribute, a date attribute, a weather attribute and an event attribute; the date attribute and the weather attribute are global factors influencing the running state of the road network, while the area attribute and the event attribute are local factors that may occur in a specific area; the date attribute factors comprise month, time period, working day, holiday, day of the week, student holiday and tail-number restriction; the weather attribute factors mainly include rain, snow, haze and the like; the event attribute factors comprise special events, large-scale activities and traffic control; the collected basic data of the influence factors are put into a standard format, and the influence factor attribute set is combined with the preprocessed traffic index data set to form the training sample set;
step 4, constructing a regional traffic index prediction model;
step 4.1, predicting the grade of the running congestion state of the regional road network;
generating a regional traffic index decision tree through the training sample set constructed in the step 3.5, wherein the process mainly comprises a division and selection process, an updating process of the regional traffic index decision tree and a prediction process of the regional road network running state grade;
(1) recursively dividing the regional index samples to build the tree;
setting the training data set of a node as D, and calculating on that data set the Gini index of every factor, including the area attribute, date attribute, weather attribute and event attribute; for each feature attribute A and each value a that it may take, D is divided into two parts $D_1$ and $D_2$ according to whether the test A = a on a sample point is answered yes or no, and the Gini index for A = a is calculated with formula (7) and formula (8);
$Gini(D) = 1 - \sum_{k=1}^{K} p_k^2$ (7)
where Gini(D) represents the uncertainty of set D; K represents the total number of categories; k represents the category sequence number; $p_k$ represents the probability that a sample point belongs to the k-th class;
$Gini(D, A) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)$ (8)
where Gini(D, A) represents the uncertainty of set D after the division A = a;
selecting the feature with the minimum Gini index and the corresponding segmentation point as the optimal feature and the optimal segmentation point from all possible feature attributes A and all possible segmentation points a thereof; generating two sub-nodes from the current node, and distributing the training data set to the two sub-nodes according to the characteristics;
recursively applying the above steps to the two sub-nodes until a stopping condition is met;
generating a CART decision tree;
setting the minimum sample number required by a leaf node or the maximum depth of the tree to avoid overfitting;
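The split selection at a single node can be sketched as follows: the Gini index of a label set, the weighted Gini index after a binary test A = a, and the search for the attribute and value that minimize it. The dictionary-based sample layout, the function names and the sample data are assumptions for illustration; recursion and stopping conditions are left out.

```python
from collections import Counter


def gini(labels):
    """Gini index of a set of grade labels: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_after_split(rows, labels, attribute, value):
    """Weighted Gini index after dividing the samples by the test attribute == value."""
    d1 = [y for row, y in zip(rows, labels) if row[attribute] == value]
    d2 = [y for row, y in zip(rows, labels) if row[attribute] != value]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


def best_split(rows, labels):
    """Return the (attribute, value) pair with the smallest weighted Gini index."""
    candidates = {(attribute, row[attribute]) for row in rows for attribute in row}
    # str(c) is only a deterministic tie-breaker between equally good splits.
    return min(candidates, key=lambda c: (gini_after_split(rows, labels, *c), str(c)))


rows = [
    {"weather": "rain", "workday": True},
    {"weather": "rain", "workday": False},
    {"weather": "clear", "workday": True},
    {"weather": "clear", "workday": False},
]
labels = ["severe congestion", "severe congestion", "smooth", "smooth"]
attribute, value = best_split(rows, labels)
```

In this toy sample the weather attribute separates the two grades perfectly, so it is chosen over the workday attribute, which leaves both child nodes as mixed as the parent.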
(2) updating the regional traffic index decision tree;
the accuracy of the model is strongly affected by the accuracy of the weather forecast, and timely updating of historical data helps to improve the accuracy of the model; the method therefore provides a refinement mechanism that dynamically updates the regional traffic index historical training library; in the training library, on the one hand, only the historical data of the n months before the prediction period is retained, in order to improve the running speed of the algorithm; on the other hand, before predicting the j-th period, the weather attributes of period j-1 are updated with the actually observed weather conditions;
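The update mechanism can be sketched as a rolling window plus a weather correction. The record layout (a dictionary with `day` and `weather` keys), the function name, and the approximation of one month as 30 days are all assumptions for illustration.

```python
from datetime import date, timedelta


def refresh_training_library(records, prediction_start, keep_months, observed_weather):
    """Keep only recent history and replace forecast weather with observed weather.

    records          -- list of dicts with at least the keys "day" (date) and "weather"
    prediction_start -- first day of the period about to be predicted
    keep_months      -- n, the number of months of history to retain
    observed_weather -- dict mapping a date to the weather actually observed on it
    Assumption: one month is approximated as 30 days.
    """
    cutoff = prediction_start - timedelta(days=30 * keep_months)
    refreshed = []
    for record in records:
        if not cutoff <= record["day"] < prediction_start:
            continue  # outside the retained window
        updated = dict(record)  # copy so the caller's records are not mutated
        if record["day"] in observed_weather:
            updated["weather"] = observed_weather[record["day"]]
        refreshed.append(updated)
    return refreshed
```

Retraining the decision tree on the refreshed library before each prediction period keeps forecast errors from the previous period out of the training data.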
(3) inputting each attribute value of the prediction time period to predict the congestion state grade;
collecting the attribute information for the coming week, including tail-number restriction, weather conditions, large-scale activities and traffic control, and predicting with the generated regional traffic index decision tree to obtain a coarse classification result of the traffic running state grade in the prediction time period; in the division and selection process, the division standard, namely the critical value of the attribute variable, needs to be determined;
step 4.2, predicting the regional traffic index using the squared Euclidean distance;
screening, by means of the squared Euclidean distance, the regional traffic indexes of the historical states most similar to the current prediction state; defining $Y = \{y_1, y_2, \ldots, y_Q\}$ as the current prediction state vector, and collecting the historical state vectors with the same coarse classification result into a set whose s-th member is $X_s = \{X_{s1}, X_{s2}, \ldots, X_{sQ}\}$; the squared Euclidean distance between a historical state vector and the prediction state vector is therefore calculated as follows:
$C_s = \sum_{q=1}^{Q} (X_{sq} - y_q)^2$ (9)
in the formula, $C_s$ represents the squared Euclidean distance between the s-th historical state and the prediction state having the same coarse classification result; $X_{sq}$ represents the value of the q-th attribute in the s-th historical state vector in the data set X with the same coarse classification result; $y_q$ represents the value of the q-th attribute in the prediction state vector Y; q = 1, 2, ..., Q, where Q is a positive integer;
taking the regional traffic indexes whose squared Euclidean distance is smaller than the threshold c to form a set $V = \{V_1, V_2, \ldots, V_L\}$; the threshold c is the c-th percentile of the Euclidean distance, chosen so that the mean absolute error between the predicted and actual values of the regional traffic index is minimized;
the final predicted regional traffic index is:
in the formula, $P_f$ represents the predicted index value; L is the number of data in set V.
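Step 4.2 can be sketched as below: compute the squared Euclidean distance to every historical state with the same coarse grade, keep those within the percentile threshold, and combine their indexes. Several points are assumptions for illustration: attributes are taken to be numerically encoded already, the percentile uses the nearest-rank rule, states at exactly the threshold distance are kept so the selection is never empty, and the final value is the arithmetic mean of the selected indexes.

```python
import math


def squared_euclidean(history_vector, prediction_vector):
    """Sum of squared attribute differences between two state vectors."""
    return sum((x - y) ** 2 for x, y in zip(history_vector, prediction_vector))


def predict_index(history, prediction_vector, percentile_c):
    """Predict the regional traffic index from the most similar historical states.

    history      -- list of (state_vector, rti) pairs sharing the coarse grade from the tree
    percentile_c -- percentile (0 < c <= 100) of the distances used as the threshold
    Assumptions: nearest-rank percentile, "<=" comparison, arithmetic mean of the set V.
    """
    if not history:
        raise ValueError("no historical states with the same coarse grade")
    if not 0 < percentile_c <= 100:
        raise ValueError("percentile_c must lie in (0, 100]")
    distances = [squared_euclidean(vector, prediction_vector) for vector, _ in history]
    ordered = sorted(distances)
    rank = max(1, math.ceil(percentile_c / 100.0 * len(ordered)))
    threshold = ordered[rank - 1]
    selected = [rti for (_, rti), d in zip(history, distances) if d <= threshold]
    return sum(selected) / len(selected)


history = [([0, 0], 4.0), ([1, 0], 5.0), ([3, 4], 9.0)]
predicted = predict_index(history, [0, 0], 50)
```

In practice the percentile would be tuned on held-out weeks by picking the value that gives the smallest mean absolute error, as the text describes.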
2. The method for predicting the traffic index of the daily dimension area in consideration of the multi-factor influence as claimed in claim 1, wherein: when the regional traffic index calculation model is constructed, the proportion of road mileage at the severe congestion level in the regional road network is used as the weight.
3. The method for predicting the traffic index of the daily dimension area in consideration of the multi-factor influence as claimed in claim 1, wherein: the historical data factor attribute set constructed in step 3.5 is divided into an area attribute, a date attribute, a weather attribute and an event attribute; the date attribute and the weather attribute are global factors influencing the running state of the road network, while the area attribute and the event attribute are local factors that may occur in a specific area; the attribute set specifically comprises: region, month, time period, working day, holiday, day of the week, student holiday, tail-number restriction, weather, special event, large-scale activity and traffic control; the factors that may affect the running state of the road network are comprehensively considered, and continuous expansion and updating of the attribute set are supported.
4. The method for predicting the traffic index of the daily dimension area in consideration of the multi-factor influence as claimed in claim 1, wherein: step 4.1 establishes a refinement mechanism for dynamically updating the regional traffic index historical training library; while the running speed of the algorithm is kept efficient, the attribute information of the historical data is updated in real time, so that the error introduced by inaccurate historical data is reduced to a minimum.
5. The method for predicting the traffic index of the daily dimension area in consideration of the multi-factor influence as claimed in claim 1, wherein: after the congestion level is determined by using the regional traffic index decision tree, a squared Euclidean distance method is selected, and the traffic index of the historical state closest to the predicted time period is searched to serve as the traffic index of the time period.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710955116.4A CN107610469B (en) | 2017-10-13 | 2017-10-13 | Day-dimension area traffic index prediction method considering multi-factor influence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107610469A (en) | 2018-01-19 |
CN107610469B (en) | 2021-02-02 |
Family
ID=61078206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710955116.4A Active CN107610469B (en) | 2017-10-13 | 2017-10-13 | Day-dimension area traffic index prediction method considering multi-factor influence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107610469B (en) |
Families Citing this family (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108305459B (en) * | 2018-02-05 | 2021-02-02 | 北京交通大学 | Road network operation index prediction method based on data driving |
CN110197582A (en) * | 2018-02-27 | 2019-09-03 | 沈阳美行科技有限公司 | Data analysing method, device and traffic prewarning method, apparatus |
CN108428067A (en) * | 2018-04-09 | 2018-08-21 | 东华大学 | A kind of printing quality analysis of Influential Factors method based on historical data |
CN108537485B (en) * | 2018-04-11 | 2022-03-22 | 圆通速递有限公司 | Express mail delay processing method and system |
CN108346292B (en) * | 2018-04-17 | 2021-02-05 | 吉林大学 | Urban expressway real-time traffic index calculation method based on checkpoint data |
CN108615360B (en) * | 2018-05-08 | 2022-02-11 | 东南大学 | Traffic demand day-to-day evolution prediction method based on neural network |
CN108417037A (en) * | 2018-05-09 | 2018-08-17 | 电子科技大学 | A kind of sight spot periphery ride number computational methods based on traffic situation |
CN108831147B (en) * | 2018-05-24 | 2020-11-10 | 温州大学苍南研究院 | Data-driven method for observing macro driving fluctuation of urban bus |
US20210192410A1 (en) * | 2018-07-04 | 2021-06-24 | Sony Corporation | Information processing device, information processing method, and program |
CN109360421B (en) * | 2018-11-28 | 2022-03-25 | 平安科技(深圳)有限公司 | Traffic information prediction method and device based on machine learning and electronic terminal |
CN109559512B (en) * | 2018-12-05 | 2021-08-24 | 北京掌行通信息技术有限公司 | Regional traffic flow prediction method and device |
CN109887272B (en) * | 2018-12-26 | 2021-08-13 | 创新先进技术有限公司 | Traffic pedestrian flow prediction method and device |
CN109697854B (en) * | 2019-02-25 | 2021-07-16 | 公安部交通管理科学研究所 | Multi-dimensional urban road traffic state evaluation method |
CN109993408B (en) * | 2019-02-28 | 2021-07-09 | 河海大学 | Network appointment vehicle transport capacity allocation method based on service area division |
CN110223510B (en) * | 2019-04-24 | 2021-03-26 | 长安大学 | Multi-factor short-term traffic flow prediction method based on neural network LSTM |
CN110428613A (en) * | 2019-07-09 | 2019-11-08 | 中山大学 | A kind of intelligent transportation trend prediction method of machine learning |
CN110363990A (en) * | 2019-07-15 | 2019-10-22 | 广东工业大学 | A kind of public transport is passed unimpeded index acquisition methods, system and device |
CN110458337B (en) * | 2019-07-23 | 2020-12-22 | 内蒙古工业大学 | C-GRU-based network appointment vehicle supply and demand prediction method |
CN110763929A (en) * | 2019-08-08 | 2020-02-07 | 浙江大学 | Intelligent monitoring and early warning system and method for convertor station equipment |
CN110598923A (en) * | 2019-09-03 | 2019-12-20 | 深圳市得益节能科技股份有限公司 | Air conditioner load prediction method based on support vector regression optimization and error correction |
CN110837888A (en) * | 2019-11-13 | 2020-02-25 | 大连理工大学 | Traffic missing data completion method based on bidirectional cyclic neural network |
CN111210088B (en) * | 2020-01-16 | 2023-06-02 | 上海理工大学 | Traffic state index prediction method based on space-time factors |
CN111445694B (en) * | 2020-03-04 | 2022-02-01 | 青岛海信网络科技股份有限公司 | Festival and holiday traffic scheduling method and device based on traffic flow prediction |
CN111402585B (en) * | 2020-03-25 | 2021-02-02 | 中南大学 | Detection method for sporadic congestion path |
CN111626366B (en) * | 2020-05-28 | 2022-05-17 | 南京航空航天大学 | Operation characteristic-based area sector scene similarity identification method |
CN111768625A (en) * | 2020-07-01 | 2020-10-13 | 中国计量大学 | Traffic road event prediction method based on graph embedding |
CN112652164B (en) * | 2020-12-02 | 2022-12-30 | 北京北大千方科技有限公司 | Traffic time interval dividing method, device and equipment |
CN112837533B (en) * | 2021-01-08 | 2021-11-19 | 合肥工业大学 | Highway accident frequency prediction method considering risk factor time-varying characteristics |
CN113159374B (en) * | 2021-03-05 | 2022-04-22 | 北京化工大学 | Data-driven urban traffic flow rate mode identification and real-time prediction early warning method |
CN113033471A (en) * | 2021-04-15 | 2021-06-25 | 北京百度网讯科技有限公司 | Traffic abnormality detection method, apparatus, device, storage medium, and program product |
CN113378458A (en) * | 2021-05-26 | 2021-09-10 | 广州华南路桥实业有限公司 | Congestion early warning method, device, medium and equipment based on big data |
CN113327418B (en) * | 2021-05-31 | 2022-10-25 | 同济大学 | Expressway congestion risk grading real-time prediction method |
CN113902185B (en) * | 2021-09-30 | 2023-10-31 | 北京百度网讯科技有限公司 | Determination method and device for regional land property, electronic equipment and storage medium |
CN114331058B (en) * | 2021-12-15 | 2023-04-21 | 东南大学 | Assessment method for influence of built environment on traffic running condition |
CN114385639A (en) * | 2022-01-13 | 2022-04-22 | 湖北中南鹏力海洋探测系统工程有限公司 | Offshore ground wave radar data storage method and synthesis method |
CN114386536B (en) * | 2022-03-22 | 2022-07-01 | 腾讯科技(深圳)有限公司 | Region determination method, device, computing equipment and storage medium |
CN114548836A (en) * | 2022-04-25 | 2022-05-27 | 杭州玳数科技有限公司 | Epidemic situation-based multi-factor traffic hub operation method and system |
CN115331425B (en) * | 2022-06-30 | 2023-12-19 | 银江技术股份有限公司 | Traffic early warning method, device and system |
CN115440038B (en) * | 2022-08-31 | 2023-11-03 | 青岛海信网络科技股份有限公司 | Traffic information determining method and electronic equipment |
CN115440039B (en) * | 2022-09-01 | 2024-06-07 | 南京大学 | Traffic accident congestion cause analysis method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012024976A1 (en) * | 2010-08-23 | 2012-03-01 | 北京世纪高通科技有限公司 | Traffic information processing method and device thereof |
CN106448168A (en) * | 2016-11-24 | 2017-02-22 | 中山大学 | Automatic detection method for traffic incident based on tendency indicator and fluctuation indicator |
CN107045788A (en) * | 2017-06-28 | 2017-08-15 | 北京数行健科技有限公司 | Traffic Forecasting Methodology and device |
Non-Patent Citations (3)
Title |
---|
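Traffic flow prediction algorithm based on historical frequent patterns; Zhong Huiling et al.; Computer Engineering and Design; 20120430; Vol. 33 (No. 4); 1546-1552 * |
Method for dividing road network evaluation regions based on spatial correlation analysis; Zou Wenjie et al.; Journal of Beijing University of Technology; 20120430; Vol. 38 (No. 4); 564-569 * |
Application of data mining technology in the transportation field; Zhu Xiaojing; China Master's Theses Full-text Database, Information Science and Technology; 20140215 (No. 2); I138-435 * |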
Also Published As
Publication number | Publication date |
---|---|
CN107610469A (en) | 2018-01-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||