CN111160650B - Adaboost algorithm-based traffic flow characteristic analysis and prediction method - Google Patents

Publication number
CN111160650B
Authority
CN
China
Prior art keywords
prediction
traffic flow
sequence
time
weak
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911401878.5A
Other languages
Chinese (zh)
Other versions
CN111160650A (en)
Inventor
文成林
郑乐军
沈硕
尉涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
State Grid Hubei Electric Power Co Ltd
Original Assignee
Hangzhou Dianzi University
State Grid Hubei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University and State Grid Hubei Electric Power Co Ltd
Priority to CN201911401878.5A
Publication of CN111160650A
Application granted
Publication of CN111160650B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/12: Computing arrangements based on biological models using genetic models
    • G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40: Business processes related to the transportation industry

Abstract

The invention discloses a traffic flow characteristic analysis and prediction method based on the Adaboost algorithm. To address the tendency of a neural network to fall into local optima, the invention uses the thought evolution algorithm (MEC, mind evolutionary computation) to perform a directed search for the optimal initial parameters of the network. To address the poor generalization of a neural network on new sample sets, the optimized networks are integrated with the Adaboost algorithm. On this basis, the weights that the Adaboost algorithm assigns to the weak predictors are readjusted using the reciprocal of the sum of squared prediction errors as the criterion, so that each predictor contributes as much as possible to the prediction accuracy of the network. The invention can improve traffic flow prediction accuracy and adapts well to different traffic flow states.

Description

Adaboost algorithm-based traffic flow characteristic analysis and prediction method
Technical Field
The invention belongs to the field of intelligent traffic, and particularly relates to a traffic flow characteristic analysis and prediction method based on Adaboost algorithm.
Background
The urban traffic control system controls the traffic flow in an urban road network so that traffic streams use intersections in a time-shared manner, traffic accidents are avoided, congestion is prevented, and traffic condition information is provided in time to the people concerned, including vehicle occupants and pedestrians, to improve traffic safety. To achieve this control, the system needs to know real-time traffic conditions immediately. However, the same prediction method yields different traffic flow prediction accuracy in different time periods and regions, and different prediction methods applied to the same data set produce markedly different results.
At present, traffic flow analysis mainly identifies chaos with chaos-theory tools such as the recurrence plot method, the Kolmogorov entropy and the Lyapunov exponent, in order to judge whether the traffic flow is predictable. However, most of these methods require a large sample size, and the calculation methods are not yet mature enough to give comparable measurements. Many theories and methods exist for traffic state prediction models. Existing short-term traffic flow prediction methods fall roughly into two categories: deterministic mathematical model methods and knowledge-based intelligent model methods. For example, Kalman filtering has been proposed for traffic flow prediction; it has few model parameters and is relatively simple to compute, but it has difficulty reflecting the nonlinearity and uncertainty of the traffic flow prediction process. Genetic algorithms have been used to optimize neural networks in order to address slow convergence and poor generalization, but the evolutionary search over the whole population is inefficient. Early neural networks use traditional BP learning to correct the hidden-layer weights, which cannot effectively find the global minimum of multi-peak, non-differentiable functions. Network parameters are assigned randomly and the result is sensitive to the initial values, so the simulation results of a neural network model in engineering applications are unstable. A further disadvantage of traditional BP learning is that it cannot learn online: enough samples must be accumulated for batch training, so the network parameters cannot be adjusted in real time as new samples arrive.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a traffic flow characteristic analysis and prediction method based on Adaboost algorithm.
The technical scheme adopted by the invention for solving the technical problem is as follows:
The invention specifically comprises the following steps:
Step (1) short-term characteristic analysis of traffic flow based on the R/S analysis method
Step (1-1) calculation steps:
Given a time series $\{x(t)\}$, $t = 1, 2, \dots, M$, the calculation proceeds as follows.
1) Divide the series into $A = [M/n]$ adjacent sub-sequences of equal length $n$. $I_a$ denotes the $a$-th sub-sequence segment, $a = 1, 2, \dots, A$, and its elements are denoted $\{x_a(i)\}$, $i = 1, 2, \dots, n$. $E_a$ is the mean over the $a$-th sub-sequence segment:
$$E_a = \frac{1}{n}\sum_{i=1}^{n} x_a(i) \quad (1)$$
2) The cumulative deviation $X(i, a)$ of the elements of sub-sequence segment $I_a$ from the mean:
$$X(i, a) = \sum_{k=1}^{i}\left(x_a(k) - E_a\right), \quad i = 1, 2, \dots, n \quad (2)$$
3) The range $R_{I_a}$ and the sample standard deviation $S_{I_a}$ of sub-sequence segment $I_a$:
$$R_{I_a} = \max_{1 \le i \le n} X(i, a) - \min_{1 \le i \le n} X(i, a) \quad (3)$$
$$S_{I_a} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_a(i) - E_a\right)^2} \quad (4)$$
4) The rescaled range value $(R/S)_n$ for the division with sub-sequence length $n$:
$$(R/S)_n = \frac{1}{A}\sum_{a=1}^{A}\frac{R_{I_a}}{S_{I_a}} \quad (5)$$
Step (1-2) analysis process:
The Hurst exponent $H$ is the slope of $\log (R/S)_n$ plotted against $\log n$. The time series can be classified into three types according to the value of $H$:
(1) $0 < H < 0.5$ indicates that the sequence is not a random walk but an anti-persistent (negatively correlated) time series, i.e. the future trend tends to be opposite to the past trend; the closer $H$ is to 0, the stronger the anti-persistence.
(2) $H = 0.5$ indicates that the sequence is a standard random walk, i.e. future changes are unrelated to past increments.
(3) $0.5 < H < 1$ indicates that the time series is persistent: a past increasing trend indicates a future increasing trend, and a past decreasing trend indicates a future decreasing trend. The closer $H$ is to 1, the more closely the future is related to the past. The persistence or anti-persistence of the series therefore allows a quantitative analysis of its future trend.
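The R/S calculation and the three-way classification can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`rescaled_range`, `hurst_exponent`, `classify`), the choice of window sizes and the tolerance around 0.5 are introduced here, and the standard deviation uses the 1/n form.

```python
import numpy as np

def rescaled_range(x, n):
    """Mean R/S over the [M/n] non-overlapping sub-sequences of length n."""
    x = np.asarray(x, dtype=float)
    ratios = []
    for a in range(len(x) // n):
        seg = x[a * n:(a + 1) * n]
        dev = np.cumsum(seg - seg.mean())   # cumulative deviation X(i, a)
        r = dev.max() - dev.min()           # range of the segment
        s = seg.std()                       # standard deviation (1/n form, assumed)
        if s > 0:                           # skip constant segments
            ratios.append(r / s)
    return float(np.mean(ratios))

def hurst_exponent(x, window_sizes):
    """Slope of log (R/S)_n against log n, fitted by least squares."""
    log_n = np.log(window_sizes)
    log_rs = np.log([rescaled_range(x, n) for n in window_sizes])
    slope, _intercept = np.polyfit(log_n, log_rs, 1)
    return float(slope)

def classify(h, tol=0.02):
    """Map a Hurst exponent to one of the three types (tol is an assumption)."""
    if abs(h - 0.5) <= tol:
        return "random walk"
    return "persistent" if h > 0.5 else "anti-persistent"
```

On a finite sample the R/S estimate for uncorrelated noise usually comes out somewhat above 0.5, which is why the sketch classifies with a tolerance instead of testing for equality.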
Step (2) traffic flow time sequence phase space reconstruction
Phase space reconstruction theory is a vital part of chaotic system analysis. Constructing a phase space from the traffic flow time series exposes the regularities hidden in the evolution of the sequence and the useful internal information it carries. Let the traffic flow time series be
$$\{x(t)\}, \quad t = 1, 2, \dots, N$$
Let the time delay be $\tau$ and the embedding dimension be $m$. The $m$-dimensional phase space vectors constructed by the time-delay reconstruction method are:
$$X = \{X(t) \mid X(t) = [x(t), x(t+\tau), \dots, x(t+(m-1)\tau)]^{T},\ t = 1, 2, \dots, M\} \quad (6)$$
where $X$ is an $m \times M$ matrix and $M = N - (m-1)\tau$ is the number of phase points in the reconstructed phase space. Each phase point represents the state of the traffic flow system at a certain moment, and connecting the $M$ phase points in order of increasing time describes the evolution trajectory of the traffic flow system in the $m$-dimensional phase space. The original one-dimensional time series prediction problem is thereby converted into the prediction of a sequence of $m$-dimensional phase points. Assume that the phase points $\{X(t), X(t-1), \dots, X(t-k)\}$, $k = 1, 2, \dots, t-1$, are known, and that the phase points to be predicted from the current time $t+(m-1)\tau$ are $\{X(t+1), X(t+2), \dots, X(t+p)\}$, where $p = 1$ is called one-step prediction and $p > 1$ is called multi-step prediction. The prediction model can then be expressed as:
$$\{x(t+(m-1)\tau+1), \dots, x(t+(m-1)\tau+p)\} = F(X(t), \dots, X(t-k)) \quad (7)$$
The universal approximation capability of the feedforward neural network is used to realize one-step or multi-step prediction of the traffic flow. The method uses the C-C method to calculate the embedding dimension and the delay time, and calculates the largest Lyapunov exponent of the traffic flow by the Wolf method to judge the chaotic characteristics of the traffic flow.
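As an illustration of the reconstruction in equation (6) and the one-step case of equation (7), the sketch below builds the delay-embedding matrix and the input/target pairs a feedforward network could be trained on. The function names `delay_embed` and `one_step_pairs` are hypothetical, and the embedding dimension and delay are passed in directly instead of being estimated with the C-C method.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return the m x M matrix whose column t is [x(t), x(t+tau), ..., x(t+(m-1)tau)]."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (m - 1) * tau       # M = N - (m-1)*tau
    if n_points <= 0:
        raise ValueError("series too short for this m and tau")
    return np.stack([x[j * tau: j * tau + n_points] for j in range(m)])

def one_step_pairs(x, m, tau):
    """Inputs X(t) and targets x(t+(m-1)tau+1) for one-step prediction (p = 1)."""
    x = np.asarray(x, dtype=float)
    X = delay_embed(x, m, tau)
    inputs = X[:, :-1].T                    # every phase point except the last
    targets = x[(m - 1) * tau + 1:]         # the value one step after each phase point
    return inputs, targets
```

For example, a series of length 10 with `m=3` and `tau=2` gives a 3 x 6 matrix and five training pairs.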
Step (3) MEC-BP fusion algorithm
The thought evolution algorithm, also known as mind evolutionary computation (MEC), is a novel evolutionary algorithm that addresses the shortcomings of the genetic algorithm by simulating the evolution of human thinking. It inherits part of the ideas of the genetic algorithm and introduces two new operators, "convergence" and "dissimilation". Convergence is responsible for local optimization and dissimilation for global optimization. The two operators are independent of each other yet coordinated, so improving either one improves the overall search efficiency of the algorithm, and the directed learning and memory mechanism gives the algorithm a very strong global optimization capability.
Let $t$ be the current iteration number of the MEC global iteration and $\rho$ the current iteration number within a given sub-population. Each individual in a sub-population represents one set of initial weights and thresholds for the BP neural network fusion algorithm. The fitness of a single individual $N_{i,j}$ is calculated from the fusion result produced by the BP neural network fusion model after it has been trained to convergence. In the internal iteration of the sub-populations, the optimal individual $N_{i,pbest}$ of each sub-population is selected through a local bulletin board. This individual then represents its whole sub-population in the global competition on the global bulletin board, which selects the globally optimal sub-population $S_{gbest}$ and the globally optimal individual $N_{gbest}$ it contains. After multiple iterations, the MEC-BP neural network model trained from the initial weights and thresholds represented by the final globally optimal individual is the resulting multi-source traffic data fusion model.
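The interplay of the two operators can be sketched on a toy problem. The code below is a simplified illustration under several assumptions that are not taken from the text: individuals are plain parameter vectors, `score` is any function to minimize (standing in for the error of a trained BP network), new individuals are scattered around a sub-population's best individual with Gaussian noise, and the worst sub-populations are re-initialized at random in each global round. The name `mec_minimize`, the search bounds and all default values are hypothetical.

```python
import numpy as np

def mec_minimize(score, dim, n_groups=5, group_size=20, sigma=0.5,
                 outer_iters=30, inner_iters=5, n_replace=2, seed=0):
    """Toy MEC-style search: returns (best vector, best score, initial best score)."""
    rng = np.random.default_rng(seed)
    centres = rng.uniform(-5.0, 5.0, size=(n_groups, dim))  # best individual per group
    best = np.array([score(c) for c in centres])            # local bulletin boards
    initial_best = float(best.min())
    for _ in range(outer_iters):                            # global iteration t
        # Convergence: local competition inside each sub-population.
        for g in range(n_groups):
            for _ in range(inner_iters):                    # inner iteration rho
                cand = centres[g] + sigma * rng.standard_normal((group_size, dim))
                vals = np.array([score(c) for c in cand])
                j = int(vals.argmin())
                if vals[j] < best[g]:                       # keep only improvements
                    centres[g], best[g] = cand[j], vals[j]
        # Dissimilation: global competition; the worst groups are replaced.
        order = np.argsort(best)                            # global bulletin board
        for g in order[-n_replace:]:
            centres[g] = rng.uniform(-5.0, 5.0, size=dim)
            best[g] = score(centres[g])
    g_best = int(best.argmin())
    return centres[g_best].copy(), float(best[g_best]), initial_best
```

In the setting described above, the returned vector would be decoded into the initial weights and thresholds of a BP network, which is then trained in the usual way.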
Step (4) neural network integrated prediction model based on Adaboost algorithm
The adaptive boosting algorithm (Adaboost) obtains sample weights by repeatedly searching the sample feature space and continuously adjusts the weights of the training samples during iteration, increasing the weight of samples predicted with low accuracy and reducing the weight of samples predicted with high accuracy. The weak predictors are then combined by weighted majority voting into a strong predictor: the weight of a weak predictor with a smaller prediction error rate is increased so that it plays a larger role in the vote, and the weight of one with a larger error rate is reduced, which significantly improves the prediction performance of the learning algorithm.
Step (4-1) Adaboost algorithm step
Step 1: data acquisition and network initialization. Select $m$ groups of training samples $T = \{(X_i, y_i)\}$, $i = 1, 2, \dots, m$, from the sample space and give the training samples the initial weight distribution $w_{1i} = 1/m$. The network structure is determined by the input and output dimensions of the samples, and the initial weights and thresholds of the neural network are obtained by optimization with the improved thought evolution algorithm (MEC). $D(1)$ denotes the initial weight distribution of the samples and $K$ the number of predictors.
$$D(1) = (w_{11}, w_{12}, \dots, w_{1i}, \dots, w_{1m}) \quad (8)$$
Step 2: iterate for $k = 1, 2, \dots, K$.
(a) When training the $k$-th weak predictor, use the weak predictor $H_k(x)$ to fit the training samples and predict the training data, and obtain the regression error rate $\xi_k$ by calculating the maximum error $E_k$ over the training set and the relative error $\xi_{ki}$ of each sample:
$$E_k = \max_i \left|y_i - H_k(X_i)\right| \quad (9)$$
[Equation (10), relative error $\xi_{ki}$ of each sample: formula image missing from this text]
[Equation (11), regression error rate $\xi_k$: formula image missing from this text]
(b) Calculate the weight $a_k$ of the weak predictor in the final predictor:
[Equation (12), weight $a_k$: formula image missing from this text]
(c) Adjust the sample weights for the next round of training according to the predictor weight $a_k$:
$$D(k+1) = (w_{k+1,1}, w_{k+1,2}, \dots, w_{k+1,m}) \quad (13)$$
[Equation (14), updated sample weight $w_{k+1,i}$: formula image missing from this text]
Step 3: after $K$ rounds of training, $K$ groups of weak prediction functions $H_k(x)$ are obtained. Combining them according to the weak predictor weights gives the strong predictor $h(x)$:
[Equation (15), strong predictor $h(x)$: formula image missing from this text]
Step 4: to determine the weight of each group of weak predictors more accurately, after the Adaboost algorithm has trained the MEC-BP neural networks to obtain the weak prediction function values $H_k(x)$ of the $K$ groups of weak predictors, the weight $w_k$ of each group of weak prediction functions is solved again using the criterion of the reciprocal of the sum of squared prediction errors, giving the accumulated strong predictor $h(x) = \sum_k w_k H_k(x_k, a_k)$. The larger the sum of squared prediction errors of a model, the lower its prediction accuracy and the lower its importance in the combined prediction; conversely, a single prediction model with a smaller sum of squared prediction errors is assigned a larger weighting coefficient. The weighting coefficients are calculated as:
$$w_k = \frac{E_k^{-1}}{\sum_{j=1}^{K} E_j^{-1}}, \quad k = 1, 2, \dots, K \quad (16)$$
where $y_{ki}$ is the predicted value of the $k$-th weak predictor at time $i$, $y_i$ is the observed value of the same prediction object at time $i$, $m$ is the length of the time span, and $E_k$ here denotes the sum of squared prediction errors of the $k$-th weak predictor (not the maximum error of equation (9)):
$$E_k = \sum_{i=1}^{m}\left(y_i - y_{ki}\right)^2 \quad (17)$$
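Steps 1 to 4 can be sketched as follows. The formulas for the relative error, the predictor weight and the sample-weight update are not legible in this text, so the sketch substitutes the common AdaBoost.R2 choices: a linear loss $\xi_{ki} = |y_i - H_k(X_i)| / E_k$, a weighted error rate $\xi_k = \sum_i w_{ki}\xi_{ki}$, $a_k = \xi_k / (1 - \xi_k)$ and an update proportional to $w_{ki}\, a_k^{1-\xi_{ki}}$. These are assumptions and may differ from the patent's equations. A weighted regression stump (`fit_stump`, hypothetical) stands in for each MEC-BP network so the example stays self-contained. Only the reciprocal sum-of-squared-errors weighting of Step 4 follows directly from the description.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted one-split regressor; a stand-in for an MEC-BP network (assumption)."""
    best = None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= thr
            lv = np.average(y[left], weights=w[left])
            rv = np.average(y[~left], weights=w[~left])
            err = np.sum(w * (y - np.where(left, lv, rv)) ** 2)
            if best is None or err < best[0]:
                best = (err, f, thr, lv, rv)
    _, f, thr, lv, rv = best
    return lambda Z: np.where(Z[:, f] <= thr, lv, rv)

def adaboost_r2(X, y, K=10):
    """Boosting loop with assumed AdaBoost.R2 formulas; returns predictors and a_k."""
    m = len(y)
    w = np.full(m, 1.0 / m)                  # initial distribution D(1)
    models, a = [], []
    for _ in range(K):
        h = fit_stump(X, y, w)
        abs_err = np.abs(y - h(X))
        e_max = abs_err.max()                # maximum error on the training set
        if e_max == 0:                       # perfect fit: keep it and stop
            models.append(h)
            a.append(1e-10)
            break
        xi_i = abs_err / e_max               # assumed linear relative error
        xi = float(np.sum(w * xi_i))         # assumed weighted error rate
        if xi >= 0.5:                        # predictor too weak to use
            if not models:                   # keep at least one predictor
                models.append(h)
                a.append(0.5)
            break
        a_k = xi / (1.0 - xi)                # assumed predictor weight
        models.append(h)
        a.append(a_k)
        w = w * a_k ** (1.0 - xi_i)          # assumed sample-weight update
        w /= w.sum()
    return models, np.array(a)

def inverse_sse_weights(preds, y):
    """Step 4: weights proportional to 1 / (sum of squared errors); assumes every SSE > 0."""
    preds = np.asarray(preds, dtype=float)   # shape (K, m), row k holds y_ki
    y = np.asarray(y, dtype=float)           # shape (m,), observed values y_i
    sse = np.sum((y - preds) ** 2, axis=1)
    inv = 1.0 / sse
    return inv / inv.sum()
```

A combined prediction is then `inverse_sse_weights(preds, y) @ preds`, where `preds` stacks the outputs of the weak predictors row by row. Because the weights are positive and sum to one, the combination is a convex mixture of the individual predictions.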
The beneficial effects of the invention are as follows. Short-term traffic flow is uncertain, complex and highly nonlinear. Applying the R/S analysis method to short-term traffic flow analysis reveals the internal laws of microscopic traffic flow movement and quantitatively characterizes the dynamics of the traffic system. The thought evolution algorithm (MEC) optimizes the selection of the initial parameters of the BP neural network, improving its prediction accuracy. Several MEC-optimized networks are then integrated by the Adaboost algorithm into a combined decision, improving the generalization of the network. On this basis, the weights assigned by the Adaboost algorithm to the weak predictors are readjusted by the criterion of the reciprocal of the sum of squared prediction errors, so that each predictor contributes as much as possible to the prediction accuracy of the network. Finally, short-term traffic flow is predicted with the resulting integrated neural network prediction model on a PeMS system data set.
Drawings
FIG. 1 is a block diagram of the MEC-BP fusion algorithm structure.
FIG. 2 is a diagram of an integrated neural network architecture based on the Adaboost algorithm.
FIG. 3 is a time sequence diagram of traffic flow with different statistical scales for 5 consecutive days.
FIG. 4 is the real-part time-frequency distribution of the wavelet transform of short-term traffic flow at a 5 min interval.
FIG. 5 plots the Hurst exponent at different statistical scales.
FIG. 6 shows $\log (R/S)_n$ and $V_n$ of the traffic flow time series against $\log n$ at a statistical scale of 10 min.
FIG. 7 shows $V_n$ against $\log n$ at different statistical scales for the same time length.
FIG. 8 is a three-dimensional phase space reconstruction of traffic flow sequences at different statistical scales.
FIG. 9 compares the short-term traffic flow values predicted by different models.
FIG. 10 compares the absolute errors of the traffic flow values predicted by different models.
Detailed Description
The invention comprises the following steps:
Step (1) short-term characteristic analysis of traffic flow based on the R/S analysis method
Step (1-1) calculation steps:
Given a time series $\{x(t)\}$, $t = 1, 2, \dots, M$, the calculation proceeds as follows.
1) Divide the series into $A = [M/n]$ adjacent sub-sequences of equal length $n$. $I_a$ denotes the $a$-th sub-sequence segment, $a = 1, 2, \dots, A$, and its elements are denoted $\{x_a(i)\}$, $i = 1, 2, \dots, n$. $E_a$ is the mean over the $a$-th sub-sequence segment:
$$E_a = \frac{1}{n}\sum_{i=1}^{n} x_a(i) \quad (1)$$
2) The cumulative deviation $X(i, a)$ of the elements of sub-sequence segment $I_a$ from the mean:
$$X(i, a) = \sum_{k=1}^{i}\left(x_a(k) - E_a\right), \quad i = 1, 2, \dots, n \quad (2)$$
3) The range $R_{I_a}$ and the sample standard deviation $S_{I_a}$ of sub-sequence segment $I_a$:
$$R_{I_a} = \max_{1 \le i \le n} X(i, a) - \min_{1 \le i \le n} X(i, a) \quad (3)$$
$$S_{I_a} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_a(i) - E_a\right)^2} \quad (4)$$
4) The rescaled range value $(R/S)_n$ for the division with sub-sequence length $n$:
$$(R/S)_n = \frac{1}{A}\sum_{a=1}^{A}\frac{R_{I_a}}{S_{I_a}} \quad (5)$$
Step (1-2) analysis process:
The Hurst exponent $H$ is the slope of $\log (R/S)_n$ plotted against $\log n$. The time series can be classified into three types according to the value of $H$:
(1) $0 < H < 0.5$ indicates that the sequence is not a random walk but an anti-persistent (negatively correlated) time series, i.e. the future trend tends to be opposite to the past trend; the closer $H$ is to 0, the stronger the anti-persistence.
(2) $H = 0.5$ indicates that the sequence is a standard random walk, i.e. future changes are unrelated to past increments.
(3) $0.5 < H < 1$ indicates that the time series is persistent: a past increasing trend indicates a future increasing trend, and a past decreasing trend indicates a future decreasing trend. The closer $H$ is to 1, the more closely the future is related to the past. The persistence or anti-persistence of the series therefore allows a quantitative analysis of its future trend.
Step (2) traffic flow time sequence phase space reconstruction
Phase space reconstruction theory is a vital part of chaotic system analysis. Constructing a phase space from the traffic flow time series exposes the regularities hidden in the evolution of the sequence and the useful internal information it carries. Let the traffic flow time series be
$$\{x(t)\}, \quad t = 1, 2, \dots, N$$
Let the time delay be $\tau$ and the embedding dimension be $m$. The $m$-dimensional phase space vectors constructed by the time-delay reconstruction method are:
$$X = \{X(t) \mid X(t) = [x(t), x(t+\tau), \dots, x(t+(m-1)\tau)]^{T},\ t = 1, 2, \dots, M\} \quad (6)$$
where $X$ is an $m \times M$ matrix and $M = N - (m-1)\tau$ is the number of phase points in the reconstructed phase space. Each phase point represents the state of the traffic flow system at a certain moment, and connecting the $M$ phase points in order of increasing time describes the evolution trajectory of the traffic flow system in the $m$-dimensional phase space. The original one-dimensional time series prediction problem is thereby converted into the prediction of a sequence of $m$-dimensional phase points. Assume that the phase points $\{X(t), X(t-1), \dots, X(t-k)\}$, $k = 1, 2, \dots, t-1$, are known, and that the phase points to be predicted from the current time $t+(m-1)\tau$ are $\{X(t+1), X(t+2), \dots, X(t+p)\}$, where $p = 1$ is called one-step prediction and $p > 1$ is called multi-step prediction. The prediction model can then be expressed as:
$$\{x(t+(m-1)\tau+1), \dots, x(t+(m-1)\tau+p)\} = F(X(t), \dots, X(t-k)) \quad (7)$$
The universal approximation capability of the feedforward neural network is used to realize one-step or multi-step prediction of the traffic flow. The method uses the C-C method to calculate the embedding dimension and the delay time, and calculates the largest Lyapunov exponent of the traffic flow by the Wolf method to judge the chaotic characteristics of the traffic flow.
Step (3) MEC-BP fusion algorithm
The thought evolution algorithm, also known as mind evolutionary computation (MEC), is a novel evolutionary algorithm that addresses the shortcomings of the genetic algorithm by simulating the evolution of human thinking. It inherits part of the ideas of the genetic algorithm and introduces two new operators, "convergence" and "dissimilation". Convergence is responsible for local optimization and dissimilation for global optimization. The two operators are independent of each other yet coordinated, so improving either one improves the overall search efficiency of the algorithm, and the directed learning and memory mechanism gives the algorithm a very strong global optimization capability.
Referring to FIG. 1, $t$ is the current iteration number of the MEC global iteration and $\rho$ is the current iteration number within a given sub-population. Each individual in a sub-population represents one set of initial weights and thresholds for the BP neural network fusion algorithm. The fitness of a single individual $N_{i,j}$ is calculated from the fusion result produced by the BP neural network fusion model after it has been trained to convergence. In the internal iteration of the sub-populations, the optimal individual $N_{i,pbest}$ of each sub-population is selected through a local bulletin board. This individual then represents its whole sub-population in the global competition on the global bulletin board, which selects the globally optimal sub-population $S_{gbest}$ and the globally optimal individual $N_{gbest}$ it contains. After multiple iterations, the MEC-BP neural network model trained from the initial weights and thresholds represented by the final globally optimal individual is the resulting multi-source traffic data fusion model.
Step (4) neural network integrated prediction model based on Adaboost algorithm
The adaptive boosting algorithm (Adaboost) obtains sample weights by repeatedly searching the sample feature space and continuously adjusts the weights of the training samples during iteration, increasing the weight of samples predicted with low accuracy and reducing the weight of samples predicted with high accuracy. The weak predictors are then combined by weighted majority voting into a strong predictor: the weight of a weak predictor with a smaller prediction error rate is increased so that it plays a larger role in the vote, and the weight of one with a larger error rate is reduced, which significantly improves the prediction performance of the learning algorithm, as shown in FIG. 2.
Step (4-1) Adaboost algorithm step
Step 1: data acquisition and network initialization. Select $m$ groups of training samples $T = \{(X_i, y_i)\}$, $i = 1, 2, \dots, m$, from the sample space and give the training samples the initial weight distribution $w_{1i} = 1/m$. The network structure is determined by the input and output dimensions of the samples, and the initial weights and thresholds of the neural network are obtained by optimization with the improved thought evolution algorithm (MEC). $D(1)$ denotes the initial weight distribution of the samples and $K$ the number of predictors.
$$D(1) = (w_{11}, w_{12}, \dots, w_{1i}, \dots, w_{1m}) \quad (8)$$
Step 2: iterate for $k = 1, 2, \dots, K$.
(a) When training the $k$-th weak predictor, use the weak predictor $H_k(x)$ to fit the training samples and predict the training data, and obtain the regression error rate $\xi_k$ by calculating the maximum error $E_k$ over the training set and the relative error $\xi_{ki}$ of each sample:
$$E_k = \max_i \left|y_i - H_k(X_i)\right| \quad (9)$$
[Equation (10), relative error $\xi_{ki}$ of each sample: formula image missing from this text]
[Equation (11), regression error rate $\xi_k$: formula image missing from this text]
(b) Calculate the weight $a_k$ of the weak predictor in the final predictor:
[Equation (12), weight $a_k$: formula image missing from this text]
(c) Adjust the sample weights for the next round of training according to the predictor weight $a_k$:
$$D(k+1) = (w_{k+1,1}, w_{k+1,2}, \dots, w_{k+1,m}) \quad (13)$$
[Equation (14), updated sample weight $w_{k+1,i}$: formula image missing from this text]
Step 3: after $K$ rounds of training, $K$ groups of weak prediction functions $H_k(x)$ are obtained. Combining them according to the weak predictor weights gives the strong predictor $h(x)$:
[Equation (15), strong predictor $h(x)$: formula image missing from this text]
Step 4: to determine the weight of each group of weak predictors more accurately, after the Adaboost algorithm has trained the MEC-BP neural networks to obtain the weak prediction function values $H_k(x)$ of the $K$ groups of weak predictors, the weight $w_k$ of each group of weak prediction functions is solved again using the criterion of the reciprocal of the sum of squared prediction errors, giving the accumulated strong predictor $h(x) = \sum_k w_k H_k(x_k, a_k)$. The larger the sum of squared prediction errors of a model, the lower its prediction accuracy and the lower its importance in the combined prediction; conversely, a single prediction model with a smaller sum of squared prediction errors is assigned a larger weighting coefficient. The weighting coefficients are calculated as:
$$w_k = \frac{E_k^{-1}}{\sum_{j=1}^{K} E_j^{-1}}, \quad k = 1, 2, \dots, K \quad (16)$$
where $y_{ki}$ is the predicted value of the $k$-th weak predictor at time $i$, $y_i$ is the observed value of the same prediction object at time $i$, $m$ is the length of the time span, and $E_k$ here denotes the sum of squared prediction errors of the $k$-th weak predictor (not the maximum error of equation (9)):
$$E_k = \sum_{i=1}^{m}\left(y_i - y_{ki}\right)^2 \quad (17)$$
Step (5) loading a PeMS data set to carry out traffic flow simulation test
To verify the effectiveness of the invention, two types of source data from the PeMS system are used: 30-second traffic flow and lane occupancy. The system aggregates the 30-second data into data sets at intervals such as 5 min, 15 min and 1 hour. Experimental data set 1: aggregated traffic flow of a single road section over 4 consecutive working days, from May 2 to May 5, 2011, recorded at a statistical scale of 5 min. Experimental data set 2: aggregated traffic flow of 3 road sections over 5 consecutive days, from June 1 to June 5, 2011 (Wednesday to Sunday). With 24 continuous hours per day as the observation time, traffic flow data were recorded continuously at statistical scales of 5, 10, 15 and 20 min, giving 1440, 720, 480 and 360 data points respectively.
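The data counts quoted for data set 2 follow from the aggregation arithmetic, which the short sketch below reproduces. The input array is a synthetic placeholder, not PeMS data, and summing the 30-second counts within each interval is an assumption about how flow is aggregated; the function name `aggregate` is hypothetical.

```python
import numpy as np

def aggregate(flow_30s, minutes):
    """Sum consecutive 30-second counts into bins of the given length in minutes."""
    per_bin = minutes * 2                         # two 30-second samples per minute
    flow_30s = np.asarray(flow_30s, dtype=float)
    n_bins = len(flow_30s) // per_bin             # drop any incomplete final bin
    return flow_30s[:n_bins * per_bin].reshape(n_bins, per_bin).sum(axis=1)

days = 5
raw = np.ones(days * 24 * 60 * 2)                 # synthetic placeholder series
counts = {scale: len(aggregate(raw, scale)) for scale in (5, 10, 15, 20)}
# counts == {5: 1440, 10: 720, 15: 480, 20: 360}
```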
The similarity between the curves in FIG. 3 shows that traffic flow changes are self-similar across scales, and the traffic flow time series at the 5 min scale shows an obvious quasi-periodic trend. To identify the self-similarity of the traffic flow data, the data are decomposed by wavelet transform. The wavelet decomposition coefficients shown in FIG. 4 serve as similarity indexes (RI): the larger the RI, the stronger the self-similarity. Because travel demand changes, the wavelet coefficients of working days (the first three days) differ from those of the weekend (the last two days), which indicates that traffic flow depends on the time period. The traffic flow data can therefore be divided into busy, idle and normal periods. The experimental data give the following division: the busy periods are 7:00-9:30 and 14:30-18:30; the idle period is 0:00-5:00; the remaining time is the normal period.
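The period division can be applied to a timestamp with a small lookup like the one below. The function name `traffic_period` is hypothetical, and treating each interval as closed at its start and open at its end is an assumption, since the text does not say which period a boundary instant belongs to.

```python
from datetime import time

# Period boundaries as stated in the text.
BUSY = [(time(7, 0), time(9, 30)), (time(14, 30), time(18, 30))]
IDLE = [(time(0, 0), time(5, 0))]

def traffic_period(t):
    """Return 'busy', 'idle' or 'normal' for a time of day (half-open intervals)."""
    if any(start <= t < end for start, end in BUSY):
        return "busy"
    if any(start <= t < end for start, end in IDLE):
        return "idle"
    return "normal"
```

Splitting a day of records by `traffic_period` gives the three subsets for which the Hurst exponent can then be computed separately.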
A. Result of predictive analysis of short-term traffic flow based on R/S analysis method
The Hurst value obtained by the R/S analysis method is affected by the sample size. To track and compare data at different observation scales, the traffic flow sequence is accumulated in units of one day, the natural period of traffic flow, so that the information representing the variation pattern of the sequence within a period is retained as fully as possible. The following analysis uses the traffic flow of data set 2.
(1) Fig. 5 shows the Hurst index curves for different numbers of days at different statistical scales. All Hurst index values lie in the interval [0.5, 1], which indicates that the traffic flow time series has long-term memory: the overall direction of traffic flow change inherits the past trend, so a past increasing (decreasing) trend indicates a future increasing (decreasing) trend. Each curve declines overall as the time length increases, that is, the Hurst index decreases as the sample size grows. This shows that, within the same statistical scale, once the time series reaches a certain length, adding data damages the self-similarity of the original series. For the same time length, the Hurst index also declines as the statistical scale increases: the traffic flow sequence has only short-term validity, and the long memory of the time series weakens as the scale grows.
(2) Table 1 lists the Hurst indexes calculated for the same number of days (5 days) at different statistical scales in the three time periods. At the same statistical scale, the Hurst index increases from the idle period to the busy period: busy traffic is more strongly self-similar and therefore more predictable at the same time scale. Within the same time period, the Hurst index decreases as the statistical scale grows, and it is expected to approach 0.5 as the scale keeps increasing. At that point the traffic flow no longer has fractal features, mainly because the past and the future of the time series become completely independent.
TABLE 1 Hurst index at different time intervals and on different statistical scales for the same number of days
Figure BDA0002347692070000101
(3) If a time series is long-range correlated, the dependence between times is strong. Fig. 6 shows the curves of $\log (R/S)_n$ and $V_n$ against $\log n$ for the traffic flow time series at a statistical scale of 10 min. From the $V_n$ curve of the original sequence, the average cycle length of the traffic flow at the 10 min scale is judged to be 207 min, that is, on average the sequence loses memory of its initial conditions after 207 min. The Hurst index of the shuffled sequence (0.6233) is smaller than that of the original sequence (0.7031), because shuffling the data destroys the correlation structure of the original sequence and reduces the degree of order of the traffic flow time series. After shuffling, $V_n$ becomes a flat curve, which shows that the sequence has become an independent random process without long-range correlation.
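For reference, the V statistic used here is commonly defined from the rescaled range as follows. The source does not print this definition, so the form below is an assumption based on standard R/S analysis:

```latex
V_n = \frac{(R/S)_n}{\sqrt{n}}
```

For an independent random process $(R/S)_n$ grows like $\sqrt{n}$, so $V_n$ plotted against $\log n$ is flat. For a persistent series ($H > 0.5$) the curve rises until the memory is exhausted, and the point where the rise stops is read as the average cycle length.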
(4) Fig. 7 shows the curves of $V_n$ against $\log n$ obtained at different statistical scales for the same time length, that is, the results for the short-term traffic flow sequence at different statistical scales. As the statistical scale decreases, the break point of $V_n$ occurs later, meaning that the long memory takes longer to disappear. In practice this long-term memory is not infinite; it weakens gradually over time until it is lost, so short-term prediction remains feasible. When τ = 1 hour, the rising trend of the $V_n$ statistic curve is not obvious: the closer the Hurst index is to 0.5, the more noise the sequence contains and the closer it is to a random process.
(5) To describe the complexity of traffic flow quantitatively, traffic complexity is analyzed on the basis of fractals, chaos and entropy. As shown in Table 2, the Hurst index and the sample entropy decrease gradually as the statistical scale increases; the sample entropy is largest for 5 min sampling, which means that this time series is the most complex. The maximum Lyapunov exponent is always positive, so the motion of the system is unstable along some direction and a chaotic attractor appears in that direction, which puts the whole system in a chaotic state. As shown in Fig. 8, the components of the reconstructed phase space of the traffic flow sequence show trajectories that fold repeatedly and cross each other to form a dense band, and these characteristics become more obvious as the statistical scale grows.
TABLE 2 traffic flow feature complexity analysis at different statistical scales
Figure BDA0002347692070000111
B. Model predictive analysis
A BP neural network is designed according to the characteristics of traffic flow. The network consists of an input layer, a hidden layer and an output layer. Using the construction of feature vectors and corresponding outputs for a window width of m = 4, the samples for training the neural network are finally obtained:
Figure BDA0002347692070000112
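The sample construction for a window width of m = 4 can be sketched as a sliding window over the series. This is an illustrative sketch only: the function name `make_windows` is hypothetical, and it assumes each sample pairs m consecutive values with the next p values as the target.

```python
import numpy as np

def make_windows(series, m=4, p=1):
    """Build (input, target) pairs: m past values predict the next p values."""
    x = np.asarray(series, dtype=float)
    n = len(x) - m - p + 1                 # number of complete samples
    if n <= 0:
        raise ValueError("series is too short for the requested window")
    X = np.stack([x[i:i + m] for i in range(n)])          # shape (n, m)
    Y = np.stack([x[i + m:i + m + p] for i in range(n)])  # shape (n, p)
    return X, Y

# Toy series 0..9 with a 4-value window and a 2-step target.
X, Y = make_windows(np.arange(10), m=4, p=2)
print(X[0], Y[0])
```

Splitting the rows of `X` and `Y` chronologically, earliest rows for training and latest rows for testing, keeps the test period strictly after the training period.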
The network structure is 4-3-p: the input layer has 4 nodes, representing the traffic flow at the 4 time points before the current time node; the hidden layer has 3 nodes; and the output layer has p nodes, giving the traffic flow predicted by the network. In the improved BP_Adaboost algorithm, a strong predictor composed of 10 weak predictors predicts the data samples, with the error threshold set to 0.1. For each data set, 3 days of historical data are used to predict the traffic flow on the 4th day: the first 864 groups are training samples and the last 288 groups are test samples. The invention uses the following 3 error indexes to measure the accuracy of the combined prediction.
Figure BDA0002347692070000121
Figure BDA0002347692070000122
Figure BDA0002347692070000123
where n is the length of the traffic flow data sequence, $y_i$ is the sample output value, and $d_i$ is the sample target value. The larger the coefficient of determination ($R^2$), with $R^2 \in [0, 1]$, the better the model; the smaller the mean square error (MSE) and the mean absolute error (MAD), the more reasonable the structure of the corresponding model.
Matlab 2017b is used to train the traditional BP method, the BP_Adaboost method, the MEC-BP_Adaboost model and the improved MEC-BP_Adaboost model (the present method). The trained models perform single-step short-term traffic flow prediction on data set 1; the results are shown in Table 3 and Figs. 9 and 10.
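The three error indexes can be sketched as below. The exact formulas are given only as images in the source, so this is a hedged sketch: MSE and MAD use their usual definitions, and $R^2$ is assumed to be the squared Pearson correlation between output and target, chosen because it is guaranteed to lie in [0, 1] as stated. The function names are illustrative.

```python
import numpy as np

def mse(y, d):
    """Mean square error between outputs y and targets d."""
    y, d = np.asarray(y, dtype=float), np.asarray(d, dtype=float)
    return float(np.mean((y - d) ** 2))

def mad(y, d):
    """Mean absolute error between outputs y and targets d."""
    y, d = np.asarray(y, dtype=float), np.asarray(d, dtype=float)
    return float(np.mean(np.abs(y - d)))

def r_squared(y, d):
    """Assumed form: squared Pearson correlation, which always lies in [0, 1]."""
    r = np.corrcoef(np.asarray(y, dtype=float), np.asarray(d, dtype=float))[0, 1]
    return float(r * r)

outputs = [1.0, 2.0, 3.0]
targets = [1.0, 2.0, 4.0]
print(mse(outputs, targets), mad(outputs, targets), r_squared(outputs, targets))
```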
TABLE 3 comparison of Performance indicators for different prediction models
Figure BDA0002347692070000124
As Table 3 and Figs. 9 and 10 show, the MEC-BP model reduces the mean square error and the mean absolute error by 29.8% and 3.5% respectively compared with the traditional BP model, which demonstrates the effectiveness of MEC in optimizing the initial parameters of the BP model. The BP_Adaboost model reduces the two errors by 56.3% and 27.1% respectively compared with the traditional BP model, which shows that the Adaboost algorithm greatly improves the generalization ability of the neural network; because Adaboost uses weighted majority voting, it effectively improves the prediction accuracy of the model and avoids over-fitting. The present method reduces the two errors by 78.2% and 46.4% respectively compared with the BP model, which demonstrates that the improved method is sound for traffic flow prediction. Compared with the MEC-BP_Adaboost model, the present method reduces the two errors by 44.9% and 25.9% respectively, which shows that assigning the weak predictor weights by the reciprocal sum-of-squared-errors criterion yields higher prediction accuracy and improves the generalization ability of the predictor more effectively.
To better reflect the prediction effect of each weak predictor, the weight of each weak prediction function is computed from the reciprocal of its sum of squared prediction errors, which better reflects the performance of each weak predictor and improves the decision performance of the whole model. Table 4 compares the weights of the weak predictors in the MEC-BP_Adaboost model and in the improved MEC-BP_Adaboost model:
TABLE 4 weight comparison of each weak predictor in the two models
Figure BDA0002347692070000131
The results in Table 4 show that, among the weights of the 10 MEC-optimized neural networks, the 3rd, 4th and 8th neural networks of the MEC-BP_Adaboost model have the largest weight shares, which indicates that these 3 networks predict the traffic flow best. After the improvement, the weights of the other neural networks, which have little influence on the model, are reduced and the influence of these 3 networks on the whole model is increased, so that the valuable information they provide is fully used and the accuracy of the prediction result is maximized.
To further verify the effectiveness and generality of the model, 2-step, 3-step, 4-step and 5-step predictions are carried out on data set 2, as shown in Table 5. As the number of prediction steps increases, the prediction error of the present method remains generally smaller than that of the original method, although for any given model the prediction accuracy decreases as the number of steps increases.
TABLE 5 MSE-value comparison of different prediction step sizes for different models
Figure BDA0002347692070000132

Claims (1)

1. A traffic flow characteristic analysis and prediction method based on the Adaboost algorithm, specifically comprising the following steps:
Step (1): short-term characteristic analysis of traffic flow based on the R/S analysis method
Step (1-1) calculation steps:
Given a time series {x(t)}, t = 1, 2, …, M, the calculation proceeds as follows:
1) Divide the series into [M/n] equal-length subsequences of length n, where $I_a$ denotes the a-th subsequence and the series on the a-th subsequence is written {x(i)}, i = 1, 2, …, n; $E_a$ denotes the mean over the a-th subsequence:
Figure FDA0003623329770000011
2) Cumulative deviation X(i, a) of the elements of subsequence $I_a$ from the mean:
Figure FDA0003623329770000012
3) Range of subsequence $I_a$
Figure FDA0003623329770000013
and sample standard deviation
Figure FDA0003623329770000014
Figure FDA0003623329770000015
Figure FDA0003623329770000016
4) Rescaled range $(R/S)_n$ for the division into subsequences of length n:
Figure FDA0003623329770000017
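The calculation in steps 1) to 4) can be sketched as follows. This is a minimal sketch under stated assumptions: the population standard deviation (divisor n) is used for the sample standard deviation, $(R/S)_n$ is taken as the average of the per-subsequence ratios, and the Hurst index is estimated as the slope of $\log (R/S)_n$ against $\log n$, which is the usual R/S procedure but is not spelled out in the text. The names `rescaled_range` and `hurst_rs` are illustrative.

```python
import numpy as np

def rescaled_range(x, n):
    """Average R/S over the floor(M/n) non-overlapping subsequences of length n."""
    x = np.asarray(x, dtype=float)
    A = len(x) // n
    segs = x[:A * n].reshape(A, n)
    dev = segs - segs.mean(axis=1, keepdims=True)   # deviation from the mean E_a
    cum = np.cumsum(dev, axis=1)                    # cumulative deviation X(i, a)
    R = cum.max(axis=1) - cum.min(axis=1)           # range of each subsequence
    S = segs.std(axis=1)                            # standard deviation (divisor n, assumed)
    valid = S > 0                                   # skip constant subsequences
    return float(np.mean(R[valid] / S[valid]))

def hurst_rs(x, min_n=8, num_scales=12):
    """Estimate H as the slope of log (R/S)_n versus log n (assumed procedure)."""
    M = len(x)
    if M // 2 < min_n:
        raise ValueError("series is too short")
    grid = np.logspace(np.log10(min_n), np.log10(M // 2), num_scales)
    ns = np.unique(np.floor(grid).astype(int))
    rs = np.array([rescaled_range(x, n) for n in ns])
    slope, _intercept = np.polyfit(np.log(ns), np.log(rs), 1)
    return float(slope)

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)   # independent series, H expected near 0.5
walk = np.cumsum(noise)             # strongly persistent series, H expected near 1
print(hurst_rs(noise), hurst_rs(walk))
```

Finite-sample R/S estimates are biased, so the value for independent noise is typically somewhat above 0.5 rather than exactly 0.5.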
Step (1-2) analysis process:
Time series are classified into three types according to the value of the Hurst index H:
(1) 0 < H < 0.5: the sequence is not a random walk but an anti-correlated (anti-persistent) time series, that is, the future trend is opposite to the past trend; the closer H is to 0, the stronger the anti-persistence;
(2) H = 0.5: the sequence is a standard random walk, that is, future changes are unrelated to past increments;
(3) 0.5 < H < 1: the time series is persistent, so a past increasing trend indicates a future increasing trend and a past decreasing trend indicates a future decreasing trend; when H approaches 1, the past and the future are closely related; the future trend of the time series can be analyzed quantitatively according to its persistence or anti-persistence;
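The three cases above can be expressed as a small classifier. This is an illustrative sketch: the function name `classify_hurst` and the tolerance `tol` around 0.5 are assumptions, added because an estimated H is never exactly 0.5 in practice.

```python
def classify_hurst(h, tol=0.02):
    """Map a Hurst index to one of the three series types (tol is an assumed band)."""
    if not 0.0 < h < 1.0:
        raise ValueError("the Hurst index is expected to lie in (0, 1)")
    if abs(h - 0.5) <= tol:
        return "random walk"
    return "anti-persistent" if h < 0.5 else "persistent"

# The two Hurst values reported for the 10 min series (original and shuffled).
print(classify_hurst(0.7031), classify_hurst(0.6233))
```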
Step (2): phase space reconstruction of the traffic flow time series
Let the traffic flow time series be
Figure FDA0003623329770000021
With time delay τ and embedding dimension m, the m-dimensional phase space vectors constructed by the time-delay phase space reconstruction method are:
$$X = \{X(t) \mid X(t) = [x(t), x(t+\tau), \dots, x(t+(m-1)\tau)]^{T},\ t = 1, 2, \dots, M\} \qquad (6)$$
where X is an M × m matrix and M = N − (m − 1)τ is the number of phase points in the reconstructed phase space. The M phase points form a phase pattern in the m-dimensional phase space; each phase point represents the state of the traffic flow system at a certain moment, and connecting the phase points in order of increasing time describes the evolution trajectory of the traffic flow system in the m-dimensional phase space, so that the original one-dimensional time series prediction problem is converted into the prediction of the m-dimensional phase point sequence. Assume that the phase points used for prediction, {X(t), X(t−1), …, X(t−k)}, k = 1, 2, …, t−1, are known, and that the phase points to be predicted at the current time t + (m−1)τ are {X(t+1), X(t+2), …, X(t+p)}, where p = 1 is called one-step prediction and p > 1 is called multi-step prediction. The prediction model is then expressed as:
$$\{x(t+(m-1)\tau+1), \dots, x(t+(m-1)\tau+p)\} = F(X(t), \dots, X(t-k)) \qquad (7)$$
One-step or multi-step prediction of traffic flow is realized by exploiting the generalization and approximation ability of a feedforward neural network;
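The time-delay reconstruction of equation (6) can be sketched as follows. The function name `delay_embed` is illustrative, and the code uses 0-based indexing, so row t of the result corresponds to phase point X(t+1) in the notation above.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return the M x m matrix of phase points, with M = N - (m - 1) * tau."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * tau
    if M <= 0:
        raise ValueError("series is too short for this m and tau")
    # Row t holds [x(t), x(t + tau), ..., x(t + (m - 1) * tau)].
    idx = np.arange(M)[:, None] + tau * np.arange(m)[None, :]
    return x[idx]

# Toy series 0..9 embedded with dimension 3 and delay 2.
P = delay_embed(np.arange(10), m=3, tau=2)
print(P.shape)
print(P[0], P[-1])
```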
Step (3): MEC-BP fusion
Let t be the current iteration number in the MEC global iteration and ρ the current iteration number within a given sub-population. Each individual in a sub-population represents a set of initial weights and thresholds of the BP neural network fusion algorithm. The fitness index measuring a single individual $N_{i,j}$ is computed from the fusion result of the BP neural network fusion model after it has been trained to convergence. In the internal iteration of the sub-populations, the optimal individual $N_{i,pbest}$ of each sub-population is selected through a local bulletin board; this individual then represents its whole sub-population in the global competition through a global bulletin board, from which the globally optimal sub-population $S_{gbest}$ and the globally optimal individual $N_{gbest}$ it contains are selected. After multiple iterations, the MEC-BP neural network model trained from the initial weights and thresholds represented by the final globally optimal individual is the resulting multi-source traffic data fusion model;
Step (4): neural network ensemble prediction model based on the Adaboost algorithm
The sample weights are obtained by repeatedly searching the sample feature space. During the iterations the weights of the training samples are adjusted continuously: the weights of samples with low prediction accuracy are increased and those of samples with high prediction accuracy are reduced. The weak predictors are finally combined by weighted majority voting to form a strong predictor, specifically as follows:
Step (4-1) Adaboost algorithm steps
Step 1: data acquisition and network initialization. Select m groups of training samples $T = \{(X_i, y_i)\}$, i = 1, 2, …, m, from the sample space and set the weight distribution of the training samples to $w_{1i} = 1/m$, i = 1, 2, …, m. The network structure is determined according to the input and output dimensions of the samples, and the initial weights and thresholds of the neural network are obtained by optimization with the mind evolutionary algorithm (MEC). D(1) denotes the initial sample weights and K denotes the number of predictors;
$$D(1) = (w_{11}, w_{12}, \dots, w_{1i}, \dots, w_{1m}) \qquad (8)$$
Step 2: iterate for k = 1, 2, …, K
(a) When training the k-th weak predictor, use the weak predictor $H_k(x)$ to train on the samples and predict the training data output, obtaining the regression error rate $\xi_k$; compute the maximum error $E_k$ over the training set and the relative error $\xi_{ki}$ of each sample:
$$E_k = \max_i |y_i - H_k(X_i)| \qquad (9)$$
Figure FDA0003623329770000031
Figure FDA0003623329770000032
(b) Compute the weight $a_k$ of the weak predictor in the final predictor:
Figure FDA0003623329770000033
(c) Adjust the weights of the next round of training samples according to the weight $a_k$:
$$D(k+1) = (w_{k+1,1}, w_{k+1,2}, \dots, w_{k+1,m}) \qquad (13)$$
Figure FDA0003623329770000034
Step 3: after K rounds of training, K weak predictors $H_k(x)$ are obtained; combining the weak prediction functions according to the weak predictor weights gives the strong predictor h(x):
Figure FDA0003623329770000035
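Steps 1 to 3 can be illustrated with a minimal sketch that assumes the update rules of AdaBoost.R2, a common regression variant of Adaboost. Only equations (8), (9) and (13) are legible in the text, so the relative error, the weight $a_k$, the sample weight update and the combination rule below are assumptions and may differ from the claimed formulas. A weighted regression stump stands in for the MEC-BP weak predictor, and the names `fit_stump` and `adaboost_r2` are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted regression stump (stand-in for an MEC-BP weak predictor)."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:        # keep both sides non-empty
            left = X[:, j] <= thr
            lv = np.average(y[left], weights=w[left])
            rv = np.average(y[~left], weights=w[~left])
            err = np.sum(w * (y - np.where(left, lv, rv)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, thr, lv, rv)
    if best is None:
        raise ValueError("X needs at least two distinct values in some feature")
    _, j, thr, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= thr, lv, rv)

def adaboost_r2(X, y, K=10):
    m = len(y)
    w = np.full(m, 1.0 / m)                 # equation (8): w_1i = 1/m
    predictors, alphas = [], []
    for _ in range(K):
        h = fit_stump(X, y, w)
        abs_err = np.abs(y - h(X))
        E = abs_err.max()                   # equation (9): maximum error E_k
        if E < 1e-12:                       # weak predictor already fits exactly
            return h
        xi_i = abs_err / E                  # assumed: linear relative error
        xi = float(np.sum(w * xi_i))        # assumed: weighted regression error rate
        if xi >= 0.5:                       # AdaBoost.R2 stopping rule
            if not predictors:
                return h
            break
        a = xi / (1.0 - xi)                 # assumed: a_k = xi_k / (1 - xi_k)
        predictors.append(h)
        alphas.append(np.log(1.0 / a))
        w = w * a ** (1.0 - xi_i)           # accurate samples lose weight
        w = w / w.sum()                     # renormalize to get D(k+1)
    alphas = np.array(alphas) / np.sum(alphas)
    # Assumed combination: normalized weighted sum of the weak predictions.
    return lambda Z: sum(al * h(Z) for al, h in zip(alphas, predictors))

X_demo = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y_demo = np.sin(2.0 * np.pi * X_demo[:, 0])
strong = adaboost_r2(X_demo, y_demo, K=10)
print(np.mean((strong(X_demo) - y_demo) ** 2))
```

The original AdaBoost.R2 combines weak predictors by a weighted median; the weighted sum is used here only because the text describes the strong predictor as a weighted combination of the weak prediction functions.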
Step 4: to better determine the weight of each weak predictor, after the K weak predictors $H_k(x)$ are obtained by training the MEC-BP neural network with the Adaboost algorithm, the weight $w_k$ of each weak prediction function is solved again using the reciprocal sum-of-squared-prediction-errors criterion, giving the accumulated strong predictor $h(x) = \sum_k w_k \cdot H_k(x_k, a_k)$. The larger the sum of squared prediction errors, the lower the prediction accuracy of that model and hence the lower its importance in the combined prediction; a single prediction model with a smaller sum of squared prediction errors is therefore given a larger weighting coefficient. The weighting coefficients are calculated as:
Figure FDA0003623329770000036
where $y_{ki}$ is the predicted value of the k-th weak predictor at time i, $y_i$ is the observed value of the same prediction object at time i, N is the time length, and $E'_k$ is the sum of squared prediction errors of the k-th weak predictor;
Figure FDA0003623329770000041
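The reciprocal sum-of-squared-errors weighting of Step 4 can be sketched as below. The two formulas appear only as images, so this sketch assumes the standard form of the criterion: $E'_k = \sum_{i=1}^{N} (y_{ki} - y_i)^2$ and $w_k = (1/E'_k) / \sum_j (1/E'_j)$. The function names `inverse_sse_weights` and `combine` are illustrative.

```python
import numpy as np

def inverse_sse_weights(preds, y):
    """preds: (K, N) predictions of K weak predictors; y: (N,) observed values."""
    preds = np.asarray(preds, dtype=float)
    y = np.asarray(y, dtype=float)
    sse = np.sum((preds - y) ** 2, axis=1)    # E'_k for each weak predictor
    if np.any(sse == 0):
        raise ValueError("a weak predictor with zero error cannot be weighted this way")
    inv = 1.0 / sse
    return inv / inv.sum()                    # weights sum to 1

def combine(preds, weights):
    """Weighted sum of the weak predictions, giving the strong prediction."""
    return np.asarray(weights, dtype=float) @ np.asarray(preds, dtype=float)

observed = np.array([1.0, 2.0, 3.0])
weak_preds = np.array([[1.0, 2.0, 4.0],      # squared error sum = 1
                       [3.0, 4.0, 5.0]])     # squared error sum = 12
weights = inverse_sse_weights(weak_preds, observed)
print(weights, combine(weak_preds, weights))
```

In this toy case the more accurate predictor receives weight 12/13 and the less accurate one 1/13, which shows how the criterion concentrates weight on the weak predictors with smaller squared error.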
CN201911401878.5A 2019-12-31 2019-12-31 Adaboost algorithm-based traffic flow characteristic analysis and prediction method Active CN111160650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911401878.5A CN111160650B (en) 2019-12-31 2019-12-31 Adaboost algorithm-based traffic flow characteristic analysis and prediction method

Publications (2)

Publication Number Publication Date
CN111160650A CN111160650A (en) 2020-05-15
CN111160650B true CN111160650B (en) 2022-08-09

Family

ID=70559329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911401878.5A Active CN111160650B (en) 2019-12-31 2019-12-31 Adaboost algorithm-based traffic flow characteristic analysis and prediction method

Country Status (1)

Country Link
CN (1) CN111160650B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210330

Address after: 310018 No.2 street, Baiyang street, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Applicant after: HANGZHOU DIANZI University

Applicant after: STATE GRID HUBEI ELECTRIC POWER Co.,Ltd.

Address before: 310018 No.2 street, Baiyang street, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Applicant before: HANGZHOU DIANZI University

GR01 Patent grant