CN115938104A - Dynamic short-time road network traffic state prediction model and prediction method - Google Patents
Info
- Publication number
- CN115938104A (application number CN202111115375.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Traffic Control Systems (AREA)
Abstract
The invention discloses a dynamic short-time road network traffic state prediction method optimized by the deep deterministic policy gradient (DDPG) algorithm. The method incorporates a vector-form short-time traffic state representation into a KNN prediction model, giving the model greater flexibility in handling both rapid and gradual traffic state changes and both conventional and unconventional traffic evolution. Through dynamic optimization with the DDPG algorithm, the short-time traffic state prediction model changes from simple static prediction over time into prediction that adapts to the evolving traffic state. This solves the prior-art problem that static and semi-static models can only fit historical data and patterns and cannot quickly adapt to sudden, random changes in the real-time traffic state, and thereby further improves prediction accuracy.
Description
Technical Field
The invention belongs to the field of traffic big data technology and applications, relates to short-time road network traffic state prediction, and particularly relates to a dynamic short-time road network traffic state prediction model optimized with a deep deterministic policy gradient algorithm.
Background
Short-time road network traffic state prediction plays an important practical role in intelligent transportation systems and future-oriented intelligent vehicle-road cooperative systems, and is a prerequisite for other real-time traffic services such as en-route path guidance and route decision-making. The quality of many basic traffic services therefore depends on the accuracy of short-time road network traffic state prediction.
In recent years, with the diversification of traffic detectors and improvements in data storage devices, research on traffic data acquisition technology and its applications has advanced greatly. Correspondingly, short-time road network traffic state prediction algorithms driven by traffic big data have emerged in large numbers. They mainly comprise shallow-learning (traditional machine learning) models, represented by K-nearest neighbors, support vector machines, and decision trees, and deep-learning models, represented by long short-term memory networks, convolutional neural networks, and their combinations.
However, both types of model are usually trained and constructed on abundant historical data: the main architecture and hyper-parameters of the model are fixed and are no longer adjusted, or are adjusted only at certain intervals, while the model is applied. Such "static" and "semi-static" models can fit historical data and patterns well, but cannot quickly adapt to sudden, random changes in the real-time traffic state and make timely adjustments.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides a dynamic short-time road network traffic state prediction model and prediction method.
the complete technical scheme of the invention comprises the following steps:
a dynamic short-time network traffic state prediction method based on deep certainty strategy gradient algorithm optimization comprises the following steps:
the method comprises the following steps: data collection and processing
(1) Collect the time, position, and instantaneous vehicle speed information uploaded to a superior system by on-board GPS devices at a specified time interval.
(2) Compute the average vehicle speed $v(t)$ on a road section $l$ at time $t$; the averages at and before time $t$ are known values.
(3) For a road network formed by sections $l_1, l_2, \dots, l_n$, obtain the corresponding set of average speed values $V_t = (v_1(t), v_2(t), \dots, v_n(t))$, where $n$ is the number of road sections.
(4) Using the observations at time $t$ and the $\delta - 1$ moments before it, aggregate the observed values into a space-time matrix $X_t$; $X_t$ represents the traffic state at time $t$, as shown in formula (1).
(5) Process $X_t$: for each single reference state point, compute its trend vector, and define the traffic state unit $X'_t$ in vector form.
Here a reference state point refers to a known value of the space-time matrix $X_t$ at a particular time.
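As a concrete illustration of steps (4)-(5), the sketch below builds the space-time matrix $X_t$ and a vector-form state unit from per-segment speed observations. The function name and the trend-vector formula (a mean of successive differences inside the window) are assumptions for illustration, since the patent's equation images for $X_t$ and $X'_t$ are not reproduced here.

```python
import numpy as np

def state_unit(speeds, t, delta):
    """Build the vector-form traffic state unit from per-segment speeds.

    speeds : array of shape (n_segments, n_times), speeds[l, t] = v_l(t).
    Returns (V_t, trend): V_t is the reference state point at time t and
    `trend` is a trend vector summarizing the evolution inside the window.
    The trend formula (mean of successive differences) is an assumption;
    the patent gives the exact trend-vector equation only as an image.
    """
    X_t = speeds[:, t - delta + 1: t + 1]      # space-time matrix of formula (1)
    V_t = X_t[:, -1]                           # reference state point V_t
    trend = np.diff(X_t, axis=1).mean(axis=1)  # assumed trend vector
    return V_t, trend
```

With $\delta = 3$, for example, the state unit summarizes three consecutive observations per segment instead of carrying the full $\delta$-column matrix into the distance computation.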
Step two: construction of KNN-based static prediction model
(1) Measure the distance $ED_i$ between reference state points $V_t$ with the Euclidean distance, and the distance $CD_i$ between trend vectors with the cosine distance; the expressions are:
In the formulas, the $i$-th known reference state point and the $i$-th known trend vector carry the subscript $h$, denoting known historical data in the collected samples. From these, construct a state distance $SD_i$ for measuring the similarity of state units:
where $i = 1, 2, \dots, M$, $M$ is the number of historical samples, and $\alpha$ is a coefficient balancing the Euclidean and cosine distances, with value range $[0, 1]$;
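The fused state distance can be sketched as follows. The linear combination $SD_i = \alpha\,ED_i + (1-\alpha)\,CD_i$ is an assumption consistent with the limits stated later in the description ($\alpha \to 1$ gives $SD_i \approx ED_i$, $\alpha \to 0$ gives $SD_i \approx CD_i$); the patent's exact formula (5) appears only as an image.

```python
import numpy as np

def state_distance(V, trend, V_h, trend_h, alpha):
    """Similarity of two state units: Euclidean distance ED between the
    reference state points blended with cosine distance CD between the
    trend vectors.  SD = alpha*ED + (1-alpha)*CD is an assumed fusion
    consistent with the alpha -> 0 / alpha -> 1 limits in the text."""
    ed = np.linalg.norm(np.asarray(V, dtype=float) - np.asarray(V_h, dtype=float))
    cos = np.dot(trend, trend_h) / (np.linalg.norm(trend) * np.linalg.norm(trend_h))
    cd = 1.0 - cos                     # cosine distance from cosine similarity
    return alpha * ed + (1.0 - alpha) * cd
```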
(2) Selecting K neighbors according to similarity measurement results
Calculate the state distances between the sample to be predicted, $X_{t+1}$, and all known historical samples, and take the $K$ historical samples $V_{h,1}, V_{h,2}, \dots, V_{h,K}$ with the smallest distances as neighbors;
(3) Calculating a prediction value of a sample to be predicted
Compute the predicted value by incremental prediction: the neighbors' label values are Gaussian-weighted according to the magnitude of the state distance $SD_i$.
For $X_t$, denote the future state point $X_{t+1}$ at time $t+1$ as $y_t$.
The increment is the difference between a neighbor's $y_{h,j}$ and $V_{h,j}$; for the $j$-th neighbor ($j = 1, 2, \dots, K$):
$\Delta y_{h,j} = y_{h,j} - V_{h,j}$  (6)
i.e., the traffic state change within the prediction window;
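A minimal sketch of the incremental, Gaussian-weighted prediction in step (3); the weight form $w_j \propto \exp(-SD_j^2)$ is an assumption, since the weighting formula (7) is given only as an image in the patent.

```python
import numpy as np

def knn_increment_predict(V_t, neighbors, labels, distances):
    """Predict the next state as V_t plus the Gaussian-weighted mean of the
    neighbors' increments dy_{h,j} = y_{h,j} - V_{h,j} (formula (6)).
    Weights w_j = exp(-SD_j^2), normalized to sum to 1 (assumed form)."""
    neighbors = np.asarray(neighbors, dtype=float)
    labels = np.asarray(labels, dtype=float)
    d = np.asarray(distances, dtype=float)
    increments = labels - neighbors            # formula (6), one row per neighbor
    w = np.exp(-d ** 2)
    w /= w.sum()
    return np.asarray(V_t, dtype=float) + (w[:, None] * increments).sum(axis=0)
```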
Step two, in the construction of the KNN-based static prediction model, the method further comprises the following steps:
(4) Coarse calibration of model parameters
Calibrate the undetermined parameters $\delta$, $K$, and $\alpha$ in the KNN-based static prediction model with a gridded search experiment on the collected real data, specifically:
First, establish an evaluation system for the prediction effect, including the mean absolute error (MAE) and the mean absolute percentage error (MAPE):
where $n$ is the number of road sections, $N$ is the number of samples to be predicted in the experiment, and the predicted and true values of the $i$-th sample to be predicted are compared;
Second, discretize the value ranges of the parameters to be calibrated, run experiments with the different parameter combinations one by one, and record the results;
Finally, select the parameter combination with the best experimental result as the calibrated parameter values.
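The coarse calibration above can be sketched as an exhaustive grid search; `evaluate` is a placeholder for training the KNN model with one parameter combination and returning its error on held-out data.

```python
import itertools

def grid_search(evaluate, deltas, ks, alphas):
    """Try every (delta, K, alpha) combination, score it with the supplied
    evaluate(delta, K, alpha) -> error function, and keep the best one."""
    best_params, best_err = None, float("inf")
    for delta, k, alpha in itertools.product(deltas, ks, alphas):
        err = evaluate(delta, k, alpha)
        if err < best_err:
            best_params, best_err = (delta, k, alpha), err
    return best_params, best_err
```

In the embodiment described later, $K$ is searched over 5-120 in steps of 5 and $\alpha$ over 0-1 in steps of 0.1.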
The method further comprises step three: dynamically optimizing the parameter $\alpha$ based on the DDPG algorithm, specifically comprising:
the following definitions are made:
State $S_t$: comprises the observed external road network traffic state $V_t$ and the prediction model's own state $P_t$, i.e. $S_t = \{V_t, P_t\}$, where $V_t$ is the state unit observed at time $t$ and $P_t$ is the residual of the last prediction known to the prediction model at time $t$;
Action $a_t$: the value of parameter $\alpha$ selected in the decision. Unlike the coarse calibration of the KNN model parameters, $\alpha$ takes continuous values in $[0, 1]$ and is not discretized.
Immediate reward $r_t$: defined, with the help of the coarsely calibrated static KNN prediction model, as the average index improvement rate after executing action $a_t$. If the indices obtained when executing action $a_t$ are all smaller than those of the coarse-calibration model, taking action $a_t$ is called an effective optimization; otherwise it is called an ineffective optimization. If an index obtained when taking action $a_t$ is larger than that of the coarse-calibration model but by no more than 1% of its value, the action is called not completely effective. Accordingly, the reward function is defined as:
where $MAE_t$, $MAE'_t$ are the mean absolute errors obtained for state unit $X_t$ by the static coarse-calibration model and by the model with the selected action $a_t$, respectively, and likewise for the mean absolute percentage errors $MAPE_t$, $MAPE'_t$. When $a_t$ is effective, $r_t$ is positive; when $a_t$ is ineffective, $r_t$ is negative; when $a_t$ is not completely effective, $r_t$ is 0.
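A sketch of the reward rule, using the average relative improvement of MAE and MAPE over the coarse-calibration baseline as the "index improvement rate"; the patent's exact reward formula is given only as an image, so the gain expression below is an assumption that reproduces the stated sign behavior.

```python
def reward(mae, mape, mae_base, mape_base, tol=0.01):
    """r_t: positive when both indices beat the coarse-calibration baseline,
    0 when within the 1% tolerance ("not completely effective"), and the
    (typically negative) average improvement rate otherwise."""
    gain = 0.5 * ((mae_base - mae) / mae_base + (mape_base - mape) / mape_base)
    if mae < mae_base and mape < mape_base:
        return gain                            # effective optimization
    if mae <= mae_base * (1 + tol) and mape <= mape_base * (1 + tol):
        return 0.0                             # not completely effective
    return gain                                # ineffective optimization
```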
The DDPG algorithm training process comprises the following steps:
(1) Initialization parameters
DDPG adopts an Actor-Critic architecture: the Actor outputs actions, interacts with the environment, and learns the policy, while the Critic evaluates the actions and improves the policy; both are realized by neural networks.
The original Actor and Critic networks are regarded as estimation networks (online networks), and copies with the same structure are set up as target networks. The estimation networks update their parameter values at every interaction with the environment, while the target networks copy the estimation networks' parameter values at specified intervals.
First, initialize the estimation network parameters of the Actor and Critic, denoted $\theta^\mu$ and $\theta^Q$, where $\theta^\mu$ are the Actor (policy) estimation network parameters and $\theta^Q$ the Critic (Q) estimation network parameters, and copy them to the target networks, i.e. $\theta^Q \to \theta^{Q'}$ and $\theta^\mu \to \theta^{\mu'}$;
(2) Experience collection
First, randomly select a time point on the historical data's time axis at which to start prediction;
Denote the prediction time step as $i$, where $i$ is an arbitrary position index. During prediction over time, record the traffic state $V_i$ at the prediction time step; compute the residual of the last evaluable prediction as the prediction model state $P_i$; the parameter $\alpha$ used in the KNN prediction model is the action value $a_i$ decided by the agent at that time step; evaluate the prediction effect with the reward function to obtain $r_i$; observe the state $S_{i+1}$ at the next time step, i.e. at time $i+1$; store $(S_i, a_i, r_i, S_{i+1})$ as one record in the memory pool for later use;
Repeat the prediction process along the time axis until $T$ predictions are completed, which is recorded as one training round;
(3) Experience replay
Set a threshold on the number of sample records in the memory pool. When the number of records exceeds the threshold, randomly sample a batch from the pool to train the estimation networks, updating the estimation network parameters by gradient computation and back-propagation; writing $\nabla_\theta$ for the gradient with respect to the network parameters $\theta$, the update is:
At specified time-step intervals, the target network parameters are soft-updated, specifically:
(4) And finishing the training when the specified number of rounds is reached.
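Two building blocks of the training flow above can be sketched directly: the memory pool with its sampling threshold, and the soft update $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$ applied to the target networks. Representing networks as dicts of scalar parameters is purely illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Memory pool storing (S_i, a_i, r_i, S_{i+1}) records; training batches
    are drawn only once the record count exceeds the threshold."""
    def __init__(self, capacity=10000, threshold=64):
        self.buf = deque(maxlen=capacity)
        self.threshold = threshold
    def add(self, record):
        self.buf.append(record)
    def ready(self):
        return len(self.buf) > self.threshold
    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

def soft_update(target, online, tau=0.005):
    """Target-network soft update: theta' <- tau*theta + (1 - tau)*theta'."""
    for name in online:
        target[name] = tau * online[name] + (1.0 - tau) * target[name]
```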
The deep neural network design mode in the DDPG algorithm is as follows:
(1) The Actor network has two input interfaces, used respectively to input the road network traffic state $V_t$ and the prediction model state $P_t$. In addition to these two inputs, the Critic network must also input an action value, i.e. the output of the Actor network;
(2) The Actor network's output is the action value, i.e. the parameter $\alpha$ in the prediction model, with output dimension 1; considering the value range of $\alpha$, the output layer uses a sigmoid activation function. The Critic network's output is a Q value, with output dimension 1; since its value range is not clearly bounded, the output layer uses a linear activation function;
(3) The Critic network updates its parameters by minimizing a loss function with gradient descent; the loss function takes the form of a mean squared error, specifically
where $Q(s_i, a_i | \theta^Q)$ is the output of the online Q network and $q_i$ is a target value computed from the state transition:
$q_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} | \theta^{\mu'}) | \theta^{Q'})$  (14)
where $Q'$ is the target Q network and $\mu'$ the target policy network;
The update of the Actor network is based on computing the policy gradient, i.e.
where $\nabla$ denotes the gradient operator and $Q(s, a | \theta^Q)$ is the online Q function evaluated at network parameters $\theta^Q$ with $s = s_i$, $a = \mu(s_i)$.
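The two update targets above reduce to small numeric computations once the network outputs are in hand; the sketch below shows the target value of formula (14) and the mean-squared-error critic loss of formula (13), with the Q values supplied as plain arrays (actual gradient propagation through the networks is omitted).

```python
import numpy as np

def critic_targets(rewards, next_q, gamma=0.99):
    """q_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1} | theta_mu') | theta_Q'),
    formula (14); next_q holds the target networks' Q' values."""
    return np.asarray(rewards, dtype=float) + gamma * np.asarray(next_q, dtype=float)

def critic_loss(q_online, targets):
    """Mean squared error between online Q outputs Q(s_i, a_i | theta_Q)
    and the targets q_i, formula (13), minimized by gradient descent."""
    q_online = np.asarray(q_online, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((targets - q_online) ** 2))
```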
The method further comprises step four: verification by prediction experiment.
During training, the validity of the model being trained is monitored by observing the single-step prediction rewards or the cumulative reward per round; after training, the relative quality of models can be verified directly by computing and comparing the two evaluation indices.
Compared with the prior art, the invention has the advantages that:
(1) The KNN prediction model incorporates a vector-form short-time traffic state representation and a state distance metric fusing Euclidean and cosine distances, giving the model greater flexibility in handling rapid traffic state changes and both conventional and unconventional traffic evolution.
(2) Through dynamic optimization with the DDPG algorithm, the short-time traffic state prediction model changes from simple static prediction over time into prediction that adapts to the evolving traffic state, further improving prediction accuracy.
Drawings
Fig. 1 is a flow of model construction for the dynamic short-time network traffic state prediction model and the prediction method of the present invention.
FIG. 2 is a main model structure formed by Agent and Actor-Critic in DDPG algorithm.
FIG. 3 is a diagram of the change in the per-round cumulative reward during training of the DDPG algorithm of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
As shown in FIG. 1, the invention provides a dynamic short-time road network traffic state prediction model optimized based on the deep deterministic policy gradient algorithm, and a method for predicting the dynamic short-time road network traffic state using the model. The model mainly comprises a KNN-based static prediction model and a DDPG-based dynamic optimization part. The prediction method is implemented in four steps: data collection and processing; construction of the KNN-based static prediction model; construction and training of the DDPG-based dynamic optimization part; and verification by prediction experiments. Specifically:
the method comprises the following steps: data collection and processing
The invention uses floating car data. In this example the data originate from taxi on-board GPS devices collected in the Beijing Workers' Stadium area in July 2015, which upload real-time timestamp, vehicle location, and instantaneous speed information to a superior system at 2-minute intervals. For a given known road network, the collected data can characterize the traffic state of the road network at each moment. Taking the vehicle speed as an example, for a moment $t$, the average speed $v(t)$ of vehicles on a road section $l$ can be computed; for a road network formed by sections $l_1, l_2, \dots, l_n$ ($n$ denotes the number of road sections; $n = 257$ in this embodiment), the corresponding set of average speed values is $V_t = (v_1(t), v_2(t), \dots, v_n(t))$, and this set of speed values $V_t$ is called the road network traffic state at time $t$. Because short-time traffic state changes are continuous, the trend of the change is often hard to express with the observation at a single moment, so the observations at the $\delta$ moments up to time $t$ are aggregated into a space-time matrix to express the traffic state at time $t$, as shown in formula (1):
where $X_t$ is the space-time matrix aggregating the observed values of the $\delta$ moments up to time $t$, $v(t)$ is the average speed on road section $l$ at time $t$, and $n$ denotes the number of road sections.
To distinguish the above concepts, $V_t$ is called the traffic state point at time $t$, and $X_t$ the traffic state sequence at time $t$. For $X_t$, denote the future state point at time $t+1$ as $y_t$; then $(X_t, y_t)$ is a sample pair. Short-time traffic state prediction in the present invention uses $X_t$ to predict $y_t$.
In research, the space-time matrix form fully represents the road network traffic state at each time step, but has the following problems. On one hand, the state evolution over time is not prominent enough, and the multi-dimensional time-series arrangement may obscure information and cause misjudgment; for example, the Euclidean distances from the origin to (1,2,3,4) and to (4,3,2,1) in four-dimensional space are the same. On the other hand, by aggregating data from multiple time steps, the space-time matrix multiplies the overall data dimension, consuming more resources in computation and even inducing the curse of dimensionality, which degrades result accuracy.
Therefore, the present invention further processes $X_t$: instead, from a single reference state point $V_t$ and the trend vector that produced $V_t$, the traffic state is defined in vector form, concisely depicting the evolution of the traffic state and its result; this is called a traffic state unit, i.e.
Step two: KNN-based static prediction model construction
The static short-time prediction model based on KNN mainly selects K most similar samples called as neighbors by calculating the similarity between the samples to be predicted and known samples; and then reasonably speculating the label of the sample to be predicted by using the adjacent label, namely completing the prediction. The method specifically comprises the following construction steps:
(1) Defining a similarity metric function
For two samples, their similarity is often measured by computing their distance: the smaller the distance, the higher the similarity; conversely, the larger the distance, the lower the similarity. For the traffic state unit, formed from two kinds of data, the invention defines a method fusing two distance metrics: the Euclidean distance measures the distance $ED_i$ between reference state points, and the cosine distance measures the distance $CD_i$ between trend vectors, i.e.
In the formulas, the $i$-th known reference state point and the $i$-th known trend vector carry the subscript $h$, denoting known historical data in the collected samples. From these, a state distance $SD_i$ for measuring the similarity of state units is constructed:
where $i = 1, 2, \dots, M$, $M$ is the number of historical samples, and $\alpha$ is a coefficient balancing the Euclidean and cosine distances, with value range $[0, 1]$.
From equation (5) and the above definitions, the value of $\alpha$ determines how the state distance leans toward the Euclidean or the cosine distance. When $\alpha \to 0$, $SD_i \approx CD_i$: the evolution trend of the state becomes the decisive factor in the similarity measurement, which usually suits cases where the traffic evolution trend is highly distinctive, such as a short-time abrupt change or a nearly constant traffic state. When $\alpha \to 1$, $SD_i \approx ED_i$: whether state units are similar depends more on the reference state point of the final result, and the evolution process matters less; this can generally be understood as the case where the traffic evolution trend is more conventional.
(2) And selecting K neighbors according to the similarity measurement result.
Calculate the state distances between the sample to be predicted and all known historical samples, and take the $K$ historical samples $V_{h,1}, V_{h,2}, \dots, V_{h,K}$ with the smallest distances as neighbors.
(3) And calculating the predicted value of the sample to be predicted.
The invention computes the predicted value by incremental prediction and Gaussian-weights the neighbors' label values according to distance.
First, the increment is defined as the difference between a neighbor's $y_{h,j}$ and $V_{h,j}$; for the $j$-th neighbor (reordering subscripts, $j = 1, 2, \dots, K$):
$\Delta y_{h,j} = y_{h,j} - V_{h,j}$  (6)
i.e., the amount of traffic state change within the prediction window.
Second, the predicted value is computed as
where, in view of the magnitude of the state distance values and for simplicity, the weights can be set as
(4) And (5) roughly calibrating model parameters.
To realize the prediction function, the undetermined parameters in the prediction model, $\delta$, $K$, and $\alpha$, are calibrated with a gridded search experiment on real data. In this example the search interval for $K$ is 5-120 with spacing 5, and for $\alpha$ it is 0-1 with spacing 0.1.
First, an evaluation system for the prediction effect is established, including the mean absolute error (MAE) and the mean absolute percentage error (MAPE), i.e.
where $N$ is the number of samples to be predicted in the experiment, and the predicted and true values of the $i$-th sample to be predicted are compared.
Then the value ranges of the parameters to be calibrated are discretized, experiments are run with the different parameter combinations one by one, and the results are recorded.
Finally, the parameter combination with the best experimental result is selected as the calibrated parameter values. The calibration values at different prediction steps in this example are as follows:
step three: and constructing and training a dynamic optimization part based on the DDPG algorithm.
After step two, the KNN-based prediction model can already complete the expected prediction task. In terms of the parameter calibration method, however, the model can only perform static prediction, and a static model ignores the influence of objective changes in short-time traffic flow: the parameters $K$ and $\alpha$, which are closely tied to the time-varying traffic state, no longer change accordingly during prediction. Therefore, the invention further constructs and trains a dynamic optimization part based on the DDPG algorithm. The method protected by the invention is not limited to the approach adopted in this embodiment; a person skilled in the art may use other feasible methods to construct and train the dynamic optimization part, but the approach adopted in this embodiment is currently a reasonable optimization method, and is described specifically below.
Calibration experiments show that the Gaussian weighting method effectively suppresses the sensitivity of the result to the value of $K$: the prediction error almost always decreases gradually as $K$ increases. In other words, to improve accuracy by dynamically adjusting the prediction model's parameters, $K$ merely needs to be given a sufficiently large value. For the parameter $\alpha$, however, static calibration runs contrary to its original purpose: $\alpha$ is meant to let the model accommodate both regular and irregular changes in short-time traffic flow, and an ordinary calibration method lacks the required flexibility and adaptability.
Therefore, the invention proposes dynamically optimizing the parameter $\alpha$ with a deep reinforcement learning algorithm. Reinforcement learning is a class of machine learning algorithms that takes the Markov Decision Process (MDP) as its basic modeling idea and mainly involves the elements state $S$, action $a$, and reward $r$. An agent constructed by a reinforcement learning algorithm completes a sequential decision process through a series of interactions with the environment; through continuous self-learning, the agent learns to take the reward-maximizing action for each state it faces in the environment. The Deep Deterministic Policy Gradient (DDPG) algorithm is a reinforcement learning algorithm combining deep learning with the Actor-Critic architecture. It can handle continuous state spaces and continuous action spaces and is well suited to practical problems requiring high-dimensional continuous action outputs, improving the model's dynamic adaptability to complex environments and enabling dynamic continuous decision-making. The method specifically comprises the following steps:
1. Define the state $S$, action $a$, and reward $r$.
First, the problem definition is made explicit: the dynamic optimization process is modeled as a Markov decision process. During the model's rolling prediction, the choice of the parameter $\alpha$ in each KNN-based prediction is regarded as a Markov decision: the decided value does not depend on decisions in past predictions, but only on the currently observed road network traffic state and the prediction model's prediction effect, satisfying the Markov property.
From the modeling described above, the following definitions can be made:
State S_t: comprises the observed external road network traffic state V_t and the prediction model's own state P_t, i.e. S_t = {V_t, P_t}, where V_t is the state unit observed at time t and P_t is the residual of the most recent prediction known to the model at time t.
Action a_t: the value of the parameter α selected in the decision. Unlike the coarse calibration of the KNN model parameters, α takes continuous values in [0, 1] and is not discretized.
(Immediate) reward r_t: the average index improvement rate obtained by executing action a_t, relative to the coarsely calibrated static KNN prediction model, is used as the reward function. Before giving the function, a rule for qualitatively evaluating the prediction effect is defined: if the indices obtained by taking action a_t are all smaller than those of the coarsely calibrated model, a_t is called an effective optimization; otherwise, a_t is called an ineffective optimization. To speed up convergence, the invention tolerates small degradations: if an index obtained by taking action a_t exceeds the corresponding index of the coarsely calibrated model by no more than 1% of its value, a_t is called not completely effective. Accordingly, the reward function is defined as:
where MAE_t and MAE′_t are the mean absolute errors obtained for state unit X_t by the static coarsely calibrated model and by the model taking action a_t, respectively, and MAPE_t and MAPE′_t are the corresponding mean absolute percentage errors. When a_t is effective, r_t is positive; when a_t is ineffective, r_t is negative; when a_t is not completely effective, r_t is 0. The agent is therefore rewarded in proportion to the improvement that action a_t delivers over the static model.
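A minimal sketch of such a reward, assuming the "average index improvement rate" is the mean of the relative MAE and MAPE improvements (the patent's formula image is not reproduced in this text, so this exact formulation is an assumption):

```python
def reward(mae_static, mape_static, mae_action, mape_action, tol=0.01):
    """Average relative improvement of the action model over the static
    coarse-calibrated model; a degradation within `tol` (1%) on both
    indices is treated as 'not completely effective' and earns zero."""
    r = 0.5 * ((mae_static - mae_action) / mae_static
               + (mape_static - mape_action) / mape_static)
    within_tol = (mae_action <= mae_static * (1 + tol)
                  and mape_action <= mape_static * (1 + tol))
    if r < 0 and within_tol:
        return 0.0  # not completely effective
    return r        # positive if effective, negative if ineffective
```

This reproduces the sign behavior described above: positive when both indices improve, zero inside the 1% tolerance band, negative otherwise.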
2. Design the DDPG algorithm training flow.
In combination with the application scenario of short-term traffic state prediction, and referring to the classic DDPG algorithm, as shown in FIG. 2, the invention proposes the following training process:
(1) Initialize parameters.
The agent in DDPG adopts the Actor-Critic architecture, which integrates value-based and policy-based methods. The Actor outputs actions, interacts with the environment, and learns the policy; the Critic evaluates the actions and improves the policy. Both are implemented as neural networks. To improve the stability of the algorithm, the original Actor and Critic networks are treated as estimation networks (online networks), and a copy with the same structure, called the target network, is created for each. The estimation networks update their parameter values at every interaction with the environment, while the target networks copy the estimation network parameters at specified intervals.
Therefore, the estimation network parameters of the Critic and the Actor are initialized as θ^Q and θ^μ, respectively, and then copied to the target networks, i.e. θ^Q → θ^Q′ and θ^μ → θ^μ′.
(2) Collect experiences.
First, using the historical data and its time axis, a time point is randomly selected at which to start prediction.
Let the current prediction time step be i. As time advances, the traffic state V_i of that time step is recorded during prediction; the residual of the most recently evaluable prediction is computed as the prediction model state P_i; the parameter α used in the KNN prediction model is the action value a_i decided by the agent at that time step; the prediction effect is evaluated with the reward function to obtain r_i; and the state S_{i+1} is observed at the next time step, i.e. at time i+1. The tuple (S_i, a_i, r_i, S_{i+1}) is stored as one record in the memory pool for later use.
The prediction process is repeated continuously along the time axis until T predictions are completed, which is recorded as one training round. In this example, T = 100.
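Steps (2) and (3) rely on a memory pool of stored transitions; a minimal sketch (the capacity and API names are assumptions for illustration):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity pool of (S_i, a_i, r_i, S_{i+1}) records."""
    def __init__(self, capacity=10_000):
        self.pool = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random sampling breaks temporal correlation in the batch.
        return random.sample(self.pool, batch_size)

    def __len__(self):
        return len(self.pool)

memory = ReplayMemory()
for i in range(100):                       # one training round, T = 100
    memory.store(f"S{i}", 0.5, 0.0, f"S{i + 1}")
```

After one round the pool holds 100 records, from which random mini-batches can later be drawn for experience replay.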
(3) Perform experience replay.
As training progresses, more and more experience accumulates in the memory pool. A threshold δ is given; once the number of sample records in the pool exceeds δ, a batch is randomly sampled from the pool to train the estimation networks, whose parameters are updated by gradient computation and backpropagation. Denoting the gradient with respect to a network parameter θ by ∇_θ, the update formula is:
when the time step is appointed at the interval, the target network parameters are updated, the updating method is soft updating, and the method specifically comprises the following steps:
(4) Training ends when the specified number of rounds M is reached. In this example, M = 4000.
3. Deep neural network design in the DDPG algorithm.
The estimation networks of the Actor and the Critic, together with their target networks, form the set of deep neural networks that carry the policy learned by the DDPG algorithm. Since each target network is a copy of its estimation network, with an identical structure, only the estimation networks of the Actor and the Critic need to be designed, as follows. Although Automated Machine Learning (AutoML), and in particular Neural Architecture Search (NAS), has recently received extensive attention and offers the possibility of automatically designing deep neural networks, certain manual rules and constraints should still be imposed on the specific functions and structures of the networks. Considering the purpose and usage scenario of the Actor and Critic networks (their inputs and outputs, error functions, and so on), the invention imposes the following design requirements:
(1) The Actor network has two input interfaces, for the road network traffic state V_t and the prediction model state P_t. In addition to these two inputs, the Critic network also takes an action value as input, namely the output value of the Actor network.
(2) The output of the Actor network is an action value, namely the parameter α of the prediction model, so its output dimension is 1; given the value range of α, the output layer uses a sigmoid activation function. The output of the Critic network is a Q value, also of dimension 1; since its range is not bounded, the output layer uses a linear activation function.
(3) The Critic network updates its parameters by minimizing a loss function, i.e. by gradient descent. The loss function takes the form of a mean squared error, specifically

L(θ^Q) = (1/N_b) Σ_i (q_i − Q(s_i, a_i | θ^Q))²

where N_b is the number of records in the sampled batch.
where Q(s_i, a_i | θ^Q) is the output value of the Critic estimation network and q_i is the target value, computed from the state transition as
q_i = r_i + γQ′(s_{i+1}, μ′(s_{i+1} | θ^μ′) | θ^Q′)    (14)
The Actor network is updated based on the computation of the policy gradient, i.e.

∇_θ^μ J ≈ (1/N_b) Σ_i ∇_a Q(s, a | θ^Q) |_{s=s_i, a=μ(s_i)} ∇_θ^μ μ(s | θ^μ) |_{s=s_i}
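The target value of formula (14) and the mean squared loss can be written compactly; this sketch uses NumPy arrays in place of the network outputs (the networks themselves are abstracted away, which is an illustrative simplification):

```python
import numpy as np

def td_targets(rewards, q_next, gamma=0.99):
    """q_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})), formula (14),
    where q_next stands for the target-network Q values."""
    return np.asarray(rewards, dtype=float) + gamma * np.asarray(q_next, dtype=float)

def critic_loss(q_pred, q_target):
    """Mean squared TD error minimised by the Critic's gradient descent."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_target = np.asarray(q_target, dtype=float)
    return float(np.mean((q_target - q_pred) ** 2))

targets = td_targets(rewards=[1.0, 0.0], q_next=[2.0, 4.0], gamma=0.5)
loss = critic_loss(q_pred=[1.0, 1.0], q_target=[1.0, 3.0])
```

Note that the targets are built from the target networks while the loss is taken against the estimation network's predictions, which is exactly why the slowly updated target copies stabilize training.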
Step four: Verification by prediction experiments.
It is crucial to verify that the training and the resulting model are effective. The mean absolute error MAE and the mean absolute percentage error MAPE form a relatively comprehensive evaluation index system, and the reward function is defined as the average percentage improvement of the DDPG-optimized model over the KNN prediction model on these two indices. Thus, during training, the effectiveness of the model can be monitored by observing single-prediction rewards or cumulative rewards per round; after training, the relative quality of the model can be verified directly by computing and comparing the two evaluation indices.
This example is illustrated with the monitored cumulative reward per round, as shown in FIG. 3. The black trend line shows that at the start of training the agent obtains low, even negative, rewards; after several rounds the rewards become positive. Although occasional rounds with negative values still appear, as training progresses the reward values become consistently positive, with the trend line settling at a value of about 100.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiment according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.
Claims (6)
1. A dynamic short-time road network traffic state prediction method based on deep deterministic policy gradient algorithm optimization, characterized by comprising the following steps:
Step one: data collection and processing
(1) Collect the time, position and instantaneous vehicle speed information uploaded to the supervising system by vehicle-mounted GPS devices within a specified time interval;
(2) Calculate the average vehicle speed v(t) on a road section l at time t, where the average values at and before time t are known;
(3) By the above method, obtain the set of average speed values V_t = (v_1(t), v_2(t), …, v_n(t)) corresponding to the road section network formed by sections l_1, l_2, …, l_n, where n is the number of road sections;
(4) Aggregate the data collected at time t and at the δ−1 preceding times into a space-time matrix X_t, which represents the traffic situation at time t, as shown in formula (1):

X_t = (V_{t−δ+1}, V_{t−δ+2}, …, V_t)    (1)
(5) Process X_t to compute, for each single reference state point, a trend vector used to generate the reference point, and define a traffic state unit X′_t in vector form, i.e.
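The aggregation in steps (4) and (5) can be sketched as follows; the trend vector here is taken as the change across the window, which is an assumption for illustration since the patent's formula image is not reproduced in this text:

```python
import numpy as np

def state_unit(history, t, delta):
    """Stack the last `delta` network snapshots into the space-time
    matrix X_t (n road sections x delta time steps) and derive a
    trend vector as the change across the window."""
    X_t = np.stack(history[t - delta + 1 : t + 1], axis=1)
    trend = X_t[:, -1] - X_t[:, 0]
    return X_t, trend

# Hypothetical average speeds for n = 2 road sections at three times.
history = [np.array([30.0, 42.0]), np.array([28.0, 42.0]), np.array([25.0, 44.0])]
X_t, trend = state_unit(history, t=2, delta=3)
```

With δ = 3 this yields a 2 × 3 matrix per state unit plus a length-2 trend vector, one entry per road section.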
Step two: construction of KNN-based static prediction model
(1) Measure the distance ED_i to a reference state point V_t with the Euclidean distance, and the distance CD_i to the trend vector with the cosine distance, with the following expressions:
where the i-th known reference state point and the i-th known trend vector are taken from the historical data in the collected samples, denoted by the subscript h in the formulas; on this basis, construct a state distance SD_i for measuring the similarity of state units:
Wherein u =1,2, …, M and M are historical sample numbers, α is a coefficient for balancing euclidean distance and cosine distance, and the value range is [0,1];
(2) Selecting K neighbors according to similarity measurement results
Calculate the state distances between the sample X_{t+1} to be predicted and all known historical samples, and take the K historical samples V_{h,1}, V_{h,2}, …, V_{h,K} with the smallest distances as neighbors;
(3) Calculating a prediction value of a sample to be predicted
Compute the predicted value by the method of increment prediction, Gaussian-weighting the label values of the neighbors according to their state distances SD_i.
For X_t, denote its future state point X_{t+1} at time (t+1) by y_t;
the increment is the difference between y_{h,j} and V_{h,j} of a neighbor; for the j-th neighbor (j = 1, 2, …, K) it is expressed as:

Δy_{h,j} = y_{h,j} − V_{h,j}    (6)

i.e. the traffic state change within the prediction window;
second, compute the predicted value at the future time (t+1) by Gaussian weighting as
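Steps (2) and (3) can be sketched end to end; the Gaussian kernel bandwidth `sigma` is an assumption, since the weighting formula's image is not reproduced in this text:

```python
import numpy as np

def knn_increment_predict(state_distances, increments, current_state, sigma=1.0):
    """Gaussian-weight each neighbour's observed change (formula (6))
    by its state distance and add the weighted change to the current
    network state to obtain the prediction for time t+1."""
    sd = np.asarray(state_distances, dtype=float)
    w = np.exp(-sd ** 2 / (2 * sigma ** 2))
    w = w / w.sum()
    weighted_change = (w[:, None] * np.asarray(increments, dtype=float)).sum(axis=0)
    return np.asarray(current_state, dtype=float) + weighted_change

# Two equally distant neighbours whose states rose by 2 and 4 km/h per section.
pred = knn_increment_predict([1.0, 1.0], [[2.0, 2.0], [4.0, 4.0]], [10.0, 10.0])
```

Predicting the increment rather than the absolute state lets the model track the current level of the network while borrowing only the direction and magnitude of change from similar historical situations.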
2. The dynamic short-time road network traffic state prediction method based on deep deterministic policy gradient algorithm optimization according to claim 1, wherein
the second step is as follows: in constructing the static prediction model based on the KNN, the method further comprises the following steps:
(4) Coarse calibration of model parameters
Calibrate the undetermined parameters δ, K and α in the KNN-based static prediction model by a grid search experiment on the collected real data, as follows:
first, establish an evaluation system for the prediction effect, comprising the mean absolute error MAE and the mean absolute percentage error MAPE:

MAE = (1/(nN)) Σ_{i=1}^{N} Σ_{l=1}^{n} |ŷ_{i,l} − y_{i,l}|,  MAPE = (1/(nN)) Σ_{i=1}^{N} Σ_{l=1}^{n} |ŷ_{i,l} − y_{i,l}| / y_{i,l} × 100%
where n is the number of road sections, N is the number of samples to be predicted in the experiment, and ŷ_{i,l} and y_{i,l} are the predicted value and the true value of the i-th sample to be predicted on road section l, respectively;
second, discretize the value ranges of the parameters to be calibrated, run experiments with each parameter combination in turn, and record the experimental results;
finally, select the parameter combination with the best experimental result as the calibrated values of the model parameters.
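The coarse calibration above is a plain exhaustive grid search; a minimal sketch, where the scoring function is a hypothetical stand-in for running the prediction experiment and computing MAE/MAPE:

```python
import itertools

def grid_search(score, grids):
    """Try every combination of the discretised parameter ranges and
    return the combination with the lowest error score."""
    names = list(grids)
    combos = (dict(zip(names, values))
              for values in itertools.product(*grids.values()))
    return min(combos, key=score)

# Hypothetical error surface standing in for the real prediction experiment.
best = grid_search(
    score=lambda p: abs(p["K"] - 10) + abs(p["alpha"] - 0.8),
    grids={"K": [5, 10, 15], "alpha": [0.2, 0.5, 0.8]},
)
```

The cost grows multiplicatively with each added parameter range, which is exactly why the patent reserves this exhaustive procedure for a one-off coarse calibration and optimizes α dynamically thereafter.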
3. The dynamic short-time road network traffic state prediction method based on deep deterministic policy gradient algorithm optimization according to claim 2, wherein
the method also comprises the following third step: the parameter alpha is dynamically optimized based on the DDPG algorithm. The method specifically comprises the following steps:
the following definitions are made:
state S_t: comprises the observed external road network traffic state V_t and the prediction model's own state P_t, i.e. S_t = {V_t, P_t}, where V_t is the state unit observed at time t and P_t is the residual of the most recent prediction known to the model at time t;
action a_t: the value of the parameter α selected in the decision; unlike the coarse calibration of the KNN model parameters, α takes continuous values in [0, 1] and is not discretized;
immediate reward r_t: the average index improvement rate obtained by executing action a_t relative to the coarsely calibrated static KNN prediction model is used as the reward function; if the indices obtained by taking action a_t are all smaller than those of the coarsely calibrated model, a_t is called an effective optimization; otherwise a_t is called an ineffective optimization; if an index obtained by taking action a_t exceeds the corresponding index of the coarsely calibrated model by no more than 1% of its value, a_t is called not completely effective; accordingly, the reward function r_t is defined as:
where MAE_t and MAE′_t are the mean absolute errors obtained for state unit X_t by the static coarsely calibrated model and by the model taking action a_t, respectively; similarly, MAPE_t and MAPE′_t are the corresponding mean absolute percentage errors. When a_t is effective, r_t is positive; when a_t is ineffective, r_t is negative; when a_t is not completely effective, r_t is 0.
4. The dynamic short-time road network traffic state prediction method based on deep deterministic policy gradient algorithm optimization according to claim 3, wherein
the DDPG algorithm training process comprises the following steps:
(1) Initialization parameters
DDPG adopts the Actor-Critic architecture: the Actor outputs actions, interacts with the environment and learns the policy, while the Critic evaluates the actions and improves the policy; both are implemented as neural networks;
the original Actor and Critic networks are treated as estimation networks (online networks), and a copy with the same structure, called the target network, is created for each; the estimation networks update their parameter values at every interaction with the environment, and the target networks copy the estimation network parameters at specified intervals;
first, initialize the estimation network parameters of the Critic and the Actor, denoted θ^Q and θ^μ respectively, where θ^Q is the Critic estimation network parameter and θ^μ is the Actor estimation network parameter, and copy them to the target networks, i.e. θ^Q → θ^Q′ and θ^μ → θ^μ′;
(2) Experience collection
Firstly, randomly selecting a time point to start prediction by using historical data and a time axis thereof;
denote the prediction time step by i, where i is an arbitrary sequence number; as time advances, record the traffic state V_i of that time step during prediction; compute the residual of the most recently evaluable prediction as the prediction model state P_i; the parameter α used in the KNN prediction model is the action value a_i decided by the agent at that time step; evaluate the prediction effect with the reward function to obtain r_i; observe the state S_{i+1} at the next time step, i.e. at time i+1; store (S_i, a_i, r_i, S_{i+1}) as one record in the memory pool for later use;
repeat the prediction process continuously along the time axis until T predictions are completed, recorded as one training round;
(3) Experience replay
given a threshold on the number of sample records in the memory pool, once the number of records exceeds the threshold, randomly sample a batch from the pool to train the estimation networks, updating the estimation network parameters by gradient computation and backpropagation; denoting the gradient with respect to a network parameter θ by ∇_θ, the update formula is:
at specified time-step intervals, the target network parameters are soft-updated, specifically:
(4) Training ends when the specified number of rounds is reached.
5. The dynamic short-time road network traffic state prediction method based on deep deterministic policy gradient algorithm optimization according to claim 4, wherein
the deep neural network design mode in the DDPG algorithm is as follows:
(1) The Actor network has two input interfaces, for the road network traffic state V_t and the prediction model state P_t; in addition to these two inputs, the Critic network also takes an action value as input, namely the output value of the Actor network;
(2) The output of the Actor network is an action value, namely the parameter α of the prediction model, with output dimension 1; given the value range of α, the output layer uses a sigmoid activation function; the output of the Critic network is a Q value, also of dimension 1, and since its range is not bounded, the output layer uses a linear activation function;
(3) The Critic network updates its parameters by minimizing a loss function, i.e. by gradient descent; the loss function takes the form of a mean squared error, specifically
where Q(s_i, a_i | θ^Q) is the output value of the Critic estimation network and q_i is the target value, computed from the state transition as
q_i = r_i + γQ′(s_{i+1}, μ′(s_{i+1} | θ^μ′) | θ^Q′)    (14)
where Q′ is the target Q network and μ′ is the target policy network;
the Actor network is updated based on the computation of the policy gradient, i.e.
6. The dynamic short-time road network traffic state prediction method based on deep deterministic policy gradient algorithm optimization according to claim 5, further comprising:
step four: verification of prediction experiment
monitoring the effectiveness of the model during training by observing single-prediction rewards or cumulative rewards per round; and after training, directly verifying the relative quality of the model by computing and comparing the two evaluation indices.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111115375.9A CN115938104B (en) | 2021-09-23 | Dynamic short-time road network traffic state prediction model and prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115938104A true CN115938104A (en) | 2023-04-07 |
CN115938104B CN115938104B (en) | 2024-06-28 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060223529A1 (en) * | 2005-03-31 | 2006-10-05 | Takayoshi Yokota | Data processing apparatus for probe traffic information and data processing system and method for probe traffic information |
JP2010092247A (en) * | 2008-10-07 | 2010-04-22 | Internatl Business Mach Corp <Ibm> | Controller, control method and control program |
CN109190797A (en) * | 2018-08-03 | 2019-01-11 | 北京航空航天大学 | A kind of large-scale road network state Forecasting Approach for Short-term based on improvement k arest neighbors |
CA3060900A1 (en) * | 2018-11-05 | 2020-05-05 | Royal Bank Of Canada | System and method for deep reinforcement learning |
CN112907971A (en) * | 2021-02-04 | 2021-06-04 | 南通大学 | Urban road network short-term traffic flow prediction method based on genetic algorithm optimization space-time residual error model |
CN113313947A (en) * | 2021-05-31 | 2021-08-27 | 湖南大学 | Road condition evaluation method of short-term traffic prediction graph convolution network |
Non-Patent Citations (1)
Title |
---|
LIU Yishi; GUAN Xuefeng; WU Huayi; CAO Jun; ZHANG Na: "Short-term traffic speed prediction with LSTM weighted by spatio-temporal correlation", Geomatics World, no. 01, 25 February 2020 (2020-02-25), pages 49-55 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116362418A (en) * | 2023-05-29 | 2023-06-30 | 天能电池集团股份有限公司 | Online prediction method for application-level manufacturing capacity of intelligent factory of high-end battery |
CN116362418B (en) * | 2023-05-29 | 2023-08-22 | 天能电池集团股份有限公司 | Online prediction method for application-level manufacturing capacity of intelligent factory of high-end battery |
CN117974366A (en) * | 2024-04-01 | 2024-05-03 | 深圳市普裕时代新能源科技有限公司 | Energy management system based on industrial and commercial energy storage |
CN117974366B (en) * | 2024-04-01 | 2024-06-11 | 深圳市普裕时代新能源科技有限公司 | Energy management system based on industrial and commercial energy storage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |