CN115938104B - Dynamic short-time road network traffic state prediction model and prediction method - Google Patents

Info

Publication number
CN115938104B
Authority
CN
China
Prior art keywords
time
network
prediction
state
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111115375.9A
Other languages
Chinese (zh)
Other versions
CN115938104A (en)
Inventor
任毅龙
姜涵
于海洋
晁文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202111115375.9A
Publication of CN115938104A
Application granted
Publication of CN115938104B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a dynamic short-time road network traffic state prediction method optimized by the deep deterministic policy gradient (DDPG) algorithm. The method comprises collecting and organizing the data that vehicle-mounted devices upload to an upper-level system, constructing a static prediction model based on K-nearest neighbors (KNN), constructing and training a dynamic optimization part based on the DDPG algorithm, and dynamically optimizing a model parameter through this deep reinforcement learning algorithm. The KNN prediction model represents the short-time traffic state in vector form, which gives the model greater flexibility in handling abrupt traffic state changes as well as conventional and unconventional traffic evolution scenarios. Through the dynamic optimization by the DDPG algorithm, the short-time traffic state prediction model changes from a simple static prediction that merely advances with time into a prediction that learns the evolution of the traffic state and adjusts itself dynamically. This addresses the problem that static and semi-static models in the prior art can only fit historical data and patterns and cannot adapt quickly to sudden and random changes in the real-time traffic state, and it further improves prediction accuracy.

Description

Dynamic short-time road network traffic state prediction model and prediction method
Technical Field
The invention belongs to the field of traffic big data technology and its applications. It relates to short-time road network traffic state prediction, and in particular to a dynamic short-time road network traffic state prediction model optimized by the deep deterministic policy gradient (DDPG) algorithm.
Background
Short-time road network traffic state prediction plays an important practical role both in intelligent transportation systems and in future-oriented intelligent vehicle-infrastructure cooperative systems. It is a prerequisite for other real-time traffic services such as en-route route guidance and route decision-making. The quality of many basic traffic services therefore depends on the accuracy of short-time road network traffic state prediction.
In recent years, with the diversification of traffic detectors and the improvement of data storage devices, traffic data acquisition technology and the related application research have advanced greatly. Correspondingly, short-time road network traffic state prediction algorithms driven by traffic big data have emerged in large numbers. They mainly comprise shallow learning (traditional machine learning) models, represented by K-nearest neighbors, support vector machines and decision trees, and deep learning models, represented by long short-term memory (LSTM) networks, convolutional neural networks and combinations of the two.
However, both types of model are usually trained and built on robust historical data, which fixes the main architecture and hyperparameters of the model, and during application they are either not tuned again or tuned only at certain intervals. Such static and semi-static models can fit historical data and patterns well, but they cannot adapt quickly to sudden and random changes in the real-time traffic state or adjust themselves in time.
Disclosure of Invention
To solve the above technical problems, the invention provides a dynamic short-time road network traffic state prediction model and a prediction method.
The complete technical scheme of the invention is as follows:
A dynamic short-time road network traffic state prediction method optimized by the deep deterministic policy gradient (DDPG) algorithm comprises the following steps:
Step one: data collection and processing
(1) Collect the time, position and instantaneous vehicle speed information that vehicle-mounted GPS devices upload to the upper-level system at a specified time interval.
(2) Calculate the average vehicle speed $v(t)$ on a road section $l$ at time $t$; the averages at time $t$ and at all earlier times are known values.
(3) Obtain the corresponding set of average speeds $V_t = (v_1(t), v_2(t), \ldots, v_n(t))$ on the road network formed by road sections $l_1, l_2, \ldots, l_n$, where $n$ is the number of road sections.
(4) Aggregate the observations at time $t$ and at the $\delta - 1$ time steps before it into a space-time matrix $X_t$ that represents the traffic state at time $t$, as shown in formula (1):
$$X_t = [V_{t-\delta+1}, \ldots, V_{t-1}, V_t] \qquad (1)$$
(5) Process $X_t$: calculate the trend vector $\vec{V}_t$ of each single reference state point and define the traffic state unit $X'_t$ in vector form, i.e. $X'_t = (V_t, \vec{V}_t)$, where the trend vector $\vec{V}_t$ is calculated from $X_t$.
A reference state point here is the value of the space-time matrix $X_t$ at one particular time step; these values are known.
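As an illustration of sub-step (4), the following minimal sketch stacks the $\delta$ most recent network states into $X_t$. The function name `space_time_matrix`, the row-per-time-step layout and the toy speed values are assumptions made for this example; the text does not prescribe a storage layout.

```python
from typing import List, Sequence


def space_time_matrix(history: Sequence[Sequence[float]], t: int, delta: int) -> List[List[float]]:
    """Return the rows V_{t-delta+1}, ..., V_t of `history` as the matrix X_t.

    Assumption for this sketch: history[k] holds the network state V_k,
    one average speed per road section, and X_t keeps one row per time step.
    """
    start = t - delta + 1
    if delta < 1 or start < 0 or t >= len(history):
        raise ValueError("window does not fit inside the available history")
    return [list(history[step]) for step in range(start, t + 1)]


# Made-up average speeds (km/h) on n = 3 road sections at 5 consecutive time steps.
history = [
    [30.0, 42.0, 25.0],
    [31.0, 40.0, 24.0],
    [29.0, 38.0, 26.0],
    [28.0, 35.0, 27.0],
    [27.0, 33.0, 29.0],
]

X_t = space_time_matrix(history, t=4, delta=3)  # delta rows, n columns
V_t = X_t[-1]                                   # reference state point at time t
```

The last row of the matrix is the reference state point $V_t$; the earlier rows are what the trend vector would be computed from.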
Step two: construction of a KNN-based static prediction model
(1) Measure the distance $ED_i$ between reference state points $V_t$ using the Euclidean distance and the distance $CD_i$ between trend vectors $\vec{V}_t$ using the cosine distance:
$$ED_i = \lVert V_t - V_{h,i} \rVert_2, \qquad CD_i = 1 - \frac{\vec{V}_t \cdot \vec{V}_{h,i}}{\lVert \vec{V}_t \rVert \, \lVert \vec{V}_{h,i} \rVert}$$
where $V_{h,i}$ is the $i$-th known reference state point and $\vec{V}_{h,i}$ is the $i$-th known trend vector; the subscript $h$ marks known historical data in the collected samples. A state distance $SD_i$ is then constructed from $ED_i$ and $CD_i$ (formula (5)) to measure the similarity of state units, where $u = 1, 2, \ldots, M$, $M$ is the number of historical samples, and $\alpha$ is the coefficient that balances the Euclidean distance and the cosine distance, with value range [0, 1].
(2) Select K neighbors according to the similarity measurement results
Calculate the state distance between the sample to be predicted $X_{t+1}$ and every known historical sample, and take the K historical samples $V_{h,1}, V_{h,2}, \ldots, V_{h,K}$ with the smallest distances as neighbors.
(3) Calculate the predicted value of the sample to be predicted
The predicted value is calculated by incremental prediction: the label values of the neighbors are Gaussian-weighted according to their state distances $SD_i$.
For $X_t$, the future state point at time $t+1$ is denoted $y_t$.
First, the increment is the difference between $y_{h,j}$ and $V_{h,j}$ of a neighbor; for the $j$-th nearest neighbor ($j = 1, 2, \ldots, K$) it is
$$\Delta y_{h,j} = y_{h,j} - V_{h,j} \qquad (6)$$
i.e. the change in traffic state within the prediction window.
Second, the predicted value is calculated by Gaussian weighting of these increments, with weights determined by the state distances.
Step two, constructing the KNN-based static prediction model, further comprises:
(4) Coarse calibration of the model parameters
The undetermined parameters $\delta$, $K$ and $\alpha$ of the KNN-based static prediction model are calibrated by a grid search experiment on the collected real data, as follows:
First, establish an evaluation system for the prediction effect, comprising the mean absolute error (MAE) and the mean absolute percentage error (MAPE), where $n$ is the number of road sections, $N$ is the number of samples to be predicted in the experiment, and both indexes compare the predicted value with the true value of each $i$-th sample to be predicted.
Second, discretize the value range of each parameter to be calibrated, run the experiment for every parameter combination in turn, and record the results.
Finally, select the parameter combination with the best experimental result as the calibrated values of the model parameters.
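The coarse calibration can be sketched as an exhaustive grid search. The grids below use the ranges given in the embodiment (K from 5 to 120 in steps of 5, $\alpha$ from 0 to 1 in steps of 0.1); the helper names `mae`, `mape` and `grid_search`, the per-value averaging in the two error functions, and the toy error surface are assumptions for illustration only. Searching over $\delta$ is omitted for brevity.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error over paired values."""
    return sum(abs(p - y) for p, y in zip(predicted, actual)) / len(actual)


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * sum(abs(p - y) / abs(y) for p, y in zip(predicted, actual)) / len(actual)


def grid_search(
    evaluate: Callable[[int, float], float],
    k_values: Sequence[int],
    alpha_values: Sequence[float],
) -> Tuple[float, int, float]:
    """Try every (K, alpha) pair and return (best_error, best_K, best_alpha)."""
    best = None
    for k, alpha in product(k_values, alpha_values):
        error = evaluate(k, alpha)
        if best is None or error < best[0]:
            best = (error, k, alpha)
    return best


k_grid = list(range(5, 125, 5))            # 5, 10, ..., 120
alpha_grid = [i / 10 for i in range(11)]   # 0.0, 0.1, ..., 1.0


def toy_error(k: int, alpha: float) -> float:
    # Hypothetical stand-in for "run the KNN model and compute its MAE".
    return abs(k - 60) / 60 + abs(alpha - 0.3)


best_error, best_k, best_alpha = grid_search(toy_error, k_grid, alpha_grid)
```

In a real experiment `evaluate` would run the static KNN model on held-out samples and return one of the two evaluation indexes.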
The method further comprises step three: dynamically optimizing the parameter $\alpha$ based on the DDPG algorithm, specifically as follows.
The following definitions are made:
State $S_t$: comprises the observed external road network traffic state $V_t$ and the prediction model's own state $P_t$, i.e. $S_t = \{V_t, P_t\}$, where $V_t$ is the state unit observed at time $t$ and $P_t$ is the residual of the most recent prediction whose outcome is known at time $t$.
Action $a_t$: the value of the parameter $\alpha$ selected in this decision. Unlike in the coarse calibration of the KNN model parameters, $\alpha$ here is a continuous value in [0, 1] and is not discretized.
Immediate reward $r_t$: with the coarsely calibrated static KNN prediction model as the baseline, the average improvement rate of the evaluation indexes after executing action $a_t$ is defined as the reward function. When the indexes obtained by executing action $a_t$ are smaller than those of the coarsely calibrated model, $a_t$ is called an effective optimization; otherwise $a_t$ is called an ineffective optimization. When the indexes obtained by $a_t$ are greater than those of the coarsely calibrated model but exceed them by no more than 1%, $a_t$ is called not fully effective. The reward function is defined accordingly in terms of $MAE_t$, $MAE'_t$, $MAPE_t$ and $MAPE'_t$,
where $MAE_t$ and $MAE'_t$ are the mean absolute errors obtained when predicting state unit $X_t$ by the static coarsely calibrated model and by the model using action $a_t$, respectively, and $MAPE_t$ and $MAPE'_t$ are the corresponding mean absolute percentage errors. Thus $r_t$ is positive when $a_t$ is effective, negative when $a_t$ is ineffective, and 0 when $a_t$ is not fully effective.
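One way to realize a reward with the three stated regimes is sketched below. The exact reward formula is not reproduced in this text, so the functional form here is an assumption: it averages the relative improvement of MAE and MAPE over the coarsely calibrated baseline and returns 0 inside the 1% tolerance band. The name `reward` and the `tolerance` parameter are hypothetical.

```python
def reward(mae_base: float, mae_new: float, mape_base: float, mape_new: float,
           tolerance: float = 0.01) -> float:
    """Average relative improvement of MAE and MAPE over the baseline model.

    Assumed form, not the patent's formula: positive when the errors drop,
    0 when they are worse by at most `tolerance` on average, negative otherwise.
    """
    rate_mae = (mae_base - mae_new) / mae_base
    rate_mape = (mape_base - mape_new) / mape_base
    average_rate = 0.5 * (rate_mae + rate_mape)
    if -tolerance <= average_rate <= 0.0:
        return 0.0  # "not fully effective": slightly worse, tolerated
    return average_rate


r_effective = reward(2.0, 1.8, 10.0, 9.0)       # errors dropped by 10 %
r_tolerated = reward(2.0, 2.01, 10.0, 10.05)    # errors 0.5 % worse
r_ineffective = reward(2.0, 2.2, 10.0, 11.0)    # errors 10 % worse
```

The tolerance band keeps the agent from being punished for near-ties with the baseline, which matches the stated aim of being moderately tolerant.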
The DDPG training flow is as follows:
(1) Parameter initialization
DDPG adopts an Actor-Critic architecture. The Actor outputs actions, interacts with the environment and learns the policy; the Critic evaluates the actions and improves the policy. The functions of both are implemented by neural networks.
The original Actor and Critic networks are regarded as evaluation networks (online networks), and copies with the same network structure are set up as target networks. The evaluation networks update their parameter values every time they interact with the environment, while the target networks copy the parameter values of the evaluation networks at specified intervals.
Initialize the evaluation network parameters of the Critic and the Actor, denoted $\theta^Q$ and $\theta^\mu$ respectively, and copy them to the target networks, i.e. $\theta^Q \to \theta^{Q'}$ and $\theta^\mu \to \theta^{\mu'}$.
(2) Experience collection
First, using the historical data and its time axis, randomly select a time point at which to start predicting.
Denote the prediction time step by $i$, where $i$ is an arbitrary position index. As prediction advances in time, record the traffic state $V_i$ at that time step; calculate the residual of the most recent prediction that can be evaluated and use it as the prediction model state $P_i$; use the action value $a_i$ decided by the agent at this time step as the parameter $\alpha$ of the KNN prediction model; evaluate the prediction effect with the reward function to obtain $r_i$; and observe the state $S_{i+1}$ at the next time step $i+1$. Store $(S_i, a_i, r_i, S_{i+1})$ as one record in the memory pool for later use.
Repeat this prediction process along the time axis until $T$ predictions are completed; this counts as one training round.
(3) Experience replay
Given a threshold on the number of samples in the memory pool, once the number of records in the pool exceeds the threshold, a batch is sampled at random from the records to train the evaluation networks. The evaluation network parameters are updated according to a gradient calculation method and the backpropagation rule, using the gradients with respect to the network parameters $\theta^Q$ and $\theta^\mu$.
At specified intervals of time steps, the target network parameters are updated by soft update.
(4) When the specified number of rounds is reached, training ends.
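The soft update of the target networks mentioned in the experience replay step is commonly written as $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$. The sketch below applies that rule to flat parameter lists. The mixing coefficient `tau`, the update `interval` and the flat-list representation are assumptions for this example; the text does not state their values.

```python
from typing import List, Sequence


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """Blend online parameters into target parameters: tau*online + (1 - tau)*target."""
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    return [tau * w + (1.0 - tau) * w_target for w, w_target in zip(online, target)]


theta_online = [1.0, 1.0]    # evaluation network parameters (toy values)
theta_target = [0.0, 1.0]    # target network parameters (toy values)
interval = 2                 # hypothetical: update the target every 2 time steps

for step in range(1, 5):
    if step % interval == 0:
        theta_target = soft_update(theta_target, theta_online, tau=0.1)

hard_copy = soft_update([0.0, 0.0], theta_online, tau=1.0)  # tau = 1 is a plain copy
```

A small `tau` makes the target networks trail the evaluation networks slowly, which is the usual reason for keeping separate target networks.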
The deep neural networks in the DDPG algorithm are designed as follows:
(1) The Actor network has two input interfaces, one for the road network traffic state $V_t$ and one for the prediction model state $P_t$. The Critic network takes the same two inputs plus an action value, namely the output of the Actor network.
(2) The output of the Actor network is the action value, i.e. the parameter $\alpha$ of the prediction model. Its output dimension is 1, and because of the value range of $\alpha$ the output layer uses a sigmoid activation function. The output of the Critic network is the Q value. Its output dimension is 1, and because its value range is not clearly bounded the output layer uses a linear activation function.
(3) The Critic network updates its parameters by minimizing a loss function with gradient descent. The loss function takes the form of a mean squared error:
$$L(\theta^Q) = \frac{1}{B} \sum_{i} \left( q_i - Q(s_i, a_i \mid \theta^Q) \right)^2$$
where the sum runs over the $B$ records of the sampled batch, $Q(s_i, a_i \mid \theta^Q)$ is the output of the evaluation Q network and $q_i$ is the target value, calculated from the state transition as
$$q_i = r_i + \gamma\, Q'\!\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right) \qquad (14)$$
where $\gamma$ is the discount factor, $Q'$ is the target Q network and $\mu'$ is the target policy network.
The Actor network is updated using the policy gradient
$$\nabla_{\theta^\mu} J \approx \frac{1}{B} \sum_{i} \nabla_a Q(s, a \mid \theta^Q)\big|_{s=s_i,\,a=\mu(s_i)} \; \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\big|_{s=s_i}$$
where $\nabla$ is the gradient operator and $Q(s, a \mid \theta^Q)$ is the output of the online Q function with network parameters $\theta^Q$, evaluated at $s = s_i$, $a = \mu(s_i)$.
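The target value of formula (14) and the mean squared error loss of the Critic can be traced on a toy batch. In the sketch below the networks are replaced by plain Python callables and the states by scalars, purely as assumptions to keep the example runnable without a deep learning library; `td_targets`, `critic_loss` and the value of `gamma` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Record = Tuple[float, float, float, float]  # (s_i, a_i, r_i, s_{i+1}), scalar toy states


def td_targets(batch: Sequence[Record], gamma: float,
               target_actor: Callable[[float], float],
               target_critic: Callable[[float, float], float]) -> List[float]:
    """q_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for every record."""
    return [r + gamma * target_critic(s_next, target_actor(s_next))
            for (_, _, r, s_next) in batch]


def critic_loss(batch: Sequence[Record], targets: Sequence[float],
                critic: Callable[[float, float], float]) -> float:
    """Mean squared error between targets q_i and the online estimates Q(s_i, a_i)."""
    squared = [(q - critic(s, a)) ** 2 for (s, a, _, _), q in zip(batch, targets)]
    return sum(squared) / len(squared)


# Stand-in "networks" (assumptions): a constant policy and an additive Q function.
target_actor = lambda s: 0.5
target_critic = lambda s, a: s + a
online_critic = lambda s, a: s + a

batch = [(0.0, 0.2, 1.0, 1.0), (1.0, 0.4, -1.0, 2.0)]
targets = td_targets(batch, gamma=0.9, target_actor=target_actor, target_critic=target_critic)
loss = critic_loss(batch, targets, online_critic)
```

In an actual implementation the callables would be the target and evaluation networks, and the loss would be minimized by gradient descent on $\theta^Q$.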
The method further comprises prediction experiment verification:
During training, the effectiveness of the model is monitored by observing the reward of each single prediction or the cumulative reward of each round. After training is completed, the relative quality of the models can be verified directly by calculating and comparing the two evaluation indexes.
Compared with the prior art, the invention has the following advantages:
(1) The KNN prediction model represents the short-time traffic state in vector form and introduces a state distance measure that fuses the Euclidean distance and the cosine distance, so the model is more flexible in handling abrupt traffic state changes as well as conventional and unconventional traffic evolution scenarios.
(2) Through the dynamic optimization by the DDPG algorithm, the short-time traffic state prediction model changes from a simple static prediction that merely advances with time into a prediction that learns the evolution of the traffic state and adjusts itself dynamically, which further improves the prediction accuracy.
Drawings
FIG. 1 is the model construction flow of the dynamic short-time road network traffic state prediction model and prediction method of the invention.
FIG. 2 is the structure diagram of the agent and its Actor-Critic main model in the DDPG algorithm of the invention.
FIG. 3 shows the variation of the cumulative reward per round during training of the DDPG algorithm of the invention.
Detailed Description
The invention is further described below with reference to the drawings and the detailed description.
As shown in FIG. 1, the invention provides a dynamic short-time road network traffic state prediction model optimized by the deep deterministic policy gradient (DDPG) algorithm, and a method for predicting the dynamic short-time road network traffic state with this model. The model consists mainly of a KNN-based static prediction model and a DDPG-based dynamic optimization part. The prediction method is implemented in four steps: data collection and processing, construction of the KNN-based static prediction model, construction and training of the DDPG-based dynamic optimization part, and prediction experiment verification. The steps are as follows:
Step one: data collection and processing
The invention uses floating car data. In this embodiment the data come from the on-board GPS devices of taxis in the Beijing Workers' Stadium area in 2015; the devices upload the real time, vehicle position and instantaneous speed to the upper-level system every 2 minutes. For a given known road network, the collected data can be used to characterize the traffic state of the road network at each time. Taking vehicle speed as an example, for time $t$ the average vehicle speed $v(t)$ on a road section $l$ can be calculated. The average speeds on the road network composed of road sections $l_1, l_2, \ldots, l_n$ ($n$ is the number of road sections; $n = 257$ in this embodiment) form $V_t = (v_1(t), v_2(t), \ldots, v_n(t))$, and this set of speed values $V_t$ is called the road network traffic state at time $t$. Because short-time traffic state changes are continuous, the observation at a single time step can hardly express the trend of the change. The invention therefore aggregates the observations of the $\delta$ time steps up to time $t$ into a space-time matrix that represents the traffic state at time $t$, as shown in formula (1):
$$X_t = [V_{t-\delta+1}, \ldots, V_{t-1}, V_t] \qquad (1)$$
where $X_t$ is the space-time matrix formed by aggregating the observations of these $\delta$ time steps, $v(t)$ is the average vehicle speed on a road section $l$ at time $t$, and $n$ is the number of road sections.
To distinguish it from the traffic state point $V_t$ at time $t$, $X_t$ is called the traffic state sequence at time $t$. For $X_t$, the future state point at time $t+1$ is denoted $y_t$, and $(X_t, y_t)$ forms a sample pair. Short-time traffic state prediction in the invention means predicting $y_t$ from $X_t$.
Research shows that the space-time matrix fully characterizes the road network traffic state at each time step, but it has two problems. First, it does not highlight how the state evolves over time, and arranging a multi-dimensional time series this way can blur information and cause misjudgment; for example, in four-dimensional space the Euclidean distance from the origin to (1, 2, 3, 4) equals that from the origin to (4, 3, 2, 1). Second, aggregating several time steps increases the overall data dimension, so computation consumes more resources and may even suffer from the curse of dimensionality, which reduces the accuracy of the result.
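The four-dimensional example can be checked numerically. The short sketch below confirms that the two points are equally far from the origin in Euclidean terms, while a direction-sensitive measure such as the cosine distance tells the rising sequence apart from the falling one. The helper names are chosen for this example only.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


origin = [0.0, 0.0, 0.0, 0.0]
rising = [1.0, 2.0, 3.0, 4.0]
falling = [4.0, 3.0, 2.0, 1.0]

d_rising = euclidean(origin, rising)     # sqrt(30)
d_falling = euclidean(origin, falling)   # sqrt(30) as well
direction_gap = cosine_distance(rising, falling)  # 1 - 20/30, clearly non-zero
```

This is the motivation for pairing a magnitude-based distance with a direction-based one in the state distance.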
The invention therefore processes $X_t$ further and represents it by a single reference state point $V_t$ together with the trend vector $\vec{V}_t$ that led to $V_t$. This vector-style definition concisely characterizes both the evolution of the traffic state and its result, and is called a traffic state unit, i.e. $X'_t = (V_t, \vec{V}_t)$, where the trend vector $\vec{V}_t$ is calculated from $X_t$.
Step two: construction of the KNN-based static prediction model
The KNN-based static short-time prediction model used in the invention calculates the similarity between the sample to be predicted and the known samples and selects the K most similar samples, called neighbors; it then infers the label of the sample to be predicted from the labels of the neighbors, which completes the prediction. The construction steps are as follows:
(1) Define the similarity measure.
The similarity of two samples is usually measured by their distance: the smaller the distance, the higher the similarity, and vice versa. Because a traffic state unit consists of two forms of data, the invention fuses two distance measures: the distance $ED_i$ between reference state points is measured by the Euclidean distance, and the distance $CD_i$ between trend vectors is measured by the cosine distance, i.e.
$$ED_i = \lVert V_t - V_{h,i} \rVert_2, \qquad CD_i = 1 - \frac{\vec{V}_t \cdot \vec{V}_{h,i}}{\lVert \vec{V}_t \rVert \, \lVert \vec{V}_{h,i} \rVert}$$
where $V_{h,i}$ is the $i$-th known reference state point and $\vec{V}_{h,i}$ is the $i$-th known trend vector; the subscript $h$ marks known historical data in the collected samples. A state distance $SD_i$ is then constructed from $ED_i$ and $CD_i$ (formula (5)) to measure the similarity of state units, where $u = 1, 2, \ldots, M$, $M$ is the number of historical samples, and $\alpha$ is the coefficient that balances the Euclidean distance and the cosine distance, with value range [0, 1].
From formula (5) and the definitions above, the value of $\alpha$ determines how the state distance leans toward its Euclidean or its cosine part. When $\alpha \to 0$, $SD_i \approx CD_i$: the evolution trend of the state is decisive for the similarity of state units. This setting suits situations in which the trend features of traffic evolution are very pronounced, for example when the traffic state changes abruptly within a short time or stays almost constant. When $\alpha \to 1$, $SD_i \approx ED_i$: whether state units are similar depends on the reference state point, i.e. the final result, and the evolution process matters little. This setting corresponds to more conventional traffic evolution.
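The limiting behaviour of $\alpha$ described above can be reproduced with a simple convex combination of the two distances. The exact form of formula (5) is not reproduced in this text (it may, for instance, normalize each distance over the historical samples), so `state_distance` below is an assumed simplification, and the zero-vector convention in `cosine_distance` is likewise a choice made only for this sketch.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Sketch convention: two flat trends match, a flat and a non-flat one do not.
        return 0.0 if norm_a == norm_b else 1.0
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def state_distance(v: Sequence[float], trend: Sequence[float],
                   v_hist: Sequence[float], trend_hist: Sequence[float],
                   alpha: float) -> float:
    """Assumed form: alpha * ED + (1 - alpha) * CD, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ed = euclidean(v, v_hist)
    cd = cosine_distance(trend, trend_hist)
    return alpha * ed + (1.0 - alpha) * cd


v, trend = [30.0, 40.0], [1.0, 0.0]            # toy reference point and trend
v_hist, trend_hist = [33.0, 44.0], [0.0, 1.0]  # toy historical counterpart

sd_trend_only = state_distance(v, trend, v_hist, trend_hist, alpha=0.0)  # equals CD
sd_point_only = state_distance(v, trend, v_hist, trend_hist, alpha=1.0)  # equals ED
sd_balanced = state_distance(v, trend, v_hist, trend_hist, alpha=0.5)
```

Because the Euclidean part is measured in speed units and the cosine part is bounded, a practical implementation would likely rescale the two before mixing them.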
(2) Select K neighbors according to the similarity measurement results.
Calculate the state distance between the sample to be predicted and every known historical sample, and take the K historical samples $V_{h,1}, V_{h,2}, \ldots, V_{h,K}$ with the smallest distances as neighbors.
(3) Calculate the predicted value of the sample to be predicted.
The invention calculates the predicted value by incremental prediction and Gaussian-weights the label values of the neighbors according to their distances.
First, define the increment as the difference between $y_{h,j}$ and $V_{h,j}$ of a neighbor; for the $j$-th nearest neighbor (re-indexed as $j = 1, 2, \ldots, K$),
$$\Delta y_{h,j} = y_{h,j} - V_{h,j} \qquad (6)$$
i.e. the amount of traffic state change within the prediction window.
Then the predicted value is calculated by Gaussian weighting of these increments. In view of the magnitude of the state distance values and for brevity, the weights are set as a function of the state distance.
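Neighbor selection and incremental prediction can be sketched together as follows. The increment follows formula (6); the weight form $w_j = \exp(-SD_j^2 / (2\sigma^2))$, the normalization by the weight sum, the bandwidth `sigma` and the function name `predict_incremental` are assumptions, since the weight and prediction formulas are not reproduced in this text.

```python
import math
from typing import List, Sequence, Tuple

# One historical sample: (state distance SD, reference point V_h, label y_h).
Neighbour = Tuple[float, Sequence[float], Sequence[float]]


def predict_incremental(v_t: Sequence[float], candidates: Sequence[Neighbour],
                        k: int, sigma: float = 1.0) -> List[float]:
    """Add the Gaussian-weighted mean increment of the K nearest samples to V_t."""
    nearest = sorted(candidates, key=lambda item: item[0])[:k]
    # Assumed Gaussian weights on the state distance.
    weights = [math.exp(-(sd ** 2) / (2.0 * sigma ** 2)) for sd, _, _ in nearest]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to equal weights.
        weights = [1.0] * len(nearest)
        total = float(len(nearest))
    prediction = []
    for seg in range(len(v_t)):
        # Increment of formula (6): delta_y = y_h - V_h, per road section.
        increment = sum(w * (y_h[seg] - v_h[seg])
                        for w, (_, v_h, y_h) in zip(weights, nearest)) / total
        prediction.append(v_t[seg] + increment)
    return prediction


v_t = [30.0, 40.0]  # current reference state point (toy values, 2 road sections)
candidates = [
    (0.0, [28.0, 41.0], [30.0, 40.0]),   # increment (+2, -1)
    (0.0, [31.0, 39.0], [33.0, 40.0]),   # increment (+2, +1)
    (5.0, [10.0, 10.0], [50.0, 50.0]),   # far away, excluded when k = 2
]
y_hat = predict_incremental(v_t, candidates, k=2)
```

Predicting the increment rather than the label itself lets neighbors with a different absolute speed level still contribute their observed change.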
(4) Coarsely calibrate the model parameters.
To realize the prediction function, the undetermined parameters of the prediction model, namely $\delta$, $K$ and $\alpha$, must also be calibrated. This is done by a grid search experiment on real data. In this embodiment, $K$ is searched over 5 to 120 in steps of 5, and $\alpha$ over 0 to 1 in steps of 0.1.
First, establish an evaluation system for the prediction effect, comprising the mean absolute error (MAE) and the mean absolute percentage error (MAPE), where $N$ is the number of samples to be predicted in the experiment and both indexes compare the predicted value with the true value of each $i$-th sample to be predicted.
Second, discretize the value ranges of the parameters to be calibrated, run the experiment for each parameter combination in turn, and record the results.
Finally, select the parameter combination with the best experimental result as the calibrated values of the model parameters. The calibrated values for different prediction steps in this embodiment are as follows:
Step three: construction and training of the DDPG-based dynamic optimization part.
After step two, the KNN-based prediction model can complete the intended prediction task. However, because of the way its parameters are calibrated, the model can only predict statically. A static model ignores the objective changes of short-time traffic flow and their effects: the parameters $K$ and $\alpha$, which are closely tied to the time-varying traffic state, do not change adaptively during prediction. The invention therefore goes on to construct and train a dynamic optimization part based on the DDPG algorithm. The method protected by the invention is not limited to the approach of this embodiment; those skilled in the art may construct and train the dynamic optimization part in other feasible ways. This embodiment adopts what is currently a more reasonable optimization method, described in detail below.
Calibration experiments show that the Gaussian weighting effectively suppresses the sensitivity of the prediction effect to $K$: the prediction error almost always decreases gradually as $K$ increases. In other words, $K$ does not need dynamic adjustment to improve accuracy; choosing a sufficiently large value is enough. For the parameter $\alpha$, however, static calibration runs counter to the purpose for which $\alpha$ was introduced, namely to let the model accommodate both conventional and unconventional changes of short-time traffic flow by adjusting $\alpha$. A single, uniformly calibrated value lacks this flexibility and adaptability.
The invention therefore proposes to optimize the parameter $\alpha$ dynamically with a deep reinforcement learning algorithm. Reinforcement learning is a class of machine learning algorithms that takes the Markov decision process (MDP) as its basic modeling idea; its main elements are the state $S$, the action $a$ and the reward $r$. An agent built with a reinforcement learning algorithm completes a sequential decision process through a series of interactions with the environment and, by continual self-learning, comes to take the actions that maximize reward in the different states of the environment. The deep deterministic policy gradient (DDPG) algorithm is a reinforcement learning algorithm that combines deep learning with the Actor-Critic architecture. It can handle continuous state spaces and continuous action spaces, which makes it better suited to real problems that require high-dimensional continuous actions, improves the model's dynamic adaptability to complex environments, and enables dynamic continuous decisions. The procedure is as follows:
1. Define the state $S$, the action $a$, the reward $r$ and related elements.
First, the problem is defined explicitly: the dynamic optimization process is modeled as a Markov decision process. During the rolling prediction of the model, choosing the value of the parameter $\alpha$ for each prediction of the KNN-based model is regarded as one Markov decision. The decision does not depend on the values chosen in past predictions; it depends only on the currently observed road network traffic state and the prediction effect of the prediction model, so the Markov property is satisfied.
From the modeling described above, the following definitions can be made:
State $S_t$: comprises the observed external road network traffic state $V_t$ and the prediction model's own state $P_t$, i.e. $S_t = \{V_t, P_t\}$, where $V_t$ is the state unit observed at time $t$ and $P_t$ is the residual of the most recent prediction whose outcome is known at time $t$.
Action $a_t$: the value of the parameter $\alpha$ selected in this decision. Unlike in the coarse calibration of the KNN model parameters, $\alpha$ here is a continuous value in [0, 1] and is not discretized.
(Immediate) reward $r_t$: with the coarsely calibrated static KNN prediction model as the baseline, the average improvement rate of the evaluation indexes after executing action $a_t$ is defined as the reward function. Before giving this function, a rule for judging the prediction effect qualitatively is defined: when the indexes obtained by executing action $a_t$ are smaller than those of the coarsely calibrated model, $a_t$ is called an effective optimization; otherwise $a_t$ is called an ineffective optimization. To accelerate convergence of the algorithm, the invention is moderately tolerant: when the indexes obtained by $a_t$ are greater than those of the coarsely calibrated model but exceed them by no more than 1%, $a_t$ is called not fully effective. The reward function is defined accordingly in terms of $MAE_t$, $MAE'_t$, $MAPE_t$ and $MAPE'_t$,
where $MAE_t$ and $MAE'_t$ are the mean absolute errors obtained when predicting state unit $X_t$ by the static coarsely calibrated model and by the model using action $a_t$, respectively, and $MAPE_t$ and $MAPE'_t$ are the corresponding mean absolute percentage errors. Thus $r_t$ is positive when $a_t$ is effective, negative when $a_t$ is ineffective, and 0 when $a_t$ is not fully effective. This means that when action $a_t$ is performed …
And 2, designing a DDPG algorithm training flow.
In combination with the application scenario of short-time traffic state prediction, and referring to a classical DDPG algorithm, as shown in fig. 2, the invention proposes a specific training flow as follows:
(1) Parameter initialization.
The agent in DDPG adopts an Actor-Critic architecture, which combines value-based and policy-based methods. The Actor is responsible for outputting actions, interacting with the environment and learning the policy; the Critic is responsible for evaluating actions and improving the policy. Both are implemented as neural networks. To improve the stability of the algorithm, the original Actor and Critic networks are treated as evaluation networks (online networks), and copies with the same network structure, called target networks, are created. The evaluation networks update their parameter values after each interaction with the environment, while the target networks copy the parameter values of the evaluation networks at specified intervals.
Therefore, the evaluation network parameters of the Critic and the Actor are initialized, denoted $\theta^Q$ and $\theta^\mu$ respectively, and then copied to the target networks, i.e., $\theta^Q \to \theta^{Q'}$ and $\theta^\mu \to \theta^{\mu'}$.
(2) Experience collection.
First, using the historical data and its time axis, a time point is randomly selected at which to start prediction.
Let the prediction time step be i. As prediction proceeds along the time axis, record the traffic state $V_i$ of the time step; compute the residual from the most recent prediction result that can be evaluated and take it as the prediction model state $P_i$; the parameter α used in the KNN prediction model is the action value $a_i$ determined by the agent at this time step; evaluate the prediction effect with the reward function to obtain $r_i$; and observe the state $S_{i+1}$ at the next time step, i+1. Store $(S_i, a_i, r_i, S_{i+1})$ as a record in the memory pool for later use.
The prediction process is repeated along the time axis until T predictions are completed, which constitutes one training round. In this example, T = 100.
(3) Experience replay.
As training proceeds, a relatively rich set of experiences accumulates in the memory pool. Given a threshold δ, when the number of sample records in the memory pool exceeds δ, a batch is randomly sampled from the memory pool to train the evaluation networks, whose parameters are updated by gradient computation and back-propagation. Expressed in terms of the gradient with respect to the network parameter θ, the update is as follows:
At the specified interval of time steps, the target network parameters are updated by soft updating, specifically:
(4) When the specified number of rounds M is reached, training ends. In this example, M = 4000.
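The memory pool and the soft update in the flow above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `MemoryPool`, its `capacity` argument and the mixing coefficient `tau` are assumptions that do not appear in the text, and the soft update uses the standard DDPG form (target = tau * online + (1 - tau) * target) because the original formula is not reproduced here.

```python
import random
from collections import deque


class MemoryPool:
    """Fixed-capacity store of (S_i, a_i, r_i, S_{i+1}) records."""

    def __init__(self, capacity, threshold):
        self.records = deque(maxlen=capacity)  # oldest records are dropped first
        self.threshold = threshold             # sampling starts above this size

    def store(self, state, action, reward, next_state):
        self.records.append((state, action, reward, next_state))

    def ready(self):
        # True once the number of records exceeds the threshold.
        return len(self.records) > self.threshold

    def sample(self, batch_size):
        # Uniform random mini-batch for experience replay.
        return random.sample(list(self.records), batch_size)


def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward its online value."""
    return [tau * w + (1.0 - tau) * w_target
            for w, w_target in zip(online_params, target_params)]
```

A small `tau` makes the target networks track the evaluation networks slowly, which is the usual reason for preferring a soft update to a hard copy.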
Deep neural network design in DDPG algorithm.
The deep neural networks, consisting of the evaluation and target networks of the Actor and Critic, are the key components for learning the policy in the DDPG algorithm. Since each target network is a copy of its evaluation network, with an identical structure, only the evaluation networks of the Actor and Critic need to be designed. Although automated machine learning (AutoML), and in particular neural architecture search (NAS), has recently received extensive attention and made the automated design of deep neural networks possible, certain manual specifications and restrictions should still be imposed on the specific functions and structure of the networks. Considering the purpose and scenario in which the Actor and Critic networks are used, namely their inputs and outputs, error function and so on, the invention imposes the following design requirements:
(1) The Actor network has two input interfaces, for the road network traffic state $V_t$ and the prediction model state $P_t$, respectively. In addition to these two inputs, the Critic network also takes an action value, namely the output value of the Actor network.
(2) The output of the Actor network is the action value, namely the parameter α of the prediction model, so the output dimension is 1; given the value range of α, the output layer uses a sigmoid activation function. The output of the Critic network is the Q value, with output dimension 1; since its value range is not clearly bounded, the output layer uses a linear activation function.
(3) The Critic network updates its parameters by minimizing a loss function, i.e., by gradient descent; the loss function takes the form of a mean square error, specifically
where $Q(s_i, a_i \mid \theta^Q)$ is the output value of the evaluation network and $q_i$ is the target value, calculated from the state transition as
$q_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$ (14)
The Actor network is updated based on the policy gradient, i.e.
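The target value in (14) and the mean square error loss of the Critic can be illustrated with plain Python callables standing in for the networks. This is a sketch under assumptions: `target_actor`, `target_critic` and `critic` are hypothetical functions rather than real network objects, and the loss is averaged over the sampled batch, which is the standard DDPG form; the text does not state the normalization.

```python
def td_targets(batch, target_actor, target_critic, gamma):
    """q_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for each record, as in (14)."""
    targets = []
    for _state, _action, reward, next_state in batch:
        next_action = target_actor(next_state)  # mu'(s_{i+1})
        targets.append(reward + gamma * target_critic(next_state, next_action))
    return targets


def critic_loss(batch, targets, critic):
    """Mean squared error between targets q_i and online estimates Q(s_i, a_i)."""
    errors = [(q - critic(state, action)) ** 2
              for (state, action, _reward, _next_state), q in zip(batch, targets)]
    return sum(errors) / len(errors)
```

Only the target networks appear inside `td_targets`; the online Critic enters through `critic_loss`, which is the quantity that gradient descent would minimize.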
Step four: prediction experiment verification.
It is crucial to verify that the training process and the resulting model are effective. The mean absolute error (MAE) and the mean absolute percentage error (MAPE) form a comprehensive evaluation index system, and the reward function is defined as the average percentage improvement that DDPG optimization achieves over the KNN prediction model. Thus, during training, the effectiveness of the model can be monitored by observing the reward of a single prediction or the cumulative reward of a round; after training, the relative quality of the model can be verified directly by calculating and comparing the two evaluation indexes.
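The two evaluation indexes can be computed as in the sketch below. It assumes that predictions and true values are given as flat lists of speeds (all road segments and samples concatenated) and that no true value is zero; the function names `mae` and `mape` are illustrative.

```python
def mae(predicted, actual):
    """Mean absolute error over all predicted values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mape(predicted, actual):
    """Mean absolute percentage error, in percent; assumes non-zero true values."""
    ratios = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(ratios) / len(ratios)
```

Comparing these two values for the static model and for the DDPG-tuned model on the same test samples gives the relative quality check described above.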
This example shows the monitored cumulative reward per round in Fig. 3. The black trend line in the figure shows that at the start of training the agent obtains very low, even negative, rewards, which become positive after a few rounds. Although rounds with negative rewards still occur, as training progresses the reward becomes reliably positive, and the trend line value is about 100.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any simple modification, variation or equivalent structural change made to the above embodiment according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims (6)

1. A dynamic short-time road network traffic state prediction method optimized based on a deep deterministic policy gradient algorithm, characterized by comprising the following steps:
Step one: data collection and processing
(1) Collecting the time, position and instantaneous vehicle speed information uploaded to the superior system by vehicle-mounted GPS devices within a specified time interval;
(2) Calculating the average vehicle speed $v(t)$ on a road section $l$ at time t, the average values at time t and before being known;
(3) Obtaining in the same way the set of average speeds $V_t = (v_1(t), v_2(t), \ldots, v_n(t))$ on the road network formed by road sections $l_1, l_2, \ldots, l_n$, where n is the number of road sections;
(4) Using the collected data of the δ−1 times before time t, calculating and aggregating a space-time matrix $X_t$ that represents the traffic state at time t, $X_t$ being as shown in (1):
(5) Processing $X_t$: for each single reference state point, calculating the trend vector of the reference point, and defining the traffic state unit $X'_t$ in vector form, i.e.
where
Step two: construction of a KNN-based static predictive model
(1) Measuring the distance ED i between the reference state points V t by using Euclidean distance and measuring the trend vector by using cosine distanceThe distance CD i between the two is measured, and the expression is:
In the method, in the process of the invention, For the i-th known reference state point data,The i-th known trend vector, where the expression with the subscript h is known historical data in the collected sample; and constructs a state distance SD i to measure state cell similarity therefrom:
wherein u=1, 2, …, M is the number of historical samples, α is the coefficient of balance euclidean distance and cosine distance, and the value range is [0,1];
(2) Selecting K neighbors according to similarity measurement results
Calculating state distances between a sample X t+1 to be predicted and all known historical samples, and taking K historical samples V h,1,Vh,2,…,Vh,K with the smallest distance as neighbors;
(3) Calculating a predicted value of a sample to be predicted
Method for calculating predicted value using delta predictionThe label values of the neighbors, i.e. the state distances SD i are gaussian weighted according to the distance size,
For X t, the future state point X t+1 at the time point (t+1) is recorded as y t,
The delta is the difference between y h,j and V h,j for the nearest neighbor, and for the j-th nearest neighbor (j=1, 2, …, K) the expression is:
△yh,j=yh,j-Vh,j (6)
I.e., traffic state variables within a prediction window;
Next, the predicted value at the time (t+1) in the future is calculated by Gaussian weighting Is that
In the weights
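For illustration only, steps (1) to (3) of the KNN prediction can be sketched in Python. The distance and weighting expressions are not reproduced in this text, so the sketch assumes the following forms: the state distance is `alpha * ED + (1 - alpha) * CD` with CD taken as one minus the cosine similarity, the Gaussian weights are `exp(-SD^2 / (2 * sigma^2))` normalized over the K neighbors, and the prediction adds the weighted increment (6) to the current state point. The bandwidth `sigma` and all function names are hypothetical, and trend vectors are assumed to be non-zero.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def knn_delta_predict(v_t, trend_t, history, k, alpha, sigma=1.0):
    """history: list of (V_h, trend_h, y_h) tuples of known samples."""
    scored = []
    for v_h, trend_h, y_h in history:
        sd = (alpha * euclidean(v_t, v_h)
              + (1.0 - alpha) * cosine_distance(trend_t, trend_h))
        scored.append((sd, v_h, y_h))
    neighbors = sorted(scored, key=lambda item: item[0])[:k]

    # Gaussian weights; subtracting the smallest squared distance leaves the
    # normalized weights unchanged and avoids underflow to an all-zero sum.
    shift = neighbors[0][0] ** 2
    weights = [math.exp(-(sd ** 2 - shift) / (2.0 * sigma ** 2))
               for sd, _v_h, _y_h in neighbors]
    total = sum(weights)

    prediction = []
    for seg in range(len(v_t)):
        delta = sum(w * (y_h[seg] - v_h[seg])
                    for w, (_sd, v_h, y_h) in zip(weights, neighbors)) / total
        prediction.append(v_t[seg] + delta)
    return prediction
```

With `alpha` close to 1 the neighbor choice is driven by the speed values themselves; with `alpha` close to 0 it is driven by the shape of the recent trend.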
2. The dynamic short-time road network traffic state prediction method optimized based on a deep deterministic policy gradient algorithm according to claim 1, wherein
step two, the construction of the KNN-based static prediction model, further comprises:
(4) Coarse calibration of model parameters
Calibrating the undetermined parameters δ, K and α of the KNN-based static prediction model by a grid search experiment on the collected real data, specifically:
Firstly, establishing an evaluation system for the prediction effect, comprising the mean absolute error MAE and the mean absolute percentage error MAPE, given by:
where n is the number of road segments, N is the number of samples to be predicted in the experiment, and the two remaining symbols are the predicted value and the true value of the i-th sample to be predicted, respectively;
Secondly, discretizing the value ranges of the parameters to be calibrated, running the experiment for each parameter combination in turn, and recording the results;
Finally, selecting the parameter combination with the best experimental result as the calibrated values of the model parameters.
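The grid search in step (4) can be sketched as below. The sketch assumes that the "best experimental result" is summarized by a single score to be minimized (for example the MAE on the test samples); `evaluate` is a hypothetical callback that runs the KNN experiment for one parameter combination and returns that score.

```python
from itertools import product


def grid_search(evaluate, deltas, ks, alphas):
    """Return the (delta, K, alpha) combination with the lowest score."""
    best_params, best_score = None, float("inf")
    for delta, k, alpha in product(deltas, ks, alphas):
        score = evaluate(delta, k, alpha)  # e.g. MAE of the static KNN model
        if score < best_score:
            best_params, best_score = (delta, k, alpha), score
    return best_params, best_score
```

Because α is discretized here, the calibrated value is only as fine as the grid, which is the limitation the DDPG-based continuous tuning of α is meant to address.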
3. The dynamic short-time road network traffic state prediction method optimized based on a deep deterministic policy gradient algorithm according to claim 2, wherein
the method further comprises step three: dynamic optimization of the parameter α based on the DDPG algorithm, specifically comprising:
The following definitions are made:
State $S_t$: comprising the observed external road network traffic state $V_t$ and the prediction model's own state $P_t$, i.e., $S_t = \{V_t, P_t\}$, where $V_t$ is the state unit observed at time t and $P_t$ is the residual of the prediction model's most recent prediction known at time t;
Action $a_t$: the value of the parameter α selected in the decision; unlike in the coarse calibration of the KNN model parameters, α takes continuous values in [0, 1] and is not discretized;
Immediate reward $r_t$: with the help of the coarsely calibrated static KNN prediction model, the average index improvement rate after executing action $a_t$ is defined as the reward function; when the indexes obtained by executing action $a_t$ are smaller than those obtained by the coarse calibration model, action $a_t$ is regarded as an effective optimization; otherwise, action $a_t$ is regarded as an ineffective optimization; when the index obtained by action $a_t$ is greater than the index obtained by the coarse calibration model, but exceeds it by no more than 1% of its value, the action is regarded as not fully effective; accordingly, the reward function $r_t$ is defined as:
where $MAE_t$ and $MAE'_t$ are the mean absolute errors obtained by the static coarse calibration model and by the model using the selected action $a_t$, respectively, when predicting the state unit $X_t$, and $MAPE_t$ and $MAPE'_t$ are the corresponding mean absolute percentage errors; when $a_t$ is effective, $r_t$ is positive; when $a_t$ is ineffective, $r_t$ is negative; when $a_t$ is not fully effective, $r_t$ is 0.
4. The dynamic short-time road network traffic state prediction method optimized based on a deep deterministic policy gradient algorithm according to claim 3, wherein
The DDPG algorithm training flow is as follows:
(1) Parameter initialization
An Actor-Critic architecture is adopted in DDPG: the Actor is responsible for outputting actions, interacting with the environment and learning the policy, and the Critic is responsible for evaluating actions and improving the policy; both are implemented as neural networks;
The original Actor and Critic networks are treated as evaluation networks (online networks), and copies with the same network structure, called target networks, are created; the evaluation networks update their parameter values after each interaction with the environment, and the target networks copy the parameter values of the evaluation networks at specified intervals;
Initializing the evaluation network parameters, denoted $\theta^Q$ and $\theta^\mu$, $\theta^Q$ being the evaluation network parameters of the Critic and $\theta^\mu$ those of the Actor, and copying them to the target networks, i.e., $\theta^Q \to \theta^{Q'}$ and $\theta^\mu \to \theta^{\mu'}$;
(2) Experience collection
Firstly, using the historical data and its time axis, randomly selecting a time point at which to start prediction;
Letting the prediction time step be i, where i denotes an arbitrary index, and, as prediction proceeds along the time axis, recording the traffic state $V_i$ of the time step; computing the residual from the most recent prediction result that can be evaluated and taking it as the prediction model state $P_i$; the parameter α used in the KNN prediction model being the action value $a_i$ determined by the agent at this time step; evaluating the prediction effect with the reward function to obtain $r_i$; observing the state $S_{i+1}$ at the next time step, i+1; and storing $(S_i, a_i, r_i, S_{i+1})$ as a record in the memory pool for later use;
Repeating the prediction process along the time axis until T predictions are completed, which constitutes one training round;
(3) Experience replay
Given a threshold on the number of samples in the memory pool, when the number of sample records in the memory pool exceeds the threshold, randomly sampling a batch from the records to train the evaluation networks, and updating the evaluation network parameters by gradient computation and back-propagation; expressed in terms of the gradient with respect to the network parameter θ, the update is:
with gradients taken with respect to the network parameters $\theta^Q$ and $\theta^\mu$;
At the specified interval of time steps, the target network parameters are soft-updated, specifically:
(4) When the specified number of rounds is reached, training ends.
5. The dynamic short-time road network traffic state prediction method optimized based on a deep deterministic policy gradient algorithm according to claim 4, wherein
The deep neural networks in the DDPG algorithm are designed as follows:
(1) The Actor network has two input interfaces, for the road network traffic state $V_t$ and the prediction model state $P_t$, respectively; the Critic network, in addition to the road network traffic state and the prediction model state, also takes as input an action value, namely the output value of the Actor network;
(2) The output of the Actor network is the action value, namely the parameter α of the prediction model, with output dimension 1; given the value range of α, the output layer uses a sigmoid activation function; the output of the Critic network is the Q value, with output dimension 1; since its value range is not clearly bounded, the output layer uses a linear activation function;
(3) The Critic network updates its parameters by minimizing a loss function, i.e., by gradient descent; the loss function takes the form of a mean square error, specifically
where $Q(s_i, a_i \mid \theta^Q)$ is the output value of the evaluation network and $q_i$ is the target value, calculated from the state transition as
$q_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$ (14)
where $Q'$ is the target Q network and $\mu'$ is the target policy network;
The Actor network is updated based on the policy gradient, i.e.
where the left-hand side is the gradient of the loss, $\nabla$ denotes the gradient operator, and $Q(s, a \mid \theta^Q)$ is the output of the online Q function with network parameters $\theta^Q$, evaluated at $s = s_i$, $a = \mu(s_i)$.
6. The dynamic short-time road network traffic state prediction method optimized based on a deep deterministic policy gradient algorithm according to claim 5, further comprising:
Step four: prediction experiment verification
During training, the effectiveness of the model is monitored by observing the reward of a single prediction or the cumulative reward of a round; after training, the relative quality of the model is verified directly by calculating and comparing the two evaluation indexes.
CN202111115375.9A 2021-09-23 2021-09-23 Dynamic short-time road network traffic state prediction model and prediction method Active CN115938104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111115375.9A CN115938104B (en) 2021-09-23 2021-09-23 Dynamic short-time road network traffic state prediction model and prediction method

Publications (2)

Publication Number Publication Date
CN115938104A CN115938104A (en) 2023-04-07
CN115938104B true CN115938104B (en) 2024-06-28

Family

ID=86699480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111115375.9A Active CN115938104B (en) 2021-09-23 2021-09-23 Dynamic short-time road network traffic state prediction model and prediction method

Country Status (1)

Country Link
CN (1) CN115938104B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116362418B (en) * 2023-05-29 2023-08-22 天能电池集团股份有限公司 Online prediction method for application-level manufacturing capacity of intelligent factory of high-end battery
CN117974366B (en) * 2024-04-01 2024-06-11 深圳市普裕时代新能源科技有限公司 Energy management system based on industrial and commercial energy storage

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3060900A1 (en) * 2018-11-05 2020-05-05 Royal Bank Of Canada System and method for deep reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006285567A (en) * 2005-03-31 2006-10-19 Hitachi Ltd Data processing system of probe traffic information, data processor of probe traffic information, and data processing method of probe traffic information
JP5220542B2 (en) * 2008-10-07 2013-06-26 インターナショナル・ビジネス・マシーンズ・コーポレーション Controller, control method and control program
CN109190797A (en) * 2018-08-03 2019-01-11 北京航空航天大学 A kind of large-scale road network state Forecasting Approach for Short-term based on improvement k arest neighbors
CN112907971B (en) * 2021-02-04 2022-06-10 南通大学 Urban road network short-term traffic flow prediction method based on genetic algorithm optimization space-time residual error model
CN113313947B (en) * 2021-05-31 2022-04-19 湖南大学 Road condition evaluation method of short-term traffic prediction graph convolution network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于时空关联度加权的LSTM短时交通速度预测;刘易诗;关雪峰;吴华意;曹军;张娜;;地理信息世界;20200225(第01期);49-55 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant