CN115938104A - Dynamic short-time road network traffic state prediction model and prediction method - Google Patents

Dynamic short-time road network traffic state prediction model and prediction method

Info

Publication number
CN115938104A
Authority
CN
China
Prior art keywords
network
prediction
time
value
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111115375.9A
Other languages
Chinese (zh)
Other versions
CN115938104B (en)
Inventor
任毅龙
姜涵
于海洋
晁文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202111115375.9A priority Critical patent/CN115938104B/en
Priority claimed from CN202111115375.9A external-priority patent/CN115938104B/en
Publication of CN115938104A publication Critical patent/CN115938104A/en
Application granted granted Critical
Publication of CN115938104B publication Critical patent/CN115938104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Traffic Control Systems (AREA)

Abstract

The invention discloses a dynamic short-term road network traffic state prediction method optimized by the deep deterministic policy gradient (DDPG) algorithm. The method introduces a vector-form short-term traffic state representation into a KNN prediction model, giving the model greater flexibility in handling rapid and gradual traffic state changes as well as conventional and unconventional traffic evolution. Through dynamic optimization by the DDPG algorithm, the short-term traffic state prediction model evolves from simple static prediction into one that perceives traffic state evolution and adjusts its predictions dynamically over time. This solves the prior-art problem that static and semi-static models can only fit historical data and patterns and cannot adapt quickly to sudden, random changes of the real-time traffic state, and further improves prediction accuracy.

Description

Dynamic short-time road network traffic state prediction model and prediction method
Technical Field
The invention belongs to the field of traffic big data technology and applications, relates to short-term road network traffic state prediction, and particularly relates to a dynamic short-term road network traffic state prediction model optimized based on the deep deterministic policy gradient algorithm.
Background
Short-term road network traffic state prediction plays an important practical role in intelligent traffic systems and future-oriented intelligent vehicle-road cooperative systems, and is a prerequisite for other real-time traffic services such as en-route path guidance and path decision-making. The quality of many basic traffic services is therefore affected by the accuracy of short-term road network traffic state prediction.
In recent years, with the diversification of traffic detectors and the advance of data storage devices, research on traffic data acquisition technology and its related applications has progressed greatly. Correspondingly, short-term road network traffic state prediction algorithms driven by traffic big data have emerged in large numbers, mainly comprising shallow learning (traditional machine learning) models represented by K-nearest neighbors, support vector machines, and decision trees, and deep learning models represented by long short-term memory (LSTM) networks, convolutional neural networks, and their combined models.
However, both types of model are usually trained and constructed on abundant historical data; that is, the main architecture and hyper-parameters of the model are determined once and then no longer adjusted, or adjusted only at certain intervals, during the application of the model. Such "static" and "semi-static" models can fit historical data and patterns well, but cannot quickly adapt to sudden, random changes of the real-time traffic state and make corresponding adjustments in time.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides a dynamic short-term road network traffic state prediction model and prediction method.
The complete technical scheme of the invention is as follows:
a dynamic short-time network traffic state prediction method based on deep certainty strategy gradient algorithm optimization comprises the following steps:
the method comprises the following steps: data collection and processing
(1) The time, the position and the vehicle instantaneous speed information uploaded to a superior system by a vehicle-mounted GPS device in a specified time interval are collected,
(2) Calculating to obtain the average value v (t) of the vehicle speed on a certain section l at the moment t, wherein the average values at the moment t and before the moment t are known values,
(3) Get a section of road l 1 ,l 2 ,…,l n Formed road section network corresponding average speed value set V t = (v 1 (t),v 2 (t),…,v n (t)), n is the number of road segments;
(4) Aggregating the observed values of delta-1 moments before the moment t into a space-time matrix X by utilizing the observed values of the moment t and the moment before the moment t t ,X t Indicating the traffic state at time t, X t As shown in the publication (1):
Figure RE-GDA0003498659920000021
/>
(5) To X t Processing is carried out, and for each single reference state point, a trend vector of the reference state point is calculated
Figure RE-GDA0003498659920000022
And defines traffic state unit X 'in vector mode' t I.e. by
Figure RE-GDA0003498659920000023
Wherein
Figure RE-GDA0003498659920000024
The reference state point here refers to the space-time matrix X t The value is known at a particular time.
Step two: construction of KNN-based static prediction model
(1) Respectively adopting Euclidean distance to reference state point V t The distance ED between i Performing a metric, and using the cosine distance to the trend vector
Figure RE-GDA0003498659920000025
Distance between CD i Performing measurement, wherein the expression is as follows:
Figure RE-GDA0003498659920000026
Figure RE-GDA0003498659920000027
in the formula (I), the compound is shown in the specification,
Figure RE-GDA0003498659920000028
for the ith known reference status point data, <' >>
Figure RE-GDA0003498659920000029
The ith known trend vector is represented by a subscript h in the formula as known historical data in the collected samples; and thus construct a state distance SD for measuring the similarity of state units i
Figure RE-GDA00034986599200000210
Wherein u =1,2, …, M and M are historical sample numbers, α is a coefficient for balancing euclidean distance and cosine distance, and the value range is [0,1];
(2) Select K neighbors according to the similarity measurement results
Compute the state distance between the sample to be predicted and all known historical samples, and take the K historical samples V_{h,1}, V_{h,2}, …, V_{h,K} with the smallest distances as neighbors;
(3) Compute the predicted value of the sample to be predicted
The predicted value $\hat{y}_t$ is computed by incremental prediction, the neighbors' label values being Gaussian-weighted according to the magnitude of the state distance SD_i.
For X_t, its future state point X_{t+1} at time (t+1) is recorded as y_t.
The increment is the difference between a neighbor's y_{h,j} and V_{h,j}; for the j-th neighbor (j = 1, 2, …, K):

$$\Delta y_{h,j} = y_{h,j} - V_{h,j} \tag{6}$$

i.e., the traffic state variation within the prediction window;
Second, the predicted value $\hat{y}_t$ is computed by Gaussian weighting as

$$\hat{y}_t = V_t + \sum_{j=1}^{K} w_j\, \Delta y_{h,j} \tag{7}$$

where the weights are

$$w_j = \frac{\exp\left(-SD_j^2\right)}{\sum_{k=1}^{K} \exp\left(-SD_k^2\right)}$$
Step two, the construction of the KNN-based static prediction model, further comprises:
(4) Coarse calibration of model parameters
Calibrate the undetermined parameters δ, K, and α in the KNN-based static prediction model by a grid search experiment on the collected real data, specifically as follows:
First, establish an evaluation system for the prediction effect, comprising the mean absolute error MAE and the mean absolute percentage error MAPE:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n}\sum_{l=1}^{n} \left| \hat{y}_{i,l} - y_{i,l} \right| \tag{8}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n}\sum_{l=1}^{n} \left| \frac{\hat{y}_{i,l} - y_{i,l}}{y_{i,l}} \right| \times 100\% \tag{9}$$

where n is the number of road sections, N is the number of samples to be predicted in the experiment, and $\hat{y}_{i,l}$ and $y_{i,l}$ are the predicted and true values of the i-th sample to be predicted on road section l;
Second, discretize the value ranges of the parameters to be calibrated, run experiments with the different parameter combinations one by one, and record the experimental results;
Finally, select the parameter combination with the best experimental result as the calibration value of the model parameters.
The method also comprises step three: dynamically optimizing the parameter α based on the DDPG algorithm, specifically comprising:
making the following definitions:
State S_t: comprises the observed external road network traffic state V_t and the prediction model's own state P_t, i.e., S_t = {V_t, P_t}, where V_t is the state unit observed at time t and P_t is the residual of the most recent prediction known to the prediction model at time t;
Action a_t: the value of the parameter α selected in the decision; unlike in the coarse calibration of the KNN model parameters, α here takes continuous values in [0,1] and is not discretized.
Immediate reward r_t: with the help of the coarsely calibrated static KNN prediction model, the average index improvement rate after executing action a_t is defined as the reward function. If the indices obtained by executing action a_t are all smaller than those obtained by the coarse-calibration model, taking action a_t is called an effective optimization; otherwise, taking action a_t is called an invalid optimization; if an index obtained by taking action a_t is larger than that obtained by the coarse-calibration model but by no more than 1% of its value, the action is called not completely effective. Accordingly, the reward function is defined as:

$$r_t = \begin{cases} \rho_t, & a_t \text{ effective or invalid} \\ 0, & a_t \text{ not completely effective} \end{cases}, \qquad \rho_t = \frac{1}{2}\left( \frac{\mathrm{MAE}_t - \mathrm{MAE}'_t}{\mathrm{MAE}_t} + \frac{\mathrm{MAPE}_t - \mathrm{MAPE}'_t}{\mathrm{MAPE}_t} \right) \tag{10}$$

where MAE_t and MAE′_t are the mean absolute errors obtained on state unit X_t by the static coarse-calibration model and by the model with action a_t selected, respectively, and likewise for the mean absolute percentage errors MAPE_t and MAPE′_t. When a_t is effective, r_t is positive; when a_t is invalid, r_t is negative; when a_t is not completely effective, r_t is 0.
The DDPG algorithm training process comprises the following steps:
(1) Parameter initialization
DDPG adopts the Actor-Critic architecture: the Actor is responsible for outputting actions, interacting with the environment, and learning the policy, while the Critic is responsible for evaluating actions and improving the policy; specifically, the functions of the Actor and the Critic are realized by neural networks;
the original Actor and Critic networks are regarded as estimation networks (online networks), and copies with the same network structure, called target networks, are set up; the estimation networks update their parameter values at every interaction with the environment, and the target networks copy the parameter values of the estimation networks at specified intervals;
first, the estimation network parameters of the Critic and the Actor are initialized and denoted θ^Q and θ^μ, where θ^Q is the Critic estimation network parameter and θ^μ is the Actor estimation network parameter, and they are copied to the target networks, i.e., θ^Q → θ^{Q′} and θ^μ → θ^{μ′};
(2) Experience collection
First, using the historical data and its time axis, randomly select a time point at which to start prediction;
denote the time step of a prediction by i, where i is an arbitrary sequence number; as time advances, record the road network traffic state V_i at the predicted time step; compute the residual of the most recently evaluable prediction result as the prediction model state P_i; the parameter α used in the KNN prediction model is the action value a_i decided by the agent at this time step; evaluate the prediction effect according to the reward function to obtain r_i; observe the state S_{i+1} at the next time step, i.e., at time i+1; store (S_i, a_i, r_i, S_{i+1}) as one record in the memory pool for later use;
repeat this prediction process continuously along the time axis until T predictions are completed, recorded as one training round;
(3) Experience replay
Given a threshold on the number of sample records in the memory pool, when the number of records exceeds the threshold, randomly sample a batch from the memory pool to train the estimation networks, updating the estimation network parameters by gradient computation and back propagation; denoting by ∇_θ the gradient with respect to a network parameter θ, the updates are:

$$\theta^{Q} \leftarrow \theta^{Q} - lr_{Q}\,\nabla_{\theta^{Q}} L, \qquad \theta^{\mu} \leftarrow \theta^{\mu} + lr_{\mu}\,\nabla_{\theta^{\mu}} J \tag{11}$$

where ∇_{θ^Q} L and ∇_{θ^μ} J are the gradients with respect to the network parameters θ^Q and θ^μ, and lr_Q, lr_μ are the learning rates;
at specified time-step intervals, the target network parameters are soft-updated, specifically:

$$\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'} \tag{12}$$

(4) Training ends when the specified number of rounds is reached.
The deep neural network design in the DDPG algorithm is as follows:
(1) The Actor network has two input interfaces, used respectively to input the road network traffic state V_t and the prediction model state P_t; in addition to the road network traffic state and the prediction model state, the Critic network also takes an action value as input, namely the output value of the Actor network;
(2) The output value of the Actor network is the action value, namely the parameter α in the prediction model, with output dimension 1; considering the value range of α, the output layer uses a sigmoid activation function; the output value of the Critic network is the Q value, with output dimension 1; since its value range is not clearly bounded, the output layer uses a linear activation function;
(3) The Critic network updates its parameters by minimizing a loss function, i.e., by gradient descent; the loss function takes the mean-squared-error form, specifically

$$L = \frac{1}{m}\sum_{i=1}^{m}\left( q_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2 \tag{13}$$

where Q(s_i, a_i | θ^Q) is the output value of the estimation (online) Q network, m is the batch size, and q_i is the target value, computed from the state transition as

$$q_i = r_i + \gamma\, Q'\!\left( s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'} \right) \tag{14}$$

where Q′ is the target Q network and μ′ is the target policy network;
the update of the Actor network is based on computing the policy gradient, i.e.

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{m}\sum_{i=1}^{m} \nabla_{a} Q(s, a \mid \theta^{Q})\Big|_{s=s_i,\, a=\mu(s_i)}\; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\Big|_{s=s_i} \tag{15}$$

where ∇ denotes the gradient operator, $\nabla_{\theta^\mu} J$ the policy gradient, and Q(s, a | θ^Q) the online Q function with network parameter θ^Q, evaluated at s = s_i, a = μ(s_i).
The method also comprises step four: verification by prediction experiment.
During training, the effectiveness of the model being trained is monitored by observing single-prediction rewards or round cumulative rewards; after training ends, the relative quality of the model can be directly verified by computing and comparing the two evaluation indices.
Compared with the prior art, the advantages of the invention are:
(1) The KNN prediction model incorporates a vector-form short-term traffic state representation and a state distance metric fusing the Euclidean and cosine distances, giving the model greater flexibility in handling scenes of rapid and gradual traffic state change and of conventional and unconventional traffic evolution.
(2) Through dynamic optimization by the DDPG algorithm, the short-term traffic state prediction model changes from simple static prediction over time into one that perceives traffic state evolution and dynamically adjusts its predictions, further improving prediction accuracy.
Drawings
Fig. 1 shows the model construction flow of the dynamic short-time road network traffic state prediction model and prediction method of the present invention.
FIG. 2 shows the main model structure formed by the Agent and the Actor-Critic architecture in the DDPG algorithm.
FIG. 3 shows the change of the round cumulative reward during training of the DDPG algorithm of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
As shown in FIG. 1, the invention provides a dynamic short-term road network traffic state prediction model optimized based on the deep deterministic policy gradient algorithm, and a method for predicting the dynamic short-term road network traffic state using the model. The model mainly comprises a KNN-based static prediction model and a dynamic optimization part based on the DDPG algorithm. The prediction method is implemented in four steps: data collection and processing, construction of the KNN-based static prediction model, construction and training of the DDPG-based dynamic optimization part, and verification by prediction experiment, specifically:
the method comprises the following steps: data collection and processing
The invention uses floating car data; in this example the data originate from taxi-mounted GPS devices collected in the Beijing Workers' Stadium area in July 2015, which upload time, vehicle position, and instantaneous speed information to a superior system at 2-minute intervals. For a given known road network, the collected data can be used to characterize the traffic state of the road network at each moment. Taking the vehicle speed value as an example, for a moment t, the average speed value v(t) of vehicles on a road section l can be calculated; on the road network formed by sections l_1, l_2, …, l_n (n denotes the number of sections; n = 257 in this embodiment), the corresponding set of average speed values is V_t = (v_1(t), v_2(t), …, v_n(t)), and this set of speed values V_t is called the road network traffic state at time t. Because short-term traffic state changes are continuous, the trend of the change is often hard to express with the observation at a single moment, so the observations at the δ moments up to time t are aggregated into a spatio-temporal matrix to express the traffic state at time t, as shown in formula (1):

$$X_t = \begin{pmatrix} v_1(t-\delta+1) & v_2(t-\delta+1) & \cdots & v_n(t-\delta+1) \\ \vdots & \vdots & & \vdots \\ v_1(t) & v_2(t) & \cdots & v_n(t) \end{pmatrix} \tag{1}$$

where X_t is the spatio-temporal matrix aggregating the observations at the δ moments up to time t, v(t) is the average speed value on road section l at time t, and n denotes the number of road sections.
To distinguish the above concepts, V_t is called the traffic state point at time t, and X_t the traffic state sequence at time t. For X_t, the future state point at time (t+1) is recorded as y_t, and (X_t, y_t) is a sample pair. Short-term traffic state prediction in the present invention uses X_t to predict y_t.
In research, the spatio-temporal matrix form fully represents the traffic state of the road network at each time step, but it has the following problems. On the one hand, the state evolution over time is not prominent enough, and the multi-dimensional time-series arrangement may blur some information and cause misjudgment; for example, in four-dimensional space the Euclidean distances from the origin to (1,2,3,4) and to (4,3,2,1) are the same. On the other hand, by aggregating data from multiple time steps, the spatio-temporal matrix form multiplies the overall data dimension, which consumes more resources in computation and may even induce the curse of dimensionality, reducing result accuracy.
Therefore, the present invention further processes X_t, using instead the trend vector $\vec{V}_t$ leading to the single reference state point V_t. This expression defines the traffic state in vector form and concisely depicts the evolution of the traffic state together with its evolution result; it is called a traffic state unit, i.e.

$$X'_t = \left( V_t,\ \vec{V}_t \right), \qquad \vec{V}_t = V_t - V_{t-\delta+1} \tag{2}$$
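To make the data structures above concrete, the following sketch (not part of the original disclosure) builds the spatio-temporal matrix and the traffic state unit per formulas (1)-(2); the function and variable names, and the toy data, are illustrative assumptions.

```python
import numpy as np

def build_state_unit(speeds, t, delta):
    """Build the traffic state unit X'_t = (V_t, trend vector) from a
    (time x road-section) matrix of average speeds, per formulas (1)-(2).

    speeds : ndarray of shape (T, n), speeds[k, l] = average speed on section l at time k
    t      : current time index (delta-1 <= t < T)
    delta  : number of aggregated time steps
    """
    X_t = speeds[t - delta + 1 : t + 1, :]   # spatio-temporal matrix, shape (delta, n)
    V_t = X_t[-1]                            # reference state point at time t
    trend = V_t - X_t[0]                     # trend vector V_t - V_{t-delta+1}
    return V_t, trend

# toy usage: 257 road sections, speeds sampled every 2 minutes
rng = np.random.default_rng(0)
speeds = rng.uniform(10, 60, size=(100, 257))
V_t, trend = build_state_unit(speeds, t=50, delta=6)
print(V_t.shape, trend.shape)  # (257,) (257,)
```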
Step two: construction of the KNN-based static prediction model
The KNN-based static short-term prediction model selects the K most similar samples, called neighbors, by computing the similarity between the sample to be predicted and the known samples; the label of the sample to be predicted is then reasonably inferred from the neighbors' labels, which completes the prediction. The construction steps are as follows:
(1) Define the similarity metric function
For two samples, their similarity is often measured by computing their distance: the smaller the distance between them, the higher their similarity; conversely, the lower the similarity. For the traffic state unit, which is formed from two data forms, the invention defines a method fusing two distance metrics: the Euclidean distance is used to measure the distance ED_i between reference state points, and the cosine distance to measure the distance CD_i between trend vectors, i.e.

$$ED_i = \left\| V_t - V_{h,i} \right\|_2 = \sqrt{\sum_{u=1}^{n} \left( v_u(t) - v_{h,i,u} \right)^2} \tag{3}$$

$$CD_i = 1 - \frac{\vec{V}_t \cdot \vec{V}_{h,i}}{\left\| \vec{V}_t \right\| \left\| \vec{V}_{h,i} \right\|} \tag{4}$$

where V_{h,i} is the i-th known reference state point and $\vec{V}_{h,i}$ the i-th known trend vector; the subscript h denotes known historical data among the collected samples. On this basis, the state distance SD_i for measuring the similarity of state units is constructed:

$$SD_i = \alpha\, ED_i + (1-\alpha)\, CD_i, \qquad i = 1, 2, \ldots, M \tag{5}$$

where M is the number of historical samples and α is a coefficient balancing the Euclidean distance and the cosine distance, with value range [0,1].
From equation (5) and the above definitions, the value of α determines how the state distance leans toward the Euclidean and cosine distances. When α → 0, SD_i ≈ CD_i, and the evolution trend of the state is the decisive factor in the similarity measurement of state units; this usually suits situations where the traffic evolution trend has very distinctive characteristics, such as a short-term abrupt change or a nearly constant traffic state. When α → 1, SD_i ≈ ED_i, and whether state units are similar depends more on the reference state point of the final result, the evolution process mattering less; this can generally be understood as situations where the traffic evolution trend is more conventional.
(2) Select K neighbors according to the similarity measurement results.
Compute the state distance between the sample to be predicted and all known historical samples, and take the K historical samples V_{h,1}, V_{h,2}, …, V_{h,K} with the smallest distances as neighbors.
(3) Compute the predicted value of the sample to be predicted.
The invention computes the predicted value by incremental prediction, applying Gaussian weighting to the neighbors' label values according to their distances.
First, the increment is defined as the difference between a neighbor's y_{h,j} and V_{h,j}; for the j-th neighbor (with reordered subscript j = 1, 2, …, K),

$$\Delta y_{h,j} = y_{h,j} - V_{h,j} \tag{6}$$

i.e., the traffic state variation within the prediction window.
Second, the predicted value is computed as

$$\hat{y}_t = V_t + \sum_{j=1}^{K} w_j\, \Delta y_{h,j} \tag{7}$$

where, in view of the magnitude of the state distance values and for simplicity, the weights can be set as

$$w_j = \frac{\exp\left(-SD_j^2\right)}{\sum_{k=1}^{K} \exp\left(-SD_k^2\right)}$$
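A minimal sketch of the similarity measurement and the Gaussian-weighted incremental prediction of formulas (3)-(7) follows; V_hist, trend_hist, and y_hist are assumed arrays stacking the M historical reference points, trend vectors, and future state points row by row, and the normalized Gaussian weight is one plausible reading of the weighting described above.

```python
import numpy as np

def state_distance(V, trend, V_hist, trend_hist, alpha):
    """State distance SD_i = alpha*ED_i + (1-alpha)*CD_i, formulas (3)-(5)."""
    ed = np.linalg.norm(V_hist - V, axis=1)                        # Euclidean distances
    cos = (trend_hist @ trend) / (
        np.linalg.norm(trend_hist, axis=1) * np.linalg.norm(trend) + 1e-12)
    cd = 1.0 - cos                                                 # cosine distances
    return alpha * ed + (1.0 - alpha) * cd

def knn_predict(V, trend, V_hist, trend_hist, y_hist, K, alpha):
    """Gaussian-weighted incremental KNN prediction, formulas (6)-(7)."""
    sd = state_distance(V, trend, V_hist, trend_hist, alpha)
    idx = np.argsort(sd)[:K]                  # K nearest historical samples
    inc = y_hist[idx] - V_hist[idx]           # increments delta_y_{h,j}
    w = np.exp(-sd[idx] ** 2)
    w = w / w.sum()                           # normalized Gaussian weights
    return V + w @ inc                        # predicted state at t+1
```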
(4) Coarse calibration of model parameters.
To realize the prediction function, the undetermined parameters in the prediction model, including δ, K, and α, are calibrated by a grid search experiment using real data. In this example the K search interval is 5-120 with a step of 5, and the α search interval is 0-1 with a step of 0.1.
First, an evaluation system for the prediction effect should be established, comprising the mean absolute error (MAE) and the mean absolute percentage error (MAPE), i.e.

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n}\sum_{l=1}^{n} \left| \hat{y}_{i,l} - y_{i,l} \right| \tag{8}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n}\sum_{l=1}^{n} \left| \frac{\hat{y}_{i,l} - y_{i,l}}{y_{i,l}} \right| \times 100\% \tag{9}$$

where n is the number of road sections, N is the number of samples to be predicted in the experiment, and $\hat{y}_{i,l}$ and $y_{i,l}$ are the predicted and true values of the i-th sample to be predicted on road section l.
Second, discretize the value ranges of the parameters to be calibrated, run experiments with the different parameter combinations one by one, and record the experimental results.
Finally, select the parameter combination with the best experimental result as the calibration value of the model parameters. The calibration values at different prediction steps in this example are as follows:
[Table: calibrated values of δ, K and α for each prediction step; reproduced only as an image in the original document.]
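The coarse calibration can be sketched as a plain grid search over (δ, K, α); the K and α grids follow the intervals stated in this example, while the δ range and the evaluate_mae helper (which would run the KNN model over validation samples and return formula (8)) are assumptions for illustration.

```python
import itertools
import numpy as np

def grid_search_calibration(evaluate_mae):
    """Coarse calibration of (delta, K, alpha) by grid search.

    evaluate_mae(delta, K, alpha) -> MAE over the validation samples
    (hypothetical helper wrapping the KNN prediction above).
    """
    deltas = range(2, 11)                            # assumed search range for delta
    Ks = range(5, 121, 5)                            # K: 5..120, step 5 (as in this example)
    alphas = np.round(np.arange(0.0, 1.01, 0.1), 1)  # alpha: 0..1, step 0.1

    best, best_mae = None, np.inf
    for d, K, a in itertools.product(deltas, Ks, alphas):
        mae = evaluate_mae(d, K, a)
        if mae < best_mae:                           # keep the best-performing combination
            best, best_mae = (d, K, a), mae
    return best, best_mae
```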
step three: and constructing and training a dynamic optimization part based on the DDPG algorithm.
Through the second step, the prediction model based on the KNN can already complete the expected prediction task, but from the aspect of a parameter calibration method, the model can only carry out static prediction, and the influence caused by objective change and change of short-time traffic flow is ignored by the static model, which reflects that parameters K and alpha which are closely related to a time-varying traffic state in the model are not changed correspondingly any more in the prediction process. Therefore, the invention is further based on the dynamic optimization part construction and training of the DDPG algorithm, the method protected by the invention is not limited to the method adopted by the embodiment, and a person skilled in the art can adopt other feasible methods to carry out the dynamic optimization part construction and training, but the method adopted by the embodiment is a reasonable optimization method at present, and the method of the embodiment is specifically described below.
Calibration experiments prove that the sensitivity of the K value to the influence of the result is effectively inhibited by the Gaussian weighting method, namely the prediction error can be almost always gradually reduced along with the increase of the K. In other words, if it is desired to achieve the purpose of improving accuracy by dynamically adjusting the parameters of the prediction model, the parameter K only needs to be selected with a larger value. However, for the parameter α, the calibration method is contrary to the original purpose of setting the parameter α, that is, the model is compatible with regular and irregular changes of the short-term traffic flow by adjusting the parameter α, and the general calibration method does not have strong flexibility and adaptability.
Therefore, the invention provides a method for dynamically optimizing the parameter alpha through a deep reinforcement learning algorithm. Reinforcement learning is a special machine learning algorithm, which takes a Markov Decision Process (MDP) as a basic modeling idea and mainly includes elements such as a state S, an action a, and a reward r. An Agent constructed by a reinforcement learning algorithm carries out a series of interactions with the environment to complete a sequential decision process, and through continuous self-learning, the Agent can carry out actions capable of maximizing rewards in the face of different states in the environment. The Deep Deterministic Policy Gradient (DDPG) algorithm is a reinforcement learning algorithm combining Deep learning and an Actor-critical architecture, has the advantages of being capable of dealing with continuous state space and continuous action space, and more suitable for the reality problem of outputting high-dimensional continuous actions, so that the dynamic adaptability of the model to a complex environment is improved, and dynamic continuous decision is realized. The method specifically comprises the following steps:
1. Define the state S, action a, and reward r.
First, the problem definition is made explicit: the dynamic optimization process is modeled as a Markov decision process. During rolling prediction, the choice of the parameter α for each prediction in the KNN-based prediction model is regarded as a Markov decision: the decided value does not depend on the values decided in past predictions, but only on the currently observed road network traffic state and the prediction effect of the prediction model, satisfying the Markov property.
From the above modeling, the following definitions can be made:
State S_t: comprises the observed external road network traffic state V_t and the prediction model's own state P_t, i.e., S_t = {V_t, P_t}, where V_t is the state unit observed at time t and P_t is the residual of the most recent prediction known to the prediction model at time t.
Action a_t: the value of the parameter α selected in the decision. Unlike in the coarse calibration of the KNN model parameters, α here takes continuous values in [0,1] and is not discretized.
(Immediate) reward r_t: with the help of the coarsely calibrated static KNN prediction model, the average index improvement rate after executing action a_t is defined as the reward function. Before giving the function, rules for qualitatively evaluating the prediction effect are defined: if the indices obtained by executing action a_t are all smaller than those obtained by the coarse-calibration model, taking action a_t is called an effective optimization; otherwise, taking action a_t is called an invalid optimization. To speed up algorithm convergence, the invention chooses to tolerate small degradations: if an index obtained by taking action a_t is larger than that obtained by the coarse-calibration model but by no more than 1% of its value, the action is called not completely effective. Accordingly, the reward function is defined as:

$$r_t = \begin{cases} \rho_t, & a_t \text{ effective or invalid} \\ 0, & a_t \text{ not completely effective} \end{cases}, \qquad \rho_t = \frac{1}{2}\left( \frac{\mathrm{MAE}_t - \mathrm{MAE}'_t}{\mathrm{MAE}_t} + \frac{\mathrm{MAPE}_t - \mathrm{MAPE}'_t}{\mathrm{MAPE}_t} \right) \tag{10}$$

where MAE_t and MAE′_t are the mean absolute errors obtained on state unit X_t by the static coarse-calibration model and by the model with action a_t selected, respectively, and likewise for the mean absolute percentage errors MAPE_t and MAPE′_t. When a_t is effective, r_t is positive; when a_t is invalid, r_t is negative; when a_t is not completely effective, r_t is 0.
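Under the qualitative rules above, the reward of formula (10) can be sketched as follows, with the 1% tolerance band and the zero reward for not-completely-effective actions made explicit; the function signature is an assumption.

```python
def reward(mae_base, mape_base, mae_act, mape_act, tol=0.01):
    """Immediate reward r_t per formula (10).

    mae_base, mape_base : indices of the coarse-calibration model on X_t
    mae_act, mape_act   : indices of the model using action a_t
    """
    rho = 0.5 * ((mae_base - mae_act) / mae_base +
                 (mape_base - mape_act) / mape_base)   # average improvement rate
    effective = mae_act < mae_base and mape_act < mape_base
    tolerated = (mae_act <= mae_base * (1 + tol) and
                 mape_act <= mape_base * (1 + tol))
    if effective:
        return rho      # positive reward for effective optimization
    if tolerated:
        return 0.0      # not completely effective: within the 1% band
    return rho          # negative reward for invalid optimization
```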
2. Design the DDPG algorithm training flow.
Combining the application scenario of short-term traffic state prediction with the classic DDPG algorithm, as shown in FIG. 2, the invention proposes the following training flow:
(1) Parameter initialization.
The agent in DDPG adopts the Actor-Critic architecture, integrating the value-based and policy-based methods. The Actor is responsible for outputting actions, interacting with the environment, and learning the policy; the Critic is responsible for evaluating actions and improving the policy. Specifically, the functions of the Actor and the Critic are realized by neural networks. To improve the stability of the algorithm, the original Actor and Critic networks are regarded as estimation networks (online networks), and copies with the same network structure, called target networks, are set up; the estimation networks update their parameter values at every interaction with the environment, and the target networks copy the parameter values of the estimation networks at specified intervals.
Therefore, the estimation network parameters of the Critic and the Actor are initialized, denoted θ^Q and θ^μ, and then copied to the target networks, i.e., θ^Q → θ^{Q′} and θ^μ → θ^{μ′}.
(2) Experience collection.
First, using the historical data and its time axis, a time point is randomly selected at which to start prediction.
Denoting the predicted time step by i, the traffic state V_i of the predicted time step is recorded as time advances; the residual of the most recently evaluable prediction result is computed as the prediction model state P_i; the parameter α used in the KNN prediction model is the action value a_i decided by the agent at this time step; the prediction effect is evaluated according to the reward function to obtain r_i; in addition, the state S_{i+1} is observed at the next time step, i.e., at time i+1. (S_i, a_i, r_i, S_{i+1}) is stored as one record in the memory pool for later use.
This prediction process is repeated continuously along the time axis until T predictions are completed, recorded as one training round. In this example, T = 100.
(3) Experience replay.
As training progresses, ever more experience accumulates in the memory pool. Given a threshold Δ, when the number of sample records in the memory pool exceeds Δ, a batch is randomly sampled from the memory pool to train the estimation networks, whose parameters are updated by gradient computation and back propagation; denoting by ∇_θ the gradient with respect to a network parameter θ, the update formulas are:

$$\theta^{Q} \leftarrow \theta^{Q} - lr_{Q}\,\nabla_{\theta^{Q}} L, \qquad \theta^{\mu} \leftarrow \theta^{\mu} + lr_{\mu}\,\nabla_{\theta^{\mu}} J \tag{11}$$

At specified time-step intervals, the target network parameters are updated; the update method is a soft update, specifically:

$$\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'} \tag{12}$$

(4) Training ends when the specified number of rounds M is reached. In this example, M = 4000.
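The experience-collection and experience-replay loop of one training round might look like the following skeleton; agent, env, the batch size, and the replay threshold are assumptions standing in for the Actor-Critic agent, the rolling-prediction environment, and unspecified hyperparameters.

```python
import random
from collections import deque

memory = deque(maxlen=100_000)   # memory pool
T = 100                          # predictions per round (as in this example)
BATCH, THRESHOLD = 64, 1_000     # assumed batch size and replay threshold

def run_round(env, agent):
    """One training round: T rolling predictions with replay-based learning."""
    s = env.reset()                            # random start point on the time axis
    for _ in range(T):
        a = agent.act(s)                       # alpha in [0,1], continuous action
        s_next, r = env.step(a)                # predict, then score with formula (10)
        memory.append((s, a, r, s_next))       # store one experience record
        if len(memory) > THRESHOLD:
            batch = random.sample(list(memory), BATCH)
            agent.learn(batch)                 # updates (11) and soft updates (12)
        s = s_next
```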
3. Deep neural network design in the DDPG algorithm.
The estimation networks of the Actor and the Critic, together with the deep neural networks formed by the target networks, are the key parts of the learned policy in the DDPG algorithm. Since a target network is a copy of the corresponding estimation network, i.e., its structure is identical, only the estimation networks of the Actor and the Critic need to be designed, as follows. Although automated machine learning (AutoML), and in particular neural architecture search (NAS), has recently received extensive attention and research and offers the possibility of automated design of deep neural networks, certain manual rules and constraints should still be imposed on the specific functions and structures of the networks during design. Considering the purpose and scenario of the Actor and Critic networks, i.e., the network inputs and outputs, the error functions, and the like, the invention imposes the following design requirements:
(1) The Actor network has two input interfaces, used respectively to input the road network traffic state V_t and the prediction model state P_t. In addition to the road network traffic state and the prediction model state, the Critic network also takes an action value as input, namely the output value of the Actor network.
(2) The output value of the Actor network is the action value, namely the parameter α in the prediction model, so the output dimension is 1; considering the value range of α, the output layer uses a sigmoid activation function. The output value of the Critic network is the Q value, with output dimension 1; since its value range is not clearly bounded, the output layer uses a linear activation function.
(3) The Critic network updates its parameters by minimizing a loss function, i.e., by gradient descent; the loss function takes the mean-squared-error form, specifically

$$L = \frac{1}{m}\sum_{i=1}^{m}\left( q_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2 \tag{13}$$

where Q(s_i, a_i | θ^Q) is the output value of the estimation (online) Q network, m is the batch size, and q_i is the target value, computed from the state transition as

$$q_i = r_i + \gamma\, Q'\!\left( s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'} \right) \tag{14}$$

The update of the Actor network is based on computing the policy gradient, i.e.

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{m}\sum_{i=1}^{m} \nabla_{a} Q(s, a \mid \theta^{Q})\Big|_{s=s_i,\, a=\mu(s_i)}\; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\Big|_{s=s_i} \tag{15}$$
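A compact PyTorch sketch consistent with requirements (1)-(3) and formulas (13)-(15) follows; the hidden layer sizes and discount factor are illustrative assumptions, and the two Actor inputs V_t and P_t are assumed to be concatenated into a single state tensor. In practice the two losses would be minimized with separate optimizers, followed by the soft updates of formula (12).

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())    # alpha in [0,1], output dim 1

    def forward(self, s):                          # s concatenates V_t and P_t
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))                  # Q value, linear output layer

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def ddpg_losses(actor, critic, actor_t, critic_t, batch, gamma=0.99):
    """Critic MSE loss (13)-(14) and Actor policy-gradient loss (15)."""
    s, a, r, s2 = batch                            # tensors sampled from the memory pool
    with torch.no_grad():
        q_target = r + gamma * critic_t(s2, actor_t(s2))          # formula (14)
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)  # formula (13)
    actor_loss = -critic(s, actor(s)).mean()       # ascend Q along the policy, formula (15)
    return critic_loss, actor_loss
```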
Step four: and (5) verifying by a prediction experiment.
It is crucial to verify that the training and the derived model are valid. The root mean square error MAE and the absolute value percentage error MAPE form a relatively comprehensive evaluation index system, and the reward function is defined as the average lifting percentage of the DDPG optimization on the KNN prediction model in the two indexes. Thus, in training, the effectiveness of the model in training can be monitored by observing word prediction rewards or turn accumulation rewards; after the training is finished, the relative quality of the model can be directly verified by calculating and comparing the two evaluation indexes.
This example is illustrated with the monitored running total and rewards as shown in FIG. 3. It can be seen from the black trend line in the figure that, when training is started, the intelligent agent obtains a low reward, even a negative value, and the reward is positive after several rounds, although a round with a negative value still appears, the reward value can be completely guaranteed to be positive as the training is gradually promoted, and the value of the trend line is about 100.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; all simple modifications, changes, and equivalent structural changes made to the above embodiment according to the technical essence of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (6)

1. A dynamic short-term road network traffic state prediction method optimized based on the deep deterministic policy gradient algorithm, characterized by comprising the following steps:
Step one: data collection and processing
(1) Collect the time, position, and instantaneous vehicle speed information uploaded to a superior system by vehicle-mounted GPS devices at specified time intervals;
(2) Calculate the average vehicle speed v(t) on a road section l at time t, the averages at time t and before time t being known;
(3) By the above method, obtain for the road section network formed by sections l_1, l_2, …, l_n the corresponding set of average speed values V_t = (v_1(t), v_2(t), …, v_n(t)), n being the number of road sections;
(4) Aggregate the collected data at time t and the δ−1 times before it into a spatio-temporal matrix X_t representing the traffic state at time t, as shown in formula (1):

$$X_t = \begin{pmatrix} v_1(t-\delta+1) & v_2(t-\delta+1) & \cdots & v_n(t-\delta+1) \\ \vdots & \vdots & & \vdots \\ v_1(t) & v_2(t) & \cdots & v_n(t) \end{pmatrix} \tag{1}$$

(5) Process X_t: for each single reference state point, compute its trend vector $\vec{V}_t$, and define the traffic state unit X′_t in vector form, i.e.

$$X'_t = \left( V_t,\ \vec{V}_t \right), \qquad \vec{V}_t = V_t - V_{t-\delta+1} \tag{2}$$
Step two: construction of KNN-based static prediction model
(1) Respectively adopting Euclidean distance to reference state point V t The distance ED between i Performing a metric, and using the cosine distance to the trend vector
Figure RE-FDA0003498659910000015
Distance between CD i Performing measurement, wherein the expression is as follows:
Figure RE-FDA0003498659910000016
Figure RE-FDA0003498659910000017
in the formula (I), the compound is shown in the specification,
Figure RE-FDA0003498659910000018
for the ith known reference status point data,/>>
Figure RE-FDA0003498659910000019
The ith known trend vector is represented by a subscript h in the formula as known historical data in the collected samples; and thus construct a state distance SD for measuring the similarity of state units i
Figure RE-FDA00034986599100000110
Wherein u =1,2, …, M and M are historical sample numbers, α is a coefficient for balancing euclidean distance and cosine distance, and the value range is [0,1];
(2) Select K neighbors according to the similarity measurement results
Compute the state distance between the sample to be predicted and all known historical samples, and take the K historical samples V_{h,1}, V_{h,2}, …, V_{h,K} with the smallest distances as neighbors;
(3) Compute the predicted value of the sample to be predicted
The predicted value $\hat{y}_t$ is computed by incremental prediction, the neighbors' label values being Gaussian-weighted according to the magnitude of the state distance SD_i;
For X_t, its future state point X_{t+1} at time (t+1) is recorded as y_t;
The increment is the difference between a neighbor's y_{h,j} and V_{h,j}; for the j-th neighbor (j = 1, 2, …, K):

$$\Delta y_{h,j} = y_{h,j} - V_{h,j} \tag{6}$$

i.e., the traffic state variation within the prediction window;
Second, the predicted value $\hat{y}_t$ at future time (t+1) is computed by Gaussian weighting as

$$\hat{y}_t = V_t + \sum_{j=1}^{K} w_j\, \Delta y_{h,j} \tag{7}$$

where the weights are

$$w_j = \frac{\exp\left(-SD_j^2\right)}{\sum_{k=1}^{K} \exp\left(-SD_k^2\right)}$$
2. The dynamic short-term road network traffic state prediction method optimized based on the deep deterministic policy gradient algorithm of claim 1, wherein
step two, the construction of the KNN-based static prediction model, further comprises:
(4) Coarse calibration of model parameters
Calibrate the undetermined parameters δ, K, and α in the KNN-based static prediction model by a grid search experiment on the collected real data, specifically as follows:
First, establish an evaluation system for the prediction effect, comprising the mean absolute error MAE and the mean absolute percentage error MAPE:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n}\sum_{l=1}^{n} \left| \hat{y}_{i,l} - y_{i,l} \right| \tag{8}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n}\sum_{l=1}^{n} \left| \frac{\hat{y}_{i,l} - y_{i,l}}{y_{i,l}} \right| \times 100\% \tag{9}$$

where n is the number of road sections, N is the number of samples to be predicted in the experiment, and $\hat{y}_{i,l}$ and $y_{i,l}$ are the predicted and true values of the i-th sample to be predicted on road section l;
Second, discretize the value ranges of the parameters to be calibrated, run experiments with the different parameter combinations one by one, and record the experimental results;
Finally, select the parameter combination with the best experimental result as the calibration value of the model parameters.
3. The dynamic short-term road network traffic state prediction method optimized based on the deep deterministic policy gradient algorithm of claim 2, further comprising
step three: dynamically optimizing the parameter α based on the DDPG algorithm, specifically comprising:
making the following definitions:
State S_t: comprises the observed external road network traffic state V_t and the prediction model's own state P_t, i.e., S_t = {V_t, P_t}, where V_t is the state unit observed at time t and P_t is the residual of the most recent prediction known to the prediction model at time t;
Action a_t: the value of the parameter α selected in the decision; unlike in the coarse calibration of the KNN model parameters, α takes continuous values in [0,1] and is not discretized;
Immediate reward r_t: with the help of the coarsely calibrated static KNN prediction model, the average index improvement rate after executing action a_t is defined as the reward function; if the indices obtained by executing action a_t are all smaller than those obtained by the coarse-calibration model, taking action a_t is called an effective optimization; otherwise, taking action a_t is called an invalid optimization; if an index obtained by taking action a_t is larger than that obtained by the coarse-calibration model but by no more than 1% of its value, the action is called not completely effective; accordingly, the reward function r_t is defined as:

$$r_t = \begin{cases} \rho_t, & a_t \text{ effective or invalid} \\ 0, & a_t \text{ not completely effective} \end{cases}, \qquad \rho_t = \frac{1}{2}\left( \frac{\mathrm{MAE}_t - \mathrm{MAE}'_t}{\mathrm{MAE}_t} + \frac{\mathrm{MAPE}_t - \mathrm{MAPE}'_t}{\mathrm{MAPE}_t} \right) \tag{10}$$

where MAE_t and MAE′_t are respectively the mean absolute errors obtained on state unit X_t by the static coarse-calibration model and by the model with action a_t selected, and similarly MAPE_t and MAPE′_t are the mean absolute percentage errors obtained by the two models; when a_t is effective, r_t is positive; when a_t is invalid, r_t is negative; when a_t is not completely effective, r_t is 0.
4. The dynamic short-term road network traffic state prediction method optimized based on the deep deterministic policy gradient algorithm of claim 3, wherein
the DDPG algorithm training process comprises the following steps:
(1) Parameter initialization
DDPG adopts the Actor-Critic architecture: the Actor is responsible for outputting actions, interacting with the environment, and learning the policy; the Critic is responsible for evaluating actions and improving the policy; specifically, the functions of the Actor and the Critic are realized by neural networks;
the original Actor and Critic networks are regarded as estimation networks (online networks), and copies with the same network structure, called target networks, are set up; the estimation networks update their parameter values at every interaction with the environment, and the target networks copy the parameter values of the estimation networks at specified intervals;
first, the estimation network parameters of the Critic and the Actor are initialized and denoted θ^Q and θ^μ, where θ^Q is the Critic estimation network parameter and θ^μ is the Actor estimation network parameter, and they are copied to the target networks, i.e., θ^Q → θ^{Q′} and θ^μ → θ^{μ′};
(2) Experience collection
First, using the historical data and its time axis, randomly select a time point at which to start prediction;
denote the time step of a prediction by i, where i is an arbitrary sequence number; as time advances, record the road network traffic state V_i at the predicted time step; compute the residual of the most recently evaluable prediction result as the prediction model state P_i; the parameter α used in the KNN prediction model is the action value a_i decided by the agent at this time step; evaluate the prediction effect according to the reward function to obtain r_i; observe the state S_{i+1} at the next time step, i.e., at time i+1; store (S_i, a_i, r_i, S_{i+1}) as one record in the memory pool for later use;
repeat this prediction process continuously along the time axis until T predictions are completed, recorded as one training round;
(3) Experience replay
Given a threshold on the number of sample records in the memory pool, when the number of records exceeds the threshold, randomly sample a batch from the memory pool to train the estimation networks, updating the estimation network parameters by gradient computation and back propagation; denoting by ∇_θ the gradient with respect to a network parameter θ, the updates are:

$$\theta^{Q} \leftarrow \theta^{Q} - lr_{Q}\,\nabla_{\theta^{Q}} L, \qquad \theta^{\mu} \leftarrow \theta^{\mu} + lr_{\mu}\,\nabla_{\theta^{\mu}} J \tag{11}$$

where ∇_{θ^Q} L and ∇_{θ^μ} J are the gradients with respect to the network parameters θ^Q and θ^μ;
at specified time-step intervals, the target network parameters are soft-updated, specifically:

$$\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'} \tag{12}$$

(4) Training ends when the specified number of rounds is reached.
5. The dynamic short-term road network traffic state prediction method optimized based on the deep deterministic policy gradient algorithm of claim 4, wherein
the deep neural network design in the DDPG algorithm is as follows:
(1) The Actor network has two input interfaces, used respectively to input the road network traffic state V_t and the prediction model state P_t; in addition to the road network traffic state and the prediction model state, the Critic network also takes an action value as input, namely the output value of the Actor network;
(2) The output value of the Actor network is the action value, namely the parameter α in the prediction model, with output dimension 1; considering the value range of α, the output layer uses a sigmoid activation function; the output value of the Critic network is the Q value, with output dimension 1; since its value range is not clearly bounded, the output layer uses a linear activation function;
(3) The Critic network updates its parameters by minimizing a loss function, i.e., by gradient descent; the loss function takes the mean-squared-error form, specifically

$$L = \frac{1}{m}\sum_{i=1}^{m}\left( q_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2 \tag{13}$$

where Q(s_i, a_i | θ^Q) is the output value of the estimation (online) Q network, m is the batch size, and q_i is the target value, computed from the state transition as

$$q_i = r_i + \gamma\, Q'\!\left( s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'} \right) \tag{14}$$

where Q′ is the target Q network and μ′ is the target policy network;
the update of the Actor network is based on computing the policy gradient, i.e.

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{m}\sum_{i=1}^{m} \nabla_{a} Q(s, a \mid \theta^{Q})\Big|_{s=s_i,\, a=\mu(s_i)}\; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\Big|_{s=s_i} \tag{15}$$

where ∇ denotes the gradient operator, $\nabla_{\theta^\mu} J$ the policy gradient, and Q(s, a | θ^Q) the online Q function with network parameter θ^Q, evaluated at s = s_i, a = μ(s_i).
6. The dynamic short-term road network traffic state prediction method optimized based on the deep deterministic policy gradient algorithm of claim 5, further comprising
step four: verification by prediction experiment,
wherein during training the effectiveness of the model being trained is monitored by observing single-prediction rewards or round cumulative rewards, and after training ends the relative quality of the model is directly verified by computing and comparing the two evaluation indices.
CN202111115375.9A 2021-09-23 Dynamic short-time road network traffic state prediction model and prediction method Active CN115938104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111115375.9A CN115938104B (en) 2021-09-23 Dynamic short-time road network traffic state prediction model and prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111115375.9A CN115938104B (en) 2021-09-23 Dynamic short-time road network traffic state prediction model and prediction method

Publications (2)

Publication Number Publication Date
CN115938104A true CN115938104A (en) 2023-04-07
CN115938104B CN115938104B (en) 2024-06-28


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116362418A (en) * 2023-05-29 2023-06-30 天能电池集团股份有限公司 Online prediction method for application-level manufacturing capacity of intelligent factory of high-end battery
CN117974366A (en) * 2024-04-01 2024-05-03 深圳市普裕时代新能源科技有限公司 Energy management system based on industrial and commercial energy storage

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060223529A1 (en) * 2005-03-31 2006-10-05 Takayoshi Yokota Data processing apparatus for probe traffic information and data processing system and method for probe traffic information
JP2010092247A (en) * 2008-10-07 2010-04-22 Internatl Business Mach Corp <Ibm> Controller, control method and control program
CN109190797A (en) * 2018-08-03 2019-01-11 北京航空航天大学 A kind of large-scale road network state Forecasting Approach for Short-term based on improvement k arest neighbors
CA3060900A1 (en) * 2018-11-05 2020-05-05 Royal Bank Of Canada System and method for deep reinforcement learning
CN112907971A (en) * 2021-02-04 2021-06-04 南通大学 Urban road network short-term traffic flow prediction method based on genetic algorithm optimization space-time residual error model
CN113313947A (en) * 2021-05-31 2021-08-27 湖南大学 Road condition evaluation method of short-term traffic prediction graph convolution network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060223529A1 (en) * 2005-03-31 2006-10-05 Takayoshi Yokota Data processing apparatus for probe traffic information and data processing system and method for probe traffic information
JP2010092247A (en) * 2008-10-07 2010-04-22 Internatl Business Mach Corp <Ibm> Controller, control method and control program
CN109190797A (en) * 2018-08-03 2019-01-11 北京航空航天大学 A kind of large-scale road network state Forecasting Approach for Short-term based on improvement k arest neighbors
CA3060900A1 (en) * 2018-11-05 2020-05-05 Royal Bank Of Canada System and method for deep reinforcement learning
CN112907971A (en) * 2021-02-04 2021-06-04 南通大学 Urban road network short-term traffic flow prediction method based on genetic algorithm optimization space-time residual error model
CN113313947A (en) * 2021-05-31 2021-08-27 湖南大学 Road condition evaluation method of short-term traffic prediction graph convolution network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘易诗; 关雪峰; 吴华意; 曹军; 张娜: "LSTM short-term traffic speed prediction based on spatio-temporal correlation weighting" (基于时空关联度加权的LSTM短时交通速度预测), Geomatics World (地理信息世界), no. 01, 25 February 2020 (2020-02-25), pages 49-55 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116362418A (en) * 2023-05-29 2023-06-30 天能电池集团股份有限公司 Online prediction method for application-level manufacturing capacity of intelligent factory of high-end battery
CN116362418B (en) * 2023-05-29 2023-08-22 天能电池集团股份有限公司 Online prediction method for application-level manufacturing capacity of intelligent factory of high-end battery
CN117974366A (en) * 2024-04-01 2024-05-03 深圳市普裕时代新能源科技有限公司 Energy management system based on industrial and commercial energy storage
CN117974366B (en) * 2024-04-01 2024-06-11 深圳市普裕时代新能源科技有限公司 Energy management system based on industrial and commercial energy storage

Similar Documents

Publication Publication Date Title
CN111310915B (en) Data anomaly detection defense method oriented to reinforcement learning
CN109754605B (en) Traffic prediction method based on attention temporal graph convolution network
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN110794308B (en) Method and device for predicting train battery capacity
CN115238850B (en) Mountain area slope displacement prediction method based on MI-GRA and improved PSO-LSTM
CN116562514B (en) Method and system for immediately analyzing production conditions of enterprises based on neural network
Dao et al. Deep reinforcement learning monitor for snapshot recording
Ge et al. An improved PF remaining useful life prediction method based on quantum genetics and LSTM
CN115719294A (en) Indoor pedestrian flow evacuation control method and system, electronic device and medium
Ch et al. Groundwater level forecasting using SVM-PSO
Zeng et al. A survey on causal reinforcement learning
Carolina Jara Ten Kathen et al. A comparison of pso-based informative path planners for autonomous surface vehicles for water resource monitoring
CN110779526B (en) Path planning method, device and storage medium
CN117332693A (en) Slope stability evaluation method based on DDPG-PSO-BP algorithm
CN115906673B (en) Combat entity behavior model integrated modeling method and system
CN115938104A (en) Dynamic short-time road network traffic state prediction model and prediction method
CN115174263B (en) Attack path dynamic decision method and device
CN114861368B (en) Construction method of railway longitudinal section design learning model based on near-end strategy
CN115938104B (en) Dynamic short-time road network traffic state prediction model and prediction method
CN113762464B (en) Train operation reference curve dynamic generation method based on learning
CN113837443A (en) Transformer substation line load prediction method based on depth BilSTM
CN114911157A (en) Robot navigation control method and system based on partial observable reinforcement learning
Alpcan Dual control with active learning using Gaussian process regression
CN117556681B (en) Intelligent air combat decision method, system and electronic equipment
Tang et al. Learning to Solve Soft-Constrained Vehicle Routing Problems with Lagrangian Relaxation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant