CN112907296B - Electronic toll road dynamic pricing method sensitive to journey deadline - Google Patents


Info

Publication number
CN112907296B
CN112907296B (application CN202110303725.8A)
Authority
CN
China
Prior art keywords
time
journey
model
vehicle
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110303725.8A
Other languages
Chinese (zh)
Other versions
CN112907296A (en
Inventor
金嘉晖
朱晓璇
吴碧伟
吴巍炜
罗军舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202110303725.8A priority Critical patent/CN112907296B/en
Publication of CN112907296A publication Critical patent/CN112907296A/en
Application granted granted Critical
Publication of CN112907296B publication Critical patent/CN112907296B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination
    • G06Q30/0284 Time or distance, e.g. usage of parking meters or taximeters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • General Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Traffic Control Systems (AREA)
  • Devices For Checking Fares Or Tickets At Control Points (AREA)

Abstract

The invention discloses a journey-deadline-sensitive dynamic pricing method for electronic toll roads, comprising the following steps: establish a journey-deadline-sensitive simulated traffic environment model, including an urban road network model, a journey deadline model and a journey travel cost model; generate simulated data from real data and process it into an input state vector; train a deep reinforcement learning model offline to obtain a trained dynamic pricing model. The simulated traffic environment model then outputs the current traffic flow of each road as a state, derived from the real urban traffic environment, and passes it to the dynamic pricing model, which dynamically prices the electronic toll roads according to the input state. The invention effectively relieves traffic congestion while satisfying travelers' time requirements to the greatest extent.

Description

Electronic toll road dynamic pricing method sensitive to journey deadline
Technical Field
The invention relates to the field of toll road pricing in smart cities, and in particular to a journey-deadline-sensitive dynamic pricing method for electronic toll roads.
Background
Urban transportation plays a significant role in national economic development. As living standards rise, private car ownership has grown rapidly, so that traffic demand outpaces road infrastructure construction, causing widespread congestion and accidents. Conventional urban management reduces congestion by changing the road structure itself: for example, building roads on a large scale to increase supply capacity and resolve the contradiction between traffic supply and demand. While such measures may relieve congestion initially, the effect is short-lived; as urban road capacity increases, traffic demand increases with it, which in turn worsens congestion.
To reduce congestion, road charging mechanisms have received great attention in urban management. The idea is to split traffic flow by charging vehicles on busy roads, so that vehicles aiming to reduce travel cost voluntarily choose uncongested, cheaper roads, thereby dispersing traffic and relieving congestion. This approach has been implemented through electronic toll collection systems and successfully applied in various countries and regions.
To guarantee reasonable road pricing, two problems arise in practice. First, the traffic environment is complex and highly dynamic: traffic conditions change continuously, especially under sudden events such as traffic accidents or abnormal weather, so road tolling must be adjusted dynamically based on real-time traffic flow. Second, vehicle travel is highly time-dependent. For example, commuters, or travelers who have booked flights or trains, must reach their destination before a fixed time and are insensitive to toll values, while travelers without time requirements may prefer routes with lower tolls.
Existing road pricing mechanisms are either static or dynamic. Static charging sets a fixed toll on a road; although easy to implement, it cannot match traffic dynamics. In some early work, dynamic pricing assigned different tolls to a road for different periods of time, which still adapts poorly to a dynamic traffic environment. Reinforcement learning can adjust road tolls in real time, but existing methods scale poorly to large urban road networks and ignore travelers' time requirements, so they cannot adapt well to complex, dynamic environments. Pricing each road dynamically to track real-time traffic flow, while accounting for individual differences in travelers' time requirements, is therefore essential to relieving congestion.
Disclosure of Invention
The invention aims to address the problems in the prior art by providing a journey-deadline-sensitive dynamic pricing method for electronic toll roads, which accounts for travelers' time requirements in a traffic environment that changes dynamically in real time and relieves congestion by dynamically pricing and charging roads.
The technical scheme adopted to achieve this aim is a journey-deadline-sensitive electronic toll road dynamic pricing method comprising the following steps:
(1) Establishing a journey cut-off time sensitive simulated traffic environment model, wherein the simulated traffic environment model comprises an urban road network model, a journey cut-off time model and a journey running cost model; the city road network model is used for establishing a topology structure of the city road network, the journey cut-off time model is used for describing the time requirement of a traveler on journey, and the journey running cost model is used for calculating the cost of the vehicle journey and defining the route selection of the vehicle;
(2) The journey-deadline-sensitive simulated traffic environment model provides the reward value and state-transition information for the reinforcement learning agent; simulated traffic data are generated from real data collected in the city, the distribution of vehicle trip demand is established, the value of each action in the current traffic state is determined, and reasonable pricing is set according to that value;
(3) Train the deep reinforcement learning model offline to obtain a trained dynamic pricing model;
(4) Dynamically price the electronic toll roads in the urban road network using the trained dynamic pricing model.
Further, in the step (1), the road network model in the journey cut-off time sensitive simulated traffic environment model is described as follows:
The urban road network is abstracted as a directed graph G = &lt;O, E, U&gt;. O = {OD_1, OD_2, …, OD_R} is the set of trip origin-destination pairs, where each OD_r = &lt;u_k, u_j, q_{k,j}, P_{k,j}&gt; is an origin-destination quadruple: u_k is the trip origin, u_j the trip destination, q_{k,j} the generated travel demand, and P_{k,j} the set of all loop-free paths from origin to destination. E = {e_1, e_2, …, e_m} is the set of roads in the city and U = {u_1, u_2, …, u_n} the set of zones in the city. H denotes the horizon over which the reinforcement learning agent makes decisions; H is divided into an integer number of time intervals of length τ, and t denotes the index of the current interval.
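The road network model above can be sketched as plain data structures. This is an illustrative Python sketch, not the patent's implementation; the class and field names (Road, ODPair, RoadNetwork) are assumptions chosen to mirror the symbols G = &lt;O, E, U&gt;, C_e, and P_{k,j}:

```python
from dataclasses import dataclass, field

@dataclass
class Road:
    road_id: int           # element of E = {e_1, ..., e_m}
    head: int              # zone index the road leaves
    tail: int              # zone index the road enters
    capacity: float        # C_e, vehicles the road can hold
    free_flow_time: float  # travel time on e without congestion

@dataclass
class ODPair:
    origin: int            # u_k, trip start zone
    dest: int              # u_j, trip end zone
    demand: float          # q_{k,j}, generated travel amount
    paths: list = field(default_factory=list)  # P_{k,j}: loop-free paths as road-id lists

@dataclass
class RoadNetwork:
    zones: list            # U = {u_1, ..., u_n}
    roads: dict            # E, keyed by road_id
    od_pairs: list         # O = {OD_1, ..., OD_R}
    horizon: int           # H, decision horizon in minutes
    tau: int               # length of one time interval

    def num_steps(self) -> int:
        # H is divided into an integer number of intervals of length tau
        return self.horizon // self.tau
```

A 60-minute horizon with 5-minute intervals then yields 12 decision steps, matching the indexing of t in the text.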
Further, the trip deadline model described in step (1) is described as follows: the trip deadline d characterizes the time requirement of a certain trip of a user and represents the deadline of the vehicle trip;
Further, the travel cost model in step (1) is described as follows:
The travel cost model sums the time cost of the trip and the tolls charged on the roads the trip traverses; the vehicle's travel cost guides its path choice and thereby influences the traffic state. The time cost is related to the trip deadline: the deadline d simulates the traveler's time requirement, namely that the vehicle should arrive at its destination before d. The decision horizon is H minutes, and each vehicle is randomly assigned a trip deadline d = 0, 1, …, H. The monetary cost is related to the driving path: the variable p denotes the driving path, e a road on the path, t the current time step, and a_e^t the toll charged on road e at time step t; the monetary cost is the sum of the tolls of the roads on the path.
The travel cost of a trip with deadline d from zone u_k to zone u_j along path p is denoted c_{k,j}^p(d); its value depends on whether the trip has a deadline. If d = 0, the vehicle has no trip deadline, and the travel cost is determined jointly by the monetary cost and the time cost, where ω is the value of one unit of the vehicle's travel time and t_e^t is the vehicle's travel time on road e at time step t. If d ≠ 0, the cost is computed from the difference x between the current time and the trip deadline, with D an acceptable time threshold for the vehicle: if x > D, the travel cost is determined jointly by the monetary cost and the time cost; if x < D, the travel cost is determined by the time cost alone.
More specifically, when x > D the travel cost increases gradually over time; when x < D the cost is determined only by the time cost, which increases exponentially as the current time approaches the deadline.
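The piecewise cost rule can be sketched in Python. The patent's exact closed form is given as a formula image that did not survive extraction, so the exponential growth rate `alpha` and the exact functional shapes below are assumptions; only the case structure (no deadline, far from deadline, near deadline) follows the text:

```python
import math

def travel_cost(path_tolls, path_times, omega, d, t, D, alpha=1.0):
    """Hedged sketch of the vehicle travel cost c_{k,j}^p(d).

    path_tolls: tolls a_e^t of the roads on path p
    path_times: travel times t_e^t of the roads on path p
    omega: value of one unit of travel time
    d: trip deadline (0 means the trip has no deadline)
    t: current time; D: acceptable time threshold
    alpha: assumed rate of the exponential time-cost growth
    """
    money = sum(path_tolls)
    time_cost = omega * sum(path_times)
    if d == 0:
        # no deadline: monetary cost and time cost jointly
        return money + time_cost
    x = d - t  # difference between the deadline and the current time
    if x > D:
        # far from the deadline: monetary cost and time cost jointly
        return money + time_cost
    # near the deadline: time cost only, growing exponentially as x shrinks
    return time_cost * math.exp(alpha * (D - x))
```

For a path with a 2-unit toll and 10 minutes of travel time at omega = 1, the cost is 12 without a deadline or far from it, and grows past the pure time cost once the deadline is within D.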
Further, the state of the current urban road conditions in step (2) is three-dimensional, expressed as s_t = (e, u_j, d): the road e the vehicle is traveling on, the trip destination u_j, and the trip deadline d. The quantity n_{e,u_j,d}^t represents the traffic state at the current time step t, namely the number of vehicles on road e heading to destination u_j with trip deadline d.
Further, the reward value in step (2) is computed from a reward function defined by the travelers' time requirements and the degree of congestion relief; the reward is feedback on the action executed by the reinforcement learning agent and helps the agent correct its actions. The reward function is determined by the agent's optimization target, with three alternatives: maximize the number of vehicles reaching their destination before the trip deadline, minimize the number of vehicles failing to reach their destination before the trip deadline, or minimize the total time by which vehicles exceed their trip deadline.
The reward maximizing the number of vehicles arriving before the trip deadline, the reward minimizing the number of vehicles arriving after it, and the reward minimizing the total time by which vehicles exceed it are each computed from the simulated traffic state. In these rewards, n_{e,u_j,d}^t is the number of vehicles traveling on road e at time step t with destination zone u_j and trip deadline d, τ is the length of one time step, u_k is the trip origin, t_e^0 is the travel time on road e without congestion, C_e is the capacity of road e, and M and N are constants.
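The three reward targets can be illustrated directly from per-trip outcomes. The patent's exact closed forms (involving n_{e,u_j,d}^t, C_e, M, N) are given as formula images not reproduced here, so this sketch implements only the stated objectives, with M and N as the scaling constants mentioned in the text and a trip represented as an (arrival_time, deadline) pair, both assumptions:

```python
def rewards(trips, M=1.0, N=1.0):
    """Compute the three alternative reward values from finished trips.

    trips: list of (arrival_time, deadline) pairs for vehicles that
    completed their journey in the current time step.
    """
    on_time = sum(1 for a, d in trips if a <= d)       # arrived before deadline
    late = sum(1 for a, d in trips if a > d)           # missed the deadline
    overtime = sum(a - d for a, d in trips if a > d)   # total time past deadline
    r_max_ontime = M * on_time     # target (i): maximize on-time arrivals
    r_min_late = -M * late         # target (ii): minimize late arrivals
    r_min_overtime = -N * overtime # target (iii): minimize total overtime
    return r_max_ontime, r_min_late, r_min_overtime
```

An agent would be trained against whichever of the three values matches the chosen optimization target.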
Further, the process of outputting reasonable pricing in step (2) is as follows:
The journey-deadline-sensitive simulated traffic environment model abstracts the traffic flow state information on the urban road network into the state s_t and, combined with the action range provided by the action space, obtains reasonable pricing for the current state.
Further, in step (4), the journey-deadline-sensitive simulated traffic environment model outputs the traffic flow of each road at the current time as a state, derived from the real urban traffic environment, and passes it to the dynamic pricing model; the dynamic pricing model outputs reasonable pricing according to the input state and returns it to the simulated traffic environment model.
The simulated traffic environment model receives the pricing and applies it; the traffic in the real urban traffic environment responds, yielding the next traffic state, and dynamic pricing of the electronic toll roads proceeds.
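The interaction loop of step (4) can be sketched as follows. The `env` and `policy` interfaces (`reset`/`step` and `act`) are assumed names, not the patent's API; the sketch only shows the state-pricing-response cycle described above:

```python
def pricing_loop(env, policy, num_steps):
    """Run the online pricing cycle: environment emits per-road traffic
    state, the trained pricing model returns tolls, the environment
    applies them and the traffic responds with the next state."""
    state = env.reset()             # traffic flow of each road now
    for _ in range(num_steps):
        tolls = policy.act(state)   # dynamic price for each toll road
        state = env.step(tolls)     # apply pricing; traffic responds
    return state
```

In deployment, `env` would wrap the real urban traffic environment and `policy` the trained deep reinforcement learning model.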
Further, in step (2), the distribution generated by the vehicle journey demand is a gaussian distribution.
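Sampling trip demand from a Gaussian distribution, as step (2) prescribes, might look like the sketch below; the mean and standard deviation are assumed to be fitted from the collected real city data, and the clamping and rounding are assumptions to keep demand counts as non-negative integers:

```python
import random

def sample_demand(mean, std, num_od_pairs, seed=None):
    """Draw a demand count for each OD pair from a Gaussian distribution."""
    rng = random.Random(seed)
    # demand counts are non-negative integers, so clamp and round
    return [max(0, round(rng.gauss(mean, std))) for _ in range(num_od_pairs)]
```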
Further, in step (2), the current urban road condition state aggregates vehicles sharing the same trip deadline into a single traffic-flow count, which reduces the complexity of training the deep reinforcement learning model and improves convergence.
Further, in step (3), the deep reinforcement learning model is trained with a multithreaded asynchronous training method, which raises the training speed and accelerates convergence of the pricing policy.
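The multithreaded asynchronous scheme (local agents pushing updates into a global agent, in the style of A3C) can be sketched as below. This is a minimal illustration, not the patent's training code: the `fake_gradient` placeholder stands in for the actual policy gradient of the deep reinforcement learning network, and `DemoEnv` is a toy environment included only so the sketch runs:

```python
import threading

def async_train(global_params, make_env, num_workers=4, steps=100, lr=0.01):
    """Each local worker interacts with its own environment copy and
    asynchronously applies gradient updates to shared global parameters."""
    lock = threading.Lock()

    def worker():
        env = make_env()
        local = list(global_params)              # pull a local copy
        for _ in range(steps):
            grad = env.fake_gradient(local)      # placeholder for policy gradient
            with lock:                           # asynchronous shared update
                for i, g in enumerate(grad):
                    global_params[i] -= lr * g
            local = list(global_params)          # resync with the global agent

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return global_params

class DemoEnv:
    """Toy environment: gradient of 0.5 * sum(p^2), so params shrink to 0."""
    def fake_gradient(self, params):
        return [p for p in params]
```

Running several workers against this toy objective drives the shared parameters toward zero, illustrating how concurrent updates still converge while breaking correlations between each worker's data.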
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following advantages:
(1) It scales to large urban road networks. When a traditional single agent interacts with the environment to learn a policy, the large state space makes convergence difficult. The invention uses multiple local agents and a global agent to learn charging policies asynchronously, breaking correlations between data, raising training speed, and making the policy easier to converge.
(2) The model is more complete. Existing models assume vehicles have no time requirements, or identical ones, ignoring differences in travel time requirements between travelers. This method simulates vehicles' time requirements by adding a deadline to the model and accounts for different vehicles' requirements on arrival time; when modeling travel cost, it introduces a time threshold to capture the behavior of vehicles near their deadline, which choose paths by time cost alone. The model is thus more complete and closer to the real traffic environment.
(3) It helps relieve congestion. On the one hand, dynamic road charging diverts vehicles from highly congested, highly charged roads, relieving congestion. On the other hand, the desired effect can be selected from three different reward functions as needed, giving greater flexibility.
Drawings
FIG. 1 is a reinforcement learning architecture for dynamic pricing of urban toll roads;
FIG. 2 is a flow chart of a journey deadline sensitive electronic toll road dynamic pricing method implemented by the present invention.
Detailed Description
The invention is further elucidated below in connection with the drawings and the specific embodiments.
The journey-deadline-sensitive electronic toll road dynamic pricing method is realized through a journey-deadline-sensitive simulated traffic environment model and a dynamic pricing model. As shown in FIG. 1, the simulated traffic environment model feeds peak-hour vehicle data into the dynamic pricing model, and the electronic toll roads are dynamically priced by the trained dynamic pricing model. The invention provides this method to relieve congestion in the urban peak-hour traffic environment; the flow is shown in FIG. 2. The specific implementation steps are as follows:
(1) Establishing a journey cut-off time sensitive simulated traffic environment model, wherein the simulated traffic environment model comprises an urban road network model, a journey cut-off time model and a journey running cost model; the city road network model is used for establishing a topology structure of the city road network, the journey cut-off time model is used for describing the time requirement of a traveler on journey, and the journey running cost model is used for calculating the cost of a vehicle journey and defining the route selection of the vehicle.
(2) The journey-deadline-sensitive simulated traffic environment model provides the reward value and state-transition information for the reinforcement learning agent; simulated traffic data are generated from real data collected in the city, the distribution of vehicle trip demand is established, the value of each action in the current traffic state is determined, and reasonable pricing is set according to that value;
(3) Train the deep reinforcement learning model offline to obtain a trained dynamic pricing model;
(4) Dynamically price the electronic toll roads in the urban road network using the trained dynamic pricing model.

Claims (4)

1. A method for dynamically pricing electronic toll roads sensitive to trip deadlines, the method comprising the steps of:
(1) Establishing a journey cut-off time sensitive simulated traffic environment model, wherein the simulated traffic environment model comprises an urban road network model, a journey cut-off time model and a journey running cost model; the city road network model is used for establishing a topology structure of the city road network, the journey cut-off time model is used for describing the time requirement of a traveler on journey, and the journey running cost model is used for calculating the cost of the vehicle journey and defining the route selection of the vehicle;
(2) The journey cut-off time sensitive simulated traffic environment model gives the rewarding value and state transition information of the reinforcement learning intelligent agent; generating simulated traffic data through the collected real data in the city, establishing the distribution of vehicle journey demands, determining the value of actions corresponding to the state through the state of the road condition of the current city, and determining pricing according to the value;
(3) Offline training and learning are carried out by using the deep reinforcement learning model, and a trained dynamic pricing model is obtained;
(4) Dynamically pricing the electronic toll roads in the urban road network by using the trained dynamic pricing model;
in the step (1), the road network model in the journey cut-off time sensitive simulated traffic environment model is described as follows:
The urban road network is abstracted as a directed graph G = &lt;O, E, U&gt;. O = {OD_1, OD_2, …, OD_R} is the set of trip origin-destination pairs, where each OD_r = &lt;u_k, u_j, q_{k,j}, P_{k,j}&gt; is an origin-destination quadruple: u_k is the trip origin, u_j the trip destination, q_{k,j} the generated travel demand, and P_{k,j} the set of all loop-free paths from origin to destination. E = {e_1, e_2, …, e_m} is the set of roads in the city and U = {u_1, u_2, …, u_n} the set of zones in the city. H denotes the horizon over which the reinforcement learning agent makes decisions; H is divided into an integer number of time intervals of length τ, and t denotes the index of the current interval;
In step (1), the journey deadline model is described as follows: the trip deadline d characterizes the time requirement of a user's trip, i.e., the latest time by which the vehicle should complete the trip;
In step (1), the travel cost model is described as follows:
The travel cost model sums the time cost of the trip and the tolls charged on the roads it traverses; this cost guides each vehicle's route choice and thereby influences the traffic state. The time cost is tied to the trip deadline: the deadline d models the traveler's time requirement, i.e., the vehicle should reach its destination before time d. The decision horizon is H minutes, and the vehicle trip deadlines d = 0, 1, …, H are randomly distributed. The monetary cost depends on the driving path: the path is denoted by p, a road on the path by e, and the current time step by t; each road e charges a toll at time step t, and the monetary cost is the sum of the tolls of the roads the path traverses;
The travel cost of a trip with deadline d from region u_k to region u_j along path p is computed according to whether the trip has a deadline. If d = 0, the vehicle has no trip deadline, and its travel cost is determined jointly by the monetary cost and the time cost, where ω denotes the value of one unit of the vehicle's travel time and the travel time of the vehicle on road e at time step t contributes to the time cost. If d ≠ 0, the cost is computed from the difference x between the trip deadline and the current time (the remaining time) using a threshold D, the acceptable time margin of the vehicle: if x is greater than D, the travel cost is determined jointly by the monetary cost and the time cost; if x is smaller than D, the travel cost is determined by the time cost alone. The specific calculation formula of the vehicle travel cost is as follows:
More specifically, when x > D the travel cost increases gradually as time passes; when x < D the travel cost is determined by the time cost alone, which grows exponentially as the current time approaches the deadline;
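Since the patent gives the travel-cost formula only as an image, the piecewise structure described above can be sketched as follows; the exponential growth rate `beta` and the exact functional forms are assumptions for illustration, not the patent's equation.

```python
import math

def travel_cost(tolls, travel_times, d, t, D, omega, beta=1.0):
    """Hedged sketch of the piecewise travel cost described in the text.

    tolls        -- toll charged on each road of the path at the current step
    travel_times -- travel time on each road of the path
    d            -- trip deadline (d == 0 means no deadline)
    t            -- current time
    D            -- acceptable time threshold of the vehicle
    omega        -- value of one unit of travel time
    beta         -- assumed exponential growth rate (not from the patent)
    """
    money = sum(tolls)                     # monetary cost: sum of road tolls
    time_cost = omega * sum(travel_times)  # time cost: value of time spent
    if d == 0:
        return money + time_cost           # no deadline: both costs apply
    x = d - t                              # time remaining before the deadline
    if x > D:
        return money + time_cost           # far from deadline: both costs
    # close to the deadline: time cost alone, growing exponentially
    return time_cost * math.exp(beta * (D - x))

print(travel_cost([1, 2], [3, 4], d=0, t=0, D=10, omega=0.5))  # 3 + 0.5*7 = 6.5
```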
In step (2), the reward value is computed by a reward function based on the travelers' time requirements and the degree of congestion relief; the reward is the feedback on the action executed by the reinforcement learning agent and helps the agent correct its actions. The reward function is determined by the agent's optimization objective and can be computed in three ways: a reward that maximizes the number of vehicles reaching the destination before the trip deadline, a reward that minimizes the number of vehicles failing to reach the destination before the trip deadline, or a reward that minimizes the total time by which vehicles exceed the trip deadline in reaching the destination;
The reward aimed at maximizing the number of vehicles arriving at the destination before the trip deadline is calculated as follows:
The reward aimed at minimizing the number of vehicles that fail to reach the destination before the trip deadline is calculated as follows:
The reward aimed at minimizing the total time by which vehicles exceed the trip deadline in reaching the destination is calculated as follows:
In these formulas, the count of vehicles traveling on road e at time step t whose destination is zone u_j and whose trip deadline is d is used; the variable τ is the length of one time step; the variable u_k is the origin of the vehicle trip; the free-flow travel time of a vehicle on road e (its travel time without congestion) also appears; C_e denotes the capacity of road e; and M and N are constants;
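The three reward formulas are likewise given only as images in the original; a minimal sketch of the three objectives, assuming simple linear forms with the constants M, N and the step length τ, might look like:

```python
def reward_max_arrivals(arrived_on_time, M=1.0):
    """Objective 1: maximize vehicles reaching the destination before the deadline."""
    return M * arrived_on_time

def reward_min_late(late_vehicles, N=1.0):
    """Objective 2: minimize vehicles that missed the deadline (negative reward)."""
    return -N * late_vehicles

def reward_min_overtime(overtimes, tau=1.0):
    """Objective 3: minimize the total time by which vehicles exceed their deadlines."""
    return -tau * sum(overtimes)

print(reward_max_arrivals(5))       # 5.0
print(reward_min_late(3))           # -3.0
print(reward_min_overtime([2, 3]))  # -5.0
```

These are assumed stand-ins for the omitted equations; the actual reward functions in the patent may weight terms differently.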
In step (4), the journey-deadline-sensitive simulated traffic environment model outputs the traffic flow of each road at the current time as the state, according to the real urban traffic environment, and passes this state to the dynamic pricing model; the dynamic pricing model outputs a price for the input state and returns it to the simulated traffic environment model;
the simulated traffic environment model receives the price and applies it; the real urban traffic environment responds, yielding the next traffic state, and in this way the electronic toll roads are priced dynamically.
2. The method for dynamically pricing electronic toll roads sensitive to trip deadlines according to claim 1, wherein in step (2) the state of the current city road conditions is three-dimensional and expressed as s_t = (e, u_j, d): the road e on which a vehicle is traveling, the destination u_j of the trip, and the trip deadline d. The traffic state at the current time step t is the number of vehicles on road e whose destination is u_j and whose trip deadline is d.
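The three-dimensional state s_t = (e, u_j, d) can be sketched as a count array; the dimension sizes used here are hypothetical.

```python
import numpy as np

# Hypothetical sizes: 3 roads, 2 destination regions, 4 possible deadline values
n_roads, n_dests, n_deadlines = 3, 2, 4

# Entry [e, u_j, d] counts the vehicles on road e whose destination is
# region u_j and whose trip deadline is d.
state = np.zeros((n_roads, n_dests, n_deadlines), dtype=int)

# e.g. two vehicles on road 0, bound for region 1, with deadline index 3
state[0, 1, 3] += 2
print(state.sum())  # 2
```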
3. The method for dynamically pricing electronic toll roads sensitive to trip deadlines according to claim 1, wherein in step (2) the process of outputting a reasonable price according to the state of the current urban road conditions specifically comprises:
the journey-deadline-sensitive simulated traffic environment model converts the traffic flow information on the urban road network into a state recognizable by the pricing model, and the model combines this state with the action range provided by the action space to obtain the price for the current state.
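A minimal sketch of turning a recognized state into a price within the action range; the tabular value function used here is an assumption — the patent uses a deep reinforcement learning model, for which a Q-table is only a stand-in.

```python
import numpy as np

def price_from_state(state, q_table, price_levels):
    """Pick the price whose estimated action value is highest for this state.

    state        -- hashable encoding of the current traffic state
    q_table      -- dict mapping (state, action index) -> learned action value
    price_levels -- discrete prices allowed by the action space
    """
    values = [q_table.get((state, a), 0.0) for a in range(len(price_levels))]
    return price_levels[int(np.argmax(values))]

prices = [0.0, 0.5, 1.0, 2.0]
q = {(("e1", "u3", 2), 2): 1.5}   # toy learned value favouring price 1.0
print(price_from_state(("e1", "u3", 2), q, prices))  # 1.0
```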
4. The method for dynamically pricing electronic toll roads sensitive to trip deadlines according to claim 1, wherein in step (2) the distribution of vehicle trip demand is a Gaussian distribution; the state of the current urban road conditions takes the statistical count of vehicles sharing the same trip deadline as the traffic flow to be processed; and the deep reinforcement learning model is trained with a multithreaded asynchronous training method.
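The multithreaded asynchronous training mentioned in claim 4 (in the style of asynchronous actor-critic methods) can be illustrated with workers updating shared parameters; the toy gradient and update rule here are assumptions, not the patent's training procedure.

```python
import threading

# Shared parameters updated asynchronously by several workers; real training
# would compute policy/value gradients, while each worker here simply
# accumulates a toy gradient under a lock.
shared = {"theta": 0.0}
lock = threading.Lock()

def worker(n_updates, grad=0.01):
    for _ in range(n_updates):
        with lock:                  # asynchronous but race-free update
            shared["theta"] += grad

threads = [threading.Thread(target=worker, args=(100,)) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(round(shared["theta"], 2))  # 4 workers * 100 updates * 0.01 = 4.0
```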
CN202110303725.8A 2021-03-22 2021-03-22 Electronic toll road dynamic pricing method sensitive to journey deadline Active CN112907296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110303725.8A CN112907296B (en) 2021-03-22 2021-03-22 Electronic toll road dynamic pricing method sensitive to journey deadline


Publications (2)

Publication Number Publication Date
CN112907296A CN112907296A (en) 2021-06-04
CN112907296B true CN112907296B (en) 2024-05-24

Family

ID=76106699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110303725.8A Active CN112907296B (en) 2021-03-22 2021-03-22 Electronic toll road dynamic pricing method sensitive to journey deadline

Country Status (1)

Country Link
CN (1) CN112907296B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139463A (en) * 2015-09-01 2015-12-09 罗毅青 Big data urban road pricing method and system
CN107093216A (en) * 2017-04-14 2017-08-25 广州地理研究所 Urban traffic blocking charging method and device based on vehicle electron identifying
CN112508356A (en) * 2020-11-23 2021-03-16 广州大学 Shared automobile balancing method based on reinforcement learning model



Similar Documents

Publication Publication Date Title
CN104157139B (en) A kind of traffic congestion Forecasting Methodology and method for visualizing
Lin et al. Efficient network-wide model-based predictive control for urban traffic networks
CN105070042B (en) A kind of modeling method of traffic forecast
CN106096756A (en) A kind of urban road network dynamic realtime Multiple Intersections routing resource
CN104408948B (en) Vehicle-mounted-GPS-based public transport priority signal control method of urban road traffic
CN107591004A (en) A kind of intelligent traffic guidance method based on bus or train route collaboration
CN105930914A (en) City bus optimal charging structure charge determination method based on origin-destination distance
Kaddoura Marginal congestion cost pricing in a multi-agent simulation investigation of the greater Berlin area
Chen et al. Real-time information feedback based on a sharp decay weighted function
Selten et al. Experimental investigation of day-to-day route-choice behaviour and network simulations of autobahn traffic in North Rhine-Westphalia
Manasra et al. Optimization-based operations control for public transportation service with transfers
CN113516277A (en) Network connection intelligent traffic path planning method based on dynamic pricing of road network
Pandey et al. Multiagent reinforcement learning algorithm for distributed dynamic pricing of managed lanes
CN111985814A (en) Method and system for optimizing inter-city train operation scheme with intermittent power supply
CN115675584A (en) Urban area line driving scheme optimization method for urban rail transit
Li et al. A bibliometric analysis and review on reinforcement learning for transportation applications
Zheng et al. A novel approach to coordinating green wave system with adaptation evolutionary strategy
CN113191028B (en) Traffic simulation method, system, program, and medium
Cui et al. Dynamic pricing for fast charging stations with deep reinforcement learning
CN112907296B (en) Electronic toll road dynamic pricing method sensitive to journey deadline
Xueyu et al. Research on the Bi-level programming model for ticket fare pricing of urban rail transit based on particle swarm optimization algorithm
Safirova et al. Choosing congestion pricing policy: Cordon tolls versus link-based tolls
Yu et al. Optimization of urban bus operation frequency under common route condition with rail transit
Vranken et al. Performance comparison of dynamic vehicle routing methods for minimizing the global dwell time in upcoming smart cities
Li et al. Traffic flow guidance and optimization of connected vehicles based on swarm intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant