CN112907296B - Electronic toll road dynamic pricing method sensitive to journey deadline - Google Patents


Info

Publication number
CN112907296B
CN112907296B (application CN202110303725.8A)
Authority
CN
China
Prior art keywords
time
journey
model
vehicle
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110303725.8A
Other languages
Chinese (zh)
Other versions
CN112907296A (en
Inventor
金嘉晖
朱晓璇
吴碧伟
吴巍炜
罗军舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202110303725.8A priority Critical patent/CN112907296B/en
Publication of CN112907296A publication Critical patent/CN112907296A/en
Application granted granted Critical
Publication of CN112907296B publication Critical patent/CN112907296B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination
    • G06Q30/0284 Time or distance, e.g. usage of parking meters or taximeters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • General Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Traffic Control Systems (AREA)
  • Devices For Checking Fares Or Tickets At Control Points (AREA)

Abstract

The invention discloses a journey-deadline-sensitive dynamic pricing method for electronic toll roads, comprising the following steps: establish a journey-deadline-sensitive simulated traffic environment model, including an urban road network model, a journey deadline model and a journey travel cost model; generate simulated data from real data and process it into an input state vector; train a deep reinforcement learning model offline to obtain a trained dynamic pricing model. The simulated traffic environment model then outputs the current traffic flow of each road as a state, derived from the real urban traffic environment, and passes it to the dynamic pricing model, which dynamically prices the electronic toll roads according to the input state. The invention effectively relieves traffic congestion while satisfying travelers' time requirements to the greatest extent.

Description

Electronic toll road dynamic pricing method sensitive to journey deadline
Technical Field
The invention relates to the field of toll road pricing in smart cities, and in particular to a journey-deadline-sensitive dynamic pricing method for electronic toll roads.
Background
Urban transportation plays a significant role in national economic development. As living standards rise, private car ownership has grown rapidly, so that traffic demand outpaces road infrastructure construction, causing widespread congestion and accidents. Conventional urban management reduces congestion by changing the road structure itself: for example, building roads on a large scale to increase supply capacity and resolve the contradiction between traffic supply and demand. While such measures may relieve congestion initially, the effect is short-lived; as urban road capacity increases, traffic demand increases with it, which in turn worsens congestion.
To reduce congestion, road charging mechanisms have received great attention in urban management. The idea is to split traffic flow by charging vehicles on busy roads, so that vehicles aiming to reduce travel cost voluntarily choose uncongested, cheaper roads, thereby dispersing traffic and relieving congestion. This approach has been implemented through electronic toll collection systems and successfully applied in various countries and regions.
To guarantee reasonable road pricing, two problems arise in practice. First, the traffic environment is complex and highly dynamic: traffic conditions change continuously, especially under sudden events such as traffic accidents or abnormal weather, so road tolling must be adjusted dynamically based on real-time traffic flow. Second, vehicle travel is highly time-dependent. For example, commuters, or travelers who have booked flights or trains, must reach their destination before a fixed time and are insensitive to toll values, while travelers without time requirements may prefer routes with lower tolls.
Existing road pricing mechanisms are either static or dynamic. Static charging sets a fixed toll on a road; although easy to implement, it cannot match traffic dynamics. In some early work, dynamic pricing assigned different tolls to a road for different periods of time, which still adapts poorly to a dynamic traffic environment. Reinforcement learning can adjust road tolls in real time, but existing methods scale poorly to large urban road networks and ignore travelers' time requirements, so they cannot adapt well to complex, dynamic environments. Pricing each road dynamically to track real-time traffic flow, while accounting for individual differences in travelers' time requirements, is therefore essential to relieving congestion.
Disclosure of Invention
The invention aims to address the problems in the prior art by providing a journey-deadline-sensitive dynamic pricing method for electronic toll roads, which accounts for travelers' time requirements in a traffic environment that changes dynamically in real time and relieves congestion by dynamically pricing and charging roads.
The technical scheme adopted to achieve this aim is a journey-deadline-sensitive electronic toll road dynamic pricing method comprising the following steps:
(1) Establishing a journey cut-off time sensitive simulated traffic environment model, wherein the simulated traffic environment model comprises an urban road network model, a journey cut-off time model and a journey running cost model; the city road network model is used for establishing a topology structure of the city road network, the journey cut-off time model is used for describing the time requirement of a traveler on journey, and the journey running cost model is used for calculating the cost of the vehicle journey and defining the route selection of the vehicle;
(2) The journey-deadline-sensitive simulated traffic environment model provides the reward value and state-transition information for the reinforcement learning agent; simulated traffic data are generated from real data collected in the city, the distribution of vehicle trip demand is established, the value of each action in the current traffic state is determined, and reasonable pricing is set according to that value;
(3) Train the deep reinforcement learning model offline to obtain a trained dynamic pricing model;
(4) Dynamically price the electronic toll roads in the urban road network using the trained dynamic pricing model.
Further, in the step (1), the road network model in the journey cut-off time sensitive simulated traffic environment model is described as follows:
The urban road network is abstracted as a directed graph G = &lt;O, E, U&gt;. O = {OD_1, OD_2, …, OD_R} is the set of trip origin-destination pairs, where each OD_r = &lt;u_k, u_j, q_{k,j}, P_{k,j}&gt; is an origin-destination quadruple: u_k is the trip origin, u_j the trip destination, q_{k,j} the generated travel demand, and P_{k,j} the set of all loop-free paths from origin to destination. E = {e_1, e_2, …, e_m} is the set of roads in the city and U = {u_1, u_2, …, u_n} the set of zones in the city. H denotes the horizon over which the reinforcement learning agent makes decisions; H is divided into an integer number of time intervals of length τ, and t denotes the index of the current interval.
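The road network model above can be sketched as plain data structures. This is an illustrative Python sketch, not the patent's implementation; the class and field names (Road, ODPair, RoadNetwork) are assumptions chosen to mirror the symbols G = &lt;O, E, U&gt;, C_e, and P_{k,j}:

```python
from dataclasses import dataclass, field

@dataclass
class Road:
    road_id: int           # element of E = {e_1, ..., e_m}
    head: int              # zone index the road leaves
    tail: int              # zone index the road enters
    capacity: float        # C_e, vehicles the road can hold
    free_flow_time: float  # travel time on e without congestion

@dataclass
class ODPair:
    origin: int            # u_k, trip start zone
    dest: int              # u_j, trip end zone
    demand: float          # q_{k,j}, generated travel amount
    paths: list = field(default_factory=list)  # P_{k,j}: loop-free paths as road-id lists

@dataclass
class RoadNetwork:
    zones: list            # U = {u_1, ..., u_n}
    roads: dict            # E, keyed by road_id
    od_pairs: list         # O = {OD_1, ..., OD_R}
    horizon: int           # H, decision horizon in minutes
    tau: int               # length of one time interval

    def num_steps(self) -> int:
        # H is divided into an integer number of intervals of length tau
        return self.horizon // self.tau
```

A 60-minute horizon with 5-minute intervals then yields 12 decision steps, matching the indexing of t in the text.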
Further, the trip deadline model described in step (1) is described as follows: the trip deadline d characterizes the time requirement of a certain trip of a user and represents the deadline of the vehicle trip;
Further, the travel cost model in step (1) is described as follows:
The travel cost model sums the time cost of the trip and the tolls charged on the roads the trip traverses; the vehicle's travel cost guides its path choice and thereby influences the traffic state. The time cost is related to the trip deadline: the deadline d simulates the traveler's time requirement, namely that the vehicle should arrive at its destination before d. The decision horizon is H minutes, and each vehicle is randomly assigned a trip deadline d = 0, 1, …, H. The monetary cost is related to the driving path: the variable p denotes the driving path, e a road on the path, t the current time step, and a_e^t the toll charged on road e at time step t; the monetary cost is the sum of the tolls of the roads on the path.
The travel cost of a trip with deadline d from zone u_k to zone u_j along path p is denoted c_{k,j}^p(d); its value depends on whether the trip has a deadline. If d = 0, the vehicle has no trip deadline, and the travel cost is determined jointly by the monetary cost and the time cost, where ω is the value of one unit of the vehicle's travel time and t_e^t is the vehicle's travel time on road e at time step t. If d ≠ 0, the cost is computed from the difference x between the current time and the trip deadline, with D an acceptable time threshold for the vehicle: if x > D, the travel cost is determined jointly by the monetary cost and the time cost; if x < D, the travel cost is determined by the time cost alone.
More specifically, when x > D the travel cost increases gradually over time; when x < D the cost is determined only by the time cost, which increases exponentially as the current time approaches the deadline.
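The piecewise cost rule can be sketched in Python. The patent's exact closed form is given as a formula image that did not survive extraction, so the exponential growth rate `alpha` and the exact functional shapes below are assumptions; only the case structure (no deadline, far from deadline, near deadline) follows the text:

```python
import math

def travel_cost(path_tolls, path_times, omega, d, t, D, alpha=1.0):
    """Hedged sketch of the vehicle travel cost c_{k,j}^p(d).

    path_tolls: tolls a_e^t of the roads on path p
    path_times: travel times t_e^t of the roads on path p
    omega: value of one unit of travel time
    d: trip deadline (0 means the trip has no deadline)
    t: current time; D: acceptable time threshold
    alpha: assumed rate of the exponential time-cost growth
    """
    money = sum(path_tolls)
    time_cost = omega * sum(path_times)
    if d == 0:
        # no deadline: monetary cost and time cost jointly
        return money + time_cost
    x = d - t  # difference between the deadline and the current time
    if x > D:
        # far from the deadline: monetary cost and time cost jointly
        return money + time_cost
    # near the deadline: time cost only, growing exponentially as x shrinks
    return time_cost * math.exp(alpha * (D - x))
```

For a path with a 2-unit toll and 10 minutes of travel time at omega = 1, the cost is 12 without a deadline or far from it, and grows past the pure time cost once the deadline is within D.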
Further, the state of the current urban road conditions in step (2) is three-dimensional, expressed as s_t = (e, u_j, d): the road e the vehicle is traveling on, the trip destination u_j, and the trip deadline d. The quantity n_{e,u_j,d}^t represents the traffic state at the current time step t, namely the number of vehicles on road e heading to destination u_j with trip deadline d.
Further, the reward value in step (2) is computed from a reward function defined by the travelers' time requirements and the degree of congestion relief; the reward is feedback on the action executed by the reinforcement learning agent and helps the agent correct its actions. The reward function is determined by the agent's optimization target, with three alternatives: maximize the number of vehicles reaching their destination before the trip deadline, minimize the number of vehicles failing to reach their destination before the trip deadline, or minimize the total time by which vehicles exceed their trip deadline.
The reward maximizing the number of vehicles arriving before the trip deadline, the reward minimizing the number of vehicles arriving after it, and the reward minimizing the total time by which vehicles exceed it are each computed from the simulated traffic state. In these rewards, n_{e,u_j,d}^t is the number of vehicles traveling on road e at time step t with destination zone u_j and trip deadline d, τ is the length of one time step, u_k is the trip origin, t_e^0 is the travel time on road e without congestion, C_e is the capacity of road e, and M and N are constants.
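The three reward targets can be illustrated directly from per-trip outcomes. The patent's exact closed forms (involving n_{e,u_j,d}^t, C_e, M, N) are given as formula images not reproduced here, so this sketch implements only the stated objectives, with M and N as the scaling constants mentioned in the text and a trip represented as an (arrival_time, deadline) pair, both assumptions:

```python
def rewards(trips, M=1.0, N=1.0):
    """Compute the three alternative reward values from finished trips.

    trips: list of (arrival_time, deadline) pairs for vehicles that
    completed their journey in the current time step.
    """
    on_time = sum(1 for a, d in trips if a <= d)       # arrived before deadline
    late = sum(1 for a, d in trips if a > d)           # missed the deadline
    overtime = sum(a - d for a, d in trips if a > d)   # total time past deadline
    r_max_ontime = M * on_time     # target (i): maximize on-time arrivals
    r_min_late = -M * late         # target (ii): minimize late arrivals
    r_min_overtime = -N * overtime # target (iii): minimize total overtime
    return r_max_ontime, r_min_late, r_min_overtime
```

An agent would be trained against whichever of the three values matches the chosen optimization target.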
Further, the process of outputting reasonable pricing in step (2) is as follows:
The journey-deadline-sensitive simulated traffic environment model abstracts the traffic flow state information on the urban road network into the state s_t and, combined with the action range provided by the action space, obtains reasonable pricing for the current state.
Further, in step (4), the journey-deadline-sensitive simulated traffic environment model outputs the traffic flow of each road at the current time as a state, derived from the real urban traffic environment, and passes it to the dynamic pricing model; the dynamic pricing model outputs reasonable pricing according to the input state and returns it to the simulated traffic environment model.
The simulated traffic environment model receives the pricing and applies it; the traffic in the real urban traffic environment responds, yielding the next traffic state, and dynamic pricing of the electronic toll roads proceeds.
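The interaction loop of step (4) can be sketched as follows. The `env` and `policy` interfaces (`reset`/`step` and `act`) are assumed names, not the patent's API; the sketch only shows the state-pricing-response cycle described above:

```python
def pricing_loop(env, policy, num_steps):
    """Run the online pricing cycle: environment emits per-road traffic
    state, the trained pricing model returns tolls, the environment
    applies them and the traffic responds with the next state."""
    state = env.reset()             # traffic flow of each road now
    for _ in range(num_steps):
        tolls = policy.act(state)   # dynamic price for each toll road
        state = env.step(tolls)     # apply pricing; traffic responds
    return state
```

In deployment, `env` would wrap the real urban traffic environment and `policy` the trained deep reinforcement learning model.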
Further, in step (2), the distribution generated by the vehicle journey demand is a gaussian distribution.
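Sampling trip demand from a Gaussian distribution, as step (2) prescribes, might look like the sketch below; the mean and standard deviation are assumed to be fitted from the collected real city data, and the clamping and rounding are assumptions to keep demand counts as non-negative integers:

```python
import random

def sample_demand(mean, std, num_od_pairs, seed=None):
    """Draw a demand count for each OD pair from a Gaussian distribution."""
    rng = random.Random(seed)
    # demand counts are non-negative integers, so clamp and round
    return [max(0, round(rng.gauss(mean, std))) for _ in range(num_od_pairs)]
```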
Further, in step (2), the current urban road condition state aggregates vehicles sharing the same trip deadline into a single traffic-flow count, which reduces the complexity of training the deep reinforcement learning model and improves convergence.
Further, in step (3), the deep reinforcement learning model is trained with a multithreaded asynchronous training method, which raises the training speed and accelerates convergence of the pricing policy.
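The multithreaded asynchronous scheme (local agents pushing updates into a global agent, in the style of A3C) can be sketched as below. This is a minimal illustration, not the patent's training code: the `fake_gradient` placeholder stands in for the actual policy gradient of the deep reinforcement learning network, and `DemoEnv` is a toy environment included only so the sketch runs:

```python
import threading

def async_train(global_params, make_env, num_workers=4, steps=100, lr=0.01):
    """Each local worker interacts with its own environment copy and
    asynchronously applies gradient updates to shared global parameters."""
    lock = threading.Lock()

    def worker():
        env = make_env()
        local = list(global_params)              # pull a local copy
        for _ in range(steps):
            grad = env.fake_gradient(local)      # placeholder for policy gradient
            with lock:                           # asynchronous shared update
                for i, g in enumerate(grad):
                    global_params[i] -= lr * g
            local = list(global_params)          # resync with the global agent

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return global_params

class DemoEnv:
    """Toy environment: gradient of 0.5 * sum(p^2), so params shrink to 0."""
    def fake_gradient(self, params):
        return [p for p in params]
```

Running several workers against this toy objective drives the shared parameters toward zero, illustrating how concurrent updates still converge while breaking correlations between each worker's data.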
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following advantages:
(1) It scales to large urban road networks. When a traditional single agent interacts with the environment to learn a policy, the large state space makes convergence difficult. The invention uses multiple local agents and a global agent to learn charging policies asynchronously, breaking correlations between data, raising training speed, and making the policy easier to converge.
(2) The model is more complete. Existing models assume vehicles have no time requirements, or identical ones, ignoring differences in travel time requirements between travelers. This method simulates vehicles' time requirements by adding a deadline to the model and accounts for different vehicles' requirements on arrival time; when modeling travel cost, it introduces a time threshold to capture the behavior of vehicles near their deadline, which choose paths by time cost alone. The model is thus more complete and closer to the real traffic environment.
(3) It helps relieve congestion. On the one hand, dynamic road charging diverts vehicles from highly congested, highly charged roads, relieving congestion. On the other hand, the desired effect can be selected from three different reward functions as needed, giving greater flexibility.
Drawings
FIG. 1 is a reinforcement learning architecture for dynamic pricing of urban toll roads;
FIG. 2 is a flow chart of a journey deadline sensitive electronic toll road dynamic pricing method implemented by the present invention.
Detailed Description
The invention is further elucidated below in connection with the drawings and the specific embodiments.
The journey-deadline-sensitive electronic toll road dynamic pricing method is realized through a journey-deadline-sensitive simulated traffic environment model and a dynamic pricing model. As shown in FIG. 1, the simulated traffic environment model feeds peak-hour vehicle data into the dynamic pricing model, and the electronic toll roads are dynamically priced by the trained dynamic pricing model. The invention provides this method to relieve congestion in the urban peak-hour traffic environment; the flow is shown in FIG. 2. The specific implementation steps are as follows:
(1) Establishing a journey cut-off time sensitive simulated traffic environment model, wherein the simulated traffic environment model comprises an urban road network model, a journey cut-off time model and a journey running cost model; the city road network model is used for establishing a topology structure of the city road network, the journey cut-off time model is used for describing the time requirement of a traveler on journey, and the journey running cost model is used for calculating the cost of a vehicle journey and defining the route selection of the vehicle.
(2) The journey-deadline-sensitive simulated traffic environment model provides the reward value and state-transition information for the reinforcement learning agent; simulated traffic data are generated from real data collected in the city, the distribution of vehicle trip demand is established, the value of each action in the current traffic state is determined, and reasonable pricing is set according to that value;
(3) Train the deep reinforcement learning model offline to obtain a trained dynamic pricing model;
(4) Dynamically price the electronic toll roads in the urban road network using the trained dynamic pricing model.

Claims (4)

1. A method for dynamically pricing electronic toll roads sensitive to trip deadlines, the method comprising the steps of:
(1) Establishing a journey cut-off time sensitive simulated traffic environment model, wherein the simulated traffic environment model comprises an urban road network model, a journey cut-off time model and a journey running cost model; the city road network model is used for establishing a topology structure of the city road network, the journey cut-off time model is used for describing the time requirement of a traveler on journey, and the journey running cost model is used for calculating the cost of the vehicle journey and defining the route selection of the vehicle;
(2) The journey cut-off time sensitive simulated traffic environment model gives the rewarding value and state transition information of the reinforcement learning intelligent agent; generating simulated traffic data through the collected real data in the city, establishing the distribution of vehicle journey demands, determining the value of actions corresponding to the state through the state of the road condition of the current city, and determining pricing according to the value;
(3) Offline training and learning are carried out by using the deep reinforcement learning model, and a trained dynamic pricing model is obtained;
(4) Dynamically pricing the electronic toll roads in the urban road network by using the trained dynamic pricing model;
in the step (1), the road network model in the journey cut-off time sensitive simulated traffic environment model is described as follows:
The urban road network is abstracted as a directed graph G = &lt;O, E, U&gt;. O = {OD_1, OD_2, …, OD_R} is the set of trip origin-destination pairs, where each OD_r = &lt;u_k, u_j, q_{k,j}, P_{k,j}&gt; is an origin-destination quadruple: u_k is the trip origin, u_j the trip destination, q_{k,j} the generated travel demand, and P_{k,j} the set of all loop-free paths from origin to destination. E = {e_1, e_2, …, e_m} is the set of roads in the city and U = {u_1, u_2, …, u_n} the set of zones in the city. H denotes the horizon over which the reinforcement learning agent makes decisions; H is divided into an integer number of time intervals of length τ, and t denotes the index of the current interval;
In step (1), the journey deadline model is described as follows: the trip deadline d characterizes the time requirement of a user's trip, i.e., the latest time by which the vehicle should complete the trip;
In step (1), the travel cost model is described as follows:
The travel cost model sums the time cost of the trip and the tolls charged on the roads it traverses; this cost guides each vehicle's route choice and thereby influences the traffic state. The time cost is tied to the trip deadline: the deadline d models the traveler's time requirement, i.e., the vehicle should reach its destination before time d. The decision horizon is H minutes, and the vehicle trip deadlines d = 0, 1, …, H are randomly distributed. The monetary cost depends on the driving path: the path is denoted by p, a road on the path by e, and the current time step by t; each road e charges a toll at time step t, and the monetary cost is the sum of the tolls of the roads the path traverses;
The travel cost of a trip with deadline d from region u_k to region u_j along path p is computed according to whether the trip has a deadline. If d = 0, the vehicle has no trip deadline, and its travel cost is determined jointly by the monetary cost and the time cost, where ω denotes the value of one unit of the vehicle's travel time and the travel time of the vehicle on road e at time step t contributes to the time cost. If d ≠ 0, the cost is computed from the difference x between the trip deadline and the current time (the remaining time) using a threshold D, the acceptable time margin of the vehicle: if x is greater than D, the travel cost is determined jointly by the monetary cost and the time cost; if x is smaller than D, the travel cost is determined by the time cost alone. The specific calculation formula of the vehicle travel cost is as follows:
More specifically, when x > D the travel cost increases gradually as time passes; when x < D the travel cost is determined by the time cost alone, which grows exponentially as the current time approaches the deadline;
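Since the patent gives the travel-cost formula only as an image, the piecewise structure described above can be sketched as follows; the exponential growth rate `beta` and the exact functional forms are assumptions for illustration, not the patent's equation.

```python
import math

def travel_cost(tolls, travel_times, d, t, D, omega, beta=1.0):
    """Hedged sketch of the piecewise travel cost described in the text.

    tolls        -- toll charged on each road of the path at the current step
    travel_times -- travel time on each road of the path
    d            -- trip deadline (d == 0 means no deadline)
    t            -- current time
    D            -- acceptable time threshold of the vehicle
    omega        -- value of one unit of travel time
    beta         -- assumed exponential growth rate (not from the patent)
    """
    money = sum(tolls)                     # monetary cost: sum of road tolls
    time_cost = omega * sum(travel_times)  # time cost: value of time spent
    if d == 0:
        return money + time_cost           # no deadline: both costs apply
    x = d - t                              # time remaining before the deadline
    if x > D:
        return money + time_cost           # far from deadline: both costs
    # close to the deadline: time cost alone, growing exponentially
    return time_cost * math.exp(beta * (D - x))

print(travel_cost([1, 2], [3, 4], d=0, t=0, D=10, omega=0.5))  # 3 + 0.5*7 = 6.5
```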
In step (2), the reward value is computed by a reward function based on the travelers' time requirements and the degree of congestion relief; the reward is the feedback on the action executed by the reinforcement learning agent and helps the agent correct its actions. The reward function is determined by the agent's optimization objective and can be computed in three ways: a reward that maximizes the number of vehicles reaching the destination before the trip deadline, a reward that minimizes the number of vehicles failing to reach the destination before the trip deadline, or a reward that minimizes the total time by which vehicles exceed the trip deadline in reaching the destination;
The reward aimed at maximizing the number of vehicles arriving at the destination before the trip deadline is calculated as follows:
The reward aimed at minimizing the number of vehicles that fail to reach the destination before the trip deadline is calculated as follows:
The reward aimed at minimizing the total time by which vehicles exceed the trip deadline in reaching the destination is calculated as follows:
In these formulas, the count of vehicles traveling on road e at time step t whose destination is zone u_j and whose trip deadline is d is used; the variable τ is the length of one time step; the variable u_k is the origin of the vehicle trip; the free-flow travel time of a vehicle on road e (its travel time without congestion) also appears; C_e denotes the capacity of road e; and M and N are constants;
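The three reward formulas are likewise given only as images in the original; a minimal sketch of the three objectives, assuming simple linear forms with the constants M, N and the step length τ, might look like:

```python
def reward_max_arrivals(arrived_on_time, M=1.0):
    """Objective 1: maximize vehicles reaching the destination before the deadline."""
    return M * arrived_on_time

def reward_min_late(late_vehicles, N=1.0):
    """Objective 2: minimize vehicles that missed the deadline (negative reward)."""
    return -N * late_vehicles

def reward_min_overtime(overtimes, tau=1.0):
    """Objective 3: minimize the total time by which vehicles exceed their deadlines."""
    return -tau * sum(overtimes)

print(reward_max_arrivals(5))       # 5.0
print(reward_min_late(3))           # -3.0
print(reward_min_overtime([2, 3]))  # -5.0
```

These are assumed stand-ins for the omitted equations; the actual reward functions in the patent may weight terms differently.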
In step (4), the journey-deadline-sensitive simulated traffic environment model outputs the traffic flow of each road at the current time as the state, according to the real urban traffic environment, and passes this state to the dynamic pricing model; the dynamic pricing model outputs a price for the input state and returns it to the simulated traffic environment model;
the simulated traffic environment model receives the price and applies it; the real urban traffic environment responds, yielding the next traffic state, and in this way the electronic toll roads are priced dynamically.
2. The method for dynamically pricing electronic toll roads sensitive to trip deadlines according to claim 1, wherein in step (2) the state of the current city road conditions is three-dimensional and expressed as s_t = (e, u_j, d): the road e on which a vehicle is traveling, the destination u_j of the trip, and the trip deadline d. The traffic state at the current time step t is the number of vehicles on road e whose destination is u_j and whose trip deadline is d.
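The three-dimensional state s_t = (e, u_j, d) can be sketched as a count array; the dimension sizes used here are hypothetical.

```python
import numpy as np

# Hypothetical sizes: 3 roads, 2 destination regions, 4 possible deadline values
n_roads, n_dests, n_deadlines = 3, 2, 4

# Entry [e, u_j, d] counts the vehicles on road e whose destination is
# region u_j and whose trip deadline is d.
state = np.zeros((n_roads, n_dests, n_deadlines), dtype=int)

# e.g. two vehicles on road 0, bound for region 1, with deadline index 3
state[0, 1, 3] += 2
print(state.sum())  # 2
```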
3. The method for dynamically pricing electronic toll roads sensitive to trip deadlines according to claim 1, wherein in step (2) the process of outputting a reasonable price according to the state of the current urban road conditions specifically comprises:
the journey-deadline-sensitive simulated traffic environment model converts the traffic flow information on the urban road network into a state recognizable by the pricing model, and the model combines this state with the action range provided by the action space to obtain the price for the current state.
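A minimal sketch of turning a recognized state into a price within the action range; the tabular value function used here is an assumption — the patent uses a deep reinforcement learning model, for which a Q-table is only a stand-in.

```python
import numpy as np

def price_from_state(state, q_table, price_levels):
    """Pick the price whose estimated action value is highest for this state.

    state        -- hashable encoding of the current traffic state
    q_table      -- dict mapping (state, action index) -> learned action value
    price_levels -- discrete prices allowed by the action space
    """
    values = [q_table.get((state, a), 0.0) for a in range(len(price_levels))]
    return price_levels[int(np.argmax(values))]

prices = [0.0, 0.5, 1.0, 2.0]
q = {(("e1", "u3", 2), 2): 1.5}   # toy learned value favouring price 1.0
print(price_from_state(("e1", "u3", 2), q, prices))  # 1.0
```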
4. The method for dynamically pricing electronic toll roads sensitive to trip deadlines according to claim 1, wherein in step (2) the distribution of vehicle trip demand is a Gaussian distribution; the state of the current urban road conditions takes the statistical count of vehicles sharing the same trip deadline as the traffic flow to be processed; and the deep reinforcement learning model is trained with a multithreaded asynchronous training method.
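The multithreaded asynchronous training mentioned in claim 4 (in the style of asynchronous actor-critic methods) can be illustrated with workers updating shared parameters; the toy gradient and update rule here are assumptions, not the patent's training procedure.

```python
import threading

# Shared parameters updated asynchronously by several workers; real training
# would compute policy/value gradients, while each worker here simply
# accumulates a toy gradient under a lock.
shared = {"theta": 0.0}
lock = threading.Lock()

def worker(n_updates, grad=0.01):
    for _ in range(n_updates):
        with lock:                  # asynchronous but race-free update
            shared["theta"] += grad

threads = [threading.Thread(target=worker, args=(100,)) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(round(shared["theta"], 2))  # 4 workers * 100 updates * 0.01 = 4.0
```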
CN202110303725.8A 2021-03-22 2021-03-22 Electronic toll road dynamic pricing method sensitive to journey deadline Active CN112907296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110303725.8A CN112907296B (en) 2021-03-22 2021-03-22 Electronic toll road dynamic pricing method sensitive to journey deadline


Publications (2)

Publication Number Publication Date
CN112907296A CN112907296A (en) 2021-06-04
CN112907296B true CN112907296B (en) 2024-05-24

Family

ID=76106699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110303725.8A Active CN112907296B (en) 2021-03-22 2021-03-22 Electronic toll road dynamic pricing method sensitive to journey deadline

Country Status (1)

Country Link
CN (1) CN112907296B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139463A (en) * 2015-09-01 2015-12-09 罗毅青 Big data urban road pricing method and system
CN107093216A (en) * 2017-04-14 2017-08-25 广州地理研究所 Urban traffic blocking charging method and device based on vehicle electron identifying
CN112508356A (en) * 2020-11-23 2021-03-16 广州大学 Shared automobile balancing method based on reinforcement learning model



Similar Documents

Publication Publication Date Title
CN104157139B (en) A kind of traffic congestion Forecasting Methodology and method for visualizing
Lin et al. Efficient network-wide model-based predictive control for urban traffic networks
CN105070042B (en) A kind of modeling method of traffic forecast
CN106096756A (en) A kind of urban road network dynamic realtime Multiple Intersections routing resource
CN104408948B (en) Vehicle-mounted-GPS-based public transport priority signal control method of urban road traffic
CN107591004A (en) A kind of intelligent traffic guidance method based on bus or train route collaboration
CN105930914A (en) City bus optimal charging structure charge determination method based on origin-destination distance
Kaddoura Marginal congestion cost pricing in a multi-agent simulation investigation of the greater Berlin area
Chen et al. Real-time information feedback based on a sharp decay weighted function
Selten et al. Experimental investigation of day-to-day route-choice behaviour and network simulations of autobahn traffic in North Rhine-Westphalia
Manasra et al. Optimization-based operations control for public transportation service with transfers
CN113516277A (en) Network connection intelligent traffic path planning method based on dynamic pricing of road network
Pandey et al. Multiagent reinforcement learning algorithm for distributed dynamic pricing of managed lanes
CN111985814A (en) Method and system for optimizing inter-city train operation scheme with intermittent power supply
CN115675584A (en) Urban area line driving scheme optimization method for urban rail transit
Li et al. A bibliometric analysis and review on reinforcement learning for transportation applications
Zheng et al. A novel approach to coordinating green wave system with adaptation evolutionary strategy
CN113191028B (en) Traffic simulation method, system, program, and medium
Cui et al. Dynamic pricing for fast charging stations with deep reinforcement learning
CN112907296B (en) Electronic toll road dynamic pricing method sensitive to journey deadline
Xueyu et al. Research on the Bi-level programming model for ticket fare pricing of urban rail transit based on particle swarm optimization algorithm
Safirova et al. Choosing congestion pricing policy: Cordon tolls versus link-based tolls
Yu et al. Optimization of urban bus operation frequency under common route condition with rail transit
Vranken et al. Performance comparison of dynamic vehicle routing methods for minimizing the global dwell time in upcoming smart cities
Li et al. Traffic flow guidance and optimization of connected vehicles based on swarm intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant