CN114240002A - Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning - Google Patents

Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning

Info

Publication number
CN114240002A
CN114240002A CN202210028133.4A
Authority
CN
China
Prior art keywords
bus
vehicle
station
departure
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210028133.4A
Other languages
Chinese (zh)
Inventor
伦嘉铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202210028133.4A priority Critical patent/CN114240002A/en
Publication of CN114240002A publication Critical patent/CN114240002A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment
    • G06Q50/40

Abstract

The invention belongs to the technical field of intelligent bus dispatching systems and discloses a dynamic optimization algorithm for bus departure timetables based on deep reinforcement learning, in which a deep reinforcement learning method is introduced when optimizing the bus departure timetable. The deep reinforcement learning is based on a simulation model that accounts for the randomness of inter-stop travel times caused by factors such as road conditions, traffic lights, and weather, and adds passenger-flow characteristics generated from passenger demand and OD rules, remedying the lack of passenger-flow modeling and performance-evaluation functions in classical commercial traffic simulation software. The invention considers complex passenger flow when dynamically optimizing the bus departure timetable and establishes a bus-operation simulation model that combines passenger flow with traffic flow; the PPO algorithm can capture the complex environment state in real time and quickly generate a bus departure timetable, constituting a dynamic optimization strategy with high robustness in the face of a complex environment.

Description

Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning
Technical Field
The invention relates to the technical field of intelligent bus dispatching systems, in particular to a dynamic optimization algorithm of a bus departure schedule based on deep reinforcement learning.
Background
The mainstream approach to bus departure timetabling at home and abroad is operations-research modeling: an objective function combining one or more indicators, such as minimum passenger waiting time, shortest transfer time, or lowest bus operating cost, is established, and the departure times or headways of one or more bus routes are computed optimally. Scheduling optimization based on a traffic simulation model, by contrast, has the advantage of realistically reproducing the dynamic behavior of traffic flow and passenger flow. Classical traffic simulation software such as Aimsun, Vissim, and Paramics can capture traffic-flow characteristics, but lacks passenger-flow modeling and performance-evaluation functions.
Many factors (such as mixed traffic flow, passenger flow, and traffic lights) must be considered when solving the bus scheduling optimization problem. In most existing research on bus dispatching optimization, whether the static problem is solved with traditional operations-research modeling or the dynamic problem with machine learning methods, the solutions obtained are unsatisfactory because of the short decision horizon, insufficient exploration of the problem structure, and the influence of numerous uncertain environmental factors.
In summary, there is no simulation-based scheduling optimization research that links passenger flow to bus operation through simulation and thereby systematically represents the combination of traffic flow and passenger flow.
Disclosure of Invention
The invention aims to remedy the lack, in the prior art, of simulation-based scheduling optimization research that links passenger flow to bus operation through simulation and thereby systematically combines traffic flow and passenger flow, and provides a dynamic optimization algorithm for bus departure timetables based on deep reinforcement learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
the bus departure timetable dynamic optimization algorithm based on deep reinforcement learning introduces a deep reinforcement learning method when optimizing a bus departure timetable, and comprises the following steps: s1, deep reinforcement learning is based on a simulation model, randomness that travel time between buses is possibly influenced by road conditions, traffic lights, weather and other factors is considered, passenger flow characteristics generated according to passenger requirements and OD rules are added, the defects of the conventional commercial traffic simulation software due to lack of functions of modeling and performance evaluation of passenger flow are overcome, the traffic flow and the passenger flow are expressed, and an actual bus system can be reflected more truly; s2, the reinforcement learning is combined with the principle of dynamic planning and supervised learning, complex scenes can be processed, and the reinforcement learning method has the capability of real-time learning and lifelong learning, so that the reinforcement learning method is very suitable for solving the problem of bus scheduling, and the reinforcement learning method can learn the optimal decision function by using reinforcement signals only by giving a group of feasible actions (Action) and the current bus system State (State), and further make the optimal decision Action.
Further, the reinforcement learning system involves two subjects: the agent (Agent) and the environment (Environment). The bus system has many possible complex states; taking a vehicle as the object, six features are selected to form the state set $s_t = \{t_t, k_t, j_t, l_t, b_t, p_t\}$, where $t_t$ is the time at time step $t$; $k_t$, $j_t$, and $l_t$ respectively denote the vehicle's travel direction, the inter-stop segment it occupies, and its distance to the next stop; $b_t$ is the remaining capacity of the vehicle cabin; and $p_t$ is the total number of passengers waiting at all stops between the current vehicle and the vehicle ahead of it. The agent has two possible actions: deciding to 'depart' or 'not depart' from the depot. State prediction and action decisions are then made at fixed time intervals over a future horizon, realizing dynamic optimization of the bus departure timetable.
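To make the state and action definitions concrete, the following is a minimal sketch (not taken from the patent itself) of how the six-feature state set and the binary depot action could be encoded; all names (BusState, Action, to_vector) are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum

class Action(IntEnum):
    HOLD = 0     # "not depart": keep the next vehicle at the depot
    DEPART = 1   # "depart": dispatch the next vehicle

@dataclass
class BusState:
    """Six-feature state s_t = {t_t, k_t, j_t, l_t, b_t, p_t} for one vehicle."""
    t: float            # t_t: clock time at time step t
    direction: int      # k_t: travel direction (0 = up, 1 = down)
    segment: int        # j_t: index of the inter-stop segment occupied
    dist_to_next: float # l_t: remaining distance to the next stop (meters)
    residual_cap: int   # b_t: remaining capacity of the vehicle cabin
    waiting_ahead: int  # p_t: passengers waiting at all stops between this
                        #      vehicle and the vehicle in front of it

    def to_vector(self) -> list[float]:
        # Flatten into the feature vector fed to the network input layer.
        return [self.t, self.direction, self.segment,
                self.dist_to_next, self.residual_cap, self.waiting_ahead]
```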
Furthermore, the agent corresponds to a bus dispatcher, and the environment is built on a bus-operation simulation model. To run the simulation, besides preset data such as route and stop information, the number of vehicles at the depot, the departure timetable, and the departure type, a vehicle travel-time law and a passenger-flow OD law are required. Bus travel time is affected by factors such as road conditions, traffic lights, and passengers boarding and alighting at stops, while passenger OD is affected by factors such as weather, travel behavior, and time of day; both are rather complex random variables. To fit their distributions, kernel density estimation (KDE) is adopted to generate the random numbers required by the simulation.
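As a hedged illustration of the KDE step, the sketch below fits a kernel density estimate to observed link travel times and draws random samples for the simulation; scipy's gaussian_kde is one possible implementation choice, and the sample data are invented.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Observed travel times (minutes) on one link, e.g. from AVL logs.
# These values are illustrative only.
observed = np.array([4.1, 4.5, 5.0, 5.2, 4.8, 6.3, 5.5, 4.9, 7.1, 5.1])

# Fit a kernel density estimate to the empirical distribution.
kde = gaussian_kde(observed)

# Draw the random travel times the simulation consumes; clip to stay positive.
samples = np.clip(kde.resample(size=1000)[0], a_min=0.5, a_max=None)
print(samples[:5])
```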
Furthermore, the bus-operation simulation model consists of three parts: departure from the origin terminal, vehicle arrival at and departure from stops, and passenger boarding and alighting. When a vehicle arrives at a stop, it queues to enter, opens its doors, lets passengers alight and board, closes its doors, and then leaves; this service process can be expressed as:

$t^{d}_{k,i,j} = t^{a}_{k,i,j} + \beta + \max\left(\beta_b B_{k,i,j},\ \beta_o O_{k,i,j}\right)$

where $t^{a}_{k,i,j}$ and $t^{d}_{k,i,j}$ respectively denote the times at which trip $i$ in direction $k$ arrives at and departs from stop $j$; $\beta$ denotes the door opening and closing time; $\beta_b$ and $\beta_o$ respectively denote the average time spent per passenger boarding and alighting; and $B_{k,i,j}$ and $O_{k,i,j}$ respectively denote the numbers of passengers boarding and alighting while the vehicle dwells at stop $j$ in direction $k$.

The number of passengers waiting at a stop must include those left behind by the preceding trip because they failed to board:

$W_{k,i+1,j} = L_{k,i,j} + A_{k,i,j}$

where $W_{k,i+1,j}$ is the total number waiting at stop $j$ in direction $k$ for trip $i+1$; $L_{k,i,j}$ is the number of passengers stranded at stop $j$ because they could not board when trip $i$ in direction $k$ arrived; and $A_{k,i,j}$ is the number of passengers arriving at stop $j$ between trips $i$ and $i+1$ in direction $k$.

The number of passengers boarding at a stop is limited by the vehicle's carrying capacity:

$B_{k,i,j} = \min\left(W_{k,i,j},\ m - C_{k,i,j-1} + O_{k,i,j}\right)$

where $m$ denotes the rated passenger capacity of the vehicle and $C_{k,i,j-1}$ denotes the onboard load when trip $i$ leaves stop $j-1$ in direction $k$.

The number of passengers on board is computed as:

$C_{k,i,j} = C_{k,i,j-1} + B_{k,i,j} - O_{k,i,j}$

The number of passengers stranded at the stop is computed as:

$L_{k,i,j} = W_{k,i,j} - B_{k,i,j}$

where $W_{k,1,j}$, the waiting count seen by the first trip, is the number of passengers that have arrived at stop $j$ in direction $k$ before the first trip reaches the stop.
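The following sketch (an assumption-laden restatement, not the patent's code) implements the stop-service equations above for a single trip at one stop: dwell time, capacity-limited boarding, onboard-load update, and stranded passengers; the default timing parameters are invented.

```python
def serve_stop(t_arr, waiting, onboard, alighting, capacity,
               beta=5.0, beta_b=2.0, beta_o=1.5):
    """One stop-service event for trip i at stop j in direction k.

    t_arr     : arrival time t^a (seconds)
    waiting   : W, passengers waiting when the vehicle arrives
    onboard   : C_{j-1}, load when the vehicle left the previous stop
    alighting : O, passengers getting off at this stop
    capacity  : m, rated passenger capacity
    beta, beta_b, beta_o: door time and per-passenger boarding/alighting
                          times (illustrative defaults)
    """
    # B = min(W, m - C_{j-1} + O): boarding limited by residual capacity.
    boarding = max(0, min(waiting, capacity - onboard + alighting))
    # C_j = C_{j-1} + B - O: updated onboard load.
    onboard_after = onboard + boarding - alighting
    # L = W - B: passengers stranded because they could not board.
    stranded = waiting - boarding
    # t^d = t^a + beta + max(beta_b * B, beta_o * O): departure time.
    t_dep = t_arr + beta + max(beta_b * boarding, beta_o * alighting)
    return t_dep, onboard_after, stranded
```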
Furthermore, the execution of each departure action can affect the effects of other actions. Moreover, the effect of each executed action on the system is usually delayed, because the vehicle must reach the corresponding stop before it can serve passengers. During operation, because of passenger flow, road conditions, and similar factors, a bus that has finished its earlier trips often returns to the origin or terminal stop with a deviation from the pre-planned timetable, so that no vehicle may be available to execute the next scheduled departure. If the agent still issues a departure action when no vehicle is at the depot, a corresponding penalty should be incurred to better guide the agent toward the target state.
Further, the reward function (Reward function) consists of two parts: the average passenger waiting time and the penalty for erroneous departure instructions:

$R(\tau) = \sum_{t=0}^{T} \gamma^{t}\left(-\mathrm{AWT}_t - p_t\right)$

where the time step $t$ ranges over $[0, T]$; $\tau = \{s_1, a_1, s_2, a_2, \ldots, s_T, a_T\}$ is the trajectory of states and actions over one episode; $\gamma$ is the discount factor with range $(0, 1]$; $p_t$ is the penalty incurred by the agent at time step $t$ for an erroneous departure instruction; and AWT is the average passenger waiting time. To maximize this objective, an artificial neural network is designed with a structure comprising an input layer, an actor (Actor) network, and a critic (Critic) network; the complex mathematical structure of an artificial neural network gives it an advantage in handling a complex bus system with stochastic traffic flow and passenger flow. The state sequence $(s_t, s_{t+1}, \ldots, s_T)$ serves as the input layer and is fed into the hidden layers of the actor and critic networks; the actor network makes decision actions according to the current state of the bus system, while the critic network estimates the state-value function $V(s_t)$, from which the advantage function $A(s_t, a_t)$ is computed and used in updating the actor network's parameters.
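As a hedged sketch of the actor-critic architecture just described (the patent only specifies an input layer feeding actor and critic networks; layer sizes, activations, and PyTorch as the framework are assumptions):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared input layer feeding separate actor and critic heads."""

    def __init__(self, state_dim: int = 6, hidden: int = 64, n_actions: int = 2):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        # Actor: probability over {hold, depart} given the bus-system state.
        self.actor = nn.Sequential(nn.Linear(hidden, n_actions), nn.Softmax(dim=-1))
        # Critic: scalar estimate of the state-value function V(s_t).
        self.critic = nn.Linear(hidden, 1)

    def forward(self, state: torch.Tensor):
        h = self.shared(state)
        return self.actor(h), self.critic(h)
```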
In summary, the invention includes at least one of the following beneficial technical effects:
1. training data are collected through bus-operation simulation, and an artificial neural network is trained with the Proximal Policy Optimization (PPO) algorithm; the algorithm can capture the complex environment state in real time and dynamically optimize the bus departure timetable;
2. after training, the PPO-trained network solves the departure timetable more efficiently than traditional methods; more importantly, the PPO algorithm can capture the complex environment state in real time and quickly generate a bus departure timetable, making it a dynamic optimization strategy (a sketch of the clipped PPO update appears below). In addition, the performance of heuristic algorithms is sensitive to fine-tuning of the underlying parameter settings and adapts poorly to new environments, whereas the invention exhibits higher robustness in the face of a complex environment.
Drawings
FIG. 1 is a learning process diagram illustrating reinforcement learning according to the present invention;
FIG. 2 is a flow chart of bus simulation operation according to the present invention;
FIG. 3 is a diagram illustrating an Actor-Critic network structure according to the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention; it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work are within the scope of the present invention.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inner", "outer", and "top/bottom" indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience and simplicity of description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus are not to be construed as limiting the invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "mounted", "disposed", "sleeved", "connected", and the like are to be construed broadly: for example, "connected" may mean fixedly connected, detachably connected, or integrally connected; mechanically or electrically connected; directly connected, or indirectly connected through an intermediate medium, or communicating between the interiors of two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art in light of specific circumstances.
The present invention will be described in further detail with reference to the accompanying drawings.
Referring to FIGS. 1-3, the bus departure timetable dynamic optimization algorithm based on deep reinforcement learning introduces a deep reinforcement learning method when optimizing the bus departure timetable, and comprises the following steps: S1, the deep reinforcement learning is based on a simulation model that accounts for the randomness of inter-stop travel times caused by factors such as road conditions, traffic lights, and weather, and adds passenger-flow characteristics generated from passenger demand and OD rules; this remedies the lack of passenger-flow modeling and performance-evaluation functions in existing commercial traffic simulation software, represents both traffic flow and passenger flow, and reflects an actual bus system more faithfully;
S2, reinforcement learning combines the principles of dynamic programming and supervised learning, can handle complex scenarios, and is capable of real-time and lifelong learning, which makes it well suited to the bus scheduling problem: given only a set of feasible actions (Action) and the current bus system state (State), it learns the optimal decision function from reinforcement signals and thereby takes the optimal decision action.
The reinforcement learning system involves two subjects: the agent (Agent) and the environment (Environment). The bus system has many possible complex states; taking a vehicle as the object, six features are selected to form the state set $s_t = \{t_t, k_t, j_t, l_t, b_t, p_t\}$, where $t_t$ is the time at time step $t$; $k_t$, $j_t$, and $l_t$ respectively denote the vehicle's travel direction, the inter-stop segment it occupies, and its distance to the next stop; $b_t$ is the remaining capacity of the vehicle cabin; and $p_t$ is the total number of passengers waiting at all stops between the current vehicle and the vehicle ahead of it. The agent has two possible actions: deciding to 'depart' or 'not depart' from the depot. State prediction and action decisions are then made at fixed time intervals over a future horizon, realizing dynamic optimization of the bus departure timetable. The agent corresponds to a bus dispatcher, and the environment is built on a bus-operation simulation model; to run the simulation, besides preset data such as route and stop information, the number of vehicles at the depot, the departure timetable, and the departure type, a vehicle travel-time law and a passenger-flow OD law are required. Bus travel time is affected by factors such as road conditions, traffic lights, and passengers boarding and alighting at stops, while passenger OD is affected by factors such as weather, travel behavior, and time of day; both are rather complex random variables, and kernel density estimation (KDE) is adopted to fit their distributions and generate the random numbers required by the simulation.
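To illustrate the agent-environment interaction described above, the hedged sketch below shows the simulation's decision loop at fixed time intervals, including the penalty applied when the agent issues a departure instruction while no vehicle is available at the depot; the entire env/agent interface (state, depot_has_vehicle, dispatch, advance, average_waiting_time, act) is an assumption.

```python
def run_episode(env, agent, dt=60.0, horizon=18 * 3600, depart_penalty=10.0):
    """Advance the bus-operation simulation in dt-second decision steps."""
    total_reward, t = 0.0, 0.0
    while t < horizon:
        state = env.state()                 # six-feature state set s_t
        action = agent.act(state)           # 0 = hold, 1 = depart
        penalty = 0.0
        if action == 1:
            if env.depot_has_vehicle():
                env.dispatch()              # add the next trip to the timetable
            else:
                penalty = depart_penalty    # erroneous instruction: no vehicle
        env.advance(dt)                     # simulate buses and passenger flow
        total_reward += -(env.average_waiting_time() + penalty)
        t += dt
    return total_reward
```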
The bus-operation simulation model consists of three parts: departure from the origin terminal, vehicle arrival at and departure from stops, and passenger boarding and alighting. When a vehicle arrives at a stop, it queues to enter, opens its doors, lets passengers alight and board, closes its doors, and then leaves; this service process can be expressed as:

$t^{d}_{k,i,j} = t^{a}_{k,i,j} + \beta + \max\left(\beta_b B_{k,i,j},\ \beta_o O_{k,i,j}\right)$

where $t^{a}_{k,i,j}$ and $t^{d}_{k,i,j}$ respectively denote the times at which trip $i$ in direction $k$ arrives at and departs from stop $j$; $\beta$ denotes the door opening and closing time; $\beta_b$ and $\beta_o$ respectively denote the average time spent per passenger boarding and alighting; and $B_{k,i,j}$ and $O_{k,i,j}$ respectively denote the numbers of passengers boarding and alighting while the vehicle dwells at stop $j$ in direction $k$.

The number of passengers waiting at a stop must include those left behind by the preceding trip because they failed to board:

$W_{k,i+1,j} = L_{k,i,j} + A_{k,i,j}$

where $W_{k,i+1,j}$ is the total number waiting at stop $j$ in direction $k$ for trip $i+1$; $L_{k,i,j}$ is the number of passengers stranded at stop $j$ because they could not board when trip $i$ in direction $k$ arrived; and $A_{k,i,j}$ is the number of passengers arriving at stop $j$ between trips $i$ and $i+1$ in direction $k$.

The number of passengers boarding at a stop is limited by the vehicle's carrying capacity:

$B_{k,i,j} = \min\left(W_{k,i,j},\ m - C_{k,i,j-1} + O_{k,i,j}\right)$

where $m$ denotes the rated passenger capacity of the vehicle and $C_{k,i,j-1}$ denotes the onboard load when trip $i$ leaves stop $j-1$ in direction $k$.

The number of passengers on board is computed as:

$C_{k,i,j} = C_{k,i,j-1} + B_{k,i,j} - O_{k,i,j}$

The number of passengers stranded at the stop is computed as:

$L_{k,i,j} = W_{k,i,j} - B_{k,i,j}$

where $W_{k,1,j}$, the waiting count seen by the first trip, is the number of passengers that have arrived at stop $j$ in direction $k$ before the first trip reaches the stop.
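Building on the single-stop sketch given earlier, this self-contained, hedged example chains the same stop-service equations along all stops of one trip, propagating onboard load and stranded passengers; the inter-stop travel times would come from the fitted KDE, and all parameter values are illustrative assumptions.

```python
def run_trip(t_depart, waiting, alighting, link_times,
             capacity=80, beta=5.0, beta_b=2.0, beta_o=1.5):
    """Simulate one trip along the route with the stop-service equations.

    waiting[j]   : W, total passengers waiting when the trip reaches stop j
                   (stranded from the previous trip plus new arrivals)
    alighting[j] : O, passengers getting off at stop j
    link_times[j]: travel time to stop j, e.g. drawn from the fitted KDE
    """
    t, onboard, stranded = t_depart, 0, []
    for j, travel in enumerate(link_times):
        t += travel                                                   # t^a at stop j
        boarding = max(0, min(waiting[j],
                              capacity - onboard + alighting[j]))     # B, capacity-limited
        onboard += boarding - alighting[j]                            # C update
        stranded.append(waiting[j] - boarding)                        # L, carried to next trip
        t += beta + max(beta_b * boarding, beta_o * alighting[j])     # t^d at stop j
    return t, stranded
```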
The execution of each departure action can affect the effects of other actions; moreover, the effect of each executed action on the system is usually delayed, because the vehicle must reach the corresponding stop before it can serve passengers. During operation, because of passenger flow, road conditions, and similar factors, a bus that has finished its earlier trips often returns to the origin or terminal stop with a deviation from the pre-planned timetable, so that no vehicle may be available to execute the next scheduled departure; if the agent still issues a departure action when no vehicle is at the depot, a corresponding penalty should be incurred to better guide the agent toward the target state.
The reward function (Reward function) consists of two parts: the average passenger waiting time and the penalty for erroneous departure instructions:

$R(\tau) = \sum_{t=0}^{T} \gamma^{t}\left(-\mathrm{AWT}_t - p_t\right)$

where the time step $t$ ranges over $[0, T]$; $\tau = \{s_1, a_1, s_2, a_2, \ldots, s_T, a_T\}$ is the trajectory of states and actions over one episode; $\gamma$ is the discount factor with range $(0, 1]$; $p_t$ is the penalty incurred by the agent at time step $t$ for an erroneous departure instruction; and AWT is the average passenger waiting time. To maximize this objective, an artificial neural network is designed with a structure comprising an input layer, an actor (Actor) network, and a critic (Critic) network; the complex mathematical structure of an artificial neural network gives it an advantage in handling a complex bus system with stochastic traffic flow and passenger flow. The state sequence $(s_t, s_{t+1}, \ldots, s_T)$ serves as the input layer and is fed into the hidden layers of the actor and critic networks; the actor network makes decision actions according to the current state of the bus system, while the critic network estimates the state-value function $V(s_t)$, from which the advantage function $A(s_t, a_t)$ is computed and used in updating the actor network's parameters.
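For completeness, a hedged sketch of computing the discounted returns and the advantages $A(s_t, a_t) = R_t - V(s_t)$ that feed the actor update; a simple Monte-Carlo return estimate is assumed here, since the patent does not specify the estimator.

```python
def returns_and_advantages(rewards, values, gamma=0.99):
    """Monte-Carlo discounted returns R_t and advantages A_t = R_t - V(s_t).

    rewards: per-step rewards -(AWT_t + p_t); values: critic estimates V(s_t).
    """
    returns, acc = [], 0.0
    for r in reversed(rewards):        # accumulate from the episode's end
        acc = r + gamma * acc
        returns.append(acc)
    returns.reverse()
    return returns, [g - v for g, v in zip(returns, values)]
```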
In summary, the invention optimizes the bus departure timetable with a bus-operation simulation model that combines traffic flow and passenger flow, and therefore better matches actual bus operating conditions. Second, the artificial neural network is trained with the PPO algorithm, and after training it solves the departure timetable more efficiently than traditional methods; more importantly, the PPO algorithm can capture the complex environment state in real time and quickly generate a bus departure timetable, a dynamic optimization strategy that traditional methods cannot match. In addition, the performance of heuristic algorithms is sensitive to fine-tuning of the underlying parameter settings and adapts poorly to new environments, whereas the invention exhibits higher robustness in the face of a complex environment.
The above are all preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereby; all equivalent changes made according to the structure, shape, and principle of the invention are covered by the protection scope of the invention.

Claims (6)

1. The bus departure timetable dynamic optimization algorithm based on deep reinforcement learning, characterized in that a deep reinforcement learning method is introduced when optimizing the bus departure timetable, comprising the following steps: S1, the deep reinforcement learning is based on a simulation model that accounts for the randomness of inter-stop travel times caused by factors such as road conditions, traffic lights, and weather, and adds passenger-flow characteristics generated from passenger demand and OD rules; this remedies the lack of passenger-flow modeling and performance-evaluation functions in existing commercial traffic simulation software, represents both traffic flow and passenger flow, and reflects an actual bus system more faithfully; S2, reinforcement learning combines the principles of dynamic programming and supervised learning, can handle complex scenarios, and is capable of real-time and lifelong learning, which makes it well suited to the bus scheduling problem: given only a set of feasible actions (Action) and the current bus system state (State), it learns the optimal decision function from reinforcement signals and thereby takes the optimal decision action.
2. The bus departure timetable dynamic optimization algorithm based on deep reinforcement learning of claim 1, characterized in that: the reinforcement learning system involves two subjects, the agent (Agent) and the environment (Environment); the bus system has many possible complex states, and, taking a vehicle as the object, six features are selected to form the state set $s_t = \{t_t, k_t, j_t, l_t, b_t, p_t\}$, where $t_t$ is the time at time step $t$; $k_t$, $j_t$, and $l_t$ respectively denote the vehicle's travel direction, the inter-stop segment it occupies, and its distance to the next stop; $b_t$ is the remaining capacity of the vehicle cabin; and $p_t$ is the total number of passengers waiting at all stops between the current vehicle and the vehicle ahead of it; the agent has two possible actions: deciding to 'depart' or 'not depart' from the depot; state prediction and action decisions are then made at fixed time intervals over a future horizon, realizing dynamic optimization of the bus departure timetable.
3. The bus departure timetable dynamic optimization algorithm based on deep reinforcement learning of claim 2, characterized in that: the agent corresponds to a bus dispatcher, and the environment is built on a bus-operation simulation model; to run the simulation, besides preset data such as route and stop information, the number of vehicles at the depot, the departure timetable, and the departure type, a vehicle travel-time law and a passenger-flow OD law are required; bus travel time is affected by factors such as road conditions, traffic lights, and passengers boarding and alighting at stops, while passenger OD is affected by factors such as weather, travel behavior, and time of day; both are rather complex random variables, and kernel density estimation (KDE) is adopted to fit their distributions and generate the random numbers required by the simulation.
4. The bus departure timetable dynamic optimization algorithm based on deep reinforcement learning of claim 3, characterized in that: the bus-operation simulation model consists of three parts: departure from the origin terminal, vehicle arrival at and departure from stops, and passenger boarding and alighting; when a vehicle arrives at a stop, it queues to enter, opens its doors, lets passengers alight and board, closes its doors, and then leaves, and this service process can be expressed as:

$t^{d}_{k,i,j} = t^{a}_{k,i,j} + \beta + \max\left(\beta_b B_{k,i,j},\ \beta_o O_{k,i,j}\right)$

where $t^{a}_{k,i,j}$ and $t^{d}_{k,i,j}$ respectively denote the times at which trip $i$ in direction $k$ arrives at and departs from stop $j$; $\beta$ denotes the door opening and closing time; $\beta_b$ and $\beta_o$ respectively denote the average time spent per passenger boarding and alighting; and $B_{k,i,j}$ and $O_{k,i,j}$ respectively denote the numbers of passengers boarding and alighting while the vehicle dwells at stop $j$ in direction $k$;

the number of passengers waiting at a stop must include those left behind by the preceding trip because they failed to board:

$W_{k,i+1,j} = L_{k,i,j} + A_{k,i,j}$

where $W_{k,i+1,j}$ is the total number waiting at stop $j$ in direction $k$ for trip $i+1$; $L_{k,i,j}$ is the number of passengers stranded at stop $j$ because they could not board when trip $i$ in direction $k$ arrived; and $A_{k,i,j}$ is the number of passengers arriving at stop $j$ between trips $i$ and $i+1$ in direction $k$;

the number of passengers boarding at a stop is limited by the vehicle's carrying capacity:

$B_{k,i,j} = \min\left(W_{k,i,j},\ m - C_{k,i,j-1} + O_{k,i,j}\right)$

where $m$ denotes the rated passenger capacity of the vehicle and $C_{k,i,j-1}$ denotes the onboard load when trip $i$ leaves stop $j-1$ in direction $k$;

the number of passengers on board is computed as:

$C_{k,i,j} = C_{k,i,j-1} + B_{k,i,j} - O_{k,i,j}$

and the number of passengers stranded at the stop is computed as:

$L_{k,i,j} = W_{k,i,j} - B_{k,i,j}$

where $W_{k,1,j}$, the waiting count seen by the first trip, is the number of passengers that have arrived at stop $j$ in direction $k$ before the first trip reaches the stop.
5. The bus departure timetable dynamic optimization algorithm based on deep reinforcement learning of claim 4, characterized in that: the execution of each departure action can affect the effects of other actions; moreover, the effect of each executed action on the system is usually delayed, because the vehicle must reach the corresponding stop before it can serve passengers; during operation, because of passenger flow, road conditions, and similar factors, a bus that has finished its earlier trips often returns to the origin or terminal stop with a deviation from the pre-planned timetable, so that no vehicle may be available to execute the next scheduled departure; if the agent still issues a departure action when no vehicle is at the depot, a corresponding penalty should be incurred to better guide the agent toward the target state.
6. The bus departure timetable dynamic optimization algorithm based on deep reinforcement learning of claim 5, characterized in that: the reward function (Reward function) consists of two parts: the average passenger waiting time and the penalty for erroneous departure instructions:

$R(\tau) = \sum_{t=0}^{T} \gamma^{t}\left(-\mathrm{AWT}_t - p_t\right)$

where the time step $t$ ranges over $[0, T]$; $\tau = \{s_1, a_1, s_2, a_2, \ldots, s_T, a_T\}$ is the trajectory of states and actions over one episode; $\gamma$ is the discount factor with range $(0, 1]$; $p_t$ is the penalty incurred by the agent at time step $t$ for an erroneous departure instruction; and AWT is the average passenger waiting time; to maximize this objective, an artificial neural network is designed with a structure comprising an input layer, an actor (Actor) network, and a critic (Critic) network; the complex mathematical structure of an artificial neural network gives it an advantage in handling a complex bus system with stochastic traffic flow and passenger flow; the state sequence $(s_t, s_{t+1}, \ldots, s_T)$ serves as the input layer and is fed into the hidden layers of the actor and critic networks; the actor network makes decision actions according to the current state of the bus system, while the critic network estimates the state-value function $V(s_t)$, from which the advantage function $A(s_t, a_t)$ is computed and used in updating the actor network's parameters.
CN202210028133.4A 2022-01-11 2022-01-11 Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning Withdrawn CN114240002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210028133.4A CN114240002A (en) 2022-01-11 2022-01-11 Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210028133.4A CN114240002A (en) 2022-01-11 2022-01-11 Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114240002A true CN114240002A (en) 2022-03-25

Family

ID=80746181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210028133.4A Withdrawn CN114240002A (en) 2022-01-11 2022-01-11 Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114240002A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115034522A (en) * 2022-08-10 2022-09-09 深圳市四格互联信息技术有限公司 Dynamic dispatching method for commuting regular bus based on employee off-duty time and off-duty station
CN115034522B (en) * 2022-08-10 2022-11-25 深圳市四格互联信息技术有限公司 Dynamic dispatching method for commuting regular bus based on employee off-duty time and off-duty station
CN115691196A (en) * 2022-10-19 2023-02-03 扬州大学 Multi-strategy fusion control method for bus operation in intelligent networking environment
CN115691196B (en) * 2022-10-19 2023-10-03 扬州大学 Public transport operation multi-strategy fusion control method in intelligent networking environment

Similar Documents

Publication Publication Date Title
JP3414843B2 (en) Transportation control device
EP4030365A1 (en) Multi-mode multi-service rail transit analog simulation method and system
CN112883640B (en) Digital twin station system, job scheduling method based on system and application
CN107358357A (en) Urban track traffic transfer station evaluation method
CN114240002A (en) Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning
CN110222972B (en) Urban rail transit road network cooperative current limiting method based on data driving
CN114662778B (en) Urban rail transit line network train operation interval cooperative decision method
CN115527369B (en) Large passenger flow early warning and evacuation method under large-area delay condition of airport hub
CN113222387A (en) Multi-objective scheduling and collaborative optimization method for hydrogen fuel vehicle
CN113536692B (en) Intelligent dispatching method and system for high-speed rail train under uncertain environment
Wang et al. A data-driven hybrid control framework to improve transit performance
Xiong et al. Parallel bus rapid transit (BRT) operation management system based on ACP approach
JP6902481B2 (en) Resource arbitration system and resource arbitration device
CN115563761A (en) Subway junction station surrounding road congestion prediction method based on timetable
Li et al. Real-time scheduling on a transit bus route
CN115481777A (en) Multi-line bus dynamic schedule oriented collaborative simulation optimization method, device and medium
CN114004440A Synthetic hub passenger transport organization evaluation method based on AnyLogic
Zhang et al. Study on evaluation indicators system of crowd management for transfer stations based on pedestrian simulation
Liu Optimization of Computer-aided Decision-making System for Railroad Traffic Dispatching Command
Zhong et al. Deep Q-Learning Network Model for Optimizing Transit Bus Priority at Multiphase Traffic Signal Controlled Intersection
Lioris et al. Overview of a dynamic evaluation of collective taxi systems providing an optimal performance
Wu et al. Reinforcement Learning Based Demand-Responsive Public Transit Dispatching
Xiao et al. A novel bus scheduling model based on passenger flow and bus travel time prediction using the improved cuckoo search algorithm
CN115034622B (en) Input and output risk quantitative evaluation method for public transport system
Yu et al. A new approach on passenger flow assignment with multi-connected agents

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220325

WW01 Invention patent application withdrawn after publication