CN114240002A - Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning - Google Patents
- Publication number
- CN114240002A (application CN202210028133.4A)
- Authority
- CN
- China
- Prior art keywords
- bus
- vehicle
- station
- departure
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06313—Resource planning in a project environment
-
- G06Q50/40—
Abstract
The invention belongs to the technical field of intelligent bus dispatching systems and discloses a dynamic optimization algorithm for bus departure timetables based on deep reinforcement learning, in which a deep reinforcement learning method is introduced when optimizing the bus departure timetable. The deep reinforcement learning is built on a simulation model that accounts for the randomness of inter-stop travel times caused by factors such as road conditions, traffic lights and weather, and adds passenger flow characteristics generated from passenger demand and OD (origin-destination) patterns, thereby overcoming the lack of passenger flow modeling and performance evaluation functions in classical commercial traffic simulation software. By considering complex passenger flow when dynamically optimizing the departure timetable, the invention establishes a bus operation simulation model that combines passenger flow and traffic flow; the PPO algorithm can capture the complex environment state in real time and rapidly generate a bus departure timetable, constituting a dynamic optimization strategy with high robustness in complex environments.
Description
Technical Field
The invention relates to the technical field of intelligent bus dispatching systems, in particular to a dynamic optimization algorithm of a bus departure schedule based on deep reinforcement learning.
Background
The mainstream method at home and abroad for studying bus departure timetables is operations research modeling: an objective function combining one or more indices, such as minimum passenger waiting time, shortest transfer time and lowest bus operating cost, is established, and the departure times or departure intervals of one or more bus routes are optimized. Scheduling optimization methods based on traffic simulation models, on the other hand, have the advantage of realistically reflecting the dynamic behavior of traffic flow and passenger flow. However, while classical traffic simulation software such as Aimsun, Vissim and Paramics can represent traffic flow characteristics, it lacks passenger flow modeling and performance evaluation functions.
Solving the bus scheduling optimization problem requires considering many factors (such as mixed traffic flow, passenger flow and traffic lights). In most existing research on bus dispatching optimization, whether the static problem is solved with traditional operations research modeling or the dynamic problem with machine learning methods, the solutions obtained are unsatisfactory because of the short decision horizon, insufficient exploitation of the problem structure, and the influence of numerous uncertain environmental factors.
In summary, there is no simulation-model-based scheduling optimization research that links passenger flow to bus operation through simulation and thereby systematically represents the combination of traffic flow and passenger flow.
Disclosure of Invention
The invention aims to address the lack in the prior art of simulation-model-based scheduling optimization research that links passenger flow to bus operation through simulation and thereby systematically combines traffic flow and passenger flow, and provides a dynamic optimization algorithm for bus departure timetables based on deep reinforcement learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
the bus departure timetable dynamic optimization algorithm based on deep reinforcement learning introduces a deep reinforcement learning method when optimizing a bus departure timetable, and comprises the following steps: s1, deep reinforcement learning is based on a simulation model, randomness that travel time between buses is possibly influenced by road conditions, traffic lights, weather and other factors is considered, passenger flow characteristics generated according to passenger requirements and OD rules are added, the defects of the conventional commercial traffic simulation software due to lack of functions of modeling and performance evaluation of passenger flow are overcome, the traffic flow and the passenger flow are expressed, and an actual bus system can be reflected more truly; s2, the reinforcement learning is combined with the principle of dynamic planning and supervised learning, complex scenes can be processed, and the reinforcement learning method has the capability of real-time learning and lifelong learning, so that the reinforcement learning method is very suitable for solving the problem of bus scheduling, and the reinforcement learning method can learn the optimal decision function by using reinforcement signals only by giving a group of feasible actions (Action) and the current bus system State (State), and further make the optimal decision Action.
Further, the reinforcement learning system involves two subjects, an agent (Agent) and an environment (Environment). The bus system has many possible complex states; taking the vehicle as the object, six features are selected to form the state set s_t = {t_t, k_t, j_t, l_t, b_t, p_t}, where t_t represents the time at time step t; k_t, j_t and l_t respectively represent the driving direction of the vehicle, the inter-stop segment the vehicle is on, and the distance to the next stop; b_t indicates the remaining capacity of the vehicle; and p_t represents the total number of waiting passengers at all stops between the current vehicle and the vehicle ahead of it. The agent has two possible actions: deciding "depart" or "do not depart" at the terminal station. By predicting the state and making action decisions at fixed time intervals over a future period, dynamic optimization of the bus departure timetable can be realized.
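The six-feature state and two-action setup described above can be sketched in code as follows. This is an illustrative encoding only: the field names, types and the flattening into a feature vector are assumptions for demonstration, not prescribed by the patent.

```python
from dataclasses import dataclass

# A minimal sketch of the six-feature state set s_t described above.
@dataclass
class BusSystemState:
    t: float            # t_t: time at time step t
    direction: int      # k_t: driving direction of the vehicle
    segment: int        # j_t: index of the inter-stop segment the vehicle is on
    dist_next: float    # l_t: distance to the next stop
    capacity_left: int  # b_t: remaining capacity of the vehicle
    waiting_total: int  # p_t: waiting passengers between this vehicle and the one ahead

# The agent's two possible actions at the terminal station.
ACTIONS = ("depart", "hold")

def as_vector(s: BusSystemState) -> list:
    """Flatten the state into a feature vector for a neural network input layer."""
    return [s.t, s.direction, s.segment, s.dist_next, s.capacity_left, s.waiting_total]
```

A dispatcher agent would observe such a vector at each decision interval and choose one of the two actions.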
Furthermore, the agent corresponds to a bus dispatcher, and the environment is constructed from a bus operation simulation model. To realize the simulation, in addition to preset data such as route and stop information, the number of vehicles at the terminal, the departure timetable and the departure type, a vehicle travel time law and a passenger flow OD law are required. Bus travel time is affected by factors such as road conditions, traffic lights, and passengers boarding and alighting at stops; passenger OD is affected by factors such as weather, travel behavior and travel time period. Both are relatively complex random variables, and Kernel Density Estimation (KDE) is adopted to fit their distributions and generate the random numbers required by the simulation.
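KDE-based random number generation as described above can be approximated with a smoothed bootstrap: draw an observed value uniformly and add Gaussian noise scaled by a bandwidth. This is a simplified stand-in for a full KDE fit (e.g. `scipy.stats.gaussian_kde`); the travel time data and the bandwidth value are hypothetical, not taken from the patent.

```python
import random

def kde_sample(observations, bandwidth, n, rng=None):
    """Draw n random numbers from a Gaussian-kernel density estimate of the data.

    Smoothed bootstrap: pick an observed value uniformly at random, then add
    Gaussian noise with standard deviation equal to the bandwidth. The bandwidth
    here is an assumed input rather than one selected by a fitting rule.
    """
    rng = rng or random.Random()
    return [rng.choice(observations) + rng.gauss(0.0, bandwidth) for _ in range(n)]

# Hypothetical observed inter-stop travel times (seconds).
travel_times = [95, 102, 110, 98, 130, 87, 105, 120]
samples = kde_sample(travel_times, bandwidth=5.0, n=1000, rng=random.Random(42))
```

Such samples can then drive the stochastic travel time and passenger OD processes inside the simulation loop.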
Furthermore, the bus operation simulation model mainly comprises three parts: departure from the terminal, vehicle arrival at and departure from stops, and passenger boarding and alighting. When a vehicle arrives at a stop, it queues to enter the stop, opens its doors, lets passengers board and alight, closes its doors, and then leaves. This service process can be represented as:

d^k_{i,j} = a^k_{i,j} + β + max(β_b · B^k_{i,j}, β_o · A^k_{i,j})

wherein a^k_{i,j} and d^k_{i,j} respectively represent the times at which pass i in direction k arrives at and departs from stop j, β represents the time for opening and closing the doors, β_b and β_o respectively represent the average time spent per passenger boarding and alighting, and B^k_{i,j} and A^k_{i,j} respectively represent the number of passengers boarding and alighting when the vehicle stops at stop j in direction k;
The number of passengers waiting at a stop accounts for those stranded by earlier trips who failed to board:

w^k_{i+1,j} = r^k_{i,j} + λ^k_{i,j}

wherein w^k_{i+1,j} is the total number of passengers waiting at stop j in direction k for pass i+1, r^k_{i,j} is the number of passengers stranded at stop j because they could not board when pass i in direction k arrived, and λ^k_{i,j} is the number of passengers arriving at stop j between pass i and pass i+1 in direction k;
The number of passengers boarding at a stop is limited by the vehicle's rated passenger capacity:

B^k_{i,j} = min(w^k_{i,j}, m − (L^k_{i,j−1} − A^k_{i,j}))

wherein m represents the rated passenger capacity and L^k_{i,j−1} represents the number of passengers on board when pass i leaves stop j−1 in direction k.
The number of passengers on board is calculated as:

L^k_{i,j} = L^k_{i,j−1} − A^k_{i,j} + B^k_{i,j}

The number of passengers stranded at a stop is calculated as:

r^k_{i,j} = w^k_{i,j} − B^k_{i,j}

wherein the initial waiting count w^k_{1,j} represents the number of passengers who have arrived at stop j in direction k before the first trip reaches the stop.
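The stop-service recursions above can be collected into a single simulation step. The max-of-flows dwell time and the default parameter values are assumptions for illustration; the function names and arguments are hypothetical.

```python
def serve_stop(arrive_t, waiting, onboard, alight, capacity,
               beta=5.0, beta_b=2.0, beta_o=1.5):
    """One stop-service step of the bus operation simulation.

    Boarding is capped by the rated capacity m, passengers who cannot board
    are stranded for the next trip, and the departure time adds door time
    plus the larger of the boarding and alighting flows (an assumed dwell
    form). Default times are illustrative, not values from the patent.
    """
    alight = min(alight, onboard)                 # cannot alight more than are aboard
    free_seats = capacity - (onboard - alight)    # m - (L_{j-1} - A_j)
    board = min(waiting, free_seats)              # B = min(w, m - (L - A))
    stranded = waiting - board                    # r = w - B
    onboard_next = onboard - alight + board       # L_j = L_{j-1} - A + B
    depart_t = arrive_t + beta + max(beta_b * board, beta_o * alight)
    return depart_t, board, stranded, onboard_next
```

Iterating this function over stops and trips, with KDE-sampled travel times between calls, yields one simulated round of bus operation.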
Furthermore, the execution of each departure action may affect the effects of other actions. Moreover, the effect of each action on the system is often delayed, since the vehicle must reach the corresponding stop before it can serve passengers. During operation, factors such as passenger flow and road conditions often cause a bus completing its earlier trips to return to the terminal later than the pre-planned timetable, so that the vehicles available may be insufficient to execute the timetable. If the agent still issues a departure action when no vehicle is available at the terminal, a corresponding penalty should be incurred to better guide the agent toward the target state.
Further, the reward function (Reward function) consists of two parts, the average passenger waiting time and the penalty for wrong departure instructions:

R(τ) = Σ_{t=0}^{T} γ^t (−AWT_t − p_t)

wherein the time step t takes values in [0, T]; τ = {s_1, a_1, s_2, a_2, …, s_T, a_T} is the state-action trajectory of one episode; γ is the discount factor, with value range (0, 1]; p_t is the penalty incurred by the agent for a wrong departure instruction at time step t; and AWT is the average passenger waiting time. To maximize this objective, an artificial neural network is designed with a structure comprising an input layer, an actor (Actor) network and a critic (Critic) network. The complex mathematical structure of the artificial neural network gives it an advantage in handling a complex bus system with stochastic traffic flow and passenger flow. The state sequence (s_t, s_{t+1}, …, s_T) is fed as the input layer into the hidden layers of the actor and critic networks; the actor network makes decision actions according to the current state of the bus system, and the critic network estimates the state value function V(s_t), from which the advantage function A(s_t, a_t) is calculated and used to update the parameters of the actor network.
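The discounted return above can be computed directly from per-step records. The exact functional form (a negated sum of per-step AWT and penalty) is an assumed reading of the text rather than a formula quoted from it.

```python
def episode_return(awt, penalties, gamma=0.99):
    """Discounted return over one episode: R = sum_t gamma^t * (-(AWT_t + p_t)).

    awt:       per-step average passenger waiting time
    penalties: per-step wrong-departure penalties p_t
    gamma:     discount factor in (0, 1]
    """
    assert 0.0 < gamma <= 1.0
    return sum((gamma ** t) * (-(a + p))
               for t, (a, p) in enumerate(zip(awt, penalties)))
```

Maximizing this quantity simultaneously reduces passenger waiting and discourages departure commands issued when no vehicle is available.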
In summary, the invention includes at least one of the following beneficial technical effects:
1. Training data are collected through bus operation simulation, and an artificial neural network is trained with the Proximal Policy Optimization (PPO) algorithm; the algorithm can capture the complex environment state in real time and dynamically optimize the bus departure timetable;
2. the PPO algorithm is used for training the artificial neural network, and the efficiency is higher than that of the traditional method for solving the departure time table after the network training is finished; more importantly, the PPO algorithm can capture the complex environment state in real time and quickly generate a bus departure schedule, and is a dynamic optimization strategy; in addition, the performance of the heuristic algorithm may be affected by fine tuning of the underlying parameter settings, and is difficult to adapt to a new environment, while the invention has higher robustness in the face of a complex environment.
Drawings
FIG. 1 is a diagram of the reinforcement learning process of the present invention;
FIG. 2 is a flow chart of the bus simulation operation of the present invention;
FIG. 3 is a diagram of the Actor-Critic network structure of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention; it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work are within the scope of the present invention.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "top/bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," "sleeved/connected," "connected," and the like are to be construed broadly, e.g., "connected," which may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The present invention will be described in further detail with reference to the accompanying drawings.
Referring to FIGS. 1-3, the bus departure timetable dynamic optimization algorithm based on deep reinforcement learning introduces a deep reinforcement learning method when optimizing the bus departure timetable, and comprises the following steps. S1: the deep reinforcement learning is built on a simulation model that accounts for the randomness of inter-stop travel times caused by factors such as road conditions, traffic lights and weather, and adds passenger flow characteristics generated from passenger demand and OD patterns; this overcomes the lack of passenger flow modeling and performance evaluation in existing commercial traffic simulation software, represents both traffic flow and passenger flow, and reflects an actual bus system more realistically.
S2: reinforcement learning combines the principles of dynamic programming and supervised learning, can handle complex scenarios, and has the capability of real-time and lifelong learning, making it well suited to the bus scheduling problem; given only a set of feasible actions (Action) and the current bus system state (State), it can learn the optimal decision function from reinforcement signals and thereby take the optimal decision action.
The reinforcement learning system involves two subjects, an agent (Agent) and an environment (Environment). The bus system has many possible complex states; taking the vehicle as the object, six features are selected to form the state set s_t = {t_t, k_t, j_t, l_t, b_t, p_t}, where t_t represents the time at time step t; k_t, j_t and l_t respectively represent the driving direction of the vehicle, the inter-stop segment the vehicle is on, and the distance to the next stop; b_t indicates the remaining capacity of the vehicle; and p_t represents the total number of waiting passengers at all stops between the current vehicle and the vehicle ahead of it. The agent has two possible actions: deciding "depart" or "do not depart" at the terminal station. By predicting the state and making action decisions at fixed time intervals over a future period, dynamic optimization of the bus departure timetable can be realized. The agent corresponds to a bus dispatcher, and the environment is constructed from a bus operation simulation model. To realize the simulation, in addition to preset data such as route and stop information, the number of vehicles at the terminal, the departure timetable and the departure type, a vehicle travel time law and a passenger flow OD law are required. Bus travel time is affected by factors such as road conditions, traffic lights, and passengers boarding and alighting at stops; passenger OD is affected by factors such as weather, travel behavior and travel time period. Both are relatively complex random variables, and Kernel Density Estimation (KDE) is adopted to fit their distributions and generate the random numbers required by the simulation.
The bus operation simulation model mainly comprises three parts: departure from the terminal, vehicle arrival at and departure from stops, and passenger boarding and alighting. When a vehicle arrives at a stop, it queues to enter the stop, opens its doors, lets passengers board and alight, closes its doors, and then leaves. This service process can be represented as:

d^k_{i,j} = a^k_{i,j} + β + max(β_b · B^k_{i,j}, β_o · A^k_{i,j})

wherein a^k_{i,j} and d^k_{i,j} respectively represent the times at which pass i in direction k arrives at and departs from stop j, β represents the time for opening and closing the doors, β_b and β_o respectively represent the average time spent per passenger boarding and alighting, and B^k_{i,j} and A^k_{i,j} respectively represent the number of passengers boarding and alighting when the vehicle stops at stop j in direction k;
The number of passengers waiting at a stop accounts for those stranded by earlier trips who failed to board:

w^k_{i+1,j} = r^k_{i,j} + λ^k_{i,j}

wherein w^k_{i+1,j} is the total number of passengers waiting at stop j in direction k for pass i+1, r^k_{i,j} is the number of passengers stranded at stop j because they could not board when pass i in direction k arrived, and λ^k_{i,j} is the number of passengers arriving at stop j between pass i and pass i+1 in direction k;
The number of passengers boarding at a stop is limited by the vehicle's rated passenger capacity:

B^k_{i,j} = min(w^k_{i,j}, m − (L^k_{i,j−1} − A^k_{i,j}))

wherein m represents the rated passenger capacity and L^k_{i,j−1} represents the number of passengers on board when pass i leaves stop j−1 in direction k.
The number of passengers on board is calculated as:

L^k_{i,j} = L^k_{i,j−1} − A^k_{i,j} + B^k_{i,j}

The number of passengers stranded at a stop is calculated as:

r^k_{i,j} = w^k_{i,j} − B^k_{i,j}

wherein the initial waiting count w^k_{1,j} represents the number of passengers who have arrived at stop j in direction k before the first trip reaches the stop.
The execution of each departure action may affect the effects of other actions. Moreover, the effect of each action on the system is often delayed, since the vehicle must reach the corresponding stop before it can serve passengers. During operation, factors such as passenger flow and road conditions often cause a bus completing its earlier trips to return to the terminal later than the pre-planned timetable, so that the vehicles available may be insufficient to execute the timetable. If the agent still issues a departure action when no vehicle is available at the terminal, a corresponding penalty should be incurred to better guide the agent toward the target state.
The reward function (Reward function) consists of two parts, the average passenger waiting time and the penalty for wrong departure instructions:

R(τ) = Σ_{t=0}^{T} γ^t (−AWT_t − p_t)

wherein the time step t takes values in [0, T]; τ = {s_1, a_1, s_2, a_2, …, s_T, a_T} is the state-action trajectory of one episode; γ is the discount factor, with value range (0, 1]; p_t is the penalty incurred by the agent for a wrong departure instruction at time step t; and AWT is the average passenger waiting time. To maximize this objective, an artificial neural network is designed with a structure comprising an input layer, an actor (Actor) network and a critic (Critic) network. The complex mathematical structure of the artificial neural network gives it an advantage in handling a complex bus system with stochastic traffic flow and passenger flow. The state sequence (s_t, s_{t+1}, …, s_T) is fed as the input layer into the hidden layers of the actor and critic networks; the actor network makes decision actions according to the current state of the bus system, and the critic network estimates the state value function V(s_t), from which the advantage function A(s_t, a_t) is calculated and used to update the parameters of the actor network.
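The critic's role described above reduces, in its simplest form, to a one-step temporal-difference advantage estimate. Using a single-step estimate (rather than, say, generalized advantage estimation) is an assumed simplification for illustration.

```python
def td_advantage(reward, v_s, v_s_next, gamma=0.99):
    """One-step advantage estimate: A(s_t, a_t) = r_t + gamma * V(s_{t+1}) - V(s_t).

    v_s and v_s_next are the critic network's value estimates for the current
    and next state; the advantage tells the actor how much better the taken
    action was than the critic's expectation, and drives its parameter update.
    """
    return reward + gamma * v_s_next - v_s
```

A positive advantage increases the probability of the departure decision just taken; a negative one decreases it.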
In summary, the invention optimizes the bus departure timetable with a bus operation simulation model that combines traffic flow and passenger flow, better matching actual bus operation. Second, the artificial neural network is trained with the PPO algorithm; once training is finished, generating a departure timetable is more efficient than with traditional solution methods. More importantly, the PPO algorithm can capture the complex environment state in real time and rapidly generate a bus departure timetable, a dynamic optimization strategy that traditional methods cannot match. In addition, the performance of heuristic algorithms may be sensitive to fine-tuning of the underlying parameter settings and hard to adapt to new environments, whereas the invention shows higher robustness in complex environments.
The above are all preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereby; all equivalent changes made according to the structure, shape and principle of the invention are covered by the protection scope of the invention.
Claims (6)
1. The bus departure timetable dynamic optimization algorithm based on deep reinforcement learning is characterized in that a deep reinforcement learning method is introduced when optimizing the bus departure timetable, and the method comprises the following steps. S1: the deep reinforcement learning is built on a simulation model that accounts for the randomness of inter-stop travel times caused by factors such as road conditions, traffic lights and weather, and adds passenger flow characteristics generated from passenger demand and OD patterns; this overcomes the lack of passenger flow modeling and performance evaluation in existing commercial traffic simulation software, represents both traffic flow and passenger flow, and reflects an actual bus system more realistically. S2: reinforcement learning combines the principles of dynamic programming and supervised learning, can handle complex scenarios, and has the capability of real-time and lifelong learning, making it well suited to the bus scheduling problem; given only a set of feasible actions (Action) and the current bus system state (State), it can learn the optimal decision function from reinforcement signals and thereby take the optimal decision action.
2. The bus departure timetable dynamic optimization algorithm based on deep reinforcement learning of claim 1, characterized in that: the reinforcement learning system involves two subjects, an agent (Agent) and an environment (Environment); the bus system has many possible complex states, and taking the vehicle as the object, six features are selected to form the state set s_t = {t_t, k_t, j_t, l_t, b_t, p_t}, where t_t represents the time at time step t; k_t, j_t and l_t respectively represent the driving direction of the vehicle, the inter-stop segment the vehicle is on, and the distance to the next stop; b_t indicates the remaining capacity of the vehicle; and p_t represents the total number of waiting passengers at all stops between the current vehicle and the vehicle ahead of it; the agent has two possible actions: deciding "depart" or "do not depart" at the terminal station; by predicting the state and making action decisions at fixed time intervals over a future period, dynamic optimization of the bus departure timetable can be realized.
3. The bus departure timetable dynamic optimization algorithm based on deep reinforcement learning of claim 2, characterized in that: the agent corresponds to a bus dispatcher, and the environment is constructed from a bus operation simulation model; to realize the simulation, in addition to preset data such as route and stop information, the number of vehicles at the terminal, the departure timetable and the departure type, a vehicle travel time law and a passenger flow OD law are required; bus travel time is affected by factors such as road conditions, traffic lights, and passengers boarding and alighting at stops, passenger OD is affected by factors such as weather, travel behavior and travel time period, both are relatively complex random variables, and Kernel Density Estimation (KDE) is adopted to fit their distributions and generate the random numbers required by the simulation.
4. The bus departure schedule dynamic optimization algorithm based on deep reinforcement learning of claim 3, characterized in that: the bus operation simulation model mainly comprises three parts: departure from the first station, vehicle arrival at and departure from stops, and passenger boarding and alighting; when a vehicle arrives at a stop it queues to enter, opens its doors, lets passengers alight and board, closes its doors, and then leaves, and this service process can be represented as:

t^d_{k,i,j} = t^a_{k,i,j} + β + β_o · O^i_{k,j} + β_b · B^i_{k,j}

wherein t^a_{k,i,j} and t^d_{k,i,j} respectively represent the time when pass i in direction k arrives at and departs from station j, β represents the door opening and closing time, β_b and β_o respectively represent the average time spent per boarding and per alighting passenger, and B^i_{k,j} and O^i_{k,j} respectively represent the number of passengers boarding and alighting when the vehicle stops at station j in direction k;
the number of waiting people at the station actually considers the number of the preorders due to the failure of getting on the bus:
wherein the content of the first and second substances,the number of the total waiting people for the time i +1 at the station j in the direction k,indicating the number of passengers who are staying at the j stop due to unsuccessful boarding when the k-direction pass i reaches the j stop,represents the number of passengers arriving at station j within the time of pass i and pass i +1 in the k direction;
the number of passengers boarding at a stop is limited by the vehicle's rated capacity:

B^i_{k,j} = min( W^i_{k,j}, m − L^i_{k,j−1} + O^i_{k,j} )

wherein m represents the rated passenger capacity and L^i_{k,j−1} represents the number of passengers on board when pass i in direction k leaves station j−1.
The number of passengers on board is calculated as:

L^i_{k,j} = L^i_{k,j−1} + B^i_{k,j} − O^i_{k,j}

The number of passengers stranded at a station is calculated as:

R^i_{k,j} = W^i_{k,j} − B^i_{k,j}
5. The bus departure schedule dynamic optimization algorithm based on deep reinforcement learning of claim 4, characterized in that: the execution of each departure action may affect the effect of other actions; furthermore, the effect of each action on the system is often delayed, since the vehicle must reach the corresponding station before it can serve passengers; therefore, during operation, passenger flow, road conditions and other factors often cause a bus to return to the terminal stations later than the pre-planned timetable after completing its previous passes, so that no vehicle can be assigned to execute the timetable; when the station has no available vehicle, if the agent still issues a departure action, a corresponding penalty should be incurred to better guide the agent toward the target state.
6. The bus departure schedule dynamic optimization algorithm based on deep reinforcement learning of claim 5, characterized in that: the return function (reward function) is composed of two parts, the average passenger waiting time and the penalty for wrong departure instructions:

J(τ) = E_τ [ Σ_{t=0}^{T} γ^t ( −AWT_t − p_t ) ]

wherein the time step t takes values in [0, T]; τ = {s_1, a_1, s_2, a_2, …, s_T, a_T} is the trajectory of states and actions in one round; γ is the discount factor, with value range (0, 1]; p_t is the penalty incurred by the agent at time step t for a wrong departure instruction; AWT is the average waiting time per passenger; to maximize the objective function, an artificial neural network comprising an input layer, an Actor network and a Critic network is designed; the complex mathematical structure of the artificial neural network gives it a great advantage when handling a complex bus system with stochastic traffic and passenger flows; the state sequence (s_t, s_{t+1}, …, s_T) serves as the input layer and is fed into the hidden layers of the actor and critic networks; the actor network makes decision actions according to the current state of the bus system, while the critic network estimates the state-value function V(s_t), from which the advantage function A(s_t, a_t) is computed and used in the parameter updates of the actor network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210028133.4A CN114240002A (en) | 2022-01-11 | 2022-01-11 | Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114240002A true CN114240002A (en) | 2022-03-25 |
Family
ID=80746181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210028133.4A Withdrawn CN114240002A (en) | 2022-01-11 | 2022-01-11 | Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114240002A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115034522A (en) * | 2022-08-10 | 2022-09-09 | 深圳市四格互联信息技术有限公司 | Dynamic dispatching method for commuting regular bus based on employee off-duty time and off-duty station |
CN115034522B (en) * | 2022-08-10 | 2022-11-25 | 深圳市四格互联信息技术有限公司 | Dynamic dispatching method for commuting regular bus based on employee off-duty time and off-duty station |
CN115691196A (en) * | 2022-10-19 | 2023-02-03 | 扬州大学 | Multi-strategy fusion control method for bus operation in intelligent networking environment |
CN115691196B (en) * | 2022-10-19 | 2023-10-03 | 扬州大学 | Public transport operation multi-strategy fusion control method in intelligent networking environment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3414843B2 (en) | Transportation control device | |
EP4030365A1 (en) | Multi-mode multi-service rail transit analog simulation method and system | |
CN112883640B (en) | Digital twin station system, job scheduling method based on system and application | |
CN107358357A (en) | Urban track traffic transfer station evaluation method | |
CN114240002A (en) | Bus departure timetable dynamic optimization algorithm based on deep reinforcement learning | |
CN110222972B (en) | Urban rail transit road network cooperative current limiting method based on data driving | |
CN114662778B (en) | Urban rail transit line network train operation interval cooperative decision method | |
CN115527369B (en) | Large passenger flow early warning and evacuation method under large-area delay condition of airport hub | |
CN113222387A (en) | Multi-objective scheduling and collaborative optimization method for hydrogen fuel vehicle | |
CN113536692B (en) | Intelligent dispatching method and system for high-speed rail train under uncertain environment | |
Wang et al. | A data-driven hybrid control framework to improve transit performance | |
Xiong et al. | Parallel bus rapid transit (BRT) operation management system based on ACP approach | |
JP6902481B2 (en) | Resource arbitration system and resource arbitration device | |
CN115563761A (en) | Subway junction station surrounding road congestion prediction method based on timetable | |
Li et al. | Real-time scheduling on a transit bus route | |
CN115481777A (en) | Multi-line bus dynamic schedule oriented collaborative simulation optimization method, device and medium | |
CN114004440A (en) | Synthetic hub passenger transport organization evaluation method based on Anylogic | |
Zhang et al. | Study on evaluation indicators system of crowd management for transfer stations based on pedestrian simulation | |
Liu | Optimization of Computer-aided Decision-making System for Railroad Traffic Dispatching Command | |
Zhong et al. | Deep Q-Learning Network Model for Optimizing Transit Bus Priority at Multiphase Traffic Signal Controlled Intersection | |
Lioris et al. | Overview of a dynamic evaluation of collective taxi systems providing an optimal performance | |
Wu et al. | Reinforcement Learning Based Demand-Responsive Public Transit Dispatching | |
Xiao et al. | A novel bus scheduling model based on passenger flow and bus travel time prediction using the improved cuckoo search algorithm | |
CN115034622B (en) | Input and output risk quantitative evaluation method for public transport system | |
Yu et al. | A new approach on passenger flow assignment with multi-connected agents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20220325 ||