CN113033928A - Design method, device and system of bus shift scheduling model based on deep reinforcement learning - Google Patents
- Publication number
- CN113033928A (application number CN201911253753.2A)
- Authority
- CN
- China
- Prior art keywords
- scheduling
- shift
- matrix
- reinforcement learning
- bus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a design method of a bus scheduling model based on deep reinforcement learning, which comprises the following steps: step 1, converting the scheduling process into a Markov decision process; step 2, solving the Markov decision process; step 3, scheduling according to the solution result. The departure timetable is scheduled by a deep reinforcement learning method: a scheduling mathematical model is established, the relevant information is parameterized, and scheduling for different cities requires only adjusting the parameters. The bus operation efficiency is improved, and the bus operation cost is reduced.
Description
Technical Field
The invention relates to the fields of intelligent transportation and deep learning research, in particular to a method, a device and a system for designing a bus scheduling model based on deep reinforcement learning, and belongs to the field of intelligent bus scheduling.
Background
With the continuous improvement of the motorization level of China, the construction and development of urban infrastructure are rapid, urban areas are continuously expanding, and urban public transportation is becoming more and more comprehensive. However, as the scale of public transportation grows, bus scheduling becomes more and more difficult. An intelligent scheduling method plays a crucial role in allocating public transportation resources efficiently and reasonably, and is beneficial to using those resources more efficiently and providing higher-quality public transportation service. In the process of implementing the invention, the inventors found that the prior art has at least the following problems: traditional bus scheduling in China mainly depends on manual scheduling and on the experience of scheduling personnel, so the efficiency is low and the rationality of the schedule cannot be guaranteed; the existing scheduling methods are also inefficient, since the next schedule is often produced only after the previous one has been executed, and so cannot flexibly cope with constantly changing passenger flow.
Disclosure of Invention
In order to overcome the defects of the prior art, the embodiments of the disclosure provide a design method, a device and a system for a bus scheduling model based on deep reinforcement learning, which greatly improve scheduling efficiency. The technical scheme is as follows:
in a first aspect, a design method of a bus shift scheduling model based on deep reinforcement learning is provided, and the method includes:
the rule matrix X ∈ {0,1}^{N×N}; element X_{i,j} of the rule matrix has the following meaning: X_{i,j} = 1 if shift j can be executed by the same vehicle after shift i is executed, and X_{i,j} = 0 otherwise;
the rule matrix can be generated from the timetable; shifts i and j denote the shifts numbered i and j; the departure timetable has N shifts in total, and each shift in the timetable is numbered in chronological order: 1, 2, …, N;
the scheduling matrix Y ∈ {0,1}^{N×N}; element Y_{i,j} has the following meaning: Y_{i,j} = 1 if shift j is scheduled to be executed by the same vehicle after shift i, and Y_{i,j} = 0 otherwise;
the elements of the scheduling matrix are initialized to 0, and their values are changed by the strategy at each step.
The selectable position matrix Z ∈ {0,1}^{N×N}; element Z_{i,j} of the matrix has the following meaning: Z_{i,j} = 1 if position (i, j) can still be selected, and Z_{i,j} = 0 otherwise;
the selectable position matrix is initialized as Z = X, and its values are changed subsequently according to the executed strategy;
the Markov decision process is established as follows: the Markov decision process consists of (S, A, R, π, G), where S denotes the state space, A denotes the action space, π_θ denotes the strategy, and θ is a parameter of the strategy; π_θ(a|s) denotes the probability distribution of action a in state s under strategy π_θ; R denotes the reward function, and G denotes the reward accumulated over time;
defining a Markov decision process according to the task of the shift:
the strategy π_θ is specifically: the strategy neural network;
the state s: (X, Y, Z) ∈ S;
the action a: (i, j) ∈ A, and action a is executed as follows: fill 1 at Y_{i,j} and set the i-th row and j-th column of Z to 0;
the reward R(s, a) is defined via the scoring function score(Y), where score(Y) ∈ ℝ and ℝ denotes the real number field; the scoring function is used to evaluate the scheduling result;
step 2, training the shift scheduling strategy neural network:
obtaining an initialization state s_0, wherein the initialization state s_0 consists of the initial values of the rule matrix, the scheduling matrix and the selectable position matrix;
calculating the probability distribution π_θ(a|s_t) of the actions corresponding to state s_t:
the input to the strategy neural network is the state s_t, i.e., the N×N×3 tensor formed by the three matrices; the output of the network is an N²-dimensional vector representing the selected position in the scheduling matrix, where t denotes the t-th operation performed;
randomly selecting action a_t according to the probability distribution;
executing action a_t, then obtaining state s_{t+1};
calculating the reward r_t = R(s_t, a_t);
After executing a_t and obtaining s_{t+1}: if the element Z_{i,j} corresponding to action a_t is 0, exit; if Z becomes all 0 after executing a_t, exit; otherwise, return to the step: calculate the probability distribution π_θ(a|s_{t+1}) of the actions corresponding to state s_{t+1}.
From this, the trajectory τ of the scheduling is obtained:
τ = s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T
Then the parameters of the strategy neural network are updated according to the objective function and the policy gradient of reinforcement learning, obtaining the bus scheduling model.
Preferably, the shift j may be executed by the same vehicle after the shift i is executed, specifically: the departure time of the shift j is within 10-40 min after the arrival time of the shift i.
Preferably, the scoring function score(Y) is:
where α and β are hyper-parameters for controlling the ratio.
the method for updating the parameters of the strategy neural network comprises the following steps:
further, the method also comprises a step 3 of using the model trained in the step 2 to carry out scheduling, wherein the action selected in each step is at=maxπθ(a|st) And finally obtaining a scheduling matrix Y to obtain a scheduling result.
Preferably, the method for generating the departure schedule includes:
acquiring historical bus passenger flow data, including, for each bus stop, the number of boarding passengers and their boarding times, and the number of alighting passengers and their alighting times;
acquiring n days of historical passenger flow data from previous consecutive dates of the same type, and aggregating each day's historical passenger flow data by Q-minute intervals to obtain the average passenger flow of each Q-minute interval, wherein dates of the same type refer to the same weekday or the same holiday;
dividing the m average passenger flows into h time periods according to passenger flow characteristics, and calculating the departure interval Δt_i, i ∈ {1, 2, …, h}, of each period;
And obtaining a departure schedule according to the departure interval.
In a second aspect, a design device of a bus shift scheduling model based on deep reinforcement learning is provided, specifically comprising a design module and a training module.
The design module is used for executing the step 1 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods;
the training module is used for executing the step 2 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods.
Preferably, the device further comprises a scheduling module, wherein the scheduling module is used for executing the step 3 of the design method of the bus scheduling model based on the deep reinforcement learning in any one of all possible implementation methods.
In a third aspect, the embodiment of the present disclosure provides a design system of a bus shift scheduling model based on deep reinforcement learning, and the system includes any one of the above design devices of a bus shift scheduling model based on deep reinforcement learning.
Compared with the prior art, one of the technical schemes has the following beneficial effects:
The departure schedule is scheduled by a deep reinforcement learning method: a scheduling mathematical model is established, the relevant information is parameterized, and scheduling for different cities requires only adjusting the parameters. This improves the bus operation efficiency, reduces the bus operation cost, and allows the bus timetable to be continuously adjusted according to the passenger flow in the preceding period of time.
drawings
Fig. 1 is a structural diagram of a strategic neural network provided in an embodiment of the present disclosure.
Fig. 2 is a scheduling result of a bus scheduling model based on deep reinforcement learning according to an embodiment of the present disclosure.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail with reference to the accompanying drawings.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
In this embodiment, the departure timetable includes, but is not limited to, a bus departure timetable of a bus company, and also includes an enterprise regular bus departure timetable, a subway departure timetable, and the like, which adopt a similar operation mode with a bus.
In a first aspect: the embodiment of the disclosure provides a design method of a bus shift scheduling model based on deep reinforcement learning, and fig. 1 is a mechanism diagram of a strategic neural network provided by the embodiment of the disclosure, and in combination with the diagram, the method mainly comprises the following steps:
converting the scheduling problem into operations on three matrices X, Y and Z, whose row and column headers correspond to the departure times of the departure schedule arranged in chronological order; the three matrices are defined as follows:
the rule matrix X ∈ {0,1}^{N×N}; element X_{i,j} of the rule matrix has the following meaning: X_{i,j} = 1 if shift j can be executed by the same vehicle after shift i is executed, and X_{i,j} = 0 otherwise;
the rule matrix can be generated from the timetable; shifts i and j denote the shifts numbered i and j; the departure timetable has N shifts in total, and each shift in the timetable is numbered in chronological order: 1, 2, …, N;
preferably, shift j may be executed by the same vehicle after shift i is executed, specifically: the departure time of shift j is within a certain time range (for example, 10-40 min) after the arrival time of shift i. For example, if a vehicle executing the 1st shift departs at 07:00 and arrives at the destination at 08:00, and the 3 shifts departing at 08:08, 08:16 and 08:24 (the 6th, 7th and 8th shifts, respectively) have departure times between 08:00 and 08:30, then X_{1,6}, X_{1,7} and X_{1,8} are all 1; i.e., after a vehicle has executed the i-th shift, there are several possible choices for the next shift it executes (the j-th shift).
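The construction of the rule matrix from a timetable can be sketched as follows (a minimal illustration; the 10-40 min window, the fixed trip duration, and the function name are example assumptions, not values from the patent beyond the window mentioned above):

```python
from datetime import datetime, timedelta

def build_rule_matrix(departures, trip_minutes, lo=10, hi=40):
    """X[i][j] = 1 if shift j can be run by the same vehicle after shift i,
    i.e. shift j departs within [lo, hi] minutes after shift i arrives."""
    n = len(departures)
    arrivals = [d + timedelta(minutes=trip_minutes) for d in departures]
    X = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            gap = (departures[j] - arrivals[i]).total_seconds() / 60
            if lo <= gap <= hi:
                X[i][j] = 1
    return X

# Example: shifts every 20 min starting 07:00; each trip takes 60 min.
t0 = datetime(2019, 12, 9, 7, 0)
deps = [t0 + timedelta(minutes=20 * k) for k in range(6)]
X = build_rule_matrix(deps, trip_minutes=60)
```

Here shift 0 arrives at 08:00, so the shifts departing at 08:20 and 08:40 (indices 4 and 5) fall inside the 10-40 min window and get X[0][j] = 1.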
The scheduling matrix Y ∈ {0,1}^{N×N}; element Y_{i,j} has the following meaning: Y_{i,j} = 1 if shift j is scheduled to be executed by the same vehicle after shift i, and Y_{i,j} = 0 otherwise.
The elements of the scheduling matrix are initialized to 0, and their values are changed by the strategy at each step (a position where shift j would be executed by the same vehicle after shift i is a selection that can actually be executed).
The selectable position matrix Z ∈ {0,1}^{N×N}; element Z_{i,j} of the matrix has the following meaning: Z_{i,j} = 1 if position (i, j) can still be selected, and Z_{i,j} = 0 otherwise;
the selectable position matrix is initialized as Z = X, and its values are changed subsequently according to the strategy at each step;
the scheduling problem is thus converted into: under the constraint of the rule matrix X, and with the selectable position matrix Z as the constraint at each step, a position is generated at each step to change the scheduling matrix Y, and finally the schedule is generated from Y; by the definition of the three matrices, when all elements of Z are 0, Y is a complete shift schedule;
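The per-step update described above can be sketched as a minimal environment (an illustrative reconstruction, not code from the patent; the class and method names are assumptions):

```python
import numpy as np

class ShiftSchedulingEnv:
    """State (X, Y, Z); action (i, j) sets Y[i, j] = 1 and blocks
    row i and column j of the selectable position matrix Z."""
    def __init__(self, X):
        self.X = np.array(X, dtype=np.int8)
        n = self.X.shape[0]
        self.Y = np.zeros((n, n), dtype=np.int8)
        self.Z = self.X.copy()  # initialized as Z = X

    def step(self, i, j):
        assert self.Z[i, j] == 1, "position (i, j) is not selectable"
        self.Y[i, j] = 1
        self.Z[i, :] = 0  # shift i now has a successor
        self.Z[:, j] = 0  # shift j now has a predecessor
        done = not self.Z.any()  # all zeros: Y is a complete schedule
        return (self.X, self.Y, self.Z), done

# Tiny 3-shift example rule matrix.
X = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
env = ShiftSchedulingEnv(X)
state, done = env.step(0, 1)
```

After `step(0, 1)`, row 0 and column 1 of Z are zeroed, so only position (1, 2) remains selectable.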
preferably, the method for generating the departure schedule includes:
acquiring historical bus passenger flow data, including, for each bus stop, the number of boarding passengers and their boarding times, and the number of alighting passengers and their alighting times;
acquiring n days of historical passenger flow data from previous consecutive dates of the same type, and aggregating each day's historical passenger flow data by Q-minute intervals to obtain the average passenger flow of each Q-minute interval, wherein dates of the same type refer to the same weekday or the same holiday. For example: aggregate each day's passenger flow data by half-hour intervals (6:00-6:30, 6:30-7:00, and so on), then average the aggregated flows over 8 consecutive Mondays to obtain the average passenger flow for each half hour, giving m average passenger flows per day. Regarding the value of the interval Q: if the interval is too small, the randomness of the passenger flow increases and the accuracy of the flow prediction decreases; if it is too large, for example predicting every 2 h so that there are only 12 predicted values in a day, it is difficult and unreasonable to apply those 12 values to the above 7 time periods.
Dividing the m average passenger flows into h time periods according to passenger flow characteristics, and calculating the departure interval Δt_i, i ∈ {1, 2, …, h}, of each period; for example, the approved capacity of a single bus is 60 passengers and the expected load factor is 0.6.
Obtaining a departure schedule according to the departure interval
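The patent elides the explicit interval formula here; one plausible reconstruction, using the example capacity of 60 and load factor of 0.6 from the text above, is to choose each period's headway so that each departing bus carries about capacity × load factor passengers (the function name, clamping bounds, and the formula itself are assumptions for illustration):

```python
def departure_intervals(period_flows, q_minutes=30, capacity=60, load_factor=0.6,
                        min_interval=2, max_interval=30):
    """period_flows: average passenger flow per Q-minute window for each period.
    Returns an interval (minutes) per period so that each bus carries roughly
    capacity * load_factor passengers. Reconstruction, not verbatim from the patent."""
    target = capacity * load_factor  # desired passengers per departure
    intervals = []
    for flow in period_flows:
        rate = flow / q_minutes           # passengers per minute
        dt = target / rate if rate > 0 else max_interval
        intervals.append(round(min(max(dt, min_interval), max_interval)))
    return intervals

# Peak period of 360 passengers per half hour -> 12 passengers/min -> 3 min headway.
print(departure_intervals([360, 90, 30]))  # -> [3, 12, 30]
```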
The Markov decision process is established as follows: the Markov decision process consists of (S, A, R, π, G), where S denotes the state space, A denotes the action space, π_θ denotes the strategy, and θ is a parameter of the strategy; π_θ(a|s) denotes the probability distribution of action a in state s under strategy π_θ; R denotes the reward function, and G denotes the reward accumulated over time;
defining a Markov decision process according to the task of the shift:
the strategy π_θ is specifically: the strategy neural network, the structure of which is shown in figure 1;
the state s: (X, Y, Z) ∈ S;
the action a: (i, j) ∈ A, and action a is executed as follows: fill 1 at Y_{i,j} and set the i-th row and j-th column of Z to 0;
the reward R(s, a) is defined via the scoring function score(Y), where score(Y) ∈ ℝ and ℝ denotes the real number field; the scoring function is used to evaluate the scheduling result.
Preferably, the scoring function score(Y) is:
where α and β are hyper-parameters for controlling the ratio;
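The strategy network is specified only by its input (the N×N×3 state tensor) and output (an N²-dimensional distribution over positions). A minimal sketch with a single linear layer and masking of non-selectable positions follows; the architecture, the masking scheme, and all names are assumptions, not claimed by the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_probs(W, X, Y, Z):
    """Flatten the N x N x 3 state, apply a linear layer, and softmax over
    the N^2 positions, masking positions where Z = 0 (not selectable)."""
    s = np.stack([X, Y, Z]).astype(float).ravel()  # state as a 3*N^2 vector
    logits = W @ s                                 # N^2 logits
    logits[Z.ravel() == 0] = -np.inf               # forbid non-selectable actions
    logits -= logits.max()                         # numerical stability
    p = np.exp(logits)
    return p / p.sum()

N = 3
X = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
Y = np.zeros((N, N), dtype=int)
Z = X.copy()                                       # Z initialized as X
W = rng.normal(size=(N * N, 3 * N * N)) * 0.01     # toy linear "network"
p = policy_probs(W, X, Y, Z)
```

The output is a valid distribution that assigns probability 0 exactly where Z is 0, so sampling from it always yields an executable action.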
step 2, training the shift scheduling strategy neural network:
1. Obtain the initialization state s_0, which consists of the initial values of the three matrices: the rule matrix, the scheduling matrix and the selectable position matrix.
2. Calculate the probability distribution π_θ(a|s_t) of the actions corresponding to state s_t. The input to the strategy neural network is the state s_t, i.e., the N×N×3 tensor formed by the three matrices; the output of the network is an N²-dimensional vector representing the selected position in the scheduling matrix, where t denotes the t-th operation performed.
3. Randomly select action a_t according to the probability distribution.
4. Execute action a_t, then obtain state s_{t+1}.
5. Calculate the reward r_t = R(s_t, a_t).
6. If the element Z_{i,j} corresponding to action a_t is 0, exit; if Z becomes all 0 after executing a_t, exit; otherwise go to 2.
From this, the trajectory τ of the scheduling is obtained:
τ = s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T
7. Update the parameters of the strategy neural network according to the objective function and the policy gradient of reinforcement learning to obtain the bus scheduling model.
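Step 7 is stated only abstractly; a minimal REINFORCE-style update for a linear softmax policy can be sketched as below. The toy action set, the rewards, the linear parameterization and the learning rate are all invented for illustration and are not from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 4 actions, one fixed state vector; the reward favors action 3.
n_actions, state = 4, np.array([1.0, 0.5, -0.5])
W = np.zeros((n_actions, state.size))          # policy parameters theta
rewards = np.array([0.0, 0.1, 0.2, 1.0])       # reward per action

def probs(W, s):
    z = W @ s
    z -= z.max()                               # numerical stability
    e = np.exp(z)
    return e / e.sum()

lr = 0.5
for _ in range(300):
    p = probs(W, state)
    a = rng.choice(n_actions, p=p)             # sample a_t ~ pi_theta(.|s_t)
    G = rewards[a]                             # return of this one-step episode
    onehot = np.eye(n_actions)[a]
    # gradient of log pi(a|s) w.r.t. W for a linear softmax policy:
    grad_log = np.outer(onehot - p, state)
    W += lr * G * grad_log                     # REINFORCE: theta += lr * G * grad log pi

p = probs(W, state)                            # policy after training
```

The probability mass shifts toward the highest-reward action; in the real model the same update is applied with the trajectory returns r_t collected in step 5.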
Preferably, the method further comprises scheduling with the model trained in step 2, wherein the action selected at each step is
a_t = argmax_a π_θ(a|s_t)
Finally the scheduling matrix Y is obtained, i.e., the scheduling result, as shown in figure 2. A column of Y that is all 0 means the corresponding shift has no predecessor and is the first shift executed by its vehicle; a row of Y that is all 0 means the corresponding shift has no successor and is the last shift executed by its vehicle.
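Reading per-vehicle shift chains out of Y can be sketched as follows (a hypothetical helper, consistent with the reading that Y_{i,j} = 1 links shift i to its successor j):

```python
import numpy as np

def decode_schedule(Y):
    """Split the shifts into per-vehicle chains.
    A shift whose column is all 0 has no predecessor (first shift of a vehicle);
    a shift whose row is all 0 has no successor (last shift of a vehicle)."""
    Y = np.asarray(Y)
    succ = {i: int(np.argmax(Y[i])) for i in range(len(Y)) if Y[i].any()}
    firsts = [j for j in range(len(Y)) if not Y[:, j].any()]
    chains = []
    for shift in firsts:
        chain = [shift]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        chains.append(chain)
    return chains

# Shift 0 -> shift 2 on one vehicle; shift 1 alone on another.
Y = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(decode_schedule(Y))  # -> [[0, 2], [1]]
```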
The second aspect provides a design device of a bus scheduling model based on deep reinforcement learning. Based on the same technical concept, the device can execute the process of the design method of the bus scheduling model based on deep reinforcement learning; the device specifically comprises a design module and a training module.
The design module is used for executing the step 1 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods;
the training module is used for executing the step 2 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods.
It should be noted that, when the design apparatus for a bus shift scheduling model based on deep reinforcement learning provided in the foregoing embodiment executes a design method for a bus shift scheduling model based on deep reinforcement learning, the division of the functional modules is merely illustrated, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the embodiment of the design device of the bus scheduling model based on the deep reinforcement learning and the embodiment of the design method of the bus scheduling model based on the deep reinforcement learning belong to the same concept, and the specific implementation process is detailed in the method embodiment and is not described herein again.
In a third aspect, the embodiment of the present disclosure provides a design system of a bus shift scheduling model based on deep reinforcement learning, and the system includes any one of the above design devices of the bus shift scheduling model based on deep reinforcement learning.
The invention has been described above by way of example with reference to the accompanying drawings. It should be understood that the invention is not limited to the specific embodiments described above; insubstantial modifications made according to the principles and technical scheme of the invention, or direct application of the conception and technical scheme of the invention to other occasions without improvement or with equivalent replacement, all fall within the protection scope of the invention.
Claims (9)
1. A design method of a bus shift scheduling model based on deep reinforcement learning is characterized by comprising the following steps:
step 1, generating three matrixes according to a departure schedule: a regular matrix X, a scheduling matrix Y and an optional position matrix Z; establishing a Markov decision process;
the rule matrix X ∈ {0,1}^{N×N}; element X_{i,j} of the rule matrix has the following meaning: X_{i,j} = 1 if shift j can be executed by the same vehicle after shift i is executed, and X_{i,j} = 0 otherwise;
the rule matrix can be generated from the timetable; shifts i and j denote the shifts numbered i and j; the departure timetable has N shifts in total, and each shift in the timetable is numbered in chronological order: 1, 2, …, N;
the scheduling matrix Y ∈ {0,1}^{N×N}; element Y_{i,j} has the following meaning: Y_{i,j} = 1 if shift j is scheduled to be executed by the same vehicle after shift i, and Y_{i,j} = 0 otherwise;
the elements of the scheduling matrix are initialized to 0, and their values are changed by the strategy at each step;
the selectable position matrix Z is an element of {0,1}N×NElement Z of the matrixi,jHas the following meanings
The optional position matrix is initialized to Z ═ X, and the value of the optional position matrix is changed subsequently according to an execution strategy;
the Markov decision process is as follows: the Markov decision process consists of (S, A, R, π, G), where S denotes the state space, A denotes the action space, π_θ denotes the strategy, and θ is a parameter of the strategy; π_θ(a|s) denotes the probability distribution of action a in state s under strategy π_θ; R denotes the reward function, and G denotes the reward accumulated over time;
defining a Markov decision process according to the task of the shift:
the strategy π_θ is specifically: the strategy neural network;
the state s: (X, Y, Z) ∈ S;
the action a: (i, j) ∈ A, and action a is executed as follows: fill 1 at Y_{i,j} and set the i-th row and j-th column of Z to 0;
the reward R(s, a) is defined via the scoring function score(Y), where score(Y) ∈ ℝ and ℝ denotes the real number field; the scoring function is used for evaluating the scheduling result;
step 2, training the shift scheduling strategy neural network:
obtaining an initialization state s_0, wherein the initialization state s_0 consists of the initial values of the rule matrix, the scheduling matrix and the selectable position matrix;
calculating the probability distribution π_θ(a|s_t) of the actions corresponding to state s_t:
the input to the strategy neural network is the state s_t, i.e., the N×N×3 tensor formed by the three matrices; the output of the network is an N²-dimensional vector representing the selected position in the scheduling matrix, wherein t denotes the t-th operation performed;
randomly selecting action a_t according to the probability distribution;
executing action a_t, then obtaining state s_{t+1};
calculating the reward r_t = R(s_t, a_t);
after executing a_t and obtaining s_{t+1}: if the element Z_{i,j} corresponding to action a_t is 0, exit; if Z becomes all 0 after executing a_t, exit; otherwise, return to the step: calculate the probability distribution π_θ(a|s_{t+1}) of the actions corresponding to state s_{t+1};
from this, the trajectory τ of the scheduling is obtained:
τ = s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T
updating the parameters of the strategy neural network according to the objective function and the policy gradient of reinforcement learning, to obtain the bus scheduling model.
2. The design method of the bus scheduling model based on the deep reinforcement learning as claimed in claim 1, wherein the shift j can be executed by the same vehicle after the shift i is executed, specifically: the departure time of the shift j is within 10-40 min after the arrival time of the shift i.
5. The design method of the bus scheduling model based on deep reinforcement learning according to any one of claims 1 to 4, further comprising a step 3 of scheduling with the model trained in step 2, wherein the action selected at each step is
a_t = argmax_a π_θ(a|s_t)
and finally the scheduling matrix Y is obtained, giving the scheduling result.
6. The design method of the bus scheduling model based on the deep reinforcement learning as claimed in any one of claims 1 to 5, wherein the generation method of the departure schedule is as follows:
acquiring historical passenger flow data of buses, wherein the historical passenger flow data comprises the number of passengers getting on the buses and the getting-on time, the number of passengers getting off the buses and the getting-off time of each bus stop;
acquiring n days of historical passenger flow data from previous consecutive dates of the same type, and aggregating each day's historical passenger flow data by Q-minute intervals to obtain the average passenger flow of each Q-minute interval, wherein dates of the same type refer to the same weekday or the same holiday;
dividing the m average passenger flows into h time periods according to passenger flow characteristics, and calculating the departure interval Δt_i, i ∈ {1, 2, …, h}, of each period;
And obtaining a departure schedule according to the departure interval.
7. A design device of a bus shift scheduling model based on deep reinforcement learning is characterized by comprising a design module and a training module
the design module is used for executing step 1 of the design method of the bus shift scheduling model based on deep reinforcement learning as claimed in any one of claims 1 to 6; and
the training module is used for executing step 2 of the design method of the bus shift scheduling model based on deep reinforcement learning as claimed in any one of claims 1 to 6.
8. The design device of the bus shift scheduling model based on deep reinforcement learning as claimed in claim 7, further comprising a scheduling module, wherein the scheduling module is used for executing step 3 of the design method of the bus shift scheduling model based on deep reinforcement learning as claimed in claim 5 or 6.
9. A design system of a bus shift scheduling model based on deep reinforcement learning, characterized by comprising the design device of the bus shift scheduling model based on deep reinforcement learning as claimed in claim 7 or 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911253753.2A CN113033928B (en) | 2019-12-09 | 2019-12-09 | Method, device and system for designing bus shift model based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113033928A true CN113033928A (en) | 2021-06-25 |
CN113033928B CN113033928B (en) | 2023-10-31 |
Family
ID=76451359
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911253753.2A Active CN113033928B (en) | 2019-12-09 | 2019-12-09 | Method, device and system for designing bus shift model based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113033928B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104881992A (en) * | 2015-06-12 | 2015-09-02 | 天津大学 | Urban public transport policy analysis platform based on multi-agent simulation |
CN106228314A (en) * | 2016-08-11 | 2016-12-14 | 电子科技大学 | The workflow schedule method of study is strengthened based on the degree of depth |
CN109166303A (en) * | 2018-08-30 | 2019-01-08 | 北京航天控制仪器研究所 | A kind of public transport is arranged an order according to class and grade the method and system of scheduling |
CN110084505A (en) * | 2019-04-22 | 2019-08-02 | 南京行者易智能交通科技有限公司 | A kind of smart shift scheduling method and device based on passenger flow, mobile end equipment, server |
EP3543918A1 (en) * | 2018-03-20 | 2019-09-25 | Flink AI GmbH | Reinforcement learning method |
Non-Patent Citations (1)
Title |
---|
WANG QINGRONG; ZHU CHANGSHENG; LIANG JIANBO; FENG WENYI: "Research on the Application of an Intelligent Bus Scheduling System Based on Genetic Algorithm", Computer Simulation (计算机仿真), no. 03 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114117883A (en) * | 2021-09-15 | 2022-03-01 | 兰州理工大学 | Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning |
CN114781267A (en) * | 2022-04-28 | 2022-07-22 | 中国移动通信集团浙江有限公司杭州分公司 | Multi-source big data-based dynamic bus management method and system for stop and transfer |
CN114781267B (en) * | 2022-04-28 | 2023-08-29 | 中国移动通信集团浙江有限公司杭州分公司 | Multi-source big data-based job-living connection dynamic bus management method and system |
CN116704778A (en) * | 2023-08-04 | 2023-09-05 | 创意(成都)数字科技有限公司 | Intelligent traffic data processing method, device, equipment and storage medium |
CN116704778B (en) * | 2023-08-04 | 2023-10-24 | 创意(成都)数字科技有限公司 | Intelligent traffic data processing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113033928B (en) | 2023-10-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104504229B (en) | A kind of intelligent public transportation dispatching method based on hybrid metaheuristics | |
CN113033928A (en) | Design method, device and system of bus shift scheduling model based on deep reinforcement learning | |
Qin et al. | Reinforcement learning for ridesharing: An extended survey | |
Yang et al. | A bi-objective timetable optimization model incorporating energy allocation and passenger assignment in an energy-regenerative metro system | |
El-Tantawy et al. | Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto | |
CN102044149B (en) | City bus operation coordinating method and device based on time variant passenger flows | |
Zhao et al. | An integrated approach of train scheduling and rolling stock circulation with skip-stopping pattern for urban rail transit lines | |
Zhong et al. | A differential evolution algorithm with dual populations for solving periodic railway timetable scheduling problem | |
CN111105141B (en) | Demand response type bus scheduling method | |
CN107919014B (en) | Taxi running route optimization method for multiple passenger mileage | |
Chen et al. | Real-time bus holding control on a transit corridor based on multi-agent reinforcement learning | |
CN114723125B (en) | Inter-city vehicle order allocation method combining deep learning and multitask optimization | |
CN110084505A (en) | A kind of smart shift scheduling method and device based on passenger flow, mobile end equipment, server | |
CN110211379A (en) | A kind of public transport method for optimizing scheduling based on machine learning | |
CN112417753A (en) | Urban public transport resource joint scheduling method | |
WO2024174566A1 (en) | Multi-vehicle-type timetable design method and system for intelligent bus system | |
CN114117883A (en) | Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning | |
CN110490365B (en) | Method for predicting network car booking order quantity based on multi-source data fusion | |
CN115222251A (en) | Network taxi appointment scheduling method based on hybrid layered reinforcement learning | |
CN117371611A (en) | Subway train operation plan programming method, medium and system | |
CN112766605A (en) | Multi-source passenger flow prediction system and method based on container cloud platform | |
CN115352502A (en) | Train operation scheme adjusting method and device, electronic equipment and storage medium | |
Hao et al. | Timetabling for a congested urban rail transit network based on mixed logic dynamic model | |
CN115510664A (en) | Instant delivery real-time cooperation scheduling system based on layered reinforcement learning | |
Wu et al. | Optimizing timetable synchronization for regional public transit with minimum transfer waiting times |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||