CN115170006A - Dispatching scheduling method, device, equipment and storage medium - Google Patents
- Publication number
- CN115170006A (application number CN202211095230.1A)
- Authority
- CN
- China
- Prior art keywords
- departure
- time period
- simulation
- station
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/12—Timing analysis or timing optimisation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a departure scheduling method, device, equipment and storage medium. The method comprises: performing a preset total number of passenger flow simulations according to passenger travel data, collecting sample data during each simulation and storing it in a memory bank, and, after each simulation ends, obtaining the number of fast departure time periods, the total waiting time and the current value network corresponding to that simulation; when the number of samples in the memory bank reaches a preset threshold, randomly selecting a batch of samples from the memory bank according to a preset batch size and training the latest current value network on the batch, the trained network becoming the latest current value network; when the number of simulations reaches the preset total, determining the optimal current value network for each number of fast departure time periods; and determining the action of the next departure time period from the state data of a departure time period, using the optimal current value network corresponding to a selected number of fast departure time periods. The invention can dynamically adjust the departure mode in real time.
Description
Technical Field
The invention relates to the technical field of vehicle scheduling, and in particular to a departure scheduling method, device, equipment and storage medium.
Background
Constructing a mathematical model to represent the complex variation in subway passenger numbers, and then searching for the optimal schedule, is a common approach in the scheduling field. Sun L et al. constructed three mathematical models for different conditions and solved for the optimal subway schedule under each: scheduling without a constraint on the number of departures, scheduling with a constraint on the number of departures only for peak/off-peak periods, and all-day scheduling with a constraint on the number of departures. Yang X et al. carried out multi-objective optimization for a subway station case, with the objectives of maximizing passenger convenience and minimizing the cost of subway departures, and obtained a series of optimal solutions by solving for the Pareto-optimal curve. Kang L et al. constructed a Mixed Integer Linear Programming (MILP) model to coordinate the departure times of the last trains on multiple routes. The model is divided into two small-scale MILP models and solved with the WebSphere ILOG CPLEX solver.
In addition, heuristic algorithms are also used for subway scheduling; compared with constructing a mathematical model, these algorithms can reach a good solution more quickly. Kuppusamy P et al. combined Long Short-Term Memory (LSTM) with an Improved Genetic Algorithm to optimize the schedule of a single subway station and improve the anti-interference capability of the subway system. Yang S et al. used the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to optimize the total travel time of passengers, including queuing time and riding time.
However, the above solutions have the following disadvantages:
1. Most mathematical solutions face the same problem: the high departure frequency of the subway makes the subway model too complex. If the scheduling scheme of the whole subway system is modeled and solved mathematically, the excessive complexity (especially in the time dimension) produces too many variables and the model becomes difficult to solve. The usual mathematical methods can therefore consider only part of the system, for example optimizing only a few stations, or only the transfer or last-train case. Solutions of this class consequently neglect many real-world details and may perform poorly in practical applications.
2. The accuracy of data recording is low. In the real world, with mobile payment, entering stations by scanning a code has gradually become widespread, and it is difficult for subway operators to know a passenger's destination when the passenger enters the station. Some traditional models require inbound and outbound information accurate to the individual passenger, so these algorithms cannot adapt well to the current trend of mobile payment.
3. Traditional learning schemes, such as departure times given by a genetic algorithm, have great limitations in practical application. The algorithm can compute the optimal departure timetable for a day only after the complete passenger flow of that day is known. By then, however, that day's operation has ended, and the resulting timetable has little value for a day that has already passed. The timetable obtained by a genetic algorithm is a static timetable: if it is trained on the data of day T and used to guide the departure frequency of day T+1, it is difficult to handle passenger flow patterns that first appear on day T+1. If a crowd-gathering event occurs on day T+1, such as a large concert, subway scheduling personnel cannot schedule the subway more effectively with the original model, and serious passenger stranding may result.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a departure scheduling method, device, equipment and storage medium that can dynamically adjust the departure mode in real time.
In a first aspect, the present invention provides a departure scheduling method, including:
initializing a current value network and a target value network, and dividing the operating time of the trains on a line into a preset number of departure time periods;
obtaining passenger travel data for one historical day and the capacity of the trains on the line, wherein the passenger travel data comprises the inbound time, inbound station ID, outbound station ID and transit station ID of each passenger, and the transit station ID is determined from the inbound station and the outbound station by a shortest path algorithm;
performing a preset total number of passenger flow simulations according to the passenger travel data and the capacity of the trains on the line; collecting sample data during each simulation and storing it in a memory bank; and, after each simulation ends, counting the number of fast departure time periods and the total waiting time corresponding to that simulation and determining the current value network corresponding to that simulation; wherein one simulation is a passenger flow simulation of one day's operating time; each sample comprises the state data of a departure time period, the action of the next departure time period, the state data of the next departure time period and a return value; the state data comprises the number of people in each station on the line, the positions of the trains already dispatched and the number of people on those trains; the action is either departing at a preset fast departure frequency or departing at a preset slow departure frequency; and the action of the next departure time period is determined according to the latest current value network;
when new sample data is stored in the memory bank and the number of samples in the memory bank reaches a preset number threshold, randomly selecting a batch of samples from the memory bank according to a preset batch size, training the latest current value network on the batch, and taking the trained current value network as the latest current value network;
when the number of simulations reaches the preset total number of simulations, determining the optimal current value network corresponding to each number of fast departure time periods according to the number of fast departure time periods, the total waiting time and the current value network corresponding to each simulation;
selecting, as required, the optimal current value network corresponding to one number of fast departure time periods;
and acquiring the state data of a departure time period of the line, and determining the action of the next departure time period through the selected optimal current value network according to that state data.
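The sample structure and the memory bank described in the steps above can be sketched as follows. This is a minimal illustration only: the names `Sample` and `MemoryBank`, the 0/1 action encoding and the bounded `capacity` are assumptions not stated in the text, and the preset number threshold is assumed to be at least the batch size.

```python
import random
from collections import deque
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    state: Tuple[float, ...]       # state data of a departure time period
    action: int                    # action of the next period (assumed: 0 = slow, 1 = fast)
    reward: float                  # return value
    next_state: Tuple[float, ...]  # state data of the next departure time period


class MemoryBank:
    """Stores samples and hands out random batches once a threshold is reached."""

    def __init__(self, capacity: int, threshold: int, batch_size: int) -> None:
        # capacity is an assumption; the text does not say the bank is bounded
        self.samples: deque = deque(maxlen=capacity)
        self.threshold = threshold      # preset number threshold
        self.batch_size = batch_size    # preset batch size

    def store(self, sample: Sample) -> None:
        self.samples.append(sample)

    def ready(self) -> bool:
        # training starts only when enough samples have been collected
        return len(self.samples) >= self.threshold

    def sample_batch(self) -> List[Sample]:
        # random selection of one batch, without replacement
        return random.sample(list(self.samples), self.batch_size)


bank = MemoryBank(capacity=100, threshold=4, batch_size=2)
for k in range(4):
    bank.store(Sample((float(k),), k % 2, -1.0, (float(k + 1),)))
batch = bank.sample_batch() if bank.ready() else []
```

In a full implementation, each batch drawn this way would be passed to the training step of the current value network.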
In a second aspect, the present invention further provides a departure scheduling device, including:
an initialization module, configured to initialize a current value network and a target value network, and to divide the operating time of the trains on a line into a preset number of departure time periods;
an acquisition module, configured to obtain passenger travel data for one historical day and the capacity of the trains on the line, wherein the passenger travel data comprises the inbound time, inbound station ID, outbound station ID and transit station ID of each passenger, and the transit station ID is determined from the inbound station and the outbound station by a shortest path algorithm;
a simulation module, configured to perform a preset total number of passenger flow simulations according to the passenger travel data and the capacity of the trains on the line, to collect sample data during each simulation and store it in a memory bank, and, after each simulation ends, to count the number of fast departure time periods and the total waiting time corresponding to that simulation and determine the current value network corresponding to that simulation; wherein one simulation is a passenger flow simulation of one day's operating time; each sample comprises the state data of a departure time period, the action of the next departure time period, the state data of the next departure time period and a return value; the state data comprises the number of people in each station on the line, the positions of the trains already dispatched and the number of people on those trains; the action is either departing at a preset fast departure frequency or departing at a preset slow departure frequency; and the action of the next departure time period is determined according to the latest current value network;
a training module, configured to, when new sample data is stored in the memory bank and the number of samples in the memory bank reaches a preset number threshold, randomly select a batch of samples from the memory bank according to a preset batch size, train the latest current value network on the batch, and take the trained current value network as the latest current value network;
a first determining module, configured to, when the number of simulations reaches the preset total number of simulations, determine the optimal current value network corresponding to each number of fast departure time periods according to the number of fast departure time periods, the total waiting time and the current value network corresponding to each simulation;
a selection module, configured to select, as required, the optimal current value network corresponding to one number of fast departure time periods;
and a second determining module, configured to acquire the state data of a departure time period of the line and to determine the action of the next departure time period through the selected optimal current value network according to that state data.
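As an illustration of how a transit station ID might be derived from the inbound and outbound stations by a shortest path algorithm, the following sketch runs a breadth-first search on a small hypothetical network and returns the first station at which the line changes. The graph layout, the station and line names, the use of hop count as path length and the single-transfer simplification are all assumptions; the text does not specify which shortest path algorithm or edge weights are used.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical network: station -> list of (neighbouring station, line name)
Graph = Dict[str, List[Tuple[str, str]]]


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[Tuple[str, Optional[str]]]]:
    """Breadth-first search; returns [(station, line used to reach it), ...]."""
    prev: Dict[str, Optional[Tuple[str, str]]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt, line in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, line)
                queue.append(nxt)
    if dst not in prev:
        return None  # destination unreachable
    path: List[Tuple[str, Optional[str]]] = []
    node = dst
    while prev[node] is not None:
        parent, line = prev[node]
        path.append((node, line))
        node = parent
    path.append((src, None))
    path.reverse()
    return path


def transit_station(graph: Graph, src: str, dst: str) -> Optional[str]:
    """First station on the shortest path where the line changes, if any."""
    path = shortest_path(graph, src, dst)
    if path is None:
        return None
    for (station, line), (_, next_line) in zip(path[1:], path[2:]):
        if line != next_line:
            return station
    return None  # direct trip, no transfer needed


toy_graph: Graph = {
    "A": [("B", "L1")],
    "B": [("A", "L1"), ("C", "L1"), ("D", "L2")],
    "C": [("B", "L1")],
    "D": [("B", "L2")],
}
transfer_at = transit_station(toy_graph, "A", "D")
```

In this toy network a trip from A to D changes from line L1 to line L2 at station B, while a trip from A to C needs no transfer.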
In a third aspect, the present invention also provides an electronic device, including:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the departure scheduling method as provided in the first aspect.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the departure scheduling method provided in the first aspect.
The beneficial effects of the invention are as follows. During training, the DQN algorithm selects the agent's departure action for the next time period given the environmental state at the current moment. Therefore, when deployed in actual production, it can give the departure decision for the next departure time period in real time given only the current states of the trains and stations, so that the departure mode can be adjusted dynamically in real time according to the current passenger flow; this effectively relieves rail traffic pressure and provides excellent adaptability to emergencies such as abnormal passenger flow. Meanwhile, optimal models corresponding to different numbers of fast departure time periods are generated as the agent learns during training, and staff can select a suitable model to deploy according to actual conditions.
Drawings
Fig. 1 is a flowchart of a departure scheduling method provided by the present invention;
Fig. 2 is a schematic structural diagram of a departure scheduling device provided by the present invention;
Fig. 3 is a schematic structural diagram of an electronic device provided by the present invention;
Fig. 4 is a flowchart of a departure scheduling method according to a first embodiment of the present invention;
Fig. 5 is a flowchart of a passenger flow simulation according to the first embodiment of the present invention;
Fig. 6 is a flowchart of the passenger flow simulation within one departure time period according to the first embodiment of the present invention;
Fig. 7 is a flowchart of step S404 according to the first embodiment of the present invention;
Fig. 8 is a schematic diagram of the training results of the reinforcement learning model when the number of fast departure time periods is 1 to 6 according to the first embodiment of the present invention;
Fig. 9 is a schematic diagram of the training results of the reinforcement learning model when the number of fast departure time periods is 7 to 14 according to the first embodiment of the present invention;
Fig. 10 is a schematic diagram of the training results of the reinforcement learning model when the number of fast departure time periods is 15 to 24 according to the first embodiment of the present invention;
Fig. 11 is a schematic diagram of the optimal solutions for different numbers of fast departure time periods according to the first embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the figure. A process may correspond to a method, function, procedure, subroutine, subprogram, and the like.
Furthermore, the terms "first," "second," and the like may be used herein to describe various orientations, actions, steps, elements, or the like, but the orientations, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, the first information may be referred to as second information, and similarly, the second information may be referred to as first information, without departing from the scope of the present application. The first information and the second information are both information, but they are not the same information. The terms "first", "second", etc. are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
As shown in Fig. 1, a departure scheduling method includes:
S101: initializing a current value network and a target value network, and dividing the operating time of the trains on a line into a preset number of departure time periods;
S102: obtaining passenger travel data for one historical day and the capacity of the trains on the line, wherein the passenger travel data comprises the inbound time, inbound station ID, outbound station ID and transit station ID of each passenger, and the transit station ID is determined from the inbound station and the outbound station by a shortest path algorithm;
S103: performing a preset total number of passenger flow simulations according to the passenger travel data and the capacity of the trains on the line; collecting sample data during each simulation and storing it in a memory bank; and, after each simulation ends, counting the number of fast departure time periods and the total waiting time corresponding to that simulation and determining the current value network corresponding to that simulation; wherein one simulation is a passenger flow simulation of one day's operating time; each sample comprises the state data of a departure time period, the action of the next departure time period, the state data of the next departure time period and a return value; the state data comprises the number of people in each station on the line, the positions of the trains already dispatched and the number of people on those trains; the action is either departing at a preset fast departure frequency or departing at a preset slow departure frequency; and the action of the next departure time period is determined according to the latest current value network;
S104: when new sample data is stored in the memory bank and the number of samples in the memory bank reaches a preset number threshold, randomly selecting a batch of samples from the memory bank according to a preset batch size, training the latest current value network on the batch, and taking the trained current value network as the latest current value network;
S105: when the number of simulations reaches the preset total number of simulations, determining the optimal current value network corresponding to each number of fast departure time periods according to the number of fast departure time periods, the total waiting time and the current value network corresponding to each simulation;
S106: selecting, as required, the optimal current value network corresponding to one number of fast departure time periods;
S107: acquiring the state data of a departure time period of the line, and determining the action of the next departure time period through the selected optimal current value network according to that state data.
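The selection in step S105 can be sketched as follows, assuming that "optimal" means the smallest total waiting time among all simulations that used the same number of fast departure time periods; the text does not state the criterion explicitly. The function name `best_networks` and the string placeholders standing in for networks are hypothetical.

```python
from typing import Any, Dict, Iterable, Tuple


def best_networks(results: Iterable[Tuple[int, float, Any]]) -> Dict[int, Tuple[float, Any]]:
    """Map each number of fast departure periods to its best (total_wait, network).

    Each result is (fast_period_count, total_waiting_time, network_snapshot).
    Assumption: the best network is the one with the lowest total waiting time.
    """
    best: Dict[int, Tuple[float, Any]] = {}
    for fast_count, total_wait, network in results:
        if fast_count not in best or total_wait < best[fast_count][0]:
            best[fast_count] = (total_wait, network)
    return best


# Placeholder strings stand in for saved current value networks.
simulation_results = [
    (3, 120.0, "net_after_sim_1"),
    (3, 95.0, "net_after_sim_2"),
    (5, 80.0, "net_after_sim_3"),
]
best = best_networks(simulation_results)
```

An operator could then choose one entry of `best`, for example the one whose number of fast departure time periods fits the available budget, as described in step S106.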
A traditional algorithm, such as a genetic algorithm, can only compute the optimal solution for the previous day from that day's subway passenger flow, and the result is then applied to the current day on the assumption that the current passenger flow is roughly the same as that of the previous day. The traditional algorithm therefore ignores the passenger flow characteristics of the current day and cannot make timely dynamic adjustments. During training, the DQN algorithm selects the agent's departure action for the next time period given the environmental state at the current moment. Therefore, when deployed in actual production, it can give the departure decision for the next departure time period in real time given only the current states of the subway trains and subway stations, so that the departure mode can be adjusted dynamically in real time according to the current subway passenger flow; this effectively relieves rail traffic pressure and provides excellent adaptability to emergencies such as abnormal passenger flow.
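A minimal sketch of the real-time decision at deployment is given below: the state is passed to the selected value network and the action with the highest estimated value is chosen. The stand-in `toy_q_network`, the state layout and the 0/1 action encoding are assumptions made for illustration; a trained network would take the place of the toy function.

```python
from typing import Callable, List, Sequence

SLOW, FAST = 0, 1  # assumed encoding of the two departure actions


def choose_action(q_network: Callable[[Sequence[float]], Sequence[float]],
                  state: Sequence[float]) -> int:
    """Return the action whose estimated value is highest for this state."""
    q_values = q_network(state)
    return max(range(len(q_values)), key=lambda a: q_values[a])


def toy_q_network(state: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained value network.

    It favours fast departures once the total number of people in the
    state exceeds 100; a real network would be learned from samples.
    """
    crowd = sum(state)
    return [1.0, crowd / 100.0]  # [value of SLOW, value of FAST]


busy_action = choose_action(toy_q_network, [50.0, 80.0])
quiet_action = choose_action(toy_q_network, [10.0, 20.0])
```

With this toy network, the crowded state leads to the fast departure action and the quiet state to the slow one.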
In an optional embodiment, the step S103 includes:
presetting the action of a first departure time period of operation time in the ith simulation, and taking the first departure time period as the current departure time period, wherein the initial value of i is 1;
according to the action of the current departure time period, passenger travel data, capacity and preset unit time, carrying out passenger flow simulation of the current departure time period, acquiring state data when the current departure time period is ended as the state data of the current departure time period, and meanwhile, counting the total waiting time of the current departure time period according to the number of waiting passengers at each station on the line in each unit time in the current departure time period;
generating a random number, wherein the range of the random number is 0-1;
if the random number is smaller than the corresponding exploration rate of the ith simulation, randomly generating the action of the next departure time period of the current departure time period;
if the random number is greater than or equal to the exploration rate corresponding to the ith simulation, determining the action of the next departure time period of the current departure time period according to the latest current value network;
according to the action of the next departure time period, passenger travel data, capacity and preset unit time, carrying out passenger flow simulation of the next departure time period, acquiring state data when the next departure time period is ended as the state data of the next departure time period, and meanwhile, according to the number of waiting passengers at each station on the line in each unit time in the next departure time period, counting the total waiting time of the next departure time period;
calculating a return value according to the total waiting time of the current departure time period, the action of the next departure time period and a penalty term function corresponding to the jth iteration, wherein j = ⌈i/epoch⌉ and epoch is the preset number of simulations per iteration;
generating sample data according to the state data of the current departure time period, the action and the return value of the next departure time period and the state data of the next departure time period, and storing the sample data in a memory bank;
judging whether the next departure time period is the last departure time period of the operation time or not;
if not, taking the next departure time period as the current departure time period, and continuing to execute the step of generating the random number;
if so, counting the departure time periods in the ith simulation whose action is departing at the fast departure frequency to obtain the number of fast departure time periods corresponding to the ith simulation, calculating the total waiting time corresponding to the ith simulation from the total waiting time of each departure time period of the operating time in the ith simulation, and taking the latest current value network at that moment as the current value network corresponding to the ith simulation;
judging whether i is equal to a preset total simulation number;
and if not, determining the exploration rate corresponding to the (i+1)th simulation according to the exploration rate corresponding to the ith simulation and a preset minimum exploration rate, wherein the exploration rate corresponding to the first simulation is a preset exploration rate initial value; letting i = i + 1; and returning to the step of presetting the action of the first departure time period of the operation time in the ith simulation and taking the first departure time period as the current departure time period.
Performing one simulation means performing the passenger flow simulation for the operation time of one day. Each simulation generates sample data and stores it in the memory bank for later training of the current value network, and during the simulation new sample data is generated in real time based on the latest current value network.
In an optional embodiment, the passenger flow simulation in the current departure time period according to the action in the current departure time period, the passenger travel data, the capacity amount and the preset unit time includes:
taking the first unit time of the current departure time period as the current unit time;
respectively judging whether a train arrives at each station on the line in the current unit time according to the action of the current departure time period and preset train operation data;
if a train arrives at a station, performing people flow interaction processing on the station according to the passenger travel data and the capacity of the train, wherein the people flow interaction processing comprises passengers on the train getting off and passengers in the station getting on; the passengers getting off comprise the passengers on the train whose outbound station ID or transit station ID is the station ID of the station; the passengers in the station comprise inbound passengers and transfer passengers; the inbound passengers comprise passengers whose inbound time is earlier than the current unit time, whose inbound station ID is the station ID of the station, and who have not yet boarded; and the transfer passengers comprise passengers whose transit station ID is the station ID of the station, whose time since arriving at the transit station exceeds a preset transfer time, whose outbound station ID corresponds to a station on the line, and who have not yet boarded;
updating the number of people in the station and the number of people on the dispatched train according to the result of the people flow interaction processing, and counting the number of people waiting at the station in the current unit time;
if no train arrives at a station, updating the number of people in the station of the station according to passenger travel data and the station ID of the station, and counting the number of people waiting for the train at the station in the current unit time;
counting the total number of waiting passengers in the current unit time according to the number of waiting passengers at each station on the line in the current unit time;
judging whether the current unit time is the last unit time of the current departure time period or not;
and if not, taking the next unit time as the current unit time, continuing to execute the step of respectively judging whether a train arrives at each station on the line in the current unit time according to the action of the current departure time period and preset train operation data.
Within each departure time period, the simulation system simulates the flow of people with the unit time as the minimum division and calculates the total waiting time of all passengers. It also supports transfers, with the transfer route given by a shortest path algorithm.
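The per-unit-time update of a single station described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Train` and `step_station` are hypothetical, and transfer passengers and train movement are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Train:
    capacity: int     # maximum number of passengers the train can carry
    onboard: int = 0  # passengers currently on the train


def step_station(waiting: int, arrivals: int, train: Optional[Train],
                 alighting: int = 0) -> Tuple[int, int]:
    """Advance one station by one unit time.

    waiting   -- passengers already waiting at the station
    arrivals  -- passengers entering the station in this unit time
    train     -- the train at the platform, or None if no train arrives
    alighting -- passengers on the train whose trip ends or transfers here
    Returns (passengers still waiting, passengers who boarded).
    """
    waiting += arrivals
    boarded = 0
    if train is not None:
        # Passengers get off first, which frees capacity.
        train.onboard -= min(alighting, train.onboard)
        # Boarding is limited by the remaining capacity (no overload).
        free = train.capacity - train.onboard
        boarded = min(waiting, free)
        train.onboard += boarded
        waiting -= boarded
    return waiting, boarded
```

Summing the returned waiting counts over all stations and all unit times of a departure time period, multiplied by the unit time, would give the total waiting time of that period.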
In an alternative embodiment, the act of determining the next departure time period from the current departure time period based on the latest current value network comprises:
according to the state data of the current departure time period and a preset action set, calculating the score of each action in the action set through a latest current value network, and taking the action corresponding to the maximum score as the action of the next departure time period, wherein the action set comprises the steps of departure with a preset fast departure frequency and departure with a preset slow departure frequency.
That is, when the generated random number is not less than the exploration rate corresponding to the current simulation, the optimal action is determined based on the latest current value network and is taken as the action of the next departure time period.
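The selection rule can be sketched as an epsilon-greedy choice. The sketch assumes that a random number below the exploration rate leads to a randomly chosen action, as is usual for this scheme; `choose_action` and the `q_values` callable are hypothetical names standing in for the current value network.

```python
import random
from typing import Callable, Sequence

ACTIONS = (0, 1)  # 0 = slow departure frequency, 1 = fast departure frequency


def choose_action(state, q_values: Callable[[object], Sequence[float]],
                  epsilon: float, rng: random.Random) -> int:
    """Pick the action of the next departure time period."""
    if rng.random() < epsilon:
        # Exploration branch (assumed): pick any action at random.
        return rng.choice(ACTIONS)
    # Exploitation branch: score every action and take the best one.
    scores = q_values(state)
    return max(ACTIONS, key=lambda a: scores[a])
```

With `epsilon = 0` the function always returns the highest-scoring action; with `epsilon = 1` it always explores.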
In an optional embodiment, the calculating a return value according to the total waiting time of the current departure time period, the action of the next departure time period and a penalty term function corresponding to the jth iteration includes:
calculating a return value according to a return value calculation formula, wherein the return value calculation formula is $r = -C_{t_k} - a\,(f_j(x) - f_j(x-1))$, wherein $r$ is the return value, $C_{t_k}$ is the total waiting time of the current departure time period, $a = 1$ if the action of the next departure time period is departure at the preset fast departure frequency, $a = 0$ if the action of the next departure time period is departure at the preset slow departure frequency, and $f_j(x)$ is the penalty term function corresponding to the jth iteration.
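A minimal sketch of this return value follows. It assumes that x is the number of fast departure time periods used so far in the current simulation, which the text does not state explicitly; the function names are hypothetical.

```python
from typing import Callable


def iteration_index(i: int, epoch: int) -> int:
    """j = ceil(i / epoch), computed with integers only."""
    return (i + epoch - 1) // epoch


def return_value(total_wait: float, action: int, x: int,
                 penalty: Callable[[int], float]) -> float:
    """r = -C - a * (f_j(x) - f_j(x - 1)).

    total_wait -- total waiting time C of the departure time period
    action     -- a, 1 for fast departure and 0 for slow departure
    x          -- number of fast departure time periods (assumed: so far)
    penalty    -- penalty term function f_j of the current iteration
    """
    marginal = penalty(x) - penalty(x - 1)
    return -total_wait - action * marginal
```

For the first iteration, where the penalty term function is linear in x, the marginal penalty reduces to the constant penalty of a single fast departure.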
Further, before letting i = i +1, the method further includes:
judging whether i is equal to the integral multiple of the simulation times of each iteration;
if yes, determining a penalty term function corresponding to the (j+1)th iteration according to a penalty term function updating formula and the penalty term function corresponding to the jth iteration, wherein the penalty term function updating formula is $f_{j+1}(x) = K_{new} \times \mathrm{Smooth}(C_{best,j}(x)) + K_{old} \times f_j(x)$, wherein $f_{j+1}(x)$ is the penalty term function corresponding to the (j+1)th iteration, $f_j(x)$ is the penalty term function corresponding to the jth iteration, Smooth() is a smoothing function, $C_{best,j}(x)$ is the minimum total waiting time corresponding to the number x of fast departure time periods in the jth iteration, $K_{new}$ and $K_{old}$ are preset adjustment parameters, $f_1(x) = x \cdot M_0$, x is the number of fast departure time periods, and $M_0$ is a preset penalty term for a single fast departure.
That is, the penalty term function f(x) is updated each time a round of iterative simulation is completed, which improves the model and makes it general enough to handle subway environments of different cities.
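The update can be sketched as below. The text does not specify the smoothing function, so a centered moving average is used here purely as an assumed stand-in; `smooth` and `update_penalty` are hypothetical names, and the functions are represented as lists indexed by x.

```python
from typing import List


def smooth(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average (assumed stand-in for Smooth())."""
    n = len(values)
    half = window // 2
    out = []
    for k in range(n):
        lo, hi = max(0, k - half), min(n, k + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def update_penalty(f_old: List[float], c_best: List[float],
                   k_new: float, k_old: float) -> List[float]:
    """f_{j+1}(x) = K_new * Smooth(C_best,j(x)) + K_old * f_j(x) for every x."""
    smoothed = smooth(c_best)
    return [k_new * s + k_old * f for s, f in zip(smoothed, f_old)]
```

The two adjustment parameters act as a step size: a larger `k_new` lets the results of the latest iteration dominate, while a larger `k_old` keeps the function close to its previous shape.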
In an alternative embodiment, after step S102, the method further includes:
setting the action of each departure time period of the operation time to departure at a preset slow departure frequency, carrying out one slow departure simulation according to the passenger travel data and the capacity of the train of the one line, and obtaining the theoretical longest waiting time from the total waiting time of each departure time period in the slow departure simulation;
setting the actions of each departure time period in the operation time as departure at a preset quick departure frequency, carrying out one-time quick departure simulation according to the passenger trip data and the capacity of the train of the one line, and counting to obtain the theoretical shortest waiting time according to the total waiting time of each departure time period in the quick departure simulation;
and dividing the difference between the theoretical longest waiting time and the theoretical shortest waiting time by the preset total number of departure time periods in a day to obtain the penalty term for a single fast departure.
That is, the penalty of a single fast departure can be considered as a reduction in the total waiting time for each additional fast departure time period.
In an optional embodiment, the determining the exploration rate corresponding to the (i+1)th simulation according to the exploration rate corresponding to the ith simulation and a preset minimum exploration rate includes:
determining the exploration rate corresponding to the (i+1)th simulation according to an exploration rate updating formula, wherein the exploration rate updating formula is $\varepsilon_{i+1} = \max(\varepsilon_{min}, \varepsilon_i - 0.0045)$, wherein $\varepsilon_{i+1}$ is the exploration rate corresponding to the (i+1)th simulation, $\varepsilon_i$ is the exploration rate corresponding to the ith simulation, and $\varepsilon_{min}$ is the preset minimum exploration rate.
Wherein the preset initial value of the exploration rate is $\varepsilon_1 = 1$ and the preset minimum exploration rate is $\varepsilon_{min} = 0.1$.
That is, the exploration rate is updated every time a simulation is performed.
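The schedule can be sketched as follows, assuming that the rate decreases by a fixed step of 0.0045 per simulation until it reaches the floor, which is the reading consistent with an initial value of 1 and a minimum of 0.1. The function names are hypothetical.

```python
from typing import List


def next_epsilon(eps: float, eps_min: float = 0.1, step: float = 0.0045) -> float:
    """Exploration rate of the next simulation (assumed linear decay with a floor)."""
    return max(eps_min, eps - step)


def epsilon_schedule(n: int, eps0: float = 1.0) -> List[float]:
    """Exploration rates of simulations 1..n."""
    rates = []
    eps = eps0
    for _ in range(n):
        rates.append(eps)
        eps = next_epsilon(eps)
    return rates
```

Under these assumptions the rate falls from 1 to the floor of 0.1 after roughly 200 simulations and stays there for the rest of the run.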
In an optional embodiment, before storing the sample data in the memory, the method further includes:
if the memory bank is full, deleting the sample data stored in the memory bank earliest.
This ensures that the memory bank always holds the most recent sample data.
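A bounded first-in-first-out buffer is one straightforward way to obtain this behavior. The sketch below uses `collections.deque` with `maxlen`; the class name `MemoryBank` and its methods are hypothetical, and the `sample` method is included only to illustrate the random batch selection used for training.

```python
import random
from collections import deque


class MemoryBank:
    """Fixed-capacity store of (state, action, return, next_state) samples."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently drops the oldest item when full.
        self._buf = deque(maxlen=capacity)

    def store(self, state, action, ret, next_state) -> None:
        self._buf.append((state, action, ret, next_state))

    def oldest(self):
        return self._buf[0]

    def sample(self, batch_size: int, rng: random.Random):
        # Uniform random batch without replacement.
        return rng.sample(list(self._buf), batch_size)

    def __len__(self) -> int:
        return len(self._buf)
```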
In an optional embodiment, the step S104 includes:
when new sample data is stored in the memory base and the number of the sample data in the memory base reaches a preset number threshold, randomly selecting sample data with a preset batch size from the memory base as the sample data of the current batch, and using the latest current value network as the current value network to be trained;
traversing the sample data of the current batch, and sequentially acquiring sample data from the sample data of the current batch;
calculating the score corresponding to the state data of the current departure time period and the action of the next departure time period in the sample data through the latest current value network to be trained, and taking the score as a first score corresponding to the sample data;
respectively calculating, through the latest target value network, the score of each action for the state data of the next departure time period in the sample data, and taking the maximum score as a second score corresponding to the sample data;
calculating a loss value according to a return value in the sample data, a first score and a second score corresponding to the sample data and a preset discount rate, and updating the latest network parameters of the current value network to be trained according to the loss value;
and after traversing the sample data of the current batch, taking the latest current value network to be trained as the latest current value network.
That is, training on one complete batch of sample data counts as one complete update of the current value network.
In an optional embodiment, the method further comprises:
and when the simulation times reach integral multiple of the preset first times, updating the network parameters of the target value network according to the latest network parameters of the current value network.
That is, each time a certain number of passenger flow simulations have been performed, the network parameters of the target value network are replaced with those of the latest current value network. The purpose of the replacement is to let the neural network learn further. Until the next replacement, the network parameter θ' of the target value network stays fixed, and only the network parameter θ of the current value network is changed by the training of step S104.
In an optional embodiment, the calculating a loss value according to the return value in the sample data, the first score and the second score corresponding to the sample data, and the preset discount rate includes:
calculating a loss value according to a loss function, the loss function being $\mathrm{Loss} = (Q_{target}(s,a) - Q_{eval}(s,a))^2$ with $Q_{target}(s,a) = r + \gamma \times \max_{a' \in A} Q(s', a')$, wherein Loss is the loss value, $Q_{eval}(s,a)$ is the first score corresponding to the sample data, $r$ is the return value in the sample data, $\gamma$ is the preset discount rate, and $\max_{a' \in A} Q(s', a')$ is the second score corresponding to the sample data.
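For a single sample the loss reduces to a few arithmetic steps, sketched below with plain numbers in place of network outputs. The function name `td_loss` and its parameter names are hypothetical.

```python
from typing import Sequence


def td_loss(ret: float, first_score: float,
            next_scores: Sequence[float], gamma: float) -> float:
    """Squared difference between the target score and the first score.

    ret         -- return value r stored in the sample
    first_score -- score of (s, a) from the current value network
    next_scores -- scores of every action in s' from the target value network
    gamma       -- discount rate
    """
    second_score = max(next_scores)       # best action in the next state
    target = ret + gamma * second_score   # target score for (s, a)
    return (target - first_score) ** 2
```

For example, with a return of -10, a first score of -12, next-state scores of -5 and -3 and a discount rate of 0.9, the target is -12.7 and the loss is about 0.49.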
In an alternative embodiment, step S105 includes:
and when the number of simulations reaches the preset total number of simulations, comparing the total waiting times of the simulations that have the same number of fast departure time periods, and taking the current value network corresponding to the simulation with the minimum total waiting time as the optimal current value network for that number of fast departure time periods.
Since the number of fast departure time periods can take only a limited set of values, the optimal model for every possible number of fast departure time periods is recorded. In actual operation, an operator can select the optimal model corresponding to the required number of fast departure time periods.
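The selection amounts to keeping, for each number of fast departure time periods, the simulation with the smallest total waiting time. A minimal sketch follows; `best_per_fast_count` is a hypothetical name, and the networks are represented by opaque labels.

```python
from typing import Dict, Iterable, Tuple


def best_per_fast_count(
    records: Iterable[Tuple[int, float, object]]
) -> Dict[int, Tuple[float, object]]:
    """Map each number of fast periods to (min total waiting time, network).

    records -- one (fast_period_count, total_waiting_time, network) per simulation
    """
    best: Dict[int, Tuple[float, object]] = {}
    for count, wait, network in records:
        if count not in best or wait < best[count][0]:
            best[count] = (wait, network)
    return best
```

An operator who can afford, say, eight fast departure time periods would then look up the entry for 8 and deploy the network stored there.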
As shown in fig. 2, the present invention also provides a dispatching device for dispatching a train, including:
an initialization module 201, configured to initialize a current value network and a target value network, and divide the operation time of a train on a line into a preset number of departure time periods;
the obtaining module 202 is configured to obtain passenger trip data of a historical day and a capacity of a train of the one route, where the passenger trip data includes inbound time, inbound station ID, outbound station ID, and transit station ID of each passenger, and the transit station ID is determined by a shortest path algorithm according to the inbound station and the outbound station;
the simulation module 203 is configured to perform passenger flow simulation for a preset total number of simulations according to the passenger travel data and the capacity of the train of the one line, acquire sample data during each simulation and store it in a memory bank, and, after each simulation is finished, count the number of fast departure time periods and the total waiting time corresponding to the current simulation and determine the current value network corresponding to the current simulation, wherein one simulation is the passenger flow simulation of the operation time of one day; each sample data comprises the state data of a departure time period, the action and the state data of the next departure time period after that departure time period, and a return value; the state data comprises the number of people in each station on the line, the positions of the dispatched trains and the number of people on the dispatched trains; the action is departure at a preset fast departure frequency or departure at a preset slow departure frequency; and the action of the next departure time period is determined according to the latest current value network;
the training module 204 is configured to, when new sample data is stored in a memory bank and the number of the sample data in the memory bank reaches a preset number threshold, randomly select a batch of sample data from the memory bank according to a preset batch size, train a latest current value network according to the batch of sample data, and take the trained current value network as the latest current value network;
the first determining module 205 is configured to determine, when the simulation times reach a preset total simulation times, an optimal current value network corresponding to each number of fast departure time periods according to the number of fast departure time periods, the total waiting time, and the current value network corresponding to each simulation;
a selecting module 206, configured to select an optimal current value network corresponding to a number of time periods of fast departure according to a requirement;
the second determining module 207 is configured to obtain the state data of a departure time period of the one line, and determine the action of the next departure time period through the selected optimal current value network according to the state data of that departure time period.
As shown in fig. 3, the present invention also provides an electronic device, including:
one or more processors 301;
a storage device 302 for storing one or more programs;
when executed by the one or more processors 301, the one or more programs cause the one or more processors 301 to implement the departure scheduling method as described above.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the departure scheduling method as described above.
Example one
Referring to fig. 4-11, a first embodiment of the present invention is a departure scheduling method, which can be applied to the departure scheduling of subway trains.
In view of the problems of existing solutions, such as long computation time, high complexity, too many neglected detail features, and static schedules that cannot be adjusted to cope with emergencies, this embodiment uses a deep reinforcement learning algorithm to overcome them.
Firstly, a subway environment simulation system accurate to the minute is established using subway data of a certain city. Because the simulation is built with reference to the real operating conditions of the subway network and measures distance by subway running time, it is readily portable between cities. Specifically, the data of the simulation environment is passenger travel data of a certain historical time period in a certain city, mainly comprising each passenger's inbound time, inbound station ID and outbound station ID. The simulation environment simulates the flow of people through the whole subway system with a minimum division of one minute and calculates the total waiting time of all passengers.
In addition to basic single-line simulation, the simulation environment supports transfers. Since the raw data records only the inbound and outbound stations of transfer passengers as well, the optimal (least time-consuming) transfer station is assigned to them with the Dijkstra algorithm; that is, the transfer route of a passenger is given by the shortest path between the two stations as calculated by the Dijkstra algorithm.
Assuming that the departure frequency of the subway can be changed every 30 minutes, the simulation environment generates the departure schedule for the next 30 minutes according to the given departure frequency and simulates that period frame by frame, each frame lasting 1 min. Taking a departure time period of $T_0 = 30$ min as an example, the simulation environment simulates subway departures at the departure interval $t_k$ of the current departure time period and refreshes itself every $t_0 = 1$ min. Each refresh covers whether a departure is executed, the update of train positions, and the update of the number of people leaving and entering each station; if a train arrives at a station, the interaction between the train and the people in the station is processed according to the passengers' trips, while practical constraints are enforced, for example that the train is never overloaded. Because the card-swiping gates of most subway stations are close to the boarding position, the simulation system ignores the time passengers take to walk from the gate to the platform. However, owing to the layout of subway lines, passengers often need some time to walk within a transfer station. To account for this transfer time, a passenger who arrives at a transfer station is kept in a transfer state for a time $t_1$, during which the passenger cannot board the next train.
And on the basis of the simulation environment, selecting a Deep Q-Network (DQN) algorithm in Deep reinforcement learning to make a departure strategy. In order to better solve the subway scheduling problem by DQN, the problem is abstracted and simplified properly.
The operation time of the subway is uniformly divided into a number of departure time periods. For example, if the subway operates from 6:00 to 24:00, 18 hours in total, the operation time is divided into 36 departure time periods of half an hour each. The DQN algorithm determines the departure frequency of the next departure time period according to factors such as the current running state of the subway and the number of people in the stations. The official timetable divides departure intervals into two categories, the peak period and the off-peak period: the peak periods are 6:30 to 8:30 and 16:30 to 18:30, during which the departure frequency is increased, and the rest of the time is the off-peak period. To match this practice, a 36-bit binary vector is used to represent the departure frequencies of a day, where 0 denotes slow departure at the off-peak frequency and 1 denotes fast departure at the peak frequency. For simulation, the slow and fast departures denoted by 0 and 1 are mapped to concrete departure intervals of the current line, for example 8 min per train for slow departure and 3 min per train for fast departure.
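The encoding can be illustrated with the official timetable from the example. The sketch below builds the 36-bit vector and expands it into departure times; it assumes, for simplicity, that departures restart at the beginning of every departure time period, which the text does not specify. All names are hypothetical.

```python
from typing import List

START_MIN = 6 * 60              # operation starts at 6:00
PERIOD_MIN = 30                 # length of one departure time period
N_PERIODS = 36                  # 6:00 to 24:00 in half-hour periods
INTERVAL_MIN = {0: 8, 1: 3}     # example intervals: slow 8 min, fast 3 min
PEAKS = [(6 * 60 + 30, 8 * 60 + 30), (16 * 60 + 30, 18 * 60 + 30)]


def official_schedule() -> List[int]:
    """Binary vector with 1 for every period that starts inside a peak."""
    bits = []
    for k in range(N_PERIODS):
        start = START_MIN + k * PERIOD_MIN
        bits.append(1 if any(lo <= start < hi for lo, hi in PEAKS) else 0)
    return bits


def departure_times(bits: List[int]) -> List[int]:
    """Departure times in minutes since midnight (assumed restart per period)."""
    times = []
    for k, bit in enumerate(bits):
        start = START_MIN + k * PERIOD_MIN
        times.extend(range(start, start + PERIOD_MIN, INTERVAL_MIN[bit]))
    return times
```

With these example values the official timetable contains eight fast departure time periods, four in the morning peak and four in the evening peak.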
In terms of effectiveness, the evaluation is made by calculating the total waiting time T of all passengers on the line. The subway scheduling problem is thus abstracted into a dual-objective optimization problem: set n entries of a 36-dimensional binary vector to 1 (0 ≤ n ≤ 36) so that T is as small as possible while n is as small as possible.
The complete solution space of the above problem has size $2^{36} \approx 6.9 \times 10^{10}$. Even if only a single-objective optimization problem is considered, that is, the number of fast departure time periods is limited to 8 (the same number as in the existing official timetable, but with no restriction on the positions of the eight 1s in the binary vector), there are still $\binom{36}{8} = 30{,}260{,}340$ candidate solutions. Since simulating one day takes about 1 min, a brute-force search would need about 57.6 years even when only 8 fast departure time periods are considered. A more efficient solution algorithm is therefore necessary.
So far, the subway scheduling problem has been abstracted into a dual-objective optimization problem that can be solved by the DQN algorithm. In this embodiment, the dual-objective problem is handled by traversing all values of one of the objectives, which yields a set of Pareto optimal solutions.
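The arithmetic behind the brute-force estimate can be checked in a few lines, assuming one minute per simulated day and a 365-day year.

```python
import math

# Every one of the 36 periods is either slow (0) or fast (1).
full_space = 2 ** 36

# Schedules with exactly 8 fast periods among 36.
restricted_space = math.comb(36, 8)

# One simulated day takes about one minute of computation.
minutes_per_year = 60 * 24 * 365
years_needed = restricted_space / minutes_per_year

print(full_space, restricted_space, round(years_needed, 1))
```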
The training part of the DQN model is mainly divided into two stages, namely an exploration stage and a learning stage.
In the exploration phase, the DQN model continuously acquires the state of the current environment from the simulation environment. Because the time precision of the simulation environment is 1 min, the state information of the trains and stations during the simulation is readily available. However, since the DQN outputs an action only once every 30 min, the state of the simulation environment only needs to be sampled every 30 min (simulation time). The real-time number of people in each subway station, the real-time positions of the dispatched trains and the number of people on the dispatched trains are taken as features, that is, as the input data (also called state data s) of the deep neural network in the DQN model. The deep neural network calculates the Q values Q(s, a), a ∈ A, for all actions, that is, the score the network gives to executing action a in state s (this score can be understood as combining, in a certain proportion, the total waiting time in the departure time period, the penalty term incurred by the current total number of fast departures, and the expected future values of both). The action a with the maximum Q value is then selected and executed. After the action is executed, the environment changes, a return r is obtained from the environment, and the record is stored in the memory bank of the DQN. These steps are repeated as the environment changes until the memory bank is full.
In the learning phase, besides continuing to acquire 'new information' and update the memory base in the exploration phase, the DQN model randomly extracts 'memory' from the memory base to train the neural network part therein. The neural network part has two networks of the same structure: a Q-target network and a Q-estimation network. The Q-target network is relatively fixed and stores previously learned knowledge, and the Q-estimation network is continuously updated along with the learning process and updates the network parameters of the Q-target network after a certain number of iterations.
The Q value of the DQN model is updated in the following manner:
The first formula: $Q_{target}(s,a) = r(s,a) + \gamma \times \max_{a' \in A} Q(s', a')$
The second formula: $\mathrm{Loss} = (Q_{target}(s,a) - Q_{eval}(s,a))^2$
Wherein $(s, a, s', a')$ in the first formula respectively denote the current state, the action in the current state, the next state and the action in the next state; that is, $a$ is the action generated according to the current state $s$, and $s'$ is the state obtained after action $a$ is taken in state $s$. In this embodiment, $s$ can therefore be regarded as the environment state at the end of the current departure time period, $a$ as the action of the next departure time period, and $s'$ as the environment state at the end of the next departure time period. $r(s, a)$ is the return function; $\gamma$ is the discount rate, with a value range of 0 to 1; $A$ is the preset action set, $A = \{0, 1\}$. $Q_{target}(s, a)$ is the Q value given by the target network at $(s, a)$ and is the sum of two parts: the first term $r(s, a)$ is the score for the current state and current action, and the second term is the estimate of the future, in which $\max_{a' \in A} Q(s', a')$ is the Q value of the best action in state $s'$ and the discount rate $\gamma$ scales that future estimate back to the current state.
The second formula is the loss function of the estimation network. Between two successive updates of the target network, which take place every certain number of steps, training drives the estimation network toward the target given by the first formula.
Experiments show that the design of the return function in the DQN algorithm plays a crucial role in the result of the subway scheduling optimization problem. Accordingly, taking the distribution of the solution space of the subway optimization problem into account, an effective return function is designed for the value r, as follows:
The third formula: $r = -C_{t_k} - a \cdot M$
Wherein $C_{t_k}$ is the total waiting time of the current departure time period and $a \in \{0, 1\}$, where a = 0 indicates that the action of the current departure time period is slow departure, namely departure at the preset slow departure frequency, and a = 1 indicates that the action of the current departure time period is fast departure, namely departure at the preset fast departure frequency. The return function is designed with the expectation that each additional fast departure reduces the total waiting time by at least M minutes.
Further analysis shows that a better solution can be obtained when the value of M in the initial return function is adjusted to follow the shape of the theoretical upper bound of the solution space more closely, because this helps the DQN algorithm to explore "in parallel" near the upper bound of the optimal solution instead of being trapped in a local optimum. In order to improve the model and make it general enough to handle subway environments of different cities, a method for dynamically adjusting M is also designed. M then becomes a function of the number x of fast departure time periods, expressed as $M(x) = f(x) - f(x-1)$, where f(x) can be understood as the lower limit of the waiting time that is expected to be saved when the number of fast departure time periods in a day is x; in this embodiment, $0 \le x \le 36$.
The fourth formula: $f_1(x) = x \cdot M_0$
The fifth formula: $M_0$ = (theoretical longest waiting time - theoretical shortest waiting time) / total number of departure time periods
The sixth formula: $f_{i+1}(x) = K_{new} \times \mathrm{Smooth}(C_{best,i}(x)) + K_{old} \times f_i(x)$
Wherein, in the sixth formula, $f_{i+1}(x)$ is the function for the (i+1)th iteration; Smooth() is a smoothing function; $C_{best,i}(x)$ is obtained by recording the minimum total waiting time under the optimal departure model whose number of fast departure time periods is x in the ith iteration; $K_{new}$ and $K_{old}$ are two adjustment parameters that control the update step size of f(x) in an iteration.
Further, in practice, because the number of simulations in a single iteration is limited and the DQN algorithm is optimum-seeking by nature, there may be no solution for large numbers of fast departure time periods. For example, in fig. 11, the algorithm gives no solution when the number of fast departure time periods is greater than 28. This is because, when the number of fast departure time periods is already large, adding more of them cannot significantly shorten the total waiting time, yet each additional fast departure incurs an additional penalty. Therefore, during iteration, the nearest previous "reasonable value" must be used: if the ith iteration has no solution for a certain number of fast departure time periods, the solution for that number from the (i-1)th iteration is used instead; if the (i-1)th iteration has none either, the search continues backward through the (i-2)th iteration and so on, until the first iteration is reached.
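The backward search for the nearest "reasonable value" can be sketched as follows. The data layout, a dictionary per iteration that maps a number of fast departure time periods to the best total waiting time found, is an assumption made for illustration, as is the function name.

```python
from typing import Dict, Optional


def latest_solution(history: Dict[int, Dict[int, float]],
                    x: int, i: int) -> Optional[float]:
    """Best total waiting time for x fast periods, searching iterations i, i-1, ..., 1.

    history -- history[r][x] is the minimum total waiting time found in
               iteration r for x fast departure time periods, if any
    Returns None when no iteration up to i has a solution for x.
    """
    for r in range(i, 0, -1):
        found = history.get(r, {})
        if x in found:
            return found[x]
    return None
```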
Based on the above analysis, as shown in fig. 4, the departure scheduling method of the present embodiment includes the following steps:
S401: initializing a current value network, a target value network and hyper-parameters, and dividing the operation time of the trains of a line into a preset number of departure time periods.
The hyper-parameters comprise the number of iteration rounds, the number of simulations per iteration round (epoch), the memory bank capacity, the discount rate γ, the initial value of the exploration rate, and the like. In this embodiment, the number of iteration rounds is 3, epoch = 2250, and the initial value of the exploration rate is $\varepsilon_1 = 1$.
In this embodiment, the operation time of the train on the line is divided into departure time periods of 30 min each. For example, if the operation time is from 6:00 to 24:00 each day, it is divided into 36 departure time periods.
S402: and obtaining passenger trip data of one historical day and the capacity of the train of the line, wherein the passenger trip data comprises the inbound time, the inbound station ID, the outbound station ID and the transit station ID of each passenger.
The ID of the transit station is determined by a shortest path algorithm (such as dijkstra algorithm) according to the inbound station and the outbound station, that is, in the subway line map, each station is taken as a node, the subway road is taken as a side, the length of the subway road is taken as a weight of the side, then the inbound station is taken as a starting point, the outbound station is taken as a terminal point, and the shortest path is calculated by the shortest path algorithm.
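The shortest path computation can be sketched with a standard heap-based Dijkstra implementation. The graph layout (an adjacency dictionary of station name to a list of neighbor and weight pairs) and the function name are assumptions for illustration; deriving the transit station from the path, for example as the station where the path changes lines, is left out.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Return (total weight, list of stations) of the shortest path from src to dst."""
    inf = float("inf")
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, inf):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return inf, []
    # Walk back from the destination to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]
```

On a small network where A to B and B to C each weigh 2 and the direct section A to C weighs 5, the function returns the route through B with a total weight of 4.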
After the passenger travel data are obtained, preliminary screening can be performed first: the passenger travel data whose inbound station ID, outbound station ID or transit station ID matches a station ID on the line are selected, namely the passenger travel data related to the line.
In this embodiment, the travel data of passengers on a certain historical day is acquired, and then the data is used to perform passenger flow simulation for the operation time of a complete day, that is, one simulation is considered to be performed.
S403: and carrying out passenger flow simulation of the total number of times of preset simulation according to the passenger travel data and the capacity of the train on the line, acquiring sample data in the process of each simulation, storing the sample data into a memory bank, and obtaining the number of the quick departure time periods, the total waiting time and the current value network corresponding to the current simulation after each simulation is finished.
In this embodiment, the total number of simulations = number of iterations × number of simulations per iteration =3 × 2250, that is, three iterations are performed, and 2250 simulations are performed per iteration.
Since one simulation is a simulation performed for one day, after each simulation is finished, the number of fast departure time periods (i.e., the number of departure time periods during which departure is performed at a preset fast departure frequency in one day), the total waiting time (i.e., the total waiting time of all passengers on the route in one day), and the current value network (i.e., the latest current value network when the current simulation is finished) corresponding to each simulation are obtained.
In this embodiment, the number of times of simulation is counted, and the number of times of simulation is increased by one each time of simulation is performed.
Specifically, as shown in fig. 5, the process of performing a passenger flow simulation includes the following steps:
S501: and initializing a simulation environment, and taking the first departure time period as the current departure time period.
S502: and according to the action of the current departure time period, passenger travel data, capacity and preset unit time, carrying out passenger flow simulation of the current departure time period, acquiring state data when the current departure time period is ended as the state data of the current departure time period, and meanwhile, counting the total waiting time of the current departure time period according to the number of the passengers waiting at each station on the line in each unit time in the current departure time period.
Wherein, the action of the first departure time period is preset. The actions comprise two types, departing at a preset fast departure frequency (hereinafter referred to as fast departure) and departing at a preset slow departure frequency (hereinafter referred to as slow departure), and each departure time period corresponds to one action. The state data includes the number of people in each station on the line, the positions of the dispatched trains and the number of people on the dispatched trains.
S503: generating a random number, wherein the range of the random number is 0-1.
S504: and judging whether the random number is smaller than the exploration rate corresponding to the current simulation, if so, executing step S505, and if not, executing step S506.
Wherein, the exploration rate corresponding to the first simulation is the initial value of the exploration rate, $\varepsilon_1 = 1$, and the exploration rate of each subsequent simulation is determined from the exploration rate of the previous simulation and a preset minimum exploration rate. Specifically, the exploration rate update formula is $\varepsilon_i = \max(\varepsilon_{\min}, \varepsilon_{i-1} - 0.0045)$, wherein $\varepsilon_i$ is the exploration rate corresponding to the i-th simulation, $\varepsilon_{i-1}$ is the exploration rate corresponding to the (i-1)-th simulation, and $\varepsilon_{\min}$ is the preset minimum exploration rate; in this embodiment, $\varepsilon_{\min} = 0.1$.
S505: and randomly generating the action of the next departure time period of the current departure time period.
S506: and determining the action of the next departure time period of the current departure time period according to the latest current value network.
Specifically, the state data s of the current departure time period and an action a in a preset action set A are input into the current value network, and the current value network outputs the score Q(s, a) of that state data and action. After the scores of all actions in the action set are obtained through the latest current value network in the current simulation, the action with the maximum score is taken as the action of the next departure time period, namely $\arg\max_{a \in A} Q(s, a; \theta_{evel})$, wherein $\theta_{evel}$ represents the network parameters of the latest current value network.
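Steps S503 to S506 amount to an ε-greedy action selection, which can be sketched as follows. The callable `q_value` stands in for the current value network and is hypothetical, as are the function names. The sketch further assumes that the exploration rate decreases by a fixed step per simulation until it reaches the minimum exploration rate.

```python
import random

SLOW, FAST = 0, 1
ACTIONS = (SLOW, FAST)


def next_epsilon(eps_prev, eps_min=0.1, step=0.0045):
    """Assumed decay: lower the exploration rate by a fixed step, never below eps_min."""
    return max(eps_min, eps_prev - step)


def choose_action(state, q_value, epsilon, rng=random):
    """S503-S506: explore with probability epsilon, otherwise take argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # S505: random action
    return max(ACTIONS, key=lambda a: q_value(state, a))  # S506: greedy action


# Hypothetical scorer that always prefers fast departure.
greedy_action = choose_action(None, lambda s, a: float(a), epsilon=0.0)
```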
S507: and according to the action of the next departure time period, the passenger travel data, the capacity and the preset unit time, carrying out passenger flow simulation of the next departure time period, acquiring state data when the next departure time period is ended as the state data of the next departure time period, and meanwhile, counting the total waiting time of the next departure time period according to the number of the waiting passengers at each station on the line in each unit time in the next departure time period.
S508: and calculating a return value according to the total waiting time of the current departure time period, the action of the next departure time period and the penalty term function corresponding to the current round of iteration. That is, the return value r is calculated for the tuple (s, a, s'), wherein s is the state data of the current departure time period, a represents the action of the next departure time period, and s' is the state data of the next departure time period.
Specifically, the return value calculation formula is $r = -C_t^k - a\,(f_j(x) - f_j(x-1))$, wherein r is the return value and $C_t^k$ is the total waiting time of the current departure time period; a represents the action of the next departure time period, and in this embodiment, a = 1 if the action of the next departure time period is fast departure, and a = 0 if the action of the next departure time period is slow departure.
$f_j(x)$ is the penalty term function corresponding to the j-th round of iteration. The initial penalty term function is $f_1(x) = x \cdot M_0$, wherein x represents the number of fast departure time periods and $M_0$ is a preset penalty term for a single fast departure. Thereafter, the penalty term function of each round of iteration is updated from that of the previous round according to the update formula $f_{j+1}(x) = K_{new} \times \mathrm{Smooth}(C_{best,j}(x)) + K_{old} \times f_j(x)$, wherein Smooth() is a smoothing function, $C_{best,j}(x)$ is a function obtained by recording the minimum total waiting time under the optimal departure model with x fast departure time periods in the j-th round of iteration, and $K_{new}$ and $K_{old}$ are preset adjustment parameters for controlling the update step size of $f_j(x)$ between iterations.
Therefore, the formula for calculating the return value in the first round of iteration is $r = -C_t^k - a \cdot M_0$, i.e., each additional fast departure time period is expected to reduce the total waiting time by at least $M_0$ minutes.
Further, before this step is carried out, or even before the passenger flow simulation of the preset total number of simulations is carried out, one fast departure simulation (namely, the actions of all departure time periods in one day are fast departure) and one slow departure simulation (namely, the actions of all departure time periods in one day are slow departure) are carried out to obtain the theoretical shortest waiting time and the theoretical longest waiting time respectively. The difference between the theoretical longest waiting time and the theoretical shortest waiting time is then divided by the preset total number of departure time periods of one day (36 in this embodiment) to obtain the penalty term for a single fast departure, i.e., $M_0$ = (theoretical longest waiting time - theoretical shortest waiting time) / total number of departure time periods of one day.
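The return value of the first round of iteration can be illustrated with the following sketch. The waiting-time figures are hypothetical, and treating x as the running count of fast departure time periods within a simulation is an assumption about how x is tracked; the function names are likewise hypothetical.

```python
def single_fast_penalty(longest_wait, shortest_wait, num_periods=36):
    """M_0 = (theoretical longest wait - theoretical shortest wait) / periods per day."""
    return (longest_wait - shortest_wait) / num_periods


def initial_penalty(m0):
    """f_1(x) = x * M_0."""
    return lambda x: x * m0


def reward(wait_current, action, x, penalty):
    """r = -C_t^k - a * (f_j(x) - f_j(x - 1)), with a = 1 for fast and 0 for slow."""
    return -wait_current - action * (penalty(x) - penalty(x - 1))


# Hypothetical all-slow and all-fast daily waiting times, in minutes.
m0 = single_fast_penalty(longest_wait=90000.0, shortest_wait=54000.0)
f1 = initial_penalty(m0)
r_fast = reward(wait_current=500.0, action=1, x=5, penalty=f1)
r_slow = reward(wait_current=500.0, action=0, x=5, penalty=f1)
```

In the first round the penalty difference reduces to $M_0$, so a fast departure is only worthwhile if it saves more waiting time than $M_0$.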
S509: and generating sample data (s, a, r, s') according to the state data of the current departure time period, the action of the next departure time period, the return value and the state data of the next departure time period, and storing the sample data in a memory.
Further, if the storage capacity of the memory bank reaches the preset memory bank capacity, namely the memory bank is full, the sample data stored in the memory bank at the earliest time is deleted.
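A first-in, first-out memory bank of this kind can be sketched with a bounded deque. The class name `Memory` and its methods are hypothetical names used for illustration only.

```python
import random
from collections import deque


class Memory:
    """Fixed-capacity memory bank; the earliest sample is dropped when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        # Appending to a full deque discards the oldest entry automatically.
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size, rng=random):
        """Randomly select a batch of stored samples without replacement."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = Memory(capacity=3)
for i in range(4):
    memory.store(i, 0, 0.0, i + 1)  # the sample with s == 0 is evicted
```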
S510: and judging whether the next departure time period is the last departure time period of the operation time, if so, indicating that passenger flow simulation of all departure time periods in one day is finished, executing step S512, and if not, executing step S511.
S511: and taking the next departure time period as the current departure time period, taking the state data of the next departure time period as the state data of the current departure time period, taking the total waiting time of the next departure time period as the total waiting time of the current departure time period, and then returning to execute the step S503.
S512: counting the number of departure time periods of which the actions in the simulation are departure at a quick departure frequency to obtain the number of the quick departure time periods corresponding to the simulation, and calculating the total waiting time corresponding to the simulation according to the total waiting time of each departure time period of the operation time in the simulation; and meanwhile, taking the latest current value network when the simulation is finished as the current value network corresponding to the simulation.
Wherein, assuming that the total waiting time of the k-th departure time period in the current simulation is $C_t^k$, $k \in [1, z]$, $k \in \mathbb{N}^*$, where z is the total number of departure time periods of one day (z = 36 in this embodiment), the total waiting times of the 36 departure time periods in the current simulation are accumulated to obtain the total waiting time corresponding to the current simulation.
Further, as for steps S502 and S507, as shown in fig. 6, the process of performing the passenger flow simulation for one departure time period includes the following steps:
S601: and taking the first unit time of the current departure time period as the current unit time.
In this embodiment, the unit time is 1min, and one departure time period is 30min, so that 30 unit times are included in one departure time period.
S602: and respectively judging, according to the action of the current departure time period and preset train operation data, whether a train arrives at each station on the line in the current unit time; if so (that is, a train arrives at some stations while no train arrives at the others), executing step S603 and then step S604; if not (that is, no train arrives at any station on the line), executing step S604.
According to the action of the current departure time period, the specific departure times within the current departure time period can be determined. For example, if the fast departure frequency is 4 min/train (i.e., one train departs every 4 minutes, a departure interval of 4 min), the slow departure frequency is 8 min/train (i.e., one train departs every 8 minutes, a departure interval of 8 min), and the action of the current departure time period is slow departure, then one train departs at each of the 0th, 8th, 16th and 24th minutes of the current departure time period.
Further, in this embodiment, a variable $t_p$ records the time elapsed since the last departure, and whether to dispatch a train is determined by comparing $t_p$ with the departure interval of the current departure time period. For example, suppose the action of the previous departure time period is slow departure, with departures at the 0th, 8th, 16th and 24th minutes of that period, so that $t_p$ = 6 min at the start of the current departure time period. If the action of the current departure time period is slow departure, the first departure of the current departure time period takes place at its 2nd minute; if the action of the current departure time period is fast departure, $t_p$ = 6 min exceeds the 4 min departure interval of the current departure time period, so a train can depart immediately, that is, the first departure of the current departure time period takes place at its 0th minute.
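The bookkeeping of $t_p$ across departure time periods can be sketched as follows. The function name `departures_in_period` is hypothetical, and a train is assumed to depart as soon as $t_p$ is at least the departure interval.

```python
def departures_in_period(interval, t_p, period_len=30):
    """Return (departure minutes within the period, t_p at the end of the period).

    interval: departure interval of the period in minutes (4 fast, 8 slow).
    t_p: minutes elapsed since the last departure at the start of the period.
    """
    minutes = []
    for minute in range(period_len):
        if t_p >= interval:
            minutes.append(minute)
            t_p = 0
        t_p += 1
    return minutes, t_p


# Previous period: slow departure, with the first train leaving at minute 0.
prev_minutes, t_p_end = departures_in_period(interval=8, t_p=8)
# Current period, starting with the carried-over t_p.
slow_minutes, _ = departures_in_period(interval=8, t_p=t_p_end)
fast_minutes, _ = departures_in_period(interval=4, t_p=t_p_end)
```

Here `prev_minutes` is `[0, 8, 16, 24]` and `t_p_end` is 6, so a slow current period first departs at minute 2 and a fast current period at minute 0, as in the example above.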
The trains on the same line generally run in the same manner, so the number of minutes after dispatch at which a train arrives at each station is known from the train operation data; combined with the departure times obtained above, it can be determined at which stations a train arrives in each unit time.
S603: according to the passenger travel data and the train capacity, people flow interactive processing is respectively carried out on each station where the train arrives, the number of passengers (excluding passengers who get off and do not transfer) in each station where the train arrives and the number of passengers in the train which is sent out are respectively updated according to the results of the people flow interactive processing, and meanwhile, the number of passengers waiting for each station where the train arrives at the station in the current unit time is respectively counted.
Specifically, if a train arrives at a station on the line in the current unit time, people flow interaction processing is performed on the station according to the passenger trip data and the capacity of the train. The people flow interaction processing includes the alighting of arriving passengers and the boarding of in-station passengers. The arriving passengers are passengers on board (i.e., passengers on a train already dispatched on the line) whose outbound station ID or transit station ID is the station. The in-station passengers include inbound passengers (i.e., passengers who have just entered the station) and transfer passengers (i.e., passengers transferring to the line from other lines). The inbound passengers are passengers whose inbound time is earlier than the current unit time, whose inbound station ID is the station, and who have not yet boarded. The transfer passengers are passengers whose transit station ID is the station, who are transferring to the line, whose time since arriving at the transit station exceeds the preset transfer time, and who have not yet boarded.
In this embodiment, since the card-swiping gates of most subway stations are close to the boarding positions, the time passengers take to walk from the gates to the platform is not considered in the simulation system, that is, passengers are assumed to start waiting as soon as they swipe their cards to enter. However, due to the layout of subway lines, passengers often need a certain time to board again after getting off at a transfer station. To account for this transfer time, passengers transferring within a station are designed to remain in a transfer state for a time $t_1$; passengers in the transfer state cannot immediately board the next train, that is, they are not counted in the number of waiting passengers. In this embodiment, $t_1$ = 1 min, and this parameter can be adjusted.
And after people stream interaction processing is carried out, updating the number of passengers in the train which has sent out and arrives at the station in the current unit time and updating the number of passengers in the station at the station according to the number of passengers arriving at the station and the number of passengers in the station getting on the train.
S604: and respectively updating the number of people in each station without train arrival in the current unit time according to the passenger trip data, and respectively counting the number of people waiting for the trains in each station without train arrival in the current unit time.
Specifically, if no train arrives at a station on the line within the current unit time, the number of passengers entering the station within the current unit time is determined according to the passenger trip data and the station ID of the station, the number of people in the station is updated according to the number of passengers entering the station, and meanwhile the number of waiting passengers at the station within the current unit time is counted.
S605: and counting the total number of waiting passengers in the current unit time according to the number of waiting passengers at each station on the line in the current unit time.
S606: and judging whether the current unit time is the last unit time of the current departure time period, if so, indicating that the passenger flow simulation of the current departure time period is finished, executing step S607, otherwise, executing step S608.
S607: and calculating to obtain the total waiting time of the current departure time period according to the total number of waiting passengers in each unit time and the duration of the unit time in the current departure time period.
Specifically, the total waiting time $C_t^k$ of the k-th departure time period is calculated as follows:
$C_t^k = \sum_{i=1}^{T_0/t_0} N_i \, t_0$
wherein $t_0$ is the duration of a unit time, $T_0$ is the duration of a departure time period (in this embodiment, $t_0$ = 1 min and $T_0$ = 30 min), and $N_i$ is the total number of waiting passengers in the i-th unit time of the k-th departure time period.
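As a small numerical illustration of steps S605 and S607, the sketch below accumulates the per-unit-time waiting counts into the total waiting time of one departure time period. The counts are hypothetical and the function name is an assumption.

```python
def period_waiting_time(waiting_counts, unit_min=1):
    """Sum, over the unit times of a period, waiting passengers times unit duration."""
    return sum(n * unit_min for n in waiting_counts)


# Hypothetical counts: 10 passengers waiting in each of the 30 unit times.
counts = [10] * 30
total_wait = period_waiting_time(counts)  # in passenger-minutes
```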
S608: the next unit time is taken as the current unit time, and the process returns to step S602.
S404: when new sample data is stored in the memory base and the number of the sample data in the memory base reaches a preset number threshold value, randomly selecting a batch of sample data from the memory base according to a preset batch size, and training to obtain a latest current value network according to the batch of sample data.
Specifically, as shown in fig. 7, the present step includes the steps of:
S701: when new sample data is stored in the memory bank and the number of sample data in the memory bank reaches a preset number threshold (for example, 360), randomly selecting sample data of a preset batch size from the memory bank as the current batch of sample data, and taking the latest current value network as the current value network to be trained.
In this embodiment, the preset batch size is 64, that is, 64 sample data are randomly selected as the current batch of sample data.
S702: and acquiring the p-th sample data from the current batch of sample data as the current sample data, wherein the initial value of p is 1.
S703: calculating, through the latest current value network to be trained, the score $Q_{evel}(s, a)$ corresponding to the state data of the current departure time period and the action of the next departure time period in the current sample data, as the first score corresponding to the current sample data.
S704: respectively calculating, through the latest target value network, the score of the state data of the next departure time period in the current sample data for each action, and taking the maximum score $\max_{a' \in A} Q(s', a')$ as the second score corresponding to the current sample data.
Where A is a predetermined set of actions, and in this embodiment, A is {0,1}.
S705: and calculating a loss value according to a return value in the current sample data, a first score and a second score corresponding to the current sample data and a preset discount rate, and updating the latest network parameters of the current value network to be trained through a back propagation algorithm according to the loss value.
Specifically, the loss value is calculated according to a loss function, which is:
$Loss = (Q_{target}(s, a) - Q_{evel}(s, a))^2$, $Q_{target}(s, a) = r + \gamma \times \max_{a' \in A} Q(s', a')$
wherein Loss is the loss value, $Q_{evel}(s, a)$ is the first score corresponding to the current sample data, r is the return value in the current sample data, γ is the preset discount rate, and $\max_{a' \in A} Q(s', a')$ is the second score corresponding to the current sample data.
And after the loss value is obtained through calculation, updating the latest network parameters of the current value network to be trained through a back propagation algorithm according to the loss value, wherein the updated current value network to be trained is the latest current value network.
S706: and judging whether the current batch of sample data has been traversed, namely whether p is equal to the preset batch size; if so, executing step S708, and otherwise, executing step S707.
S707: and acquiring next sample data from the current batch of sample data as the current sample data, namely making p = p +1, and returning to execute the step S702.
S708: and taking the latest current value network to be trained as the latest current value network.
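Steps S702 to S708 can be sketched as below. To keep the example self-contained, a linear score function replaces the neural networks and a plain gradient step replaces back propagation; these substitutions, the function names, and the values of the discount rate and learning rate are assumptions made for illustration.

```python
def q_value(weights, state, action):
    """Linear stand-in for a value network: Q(s, a) = w_a . s."""
    return sum(w * x for w, x in zip(weights[action], state))


def train_on_batch(eval_w, target_w, batch, gamma=0.9, lr=0.01, actions=(0, 1)):
    """Traverse the batch sample by sample and update eval_w in place.

    Returns the mean loss (Q_target - Q_evel)^2 over the batch.
    """
    total_loss = 0.0
    for s, a, r, s_next in batch:
        # Second score: best action score of the next state under the target network.
        q_target = r + gamma * max(q_value(target_w, s_next, b) for b in actions)
        # First score comes from the current value network being trained.
        td_error = q_target - q_value(eval_w, s, a)
        total_loss += td_error ** 2
        # Gradient step on the squared error with respect to the weights of action a.
        eval_w[a] = [w + lr * 2 * td_error * x for w, x in zip(eval_w[a], s)]
    return total_loss / len(batch)


eval_w = {0: [0.0, 0.0], 1: [0.0, 0.0]}
target_w = {0: [0.0, 0.0], 1: [0.0, 0.0]}
batch = [((1.0, 0.0), 1, -2.0, (0.0, 1.0))]  # one hypothetical sample (s, a, r, s')
mean_loss = train_on_batch(eval_w, target_w, batch)
```

Only the current value weights change during this step; the target weights stay fixed until they are next replaced.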
That is, in the process of performing the passenger flow simulation, the training process is performed synchronously, and each time the training of a batch of sample data is performed, a latest current value network is obtained, and each time a latest current value network is obtained, the latest current value network is applied to the subsequent passenger flow simulation process.
S405: and when the simulation times reach integral multiples of the preset first times, updating the network parameters of the target value network according to the latest network parameters of the current value network.
In this embodiment, the preset first number is 720, that is, each time 720 passenger flow simulations have been performed, the network parameters of the target value network are replaced by the network parameters of the current value network. The purpose of the replacement is to allow the neural network to learn further. Until the next replacement occurs, the network parameter θ' of the target value network is fixed and unchanged, and only the network parameter θ of the current value network is changed by the training of step S404.
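The periodic replacement can be sketched as follows, with network parameters represented as a plain dictionary; the function name and the parameter values are hypothetical.

```python
import copy


def sync_target_if_due(sim_count, eval_params, target_params, first_number=720):
    """Copy the current value parameters into the target network every first_number simulations."""
    if sim_count % first_number == 0:
        return copy.deepcopy(eval_params), True
    return target_params, False


eval_params, target_params = {"w": [0.5]}, {"w": [0.0]}
sync_count = 0
for sim_count in range(1, 3 * 2250 + 1):  # total number of simulations in this embodiment
    target_params, synced = sync_target_if_due(sim_count, eval_params, target_params)
    sync_count += synced
```

With a total of 6750 simulations, the replacement occurs 9 times, the last at the 6480th simulation.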
S406: and when the simulation times reach the preset total simulation times, determining the optimal current value network corresponding to each quick departure time period according to the quick departure time period number, the total waiting time and the current value network corresponding to each simulation.
Specifically, the total waiting time corresponding to each simulation of the same fast departure time period number is compared, and the current value network corresponding to the simulation of the minimum total waiting time is used as the optimal current value network corresponding to the same fast departure time period number. That is, for each fast departure time period number, a corresponding optimal current value network can be obtained, and in the embodiment, there are 36 fast departure time period numbers, and therefore there are 36 optimal current value networks.
Further, each iteration is carried out, the optimal current value network corresponding to each fast departure time period number in the current iteration is determined according to the fast departure time period number, the total waiting time and the current value network corresponding to each simulation in the current iteration, and then the optimal current value network is compared with the optimal solutions of the previous iterations, so that the optimal solution in the iterated round is determined.
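The selection in step S406 can be sketched as a single pass over the per-simulation records. The record layout, the network labels and the waiting times below are hypothetical.

```python
def best_networks(records):
    """Keep, for each fast departure period count, the network with the least total wait.

    records: iterable of (fast_count, total_wait, network) tuples, one per simulation.
    Returns {fast_count: (total_wait, network)}.
    """
    best = {}
    for fast_count, total_wait, network in records:
        if fast_count not in best or total_wait < best[fast_count][0]:
            best[fast_count] = (total_wait, network)
    return best


records = [
    (8, 61000.0, "net_a"),
    (8, 59500.0, "net_b"),
    (9, 58000.0, "net_c"),
]
best = best_networks(records)
```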
Fig. 8-10 show the optimal solution search process of the DQN algorithm for different numbers of fast departure time periods during learning: fig. 8 is a schematic diagram of the training results of the reinforcement learning model when the number of fast departure time periods is 1-6, fig. 9 when it is 7-14, and fig. 10 when it is 15-24. In fig. 8-10, the abscissa is the number of simulations and the ordinate is the negative of the total waiting time. Due to the limited number of simulations in a single round of iteration and the preference of the DQN algorithm, in the illustrated experiment the DQN algorithm does not try strategies with too many fast departure time periods, because when the number of fast departure time periods is already large, further increasing it brings little benefit. After training, the schematic diagram of the optimal solutions for different numbers of fast departure time periods shown in fig. 11 is obtained. In fig. 11, the abscissa is the number of fast departure time periods, the ordinate is the negative of the total waiting time, and each point through which the thickened black line passes is the optimal model under the current conditions for the number of fast departure time periods given by its abscissa. The black star at abscissa 8 is the result of departing according to the official schedule.
S407: and selecting the optimal current value network corresponding to the quantity of the time periods of quick departure according to the requirements.
S408: and acquiring the state data of a departure time period of the line, and determining the action of the next departure time period through the selected optimal current value network according to that state data.
Specifically, an optimal current value network is selected according to actual requirements. For example, assuming that the actual requirement is that the line to be analyzed has 8 fast departure time periods within the operation time, the optimal current value network corresponding to the number of fast departure time periods x = 8 is selected. Further, the network parameters of the selected optimal current value network are loaded into the target value network. Then, according to actual conditions, the state data of the line at the end of the current departure time period is obtained, the state data and each action in the action set are respectively input into the loaded target value network to obtain the score corresponding to each action, and the action corresponding to the maximum score is used as the departure suggestion for the next departure time period.
That is to say, for a line to be analyzed, the number of the fast departure time periods of the line can be determined according to the actual requirements of a subway operator, then the optimal current value network is selected according to the determined number of the fast departure time periods, then the state data of the first departure time period of the current day on the line is obtained, the action of the second departure time period of the current day is determined through the selected optimal current value network, the state data of the second departure time period is obtained when the second departure time period is finished, then the action of the third departure time period of the current day is determined through the selected optimal current value network, and the like.
Existing methods (such as genetic algorithms) can only calculate the optimal solution for the previous day according to the subway passenger flow of the previous day and, assuming that the passenger flow of the current day is approximately the same as that of the previous day, apply the obtained result to the current day. Such traditional algorithms therefore ignore many passenger flow characteristics of the current day and cannot make dynamic adjustments in time. During training, the DQN algorithm gives the agent the environment state at the current moment, and the agent selects the departure action for the next time period. This means that when the algorithm is deployed in actual production, the departure decision for the next time period can be given in real time as long as the states of the trains and subway stations at the current time are provided.
In summary, the present embodiment has the following advantages:
1. the method can dynamically adjust the departure mode in real time according to the current subway pedestrian flow condition, and effectively reduce the rail traffic pressure.
2. The method has excellent self-adaptive capacity to unexpected situations such as abnormal people flow. In the early/late time abnormal traffic test, the scheme of the embodiment reduces the waiting time of passengers by 43.37% (751723 min) and 21.96% (319761 min) respectively relative to the static schedule.
3. The method generated a static schedule with 8 fast departure time periods for comparison with the official schedule; the total waiting time was reduced by about 2% (about 12663 min).
4. Compared with some traditional algorithms, such as the genetic algorithm, the static schedules generated on several sub-datasets almost reach the optimal solution on small test samples that can be exhaustively enumerated, while the training time increases only slightly. In addition, after training is completed, the obtained model only needs to perform inference, which can generally be completed in about 1 minute, so the method can meet the departure requirements of an actual subway.
5. Different from the traditional static method, the method explores a larger solution space, and can generate optimal models with different quick departure quantities along with the learning process of the intelligent agent in the training process, so that subway workers can select proper models to deploy according to actual conditions.
Example two
Referring to fig. 2, the second embodiment of the present invention is a departure scheduling device, which can execute the departure scheduling method provided by the embodiment of the invention and has the functional modules and beneficial effects corresponding to the method. The device can be implemented by software and/or hardware, and specifically comprises:
an initialization module 201, configured to initialize a current value network and a target value network, and divide the operation time of a train on a line into a preset number of departure time periods;
the obtaining module 202 is configured to obtain passenger trip data of a historical day and a capacity of a train of the one route, where the passenger trip data includes inbound time, inbound station ID, outbound station ID, and transit station ID of each passenger, and the transit station ID is determined by a shortest path algorithm according to the inbound station and the outbound station;
the simulation module 203 is used for performing passenger flow simulation for a preset total number of simulations according to the passenger travel data and the capacity of the trains on the line, acquiring sample data during each simulation and storing the sample data in a memory bank, and, after each simulation is finished, counting the number of fast departure time periods and the total waiting time corresponding to the current simulation and determining the current value network corresponding to the current simulation, wherein one simulation is the passenger flow simulation of the operation time of one day; each sample data comprises the state data of a departure time period, the action of the next departure time period, the state data of the next departure time period, and a return value; the state data comprises the number of people in each station on the line, the positions of the dispatched trains and the number of people on the dispatched trains; the action is departing at a preset fast departure frequency or departing at a preset slow departure frequency; and the action of the next departure time period is determined according to the latest current value network;
a training module 204, configured to randomly select a batch of sample data from a memory according to a preset batch size when new sample data is stored in the memory and the number of the sample data in the memory reaches a preset number threshold, train a latest current value network according to the batch of sample data, and use the trained current value network as the latest current value network;
the first determining module 205 is configured to determine, when the simulation times reach a preset total simulation times, an optimal current value network corresponding to each number of fast departure time periods according to the number of fast departure time periods, the total waiting time, and the current value network corresponding to each simulation;
a selecting module 206, configured to select, according to requirements, the optimal current value network corresponding to one number of fast departure time periods;
the second determining module 207 is configured to obtain the state data of a departure time period of the one line, and determine the action of the next departure time period of that departure time period through the selected optimal current value network according to the state data of that departure time period.
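As a non-limiting illustration of how the transit station ID handled by the obtaining module 202 could be derived with a shortest path algorithm, the following is a minimal sketch. It assumes an unweighted station graph whose edges are labeled with a line name, and it assumes that the transit station is the first station on the shortest path at which the line label changes. The network `NETWORK`, the station and line names, and the functions `shortest_path` and `find_transit_station` are hypothetical and do not appear in the embodiments.

```python
from collections import deque

# Hypothetical network: station -> list of (neighbor, line) pairs.
NETWORK = {
    "A": [("B", "L1")],
    "B": [("A", "L1"), ("C", "L1"), ("D", "L2")],
    "C": [("B", "L1")],
    "D": [("B", "L2"), ("E", "L2")],
    "E": [("D", "L2")],
}


def shortest_path(network, src, dst):
    """Breadth-first search over an unweighted graph.

    Returns a list of (station, line used to reach it) pairs, or None
    when dst cannot be reached from src.
    """
    parents = {src: None}
    queue = deque([src])
    while queue:
        station = queue.popleft()
        if station == dst:
            break
        for neighbor, line in network[station]:
            if neighbor not in parents:
                parents[neighbor] = (station, line)
                queue.append(neighbor)
    if dst not in parents:
        return None
    path, node = [], dst
    while parents[node] is not None:
        previous, line = parents[node]
        path.append((node, line))
        node = previous
    path.append((src, None))
    return path[::-1]


def find_transit_station(network, inbound, outbound):
    """Return the first station where the line changes, or None (assumption)."""
    path = shortest_path(network, inbound, outbound)
    if path is None:
        return None
    # path[0] is the origin; compare the lines of consecutive hops.
    for k in range(1, len(path) - 1):
        if path[k][1] != path[k + 1][1]:
            return path[k][0]
    return None
```

Under these assumptions, a passenger travelling from "A" to "E" transfers at "B", while a passenger travelling from "A" to "C" stays on one line and has no transit station.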
In an alternative embodiment, the simulation module 203 includes:
the first presetting unit is used for presetting the action of a first departure time period of the operation time in the ith simulation, taking the first departure time period as the current departure time period, and setting the initial value of i as 1;
the first simulation unit is used for simulating passenger flow in the current departure time period according to the action, passenger travel data, capacity and preset unit time of the current departure time period, acquiring state data when the current departure time period is ended as the state data of the current departure time period, and meanwhile, counting the total waiting time of the current departure time period according to the number of waiting passengers at each station on the line in each unit time in the current departure time period;
the first generation unit is used for generating a random number in the range of 0 to 1;
the second generation unit is used for randomly generating the action of the next departure time period of the current departure time period if the random number is smaller than the exploration rate corresponding to the ith simulation;
the first determining unit is used for determining the action of the next departure time period of the current departure time period according to the latest current value network if the random number is greater than or equal to the exploration rate corresponding to the ith simulation;
the second simulation unit is used for simulating passenger flow of the next departure time period according to the action of the next departure time period, passenger travel data, capacity and preset unit time, acquiring state data when the next departure time period is finished as the state data of the next departure time period, and meanwhile, counting the total waiting time of the next departure time period according to the number of waiting passengers at each station on the line in each unit time in the next departure time period;
the first calculation unit is used for calculating a return value according to the total waiting time of the current departure time period, the action of the next departure time period and the penalty term function corresponding to the jth iteration, wherein $j = \lceil i/\mathrm{epoch} \rceil$ and epoch is the preset number of simulations in each iteration;
a third generating unit, configured to generate one sample data according to the state data of the current departure time period, the action of the next departure time period, the return value, and the state data of the next departure time period, and store the sample data in the memory bank;
the first judgment unit is used for judging whether the next departure time period is the last departure time period of the operation time or not to obtain a first judgment result;
the first execution unit is used for taking the next departure time period as the current departure time period and returning to execute the first generation unit if the first judgment result is negative;
a first obtaining unit, configured to, if the first judgment result is yes, count the number of departure time periods in the ith simulation in which departure is performed at the fast departure frequency to obtain the number of fast departure time periods corresponding to the ith simulation, calculate the total waiting time corresponding to the ith simulation according to the total waiting time of each departure time period of the operation time in the ith simulation, and take the current latest current value network as the current value network corresponding to the ith simulation;
the second judgment unit is used for judging whether i is equal to the preset total simulation times or not to obtain a second judgment result;
and the second determining unit is used for determining, if the second judgment result is negative, the exploration rate corresponding to the (i+1)th simulation according to the exploration rate corresponding to the ith simulation and the preset minimum exploration rate, wherein the exploration rate corresponding to the first simulation is a preset initial exploration rate, setting i = i + 1, and returning to execute the first presetting unit.
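As a non-limiting illustration, the per-day loop carried out by the units of the simulation module 203 can be sketched as follows. This is a minimal sketch under stated assumptions: the passenger flow simulation, the value network and the return value calculation are passed in as hypothetical callables (`simulate_period`, `choose_greedy`, `reward_fn`), the actions are encoded as `FAST = 1` and `SLOW = 0`, and the names `Sample` and `run_one_simulation` are introduced only for this example.

```python
import random
from typing import NamedTuple

FAST, SLOW = 1, 0  # hypothetical action encoding


class Sample(NamedTuple):
    state: tuple       # state data of the current departure time period
    action: int        # action of the next departure time period
    reward: float      # return value
    next_state: tuple  # state data of the next departure time period


def run_one_simulation(num_periods, first_action, epsilon,
                       simulate_period, choose_greedy, reward_fn, rng):
    """Simulate one operating day; return (samples, fast_count, total_wait)."""
    samples = []
    state, wait = simulate_period(0, first_action)  # first departure time period
    fast_count = 1 if first_action == FAST else 0
    total_wait = wait
    for period in range(1, num_periods):
        # Explore when the random number is below the exploration rate,
        # otherwise follow the latest current value network.
        if rng.random() < epsilon:
            next_action = rng.choice((FAST, SLOW))
        else:
            next_action = choose_greedy(state)
        next_state, next_wait = simulate_period(period, next_action)
        reward = reward_fn(wait, next_action)
        samples.append(Sample(state, next_action, reward, next_state))
        fast_count += 1 if next_action == FAST else 0
        total_wait += next_wait
        state, wait = next_state, next_wait  # the next period becomes current
    return samples, fast_count, total_wait


# Toy stand-ins for the simulator and the value network (assumptions).
def toy_period(period, action):
    wait = 4.0 if action == FAST else 10.0
    return (period, action), wait


samples, fast_count, total_wait = run_one_simulation(
    num_periods=4, first_action=SLOW, epsilon=0.0,
    simulate_period=toy_period,
    choose_greedy=lambda state: FAST,
    reward_fn=lambda wait, action: -wait - 2.0 * action,
    rng=random.Random(0),
)
```

With four departure time periods the loop produces three samples, one per transition from a current departure time period to the next one, which matches the termination check performed by the first judgment unit.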
In an optional embodiment, the first simulation unit includes:
the first assigning subunit, which is used for taking the first unit time of the current departure time period as the current unit time;
the first judging subunit is used for respectively judging whether a train arrives at each station on the line in the current unit time according to the action of the current departure time period and preset train operation data;
the interactive processing subunit is used for performing passenger flow interaction processing at a station according to the passenger travel data and the capacity of the train if a train arrives at the station, wherein the passenger flow interaction processing comprises alighting of on-board passengers and boarding of in-station passengers, the alighting passengers comprise passengers who have boarded and whose outbound station ID or transit station ID is the station ID of the station, the in-station passengers comprise inbound passengers and transfer passengers, the inbound passengers comprise passengers whose inbound time is earlier than the current unit time, whose inbound station ID is the station ID of the station, and who have not boarded, and the transfer passengers comprise passengers whose transit station ID is the station ID of the station, whose time since arriving at the station exceeds a preset transfer time, whose outbound station ID corresponds to a station on the line, and who have not boarded;
the first updating subunit is used for updating the number of people in the station of the station and the number of people in the train which is sent out according to the people flow interaction processing result, and counting the number of people waiting for the train at the station in the current unit time;
the second updating subunit is used for updating the number of people in the station of the station according to the passenger trip data and the station ID of the station and counting the number of people waiting for the train in the station in the current unit time if no train arrives at the station;
the counting subunit is used for counting the total number of waiting passengers in the current unit time according to the number of waiting passengers at each station on the line in the current unit time;
the second judgment subunit is used for judging whether the current unit time is the last unit time of the current departure time period or not;
and the second assigning subunit, configured to, if the judgment result of the second judging subunit is negative, take the next unit time as the current unit time, and return to execute the first judging subunit.
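As a non-limiting illustration of the passenger flow interaction processing performed when a train arrives at a station, the following is a minimal sketch. It makes several simplifying assumptions that are not stated in the embodiments: transfer passengers and the preset transfer time are omitted, every passenger in `waiting` is assumed to have entered at the station in question, and boarding is served in order of inbound time when the train is nearly full. The class `Passenger` and the function `process_arrival` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Passenger:
    inbound_time: int
    inbound_station: str
    outbound_station: str
    transit_station: Optional[str] = None


def process_arrival(station_id, now, on_board, waiting, capacity):
    """Return (passengers on the train, passengers still waiting)."""
    # Alighting: outbound station ID or transit station ID equals this station.
    staying = [p for p in on_board
               if station_id not in (p.outbound_station, p.transit_station)]
    # Boarding: passengers who entered before the current unit time,
    # earliest first (ordering is an assumption), limited by free capacity.
    eligible = sorted((p for p in waiting if p.inbound_time < now),
                      key=lambda p: p.inbound_time)
    free_places = max(capacity - len(staying), 0)
    boarding = eligible[:free_places]
    boarded_ids = {id(p) for p in boarding}
    still_waiting = [p for p in waiting if id(p) not in boarded_ids]
    return staying + boarding, still_waiting


train = [Passenger(0, "S0", "S1"), Passenger(0, "S0", "S2")]
platform = [Passenger(1, "S1", "S3"), Passenger(2, "S1", "S3"),
            Passenger(9, "S1", "S3")]
on_board, still_waiting = process_arrival("S1", 5, train, platform, capacity=2)
```

In this sketch `len(still_waiting)` is the number of people waiting at the station in the current unit time, which is the quantity that is accumulated into the total waiting time of the departure time period.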
In an optional embodiment, the first determining unit is specifically configured to, if the random number is greater than or equal to the exploration rate corresponding to the ith simulation, calculate, according to the state data of the current departure time period and a preset action set, the score of each action in the action set through the latest current value network, and use the action corresponding to the maximum score as the action of the next departure time period, where the action set includes departure at a preset fast departure frequency and departure at a preset slow departure frequency.
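As a non-limiting illustration, selecting the action with the maximum score can be sketched as follows. The value network is represented by a hypothetical callable `score_fn` that returns one score per action in the order of `ACTION_SET`; the encoding `FAST = 1`, `SLOW = 0` and the name `greedy_action` are assumptions made only for this example.

```python
FAST, SLOW = 1, 0            # hypothetical action encoding
ACTION_SET = (FAST, SLOW)    # preset action set


def greedy_action(score_fn, state, action_set=ACTION_SET):
    """Return the action whose score under the value network is largest."""
    scores = score_fn(state)  # one score per action, in action_set order
    best_index = max(range(len(action_set)), key=lambda k: scores[k])
    return action_set[best_index]
```

The same function could serve at deployment time, where the selected optimal current value network takes the place of `score_fn`.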
In an optional embodiment, the first calculating unit is specifically configured to calculate the return value according to a return value calculation formula, where the return value calculation formula is $r = -C_t^k - a\,(f_j(x) - f_j(x-1))$, wherein $r$ is the return value, $C_t^k$ is the total waiting time of the current departure time period, $a = 1$ if the action of the next departure time period is departure at the preset fast departure frequency, $a = 0$ if the action of the next departure time period is departure at the preset slow departure frequency, and $f_j(x)$ is the penalty term function corresponding to the jth iteration.
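As a non-limiting illustration, the return value calculation formula can be evaluated as in the following minimal sketch. The penalty term function is passed in as a hypothetical callable `penalty`, and `x` is assumed here to be the number of fast departure time periods counted so far including the next departure time period; the function name `return_value` and the action encoding are likewise assumptions.

```python
FAST, SLOW = 1, 0  # hypothetical action encoding


def return_value(total_wait, next_action, penalty, x):
    """r = -C - a * (f_j(x) - f_j(x - 1)), with a = 1 only for fast departure."""
    a = 1 if next_action == FAST else 0
    return -total_wait - a * (penalty(x) - penalty(x - 1))


# Example with the initial penalty f_1(x) = x * M_0 and M_0 = 5 (toy value).
linear_penalty = lambda x: 5.0 * x
fast_reward = return_value(100.0, FAST, linear_penalty, x=3)
slow_reward = return_value(100.0, SLOW, linear_penalty, x=3)
```

With a linear penalty every fast departure costs the same increment $M_0$, so the fast action is only preferred when it reduces the waiting time by more than that increment.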
In an optional embodiment, the simulation module 203 further includes:
the third judging unit is used for judging whether i is equal to the integral multiple of the simulation times of each iteration to obtain a third judging result;
a third determining unit, configured to determine the penalty term function corresponding to the (j+1)th iteration according to a penalty term function update formula and the penalty term function corresponding to the jth iteration if the third judgment result is yes, where the penalty term function update formula is $f_{j+1}(x) = K_{new} \times \mathrm{Smooth}(C_{best,j}(x)) + K_{old} \times f_j(x)$, wherein $f_{j+1}(x)$ is the penalty term function corresponding to the (j+1)th iteration, $f_j(x)$ is the penalty term function corresponding to the jth iteration, $\mathrm{Smooth}()$ is a smoothing function, $C_{best,j}(x)$ represents the minimum total waiting time corresponding to the number $x$ of fast departure time periods in the jth iteration, $K_{new}$ and $K_{old}$ are preset parameters, $f_1(x) = x \cdot M_0$, $x$ represents the number of fast departure time periods, and $M_0$ is a preset penalty term for a single fast departure.
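As a non-limiting illustration, the penalty term function update can be sketched with the penalty stored as a list indexed by the number x of fast departure time periods. The embodiments do not specify the smoothing function, so a three-point moving average is assumed here purely for illustration; it is also assumed that a minimum total waiting time is available for every x. The names `moving_average` and `update_penalty` are hypothetical.

```python
def moving_average(values, window=3):
    """Centered moving average, truncated at the ends (assumed Smooth())."""
    half = window // 2
    smoothed = []
    for k in range(len(values)):
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def update_penalty(f_j, c_best_j, k_new, k_old, smooth=moving_average):
    """f_{j+1}(x) = K_new * Smooth(C_best,j(x)) + K_old * f_j(x) for each x."""
    smoothed = smooth(c_best_j)
    return [k_new * s + k_old * f for s, f in zip(smoothed, f_j)]


# Toy values: f_1(x) = x * M_0 with M_0 = 2, for x = 0..3.
f_1 = [2.0 * x for x in range(4)]
c_best_1 = [9.0, 6.0, 3.0, 3.0]
f_2 = update_penalty(f_1, c_best_1, k_new=0.5, k_old=0.5)
```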
In an optional embodiment, the departure scheduling apparatus further includes:
the slow departure simulation module is used for setting the action of each departure time period of the operation time to departure at the preset slow departure frequency, performing one slow departure simulation according to the passenger travel data and the capacity of the trains of the one line, and obtaining the theoretical longest waiting time from the total waiting time of each departure time period in the slow departure simulation;
the fast departure simulation module is used for setting the action of each departure time period of the operation time to departure at the preset fast departure frequency, performing one fast departure simulation according to the passenger travel data and the capacity of the trains of the one line, and obtaining the theoretical shortest waiting time from the total waiting time of each departure time period in the fast departure simulation;
and the calculation module is used for dividing the difference between the theoretical longest waiting time and the theoretical shortest waiting time by the preset total number of departure time periods in one day to obtain the penalty term for a single fast departure.
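As a non-limiting illustration, the penalty term for a single fast departure and the initial penalty term function $f_1(x) = x \cdot M_0$ can be computed as in the following minimal sketch. The function names and the numeric values are hypothetical.

```python
def single_fast_departure_penalty(longest_wait, shortest_wait, periods_per_day):
    """M_0 = (theoretical longest wait - theoretical shortest wait) / periods."""
    if periods_per_day <= 0:
        raise ValueError("periods_per_day must be positive")
    return (longest_wait - shortest_wait) / periods_per_day


def initial_penalty(x, m0):
    """f_1(x) = x * M_0, where x is the number of fast departure time periods."""
    return x * m0


# Toy values: all-slow day waits 3600, all-fast day waits 1200, 24 periods.
m0 = single_fast_departure_penalty(3600.0, 1200.0, 24)
```

Under this definition, choosing fast departure in every departure time period incurs an initial total penalty equal to the whole gap between the two theoretical waiting times.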
In an optional embodiment, the second determining unit is specifically configured to determine the exploration rate corresponding to the (i+1)th simulation according to an exploration rate update formula, where the exploration rate update formula is $\varepsilon_{i+1} = \max(\varepsilon_{\min}, \varepsilon_i - 0.0045)$, wherein $\varepsilon_{i+1}$ is the exploration rate corresponding to the (i+1)th simulation, $\varepsilon_i$ is the exploration rate corresponding to the ith simulation, and $\varepsilon_{\min}$ is the preset minimum exploration rate.
Wherein the preset initial exploration rate is $\varepsilon_1 = 1$ and the preset minimum exploration rate is $\varepsilon_{\min} = 0.1$.
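As a non-limiting illustration, the exploration rate schedule can be sketched as follows, assuming that the rate is lowered by the fixed step of 0.0045 after each simulation until it reaches the preset minimum. The function names `next_exploration_rate` and `exploration_schedule` are hypothetical.

```python
def next_exploration_rate(epsilon, epsilon_min=0.1, step=0.0045):
    """Lower the rate by one step (assumed decay), never below epsilon_min."""
    return max(epsilon_min, epsilon - step)


def exploration_schedule(total_simulations, epsilon_initial=1.0,
                         epsilon_min=0.1, step=0.0045):
    """Exploration rates for simulations 1..total_simulations."""
    rates = [epsilon_initial]
    for _ in range(total_simulations - 1):
        rates.append(next_exploration_rate(rates[-1], epsilon_min, step))
    return rates


rates = exploration_schedule(300)
```

With these values the rate falls from 1 to the floor of 0.1 in roughly 200 simulations and stays there for the remaining simulations.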
In an optional embodiment, the simulation module 203 further includes:
and the deleting unit is used for deleting the sample data which is stored in the memory bank at the earliest time if the memory bank is full.
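As a non-limiting illustration, a memory bank that discards its earliest sample when full and that yields a random batch once a preset number threshold is reached can be sketched as follows. The class `MemoryBank` and its method names are hypothetical; the sketch relies on the standard library `collections.deque` with `maxlen`, which drops the oldest element automatically on append.

```python
import random
from collections import deque


class MemoryBank:
    def __init__(self, capacity, rng=None):
        self._items = deque(maxlen=capacity)
        self._rng = rng or random.Random()

    def store(self, sample):
        # When the deque is full, appending removes the earliest sample.
        self._items.append(sample)

    def contents(self):
        return list(self._items)

    def sample_batch(self, batch_size, min_size):
        """Return a random batch, or None while fewer than min_size samples exist."""
        if len(self._items) < min_size:
            return None
        size = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), size)


bank = MemoryBank(capacity=3, rng=random.Random(0))
for k in range(5):
    bank.store(k)
batch = bank.sample_batch(batch_size=2, min_size=3)
```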
In an optional embodiment, the training module 204 includes:
the selecting unit is used for randomly selecting sample data with a preset batch size from the memory base as sample data of the current batch when new sample data are stored in the memory base and the number of the sample data in the memory base reaches a preset number threshold value, and taking the latest current value network as a current value network to be trained;
the acquisition unit is used for traversing the sample data of the current batch and sequentially acquiring sample data from the sample data of the current batch;
the second calculating unit is used for calculating, through the current value network to be trained, the score corresponding to the state data of the current departure time period and the action of the next departure time period in the sample data, and taking the score as the first score corresponding to the sample data;
the third calculating unit is used for calculating, through the latest target value network, the score of each action for the state data of the next departure time period in the sample data, and taking the maximum score as the second score corresponding to the sample data;
a fourth calculating unit, configured to calculate a loss value according to a reported value in the sample data, the first score and the second score corresponding to the sample data, and a preset discount rate, and update a network parameter of a latest current value network to be trained according to the loss value;
and the assigning unit is used for taking the current value network to be trained as the latest current value network after the sample data of the current batch has been traversed.
In an optional embodiment, the departure scheduling apparatus further includes:
and the updating module is used for updating the network parameters of the target value network according to the latest network parameters of the current value network when the simulation times reach the integral multiple of the preset first times.
In an optional embodiment, the fourth calculating unit is specifically configured to calculate the loss value according to a loss function, where the loss function is $\mathrm{Loss} = (Q_{target}(s,a) - Q_{eval}(s,a))^2$ with $Q_{target}(s,a) = r + \gamma \times \max_{a' \in A} Q(s', a')$, wherein $\mathrm{Loss}$ is the loss value, $Q_{eval}(s,a)$ is the first score corresponding to the sample data, $r$ is the return value in the sample data, $\gamma$ is the preset discount rate, and $\max_{a' \in A} Q(s', a')$ is the second score corresponding to the sample data.
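As a non-limiting illustration, the loss value for one sample can be computed as in the following minimal sketch. The scores are plain numbers here; in practice the first score would come from the current value network to be trained and the scores of the next state from the target value network. The function names `td_target` and `dqn_loss` are hypothetical.

```python
def td_target(reward, next_scores, gamma):
    """Q_target = r + gamma * max over actions of the target network scores."""
    return reward + gamma * max(next_scores)


def dqn_loss(first_score, reward, next_scores, gamma):
    """Squared difference between the target and the first score."""
    return (td_target(reward, next_scores, gamma) - first_score) ** 2


# Toy values: return value -10, target network scores 2 and 4, discount 0.5.
target = td_target(-10.0, [2.0, 4.0], gamma=0.5)
loss = dqn_loss(-6.0, -10.0, [2.0, 4.0], gamma=0.5)
```

Evaluating the next-state scores with a separate, less frequently updated target value network keeps the target from shifting with every parameter update of the network being trained.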
In an optional embodiment, the first determining module 205 is specifically configured to, when the simulation times reach a preset total simulation times, compare total waiting times corresponding to simulations of the same number of fast departure time periods, and use a current value network corresponding to one simulation with the smallest corresponding total waiting time as an optimal current value network corresponding to the same number of fast departure time periods.
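As a non-limiting illustration, choosing the optimal current value network for each number of fast departure time periods can be sketched as follows. Each simulation is represented by a hypothetical record `(fast_count, total_wait, network)`, and the networks are stood in for by strings; the function name `best_network_per_fast_count` is an assumption.

```python
def best_network_per_fast_count(records):
    """Map each number of fast periods to the network with the least total wait."""
    best = {}
    for fast_count, total_wait, network in records:
        if fast_count not in best or total_wait < best[fast_count][0]:
            best[fast_count] = (total_wait, network)
    return {count: network for count, (_, network) in best.items()}


records = [
    (3, 100.0, "net_a"),
    (3, 90.0, "net_b"),
    (5, 70.0, "net_c"),
]
best_networks = best_network_per_fast_count(records)
```

A worker can then look up the entry whose number of fast departure time periods fits the operating budget and deploy the corresponding network.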
Example three
Referring to fig. 3, a third embodiment of the present invention is: an electronic device, the electronic device comprising:
one or more processors 301;
a storage device 302 for storing one or more programs;
when the one or more programs are executed by the one or more processors 301, the one or more processors 301 implement the processes in the departure scheduling method embodiment as described above, and can achieve the same technical effect, and for avoiding repetition, details are not described here again.
Example four
A fourth embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements each process in the foregoing embodiments of the departure scheduling method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
In summary, in the departure scheduling method, device, equipment and storage medium provided by the invention, the DQN algorithm is trained to select the departure action of the next time period from the environmental state presented to the agent at the current moment. When the algorithm is deployed in actual production, the departure decision for the next departure time period can therefore be given in real time from nothing more than the current states of the trains and stations, so that the departure mode can be adjusted dynamically according to the current passenger flow, rail transit pressure is effectively reduced, and the method adapts well to emergencies such as abnormal passenger flow. Meanwhile, optimal models corresponding to different numbers of fast departure time periods are generated as the agent learns during training, and staff can select a suitable model to deploy according to the actual situation.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly can be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the foregoing apparatus, each unit and each module included in the apparatus are merely divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.
Claims (17)
1. A departure scheduling method, comprising:
initializing a current value network and a target value network, and dividing the operation time of a train of a line into a preset number of departure time periods;
obtaining passenger trip data of one historical day and the capacity of the train of the line, wherein the passenger trip data comprises the inbound time, the inbound station ID, the outbound station ID and the transit station ID of each passenger, and the transit station ID is determined according to the inbound station and the outbound station through a shortest path algorithm;
according to the passenger travel data and the capacity of the trains of the line, performing passenger flow simulation for a preset total number of simulations, acquiring sample data during each simulation and storing the sample data in a memory bank, and, after each simulation is finished, counting the number of fast departure time periods and the total waiting time corresponding to the current simulation and determining the current value network corresponding to the current simulation, wherein one simulation is the passenger flow simulation of the operation time of one day, each sample data comprises the state data of a departure time period, the action of the next departure time period of that departure time period, the state data of the next departure time period, and a return value, the state data comprises the number of people in each station on the line, the positions of the trains that have departed, and the number of people on the trains that have departed, the action is departure at a preset fast departure frequency or departure at a preset slow departure frequency, and the action of the next departure time period is determined according to the latest current value network;
when new sample data is stored in a memory base and the number of the sample data in the memory base reaches a preset number threshold value, randomly selecting a batch of sample data from the memory base according to a preset batch size, training a latest current value network according to the batch of sample data, and taking the trained current value network as the latest current value network;
when the number of simulations reaches the preset total number of simulations, determining the optimal current value network corresponding to each number of fast departure time periods according to the number of fast departure time periods, the total waiting time and the current value network corresponding to each simulation;
selecting an optimal current value network corresponding to the number of the fast departure time periods according to requirements;
and acquiring the state data of a departure time period of the one line, and determining the action of the next departure time period of that departure time period through the selected optimal current value network according to the state data of that departure time period.
2. The departure scheduling method according to claim 1, wherein said performing passenger flow simulation for a preset total number of times of simulation according to said passenger travel data and the capacity of the train on said one route, acquiring sample data during each simulation, storing the sample data in a memory, counting the number of fast departure time periods and the total waiting time corresponding to the current simulation after each simulation is finished, and determining the current value network corresponding to the current simulation, comprises:
presetting the action of a first departure time period of operation time in the ith simulation, and taking the first departure time period as the current departure time period, wherein the initial value of i is 1;
according to the action of the current departure time period, passenger travel data, capacity and preset unit time, carrying out passenger flow simulation of the current departure time period, acquiring state data when the current departure time period is ended as the state data of the current departure time period, and meanwhile, counting the total waiting time of the current departure time period according to the number of waiting passengers at each station on the line in each unit time in the current departure time period;
generating a random number, wherein the range of the random number is 0-1;
if the random number is smaller than the corresponding exploration rate of the ith simulation, randomly generating the action of the next departure time period of the current departure time period;
if the random number is greater than or equal to the exploration rate corresponding to the ith simulation, determining the action of the next departure time period of the current departure time period according to the latest current value network;
according to the action of the next departure time period, passenger travel data, the capacity and the preset unit time, carrying out passenger flow simulation of the next departure time period, acquiring state data when the next departure time period is finished as the state data of the next departure time period, and meanwhile, counting the total waiting time of the next departure time period according to the number of the passengers waiting at each station on the line in each unit time in the next departure time period;
calculating a return value according to the total waiting time of the current departure time period, the action of the next departure time period and the penalty term function corresponding to the jth iteration, wherein $j = \lceil i/\mathrm{epoch} \rceil$ and epoch is the preset number of simulations in each iteration;
generating sample data according to the state data of the current departure time period, the action and the return value of the next departure time period and the state data of the next departure time period, and storing the sample data in a memory bank;
judging whether the next departure time period is the last departure time period of the operation time or not;
if not, taking the next departure time period as the current departure time period, and continuing to execute the step of generating the random number;
if so, counting the number of departure time periods for departure at a fast departure frequency in the ith simulation to obtain the number of the fast departure time periods corresponding to the ith simulation, calculating the total waiting time corresponding to the ith simulation according to the total waiting time of each departure time period of the operating time in the ith simulation, and taking the current latest current value network as the current value network corresponding to the ith simulation;
judging whether i is equal to a preset total simulation number;
and if not, determining the exploration rate corresponding to the (i+1)th simulation according to the exploration rate corresponding to the ith simulation and a preset minimum exploration rate, wherein the exploration rate corresponding to the first simulation is a preset initial exploration rate, setting i = i + 1, and continuing to execute the step of presetting the action of the first departure time period of the operation time in the ith simulation and taking the first departure time period as the current departure time period.
3. The departure scheduling method according to claim 2, wherein the performing of the passenger flow simulation in the current departure time period according to the action of the current departure time period, the passenger travel data, the capacity amount and the preset unit time comprises:
taking the first unit time of the current departure time period as the current unit time;
respectively judging whether a train arrives at each station on the line in the current unit time according to the action of the current departure time period and preset train operation data;
if a train arrives at a station, performing passenger flow interaction processing at the station according to the passenger travel data and the capacity of the train, wherein the passenger flow interaction processing comprises alighting of on-board passengers and boarding of in-station passengers, the alighting passengers comprise passengers who have boarded and whose outbound station ID or transit station ID is the station ID of the station, the in-station passengers comprise inbound passengers and transfer passengers, the inbound passengers comprise passengers whose inbound time is earlier than the current unit time, whose inbound station ID is the station ID of the station, and who have not boarded, and the transfer passengers comprise passengers whose transit station ID is the station ID of the station, whose time since arriving at the station exceeds a preset transfer time, whose outbound station ID corresponds to a station on the line, and who have not boarded;
updating the number of people in the station and the number of people on the trains that have departed according to the result of the passenger flow interaction processing, and counting the number of people waiting at the station in the current unit time;
if no train arrives at a station, updating the number of people in the station of the station according to passenger travel data and the station ID of the station, and counting the number of people waiting for the train at the station in the current unit time;
counting the total number of waiting passengers in the current unit time according to the number of waiting passengers at each station on the line in the current unit time;
judging whether the current unit time is the last unit time of the current departure time period or not;
and if not, taking the next unit time as the current unit time, continuing to execute the step of respectively judging whether a train arrives at each station on the line in the current unit time according to the action of the current departure time period and preset train operation data.
4. The departure scheduling method according to claim 2, wherein said determining the action of the next departure time period of the current departure time period according to the latest current value network comprises:
according to the state data of the current departure time period and a preset action set, calculating the scores of all actions in the action set through a latest current value network, and taking the action corresponding to the maximum score as the action of the next departure time period, wherein the action set comprises the steps of departure with a preset fast departure frequency and departure with a preset slow departure frequency.
5. The departure scheduling method according to claim 2, wherein the calculating a return value according to the total waiting time of the current departure time period, the action of the next departure time period and the penalty term function corresponding to the jth iteration comprises:
calculating a return value according to a return value calculation formula, wherein the return value calculation formula is $r = -C_t^k - a\,(f_j(x) - f_j(x-1))$, wherein $r$ is the return value, $C_t^k$ is the total waiting time of the current departure time period, $a = 1$ if the action of the next departure time period is departure at the preset fast departure frequency, $a = 0$ if the action of the next departure time period is departure at the preset slow departure frequency, and $f_j(x)$ is the penalty term function corresponding to the jth iteration.
6. The departure scheduling method according to claim 5, wherein before letting i = i +1, further comprising:
judging whether i is equal to the integral multiple of the simulation times of each iteration;
if yes, determining the penalty term function corresponding to the (j+1)th iteration according to a penalty term function update formula and the penalty term function corresponding to the jth iteration, wherein the penalty term function update formula is $f_{j+1}(x) = K_{new} \times \mathrm{Smooth}(C_{best,j}(x)) + K_{old} \times f_j(x)$, wherein $f_{j+1}(x)$ is the penalty term function corresponding to the (j+1)th iteration, $f_j(x)$ is the penalty term function corresponding to the jth iteration, $\mathrm{Smooth}()$ is a smoothing function, $C_{best,j}(x)$ represents the minimum total waiting time corresponding to the number $x$ of fast departure time periods in the jth iteration, $K_{new}$ and $K_{old}$ are preset parameters, $f_1(x) = x \cdot M_0$, $x$ represents the number of fast departure time periods, and $M_0$ is a preset penalty term for a single fast departure.
7. The departure scheduling method according to claim 6, further comprising, after acquiring the passenger trip data of the historical day and the train capacity of the one line:
setting the action of every departure time period of the operation time to departing at the preset slow departure frequency, carrying out one slow departure simulation according to the passenger trip data and the train capacity of the one line, and counting the total waiting time of each departure time period in the slow departure simulation to obtain the theoretical longest waiting time;
setting the action of every departure time period of the operation time to departing at the preset fast departure frequency, carrying out one fast departure simulation according to the passenger trip data and the train capacity of the one line, and counting the total waiting time of each departure time period in the fast departure simulation to obtain the theoretical shortest waiting time;
and dividing the difference between the theoretical longest waiting time and the theoretical shortest waiting time by the preset total number of departure time periods in one day to obtain the penalty term for a single fast departure.
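The arithmetic of claim 7 can be sketched as follows. The sketch assumes that the theoretical longest and shortest waiting times are the sums of the per-period waiting times of the all-slow and all-fast simulations; the function name and the sample values are hypothetical.

```python
from typing import Sequence


def single_fast_departure_penalty(slow_waits: Sequence[float],
                                  fast_waits: Sequence[float]) -> float:
    """M0 = (longest wait - shortest wait) / number of departure time periods."""
    if len(slow_waits) != len(fast_waits) or not slow_waits:
        raise ValueError("both simulations must cover the same, non-empty periods")
    longest = sum(slow_waits)    # all periods depart at the slow frequency
    shortest = sum(fast_waits)   # all periods depart at the fast frequency
    return (longest - shortest) / len(slow_waits)


# Hypothetical per-period total waiting times for a day split into 3 periods.
m0 = single_fast_departure_penalty([100.0, 200.0, 300.0], [40.0, 80.0, 120.0])
```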
8. The departure scheduling method according to claim 2, wherein the determining the exploration rate corresponding to the (i+1)-th simulation according to the exploration rate corresponding to the i-th simulation and the preset minimum exploration rate comprises:
determining the exploration rate corresponding to the (i+1)-th simulation according to an exploration rate updating formula, wherein the exploration rate updating formula is $\varepsilon_{i+1} = \max(\varepsilon_{min}, \varepsilon_i + 0.0045)$, wherein $\varepsilon_{i+1}$ is the exploration rate corresponding to the (i+1)-th simulation, $\varepsilon_i$ is the exploration rate corresponding to the i-th simulation, and $\varepsilon_{min}$ is the preset minimum exploration rate.
9. The departure scheduling method according to claim 8, wherein the preset initial exploration rate is $\varepsilon_1 = 1$ and the preset minimum exploration rate is $\varepsilon_{min} = 0.1$.
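As a small illustration of the exploration rate update in claims 8 and 9, the sketch below keeps the additive step as a parameter `delta`, defaulting to the 0.0045 that appears in the formula as printed. The function name and the parameterization are assumptions; a negative `delta` makes the rate decay towards the minimum, which is where the max() floor takes effect.

```python
def next_exploration_rate(eps_i: float, eps_min: float = 0.1,
                          delta: float = 0.0045) -> float:
    """eps_{i+1} = max(eps_min, eps_i + delta); eps_min acts as a lower bound."""
    return max(eps_min, eps_i + delta)


stepped = next_exploration_rate(0.5)                   # uses the printed step
floored = next_exploration_rate(0.102, delta=-0.0045)  # clipped at eps_min
```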
10. The departure scheduling method according to claim 2, further comprising, before storing the sample data in a memory bank:
if the memory bank is full, deleting the earliest stored sample data from the memory bank.
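The first-in, first-out behaviour of the memory bank in claim 10, together with random batch selection, can be sketched as below. The class `MemoryBank` and its method names are hypothetical; integers stand in for real sample data.

```python
import random
from collections import deque
from typing import Any, List


class MemoryBank:
    """Fixed-capacity store that drops the earliest sample when full."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._samples: deque = deque()

    def store(self, sample: Any) -> None:
        if len(self._samples) >= self._capacity:
            self._samples.popleft()      # delete the earliest stored sample
        self._samples.append(sample)

    def sample_batch(self, batch_size: int) -> List[Any]:
        # Random selection without replacement from the stored samples.
        return random.sample(list(self._samples), batch_size)

    def contents(self) -> List[Any]:
        return list(self._samples)


bank = MemoryBank(capacity=3)
for sample_id in range(5):               # samples 0 and 1 get evicted
    bank.store(sample_id)
batch = bank.sample_batch(2)
```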
11. The departure scheduling method according to claim 1, wherein when new sample data is stored in a memory bank and the number of the sample data in the memory bank reaches a preset number threshold, randomly selecting a batch of sample data from the memory bank according to a preset batch size, training a latest current value network according to the batch of sample data, and using the trained current value network as the latest current value network, comprises:
when new sample data is stored in the memory bank and the number of the sample data in the memory bank reaches a preset number threshold, randomly selecting sample data with a preset batch size from the memory bank as the sample data of the current batch, and using the latest current value network as the current value network to be trained;
traversing the sample data of the current batch, and sequentially acquiring sample data from the sample data of the current batch;
calculating the score corresponding to the state data of the current departure time period and the action of the next departure time period in the sample data through the latest current value network to be trained, and taking the score as a first score corresponding to the sample data;
respectively calculating, through the latest target value network, the scores of the state data of the next departure time period in the sample data for each action, and taking the maximum score as a second score corresponding to the sample data;
calculating a loss value according to a return value in the sample data, a first score and a second score corresponding to the sample data and a preset discount rate, and updating the latest network parameters of the current value network to be trained according to the loss value;
and after traversing the sample data of the current batch, taking the latest current value network to be trained as the latest current value network.
12. The departure scheduling method according to claim 11, further comprising:
when the number of simulations reaches an integral multiple of a preset first number, updating the network parameters of the target value network according to the latest network parameters of the current value network.
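The periodic target network update of claim 12 can be sketched as follows, with network parameters represented by a plain dictionary. The function name and the dictionary representation are assumptions made for illustration only.

```python
import copy
from typing import Dict, List

Params = Dict[str, List[float]]


def sync_target_if_due(sim_count: int, first_number: int,
                       current_params: Params, target_params: Params) -> Params:
    """Return the target parameters, refreshed from the current value network
    whenever sim_count is an integral multiple of first_number."""
    if sim_count % first_number == 0:
        return copy.deepcopy(current_params)   # independent copy of the weights
    return target_params


current = {"w": [1.0, 2.0]}
target = {"w": [0.0, 0.0]}
target_after_7 = sync_target_if_due(7, 5, current, target)    # unchanged
target_after_10 = sync_target_if_due(10, 5, current, target)  # refreshed
```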
13. The departure scheduling method according to claim 11, wherein the calculating a loss value according to the return value in the sample data, the first score and the second score corresponding to the sample data, and the preset discount rate comprises:
calculating the loss value according to a loss function, the loss function being $Loss = (Q_{target}(s,a) - Q_{eval}(s,a))^2$ with $Q_{target}(s,a) = r + \gamma \times \max_{a' \in A} Q(s',a')$, wherein $Loss$ is the loss value, $Q_{eval}(s,a)$ is the first score corresponding to the sample data, $r$ is the return value in the sample data, $\gamma$ is the preset discount rate, and $\max_{a' \in A} Q(s',a')$ is the second score corresponding to the sample data.
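The loss of claim 13 for a single piece of sample data can be sketched in plain Python. The two value networks are replaced by hypothetical callables, the action encoding (0 = slow, 1 = fast) is an assumption, and a real implementation would compute gradients of this loss with a deep learning library.

```python
from typing import Callable, Tuple

State = Tuple[float, ...]
QFunction = Callable[[State, int], float]
ACTIONS = (0, 1)  # assumed encoding: 0 = slow departure, 1 = fast departure


def sample_loss(q_eval: QFunction, q_target_net: QFunction,
                state: State, action: int, reward: float,
                next_state: State, gamma: float) -> float:
    """Loss = (r + gamma * max_a' Q(s', a') - Q_eval(s, a)) ** 2."""
    first_score = q_eval(state, action)                       # current value network
    second_score = max(q_target_net(next_state, a) for a in ACTIONS)
    q_target = reward + gamma * second_score
    return (q_target - first_score) ** 2


# Toy stand-ins for the two networks.
def toy_eval(state: State, action: int) -> float:
    return 2.0


def toy_target(state: State, action: int) -> float:
    return 3.0 if action == 1 else 1.0


loss = sample_loss(toy_eval, toy_target, (0.0,), 1, -1.0, (1.0,), gamma=0.5)
```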
14. The departure scheduling method according to claim 1, wherein the determining the optimal current value network corresponding to each number of the fast departure time periods according to the number of the fast departure time periods, the total waiting time and the current value network corresponding to each simulation comprises:
comparing the total waiting times of the simulations that have the same number of fast departure time periods, and taking the current value network corresponding to the simulation with the minimum total waiting time as the optimal current value network corresponding to that number of fast departure time periods.
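The selection in claim 14 amounts to a group-by-minimum over the simulation records, as the sketch below illustrates. The record layout (number of fast periods, total waiting time, network) and the string placeholders for networks are hypothetical.

```python
from typing import Any, Dict, Iterable, Tuple

Record = Tuple[int, float, Any]  # (fast period count, total waiting time, network)


def best_networks(records: Iterable[Record]) -> Dict[int, Any]:
    """For each fast period count, keep the network with the least total wait."""
    best: Dict[int, Tuple[float, Any]] = {}
    for n_fast, total_wait, network in records:
        if n_fast not in best or total_wait < best[n_fast][0]:
            best[n_fast] = (total_wait, network)
    return {n_fast: network for n_fast, (_, network) in best.items()}


records = [
    (3, 500.0, "net_a"),
    (3, 450.0, "net_b"),
    (5, 300.0, "net_c"),
    (5, 320.0, "net_d"),
]
optimal = best_networks(records)
```

An operator who can afford, say, three fast departure time periods would then pick `optimal[3]`.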
15. A departure scheduling device, comprising:
the initialization module is used for initializing a current value network and a target value network and dividing the operation time of the trains on one line into a preset number of departure time periods;
the acquisition module is used for acquiring passenger trip data of one historical day and the train capacity of the line, wherein the passenger trip data comprises the inbound time, inbound station ID, outbound station ID and transit station ID of each passenger, and the transit station ID is determined from the inbound station and the outbound station through a shortest path algorithm;
the simulation module is used for carrying out a preset total number of passenger flow simulations according to the passenger trip data and the train capacity of the line, acquiring sample data during each simulation, storing the sample data into a memory bank, and, after each simulation is finished, counting the number of fast departure time periods and the total waiting time corresponding to that simulation and determining the current value network corresponding to that simulation, wherein one simulation is a passenger flow simulation of the operation time of one day, each piece of sample data comprises the state data of one departure time period, the action and the state data of the next departure time period following that departure time period, and a return value, the state data comprises the number of passengers at each station on the line, the positions of the dispatched trains and the number of passengers in each dispatched train, the action is departing at a preset fast departure frequency or departing at a preset slow departure frequency, and the action of the next departure time period is determined according to the latest current value network;
the training module is used for randomly selecting a batch of sample data from the memory bank according to a preset batch size when new sample data is stored in the memory bank and the number of the sample data in the memory bank reaches a preset number threshold, training the latest current value network according to the batch of sample data, and taking the trained current value network as the latest current value network;
the first determining module is used for determining, when the number of simulations reaches the preset total number of simulations, the optimal current value network corresponding to each number of fast departure time periods according to the number of fast departure time periods, the total waiting time and the current value network corresponding to each simulation;
the selection module is used for selecting, as required, the optimal current value network corresponding to one number of fast departure time periods;
and the second determining module is used for acquiring the state data of one departure time period of the line and determining, through the selected optimal current value network and according to that state data, the action of the next departure time period following that departure time period.
16. An electronic device, characterized in that the electronic device comprises:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the departure scheduling method of any one of claims 1-14.
17. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the departure scheduling method according to any one of claims 1-14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211095230.1A CN115170006B (en) | 2022-09-08 | 2022-09-08 | Dispatching method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115170006A (en) | 2022-10-11 |
CN115170006B (en) | 2022-11-29 |
Family ID: 83482435
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211095230.1A Active CN115170006B (en) | 2022-09-08 | 2022-09-08 | Dispatching method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115170006B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116934249A (en) * | 2023-07-10 | 2023-10-24 | 漳州年盛信息技术有限公司 | Smart city management system based on big data and artificial intelligence |
CN118657274A (en) * | 2024-08-16 | 2024-09-17 | 深圳技术大学 | Scheduling optimization method for urban rail transit final buses under random passenger flow demand |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013043613A1 (en) * | 2011-09-20 | 2013-03-28 | Edelberg Benjamin Jason | Urban transportation system and method |
CN109657845A (en) * | 2018-11-29 | 2019-04-19 | 河海大学 | A kind of urban railway transit train timetable optimization system for time-varying passenger flow |
CN110556014A (en) * | 2019-09-11 | 2019-12-10 | 湖北公众信息产业有限责任公司 | intelligent bus dispatching platform system |
WO2020018026A1 (en) * | 2018-07-17 | 2020-01-23 | Aselsan Elektroni̇k Sanayi̇ Ve Ti̇caret Anoni̇m Şi̇rketi̇ | Line need determination method |
CN111369181A (en) * | 2020-06-01 | 2020-07-03 | 北京全路通信信号研究设计院集团有限公司 | Train autonomous scheduling deep reinforcement learning method and module |
CN112562377A (en) * | 2020-12-01 | 2021-03-26 | 厦门大学 | Passenger vehicle real-time scheduling method based on random opportunity constraint |
CN113276915A (en) * | 2021-07-06 | 2021-08-20 | 浙江非线数联科技股份有限公司 | Subway departure scheduling method and system |
CN113722874A (en) * | 2020-12-29 | 2021-11-30 | 京东城市(北京)数字科技有限公司 | Vehicle shift scheduling optimization method and device and electronic equipment |
CN113822564A (en) * | 2021-09-16 | 2021-12-21 | 民航数据通信有限责任公司 | Flight plan minimum sample size confirmation method and device for airspace simulation analysis |
CN113935181A (en) * | 2021-10-22 | 2022-01-14 | 暨南大学 | Train simulation operation optimization system construction method based on matched passenger flow |
Non-Patent Citations (7)
Title |
---|
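WENXIN LI et al.: "Comprehensive Optimization of a Metro Timetable Considering Passenger Waiting Time and Energy Efficiency", 《IEEE ACCESS》 * |
宋轩 et al.: "Spatial data intelligence: concepts, technologies and challenges", 《计算机研究与发展》 * |
庄黄蕊 et al.: "Collaborative optimization of metro train departure times and passenger flow control schemes considering railway transfer passenger flow", 《武汉理工大学学报(交通科学与工程版)》 * |
林禹童 et al.: "Collaborative optimization of train timetables and rolling stock scheduling based on passenger flow demand", 《山东科学》 * |
王先明: "Modeling and simulation analysis of rail transit based on cellular automata", 《中国优秀硕士学位论文全文数据库(电子期刊) 工程科技II辑》 * |
赵建东 et al.: "Classified prediction of bus passenger flow using a multi-source data-driven CNN-GRU model", 《交通运输工程学报》 * |
黄明华 et al.: "Transfer-oriented optimization of departure times in rail transit networks", 《西南交通大学学报》 * |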
Also Published As
Publication number | Publication date |
---|---|
CN115170006B (en) | 2022-11-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Al-Kanj et al. | Approximate dynamic programming for planning a ride-hailing system using autonomous fleets of electric vehicles | |
CN115170006B (en) | Dispatching method, device, equipment and storage medium | |
Mao et al. | Dispatch of autonomous vehicles for taxi services: A deep reinforcement learning approach | |
CN108197739B (en) | Urban rail transit passenger flow prediction method | |
CN110555990B (en) | Effective parking space-time resource prediction method based on LSTM neural network | |
CN108629503B (en) | Prediction method for taxi getting-on demand based on deep learning | |
CN112074845A (en) | Deep reinforcement learning for optimizing car pooling strategies | |
CN109117993B (en) | Processing method for optimizing vehicle path | |
CN111985710A (en) | Bus passenger trip station prediction method, storage medium and server | |
EP3035314A1 (en) | A traffic data fusion system and the related method for providing a traffic state for a network of roads | |
US10522036B2 (en) | Method for robust control of a machine learning system and robust control system | |
CN109741626A (en) | Parking situation prediction technique, dispatching method and system | |
Pineda et al. | Integrated traffic-transit stochastic equilibrium model with park-and-ride facilities | |
CN113205698A (en) | Navigation reminding method based on IGWO-LSTM short-time traffic flow prediction | |
CN113672846A (en) | Network appointment scheduling method and device, electronic equipment and storage medium | |
CN107978148A (en) | A kind of traffic status prediction method based on multi-source traffic data dynamic reliability | |
Yu et al. | Optimal operations planning of electric autonomous vehicles via asynchronous learning in ride-hailing systems | |
Wang et al. | Cross-regional customized bus route planning considering staggered commuting during the COVID-19 | |
JP2023155476A (en) | Information processing system, information processing apparatus, information processing method, and information processing program | |
CN114117883A (en) | Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning | |
Haliem et al. | AdaPool: A diurnal-adaptive fleet management framework using model-free deep reinforcement learning and change point detection | |
Wu et al. | Data-driven inverse learning of passenger preferences in urban public transits | |
CN117875674B (en) | Bus scheduling method based on Q-learning | |
Kamel et al. | A modelling platform for optimizing time-dependent transit fares in large-scale multimodal networks | |
Kaddoura et al. | Optimal road pricing: Towards an agent-based marginal social cost approach |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |