CN113525462A - Timetable adjusting method and device under delay condition and electronic equipment

Info

Publication number: CN113525462A
Application number: CN202110904084.1A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN113525462B (granted publication)
Inventors: 吕宜生, 王银, 袁志明, 王晓, 王荣笙, 董海荣, 王飞跃
Assignees: Institute of Automation of Chinese Academy of Science; Beijing Jiaotong University; Signal and Communication Research Institute of CARS
Application filed by Institute of Automation of Chinese Academy of Science, Beijing Jiaotong University, and Signal and Communication Research Institute of CARS
Priority to CN202110904084.1A
Publication of CN113525462A; application granted and published as CN113525462B
Legal status: Active (granted)

Landscapes

  • Train Traffic Observation, Control, And Security (AREA)

Abstract

The invention provides a method and a device for adjusting a schedule under a delay condition, and electronic equipment. The method comprises: acquiring time information of each train to be adjusted at a current station and infrastructure information of the current station; inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model, wherein the departure action planning model is obtained through reinforcement learning with the shortest total delay time of all stations as the target; and adjusting the schedule of the current station based on the departure action sequence. The method reduces disordered train operation and large-area arrival delays under emergencies, shortens the total delay time of all trains at all stations, and improves the adjustment of the train schedule under complex conditions.

Description

Timetable adjusting method and device under delay condition and electronic equipment
Technical Field
The invention relates to the field of high-speed rail transportation scheduling, in particular to a method and a device for adjusting a schedule under a delay condition and electronic equipment.
Background
With the continuous development of the transportation system, high-speed railways occupy an increasingly prominent position in China's comprehensive transportation system. In high-speed rail transportation, trains often deviate from the predetermined operation diagram due to emergencies such as communication interruptions, bad weather, and human factors. A well-designed train schedule adjustment strategy can not only avoid train collisions caused by conflicts, but also maximize the operation efficiency of the whole high-speed rail network. Therefore, a method for adjusting high-speed trains under emergencies is of great significance.
At present, high-speed train dispatching methods fall into three main categories: simulation methods, operations research methods, and heuristic swarm intelligence methods. Simulation methods depend heavily on simulating the real environment, require a model operation platform to be built, and have low optimization efficiency; operations research methods lack real-time performance and adaptability and cannot meet actual operation and adjustment requirements; heuristic swarm intelligence methods have strong global search capability but easily fall into local optima in complex scenarios, are computationally demanding, and have low optimization efficiency.
The prior art is therefore difficult to adapt to the dynamic, complex, and rapidly changing conditions of high-speed railways; its adjustment effect is poor, and it cannot effectively improve the operation efficiency of the traffic system.
Disclosure of Invention
The invention provides a method and a device for adjusting a schedule under a delay condition, and electronic equipment, which are used to solve the problem in the prior art that the train schedule is poorly adjusted under a delay condition, and which reduce the total delay time of all trains at all stations under emergencies and improve the overall adjustment of the train schedule under complex conditions.
The invention provides a method for adjusting a timetable under a delay condition, which comprises the following steps:
acquiring time information of each train to be adjusted of a current station and infrastructure information of the current station;
inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target;
and adjusting the schedule of the current station based on the departure action sequence.
According to the schedule adjusting method under the delay condition, the departure action planning model comprises an operating environment model and a strategy network model;
the operation environment model is used for updating the current train state to be adjusted based on the current train dispatching action, and the strategy network model is used for determining the next train dispatching action based on the action space range and the current train state to be adjusted;
wherein the initial train state to be adjusted is determined based on time information of each train to be adjusted at the current station, and the action space range is determined based on the infrastructure information.
According to the schedule adjusting method under the condition of delay, the departure action planning model is determined based on the following steps:
constructing an initial reinforcement learning model;
inputting time information and infrastructure information of each sample train to be adjusted of the current sample station into the initial reinforcement learning model to obtain an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station;
and updating parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
According to the schedule adjusting method under the delay condition, the initial reinforcement learning model comprises an initial operation environment model and an initial strategy network model;
the method comprises the steps of inputting time information and infrastructure information of each sample train to be adjusted of a current sample station into an initial reinforcement learning model, obtaining an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and comprises the following steps:
inputting a current sample departure action to the initial operation environment model to obtain a current train state to be adjusted output by the initial operation environment model and an action reward of the current sample departure action;
inputting the current train state of the sample to be adjusted into the initial strategy network model to obtain a next sample departure action output by the initial strategy network model based on the action space range of the current sample station and the current train state of the sample to be adjusted, and updating the next sample departure action into the current sample departure action until the current train state of the sample to be adjusted is empty;
the initial sample train state to be adjusted is determined by the initial reinforcement learning model based on the time information of each sample train to be adjusted of the current sample station, and the action space range of the current sample station is determined by the initial reinforcement learning model based on the infrastructure information of the current sample station.
According to the schedule adjusting method under the condition of delay, the action reward of the current sample departure action is determined based on the actual arrival time and the originally planned arrival time of the sample train to be adjusted at the next sample station, which correspond to the current sample departure action.
According to the schedule adjusting method under the delay condition, the parameter updating is carried out on the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model, and the method comprises the following steps:
and updating parameters of the initial reinforcement learning model by taking the strategy gradient direction as an updating direction based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
According to the time schedule adjusting method under the delay condition, the time information comprises the original planned arrival time, the original planned departure time and the actual arrival time of the corresponding train.
The invention also provides a schedule adjusting device under the delay condition, which comprises:
the system comprises an information acquisition unit, a data processing unit and a data processing unit, wherein the information acquisition unit is used for acquiring the time information of each train to be adjusted of a current station and the infrastructure information of the current station;
the sequence determining unit is used for inputting the time information and the infrastructure information of each train to be adjusted of the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target;
and the time adjusting unit is used for adjusting the time table of the current station based on the departure action sequence.
The invention further provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the schedule adjusting method under any delay condition.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of schedule adjustment in the event of a delay as described in any of the above.
According to the method, the device and the electronic equipment for adjusting the timetable under the delay condition, the departure action sequence is obtained through the departure action planning model, the timetable of the train to be adjusted is correspondingly adjusted according to the departure action sequence, the conditions of train operation disorder and large-area delay to the station under the emergency condition are reduced, and the total delay time of all stations of all trains is shortened; the parameters of the departure action planning model can be correspondingly adjusted according to actual requirements, so that the schedule adjustment strategy of the train is converged to an expected strategy, and the overall improvement of the train schedule adjustment effect under complex conditions is realized.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present invention, and that other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is a schematic flow chart diagram of a method for adjusting a schedule in the event of a delay according to the present invention;
FIG. 2 is a second schematic flow chart of a method for adjusting a schedule under a delay condition according to the present invention;
FIG. 3 is a schematic diagram of a schedule adjustment apparatus for use in a delay condition in accordance with the present invention;
fig. 4 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for adjusting a schedule under a delay condition according to the present invention, as shown in fig. 1, the method includes:
step 110, obtaining time information of each train to be adjusted of a current station and infrastructure information of the current station;
Here, a train to be adjusted at the current station is a train whose schedule needs to be adjusted when it passes through the current station; the trains to be adjusted include trains that are delayed and trains that are not delayed but may need to be adjusted in coordination with the delayed trains. The time information of each train to be adjusted may include the originally planned arrival time and originally planned departure time of the train at the current station on the train schedule, may also include the actual arrival time of the train at the current station, and may further include the delay time of the train at the current station calculated from its originally planned arrival time, actual arrival time, and originally planned departure time at the current station; this is not specifically limited in the embodiment of the present invention.
The infrastructure information of the current station may be the number of adjustment tracks at the current station, i.e., the number of tracks that can be used for adjusting train tracks in the event of a delay.
After acquiring the time information of each train to be adjusted at the current station and the infrastructure information of the current station, step 120 is performed.
Step 120, inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained through reinforcement learning with the shortest total delay time of all stations as the target.
Specifically, the departure action planning model needs to be trained in advance, before step 120 is performed. When the departure action planning model is trained, the delay times of the trains to be adjusted at the current station can be summed to obtain the total delay time of the current station; the total delay times of all stations are obtained in the same way and then summed to obtain the total delay time of all trains to be adjusted at all stations. Reinforcement learning is performed on the initial reinforcement learning model with the shortest total delay time of all trains to be adjusted at all stations as the goal, so as to obtain the departure action planning model.
Reinforcement learning here means obtaining an estimated departure action sequence and the action reward of each sample departure action through the initial reinforcement learning model, determining the current state transition information according to the current sample departure action, and determining the next sample departure action according to the current state transition information; the parameters of the reinforcement learning model are then updated according to the action rewards of the sample departure actions. The state transition information is used to transition the current sample train state to be adjusted according to the time information of each sample train to be adjusted at the current sample station and the current sample departure action. The action reward of a sample departure action characterizes the effect of that departure action on the delays of all sample trains to be adjusted at the current sample station.
For example, the current state transition information of the jth sample station may indicate that, if the current state contains N sample trains to be adjusted and a certain sample train is selected to depart at step t, the next state contains N-1 sample trains to be adjusted; the state transition is realized in this way. The sample train to depart at step t+1 is then determined from the remaining N-1 trains, giving a next state of N-2 trains, and this step is repeated until the number of sample trains to be adjusted at the jth sample station is zero. The parameters of the initial reinforcement learning model are updated according to the action reward of each sample departure action of the jth sample station.
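As a minimal sketch of this state transition (the names and data structures below are illustrative assumptions, not the patent's own interfaces), selecting a train to depart simply removes it from the set of trains still to be adjusted:

```python
def transition(state: set, departed_train) -> set:
    """state: the set of sample trains still to be adjusted at the sample station."""
    next_state = set(state)             # copy so the previous state is left untouched
    next_state.discard(departed_train)  # N trains -> N-1 trains after one departure
    return next_state
```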
In step 120, the acquired time information of each train to be adjusted at the current station and the infrastructure information of the current station are input into the departure action planning model, and the departure action planning model outputs a departure action sequence of the current station according to the input information. The departure action sequence here is the set of departure actions of all trains to be adjusted at the current station.
After the departure action sequence output by the departure action planning model is obtained, step 130 is performed.
Step 130, adjusting the schedule of the current station based on the departure action sequence.
Specifically, the departure action of each train to be adjusted at the current station is adjusted according to the departure action sequence of the current station output by the departure action planning model obtained in the previous step, that is, the schedule of each train to be adjusted at the current station is adjusted accordingly. A departure action here refers to the action of dispatching the currently selected train from the station.
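The flow of steps 110 to 130 can be sketched as follows. This is only an illustrative outline: the class, field, and method names (TrainInfo, plan_departures, and so on) are assumptions rather than interfaces defined by the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainInfo:
    train_id: str
    planned_arrival: float    # originally planned arrival time at the current station
    planned_departure: float  # originally planned departure time at the current station
    actual_arrival: float     # actual arrival time at the current station

def adjust_station_timetable(trains: List[TrainInfo], num_adjust_tracks: int, model) -> List[str]:
    """Step 110: `trains` carries the time information and `num_adjust_tracks`
    the infrastructure information. Step 120: the trained departure action
    planning model returns a departure action sequence (here, indices into
    `trains`). Step 130: the sequence gives the order in which the station's
    timetable is rewritten."""
    departure_sequence = model.plan_departures(trains, num_adjust_tracks)
    return [trains[i].train_id for i in departure_sequence]
```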
According to the method for adjusting the timetable under the delay condition, the departure action sequence is obtained through the departure action planning model, the timetable of the train to be adjusted is correspondingly adjusted according to the departure action sequence, the conditions of train operation disorder and large-area delay to the station under the emergency condition are reduced, and the total delay time of each station of all trains is shortened; the parameters of the departure action planning model can be correspondingly adjusted according to actual requirements, so that the schedule adjustment strategy of the train is converged to an expected strategy, and the overall improvement of the train schedule adjustment effect under complex conditions is realized.
Based on the embodiment, the departure action planning model comprises an operation environment model and a strategy network model;
the operation environment model is used for updating the current train state to be adjusted based on the current train dispatching action, and the strategy network model is used for determining the next train dispatching action based on the action space range and the current train state to be adjusted;
wherein the initial train state to be adjusted is determined based on time information of each train to be adjusted at the current station, and the action space range is determined based on the infrastructure information.
The action space range here is the range of values of the action space that the policy network model can select from, determined according to the infrastructure information of the current station:

$$A_j \in [0,\, c_j - 1]$$

where $A_j$ is the action space range and $c_j$ is the number of adjustment tracks at the jth station.
It should be noted that infrastructure information of different stations is different, that is, the number of the adjustment tracks of different stations is different. In different stations, the value ranges of the action space which can be selected by the strategy network model determined according to the infrastructure information of the stations are different.
Specifically, in the process of planning departure actions, the operation environment model first determines the initial train state to be adjusted according to the time information of each train to be adjusted at the current station, and the policy network model determines the action space range of the current station according to the infrastructure information of the current station.
After that, the operation environment model updates the current train state to be adjusted according to the current departure action, and the policy network model determines the next departure action according to the action space range of the current station and the current train state to be adjusted. Concretely, the policy network model determines the current departure action according to the action space range and the initial train state to be adjusted, and feeds the current departure action back to the operation environment model; the operation environment model updates the initial train state to be adjusted according to the current departure action and takes the updated state as the current train state to be adjusted; the policy network model then determines the next departure action according to the current train state to be adjusted and the action space range of the current station and feeds it back to the operation environment model. This loop continues until the current train state to be adjusted is empty.
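A minimal sketch of this loop between the operation environment model and the policy network model is given below; the object interfaces (reset, step, select_action) are assumptions introduced only for illustration.

```python
def plan_departure_sequence(env, policy, trains, num_adjust_tracks):
    """env: operation environment model; policy: policy network model."""
    state = env.reset(trains)                    # initial train state from the time information
    action_space = range(num_adjust_tracks)      # action space range from the infrastructure information
    sequence = []
    while state:                                 # loop until the train state to be adjusted is empty
        action = policy.select_action(state, action_space)  # policy picks the next departure action
        sequence.append(action)
        state = env.step(action)                 # environment updates the train state to be adjusted
    return sequence
```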
Based on the above embodiment, the policy network model determines the next departure action based on the action space range and the current train state to be adjusted, which may be represented as:

$$\mu_\theta\left(a_t^j \mid s_t^j\right) = \frac{\exp\left(x_{a_t^j}\right)}{\sum_{a} \exp\left(x_a\right)}$$

where $s_t^j$ denotes the state of the trains to be adjusted at the jth station when the policy network model of the jth station decides the tth departing train; $a_t^j$ denotes the action taken when the policy network model of the jth station decides the tth departing train; $\mu_\theta(a_t^j \mid s_t^j)$ denotes the probability that the policy network model selects action $a_t^j$ in state $s_t^j$; $\mu_\theta$ denotes the policy network model and $\theta$ denotes its parameters; $x_{a_t^j}$ denotes the network output corresponding to the train to be adjusted selected by action $a_t^j$; $\exp$ denotes the exponential function with base $e$; and the sum in the denominator runs over the available actions.
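The sketch below shows one way the action-selection probability above could be computed, assuming the policy network emits one score x per available action; the sampling step and the exact normalization are assumptions consistent with, but not spelled out by, the formula.

```python
import numpy as np

def select_action(scores: np.ndarray) -> int:
    """scores[a]: policy network output x_a for each available action a in the
    current state; returns the index of the chosen departure action."""
    exp_scores = np.exp(scores - scores.max())   # subtract the max for numerical stability
    probs = exp_scores / exp_scores.sum()        # softmax probabilities mu_theta(a | s)
    return int(np.random.choice(len(scores), p=probs))
```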
Based on the above embodiment, the departure action planning model is determined based on the following steps:
constructing an initial reinforcement learning model;
inputting time information and infrastructure information of each sample train to be adjusted of the current sample station into the initial reinforcement learning model to obtain an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station;
and updating parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
Specifically, before the initial reinforcement learning model is constructed, the time information of each sample train to be adjusted at the current sample station and the infrastructure information of the current sample station are also acquired, and the initial reinforcement learning model is constructed according to the acquired information.
The time information and the infrastructure information of each sample train to be adjusted of the current sample station are input into an initial reinforcement learning model, and the initial reinforcement learning model correspondingly outputs an estimated departure action sequence of the current sample station and an action reward of each sample departure action in the estimated departure action sequence according to the input information. The estimated departure action sequence is a set of estimated sample departure actions of each sample train to be adjusted at the current sample station. The action reward of the sample departure action is used for representing the effect of the sample departure action of the sample train to be adjusted at the current sample station on the delay of each sample train to be adjusted at the current sample station.
After the estimated departure action sequence of the current sample station output by the initial reinforcement learning model and the action reward of each sample departure action in the estimated departure action sequence are obtained, determining the next sample station of the current sample station according to the timetable of the sample train to be adjusted, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station.
Specifically, the time information of each sample train to be adjusted at the updated current sample station and the infrastructure information of the updated current sample station are acquired and input into the initial reinforcement learning model, so as to obtain the estimated departure action sequence of the updated current sample station output by the initial reinforcement learning model and the action reward of each sample departure action in that sequence; the next sample station of the updated current sample station is then determined from the timetable of the sample trains to be adjusted and is in turn updated to be the current sample station. This process is repeated until the current sample station is the last sample station on the timetable of the sample trains to be adjusted.
Further, the parameters of the initial reinforcement learning model are updated according to the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station, and the updated initial reinforcement learning model is taken as the departure action planning model. Specifically, the action rewards of the sample departure actions in the estimated departure action sequence of the current sample station output by the initial reinforcement learning model are summed to obtain the reward of the estimated departure action sequence of the current sample station; the rewards of the estimated departure action sequences of all sample stations are then summed to obtain the total reward, and the parameters of the initial reinforcement learning model are updated according to the total reward. The updated initial reinforcement learning model is the departure action planning model.
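The overall training procedure can be sketched as follows; the station objects and the rollout and update_parameters calls are hypothetical names used only to illustrate the order of operations.

```python
def train_departure_planner(initial_model, sample_stations, num_epochs=100):
    """sample_stations are ordered from the front-most to the rear-most sample station."""
    for _ in range(num_epochs):
        trajectories = []
        for station in sample_stations:
            # Roll out an estimated departure action sequence and its action rewards.
            traj = initial_model.rollout(station.trains, station.num_adjust_tracks)
            trajectories.append(traj)
        # Update the parameters from the action rewards collected at every sample station.
        initial_model.update_parameters(trajectories)
    return initial_model  # the trained departure action planning model
```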
Based on the above embodiment, the method for constructing the initial reinforcement learning model further includes:
the method comprises the steps of obtaining each sample train to be adjusted of a current sample station, abstracting each sample train to be adjusted in each sample train to be adjusted of the current sample station into a multi-element group, wherein the multi-element group comprises information of original planned arrival time, original planned departure time, actual arrival time and the like of the sample train to be adjusted at the current sample station. The tuple can be represented as:
Figure BDA0003200995440000111
wherein the content of the first and second substances,
Figure BDA0003200995440000112
the ith train of samples to be adjusted representing the jth sample station,
Figure BDA0003200995440000113
the original planned arrival time of the ith sample train to be adjusted at the jth sample station is shown,
Figure BDA0003200995440000114
represents the originally planned departure time of the ith sample train to be adjusted at the jth sample station, AijAnd the actual arrival time of the ith sample train to be adjusted at the jth sample station is shown.
The tuples of all sample trains to be adjusted are combined to obtain the data set of the sample trains to be adjusted at the current sample station, which is represented as:

$$X_j = \left\{x_1^j, x_2^j, \ldots, x_n^j\right\}$$

where $X_j$ denotes the data set of the trains to be adjusted at the jth sample station and $n$ is the number of trains to be adjusted at the jth sample station.
Because the acquired data set of the trains to be adjusted at the current sample station may be in an irregular format, the data set needs to be preprocessed; the time information of each sample train to be adjusted at the current sample station is obtained after preprocessing. The specific preprocessing is to normalize the obtained data set on the basis of the 24 hours of a day, as shown in the following formula:

$$\tilde{x}_i^j = \frac{x_i^j}{T_{\text{day}}}$$

where $T_{\text{day}}$ denotes the length of one day (24 hours) expressed in the same time unit as the data, so that every time value in the tuple is normalized to the interval [0, 1].
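A sketch of this preprocessing step is shown below; it assumes the times are expressed in minutes and divides them by the 1440 minutes of a day, which is one plausible reading of the normalization above.

```python
def normalize_times(dataset):
    """dataset: list of (planned_arrival, planned_departure, actual_arrival) tuples,
    with every time expressed in minutes since midnight (an assumption)."""
    minutes_per_day = 24 * 60
    return [tuple(t / minutes_per_day for t in record) for record in dataset]
```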
based on the above embodiment, the initial reinforcement learning model includes an initial operating environment model and an initial policy network model;
the method comprises the steps of inputting time information and infrastructure information of each sample train to be adjusted of a current sample station into an initial reinforcement learning model, obtaining an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and comprises the following steps:
inputting a current sample departure action to the initial operation environment model to obtain a current train state to be adjusted output by the initial operation environment model and an action reward of the current sample departure action;
inputting the current train state of the sample to be adjusted into the initial strategy network model to obtain a next sample departure action output by the initial strategy network model based on the action space range of the current sample station and the current train state of the sample to be adjusted, and updating the next sample departure action into the current sample departure action until the current train state of the sample to be adjusted is empty;
the initial sample train state to be adjusted is determined by the initial reinforcement learning model based on the time information of each sample train to be adjusted of the current sample station, and the action space range of the current sample station is determined by the initial reinforcement learning model based on the infrastructure information of the current sample station.
The action space range of the current sample station is a value range of an action space that can be selected by the initial policy network model determined according to the infrastructure information of the current sample station. Infrastructure information of different sample stations is different, namely the number of the adjusting tracks of different sample stations is different. In different sample stations, the value ranges of the action space which can be selected by the initial strategy network model determined according to the infrastructure information of the sample stations are different.
Specifically, in the training process of the initial reinforcement learning model, the initial reinforcement learning model may determine an initial sample train state to be adjusted according to time information of each sample train to be adjusted at the current sample station, and may also determine an action space range of the current sample station according to infrastructure information of the current sample station.
After that, the current sample departure action is input into the initial operation environment model, which outputs the current sample train state to be adjusted and the action reward of the current sample departure action according to the input action.
Further, the current sample train state to be adjusted output by the initial operation environment model is input into the initial strategy network model, which determines and outputs the next sample departure action of the current sample station according to the action space range of the current sample station and the input state.
After the next sample departure action of the current sample station output by the initial strategy network model is obtained, it is updated to be the current sample departure action, and the process continues until the current sample train state to be adjusted is empty.
Concretely, the initial strategy network model outputs the current sample departure action according to the initial sample train state to be adjusted and the action space range of the sample station, and feeds it back to the initial operation environment model; the initial operation environment model updates the initial sample train state to be adjusted according to the current sample departure action, takes the updated state as the current sample train state to be adjusted, and outputs this state together with the action reward of the current sample departure action. The initial strategy network model then outputs the next sample departure action according to the current sample train state to be adjusted and the action space range of the current sample station and feeds it back to the initial operation environment model. This loop continues until the current sample train state to be adjusted is empty.
Based on the above embodiment, the sample train state to be adjusted can be represented as:

$$s_t^j = \left\{x_i^j \mid \text{sample train } i \text{ has not yet departed from station } j\right\}$$

where $s_t^j$ denotes the state of the sample trains to be adjusted at the jth station when the initial strategy network model of the jth station decides the tth departing train, i.e., the set of tuples of the sample trains that still remain to be dispatched.
Based on the above embodiment, the action reward of the current sample departure action is determined based on the actual arrival time and the originally planned arrival time of the sample train to be adjusted at the next sample station corresponding to the current sample departure action.
The originally planned arrival time at the next sample station is the arrival time at the next sample station indicated on the timetable of the sample train to be adjusted; the actual arrival time at the next sample station is the time at which the sample train to be adjusted actually arrives at the next sample station.
Specifically, the initial operation environment model determines the action reward of the current sample departure action according to the actual arrival time and the originally planned arrival time of the corresponding sample train to be adjusted at the next sample station. The action reward of the sample departure action may be expressed as:

$$r_t^j = -\left(A_{i,j+1} - P_{i,j+1}\right)$$

where $r_t^j$ denotes the action reward of the sample departure action taken at step t at the jth sample station, $A_{i,j+1}$ denotes the actual arrival time of the ith sample train to be adjusted at the (j+1)th station, and $P_{i,j+1}$ denotes the originally planned arrival time of the ith sample train to be adjusted at the (j+1)th station.
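A one-line sketch of this reward is given below; the sign convention (penalizing delay so that shorter delays yield larger rewards) is an assumption consistent with the goal of minimizing the total delay time.

```python
def action_reward(actual_arrival_next: float, planned_arrival_next: float) -> float:
    # Delay of the dispatched train at the next sample station, negated as a penalty.
    return -(actual_arrival_next - planned_arrival_next)
```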
Based on the above embodiment, updating the parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model includes:
and updating parameters of the initial reinforcement learning model by taking the strategy gradient direction as an updating direction based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
Specifically, the parameters of the initial reinforcement learning model are updated with the policy gradient direction as the update direction, so as to obtain the departure action planning model. That is, the parameters of the initial policy network model in the initial reinforcement learning model are updated along the policy gradient direction, which can be specifically expressed as:
$$\theta_{\text{new}} = \theta_{\text{old}} + \alpha \,\nabla_\theta \log \mu_\theta\left(a_t^j \mid s_t^j\right) G_t^j$$

$$G_t^j = \sum_{k=t}^{T} \gamma^{\,k-t}\, r_k^j$$

where $\theta_{\text{new}}$ denotes the updated parameters of the initial policy network model; $\theta_{\text{old}}$ denotes the parameters of the initial policy network model before the update; $\alpha$ is the learning rate of the initial policy network model update; $s_t^j$ denotes the state of the sample trains to be adjusted at the jth station when the initial policy network model of the jth station decides the tth departing sample train; $a_t^j$ denotes the action taken when the initial policy network model of the jth station decides the tth departing sample train; $\mu_\theta(a_t^j \mid s_t^j)$ denotes the probability that the initial policy network model selects action $a_t^j$ in state $s_t^j$; $\nabla_\theta \log \mu_\theta(a_t^j \mid s_t^j)$ is the policy gradient direction; $G_t^j$ denotes the accumulated reward of all actions from the tth position to the last position; $\gamma$ is the attenuation coefficient of the reward function; $r_k^j$ denotes the action reward of the action taken when the jth station decides the kth departing sample train; and $\gamma^{k-t}$ is the decay factor applied in the cumulative reward of all actions from the tth position to the last position. It should be noted that, in the embodiment of the present invention, the learning rate of the initial policy network model update preferably takes values in the range [0.1, 0.3].
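A compact sketch of this policy-gradient update is shown below, written with PyTorch as an assumed implementation framework; the patent itself does not prescribe a particular library.

```python
import torch

def update_policy(optimizer, log_probs, rewards, gamma=0.99):
    """log_probs[t]: log mu_theta(a_t | s_t) for step t of one rollout (tensors
    carrying gradients); rewards[t]: action reward r_t; gamma: attenuation coefficient."""
    returns, g = [], 0.0
    for r in reversed(rewards):          # discounted cumulative reward G_t from each step onward
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    loss = -(torch.stack(log_probs) * returns).sum()  # gradient ascent on the policy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```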
Based on the above embodiment, the time information includes the original planned arrival time, the original planned departure time, and the actual arrival time of the corresponding train.
The originally planned arrival time of a train is the arrival time at a given station indicated on the train's schedule; the originally planned departure time is the departure time from that station indicated on the train's schedule; and the actual arrival time is the time at which the train actually arrives at the station.
Fig. 2 is a second schematic flow chart of the method for adjusting a schedule under a delay condition according to the present invention, as shown in fig. 2, the method includes:
an application process of the departure action planning model and a training process of the initial reinforcement learning model; the training process of the initial reinforcement learning model comprises the following steps:
and step 210, constructing an initial reinforcement learning model.
Step 220, inputting the time information and the infrastructure information of each sample train to be adjusted of the current sample station into the initial reinforcement learning model, obtaining the estimated departure action sequence of the current sample station output by the initial reinforcement learning model, and the action reward of each sample departure action in the estimated departure action sequence, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station.
And step 230, updating parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain a departure action planning model.
The application process of the departure action planning model comprises the following steps:
and 240, acquiring time information of each train to be adjusted at the current station and infrastructure information of the current station.
Step 250, inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained through reinforcement learning with the shortest total delay time of all stations as the target.
And step 260, adjusting the schedule of the current station based on the departure action sequence.
According to the method for adjusting the timetable under the delay condition provided by the invention, the initial reinforcement learning model is trained to obtain the departure action planning model, the departure action sequence is obtained through the departure action planning model, and the timetable of the trains to be adjusted is adjusted accordingly. By applying reinforcement learning to train timetable adjustment and learning an optimal or near-optimal adjustment strategy through interaction with the environment, the method can adapt to traffic requirements under different delays and improve the operation efficiency of the whole high-speed rail network.
The schedule adjusting apparatus under the delay condition provided by the present invention is described below, and the schedule adjusting apparatus under the delay condition described below and the schedule adjusting method under the delay condition described above can be referred to in correspondence.
Fig. 3 is a schematic structural diagram of a schedule adjusting apparatus in a delay situation according to the present invention. As shown in fig. 3, the apparatus includes:
an information obtaining unit 310, configured to obtain time information of each train to be adjusted at a current station and infrastructure information of the current station;
a sequence determining unit 320, configured to input time information and infrastructure information of each train to be adjusted at the current station to a departure action planning model, so as to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target;
a time adjusting unit 330, configured to adjust a time table of the current station based on the departure action sequence.
According to the time schedule adjusting device under the delay condition, the departure action sequence is obtained through the departure action planning model, the time schedule of the train to be adjusted is correspondingly adjusted according to the departure action sequence, the conditions of train operation disorder and large-area delay to the station under the emergency condition are reduced, and the total delay time of each station of all trains is shortened; the parameters of the departure action planning model can be correspondingly adjusted according to actual requirements, so that the schedule adjustment strategy of the train is converged to an expected strategy, and the overall improvement of the train schedule adjustment effect under complex conditions is realized.
Based on the embodiment, the departure action planning model comprises an operating environment model and a strategy network model;
the operation environment model is used for updating the current train state to be adjusted based on the current train dispatching action, and the strategy network model is used for determining the next train dispatching action based on the action space range and the current train state to be adjusted;
wherein the initial train state to be adjusted is determined based on time information of each train to be adjusted at the current station, and the action space range is determined based on the infrastructure information.
Based on the above embodiment, the sequence determining unit 320 is configured to:
constructing an initial reinforcement learning model;
inputting time information and infrastructure information of each sample train to be adjusted of the current sample station into the initial reinforcement learning model to obtain an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station;
and updating parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
Based on the above embodiment, the initial reinforcement learning model includes an initial operating environment model and an initial policy network model; the sequence determination unit 320 is configured to:
inputting a current sample departure action to the initial operation environment model to obtain a current train state to be adjusted output by the initial operation environment model and an action reward of the current sample departure action;
inputting the current train state of the sample to be adjusted into the initial strategy network model to obtain a next sample departure action output by the initial strategy network model based on the action space range of the current sample station and the current train state of the sample to be adjusted, and updating the next sample departure action into the current sample departure action until the current train state of the sample to be adjusted is empty;
the initial sample train state to be adjusted is determined by the initial reinforcement learning model based on the time information of each sample train to be adjusted of the current sample station, and the action space range of the current sample station is determined by the initial reinforcement learning model based on the infrastructure information of the current sample station.
Based on the above embodiment, the action reward of the current sample departure action is determined based on the actual arrival time and the originally planned arrival time of the sample train to be adjusted at the next sample station corresponding to the current sample departure action.
Based on the above embodiment, the apparatus further includes a parameter updating unit, configured to:
and updating parameters of the initial reinforcement learning model by taking the strategy gradient direction as an updating direction based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
Based on the above embodiment, the time information includes the original planned arrival time, the original planned departure time, and the actual arrival time of the corresponding train.
Fig. 4 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 4: a processor (processor) 410, a communication interface (Communication Interface) 420, a memory (memory) 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform the schedule adjusting method in the event of a delay, the method comprising: acquiring time information of each train to be adjusted of a current station and infrastructure information of the current station; inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target; and adjusting the schedule of the current station based on the departure action sequence.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a method of schedule adjustment in the event of a delay provided by the above methods, the method comprising: acquiring time information of each train to be adjusted of a current station and infrastructure information of the current station; inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target; and adjusting the schedule of the current station based on the departure action sequence.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, is implemented to perform the schedule adjustment method in the delay scenario provided above, the method comprising: acquiring time information of each train to be adjusted of a current station and infrastructure information of the current station; inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target; and adjusting the schedule of the current station based on the departure action sequence.
The above-described apparatus embodiments are merely illustrative. The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions may be embodied as a software product that can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and that includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A schedule adjustment method in the event of a delay, comprising:
acquiring time information of each train to be adjusted at a current station and infrastructure information of the current station;
inputting the time information of each train to be adjusted at the current station and the infrastructure information into a departure action planning model to obtain a departure action sequence output by the departure action planning model, wherein the departure action planning model is obtained through reinforcement learning with the shortest total delay time at each station as the optimization objective;
and adjusting the schedule of the current station based on the departure action sequence.
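For illustration only, the following Python sketch shows how the flow of claim 1 could be wired together at inference time. Everything here, including the TrainInfo and StationInfrastructure containers, the planner object and its plan() method, and the headway-based rewrite of the timetable, is a hypothetical stand-in rather than the patented implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrainInfo:
    train_id: str
    planned_arrival: float    # minutes from a common reference time
    planned_departure: float
    actual_arrival: float

@dataclass
class StationInfrastructure:
    num_tracks: int           # arrival-departure tracks at the station
    min_headway: float        # minimum interval between departures, minutes

def adjust_schedule(trains: List[TrainInfo],
                    infra: StationInfrastructure,
                    planner) -> List[Tuple[str, float]]:
    """Claim-1 flow: ask the trained departure action planning model for a
    departure order, then rewrite the station timetable to respect it."""
    # `planner` is assumed to expose plan(trains, infra), returning an ordered
    # list of train_ids (the departure action sequence of the claim).
    departure_sequence = planner.plan(trains, infra)
    by_id = {t.train_id: t for t in trains}
    schedule = []
    last_departure = None
    for train_id in departure_sequence:
        t = by_id[train_id]
        earliest = max(t.actual_arrival, t.planned_departure)
        if last_departure is not None:
            earliest = max(earliest, last_departure + infra.min_headway)
        schedule.append((train_id, earliest))   # new departure time at this station
        last_departure = earliest
    return schedule
```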
2. The schedule adjustment method in the event of a delay according to claim 1, wherein the departure action planning model comprises an operation environment model and a policy network model;
the operation environment model is used for updating the current train state to be adjusted based on the current departure action, and the policy network model is used for determining the next departure action based on the action space range and the current train state to be adjusted;
wherein the initial train state to be adjusted is determined based on the time information of each train to be adjusted at the current station, and the action space range is determined based on the infrastructure information.
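A minimal sketch of the two-part structure in claim 2 follows, with the environment model and the policy network reduced to toy stand-ins: the state is built from the trains' time information, the action space range is derived from the infrastructure information, and a placeholder rule replaces the trained network. The dictionary keys used below are assumptions for illustration only.

```python
class OperationEnvironmentModel:
    """Holds the train state to be adjusted and updates it per departure action."""
    def __init__(self, train_time_info):
        # Initial state derived from the time information of the trains to be adjusted.
        self.state = sorted(train_time_info, key=lambda t: t["actual_arrival"])

    def step(self, train_id):
        # Applying a departure action removes the dispatched train from the state.
        self.state = [t for t in self.state if t["train_id"] != train_id]
        return self.state

class PolicyNetworkModel:
    """Chooses the next departure action within the station's action space range."""
    def __init__(self, infrastructure):
        # Action space range derived from the infrastructure information
        # (here simplified to the number of trains considered at once).
        self.window = infrastructure["num_tracks"]

    def next_action(self, state):
        if not state:
            return None
        # Placeholder decision rule standing in for the trained policy network:
        # among the first `window` pending trains, dispatch the one whose
        # planned departure time is earliest.
        candidates = state[: self.window]
        return min(candidates, key=lambda t: t["planned_departure"])["train_id"]
```

With these two pieces, a departure action sequence is produced by alternating next_action and step until the state is empty, which is the loop formalized in claim 4.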
3. The schedule adjustment method in the event of a delay according to claim 2, wherein the departure action planning model is determined based on the following steps:
constructing an initial reinforcement learning model;
inputting the time information of each sample train to be adjusted at the current sample station and the infrastructure information of the current sample station into the initial reinforcement learning model to obtain an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and the action reward of each sample departure action in the estimated departure action sequence, and taking the next sample station as the current sample station, until the current sample station is the last sample station;
and updating the parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station, to obtain the departure action planning model.
4. The schedule adjustment method in the event of a delay according to claim 3, wherein the initial reinforcement learning model comprises an initial operation environment model and an initial policy network model;
the inputting the time information and the infrastructure information of each sample train to be adjusted at the current sample station into the initial reinforcement learning model to obtain the estimated departure action sequence of the current sample station output by the initial reinforcement learning model and the action reward of each sample departure action in the estimated departure action sequence comprises:
inputting a current sample departure action into the initial operation environment model to obtain a current sample train state to be adjusted output by the initial operation environment model and the action reward of the current sample departure action;
and inputting the current sample train state to be adjusted into the initial policy network model to obtain a next sample departure action output by the initial policy network model based on the action space range of the current sample station and the current sample train state to be adjusted, and taking the next sample departure action as the current sample departure action, until the current sample train state to be adjusted is empty;
wherein the initial sample train state to be adjusted is determined by the initial reinforcement learning model based on the time information of each sample train to be adjusted at the current sample station, and the action space range of the current sample station is determined by the initial reinforcement learning model based on the infrastructure information of the current sample station.
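The nested loops of claims 3 and 4 can be summarized in the sketch below. It assumes an environment object whose step(action) returns the updated sample train state together with the action reward, and a policy object whose act(state) returns a sample departure action; both interfaces are hypothetical, and the real model would also carry the station's action space range.

```python
def rollout_station(env, policy):
    """Claim-4 inner loop for one sample station: alternate environment update and
    policy decision until no sample train remains, collecting action rewards."""
    trace = []                                   # (state, action, reward) for the later update
    state = env.state                            # initial state from the sample time information
    action = policy.act(state)                   # first sample departure action
    while True:
        next_state, reward = env.step(action)    # environment model: new state + action reward
        trace.append((state, action, reward))
        if not next_state:                       # sample train state to be adjusted is empty
            break
        state = next_state
        action = policy.act(state)               # policy network: next sample departure action
    return trace

def rollout_line(station_envs, policy):
    """Claim-3 outer loop: walk the sample stations in order until the last one,
    accumulating the estimated departure action sequences and their rewards."""
    episode = []
    for env in station_envs:                     # each env wraps one sample station
        episode.extend(rollout_station(env, policy))
    return episode
```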
5. The schedule adjustment method in the event of a delay according to claim 4, wherein the action reward of the current sample departure action is determined based on the actual arrival time and the originally planned arrival time, at the next sample station, of the sample train to be adjusted corresponding to the current sample departure action.
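Claim 5 fixes only the two quantities the reward depends on. One plausible instantiation, assuming the delay incurred at the next sample station should be penalized, is the following hypothetical function:

```python
def departure_action_reward(actual_arrival_next, planned_arrival_next):
    """Hypothetical claim-5 reward: the negated delay of the dispatched sample train
    at the next sample station (zero when it arrives on or ahead of schedule)."""
    delay = max(0.0, actual_arrival_next - planned_arrival_next)
    return -delay
```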
6. The schedule adjustment method in the event of a delay according to claim 3, wherein the updating the parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model comprises:
updating the parameters of the initial reinforcement learning model, with the policy gradient direction as the update direction, based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station, to obtain the departure action planning model.
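Claim 6 states only that the parameters are updated along the policy gradient direction. A REINFORCE-style update is one standard way to realize this; the linear softmax policy and the undiscounted returns below are simplifications of my own, not the patent's network.

```python
import numpy as np

class LinearSoftmaxPolicy:
    """Toy policy used only to show a policy-gradient (REINFORCE) parameter update."""
    def __init__(self, feature_dim, lr=0.01, seed=0):
        self.w = np.zeros(feature_dim)
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def probs(self, action_features):
        # action_features: array of shape (num_actions, feature_dim)
        logits = action_features @ self.w
        z = np.exp(logits - logits.max())
        return z / z.sum()

    def act(self, action_features):
        p = self.probs(action_features)
        return int(self.rng.choice(len(p), p=p))

    def update(self, trajectory):
        """trajectory: list of (action_features, chosen_action, reward) tuples.
        Each parameter step follows the policy gradient, weighting the
        log-probability gradient of the chosen action by the return after it."""
        rewards = [r for _, _, r in trajectory]
        for t, (feats, a, _) in enumerate(trajectory):
            ret = sum(rewards[t:])                  # undiscounted return from step t
            p = self.probs(feats)
            grad_log_pi = feats[a] - p @ feats      # gradient of log softmax probability
            self.w += self.lr * ret * grad_log_pi
```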
7. The schedule adjustment method in the event of a delay according to any one of claims 1 to 6, wherein the time information comprises the originally planned arrival time, the originally planned departure time, and the actual arrival time of the corresponding train.
8. A schedule adjustment apparatus for use in the event of a delay, comprising:
an information acquisition unit, configured to acquire time information of each train to be adjusted at a current station and infrastructure information of the current station;
a sequence determination unit, configured to input the time information of each train to be adjusted at the current station and the infrastructure information into a departure action planning model to obtain a departure action sequence output by the departure action planning model, wherein the departure action planning model is obtained through reinforcement learning with the shortest total delay time at each station as the optimization objective;
and a time adjustment unit, configured to adjust the schedule of the current station based on the departure action sequence.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the schedule adjustment method in the event of a delay according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the schedule adjustment method in the event of a delay according to any one of claims 1 to 7.
CN202110904084.1A 2021-08-06 2021-08-06 Method and device for adjusting timetable under delay condition and electronic equipment Active CN113525462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110904084.1A CN113525462B (en) 2021-08-06 2021-08-06 Method and device for adjusting timetable under delay condition and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110904084.1A CN113525462B (en) 2021-08-06 2021-08-06 Method and device for adjusting timetable under delay condition and electronic equipment

Publications (2)

Publication Number Publication Date
CN113525462A true CN113525462A (en) 2021-10-22
CN113525462B CN113525462B (en) 2022-06-28

Family

ID=78122092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110904084.1A Active CN113525462B (en) 2021-08-06 2021-08-06 Method and device for adjusting timetable under delay condition and electronic equipment

Country Status (1)

Country Link
CN (1) CN113525462B (en)

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106842925A * 2017-01-20 2017-06-13 Tsinghua University Locomotive intelligent driving method and system based on deep reinforcement learning
CN107194612A * 2017-06-20 2017-09-22 Tsinghua University Train operation dispatching method and system based on deep reinforcement learning
US20190114321A1 (en) * 2017-10-17 2019-04-18 Royal Bank Of Canada Auto tele-interview solution
US20210166181A1 (en) * 2018-06-27 2021-06-03 Siemens Aktiengesellschaft Equipment management method, device, system and storage medium
WO2020002017A1 (en) * 2018-06-28 2020-01-02 Konux Gmbh Planning of maintenance of railway
WO2020040763A1 (en) * 2018-08-23 2020-02-27 Siemens Aktiengesellschaft Real-time production scheduling with deep reinforcement learning and monte carlo tree search
KR101966641B1 * 2018-10-04 2019-08-19 Industry-Academic Cooperation Foundation, Chungnam National University Rail temperature prediction system considering weather condition and rail temperature prediction method using the same
WO2020086214A1 (en) * 2018-10-26 2020-04-30 Dow Global Technologies Llc Deep reinforcement learning for production scheduling
CN109740839A * 2018-11-23 2019-05-10 Beijing Jiaotong University Train dynamic adjustment method and system under an emergency event
CN110803204A * 2019-11-13 2020-02-18 Northeastern University Online control system and method for maintaining running stability of high-speed train
CN111768074A * 2020-05-22 2020-10-13 Beijing Jiaotong University Novel train operation intelligent adjustment method
CN111369181A * 2020-06-01 2020-07-03 CRSC Research & Design Institute Group Co., Ltd. Train autonomous scheduling deep reinforcement learning method and module
CN111376954A * 2020-06-01 2020-07-07 CRSC Research & Design Institute Group Co., Ltd. Train autonomous scheduling method and system
CN111932039A * 2020-09-29 2020-11-13 Beijing Jiaotong University Train arrival late prediction method and device, electronic equipment and storage medium
CN112319557A * 2020-10-27 2021-02-05 Beijing Jiaotong University Operation adjusting method and system for subway train under late condition
CN112389509A * 2020-11-16 2021-02-23 Beijing Jiaotong University Auxiliary adjusting method and system for high-speed train timetable
CN112800565A * 2021-01-12 2021-05-14 Beijing Jiaotong University Method for predicting delay spread of high-speed railway network train
CN112977553A * 2021-03-05 2021-06-18 Beijing Jiaotong University Automatic train operation adjusting method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
D. ŠEMROV; R. MARSETIČ; M. ŽURA; L. TODOROVSKI; A. SRDIC: "Reinforcement learning approach for train rescheduling on a single-track railway", Transportation Research Part B: Methodological *
LINGBIN NING; YIDONG LI; MIN ZHOU; HAIFENG SONG; HAIRONG DONG: "A Deep Reinforcement Learning Approach to High-speed Train Timetable Rescheduling under Disturbances", 2019 IEEE Intelligent Transportation Systems Conference (ITSC) *
YIWEI GUO: "A Reinforcement Learning Approach to Train Timetabling for Inter-City High Speed Railway Lines", 2020 IEEE 5th International Conference on Intelligent Transportation Engineering (ICITE) *
YU SHENGPING: "Dynamic dispatching method for high-speed trains based on policy gradient reinforcement learning", Control and Decision *
DUAN YANJIE; LYU YISHENG; ET AL.: "Research status and prospects of deep learning in the control field", Acta Automatica Sinica *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115230777A * 2022-06-21 2022-10-25 Institute of Automation, Chinese Academy of Sciences Scheduling policy adjustment method and device, electronic equipment and storage medium
CN115230777B (en) * 2022-06-21 2024-01-16 Institute of Automation, Chinese Academy of Sciences Scheduling policy adjustment method and device, electronic equipment and storage medium
CN115688971A (en) * 2022-09-23 2023-02-03 Beijing Jiaotong University Wire network passenger flow control and train adjustment collaborative optimization method under train delay

Also Published As

Publication number Publication date
CN113525462B (en) 2022-06-28

Similar Documents

Publication Publication Date Title
CN113525462B (en) Method and device for adjusting timetable under delay condition and electronic equipment
CN109740839B (en) Train dynamic adjustment method and system under emergency
Dong et al. Integrated optimization of train stop planning and timetabling for commuter railways with an extended adaptive large neighborhood search metaheuristic approach
Wang et al. Efficient real-time train scheduling for urban rail transit systems using iterative convex programming
Nesheli et al. Optimal combinations of selected tactics for public-transport transfer synchronization
CN110766249B (en) Vehicle scheduling method and device, computer equipment and storage medium
CN111859666B (en) Train group generation method, device and system, storage medium and electronic equipment
CN111783357A (en) Transfer route optimization method and system based on passenger delay reduction
CN112308332A (en) Rail transit parallel deduction system and method
CN115796509A (en) Rail transit emergency scheduling aid decision-making system and method
CN113536692B (en) Intelligent dispatching method and system for high-speed rail train under uncertain environment
CN110889596B (en) Construction plan scheduling method and system for railway engineering line
Zhang et al. Optimized skip-stop metro line operation using smart card data
CN113844507A (en) Train simulation operation system construction method based on digital twins
CN110920684B (en) Method and device for determining train position information, electronic equipment and storage medium
Wang et al. Origin-destination dependent train scheduling problem with stop-skipping for urban rail transit systems
CN115049162B (en) Hybrid coding based high-speed rail station arrival and departure line application adjustment method at late train
CN116719787A (en) Method and device for uploading equipment logs in track system, medium and electronic equipment
CN115230777B (en) Scheduling policy adjustment method and device, electronic equipment and storage medium
CN110803203B (en) Method and system for predicting evolution of high-speed railway running track
CN114792133A (en) Deep reinforcement learning method and device based on multi-agent cooperation system
CN113343422A (en) Rail transit operation simulation method and system
CN112800565A (en) Method for predicting delay spread of high-speed railway network train
Zhao et al. Dynamic Bus Holding Control Using Spatial-Temporal Data–A Deep Reinforcement Learning Approach
Masoud et al. A new approach to automatically producing schedules for cane railways

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant