CN113525462A - Timetable adjusting method and device under delay condition and electronic equipment

Info

Publication number: CN113525462A
Application number: CN202110904084.1A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN113525462B (granted publication)
Inventors: 吕宜生, 王银, 袁志明, 王晓, 王荣笙, 董海荣, 王飞跃
Assignees: Institute of Automation of Chinese Academy of Science; Beijing Jiaotong University; Signal and Communication Research Institute of CARS
Application filed by Institute of Automation of Chinese Academy of Science, Beijing Jiaotong University, and Signal and Communication Research Institute of CARS
Priority to CN202110904084.1A
Publication of CN113525462A; application granted and published as CN113525462B
Legal status: Active (granted)

Landscapes

  • Train Traffic Observation, Control, And Security (AREA)

Abstract

The invention provides a method and a device for adjusting a schedule under a delay condition, and electronic equipment. The method comprises: acquiring time information of each train to be adjusted at a current station and infrastructure information of the current station; inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model, wherein the departure action planning model is obtained through reinforcement learning with the shortest total delay time of all stations as the target; and adjusting the schedule of the current station based on the departure action sequence. The method reduces disordered train operation and large-area arrival delays under emergencies, shortens the total delay time of all trains at all stations, and improves the adjustment of the train schedule under complex conditions.

Description

Timetable adjusting method and device under delay condition and electronic equipment
Technical Field
The invention relates to the field of high-speed rail transportation scheduling, in particular to a method and a device for adjusting a schedule under a delay condition and electronic equipment.
Background
With the continuous development of the transportation system, high-speed railways occupy an increasingly prominent position in China's comprehensive transportation system. In high-speed rail transportation, trains often deviate from the predetermined operation diagram due to emergencies such as communication interruptions, bad weather, and human factors. A well-designed train schedule adjustment strategy can not only avoid train collisions caused by conflicts, but also maximize the operation efficiency of the whole high-speed rail network. Therefore, a method for adjusting high-speed trains under emergencies is of great significance.
At present, high-speed train dispatching methods fall into three main categories: simulation methods, operations research methods, and heuristic swarm intelligence methods. Simulation methods depend heavily on simulating the real environment, require a model operation platform to be built, and have low optimization efficiency; operations research methods lack real-time performance and adaptability and cannot meet actual operation and adjustment requirements; heuristic swarm intelligence methods have strong global search capability but easily fall into local optima in complex scenarios, are computationally demanding, and have low optimization efficiency.
The prior art is therefore difficult to adapt to the dynamic, complex, and rapidly changing conditions of high-speed railways; its adjustment effect is poor, and it cannot effectively improve the operation efficiency of the traffic system.
Disclosure of Invention
The invention provides a method and a device for adjusting a schedule under a delay condition, and electronic equipment, which are used to solve the problem in the prior art that the train schedule is poorly adjusted under a delay condition, and which reduce the total delay time of all trains at all stations under emergencies and improve the overall adjustment of the train schedule under complex conditions.
The invention provides a method for adjusting a timetable under a delay condition, which comprises the following steps:
acquiring time information of each train to be adjusted of a current station and infrastructure information of the current station;
inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target;
and adjusting the schedule of the current station based on the departure action sequence.
According to the schedule adjusting method under the delay condition, the departure action planning model comprises an operating environment model and a strategy network model;
the operation environment model is used for updating the current train state to be adjusted based on the current train dispatching action, and the strategy network model is used for determining the next train dispatching action based on the action space range and the current train state to be adjusted;
wherein the initial train state to be adjusted is determined based on time information of each train to be adjusted at the current station, and the action space range is determined based on the infrastructure information.
According to the schedule adjusting method under the condition of delay, the departure action planning model is determined based on the following steps:
constructing an initial reinforcement learning model;
inputting time information and infrastructure information of each sample train to be adjusted of the current sample station into the initial reinforcement learning model to obtain an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station;
and updating parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
According to the schedule adjusting method under the delay condition, the initial reinforcement learning model comprises an initial operation environment model and an initial strategy network model;
the method comprises the steps of inputting time information and infrastructure information of each sample train to be adjusted of a current sample station into an initial reinforcement learning model, obtaining an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and comprises the following steps:
inputting a current sample departure action to the initial operation environment model to obtain a current train state to be adjusted output by the initial operation environment model and an action reward of the current sample departure action;
inputting the current train state of the sample to be adjusted into the initial strategy network model to obtain a next sample departure action output by the initial strategy network model based on the action space range of the current sample station and the current train state of the sample to be adjusted, and updating the next sample departure action into the current sample departure action until the current train state of the sample to be adjusted is empty;
the initial sample train state to be adjusted is determined by the initial reinforcement learning model based on the time information of each sample train to be adjusted of the current sample station, and the action space range of the current sample station is determined by the initial reinforcement learning model based on the infrastructure information of the current sample station.
According to the schedule adjusting method under the condition of delay, the action reward of the current sample departure action is determined based on the actual arrival time and the originally planned arrival time of the sample train to be adjusted at the next sample station, which correspond to the current sample departure action.
According to the schedule adjusting method under the delay condition, the parameter updating is carried out on the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model, and the method comprises the following steps:
and updating parameters of the initial reinforcement learning model by taking the strategy gradient direction as an updating direction based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
According to the time schedule adjusting method under the delay condition, the time information comprises the original planned arrival time, the original planned departure time and the actual arrival time of the corresponding train.
The invention also provides a schedule adjusting device under the delay condition, which comprises:
the system comprises an information acquisition unit, a data processing unit and a data processing unit, wherein the information acquisition unit is used for acquiring the time information of each train to be adjusted of a current station and the infrastructure information of the current station;
the sequence determining unit is used for inputting the time information and the infrastructure information of each train to be adjusted of the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target;
and the time adjusting unit is used for adjusting the time table of the current station based on the departure action sequence.
The invention further provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the schedule adjusting method under any delay condition.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of schedule adjustment in the event of a delay as described in any of the above.
According to the method, the device and the electronic equipment for adjusting the timetable under the delay condition, the departure action sequence is obtained through the departure action planning model, the timetable of the train to be adjusted is correspondingly adjusted according to the departure action sequence, the conditions of train operation disorder and large-area delay to the station under the emergency condition are reduced, and the total delay time of all stations of all trains is shortened; the parameters of the departure action planning model can be correspondingly adjusted according to actual requirements, so that the schedule adjustment strategy of the train is converged to an expected strategy, and the overall improvement of the train schedule adjustment effect under complex conditions is realized.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present invention, and that other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is a schematic flow chart diagram of a method for adjusting a schedule in the event of a delay according to the present invention;
FIG. 2 is a second schematic flow chart of a method for adjusting a schedule under a delay condition according to the present invention;
FIG. 3 is a schematic diagram of a schedule adjustment apparatus for use in a delay condition in accordance with the present invention;
fig. 4 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for adjusting a schedule under a delay condition according to the present invention, as shown in fig. 1, the method includes:
step 110, obtaining time information of each train to be adjusted of a current station and infrastructure information of the current station;
Here, a train to be adjusted at the current station is a train whose schedule needs to be adjusted when it passes through the current station; the trains to be adjusted include trains that are delayed and trains that are not delayed but may need to be adjusted in coordination with the delayed trains. The time information of each train to be adjusted may include the originally planned arrival time and originally planned departure time of the train at the current station on the train schedule, may also include the actual arrival time of the train at the current station, and may further include the delay time of the train at the current station calculated from its originally planned arrival time, actual arrival time, and originally planned departure time at the current station; this is not specifically limited in the embodiment of the present invention.
The infrastructure information of the current station may be the number of adjustment tracks at the current station, i.e., the number of tracks that can be used for adjusting train tracks in the event of a delay.
After acquiring the time information of each train to be adjusted at the current station and the infrastructure information of the current station, step 120 is performed.
Step 120, inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained through reinforcement learning with the shortest total delay time of all stations as the target.
Specifically, the departure action planning model needs to be trained in advance, before step 120 is performed. When the departure action planning model is trained, the delay times of the trains to be adjusted at the current station can be summed to obtain the total delay time of the current station; the total delay times of all stations are obtained in the same way and then summed to obtain the total delay time of all trains to be adjusted at all stations. Reinforcement learning is performed on the initial reinforcement learning model with the shortest total delay time of all trains to be adjusted at all stations as the goal, so as to obtain the departure action planning model.
Reinforcement learning here means obtaining an estimated departure action sequence and the action reward of each sample departure action through the initial reinforcement learning model, determining the current state transition information according to the current sample departure action, and determining the next sample departure action according to the current state transition information; the parameters of the reinforcement learning model are then updated according to the action rewards of the sample departure actions. The state transition information is used to transition the current sample train state to be adjusted according to the time information of each sample train to be adjusted at the current sample station and the current sample departure action. The action reward of a sample departure action characterizes the effect of that departure action on the delays of all sample trains to be adjusted at the current sample station.
For example, the current state transition information of the jth sample station may indicate that, if the current state contains N sample trains to be adjusted and a certain sample train is selected to depart at step t, the next state contains N-1 sample trains to be adjusted; the state transition is realized in this way. The sample train to depart at step t+1 is then determined from the remaining N-1 trains, giving a next state of N-2 trains, and this step is repeated until the number of sample trains to be adjusted at the jth sample station is zero. The parameters of the initial reinforcement learning model are updated according to the action reward of each sample departure action of the jth sample station.
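As a minimal sketch of this state transition (the names and data structures below are illustrative assumptions, not the patent's own interfaces), selecting a train to depart simply removes it from the set of trains still to be adjusted:

```python
def transition(state: set, departed_train) -> set:
    """state: the set of sample trains still to be adjusted at the sample station."""
    next_state = set(state)             # copy so the previous state is left untouched
    next_state.discard(departed_train)  # N trains -> N-1 trains after one departure
    return next_state
```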
In step 120, the acquired time information of each train to be adjusted at the current station and the infrastructure information of the current station are input into the departure action planning model, and the departure action planning model outputs a departure action sequence of the current station according to the input information. The departure action sequence here is the set of departure actions of all trains to be adjusted at the current station.
After the departure action sequence output by the departure action planning model is obtained, step 130 is performed.
Step 130, adjusting the schedule of the current station based on the departure action sequence.
Specifically, the departure action of each train to be adjusted at the current station is adjusted according to the departure action sequence of the current station output by the departure action planning model obtained in the previous step, that is, the schedule of each train to be adjusted at the current station is adjusted accordingly. A departure action here refers to the action of dispatching the currently selected train from the station.
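The flow of steps 110 to 130 can be sketched as follows. This is only an illustrative outline: the class, field, and method names (TrainInfo, plan_departures, and so on) are assumptions rather than interfaces defined by the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainInfo:
    train_id: str
    planned_arrival: float    # originally planned arrival time at the current station
    planned_departure: float  # originally planned departure time at the current station
    actual_arrival: float     # actual arrival time at the current station

def adjust_station_timetable(trains: List[TrainInfo], num_adjust_tracks: int, model) -> List[str]:
    """Step 110: `trains` carries the time information and `num_adjust_tracks`
    the infrastructure information. Step 120: the trained departure action
    planning model returns a departure action sequence (here, indices into
    `trains`). Step 130: the sequence gives the order in which the station's
    timetable is rewritten."""
    departure_sequence = model.plan_departures(trains, num_adjust_tracks)
    return [trains[i].train_id for i in departure_sequence]
```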
According to the method for adjusting the timetable under the delay condition, the departure action sequence is obtained through the departure action planning model, the timetable of the train to be adjusted is correspondingly adjusted according to the departure action sequence, the conditions of train operation disorder and large-area delay to the station under the emergency condition are reduced, and the total delay time of each station of all trains is shortened; the parameters of the departure action planning model can be correspondingly adjusted according to actual requirements, so that the schedule adjustment strategy of the train is converged to an expected strategy, and the overall improvement of the train schedule adjustment effect under complex conditions is realized.
Based on the embodiment, the departure action planning model comprises an operation environment model and a strategy network model;
the operation environment model is used for updating the current train state to be adjusted based on the current train dispatching action, and the strategy network model is used for determining the next train dispatching action based on the action space range and the current train state to be adjusted;
wherein the initial train state to be adjusted is determined based on time information of each train to be adjusted at the current station, and the action space range is determined based on the infrastructure information.
The action space range here is the range of values of the action space that the policy network model can select from, determined according to the infrastructure information of the current station:

$$A_j \in [0,\, c_j - 1]$$

where $A_j$ is the action space range and $c_j$ is the number of adjustment tracks at the jth station.
It should be noted that infrastructure information of different stations is different, that is, the number of the adjustment tracks of different stations is different. In different stations, the value ranges of the action space which can be selected by the strategy network model determined according to the infrastructure information of the stations are different.
Specifically, in the process of planning departure actions, the operation environment model first determines the initial train state to be adjusted according to the time information of each train to be adjusted at the current station, and the policy network model determines the action space range of the current station according to the infrastructure information of the current station.
After that, the operation environment model updates the current train state to be adjusted according to the current departure action, and the policy network model determines the next departure action according to the action space range of the current station and the current train state to be adjusted. Concretely, the policy network model determines the current departure action according to the action space range and the initial train state to be adjusted, and feeds the current departure action back to the operation environment model; the operation environment model updates the initial train state to be adjusted according to the current departure action and takes the updated state as the current train state to be adjusted; the policy network model then determines the next departure action according to the current train state to be adjusted and the action space range of the current station and feeds it back to the operation environment model. This loop continues until the current train state to be adjusted is empty.
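A minimal sketch of this loop between the operation environment model and the policy network model is given below; the object interfaces (reset, step, select_action) are assumptions introduced only for illustration.

```python
def plan_departure_sequence(env, policy, trains, num_adjust_tracks):
    """env: operation environment model; policy: policy network model."""
    state = env.reset(trains)                    # initial train state from the time information
    action_space = range(num_adjust_tracks)      # action space range from the infrastructure information
    sequence = []
    while state:                                 # loop until the train state to be adjusted is empty
        action = policy.select_action(state, action_space)  # policy picks the next departure action
        sequence.append(action)
        state = env.step(action)                 # environment updates the train state to be adjusted
    return sequence
```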
Based on the above embodiment, the policy network model determines the next departure action based on the action space range and the current train state to be adjusted, which may be represented as:

$$\mu_\theta\left(a_t^j \mid s_t^j\right) = \frac{\exp\left(x_{a_t^j}\right)}{\sum_{a} \exp\left(x_a\right)}$$

where $s_t^j$ denotes the state of the trains to be adjusted at the jth station when the policy network model of the jth station decides the tth departing train; $a_t^j$ denotes the action taken when the policy network model of the jth station decides the tth departing train; $\mu_\theta(a_t^j \mid s_t^j)$ denotes the probability that the policy network model selects action $a_t^j$ in state $s_t^j$; $\mu_\theta$ denotes the policy network model and $\theta$ denotes its parameters; $x_{a_t^j}$ denotes the network output corresponding to the train to be adjusted selected by action $a_t^j$; $\exp$ denotes the exponential function with base $e$; and the sum in the denominator runs over the available actions.
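The sketch below shows one way the action-selection probability above could be computed, assuming the policy network emits one score x per available action; the sampling step and the exact normalization are assumptions consistent with, but not spelled out by, the formula.

```python
import numpy as np

def select_action(scores: np.ndarray) -> int:
    """scores[a]: policy network output x_a for each available action a in the
    current state; returns the index of the chosen departure action."""
    exp_scores = np.exp(scores - scores.max())   # subtract the max for numerical stability
    probs = exp_scores / exp_scores.sum()        # softmax probabilities mu_theta(a | s)
    return int(np.random.choice(len(scores), p=probs))
```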
Based on the above embodiment, the departure action planning model is determined based on the following steps:
constructing an initial reinforcement learning model;
inputting time information and infrastructure information of each sample train to be adjusted of the current sample station into the initial reinforcement learning model to obtain an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station;
and updating parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
Specifically, before the initial reinforcement learning model is constructed, the time information of each sample train to be adjusted at the current sample station and the infrastructure information of the current sample station are also acquired, and the initial reinforcement learning model is constructed according to the acquired information.
The time information and the infrastructure information of each sample train to be adjusted of the current sample station are input into an initial reinforcement learning model, and the initial reinforcement learning model correspondingly outputs an estimated departure action sequence of the current sample station and an action reward of each sample departure action in the estimated departure action sequence according to the input information. The estimated departure action sequence is a set of estimated sample departure actions of each sample train to be adjusted at the current sample station. The action reward of the sample departure action is used for representing the effect of the sample departure action of the sample train to be adjusted at the current sample station on the delay of each sample train to be adjusted at the current sample station.
After the estimated departure action sequence of the current sample station output by the initial reinforcement learning model and the action reward of each sample departure action in the estimated departure action sequence are obtained, determining the next sample station of the current sample station according to the timetable of the sample train to be adjusted, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station.
Specifically, the time information of each sample train to be adjusted at the updated current sample station and the infrastructure information of the updated current sample station are acquired and input into the initial reinforcement learning model, so as to obtain the estimated departure action sequence of the updated current sample station output by the initial reinforcement learning model and the action reward of each sample departure action in that sequence; the next sample station of the updated current sample station is then determined from the timetable of the sample trains to be adjusted and is in turn updated to be the current sample station. This process is repeated until the current sample station is the last sample station on the timetable of the sample trains to be adjusted.
Further, the parameters of the initial reinforcement learning model are updated according to the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station, and the updated initial reinforcement learning model is taken as the departure action planning model. Specifically, the action rewards of the sample departure actions in the estimated departure action sequence of the current sample station output by the initial reinforcement learning model are summed to obtain the reward of the estimated departure action sequence of the current sample station; the rewards of the estimated departure action sequences of all sample stations are then summed to obtain the total reward, and the parameters of the initial reinforcement learning model are updated according to the total reward. The updated initial reinforcement learning model is the departure action planning model.
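The overall training procedure can be sketched as follows; the station objects and the rollout and update_parameters calls are hypothetical names used only to illustrate the order of operations.

```python
def train_departure_planner(initial_model, sample_stations, num_epochs=100):
    """sample_stations are ordered from the front-most to the rear-most sample station."""
    for _ in range(num_epochs):
        trajectories = []
        for station in sample_stations:
            # Roll out an estimated departure action sequence and its action rewards.
            traj = initial_model.rollout(station.trains, station.num_adjust_tracks)
            trajectories.append(traj)
        # Update the parameters from the action rewards collected at every sample station.
        initial_model.update_parameters(trajectories)
    return initial_model  # the trained departure action planning model
```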
Based on the above embodiment, the method for constructing the initial reinforcement learning model further includes:
the method comprises the steps of obtaining each sample train to be adjusted of a current sample station, abstracting each sample train to be adjusted in each sample train to be adjusted of the current sample station into a multi-element group, wherein the multi-element group comprises information of original planned arrival time, original planned departure time, actual arrival time and the like of the sample train to be adjusted at the current sample station. The tuple can be represented as:
Figure BDA0003200995440000111
wherein the content of the first and second substances,
Figure BDA0003200995440000112
the ith train of samples to be adjusted representing the jth sample station,
Figure BDA0003200995440000113
the original planned arrival time of the ith sample train to be adjusted at the jth sample station is shown,
Figure BDA0003200995440000114
represents the originally planned departure time of the ith sample train to be adjusted at the jth sample station, AijAnd the actual arrival time of the ith sample train to be adjusted at the jth sample station is shown.
The tuples of all sample trains to be adjusted are combined to obtain the data set of the sample trains to be adjusted at the current sample station, which is represented as:

$$X_j = \left\{x_1^j, x_2^j, \ldots, x_n^j\right\}$$

where $X_j$ denotes the data set of the trains to be adjusted at the jth sample station and $n$ is the number of trains to be adjusted at the jth sample station.
Because the acquired data set of the trains to be adjusted at the current sample station may be in an irregular format, the data set needs to be preprocessed; the time information of each sample train to be adjusted at the current sample station is obtained after preprocessing. The specific preprocessing is to normalize the obtained data set on the basis of the 24 hours of a day, as shown in the following formula:

$$\tilde{x}_i^j = \frac{x_i^j}{T_{\text{day}}}$$

where $T_{\text{day}}$ denotes the length of one day (24 hours) expressed in the same time unit as the data, so that every time value in the tuple is normalized to the interval [0, 1].
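A sketch of this preprocessing step is shown below; it assumes the times are expressed in minutes and divides them by the 1440 minutes of a day, which is one plausible reading of the normalization above.

```python
def normalize_times(dataset):
    """dataset: list of (planned_arrival, planned_departure, actual_arrival) tuples,
    with every time expressed in minutes since midnight (an assumption)."""
    minutes_per_day = 24 * 60
    return [tuple(t / minutes_per_day for t in record) for record in dataset]
```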
based on the above embodiment, the initial reinforcement learning model includes an initial operating environment model and an initial policy network model;
the method comprises the steps of inputting time information and infrastructure information of each sample train to be adjusted of a current sample station into an initial reinforcement learning model, obtaining an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and comprises the following steps:
inputting a current sample departure action to the initial operation environment model to obtain a current train state to be adjusted output by the initial operation environment model and an action reward of the current sample departure action;
inputting the current train state of the sample to be adjusted into the initial strategy network model to obtain a next sample departure action output by the initial strategy network model based on the action space range of the current sample station and the current train state of the sample to be adjusted, and updating the next sample departure action into the current sample departure action until the current train state of the sample to be adjusted is empty;
the initial sample train state to be adjusted is determined by the initial reinforcement learning model based on the time information of each sample train to be adjusted of the current sample station, and the action space range of the current sample station is determined by the initial reinforcement learning model based on the infrastructure information of the current sample station.
The action space range of the current sample station is a value range of an action space that can be selected by the initial policy network model determined according to the infrastructure information of the current sample station. Infrastructure information of different sample stations is different, namely the number of the adjusting tracks of different sample stations is different. In different sample stations, the value ranges of the action space which can be selected by the initial strategy network model determined according to the infrastructure information of the sample stations are different.
Specifically, in the training process of the initial reinforcement learning model, the initial reinforcement learning model may determine an initial sample train state to be adjusted according to time information of each sample train to be adjusted at the current sample station, and may also determine an action space range of the current sample station according to infrastructure information of the current sample station.
After that, the current sample departure action is input into the initial operation environment model, which outputs the current sample train state to be adjusted and the action reward of the current sample departure action according to the input action.
Further, the current sample train state to be adjusted output by the initial operation environment model is input into the initial strategy network model, which determines and outputs the next sample departure action of the current sample station according to the action space range of the current sample station and the input state.
After the next sample departure action of the current sample station output by the initial strategy network model is obtained, it is updated to be the current sample departure action, and the process continues until the current sample train state to be adjusted is empty.
Concretely, the initial strategy network model outputs the current sample departure action according to the initial sample train state to be adjusted and the action space range of the sample station, and feeds it back to the initial operation environment model; the initial operation environment model updates the initial sample train state to be adjusted according to the current sample departure action, takes the updated state as the current sample train state to be adjusted, and outputs this state together with the action reward of the current sample departure action. The initial strategy network model then outputs the next sample departure action according to the current sample train state to be adjusted and the action space range of the current sample station and feeds it back to the initial operation environment model. This loop continues until the current sample train state to be adjusted is empty.
Based on the above embodiment, the sample train state to be adjusted can be represented as:

$$s_t^j = \left\{x_i^j \mid \text{sample train } i \text{ has not yet departed from station } j\right\}$$

where $s_t^j$ denotes the state of the sample trains to be adjusted at the jth station when the initial strategy network model of the jth station decides the tth departing train, i.e., the set of tuples of the sample trains that still remain to be dispatched.
Based on the above embodiment, the action reward of the current sample departure action is determined based on the actual arrival time and the originally planned arrival time of the sample train to be adjusted at the next sample station corresponding to the current sample departure action.
The originally planned arrival time at the next sample station is the arrival time at the next sample station indicated on the timetable of the sample train to be adjusted; the actual arrival time at the next sample station is the time at which the sample train to be adjusted actually arrives at the next sample station.
Specifically, the initial operation environment model determines the action reward of the current sample departure action according to the actual arrival time and the originally planned arrival time of the corresponding sample train to be adjusted at the next sample station. The action reward of the sample departure action may be expressed as:

$$r_t^j = -\left(A_{i,j+1} - P_{i,j+1}\right)$$

where $r_t^j$ denotes the action reward of the sample departure action taken at step t at the jth sample station, $A_{i,j+1}$ denotes the actual arrival time of the ith sample train to be adjusted at the (j+1)th station, and $P_{i,j+1}$ denotes the originally planned arrival time of the ith sample train to be adjusted at the (j+1)th station.
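A one-line sketch of this reward is given below; the sign convention (penalizing delay so that shorter delays yield larger rewards) is an assumption consistent with the goal of minimizing the total delay time.

```python
def action_reward(actual_arrival_next: float, planned_arrival_next: float) -> float:
    # Delay of the dispatched train at the next sample station, negated as a penalty.
    return -(actual_arrival_next - planned_arrival_next)
```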
Based on the above embodiment, updating the parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model includes:
and updating parameters of the initial reinforcement learning model by taking the strategy gradient direction as an updating direction based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
Specifically, the parameters of the initial reinforcement learning model are updated with the policy gradient direction as the update direction, so as to obtain the departure action planning model. That is, the parameters of the initial policy network model in the initial reinforcement learning model are updated along the policy gradient direction, which can be specifically expressed as:
$$\theta_{\text{new}} = \theta_{\text{old}} + \alpha \,\nabla_\theta \log \mu_\theta\left(a_t^j \mid s_t^j\right) G_t^j$$

$$G_t^j = \sum_{k=t}^{T} \gamma^{\,k-t}\, r_k^j$$

where $\theta_{\text{new}}$ denotes the updated parameters of the initial policy network model; $\theta_{\text{old}}$ denotes the parameters of the initial policy network model before the update; $\alpha$ is the learning rate of the initial policy network model update; $s_t^j$ denotes the state of the sample trains to be adjusted at the jth station when the initial policy network model of the jth station decides the tth departing sample train; $a_t^j$ denotes the action taken when the initial policy network model of the jth station decides the tth departing sample train; $\mu_\theta(a_t^j \mid s_t^j)$ denotes the probability that the initial policy network model selects action $a_t^j$ in state $s_t^j$; $\nabla_\theta \log \mu_\theta(a_t^j \mid s_t^j)$ is the policy gradient direction; $G_t^j$ denotes the accumulated reward of all actions from the tth position to the last position; $\gamma$ is the attenuation coefficient of the reward function; $r_k^j$ denotes the action reward of the action taken when the jth station decides the kth departing sample train; and $\gamma^{k-t}$ is the decay factor applied in the cumulative reward of all actions from the tth position to the last position. It should be noted that, in the embodiment of the present invention, the learning rate of the initial policy network model update preferably takes values in the range [0.1, 0.3].
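A compact sketch of this policy-gradient update is shown below, written with PyTorch as an assumed implementation framework; the patent itself does not prescribe a particular library.

```python
import torch

def update_policy(optimizer, log_probs, rewards, gamma=0.99):
    """log_probs[t]: log mu_theta(a_t | s_t) for step t of one rollout (tensors
    carrying gradients); rewards[t]: action reward r_t; gamma: attenuation coefficient."""
    returns, g = [], 0.0
    for r in reversed(rewards):          # discounted cumulative reward G_t from each step onward
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    loss = -(torch.stack(log_probs) * returns).sum()  # gradient ascent on the policy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```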
Based on the above embodiment, the time information includes the original planned arrival time, the original planned departure time, and the actual arrival time of the corresponding train.
The originally planned arrival time of a train is the arrival time at a given station indicated on the train's schedule; the originally planned departure time is the departure time from that station indicated on the train's schedule; and the actual arrival time is the time at which the train actually arrives at the station.
Fig. 2 is a second schematic flow chart of the method for adjusting a schedule under a delay condition according to the present invention, as shown in fig. 2, the method includes:
an application process of the departure action planning model and a training process of the initial reinforcement learning model; the training process of the initial reinforcement learning model comprises the following steps:
and step 210, constructing an initial reinforcement learning model.
Step 220, inputting the time information and the infrastructure information of each sample train to be adjusted of the current sample station into the initial reinforcement learning model, obtaining the estimated departure action sequence of the current sample station output by the initial reinforcement learning model, and the action reward of each sample departure action in the estimated departure action sequence, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station.
And step 230, updating parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain a departure action planning model.
The application process of the departure action planning model comprises the following steps:
and 240, acquiring time information of each train to be adjusted at the current station and infrastructure information of the current station.
Step 250, inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained through reinforcement learning with the shortest total delay time of all stations as the target.
And step 260, adjusting the schedule of the current station based on the departure action sequence.
According to the method for adjusting the timetable under the delay condition provided by the invention, the initial reinforcement learning model is trained to obtain the departure action planning model, the departure action sequence is obtained through the departure action planning model, and the timetable of the trains to be adjusted is adjusted accordingly. By applying reinforcement learning to train timetable adjustment and learning an optimal or near-optimal adjustment strategy through interaction with the environment, the method can adapt to traffic requirements under different delays and improve the operation efficiency of the whole high-speed rail network.
The schedule adjusting apparatus under the delay condition provided by the present invention is described below, and the schedule adjusting apparatus under the delay condition described below and the schedule adjusting method under the delay condition described above can be referred to in correspondence.
Fig. 3 is a schematic structural diagram of a schedule adjusting apparatus in a delay situation according to the present invention. As shown in fig. 3, the apparatus includes:
an information obtaining unit 310, configured to obtain time information of each train to be adjusted at a current station and infrastructure information of the current station;
a sequence determining unit 320, configured to input time information and infrastructure information of each train to be adjusted at the current station to a departure action planning model, so as to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target;
a time adjusting unit 330, configured to adjust a time table of the current station based on the departure action sequence.
According to the time schedule adjusting device under the delay condition, the departure action sequence is obtained through the departure action planning model, the time schedule of the train to be adjusted is correspondingly adjusted according to the departure action sequence, the conditions of train operation disorder and large-area delay to the station under the emergency condition are reduced, and the total delay time of each station of all trains is shortened; the parameters of the departure action planning model can be correspondingly adjusted according to actual requirements, so that the schedule adjustment strategy of the train is converged to an expected strategy, and the overall improvement of the train schedule adjustment effect under complex conditions is realized.
Based on the embodiment, the departure action planning model comprises an operating environment model and a strategy network model;
the operation environment model is used for updating the current train state to be adjusted based on the current train dispatching action, and the strategy network model is used for determining the next train dispatching action based on the action space range and the current train state to be adjusted;
wherein the initial train state to be adjusted is determined based on time information of each train to be adjusted at the current station, and the action space range is determined based on the infrastructure information.
Based on the above embodiment, the sequence determining unit 320 is configured to:
constructing an initial reinforcement learning model;
inputting time information and infrastructure information of each sample train to be adjusted of the current sample station into the initial reinforcement learning model to obtain an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and action rewards of each sample departure action in the estimated departure action sequence, and updating the next sample station of the current sample station into the current sample station until the current sample station is the rearmost sample station;
and updating parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
Based on the above embodiment, the initial reinforcement learning model includes an initial operating environment model and an initial policy network model; the sequence determination unit 320 is configured to:
inputting a current sample departure action to the initial operation environment model to obtain a current train state to be adjusted output by the initial operation environment model and an action reward of the current sample departure action;
inputting the current train state of the sample to be adjusted into the initial strategy network model to obtain a next sample departure action output by the initial strategy network model based on the action space range of the current sample station and the current train state of the sample to be adjusted, and updating the next sample departure action into the current sample departure action until the current train state of the sample to be adjusted is empty;
the initial sample train state to be adjusted is determined by the initial reinforcement learning model based on the time information of each sample train to be adjusted of the current sample station, and the action space range of the current sample station is determined by the initial reinforcement learning model based on the infrastructure information of the current sample station.
Based on the above embodiment, the action reward of the current sample departure action is determined based on the actual arrival time and the originally planned arrival time of the sample train to be adjusted at the next sample station corresponding to the current sample departure action.
Based on the above embodiment, the apparatus further includes a parameter updating unit, configured to:
and updating parameters of the initial reinforcement learning model by taking the strategy gradient direction as an updating direction based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model.
Based on the above embodiment, the time information includes the original planned arrival time, the original planned departure time, and the actual arrival time of the corresponding train.
Fig. 4 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 4: a processor (processor) 410, a communication interface (Communication Interface) 420, a memory (memory) 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform the schedule adjusting method in the event of a delay, the method comprising: acquiring time information of each train to be adjusted of a current station and infrastructure information of the current station; inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target; and adjusting the schedule of the current station based on the departure action sequence.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a method of schedule adjustment in the event of a delay provided by the above methods, the method comprising: acquiring time information of each train to be adjusted of a current station and infrastructure information of the current station; inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target; and adjusting the schedule of the current station based on the departure action sequence.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, is implemented to perform the schedule adjustment method in the delay scenario provided above, the method comprising: acquiring time information of each train to be adjusted of a current station and infrastructure information of the current station; inputting the time information and the infrastructure information of each train to be adjusted at the current station into a departure action planning model to obtain a departure action sequence output by the departure action planning model; the departure action planning model is obtained by reinforcement learning with the aim of shortest total delay time of each station as a target; and adjusting the schedule of the current station based on the departure action sequence.
The above-described apparatus embodiments are merely illustrative. The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions may be embodied as a software product that can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and that includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A schedule adjustment method in the event of a delay, comprising:
acquiring time information of each train to be adjusted at a current station and infrastructure information of the current station;
inputting the time information of each train to be adjusted at the current station and the infrastructure information into a departure action planning model to obtain a departure action sequence output by the departure action planning model, wherein the departure action planning model is obtained through reinforcement learning with the shortest total delay time at each station as the optimization objective;
and adjusting the schedule of the current station based on the departure action sequence.
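For illustration only, the following Python sketch shows how the flow of claim 1 could be wired together at inference time. Everything here, including the TrainInfo and StationInfrastructure containers, the planner object and its plan() method, and the headway-based rewrite of the timetable, is a hypothetical stand-in rather than the patented implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrainInfo:
    train_id: str
    planned_arrival: float    # minutes from a common reference time
    planned_departure: float
    actual_arrival: float

@dataclass
class StationInfrastructure:
    num_tracks: int           # arrival-departure tracks at the station
    min_headway: float        # minimum interval between departures, minutes

def adjust_schedule(trains: List[TrainInfo],
                    infra: StationInfrastructure,
                    planner) -> List[Tuple[str, float]]:
    """Claim-1 flow: ask the trained departure action planning model for a
    departure order, then rewrite the station timetable to respect it."""
    # `planner` is assumed to expose plan(trains, infra), returning an ordered
    # list of train_ids (the departure action sequence of the claim).
    departure_sequence = planner.plan(trains, infra)
    by_id = {t.train_id: t for t in trains}
    schedule = []
    last_departure = None
    for train_id in departure_sequence:
        t = by_id[train_id]
        earliest = max(t.actual_arrival, t.planned_departure)
        if last_departure is not None:
            earliest = max(earliest, last_departure + infra.min_headway)
        schedule.append((train_id, earliest))   # new departure time at this station
        last_departure = earliest
    return schedule
```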
2. The schedule adjustment method in the event of a delay according to claim 1, wherein the departure action planning model comprises an operation environment model and a policy network model;
the operation environment model is used for updating the current train state to be adjusted based on the current departure action, and the policy network model is used for determining the next departure action based on the action space range and the current train state to be adjusted;
wherein the initial train state to be adjusted is determined based on the time information of each train to be adjusted at the current station, and the action space range is determined based on the infrastructure information.
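A minimal sketch of the two-part structure in claim 2 follows, with the environment model and the policy network reduced to toy stand-ins: the state is built from the trains' time information, the action space range is derived from the infrastructure information, and a placeholder rule replaces the trained network. The dictionary keys used below are assumptions for illustration only.

```python
class OperationEnvironmentModel:
    """Holds the train state to be adjusted and updates it per departure action."""
    def __init__(self, train_time_info):
        # Initial state derived from the time information of the trains to be adjusted.
        self.state = sorted(train_time_info, key=lambda t: t["actual_arrival"])

    def step(self, train_id):
        # Applying a departure action removes the dispatched train from the state.
        self.state = [t for t in self.state if t["train_id"] != train_id]
        return self.state

class PolicyNetworkModel:
    """Chooses the next departure action within the station's action space range."""
    def __init__(self, infrastructure):
        # Action space range derived from the infrastructure information
        # (here simplified to the number of trains considered at once).
        self.window = infrastructure["num_tracks"]

    def next_action(self, state):
        if not state:
            return None
        # Placeholder decision rule standing in for the trained policy network:
        # among the first `window` pending trains, dispatch the one whose
        # planned departure time is earliest.
        candidates = state[: self.window]
        return min(candidates, key=lambda t: t["planned_departure"])["train_id"]
```

With these two pieces, a departure action sequence is produced by alternating next_action and step until the state is empty, which is the loop formalized in claim 4.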
3. The schedule adjustment method in the event of a delay according to claim 2, wherein the departure action planning model is determined based on the following steps:
constructing an initial reinforcement learning model;
inputting the time information of each sample train to be adjusted at the current sample station and the infrastructure information of the current sample station into the initial reinforcement learning model to obtain an estimated departure action sequence of the current sample station output by the initial reinforcement learning model and the action reward of each sample departure action in the estimated departure action sequence, and taking the next sample station as the current sample station, until the current sample station is the last sample station;
and updating the parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station, to obtain the departure action planning model.
4. The schedule adjustment method in the event of a delay according to claim 3, wherein the initial reinforcement learning model comprises an initial operation environment model and an initial policy network model;
the inputting the time information and the infrastructure information of each sample train to be adjusted at the current sample station into the initial reinforcement learning model to obtain the estimated departure action sequence of the current sample station output by the initial reinforcement learning model and the action reward of each sample departure action in the estimated departure action sequence comprises:
inputting a current sample departure action into the initial operation environment model to obtain a current sample train state to be adjusted output by the initial operation environment model and the action reward of the current sample departure action;
and inputting the current sample train state to be adjusted into the initial policy network model to obtain a next sample departure action output by the initial policy network model based on the action space range of the current sample station and the current sample train state to be adjusted, and taking the next sample departure action as the current sample departure action, until the current sample train state to be adjusted is empty;
wherein the initial sample train state to be adjusted is determined by the initial reinforcement learning model based on the time information of each sample train to be adjusted at the current sample station, and the action space range of the current sample station is determined by the initial reinforcement learning model based on the infrastructure information of the current sample station.
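The nested loops of claims 3 and 4 can be summarized in the sketch below. It assumes an environment object whose step(action) returns the updated sample train state together with the action reward, and a policy object whose act(state) returns a sample departure action; both interfaces are hypothetical, and the real model would also carry the station's action space range.

```python
def rollout_station(env, policy):
    """Claim-4 inner loop for one sample station: alternate environment update and
    policy decision until no sample train remains, collecting action rewards."""
    trace = []                                   # (state, action, reward) for the later update
    state = env.state                            # initial state from the sample time information
    action = policy.act(state)                   # first sample departure action
    while True:
        next_state, reward = env.step(action)    # environment model: new state + action reward
        trace.append((state, action, reward))
        if not next_state:                       # sample train state to be adjusted is empty
            break
        state = next_state
        action = policy.act(state)               # policy network: next sample departure action
    return trace

def rollout_line(station_envs, policy):
    """Claim-3 outer loop: walk the sample stations in order until the last one,
    accumulating the estimated departure action sequences and their rewards."""
    episode = []
    for env in station_envs:                     # each env wraps one sample station
        episode.extend(rollout_station(env, policy))
    return episode
```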
5. The schedule adjustment method in the event of a delay according to claim 4, wherein the action reward of the current sample departure action is determined based on the actual arrival time and the originally planned arrival time, at the next sample station, of the sample train to be adjusted corresponding to the current sample departure action.
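Claim 5 fixes only the two quantities the reward depends on. One plausible instantiation, assuming the delay incurred at the next sample station should be penalized, is the following hypothetical function:

```python
def departure_action_reward(actual_arrival_next, planned_arrival_next):
    """Hypothetical claim-5 reward: the negated delay of the dispatched sample train
    at the next sample station (zero when it arrives on or ahead of schedule)."""
    delay = max(0.0, actual_arrival_next - planned_arrival_next)
    return -delay
```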
6. The schedule adjustment method in the event of a delay according to claim 3, wherein the updating the parameters of the initial reinforcement learning model based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station to obtain the departure action planning model comprises:
updating the parameters of the initial reinforcement learning model, with the policy gradient direction as the update direction, based on the action reward of each sample departure action in the estimated departure action sequence corresponding to each sample station, to obtain the departure action planning model.
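Claim 6 states only that the parameters are updated along the policy gradient direction. A REINFORCE-style update is one standard way to realize this; the linear softmax policy and the undiscounted returns below are simplifications of my own, not the patent's network.

```python
import numpy as np

class LinearSoftmaxPolicy:
    """Toy policy used only to show a policy-gradient (REINFORCE) parameter update."""
    def __init__(self, feature_dim, lr=0.01, seed=0):
        self.w = np.zeros(feature_dim)
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def probs(self, action_features):
        # action_features: array of shape (num_actions, feature_dim)
        logits = action_features @ self.w
        z = np.exp(logits - logits.max())
        return z / z.sum()

    def act(self, action_features):
        p = self.probs(action_features)
        return int(self.rng.choice(len(p), p=p))

    def update(self, trajectory):
        """trajectory: list of (action_features, chosen_action, reward) tuples.
        Each parameter step follows the policy gradient, weighting the
        log-probability gradient of the chosen action by the return after it."""
        rewards = [r for _, _, r in trajectory]
        for t, (feats, a, _) in enumerate(trajectory):
            ret = sum(rewards[t:])                  # undiscounted return from step t
            p = self.probs(feats)
            grad_log_pi = feats[a] - p @ feats      # gradient of log softmax probability
            self.w += self.lr * ret * grad_log_pi
```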
7. The schedule adjustment method in the event of a delay according to any one of claims 1 to 6, wherein the time information comprises the originally planned arrival time, the originally planned departure time, and the actual arrival time of the corresponding train.
8. A schedule adjustment apparatus for use in the event of a delay, comprising:
an information acquisition unit, configured to acquire time information of each train to be adjusted at a current station and infrastructure information of the current station;
a sequence determination unit, configured to input the time information of each train to be adjusted at the current station and the infrastructure information into a departure action planning model to obtain a departure action sequence output by the departure action planning model, wherein the departure action planning model is obtained through reinforcement learning with the shortest total delay time at each station as the optimization objective;
and a time adjustment unit, configured to adjust the schedule of the current station based on the departure action sequence.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the schedule adjustment method in the event of a delay according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the schedule adjustment method in the event of a delay according to any one of claims 1 to 7.
CN202110904084.1A 2021-08-06 2021-08-06 Method and device for adjusting timetable under delay condition and electronic equipment Active CN113525462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110904084.1A CN113525462B (en) 2021-08-06 2021-08-06 Method and device for adjusting timetable under delay condition and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110904084.1A CN113525462B (en) 2021-08-06 2021-08-06 Method and device for adjusting timetable under delay condition and electronic equipment

Publications (2)

Publication Number Publication Date
CN113525462A true CN113525462A (en) 2021-10-22
CN113525462B CN113525462B (en) 2022-06-28

Family

ID=78122092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110904084.1A Active CN113525462B (en) 2021-08-06 2021-08-06 Method and device for adjusting timetable under delay condition and electronic equipment

Country Status (1)

Country Link
CN (1) CN113525462B (en)

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106842925A * 2017-01-20 2017-06-13 Tsinghua University Locomotive intelligent driving method and system based on deep reinforcement learning
CN107194612A * 2017-06-20 2017-09-22 Tsinghua University Train operation dispatching method and system based on deep reinforcement learning
US20190114321A1 (en) * 2017-10-17 2019-04-18 Royal Bank Of Canada Auto tele-interview solution
US20210166181A1 (en) * 2018-06-27 2021-06-03 Siemens Aktiengesellschaft Equipment management method, device, system and storage medium
WO2020002017A1 (en) * 2018-06-28 2020-01-02 Konux Gmbh Planning of maintenance of railway
WO2020040763A1 (en) * 2018-08-23 2020-02-27 Siemens Aktiengesellschaft Real-time production scheduling with deep reinforcement learning and monte carlo tree search
KR101966641B1 * 2018-10-04 2019-08-19 Industry-Academic Cooperation Foundation, Chungnam National University Rail temperature prediction system considering weather condition and rail temperature prediction method using the same
WO2020086214A1 (en) * 2018-10-26 2020-04-30 Dow Global Technologies Llc Deep reinforcement learning for production scheduling
CN109740839A * 2018-11-23 2019-05-10 Beijing Jiaotong University Train dynamic adjustment method and system under an emergency event
CN110803204A * 2019-11-13 2020-02-18 Northeastern University Online control system and method for maintaining running stability of high-speed train
CN111768074A * 2020-05-22 2020-10-13 Beijing Jiaotong University Novel train operation intelligent adjustment method
CN111369181A * 2020-06-01 2020-07-03 CRSC Research & Design Institute Group Co., Ltd. Train autonomous scheduling deep reinforcement learning method and module
CN111376954A * 2020-06-01 2020-07-07 CRSC Research & Design Institute Group Co., Ltd. Train autonomous scheduling method and system
CN111932039A * 2020-09-29 2020-11-13 Beijing Jiaotong University Train arrival late prediction method and device, electronic equipment and storage medium
CN112319557A * 2020-10-27 2021-02-05 Beijing Jiaotong University Operation adjusting method and system for subway train under late condition
CN112389509A * 2020-11-16 2021-02-23 Beijing Jiaotong University Auxiliary adjusting method and system for high-speed train timetable
CN112800565A * 2021-01-12 2021-05-14 Beijing Jiaotong University Method for predicting delay spread of high-speed railway network train
CN112977553A * 2021-03-05 2021-06-18 Beijing Jiaotong University Automatic train operation adjusting method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
D. ŠEMROV; R. MARSETIČ; M. ŽURA; L. TODOROVSKI; A. SRDIC: "Reinforcement learning approach for train rescheduling on a single-track railway", Transportation Research Part B: Methodological *
LINGBIN NING; YIDONG LI; MIN ZHOU; HAIFENG SONG; HAIRONG DONG: "A Deep Reinforcement Learning Approach to High-speed Train Timetable Rescheduling under Disturbances", 2019 IEEE Intelligent Transportation Systems Conference (ITSC) *
YIWEI GUO: "A Reinforcement Learning Approach to Train Timetabling for Inter-City High Speed Railway Lines", 2020 IEEE 5th International Conference on Intelligent Transportation Engineering (ICITE) *
YU SHENGPING: "Dynamic dispatching method for high-speed trains based on policy gradient reinforcement learning", Control and Decision *
DUAN YANJIE; LYU YISHENG; ET AL.: "Research status and prospects of deep learning in the control field", Acta Automatica Sinica *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115230777A * 2022-06-21 2022-10-25 Institute of Automation, Chinese Academy of Sciences Scheduling policy adjustment method and device, electronic equipment and storage medium
CN115230777B (en) * 2022-06-21 2024-01-16 Institute of Automation, Chinese Academy of Sciences Scheduling policy adjustment method and device, electronic equipment and storage medium
CN115688971A (en) * 2022-09-23 2023-02-03 Beijing Jiaotong University Wire network passenger flow control and train adjustment collaborative optimization method under train delay

Also Published As

Publication number Publication date
CN113525462B (en) 2022-06-28

Similar Documents

Publication Publication Date Title
CN113525462B (en) Method and device for adjusting timetable under delay condition and electronic equipment
CN109740839B (en) Train dynamic adjustment method and system under emergency
Dong et al. Integrated optimization of train stop planning and timetabling for commuter railways with an extended adaptive large neighborhood search metaheuristic approach
Wang et al. Efficient real-time train scheduling for urban rail transit systems using iterative convex programming
Nesheli et al. Optimal combinations of selected tactics for public-transport transfer synchronization
CN110766249B (en) Vehicle scheduling method and device, computer equipment and storage medium
CN111859666B (en) Train group generation method, device and system, storage medium and electronic equipment
CN111783357A (en) Transfer route optimization method and system based on passenger delay reduction
CN112308332A (en) Rail transit parallel deduction system and method
CN115796509A (en) Rail transit emergency scheduling aid decision-making system and method
CN113536692B (en) Intelligent dispatching method and system for high-speed rail train under uncertain environment
CN110889596B (en) Construction plan scheduling method and system for railway engineering line
Zhang et al. Optimized skip-stop metro line operation using smart card data
CN113844507A (en) Train simulation operation system construction method based on digital twins
CN110920684B (en) Method and device for determining train position information, electronic equipment and storage medium
Wang et al. Origin-destination dependent train scheduling problem with stop-skipping for urban rail transit systems
CN115049162B (en) Hybrid coding based high-speed rail station arrival and departure line application adjustment method at late train
CN116719787A (en) Method and device for uploading equipment logs in track system, medium and electronic equipment
CN115230777B (en) Scheduling policy adjustment method and device, electronic equipment and storage medium
CN110803203B (en) Method and system for predicting evolution of high-speed railway running track
CN114792133A (en) Deep reinforcement learning method and device based on multi-agent cooperation system
CN113343422A (en) Rail transit operation simulation method and system
CN112800565A (en) Method for predicting delay spread of high-speed railway network train
Zhao et al. Dynamic Bus Holding Control Using Spatial-Temporal Data–A Deep Reinforcement Learning Approach
Masoud et al. A new approach to automatically producing schedules for cane railways

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant