CN115782988A

CN115782988A - Train schedule determining method, device, equipment and medium

Info

Publication number: CN115782988A
Application number: CN202211469425.8A
Authority: CN
Inventors: 程高云; 潘龙飞; 刘义卿; 赵兴东
Original assignee: Traffic Control Technology TCT Co Ltd
Current assignee: Traffic Control Technology TCT Co Ltd
Priority date: 2022-11-22
Filing date: 2022-11-22
Publication date: 2023-03-14

Abstract

The invention provides a method, a device, equipment and a medium for determining a train schedule, relating to the technical field of rail transit. The method includes: inputting the state group of the current time step into a first policy network model to obtain the departure interval of the current time step; inputting the state group of the next time step into a second policy network model to obtain the departure interval of the next time step; inputting the state group and the departure interval of the current time step into a first value network model to obtain a first evaluation value; inputting the state group and the departure interval of the next time step into a second value network model to obtain a second evaluation value; and inputting the state group of a target time step into the first policy network model to obtain a target departure interval. The method greatly improves the efficiency of optimizing the train schedule under the large and small traffic route (long and short routing) mode, and effectively reduces the subway operating cost and the passenger waiting cost.

Description

Train schedule determining method, device, equipment and medium

Technical Field

The invention relates to the technical field of rail transit, in particular to a train schedule determining method, device, equipment and medium.

Background

In urban rail transit operation, the quality of the train schedule affects both the operator's operating cost and the passengers' waiting cost. An overly long departure interval increases the passenger waiting cost and lowers passenger satisfaction, while an overly short departure interval increases the operator's operating cost.

Disclosure of Invention

The invention provides a method, a device, equipment and a medium for determining a train schedule, which solve the technical problem in the prior art that the departure interval cannot be optimized so as to reasonably balance the operator's operating cost against the passenger waiting cost, and which provide a reinforcement-learning-based scheme for optimizing the train schedule under the large and small traffic route mode.

In a first aspect, the present invention provides a train schedule determining method, including:

repeatedly executing the following steps until a preset condition is met:

inputting the state group of the current time step into a first policy network model, and acquiring the departure interval of the current time step output by the first policy network model; inputting the state group of the next time step into a second policy network model, and acquiring the departure interval of the next time step output by the second policy network model;

inputting the state group of the current time step and the departure interval of the current time step into a first value network model, and acquiring a first evaluation value output by the first value network model; inputting the state group of the next time step and the departure interval of the next time step into a second value network model, and acquiring a second evaluation value output by the second value network model;

updating the first policy network model according to the departure interval of the current time step and the first evaluation value; updating the first value network model according to the first evaluation value and the second evaluation value; wherein the second policy network model is determined by updating from the first policy network model according to preset parameters, and the second value network model is determined by updating from the first value network model according to the preset parameters;

after the preset condition is met, inputting the state group of a target time step into the first policy network model, and acquiring the target departure interval output by the first policy network model;

determining the train timetable of the target time step according to the initial departure time of the target time step and the target departure interval;

the state group is either a large traffic route state group or a small traffic route state group;

the time step is a time slice with preset duration.

According to the train schedule determining method provided by the invention, before the state group of the next time step is input into the second policy network model, the method further includes:

determining the initial departure time of the next time step according to the initial departure time of the current time step and the departure interval of the current time step;

when the preset duration is less than the departure interval of the current time step, taking the state group of the train number corresponding to the current time step as the state group of the next time step;

when the preset duration is greater than or equal to the departure interval of the current time step, taking the state group of the train number following the current train number as the state group of the next time step.
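The rule for choosing the next time step's state group can be sketched as follows. This is a minimal sketch under stated assumptions: train numbers are consecutive integers, all times are in seconds, and the function name `next_step` is hypothetical rather than taken from the patent.

```python
from __future__ import annotations


def next_step(train_no: int, start_time: float, interval: float,
              step_duration: float) -> tuple[int, float]:
    """Return (train number whose state group is used next, next initial departure time).

    Hypothetical helper: train numbers are consecutive integers, times in seconds.
    """
    # Initial departure time of the next time step = current start + departure interval.
    next_start = start_time + interval
    if step_duration < interval:
        # The next departure does not fall inside this time slice,
        # so the current train number's state group is reused.
        next_train = train_no
    else:
        # The time slice reaches the next departure: move to the next train number.
        next_train = train_no + 1
    return next_train, next_start


print(next_step(5, 0.0, 300.0, 120.0))  # (5, 300.0)
print(next_step(5, 0.0, 300.0, 300.0))  # (6, 300.0)
```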

According to the train schedule determining method provided by the invention, the updating of the first policy network model according to the departure interval of the current time step and the first evaluation value comprises the following steps:

determining a first update value according to a first learning parameter, the gradient of the departure interval of the current time step and the gradient of the first evaluation value;

and determining an updated first policy network parameter according to the first policy network parameter corresponding to the first policy network model and the first update value, so as to update the first policy network model according to the updated first policy network parameter.
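The policy update above reads like a deterministic policy gradient step, in which the "influence gradients" are combined by the chain rule. The sketch below assumes that interpretation, a hypothetical linear policy (departure interval = theta · state), and a value gradient dQ/da supplied by the value network; the names `policy_update`, `theta`, `dq_da` and `lr` are illustrative only.

```python
def policy_update(theta, state, dq_da, lr):
    """One deterministic-policy-gradient step for a hypothetical linear policy.

    theta : first policy network parameters (list of floats)
    state : state group of the current time step (list of floats)
    dq_da : gradient of the first evaluation value Q(s, a) w.r.t. the action a
    lr    : first learning parameter
    """
    # For a linear policy a = theta . s, the gradient of a w.r.t. theta is s.
    # First update value = lr * dQ/da * da/dtheta (chain rule).
    update = [lr * dq_da * s for s in state]
    # Gradient ascent: move theta in the direction that raises the evaluation value.
    return [t + u for t, u in zip(theta, update)]


# Example with a hypothetical critic Q(s, a) = -(a - 0.5) ** 2.
theta = [0.0, 0.0]
state = [1.0, 2.0]
action = sum(t * s for t, s in zip(theta, state))  # a = 0.0
dq_da = -2.0 * (action - 0.5)                      # = 1.0
theta = policy_update(theta, state, dq_da, lr=0.1)
print(theta)  # [0.1, 0.2]
```

In a real implementation the policy would be a neural network and the gradient would come from automatic differentiation; the linear form only keeps the chain rule visible.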

According to the train schedule determining method provided by the present invention, updating the first value network model according to the first evaluation value and the second evaluation value includes:

determining a reward function of the current time step according to the waiting cost of the passengers and the operation cost of the enterprise;

determining a reward target according to the reward function and the second evaluation value;

determining a reward error according to the first evaluation value and the reward target;

determining a second update value according to a second learning parameter, the reward error and the gradient of the first evaluation value;

determining updated value network parameters from the value network parameters of the first value network model and the second update value, and updating the first value network model with the updated value network parameters;

the passenger waiting cost is determined based on the total waiting time spent by passengers in all stations in the current time step.
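The value update above matches a temporal-difference (TD) step. The sketch below assumes a hypothetical reward equal to the negative weighted sum of the two costs, a hypothetical linear value network Q = omega · features, and a discount factor `gamma` that the patent does not name; `reward`, `value_update`, `w_wait` and `w_op` are illustrative names only.

```python
def reward(total_wait_time, operating_cost, w_wait=1.0, w_op=1.0):
    """Hypothetical reward: negative weighted sum of waiting cost and operating cost."""
    return -(w_wait * total_wait_time + w_op * operating_cost)


def value_update(omega, features, q_next, r, lr, gamma=0.99):
    """One TD update of a hypothetical linear value network Q = omega . features.

    features : state group and departure interval of the current time step
    q_next   : second evaluation value from the second value network model
    lr       : second learning parameter
    gamma    : discount factor (an assumption, not named in the source)
    """
    q_current = sum(w * f for w, f in zip(omega, features))  # first evaluation value
    target = r + gamma * q_next                               # reward target
    td_error = target - q_current                             # reward error
    # dQ/domega = features for a linear model, so this is the second update value.
    new_omega = [w + lr * td_error * f for w, f in zip(omega, features)]
    return new_omega, td_error


print(reward(10.0, 5.0))  # -15.0
omega, err = value_update([0.0, 0.0], [1.0, 2.0], q_next=4.0, r=-1.0, lr=0.5, gamma=0.5)
print(omega, err)  # [0.5, 1.0] 1.0
```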

According to the train schedule determining method provided by the invention, the preset condition is any one of the following conditions:

at the stations shared by the large traffic route and the small traffic route, the departure time of a small traffic route train at any shared station falls within the constraint time interval of the large traffic route;

the train departure time exceeds the operating hours;

the number of passengers left behind at all stations is 0;

the train departure time exceeds the operating hours and no passengers remain at any station;

wherein the constraint time interval of the large traffic route is bounded by a constraint minimum and a constraint maximum: the constraint minimum is the departure time of the large traffic route train at the shared station minus a preset constraint interval, and the constraint maximum is that departure time plus the preset constraint interval.
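The preset conditions above can be checked as in the following sketch. It assumes that the constraint interval is closed at both ends and that the departure lists are aligned so that `small_deps[k]` and `large_deps[k]` refer to the same shared station; all function and parameter names are hypothetical.

```python
def in_constraint_window(small_dep, large_dep, gap):
    """True if a small-route departure lies in [large_dep - gap, large_dep + gap]."""
    return large_dep - gap <= small_dep <= large_dep + gap


def preset_condition_met(small_deps, large_deps, gap, departure_time,
                         service_end, remaining_passengers):
    """Return True if any of the preset conditions holds (times in seconds)."""
    # Condition 1: a small-route departure falls in a large-route constraint window.
    conflict = any(in_constraint_window(s, l, gap)
                   for s, l in zip(small_deps, large_deps))
    # Condition 2: the departure time is past the end of the operating hours.
    past_service = departure_time > service_end
    # Condition 3: nobody is left behind at any station.
    no_one_left = remaining_passengers == 0
    # Condition 4 (past_service and no_one_left) is already covered by 2 and 3.
    return conflict or past_service or no_one_left


print(preset_condition_met([100], [110], 15, 0, 1000, 5))  # True (conflict)
print(preset_condition_met([100], [200], 15, 0, 1000, 5))  # False
```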

According to the train schedule determining method provided by the invention, determining the train timetable of the target time step according to the initial departure time of the target time step and the target departure interval includes:

determining the departure time at each station along the route of the train number corresponding to the target time step, according to the initial departure time of that train number, the target departure interval, the running times between stations and the dwell time at each station, and determining the train timetable of the target time step from the departure times at all stations.
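The timetable construction above can be sketched as cumulative sums. This minimal sketch assumes that the initial departure time is the departure from the first station (not from the depot), that each train's start is the previous start plus its departure interval, and that times are integer seconds; `train_timetable` and `build_timetable` are hypothetical names.

```python
def train_timetable(start_time, run_times, dwell_times):
    """Departure time at every station for one train.

    run_times[i]   : running time from station i to station i + 1
    dwell_times[i] : dwell time at station i
    """
    departures = [start_time]
    for i, run in enumerate(run_times):
        # Departure at the next station = previous departure + run time + dwell there.
        departures.append(departures[-1] + run + dwell_times[i + 1])
    return departures


def build_timetable(first_start, intervals, run_times, dwell_times):
    """Timetable for consecutive trains separated by the given departure intervals."""
    starts = [first_start]
    for h in intervals:
        starts.append(starts[-1] + h)
    return [train_timetable(s, run_times, dwell_times) for s in starts]


print(build_timetable(0, [300], [120, 90], [30, 30, 30]))
# [[0, 150, 270], [300, 450, 570]]
```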

According to the train schedule determining method provided by the invention, before the state group of the current time step is input into the first policy network model, the method further includes:

when the state group is a large traffic route state group, determining it according to the initial departure time of the train number corresponding to its time step and the total number of passengers left behind, unable to board that train number, at all large traffic route stations along the route of that train number;

when the state group is a small traffic route state group, determining it according to the initial departure time of the train number corresponding to its time step and the total number of passengers left behind, unable to board that train number, at all common stations along the route of that train number;

the common stations are stations at the overlapped parts of all stations along the large traffic route and all stations along the small traffic route.
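The two state groups above can be sketched as a pair (initial departure time, total left-behind passengers) computed over different station sets. The station names, counts and the helper `state_group` below are hypothetical.

```python
def state_group(start_time, leftover_by_station, stations):
    """(initial departure time, total passengers left behind at the given stations)."""
    return (start_time, sum(leftover_by_station[s] for s in stations))


leftover = {"A": 3, "B": 0, "C": 7}   # hypothetical left-behind counts per station
large_route = ["A", "B", "C"]         # all stations along the large traffic route
common = ["B", "C"]                   # stations shared with the small traffic route

print(state_group(600, leftover, large_route))  # (600, 10)
print(state_group(660, leftover, common))       # (660, 7)
```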

According to the train schedule determining method provided by the present invention, when the state group is a large traffic route state group, taking the state group of the train number following the current train number as the state group of the next time step includes:

determining, for each large-route station, the total number of passengers waiting to board the next train number from the number of passengers at that station who could not board the current train number and are forced to wait for the next one, and the number of newly arriving passengers departing from that station for the designated station;

determining, for each station, the actual number of passengers who reach the designated station, according to the remaining capacity of the next train number and the total number of passengers waiting to board it at that station;

determining, for each station, the number of passengers left behind who cannot board the next train number to reach the designated station, according to the total number of passengers travelling from that station to the designated station and the actual number who reach it, thereby determining the total number of passengers left behind at the large-route stations;

determining the total number of passengers left behind at all stations from the total left behind at the large-route stations and the total left behind at the common stations, and determining the state group of the next time step from the initial departure time of the next time step and the total number of passengers left behind at all stations;

wherein the large-route stations are the stations along the large traffic route other than the common stations.
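The passenger carry-over steps above can be sketched as a single pass of one train through the stations. The patent does not say how limited capacity is shared among destinations, so this sketch assumes a hypothetical rule in which nearer destinations board first; the capacity value and the function name `board_next_train` are also illustrative.

```python
def board_next_train(carry_over, new_arrivals, capacity):
    """Simulate one train running through stations 0..n-1 in order.

    carry_over[i][j]   : passengers at station i bound for j who missed the current train
    new_arrivals[i][j] : newly arriving passengers from i to j during the time step
    capacity           : train capacity (hypothetical value)
    Returns left[i][j], the passengers from i to j who cannot board the next train.
    """
    n = len(carry_over)
    # Total waiting = left behind by the current train + new arrivals.
    waiting = [[carry_over[i][j] + new_arrivals[i][j] for j in range(n)]
               for i in range(n)]
    on_board = [0] * n                   # passengers on board, indexed by destination
    left = [[0] * n for _ in range(n)]
    for i in range(n):
        on_board[i] = 0                  # passengers bound for station i alight here
        free = capacity - sum(on_board)  # remaining capacity at station i
        # Hypothetical boarding rule: nearer destinations board first.
        for j in range(i + 1, n):
            boarded = min(waiting[i][j], free)
            on_board[j] += boarded
            free -= boarded
            left[i][j] = waiting[i][j] - boarded
    return left


zeros = [[0, 0, 0] for _ in range(3)]
arrivals = [[0, 4, 8],
            [0, 0, 5],
            [0, 0, 0]]
left = board_next_train(zeros, arrivals, capacity=10)
print(left)                           # [[0, 0, 2], [0, 0, 1], [0, 0, 0]]
print(sum(sum(row) for row in left))  # 3
```

The total left behind, summed over the relevant station set, is what enters the next state group.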

According to the train schedule determining method provided by the present invention, when the state group is a small traffic route state group, taking the state group of the train number following the current train number as the state group of the next time step includes:

determining, for each shared station, the total number of passengers waiting to board the next train number from the number of passengers at that station forced to wait for the next train number because they could not board the current one, the number of newly arriving passengers travelling from that shared station to a designated shared station, and the number of newly arriving passengers travelling from that shared station to a large-route station;

determining, for each station, the number of passengers left behind who cannot board the next train number to reach the designated station, according to the total number of passengers travelling from that station to the designated station and the actual number who reach it, thereby determining the number of passengers left behind at each shared station;

determining the total number of passengers left behind at all shared stations from the number left behind at each shared station, and determining the state group of the next time step from the initial departure time of the next time step and that total.

In a second aspect, the present invention provides a train schedule determining apparatus, including:

an execution unit, configured to repeatedly execute the following steps until a preset condition is met:

inputting the state group of the current time step into a first policy network model, and acquiring the departure interval of the current time step output by the first policy network model; inputting the state group of the next time step into a second policy network model, and acquiring the departure interval of the next time step output by the second policy network model;

updating the first policy network model according to the departure interval of the current time step and the first evaluation value; updating the first value network model according to the first evaluation value and the second evaluation value; wherein the second policy network model is determined by updating from the first policy network model according to preset parameters, and the second value network model is determined by updating from the first value network model according to the preset parameters;

an acquisition unit, configured to input, after the preset condition is met, the state group of a target time step into the first policy network model and acquire the target departure interval output by the first policy network model;

a determination unit, configured to determine the train timetable of the target time step according to the initial departure time of the target time step and the target departure interval;

the state group is either a large traffic route state group or a small traffic route state group;

the time step is a time slice with preset duration.

In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor executes the computer program to implement any one of the above-mentioned train schedule determining methods.

In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a train schedule determination method as described in any of the above.

The invention provides a method, a device, equipment and a medium for determining a train schedule. The time step is used as the basis for determining the state group: the current departure interval is determined by a policy network model from the state group of the current time step; the state group of the current time step and the current departure interval are then used to determine the state group of the next time step, from which another policy network model determines the next departure interval. This realizes continuous interactive iteration between the state groups of successive time steps and the departure intervals. A value network model is introduced at the same time, and the reward error is used as the signal that guides the iterative updating of the policy network models and the value network models, so that an optimized policy network model is obtained, from which the train schedule is finally determined. The invention constructs a virtual operating environment from passenger flow information and train dynamics characteristics, uses a reinforcement learning algorithm to learn and optimize in that environment, simulates train operating conditions, and thereby obtains the train schedule.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a first schematic flow chart of the train schedule determining method provided by the present invention;

FIG. 2 is a second schematic flow chart of the train schedule determining method provided by the present invention;

FIG. 3 is a schematic flow chart of updating the first policy network model provided by the present invention;

FIG. 4 is a schematic flow chart of updating the first value network model provided by the present invention;

FIG. 5 is a third schematic flow chart of the train schedule determining method provided by the present invention;

FIG. 6 is a first schematic flow chart of determining the state group of the next time step provided by the present invention;

FIG. 7 is a second schematic flow chart of determining the state group of the next time step provided by the present invention;

FIG. 8 is a schematic view of an operating scenario with large and small traffic routes provided by the present invention;

FIG. 9 is a fourth schematic flow chart of the train schedule determining method provided by the present invention;

FIG. 10 is a schematic structural diagram of the train schedule determining apparatus provided by the present invention;

FIG. 11 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Traditional methods for optimizing train schedules fall into two major categories. The first is integer programming or mixed integer programming, which has high computational complexity, a heavy computational load and slow convergence. The second is heuristic methods such as particle swarm optimization, genetic algorithms and differential evolution, which depend on expert experience, perform unstably and easily fall into local optima.

The present method abandons traditional optimization algorithms and instead adopts model-free reinforcement learning, which learns the optimal policy directly through real-time interaction with the environment and generalizes well to complex application scenarios.

FIG. 1 is the first schematic flow chart of the train schedule determining method provided by the present invention. As shown in FIG. 1, the method includes:

repeatedly executing the following steps until the preset conditions are met:

inputting the state group of the current time step into a first policy network model, and acquiring the departure interval of the current time step output by the first policy network model; inputting the state group of the next time step into a second policy network model, and acquiring the departure interval of the next time step output by the second policy network model;

the time step is a time slice with preset duration.

In step 101, the time step is a time slice of preset duration. Within one time step, four situations may occur depending on how the preset duration compares with the departure intervals. For example, if the preset duration is less than the departure interval, the next time step remains at the current train number; if the preset duration is greater than or equal to the departure interval, the next train number is entered at the next time step.

In the invention, the state group of the current time step is input into the first policy network model to obtain the departure interval of the current time step, and the state group of the next time step is input into the second policy network model to obtain the departure interval of the next time step.

The parameters of the first and second policy network models may initially be the same or different; during the subsequent iterations, the parameters of both policy network models are continuously updated.

The method first initializes the training environment and builds the first and second policy network models from initial policy network parameters, so that each can output a departure interval for a given state group. In the invention, the state is the state group and the action is the departure interval; the next state group is determined from the current state group and departure interval, and is then input into the second policy network model to obtain the departure interval it outputs.

There are two kinds of state group, large traffic route state groups and small traffic route state groups; each is determined from its corresponding initial departure time and the total number of passengers left behind, unable to board, at the stations along the route of the train number.

Specifically, the first policy network model is built from first initial policy network parameters and the second policy network model from second initial policy network parameters; the initial policy network parameters are the parameters used to build the policy network models in model-free learning. Before the virtual operating environment for the large and small traffic route mode is constructed, the following assumptions are made: there are enough trains and their formation is fixed; trains stop at every station on their respective routes and do not cross between routes; large and small traffic route trains have the same dwell time at the same station; the running times between stations, the station dwell times and the turnback time at terminal stations are known; and the train timetable depends only on the times at which trains depart from the depot or the stop line.

Further, a passenger-trip quantity matrix is preset. It has three dimensions: the first is time, the second is the origin station and the third is the destination station, for example:

$$D = \left[ d_t(i,j) \right] \quad (1)$$

In equation (1), the element $d_t(i,j)$ of the quantity matrix is the number of passengers departing from station $i$ for station $j$ between time $t-1$ and time $t$.
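Because one time step of the environment may span several slots of the passenger-trip matrix, the new arrivals for a time step can be obtained by summing the matrix over the slots it covers. This is a sketch assuming the matrix is stored as a nested list `od[t][i][j]` with one entry per slot; the function name `arrivals_in_window` and the sample numbers are hypothetical.

```python
def arrivals_in_window(od, t_start, t_end):
    """Passengers from station i to station j arriving in (t_start, t_end].

    od[t][i][j] is the number of passengers departing station i for station j
    between time t - 1 and time t (one entry per time slot).
    """
    n = len(od[0])
    total = [[0] * n for _ in range(n)]
    for t in range(t_start + 1, t_end + 1):
        for i in range(n):
            for j in range(n):
                total[i][j] += od[t][i][j]
    return total


od = [
    [[0, 1], [0, 0]],  # t = 0
    [[0, 2], [1, 0]],  # t = 1
    [[0, 3], [0, 0]],  # t = 2
]
print(arrivals_in_window(od, 0, 2))  # [[0, 5], [1, 0]]
```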

Those skilled in the art will appreciate that in a multi-agent algorithm it is difficult for a single agent to observe the complete state; that is, each agent observes only part of the state. In the large and small traffic route mode, the state of the train system is therefore represented by the partial observations of the two traffic routes. At time step $\delta$, the state $s_\delta$ can be expressed as

$$s_\delta = \{ o_{\delta,1},\ o_{\delta,2} \}$$

where $o_{\delta,1}$ is the observation of the large traffic route and $o_{\delta,2}$ is the observation of the small traffic route.

The large traffic route observation $o_{\delta,1}$ is built from $T^{1}_{k_1(\delta)}$, the time at which the $k_1(\delta)$-th train on the large traffic route departs from the depot, and $l^{1}_{i}$, the number of passengers left at station $i$ of the large traffic route because, owing to the limited train capacity, they could not board the previous train.

The small traffic route observation $o_{\delta,2}$ is built from $T^{2}_{k_2(\delta)}$, the time at which the $k_2(\delta)$-th train on the small traffic route departs from the stop line, and $l^{2}_{i}$, the number of passengers left at station $i$ of the small traffic route because, owing to the limited train capacity, they could not board the previous train.

Here $k_1(\delta)$ (or $k_2(\delta)$) denotes the train number on the large (or small) traffic route at the $\delta$-th time step.

In the present invention, each agent has its own action space, and the global action space $a_\delta$ can be expressed as

$$a_\delta = \{ h^{1}_{k_1(\delta)},\ h^{2}_{k_2(\delta)} \}$$

where $h^{1}_{k_1(\delta)}$ denotes the departure interval between train $k_1(\delta)+1$ and train $k_1(\delta)$ on the large traffic route, and $h^{2}_{k_2(\delta)}$ denotes the departure interval between train $k_2(\delta)+1$ and train $k_2(\delta)$ on the small traffic route.

As those skilled in the art understand, the departure intervals of large and small traffic route trains have limit ranges that ensure driving safety and service quality: $h^{1}_{k_1(\delta)}$ and $h^{2}_{k_2(\delta)}$ are restricted to $[h^{1}_{\min}, h^{1}_{\max}]$ and $[h^{2}_{\min}, h^{2}_{\max}]$, respectively. Because each agent is generally implemented with a neural network, the network outputs $\hat{a}_{\delta,1}$ and $\hat{a}_{\delta,2}$ are limited to $[-1, 1]$ to keep the reinforcement learning algorithm stable, and are then converted to the corresponding ranges by the following formulas to obtain the actual departure intervals:

$$h^{1}_{k_1(\delta)} = h^{1}_{\min} + \frac{\hat{a}_{\delta,1} + 1}{2}\left( h^{1}_{\max} - h^{1}_{\min} \right) \quad (2)$$

$$h^{2}_{k_2(\delta)} = h^{2}_{\min} + \frac{\hat{a}_{\delta,2} + 1}{2}\left( h^{2}_{\max} - h^{2}_{\min} \right) \quad (3)$$

In equations (2) and (3), $h^{1}_{\max}$ and $h^{2}_{\max}$ are the maximum departure intervals allowed on the large and small traffic routes, respectively, and $h^{1}_{\min}$ and $h^{2}_{\min}$ are the corresponding minimum departure intervals.
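Assuming the conversion in equations (2) and (3) is the usual linear rescaling of a bounded network output (for example a tanh output layer), it can be sketched as below. The clipping step, the function name `to_interval` and the example bounds of 120 s and 600 s are assumptions for illustration.

```python
def to_interval(raw, h_min, h_max):
    """Map a network output in [-1, 1] linearly onto [h_min, h_max]."""
    raw = max(-1.0, min(1.0, raw))  # clip in case the output leaves [-1, 1]
    return h_min + (raw + 1.0) / 2.0 * (h_max - h_min)


# Hypothetical bounds: 120 s minimum and 600 s maximum departure interval.
print(to_interval(-1.0, 120.0, 600.0))  # 120.0
print(to_interval(0.0, 120.0, 600.0))   # 360.0
print(to_interval(1.0, 120.0, 600.0))   # 600.0
```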

In step 102, the state group and the departure interval of the current time step are input into the first value network model to obtain the first evaluation value output by the first value network model, and the state group and the departure interval of the next time step are input into the second value network model to obtain the second evaluation value output by the second value network model. The first and second value network models are constructed in a similar way to the policy network models.

In step 103, the first policy network model is updated according to the departure interval of the current time step and the first evaluation value, yielding an updated first policy network model; the first value network model is updated according to the first evaluation value and the second evaluation value, yielding an updated first value network model. The second policy network model and the second value network model are obtained by updating from the first policy network model and the first value network model, respectively, according to preset parameters.

The update of the first value network model is driven by the reward error, which is determined from the first evaluation value, the second evaluation value and the reward function; that is, the first value network model is updated from the evaluation values before and after each iteration together with the reward function. The reward function specifies the relationship between the passenger waiting cost and the operating cost that the invention addresses, and balancing these two costs is the key to optimizing the train schedule.

The second policy network model is obtained by updating from the first policy network model according to the preset parameter, and the second value network model by updating from the first value network model according to the preset parameter. Specifically, all target networks are updated with a moving average:

$$\theta' \leftarrow \eta\,\theta + (1 - \eta)\,\theta' \quad (4)$$

$$\omega' \leftarrow \eta\,\omega + (1 - \eta)\,\omega' \quad (5)$$

In equation (4), $\theta$ is the parameter of the first policy network model and $\theta'$ is the parameter of the second policy network model; in equation (5), $\omega$ is the parameter of the first value network model and $\omega'$ is the parameter of the second value network model. $\eta \in (0,1)$ is a hyperparameter that must be tuned manually, i.e., the preset parameter.
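The moving-average (soft) target update can be sketched on flat parameter lists as follows. This assumes, as in common actor-critic implementations, that `eta` weights the first (online) network's parameters; the function name `soft_update` and the sample values are hypothetical.

```python
def soft_update(target, online, eta):
    """Moving-average update: target <- eta * online + (1 - eta) * target."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    return [eta * o + (1.0 - eta) * t for t, o in zip(target, online)]


theta_target = [0.0, 1.0]   # second (target) network parameters
theta_online = [1.0, 1.0]   # first (online) network parameters
print(soft_update(theta_target, theta_online, eta=0.25))  # [0.25, 1.0]
```

A small `eta` makes the second networks trail the first ones slowly, which stabilizes the reward target used in the value update.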

In step 104, after the preset condition is met, the state group of the target time step is input into the first policy network model to obtain the target departure interval output by the first policy network model. Those skilled in the art understand that each traffic route mode has its own policy network models and value network models; optionally, the embodiment of the present invention further includes:

For the large traffic route mode: inputting the large traffic route state group of the current time step into the first policy network model to obtain the large traffic route departure interval of the current time step; inputting the large traffic route state group of the next time step into the second policy network model to obtain the large traffic route departure interval of the next time step; inputting the large traffic route state group and departure interval of the current time step into the first value network model to obtain a first large traffic route evaluation value; inputting the large traffic route state group and departure interval of the next time step into the second value network model to obtain a second large traffic route evaluation value; updating the first policy network model according to the large traffic route departure interval of the current time step and the first large traffic route evaluation value; updating the first value network model according to the first and second large traffic route evaluation values; wherein the second policy network model is determined by updating from the first policy network model according to the moving-average parameter, and the second value network model is determined by updating from the first value network model according to the moving-average parameter; and, after the preset condition is met, inputting the large traffic route state group of the target time step into the first policy network model to obtain a first target departure interval output by the first policy network model.

For the small traffic route mode: the small traffic route state group of the current time step is input to the third policy network model to obtain the current-time-step departure interval of the small traffic route; the small traffic route state group of the next time step is input to the fourth policy network model to obtain the next-time-step departure interval of the small traffic route; the small traffic route state group and departure interval of the current time step are input to the third value network model to obtain a first small traffic route evaluation value; the small traffic route state group and departure interval of the next time step are input to the fourth value network model to obtain a second small traffic route evaluation value; the third policy network model is updated according to the current-time-step departure interval of the small traffic route and the first small traffic route evaluation value; the third value network model is updated according to the first and second small traffic route evaluation values; the fourth policy network model is obtained by updating from the third policy network model according to the moving average parameter, and the fourth value network model is obtained by updating from the third value network model according to the moving average parameter; and after the preset condition is met, the small traffic route state group of a target time step is input to the third policy network model to obtain the second target departure interval it outputs.
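The "moving average parameter" update of the second and fourth (target) networks reads like the soft target update used in DDPG-family methods. The sketch below assumes that form, target ← τ·online + (1 − τ)·target, with network parameters flattened into plain lists; the function name and `tau` are hypothetical, not taken from the patent.

```python
from typing import List


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Assumed moving-average update: target <- tau * online + (1 - tau) * target.

    target : parameters of the second (or fourth) network, flattened
    online : parameters of the first (or third) network, flattened
    tau    : hypothetical moving-average coefficient in (0, 1]
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


# A small tau makes the target network trail the online network slowly.
print(soft_update([0.0, 1.0], [1.0, 1.0], 0.5))
```

With τ = 1 the target is simply a copy of the online network; small values such as 0.01 are a common choice because they keep the evaluation targets stable between updates.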

Optionally, until the preset condition is met, state groups are continuously input to the first and second policy network models, which are continuously updated, and the continuously updated state groups and changing departure intervals are input to the continuously updated first and second value network models for iterative processing. After the preset condition is reached, the optimal policy network model is determined, the state group of the target time step is input to it, the target departure interval it outputs is obtained, and the train schedule is finally determined.

Optionally, the preset condition is any one of the following conditions:

the train departure time exceeds the traffic operation time;

no passengers are left over at any station.

Optionally, when the train departure time exceeds the traffic operation time and no passengers remain at any station, the iteration stops, and the final policy network parameters are determined from the policy network parameters of the last iteration and the final update value, from which the final policy network model is constructed. The final update value is determined from the first learning parameter, the influence gradient of the departure interval in the last iteration, and the influence gradient of the first evaluation value in the last iteration.

The traffic operation time is the train operating period, for example from 6:00 in the morning to 8:00 in the evening, or from 5:00 in the morning to midnight. To ensure that every passenger can board a train, the invention completes the iteration only when no passengers remain at any station.

Optionally, in the large/small traffic route mode, the schedules of the large and small traffic routes must satisfy certain constraints to ensure normal train operation. Because departure times within a single traffic route already satisfy the constraints, constraint checking reduces to detecting schedule conflicts between the large and the small traffic route. Since the running time between stations, the dwell time at each station and the turnaround time at the terminal are known, it suffices to check whether the large and small traffic route trains at stations $s_a$ and $s_b$ satisfy the constraint. Specifically, large and small traffic route trains must satisfy a minimum headway constraint at station $s_a$ in the up direction (station $s_b$ in the down direction).

For the $k_1(\delta)$-th large traffic route train in the up direction, the departure time at station $s_a$ is:

Assuming its minimum headway constraint, i.e., the preset constraint interval, is $h_{\min}$, the safe departure window of the $k_1(\delta)$-th train at station $s_a$ is:

It is therefore necessary to check whether the departure time of any small traffic route train at station $s_a$ falls within this window. If it does, the constraint is violated; the environment then returns a large negative reward and is reset to its initial state. Violation of this constraint is one of the termination conditions among the preset conditions.
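The conflict check can be sketched as follows. Because the interval formula itself is not reproduced above, the sketch assumes the safety window is the open interval $(t - h_{\min}, t + h_{\min})$ around the large traffic route departure time $t$ at $s_a$; the penalty value and all names are hypothetical, since the text only says "a large negative number".

```python
from typing import Sequence, Tuple

# Hypothetical penalty; the text only specifies "a large negative number".
CONFLICT_PENALTY = -1e6


def violates_headway(t_large: float, small_departures: Sequence[float], h_min: float) -> bool:
    """True if any small-route departure at s_a lies strictly inside the
    assumed safety window (t_large - h_min, t_large + h_min)."""
    lower, upper = t_large - h_min, t_large + h_min
    return any(lower < t < upper for t in small_departures)


def conflict_outcome(t_large: float, small_departures: Sequence[float],
                     h_min: float) -> Tuple[float, bool]:
    """Return (penalty, done); done=True means the environment is reset."""
    if violates_headway(t_large, small_departures, h_min):
        return CONFLICT_PENALTY, True
    return 0.0, False


# Large-route train leaves s_a at minute 480, h_min = 2 min.
print(conflict_outcome(480.0, [475.0, 481.0], 2.0))  # small train at 481 conflicts
print(conflict_outcome(480.0, [475.0, 485.0], 2.0))  # no conflict
```

Whether a departure exactly $h_{\min}$ away counts as a conflict depends on whether the original window is open or closed; the strict inequalities here are a choice, not something the text settles.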

In step 105, the train schedule of the target time step is determined according to the initial departure time of the target time step and the target departure interval. Optionally, this includes:

determining the departure time at each station along the route of the train number from the initial departure time of the train number corresponding to the target time step, the target departure interval, the running time between stations and the dwell time at each station, and determining the train schedule of the target time step from the departure times at all stations.

In one such embodiment, assume four stations: a first, second, third and fourth station. The current train departs at 8:00, and the running times between stations are known: 11 minutes from the first to the second station, 13 minutes from the second to the third, and 8 minutes from the third to the fourth. In the large/small traffic route mode there is a schedule for both the outbound and the return direction; for example, the return trip takes 8 minutes from the fourth to the third station, 13 minutes from the third to the second, and 11 minutes from the second to the first. More specifically, the train dwells 2 minutes at the first station, 3 minutes at the second and 2 minutes at the third. Once the departure interval at each time step is known, the departure times of all trains at every station, through to the fourth station, can be determined.
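The outbound half of this example can be computed with a short sketch. It assumes the "8" in the example means a first departure at 8:00 (minute 480), accumulates running time plus dwell time station by station, and stops at the arrival at the fourth (terminal) station because no turnaround time is given. The 6-minute interval for the second train and all function names are hypothetical.

```python
def outbound_times(first_departure, run_times, dwell_times):
    """Departure times at every station except the terminal, plus the
    arrival time at the terminal (turnaround not included).

    run_times[i]   : running time from station i to station i + 1 (min)
    dwell_times[i] : dwell time at station i (min); dwell_times[0] is unused
                     because first_departure is already the origin departure.
    """
    departures = [first_departure]
    for i in range(1, len(run_times)):
        departures.append(departures[-1] + run_times[i - 1] + dwell_times[i])
    terminal_arrival = departures[-1] + run_times[-1]
    return departures, terminal_arrival


def timetable(first_departure, intervals, run_times, dwell_times):
    """One (departures, terminal_arrival) row per train; `intervals` are the
    departure intervals between consecutive trains at the origin."""
    t = first_departure
    rows = [outbound_times(t, run_times, dwell_times)]
    for h in intervals:
        t += h
        rows.append(outbound_times(t, run_times, dwell_times))
    return rows


def hhmm(minutes):
    """Format minutes after midnight as HH:MM."""
    return f"{int(minutes) // 60:02d}:{int(minutes) % 60:02d}"


# Runs 11/13/8 min, dwells 2/3/2 min, hypothetical 6-min interval.
for deps, arr in timetable(8 * 60, [6], [11, 13, 8], [2, 3, 2]):
    print([hhmm(d) for d in deps], "arrive terminal", hhmm(arr))
```

For the first train this gives departures at 08:00, 08:14 and 08:29 and arrival at the fourth station at 08:37; the second train is the same schedule shifted by the interval.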

Optionally, since the running time between stations, the dwell time at each station and the terminal turnaround time are fixed, the departure time at station i can be calculated by the following formula:

In formula (6), $w^i$ denotes the dwell time at station i and $z^{i-1,i}$ denotes the running time between stations i-1 and i. Note that, at a terminal station, formula (6) additionally includes the turnaround time.
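Formula (6) itself did not survive extraction. From the definitions of $w^i$ and $z^{i-1,i}$, a plausible reading is the recursion below. Here $d^i$ is a symbol introduced for this sketch (the departure time at station i), the terminal turnaround term is left out, and the numeric line assumes the worked example's first departure at 8:00 (minute 480), an 11-minute run and a 3-minute dwell at the second station.

```latex
d^{i} = d^{i-1} + z^{i-1,i} + w^{i}, \qquad
d^{2} = 480 + 11 + 3 = 494 \;(\text{08:14})
```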

Compared with intercity railway operation, urban rail transit operation involves a complex train timetable, and the quality of that timetable bears on both the enterprise's operating cost and the passengers' waiting cost: an overly long departure interval increases passengers' waiting cost and lowers their satisfaction, while an overly short one increases the enterprise's operating cost and affects its revenue. The train timetable refers to the arrival and departure times of the train at each station along the line. Note that, since the distance between stations is fixed, once the running speed between stations, the dwell time and the turnaround time are determined, the timetable depends only on the time at which the train departs from the initial departure station.

FIG. 8 is a schematic view of a traffic operation scene in the large/small traffic route mode provided by the present invention. With rapid social development, a single traffic route mode can no longer meet rail transit operation requirements, and rational train routing modes are drawing increasing attention. The large/small traffic route mode is one of the basic routing modes of rail transit and suits lines whose sectional passenger flow is unbalanced: based on the passenger flow characteristics of complex lines, it balances spatial differences in passenger flow distribution, significantly reduces enterprise operating costs, and meets passenger travel demand.

As shown in FIG. 8, the line has n stations in total, $s_1$ to $s_n$. Large traffic route trains run over the whole line between its start and end points ($s_1$ to $s_n$), and small traffic route trains run over the section with heavier passenger flow ($s_a$ to $s_b$). Suppose a train section (depot, not shown) near station $s_1$ dispatches and receives the large traffic route trains, and a stop line (not shown) near $s_a$ dispatches and receives the small traffic route trains. A large traffic route train leaves the depot, runs from $s_1$ to $s_n$, turns back, runs from $s_n$ back to $s_1$, and finally returns to the depot; a small traffic route train leaves the stop line, runs from $s_a$ to $s_b$, turns back, runs from $s_b$ back to $s_a$, and finally returns to the stop line. Optionally, the set of stations in the section served by the large traffic route is denoted $p_1$ and the set of stations in the section served by the small traffic route is denoted $p_2$. Clearly, $p_2 \subset p_1$ and

$p_1 \setminus p_2 = [s_1, s_a) \cup (s_b, s_n]$,

where $[s_1, s_a)$ and $(s_b, s_n]$ are the sets of stations belonging only to the large traffic route.

Optionally, the train schedule refers to the arrival and departure times of the train at each station along the route. Since the distance between stations is fixed, once the running speed between stations, the dwell time and the turnaround time are determined, the train schedule depends only on the time at which the train leaves the train section (depot) or stop line. For simplicity, once that departure time is fixed, the train's schedule over the whole line is taken as determined.

The invention adopts a multi-agent reinforcement learning algorithm to optimize the train schedule in the large/small traffic route mode: two agents are responsible for the large and small traffic route schedules respectively, and each agent observes only a local state. The interaction of the agents with the environment is shown in the following figure. The large and small traffic route agents each obtain a local state from the environment, make decisions based on it, and apply those decisions to the environment, which returns rewards and the local states of the next time step. Each agent aims to learn a policy function that obtains as much reward from the environment as possible. The local state comprises the departure time of a train on the large (or small) traffic route and the number of passengers left over at the stations. The action is the departure interval of trains on the large (or small) traffic route. The reward is the negative of the weighted sum of the enterprise operating cost of the large (or small) traffic route and the passenger waiting cost.

FIG. 9 is a fourth schematic flowchart of the train schedule determining method provided by the present invention. As shown in FIG. 9, the large traffic route observation is fed to the large traffic route policy network and the small traffic route observation to the small traffic route policy network; once the environment has been constructed, the reinforcement learning algorithm can be applied. Multi-agent deep deterministic policy gradient (MADDPG) is a multi-agent reinforcement learning method suited to continuous control. The large and small traffic route agents each have a policy network and a value network. The policy network is deterministic: for a given observation it outputs a fixed action. The value network takes the joint observation (state) and the actions of all agents as input and outputs a real number scoring how good it is to take those actions in the current state. The large traffic route policy network controls the large traffic route agent and the small traffic route policy network controls the small traffic route agent, while each route's value network evaluates the actions taken by all agents, and the score it gives guides the corresponding policy network toward improvement.

MADDPG is an off-policy method with a centralized-training, decentralized-execution structure. It reuses past experience through experience replay, storing collected experiences in an experience replay array; each experience is a quadruple $(s_\delta, a_\delta, r_\delta, s_{\delta+1})$, where $s_\delta = \{o_{\delta,1}, o_{\delta,2}\}$, $a_\delta = \{a_{\delta,1}, a_{\delta,2}\}$, $s_{\delta+1} = \{o_{\delta+1,1}, o_{\delta+1,2}\}$ and $r_\delta = \{r_{\delta,1}, r_{\delta,2}\}$.
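The replay array and the centralized value-network input can be sketched as below. The sketch assumes each local observation is a pair (initial departure time, left-over passengers), as described for the local state above, and each action is one departure interval; the class and function names are hypothetical.

```python
import random
from collections import deque
from typing import List, NamedTuple, Tuple

# Assumed local observation: (initial departure time, left-over passengers).
Obs = Tuple[float, float]


class Experience(NamedTuple):
    state: Tuple[Obs, Obs]          # s_delta = {o_delta,1, o_delta,2}
    actions: Tuple[float, float]    # departure intervals of both agents
    rewards: Tuple[float, float]    # r_delta = {r_delta,1, r_delta,2}
    next_state: Tuple[Obs, Obs]     # s_delta+1


class ReplayBuffer:
    """Fixed-size experience replay; the oldest experiences are dropped first."""

    def __init__(self, capacity: int) -> None:
        self._buf = deque(maxlen=capacity)

    def push(self, exp: Experience) -> None:
        self._buf.append(exp)

    def sample(self, batch_size: int) -> List[Experience]:
        return random.sample(list(self._buf), batch_size)

    def __len__(self) -> int:
        return len(self._buf)


def critic_input(state: Tuple[Obs, Obs], actions: Tuple[float, float]) -> List[float]:
    """Centralized value-network input: all observations, then all actions."""
    flat = [x for obs in state for x in obs]
    flat.extend(actions)
    return flat


exp = Experience(((480.0, 12.0), (485.0, 5.0)), (6.0, 4.0),
                 (-1.0, -2.0), ((486.0, 3.0), (485.0, 7.0)))
print(critic_input(exp.state, exp.actions))
```

Each policy network would still see only its own observation; only the value networks receive the concatenated joint input, which is what "centralized training, decentralized execution" refers to.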

FIG. 2 is a second schematic flowchart of the train schedule determining method provided by the present invention. Before the state group of the next time step is input to the second policy network model, the method further includes:

under the condition that the preset duration is less than the departure interval of the current time step, determining a state group of the train number corresponding to the current time step as a state group of the next time step;

In step 201, the initial departure time of the next time step is determined from the initial departure time and the departure interval of the current time step: in both the large and the small traffic route modes, it is the sum of the two.

In step 202, when the preset time duration is less than the departure interval of the current time step, the state group of the train number corresponding to the current time step is taken as the state group of the next time step. Specifically, the departure interval of the current time step is first determined and compared with the preset time duration; if the preset duration is shorter, the next time step has begun but the next train has not yet reached its departure time, so the state group of the current train number is used as the state group of the next time step.

In step 203, when the preset time duration is greater than or equal to the departure interval of the current time step, the next time step has begun and the departure time of the next train number has been reached, so the state group of the train number following the current one is taken as the state group of the next time step.

Based on this judgment, four cases may arise at each time step: first, in the next time step both the large and small traffic routes remain at the current train number; second, both advance to the next train number; third, the large traffic route advances to the next train number while the small traffic route remains at the current train number; and fourth, the small traffic route advances to the next train number while the large traffic route remains at the current train number.
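Steps 201 to 203 and the four resulting cases can be sketched per route as follows. The sketch tracks only the train number and its initial departure time; recomputing the left-over passenger counts is omitted, and all names are hypothetical.

```python
from typing import NamedTuple


class RouteState(NamedTuple):
    train: int          # train number the state currently refers to
    departure: float    # initial departure time of that train (min)


def advance(state: RouteState, interval: float, step: float) -> RouteState:
    """Apply one common time step of length `step` to one route.

    step <  interval: the next train has not departed yet, so the state stays
                      on the current train (step 202).
    step >= interval: the state moves to the next train, whose initial
                      departure time is current + interval (steps 201, 203).
    """
    if step < interval:
        return state
    return RouteState(state.train + 1, state.departure + interval)


def which_case(large_moved: bool, small_moved: bool) -> int:
    """Map the two routes' outcomes to the four cases listed above."""
    if not large_moved and not small_moved:
        return 1
    if large_moved and small_moved:
        return 2
    return 3 if large_moved else 4


large, small = RouteState(0, 480.0), RouteState(0, 482.0)
new_large = advance(large, interval=5.0, step=5.0)   # moves to train 1 at 485
new_small = advance(small, interval=8.0, step=5.0)   # stays on train 0
print(which_case(new_large != large, new_small != small))
```

Because both agents share one time step, the routes can drift apart in train numbering, which is why the four cases have to be handled separately.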

Fig. 3 is a schematic flow chart of updating a first policy network model provided by the present invention, where the updating the first policy network model according to the current time step departure interval and the first evaluation value includes:

In step 1031, a first update value is determined according to the product of the first learning parameter, the influence gradient of the current departure interval, and the influence gradient of the first evaluation value.

In step 1032, updated first policy network parameters are determined from the first policy network parameters of the first policy network model and the first update value, and the first policy network model is updated with them; specifically, see the following formula:

In formula (7), $\theta_{new}$ denotes the updated policy network parameters, $\theta_{now}$ the current parameters of the policy network model, and $\beta$ the first learning parameter; the remaining two factors are the influence gradient of the current departure interval and the influence gradient of the first evaluation value. An updated policy network model is then established from the updated policy network parameters.
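Formula (7) is not reproduced in the text. Its description (learning parameter times the gradient of the departure interval times the gradient of the evaluation value) matches the deterministic policy gradient step of DDPG-type methods, which MADDPG applies per agent. Written in that standard form, with $\mu$ (the policy network), $q$ (the value network) and $w$ (its parameters) as symbols introduced here, it would read:

```latex
\theta_{\text{new}} = \theta_{\text{now}}
  + \beta \, \nabla_{\theta}\, \mu(s_{\delta}; \theta_{\text{now}}) \,
    \nabla_{a}\, q(s_{\delta}, a; w) \Big|_{a = \mu(s_{\delta}; \theta_{\text{now}})}
```

The product of the two gradients is the chain rule through the departure interval, and the whole $\beta$-weighted term plays the role of the first update value. In MADDPG the value network also receives the other agent's observation and action, which are held fixed when differentiating.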

Optionally, when k denotes the large traffic route mode, a first large traffic route update value is determined from the first learning parameter, the influence gradient of the current-time-step departure interval of the large traffic route, and the influence gradient of the first large traffic route evaluation value; the updated first policy network parameters are then determined from the parameters of the first policy network model and this update value, and the first policy network model is updated accordingly.

When k denotes the small traffic route mode, a first small traffic route update value is determined from the first learning parameter, the influence gradient of the current-time-step departure interval of the small traffic route, and the influence gradient of the small traffic route evaluation value; the updated small traffic route policy network parameters are then determined from the parameters of the small traffic route policy network model and this update value, and that policy network model is updated with them.

FIG. 4 is a schematic flowchart of updating the first value network model provided by the present invention; updating the first value network model according to the first evaluation value and the second evaluation value includes:

In step 1033, the reward function of the current train number is determined from the passenger waiting cost and the enterprise operating cost, where the passenger waiting cost is determined from the total waiting time spent by passengers at all stations in the current time step. The core of reinforcement learning is to obtain as much reward as possible through interaction with the environment, so the reward is crucial to the algorithm. The reward function can be regarded as the negative of a cost function, so the cost is calculated first. The cost has two parts: the passenger waiting cost and the enterprise operating cost. The enterprise operating cost generally comprises train set configuration, operation, depreciation, maintenance and other costs. Because the method optimizes train departure times rather than controlling train running speed, the cost of each individual train run cannot be calculated accurately; the total cost of each train run is therefore treated as identical and denoted CO.

The passenger waiting cost is the product of the passenger waiting time and the value per unit time. A first weighted value is the product of the passenger waiting cost and a first weight, a second weighted value is the product of the enterprise operating cost and a second weight, and the reward function of the current train number is determined from the two weighted values. The sum of the first weight and the second weight is a preset constant.

Specifically, the passenger waiting cost is the total waiting time spent by passengers at all stations for the current train number multiplied by the value per unit time.

Optionally, a reward function of the current train number is determined according to the first weighted value and the second weighted value, specifically, the following formula is referred to:

In formula (8), $\alpha \in [0, 1]$ is the weight used to form the weighted sum of the two costs, $CP_k$ is the passenger waiting cost, and CO is the enterprise operating cost.
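Formula (8) is not reproduced above. Combining the statements that the reward is the negative of the cost, that the two weights sum to a preset constant, and that $\alpha$ lies between 0 and 1, a natural reading is $r_k = -(\alpha\, CP_k + (1-\alpha)\, CO)$; the sketch below assumes exactly that form, with hypothetical function names.

```python
def passenger_waiting_cost(total_wait_min: float, value_per_min: float) -> float:
    """CP_k: total passenger waiting time multiplied by the value of one minute."""
    return total_wait_min * value_per_min


def reward(total_wait_min: float, value_per_min: float,
           operating_cost: float, alpha: float) -> float:
    """Assumed form of formula (8): r_k = -(alpha * CP_k + (1 - alpha) * CO)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    cp = passenger_waiting_cost(total_wait_min, value_per_min)
    return -(alpha * cp + (1.0 - alpha) * operating_cost)


# 1000 passenger-minutes valued at 0.5 per minute, CO = 800, equal weights.
print(reward(1000.0, 0.5, 800.0, 0.5))
```

Raising $\alpha$ shifts the optimum toward shorter departure intervals (less waiting); lowering it favours fewer trains (lower operating cost).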

Since the passenger waiting cost is the product of the passenger waiting time and the value per unit time, and the value per unit time is fixed and known, calculating the passenger waiting cost reduces to calculating the passenger waiting time.

Optionally, the passenger waiting time is calculated in the same way for the large and small traffic routes. Taking the up direction of the large traffic route as an example, the passenger waiting time of the $(k_1(\delta)+1)$-th train is the total time that passengers at all stations spend waiting for that train.

At any station i belonging only to the large traffic route, the total number of passengers waiting for the $(k_1(\delta)+1)$-th train is:

The total passenger waiting time at such a station is therefore:

At a station i shared by the large and small traffic routes, the total number of passengers waiting for the $(k_1(\delta)+1)$-th train is:

Part a: passengers whose destinations are also shared stations. They can ride either a large or a small traffic route train, and their waiting time is:

Part b: passengers whose destinations are stations served only by the large traffic route. They can ride only large traffic route trains, and their waiting time is:

The total passenger waiting time at shared station i is therefore:

In summary, the total passenger waiting time for the $(k_1(\delta)+1)$-th train in the up direction of the large traffic route is:

The down direction of the large traffic route can be calculated in the same way and is not described again here.
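Formulas (9) to (12) are not reproduced above, so the sketch below uses a common simplification rather than the patent's exact expressions: left-over passengers wait the whole headway, and new arrivals come at a constant rate and wait half the headway on average. At a shared station, parts a and b would each be fed through the same function with the headway of the trains they can actually ride. All names are hypothetical.

```python
from typing import Iterable, Tuple


def station_wait_time(left_over: float, arrival_rate: float, headway: float) -> float:
    """Total waiting time (passenger-minutes) at one station for one train.

    Assumptions (not from the patent): left-over passengers wait the whole
    headway; new arrivals are uniform over the headway, waiting headway / 2
    on average, so their total is rate * headway**2 / 2.
    """
    return left_over * headway + arrival_rate * headway * headway / 2.0


def total_wait_time(stations: Iterable[Tuple[float, float, float]]) -> float:
    """Sum over stations of (left_over, arrival_rate, headway) tuples."""
    return sum(station_wait_time(*s) for s in stations)


# 10 left-over passengers, 2 new passengers/min, 6-min headway -> 96 pax-min.
print(station_wait_time(10, 2, 6))
```

The quadratic term is why halving the departure interval cuts the waiting time of newly arriving passengers by roughly a factor of four.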

In step 1034, a reward target is determined from the reward function and the second evaluation value according to the following formula:

In formula (13), the left-hand side is the reward target, $r_{\delta,k}$ is the reward function, $\gamma$ is the discount factor (typically 0.99), and $q_{\delta+1,k}$ is the second evaluation value.

In step 1035, a reward error is determined according to the first evaluation value and the reward target, and the reward error may be determined according to the following formula:

In formula (14), $\phi_{\delta,k}$ is the reward error, $q_{\delta,k}$ is the first evaluation value, and the remaining term is the reward target.
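Formulas (13) and (14) are likewise missing. Using the symbols defined for them, and writing $y_{\delta,k}$ (a symbol introduced here) for the reward target, the standard temporal-difference form consistent with these definitions is shown below; the sign convention of the error (evaluation value minus target) is an assumption.

```latex
y_{\delta,k} = r_{\delta,k} + \gamma \, q_{\delta+1,k} \qquad (13)

\phi_{\delta,k} = q_{\delta,k} - y_{\delta,k} \qquad (14)
```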

In step 1036, a second update value is determined from the product of a second learning parameter, the reward error, and the influence gradient of the first evaluation value.

In step 1037, updated value network parameters are determined from the value network parameters of the first value network model and the second update value, and the first value network model is updated with them, according to the following formula:

In formula (15), the three terms denote the second update value, the value network parameters of the first value network model, and the updated value network parameters, respectively.
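Steps 1034 to 1037 can be sketched end to end for the simplest possible value network. The sketch assumes a linear critic $q = w \cdot x$ (so the gradient of the first evaluation value with respect to $w$ is just the input $x$), a reward error defined as evaluation value minus reward target, and a gradient-descent form of formula (15). Names and the learning-rate default are hypothetical; $\gamma = 0.99$ is the typical value the text gives.

```python
from typing import List, Tuple


def critic_update(w: List[float], x: List[float], reward: float, q_next: float,
                  gamma: float = 0.99, lr: float = 0.01) -> Tuple[List[float], float]:
    """One temporal-difference step for a linear value network q(x; w) = w . x.

    x      : joint state/action features fed to the first value network
    q_next : second evaluation value from the second (target) value network
    lr     : hypothetical second learning parameter
    """
    q = sum(wi * xi for wi, xi in zip(w, x))   # first evaluation value
    target = reward + gamma * q_next           # reward target, formula (13)
    error = q - target                         # reward error, formula (14)
    # Linear critic: the gradient of q with respect to w is x, so the second
    # update value is lr * error * x, subtracted from the parameters (15).
    new_w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return new_w, error


new_w, err = critic_update([0.0, 0.0], [1.0, 2.0], reward=-1.0, q_next=0.0)
print(new_w, err)
```

A real value network would replace the dot product with a neural network and the hand-written gradient with automatic differentiation, but the target, error and update follow the same three steps.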

FIG. 5 is a third schematic flowchart of the train schedule determining method provided by the present invention. Before the state group of the current time step is input to the first policy network model, the method further includes:

In step 301: the present invention in fact uses two sets of models. One set of first policy, second policy, first value and second value network models optimizes the departure interval of the large traffic route in the large traffic route mode, and another such set optimizes the departure interval of the small traffic route in the small traffic route mode. The two sets are independent of each other; at each time step, a state group is determined for both the large and the small traffic route mode, and whether the state group of the current or the next train number is used follows the embodiment of FIG. 2. In the present invention, the optimization of the large/small traffic route train schedules is formulated as a Markov decision process in which two agents adjust the large and small traffic route schedules respectively. The decisions of the two agents are coordinated through a common time step: at each time step, the agents acquire the environment state and make corresponding decisions.

The large traffic route state group consists of the initial departure time of the train number corresponding to the time step and the total number of left-over passengers; the small traffic route state group is formed in the same way from the corresponding small traffic route quantities.

When the state group is a large traffic route state group, it is determined from the initial departure time of the train number corresponding to its time step and the total number of left-over passengers, at all large traffic route stations along that train's route, who could not board that train. Steps 301 and 302 are executed simultaneously and describe how the state groups are constructed in the two traffic route modes. For example, in step 301, the time step of the large traffic route state group is determined first, then whether the corresponding train number is the current or the next train number, and then the initial departure time of that train number.

Further, all large traffic route stations along the train number corresponding to the time step are determined, and the total number of the left passengers who cannot take trains of the train number corresponding to the large traffic route state group in the stations is determined.

In step 302, the common stations are the stations in the overlap between the stations along the large traffic route and those along the small traffic route. The time step of the small traffic route state group is determined, then whether the corresponding train number is the current or the next train number, and then the initial departure time of that train number. Further, all common stations along the route of that train number are determined, together with the total number of left-over passengers at those stations who could not board that train.
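The state-group construction of steps 301 and 302 can be sketched as below. Which station set feeds each agent is passed in as a parameter: the large traffic route agent sums over its large traffic route stations and the small traffic route agent over the common stations. The function name, station labels and counts are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def state_group(initial_departure: float,
                left_over: Dict[str, int],
                stations: Iterable[str]) -> Tuple[float, int]:
    """(initial departure time, total left-over passengers over `stations`)."""
    return initial_departure, sum(left_over.get(s, 0) for s in stations)


# Hypothetical left-over counts; s2 and s3 are the common stations.
left = {"s1": 3, "s2": 5, "s3": 0, "s4": 2}
print(state_group(480.0, left, ["s1", "s2", "s3", "s4"]))  # large traffic route
print(state_group(485.0, left, ["s2", "s3"]))              # small traffic route
```

Keeping each state group to two numbers gives the policy networks a compact local observation, which is what lets each agent act on its own route without seeing the other's full state.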

For both the large and small traffic routes, the initial departure time of the train number corresponding to the time step of a state group is determined in the same way, specifically from the passenger travel matrix:

Taking up-direction passenger data as an example, the large and small traffic route modes each determine a state group at the same time step $\delta$. For train number $k_1(\delta)$, the state of the large traffic route is:

and the action is:

From this, the departure time of train number $k_1(\delta)+1$ can be derived:

When the preset time step duration is less than the departure interval, i.e., train number $k_1(\delta)+1$ has not yet been dispatched at time step $\delta+1$, the state of the large traffic route at time step $\delta+1$ is:

When the preset time step duration is greater than or equal to the departure interval, i.e., train number $k_1(\delta)+1$ has already been dispatched by the next time step, the initial departure time in the large traffic route state at time step $\delta+1$ is the departure time of the $(k_1(\delta)+1)$-th train:

The initial departure time of the train number for small traffic route trains follows the same scheme and is not described again here.

Fig. 6 is one of the schematic flow diagrams of determining a state group of a next time step provided by the present invention, where in a case that the state group is a large intersection state group, the determining a state group of a next train number corresponding to a train number at a current time step as a state group of the next time step includes:

determining the actual number of passengers of each station to reach the designated station according to the remaining bearing capacity of the next train and the total number of passengers waiting for taking the next train at each station;

Here, the large-route-only stations are the stations along the large traffic route that are not also stations along the small traffic route.
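As a small illustration of this station partition, the following Python sketch splits the large route's stations into shared stations and large-route-only stations. It assumes, hypothetically, that each route is given as a list of station identifiers in running order and that the small route is a contiguous section [s_a, s_b] of the large route; the function name is illustrative only.

```python
def partition_stations(large_route, small_route):
    """Split the large route's stations into shared and large-route-only stations.

    Both arguments are station identifiers listed in running order (assumed input format).
    """
    small = set(small_route)
    shared = [s for s in large_route if s in small]          # e.g. [s_a, s_b]
    large_only = [s for s in large_route if s not in small]  # e.g. [s_1, s_a) and (s_b, s_n]
    return shared, large_only


large = ["s1", "s2", "s3", "s4", "s5", "s6"]
small = ["s3", "s4", "s5"]
shared, large_only = partition_stations(large, small)
print(shared)      # ['s3', 's4', 's5']
print(large_only)  # ['s1', 's2', 's6']
```

With s_a = s3 and s_b = s5, the large-route-only stations are exactly [s₁, s_a) and (s_b, s_n].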

In step 3011, the total number of passengers at each large-route-only station waiting for the next train number is determined from the number of passengers at that station who could not board the current train number and are forced to wait for the next one, and the number of passengers newly entering that station who depart from it for the designated station. At stations that belong only to the large traffic route (e.g., [s₁, s_a) and (s_b, s_n]), passengers can only ride large-route trains. For a given OD matrix, the number of passengers waiting at station i for train k₁(δ)+1 can be calculated by the following formula:

where

denotes the number of passengers at large-route-only station i who could not board train k₁(δ) and are forced to wait for train k₁(δ)+1. That is, formula (16) states that the number of passengers waiting at station i for train k₁(δ)+1,

equals the number of passengers who could not board train k₁(δ), plus the number of passengers who arrive at station i between the departure of train k₁(δ) from station i and the departure of train k₁(δ)+1 from station i.
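Formula (16) can be sketched as follows for a single large-route-only station i. The data layout is an assumption: `od_counts` maps an entry minute t to the number of passengers entering station i at t for each destination j (one slice of the OD matrix), and the arrival window is taken as half-open, from the departure of k₁(δ) up to but excluding the departure of k₁(δ)+1. The function name is hypothetical.

```python
def waiting_for_next_train(left_over, od_counts, dep_current, dep_next):
    """Formula (16) at one large-route-only station i.

    left_over: {destination j: passengers who could not board train k1(delta)}
    od_counts: {entry minute t: {destination j: passengers entering station i at t}}
    dep_current, dep_next: departure times of trains k1(delta) and k1(delta)+1 from station i
    Returns {destination j: passengers waiting at station i for train k1(delta)+1}.
    """
    waiting = dict(left_over)  # start from the passengers left behind by train k1(delta)
    for t, row in od_counts.items():
        # add passengers who entered between the two departures (half-open window assumed)
        if dep_current <= t < dep_next:
            for j, n in row.items():
                waiting[j] = waiting.get(j, 0) + n
    return waiting


left = {"s5": 4}
od = {0: {"s5": 1}, 3: {"s5": 2, "s6": 1}, 6: {"s6": 5}}
print(waiting_for_next_train(left, od, dep_current=0, dep_next=5))  # {'s5': 7, 's6': 1}
```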

In step 3012, the actual number of passengers at each station bound for the designated station is determined according to the remaining capacity of the next train and the total number of passengers at each station waiting for it. If the number of passengers waiting for train k₁(δ)+1 exceeds the remaining capacity of the train, some passengers cannot board and are forced to wait for train k₁(δ)+2. When train k₁(δ)+1 arrives at station i, the remaining capacity it can provide is calculated as follows.

In formula (18), Γ represents the total train capacity, and the two remaining terms represent, respectively, the number of passengers on board before the train arrives at station i and the number of passengers alighting at station i.

Further, the number of passengers at each large-route-only station i who actually board train k₁(δ)+1 bound for station j (the designated station) is calculated as follows:

Formula (19) means the following: if the remaining space on the train is sufficient for all passengers waiting at station i for train k₁(δ)+1 to board, then

Otherwise, the remaining space on the train is allocated in proportion to each destination's share of the waiting passengers, that is, the number of passengers at station i waiting to travel to station j divided by the total number of passengers waiting at station i. For example, if 5 of the 15 passengers waiting at station i are bound for station j, their share is one third; if the remaining capacity is 9, then 3 passengers bound for station j are allowed to board.
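A minimal sketch of formulas (18) and (19), assuming that (18) reads as the total capacity Γ minus the on-board load before station i plus the number alighting at station i. The text does not specify rounding, so the proportional share is left fractional here; the function names are illustrative.

```python
def remaining_capacity(total_capacity, onboard_before, alighting):
    """Formula (18): space available when the train is at station i."""
    return total_capacity - onboard_before + alighting


def board(waiting, capacity):
    """Formula (19): passengers at station i who actually board, per destination j.

    waiting: {destination j: passengers waiting at station i for this train}
    """
    total = sum(waiting.values())
    if total <= capacity:
        # Enough room: everyone waiting boards.
        return dict(waiting)
    # Not enough room: share the capacity in proportion to each destination's share of the queue.
    return {j: capacity * n / total for j, n in waiting.items()}


cap = remaining_capacity(100, 95, 4)          # 9 places left
boarded = board({"j": 5, "other": 10}, cap)   # 5 of 15 passengers are bound for j
print(boarded)  # {'j': 3.0, 'other': 6.0}
```

This reproduces the worked example above: one third of the 9 free places goes to the passengers bound for station j.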

In step 3013, the number of left-over passengers at each station who are bound for the designated station but cannot board the next train number is determined from the total number of passengers departing from each station for the designated station and the actual number who board, which in turn gives the total number of left-over passengers at the large-route-only stations who cannot board the next train number. Combined with step 3012, the number of passengers on board before the train reaches station i is:

In equation (20), the number of passengers on board before the train arrives at station i equals the number on board before it arrives at station i-1, plus the number boarding at station i-1, minus the number alighting at station i-1. The number of passengers alighting at station i

is

Therefore, the number of passengers at each station who are bound for the designated station but cannot board the next train number, i.e., the total number of passengers left over at station i who cannot board train k₁(δ)+1,

is

and:

In step 3014, the total number of left-over passengers at all stations who cannot board the next train number is determined from the total at the large-route-only stations and the total at the shared stations, and the state group of the next time step is determined from the initial departure time of the next time step and this total.
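The load recursion of formula (20) and the left-over count of step 3013 can be sketched as below. Stations are indexed along the running direction of one train, the train is assumed empty before the first station unless `initial_load` is given, and the left-over count is taken as waiting minus boarded per destination, which is a reading of the formulas not reproduced in this text.

```python
def loads_before_stations(boarding, alighting, initial_load=0):
    """Formula (20): on-board load before each station along the run.

    boarding[i], alighting[i]: passengers boarding / alighting at the i-th station.
    Returns loads[i] = load before station i.
    """
    loads = [initial_load]
    for on, off in zip(boarding[:-1], alighting[:-1]):
        # load before i = load before i-1 + boarding at i-1 - alighting at i-1
        loads.append(loads[-1] + on - off)
    return loads


def left_over(waiting, boarded):
    """Passengers at a station, per destination j, who could not board (waiting - boarded)."""
    return {j: n - boarded.get(j, 0) for j, n in waiting.items()}


print(loads_before_stations([10, 6, 0], [0, 4, 12]))  # [0, 10, 12]
rest = left_over({"j": 5, "other": 10}, {"j": 3.0, "other": 6.0})
print(rest, sum(rest.values()))  # {'j': 2.0, 'other': 4.0} 6.0
```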

Fig. 7 is the second schematic flow diagram of determining the state group of the next time step provided by the present invention. When the state group is a small traffic state group, determining the state group of the next train number corresponding to the train number at the current time step as the state group of the next time step includes:

In step 3021, the total number of passengers at each shared station waiting for the next train number is determined from: the number of passengers at that station who could not board the current train number and are forced to wait for the next one; the number of newly entered passengers departing from that shared station for a designated station among the shared stations; and the number of newly entered passengers departing from that shared station for a large-route-only station.

The stations shared by the large and small traffic routes are denoted [s_a, s_b]. In the small traffic route calculation of the present invention, only the calculation of the number of passengers waiting at station i for train k₁(δ)+1,

differs; all other calculations are similar to those for the stations that belong only to the large traffic route (e.g., [s₁, s_a) and (s_b, s_n]).

At the stations shared by the large and small traffic routes ([s_a, s_b]), a passenger whose destination also lies in the [s_a, s_b] section may ride either a large-route train or a small-route train (assuming the passenger boards whichever train arrives first), whereas a passenger whose destination lies outside the [s_a, s_b] section can only ride a large-route train. Because station i is shared by both routes, the train that arrives at station i just before train k₁(δ)+1 may be train k₁(δ) or a small-route train; for ease of description, this train is denoted k(δ).

The number of passengers waiting at station i for train k₁(δ)+1 is calculated as follows:

In formula (22), a represents the number of newly entered passengers departing from a shared station for a designated station among the shared stations, i.e., passengers whose destination also lies in the [s_a, s_b] section. Such passengers may ride either a large-route or a small-route train, so a is calculated as:

where j ∈ [s_a, s_b].

In formula (22), considering only the up direction, b represents the number of newly entered passengers departing from a shared station for a large-route-only station, i.e., passengers whose destination lies in the (s_b, s_n] section. Since these passengers can only ride a large-route train, b is calculated as:

where j ∈ (s_b, s_n]. Note that the formulas for a and b differ not only in the range of destination j but also in the lower limit of the accumulation time: the lower limit for a is

while the lower limit for b is

Steps 3022 and 3023 use the same calculations as in the large traffic route case and are not repeated here (see steps 3012 and 3013). Step 3022 determines the actual number of passengers at each station bound for the designated station from the remaining capacity of the next train and the total number of passengers waiting for it at each station. Step 3023 determines the number of left-over passengers at each station bound for the designated station who cannot board the next train, from the total number departing from each station for the designated station and the actual number who board, and hence the total number of left-over passengers at each station who cannot board the next train.
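Formula (22) at a shared station can be sketched as left-over passengers plus a plus b. The accumulation limits below are assumptions, since the formulas themselves are not reproduced here: a counts entries since train k(δ) (the train of either route just before k₁(δ)+1) left station i, b counts entries since large-route train k₁(δ) left station i, and both windows end at the departure of k₁(δ)+1. The OD layout and function name are hypothetical, and `left_over` is a single per-station total.

```python
def shared_station_waiting(left_over, od_counts, shared_dests,
                           t_prev_any, t_prev_large, t_now):
    """Formula (22) at shared station i: waiting = left-over + a + b (up direction only).

    od_counts: {entry minute t: {destination j: passengers}} for station i (assumed layout).
    shared_dests: destinations in [s_a, s_b]; all other keys are treated as (s_b, s_n].
    t_prev_any: departure from i of train k(delta) (assumed lower limit for a).
    t_prev_large: departure from i of large-route train k1(delta) (assumed lower limit for b).
    t_now: departure from i of train k1(delta)+1 (assumed upper limit for both).
    """
    # a: bound for shared destinations; they could have taken either route's train.
    a = sum(n for t, row in od_counts.items() if t_prev_any <= t < t_now
            for j, n in row.items() if j in shared_dests)
    # b: bound for large-route-only destinations; only large-route trains serve them.
    b = sum(n for t, row in od_counts.items() if t_prev_large <= t < t_now
            for j, n in row.items() if j not in shared_dests)
    return left_over + a + b, a, b


od = {1: {"s4": 2, "s6": 1}, 4: {"s5": 3, "s6": 2}}
print(shared_station_waiting(1, od, {"s4", "s5"}, t_prev_any=3, t_prev_large=0, t_now=6))
# (7, 3, 3)
```

The two passengers bound for s4 at minute 1 are excluded from a because, under this reading, they could already have left on the small-route train k(δ) at minute 3.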

In step 3024, the total number of left-over passengers at all shared stations who cannot board the next train is determined from the per-station totals, and the state group of the next time step is expressed as a tuple of the initial departure time of the next time step and this total.
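Steps 3014 and 3024 both reduce the per-station left-over totals to a two-element state group. A minimal sketch, assuming per-station totals are already available and times are in minutes after midnight (both assumptions; function names are illustrative):

```python
def large_route_state(next_departure, left_large_only, left_shared):
    """Step 3014: (initial departure time of next step, left-over total over all stations)."""
    return (next_departure, sum(left_large_only.values()) + sum(left_shared.values()))


def small_route_state(next_departure, left_shared):
    """Step 3024: (initial departure time of next step, left-over total over shared stations)."""
    return (next_departure, sum(left_shared.values()))


left_large_only = {"s1": 2, "s2": 0, "s6": 1}   # per-station left-over totals
left_shared = {"s3": 4, "s4": 0, "s5": 3}
print(large_route_state(426, left_large_only, left_shared))  # (426, 10)
print(small_route_state(426, left_shared))                   # (426, 7)
```

The only difference between the two state groups is the set of stations summed over.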

Fig. 10 is a schematic structural diagram of a train schedule determining apparatus provided by the present invention. The apparatus includes an execution unit 1, configured to repeatedly execute the following steps until a preset condition is met:

updating the first strategy network model according to the departure interval of the current time step and the first evaluation value; updating the first value network model according to the first evaluation value and the second evaluation value; wherein the second strategy network model is determined by updating the first strategy network model according to preset parameters, and the second value network model is determined by updating the first value network model according to the preset parameters. For the working principle of execution unit 1, reference may be made to step 101 above, which is not repeated here.

The train schedule determining apparatus further includes an acquiring unit 2; for the working principle of acquiring unit 2, reference may be made to step 102, which is not repeated here.

The train schedule determining apparatus further includes a determining unit 3, configured to determine the train schedule of the target time step according to the initial departure time of the target time step and the target departure interval; for the working principle of determining unit 3, reference may be made to step 103, which is not repeated here.

The time step is a time slice of preset duration.
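Determining unit 3 turns an initial departure time and a departure interval into a timetable for one time step. A minimal sketch, assuming times in minutes, that every departure within the time step uses the same target interval, and that departures at or after the end of the slice belong to the next time step (the function name is hypothetical):

```python
def timetable(initial_departure, interval, step_start, step_length):
    """Departure times in one time step: t0, t0 + h, t0 + 2h, ... up to the end of the slice."""
    if interval <= 0:
        raise ValueError("departure interval must be positive")
    step_end = step_start + step_length
    departures = []
    t = initial_departure
    while t < step_end:  # departures at or after the slice end belong to the next time step
        departures.append(t)
        t += interval
    return departures


# Minutes after midnight: a 30-minute slice starting at 07:00 with a 6-minute interval.
print(timetable(420, 6, step_start=420, step_length=30))  # [420, 426, 432, 438, 444]
```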

Fig. 11 is a schematic structural diagram of an electronic device provided by the present invention. As shown in Fig. 11, the electronic device may include a processor 110, a communications interface 120, a memory 130 and a communication bus 140, where the processor 110, the communications interface 120 and the memory 130 communicate with one another via the communication bus 140. The processor 110 may invoke logic instructions in the memory 130 to perform a train schedule determining method comprising: repeatedly executing the following steps until a preset condition is met: inputting a state group of a current time step to a first strategy network model, and acquiring a departure interval of the current time step output by the first strategy network model; inputting a state group of a next time step to a second strategy network model, and acquiring a departure interval of the next time step output by the second strategy network model; inputting the state group of the current time step and the departure interval of the current time step to a first value network model, and acquiring a first evaluation value output by the first value network model; inputting the state group of the next time step and the departure interval of the next time step to a second value network model, and acquiring a second evaluation value output by the second value network model; updating the first strategy network model according to the departure interval of the current time step and the first evaluation value; and updating the first value network model according to the first evaluation value and the second evaluation value; wherein the second strategy network model is determined by updating the first strategy network model according to preset parameters, and the second value network model is determined by updating the first value network model according to the preset parameters; after the preset condition is met, inputting a state group of a target time step to the first strategy network model, and acquiring a target departure interval output by the first strategy network model; and determining a train timetable of the target time step according to the initial departure time of the target time step and the target departure interval; wherein the state group is either a large traffic state group or a small traffic state group, and the time step is a time slice of preset duration.
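The training loop described here has the structure of an actor-critic method with target networks, similar to DDPG. The sketch below is one reading under stated assumptions, not the patent's implementation: the "preset parameters" are taken to be a soft-update rate τ, the value network update is taken to be a temporal-difference step that also uses a per-step reward r and discount γ (neither appears in this passage; r might, for instance, be the negative of the operating plus waiting cost), and linear models stand in for the neural networks. The learning rate `lr` in `actor_step` plausibly plays the role of the "first learning parameter" in claim 3. All names are hypothetical.

```python
def soft_update(target, source, tau):
    """Second (target) network <- tau * first (online) network + (1 - tau) * target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


def td_error(q_first, q_second, reward, gamma):
    """Critic error: reward + gamma * second evaluation value - first evaluation value."""
    return reward + gamma * q_second - q_first


def critic_step(w, features, error, lr):
    """Linear critic Q = w . features; gradient step on 0.5 * error**2 with the target held fixed."""
    return [wi + lr * error * f for wi, f in zip(w, features)]


def actor_step(theta, state, dq_da, lr):
    """Linear policy a = theta . state; gradient ascent on Q via dQ/dtheta = dQ/da * state."""
    return [ti + lr * dq_da * s for ti, s in zip(theta, state)]


# One illustrative update with made-up numbers.
err = td_error(q_first=5.0, q_second=6.0, reward=-1.0, gamma=0.9)
w = critic_step([0.5, 0.5], [1.0, 2.0], err, lr=0.1)
theta = actor_step([0.2, 0.1], [1.0, 2.0], dq_da=0.3, lr=0.1)
w_target = soft_update([0.0, 0.0], w, tau=0.01)
```

A small τ makes the second networks trail the first networks slowly, which is the usual reason for keeping separate target networks: the second evaluation value used in the critic target changes gradually rather than at every step.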

In addition, the logic instructions in the memory 130 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention may be embodied in the form of a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

In another aspect, the present invention further provides a computer program product comprising a computer program, which may be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can execute the train schedule determining method provided by the above methods, the method comprising: repeatedly executing the following steps until a preset condition is met: inputting a state group of a current time step to a first strategy network model, and acquiring a departure interval of the current time step output by the first strategy network model; inputting a state group of a next time step to a second strategy network model, and acquiring a departure interval of the next time step output by the second strategy network model; inputting the state group of the current time step and the departure interval of the current time step to a first value network model, and acquiring a first evaluation value output by the first value network model; inputting the state group of the next time step and the departure interval of the next time step to a second value network model, and acquiring a second evaluation value output by the second value network model; updating the first strategy network model according to the departure interval of the current time step and the first evaluation value; and updating the first value network model according to the first evaluation value and the second evaluation value; wherein the second strategy network model is determined by updating the first strategy network model according to preset parameters, and the second value network model is determined by updating the first value network model according to the preset parameters; after the preset condition is met, inputting a state group of a target time step to the first strategy network model, and acquiring a target departure interval output by the first strategy network model; and determining a train timetable of the target time step according to the initial departure time of the target time step and the target departure interval; wherein the state group is either a large traffic state group or a small traffic state group, and the time step is a time slice of preset duration.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the train schedule determining method provided by the above methods, the method comprising: repeatedly executing the following steps until a preset condition is met: inputting a state group of a current time step to a first strategy network model, and acquiring a departure interval of the current time step output by the first strategy network model; inputting a state group of a next time step to a second strategy network model, and acquiring a departure interval of the next time step output by the second strategy network model; inputting the state group of the current time step and the departure interval of the current time step to a first value network model, and acquiring a first evaluation value output by the first value network model; inputting the state group of the next time step and the departure interval of the next time step to a second value network model, and acquiring a second evaluation value output by the second value network model; updating the first strategy network model according to the departure interval of the current time step and the first evaluation value; and updating the first value network model according to the first evaluation value and the second evaluation value; wherein the second strategy network model is determined by updating the first strategy network model according to preset parameters, and the second value network model is determined by updating the first value network model according to the preset parameters; after the preset condition is met, inputting a state group of a target time step to the first strategy network model, and acquiring a target departure interval output by the first strategy network model; and determining a train timetable of the target time step according to the initial departure time of the target time step and the target departure interval; wherein the state group is either a large traffic state group or a small traffic state group, and the time step is a time slice of preset duration.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A train schedule determining method, comprising:

repeatedly executing the following steps until the preset conditions are met:

the time step is a time slice with preset duration.

2. The train schedule determining method of claim 1, further comprising, before inputting the state group of the next time step into the second strategy network model:

3. The train schedule determining method according to claim 1, wherein the updating the first strategy network model according to the departure interval of the current time step and the first evaluation value comprises:

determining a first updating value according to the first learning parameter, the influence gradient of the departure interval of the current time step and the influence gradient of the first evaluation value;

4. The train schedule determining method according to claim 1, wherein the updating the first value network model according to the first evaluation value and the second evaluation value comprises:

5. The train schedule determining method according to claim 1, wherein the preset condition is any one of the following conditions:

the train departure time exceeds the operating hours;

the number of left-over passengers at all stations is 0;

wherein the constraint time interval of the large traffic route is determined by a constraint minimum value and a constraint maximum value; the constraint minimum value is the departure time of the large traffic route train at the shared station minus a preset constraint interval, and the constraint maximum value is that departure time plus the preset constraint interval.

6. The train schedule determining method according to claim 1, wherein the determining the train schedule of the target time step according to the initial departure time of the target time step and the target departure interval includes:

7. The train schedule determining method according to any of claims 2-6, further comprising, before inputting the state group of the current time step to the first strategy network model:

8. The train schedule determining method according to claim 7, wherein, in a case where the state group is a large traffic state group, the determining a state group of a next train number corresponding to the train number at the current time step as a state group of the next time step includes:

9. The train schedule determining method according to claim 7, wherein, in a case where the state group is a small traffic state group, the determining a state group of a next train number corresponding to the train number at the current time step as a state group of the next time step includes:

10. A train schedule determining apparatus, comprising:

the time step is a time slice with preset duration.

11. An electronic device comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor when executing the program implements a train schedule determination method according to any of claims 1 to 9.

12. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the train schedule determining method according to any one of claims 1 to 9.