CN115688971A

CN115688971A - Collaborative optimization method for line network passenger flow control and train adjustment under train delay

Info

Publication number: CN115688971A
Application number: CN202211163621.2A
Authority: CN
Inventors: 郭建媛; 卢伟康; 秦勇; 贾利民; 孙方; 孙琦; 王月玥; 唐雨昕; 李�杰
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2022-09-23
Filing date: 2022-09-23
Publication date: 2023-02-03
Anticipated expiration: 2042-09-23
Also published as: CN115688971B

Abstract

The invention provides a collaborative optimization method for line network passenger flow control and train adjustment under train delay. The method comprises the following steps: acquiring the time, position and duration characteristics of train delays, clustering the delay scenes of the railway line network according to these characteristics, and iteratively generating random delay scenes; constructing a railway line network passenger flow control offline model, and performing station-entry offline reinforcement learning training on the model using the delay scenes to obtain an optimized railway line network passenger flow control offline model; and generating an online training environment from an actual delay scene, and performing online reinforcement learning training on the optimized offline model in that environment to obtain a collaborative optimization scheme for railway line network passenger flow control and operation adjustment. When a train suffers an uncertain delay, the invention considers passenger behavior and train operation plans at the network level, provides a specific scheme for passenger flow control and train operation, and improves the service level of network operation.

Description

Collaborative optimization method for line network passenger flow control and train adjustment under train delay

Technical Field

The invention relates to the technical field of urban rail transit operation organization, in particular to a cooperative optimization method for network passenger flow control and train adjustment under train delay.

Background

Because it is fast and environmentally friendly, urban rail transit has developed rapidly and now attracts and carries a large share of urban trips. During operation, trains may be disrupted and delayed for various reasons, such as faults in train components, signals or power supply. Passenger volumes in urban rail transit are large, departure frequencies are high and trains are heavily loaded, so a delayed train affects the normal operation of other trains on the line, easily causes passengers to accumulate, and creates safety problems.

At present, the conventional catch-up adjustment methods for delayed trains in the prior art only compress running time and reduce dwell time, ignoring passenger flow characteristics; in recent years, adjustment strategies including stop-skipping and passenger flow control have gradually been adopted in urban rail transit operation, with good practical results.

The drawbacks of the conventional catch-up adjustment methods for delayed trains in the prior art include: plans are mainly based on dispatcher experience and qualitative contingency plans, and fall short in global scope and accuracy. At present, most research on the collaborative optimization of passenger flow control and train stop-skipping focuses on daily morning and evening peak conditions, and the linear programming, quadratic programming and nonlinear combinatorial optimization models currently in use suit only a specific scene or a specific example, so the conditions for applying these models are difficult to meet when train delays actually occur.

Disclosure of Invention

The embodiment of the invention provides a cooperative optimization method for network passenger flow control and train adjustment under train delay, so as to effectively improve the railway network operation service level.

In order to achieve the purpose, the invention adopts the following technical scheme.

A method for cooperative optimization of network passenger flow control and train adjustment under train delay comprises the following steps:

acquiring time, position and duration characteristics of train delay occurrence, performing clustering iteration on a railway line network delay scene according to the time, position and duration characteristics of the train delay occurrence, and randomly generating a delay scene;

constructing a railway line network passenger flow control offline model, and performing station-entering offline reinforcement learning training on the railway line network passenger flow control offline model by using the delay scene to obtain an optimized railway line network passenger flow control offline model;

and generating an online training environment according to an actual delay occurrence scene, and performing reinforcement learning online training on the optimized railway line network passenger flow control offline model by using the online training environment to obtain a railway line network passenger flow control and operation adjustment collaborative optimization scheme.

Preferably, the obtaining of the characteristics of time, position and duration of occurrence of train delay includes:

according to the specific time at which the train delay occurs, determining which peak period it belongs to using the peak period set HL = {morning peak, secondary morning peak, evening peak, secondary evening peak, noon peak, off-peak};

according to the actual position of the train delay, calculating the distance between the starting station and the terminal station of the delayed line and the number of stations at the delay location, and determining the up or down direction of the delay location, wherein the up direction is D = 1 and the down direction is D = 2;

according to the time t elapsed from the occurrence of the train delay to the recovery of operation, determining the duration level of the delay using the k duration levels $TL = \{t \le t_1,\ t_1 < t \le t_2,\ t_2 < t \le t_3,\ \ldots,\ t_k < t\}$.
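The three delay features above can be encoded compactly. The following is a minimal sketch only: the threshold values, the `DelayFeatures` container and the function names are assumptions introduced for illustration, since the text does not give concrete values for t_1 to t_k.

```python
from bisect import bisect_left
from dataclasses import dataclass

# Hypothetical thresholds t_1..t_k in minutes; the text does not specify them.
DURATION_THRESHOLDS = [5, 10, 20, 30]

UP, DOWN = 1, 2  # D = 1 (up direction) and D = 2 (down direction), as in the text


@dataclass(frozen=True)
class DelayFeatures:
    peak: str            # one element of the peak period set HL
    direction: int       # D
    duration_level: int  # index of the interval in TL that contains t


def duration_level(t, thresholds=DURATION_THRESHOLDS):
    """Return 0 for t <= t_1, 1 for t_1 < t <= t_2, ..., k for t > t_k."""
    return bisect_left(thresholds, t)


def classify_delay(peak, direction, duration_min):
    """Bundle the time, position and duration features of one delay."""
    if direction not in (UP, DOWN):
        raise ValueError("direction must be 1 (up) or 2 (down)")
    return DelayFeatures(peak, direction, duration_level(duration_min))
```

`bisect_left` is used because each interval in TL is closed on the right, so a delay of exactly t_1 falls in the first level.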

Preferably, the clustering iteration of the delay scenes of the railway line network is performed according to the characteristics of the time, the position and the duration of the occurrence of train delay, and the random generation of the delay scenes comprises:

randomly setting passenger travel and train delay according to a probability function, constructing initial train adjustment, and constructing an off-line training environment;

setting constraint conditions of passenger flow control and train adjustment according to the time, position and duration characteristics of the occurrence of train delay, repeatedly interacting the initial train adjustment and an offline training environment based on the constraint conditions of the passenger flow control and the train adjustment, performing reinforcement learning offline training, and outputting an offline model;

and the reinforcement learning offline training is iterated repeatedly under different offline training environments, and a delay scene is generated randomly.

Preferably, the randomly setting passenger travel and train delay according to the probability function includes:

according to the characteristics of passengers arriving at a station, which resemble the number of arrivals at a service facility within a given time, the randomness of the number of passengers arriving at the station is described by a Poisson distribution, whose probability function is:

$$P(X = k) = \frac{\lambda^{k}}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \ldots$$

in the formula, the parameter λ is the expected number of occurrences of the random event per unit time, used to describe the average number of passengers arriving at the station per unit time, and k is the number of passengers;

according to the inbound walking and transfer walking processes of passengers, the inbound walking time and transfer walking time of passengers in a rail transit station are described by a normal distribution, whose probability density function is:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

where x is a random variable following a normal distribution with mathematical expectation μ and variance σ², written $X \sim N(\mu, \sigma^{2})$; μ represents the expected walking time of passengers, and x is distributed within $[\mu - v, \mu + v]$;

according to the uncertainty of the train delay duration, when a train breaks down at a station and causes a delay, the delay time of the train at that position follows a normal distribution, whose probability density function is:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

where x is a random variable following a normal distribution with mathematical expectation μ and variance σ², written $X \sim N(\mu, \sigma^{2})$; μ represents the expected delay time of the train at the station, and x is distributed within $[\mu - \omega, \mu + \omega]$.
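The random passenger arrivals, walking times and delay durations described above could be sampled as in the following sketch. It uses only the Python standard library; the helper names and all numeric parameters (arrival rate, means, standard deviations, half-widths) are hypothetical, and rejection sampling is one possible way to keep a normal variate inside its interval, not necessarily the one used by the inventors.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's method: one Poisson(lam) draw, e.g. arrivals per unit time."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_truncated_normal(mu, sigma, half_width, rng):
    """Draw from N(mu, sigma^2), rejecting values outside [mu - half_width, mu + half_width]."""
    while True:
        x = rng.gauss(mu, sigma)
        if abs(x - mu) <= half_width:
            return x


rng = random.Random(0)  # fixed seed so that a generated scene is reproducible

# Hypothetical parameters: 12 arrivals per period on average,
# walking time around 90 s within +/- 40 s, delay around 300 s within +/- 120 s.
arrivals = [sample_poisson(12.0, rng) for _ in range(6)]
walk_times = [sample_truncated_normal(90.0, 20.0, 40.0, rng) for _ in range(5)]
delay = sample_truncated_normal(300.0, 60.0, 120.0, rng)
```

Each call to these helpers with a different seed yields a different random scene, which is how repeated offline training environments could be produced.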

Preferably, the constraint conditions of passenger flow control and train regulation include:

(1) Passenger flow control constraint

In the formula, the symbols denote the number of passengers arriving at the station, and γ is the minimum passenger flow control coefficient applied to the number of passengers arriving at the station;

(2) Train capacity constraint

In the formula, the symbols denote the maximum load factor and $C_i$, the rated passenger capacity of train i;

(3) Train stop-skipping constraint

In the formula, M is the set of time periods, $N^{0}$ is the set of stations at which stop-skipping is allowed, and I is the set of trains;

(4) Train operation constraint

In the formula, the symbols denote the dwell time of train i at station j+1, the departure time of train i at station j, the minimum running time from station j to station j+1, and the minimum dwell time of a train at station j;

(5) Platform infrastructure capacity constraint

In the formula, the symbols are $s_n$, the maximum passenger flow density ρ, and the maximum capacity coefficient of the platform η;

(6) Passenger transfer time constraint

In the formula, the symbols denote the time at which a passenger alights from the train and the walking time of the passenger.

Preferably, repeatedly interacting the initial train adjustment with the offline training environment based on the constraint conditions of passenger flow control and train adjustment to perform the reinforcement learning offline training comprises:

step 1: setting the initial number of training episodes n = 0 and starting training;

step 2: initializing the delayed train operation plan, the time period m, the state s and the reward r;

step 3: m = m + 1; when m equals M, jumping to step 8;

step 4: selecting a station on the line network, traversing the trains arriving at the station, selecting an action according to the current state, and storing the actions as an action packet;

step 5: inputting the current action packet and the current state, interacting with the environment, and obtaining the next state, the reward value and the number of passengers exceeding the platform limit from the environment function;

step 6: recording the current state s, the action a, the next state s' and the reward value r in the memory bank;

step 7: feeding the data recorded in the memory bank into the network for training, updating the state, obtaining the reward value of the corresponding time period, accumulating it into the reward r, and jumping to step 3;

step 8: n = n + 1; if n < G, jumping to step 2; otherwise, ending the training.
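The control flow of steps 1 to 8 can be sketched as two nested loops. This is an illustrative skeleton under stated assumptions, not the patented implementation: `ToyEnv`, its reward formula, the platform limit of 100 and the random action choice are all hypothetical placeholders, and the network update of step 7 is left as a comment.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical one-station stand-in for the simulation environment."""

    def __init__(self, rng):
        self.rng = rng

    def reset(self):
        return self.rng.randint(50, 150)  # passengers wishing to enter

    def step(self, state, action_packet):
        control_rate = action_packet[0]
        admitted = state * (1.0 - control_rate)
        over_limit = max(0.0, admitted - 100.0)  # passengers above the platform limit
        reward = -over_limit - 0.1 * (state - admitted)
        return self.rng.randint(50, 150), reward, over_limit


def train_offline(env, episodes_g, periods_m, actions, rng, memory_size=1000):
    memory = deque(maxlen=memory_size)  # memory bank with fixed capacity
    episode_rewards = []
    n = 0                                           # step 1
    while n < episodes_g:
        m, s, r = 0, env.reset(), 0.0               # step 2
        while True:
            m += 1                                  # step 3
            if m == periods_m:
                break
            packet = (rng.choice(actions),)         # step 4 (random placeholder policy)
            s_next, reward, _ = env.step(s, packet)  # step 5
            memory.append((s, packet, s_next, reward))  # step 6
            # step 7: a real agent would train its network on samples from `memory` here
            s, r = s_next, r + reward
        episode_rewards.append(r)
        n += 1                                      # step 8
    return memory, episode_rewards
```

For example, `train_offline(ToyEnv(random.Random(0)), 3, 6, [0.0, 0.2, 0.4], random.Random(1))` runs three episodes of five interactions each, because the loop leaves an episode as soon as m reaches M.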

Preferably, the constructing of the railway line network passenger flow control offline model includes:

constructing a railway line network passenger flow control offline model, wherein the model comprises the corresponding delay scenes, the neural network structures of the current network and the target network, the parameter matrices of each layer, the loss function and the optimizer learning rate, and further comprises the data stored in the memory bank and the total reward value obtained by summation after each training run.

Preferably, the generating an online training environment according to an actual delay occurrence scenario, and performing online training of reinforcement learning on the optimized railway line network passenger flow control offline model by using the online training environment includes:

initializing an online model from the stored optimized railway line network passenger flow control offline model according to the characteristics of the actual delay scene, wherein the online model inherits all parameters of the offline model, including the data stored in the memory bank;

setting the update frequency, training set size and exploration rate of the online model according to the characteristics of the online environment, wherein the target network of the online model is used only for evaluation and its parameters are not updated as the number of training iterations increases;

training the online model for only a small number of iterations, in which the online model takes actions to interact with the environment, the environment returns a reward value and the next state, and only the current network parameters are updated, so as to generate an accurate passenger flow control and train operation scheme.

Preferably, the obtaining of the railway line network passenger flow control and operation adjustment collaborative optimization scheme includes:

controlling the flow of passengers entering rail transit platforms station by station and period by period, and giving the passenger flow control rate for each specific time and place;

when a delay occurs, updating the train timetable by compressing the tracking interval and the dwell time with the aim of restoring normal operation as soon as possible, running the trains according to the updated timetable, and selecting whether to skip stops at stations where stop-skipping is allowed.

According to the technical scheme provided by the embodiment of the invention, when an uncertain delay occurs to a train, passenger behavior and the train operation plan are considered at the network level, a specific scheme for passenger flow control and train operation is provided, technical conditions are provided for organizing and dispersing passenger flow under delay, and the network operation service level is improved.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a schematic diagram illustrating the implementation principle of a collaborative optimization method for line network passenger flow control and train adjustment under train delay according to an embodiment of the present invention;

FIG. 2 is a flowchart of initial train adjustment according to an embodiment of the present invention;

FIG. 3 is a block diagram of a simulation environment according to an embodiment of the present invention;

FIG. 4 is a flowchart of an offline reinforcement learning according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an online training process for reinforcement learning according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating an exemplary passenger flow control result according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating a train adjustment result according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are exemplary only for explaining the present invention and are not construed as limiting the present invention.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.

Aiming at train delay scenes, the invention provides reinforcement-learning-based collaborative optimization of line network passenger flow control and train adjustment so as to reduce the passenger accumulation caused by delay. The embodiment of the invention provides a method different from conventional passenger flow control and train adjustment, which obtains an accurate passenger flow control and train operation adjustment scheme by taking into account uncertain passenger travel and train delay scenes at the network scale.

The processing flow of the collaborative optimization method for line network passenger flow control and train adjustment under train delay provided by the embodiment of the invention is shown in FIG. 1, and comprises the following steps:

step 1, clustering the delay scenes of the railway line network according to characteristics such as the time, position and duration of train delays, and iteratively generating random delay scenes according to the clustering results;

step 2, constructing a railway line network passenger flow control offline model, and performing station-entry offline reinforcement learning training on the railway line network passenger flow control offline model by using the delay scenes to obtain an optimized railway line network passenger flow control offline model;

step 3, generating an online training environment according to an actual delay scene, and performing online reinforcement learning training on the optimized railway line network passenger flow control offline model by using the online training environment to obtain a collaborative optimization scheme for railway line network passenger flow control and operation adjustment.

The flowchart of the initial train adjustment provided by the embodiment of the invention is shown in FIG. 2, and the specific processing steps are as follows:

step 1: constructing a time matrix of the train operation diagram, and extracting the arrival time, dwell time and departure time of each train at each station in the planned operation diagram to form the initial planned operation diagram information;

step 2: acquiring the specific information of the train delay, including the time, place and duration of the delay;

step 3: starting from the initial station, judging whether a delay occurs according to the delay information and the initial planned operation diagram information; if a delay occurs, extracting the information of the delayed train service; otherwise, jumping to step 5;

step 4: according to the extracted train service information, applying a strategy of compressing dwell time and running time to the subsequent operation of that train service, changing the arrival and departure times of the train while satisfying the train operation constraints;

step 5: judging whether the current station is the terminal station; if so, ending the process; otherwise, adding 1 to the station number and jumping to step 3.
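The compression strategy of step 4 can be illustrated for a single train as follows. This is a minimal sketch under assumptions that are not stated in the text: the delay is modelled as a late departure from the delay station, times are in seconds, and the function name, argument layout and example values are hypothetical.

```python
def adjust_train(plan, delay_station, delay_s, min_run, min_dwell):
    """Recover from a delay by compressing running and dwell times.

    plan: list of (arrival, departure) per station, in seconds.
    min_run[j]: minimum running time from station j to station j + 1.
    min_dwell[j]: minimum dwell time at station j.
    """
    adjusted = list(plan[:delay_station])          # stations before the delay are unchanged
    arr, dep = plan[delay_station]
    dep_actual = dep + delay_s                     # assumed: delay shows up as a late departure
    adjusted.append((arr, dep_actual))
    for j in range(delay_station + 1, len(plan)):
        plan_arr, plan_dep = plan[j]
        # Never earlier than planned, never faster than the minimum running time.
        arr_actual = max(plan_arr, dep_actual + min_run[j - 1])
        # Never earlier than planned, never shorter than the minimum dwell time.
        dep_actual = max(plan_dep, arr_actual + min_dwell[j])
        adjusted.append((arr_actual, dep_actual))
    return adjusted


plan = [(0, 30), (150, 180), (300, 330), (450, 480)]
result = adjust_train(plan, 0, 60, min_run=[100, 100, 100], min_dwell=[20, 20, 20, 20])
# result == [(0, 90), (190, 210), (310, 330), (450, 480)]
```

In this example the planned running time is 120 s and the planned dwell time is 30 s, so each section and stop recovers up to 20 s and 10 s respectively, and the 60 s delay is absorbed by the departure from the third station.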

The framework of the simulation environment provided by the embodiment of the invention is shown in FIG. 3. The simulation environment simulates the arrival, boarding and alighting, and transfer processes of passengers at each station on a subway line network, and generally comprises four sub-processes: arriving at the station, entering the station, boarding and alighting, and transferring and leaving. If the control rate on the inbound volume per unit time is greater than 0, passengers who are restricted from entering the platform wait outside the station and enter in the next stage according to that stage's control rate, in the order in which they arrived in the previous stage; if a train skips a station, passengers at the skipped station must take the next train. The environment takes two kinds of input: data, and the current state together with the action taken. The inputs and outputs of the environment are as follows.

Environment input: station information data, including the spatial distribution of each station on the rail transit network and the effective area and maximum capacity of each platform; passenger flow OD (Origin-Destination) data, including the names and numbers of the origin and destination stations of passengers' subway trips and the card-swiping times; train delay and operation data, including the time, position and duration of the train delay, the initial train timetable, the train formation, the train load factor and the rated passenger capacity; the current state set, which is the inbound passenger demand of each station in period m, namely the number of people wishing to enter the station; and the actions taken, namely the passenger flow control rate and whether the current train skips the current station.

Environment output: the state of the next period, namely the number of people wishing to enter the station in the next period, and the current reward value.
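The waiting rule for passengers held outside a station can be sketched with a first-in, first-out queue. This is only an illustration: the function name is hypothetical, and the control rate is assumed here to be the fraction of a stage's demand that is held back, which the text does not define precisely.

```python
from collections import deque


def admit_passengers(waiting, new_arrivals, control_rate):
    """Run one control stage at one station entrance.

    waiting: deque of passenger ids held outside from earlier stages (oldest first).
    new_arrivals: ids arriving in this stage, in arrival order.
    control_rate: assumed fraction of the stage's demand that is held back (0 = no control).
    Returns the admitted ids; `waiting` is updated in place.
    """
    waiting.extend(new_arrivals)
    demand = len(waiting)
    quota = int(round(demand * (1.0 - control_rate)))
    # Passengers held over from earlier stages are at the front, so they enter first.
    return [waiting.popleft() for _ in range(quota)]


queue = deque()
stage_1 = admit_passengers(queue, [1, 2, 3, 4, 5], 0.4)  # admits 1, 2, 3
stage_2 = admit_passengers(queue, [6, 7], 0.5)           # admits 4, 5 before 6, 7
```

After the second stage, passengers 6 and 7 remain in the queue and form the start of the next period's demand, which corresponds to the state returned by the environment.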

The flowchart of the offline reinforcement learning provided by the embodiment of the invention is shown in FIG. 4, and the specific processing comprises: inputting the number of passengers arriving at each station in the initial period into a double deep Q-network as the observed state s; selecting actions and predicting their values over the action space; selecting an action a to interact with the environment to obtain the next observed state s' and a reward value r, and using s' to update the state s as the input state of the next period. The capacity of the memory bank is fixed, and the observed state s, the selected action a, the reward r and the next observed state s' are stored on the principle that new memories displace old ones; once the memory bank reaches its maximum capacity, n records are sampled from it and used to update the network parameters.
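The two mechanisms named above, a fixed-capacity memory bank and the double deep Q-network update, can be sketched without a deep learning library. In the standard double DQN rule, the current network chooses the best next action and the target network supplies its value. The discount factor of 0.9, the memory capacity and the Q-value lists below are hypothetical, and plain lists stand in for network outputs.

```python
import random
from collections import deque

# Fixed capacity: appending to a full deque silently drops the oldest record.
memory = deque(maxlen=3)
for record in ["t1", "t2", "t3", "t4"]:
    memory.append(record)
# memory now holds t2, t3, t4

batch = random.Random(0).sample(list(memory), 2)  # draw n = 2 records for one update


def double_dqn_target(reward, q_current_next, q_target_next, discount=0.9, done=False):
    """Double DQN target: the current network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(q_current_next)), key=q_current_next.__getitem__)
    return reward + discount * q_target_next[best_action]


# The current network prefers action 1, so the target network's value for action 1 is used.
y = double_dqn_target(-2.0, [0.1, 0.5, 0.3], [1.0, 0.2, 0.9])
```

Separating action selection from action evaluation in this way is what distinguishes double DQN from a plain deep Q-network, which would take the maximum of the target network's own values.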

The schematic diagram of the online reinforcement learning provided by the embodiment of the invention is shown in FIG. 5, and the specific processing steps are as follows:

step 1: according to the actual delay scene, searching the offline model library for a suitable offline model using the time, position and duration of the train delay, and initializing the online model parameters;

step 2: according to the characteristics of the actual environment, reducing the update frequency of the online model, reducing the size of the training set, and setting the exploration rate to 0;

step 3: the agent selects the corresponding action in the current network according to the initial state input, and inputs the state and action into the environment to simulate the interaction between passengers and trains, obtaining the next state and reward, which are input into the network again; the updating is completed after G iterations, wherein G is not more than 50 because only the current network parameters are updated.
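Steps 1 and 2 amount to a lookup in the offline model library followed by a restricted training configuration. The sketch below is an assumption-laden illustration: the library key layout, the `OnlineConfig` fields and the nearest-duration fallback are hypothetical choices, since the text does not say how a "suitable" model is matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnlineConfig:
    exploration_rate: float = 0.0        # step 2: exploration rate set to 0
    max_updates: int = 50                # step 3: G is not more than 50
    update_target_network: bool = False  # target network is used for evaluation only


def select_offline_model(library, peak, direction, duration_level):
    """Return the stored model for the matching delay scene.

    library: dict keyed by (peak period, direction D, duration level).
    Assumed fallback: same peak and direction, nearest duration level.
    """
    key = (peak, direction, duration_level)
    if key in library:
        return library[key]
    candidates = [k for k in library if k[0] == peak and k[1] == direction]
    if not candidates:
        raise KeyError(f"no offline model for {key}")
    nearest = min(candidates, key=lambda k: abs(k[2] - duration_level))
    return library[nearest]


library = {
    ("morning peak", 1, 0): "model_a",  # placeholders for stored offline models
    ("morning peak", 1, 2): "model_b",
}
model = select_offline_model(library, "morning peak", 1, 3)  # falls back to model_b
config = OnlineConfig()
```

Keeping the online settings in one immutable object makes it explicit that online training differs from offline training only in these limits, while the network parameters and memory bank are inherited from the selected offline model.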

An example of a passenger flow control result provided by an embodiment of the invention is shown in FIG. 6. Taking the Changping Line and Line 13 as an example, when a delay occurs, the delay is divided into six time periods in which the passenger flow of some stations is controlled, giving precise passenger flow control times, locations and intensities.

The train adjustment result provided by the embodiment of the invention is shown in FIG. 7. Train No. 138 is delayed at Wudaokou Station on Line 13 for 5 minutes, and the following trains suffer knock-on delays; a reasonable train operation plan is obtained through the optimization method.

In summary, the invention provides a method different from conventional passenger flow control and train adjustment which, under train delay, computes a more accurate passenger flow control and train operation adjustment scheme while taking into account uncertain passenger travel and train delay scenes at the network scale, thereby making passenger transport organization more scientific and reasonable and relieving the large passenger accumulation caused by train delay.

The method considers the collaborative optimization of passenger flow control and train operation adjustment at the line network scale under train delay, improving the coordination of network-wide measures for dispersing large passenger flows; the reinforcement learning method that accounts for the uncertainty of passenger travel and train delay gives the optimization results for passenger flow control and train operation adjustment multi-stage dynamic decision-making and robustness characteristics, improving the usability of the results.

Those of ordinary skill in the art will understand that: the figures are schematic representations of one embodiment, and the blocks or processes shown in the figures are not necessarily required to practice the present invention.

From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since apparatus or system embodiments are substantially similar to method embodiments, they are described relatively briefly, and the relevant points may be found in the corresponding descriptions of the method embodiments. The above-described apparatus and system embodiments are merely illustrative; the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, and may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for cooperative optimization of network passenger flow control and train adjustment under train delay is characterized by comprising the following steps:

2. The method according to claim 1, wherein the obtaining of the characteristics of time, position and duration of occurrence of train delay comprises:

according to the time t elapsed from the occurrence of the train delay to the recovery of operation, determining the duration level of the delay using the k duration levels $TL = \{t \le t_1,\ t_1 < t \le t_2,\ t_2 < t \le t_3,\ \ldots,\ t_k < t\}$.

3. The method according to claim 1, wherein the clustering iteration of the railway network delay scenario is performed according to the time, location and duration characteristics of the occurrence of the train delay, and the delay scenario is randomly generated, and comprises:

randomly setting passenger flow travel and train delay according to a probability function, constructing train initial adjustment, and constructing an off-line training environment;

setting constraint conditions of passenger flow control and train adjustment according to the time, position and duration characteristics of the occurrence of train delay, repeatedly interacting the train initial adjustment and an off-line training environment based on the constraint conditions of the passenger flow control and the train adjustment, performing reinforcement learning off-line training, and outputting an off-line model;

4. The method of claim 3, wherein randomly setting passenger travel and train delays according to a probability function comprises:

in the formula, the parameter λ is the expected number of occurrences of the random event per unit time, used here to describe the average number of passengers arriving at a station per unit time, and k is the number of passengers;

where x is a random variable that follows a normal distribution with mathematical expectation μ and variance σ², written $X \sim N(\mu, \sigma^2)$; μ represents the expected value of the passenger travel time, and x is distributed within $[\mu - v, \mu + v]$;

according to the uncertainty of the train delay duration, when a train breaks down at a certain station and causes a delay, the delay time of the train at that location follows a normal distribution, whose probability density is as follows:

where x is a random variable that follows a normal distribution with mathematical expectation μ and variance σ², written $X \sim N(\mu, \sigma^2)$; μ represents the expected value of the delay time of the train at the station, and x is distributed within $[\mu - \omega, \mu + \omega]$.
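The formulas of this claim are not reproduced in this text. The sketch below assumes that the arrival formula is the standard Poisson probability mass function (suggested by the definitions of λ and k) and that the bounded normal variables can be drawn by rejection sampling. All function names and numeric values are hypothetical.

```python
import math
import random

def poisson_pmf(k, lam):
    """Standard Poisson probability of k arrivals with mean lam (assumed form)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def sample_poisson(lam, rng):
    """Draw a Poisson count with Knuth's multiplication method (suits small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_bounded_normal(mu, sigma, half_width, rng):
    """Draw from N(mu, sigma^2), keeping only values in [mu - half_width, mu + half_width]."""
    while True:
        x = rng.gauss(mu, sigma)
        if mu - half_width <= x <= mu + half_width:
            return x

rng = random.Random(42)
# Hypothetical scenario: 3 arrivals per unit time on average,
# delay of about 300 s with sigma 60 s, bounded to +/- 120 s.
arrivals = [sample_poisson(3.0, rng) for _ in range(10)]
delay = sample_bounded_normal(300.0, 60.0, 120.0, rng)
```

Rejection sampling is chosen here only because it needs nothing beyond the standard library; it is one possible way to honor the stated bounds, not necessarily the one used in the patent.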

5. The method of claim 3, wherein the constraints of passenger flow control and train adjustment include:

(1) Passenger flow control constraints

In the formula, the symbol denotes the number of passengers arriving at the station,

(2) Train capacity constraint

In the formula, the first symbol is the maximum load factor, and $C_i$ is the rated passenger capacity of the train;

(3) Train stop-skipping (station-jumping) constraint

(4) Train operation constraint

In the formula, the symbols denote, in order:

the dwell time of train i at station j+1,

the departure time of train i at station j,

the minimum running time from station j to station j+1,

the minimum dwell time of the train at station j;

(5) Platform infrastructure capacity constraints

(6) Passenger transfer time constraints

In the formula, the symbols denote, in order:

the alighting time of the passengers,

the travel time of the passengers;
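The constraint formulas themselves are not reproduced in this text, so the following is only an illustrative sketch of how three of the constraints could be checked. The inequality forms, the function names, and the reading of the control rate as the share of arrivals held outside the station are all assumptions.

```python
def admitted_passengers(arrivals, control_rate):
    """Passengers let into the station, assuming control_rate is the share held back."""
    if not 0.0 <= control_rate <= 1.0:
        raise ValueError("control_rate must lie in [0, 1]")
    return round(arrivals * (1.0 - control_rate))

def within_capacity(onboard, max_load_factor, rated_capacity):
    """Assumed capacity check: onboard load must not exceed load factor times C_i."""
    return onboard <= max_load_factor * rated_capacity

def operation_feasible(dep_j, arr_next, dep_next, min_run, min_dwell):
    """Assumed operation check between station j and station j+1 (times in seconds).

    The run from j to j+1 must take at least min_run, and the stop at
    j+1 must last at least min_dwell.
    """
    return (arr_next - dep_j >= min_run) and (dep_next - arr_next >= min_dwell)

# Hypothetical numbers for illustration only.
admitted = admitted_passengers(80, 0.25)
ok_load = within_capacity(onboard=1500, max_load_factor=1.2, rated_capacity=1400)
ok_run = operation_feasible(dep_j=0, arr_next=110, dep_next=140, min_run=100, min_dwell=25)
```

In a reinforcement learning setting such checks would typically either mask infeasible actions or contribute a penalty to the reward; the text does not state which approach is used.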

6. The method of claim 3, wherein said iteratively interacting said train initial adjustment with an offline training environment based on said passenger flow control and train adjustment constraints for reinforcement learning offline training comprises:

step 1: set the initial training count n = 0 and start training;

step 2: initialize the train delay operation plan, the time interval m, the state s and the reward r;

step 3: m = m + 1; when m equals M, skip to step 8;

step 5: input the current action and state, interact with the environment, and obtain the next state, the reward value and the number of passengers exceeding the platform limit from the environment function;

step 7: feed the data recorded in the memory base into the network for training and update the state, obtain the reward value of the corresponding time period, add it to the reward r, and then jump to step 3;

step 8: n = n + 1; if n < G, skip to step 2; otherwise, end the training.
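The control flow of these steps can be sketched as follows. Steps 4 and 6 do not appear in this text; the sketch assumes that step 4 selects an action (here epsilon-greedy) and that step 6 stores the transition in the memory base. A small Q-table stands in for the neural network, and the toy environment, action set and all constants are hypothetical.

```python
import random
from collections import deque

CONTROL_RATES = [0.0, 0.25, 0.5]  # hypothetical actions: share of arrivals held outside
PLATFORM_LIMIT = 120              # hypothetical platform capacity
TRAIN_ROOM = 50                   # hypothetical boarding room per time interval

def env_step(waiting, action, rng):
    """Toy environment function: next state, reward, platform over-limit count."""
    arrivals = rng.randint(30, 70)
    held = int(arrivals * CONTROL_RATES[action])
    waiting = max(0, waiting + arrivals - held - TRAIN_ROOM)
    over_limit = max(0, waiting - PLATFORM_LIMIT)
    return waiting, -(held + 5 * over_limit), over_limit

def bucket(waiting):
    """Discretize platform crowding into 5 state levels."""
    return min(waiting // 40, 4)

def train(G=50, M=20, epsilon=0.1, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(CONTROL_RATES) for _ in range(5)]
    memory = deque(maxlen=500)
    episode_rewards = []
    n = 0                                        # step 1
    while n < G:
        waiting, m, r = 0, 0, 0.0                # step 2
        while True:
            m += 1                               # step 3
            if m == M:
                break                            # go to step 8
            s = bucket(waiting)
            if rng.random() < epsilon:           # step 4 (assumed): epsilon-greedy
                a = rng.randrange(len(CONTROL_RATES))
            else:
                a = max(range(len(CONTROL_RATES)), key=lambda i: q[s][i])
            waiting, reward, _over = env_step(waiting, a, rng)   # step 5
            memory.append((s, a, reward, bucket(waiting)))       # step 6 (assumed)
            batch = rng.sample(list(memory), min(8, len(memory)))
            for ps, pa, pr, pn in batch:         # step 7: train on stored data
                q[ps][pa] += alpha * (pr + gamma * max(q[pn]) - q[ps][pa])
            r += reward                          # step 7: accumulate reward
        episode_rewards.append(r)
        n += 1                                   # step 8
    return q, episode_rewards

q_table, rewards = train(G=5, M=10)
```

Because step 3 leaves the inner loop as soon as m equals M, each training round performs M − 1 interactions with the environment.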

7. The method of claim 3, wherein constructing the railway line network passenger flow control offline model comprises:

8. The method according to claim 1, wherein the generating of the online training environment according to the actual delay occurrence scenario, and performing online training of reinforcement learning on the optimized railway line network passenger flow control offline model by using the online training environment, comprises:

according to the characteristics of the actual delay scenario, initializing an online model from the stored optimized railway line network passenger flow control offline model, wherein the online model inherits all parameters of the offline model, including the data stored in the memory base;

setting the update frequency, training set and exploration rate of the online model according to the characteristics of the online environment, wherein the target network of the online model is used only for evaluation and its parameters are not updated as the number of training iterations increases;

training the online model for only a small number of iterations, wherein the online model takes actions to interact with the environment, the environment returns a reward value and the next state, and only the current network parameters are updated during training, so as to generate an accurate passenger flow control and train operation scheme.
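A minimal sketch of this online stage, assuming for illustration that the model parameters are a small Q-table rather than a neural network. The online model starts as a copy of the offline parameters, a frozen copy plays the role of the target network, and only the current parameters change. The function name and numeric values are hypothetical.

```python
import copy

def fine_tune_online(offline_q, transitions, alpha=0.05, gamma=0.9):
    """Update only the current parameters against a frozen target copy.

    offline_q   : list of per-state action-value lists from offline training
    transitions : iterable of (state, action, reward, next_state) from the live scenario
    """
    online_q = copy.deepcopy(offline_q)  # inherits all offline parameters
    target_q = copy.deepcopy(offline_q)  # used for evaluation only, never updated
    for s, a, r, s_next in transitions:
        td_target = r + gamma * max(target_q[s_next])
        online_q[s][a] += alpha * (td_target - online_q[s][a])
    return online_q, target_q

# Hypothetical offline parameters and a single observed online transition.
offline = [[0.0, 0.0], [0.0, 0.0]]
online, target = fine_tune_online(offline, [(0, 1, -10.0, 1)])
```

Keeping the target fixed during the short online phase means the few available online samples adjust the policy without shifting the value estimates they are evaluated against.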

9. The method of claim 1, wherein obtaining a coordinated optimization scheme for railway line network traffic control and operation adjustment comprises:

carrying out passenger flow control on passengers entering the rail transit platforms by station and by time period, and giving the passenger flow control rate at each specific time and place;

when a delay occurs, updating the train schedule with the aim of recovering normal operation as soon as possible by compressing the tracking interval (headway) and the dwell time, running the train according to the updated schedule, and choosing whether to skip the stop at each station where stop-skipping (station jumping) is allowed.
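The recovery idea can be illustrated with a small calculation, assuming that each downstream segment offers a fixed amount of running-time slack and dwell-time slack that can be compressed. The function name and the numbers are hypothetical, and stop-skipping is not modeled here.

```python
def recover_delay(delay, run_slack, dwell_slack):
    """Return the remaining delay (seconds) after each downstream station.

    run_slack[i]   : seconds that can be saved on the run into station i
    dwell_slack[i] : seconds that can be saved on the stop at station i
    """
    remaining = []
    for run_gain, dwell_gain in zip(run_slack, dwell_slack):
        delay = max(0, delay - run_gain - dwell_gain)
        remaining.append(delay)
    return remaining

# Hypothetical case: a 120 s delay with 30 s run slack and 20 s dwell slack per station.
profile = recover_delay(120, [30, 30, 30], [20, 20, 20])  # [70, 20, 0]
```

In this example the train is back on its planned schedule by the third downstream station.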