CN117521485B

CN117521485B - Energy-saving design optimizing method for subway longitudinal section line based on deep reinforcement learning

Info

Publication number: CN117521485B
Application number: CN202311330407.6A
Authority: CN
Inventors: 何庆; 徐双婷; 高天赐; 杨东营; 冯晓云; 王青元; 孙鹏飞; 朱颖; 王平
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2023-10-16
Filing date: 2023-10-16
Publication date: 2024-06-18
Anticipated expiration: 2043-10-16
Also published as: CN117521485A

Abstract

The invention relates to the technical field of subway longitudinal section line design, and in particular to an energy-saving design optimizing method for a subway longitudinal section line based on deep reinforcement learning. The method comprises the following steps: 1. combining subway line design specification constraints and actual construction condition constraints to establish a subway longitudinal section line design model whose objective is minimum train operation energy consumption; 2. using a deep reinforcement learning algorithm to solve for the optimal energy-saving line of the subway longitudinal section under different operation energy consumption calculation factors. Compared with the actual longitudinal section line design scheme, the method can reduce the train operation energy consumption cost and time cost simultaneously.

Description

Energy-saving design optimizing method for subway longitudinal section line based on deep reinforcement learning

Technical Field

The invention relates to the technical field of subway longitudinal section line design, in particular to an energy-saving design optimizing method for a subway longitudinal section line based on deep reinforcement learning.

Background

Existing research shows that train traction energy consumption depends mainly on line conditions and train operation organization (train scheduling, driving strategies, stopping schemes and the like) [2, 3] (Douglas, 2015; Zhou, 2018). Because train operation organization is restricted by the line conditions, its energy-saving effect is limited; to reduce traction energy consumption further, energy saving must be considered at the line design stage. The core of line design is the horizontal alignment and longitudinal section design. The horizontal alignment design aims at maximizing passenger flow attraction and gives little consideration to energy saving, whereas the longitudinal section design is more closely related to traction energy consumption. Subway lines are laid in three main ways: underground, elevated, and at grade. Underground sections run entirely below ground, so their gradient is generally not limited by the terrain and is controlled only by factors along the line such as underground buildings, pile foundations, and underground pipelines; they are therefore the best suited to energy-saving slopes (stations placed high, running sections placed low).

Current research on energy-saving slopes combines longitudinal section design principles with traction simulation calculations to summarize general forms of energy-saving longitudinal section design and to analyze their energy-saving effect. Researchers have found that a longitudinal section with a downhill slope leaving the station and an uphill slope entering the station helps reduce train traction energy consumption. Some scholars have proposed an energy-saving slope design method that changes the gradient of the energy-saving slope by changing the station elevation. However, energy-saving slope parameters are often selected empirically, so the optimal energy-saving effect cannot be achieved.

With the rise of intelligent algorithms, researchers in China and abroad have combined longitudinal section line design with such algorithms, and computer-based automatic optimization of longitudinal section schemes has gradually become a hot topic in this field. Some researchers built an optimization model of the horizontal alignment and longitudinal section of urban rail transit lines that takes train operation behavior into account, and used a genetic algorithm to solve for the optimal line design in three-dimensional space. Others used a distance transform algorithm to optimize the railway alignment and station locations in mountainous areas simultaneously, considering the coupling constraints between the route and the station locations. Others proposed a multi-stage decision model that jointly optimizes the longitudinal section line, cruise speed, and coasting switch points to obtain the lowest-cost solution. Based on geographic information, still others used a multi-stage augmented differential evolution algorithm to solve the horizontal alignment and longitudinal section design of intercity railways.

Although the above studies meet the relevant design specifications and can be used for energy-saving optimization of the longitudinal section line, the methods they adopt have certain limitations.

(1) They neglect the practical engineering constraints that the longitudinal section line must avoid (underground buildings, pile foundations, drainage pipelines, unfavorable geology, etc.).

(2) The PSO, GA, or PSO-GA algorithms require a predefined number of longitudinal section slope change points as input. These methods are therefore suited to "optimizing" a line rather than "designing" one, and can only find the optimized line for that given number of slope change points.

(3) Although evolutionary algorithms such as GA, PSO, and DT have been greatly improved, they still cannot learn as humans do and are difficult to use for primary optimization.

Disclosure of Invention

The invention provides an energy-saving design optimizing method for a subway longitudinal section line based on deep reinforcement learning, which can be used for obtaining an optimal energy-saving line of the subway longitudinal section.

The invention relates to a subway longitudinal section line energy-saving design optimizing method based on deep reinforcement learning, which comprises the following steps:

1. Combining subway line design specification constraints and actual construction condition constraints to establish a subway longitudinal section line design model whose objective is minimum train operation energy consumption;

2. Solving for the optimal energy-saving line of the subway longitudinal section under different operation energy consumption calculation factors by means of a deep reinforcement learning algorithm.

Preferably, the subway longitudinal section line design model includes:

1) Environment: the avoidance areas and the horizontal transition curve sections form the environment in which the subway longitudinal section line is optimized;

2) Agent: the agent is defined as the intelligent program that determines the course of the energy-saving longitudinal section line;

3) State: the spatial position of the slope change points is defined as the state $s_t$; the spatial position refers to the two-dimensional coordinates on the subway longitudinal section, comprising a longitudinal mileage coordinate and a vertical depth coordinate;

4) Action: the search direction of the next slope change point selected by the agent is defined as the action $a_t$;

5) Reward: the value of the reward $r_t$ depends on feedback from the environment; train operation energy consumption is the main component of the reward, and the other components are the survival reward and the target distance reward;

6) Condition: if the agent cannot find an action $a_t$ from state $s_t$ to $s_{t+1}$ that satisfies all constraints, the current state change condition does not hold ($\mathrm{Condition} = \mathrm{False}$) and the current state $s_t$ is reset to the initial state $s_0$; otherwise the state change condition holds ($\mathrm{Condition} = \mathrm{True}$) and selection of the next action continues.

Preferably, the subway line design specification constraint and the actual construction condition constraint include:

(1) Station regional slope section slope length and slope constraint

Only one slope section is arranged in the station area, that is, the sum of the lengths of the inbound and outbound slope sections is greater than the platform length $L_p$. Assuming the inbound and outbound slope sections have the same length, the constraint on the station-area slope length $L_s$ is expressed as:

；

The station gradient $i_s$ takes a given constant $i_0$, namely:

$$i_s = i_0$$

(2) Slope length and slope constraint of non-station area slope section

The slope length constraint of the non-station area slope section is expressed as follows:

；

The line is provided with a minimum gradient $i_{\min}$ and a maximum gradient $i_{\max}$, so that the gradient $i_j$ of any slope section satisfies:

$$i_{\min} \le |i_j| \le i_{\max}$$

(3) Minimum intermediate straight line length

The length $L_z$ of the straight line between two adjacent vertical curves should meet the design specification requirements, and is calculated as:

$$L_z = (x_{j+1} - T_{j+1}) - (x_j + T_j) \ge L_{z,\min}$$

where $x_j$ is the longitudinal mileage of the $j$-th slope change point; $T_j$ is the tangent length of the vertical curve at the $j$-th slope change point; and $L_{z,\min}$ is the minimum intermediate straight line length in the design specification;

(4) Reverse limit grade

When two slope sections of opposite direction are connected, the gradient $i_j$ in one direction should not be greater than the reverse limit gradient $i_r$, namely:

$$|i_j| \le i_r \quad \text{when } i_j \cdot i_{j+1} < 0$$

(5) Line burial depth constraint

The line burial depth is limited as follows: at any point $k$ on the line, the design rail elevation $H_k$ of the underground tunnel should be no greater than the ground elevation $G_k$ at that point minus the tunnel height $h_t$ and the minimum earth cover thickness $h_c$, namely:

$$H_k \le G_k - h_t - h_c$$

(6) Avoidance zone constraints

All avoidance areas of the subway longitudinal section are denoted by $U$ and the tunnel area of the subway longitudinal section line is denoted by $V$; the intersection of $U$ and $V$ should be the empty set:

$$U \cap V = \varnothing$$

(7) Horizontal transition curve section constraint

The distance between the mileage coordinate $x_j$ of a slope change point and the start or end mileage coordinate $x_h$ of a horizontal transition curve should not be smaller than the tangent length $T_j$ of the vertical curve, namely:

$$|x_j - x_h| \ge T_j$$

Preferably, in the State, the state space at the end of the $t$-th action is defined as $s_t$, as shown in the following formula:

$$s_t = \{(x_{t-1}, y_{t-1}, x_t, y_t) \mid 0 \le x_{t-1}, x_t \le W,\ 0 \le y_{t-1}, y_t \le H\}$$

where $x_t$ is the longitudinal mileage coordinate of the $t$-th slope change point, $y_t$ is the vertical elevation coordinate of the $t$-th slope change point, and $W$ and $H$ are the upper limits of mileage and depth of the target optimization area, respectively. In the above formula, the overall position of two consecutive slope change points is taken as one state in the longitudinal section line design optimization model.

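Preferably, in the Action, the action $a_t$ converts state $s_t$ into state $s_{t+1}$. The action $a_t$ is represented in two parts, as follows:

$$a_t = (l_t, i_t)$$

where $l_t$ and $i_t$ are the length and gradient of the slope section. With $(x_t, y_t)$ the mileage and elevation coordinates of the $t$-th slope change point, the relation among $s_t$, $a_t$, and $s_{t+1}$ is as follows:

$$x_{t+1} = x_t + l_t, \qquad y_{t+1} = y_t + i_t\, l_t$$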

Preferably, in the Reward, the reward $r_t$ is given by:

$$r_t = w_1 r_e + w_2 r_s + w_3 r_d$$

where $r_e$, $r_s$, and $r_d$ represent the unit operation energy consumption cost, the survival cost, and the distance-to-end-point cost, respectively, and $w_1$, $w_2$, and $w_3$ are their weights.

Preferably, in the subway longitudinal section line design model, the train motion equation is solved by the approximate integration method to obtain the train operation energy consumption. The integration range in the running time and distance formulas is divided into several small speed intervals $\Delta v$; with the initial and final speeds of an interval denoted $v_1$ and $v_2$, the running time and running distance of the train are calculated by the approximate integration method as follows:

Running time

；

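Calculation of the unit resultant force $c$

Under different working conditions, it can be written in the following forms:

$$c = \begin{cases} f - w, & \text{traction} \\ -w, & \text{coasting} \\ -(\beta_c b + w), & \text{braking} \end{cases} \qquad f = \frac{F}{(M_m + M_t)\,g}$$

where $F$ is the traction force; $M_m$ is the mass of the motor cars; $M_t$ is the mass of the trailers; $g$ is the gravitational acceleration; $b$ is the unit braking force of the train; $\beta_c$ is the service braking coefficient; $f$ is the unit traction force; and $w$ is the unit train resistance, comprising the basic running resistance $w_0$, the additional gradient resistance $w_i$, the additional curve resistance $w_r$, and the additional tunnel resistance $w_s$.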

Preferably, the operating energy consumption calculation factors include reverse grade limit, limiting speed, and passenger capacity.

Preferably, the deep reinforcement learning algorithm is an improved D3QN algorithm, the improvement being as follows:

a. An experience replay mechanism stores the experiences obtained through interaction in an experience pool one by one; once a certain number of experiences have accumulated, the model randomly draws a batch of data from the experience pool at each step to train the neural network;

b. Two neural networks with the same structure are constructed: an estimation network $Q(s, a; \theta)$ and a target value network $Q(s, a; \theta^-)$. For a given state $s_t$, the estimation network calculates the value of taking action $a_t$, and the estimation network parameters $\theta$ are updated continuously; the target network is used to calculate the temporal-difference target value $y_t$, and the target value network parameters $\theta^-$ are held fixed and replaced with the latest estimation network parameters $\theta$ at intervals. The target value $y_t$ is calculated as:

$$y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-)$$

where $r_t$ is the immediate reward; $\gamma$ is the discount factor; and the maximum is taken over the action $a'$ that maximizes $Q$ in the next state $s_{t+1}$;

Keeping $\theta^-$ constant over a period of time keeps the convergence target $y_t$ of the estimation network relatively fixed;

c. The neural network structure is improved by dividing its output into two parts: a state value function $V^{\pi}(s)$ representing the quality of each state, and an advantage function $A^{\pi}(s, a)$ that distinguishes the quality of each action in a specific state:

$$Q^{\pi}(s, a; \theta, \alpha) = V^{\pi}(s; \theta) + A^{\pi}(s, a; \theta, \alpha)$$

where $\pi$ is the policy and $\alpha$ denotes the parameters of the advantage function;

d. The target value of the D3QN model is as follows:

$$y_t = r_t + \gamma\, Q\big(s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; \theta);\ \theta^-\big)$$

e. The network parameters are updated with the mean square error $E$ as the loss function, as follows:

$$E = \mathbb{E}\big[\big(y_t - Q(s_t, a_t; \theta)\big)^2\big]$$
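Steps a and b can be illustrated with a minimal Python sketch. The names `ReplayBuffer` and `sync_target` and the parameters `capacity`, `min_size`, and `sync_every` are illustrative assumptions; the text does not specify the pool size, the batch size, or the interval at which the target parameters are replaced.

```python
import random
from collections import deque


class ReplayBuffer:
    """Experience pool with uniform random sampling (step a)."""

    def __init__(self, capacity, min_size):
        # Oldest experiences are dropped once capacity is reached (assumption).
        self.pool = deque(maxlen=capacity)
        self.min_size = min_size

    def store(self, state, action, reward, next_state, done):
        # Experiences are stored one by one as they are collected.
        self.pool.append((state, action, reward, next_state, done))

    def ready(self):
        # Training starts only after enough experiences have accumulated.
        return len(self.pool) >= self.min_size

    def sample(self, batch_size):
        # Random draws break the correlation between consecutive steps.
        return random.sample(list(self.pool), batch_size)


def sync_target(estimate_params, target_params, step, sync_every):
    """Replace the target parameters with a copy of the latest
    estimation parameters every `sync_every` steps (step b)."""
    if step % sync_every == 0:
        return dict(estimate_params)
    return target_params
```

Between two synchronizations `sync_target` returns the old target parameters unchanged, which is what keeps the temporal-difference target relatively fixed.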

The invention builds a deep reinforcement learning model for subway longitudinal section line design that perceives, explores, evaluates, and makes decisions in the line selection environment without relying on human experience, and finds the optimal energy-saving line scheme through feedback from the different constraint conditions. Compared with the actual longitudinal section line design scheme, the method can reduce the train operation energy consumption cost and time cost simultaneously.

Drawings

FIG. 1 is a flow chart of a method for optimizing energy-saving design of a subway longitudinal section line based on deep reinforcement learning in an embodiment;

fig. 2 is a flowchart of the D3QN algorithm obtained after the improvement in the embodiment.

Detailed Description

For a further understanding of the present invention, the present invention will be described in detail with reference to the drawings and examples. It is to be understood that the examples are illustrative of the present invention and are not intended to be limiting.

Examples:

as shown in fig. 1, the embodiment provides a method for optimizing energy-saving design of a subway longitudinal section line based on deep reinforcement learning, which comprises the following steps:

The subway longitudinal section line design model comprises:

1) Environment: the avoidance areas (underground buildings, pile foundations, underground pipeline routes, unfavorable geology, etc.) and the horizontal transition curve sections form the environment in which the subway longitudinal section line is optimized;

6) Condition: if the agent cannot find an action $a_t$ (slope length $l_t$ and gradient $i_t$) from state $s_t$ to $s_{t+1}$ that satisfies all constraints, the current state change condition does not hold ($\mathrm{Condition} = \mathrm{False}$) and the current state $s_t$ is reset to the initial state $s_0$; otherwise the state change condition holds ($\mathrm{Condition} = \mathrm{True}$) and selection of the next action continues.

The subway line design specification constraint and the actual construction condition constraint comprise:

(1) Station regional slope section slope length and slope constraint

In a station area, a train running across a change of slope experiences superimposed vibration, which is detrimental to smooth and safe running. It is therefore preferable to provide only one slope section in the station area, that is, the sum of the lengths of the inbound and outbound slope sections is greater than the platform length $L_p$. Assuming the inbound and outbound slope sections have the same length, the constraint on the station-area slope length $L_s$ is expressed as:

；

For the gradient of the station area, to prevent a stopped train from rolling and luggage in the station from sliding, the gradient should be as small as possible while still meeting the drainage requirement. In this embodiment the station gradient $i_s$ takes the given constant $i_0$ from the subway design specification, namely:

$$i_s = i_0$$

(2) Slope length and slope constraint of non-station area slope section

The subway design specification requires that the length $L_j$ of any slope section in a non-station area be not less than the long-term train consist length $L_{\text{train}}$. However, because the spacing between subway stations is short and drainage must be provided within the running section, a single slope section should not be too long. The slope length constraint for non-station slope sections is expressed as follows:

；

Because the traction capacity of the train is limited, the gradient of the main line in running sections should not be too large: the subway design specification provides that the main-line gradient should not exceed the maximum gradient $i_{\max}$. In addition, to facilitate drainage in the running section, the line should have a minimum gradient $i_{\min}$, so that the gradient $i_j$ of any slope section satisfies:

$$i_{\min} \le |i_j| \le i_{\max}$$

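(3) Minimum intermediate straight line length

The length $L_z$ of the straight line between two adjacent vertical curves should meet the design specification requirements:

$$L_z = (x_{j+1} - T_{j+1}) - (x_j + T_j) \ge L_{z,\min}$$

where $x_j$ is the longitudinal mileage of the $j$-th slope change point; $T_j$ is the tangent length of the vertical curve at the $j$-th slope change point; and $L_{z,\min}$ is the minimum intermediate straight line length in the design specification.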

(4) Reverse limit grade

Although the subway design specification does not explicitly specify a maximum gradient difference, to improve passenger comfort, ease design and construction, and reduce maintenance and operating costs, when two slope sections of opposite direction are connected, the gradient $i_j$ in one direction should not be greater than the reverse limit gradient $i_r$, namely:

$$|i_j| \le i_r \quad \text{when } i_j \cdot i_{j+1} < 0$$

(5) Line burial depth constraint

The line burial depth is limited as follows: at any point $k$ on the line, the design rail elevation $H_k$ of the underground tunnel should be no greater than the ground elevation $G_k$ at that point minus the tunnel height $h_t$ (distance from the track to the outer diameter of the tunnel, a fixed value) and the minimum earth cover thickness $h_c$ (a fixed value), namely:

$$H_k \le G_k - h_t - h_c$$

(6) Avoidance zone constraints

The subway longitudinal section line design scheme needs to consider factors such as underground soil conditions, building pile foundations, and pipelines, and must avoid areas where construction is impossible or very difficult. All avoidance areas of the subway longitudinal section are denoted by $U$ and the tunnel area of the subway longitudinal section line is denoted by $V$; the intersection of $U$ and $V$ should be the empty set:

$$U \cap V = \varnothing$$

(7) Horizontal transition curve section constraint

Within a vertical curve, the rail surface elevation changes with a certain curvature; within a horizontal transition curve, the rail surface elevation changes along the superelevation runoff. If the two overlap, the outer rail elevation is hard to control during track laying and maintenance, and the shapes of the linear superelevation runoff of the outer rail and of the vertical curve on the centerline are both altered, which affects running stability. Therefore, given the known horizontal alignment, when the longitudinal section line is designed the distance between the mileage coordinate $x_j$ of a slope change point and the start or end mileage coordinate $x_h$ of a horizontal transition curve should not be smaller than the tangent length $T_j$ of the vertical curve, namely:

$$|x_j - x_h| \ge T_j$$
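The constraints listed above lend themselves to simple feasibility checks that an environment could run before accepting an action. The following Python sketch is one possible arrangement, not the patented implementation: the `Limits` fields, the function names, and every numeric value in the usage note are hypothetical, gradients are assumed to be in per mille, and the avoidance-area test is omitted because it depends on how the areas are represented.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    # All fields are hypothetical placeholders for specification values.
    min_len: float        # minimum slope length, m
    max_len: float        # maximum slope length, m
    min_grade: float      # minimum gradient magnitude, per mille
    max_grade: float      # maximum gradient magnitude, per mille
    reverse_grade: float  # reverse limit gradient, per mille
    min_straight: float   # minimum straight length between vertical curves, m


def slope_ok(length, grade, lim):
    """Slope length and gradient bounds for a non-station slope section."""
    return (lim.min_len <= length <= lim.max_len
            and lim.min_grade <= abs(grade) <= lim.max_grade)


def reverse_ok(grade_a, grade_b, lim):
    """Reverse limit: applies only when adjacent gradients have opposite
    signs. Checking both gradients is an assumption of this sketch."""
    if grade_a * grade_b < 0:
        return max(abs(grade_a), abs(grade_b)) <= lim.reverse_grade
    return True


def straight_ok(x_a, tan_a, x_b, tan_b, lim):
    """Straight length left between the vertical curves at x_a < x_b."""
    return (x_b - tan_b) - (x_a + tan_a) >= lim.min_straight


def transition_ok(x, tangent, transition_ends):
    """Slope change point must stay a tangent length away from every
    start or end mileage of a horizontal transition curve."""
    return all(abs(x - x_h) >= tangent for x_h in transition_ends)


def burial_ok(rail_elev, ground_elev, tunnel_h, cover):
    """Rail elevation must leave room for the tunnel and the earth cover."""
    return rail_elev <= ground_elev - tunnel_h - cover
```

For example, with the purely illustrative limits `Limits(200, 1000, 3, 30, 10, 50)`, a 300 m section at -5 per mille passes `slope_ok`, while a +12 per mille section followed by a -4 per mille section fails `reverse_ok`.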

In the State, energy-saving design of the subway longitudinal section line means optimizing the number and positions of the slope change points so as to save energy during train operation. The state space at the end of the $t$-th action is defined as $s_t$, as shown in the following formula:

$$s_t = \{(x_{t-1}, y_{t-1}, x_t, y_t) \mid 0 \le x_{t-1}, x_t \le W,\ 0 \le y_{t-1}, y_t \le H\}$$

where $x_t$ is the longitudinal mileage coordinate of the $t$-th slope change point, $y_t$ is the vertical elevation coordinate of the $t$-th slope change point, and $W$ and $H$ are the upper limits of mileage and depth of the target optimization area, respectively.

Note that the positions of at least two slope change points must be considered to determine whether the line meets the constraints. In the above formula, the overall position of two consecutive slope change points is therefore taken as one state in the longitudinal section line design optimization model.

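In the Action, the action $a_t$ converts state $s_t$ into state $s_{t+1}$. The action $a_t$ is represented in two parts, the slope length $l_t$ and the gradient $i_t$, as follows:

$$a_t = (l_t, i_t)$$

With $(x_t, y_t)$ the mileage and elevation coordinates of the $t$-th slope change point, the next slope change point is given by:

$$x_{t+1} = x_t + l_t, \qquad y_{t+1} = y_t + i_t\, l_t$$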
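A state transition of this kind could be sketched in Python as below. The function name `step_state`, the per mille unit for the gradient, and the sign convention (a positive gradient increases the vertical coordinate) are assumptions made for illustration; the text does not fix them.

```python
def step_state(state, action, W, H):
    """Apply an action (slope length in m, gradient in per mille) to a state
    made of two consecutive slope change points.

    state  = (x_prev, y_prev, x_cur, y_cur)
    action = (length, grade)
    Returns the next state and whether the new point lies inside the
    target optimization area [0, W] x [0, H].
    """
    _, _, x_cur, y_cur = state
    length, grade = action
    # The new slope change point follows from the slope length and gradient.
    x_next = x_cur + length
    y_next = y_cur + length * grade / 1000.0
    inside = 0.0 <= x_next <= W and 0.0 <= y_next <= H
    # The current point becomes the "previous" point of the next state.
    return (x_cur, y_cur, x_next, y_next), inside
```

Keeping two consecutive points in the state lets the environment evaluate constraints that involve adjacent slope sections, such as the reverse limit gradient, without looking further back.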

Reward: after the agent takes action $a_t$, the state changes from $s_t$ to $s_{t+1}$. To evaluate whether taking $a_t$ is better than other actions, a reward fed back by the environment is needed to reflect the quality of the action. The reward $r_t$ is given by:

$$r_t = w_1 r_e + w_2 r_s + w_3 r_d$$

where $r_e$, $r_s$, and $r_d$ represent the unit operation energy consumption cost, the survival cost, and the distance-to-end-point cost, respectively, and $w_1$, $w_2$, and $w_3$ are their weights. Because the optimization goal of the model is minimum operation energy consumption, the energy consumption term $w_1 r_e$ is negative; $r_s$ ensures that the selected line satisfies all of the constraints mentioned above; and $r_d$ ensures that the action selected by the agent moves toward the end point. Each component is discussed further below.

A. Cost of energy consumption

The energy consumption cost comprises the energy consumption of the traction phase, the cruising phase, and the braking phase. The running strategy of the train is as follows: after leaving the station the train accelerates with maximum traction; once it reaches the set speed it switches to the cruising condition, that is, constant-speed running; and on reaching the cruise-to-braking switch point it decelerates into the next station with maximum braking.

A) Traction energy consumption in traction stage

In the traction acceleration phase, the traction energy consumption of each step can be calculated from the train motion equation and the work-energy relation, and the traction energy consumption of all steps in the phase is then accumulated. The traction energy consumption of the traction phase is:

；

where $F_k$ is the traction force of the $k$-th step, obtained by looking up the traction characteristic curve at the running speed $v_k$ of that step, and $\eta$ is the transmission efficiency constant of the train traction motor.

B) Traction energy consumption during cruising phase

The train enters the cruising phase once its running speed reaches the set speed. When the total train resistance (the sum of the basic train resistance $W_0$ and the additional line resistance $W_a$) is positive, a traction force is required to maintain constant speed; when it is negative, a braking force is required to maintain constant speed and the traction force is zero. The train traction force during cruising therefore takes the value:

$$F = \begin{cases} W_0 + W_a, & W_0 + W_a > 0 \\ 0, & W_0 + W_a \le 0 \end{cases}$$

where $W_0$ is the basic train resistance, which depends on the current running speed, and $W_a$ is the additional resistance (gradient, curve, and tunnel resistance) acting on the train as it runs on the line, which depends on the current line conditions.

The traction energy consumption of the cruising phase is:

；

c) The total unit energy cost from state $s_t$ to $s_{t+1}$ is calculated as follows:

；
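The step-by-step accumulation in A) and the cruising rule in B) might be sketched as follows. This is a hedged illustration only: it assumes that the energy drawn in each step equals the mechanical work (traction force times step distance) divided by the transmission efficiency, and the function names and the per-step distance list are hypothetical. The formulas of the embodiment itself are not reproduced here.

```python
def cruise_traction(total_resistance_n):
    """Traction force needed to hold constant speed: equal to the total
    resistance when it is positive, zero when braking is needed instead."""
    return max(total_resistance_n, 0.0)


def traction_energy(forces_n, distances_m, efficiency):
    """Accumulate energy over steps, in joules.

    Assumption: energy per step = F_k * ds_k / efficiency, where F_k is the
    traction force (N) and ds_k the distance covered in step k (m).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    work = sum(f * ds for f, ds in zip(forces_n, distances_m))
    return work / efficiency
```

For a cruising section, the forces passed to `traction_energy` would be `cruise_traction` applied to the total resistance of each step, so downhill steps that need braking contribute no traction energy.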

B. cost of survival

In this embodiment, it is necessary to find a feasible profile route solution that satisfies various constraints of the subway profile environment, so that a survival reward is added to the reward function to encourage the AGENT to satisfy all the constraints when selecting actions.

If the agent cannot find an action $a_t$ from state $s_t$ to $s_{t+1}$ that satisfies all constraints, then $r_s$ takes a negative value, the current state $s_t$ is reset to the initial state $s_0$, and line selection restarts; if the agent can find an action $a_t$ from state $s_t$ to $s_{t+1}$ that satisfies all constraints, then $r_s$ takes a positive value and selection of the next action $a_{t+1}$ begins.

C. distance cost from end point

The agent must be prevented from selecting actions that merely satisfy the survival reward without moving toward the end point. Here $r_d$ is used to encourage the agent to select actions that move toward the end point. The reward function $r_d$ from $s_t$ to $s_{t+1}$ is therefore defined as follows:

where $d_t$ is the straight-line distance from state $s_t$ to the end point and $d_{t+1}$ is the straight-line distance from state $s_{t+1}$ to the end point.
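How the three components could be combined is sketched below in Python. Several choices are assumptions rather than details from the text: the distance term is taken as the simple difference between the previous and the new straight-line distance to the end point, the survival term is a fixed bonus or penalty, and the default weights are arbitrary placeholders.

```python
def total_reward(energy_cost, feasible, d_prev, d_next,
                 weights=(-1.0, 1.0, 1.0), survive_bonus=1.0):
    """Weighted sum of energy, survival, and distance terms.

    energy_cost : unit operation energy cost of the step (non-negative)
    feasible    : True if the action satisfied every constraint
    d_prev      : straight-line distance to the end point before the action
    d_next      : straight-line distance to the end point after the action
    """
    w_e, w_s, w_d = weights
    # Survival term: positive when all constraints hold, negative otherwise.
    r_s = survive_bonus if feasible else -survive_bonus
    # Distance term (assumed form): positive when the agent moves closer.
    r_d = d_prev - d_next
    # A negative energy weight makes lower consumption more rewarding.
    return w_e * energy_cost + w_s * r_s + w_d * r_d
```

With the placeholder weights, a feasible step that costs 2.0 units of energy and brings the agent 10 m closer scores higher than an infeasible step that makes no progress.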

In the subway longitudinal section line design model, the two most important quantities in train traction calculation are the running speed and the running time of the train, both solved as functions of the running distance S; plotted, they are the V-S curve and the T-S curve.

At present the running time t and the running distance S are mainly solved by three methods: direct integration, the graphical method, and approximate integration. The approximate integration method is now widely used and is also the one adopted in electric traction calculation.

The approximate integration method divides the integration range in the running time and distance formulas into several small speed intervals $\Delta v$ and takes the unit resultant force at the average speed of each interval. With the initial and final speeds of an interval denoted $v_1$ and $v_2$, the running time and running distance of the train are calculated by the approximate integration method as follows:

Running time

；

The following is noted: The smaller the acquisition, the more accurate the calculation result, the calculation of this embodiment 。
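The idea of the approximate integration method can be illustrated with a short Python sketch. It is written in SI units with a caller-supplied acceleration function `accel_at(v)` (m/s²) evaluated at the mean speed of each interval, which is an assumption made for readability; the embodiment works with the unit resultant force instead, and its constants are not reproduced here. The function name and parameters are hypothetical.

```python
def integrate_speed_profile(v_start, v_end, dv, accel_at):
    """Running time (s) and distance (m) from v_start to v_end (m/s).

    The speed range is split into intervals of width dv. Within each
    interval the acceleration is treated as constant and equal to its
    value at the interval's mean speed.
    """
    if dv <= 0:
        raise ValueError("dv must be positive")
    direction = 1.0 if v_end >= v_start else -1.0
    t = 0.0
    s = 0.0
    v1 = v_start
    while v1 != v_end:
        remaining = abs(v_end - v1)
        # The last interval ends exactly at v_end, so the loop terminates.
        v2 = v_end if remaining <= dv else v1 + direction * dv
        a = accel_at(0.5 * (v1 + v2))
        if a * direction <= 0:
            raise ValueError("acceleration cannot reach the target speed")
        t += (v2 - v1) / a                    # dt = dv / a
        s += (v2 * v2 - v1 * v1) / (2.0 * a)  # ds = (v2^2 - v1^2) / (2a)
        v1 = v2
    return t, s
```

A smaller `dv` gives more intervals and a closer approximation when the acceleration varies with speed; with a constant acceleration the result is exact for any `dv`.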

Calculation of the unit resultant force

Under different working conditions, the following forms can be written:

；

The train running energy consumption is the energy consumption for overcoming the running resistance of the train to do work, increasing the kinetic energy of the train and overcoming the gravitational potential energy difference in the running process of the train, and the running energy consumption calculation factors comprise reverse gradient limit, speed limit and passenger capacity.

Reverse grade limit

On non-flat lines, the energy consumption is greatly increased relative to a flat line because the train needs to be continuously accelerated and decelerated in order to reach a predetermined operating speed. When the gradient difference of the variable slope points is too large (two adjacent reverse steep slope sections), the trains are alternately accelerated at a reduced speed, so that the running energy consumption of the trains is influenced, the running stability is also influenced, and the comfort level of passengers is reduced. When the actual engineering prescribes that two opposite direction slope sections are connected, one direction slope should not be larger thanIn the second period engineering is relaxed toTherefore, when the reverse limiting gradient of the embodiment affects the energy-saving design of the subway longitudinal section line, the reverse limiting gradient takes the following values:

；

Wherein: To limit the grade in the opposite direction.

Design speed

Matching the slope sections to the design speed improves the use of the line's potential energy and reduces the energy lost through unnecessary braking. Energy consumed during train running is mainly spent overcoming running resistance: when the train holds its speed or accelerates, the work done against resistance increases, and air resistance is proportional to the square of the running speed. The train design speed therefore has an obvious influence on train operation behavior. When the influence of the design speed on the energy-saving design of the subway longitudinal section line is studied in this embodiment, the design speed takes the following values:

Passenger capacity

The influence of passenger capacity on train operation energy consumption is reflected mainly in its effect on the total traction weight of the train. In general, the larger the traction mass of the train, the larger the starting and braking torques required and the larger the traction motor power needed to meet the operating requirements, so energy consumption increases. When the influence of passenger capacity on the energy-saving design of the subway longitudinal section line is studied in this embodiment, the passenger capacity takes the following values:

Wherein: n is the passenger capacity of the vehicle, Is empty car,For person carrying，For carrying passengers for overman。

At each exploration step, the Q-Learning algorithm updates its estimate using the maximum action value $\max_{a'} Q(s_{t+1}, a')$ of the next state. The following problems occur when this idea is applied directly to DQN:

(1) Neural network training assumes that the training data are independent and identically distributed, whereas the sequential data obtained through the agent's interaction are strongly correlated, which easily makes network training unstable.

(2) The parameters of the DQN network are updated continuously, so the temporal-difference target generated by the same network keeps changing, which is unfavorable for convergence of the algorithm.

(3) The model is not stable enough in the early stage of training and the value function estimate is biased; with the $\max$ operator the model may overestimate the expected return of an action and mislead the agent into choosing wrong actions, so that the model fails to find the optimal policy.

The deep reinforcement learning algorithm is an improved D3QN algorithm, and the improvement is as follows:

a. The experience replay mechanism (Experience Replay) is used for storing the experiences obtained through interaction in the experience pool one by one, and after a certain number of experiences are accumulated, a model randomly extracts a certain batch of data training neural network from the experience pool in each step;

b. Two neural networks with the same structure are constructed, namely an estimation network $Q_E(s, a; \theta)$ and a target value network $Q_T(s, a; \theta')$. The estimation network calculates the expected cumulative reward of taking action $a$ in a given state $s$, and its parameter $\theta$ is updated continuously; the target network calculates the temporal-difference target value $Y$, and its parameter $\theta'$ is held fixed and replaced with the latest estimation network parameter $\theta$ at intervals. The target value $Y$ is calculated as follows:

$$Y = r + \gamma Q_T\left(s', \arg\max_{a'} Q_E(s', a'; \theta); \theta'\right)$$

where $r$ is the immediate reward, $\gamma$ is the discount factor, and $\arg\max_{a'} Q_E(s', a'; \theta)$ is the action $a'$ that maximizes $Q_E(s', a'; \theta)$ in the next state $s'$. Because $\theta'$ remains unchanged for a period of time, the target $Y$ toward which the estimation network $Q_E$ converges is relatively fixed. The actions that maximize the estimation network and the target network are not necessarily the same; using $Q_E$ to select the action and $Q_T$ to calculate the target value prevents the model from selecting an overestimated suboptimal action and effectively addresses the overestimation problem of the DQN algorithm.

c. The neural network structure is improved by dividing its output end into two parts: a state value function $V(s; \theta, \mu)$ that represents the quality of each state, and an advantage function $A(s, a; \theta, \omega)$ that distinguishes the quality of each action in a specific state:

$$Q(s, a; \theta, \mu, \omega) = V(s; \theta, \mu) + A(s, a; \theta, \omega)$$

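d. The target value of the D3QN model is as follows:

$$Y = r + \gamma Q_T\left(s', \arg\max_{a'} Q_E(s', a'; \theta, \mu, \omega); \theta', \mu', \omega'\right)$$

e. The network parameters are updated with the mean square error as the loss function, as follows:

$$L(\theta, \mu, \omega) = E\left[\left(Y - Q_E(s, a; \theta, \mu, \omega)\right)^2\right]$$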
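Items b to e can be illustrated with a minimal sketch in plain Python. It is not the implementation of the invention: the function names (`dueling_q`, `double_dqn_target`, `mse_loss`), the numeric values, and the handling of terminal transitions are assumptions made for illustration, and lists of numbers stand in for the outputs of the two neural networks. The dueling combination follows the plain sum $Q = V + A$ given above; many published dueling-network implementations additionally subtract the mean advantage, which is not done here.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Item c: Q(s, a) = V(s) + A(s, a) for every action a."""
    return [value + adv for adv in advantages]


def double_dqn_target(reward: float, gamma: float,
                      q_est_next: Sequence[float],
                      q_tgt_next: Sequence[float],
                      done: bool = False) -> float:
    """Items b and d: Y = r + gamma * Q_T(s', argmax_a' Q_E(s', a'))."""
    if done:  # assumption: terminal transitions do not bootstrap
        return reward
    # The estimation network selects the action ...
    best = max(range(len(q_est_next)), key=lambda a: q_est_next[a])
    # ... and the target network evaluates it.
    return reward + gamma * q_tgt_next[best]


def mse_loss(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Item e: mean of (Y - Q_E(s, a))^2 over a batch."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


q_est_next = [1.0, 3.0, 2.0]   # made-up Q_E(s', .) values
q_tgt_next = [0.5, 1.5, 4.0]   # made-up Q_T(s', .) values
y = double_dqn_target(1.0, 0.9, q_est_next, q_tgt_next)
print(round(y, 2))  # 2.35; taking the maximum of Q_T directly would give 4.6
```

In this example the estimation network prefers action 1, so the target uses $Q_T(s', 1) = 1.5$ rather than the larger but possibly overestimated $Q_T(s', 2) = 4.0$, which is the decoupling described in item b.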

The flow of the improved D3QN algorithm is shown in Figure 2.

1. Initialization: an estimation neural network and a target neural network are initialized; both are deep neural networks used to estimate the action value function. At the same time, an experience replay buffer is initialized to store the experience tuples generated as the agent interacts with the environment.

2. Collecting experience: the agent interacts with the environment, selects actions according to the current strategy, and observes the next state and the immediate reward fed back by the environment. These experience tuples are stored in the experience replay buffer for subsequent training.

3. Training the estimation network: a batch of experience tuples is randomly drawn from the experience replay buffer. For each tuple, the action value of the current state is estimated with the estimation neural network, and the action value of the next state is estimated with the target neural network. The Q-learning target value is then calculated, and the parameters of the estimation neural network are updated so that its output approaches the target value.

4. Updating the target network: the parameters of the target neural network are updated periodically by copying the parameters of the estimation neural network to it. This helps stabilize the training process and reduces fluctuation of the estimation target.

5. Selecting an action: an action is selected according to the current strategy. A strategy such as ε-greedy may be used to trade off exploration against exploitation.

6. Iterative training: steps 2 to 5 are executed repeatedly, continuously collecting experience, updating the estimation network, updating the target network, and optimizing the agent's decision strategy.

7. Convergence and evaluation: as training proceeds, the convergence of the algorithm's performance is monitored. The agent can be evaluated in the environment periodically to test its performance in new scenarios.

8. Ending training: the training process ends when a predetermined number of training steps is reached or the algorithm converges. The estimation neural network of the agent is the final training result and can be used for decision-making in practical applications.
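The eight steps above can be sketched on a toy problem. The sketch below is only an illustration under strong assumptions: the environment is a hypothetical six-cell chain rather than a subway longitudinal section, two lookup tables (`q_est`, `q_tgt`) stand in for the estimation and target neural networks, and all hyperparameters are made up. What it does retain from the text is the replay buffer, random mini-batches, action selection by the estimation table with evaluation by the target table, periodic target synchronization, and ε-greedy selection.

```python
import random
from collections import deque

N_STATES, ACTIONS, GOAL = 6, (-1, +1), 5   # hypothetical toy chain: move left or right


def step(state, action_idx):
    """Toy environment: -1 reward per move, +10 on reaching the goal cell."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = nxt == GOAL
    return nxt, (10.0 if done else -1.0), done


def train(episodes=300, gamma=0.9, lr=0.1, eps=0.2,
          batch=8, sync_every=20, seed=0):
    rng = random.Random(seed)
    q_est = [[0.0, 0.0] for _ in range(N_STATES)]   # step 1: estimation "network"
    q_tgt = [row[:] for row in q_est]               # step 1: target "network"
    buffer = deque(maxlen=500)                      # step 1: replay buffer
    steps = 0
    for _ in range(episodes):                       # step 6: iterate
        s = 0
        for _ in range(50):                         # cap on episode length
            # Step 5: epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q_est[s][i])
            s2, r, done = step(s, a)
            buffer.append((s, a, r, s2, done))      # step 2: store experience
            if len(buffer) >= batch:                # step 3: train on a random batch
                for bs, ba, br, bs2, bdone in rng.sample(list(buffer), batch):
                    a_star = max((0, 1), key=lambda i: q_est[bs2][i])
                    y = br if bdone else br + gamma * q_tgt[bs2][a_star]
                    q_est[bs][ba] += lr * (y - q_est[bs][ba])
            steps += 1
            if steps % sync_every == 0:             # step 4: sync target
                q_tgt = [row[:] for row in q_est]
            s = s2
            if done:
                break
    return q_est                                    # step 8: final estimator


if __name__ == "__main__":
    q = train()
    print([max((0, 1), key=lambda i: q[s][i]) for s in range(GOAL)])
```

The tabular update `q_est[bs][ba] += lr * (y - q_est[bs][ba])` plays the role that a gradient step on the mean square error loss plays for a neural network.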

Case (B)

In this embodiment, a typical section of a subway line is selected as the research object, and the optimal energy-saving line of the subway longitudinal section is solved under different operation energy consumption calculation factors. Finally, the optimized design is extended to a section of a subway line in the capital and compared with the scheme produced by experienced human designers.

The main constraints are shown in Table 1.

Table 1 Main constraints

Results under different parameters

The established D3QN model for subway longitudinal section line design, which takes minimum train operation energy consumption as its objective, is used to analyze the influence of different energy consumption calculation factors on the energy-saving design of the subway longitudinal section line in the capital.

(1) Reverse grade limit influence analysis

Here the influence of the reverse gradient limit on the energy-saving design of the longitudinal section line is analyzed, with the other energy consumption calculation factors held at fixed values.

① Model optimization effect: under the different reverse gradient limits, the energy consumption calculated for the energy-saving design alignment is 3.51% to 3.83% lower than that calculated for the original alignment, and the reduction grows as the reverse gradient limit increases; the running time in each case is close to that of the original alignment.

② Alignment change: the acceleration slope of the energy-saving design alignment is longer than that of the original alignment. Because of the reverse gradient limit, and so that potential energy is converted into kinetic energy more effectively and the train reaches the design speed more quickly, the gradient of the first acceleration slope of the energy-saving design alignment increases as the reverse gradient limit decreases, while the gradient of the second acceleration slope increases as the reverse gradient limit increases.

(2) Design speed impact analysis

Here the influence of the design speed on the energy-saving design of the longitudinal section line is analyzed, with the other energy consumption calculation factors held at fixed values.

① Model optimization effect: under the different design speeds, the energy consumption calculated for the energy-saving design alignment is 1.32% to 14.14% lower than that calculated for the original alignment, and the running time decreases as the speed increases. The optimization effect of the model is most pronounced at design speeds near 80 km/h, because the energy consumption calculated for the original alignment is larger at design speeds near 80 km/h, which indirectly shows that the original alignment is not a good energy-saving slope alignment.

② Alignment change: the gradient of the first acceleration slope of the energy-saving design alignment increases with the design speed; presumably, increasing this gradient converts potential energy into kinetic energy more effectively so that the train reaches the design speed more quickly. Where the alignments obtained at different design speeds differ substantially, the difference arises from the need to bypass the plane transition curve segment.

(3) Passenger capacity impact analysis

Here the influence of the passenger capacity on the energy-saving design of the longitudinal section line is analyzed, with the other energy consumption calculation factors held at fixed values.

① Model optimization effect: under the different passenger capacities, the energy consumption calculated for the energy-saving design alignment is 2.33% to 19.71% lower than that calculated for the original alignment, and the running time increases with the passenger capacity. The optimization effect of the model is most pronounced for the empty-load case, in which the running time is reduced by 12.5 s and the operating energy consumption is reduced by 19.71%.

② Alignment change: although the gradient increases with the passenger capacity, the increase is small; the main change is that the acceleration slope becomes longer, which effectively lengthens the cruising period and thereby reduces energy consumption.

This embodiment builds a deep reinforcement learning model for subway longitudinal section line design that perceives, explores, judges, and makes decisions in the line selection environment without relying on human experience, and finds the optimal energy-saving line scheme through feedback from the different constraint conditions. Compared with the actual longitudinal section line design scheme, both the energy consumption cost and the time cost of train operation can be reduced.

The invention and its embodiments have been described above by way of illustration and not limitation; what is shown in the accompanying drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, structural modes and embodiments similar to this technical scheme that a person of ordinary skill in the art, informed by this disclosure, designs without creative effort and without departing from the gist of the invention shall fall within the protection scope of the invention.

Claims

1. An energy-saving design optimization method for a subway longitudinal section line based on deep reinforcement learning, characterized by comprising the following steps:

2. Solving an optimal energy-saving line of the subway longitudinal section under different operation energy consumption calculation factors by adopting a deep reinforcement learning algorithm;

The subway longitudinal section line design model comprises:

3) State: the spatial position of the slope change point is defined as the state $s_i$; the spatial position refers to the two-dimensional coordinate in the subway longitudinal section and comprises a longitudinal mileage coordinate and a vertical depth coordinate;

4) Action: the search direction of the next slope change point selected by the agent is defined as the action $a_i$;

6) Condition: if the agent cannot find an action $a_i$ that satisfies all the constraint conditions for the transition from state $s_i$ to $s_{i+1}$, the current state change condition $c_i$ is not satisfied, $c_i = \text{false}$, and the current state $s_i$ is re-initialized to $s_0$; otherwise the state change condition $c_i$ is satisfied, $c_i = \text{true}$, and the selection of the next action continues;

(1) Station regional slope section slope length and slope constraint

Only one slope section is arranged in the station area, that is, the sum of the lengths of the entering slope section and the exiting slope section is larger than the platform length $L_{st}$; the constraint on the station-area slope length $l_i$ is expressed as:

$$l_i \ge L_{st}/2 \quad (i = 1, N+1)$$

The station gradient $\alpha_i$ takes a given constant $A_{st}$, namely:

$$\alpha_i \in A_{st} \quad (i = 1, N+1)$$

(2) Slope length and slope constraint of non-station area slope section

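$$l_{\min} \le l_i \le l_{\max} \quad (i = 2, 3, \ldots, N)$$

The line is subject to a minimum gradient $\alpha_{\min}$ and a maximum gradient $\alpha_{\max}$, namely:

$$\alpha_{\min} \le |\alpha_i| \le \alpha_{\max} \quad (i = 2, 3, \ldots, N)$$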

(3) Minimum intermediate straight line length

The length $L_j$ of the intermediate straight line between two adjacent vertical curves shall meet the requirements of the design specifications and is calculated as:

$$L_j = x_i - x_{i-1} - (T_i + T_{i-1}) \ge L_{j\min}$$

where $x_i$ is the longitudinal mileage of the $i$-th slope change point, $T_i$ is the tangent length at the $i$-th slope change point, and $L_{j\min}$ is the minimum intermediate straight line length in the design specifications;

(4) Reverse limit grade

When two slope sections of opposite direction are connected, the gradient $\alpha_{i-1\ \text{or}\ i}$ in one direction shall not exceed the reverse limit gradient $\alpha_{reverse}$, namely:

$$|\alpha_{i-1\ \text{or}\ i}| \le \alpha_{reverse} \quad (\text{if } \alpha_{i-1} \times \alpha_i < 0)$$

(5) Line burial depth constraint

The line burial depth constraint requires that the design elevation of the rail top of the underground tunnel at any point $n$ on the line be lower than the ground elevation at that point minus the tunnel height $H^c$ and the minimum covering soil thickness $H^s$.

(6) Avoidance zone constraints

All avoidance areas of the subway longitudinal section are denoted by $U_F$ and the tunnel area of the subway longitudinal section line is denoted by $L_R$; the intersection of $U_F$ and $L_R$ is the empty set:

$$U_F \cap L_R = \varnothing$$

(7) Plane transition curve segment constraint
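As an illustration of how constraints (1) to (3) above might be checked for a candidate slope section, a minimal sketch follows. The function names and all numeric limits are hypothetical and are not taken from Table 1; gradients are assumed to be expressed in per mille, and constraints (4) to (7) are not covered.

```python
def station_length_ok(l_i: float, platform_length: float) -> bool:
    """Constraint (1): l_i >= L_st / 2 for the station-area slope sections."""
    return l_i >= platform_length / 2


def non_station_segment_ok(l_i: float, alpha_i: float,
                           l_min: float, l_max: float,
                           alpha_min: float, alpha_max: float) -> bool:
    """Constraint (2): l_min <= l_i <= l_max and alpha_min <= |alpha_i| <= alpha_max."""
    return l_min <= l_i <= l_max and alpha_min <= abs(alpha_i) <= alpha_max


def intermediate_straight_length(x_prev: float, x_cur: float,
                                 t_prev: float, t_cur: float) -> float:
    """Constraint (3): L_j = x_i - x_{i-1} - (T_i + T_{i-1})."""
    return x_cur - x_prev - (t_cur + t_prev)


# Hypothetical values, for illustration only.
print(station_length_ok(l_i=80.0, platform_length=140.0))          # True
print(non_station_segment_ok(300.0, -5.0, 200.0, 1000.0, 3.0, 30.0))  # True
print(intermediate_straight_length(1000.0, 1400.0, 60.0, 80.0) >= 50.0)  # True
```

In a reinforcement learning setting, checks of this kind would decide whether a proposed action keeps the state change condition true or triggers a reset to the initial state.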

2. The energy-saving design optimization method for a subway longitudinal section line based on deep reinforcement learning according to claim 1, characterized in that: in the State, the state space at the end of the $(i-1)$-th action is defined as $S_i$, as shown in the following equation:

$$S_i = \{[x_{i-1}, y_{i-1}, x_i, y_i]^T \mid x_i \in [0, W],\ y_i \in [-H, 0]\}$$

where $i = 1, 2, \ldots, n$; $x_i$ is the longitudinal mileage coordinate of the $i$-th slope change point, $y_i$ is the vertical elevation coordinate of the $i$-th slope change point, and $W$ and $H$ are the upper limits of the mileage and depth of the target optimization area, respectively; in the above equation, the joint position of two consecutive slope change points is taken as one state in the longitudinal section line design optimization model.

3. The energy-saving design optimization method for a subway longitudinal section line based on deep reinforcement learning according to claim 2, characterized in that: in the Action, action $A_i$ converts state $S_i$ into state $S_{i+1}$, and action $A_i$ consists of two parts, as follows:

$$A_i = \{[l_i, \alpha_i]^T\}$$

where $l_i$ and $\alpha_i$ are the slope section length and gradient; the relation among $S_i$, $A_i$ and $S_{i+1}$ is as follows:

$$S_i = [x_{i-1}, y_{i-1}, x_i, y_i]$$

$$S_{i+1} = [x_i, y_i, x_{i+1}, y_{i+1}] = [x_i, y_i, x_i + l_i, y_i + l_i \times \alpha_i / 1000].$$
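The state transition in claim 3 can be written out as a short sketch. The function names and the sample numbers are hypothetical; the division by 1000 follows the relation above, which implies that the gradient is expressed in per mille, and the bounds check follows the state space definition of claim 2.

```python
from typing import Tuple

State = Tuple[float, float, float, float]   # (x_{i-1}, y_{i-1}, x_i, y_i)
Action = Tuple[float, float]                # (l_i, alpha_i), alpha_i in per mille


def next_state(state: State, action: Action) -> State:
    """S_{i+1} = [x_i, y_i, x_i + l_i, y_i + l_i * alpha_i / 1000]."""
    _, _, x_i, y_i = state
    l_i, alpha_i = action
    return (x_i, y_i, x_i + l_i, y_i + l_i * alpha_i / 1000)


def in_state_space(state: State, w: float, h: float) -> bool:
    """Check x_i in [0, W] and y_i in [-H, 0] for the newest slope change point."""
    _, _, x_i, y_i = state
    return 0 <= x_i <= w and -h <= y_i <= 0


s1 = (0.0, -10.0, 200.0, -10.0)          # hypothetical starting state
s2 = next_state(s1, (300.0, -20.0))      # 300 m section at -20 per mille
print(s2)                                # (200.0, -10.0, 500.0, -16.0)
print(in_state_space(s2, w=1500.0, h=30.0))
```

Each state carries the previous and the current slope change point, so applying an action shifts the current point into the first two slots and appends the newly computed point.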

4. The energy-saving design optimization method for a subway longitudinal section line based on deep reinforcement learning according to claim 3, characterized in that: in the Reward, the formula for the reward $R_i$ is as follows:

where the three cost terms represent, respectively, the unit operation energy consumption cost, the survival cost, and the distance-to-end-point cost; $u_e$, $u_s$ and $u_d$ are the corresponding weights.

5. The energy-saving design optimization method for a subway longitudinal section line based on deep reinforcement learning according to claim 4, characterized in that: in the subway longitudinal section line design model, the train operation energy consumption is obtained by solving the train motion equation with an approximate integration method: the range between the upper and lower integration limits in the running time and distance formulas is divided into a number of small speed intervals $\Delta v$, and the initial and final speeds of the integration are denoted $v_1$ and $v_2$, giving the following formulas for the train running time and running distance:

Running time

Running distance

Calculation of the unit resultant force $c_p$

Under the different working conditions, it can be written in the following forms:

① Traction operation:

② Coasting operation:

$$c_p = -w = -\left(w_0(v_s) + w_i(s) + w_r(s) + w_s(s)\right)$$

③ Braking operation:

$$c_p = -(w + \beta_b \cdot b)$$

where $F$ is the traction force; $P$ is the mass of the motor cars; $G$ is the mass of the trailer cars; $b$ is the unit braking force of the train; $\beta_b$ is the service braking coefficient; $f$ is the unit traction force; and $w$ is the unit resistance of the train, comprising the basic running resistance $w_0(v_s)$, the additional gradient resistance $w_i(s)$, the additional curve resistance $w_r(s)$, and the additional tunnel resistance $w_s(s)$.
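The interval-wise integration described in claim 5 can be sketched as follows. This is an illustration under stated assumptions rather than the formulas of the claim: SI units are used, the net acceleration is supplied by a hypothetical callable `accel(v)` in m/s² (which would be derived from the unit resultant force $c_p$), and the acceleration is evaluated at the midpoint of each speed interval. The unit conventions and coefficients of the claimed formulas are not assumed here.

```python
from typing import Callable, Tuple


def integrate_motion(v1: float, v2: float,
                     accel: Callable[[float], float],
                     dv: float = 0.5) -> Tuple[float, float]:
    """Approximate running time (s) and distance (m) from speed v1 to v2 (m/s).

    The speed range is split into small intervals of about dv; within each
    interval the acceleration is treated as constant (midpoint value).
    """
    if v1 == v2:
        return 0.0, 0.0
    n = max(1, round(abs(v2 - v1) / dv))
    step = (v2 - v1) / n                 # signed speed increment per interval
    t_total = 0.0
    s_total = 0.0
    for k in range(n):
        v_mid = v1 + (k + 0.5) * step
        a = accel(v_mid)
        if a * step <= 0:                # acceleration must point toward v2
            raise ValueError("speed v2 cannot be reached with this acceleration")
        dt = step / a                    # time spent in this speed interval
        t_total += dt
        s_total += v_mid * dt            # distance covered in this interval
    return t_total, s_total


# Constant 1 m/s^2 from 0 to 10 m/s: analytically 10 s and 50 m.
t, s = integrate_motion(0.0, 10.0, lambda v: 1.0)
print(round(t, 6), round(s, 6))
```

With a speed-dependent `accel`, the same loop accumulates traction, coasting, or braking phases one interval at a time, and the energy of a traction phase could be accumulated alongside the distance in the same way.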

6. The energy-saving design optimization method for a subway longitudinal section line based on deep reinforcement learning according to claim 5, characterized in that: the operation energy consumption calculation factors include the reverse gradient limit, the limiting speed, and the passenger capacity.

7. The energy-saving design optimization method for a subway longitudinal section line based on deep reinforcement learning according to claim 6, characterized in that: the deep reinforcement learning algorithm is an improved D3QN algorithm, and the improvements are as follows:

b. two neural networks with the same structure are constructed, namely an estimation network $Q_E(s, a; \theta)$ and a target value network $Q_T(s, a; \theta')$, wherein the estimation network is used to calculate the expected cumulative reward of taking action $a$ in a given state $s$, and the estimation network parameter $\theta$ is updated continuously; the target network is used to calculate the temporal-difference target value $Y$, and the target network parameter $\theta'$ is held fixed and replaced with the latest estimation network parameter $\theta$ at intervals; the target value $Y$ is calculated as follows:

$$Y = r + \gamma Q_T\left(s', \arg\max_{a'} Q_E(s', a'; \theta); \theta'\right)$$

where $r$ is the immediate reward; $\gamma$ is the discount factor; $\arg\max_{a'} Q_E(s', a'; \theta)$ denotes the action $a'$ that maximizes $Q_E(s', a'; \theta)$ in the next state $s'$;

$\theta'$ remains unchanged for a period of time, so that the target $Y$ toward which the estimation network $Q_E$ converges is relatively fixed;

c. The neural network structure is improved by dividing its output end into two parts, one part being a state value function $V(s; \theta, \mu)$ that represents the quality of each state, and the other part being an advantage function $A(s, a; \theta, \omega)$ that distinguishes the quality of each action in a specific state:

$$Q(s, a; \theta, \mu, \omega) = V(s; \theta, \mu) + A(s, a; \theta, \omega)$$

where $\mu$ is a policy and $\omega$ is a parameter of the advantage function;

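d. The target value of the D3QN model is as follows:

$$Y = r + \gamma Q_T\left(s', \arg\max_{a'} Q_E(s', a'; \theta, \mu, \omega); \theta', \mu', \omega'\right)$$

$$L(\theta, \mu, \omega) = E\left[\left(Y - Q_E(s, a; \theta, \mu, \omega)\right)^2\right].$$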