CN117521485A

CN117521485A - Optimizing method for energy-saving design of longitudinal subway lines based on deep reinforcement learning

Info

Publication number: CN117521485A
Application number: CN202311330407.6A
Authority: CN
Inventors: 何庆; 徐双婷; 高天赐; 杨东营; 冯晓云; 王青元; 孙鹏飞; 朱颖; 王平
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2023-10-16
Filing date: 2023-10-16
Publication date: 2024-02-06
Anticipated expiration: 2043-10-16
Also published as: CN117521485B

Abstract

The present invention relates to the technical field of subway longitudinal section line design, and in particular to a method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning. The method comprises the following steps: 1. combining the constraints of subway line design specifications with the constraints of actual construction conditions to establish a subway longitudinal section line design model whose objective is to minimize train operation energy consumption; 2. using a deep reinforcement learning algorithm to solve for the optimal energy-saving subway longitudinal section line under different operating energy consumption calculation factors. Compared with the actual longitudinal section line design scheme, the present invention can simultaneously reduce the energy consumption cost and the time cost of train operation.

Description

Optimization method for energy-saving design of subway longitudinal section based on deep reinforcement learning

Technical Field

The present invention relates to the technical field of subway longitudinal section line design, and in particular to a method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning.

Background Art

Existing studies have shown that train traction energy consumption mainly depends on line conditions and train operation organization (train scheduling, driving strategy, stopping plan, etc.) [2, 3]. Because train operation organization is itself constrained by line conditions, its energy-saving effect is limited; to reduce traction energy consumption further, energy saving must be considered at the line design stage. The core of line design is horizontal and longitudinal section design. Horizontal design aims to maximize passenger attraction and rarely considers energy saving, whereas longitudinal section design is more closely related to traction energy consumption. Subway lines are laid in three main ways: underground, elevated and at grade. Underground sections run entirely below ground, so their gradient is generally restricted only by underground buildings, pile foundations, underground pipelines and similar factors along the line, which makes them the most suitable for energy-saving slopes (stations placed high, running sections placed low).

Current research on energy-saving slopes starts from energy conservation: it combines longitudinal section design principles with traction simulation to summarize general principles for energy-saving longitudinal section forms and to analyze their energy-saving effect. Some scholars have found that a longitudinal section layout with a downhill section leaving the station and an uphill section entering the station helps reduce train traction energy consumption. Others have proposed an energy-saving slope design method that changes the gradient of the energy-saving slope by changing the station elevation. However, energy-saving slope parameters are usually chosen from experience and may not achieve the best energy-saving effect.

With the rise of intelligent algorithms, scholars in China and abroad have combined longitudinal section line design with such algorithms, and the automatic computer optimization of longitudinal section schemes has become a hot topic in this field. Some scholars built an optimization model for the horizontal and longitudinal sections of urban rail transit lines that accounts for train running behavior, and used a genetic algorithm to solve for the optimal line design in three-dimensional space. Some, considering the coupling constraints between railway route and station location, used a distance transform algorithm to optimize mountain railway alignment and station locations simultaneously. Some proposed a multi-stage decision model that jointly optimizes the longitudinal section line, the cruising speed and the coasting operation points to obtain the lowest-cost solution. Others used a multi-level augmented differential evolution algorithm, based on geographic information, to solve the horizontal and longitudinal section design of intercity railways.

Although the above studies can satisfy the relevant design specifications and can be used for energy-saving optimization of longitudinal section lines, their methods have certain limitations:

(1) They ignore the actual engineering constraints that longitudinal section line design must avoid (underground buildings, pile foundations, drainage pipelines, adverse geology, etc.).

(2) Algorithms such as PSO, GA or PSO-GA require a predefined number of longitudinal section intersection points as input. These methods are therefore suited to "optimizing" a line rather than "designing" one, and can only find the optimal line for that number of intersection points.

(3) Although GA, PSO, DT and other evolution-based algorithms have been greatly improved, they still cannot learn as humans do and have difficulty achieving primary-level optimization.

Summary of the Invention

The object of the present invention is to provide a method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning, which can effectively obtain the optimal energy-saving subway longitudinal section line.

The method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning according to the present invention comprises the following steps:

1. Combining the constraints of subway line design specifications with the constraints of actual construction conditions, establish a subway longitudinal section line design model whose objective is to minimize train operation energy consumption;

2. Using a deep reinforcement learning algorithm, solve for the optimal energy-saving subway longitudinal section line under different operating energy consumption calculation factors.

Preferably, the subway longitudinal section line design model includes:

1) Environment: The avoidance areas and the transition curve sections of the horizontal alignment constitute the environment for subway longitudinal section line optimization;

2) Agent: Defined as the intelligent program that decides the direction taken by the energy-saving longitudinal section design;

3) State: The spatial position of a slope change point is defined as the state $s_t$. The spatial position is the two-dimensional coordinate in the subway longitudinal section, consisting of the longitudinal mileage coordinate and the vertical depth coordinate;

4) Action: The search direction of the next slope change point chosen by the agent is defined as the action $a_t$;

5) Reward: The value of the reward depends on feedback from the environment. Train operation energy consumption is the main component of the reward; the other components are the survival reward and the distance-to-target reward;

6) Condition: If the agent cannot find an action $a_t$ that satisfies all constraints in moving from state $s_t$ to state $s_{t+1}$, the state transition condition does not hold (condition = False) and the current state $s_t$ is reset to the initial state $s_0$; otherwise the state transition condition holds (condition = True) and the agent continues with the selection of the next action.

Preferably, the subway line design specification constraints and the actual construction condition constraints include:

(1) Slope length and gradient constraints of slope sections in the station area

Only one slope section is set in the station area, that is, the sum of the lengths of the entry and exit slope sections is greater than the platform length $L_p$. Assuming that the entry and exit slope sections have the same length $l_s$, the station-area slope length constraint is expressed as:

$$l_s \ge \frac{L_p}{2};$$

The station gradient $i_s$ uses a given constant $i_c$, that is:

$$i_s = i_c;$$

(2) Slope length and gradient constraints of slope sections outside the station area

The slope length $l$ of a slope section outside the station area is constrained as:

$$l_{\min} \le l \le l_{\max};$$

The line has a minimum gradient $i_{\min}$ and a maximum gradient $i_{\max}$, that is:

$$i_{\min} \le |i| \le i_{\max};$$

(3) Minimum intervening straight line length

The length $L_j$ of the straight line between two adjacent vertical curves should meet the design specification requirements and is calculated as follows:

$$L_j = (x_{k+1} - x_k) - T_k - T_{k+1} \ge L_{j,\min};$$

where $x_k$ is the longitudinal mileage of the $k$-th slope change point; $T_k$ is the tangent length at the $k$-th slope change point; and $L_{j,\min}$ is the minimum intervening straight line length in the design specification;

(4) Reverse limiting gradient

When two slope sections of opposite direction are connected, the gradient $i$ in one of the directions should not be greater than the reverse limiting gradient $i_r$, that is:

$$|i| \le i_r;$$

(5) Line burial depth constraint

The line burial depth constraint requires that, at any point $x$ on the line, the design elevation $y(x)$ of the track top of the underground tunnel be less than the ground elevation $g(x)$ at that point minus the tunnel height $h_t$ and the minimum cover thickness $h_c$, that is:

$$y(x) \le g(x) - h_t - h_c;$$

(6) Avoidance area constraint

Let $\Omega_A$ denote all avoidance areas in the subway longitudinal section and $\Omega_T$ denote the tunnel area of the subway longitudinal section line. The intersection of $\Omega_A$ and $\Omega_T$ should be empty:

$$\Omega_A \cap \Omega_T = \varnothing;$$

(7) Horizontal transition curve section constraint

The distance between the mileage coordinate $x_k$ of a slope change point and the mileage coordinate $x_h$ of the start or end point of a horizontal transition curve should not be less than the tangent length $T_k$ of the vertical curve, that is:

$$|x_k - x_h| \ge T_k.$$
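The slope-length, gradient and intervening-straight-line constraints listed above can be sketched as simple feasibility checks. The following Python fragment is a minimal illustration only: the names `Limits`, `section_ok` and `straight_ok`, the use of (mileage, elevation) pairs in metres, and every numeric limit are assumptions made for the example and are not taken from the patent or from the Metro Design Code.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (mileage in m, elevation in m), assumed convention


@dataclass
class Limits:
    # All numeric limits are illustrative placeholders, not values from the patent.
    l_min: float = 200.0        # minimum slope section length (m)
    l_max: float = 1500.0       # maximum slope section length (m)
    i_min: float = 0.003        # minimum absolute gradient
    i_max: float = 0.030        # maximum absolute gradient
    straight_min: float = 50.0  # minimum intervening straight line length (m)


def section_ok(p0: Point, p1: Point, lim: Limits) -> bool:
    """Check slope length and gradient bounds for one section outside a station."""
    length = p1[0] - p0[0]
    if length <= 0:
        return False  # the next slope change point must lie further along the line
    gradient = abs((p1[1] - p0[1]) / length)
    return lim.l_min <= length <= lim.l_max and lim.i_min <= gradient <= lim.i_max


def straight_ok(x_k: float, x_next: float, t_k: float, t_next: float, lim: Limits) -> bool:
    """Check the straight length left between two adjacent vertical curves."""
    return (x_next - x_k) - t_k - t_next >= lim.straight_min
```

A candidate slope section would be accepted only if every such check passes; the remaining constraints (burial depth, avoidance areas, transition curves) could be added as further predicates of the same shape.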

Preferably, in State, the state at the end of the $t$-th action is defined as $s_t$, as shown below:

$$s_t = \{(x_{t-1}, y_{t-1}, x_t, y_t) \mid 0 \le x \le W,\ 0 \le y \le H\};$$

where $x_t$ is the longitudinal mileage coordinate of the $t$-th slope change point, $y_t$ is the vertical elevation coordinate of the $t$-th slope change point, and $W$ and $H$ are the upper bounds of the mileage and depth of the target optimization area, respectively. In the above formula, the combined position of two consecutive slope change points is taken as one state in the longitudinal section line design optimization model.

Preferably, in Action, the action $a_t$ transforms the state $s_t$ into the state $s_{t+1}$. The action $a_t$ consists of two parts:

$$a_t = (l_t, i_t);$$

where $l_t$ and $i_t$ are the slope section length and gradient. The relationship among $s_t$, $a_t$ and $s_{t+1}$ is:

$$x_{t+1} = x_t + l_t, \qquad y_{t+1} = y_t + l_t \, i_t.$$
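The transition from one state to the next under an action made of a slope length and a gradient can be illustrated as follows. This is a hedged sketch: it assumes the vertical coordinate is an elevation that increases upward, that the gradient is a signed rise per unit of mileage, and that a state stores the last two slope change points. The names `next_point` and `step_state` are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]                # (mileage, elevation), assumed convention
State = Tuple[float, float, float, float]  # (x_prev, y_prev, x_cur, y_cur)
Action = Tuple[float, float]               # (slope length, signed gradient)


def next_point(x: float, y: float, length: float, gradient: float) -> Point:
    """Place the next slope change point at the end of a section of given length and gradient."""
    return (x + length, y + length * gradient)


def step_state(state: State, action: Action) -> State:
    """Shift the two-point window forward by one slope change point."""
    _, _, x_cur, y_cur = state
    length, gradient = action
    x_new, y_new = next_point(x_cur, y_cur, length, gradient)
    return (x_cur, y_cur, x_new, y_new)
```

Keeping two consecutive points in the state means the previous section's gradient can be recovered from the state alone, which is what checks involving adjacent sections need.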

Preferably, in Reward, the reward $r_t$ is given by:

$$r_t = \omega_1 r_E + \omega_2 r_S + \omega_3 r_D;$$

where $r_E$, $r_S$ and $r_D$ represent the unit operating energy consumption cost, the survival cost and the distance-to-end-point cost, respectively, and $\omega_1$, $\omega_2$ and $\omega_3$ are their weights.

Preferably, in the subway longitudinal section line design model, the train motion equation is solved by the approximate integration method in order to obtain the train operation energy consumption. The range between the upper and lower integration limits in the running time and distance formulas is divided into a number of small speed intervals $\Delta v$. With the initial and final speeds of an interval denoted $v_1$ and $v_2$, the approximate integration formulas for train running time and running distance are:

Running time

[formula omitted in source];

Calculation of the unit resultant force

Under different operating conditions, the unit resultant force $c$ can be written in the following form:

$$c = \begin{cases} f - w, & \text{traction} \\ -w, & \text{coasting} \\ -(\beta_c b + w), & \text{braking} \end{cases} \qquad f = \frac{F}{(M + T)\, g};$$

where $F$ is the traction force; $M$ is the mass of the motor cars; $T$ is the mass of the trailer cars; $b$ is the unit braking force of the train; $\beta_c$ is the service braking coefficient; $f$ is the unit traction force; and $w$ is the unit resistance of the train, which includes the basic running resistance $w_0$, the additional gradient resistance $w_i$, the additional curve resistance $w_r$ and the additional tunnel resistance $w_s$.
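A minimal sketch of the approximate integration idea follows. It does not reproduce the patent's formulas; instead it assumes SI speeds (m/s), a unit resultant force expressed in N/kN, and a rotating-mass factor `gamma`, all of which are assumptions for illustration. Within each small speed interval the acceleration is treated as constant and evaluated at the mean speed, so that the time increment is (v2 - v1) / a and the distance increment is (v2^2 - v1^2) / (2a).

```python
from typing import Callable, Tuple

G = 9.81  # gravitational acceleration, m/s^2


def integrate_speed_intervals(
    v_start: float,
    v_end: float,
    dv: float,
    unit_force: Callable[[float], float],
    gamma: float = 0.06,  # assumed rotating-mass factor, not a value from the patent
) -> Tuple[float, float]:
    """Return (time in s, distance in m) needed to go from v_start to v_end (m/s).

    unit_force(v) is a hypothetical callback giving the unit resultant
    force in N/kN at speed v; it must have the sign of the speed change.
    """
    t_total = 0.0
    s_total = 0.0
    direction = 1.0 if v_end >= v_start else -1.0
    v1 = v_start
    while abs(v_end - v1) > 1e-12:
        v2 = v1 + direction * min(dv, abs(v_end - v1))
        c = unit_force(0.5 * (v1 + v2))      # force at the mean speed of the interval
        a = c * G / 1000.0 / (1.0 + gamma)   # N/kN converted to m/s^2
        if a * direction <= 0.0:
            raise ValueError("resultant force cannot produce the requested speed change")
        t_total += (v2 - v1) / a
        s_total += (v2 * v2 - v1 * v1) / (2.0 * a)
        v1 = v2
    return t_total, s_total
```

Smaller values of `dv` follow a speed-dependent force curve more closely, at the cost of more iterations.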

Preferably, the operating energy consumption calculation factors include the reverse limiting gradient, the speed limit and the passenger load.

Preferably, the deep reinforcement learning algorithm is an improved D3QN algorithm, with the following improvements:

a. An experience replay mechanism stores the experience obtained from interaction in an experience pool, one item at a time. Once a certain amount has accumulated, the model randomly draws a batch of data from the experience pool at each step to train the neural network;
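The experience replay mechanism described in item a can be sketched with the Python standard library. The class name `ReplayBuffer`, the tuple layout and the `done` flag are assumptions for illustration; the patent does not specify a data layout.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience pool with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently discards the oldest item when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Store one transition obtained from interaction with the environment."""
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a batch without replacement; raises ValueError if too few items are stored."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)
```

Training would typically start only once `len(buffer)` exceeds the batch size, which corresponds to the "certain amount" mentioned in the text.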

b. Two neural networks with the same structure are constructed: an estimation network $Q(s, a; \theta)$ and a target network $Q(s, a; \theta^-)$. Given a state $s$, the estimation network computes the expected cumulative reward of taking action $a$, and its parameters $\theta$ are updated continuously. The target network is used to compute the temporal-difference target $y$; its parameters $\theta^-$ are held fixed and are replaced with the latest estimation network parameters $\theta$ at regular intervals. The target value $y$ is calculated as:

$$y = r + \gamma \max_{a'} Q(s', a'; \theta^-);$$

where $r$ is the immediate reward; $\gamma$ is the discount factor; and $\max_{a'} Q(s', a'; \theta^-)$ denotes taking, in the next state $s'$, the action $a'$ that maximizes $Q$;

Because $\theta^-$ remains unchanged for a period of time, the convergence target $y$ of the estimation network $Q(s, a; \theta)$ is relatively fixed;
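The periodic replacement of target-network parameters and the temporal-difference target can be sketched without any deep learning library. Parameters are represented here as a plain dictionary, and `TargetSync`, `td_target` and the terminal-state handling via `done` are assumptions for illustration rather than the patent's implementation.

```python
import copy


class TargetSync:
    """Keep a frozen copy of the online parameters, refreshed every sync_every updates."""

    def __init__(self, online_params, sync_every):
        self.online = online_params                 # updated in place by training
        self.target = copy.deepcopy(online_params)  # frozen copy used for targets
        self.sync_every = sync_every
        self.steps = 0

    def after_update(self):
        """Call once after each training step of the estimation network."""
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target = copy.deepcopy(self.online)


def td_target(reward, next_q_target, gamma, done):
    """DQN target: immediate reward plus discounted best next value from the target network."""
    if done:
        return reward  # assumed convention: no bootstrapping from a terminal state
    return reward + gamma * max(next_q_target)
```

Between synchronizations the targets are computed from parameters that do not move, which is the "relatively fixed" convergence target the text refers to.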

c. The neural network structure is improved by splitting its output into two streams: a state value function $V^{\pi}(s)$, which characterizes how good each state is, and an advantage function $A^{\pi}(s, a)$, which distinguishes how good each action is in a particular state:

$$Q^{\pi}(s, a; \theta, \alpha) = V^{\pi}(s; \theta) + A^{\pi}(s, a; \alpha);$$

where $\pi$ is the policy and $\alpha$ is the parameter of the advantage function;
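The combination of a state value and per-action advantages into Q values can be sketched as below. Note the swapped-in detail: this example subtracts the mean advantage before adding the state value, which is the aggregation commonly used in dueling networks to make the two streams identifiable. The source text does not say whether the patent uses this mean-centred form, so it is an assumption here, as is the function name `dueling_q`.

```python
def dueling_q(value, advantages):
    """Combine a scalar state value with per-action advantages into Q values.

    The mean advantage is subtracted so that the Q values average to the state value.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]
```

With this aggregation the ranking of actions is decided entirely by the advantage stream, while the value stream shifts all Q values of a state together.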

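d. The target value of the D3QN model is:

$$y = r + \gamma \, Q\big(s', \arg\max_{a'} Q(s', a'; \theta); \theta^-\big);$$

e. With the mean squared error $L(\theta)$ as the loss function, the network parameters are updated as follows:

$$L(\theta) = \mathbb{E}\big[(y - Q(s, a; \theta))^2\big].$$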
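The double-Q target and the mean squared error loss can be illustrated in plain Python. The sketch assumes the standard Double DQN rule (the estimation network selects the next action, the target network evaluates it); `d3qn_target`, `mse_loss` and the `done` flag are hypothetical names introduced for the example.

```python
def d3qn_target(reward, q_next_online, q_next_target, gamma, done=False):
    """Double-Q target: pick the action with the online values, score it with the target values."""
    if done:
        return reward  # assumed convention: no bootstrapping from a terminal state
    best_action = max(range(len(q_next_online)), key=q_next_online.__getitem__)
    return reward + gamma * q_next_target[best_action]


def mse_loss(targets, predictions):
    """Mean squared error between target values and estimated Q values over a batch."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)
```

Separating action selection from action evaluation is what reduces the over-estimation that arises when a single network both chooses and scores the maximizing action.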

The present invention constructs a deep reinforcement learning model for subway longitudinal section line design. Without relying on human experience, the model perceives, searches, judges and makes decisions in the line selection environment, and finds the optimal energy-saving line scheme through feedback on the different constraints. Compared with the actual longitudinal section line design scheme, the present invention can simultaneously reduce the energy consumption cost and the time cost of train operation.

Brief Description of the Drawings

FIG. 1 is a flow chart of a method for optimizing the energy-saving design of a subway longitudinal section line based on deep reinforcement learning in an embodiment;

FIG. 2 is a flow chart of the improved D3QN algorithm in the embodiment.

Detailed Description

To aid understanding of the present invention, it is described in detail below with reference to the accompanying drawings and an embodiment. It should be understood that the embodiment only explains the present invention and does not limit it.

Embodiment:

As shown in FIG. 1, this embodiment provides a method for optimizing the energy-saving design of a subway longitudinal section line based on deep reinforcement learning, which comprises the following steps:

The subway longitudinal section line design model includes:

1) Environment: The avoidance areas (underground buildings, pile foundations, underground pipeline routes, adverse geology, etc.) and the transition curve sections of the horizontal alignment constitute the environment for subway longitudinal section line optimization;

6) Condition: If the agent cannot find an action $a_t$ (slope length $l_t$ and gradient $i_t$) that satisfies all constraints in moving from state $s_t$ to state $s_{t+1}$, the state transition condition does not hold (condition = False) and the current state $s_t$ is reset to the initial state $s_0$; otherwise the state transition condition holds (condition = True) and the agent continues with the selection of the next action.

The subway line design specification constraints and the actual construction condition constraints include:

In the station area, if a train runs across more than one slope section, the resulting vibrations superimpose, which is detrimental to smooth and safe running. It is therefore best to set only one slope section in the station area, that is, the sum of the lengths of the entry and exit slope sections is greater than the platform length $L_p$. Assuming that the entry and exit slope sections have the same length $l_s$, the station-area slope length constraint is expressed as:

$$l_s \ge \frac{L_p}{2};$$

For the gradient in the station area, to prevent stopped trains from rolling and luggage from sliding on the platform, the smallest gradient that still meets drainage requirements is used. In this embodiment the station gradient $i_s$ uses the given constant $i_c$ from the Metro Design Code, that is:

$$i_s = i_c;$$

The Metro Design Code requires that the length $l$ of any slope section outside the station area be no less than the long-term train formation length $l_{\min}$. However, because the distance between subway stations is short and drainage ditches must be provided between stations, a single slope section should not be too long. The slope length constraint outside the station area is therefore expressed as:

$$l_{\min} \le l \le l_{\max};$$

Because train traction capacity is limited, the gradient of main line slope sections between stations should not be too steep. The Metro Design Code stipulates that the gradient of the main line should not exceed the maximum gradient $i_{\max}$. In addition, to facilitate drainage between stations, the line should have a minimum gradient $i_{\min}$, that is:

$$i_{\min} \le |i| \le i_{\max};$$

(3) Minimum intervening straight line length

$$L_j = (x_{k+1} - x_k) - T_k - T_{k+1} \ge L_{j,\min};$$

where $x_k$ is the longitudinal mileage of the $k$-th slope change point; $T_k$ is the tangent length at the $k$-th slope change point; and $L_{j,\min}$ is the minimum intervening straight line length in the design specification.

(4) Reverse limiting gradient

The Metro Design Code does not explicitly specify a maximum gradient difference. However, to improve passenger comfort, ease design and construction, and reduce maintenance and operating costs, when two slope sections of opposite direction are connected, the gradient $i$ in one of the directions should not be greater than the reverse limiting gradient $i_r$, that is:

$$|i| \le i_r;$$

(5) Line burial depth constraint

The line burial depth constraint requires that, at any point $x$ on the line, the design elevation $y(x)$ of the track top of the underground tunnel be less than the ground elevation $g(x)$ at that point minus the tunnel height $h_t$ (the distance from the track top to the outer diameter of the tunnel, a fixed value) and the minimum cover thickness $h_c$ (a fixed value), that is:

$$y(x) \le g(x) - h_t - h_c;$$

(6) Avoidance area constraint

The longitudinal section design must take into account underground soil conditions, building pile foundations, pipelines and similar factors, and must avoid areas where construction is impossible or very difficult. Let $\Omega_A$ denote all avoidance areas in the subway longitudinal section and $\Omega_T$ denote the tunnel area of the subway longitudinal section line. The intersection of $\Omega_A$ and $\Omega_T$ should be empty:

$$\Omega_A \cap \Omega_T = \varnothing;$$
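The empty-intersection requirement between the tunnel area and the avoidance areas can be sketched for a single slope section. The example assumes that avoidance areas are axis-aligned rectangles in the (mileage, elevation) plane, that elevation increases upward, and that the tunnel occupies a band of fixed clearance above and below the track top; the function name `tunnel_clear` and all clearances are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]               # (mileage, elevation)
Rect = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def tunnel_clear(p0: Point, p1: Point, rects: Iterable[Rect],
                 above: float = 5.0, below: float = 1.0) -> bool:
    """Return True if the tunnel band along the section p0 -> p1 misses every rectangle."""
    (x0, y0), (x1, y1) = p0, p1
    gradient = (y1 - y0) / (x1 - x0)
    for rx0, rx1, ry0, ry1 in rects:
        lo, hi = max(x0, rx0), min(x1, rx1)
        if lo > hi:
            continue  # the rectangle does not share any mileage with this section
        # The track top is linear in x, so its extremes over [lo, hi] occur at the ends.
        ys = (y0 + gradient * (lo - x0), y0 + gradient * (hi - x0))
        band_min = min(ys) - below
        band_max = max(ys) + above
        if band_min <= ry1 and band_max >= ry0:
            return False  # the tunnel band overlaps this avoidance area
    return True
```

Irregular avoidance areas could be handled by covering them with several rectangles, at the cost of some conservatism.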

Within a vertical curve, the rail surface elevation changes with a certain curvature; within a horizontal transition curve, the rail surface elevation changes along a superelevation ramp. If the two overlap, the outer rail elevation is difficult to control during track laying and maintenance, and both the straight superelevation ramp of the outer rail and the circular vertical curve have to change shape, which affects running smoothness. Therefore, given a known horizontal alignment, when designing the longitudinal section line the distance between the mileage coordinate $x_k$ of a slope change point and the mileage coordinate $x_h$ of the start or end point of a horizontal transition curve should not be less than the tangent length $T_k$ of the vertical curve, that is:

$$|x_k - x_h| \ge T_k.$$

In State, energy-saving design of the subway longitudinal section line means optimizing the number and positions of the slope change points so as to save energy in train operation. The state at the end of the $t$-th action is defined as $s_t$, as shown below:

$$s_t = \{(x_{t-1}, y_{t-1}, x_t, y_t) \mid 0 \le x \le W,\ 0 \le y \le H\};$$

where $x_t$ is the longitudinal mileage coordinate of the $t$-th slope change point, $y_t$ is the vertical elevation coordinate of the $t$-th slope change point, and $W$ and $H$ are the upper bounds of the mileage and depth of the target optimization area, respectively.

Note that the positions of at least two slope change points must be considered in order to determine whether the line satisfies the constraints. In the above formula, the combined position of two consecutive slope change points is taken as one state in the longitudinal section line design optimization model.

In Action, the action $a_t$ transforms the state $s_t$ into the state $s_{t+1}$. The action $a_t$ consists of two parts:

$$a_t = (l_t, i_t);$$

$$x_{t+1} = x_t + l_t, \qquad y_{t+1} = y_t + l_t \, i_t.$$

In Reward, the agent takes action $a_t$ and transforms the state from $s_t$ to $s_{t+1}$. To judge whether $a_t$ is "better" than other actions, a reward fed back by the environment is needed to reflect how good the action is. The reward $r_t$ is given by:

$$r_t = \omega_1 r_E + \omega_2 r_S + \omega_3 r_D;$$

where $r_E$, $r_S$ and $r_D$ represent the unit operating energy consumption cost, the survival cost and the distance-to-end-point cost, respectively, and $\omega_1$, $\omega_2$ and $\omega_3$ are their weights. Because the optimization objective of the model is minimum operating energy consumption, $r_E$ is negative; $r_S$ ensures that the selected line satisfies all of the constraints mentioned above; $r_D$ ensures that the action $a_t$ chosen by the agent moves towards the end point. Each component is discussed further below.

A. Energy consumption cost

The energy consumption cost includes the energy consumed in the traction, cruising and braking phases. The train operation strategy is as follows: after leaving the station, the train accelerates at maximum traction force; after reaching the set speed it switches to the cruising mode, i.e. runs at constant speed; on reaching the cruise-to-brake switching point, it decelerates into the station at maximum braking force.

a) Traction energy consumption in the traction phase

In the traction acceleration phase, the traction energy consumption of each step can be calculated from the train motion equation and the work-energy conversion relationship, and then summed over all steps of the phase. The traction energy consumption $E_t$ of the traction phase is:

$$E_t = \sum_{k} \frac{F_k \, \Delta S_k}{\eta};$$

where $F_k$ is the traction force of the $k$-th step, which can be looked up on the traction characteristic curve from the train running speed $v_k$ of that step; $\Delta S_k$ is the distance travelled in that step; and $\eta$ is the transmission efficiency constant of the train traction motor.

b) Traction energy consumption in the cruising phase

Once the train running speed reaches the set speed, the train enters the cruising phase. When the total train resistance (the sum of the basic train resistance and the additional line resistance) is positive, a certain traction force is needed to maintain constant speed; when it is negative, a certain braking force is needed to maintain constant speed and the traction force is zero. The train traction force $F_c$ in the cruising phase is therefore:

$$F_c = \begin{cases} W_0 + W_j, & W_0 + W_j > 0 \\ 0, & W_0 + W_j \le 0 \end{cases};$$

where $W_0$ is the basic train resistance, which depends on the current running speed; and $W_j$ is the gradient, curve and tunnel resistance acting on the train while running on the line, which depends on the current line conditions.

The traction energy consumption $E_c$ of the cruising phase is:

$$E_c = \sum_{k} \frac{F_{c,k} \, \Delta S_k}{\eta};$$
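The cruising-phase rule, traction equal to the total resistance when that resistance is positive and zero otherwise, can be sketched as follows. The accumulation of energy as force times distance divided by an efficiency is an assumption for illustration, as are the function names, the units (N, m, J) and the default efficiency.

```python
def cruise_traction_force(basic_resistance_n, additional_resistance_n):
    """Traction force needed to hold speed: total resistance if positive, otherwise zero."""
    return max(basic_resistance_n + additional_resistance_n, 0.0)


def cruise_traction_energy(steps, efficiency=0.9):
    """Sum traction energy over cruising steps.

    steps is an iterable of (basic resistance N, additional resistance N, distance m).
    The mechanical work is divided by an assumed drive efficiency to estimate
    the energy drawn; returns joules.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    work = sum(cruise_traction_force(w0, wj) * ds for w0, wj, ds in steps)
    return work / efficiency
```

On a sufficiently steep downhill section the additional resistance is negative enough that no traction energy is counted for that step, which is how a well-placed energy-saving slope lowers this term.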

c) The total unit energy cost $r_E$ from state $s_t$ to state $s_{t+1}$ is calculated as follows:

[formula omitted in source];

B. Survival cost

In this embodiment it is essential to find a feasible longitudinal section line scheme that satisfies the various constraints of the subway longitudinal section environment. A survival reward is therefore added to the reward function to encourage the agent to satisfy all constraints when selecting actions.

If the agent cannot find an action $a_t$ that satisfies all constraints in moving from state $s_t$ to state $s_{t+1}$ (condition = False), then $r_S$ takes a negative value, the current state $s_t$ is reset to the initial state $s_0$, and line selection restarts. If the agent can find an action $a_t$ that satisfies all constraints in moving from state $s_t$ to state $s_{t+1}$ (condition = True), then $r_S$ takes a positive value and selection of the next action $a_{t+1}$ begins.

C. Distance-to-end-point cost

To prevent the agent from choosing actions that merely earn the survival reward without moving towards the end point, $r_D$ is used to encourage the agent to choose actions that move towards the end point. The reward function $r_D$ from $s_t$ to $s_{t+1}$ is therefore defined as follows:

[formula omitted in source]

where $d_t$ is the straight-line distance from state $s_t$ to the end point, and $d_{t+1}$ is the straight-line distance from state $s_{t+1}$ to the end point.
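The survival and distance components can be illustrated together. Since the source text does not reproduce the distance reward formula, the sketch assumes the simplest progress measure, the reduction in straight-line distance to the end point between consecutive states. That choice, the function names and the bonus and penalty magnitudes are all assumptions, not the patent's definition.

```python
import math


def distance_to_end(point, end):
    """Straight-line distance between a slope change point and the end point."""
    return math.hypot(end[0] - point[0], end[1] - point[1])


def shaping_reward(cur, nxt, end, feasible, survive_bonus=1.0, fail_penalty=-10.0):
    """Survival term plus progress term for one step.

    An infeasible step earns only the penalty (the episode would then restart);
    a feasible step earns the bonus plus the distance gained towards the end point.
    """
    if not feasible:
        return fail_penalty
    progress = distance_to_end(cur, end) - distance_to_end(nxt, end)
    return survive_bonus + progress
```

A step that satisfies every constraint but moves away from the end point yields negative progress, so the agent is not rewarded for merely surviving.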

In the subway longitudinal section line design model, the two most important quantities in train traction calculation are the train running speed and running time. Both are related to the running distance $S$ and appear graphically as the V-S curve and the T-S curve.

There are currently three main methods for solving the running time $t$ and the running distance $S$: direct integration, the graphical method and approximate integration. Approximate integration is the most widely used and is also the method adopted in computerized traction calculation.

The approximate integration method solves the train motion equation by dividing the range between the upper and lower integration limits in the running time and distance formulas into a number of small speed intervals $\Delta v$, and taking the unit resultant force $c_p$ at the average speed within each interval. With the initial and final speeds of an interval denoted $v_1$ and $v_2$, the approximate integration formulas for train running time and running distance are:

Running time

[formula omitted in source];

Note: the smaller $\Delta v$ is, the more accurate the calculation result. This embodiment uses a fixed $\Delta v$ (value omitted in source).

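Calculation of the unit resultant force

$$c = \begin{cases} f - w, & \text{traction} \\ -w, & \text{coasting} \\ -(\beta_c b + w), & \text{braking} \end{cases} \qquad f = \frac{F}{(M + T)\, g};$$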

Train operation energy consumption is the energy used during running to do work against train running resistance, to increase the kinetic energy of the train and to overcome differences in gravitational potential energy. The operating energy consumption calculation factors include the reverse limiting gradient, the speed limit and the passenger load.

Reverse limiting gradient

On lines that are not level and straight, the train must repeatedly accelerate and decelerate to reach its scheduled running speed, so its energy consumption is much higher than on level, straight lines. When the gradient difference at a slope change point is too large (two adjacent steep slope sections of opposite direction), the train alternately slows and accelerates, which affects operation energy consumption, reduces running smoothness and lowers passenger comfort. Actual engineering practice stipulates an upper limit on the gradient in one direction when two slope sections of opposite direction are connected, and this limit was relaxed in the phase-two project. To study the influence of the reverse limiting gradient on the energy-saving design of the subway longitudinal section line, this embodiment uses the following values of the reverse limiting gradient:

[values omitted in source];

where $i_r$ is the reverse limiting gradient.

Design speed

Matching the slope sections to the design speed improves the use of the line's potential energy and reduces the energy lost to unnecessary braking. The energy consumed during train operation is mainly used to overcome running resistance: whenever the train holds its speed or accelerates, the work done against resistance adds to energy consumption, and air resistance grows with the square of the train speed. The design speed has a marked effect on train running behavior. When this embodiment studies the influence of the design speed on the energy-saving design of the subway longitudinal section line, the design speed takes the following values:

Passenger capacity

Passenger capacity affects train operation energy consumption mainly through the total hauled mass of the train. In general, the greater the hauled mass, the greater the starting and braking torque required, and the more electricity the traction motors consume to meet operating needs, so energy consumption rises accordingly. When this embodiment studies the influence of passenger capacity on the energy-saving design of the subway longitudinal section line, the passenger capacity takes the following values:

Here, N is the passenger capacity, and the listed values correspond to an empty train, the rated passenger load, and the overcrowded passenger load.

At every exploration step, the Q-Learning algorithm updates its estimate using the maximum action value of the next state. Applying this idea directly to DQN causes the following problems:

(1) Neural network training assumes that the training data are independent and identically distributed, whereas the sequential data produced by the agent's interaction are strongly correlated, which easily destabilizes training.

(2) The parameters of the DQN network are updated continuously, so generating the target value with the same network makes the temporal-difference target shift continually, which hinders convergence.

(3) Early in training the model is not yet stable and the value function estimate is biased. Taking the maximum over these estimates makes the model overestimate the expected return of some action, misleading the agent into choosing wrong actions and preventing the model from finding the optimal policy.
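For reference, the plain Q-Learning update that these three problems refer to can be sketched in tabular form as below. The dictionary layout, state names, learning rate `alpha`, and discount `gamma` are illustrative assumptions, not values from this embodiment.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step that bootstraps from the best next-state value."""
    best_next = max(q[next_state].values())      # the max operator discussed above
    td_target = reward + gamma * best_next       # temporal-difference target
    q[state][action] += alpha * (td_target - q[state][action])
    return td_target


# Usage with a hypothetical two-state table.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
target = q_learning_update(q_table, "s0", "right", reward=1.0, next_state="s1")
```

Because `best_next` is read from the same table that is being updated, any estimate that is too high is propagated into the target, which is the overestimation effect described in problem (3).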

The deep reinforcement learning algorithm is an improved D3QN algorithm, with the following improvements:

a. An experience replay mechanism stores each interaction experience in an experience pool. Once enough experiences have accumulated, the model randomly draws a batch of data from the pool at every step to train the neural network;

b. Two neural networks with the same structure are constructed: a valuation network and a target value network. Given a state, the valuation network computes the expected cumulative reward of taking the selected action, and its parameters are updated continuously. The target value network computes the temporal-difference target value; its parameters are held fixed and are replaced with the latest valuation network parameters at regular intervals. The target value is calculated as follows:

Holding the target value network parameters unchanged for a period of time keeps the convergence target of the valuation network relatively fixed. The actions that maximize the value function under the valuation network and under the target value network are not necessarily the same. Using the valuation network to produce the action and the target value network to compute the target value prevents the model from selecting overestimated suboptimal actions and effectively resolves the overestimation problem of the DQN algorithm.

c. The neural network structure is improved by splitting its output into two streams: a state value function that characterizes how good each state is, and an advantage function that distinguishes how good each action is in a given state;

e. The network parameters are updated using the mean squared error E as the loss function, as follows:

.
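The improvements listed above can be sketched with plain Python lists as below. This is an illustration of the commonly published forms of experience replay, the Double DQN target, and the dueling aggregation, not a reproduction of the patent's formulas. The mean-subtracted combination in `dueling_q` and all function and parameter names are assumptions made for this example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience pool with uniform random sampling (improvement a)."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def dueling_q(value, advantages):
    """Combine a state value and per-action advantages (improvement c).

    Subtracting the mean advantage is one common way to keep the two
    streams identifiable; it is assumed here.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + (adv - mean_adv) for adv in advantages]


def double_dqn_target(reward, gamma, q_eval_next, q_target_next, done):
    """Valuation network picks the action, target network scores it (improvement b)."""
    if done:
        return reward
    best_action = max(range(len(q_eval_next)), key=q_eval_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


def mse_loss(targets, predictions):
    """Mean squared error between target values and current estimates (improvement e)."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)
```

For example, `double_dqn_target(1.0, 0.5, [0.2, 0.9], [3.0, 1.0], False)` selects action 1 from the valuation estimates and scores it with the target estimate 1.0, ignoring the larger but unselected target estimate 3.0.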

The flow of the resulting improved D3QN algorithm is shown in Figure 2.

1. Initialization: Initialize the valuation neural network and the target neural network, both of which are deep neural networks that estimate the action value function. Also initialize the experience replay buffer, which stores the experience tuples from the agent's interaction with the environment.

2. Collect experience: The agent interacts with the environment, selects an action according to the current policy, and observes the next state and the immediate reward returned by the environment. These experience tuples are stored in the experience replay buffer for later training.

3. Train the valuation network: Randomly draw a batch of experience tuples from the experience replay buffer. For each tuple, use the valuation neural network to estimate the action value of the current state, then use the target neural network to estimate the action value of the next state. Compute the Q-learning target value and update the parameters of the valuation neural network so that its output approaches the target value.

4. Update the target network: Periodically update the parameters of the target neural network by copying the parameters of the valuation neural network into it. This helps stabilize training and reduces jitter in the estimated target.

5. Select an action: Select an action according to the current policy. Strategies such as ε-greedy can be used to balance exploration and exploitation.

6. Iterative training: Repeat steps 2 to 5 to keep collecting experience, updating the valuation network, updating the target network, and improving the agent's decision policy.

7. Convergence and evaluation: As training proceeds, observe how the algorithm's performance converges. The trained agent can be evaluated in the environment at regular intervals to test how it performs in new scenarios.

8. End training: Training ends when the preset number of training steps is reached or the algorithm converges. The agent's valuation neural network is the final training result and can be used to make decisions in practical applications.
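Steps 1 to 8 above can be sketched as a small training loop. To stay self-contained, this example replaces the two neural networks with lookup tables and uses a hypothetical five-state chain as the environment, so it illustrates the control flow only (replay, Double DQN style target, periodic synchronization, ε-greedy). It omits the dueling head and is not the patent's line-design model. All hyperparameters are illustrative assumptions.

```python
import random
from collections import deque

N_STATES = 5          # toy chain of states 0..4; state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def env_step(state, action):
    """Hypothetical toy environment: reward 1.0 on reaching the right end."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    """Pick a best action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=200, max_steps=50, gamma=0.9, lr=0.5, epsilon=0.2,
          batch_size=8, sync_every=20, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise valuation table, target table, and replay buffer.
    q_eval = [[0.0, 0.0] for _ in range(N_STATES)]
    q_target = [row[:] for row in q_eval]
    buffer = deque(maxlen=500)
    steps = 0
    for _ in range(episodes):                      # Step 6: iterate
        state = 0
        for _ in range(max_steps):
            # Step 5: epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q_eval[state], rng)
            # Step 2: interact and store the experience tuple.
            nxt, reward, done = env_step(state, action)
            buffer.append((state, action, reward, nxt, done))
            state = nxt
            steps += 1
            # Step 3: sample a batch and move q_eval toward the target value.
            if len(buffer) >= batch_size:
                for s, a, r, s2, d in rng.sample(list(buffer), batch_size):
                    best = max(ACTIONS, key=lambda b: q_eval[s2][b])
                    y = r if d else r + gamma * q_target[s2][best]
                    q_eval[s][a] += lr * (y - q_eval[s][a])
            # Step 4: periodically copy valuation parameters to the target.
            if steps % sync_every == 0:
                q_target = [row[:] for row in q_eval]
            if done:
                break
    return q_eval                                  # Step 8: final valuation table


q_values = train()
```

In a real implementation the tables would be replaced by the valuation and target neural networks, and the in-place table update by a gradient step on the loss.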

Case study

This embodiment first selects a typical section of a subway line in Chengdu as the research object and solves for the optimal energy-saving subway longitudinal section line under different operation energy consumption calculation factors. Finally, the optimized design is extended to a section of a Chengdu subway line and compared with schemes produced by experienced human designers.

The main constraints are shown in Table 1.

Table 1 Main constraints

Results under different parameters

Using the established D3QN model for subway longitudinal section line design, whose objective is to minimize train operation energy consumption, this part analyzes how the different energy consumption calculation factors affect the energy-saving design of the longitudinal section line of a subway section in Chengdu.

(1) Analysis of the influence of the reverse slope limit

This part analyzes the influence of the reverse slope limit on the energy-saving design of the longitudinal section line, with the other energy consumption calculation factors held at fixed values.

① Model optimization effect: Under the different reverse slope limits, the energy-saving design profile reduces energy consumption by 3.51% to 3.83% compared with the original profile. The reduction grows as the reverse slope limit increases, and the running time of each part stays close to the original.

② Profile change: The acceleration slope of the energy-saving design profile is longer than that of the original profile. Because of the reverse slope limit, and in order to convert potential energy into kinetic energy more effectively so that the train reaches the design speed sooner, the gradient of the first acceleration slope of the energy-saving design profile increases as the reverse limit slope decreases. The gradient of the second acceleration slope increases as the reverse slope limit increases.

(2) Analysis of the influence of the design speed

This part analyzes the influence of the design speed on the energy-saving design of the longitudinal section line, with the other energy consumption calculation factors held at fixed values.

① Model optimization effect: Under the different design speeds, the energy-saving design profile reduces energy consumption by 1.32% to 14.14% compared with the original profile, and the running time decreases as the speed increases. The model's optimization effect is most pronounced at design speeds near 80 km/h, because the energy consumption calculated for the original profile at those speeds is comparatively high. This indirectly shows that the original profile is not an "excellent" energy-saving slope profile.

② Profile change: The gradient of the first acceleration slope of the energy-saving design profile increases with the design speed. Presumably the first acceleration slope is steepened so that the train reaches the design speed sooner and potential energy is converted into kinetic energy more effectively. The profile deviation is larger at one of the design speeds than at another; this is in order to bypass the horizontal transition curve section.

(3) Analysis of the influence of passenger capacity

This part analyzes the influence of passenger capacity on the energy-saving design of the longitudinal section line, with the other energy consumption calculation factors held at fixed values.

① Model optimization effect: Under the different passenger capacities, the energy-saving design profile reduces energy consumption by 2.33% to 19.71% compared with the original profile, and the running time increases with passenger capacity. The optimization effect is most pronounced for the empty train, where the running time is reduced by 12.5 s and the operation energy consumption by 19.71%.

② Profile change: Although the gradient increases with passenger capacity, the increase is small. The main change is that the acceleration slope is lengthened, which effectively extends the cruising phase and thereby reduces energy consumption.

This embodiment builds a deep reinforcement learning model for subway longitudinal section line design. Without relying on human experience, the model perceives the alignment environment, searches, judges, and decides, and it finds the optimal energy-saving line scheme through feedback from the different constraints. Compared with the actual longitudinal section line design scheme, this embodiment reduces both the energy consumption cost and the time cost of train operation.

The present invention and its embodiments have been described above schematically, and the description is not limiting; what is shown in the drawings is only one embodiment of the present invention, and the actual structure is not limited to it. Therefore, structures and embodiments similar to this technical solution that a person of ordinary skill in the art, inspired by it, designs without creative effort and without departing from the spirit of the invention shall fall within the protection scope of the present invention.

Claims

1. A method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning, characterized in that it comprises the following steps:

1. Combining the subway line design specification constraints with the actual construction condition constraints, establishing a subway longitudinal section line design model whose objective is to minimize train operation energy consumption;

2. Using a deep reinforcement learning algorithm to solve for the optimal energy-saving subway longitudinal section line under different operation energy consumption calculation factors.

2. The method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning according to claim 1, characterized in that the subway longitudinal section line design model includes:

1) Environment: the avoidance areas and the transition curve sections of the horizontal alignment constitute the environment for optimizing the subway longitudinal section line;

2) Agent: defined as an intelligent program that determines the direction of the energy-saving design of the longitudinal section;

3) State: the spatial position of a slope change point is defined as the state, where the spatial position refers to the two-dimensional coordinates in the subway longitudinal section, comprising the longitudinal mileage coordinate and the vertical depth coordinate;

4) Action: the search direction of the next slope change point selected by the agent is defined as the action;

5) Reward: the value of the reward depends on feedback from the environment; the train energy consumption is the main component of the reward, and the other components are the survival-status reward and the target-distance reward;

6) Condition: if the agent cannot find an action that takes the current state to a next state satisfying all constraints, the state-change condition does not hold and the current state is reset to the initial state; otherwise, the state-change condition holds and the agent continues to select the next action.

3. The method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning according to claim 2, characterized in that the subway line design specification constraints and actual construction condition constraints include:

(1) Slope length and gradient constraints in the station area

Only one slope section is set in the station area, that is, the sum of the lengths of the entering and leaving slope sections is greater than the platform length; assuming that the entering and leaving slope lengths are equal, the slope length constraint of the station area is expressed as:

;

The station gradient takes a given constant, namely:

;

(2) Slope length and gradient constraints in non-station areas

The slope length constraint of the non-station area is expressed as:

;

The line has a minimum gradient and a maximum gradient, namely:

;

(3) Minimum straight segment length

The length of the straight segment between two adjacent vertical curves shall meet the design specification requirements, and is calculated as follows:

;

In the formula, the terms denote the longitudinal mileage of each slope change point, the tangent length of the vertical curve at each slope change point, and the minimum straight segment length in the design specification;

(4) Reverse limit slope

When two slope sections in opposite directions are connected, the gradient in one direction shall not be greater than the reverse limit slope, namely:

;

(5) Line burial depth constraint

The line burial depth constraint requires that, at any point on the line, the design elevation of the rail top in the underground tunnel be less than the ground elevation at that point minus the tunnel height and the minimum cover thickness, namely:

;

(6) Avoidance area constraint

With all avoidance areas of the subway longitudinal section denoted by one set and the tunnel area of the subway longitudinal section line denoted by another, the intersection of the two shall be the empty set:

;

(7) Horizontal transition curve section constraint

The distance between the mileage coordinate of a slope change point and the coordinates of the start and end points of a horizontal transition curve shall not be less than the tangent length of the vertical curve, namely:

.
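Several of the constraints in this claim can be sketched as simple feasibility checks. Every name below (`DesignLimits`, `slope_ok`, and so on) is hypothetical, the numeric limits in the usage lines are illustrative values rather than values from Table 1, and regions are approximated as axis-aligned rectangles. The reverse-slope check implements one reading of constraint (4), namely that at least one of two opposing gradients must stay within the limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignLimits:
    min_slope_length: float   # m
    min_gradient: float       # per mille, absolute value
    max_gradient: float       # per mille, absolute value
    reverse_limit: float      # per mille
    tunnel_height: float      # m
    min_cover: float          # m


def slope_ok(length, gradient, lim):
    """Constraint (2): non-station slope length and gradient bounds."""
    return (length >= lim.min_slope_length
            and lim.min_gradient <= abs(gradient) <= lim.max_gradient)


def reverse_slope_ok(g_prev, g_next, lim):
    """Constraint (4), as read here: applies only to opposing gradients."""
    if g_prev * g_next >= 0:          # same direction or a level section
        return True
    return min(abs(g_prev), abs(g_next)) <= lim.reverse_limit


def depth_ok(rail_top_elev, ground_elev, lim):
    """Constraint (5): rail top stays below ground minus tunnel height and cover."""
    return rail_top_elev <= ground_elev - lim.tunnel_height - lim.min_cover


def clears_avoidance(tunnel_box, avoid_boxes):
    """Constraint (6): tunnel rectangle overlaps no avoidance rectangle.

    Boxes are (x_min, x_max, y_min, y_max) in mileage and elevation.
    """
    x0, x1, y0, y1 = tunnel_box
    return all(x1 <= ax0 or x0 >= ax1 or y1 <= ay0 or y0 >= ay1
               for ax0, ax1, ay0, ay1 in avoid_boxes)


# Usage with illustrative limits only.
limits = DesignLimits(200.0, 2.0, 30.0, 10.0, 6.0, 5.0)
feasible = (slope_ok(300.0, -12.0, limits)
            and reverse_slope_ok(-25.0, 8.0, limits)
            and depth_ok(480.0, 500.0, limits)
            and clears_avoidance((0.0, 100.0, 470.0, 476.0),
                                 [(50.0, 80.0, 480.0, 490.0)]))
```

In a reinforcement learning setting, a candidate action that fails any such check would be rejected before the state is updated.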

4. The method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning according to claim 3, characterized in that, in the State, the state at the end of each action is defined as follows:

;

where the state components are the longitudinal mileage coordinates and the vertical elevation coordinates of the slope change points, and W and H are the upper limits of the mileage and depth of the target optimization area, respectively; in the above formula, the positions of two consecutive slope change points together are taken as one state in the longitudinal section line design optimization model.

5. The method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning according to claim 4, characterized in that, in the Action, the action converts the current state into the next state, and the action is expressed in two parts as follows:

;

where the two parts are the slope length and the slope gradient; the relationship among the three is:

.
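One plausible reading of how a state, an action, and the next state relate can be sketched as below. Since the formulas are not reproduced in this text, the update rule (mileage advances by the slope length, elevation changes by length times gradient), the per-mille gradient unit, the sign convention, and all names are assumptions for illustration.

```python
from typing import NamedTuple, Optional


class State(NamedTuple):
    """Two consecutive slope change points, as described in claim 4."""
    x_prev: float   # mileage of the previous slope change point (m)
    y_prev: float   # vertical coordinate of the previous point (m)
    x: float        # mileage of the current slope change point (m)
    y: float        # vertical coordinate of the current point (m)


def apply_action(state: State, length: float, gradient_permille: float,
                 width: float, height: float) -> Optional[State]:
    """Advance by one slope section; return None if the point leaves the area.

    `width` and `height` play the role of the upper limits W and H.
    """
    x_new = state.x + length
    y_new = state.y + length * gradient_permille / 1000.0
    if not (0.0 <= x_new <= width and 0.0 <= y_new <= height):
        return None
    return State(state.x, state.y, x_new, y_new)


# Usage: a 300 m section falling at 5 per mille.
s0 = State(0.0, 10.0, 200.0, 12.0)
s1 = apply_action(s0, length=300.0, gradient_permille=-5.0,
                  width=2000.0, height=40.0)
```

Keeping the previous point in the state lets the agent see the gradient of the section it has just laid, which constraints on adjacent slopes depend on.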

6. The method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning according to claim 5, characterized in that, in the Reward, the reward is calculated as follows:

;

where the terms represent the unit operation energy consumption cost, the survival cost, and the distance cost to the destination, respectively, and the coefficients are the weights of the corresponding terms.
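A weighted combination of the three reward terms can be sketched as follows. The linear form, the sign conventions, and the weight values are assumptions for illustration; the patent's exact reward formula is not reproduced in this text.

```python
def total_reward(energy_term, survival_term, distance_term,
                 weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the energy, survival, and distance-to-target terms."""
    w_energy, w_survival, w_distance = weights
    return (w_energy * energy_term
            + w_survival * survival_term
            + w_distance * distance_term)


# Usage: costs enter as negative terms, the survival bonus as a positive one.
r = total_reward(-2.0, 1.0, -0.5, weights=(0.5, 1.0, 2.0))
```

Raising the energy weight relative to the others pushes the agent toward lower-energy profiles at the expense of slower progress toward the end of the section.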

7. The method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning according to claim 6, characterized in that, in the subway longitudinal section line design model, the train operation energy consumption is obtained by solving the train motion equation with the approximate integration method: the range between the upper and lower integration limits of the running-time and running-distance formulas is divided into several small speed intervals, and, given the initial and final speeds of each interval, the approximate integration formulas for the train running time and running distance are:

Running time

;

Calculation of the unit resultant force

Under different working conditions, it can be written in the following form:

;

In the formula, the quantities are, in order: the traction force; the mass of the motor cars; the mass of the trailer cars; the unit braking force of the train; the service braking coefficient; the unit traction force; and the unit resistance of the train, which includes the basic running resistance, the additional gradient resistance, the additional curve resistance, and the additional tunnel resistance.
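The case split by working condition can be sketched as below, following the form commonly used in train traction calculations (traction: unit traction force minus unit resistance; coasting: minus unit resistance; braking: minus the scaled unit braking force and unit resistance). Whether the patent uses exactly this split is an assumption, and the function names, mode labels, and units are illustrative.

```python
def unit_resistance(basic, ramp=0.0, curve=0.0, tunnel=0.0):
    """Total unit resistance as the sum of basic and additional resistances."""
    return basic + ramp + curve + tunnel


def unit_resultant_force(mode, traction=0.0, braking=0.0,
                         resistance=0.0, brake_coeff=1.0):
    """Unit resultant force for the traction, coasting, and braking conditions."""
    if mode == "traction":
        return traction - resistance
    if mode == "coasting":
        return -resistance
    if mode == "braking":
        return -(brake_coeff * braking + resistance)
    raise ValueError(f"unknown working condition: {mode!r}")


# Usage: a downhill ramp contributes a negative additional resistance.
w = unit_resistance(basic=3.0, ramp=-5.0, curve=0.5, tunnel=0.2)
c = unit_resultant_force("coasting", resistance=w)
```

With a negative total resistance the coasting resultant is positive, so the train accelerates downhill without traction, which is the effect an energy-saving slope exploits.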

8. The method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning according to claim 7, characterized in that the operation energy consumption calculation factors include the reverse slope limit, the speed limit, and the passenger capacity.

9. The method for optimizing the energy-saving design of subway longitudinal section lines based on deep reinforcement learning according to claim 8, characterized in that the deep reinforcement learning algorithm is an improved D3QN algorithm, with the following improvements:

a. An experience replay mechanism stores each interaction experience in an experience pool; once enough experiences have accumulated, the model randomly draws a batch of data from the pool at every step to train the neural network;

b. Two neural networks with the same structure are constructed, namely a valuation network and a target value network; given a state, the valuation network computes the expected cumulative reward of taking an action, and its parameters are updated continuously; the target value network computes the temporal-difference target value, and its parameters are held fixed and replaced with the latest valuation network parameters at regular intervals; the target value is calculated as follows:

;

In the formula, the terms are the immediate reward, the discount factor, and the action that maximizes the action value in the next state;

Holding the target value network parameters unchanged for a period of time keeps the convergence target of the valuation network relatively fixed;

c. The neural network structure is improved by splitting its output into two streams, one being the state value function that characterizes how good each state is, and the other being the advantage function that distinguishes how good each action is in a given state:

;

In the formula, the symbols denote the policy and the parameters of the advantage function;

d. The target value of the D3QN model is as follows:

;

e. With the mean squared error as the loss function, the network parameters are updated as follows:

.
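For orientation, the commonly published forms of the quantities named in items b to e are given below. They are a sketch in assumed notation ($\theta$ for the valuation network parameters, $\theta^{-}$ for the target value network parameters, $\alpha$ and $\beta$ for the advantage and value stream parameters, $\mathcal{A}$ for the action set) and may differ in detail from the formulas of this claim.

```latex
\begin{align}
  y &= r + \gamma\, Q\!\left(s',\ \arg\max_{a'} Q(s', a'; \theta);\ \theta^{-}\right)
     && \text{(Double DQN target)} \\
  Q(s, a; \theta, \alpha, \beta) &= V(s; \theta, \beta)
     + \Bigl( A(s, a; \theta, \alpha)
     - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \Bigr)
     && \text{(dueling aggregation)} \\
  L(\theta) &= \mathbb{E}\!\left[ \bigl( y - Q(s, a; \theta) \bigr)^{2} \right]
     && \text{(mean squared error loss)}
\end{align}
```

In the first line the valuation network selects the action and the target value network evaluates it, which is what separates this target from the single-network DQN target.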