CN117521485A - Energy-saving design optimizing method for subway longitudinal section line based on deep reinforcement learning - Google Patents

Info

Publication number
CN117521485A
Authority
CN
China
Prior art keywords
subway
slope
longitudinal section
line
energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311330407.6A
Other languages
Chinese (zh)
Other versions
CN117521485B (en
Inventor
何庆
徐双婷
高天赐
杨东营
冯晓云
王青元
孙鹏飞
朱颖
王平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN202311330407.6A priority Critical patent/CN117521485B/en
Publication of CN117521485A publication Critical patent/CN117521485A/en
Application granted granted Critical
Publication of CN117521485B publication Critical patent/CN117521485B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/04 Constraint-based CAD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Train Traffic Observation, Control, And Security (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of subway longitudinal section line design, and in particular to an energy-saving design optimization method for a subway longitudinal section line based on deep reinforcement learning. The method comprises the following steps: 1. combining the subway line design specification constraints with the actual construction condition constraints, a subway longitudinal section line design model is established with minimum train operation energy consumption as the objective; 2. a deep reinforcement learning algorithm is used to solve for the optimal energy-saving subway longitudinal section line under different operation energy consumption calculation factors. Compared with the actual longitudinal section line design scheme, the method can reduce the train operation energy consumption cost and the time cost simultaneously.

Description

Energy-saving design optimizing method for subway longitudinal section line based on deep reinforcement learning
Technical Field
The invention relates to the technical field of subway longitudinal section line design, in particular to an energy-saving design optimizing method for a subway longitudinal section line based on deep reinforcement learning.
Background
Existing research shows that train traction energy consumption depends mainly on line conditions and on train operation organization (train scheduling, driving strategy, stopping scheme and the like) [2, 3]. Because train operation organization is constrained by the line conditions, its energy-saving effect is limited; to reduce traction energy consumption further, energy saving must be considered at the line design stage. The core of line design is the horizontal and longitudinal section design. The horizontal design aims at maximizing passenger flow attraction and gives little consideration to energy saving, whereas the longitudinal section design is more closely related to traction energy consumption. Subway lines are laid in three main ways: underground, elevated and at grade. Underground sections run entirely below ground, so their gradient is generally not limited by the terrain and is constrained only by factors such as underground buildings, pile foundations and underground pipelines along the line; they therefore offer the best conditions for using energy-saving slopes (stations high, intervals low).
Current research on energy-saving slopes combines longitudinal section design principles with traction simulation calculation to summarize general design forms of energy-saving longitudinal section lines and to analyze their energy-saving effect. Researchers have found that a longitudinal section arranged with a downhill slope leaving the station and an uphill slope entering the station helps reduce train traction energy consumption. Some researchers have proposed an energy-saving slope design method that changes the gradient of the energy-saving slope by changing the station elevation. However, energy-saving slope parameters are often chosen empirically, so the optimal energy-saving effect cannot be guaranteed.
With the rise of intelligent algorithms, researchers in China and abroad have combined longitudinal section line design with such algorithms, and the automatic computer optimization of longitudinal section line schemes has gradually become a popular research topic in this field. One study built an optimization model of the horizontal and longitudinal sections of an urban rail transit line that takes train operation behavior into account, and used a genetic algorithm (GA) to solve for the optimal line design in three-dimensional space. Another used a distance transform (DT) algorithm to optimize a mountain railway alignment and its station locations simultaneously, taking the coupling constraints between route and station locations into account. Another proposed a multi-stage decision model that jointly optimizes the longitudinal section line, the cruising speed and the coasting operating points to obtain the lowest-cost solution. Based on geographic information, other researchers used a multi-stage augmented differential evolution algorithm to solve the horizontal and longitudinal section line design of an intercity railway.
Although the above studies satisfy the relevant design specifications and can be used for energy-saving optimization of the longitudinal section line, the methods they adopt have certain limitations.
(1) They neglect the practical engineering constraints that the longitudinal section line design must avoid (underground buildings, pile foundations, drainage pipelines, unfavorable geology, etc.).
(2) The PSO, GA and PSO-GA algorithms require a predefined number of longitudinal section slope change points as input. These methods are therefore suited to "optimizing" a line rather than "designing" one, and can only find the optimized line for that given number of slope change points.
(3) Even though GA, PSO, DT and other evolution-based algorithms have improved significantly, they still cannot learn the way humans do, and primary optimization is difficult to achieve with them.
Disclosure of Invention
The invention provides an energy-saving design optimizing method for a subway longitudinal section line based on deep reinforcement learning, which can be used for obtaining an optimal energy-saving line of the subway longitudinal section.
The invention relates to a subway longitudinal section line energy-saving design optimizing method based on deep reinforcement learning, which comprises the following steps:
1. combining the subway line design specification constraints with the actual construction condition constraints, establishing a subway longitudinal section line design model with minimum train operation energy consumption as the objective;
2. using a deep reinforcement learning algorithm to solve for the optimal energy-saving subway longitudinal section line under different operation energy consumption calculation factors.
Preferably, the subway longitudinal section line design model includes:
1) Environment: the avoidance areas and the transition curve sections of the horizontal alignment together form the environment in which the subway longitudinal section line is optimized;
2) Agent: defined as an intelligent program that determines the energy-saving design direction of the longitudinal section line;
3) State: the spatial position of a slope change point is defined as the state $s_t$; the spatial position is the two-dimensional coordinate in the subway longitudinal section, consisting of a longitudinal mileage coordinate and a vertical depth coordinate;
4) Action: the search direction chosen by the agent for the next slope change point is defined as the action $a_t$;
5) Reward: the value of the reward depends on feedback from the environment; train operation energy consumption is its main component, and the other components are the survival-state reward and the target-distance reward;
6) Condition: if the agent cannot find an action $a_t$ from state $s_t$ to $s_{t+1}$ that satisfies all constraints, the state-change condition $C_t$ is not satisfied and the current state $s_t$ is reset to the initial state $s_0$; otherwise the state-change condition $C_t$ is satisfied and the selection of the next action continues.
Preferably, the subway line design specification constraint and the actual construction condition constraint include:
(1) Station regional slope section slope length and slope constraint
Only one slope section is arranged in the station area, that is, the sum of the lengths of the inbound and outbound slope sections is greater than the platform length $L_p$. Assuming the inbound and outbound slope sections have the same length $L_s$, the constraint is expressed as:
$$2 L_s > L_p$$
The station gradient $i_s$ takes a given constant $i_0$, namely:
$$i_s = i_0$$
(2) Slope length and slope constraint of non-station area slope section
The slope length constraint of a non-station area slope section is expressed as:
$$L_c \le l_k \le l_{\max}$$
where $l_k$ is the length of the $k$-th slope section, $L_c$ is the long-term train formation length and $l_{\max}$ is the maximum permitted slope length.
The line is subject to a minimum gradient $i_{\min}$ and a maximum gradient $i_{\max}$, namely:
$$i_{\min} \le |i_k| \le i_{\max}$$
(3) Minimum intermediate straight length
The length $L_j$ of the straight line between two adjacent vertical curves should meet the design specification requirement, and is calculated as:
$$L_j = x_{k+1} - x_k - T_{k+1} - T_k \ge L_{j\min}$$
where $x_k$ is the longitudinal mileage of the $k$-th slope change point; $T_k$ is the tangent length of the vertical curve at the $k$-th slope change point; and $L_{j\min}$ is the minimum intermediate straight length in the design specification;
(4) Reverse limit grade
When two slope sections of opposite direction are connected, the gradient $i_k$ in either direction should not be greater than the reverse limit gradient $i_r$, namely:
$$|i_k| \le i_r$$
(5) Line burial depth constraint
The line burial depth is limited as follows: for any point $x$ on the line, the rail head design elevation $H_r(x)$ of the underground tunnel should be no higher than the ground elevation $H_g(x)$ at that point minus the tunnel height $h_t$ and the minimum soil cover thickness $h_c$, namely:
$$H_r(x) \le H_g(x) - h_t - h_c$$
(6) Avoidance zone constraints
With all avoidance areas of the subway longitudinal section denoted by $\Omega_a$ and the tunnel area of the subway longitudinal section line denoted by $\Omega_t$, the intersection of $\Omega_a$ and $\Omega_t$ should be the empty set:
$$\Omega_a \cap \Omega_t = \varnothing$$
(7) Horizontal transition curve section constraint
The distance between the mileage coordinate $x_k$ of a slope change point and the start or end mileage coordinate $x_h$ of a horizontal transition curve should not be smaller than the tangent length of the vertical curve, i.e.:
$$|x_k - x_h| \ge T_k$$
Preferably, in the State, the state space at the end of the $t$-th action is defined as $s_t$, as shown in the following formula:
$$s_t = (x_{t-1},\, y_{t-1},\, x_t,\, y_t), \quad 0 \le x_t \le W,\ 0 \le y_t \le H$$
where $x_t$ is the longitudinal mileage coordinate of the $t$-th slope change point, $y_t$ is the vertical elevation coordinate of the $t$-th slope change point, and W and H are the upper limits of mileage and depth of the target optimization area, respectively. In the above formula, the positions of two consecutive slope change points together are taken as one state in the longitudinal section line design optimization model.
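Preferably, in the Action, the action $a_t$ moves the state $s_t$ to the state $s_{t+1}$. The action $a_t$ is represented in two parts, as follows:
$$a_t = (l_t,\, i_t)$$
where $l_t$ and $i_t$ are the length and gradient of the slope section. The relation among $s_t$, $a_t$ and $s_{t+1}$, with $i_t$ expressed as a ratio, is:
$$x_{t+1} = x_t + l_t, \qquad y_{t+1} = y_t + l_t\, i_t$$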
Preferably, in the Reward, the reward $R_t$ is given by:
$$R_t = \omega_E R_E + \omega_S R_S + \omega_D R_D$$
where $R_E$, $R_S$ and $R_D$ represent the unit operation energy consumption cost, the survival cost and the cost of distance from the end point, respectively, and $\omega_E$, $\omega_S$ and $\omega_D$ are the weights of $R_E$, $R_S$ and $R_D$.
Preferably, in the subway longitudinal section line design model, the train motion equation is solved by an approximate integration method in order to obtain the train operation energy consumption. The range between the upper and lower integration limits in the running time and distance formulas is divided into a number of small speed intervals $\Delta v$. With $v_1$ and $v_2$ the initial and final speeds of an interval, the approximate integration formulas for the running time and the running distance of the train are as follows:
time division of operation
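Calculation of the unit resultant force $c$
Under the different working conditions it can be written in the following forms:
$$c = \begin{cases} f - w, & \text{traction} \\ -w, & \text{coasting} \\ -(\beta_c\, b + w), & \text{braking} \end{cases}$$
where $F$ is the traction force; $P$ is the motor car mass; $G$ is the trailer mass; $b$ is the unit braking force of the train; $\beta_c$ is the service braking coefficient; $f$ is the unit traction force, obtained from $F$ and the train mass $P + G$; and $w$ is the unit resistance of the train, comprising the basic running resistance $w_0$, the additional gradient resistance $w_i$, the additional curve resistance $w_r$ and the additional tunnel resistance $w_s$.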
Preferably, the operating energy consumption calculation factors include reverse grade limit, limiting speed, and passenger capacity.
Preferably, the deep reinforcement learning algorithm is an improved D3QN algorithm, the improvement being as follows:
a. An experience replay mechanism is used: the experiences obtained through interaction are stored one by one in an experience pool, and once a certain number have accumulated, the model randomly samples a batch of data from the pool at each step to train the neural network;
b. Two neural networks with the same structure are constructed: an estimation network $Q(s, a; \theta)$ and a target value network $Q(s, a; \theta^-)$. The estimation network computes, for a given state $s$, the expected cumulative reward of taking action $a$, and its parameters $\theta$ are updated continuously. The target network is used to compute the temporal-difference target value $y$; its parameters $\theta^-$ are held fixed and are replaced by the latest estimation network parameters $\theta$ at intervals. The target value $y$ is calculated as:
$$y = r + \gamma \max_{a'} Q(s', a'; \theta^-)$$
where $r$ is the immediate reward, $\gamma$ is the discount factor, and $\max_{a'} Q(s', a'; \theta^-)$ means that in the next state $s'$ the action $a'$ that maximizes $Q$ is taken.
Because $\theta^-$ remains unchanged for a period of time, the target $y$ toward which the estimation network $Q(s, a; \theta)$ converges is relatively fixed;
c. The neural network structure is improved by splitting its output into two parts: a state value function $V^{\pi}(s)$ that represents how good each state is, and an advantage function $A^{\pi}(s, a)$ that distinguishes how good each action is in a given state:
$$Q^{\pi}(s, a; \theta, \alpha) = V^{\pi}(s; \theta) + A^{\pi}(s, a; \theta, \alpha)$$
where $\pi$ is the policy and $\alpha$ denotes the parameters of the advantage function;
d. The target value of the D3QN model is:
$$y = r + \gamma\, Q\big(s', \arg\max_{a'} Q(s', a'; \theta);\, \theta^-\big)$$
e. The network parameters are updated using the mean square error $L(\theta)$ as the loss function:
$$L(\theta) = \mathbb{E}\big[(y - Q(s, a; \theta))^2\big]$$
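The target-value and loss steps listed above can be sketched in a few lines. This is a minimal illustration, assuming the standard double-DQN rule (the estimation network chooses the next action, the target network scores it) and a batch-mean squared error; the function names `d3qn_target` and `mse_loss` and the `done` flag are hypothetical and do not come from the patent.

```python
from typing import List


def d3qn_target(reward: float, gamma: float,
                q_online_next: List[float], q_target_next: List[float],
                done: bool = False) -> float:
    """Double-DQN style target (assumed form).

    q_online_next: Q-values of the estimation network in the next state.
    q_target_next: Q-values of the target network in the next state.
    """
    if done:
        # No bootstrap term once the episode has ended (assumption).
        return reward
    # The estimation network selects the action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates it.
    return reward + gamma * q_target_next[best_action]


def mse_loss(targets: List[float], predictions: List[float]) -> float:
    """Mean square error between target values and estimated Q-values."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)
```

Separating action selection from action evaluation is what distinguishes this target from the plain maximum over the target network in step b.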
The invention builds a deep reinforcement learning model for subway longitudinal section line design that perceives, explores, judges and decides within the line selection environment without relying on human experience, and finds the optimal energy-saving line scheme through feedback from the different constraint conditions. Compared with the actual longitudinal section line design scheme, the train operation energy consumption cost and the time cost can be reduced simultaneously.
Drawings
FIG. 1 is a flow chart of a method for optimizing energy-saving design of a subway longitudinal section line based on deep reinforcement learning in an embodiment;
FIG. 2 is a flowchart of the improved D3QN algorithm in the embodiment.
Detailed Description
For a further understanding of the present invention, the present invention will be described in detail with reference to the drawings and examples. It is to be understood that the examples are illustrative of the present invention and are not intended to be limiting.
Examples:
as shown in fig. 1, the embodiment provides a method for optimizing energy-saving design of a subway longitudinal section line based on deep reinforcement learning, which comprises the following steps:
1. combining the subway line design specification constraints with the actual construction condition constraints, establishing a subway longitudinal section line design model with minimum train operation energy consumption as the objective;
2. using a deep reinforcement learning algorithm to solve for the optimal energy-saving subway longitudinal section line under different operation energy consumption calculation factors.
The subway longitudinal section line design model comprises:
1) Environment: the avoidance areas (underground buildings, pile foundations, underground pipeline routes, unfavorable geology, etc.) and the transition curve sections of the horizontal alignment together form the environment in which the subway longitudinal section line is optimized;
2) Agent: defined as an intelligent program that determines the energy-saving design direction of the longitudinal section line;
3) State: the spatial position of a slope change point is defined as the state $s_t$; the spatial position is the two-dimensional coordinate in the subway longitudinal section, consisting of a longitudinal mileage coordinate and a vertical depth coordinate;
4) Action: the search direction chosen by the agent for the next slope change point is defined as the action $a_t$;
5) Reward: the value of the reward depends on feedback from the environment; train operation energy consumption is its main component, and the other components are the survival-state reward and the target-distance reward;
6) Condition: if the agent cannot find an action $a_t$ (slope length $l_t$ and gradient $i_t$) from state $s_t$ to $s_{t+1}$ that satisfies all constraints, the state-change condition $C_t$ is not satisfied and the current state $s_t$ is reset to the initial state $s_0$; otherwise the state-change condition $C_t$ is satisfied and the selection of the next action continues.
The subway line design specification constraint and the actual construction condition constraint comprise:
(1) Station regional slope section slope length and slope constraint
In a station area, a train that spans more than one slope section experiences superimposed vibration, which is detrimental to smooth and safe operation. It is therefore preferable to provide only one slope section in the station area, that is, the sum of the lengths of the inbound and outbound slope sections is greater than the platform length $L_p$. If the inbound and outbound slope sections have the same length $L_s$, the constraint is expressed as:
$$2 L_s > L_p$$
As for the gradient of the station area, to ensure that a stopped train does not roll and that luggage in the station does not slide, the gradient should be as small as possible while still meeting the drainage requirement. In this embodiment the station gradient $i_s$ takes a given constant $i_0$, namely:
$$i_s = i_0$$
(2) Slope length and slope constraint of non-station area slope section
The subway design specification requires that the length $l_k$ of any slope section in a non-station area be no less than the long-term subway train formation length $L_c$. However, because the spacing between subway stations is short and drainage ditches must be provided within the interval, a single slope section should not be too long. The slope length constraint of a non-station area slope section is expressed as:
$$L_c \le l_k \le l_{\max}$$
where $l_{\max}$ is the maximum permitted slope length.
Because of the limited traction capacity of the train, the gradient of a main-line slope section in the interval should not be too large; the subway design specification requires that the main-line gradient not exceed the maximum gradient $i_{\max}$. In addition, to facilitate drainage in the interval, the line should have a minimum gradient $i_{\min}$, namely:
$$i_{\min} \le |i_k| \le i_{\max}$$
(3) Minimum clip straight length
The length $L_j$ of the straight line between two adjacent vertical curves should meet the design specification requirement, and is calculated as:
$$L_j = x_{k+1} - x_k - T_{k+1} - T_k \ge L_{j\min}$$
where $x_k$ is the longitudinal mileage of the $k$-th slope change point; $T_k$ is the tangent length of the vertical curve at the $k$-th slope change point; and $L_{j\min}$ is the minimum intermediate straight length in the design specification.
(4) Reverse limit grade
The subway design specification does not explicitly specify a maximum gradient difference. However, to improve passenger comfort, simplify design and construction, and reduce maintenance and operating costs, when two slope sections of opposite direction are connected, the gradient $i_k$ in either direction should not be greater than the reverse limit gradient $i_r$, namely:
$$|i_k| \le i_r$$
(5) Line burial depth constraint
The line burial depth is limited as follows: for any point $x$ on the line, the rail head design elevation $H_r(x)$ of the underground tunnel should be no higher than the ground elevation $H_g(x)$ at that point minus the tunnel height $h_t$ (the distance from the rail to the outer diameter of the tunnel, a fixed value) and the minimum soil cover thickness $h_c$ (a fixed value), namely:
$$H_r(x) \le H_g(x) - h_t - h_c$$
(6) Avoidance zone constraints
The subway longitudinal section line design scheme must take into account factors such as underground soil conditions, building pile foundations and pipelines, and must avoid areas where construction is impossible or very difficult. With all avoidance areas of the subway longitudinal section denoted by $\Omega_a$ and the tunnel area of the subway longitudinal section line denoted by $\Omega_t$, the intersection of $\Omega_a$ and $\Omega_t$ should be the empty set:
$$\Omega_a \cap \Omega_t = \varnothing$$
(7) Plane moderating curve segment constraint
Within a vertical curve, the rail surface elevation changes with a certain curvature; within a horizontal transition curve, the rail surface elevation changes along a certain superelevation runoff. If the two overlap, the elevation of the outer rail is difficult to control during track laying and maintenance, and both the straight superelevation runoff of the outer rail and the shape of the centerline vertical curve are altered, which affects running stability. Therefore, given the known horizontal alignment, when the longitudinal section line is designed the distance between the mileage coordinate $x_k$ of a slope change point and the start or end mileage coordinate $x_h$ of a horizontal transition curve should not be smaller than the tangent length of the vertical curve, i.e.:
$$|x_k - x_h| \ge T_k$$
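Several of the constraints listed above reduce to simple checks on a list of slope change points. The sketch below is illustrative only: the function names, the per-mille gradient unit, the convention that elevations are negative below ground, and the simplification of checking burial depth only at the slope change points are all assumptions, not details taken from the patent.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (mileage x in m, rail elevation y in m); assumed convention


def gradients(points: List[Point]) -> List[float]:
    """Gradient of each slope section in per mille, from consecutive slope change points."""
    return [1000.0 * (y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def slope_lengths_ok(points: List[Point], l_min: float, l_max: float) -> bool:
    """Every slope section length lies within [l_min, l_max]."""
    return all(l_min <= x2 - x1 <= l_max
               for (x1, _), (x2, _) in zip(points, points[1:]))


def gradients_ok(points: List[Point], i_min: float, i_max: float) -> bool:
    """Every absolute gradient lies within [i_min, i_max] (per mille)."""
    return all(i_min <= abs(i) <= i_max for i in gradients(points))


def straight_lengths_ok(points: List[Point], tangents: List[float],
                        l_straight_min: float) -> bool:
    """tangents[k] is the vertical-curve tangent length at points[k] (0 at the ends)."""
    return all(points[k + 1][0] - points[k][0] - tangents[k] - tangents[k + 1]
               >= l_straight_min
               for k in range(len(points) - 1))


def burial_ok(points: List[Point], ground: Callable[[float], float],
              h_tunnel: float, h_cover: float) -> bool:
    """Rail elevation vs. ground elevation, checked at the slope change points only."""
    return all(y <= ground(x) - h_tunnel - h_cover for x, y in points)
```

A real implementation would also sample the burial depth between slope change points and test the tunnel envelope against the avoidance areas; those steps are omitted here to keep the sketch short.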
In the State, the energy-saving design of the subway longitudinal section line means optimizing the number and positions of the slope change points so as to save energy during train operation. The state space at the end of the $t$-th action is defined as $s_t$, as shown in the following formula:
$$s_t = (x_{t-1},\, y_{t-1},\, x_t,\, y_t), \quad 0 \le x_t \le W,\ 0 \le y_t \le H$$
where $x_t$ is the longitudinal mileage coordinate of the $t$-th slope change point, $y_t$ is the vertical elevation coordinate of the $t$-th slope change point, and W and H are the upper limits of mileage and depth of the target optimization area, respectively.
Note that at least two slope change point positions must be considered in order to determine whether the line meets the constraints. In the above formula, therefore, the positions of two consecutive slope change points together are taken as one state in the longitudinal section line design optimization model.
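In the Action, the action $a_t$ moves the state $s_t$ to the state $s_{t+1}$. The action $a_t$ is represented in two parts, as follows:
$$a_t = (l_t,\, i_t)$$
where $l_t$ and $i_t$ are the length and gradient of the slope section. The relation among $s_t$, $a_t$ and $s_{t+1}$, with $i_t$ expressed as a ratio, is:
$$x_{t+1} = x_t + l_t, \qquad y_{t+1} = y_t + l_t\, i_t$$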
In the Reward, when the agent takes action $a_t$ the state changes from $s_t$ to $s_{t+1}$. To evaluate whether $a_t$ is better than other actions, a reward fed back by the environment is needed to reflect the quality of the action. The reward $R_t$ is given by:
$$R_t = \omega_E R_E + \omega_S R_S + \omega_D R_D$$
where $R_E$, $R_S$ and $R_D$ represent the unit operation energy consumption cost, the survival cost and the cost of distance from the end point, respectively, and $\omega_E$, $\omega_S$ and $\omega_D$ are the weights of $R_E$, $R_S$ and $R_D$. Because the optimization goal of the model is minimum operation energy consumption, $\omega_E$ is negative; $R_S$ ensures that the selected line meets all of the constraints mentioned above; and $R_D$ ensures that the action selected by the agent moves toward the end point. Each term is discussed further below.
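As an illustration of how the three reward terms combine, the sketch below computes a weighted sum with a negative energy weight. The default weight values and the function name `total_reward` are placeholders chosen for the example and are not taken from the patent.

```python
from typing import Tuple


def total_reward(r_energy: float, r_survival: float, r_distance: float,
                 weights: Tuple[float, float, float] = (-1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the energy, survival and distance terms.

    The energy weight is negative so that higher energy consumption
    lowers the reward (placeholder values, assumed for illustration).
    """
    w_energy, w_survival, w_distance = weights
    return w_energy * r_energy + w_survival * r_survival + w_distance * r_distance
```

In practice the weights would need tuning so that no single term dominates the others.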
A. Cost of energy consumption
The energy consumption cost comprises the energy consumption of the traction, cruising and braking phases. The running strategy of the train is as follows: after leaving the station it accelerates with maximum traction force; on reaching the set speed it switches to the cruising condition, i.e. runs at constant speed; and on reaching the cruise-to-braking switching point it decelerates into the station at maximum braking.
a) Traction energy consumption in traction stage
During the traction acceleration stage, the traction energy consumption of each step can be calculated from the train motion equation and the work-energy relation, and the traction energy consumption of all steps in the stage is then accumulated. The traction energy consumption of the traction stage, $E_T$, is:
$$E_T = \sum_{k} \frac{F_k\, \Delta s}{\eta}$$
where $F_k$ is the traction force in step $k$, obtained by looking up the traction characteristic curve at the train speed $v_k$ of that step; $\Delta s$ is the step length; and $\eta$ is the transmission efficiency constant of the train traction motor.
b) Traction energy consumption during cruising phase
Once the train reaches the set speed it enters the cruising stage. When the total resistance of the train (the sum of the basic resistance of the train and the additional resistance of the line) is positive, a traction force is required to maintain constant speed; when it is negative, a braking force is required to maintain constant speed and the traction force is zero. The train traction force during cruising, $F_c$, is therefore:
$$F_c = \begin{cases} W_0 + W_j, & W_0 + W_j > 0 \\ 0, & W_0 + W_j \le 0 \end{cases}$$
where $W_0$ is the basic resistance of the train, which depends on the current running speed, and $W_j$ is the sum of the gradient, curve and tunnel resistances acting on the train on the line, which depends on the current line conditions.
The traction energy consumption of the cruising stage, $E_C$, is:
$$E_C = \sum_{k} \frac{F_{c,k}\, \Delta s}{\eta}$$
c) The total unit energy consumption cost from state $s_t$ to $s_{t+1}$ is calculated as follows:
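The traction-stage and cruising-stage calculations described above can be sketched as follows. This is a hedged illustration: it assumes forces in kN and step lengths in m (so the result is in kJ), treats the efficiency as a plain divisor, and uses hypothetical function names.

```python
from typing import List


def traction_energy(forces_kn: List[float], step_lengths_m: List[float],
                    efficiency: float) -> float:
    """Accumulate F_k * ds_k over all steps and divide by the drive efficiency (kJ)."""
    work = sum(f * ds for f, ds in zip(forces_kn, step_lengths_m))
    return work / efficiency


def cruising_traction(basic_resistance_kn: float, line_resistance_kn: float) -> float:
    """Traction needed to hold constant speed; zero when the total resistance is negative."""
    total = basic_resistance_kn + line_resistance_kn
    return total if total > 0.0 else 0.0
```

On a sufficiently steep downhill section the line resistance is negative and larger in magnitude than the basic resistance, so `cruising_traction` returns zero and no traction energy is counted for that step.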
B. cost of survival
In this embodiment a feasible longitudinal section line scheme must be found that satisfies the various constraints of the subway longitudinal section environment, so a survival reward is added to the reward function to encourage the agent to satisfy all constraints when selecting actions.
If the agent cannot find an action $a_t$ from state $s_t$ to $s_{t+1}$ that satisfies all constraints, $R_S$ takes a negative value, the current state $s_t$ is reset to the initial state $s_0$ and line selection restarts; if the agent can find an action $a_t$ from state $s_t$ to $s_{t+1}$ that satisfies all constraints, $R_S$ takes a positive value and the selection of the next action $a_{t+1}$ begins.
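The survival rule above can be sketched as a small helper that filters candidate actions by feasibility and resets the state when none remain. The bonus and penalty magnitudes, the `is_feasible` callback and the function name are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")
A = TypeVar("A")


def survival_step(state: S, candidates: Sequence[A],
                  is_feasible: Callable[[S, A], bool], initial_state: S,
                  bonus: float = 1.0, penalty: float = -1.0
                  ) -> Tuple[float, S, List[A]]:
    """Return (survival reward, state to continue from, feasible actions)."""
    feasible = [a for a in candidates if is_feasible(state, a)]
    if feasible:
        # At least one action satisfies all constraints: positive reward, keep going.
        return bonus, state, feasible
    # No feasible action: negative reward and restart from the initial state.
    return penalty, initial_state, []
```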
C. Distance cost from end point
The distance cost prevents the agent from selecting actions that merely satisfy the survival reward without moving toward the end point. $R_D$ is used here to encourage the agent to select actions $a_t$ that move toward the end point. The reward function $R_D$ from $s_t$ to $s_{t+1}$ is therefore defined in terms of $d_t$, the straight-line distance from state $s_t$ to the end point, and $d_{t+1}$, the straight-line distance from state $s_{t+1}$ to the end point.
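One plausible way to turn the two distances into a reward is their difference, so that progress toward the end point is positive and moving away is negative. The patent's exact expression is not reproduced here; the difference form and the function name below are assumptions for illustration only.

```python
import math
from typing import Tuple

XY = Tuple[float, float]


def distance_reward(current_xy: XY, next_xy: XY, end_xy: XY) -> float:
    """Assumed form: distance to the end point before the action minus after it."""
    d_before = math.dist(current_xy, end_xy)
    d_after = math.dist(next_xy, end_xy)
    return d_before - d_after
```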
In the subway longitudinal section line design model, the two heaviest index contents in train traction calculation are related to the running distance S when the running speed and the running time of the train are solved, and the two heaviest index contents are reflected on a graph to be a VS curve and a TS curve.
At present, three methods of a direct integration method, a graphic method and an approximate integration method are mainly used for solving the running time t and the running distance S. The approximate integration method is widely adopted at present, and is also adopted by traction electric calculation.
The approximate integration method divides the range between the lower and upper integration limits in the running time and distance formulas into a number of small speed intervals $\Delta v$. Within each interval, the unit resultant force is evaluated at the average speed of the interval. With the initial and final speeds of an interval denoted $v_1$ and $v_2$, the running time and running distance of the train are calculated by the approximate integration method as follows:
Running time
Note: the smaller the speed interval is taken, the more accurate the calculation result; this embodiment uses a small fixed interval in its calculations.
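The approximate integration idea can be sketched in Python as follows. The sketch works in SI units and takes the resultant force per unit mass (an acceleration) as input, whereas traction calculation practice usually works with unit forces and km/h and applies engineering conversion constants; those constants are not reproduced here. The function name, the `accel_at` callback and the default interval size are assumptions made for illustration.

```python
def integrate_speed_change(v_start, v_end, accel_at, dv=0.1):
    """Approximate running time (s) and distance (m) for a speed change.

    v_start, v_end : speeds in m/s
    accel_at       : function v -> acceleration in m/s^2 (resultant force per unit mass)
    dv             : width of each small speed interval in m/s (assumed value)
    """
    direction = 1.0 if v_end > v_start else -1.0
    t_total = 0.0
    s_total = 0.0
    v1 = v_start
    while v1 != v_end:
        remaining = abs(v_end - v1)
        # The last interval is shortened so that it ends exactly at v_end.
        v2 = v_end if remaining <= dv else v1 + direction * dv
        # The resultant force is evaluated at the average speed of the interval.
        a = accel_at(0.5 * (v1 + v2))
        if a * direction <= 0.0:
            raise ValueError("resultant force cannot produce the requested speed change")
        t_total += (v2 - v1) / a                      # time spent in this interval
        s_total += (v2 * v2 - v1 * v1) / (2.0 * a)    # distance covered in this interval
        v1 = v2
    return t_total, s_total
```

A call such as `integrate_speed_change(0.0, 20.0, lambda v: 1.0 - 0.01 * v)` returns the time and distance needed to accelerate under a speed-dependent resultant force; shrinking `dv` refines the result, which is the point made in the note above.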
Calculation of the unit resultant force
Under the different working conditions, it can be written in the following forms:
where the symbols denote, in order: the traction force; the mass of the motor cars; the mass of the trailer cars; the unit braking force of the train; the service braking coefficient; the unit traction force; and the unit resistance of the train, which comprises the basic running resistance, the additional gradient resistance, the additional curve resistance and the additional tunnel resistance.
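As an illustration of how the unit resultant force changes with the working condition, the following Python sketch uses the conventional decomposition into traction, coasting and braking. The patent's own expressions are not reproduced in the text, so this decomposition, the function names and the default braking coefficient are assumptions.

```python
def total_unit_resistance(basic, gradient=0.0, curve=0.0, tunnel=0.0):
    """Unit resistance: basic running resistance plus the additional
    gradient, curve and tunnel resistances."""
    return basic + gradient + curve + tunnel


def unit_resultant_force(mode, unit_traction, unit_brake, unit_resistance,
                         brake_coeff=1.0):
    """Unit resultant force for one working condition (assumed conventional forms)."""
    if mode == "traction":
        # Traction force acts against the resistance.
        return unit_traction - unit_resistance
    if mode == "coasting":
        # No traction or braking: only resistance acts on the train.
        return -unit_resistance
    if mode == "braking":
        # Braking force, scaled by the service braking coefficient, adds to resistance.
        return -(brake_coeff * unit_brake + unit_resistance)
    raise ValueError(f"unknown working condition: {mode}")
```

The result of `unit_resultant_force` is the kind of quantity that would be evaluated at the average speed of each interval in the approximate integration method.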
Train operation energy consumption is the energy expended during running to do work against running resistance, to increase the kinetic energy of the train and to overcome differences in gravitational potential energy. The operation energy consumption calculation factors comprise the reverse gradient limit, the speed limit and the passenger capacity.
Reverse gradient limit
On non-level lines, energy consumption is much higher than on a level line because the train must repeatedly accelerate and decelerate to reach the intended operating speed. When the gradient difference at a slope change point is too large (two adjacent steep slope sections of opposite direction), the train alternates between deceleration and acceleration, which affects not only operation energy consumption but also running stability, and reduces passenger comfort. Engineering practice stipulates that where two slope sections of opposite direction are connected, the gradient in either direction should not exceed a specified limit, and this limit was relaxed in second-phase projects. Therefore, when analysing how the reverse gradient limit affects the energy-saving design of the subway longitudinal section line, this embodiment takes the following values for the reverse gradient limit:
where the quantity listed is the reverse gradient limit.
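The reverse gradient rule can be expressed as a small check on two adjacent slope sections. This is a sketch only: the sign convention (positive uphill, negative downhill, values in per mille) and the function name are assumptions, and the limit passed in is a placeholder rather than a value taken from the source.

```python
def violates_reverse_limit(grade_a, grade_b, reverse_limit):
    """Return True when two adjacent slope sections of opposite direction
    include a gradient steeper than the reverse gradient limit.

    Gradients are in per mille; positive is uphill, negative is downhill.
    """
    opposite_direction = grade_a * grade_b < 0
    steepest = max(abs(grade_a), abs(grade_b))
    return opposite_direction and steepest > reverse_limit
```

For example, with a hypothetical limit of 15 per mille, a 25 per mille descent followed by a 20 per mille climb would be rejected, while two consecutive climbs would not be affected by this rule.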
Design speed
Matching the slope sections to the design speed improves the use of the line's potential energy and reduces the energy lost to unnecessary braking. Energy consumed during train running is mainly spent overcoming running resistance: whether the train holds its speed or accelerates, working against resistance increases energy consumption, and air resistance grows with the square of the running speed. The design speed therefore has a marked influence on train operation behaviour. When analysing how the design speed affects the energy-saving design of the subway longitudinal section line, this embodiment takes the following values for the design speed:
Passenger capacity
Passenger capacity affects train operation energy consumption mainly through the total hauled mass of the train. In general, the greater the hauled mass, the greater the starting and braking torque required and the greater the traction motor power needed to meet the operating requirements, so energy consumption increases. When analysing how passenger capacity affects the energy-saving design of the subway longitudinal section line, this embodiment takes the following values for the passenger capacity:
where n is the passenger capacity, and the listed values correspond to the empty (no-load), rated-load and overcrowded-load conditions.
At each exploration step, the Q-Learning algorithm updates its estimate using the maximum action value of the next state. Applying this idea directly to DQN causes the following problems:
(1) Neural network training assumes that the training data are independent and identically distributed, whereas the sequential data obtained from the agent's interaction are strongly correlated, which easily makes network training unstable.
(2) The parameters of the DQN network are updated continuously, so the temporal-difference target generated by the same network also changes continuously, which hinders convergence of the algorithm.
(3) In the early stage of training the model is not yet stable and the value function estimate is biased; because the target uses the maximum estimated action value, the model may overestimate the expected return of an action, misleading the agent into choosing wrong actions, so that the model fails to find the optimal policy.
The deep reinforcement learning algorithm is an improved D3QN algorithm, and the improvements are as follows:
a. an experience replay mechanism (Experience Replay) is used: the experiences obtained through interaction are stored one by one in an experience pool and, once a certain number have accumulated, the model randomly samples a batch from the pool at each step to train the neural network;
b. two neural networks with the same structure are constructed, namely an estimation network $Q(s, a; \theta)$ and a target value network $Q(s, a; \theta^-)$. The estimation network computes, for a given state $s$, the expected cumulative reward of taking action $a$, and its parameters $\theta$ are updated continuously; the target value network is used to compute the temporal-difference target $y$, and its parameters $\theta^-$ are held fixed and replaced at intervals by the latest estimation network parameters $\theta$. The target value $y$ is calculated as follows:
$$y = r + \gamma\, Q\big(s', \arg\max_{a'} Q(s', a'; \theta); \theta^-\big)$$
where $r$ is the immediate reward; $\gamma$ is the discount factor; and $\arg\max_{a'} Q(s', a'; \theta)$ denotes the action $a'$ that maximizes the estimation network's value in the next state $s'$.
Because $\theta^-$ remains unchanged for a period of time, the convergence target $y$ of the estimation network $Q(s, a; \theta)$ is relatively fixed. The actions that maximize the estimation network and the target network are not necessarily the same; using the estimation network to select the action and the target network to compute the target value prevents the model from choosing overestimated suboptimal actions and effectively addresses the overestimation problem of the DQN algorithm.
c. The neural network structure is improved by splitting its output into two parts: one is a state value function $V^{\pi}(s)$, which represents how good each state is, and the other is an advantage function $A^{\pi}(s, a; \alpha)$, which distinguishes how good each action is in a given state; together they give the action value, $Q^{\pi}(s, a) = V^{\pi}(s) + A^{\pi}(s, a; \alpha)$,
where $\pi$ is the policy and $\alpha$ denotes the parameters of the advantage function;
d. the target value of the D3QN model is as follows:
e. the network parameters are updated with the mean square error E as a loss function as follows:
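Items b to e can be illustrated with a few plain Python helpers that operate on lists of action values rather than on real networks. This is a sketch under stated assumptions: subtracting the mean advantage when combining the two output streams is the common dueling-network convention and is not confirmed by the source, the `done` flag for terminal transitions is added for completeness, and all names are hypothetical.

```python
def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    The mean advantage is subtracted (an assumed convention) so that V and A
    are uniquely determined.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward, done, q_next_estimation, q_next_target, gamma=0.99):
    """Temporal-difference target: the estimation network picks the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_next_estimation)),
                      key=q_next_estimation.__getitem__)
    return reward + gamma * q_next_target[best_action]


def mse_loss(targets, q_taken):
    """Mean square error between targets and the estimated values of the actions taken."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_taken)) / len(targets)
```

Selecting the action with one set of values and evaluating it with the other is what separates this target from the plain DQN target, which takes the maximum of the target network directly.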
The flow of the improved D3QN algorithm is shown in Figure 2.
1. Initialization: an estimation neural network and a target neural network are initialized, both of which are deep neural networks used to estimate the action value function. At the same time, an experience replay buffer is initialized to store the experience tuples generated as the agent interacts with the environment.
2. Collecting experience: the agent interacts with the environment, selects actions according to the current policy, and observes the next state and the immediate reward fed back by the environment. These experience tuples are stored in the experience replay buffer for later training.
3. Training the estimation network: a batch of experience tuples is randomly sampled from the experience replay buffer. For each tuple, the action value of the current state is estimated with the estimation network, and the action value of the next state is estimated with the target network. The Q-learning target value is then calculated, and the parameters of the estimation network are updated so that its output approaches the target value.
4. Updating the target network: the parameters of the target network are updated periodically by copying the parameters of the estimation network to it. This helps stabilize training and reduces fluctuation of the estimation target.
5. Action selection: an action is selected according to the current policy. A policy such as epsilon-greedy may be used to balance exploration and exploitation.
6. Iterative training: steps 2 to 5 are repeated, continuously collecting experience, updating the estimation network, updating the target network and optimizing the agent's decision policy.
7. Convergence and evaluation: as training proceeds, the convergence of the algorithm's performance is observed. The agent can be evaluated in the environment periodically to test how the trained agent performs in new scenarios.
8. Ending training: training ends when a predetermined number of training steps is reached or the algorithm converges. The agent's estimation network is the final training result and can be used for decision-making in practical applications.
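The eight steps above can be sketched as a compact training loop. To stay self-contained, the sketch replaces the two neural networks with lookup tables and uses a toy one-dimensional chain instead of the line selection environment; the environment, the hyperparameters and every name here are assumptions for illustration, not the patented implementation.

```python
import random
from collections import deque


def train(n_states=6, n_actions=2, episodes=500, gamma=0.9, lr=0.1,
          epsilon=0.2, batch_size=16, sync_every=20, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise the estimation table, the target table and the replay buffer.
    q_est = [[0.0] * n_actions for _ in range(n_states)]
    q_tgt = [row[:] for row in q_est]
    buffer = deque(maxlen=2000)
    steps = 0
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # cap on episode length
            # Step 5: epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=q_est[s].__getitem__)
            # Toy environment: action 1 moves right, action 0 moves left;
            # reaching the last state ends the episode with a reward.
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s2 == n_states - 1
            r = 1.0 if done else -0.01
            # Step 2: store the experience tuple.
            buffer.append((s, a, r, s2, done))
            # Step 3: sample a random batch and move estimates toward the targets.
            if len(buffer) >= batch_size:
                for bs, ba, br, bs2, bdone in rng.sample(list(buffer), batch_size):
                    best = max(range(n_actions), key=q_est[bs2].__getitem__)
                    y = br if bdone else br + gamma * q_tgt[bs2][best]
                    q_est[bs][ba] += lr * (y - q_est[bs][ba])
            steps += 1
            # Step 4: periodically copy the estimation table into the target table.
            if steps % sync_every == 0:
                q_tgt = [row[:] for row in q_est]
            s = s2
            if done:
                break
    # Step 8: the estimation table is the training result.
    return q_est
```

Calling `train()` returns the learned action values; on this toy chain the greedy policy read from them should move right toward the terminal state. Steps 6 and 7 correspond to the outer episode loop and to inspecting the returned values.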
Case study
In this embodiment, a typical section of a subway line is selected as the study object, and the optimal energy-saving longitudinal section line is solved under different operation energy consumption calculation factors. Finally, the optimized design is applied to a section of a subway line in the capital and compared with the scheme produced by experienced human designers.
The main constraints are shown in Table 1.
Table 1. Main constraints
Results under different parameters
Using the established D3QN model for subway longitudinal section line design, whose objective is minimum train operation energy consumption, the influence of the different energy consumption calculation factors on the energy-saving design of the subway section line in the capital is analysed.
(1) Reverse gradient limit influence analysis
Here the influence of the reverse gradient limit on the energy-saving design of the longitudinal section line is analysed; the other energy consumption calculation factors are fixed at the following values:
(1) Model optimization effect: under the different reverse gradient limits, the calculated energy consumption of the energy-saving design alignment is 3.51% to 3.83% lower than that of the original alignment, and the reduction grows as the reverse gradient limit increases; the running times are all close to the original running time.
(2) Alignment change: the acceleration slope of the energy-saving design alignment is longer than that of the original alignment. Because of the reverse gradient limit, and so that potential energy is better converted into kinetic energy and the train reaches the design speed sooner, the gradient of the first acceleration slope increases as the reverse gradient limit decreases, while the gradient of the second acceleration slope increases as the reverse gradient limit increases.
(2) Design speed impact analysis
Here the influence of the design speed on the energy-saving design of the longitudinal section line is analysed; the other energy consumption calculation factors are fixed at the following values:
(1) Model optimization effect: under the different design speeds, the calculated energy consumption of the energy-saving design alignment is 1.32% to 14.14% lower than that of the original alignment, and the running time decreases as the speed increases. The optimization effect is most pronounced at design speeds near 80 km/h, because the calculated energy consumption of the original alignment is higher at design speeds near 80 km/h, which indirectly shows that the original alignment is not a good energy-saving slope alignment.
(2) Alignment change: the gradient of the first acceleration slope of the energy-saving design alignment increases with the design speed; presumably, increasing this gradient converts potential energy into kinetic energy more effectively so that the train reaches the design speed sooner. The alignments obtained for certain design speeds differ considerably from one another; the difference arises from bypassing the plane transition curve section.
(3) Passenger capacity impact analysis
Here the influence of the passenger capacity on the energy-saving design of the longitudinal section line is analysed; the other energy consumption calculation factors are fixed at the following values:
(1) Model optimization effect: under the different passenger capacity conditions, the calculated energy consumption of the energy-saving design alignment is 2.33% to 19.71% lower than that of the original alignment, and the running time increases with the passenger capacity. The optimization effect is most pronounced in the empty (no-load) condition, where the running time is reduced by 12.5 s and the operation energy consumption by 19.71%.
(2) Alignment change: although the gradient increases with the passenger capacity, the increase is small; the main change is a longer acceleration slope, which effectively lengthens the cruising period and thereby reduces energy consumption.
This embodiment builds a deep reinforcement learning model for subway longitudinal section line design. Without relying on human experience, the model perceives, explores, evaluates and makes decisions in the line selection environment, and finds the optimal energy-saving line scheme through feedback from the different constraint conditions. Compared with the actual longitudinal section line design scheme, it reduces both the train operation energy cost and the time cost.
The invention and its embodiments have been described above by way of illustration and not limitation; what is shown in the accompanying drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, any structural mode or embodiment similar to this technical scheme that a person of ordinary skill in the art, informed by this disclosure, devises without creative effort and without departing from the gist of the invention shall fall within the protection scope of the invention.

Claims (9)

1. An energy-saving design optimization method for subway longitudinal section lines based on deep reinforcement learning, characterized by comprising the following steps:
1. combining the subway line design specification constraints and the actual construction condition constraints, and establishing a subway longitudinal section line design model with minimum train operation energy consumption as the objective;
2. solving the optimal energy-saving longitudinal section line of the subway under different operation energy consumption calculation factors by means of a deep reinforcement learning algorithm.
2. The energy-saving design optimization method for subway longitudinal section lines based on deep reinforcement learning according to claim 1, characterized in that the subway longitudinal section line design model comprises:
1) Environment: the avoidance areas and the plane transition curve section areas form the environment for optimizing the subway longitudinal section line;
2) Agent: the agent is defined as an intelligent program that determines the course of the energy-saving longitudinal section line design;
3) State: the spatial position of a slope change point is defined as the state $s_t$; the spatial position refers to the two-dimensional coordinates in the subway longitudinal section, comprising a longitudinal mileage coordinate and a vertical depth coordinate;
4) Action: the search direction of the next slope change point selected by the agent is defined as the action $a_t$;
5) Reward: the value of the reward depends on feedback from the environment; train operation energy consumption is the main component of the reward, and the other components are the survival state reward and the target distance reward;
6) Condition: if the agent cannot find an action $a_t$ that satisfies all constraints on the transition from state $s_t$ to $s_{t+1}$, the state transition condition is not satisfied and the current state $s_t$ is reset to the initial state $s_0$; otherwise, the state transition condition is satisfied and selection of the next action continues.
3. The energy-saving design optimization method for subway longitudinal section lines based on deep reinforcement learning according to claim 2, characterized in that the subway line design specification constraints and the actual construction condition constraints comprise:
(1) Slope length and gradient constraints for slope sections in the station area
Only one slope section is arranged in the station area, that is, the sum of the lengths of the entering and exiting slope sections is greater than the length of the station. Assuming that the entering and exiting slope sections have the same length, the slope length of the station area is constrained as follows:
The station gradient takes a given constant value, namely:
(2) Slope length and gradient constraints for slope sections outside the station area
The slope length constraint for slope sections outside the station area is expressed as follows:
The gradient of the line is bounded by a minimum gradient and a maximum gradient, namely:
(3) Minimum intermediate straight length
The length $L_j$ of the straight section between two adjacent vertical curves should meet the design specification requirement, and is calculated as follows:
$$L_j = x_{j+1} - x_j - T_j - T_{j+1} \ge L_{\min}$$
where $x_j$ is the longitudinal mileage of the $j$-th slope change point; $T_j$ is the tangent length at the $j$-th slope change point; and $L_{\min}$ is the minimum intermediate straight length in the design specification;
(4) Reverse limit gradient
When two slope sections of opposite direction are connected, the gradient in either direction should not be greater than the reverse limit gradient, namely:
(5) Line burial depth constraint
The burial depth of the line is constrained as follows: at any point on the line, the design rail-top elevation of the underground tunnel should be smaller than the ground elevation at that point minus the tunnel height and the minimum earth cover thickness, namely:
(6) Avoidance area constraint
Let one set denote all avoidance areas of the subway longitudinal section and another denote the tunnel area of the subway longitudinal section line; the intersection of the two sets should be the empty set:
(7) Plane transition curve section constraint
The distance between the mileage coordinate of a slope change point and the start and end mileage coordinates of a plane transition curve should not be smaller than the tangent length of the vertical curve, namely:
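Several of the constraints listed above can be checked mechanically for a candidate sequence of slope change points. The Python sketch below covers slope length, gradient bounds, minimum intermediate straight length and burial depth; it is an illustration under assumptions, not the claimed method. The data layout, the per mille gradient unit, the function name and every threshold are hypothetical, and mileages are assumed to be strictly increasing.

```python
def check_profile(points, tangents, ground, *, min_len, min_grade, max_grade,
                  min_straight, tunnel_height, min_cover):
    """Return the names of the constraints violated by a candidate profile.

    points   : list of (mileage_m, rail_elevation_m) slope change points
    tangents : vertical-curve tangent length (m) at each slope change point
    ground   : function mileage -> ground elevation (m)
    """
    violations = []
    # Slope length and gradient bounds for every slope section.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = x1 - x0
        grade = abs((y1 - y0) / length) * 1000.0  # gradient in per mille
        if length < min_len:
            violations.append("slope length")
        if not (min_grade <= grade <= max_grade):
            violations.append("gradient")
    # Straight length left between adjacent vertical curves.
    for j in range(len(points) - 1):
        straight = points[j + 1][0] - points[j][0] - tangents[j] - tangents[j + 1]
        if straight < min_straight:
            violations.append("straight length")
    # Burial depth: rail elevation must stay below ground minus tunnel height and cover.
    for x, y in points:
        if y > ground(x) - tunnel_height - min_cover:
            violations.append("burial depth")
    return violations
```

An empty result means the candidate passes these four checks; in a reinforcement learning environment a non-empty result would mark the action as infeasible.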
4. The energy-saving design optimization method for subway longitudinal section lines based on deep reinforcement learning according to claim 3, characterized in that: in State, the state space at the end of the $t$-th action is defined as $s_t$, as shown in the following formula:
where $x_t$ is the longitudinal mileage coordinate of the $t$-th slope change point, $y_t$ is the vertical elevation coordinate of the $t$-th slope change point, and W and H are the upper limits of mileage and depth of the target optimization area, respectively; in the above formula, the combined position of two consecutive slope change points is taken as one state in the longitudinal section line design optimization model.
5. The energy-saving design optimization method for subway longitudinal section lines based on deep reinforcement learning according to claim 4, characterized in that: in Action, the action $a_t$ transitions state $s_t$ to state $s_{t+1}$; the action $a_t$ is expressed in two parts, as in the following formula:
where the two parts are the length and the gradient of the slope section; the relation among the three quantities is as follows:
6. The energy-saving design optimization method for subway longitudinal section lines based on deep reinforcement learning according to claim 5, characterized in that: in Reward, the reward is given by the following formula:
where the three terms represent the unit operation energy consumption cost, the survival cost and the distance-to-endpoint cost, respectively, and each term is multiplied by its corresponding weight.
7. The energy-saving design optimization method for subway longitudinal section lines based on deep reinforcement learning according to claim 6, characterized in that: in the subway longitudinal section line design model, the train motion equation is solved by the approximate integration method in order to obtain the train operation energy consumption; the range between the lower and upper integration limits in the running time and distance formulas is divided into a number of small speed intervals $\Delta v$; with the initial and final speeds of an interval denoted $v_1$ and $v_2$, the running time and running distance of the train are calculated by the approximate integration method as follows:
Running time
Calculation of the unit resultant force
Under the different working conditions, it can be written in the following forms:
where the symbols denote, in order: the traction force; the mass of the motor cars; the mass of the trailer cars; the unit braking force of the train; the service braking coefficient; the unit traction force; and the unit resistance of the train, which comprises the basic running resistance, the additional gradient resistance, the additional curve resistance and the additional tunnel resistance.
8. The energy-saving design optimization method for subway longitudinal section lines based on deep reinforcement learning according to claim 7, characterized in that: the operation energy consumption calculation factors comprise the reverse gradient limit, the speed limit and the passenger capacity.
9. The energy-saving design optimization method for subway longitudinal section lines based on deep reinforcement learning according to claim 8, characterized in that: the deep reinforcement learning algorithm is an improved D3QN algorithm, and the improvements are as follows:
a. an experience replay mechanism is used: the experiences obtained through interaction are stored one by one in an experience pool and, once a certain number have accumulated, the model randomly samples a batch from the pool at each step to train the neural network;
b. two neural networks with the same structure are constructed, namely an estimation network $Q(s, a; \theta)$ and a target value network $Q(s, a; \theta^-)$. The estimation network computes, for a given state $s$, the expected cumulative reward of taking action $a$, and its parameters $\theta$ are updated continuously; the target value network is used to compute the temporal-difference target $y$, and its parameters $\theta^-$ are held fixed and replaced at intervals by the latest estimation network parameters $\theta$. The target value $y$ is calculated as follows:
$$y = r + \gamma\, Q\big(s', \arg\max_{a'} Q(s', a'; \theta); \theta^-\big)$$
where $r$ is the immediate reward; $\gamma$ is the discount factor; and $\arg\max_{a'} Q(s', a'; \theta)$ denotes the action $a'$ that maximizes the estimation network's value in the next state $s'$.
Because $\theta^-$ remains unchanged for a period of time, the convergence target $y$ of the estimation network $Q(s, a; \theta)$ is relatively fixed;
c. the neural network structure is improved by splitting its output into two parts: one is a state value function $V^{\pi}(s)$, which represents how good each state is, and the other is an advantage function $A^{\pi}(s, a; \alpha)$, which distinguishes how good each action is in a given state; together they give the action value, $Q^{\pi}(s, a) = V^{\pi}(s) + A^{\pi}(s, a; \alpha)$,
where $\pi$ is the policy and $\alpha$ denotes the parameters of the advantage function;
d. the target value of the D3QN model is as follows:
e. the network parameters are updated with the mean square error E as the loss function, as follows:
CN202311330407.6A 2023-10-16 2023-10-16 Energy-saving design optimizing method for subway longitudinal section line based on deep reinforcement learning Active CN117521485B (en)


Publications (2)

Publication Number Publication Date
CN117521485A true CN117521485A (en) 2024-02-06
CN117521485B CN117521485B (en) 2024-06-18

Family

ID=89765249

Country Status (1)

Country Link
CN (1) CN117521485B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165427A (en) * 2018-08-06 2019-01-08 中国铁路设计集团有限公司 Run the optimized calculation method of high-speed rail vertical alignment Adjusted Option
CN112380605A (en) * 2020-11-16 2021-02-19 广州地铁设计研究院股份有限公司 Method and device for optimizing subway longitudinal section design and energy-saving operation scheme
CN113505414A (en) * 2021-06-08 2021-10-15 广州地铁设计研究院股份有限公司 Method, system, equipment and storage medium for designing subway line longitudinal section

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
孙元广;汪茜;彭磊;齐嫣然;柏赟;: "考虑快慢车模式的地铁地下线纵断面优化研究", 交通运输系统工程与信息, no. 04, 15 August 2020 (2020-08-15), pages 1 - 6 *
孙大为: "铁路线路局部方案平、纵断面联合优化设计方法", 中国优秀硕士学位论文全文库工程科技Ⅱ辑, no. 1, 15 January 2013 (2013-01-15), pages 1 - 86 *
李睿 等: "地铁线路纵断面设计探讨", 铁道标准设计, no. 1, 15 January 2013 (2013-01-15), pages 1 - 6 *
柏赟;白骁;孙元广;李佳杰;周雨鹤;: "地铁线路区间纵断面节能设计优化模型", 铁道学报, no. 09, 15 September 2020 (2020-09-15), pages 1 - 5 *
柴杨 等: "城际列车多区间节能运行优化方法研究", 计算机仿真, no. 3, 15 March 2020 (2020-03-15), pages 1 - 6 *
王平: "公路工程设计阶段的造价控制研究", 交通世界, no. 8, 15 August 2022 (2022-08-15), pages 1 - 5 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant