CN111598311B - Novel intelligent optimization method for train running speed curve

Info

Publication number: CN111598311B
Application number: CN202010349688.XA
Authority: CN (China)
Prior art keywords: train, train operation, data, state, reward
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN111598311A
Inventors: 董海荣, 周学影, 周敏, 宋海锋, 袁磊
Current Assignee: Beijing Jiaotong University
Original Assignee: Beijing Jiaotong University
Application filed by Beijing Jiaotong University; priority to CN202010349688.XA

Classifications

    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06Q 50/26: Government or public services


Abstract

The invention discloses a novel intelligent optimization method for a train running speed curve, which comprises the following steps: step one, building a train operation reinforcement learning environment; step two, establishing a reward mechanism; step three, updating a train operation historical information database; and step four, letting the intelligent agent interact with the train operation reinforcement learning environment. The invention performs multi-objective optimization of the train running speed curve, with optimization objectives including train punctuality, energy consumption and comfort; a certain degree of energy saving is achieved while keeping the train as punctual as possible, and passenger riding comfort can be improved.

Description

Novel intelligent optimization method for train running speed curve
Technical Field
The invention belongs to the technical field of train operation optimization, and particularly relates to a novel intelligent optimization method for a train operation speed curve.
Background
The railway system is characterized by sharply changing line environments, complex external influence factors, frequent cross-line and long-distance operation, and complex and varied infrastructure and train characteristics, so the train operation process is a nonlinear problem constrained by many factors such as line conditions and speed limits. Because different running speed curves greatly influence train energy consumption, safety, punctuality and other aspects, existing train running speed curve optimization algorithms can hardly meet the requirements of quickly restoring operation and ensuring efficient operation under complex line conditions, and cannot meet the real-time requirements of train operation optimization control in complex environments. In the existing train operation control system, when an emergency occurs and the train running time needs to be adjusted, the ATO system (automatic train operation system) cannot automatically adjust the speed curve to ensure that the train arrives on time; instead, the train must be switched to a manual driving mode. With the development of computing power and artificial intelligence in recent years, the search for efficient intelligent optimization methods has become a research hotspot. Therefore, how to combine a novel intelligent method to perform rapid, real-time optimization of the train speed curve and improve train operation indexes such as punctuality, comfort and energy saving remains a problem worthy of further study.
The train speed curve optimization problem is a multi-stage decision problem. Model-free reinforcement learning methods have shown superiority and speed in searching for approximate optimal solutions of multi-stage decision problems; reinforcement learning analyzes data, learns online by itself, gains insight from environmental feedback, and can learn how to achieve set targets in complex and uncertain environments. Therefore, combining the reinforcement learning method has important theoretical and practical significance for speed curve optimization.
Disclosure of Invention
The invention provides a novel intelligent optimization method for a train running speed curve, which aims to improve three performance indexes of the train, namely punctuality, comfort and energy saving.
The technical scheme of the invention is as follows:
a novel intelligent optimization method for a train running speed curve comprises the following steps:
step one, building a train operation reinforcement learning environment:
establishing a train operation reinforcement learning environment in the first step according to the line static data, the train static data and the train operation dynamic data; the train running dynamic data comprises the current running position, the speed, the acceleration and the train running time of the train;
step two, establishing a reward mechanism:
through the reward function, the vehicle-mounted controller determines reward values corresponding to different working condition actions in each state; the reward function is set as a relevant function of train operation time, energy consumption and acceleration, and the reward value is provided by the train operation reinforcement learning environment in the step one;
step three, updating the train operation historical information database:
collecting train operation data information in a real railway scene, wherein the data information comprises the position, the speed, the acceleration value, the time, the line gradient and the line speed limit value of train operation to form a train operation state data set; aiming at the train running state data set, finding out a state data set with the maximum similarity through the Manhattan distance; combining the train running state data set and the corresponding action to form a train state-action data set, namely a train running track; the train operation historical information database is formed by a plurality of train operation tracks; processing the data in the train operation history information database according to the reward mechanism in the step two, namely calculating reward for each state-action pair in each operation track to obtain an updated train operation history information database for training neural network parameters;
step four, interaction between the intelligent agent and the train operation reinforcement learning environment:
the train operation reinforcement learning environment generates a new state, a new reward value and a new state value function and feeds the new state, the new reward value and the new state value function back to the intelligent agent; data obtained after real train operation historical data processing are stored in an experience playback data area; the intelligent agent continuously carries out strategy evaluation and strategy improvement through the action value function, selects the maximum action value function, feeds back the action corresponding to the maximum action value function to the train operation reinforcement learning environment, continuously updates the train working condition value through a closed loop structure, and finally selects the optimal working condition action to generate an optimal train speed curve;
and in the fourth step, the intelligent agent is equivalent to the vehicle-mounted controller in the second step.
The invention has the beneficial effects that:
firstly, a reinforcement learning method is adopted, and the neural network parameters are trained using historical train operation data; on the one hand, the optimization of the speed curve does not depend on a specific train model, which avoids the adverse effect of a complex and changeable train operation environment on the solving process; on the other hand, the method learns from real historical data, calculates the reward function and improves the learning model, which improves the solving speed and quality of the approximate optimal solution of the speed curve.
Secondly, the invention carries out multi-objective optimization aiming at the train running speed curve, the optimization objectives comprise train punctuality, energy consumption and comfort, certain energy saving of the train is ensured under the condition of as accurate point as possible, and the passenger riding comfort is improved.
Drawings
Fig. 1 is a graph of the shortest operation time of a novel intelligent optimization method for a train operation speed curve according to an embodiment of the present invention;
fig. 2 is a schematic diagram of agent-environment interaction of a novel intelligent optimization method for a train running speed curve according to an embodiment of the present invention;
fig. 3 is a structural diagram of a novel intelligent optimization method for a train operation speed curve according to an embodiment of the present invention.
Detailed Description
The present invention is further described below in conjunction with the drawings and the embodiments so that those skilled in the art can better understand and carry out the present invention; the embodiments of the present invention are, however, not limited thereto.
In the first step, a train operation reinforcement learning environment is established; the data of the reinforcement learning environment comprise static data of the line and the train and dynamic data of train operation. The train operation dynamic data comprise the current running position, speed, acceleration and running time of the train. Using these data, constraints on the train running speed solution are given and the solution space is reduced. As shown in fig. 1, a curve of the shortest operation time of the train under maximum traction, maximum braking and section speed limit constraints is given; this curve gives the maximum speed the train can reach and is used as a constraint condition when data sampling is performed using similarity in step three, so that a solution space consistent with actual train operation is obtained.
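The construction of the shortest-operation-time curve of fig. 1 is not detailed above; one standard way to build such an envelope is a forward pass under maximum traction capped by the section speed limits, followed by a backward pass under maximum braking. A minimal Python sketch follows (all function names and parameter values are illustrative assumptions, not part of the method described above):

import numpy as np

def min_time_speed_envelope(speed_limits, delta_d=10.0, a_max=1.0, b_max=1.0):
    """Shortest-running-time speed envelope for a line discretized in steps of delta_d metres.

    speed_limits : per-step section speed limit in m/s
    a_max, b_max : assumed constant maximum traction / braking accelerations in m/s^2"""
    n = len(speed_limits)
    v = np.zeros(n)
    for i in range(1, n):                                  # forward pass: maximum traction, capped by the limit
        v[i] = min(np.sqrt(v[i - 1] ** 2 + 2.0 * a_max * delta_d), speed_limits[i])
    v[-1] = 0.0                                            # the train must stop at the end of the section
    for i in range(n - 2, -1, -1):                         # backward pass: remain able to brake in time
        v[i] = min(v[i], np.sqrt(v[i + 1] ** 2 + 2.0 * b_max * delta_d))
    return v                                               # maximum admissible speed at each position step

# Example: 2 km section with an 80 km/h limit and a 40 km/h restriction in the middle.
limits = np.full(200, 80 / 3.6)
limits[90:110] = 40 / 3.6
envelope = min_time_speed_envelope(limits)

The resulting envelope can then be used, as described above, to reject sampled states in step three whose speed exceeds the admissible maximum at the corresponding position.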
In the second step, the invention aims to improve the train operation punctuality rate, reduce the train operation energy consumption and improve the riding comfort of passengers. The reward function embodies these optimization goals and is therefore set as a correlation function of train operation time, energy consumption and acceleration. It can be expressed as:

r_i = r_i^time + r_i^energy + r_i^comfort

wherein the reward function related to punctuality is designed as follows:

[formula image: per-step punctuality reward r_i^time, defined in terms of the step time t_i and the maximum step time t_max]

To ensure punctuality, at the end of each trial an additional term is added to the punctuality reward, namely:

[formula image: terminal punctuality term, defined in terms of the actual inter-station running time T_r and the planned running time T]

The reward function related to energy consumption is designed as follows:

[formula image: per-step energy reward r_i^energy, defined in terms of the train acceleration u_i, the position step length Δd and the maximum per-step energy e_max]

The comfort-related reward function is designed as follows:

[formula image: per-step comfort reward r_i^comfort, defined in terms of the change of acceleration and the maximum impact rate Δc_max]

where t_i and t_max are respectively the actual time and the maximum time spent by the train to run one position step; T_r and T are respectively the actual running time and the planned running time between stations; u_i and Δd are respectively the train acceleration and the train position step length; N is the total number of steps; e_max is the maximum energy consumed in one step; and Δc_max is the maximum impact rate of train operation.
If different performance indicators have different requirements, a weight may be set for each performance indicator, as follows:

r_i = w_1*r_i^time + w_2*r_i^energy + w_3*r_i^comfort, with w_1 + w_2 + w_3 = 1

where w_1, w_2 and w_3 are respectively the weights of the time, energy-consumption and comfort terms.
After the reward mechanism is established, the reward value obtained by the train for taking different actions in each state can be computed. As shown in fig. 2, the acceleration, the time and the maximum time spent per unit distance of train operation, the maximum energy consumption and the maximum impact rate appearing in the reward function are obtained from step one; in other words, the reward value is provided by the train operation reinforcement learning environment.
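Because the individual reward formulas above are only available as images, the following Python sketch merely illustrates the structure described in the text: per-step punctuality, energy and comfort terms normalized by t_max, e_max and Δc_max, combined with the weights w_1, w_2, w_3, plus a terminal punctuality term. Every concrete expression in the sketch is an assumption rather than the formula of the original document:

def step_reward(t_i, e_i, delta_u_i, t_max, e_max, dc_max,
                w_time=0.4, w_energy=0.4, w_comfort=0.2):
    """Per-step reward: weighted sum of punctuality, energy and comfort terms.

    The concrete penalty expressions below are illustrative assumptions; the original
    document defines them by formulas that are not reproduced here."""
    r_time = -t_i / t_max                 # longer step time -> lower reward
    r_energy = -e_i / e_max               # more traction energy -> lower reward
    r_comfort = -abs(delta_u_i) / dc_max  # larger change of acceleration -> lower reward
    assert abs(w_time + w_energy + w_comfort - 1.0) < 1e-9   # w_1 + w_2 + w_3 = 1
    return w_time * r_time + w_energy * r_energy + w_comfort * r_comfort

def terminal_time_bonus(actual_trip_time, planned_trip_time):
    """Additional punctuality term added at the end of each trial (form assumed)."""
    return -abs(actual_trip_time - planned_trip_time) / planned_trip_time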
In the third step, the collected train operation data information includes the train operation position, speed, acceleration/deceleration value and time. Each train operation state is set as follows:

s_i = (d_i, v_i, t_i, u_i, g_i, v_i^lim)

where d_i, v_i, t_i, u_i, g_i and v_i^lim are respectively the position, the speed, the time, the acceleration value, the line gradient and the line speed limit value of the train in the current state i.
The invention needs to generate data for reinforcement learning training from this historical data information; that is, for the current train running state, the state set with the maximum similarity is found from the data sets, and the similarity between data can be measured with the Manhattan distance, namely:

D(s_i, s_k) = Σ_j | s_i(j) - s_k(j) |

where s_i(j) and s_k(j) are respectively the j-th elements of s_i and s_k.

For each state s_i, this method is adopted to find the n nearest states {s_k1, s_k2, ..., s_kn}; accordingly, the state-action pairs (s_k, a_k) corresponding to these n states can be obtained, where s_k and a_k respectively denote the approximate states found for state s_i and the corresponding actions; the actions correspond to accelerations, and different accelerations correspond to different actions. It is noted that n is not fixed and is determined according to the constraints provided in step one.
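A minimal Python sketch of this nearest-state lookup under the Manhattan distance follows; the feasibility check against the step-one envelope is represented here by a simple speed comparison, which is an assumption, since the text only states that n is determined by the step-one constraints:

import numpy as np

def nearest_states(query_state, history_states, history_actions, envelope_speed, max_n=20):
    """Return up to max_n (state, action) pairs closest to query_state under the
    Manhattan (L1) distance.

    history_states  : (M, 6) array of states (d, v, t, u, gradient, speed limit)
    history_actions : (M,) array of the accelerations taken in those states
    envelope_speed  : maximum admissible speed at the query position, from step one"""
    dist = np.abs(history_states - query_state).sum(axis=1)   # Manhattan distance to every stored state
    picked = []
    for k in np.argsort(dist):
        if history_states[k, 1] <= envelope_speed:            # keep only samples feasible under the envelope
            picked.append((history_states[k], history_actions[k]))
        if len(picked) == max_n:
            break
    return picked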
The train trajectory can be described as a set of state-action pairs as follows:

τ = {s_0, a_0, s_1, a_1, ..., s_{N-1}, a_{N-1}, s_N}

A plurality of such train running tracks form the train operation history information database, namely:

M = {τ_1, τ_2, ..., τ_M}

The historical information database in this step contains a large number of train running tracks. The reward mechanism provided in step two is adopted to calculate the reward of each state-action pair in each running track, so that the following data set is obtained:

τ' = {s_0, a_0, r_0, s_1, a_1, r_1, ..., s_{N-1}, a_{N-1}, r_{N-1}, s_N}

That is, the train history information database is updated to M' = {τ'_1, τ'_2, ..., τ'_M}. Compared with the original database, the updated database additionally contains the reward value of each state-action pair, and it is used for training the neural network in step four.
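The conversion of recorded trajectories into the reward-augmented database M' could be organized as sketched below; the per-step time, energy and jerk proxies reuse the step_reward helper sketched under step two and are illustrative assumptions:

def augment_trajectory(states, actions, t_max, e_max, dc_max):
    """Turn one raw trajectory {s_0, a_0, ..., s_N} into transitions (s, a, r, s', done)."""
    transitions = []
    for i in range(len(actions)):
        s, s_next = states[i], states[i + 1]
        t_i = s_next[2] - s[2]                            # time spent on this position step
        e_i = max(actions[i], 0.0) * (s_next[0] - s[0])   # crude traction-energy proxy (assumption)
        delta_u = s_next[3] - s[3]                        # change of acceleration (jerk proxy)
        r = step_reward(t_i, e_i, delta_u, t_max, e_max, dc_max)
        transitions.append((s, actions[i], r, s_next, i == len(actions) - 1))
    return transitions

def build_replay_database(trajectories, t_max, e_max, dc_max):
    """M' = {tau'_1, ..., tau'_M}: apply the reward mechanism to every recorded run."""
    return [augment_trajectory(s, a, t_max, e_max, dc_max) for s, a in trajectories]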
Step four is the core part of the invention. In this part, the intelligent agent and the reinforcement learning environment continuously interact and learn, and the optimal working condition action is selected and fed back to the train operation reinforcement learning environment through evaluation and improvement of the strategy value function.

As shown in fig. 2, the intelligent agent is equivalent to the vehicle-mounted controller. During train operation, the intelligent agent interacts with the environment: the environment generates a new state, a reward value and a state value function and feeds them back to the intelligent agent; the intelligent agent continuously performs strategy evaluation and strategy improvement through the value function, selects the maximum action value function, and feeds the action corresponding to the maximum action value function back to the train operation reinforcement learning environment. The working condition value is continuously updated through this closed-loop structure, the optimal working condition action is finally selected, and an optimal train speed curve is generated, achieving the goals of energy saving and comfort.
The intelligent agent action value function is updated in the Deep Q-Network (DQN) manner, and the update is performed by gradient descent on the squared temporal-difference error:

( r + γ max_{a'} Q(s', a'; θ^-) - Q(s, a; θ) )^2

where θ^- and θ are respectively the network parameters of the target network and of the value function approximation; s and s' are the current state and the next state; a and a' are respectively the actions selected in the current state and the next state, corresponding to accelerations; r is the reward value; γ is the discount factor; and Q denotes the action value function.
The basic procedure for optimizing the speed curve using DQN is as follows:

Input: states s ∈ S, train actions a ∈ A, value function Q establishing the mapping S × A → R
Initialize the experience replay data area D with capacity N
Initialize the state-action value function Q with random weights θ
Let θ^- = θ, initializing the target network Q̂
Begin:
For each training episode, episode = 1, 2, ...:
    Obtain the initial state s_1 = (d_1, v_1, u_1, t_1) (the initial state is the zero vector)
    For t = 1, 2, ...:
        Select N transitions (s_i, a_i, r_i, s_{i+1}) from the historical information database and store them in D
        Sample m training samples (s_j, a_j, r_j, s_{j+1}) from D
        Compute the target y_j = r_j if the episode terminates at step j+1, otherwise y_j = r_j + γ max_{a'} Q̂(s_{j+1}, a'; θ^-)
        Perform a gradient descent step on (y_j - Q(s_j, a_j; θ))^2
        Every C steps, update the target network weights θ^- ← θ
    End of the inner loop over steps
End of the loop over episodes
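A minimal PyTorch sketch of this procedure follows: transitions taken from the historical information database fill the experience replay area D, mini-batches are sampled from D, the target y_j is computed with the target-network parameters θ^-, and θ^- is copied from θ every C updates. The network size, the discretization of accelerations into working-condition indices and all hyper-parameter values are assumptions:

import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 6, 5      # state (d, v, t, u, gradient, speed limit); 5 assumed working conditions
GAMMA, BATCH, C = 0.99, 64, 200  # assumed discount factor, batch size and target-update period

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())            # theta_minus = theta
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=100_000)                             # experience replay data area D

def train(transitions, episodes=100):
    """transitions: (s, a, r, s_next, done) tuples built from the historical database,
    with a given as a discrete working-condition index (an assumption of this sketch)."""
    step = 0
    for _ in range(episodes):
        replay.extend(transitions)                         # D is filled from historical data, not from a simulator
        for _ in range(max(1, len(replay) // BATCH)):
            batch = random.sample(replay, BATCH)
            s, a, r, s_next, done = map(torch.tensor, zip(*[
                (t[0], t[1], t[2], t[3], float(t[4])) for t in batch]))
            s, s_next = s.float(), s_next.float()
            with torch.no_grad():                          # y_j from the target network
                y = r.float() + GAMMA * (1.0 - done.float()) * target_net(s_next).max(dim=1).values
            q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q, y)            # (y_j - Q(s_j, a_j; theta))^2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step % C == 0:                              # theta_minus <- theta every C steps
                target_net.load_state_dict(q_net.state_dict())

In this sketch the environment loop of classical DQN is replaced by reading transitions from the historical database, matching the offline-training, online-optimization scheme described below.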
Through the above steps, the trained neural network parameters used for approximating the value function are finally obtained. Using these parameters, the train running state and the relevant line conditions can be input during train operation, and an optimized speed curve is obtained.
It can be seen from the above that, unlike the conventional DQN algorithm, the experience replay data area in the DQN algorithm of the present invention stores data obtained by processing actual train operation history data, as shown in fig. 3, rather than experience data generated by the reinforcement learning environment. This data can be obtained from the onboard computer of the train. With this processing mode, on the one hand, the method does not depend on a specific train dynamics model, which avoids the adverse effects of a complex train operation environment on modelling and solving; on the other hand, the method learns from real historical data, calculates the reward function and improves the learning model, which improves the solving speed and quality of the approximate optimal solution of the speed curve. The neural network trained with the historical data can output the optimal action according to the current state, that is, output the optimal train operation condition according to the current train operation state, so that punctual, energy-saving and comfortable operation is achieved through offline training and online optimization.
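With the trained parameters, online optimization reduces to evaluating the value network in the current state and applying the working condition with the largest action value, as in the following sketch (reusing q_net from the training sketch above):

def optimal_condition(state):
    """Pick the working-condition index with the largest action value for the given state."""
    with torch.no_grad():
        q_values = q_net(torch.tensor(state, dtype=torch.float32))
    return int(q_values.argmax())

# During a run: feed the current (d, v, t, u, gradient, speed limit) state, apply the
# returned working condition, and repeat step by step to trace out the optimized speed curve.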
The above description is only a few examples of the present invention, and is not intended to limit the present invention. All the modifications and improvements made to the above examples according to the technical essence of the present invention fall within the scope of the present invention.

Claims (1)

1. A novel intelligent optimization method for a train running speed curve comprises the following steps:
step one, building a train operation reinforcement learning environment:
establishing a train operation reinforcement learning environment in the first step according to the line static data, the train static data and the train operation dynamic data; the train running dynamic data comprises the current running position, the speed, the acceleration and the train running time of the train;
step two, establishing a reward mechanism:
determining a reward value corresponding to different working condition actions adopted in each state by the vehicle-mounted controller through a reward function; the reward function is set as a relevant function of train operation time, energy consumption and acceleration, and the reward value is provided by the train operation reinforcement learning environment in the step one;
step three, updating the train operation historical information database:
collecting train operation data information in a real railway scene, wherein the data information comprises the position, the speed, the acceleration value, the time, the line gradient and the line speed limit value of train operation to form a train operation state data set; aiming at the train running state data set, finding out a state data set with the maximum similarity through the Manhattan distance; combining the train running state data set and the corresponding action to form a train state-action data set, namely a train running track; the train operation historical information database is formed by a plurality of train operation tracks; processing the data in the train operation history information database according to the reward mechanism in the step two, namely calculating reward for each state-action pair in each operation track to obtain an updated train operation history information database for training neural network parameters;
step four, interaction between the intelligent agent and the train operation reinforcement learning environment:
the train operation reinforcement learning environment generates a new state, a new reward value and a new state value function and feeds the new state, the new reward value and the new state value function back to the intelligent agent; data obtained after real train operation historical data processing are stored in an experience playback data area; the intelligent agent continuously carries out strategy evaluation and strategy improvement through the action value function, selects the maximum action value function, feeds back the action corresponding to the maximum action value function to the train operation reinforcement learning environment, continuously updates the train working condition value through a closed loop structure, and finally selects the optimal working condition action to generate an optimal train speed curve;
and in the fourth step, the intelligent agent is equivalent to the vehicle-mounted controller in the second step.
CN202010349688.XA 2020-04-28 2020-04-28 Novel intelligent optimization method for train running speed curve Active CN111598311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010349688.XA CN111598311B (en) 2020-04-28 2020-04-28 Novel intelligent optimization method for train running speed curve

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010349688.XA CN111598311B (en) 2020-04-28 2020-04-28 Novel intelligent optimization method for train running speed curve

Publications (2)

Publication Number Publication Date
CN111598311A CN111598311A (en) 2020-08-28
CN111598311B true CN111598311B (en) 2022-11-25

Family

ID=72182289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010349688.XA Active CN111598311B (en) 2020-04-28 2020-04-28 Novel intelligent optimization method for train running speed curve

Country Status (1)

Country Link
CN (1) CN111598311B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231870B (en) * 2020-09-23 2022-08-02 西南交通大学 Intelligent generation method for railway line in complex mountain area
CN118194710A (en) * 2024-03-20 2024-06-14 华东交通大学 Multi-objective optimization method and system for magnetic levitation train

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109703606A (en) * 2019-01-16 2019-05-03 北京交通大学 Bullet train intelligent driving control method based on history data
CN109978350A (en) * 2019-03-13 2019-07-05 北京工业大学 A kind of subway train energy conservation optimizing method based on regime decomposition dynamic programming algorithm
CN110562301A (en) * 2019-08-16 2019-12-13 北京交通大学 Subway train energy-saving driving curve calculation method based on Q learning
CN110497943A (en) * 2019-09-03 2019-11-26 西南交通大学 A kind of municipal rail train energy-saving run strategy method for on-line optimization based on intensified learning

Also Published As

Publication number Publication date
CN111598311A (en) 2020-08-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant