CN106842925A - Intelligent locomotive operation method and system based on deep reinforcement learning - Google Patents

Intelligent locomotive operation method and system based on deep reinforcement learning

Info

Publication number
CN106842925A
Authority
CN
China
Prior art keywords
locomotive
study
learning
train
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710045758.0A
Other languages
Chinese (zh)
Other versions
CN106842925B (en)
Inventor
赵曦滨
夏雅楠
黄晋
卢莎
任育琦
顾明
孙家广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
CRRC Dalian Institute Co Ltd
CRRC Information Technology Co Ltd
Original Assignee
Tsinghua University
CRRC Dalian Institute Co Ltd
CRRC Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, CRRC Dalian Institute Co Ltd, and CRRC Information Technology Co Ltd
Priority to CN201710045758.0A
Publication of CN106842925A
Application granted
Publication of CN106842925B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention relates to an intelligent locomotive operation method and system based on deep reinforcement learning. The system comprises a data source module, a locomotive operating environment learning module, an evaluation mechanism learning module, and a control strategy learning module. The data source module supplies the input data required by the operating environment learning module and the evaluation mechanism learning module, which output the learned concrete operating environment and the reward function value, respectively, to the control strategy learning module. Based on a deep reinforcement learning algorithm, the locomotive operating environment model uses the real-time evaluation of each locomotive control action as feedback: by rewarding or penalizing the current control action, a reward function value is fed back to the control strategy as the reward evaluation value, and the control strategy iteratively updates and optimizes itself in combination with the running state. The invention enables better intelligent optimization of locomotive operation and considerably reduces manual involvement.

Description

Intelligent locomotive operation method and system based on deep reinforcement learning
Technical field
The invention relates to a locomotive control method and system, and in particular to an intelligent locomotive operation method and system based on deep reinforcement learning. It belongs to the field of locomotive control.
Background
Automatic driving and optimized operation of railway locomotives play an important role in freeing up manpower, reducing energy consumption, and improving locomotive punctuality and safety. Because the train operating environment is complex and subject to many influencing factors, researchers have studied locomotive operation optimization algorithms extensively. These fall roughly into three classes: analytical methods, numerical optimization methods, and heuristic optimization algorithms. Analytical methods are generally of two kinds: one applies to locomotives whose input traction and braking forces are discrete, the other to locomotives whose input traction and braking forces are continuous. However, the constraints in analytical methods are too simple to fit actual locomotive operation well; numerical optimization has poor real-time performance and is hard to use for real-time optimal control of a locomotive; and heuristic optimization algorithms depend too heavily on manual input. Current real-time control algorithms for locomotive operation are generally designed under specific assumptions, are hard to apply to the complex operating conditions of a locomotive, and therefore struggle to guarantee safe locomotive operation.
In recent years, optimized locomotive control based on machine learning and artificial intelligence has also become a research focus. Luo Hengyu and Xu Hongze proposed an integrated intelligent control system for the automatic operation system of high-speed locomotives. The system contains multiple fuzzy neural network controllers, and an expert decision system automatically selects the optimal controller based on the locomotive's running state to achieve effective control. Heqing Sun et al. proposed an iterative learning algorithm for tracking the locomotive's running trajectory; the algorithm is based on a locomotive dynamics model combined with an error feedback mechanism, and they demonstrated its convergence through theoretical analysis. Lixing Yang et al., addressing real-time locomotive control under uncertain disturbances, proposed two RTO algorithms and one online learning algorithm based on expert learning; the algorithms account for disturbances from uncertain conditions and meet multi-objective requirements. Jia TengYin et al. added a heuristic locomotive stopping algorithm (HSA), based on data mining and expert learning, to existing ATO algorithms, forming an optimized STO algorithm. These studies draw on human driving experience to some extent and achieve optimized locomotive operation through expert systems supplemented by machine learning, but they still require too much manual involvement and cannot guarantee the optimization result.
The development of deep reinforcement learning (Deep Reinforcement Learning) has also caused a considerable stir in the machine learning field. Research teams, represented by the DeepMind team, first proposed a deep reinforcement learning method based on DQN (Deep Q-Network) and tested it on a subset of Atari 2600 games, where the results could exceed human players. This breakthrough was subsequently published in Nature and drew wide attention across machine learning research. The theoretical development can be traced back to the work of Lange in 2010, who proposed a deep auto-encoder for vision-based control. In 2011, Cuccu et al. and Abtahi et al. carried out research in related areas; Abtahi proposed replacing the function approximator in traditional reinforcement learning with a DBN, which is very close to the idea of deep reinforcement learning. In 2012, Lange went on to applications and proposed Deep Fitted Q learning for vehicle control. In 2013, the DeepMind team published a paper at NIPS that combined convolutional neural networks with reinforcement learning, taking raw image data as input and the value function of each action as output, with Atari 2600 games as the test; the method was found to exceed human level in 6 of the 7 games tested. The DeepMind team later published an improved DQN paper in Nature, which attracted broad attention. Experiments show that the method is well suited to optimization and control processes such as games and locomotive control, offering new ideas and opportunities for optimized operation of railway locomotives.
Summary of the invention
The invention exploits the major breakthrough of deep reinforcement learning in the machine learning field to carry out optimized railway locomotive operation entirely by machine learning and artificial intelligence means. To this end, the invention focuses on a deep reinforcement learning algorithm for optimized locomotive operation. The locomotive operating environment and the evaluation mechanism for real-time locomotive operation that the deep reinforcement learning process requires are likewise learned by machine learning methods, and the design takes into account uncertainty in the environment and non-standard operations that affect operating safety.
An intelligent locomotive operation system based on deep reinforcement learning, characterized in that the system comprises a data source module, a locomotive operating environment learning module, an evaluation mechanism learning module, and a control strategy learning module;
The data source module preprocesses the acquired data sources, which include locomotive operation logs, train route data, train operation energy consumption information, and train timetable information; the preprocessing delivers the locomotive operation logs and the train route data to the locomotive operating environment learning module, and delivers the train operation energy consumption information and the train timetable information to the evaluation mechanism learning module;
The locomotive operating environment learning module builds the locomotive operating environment model; operating environment learning comprises learning the basic-parameter part and the disturbance-parameter part of the train operating parameters, the learning results constitute the concrete operating environment of the locomotive, and the module delivers this concrete operating environment to the control strategy learning module;
The evaluation mechanism learning module combines the information obtained from the data source module with the evaluation mechanism to obtain the reward function required during locomotive operation; the reward function, as the feedback data of the evaluation mechanism, is delivered by the evaluation mechanism learning module to the control strategy learning module;
The control strategy learning module obtains the concrete operating environment of the locomotive and the reward function from the locomotive operating environment learning module and the evaluation mechanism learning module, respectively, and trains an optimized train operation strategy using a deep reinforcement learning method; it learns through continuous interaction with the locomotive operating environment model, uses the reward function fed back by the evaluation mechanism learning module to guide the train's subsequent control sequence, and, through a policy update mechanism, obtains the final actual operation strategy of the locomotive.
Further, the evaluation mechanism comprises learning of the train operation scoring mechanism and design of the non-standard operation penalty scoring mechanism.
Further, the control strategy learning module performs deep reinforcement learning based on a DQN model, and the DQN model learns through continuous interaction with the locomotive operating environment model.
The invention also includes an intelligent locomotive operation method based on deep reinforcement learning, characterized in that the method is implemented by the following steps:
S1: Preprocess the data sources;
Extract from the data sources the feature data for learning the locomotive operating environment model, namely the locomotive operation logs and the train route data, which form the sample data for the supervised learning algorithm that learns the locomotive operating environment. Extract from the data sources the train operation energy consumption information and the train timetable information as the parameters for evaluation mechanism learning;
S2: Learn and build the locomotive operating environment;
Train and build the locomotive operating environment model from the locomotive's operating environment information, using supervised learning based on historical operation data together with a dynamic temporal graph algorithm. The model obtains the concrete operating environment of the locomotive through learning, and this environment is used for control strategy learning;
S3: Learn the evaluation mechanism;
Combine the information obtained from the data sources with the evaluation mechanism and, for a given running route and given locomotive state information, observe the targets over short intervals to obtain the reward function for locomotive operation. The reward function serves as the evaluation value of locomotive operation and is used for control strategy learning;
S4: Learn the control strategy;
Apply a deep reinforcement learning method to learn a control strategy on the concrete operating environment of the locomotive, using the obtained reward function to update and optimize the strategy according to the running state, and thereby obtain the optimized operation control strategy of the locomotive.
Further, the method also includes a policy update mechanism. The optimized control strategy can be updated in real time through this mechanism: starting from the current control strategy, real-time adaptive learning derives a better control strategy, so that the locomotive control strategy is optimized progressively.
Further, in step S2, the locomotive's operating environment information includes the train's own state information, formed from the locomotive operation logs and train route data, and the external environmental parameter information. Most of these parameters fluctuate within a certain range, and their fluctuation can be observed and predicted from historical data; a small fraction are uncertain in real scenarios and may fluctuate unpredictably.
Further, the locomotive operating environment model uses a supervised learning algorithm, based on a mechanism model, to learn the basic train operation model parameters and thereby cover general scenarios, and uses a dynamic graph model to learn the disturbance parameters of the train operating environment.
Further, the supervised learning algorithm is a decision tree algorithm or a neural network algorithm.
Further, in step S3, the evaluation mechanism includes a train operation scoring mechanism and a non-standard operation penalty scoring mechanism. The train operation scoring mechanism is formulated from historical operation logs, and the non-standard operation penalty scoring mechanism is formulated from non-standard operations.
Further, in step S4, control strategy learning is completed by a DQN model. Based on the deep reinforcement learning algorithm, the locomotive operating environment model uses the real-time evaluation of each locomotive control action as feedback; the evaluation mechanism rewards or penalizes the current control action and feeds a reward evaluation value back to the DQN model, which iteratively updates and optimizes the strategy in combination with the running state.
The beneficial effects of the invention are:
(1) Optimized operation of railway locomotives is achieved through autonomous machine learning. Based on a deep reinforcement learning algorithm, the invention obtains the locomotive operating environment and the reward function through autonomous machine learning, and avoids manual involvement as far as possible throughout the design and implementation of the algorithm.
(2) The locomotive's operating environment and the reward function for locomotive operation are trained and built with machine learning techniques, taking into account both the uncertainty of the environment model and the safety of locomotive operation. For the operating environment, the invention trains and builds the model using supervised learning based on historical operation data and a dynamic temporal graph algorithm; the dynamic temporal graph algorithm is applied, as a novel step, to learning the trends of environmental parameters in order to establish the locomotive operating environment model. For the reward function, the invention considers the safety of locomotive operation, obtains reward function values separately for normal operation and non-standard operation, and, based on historical train operation information, uses supervised learning to train the evaluation mechanism for locomotive operation.
(3) A deep reinforcement learning algorithm for optimized locomotive operation, with a real-time policy update mechanism. In its specific implementation, the invention designs an optimization scheme suited to this problem based on a deep reinforcement learning algorithm (the DQN model), and in implementation the scheme can derive a real-time policy update mechanism through deep learning training.
Therefore, the invention enables better intelligent optimization of locomotive operation and considerably reduces manual involvement.
Brief description of the drawings
Fig. 1 is a structural diagram of the intelligent locomotive operation system based on deep reinforcement learning according to the invention;
Fig. 2 is a flow chart of the technical route of the intelligent locomotive operation method based on deep reinforcement learning according to the invention;
Fig. 3 is a flow chart of the basic deep reinforcement learning model in the invention;
Fig. 4 is an architecture diagram of the DQN model in the invention.
Detailed description
The technical solution of the invention is described in detail below with reference to the drawings and embodiments.
This embodiment provides an intelligent locomotive operation system based on deep reinforcement learning. As shown in Fig. 1, the system contains four modules: the data source module, the locomotive operating environment learning module, the evaluation mechanism learning module, and the control strategy learning module.
The data source module preprocesses the acquired data sources, which include locomotive operation logs, train route data, train operation energy consumption information, and train timetable information. Preprocessing extracts the locomotive operation logs and train route data from the data sources as the feature data of the locomotive operating environment and delivers them to the locomotive operating environment learning module, where they form the sample data for operating environment learning. It delivers the train operation energy consumption information and train timetable information to the evaluation mechanism learning module, which uses them to evaluate locomotive operation in real time.
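The routing performed by the data source module can be sketched as follows. This is a minimal illustration only: the patent names the four data sources but gives no schema, so the `kind` field, its four values, and the `RoutedData` container are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical record kinds; the patent lists the four sources but not their format.
ENV_KINDS = {"operation_log", "route_data"}          # go to environment learning
EVAL_KINDS = {"energy_consumption", "timetable"}     # go to evaluation mechanism learning


@dataclass
class RoutedData:
    environment_samples: List[Dict[str, Any]] = field(default_factory=list)
    evaluation_samples: List[Dict[str, Any]] = field(default_factory=list)


def route_records(records: List[Dict[str, Any]]) -> RoutedData:
    """Split raw records between the two learning modules by their 'kind'."""
    routed = RoutedData()
    for rec in records:
        kind = rec.get("kind")
        if kind in ENV_KINDS:
            routed.environment_samples.append(rec)
        elif kind in EVAL_KINDS:
            routed.evaluation_samples.append(rec)
        else:
            # Unknown sources are rejected rather than silently dropped.
            raise ValueError(f"unknown record kind: {kind!r}")
    return routed
```

Rejecting unknown kinds is a design choice of this sketch; a real pipeline might log and skip them instead.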
The locomotive operating environment learning module builds the locomotive operating environment model. Operating environment learning covers two groups of parameters: the basic-parameter part and the disturbance-parameter part of the train operating parameters. The learning results constitute the concrete operating environment of the locomotive. The two groups are typically learned with a classical supervised learning algorithm and a dynamic temporal graph algorithm, respectively. The module delivers the resulting concrete operating environment to the control strategy learning module.
The evaluation mechanism learning module combines the information obtained from the data source module with the evaluation mechanism to obtain the reward function required during locomotive operation. The evaluation mechanism comprises learning of the train operation scoring mechanism and design of the non-standard operation penalty scoring mechanism. The reward function, as the feedback data of the evaluation mechanism learning module, is delivered by that module to the control strategy learning module.
The control strategy learning module obtains the concrete operating environment and the reward function from the locomotive operating environment learning module and the evaluation mechanism learning module, and performs deep reinforcement learning based on a DQN model; that is, it trains an optimized train operation strategy using a deep reinforcement learning method. Specifically, the DQN model learns through continuous interaction with the locomotive operating environment model (see Fig. 3), the reward function fed back by the evaluation mechanism learning module guides the train's subsequent control sequence, and the policy update mechanism yields the final actual operation strategy of the locomotive.
The above system achieves intelligent locomotive operation through deep reinforcement learning. As shown in Fig. 2, the method used is:
Step 1: Preprocess the data sources
Extract from the data sources the feature data for learning the locomotive operating environment model, namely the locomotive operation logs and the train route data, which form the sample data for the supervised learning algorithm that learns the locomotive operating environment. Extract from the data sources the train operation energy consumption information and the train timetable information as the parameters for evaluation mechanism learning.
Step 2: Learn and build the locomotive operating environment
The locomotive's operating environment information typically includes not only the train's own state information, formed from the locomotive operation logs and train route data, but also external environmental parameter information. Most of these parameters fluctuate within a certain range, and their fluctuation can be observed and predicted from historical data; a small fraction are uncertain in real scenarios and may fluctuate unpredictably. From this operating environment information, the invention trains and builds an uncertainty-aware locomotive operating environment model using supervised learning based on historical operation data together with a dynamic temporal graph algorithm. Specifically, a supervised learning algorithm (a classical algorithm such as a decision tree or neural network), based on a mechanism model, learns the basic train operation model parameters to cover general scenarios, and a dynamic graph model learns the disturbance parameters of the train operating environment.
The locomotive operating environment model obtains the concrete operating environment of the locomotive through learning, and this environment is used for control strategy learning.
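To make the split between basic parameters and disturbance parameters concrete, here is a rough sketch of one environment step. Everything specific in it is an assumption rather than the patent's model: the point-mass dynamics, the quadratic resistance form `a + b*v + c*v**2`, and the names `BaseParams`, `step`, and `sample_disturbance` are hypothetical, and the bounded uniform noise merely stands in for the dynamic graph model, which the patent does not specify.

```python
import random
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass
class BaseParams:
    """Basic parameters, assumed to have been fitted by supervised learning."""
    mass_kg: float
    a: float  # constant resistance term, N
    b: float  # resistance per unit speed, N/(m/s)
    c: float  # resistance per squared speed, N/(m/s)^2


def step(speed, position, force_n, grade, params, dt=1.0, disturbance_n=0.0):
    """Advance a point-mass train by dt seconds.

    force_n is net traction (positive) or braking (negative) force;
    grade is the slope as rise over run (small-angle approximation).
    """
    resistance = params.a + params.b * speed + params.c * speed ** 2 + disturbance_n
    grade_force = params.mass_kg * G * grade
    accel = (force_n - resistance - grade_force) / params.mass_kg
    new_speed = max(0.0, speed + accel * dt)  # the train does not roll backwards here
    new_position = position + 0.5 * (speed + new_speed) * dt
    return new_speed, new_position


def sample_disturbance(rng, bound_n):
    """Stand-in for the learned disturbance model: bounded uniform noise in newtons."""
    return rng.uniform(-bound_n, bound_n)
```

A caller would draw `disturbance_n` from `sample_disturbance` at each step, so that the same control action can lead to slightly different next states, which is the uncertainty the embodiment describes.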
Step 3: Learn the evaluation mechanism
Evaluation mechanism learning combines the information obtained from the data sources with the evaluation mechanism to obtain the reward function for locomotive operation. The reward function value serves as the evaluation value of locomotive operation in control strategy learning, and is the basis for strategy selection in the reinforcement learning algorithm on which the invention is built. In typical application scenarios (such as game playing or robot control) the reward function value is definite and objective; in game playing, for example, the evaluation value follows directly from the rules of the game. In the invention, by contrast, the reward function, as an evaluation of locomotive operation, cannot be determined directly from rules: the information obtained from the data sources must be combined with the evaluation mechanism, and targets must be observed over short intervals for a given running route and given locomotive state information, to determine its value. The invention formulates an evaluation mechanism for the optimization objectives of locomotive driving. The evaluation mechanism includes a train operation scoring mechanism formulated from historical operation logs and a non-standard operation penalty scoring mechanism formulated by analyzing non-standard operations. In particular, given the system's high safety requirements, the penalty scoring mechanism assigns the maximum penalty value to non-standard operations that may cause serious consequences (such as stopping on a slope or the risk of overspeed), so that such non-standard control actions are avoided and the safety of the generated strategy is effectively guaranteed.
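The two-part evaluation mechanism can be illustrated with a small sketch. In the patent the operation score is learned from historical logs; the hand-set weights below are a stand-in for that learned scorer, and `MAX_PENALTY`, the weights, and the slope threshold are hypothetical values chosen only for illustration.

```python
MAX_PENALTY = -1000.0  # hypothetical "maximum penalty" for dangerous operations


def operation_score(energy_kwh, delay_s, w_energy=1.0, w_delay=0.1):
    """Normal-operation score: less energy and less timetable deviation is better.

    Stand-in for the scoring mechanism the patent learns from historical logs.
    """
    return -(w_energy * energy_kwh + w_delay * abs(delay_s))


def violation_penalty(speed, speed_limit, grade, stop_grade_limit=0.01):
    """Non-standard operation penalty: overspeed or stopping on a slope."""
    if speed > speed_limit:
        return MAX_PENALTY
    if speed == 0.0 and abs(grade) > stop_grade_limit:
        return MAX_PENALTY
    return 0.0


def reward(energy_kwh, delay_s, speed, speed_limit, grade):
    """Reward over one short interval: operation score plus any violation penalty."""
    return operation_score(energy_kwh, delay_s) + violation_penalty(speed, speed_limit, grade)
```

Making the penalty much larger in magnitude than any achievable operation score is what steers the learned strategy away from unsafe actions, in line with the safety argument above.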
Step 4: Learn the control strategy
The invention applies a deep reinforcement learning method to learn a control strategy on the concrete operating environment of the locomotive, using the obtained reward function to update and optimize the strategy according to the running state, and thereby obtains the optimized operation control strategy of the locomotive. Deep reinforcement learning has significant advantages for generating optimized operation strategies in complex systems. A reinforcement learning algorithm relies on very little external information: through repeated iterative training in the environment and its own learning, it obtains an optimized operation control strategy. Deep learning algorithms have significant advantages in processing complex, high-dimensional data. Deep reinforcement learning, which combines the two, can therefore solve the problem of generating optimized operation strategies for complex systems. As shown in Fig. 3, in any state, based on the deep reinforcement learning algorithm, the locomotive operating environment model uses the real-time evaluation of the locomotive control action as feedback; the evaluation mechanism rewards or penalizes the current control action and feeds a reward function value back to the DQN model as the reward evaluation value, and the DQN model iteratively updates and optimizes the strategy in combination with the running state.
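The interaction cycle of Fig. 3 can be sketched as a generic loop. The function name `run_episode` and its callable parameters are hypothetical; they simply mark where the environment model, the evaluation mechanism, and the learning agent would plug in.

```python
def run_episode(env_step, reward_fn, choose_action, update, initial_state, max_steps):
    """Run one episode of the agent/environment/evaluation cycle.

    env_step(state, action)        -> (next_state, done)   environment model
    reward_fn(state, action, next) -> float                 evaluation mechanism
    choose_action(state)           -> action                current control strategy
    update(s, a, r, s_next, done)  -> None                  strategy update
    """
    state = initial_state
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)
        next_state, done = env_step(state, action)
        r = reward_fn(state, action, next_state)
        update(state, action, r, next_state, done)  # learn from this transition
        total_reward += r
        state = next_state
        if done:
            break
    return total_reward
```

Keeping the reward computation outside the environment step mirrors the patent's separation between the operating environment learning module and the evaluation mechanism learning module.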
The invention designs its deep reinforcement learning method on the basis of the DQN model. Specifically, the DQN model learns through continuous interaction with the locomotive operating environment model; the invention improves on this by using an uncertain locomotive operating environment and the evaluation mechanism. Each time the locomotive performs an operation (action) in any state, the evaluation mechanism feeds back a reward evaluation value that guides the train's subsequent control sequence; that is, it continually drives the DQN model to update and optimize the strategy so as to solve the optimized locomotive operation problem. After many iterations, the model eventually converges and yields an optimized train control strategy.
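For reference, the feedback described above corresponds to the standard Q-learning update, and DQN as published by DeepMind minimizes a squared temporal-difference error against a separate target network. The patent text does not state its formulas, so what follows is the textbook formulation rather than the patent's own; the symbols $\alpha$ (learning rate), $\gamma$ (discount factor), $\theta$ (Q network parameters), and $\theta^{-}$ (Target Q network parameters) are notation introduced here.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}}
  \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```

Here $r_t$ is the reward evaluation value fed back by the evaluation mechanism after action $a_t$ in state $s_t$, and $\mathcal{D}$ is a store of past transitions from which training samples are drawn.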
The detailed architecture of the DQN model is shown in Fig. 4, where the interactive environment is the uncertain train operating environment. In the specific implementation, the reinforcement learning algorithm is an optimized Q-learning algorithm. The optimization is to incorporate the idea of Experience Replay into Q-learning: a replay memory pool is set up during the iteration of the algorithm, each learned experience is stored in it, and at the next training step an experience is sampled from it at random for training. Compared with ordinary reinforcement learning, this idea has three main advantages: (1) it effectively breaks the correlation between state data and reduces the uncertainty of data updates; (2) it effectively avoids the unfavorable case of converging to a local optimum; (3) it addresses the problem that the target of the reinforcement learning algorithm is not fixed. The model combines a deep learning algorithm (such as a deep neural network) with the optimized Q-learning algorithm to obtain approximate element values of the Q matrix (the Q value is the cumulative evaluation function of train operation described in Fig. 2); the Q Network in Fig. 4, for example, is the Q-matrix model built with a deep neural network. In the specific algorithm, the Q network model updates the Target Q network parameters once every N iterations, then updates the DQN error of the DQN model, and finally uses gradient descent to guide the continued optimization training of the Q network model. Applying deep learning effectively addresses the problem of a large system state space. Finally, the DQN model selects locomotive operations (actions) with the conventional ε-greedy strategy: with a small probability it selects an operation at random, and with a larger probability it selects the currently optimal operation, eventually generating the optimized locomotive operation strategy iteratively.
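The three mechanisms named in this paragraph (the replay memory pool, ε-greedy action selection, and the periodic Target Q network update) can be sketched in plain Python. This is a framework-free illustration, not the patent's implementation: the class and function names are hypothetical, network parameters are represented as a flat list, and the gradient step on the Q network itself is omitted.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size replay memory pool; the oldest experiences are evicted first."""

    def __init__(self, capacity, seed=None):
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._items.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return self._rng.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values_target, done, gamma=0.99):
    """Training target computed from the Target Q network's outputs."""
    if done:
        return reward
    return reward + gamma * max(next_q_values_target)


def maybe_sync_target(step, sync_every, online_params, target_params):
    """Copy Q network parameters into the Target Q network every sync_every steps."""
    if step % sync_every == 0:
        target_params[:] = online_params  # in-place copy of the values
        return True
    return False
```

In a full training loop, each sampled transition would yield a `td_target`, and the Q network would be moved toward it by gradient descent; holding the target parameters fixed between syncs is what keeps that target from shifting at every step.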
In addition, the intelligent locomotive operation method includes a policy update mechanism. The optimized control strategy can be updated in real time through this mechanism: starting from the current control strategy, real-time adaptive learning derives a better control strategy, so that the locomotive control strategy is optimized progressively.
Although the principle of the invention has been described in detail above with reference to its preferred embodiments, those skilled in the art should understand that the above embodiments merely explain exemplary implementations of the invention and do not limit its scope. The details in the embodiments do not limit the scope of the invention; any obvious change based on the technical solution of the invention, such as an equivalent transformation or simple substitution, that does not depart from the spirit and scope of the invention falls within the scope of protection of the invention.

Claims (10)

1. An intelligent locomotive operation system based on deep reinforcement learning, characterized in that the system comprises a data source module, a locomotive operating environment learning module, an evaluation mechanism learning module, and a control strategy learning module;
The data source module is used to preprocess the acquired data source, wherein the data source includes locomotive operation logs, train routing data, train operation energy consumption information, and train timetable information, and the preprocessing delivers the locomotive operation logs and the train routing data to the locomotive operating environment learning module and delivers the train operation energy consumption information and the train timetable information to the evaluation mechanism learning module;
The locomotive operating environment learning module is used to build a locomotive operating environment model, wherein locomotive operating environment learning includes learning the basic parameter part and the disturbance parameter part of the train operating parameters, the learning results constitute the specific operating environment of the locomotive, and the locomotive operating environment learning module delivers the obtained specific operating environment of the locomotive to the control strategy learning module;
The evaluation mechanism learning module combines the information obtained from the data source module with the evaluation mechanism to obtain the reward function required during locomotive operation, and the reward function, as the feedback data of the evaluation mechanism, is delivered by the evaluation mechanism learning module to the control strategy learning module;
The control strategy learning module obtains the specific operating environment of the locomotive from the locomotive operating environment learning module and the reward function from the evaluation mechanism learning module, and trains an optimized train operation strategy with a deep reinforcement learning method, wherein it interacts and learns continuously with the locomotive operating environment model, uses the reward function fed back by the evaluation mechanism learning module to guide the subsequent train operation sequence, and, through a policy update mechanism, obtains the final actual operation strategy of the locomotive.
2. The intelligent locomotive operation system based on deep reinforcement learning according to claim 1, characterized in that the evaluation mechanism includes the learning of train operation scoring and the design of a penalty scoring mechanism for non-standard operations.
3. The intelligent locomotive operation system based on deep reinforcement learning according to claim 1, characterized in that the control strategy learning module performs deep reinforcement learning based on a DQN model, and the DQN model interacts and learns continuously with the locomotive operating environment model.
4. An intelligent locomotive operation method based on deep reinforcement learning, characterized in that the method is implemented through the following steps:
S1: Preprocess the data source;
The feature data for learning the locomotive operating environment model, namely the locomotive operation logs and the train routing data, are extracted from the data source to form the sample data for the supervised learning algorithm of the locomotive operating environment. The train operation energy consumption information and the train timetable information are extracted from the data source as the parameters for evaluation mechanism learning;
S2: Learn and build the locomotive operating environment;
Based on historical operation data, the locomotive operating environment model is trained and built from the operating environment information of the locomotive using supervised learning and a dynamic time-series graph algorithm; the locomotive operating environment model obtains the specific operating environment of the locomotive through learning, and the obtained specific operating environment is used for control strategy learning;
S3: Learn the evaluation mechanism;
The information obtained from the data source is combined with the evaluation mechanism, and target observations are made over short intervals for a given travel route and given locomotive state information to obtain the reward function of locomotive operation; the reward function serves as the evaluation value of locomotive operation and is used for control strategy learning;
S4: Learn the control strategy;
A deep reinforcement learning method is used to learn the control strategy on the specific operating environment of the locomotive, and the obtained reward function is used to update and optimize the strategy according to the running state, thereby obtaining the optimized operation control strategy of the locomotive.
5. The intelligent locomotive operation method based on deep reinforcement learning according to claim 4, characterized in that the method further includes a policy update mechanism, wherein the optimized control strategy can be updated in real time through the policy update mechanism: starting from the current control strategy, real-time adaptive learning derives a further optimized control strategy, so that the locomotive control strategy is optimized step by step.
6. The intelligent locomotive operation method based on deep reinforcement learning according to claim 4, characterized in that, in step S2, the operating environment information of the locomotive includes the train's own state information and the external environment parameter information, both formed from the locomotive operation logs and the train routing data, wherein most parameters fluctuate within a certain range and their fluctuations can be observed and predicted from historical data, while a small portion of the parameters are uncertain in real scenarios and may fluctuate unpredictably.
7. The intelligent locomotive operation method based on deep reinforcement learning according to claim 6, characterized in that the locomotive operating environment model learns the basic model parameters of train operation with a supervised learning algorithm based on a mechanism model, so as to cover general scenarios, and learns the disturbance parameters of the train operating environment based on a dynamic graph model.
8. The intelligent locomotive operation method based on deep reinforcement learning according to claim 7, characterized in that the supervised learning algorithm is a decision tree algorithm or a neural network algorithm.
9. The intelligent locomotive operation method based on deep reinforcement learning according to claim 4, characterized in that, in step S3, the evaluation mechanism includes a train operation scoring mechanism and a penalty scoring mechanism for non-standard operations, wherein the train operation scoring mechanism is formulated based on historical operation logs, and the penalty scoring mechanism is formulated based on non-standard operations.
10. The intelligent locomotive operation method based on deep reinforcement learning according to claim 4, characterized in that, in step S4, control strategy learning is completed by a DQN model, wherein, under the deep reinforcement learning algorithm, the locomotive operating environment model uses the real-time evaluation of locomotive operation actions as feedback information, the evaluation mechanism rewards or penalizes the current operation action and feeds a reward evaluation value back to the DQN model, and the DQN model, in combination with the running state, iteratively updates and optimizes the strategy.
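As an illustration of the evaluation mechanism named in claims 2 and 9, the sketch below combines an operation score with penalty points for non-standard operations into a single reward value. The claims do not give a formula, so everything here is an assumption: the field names in `IntervalObservation`, the linear form of the score, and the weights `w_energy`, `w_delay`, and `penalty` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IntervalObservation:
    """Observed outcome of one short interval (all fields are hypothetical)."""
    energy_kwh: float       # energy consumed in the interval
    delay_s: float          # deviation from the timetable, in seconds
    nonstandard_ops: int    # number of non-standard operations detected

def reward(obs, w_energy=1.0, w_delay=0.1, penalty=10.0):
    """Operation score minus penalty points; a higher value is better."""
    # Assumed score: lower energy use and smaller timetable deviation score higher.
    score = -(w_energy * obs.energy_kwh + w_delay * abs(obs.delay_s))
    # Each non-standard operation deducts a fixed number of points.
    return score - penalty * obs.nonstandard_ops
```

A value computed this way would play the role of the reward evaluation value that claim 10 feeds back to the DQN model after each operation action.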
CN201710045758.0A 2017-01-20 2017-01-20 Intelligent locomotive operation method and system based on deep reinforcement learning Active CN106842925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710045758.0A CN106842925B (en) 2017-01-20 2017-01-20 Intelligent locomotive operation method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN106842925A (en) 2017-06-13
CN106842925B (en) 2019-10-11

Family

ID=59119196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710045758.0A Active CN106842925B (en) 2017-01-20 2017-01-20 Intelligent locomotive operation method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN106842925B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102981408A (en) * 2012-12-10 2013-03-20 华东交通大学 Running process modeling and adaptive control method for motor train unit
CN103019267A (en) * 2012-12-10 2013-04-03 华东交通大学 Predicative control method for modeling and running speed of adaptive network-based fuzzy inference system (ANFIS) of high-speed train
CN103870892A (en) * 2014-03-26 2014-06-18 北京清软英泰信息技术有限公司 Method and system for achieving railway locomotive operation control from off-line mode to on-line mode
CN103879414A (en) * 2014-03-26 2014-06-25 北京清软英泰信息技术有限公司 Locomotive optimal manipulation method based on self-adaption A-Star algorithm
CN104951425A (en) * 2015-07-20 2015-09-30 东北大学 Cloud service performance adaptive action type selection method based on deep learning
CN105427016A (en) * 2015-10-28 2016-03-23 南车株洲电力机车研究所有限公司 Locomotive vehicle data processing method and system

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239628A (en) * 2017-06-15 2017-10-10 清华大学 A kind of uncertain locomotive simulation model system construction method based on dynamic time sequence figure
CN107194612A (en) * 2017-06-20 2017-09-22 清华大学 A kind of train operation dispatching method learnt based on deeply and system
CN107194612B (en) * 2017-06-20 2020-10-13 清华大学 Train operation scheduling method and system based on deep reinforcement learning
CN107315573A (en) * 2017-07-19 2017-11-03 北京上格云技术有限公司 Build control method, storage medium and the terminal device of Mechatronic Systems
CN107315572A (en) * 2017-07-19 2017-11-03 北京上格云技术有限公司 Build control method, storage medium and the terminal device of Mechatronic Systems
CN107367929A (en) * 2017-07-19 2017-11-21 北京上格云技术有限公司 Update method, storage medium and the terminal device of Q value matrixs
CN107315573B (en) * 2017-07-19 2020-06-16 北京上格云技术有限公司 Control method of building electromechanical system, storage medium and terminal equipment
CN107315572B (en) * 2017-07-19 2020-08-11 北京上格云技术有限公司 Control method of building electromechanical system, storage medium and terminal equipment
CN107563426A (en) * 2017-08-25 2018-01-09 清华大学 A kind of learning method of locomotive operation temporal aspect
WO2019037557A1 (en) * 2017-08-25 2019-02-28 清华大学 Method for learning time sequence characteristics of locomotive operation
CN107450593B (en) * 2017-08-30 2020-06-12 清华大学 Unmanned aerial vehicle autonomous navigation method and system
CN107450593A (en) * 2017-08-30 2017-12-08 清华大学 A kind of unmanned plane autonomous navigation method and system
CN111542836B (en) * 2017-10-04 2024-05-17 华为技术有限公司 Method for selecting action by using neural network as object
CN111542836A (en) * 2017-10-04 2020-08-14 华为技术有限公司 Method for selecting action for object by using neural network
CN107544516A (en) * 2017-10-11 2018-01-05 苏州大学 Automated driving system and method based on relative entropy depth against intensified learning
CN107832836B (en) * 2017-11-27 2020-04-21 清华大学 Model-free deep reinforcement learning exploration method and device
CN107832836A (en) * 2017-11-27 2018-03-23 清华大学 Model-free depth enhancing study heuristic approach and device
CN111670468A (en) * 2017-12-18 2020-09-15 日立汽车系统株式会社 Moving body behavior prediction device and moving body behavior prediction method
CN108161934A (en) * 2017-12-25 2018-06-15 清华大学 A kind of method for learning to realize robot multi peg-in-hole using deeply
CN108161934B (en) * 2017-12-25 2020-06-09 清华大学 Method for realizing robot multi-axis hole assembly by utilizing deep reinforcement learning
CN108333959A (en) * 2018-03-09 2018-07-27 清华大学 A kind of energy saving method of operating of locomotive based on convolutional neural networks model
CN110390398B (en) * 2018-04-13 2021-09-10 北京智行者科技有限公司 Online learning method
CN110390398A (en) * 2018-04-13 2019-10-29 北京智行者科技有限公司 On-line study method
EP3557489A1 (en) * 2018-04-19 2019-10-23 Siemens Mobility GmbH Energy optimisation in operation of a rail vehicle
CN108820157A (en) * 2018-04-25 2018-11-16 武汉理工大学 A kind of Ship Intelligent Collision Avoidance method based on intensified learning
CN108549237A (en) * 2018-05-16 2018-09-18 华南理工大学 Preview based on depth enhancing study controls humanoid robot gait's planing method
CN108549237B (en) * 2018-05-16 2020-04-28 华南理工大学 Preset control humanoid robot gait planning method based on deep reinforcement learning
CN110687802A (en) * 2018-07-06 2020-01-14 珠海格力电器股份有限公司 Intelligent household electrical appliance control method and intelligent household electrical appliance control device
CN108984275A (en) * 2018-08-27 2018-12-11 洛阳中科龙网创新科技有限公司 The agricultural driver training method of Intelligent unattended based on Unity3D and depth enhancing study
CN109243021A (en) * 2018-08-28 2019-01-18 余利 Deeply learning type intelligent door lock system and device based on user experience analysis
CN109204390A (en) * 2018-09-29 2019-01-15 交控科技股份有限公司 A kind of Train control method based on deep learning
CN109204390B (en) * 2018-09-29 2021-03-12 交控科技股份有限公司 Train control method based on deep learning
CN109225640A (en) * 2018-10-15 2019-01-18 厦门邑通软件科技有限公司 A kind of wisdom electric precipitation power-economizing method
US10831208B2 (en) 2018-11-01 2020-11-10 Ford Global Technologies, Llc Vehicle neural network processing
WO2020098226A1 (en) * 2018-11-16 2020-05-22 Huawei Technologies Co., Ltd. System and methods of efficient, continuous, and safe learning using first principles and constraints
CN109740839A (en) * 2018-11-23 2019-05-10 北京交通大学 Train Dynamic method of adjustment and system under a kind of emergency event
CN109740839B (en) * 2018-11-23 2021-06-18 北京交通大学 Train dynamic adjustment method and system under emergency
CN111324099A (en) * 2018-12-12 2020-06-23 上汽通用汽车有限公司 Machine learning-based calibration method and machine learning-based calibration system
CN111381511B (en) * 2018-12-27 2023-09-01 松下知识产权经营株式会社 Time difference reaction reducing system and time difference reaction reducing method
CN111381511A (en) * 2018-12-27 2020-07-07 松下知识产权经营株式会社 Jet lag reduction system and jet lag reduction method
CN109472984A (en) * 2018-12-27 2019-03-15 苏州科技大学 Signalized control method, system and storage medium based on deeply study
CN109919319A (en) * 2018-12-31 2019-06-21 中国科学院软件研究所 Deeply learning method and equipment based on multiple history best Q networks
CN109782600A (en) * 2019-01-25 2019-05-21 东华大学 A method of autonomous mobile robot navigation system is established by virtual environment
CN109835375A (en) * 2019-01-29 2019-06-04 中国铁道科学研究院集团有限公司通信信号研究所 High Speed Railway Trains automated driving system based on artificial intelligence technology
CN109977998A (en) * 2019-02-14 2019-07-05 网易(杭州)网络有限公司 Information processing method and device, storage medium and electronic device
CN109977998B (en) * 2019-02-14 2022-05-03 网易(杭州)网络有限公司 Information processing method and apparatus, storage medium, and electronic apparatus
CN109919243A (en) * 2019-03-15 2019-06-21 天津拾起卖科技有限公司 A kind of scrap iron and steel type automatic identifying method and device based on CNN
CN110194041A (en) * 2019-05-19 2019-09-03 瑞立集团瑞安汽车零部件有限公司 The adaptive bodywork height adjusting method of Multi-source Information Fusion
CN110147891B (en) * 2019-05-23 2021-06-01 北京地平线机器人技术研发有限公司 Method and device applied to reinforcement learning training process and electronic equipment
CN110147891A (en) * 2019-05-23 2019-08-20 北京地平线机器人技术研发有限公司 Method, apparatus and electronic equipment applied to intensified learning training process
CN114450131A (en) * 2019-09-30 2022-05-06 三菱电机株式会社 Non-derivative model learning system and design for robot system
US11472452B2 (en) 2019-10-11 2022-10-18 Progress Rail Services Corporation Machine learning based train handling evaluation
CN111581178A (en) * 2020-05-12 2020-08-25 国网安徽省电力有限公司信息通信分公司 Ceph system performance tuning strategy and system based on deep reinforcement learning
CN111781940A (en) * 2020-05-19 2020-10-16 中车工业研究院有限公司 Train attitude control method based on DQN reinforcement learning
CN111781940B (en) * 2020-05-19 2022-12-20 中车工业研究院有限公司 Train attitude control method based on DQN reinforcement learning
CN111965981A (en) * 2020-09-07 2020-11-20 厦门大学 Aeroengine reinforcement learning control method and system
CN111965981B (en) * 2020-09-07 2022-02-22 厦门大学 Aeroengine reinforcement learning control method and system
CN112193280A (en) * 2020-12-04 2021-01-08 华东交通大学 Heavy-load train reinforcement learning control method and system
US11205124B1 (en) 2020-12-04 2021-12-21 East China Jiaotong University Method and system for controlling heavy-haul train based on reinforcement learning
CN112193280B (en) * 2020-12-04 2021-03-16 华东交通大学 Heavy-load train reinforcement learning control method and system
CN113537603A (en) * 2021-07-21 2021-10-22 北京交通大学 Intelligent scheduling control method and system for high-speed train
CN113525462A (en) * 2021-08-06 2021-10-22 中国科学院自动化研究所 Timetable adjusting method and device under delay condition and electronic equipment
CN115598985A (en) * 2022-11-01 2023-01-13 南栖仙策(南京)科技有限公司(Cn) Feedback controller training method and device, electronic equipment and medium
CN115598985B (en) * 2022-11-01 2024-02-02 南栖仙策(南京)高新技术有限公司 Training method and device of feedback controller, electronic equipment and medium

Also Published As

Publication number Publication date
CN106842925B (en) 2019-10-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant