CN106842925A - Intelligent locomotive operation method and system based on deep reinforcement learning

Info

Publication number: CN106842925A
Application number: CN201710045758.0A
Authority: CN
Inventors: 赵曦滨; 夏雅楠; 黄晋; 卢莎; 任育琦; 顾明; 孙家广
Original assignee: Tsinghua University; CRRC Dalian Institute Co Ltd; CRRC Information Technology Co Ltd
Current assignee: Tsinghua University; CRRC Dalian Institute Co Ltd; CRRC Information Technology Co Ltd
Priority date: 2017-01-20
Filing date: 2017-01-20
Publication date: 2017-06-13
Anticipated expiration: 2037-01-20
Also published as: CN106842925B

Abstract

The present invention relates to an intelligent locomotive operation method and system based on deep reinforcement learning. The system includes a data source module, a locomotive operating-environment learning module, an evaluation-mechanism learning module, and a control-strategy learning module. The data source module supplies the input data required by the locomotive operating-environment learning module and the evaluation-mechanism learning module, which output the resulting specific operating environment and reward function value, respectively, to the control-strategy learning module. Based on a deep reinforcement learning algorithm, the locomotive operating-environment model uses the real-time evaluation of each locomotive control action as feedback information: by rewarding or punishing the current control action, a reward function is fed back to the control strategy as a reward evaluation value, and the control strategy, in combination with the operating state, iteratively updates and optimizes itself. The present invention achieves intelligent, optimized locomotive operation more effectively and considerably reduces manual involvement.

Description

An intelligent locomotive operation method and system based on deep reinforcement learning

Technical field

The present invention relates to a locomotive control method and system, and more particularly to an intelligent locomotive operation method and system based on deep reinforcement learning, and belongs to the field of locomotive control.

Background art

Automatic driving and optimized operation of railway locomotives play an important role in freeing up manpower, reducing energy consumption, and improving locomotive punctuality and safety. Because the train operating environment is complex and influenced by many factors, scholars have studied locomotive control optimization algorithms extensively. These fall roughly into three classes: analytical solution methods, numerical optimization methods, and heuristic optimization algorithms. Analytical solution methods are generally applied in two settings: locomotives whose input traction and braking forces are discrete, and locomotives whose input traction and braking forces are continuous. However, the constraints in analytical solution methods are overly simple and cannot fit actual locomotive operation well; numerical optimization has poor real-time performance and is difficult to use for real-time optimal control of locomotives; and heuristic optimization algorithms depend too heavily on manual input. Current real-time locomotive control algorithms are generally designed under specific assumptions, which makes them hard to apply to complex locomotive operating conditions and therefore makes it difficult to guarantee operational safety.

In recent years, locomotive optimization control based on machine learning and artificial intelligence has also become a research focus. Luo Hengyu and Xu Hongze proposed an integrated intelligent control system for the automatic operation control system of high-speed locomotives. The system contains multiple fuzzy neural network controllers, and an expert decision system automatically selects the optimal controller based on the locomotive's operating state to achieve effective control. Heqing Sun et al. proposed an iterative learning algorithm for tracking the locomotive's operating trajectory; the algorithm is based on a locomotive dynamics model combined with an error feedback mechanism, and they proved its convergence by theoretical analysis. Lixing Yang et al., addressing real-time locomotive control under uncertain disturbances, proposed two RTO algorithms and one online learning algorithm based on expert learning; the algorithms account for uncertain disturbances and satisfy multi-objective requirements. Jia TengYin et al. added a heuristic locomotive stopping algorithm (HSA), based on data mining and expert learning, to an existing ATO algorithm to form an optimized STO algorithm. These studies draw to some extent on manual driving experience and use expert systems supplemented by machine learning to optimize locomotive operation, but they still involve too much manual effort and cannot guarantee the optimization result.

The development of deep reinforcement learning (Deep Reinforcement Learning) has also caused a great stir in the machine learning field. Research groups represented by the DeepMind team first proposed a deep reinforcement learning method based on DQN (Deep Q-Network) and tested it on a subset of Atari 2600 games, where its results could exceed those of human players. This breakthrough was subsequently published in Nature and attracted wide attention in the machine learning research community. The theoretical development can be traced back to related work by Lange in 2010, who proposed a deep auto-encoder for vision-based control. In 2011, Cuccu et al. and Abtahi et al. studied related problems; Abtahi proposed replacing the function approximator in traditional reinforcement learning with a DBN, which is very close to the idea of deep reinforcement learning. In 2012, Lange went on to applications and proposed Deep Fitted Q learning for vehicle control. In 2013, the DeepMind team published an article at NIPS that combined convolutional neural networks with reinforcement learning, taking raw image data as input and the value function of each action as output. Using Atari 2600 games as the test, they found that the method exceeded human level in 6 of the 7 games tested. The DeepMind team later published an improved version of the DQN article in Nature, which drew broad attention. Experiments show that the method is well suited to optimization control processes such as games and locomotive control, and it offers new ideas and opportunities for the optimized operation of railway locomotives.

Summary of the invention

The present invention takes advantage of the major breakthrough of deep reinforcement learning in the machine learning field to optimize railway locomotive operation entirely by machine learning and artificial intelligence means. To this end, the invention focuses on a deep reinforcement learning algorithm for optimized locomotive operation. The locomotive operating environment and the evaluation mechanism for real-time locomotive operation that the deep reinforcement learning process requires are themselves obtained by machine learning, and the method takes into account uncertainty in the environment and non-standard operations that affect operational safety.

An intelligent locomotive operation system based on deep reinforcement learning, characterized in that the system includes a data source module, a locomotive operating-environment learning module, an evaluation-mechanism learning module, and a control-strategy learning module;

The data source module is used to preprocess the acquired data source. The data source includes locomotive operation logs, train routing data, train operation energy-consumption information, and train timetable information. The preprocessing delivers the locomotive operation logs and the train routing data to the locomotive operating-environment learning module, and delivers the train operation energy-consumption information and the train timetable information to the evaluation-mechanism learning module;

The locomotive operating-environment learning module is used to build the locomotive operating-environment model. Operating-environment learning comprises learning the basic-parameter part and the disturbance-parameter part of the train operating parameters; the learning results constitute the specific operating environment of the locomotive, which the module delivers to the control-strategy learning module;

The evaluation-mechanism learning module combines the information obtained from the data source module with the evaluation mechanism to obtain the reward function required during locomotive operation. The reward function, as the feedback data of the evaluation mechanism, is delivered by the evaluation-mechanism learning module to the control-strategy learning module;

The control-strategy learning module obtains the specific operating environment of the locomotive and the reward function from the locomotive operating-environment learning module and the evaluation-mechanism learning module, respectively, and trains the optimized train operation strategy using the deep reinforcement learning method. It interacts and learns continuously with the locomotive operating-environment model, uses the reward function fed back by the evaluation-mechanism learning module to guide the train's subsequent operation sequence, and, through the strategy update mechanism, obtains the final actual operation strategy of the locomotive.

Further, the evaluation mechanism includes learning of the train operation score and design of the non-standard-operation penalty scoring mechanism.

Further, the control-strategy learning module performs deep reinforcement learning based on a DQN model, and the DQN model interacts and learns continuously with the locomotive operating-environment model.

The present invention also includes an intelligent locomotive operation method based on deep reinforcement learning, characterized in that the method is implemented by the following steps:

S1: Preprocess the data source;

The feature data for learning the locomotive operating-environment model, namely the locomotive operation logs and the train routing data, are extracted from the data source and form the sample data for the supervised learning algorithm of the locomotive operating environment. The train operation energy-consumption information and the train timetable information are extracted from the data source as the parameters for evaluation-mechanism learning;

S2: Learn and build the locomotive operating environment;

The locomotive operating-environment model is trained and built from the locomotive's operating-environment information using supervised learning based on historical operating data together with a dynamic time-series graph algorithm. Through learning, the model yields the specific operating environment of the locomotive, which is then used for control-strategy learning;

S3: Learn the evaluation mechanism;

The information obtained from the data source is combined with the evaluation mechanism, and target observation over short intervals is performed for the specific running route and locomotive state information to obtain the reward function of locomotive operation. The reward function serves as the evaluation value of locomotive control and is used for control-strategy learning;

S4: Learn the control strategy;

The control strategy is learned for the specific operating environment of the locomotive using the deep reinforcement learning method, and the strategy is updated and optimized with respect to the operating state using the obtained reward function, thereby yielding the optimized operation control strategy of the locomotive.

Further, the intelligent locomotive operation method also includes a strategy update mechanism. The optimized control strategy can be updated in real time through this mechanism: starting from the current control strategy, real-time adaptive learning yields a better control strategy, so that the locomotive control strategy is optimized step by step.

Further, in step S2, the locomotive's operating-environment information includes the train's own state information, formed by the locomotive operation logs and the train routing data, and external environment parameter information. Most of these parameters fluctuate within a certain range, and their fluctuation can be observed and predicted from historical data; a small portion of the parameters are uncertain in real scenarios and may fluctuate unpredictably.

Further, the locomotive operating-environment model uses a supervised learning algorithm, based on a mechanism model, to learn the basic model parameters of train operation and thereby cover general scenarios, and uses a dynamic graph model to learn the disturbance parameters of the train operating environment.

Further, the supervised learning algorithm is a decision tree algorithm or a neural network algorithm.

Further, in step S3, the evaluation mechanism includes a train operation score and a non-standard-operation penalty score. The train operation score is formulated from historical operation logs, and the non-standard-operation penalty scoring mechanism is formulated from non-standard operations.

Further, in step S4, control-strategy learning is performed by a DQN model. Based on the deep reinforcement learning algorithm, the locomotive operating-environment model uses the real-time evaluation of each locomotive control action as feedback information; the evaluation mechanism rewards or punishes the current control action and feeds a reward evaluation value back to the DQN model; and the DQN model, in combination with the operating state, iteratively updates and optimizes the strategy.

The beneficial effects of the invention are as follows:

(1) Optimized operation of railway locomotives is achieved through autonomous machine learning. Based on a deep reinforcement learning algorithm, the present invention obtains the locomotive operating environment and the reward function through autonomous machine learning, and manual involvement is avoided as far as possible throughout the design and implementation of the algorithm.

(2) The locomotive operating environment and the reward function for locomotive control are trained and built using machine learning techniques, taking into account both the uncertainty of the environment model and the safety of locomotive control. For the operating environment, the invention trains and builds the model using supervised learning based on historical operating data together with a dynamic time-series graph algorithm; the dynamic time-series graph algorithm is applied, in a novel way, to learning the trends of environment parameters in order to build the locomotive operating-environment model. For the reward function, the invention considers the safety of locomotive control, obtains reward function values from both normal operation and non-standard operation, and, based on historical train operation information, trains the evaluation mechanism for locomotive control using supervised learning.

(3) A deep reinforcement learning algorithm oriented to optimized locomotive operation, with a real-time strategy update mechanism. In its specific implementation, the invention designs an optimization algorithm scheme suited to this problem based on a deep reinforcement learning algorithm (the DQN model), and in implementation the scheme can derive a real-time strategy update mechanism by training with the deep learning algorithm.

Therefore, the present invention achieves intelligent, optimized locomotive operation more effectively and considerably reduces manual involvement.

Brief description of the drawings

Fig. 1 is a structural diagram of the intelligent locomotive operation system based on deep reinforcement learning according to the present invention;

Fig. 2 is a flow chart of the technical route of the intelligent locomotive operation method based on deep reinforcement learning according to the present invention;

Fig. 3 is a flow chart of the basic deep reinforcement learning model in the present invention;

Fig. 4 is an architecture diagram of the DQN model in the present invention.

Detailed description of the embodiments

The technical solution of the present invention is described in detail below with reference to the accompanying drawings and embodiments.

This embodiment provides an intelligent locomotive operation system based on deep reinforcement learning. As shown in Fig. 1, the system contains four modules: a data source module, a locomotive operating-environment learning module, an evaluation-mechanism learning module, and a control-strategy learning module.

The data source module preprocesses the acquired data source, which includes locomotive operation logs, train routing data, train operation energy-consumption information, and train timetable information. Preprocessing extracts the locomotive operation logs and the train routing data from the data source as the feature data of the locomotive operating environment and delivers them to the locomotive operating-environment learning module, where they form the sample data for operating-environment learning. It delivers the train operation energy-consumption information and the train timetable information to the evaluation-mechanism learning module, which uses them to evaluate locomotive control in real time.
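The routing performed by the data source module can be sketched as follows. This is a minimal illustration only: the patent names the four kinds of data but defines no schema, so the record layout, the `source` labels, and the names `RoutedData` and `route_records` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical source labels; the patent lists these four data kinds
# but does not specify field names or formats.
ENVIRONMENT_SOURCES = {"operation_log", "routing_data"}
EVALUATION_SOURCES = {"energy_consumption", "timetable"}


@dataclass
class RoutedData:
    # Samples destined for the operating-environment learning module.
    environment_samples: List[Dict[str, Any]] = field(default_factory=list)
    # Samples destined for the evaluation-mechanism learning module.
    evaluation_samples: List[Dict[str, Any]] = field(default_factory=list)


def route_records(records: List[Dict[str, Any]]) -> RoutedData:
    """Split raw records between the two learning modules by source type."""
    routed = RoutedData()
    for record in records:
        source = record.get("source")
        if source in ENVIRONMENT_SOURCES:
            routed.environment_samples.append(record)
        elif source in EVALUATION_SOURCES:
            routed.evaluation_samples.append(record)
        else:
            raise ValueError(f"unknown data source: {source!r}")
    return routed
```

Rejecting unknown sources, rather than silently dropping them, is a design choice of this sketch and not something the patent prescribes.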

The locomotive operating-environment learning module builds the locomotive operating-environment model. Operating-environment learning covers two groups of parameters: the basic-parameter part and the disturbance-parameter part of the train operating parameters. The learning results constitute the specific operating environment of the locomotive. Typically, a classical supervised learning algorithm and a dynamic time-series graph algorithm are used to learn these two groups of parameters, respectively. The module delivers the resulting specific operating environment to the control-strategy learning module.

The evaluation-mechanism learning module combines the information obtained from the data source module with the evaluation mechanism to obtain the reward function required during locomotive operation. The evaluation mechanism includes learning of the train operation score and design of the non-standard-operation penalty scoring mechanism. The reward function, as the feedback data of the evaluation-mechanism learning module, is delivered by that module to the control-strategy learning module.

The control-strategy learning module obtains the specific operating environment and the reward function from the locomotive operating-environment learning module and the evaluation-mechanism learning module, and performs deep reinforcement learning based on a DQN model; that is, it trains the optimized train operation strategy using the deep reinforcement learning method. Specifically, the DQN model interacts and learns continuously with the locomotive operating-environment model (see Fig. 3), uses the reward function fed back by the evaluation-mechanism learning module to guide the train's subsequent operation sequence, and, through the strategy update mechanism, obtains the final actual operation strategy of the locomotive.

The above system achieves intelligent locomotive operation based on deep reinforcement learning. As shown in Fig. 2, the method used is as follows:

Step 1: Preprocess the data source

The feature data for learning the locomotive operating-environment model, namely the locomotive operation logs and the train routing data, are extracted from the data source and form the sample data for the supervised learning algorithm of the locomotive operating environment. The train operation energy-consumption information and the train timetable information are extracted from the data source as the parameters for evaluation-mechanism learning.

Step 2: Learn and build the locomotive operating environment

The locomotive's operating-environment information generally includes not only the train's own state information, formed by the locomotive operation logs and the train routing data, but also external environment parameter information. Most of these parameters fluctuate within a certain range, and their fluctuation can be observed and predicted from historical data; a small portion of the parameters are uncertain in real scenarios and may fluctuate unpredictably. The present invention trains and builds an uncertainty-aware locomotive operating-environment model from the operating-environment information using supervised learning based on historical operating data together with a dynamic time-series graph algorithm. Specifically, a supervised learning algorithm (a classical algorithm such as a decision tree or neural network), based on a mechanism model, learns the basic model parameters of train operation to cover general scenarios, and a dynamic graph model learns the disturbance parameters of the train operating environment.

Through learning, the locomotive operating-environment model yields the specific operating environment of the locomotive, which is then used for control-strategy learning.
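To make the split between basic parameters and disturbance parameters concrete, the sketch below shows one possible shape for such an environment model. Everything specific in it is an assumption: the patent does not give the mechanism model, so a quadratic running-resistance term stands in for it, and simple Gaussian noise stands in for the dynamic time-series graph model of the disturbance. The names `BaseParams`, `TrainState`, and `EnvironmentModel` are hypothetical, and the parameter values are placeholders rather than learned values.

```python
import random
from dataclasses import dataclass


@dataclass
class BaseParams:
    # Hypothetical basic-model parameters: coefficients of a quadratic
    # running-resistance term, expressed as a deceleration in m/s^2.
    a: float = 0.01
    b: float = 0.001
    c: float = 0.0001


@dataclass
class TrainState:
    position_m: float
    speed_mps: float


class EnvironmentModel:
    """Deterministic base dynamics plus a random disturbance term."""

    def __init__(self, base: BaseParams, disturbance_std: float = 0.0, seed: int = 0):
        self.base = base
        self.disturbance_std = disturbance_std
        self._rng = random.Random(seed)

    def step(self, state: TrainState, traction_accel: float, dt: float = 1.0) -> TrainState:
        v = state.speed_mps
        # Basic-parameter part: resistance predicted by the (assumed) mechanism model.
        resistance = self.base.a + self.base.b * v + self.base.c * v * v
        # Disturbance-parameter part: Gaussian noise as a stand-in only.
        noise = self._rng.gauss(0.0, self.disturbance_std) if self.disturbance_std > 0 else 0.0
        accel = traction_accel - resistance + noise
        new_speed = max(0.0, v + accel * dt)  # the train does not roll backwards here
        new_position = state.position_m + 0.5 * (v + new_speed) * dt
        return TrainState(new_position, new_speed)
```

In a fuller implementation the coefficients in `BaseParams` would come from the supervised learner and the noise term from the learned disturbance model; here both are fixed so that the control-strategy learner has something to interact with.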

Step 3: Learn the evaluation mechanism

Evaluation-mechanism learning combines the information obtained from the data source with the evaluation mechanism to obtain the reward function of locomotive operation. The reward function value serves as the evaluation value of locomotive control and is used for control-strategy learning; it is the basis for strategy selection in the reinforcement learning algorithm on which the invention is built. In common application scenarios (such as game playing or robot control) the reward function value is deterministic and objective; in game playing, for example, the evaluation value is obtained directly from the game rules. In the present invention, however, the reward function serves as an evaluation of locomotive operation and cannot be determined directly from rules. Instead, the information obtained from the data source must be combined with the evaluation mechanism, and target observation over short intervals must be performed for the specific running route and locomotive state information to determine the value. The invention formulates the evaluation mechanism around the optimization objectives of locomotive driving. The evaluation mechanism includes a train operation score formulated from historical operation logs and a non-standard-operation penalty scoring mechanism formulated by analyzing non-standard operations. In particular, given the system's high safety requirements, the penalty scoring mechanism assigns the maximum penalty value to non-standard operations that may cause serious consequences (such as the risk of stopping on a gradient, or overspeeding), so that such non-standard control actions are avoided and the safety of the generated strategy is effectively ensured.
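The two-part evaluation described above can be sketched as a reward function. This is an illustrative stand-in, not the patent's learned mechanism: the patent obtains the operation score by supervised learning from historical logs, whereas the sketch uses a fixed weighted sum of energy use and timetable deviation. The constant `MAX_PENALTY`, the weights, and all function and parameter names are hypothetical.

```python
# Hypothetical maximum penalty; the patent only says that dangerous
# non-standard operations receive "the maximum penalty value".
MAX_PENALTY = -100.0


def operation_score(energy_kwh: float, delay_s: float,
                    energy_weight: float = 1.0, delay_weight: float = 0.1) -> float:
    """Stand-in for the learned operation score: lower energy use and a
    smaller deviation from the timetable give a higher (less negative) score."""
    return -(energy_weight * energy_kwh + delay_weight * abs(delay_s))


def is_nonstandard(speed_mps: float, speed_limit_mps: float,
                   stopped_on_gradient: bool) -> bool:
    """Flag the two example violations named in the text: overspeed and
    stopping on a gradient."""
    return speed_mps > speed_limit_mps or stopped_on_gradient


def reward(energy_kwh: float, delay_s: float, speed_mps: float,
           speed_limit_mps: float, stopped_on_gradient: bool) -> float:
    """Reward for one short interval of operation."""
    if is_nonstandard(speed_mps, speed_limit_mps, stopped_on_gradient):
        return MAX_PENALTY
    # Clamp so that no ordinary score is ever worse than the safety penalty.
    return max(MAX_PENALTY, operation_score(energy_kwh, delay_s))
```

The clamp keeps the safety penalty as the lowest possible reward, which matches the stated intent that unsafe actions are always the least attractive choice.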

Step 4: Learn the control strategy

The present invention learns the control strategy for the specific operating environment of the locomotive using the deep reinforcement learning method, and updates and optimizes the strategy with respect to the operating state using the obtained reward function, thereby yielding the optimized operation control strategy of the locomotive. Deep reinforcement learning has significant advantages in generating optimized operation strategies for complex systems. A reinforcement learning algorithm relies on little external information: through continuous iterative training in the environment and through its own learning, it obtains an optimized control strategy. Deep learning algorithms have significant advantages in processing complex, high-dimensional data. Deep reinforcement learning, which combines the two, can therefore address the generation of optimized operation strategies for complex systems. As shown in Fig. 3, in any state, based on the deep reinforcement learning algorithm, the locomotive operating-environment model uses the real-time evaluation of each locomotive control action as feedback information; the evaluation mechanism rewards or punishes the current control action and feeds a reward function back to the DQN model as a reward evaluation value; and the DQN model, in combination with the operating state, iteratively updates and optimizes the strategy.

The present invention designs its deep reinforcement learning method on the basis of the DQN model. Specifically, the DQN model interacts and learns continuously with the locomotive operating-environment model; the invention's improvement lies in using the uncertain locomotive operating environment and the evaluation mechanism. Each time the locomotive performs an operation (action) in any state, the evaluation mechanism feeds back a reward evaluation value that guides the train's subsequent operation sequence; that is, it continually drives the DQN model to update and optimize the strategy in order to solve the optimized locomotive operation problem. After many iterations, the model eventually converges and yields an optimized train control strategy.
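For reference, the feedback loop described above corresponds to the standard DQN formulation in the published literature. The patent does not state these formulas, so the symbols below (state s, action a, reward r, next state s', discount factor γ, online network parameters θ, target network parameters θ⁻) are assumptions introduced for illustration.

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}), \qquad
L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[ \bigl( y - Q(s, a; \theta) \bigr)^{2} \right]
```

Under this reading, r would be the reward evaluation value returned by the evaluation mechanism, s' the next state produced by the locomotive operating-environment model, and the loss L(θ) the quantity that gradient descent reduces at each update.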

The detailed architecture of the DQN model is shown in Fig. 4, where the interactive environment is the uncertain train operating environment. In the specific implementation, the reinforcement learning algorithm uses an optimized Q-learning algorithm. The optimization incorporates the idea of Experience Replay into Q-learning: a replay memory pool is created during algorithm iteration, each learned experience is stored in it, and at the next training step one experience is selected at random for training. Compared with ordinary reinforcement learning, this idea has three main advantages: (1) it effectively breaks the correlation between state data and reduces the uncertainty of data updates; (2) it effectively avoids the algorithm converging to a poor local optimum; (3) it addresses the problem that the target of the reinforcement learning algorithm is not fixed. In the model, a deep learning algorithm (such as a deep neural network) is combined with the optimized Q-learning algorithm to obtain approximate element values of the Q matrix (the Q value is the cumulative evaluation function of train operation described in Fig. 2); the Q Network in Fig. 4 is the model of the Q matrix built with a deep neural network. In the specific algorithm implementation, the Q network model updates the Target Q network parameters every N iterations and then updates the DQN error of the DQN model, and a gradient descent algorithm finally guides the continued optimization and training of the Q network model. Applying deep learning effectively addresses the problem of a large system state space. Finally, the DQN model selects locomotive operations (actions) using the conventional ε-greedy strategy: with a small probability it selects an operation at random, and with a larger probability it selects the currently optimal operation, so that the optimized locomotive operation strategy is generated iteratively.
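The three ingredients named above (experience replay, a target network synchronized every N iterations, and ε-greedy action selection) can be sketched together as follows. To stay short and dependency-free, the sketch replaces the deep neural network with a plain Q table, so it illustrates the training loop rather than the patent's actual model. The names `ReplayBuffer`, `epsilon_greedy`, `train`, and `toy_step`, and the toy three-state environment, are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Replay memory pool: stores experiences and samples one at random."""

    def __init__(self, capacity: int, seed: int = 0):
        self._items = deque(maxlen=capacity)  # oldest experiences are dropped first
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._items.append((state, action, reward, next_state, done))

    def sample(self):
        return self._rng.choice(self._items)


def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def train(env_step, n_states, n_actions, episodes=200, gamma=0.9, lr=0.5,
          epsilon=0.1, sync_every=10, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # stand-in for the Q network
    target_q = [row[:] for row in q]                  # stand-in for the Target Q network
    buffer = ReplayBuffer(capacity=1000, seed=seed)
    updates = 0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = env_step(state, action)
            buffer.add(state, action, reward, next_state, done)
            # Learn from one randomly replayed experience.
            s, a, r, s2, d = buffer.sample()
            target = r if d else r + gamma * max(target_q[s2])
            q[s][a] += lr * (target - q[s][a])
            updates += 1
            if updates % sync_every == 0:  # synchronize the target every N updates
                target_q = [row[:] for row in q]
            state = next_state
            if done:
                break
    return q


def toy_step(state, action):
    """Toy chain: action 1 advances one state, action 0 stays; state 2 is the goal."""
    next_state = min(state + action, 2)
    done = next_state == 2
    return next_state, (1.0 if done else -0.1), done
```

Calling `train(toy_step, n_states=3, n_actions=2)` returns a table in which advancing is valued above staying in both non-terminal states. A real implementation would replace the table update with a gradient step on a neural network and would usually sample minibatches rather than single experiences.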

In addition, the intelligent locomotive operation method also includes a strategy update mechanism. The optimized control strategy can be updated in real time through this mechanism; that is, starting from the current control strategy, real-time adaptive learning yields a better control strategy, so that the locomotive control strategy is optimized step by step.

Although the principles of the present invention have been described in detail above in connection with its preferred embodiments, those skilled in the art should understand that the above embodiments merely explain exemplary implementations of the invention and do not limit its scope. The details in the embodiments do not limit the scope of the invention; any obvious change based on the technical solution of the invention, such as an equivalent transformation or simple replacement, that does not depart from the spirit and scope of the invention falls within the scope of protection of the invention.

Claims

1. An intelligent locomotive operation system based on deep reinforcement learning, characterized in that the system includes a data source module, a locomotive operating-environment learning module, an evaluation-mechanism learning module, and a control-strategy learning module;

The data source module is used to preprocess the acquired data source. The data source includes locomotive operation logs, train routing data, train operation energy-consumption information, and train timetable information. The preprocessing delivers the locomotive operation logs and the train routing data to the locomotive operating-environment learning module, and delivers the train operation energy-consumption information and the train timetable information to the evaluation-mechanism learning module;

The locomotive operating-environment learning module is used to build the locomotive operating-environment model. Operating-environment learning comprises learning the basic-parameter part and the disturbance-parameter part of the train operating parameters; the learning results constitute the specific operating environment of the locomotive, which the module delivers to the control-strategy learning module;

The evaluation-mechanism learning module combines the information obtained from the data source module with the evaluation mechanism to obtain the reward function required during locomotive operation. The reward function, as the feedback data of the evaluation mechanism, is delivered by the evaluation-mechanism learning module to the control-strategy learning module;

The control-strategy learning module obtains the specific operating environment of the locomotive and the reward function from the locomotive operating-environment learning module and the evaluation-mechanism learning module, respectively, and trains the optimized train operation strategy using the deep reinforcement learning method. It interacts and learns continuously with the locomotive operating-environment model, uses the reward function fed back by the evaluation-mechanism learning module to guide the train's subsequent operation sequence, and, through the strategy update mechanism, obtains the final actual operation strategy of the locomotive.

2. The intelligent locomotive operation system based on deep reinforcement learning according to claim 1, characterized in that the evaluation mechanism comprises learning a train operation scoring mechanism and designing a penalty scoring mechanism for non-standard operation.

3. The intelligent locomotive operation system based on deep reinforcement learning according to claim 1, characterized in that the control strategy learning module performs deep reinforcement learning based on a DQN model, and the DQN model interacts and learns continuously with the locomotive operation environment model.

4. An intelligent locomotive operation method based on deep reinforcement learning, characterized in that the method is implemented by the following steps:

S1: Preprocessing the data source;

The feature data for learning the locomotive operation environment model, namely the locomotive operation logs and the train routing data, are extracted from the data source and constitute the sample data for the supervised learning algorithm that learns the locomotive operation environment. The train operation energy consumption information and the train timetable information are extracted from the data source as the parameters for evaluation mechanism learning;

S2: Learning and building the locomotive operation environment;

Based on historical operation data, the locomotive operation environment model is trained and built from the operating environment information of the locomotive using supervised learning and a dynamic time-series graph algorithm; the locomotive operation environment model obtains the specific operating environment of the locomotive through learning, and the obtained specific operating environment is used for control strategy learning;

S3: Learning the evaluation mechanism;

The information obtained from the data source is combined with the evaluation mechanism, and target observations are made over short intervals for the given travel route and locomotive state information to obtain the reward function of locomotive operation; the reward function serves as the evaluation value of locomotive operation and is used for control strategy learning;

S4: Learning the control strategy;

Control strategy learning is performed on the specific operating environment of the locomotive using a deep reinforcement learning method, and the strategy is updated and optimized according to the obtained reward function and the operating state, thereby obtaining the optimized operation control strategy of the locomotive.

5. The intelligent locomotive operation method based on deep reinforcement learning according to claim 4, characterized in that the method further comprises a policy update mechanism; the optimized control strategy can be updated in real time by applying the policy update mechanism, which, guided by the current control strategy, learns adaptively in real time to derive a better control strategy, thereby achieving step-by-step optimization of the locomotive control strategy.

6. The intelligent locomotive operation method based on deep reinforcement learning according to claim 4, characterized in that, in step S2, the operating environment information of the locomotive comprises the train's own state information and external environment parameter information, both constituted by the locomotive operation logs and the train routing data; most of these parameters fluctuate within a certain range and are fluctuation information that can be observed and predicted from historical data, while a small portion of the parameters are uncertain in actual scenarios and may fluctuate unpredictably.

7. The intelligent locomotive operation method based on deep reinforcement learning according to claim 6, characterized in that the locomotive operation environment model learns the basic model parameters of train operation through a supervised learning algorithm based on a mechanism model, so as to cover general scenarios, and learns the disturbance parameters of the train operation environment based on a dynamic graph model.

8. The intelligent locomotive operation method based on deep reinforcement learning according to claim 7, characterized in that the supervised learning algorithm is a decision tree algorithm or a neural network algorithm.

9. The intelligent locomotive operation method based on deep reinforcement learning according to claim 4, characterized in that, in step S3, the evaluation mechanism comprises a train operation scoring mechanism and a penalty scoring mechanism for non-standard operation; the train operation scoring mechanism is formulated based on historical operation logs, and the penalty scoring mechanism for non-standard operation is formulated based on non-standard operations.
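A minimal sketch of how an operation score and a penalty for non-standard operation might be combined into a single reward value is given below. Everything specific in it is an assumption for illustration: the function name `reward`, the choice of energy use and timetable deviation as score terms, overspeed as the only non-standard operation, and the hand-picked weights. The claim itself states only that the scoring mechanism is formulated from historical operation logs and the penalty mechanism from non-standard operations.

```python
def reward(energy_kwh: float, delay_s: float, overspeed_kmh: float,
           w_energy: float = 1.0, w_delay: float = 0.1,
           penalty_per_kmh: float = 10.0) -> float:
    """Return a scalar reward for one control step; higher is better."""
    # Operation score (assumed form): lower energy use and smaller
    # timetable deviation give a higher score.
    score = -(w_energy * energy_kwh + w_delay * abs(delay_s))
    # Penalty for non-standard operation (assumed: overspeed only).
    penalty = penalty_per_kmh * max(0.0, overspeed_kmh)
    return score - penalty
```

In a learned evaluation mechanism, the weights would be fitted to the historical logs rather than fixed by hand as they are here.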

10. The intelligent locomotive operation method based on deep reinforcement learning according to claim 4, characterized in that, in step S4, control strategy learning is completed by a DQN model; based on the deep reinforcement learning algorithm, the locomotive operation environment model uses a real-time evaluation of the locomotive control action as feedback information, the evaluation mechanism rewards or punishes the current control action and feeds a reward evaluation value back to the DQN model, and the DQN model, in combination with the operating state, iteratively updates and optimizes the strategy.
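The iterative update driven by the reward feedback can be illustrated with the tabular Q-learning rule, which a DQN approximates by replacing the table with a neural network. The sketch below is deliberately the tabular form, not a DQN, and is not the patent's implementation; the state labels, action names, learning rate `alpha`, and discount factor `gamma` are assumptions.

```python
import random
from collections import defaultdict


def q_learning_step(q, state, action, reward, next_state, actions,
                    alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q-value."""
    # Best value reachable from the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    # Bootstrapped target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage: three control actions and string-labelled states.
ACTIONS = ["traction", "coast", "brake"]
q_table = defaultdict(float)
q_learning_step(q_table, "s0", "traction", 1.0, "s1", ACTIONS)
```

Each call corresponds to one interaction with the environment model: the control action is evaluated, the reward evaluation value is fed back, and the value estimate for that state and action is adjusted accordingly.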