WO2019155511A1 - Inverse model predictive control system, method, and program - Google Patents

Inverse model predictive control system, method, and program

Info

Publication number
WO2019155511A1
WO2019155511A1 PCT/JP2018/003952 JP2018003952W WO2019155511A1 WO 2019155511 A1 WO2019155511 A1 WO 2019155511A1 JP 2018003952 W JP2018003952 W JP 2018003952W WO 2019155511 A1 WO2019155511 A1 WO 2019155511A1
Authority
WO
WIPO (PCT)
Prior art keywords
model predictive
constraints
objective function
inverse model
predictive control
Prior art date
Application number
PCT/JP2018/003952
Other languages
English (en)
Inventor
Werner Wee
Riki Eto
Original Assignee
NEC Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corporation
Priority to PCT/JP2018/003952
Publication of WO2019155511A1


Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B 13/0265 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, the criterion being a learning criterion
    • G05B 13/04 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators
    • G05B 13/048 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators using a predictor

Definitions

  • The present invention relates to an inverse model predictive control system, an inverse model predictive control method, and an inverse model predictive control program.
  • Model predictive control (MPC) is a well-known optimization-based control technique used in many advanced large-scale control systems.
  • MPC requires an objective function to calculate optimal control actions at each step; a standard formulation is sketched below.
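  • For reference, a standard receding-horizon MPC problem can be written as follows (a generic textbook statement in our notation, not an equation reproduced from this publication):

```latex
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & \sum_{k=0}^{N-1} \ell(x_k, u_k) + \ell_N(x_N) \\
\text{subject to} \quad & x_{k+1} = f(x_k, u_k), \quad k = 0, \dots, N-1, \\
& g(x_k, u_k) \le 0, \qquad x_0 = x_{\mathrm{current}},
\end{aligned}
```

    where N is the horizon length, f the plant model, ℓ the stage cost (the objective function), and g the constraints; only the first optimal input u_0 is actuated before the problem is re-solved at the next step.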
  • However, some tasks can be easy for an expert or agent to solve or demonstrate but difficult to describe formally as an objective function or, in a widely used special case, as a weighted linear combination of specific objective terms or features, with the weights representing the relative importance of each term or feature in the overall task.
  • An initial approach to this problem is provided in NPL 1, where an inverse MPC algorithm was developed that estimates the weights in the objective function using output measurements only. Aside from the objective, the proposed method takes control constraints into account during learning. The criterion used for finding desired weights is based on minimizing the condition number of a matrix built from the weights, which imposes regularization on the inverse problem for determining a unique solution.
  • NPL 2 disclosed projection-based and maximum margin techniques for recovering reward or objective functions that are assumed to be a linear function of known features. It also introduced a strategy of matching feature expectations between observed and learned behavior.
  • NPL 3 proposed a probabilistic approach based on the principle of maximum entropy, which provides a well-defined distribution over decision sequences. This has been applied in a driving route prediction system for improving fuel efficiency in PTL 1.
  • Under this principle, the learned objective function weights maximize the likelihood of the observed trajectories, as formalized below.
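  • Concretely, in the maximum entropy formulation, trajectories τ are assigned probabilities proportional to the exponentiated weighted feature sum (our notation; θ denotes the weight vector and f(τ) the trajectory features):

```latex
P(\tau \mid \theta) = \frac{\exp\big(\theta^{\top} f(\tau)\big)}{Z(\theta)},
\qquad
\theta^{*} = \arg\max_{\theta} \sum_{\tau \in \mathcal{D}} \log P(\tau \mid \theta),
```

    where Z(θ) is the partition function over feasible trajectories and D is the set of observed expert demonstrations.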
  • However, these existing works are based on the assumption that the control actions or policies are generated by solving a forward reinforcement learning problem, and do not involve MPC optimization components such as constraints or horizon length.
  • PTL 2 discloses an information processing device that efficiently implements control learning based on a real-world environment.
  • The information processing device disclosed in PTL 2 has a function for selecting expert information in order to acquire a reward function of behavior and a driving policy from control logs included in the expert information in inverse reinforcement learning.
  • In general, constraints in a model predictive controller play an important role in determining appropriate control actions for actuation.
  • The constraints have a significant effect on the trajectories that can be generated. Thus, if no prior information is available to set such constraints close to the actual constraints in effect during the expert or agent demonstration, it is necessary to learn appropriate constraints so that the recovered set of weights will generate the desired control actions.
  • PTL 2 discloses the application of the standard reinforcement learning approach using different environmental models generated by a physical simulator, and discloses the possibility of using inverse reinforcement learning for the purpose of learning control policies. However, PTL 2 does not disclose an approach for efficiently learning objective functions and constraints used in the MPC.
  • the subject matter of the present invention is directed to realizing the above features in order to overcome, or at least reduce the effects of, one or more of the problems set forth above. That is, it is an exemplary object of the present invention to provide an inverse model predictive control system, an inverse model predictive control method and an inverse model predictive control program capable of learning and tuning the objective function and constraints in a model predictive controller from expert or agent demonstration.
  • An inverse model predictive control system according to the present invention infers expert objectives and constraints from data.
  • The inverse model predictive control system includes: a model predictive controller which receives at least one of observations and state measurements from a control target and calculates control actions and generates trajectories based on an objective function and a set of constraints; and an inverse model predictive control learner which compares features extracted from the trajectories of the control target generated by the model predictive controller with features of data representing expert demonstrations and, by applying machine learning techniques, solves an optimization problem to update the objective function and constraints based on the difference of the compared features.
  • An autonomous driving control system includes: a plant controller which calculates control actions for actuation, generates trajectories of a target plant based on an objective function and a set of constraints, and controls the target plant; and an inverse model predictive control learner which compares features extracted from trajectories of the target plant generated by the plant controller with features of data representing expert demonstrations and, by applying machine learning techniques, solves an optimization problem to update the objective function and constraints based on the difference of the compared features, wherein the plant controller calculates the control actions and generates the trajectories based on the updated objective function and constraints with at least one of observations and state measurements from the target plant.
  • An inverse model predictive control method according to the present invention infers expert objectives and constraints from data.
  • The inverse model predictive control method includes: receiving at least one of observations and state measurements from a control target; calculating control actions and generating trajectories based on an objective function and a set of constraints; comparing features extracted from the trajectories of the control target generated by the model predictive controller with features of data representing expert demonstrations; and solving an optimization problem to update the objective function and constraints based on the difference of the compared features by applying machine learning techniques.
  • An autonomous driving control method includes: calculating control actions for actuation and generating trajectories of a target plant based on an objective function and a set of constraints; controlling the target plant; comparing features extracted from the generated trajectories of the target plant with features of data representing expert demonstrations; solving an optimization problem to update the objective function and constraints based on the difference of the compared features by applying machine learning techniques; and calculating the control actions and generating the trajectories based on the updated objective function and constraints with at least one of observations and state measurements from the target plant.
  • An inverse model predictive control program according to the present invention is mounted on a computer and infers expert objectives and constraints from data, the program causing the computer to perform: a model predictive process of receiving at least one of observations and state measurements from a control target and calculating control actions and generating trajectories based on an objective function and a set of constraints; and an inverse model predictive control learning process of comparing features extracted from the generated trajectories of the control target with features of data representing expert demonstrations and, by applying machine learning techniques, solving an optimization problem to update the objective function and constraints based on the difference of the compared features.
  • An autonomous driving control program causes a computer to execute: a plant control process of calculating control actions for actuation, generating trajectories of a target plant based on an objective function and a set of constraints, and controlling the target plant; and an inverse model predictive control learning process of comparing features extracted from trajectories of the target plant generated in the plant control process with features of data representing expert demonstrations and, by applying machine learning techniques, solving an optimization problem to update the objective function and constraints based on the difference of the compared features, wherein the plant control process calculates the control actions and generates the trajectories based on the updated objective function and constraints with at least one of observations and state measurements from the target plant.
  • The present invention infers objectives and constraints from expert or agent demonstrations, which are used in a model predictive controller for generating or calculating optimal control actions, decisions, or behavior that match or exceed the performance of the expert or agent in terms of certain performance measures or criteria.
  • The objectives or intent of experts can be captured by recovering or learning an objective function from demonstrations.
  • The objective function or weights inferred may then be used to explain observed behavior, describe the intent of the expert or agent, and improve the design of the controller to achieve better overall control performance.
  • A machine learning approach allows the use of probabilistic and non-probabilistic methods such as the maximum margin or maximum likelihood techniques in NPL 2 and NPL 3, rather than relying only on the condition number of a matrix containing the weights as in NPL 1, which imposes conditions or regularization on the weights only.
  • The present invention differs from inverse reinforcement learning and inverse optimal control prior art such as NPL 2 and NPL 3 in its use of a model predictive controller, which involves the combination of at least two components that need to be learned simultaneously in order to recover the objective of the expert or agent more accurately.
  • The present invention considers the constraints as part of what needs to be learned from demonstration.
  • The present invention relates to a method and system for inverse model predictive control, i.e., learning objective function weights and constraints in a model predictive controller from expert or agent demonstration.
  • Fig. 1 is an exemplary block diagram illustrating the structure of an exemplary embodiment of an inverse model predictive control system according to the present invention.
  • Fig. 2 is an exemplary explanatory diagram illustrating the structure of an exemplary embodiment of the inverse model predictive control system according to the present invention.
  • The inverse model predictive control system of the present embodiment infers, from expert or agent demonstrations, the objective function weights and constraints for use in a model predictive controller, allowing replication of the expert's intent or behavior.
  • The inverse model predictive control system 100 includes a model predictive control (MPC) controller 101, a plant or simulator 102, and an inverse MPC learner 103.
  • The plant 102 is the control target of the MPC controller 101.
  • The inverse MPC learner 103 sends the objective function weights and constraints to the MPC controller 101.
  • The MPC controller 101 then generates control policies or actions actuated in the plant 102.
  • The plant outputs are acquired by sensors of the plant, and the collection of control actions and plant outputs, or trajectories 112, is sent to the inverse MPC learner 103 for computation of an update to the weights, which is then sent to the MPC controller 101 for generation of new control actions for actuation.
  • This process of alternating between a trajectory generation step, using the MPC controller 101 and the plant 102, and a weight updating step, via the learner 103, continues as long as no stopping criterion is satisfied; a minimal sketch of the loop is given below.
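  • A minimal sketch of this loop in Python (the objects and method names, such as mpc, plant, and learner, are hypothetical interfaces for illustration, not part of this publication):

```python
def inverse_mpc_loop(mpc, plant, learner, expert_features, max_iters=100, tol=1e-4):
    """Alternate trajectory generation (MPC controller 101 + plant 102)
    with weight/constraint updates (inverse MPC learner 103)."""
    weights, constraints = learner.initialize()
    for _ in range(max_iters):
        # Trajectory generation step: configure the controller and roll out.
        mpc.configure(weights, constraints)
        trajectory = plant.rollout(mpc)              # states and control actions 112
        features = learner.extract_features(trajectory)

        # Weight updating step: compare with expert features and update.
        weights, constraints = learner.update(features, expert_features)

        # Stopping criterion: generated features close enough to the expert's.
        if learner.feature_gap(features, expert_features) < tol:
            break
    return weights, constraints
```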
  • The inverse model predictive control system of the present invention can be applied to, for example, a system that controls autonomous driving vehicles. In this case, the inverse model predictive control system of the present embodiment can be referred to as an autonomous driving control system.
  • The MPC controller 101 operates as a plant controller which controls a target plant.
  • The plant outputs or observations can be the state of the plant 102 or variables related to the environment or surroundings, which are acquired by the sensors.
  • The outputs of the MPC controller 101 are the control actions required in the specific task, i.e., the control signals required by the actuator or by a possibly fixed lower-level controller next to it, if present.
  • In the autonomous driving example, the controller will output its calculated optimal steering angle and acceleration, e.g., (0.1 rad, 2.3 m/s^2).
  • The MPC controller 101 receives at least one of observations and state measurements (observations) from a control target (i.e., the plant 102), calculates control actions for actuation, and generates trajectories based on an objective function and a set of constraints. Specifically, the MPC controller 101 employs a plant model for state predictions and optimizes an objective function containing different terms or features related to different performance criteria, subject to accompanying input, state, and/or output constraints. The constrained optimization problem is solved to obtain control actions that are optimal in the sense of the performance indices in the MPC controller 101.
  • The MPC objective function is usually a weighted sum of terms or features that represent performance measures such as distance to the target state or change in input, as in the generic form below. In the autonomous driving example, this can be the sum of terms relating to distance to the target location, change in acceleration and steering, comfort, or energy consumption.
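  • In generic form (our notation, with weights w_i and feature terms φ_i over a horizon of length N), such a weighted-sum objective reads:

```latex
J(x, u) = \sum_{k=0}^{N-1} \sum_{i=1}^{m} w_i \, \phi_i(x_k, u_k),
```

    where, in the driving example, the φ_i could be the squared distance to the target location, squared changes in acceleration and steering, or comfort and energy terms.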
  • The inverse MPC learner 103 compares features extracted from the trajectories of the control target (i.e., the plant 102) generated by the model predictive controller 101 with features of data representing expert demonstrations. Moreover, by applying machine learning techniques, the inverse MPC learner 103 solves an optimization problem to update the objective function and constraints based on the difference of the compared features.
  • The features are extracted from data. For example, given a dataset containing sensor data from a vehicle, such as position data, it is possible to compute quantities, called features, which represent certain properties of driving styles. Features that can be computed or generated in this scenario include the velocity, acceleration, distance to other objects, and distance to the centerline, as in the sketch below.
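  • A minimal sketch of such feature extraction from logged position data (NumPy-based; the array layout and the polyline centerline representation are assumptions for illustration):

```python
import numpy as np

def driving_features(positions, centerline, dt):
    """Per-step driving-style features from a (T, 2) array of x-y positions
    sampled every dt seconds; centerline is a (C, 2) array of points."""
    velocity = np.diff(positions, axis=0) / dt        # (T-1, 2) velocity vectors
    speed = np.linalg.norm(velocity, axis=1)          # scalar speed per step
    accel = np.diff(speed) / dt                       # longitudinal acceleration
    # Distance from each position to the nearest centerline point.
    dists = np.linalg.norm(positions[:, None, :] - centerline[None, :, :], axis=2)
    dist_to_centerline = dists.min(axis=1)
    return {"speed": speed, "acceleration": accel,
            "dist_to_centerline": dist_to_centerline}
```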
  • The inverse MPC learner 103 may employ any machine learning technique, such as maximum margin and maximum likelihood approaches.
  • The inverse MPC learner 103 may use a criterion that characterizes the distribution of the trajectories or decision sequences, such as the maximum entropy criterion.
  • Such a criterion formalizes the objective function weights as the solution of an optimization problem that maximizes the likelihood of the available expert demonstrations. Note that the criterion for the weight update optimization problem can be readily replaced by other types of penalty functions.
  • The uniqueness, or a further characterization, of appropriate objective function weights can be based on the addition of criteria or constraints in the maximum likelihood optimization problem.
  • The inverse MPC learner 103 may use feature matching as one constraint, i.e., the feature values or expectations of the trajectories generated using the desired or calculated weights should be close to or the same as those of the expert demonstrations.
  • The inverse MPC learner 103 may calculate the empirical feature values, i.e., the feature expectations of the expert trajectories, by averaging the values of the features or terms in the expert trajectories; these features may be manually chosen in the design of the MPC controller 101 or engineered using machine learning techniques.
  • The expected feature values of the generated trajectories can be approximated using the same approach, and thus this method can be applied in large discrete or continuous state spaces; a minimal sketch of the resulting update follows.
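  • Under the maximum entropy criterion, the likelihood gradient with respect to the weights reduces to the gap between the expert's and the learner's feature expectations (a standard result for this criterion; the code below is our sketch, not the publication's algorithm):

```python
import numpy as np

def feature_expectations(trajectories):
    """Average the per-trajectory (summed) feature vectors over a trajectory set."""
    return np.mean([np.sum(traj, axis=0) for traj in trajectories], axis=0)

def weight_update(weights, expert_trajs, generated_trajs, step_size=0.1):
    """One ascent step on the demonstration likelihood: move the weights
    along the expert-vs-generated feature expectation gap."""
    gap = feature_expectations(expert_trajs) - feature_expectations(generated_trajs)
    return weights + step_size * gap
```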
  • The inverse MPC learner 103 can add a penalty function to extend the maximum likelihood optimization problem so as to reduce the difference between quantities or residuals that can be defined using the Karush-Kuhn-Tucker (KKT) conditions. That is, the inverse MPC learner 103 may solve an extended maximum likelihood optimization problem employing quantities derived from the KKT conditions to update the weights of the objective function and the constraints for use in the model predictive controller.
  • The KKT conditions can induce quantities or residuals dependent on the desired MPC constraints.
  • The inverse MPC learner 103 may use any penalty, such as the Euclidean norm or the Huber penalty function, for minimizing differences of such quantities, as illustrated below.
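  • As one concrete instance (our notation, for an MPC problem with decision variable z, learned weights w, and inequality constraints g(z) ≤ b with multipliers λ), the stationarity and complementary-slackness residuals of the KKT conditions can be penalized as:

```latex
r_{\mathrm{stat}} = \nabla_z J_w(z) + \nabla_z g(z)^{\top} \lambda,
\qquad
r_{\mathrm{comp}} = \lambda \odot \big(g(z) - b\big),
```

    with the learner minimizing φ(r_stat) + φ(r_comp) over the unknown constraint parameters b (and multipliers λ), where φ is, for example, the squared Euclidean norm or the Huber penalty.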
  • The inverse MPC learner 103 learns the objective function weights and constraints simultaneously.
  • The inverse MPC learner 103 may also learn the objective function weights and constraints in an alternating fashion by, for instance, first solving a first-stage optimization problem based on a maximum likelihood criterion to obtain weight updates, and then solving a second-stage optimization problem that penalizes quantities built using the KKT conditions to obtain updates of appropriate constraints. That is, the inverse MPC learner 103 may solve an optimization problem based on a maximum likelihood criterion to update the weights of the objective function, and solve another optimization problem, to update the constraints, using the weights of the objective function and the quantities derived from the KKT conditions in a penalty function; this alternation is sketched below.
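  • A minimal sketch of this two-stage alternation (the callables ml_step and kkt_step are hypothetical stand-ins for the two optimization problems, not interfaces from this publication):

```python
def alternating_update(weights, constraints, expert_trajs, generated_trajs,
                       ml_step, kkt_step, n_rounds=10):
    """Alternate between weight and constraint updates.

    ml_step:  solves the first-stage maximum-likelihood problem for the weights,
              with the constraints held fixed.
    kkt_step: solves the second-stage problem penalizing KKT-condition residuals
              to update the constraints, with the weights held fixed.
    """
    for _ in range(n_rounds):
        weights = ml_step(weights, constraints, expert_trajs, generated_trajs)
        constraints = kkt_step(weights, constraints, generated_trajs)
    return weights, constraints
```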
  • The formulation described above provides a way to solve different problems related to objective function terms or computational expense using some modifications or extensions.
  • One important problem is feature selection, which can be performed by employing regularization functions such as the lasso on the weights. After generating or engineering features from data, there may be many of them, which makes computation or interpretation difficult. To address this issue, it is important to select the most relevant features that still capture a meaningful representation of the intent of the expert.
  • The inverse MPC learner 103 may use subgradient or proximal gradient methods to solve the new extended optimization problem. Also, to minimize the number of simulations required to update the weights and constraints to a reasonable level of efficacy, the number of times the forward control problem must be solved should be reduced. To address this problem, the inverse MPC learner 103 may employ accelerated gradient descent methods for solving the optimization problem. That is, the inverse MPC learner 103 may extend the optimization problems for the weights of the objective function and the constraints with regularization, for solving feature selection problems, and solve the extended optimization problems using subgradient or proximal gradient methods, as in the sketch below.
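  • A minimal sketch of a proximal gradient (ISTA-style) step with a lasso regularizer, which drives the weights of irrelevant features to exactly zero via soft-thresholding (grad_fn, the gradient of the smooth likelihood term, is a user-supplied assumption here):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of the lasso penalty tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(weights, grad_fn, step_size=0.1, reg=0.01, n_iters=100):
    """Repeat: gradient step on the smooth loss, then soft-thresholding.
    Adding a Nesterov momentum term to this loop yields the accelerated
    (FISTA-style) variant alluded to above."""
    w = np.asarray(weights, dtype=float)
    for _ in range(n_iters):
        w = soft_threshold(w - step_size * grad_fn(w), step_size * reg)
    return w
```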
  • The MPC controller 101 and the inverse MPC learner 103 are each implemented by a CPU of a computer that operates in accordance with a program (an inverse MPC program).
  • The program may be stored in a storage unit (not shown) included in the inverse model predictive control system 100, and the CPU may read the program and operate as the MPC controller 101 and the inverse MPC learner 103 in accordance with the program.
  • The MPC controller 101 and the inverse MPC learner 103 may each be implemented by dedicated hardware. Further, the inverse model predictive control system according to the present invention may be configured with two or more physically separate devices connected in a wired or wireless manner.
  • Fig. 3 is a flowchart illustrating an operation example of the inverse model predictive control system in this exemplary embodiment.
  • First, the MPC controller 101 receives the data or values relating to the initial and goal states, and the objective function weights and constraints 111, for use with the plant 102.
  • The MPC controller 101 then performs control experiments or simulations to generate control actions to be actuated in the plant 102. Specifically, simulation using the MPC controller 101 and the plant 102 generates trajectories.
  • The inverse MPC learner 103 accesses or uses the expert demonstrations 110, or feature values or expectations extracted from them.
  • The inverse MPC learner 103 also accesses the trajectories, involving the states and control actions generated at S103, for extraction or calculation of feature values or expectations.
  • The inverse MPC learner 103 performs an update step for the objective function weights and constraints by solving a maximum likelihood optimization problem or its extension.
  • The inverse MPC learner 103 calculates and sends the updates to the objective function weights and constraints to the MPC controller 101, and the process is then repeated.
  • In this way, the MPC controller 101 and the plant 102 generate trajectories using the objective function weights and constraints.
  • The inverse MPC learner 103 uses the generated trajectories and compares them with the expert trajectories or demonstrations to update the estimates of the expert's objective function weights and constraints.
  • The objective function weights and constraints learned using the inverse model predictive control method can be used in a model predictive controller to generate trajectories or behavior that can match or exceed the performance of the expert in certain measures.
  • In the autonomous driving case, the proposed solution is to collect demonstration data from an expert, for instance containing histories of manipulated variables such as front-wheel steering angle and longitudinal speed, as well as state measurements such as position, speed, and other observations from the vehicle or plant 102.
  • Initialization or reference signals required by the MPC controller 101 can then be set.
  • The objective terms or features can be chosen as the squares of different state variables such as position, velocity, and change in acceleration, and the constraints can be linear in the variables involved in the objective function.
  • The objective function weights and constraints in the MPC controller 101 can then be initialized to perform a driving experiment or simulation with the vehicle, generating a new set of trajectories from which feature values or expectations can be calculated by the inverse MPC learner 103 and compared with those of the expert trajectories.
  • The inverse MPC learner 103 then computes an updated estimate of the constraints and of the weights, or relative importance, of the objective terms or features, for use in another driving experiment or simulation; a hypothetical initialization is sketched below.
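  • A hypothetical initialization for this driving example (the feature names and bounds below are illustrative assumptions, not values from this publication): quadratic feature terms with uniform initial weights, and simple linear box constraints on the manipulated variables:

```python
import numpy as np

# Quadratic objective terms: squares of selected state/input variables.
feature_names = ["position_error_sq", "velocity_sq", "accel_change_sq"]
weights = np.ones(len(feature_names)) / len(feature_names)  # uniform initial weights

# Linear (box) constraints on the manipulated variables; rough initial
# guesses to be refined by the inverse MPC learner 103.
constraints = {
    "steering_angle": (-0.5, 0.5),   # rad
    "acceleration": (-3.0, 3.0),     # m/s^2
}
```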
  • Fig. 4 is a block diagram illustrating an outline of the inverse model predictive control system according to the present invention.
  • The inverse model predictive control system 80 is an inverse model predictive control system (for example, the inverse model predictive control system 100) that infers expert objectives and constraints from data, and includes: a model predictive controller 81 (for example, the MPC controller 101) which receives at least one of observations and state measurements from a control target (for example, the plant 102) and calculates control actions and generates trajectories based on an objective function and a set of constraints; and an inverse model predictive control learner 82 (for example, the inverse MPC learner 103) which compares features extracted from the trajectories of the control target generated by the model predictive controller 81 with features of data representing expert demonstrations and, by applying machine learning techniques, solves an optimization problem to update the objective function and constraints based on the difference of the compared features.
  • The inverse model predictive control learner 82 may solve an extended maximum likelihood optimization problem employing quantities derived from the Karush-Kuhn-Tucker conditions to update the weights of the objective function and the constraints for use in the model predictive controller.
  • The inverse model predictive control learner 82 may solve an optimization problem based on a maximum likelihood criterion to update the weights of the objective function, and solve another optimization problem, to update the constraints, using the weights of the objective function and the quantities derived from the Karush-Kuhn-Tucker conditions in a penalty function.
  • The inverse model predictive control learner 82 may extend the optimization problems for the weights of the objective function and the constraints with regularization, for solving feature selection problems, and solve the extended optimization problems using subgradient or proximal gradient methods.
  • The inverse model predictive control learner 82 may solve the optimization problems using accelerated gradient methods so that the weights of the objective function and the constraints reach an acceptable level of performance relative to the expert using a reduced number of simulations.
  • Fig. 5 is a block diagram illustrating an outline of the autonomous driving control system according to the present invention.
  • The autonomous driving control system 90 includes: a plant controller 91 (for example, the MPC controller 101) which calculates control actions for actuation, generates trajectories of a target plant (for example, the plant 102) based on an objective function and a set of constraints, and controls the target plant; and an inverse model predictive control learner 92 (for example, the inverse MPC learner 103) which compares features extracted from trajectories of the target plant generated by the plant controller 91 with features of data representing expert demonstrations and, by applying machine learning techniques, solves an optimization problem to update the objective function and constraints based on the difference of the compared features, wherein the plant controller 91 calculates the control actions and generates the trajectories based on the updated objective function and constraints with at least one of observations and state measurements from the target plant.
  • An inverse model predictive control system which infers expert objectives and constraints from data, the inverse model predictive control system comprising: a model predictive controller which receives at least one of observations and state measurements from a control target and calculates control actions and generates trajectories based on an objective function and a set of constraints; and an inverse model predictive control learner which compares features extracted from the trajectories of the control target generated by the model predictive controller with features of data representing expert demonstrations and, by applying machine learning techniques, solves an optimization problem to update the objective function and constraints based on the difference of the compared features.
  • An autonomous driving control system comprising: a plant controller which calculates control actions for actuation, generates trajectories of a target plant based on an objective function and a set of constraints, and controls the target plant; and an inverse model predictive control learner which compares features extracted from trajectories of the target plant generated by the plant controller with features of data representing expert demonstrations and, by applying machine learning techniques, solves an optimization problem to update the objective function and constraints based on the difference of the compared features, wherein the plant controller calculates the control actions and generates the trajectories based on the updated objective function and constraints with at least one of observations and state measurements from the target plant.
  • An inverse model predictive control method which infers expert objectives and constraints from data, the inverse model predictive control method comprising: receiving at least one of observations and state measurements from a control target; calculating control actions and generating trajectories based on an objective function and a set of constraints; comparing features extracted from the trajectories of the control target generated by the model predictive controller with features of data representing expert demonstrations; and solving an optimization problem to update the objective function and constraints based on the difference of the compared features by applying machine learning techniques.
  • An autonomous driving control method comprising: calculating control actions for actuation and generating trajectories of a target plant based on an objective function and a set of constraints; controlling the target plant; comparing features extracted from the generated trajectories of the target plant with features of data representing expert demonstrations; solving an optimization problem to update the objective function and constraints based on the difference of the compared features by applying machine learning techniques; and calculating the control actions and generating the trajectories based on the updated objective function and constraints with at least one of observations and state measurements from the target plant.
  • An inverse model predictive control program mounted on a computer which infers expert objectives and constraints from data, the program causing the computer to perform: a model predictive process of receiving at least one of observations and state measurements from a control target and calculating control actions and generating trajectories based on an objective function and a set of constraints; and an inverse model predictive control learning process of comparing features extracted from the generated trajectories of the control target with features of data representing expert demonstrations and, by applying machine learning techniques, solving an optimization problem to update the objective function and constraints based on the difference of the compared features.
  • An autonomous driving control program for causing a computer to execute: a plant control process of calculating control actions for actuation, generating trajectories of a target plant based on an objective function and a set of constraints, and controlling the target plant; and an inverse model predictive control learning process of comparing features extracted from trajectories of the target plant generated in the plant control process with features of data representing expert demonstrations and, by applying machine learning techniques, solving an optimization problem to update the objective function and constraints based on the difference of the compared features, wherein the plant control process calculates the control actions and generates the trajectories based on the updated objective function and constraints with at least one of observations and state measurements from the target plant.

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

A model predictive controller 81 receives at least one of observations and state measurements from a control target and calculates control actions and generates trajectories based on an objective function and a set of constraints. An inverse model predictive control learner 82 compares features extracted from the trajectories of the control target generated by the model predictive controller 81 with features of data representing expert demonstrations and, by applying machine learning techniques, solves an optimization problem to update the objective function and constraints based on the difference of the compared features.
PCT/JP2018/003952 2018-02-06 2018-02-06 Inverse model predictive control system, method, and program WO2019155511A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/003952 WO2019155511A1 (fr) 2018-02-06 2018-02-06 Inverse model predictive control system, method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/003952 WO2019155511A1 (fr) 2018-02-06 2018-02-06 Inverse model predictive control system, method, and program

Publications (1)

Publication Number Publication Date
WO2019155511A1 (fr) 2019-08-15

Family

ID=67547891

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/003952 WO2019155511A1 (fr) 2018-02-06 2018-02-06 Inverse model predictive control system, method, and program

Country Status (1)

Country Link
WO (1) WO2019155511A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021033380A1 (fr) * 2019-08-16 2021-02-25 Mitsubishi Electric Corporation Constraint adaptor for reinforcement learning control
NL2027227B1 (en) * 2020-12-24 2022-07-20 Univ Delft Tech Learning agent for operating in ML control device and method for adjusting cost function of learning agent
WO2023073401A1 (fr) 2021-10-26 2023-05-04 Telefonaktiebolaget Lm Ericsson (Publ) Method for implementing radio topology recommender systems

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0883104A (ja) * 1994-09-12 1996-03-26 Toshiba Corp プラント制御装置
JP2004118658A (ja) * 2002-09-27 2004-04-15 Advanced Telecommunication Research Institute International 物理系の制御方法および装置ならびに物理系の制御のためのコンピュータプログラム
US20150286192A1 (en) * 2014-04-04 2015-10-08 Invensys Systems, Inc. Integrated model predictive control and advanced process control

Similar Documents

Publication Publication Date Title
Zhang et al. Solar: Deep structured representations for model-based reinforcement learning
Williams et al. Information theoretic mpc for model-based reinforcement learning
Kahn et al. Uncertainty-aware reinforcement learning for collision avoidance
Cutler et al. Efficient reinforcement learning for robots using informative simulated priors
Hussein et al. Deep imitation learning for 3D navigation tasks
CN113498523B (zh) Apparatus and method for controlling operation of a machine object, and storage medium
US11049010B2 (en) Early prediction of an intention of a user's actions
US10001760B1 (en) Adaptive control system capable of recovering from unexpected situations
JP2020535539A (ja) Predictive controller, vehicle, and method for controlling a system
WO2019155511A1 (fr) Inverse model predictive control system, method, and program
EP3793784A1 (fr) Data-efficient hierarchical reinforcement learning
US20200249637A1 (en) Ensemble control system, ensemble control method, and ensemble control program
US9747543B1 (en) System and method for controller adaptation
Paul et al. Fingerprint policy optimisation for robust reinforcement learning
Paul et al. Alternating optimisation and quadrature for robust control
CN116368439A (zh) Controller for early termination in mixed-integer optimal control optimization
McKinnon et al. Learning probabilistic models for safe predictive control in unknown environments
Lin et al. An ensemble method for inverse reinforcement learning
Bellegarda et al. An online training method for augmenting mpc with deep reinforcement learning
US20220080586A1 (en) Device and method for controlling a robotic device
Baert et al. Maximum causal entropy inverse constrained reinforcement learning
Wang et al. Interaction-aware model predictive control for autonomous driving
CN115151916A (zh) Training an actor-critic algorithm in a laboratory setting
CN111949013A (zh) Method for controlling a vehicle and device for controlling a vehicle
Lowrey et al. Real-time state estimation with whole-body multi-contact dynamics: A modified UKF approach

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18904739

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18904739

Country of ref document: EP

Kind code of ref document: A1