JP7301034B2 - System and method for policy optimization using quasi-Newton trust region method - Google Patents

System and method for policy optimization using quasi-Newton trust region method

Info

Publication number
JP7301034B2
JP7301034B2 JP2020159841A JP2020159841A JP7301034B2 JP 7301034 B2 JP7301034 B2 JP 7301034B2 JP 2020159841 A JP2020159841 A JP 2020159841A JP 2020159841 A JP2020159841 A JP 2020159841A JP 7301034 B2 JP7301034 B2 JP 7301034B2
Authority
JP
Japan
Prior art keywords
policy
function
controller
state
objective function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2020159841A
Other languages
English (en)
Japanese (ja)
Other versions
JP2021060988A (ja)
JP2021060988A5 (en)
Inventor
Devesh Jha
Arvind Raghunathan
Diego Romeres
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Publication of JP2021060988A
Publication of JP2021060988A5
Application granted
Publication of JP7301034B2
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/029 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks and expert systems
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/047 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators the criterion being a time optimal performance criterion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Feedback Control In General (AREA)
JP2020159841A 2019-10-04 2020-09-24 System and method for policy optimization using quasi-Newton trust region method Active JP7301034B2 (ja)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/592,977 2019-10-04
US16/592,977 US11650551B2 (en) 2019-10-04 2019-10-04 System and method for policy optimization using quasi-Newton trust region method

Publications (3)

Publication Number Publication Date
JP2021060988A JP2021060988A (ja) 2021-04-15
JP2021060988A5 JP2021060988A5 (en) 2023-04-06
JP7301034B2 JP7301034B2 (ja) 2023-06-30

Family

ID=75275122

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2020159841A Active JP7301034B2 (ja) 2019-10-04 2020-09-24 System and method for policy optimization using quasi-Newton trust region method

Country Status (2)

Country Link
US (1) US11650551B2 (en)
JP (1) JP7301034B2 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
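JP6728495B2 (ja) * 2016-11-04 2020-07-22 DeepMind Technologies Limited Environment prediction using reinforcement learning
WO2020137019A1 (ja) * 2018-12-27 2020-07-02 NEC Corporation Policy creation device, control device, policy creation method, and non-transitory computer-readable medium storing a policy creation program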
US11992945B2 (en) * 2020-11-10 2024-05-28 Google Llc System and methods for training robot policies in the real world
JP7556276B2 (ja) * 2020-12-02 2024-09-26 Fujitsu Limited Quantization program, quantization method, and quantization device
US20220414531A1 (en) * 2021-06-25 2022-12-29 International Business Machines Corporation Mitigating adversarial attacks for simultaneous prediction and optimization of models
US12313276B2 (en) * 2022-04-21 2025-05-27 Mitsubishi Electric Research Laboratories, Inc. Time-varying reinforcement learning for robust adaptive estimator design with application to HVAC flow control
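CN115042174B (zh) * 2022-06-07 2024-08-30 China North Vehicle Research Institute Hierarchically driven human-like control architecture for autonomous unmanned systems
JP2024118220A (ja) * 2023-02-20 2024-08-30 Fujitsu Limited Reinforcement learning program, information processing device, and reinforcement learning method
JP2024148223A (ja) * 2023-04-05 2024-10-18 Fujitsu Limited Reinforcement learning program, reinforcement learning method, and information processing device
CN117162086B (zh) * 2023-08-07 2024-07-05 Nanjing Yunchuang Big Data Technology Co., Ltd. Training method, method, and training system for robotic arm target finding
WO2025160824A1 (zh) * 2024-01-31 2025-08-07 Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China Artificial-intelligence-based adaptive control method and device for DC-DC converters
CN117674595B (zh) * 2024-01-31 2024-06-18 Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China Artificial-intelligence-based adaptive control method and device for DC-DC converters
CN118721205B (zh) * 2024-07-24 2025-01-28 Huazhong University of Science and Technology Robotic arm motion planning method and system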

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009289199A (ja) 2008-05-30 2009-12-10 Okinawa Institute Of Science & Technology Controller, control method, and control program
US20170286840A1 (en) 2016-04-04 2017-10-05 Financialsharp, Inc. System and method for performance evaluation of probability forecast

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9434389B2 (en) * 2013-11-18 2016-09-06 Mitsubishi Electric Research Laboratories, Inc. Actions prediction for hypothetical driving conditions
US10976730B2 (en) 2017-07-13 2021-04-13 Anand Deshpande Device for sound based monitoring of machine operations and method for operating the same

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009289199A (ja) 2008-05-30 2009-12-10 Okinawa Institute Of Science & Technology Controller, control method, and control program
US20170286840A1 (en) 2016-04-04 2017-10-05 Financialsharp, Inc. System and method for performance evaluation of probability forecast

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Takafumi Kanamori, Continuous Optimization for Machine Learning (機械学習のための連続最適化), Kodansha Ltd. (鈴木 哲), 2016, pp. 107-142

Also Published As

Publication number Publication date
JP2021060988A (ja) 2021-04-15
US20210103255A1 (en) 2021-04-08
US11650551B2 (en) 2023-05-16

Similar Documents

Publication Publication Date Title
JP7301034B2 (ja) 準ニュートン信頼領域法を用いたポリシー最適化のためのシステムおよび方法
EP3924884B1 (en) System and method for robust optimization for trajectory-centric model-based reinforcement learning
Bhardwaj et al. Differentiable gaussian process motion planning
Breyer et al. Comparing task simplifications to learn closed-loop object picking using deep reinforcement learning
CN115917564A (zh) System and method for learning reusable options to transfer knowledge between tasks
Balakrishna et al. On-policy robot imitation learning from a converging supervisor
US20220308530A1 (en) System for Performing a Task According to a Reference Trajectory
Zhao et al. Model accelerated reinforcement learning for high precision robotic assembly
Parag et al. Value learning from trajectory optimization and sobolev descent: A step toward reinforcement learning with superlinear convergence properties
Afzali et al. A modified convergence DDPG algorithm for robotic manipulation
Xue et al. Logic-skill programming: An optimization-based approach to sequential skill planning
Surovik et al. Learning an expert skill-space for replanning dynamic quadruped locomotion over obstacles
Berdica et al. Reinforcement learning controllers for soft robots using learned environments
US12124230B2 (en) System and method for polytopic policy optimization for robust feedback control during learning
Zimmer et al. Neural fitted actor-critic
De Carvalho et al. Data-driven motion planning: A survey on deep neural networks, reinforcement learning, and large language model approaches
EP4607293A1 (en) Universal control policy for machine actuators
Gonzalez-Fierro et al. Behavior sequencing based on demonstrations: a case of a humanoid opening a door while walking
Li et al. Bellman gradient iteration for inverse reinforcement learning
Reuter et al. Genetic programming-based inverse kinematics for robotic manipulators
Hong et al. Dynamics-aware metric embedding: Metric learning in a latent space for visual planning
Gorodetskiy et al. Model-based policy optimization with neural differential equations for robotic arm control
Vijayan et al. Comparative study on the performance of neural networks for prediction in assisting visual servoing
CN116476067B (zh) 机器人运动控制方法、设备及介质
Nobre et al. Reinforcement learning for assisted visual-inertial robotic calibration

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20230329

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20230329

A871 Explanation of circumstances concerning accelerated examination

Free format text: JAPANESE INTERMEDIATE CODE: A871

Effective date: 20230329

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20230418

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20230511

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20230523

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20230620

R150 Certificate of patent or registration of utility model

Ref document number: 7301034

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150