JP7301034B2 - System and method for policy optimization using quasi-Newton trust region method - Google Patents
System and method for policy optimization using quasi-Newton trust region method
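This record is bibliographic only; the claims and detailed description are not reproduced on this page. Purely for orientation on the general technique named in the title, the sketch below shows a textbook quasi-Newton trust-region step: a Cauchy-point solution of the trust-region subproblem combined with a BFGS approximation of the Hessian, applied to a stand-in quadratic objective. The function names (`cauchy_point`, `bfgs_update`, `trust_region_minimize`) and the toy objective are illustrative assumptions and do not represent the patented policy-optimization method.

```python
import numpy as np

def cauchy_point(g, B, radius):
    """Approximate solution of the trust-region subproblem
    min_p g^T p + 0.5 p^T B p subject to ||p|| <= radius (Cauchy point)."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    gBg = g @ B @ g
    # Step to the boundary unless the model curves upward along -g.
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm**3 / (radius * gBg))
    return -tau * (radius / gnorm) * g

def bfgs_update(B, s, y):
    """BFGS update of the Hessian approximation B from step s and
    gradient difference y; skipped when curvature s^T y is too small."""
    sy = s @ y
    if sy <= 1e-10:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

def trust_region_minimize(f, grad, x0, radius=1.0, max_iter=100, tol=1e-6):
    """Minimize f with a quasi-Newton (BFGS) trust-region iteration."""
    x, B = x0.astype(float), np.eye(x0.size)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = cauchy_point(g, B, radius)
        predicted = -(g @ p + 0.5 * p @ B @ p)    # model decrease
        actual = f(x) - f(x + p)                  # true decrease
        rho = actual / predicted if predicted > 0 else -1.0
        if rho > 0.1:                             # accept the step
            g_new = grad(x + p)
            B = bfgs_update(B, p, g_new - g)
            x, g = x + p, g_new
        # Grow or shrink the trust region based on model agreement.
        radius = 2.0 * radius if rho > 0.75 else (0.5 * radius if rho < 0.25 else radius)
    return x

# Toy usage: a quadratic standing in for a (negated) policy objective.
A = np.diag([1.0, 10.0])
x_star = trust_region_minimize(lambda x: 0.5 * x @ A @ x, lambda x: A @ x,
                               np.array([3.0, -2.0]))
print(x_star)  # approaches the minimizer at the origin
```

In an actual policy-optimization setting, `f` and `grad` would be replaced by sampled estimates of the policy objective and its gradient; that substitution, and any limited-memory or constrained variant, is beyond this sketch.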
- Publication number
- JP7301034B2 (application JP2020159841A)
- Authority
- JP
- Japan
- Prior art keywords
- policy
- function
- controller
- state
- objective function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/029—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks and expert systems
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/047—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators the criterion being a time optimal performance criterion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Feedback Control In General (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/592,977 | 2019-10-04 | ||
| US16/592,977 US11650551B2 (en) | 2019-10-04 | 2019-10-04 | System and method for policy optimization using quasi-Newton trust region method |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2021060988A (ja) | 2021-04-15 |
| JP2021060988A5 | 2023-04-06 |
| JP7301034B2 (ja) | 2023-06-30 |
Family
ID=75275122
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| JP2020159841A Active JP7301034B2 (ja) | System and method for policy optimization using quasi-Newton trust region method |
Country Status (2)
| Country | Link |
|---|---|
| US (1) | US11650551B2 (en) |
| JP (1) | JP7301034B2 (ja) |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6728495B2 (ja) * | 2016-11-04 | 2020-07-22 | DeepMind Technologies Limited | Environment prediction using reinforcement learning |
| WO2020137019A1 (ja) * | 2018-12-27 | 2020-07-02 | NEC Corporation | Policy creation device, control device, policy creation method, and non-transitory computer-readable medium storing a policy creation program |
| US11992945B2 (en) * | 2020-11-10 | 2024-05-28 | Google Llc | System and methods for training robot policies in the real world |
| JP7556276B2 (ja) * | 2020-12-02 | 2024-09-26 | Fujitsu Limited | Quantization program, quantization method, and quantization device |
| US20220414531A1 (en) * | 2021-06-25 | 2022-12-29 | International Business Machines Corporation | Mitigating adversarial attacks for simultaneous prediction and optimization of models |
| US12313276B2 (en) * | 2022-04-21 | 2025-05-27 | Mitsubishi Electric Research Laboratories, Inc. | Time-varying reinforcement learning for robust adaptive estimator design with application to HVAC flow control |
| CN115042174B (zh) * | 2022-06-07 | 2024-08-30 | China North Vehicle Research Institute | Human-like control architecture for hierarchically driven autonomous unmanned systems |
| JP2024118220A (ja) * | 2023-02-20 | 2024-08-30 | Fujitsu Limited | Reinforcement learning program, information processing device, and reinforcement learning method |
| JP2024148223A (ja) * | 2023-04-05 | 2024-10-18 | Fujitsu Limited | Reinforcement learning program, reinforcement learning method, and information processing device |
| CN117162086B (zh) * | 2023-08-07 | 2024-07-05 | Nanjing Yunchuang Big Data Technology Co., Ltd. | Training method, method, and training system for robotic arm target searching |
| WO2025160824A1 (zh) * | 2024-01-31 | 2025-08-07 | Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China | Artificial-intelligence-based adaptive control method and device for DC-DC converters |
| CN117674595B (zh) * | 2024-01-31 | 2024-06-18 | Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China | Artificial-intelligence-based adaptive control method and device for DC-DC converters |
| CN118721205B (zh) * | 2024-07-24 | 2025-01-28 | Huazhong University of Science and Technology | Robotic arm motion planning method and system |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2009289199A | 2008-05-30 | 2009-12-10 | Okinawa Institute Of Science & Technology | Controller, control method, and control program |
| US20170286840A1 (en) | 2016-04-04 | 2017-10-05 | Financialsharp, Inc. | System and method for performance evaluation of probability forecast |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9434389B2 (en) * | 2013-11-18 | 2016-09-06 | Mitsubishi Electric Research Laboratories, Inc. | Actions prediction for hypothetical driving conditions |
| US10976730B2 (en) | 2017-07-13 | 2021-04-13 | Anand Deshpande | Device for sound based monitoring of machine operations and method for operating the same |
-
2019
- 2019-10-04 US US16/592,977 patent/US11650551B2/en active Active
-
2020
- 2020-09-24 JP JP2020159841A patent/JP7301034B2/ja active Active
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2009289199A | 2008-05-30 | 2009-12-10 | Okinawa Institute Of Science & Technology | Controller, control method, and control program |
| US20170286840A1 (en) | 2016-04-04 | 2017-10-05 | Financialsharp, Inc. | System and method for performance evaluation of probability forecast |
Non-Patent Citations (1)
| Title |
|---|
| Takafumi Kanamori, Continuous Optimization for Machine Learning (機械学習のための連続最適化), Kodansha Ltd. (鈴木 哲), 2016, pp. 107-142 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2021060988A (ja) | 2021-04-15 |
| US20210103255A1 (en) | 2021-04-08 |
| US11650551B2 (en) | 2023-05-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7301034B2 (ja) | System and method for policy optimization using quasi-Newton trust region method | |
| EP3924884B1 (en) | System and method for robust optimization for trajectory-centric model-based reinforcement learning | |
| Bhardwaj et al. | Differentiable gaussian process motion planning | |
| Breyer et al. | Comparing task simplifications to learn closed-loop object picking using deep reinforcement learning | |
| CN115917564A (zh) | System and method for learning reusable options to transfer knowledge between tasks | |
| Balakrishna et al. | On-policy robot imitation learning from a converging supervisor | |
| US20220308530A1 (en) | System for Performing a Task According to a Reference Trajectory | |
| Zhao et al. | Model accelerated reinforcement learning for high precision robotic assembly | |
| Parag et al. | Value learning from trajectory optimization and sobolev descent: A step toward reinforcement learning with superlinear convergence properties | |
| Afzali et al. | A modified convergence DDPG algorithm for robotic manipulation | |
| Xue et al. | Logic-skill programming: An optimization-based approach to sequential skill planning | |
| Surovik et al. | Learning an expert skill-space for replanning dynamic quadruped locomotion over obstacles | |
| Berdica et al. | Reinforcement learning controllers for soft robots using learned environments | |
| US12124230B2 (en) | System and method for polytopic policy optimization for robust feedback control during learning | |
| Zimmer et al. | Neural fitted actor-critic | |
| De Carvalho et al. | Data-driven motion planning: A survey on deep neural networks, reinforcement learning, and large language model approaches | |
| EP4607293A1 (en) | Universal control policy for machine actuators | |
| Gonzalez-Fierro et al. | Behavior sequencing based on demonstrations: a case of a humanoid opening a door while walking | |
| Li et al. | Bellman gradient iteration for inverse reinforcement learning | |
| Reuter et al. | Genetic programming-based inverse kinematics for robotic manipulators | |
| Hong et al. | Dynamics-aware metric embedding: Metric learning in a latent space for visual planning | |
| Gorodetskiy et al. | Model-based policy optimization with neural differential equations for robotic arm control | |
| Vijayan et al. | Comparative study on the performance of neural networks for prediction in assisting visual servoing | |
| CN116476067B (zh) | Robot motion control method, device, and medium | |
| Nobre et al. | Reinforcement learning for assisted visual-inertial robotic calibration |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 2023-03-29 |
| | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 2023-03-29 |
| | A871 | Explanation of circumstances concerning accelerated examination | Free format text: JAPANESE INTERMEDIATE CODE: A871; Effective date: 2023-03-29 |
| | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 2023-04-18 |
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 2023-05-11 |
| | TRDD | Decision of grant or rejection written | |
| | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 2023-05-23 |
| | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 2023-06-20 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 7301034; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |