JP5330138B2 - Reinforcement learning system - Google Patents
Reinforcement learning system
- Publication number
- JP5330138B2 (application JP2009174585A)
- Authority
- JP
- Japan
- Prior art keywords
- value
- reward
- function
- learning
- robot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Automation & Control Theory (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Manipulator (AREA)
- Feedback Control In General (AREA)
Description
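$$x = \frac{dV_1}{dt}, \qquad Y(x) \equiv \begin{cases} 1 & (x \ge 0) \\ 0 & (x < 0) \end{cases} \tag{3}$$

$$r_2(t) = \exp\left(-\alpha_p \lvert p(t) - p_0 \rvert^2\right) - \beta E(t) + \eta\, x \cdot Y(-x), \qquad x = \frac{dV_1(t)}{dt} \tag{6}$$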
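As a minimal sketch of equations (3) and (6), the second reward can be evaluated as below. This excerpt does not define p(t), p0, or E(t), so the code treats them as plain scalars; the function names and the default coefficient values are illustrative assumptions, not values from the patent.

```python
import math


def step(x: float) -> float:
    """Y(x) from equation (3): 1 for x >= 0, 0 for x < 0."""
    return 1.0 if x >= 0 else 0.0


def second_reward(p: float, p0: float, energy: float, dv1_dt: float,
                  alpha_p: float = 1.0, beta: float = 0.1, eta: float = 0.5) -> float:
    """r2(t) from equation (6) for scalar inputs.

    alpha_p, beta and eta defaults are hypothetical; the source gives no values.
    """
    x = dv1_dt
    base = math.exp(-alpha_p * abs(p - p0) ** 2) - beta * energy
    # eta * x * Y(-x) is non-zero only when the first value V1 is falling (x < 0).
    return base + eta * x * step(-x)


# A falling V1 lowers r2; a rising V1 leaves the base reward unchanged.
r_falling = second_reward(p=1.0, p0=1.0, energy=0.0, dv1_dt=-2.0)
r_rising = second_reward(p=1.0, p0=1.0, energy=0.0, dv1_dt=2.0)
```

Because Y(−x) is zero for every positive x, the last term acts purely as a penalty, which matches the behavior described in claims 2 and 3.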
Claims (3)
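- A reinforcement learning system for an agent to learn a behavior policy for executing a task, comprising:
an environment recognizer that recognizes first through n-th state variables (n ≥ 2) representing the environment;
n j-th learners, each of which calculates a j-th reward based on the j-th state variable (j = 1, 2, ..., n) recognized by the environment recognizer, calculates a j-th value from that j-th state variable according to a j-th value function, calculates a j-th error based on the j-th value and the j-th reward, and modifies the j-th value function as appropriate based on the j-th error; and
a behavior policy determiner that determines the behavior policy the agent should adopt based on some or all of the j-th rewards calculated by the n j-th learners,
wherein the (i+1)-th learner (i = 1, 2, ..., n−1) calculates the (i+1)-th reward based on the value of the i-th reward function according to the i-th state variable and on the value of the i-th value gradient function, which is the time derivative of the i-th value function.
- The reinforcement learning system according to claim 1,
wherein the (i+1)-th learner evaluates the (i+1)-th reward lower the larger the negative value of the i-th value gradient function.
- The reinforcement learning system according to claim 2,
wherein, when the i-th value gradient function is positive, the (i+1)-th learner exceptionally evaluates the (i+1)-th reward independently of the value of the i-th value gradient function.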
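The chained structure in the claims can be sketched roughly as follows. The claims do not fix a value-function form or an update rule, so the linear value function, the TD(0) update, the finite-difference value gradient, the per-learner base reward functions, and the sum-of-rewards policy rule are all assumptions made for illustration; every name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Learner:
    """One j-th learner. The linear value function V(s) = w*s + b is an assumption."""
    w: float = 0.0
    b: float = 0.0
    lr: float = 0.1      # learning rate (hypothetical)
    gamma: float = 0.9   # discount factor (hypothetical)

    def value(self, s: float) -> float:
        return self.w * s + self.b

    def update(self, s: float, s_next: float, reward: float) -> float:
        """Compute a TD(0) error from value and reward, then adjust V."""
        error = reward + self.gamma * self.value(s_next) - self.value(s)
        self.w += self.lr * error * s
        self.b += self.lr * error
        return error


def value_gradient(learner: Learner, s: float, s_next: float, dt: float) -> float:
    """Finite-difference estimate of dV/dt over one time step."""
    return (learner.value(s_next) - learner.value(s)) / dt


def chained_rewards(learners: Sequence[Learner],
                    reward_fns: Sequence[Callable[[float], float]],
                    states: Sequence[float],
                    next_states: Sequence[float],
                    dt: float = 0.5, eta: float = 0.5) -> List[float]:
    """Rewards for all learners; learner j > 0 is penalized when V_(j-1) falls."""
    rewards = []
    for j, reward_fn in enumerate(reward_fns):
        r = reward_fn(states[j])
        if j > 0:
            x = value_gradient(learners[j - 1], states[j - 1], next_states[j - 1], dt)
            # min(x, 0) equals x * Y(-x): a penalty for negative x, zero otherwise.
            r += eta * min(x, 0.0)
        rewards.append(r)
    return rewards


def choose_action(candidate_rewards: Dict[str, Sequence[float]]) -> str:
    """Policy determiner stand-in: pick the action with the largest reward sum."""
    return max(candidate_rewards, key=lambda a: sum(candidate_rewards[a]))
```

In this sketch the value gradient of learner j−1 must be read before that learner's `update` is called for the same step, otherwise the gradient would mix old and new value functions.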
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009174585A JP5330138B2 (ja) | 2008-11-04 | 2009-07-27 | Reinforcement learning system |
US12/610,709 US8392346B2 (en) | 2008-11-04 | 2009-11-02 | Reinforcement learning system |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2008283677 | 2008-11-04 | ||
JP2008283677 | 2008-11-04 | ||
JP2009174585A JP5330138B2 (ja) | 2008-11-04 | 2009-07-27 | Reinforcement learning system |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2010134907A (ja) | 2010-06-17 |
JP5330138B2 (ja) | 2013-10-30 |
Family
ID=42132664
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009174585A (Expired - Fee Related), JP5330138B2 (ja) | 2008-11-04 | 2009-07-27 | Reinforcement learning system |
Country Status (2)
Country | Link |
---|---|
US (1) | US8392346B2 (ja) |
JP (1) | JP5330138B2 (ja) |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8648867B2 (en) | 2006-09-25 | 2014-02-11 | Neurala Llc | Graphic processor based accelerator system and method |
JP5109098B2 (ja) * | 2007-06-14 | 2012-12-26 | 本田技研工業株式会社 | Motion control system, motion control method, and motion control program |
CN102200787B (zh) * | 2011-04-18 | 2013-04-17 | 重庆大学 | Multi-level integrated learning method and system for robot behavior |
US8843236B2 (en) * | 2012-03-15 | 2014-09-23 | GM Global Technology Operations LLC | Method and system for training a robot using human-assisted task demonstration |
EP3000030A4 (en) | 2013-05-22 | 2017-07-05 | Neurala Inc. | Methods and apparatus for iterative nonspecific distributed runtime architecture and its application to cloud intelligence |
EP2999940A4 (en) | 2013-05-22 | 2017-11-15 | Neurala Inc. | Methods and apparatus for early sensory integration and robust acquisition of real world knowledge |
US9509763B2 (en) | 2013-05-24 | 2016-11-29 | Qualcomm Incorporated | Delayed actions for a decentralized system of learning devices |
US9747554B2 (en) * | 2013-05-24 | 2017-08-29 | Qualcomm Incorporated | Learning device with continuous configuration capability |
US9679491B2 (en) * | 2013-05-24 | 2017-06-13 | Qualcomm Incorporated | Signaling device for teaching learning devices |
US20140351182A1 (en) * | 2013-05-24 | 2014-11-27 | Qualcomm Incorporated | Modifying Learning Capabilities of Learning Devices |
US9358685B2 (en) * | 2014-02-03 | 2016-06-07 | Brain Corporation | Apparatus and methods for control of robot actions based on corrective user inputs |
US9626566B2 (en) | 2014-03-19 | 2017-04-18 | Neurala, Inc. | Methods and apparatus for autonomous robotic control |
EP3120300A4 (en) | 2014-03-19 | 2017-11-22 | Neurala Inc. | Methods and apparatus for autonomous robotic control |
CN104542875B (zh) * | 2014-12-31 | 2016-05-25 | 梧州神冠蛋白肠衣有限公司 | Porous collagen fiber adjusting mechanism |
WO2017019555A1 (en) | 2015-07-24 | 2017-02-02 | Google Inc. | Continuous control with deep reinforcement learning |
DE102016015936B8 (de) | 2015-07-31 | 2024-10-24 | Fanuc Corporation | Machine learning device, robot system, and machine learning system for learning a workpiece picking operation |
JP6522488B2 (ja) * | 2015-07-31 | 2019-05-29 | ファナック株式会社 | Machine learning device, robot system, and machine learning method for learning a workpiece picking operation |
JP6240689B2 (ja) | 2015-07-31 | 2017-11-29 | ファナック株式会社 | Machine learning device, robot control device, robot system, and machine learning method for learning human behavior patterns |
US20170061283A1 (en) * | 2015-08-26 | 2017-03-02 | Applied Brain Research Inc. | Methods and systems for performing reinforcement learning in hierarchical and temporally extended environments |
EP3400558A1 (en) | 2016-02-09 | 2018-11-14 | Google LLC | Reinforcement learning using advantage estimates |
JP2019518273A (ja) * | 2016-04-27 | 2019-06-27 | ニューララ インコーポレイテッド | Method and apparatus for pruning experience memories for deep neural network-based Q-learning |
CN106094817B (zh) * | 2016-06-14 | 2018-12-11 | 华南理工大学 | Big-data-based reinforcement learning gait planning method for humanoid robots |
JP6517762B2 (ja) * | 2016-08-23 | 2019-05-22 | ファナック株式会社 | Robot system that learns the motion of a robot working in cooperation with a human |
WO2018042730A1 (ja) * | 2016-08-30 | 2018-03-08 | 本田技研工業株式会社 | Robot control device and robot control method |
JP6514171B2 (ja) * | 2016-09-27 | 2019-05-15 | ファナック株式会社 | Machine learning device and machine learning method for learning an optimal article gripping path |
JP6718834B2 (ja) * | 2017-02-28 | 2020-07-08 | 株式会社日立製作所 | Learning system and learning method |
US11138503B2 (en) | 2017-03-22 | 2021-10-05 | Larsx | Continuously learning and optimizing artificial intelligence (AI) adaptive neural network (ANN) computer modeling methods and systems |
US11893488B2 (en) | 2017-03-22 | 2024-02-06 | Larsx | Continuously learning and optimizing artificial intelligence (AI) adaptive neural network (ANN) computer modeling methods and systems |
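JP6549644B2 (ja) | 2017-06-27 | 2019-07-24 | ファナック株式会社 | Machine learning device, robot control system, and machine learning method |
JP6919856B2 (ja) | 2017-09-15 | 2021-08-18 | 富士通株式会社 | Reinforcement learning program, reinforcement learning method, and reinforcement learning device |
JP6845529B2 (ja) * | 2017-11-08 | 2021-03-17 | 本田技研工業株式会社 | Action determination system and automated driving control device |
JP6902487B2 (ja) * | 2018-03-14 | 2021-07-14 | 株式会社日立製作所 | Machine learning system |
JP7035734B2 (ja) * | 2018-03-30 | 2022-03-15 | 富士通株式会社 | Reinforcement learning program, reinforcement learning method, and reinforcement learning device |
JP7044244B2 (ja) * | 2018-04-04 | 2022-03-30 | ギリア株式会社 | Reinforcement learning system |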
US10860926B2 (en) * | 2018-05-18 | 2020-12-08 | Deepmind Technologies Limited | Meta-gradient updates for training return functions for reinforcement learning systems |
JP2020121381A (ja) * | 2019-01-31 | 2020-08-13 | セイコーエプソン株式会社 | Machine learning unit, robot system, and machine learning method |
US12061673B1 (en) | 2019-03-05 | 2024-08-13 | Hrl Laboratories, Llc | Multi-agent planning and autonomy |
KR102120049B1 (ko) * | 2019-06-12 | 2020-06-08 | 한국인터넷진흥원 | Reinforcement learning method with automatic discount rate adjustment |
DE102019210372A1 (de) * | 2019-07-12 | 2021-01-14 | Robert Bosch Gmbh | Method, device, and computer program for creating a strategy for a robot |
US11605026B2 (en) | 2020-05-15 | 2023-03-14 | Huawei Technologies Co. Ltd. | Methods and systems for support policy learning |
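CN114047745B (zh) * | 2021-10-13 | 2023-04-07 | 广州城建职业学院 | Robot motion control method, robot, computer device, and storage medium |
CN114012735B (zh) * | 2021-12-06 | 2022-08-05 | 山西大学 | Robotic arm control method and system based on deep reinforcement learning |
CN115057006B (zh) * | 2022-06-15 | 2024-10-15 | 中国科学院软件研究所 | Method, device, and medium for reinforcement-learning-based distillation policy evaluation |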
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3465236B2 (ja) * | 2000-12-20 | 2003-11-10 | 科学技術振興事業団 | Robust reinforcement learning method |
JP3703821B2 (ja) * | 2003-09-02 | 2005-10-05 | 株式会社国際電気通信基礎技術研究所 | Parallel learning device, parallel learning method, and parallel learning program |
2009
- 2009-07-27: JP application JP2009174585A, patent JP5330138B2 (ja); status: not active, Expired - Fee Related
- 2009-11-02: US application US12/610,709, patent US8392346B2 (en); status: Active
Also Published As
Publication number | Publication date |
---|---|
US20100114807A1 (en) | 2010-05-06 |
US8392346B2 (en) | 2013-03-05 |
JP2010134907A (ja) | 2010-06-17 |
Similar Documents
Publication | Title |
---|---|
JP5330138B2 (ja) | Reinforcement learning system |
Chen et al. | Tracking control of robot manipulators with unknown models: A jacobian-matrix-adaption method | |
JP6343559B2 (ja) | Robot with variable-stiffness joints and method for calculating optimized stiffness |
US20210162589A1 (en) | Systems and methods for learning agile locomotion for multiped robots | |
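JP2019166626A (ja) | Control device and machine learning device |
JP5859036B2 (ja) | Robot |
JP6321905B2 (ja) | Control method for a joint system, storage medium, and control system |
JP5465142B2 (ja) | Robot and its behavior control system |
CN108555914B (zh) | DNN neural network adaptive control method based on a tendon-driven dexterous hand |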
US8805583B2 (en) | Robot and control method thereof | |
Liang et al. | A novel impedance control method of rubber unstacking robot dealing with unpredictable and time-variable adhesion force | |
CN113568422B (zh) | Quadruped robot control method based on reinforcement learning optimized by model predictive control |
Hu et al. | Biped gait optimization using spline function based probability model | |
Lowrey et al. | Real-time state estimation with whole-body multi-contact dynamics: A modified UKF approach | |
TWI781708B (zh) | Learning device, learning method, learning program, control device, control method, and control program |
Chalodhorn et al. | Learning humanoid motion dynamics through sensory-motor mapping in reduced dimensional spaces | |
KR20100065809A (ko) | Learning-based walking method for a robot and robot having a learning-based walking mechanism |
CN117572877B (zh) | Biped robot gait control method, apparatus, storage medium, and device |
Aloulou et al. | A minimum jerk-impedance controller for planning stable and safe walking patterns of biped robots | |
Chu et al. | Full-body grasping strategy for planar underactuated soft manipulators using passivity-based control | |
Li et al. | Kinodynamic Pose Optimization for Humanoid Loco-Manipulation | |
Zhou et al. | T-TD3: A Reinforcement Learning Framework for Stable Grasping of Deformable Objects Using Tactile Prior | |
Li et al. | Terrain adaptation of hexapod robot based on ground detection and sliding mode control | |
Mirzaee et al. | Adaptive Terminal Sliding Mode Control Using Deep Reinforcement Learning for Zero-Force Control of Exoskeleton Robot Systems | |
Savić et al. | SVM Regression-Based Computed Torque Control of Humanoid Robot Reaching Task |
Legal Events
Code | Title | Description |
---|---|---|
A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20120726 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A821; Effective date: 20120726 |
TRDD | Decision of grant or rejection written | |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20130709 |
A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20130725 |
R150 | Certificate of patent or registration of utility model | Ref document number: 5330138; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
LAPS | Cancellation because of no payment of annual fees | |