JP2023512722A - Reinforcement learning with adaptive return computation schemes (適応リターン計算方式を用いた強化学習) - Google Patents
- Publication number
- JP2023512722A (application JP2022548005A)
- Authority
- JP
- Japan
- Prior art keywords
- reward
- action
- environment
- return
- intrinsic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000004364 calculation method Methods 0.000 title claims abstract description 69
- 230000002787 reinforcement Effects 0.000 title claims abstract description 9
- 230000003044 adaptive effect Effects 0.000 title abstract description 6
- 238000000034 method Methods 0.000 claims abstract description 57
- 238000003860 storage Methods 0.000 claims abstract description 11
- 230000009471 action Effects 0.000 claims description 206
- 238000013528 artificial neural network Methods 0.000 claims description 127
- 238000012549 training Methods 0.000 claims description 79
- 238000012545 processing Methods 0.000 claims description 20
- 230000003993 interaction Effects 0.000 claims description 10
- 238000004422 calculation algorithm Methods 0.000 claims description 7
- 230000004044 response Effects 0.000 claims description 6
- 230000001419 dependent effect Effects 0.000 claims 1
- 238000004590 computer program Methods 0.000 abstract description 17
- 239000003795 chemical substances by application Substances 0.000 description 99
- 230000015654 memory Effects 0.000 description 23
- 230000008569 process Effects 0.000 description 20
- 230000006870 function Effects 0.000 description 17
- 230000000875 corresponding effect Effects 0.000 description 11
- 238000010801 machine learning Methods 0.000 description 9
- 230000001276 controlling effect Effects 0.000 description 7
- 238000009826 distribution Methods 0.000 description 7
- 102000004169 proteins and genes Human genes 0.000 description 7
- 108090000623 proteins and genes Proteins 0.000 description 7
- 238000004088 simulation Methods 0.000 description 6
- 230000002123 temporal effect Effects 0.000 description 6
- 238000004891 communication Methods 0.000 description 5
- 230000014509 gene expression Effects 0.000 description 5
- 230000001133 acceleration Effects 0.000 description 4
- 238000004821 distillation Methods 0.000 description 4
- 230000033001 locomotion Effects 0.000 description 4
- 230000007246 mechanism Effects 0.000 description 4
- 230000007613 environmental effect Effects 0.000 description 3
- 238000004519 manufacturing process Methods 0.000 description 3
- 238000003491 array Methods 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 230000001186 cumulative effect Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000012846 protein folding Effects 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 238000013515 script Methods 0.000 description 2
- 238000000926 separation method Methods 0.000 description 2
- 238000011524 similarity measure Methods 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 239000002699 waste material Substances 0.000 description 2
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 2
- 241000009334 Singa Species 0.000 description 1
- 206010048669 Terminal state Diseases 0.000 description 1
- 230000008827 biological function Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000001149 cognitive effect Effects 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 238000009428 plumbing Methods 0.000 description 1
- 238000010248 power generation Methods 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 230000001953 sensory effect Effects 0.000 description 1
- 230000006403 short-term memory Effects 0.000 description 1
- 230000002269 spontaneous effect Effects 0.000 description 1
- 239000000758 substrate Substances 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 230000001052 transient effect Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/002—Special television systems not provided for by H04N7/007 - H04N7/18
- H04N7/005—Special television systems not provided for by H04N7/007 - H04N7/18 using at least one opto-electrical conversion device
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Algebra (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Feedback Control In General (AREA)
- Geophysics And Detection Of Objects (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062971890P | 2020-02-07 | 2020-02-07 | |
US62/971,890 | 2020-02-07 | | |
PCT/EP2021/052988 WO2021156518A1 (en) | 2020-02-07 | 2021-02-08 | Reinforcement learning with adaptive return computation schemes |
Publications (1)
Publication Number | Publication Date |
---|---|
JP2023512722A (ja) | 2023-03-28 |
Family
ID=74591970
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2022548005A (JP2023512722A, pending) | Reinforcement learning with adaptive return computation schemes (適応リターン計算方式を用いた強化学習) | 2020-02-07 | 2021-02-08 |
Country Status (7)
Country | Link |
---|---|
US (1) | US20230059004A1 (en) |
EP (1) | EP4100881A1 (en) |
JP (1) | JP2023512722A (ja) |
KR (1) | KR20220137732A (ko) |
CN (1) | CN115298668A (zh) |
CA (1) | CA3167201A1 (en) |
WO (1) | WO2021156518A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114362773B (zh) * | 2021-12-29 | 2022-12-06 | Southwest Jiaotong University | Real-time adaptive tracking decision method for optical radio-frequency cancellation |
GB202202994D0 (en) * | 2022-03-03 | 2022-04-20 | Deepmind Tech Ltd | Agent control through cultural transmission |
CN114676635B (zh) * | 2022-03-31 | 2022-11-11 | The Chinese University of Hong Kong, Shenzhen | Reinforcement-learning-based method for inverse design and optimization of optical resonant cavities |
CN114492845B (zh) * | 2022-04-01 | 2022-07-15 | University of Science and Technology of China | Method for improving reinforcement learning exploration efficiency under resource-constrained conditions |
WO2024056891A1 (en) * | 2022-09-15 | 2024-03-21 | Deepmind Technologies Limited | Data-efficient reinforcement learning with adaptive return computation schemes |
WO2024068610A1 (en) * | 2022-09-26 | 2024-04-04 | Deepmind Technologies Limited | Controlling agents using reporter neural networks |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10977551B2 (en) * | 2016-12-14 | 2021-04-13 | Microsoft Technology Licensing, Llc | Hybrid reward architecture for reinforcement learning |
- 2021
- 2021-02-08 US US17/797,878 patent/US20230059004A1/en active Pending
- 2021-02-08 EP EP21704741.4A patent/EP4100881A1/en active Pending
- 2021-02-08 CN CN202180021105.3A patent/CN115298668A/zh active Pending
- 2021-02-08 KR KR1020227030755A patent/KR20220137732A/ko unknown
- 2021-02-08 WO PCT/EP2021/052988 patent/WO2021156518A1/en unknown
- 2021-02-08 CA CA3167201A patent/CA3167201A1/en active Pending
- 2021-02-08 JP JP2022548005A patent/JP2023512722A/ja active Pending
Also Published As
Publication number | Publication date |
---|---|
CN115298668A (zh) | 2022-11-04 |
EP4100881A1 (en) | 2022-12-14 |
WO2021156518A1 (en) | 2021-08-12 |
KR20220137732A (ko) | 2022-10-12 |
US20230059004A1 (en) | 2023-02-23 |
CA3167201A1 (en) | 2021-08-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3776364B1 (en) | Deep reinforcement learning with fast updating recurrent neural networks and slow updating recurrent neural networks | |
JP2023512722A (ja) | Reinforcement learning with adaptive return computation schemes | |
US20240028866A1 (en) | Jointly learning exploratory and non-exploratory action selection policies | |
JP7335434B2 (ja) | Training action selection neural networks using hindsight modeling | |
US20230244936A1 (en) | Multi-agent reinforcement learning with matchmaking policies | |
CN112119404A (zh) | Sample-efficient reinforcement learning | |
US20220366247A1 (en) | Training action selection neural networks using q-learning combined with look ahead search | |
WO2020065024A1 (en) | Stacked convolutional long short-term memory for model-free reinforcement learning | |
JP7419547B2 (ja) | Planning for agent control using learned hidden states | |
US12008077B1 (en) | Training action-selection neural networks from demonstrations using multiple losses | |
JP7354460B2 (ja) | Learning environment representations for agent control using predictions of bootstrapped latents | |
JP2023528150A (ja) | Learning options for action selection with meta-gradients in multitask reinforcement learning | |
CN115066686A (zh) | Generating implicit plans that achieve goals in an environment using attention operations over planning embeddings | |
EP3788554A1 (en) | Imitation learning using a generative predecessor neural network | |
JP2023545021A (ja) | Constrained reinforcement learning neural network systems using Pareto front optimization | |
EP3948670A1 (en) | Hierarchical policies for multitask transfer | |
US20240086703A1 (en) | Controlling agents using state associative learning for long-term credit assignment | |
WO2024056891A1 (en) | Data-efficient reinforcement learning with adaptive return computation schemes | |
JP2024519271A (ja) | Reinforcement learning using an ensemble of discriminator models |
Legal Events

Date | Code | Title | Description
---|---|---|---
2022-10-05 | A621 | Written request for application examination | JAPANESE INTERMEDIATE CODE: A621
2023-08-18 | A977 | Report on retrieval | JAPANESE INTERMEDIATE CODE: A971007
2023-09-04 | A131 | Notification of reasons for refusal | JAPANESE INTERMEDIATE CODE: A131
2023-12-04 | A521 | Request for written amendment filed | JAPANESE INTERMEDIATE CODE: A523
2024-03-04 | A131 | Notification of reasons for refusal | JAPANESE INTERMEDIATE CODE: A131
2024-06-04 | A521 | Request for written amendment filed | JAPANESE INTERMEDIATE CODE: A523