JP2023512722A - Reinforcement learning with adaptive return computation schemes - Google Patents

Reinforcement learning with adaptive return computation schemes

Info

Publication number
JP2023512722A
Authority
JP
Japan
Prior art keywords
reward
action
environment
return
intrinsic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2022548005A
Other languages
English (en)
Japanese (ja)
Inventor
Adrià Puigdomènech Badia
Bilal Piot
Pablo Sprechmann
Steven James Kapturowski
Alex Vitvitskyi
Zhaohan Guo
Charles Blundell
Original Assignee
DeepMind Technologies Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-02-07
Filing date
2021-02-08
Publication date
2023-03-28
Application filed by DeepMind Technologies Limited
Publication of JP2023512722A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/002 Special television systems not provided for by H04N7/007 - H04N7/18
    • H04N7/005 Special television systems not provided for by H04N7/007 - H04N7/18 using at least one opto-electrical conversion device

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)
  • Geophysics And Detection Of Objects (AREA)
JP2022548005A 2020-02-07 2021-02-08 Reinforcement learning with adaptive return computation schemes Pending JP2023512722A (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202062971890P 2020-02-07 2020-02-07
US62/971,890 2020-02-07
PCT/EP2021/052988 WO2021156518A1 (en) 2020-02-07 2021-02-08 Reinforcement learning with adaptive return computation schemes

Publications (1)

Publication Number Publication Date
JP2023512722A (ja) 2023-03-28

Family

ID=74591970

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2022548005A Pending JP2023512722A (ja) 2020-02-07 2021-02-08 Reinforcement learning with adaptive return computation schemes

Country Status (7)

Country Link
US (1) US20230059004A1 (en)
EP (1) EP4100881A1 (en)
JP (1) JP2023512722A (ja)
KR (1) KR20220137732A (ko)
CN (1) CN115298668A (zh)
CA (1) CA3167201A1 (en)
WO (1) WO2021156518A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114362773B (zh) * 2021-12-29 2022-12-06 Southwest Jiaotong University Real-time adaptive tracking decision method for optical radio-frequency cancellation
GB202202994D0 (en) * 2022-03-03 2022-04-20 Deepmind Tech Ltd Agent control through cultural transmission
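CN114676635B (zh) * 2022-03-31 2022-11-11 The Chinese University of Hong Kong, Shenzhen Reinforcement-learning-based method for inverse design and optimization of optical resonant cavities
CN114492845B (zh) * 2022-04-01 2022-07-15 University of Science and Technology of China Method for improving reinforcement learning exploration efficiency under resource-constrained conditions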
WO2024056891A1 (en) * 2022-09-15 2024-03-21 Deepmind Technologies Limited Data-efficient reinforcement learning with adaptive return computation schemes
WO2024068610A1 (en) * 2022-09-26 2024-04-04 Deepmind Technologies Limited Controlling agents using reporter neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10977551B2 (en) * 2016-12-14 2021-04-13 Microsoft Technology Licensing, Llc Hybrid reward architecture for reinforcement learning

Also Published As

Publication number Publication date
CN115298668A (zh) 2022-11-04
EP4100881A1 (en) 2022-12-14
WO2021156518A1 (en) 2021-08-12
KR20220137732A (ko) 2022-10-12
US20230059004A1 (en) 2023-02-23
CA3167201A1 (en) 2021-08-12

Similar Documents

Publication Publication Date Title
EP3776364B1 (en) Deep reinforcement learning with fast updating recurrent neural networks and slow updating recurrent neural networks
JP2023512722A (ja) Reinforcement learning with adaptive return computation schemes
US20240028866A1 (en) Jointly learning exploratory and non-exploratory action selection policies
JP7335434B2 (ja) Training action selection neural networks using hindsight modeling
US20230244936A1 (en) Multi-agent reinforcement learning with matchmaking policies
CN112119404A (zh) Sample-efficient reinforcement learning
US20220366247A1 (en) Training action selection neural networks using q-learning combined with look ahead search
WO2020065024A1 (en) Stacked convolutional long short-term memory for model-free reinforcement learning
JP7419547B2 (ja) Planning for agent control using learned hidden states
US12008077B1 (en) Training action-selection neural networks from demonstrations using multiple losses
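JP7354460B2 (ja) Learning environment representations for agent control using predictions of bootstrapped latents
JP2023528150A (ja) Learning options for action selection with meta-gradients in multi-task reinforcement learning
CN115066686A (zh) Generating implicit plans for accomplishing goals in an environment using attention operations over planning embeddings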
EP3788554A1 (en) Imitation learning using a generative predecessor neural network
JP2023545021A (ja) Constrained reinforcement learning neural network systems using Pareto front optimization
EP3948670A1 (en) Hierarchical policies for multitask transfer
US20240086703A1 (en) Controlling agents using state associative learning for long-term credit assignment
WO2024056891A1 (en) Data-efficient reinforcement learning with adaptive return computation schemes
JP2024519271A (ja) Reinforcement learning using an ensemble of discriminator models

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20221005

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20230818

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20230904

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20231204

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20240304

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20240604