DK3079106T3 - Selecting reinforcement learning actions using goals and observations - Google Patents

Publication number
DK3079106T3
Authority
DK
Denmark
Prior art keywords
observations
objectives
reinforcement learning
learning actions
selecting
Prior art date
Application number
DK16164072.7T
Other languages
English (en)
Inventor
Tom Schaul
Daniel George Horgan
Karol Gregor
David Silver
Original Assignee
Deepmind Tech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deepmind Tech Ltd
Application granted
Publication of DK3079106T3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Feedback Control In General (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
DK16164072.7T 2015-04-06 2016-04-06 Selecting reinforcement learning actions using goals and observations DK3079106T3 (da)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US201562143677P 2015-04-06 2015-04-06

Publications (1)

Publication Number Publication Date
DK3079106T3 (da) 2022-08-01

Family

ID=55697112

Family Applications (1)

Application Number Title Priority Date Filing Date
DK16164072.7T DK3079106T3 (da) 2015-04-06 2016-04-06 Selecting reinforcement learning actions using goals and observations

Country Status (4)

Country Link
US (1) US10628733B2 (da)
EP (1) EP3079106B1 (da)
CN (1) CN106056213B (da)
DK (1) DK3079106T3 (da)

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110235148B (zh) * 2016-11-03 2024-03-19 渊慧科技有限公司 Training action selection neural networks
JP6926203B2 (ja) 2016-11-04 2021-08-25 ディープマインド テクノロジーズ リミテッド Reinforcement learning with auxiliary tasks
KR20220147154A (ko) 2016-11-04 2022-11-02 딥마인드 테크놀로지스 리미티드 Scene understanding and generation using neural networks
CN117371492A (zh) * 2016-11-04 2024-01-09 渊慧科技有限公司 Computer-implemented method and system thereof
EP3523761B1 (en) * 2016-11-04 2023-09-20 DeepMind Technologies Limited Recurrent environment predictors
WO2018085778A1 (en) 2016-11-04 2018-05-11 Google Llc Unsupervised detection of intermediate reinforcement learning goals
JP6728495B2 (ja) * 2016-11-04 2020-07-22 ディープマインド テクノロジーズ リミテッド Environment prediction using reinforcement learning
WO2018093935A1 (en) 2016-11-15 2018-05-24 Google Llc Training neural networks using a clustering loss
EP3559865A1 (en) * 2017-01-31 2019-10-30 Deepmind Technologies Limited Data-efficient reinforcement learning for continuous control tasks
WO2018142378A1 (en) * 2017-02-06 2018-08-09 Deepmind Technologies Limited Memory augmented generative temporal models
JP6926218B2 (ja) * 2017-02-24 2021-08-25 ディープマインド テクノロジーズ リミテッド Action selection for reinforcement learning using neural networks
CN110235149B (zh) * 2017-02-24 2023-07-07 渊慧科技有限公司 Neural episodic control
CN110326004B (zh) * 2017-02-24 2023-06-30 谷歌有限责任公司 Training policy neural networks using path consistency learning
CN110192205A (zh) * 2017-03-17 2019-08-30 谷歌有限责任公司 Mirror loss neural networks
WO2018172513A1 (en) 2017-03-23 2018-09-27 Deepmind Technologies Limited Training neural networks using posterior sharpening
EP3933713A1 (en) * 2017-04-14 2022-01-05 DeepMind Technologies Limited Distributional reinforcement learning
TWI719302B (zh) 2017-04-26 2021-02-21 美商谷歌有限責任公司 Integrating machine learning into control systems
CN110622174A (zh) * 2017-05-19 2019-12-27 渊慧科技有限公司 Imagination-based agent neural networks
CN110574046A (zh) * 2017-05-19 2019-12-13 渊慧科技有限公司 Data-efficient imitation of diverse behaviors
CN110447041B (zh) * 2017-05-20 2023-05-30 渊慧科技有限公司 Noisy neural network layers
EP3602409B1 (en) * 2017-06-05 2023-11-01 Deepmind Technologies Limited Selecting actions using multi-modal inputs
US11604997B2 (en) * 2017-06-09 2023-03-14 Deepmind Technologies Limited Training action selection neural networks using leave-one-out-updates
US11868882B2 (en) 2017-06-28 2024-01-09 Deepmind Technologies Limited Training action selection neural networks using apprenticeship
EP3680836A4 (en) * 2017-09-07 2020-08-12 Sony Corporation INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING PROCESS
WO2019081783A1 (en) * 2017-10-27 2019-05-02 Deepmind Technologies Limited Reinforcement learning using distributed prioritized replay
US11604941B1 (en) * 2017-10-27 2023-03-14 Deepmind Technologies Limited Training action-selection neural networks from demonstrations using multiple losses
FI20175970A1 (en) * 2017-11-01 2019-05-02 Curious Ai Oy Setting up a control system for the target system
WO2019127063A1 (en) * 2017-12-27 2019-07-04 Intel Corporation Reinforcement learning for human robot interaction
US11593646B2 (en) * 2018-02-05 2023-02-28 Deepmind Technologies Limited Distributed training using actor-critic reinforcement learning with off-policy correction factors
US10628686B2 (en) 2018-03-12 2020-04-21 Waymo Llc Neural networks for object detection and characterization
CN108563112A (zh) * 2018-03-30 2018-09-21 南京邮电大学 Control method for ball handling by a simulated soccer robot
US10605608B2 (en) * 2018-05-09 2020-03-31 Deepmind Technologies Limited Performing navigation tasks using grid codes
CN109116854B (zh) * 2018-09-16 2021-03-12 南京大学 Reinforcement-learning-based cooperative control method and control system for multiple groups of robots
EP3788554B1 (en) * 2018-09-27 2024-01-10 DeepMind Technologies Limited Imitation learning using a generative predecessor neural network
EP3834138A1 (en) * 2018-09-27 2021-06-16 DeepMind Technologies Limited Reinforcement learning neural networks grounded in learned visual entities
EP3788549B1 (en) * 2018-09-27 2023-09-06 DeepMind Technologies Limited Stacked convolutional long short-term memory for model-free reinforcement learning
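CN112930541A (zh) * 2018-10-29 2021-06-08 谷歌有限责任公司 Determining control policies by minimizing the impact of delusion
CN109598332B (zh) * 2018-11-14 2021-04-09 北京市商汤科技开发有限公司 Neural network generation method and apparatus, electronic device, and storage medium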
US10739777B2 (en) * 2018-11-20 2020-08-11 Waymo Llc Trajectory representation in behavior prediction systems
US11755923B2 (en) 2018-11-29 2023-09-12 International Business Machines Corporation Guided plan recognition
EP3668050A1 (de) * 2018-12-12 2020-06-17 Siemens Aktiengesellschaft Adapting a software application that is executed on a gateway
US20220076099A1 (en) * 2019-02-19 2022-03-10 Google Llc Controlling agents using latent plans
CN110018869B (zh) 2019-02-20 2021-02-05 创新先进技术有限公司 Method and apparatus for presenting pages to users through reinforcement learning
WO2020190326A1 (en) 2019-03-15 2020-09-24 3M Innovative Properties Company Determining causal models for controlling environments
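CN110428057A (zh) * 2019-05-06 2019-11-08 南京大学 Intelligent game system based on a multi-agent deep reinforcement learning algorithm
CN110554604B (zh) * 2019-08-08 2021-07-09 中国地质大学(武汉) Multi-agent synchronization control method, device, and storage device
CN110516389B (zh) * 2019-08-29 2021-04-13 腾讯科技(深圳)有限公司 Learning method, apparatus, device, and storage medium for behavior control policies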
US11907335B2 (en) 2020-10-16 2024-02-20 Cognitive Space System and method for facilitating autonomous target selection

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
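KR101568347B1 (ko) * 2011-04-12 2015-11-12 한국전자통신연구원 Portable computer device having intelligent robot characteristics and operating method thereof
CN102200787B (zh) * 2011-04-18 2013-04-17 重庆大学 Multi-level ensemble learning method and system for robot behavior
CN102402712B (zh) * 2011-08-31 2014-03-05 山东大学 Neural-network-based initialization method for robot reinforcement learning
CN102799179B (zh) * 2012-07-06 2014-12-31 山东大学 Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning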
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning

Also Published As

Publication number Publication date
EP3079106A3 (en) 2017-03-29
US20160292568A1 (en) 2016-10-06
CN106056213A (zh) 2016-10-26
EP3079106A2 (en) 2016-10-12
US10628733B2 (en) 2020-04-21
EP3079106B1 (en) 2022-06-08
CN106056213B (zh) 2022-03-29

Similar Documents

Publication Publication Date Title
DK3079106T3 (da) Selecting reinforcement learning actions using goals and observations
DK3602409T3 (da) Selecting actions using multi-modal inputs
FR3020369B1 (fr) Flat multi-composite reinforcement
DK3207481T3 (da) Reducing error in predicting genetic relationships
DK3180426T3 (da) Genome editing using Cas9 nickases
DK3189081T3 (da) CD123 binding agents and uses thereof
DK3568810T3 (da) Action selection for reinforcement learning using neural networks
FR3036651B1 (fr) Flat multi-composite reinforcement
DK3504506T3 (da) Target
DK3092024T3 (da) Catheter cartridge units
DK3455372T3 (da) Polynucleotide enrichment and amplification using Argonaute systems
DK3319106T3 (da) Panel-type control button
DK3334576T3 (da) Rod element
DK3307436T3 (da) Microfluidic device
DK3128005T3 (da) SIRP-alpha variant constructs and uses thereof
FR3025617B1 (fr) Dual-channel architecture
DK3304451T3 (da) Decision making
DK3209677T3 (da) Variants of Gal2 transporter and uses thereof
FI11890U1 (fi) Touch-type input device
DK3365321T3 (da) Solabegron zwitterion and uses thereof
DK3337506T3 (da) Combinations and uses thereof
DK3164233T3 (da) Arrangement and method for producing a reinforcement cage
DK3482821T3 (da) Stirrer
FR3024169B1 (fr) Construction element
DK3548610T3 (da) NK-mediated immunotherapy and uses thereof