DK3535705T3 - Reinforcement learning with auxiliary tasks - Google Patents

Reinforcement learning with auxiliary tasks

Publication number
DK3535705T3
Authority
DK
Denmark
Prior art keywords
reinforcement learning
assistant tasks
tasks
assistant
learning
Prior art date
Application number
DK17808163.4T
Other languages
English (en)
Inventor
Volodymyr Mnih
Wojciech Czarnecki
Maxwell Elliot Jaderberg
Tom Schaul
David Silver
Koray Kavukcuoglu
Original Assignee
Deepmind Tech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-11-04
Filing date
2017-11-04
Publication date
2022-05-30
Application filed by Deepmind Tech Ltd
Application granted
Publication of DK3535705T3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
DK17808163.4T 2016-11-04 2017-11-04 Reinforcement learning with auxiliary tasks DK3535705T3 (da)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662418120P 2016-11-04 2016-11-04
PCT/IB2017/056906 WO2018083671A1 (en) 2016-11-04 2017-11-04 Reinforcement learning with auxiliary tasks

Publications (1)

Publication Number Publication Date
DK3535705T3 (da) 2022-05-30

Family

ID=60543606

Family Applications (1)

Application Number Priority Date Filing Date Title
DK17808163.4T DK3535705T3 (da) 2016-11-04 2017-11-04 Reinforcement learning with auxiliary tasks

Country Status (7)

Country Link
US (3) US10956820B2 (da)
EP (1) EP3535705B1 (da)
JP (2) JP6926203B2 (da)
KR (1) KR102424893B1 (da)
CN (1) CN110114783B (da)
DK (1) DK3535705T3 (da)
WO (1) WO2018083671A1 (da)

Families Citing this family (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117910545A (zh) 2015-11-12 2024-04-19 渊慧科技有限公司 Training neural networks using a prioritized experience memory
CA3004885C (en) 2015-11-12 2020-07-14 Deepmind Technologies Limited Asynchronous deep reinforcement learning
JP6926203B2 (ja) * 2016-11-04 2021-08-25 ディープマインド テクノロジーズ リミテッド Reinforcement learning with auxiliary tasks
JP6728495B2 (ja) * 2016-11-04 2020-07-22 ディープマインド テクノロジーズ リミテッド Environment prediction using reinforcement learning
US11604997B2 (en) * 2017-06-09 2023-03-14 Deepmind Technologies Limited Training action selection neural networks using leave-one-out-updates
KR102100350B1 (ko) * 2017-10-16 2020-04-14 농업회사법인 상상텃밭 주식회사 Method for generating a control model of a greenhouse system
US20210192357A1 (en) 2018-05-17 2021-06-24 Magic Leap, Inc. Gradient adversarial training of neural networks
CN112154461A (zh) * 2018-05-18 2020-12-29 渊慧科技有限公司 Graph neural network systems for behavior prediction and reinforcement learning in multi-agent environments
US11600387B2 (en) 2018-05-18 2023-03-07 Htc Corporation Control method and reinforcement learning for medical system
US11086674B2 (en) 2018-05-25 2021-08-10 Royal Bank Of Canada Trade platform with reinforcement learning network and matching engine
EP3807823A1 (en) 2018-06-12 2021-04-21 Intergraph Corporation Artificial intelligence applications for computer-aided dispatch systems
CN109239661A (zh) * 2018-09-18 2019-01-18 广西大学 RFID indoor positioning system and algorithm based on a deep Q-network
US20200097811A1 (en) * 2018-09-25 2020-03-26 International Business Machines Corporation Reinforcement learning by sharing individual data within dynamic groups
EP3788549B1 (en) * 2018-09-27 2023-09-06 DeepMind Technologies Limited Stacked convolutional long short-term memory for model-free reinforcement learning
US11568207B2 (en) * 2018-09-27 2023-01-31 Deepmind Technologies Limited Learning observation representations by predicting the future in latent space
CA3060914A1 (en) * 2018-11-05 2020-05-05 Royal Bank Of Canada Opponent modeling with asynchronous methods in deep rl
US11574148B2 (en) 2018-11-05 2023-02-07 Royal Bank Of Canada System and method for deep reinforcement learning
WO2020122985A1 (en) * 2018-12-10 2020-06-18 Interactive-Al, Llc Neural modulation codes for multilingual and style dependent speech and language processing
US11313950B2 (en) 2019-01-15 2022-04-26 Image Sensing Systems, Inc. Machine learning based highway radar vehicle classification across multiple lanes and speeds
US11074480B2 (en) * 2019-01-31 2021-07-27 StradVision, Inc. Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning
DE102019105280A1 (de) * 2019-03-01 2020-09-03 Friedrich-Alexander-Universität Erlangen-Nürnberg Autonomous self-learning system
KR102267316B1 (ko) * 2019-03-05 2021-06-21 네이버랩스 주식회사 Method and system for training an autonomous driving agent based on deep reinforcement learning
KR102596158B1 (ko) * 2019-03-20 2023-11-01 소니그룹주식회사 Reinforcement learning through a double actor-critic algorithm
US11308362B2 (en) * 2019-03-26 2022-04-19 Shenzhen Keya Medical Technology Corporation Method and system for generating a centerline for an object, and computer readable medium
US11587552B2 (en) 2019-04-30 2023-02-21 Sutherland Global Services Inc. Real time key conversational metrics prediction and notability
WO2021007019A1 (en) * 2019-07-08 2021-01-14 Google Llc Optimizing a cellular network using machine learning
KR102082113B1 (ko) * 2019-07-23 2020-02-27 주식회사 애자일소다 Data-based reinforcement learning apparatus and method
US11676064B2 (en) * 2019-08-16 2023-06-13 Mitsubishi Electric Research Laboratories, Inc. Constraint adaptor for reinforcement learning control
KR102155055B1 (ko) 2019-10-28 2020-09-11 라온피플 주식회사 Reinforcement-learning-based signal control apparatus and signal control method
CN110852438B (zh) * 2019-11-11 2023-08-04 北京百度网讯科技有限公司 Model generation method and apparatus
US20210158196A1 (en) * 2019-11-25 2021-05-27 Deepmind Technologies Limited Non-stationary delayed bandits with intermediate signals
KR102173579B1 (ko) * 2019-12-02 2020-11-03 한국기술교육대학교 산학협력단 Multi-device control system and method using federated reinforcement learning
US11579575B2 (en) * 2019-12-03 2023-02-14 Baidu Usa Llc Inverse reinforcement learning with model predictive control
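CN111026272B (zh) * 2019-12-09 2023-10-31 网易(杭州)网络有限公司 Training method and apparatus for virtual object behavior policies, electronic device, and storage medium
CN111130698B (zh) * 2019-12-26 2022-05-31 南京中感微电子有限公司 Wireless communication receive window prediction method and apparatus, and wireless communication device
CN115066691A (zh) * 2020-02-07 2022-09-16 渊慧科技有限公司 Recurrent unit for generating or processing a sequence of images
KR102100688B1 (ko) * 2020-02-19 2020-04-14 주식회사 애자일소다 Data-based reinforcement learning apparatus and method for increasing the limit exhaustion rate
KR102100686B1 (ko) * 2020-02-19 2020-04-14 주식회사 애자일소다 Data-based reinforcement learning apparatus and method for lowering the loss rate
KR102440817B1 (ko) * 2020-02-19 2022-09-06 사회복지법인 삼성생명공익재단 Reinforcement learning method, apparatus, and program for identifying causality in recorded data
CN111416774B (zh) * 2020-03-17 2023-03-21 深圳市赛为智能股份有限公司 Network congestion control method and apparatus, computer device, and storage medium
CN111461325B (zh) * 2020-03-30 2023-06-20 华南理工大学 Multi-objective hierarchical reinforcement learning algorithm for sparse-reward environment problems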
SG11202102364YA (en) * 2020-04-02 2021-04-29 Alipay Hangzhou Inf Tech Co Ltd Determining action selection policies of an execution device
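KR102195433B1 (ko) * 2020-04-07 2020-12-28 주식회사 애자일소다 Data-based reinforcement learning apparatus and method linking learning goals and rewards
KR102272501B1 (ko) * 2020-04-24 2021-07-01 연세대학교 산학협력단 Distributed reinforcement learning apparatus and method
CN111496794B (zh) * 2020-04-29 2022-04-01 华中科技大学 Kinematic self-grasping learning method and system based on a simulated industrial robot
CN111666149B (zh) * 2020-05-06 2023-04-07 西北工业大学 Mobility management method for ultra-dense edge computing networks based on deep reinforcement learning
CN115380293A (zh) 2020-06-05 2022-11-22 渊慧科技有限公司 Learning options for action selection using meta-gradients in multi-task reinforcement learning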
US11528347B2 (en) * 2020-06-25 2022-12-13 Nokia Solutions And Networks Oy Inter-packet communication of machine learning information
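CN111882030B (zh) * 2020-06-29 2023-12-05 武汉钢铁有限公司 Ingot-adding strategy method based on deep reinforcement learning
CN111818570B (zh) * 2020-07-25 2022-04-01 清华大学 Intelligent congestion control method and system for real network environments
DE102020209685B4 (de) 2020-07-31 2023-07-06 Robert Bosch Gesellschaft mit beschränkter Haftung Method for controlling a robot device and robot device controller
CN112002321B (zh) * 2020-08-11 2023-09-19 海信电子科技(武汉)有限公司 Display device, server, and voice interaction method
CN113422751B (zh) * 2020-08-27 2023-12-05 阿里巴巴集团控股有限公司 Streaming media processing method and apparatus based on online reinforcement learning, and electronic device
KR102345267B1 (ko) * 2020-10-12 2021-12-31 서울대학교산학협력단 Goal-oriented reinforcement learning method and apparatus for performing the same
CN112347104B (zh) * 2020-11-06 2023-09-29 中国人民大学 Column-store layout optimization method based on deep reinforcement learning
CN112541835A (zh) * 2020-12-08 2021-03-23 香港中文大学(深圳) Wind farm control learning method based on a hybrid model
CN112949988B (zh) * 2021-02-01 2024-01-05 浙江大学 Service process construction method based on reinforcement learning
KR102599363B1 (ko) * 2021-02-04 2023-11-09 박근식 User-based AI energy saving and demand forecasting system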
GB2604640A (en) * 2021-03-12 2022-09-14 Samsung Electronics Co Ltd Performing a processing task instructed by an application
US20220303191A1 (en) * 2021-03-18 2022-09-22 Nokia Solutions And Networks Oy Network management
WO2022199792A1 (en) * 2021-03-22 2022-09-29 Telefonaktiebolaget Lm Ericsson (Publ) Reward estimation for a target policy
CN113242469B (zh) * 2021-04-21 2022-07-12 南京大学 Adaptive video transmission configuration method and system
CN113420806B (zh) * 2021-06-21 2023-02-03 西安电子科技大学 Face detection quality scoring method and system
WO2023023848A1 (en) * 2021-08-24 2023-03-02 Royal Bank Of Canada System and method for machine learning architecture with multiple policy heads
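CN113810954B (zh) * 2021-09-08 2023-12-29 国网宁夏电力有限公司信息通信公司 Dynamic virtual resource scaling method based on traffic prediction and deep reinforcement learning
CN116330310B (zh) * 2023-02-14 2023-11-07 河南泽远网络科技有限公司 Low-latency robot interaction method
CN116453706B (zh) * 2023-06-14 2023-09-08 之江实验室 Hemodialysis plan formulation method and system based on reinforcement learning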

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8345962B2 (en) 2007-11-29 2013-01-01 Nec Laboratories America, Inc. Transfer learning methods and systems for feed-forward visual recognition systems
KR20100112742A (ko) * 2009-04-10 2010-10-20 경기대학교 산학협력단 Behavior-based architecture for reinforcement learning
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
US10445641B2 (en) 2015-02-06 2019-10-15 Deepmind Technologies Limited Distributed training of reinforcement learning systems
EP3079106B1 (en) * 2015-04-06 2022-06-08 DeepMind Technologies Limited Selecting reinforcement learning actions using goals and observations
FR3052457B1 (fr) 2016-06-14 2018-06-22 Bostik Sa Adhesive compositions based on crosslinkable silylated polymers
JP6926203B2 (ja) * 2016-11-04 2021-08-25 ディープマインド テクノロジーズ リミテッド Reinforcement learning with auxiliary tasks

Also Published As

Publication number Publication date
US11842281B2 (en) 2023-12-12
JP2021185492A (ja) 2021-12-09
US10956820B2 (en) 2021-03-23
EP3535705A1 (en) 2019-09-11
US20190258938A1 (en) 2019-08-22
JP7235813B2 (ja) 2023-03-08
US20210182688A1 (en) 2021-06-17
US20240144015A1 (en) 2024-05-02
KR20190069582A (ko) 2019-06-19
JP6926203B2 (ja) 2021-08-25
KR102424893B1 (ko) 2022-07-25
WO2018083671A1 (en) 2018-05-11
CN110114783B (zh) 2023-07-18
JP2019534517A (ja) 2019-11-28
CN110114783A (zh) 2019-08-09
EP3535705B1 (en) 2022-03-30

Similar Documents

Publication Publication Date Title
DK3535705T3 (da) Reinforcement learning with auxiliary tasks
DK3490685T3 (da) Multi-function exercise device
FR3020369B1 (fr) Flat multi-composite reinforcement
DK3180238T3 (da) Buoy with integrated motion compensation
FR3036651B1 (fr) Flat multi-composite reinforcement
DK3443466T3 (da) Document automation
DE112016004327A5 (de) Industrial robot
DE112017003628A5 (de) Mobility aid
DK3192935T3 (da) Floor covering element with non-slip backing
DK3537878T3 (da) Algicidal organisms
KR20180084831A (ko) Cross-linked rubber product
DE102015117211B8 (de) Robot arm with input elements
ES2970531T3 (es) Reinforcement of structural components
FR3043582B1 (fr) Motorized humanoid robot
DK3436447T3 (da) Isoquinolinyl triazolone complexes
DE112015001657A5 (de) Shift drum actuator system with a design whose dynamics are increased by raising the voltage
FR3024480B1 (fr) Structural element with anticipated prestress
GB201715657D0 (en) Bridge with truss structures
DK3050978T3 (da) Flexible tubular structure with steel element
ES1124880Y (es) Reinforced glove
ES1192483Y (es) Reinforced ecological plant stake
FI11875U1 (fi) Fastener
UA33808S (uk) Pictogram
UA34940S (uk) Origami figurine
FI11521U1 (fi) Sign with ordering function