DK3079106T3 - Selecting reinforcement learning actions using goals and observations - Google Patents
Selecting reinforcement learning actions using goals and observations (Danish title: UDVÆLGELSE AF FORSTÆRKNINGSLÆRINGSHANDLINGER VED HJÆLP AF MÅL og OBSERVATIONER)
- Publication number
- DK3079106T3
- Authority
- DK
- Denmark
- Prior art keywords
- observations
- objectives
- reinforcement learning
- learning actions
- selecting
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Feedback Control In General (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562143677P | 2015-04-06 | 2015-04-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
DK3079106T3 (da) | 2022-08-01 |
Family ID: 55697112
Family Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
DK16164072.7T | DK3079106T3 (da) | 2015-04-06 | 2016-04-06 | Selecting reinforcement learning actions using goals and observations |
Country Status (4)
Country | Link |
---|---|
US (1) | US10628733B2 (da) |
EP (1) | EP3079106B1 (da) |
CN (1) | CN106056213B (da) |
DK (1) | DK3079106T3 (da) |
Families Citing this family (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110235148B (zh) * | 2016-11-03 | 2024-03-19 | 渊慧科技有限公司 | Training action selection neural networks |
JP6926203B2 (ja) | 2016-11-04 | 2021-08-25 | ディープマインド テクノロジーズ リミテッド | Reinforcement learning with auxiliary tasks |
KR20220147154A (ko) | 2016-11-04 | 2022-11-02 | 딥마인드 테크놀로지스 리미티드 | Scene understanding and generation using neural networks |
CN117371492A (zh) * | 2016-11-04 | 2024-01-09 | 渊慧科技有限公司 | A computer-implemented method and system thereof |
EP3523761B1 (en) * | 2016-11-04 | 2023-09-20 | DeepMind Technologies Limited | Recurrent environment predictors |
WO2018085778A1 (en) | 2016-11-04 | 2018-05-11 | Google Llc | Unsupervised detection of intermediate reinforcement learning goals |
JP6728495B2 (ja) * | 2016-11-04 | 2020-07-22 | ディープマインド テクノロジーズ リミテッド | Environment prediction using reinforcement learning |
WO2018093935A1 (en) | 2016-11-15 | 2018-05-24 | Google Llc | Training neural networks using a clustering loss |
EP3559865A1 (en) * | 2017-01-31 | 2019-10-30 | Deepmind Technologies Limited | Data-efficient reinforcement learning for continuous control tasks |
WO2018142378A1 (en) * | 2017-02-06 | 2018-08-09 | Deepmind Technologies Limited | Memory augmented generative temporal models |
JP6926218B2 (ja) * | 2017-02-24 | 2021-08-25 | ディープマインド テクノロジーズ リミテッド | Action selection for reinforcement learning using neural networks |
CN110235149B (zh) * | 2017-02-24 | 2023-07-07 | 渊慧科技有限公司 | Neural episodic control |
CN110326004B (zh) * | 2017-02-24 | 2023-06-30 | 谷歌有限责任公司 | Training policy neural networks using path consistency learning |
CN110192205A (zh) * | 2017-03-17 | 2019-08-30 | 谷歌有限责任公司 | Mirror loss neural networks |
WO2018172513A1 (en) | 2017-03-23 | 2018-09-27 | Deepmind Technologies Limited | Training neural networks using posterior sharpening |
EP3933713A1 (en) * | 2017-04-14 | 2022-01-05 | DeepMind Technologies Limited | Distributional reinforcement learning |
TWI719302B (zh) | 2017-04-26 | 2021-02-21 | 美商谷歌有限責任公司 | Integrating machine learning into control systems |
CN110622174A (zh) * | 2017-05-19 | 2019-12-27 | 渊慧科技有限公司 | Imagination-based agent neural networks |
CN110574046A (zh) * | 2017-05-19 | 2019-12-13 | 渊慧科技有限公司 | Data-efficient imitation of diverse behaviors |
CN110447041B (zh) * | 2017-05-20 | 2023-05-30 | 渊慧科技有限公司 | Noisy neural network layers |
EP3602409B1 (en) * | 2017-06-05 | 2023-11-01 | Deepmind Technologies Limited | Selecting actions using multi-modal inputs |
US11604997B2 (en) * | 2017-06-09 | 2023-03-14 | Deepmind Technologies Limited | Training action selection neural networks using leave-one-out-updates |
US11868882B2 (en) | 2017-06-28 | 2024-01-09 | Deepmind Technologies Limited | Training action selection neural networks using apprenticeship |
EP3680836A4 (en) * | 2017-09-07 | 2020-08-12 | Sony Corporation | INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING PROCESS |
WO2019081783A1 (en) * | 2017-10-27 | 2019-05-02 | Deepmind Technologies Limited | REINFORCEMENT LEARNING USING DISTRIBUTED PRIORITIZED REPLAY |
US11604941B1 (en) * | 2017-10-27 | 2023-03-14 | Deepmind Technologies Limited | Training action-selection neural networks from demonstrations using multiple losses |
FI20175970A1 (en) * | 2017-11-01 | 2019-05-02 | Curious Ai Oy | Setting up a control system for the target system |
WO2019127063A1 (en) * | 2017-12-27 | 2019-07-04 | Intel Corporation | Reinforcement learning for human robot interaction |
US11593646B2 (en) * | 2018-02-05 | 2023-02-28 | Deepmind Technologies Limited | Distributed training using actor-critic reinforcement learning with off-policy correction factors |
US10628686B2 (en) | 2018-03-12 | 2020-04-21 | Waymo Llc | Neural networks for object detection and characterization |
CN108563112A (zh) * | 2018-03-30 | 2018-09-21 | 南京邮电大学 | Control method for ball handling by simulated soccer robots |
US10605608B2 (en) * | 2018-05-09 | 2020-03-31 | Deepmind Technologies Limited | Performing navigation tasks using grid codes |
CN109116854B (zh) * | 2018-09-16 | 2021-03-12 | 南京大学 | Reinforcement-learning-based cooperative control method and control system for multiple groups of robots |
EP3788554B1 (en) * | 2018-09-27 | 2024-01-10 | DeepMind Technologies Limited | Imitation learning using a generative predecessor neural network |
EP3834138A1 (en) * | 2018-09-27 | 2021-06-16 | DeepMind Technologies Limited | Reinforcement learning neural networks grounded in learned visual entities |
EP3788549B1 (en) * | 2018-09-27 | 2023-09-06 | DeepMind Technologies Limited | Stacked convolutional long short-term memory for model-free reinforcement learning |
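CN112930541A (zh) * | 2018-10-29 | 2021-06-08 | 谷歌有限责任公司 | Determining control policies by minimizing the impact of delusion |
CN109598332B (zh) * | 2018-11-14 | 2021-04-09 | 北京市商汤科技开发有限公司 | Neural network generation method and apparatus, electronic device, and storage medium |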
US10739777B2 (en) * | 2018-11-20 | 2020-08-11 | Waymo Llc | Trajectory representation in behavior prediction systems |
US11755923B2 (en) | 2018-11-29 | 2023-09-12 | International Business Machines Corporation | Guided plan recognition |
EP3668050A1 (de) * | 2018-12-12 | 2020-06-17 | Siemens Aktiengesellschaft | Adapting a software application executed on a gateway |
US20220076099A1 (en) * | 2019-02-19 | 2022-03-10 | Google Llc | Controlling agents using latent plans |
CN110018869B (zh) | 2019-02-20 | 2021-02-05 | 创新先进技术有限公司 | Method and apparatus for presenting pages to a user through reinforcement learning |
WO2020190326A1 (en) | 2019-03-15 | 2020-09-24 | 3M Innovative Properties Company | Determining causal models for controlling environments |
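CN110428057A (zh) * | 2019-05-06 | 2019-11-08 | 南京大学 | Intelligent game system based on a multi-agent deep reinforcement learning algorithm |
CN110554604B (zh) * | 2019-08-08 | 2021-07-09 | 中国地质大学(武汉) | Multi-agent synchronous control method, device, and storage device |
CN110516389B (zh) * | 2019-08-29 | 2021-04-13 | 腾讯科技(深圳)有限公司 | Method, apparatus, device, and storage medium for learning behavior control policies |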
US11907335B2 (en) | 2020-10-16 | 2024-02-20 | Cognitive Space | System and method for facilitating autonomous target selection |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
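KR101568347B1 (ko) * | 2011-04-12 | 2015-11-12 | 한국전자통신연구원 | Portable computer device having intelligent robot characteristics and operating method thereof |
CN102200787B (zh) * | 2011-04-18 | 2013-04-17 | 重庆大学 | Multi-level integrated learning method and system for robot behavior |
CN102402712B (zh) * | 2011-08-31 | 2014-03-05 | 山东大学 | Neural-network-based initialization method for robot reinforcement learning |
CN102799179B (zh) * | 2012-07-06 | 2014-12-31 | 山东大学 | Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning |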
US9679258B2 (en) * | 2013-10-08 | 2017-06-13 | Google Inc. | Methods and apparatus for reinforcement learning |
2016
- 2016-04-06: EP16164072.7A (EP), EP3079106B1, active
- 2016-04-06: US 15/091,840 (US), US10628733B2, active
- 2016-04-06: CN201610328938.5A (CN), CN106056213B, active
- 2016-04-06: DK16164072.7T (DK), DK3079106T3, active
Also Published As
Publication number | Publication date |
---|---|
EP3079106A3 (en) | 2017-03-29 |
US20160292568A1 (en) | 2016-10-06 |
CN106056213A (zh) | 2016-10-26 |
EP3079106A2 (en) | 2016-10-12 |
US10628733B2 (en) | 2020-04-21 |
EP3079106B1 (en) | 2022-06-08 |
CN106056213B (zh) | 2022-03-29 |
Similar Documents
Publication | Title |
---|---|
DK3079106T3 (da) | Selecting reinforcement learning actions using goals and observations |
DK3602409T3 (da) | Selecting actions using multimodal inputs |
FR3020369B1 (fr) | Flat multi-composite reinforcement |
DK3207481T3 (da) | Reducing errors in the prediction of genetic relatedness |
DK3180426T3 (da) | Genome editing using Cas9 nickases |
DK3189081T3 (da) | CD123-binding agents and uses thereof |
DK3568810T3 (da) | Action selection for reinforcement learning using neural networks |
FR3036651B1 (fr) | Flat multi-composite reinforcement |
DK3504506T3 (da) | Mål | |
DK3092024T3 (da) | Catheter cartridge units |
DK3455372T3 (da) | Polynucleotide enrichment and amplification using Argonaute systems |
DK3319106T3 (da) | Panel-type control button |
DK3334576T3 (da) | Rod element |
DK3307436T3 (da) | Microfluidic device |
DK3128005T3 (da) | SIRP-alpha variant constructs and uses thereof |
FR3025617B1 (fr) | Dual-path architecture |
DK3304451T3 (da) | Decision-making |
DK3209677T3 (da) | GAL2 transporter variants and uses thereof |
FI11890U1 (fi) | Touch-type input device |
DK3365321T3 (da) | Solabegron zwitterion and uses thereof |
DK3337506T3 (da) | Combinations and uses thereof |
DK3164233T3 (da) | Arrangement and method for manufacturing a reinforcement cage |
DK3482821T3 (da) | Stirrer |
FR3024169B1 (fr) | Construction element |
DK3548610T3 (da) | NK-mediated immunotherapy and uses thereof |