JP6728495B2 - Environment prediction using reinforcement learning - Google Patents
- Publication number
- JP6728495B2
- Authority
- JP
- Japan
- Prior art keywords
- neural network
- state representation
- internal time
- internal
- time step
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/10—Interfaces, programming languages or software development kits, e.g. for simulating neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
Description
$$v_k = r_{k+1} + \gamma_{k+1} r_{k+2} + \gamma_{k+1}\gamma_{k+2} r_{k+3} + \dots$$
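Here, $v_k$ is the value prediction at planning step $k$, $r_i$ is the predicted reward 116 at planning step $i$, and $\gamma_i$ is the predicted discount factor 118 at planning step $i$.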
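The value prediction above is a discounted sum in which each step carries its own predicted discount factor rather than a single fixed γ. Below is a minimal Python sketch of that sum. It assumes a finite horizon, so the sum stops at the last available reward. It also uses a hypothetical indexing convention: `rewards[i]` holds $r_i$ and `discounts[i]` holds $\gamma_i$, with index 0 unused.

```python
from typing import List


def discounted_value(rewards: List[float], discounts: List[float], k: int) -> float:
    """Return r_{k+1} + γ_{k+1} r_{k+2} + γ_{k+1} γ_{k+2} r_{k+3} + ...

    rewards[i] holds r_i and discounts[i] holds γ_i (index 0 is unused),
    so the sum runs over i = k+1, ..., len(rewards) - 1 (a finite horizon).
    """
    total = 0.0
    scale = 1.0  # running product γ_{k+1} * ... * γ_{i-1}
    for i in range(k + 1, len(rewards)):
        total += scale * rewards[i]
        scale *= discounts[i]
    return total


print(discounted_value([0.0, 1.0, 2.0, 3.0], [0.0, 0.5, 0.5, 0.5], 0))  # 2.75
```

Because each term reuses the running product of discounts, the same quantity satisfies the one-step recursion $v_k = r_{k+1} + \gamma_{k+1} v_{k+1}$. In the example, $2.75 = 1 + 0.5 \times 3.5$.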
$$g_k = r_1 + \gamma_1\left(r_2 + \gamma_2\left(\dots + \gamma_{k-1}\left(r_k + \gamma_k v_k\right)\dots\right)\right)$$
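That is, the k-step return is determined by this expression, where $g_k$ is the k-step return, $r_i$ is the reward at planning step $i$, $\gamma_i$ is the discount factor at planning step $i$, and $v_k$ is the value prediction at planning step $k$.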
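The nested expression for $g_k$ is easiest to evaluate from the inside out: start from the bootstrap value $v_k$, then wrap one reward and one discount around it per step. The sketch below assumes hypothetical indexing: `rewards[i]` is $r_i$ and `discounts[i]` is $\gamma_i$ for $i \ge 1$, and `values[k]` is $v_k$. The function name is illustrative, not taken from the patent.

```python
from typing import List


def k_step_return(rewards: List[float], discounts: List[float],
                  values: List[float], k: int) -> float:
    """g_k = r_1 + γ_1(r_2 + γ_2(... + γ_{k-1}(r_k + γ_k v_k)...))."""
    g = values[k]  # innermost term: bootstrap from the value prediction v_k
    for i in range(k, 0, -1):  # unroll the nesting from the inside out
        g = rewards[i] + discounts[i] * g
    return g


rewards, discounts, values = [0.0, 1.0, 2.0], [0.0, 0.5, 0.5], [3.0, 2.0, 8.0]
print([k_step_return(rewards, discounts, values, k) for k in range(3)])  # [3.0, 2.0, 4.0]
```

With $k = 0$ the loop does nothing and the return is just $v_0$. Larger $k$ relies on more predicted rewards and bootstraps from a later value prediction.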
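$$g_{k,\lambda} = (1-\lambda_k)\, v_k + \lambda_k\left(r_{k+1} + \gamma_{k+1}\, g_{k+1,\lambda}\right), \qquad g_{K,\lambda} = v_K$$
Here $\lambda_k$ is the lambda factor at planning step $k$ and $K$ is the final planning step. The λ-weighted return $g_\lambda$ is determined as $g_{0,\lambda}$.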
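Because $g_{k,\lambda}$ depends on $g_{k+1,\lambda}$, the λ-weighted return is computed backwards from the boundary condition $g_{K,\lambda} = v_K$. The sketch below assumes `lambdas[k]` holds $\lambda_k$ for $k = 0, \dots, K-1$. For rewards and discounts it uses a hypothetical indexing with index 0 unused.

```python
from typing import List


def lambda_return(rewards: List[float], discounts: List[float],
                  values: List[float], lambdas: List[float]) -> float:
    """g_λ = g_{0,λ}, where g_{k,λ} = (1-λ_k) v_k + λ_k (r_{k+1} + γ_{k+1} g_{k+1,λ}).

    values holds v_0..v_K; rewards[i] = r_i and discounts[i] = γ_i for
    i = 1..K (index 0 unused); lambdas[k] = λ_k for k = 0..K-1.
    """
    K = len(values) - 1
    g = values[K]  # boundary condition g_{K,λ} = v_K
    for k in range(K - 1, -1, -1):  # walk back towards planning step 0
        g = (1.0 - lambdas[k]) * values[k] + lambdas[k] * (
            rewards[k + 1] + discounts[k + 1] * g)
    return g


print(lambda_return([0.0, 1.0, 2.0], [0.0, 0.5, 0.5], [3.0, 2.0, 8.0], [0.5, 0.5]))  # 3.0
```

Setting every $\lambda_k = 0$ returns $v_0$ (pure bootstrapping). Setting every $\lambda_k = 1$ returns the full K-step return $g_K$.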
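102 agent
104 action
106 environment
108 observation
110 aggregate reward
112 accumulator
114 internal state representation
116 predicted reward
118 predicted discount factor
120 prediction neural network
122 state representation neural network
124 value prediction neural network
126 lambda neural network
128 outcome
130 training engine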
Claims (13)
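A system comprising:
a state representation neural network configured to:
receive one or more observations characterizing a state of an environment with which an agent is interacting; and
process the one or more observations to generate an internal state representation of a current environment state;
a prediction neural network configured to, for each of a plurality of internal time steps:
receive an internal state representation for the internal time step; and
process the internal state representation for the internal time step to generate:
an internal state representation for a next internal time step, and
a predicted reward for the next internal time step;
a value prediction neural network configured to, for each of the plurality of internal time steps:
receive the internal state representation for the internal time step; and
process the internal state representation for the internal time step to generate a value prediction that is an estimate of the future cumulative discounted reward from the next internal time step onward; and
a predictron subsystem configured to:
receive one or more observations characterizing the state of the environment;
provide the one or more observations as input to the state representation neural network to generate an internal state representation of the current environment state;
for each of the plurality of internal time steps:
generate, from the internal state representation for the internal time step and using the prediction neural network and the value prediction neural network, an internal state representation for the next internal time step, a predicted reward for the next internal time step, and a value prediction; and
determine an aggregate reward from the predicted rewards and the value predictions for the internal time steps.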
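Claim 1 describes a rollout that happens entirely inside the model. The observation is encoded once, and then the prediction and value networks are applied repeatedly. The sketch below wires that loop together with toy stand-in functions. `toy_state_net`, `toy_core_net` and `toy_value_net` are hypothetical placeholders, not trained networks. The sketch assumes the prediction network also emits a discount factor, as the description's predicted discount factor 118 suggests. It aggregates with a constant λ purely for illustration; the claim itself does not fix how the aggregate is formed.

```python
import math
from typing import Callable, List, Tuple

State = List[float]


def predictron_rollout(
    observation: List[float],
    state_net: Callable[[List[float]], State],
    core_net: Callable[[State], Tuple[State, float, float]],
    value_net: Callable[[State], float],
    num_steps: int,
    lam: float,
) -> Tuple[float, List[float], List[float], List[float]]:
    """Unroll num_steps internal time steps and return (aggregate, r, γ, v).

    rewards[i] = r_i and discounts[i] = γ_i for i = 1..K (index 0 is a
    placeholder); values[k] = v_k for k = 0..K, where K = num_steps.
    """
    state = state_net(observation)  # internal state for the current env state
    rewards, discounts, values = [0.0], [0.0], [value_net(state)]
    for _ in range(num_steps):
        state, reward, discount = core_net(state)  # next state, r_{k+1}, γ_{k+1}
        rewards.append(reward)
        discounts.append(discount)
        values.append(value_net(state))  # value prediction for the new state
    # Aggregate with a constant λ (illustrative assumption; claim 4 instead
    # uses a lambda network that outputs a separate λ_k per step).
    aggregate = values[-1]
    for k in range(num_steps - 1, -1, -1):
        aggregate = (1.0 - lam) * values[k] + lam * (
            rewards[k + 1] + discounts[k + 1] * aggregate)
    return aggregate, rewards, discounts, values


# Hypothetical stand-ins for trained networks (for illustration only).
def toy_state_net(obs: List[float]) -> State:
    return [math.tanh(x) for x in obs]


def toy_core_net(state: State) -> Tuple[State, float, float]:
    return [0.5 * x for x in state], sum(state), 0.9


def toy_value_net(state: State) -> float:
    return 2.0 * sum(state)


aggregate, _, _, values = predictron_rollout(
    [1.0], toy_state_net, toy_core_net, toy_value_net, num_steps=3, lam=0.5)
print(aggregate, len(values))  # len(values) is num_steps + 1
```

Keeping the networks as plain callables makes the data flow of claim 1 visible: observation, then internal state, then repeated (state, reward, discount, value). The actual network architectures are left open.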
The system of claim 1, further configured to provide the aggregate reward as an estimate of a reward resulting from the environment being in the current state.
The system of claim 1 or 2.
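The system of claim 2 or 3, further comprising a lambda neural network configured to, for each of the internal time steps, process the internal state representation for the current internal time step to generate a lambda factor for the next internal time step, wherein the predictron subsystem is configured to, in determining the aggregate reward, determine return factors for the internal time steps and use the lambda factors to determine weights for the return factors.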
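Claim 4 ties the lambda factors to weights on the return factors. One way to make this concrete is to unroll the λ-weighted recursion given in the description. This assumes the return factors are the k-step returns $g_k$: $g_{0,\lambda} = \sum_k w_k g_k$ with $w_k = (1-\lambda_k)\prod_{j<k}\lambda_j$ for $k < K$ and $w_K = \prod_{j<K}\lambda_j$. The function names below are hypothetical.

```python
from typing import List


def return_weights(lambdas: List[float]) -> List[float]:
    """Weights w_0..w_K on the k-step returns implied by λ_0..λ_{K-1}."""
    weights, carry = [], 1.0  # carry = product of the λ's seen so far
    for lam in lambdas:
        weights.append((1.0 - lam) * carry)
        carry *= lam
    weights.append(carry)  # remaining mass goes to the final K-step return
    return weights


def k_step_returns(rewards: List[float], discounts: List[float],
                   values: List[float]) -> List[float]:
    """[g_0, ..., g_K] with g_k = r_1 + γ_1(... + γ_{k-1}(r_k + γ_k v_k))."""
    out = []
    for k in range(len(values)):
        g = values[k]
        for i in range(k, 0, -1):
            g = rewards[i] + discounts[i] * g
        out.append(g)
    return out


def weighted_aggregate(rewards, discounts, values, lambdas) -> float:
    """Aggregate reward as a λ-weighted mixture of the k-step returns."""
    weights = return_weights(lambdas)
    return sum(w * g for w, g in zip(weights, k_step_returns(rewards, discounts, values)))


print(return_weights([0.5, 0.5]))  # [0.5, 0.25, 0.25]
```

The weights telescope to 1 for any λ values. A large $\lambda_k$ moves weight onto later, longer-horizon returns.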
The system of any one of claims 1 to 4.
The system of any one of claims 1 to 4.
The system of any one of claims 1 to 6.
The system of any one of claims 1 to 6.
One or more computer-readable storage media.
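A method comprising:
determining a gradient of a loss that is based on the aggregate reward and an estimate of a reward resulting from the environment being in the current state; and
backpropagating the gradient of the loss to update current values of parameters of the state representation neural network, the prediction neural network, the value prediction neural network, and the lambda neural network.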
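This training claim backpropagates a loss on the aggregate reward through every network. The networks themselves would normally be differentiated by an autodiff framework, so the sketch below hand-derives only the accumulator part. It assumes a squared-error loss $\tfrac12(g_\lambda - \text{target})^2$, since the claim does not specify the loss form. It also assumes the λ-weighted recursion from the description. The returned gradients for $v_k$, $r_i$, $\gamma_i$ and $\lambda_k$ are what would then be propagated into the value prediction, prediction and lambda networks.

```python
from typing import Dict, List, Tuple


def lambda_return_loss_and_grads(
    rewards: List[float], discounts: List[float], values: List[float],
    lambdas: List[float], target: float,
) -> Tuple[float, Dict[str, List[float]]]:
    """Assumed loss 0.5 * (g_λ - target)^2 and its gradients w.r.t. all inputs.

    values[k] = v_k (k = 0..K); rewards[i] = r_i and discounts[i] = γ_i
    (i = 1..K, index 0 unused); lambdas[k] = λ_k (k = 0..K-1).
    """
    K = len(values) - 1
    # Forward pass: keep every g_{k,λ}, because the backward pass needs them.
    g = [0.0] * (K + 1)
    g[K] = values[K]
    for k in range(K - 1, -1, -1):
        g[k] = (1.0 - lambdas[k]) * values[k] + lambdas[k] * (
            rewards[k + 1] + discounts[k + 1] * g[k + 1])
    loss = 0.5 * (g[0] - target) ** 2

    # Backward pass: reverse-mode chain rule through the recursion.
    d_values = [0.0] * (K + 1)
    d_rewards = [0.0] * (K + 1)
    d_discounts = [0.0] * (K + 1)
    d_lambdas = [0.0] * K
    adj = g[0] - target  # dL/dg_{0,λ}
    for k in range(K):
        d_values[k] = adj * (1.0 - lambdas[k])
        d_lambdas[k] = adj * (rewards[k + 1] + discounts[k + 1] * g[k + 1] - values[k])
        d_rewards[k + 1] = adj * lambdas[k]
        d_discounts[k + 1] = adj * lambdas[k] * g[k + 1]
        adj *= lambdas[k] * discounts[k + 1]  # now dL/dg_{k+1,λ}
    d_values[K] = adj
    grads = {"values": d_values, "rewards": d_rewards,
             "discounts": d_discounts, "lambdas": d_lambdas}
    return loss, grads
```

A finite-difference check on any single input is a quick way to confirm hand-derived gradients like these before replacing them with a framework's autodiff.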
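A method comprising:
determining a gradient of a consistency loss that is based on the consistency of the return factors for the internal time steps determined by the predictron subsystem; and
backpropagating the gradient of the consistency loss to update current values of parameters of the state representation neural network, the prediction neural network, the value prediction neural network, and the lambda neural network.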
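The consistency loss needs no observed reward; it only compares the model's own return factors with one another. The claim does not give its exact form. The sketch below therefore assumes one plausible choice, $\tfrac12\sum_k (g_\lambda - g_k)^2$, with $g_\lambda$ treated as a fixed target so that no gradient flows into it. That stop-gradient choice and all function names are assumptions for illustration.

```python
from typing import List, Tuple


def k_step_returns(rewards: List[float], discounts: List[float],
                   values: List[float]) -> List[float]:
    """[g_0, ..., g_K] with g_k = r_1 + γ_1(... + γ_{k-1}(r_k + γ_k v_k))."""
    out = []
    for k in range(len(values)):
        g = values[k]
        for i in range(k, 0, -1):
            g = rewards[i] + discounts[i] * g
        out.append(g)
    return out


def lambda_return(rewards, discounts, values, lambdas) -> float:
    """g_{0,λ} via the backward recursion, starting from g_{K,λ} = v_K."""
    g = values[-1]
    for k in range(len(values) - 2, -1, -1):
        g = (1.0 - lambdas[k]) * values[k] + lambdas[k] * (
            rewards[k + 1] + discounts[k + 1] * g)
    return g


def consistency_loss_and_grads(rewards, discounts, values,
                               lambdas) -> Tuple[float, List[float]]:
    """Assumed loss 0.5 * Σ_k (g_λ - g_k)^2 with g_λ held fixed.

    Returns the loss and dL/dg_k for each k-step return.
    """
    returns = k_step_returns(rewards, discounts, values)
    target = lambda_return(rewards, discounts, values, lambdas)  # treated as constant
    loss = 0.5 * sum((target - g) ** 2 for g in returns)
    grads = [g - target for g in returns]  # pushes each g_k towards g_λ
    return loss, grads


print(consistency_loss_and_grads([0.0, 1.0, 2.0], [0.0, 0.5, 0.5],
                                 [3.0, 2.0, 8.0], [0.5, 0.5]))  # (1.0, [0.0, -1.0, 1.0])
```

No environment reward appears anywhere in this loss. It can therefore be evaluated on observations for which no outcome has been recorded.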
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020111559A JP6917508B2 (ja) | 2016-11-04 | 2020-06-29 | Environment prediction using reinforcement learning |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662418159P | 2016-11-04 | 2016-11-04 | |
US62/418,159 | 2016-11-04 | | |
PCT/IB2017/056902 WO2018083667A1 (en) | 2016-11-04 | 2017-11-04 | Reinforcement learning systems |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2020111559A Division JP6917508B2 (ja) | Environment prediction using reinforcement learning | 2016-11-04 | 2020-06-29 |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2019537136A (ja) | 2019-12-19 |
JP6728495B2 (ja) | 2020-07-22 |
Family
ID=60515745
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2019523612A Active JP6728495B2 (ja) | Environment prediction using reinforcement learning | 2016-11-04 | 2017-11-04 |
JP2020111559A Active JP6917508B2 (ja) | Environment prediction using reinforcement learning | 2016-11-04 | 2020-06-29 |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2020111559A Active JP6917508B2 (ja) | Environment prediction using reinforcement learning | 2016-11-04 | 2020-06-29 |
Country Status (5)
Country | Link |
---|---|
US (2) | US10733501B2 (ja) |
EP (1) | EP3523760B1 (ja) |
JP (2) | JP6728495B2 (ja) |
CN (2) | CN110088775B (ja) |
WO (1) | WO2018083667A1 (ja) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110088775B (zh) * | 2016-11-04 | 2023-11-07 | 渊慧科技有限公司 | Environment prediction using reinforcement learning |
US10692244B2 (en) | 2017-10-06 | 2020-06-23 | Nvidia Corporation | Learning based camera pose estimation from images of an environment |
US11735028B2 (en) | 2018-06-12 | 2023-08-22 | Intergraph Corporation | Artificial intelligence applications for computer-aided dispatch systems |
US10789511B2 (en) | 2018-10-12 | 2020-09-29 | Deepmind Technologies Limited | Controlling agents over long time scales using temporal value transport |
US11313950B2 (en) | 2019-01-15 | 2022-04-26 | Image Sensing Systems, Inc. | Machine learning based highway radar vehicle classification across multiple lanes and speeds |
US11587552B2 (en) | 2019-04-30 | 2023-02-21 | Sutherland Global Services Inc. | Real time key conversational metrics prediction and notability |
CN114761965A (zh) | 2019-09-13 | 2022-07-15 | 渊慧科技有限公司 | Data-driven robot control |
CN114020079B (zh) * | 2021-11-03 | 2022-09-16 | 北京邮电大学 | Method and device for regulating indoor space temperature and humidity |
US20230367697A1 (en) * | 2022-05-13 | 2023-11-16 | Microsoft Technology Licensing, Llc | Cloud architecture for reinforcement learning |
Family Cites Families (249)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2004068399A1 (ja) | 2003-01-31 | 2006-05-25 | 松下電器産業株式会社 | Predictive action determination device and action determination method |
US20160086222A1 (en) * | 2009-01-21 | 2016-03-24 | Truaxis, Inc. | Method and system to remind users of targeted offers in similar categories |
US8775341B1 (en) * | 2010-10-26 | 2014-07-08 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US9015093B1 (en) * | 2010-10-26 | 2015-04-21 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US8819523B2 (en) * | 2011-05-19 | 2014-08-26 | Cambridge Silicon Radio Limited | Adaptive controller for a configurable audio coding system |
US8793557B2 (en) * | 2011-05-19 | 2014-07-29 | Cambrige Silicon Radio Limited | Method and apparatus for real-time multidimensional adaptation of an audio coding system |
JP5874292B2 (ja) * | 2011-10-12 | 2016-03-02 | ソニー株式会社 | Information processing device, information processing method, and program |
US10803525B1 (en) * | 2014-02-19 | 2020-10-13 | Allstate Insurance Company | Determining a property of an insurance policy based on the autonomous features of a vehicle |
US10558987B2 (en) * | 2014-03-12 | 2020-02-11 | Adobe Inc. | System identification framework |
JP5984147B2 (ja) * | 2014-03-27 | 2016-09-06 | International Business Machines Corporation | Information processing device, information processing method, and program |
US10091785B2 (en) * | 2014-06-11 | 2018-10-02 | The Board Of Trustees Of The University Of Alabama | System and method for managing wireless frequency usage |
WO2016106238A1 (en) * | 2014-12-24 | 2016-06-30 | Google Inc. | Augmenting neural networks to generate additional outputs |
US11080587B2 (en) * | 2015-02-06 | 2021-08-03 | Deepmind Technologies Limited | Recurrent neural networks for data item generation |
CN106056213B (zh) * | 2015-04-06 | 2022-03-29 | 渊慧科技有限公司 | Selecting reinforcement learning actions using goals and observations |
CA2993551C (en) * | 2015-07-24 | 2022-10-11 | Google Llc | Continuous control with deep reinforcement learning |
US20170061283A1 (en) * | 2015-08-26 | 2017-03-02 | Applied Brain Research Inc. | Methods and systems for performing reinforcement learning in hierarchical and temporally extended environments |
WO2017044842A1 (en) * | 2015-09-11 | 2017-03-16 | Google Inc. | Training reinforcement learning neural networks |
US10380481B2 (en) * | 2015-10-08 | 2019-08-13 | Via Alliance Semiconductor Co., Ltd. | Neural network unit that performs concurrent LSTM cell calculations |
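JP6010204B1 (ja) * | 2015-10-26 | 2016-10-19 | ファナック株式会社 | Machine learning device and method for learning the predicted life of a power element, and life prediction device and motor drive device including the machine learning device |
CN108701252B (zh) * | 2015-11-12 | 2024-02-02 | 渊慧科技有限公司 | Training neural networks using a prioritized experience memory |
KR102172277B1 (ko) * | 2015-11-12 | 2020-10-30 | 딥마인드 테크놀로지스 리미티드 | Dual deep neural networks |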
US11072067B2 (en) * | 2015-11-16 | 2021-07-27 | Kindred Systems Inc. | Systems, devices, and methods for distributed artificial neural network computation |
US9536191B1 (en) * | 2015-11-25 | 2017-01-03 | Osaro, Inc. | Reinforcement learning using confidence scores |
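JP6193961B2 (ja) * | 2015-11-30 | 2017-09-06 | ファナック株式会社 | Machine learning device and method for optimizing the smoothness of feed of a machine's feed axis, and motor control device including the machine learning device |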
WO2017096079A1 (en) * | 2015-12-01 | 2017-06-08 | Google Inc. | Selecting action slates using reinforcement learning |
US10885432B1 (en) * | 2015-12-16 | 2021-01-05 | Deepmind Technologies Limited | Selecting actions from large discrete action sets using reinforcement learning |
CN108431549B (zh) * | 2016-01-05 | 2020-09-04 | 御眼视觉技术有限公司 | Trained system with imposed constraints |
US20170213150A1 (en) * | 2016-01-25 | 2017-07-27 | Osaro, Inc. | Reinforcement learning using a partitioned input state space |
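JP6339603B2 (ja) * | 2016-01-28 | 2018-06-06 | ファナック株式会社 | Machine learning device that learns laser machining start conditions, laser device, and machine learning method |
JP2017138881A (ja) * | 2016-02-05 | 2017-08-10 | ファナック株式会社 | Machine learner that learns the display of operation menus, numerical control device, machine tool system, manufacturing system, and machine learning method |
JP6669897B2 (ja) * | 2016-02-09 | 2020-03-18 | グーグル エルエルシー | Reinforcement learning using advantage estimates |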
EP3417242B1 (en) * | 2016-02-15 | 2022-12-21 | Allstate Insurance Company | Real time risk assessment and operational changes with semi-autonomous vehicles |
JP6360090B2 (ja) * | 2016-03-10 | 2018-07-18 | ファナック株式会社 | Machine learning device, laser device, and machine learning method |
JP6348137B2 (ja) * | 2016-03-24 | 2018-06-27 | ファナック株式会社 | Machining system that determines whether a workpiece is good or defective |
WO2017192183A1 (en) * | 2016-05-04 | 2017-11-09 | Google Llc | Augmenting neural networks with external memory using reinforcement learning |
EP3459018B1 (en) * | 2016-05-20 | 2021-10-20 | Deepmind Technologies Limited | Reinforcement learning using pseudo-counts |
US11521056B2 (en) * | 2016-06-17 | 2022-12-06 | Graham Fyffe | System and methods for intrinsic reward reinforcement learning |
JP2018004473A (ja) * | 2016-07-04 | 2018-01-11 | ファナック株式会社 | Machine learning device that learns the predicted life of a bearing, life prediction device, and machine learning method |
US10839310B2 (en) * | 2016-07-15 | 2020-11-17 | Google Llc | Selecting content items using reinforcement learning |
JP6506219B2 (ja) * | 2016-07-21 | 2019-04-24 | ファナック株式会社 | Machine learner that learns motor current commands, motor control device, and machine learning method |
WO2018022715A1 (en) * | 2016-07-26 | 2018-02-01 | University Of Connecticut | Early prediction of an intention of a user's actions |
DE202016004628U1 (de) * | 2016-07-27 | 2016-09-23 | Google Inc. | Traversing an environment state structure using neural networks |
US10049301B2 (en) * | 2016-08-01 | 2018-08-14 | Siemens Healthcare Gmbh | Medical scanner teaches itself to optimize clinical protocols and image acquisition |
US11080591B2 (en) * | 2016-09-06 | 2021-08-03 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
WO2018053187A1 (en) * | 2016-09-15 | 2018-03-22 | Google Inc. | Deep reinforcement learning for robotic manipulation |
US11188821B1 (en) * | 2016-09-15 | 2021-11-30 | X Development Llc | Control policies for collective robot learning |
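JP6514166B2 (ja) * | 2016-09-16 | 2019-05-15 | ファナック株式会社 | Machine learning device that learns a robot's operation program, robot system, and machine learning method |
CN115343947A (zh) * | 2016-09-23 | 2022-11-15 | 苹果公司 | Motion control decisions for autonomous vehicles |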
US20180100662A1 (en) * | 2016-10-11 | 2018-04-12 | Mitsubishi Electric Research Laboratories, Inc. | Method for Data-Driven Learning-based Control of HVAC Systems using High-Dimensional Sensory Observations |
US9989964B2 (en) * | 2016-11-03 | 2018-06-05 | Mitsubishi Electric Research Laboratories, Inc. | System and method for controlling vehicle using neural network |
EP3696737B1 (en) * | 2016-11-03 | 2022-08-31 | Deepmind Technologies Limited | Training action selection neural networks |
CN110088775B (zh) * | 2016-11-04 | 2023-11-07 | 渊慧科技有限公司 | Environment prediction using reinforcement learning |
WO2018085778A1 (en) * | 2016-11-04 | 2018-05-11 | Google Llc | Unsupervised detection of intermediate reinforcement learning goals |
KR102424893B1 (ko) * | 2016-11-04 | 2022-07-25 | 딥마인드 테크놀로지스 리미티드 | Reinforcement learning through auxiliary tasks |
US11062207B2 (en) * | 2016-11-04 | 2021-07-13 | Raytheon Technologies Corporation | Control systems using deep reinforcement learning |
CN108230057A (zh) * | 2016-12-09 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Intelligent recommendation method and system |
US20180165602A1 (en) * | 2016-12-14 | 2018-06-14 | Microsoft Technology Licensing, Llc | Scalability of reinforcement learning by separation of concerns |
CN110073376A (zh) * | 2016-12-14 | 2019-07-30 | 索尼公司 | Information processing device and information processing method |
US20200365015A1 (en) * | 2016-12-19 | 2020-11-19 | ThruGreen, LLC | Connected and adaptive vehicle traffic management system with digital prioritization |
EP3552156B8 (en) * | 2017-02-24 | 2022-08-03 | DeepMind Technologies Limited | Neural episodic control |
WO2018156891A1 (en) * | 2017-02-24 | 2018-08-30 | Google Llc | Training policy neural networks using path consistency learning |
US10373313B2 (en) * | 2017-03-02 | 2019-08-06 | Siemens Healthcare Gmbh | Spatially consistent multi-scale anatomical landmark detection in incomplete 3D-CT data |
US10542019B2 (en) * | 2017-03-09 | 2020-01-21 | International Business Machines Corporation | Preventing intersection attacks |
US10379538B1 (en) * | 2017-03-20 | 2019-08-13 | Zoox, Inc. | Trajectory generation using motion primitives |
US10345808B2 (en) * | 2017-03-30 | 2019-07-09 | Uber Technologies, Inc | Systems and methods to control autonomous vehicle motion |
CN110832509B (zh) * | 2017-04-12 | 2023-11-03 | 渊慧科技有限公司 | Black-box optimization using neural networks |
WO2018188981A1 (en) * | 2017-04-12 | 2018-10-18 | Koninklijke Philips N.V. | Drawing conclusions from free form texts with deep reinforcement learning |
EP3933713A1 (en) * | 2017-04-14 | 2022-01-05 | DeepMind Technologies Limited | Distributional reinforcement learning |
US10606898B2 (en) * | 2017-04-19 | 2020-03-31 | Brown University | Interpreting human-robot instructions |
EP3596662A1 (en) * | 2017-05-19 | 2020-01-22 | Deepmind Technologies Limited | Imagination-based agent neural networks |
EP3593289A1 (en) * | 2017-05-19 | 2020-01-15 | Deepmind Technologies Limited | Training action selection neural networks using a differentiable credit function |
CN117592504A (zh) * | 2017-05-26 | 2024-02-23 | 渊慧科技有限公司 | Method for training action selection neural networks |
DK3602409T3 (da) * | 2017-06-05 | 2024-01-29 | Deepmind Tech Ltd | Selecting actions using multimodal inputs |
EP3593292A1 (en) * | 2017-06-09 | 2020-01-15 | Deepmind Technologies Limited | Training action selection neural networks |
CN110785268B (zh) * | 2017-06-28 | 2023-04-04 | 谷歌有限责任公司 | Machine learning method and apparatus for semantic robotic grasping |
US10883844B2 (en) * | 2017-07-27 | 2021-01-05 | Waymo Llc | Neural networks for vehicle trajectory planning |
US11256983B2 (en) * | 2017-07-27 | 2022-02-22 | Waymo Llc | Neural networks for vehicle trajectory planning |
JP6756676B2 (ja) * | 2017-07-27 | 2020-09-16 | ファナック株式会社 | Manufacturing system |
US20200174490A1 (en) * | 2017-07-27 | 2020-06-04 | Waymo Llc | Neural networks for vehicle trajectory planning |
US11112796B2 (en) * | 2017-08-08 | 2021-09-07 | Uatc, Llc | Object motion prediction and autonomous vehicle control |
JP6564432B2 (ja) * | 2017-08-29 | 2019-08-21 | ファナック株式会社 | Machine learning device, control system, control device, and machine learning method |
EP3467717A1 (en) * | 2017-10-04 | 2019-04-10 | Prowler.io Limited | Machine learning system |
US10739776B2 (en) * | 2017-10-12 | 2020-08-11 | Honda Motor Co., Ltd. | Autonomous vehicle policy generation |
US10701641B2 (en) * | 2017-10-13 | 2020-06-30 | Apple Inc. | Interference mitigation in ultra-dense wireless networks |
EP3688675A1 (en) * | 2017-10-27 | 2020-08-05 | DeepMind Technologies Limited | Distributional reinforcement learning for continuous control tasks |
US20200285940A1 (en) * | 2017-10-27 | 2020-09-10 | Deepmind Technologies Limited | Machine learning systems with memory based parameter adaptation for learning fast and slower |
US11701773B2 (en) * | 2017-12-05 | 2023-07-18 | Google Llc | Viewpoint invariant visual servoing of robot end effector using recurrent neural network |
US10926408B1 (en) * | 2018-01-12 | 2021-02-23 | Amazon Technologies, Inc. | Artificial intelligence system for efficiently learning robotic control policies |
US20190244099A1 (en) * | 2018-02-05 | 2019-08-08 | Deepmind Technologies Limited | Continual reinforcement learning with a multi-task agent |
WO2019149949A1 (en) * | 2018-02-05 | 2019-08-08 | Deepmind Technologies Limited | Distributed training using off-policy actor-critic reinforcement learning |
US11221413B2 (en) * | 2018-03-14 | 2022-01-11 | Uatc, Llc | Three-dimensional object detection |
US11467590B2 (en) * | 2018-04-09 | 2022-10-11 | SafeAI, Inc. | Techniques for considering uncertainty in use of artificial intelligence models |
JP6740277B2 (ja) * | 2018-04-13 | 2020-08-12 | ファナック株式会社 | Machine learning device, control device, and machine learning method |
EP3782080A1 (en) * | 2018-04-18 | 2021-02-24 | DeepMind Technologies Limited | Neural networks for scalable continual learning in domains with sequentially learned tasks |
US11263531B2 (en) * | 2018-05-18 | 2022-03-01 | Deepmind Technologies Limited | Unsupervised control using learned rewards |
CN117549293A (zh) * | 2018-05-18 | 2024-02-13 | 谷歌有限责任公司 | Data-efficient hierarchical reinforcement learning |
US11370423B2 (en) * | 2018-06-15 | 2022-06-28 | Uatc, Llc | Multi-task machine-learned models for object intention determination in autonomous driving |
US11454975B2 (en) * | 2018-06-28 | 2022-09-27 | Uatc, Llc | Providing actionable uncertainties in autonomous vehicles |
US11397089B2 (en) * | 2018-07-13 | 2022-07-26 | Uatc, Llc | Autonomous vehicle routing with route extension |
JP6608010B1 (ja) * | 2018-07-25 | 2019-11-20 | 積水化学工業株式会社 | Control device, server, management system, computer program, learning model, and control method |
US11423295B2 (en) * | 2018-07-26 | 2022-08-23 | Sap Se | Dynamic, automated fulfillment of computer-based resource request provisioning using deep reinforcement learning |
US11537872B2 (en) * | 2018-07-30 | 2022-12-27 | International Business Machines Corporation | Imitation learning by action shaping with antagonist reinforcement learning |
US11734575B2 (en) * | 2018-07-30 | 2023-08-22 | International Business Machines Corporation | Sequential learning of constraints for hierarchical reinforcement learning |
EP3605334A1 (en) * | 2018-07-31 | 2020-02-05 | Prowler.io Limited | Incentive control for multi-agent systems |
JP7011239B2 (ja) * | 2018-08-17 | 2022-01-26 | 横河電機株式会社 | Apparatus, method, program, and recording medium |
US11833681B2 (en) * | 2018-08-24 | 2023-12-05 | Nvidia Corporation | Robotic control system |
WO2020047657A1 (en) * | 2018-09-04 | 2020-03-12 | Kindred Systems Inc. | Real-time real-world reinforcement learning systems and methods |
WO2020055759A1 (en) * | 2018-09-11 | 2020-03-19 | Nvidia Corporation | Future object trajectory predictions for autonomous machine applications |
US20220067850A1 (en) * | 2018-09-12 | 2022-03-03 | Electra Vehicles, Inc. | Systems and methods for managing energy storage systems |
US20210325894A1 (en) * | 2018-09-14 | 2021-10-21 | Google Llc | Deep reinforcement learning-based techniques for end to end robot navigation |
US20200097808A1 (en) * | 2018-09-21 | 2020-03-26 | International Business Machines Corporation | Pattern Identification in Reinforcement Learning |
US10872294B2 (en) * | 2018-09-27 | 2020-12-22 | Deepmind Technologies Limited | Imitation learning using a generative predecessor neural network |
WO2020064994A1 (en) * | 2018-09-27 | 2020-04-02 | Deepmind Technologies Limited | Reinforcement learning neural networks grounded in learned visual entities |
JP2022501090A (ja) * | 2018-09-27 | 2022-01-06 | クアンタム サージカル | Medical robot with automatic positioning means |
US11568207B2 (en) * | 2018-09-27 | 2023-01-31 | Deepmind Technologies Limited | Learning observation representations by predicting the future in latent space |
EP3788549B1 (en) * | 2018-09-27 | 2023-09-06 | DeepMind Technologies Limited | Stacked convolutional long short-term memory for model-free reinforcement learning |
US10831210B1 (en) * | 2018-09-28 | 2020-11-10 | Zoox, Inc. | Trajectory generation and optimization using closed-form numerical integration in route-relative coordinates |
JP6901450B2 (ja) * | 2018-10-02 | 2021-07-14 | ファナック株式会社 | Machine learning device, control device, and machine learning method |
US20210402598A1 (en) * | 2018-10-10 | 2021-12-30 | Sony Corporation | Robot control device, robot control method, and robot control program |
EP3640873A1 (en) * | 2018-10-17 | 2020-04-22 | Tata Consultancy Services Limited | System and method for concurrent dynamic optimization of replenishment decision in networked node environment |
SG11202104066UA (en) * | 2018-10-26 | 2021-05-28 | Dow Global Technologies Llc | Deep reinforcement learning for production scheduling |
US20210383218A1 (en) * | 2018-10-29 | 2021-12-09 | Google Llc | Determining control policies by minimizing the impact of delusion |
US20200134445A1 (en) * | 2018-10-31 | 2020-04-30 | Advanced Micro Devices, Inc. | Architecture for deep q learning |
US11231717B2 (en) * | 2018-11-08 | 2022-01-25 | Baidu Usa Llc | Auto-tuning motion planning system for autonomous vehicles |
JP6849643B2 (ja) * | 2018-11-09 | 2021-03-24 | ファナック株式会社 | Output device, control device, and method for outputting an evaluation function and machine learning results |
WO2020099672A1 (en) * | 2018-11-16 | 2020-05-22 | Deepmind Technologies Limited | Controlling agents using amortized q learning |
US11048253B2 (en) * | 2018-11-21 | 2021-06-29 | Waymo Llc | Agent prioritization for autonomous vehicles |
JP6970078B2 (ja) * | 2018-11-28 | 2021-11-24 | 株式会社東芝 | Robot motion planning device, robot system, and method |
KR101990326B1 (ko) * | 2018-11-28 | 2019-06-18 | 한국인터넷진흥원 | Reinforcement learning method with automatic adjustment of the discount rate |
US10997729B2 (en) * | 2018-11-30 | 2021-05-04 | Baidu Usa Llc | Real time object behavior prediction |
US11137762B2 (en) * | 2018-11-30 | 2021-10-05 | Baidu Usa Llc | Real time decision making for autonomous driving vehicles |
US11131992B2 (en) * | 2018-11-30 | 2021-09-28 | Denso International America, Inc. | Multi-level collaborative control system with dual neural network planning for autonomous vehicle control in a noisy environment |
WO2020132339A2 (en) * | 2018-12-19 | 2020-06-25 | Uatc, Llc | Routing autonomous vehicles using temporal data |
WO2020152364A1 (en) * | 2019-01-24 | 2020-07-30 | Deepmind Technologies Limited | Multi-agent reinforcement learning with matchmaking policies |
JP2020116869A (ja) * | 2019-01-25 | 2020-08-06 | セイコーエプソン株式会社 | Printing device, learning device, learning method, and learning program |
US20200272905A1 (en) * | 2019-02-26 | 2020-08-27 | GE Precision Healthcare LLC | Artificial neural network compression via iterative hybrid reinforcement learning approach |
US10700935B1 (en) * | 2019-02-27 | 2020-06-30 | Peritus.AI, Inc. | Automatic configuration and operation of complex systems |
CA3075156A1 (en) * | 2019-03-15 | 2020-09-15 | Mission Control Space Services Inc. | Terrain traficability assesment for autonomous or semi-autonomous rover or vehicle |
US20200310420A1 (en) * | 2019-03-26 | 2020-10-01 | GM Global Technology Operations LLC | System and method to train and select a best solution in a dynamical system |
US11132608B2 (en) * | 2019-04-04 | 2021-09-28 | Cisco Technology, Inc. | Learning-based service migration in mobile edge computing |
US11312372B2 (en) * | 2019-04-16 | 2022-04-26 | Ford Global Technologies, Llc | Vehicle path prediction |
JP7010877B2 (ja) * | 2019-04-25 | 2022-01-26 | ファナック株式会社 | Machine learning device, numerical control system, and machine learning method |
JP2022532853A (ja) * | 2019-04-30 | 2022-07-20 | ソウル マシーンズ リミティド | System for sequencing and planning |
US11701771B2 (en) * | 2019-05-15 | 2023-07-18 | Nvidia Corporation | Grasp generation using a variational autoencoder |
WO2020234476A1 (en) * | 2019-05-23 | 2020-11-26 | Deepmind Technologies Limited | Large scale generative neural network model with inference for representation learning using adversial training |
WO2020239641A1 (en) * | 2019-05-24 | 2020-12-03 | Deepmind Technologies Limited | Hierarchical policies for multitask transfer |
US11482210B2 (en) * | 2019-05-29 | 2022-10-25 | Lg Electronics Inc. | Artificial intelligence device capable of controlling other devices based on device information |
US11814046B2 (en) * | 2019-05-29 | 2023-11-14 | Motional Ad Llc | Estimating speed profiles |
JP7221423B6 (ja) * | 2019-06-10 | 2023-05-16 | ジョビー エアロ,インコーポレイテッド | Time-varying loudness prediction system |
EP3977227A4 (en) * | 2019-07-03 | 2023-01-25 | Waymo Llc | AGENT PATH PREDICTION USING ANCHOR PATHS |
WO2021004437A1 (en) * | 2019-07-05 | 2021-01-14 | Huawei Technologies Co., Ltd. | Method and system for predictive control of vehicle using digital images |
US20220269948A1 (en) * | 2019-07-12 | 2022-08-25 | Elektrobit Automotive Gmbh | Training of a convolutional neural network |
JP7342491B2 (ja) * | 2019-07-25 | 2023-09-12 | オムロン株式会社 | Inference device, inference method, and inference program |
US11481420B2 (en) * | 2019-08-08 | 2022-10-25 | Nice Ltd. | Systems and methods for analyzing computer input to provide next action |
US11407409B2 (en) * | 2019-08-13 | 2022-08-09 | Zoox, Inc. | System and method for trajectory validation |
SE1950924A1 (en) * | 2019-08-13 | 2021-02-14 | Kaaberg Johard Leonard | Improved machine learning for technical systems |
US11397434B2 (en) * | 2019-08-13 | 2022-07-26 | Zoox, Inc. | Consistency validation for vehicle trajectory selection |
US11458965B2 (en) * | 2019-08-13 | 2022-10-04 | Zoox, Inc. | Feasibility validation for vehicle trajectory selection |
US11599823B2 (en) * | 2019-08-14 | 2023-03-07 | International Business Machines Corporation | Quantum reinforcement learning agent |
WO2021040958A1 (en) * | 2019-08-23 | 2021-03-04 | Carrier Corporation | System and method for early event detection using generative and discriminative machine learning models |
EP4003664A1 (en) * | 2019-08-27 | 2022-06-01 | Google LLC | Future prediction, using stochastic adversarial based sampling, for robotic control |
US11132403B2 (en) * | 2019-09-06 | 2021-09-28 | Digital Asset Capital, Inc. | Graph-manipulation based domain-specific execution environment |
CN114761965A (zh) * | 2019-09-13 | 2022-07-15 | 渊慧科技有限公司 | Data-driven robot control |
EP4003665A1 (en) * | 2019-09-15 | 2022-06-01 | Google LLC | Determining environment-conditioned action sequences for robotic tasks |
CN114521262A (zh) * | 2019-09-25 | 2022-05-20 | 渊慧科技有限公司 | Controlling agents using causally correct environment models |
JP7335434B2 (ja) * | 2019-09-25 | 2023-08-29 | ディープマインド テクノロジーズ リミテッド | Training action selection neural networks using hindsight modeling |
US20210089908A1 (en) * | 2019-09-25 | 2021-03-25 | Deepmind Technologies Limited | Modulating agent behavior to optimize learning progress |
WO2021058583A1 (en) * | 2019-09-25 | 2021-04-01 | Deepmind Technologies Limited | Training action selection neural networks using q-learning combined with look ahead search |
US11650551B2 (en) * | 2019-10-04 | 2023-05-16 | Mitsubishi Electric Research Laboratories, Inc. | System and method for policy optimization using quasi-Newton trust region method |
US11645518B2 (en) * | 2019-10-07 | 2023-05-09 | Waymo Llc | Multi-agent simulations |
EP3812972A1 (en) * | 2019-10-25 | 2021-04-28 | Robert Bosch GmbH | Method for controlling a robot and robot controller |
US11586931B2 (en) * | 2019-10-31 | 2023-02-21 | Waymo Llc | Training trajectory scoring neural networks to accurately assign scores |
US20210133583A1 (en) * | 2019-11-05 | 2021-05-06 | Nvidia Corporation | Distributed weight update for backpropagation of a neural network |
US11912271B2 (en) * | 2019-11-07 | 2024-02-27 | Motional Ad Llc | Trajectory prediction from precomputed or dynamically generated bank of trajectories |
CN112937564B (zh) * | 2019-11-27 | 2022-09-02 | 魔门塔(苏州)科技有限公司 | Lane-change decision model generation method, and lane-change decision method and device for unmanned vehicles |
US11735045B2 (en) * | 2019-12-04 | 2023-08-22 | Uatc, Llc | Systems and methods for computational resource allocation for autonomous vehicles |
US11442459B2 (en) * | 2019-12-11 | 2022-09-13 | Uatc, Llc | Systems and methods for training predictive models for autonomous devices |
US20210192287A1 (en) * | 2019-12-18 | 2021-06-24 | Nvidia Corporation | Master transform architecture for deep learning |
CN111061277B (zh) * | 2019-12-31 | 2022-04-05 | 歌尔股份有限公司 | Global path planning method and device for an unmanned vehicle |
US11332165B2 (en) * | 2020-01-27 | 2022-05-17 | Honda Motor Co., Ltd. | Human trust calibration for autonomous driving agent of vehicle |
US11494649B2 (en) * | 2020-01-31 | 2022-11-08 | At&T Intellectual Property I, L.P. | Radio access network control with deep reinforcement learning |
US20220291666A1 (en) * | 2020-02-03 | 2022-09-15 | Strong Force TX Portfolio 2018, LLC | Ai solution selection for an automated robotic process |
EP4104104A1 (en) * | 2020-02-10 | 2022-12-21 | Deeplife | Generative digital twin of complex systems |
JP7234970B2 (ja) * | 2020-02-17 | 2023-03-08 | 株式会社デンソー | Vehicle behavior generation device, vehicle behavior generation method, and vehicle behavior generation program |
DE102020202350A1 (de) * | 2020-02-24 | 2021-08-26 | Volkswagen Aktiengesellschaft | Method and device for supporting maneuver planning for an automated vehicle or a robot |
US11717960B2 (en) * | 2020-02-25 | 2023-08-08 | Intelligrated Headquarters, Llc | Anti-sway control for a robotic arm with adaptive grasping |
US11759951B2 (en) * | 2020-02-28 | 2023-09-19 | Honda Motor Co., Ltd. | Systems and methods for incorporating latent states into robotic planning |
US11782438B2 (en) * | 2020-03-17 | 2023-10-10 | Nissan North America, Inc. | Apparatus and method for post-processing a decision-making model of an autonomous vehicle using multivariate data |
US20210327578A1 (en) * | 2020-04-08 | 2021-10-21 | Babylon Partners Limited | System and Method for Medical Triage Through Deep Q-Learning |
US20210334654A1 (en) * | 2020-04-24 | 2021-10-28 | Mastercard International Incorporated | Methods and systems for reducing bias in an artificial intelligence model |
WO2021220008A1 (en) * | 2020-04-29 | 2021-11-04 | Deep Render Ltd | Image compression and decoding, video compression and decoding: methods and systems |
WO2021232047A1 (en) * | 2020-05-12 | 2021-11-18 | Uber Technologies, Inc. | Vehicle routing using third party vehicle capabilities |
EP4162338A1 (en) * | 2020-06-05 | 2023-04-12 | Gatik AI Inc. | Method and system for deterministic trajectory selection based on uncertainty estimation for an autonomous agent |
EP4162721A4 (en) * | 2020-06-05 | 2024-03-06 | Ericsson Telefon Ab L M | MACHINE LEARNING-BASED DYNAMIC SPECTRUM SHARING |
US20210390409A1 (en) * | 2020-06-12 | 2021-12-16 | Google Llc | Training reinforcement learning agents using augmented temporal difference learning |
US20210397959A1 (en) * | 2020-06-22 | 2021-12-23 | Google Llc | Training reinforcement learning agents to learn expert exploration behaviors from demonstrators |
US11734624B2 (en) * | 2020-07-24 | 2023-08-22 | Genesys Cloud Services, Inc. | Method and system for scalable contact center agent scheduling utilizing automated AI modeling and multi-objective optimization |
US11835958B2 (en) * | 2020-07-28 | 2023-12-05 | Huawei Technologies Co., Ltd. | Predictive motion planning system and method |
US20220032949A1 (en) * | 2020-07-29 | 2022-02-03 | Uber Technologies, Inc. | Routing feature flags |
DE102020209685B4 (de) * | 2020-07-31 | 2023-07-06 | Robert Bosch Gesellschaft mit beschränkter Haftung | Method for controlling a robot device and robot device controller |
EP4196876A4 (en) * | 2020-08-14 | 2024-04-10 | Lancium Llc | PERFORMANCE-CONSCIOUS PLANNING |
JP7366860B2 (ja) * | 2020-08-17 | 2023-10-23 | 株式会社日立製作所 | Attack scenario simulation device, attack scenario generation system, and attack scenario generation method |
US11715007B2 (en) * | 2020-08-28 | 2023-08-01 | UMNAI Limited | Behaviour modeling, verification, and autonomous actions and triggers of ML and AI systems |
EP4205034A1 (en) * | 2020-10-02 | 2023-07-05 | DeepMind Technologies Limited | Training reinforcement learning agents using augmented temporal difference learning |
US20220129708A1 (en) * | 2020-10-22 | 2022-04-28 | Applied Materials Israel Ltd. | Segmenting an image using a neural network |
EP4244770A1 (en) * | 2020-11-12 | 2023-09-20 | Umnai Limited | Architecture for explainable reinforcement learning |
US20220152826A1 (en) * | 2020-11-13 | 2022-05-19 | Nvidia Corporation | Object rearrangement using learned implicit collision functions |
US20220164657A1 (en) * | 2020-11-25 | 2022-05-26 | Chevron U.S.A. Inc. | Deep reinforcement learning for field development planning optimization |
US20220188695A1 (en) * | 2020-12-16 | 2022-06-16 | Argo AI, LLC | Autonomous vehicle system for intelligent on-board selection of data for training a remote machine learning model |
US20220197280A1 (en) * | 2020-12-22 | 2022-06-23 | Uatc, Llc | Systems and Methods for Error Sourcing in Autonomous Vehicle Simulation |
US20210133633A1 (en) * | 2020-12-22 | 2021-05-06 | Intel Corporation | Autonomous machine knowledge transfer |
US20220204055A1 (en) * | 2020-12-30 | 2022-06-30 | Waymo Llc | Optimization of planning trajectories for multiple agents |
US20220207337A1 (en) * | 2020-12-31 | 2022-06-30 | Deepx Co., Ltd. | Method for artificial neural network and neural processing unit |
US20220234651A1 (en) * | 2021-01-25 | 2022-07-28 | GM Global Technology Operations LLC | Methods, systems, and apparatuses for adaptive driver override for path based automated driving assist |
CN114912041A (zh) * | 2021-01-29 | 2022-08-16 | 伊姆西Ip控股有限责任公司 | Information processing method, electronic device, and computer program product |
US20220261635A1 (en) * | 2021-02-12 | 2022-08-18 | DeepMind Technologies Limited | Training a policy neural network for controlling an agent using best response policy iteration |
US20220269937A1 (en) * | 2021-02-24 | 2022-08-25 | Nvidia Corporation | Generating frames for neural simulation using one or more neural networks |
US20220276657A1 (en) * | 2021-03-01 | 2022-09-01 | Samsung Electronics Co., Ltd. | Trajectory generation of a robot using a neural network |
US11475043B2 (en) * | 2021-03-05 | 2022-10-18 | International Business Machines Corporation | Machine learning based application of changes in a target database system |
US20220284261A1 (en) * | 2021-03-05 | 2022-09-08 | The Aerospace Corporation | Training-support-based machine learning classification and regression augmentation |
US20220300851A1 (en) * | 2021-03-18 | 2022-09-22 | Toyota Research Institute, Inc. | System and method for training a multi-task model |
US20220305649A1 (en) * | 2021-03-25 | 2022-09-29 | Naver Corporation | Reachable manifold and inverse mapping training for robots |
US20220309336A1 (en) * | 2021-03-26 | 2022-09-29 | Nvidia Corporation | Accessing tensors |
US11787055B2 (en) * | 2021-03-30 | 2023-10-17 | Honda Research Institute Europe Gmbh | Controlling a robot using predictive decision making |
US11945441B2 (en) * | 2021-03-31 | 2024-04-02 | Nissan North America, Inc. | Explainability and interface design for lane-level route planner |
US20220318557A1 (en) * | 2021-04-06 | 2022-10-06 | Nvidia Corporation | Techniques for identification of out-of-distribution input data in neural networks |
US20220335624A1 (en) * | 2021-04-15 | 2022-10-20 | Waymo Llc | Unsupervised training of optical flow estimation neural networks |
US11144847B1 (en) * | 2021-04-15 | 2021-10-12 | Latent Strategies LLC | Reinforcement learning using obfuscated environment models |
US11713059B2 (en) * | 2021-04-22 | 2023-08-01 | SafeAI, Inc. | Autonomous control of heavy equipment and vehicles using task hierarchies |
US20220355825A1 (en) * | 2021-04-23 | 2022-11-10 | Motional Ad Llc | Predicting agent trajectories |
US20220366220A1 (en) * | 2021-04-29 | 2022-11-17 | Nvidia Corporation | Dynamic weight updates for neural networks |
US20220366263A1 (en) * | 2021-05-06 | 2022-11-17 | Waymo Llc | Training distilled machine learning models using a pre-trained feature extractor |
US20220373980A1 (en) * | 2021-05-06 | 2022-11-24 | Massachusetts Institute Of Technology | Dymamic control of a manufacturing process using deep reinforcement learning |
US11546665B2 (en) * | 2021-05-07 | 2023-01-03 | Hulu, LLC | Reinforcement learning for guaranteed delivery of supplemental content |
US20220366235A1 (en) * | 2021-05-13 | 2022-11-17 | Deepmind Technologies Limited | Controlling operation of actor and learner computing units based on a usage rate of a replay memory |
CA3160224A1 (en) * | 2021-05-21 | 2022-11-21 | Royal Bank Of Canada | System and method for conditional marginal distributions at flexible evaluation horizons |
US20220398283A1 (en) * | 2021-05-25 | 2022-12-15 | Nvidia Corporation | Method for fast and better tree search for reinforcement learning |
US11941899B2 (en) * | 2021-05-26 | 2024-03-26 | Nvidia Corporation | Data selection based on uncertainty quantification |
US11921506B2 (en) * | 2021-05-28 | 2024-03-05 | Nissan North America, Inc. | Belief state determination for real-time decision-making |
US20220383074A1 (en) * | 2021-05-28 | 2022-12-01 | Deepmind Technologies Limited | Persistent message passing for graph neural networks |
US20230025154A1 (en) * | 2021-07-22 | 2023-01-26 | The Boeing Company | Dual agent reinforcement learning based system for autonomous operation of aircraft |
US20230075473A1 (en) * | 2021-09-09 | 2023-03-09 | Mycronic AB | Device and method for enabling deriving of corrected digital pattern descriptions |
US20230121913A1 (en) * | 2021-10-19 | 2023-04-20 | Volvo Car Corporation | Intelligent messaging framework for vehicle ecosystem communication |
US20230237342A1 (en) * | 2022-01-24 | 2023-07-27 | Nvidia Corporation | Adaptive lookahead for planning and learning |
CN114362175B (zh) * | 2022-03-10 | 2022-06-07 | 山东大学 | Wind power prediction method and system based on the deep deterministic policy gradient algorithm |
US11429845B1 (en) * | 2022-03-29 | 2022-08-30 | Intuit Inc. | Sparsity handling for machine learning model forecasting |
US20230376961A1 (en) * | 2022-05-19 | 2023-11-23 | Oracle Financial Services Software Limited | Reinforcement learning agent simulation to measure monitoring system strength |
US20240070485A1 (en) * | 2022-08-16 | 2024-02-29 | Optum, Inc. | Reinforcement learning for optimizing cross-channel communications |
CN115529278A (zh) * | 2022-09-07 | 2022-12-27 | 华东师范大学 | Automatic ECN regulation method for data center networks based on multi-agent reinforcement learning |

2017
- 2017-11-04 CN: application CN201780078702.3A, patent CN110088775B (zh), status Active
- 2017-11-04 CN: application CN202311473332.7A, patent CN117521725A (zh), status Pending
- 2017-11-04 EP: application EP17807934.9A, patent EP3523760B1 (en), status Active
- 2017-11-04 JP: application JP2019523612A, patent JP6728495B2 (ja), status Active
- 2017-11-04 WO: application PCT/IB2017/056902, patent WO2018083667A1 (en), status unknown

2019
- 2019-05-03 US: application US16/403,314, patent US10733501B2 (en), status Active

2020
- 2020-06-25 US: application US16/911,992, patent US20200327399A1 (en), status Pending
- 2020-06-29 JP: application JP2020111559A, patent JP6917508B2 (ja), status Active
Also Published As
Publication number | Publication date |
---|---|
CN110088775A (zh) | 2019-08-02 |
US20200327399A1 (en) | 2020-10-15 |
EP3523760B1 (en) | 2024-01-24 |
US20190259051A1 (en) | 2019-08-22 |
EP3523760A1 (en) | 2019-08-14 |
JP2020191097A (ja) | 2020-11-26 |
CN117521725A (zh) | 2024-02-06 |
US10733501B2 (en) | 2020-08-04 |
CN110088775B (zh) | 2023-11-07 |
JP2019537136A (ja) | 2019-12-19 |
WO2018083667A1 (en) | 2018-05-11 |
JP6917508B2 (ja) | 2021-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
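JP6728495B2 (ja) | | Environment prediction using reinforcement learning |
JP6926203B2 (ja) | | Reinforcement learning with auxiliary tasks |
JP6935550B2 (ja) | | Environment navigation using reinforcement learning |
CN110692066B (zh) | | Selecting actions using multimodal inputs |
JP7258965B2 (ja) | | Action selection for reinforcement learning using neural networks |
CN107851216B (zh) | | Method for selecting an action to be performed by a reinforcement learning agent interacting with an environment |
CN108027897B (zh) | | Continuous control with deep reinforcement learning |
US20230237375A1 | | Dynamic placement of computation sub-graphs |
JP2019537132A (ja) | | Training action selection neural networks |
US11200482B2 | | Recurrent environment predictors |
US10860895B2 | | Imagination-based agent neural networks |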
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2019-07-05 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 |
| TRDD | Decision of grant or rejection written | |
2020-06-01 | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
2020-07-01 | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61 |
| R150 | Certificate of patent or registration of utility model | Ref document number: 6728495; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |
| R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |