JP2019537136A - Environment prediction using reinforcement learning - Google Patents
- Publication number
- JP2019537136A
- Authority
- JP
- Japan
- Prior art keywords
- neural network
- state representation
- internal
- internal time
- time step
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/10—Interfaces, programming languages or software development kits, e.g. for simulating neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
Abstract
Description
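$v_k = r_{k+1} + \gamma_{k+1} r_{k+2} + \gamma_{k+1}\gamma_{k+2} r_{k+3} + \dots$
where $v_k$ is the value prediction at planning step $k$, $r_i$ is the predicted reward 116 at planning step $i$, and $\gamma_i$ is the predicted discount factor 118 at planning step $i$.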
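The discounted sum above can be sketched in a few lines. This is a minimal illustration, assuming the predicted rewards and discount factors after planning step k are available as finite Python lists; the function name `discounted_sum` and the truncation to a finite horizon are assumptions for illustration and do not come from the patent text.

```python
def discounted_sum(rewards, discounts):
    """Return r[0] + d[0]*r[1] + d[0]*d[1]*r[2] + ...

    rewards[j] stands for r_{k+1+j}; discounts[j] stands for gamma_{k+1+j}.
    The infinite sum in the text is truncated to the length of the lists.
    """
    total = 0.0
    running_discount = 1.0  # product of the discount factors applied so far
    for reward, discount in zip(rewards, discounts):
        total += running_discount * reward
        running_discount *= discount
    return total
```

The last discount factor in the list is never applied, because no further reward follows it in the truncated sum.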
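$g_k = r_1 + \gamma_1\left(r_2 + \gamma_2\left(\dots + \gamma_{k-1}\left(r_k + \gamma_k v_k\right)\dots\right)\right)$
where $g_k$ is the $k$-step return so determined, $r_i$ is the reward at planning step $i$, $\gamma_i$ is the discount factor at planning step $i$, and $v_k$ is the value prediction at planning step $k$.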
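The nested form of the k-step return lends itself to a backward loop that starts from the value prediction and folds in one reward and discount factor at a time. The following is a minimal sketch under the assumption that the per-step quantities are plain Python lists; the name `k_step_return` is hypothetical.

```python
def k_step_return(rewards, discounts, value_k):
    """g_k = r_1 + gamma_1 * (r_2 + gamma_2 * (... (r_k + gamma_k * v_k)))

    rewards[i] stands for r_{i+1} and discounts[i] for gamma_{i+1},
    for i = 0..k-1. value_k is the value prediction v_k.
    """
    g = value_k
    # Work from the innermost bracket outwards.
    for reward, discount in zip(reversed(rewards), reversed(discounts)):
        g = reward + discount * g
    return g
```

With empty lists the function returns `value_k`, which corresponds to the 0-step return.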
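$g_{k,\lambda} = (1-\lambda_k)\,v_k + \lambda_k\left(r_{k+1} + \gamma_{k+1}\,g_{k+1,\lambda}\right)$, and $g_{K,\lambda} = v_K$,
and the λ-weighted return $g_\lambda$ is determined as $g_{0,\lambda}$.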
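The recursion for the λ-weighted return can be evaluated backwards from planning step K down to step 0. The sketch below assumes list inputs indexed as described in the docstring; the function name `lambda_return` and the list layout are illustrative assumptions.

```python
def lambda_return(rewards, discounts, values, lambdas):
    """Backward recursion for g_{0,lambda}.

    values[k]    = v_k          for k = 0..K
    lambdas[k]   = lambda_k     for k = 0..K-1
    rewards[k]   = r_{k+1}      for k = 0..K-1
    discounts[k] = gamma_{k+1}  for k = 0..K-1
    """
    K = len(values) - 1
    g = values[K]  # boundary condition: g_{K,lambda} = v_K
    for k in range(K - 1, -1, -1):
        g = (1.0 - lambdas[k]) * values[k] + lambdas[k] * (
            rewards[k] + discounts[k] * g
        )
    return g
```

Setting every `lambdas[k]` to 1 recovers the K-step return, and setting them all to 0 recovers the value prediction `values[0]`.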
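102 Agent
104 Action
106 Environment
108 Observation
110 Aggregate reward
112 Accumulator
114 Internal state representation
116 Predicted reward
118 Predicted discount factor
120 Prediction neural network
122 State representation neural network
124 Value prediction neural network
126 Lambda neural network
128 Outcome
130 Training engine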
Claims (13)
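A system comprising:
a state representation neural network configured to:
receive one or more observations characterizing a state of an environment with which an agent is interacting, and
process the one or more observations to generate an internal state representation of the current environment state;
a prediction neural network configured to, for each of a plurality of internal time steps:
receive an internal state representation for the internal time step, and
process the internal state representation for the internal time step to generate:
an internal state representation for a next internal time step, and
a predicted reward for the next internal time step;
a value prediction neural network configured to, for each of the plurality of internal time steps:
receive the internal state representation for the internal time step, and
process the internal state representation for the internal time step to generate a value prediction that is an estimate of a future cumulative discounted reward from the next internal time step onwards; and
a predictron subsystem configured to:
receive one or more observations characterizing the state of the environment,
provide the one or more observations as input to the state representation neural network to generate an internal state representation of the current environment state,
for each of the plurality of internal time steps, use the prediction neural network and the value prediction neural network to generate, from the internal state representation for the internal time step, an internal state representation for the next internal time step, a predicted reward for the next internal time step, and a value prediction, and
determine an aggregate reward from the predicted rewards and the value predictions for the internal time steps.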
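The data flow of this claim can be sketched with plain functions standing in for the three neural networks. Everything named here (`predictron_rollout`, the `toy_*` functions, the constant `discount` argument) is a hypothetical stand-in for illustration; the claim does not prescribe these forms, and this sketch aggregates with a single K-step return rather than any particular weighting.

```python
def predictron_rollout(observation, represent, predict, value, num_steps, discount):
    """Roll the internal model forward and aggregate its predictions.

    represent(observation) -> internal state representation
    predict(state)         -> (next internal state, predicted reward)
    value(state)           -> value prediction for that state
    """
    state = represent(observation)
    rewards, values = [], [value(state)]
    for _ in range(num_steps):
        state, reward = predict(state)  # one internal time step
        rewards.append(reward)
        values.append(value(state))
    # Aggregate: discounted predicted rewards plus the final value prediction.
    aggregate = values[-1]
    for reward in reversed(rewards):
        aggregate = reward + discount * aggregate
    return aggregate, rewards, values


# Toy stand-ins for the three networks (assumptions, not the patent's models).
def toy_represent(observation):
    return sum(observation) / len(observation)


def toy_predict(state):
    return 0.5 * state, state  # next internal state, predicted reward


def toy_value(state):
    return 2.0 * state


aggregate, rewards, values = predictron_rollout(
    [1.0, 3.0], toy_represent, toy_predict, toy_value, num_steps=2, discount=0.5
)
```

Note that the internal time steps need not correspond to real environment steps: the rollout never consults the environment after the initial observation.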
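The system of claim 1, further configured to provide the aggregate reward as an estimate of a reward resulting from the environment being in the current state.
The system of claim 1 or 2.
The system of claim 2 or 3, further comprising a lambda neural network configured to, for each of the internal time steps, process the internal state representation for the current internal time step to generate a lambda factor for the next internal time step, wherein the predictron subsystem is configured to, in determining the aggregate reward, determine return factors for the internal time steps and use the lambda factors to determine weights for the return factors.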
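One way to turn per-step lambda factors into weights over the k-step returns is sketched below. The specific formula is an assumption, since the claim text does not spell it out: it is the weighting under which the weighted sum of k-step returns equals the backward recursion for $g_{0,\lambda}$ given in the description. The name `lambda_weights` is hypothetical.

```python
def lambda_weights(lambdas):
    """Weights w_0..w_K over the k-step returns g_0..g_K.

    w_k = (1 - lambda_k) * prod(lambda_j for j < k)  for k < K,
    w_K = prod(lambda_j for j < K),
    so the weights sum to 1.
    """
    weights = []
    carried = 1.0  # product of the lambda factors seen so far
    for lam in lambdas:
        weights.append((1.0 - lam) * carried)
        carried *= lam
    weights.append(carried)
    return weights
```

A lambda factor of 0 at step k puts all remaining weight on the k-step return and cuts off deeper rollouts, while a factor of 1 passes all of its weight on to deeper returns.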
The system of any one of claims 1 to 4.
The system of any one of claims 1 to 4.
The system of any one of claims 1 to 6.
The system of any one of claims 1 to 6.
One or more computer-readable storage media.
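A method comprising:
determining a gradient of a loss that is based on the aggregate reward and an estimate of a reward resulting from the environment being in the current state; and
backpropagating the gradient of the loss to update current values of parameters of the state representation neural network, the prediction neural network, the value prediction neural network, and the lambda neural network.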
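To make the update step concrete, the sketch below performs one gradient step for a single scalar parameter. It assumes a squared-error loss and a one-parameter linear predictor, neither of which is stated in the claim; with real networks the same chain rule would be applied by backpropagation through every network. The name `sgd_step` and its arguments are hypothetical.

```python
def sgd_step(weight, state, target, learning_rate=0.1):
    """One gradient step on L = 0.5 * (target - weight * state) ** 2."""
    prediction = weight * state  # stand-in for the aggregate reward
    error = target - prediction
    grad = -error * state  # dL/dweight by the chain rule
    return weight - learning_rate * grad
```

Repeated calls move `weight * state` towards `target`, which plays the role of the estimate of the reward resulting from the environment being in the current state.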
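A method comprising:
determining a gradient of a consistency loss that is based on a consistency of the return factors determined by the predictron subsystem for the internal time steps; and
backpropagating the gradient of the consistency loss to update current values of parameters of the state representation neural network, the prediction neural network, the value prediction neural network, and the lambda neural network.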
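A consistency loss of this kind could, for example, penalize the spread of the individual k-step returns around the λ-weighted return. The squared-error form below is an assumption for illustration and is not spelled out in the claim; `consistency_loss` is a hypothetical name.

```python
def consistency_loss(k_step_returns, lambda_return_value):
    """0.5 * sum_k (g_lambda - g_k)^2, with g_lambda treated as a fixed target."""
    return 0.5 * sum((lambda_return_value - g) ** 2 for g in k_step_returns)
```

Because this loss compares the model's own returns with each other, it needs no labeled outcome, so it could in principle be computed on unlabeled observations.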
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020111559A JP6917508B2 (ja) | 2016-11-04 | 2020-06-29 | Environment prediction using reinforcement learning |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662418159P | 2016-11-04 | 2016-11-04 | |
US62/418,159 | 2016-11-04 | ||
PCT/IB2017/056902 WO2018083667A1 (en) | 2016-11-04 | 2017-11-04 | Reinforcement learning systems |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020111559A Division JP6917508B2 (ja) | 2016-11-04 | 2020-06-29 | Environment prediction using reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2019537136A (ja) | 2019-12-19 |
JP6728495B2 (ja) | 2020-07-22 |
Family
ID=60515745
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019523612A Active JP6728495B2 (ja) | 2016-11-04 | 2017-11-04 | Environment prediction using reinforcement learning |
JP2020111559A Active JP6917508B2 (ja) | 2016-11-04 | 2020-06-29 | Environment prediction using reinforcement learning |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020111559A Active JP6917508B2 (ja) | 2016-11-04 | 2020-06-29 | Environment prediction using reinforcement learning |
Country Status (5)
Country | Link |
---|---|
US (2) | US10733501B2 (ja) |
EP (1) | EP3523760B1 (ja) |
JP (2) | JP6728495B2 (ja) |
CN (2) | CN117521725A (ja) |
WO (1) | WO2018083667A1 (ja) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117521725A (zh) * | 2016-11-04 | 2024-02-06 | DeepMind Technologies Limited | Reinforcement learning system |
US10692244B2 (en) | 2017-10-06 | 2020-06-23 | Nvidia Corporation | Learning based camera pose estimation from images of an environment |
WO2019241145A1 (en) | 2018-06-12 | 2019-12-19 | Intergraph Corporation | Artificial intelligence applications for computer-aided dispatch systems |
JP7139524B2 (ja) * | 2018-10-12 | 2022-09-20 | DeepMind Technologies Limited | Controlling agents over long time scales using temporal value transport |
US11313950B2 (en) | 2019-01-15 | 2022-04-26 | Image Sensing Systems, Inc. | Machine learning based highway radar vehicle classification across multiple lanes and speeds |
US11587552B2 (en) | 2019-04-30 | 2023-02-21 | Sutherland Global Services Inc. | Real time key conversational metrics prediction and notability |
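CN114761965A (zh) | 2019-09-13 | 2022-07-15 | DeepMind Technologies Limited | Data-driven robot control |
CN114020079B (zh) * | 2021-11-03 | 2022-09-16 | Beijing University of Posts and Telecommunications | Method and device for regulating indoor space temperature and humidity |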
US20230367697A1 (en) * | 2022-05-13 | 2023-11-16 | Microsoft Technology Licensing, Llc | Cloud architecture for reinforcement learning |
Family Cites Families (249)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7107107B2 (en) * | 2003-01-31 | 2006-09-12 | Matsushita Electric Industrial Co., Ltd. | Predictive action decision device and action decision method |
US20160086222A1 (en) * | 2009-01-21 | 2016-03-24 | Truaxis, Inc. | Method and system to remind users of targeted offers in similar categories |
US9015093B1 (en) * | 2010-10-26 | 2015-04-21 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US8775341B1 (en) * | 2010-10-26 | 2014-07-08 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US8793557B2 (en) * | 2011-05-19 | 2014-07-29 | Cambrige Silicon Radio Limited | Method and apparatus for real-time multidimensional adaptation of an audio coding system |
US8819523B2 (en) * | 2011-05-19 | 2014-08-26 | Cambridge Silicon Radio Limited | Adaptive controller for a configurable audio coding system |
JP5874292B2 (ja) * | 2011-10-12 | 2016-03-02 | Sony Corporation | Information processing apparatus, information processing method, and program |
US10803525B1 (en) * | 2014-02-19 | 2020-10-13 | Allstate Insurance Company | Determining a property of an insurance policy based on the autonomous features of a vehicle |
US10558987B2 (en) * | 2014-03-12 | 2020-02-11 | Adobe Inc. | System identification framework |
JP5984147B2 (ja) * | 2014-03-27 | 2016-09-06 | International Business Machines Corporation | Information processing apparatus, information processing method, and program |
US10091785B2 (en) * | 2014-06-11 | 2018-10-02 | The Board Of Trustees Of The University Of Alabama | System and method for managing wireless frequency usage |
US10691997B2 (en) * | 2014-12-24 | 2020-06-23 | Deepmind Technologies Limited | Augmenting neural networks to generate additional outputs |
US11080587B2 (en) * | 2015-02-06 | 2021-08-03 | Deepmind Technologies Limited | Recurrent neural networks for data item generation |
CN106056213B (zh) * | 2015-04-06 | 2022-03-29 | DeepMind Technologies Limited | Selecting reinforcement learning actions using goals and observations |
RU2686030C1 (ru) * | 2015-07-24 | 2019-04-23 | DeepMind Technologies Limited | Continuous control with deep reinforcement learning |
US20170061283A1 (en) * | 2015-08-26 | 2017-03-02 | Applied Brain Research Inc. | Methods and systems for performing reinforcement learning in hierarchical and temporally extended environments |
CN107851216B (zh) * | 2015-09-11 | 2022-03-08 | Google LLC | Method for selecting actions to be performed by a reinforcement learning agent interacting with an environment |
US10380481B2 (en) * | 2015-10-08 | 2019-08-13 | Via Alliance Semiconductor Co., Ltd. | Neural network unit that performs concurrent LSTM cell calculations |
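JP6010204B1 (ja) * | 2015-10-26 | 2016-10-19 | Fanuc Corporation | Machine learning apparatus and method for learning the predicted life of a power element, and life prediction apparatus and motor drive apparatus equipped with the machine learning apparatus |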
EP3360083B1 (en) * | 2015-11-12 | 2023-10-25 | DeepMind Technologies Limited | Dueling deep neural networks |
WO2017083767A1 (en) * | 2015-11-12 | 2017-05-18 | Google Inc. | Training neural networks using a prioritized experience memory |
US11072067B2 (en) * | 2015-11-16 | 2021-07-27 | Kindred Systems Inc. | Systems, devices, and methods for distributed artificial neural network computation |
US9536191B1 (en) * | 2015-11-25 | 2017-01-03 | Osaro, Inc. | Reinforcement learning using confidence scores |
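JP6193961B2 (ja) * | 2015-11-30 | 2017-09-06 | Fanuc Corporation | Machine learning apparatus and method for optimizing the smoothness of feed of a machine feed axis, and motor control apparatus equipped with the machine learning apparatus |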
US10699187B2 (en) * | 2015-12-01 | 2020-06-30 | Deepmind Technologies Limited | Selecting action slates using reinforcement learning |
US10885432B1 (en) * | 2015-12-16 | 2021-01-05 | Deepmind Technologies Limited | Selecting actions from large discrete action sets using reinforcement learning |
WO2017120336A2 (en) * | 2016-01-05 | 2017-07-13 | Mobileye Vision Technologies Ltd. | Trained navigational system with imposed constraints |
US20170213150A1 (en) * | 2016-01-25 | 2017-07-27 | Osaro, Inc. | Reinforcement learning using a partitioned input state space |
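JP6339603B2 (ja) * | 2016-01-28 | 2018-06-06 | Fanuc Corporation | Machine learning apparatus for learning laser machining start conditions, laser apparatus, and machine learning method |
JP2017138881A (ja) * | 2016-02-05 | 2017-08-10 | Fanuc Corporation | Machine learning device for learning display of an operation menu, numerical controller, machine tool system, manufacturing system, and machine learning method |
CN108701251B (zh) * | 2016-02-09 | 2022-08-12 | Google LLC | Reinforcement learning using advantage estimates |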
CA3014660C (en) * | 2016-02-15 | 2021-08-17 | Allstate Insurance Company | Early notification of non-autonomous area |
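JP6360090B2 (ja) * | 2016-03-10 | 2018-07-18 | Fanuc Corporation | Machine learning apparatus, laser apparatus, and machine learning method |
JP6348137B2 (ja) * | 2016-03-24 | 2018-06-27 | Fanuc Corporation | Machining system for determining the quality of a workpiece |
CN109661672B (zh) * | 2016-05-04 | 2023-08-22 | DeepMind Technologies Limited | Augmenting neural networks with external memory using reinforcement learning |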
EP3459018B1 (en) * | 2016-05-20 | 2021-10-20 | Deepmind Technologies Limited | Reinforcement learning using pseudo-counts |
WO2017218699A1 (en) * | 2016-06-17 | 2017-12-21 | Graham Leslie Fyffe | System and methods for intrinsic reward reinforcement learning |
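JP2018004473A (ja) * | 2016-07-04 | 2018-01-11 | Fanuc Corporation | Machine learning apparatus for learning the predicted life of a bearing, life prediction apparatus, and machine learning method |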
US10839310B2 (en) * | 2016-07-15 | 2020-11-17 | Google Llc | Selecting content items using reinforcement learning |
JP6506219B2 (ja) * | 2016-07-21 | 2019-04-24 | Fanuc Corporation | Machine learning device for learning a motor current command, motor control apparatus, and machine learning method |
WO2018022715A1 (en) * | 2016-07-26 | 2018-02-01 | University Of Connecticut | Early prediction of an intention of a user's actions |
DE202016004628U1 (de) * | 2016-07-27 | 2016-09-23 | Google Inc. | Traversing an environment state structure using neural networks |
US10049301B2 (en) * | 2016-08-01 | 2018-08-14 | Siemens Healthcare Gmbh | Medical scanner teaches itself to optimize clinical protocols and image acquisition |
US11080591B2 (en) * | 2016-09-06 | 2021-08-03 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
US11188821B1 (en) * | 2016-09-15 | 2021-11-30 | X Development Llc | Control policies for collective robot learning |
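KR102211012B1 (ko) * | 2016-09-15 | 2021-02-03 | Google LLC | Deep reinforcement learning for robotic manipulation |
JP6514166B2 (ja) * | 2016-09-16 | 2019-05-15 | Fanuc Corporation | Machine learning apparatus for learning a robot operation program, robot system, and machine learning method |
CN115343947A (zh) * | 2016-09-23 | 2022-11-15 | Apple Inc. | Motion control decisions for autonomous vehicles |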
US20180100662A1 (en) * | 2016-10-11 | 2018-04-12 | Mitsubishi Electric Research Laboratories, Inc. | Method for Data-Driven Learning-based Control of HVAC Systems using High-Dimensional Sensory Observations |
JP6827539B2 (ja) * | 2016-11-03 | 2021-02-10 | DeepMind Technologies Limited | Training action selection neural networks |
US9989964B2 (en) * | 2016-11-03 | 2018-06-05 | Mitsubishi Electric Research Laboratories, Inc. | System and method for controlling vehicle using neural network |
US11062207B2 (en) * | 2016-11-04 | 2021-07-13 | Raytheon Technologies Corporation | Control systems using deep reinforcement learning |
US11580360B2 (en) * | 2016-11-04 | 2023-02-14 | Google Llc | Unsupervised detection of intermediate reinforcement learning goals |
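CN117521725A (zh) * | 2016-11-04 | 2024-02-06 | DeepMind Technologies Limited | Reinforcement learning system |
KR102424893B1 (ko) * | 2016-11-04 | 2022-07-25 | DeepMind Technologies Limited | Reinforcement learning with auxiliary tasks |
CN108230057A (zh) * | 2016-12-09 | 2018-06-29 | Alibaba Group Holding Limited | Intelligent recommendation method and system |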
EP3557493A4 (en) * | 2016-12-14 | 2020-01-08 | Sony Corporation | INFORMATION PROCESSING DEVICE AND METHOD |
US20180165602A1 (en) * | 2016-12-14 | 2018-06-14 | Microsoft Technology Licensing, Llc | Scalability of reinforcement learning by separation of concerns |
US20200365015A1 (en) * | 2016-12-19 | 2020-11-19 | ThruGreen, LLC | Connected and adaptive vehicle traffic management system with digital prioritization |
JP6817456B2 (ja) * | 2017-02-24 | 2021-01-20 | DeepMind Technologies Limited | Neural episodic control |
WO2018156891A1 (en) * | 2017-02-24 | 2018-08-30 | Google Llc | Training policy neural networks using path consistency learning |
US10373313B2 (en) * | 2017-03-02 | 2019-08-06 | Siemens Healthcare Gmbh | Spatially consistent multi-scale anatomical landmark detection in incomplete 3D-CT data |
US10542019B2 (en) * | 2017-03-09 | 2020-01-21 | International Business Machines Corporation | Preventing intersection attacks |
US10379538B1 (en) * | 2017-03-20 | 2019-08-13 | Zoox, Inc. | Trajectory generation using motion primitives |
US10345808B2 (en) * | 2017-03-30 | 2019-07-09 | Uber Technologies, Inc | Systems and methods to control autonomous vehicle motion |
CN117313789A (zh) * | 2017-04-12 | 2023-12-29 | DeepMind Technologies Limited | Black-box optimization using neural networks |
WO2018188981A1 (en) * | 2017-04-12 | 2018-10-18 | Koninklijke Philips N.V. | Drawing conclusions from free form texts with deep reinforcement learning |
EP3933713A1 (en) * | 2017-04-14 | 2022-01-05 | DeepMind Technologies Limited | Distributional reinforcement learning |
US10606898B2 (en) * | 2017-04-19 | 2020-03-31 | Brown University | Interpreting human-robot instructions |
WO2018211139A1 (en) * | 2017-05-19 | 2018-11-22 | Deepmind Technologies Limited | Training action selection neural networks using a differentiable credit function |
WO2018211142A1 (en) * | 2017-05-19 | 2018-11-22 | Deepmind Technologies Limited | Imagination-based agent neural networks |
CN117592504A (zh) * | 2017-05-26 | 2024-02-23 | DeepMind Technologies Limited | Method of training an action selection neural network |
DK3602409T3 (da) * | 2017-06-05 | 2024-01-29 | Deepmind Tech Ltd | Selecting actions using multimodal inputs |
EP3593292A1 (en) * | 2017-06-09 | 2020-01-15 | Deepmind Technologies Limited | Training action selection neural networks |
WO2019006091A2 (en) * | 2017-06-28 | 2019-01-03 | Google Llc | METHODS AND APPARATUS FOR MACHINE LEARNING FOR SEMANTIC ROBOTIC SEIZURE |
JP6756676B2 (ja) * | 2017-07-27 | 2020-09-16 | Fanuc Corporation | Manufacturing system |
US11256983B2 (en) * | 2017-07-27 | 2022-02-22 | Waymo Llc | Neural networks for vehicle trajectory planning |
US20200174490A1 (en) * | 2017-07-27 | 2020-06-04 | Waymo Llc | Neural networks for vehicle trajectory planning |
US10883844B2 (en) * | 2017-07-27 | 2021-01-05 | Waymo Llc | Neural networks for vehicle trajectory planning |
US11112796B2 (en) * | 2017-08-08 | 2021-09-07 | Uatc, Llc | Object motion prediction and autonomous vehicle control |
JP6564432B2 (ja) * | 2017-08-29 | 2019-08-21 | Fanuc Corporation | Machine learning apparatus, control system, control apparatus, and machine learning method |
EP3467717A1 (en) * | 2017-10-04 | 2019-04-10 | Prowler.io Limited | Machine learning system |
US10739776B2 (en) * | 2017-10-12 | 2020-08-11 | Honda Motor Co., Ltd. | Autonomous vehicle policy generation |
US10701641B2 (en) * | 2017-10-13 | 2020-06-30 | Apple Inc. | Interference mitigation in ultra-dense wireless networks |
US20200285940A1 (en) * | 2017-10-27 | 2020-09-10 | Deepmind Technologies Limited | Machine learning systems with memory based parameter adaptation for learning fast and slower |
EP3688675A1 (en) * | 2017-10-27 | 2020-08-05 | DeepMind Technologies Limited | Distributional reinforcement learning for continuous control tasks |
WO2019113067A2 (en) * | 2017-12-05 | 2019-06-13 | Google Llc | Viewpoint invariant visual servoing of robot end effector using recurrent neural network |
US10926408B1 (en) * | 2018-01-12 | 2021-02-23 | Amazon Technologies, Inc. | Artificial intelligence system for efficiently learning robotic control policies |
US20190244099A1 (en) * | 2018-02-05 | 2019-08-08 | Deepmind Technologies Limited | Continual reinforcement learning with a multi-task agent |
WO2019149949A1 (en) * | 2018-02-05 | 2019-08-08 | Deepmind Technologies Limited | Distributed training using off-policy actor-critic reinforcement learning |
US11221413B2 (en) * | 2018-03-14 | 2022-01-11 | Uatc, Llc | Three-dimensional object detection |
US11467590B2 (en) * | 2018-04-09 | 2022-10-11 | SafeAI, Inc. | Techniques for considering uncertainty in use of artificial intelligence models |
JP6740277B2 (ja) * | 2018-04-13 | 2020-08-12 | Fanuc Corporation | Machine learning apparatus, control apparatus, and machine learning method |
WO2019202073A1 (en) * | 2018-04-18 | 2019-10-24 | Deepmind Technologies Limited | Neural networks for scalable continual learning in domains with sequentially learned tasks |
US20210187733A1 (en) * | 2018-05-18 | 2021-06-24 | Google Llc | Data-efficient hierarchical reinforcement learning |
US11263531B2 (en) * | 2018-05-18 | 2022-03-01 | Deepmind Technologies Limited | Unsupervised control using learned rewards |
US11370423B2 (en) * | 2018-06-15 | 2022-06-28 | Uatc, Llc | Multi-task machine-learned models for object intention determination in autonomous driving |
US11454975B2 (en) * | 2018-06-28 | 2022-09-27 | Uatc, Llc | Providing actionable uncertainties in autonomous vehicles |
US11397089B2 (en) * | 2018-07-13 | 2022-07-26 | Uatc, Llc | Autonomous vehicle routing with route extension |
JP6608010B1 (ja) * | 2018-07-25 | 2019-11-20 | Sekisui Chemical Co., Ltd. | Control device, server, management system, computer program, learning model, and control method |
US11423295B2 (en) * | 2018-07-26 | 2022-08-23 | Sap Se | Dynamic, automated fulfillment of computer-based resource request provisioning using deep reinforcement learning |
US11537872B2 (en) * | 2018-07-30 | 2022-12-27 | International Business Machines Corporation | Imitation learning by action shaping with antagonist reinforcement learning |
US11734575B2 (en) * | 2018-07-30 | 2023-08-22 | International Business Machines Corporation | Sequential learning of constraints for hierarchical reinforcement learning |
EP3605334A1 (en) * | 2018-07-31 | 2020-02-05 | Prowler.io Limited | Incentive control for multi-agent systems |
JP7011239B2 (ja) * | 2018-08-17 | 2022-01-26 | Yokogawa Electric Corporation | Apparatus, method, program, and recording medium |
US11833681B2 (en) * | 2018-08-24 | 2023-12-05 | Nvidia Corporation | Robotic control system |
EP3824358A4 (en) * | 2018-09-04 | 2022-04-13 | Kindred Systems Inc. | REAL-TIME-REAL-WORLD REINFORCEMENT LEARNING SYSTEMS AND METHODS |
CN113056749A (zh) * | 2018-09-11 | 2021-06-29 | Nvidia Corporation | Future object trajectory prediction for autonomous machine applications |
EP3850551A4 (en) * | 2018-09-12 | 2022-10-12 | Electra Vehicles, Inc. | SYSTEMS AND METHODS FOR MANAGEMENT OF ENERGY STORAGE SYSTEMS |
EP3837641A1 (en) * | 2018-09-14 | 2021-06-23 | Google LLC | Deep reinforcement learning-based techniques for end to end robot navigation |
US20200097808A1 (en) * | 2018-09-21 | 2020-03-26 | International Business Machines Corporation | Pattern Identification in Reinforcement Learning |
CN112770687B (zh) * | 2018-09-27 | 2024-03-29 | Quantum Surgical | Medical robot comprising an automatic positioning mechanism |
US10872294B2 (en) * | 2018-09-27 | 2020-12-22 | Deepmind Technologies Limited | Imitation learning using a generative predecessor neural network |
KR20210011422A (ko) * | 2018-09-27 | 2021-02-01 | DeepMind Technologies Limited | Stacked convolutional long short-term memory for model-free reinforcement learning |
EP3834138A1 (en) * | 2018-09-27 | 2021-06-16 | DeepMind Technologies Limited | Reinforcement learning neural networks grounded in learned visual entities |
US11568207B2 (en) * | 2018-09-27 | 2023-01-31 | Deepmind Technologies Limited | Learning observation representations by predicting the future in latent space |
US10831210B1 (en) * | 2018-09-28 | 2020-11-10 | Zoox, Inc. | Trajectory generation and optimization using closed-form numerical integration in route-relative coordinates |
JP6901450B2 (ja) * | 2018-10-02 | 2021-07-14 | Fanuc Corporation | Machine learning apparatus, control apparatus, and machine learning method |
US20210402598A1 (en) * | 2018-10-10 | 2021-12-30 | Sony Corporation | Robot control device, robot control method, and robot control program |
EP3640873A1 (en) * | 2018-10-17 | 2020-04-22 | Tata Consultancy Services Limited | System and method for concurrent dynamic optimization of replenishment decision in networked node environment |
CA3116855A1 (en) * | 2018-10-26 | 2020-04-30 | Dow Global Technologies Llc | Deep reinforcement learning for production scheduling |
US20210383218A1 (en) * | 2018-10-29 | 2021-12-09 | Google Llc | Determining control policies by minimizing the impact of delusion |
US20200134445A1 (en) * | 2018-10-31 | 2020-04-30 | Advanced Micro Devices, Inc. | Architecture for deep q learning |
US11231717B2 (en) * | 2018-11-08 | 2022-01-25 | Baidu Usa Llc | Auto-tuning motion planning system for autonomous vehicles |
JP6849643B2 (ja) * | 2018-11-09 | 2021-03-24 | Fanuc Corporation | Output device, control device, and method for outputting evaluation functions and machine learning results |
US11868866B2 (en) * | 2018-11-16 | 2024-01-09 | Deep Mind Technologies Limited | Controlling agents using amortized Q learning |
US11048253B2 (en) * | 2018-11-21 | 2021-06-29 | Waymo Llc | Agent prioritization for autonomous vehicles |
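JP6970078B2 (ja) * | 2018-11-28 | 2021-11-24 | Toshiba Corporation | Robot motion planning apparatus, robot system, and method |
KR101990326B1 (ko) * | 2018-11-28 | 2019-06-18 | Korea Internet & Security Agency | Reinforcement learning method with automatic discount factor adjustment |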
US11131992B2 (en) * | 2018-11-30 | 2021-09-28 | Denso International America, Inc. | Multi-level collaborative control system with dual neural network planning for autonomous vehicle control in a noisy environment |
US10997729B2 (en) * | 2018-11-30 | 2021-05-04 | Baidu Usa Llc | Real time object behavior prediction |
US11137762B2 (en) * | 2018-11-30 | 2021-10-05 | Baidu Usa Llc | Real time decision making for autonomous driving vehicles |
US11519742B2 (en) * | 2018-12-19 | 2022-12-06 | Uber Technologies, Inc. | Routing autonomous vehicles using temporal data |
EP3899797A1 (en) * | 2019-01-24 | 2021-10-27 | DeepMind Technologies Limited | Multi-agent reinforcement learning with matchmaking policies |
JP2020116869A (ja) * | 2019-01-25 | 2020-08-06 | Seiko Epson Corporation | Printing apparatus, learning apparatus, learning method, and learning program |
US20200272905A1 (en) * | 2019-02-26 | 2020-08-27 | GE Precision Healthcare LLC | Artificial neural network compression via iterative hybrid reinforcement learning approach |
US10700935B1 (en) * | 2019-02-27 | 2020-06-30 | Peritus.AI, Inc. | Automatic configuration and operation of complex systems |
CA3075156A1 (en) * | 2019-03-15 | 2020-09-15 | Mission Control Space Services Inc. | Terrain traficability assesment for autonomous or semi-autonomous rover or vehicle |
US20200310420A1 (en) * | 2019-03-26 | 2020-10-01 | GM Global Technology Operations LLC | System and method to train and select a best solution in a dynamical system |
US11132608B2 (en) * | 2019-04-04 | 2021-09-28 | Cisco Technology, Inc. | Learning-based service migration in mobile edge computing |
US11312372B2 (en) * | 2019-04-16 | 2022-04-26 | Ford Global Technologies, Llc | Vehicle path prediction |
JP7010877B2 (ja) * | 2019-04-25 | 2022-01-26 | Fanuc Corporation | Machine learning apparatus, numerical control system, and machine learning method |
EP3963520A4 (en) * | 2019-04-30 | 2023-01-11 | Soul Machines | SYSTEM FOR SEQUENCING AND PLANNING |
US11701771B2 (en) * | 2019-05-15 | 2023-07-18 | Nvidia Corporation | Grasp generation using a variational autoencoder |
US11875269B2 (en) * | 2019-05-23 | 2024-01-16 | Deepmind Technologies Limited | Large scale generative neural network model with inference for representation learning using adversarial training |
EP3948670A1 (en) * | 2019-05-24 | 2022-02-09 | DeepMind Technologies Limited | Hierarchical policies for multitask transfer |
US11814046B2 (en) * | 2019-05-29 | 2023-11-14 | Motional Ad Llc | Estimating speed profiles |
US11482210B2 (en) * | 2019-05-29 | 2022-10-25 | Lg Electronics Inc. | Artificial intelligence device capable of controlling other devices based on device information |
JP7221423B6 (ja) * | 2019-06-10 | 2023-05-16 | Joby Aero, Inc. | Time-varying loudness prediction system |
WO2021003379A1 (en) * | 2019-07-03 | 2021-01-07 | Waymo Llc | Agent trajectory prediction using anchor trajectories |
WO2021004437A1 (en) * | 2019-07-05 | 2021-01-14 | Huawei Technologies Co., Ltd. | Method and system for predictive control of vehicle using digital images |
US20220269948A1 (en) * | 2019-07-12 | 2022-08-25 | Elektrobit Automotive Gmbh | Training of a convolutional neural network |
JP7342491B2 (ja) * | 2019-07-25 | 2023-09-12 | Omron Corporation | Inference apparatus, inference method, and inference program |
US11481420B2 (en) * | 2019-08-08 | 2022-10-25 | Nice Ltd. | Systems and methods for analyzing computer input to provide next action |
US11458965B2 (en) * | 2019-08-13 | 2022-10-04 | Zoox, Inc. | Feasibility validation for vehicle trajectory selection |
US11397434B2 (en) * | 2019-08-13 | 2022-07-26 | Zoox, Inc. | Consistency validation for vehicle trajectory selection |
SE1950924A1 (en) * | 2019-08-13 | 2021-02-14 | Kaaberg Johard Leonard | Improved machine learning for technical systems |
US11407409B2 (en) * | 2019-08-13 | 2022-08-09 | Zoox, Inc. | System and method for trajectory validation |
US11599823B2 (en) * | 2019-08-14 | 2023-03-07 | International Business Machines Corporation | Quantum reinforcement learning agent |
WO2021040958A1 (en) * | 2019-08-23 | 2021-03-04 | Carrier Corporation | System and method for early event detection using generative and discriminative machine learning models |
EP4003664A1 (en) * | 2019-08-27 | 2022-06-01 | Google LLC | Future prediction, using stochastic adversarial based sampling, for robotic control |
US11132403B2 (en) * | 2019-09-06 | 2021-09-28 | Digital Asset Capital, Inc. | Graph-manipulation based domain-specific execution environment |
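CN114761965A (zh) * | 2019-09-13 | 2022-07-15 | DeepMind Technologies Limited | Data-driven robot control |
CN114423574A (zh) * | 2019-09-15 | 2022-04-29 | Google LLC | Determining environment-conditioned action sequences for robotic tasks |
JP7335434B2 (ja) * | 2019-09-25 | 2023-08-29 | DeepMind Technologies Limited | Training action selection neural networks using hindsight modeling |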
US20210089908A1 (en) * | 2019-09-25 | 2021-03-25 | Deepmind Technologies Limited | Modulating agent behavior to optimize learning progress |
EP4014162A1 (en) * | 2019-09-25 | 2022-06-22 | DeepMind Technologies Limited | Controlling agents using causally correct environment models |
CN114467100A (zh) * | 2019-09-25 | 2022-05-10 | DeepMind Technologies Limited | Training action selection neural networks using Q-learning combined with look-ahead search |
US11650551B2 (en) * | 2019-10-04 | 2023-05-16 | Mitsubishi Electric Research Laboratories, Inc. | System and method for policy optimization using quasi-Newton trust region method |
US11645518B2 (en) * | 2019-10-07 | 2023-05-09 | Waymo Llc | Multi-agent simulations |
EP3812972A1 (en) * | 2019-10-25 | 2021-04-28 | Robert Bosch GmbH | Method for controlling a robot and robot controller |
US11586931B2 (en) * | 2019-10-31 | 2023-02-21 | Waymo Llc | Training trajectory scoring neural networks to accurately assign scores |
US20210133583A1 (en) * | 2019-11-05 | 2021-05-06 | Nvidia Corporation | Distributed weight update for backpropagation of a neural network |
US11912271B2 (en) * | 2019-11-07 | 2024-02-27 | Motional Ad Llc | Trajectory prediction from precomputed or dynamically generated bank of trajectories |
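CN112937564B (zh) * | 2019-11-27 | 2022-09-02 | Momenta (Suzhou) Technology Co., Ltd. | Lane-change decision model generation method, and lane-change decision method and device for unmanned vehicles |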
US11735045B2 (en) * | 2019-12-04 | 2023-08-22 | Uatc, Llc | Systems and methods for computational resource allocation for autonomous vehicles |
US11442459B2 (en) * | 2019-12-11 | 2022-09-13 | Uatc, Llc | Systems and methods for training predictive models for autonomous devices |
US20210192287A1 (en) * | 2019-12-18 | 2021-06-24 | Nvidia Corporation | Master transform architecture for deep learning |
CN111061277B (zh) * | 2019-12-31 | 2022-04-05 | Goertek Inc. | Global path planning method and device for an unmanned vehicle |
US11332165B2 (en) * | 2020-01-27 | 2022-05-17 | Honda Motor Co., Ltd. | Human trust calibration for autonomous driving agent of vehicle |
US11494649B2 (en) * | 2020-01-31 | 2022-11-08 | At&T Intellectual Property I, L.P. | Radio access network control with deep reinforcement learning |
US20220291666A1 (en) * | 2020-02-03 | 2022-09-15 | Strong Force TX Portfolio 2018, LLC | Ai solution selection for an automated robotic process |
EP4104104A1 (en) * | 2020-02-10 | 2022-12-21 | Deeplife | Generative digital twin of complex systems |
JP7234970B2 (ja) * | 2020-02-17 | 2023-03-08 | Denso Corporation | Vehicle behavior generation device, vehicle behavior generation method, and vehicle behavior generation program |
DE102020202350A1 (de) * | 2020-02-24 | 2021-08-26 | Volkswagen Aktiengesellschaft | Method and device for supporting maneuver planning for an automated driving vehicle or a robot |
US11717960B2 (en) * | 2020-02-25 | 2023-08-08 | Intelligrated Headquarters, Llc | Anti-sway control for a robotic arm with adaptive grasping |
US11759951B2 (en) * | 2020-02-28 | 2023-09-19 | Honda Motor Co., Ltd. | Systems and methods for incorporating latent states into robotic planning |
US11782438B2 (en) * | 2020-03-17 | 2023-10-10 | Nissan North America, Inc. | Apparatus and method for post-processing a decision-making model of an autonomous vehicle using multivariate data |
US20210327578A1 (en) * | 2020-04-08 | 2021-10-21 | Babylon Partners Limited | System and Method for Medical Triage Through Deep Q-Learning |
US20210334654A1 (en) * | 2020-04-24 | 2021-10-28 | Mastercard International Incorporated | Methods and systems for reducing bias in an artificial intelligence model |
EP4144087A1 (en) * | 2020-04-29 | 2023-03-08 | Deep Render Ltd | Image compression and decoding, video compression and decoding: methods and systems |
US20210356965A1 (en) * | 2020-05-12 | 2021-11-18 | Uber Technologies, Inc. | Vehicle routing using third party vehicle capabilities |
WO2021246925A1 (en) * | 2020-06-05 | 2021-12-09 | Telefonaktiebolaget Lm Ericsson (Publ) | Dynamic spectrum sharing based on machine learning |
CA3180999A1 (en) * | 2020-06-05 | 2021-12-09 | Gatik Ai Inc. | Method and system for deterministic trajectory selection based on uncertainty estimation for an autonomous agent |
US20210390409A1 (en) * | 2020-06-12 | 2021-12-16 | Google Llc | Training reinforcement learning agents using augmented temporal difference learning |
US20210397959A1 (en) * | 2020-06-22 | 2021-12-23 | Google Llc | Training reinforcement learning agents to learn expert exploration behaviors from demonstrators |
JP2023537278A (ja) * | 2020-07-24 | 2023-08-31 | Genesys Cloud Services Holdings II, LLC | Method and system for scalable contact center agent scheduling utilizing automated AI modeling and multi-objective optimization |
US11835958B2 (en) * | 2020-07-28 | 2023-12-05 | Huawei Technologies Co., Ltd. | Predictive motion planning system and method |
WO2022027057A1 (en) * | 2020-07-29 | 2022-02-03 | Uber Technologies, Inc. | Routing feature flags |
DE102020209685B4 (de) * | 2020-07-31 | 2023-07-06 | Robert Bosch Gesellschaft mit beschränkter Haftung | Method for controlling a robot device and robot device controller |
CA3189144A1 (en) * | 2020-08-14 | 2022-02-17 | Andrew GRIMSHAW | Power aware scheduling |
JP7366860B2 (ja) * | 2020-08-17 | 2023-10-23 | Hitachi, Ltd. | Attack scenario simulation device, attack scenario generation system, and attack scenario generation method |
US11715007B2 (en) * | 2020-08-28 | 2023-08-01 | UMNAI Limited | Behaviour modeling, verification, and autonomous actions and triggers of ML and AI systems |
WO2022069747A1 (en) * | 2020-10-02 | 2022-04-07 | Deepmind Technologies Limited | Training reinforcement learning agents using augmented temporal difference learning |
US20220129708A1 (en) * | 2020-10-22 | 2022-04-28 | Applied Materials Israel Ltd. | Segmenting an image using a neural network |
WO2022101452A1 (en) * | 2020-11-12 | 2022-05-19 | UMNAI Limited | Architecture for explainable reinforcement learning |
US20220152826A1 (en) * | 2020-11-13 | 2022-05-19 | Nvidia Corporation | Object rearrangement using learned implicit collision functions |
US20220164657A1 (en) * | 2020-11-25 | 2022-05-26 | Chevron U.S.A. Inc. | Deep reinforcement learning for field development planning optimization |
US20220188695A1 (en) * | 2020-12-16 | 2022-06-16 | Argo AI, LLC | Autonomous vehicle system for intelligent on-board selection of data for training a remote machine learning model |
US20210133633A1 (en) * | 2020-12-22 | 2021-05-06 | Intel Corporation | Autonomous machine knowledge transfer |
US20220197280A1 (en) * | 2020-12-22 | 2022-06-23 | Uatc, Llc | Systems and Methods for Error Sourcing in Autonomous Vehicle Simulation |
US20220204055A1 (en) * | 2020-12-30 | 2022-06-30 | Waymo Llc | Optimization of planning trajectories for multiple agents |
US20220207337A1 (en) * | 2020-12-31 | 2022-06-30 | Deepx Co., Ltd. | Method for artificial neural network and neural processing unit |
US20220234651A1 (en) * | 2021-01-25 | 2022-07-28 | GM Global Technology Operations LLC | Methods, systems, and apparatuses for adaptive driver override for path based automated driving assist |
CN114912041A (zh) * | 2021-01-29 | 2022-08-16 | EMC IP Holding Company LLC | Information processing method, electronic device, and computer program product |
US20220261635A1 (en) * | 2021-02-12 | 2022-08-18 | DeepMind Technologies Limited | Training a policy neural network for controlling an agent using best response policy iteration |
US20220269937A1 (en) * | 2021-02-24 | 2022-08-25 | Nvidia Corporation | Generating frames for neural simulation using one or more neural networks |
US20220276657A1 (en) * | 2021-03-01 | 2022-09-01 | Samsung Electronics Co., Ltd. | Trajectory generation of a robot using a neural network |
US20220284261A1 (en) * | 2021-03-05 | 2022-09-08 | The Aerospace Corporation | Training-support-based machine learning classification and regression augmentation |
US11475043B2 (en) * | 2021-03-05 | 2022-10-18 | International Business Machines Corporation | Machine learning based application of changes in a target database system |
US20220300851A1 (en) * | 2021-03-18 | 2022-09-22 | Toyota Research Institute, Inc. | System and method for training a multi-task model |
US20220305649A1 (en) * | 2021-03-25 | 2022-09-29 | Naver Corporation | Reachable manifold and inverse mapping training for robots |
US20220309336A1 (en) * | 2021-03-26 | 2022-09-29 | Nvidia Corporation | Accessing tensors |
US11787055B2 (en) * | 2021-03-30 | 2023-10-17 | Honda Research Institute Europe Gmbh | Controlling a robot using predictive decision making |
US11945441B2 (en) * | 2021-03-31 | 2024-04-02 | Nissan North America, Inc. | Explainability and interface design for lane-level route planner |
US20220318557A1 (en) * | 2021-04-06 | 2022-10-06 | Nvidia Corporation | Techniques for identification of out-of-distribution input data in neural networks |
US11144847B1 (en) * | 2021-04-15 | 2021-10-12 | Latent Strategies LLC | Reinforcement learning using obfuscated environment models |
EP4080452A1 (en) * | 2021-04-15 | 2022-10-26 | Waymo LLC | Unsupervised training of optical flow estimation neural networks |
US11713059B2 (en) * | 2021-04-22 | 2023-08-01 | SafeAI, Inc. | Autonomous control of heavy equipment and vehicles using task hierarchies |
KR20230166129A (ko) * | 2021-04-23 | 2023-12-06 | Motional AD LLC | Agent trajectory prediction |
US20220366220A1 (en) * | 2021-04-29 | 2022-11-17 | Nvidia Corporation | Dynamic weight updates for neural networks |
US20220373980A1 (en) * | 2021-05-06 | 2022-11-24 | Massachusetts Institute Of Technology | Dynamic control of a manufacturing process using deep reinforcement learning |
US20220366263A1 (en) * | 2021-05-06 | 2022-11-17 | Waymo Llc | Training distilled machine learning models using a pre-trained feature extractor |
US11546665B2 (en) * | 2021-05-07 | 2023-01-03 | Hulu, LLC | Reinforcement learning for guaranteed delivery of supplemental content |
US20220366235A1 (en) * | 2021-05-13 | 2022-11-17 | Deepmind Technologies Limited | Controlling operation of actor and learner computing units based on a usage rate of a replay memory |
US20220383075A1 (en) * | 2021-05-21 | 2022-12-01 | Royal Bank Of Canada | System and method for conditional marginal distributions at flexible evaluation horizons |
US20220398283A1 (en) * | 2021-05-25 | 2022-12-15 | Nvidia Corporation | Method for fast and better tree search for reinforcement learning |
US11941899B2 (en) * | 2021-05-26 | 2024-03-26 | Nvidia Corporation | Data selection based on uncertainty quantification |
US20220383074A1 (en) * | 2021-05-28 | 2022-12-01 | Deepmind Technologies Limited | Persistent message passing for graph neural networks |
US11921506B2 (en) * | 2021-05-28 | 2024-03-05 | Nissan North America, Inc. | Belief state determination for real-time decision-making |
US20230025154A1 (en) * | 2021-07-22 | 2023-01-26 | The Boeing Company | Dual agent reinforcement learning based system for autonomous operation of aircraft |
US20230075473A1 (en) * | 2021-09-09 | 2023-03-09 | Mycronic AB | Device and method for enabling deriving of corrected digital pattern descriptions |
US20230121913A1 (en) * | 2021-10-19 | 2023-04-20 | Volvo Car Corporation | Intelligent messaging framework for vehicle ecosystem communication |
US20230237342A1 (en) * | 2022-01-24 | 2023-07-27 | Nvidia Corporation | Adaptive lookahead for planning and learning |
CN114362175B (zh) * | 2022-03-10 | 2022-06-07 | Shandong University | Wind power prediction method and system based on the deep deterministic policy gradient algorithm |
US11429845B1 (en) * | 2022-03-29 | 2022-08-30 | Intuit Inc. | Sparsity handling for machine learning model forecasting |
US20230376961A1 (en) * | 2022-05-19 | 2023-11-23 | Oracle Financial Services Software Limited | Reinforcement learning agent simulation to measure monitoring system strength |
US20240070485A1 (en) * | 2022-08-16 | 2024-02-29 | Optum, Inc. | Reinforcement learning for optimizing cross-channel communications |
CN115529278A (zh) * | 2022-09-07 | 2022-12-27 | East China Normal University | Automatic ECN regulation method for data center networks based on multi-agent reinforcement learning |

2017
- 2017-11-04: CN application CN202311473332.7A (publication CN117521725A, zh), status Pending
- 2017-11-04: EP application EP17807934.9A (publication EP3523760B1, en), status Active
- 2017-11-04: CN application CN201780078702.3A (publication CN110088775B, zh), status Active
- 2017-11-04: WO application PCT/IB2017/056902 (publication WO2018083667A1, en), status unknown
- 2017-11-04: JP application JP2019523612A (publication JP6728495B2, ja), status Active

2019
- 2019-05-03: US application US16/403,314 (publication US10733501B2, en), status Active

2020
- 2020-06-25: US application US16/911,992 (publication US20200327399A1, en), status Pending
- 2020-06-29: JP application JP2020111559A (publication JP6917508B2, ja), status Active

Also Published As
Publication number | Publication date |
---|---|
CN110088775B (zh) | 2023-11-07 |
EP3523760B1 (en) | 2024-01-24 |
JP6917508B2 (ja) | 2021-08-11 |
CN110088775A (zh) | 2019-08-02 |
CN117521725A (zh) | 2024-02-06 |
US20190259051A1 (en) | 2019-08-22 |
US20200327399A1 (en) | 2020-10-15 |
US10733501B2 (en) | 2020-08-04 |
WO2018083667A1 (en) | 2018-05-11 |
JP6728495B2 (ja) | 2020-07-22 |
JP2020191097A (ja) | 2020-11-26 |
EP3523760A1 (en) | 2019-08-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
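JP6917508B2 (ja) | Environment prediction using reinforcement learning | |
JP6926203B2 (ja) | Reinforcement learning with auxiliary tasks | |
JP6935550B2 (ja) | Environment navigation using reinforcement learning | |
JP7258965B2 (ja) | Action selection for reinforcement learning using neural networks | |
CN110235148B (zh) | Training action selection neural networks | |
CN107851216B (zh) | A method for selecting actions to be performed by a reinforcement learning agent interacting with an environment | |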
US20210201156A1 (en) | Sample-efficient reinforcement learning | |
US20230237375A1 (en) | Dynamic placement of computation sub-graphs | |
US11627165B2 (en) | Multi-agent reinforcement learning with matchmaking policies | |
CN110692066A (zh) | Selecting actions using multimodal inputs | |
US11200482B2 (en) | Recurrent environment predictors | |
US10860895B2 (en) | Imagination-based agent neural networks | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20190705 |
| | TRDD | Decision of grant or rejection written | |
| | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20200601 |
| | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20200701 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 6728495; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |
| | R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |