WO2020143848A3 - Determining action selection policies of an execution device - Google Patents
Determining action selection policies of an execution device
- Publication number
- WO2020143848A3
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- subtask
- snn
- sequence
- action selection
- execution device
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Supply And Distribution Of Alternating Current (AREA)
- Feedback Control In General (AREA)
Abstract
Computer-implemented methods, systems, and apparatus, including computer-readable media, for generating an action selection policy for causing an execution device to complete a task are described. Data representing a task that is divided into a sequence of subtasks are obtained. Data specifying a strategy neural network (SNN) for a subtask in the sequence of subtasks are obtained. The SNN receives inputs that include a sequence of actions that reach an initial state of the subtask, and predicts an action selection policy of the execution device for the subtask. The SNN is trained based on a value neural network (VNN) for the next subtask that follows the subtask in the sequence of subtasks. An input to the SNN is determined. The input includes a sequence of actions that reach the initial state of the subtask. An action selection policy for completing the subtask is determined based on an output of the SNN.
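The abstract describes a chain in which the SNN for each subtask is tied to the VNN of the subtask that follows it. The sketch below illustrates only that dependency structure, under loudly stated assumptions that do not come from the patent: plain Python closures stand in for the trained SNN and VNN (there are no neural networks and no training), each subtask is a single decision from a hypothetical two-action set, the "SNN" output is a softmax over the next subtask's value estimates, and the "VNN" of a subtask is the expected next-subtask value under that policy. The names `ACTIONS`, `terminal_payoff`, `make_snn`, `make_vnn`, and `build_policies` are hypothetical illustrations.

```python
import functools
import math
from typing import Callable, Dict, List, Tuple

ActionSeq = Tuple[str, ...]  # sequence of actions that reaches a state
Policy = Dict[str, float]  # action -> probability
ValueFn = Callable[[ActionSeq], float]
PolicyFn = Callable[[ActionSeq], Policy]

ACTIONS = ("a", "b")  # hypothetical action set, identical in every subtask


def terminal_payoff(seq: ActionSeq) -> float:
    """Hypothetical payoff once the whole task is done: +1 per 'a' taken."""
    return float(seq.count("a"))


def make_snn(next_vnn: ValueFn, temperature: float = 1.0) -> PolicyFn:
    """Stand-in 'SNN' for one subtask (not a neural network).

    Maps the action sequence reaching the subtask's initial state to a
    softmax policy over the next subtask's value estimates.
    """
    def snn(seq: ActionSeq) -> Policy:
        scores = [next_vnn(seq + (a,)) / temperature for a in ACTIONS]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        return {a: w / total for a, w in zip(ACTIONS, weights)}
    return snn


def make_vnn(snn: PolicyFn, next_vnn: ValueFn) -> ValueFn:
    """Stand-in 'VNN' for a subtask: expected next-subtask value under its SNN."""
    @functools.lru_cache(maxsize=None)  # memoize; action sequences are hashable tuples
    def vnn(seq: ActionSeq) -> float:
        policy = snn(seq)
        return sum(p * next_vnn(seq + (a,)) for a, p in policy.items())
    return vnn


def build_policies(num_subtasks: int) -> List[PolicyFn]:
    """Build one policy function per subtask, working backward from the last."""
    next_vnn: ValueFn = terminal_payoff
    snns: List[PolicyFn] = []
    for _ in range(num_subtasks):
        snn = make_snn(next_vnn)  # this subtask's SNN depends on the next VNN
        snns.append(snn)
        next_vnn = make_vnn(snn, next_vnn)  # becomes the "next VNN" one level up
    snns.reverse()  # snns[k] now belongs to subtask k
    return snns


if __name__ == "__main__":
    policies = build_policies(3)
    history: ActionSeq = ()
    for snn in policies:
        policy = snn(history)  # input: actions that reach this subtask's initial state
        history += (max(policy, key=policy.get),)
    print(history)  # ('a', 'a', 'a')
```

Building the chain backward mirrors the abstract's ordering constraint: a subtask's policy can only be computed once a value estimate exists for the subtask after it. In this toy setting every "SNN" prefers `"a"` with probability e/(e+1), because each `"a"` raises the terminal payoff by exactly 1.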
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202080004369.3A CN112533681A (en) | 2020-04-02 | 2020-04-02 | Determining action selection guidelines for an execution device |
PCT/CN2020/082914 WO2020143848A2 (en) | 2020-04-02 | 2020-04-02 | Determining action selection policies of an execution device |
SG11202102364YA SG11202102364YA (en) | 2020-04-02 | 2020-04-02 | Determining action selection policies of an execution device |
US17/219,038 US11204803B2 (en) | 2020-04-02 | 2021-03-31 | Determining action selection policies of an execution device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/082914 WO2020143848A2 (en) | 2020-04-02 | 2020-04-02 | Determining action selection policies of an execution device |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/219,038 Continuation US11204803B2 (en) | 2020-04-02 | 2021-03-31 | Determining action selection policies of an execution device |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2020143848A2 WO2020143848A2 (en) | 2020-07-16 |
WO2020143848A3 WO2020143848A3 (en) | 2021-01-28 |
Family
ID=71522304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/082914 WO2020143848A2 (en) | 2020-04-02 | 2020-04-02 | Determining action selection policies of an execution device |
Country Status (4)
Country | Link |
---|---|
US (1) | US11204803B2 (en) |
CN (1) | CN112533681A (en) |
SG (1) | SG11202102364YA (en) |
WO (1) | WO2020143848A2 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SG11202010204TA (en) | 2019-12-12 | 2020-11-27 | Alipay Hangzhou Inf Tech Co Ltd | Determining action selection policies of an execution device |
SG11202010721QA (en) | 2019-12-12 | 2020-11-27 | Alipay Hangzhou Inf Tech Co Ltd | Determining action selection policies of execution device |
WO2020098822A2 (en) | 2019-12-12 | 2020-05-22 | Alipay (Hangzhou) Information Technology Co., Ltd. | Determining action selection policies of an execution device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160260024A1 (en) * | 2015-03-04 | 2016-09-08 | Qualcomm Incorporated | System of distributed planning |
CN109101339A (en) * | 2018-08-15 | 2018-12-28 | 北京邮电大学 | Video task parallelization method and device in a heterogeneous cluster, and heterogeneous cluster environment |
CN110170171A (en) * | 2019-06-03 | 2019-08-27 | 深圳市腾讯网域计算机网络有限公司 | Control method and device for a target object |
US20190332922A1 (en) * | 2017-02-24 | 2019-10-31 | Google Llc | Training policy neural networks using path consistency learning |
US20190354813A1 (en) * | 2017-01-31 | 2019-11-21 | Deepmind Technologies Limited | Data-efficient reinforcement learning for continuous control tasks |
CN110489223A (en) * | 2019-08-26 | 2019-11-22 | 北京邮电大学 | Task scheduling method and device in a heterogeneous cluster, and electronic equipment |
CN110882544A (en) * | 2019-11-28 | 2020-03-17 | 网易(杭州)网络有限公司 | Multi-agent training method and device and electronic equipment |
Family Cites Families (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2459254C2 (en) | 2007-04-27 | 2012-08-20 | Сименс Акциенгезелльшафт | Method for computer-aided training of one or more neural networks |
US20090276385A1 (en) * | 2008-04-30 | 2009-11-05 | Stanley Hill | Artificial-Neural-Networks Training Artificial-Neural-Networks |
US20140039913A1 (en) | 2012-07-31 | 2014-02-06 | Tuomas W. Sandholm | Medical treatment planning via sequential games |
EP3872715A1 (en) * | 2015-11-12 | 2021-09-01 | Deepmind Technologies Limited | Asynchronous deep reinforcement learning |
US10002029B1 (en) * | 2016-02-05 | 2018-06-19 | Sas Institute Inc. | Automated transfer of neural network definitions among federated areas |
US10057367B2 (en) | 2016-03-02 | 2018-08-21 | Huawei Technologies Canada Co., Ltd. | Systems and methods for data caching in a communications network |
DE202016004627U1 (en) * | 2016-07-27 | 2016-09-23 | Google Inc. | Training a neural value network |
DE202016004628U1 (en) * | 2016-07-27 | 2016-09-23 | Google Inc. | Traversing an environment state structure using neural networks |
CN106296006A (en) | 2016-08-10 | 2017-01-04 | 哈尔滨工业大学深圳研究生院 | Minimum-regret evaluation method for balancing risk and revenue in imperfect-information games |
US10694526B2 (en) | 2016-09-30 | 2020-06-23 | Drexel University | Adaptive pursuit learning method to mitigate small-cell interference through directionality |
EP3516595B1 (en) * | 2016-11-03 | 2020-09-09 | Deepmind Technologies Limited | Training action selection neural networks |
JP6926203B2 (en) * | 2016-11-04 | 2021-08-25 | ディープマインド テクノロジーズ リミテッド | Reinforcement learning with auxiliary tasks |
EP3535702B1 (en) * | 2016-11-04 | 2024-05-01 | Google LLC | Unsupervised detection of intermediate reinforcement learning goals |
US20180189950A1 (en) * | 2016-12-30 | 2018-07-05 | Google Inc. | Generating structured output predictions using neural networks |
WO2018211141A1 (en) * | 2017-05-19 | 2018-11-22 | Deepmind Technologies Limited | Imagination-based agent neural networks |
WO2018215665A1 (en) * | 2017-05-26 | 2018-11-29 | Deepmind Technologies Limited | Training action selection neural networks using look-ahead search |
CN116957055A (en) * | 2017-06-05 | 2023-10-27 | 渊慧科技有限公司 | Selecting actions using multimodal input |
CN110574048B (en) * | 2017-06-09 | 2023-07-07 | 渊慧科技有限公司 | Training action selection neural network |
US11138513B2 (en) | 2017-06-13 | 2021-10-05 | Princeton University | Dynamic learning system |
CN110651279B (en) * | 2017-06-28 | 2023-11-07 | 渊慧科技有限公司 | Training action selection neural networks using apprentices |
EP3616128A1 (en) * | 2017-08-25 | 2020-03-04 | Google LLC | Batched reinforcement learning |
US10846109B2 (en) * | 2017-12-20 | 2020-11-24 | Google Llc | Suggesting actions based on machine learning |
WO2019133052A1 (en) * | 2017-12-28 | 2019-07-04 | Yang Shao Wen | Visual fog |
US11688160B2 (en) * | 2018-01-17 | 2023-06-27 | Huawei Technologies Co., Ltd. | Method of generating training data for training a neural network, method of training a neural network and using neural network for autonomous operations |
US20190244099A1 (en) * | 2018-02-05 | 2019-08-08 | Deepmind Technologies Limited | Continual reinforcement learning with a multi-task agent |
US20190324795A1 (en) * | 2018-04-24 | 2019-10-24 | Microsoft Technology Licensing, Llc | Composite task execution |
US20190392309A1 (en) * | 2018-06-21 | 2019-12-26 | Denso International America, Inc. | LSTM Training For Neural Network Based Course Of Action Selection |
CN112292701A (en) | 2019-01-17 | 2021-01-29 | 创新先进技术有限公司 | Conducting policy search in multi-party policy interaction |
SG11202001804QA (en) * | 2019-05-15 | 2020-12-30 | Advanced New Technologies Co Ltd | Determining action selection policies of an execution device |
US11714990B2 (en) * | 2019-05-23 | 2023-08-01 | Deepmind Technologies Limited | Jointly learning exploratory and non-exploratory action selection policies |
US11227167B2 (en) * | 2019-06-28 | 2022-01-18 | Baidu Usa Llc | Determining vanishing points based on lane lines |
CN110327624B (en) * | 2019-07-03 | 2023-03-17 | 广州多益网络股份有限公司 | Game following method and system based on curriculum reinforcement learning |
US20210158162A1 (en) * | 2019-11-27 | 2021-05-27 | Google Llc | Training reinforcement learning agents to learn farsighted behaviors by predicting in latent space |
WO2021153969A1 (en) * | 2020-01-27 | 2021-08-05 | Samsung Electronics Co., Ltd. | Methods and systems for managing processing of neural network across heterogeneous processors |
-
2020
- 2020-04-02 SG SG11202102364YA: status unknown
- 2020-04-02 CN CN202080004369.3A (CN112533681A): active, pending
- 2020-04-02 WO PCT/CN2020/082914 (WO2020143848A2): active, application filing
-
2021
- 2021-03-31 US US17/219,038 (US11204803B2): active
Also Published As
Publication number | Publication date |
---|---|
US11204803B2 (en) | 2021-12-21 |
SG11202102364YA (en) | 2021-04-29 |
US20210311777A1 (en) | 2021-10-07 |
CN112533681A (en) | 2021-03-19 |
WO2020143848A2 (en) | 2020-07-16 |
Similar Documents
Publication | Title |
---|---|
WO2020143847A3 (en) | Determining action selection policies of an execution device |
US20220351091A1 (en) | Training distilled machine learning models |
TWI627592B (en) | Neural network processor |
CN109328362B (en) | Progressive neural network |
TWI767000B (en) | Method and computer storage medium of generating waveform |
KR102387113B1 (en) | Non-verbal Evaluation Method, System and Computer-readable Medium Based on Machine Learning |
JP2023501257A (en) | Identifying Optimal Weights to Improve Prediction Accuracy in Machine Learning Techniques |
WO2020023960A8 (en) | Cloud-based, data-driven artificial intelligence and machine learning financial planning and analysis visualization platform |
JP2020149044A (en) | Voice separation device, method, and storage medium |
JP2016218513A (en) | Neural network and computer program therefor |
US20200302293A1 (en) | Methods and systems for field development decision optimization |
Nishanov et al. | METHODS OF INDISTINCT REGULATION IN MANAGEMENT PROBLEMS EDUCATIONAL PROCESS |
Stativko | Some Approaches to Analysis of Learning Trajectory Correction Using Theory of Fuzzy Sets |
EP3340122A1 (en) | Computationally-efficient spike train filtering |
Volna et al. | Prediction by means of Elliott waves recognition |
TWI658458B (en) | Method for improving the performance of singing voice separation, non-transitory computer readable medium and computer program product thereof |
Tapaswini et al. | Non-probabilistic solution of uncertain vibration equation of large membranes using Adomian decomposition method |
Gebretsadik et al. | Designing Machine Learning Method for Software Project Effort Prediction |
Kienitz et al. | Deep option pricing-term structure models |
JP7211556B1 (en) | Neural network system |
JP2020027245A5 (en) | Information processing method, information processing apparatus, and program |
CN110874312B (en) | Crowd-sourcing machine suitable for heterogeneous multi-intelligent-agent and implementation method thereof |
US20220310068A1 (en) | Methods and devices for structured pruning for automatic speech recognition |
Pepe et al. | AI4SE and SE4AI: Setting the Roadmap toward Human‐Machine Co‐Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 20738988; Country of ref document: EP; Kind code of ref document: A2 |