WO2020143848A3 - Determining action selection policies of an execution device - Google Patents


Info

Publication number
WO2020143848A3
Authority
WO
WIPO (PCT)
Prior art keywords
subtask
snn
sequence
action selection
execution device
Application number
PCT/CN2020/082914
Other languages
French (fr)
Other versions
WO2020143848A2 (en)
Inventor
Hui Li
Le SONG
Original Assignee
Alipay (Hangzhou) Information Technology Co., Ltd.
Priority date: 2020-04-02
Filing date: 2020-04-02
Publication date: 2021-01-28
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd.
Priority to CN202080004369.3A (CN112533681A)
Priority to PCT/CN2020/082914 (WO2020143848A2)
Priority to SG11202102364YA
Publication of WO2020143848A2
Publication of WO2020143848A3
Priority to US17/219,038 (US11204803B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Feedback Control In General (AREA)

Abstract

Computer-implemented methods, systems, and apparatus, including computer-readable medium, for generating an action selection policy for causing an execution device to complete a task are described. Data representing a task that is divided into a sequence of subtasks are obtained. Data specifying a strategy neural network (SNN) for a subtask in the sequence of subtasks are obtained. The SNN receives inputs that include a sequence of actions reaching an initial state of the subtask, and predicts an action selection policy of the execution device for the subtask. The SNN is trained based on a value neural network (VNN) for the next subtask, which follows the subtask in the sequence of subtasks. An input to the SNN is determined; the input includes a sequence of actions that reaches the initial state of the subtask. An action selection policy for completing the subtask is determined based on an output of the SNN.
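The abstract describes the data flow only at a high level. The following is a toy Python sketch of that flow under loudly stated assumptions, not the method claimed in the application: each subtask is a single decision over a hypothetical action set `ACTIONS`; `next_subtask_vnn` is a hand-written stand-in for the next subtask's VNN; `ToySNN` is a table of softmax logits rather than a neural network; and the training target (a softmax over the VNN values, with a hypothetical `temperature`) is a simplification chosen for this sketch. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Action = str
History = Tuple[Action, ...]  # sequence of actions that reaches a state

# Hypothetical action set available in every subtask (assumption).
ACTIONS: List[Action] = ["a", "b", "c"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_subtask_vnn(history: History) -> float:
    """Hypothetical stand-in for the VNN of the next subtask: maps the action
    sequence reaching the next subtask's initial state to a value estimate."""
    return {"a": 0.0, "b": 1.0, "c": -1.0}[history[-1]]


class ToySNN:
    """Tabular stand-in for the strategy neural network (SNN): keeps one row
    of logits per input history instead of generalizing across histories."""

    def __init__(self) -> None:
        self.logits: Dict[History, List[float]] = {}

    def policy(self, history: History) -> List[float]:
        """Action selection policy predicted for the subtask starting at `history`."""
        row = self.logits.setdefault(history, [0.0] * len(ACTIONS))
        return softmax(row)

    def train_step(self, history: History, target: Sequence[float], lr: float) -> None:
        """One gradient step on cross-entropy(target, policy); d/dlogits = probs - target."""
        probs = self.policy(history)
        row = self.logits[history]
        for i in range(len(ACTIONS)):
            row[i] -= lr * (probs[i] - target[i])


def train_snn_for_subtask(
    snn: ToySNN,
    initial_history: History,
    vnn: Callable[[History], float],
    temperature: float = 0.5,
    lr: float = 0.5,
    steps: int = 500,
) -> List[float]:
    """Fit the SNN for one subtask using the next subtask's VNN as leaf values.
    The softmax-over-values target is an assumption made for this sketch."""
    # Each action ends this (single-decision) subtask at an initial state of
    # the next subtask; the VNN scores that state from its action sequence.
    values = [vnn(initial_history + (a,)) for a in ACTIONS]
    target = softmax([v / temperature for v in values])
    for _ in range(steps):
        snn.train_step(initial_history, target, lr)
    return target


if __name__ == "__main__":
    snn = ToySNN()
    start = ("a", "c")  # actions that reached this subtask's initial state
    train_snn_for_subtask(snn, start, next_subtask_vnn)
    # Output of the SNN, read as the action selection policy for this subtask.
    print(dict(zip(ACTIONS, snn.policy(start))))
```

In a real system the SNN would be a neural network that generalizes across many input action sequences, and the VNN would itself be learned; the table and the hand-written value function here only keep the example self-contained.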

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202080004369.3A CN112533681A (en) 2020-04-02 2020-04-02 Determining action selection guidelines for an execution device
PCT/CN2020/082914 WO2020143848A2 (en) 2020-04-02 2020-04-02 Determining action selection policies of an execution device
SG11202102364YA SG11202102364YA (en) 2020-04-02 2020-04-02 Determining action selection policies of an execution device
US17/219,038 US11204803B2 (en) 2020-04-02 2021-03-31 Determining action selection policies of an execution device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/082914 WO2020143848A2 (en) 2020-04-02 2020-04-02 Determining action selection policies of an execution device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/219,038 Continuation US11204803B2 (en) 2020-04-02 2021-03-31 Determining action selection policies of an execution device

Publications (2)

Publication Number Publication Date
WO2020143848A2 (en) 2020-07-16
WO2020143848A3 (en) 2021-01-28

Family

ID=71522304

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/082914 WO2020143848A2 (en) 2020-04-02 2020-04-02 Determining action selection policies of an execution device

Country Status (4)

Country Link
US (1) US11204803B2 (en)
CN (1) CN112533681A (en)
SG (1) SG11202102364YA (en)
WO (1) WO2020143848A2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG11202010204TA (en) 2019-12-12 2020-11-27 Alipay Hangzhou Inf Tech Co Ltd Determining action selection policies of an execution device
SG11202010721QA (en) 2019-12-12 2020-11-27 Alipay Hangzhou Inf Tech Co Ltd Determining action selection policies of execution device
WO2020098822A2 (en) 2019-12-12 2020-05-22 Alipay (Hangzhou) Information Technology Co., Ltd. Determining action selection policies of an execution device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160260024A1 (en) * 2015-03-04 2016-09-08 Qualcomm Incorporated System of distributed planning
CN109101339A (en) * 2018-08-15 2018-12-28 北京邮电大学 Video task parallelization method and device in a heterogeneous cluster, and heterogeneous cluster environment
CN110170171A (en) * 2019-06-03 2019-08-27 深圳市腾讯网域计算机网络有限公司 Control method and device for a target object
US20190332922A1 (en) * 2017-02-24 2019-10-31 Google Llc Training policy neural networks using path consistency learning
US20190354813A1 (en) * 2017-01-31 2019-11-21 Deepmind Technologies Limited Data-efficient reinforcement learning for continuous control tasks
CN110489223A (en) * 2019-08-26 2019-11-22 北京邮电大学 Task scheduling method and device in a heterogeneous cluster, and electronic equipment
CN110882544A (en) * 2019-11-28 2020-03-17 网易(杭州)网络有限公司 Multi-agent training method and device and electronic equipment

Family Cites Families (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2459254C2 (en) 2007-04-27 2012-08-20 Сименс Акциенгезелльшафт Method for computer-aided training of one or more neural networks
US20090276385A1 (en) * 2008-04-30 2009-11-05 Stanley Hill Artificial-Neural-Networks Training Artificial-Neural-Networks
US20140039913A1 (en) 2012-07-31 2014-02-06 Tuomas W. Sandholm Medical treatment planning via sequential games
EP3872715A1 (en) * 2015-11-12 2021-09-01 Deepmind Technologies Limited Asynchronous deep reinforcement learning
US10002029B1 (en) * 2016-02-05 2018-06-19 Sas Institute Inc. Automated transfer of neural network definitions among federated areas
US10057367B2 (en) 2016-03-02 2018-08-21 Huawei Technologies Canada Co., Ltd. Systems and methods for data caching in a communications network
DE202016004627U1 (en) * 2016-07-27 2016-09-23 Google Inc. Training a neural value network
DE202016004628U1 (en) * 2016-07-27 2016-09-23 Google Inc. Traversing an environment state structure using neural networks
CN106296006A (en) 2016-08-10 2017-01-04 哈尔滨工业大学深圳研究生院 Minimum-regret assessment method for balancing risk and return in imperfect-information games
US10694526B2 (en) 2016-09-30 2020-06-23 Drexel University Adaptive pursuit learning method to mitigate small-cell interference through directionality
EP3516595B1 (en) * 2016-11-03 2020-09-09 Deepmind Technologies Limited Training action selection neural networks
JP6926203B2 (en) * 2016-11-04 2021-08-25 ディープマインド テクノロジーズ リミテッド Reinforcement learning with auxiliary tasks
EP3535702B1 (en) * 2016-11-04 2024-05-01 Google LLC Unsupervised detection of intermediate reinforcement learning goals
US20180189950A1 (en) * 2016-12-30 2018-07-05 Google Inc. Generating structured output predictions using neural networks
WO2018211141A1 (en) * 2017-05-19 2018-11-22 Deepmind Technologies Limited Imagination-based agent neural networks
WO2018215665A1 (en) * 2017-05-26 2018-11-29 Deepmind Technologies Limited Training action selection neural networks using look-ahead search
CN116957055A (en) * 2017-06-05 2023-10-27 渊慧科技有限公司 Selecting actions using multimodal input
CN110574048B (en) * 2017-06-09 2023-07-07 渊慧科技有限公司 Training action selection neural network
US11138513B2 (en) 2017-06-13 2021-10-05 Princeton University Dynamic learning system
CN110651279B (en) * 2017-06-28 2023-11-07 渊慧科技有限公司 Training action selection neural networks using apprentices
EP3616128A1 (en) * 2017-08-25 2020-03-04 Google LLC Batched reinforcement learning
US10846109B2 (en) * 2017-12-20 2020-11-24 Google Llc Suggesting actions based on machine learning
WO2019133052A1 (en) * 2017-12-28 2019-07-04 Yang Shao Wen Visual fog
US11688160B2 (en) * 2018-01-17 2023-06-27 Huawei Technologies Co., Ltd. Method of generating training data for training a neural network, method of training a neural network and using neural network for autonomous operations
US20190244099A1 (en) * 2018-02-05 2019-08-08 Deepmind Technologies Limited Continual reinforcement learning with a multi-task agent
US20190324795A1 (en) * 2018-04-24 2019-10-24 Microsoft Technology Licensing, Llc Composite task execution
US20190392309A1 (en) * 2018-06-21 2019-12-26 Denso International America, Inc. LSTM Training For Neural Network Based Course Of Action Selection
CN112292701A (en) 2019-01-17 2021-01-29 创新先进技术有限公司 Conducting policy search in multi-party policy interaction
SG11202001804QA (en) * 2019-05-15 2020-12-30 Advanced New Technologies Co Ltd Determining action selection policies of an execution device
US11714990B2 (en) * 2019-05-23 2023-08-01 Deepmind Technologies Limited Jointly learning exploratory and non-exploratory action selection policies
US11227167B2 (en) * 2019-06-28 2022-01-18 Baidu Usa Llc Determining vanishing points based on lane lines
CN110327624B (en) * 2019-07-03 2023-03-17 广州多益网络股份有限公司 Game following method and system based on curriculum reinforcement learning
US20210158162A1 (en) * 2019-11-27 2021-05-27 Google Llc Training reinforcement learning agents to learn farsighted behaviors by predicting in latent space
WO2021153969A1 (en) * 2020-01-27 2021-08-05 Samsung Electronics Co., Ltd. Methods and systems for managing processing of neural network across heterogeneous processors

Also Published As

Publication number Publication date
US11204803B2 (en) 2021-12-21
SG11202102364YA (en) 2021-04-29
US20210311777A1 (en) 2021-10-07
CN112533681A (en) 2021-03-19
WO2020143848A2 (en) 2020-07-16

Legal Events

Date Code Title Description
NENP Non-entry into the national phase
Ref country code: DE
122 EP: PCT application non-entry in European phase
Ref document number: 20738988
Country of ref document: EP
Kind code of ref document: A2