JP7347544B2 - Policy learning method, policy learning device, and program - Google Patents

Publication number
JP7347544B2
JP7347544B2 JP2021570601A JP2021570601A JP7347544B2 JP 7347544 B2 JP7347544 B2 JP 7347544B2 JP 2021570601 A JP2021570601 A JP 2021570601A JP 2021570601 A JP2021570601 A JP 2021570601A JP 7347544 B2 JP7347544 B2 JP 7347544B2
Authority
JP
Japan
Prior art keywords
state
learning
value
graph
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2021570601A
Other languages
English (en)
Japanese (ja)
Other versions
JPWO2021144963A1
JPWO2021144963A5
Inventor
豊 八鍬
貴志 丸山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-01-17
Filing date
2020-01-17
Publication date
2023-09-20
Application filed by NEC Corp
Publication of JPWO2021144963A1
Publication of JPWO2021144963A5
Application granted
Publication of JP7347544B2
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)
JP2021570601A 2020-01-17 2020-01-17 Policy learning method, policy learning device, and program Active JP7347544B2 (ja)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/001500 WO2021144963A1 (ja) 2020-01-17 2020-01-17 Policy learning method, policy learning device, and program

Publications (3)

Publication Number Publication Date
JPWO2021144963A1 2021-07-22
JPWO2021144963A5 2022-08-23
JP7347544B2 (ja) 2023-09-20

Family

ID=76864131

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2021570601A Active JP7347544B2 (ja) 2020-01-17 2020-01-17 Policy learning method, policy learning device, and program

Country Status (3)

Country Link
US (1) US20230023899A1
JP (1) JP7347544B2
WO (1) WO2021144963A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114005014B (zh) * 2021-12-23 2022-06-17 杭州华鲤智能科技有限公司 Model training and social interaction strategy optimization method

Citations (1)

Publication number Priority date Publication date Assignee Title
US20190357520A1 (en) 2018-05-24 2019-11-28 Blue River Technology Inc. Boom sprayer including machine feedback control

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
EP3696737B1 (en) * 2016-11-03 2022-08-31 Deepmind Technologies Limited Training action selection neural networks
EP4273757A3 (en) * 2017-06-05 2024-02-14 DeepMind Technologies Limited Selecting actions using multi-modal inputs
US10935982B2 (en) * 2017-10-04 2021-03-02 Huawei Technologies Co., Ltd. Method of selection of an action for an object using a neural network

Non-Patent Citations (1)

Title
SEGLER, Marwin H.S., World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces, arXiv:1912.13007v1, version v1, [online], arXiv (Cornell University), 30 December 2019, pages 1-6, [retrieved on 2020-03-18]. Retrieved from the Internet: <URL: https://arxiv.org/abs/1912.13007v1>

Also Published As

Publication number Publication date
JPWO2021144963A1 2021-07-22
US20230023899A1 (en) 2023-01-26
WO2021144963A1 (ja) 2021-07-22

Similar Documents

Publication Publication Date Title
US12591415B2 (en) Intelligent and predictive modules for software development and coding using artificial intelligence and machine learning
US10528349B2 (en) Branch synthetic generation across multiple microarchitecture generations
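CN114355793B (zh) Training method and device for an autonomous driving planning model for vehicle simulation evaluation
CN109491494A (zh) Power parameter adjustment method and device, and reinforcement learning model training method
JP7822521B2 (ja) Automated design of GaN distributed RF power amplifiers using deep reinforcement learning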
Gao et al. A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
Lorig Hypothesis-Driven Simulation Studies
CN120031099A (zh) Training method, device, and storage medium for a graphical interface agent
JP7347544B2 (ja) Policy learning method, policy learning device, and program
Kopsick et al. Formation and retrieval of cell assemblies in a biologically realistic spiking neural network model of area CA3 in the mouse hippocampus
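CN118092764B (zh) Agent action control method and device guided by a large language model
WO2024180789A1 (ja) Information processing device, information processing method, and program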
Chandrasiri et al. Multi-Objective Cloud Workflow Scheduling with Two-Stage Deep Reinforcement Learning and Graph Neural Networks
Xu et al. Beyond Token-Level Policy Gradients for Complex Reasoning with Large Language Models
Hage et al. BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning
CN119967614B (zh) Operating system configuration tuning method and device for multi-resource joint scheduling
Moreno-Palancas et al. Handling discrete decisions in bilevel optimization via neural network embeddings
Gartner Adaptive Reinforcement Learning and Microstructure-Aware Optimization in Foreign Exchange Markets: A Unified Framework for Algorithmic Trading and Risk-Aware Execution
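CN120746653B (zh) Automatic advertisement bidding method and system based on predictive delivery planning
KR20250083782A (ko) Electronic device for reinforcement learning related to medical data and operating method thereof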
Sha et al. World Model Aided Parameter Adjustment Decision and Evaluation System for Radio Access Network
Rodrigues The Lean Startup Methodology: A Catalyst for Semiconductor Innovation
CN118297687A (zh) Intelligent decision-making model, method, device, and medium for on-chain governance of DeFi protocols
Moolawi Reinforcement learning for optimizing regression testing
Stanev Quantum Photonic Architectures and Machine Learning

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20220623

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20220623

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20230530

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20230724

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20230808

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20230821

R151 Written notification of patent or utility model registration

Ref document number: 7347544

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R151