JP7529145B2 - Learning device, learning method, and learning program - Google Patents

Learning device, learning method, and learning program

Info

Publication number
JP7529145B2
JP7529145B2 JP2023516888A JP2023516888A JP7529145B2 JP 7529145 B2 JP7529145 B2 JP 7529145B2 JP 2023516888 A JP2023516888 A JP 2023516888A JP 2023516888 A JP2023516888 A JP 2023516888A JP 7529145 B2 JP7529145 B2 JP 7529145B2
Authority
JP
Japan
Prior art keywords
learning
function
parameter
trajectory data
distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2023516888A
Other languages
English (en)
Japanese (ja)
Other versions
JPWO2022230038A5
JPWO2022230038A1
Inventor
大 窪田
力 江藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Publication of JPWO2022230038A1
Publication of JPWO2022230038A5
Application granted
Publication of JP7529145B2
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Manipulator (AREA)
JP2023516888A 2021-04-27 2021-04-27 Learning device, learning method, and learning program Active JP7529145B2 (ja)

Applications Claiming Priority (1)

Application Number Publication Priority Date Filing Date Title
PCT/JP2021/016728 WO2022230038A1 (ja) 2021-04-27 2021-04-27 Learning device, learning method, and learning program

Publications (3)

Publication Number Publication Date
JPWO2022230038A1 2022-11-03
JPWO2022230038A5 2024-01-18
JP7529145B2 (ja) 2024-08-06

Family

ID=83846769

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
JP2023516888A Active JP7529145B2 (ja) 2021-04-27 2021-04-27 Learning device, learning method, and learning program

Country Status (3)

Country Link
US (1) US20240202504A1
JP (1) JP7529145B2
WO (1) WO2022230038A1

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230401262A1 (en) * 2022-06-10 2023-12-14 Multiverse Computing Sl Quantum-inspired method and system for clustering of data
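CN119388413B (zh) * 2024-08-30 2025-12-26 北京长木谷医疗科技股份有限公司 Inverse reinforcement learning method and device for surgical robot control based on embodied intelligence
CN119328776B (zh) * 2024-12-20 2025-07-01 江苏骠马电力科技有限公司 Visual positioning guidance method based on a substation bionic operating robot
CN120217907B (zh) * 2025-05-28 2025-10-21 集美大学 Collision-avoidance decision method for unmanned surface vehicles based on navigation intention perception
CN121094343B (zh) * 2025-11-11 2026-04-14 江西五十铃汽车有限公司 Cross-technology-route collaborative decision-making method and system for new-energy vehicle power systems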

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
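CHOU Glen et al., Learning Constraints from Demonstrations, arXiv [online], 2019, pp. 1-25, [retrieved 2021-07-12], Internet: <URL: https://arxiv.org/abs/1812.07084v2>
SCOBEE R.R. Dexter et al., Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning, arXiv [online], 2020, pp. 1-12, [retrieved 2021-07-12], Internet: <URL: https://arxiv.org/abs/1909.05477v2>
中口悠輝 and 2 others, Construction of an inverse reinforcement dynamics learning framework based on the maximum entropy principle, 33rd Annual Conference of the Japanese Society for Artificial Intelligence (2019), 2019-06-07, pp. 1-4
今井拓司, Imitation learning that reveals experts' intentions, via inverse reinforcement learning: NEC moves from one-shot decision-making problems to practical use, NIKKEI Robotics, Japan, Nikkei BP, 2019-09-10, No. 51, pp. 22-26
増山 岳人, 梅田 和昇, Estimation of reward functions considering learner preferences by inverse reinforcement learning, 32nd Annual Conference of the Robotics Society of Japan, 2014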

Also Published As

Publication number Publication date
JPWO2022230038A1 2022-11-03
WO2022230038A1 (ja) 2022-11-03
US20240202504A1 (en) 2024-06-20

Similar Documents

Publication Publication Date Title
JP7529145B2 (ja) Learning device, learning method, and learning program
US12372929B2 (en) Machine learning for technical systems
US10643154B2 (en) Transforming attributes for training automated modeling systems
US20220179374A1 (en) Evaluation and/or adaptation of industrial and/or technical process models
Walsh et al. Exploring compact reinforcement-learning representations with linear regression
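JP7315007B2 (ja) Learning device, learning method, and learning program
JP2022189799A (ja) Demonstration-conditioned reinforcement learning for few-shot imitation
CN113614743B (zh) Method and device for controlling a robot
CN112016611B (zh) Training method, apparatus, and electronic device for a generator network and a policy generation network
JP7268757B2 (ja) Learning device, learning method, and learning program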
US20240202569A1 (en) Learning device, learning method, and recording medium
Di Natale et al. Simba: System identification methods leveraging backpropagation
Petelin et al. Control system with evolving Gaussian process models
Fan et al. Learning stable Koopman embeddings for identification and control
Zhao et al. Stable and safe human-aligned reinforcement learning through neural ordinary differential equations
JP7464115B2 (ja) Learning device, learning method, and learning program
US20240037452A1 (en) Learning device, learning method, and learning program
EP4332845A1 (en) Learning device, learning method, and learning program
Zhao et al. Extended kalman filtering for recursive online discrete-time inverse optimal control
JP7420236B2 (ja) Learning device, learning method, and learning program
Schweitzer et al. Metamodel-based Simulation Optimization Using Machine Learning for Solving Production Planning Problems in the Automotive Industry
US20220405599A1 (en) Automated design of architectures of artificial neural networks
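KR20230060478A (ko) Method and apparatus for collective network optimization using unsupervised learning
CN116011591A (zh) Safe reinforcement learning method and apparatus, agent, and storage medium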
Cubuktepe et al. Verification of Markov decision processes with risk-sensitive measures

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20231005

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20231005

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20240625

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20240708

R150 Certificate of patent or registration of utility model

Ref document number: 7529145

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150