CN108885721B - Direct inverse reinforcement learning with density ratio estimation - Google Patents

Direct inverse reinforcement learning with density ratio estimation

Info

Publication number
CN108885721B
Authority
CN
China
Prior art keywords
estimating
behavior
logarithm
function
cost function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201780017406.2A
Other languages
English (en)
Chinese (zh)
Other versions
CN108885721A (zh)
Inventor
Eiji Uchibe (内部英治)
Kenji Doya (铜谷贤治)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Okinawa Institute of Science and Technology School Corp
Original Assignee
Okinawa Institute of Science and Technology School Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Okinawa Institute of Science and Technology School Corp
Publication of CN108885721A
Application granted
Publication of CN108885721B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)
CN201780017406.2A 2016-03-15 2017-02-07 Direct inverse reinforcement learning with density ratio estimation Active CN108885721B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201662308722P 2016-03-15 2016-03-15
US62/308,722 2016-03-15
PCT/JP2017/004463 WO2017159126A1 (en) 2016-03-15 2017-02-07 Direct inverse reinforcement learning with density ratio estimation

Publications (2)

Publication Number Publication Date
CN108885721A (zh) 2018-11-23
CN108885721B (zh) 2022-05-06

Family

ID=59851115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780017406.2A Active CN108885721B (zh) 2016-03-15 2017-02-07 Direct inverse reinforcement learning with density ratio estimation

Country Status (5)

Country Link
EP (1) EP3430578A4 (ja)
JP (1) JP6910074B2 (ja)
KR (1) KR102198733B1 (ja)
CN (1) CN108885721B (ja)
WO (1) WO2017159126A1 (ja)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021229626A1 (ja) * 2020-05-11 2021-11-18 NEC Corporation Learning device, learning method, and learning program

Citations (3)

Publication number Priority date Publication date Assignee Title
US8756177B1 (en) * 2011-04-18 2014-06-17 The Boeing Company Methods and systems for estimating subject intent from surveillance
CN104573621A (zh) * 2014-09-30 2015-04-29 李文生 Dynamic gesture learning and recognition method based on Chebyshev neural network
WO2016021210A1 (en) * 2014-08-07 2016-02-11 Okinawa Institute Of Science And Technology School Corporation Inverse reinforcement learning by density ratio estimation

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8359226B2 (en) * 2006-01-20 2013-01-22 International Business Machines Corporation System and method for marketing mix optimization for brand equity management
US9090255B2 (en) * 2012-07-12 2015-07-28 Honda Motor Co., Ltd. Hybrid vehicle fuel efficiency using inverse reinforcement learning

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US8756177B1 (en) * 2011-04-18 2014-06-17 The Boeing Company Methods and systems for estimating subject intent from surveillance
WO2016021210A1 (en) * 2014-08-07 2016-02-11 Okinawa Institute Of Science And Technology School Corporation Inverse reinforcement learning by density ratio estimation
CN104573621A (zh) * 2014-09-30 2015-04-29 李文生 Dynamic gesture learning and recognition method based on Chebyshev neural network

Non-Patent Citations (2)

Title
A Density-ratio Framework for Statistical Data Processing; Masashi Sugiyama et al.; IPSJ Transactions on Computer Vision and Applications; 2009-09-01; full text *
Multi-robot inverse reinforcement learning under occlusion with interactions; Kenneth Bogert et al.; AAMAS '14: Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems; 2014-05-05; full text *

Also Published As

Publication number Publication date
EP3430578A1 (en) 2019-01-23
JP2019508817A (ja) 2019-03-28
KR20180113587A (ko) 2018-10-16
JP6910074B2 (ja) 2021-07-28
KR102198733B1 (ko) 2021-01-05
EP3430578A4 (en) 2019-11-13
WO2017159126A1 (en) 2017-09-21
CN108885721A (zh) 2018-11-23

Similar Documents

Publication Publication Date Title
CN106575382B (zh) Computer method and system for estimating behavior of an object, and system and medium for predicting preference
US10896383B2 (en) Direct inverse reinforcement learning with density ratio estimation
Chatzis et al. Echo state Gaussian process
Zhe et al. Scalable high-order gaussian process regression
Osa Motion planning by learning the solution manifold in trajectory optimization
Wang et al. Focused model-learning and planning for non-Gaussian continuous state-action systems
Ramirez et al. Reinforcement learning from expert demonstrations with application to redundant robot control
Chatzis et al. The copula echo state network
Wang et al. Dynamic-resolution model learning for object pile manipulation
Stojkovic et al. Distance Based Modeling of Interactions in Structured Regression.
Theodoropoulos et al. Cyber-physical systems in non-rigid assemblies: A methodology for the calibration of deformable object reconstruction models
CN108885721B (zh) Direct inverse reinforcement learning with density ratio estimation
Yamaguchi et al. Model-based multi-objective reinforcement learning with unknown weights
Liu et al. Distributional reinforcement learning with epistemic and aleatoric uncertainty estimation
Obukhov et al. Neural network method for automatic data generation in adaptive information systems
Matsumoto et al. Mobile robot navigation using learning-based method based on predictive state representation in a dynamic environment
Zhou et al. Bayesian inference for data-efficient, explainable, and safe robotic motion planning: A review
Vien et al. A covariance matrix adaptation evolution strategy for direct policy search in reproducing kernel Hilbert space
Das et al. Inverse Reinforcement Learning with Constraint Recovery
Okadome et al. Predictive control method for a redundant robot using a non-parametric predictor
Meden et al. First steps towards state representation learning for cognitive robotics
Pinto et al. One-shot learning in the road sign problem
Keurulainen Improving the sample efficiency of few-shot reinforcement learning with policy embeddings
Watson et al. Machine Learning with Physics Knowledge for Prediction: A Survey
Huang et al. 3D skeleton-based human motion prediction using spatial–temporal graph convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant