CN108885721B - Direct inverse reinforcement learning with density ratio estimation - Google Patents
- Publication number
- CN108885721B (application CN201780017406.2A)
- Authority
- CN
- China
- Prior art keywords
- estimating
- behavior
- logarithm
- function
- cost function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N20/00—Machine learning
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N7/00—Computing arrangements based on specific mathematical models
        - G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Feedback Control In General (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662308722P | 2016-03-15 | 2016-03-15 | |
US62/308,722 | 2016-03-15 | | |
PCT/JP2017/004463 WO2017159126A1 (en) | 2016-03-15 | 2017-02-07 | Direct inverse reinforcement learning with density ratio estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108885721A (zh) | 2018-11-23 |
CN108885721B (zh) | 2022-05-06 |
Family
ID=59851115
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201780017406.2A | Active | CN108885721B (zh) | 2016-03-15 | 2017-02-07 | Direct inverse reinforcement learning with density ratio estimation |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP3430578A4 (en) |
JP (1) | JP6910074B2 (ja) |
KR (1) | KR102198733B1 (ko) |
CN (1) | CN108885721B (zh) |
WO (1) | WO2017159126A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021229626A1 (ja) * | 2020-05-11 | 2021-11-18 | 日本電気株式会社 (NEC Corporation) | Learning device, learning method, and learning program |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8756177B1 (en) * | 2011-04-18 | 2014-06-17 | The Boeing Company | Methods and systems for estimating subject intent from surveillance |
CN104573621A (zh) * | 2014-09-30 | 2015-04-29 | 李文生 | Dynamic gesture learning and recognition method based on Chebyshev neural network |
WO2016021210A1 (en) * | 2014-08-07 | 2016-02-11 | Okinawa Institute Of Science And Technology School Corporation | Inverse reinforcement learning by density ratio estimation |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8359226B2 (en) * | 2006-01-20 | 2013-01-22 | International Business Machines Corporation | System and method for marketing mix optimization for brand equity management |
US9090255B2 (en) * | 2012-07-12 | 2015-07-28 | Honda Motor Co., Ltd. | Hybrid vehicle fuel efficiency using inverse reinforcement learning |
2017
Filing Date | Country | Application Number | Publication | Status |
---|---|---|---|---|
2017-02-07 | KR | KR1020187026764A | KR102198733B1 (ko) | IP Right Grant (active) |
2017-02-07 | JP | JP2018546050A | JP6910074B2 (ja) | Active |
2017-02-07 | WO | PCT/JP2017/004463 | WO2017159126A1 (en) | Application Filing (active) |
2017-02-07 | EP | EP17766134.5A | EP3430578A4 (en) | Ceased (not active) |
2017-02-07 | CN | CN201780017406.2A | CN108885721B (zh) | Active |
Non-Patent Citations (2)
Title |
---|
A Density-ratio Framework for Statistical Data Processing; Masashi Sugiyama et al.; IPSJ Transactions on Computer Vision and Applications; 2009-09-01; full text * |
Multi-robot inverse reinforcement learning under occlusion with interactions; Kenneth Bogert et al.; AAMAS '14: Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems; 2014-05-05; full text * |
Also Published As
Publication number | Publication date |
---|---|
EP3430578A1 (en) | 2019-01-23 |
JP2019508817A (ja) | 2019-03-28 |
KR20180113587A (ko) | 2018-10-16 |
JP6910074B2 (ja) | 2021-07-28 |
KR102198733B1 (ko) | 2021-01-05 |
EP3430578A4 (en) | 2019-11-13 |
WO2017159126A1 (en) | 2017-09-21 |
CN108885721A (zh) | 2018-11-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106575382B (zh) | Computer method and system for estimating behavior of a subject, and system and medium for predicting preference | |
US10896383B2 (en) | Direct inverse reinforcement learning with density ratio estimation | |
Chatzis et al. | Echo state Gaussian process | |
Zhe et al. | Scalable high-order gaussian process regression | |
Osa | Motion planning by learning the solution manifold in trajectory optimization | |
Wang et al. | Focused model-learning and planning for non-Gaussian continuous state-action systems | |
Ramirez et al. | Reinforcement learning from expert demonstrations with application to redundant robot control | |
Chatzis et al. | The copula echo state network | |
Wang et al. | Dynamic-resolution model learning for object pile manipulation | |
Stojkovic et al. | Distance Based Modeling of Interactions in Structured Regression. | |
Theodoropoulos et al. | Cyber-physical systems in non-rigid assemblies: A methodology for the calibration of deformable object reconstruction models | |
CN108885721B (zh) | Direct inverse reinforcement learning with density ratio estimation | |
Yamaguchi et al. | Model-based multi-objective reinforcement learning with unknown weights | |
Liu et al. | Distributional reinforcement learning with epistemic and aleatoric uncertainty estimation | |
Obukhov et al. | Neural network method for automatic data generation in adaptive information systems | |
Matsumoto et al. | Mobile robot navigation using learning-based method based on predictive state representation in a dynamic environment | |
Zhou et al. | Bayesian inference for data-efficient, explainable, and safe robotic motion planning: A review | |
Vien et al. | A covariance matrix adaptation evolution strategy for direct policy search in reproducing kernel Hilbert space | |
Das et al. | Inverse Reinforcement Learning with Constraint Recovery | |
Okadome et al. | Predictive control method for a redundant robot using a non-parametric predictor | |
Meden et al. | First steps towards state representation learning for cognitive robotics | |
Pinto et al. | One-shot learning in the road sign problem | |
Keurulainen | Improving the sample efficiency of few-shot reinforcement learning with policy embeddings | |
Watson et al. | Machine Learning with Physics Knowledge for Prediction: A Survey | |
Huang et al. | 3D skeleton-based human motion prediction using spatial–temporal graph convolutional network | |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |