CN113313209A - Multi-agent reinforcement learning training method with high sample efficiency - Google Patents
- Publication number
- CN113313209A
- Authority
- CN
- China
- Prior art keywords
- training
- agent
- reinforcement learning
- network
- function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110718305.6A CN113313209A (zh) | 2021-06-28 | 2021-06-28 | Multi-agent reinforcement learning training method with high sample efficiency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110718305.6A CN113313209A (zh) | 2021-06-28 | 2021-06-28 | Multi-agent reinforcement learning training method with high sample efficiency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113313209A (zh) | 2021-08-27 |
Family
ID=77380583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110718305.6A Pending CN113313209A (zh) | Multi-agent reinforcement learning training method with high sample efficiency | 2021-06-28 | 2021-06-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113313209A (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
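CN116226662A (zh) * | 2023-01-05 | 2023-06-06 | Harbin Institute of Technology (Shenzhen) | Multi-agent cooperative reinforcement learning method, terminal, and storage medium |
CN117369286A (zh) * | 2023-12-04 | 2024-01-09 | Ocean University of China | Dynamic positioning control method for an offshore platform |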
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
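CN116226662A (zh) * | 2023-01-05 | 2023-06-06 | Harbin Institute of Technology (Shenzhen) | Multi-agent cooperative reinforcement learning method, terminal, and storage medium |
CN116226662B (zh) * | 2023-01-05 | 2024-02-09 | Harbin Institute of Technology (Shenzhen) | Multi-agent cooperative reinforcement learning method, terminal, and storage medium |
CN117369286A (zh) * | 2023-12-04 | 2024-01-09 | Ocean University of China | Dynamic positioning control method for an offshore platform |
CN117369286B (zh) * | 2023-12-04 | 2024-02-09 | Ocean University of China | Dynamic positioning control method for an offshore platform |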
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhong et al. | Blockqnn: Efficient block-wise neural network architecture generation | |
Mousavi et al. | Traffic light control using deep policy‐gradient and value‐function‐based reinforcement learning | |
Seo et al. | Reinforcement learning with action-free pre-training from videos | |
Lin et al. | An efficient deep reinforcement learning model for urban traffic control | |
CN110852448A (zh) | 一种基于多智能体强化学习的合作型智能体的学习方法 | |
CN110794842A (zh) | 基于势场的强化学习路径规划算法 | |
CN109829541A (zh) | 基于学习自动机的深度神经网络增量式训练方法及系统 | |
Yu | From information networking to intelligence networking: Motivations, scenarios, and challenges | |
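CN113313209A (zh) | Multi-agent reinforcement learning training method with high sample efficiency | |
CN110014428B (zh) | Temporal logic task planning method based on reinforcement learning | |
CN113919485B (zh) | Multi-agent reinforcement learning method and system based on a dynamic hierarchical communication network | |
CN111798002A (zh) | Federated learning global model aggregation method with a controllable local model proportion | |
CN110135584A (zh) | Large-scale symbolic regression method and system based on an adaptive parallel genetic algorithm | |
CN111191728A (zh) | Asynchronous or synchronous distributed training method and system for deep reinforcement learning | |
CN112732436B (zh) | Deep reinforcement learning acceleration method for a multi-core processor with a single graphics processor | |
CN111950722A (zh) | Reinforcement learning method based on an environment prediction model | |
CN111401557A (zh) | Agent decision-making method, AI model training method, server, and medium | |
CN114510012A (zh) | Unmanned swarm evolution system and method based on meta-action sequence reinforcement learning | |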
Xu et al. | Living with artificial intelligence: A paradigm shift toward future network traffic control | |
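CN116841317A (zh) | UAV swarm cooperative confrontation method based on graph attention reinforcement learning | |
CN108470212A (zh) | Efficient LSTM design method capable of exploiting event duration | |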
Li et al. | Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning | |
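CN111783983A (zh) | Unsupervised DQN reinforcement learning with transferable meta-learning for navigation | |
CN114053712B (zh) | Action generation method, apparatus, and device for a virtual object | |
CN111950691A (zh) | Reinforcement learning policy learning method based on a latent action representation space |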
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventors after: Song Guanghua, Ye Zhenhui, Chen Yining, Wang Ke, Ying Haochao, Wu Jian, Jiang Xiaohong; inventors before: Wu Jian, Song Guanghua, Jiang Xiaohong, Ye Zhenhui, Chen Yining, Wang Ke, Ying Haochao |