CN116484942B - Method, system, device, and storage medium for multi-agent reinforcement learning - Google Patents
- Publication number
- CN116484942B
- Authority
- CN
- China
- Prior art keywords
- network
- vector
- updating
- agent
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Claims (30)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310402439.6A | 2023-04-13 | 2023-04-13 | Method, system, device, and storage medium for multi-agent reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310402439.6A | 2023-04-13 | 2023-04-13 | Method, system, device, and storage medium for multi-agent reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116484942A (zh) | 2023-07-25 |
CN116484942B (zh) | 2024-03-15 |
Family
ID=87224445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310402439.6A (Active; publication CN116484942B) | Method, system, device, and storage medium for multi-agent reinforcement learning | 2023-04-13 | 2023-04-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116484942B (zh) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
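CN113269315A (zh) * | 2021-06-29 | 2021-08-17 | Anhui Cambricon Information Technology Co., Ltd. | Device, method, and readable storage medium for performing tasks using deep reinforcement learning |
CN114037048A (zh) * | 2021-10-15 | 2022-02-11 | Dalian University of Technology | Belief-consistent multi-agent reinforcement learning method based on a variational recurrent network model |
CN114662639A (zh) * | 2022-03-24 | 2022-06-24 | Hohai University | Multi-agent reinforcement learning method and system based on value decomposition |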
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11574148B2 (en) * | 2018-11-05 | 2023-02-07 | Royal Bank Of Canada | System and method for deep reinforcement learning |
EP4010847A1 (en) * | 2019-09-25 | 2022-06-15 | DeepMind Technologies Limited | Training action selection neural networks using hindsight modelling |
CN111612126A (zh) * | 2020-04-18 | 2020-09-01 | Huawei Technologies Co., Ltd. | Method and apparatus for reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN116484942A (zh) | 2023-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
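JP6811894B2 (ja) | | Method and apparatus for generating neural network structure, electronic device, and storage medium |
Strehl et al. | | PAC model-free reinforcement learning |
KR20190028531A (ko) | | Training machine learning models on multiple machine learning tasks |
CN110942248B (zh) | | Training method and apparatus for a transaction risk-control network, and transaction risk detection method |
CN111309880A (zh) | | Multi-agent action policy learning method, apparatus, medium, and computing device |
CN113962362A (zh) | | Reinforcement learning model training method, decision-making method, apparatus, device, and medium |
JP7315007B2 (ja) | | Learning device, learning method, and learning program |
CN112990958A (zh) | | Data processing method, apparatus, storage medium, and computer device |
CN116227180A (zh) | | Data-driven intelligent decision-making method for unit commitment |
CN116484942B (zh) | | Method, system, device, and storage medium for multi-agent reinforcement learning |
CN111510473B (zh) | | Access request processing method, apparatus, electronic device, and computer-readable medium |
US20230342626A1 (en) | | Model processing method and related apparatus |
CN111430035B (zh) | | Method, apparatus, electronic device, and medium for predicting the number of infectious disease cases |
CN111461862B (zh) | | Method and apparatus for determining target features for business data |
US20230252355A1 (en) | | Systems and methods for knowledge transfer in machine learning |
Mattila et al. | | What did your adversary believe? Optimal filtering and smoothing in counter-adversarial autonomous systems |
CN116894778A (zh) | | Diffusion model sampling method and apparatus for image generation |
JP7307785B2 (ja) | | Machine learning device and method |
CN113112311B (zh) | | Method for training a causal inference model, information prompting method, and apparatus |
CN114792097A (zh) | | Method and apparatus for determining prompt vectors of a pre-trained model, and electronic device |
CN113836438B (zh) | | Method, electronic device, and storage medium for post recommendation |
CN115077549B (zh) | | Vehicle state tracking method, system, computer, and readable storage medium |
CN111064617B (zh) | | Network traffic prediction method and apparatus based on empirical mode decomposition clustering |
CN112949988B (zh) | | Service process construction method based on reinforcement learning |
CN114066286A (zh) | | Method and apparatus for calculating standard costs of routine repairs in substation production |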
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventors after: Guo Jiaming, Peng Shaohui, Yi Qi, Hu Xing, Guo Qi, Li Wei. Inventors before: Guo Jiaming, Peng Shaohui, Yi Qi, Hu Xing, Guo Qi, Li Wei. |
GR01 | Patent grant | |