CN113313267A - Multi-agent reinforcement learning method based on value decomposition and attention mechanism - Google Patents
- Publication number: CN113313267A
- Application number: CN202110717897.XA
- Authority: CN (China)
- Prior art keywords: value, agent, network, function, tot
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110717897.XA CN113313267B (en) | 2021-06-28 | 2021-06-28 | Multi-agent reinforcement learning method based on value decomposition and attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113313267A (en) | 2021-08-27 |
CN113313267B (en) | 2023-12-08 |
Family
ID=77380579
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110717897.XA (Active), CN113313267B (en) | 2021-06-28 | 2021-06-28 | Multi-agent reinforcement learning method based on value decomposition and attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113313267B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113792861A (en) * | 2021-09-16 | 2021-12-14 | 中国科学技术大学 | Multi-agent reinforcement learning method and system based on value distribution |
CN113902125A (en) * | 2021-09-24 | 2022-01-07 | 浙江大学 | Intra-group cooperation intelligent agent control method based on deep hierarchical reinforcement learning |
CN113919485A (en) * | 2021-10-19 | 2022-01-11 | 西安交通大学 | Multi-agent reinforcement learning method and system based on dynamic hierarchical communication network |
CN114037048A (en) * | 2021-10-15 | 2022-02-11 | 大连理工大学 | Belief consistency multi-agent reinforcement learning method based on variational cycle network model |
CN114130034A (en) * | 2021-11-19 | 2022-03-04 | 天津大学 | Multi-agent game AI (Artificial Intelligence) design method based on attention mechanism and reinforcement learning |
CN114139637A (en) * | 2021-12-03 | 2022-03-04 | 哈尔滨工业大学(深圳) | Multi-agent information fusion method and device, electronic equipment and readable storage medium |
CN114463997A (en) * | 2022-02-14 | 2022-05-10 | 中国科学院电工研究所 | Lantern-free intersection vehicle cooperative control method and system |
CN114527666A (en) * | 2022-03-09 | 2022-05-24 | 西北工业大学 | CPS system reinforcement learning control method based on attention mechanism |
CN114900619A (en) * | 2022-05-06 | 2022-08-12 | 北京航空航天大学 | Self-adaptive exposure driving camera shooting underwater image processing system |
CN115047907A (en) * | 2022-06-10 | 2022-09-13 | 中国电子科技集团公司第二十八研究所 | Air isomorphic formation command method based on multi-agent PPO algorithm |
CN115300910A (en) * | 2022-07-15 | 2022-11-08 | 浙江大学 | Confusion-removing game strategy model generation method based on multi-agent reinforcement learning |
CN116090688A (en) * | 2023-04-10 | 2023-05-09 | 中国人民解放军国防科技大学 | Moving target traversal access sequence planning method based on improved pointer network |
WO2023231961A1 (en) * | 2022-06-02 | 2023-12-07 | 华为技术有限公司 | Multi-agent reinforcement learning method and related device |
CN117852710A (en) * | 2024-01-08 | 2024-04-09 | 山东大学 | Collaborative optimization scheduling method and system for multi-park comprehensive energy system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200125957A1 (en) * | 2018-10-17 | 2020-04-23 | Peking University | Multi-agent cooperation decision-making and training method |
CN112101564A (en) * | 2020-08-17 | 2020-12-18 | 清华大学 | Multi-agent value function decomposition method and device based on attention mechanism |
CN112232478A (en) * | 2020-09-03 | 2021-01-15 | 天津(滨海)人工智能军民融合创新中心 | Multi-agent reinforcement learning method and system based on layered attention mechanism |
CN112396187A (en) * | 2020-11-19 | 2021-02-23 | 天津大学 | Multi-agent reinforcement learning method based on dynamic collaborative map |
CN112417760A (en) * | 2020-11-20 | 2021-02-26 | 哈尔滨工程大学 | Warship control method based on competitive hybrid network |
Non-Patent Citations (2)
Title |
---|
JIANYU SU et al.: "Value-Decomposition Multi-Agent Actor-Critics", AAAI-21 * |
YUANXIN ZHANG et al.: "AVD-Net: Attention Value Decomposition Network For Deep Multi-Agent Reinforcement Learning", IEEE * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113792861A (en) * | 2021-09-16 | 2021-12-14 | 中国科学技术大学 | Multi-agent reinforcement learning method and system based on value distribution |
CN113792861B (en) * | 2021-09-16 | 2024-02-27 | 中国科学技术大学 | Multi-agent reinforcement learning method and system based on value distribution |
CN113902125A (en) * | 2021-09-24 | 2022-01-07 | 浙江大学 | Intra-group cooperation intelligent agent control method based on deep hierarchical reinforcement learning |
CN114037048A (en) * | 2021-10-15 | 2022-02-11 | 大连理工大学 | Belief consistency multi-agent reinforcement learning method based on variational cycle network model |
CN114037048B (en) * | 2021-10-15 | 2024-05-28 | 大连理工大学 | Belief-consistent multi-agent reinforcement learning method based on variational circulation network model |
CN113919485A (en) * | 2021-10-19 | 2022-01-11 | 西安交通大学 | Multi-agent reinforcement learning method and system based on dynamic hierarchical communication network |
CN113919485B (en) * | 2021-10-19 | 2024-03-15 | 西安交通大学 | Multi-agent reinforcement learning method and system based on dynamic hierarchical communication network |
CN114130034A (en) * | 2021-11-19 | 2022-03-04 | 天津大学 | Multi-agent game AI (Artificial Intelligence) design method based on attention mechanism and reinforcement learning |
CN114139637A (en) * | 2021-12-03 | 2022-03-04 | 哈尔滨工业大学(深圳) | Multi-agent information fusion method and device, electronic equipment and readable storage medium |
CN114463997A (en) * | 2022-02-14 | 2022-05-10 | 中国科学院电工研究所 | Lantern-free intersection vehicle cooperative control method and system |
CN114527666B (en) * | 2022-03-09 | 2023-08-11 | 西北工业大学 | CPS system reinforcement learning control method based on attention mechanism |
CN114527666A (en) * | 2022-03-09 | 2022-05-24 | 西北工业大学 | CPS system reinforcement learning control method based on attention mechanism |
CN114900619A (en) * | 2022-05-06 | 2022-08-12 | 北京航空航天大学 | Self-adaptive exposure driving camera shooting underwater image processing system |
WO2023231961A1 (en) * | 2022-06-02 | 2023-12-07 | 华为技术有限公司 | Multi-agent reinforcement learning method and related device |
CN115047907A (en) * | 2022-06-10 | 2022-09-13 | 中国电子科技集团公司第二十八研究所 | Air isomorphic formation command method based on multi-agent PPO algorithm |
CN115047907B (en) * | 2022-06-10 | 2024-05-07 | 中国电子科技集团公司第二十八研究所 | Air isomorphic formation command method based on multi-agent PPO algorithm |
CN115300910A (en) * | 2022-07-15 | 2022-11-08 | 浙江大学 | Confusion-removing game strategy model generation method based on multi-agent reinforcement learning |
CN116090688A (en) * | 2023-04-10 | 2023-05-09 | 中国人民解放军国防科技大学 | Moving target traversal access sequence planning method based on improved pointer network |
CN117852710A (en) * | 2024-01-08 | 2024-04-09 | 山东大学 | Collaborative optimization scheduling method and system for multi-park comprehensive energy system |
Also Published As
Publication number | Publication date |
---|---|
CN113313267B (en) | 2023-12-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN113313267A (en) | Multi-agent reinforcement learning method based on value decomposition and attention mechanism | |
CN108921298B (en) | Multi-agent communication and decision-making method for reinforcement learning | |
CN112116090B (en) | Neural network structure searching method and device, computer equipment and storage medium | |
CN115081936B (en) | Method and device for scheduling observation tasks of multiple remote sensing satellites under emergency condition | |
CN111401547B (en) | HTM design method based on circulation learning unit for passenger flow analysis | |
CN112763967B (en) | BiGRU-based intelligent electric meter metering module fault prediction and diagnosis method | |
CN113286275A (en) | Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning | |
CN114860893A (en) | Intelligent decision-making method and device based on multi-mode data fusion and reinforcement learning | |
CN112990485A (en) | Knowledge strategy selection method and device based on reinforcement learning | |
CN111401557A (en) | Agent decision making method, AI model training method, server and medium | |
CN108921935A (en) | A kind of extraterrestrial target method for reconstructing based on acceleration gauss hybrid models | |
CN114510012A (en) | Unmanned cluster evolution system and method based on meta-action sequence reinforcement learning | |
CN116306686B (en) | Method for generating multi-emotion-guided co-emotion dialogue | |
CN115099606A (en) | Training method and terminal for power grid dispatching model | |
CN116643499A (en) | Model reinforcement learning-based agent path planning method and system | |
CN113313209A (en) | Multi-agent reinforcement learning training method with high sample efficiency | |
CN116933931A (en) | Cloud computing double-flow feature interaction electric vehicle charging pile occupation prediction method | |
CN111767991B (en) | Measurement and control resource scheduling method based on deep Q learning | |
CN114880527B (en) | Multi-modal knowledge graph representation method based on multi-prediction task | |
CN116128028A (en) | Efficient deep reinforcement learning algorithm for continuous decision space combination optimization | |
CN115587615A (en) | Internal reward generation method for sensing action loop decision | |
CN114371729B (en) | Unmanned aerial vehicle air combat maneuver decision method based on distance-first experience playback | |
CN115204249A (en) | Group intelligent meta-learning method based on competition mechanism | |
CN114139674A (en) | Behavior cloning method, electronic device, storage medium, and program product | |
Sachdeva et al. | Gapformer: Fast autoregressive transformers meet rnns for personalized adaptive cruise control |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventors after: Song Guanghua, Fan Cheng, Ye Zhenhui, Chen Yining, Ying Haochao, Wu Jian, Jiang Xiaohong. Inventors before: Wu Jian, Song Guanghua, Jiang Xiaohong, Fan Cheng, Ye Zhenhui, Chen Yining, Ying Haochao. |
GR01 | Patent grant | |