CN114298178A - Multi-agent communication learning method - Google Patents

Multi-agent communication learning method

Info

Publication number
CN114298178A
Authority
CN
China
Prior art keywords
agent
agents
communication
phase
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111549398.0A
Other languages
Chinese (zh)
Inventor
代浩
吴嘉澍
王洋
叶可江
张锦霞
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202111549398.0A priority Critical patent/CN114298178A/en
Publication of CN114298178A publication Critical patent/CN114298178A/en
Priority to PCT/CN2022/138140 priority patent/WO2023109699A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a multi-agent communication learning method whose architecture comprises a CriticNet, an ActorNet, a PriorNet and an EncoderNet. The CriticNet is used for calculating the communication importance in the training phase and for training the three corresponding networks on the end devices, namely the ActorNet, the PriorNet and the EncoderNet. The ActorNet is used for selecting corresponding actions on the agent side and works in both the training phase and the execution phase; the ActorNet learns the agent's policy π in the training phase and then generates the corresponding action a_i^t according to the local observation and the received messages, that is, a_i^t = π(o_i^t, m_i^t), where m_i^t is the message received by agent i at time t. The PriorNet is used by the agent to select communication objects; it evaluates the agents seen in the local observation and outputs an importance value p_ij^t, i.e. the importance of agent j's message to agent i. The EncoderNet is used by the agent to encode its own information so as to reduce the size of the message body.

Description

Multi-agent communication learning method
Technical Field
The invention relates to a communication learning method, and in particular to a multi-agent communication learning method.
Background
In a cooperative multi-agent system, all cooperating agents share a single global reward function. However, the observation range of each agent is limited, so global information is lacking for perception and decision-making during cooperation; mutually exclusive decisions arise among the agents, and the global optimum is difficult to reach.
Deep Reinforcement Learning (DRL), an advanced artificial intelligence technique, has enjoyed great success in many challenging real-world problems. It is widely deployed on different devices, such as smart cars, smart phones, wearable devices, smart cameras, and other smart objects in edge networks. Cooperative multi-agent reinforcement learning is a paradigm of DRL that is both more difficult and of greater practical value: each agent observes only locally and lacks global information, the joint action space is very large, and computation is complex; meanwhile, because there is only one global reward, it is difficult to assign a corresponding reward to each individual agent, which makes training hard and convergence difficult to guarantee.
To address these difficulties, current mainstream multi-agent algorithms adopt a centralized training and distributed execution (CTDE) architecture: global information is available during training, while only the agent's own observations are available during execution. The architecture uses a critic network during training, which updates the critic and actor networks according to the state-action combinations of all agents; during execution each agent keeps only an independent actor network and makes decisions from local observation. Typical architectures such as IQL and QMIX have global information during training, while each agent can only decide based on local information during execution. In these methods, other agents are modeled as part of the environment, which reduces the problem to a single-agent one, so convergence cannot be guaranteed and an agent easily falls into endless exploration.
Therefore, much research has turned to communication-based multi-agent reinforcement learning. The most direct idea is to introduce message passing into multi-agent cooperation: local centralization is achieved through message passing among agents, which alleviates the non-stationary environment problem and promotes cooperation. The current mainstream method CommNet uses a mean unit among the policy networks of multiple agents to receive the local observations of all agents and broadcasts the generated message to all agents (a star communication architecture); TarMAC is a fully connected architecture in which all agents broadcast messages. Both the star and the fully connected architectures aim to ensure that no message generated by any agent is missed, that local observation information can be transmitted to every agent, and that agents can make decisions with global information.
The existing communication learning methods ensure that every agent can obtain the messages of all other agents, but this introduces a large amount of redundant information. Because the dependencies among agents differ, transferring information between unrelated agents is not only useless but may even negatively impact an agent's decision-making.
Meanwhile, redundant message transmission places a heavy burden on the edge network: the edge network has a complex structure and limited communication bandwidth, so traditional communication learning methods are often difficult to apply in the edge environment. Since the main application scenario of multi-agent reinforcement learning is the edge network environment, in order to resolve the mismatch between network bandwidth and the resources required by communication learning, the invention analyzes the influence of other agents' messages on the current agent, proposes an index describing message importance, groups the agents accordingly, reduces network communication traffic through the idea of hierarchical transmission, and realizes a communication learning method for deep reinforcement learning oriented to edge networks.
Disclosure of Invention
An advantage of the present invention is to provide a multi-agent communication learning method that introduces message passing among multiple agents to transmit local observations, so that the agents can take global conditions into account when making decisions.
An advantage of the present invention is to provide a multi-agent communication learning method that designs an importance ranking index and an efficient grouping algorithm to reduce the amount of messages to be transmitted, realizing an efficient communication learning method that effectively reduces the communication bandwidth consumed by unnecessary messages.
An advantage of the present invention is to provide a multi-agent communication learning method that can be used for all kinds of multi-agent reinforcement learning applications in an edge network, such as multi-agent intelligent driving, robot navigation, and logistics scheduling.
An advantage of the present invention is to provide a multi-agent communication learning method suitable for scenarios requiring multi-scene fusion sensing, such as multi-camera fusion and similar scenes.
The technical solution provided by the invention for the above technical problem is as follows:
The invention provides a multi-agent communication learning method, which comprises:
a CriticNet, wherein the CriticNet is used for calculating the communication importance in the training phase and for training three corresponding networks on the end devices, namely the ActorNet, the PriorNet and the EncoderNet;
the ActorNet, wherein the ActorNet is used for selecting corresponding actions on the agent side, works in both the training phase and the execution phase, learns the agent's policy π in the training phase, and then generates the corresponding action a_i^t according to the local observation and the received messages, that is, a_i^t = π(o_i^t, m_i^t), where m_i^t is the message received by agent i at time t;
the PriorNet, wherein the PriorNet is used by the agent to select communication objects; the PriorNet evaluates the agents observed in the local observation and outputs an importance value p_ij^t, i.e. the importance of agent j's message to agent i; and
the EncoderNet, wherein the EncoderNet is used by the agent to encode its own information in order to reduce the size of the message body.
Preferably, the CriticNet runs in the cloud and works only in the training phase; using the global reward and the communication priority, the CriticNet computes the network loss, passes the gradients back to the remaining networks, and updates their parameters.
Preferably, when the importance value exceeds a certain threshold, it indicates that agent i currently needs to obtain the message of agent j to make a decision.
Preferably, the agent encodes previous actions of the agent itself together with observations for reference by other agents, improving the stability of the cooperation.
Preferably, the method further comprises a method for calculating the importance, with the following steps:
Step A: observe whether removing the message of agent j changes the action output by the ActorNet;
Step B: since the ActorNet outputs a distribution over the action set, use the KL divergence to measure the difference between the agent's output action distributions, with the specific formula:
p_ij = D_KL( π(a | o_{i}) || π(a | o_{{i}\j}) )
Step C: here o_{i} denotes the set of messages of all the other agents observed by agent i, and o_{{i}\j} denotes that set of messages excluding agent j; the difference calculated by the formula indicates whether the decision distribution without agent j's message is consistent with the decision distribution with agent j's message;
Step D: if the difference is large, the message of agent j is important to agent i, so the communication confidence is high;
Step E: after the confidences of all the agents are calculated, a confidence matrix M between the agents is obtained, and the agents are grouped according to this confidence matrix.
Preferably, the method further comprises a distributed grouping method in which the PriorNet outputs two values, query and signature: the signature vector is the agent's information fingerprint and comprises the encoding of the agent's position and label; the query vector is the query information and represents the encoding of the set of agents with which the agent needs to communicate.
Preferably, the method further comprises a communication mechanism, wherein the communication mechanism comprises a handshake phase, an election phase, a communication phase and a decision phase. In the handshake phase, every agent broadcasts its query and signature to the agents within its observation; after receiving the queries and signatures, the agents restore the communication confidence matrix by multiplying the vectors. In the election phase, every agent computes the adjacency graph and selects the agent with the greatest out-degree as a preset agent, i.e. the agent whose message most other agents want for their decisions; this preset agent serves as the leader node. In the communication phase, all non-leader nodes send their own messages to the leader node; the leader encodes the received messages through the encoder network, and communication is then carried out among the leaders, who transmit the messages to one another. In the decision phase, the leader makes a decision according to the received leader messages and sends the decision and the messages to the other non-leader agents in the same group, and those agents make their next decisions accordingly.
Compared with the current mainstream star-shaped and fully connected methods, the method has the following advantages:
1. Both fully connected and star communication networks disregard the effect of the message itself on the agent's decision-making; receiving an inappropriate message may hinder agent convergence and thus global reward maximization. The invention measures message importance with the KL divergence, ensures that only effective information is transmitted, avoids redundant message transmission, and improves the convergence rate.
2. The invention communicates by grouping agents and electing leaders, which greatly reduces the number of communication links and the communication bandwidth consumption.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a network diagram of a multi-agent communication learning method provided by the present invention.
FIG. 2 is a schematic diagram of spectral clustering of a multi-agent communication learning method provided by the present invention.
Fig. 3 illustrates the group communication of agents in the multi-agent communication learning method provided by the present invention.
FIG. 4 is a diagram illustrating an improvement in global rewards for cooperating agents in a multi-agent communication learning method of the present invention.
Fig. 5 illustrates the communication traffic among multiple agents in the multi-agent communication learning method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
A typical distributed edge computing architecture is composed of a plurality of edge devices (denoted "Device"). Assuming there are N edge devices, each device i can be regarded as an agent; the agents can be connected with each other through WiFi, 5G, and other networks, and have limited computing power and bandwidth resources. Each agent has an action set A; at each time t, agent i has its own local observation o_i^t, and the agent selects and executes the next action based on its own observation and action policy, that is, a_i^t = π(o_i^t).
Meanwhile, once all agents have taken their actions, they all obtain a global reward value r = env(a_0, a_1, ..., a_n).
The goal of a cooperative multi-agent system is to maximize the cumulative value of this global reward r, so all agents need to keep track of the global information of interest through messaging to achieve a cooperative decision.
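As a rough illustration of this setting, the following sketch shows the loop in which each agent acts from its local observation o_i^t and all agents then share one global reward r = env(a_0, a_1, ..., a_n). The CoopEnv class, the placeholder reward, and the random stand-in policies are assumptions made for illustration only; none of them appear in the patent.

```python
# Minimal sketch of the cooperative multi-agent loop described above.
# `CoopEnv` and the random stand-in policies are hypothetical, not part of the patent.
import numpy as np

class CoopEnv:
    """Toy environment: N agents, a discrete action set A, one shared global reward."""
    def __init__(self, n_agents, n_actions, obs_dim):
        self.n_agents, self.n_actions, self.obs_dim = n_agents, n_actions, obs_dim

    def observe(self, i):
        # Local observation o_i^t of agent i (limited view, no global state).
        return np.random.randn(self.obs_dim)

    def step(self, actions):
        # All agents act; the environment returns a single global reward
        # r = env(a_0, a_1, ..., a_n) shared by every agent.
        return float(-np.var(actions))  # placeholder reward

env = CoopEnv(n_agents=4, n_actions=5, obs_dim=8)
cumulative_r = 0.0
for t in range(100):
    obs = [env.observe(i) for i in range(env.n_agents)]
    # Each agent samples a_i^t from its own policy given o_i^t (and, once
    # communication is added, the received messages m_i^t).
    actions = [np.random.randint(env.n_actions) for _ in obs]  # stand-in policies
    cumulative_r += env.step(actions)
```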
The invention follows the CTDE framework: full information exchange is maintained in the training phase, while in the execution phase information encoding and communication-object selection are carried out according to the trained communication networks.
As shown in FIG. 1, the multi-agent communication learning method of the invention comprises a CriticNet, an ActorNet, a PriorNet and an EncoderNet. The CriticNet is used for calculating the communication importance in the training phase and for training the three corresponding networks on the end devices, namely the ActorNet, the PriorNet and the EncoderNet. Further, the CriticNet runs in the cloud and works only in the training phase; using the global reward and the communication priority, the CriticNet computes the network loss, passes the gradients back to the remaining networks, and updates their parameters. The ActorNet is used for selecting corresponding actions on the agent side and works in both the training phase and the execution phase; the ActorNet learns the agent's policy π in the training phase and then generates the corresponding action a_i^t according to the local observation and the received messages, that is, a_i^t = π(o_i^t, m_i^t), where m_i^t is the message received by agent i at time t. The PriorNet is used by the agent to select communication objects; it evaluates the agents seen in the local observation and outputs an importance value p_ij^t, namely the importance of agent j's message to agent i; when this importance value exceeds a certain threshold, it indicates that agent i currently needs agent j's message to make a decision. The EncoderNet is used by the agent to encode its own information: because the agent's observation of the environment is low-dimensional and sparse, it is converted through the encoding network so as to reduce the size of the message body; in addition to the observation, the agent also encodes its own previous action together with the observation for reference by other agents, improving the stability of the cooperation.
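To make the division of labor among the on-device networks concrete, the following is a minimal sketch of the ActorNet, PriorNet and EncoderNet as small PyTorch modules. The use of PyTorch, the layer sizes, and the exact input/output shapes are assumptions for illustration only; the patent does not specify them.

```python
# Hedged sketch of the three on-device networks; architectures are assumed.
import torch
import torch.nn as nn

class EncoderNet(nn.Module):
    """Encodes an agent's local observation and previous action into a compact message."""
    def __init__(self, obs_dim, act_dim, msg_dim):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                                nn.Linear(64, msg_dim))
    def forward(self, obs, prev_action):
        return self.fc(torch.cat([obs, prev_action], dim=-1))

class PriorNet(nn.Module):
    """Outputs the query and signature vectors used to estimate communication importance."""
    def __init__(self, obs_dim, key_dim):
        super().__init__()
        self.query = nn.Linear(obs_dim, key_dim)
        self.signature = nn.Linear(obs_dim, key_dim)
    def forward(self, obs):
        return self.query(obs), self.signature(obs)

class ActorNet(nn.Module):
    """Policy pi: maps local observation plus received message m_i^t to an action distribution."""
    def __init__(self, obs_dim, msg_dim, n_actions):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(obs_dim + msg_dim, 64), nn.ReLU(),
                                nn.Linear(64, n_actions))
    def forward(self, obs, msg):
        return torch.softmax(self.fc(torch.cat([obs, msg], dim=-1)), dim=-1)
```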
For the policy network, namely the ActorNet, and the corresponding reward loss, the multi-agent communication learning method uses a cross-entropy loss function as the error and gradient descent as the parameter update method.
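A hedged sketch of such an update step is shown below; the optimizer choice, the batching, and the form of the targets are assumptions, since the patent only names the cross-entropy error and gradient descent.

```python
# Hedged sketch of one ActorNet update: cross-entropy loss plus a gradient step.
# `actor`, `optimizer`, and the target format are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def update_actor(actor, optimizer, obs, msg, target_actions):
    """obs, msg: batched inputs; target_actions: LongTensor of action indices."""
    probs = actor(obs, msg)                                       # pi(a | o, m)
    loss = F.nll_loss(torch.log(probs + 1e-8), target_actions)   # cross-entropy error
    optimizer.zero_grad()
    loss.backward()                                               # gradient descent step
    optimizer.step()
    return loss.item()
```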
Furthermore, the multi-agent communication learning method of the invention specifies how the agents select their communication objects and how the communication interaction is carried out.
The invention provides a method for calculating the importance, i.e. how to weight the other agents observed by agent i and allocate the degree of communication; the steps are as follows:
Step A: observe whether removing the message of agent j changes the action output by the ActorNet;
Step B: since the ActorNet outputs a distribution over the action set, use the KL divergence to measure the difference between the agent's output action distributions, with the specific formula:
p_ij = D_KL( π(a | o_{i}) || π(a | o_{{i}\j}) )
Step C: here o_{i} denotes the set of messages of all the other agents observed by agent i, and o_{{i}\j} denotes that set of messages excluding agent j; the difference calculated by the formula indicates whether the decision distribution without agent j's message is consistent with the decision distribution with agent j's message;
Step D: if the difference is large, the message of agent j is important to agent i, so the communication confidence is high;
It should be noted that this requires computing the ActorNet output many times, so the calculation is performed only in the training phase; the result is used as a supervision signal to train the PriorNet, so that in the execution phase the communication confidence can be obtained directly from the PriorNet without repeated calculation.
Step E: after the confidences of all the agents are calculated, a confidence matrix M between the agents is obtained, and the agents are grouped according to this confidence matrix.
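The following sketch illustrates Step B: it compares the ActorNet action distribution computed with and without agent j's message and returns the KL divergence as the communication confidence for the pair (i, j). The mean aggregation of messages and the helper names are assumptions; in training, the returned value would serve as the supervision signal for the PriorNet as described above.

```python
# Hedged sketch of the KL-based importance measure for agent pair (i, j).
# `actor` is an ActorNet-like callable; `messages` maps sender id -> message tensor.
# Assumes agent i has received messages from at least two agents.
import torch

def message_importance(actor, obs_i, messages, j):
    """KL( pi(. | o_i, all messages) || pi(. | o_i, messages without agent j) )."""
    msg_all = torch.stack(list(messages.values())).mean(dim=0)
    msg_wo_j = torch.stack([m for k, m in messages.items() if k != j]).mean(dim=0)
    with torch.no_grad():
        p_full = actor(obs_i, msg_all)    # action distribution with agent j's message
        p_wo_j = actor(obs_i, msg_wo_j)   # action distribution without it
    # A large divergence means agent j's message changes agent i's decision,
    # so the communication confidence for (i, j) should be high.
    return torch.sum(p_full * (torch.log(p_full + 1e-8) - torch.log(p_wo_j + 1e-8)))
```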
An exemplary confidence matrix M is sparse, indicating that most agents do not need to communicate with each other, and the agents can therefore be grouped by a spectral clustering algorithm. Spectral clustering is an algorithm that evolved from graph theory and was later widely applied to clustering. Its main idea is to treat all data as points in space that can be connected by edges: the edge weight between two distant points is low, the edge weight between two close points is high, and the graph formed by all the data points is cut so that the sum of edge weights between different subgraphs is as low as possible while the sum of edge weights within each subgraph is as high as possible, thereby achieving clustering.
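A minimal sketch of this grouping step, assuming a small hand-written confidence matrix M and using scikit-learn's spectral clustering with the matrix as a precomputed affinity; the example values and the cluster count are assumptions:

```python
# Hedged sketch: group agents by spectral clustering on the confidence matrix M.
import numpy as np
from sklearn.cluster import SpectralClustering

M = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])   # pairwise communication confidence (example)

# Use M directly as a precomputed affinity: agents with high mutual confidence
# fall into the same group, keeping intra-group links dense and inter-group links sparse.
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            assign_labels="discretize",
                            random_state=0).fit_predict(M)
print(labels)  # e.g. [0 0 1 1]
```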
As shown in FIG. 3, with the clustering algorithm, communication within each group can be made dense, while communication between groups is made sparse.
In the execution stage, the agents communicate in a distributed manner and there is no central node to help with grouping, so the invention provides a distributed grouping method. The PriorNet outputs two values, query and signature: the signature vector is the agent's information fingerprint and comprises the encoding of the agent's position and label; the query vector is the query information and represents the encoding of the set of agents with which the agent needs to communicate.
Further, the communication mechanism of the invention comprises a handshake phase, an election phase, a communication phase and a decision phase. In the handshake phase, every agent broadcasts its query and signature to the agents within its observation; after receiving the queries and signatures, the agents can restore the communication confidence matrix by multiplying the vectors. In the election phase, after computing the confidence matrix, every agent computes the adjacency graph and selects the agent with the greatest out-degree, i.e. the agent whose message most other agents want for their decisions, so that this agent serves as the leader node. In the communication phase, all non-leader nodes send their own messages to the leader node; the leader encodes the received messages through the encoder network, and communication is then carried out among the leaders, who transmit the messages to one another. In the decision phase, the leader makes a decision according to the received leader messages and sends the decision and the messages to the other non-leader agents in the same group, and those agents make their next decisions accordingly.
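The handshake and election phases can be sketched as follows; the dot-product form of the confidence reconstruction, the threshold value, and the edge orientation used to count the degree are assumptions made for illustration.

```python
# Hedged sketch of the handshake and election phases: confidence is rebuilt from
# query/signature vectors, and the most-demanded agent becomes the group leader.
import numpy as np

def elect_leader(queries, signatures, threshold=0.5):
    """queries, signatures: (N, d) arrays produced by PriorNet for N agents."""
    # Handshake: agent i recovers its confidence in agent j's message as q_i . s_j.
    M = queries @ signatures.T                # (N, N) confidence matrix
    np.fill_diagonal(M, 0.0)
    # Election: build the adjacency graph and count, for each agent j, how many
    # agents want j's message (j's degree in the thresholded graph).
    adjacency = (M > threshold).astype(int)
    degree = adjacency.sum(axis=0)
    return int(np.argmax(degree)), M          # leader index and the rebuilt matrix

rng = np.random.default_rng(0)
leader, M = elect_leader(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
```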
Through this grouped communication mode, the invention effectively reduces the communication cost, reduces the number of communication links, and realizes efficient communication learning for multi-agent reinforcement learning: the importance of messages is calculated and measured through the KL divergence; the agents are grouped by applying a spectral clustering algorithm to the confidence matrix, thereby reducing communication links; and leader nodes are elected within each group by graph out-degree, with inter-group communication carried out by the elected leaders, thereby reducing the communication traffic.
As shown in FIG. 4, the feasibility of the invention has been verified by extensive experiments in an OpenAI open-source multi-agent reinforcement learning environment, where the invention helps promote cooperation among multiple agents and maximize the global reward.
As shown in FIG. 5, the communication volume first increases rapidly as the agents learn to promote cooperation through communication; as training continues, the grouping method takes effect and the communication volume then decreases steadily.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A multi-agent communication learning method, comprising:
a CriticNet, wherein the CriticNet is used for calculating the communication importance in the training phase and for training three corresponding networks on the end devices, namely the ActorNet, the PriorNet and the EncoderNet;
the ActorNet, wherein the ActorNet is used for selecting corresponding actions on the agent side, works in both the training phase and the execution phase, learns the agent's policy π in the training phase, and then generates the corresponding action a_i^t according to the local observation and the received messages, that is, a_i^t = π(o_i^t, m_i^t), where m_i^t is the message received by agent i at time t;
the PriorNet, wherein the PriorNet is used by the agent to select communication objects; the PriorNet evaluates the agents observed in the local observation and outputs an importance value p_ij^t, i.e. the importance of agent j's message to agent i; and
the EncoderNet, wherein the EncoderNet is used by the agent to encode its own information in order to reduce the size of the message body.
2. The method of claim 1, wherein the CriticNet runs in the cloud and works only in the training phase; using the global reward and the communication priority, the CriticNet computes the network loss, passes the gradients back to the remaining networks, and updates their parameters.
3. The method of claim 1, wherein when the importance value exceeds a certain threshold, it indicates that agent i currently needs to obtain a message from agent j to make a decision.
4. The method of claim 1, wherein the agent encodes its own previous actions together with its observations for reference by other agents, improving the stability of the cooperation.
5. The method of claim 1, further comprising a method for calculating the importance, with the following steps:
Step A: observe whether removing the message of agent j changes the action output by the ActorNet;
Step B: since the ActorNet outputs a distribution over the action set, use the KL divergence to measure the difference between the agent's output action distributions, with the specific formula:
p_ij = D_KL( π(a | o_{i}) || π(a | o_{{i}\j}) )
Step C: here o_{i} denotes the set of messages of all the other agents observed by agent i, and o_{{i}\j} denotes that set of messages excluding agent j; the difference calculated by the formula indicates whether the decision distribution without agent j's message is consistent with the decision distribution with agent j's message;
Step D: if the difference is large, the message of agent j is important to agent i, so the communication confidence is high;
Step E: after the confidences of all the agents are calculated, a confidence matrix M between the agents is obtained, and the agents are grouped according to this confidence matrix.
6. The method of claim 1, further comprising a distributed grouping method, wherein the PriorNet outputs two values, query and signature: the signature vector is the agent's information fingerprint and comprises the encoding of the agent's position and label; the query vector is the query information and represents the encoding of the set of agents with which the agent needs to communicate.
7. The method of claim 6, further comprising a communication mechanism, wherein the communication mechanism comprises a handshake phase, an election phase, a communication phase and a decision phase; in the handshake phase, every agent broadcasts its query and signature to the agents within its observation, and after receiving the queries and signatures the agents restore the communication confidence matrix by multiplying the vectors; in the election phase, after computing the confidence matrix, every agent computes the adjacency graph and selects the agent with the greatest out-degree as a preset agent, i.e. the agent whose message most other agents want for their decisions, and the preset agent serves as the leader node; in the communication phase, all non-leader nodes send their own messages to the leader node, the leader encodes the received messages through the encoder network, communication is then carried out among the leaders, and the leaders transmit the messages to one another; in the decision phase, the leader makes a decision according to the received leader messages and sends the decision and the messages to the other non-leader agents in the same group, and those agents make their next decisions accordingly.
CN202111549398.0A 2021-12-17 2021-12-17 Multi-agent communication learning method Pending CN114298178A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111549398.0A CN114298178A (en) 2021-12-17 2021-12-17 Multi-agent communication learning method
PCT/CN2022/138140 WO2023109699A1 (en) 2021-12-17 2022-12-09 Multi-agent communication learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111549398.0A CN114298178A (en) 2021-12-17 2021-12-17 Multi-agent communication learning method

Publications (1)

Publication Number Publication Date
CN114298178A true CN114298178A (en) 2022-04-08

Family

ID=80967633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111549398.0A Pending CN114298178A (en) 2021-12-17 2021-12-17 Multi-agent communication learning method

Country Status (2)

Country Link
CN (1) CN114298178A (en)
WO (1) WO2023109699A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114123178B (en) * 2021-11-17 2023-12-19 哈尔滨工程大学 Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method
CN117031399B (en) * 2023-10-10 2024-02-20 浙江华创视讯科技有限公司 Multi-agent cooperative sound source positioning method, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113286275A (en) * 2021-04-23 2021-08-20 南京大学 Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning
CN113642233B (en) * 2021-07-29 2023-12-29 太原理工大学 Group intelligent collaboration method for optimizing communication mechanism
CN113592079A (en) * 2021-08-13 2021-11-02 大连大学 Cooperative multi-agent communication method oriented to large-scale task space
CN114298178A (en) * 2021-12-17 2022-04-08 深圳先进技术研究院 Multi-agent communication learning method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023109699A1 (en) * 2021-12-17 2023-06-22 深圳先进技术研究院 Multi-agent communication learning method

Also Published As

Publication number Publication date
WO2023109699A1 (en) 2023-06-22

Similar Documents

Publication Publication Date Title
CN114298178A (en) Multi-agent communication learning method
CN109039942B (en) Network load balancing system and balancing method based on deep reinforcement learning
WO2021017227A1 (en) Path optimization method and device for unmanned aerial vehicle, and storage medium
CN110852448A (en) Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
CN111628855B (en) Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning
CN113010305B (en) Federal learning system deployed in edge computing network and learning method thereof
CN111629380A (en) Dynamic resource allocation method for high-concurrency multi-service industrial 5G network
CN113537514A (en) High-energy-efficiency federal learning framework based on digital twins
CN113033712A (en) Multi-user cooperative training people flow statistical method and system based on federal learning
CN113341712B (en) Intelligent hierarchical control selection method for unmanned aerial vehicle autonomous control system
CN113312177B (en) Wireless edge computing system and optimizing method based on federal learning
CN111865474B (en) Wireless communication anti-interference decision method and system based on edge calculation
CN114710439B (en) Network energy consumption and throughput joint optimization routing method based on deep reinforcement learning
CN110300417A (en) The energy efficiency optimization method and device of Communication Network for UAVS
CN109889525A (en) Multi-communication protocol Intellisense method
CN117639244A (en) Centralized control system of multi-domain heterogeneous power distribution communication network
CN110661566B (en) Unmanned aerial vehicle cluster networking method and system adopting depth map embedding
Meng et al. Intelligent routing orchestration for ultra-low latency transport networks
Zhuang et al. GA-MADDPG: A Demand-Aware UAV Network Adaptation Method for Joint Communication and Positioning in Emergency Scenarios
Sun et al. QPSO-based QoS multicast routing algorithm
CN114997422A (en) Grouping type federal learning method of heterogeneous communication network
CN114022731A (en) Federal learning node selection method based on DRL
Zhou et al. Digital Twin-based 3D Map Management for Edge-assisted Device Pose Tracking in Mobile AR
Di Giacomo et al. Edge-assisted gossiping learning: Leveraging v2v communications between connected vehicles
Ma Multi-Task Offloading via Graph Neural Networks in Heterogeneous Multi-access Edge Computing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination