CN116582442A - Multi-agent cooperation method based on hierarchical communication mechanism - Google Patents
- Publication number
- CN116582442A (application number CN202310285877.9A)
- Authority
- CN
- China
- Prior art keywords
- agent
- communication
- agents
- group
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/12—Discovery or management of network topologies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a multi-agent cooperation method based on a hierarchical communication mechanism. First, an intra-group communication topology is established and the agents are divided into different groups according to communication distance. Then an inter-group communication topology is established, and the high-level agent of each group is found with a degree-centrality method based on the number of neighbor nodes. Next, intra-group information aggregation and inter-group information exchange are carried out: a hierarchical graph convolutional network is constructed with a multi-head attention mechanism, and for each attention head the value representations of all input features are weighted by the learned relations and summed; the outputs of an agent's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolution layer. Finally, an action decision is made and each agent acts according to its policy. By establishing a reasonable and efficient hierarchical communication structure between agents, the invention breaks away from centralized data acquisition, aggregation and utilization, and alleviates the curse of dimensionality in centralized training.
Description
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a multi-agent cooperation method.
Background
As the scale of unmanned systems grows, cooperation-oriented multi-agent systems have begun to play a major role, completing complex and difficult tasks in high-dimensional dynamic environments. They are widely applied to automatic driving in intelligent transportation, collaborative combat in future warfare, autonomous delivery in intelligent logistics, communication search-and-rescue after geological disasters, and more. In recent years, multi-agent deep reinforcement learning (Multi-agent Deep Reinforcement Learning, MADRL) has combined the perception capability of deep learning with the decision-making capability of reinforcement learning and applied them to multi-agent systems, effectively completing various complex tasks in multi-agent environments, whether competitive, cooperative or mixed. Multi-agent communication establishes a communication structure and a communication policy between agents for transmitting and receiving abstract information: agents exchange their respective states to coordinate their policies, negotiating and adjusting behavior decisions by sharing observations, intentions or experience during operation, which improves overall learning performance and helps each agent reach its learning goal. However, given the large number of agents and the way redundant information interferes with the system, providing a hierarchical multi-agent communication mechanism with efficient communication capability is of great importance.
At the method level, existing algorithms based on gating mechanisms, attention mechanisms and graph networks have explored multi-agent communication techniques for selecting communication partners, deciding when to communicate, choosing the content of communicated information, updating information, and improving communication efficiency. Algorithms using gating mechanisms learn when to communicate and how to communicate efficiently. CommNet is a classical communication network that averages the states of all agents at the previous time step and aggregates them with each agent's current state through an LSTM to make action predictions. The gating mechanism lets an agent receive vectors from other agents as part of its input; these vectors carry communication content that is learned jointly by gradient descent, and the information transferred between agents facilitates task completion. Algorithms using the attention mechanism screen hidden states to decide whom to communicate with and how to aggregate information effectively. VAIN introduces an attention mechanism that computes an attention vector for the communication information output by each agent and, on this basis, represents the relationship between two agents. Graph-network-based models focus on modeling the relationships between agents, using graph convolution layers to learn cooperation from latent features generated over progressively growing receptive fields.
Building on graph networks, DGN computes the interactions between agents using multi-head dot-product attention as the convolution kernel, and can extract higher-order relational representations with multiple convolution layers, effectively capturing the interplay between agents; it also introduces residual connections, promoting stable cooperative decisions and convergence.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a multi-agent cooperation method based on a hierarchical communication mechanism. First, an intra-group communication topology is established and the agents are divided into different groups according to communication distance. Then an inter-group communication topology is established, and the high-level agent of each group is found with a degree-centrality method based on the number of neighbor nodes. Next, intra-group information aggregation and inter-group information exchange are carried out: a hierarchical graph convolutional network is constructed with a multi-head attention mechanism, and for each attention head the value representations of all input features are weighted by the learned relations and summed; the outputs of an agent's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolution layer. Finally, an action decision is made and each agent acts according to its policy. By establishing a reasonable and efficient hierarchical communication structure between agents, the invention breaks away from centralized data acquisition, aggregation and utilization, and alleviates the curse of dimensionality in centralized training.
The technical solution adopted by the invention to solve the above technical problem comprises the following steps:
step 1: establishing an intra-group communication topological structure;
dividing the agents into different groups according to communication distance:

G_i^t = { v_j | d(v_i, v_j) ≤ rc }

wherein G_i^t is the communication group in which the i-th agent is located at time t, G_j^t is the communication group in which the j-th agent is located at time t, and the groups are formed by clustering over communication range so that agents within each other's communication range can communicate with each other; v_i is the i-th agent, v_j is the j-th agent, and rc is the communication distance;

obtaining the intra-group adjacency matrix of the communication topology graph, denoted C = (e_ij)_{n×n}; e_ij indicates the edge relationship between agents: 1 if they communicate with each other, 0 otherwise; n denotes the number of agents;
step 2: establishing an inter-group communication topological structure;
searching for the high-level agent of each group with a degree-centrality method based on the number of neighbor nodes; the high-level agents can communicate with the high-level agents of other groups and exchange inter-group information;

the high-level agent is Degree_max = argmax_{v_i} Degree(v_i);

the degree-centrality measure is Degree(v_i) = Edge_i / All_i;

wherein Degree_max selects the node of maximum degree centrality in the group, Degree(v_i) is the computed degree centrality of a node, Edge_i denotes the number of existing edges connected to the node, and All_i denotes the number of possible edges connecting node v_i to the other nodes;
step 3: intra-group information aggregation and inter-group information exchange;
constructing a hierarchical graph convolutional neural network with a multi-head attention mechanism, computing the interactions between agents with multi-head dot-product attention as the convolution kernel; taking each agent as an entity, let N_i be the set of agents in the local area that communicate with agent v_i; for the m-th attention head, the relationship between agent v_i and v_j ∈ N_i is computed as:

α_ij^m = exp( β (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β (W_Q^m h_i)^T (W_K^m h_e) )

wherein β is a scale factor, W_Q^m is the query weight matrix for h_i, h_i is the feature vector of agent v_i, W_K^m is the key weight matrix for h_j, h_j is the feature vector of agent v_j, and h_e is the feature vector of agent v_e, the e-th agent in the set N_i of agents in the local area communicating with v_i;

for each attention head, the value representations of all input features are weighted by these relationships and summed; then the outputs of agent v_i's M attention heads are concatenated and fed into a single-layer MLP with ReLU nonlinearity, giving the output of the convolution layer:

h'_i = σ( concat_{m=1}^{M} Σ_{j∈N_i} α_ij^m W_V^m h_j )

wherein W_V^m is the value weight matrix for h_j, M denotes the number of attention heads, and σ denotes the MLP perceptron with ReLU activation function;
step 4: action decision-making;
cascading the features obtained from each layer of the graph convolutional neural network, denoted h''_i = concat_l( h_i^l ), wherein l indexes the layers of the graph convolutional neural network and h_i^l denotes the feature of agent v_i at layer l;

the loss function is L(θ) = (1/S) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ max_{a'} Q(O'_i, a'_i; θ'); wherein γ is a discount factor; S denotes a batch of samples drawn from historical experience; N denotes a total of N receptive fields; O_i denotes the set of observations of all agents in the i-th receptive field, and a_i denotes the actions of the agents in the i-th receptive field; the model, the hierarchical multi-agent cooperative communication model, is parameterized by θ;

based on the Q-learning algorithm, the target network θ' is updated by θ' = βθ + (1−β)θ', where the hyperparameter β regulates the relative influence of the network and the target network.
The beneficial effects of the invention are as follows:
the invention adopts a flexible and easily understood hierarchical communication topological structure, can divide the hierarchical structure among the agents to distinguish the states of different groups among the agents and the influence of different groups on the agents, uses the graph convolution based on the attention mechanism to carry out information aggregation, and helps the agents expand the receptive field through graph convolution communication, thereby improving the strategy quality. And meanwhile, the neural network replayed by experience is used for reducing the interaction times with the environment, and the training data with high efficiency and diversity are fully utilized. By establishing a reasonable and efficient hierarchical communication structure between the intelligent agents, centralized data acquisition, aggregation and utilization are broken, and the problem of dimension disasters of centralized training is relieved. Communication efficiency and performance of the multi-agent system in cooperation, coordination or cooperative activities in the global scope are improved, and influences of individuals and group agents are balanced better.
Drawings
FIG. 1 is a diagram of the overall framework of the hierarchical multi-agent cooperative communication mechanism of the present invention.
FIG. 2 is a diagram of a multi-head attention mechanism according to the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
The invention provides a multi-agent cooperation method based on a hierarchical communication mechanism, using the following principle: a hierarchy is a more complex communication structure evolved from the topology; it can stratify nodes and order the communication structure, and can learn communication patterns and policies closer to reality. To solve the problem of efficient communication in multi-agent deep reinforcement learning scenarios, a hierarchical multi-agent cooperative communication model (Hierarchical Multi-Agent Collaborative Communication Mechanism, HMACC) is proposed. First, the agents are partitioned into a hierarchy, distinguishing the states of different groups and their influence on each agent; then intra-group and inter-group information is distinguished, helping the agents expand their receptive fields and improving policy quality. The invention decomposes the communication process into local and global interaction processes, models and learns them with suitable strategies, and improves communication efficiency and performance when the multi-agent system performs cooperation and coordination activities across the global scope.
A multi-agent cooperation method based on a hierarchical communication mechanism comprises the following steps:
step 1: establishing an intra-group communication topological structure;
dividing the agents into different groups according to communication distance:

G_i^t = { v_j | d(v_i, v_j) ≤ rc }

wherein G_i^t is the communication group in which the i-th agent is located at time t, G_j^t is the communication group in which the j-th agent is located at time t, and the groups are formed by clustering over communication range so that agents within each other's communication range can communicate with each other; v_i is the i-th agent, v_j is the j-th agent, and rc is the communication distance;

obtaining the intra-group adjacency matrix of the communication topology graph, denoted C = (e_ij)_{n×n}; e_ij indicates the edge relationship between agents: 1 if they communicate with each other, 0 otherwise; n denotes the number of agents;
step 2: establishing an inter-group communication topological structure;
searching for the high-level agent of each group with a degree-centrality method based on the number of neighbor nodes; the high-level agents can communicate with the high-level agents of other groups and exchange inter-group information;

the high-level agent is Degree_max = argmax_{v_i} Degree(v_i);

the degree-centrality measure is Degree(v_i) = Edge_i / All_i;

wherein Degree_max selects the node of maximum degree centrality in the group, Degree(v_i) is the computed degree centrality of a node, Edge_i denotes the number of existing edges connected to the node, and All_i denotes the number of possible edges connecting node v_i to the other nodes;
step 3: intra-group information aggregation and inter-group information exchange;
constructing a hierarchical graph convolutional neural network with a multi-head attention mechanism, computing the interactions between agents with multi-head dot-product attention as the convolution kernel; taking each agent as an entity, let N_i be the set of agents in the local area that communicate with agent v_i; for the m-th attention head, the relationship between agent v_i and v_j ∈ N_i is computed as:

α_ij^m = exp( β (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β (W_Q^m h_i)^T (W_K^m h_e) )

wherein β is a scale factor, W_Q^m is the query weight matrix for h_i, h_i is the feature vector of agent v_i, W_K^m is the key weight matrix for h_j, h_j is the feature vector of agent v_j, and h_e is the feature vector of agent v_e, the e-th agent in the set N_i of agents in the local area communicating with v_i;

for each attention head, the value representations of all input features are weighted by these relationships and summed; then the outputs of agent v_i's M attention heads are concatenated and fed into a single-layer MLP with ReLU nonlinearity, giving the output of the convolution layer:

h'_i = σ( concat_{m=1}^{M} Σ_{j∈N_i} α_ij^m W_V^m h_j )

wherein W_V^m is the value weight matrix for h_j, M denotes the number of attention heads, and σ denotes the MLP perceptron with ReLU activation function;
step 4: action decision-making;
cascading the features obtained from each layer of the graph convolutional neural network, denoted h''_i = concat_l( h_i^l ), wherein l indexes the layers of the graph convolutional neural network and h_i^l denotes the feature of agent v_i at layer l;

the loss function is L(θ) = (1/S) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ max_{a'} Q(O'_i, a'_i; θ'); wherein γ is a discount factor: returns nearer in time are discounted less and thus influence the decision result more, which makes the learning process more stable; S denotes a batch of samples drawn from historical experience; N denotes a total of N receptive fields; O_i denotes the set of observations of all agents in the i-th receptive field, and a_i denotes the actions of the agents in the i-th receptive field; the model, the hierarchical multi-agent cooperative communication model proposed by the invention, is parameterized by θ;

based on the Q-learning algorithm, the target network θ' is updated by θ' = βθ + (1−β)θ', where the hyperparameter β regulates the relative influence of the network and the target network.
The invention discloses a multi-agent cooperation method based on a hierarchical communication mechanism, relating to the field of multi-agent cooperation based on multi-agent deep reinforcement learning. The specific process is as follows. First, an intra-group communication topology is established: the agents are divided into different groups according to communication distance, and agents within each other's communication range communicate; in the communication topology graph, all agents are full duplex, i.e. every agent both transmits and receives information. Then an inter-group communication topology is established: the high-level agent of each group is found with a degree-centrality method based on the number of neighbor nodes, and the high-level agents can communicate with the high-level agents of other groups to exchange inter-group information. Intra-group information aggregation and inter-group information exchange then take place. A hierarchical graph convolutional network is built with a multi-head attention mechanism, interactions between agents are computed with multi-head dot-product attention as the convolution kernel, and each agent is treated as an entity. For each agent there is a set of agents in its local area (a dynamically changing number of neighbor agents), and for each attention head the value representations of all input features are weighted by the learned relations and summed. Then the outputs of the agent's M attention heads are concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolution layer.
Finally, action decisions are made: at each step, each agent observes its state and takes an action according to its policy; using a residual network structure, the hidden-layer features obtained at each layer are cascaded and fed into the policy network, which is a Q network with experience replay. By establishing a reasonable and efficient hierarchical communication structure between agents, HMACC breaks away from centralized data acquisition, aggregation and utilization, and alleviates the curse of dimensionality in centralized training.
Step 1: establishing intra-group communication topology
The agents are divided into different groups according to communication distance. The group division is denoted G_i^t = { v_j | d(v_i, v_j) ≤ rc }, where G_i^t is the communication group in which the i-th agent v_i is located at time t and G_j^t is the communication group of the j-th agent at time t. To fit practical environments, agents within each other's communication range can communicate with each other, and in the communication topology graph the invention assumes by default that all agents are full duplex, i.e. every agent both transmits and receives information. v_i is the i-th agent and rc is the communication distance. The intra-group adjacency matrix of the communication topology graph can then be obtained, denoted C = (e_ij)_{n×n}.
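As a minimal sketch of this step (assuming Euclidean agent positions; the function names `build_adjacency` and `groups_from_adjacency` are illustrative, not from the patent), the adjacency matrix C = (e_ij) and the distance-based groups can be computed as:

```python
import math

def build_adjacency(positions, rc):
    """Intra-group adjacency matrix C = (e_ij): e_ij = 1 when agents i and j
    are within communication distance rc of each other, else 0."""
    n = len(positions)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(positions[i], positions[j]) <= rc:
                C[i][j] = 1  # full duplex: the link is symmetric by construction
    return C

def groups_from_adjacency(C):
    """Communication groups taken here as connected components of C
    (one plausible reading of distance-based group division)."""
    n = len(C)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first search over communication links
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(j for j in range(n) if C[v][j] and j not in comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups
```

For example, three agents at (0,0), (1,0) and (10,0) with rc = 2 yield two groups: {0, 1} and {2}.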
Step 2: establishing inter-group communication topology
On this basis, a degree-centrality method based on the number of neighbor nodes is used to find the high-level agent of each group. The high-level agents can communicate with the high-level agents of other groups and exchange inter-group information. The high-level agent is denoted Degree_max = argmax_{v_i} Degree(v_i), and the degree-centrality measure is Degree(v_i) = Edge_i / All_i, where Degree_max selects the node of maximum degree centrality in the group, Degree(v_i) is the computed degree centrality of a node, Edge_i denotes the number of existing edges connected to the node, and All_i denotes the number of possible edges connecting node v_i to the other nodes.
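A hedged sketch of this selection step. Degree centrality is taken here as Edge_i / All_i with All_i read as the number of possible links to the other group members — an assumption, since the patent's original formula image is not reproduced — and the group's high-level agent is the node maximizing it:

```python
def degree_centrality(C, group):
    """Degree(v_i) = Edge_i / All_i within one group: the fraction of a node's
    possible intra-group links that actually exist in adjacency matrix C."""
    k = len(group) - 1  # All_i: possible edges to the other group members
    if k == 0:
        return {group[0]: 0.0}  # singleton group has no links
    return {i: sum(C[i][j] for j in group if j != i) / k for i in group}

def high_level_agent(C, group):
    """Degree_max: the node with maximum degree centrality in the group."""
    cent = degree_centrality(C, group)
    return max(cent, key=cent.get)
```

With C = [[0,1,1],[1,0,0],[1,0,0]] and group [0, 1, 2], node 0 has centrality 1.0 versus 0.5 for the others and is selected as the high-level agent.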
Step 3: intra-group information aggregation and inter-group information exchange
A hierarchical graph convolutional neural network is built with a multi-head attention mechanism, computing the interactions between agents with multi-head dot-product attention as the convolution kernel and taking each agent as an entity. For each agent v_i there is a set of agents N_i in its local area (a dynamically changing number of neighbor agents). For attention head m, the relation between v_i and v_j ∈ N_i is calculated as

α_ij^m = exp( β (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β (W_Q^m h_i)^T (W_K^m h_e) )

where β is a scale factor. For each attention head, the value representations of all input features are weighted by these relations and summed. Then, the outputs of agent v_i's M attention heads are connected in series and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolution layer

h'_i = σ( concat_{m=1}^{M} Σ_{j∈N_i} α_ij^m W_V^m h_j ).
This approach better accounts for the correlation between agents and helps balance intra-group and inter-group information. Effective aggregation and transfer of agent information within a group directly affects the agents' strategies, while communication between groups helps a group jump out of local optima.
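The relation and convolution formulas above can be sketched as one layer, assuming standard scaled dot-product attention with a softmax over the local set N_i; the parameter names (`WQ`, `WK`, `WV`, `W_out`) are illustrative placeholders, not the patent's implementation:

```python
import numpy as np

def graph_conv_layer(H, neighbors, WQ, WK, WV, beta, W_out):
    """One multi-head dot-product-attention graph convolution layer (sketch).
    H: (n, d) agent feature matrix; neighbors[i]: indices of N_i;
    WQ/WK/WV: per-head (d, dk) query/key/value weight matrices;
    beta: scale factor; W_out: (M*dk, d_out) single-layer MLP weight."""
    n, M = H.shape[0], len(WQ)
    heads = []
    for m in range(M):
        Q, K, V = H @ WQ[m], H @ WK[m], H @ WV[m]
        out = np.zeros((n, V.shape[1]))
        for i in range(n):
            Ni = neighbors[i]
            logits = beta * (K[Ni] @ Q[i])      # relation of v_i to each v_j in N_i
            a = np.exp(logits - logits.max())
            a /= a.sum()                        # softmax over the local set N_i
            out[i] = a @ V[Ni]                  # weighted sum of value representations
        heads.append(out)
    # concatenate the M heads, then single-layer MLP with ReLU nonlinearity
    return np.maximum(0.0, np.concatenate(heads, axis=1) @ W_out)
```

The per-agent softmax over a variable-length neighbor list reflects the dynamically changing receptive field; stacking such layers widens it further.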
Step 4: action decision
At each step, each agent observes its state and takes an action according to its policy. For scalability, the invention designs a Q network with experience replay to reduce the number of interactions with the environment and make efficient use of diverse training data. The features obtained from each layer are first cascaded, denoted h''_i = concat_l( h_i^l ), where l indexes the layers of the graph convolutional network, so that the agent's action can be expressed as a_i = θ(h''_i), θ being our model. The loss function is L(θ) = (1/S) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ max_{a'} Q(O'_i, a'_i; θ'), where γ is a discount factor: returns nearer in time are discounted less and thus influence the decision result more, which makes the learning process more stable. S denotes a batch of samples drawn from historical experience, N denotes a total of N receptive fields, O_i denotes the set of observations of all agents in the i-th receptive field, and a_i denotes the actions of the agents in the i-th receptive field. The model, the hierarchical multi-agent cooperative communication model proposed by the invention, is parameterized by θ. Based on the Q-learning algorithm, the target network θ' is updated by θ' = βθ + (1−β)θ', where the hyperparameter β regulates the relative influence of the network and the target network.
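The TD target, loss and soft target-network update of this step can be sketched with plain-Python helpers; the function names are illustrative, and the actual Q function here is the graph-convolutional model described above:

```python
def td_targets(rewards, next_q_values, gamma):
    """y_i = r_i + gamma * max_a' Q(O'_i, a'_i; theta')."""
    return [r + gamma * max(q) for r, q in zip(rewards, next_q_values)]

def mse_loss(targets, q_taken):
    """Mean squared TD error over a sampled batch: (1/S) sum (y_i - Q_i)^2."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_taken)) / len(targets)

def soft_update(theta, theta_target, beta):
    """theta' <- beta*theta + (1 - beta)*theta', element-wise over parameters;
    beta regulates the influence of the network on the target network."""
    return [beta * p + (1.0 - beta) * pt for p, pt in zip(theta, theta_target)]
```

For instance, with r = 1.0, next-state Q values [0.5, 2.0] and γ = 0.9, the target is 1.0 + 0.9 × 2.0 = 2.8.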
Claims (1)
1. A multi-agent cooperation method based on a hierarchical communication mechanism, characterized by comprising the following steps:
step 1: establishing an intra-group communication topological structure;
dividing the agents into different groups according to communication distance:

G_i^t = { v_j | d(v_i, v_j) ≤ rc }

wherein G_i^t is the communication group in which the i-th agent is located at time t, G_j^t is the communication group in which the j-th agent is located at time t, and the groups are formed by clustering over communication range so that agents within each other's communication range can communicate with each other; v_i is the i-th agent, v_j is the j-th agent, and rc is the communication distance;

obtaining the intra-group adjacency matrix of the communication topology graph, denoted C = (e_ij)_{n×n}; e_ij indicates the edge relationship between agents: 1 if they communicate with each other, 0 otherwise; n denotes the number of agents;
step 2: establishing an inter-group communication topological structure;
searching for the high-level agent of each group with a degree-centrality method based on the number of neighbor nodes; the high-level agents can communicate with the high-level agents of other groups and exchange inter-group information;

the high-level agent is Degree_max = argmax_{v_i} Degree(v_i);

the degree-centrality measure is Degree(v_i) = Edge_i / All_i;

wherein Degree_max selects the node of maximum degree centrality in the group, Degree(v_i) is the computed degree centrality of a node, Edge_i denotes the number of existing edges connected to the node, and All_i denotes the number of possible edges connecting node v_i to the other nodes;
step 3: intra-group information aggregation and inter-group information exchange;
constructing a hierarchical graph convolutional neural network by using a multi-head attention mechanism, and calculating interaction among intelligent agents by using multi-head dot product attention as convolution kernel; taking each agent as an entity, and in the agent v i Set of agents N in local area in communication therewith i For the mth attention header, agent v i and vj ∈N i The relationship between them is calculated as:
where β is a scale factor, W<sub>Q</sub><sup>m</sup> is the query weight matrix applied to h<sub>i</sub>, h<sub>i</sub> is the feature vector of agent v<sub>i</sub>, W<sub>K</sub><sup>m</sup> is the key weight matrix applied to h<sub>j</sub>, h<sub>j</sub> is the feature vector of agent v<sub>j</sub>, and h<sub>e</sub> is the feature vector of agent v<sub>e</sub>, the e-th agent in the set N<sub>i</sub> of agents in the local region communicating with agent v<sub>i</sub>;
for each attention head, the value representations of all input features are weighted and summed using the relation weights; then, for agent v<sub>i</sub>, the outputs of all M attention heads are concatenated and fed into a single-layer MLP with ReLU nonlinearity, giving the output of the convolutional layer:

h′<sub>i</sub> = σ( concat<sub>m=1..M</sub> [ Σ<sub>j∈N<sub>i</sub></sub> α<sub>ij</sub><sup>m</sup> W<sub>V</sub><sup>m</sup> h<sub>j</sub> ] )
wherein W<sub>V</sub><sup>m</sup> is the value weight matrix applied to h<sub>j</sub>, M denotes the number of attention heads, and σ denotes the single-layer MLP with the ReLU activation function;
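A minimal numpy sketch of one such convolution step for a single agent, assuming per-head weight tensors W_Q, W_K, W_V of shape (M, d_k, d) and an output matrix of shape (d_out, M·d_k); all names and shapes are illustrative, not from the patent:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mh_attention_conv(h, N_i, i, WQ, WK, WV, W_out, beta=0.25):
    """Sketch of one graph-convolution step for agent v_i: per head m,
    scaled dot-product attention over the local set N_i, a weighted sum
    of value projections, heads concatenated, then single-layer MLP+ReLU."""
    M = WQ.shape[0]
    head_outs = []
    for m in range(M):
        q = WQ[m] @ h[i]                             # query for agent v_i
        logits = np.array([beta * q @ (WK[m] @ h[j]) for j in N_i])
        alpha = np.exp(logits - logits.max())
        alpha /= alpha.sum()                         # softmax over N_i
        head_outs.append(sum(a * (WV[m] @ h[j]) for a, j in zip(alpha, N_i)))
    return relu(W_out @ np.concatenate(head_outs))   # sigma(concat of heads)
```

The ReLU at the end guarantees a non-negative output vector of dimension d_out, which becomes the agent's feature for the next layer.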
step 4: action decision-making;
cascading the features obtained from each layer of the graph convolutional neural network, denoted h<sub>i</sub> = concat( h<sub>i</sub><sup>1</sup>, …, h<sub>i</sub><sup>L</sup> ), wherein l = 1, …, L indexes each layer of the graph convolutional neural network and h<sub>i</sub><sup>l</sup> denotes the feature of agent v<sub>i</sub> in the l-th layer of the network;
the loss function is L(θ) = E<sub>B∼D</sub> [ (1/N) Σ<sub>i=1..N</sub> ( y<sub>i</sub> − Q(O<sub>i</sub>, a<sub>i</sub>; θ) )² ], with the target y<sub>i</sub> = r<sub>i</sub> + γ max<sub>a′<sub>i</sub></sub> Q(O′<sub>i</sub>, a′<sub>i</sub>; θ′); wherein γ is the discount factor; B denotes a sample batch extracted from the historical experience D; N denotes the total number of receiving domains; O<sub>i</sub> denotes the set of observations of all agents in the i-th receiving domain, and a<sub>i</sub> the actions of the agents in the i-th receiving domain; the model is parameterized by θ, the parameters of the hierarchical multi-agent cooperative communication model;
based on the Q-learning algorithm, the target network θ′ is softly updated by θ′ = βθ + (1 − β)θ′, where the hyperparameter β regulates the relative influence of the online network and the target network.
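The target computation and soft update of step 4 can be sketched as follows (parameters are represented as a list of arrays; names and the default β are illustrative):

```python
import numpy as np

def td_targets(r, q_next_max, gamma=0.99):
    """y_i = r_i + gamma * max_a' Q(O'_i, a'_i; theta'), with the max
    over next actions evaluated by the target network."""
    return r + gamma * q_next_max

def soft_update(theta, theta_target, beta=0.01):
    """theta' <- beta*theta + (1-beta)*theta', applied per parameter;
    beta trades off the online network against the target network."""
    return [beta * w + (1.0 - beta) * wt for w, wt in zip(theta, theta_target)]
```

With β close to 0 the target network changes slowly, stabilizing the bootstrapped targets; β = 1 would copy the online network outright.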
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310285877.9A CN116582442A (en) | 2023-03-22 | 2023-03-22 | Multi-agent cooperation method based on hierarchical communication mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116582442A true CN116582442A (en) | 2023-08-11 |
Family
ID=87532901
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116582442A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN116992400A (en) * | 2023-09-27 | 2023-11-03 | 之江实验室 | Collaborative sensing method and collaborative sensing device based on space-time feature fusion
CN116992400B (en) * | 2023-09-27 | 2024-01-05 | 之江实验室 | Collaborative sensing method and collaborative sensing device based on space-time feature fusion
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||