CN116582442A - Multi-agent cooperation method based on hierarchical communication mechanism - Google Patents

Multi-agent cooperation method based on hierarchical communication mechanism Download PDF

Info

Publication number
CN116582442A
CN116582442A (Application CN202310285877.9A)
Authority
CN
China
Prior art keywords
agent
communication
agents
group
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310285877.9A
Other languages
Chinese (zh)
Inventor
郭斌
任浩阳
於志文
刘佳琪
孙卓
张江山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202310285877.9A priority Critical patent/CN116582442A/en
Publication of CN116582442A publication Critical patent/CN116582442A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12Discovery or management of network topologies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a multi-agent cooperation method based on a hierarchical communication mechanism. The method first establishes an intra-group communication topology, dividing agents into groups according to communication distance; it then establishes an inter-group communication topology, finding high-level agents within each group by a degree-centrality method based on the number of neighbor nodes. Intra-group information aggregation and inter-group information exchange are then performed: a hierarchical graph convolutional network is constructed with a multi-head attention mechanism, and for each attention head the value representations of all input features are weighted by the learned relations and summed. The outputs of an agent's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolutional layer. Finally, each agent makes an action decision and acts according to its policy. By establishing a reasonable and efficient hierarchical communication structure among agents, the invention moves beyond centralized data acquisition, aggregation and utilization, and alleviates the curse of dimensionality in centralized training.

Description

Multi-agent cooperation method based on hierarchical communication mechanism
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a multi-agent cooperative method.
Background
As unmanned systems grow in scale, cooperative multi-agent systems have begun to play a major role, completing complex and difficult tasks in high-dimensional dynamic environments. They are widely applied to automatic driving in intelligent transportation, collaborative combat in future warfare, autonomous delivery in intelligent logistics, and communication search and rescue after geological disasters. In recent years, multi-agent deep reinforcement learning (Multi-agent Deep Reinforcement Learning, MADRL) has combined the perception capability of deep learning with the decision capability of reinforcement learning and applied them to multi-agent systems, effectively completing various complex tasks in multi-agent environments, such as competitive, cooperative or mixed tasks. Multi-agent communication establishes a communication structure and a communication policy among agents so that they can send and receive abstract information, exchange their states and coordinate their policies; by exchanging observations, intentions or experience during operation, agents can negotiate and adjust behavioral decisions, improving overall learning performance and achieving their respective learning objectives. However, given the large number of agents and the tendency of redundant information to interfere with the system, a hierarchical multi-agent communication mechanism with efficient communication capability is of great importance.
At the method level, existing algorithms based on gating mechanisms, attention mechanisms and graph networks have explored multi-agent communication techniques for selecting communication partners, deciding when to communicate, determining the content of communication, updating information, and improving communication efficiency. Algorithms using gating mechanisms learn when to communicate and how to communicate efficiently. CommNet is a classical communication network that averages all agent states at the previous step and aggregates them with each agent's current state via an LSTM to predict actions. The gating mechanism lets each agent receive vectors from other agents as part of its input; these vectors carry communication content that is learned jointly with gradient descent, and the information passed between agents facilitates task completion. Algorithms using the attention mechanism screen and decide with whom to communicate and how to effectively aggregate hidden-state information. VAIN introduces an attention mechanism that computes an attention vector for each agent's outgoing communication and, on that basis, represents the relation between pairs of agents. Graph-network-based models focus on modeling relations between agents, using graph convolutional layers to learn cooperation from latent features generated over progressively growing receptive fields.
DGN builds on graph networks by using multi-head dot-product attention as a convolution kernel to compute interactions between agents; stacking convolutional layers extracts higher-order relation representations, effectively capturing inter-agent interactions, and residual connections are introduced to stabilize cooperative decision-making and convergence.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a multi-agent cooperation method based on a hierarchical communication mechanism. The method first establishes an intra-group communication topology, dividing agents into groups according to communication distance; it then establishes an inter-group communication topology, finding high-level agents within each group by a degree-centrality method based on the number of neighbor nodes. Intra-group information aggregation and inter-group information exchange are then performed: a hierarchical graph convolutional network is constructed with a multi-head attention mechanism, and for each attention head the value representations of all input features are weighted by the learned relations and summed. The outputs of an agent's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolutional layer. Finally, each agent makes an action decision and acts according to its policy. By establishing a reasonable and efficient hierarchical communication structure among agents, the invention moves beyond centralized data acquisition, aggregation and utilization, and alleviates the curse of dimensionality in centralized training.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
step 1: establishing an intra-group communication topological structure;
The agents are divided into different groups according to communication distance. Let g_i^t denote the communication group containing the i-th agent formed at time t, and g_j^t the communication group containing the j-th agent formed at time t; groups are formed by the communication-range cluster-division rule g_i^t = { v_j | d(v_i, v_j) ≤ rc }, so that agents within each other's communication range can communicate with each other. Here v_i denotes the i-th agent, v_j the j-th agent, and rc the communication distance;
An intra-group adjacency matrix of the communication topology graph is then obtained, denoted C = (e_ij)_{n×n}, where e_ij is the edge relation between two agents, equal to 1 if they communicate with each other and 0 otherwise, and n is the number of agents;
step 2: establishing an inter-group communication topological structure;
High-level agents within each group are found by a degree-centrality method based on the number of neighbor nodes; a high-level agent can communicate with the high-level agents of other groups to exchange inter-group information;
The high-level agent is denoted v_high = Degree_max( Degree(v_i) );
The degree-centrality measure is denoted Degree(v_i) = edge_i / all_i;
where Degree_max selects the node of maximum degree centrality within the group, Degree(v_i) computes the centrality of a node, edge_i is the number of existing edges connected to the node, and all_i is the number of possible edges between node v_i and the other nodes;
step 3: intra-group information aggregation and inter-group information exchange;
A hierarchical graph convolutional neural network is constructed with a multi-head attention mechanism, using multi-head dot-product attention as the convolution kernel to compute interactions among agents. Each agent is treated as an entity, and N_i denotes the set of agents in the local region that communicate with agent v_i. For the m-th attention head, the relation between agent v_i and agent v_j ∈ N_i is computed as:
α_ij^m = exp( β · (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β · (W_Q^m h_i)^T (W_K^m h_e) )
where β is a scale factor, W_Q^m is the query weight matrix of h_i, h_i is the feature vector of agent v_i, W_K^m is the key weight matrix of h_j, h_j is the feature vector of agent v_j, and h_e is the feature vector of agent v_e, the e-th agent in the set N_i of agents communicating with v_i in the local region;
For each attention head, the value representations of all input features are weighted by these relations and summed; the outputs of agent v_i's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity, giving the output of the convolutional layer:
h′_i = σ( concat_{m=1}^{M} ( Σ_{j∈N_i} α_ij^m W_V^m h_j ) )
where W_V^m is the value weight matrix of h_j, M is the number of attention heads, and σ is the MLP perceptron with ReLU activation;
step 4: action decision-making;
The features obtained from each layer of the graph convolutional neural network are cascaded, denoted h″_i = concat_l( h_i^l ), where l indexes the layers of the graph convolutional network and h_i^l is the feature of agent v_i in layer l;
The loss function is L(θ) = (1/S) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ max_{a′} Q(O′_i, a′_i; θ′), where γ is the discount factor, S is the sample batch drawn from the historical experience replay, N is the total number of receptive fields, O_i is the set of observations of all agents in the i-th receptive field, and a_i denotes the actions of the agents in the i-th receptive field; the model, a hierarchical multi-agent cooperative communication model, is parameterized by θ;
Based on the Q-learning algorithm, the target network parameters θ′ are updated as θ′ ← βθ + (1−β)θ′, where the hyperparameter β regulates the relative influence of the online network and the target network.
The beneficial effects of the invention are as follows:
the invention adopts a flexible and easily understood hierarchical communication topological structure, can divide the hierarchical structure among the agents to distinguish the states of different groups among the agents and the influence of different groups on the agents, uses the graph convolution based on the attention mechanism to carry out information aggregation, and helps the agents expand the receptive field through graph convolution communication, thereby improving the strategy quality. And meanwhile, the neural network replayed by experience is used for reducing the interaction times with the environment, and the training data with high efficiency and diversity are fully utilized. By establishing a reasonable and efficient hierarchical communication structure between the intelligent agents, centralized data acquisition, aggregation and utilization are broken, and the problem of dimension disasters of centralized training is relieved. Communication efficiency and performance of the multi-agent system in cooperation, coordination or cooperative activities in the global scope are improved, and influences of individuals and group agents are balanced better.
Drawings
FIG. 1 is a diagram of the overall framework of the hierarchical multi-agent cooperative communication mechanism of the present invention.
FIG. 2 is a diagram of a multi-head attention mechanism according to the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
The invention provides a multi-agent cooperation method based on a hierarchical communication mechanism, relying on the following principle: a hierarchy is a more elaborate communication structure evolved from a flat topology; it can rank nodes and order the communication structure, allowing communication patterns and policies closer to reality to be learned. To address efficient communication in multi-agent deep reinforcement learning scenarios, a hierarchical multi-agent collaborative communication model (Hierarchical Multi-Agent Collaborative Communication Mechanism, HMACC) is proposed. The agents are first partitioned into a hierarchy to distinguish the states of different groups and their influence on individual agents; intra-group and inter-group information are then distinguished, helping agents expand their receptive fields and improving policy quality. The invention decomposes the communication process into a local interaction process and a global interaction process, models and learns each with an appropriate policy, and improves the communication efficiency and performance of the multi-agent system in global cooperation, coordination or collaborative activities.
A multi-agent cooperation method based on a hierarchical communication mechanism comprises the following steps:
step 1: establishing an intra-group communication topological structure;
The agents are divided into different groups according to communication distance. Let g_i^t denote the communication group containing the i-th agent formed at time t, and g_j^t the communication group containing the j-th agent formed at time t; groups are formed by the communication-range cluster-division rule g_i^t = { v_j | d(v_i, v_j) ≤ rc }, so that agents within each other's communication range can communicate with each other. Here v_i denotes the i-th agent, v_j the j-th agent, and rc the communication distance;
An intra-group adjacency matrix of the communication topology graph is then obtained, denoted C = (e_ij)_{n×n}, where e_ij is the edge relation between two agents, equal to 1 if they communicate with each other and 0 otherwise, and n is the number of agents;
step 2: establishing an inter-group communication topological structure;
High-level agents within each group are found by a degree-centrality method based on the number of neighbor nodes; a high-level agent can communicate with the high-level agents of other groups to exchange inter-group information;
The high-level agent is denoted v_high = Degree_max( Degree(v_i) );
The degree-centrality measure is denoted Degree(v_i) = edge_i / all_i;
where Degree_max selects the node of maximum degree centrality within the group, Degree(v_i) computes the centrality of a node, edge_i is the number of existing edges connected to the node, and all_i is the number of possible edges between node v_i and the other nodes;
step 3: intra-group information aggregation and inter-group information exchange;
A hierarchical graph convolutional neural network is constructed with a multi-head attention mechanism, using multi-head dot-product attention as the convolution kernel to compute interactions among agents. Each agent is treated as an entity, and N_i denotes the set of agents in the local region that communicate with agent v_i. For the m-th attention head, the relation between agent v_i and agent v_j ∈ N_i is computed as:
α_ij^m = exp( β · (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β · (W_Q^m h_i)^T (W_K^m h_e) )
where β is a scale factor, W_Q^m is the query weight matrix of h_i, h_i is the feature vector of agent v_i, W_K^m is the key weight matrix of h_j, h_j is the feature vector of agent v_j, and h_e is the feature vector of agent v_e, the e-th agent in the set N_i of agents communicating with v_i in the local region;
For each attention head, the value representations of all input features are weighted by these relations and summed; the outputs of agent v_i's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity, giving the output of the convolutional layer:
h′_i = σ( concat_{m=1}^{M} ( Σ_{j∈N_i} α_ij^m W_V^m h_j ) )
where W_V^m is the value weight matrix of h_j, M is the number of attention heads, and σ is the MLP perceptron with ReLU activation;
step 4: action decision-making;
The features obtained from each layer of the graph convolutional neural network are cascaded, denoted h″_i = concat_l( h_i^l ), where l indexes the layers of the graph convolutional network and h_i^l is the feature of agent v_i in layer l;
The loss function is L(θ) = (1/S) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ max_{a′} Q(O′_i, a′_i; θ′), where γ is the discount factor: returns closer in time are discounted less and therefore weigh more in the decision, which makes the learning process more stable. S is the sample batch drawn from the historical experience replay, N is the total number of receptive fields, O_i is the set of observations of all agents in the i-th receptive field, and a_i denotes the actions of the agents in the i-th receptive field; the model, the hierarchical multi-agent cooperative communication model proposed by the invention, is parameterized by θ;
Based on the Q-learning algorithm, the target network parameters θ′ are updated as θ′ ← βθ + (1−β)θ′, where the hyperparameter β regulates the relative influence of the online network and the target network.
The invention discloses a multi-agent cooperation method based on a hierarchical communication mechanism, in the field of multi-agent cooperation via multi-agent deep reinforcement learning. The procedure is as follows. First, an intra-group communication topology is established: agents are divided into groups according to communication distance, agents within each other's communication range can communicate, and in the communication topology graph all agents are full duplex, i.e., every agent both transmits and receives information. Next, an inter-group communication topology is established: high-level agents are found within each group by a degree-centrality method based on the number of neighbor nodes; these high-level agents can communicate with the high-level agents of other groups and exchange inter-group information. Intra-group information aggregation and inter-group information exchange then take place. A hierarchical graph convolutional network is built with a multi-head attention mechanism, using multi-head dot-product attention as the convolution kernel to compute interactions among agents, with each agent treated as an entity. Each agent has a set of agents in its local region (a dynamically changing set of neighbor agents), and for each attention head the value representations of all input features are weighted by the learned relations and summed. The outputs of an agent's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolutional layer.
Finally, action decisions are made: at each step, every agent observes its state and takes an action according to its policy; a residual network structure cascades the hidden-layer features obtained at each layer and feeds them into the policy network, which is a Q network with experience replay. By establishing a reasonable and efficient hierarchical communication structure among agents, HMACC moves beyond centralized data acquisition, aggregation and utilization, alleviating the curse of dimensionality in centralized training.
Step 1: establishing intra-group communication topology
The agents are divided into different groups according to communication distance. The group division is denoted g_i^t, the communication group containing the i-th agent formed at time t, and g_j^t, the communication group containing the j-th agent formed at time t; groups are formed by the communication-range cluster-division rule g_i^t = { v_j | d(v_i, v_j) ≤ rc }. To match practical environments, agents within each other's communication range can communicate, and in the communication topology graph the invention assumes by default that all agents are full duplex, i.e., every agent both transmits and receives information. Here v_i is the i-th agent and rc is the communication distance. The intra-group adjacency matrix of the communication topology graph can then be obtained, denoted C = (e_ij)_{n×n}.
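As an illustrative sketch of Step 1 (not part of the claimed method), the distance rule and adjacency matrix above can be computed as follows; the function name `build_intra_group_topology` and the use of 2-D coordinate arrays are assumptions for illustration:

```python
import numpy as np

def build_intra_group_topology(positions, rc):
    """Build the intra-group adjacency matrix C = (e_ij), n x n.

    e_ij = 1 when agents v_i and v_j lie within communication
    distance rc of each other (full-duplex links), else 0.
    positions: (n, k) array of agent coordinates (an assumed input
    representation; the patent only specifies the distance rule).
    """
    n = len(positions)
    C = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j and np.linalg.norm(positions[i] - positions[j]) <= rc:
                C[i, j] = 1  # full duplex, so C is symmetric
    return C

positions = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
C = build_intra_group_topology(positions, rc=2.0)
# agents 0 and 1 are mutually connected; agent 2 forms its own group
```

Groups then fall out as the connected components of C, matching the cluster-division rule g_i^t.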
Step 2: establishing inter-group communication topology
On this basis, a degree-centrality method based on the number of neighbor nodes is used to find the high-level agents within each group. The high-level agents can communicate with the high-level agents of other groups, exchanging inter-group information. The high-level agent is denoted v_high = Degree_max( Degree(v_i) ), with the degree-centrality measure Degree(v_i) = edge_i / all_i, where Degree_max selects the node of maximum degree centrality within the group, Degree(v_i) computes the centrality of a node, edge_i is the number of existing edges connected to the node, and all_i is the number of possible edges between node v_i and the other nodes.
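A minimal sketch of the Step 2 selection, assuming the normalization all_i = |group| − 1 (the number of possible edges to the other members); the function name and tie-breaking are illustrative, not taken from the patent:

```python
import numpy as np

def select_high_level_agents(C, groups):
    """Pick one high-level agent per group by degree centrality.

    C: intra-group adjacency matrix (e_ij); groups: lists of agent
    indices. Degree(v_i) = edge_i / all_i, where edge_i counts the
    existing edges of v_i inside its group and all_i the possible
    edges to the other group members (assumed normalization).
    """
    leaders = []
    for group in groups:
        best, best_deg = group[0], -1.0
        for i in group:
            edge_i = sum(C[i, j] for j in group if j != i)
            all_i = max(len(group) - 1, 1)
            deg = edge_i / all_i
            if deg > best_deg:  # Degree_max: keep the most central node
                best, best_deg = i, deg
        leaders.append(best)
    return leaders

C = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
leaders = select_high_level_agents(C, [[0, 1, 2]])  # node 0 has the most edges
```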
Step 3: intra-group information aggregation and inter-group information exchange
A hierarchical graph convolutional neural network is built with a multi-head attention mechanism, using multi-head dot-product attention as the convolution kernel to compute interactions among agents, with each agent treated as an entity. Each agent v_i has, in its local region, a set of agents N_i (a dynamically changing set of neighbor agents). For attention head m, the relation between v_i and v_j ∈ N_i is computed as
α_ij^m = exp( β · (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β · (W_Q^m h_i)^T (W_K^m h_e) )
where β is a scale factor. For each attention head, the value representations of all input features are weighted by these relations and summed. The outputs of agent v_i's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolutional layer
h′_i = σ( concat_{m=1}^{M} ( Σ_{j∈N_i} α_ij^m W_V^m h_j ) )
This approach better accounts for the correlation between agents and helps balance intra-group and inter-group information. Effective aggregation and transfer of agent information within a group directly affects the agents' policies, while communication between groups helps a group escape local optima.
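The Step 3 convolution can be sketched as below. This is a simplified stand-in, assuming per-head square weight matrices and using a plain ReLU in place of the single-layer MLP; none of the names are from the patent:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_conv_layer(H, neighbors, WQ, WK, WV, beta=0.1):
    """One multi-head dot-product attention convolution over agent features.

    H: (n, d) feature vectors h_i; neighbors[i]: indices of N_i, the
    agents communicating with v_i; WQ, WK, WV: (M, d, d) per-head
    query/key/value weight tensors; beta: the scale factor.
    """
    M, d, _ = WQ.shape
    n = H.shape[0]
    out = np.zeros((n, M * d))
    for i in range(n):
        heads = []
        for m in range(M):
            q = WQ[m] @ H[i]
            # relation weights alpha_ij^m via scaled dot-product attention
            scores = np.array([beta * q @ (WK[m] @ H[j]) for j in neighbors[i]])
            alpha = softmax(scores)
            # weighted sum of value representations over N_i
            agg = sum(a * (WV[m] @ H[j]) for a, j in zip(alpha, neighbors[i]))
            heads.append(agg)
        # concatenate the M heads; ReLU stands in for the single-layer MLP
        out[i] = np.maximum(np.concatenate(heads), 0.0)
    return out

rng = np.random.default_rng(0)
n, d, M = 3, 4, 2
H = rng.normal(size=(n, d))
neighbors = [[0, 1], [0, 1, 2], [1, 2]]  # N_i includes v_i itself here
WQ, WK, WV = (rng.normal(size=(M, d, d)) for _ in range(3))
out = graph_conv_layer(H, neighbors, WQ, WK, WV)  # shape (n, M * d)
```

Stacking two such layers gives each agent a two-hop receptive field, which is how the graph convolution enlarges the receptive field mentioned above.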
Step 4: action decision
At each step, every agent observes its state and takes an action according to its policy. For scalability, the invention designs a Q network with experience replay to reduce the number of interactions with the environment and to make efficient use of diverse training data. The features obtained from each layer are first cascaded, denoted h″_i = concat_l( h_i^l ), where l indexes the layers of the graph convolutional network, so that an agent's action can be expressed as a_i = θ(h″_i), θ being our model. The loss function is L(θ) = (1/S) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ max_{a′} Q(O′_i, a′_i; θ′), where γ is the discount factor: returns closer in time are discounted less and therefore weigh more in the decision, which makes the learning process more stable. S is the sample batch drawn from the historical experience replay, N is the total number of receptive fields, O_i is the set of observations of all agents in the i-th receptive field, and a_i denotes the actions of the agents in the i-th receptive field; the model, the hierarchical multi-agent cooperative communication model proposed by the invention, is parameterized by θ. Based on the Q-learning algorithm, the target network parameters θ′ are updated as θ′ ← βθ + (1−β)θ′, where the hyperparameter β regulates the relative influence of the online network and the target network.
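The TD target and the soft target-network update of Step 4 can be sketched as follows; representing parameters as a dict of arrays is an assumption for illustration:

```python
import numpy as np

def soft_update(theta, theta_target, beta):
    """theta' <- beta * theta + (1 - beta) * theta', per parameter."""
    return {k: beta * theta[k] + (1.0 - beta) * theta_target[k]
            for k in theta}

def td_targets(rewards, q_next_max, gamma):
    """y_i = r_i + gamma * max_a' Q(O'_i, a'_i; theta')."""
    return rewards + gamma * q_next_max

theta = {"w": np.array([1.0, 2.0])}          # online network parameters
theta_target = {"w": np.array([0.0, 0.0])}   # target network parameters
theta_target = soft_update(theta, theta_target, beta=0.1)

# q_next_max holds max_a' Q(O'_i, a'_i; theta') for each sampled transition
y = td_targets(np.array([1.0, 0.0]), np.array([2.0, 3.0]), gamma=0.9)
```

A small β moves the target network slowly toward the online network, which is what stabilizes the bootstrapped targets y_i.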

Claims (1)

1. A multi-agent cooperation method based on a hierarchical communication mechanism, characterized by comprising the following steps:
step 1: establishing an intra-group communication topological structure;
The agents are divided into different groups according to communication distance. Let g_i^t denote the communication group containing the i-th agent formed at time t, and g_j^t the communication group containing the j-th agent formed at time t; groups are formed by the communication-range cluster-division rule g_i^t = { v_j | d(v_i, v_j) ≤ rc }, so that agents within each other's communication range can communicate with each other. Here v_i denotes the i-th agent, v_j the j-th agent, and rc the communication distance;
An intra-group adjacency matrix of the communication topology graph is then obtained, denoted C = (e_ij)_{n×n}, where e_ij is the edge relation between two agents, equal to 1 if they communicate with each other and 0 otherwise, and n is the number of agents;
step 2: establishing an inter-group communication topological structure;
High-level agents within each group are found by a degree-centrality method based on the number of neighbor nodes; a high-level agent can communicate with the high-level agents of other groups to exchange inter-group information;
The high-level agent is denoted v_high = Degree_max( Degree(v_i) );
The degree-centrality measure is denoted Degree(v_i) = edge_i / all_i;
where Degree_max selects the node of maximum degree centrality within the group, Degree(v_i) computes the centrality of a node, edge_i is the number of existing edges connected to the node, and all_i is the number of possible edges between node v_i and the other nodes;
step 3: intra-group information aggregation and inter-group information exchange;
A hierarchical graph convolutional neural network is constructed with a multi-head attention mechanism, using multi-head dot-product attention as the convolution kernel to compute interactions among agents. Each agent is treated as an entity, and N_i denotes the set of agents in the local region that communicate with agent v_i. For the m-th attention head, the relation between agent v_i and agent v_j ∈ N_i is computed as:
α_ij^m = exp( β · (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β · (W_Q^m h_i)^T (W_K^m h_e) )
where β is a scale factor, W_Q^m is the query weight matrix of h_i, h_i is the feature vector of agent v_i, W_K^m is the key weight matrix of h_j, h_j is the feature vector of agent v_j, and h_e is the feature vector of agent v_e, the e-th agent in the set N_i of agents communicating with v_i in the local region;
For each attention head, the value representations of all input features are weighted by these relations and summed; the outputs of agent v_i's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity, giving the output of the convolutional layer:
h′_i = σ( concat_{m=1}^{M} ( Σ_{j∈N_i} α_ij^m W_V^m h_j ) )
where W_V^m is the value weight matrix of h_j, M is the number of attention heads, and σ is the MLP perceptron with ReLU activation;
step 4: action decision-making;
cascading features obtained from each layer of the graph convolution neural network byRepresentation, wherein I l Representing each layer of the graph convolutional neural network, +.>Representing agent v in each tier of network i Is characterized by (2);
the loss function is

L(θ) = (1/S) Σ_S (1/N) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ · max_{a'_i} Q(O'_i, a'_i; θ')

wherein γ is a discount factor, S denotes a batch of samples drawn from historical experience, N represents the total number of receptive fields, O_i represents the set of observations of all agents in the i-th receptive field, and a_i represents the actions of the agents in the i-th receptive field; the hierarchical multi-agent cooperative communication model is parameterized by θ;
based on the Q-learning algorithm, the target network θ' is softly updated by θ' = βθ + (1 − β)θ', where β acts as a hyperparameter regulating the relative influence of the online network and the target network.
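The TD target, mean-squared loss, and soft target-network update above can be sketched numerically as follows; this is a minimal illustration of the update rules only, with θ reduced to a plain parameter vector rather than the patented model:

```python
import numpy as np

def td_targets(r, q_next_max, gamma):
    """y_i = r_i + gamma * max_a' Q(O'_i, a'_i; theta')."""
    return r + gamma * q_next_max

def td_loss(y, q_pred):
    """Mean-squared TD error over a sampled batch."""
    return np.mean((y - q_pred) ** 2)

def soft_update(theta, theta_target, beta):
    """theta' <- beta * theta + (1 - beta) * theta'."""
    return beta * theta + (1.0 - beta) * theta_target

y = td_targets(1.0, 2.0, 0.9)          # 1.0 + 0.9 * 2.0 = 2.8
theta_new = soft_update(np.array([1.0]), np.array([0.0]), 0.1)
```

With β = 0.1 the target parameters move only 10% of the way toward the online parameters at each update, which stabilizes the bootstrapped targets y_i.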
CN202310285877.9A 2023-03-22 2023-03-22 Multi-agent cooperation method based on hierarchical communication mechanism Pending CN116582442A (en)

Publications (1)

Publication Number Publication Date
CN116582442A true CN116582442A (en) 2023-08-11

Family

ID=87532901

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992400A (en) * 2023-09-27 2023-11-03 之江实验室 Collaborative sensing method and collaborative sensing device based on space-time feature fusion
CN116992400B (en) * 2023-09-27 2024-01-05 之江实验室 Collaborative sensing method and collaborative sensing device based on space-time feature fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination