CN116582442A - Multi-agent cooperation method based on hierarchical communication mechanism - Google Patents
- Publication number
- CN116582442A (application number CN202310285877.9A)
- Authority
- CN
- China
- Prior art keywords
- agent
- communication
- agents
- group
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/12—Discovery or management of network topologies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a multi-agent cooperation method based on a hierarchical communication mechanism. First, an intra-group communication topology is established and the agents are divided into different groups according to communication distance. Then an inter-group communication topology is established, and the high-level agent of each group is found with a degree-centrality method based on the number of neighbor nodes. Next, intra-group information aggregation and inter-group information exchange are carried out: a hierarchical graph convolutional network is constructed with a multi-head attention mechanism, and for each attention head the value representations of all input features are weighted by the learned relations and summed; the outputs of an agent's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolution layer. Finally, an action decision is made and each agent acts according to its policy. By establishing a reasonable and efficient hierarchical communication structure between agents, the invention breaks away from centralized data acquisition, aggregation and utilization, and alleviates the curse of dimensionality in centralized training.
Description
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a multi-agent cooperation method.
Background
As the scale of unmanned systems grows, cooperation-oriented multi-agent systems have begun to play a major role, completing complex and difficult tasks in high-dimensional dynamic environments. They are widely applied to automatic driving in intelligent transportation, collaborative combat in future warfare, autonomous delivery in intelligent logistics, communication search-and-rescue after geological disasters, and more. In recent years, multi-agent deep reinforcement learning (Multi-agent Deep Reinforcement Learning, MADRL) has combined the perception capability of deep learning with the decision-making capability of reinforcement learning and applied them to multi-agent systems, effectively completing various complex tasks in multi-agent environments, whether competitive, cooperative or mixed. Multi-agent communication establishes a communication structure and a communication policy between agents for transmitting and receiving abstract information: agents exchange their respective states to coordinate their policies, negotiating and adjusting behavior decisions by sharing observations, intentions or experience during operation, which improves overall learning performance and helps each agent reach its learning goal. However, given the large number of agents and the way redundant information interferes with the system, providing a hierarchical multi-agent communication mechanism with efficient communication capability is of great importance.
At the method level, existing algorithms based on gating mechanisms, attention mechanisms and graph networks have explored multi-agent communication techniques for selecting communication partners, deciding when to communicate, choosing the content of communicated information, updating information, and improving communication efficiency. Algorithms using gating mechanisms learn when to communicate and how to communicate efficiently. CommNet is a classical communication network that averages the states of all agents at the previous time step and aggregates them with each agent's current state through an LSTM to make action predictions. The gating mechanism lets an agent receive vectors from other agents as part of its input; these vectors carry communication content that is learned jointly by gradient descent, and the information transferred between agents facilitates task completion. Algorithms using the attention mechanism screen hidden states to decide whom to communicate with and how to aggregate information effectively. VAIN introduces an attention mechanism that computes an attention vector for the communication information output by each agent and, on this basis, represents the relationship between two agents. Graph-network-based models focus on modeling the relationships between agents, using graph convolution layers to learn cooperation from latent features generated over progressively growing receptive fields.
Building on graph networks, DGN computes the interactions between agents using multi-head dot-product attention as the convolution kernel, and can extract higher-order relational representations with multiple convolution layers, effectively capturing the interplay between agents; it also introduces residual connections, promoting stable cooperative decisions and convergence.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a multi-agent cooperation method based on a hierarchical communication mechanism. First, an intra-group communication topology is established and the agents are divided into different groups according to communication distance. Then an inter-group communication topology is established, and the high-level agent of each group is found with a degree-centrality method based on the number of neighbor nodes. Next, intra-group information aggregation and inter-group information exchange are carried out: a hierarchical graph convolutional network is constructed with a multi-head attention mechanism, and for each attention head the value representations of all input features are weighted by the learned relations and summed; the outputs of an agent's M attention heads are then concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolution layer. Finally, an action decision is made and each agent acts according to its policy. By establishing a reasonable and efficient hierarchical communication structure between agents, the invention breaks away from centralized data acquisition, aggregation and utilization, and alleviates the curse of dimensionality in centralized training.
The technical solution adopted by the invention to solve the above technical problem comprises the following steps:
step 1: establishing an intra-group communication topological structure;
dividing the agents into different groups according to communication distance:

G_i^t = { v_j | d(v_i, v_j) ≤ rc }

wherein G_i^t is the communication group in which the i-th agent is located at time t, G_j^t is the communication group in which the j-th agent is located at time t, and the groups are formed by clustering over communication range so that agents within each other's communication range can communicate with each other; v_i is the i-th agent, v_j is the j-th agent, and rc is the communication distance;

obtaining the intra-group adjacency matrix of the communication topology graph, denoted C = (e_ij)_{n×n}; e_ij indicates the edge relationship between agents: 1 if they communicate with each other, 0 otherwise; n denotes the number of agents;
step 2: establishing an inter-group communication topological structure;
searching for the high-level agent of each group with a degree-centrality method based on the number of neighbor nodes; the high-level agents can communicate with the high-level agents of other groups and exchange inter-group information;

the high-level agent is Degree_max = argmax_{v_i} Degree(v_i);

the degree-centrality measure is Degree(v_i) = Edge_i / All_i;

wherein Degree_max selects the node of maximum degree centrality in the group, Degree(v_i) is the computed degree centrality of a node, Edge_i denotes the number of existing edges connected to the node, and All_i denotes the number of possible edges connecting node v_i to the other nodes;
step 3: intra-group information aggregation and inter-group information exchange;
constructing a hierarchical graph convolutional neural network with a multi-head attention mechanism, computing the interactions between agents with multi-head dot-product attention as the convolution kernel; taking each agent as an entity, let N_i be the set of agents in the local area that communicate with agent v_i; for the m-th attention head, the relationship between agent v_i and v_j ∈ N_i is computed as:

α_ij^m = exp( β (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β (W_Q^m h_i)^T (W_K^m h_e) )

wherein β is a scale factor, W_Q^m is the query weight matrix for h_i, h_i is the feature vector of agent v_i, W_K^m is the key weight matrix for h_j, h_j is the feature vector of agent v_j, and h_e is the feature vector of agent v_e, the e-th agent in the set N_i of agents in the local area communicating with v_i;

for each attention head, the value representations of all input features are weighted by these relationships and summed; then the outputs of agent v_i's M attention heads are concatenated and fed into a single-layer MLP with ReLU nonlinearity, giving the output of the convolution layer:

h'_i = σ( concat_{m=1}^{M} Σ_{j∈N_i} α_ij^m W_V^m h_j )

wherein W_V^m is the value weight matrix for h_j, M denotes the number of attention heads, and σ denotes the MLP perceptron with ReLU activation function;
step 4: action decision-making;
cascading the features obtained from each layer of the graph convolutional neural network, denoted h''_i = concat_l( h_i^l ), wherein l indexes the layers of the graph convolutional neural network and h_i^l denotes the feature of agent v_i at layer l;

the loss function is L(θ) = (1/S) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ max_{a'} Q(O'_i, a'_i; θ'); wherein γ is a discount factor; S denotes a batch of samples drawn from historical experience; N denotes a total of N receptive fields; O_i denotes the set of observations of all agents in the i-th receptive field, and a_i denotes the actions of the agents in the i-th receptive field; the model, the hierarchical multi-agent cooperative communication model, is parameterized by θ;

based on the Q-learning algorithm, the target network θ' is updated by θ' = βθ + (1−β)θ', where the hyperparameter β regulates the relative influence of the network and the target network.
The beneficial effects of the invention are as follows:
the invention adopts a flexible and easily understood hierarchical communication topological structure, can divide the hierarchical structure among the agents to distinguish the states of different groups among the agents and the influence of different groups on the agents, uses the graph convolution based on the attention mechanism to carry out information aggregation, and helps the agents expand the receptive field through graph convolution communication, thereby improving the strategy quality. And meanwhile, the neural network replayed by experience is used for reducing the interaction times with the environment, and the training data with high efficiency and diversity are fully utilized. By establishing a reasonable and efficient hierarchical communication structure between the intelligent agents, centralized data acquisition, aggregation and utilization are broken, and the problem of dimension disasters of centralized training is relieved. Communication efficiency and performance of the multi-agent system in cooperation, coordination or cooperative activities in the global scope are improved, and influences of individuals and group agents are balanced better.
Drawings
FIG. 1 is a diagram of the overall framework of the hierarchical multi-agent cooperative communication mechanism of the present invention.
FIG. 2 is a diagram of a multi-head attention mechanism according to the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
The invention provides a multi-agent cooperation method based on a hierarchical communication mechanism, using the following principle: a hierarchy is a more complex communication structure evolved from the topology; it can stratify nodes and order the communication structure, and can learn communication patterns and policies closer to reality. To solve the problem of efficient communication in multi-agent deep reinforcement learning scenarios, a hierarchical multi-agent cooperative communication model (Hierarchical Multi-Agent Collaborative Communication Mechanism, HMACC) is proposed. First, the agents are partitioned into a hierarchy, distinguishing the states of different groups and their influence on each agent; then intra-group and inter-group information is distinguished, helping the agents expand their receptive fields and improving policy quality. The invention decomposes the communication process into local and global interaction processes, models and learns them with suitable strategies, and improves communication efficiency and performance when the multi-agent system performs cooperation and coordination activities across the global scope.
A multi-agent cooperation method based on a hierarchical communication mechanism comprises the following steps:
step 1: establishing an intra-group communication topological structure;
dividing the agents into different groups according to communication distance:

G_i^t = { v_j | d(v_i, v_j) ≤ rc }

wherein G_i^t is the communication group in which the i-th agent is located at time t, G_j^t is the communication group in which the j-th agent is located at time t, and the groups are formed by clustering over communication range so that agents within each other's communication range can communicate with each other; v_i is the i-th agent, v_j is the j-th agent, and rc is the communication distance;

obtaining the intra-group adjacency matrix of the communication topology graph, denoted C = (e_ij)_{n×n}; e_ij indicates the edge relationship between agents: 1 if they communicate with each other, 0 otherwise; n denotes the number of agents;
step 2: establishing an inter-group communication topological structure;
searching for the high-level agent of each group with a degree-centrality method based on the number of neighbor nodes; the high-level agents can communicate with the high-level agents of other groups and exchange inter-group information;

the high-level agent is Degree_max = argmax_{v_i} Degree(v_i);

the degree-centrality measure is Degree(v_i) = Edge_i / All_i;

wherein Degree_max selects the node of maximum degree centrality in the group, Degree(v_i) is the computed degree centrality of a node, Edge_i denotes the number of existing edges connected to the node, and All_i denotes the number of possible edges connecting node v_i to the other nodes;
step 3: intra-group information aggregation and inter-group information exchange;
constructing a hierarchical graph convolutional neural network with a multi-head attention mechanism, computing the interactions between agents with multi-head dot-product attention as the convolution kernel; taking each agent as an entity, let N_i be the set of agents in the local area that communicate with agent v_i; for the m-th attention head, the relationship between agent v_i and v_j ∈ N_i is computed as:

α_ij^m = exp( β (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β (W_Q^m h_i)^T (W_K^m h_e) )

wherein β is a scale factor, W_Q^m is the query weight matrix for h_i, h_i is the feature vector of agent v_i, W_K^m is the key weight matrix for h_j, h_j is the feature vector of agent v_j, and h_e is the feature vector of agent v_e, the e-th agent in the set N_i of agents in the local area communicating with v_i;

for each attention head, the value representations of all input features are weighted by these relationships and summed; then the outputs of agent v_i's M attention heads are concatenated and fed into a single-layer MLP with ReLU nonlinearity, giving the output of the convolution layer:

h'_i = σ( concat_{m=1}^{M} Σ_{j∈N_i} α_ij^m W_V^m h_j )

wherein W_V^m is the value weight matrix for h_j, M denotes the number of attention heads, and σ denotes the MLP perceptron with ReLU activation function;
step 4: action decision-making;
cascading the features obtained from each layer of the graph convolutional neural network, denoted h''_i = concat_l( h_i^l ), wherein l indexes the layers of the graph convolutional neural network and h_i^l denotes the feature of agent v_i at layer l;

the loss function is L(θ) = (1/S) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ max_{a'} Q(O'_i, a'_i; θ'); wherein γ is a discount factor: returns nearer in time are discounted less and thus influence the decision result more, which makes the learning process more stable; S denotes a batch of samples drawn from historical experience; N denotes a total of N receptive fields; O_i denotes the set of observations of all agents in the i-th receptive field, and a_i denotes the actions of the agents in the i-th receptive field; the model, the hierarchical multi-agent cooperative communication model proposed by the invention, is parameterized by θ;

based on the Q-learning algorithm, the target network θ' is updated by θ' = βθ + (1−β)θ', where the hyperparameter β regulates the relative influence of the network and the target network.
The invention discloses a multi-agent cooperation method based on a hierarchical communication mechanism, relating to the field of multi-agent cooperation based on multi-agent deep reinforcement learning. The specific process is as follows. First, an intra-group communication topology is established: the agents are divided into different groups according to communication distance, and agents within each other's communication range communicate; in the communication topology graph, all agents are full duplex, i.e. every agent both transmits and receives information. Then an inter-group communication topology is established: the high-level agent of each group is found with a degree-centrality method based on the number of neighbor nodes, and the high-level agents can communicate with the high-level agents of other groups to exchange inter-group information. Intra-group information aggregation and inter-group information exchange then take place. A hierarchical graph convolutional network is built with a multi-head attention mechanism, interactions between agents are computed with multi-head dot-product attention as the convolution kernel, and each agent is treated as an entity. For each agent there is a set of agents in its local area (a dynamically changing number of neighbor agents), and for each attention head the value representations of all input features are weighted by the learned relations and summed. Then the outputs of the agent's M attention heads are concatenated and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolution layer.
Finally, action decisions are made: at each step, each agent observes its state and takes an action according to its policy; using a residual network structure, the hidden-layer features obtained at each layer are cascaded and fed into the policy network, which is a Q network with experience replay. By establishing a reasonable and efficient hierarchical communication structure between agents, HMACC breaks away from centralized data acquisition, aggregation and utilization, and alleviates the curse of dimensionality in centralized training.
Step 1: establishing intra-group communication topology
The agents are divided into different groups according to communication distance. The group division is denoted G_i^t = { v_j | d(v_i, v_j) ≤ rc }, where G_i^t is the communication group in which the i-th agent v_i is located at time t and G_j^t is the communication group of the j-th agent at time t. To fit practical environments, agents within each other's communication range can communicate with each other, and in the communication topology graph the invention assumes by default that all agents are full duplex, i.e. every agent both transmits and receives information. v_i is the i-th agent and rc is the communication distance. The intra-group adjacency matrix of the communication topology graph can then be obtained, denoted C = (e_ij)_{n×n}.
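As a minimal sketch of this step (assuming Euclidean agent positions; the function names `build_adjacency` and `groups_from_adjacency` are illustrative, not from the patent), the adjacency matrix C = (e_ij) and the distance-based groups can be computed as:

```python
import math

def build_adjacency(positions, rc):
    """Intra-group adjacency matrix C = (e_ij): e_ij = 1 when agents i and j
    are within communication distance rc of each other, else 0."""
    n = len(positions)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(positions[i], positions[j]) <= rc:
                C[i][j] = 1  # full duplex: the link is symmetric by construction
    return C

def groups_from_adjacency(C):
    """Communication groups taken here as connected components of C
    (one plausible reading of distance-based group division)."""
    n = len(C)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first search over communication links
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(j for j in range(n) if C[v][j] and j not in comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups
```

For example, three agents at (0,0), (1,0) and (10,0) with rc = 2 yield two groups: {0, 1} and {2}.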
Step 2: establishing inter-group communication topology
On this basis, a degree-centrality method based on the number of neighbor nodes is used to find the high-level agent of each group. The high-level agents can communicate with the high-level agents of other groups and exchange inter-group information. The high-level agent is denoted Degree_max = argmax_{v_i} Degree(v_i), and the degree-centrality measure is Degree(v_i) = Edge_i / All_i, where Degree_max selects the node of maximum degree centrality in the group, Degree(v_i) is the computed degree centrality of a node, Edge_i denotes the number of existing edges connected to the node, and All_i denotes the number of possible edges connecting node v_i to the other nodes.
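A hedged sketch of this selection step. Degree centrality is taken here as Edge_i / All_i with All_i read as the number of possible links to the other group members — an assumption, since the patent's original formula image is not reproduced — and the group's high-level agent is the node maximizing it:

```python
def degree_centrality(C, group):
    """Degree(v_i) = Edge_i / All_i within one group: the fraction of a node's
    possible intra-group links that actually exist in adjacency matrix C."""
    k = len(group) - 1  # All_i: possible edges to the other group members
    if k == 0:
        return {group[0]: 0.0}  # singleton group has no links
    return {i: sum(C[i][j] for j in group if j != i) / k for i in group}

def high_level_agent(C, group):
    """Degree_max: the node with maximum degree centrality in the group."""
    cent = degree_centrality(C, group)
    return max(cent, key=cent.get)
```

With C = [[0,1,1],[1,0,0],[1,0,0]] and group [0, 1, 2], node 0 has centrality 1.0 versus 0.5 for the others and is selected as the high-level agent.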
Step 3: intra-group information aggregation and inter-group information exchange
A hierarchical graph convolutional neural network is built with a multi-head attention mechanism, computing the interactions between agents with multi-head dot-product attention as the convolution kernel and taking each agent as an entity. For each agent v_i there is a set of agents N_i in its local area (a dynamically changing number of neighbor agents). For attention head m, the relation between v_i and v_j ∈ N_i is calculated as

α_ij^m = exp( β (W_Q^m h_i)^T (W_K^m h_j) ) / Σ_{e∈N_i} exp( β (W_Q^m h_i)^T (W_K^m h_e) )

where β is a scale factor. For each attention head, the value representations of all input features are weighted by these relations and summed. Then, the outputs of agent v_i's M attention heads are connected in series and fed into a single-layer MLP with ReLU nonlinearity to obtain the output of the convolution layer

h'_i = σ( concat_{m=1}^{M} Σ_{j∈N_i} α_ij^m W_V^m h_j ).
This approach better accounts for the correlation between agents and helps balance intra-group and inter-group information. Effective aggregation and transfer of agent information within a group directly affects the agents' strategies, while communication between groups helps a group jump out of local optima.
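The relation and convolution formulas above can be sketched as one layer, assuming standard scaled dot-product attention with a softmax over the local set N_i; the parameter names (`WQ`, `WK`, `WV`, `W_out`) are illustrative placeholders, not the patent's implementation:

```python
import numpy as np

def graph_conv_layer(H, neighbors, WQ, WK, WV, beta, W_out):
    """One multi-head dot-product-attention graph convolution layer (sketch).
    H: (n, d) agent feature matrix; neighbors[i]: indices of N_i;
    WQ/WK/WV: per-head (d, dk) query/key/value weight matrices;
    beta: scale factor; W_out: (M*dk, d_out) single-layer MLP weight."""
    n, M = H.shape[0], len(WQ)
    heads = []
    for m in range(M):
        Q, K, V = H @ WQ[m], H @ WK[m], H @ WV[m]
        out = np.zeros((n, V.shape[1]))
        for i in range(n):
            Ni = neighbors[i]
            logits = beta * (K[Ni] @ Q[i])      # relation of v_i to each v_j in N_i
            a = np.exp(logits - logits.max())
            a /= a.sum()                        # softmax over the local set N_i
            out[i] = a @ V[Ni]                  # weighted sum of value representations
        heads.append(out)
    # concatenate the M heads, then single-layer MLP with ReLU nonlinearity
    return np.maximum(0.0, np.concatenate(heads, axis=1) @ W_out)
```

The per-agent softmax over a variable-length neighbor list reflects the dynamically changing receptive field; stacking such layers widens it further.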
Step 4: action decision
At each step, each agent observes its state and takes an action according to its policy. For scalability, the invention designs a Q network with experience replay to reduce the number of interactions with the environment and make efficient use of diverse training data. The features obtained from each layer are first cascaded, denoted h''_i = concat_l( h_i^l ), where l indexes the layers of the graph convolutional network, so that the agent's action can be expressed as a_i = θ(h''_i), θ being our model. The loss function is L(θ) = (1/S) Σ_{i=1}^{N} ( y_i − Q(O_i, a_i; θ) )², with y_i = r_i + γ max_{a'} Q(O'_i, a'_i; θ'), where γ is a discount factor: returns nearer in time are discounted less and thus influence the decision result more, which makes the learning process more stable. S denotes a batch of samples drawn from historical experience, N denotes a total of N receptive fields, O_i denotes the set of observations of all agents in the i-th receptive field, and a_i denotes the actions of the agents in the i-th receptive field. The model, the hierarchical multi-agent cooperative communication model proposed by the invention, is parameterized by θ. Based on the Q-learning algorithm, the target network θ' is updated by θ' = βθ + (1−β)θ', where the hyperparameter β regulates the relative influence of the network and the target network.
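The TD target, loss and soft target-network update of this step can be sketched with plain-Python helpers; the function names are illustrative, and the actual Q function here is the graph-convolutional model described above:

```python
def td_targets(rewards, next_q_values, gamma):
    """y_i = r_i + gamma * max_a' Q(O'_i, a'_i; theta')."""
    return [r + gamma * max(q) for r, q in zip(rewards, next_q_values)]

def mse_loss(targets, q_taken):
    """Mean squared TD error over a sampled batch: (1/S) sum (y_i - Q_i)^2."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_taken)) / len(targets)

def soft_update(theta, theta_target, beta):
    """theta' <- beta*theta + (1 - beta)*theta', element-wise over parameters;
    beta regulates the influence of the network on the target network."""
    return [beta * p + (1.0 - beta) * pt for p, pt in zip(theta, theta_target)]
```

For instance, with r = 1.0, next-state Q values [0.5, 2.0] and γ = 0.9, the target is 1.0 + 0.9 × 2.0 = 2.8.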
Claims (1)
1. A multi-agent cooperation method based on a hierarchical communication mechanism, characterized by comprising the following steps:
step 1: establishing an intra-group communication topological structure;
dividing the agents into different groups according to communication distance:

G_i^t = { v_j | d(v_i, v_j) ≤ rc }

wherein G_i^t is the communication group in which the i-th agent is located at time t, G_j^t is the communication group in which the j-th agent is located at time t, and the groups are formed by clustering over communication range so that agents within each other's communication range can communicate with each other; v_i is the i-th agent, v_j is the j-th agent, and rc is the communication distance;

obtaining the intra-group adjacency matrix of the communication topology graph, denoted C = (e_ij)_{n×n}; e_ij indicates the edge relationship between agents: 1 if they communicate with each other, 0 otherwise; n denotes the number of agents;
step 2: establishing an inter-group communication topological structure;
searching for the high-level agent of each group with a degree-centrality method based on the number of neighbor nodes; the high-level agents can communicate with the high-level agents of other groups and exchange inter-group information;

the high-level agent is Degree_max = argmax_{v_i} Degree(v_i);

the degree-centrality measure is Degree(v_i) = Edge_i / All_i;

wherein Degree_max selects the node of maximum degree centrality in the group, Degree(v_i) is the computed degree centrality of a node, Edge_i denotes the number of existing edges connected to the node, and All_i denotes the number of possible edges connecting node v_i to the other nodes;
step 3: intra-group information aggregation and inter-group information exchange;
constructing a hierarchical graph convolutional neural network by using a multi-head attention mechanism, and calculating interaction among intelligent agents by using multi-head dot product attention as convolution kernel; taking each agent as an entity, and in the agent v i Set of agents N in local area in communication therewith i For the mth attention header, agent v i and vj ∈N i The relationship between them is calculated as:
where β is a scale factor, W<sub>Q</sub><sup>m</sup> is the query weight matrix applied to h<sub>i</sub>, h<sub>i</sub> is the feature vector of agent v<sub>i</sub>, W<sub>K</sub><sup>m</sup> is the key weight matrix applied to h<sub>j</sub>, h<sub>j</sub> is the feature vector of agent v<sub>j</sub>, and h<sub>e</sub> is the feature vector of agent v<sub>e</sub>, the e-th agent in the set N<sub>i</sub> of agents in the local region communicating with agent v<sub>i</sub>;
for each attention head, the value representations of all input features are weighted and summed using the relation weights; then, for agent v<sub>i</sub>, the outputs of all M attention heads are concatenated and fed into a single-layer MLP with ReLU nonlinearity, giving the output of the convolutional layer:

h′<sub>i</sub> = σ( concat<sub>m=1..M</sub> [ Σ<sub>j∈N<sub>i</sub></sub> α<sub>ij</sub><sup>m</sup> W<sub>V</sub><sup>m</sup> h<sub>j</sub> ] )
wherein W<sub>V</sub><sup>m</sup> is the value weight matrix applied to h<sub>j</sub>, M denotes the number of attention heads, and σ denotes the single-layer MLP with the ReLU activation function;
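A minimal numpy sketch of one such convolution step for a single agent, assuming per-head weight tensors W_Q, W_K, W_V of shape (M, d_k, d) and an output matrix of shape (d_out, M·d_k); all names and shapes are illustrative, not from the patent:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mh_attention_conv(h, N_i, i, WQ, WK, WV, W_out, beta=0.25):
    """Sketch of one graph-convolution step for agent v_i: per head m,
    scaled dot-product attention over the local set N_i, a weighted sum
    of value projections, heads concatenated, then single-layer MLP+ReLU."""
    M = WQ.shape[0]
    head_outs = []
    for m in range(M):
        q = WQ[m] @ h[i]                             # query for agent v_i
        logits = np.array([beta * q @ (WK[m] @ h[j]) for j in N_i])
        alpha = np.exp(logits - logits.max())
        alpha /= alpha.sum()                         # softmax over N_i
        head_outs.append(sum(a * (WV[m] @ h[j]) for a, j in zip(alpha, N_i)))
    return relu(W_out @ np.concatenate(head_outs))   # sigma(concat of heads)
```

The ReLU at the end guarantees a non-negative output vector of dimension d_out, which becomes the agent's feature for the next layer.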
step 4: action decision-making;
cascading the features obtained from each layer of the graph convolutional neural network, denoted h<sub>i</sub> = concat( h<sub>i</sub><sup>1</sup>, …, h<sub>i</sub><sup>L</sup> ), wherein l = 1, …, L indexes each layer of the graph convolutional neural network and h<sub>i</sub><sup>l</sup> denotes the feature of agent v<sub>i</sub> in the l-th layer of the network;
the loss function is L(θ) = E<sub>B∼D</sub> [ (1/N) Σ<sub>i=1..N</sub> ( y<sub>i</sub> − Q(O<sub>i</sub>, a<sub>i</sub>; θ) )² ], with the target y<sub>i</sub> = r<sub>i</sub> + γ max<sub>a′<sub>i</sub></sub> Q(O′<sub>i</sub>, a′<sub>i</sub>; θ′); wherein γ is the discount factor; B denotes a sample batch extracted from the historical experience D; N denotes the total number of receiving domains; O<sub>i</sub> denotes the set of observations of all agents in the i-th receiving domain, and a<sub>i</sub> the actions of the agents in the i-th receiving domain; the model is parameterized by θ, the parameters of the hierarchical multi-agent cooperative communication model;
based on the Q-learning algorithm, the target network θ′ is softly updated by θ′ = βθ + (1 − β)θ′, where the hyperparameter β regulates the relative influence of the online network and the target network.
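The target computation and soft update of step 4 can be sketched as follows (parameters are represented as a list of arrays; names and the default β are illustrative):

```python
import numpy as np

def td_targets(r, q_next_max, gamma=0.99):
    """y_i = r_i + gamma * max_a' Q(O'_i, a'_i; theta'), with the max
    over next actions evaluated by the target network."""
    return r + gamma * q_next_max

def soft_update(theta, theta_target, beta=0.01):
    """theta' <- beta*theta + (1-beta)*theta', applied per parameter;
    beta trades off the online network against the target network."""
    return [beta * w + (1.0 - beta) * wt for w, wt in zip(theta, theta_target)]
```

With β close to 0 the target network changes slowly, stabilizing the bootstrapped targets; β = 1 would copy the online network outright.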
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310285877.9A CN116582442A (en) | 2023-03-22 | 2023-03-22 | Multi-agent cooperation method based on hierarchical communication mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116582442A true CN116582442A (en) | 2023-08-11 |
Family
ID=87532901
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116582442A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN116992400A (en) * | 2023-09-27 | 2023-11-03 | 之江实验室 | Collaborative sensing method and collaborative sensing device based on space-time feature fusion
CN116992400B (en) * | 2023-09-27 | 2024-01-05 | 之江实验室 | Collaborative sensing method and collaborative sensing device based on space-time feature fusion
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||