CN115759199A - Multi-robot environment exploration method and system based on hierarchical graph neural network - Google Patents

Multi-robot environment exploration method and system based on hierarchical graph neural network Download PDF

Info

Publication number
CN115759199A
CN115759199A (application CN202211454807.3A; granted publication CN115759199B)
Authority
CN
China
Prior art keywords
robot
graph
topological graph
topological
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211454807.3A
Other languages
Chinese (zh)
Other versions
CN115759199B (en
Inventor
程吉禹
张�浩
张伟
张�林
宋然
李晓磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202211454807.3A priority Critical patent/CN115759199B/en
Publication of CN115759199A publication Critical patent/CN115759199A/en
Application granted granted Critical
Publication of CN115759199B publication Critical patent/CN115759199B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Manipulator (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a multi-robot environment exploration method and system based on a hierarchical graph neural network, relating to the field of multi-robot exploration of unknown environments. The method comprises the following steps: representing a continuous environment map as a topological graph using a topology-based environment modeling method; extracting features from the topological graph with a hierarchical graph neural network, aggregating feature information from different hop counts in the graph, and fusing the features of nodes and edges with a multi-head attention mechanism to obtain a final output topological graph; and taking the node features corresponding to each robot node in the final output topological graph as the state of the corresponding agent in reinforcement learning, and fusing the node features of the multiple robots with a multi-head attention mechanism to obtain the overall state value of the robot system. The invention can attentively extract feature information from the environment topological graph and performs policy learning within a multi-agent reinforcement learning framework, improving the overall cooperation and task-execution efficiency of the multi-robot system.

Description

Multi-robot environment exploration method and system based on hierarchical graph neural network
Technical Field
The invention belongs to the field of multi-robot unknown environment exploration, and particularly relates to a multi-robot environment exploration method and system based on a hierarchical graph neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Exploration of unknown environments, a basic problem of robotics, has long been both a central research topic and a major challenge in the field. In a multi-robot exploration task, the robots must locally sense the structure of the workspace through their onboard sensors, communicate with teammate robots under certain conditions, make decisions cooperatively, and thereby reconstruct an environment model quickly and accurately while in motion. The task has two major difficulties. The first is how to properly allocate task points among the robots, balancing the efficiency of each single robot against that of the whole multi-robot system while avoiding conflicts; the coordination problem between robots is itself NP-hard. The second is that, for the complex and numerous entities in the environment (robot teammates, obstacles, map-boundary information, etc.), it is challenging to model them appropriately and to select a feature extraction framework adapted to them so as to reduce decision complexity.
For task allocation among multiple robots, existing work can be roughly divided, by theoretical basis, into methods based on emotion modeling, methods based on auction mechanisms, and methods based on deep reinforcement learning. Although auction-based methods are widely applied to indoor exploration tasks, they can cause severe path repetition, which degrades the overall working efficiency of the system. Emotion-based methods alleviate the path-repetition problem, but both the emotional states and the feature extraction require manual design, and their generalization in complex environments is poor. Reinforcement-learning-based methods, which have emerged in recent years with the development of artificial intelligence, can handle complex state and action spaces while balancing decision efficiency and optimality.
For the complex and numerous entities in an environment, topological graphs are widely used to describe the spatial and structural relations among them. However, for the complex structural information in such graphs, most existing methods use a permutation-invariant graph neural network for feature extraction. Although such networks can extract features from topological graphs of variable size and structure, they easily ignore the hierarchical information embodied in the graph.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a multi-robot environment exploration method and a multi-robot environment exploration system based on a hierarchical graph neural network.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
the invention provides a multi-robot environment exploration method based on a hierarchical graph neural network.
A multi-robot environment exploration method based on a hierarchical graph neural network comprises the following steps:
representing a continuous environment map as a topological graph using a topology-based environment modeling method;
extracting features from the topological graph with a hierarchical graph neural network, aggregating feature information from different hop counts in the graph, and fusing the features of nodes and edges with a multi-head attention mechanism to obtain a final output topological graph;
and taking the node features corresponding to each robot node in the final output topological graph as the state of the corresponding agent in reinforcement learning, and fusing the node features of the multiple robots with a multi-head attention mechanism to obtain the overall state value of the robot system.
The invention provides a multi-robot environment exploration system based on a hierarchical graph neural network.
A multi-robot environment exploration system based on a hierarchical graph neural network comprises:
an environment modeling module configured to: represent a continuous environment map as a topological graph using a topology-based environment modeling method;
a feature extraction module configured to: extract features from the topological graph with a hierarchical graph neural network, aggregate feature information from different hop counts in the graph, and fuse the features of nodes and edges with a multi-head attention mechanism to obtain a final output topological graph;
a reinforcement learning module configured to: take the node features corresponding to each robot node in the final output topological graph as the state of the corresponding agent in reinforcement learning, and fuse the node features of the multiple robots with a multi-head attention mechanism to obtain the overall state value of the robot system.
A third aspect of the present invention provides a computer readable storage medium having stored thereon a program which, when executed by a processor, performs the steps in the hierarchical graph neural network based multi-robot environment exploration method according to the first aspect of the present invention.
A fourth aspect of the present invention provides an electronic device, comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the hierarchical graph neural network-based multi-robot environment exploration method according to the first aspect of the present invention when executing the program.
The above one or more technical solutions have the following beneficial effects:
(1) The invention adopts a topological graph to model the entities in the environment, extracting the spatial feature information among key entities while reducing computational complexity, which helps the robots make decisions more reasonably and efficiently. On this basis, the invention adopts a hierarchical graph neural network as the policy network to distinguish and process information from different topological levels, improving the robots' perception of spatial features and making their decisions more targeted and interpretable.
(2) The invention adopts a "centralized training, decentralized execution" multi-agent deep reinforcement learning framework to encode the cooperativity of the robot system into each robot's independent policy, reducing conflicts such as repeated coverage and collisions and allowing each robot in a team to efficiently and autonomously complete the exploration of an unknown environment.
(3) The robots decide in a decentralized manner, and in a multi-robot team every robot shares the same policy network; meanwhile, the graph neural network can extract features from topological graphs of different input sizes. The proposed policy network therefore generalizes and scales well: a model trained in a small scene can be extended to a large scene, greatly saving training cost and improving training efficiency.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1 is a schematic diagram of a topology establishment process according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a multi-agent reinforcement learning framework employed by the first embodiment of the present invention;
FIG. 3 is a flow chart of a hierarchical neural policy network used in the first embodiment of the present invention.
Fig. 4 is a system configuration diagram of the second embodiment.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
The general idea provided by the invention is as follows:
In a first aspect, the invention provides a topology-based environment modeling method and a feature extraction and decision framework based on a hierarchical graph neural network. In the environment modeling, the environment is first discretized at a fixed distance interval, representing the continuous environment map (including the robots) as a discrete grid-point map. Each grid point in the passable area of the map is then used as a node in the topological graph, and edges are added between adjacent nodes, which completes the topological abstraction of the environment and yields a topological map. The entire modeling process is shown in fig. 1.
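The grid-to-graph step above can be sketched as follows. This is an illustrative helper under our own naming (not the patent's code), assuming a 4-connected grid in which free cells become nodes and adjacent free cells are joined by an edge:

```python
# Sketch of the grid-to-topology modeling step (hypothetical helper, not the
# patent's implementation): free cells of the discretized map become nodes,
# and 4-connected neighboring free cells are joined by an edge.
def grid_to_topology(grid):
    """grid: 2D list, 0 = passable, 1 = obstacle. Returns (nodes, edges)."""
    rows, cols = len(grid), len(grid[0])
    nodes = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 0]
    node_set = set(nodes)
    edges = []
    for (r, c) in nodes:
        for (dr, dc) in ((0, 1), (1, 0)):  # right and down: emit each edge once
            nb = (r + dr, c + dc)
            if nb in node_set:
                edges.append(((r, c), nb))
    # robot nodes are handled identically: they occupy passable cells
    return nodes, edges
```

A robot's grid point is simply one of these passable nodes, so no separate node type is needed at this stage.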
Then, a hierarchical graph neural network is proposed to extract information from this topological graph. Through the flow and aggregation of information, the hierarchical graph neural network updates each node's feature into an aggregation of its neighbors' features; after several iterative updates, aggregated information from different hop counts in the graph is obtained. Unlike existing methods, the invention uses a multi-head attention mechanism to weigh the importance of information from different hop counts, helping the robot perceive its surroundings in a more discriminating way.
In a second aspect, the present invention provides a "centralized training, decentralized execution" multi-agent reinforcement learning framework that implicitly integrates inter-agent collaboration into each agent's independent policy. When multiple robots are deployed on an exploration task, cooperation between the robots must be considered comprehensively to ensure efficiency and avoid conflicts such as repeated exploration. As shown in fig. 2, the invention proposes that the robots share a centralized value network and decentralized policy networks during training, where the final output of the centralized value network is a weighted combination of the per-robot value networks and the weights are assigned by an attention mechanism. In this manner, each robot's contribution to the overall task can be apportioned reasonably, and the cooperativity among the robots is implicitly encoded into each robot's independent policy. In addition, since the agents share parameters during training, the trained model shows good extensibility, making it possible to train the model in a small scene (with fewer robots) and extend it to a larger scene (with more robots).
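The attention-weighted value mixing described above can be illustrated with a minimal numeric sketch. `centralized_value` and `softmax` are our own names, and a real implementation would learn the attention scores from the robots' features rather than take them as inputs:

```python
import math

# Illustrative sketch (assumed form, not the patent's code): the centralized
# value is an attention-weighted sum of per-robot value estimates, so each
# robot's contribution to the team value is learned rather than fixed.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def centralized_value(robot_values, attention_scores):
    """robot_values: per-robot V_i; attention_scores: unnormalized logits."""
    weights = softmax(attention_scores)
    return sum(w * v for w, v in zip(weights, robot_values))
```

With equal attention scores this reduces to a plain average, so the attention mechanism only changes the mixing when some robots' states are judged more relevant than others.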
Example one
The embodiment discloses a multi-robot environment exploration method based on a hierarchical graph neural network.
As shown in fig. 1-3, the multi-robot environment exploration method based on the hierarchical graph neural network comprises the following steps:
representing a continuous environment map as a topological graph using a topology-based environment modeling method;
extracting features from the topological graph with a hierarchical graph neural network, aggregating feature information from different hop counts in the graph, and fusing the features of nodes and edges with a multi-head attention mechanism to obtain a final output topological graph;
and taking the node features corresponding to each robot node in the final output topological graph as the state of the corresponding agent in reinforcement learning, and fusing the node features of the multiple robots with a multi-head attention mechanism to obtain the overall state value of the robot system.
This embodiment mainly provides a multi-robot unknown-environment exploration method based on a hierarchical graph neural network, implemented in a simulated 2D map. The continuous map is first discretized into a 2D grid map. As shown in FIG. 1, the grid points corresponding to the passable region and to the robots are used as nodes in the topological graph, and edges are added between the nodes of adjacent grid points. Finally, after edges are added between each robot node and its adjacent nodes, the topological representation of the grid map is obtained. Specifically, this embodiment employs a three-dimensional vector $v_i^t$ to represent the feature of node $i$:

$$v_i^t = \left[\,\mathbb{1}(i \in R),\ \mathbb{1}(i \in W_t),\ \mathbb{1}(i \in F_t)\,\right]$$

where $R$ is the set of robots, $W_t$ is the set of passable areas, and $F_t$ is the set of boundary (frontier) points at time $t$; the indicator $\mathbb{1}(\cdot)$ marks whether node $i$ belongs to the corresponding set. The feature of edge $j$ in the topological graph is defined as the distance between the two nodes it connects, i.e. $e_j = d(u_j, w_j)$ for endpoints $u_j$ and $w_j$. Finally, the feature sets of nodes and edges are written $V = \{v_i\}$ and $E = \{e_j\}$ respectively, and the topological graph is represented as $G = \{V, E\}$.
For the topological graph G, the embodiment adopts a hierarchical graph neural network to extract features. The hierarchical graph neural network mainly comprises two modules: an underlying feature aggregation module and a hierarchical feature aggregation module.
Feature extraction of the topological graph based on the hierarchical graph neural network specifically comprises:
inputting the topological graph into the underlying feature aggregation module, updating each node's feature into an aggregation of its neighbors' features and each edge's feature into an aggregation of the features of its two endpoint nodes, and iterating the underlying feature aggregation module K times to obtain feature information of K levels and K+1 output topological graphs;
inputting the K+1 output topological graphs into the hierarchical feature aggregation module, and fusing the features of nodes and edges in the output topological graphs with a multi-head attention mechanism to obtain a hidden feature graph;
and inputting the hidden feature graph into an output layer to extract the topological features of the hidden layer, obtaining the final output topological graph.
The underlying feature aggregation module is responsible for aggregating feature information from different hop counts in the topological graph, and mainly comprises two update functions $\phi^e$, $\phi^v$ and an aggregation function $\rho^{e\to v}$. In each round of underlying feature aggregation, the feature of each node is updated through $\rho^{e\to v}$ and $\phi^v$ into an aggregation of its neighbors' features, while each edge's feature is updated through $\phi^e$ from the features of its two endpoint nodes. Iterating this module K times yields feature information from K levels. In a specific implementation, this embodiment defines an equivalent directed graph $G = \{V, E\}$ in which every undirected edge is replaced by a pair of opposite directed edges. The update functions $\phi^e$, $\phi^v$ and the aggregation function $\rho^{e\to v}$ are defined as follows:

$$e_j^{k+1} = \phi^e\!\left(e_j^{k},\ v_{s_j}^{k},\ v_{r_j}^{k}\right)$$

$$\bar e_i^{\,k+1} = \rho^{e\to v}\!\left(\left\{\,e_j^{k+1} : r_j = i\,\right\}\right)$$

$$v_i^{k+1} = \phi^v\!\left(v_i^{k},\ \bar e_i^{\,k+1}\right)$$

where $s_j$ and $r_j$ denote the sender and receiver nodes of directed edge $j$, and $\{e_j^{k+1}\}$, $\{v_i^{k+1}\}$ are respectively the sets of edge and node features in the output topological graph of round $k+1$. After the K rounds of underlying feature aggregation, K+1 output topological graphs are obtained in total. The hierarchical feature aggregation module takes these K+1 topological graphs as input and fuses the features of their nodes and edges with a multi-head attention mechanism. In a specific implementation, for each node $i$ and each edge $j$ of the $l$-th topological graph, this embodiment introduces learnable weights $W_v^{l,m}$ and $W_e^{l,m}$ to generate the corresponding keys $k_{v,i}^{l,m} = W_v^{l,m} v_i^{l}$ and $k_{e,j}^{l,m} = W_e^{l,m} e_j^{l}$, where $m$ indexes the heads of the current attention mechanism. This embodiment then introduces learnable queries $q_v^{m}$ and $q_e^{m}$; for each node $i$ and edge $j$, the features aggregated by the $m$-th attention head are

$$v_i^{m} = \sum_{l=0}^{K} \operatorname{softmax}_l\!\left(q_v^{m}\cdot k_{v,i}^{l,m}\right) v_i^{l}, \qquad e_j^{m} = \sum_{l=0}^{K} \operatorname{softmax}_l\!\left(q_e^{m}\cdot k_{e,j}^{l,m}\right) e_j^{l}.$$

In this way, information from different levels is fused into one topological graph through the multi-head attention mechanism. For the output graphs from the M different heads, the features of edges and nodes are concatenated and fed into an MLP to obtain the output graph $G_{out} = \{V_{out}, E_{out}\}$:

$$v_i^{out} = \operatorname{MLP}_v\!\left(\left[v_i^{1};\ \dots;\ v_i^{M}\right]\right), \qquad e_j^{out} = \operatorname{MLP}_e\!\left(\left[e_j^{1};\ \dots;\ e_j^{M}\right]\right).$$
In the final output topological graph, the node feature corresponding to robot node i is used as the state of the corresponding agent i in reinforcement learning. The edges connected to robot node i, serving as candidates for the robot's current action, are fed into a softmax layer to obtain an action probability distribution, and the robot finally determines its action at the current moment according to this distribution.
The policy network in this embodiment adopts a multi-agent reinforcement learning framework with an actor-critic structure: a centralized critic and decentralized actors. At each time t, the state of the robot system is the map detected by the robots before time t, i.e. $s_t = G_t$. The observation $o_t^i$ of robot i is the topological information within K hops obtained by agent i at time t. The action space of each agent is A = {up, down, left, right}; if the grid point adjacent to the robot in the chosen direction is impassable, the action is set to "stop".
This embodiment employs fully cooperative multi-agent reinforcement learning, so that the agents share a common reward objective: a robot's reward is the increase in explored area between the previous time step and the current one. Regarding the network structure, since the inputs of the policy function and the value function are the robot's local information, this embodiment shares the parameters of the first layers of the value network and the policy network to improve training stability. After the input graph passes through the hierarchical graph neural network, a hidden feature graph is produced, and the output layer then extracts the topological features of the hidden layer. The value-network output layer fuses the features from the n robots with a multi-head attention mechanism, and the features of the final output nodes are one-dimensional.
The feature of the node corresponding to robot i is taken as the state value of agent i, i.e. $V_i = v_i^{out}$, and finally the sum of the state values of all agents is used as the state value of the whole system. For the policy network, in order to fully utilize the topology, this embodiment feeds the features of the edges adjacent to an agent on the output graph into a softmax layer as its action probability distribution. Letting $E_i$ denote the set of edges adjacent to node i, the action of agent i can be represented as

$$a_i \sim \operatorname{softmax}\!\left(\left\{\,e_j^{out} : j \in E_i\,\right\}\right).$$
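The edge-softmax policy head described above can be sketched as follows. The masking of impassable directions (forcing them to zero probability, effectively the "stop" case) is our reading of the text, and the function name is illustrative:

```python
import math

# Illustrative policy head (our naming): the features of the edges adjacent
# to a robot's node pass through softmax to give a probability over moves.
# An impassable direction is masked to zero probability. Assumes at least
# one passable direction.
def action_distribution(edge_features, passable_mask):
    logits = [f if ok else float("-inf")
              for f, ok in zip(edge_features, passable_mask)]
    m = max(l for l in logits if l != float("-inf"))
    exps = [math.exp(l - m) if l != float("-inf") else 0.0 for l in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

The robot can then sample its next move from this distribution, or take the argmax during evaluation.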
the training algorithm adopts an operator-critic structure, the operator output action enables the obtained accumulated expected reward to be maximum, the critic network scores the operator action, loss is calculated according to the reward obtained by the operator action and the score given by the critic, and the critic network parameters are updated by gradient reduction. After a large amount of training, the rewards acquired by the robots tend to be stable, the algorithm is gradually converged, a plurality of robots can complete exploration on the unknown environment, and the task is completed.
Example two
The embodiment discloses a multi-robot environment exploration system based on a hierarchical graph neural network.
As shown in fig. 4, the multi-robot environment exploration system based on the hierarchical graph neural network includes:
an environment modeling module configured to: represent a continuous environment map as a topological graph using a topology-based environment modeling method;
a feature extraction module configured to: extract features from the topological graph with a hierarchical graph neural network, aggregate feature information from different hop counts in the graph, and fuse the features of nodes and edges with a multi-head attention mechanism to obtain a final output topological graph;
a reinforcement learning module configured to: take the node features corresponding to each robot node in the final output topological graph as the state of the corresponding agent in reinforcement learning, and fuse the node features of the multiple robots with a multi-head attention mechanism to obtain the overall state value of the robot system.
EXAMPLE III
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the hierarchical graph neural network-based multi-robot environment exploration method according to embodiment 1 of the present disclosure.
Example four
An object of the present embodiment is to provide an electronic device.
An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for multi-robot environment exploration based on hierarchical graph neural network according to embodiment 1 of the present disclosure when executing the program.
The steps involved in the apparatuses of the above second, third and fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (10)

1. A multi-robot environment exploration method based on a hierarchical graph neural network, characterized by comprising the following steps:
representing a continuous environment map as a topological graph using a topology-based environment modeling method;
extracting features from the topological graph with a hierarchical graph neural network, aggregating feature information from different hop counts in the graph, and fusing the features of nodes and edges with a multi-head attention mechanism to obtain a final output topological graph;
and taking the node features corresponding to each robot node in the final output topological graph as the state of the corresponding agent in reinforcement learning, and fusing the node features of the multiple robots with a multi-head attention mechanism to obtain the overall state value of the robot system.
2. The multi-robot environment exploration method based on the hierarchical graph neural network as claimed in claim 1, wherein the topology-based environment modeling method represents a continuous environment map as a topological graph, specifically: discretizing the environment according to a certain distance specification, taking the grid points corresponding to the passable area in the environment and to the robots as nodes in the topological graph, adding edges between adjacent nodes as well as between each robot node and its adjacent nodes, and representing the environment map as the topological graph.
3. The multi-robot environment exploration method based on the hierarchical graph neural network as claimed in claim 1, wherein the feature extraction of the topological graph based on the hierarchical graph neural network specifically comprises:
inputting the topological graph into a bottom-layer feature aggregation module, updating the feature of each node in the topological graph to the aggregation of its neighbors' features, updating the feature of each edge to the aggregation of the features of its two endpoint nodes, and iterating the bottom-layer feature aggregation module K times to obtain K levels of feature information and K+1 output topological graphs;
inputting the K+1 output topological graphs into a hierarchical feature aggregation module, and fusing the features of nodes and edges in the output topological graphs with a multi-head attention mechanism to obtain a hidden feature graph;
and inputting the hidden feature graph into an output layer to extract the hidden-layer topological features, thereby obtaining the final output topological graph.
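One possible reading of the bottom-layer aggregation in claim 3 can be sketched as follows, assuming mean aggregation (the patent does not fix the aggregation function) and hypothetical names throughout:

```python
import numpy as np

def bottom_layer_aggregation(node_feats, edge_index, K):
    """Iterate a neighbour-mean aggregation K times, keeping all K+1
    intermediate graphs so a hierarchical module can later attend over
    every hop level. Each edge feature is the mean of its two endpoints.

    node_feats: (N, D) array; edge_index: list of (u, v) node-id pairs.
    Returns a list of K+1 (node_feats, edge_feats) snapshots.
    """
    N = node_feats.shape[0]
    nbrs = [[] for _ in range(N)]          # build adjacency lists
    for u, v in edge_index:
        nbrs[u].append(v)
        nbrs[v].append(u)

    def edge_feats(x):
        return np.stack([(x[u] + x[v]) / 2 for u, v in edge_index])

    snapshots = [(node_feats, edge_feats(node_feats))]
    x = node_feats
    for _ in range(K):
        # each node's feature becomes the mean of its neighbours' features
        x = np.stack([x[nbrs[i]].mean(axis=0) if nbrs[i] else x[i]
                      for i in range(N)])
        snapshots.append((x, edge_feats(x)))
    return snapshots
```

With K = 2 this produces three snapshots, matching the K+1 output topological graphs of the claim.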
4. The multi-robot environment exploration method based on the hierarchical graph neural network as claimed in claim 1, wherein a multi-head attention mechanism is used to fuse the node features from the multiple robots, and the final output feature is one-dimensional.
5. The multi-robot environment exploration method based on the hierarchical graph neural network as claimed in claim 1, wherein at each time t the state of the robot system is the map detected by the multiple robots up to time t, and the sum of the state values of all the agents is taken as the state value of the whole system.
6. The multi-robot environment exploration method based on the hierarchical graph neural network as claimed in claim 1, further comprising: taking the edges connected to a robot node in the final output topological graph as candidates for that robot's current action, feeding them into a softmax layer to obtain an action probability distribution, and determining the robot's action at the current moment from this distribution.
7. The multi-robot environment exploration method based on the hierarchical graph neural network as claimed in claim 6, wherein the action of robot i is expressed as:

a_i ~ softmax({ h(e_{ij}) | e_{ij} ∈ E_i })

wherein E_i represents the set of edges adjacent to node i, and e_{ij} represents the edge in the final output topological graph connecting node i to node j.
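The edge-softmax action selection of claims 6 and 7 amounts to the following sketch (illustrative only; the claims do not specify whether the action is sampled or taken greedily, so sampling is assumed here, and all names are hypothetical):

```python
import numpy as np

def select_action(edge_scores, adjacent_edge_ids, rng):
    """Scores of the edges adjacent to a robot's node pass through a
    softmax to give an action probability distribution; the robot's
    action (which adjacent edge to traverse) is sampled from it.

    edge_scores: dict mapping edge id -> scalar score from the final
    output topological graph; adjacent_edge_ids: edges at this robot.
    """
    scores = np.array([edge_scores[e] for e in adjacent_edge_ids])
    scores = scores - scores.max()                 # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()  # softmax over E_i
    idx = rng.choice(len(probs), p=probs)          # sample an action
    return adjacent_edge_ids[idx], probs
```

Only the edges incident to the robot's node enter the softmax, so the action space automatically shrinks and grows with the local topology.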
8. A multi-robot environment exploration system based on a hierarchical graph neural network, characterized by comprising:
an environment modeling module configured to: represent a continuous environment map as a topological graph by the environment modeling method based on the topological graph;
a feature extraction module configured to: extract features of the topological graph based on a hierarchical graph neural network, aggregate feature information across different hop counts in the topological graph, and fuse the features of nodes and edges in the topological graph with a multi-head attention mechanism to obtain a final output topological graph;
and a reinforcement learning module configured to: take the node features corresponding to each individual robot node in the final output topological graph as the state of the corresponding agent in reinforcement learning, and fuse the node features from the multiple robots with a multi-head attention mechanism to obtain the total state value of the robot system.
9. A computer-readable storage medium on which a program is stored, characterized in that the program, when executed by a processor, implements the steps of the multi-robot environment exploration method based on the hierarchical graph neural network according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the multi-robot environment exploration method based on the hierarchical graph neural network according to any one of claims 1 to 7.
CN202211454807.3A 2022-11-21 2022-11-21 Multi-robot environment exploration method and system based on hierarchical graph neural network Active CN115759199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211454807.3A CN115759199B (en) 2022-11-21 2022-11-21 Multi-robot environment exploration method and system based on hierarchical graph neural network

Publications (2)

Publication Number Publication Date
CN115759199A true CN115759199A (en) 2023-03-07
CN115759199B CN115759199B (en) 2023-09-26

Family

ID=85333480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211454807.3A Active CN115759199B (en) 2022-11-21 2022-11-21 Multi-robot environment exploration method and system based on hierarchical graph neural network

Country Status (1)

Country Link
CN (1) CN115759199B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128657A (en) * 2021-06-17 2021-07-16 中国科学院自动化研究所 Multi-agent behavior decision method and device, electronic equipment and storage medium
US20220245365A1 (en) * 2020-05-20 2022-08-04 Tencent Technology (Shenzhen) Company Limited Translation method and apparatus based on multimodal machine learning, device, and storage medium
CN114973125A (en) * 2022-05-12 2022-08-30 武汉大学 Method and system for assisting navigation in intelligent navigation scene by using knowledge graph
CN115130663A (en) * 2022-08-30 2022-09-30 中国海洋大学 Heterogeneous network attribute completion method based on graph neural network and attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
STEVE P ET AL: "Learning Scalable Policies over Graphs for Multi-Robot Task Allocation Using Capsule Attention Networks", IEEE, pages 1-9 *
ZHANG H ET AL: "H2GNN: Hierarchical-Hops Graph Neural Networks for Multi-Robot Exploration in Unknown Environments", IEEE, pages 1-3 *
LUO AO: "Research on Scene Understanding Algorithms Based on Graph Neural Networks", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 2021, pages 138-40 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116755397A (en) * 2023-05-26 2023-09-15 北京航空航天大学 Multi-machine collaborative task scheduling method based on graph convolution strategy gradient
CN116755397B (en) * 2023-05-26 2024-01-23 北京航空航天大学 Multi-machine collaborative task scheduling method based on graph convolution strategy gradient

Also Published As

Publication number Publication date
CN115759199B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
Mossalam et al. Multi-objective deep reinforcement learning
CN113919485A (en) Multi-agent reinforcement learning method and system based on dynamic hierarchical communication network
Luo et al. Multi-agent collaborative exploration through graph-based deep reinforcement learning
Tan et al. Multi-type task allocation for multiple heterogeneous unmanned surface vehicles (USVs) based on the self-organizing map
Taghizadeh et al. A novel graphical approach to automatic abstraction in reinforcement learning
CN115759199B (en) Multi-robot environment exploration method and system based on hierarchical graph neural network
Sadhu et al. Aerial-DeepSearch: Distributed multi-agent deep reinforcement learning for search missions
CN114815801A (en) Adaptive environment path planning method based on strategy-value network and MCTS
Barrett Making friends on the fly: advances in ad hoc teamwork
Li et al. A graph-based reinforcement learning method with converged state exploration and exploitation
Ghazanfari et al. Extracting bottlenecks for reinforcement learning agent by holonic concept clustering and attentional functions
CN113408949B (en) Robot time sequence task planning method and device and electronic equipment
Byeon Advances in Value-based, Policy-based, and Deep Learning-based Reinforcement Learning
Hong et al. Deterministic policy gradient based formation control for multi-agent systems
Lauri et al. Robust multi-agent patrolling strategies using reinforcement learning
Manoury et al. Chime: An adaptive hierarchical representation for continuous intrinsically motivated exploration
Marzi et al. Feudal graph reinforcement learning
Guisi et al. Reinforcement learning with multiple shared rewards
Rai et al. Membrane computing based scalable distributed learning and collaborative decision making for cyber physical systems
Gao et al. A Survey of Markov Model in Reinforcement Learning
Ji et al. Research on Path Planning of Mobile Robot Based on Reinforcement Learning
US20230289563A1 (en) Multi-node neural network constructed from pre-trained small networks
CN114723005B (en) Multi-layer network collapse strategy deducing method based on depth map representation learning
Zhang Ant Colony Algorithm for Distributed Constrained Optimization
Mohammed Modified ant colony optimization for solving traveling salesman problem

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant