CN115907248A

CN115907248A - Multi-robot unknown environment path planning method based on geometric neural network

Info

Publication number: CN115907248A
Application number: CN202211328724.XA
Authority: CN
Inventors: 程吉禹; 丁俊锋; 张伟; 张�浩
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2023-04-04
Anticipated expiration: 2042-10-26
Also published as: CN115907248B

Abstract

The invention provides a method and a system for multi-robot path planning in an unknown environment based on a geometric graph neural network, and relates to the technical field of artificial intelligence. The geometric graph neural network is used to effectively encode the relative positions among the robots, and the information of neighbor robots is weighted according to the position codes; a deep reinforcement learning method is used to realize a decentralized, distributed control mode. This improves the accuracy with which each robot aggregates the information of its neighbor robots and improves the success rate of multi-robot path planning. The specific scheme comprises the following steps: extracting a robot map perception feature based on map perception information around the position of the robot at the current moment; performing, by the geometric graph neural network, weighted information aggregation on the robot map perception features and the relative position codes to obtain the complete state representation of the robot; inputting the complete state representation of the robot into a long short-term memory (LSTM) network and extracting time sequence features; and calculating a behavior decision to generate the action to be executed by the robot at the current moment.

Description

Multi-robot unknown environment path planning method based on geometric neural network

Technical Field

The invention belongs to the technical field of artificial intelligence, and particularly relates to a multi-robot unknown environment path planning method and system based on a geometric neural network.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Path planning means generating an optimal collision-free path from an initial position to a target position in the working environment of a mobile robot. Robot path planning is key to realizing robot automation and has long been a research hotspot in robotics. With the development of robot technology and rising industrial production requirements, multi-robot applications continue to increase, and research on multi-robot path planning algorithms is becoming more and more important. Compared with path planning for a single mobile robot, the actions of multiple robots influence one another, so that the working environment of each robot is unstable and single-robot path planning algorithms are no longer suitable. In addition, as the number of robots increases, the dimensions of the system state space and action space grow sharply, the optimization problem becomes difficult to solve, and high demands are placed on the computing capability and reaction speed of the deployment equipment.

Most existing multi-robot path planning algorithms based on deep reinforcement learning use a graph neural network to realize communication between robots. Graph data consists mainly of nodes and edges. In multi-robot path planning, the robots form the nodes of the graph structure, and whether an edge exists between two robot nodes depends mainly on whether the geometric distance between the robots is within the robot communication range: if the distance is smaller than the communication radius, an edge exists and the two robots are neighbors of each other; otherwise no edge exists. Using the graph neural network, the robots exchange information with their neighbors in the constructed graph structure and aggregate the map information sensed by the neighbor robots, which indirectly enlarges each robot's receptive field and increases cooperation among the robots. However, graph data is non-Euclidean structured data; it has no regular spatial structure and is irregular and unordered. Although graph data can accommodate the arbitrary changes in structure that occur as the mobile robots travel, the relative position information among the robots is lost. Existing methods do not explicitly encode the relative positions of the robots and learn the positional relationships mainly in an implicit manner; a robot cannot accurately determine the positions of its neighbor robots and merely computes a weighted sum of the encoded observation information transmitted by all neighbor robots in its neighborhood, so it cannot accurately perceive the surrounding environment.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a multi-robot unknown environment path planning method and a multi-robot unknown environment path planning system based on a geometric graph neural network, and provides a path planning network model.

In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:

A first aspect of the invention provides a multi-robot unknown environment path planning method based on a geometric graph neural network;

the multi-robot unknown environment path planning method based on the geometric neural network comprises the following steps:

extracting map perception features of the robot based on the acquired map perception information around the position of the robot at the current moment;

after the relative position between the robots is coded, the geometric graph neural network carries out weighted information aggregation on the map perception features and the relative position codes of the robots to obtain the complete state representation of the robots;

inputting the complete state representation of the robot into a long short-term memory (LSTM) network, and extracting time sequence features;

and calculating a behavior decision based on the extracted time sequence characteristics, and generating an action to be executed by the robot at the current moment.

Further, a double-layer decision mode of global target guidance and local dynamic obstacle avoidance is adopted: the complete path to the target point calculated by the A* algorithm serves as long-term target guidance for the robot during travel and provides a reference for the reinforcement learning model policy, while the reinforcement learning model adjusts the local path of the robot according to dynamic changes in the environment so as to complete obstacle avoidance.

Further, the map perception information comprises the distribution situation of obstacles and the positions of other robots.

Furthermore, the relative position coding is realized by two fully-connected neural networks, namely a relative position weight coding network and a relative position bias coding network.

Further, the relative position between the robot and a neighbor robot is input into the two fully-connected neural networks, which output a relative position weight code and a relative position bias code.

Further, the weighted information aggregation is specifically:

[aggregation formula omitted in the source text]

wherein the result is the complete state representation, $N_i$ represents the set of all neighbors of robot i, $n_i$ represents the number of neighbors of robot i, and the remaining terms are the robot map perception feature, the relative position weight code, and the relative position bias code.

Further, the calculation behavior decision is based on a robot perception decision model formed by a fully-connected network;

the robot perception decision model learns an action strategy through deep reinforcement learning training, maps from states to action probabilities, and maximizes accumulated rewards.

A second aspect of the invention provides a multi-robot unknown environment path planning system based on a geometric graph neural network.

The multi-robot unknown environment path planning system based on the geometric neural network comprises a perception feature extraction module, a state representation extraction module, a time sequence feature extraction module and an action generation module:

a perceptual feature extraction module configured to: extracting a robot map perception feature based on the acquired map perception information around the position of the robot at the current moment;

a state representation extraction module configured to: after the relative position between the robots is coded, the geometric graph neural network carries out weighted information aggregation on the map perception features and the relative position codes of the robots to obtain the complete state representation of the robots;

a timing feature extraction module configured to: inputting the complete state representation of the robot into a long short-term memory (LSTM) network, and extracting time sequence features;

an action generation module configured to: and calculating a behavior decision based on the extracted time sequence characteristics, and generating an action to be executed by the robot at the current moment.

A third aspect of the present invention provides a computer-readable storage medium on which a program is stored; when executed by a processor, the program implements the steps of the multi-robot unknown environment path planning method based on a geometric graph neural network according to the first aspect of the present invention.

A fourth aspect of the present invention provides an electronic device, including a memory, a processor, and a program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the multi-robot unknown environment path planning method based on a geometric graph neural network according to the first aspect of the present invention.

The above one or more technical solutions have the following beneficial effects:

(1) The invention adopts a deep reinforcement learning method to realize decentralized distributed control of multiple robots, each robot can independently complete a path planning task, the dependence on a central controller is removed, and the difficulty of actual robot deployment is reduced; meanwhile, the strong strategy representation capability of multi-agent reinforcement learning is utilized, the influence of environmental instability caused by the motion of multiple robots on the strategy of the mobile robot is reduced, and the robustness of the motion strategy of the robot is improved.

(2) The invention adopts the neural network to realize the relative position coding, improves the flexibility and the representation capability of the coding, and is easier to learn the physical significance of the relative position; the method avoids setting the maximum representation distance in absolute position coding, has better expansibility, is easier to directly transfer the model trained in a small scene to a large-scale running environment, and reduces the model training cost.

(3) The invention provides a path planning network model, which adopts a geometric figure neural network model to combine relative position codes and figure neural network information together in an aggregation manner, overcomes the defect that a figure neural network does not have a regular spatial structure, enables a robot to distinguish neighbors at different positions when aggregating neighbor robot information, weights map observation information transmitted by neighbor robots according to the relative position codes of the neighbor robots, improves the environment sensing capability of the robot, and improves the algorithm success rate.

Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; the exemplary embodiments of the invention and their description serve to explain the invention and do not limit it.

Fig. 1 is a diagram of a grid environment in a first embodiment.

Fig. 2 is a flow chart of the method of the first embodiment.

Fig. 3 is a system configuration diagram of the second embodiment.

Detailed Description

The invention is further described with reference to the following figures and examples.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention; unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention; as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

The invention provides a method for realizing neighbor information aggregation of robots according to relative position relation based on a path planning network model, which mainly comprises two parts of coding of the relative position relation among the robots and aggregation of graph network weighting information.

In the decision making process of the robot, firstly, environment perception is carried out: at the moment t, the robot acquires map information around the position of the robot, including the distribution situation of the obstacles and the positions of other robots.

The sensed map information is input into a convolutional neural network for feature extraction, so that robot i finally obtains a one-dimensional map feature vector.

In prior methods, the robots directly exchange the vectors extracted from the map information. The present method instead first encodes the relative positions between the robots and then uses the position codes to perform a relative-position-based weighted summation of the neighbor information, thereby realizing information aggregation. The relative position coding module is realized by two fully-connected neural networks, namely a relative position weight coding network and a relative position bias coding network. The relative position between robot i and a neighbor robot j is input into the relative position coding module, and the networks output a relative position weight code and a relative position bias code.

Finally, the geometric graph neural network performs information aggregation according to the map feature vectors of the robots and the position codes: robot i aggregates the features of its neighbor robots to obtain a complete state representation, where $N_i$ represents the set of all neighbors of robot i and $n_i$ represents the number of neighbors of robot i.

The state representation is then passed through a long short-term memory (LSTM) network, which extracts time sequence information from historical moments; the result is input into an action generation module formed by a fully-connected network, which outputs the action that robot i should execute at the current moment, completing the action decision.

The path planning network model used by the invention is trained by adopting a reinforcement learning method, and all robots share one set of network, so that the learning efficiency is improved, and the expansibility of the system is enhanced.

Example one

This embodiment provides a multi-robot unknown environment path planning method based on a geometric graph neural network, demonstrated in a grid environment. As shown in Fig. 1, the circles in the example map represent the mobile robots, the triangles with corresponding numbers represent the destinations of the respective robots, and the boxes represent obstacles in the map. The task of each robot is to reach its destination in as short a time as possible without colliding with obstacles or other robots.

At the start of each decision step, the robot first assumes that unknown map areas are traversable and then uses the A* algorithm to calculate a complete path to the target point on the currently known environment map. This path serves as long-term target guidance for the robot during travel and provides a reference for the reinforcement learning model policy, while the reinforcement learning model adjusts the local path of the robot according to dynamic changes in the environment so as to complete obstacle avoidance; together these form a double-layer decision mode of global target guidance and local dynamic obstacle avoidance. As the robot moves, the unknown map environment is gradually explored, robot movement and map reconstruction proceed synchronously, and the long-term target guidance calculated by the A* algorithm becomes more and more accurate.
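The global guidance step described above can be sketched as a standard grid A* search in which unknown cells are optimistically treated as traversable. This is only an illustrative sketch: the cell codes `FREE`, `OBSTACLE` and `UNKNOWN`, the 4-connected unit-cost moves, and the function name `astar` are assumptions, not details given in the patent.

```python
import heapq

# Hypothetical cell codes; the patent does not specify a map encoding.
FREE, OBSTACLE, UNKNOWN = 0, 1, -1


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid; UNKNOWN cells are assumed passable."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for 4-connected unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            # Only cells known to be obstacles are blocked.
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != OBSTACLE:
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None  # goal unreachable even under the optimistic assumption
```

Re-running the search at every decision step, as the text describes, lets the guidance improve as `UNKNOWN` cells are replaced by observed `FREE` or `OBSTACLE` values.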

As shown in fig. 2, the multi-robot unknown environment path planning method based on the geometric neural network, that is, the specific steps of the path planning network model, include:

Step S1: extracting a robot map perception feature based on the acquired map perception information around the position of the robot at the current moment;

The map perception information comprises the distribution of obstacles and the positions of other robots. At time t, the map perceived by robot i is represented as a tensor in $\mathbb{R}^{W_{FOV} \times H_{FOV} \times 4}$, where $W_{FOV}$ and $H_{FOV}$ are respectively the width and height of the field of view observed by the robot under its field-of-view radius $r_{FOV}$, and the 4 denotes four channels: the first channel represents obstacle information around the robot, the second represents the destination of robot i, the third represents the states of other robots in the surrounding environment, and the fourth represents the global target guidance calculated by the robot using the A* algorithm.
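As a rough illustration of the four-channel perception input, the sketch below builds the local observation as nested lists. Several details are assumptions made only for this example: a square window of side `2 * r_fov + 1`, a channel-first layout, binary cell values, treating cells beyond the map edge as obstacles, and the function name `build_observation`.

```python
def build_observation(shape, obstacles, other_robots, goal, guidance, pos, r_fov):
    """Return a 4 x H x W nested list centred on pos (channel-first layout)."""
    rows, cols = shape
    size = 2 * r_fov + 1  # assumed window size for a square field of view
    obs = [[[0.0] * size for _ in range(size)] for _ in range(4)]
    obstacle_set = set(obstacles)
    robot_set = set(other_robots)
    guide_set = set(guidance)
    for i in range(size):
        for j in range(size):
            cell = (pos[0] - r_fov + i, pos[1] - r_fov + j)
            inside = 0 <= cell[0] < rows and 0 <= cell[1] < cols
            # Channel 0: obstacles (cells beyond the map edge are treated as blocked).
            obs[0][i][j] = 1.0 if (not inside or cell in obstacle_set) else 0.0
            # Channel 1: this robot's own destination, if it lies inside the window.
            obs[1][i][j] = 1.0 if cell == goal else 0.0
            # Channel 2: other robots currently visible in the window.
            obs[2][i][j] = 1.0 if cell in robot_set else 0.0
            # Channel 3: cells on the A* guidance path.
            obs[3][i][j] = 1.0 if cell in guide_set else 0.0
    return obs
```

A tensor of this shape is what a convolutional encoder would consume in the feature extraction step.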

The robot map perception information is input into a convolutional neural network for feature extraction to obtain a high-dimensional vector, which serves as the robot map perception feature.

Step S2: after the relative position between the robots is coded, the geometric graph neural network carries out weighted information aggregation on the map perception features and the relative position codes of the robots to obtain the complete state representation of the robots;

The robot map perception features extracted by the convolutional neural network are used as the embedding vectors of the graph neural network nodes, and the geometric graph neural network is used for information exchange between the robots.

The robots communicate using a graph structure as the communication network. At time t, the communication network is represented by the graph $G = (V, \varepsilon_t)$, where $V$ represents the set of robots, which form the nodes of the graph, and $\varepsilon_t \subseteq V \times V$ represents the edges between the robots.

The graph is established according to the positions of the robots: when the distance between two robots is less than the communication radius $r_{com}$, an edge exists between the graph nodes representing the two robots. When $(v_i, v_j) \in \varepsilon_t$, a communication link exists between robot $v_i$ and robot $v_j$ at time t, and the two robots can communicate directly.
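The edge rule above amounts to a distance threshold on robot positions. A minimal sketch follows, assuming Euclidean distance and 2D positions; the function name `build_comm_graph` and the adjacency-list output are illustrative choices, not taken from the patent.

```python
import math


def build_comm_graph(positions, r_com):
    """Return {robot index: list of neighbor indices} using a distance threshold."""
    neighbors = {i: [] for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # Strictly "less than" the communication radius, as in the text.
            if math.dist(positions[i], positions[j]) < r_com:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors
```

Because the robots move, this graph would be rebuilt at every time step, which is why the edge set carries the time index t.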

The geometric graph neural network first uses a relative position coding module to encode the relative positions between the robots. The module comprises two fully-connected neural networks, a relative position weight coding network and a relative position bias coding network, which encode the relative position of robot i and a neighbor robot j into a relative position weight code and a relative position bias code.

The geometric graph neural network then fuses the relative position codes into the robot communication process, that is, it performs weighted information aggregation according to the following formula:

[aggregation formula omitted in the source text]

wherein the terms of the formula are the robot map perception feature, the relative position weight code, and the relative position bias code.
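Since the aggregation formula itself did not survive in this text, the sketch below shows only one plausible reading of the description: each neighbor feature is scaled element-wise by the weight code, shifted by the bias code, and the results are averaged over the neighbors. The exact form, the two-layer ReLU networks, the fallback for a robot with no neighbors, and all names (`make_mlp`, `mlp_forward`, `aggregate`) are assumptions for illustration.

```python
import random


def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Random two-layer fully-connected network (stand-in for trained parameters)."""
    def layer(n_in, n_out):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        return (weights, [0.0] * n_out)
    return [layer(in_dim, hidden_dim), layer(hidden_dim, out_dim)]


def mlp_forward(mlp, x):
    """Linear -> ReLU -> Linear."""
    (w1, b1), (w2, b2) = mlp
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(w * v for w, v in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def aggregate(features, positions, neighbors, i, weight_net, bias_net):
    """Assumed form: mean over neighbors j of (w_ij * o_j + b_ij), element-wise."""
    if not neighbors[i]:
        return list(features[i])  # assumption: fall back to the robot's own feature
    dim = len(features[i])
    total = [0.0] * dim
    for j in neighbors[i]:
        # Relative position of neighbor j as seen from robot i.
        rel = [positions[j][0] - positions[i][0], positions[j][1] - positions[i][1]]
        w_ij = mlp_forward(weight_net, rel)  # relative position weight code
        b_ij = mlp_forward(bias_net, rel)    # relative position bias code
        for k in range(dim):
            total[k] += w_ij[k] * features[j][k] + b_ij[k]
    return [t / len(neighbors[i]) for t in total]
```

The point of the construction is that two neighbors sending the same feature vector from different relative positions contribute differently to the aggregate, which a plain unweighted sum cannot express.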

Step S3: inputting the complete state representation of the robot into a long short-term memory (LSTM) network, and extracting time sequence features;

The extracted complete state representation is input into an LSTM module for time sequence feature extraction.
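For readers unfamiliar with the LSTM module, the sketch below implements one step of the standard LSTM cell equations in plain Python. It illustrates the general technique only; the patent does not give layer sizes or parameters, and the parameter layout (`params` as a dictionary of per-gate matrices) and the name `lstm_step` are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. params maps gate name -> (W, U, b), W: H x D, U: H x H, b: H."""
    def affine(gate):
        W, U, b = params[gate]
        return [
            sum(w * v for w, v in zip(W[k], x))
            + sum(u * v for u, v in zip(U[k], h_prev))
            + b[k]
            for k in range(len(b))
        ]

    i = [sigmoid(v) for v in affine("input")]    # input gate
    f = [sigmoid(v) for v in affine("forget")]   # forget gate
    o = [sigmoid(v) for v in affine("output")]   # output gate
    g = [math.tanh(v) for v in affine("cell")]   # candidate cell state
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

Here `x` would be the complete state representation at time t, and the carried pair `(h, c)` is what lets the decision at time t depend on earlier observations.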

Step S4: calculating a behavior decision based on the extracted time sequence features to generate the action to be executed by the robot at the current moment.

The extracted time sequence features are input into a robot perception decision model formed by a fully-connected network, which calculates the behavior decision at the current moment t and outputs the action that robot i should perform at the current moment, completing the action decision.

The robot can select 5 actions at each moment, i.e. up, down, left, right and stop.
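A minimal sketch of how the five-action output might be turned into a move: the decision network's outputs are treated as logits, converted to probabilities with a softmax, and an action is sampled. The (row, column) offsets in `ACTIONS`, the `greedy` option, and the function names are assumptions for illustration.

```python
import math
import random

# Assumed (row, col) offsets on the grid; "up" decreases the row index.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stop": (0, 0)}


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, rng, greedy=False):
    """Sample an action name from the policy, or take the most probable one."""
    probs = softmax(logits)
    names = list(ACTIONS)
    if greedy:
        return names[max(range(len(probs)), key=probs.__getitem__)]
    return rng.choices(names, weights=probs, k=1)[0]
```

Sampling is the usual choice during training, while the greedy option is a common way to evaluate a learned policy.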

Therefore, the path planning problem of multiple robots is converted into a sequence decision problem, the robots make decisions on the behaviors at the current moment at each time step by means of input of local area visual fields and communication of a geometric graph neural network, and the requirement on the computing power of a central controller is lowered.

The path planning network model is trained by deep reinforcement learning: the robot learns an action policy $\pi(a_t \mid s_t; \theta_\pi)$, i.e. a mapping from states to action probabilities, so as to maximize the accumulated reward.

To address the sparse-reward problem that arises when reinforcement learning is applied to path planning, the robot is encouraged, to a certain extent, to follow the long-term target guidance given by the A* algorithm, and a reward is set for this behavior. The robot thus obtains richer rewards during reinforcement learning training, which avoids problems such as training difficulty caused by sparse rewards. The robot path planning reward is defined as:

[reward definition omitted in the source text]

The training algorithm adopts an actor-critic structure. The actor aims to output an action policy $\pi(a_t \mid s_t; \theta_\pi)$ that maximizes the accumulated reward, and the critic network scores the action. The loss is calculated from the reward obtained by the action and the score given by the critic, and the critic network parameters are updated by gradient descent. After extensive training, the rewards obtained by the robots stabilize and the algorithm gradually converges; the robots can then cooperate to reach their respective destinations in an unknown environment and complete the task.
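The patent does not state the exact loss, so the following is a generic advantage actor-critic sketch rather than the authors' formulation. It assumes Monte Carlo discounted returns, an advantage equal to return minus critic value, and a discount factor `gamma`; in a real implementation the advantage would be treated as a constant when differentiating the actor loss. All names are illustrative.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def actor_critic_losses(log_probs, values, rewards, gamma=0.99):
    """Generic advantage actor-critic losses, averaged over the episode."""
    returns = discounted_returns(rewards, gamma)
    # Advantage: how much better the outcome was than the critic's score.
    advantages = [g - v for g, v in zip(returns, values)]
    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(rewards)
    # Critic: mean squared error between its score and the observed return.
    critic_loss = sum(a * a for a in advantages) / len(rewards)
    return actor_loss, critic_loss
```

Because all robots share one set of network parameters, episodes collected from every robot could be pooled into the same losses.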

Example two

The embodiment discloses a multi-robot unknown environment path planning system based on a geometric figure neural network;

as shown in fig. 3, the multi-robot unknown environment path planning system based on the geometric neural network includes a perceptual feature extraction module, a state representation extraction module, a time sequence feature extraction module, and an action generation module:

a perceptual feature extraction module configured to: extracting map perception features of the robot based on the acquired map perception information around the position of the robot at the current moment;

a state representation extraction module configured to: after the relative position between the robots is coded, the geometric graph neural network carries out weighted information aggregation on the robot map perception characteristics and the relative position codes to obtain the complete state representation of the robots;

a time sequence feature extraction module configured to: inputting the complete state representation of the robot into a long short-term memory (LSTM) network, and extracting time sequence features;

an action generation module configured to: and calculating a behavior decision based on the extracted time sequence characteristics to generate the action to be executed by the robot at the current moment.

Example three

An object of the present embodiments is to provide a computer-readable storage medium.

A computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps in the method for planning paths in a multi-robot unknown environment based on a geometric neural network according to an embodiment of the present disclosure.

Example four

An object of the present embodiment is to provide an electronic device.

The electronic device comprises a memory, a processor and a program which is stored in the memory and can run on the processor, wherein the processor executes the program to realize the steps of the multi-robot unknown environment path planning method based on the geometric neural network according to the embodiment of the disclosure.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. The multi-robot unknown environment path planning method based on the geometric neural network is characterized by comprising the following steps:

after the relative position between the robots is coded, the geometric graph neural network carries out weighted information aggregation on the robot map perception characteristics and the relative position codes to obtain the complete state representation of the robots;

and calculating a behavior decision based on the extracted time sequence characteristics to generate the action to be executed by the robot at the current moment.

2. The multi-robot unknown environment path planning method based on the geometric graph neural network as claimed in claim 1, characterized in that a double-layer decision mode of global target guidance and local dynamic obstacle avoidance is adopted: the complete path to the target point calculated by the A* algorithm serves as long-term target guidance for the robot during travel and provides a reference for the reinforcement learning model policy, while the reinforcement learning model adjusts the local path of the robot according to dynamic changes in the environment so as to complete obstacle avoidance.

3. The multi-robot unknown environment path planning method based on the geometric graph neural network as claimed in claim 1, wherein the map perception information includes the distribution of obstacles and the positions of other robots.

4. The method for planning the paths of the unknown environments of the multiple robots based on the neural network of the geometric figure as claimed in claim 1, wherein the relative position coding is realized by two fully connected neural networks, namely a relative position weight coding network and a relative position bias coding network.

5. The multi-robot unknown environment path planning method based on the geometric graph neural network as claimed in claim 4, wherein the relative position between the robot and a neighbor robot is input into the two fully-connected neural networks, which output a relative position weight code and a relative position bias code.

6. The multi-robot unknown environment path planning method based on the geometric graph neural network as claimed in claim 1, wherein the weighted information aggregation is specifically:

[aggregation formula omitted in the source text]

wherein the result is the complete state representation, $N_i$ represents the set of all neighbors of robot i, $n_i$ represents the number of neighbors of robot i, and the remaining terms are the robot map perception feature, the relative position weight code, and the relative position bias code.

7. The multi-robot unknown environment path planning method based on the geometric neural network as claimed in claim 1, wherein the computational behavior decision is based on a robot perception decision model composed of a fully connected network;

8. The multi-robot unknown environment path planning system based on the geometric neural network is characterized by comprising a perception feature extraction module, a state representation extraction module, a time sequence feature extraction module and an action generation module:

9. Computer-readable storage medium, on which a program is stored, which program, when being executed by a processor, carries out the steps of the method for multi-robot unknown environment path planning based on a neural network of geometrical figures as claimed in any one of claims 1 to 7.

10. Electronic equipment comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the method for multi-robot unknown environment path planning based on neural network as claimed in any of claims 1-7.