CN115907248B - Multi-robot unknown environment path planning method based on geometric neural network

Info

Publication number: CN115907248B
Application number: CN202211328724.XA
Authority: CN
Inventors: 程吉禹; 丁俊锋; 张伟; 张�浩
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2023-07-14
Anticipated expiration: 2042-10-26
Also published as: CN115907248A

Abstract

The invention provides a multi-robot unknown environment path planning method and system based on a geometric graph neural network, relating to the technical field of artificial intelligence. The geometric graph neural network effectively encodes the relative positions among robots, and neighbor robot information is weighted according to the position codes. A deep reinforcement learning method is adopted to realize a decentralized, distributed control mode, which improves the accuracy with which each robot aggregates neighbor robot information and improves the success rate of multi-robot path planning. The specific scheme comprises the following steps: extracting map perception features of the robot based on map perception information around the robot's position at the current moment; performing, by the geometric graph neural network, weighted information aggregation on the robot map perception features and the relative position codes to obtain the complete state representation of the robot; inputting the complete state representation into a long short-term memory network and extracting temporal features; and calculating a behavior decision to generate the action to be executed by the robot at the current moment.

Description

Multi-robot unknown environment path planning method based on geometric neural network

Technical Field

The invention belongs to the technical field of artificial intelligence, and particularly relates to a multi-robot unknown environment path planning method and system based on a geometric neural network.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Path planning refers to generating an optimal collision-free path from a starting position to a target position in the working environment of a mobile robot. Robot path planning is key to realizing robot automation and has long been a research hotspot in robotics. With the development of robot technology and rising industrial production requirements, multi-robot applications keep increasing, and research on multi-robot path planning algorithms is becoming more and more important. Compared with path planning for a single mobile robot, the actions of multiple robots affect one another, which makes each robot's working environment unstable, so single-robot path planning algorithms are no longer applicable. In addition, as the number of robots increases, the dimensions of the system state space and action space grow sharply, system optimization and solving become difficult, and high demands are placed on the computing capacity and response speed of the deployed equipment.

Existing multi-robot path planning algorithms based on deep reinforcement learning realize communication among robots by means of a graph neural network. Graph data mainly consists of nodes and edges. In multi-robot path planning, each robot forms a node of the graph structure, and whether an edge exists between two robot nodes depends mainly on whether the geometric distance between the robots is within their communication range: if the distance is smaller than the communication radius, an edge exists and the two robots are neighbors; otherwise no edge exists. Within the constructed graph structure, the robots use the graph neural network to exchange information among neighbors and aggregate the map information perceived by neighbor-node robots, which indirectly enlarges each robot's receptive field and increases cooperation among robots. However, graph data is non-Euclidean: it has no regular spatial structure and is irregular and unordered. Although graph data can accommodate the arbitrary structural changes of multiple mobile robots as they travel, the relative position information among robots is lost. Existing methods do not explicitly encode the relative positions among robots and mainly learn them implicitly, so a robot cannot accurately obtain the positions of its neighbor robots and can only take a weighted sum of the encoded observation information transmitted by all neighbors in its neighborhood, which hinders more accurate perception of the surrounding environment.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a multi-robot unknown environment path planning method and system based on a geometric neural network, and provides a path planning network model.

To achieve the above object, one or more embodiments of the present invention provide the following technical solutions:

A first aspect of the invention provides a multi-robot unknown environment path planning method based on a geometric neural network.

The multi-robot unknown environment path planning method based on a geometric neural network comprises the following steps:

extracting map perception features of the robot based on the acquired map perception information around the robot's position at the current moment;

after the relative positions among robots are encoded as relative position codes, performing, by the geometric graph neural network, weighted information aggregation on the robot map perception features and the relative position codes to obtain the complete state representation of the robot;

inputting the complete state representation of the robot into a long short-term memory network and extracting temporal features;

calculating a behavior decision based on the extracted temporal features, and generating the action to be executed by the robot at the current moment.

Further, a two-layer decision mode combining global target guidance and local dynamic obstacle avoidance is adopted: the complete path to the target point calculated with the A* algorithm serves as long-term target guidance for the robot while it travels and provides a reference for the reinforcement learning model policy, and the reinforcement learning model adjusts the robot's local path according to dynamic changes in the environment to complete obstacle avoidance.

Further, the map perception information comprises the obstacle distribution and the positions of other robots.

Further, the relative position coding is realized by two fully connected neural networks, namely a relative position weight coding network and a relative position bias coding network.

Further, the relative position between the robot and a neighbor robot is input into the two fully connected neural networks, which output a relative position weight code and a relative position bias code.

Further, in the weighted information aggregation, the output is the complete state representation of robot i, $N_i$ denotes the set of all neighbors of robot i, $n_i$ denotes the number of neighbors of robot i, and the aggregated terms are the robot map perception feature, the relative position weight code, and the relative position bias code.

Further, the behavior decision is calculated by a robot perception decision model formed by a fully connected network;

the robot perception decision model learns an action policy through deep reinforcement learning training, mapping states to action probabilities and maximizing the cumulative reward.

The second aspect of the invention provides a multi-robot unknown environment path planning system based on a geometric neural network.

The multi-robot unknown environment path planning system based on the geometric neural network comprises a perception feature extraction module, a state representation extraction module, a time sequence feature extraction module and an action generation module:

a perception feature extraction module configured to: extract map perception features of the robot based on the acquired map perception information around the robot's position at the current moment;

a state representation extraction module configured to: after the relative positions among robots are encoded as relative position codes, perform, by the geometric graph neural network, weighted information aggregation on the robot map perception features and the relative position codes to obtain the complete state representation of the robot;

a temporal feature extraction module configured to: input the complete state representation of the robot into a long short-term memory network and extract temporal features;

an action generation module configured to: calculate a behavior decision based on the extracted temporal features, and generate the action to be executed by the robot at the current moment.

A third aspect of the invention provides a computer readable storage medium having stored thereon a program which when executed by a processor performs the steps of a multi-robot unknown environmental path planning method based on a geometric neural network according to the first aspect of the invention.

A fourth aspect of the invention provides an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the geometric neural network based multi-robot unknown environmental path planning method according to the first aspect of the invention when the program is executed.

The one or more of the above technical solutions have the following beneficial effects:

(1) The invention adopts a deep reinforcement learning method to realize decentralized, distributed control of multiple robots: each robot can complete its path planning task independently, dependence on a central controller is removed, and deployment on actual robots becomes easier. Meanwhile, the strong policy representation capability of multi-agent reinforcement learning reduces the influence of the environmental instability caused by multi-robot movement on each mobile robot's policy and improves the robustness of the robot movement policy.

(2) The invention uses neural networks to realize the relative position coding, which improves the flexibility and representation capability of the coding and makes it easier to learn the physical meaning of relative positions. It avoids having to set a maximum representable distance as in absolute position coding, so scalability is better: a model trained in a small scene can more easily be transferred directly to a large-scale operating environment, which reduces the model training cost.

(3) The invention provides a path planning network model that uses a geometric graph neural network to combine relative position codes with graph neural network information aggregation. This overcomes the lack of a regular spatial structure in the graph neural network, enables a robot to distinguish neighbors at different positions when aggregating neighbor robot information, and weights the map observation information transmitted by each neighbor robot according to that neighbor's relative position code, which improves the robot's environment perception capability and the success rate of the algorithm.

Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

Fig. 1 is a grid environment diagram in a first embodiment.

Fig. 2 is a flow chart of a method of the first embodiment.

Fig. 3 is a system configuration diagram of a second embodiment.

Detailed Description

The invention will be further described with reference to the drawings and examples.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention; unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention; as used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Based on a path planning network model, the invention provides a method by which robots aggregate neighbor information according to their relative positions. It mainly comprises two parts: coding of the relative positions among robots and weighted information aggregation over the graph network.

In the robot's decision-making process, environment perception is carried out first: at time t, the robot acquires map information around its position, including the obstacle distribution and the positions of other robots.

The perceived map information is input into a convolutional neural network for feature extraction, and robot i finally obtains a one-dimensional map feature vector.

In the invention, after the relative positions among robots are encoded, the position codes are used to perform a relative-position-based weighted summation of neighbor information, thereby realizing information aggregation. The relative position coding module is realized by two fully connected neural networks, namely a relative position weight coding network and a relative position bias coding network. The relative position between robot i and its neighbor robot j is input into the relative position coding module, and the networks output a relative position weight code and a relative position bias code.

Finally, the geometric graph neural network aggregates information according to the robots' map feature vectors and the position codes, and robot i obtains its complete state representation after aggregating the features of its neighbor robots, where $N_i$ denotes the set of all neighbors of robot i and $n_i$ denotes the number of neighbors of robot i.

The state representation is then input, together with the timing information of historical moments, into a long short-term memory (LSTM) network. After the temporal features are extracted, they are input into an action generation module formed by a fully connected network, which outputs the action that robot i should perform at the current moment, completing the action decision.

The path planning network model used in the invention is trained with a reinforcement learning method, and all robots share one network, which improves learning efficiency and enhances the scalability of the system.

Example 1

This embodiment provides a multi-robot unknown environment path planning method based on a geometric neural network, illustrated in a grid environment. As shown in fig. 1, the circles in the example map represent the mobile robots, the triangles with corresponding numbers represent the destinations of the respective robots, and the boxes represent obstacles in the map. The task of each robot is to reach its destination in as short a time as possible without colliding with obstacles or other robots.

At the start of each decision step, the robot first assumes that unknown map areas are traversable and then, based on the known environment map, calculates a complete path to the target point with the A* algorithm. This path serves as long-term target guidance for the robot while it travels and provides a reference for the reinforcement learning model policy, while the reinforcement learning model adjusts the robot's local path according to dynamic changes in the environment to complete obstacle avoidance, forming a two-layer decision mode of global target guidance and local dynamic obstacle avoidance. As the robot advances, the unknown map environment is explored step by step; robot motion and map reconstruction proceed synchronously, and the long-term target guidance calculated by the A* algorithm becomes increasingly accurate.
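The global guidance step described above can be sketched as a grid A* search that treats unexplored cells optimistically. This is only an illustrative sketch: the cell labels `FREE`, `OBSTACLE`, and `UNKNOWN`, the 4-connected moves, and the Manhattan heuristic are assumptions, not details given in the source.

```python
import heapq

# Hypothetical cell labels; the source does not specify a map encoding.
FREE, OBSTACLE, UNKNOWN = 0, 1, -1


def astar(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None.

    Cells marked UNKNOWN are optimistically treated as traversable,
    mirroring the assumption made at the start of each decision step.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance, admissible for 4-connected unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from = {}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == OBSTACLE:
                continue  # only known obstacles block the search
            ng = g + 1
            if ng < g_cost.get((nr, nc), float("inf")):
                g_cost[(nr, nc)] = ng
                came_from[(nr, nc)] = cell
                heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

Re-running such a search at every decision step on the progressively explored map would yield guidance that improves as unknown cells are resolved into free cells or obstacles.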

As shown in fig. 2, the method for planning the path of the multi-robot unknown environment based on the geometric neural network, namely, the specific steps of the path planning network model, include:

Step S1: extracting map perception features of the robot based on the acquired map perception information around the robot's position at the current moment;

the map perception information includes the obstacle distribution and the positions of other robots. At time t, robot i perceives a local map of width $W_{FOV}$ and height $H_{FOV}$ with four dimensions (channels), where $W_{FOV}$ and $H_{FOV}$ are the width and height of the field of view observed by a robot with view radius $r_{FOV}$, and $\mathbb{R}^4$ denotes the four dimensions: the first represents obstacle information around the robot, the second represents the destination of robot i, the third represents the states of other robots in the surrounding environment, and the fourth represents the global target guidance calculated by the robot with the A* algorithm.

The map perception information of the robot is input into a convolutional neural network for feature extraction, yielding a high-dimensional vector that serves as the robot map perception feature.
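As a rough illustration of the four-channel local observation used in step S1, the sketch below crops a square window around the robot. The function name `build_observation`, the set-based map representation, the binary cell values, and the choice of a square window with side `2 * r_fov + 1` are all assumptions made for this example.

```python
def build_observation(obstacles, goal, others, guidance, pos, r_fov):
    """Return a 4 x W x H nested list centered on pos.

    obstacles, others, guidance: sets of (row, col) cells; goal: one cell.
    Channel order follows the text: obstacles, own destination,
    other robots, A* guidance. Assumes W = H = 2 * r_fov + 1.
    """
    size = 2 * r_fov + 1
    channels = [obstacles, {goal}, others, guidance]
    obs = []
    for cells in channels:
        layer = [[0] * size for _ in range(size)]
        for dr in range(-r_fov, r_fov + 1):
            for dc in range(-r_fov, r_fov + 1):
                if (pos[0] + dr, pos[1] + dc) in cells:
                    layer[dr + r_fov][dc + r_fov] = 1
        obs.append(layer)
    return obs
```

A tensor of this shape is the kind of input a convolutional feature extractor would consume; a destination outside the field of view simply leaves the second channel empty in this sketch.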

Step S2: after relative position codes are carried out on the relative positions among robots, the geometric neural network carries out weighted information aggregation on the robot map sensing characteristics and the relative position codes, and the robot complete state representation is obtained;

The robot map perception features extracted by the convolutional neural network are used as the embedding vectors of the graph neural network nodes, and the geometric graph neural network is used for information exchange among robots.

Robots communicate with each other using a graph structure as the communication network. At time t, the communication network is represented by the graph $G = (V, \varepsilon_t)$, where $V$ denotes the set of robots, i.e., the nodes of the graph neural network, and $\varepsilon_t$ denotes the edges between robots.

The graph is built according to the robot positions: when the distance between two robots is smaller than the communication radius $r_{com}$, an edge exists between the graph nodes representing the two robots. When $(v_i, v_j) \in \varepsilon_t$, robot $v_i$ and robot $v_j$ have a communication path at time t and can communicate directly.
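The construction of the communication graph from robot positions can be sketched as follows. The function name `build_comm_graph` and the use of Euclidean distance are assumptions; the source only states that an edge exists when the distance is smaller than the communication radius.

```python
import math


def build_comm_graph(positions, r_com):
    """Return {i: [j, ...]} with an undirected edge when distance(i, j) < r_com."""
    neighbors = {i: [] for i in range(len(positions))}
    for i, (xi, yi) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            xj, yj = positions[j]
            # Euclidean distance is an assumption for this sketch.
            if math.hypot(xi - xj, yi - yj) < r_com:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors
```

Because the robots move, such a graph would be rebuilt at every time step, which is why the edge set carries the time index t.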

The geometric graph neural network first uses a relative position coding module to encode the relative positions among robots. The relative position coding module comprises two fully connected neural networks, namely a relative position weight coding network and a relative position bias coding network, which encode the relative position between robot i and its neighbor robot j as a relative position weight code and a relative position bias code.

The geometric graph neural network integrates the relative position codes into the robot communication process, i.e., the weighted information aggregation, whose terms are the robot map perception feature, the relative position weight code, and the relative position bias code.
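The aggregation formula itself is not preserved in this text, so the sketch below shows only one plausible reading of it, stated as an assumption: each neighbor feature is scaled elementwise by the weight code, shifted by the bias code, and the results are averaged over the $n_i$ neighbors. The helper names (`make_mlp`, `mlp_forward`, `aggregate`), the layer sizes, the ReLU activation, and the handling of a robot with no neighbors are all hypothetical.

```python
import random


def make_mlp(in_dim, hidden, out_dim, rng):
    """Random weights for a small fully connected network (illustrative only)."""
    def layer(n_in, n_out):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        return weights, [0.0] * n_out
    return [layer(in_dim, hidden), layer(hidden, out_dim)]


def mlp_forward(mlp, x):
    """Forward pass with ReLU on every layer except the last."""
    for k, (W, b) in enumerate(mlp):
        x = [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]
        if k < len(mlp) - 1:
            x = [max(0.0, v) for v in x]
    return x


def aggregate(i, positions, features, neighbors, weight_net, bias_net):
    """Assumed form: mean over neighbors j of w_ij * h_j + b_ij (elementwise)."""
    dim = len(features[i])
    if not neighbors[i]:
        return list(features[i])  # assumption: keep own feature if isolated
    total = [0.0] * dim
    for j in neighbors[i]:
        rel = [positions[j][0] - positions[i][0],
               positions[j][1] - positions[i][1]]
        w = mlp_forward(weight_net, rel)  # relative position weight code
        b = mlp_forward(bias_net, rel)    # relative position bias code
        for k in range(dim):
            total[k] += w[k] * features[j][k] + b[k]
    return [t / len(neighbors[i]) for t in total]
```

Under this reading, two neighbors with identical map features but different relative positions contribute differently to the aggregated state, which is the property the text attributes to the geometric graph neural network.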

Step S3: inputting the complete state representation of the robot into a long short-term memory network and extracting temporal features;

the extracted complete state representation is input into the LSTM module for temporal feature extraction.
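For readers unfamiliar with the LSTM module, a single step of a standard LSTM cell can be written out as below. This is the textbook cell, not a configuration taken from the source; the parameter layout `params[name] = (W_x, W_h, b)` and the function names are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, params):
    """One step of a standard LSTM cell.

    x: input vector (here, the complete state representation);
    h, c: previous hidden and cell states;
    params[name] = (W_x, W_h, b) for the gates named "i", "f", "g", "o".
    """
    def gate(name, act):
        W_x, W_h, b = params[name]
        return [act(sum(wx * xv for wx, xv in zip(W_x[k], x))
                    + sum(wh * hv for wh, hv in zip(W_h[k], h))
                    + b[k])
                for k in range(len(h))]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    g = gate("g", math.tanh)  # candidate cell update
    o = gate("o", sigmoid)    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new
```

Calling `lstm_step` once per time step and carrying `h` and `c` forward is what lets the decision at time t depend on the history of earlier state representations.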

Step S4: calculating a behavior decision based on the extracted temporal features, and generating the action to be executed by the robot at the current moment.

The extracted temporal features are input into a robot perception decision model formed by a fully connected network, which calculates the behavior decision at the current time t and outputs the action that robot i should perform at the current moment, completing the action decision.

At each moment the robot can choose among 5 actions: up, down, left, right, and stop.

The multi-robot path planning problem is thus converted into a sequential decision problem: at each time step, each robot decides its current behavior from its local field-of-view input and the geometric graph neural network communication, which reduces the computing power required of a central controller.

The path planning network model is trained with deep reinforcement learning: the robot learns an action policy $\pi(a_t \mid s_t; \theta_\pi)$, i.e., a mapping from state to action probability, and maximizes the cumulative reward.
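The mapping from state to action probability over the five actions can be illustrated with a softmax over network outputs. The sketch assumes the decision model emits one unnormalized score (logit) per action; the names `ACTIONS`, `softmax`, and `select_action`, and the grid displacement attached to each action, are hypothetical.

```python
import math
import random

# Hypothetical grid displacement (row, col) for each of the five actions.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1),
           "right": (0, 1), "stop": (0, 0)}


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, rng, greedy=False):
    """Sample an action from the policy, or take the most likely one."""
    probs = softmax(logits)
    names = list(ACTIONS)
    if greedy:
        return names[max(range(len(probs)), key=probs.__getitem__)]
    return rng.choices(names, weights=probs, k=1)[0]
```

Sampling is typical during training so that the policy explores, while the greedy choice is a common option at execution time; the source does not say which is used.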

To address the sparse reward problem in reinforcement-learning path planning, this embodiment encourages the robot, to a certain extent, to follow the long-term target guidance given by the A* algorithm and sets a reward for this behavior. In this way the robot obtains richer rewards during reinforcement learning training, avoiding the training difficulties caused by sparse rewards. The robot path planning reward is defined as follows:
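The reward definition itself is not preserved in this text. The sketch below only illustrates the general idea of adding a bonus for following the guidance path; every numeric value, the function name `step_reward`, and the ordering of the checks are placeholders chosen for the example and are not taken from the source.

```python
def step_reward(next_cell, goal, collided, guidance_path,
                r_goal=10.0, r_collision=-2.0, r_step=-0.1, r_follow=0.1):
    """Shaped per-step reward; all default values are illustrative placeholders."""
    if collided:
        return r_collision          # penalty for hitting an obstacle or robot
    if next_cell == goal:
        return r_goal               # reward for reaching the destination
    reward = r_step                 # small time penalty for every move
    if next_cell in guidance_path:
        reward += r_follow          # bonus for staying on the A* guidance
    return reward
```

With shaping of this kind the robot receives a learning signal at every step rather than only on reaching the destination, which is the effect the text describes.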

The training algorithm adopts an actor-critic structure: the actor aims to output an action policy $\pi(a_t \mid s_t; \theta_\pi)$ that maximizes the cumulative reward, and the critic network scores the action; a loss is calculated from the reward obtained by the action and the score given by the critic, and gradient descent is used to update the parameters of the critic network. After extensive training, the rewards obtained by the robots stabilize and the algorithm gradually converges; the multiple robots can then cooperate in an unknown environment to reach their respective destinations and complete the task.
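A minimal sketch of how actor and critic losses are commonly computed for one episode is given below, assuming a plain advantage actor-critic with discounted returns. The source does not name the specific variant, so the discount factor `gamma`, the use of Monte Carlo returns, and the function name are assumptions.

```python
def actor_critic_losses(log_probs, rewards, values, gamma=0.99):
    """Return (actor_loss, critic_loss) for one episode.

    log_probs: log pi(a_t | s_t) of the actions taken;
    rewards: reward received after each action;
    values: critic score for each visited state.
    """
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g           # discounted return from step t onward
        returns.append(g)
    returns.reverse()
    advantages = [ret - v for ret, v in zip(returns, values)]
    n = len(rewards)
    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic: regress the score toward the observed return.
    critic_loss = sum(adv ** 2 for adv in advantages) / n
    return actor_loss, critic_loss
```

In a full implementation these two scalars would be differentiated with respect to the network parameters by an automatic differentiation framework; here they are computed on plain numbers only to show the arithmetic.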

Example two

The embodiment discloses a multi-robot unknown environment path planning system based on a geometric neural network;

As shown in fig. 3, the multi-robot unknown environment path planning system based on the geometric neural network comprises a perception feature extraction module, a state representation extraction module, a temporal feature extraction module, and an action generation module.

Example III

An object of the present embodiment is to provide a computer-readable storage medium.

A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps in a multi-robot unknown environmental path planning method based on a geometric neural network according to an embodiment of the present disclosure.

Example IV

An object of the present embodiment is to provide an electronic apparatus.

An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, performs the steps in the geometric neural network-based multi-robot unknown environmental path planning method according to the first embodiment of the present disclosure.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A multi-robot unknown environment path planning method based on a geometric neural network, characterized by comprising the following steps:

(1) extracting map perception features of the robot based on the acquired map perception information around the robot's position at the current moment;

the map perception information comprises the obstacle distribution and the positions of other robots;

(2) after the relative positions among robots are encoded as relative position codes, performing, by the geometric graph neural network, weighted information aggregation on the robot map perception features and the relative position codes to obtain the complete state representation of the robot;

the relative position coding is realized by two fully connected neural networks, namely a relative position weight coding network and a relative position bias coding network; the relative position between the robot and a neighbor robot is input into the two fully connected neural networks, which output a relative position weight code and a relative position bias code;

in the weighted information aggregation, the output is the complete state representation, $N_i$ denotes the set of all neighbors of robot i, $n_i$ denotes the number of neighbors of robot i, and the aggregated terms are the robot map perception feature, the relative position weight code, and the relative position bias code;

(3) inputting the complete state representation of the robot into a long short-term memory network and extracting temporal features;

(4) calculating a behavior decision based on the extracted temporal features, and generating the action to be executed by the robot at the current moment.

2. The multi-robot unknown environment path planning method based on a geometric neural network according to claim 1, wherein a two-layer decision mode combining global target guidance and local dynamic obstacle avoidance is adopted: the complete path to the target point calculated with the A* algorithm serves as long-term target guidance for the robot while it travels and provides a reference for the reinforcement learning model policy, and the reinforcement learning model adjusts the robot's local path according to dynamic changes in the environment to complete obstacle avoidance.

3. The multi-robot unknown environment path planning method based on a geometric neural network according to claim 1, wherein the behavior decision is calculated by a robot perception decision model formed by a fully connected network;

the robot perception decision model learns an action policy, i.e., a mapping from states to action probabilities, through deep reinforcement learning training and maximizes the cumulative reward.

4. The multi-robot unknown environment path planning system based on the geometric neural network is characterized by comprising a perception feature extraction module, a state representation extraction module, a time sequence feature extraction module and an action generation module:

the relative position between the robot and a neighbor robot is input into the two fully connected neural networks, which output a relative position weight code and a relative position bias code;

in the weighted information aggregation, the output is the complete state representation, $N_i$ denotes the set of all neighbors of robot i, $n_i$ denotes the number of neighbors of robot i, and the aggregated terms are the robot map perception feature, the relative position weight code, and the relative position bias code.

5. A computer readable storage medium having stored thereon a program, which when executed by a processor, implements the steps of the geometrical neural network based multi-robot unknown environment path planning method according to any of claims 1-3.

6. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor performs the steps in the method for planning paths of multiple robots based on a geometric neural network as claimed in any one of claims 1 to 3.