GB2610276A - Method for multi-agent dynamic path planning - Google Patents

Method for multi-agent dynamic path planning

Info

Publication number
GB2610276A
Authority
GB
United Kingdom
Prior art keywords
agents
agent
positions
running
planning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2209423.9A
Other versions
GB202209423D0 (en)
Inventor
Cai Xuelian
Zheng Jing
Fu Xiao
Tan Zihang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Publication of GB202209423D0 publication Critical patent/GB202209423D0/en
Publication of GB2610276A publication Critical patent/GB2610276A/en
Pending legal-status Critical Current

Classifications

    • G05D1/0088: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot, characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G05D1/0214: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0219: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface
    • G05D1/0221: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0289: Control of position or course in two dimensions specially adapted to land vehicles involving a plurality of land vehicles, e.g. fleet or convoy travelling, with means for avoiding collisions between vehicles
    • G06Q10/047: Optimisation of routes or paths, e.g. travelling salesman problem
    • G01C21/20: Instruments for performing navigational calculations

Abstract

A method for multi-agent dynamic path planning, includes: (1) Constructing a grid map containing a safety zone. (2) Roughly planning, by using an A* algorithm, a running path for each agent in a Cartesian rectangular coordinate system of the grid map. (3) Re-planning, by using a Q-learning-based two-agent collaborative obstacle avoidance algorithm, running paths for two agents after a conflict. (4) Determining whether updated initial positions are target positions set by the agents, and if yes, executing the step (5), otherwise, executing the step (3). (5) Sequentially connecting path points from start points to end points of the agents to obtain optimal conflict-free running paths for the two agents.

Description

METHOD FOR MULTI-AGENT DYNAMIC PATH PLANNING
TECHNICAL FIELD
The present disclosure belongs to the technical field of path planning, and further relates to a method for multi-agent dynamic path planning in the technical field of dynamic path planning. The present disclosure may efficiently avoid dynamic obstacles, so as to plan optimal paths along which agents reach their destinations during movement.
BACKGROUND
A multi-agent system consists of a series of interacting agents, each of which follows its own organizational rules to complete specific tasks. Agents are entities with the basic characteristics of self-organization and sociality. An agent may be a software program or a physical entity (such as a person, a vehicle, or a robot) that is embedded in an environment, perceives the environment through sensors, acts autonomously on the environment through effectors, and meets design requirements. Multi-agent path planning is widely used in intelligent inspection, aircraft trajectory planning, autonomous driving, and other fields. Multi-agent path planning analyzes and processes prior information, such as maps and data sensed by sensors, in a space with obstacles to perceive the state of the environment around the agents. After the surrounding environment is perceived, an optimal conflict-free path from a start point to a target point that conforms to a certain evaluation index is found through a designed policy, so that each agent can bypass obstacles while running along the path and reach its destination smoothly. Multi-agent path planning pursues three main goals: a path from the start point to the target point is planned; the agent can successfully avoid obstacles it encounters while running along the path; and, on the basis of satisfying the first two goals, the planned path is globally optimal to the greatest extent. At present, methods for solving such problems usually include heuristic random search algorithms and intelligent optimization algorithms.
The heuristic random search algorithm is a method for path planning based on heuristic information. The search is guided in the most promising direction by heuristic information carried by the problem, so as to find an optimal path. For example, the University of Electronic Science and Technology of China disclosed a heuristic multidirectional rapidly exploring random tree-based method for robot path planning optimization in its patent application "Method for Unmanned Aerial Vehicle Path Planning Based on Rapidly exploring Random Tree (RRT) Algorithm" (application number: CN202110409420.5, and application publication number: CN112987799A). In a high-dimensional configuration space, this method draws on the idea of the RRT algorithm. When the random tree does not encounter obstacles during exploration, it gradually strengthens its tendency to approach the target point quickly; once it encounters obstacles, it immediately weakens this tendency to maximize the randomness of growth, thereby making the random tree avoid obstacles more efficiently and eventually generating a locally optimal path that successfully avoids the obstacles. This method may select a parent node with a smaller cost instead of simply selecting the nearest node, thereby maximizing planning efficiency and speed. However, this method still has the following shortcoming: the path planned by the improved RRT algorithm does not take dynamic obstacles, namely other agents, into account, so a mobile agent can only wait or go back the way it came to avoid them, which increases the length of the invalid running path and seriously affects the running speed of the mobile agent.
Methods for multi-agent dynamic path planning based on the intelligent optimization algorithm are mostly built on reinforcement learning methods, such as the Q-learning algorithm and the State Action Reward State Action (SARSA) algorithm: through continuous interaction between agents and the environment, obstacle construction and environment modeling are unified to quickly detect collisions between paths and all obstacles, so as to plan an optimal path for an agent. For example, the Dalian University of Technology disclosed an ant colony algorithm-based method for multi-agent path planning by reinforcement learning in its patent application "Ant Colony Algorithm-based Method for Multi-Agent Path Planning by Reinforcement Learning" (application number: CN202011257321.1, and application publication number: CN112286203A). This method includes two stages. The first stage obtains the current environment information of an agent cluster: pheromones in an ant colony algorithm are used as heuristic information to detect obstacles in the environment, and their polar coordinate information is stored; then, according to sensor information, target position information, pheromone information, and a sequence of serial numbers, the collected state information is abstracted into a tuple, which serves as the current state describing a pheromone map in the initialized environment. The second stage trains a deep reinforcement learning model for multi-agent path planning: a shared experience pool D is initialized and its size is set, the ant colony pheromone map is initialized, the diffusion rate and attenuation rate of the pheromones and the number of agents in the agent cluster are set, and an improved deep Q-network (DQN) and a deep reinforcement learning method based on Q-learning and an ant colony "pheromone" coordination mechanism are adopted; the neural network is then trained and updated based on historical information of the agent cluster; finally, an optimal path planning policy for each agent in the agent cluster is obtained. Although this method may also be used for multi-agent path planning, it still has the following shortcoming: it only takes non-opposite conflicts into account and ignores opposite conflicts caused by mobile agents, so that during actual running an agent needs to continuously avoid obstacles, an optimal avoidance path for the agent cannot be re-planned, and the running efficiency of the agent is seriously affected.
SUMMARY
In view of the above deficiencies of the prior art, an objective of the present disclosure is to provide a method for multi-agent dynamic path planning, so as to solve the problems that during the actual running process, an agent needs to continuously avoid obstacles and the running efficiency of the agent is seriously affected due to opposite conflicts caused by other mobile agents during the dynamic path planning of agents.
To achieve the above objective, the idea of the present disclosure is as follows: when two agents have an opposite conflict during running, the running paths for the conflicting agents are planned by using a Q-learning-based two-agent collaborative obstacle avoidance algorithm; the two agents are subjected to global dynamic path planning with a combination of passable grid positions as the initial position, and reward values are given to the agents; and the agents are guided to select, as their next moving directions, the moving directions that make the running paths shortest within the searchable range, thereby solving the problem that optimal avoidance paths for the two agents cannot be re-planned. According to the present disclosure, by introducing a safety zone into the grid map, only one of the two agents running in opposite directions is enabled to enter the safety zone while the other stays at the entrance of the safety zone, thereby getting rid of the constraint of fixed obstacles in existing methods for dynamic path planning, and solving the problem that an agent runs over repeated road segments during dynamic path planning.

The objective of the present disclosure is achieved by the following specific steps:

Step 1: constructing the grid map containing the safety zone:

(1a) constructing, by using a grid process, the grid map with a width of X and a length of Y;

(1b) in the grid map, setting, as the safety zone in the one-way lane, a zone where agents running in opposite directions can avoid other agents at the same time; and

(1c) establishing a Cartesian rectangular coordinate system of the grid map by taking the upper left corner of the grid map as the origin, rightward extension as the positive direction of the horizontal axis, and downward extension as the positive direction of the vertical axis, and marking the initial positions and target positions of the agents in the Cartesian rectangular coordinate system.

Step 2: roughly planning, by using an A* algorithm, a running path for each agent in the Cartesian rectangular coordinate system of the grid map.

Step 3: planning, by using the Q-learning-based two-agent collaborative obstacle avoidance algorithm, the running paths for the agents with the conflict:

(3a) enabling each agent to run according to its roughly planned path, and if the two agents in the one-way lane have an opposite conflict, planning conflict-free running paths for the two agents simultaneously;

(3b) creating a Q-value table for storing information of the two agents, and setting the initial Q-value to -2 × 10⁻⁹, wherein the horizontal coordinates of the Q-value table represent combinations of passable grid positions of the two agents, and the vertical coordinates of the Q-value table represent combinations of moving directions of the two agents;

(3c) when the two agents detect the same safety zone simultaneously, enabling only one of the two agents running in the opposite directions to enter the safety zone while the other stays at the entrance of this safety zone, and taking the current positions as the initial positions of the two agents respectively;

(3d) according to an ε-greedy policy, enabling each of the two agents to randomly select, with an exploration rate ε, a moving direction from up, down, left, right, and stop, wherein the exploration rate ε is set to 0.5;

(3e) enabling the two agents to update their positions according to the selected directions, and taking the updated positions as the initial positions;

(3f) rewarding each of the two agents whose initial position is the target position with 1 point, punishing any of the two agents that encounters an obstacle or whose initial position belongs to the safety zone with 1 point, and rewarding any of the agents that reaches a passable position other than the target position with 1 point;

(3g) updating, by using an update formula, all Q-values in the Q-value table and selecting out the updated maximum Q-values; and

(3h) determining whether the updated initial positions are the target positions set in step (1c), and if yes, executing step 4, otherwise, executing step (3d).

Step 4: sequentially connecting the path points from the initial positions to the target positions of the agents to obtain the conflict-free running paths for the agents.
Compared with the prior art, the present disclosure has the following advantages: Firstly, according to the present disclosure, the running paths for the agents with the conflict are planned by using the Q-learning-based two-agent collaborative obstacle avoidance algorithm, global dynamic path planning is performed with the combination of passable grid positions as the initial position, and reward values are given to the agents; the agents are guided to select, as their next moving directions, the moving directions that make the running paths shortest within the searchable range. This solves the problems that the optimal avoidance paths for the two agents cannot be re-planned and that the running efficiency of the mobile agents is seriously affected, so that the paths planned by the present disclosure improve the running efficiency of the agents, shorten their running time, and effectively avoid further conflicts.
Secondly, according to the present disclosure, by introducing the safety zone into the grid map, only one of the two agents running in opposite directions is enabled to enter the safety zone while the other stays at the entrance of the safety zone, and the opposite conflict avoidance algorithm is triggered directly. This solves the problems in the prior art that, when there are dynamic obstacles during dynamic path planning, an agent can only wait or go back the way it came to avoid the obstacles, and the running speed of the mobile agent is seriously affected. Therefore, the path planned by the present disclosure increases the running speed of the agent, improves the efficiency of the opposite conflict avoidance algorithm, and reduces the invalid running distance of the agent.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a flowchart of the present disclosure;
FIG. 2 is a schematic diagram of a grid map containing a safety zone in the present disclosure;
FIG. 3 is a schematic diagram of agents with a conflict in the present disclosure; and
FIG. 4 is a simulation diagram of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
The present disclosure is further described in detail below with reference to the accompanying drawings and embodiments.
Referring to FIG. 1, the implementation steps of the present disclosure are further described in detail.
Step 1: a grid map containing a safety zone is constructed.
By using a grid process, the grid map with a width of X and a length of Y is constructed.
In the grid map, a zone where agents running in opposite directions avoid other agents at the same time is set as the safety zone in a one-way lane.
By taking an upper left corner of the grid map as an origin, rightward extension as a positive direction of a horizontal axis, and downward extension as a positive direction of a vertical axis, a Cartesian rectangular coordinate system of the grid map is established, and initial positions and target positions of the agents are marked in the Cartesian rectangular coordinate system.
Referring to FIG. 2, the grid map containing the safety zone constructed according to the present disclosure is further described in detail.
FIG. 2 is the grid map containing the safety zone constructed according to the present disclosure. The upper left corner of the grid map is the origin O, the X axis represents the width of the grid map in meters, and the Y axis represents the length of the grid map in meters. White squares in FIG. 2 represent passable grids, and dark squares represent impassable grids. The number 1 in FIG. 2 represents agent 1, the number 2 represents agent 2, and the zone marked with oblique lines is the safety zone.
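As an illustration of step 1, the short Python sketch below builds a small grid map, marks impassable cells and a safety zone, and uses the same coordinate convention (origin at the upper left corner, x growing rightward, y growing downward). The map size, obstacle layout, safety-zone cell, and cell coding (0 free, 1 obstacle, 2 safety zone) are assumptions made up for the example and are not the map of FIG. 2.

    # Minimal grid-map sketch for step 1 (dimensions and cell contents are
    # illustrative assumptions, not the map used in the disclosure).
    FREE, OBSTACLE, SAFETY = 0, 1, 2

    def build_grid_map(width, length, obstacles, safety_zone):
        """Return a length x width grid; cell (x, y) is indexed as grid[y][x],
        with the origin at the upper-left corner, x growing rightward and
        y growing downward, as in the coordinate system of the disclosure."""
        grid = [[FREE for _ in range(width)] for _ in range(length)]
        for (x, y) in obstacles:
            grid[y][x] = OBSTACLE
        for (x, y) in safety_zone:
            grid[y][x] = SAFETY        # widened waiting spot beside the one-way lane
        return grid

    if __name__ == "__main__":
        # A toy 7 x 5 map with a one-way corridor on row y = 2 and one safety cell
        # carved out of the wall above the corridor.
        obstacles = [(x, y) for x in range(7) for y in (1, 3) if x not in (0, 6)]
        safety_zone = [(3, 1)]
        grid = build_grid_map(7, 5, obstacles, safety_zone)
        agent1_start, agent1_goal = (0, 2), (6, 2)   # runs left to right
        agent2_start, agent2_goal = (6, 2), (0, 2)   # runs right to left
        for row in grid:
            print(row)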
Step 2: by using an A* algorithm, a running path for each agent is roughly planned in the Cartesian rectangular coordinate system of the grid map.
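Step 2 uses an A* search for the rough planning. The disclosure does not spell out the search details, so the sketch below assumes 4-connected movement with unit step cost and a Manhattan-distance heuristic, and treats only cells coded 0 as passable; it is a generic A* illustration, not the exact procedure of the disclosure.

    # Minimal A* sketch for step 2 (4-connected grid and Manhattan heuristic are
    # assumptions; the disclosure only states that an A* algorithm is used).
    import heapq

    def astar(grid, start, goal):
        """grid[y][x] != 0 means impassable; start and goal are (x, y) tuples."""
        width, length = len(grid[0]), len(grid)

        def h(p):                       # admissible Manhattan-distance heuristic
            return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

        open_heap = [(h(start), 0, start, None)]
        came_from, g_cost = {}, {start: 0}
        while open_heap:
            _, g, cur, parent = heapq.heappop(open_heap)
            if cur in came_from:        # already expanded with an equal or better cost
                continue
            came_from[cur] = parent
            if cur == goal:             # reconstruct the roughly planned path
                path = [cur]
                while came_from[path[-1]] is not None:
                    path.append(came_from[path[-1]])
                return path[::-1]
            x, y = cur
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < width and 0 <= ny < length and grid[ny][nx] == 0:
                    if g + 1 < g_cost.get((nx, ny), float("inf")):
                        g_cost[(nx, ny)] = g + 1
                        heapq.heappush(open_heap, (g + 1 + h((nx, ny)), g + 1, (nx, ny), cur))
        return None                     # no path found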
Step 3: by using a Q-learning-based two-agent collaborative obstacle avoidance algorithm, running paths for the agents with a conflict are planned.
Firstly, each agent runs according to its roughly planned path, and if the two agents in the one-way lane have the opposite conflict, conflict-free running paths are planned for the two agents simultaneously.
Secondly, a Q-value table for storing information of the two agents is created, and the initial Q-value is set to -2 × 10⁻⁹, wherein the horizontal coordinates of the Q-value table represent a combination of passable grid positions of the two agents, and the vertical coordinates of the Q-value table represent a combination of moving directions of the two agents.
Thirdly, when the two agents detect the same safety zone simultaneously, current positions are taken as the initial positions of the two agents respectively.
Fourthly, according to an ε-greedy policy, each of the two agents randomly selects, with an exploration rate ε, a moving direction from up, down, left, right, and stop, wherein the exploration rate ε is set to 0.5.
Fifthly, the two agents update their positions according to the selected directions, and the updated positions are taken as the initial positions.

Sixthly, each of the two agents whose initial position is the target position is rewarded with 1 point, an agent that encounters an obstacle or whose initial position belongs to the safety zone is punished with 1 point, and an agent that reaches a passable position other than the target position is rewarded with 1 point.
Seventhly, all Q-values in the Q-value table are updated by using an update formula, and the updated maximum Q-values are selected out.

Eighthly, whether the updated initial positions are the target positions set in step 1 is determined; if yes, step 4 is executed, otherwise the procedure returns to the fourth sub-step of this step. A minimal illustrative sketch of these sub-steps is given below.
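As referenced above, the following is a minimal Python sketch of the sub-steps of step 3: a joint Q-value table initialised to -2 × 10⁻⁹, ε-greedy selection of joint moving directions with ε = 0.5, the reward assignment as written in the text, and the tabular update newQ ← (1 - α)Q + α(R + γ·maxQ) with α = 0.5. The joint-state packing of both agents' coordinates, the grid coding (1 for obstacles, 2 for the safety zone), the discount factor γ = 1, the episode count, and the per-episode step cap are illustrative assumptions, not details fixed by the disclosure.

    import random
    from collections import defaultdict

    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0), "stop": (0, 0)}

    def joint_state(p1, p2, X, Y):
        # Pack both agents' coordinates into one index (combination of passable
        # grid positions of the two agents).
        (x1, y1), (x2, y2) = p1, p2
        return X * Y * Y * x1 + X * Y * y1 + Y * x2 + y2

    def reward(pos, goal, grid):
        # Reward values follow the text as written: 1 point at the target, a
        # 1-point punishment for an obstacle or safety-zone cell, 1 point otherwise.
        x, y = pos
        if pos == goal:
            return 1
        if grid[y][x] in (1, 2):        # 1 = obstacle, 2 = safety zone (assumed coding)
            return -1
        return 1

    def collaborative_avoidance(grid, start1, goal1, start2, goal2,
                                epsilon=0.5, alpha=0.5, gamma=1.0, episodes=500):
        X, Y = len(grid[0]), len(grid)
        q = defaultdict(lambda: -2e-9)           # initial Q-value of -2 x 10^-9
        actions = [(a1, a2) for a1 in MOVES for a2 in MOVES]   # joint moving directions

        def step(pos, move):
            dx, dy = MOVES[move]
            nx, ny = pos[0] + dx, pos[1] + dy
            # Stay in place if the move leaves the map or enters an obstacle.
            return (nx, ny) if 0 <= nx < X and 0 <= ny < Y and grid[ny][nx] != 1 else pos

        for _ in range(episodes):
            p1, p2 = start1, start2
            for _ in range(200):                 # per-episode step cap (assumption)
                if (p1, p2) == (goal1, goal2):
                    break
                s = joint_state(p1, p2, X, Y)
                if random.random() < epsilon:    # explore with rate epsilon = 0.5
                    a = random.choice(actions)
                else:                            # otherwise exploit the best known action
                    a = max(actions, key=lambda act: q[(s, act)])
                n1, n2 = step(p1, a[0]), step(p2, a[1])
                r = reward(n1, goal1, grid) + reward(n2, goal2, grid)
                s2 = joint_state(n1, n2, X, Y)
                best_next = max(q[(s2, act)] for act in actions)
                # newQ <- (1 - alpha) * Q + alpha * (R + gamma * maxQ)
                q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
                p1, p2 = n1, n2
        return q

After training, greedily following the maximum Q-values from the conflict state yields the re-planned, conflict-free moving directions for both agents.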
Referring to FIG. 3, opposite conflict types of conflict-free paths planned by the present disclosure are further described in detail.
FIG. 3 is a schematic diagram of two types of opposite conflicts. FIG. 3 (a) is a schematic diagram of the first type of opposite conflict between the two agents, wherein a black dot represents that agent 1 and agent 2 would collide in the middle of a grid cell; at this time, the coordinates of the initial positions of the two agents after the opposite conflict in the grid map are the same. FIG. 3 (b) is a schematic diagram of the second type of opposite conflict between the two agents, wherein a black dot represents that agent 1 and agent 2 would collide at a junction of the grids; at this time, the coordinates of the initial positions of the two agents after the opposite conflict in the grid map are different. An illustrative check distinguishing these two conflict types is sketched after step 4 below.

Step 4: path points from the initial positions to the target positions of the agents are sequentially connected to obtain the conflict-free running paths for the agents.
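FIG. 3 distinguishes two opposite-conflict situations: the agents would end up in the same grid cell (same coordinates), or they would cross at a cell boundary and swap cells (different coordinates). A small sketch of how such a check could be written, assuming each agent exposes its current and next planned cell, is shown below; the function name and data layout are illustrative, not from the disclosure.

    # Illustrative check for the two opposite-conflict types of FIG. 3
    # (names and data layout are assumptions).
    def opposite_conflict_type(cur1, nxt1, cur2, nxt2):
        """cur*/nxt* are the (x, y) grid cells of the current and next planned step."""
        if nxt1 == nxt2:
            return "same-cell conflict"      # FIG. 3(a): same coordinates after the conflict
        if nxt1 == cur2 and nxt2 == cur1:
            return "cell-swap conflict"      # FIG. 3(b): collision at the grid junction
        return None                          # no opposite conflict between these two agents

    if __name__ == "__main__":
        print(opposite_conflict_type((2, 3), (3, 3), (4, 3), (3, 3)))  # same-cell conflict
        print(opposite_conflict_type((2, 3), (3, 3), (3, 3), (2, 3)))  # cell-swap conflict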
The effects of the present disclosure are further described below with reference to a simulation experiment.
1. Conditions for the simulation experiment: The hardware platform for the simulation experiment in the present disclosure is as follows: the central processing unit is an Intel (R) Core (TM) i7-9700 CPU with a main frequency of 3.00 GHz, and the memory is 16.0 GB. The software platform for the simulation experiment in the present disclosure is a Windows 10 operating system and Java 11.
The simulation experiment in the present disclosure mainly verifies the completeness of the opposite conflict avoidance and shortest path planning algorithms.
In order to display the moving trajectories of the agents more clearly, the running paths for the two agents are planned in a small-scale grid map of 15 m × 29 m. The side length of each grid is 1 m, and the distribution of passable grids and impassable grids in the grid map is set manually.
2. Content and result analysis of the simulation experiment: The simulation experiment in the present disclosure respectively simulates path planning on the preset grid map by using the method provided by the present disclosure and an A*-algorithm-based method for path planning in the prior art.
Specific parameters for the simulation experiment are as shown in Table 1:

Table 1: Parameter table for simulation experiment

Parameter name | Parameter value
Number of agents | 100
Task arrival rate λ | 1 per second
Running speed v of agent | 1 m/s
Turning time consumption t_turn of agent |
Learning rate α | 0.5

The prior art used in the simulation experiments is the A*-algorithm-based method for path planning proposed by Gelperin et al. in "On the optimality of A*", Artificial Intelligence, 1977, 8(1): 69-76, abbreviated as the A*-algorithm-based method for path planning.
The effects of the present disclosure are further described below with reference to a simulation diagram of FIG. 4.
FIG. 4 is the simulation diagram of the present disclosure, wherein a black dot represents agent 1, a black solid line represents the running path for agent 1, a gray dot represents agent 2, a black dotted line represents the running path for agent 2, and an arrow represents the corresponding running direction. A diamond grid in FIG. 4 is the target position of an unloading task.
FIG. 4 (a) is a simulation result diagram of the original running paths for the agents before path planning; FIG. 4 (b) is a simulation result diagram showing that, if the two agents run according to the original paths, they will have an opposite conflict at the position marked with a black triangle; FIG. 4 (c) is a simulation result diagram showing that the two agents will perform dynamic obstacle avoidance at the black dot and the gray dot respectively, that is, the step before the collision; and FIG. 4 (d) is a simulation result diagram after path planning by the method provided by the present disclosure, wherein the black solid line is the path re-planned for agent 1, and the black dotted line is the path re-planned for agent 2.
From the results of the simulation experiment, it may be seen that the Q-learning-based two-agent collaborative obstacle avoidance algorithm can successfully avoid the opposite conflict and achieve the shortest path to the target position. When the two agents have the opposite conflict in FIG. 4 (c), the dynamic path planning algorithm is used for avoidance. By comparing FIG. 4 (c) and FIG. 4 (d), it may be seen that, after the Q-learning-based two-agent collaborative obstacle avoidance algorithm is used, the running distance of the agent is significantly reduced compared with that of the original running path before planning. The running times are recorded and output directly in the program.
Table 2: Quantitative analysis table for path planning results of the present disclosure and the prior art in the simulation experiment

Number of tasks | Running time (s): A* algorithm | Running time (s): Q-learning method | Running time (s): two-agent Q-learning method | Improvement percentage (%)
200 | 2487 | 2402 | 2237 | 10.05
400 | 4877 | 4754 | 4237 | 13.12
600 | 7232 | 7001 | 6138 | 15.13
800 | 9618 | 9455 | 8679 | 9.76
1000 | 12047 | 11358 | 10358 | 14.02
1200 | 14147 | 14011 | 13014 | 8.01
1400 | 16548 | 16245 | 15124 | 8.61
1600 | 19547 | 19006 | 17545 | 10.24
1800 | 21477 | 21014 | 19370 | 9.81
2000 | 24168 | 23114 | 21156 | 12.46

In order to compare the running efficiency of the three methods in the simulation experiment of the present disclosure, a timer in multiple threads of the Java software is used to directly record and output, in the program, the running time for the agents to complete the tasks, and the statistical results are as shown in Table 2. The number of agents is fixed at 100, and the number of tasks varies from 200 to 2000 with an interval of 200, that is, 10 different task counts are used for the running-time statistics; for each task count, 5 experiments were carried out and the average value was taken, and the improvement percentage of the running time was calculated.
With reference to Table 2, it may be seen that, with the gradual increase in the number of tasks, the method for path planning provided by the present disclosure shortens the running time and increases the running efficiency compared with the methods for path planning based on the A* algorithm and the Q-learning algorithm. It may be seen that the present disclosure requires less running time than the prior art when planning paths with the opposite conflict, which proves that a better planned path may be obtained by using the method provided by the present disclosure.
In order to verify the effect of the safety zone introduced into the grid map in the simulation experiment of the present disclosure, the total running distances for the agents to complete a certain number of tasks and the corresponding path differences are respectively calculated, and the statistical results are as shown in Table 3. Likewise, the statistical method for the number of agents and the number of tasks is consistent with the method used for calculating the running time.
Table 3: Quantitative analysis table for path planning results after introduction of the safety zone into the grid map in the simulation experiment

Number of tasks | Total running distance (m): Q-learning method | Total running distance (m): two-agent Q-learning method | Path difference (m) | Improvement percentage (%) of running efficiency
200 | 124540 | 101457 | 23083 | 18.53
400 | 231457 | 191117 | 40340 | 17.43
600 | 348777 | 289824 | 58953 | 16.90
800 | 487510 | 405773 | 81737 | 16.77
1000 | 601000 | 510103 | 90897 | 15.12
1200 | 713204 | 600540 | 112664 | 15.80
1400 | 874114 | 741454 | 132660 | 15.18
1600 | 984512 | 841544 | 142968 | 14.52
1800 | 1150177 | 962475 | 187702 | 16.32
2000 | 1344874 | 1094099 | 250775 | 16.23

With reference to Table 3, it may be seen that, with the gradual increase in the number of tasks, the method for path planning provided by the present disclosure reduces the total running distance of the agents by an average of about 16% and increases the running efficiency after the introduction of the safety zone into the grid map. It may be seen that, after the safety zone is introduced during the planning of the paths with the opposite conflict, the total running distance required by the agents is reduced, which proves that a better planned path may be obtained by using the method provided by the present disclosure.
The above simulation experiment shows that the present disclosure re-plans two conflict-free optimal paths for two agents having an opposite conflict, by using the Q-learning-based two-agent collaborative obstacle avoidance algorithm in combination with the safety zone introduced into the grid map, which reduces both the running time and the total running distance during dynamic path planning and improves the running efficiency of the agents.

Claims (1)

1. A method for multi-agent dynamic path planning, wherein when there is an opposite conflict in running paths for two agents in a one-way lane, multi-agent dynamic path planning with collaborative obstacle avoidance is performed by adopting a Q-learning-based two-agent collaborative obstacle avoidance algorithm in combination with the introduction of a safety zone in a grid map; and the method for planning comprises the following steps:

step 1: constructing the grid map containing the safety zone:

(1a) constructing, by using a grid process, the grid map with a width of X and a length of Y;

(1b) in the grid map, setting, as the safety zone in the one-way lane, a zone where agents running in opposite directions avoid other agents at the same time; and

(1c) establishing, by taking an upper left corner of the grid map as an origin, rightward extension as a positive direction of a horizontal axis, and downward extension as a positive direction of a vertical axis, a Cartesian rectangular coordinate system of the grid map, and marking initial positions and target positions of the agents in the Cartesian rectangular coordinate system;

step 2: roughly planning, by using an A* algorithm, a running path for each of the agents in the Cartesian rectangular coordinate system of the grid map;

step 3: planning, by using the Q-learning-based two-agent collaborative obstacle avoidance algorithm, the running paths for the agents with the conflict:

(3a) enabling each of the agents to run according to its roughly planned path, and if the two agents in the one-way lane have the opposite conflict, planning conflict-free running paths for the two agents simultaneously;

(3b) creating a Q-value table for storing information of the two agents, and setting an initial Q-value to -2 × 10⁻⁹, wherein horizontal coordinates of the Q-value table represent a combination of passable grid positions of the two agents, and vertical coordinates of the Q-value table represent a combination of moving directions of the two agents;

(3c) when the two agents detect the same safety zone simultaneously, enabling only one of the two agents running in the opposite directions to enter the safety zone and the other of the two agents running in the opposite directions to stay at an entrance of this safety zone, and taking current positions as the initial positions of the two agents respectively;

(3d) according to an ε-greedy policy, enabling each of the two agents to randomly select, by using an exploration rate ε, a moving direction from up, down, left, right, and stop respectively, wherein the exploration rate ε is set to 0.5;

(3e) enabling the two agents to update the positions according to the selected directions, and taking the updated positions as the initial positions;

(3f) rewarding each of the two agents whose initial positions are the target positions with 1 point, punishing at least one of the two agents, that encounters an obstacle or whose initial position belongs to the safety zone, with 1 point, and rewarding at least one of the agents, that reaches the passable position but not the target position, with 1 point;

(3g) updating, by using an update formula, all the Q-values in the Q-value table and selecting out updated maximum Q-values; and

(3h) determining whether the updated initial positions are the target positions set in the step (1c), and if yes, executing the step 4, otherwise, executing the step (3d); and

step 4: sequentially connecting path points from the initial positions to the target positions of the agents to obtain the conflict-free running paths for the agents.

2. The method for multi-agent dynamic path planning according to claim 1, wherein the combination of the passable grid positions of the two agents in the step (3b) is as follows:

state = XY²·x₁ + XY·y₁ + Y·x₂ + y₂

wherein state represents the combination of the passable grid positions of the two agents, x₁ and y₁ represent coordinate values of the current position of one agent, and x₂ and y₂ represent coordinate values of the current position of the other agent.

3. The method for multi-agent dynamic path planning according to claim 1, wherein the update formula in the step (3g) is as follows:

newQ ← (1 - α)Q + α(R + γ·maxQ)

wherein newQ represents the updated Q-value, α represents a learning rate of 0.5, R represents a reward value, γ represents a discount factor of 0 or 1, and max represents a maximum operation.
GB2209423.9A 2021-08-12 2022-06-28 Method for multi-agent dynamic path planning Pending GB2610276A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110924688.2A CN113625716B (en) 2021-08-12 2021-08-12 Multi-agent dynamic path planning method

Publications (2)

Publication Number Publication Date
GB202209423D0 GB202209423D0 (en) 2022-08-10
GB2610276A true GB2610276A (en) 2023-03-01

Family

ID=78384864

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2209423.9A Pending GB2610276A (en) 2021-08-12 2022-06-28 Method for multi-agent dynamic path planning

Country Status (3)

Country Link
CN (1) CN113625716B (en)
AU (1) AU2022204569B2 (en)
GB (1) GB2610276A (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114578827B (en) * 2022-03-22 2023-03-24 北京理工大学 Distributed multi-agent cooperative full-coverage path planning method
CN116069023B (en) * 2022-12-20 2024-02-23 南京航空航天大学 Multi-unmanned vehicle formation control method and system based on deep reinforcement learning
CN116483086B (en) * 2023-04-26 2024-03-26 西安电子科技大学广州研究院 Long-term multi-agent path planning method for decoupling edge conflict and point conflict
CN116382304B (en) * 2023-05-26 2023-09-15 国网江苏省电力有限公司南京供电分公司 DQN model-based multi-inspection robot collaborative path planning method and system
CN117572876B (en) * 2024-01-15 2024-04-12 湖南大学 Multi-agent collision prevention control method based on dependency relationship

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105137967B (en) * 2015-07-16 2018-01-19 北京工业大学 The method for planning path for mobile robot that a kind of depth autocoder is combined with Q learning algorithms
CN108776483B (en) * 2018-08-16 2021-06-29 圆通速递有限公司 AGV path planning method and system based on ant colony algorithm and multi-agent Q learning
CN109765896A (en) * 2019-01-29 2019-05-17 重庆大学 A kind of dynamic path planning method based on the more AGV of intelligent parking lot
CN111566583A (en) * 2019-10-04 2020-08-21 香港应用科技研究院有限公司 System and method for adaptive path planning
CN111413980A (en) * 2020-04-07 2020-07-14 苏州哈工吉乐优智能装备科技有限公司 Automatic guided vehicle path planning method for inspection
CN112286203B (en) * 2020-11-11 2021-10-15 大连理工大学 Multi-agent reinforcement learning path planning method based on ant colony algorithm
CN112987799B (en) * 2021-04-16 2022-04-05 电子科技大学 Unmanned aerial vehicle path planning method based on improved RRT algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
AU2022204569B2 (en) 2023-07-20
CN113625716A (en) 2021-11-09
GB202209423D0 (en) 2022-08-10
AU2022204569A1 (en) 2023-03-02
CN113625716B (en) 2023-06-16

Similar Documents

Publication Publication Date Title
AU2022204569B2 (en) Method for multi-agent dynamic path planning
Gul et al. Meta-heuristic approach for solving multi-objective path planning for autonomous guided robot using PSO–GWO optimization algorithm with evolutionary programming
Liu et al. Path planning techniques for mobile robots: Review and prospect
CN112650229B (en) Mobile robot path planning method based on improved ant colony algorithm
CN113156980B (en) Tower crane path planning method and system based on deep reinforcement learning
CN106647754A (en) Path planning method for orchard tracked robot
Zhang et al. Path planning based quadtree representation for mobile robot using hybrid-simulated annealing and ant colony optimization algorithm
Kyprianou et al. Towards the achievement of path planning with multi-robot systems in dynamic environments
CN110045738A (en) Robot path planning method based on ant group algorithm and Maklink figure
CN109885082A (en) The method that a kind of lower unmanned aerial vehicle flight path of task based access control driving is planned
Tanzmeister et al. Path planning on grid maps with unknown goal poses
Wang et al. Improved Q-learning applied to dynamic obstacle avoidance and path planning
Giacomossi et al. Autonomous and collective intelligence for UAV swarm in target search scenario
Li et al. Robot path planning using improved artificial bee colony algorithm
Omar Path planning for unmanned aerial vehicles using visibility line-based methods
BAYGIN et al. PSO based path planning approach for multi service robots in dynamic environments
Zhang et al. Path planning method for unmanned surface vehicle based on RRT* and DWA
Liu et al. Intelligent robot motion trajectory planning based on machine vision
Li et al. Improved sparrow search algorithm applied to path planning of mobile robot
Botao et al. A expected-time optimal path planning method for robot target search in uncertain environments
Yang et al. Research on fire rescue path optimization of unmanned equipment based on improved Slime mould Algorithm
Gao et al. A survey on path planning for mobile robot systems
Zhang et al. A deep reinforcement learning method for mobile robot path planning in unknown environments
Husain et al. „Path planning of material handling robot using Ant Colony Optimization (ACO) technique‟
Can et al. Adaptive Ant Colony Algorithm Based on Global Scanning