AU2022204569B2 - Method for multi-agent dynamic path planning - Google Patents

Method for multi-agent dynamic path planning

Info

Publication number
AU2022204569B2
Authority
AU
Australia
Prior art keywords
agents
agent
running
positions
conflict
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
AU2022204569A
Other versions
AU2022204569A1 (en
Inventor
Xuelian Cai
Xiao FU
Zihang TAN
Jing Zheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Publication of AU2022204569A1 publication Critical patent/AU2022204569A1/en
Application granted granted Critical
Publication of AU2022204569B2 publication Critical patent/AU2022204569B2/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047Optimisation of routes or paths, e.g. travelling salesman problem
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/0088Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0219Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0287Control of position or course in two dimensions specially adapted to land vehicles involving a plurality of land vehicles, e.g. fleet or convoy travelling
    • G05D1/0289Control of position or course in two dimensions specially adapted to land vehicles involving a plurality of land vehicles, e.g. fleet or convoy travelling with means for avoiding collisions between vehicles

Abstract

The present disclosure provides a method for multi-agent dynamic path planning, including the following steps: (1) constructing a grid map containing a safety zone; (2) roughly planning, by using an A* algorithm, a running path for each agent in a Cartesian rectangular coordinate system of the grid map; (3) re-planning, by using a Q-learning-based two-agent collaborative obstacle avoidance algorithm, running paths for two agents after a conflict; (4) determining whether updated initial positions are target positions set by the agents, and if yes, executing the step (5), otherwise, executing the step (3); and (5) sequentially connecting path points from start points to end points of the agents to obtain optimal conflict-free running paths for the two agents. Compared with an existing algorithm for path planning, the present disclosure has the advantages of less running time, short total running distance, high running efficiency, and capability of avoiding further opposite conflicts.

Description

DRAWINGS
FIG. 1 (flowchart): construct a grid map containing a safety zone; roughly plan a path for each agent by using the A* algorithm; make each agent run according to its roughly planned path; if an opposite conflict occurs during running, plan a conflict-free path by using the Q-learning-based two-agent collaborative obstacle avoidance algorithm: create a Q-value table for storing the two agents, take the current positions where the safety zone is detected as the initial position combination, select the next moving direction according to the ε-greedy policy, make the agents update their positions and give a certain reward, update the Q-value table and select the maximum Q-value, and repeat until all agents have reached their target positions; then obtain the conflict-free optimal paths for the two agents.
FIG. 1
METHOD FOR MULTI-AGENT DYNAMIC PATH PLANNING
TECHNICAL FIELD
The present disclosure belongs to the technical field of path planning, and more particularly relates to a method for multi-agent dynamic path planning in the technical field of dynamic path planning. The present disclosure may efficiently avoid dynamic obstacles, so as to plan optimal paths for agents to reach their destinations during movement.
BACKGROUND
A multi-agent system consists of a series of interacting agents, each of which follows corresponding organizational rules to complete specific tasks. An agent refers to an entity with the basic characteristics of self-organization and sociality; it may be a software program or a physical entity (such as a person, a vehicle, or a robot) that is embedded in an environment, perceives the environment through sensors, acts autonomously on the environment through effectors, and meets design requirements. Multi-agent path planning is widely used in intelligent inspection, aircraft trajectory planning, autonomous driving, and other fields. Multi-agent path planning analyzes and processes prior information, such as maps and data sensed by sensors, in a space with obstacles to perceive the state of the environment around the agents. After the surrounding environment is perceived, an optimal conflict-free path from a start point to a target point that conforms to a certain evaluation index is found through a designed policy, so that each agent can bypass obstacles while running along the path and reach its destination smoothly. Multi-agent path planning pursues three main goals: planning a path from the start point to the target point; enabling the agent to successfully avoid obstacles encountered while running along the path; and, on the basis of satisfying the above two goals, making the planned path as close to globally optimal as possible. At present, methods to solve such problems usually include heuristic random search algorithms and intelligent optimization algorithms.
A heuristic random search algorithm is a method for path planning based on heuristic information: the search is guided in the most promising direction by heuristic information carried by the problem, so as to find an optimal path. For example:
The University of Electronic Science and Technology of China disclosed a heuristic
multidirectional rapidly exploring random tree-based method for robot path planning
optimization in its applied patent document "Method for Unmanned Aerial Vehicle Path
Planning Based on Rapidly exploring Random Tree (RRT) Algorithm" (application number:
CN202110409420.5, and application publication number: CN112987799A). In a high
dimensional configuration space, this method draws on an idea of the RRT algorithm. First,
when a random tree does not encounter obstacles during the exploring process, it will gradually
enhance its tendency to quickly approach a target point, and then once it encounters the
obstacles, it will immediately reduce this tendency to maximize the randomness of growth,
thereby making the random tree avoid the obstacles more efficiently, and eventually generating
a locally optimal path that successfully avoids the obstacles. This method may select a parent
node with smaller cost, instead of simply selecting a nearest node, thereby maximizing the
planning efficiency and speed. However, the following shortcomings of this method still exist:
the path planned by the improved RRT algorithm does not take into account dynamic obstacles,
namely, other agents, and a mobile agent may only wait to avoid the obstacles or go back the
same way, which increases the length of an invalid running path, and seriously affects the
running speed of the mobile agent.
Methods for multi-agent dynamic path planning based on intelligent optimization algorithms are mostly built on reinforcement learning methods, such as the Q-learning algorithm and the State-Action-Reward-State-Action (SARSA) algorithm. Through continuous interaction between agents and environments, environment modeling of the obstacles is unified so that collisions between paths and all obstacles can be detected quickly, so as to plan an optimal path for an agent. For example:
The Dalian University of Technology disclosed an ant colony algorithm-based method
for multi-agent path planning by reinforcement learning in its applied patent document
"Ant Colony Algorithm-based Method for Multi-Agent Path Planning by Reinforcement
Learning" (application number: CN202011257321.1, and application publication number:
CN112286203A). This method includes two stages. The first stage is to obtain current environment information of an agent cluster. Firstly, pheromones in an ant colony algorithm are used as heuristic information to detect obstacles in an environment, and their polar coordinate information is stored. Secondly, according to sensor information, target position information, pheromone information, and a sequence of serial numbers, the collected state information is abstracted into a tuple, which serves as the current state to describe a pheromone map in an initialized environment.
The second stage is to train a deep reinforcement learning model for multi-agent path planning. Firstly, a shared experience pool D is initialized, the size of the shared experience pool is set to be N, the ant colony pheromone map is initialized, the diffusion rate I and attenuation rate P of the pheromones and the number Q of agents in the agent cluster are set, and an improved deep Q-network (DQN) and a deep reinforcement learning method based on Q-learning and an ant colony "pheromone" coordination mechanism are adopted. Secondly, a neural network is trained and updated based on historical information of the agent cluster. Finally, an optimal path planning policy for each agent in the agent cluster is obtained.
Although this method may also be used for multi-agent path planning, the following shortcoming still exists: it only takes into account non-opposite conflicts and ignores opposite conflicts caused by mobile agents. As a result, during actual running, an agent needs to continuously avoid obstacles, an optimal avoidance path cannot be re-planned for the agent, and the running efficiency of the agent is seriously affected.
SUMMARY
In view of the above deficiencies of the prior art, an objective of the present disclosure is to provide a method for multi-agent dynamic path planning, so as to solve the problems that, during actual running, an agent needs to continuously avoid obstacles and the running efficiency of the agent is seriously affected due to opposite conflicts caused by other mobile agents during the dynamic path planning of agents.
To achieve the above objective, an idea of the present disclosure is as follows: when two agents have an opposite conflict during running, running paths for the agents with the conflict are planned by using a Q-learning-based two-agent collaborative obstacle avoidance algorithm, the two agents are subjected to global dynamic path planning with a combination of passable grid positions as an initial position, and reward values are given to the agents; and the agents are guided to select the moving directions that make the running paths shortest in a searchable range as the next moving directions, thereby solving the problem that optimal avoidance paths for the two agents cannot be re-planned. According to the present disclosure, by introducing a safety zone in a grid map, only one of the two agents running in opposite directions is enabled to enter the safety zone, and the other of the two agents running in the opposite directions is enabled to stay at the entrance of the safety zone, thereby getting rid of the constraints of fixed obstacles in existing methods for dynamic path planning, and solving the problem that an agent runs on repeated road segments during dynamic path planning.
The objective of the present disclosure is achieved by the following specific steps:
step 1: constructing the grid map containing the safety zone:
(1a) constructing, by using a grid process, the grid map with a width and a length of X
and Y;
(1b) in the grid map, setting, as the safety zone in the one-way lane, a zone where the agents
running in the opposite directions avoid other agents at the same time; and
(1c) establishing, by taking an upper left corner of the grid map as an origin, rightward
extension as a positive direction of a horizontal axis, and downward extension as a positive
direction of a vertical axis, a Cartesian rectangular coordinate system of the grid map, and
marking initial positions and target positions of the agents in the Cartesian rectangular
coordinate system;
step 2: roughly planning, by using an A* algorithm, a running path for each agent in the
Cartesian rectangular coordinate system of the grid map;
step 3: planning, by using the Q-learning-based two-agent collaborative obstacle
avoidance algorithm, the running paths for the agents with the conflict:
(3a) enabling each agent to run according to its roughly planned path, and if the two
agents in the one-way lane have the opposite conflict, planning conflict-free running paths for
the two agents simultaneously;
(3b) creating a Q-value table for storing information of the two agents, and setting an
initial Q-value to be -2×10⁻⁹, wherein horizontal coordinates of the Q-value table represent a combination of passable grid positions of the two agents, and vertical coordinates of the Q-value table represent a combination of moving directions of the two agents;
(3c) when the two agents detect the same safety zone simultaneously, enabling only one of the two agents running in the opposite directions to enter the safety zone and the other of the two agents running in the opposite directions to stay at an entrance of this safety zone, and taking current positions as the initial positions of the two agents respectively;
(3d) according to an ε-greedy policy, enabling each of the two agents to randomly select, by using an exploration rate ε, a moving direction from up, down, left, right, and stop respectively, wherein the exploration rate ε is set to be 0.5;
(3e) enabling the two agents to update the positions according to the selected directions, and taking the updated positions as the initial positions;
(3f) rewarding each of the two agents whose initial positions are the target positions with 1 point, punishing at least one of the two agents that encounters an obstacle or whose initial position belongs to the safety zone with 1 point, and rewarding at least one of the agents that reaches a passable position but not the target position with 1 point;
(3g) updating, by using an update formula, all Q-values in the Q-value table and selecting out updated maximum Q-values; and
(3h) determining whether the updated initial positions are the target positions set in the step (1c), and if yes, executing the step 4, otherwise, executing the step (3d); and
step 4: sequentially connecting path points from the initial positions to the target positions of the agents to obtain the conflict-free running paths for the agents.
Compared with the prior art, the present disclosure has the following advantages:
Firstly, according to the present disclosure, the running paths for the agents with the conflict are planned by using the Q-learning-based two-agent collaborative obstacle avoidance algorithm, the global dynamic path planning is performed with the combination of the passable grid positions as the initial position, and the reward values are given to the agents; and the agents are guided to select the moving directions that make the running paths shortest in the searchable range as the next moving directions, thereby solving the problems that the optimal avoidance paths for the two agents cannot be re-planned and the running efficiency of the mobile agents is seriously affected, and making the paths planned by the present disclosure improve the running efficiency of the agents, shorten the running time of the agents, and effectively avoid further conflicts.
Secondly, according to the present disclosure, by introducing the safety zone in the grid
map, only one of the two agents running in the opposite directions is enabled to enter the safety
zone, the other of the two agents running in the opposite directions is enabled to stay at the
entrance of the safety zone, and an opposite conflict avoidance algorithm is directly triggered,
thereby solving the problems that when there are dynamic obstacles during the dynamic path
planning in the prior art, the agent may only wait or go back the same way to avoid the
obstacles, and the running speed of the mobile agent is seriously affected. Therefore, the path
planned by the present disclosure increases the running speed of the agent, improves the
efficiency of the opposite conflict avoidance algorithm, and reduces the invalid running
distance of the agent.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a flowchart of the present disclosure;
FIG. 2 is a schematic diagram of a grid map containing a safety zone in the present
disclosure;
FIG. 3 is a schematic diagram of agents with a conflict in the present disclosure; and
FIG. 4 is a simulation diagram of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
The present disclosure is further described in detail below with reference to the
accompanying drawings and embodiments.
Referring to FIG. 1, the implementation steps of the present disclosure are further
described in detail.
Step 1: a grid map containing a safety zone is constructed.
By using a grid process, the grid map with a width and a length of X and Y is
constructed.
In the grid map, a zone where agents running in opposite directions avoid other agents at
the same time is set as the safety zone in a one-way lane.
By taking an upper left corner of the grid map as an origin, rightward extension as a
positive direction of a horizontal axis, and downward extension as a positive direction of a
vertical axis, a Cartesian rectangular coordinate system of the grid map is established, and
initial positions and target positions of the agents are marked in the Cartesian rectangular
coordinate system.
Referring to FIG. 2, the grid map containing the safety zone constructed according to the
present disclosure is further described in detail.
FIG. 2 is the grid map containing the safety zone constructed according to the present
disclosure. The upper left corner of the grid map is the origin O, the X axis represents the
width of the grid map, in meters, and the Y axis represents the length of the grid map, in
meters. White squares in FIG. 2 represent passable grids, and dark squares represent
impassable grids. A number 1 in FIG. 2 represents an agent 1, a number 2 represents an agent
2, and a zone represented by oblique lines is the safety zone.
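A grid map of this kind can be held in a small lookup structure. The sketch below is a minimal illustration written in Java (the language of the simulation platform described later), not the patent's implementation: cells are marked as passable, impassable, or belonging to the safety zone, following the coordinate convention of the step 1, and all class and method names are invented for illustration.

    // Minimal illustrative grid map: origin at the upper left corner,
    // x grows to the right and y grows downward (coordinate convention of step 1).
    public class GridMap {
        public static final int FREE = 0;      // passable grid (white square in FIG. 2)
        public static final int OBSTACLE = 1;  // impassable grid (dark square in FIG. 2)
        public static final int SAFETY = 2;    // safety zone beside the one-way lane

        private final int width;   // X, in grids
        private final int height;  // Y, in grids
        private final int[][] cell;

        public GridMap(int width, int height) {
            this.width = width;
            this.height = height;
            this.cell = new int[width][height]; // all cells start as FREE
        }

        public void set(int x, int y, int type) { cell[x][y] = type; }

        public boolean inBounds(int x, int y) {
            return x >= 0 && x < width && y >= 0 && y < height;
        }

        public boolean isPassable(int x, int y) {
            return inBounds(x, y) && cell[x][y] != OBSTACLE;
        }

        public boolean isSafetyZone(int x, int y) {
            return inBounds(x, y) && cell[x][y] == SAFETY;
        }

        public int width()  { return width; }
        public int height() { return height; }
    }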
Step 2: by using an A* algorithm, a running path for each agent is roughly planned in
the Cartesian rectangular coordinate system of the grid map.
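Step 2 can be realized with any standard A* search over the grid. The sketch below is one possible form of the rough planning, written against the illustrative GridMap above with a Manhattan-distance heuristic and unit step costs; it is an assumption-based example rather than the exact planner of the present disclosure.

    import java.util.*;

    // Illustrative A* rough planner on the grid map (4-connected moves, unit cost).
    public class AStarPlanner {
        private static final int[][] MOVES = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};

        /** Returns a list of {x, y} points from the start to the goal, or an empty list if unreachable. */
        public static List<int[]> plan(GridMap map, int sx, int sy, int gx, int gy) {
            int w = map.width(), h = map.height();
            double[] g = new double[w * h];              // cost from the start
            Arrays.fill(g, Double.POSITIVE_INFINITY);
            int[] parent = new int[w * h];
            Arrays.fill(parent, -1);
            boolean[] closed = new boolean[w * h];

            // f = g + Manhattan-distance heuristic to the goal.
            Comparator<int[]> byF = Comparator.comparingDouble(
                    n -> g[n[0] * h + n[1]] + Math.abs(n[0] - gx) + Math.abs(n[1] - gy));
            PriorityQueue<int[]> open = new PriorityQueue<>(byF);
            g[sx * h + sy] = 0;
            open.add(new int[]{sx, sy});

            while (!open.isEmpty()) {
                int[] cur = open.poll();
                int cx = cur[0], cy = cur[1], ci = cx * h + cy;
                if (closed[ci]) continue;
                closed[ci] = true;
                if (cx == gx && cy == gy) break;         // goal expanded: path found
                for (int[] m : MOVES) {
                    int nx = cx + m[0], ny = cy + m[1];
                    if (!map.isPassable(nx, ny)) continue;
                    int ni = nx * h + ny;
                    double cost = g[ci] + 1;
                    if (cost < g[ni]) {                  // better route to this neighbour
                        g[ni] = cost;
                        parent[ni] = ci;
                        open.add(new int[]{nx, ny});
                    }
                }
            }

            // Walk back through the parents to obtain the rough path.
            List<int[]> path = new ArrayList<>();
            int idx = gx * h + gy;
            if (parent[idx] == -1 && !(sx == gx && sy == gy)) return path;  // unreachable
            while (idx != -1) {
                path.add(0, new int[]{idx / h, idx % h});
                idx = parent[idx];
            }
            return path;
        }
    }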
Step 3: by using a Q-learning-based two-agent collaborative obstacle avoidance
algorithm, running paths for the agents with a conflict are planned.
Firstly, each agent runs according to its roughly planned path, and if the two agents in the
one-way lane have the opposite conflict, conflict-free running paths are planned for the two
agents simultaneously.
Secondly, a Q-value table for storing information of the two agents is created, and an
initial Q-value is set to be -2×10⁻⁹, wherein horizontal coordinates of the Q-value table
represent a combination of passable grid positions of the two agents, and vertical coordinates
of the Q-value table represent a combination of moving directions of the two agents.
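One concrete way to hold such a table is a dense array indexed by a joint-position state and a joint moving direction, with every entry initialized to -2×10⁻⁹. The sketch below is an illustrative assumption: it uses the position-pair encoding given in claim 2 and five moving directions per agent, and its names and layout are not taken from the patent.

    // Illustrative joint Q-value table for two agents on an X-by-Y grid map.
    public class JointQTable {
        public static final int ACTIONS = 5;            // up, down, left, right, stop

        private final int X;                            // width of the grid map
        private final int Y;                            // length of the grid map
        private final double[][] q;                     // [joint position state][joint moving direction]

        public JointQTable(int X, int Y) {
            this.X = X;
            this.Y = Y;
            int states = X * Y * X * Y;                 // every ordered pair of grid positions fits
            this.q = new double[states][ACTIONS * ACTIONS];
            for (double[] row : q) java.util.Arrays.fill(row, -2e-9);  // initial Q-value of step (3b)
        }

        /** Joint-position encoding of claim 2: state = XY^2*x1 + XY*y1 + Y*x2 + y2. */
        public int state(int x1, int y1, int x2, int y2) {
            return X * Y * Y * x1 + X * Y * y1 + Y * x2 + y2;
        }

        /** Joint moving direction built from agent 1's move a1 and agent 2's move a2. */
        public int action(int a1, int a2) { return a1 * ACTIONS + a2; }

        public double get(int s, int a)           { return q[s][a]; }
        public void   set(int s, int a, double v) { q[s][a] = v; }
    }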
Thirdly, when the two agents detect the same safety zone simultaneously, current
positions are taken as the initial positions of the two agents respectively.
Fourthly, according to an ε-greedy policy, each of the two agents randomly selects, by using an exploration rate ε, a moving direction from up, down, left, right, and stop respectively, wherein the exploration rate ε is set to be 0.5.
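The ε-greedy choice itself is compact. The sketch below is an illustration that assumes the JointQTable above and treats the two agents' selections as one joint move: with probability ε a random joint move is drawn, and otherwise the joint move with the largest stored Q-value is taken. For instance, select(table, table.state(x1, y1, x2, y2), 25, 0.5) would return a joint index that decodes into the two individual directions as a1 = index / 5 and a2 = index % 5.

    import java.util.Random;

    // Illustrative ε-greedy selection over the joint moving directions of the two agents.
    public class EpsilonGreedy {
        private static final Random RNG = new Random();

        /** Picks a joint move index in [0, numActions) for the given joint state s. */
        public static int select(JointQTable table, int s, int numActions, double epsilon) {
            if (RNG.nextDouble() < epsilon) {
                return RNG.nextInt(numActions);          // explore: random joint move
            }
            int best = 0;                                // exploit: greedy joint move
            for (int a = 1; a < numActions; a++) {
                if (table.get(s, a) > table.get(s, best)) best = a;
            }
            return best;
        }
    }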
Fifthly, the two agents update the positions according to the selected directions, and the updated positions are taken as the initial positions.
Sixthly, each of the two agents whose initial positions are the target positions is rewarded with 1 point, at least one of the two agents that encounters an obstacle or whose initial position belongs to the safety zone is punished with 1 point, and at least one of the agents that reaches a passable position but not the target position is rewarded with 1 point.
Seventhly, by using an update formula, all Q-values in the Q-value table are updated and
updated maximum Q-values are selected out; and
Eighthly, whether the updated initial positions are the target positions set in the step 1 is
determined, and if yes, the step 4 is executed; otherwise, the process returns to the fourth step.
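Taken together, the fourth to eighth steps form a single learning loop. The sketch below is a schematic rendering under the assumptions introduced above (JointQTable, EpsilonGreedy): the environment transition and the reward of the sixth step are hidden behind a hypothetical Environment interface invented for this illustration, the learning rate of 0.5 is taken from Table 1, and the discount factor follows claim 3, so the code shows the control flow rather than the exact implementation of the present disclosure.

    // Schematic episode loop for the two-agent collaborative obstacle avoidance (fourth to eighth steps).
    public class CollaborativeQLearning {
        static final double ALPHA = 0.5;    // learning rate (Table 1)
        static final double GAMMA = 1.0;    // discount factor (claim 3 allows 0 or 1)
        static final double EPSILON = 0.5;  // exploration rate (fourth step)

        public static void learnEpisode(JointQTable table, Environment env, int startState) {
            int s = startState;                                   // initial position combination (third step)
            int numActions = JointQTable.ACTIONS * JointQTable.ACTIONS;
            while (!env.bothAtTargets(s)) {                       // eighth step: stop once both targets are reached
                int a = EpsilonGreedy.select(table, s, numActions, EPSILON);  // fourth step
                Environment.Step step = env.apply(s, a);          // fifth step: both agents move
                double reward = step.reward;                      // sixth step: reward or punishment
                double maxNext = table.get(step.nextState, 0);    // seventh step: best Q-value at the next state
                for (int a2 = 1; a2 < numActions; a2++) {
                    maxNext = Math.max(maxNext, table.get(step.nextState, a2));
                }
                double newQ = (1 - ALPHA) * table.get(s, a) + ALPHA * (reward + GAMMA * maxNext);
                table.set(s, a, newQ);                            // update the Q-value table
                s = step.nextState;                               // updated positions become the new initial positions
            }
        }

        // Hypothetical environment wrapper, named only for this sketch.
        public interface Environment {
            boolean bothAtTargets(int jointState);
            Step apply(int jointState, int jointAction);
            class Step {
                public int nextState;   // joint position state after the move
                public double reward;   // reward of the sixth step
            }
        }
    }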
Referring to FIG. 3, opposite conflict types of conflict-free paths planned by the present
disclosure are further described in detail.
FIG. 3 is a schematic diagram of two types of opposite conflicts. FIG. 3 (a) is a schematic
diagram of a first type of the opposite conflict of the two agents, wherein a black dot represents
that the agent 1 and the agent 2 possibly collide in the middle of grids, and at this time, the
coordinates of the initial positions of the two agents after the opposite conflict in the grid map
are the same. FIG. 3 (b) is a schematic diagram of a second type of the opposite conflict of the two
agents, wherein a black dot represents that the agent 1 and the agent 2 collide at a junction of
the grids, and at this time, the coordinates of the initial positions of the two agents after the
opposite conflict in the grid map are different.
Step 4: path points from the initial positions to the target positions of the agents are
sequentially connected to obtain the conflict-free running paths for the agents.
The effects of the present disclosure are further described below with reference to a
simulation experiment.
1. Conditions for the simulation experiment:
A hardware platform for the simulation experiment in the present disclosure is as follows: a
central processing unit is an Intel (R) Core (TM) i7-9700 CPU, a main frequency is 3.00 GHz,
and a memory is 16.0 GB.
A software platform for the simulation experiment in the present disclosure is: a
Windows 10 operating system and Java 1.
The simulation experiment in the present disclosure mainly verifies the completeness of the opposite conflict avoidance and shortest path planning algorithms. In order to display the moving trajectories of the agents more clearly, the running paths for the two agents are planned in a small-scale grid map of 15 m × 29 m. The side length of each grid is 1 m, and the distribution of passable grids and impassable grids in the grid map is manually set.
2. Content and result analysis of the simulation experiment:
The simulation experiment in the present disclosure respectively simulates path planning
of the preset grid map by using the method provided by the present disclosure and an A*
algorithm-based method for path planning in a prior art.
Specific parameters for the simulation experiment are as shown in a Table 1:
Table 1: Parameter table for simulation experiment

Parameter name                                 Parameter value
Number of agents                               100
Task arrival rate λ                            1 per second
Running speed v_rob of agent                   1 m/s
Turning time consumption t_turn of agent       1 s
Learning rate α                                0.5
The prior art used in the simulation experiments refers to the following:
The A* algorithm-based method for path planning in the prior art refers to the method for path planning proposed by Gelperin et al. in "On the optimality of A* [J]. Artificial Intelligence, 1977, 8(1): 69-76.", abbreviated as the A* algorithm-based method for path planning.
The effects of the present disclosure are further described below with reference to a
simulation diagram of FIG. 4.
FIG. 4 is the simulation diagram of the present disclosure, wherein a black dot represents
the agent 1, a black solid line represents the running path for the agent 1, a gray dot represents
the agent 2, a black dotted line represents the running path for the agent 2, and an arrow
represents a running direction thereof. A diamond grid in FIG. 4 is a target position of an unloading task.
FIG. 4 (a) is a simulation result diagram of the original running paths for the agents before path planning; FIG. 4 (b) is a simulation result diagram showing that, if the two agents run according to the original paths, they will have the opposite conflict at the position marked with a black triangle; FIG. 4 (c) is a simulation result diagram showing that the two agents perform dynamic obstacle avoidance at a black dot and a gray dot respectively, that is, the step before the collision; and FIG. 4 (d) is a simulation result diagram of path planning performed by the method provided by the present disclosure, wherein a black solid line is the path re-planned for the agent 1, and a black dotted line is the path re-planned for the agent 2.
From the results of the simulation experiment, it may be seen that the Q-learning-based two-
agent collaborative obstacle avoidance algorithm may successfully avoid the opposite conflict
and achieve the shortest path to the target position. When the two agents have the opposite
conflict in FIG. 4 (c), a dynamic path planning algorithm is used for avoidance. By comparing
FIG. 4 (c) and FIG. 4 (d), it may be seen that, after the Q-learning-based two-agent collaborative obstacle avoidance algorithm is used, the running distance of the agent is significantly reduced compared with that of the original running path before planning. Direct recording and output are performed in a program.
Table 2: Quantitative analysis table for path planning results of present disclosure and prior
art in simulation experiment
Number of tasks   Running time (s) of agent                                      Improvement percentage (%)
                  A* algorithm   Q-learning method   Two-agent Q-learning method
200               2487           2402                2237                        10.05
400               4877           4754                4237                        13.12
600               7232           7001                6138                        15.13
800               9618           9455                8679                        9.76
1000              12047          11358               10358                       14.02
1200              14147          14011               13014                       8.01
1400              16548          16245               15124                       8.61
1600              19547          19006               17545                       10.24
1800              21477          21014               19370                       9.81
2000              24168          23114               21156                       12.46
In order to compare the running efficiency of the three methods in the simulation experiment of the present disclosure, a timer in multiple threads of the Java software is used to directly record and output the running time for the agents to complete the tasks in the program, and the statistical result is as shown in the Table 2. The number of agents is fixed to be 100, and the number of tasks varies from 200 to 2000 with an interval of 200, that is, 10 different numbers of tasks are selected to perform statistics on the running time; 5 experiments were carried out for each number of tasks, an average value was taken, and the improvement percentage of the running time was calculated.
With reference to the Table 2, it may be seen that, with the gradual increase in the number of tasks, the method for path planning provided by the present disclosure shortens the running time and increases the running efficiency percentage compared with the methods for path planning based on the A* algorithm and the Q-learning algorithm. It may be seen that the present disclosure requires less running time than the prior art during the planning of the paths with the opposite conflict, and it is proved that a better planned path may be obtained by using the method provided by the present disclosure.
In order to verify the effect of the safety zone introduced in the grid map in the simulation experiment of the present disclosure, the total running distances for the agents to complete a certain number of tasks and the differences between the paths are respectively calculated, and the statistical result is as shown in a Table 3. Likewise, the statistical method for the number of agents and the number of tasks is consistent with the method for calculating the running time.
Table 3: Quantitative analysis table for path planning result of introduction of safety zone in grid map in simulation experiment

Number of tasks   Total running distance (m)                        Path difference (m)   Improvement percentage (%) of running efficiency
                  Q-learning method   Two-agent Q-learning method
200               124540              101457                        23083                 18.53
400               231457              191117                        40340                 17.43
600               348777              289824                        58953                 16.90
800               487510              405773                        81737                 16.77
1000              601000              510103                        90897                 15.12
1200              713204              600540                        112664                15.80
1400              874114              741454                        132660                15.18
1600              984512              841544                        142968                14.52
1800              1150177             962475                        187702                16.32
2000              1344874             1094099                       250775                16.23
With reference to the Table 3, it may be seen that with the gradual increase in the number
of tasks, the method for path planning provided by the present disclosure reduces the total
running distance of the agent by an average of 16% and increases the percentage of running
efficiency after the introduction of the safety zone in the grid map. It may be seen that after
the present disclosure introduces the safety zone during the planning of the paths with the
opposite conflict, the total running distance required for the agent is reduced, and it is proved
that the better planned path may be obtained by using the method provided by the present
disclosure.
The above simulation experiment shows that the present disclosure re-plans two conflict-
free optimal paths for two agents having the opposite conflict in combination with the
introduction of the safety zone in the grid map by using the Q-learning-based two-agent
collaborative obstacle avoidance algorithm, which reduces both the running time and
the total running distance during the dynamic path planning, and improves the running
efficiency of the agents.

Claims (3)

  1. A method for multi-agent dynamic path planning, wherein, when there is an opposite
    conflict in running paths for two agents in a one-way lane, multi-agent dynamic path planning
    with collaborative obstacle avoidance is performed by adopting a Q-learning-based two-agent
    collaborative obstacle avoidance algorithm in combination with the introduction of a
    safety zone in a grid map; and the method for planning comprises the following steps:
    step 1: constructing the grid map containing the safety zone:
    (1a) constructing, by using a grid process, the grid map with a width and a length of X
    and Y;
    (1b) in the grid map, setting, as the safety zone in the one-way lane, a zone where other
    agents avoid the agents running in opposite directions at the same time; and
    (1c) establishing, by taking an upper left corner of the grid map as an origin, rightward
    extension as a positive direction of a horizontal axis, and downward extension as a positive
    direction of a vertical axis, a Cartesian rectangular coordinate system of the grid map, and
    marking initial positions and target positions of the agents in the Cartesian rectangular
    coordinate system;
    step 2: roughly planning, by using an A* algorithm, a running path for each of the agents
    in the Cartesian rectangular coordinate system of the grid map;
    step 3: planning, by using the Q-learning-based two-agent collaborative obstacle
    avoidance algorithm, the running paths for the agents with the conflict:
    (3a) enabling each of the agents to run according to its roughly planned path, and if the
    two agents in the one-way lane have the opposite conflict, planning conflict-free running paths
    for the two agents simultaneously;
    (3b) creating a Q-value table for storing information of the two agents, and setting an
    initial Q-value to be -2×10⁻⁹, wherein horizontal coordinates of the Q-value table represent
    a combination of passable grid positions of the two agents, and vertical coordinates of the
    Q-value table represent a combination of moving directions of the two agents;
    (3c) when the two agents detect the same safety zone simultaneously, enabling only one
    of the two agents running in the opposite directions to enter the safety zone and the other of
    the two agents running in the opposite directions to stay at an entrance of this safety zone, and taking current positions as the initial positions of the two agents respectively;
    (3d) according to an ε-greedy policy, enabling each of the two agents to randomly
    select, by using an exploration rate ε, a moving direction from up, down, left, right, and stop
    respectively, wherein the exploration rate ε is set to be 0.5;
    (3e) enabling the two agents to update the positions according to the selected directions,
    and taking the updated positions as the initial positions;
    (3f) rewarding each of the two agents whose initial positions are the target positions with
    1 point, punishing at least one of the two agents, that encounters an obstacle or whose initial
    position belongs to the safety zone, with 1 point, and rewarding at least one of the agents, that
    reaches the passable position but not the target position, with 1 point;
    (3g) updating, by using an update formula, all the Q-values in the Q-value table and
    selecting out updated maximum Q-values; and
    (3h) determining whether the updated initial positions are the target positions set in the
    step (1c), and if yes, executing the step 4, otherwise, executing the step (3d); and
    step 4: sequentially connecting path points from the initial positions to the target positions
    of the agents to obtain the conflict-free running paths for the agents.
  2. The method for multi-agent dynamic path planning according to claim 1, wherein the
    combination of the passable grid positions of the two agents in the step (3b) is as follows:
    state = XY²x₁ + XYy₁ + Yx₂ + y₂
    wherein state represents the combination of the passable grid positions of the two agents,
    x₁ and y₁ represent coordinate values of the current position of one agent, and x₂ and y₂
    represent coordinate values of the current position of the other agent.
  3. The method for multi-agent dynamic path planning according to claim 1, wherein the
    update formula in the step (3g) is as follows:
    newQ ← (1 - α)Q + α(R + γ·maxQ)
    wherein newQ represents the updated Q-value, α represents a learning rate of 0.5, R
    represents a reward value, γ represents a discount factor of 0 or 1, and max represents the
    maximum operation.
AU2022204569A 2021-08-12 2022-06-28 Method for multi-agent dynamic path planning Active AU2022204569B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110924688.2A CN113625716B (en) 2021-08-12 2021-08-12 Multi-agent dynamic path planning method
CN202110924688.2 2021-08-12

Publications (2)

Publication Number Publication Date
AU2022204569A1 AU2022204569A1 (en) 2023-03-02
AU2022204569B2 true AU2022204569B2 (en) 2023-07-20

Family

ID=78384864

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2022204569A Active AU2022204569B2 (en) 2021-08-12 2022-06-28 Method for multi-agent dynamic path planning

Country Status (3)

Country Link
CN (1) CN113625716B (en)
AU (1) AU2022204569B2 (en)
GB (1) GB2610276A (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114578827B (en) * 2022-03-22 2023-03-24 北京理工大学 Distributed multi-agent cooperative full-coverage path planning method
CN116069023B (en) * 2022-12-20 2024-02-23 南京航空航天大学 Multi-unmanned vehicle formation control method and system based on deep reinforcement learning
CN116483086B (en) * 2023-04-26 2024-03-26 西安电子科技大学广州研究院 Long-term multi-agent path planning method for decoupling edge conflict and point conflict
CN116382304B (en) * 2023-05-26 2023-09-15 国网江苏省电力有限公司南京供电分公司 DQN model-based multi-inspection robot collaborative path planning method and system
CN117572876B (en) * 2024-01-15 2024-04-12 湖南大学 Multi-agent collision prevention control method based on dependency relationship

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105137967B (en) * 2015-07-16 2018-01-19 北京工业大学 The method for planning path for mobile robot that a kind of depth autocoder is combined with Q learning algorithms
CN108776483B (en) * 2018-08-16 2021-06-29 圆通速递有限公司 AGV path planning method and system based on ant colony algorithm and multi-agent Q learning
CN109765896A (en) * 2019-01-29 2019-05-17 重庆大学 A kind of dynamic path planning method based on the more AGV of intelligent parking lot
CN111566583A (en) * 2019-10-04 2020-08-21 香港应用科技研究院有限公司 System and method for adaptive path planning
CN111413980A (en) * 2020-04-07 2020-07-14 苏州哈工吉乐优智能装备科技有限公司 Automatic guided vehicle path planning method for inspection
CN112286203B (en) * 2020-11-11 2021-10-15 大连理工大学 Multi-agent reinforcement learning path planning method based on ant colony algorithm
CN112987799B (en) * 2021-04-16 2022-04-05 电子科技大学 Unmanned aerial vehicle path planning method based on improved RRT algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lv, Y. et al., 'Q-Learning Dynamic Path Planning for an HCV Avoiding Unknown Threatened Area', Chinese Automation Congress (CAC), 2020, doi: 10.1109/CAC51589.2020.9327757. *

Also Published As

Publication number Publication date
GB202209423D0 (en) 2022-08-10
CN113625716A (en) 2021-11-09
CN113625716B (en) 2023-06-16
AU2022204569A1 (en) 2023-03-02
GB2610276A (en) 2023-03-01

Similar Documents

Publication Publication Date Title
AU2022204569B2 (en) Method for multi-agent dynamic path planning
Gul et al. Meta-heuristic approach for solving multi-objective path planning for autonomous guided robot using PSO–GWO optimization algorithm with evolutionary programming
CN112650229B (en) Mobile robot path planning method based on improved ant colony algorithm
CN113156980B (en) Tower crane path planning method and system based on deep reinforcement learning
Zhang et al. Path planning based quadtree representation for mobile robot using hybrid-simulated annealing and ant colony optimization algorithm
CN110045738A (en) Robot path planning method based on ant group algorithm and Maklink figure
Kyprianou et al. Towards the achievement of path planning with multi-robot systems in dynamic environments
CN109885082A (en) The method that a kind of lower unmanned aerial vehicle flight path of task based access control driving is planned
Ming et al. A survey of path planning algorithms for autonomous vehicles
Zghair et al. Intelligent Hybrid Path Planning Algorithms for Autonomous Mobile Robots.
Wang et al. Improved Q-learning applied to dynamic obstacle avoidance and path planning
Li et al. Robot path planning using improved artificial bee colony algorithm
Omar Path planning for unmanned aerial vehicles using visibility line-based methods
BAYGIN et al. PSO based path planning approach for multi service robots in dynamic environments
Han et al. Research on UAV indoor path planning algorithm based on global subdivision grids
Liu et al. Intelligent robot motion trajectory planning based on machine vision
Zhang et al. A State-Decomposition DDPG Algorithm for UAV Autonomous Navigation in 3D Complex Environments
Li et al. End-to-end autonomous exploration for mobile robots in unknown environments through deep reinforcement learning
Yao et al. The Application of Internet of Things in Robot Route Planning Based on Multisource Information Fusion
Li et al. Improved sparrow search algorithm applied to path planning of mobile robot
Luo et al. A Max-Min Ant System Approach to Autonomous Navigation
Li et al. Towards Path Planning Algorithm Combining with A-Star Algorithm and Dynamic Window Approach Algorithm
Neuman et al. Anytime policy planning in large dynamic environments with interactive uncertainty
Zhang et al. A deep reinforcement learning method for mobile robot path planning in unknown environments
Botao et al. A expected-time optimal path planning method for robot target search in uncertain environments

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)