CN115047878A - DM-DQN-based mobile robot path planning method - Google Patents
- Publication number
- CN115047878A (application number CN202210673628.2A)
- Authority
- CN
- China
- Prior art keywords
- dqn
- function
- reward function
- path planning
- mobile robot
- Prior art date
- Legal status (the status listed is an assumption, not a legal conclusion)
- Pending
Classifications
- G—PHYSICS; G05—CONTROLLING; REGULATING; G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES; G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots; G05D1/02—Control of position or course in two dimensions; G05D1/021—specially adapted to land vehicles
- G05D1/0238—using optical position detecting means using obstacle or wall sensors
- G05D1/024—using optical position detecting means using obstacle or wall sensors in combination with a laser
- G05D1/0221—with means for defining a desired trajectory involving a learning process
- G05D1/0223—with means for defining a desired trajectory involving speed control of the vehicle
- G05D1/0257—using a radar
- G05D1/0276—using signals provided by a source external to the vehicle
Abstract
The invention relates to the technical field of DQN algorithms, and in particular to a DM-DQN-based mobile robot path planning method. The method comprises: establishing a DM-DQN-based mobile robot path planning model; designing the state space, action space, network model, and reward function of the DM-DQN algorithm; and training the DM-DQN algorithm to obtain experience reward values and complete collision-free path planning for the robot. The invention introduces a dueling (competitive) network structure that decomposes the network into a value function and an advantage function, decoupling action selection from action evaluation. The state is thus no longer evaluated solely through action values, and independent value estimation becomes possible, which addresses the problem of slow convergence. By designing a reward function based on an artificial potential field, the method also solves the problem of the robot passing too close to obstacle edges.
Description
Technical Field
The invention relates to the technical field of DQN (Deep Q-Network) algorithms, and in particular to a DM-DQN-based mobile robot path planning method.
Background
With the development of artificial intelligence, the robot industry is moving toward autonomous learning and autonomous exploration. Path planning is a core problem in mobile robot motion: the goal is to find an optimal or near-optimal collision-free path from a start point to an end point. As technology advances, the environments robots face grow increasingly complex. In an unknown environment, complete environmental information is unavailable, so traditional path planning algorithms, such as the artificial potential field algorithm, ant colony algorithm, genetic algorithm, and particle swarm algorithm, can no longer meet practical requirements. Deep reinforcement learning was proposed for this situation; it combines deep learning with reinforcement learning. Deep learning extracts features from the input unknown environment state through a neural network and fits the mapping from environment state to action-value function, while reinforcement learning makes decisions based on the deep neural network's output and an exploration strategy, realizing the mapping from states to actions. This combination resolves the curse of dimensionality in the state-to-action mapping and better satisfies the motion requirements of robots in complex environments.
Disclosure of Invention
To address the shortcomings of existing algorithms, the invention introduces a dueling (competitive) network structure that decomposes the network into a value function and an advantage function, thereby decoupling action selection from action evaluation. The state is no longer evaluated solely through action values, independent value estimation becomes possible, and the problem of slow convergence is solved. In addition, by designing a reward function based on an artificial potential field, the invention solves the problem of the robot passing too close to obstacle edges.
The technical scheme adopted by the invention is as follows: a DM-DQN-based mobile robot path planning method comprises the following steps:
step one, establishing a mobile robot path planning model based on DM-DQN;
step two, designing the state space, action space, DM-DQN network model, and reward function of the DM-DQN algorithm;
further, the structure of the DM-DQN network model is divided into a cost function V (s, ω, α) and a merit function a (s, a, ω, β), and the output of the DM-DQN network model is represented as:
Q(s,a,ω,α,β)=V(s,ω,α)+A(s,a,ω,β) (4)
where s represents the state, a represents the motion, ω is a parameter common to V and a, α and β are parameters of V and a, respectively, the value of V can be regarded as the average of the Q values in the state of s, the value of a is a limit with the average being 0, and the sum of the value of V and the value of a is the original Q value.
Further, the advantage function is centralized (its mean over actions is subtracted), and the output of the DM-DQN network model is expressed as:
Q(s, a, ω, α, β) = V(s, ω, α) + [ A(s, a, ω, β) − (1/|A|) Σ_{a′} A(s, a′, ω, β) ]
where s denotes the state, a the action, a′ ranges over the selectable actions with |A| their number, ω the parameters shared by V and A, and α and β the parameters of V and A, respectively.
Further, the reward function is divided into a position reward function and a direction reward function, and a total reward function is calculated according to the position reward function and the direction reward function.
Further, in the position reward function, a target-guidance reward function is first constructed using the attractive (gravitational) potential field function:
where ζ denotes the attractive reward function constant and d_goal the distance between the current position and the target point;
secondly, an obstacle avoidance reward function is constructed using the repulsive potential field function; this reward is negative and decreases as the distance between the robot and the obstacle decreases:
where η denotes the repulsive reward function constant, d_obs the distance between the current position and the obstacle, and d_max the maximum influence distance of the obstacle.
Further, the direction reward function is expressed in terms of the angular difference between the robot's expected direction and actual direction, given by:
where F_q denotes the expected direction, F_a the actual direction, and the angle term the angle between the expected and actual directions;
the direction reward function can then be expressed as:
Further, the total reward function of the mobile robot is expressed as:
where r_goal denotes the radius of the target area centered on the target point and r_obs the radius of the influence area centered on the obstacle;
and step three, training the DM-DQN algorithm, obtaining an experience reward value, and completing the collision-free path planning of the robot.
The invention has the beneficial effects that:
1. By introducing a dueling (competitive) network structure, the network is decomposed into a value function and an advantage function, decoupling action selection from action evaluation. The state is no longer evaluated solely through action values, independent value estimation becomes possible, the problem of slow convergence is solved, and the network has better generalization performance.
2. By designing a reward function based on the artificial potential field, the problem of the robot passing too close to obstacle edges is solved; learning in a dynamic unknown environment is more efficient, convergence is faster, and a collision-free path that keeps away from obstacles can be planned.
Drawings
Fig. 1 is a diagram of a DM-DQN network architecture of the present invention;
FIGS. 2(a) and (b) are a static environment diagram and a dynamic and static environment diagram, respectively, according to the present invention;
FIGS. 3(a), (b) are plots of reward values for the static and dynamic environments of the DM-DQN algorithm of the present invention;
fig. 4(a) and (b) are a static environment generation path diagram and a dynamic and static environment generation path diagram according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and embodiments. The drawings are simplified schematics that illustrate only the basic structure of the invention, and therefore show only the structures relevant to it.
To address the slow convergence of M-DQN, the method introduces a dueling (competitive) network structure and decomposes the network into a value function and an advantage function. To address the problem of the robot's motion trajectory passing too close to obstacle edges, a reward function based on the artificial potential field method is designed so that the trajectory stays clear of the area around obstacles.
As shown in fig. 1, a DM-DQN-based mobile robot path planning method includes the following steps:
step one, establishing the DM-DQN-based mobile robot path planning model and describing the mobile robot path planning problem as a Markov decision process;
First, the Q value is estimated by an online Q-network with weights θ, and every C steps θ is copied into the weights θ⁻ of the target network;
secondly, the robot interacts with the environment using an ε-greedy strategy and, according to the designed artificial-potential-field-based reward function, obtains a reward and the next state; the transition (s_t, a_t, r_t, s_{t+1}) is stored in a fixed-size FIFO replay buffer D, and every F steps DM-DQN randomly samples a batch D_t from the replay buffer and updates according to the following formula:
regressing toward the target and minimizing the loss;
where s denotes the state, a the action, r the reward value, and γ the discount factor. The policy π is the softmax of the Q values at temperature τ, where τ is a hyperparameter controlling the weight of the entropy term; a′ denotes the action at time t+1; α is a hyperparameter set to 1; π(·|s) denotes the policy selected in that state; and the sum in the target runs over the selectable actions.
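The target just described follows the Munchausen-DQN recipe that DM-DQN builds on: a soft (entropy-regularized) Bellman backup over next-state actions plus a scaled log-policy bonus for the taken action. The sketch below is written under that assumption; the function names, the log-policy clipping bound `l0`, and all constants are illustrative rather than taken from the patent.

```python
import numpy as np

def softmax_policy(q, tau):
    """Softmax policy pi = softmax(q / tau), computed in a numerically stable way."""
    z = (q - q.max(axis=-1, keepdims=True)) / tau
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def munchausen_target(r, q_tgt_s, a, q_tgt_next, done,
                      gamma=0.99, tau=0.03, alpha=1.0, l0=-1.0):
    """Munchausen-style TD target (illustrative shapes: batched 1-D arrays).

    r: rewards, q_tgt_s: target-net Q(s, .), a: taken action indices,
    q_tgt_next: target-net Q(s', .), done: 1.0 for terminal transitions.
    """
    # Log-policy bonus for the taken action, clipped from below at l0
    # (the clipping bound is standard M-DQN practice, not stated in the patent).
    pi_s = softmax_policy(q_tgt_s, tau)
    log_pi_a = np.log(pi_s[np.arange(len(a)), a] + 1e-12)
    bonus = alpha * np.clip(tau * log_pi_a, l0, 0.0)
    # Soft Bellman backup: expectation over next actions minus entropy term.
    pi_next = softmax_policy(q_tgt_next, tau)
    log_pi_next = np.log(pi_next + 1e-12)
    soft_v = (pi_next * (q_tgt_next - tau * log_pi_next)).sum(axis=-1)
    return r + bonus + gamma * (1.0 - done) * soft_v
```

In the limit τ → 0 with α = 0, the softmax concentrates on the greedy action and this target reduces to the ordinary DQN target r + γ·max_a′ Q(s′, a′).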
Designing a state space, an action space, a DM-DQN network model and a reward function of the DM-DQN algorithm;
the state space includes: the method comprises the following steps of (1) laser radar data, a current control instruction of the mobile robot, a control instruction of the mobile robot at the last moment, and the direction and distance of a target point;
the motion space includes: angular and linear velocities of the mobile robot;
the motion space of the robot is dispersed into 5 motions, the fixed linear velocity v is 0.15m/s, an angular velocity value is given, the angular velocity is selected instead of directly giving a corner for the output of the control quantity, the kinematic characteristics of the mobile robot are better met, and the angular velocity is given according to the following formula:
wherein action _ size represents that the action space is dispersed into 5 actions, action [5 ]]The values representing the actions are: 0 to 4, max _ angel vel The maximum angular velocity value representing the robot steering is 1.5rad/s, and 5 actions are calculated according to equation (2), as shown in equation (3), where linear velocity v is in m/s and angular velocity ω is in rad/s.
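One plausible reading of this discretization, sketched below, is five angular velocities evenly spaced over [−max_angular_vel, +max_angular_vel] with the linear velocity held fixed. Equation (2) itself is not reproduced in the text, so the exact mapping is an assumption.

```python
# Discretize the action space: fixed linear velocity, five evenly spaced
# angular velocities in [-max_angular_vel, +max_angular_vel].
# The evenly spaced mapping is an assumption; equation (2) is not
# reproduced in the patent text.
ACTION_SIZE = 5
MAX_ANGULAR_VEL = 1.5  # rad/s
LINEAR_VEL = 0.15      # m/s

def action_to_velocity(action: int) -> tuple[float, float]:
    """Map a discrete action index (0..4) to a (linear, angular) velocity pair."""
    if not 0 <= action < ACTION_SIZE:
        raise ValueError(f"action must be in [0, {ACTION_SIZE})")
    angular = MAX_ANGULAR_VEL * (1.0 - 2.0 * action / (ACTION_SIZE - 1))
    return LINEAR_VEL, angular
```

Under this mapping, action 0 turns hard left (+1.5 rad/s), action 2 drives straight, and action 4 turns hard right (−1.5 rad/s).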
Further, the DM-DQN network model divides the network structure into two parts, as shown in fig. 1. The first part depends only on the state s and is called the value function, denoted V(s, ω, α); the other part depends on both the state s and the action a and is called the advantage function, denoted A(s, a, ω, β). The output of the network can therefore be expressed as:
Q(s, a, ω, α, β) = V(s, ω, α) + A(s, a, ω, β) (4)
where s denotes the state, a the action, ω the parameters shared by V and A, and α and β the parameters of V and A, respectively. The value of V can be regarded as the average of the Q values in state s, the values of A are constrained to have zero mean, and the sum of V and A recovers the original Q value;
since the values of A are constrained to sum to 0, the network preferentially updates V, which is the average of the Q values; adjusting this average is equivalent to updating all Q values in that state at once. The network thus does not merely update the Q value of a single action but adjusts the Q values of all actions in the state simultaneously.
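The value/advantage decomposition and zero-mean constraint described above can be illustrated with a toy dueling head. The linear streams and dimensions below are illustrative stand-ins for the network of fig. 1, not the patent's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class DuelingHead:
    """Dueling aggregation: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).

    A toy linear version of the two streams; real DM-DQN uses deeper
    networks, so the layer shapes here are illustrative only.
    """

    def __init__(self, state_dim: int, n_actions: int):
        self.w_v = rng.normal(scale=0.1, size=(state_dim, 1))          # value stream
        self.w_a = rng.normal(scale=0.1, size=(state_dim, n_actions))  # advantage stream

    def q_values(self, s: np.ndarray) -> np.ndarray:
        v = s @ self.w_v                                 # (batch, 1)
        a = s @ self.w_a                                 # (batch, n_actions)
        a_centered = a - a.mean(axis=1, keepdims=True)   # zero-mean constraint
        return v + a_centered                            # (batch, n_actions)
```

Because the advantage stream is centered, the mean of the Q values over actions equals V(s), so an update to V shifts all of a state's Q values at once, which is exactly the property the text describes.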
Further, in robot path planning, the value function learns the situation in which the robot detects no obstacle, while the advantage function learns the situation in which the robot detects an obstacle. To solve the identifiability problem, the advantage function is centralized:
Q(s, a, ω, α, β) = V(s, ω, α) + [ A(s, a, ω, β) − (1/|A|) Σ_{a′} A(s, a′, ω, β) ]
where s denotes the state, a the action, a′ ranges over the selectable actions with |A| their number, ω the parameters shared by V and A, and α and β the parameters of V and A, respectively.
Further, the reward function is designed according to the artificial potential field method and is decomposed into two parts. The first part is the position reward function, which comprises a target-guidance reward and an obstacle avoidance reward: the target-guidance reward guides the robot to reach the target point quickly, while the obstacle avoidance reward keeps the robot at a certain distance from obstacles;
the second part is the direction reward function. The robot's current heading plays a key role in rational navigation, and since the direction of the resultant force acting on the robot in the artificial potential field fits the robot's movement direction well, the direction reward function is designed to guide the robot toward the target point.
Further, in the position reward function, the target-guidance reward function is first constructed using the attractive (gravitational) potential field function:
where ζ denotes the attractive reward function constant and d_goal the distance between the current position and the target point;
secondly, the obstacle avoidance reward function is constructed using the repulsive potential field function; this reward is negative and decreases as the distance between the robot and the obstacle decreases:
where η denotes the repulsive reward function constant, d_obs the distance between the current position and the obstacle, and d_max the maximum influence distance of the obstacle.
Further, in the direction reward function, the angular difference between the robot's expected direction and actual direction is expressed as:
where F_q denotes the expected direction, F_a the actual direction, and the angle term the angle between the expected and actual directions;
thus, the direction reward function can be expressed as:
Further, the total reward function of the mobile robot is expressed as:
where r_goal denotes the radius of the target area centered on the target point and r_obs the radius of the influence area centered on the obstacle.
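The reward formulas themselves are rendered as images in the source, so the sketch below is only one plausible shaping consistent with the surrounding description: an attractive term that improves as the robot nears the goal, a repulsive penalty active within d_max of an obstacle, a heading term, and terminal rewards inside r_goal and r_obs. All constants and functional forms are assumptions.

```python
import math

# Illustrative constants -- the patent does not disclose its values.
ZETA, ETA = 1.0, 0.5          # attractive / repulsive gains
D_MAX = 1.0                   # obstacle influence distance (m)
R_GOAL, R_OBS = 0.2, 0.15     # goal radius / collision radius (m)

def total_reward(d_goal: float, d_obs: float, angle_diff: float) -> float:
    """Artificial-potential-field-style reward (assumed shaping).

    d_goal: distance to the target, d_obs: distance to the nearest obstacle,
    angle_diff: |angle| between expected and actual heading, in radians.
    """
    if d_goal <= R_GOAL:       # reached the target area
        return 100.0
    if d_obs <= R_OBS:         # entered the obstacle's collision core
        return -100.0
    r_att = -ZETA * d_goal     # attraction: closer to goal, less negative
    # Repulsion: penalty only inside the influence distance, growing as d_obs shrinks.
    r_rep = -ETA * (1.0 / d_obs - 1.0 / D_MAX) if d_obs < D_MAX else 0.0
    r_dir = math.cos(angle_diff)  # direction term: reward heading toward the goal
    return r_att + r_rep + r_dir
```

The repulsive term mirrors the text's requirement that the reward become more negative as the robot approaches an obstacle, which is what pushes the learned trajectory away from obstacle edges.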
A simulation environment is designed; the mobile robot interacts with the environment to acquire training data, which is sampled to train the robot in simulation and complete collision-free path planning;
and step three, training the DM-DQN algorithm to obtain an experience reward value, and completing the collision-free path planning of the robot.
The specific experimental steps are as follows:
a virtual simulation environment is created through a Gazebo simulator, and a robot model is established through the Gazebo to realize a path planning task, wherein the simulation environment comprises a static environment and a dynamic and static environment as shown in FIG. 2, FIG. 2(a) is the static environment, and FIG. 2(b) is the dynamic environment;
and implementing a path planning algorithm by adopting a python language and calling a built-in Gazebo simulator to control the motion of the robot and acquire the perception information of the robot.
The DM-DQN algorithm obtains experience reward values through 320 rounds of simulation training, as shown in FIG. 3. FIGS. 3(a) and (b) record, for the static and the dynamic-and-static environments respectively, the cumulative reward of each round and the agent's average reward; each point represents one round and the black curve represents the average reward. Because DM-DQN adopts a dueling network structure that decouples action selection from action evaluation, it learns faster and makes fuller use of early environmental exploration experience, thereby obtaining greater reward.
Seven waypoints are designated for the robot to navigate. In an unknown environment, the robot autonomously travels without collision from position No. 1 through positions No. 2 to No. 7 in sequence and then returns to position No. 1, achieving collision-free path planning, as shown in FIG. 4.
As shown in Table 1, the DM-DQN algorithm of the invention is compared with existing algorithms under the same training conditions in terms of the average number of moves to a target point and the number of successful arrivals over 300 rounds. The table shows that DM-DQN has the fewest average moves, and its success count is 50% higher than that of the DQN algorithm, 23.6% higher than that of Dueling DQN, and 19.3% higher than that of M-DQN.
TABLE 1
In light of the foregoing description of the preferred embodiments, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the invention is not limited to the content of the specification and must be determined according to the scope of the claims.
Claims (7)
1. A DM-DQN-based mobile robot path planning method is characterized by comprising the following steps:
step one, establishing a mobile robot path planning model based on DM-DQN;
step two, designing the state space, action space, DM-DQN network model, and reward function of the DM-DQN algorithm;
and step three, training the DM-DQN algorithm to obtain an experience reward value, and completing the collision-free path planning of the robot.
2. The DM-DQN-based mobile robot path planning method according to claim 1, wherein the DM-DQN network model is divided into a value function V(s, ω, α) and an advantage function A(s, a, ω, β), and the output of the DM-DQN network model is expressed as:
Q(s, a, ω, α, β) = V(s, ω, α) + A(s, a, ω, β) (4)
where s denotes the state, a the action, ω the parameters shared by V and A, α and β the parameters of V and A, respectively, and the value of V is the average of the Q values in state s.
3. The DM-DQN-based mobile robot path planning method according to claim 2, wherein the advantage function is centralized and the output of the DM-DQN network model is expressed as:
Q(s, a, ω, α, β) = V(s, ω, α) + [ A(s, a, ω, β) − (1/|A|) Σ_{a′} A(s, a′, ω, β) ]
where s denotes the state, a the action, a′ ranges over the selectable actions with |A| their number, ω the parameters shared by V and A, and α and β the parameters of V and A, respectively.
4. The DM-DQN based mobile robot path planning method of claim 1, wherein: the reward function is divided into a position reward function and a direction reward function, and a total reward function is calculated according to the position reward function and the direction reward function.
5. The DM-DQN-based mobile robot path planning method according to claim 4, wherein in the position reward function, a target-guidance reward function is first constructed using the attractive (gravitational) potential field function:
where ζ denotes the attractive reward function constant and d_goal the distance between the current position and the target point;
secondly, an obstacle avoidance reward function is constructed using the repulsive potential field function:
where η denotes the repulsive reward function constant, d_obs the distance between the current position and the obstacle, and d_max the maximum influence distance of the obstacle.
6. The DM-DQN-based mobile robot path planning method according to claim 4, wherein the direction reward function is expressed in terms of the angular difference between the robot's expected and actual directions, the angular difference being given by:
where F_q denotes the expected direction, F_a the actual direction, and the angle term the angle between the expected and actual directions;
the direction reward function is expressed as:
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210673628.2A | 2022-06-13 | 2022-06-13 | DM-DQN-based mobile robot path planning method (CN115047878A) |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115047878A | 2022-09-13 |
Family
- ID=83161444

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210673628.2A | DM-DQN-based mobile robot path planning method | 2022-06-13 | 2022-06-13 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN115047878A |
Cited By (7)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN116382304A | 2023-05-26 | 2023-07-04 | DQN model-based multi-inspection robot collaborative path planning method and system |
| CN116382304B | 2023-05-26 | 2023-09-15 | DQN model-based multi-inspection robot collaborative path planning method and system |
| CN116527567A | 2023-06-30 | 2023-08-01 | Intelligent network path optimization method and system based on deep reinforcement learning |
| CN116527567B | 2023-06-30 | 2023-09-12 | Intelligent network path optimization method and system based on deep reinforcement learning |
| CN117474295A | 2023-12-26 | 2024-01-30 | Multi-AGV load balancing and task scheduling method based on the Dueling DQN algorithm |
| CN117474295B | 2023-12-26 | 2024-04-26 | Dueling DQN algorithm-based multi-AGV load balancing and task scheduling method |
| CN118534913A | 2024-07-25 | 2024-08-23 | Large ship path planning method integrating DQN and artificial potential field |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |