CN115268494A - Unmanned aerial vehicle path planning method based on layered reinforcement learning - Google Patents

Unmanned aerial vehicle path planning method based on layered reinforcement learning

Info

Publication number
CN115268494A
Authority
CN
China
Prior art keywords
algorithm
unmanned aerial
aerial vehicle
path
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210883240.5A
Other languages
Chinese (zh)
Other versions
CN115268494B (en)
Inventor
王琦
潘德民
王栋
高尚
于化龙
崔弘杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202210883240.5A priority Critical patent/CN115268494B/en
Publication of CN115268494A publication Critical patent/CN115268494A/en
Application granted granted Critical
Publication of CN115268494B publication Critical patent/CN115268494B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which comprises the following steps. Step 1: initialize a deep Q network algorithm and a Q learning algorithm. Step 2: drive the unmanned aerial vehicle to move from the starting point to the target point, and train the deep Q network algorithm and the Q learning algorithm; when the unmanned aerial vehicle does not detect a dynamic obstacle while moving, the path is planned with the deep Q network algorithm; when the unmanned aerial vehicle detects a dynamic obstacle while moving, the path is planned with the Q learning algorithm. Step 3: repeat step 2 until training of the deep Q network algorithm and the Q learning algorithm is completed, set the actual coordinates, starting point coordinates and target point coordinates of the unmanned aerial vehicle, and plan the path with the trained deep Q network algorithm and Q learning algorithm. The invention overcomes the problem that network fitting is easily disturbed by dynamic obstacles when a single algorithm is applied to a dynamic environment, and improves path planning performance.

Description

Unmanned aerial vehicle path planning method based on layered reinforcement learning
Technical Field
The invention relates to the technical field of unmanned aerial vehicle path planning, in particular to an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning.
Background
In recent years, the wide application of unmanned aerial vehicles in many military and civil fields has strengthened the demand for their autonomy, and autonomous path planning is a key research topic. Current research on unmanned aerial vehicle path planning focuses mostly on static environments, with far less work on dynamic environments. In the prior art, reinforcement learning has become a popular path planning method because of its reward-and-punishment mechanism and its ability to learn an optimal strategy autonomously through interaction with the environment. Q learning, the most classical reinforcement learning algorithm, is widely applied to the unmanned aerial vehicle path planning problem. However, because it learns a table, Q learning cannot be applied to scenes with complex environments or high-dimensional state spaces. Deep reinforcement learning, which combines reinforcement learning with deep learning, has therefore been proposed and applied to various complicated unmanned aerial vehicle path planning problems, the most widely used method being the deep Q network (DQN) algorithm.
However, the inventors found that when implementing dynamic path planning for an unmanned aerial vehicle based on the deep Q network algorithm, the reinforcement learning algorithm explores with randomly selected actions, which leads to low efficiency in the early stage of training, too many iterations, and a planned path that is not optimal. This is aggravated in complex environments where dynamic and static obstacles coexist. In addition, when a single deep Q network algorithm faces a dynamic environment, the network fits poorly during training because the positions of the dynamic obstacles are not fixed, and the finally trained network also performs poorly.
Therefore, the prior art has the technical problems that training efficiency is low and network fitting is easily disturbed.
Disclosure of Invention
The invention provides an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, aiming to solve the problems in the prior art that training efficiency is low and network fitting is easily disturbed.
The invention provides an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which comprises the following steps:
step 1: initializing a deep Q network algorithm and a Q learning algorithm;
step 2: driving the unmanned aerial vehicle to move from a starting point to a target point, and training the deep Q network algorithm and the Q learning algorithm;
when the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, planning a path by using the deep Q network algorithm;
when the unmanned aerial vehicle detects a dynamic obstacle in the moving process, planning a path by using a Q learning algorithm;
and step 3: repeating step 2 until the training of the deep Q network algorithm and the Q learning algorithm is completed, setting the actual coordinates, the starting point coordinates and the target point coordinates of the unmanned aerial vehicle, and planning the path with the trained deep Q network algorithm and the trained Q learning algorithm.
Further, when the unmanned aerial vehicle does not detect a dynamic obstacle, the deep Q network algorithm plans the path, and the method further comprises updating the Q learning algorithm with the experience tuple generated by the deep Q network algorithm after the current path step is planned. In this case, the reward function used to update the deep Q network algorithm remains consistent with its normal update;
when the unmanned aerial vehicle detects a dynamic obstacle, the Q learning algorithm plans the path, and the method further comprises updating the deep Q network algorithm with the experience tuple generated by the Q learning algorithm after the current path step is planned.
Further, when the Q learning algorithm is updated with the experience tuple generated by the deep Q network algorithm after the current path step is planned, the reward function used by the Q learning algorithm is:
reward = η(d_{s-1} - d_s)
where η is a constant, d_{s-1} is the distance from the unmanned aerial vehicle to the target point at the previous moment, and d_s is the distance from the unmanned aerial vehicle to the target point at the current moment.
Further, in step 2, before the path is planned by the deep Q network algorithm and the Q learning algorithm, the method further includes: using a heuristic fish algorithm as the action guide of the deep Q network algorithm and the Q learning algorithm during path planning; wherein the heuristic fish algorithm comprises a traveling behavior process and a foraging behavior process. The traveling behavior process acquires the directions in which the unmanned aerial vehicle would collide with surrounding obstacles; the foraging behavior process acquires a plurality of high-priority directions in which the unmanned aerial vehicle moves towards the target point, and the heuristic fish algorithm removes the collision directions from the high-priority directions and uses the remainder as the action guide.
Further, when acquiring the directions in which the unmanned aerial vehicle may collide with surrounding obstacles, if an obstacle is dynamic, whether the unmanned aerial vehicle will collide with it is judged according to the movement direction and movement speed of the obstacle.
The invention has the beneficial effects that:
the invention adds the action guidance strategy of the heuristic fish algorithm into the action selection strategy of the basic deep Q network algorithm and the Q learning algorithm. The method carries out action guidance on two aspects of fast reaching a target point and avoiding of dynamic and static obstacles, and the action guidance greatly reduces unnecessary exploration in the initial stage of algorithm training so as to reduce the blindness of original algorithm exploration.
The invention utilizes layered reinforcement learning to respectively process static and dynamic obstacles by using two algorithms when facing a dynamic complex environment. The design overcomes the problem that the network fitting is easily influenced by dynamic obstacles when a single algorithm is applied to a dynamic environment, and improves the performance of algorithm path planning.
The two effects respectively solve the problems that algorithm training efficiency is low and a planning path is lack of safety consideration in the prior art.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the detection of UAV sensors in an environment according to an embodiment of the present invention;
FIG. 3 is a flow chart of a heuristic fish algorithm according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a situation in foraging behavior of a heuristic fish algorithm described in an embodiment of the present invention;
FIG. 5 is a diagram illustrating a case of the traveling behavior of the heuristic fish algorithm described in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, the flow structure of the method is shown in figure 1, and the method comprises the following steps:
Step 1: initialize the network parameters θ of the deep Q network algorithm and the experience replay buffer,
together with the Q table of the Q learning algorithm; initialize the number of training rounds N_episode, and set the starting point P_O and the target point P_T of the unmanned aerial vehicle flight mission;
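As an illustrative sketch of step 1 only: the grid size, state encoding and network architecture below are assumptions, since the embodiment fixes only N_episode, the replay buffer, the Q table, P_O and P_T.
```python
# Minimal initialization sketch for step 1 (hypothetical choices marked below).
from collections import deque

import numpy as np
import torch.nn as nn

GRID = 30                 # assumed 30x30 grid (example 1 flies from [0, 0] to [29, 29])
N_ACTIONS = 8             # eight movement directions
N_EPISODE = 500           # total training rounds used in example 1
REPLAY_SIZE = 1_000_000   # experience replay buffer size used in example 1

# Deep Q network with parameters theta: maps a state encoding to 8 action values.
# The 4-dimensional state encoding and layer sizes are assumptions for illustration.
q_network = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
replay_buffer = deque(maxlen=REPLAY_SIZE)

# Q table of the Q learning algorithm, indexed by (x, y, action).
q_table = np.zeros((GRID, GRID, N_ACTIONS))

start_point = np.array([0, 0])     # P_O
target_point = np.array([29, 29])  # P_T
```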
Step 2: when the current number of training rounds is less than the set maximum, the state and the environment are reset and a new training round starts. The environment is detected with the sensor to judge whether a dynamic obstacle exists within the detection range; the sensor detection range is shown in FIG. 2;
when the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, planning a path by using a depth Q network algorithm;
and the depth Q network algorithm selects and executes actions according to the current position of the unmanned aerial vehicle and the position information of the static obstacle by using a heuristic fish algorithm as an action guide of the algorithm, and then reaches the next state. For the current action, the reward can be obtained by a reward function, and the embodiment of the invention sets the reward function of the static path planning part as follows:
[static path planning reward function: formula given as an image in the original, combining a target-approach term weighted by α and a static-obstacle-distance term weighted by β]
α and β are constants that determine the weights of the two reward terms in the total reward function. According to experimental tuning, this example sets α = 1.1 and β = 2. d_{s-1} denotes the distance between the unmanned aerial vehicle and the target point in the previous state; d_s denotes the distance between the unmanned aerial vehicle and the target point in the next state; the remaining symbol (also given as an image) denotes the distance from the unmanned aerial vehicle to each static obstacle.
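Because the formula itself appears only as an image, the following sketch shows one plausible reading of the static reward: an α-weighted approach term plus a β-weighted penalty that grows as the unmanned aerial vehicle nears a static obstacle. The penalty shape and the safe_dist threshold are assumptions, not the patented formula.
```python
import numpy as np

ALPHA, BETA = 1.1, 2.0   # weights used in this embodiment

def static_reward(d_prev, d_curr, static_obstacle_dists, safe_dist=2.0):
    """Hypothetical static-planning reward: alpha-weighted progress toward the
    target, minus a beta-weighted penalty when a static obstacle is closer than
    safe_dist. The penalty form is assumed, since the original formula image is
    not reproduced in the text."""
    approach = ALPHA * (d_prev - d_curr)          # positive when moving toward the target
    nearest = min(static_obstacle_dists) if len(static_obstacle_dists) else np.inf
    penalty = BETA * max(0.0, safe_dist - nearest)
    return approach - penalty
```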
The experience tuple [S, A, R, S'] consisting of the current state, the action, the reward and the next state obtained from the interaction is stored in the experience replay buffer. The algorithm then samples data from the experience replay buffer according to the set batch size m and updates the Q network of the deep Q network algorithm.
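A hedged sketch of this sampled update follows; the target network, discount factor and mean-squared-error loss are standard deep Q network ingredients assumed here, since the text only states that a batch of size m is sampled from the replay buffer.
```python
import random

import numpy as np
import torch
import torch.nn.functional as F

def dqn_update(q_network, target_network, optimizer, replay_buffer, m=16, gamma=0.9):
    """One sketched deep Q network update from m sampled transitions
    (s, a, r, s_next, done). The terminal flag, gamma and the target network are
    implementation conveniences not spelled out in the [S, A, R, S'] tuple."""
    if len(replay_buffer) < m:
        return
    batch = random.sample(list(replay_buffer), m)
    s, a, r, s_next, done = (np.array(x, dtype=np.float32) for x in zip(*batch))
    s, r, s_next, done = map(torch.as_tensor, (s, r, s_next, done))
    a = torch.as_tensor(a, dtype=torch.int64)
    q_sa = q_network(s).gather(1, a.view(-1, 1)).squeeze(1)          # Q(s, a)
    with torch.no_grad():                                            # bootstrapped target
        target = r + gamma * target_network(s_next).max(dim=1).values * (1 - done)
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```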
Meanwhile, when switching between the deep Q network algorithm and the Q learning algorithm, if the algorithm that is not in use stopped learning entirely, Q values for some state-action pairs would be missing after training. To avoid this, while the deep Q network algorithm is working, the Q table of the Q learning algorithm is also updated with the experience tuple generated by the interaction in the previous step; because no dynamic obstacle is within the range of the drone's sensor while the Q learning algorithm is not working, its reward function is defined as:
reward = η(d_{s-1} - d_s)
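A sketch of this cross-update is shown below; the value of η, the learning rate and the discount factor are assumed, and q_table, s and s_next follow the (x, y, action) indexing of the initialization sketch above.
```python
ETA = 1.0                  # assumed value of the constant eta
LR, DISCOUNT = 0.1, 0.9    # assumed Q-learning step size and discount factor

def cross_update_q_table(q_table, s, a, s_next, d_prev, d_curr):
    """Keep the Q table warm while the deep Q network is working: the transition
    produced by the DQN step also updates the Q table, using the simplified
    reward eta * (d_{s-1} - d_s) because no dynamic obstacle is in sensor range."""
    reward = ETA * (d_prev - d_curr)
    best_next = q_table[s_next].max()                         # max over next-state actions
    q_table[s][a] += LR * (reward + DISCOUNT * best_next - q_table[s][a])
    return reward
```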
finally, if the action taken by the drone this time results in a collision, ending and starting a new training round; if no collision is caused, the training of the current round is continued.
When the unmanned aerial vehicle detects a dynamic barrier in the moving process, planning a path by using a Q learning algorithm;
and the Q learning algorithm selects and executes actions according to the current position of the unmanned aerial vehicle and the information of the detected dynamic obstacles by using a heuristic fish algorithm as the action guidance of the algorithm, and the next state is reached. For the reward function of the dynamic path planning part, the embodiment of the invention sets the reward function as follows:
[dynamic path planning reward function: formula given as an image in the original, combining a target-approach term weighted by γ and a dynamic-obstacle-avoidance term weighted by δ]
γ and δ are weight constants; according to experimental tuning, this example sets γ = 1.1 and δ = 1. d'_{u→t} and d_{u→t} denote the distances between the unmanned aerial vehicle and the target point at the previous moment and the current moment, respectively; d'_{u→o} and d_{u→o} denote the distances between the unmanned aerial vehicle and the detected dynamic obstacle at the previous moment and the current moment, respectively.
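As with the static case, the formula itself appears only as an image; the sketch below is a plausible additive reading with a γ-weighted approach term and a δ-weighted term that rewards increasing the distance to the detected dynamic obstacle.
```python
GAMMA_W, DELTA_W = 1.1, 1.0   # the weight constants gamma and delta of this embodiment

def dynamic_reward(d_target_prev, d_target_curr, d_obstacle_prev, d_obstacle_curr):
    """Hypothetical dynamic-planning reward: progress toward the target weighted
    by gamma, plus increased separation from the dynamic obstacle weighted by
    delta. The additive form is an assumption."""
    approach = GAMMA_W * (d_target_prev - d_target_curr)
    avoidance = DELTA_W * (d_obstacle_curr - d_obstacle_prev)   # positive when moving away
    return approach + avoidance
```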
Then, the Q table of the Q learning algorithm is updated with the experience tuple [S, A, R, S'] obtained from the interaction.
Similarly, the network of the deep Q network algorithm is updated with the experience tuple obtained from the interaction in the previous step. In this case, the reward function is consistent with the reward function used when the deep Q network algorithm itself performs static path planning.
Finally, if the action taken by the drone this time results in a collision, ending and starting a new training round; if no collision is caused, the training of the current round is continued.
Step 3: repeat step 2, ending the current round when the unmanned aerial vehicle reaches the target point. When the number of training rounds reaches the set maximum N_episode, training of the deep Q network algorithm and the Q learning algorithm is finished. At this point, the actual coordinates, starting point coordinates and target point coordinates of the unmanned aerial vehicle are set, and the path is planned with the trained deep Q network algorithm and Q learning algorithm.
In step 2, before the path is planned by the deep Q network algorithm and the Q learning algorithm, the method further comprises: using a heuristic fish algorithm as the action guide of the deep Q network algorithm and the Q learning algorithm during path planning. The heuristic fish algorithm is inspired by the way fish in nature forage with their lateral-line organs in dark environments, and comprises a traveling behavior process and a foraging behavior process: the traveling behavior process acquires the directions in which the unmanned aerial vehicle would collide with surrounding obstacles, the foraging behavior process acquires several high-priority directions in which the unmanned aerial vehicle moves towards the target point, and the heuristic fish algorithm removes the collision directions from the high-priority directions and uses the remainder as the action guide. The algorithm flow is shown in FIG. 3 and comprises the following steps:
step 21: when the depth Q network algorithm or the Q learning algorithm calls the heuristic fish algorithm to select actions, the current state, the position of a target point and information containing dynamic and static obstacles are input into the heuristic fish algorithm. The experimental environment adopted by the invention is a grid environment, the unmanned aerial vehicle can take actions in eight directions, and the heuristic fish algorithm is responsible for selecting the optimal action in the current state from the actions.
Step 22: the foraging behavior calculates a set of selectable actions according to the current state and the target point position, as shown in FIG. 4. Let L_u→t be the direction vector from the current position of the unmanned aerial vehicle to the target point and L_horizontal be the unit vector in the forward direction of the unmanned aerial vehicle; the included angle between the two vectors is:
θ_t = arccos( (L_u→t · L_horizontal) / (|L_u→t| |L_horizontal|) )
Next, L_action is the unit direction vector of an action in the action space A; the included angle between each action and L_horizontal is:
θ_action = arccos( (L_action · L_horizontal) / (|L_action| |L_horizontal|) )
The difference between θ_t and each θ_action is then:
Δθ_action = |θ_t - θ_action|
and finally, giving priority to each action from high to low according to the difference from small to large, and returning to the action set with the first five priorities.
Step 23: the traveling behavior calculates an optional set of actions that will not cause a collision based on the current state and the information of the dynamic and static obstacles, as shown in fig. 5, where the gray squares represent static obstacles and the slashed squares represent dynamic obstacles.
For static obstacle avoidance, the position information of the static obstacles is used: if executing an action would take the unmanned aerial vehicle into the area of a static obstacle, that action is marked as forbidden in the current state, and the available actions are returned.
For dynamic obstacle avoidance, the threat area of the dynamic obstacle at the next moment is predicted from the information set [speed, direction, position] detected by the sensor: if executing an action would take the unmanned aerial vehicle into the threat area, that action is marked as forbidden in the current state, and the available actions are returned.
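A matching sketch of the traveling behavior is given below (the assumed DIRS table from the foraging sketch is repeated so the snippet is self-contained); the one-cell-per-step prediction and the shape of the threat area are assumptions.
```python
import numpy as np

DIRS = {
    "right": (1, 0), "right front": (1, 1), "front": (0, 1), "left front": (-1, 1),
    "left": (-1, 0), "left back": (-1, -1), "back": (0, -1), "right back": (1, -1),
}

def traveling_behavior(uav_pos, static_cells, dynamic_obstacles):
    """Sketch of step 23: forbid actions that would move the UAV into a static
    obstacle cell, the current cell of a dynamic obstacle, or its predicted
    next-moment cell derived from [speed, direction, position]."""
    threat = {tuple(int(v) for v in c) for c in static_cells}
    for speed, direction, position in dynamic_obstacles:
        pos = np.asarray(position)
        threat.add(tuple(int(v) for v in pos))                       # current cell
        predicted = pos + speed * np.asarray(DIRS[direction])        # next-moment cell
        threat.add(tuple(int(v) for v in predicted))
    allowed = []
    for name, vec in DIRS.items():
        cell = tuple(int(v) for v in np.asarray(uav_pos) + np.asarray(vec))
        if cell not in threat:
            allowed.append(name)
    return allowed
```
In step 24, the foraging priorities are intersected with this allowed set to obtain the guided actions.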
Step 24: the actions returned in step 22 and step 23 are combined, and the actions that both have high priority and cause no collision are returned to the deep Q network algorithm or the Q learning algorithm. The call then ends.
The specific embodiment process is exemplified in a simulation manner, which specifically includes the following steps:
example 1: layered reinforcement learning
Step 1: initialize the network parameters of the deep Q network algorithm and set the experience replay buffer size as
1000000; initialize the Q table of the Q learning algorithm. Set the total number of training rounds to 500, the starting point of the unmanned aerial vehicle flight task to P_O = [0, 0], and the target point to P_T = [29, 29];
Step 2: the sensor detection range is set to 3 as shown in fig. 2.
If no dynamic obstacle exists within the current detection range of the unmanned aerial vehicle, the deep Q network algorithm is called to plan the static path, and the heuristic fish algorithm is called to select the action. The drone executes the selected action, enters the next state, and receives the reward for the action. The algorithm stores the experience tuple in the experience replay buffer, updates the network parameters with samples drawn from the replay buffer according to the set batch size m = 16, and updates the Q table of the Q learning algorithm with the experience tuple.
If a dynamic obstacle exists within the detection range, as in the situation shown in FIG. 2, the Q learning algorithm is called for dynamic path planning. The heuristic fish algorithm is called to select the action; the unmanned aerial vehicle then executes the selected action, enters the next state, and obtains the reward for the action. Finally, the Q learning algorithm updates its Q table with the experience tuple, and the network of the deep Q network algorithm is also updated with the experience tuple.
Step 3: the unmanned aerial vehicle interacts with the environment continuously: detect dynamic obstacles → switch algorithm → select action → execute action → calculate reward → update Q network / Q table, until it collides with an obstacle or reaches the target point, which ends the current round. When the total number of training rounds reaches the set N_episode, the whole training is finished.
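The loop of example 1 can be summarised by the skeleton below; env, dqn_step, qlearn_step and detect_dynamic are hypothetical interfaces standing in for the components described above, not names used in the patent.
```python
def train_hierarchical(env, dqn_step, qlearn_step, detect_dynamic, n_episode=500):
    """Skeleton of the hierarchical training loop: every interaction first checks
    the sensor, then hands control to the deep Q network branch (no dynamic
    obstacle in range) or the Q learning branch (dynamic obstacle detected);
    each branch also cross-updates the other learner, as described above."""
    for episode in range(n_episode):
        state = env.reset()                       # reset state and environment
        done = False
        while not done:                           # until collision or target reached
            if detect_dynamic(state):             # dynamic obstacle within sensor range
                state, done = qlearn_step(state)  # Q learning plans; DQN also updated
            else:
                state, done = dqn_step(state)     # DQN plans; Q table also updated
```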
Example 2: heuristic fish algorithm
Step 1: the heuristic fish algorithm is called by the deep Q network algorithm or the Q learning algorithm, with the current state, the target point position, and the dynamic and static obstacle information as inputs. The foraging behavior and the traveling behavior then each produce a set of available actions.
Step 2: θ_t is calculated from the current state and the target point position, and θ_action is calculated for each action; the difference between θ_t and each θ_action is then computed, the eight actions are assigned different priorities according to these differences, and the actions with the top five priorities are returned. Referring to FIG. 4, the returned set of priority actions in this case is [left front, left, right front, left back].
Step 3: the traveling behavior returns the actions that do not cause a collision according to the information of the static and dynamic obstacles. For a static obstacle, whose position is fixed, any action that enters its area is forbidden; for a dynamic obstacle, its position at the next moment is predicted with the set [speed, direction, position], and any action that enters that area is forbidden. In the traveling behavior scenario shown in FIG. 5, the gray box is a static obstacle and the hatched box is a dynamic obstacle; the information of the dynamic obstacle is [1, left, current position], so at the next moment it occupies the marked area in the figure. Finally, the actions [left, right rear] that would cause a collision are removed, and the remaining 6 actions are selectable.
Step 4: the actions returned in step 2 and step 3 are combined, the selectable action set [front left, front right, front left, back left] is returned, and the call ends.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (5)

1. An unmanned aerial vehicle path planning method based on layered reinforcement learning is characterized by comprising the following steps:
step 1: initializing a deep Q network algorithm and a Q learning algorithm;
step 2: driving the unmanned aerial vehicle to move from a starting point to a target point, and training the deep Q network algorithm and the Q learning algorithm;
when the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, planning a path by using the deep Q network algorithm;
when the unmanned aerial vehicle detects a dynamic obstacle in the moving process, planning a path by using the Q learning algorithm;
and step 3: repeating step 2 until the training of the deep Q network algorithm and the Q learning algorithm is completed, setting the actual coordinates, the starting point coordinates and the target point coordinates of the unmanned aerial vehicle, and planning the path with the trained deep Q network algorithm and the trained Q learning algorithm.
2. The method for unmanned aerial vehicle path planning based on hierarchical reinforcement learning of claim 1, wherein when the unmanned aerial vehicle does not detect a dynamic obstacle, the deep Q network algorithm plans the path, and the method further comprises updating the Q learning algorithm with the experience tuple generated by the deep Q network algorithm after the current path step is planned;
when the unmanned aerial vehicle detects a dynamic obstacle, the Q learning algorithm plans the path, and the method further comprises updating the deep Q network algorithm with the experience tuple generated by the Q learning algorithm after the current path step is planned.
3. The unmanned aerial vehicle path planning method based on hierarchical reinforcement learning of claim 2, wherein when the Q learning algorithm is updated with the experience tuple generated by the deep Q network algorithm after the current path step is planned, the reward function used by the Q learning algorithm is:
reward = η(d_{s-1} - d_s)
where η is a constant, d_{s-1} is the distance from the unmanned aerial vehicle to the target point at the previous moment, and d_s is the distance from the unmanned aerial vehicle to the target point at the current moment.
4. The method for unmanned aerial vehicle path planning based on hierarchical reinforcement learning according to claim 1, wherein in step 2, before the path is planned by the deep Q network algorithm and the Q learning algorithm, the method further includes: using a heuristic fish algorithm as the action guide of the deep Q network algorithm and the Q learning algorithm during path planning; wherein the heuristic fish algorithm comprises a traveling behavior process and a foraging behavior process, the traveling behavior process acquires the directions in which the unmanned aerial vehicle would collide with surrounding obstacles, the foraging behavior process acquires a plurality of high-priority directions in which the unmanned aerial vehicle moves towards the target point, and the heuristic fish algorithm removes the collision directions from the plurality of high-priority directions and uses the remainder as the action guide.
5. The method for planning the path of the unmanned aerial vehicle based on hierarchical reinforcement learning of claim 4, wherein when acquiring the directions in which the unmanned aerial vehicle may collide with surrounding obstacles, if an obstacle is dynamic, whether the unmanned aerial vehicle will collide with it is judged according to the movement direction and the movement speed of the obstacle.
CN202210883240.5A 2022-07-26 2022-07-26 Unmanned aerial vehicle path planning method based on layered reinforcement learning Active CN115268494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210883240.5A CN115268494B (en) 2022-07-26 2022-07-26 Unmanned aerial vehicle path planning method based on layered reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210883240.5A CN115268494B (en) 2022-07-26 2022-07-26 Unmanned aerial vehicle path planning method based on layered reinforcement learning

Publications (2)

Publication Number Publication Date
CN115268494A true CN115268494A (en) 2022-11-01
CN115268494B CN115268494B (en) 2024-05-28

Family

ID=83769868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210883240.5A Active CN115268494B (en) 2022-07-26 2022-07-26 Unmanned aerial vehicle path planning method based on layered reinforcement learning

Country Status (1)

Country Link
CN (1) CN115268494B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019147235A1 (en) * 2018-01-24 2019-08-01 Ford Global Technologies, Llc Path planning for autonomous moving devices
CN109992000A (en) * 2019-04-04 2019-07-09 北京航空航天大学 A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning
CN113821041A (en) * 2021-10-09 2021-12-21 中山大学 Multi-robot collaborative navigation and obstacle avoidance method
CN114003059A (en) * 2021-11-01 2022-02-01 河海大学常州校区 UAV path planning method based on deep reinforcement learning under kinematic constraint condition
CN114529061A (en) * 2022-01-26 2022-05-24 江苏科技大学 Method for automatically predicting garbage output distribution and planning optimal transportation route
CN114527759A (en) * 2022-02-25 2022-05-24 重庆大学 End-to-end driving method based on layered reinforcement learning
CN114518770A (en) * 2022-03-01 2022-05-20 西安交通大学 Unmanned aerial vehicle path planning method integrating potential field and deep reinforcement learning
CN114625151A (en) * 2022-03-10 2022-06-14 大连理工大学 Underwater robot obstacle avoidance path planning method based on reinforcement learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DEMIN PAN,等: "D3QHF: A Hybrid Double-deck Heuristic Reinforcement Learning Approach for UAV Path Planning", IEEE, 31 December 2022 (2022-12-31), pages 1221 - 1226 *
QI WANG, 等: "Study on interface temperature control of laser direct joining of CFRTP and aluminum alloy based on staged laser path planning", OPTICS AND LASER TECHNOLOGY, vol. 154, 9 June 2022 (2022-06-09), pages 1 - 13 *
唐博文,等: "基于事件驱动的无人机强化学习避障研究", 广西科技大学学报, no. 1, 31 March 2019 (2019-03-31), pages 96 - 102 *
程先峰,严勇杰: "基于MAXQ分层强化学习的有人机/无人机协同路径规划研究", 信息化研究, vol. 46, no. 1, 29 February 2020 (2020-02-29), pages 13 - 19 *
陈开元,等: "基于分数阶MRAC 的四旋翼姿态控制", 电光与控制, vol. 28, no. 12, 31 December 2021 (2021-12-31), pages 1 - 5 *

Also Published As

Publication number Publication date
CN115268494B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
Wang et al. Learning to navigate through complex dynamic environment with modular deep reinforcement learning
CN113110509B (en) Warehousing system multi-robot path planning method based on deep reinforcement learning
CN109254588B (en) Unmanned aerial vehicle cluster cooperative reconnaissance method based on cross variation pigeon swarm optimization
US20220315219A1 (en) Air combat maneuvering method based on parallel self-play
Botteghi et al. On reward shaping for mobile robot navigation: A reinforcement learning and SLAM based approach
Huq et al. Mobile robot navigation using motor schema and fuzzy context dependent behavior modulation
CN113534819A (en) Method and storage medium for pilot-follow multi-agent formation path planning
CN111723931B (en) Multi-agent confrontation action prediction method and device
CN114967721B (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
Xue et al. Multi-agent deep reinforcement learning for UAVs navigation in unknown complex environment
Han et al. Multi-uav automatic dynamic obstacle avoidance with experience-shared a2c
CN115237151A (en) Multi-moving-object searching method for group unmanned aerial vehicle based on pheromone elicitation
Santos et al. Exploratory path planning using the Max-min ant system algorithm
CN115268494A (en) Unmanned aerial vehicle path planning method based on layered reinforcement learning
Panda et al. Autonomous mobile robot path planning using hybridization of particle swarm optimization and Tabu search
Liang et al. Hierarchical deep reinforcement learning for multi-robot cooperation in partially observable environment
Schwartz An object oriented approach to fuzzy actor-critic learning for multi-agent differential games
Duo et al. A deep reinforcement learning based mapless navigation algorithm using continuous actions
Patel et al. Scalable monte carlo tree search for cav s action planning in colliding scenarios
CN113189985B (en) Partially observable driving planning method based on adaptive particle and belief filling
CN115373415A (en) Unmanned aerial vehicle intelligent navigation method based on deep reinforcement learning
CN114386556A (en) Target source positioning and obstacle avoidance method based on tabu search and particle swarm optimization
Patel et al. Adaptive reward for CAV action planning using Monte Carlo tree search
CN111562740A (en) Automatic control method based on multi-target reinforcement learning algorithm utilizing gradient
Niu et al. A plume-tracing strategy via continuous state-action reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant