CN115268494B - Unmanned aerial vehicle path planning method based on layered reinforcement learning - Google Patents
- Publication number: CN115268494B
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Abstract
The invention discloses an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which comprises the following steps. Step 1: initializing a deep Q network algorithm and a Q learning algorithm. Step 2: driving the unmanned aerial vehicle to move from a starting point to a target point, and training the deep Q network algorithm and the Q learning algorithm; when the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, planning the path by using the deep Q network algorithm; when the unmanned aerial vehicle detects a dynamic obstacle in the moving process, planning the path by using the Q learning algorithm. Step 3: repeating step 2 until training of the deep Q network algorithm and the Q learning algorithm is completed, setting the actual coordinates, the starting point coordinates and the target point coordinates of the unmanned aerial vehicle, and planning a path through the trained deep Q network algorithm and the trained Q learning algorithm. The method overcomes the problem that network fitting is easily disturbed by dynamic obstacles when a single algorithm is applied to a dynamic environment, and improves path planning performance.
Description
Technical Field
The invention relates to the technical field of unmanned aerial vehicle path planning, in particular to an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning.
Background
In recent years, unmanned aerial vehicles have been widely applied in many military and civil fields, so the demand for autonomy has grown stronger, and autonomous path planning has become a focus of research. At present, most unmanned aerial vehicle path planning research concentrates on static environments; research on dynamic environments is relatively scarce. Among current path planning methods, reinforcement learning is a hot topic because of its reward-and-punishment mechanism and its ability to learn an optimal strategy autonomously through interaction with the environment. Q learning (Q-learning), the most classical reinforcement learning algorithm, is widely applied to the unmanned aerial vehicle path planning problem. However, because it learns a table, Q learning cannot be applied to scenes with complex environments or high-dimensional state spaces. Deep reinforcement learning, which combines reinforcement learning with deep learning, has therefore been proposed and applied to various complex path planning problems; the most widely used algorithm is the deep Q network (DQN) algorithm.
However, the inventor has found that in dynamic unmanned aerial vehicle path planning based on the deep Q network algorithm, the reinforcement learning agent explores by selecting actions at random, so efficiency in the early stage of training is low, too many iterations are required, and the planned path is not optimal. The situation is worse in complex environments where dynamic and static obstacles coexist. In addition, when a single deep Q network algorithm faces a dynamic environment, the positions of dynamic obstacles are not fixed, so the network fits poorly during training and the finally trained network performs poorly.
It can be seen that the prior art suffers from low training efficiency and network fitting that is easily disturbed.
Disclosure of Invention
The invention provides an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which aims to solve the prior-art problems of low training efficiency and easily disturbed network fitting.
The invention provides an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which comprises the following steps:
Step 1: initializing a deep Q network algorithm and a Q learning algorithm;
Step 2: driving the unmanned aerial vehicle to move from a starting point to a target point, and training a deep Q network algorithm and a Q learning algorithm;
When the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, planning a path by using the deep Q network algorithm;
When the unmanned aerial vehicle detects a dynamic obstacle in the moving process, planning a path by using a Q learning algorithm;
Step 3: and (3) repeating the step (2) until training of the deep Q network algorithm and the Q learning algorithm is completed, setting the actual coordinates, the starting point coordinates and the target point coordinates of the unmanned aerial vehicle, and planning a path through the trained deep Q network algorithm and the trained Q learning algorithm.
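The switching logic of steps 1 to 3 can be sketched as a small dispatch function. This is an illustrative sketch only: the sensor, dqn and qlearn objects and their method names are assumptions, not part of the claimed method.

```python
def plan_step(state, sensor, dqn, qlearn):
    """Select the next action with the layer that matches the scene:
    the tabular Q learning layer when a dynamic obstacle is detected,
    otherwise the deep Q network layer (hypothetical interfaces)."""
    if sensor.detects_dynamic_obstacle(state):
        return qlearn.select_action(state)
    return dqn.select_action(state)
```

In use, the same loop body runs every step, so the two layers alternate transparently as obstacles enter and leave the sensor range.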
Further, when the unmanned aerial vehicle does not detect a dynamic obstacle and the deep Q network algorithm plans the path, the method further comprises updating the Q learning algorithm with the experience tuple generated in the deep Q network algorithm for the currently planned path; at this time the reward function used to update the deep Q network algorithm itself is consistent with its normal update;
when the unmanned aerial vehicle detects a dynamic obstacle and the Q learning algorithm plans the path, the method further comprises updating the deep Q network algorithm with the experience tuple generated in the Q learning algorithm for the currently planned path.
Further, when the Q learning algorithm is updated with the experience tuple generated in the deep Q network algorithm for the currently planned path, the reward function used by the Q learning algorithm is:

reward = η(d_{s-1} - d_s)

wherein η is a constant; d_{s-1} is the distance from the unmanned aerial vehicle to the target point at the previous moment; d_s is the distance from the unmanned aerial vehicle to the target point at the current moment.
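As a minimal sketch, the reward above is just a scaled decrease in target distance; the default η = 1.0 below is an assumed value, since the text only states that η is a constant.

```python
def switch_period_reward(d_prev: float, d_curr: float, eta: float = 1.0) -> float:
    """reward = eta * (d_{s-1} - d_s): positive when the unmanned aerial
    vehicle moved closer to the target point, negative when it moved
    away.  eta = 1.0 is an assumed default, not taken from the patent."""
    return eta * (d_prev - d_curr)
```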
Further, in step 2, before the path is planned by the deep Q network algorithm and the Q learning algorithm, the method further comprises: using a heuristic fish algorithm as action guidance for the deep Q network algorithm and the Q learning algorithm during path planning; wherein the heuristic fish algorithm comprises a traveling behavior process and a foraging behavior process. The traveling behavior process acquires the directions in which the unmanned aerial vehicle would collide with surrounding obstacles; the foraging behavior process acquires several high-priority directions in which the unmanned aerial vehicle advances toward the target point; the heuristic fish algorithm removes the collision directions from the high-priority directions and uses the rest as action guidance.
Further, when acquiring the directions in which the unmanned aerial vehicle would collide with surrounding obstacles, and when an obstacle is dynamic, whether the unmanned aerial vehicle will collide with the obstacle is judged from the obstacle's movement direction and movement speed.
The invention has the beneficial effects that:
The invention adds the action guidance strategy of the heuristic fish algorithm to the action selection strategies of the basic deep Q network algorithm and Q learning algorithm. Action guidance is applied both to reaching the target point quickly and to avoiding dynamic and static obstacles; it greatly reduces unnecessary exploration in the early stage of training and thereby reduces the blindness of the original algorithms' exploration.
The invention utilizes hierarchical reinforcement learning: when facing a dynamic, complex environment, two algorithms handle static and dynamic obstacles respectively. This design overcomes the problem that network fitting is easily disturbed by dynamic obstacles when a single algorithm is applied to a dynamic environment, and improves path planning performance.
These two effects respectively solve the prior-art problems that algorithm training is inefficient and that the planned path lacks safety considerations.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a schematic view of detection of a drone sensor in an environment according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a heuristic fish algorithm according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a foraging behavior of a heuristic fish algorithm according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a traveling behavior of a heuristic fish algorithm according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which has a flow structure shown in figure 1 and comprises the following steps:
Step 1: initialize the network parameters θ of the deep Q network algorithm, the experience replay buffer, and the Q table for Q learning; initialize the number of training rounds N_episode, and set the starting point P_O and target point P_T of the unmanned aerial vehicle's flight task;
Step 2: when the number of completed training rounds is smaller than the set maximum, reset the state and the environment and begin this round of training. Detect the environment with the sensor and judge whether a dynamic obstacle exists within the detection range; the sensor's detection range is shown in FIG. 2;
When the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, planning a path by using the deep Q network algorithm;
The deep Q network algorithm, using the heuristic fish algorithm as its action guidance, selects and executes an action according to the unmanned aerial vehicle's current position and the position information of the static obstacles, and then reaches the next state. The reward for the current action is derived from a reward function; the embodiment of the invention sets the reward function of the static path planning part to:
α and β are constants that determine the weights of the two reward terms in the total reward function. According to experimental debugging, this example sets α and β to 1.1 and 2 respectively. d_{s-1} represents the distance between the unmanned aerial vehicle and the target point in the previous state; d_s represents the distance between the unmanned aerial vehicle and the target point in the current state; the remaining term depends on the distance from the unmanned aerial vehicle to each static obstacle.
The experience tuple [S, A, R, S'] consisting of the current state, action, reward and next state obtained in the interaction is stored in the experience replay buffer. The algorithm then samples data from the replay buffer according to the set batch size m to update the Q network of the deep Q network algorithm.
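The replay mechanism described above can be sketched as a fixed-capacity buffer with uniform sampling; the class name and structure are illustrative, not the embodiment's implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform experience replay buffer holding [S, A, R, S'] tuples."""

    def __init__(self, capacity: int):
        # When full, the deque silently evicts the oldest tuple.
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, m: int):
        # Uniformly sample up to m tuples for one Q network update.
        return random.sample(list(self.buf), min(m, len(self.buf)))

    def __len__(self):
        return len(self.buf)
```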
Meanwhile, when the deep Q network algorithm and the Q learning algorithm are used alternately, if either one stops working entirely while the other runs, the Q values of some state-action pairs will be missing once training of the two algorithms is complete. To avoid this problem, while the deep Q network algorithm is working, the Q table of the Q learning algorithm is also updated with the experience tuples generated by the interaction in the previous step; since no dynamic obstacle is within the unmanned aerial vehicle's sensor range while the Q learning algorithm is idle, its reward function is defined as:

reward = η(d_{s-1} - d_s)
Finally, if the action taken by the unmanned aerial vehicle causes a collision, the current round ends and a new training round begins; if no collision is caused, training in the current round continues.
When the unmanned aerial vehicle detects a dynamic obstacle in the moving process, planning a path by using a Q learning algorithm;
The Q learning algorithm, using the heuristic fish algorithm as its action guidance, selects and executes an action according to the unmanned aerial vehicle's current position and the information of the detected dynamic obstacle, and reaches the next state. The embodiment of the invention sets the reward function of the dynamic path planning part to:
γ and δ are weight constants, set to 1.1 and 1 respectively according to experimental debugging; d'_{u→t} and d_{u→t} represent the distance between the unmanned aerial vehicle and the target point at the previous moment and the current moment respectively; d'_{u→o} and d_{u→o} represent the distance between the unmanned aerial vehicle and the dynamic obstacle to be avoided at the previous moment and the current moment respectively.
Then, the Q table of the Q learning algorithm is updated according to the experience tuple [S, A, R, S'] obtained from this interaction.
The network of the deep Q network algorithm is likewise updated with the experience tuple obtained in the previous interaction. At this time, the reward function is consistent with the reward function used when the deep Q network algorithm actually performs static path planning.
Finally, if the action taken by the unmanned aerial vehicle causes a collision, the current round ends and a new training round begins; if no collision is caused, training in the current round continues.
Step 3: repeat step 2; the current round ends when the unmanned aerial vehicle reaches the target point. When the number of training rounds reaches the set maximum N_episode, training of the deep Q network algorithm and the Q learning algorithm is complete. The actual coordinates, starting point coordinates and target point coordinates of the unmanned aerial vehicle are then set, and a path is planned with the trained deep Q network algorithm and Q learning algorithm.
In step 2, before the path is planned by the deep Q network algorithm and the Q learning algorithm, the method further comprises: using the heuristic fish algorithm as action guidance for the deep Q network algorithm and the Q learning algorithm during path planning. The heuristic fish algorithm is inspired by the natural phenomenon that fish can forage in dark environments using their lateral-line organs, and comprises a traveling behavior process and a foraging behavior process. The traveling behavior process acquires the directions in which the unmanned aerial vehicle would collide with surrounding obstacles; the foraging behavior process acquires several high-priority directions in which the unmanned aerial vehicle advances toward the target point; the heuristic fish algorithm removes the collision directions from the high-priority directions and uses the rest as action guidance. The algorithm flow is shown in FIG. 3 and comprises the following steps:
Step 21: when the deep Q network algorithm or the Q learning algorithm calls the heuristic fish algorithm to select an action, the current state, the target point position and the information on the dynamic and static obstacles are input into the heuristic fish algorithm. The experimental environment adopted by the invention is a grid environment in which the unmanned aerial vehicle can take actions in eight directions; the heuristic fish algorithm is responsible for selecting the best action in the current state.
Step 22: the foraging behavior calculates the set of selectable actions from the current state and the target point position, as shown in FIG. 4. Let L_{u→t} be the direction vector from the unmanned aerial vehicle's current position to the target point, and let L_horizontal be a unit vector along the unmanned aerial vehicle's forward direction; the included angle between the two vectors is then:

θ_t = arccos( (L_{u→t} · L_horizontal) / |L_{u→t}| )

Next, let L_action, action ∈ A, be the unit direction vector of an action in the action space; the included angle between each action and L_horizontal is:

θ_action = arccos( L_action · L_horizontal )

The difference between θ_t and each θ_action is:

Δθ_action = |θ_t - θ_action|

Finally, each action is assigned a priority from high to low according to this difference from small to large, and the set of actions with the first five priorities is returned.
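The angle ranking of step 22 can be sketched for the eight-direction grid as follows; the action names, the coordinate convention and the tie-breaking order are illustrative assumptions.

```python
import math

# Eight grid actions as (dx, dy) unit-step offsets; the names are assumed.
ACTIONS = {
    "front": (0, 1), "left-front": (-1, 1), "right-front": (1, 1),
    "left": (-1, 0), "right": (1, 0),
    "rear": (0, -1), "left-rear": (-1, -1), "right-rear": (1, -1),
}

def foraging_priorities(pos, target, k=5):
    """Rank the eight actions by how closely their direction matches the
    vehicle-to-target vector (smaller angular difference = higher
    priority) and return the k highest-priority actions."""
    theta_t = math.atan2(target[1] - pos[1], target[0] - pos[0])

    def angular_gap(offset):
        theta_a = math.atan2(offset[1], offset[0])
        d = abs(theta_t - theta_a) % (2 * math.pi)
        return min(d, 2 * math.pi - d)  # wrap the difference into [0, pi]

    return sorted(ACTIONS, key=lambda a: angular_gap(ACTIONS[a]))[:k]
```

For a target up and to the right of the vehicle, the diagonal action toward it gets the smallest angular gap and therefore the highest priority.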
Step 23: the traveling behavior calculates the set of selectable actions that do not cause a collision, according to the current state and the information on dynamic and static obstacles; as shown in FIG. 5, gray squares represent static obstacles and diagonally hatched squares represent dynamic obstacles.
For static obstacle avoidance, the position information of the static obstacles is used: when executing some action would take the unmanned aerial vehicle into a static obstacle's area, that action is set as a forbidden action for the current state, and the available actions are returned.
For dynamic obstacle avoidance, the threat area of each dynamic obstacle at the next moment is predicted from the dynamic obstacle information set (speed, direction, position) detected by the sensor; when executing some action would take the unmanned aerial vehicle into that threat area, the action is set as a forbidden action for the current state, and the available actions are returned.
Step 24: the actions returned in step 22 and step 23 are combined, and several high-priority, collision-free actions are returned to the deep Q network algorithm or the Q learning algorithm. The call then ends.
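Steps 23 and 24 can be sketched together: forbid any action whose target cell is a static obstacle or lies in a dynamic obstacle's predicted next cell, then keep only the high-priority actions that survive. The one-step linear prediction, the action table and all names here are illustrative assumptions.

```python
# Eight grid actions as (dx, dy) unit-step offsets; the names are assumed.
ACTIONS8 = {
    "front": (0, 1), "left-front": (-1, 1), "right-front": (1, 1),
    "left": (-1, 0), "right": (1, 0),
    "rear": (0, -1), "left-rear": (-1, -1), "right-rear": (1, -1),
}

def traveling_allowed(pos, static_cells, dynamic_obstacles):
    """Return the actions whose target cell is neither a static obstacle
    nor in a dynamic obstacle's threat area.  Each dynamic obstacle is a
    (speed, (dx, dy) direction, (x, y) position) tuple; its next-moment
    cell is predicted by one step of linear motion (an assumption)."""
    threat = set(static_cells)
    for speed, (dx, dy), (ox, oy) in dynamic_obstacles:
        threat.add((ox, oy))                            # current cell
        threat.add((ox + speed * dx, oy + speed * dy))  # predicted cell
    x, y = pos
    return [a for a, (dx, dy) in ACTIONS8.items()
            if (x + dx, y + dy) not in threat]

def guide_actions(priority_actions, allowed_actions):
    """Step 24: keep the high-priority actions that are also
    collision-free, preserving their priority order."""
    allowed = set(allowed_actions)
    return [a for a in priority_actions if a in allowed]
```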
The process of the specific embodiment is illustrated by simulation as follows:
example 1: hierarchical reinforcement learning
Step 1: initialize the network parameters of the deep Q network algorithm and an experience replay buffer of size 1,000,000; initialize the Q table of the Q learning algorithm. Set the total number of training rounds to 500, with the starting point P_O = [0, 0] and the target point P_T = [29, 29] of the unmanned aerial vehicle's flight task;
Step 2: the sensor detection range is set to 3 as shown in fig. 2.
If no dynamic obstacle is within the unmanned aerial vehicle's current detection range, the deep Q network algorithm is called for static path planning, and the heuristic fish algorithm is then called for action selection. The unmanned aerial vehicle performs the selected action to enter the next state and obtains the reward for performing the action. The algorithm deposits the experience tuple into the experience replay buffer, samples information from the replay buffer according to the set batch size m = 16 to update the network parameters, and also updates the Q table of the Q learning algorithm with the experience tuple.
If a dynamic obstacle is within the detection range, as in the case of FIG. 2, the Q learning algorithm is called for dynamic path planning. The heuristic fish algorithm is likewise called to select an action; the unmanned aerial vehicle then executes the selected action, enters the next state and obtains the reward for the action. Finally, the Q table is updated with the experience tuple, and the network of the deep Q network algorithm is updated with the same tuple.
Step 3: in its interaction with the environment, the unmanned aerial vehicle constantly cycles through: detect dynamic obstacles, switch algorithms, select an action, execute the action, calculate the reward, and update the Q network/Q table, until it collides with an obstacle or reaches the target point, ending the current round. When the total number of training rounds reaches the set N_episode, the whole training ends.
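The loop of example 1 can be condensed into the following sketch, in which both learners are updated every step so that neither loses Q values while idle; env, dqn and qlearn are hypothetical stand-ins for the embodiment's components.

```python
def train(env, dqn, qlearn, n_rounds):
    """Outer training loop: per step, detect, switch layer, act, reward,
    and update BOTH learners (all interfaces are hypothetical)."""
    for _ in range(n_rounds):
        state = env.reset()
        done = False
        while not done:
            actor = qlearn if env.sees_dynamic_obstacle(state) else dqn
            action = actor.select_action(state)  # heuristic-fish guided
            next_state, reward, done = env.step(action)
            dqn.update(state, action, reward, next_state)
            qlearn.update(state, action, reward, next_state)
            state = next_state
```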
Example 2: heuristic fish algorithm
Step 1: the heuristic fish algorithm is invoked by the deep Q network algorithm or the Q learning algorithm, with the current state, the target point position and the dynamic and static obstacle information as inputs. The heuristic algorithm performs the foraging behavior and the traveling behavior respectively to select the available action set.
Step 2: the foraging behavior calculates θ_t and θ_action from the current state and the target point position, calculates the difference between θ_t and each θ_action, gives the eight actions different priorities according to the differences, and returns the actions with the first five priorities. Referring to FIG. 4, the set of priority actions returned in this case is [left front, right front, left rear].
Step 3: the traveling behavior returns the actions that do not lead to a collision, according to the static and dynamic obstacle information. For static obstacles, whose positions are fixed, the actions that enter their areas are forbidden; for dynamic obstacles, the obstacle's position at the next moment is predicted from the set [speed, direction, position], and the actions that enter that area are forbidden. As shown in FIG. 5, the gray squares are static obstacles and the diagonally hatched square is a dynamic obstacle whose information is [1, left, current position], so at the next moment it occupies the marked area in the figure. Finally, the collision-causing actions [left, right rear] are removed, and the remaining six actions are selectable.
Step 4: combining the actions returned in step 2 and step 3, the selectable action set returned is [left front, right front, left rear], and the call ends.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.
Claims (4)
1. The unmanned aerial vehicle path planning method based on hierarchical reinforcement learning is characterized by comprising the following steps of:
Step 1: initializing a deep Q network algorithm and a Q learning algorithm;
Step 2: driving the unmanned aerial vehicle to move from a starting point to a target point, and training a deep Q network algorithm and a Q learning algorithm;
when the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, planning a path by using the deep Q network algorithm, and updating the Q learning algorithm through the experience tuple generated in the deep Q network algorithm after the path is currently planned;
when the unmanned aerial vehicle detects a dynamic obstacle in the moving process, planning a path by using the Q learning algorithm, and updating the deep Q network algorithm through the experience tuple generated in the Q learning algorithm after the path is currently planned;
Step 3: and (3) repeating the step (2) until training of the deep Q network algorithm and the Q learning algorithm is completed, setting the actual coordinates, the starting point coordinates and the target point coordinates of the unmanned aerial vehicle, and planning a path through the trained deep Q network algorithm and the trained Q learning algorithm.
2. The hierarchical reinforcement learning-based unmanned aerial vehicle path planning method of claim 1, wherein when the Q learning algorithm is updated by the experience tuple generated in the deep Q network algorithm after the current planned path, the reward function used by the Q learning algorithm is as follows:

reward = η(d_{s-1} - d_s)

wherein η is a constant; d_{s-1} is the distance from the unmanned aerial vehicle to the target point at the previous moment; d_s is the distance from the unmanned aerial vehicle to the target point at the current moment.
3. The unmanned aerial vehicle path planning method based on hierarchical reinforcement learning according to claim 1, wherein in step 2, before the path is planned by the deep Q network algorithm and the Q learning algorithm, the method further comprises: using a heuristic fish algorithm as action guidance for the deep Q network algorithm and the Q learning algorithm during path planning; wherein the heuristic fish algorithm comprises a traveling behavior process and a foraging behavior process, the traveling behavior process acquiring the directions in which the unmanned aerial vehicle would collide with surrounding obstacles, the foraging behavior process acquiring several high-priority directions in which the unmanned aerial vehicle advances toward the target point, and the heuristic fish algorithm removing the collision directions from the high-priority directions to obtain the action guidance.
4. The unmanned aerial vehicle path planning method based on hierarchical reinforcement learning according to claim 3, wherein, when acquiring the directions in which the unmanned aerial vehicle would collide with surrounding obstacles, and when an obstacle is dynamic, whether the unmanned aerial vehicle will collide with the obstacle is judged from the obstacle's movement direction and movement speed.
Priority Applications (1)
- CN202210883240.5A, filed 2022-07-26, priority date 2022-07-26: Unmanned aerial vehicle path planning method based on layered reinforcement learning

Publications (2)
- CN115268494A, published 2022-11-01
- CN115268494B, granted 2024-05-28
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118394109A (en) * | 2024-06-26 | 2024-07-26 | 烟台中飞海装科技有限公司 | Simulated countermeasure training method based on multi-agent reinforcement learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992000A (en) * | 2019-04-04 | 2019-07-09 | 北京航空航天大学 | A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning |
WO2019147235A1 (en) * | 2018-01-24 | 2019-08-01 | Ford Global Technologies, Llc | Path planning for autonomous moving devices |
CN113821041A (en) * | 2021-10-09 | 2021-12-21 | 中山大学 | Multi-robot collaborative navigation and obstacle avoidance method |
CN114003059A (en) * | 2021-11-01 | 2022-02-01 | 河海大学常州校区 | UAV path planning method based on deep reinforcement learning under kinematic constraint condition |
CN114518770A (en) * | 2022-03-01 | 2022-05-20 | 西安交通大学 | Unmanned aerial vehicle path planning method integrating potential field and deep reinforcement learning |
CN114527759A (en) * | 2022-02-25 | 2022-05-24 | 重庆大学 | End-to-end driving method based on layered reinforcement learning |
CN114529061A (en) * | 2022-01-26 | 2022-05-24 | 江苏科技大学 | Method for automatically predicting garbage output distribution and planning optimal transportation route |
CN114625151A (en) * | 2022-03-10 | 2022-06-14 | 大连理工大学 | Underwater robot obstacle avoidance path planning method based on reinforcement learning |
2022

- 2022-07-26 CN CN202210883240.5A patent/CN115268494B/en active Active
Non-Patent Citations (6)
Title |
---|
D3QHF: A Hybrid Double-deck Heuristic Reinforcement Learning Approach for UAV Path Planning; Demin Pan, et al.; IEEE; 20221231; 1221-1226 *
Study on interface temperature control of laser direct joining of CFRTP and aluminum alloy based on staged laser path planning; Qi Wang, et al.; Optics and Laser Technology; 20220609; Vol. 154; 1-13 *
Research on manned/unmanned aerial vehicle cooperative path planning based on MAXQ hierarchical reinforcement learning; Cheng Xianfeng, Yan Yongjie; Informatization Research; 20200229; Vol. 46 (No. 1); 13-19 *
Research on event-driven UAV obstacle avoidance via reinforcement learning; Tang Bowen, et al.; Journal of Guangxi University of Science and Technology; 20190331 (No. 1); 96-102 *
Quadrotor attitude control based on fractional-order MRAC; Chen Kaiyuan, et al.; Electronics Optics & Control; 20211231; Vol. 28 (No. 12); 1-5 *
Wang Zhao, Hu Lisheng. Path planning method for industrial manipulator based on deep Q-learning. Control and Instruments in Chemical Industry. (No. 2), 141-145. *
Also Published As
Publication number | Publication date |
---|---|
CN115268494A (en) | 2022-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108776483B (en) | AGV path planning method and system based on ant colony algorithm and multi-agent Q learning | |
CN109765893B (en) | Mobile robot path planning method based on whale optimization algorithm | |
Kurzer et al. | Decentralized cooperative planning for automated vehicles with hierarchical monte carlo tree search | |
CN111260027B (en) | Intelligent agent automatic decision-making method based on reinforcement learning | |
CN115268494B (en) | Unmanned aerial vehicle path planning method based on layered reinforcement learning | |
CN113561986A (en) | Decision-making method and device for automatically driving automobile | |
CN114089776B (en) | Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning | |
CN111723931B (en) | Multi-agent confrontation action prediction method and device | |
CN112269382A (en) | Robot multi-target path planning method | |
CN117705113A (en) | Unmanned aerial vehicle vision obstacle avoidance and autonomous navigation method for improving PPO | |
CN114967721B (en) | Unmanned aerial vehicle autonomous path planning and obstacle avoidance method based on DQ-CapsNet | |
CN112651486A (en) | Method for improving convergence rate of MADDPG algorithm and application thereof | |
CN112612275A (en) | Complex path planning system and method for database machine room | |
CN115373415A (en) | Unmanned aerial vehicle intelligent navigation method based on deep reinforcement learning | |
CN112613608A (en) | Reinforced learning method and related device | |
Han et al. | Multi-UAV automatic dynamic obstacle avoidance with experience-shared A2C | |
CN111562740B (en) | Automatic control method based on multi-target reinforcement learning algorithm utilizing gradient | |
CN116579372A (en) | Multi-agent collaborative navigation method based on deep reinforcement learning | |
CN116839582A (en) | Unmanned aerial vehicle track planning method based on improved Q learning | |
Xiao et al. | Design of reward functions based on the DDQN algorithm | |
CN113589810B (en) | Dynamic autonomous obstacle avoidance movement method and device for intelligent body, server and storage medium | |
CN114326720A (en) | Real-time obstacle avoidance method and system for unmanned ship | |
CN112947421B (en) | AUV autonomous obstacle avoidance method based on reinforcement learning | |
CN110955239B (en) | Unmanned ship multi-target trajectory planning method and system based on inverse reinforcement learning | |
CN118363386B (en) | Escape game control method, equipment and medium for unmanned ship cluster |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||