CN112344944A - Reinforcement learning path planning method introducing an artificial potential field - Google Patents
Reinforcement learning path planning method introducing an artificial potential field
- Publication number
- CN112344944A (application CN202011327198.6A)
- Authority
- CN
- China
- Prior art keywords
- value
- path planning
- potential field
- artificial potential
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0088—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
Abstract
The invention discloses a reinforcement learning path planning method that introduces an artificial potential field, comprising the following steps: S1, establishing a grid map and initializing the state values with a gravitational field function, obtaining a simulation environment for training the reinforcement learning agent; S2, initializing the algorithm parameters; S3, selecting actions with a dynamic greedy-factor adjustment strategy; S4, executing the action and updating the Q value; S5, repeating steps S3 and S4 until a set number of steps or a convergence condition is reached; S6, selecting the action with the maximum Q value at each step to obtain the optimal path; and S7, sending the optimal path to the controller of the mobile robot and controlling the mobile robot to walk along it. Compared with the traditional algorithm, the improved Q-learning algorithm shortens the path planning time by 85.1%, reduces the number of iterations before convergence by 74.7%, and improves the stability of the convergence result.
Description
Technical Field
The invention relates to the technical field of robot path planning, in particular to a reinforcement learning path planning method introducing an artificial potential field.
Background
With the development of science and technology, more and more mobile robots are entering people's daily lives, and the path planning problem for mobile robots is becoming increasingly important. Path planning helps a robot avoid obstacles and plan an optimal motion path from a start point to a target point with respect to a given performance index. According to how much of the environment is known during planning, path planning can be divided into global path planning and local path planning. Widely used global path planning algorithms include the A* algorithm, Dijkstra's algorithm, the visibility graph method and the free space method; local path planning algorithms include the artificial potential field method, genetic algorithms, neural network algorithms and reinforcement learning algorithms. Reinforcement learning is a comparatively adaptable approach that can find an optimal path by trial and error in a completely unknown environment, so it has been receiving more and more attention in the field of mobile robot path planning.
The most widely used reinforcement learning algorithm in mobile robot path planning is Q-learning. The conventional Q-learning algorithm has the following problems: (1) all Q values are set to 0 or to random values during initialization, so the agent can only search blindly in the early stage, producing an excessive number of invalid iterations at the start of the algorithm; (2) an ε-greedy strategy is used for action selection: if ε is too large, the agent explores so much that convergence becomes difficult, while if ε is too small, the agent explores the environment insufficiently and settles on a suboptimal solution, making it hard to balance exploration and exploitation.
Disclosure of Invention
To address the above defects in the prior art, the invention provides a reinforcement learning path planning method that introduces an artificial potential field. The gravitational field function of the artificial potential field is introduced into the Q-value initialization, so that states closer to the target position have larger values. The agent therefore tends to search toward the target position in the early stage, invalid iterations at the start of the algorithm are reduced, and the path planning time of the reinforcement-learning-based mobile robot is shortened.
A reinforcement learning path planning method introducing an artificial potential field comprises the following steps:
S1, establishing a grid map and initializing the state values with a gravitational field function, obtaining a simulation environment for training the reinforcement learning agent;
S2, initializing the algorithm parameters;
S3, selecting actions with a dynamic greedy-factor adjustment strategy;
S4, executing the action and updating the Q value;
S5, repeating steps S3 and S4 until a set number of steps or a convergence condition is reached;
S6, selecting the action with the maximum Q value at each step to obtain the optimal path;
S7, sending the optimal path to the controller of the mobile robot and controlling the mobile robot to walk along it.
Preferably, the specific process of step S1 is as follows: the environment image obtained by the mobile robot is segmented into 20 × 20 grid cells and an environment model is built with the grid method. If an obstacle is found in a cell, that cell is defined as an obstacle position which the robot cannot pass through; if the target point is found in a cell, that cell is defined as the target position, i.e. the position the mobile robot must finally reach; all other cells are defined as obstacle-free and passable. The attraction value of each cell is then calculated according to equation (1);
where ζ is a scale factor greater than 0 used to adjust the magnitude of the attraction, |d| is the distance between the current position and the target point, and η is a positive constant that prevents the attraction value at the target point from becoming infinite.
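The equation itself appears only as an image in the original publication. A gravitational-field form consistent with the variables just described (larger near the target, finite at the target because η > 0) would be the following; this is an assumed reconstruction, not the patent's verbatim formula:

$$U_{att} = \frac{\zeta}{|d| + \eta} \qquad (1)$$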
Preferably, in step S2, the parameters include: the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the maximum number of iterations, the reward function r, and the greedy-factor dynamic adjustment strategy parameters ε_max, ε_min, T and n;
Initializing the Q function using equation (2)
where P(s′|s, a) is the probability of transitioning to state s′ given the current state s and the chosen action a, and V(s′) is the state value of the next state, with V(s′) = U_att.
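Equation (2) is likewise an image in the original. A standard initialization consistent with the description, in which each Q value is seeded from the immediate reward plus the discounted expected value of the successor state, would be (assumed form):

$$Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s'), \qquad V(s') = U_{att} \qquad (2)$$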
Preferably, in step S3, the greedy factor adjustment strategy is as follows:
wherein the specific form of the tanh function is as follows:
where e is the base of the natural logarithm; when the independent variable t is greater than 0, tanh(t) takes values in (0, 1); std_n is the standard deviation of the step count over the last n iterations; T is a coefficient whose effect is opposite to that of the temperature in simulated annealing, so the larger T is, the smaller the randomness; and ε_max and ε_min are the preset maximum and minimum exploration rates, respectively.
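Equations (3) and (4) are images in the original. A schedule consistent with the stated behaviour, in which ε grows with the step-count standard deviation std_n and shrinks as T grows, would be the following assumed reconstruction:

$$\varepsilon = \varepsilon_{min} + (\varepsilon_{max} - \varepsilon_{min}) \tanh\!\left(\frac{std_n}{T}\right) \qquad (3)$$

$$\tanh(t) = \frac{e^{t} - e^{-t}}{e^{t} + e^{-t}} \qquad (4)$$

With this form, a large step-count variance (the algorithm has not yet converged) keeps ε near ε_max and favours exploration, while stable step counts drive ε toward ε_min and favour exploitation.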
Preferably, in step S4, the action a selected in step S3 is executed and the agent arrives at s′, receiving the immediate reward R(s, a); the Q-value function is then updated with the Q-learning algorithm into which the artificial potential field has been introduced, using the update rule of equation (5)
where (s, a) is the current state–action pair, (s′, a′) is the state–action pair at the next time step, and R(s, a) is the immediate reward for performing action a in state s.
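Equation (5) is an image in the original; the standard tabular Q-learning update that matches the surrounding notation (assumed form) is:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \qquad (5)$$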
The invention has the beneficial effects that:
To address the slow convergence, large number of iterations and unstable convergence results of traditional reinforcement learning algorithms applied to path planning in an unknown environment, an improved Q-learning algorithm is proposed. The artificial potential field method is introduced at state initialization, so that states closer to the target position have larger values and the agent is guided toward the target; and the ε-greedy strategy is improved for action selection by dynamically adjusting the greedy factor ε according to the degree of convergence of the algorithm, which balances exploration and exploitation well. Simulation results on a grid map show that, compared with the traditional algorithm, the improved Q-learning algorithm shortens the path planning time by 85.1%, reduces the number of iterations before convergence by 74.7%, and improves the stability of the convergence result.
Drawings
FIG. 1 is a schematic view of the general flow of the process of the present invention.
Fig. 2 is a grid map of the operation of the mobile robot according to the embodiment of the present invention.
FIG. 3 is a diagram of conventional Q-learning convergence.
FIG. 4 is a diagram of improved Q-learning convergence according to an embodiment of the invention.
FIG. 5 is a diagram of an optimized path drawn by the improved Q-learning scheme according to the embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.
Referring to fig. 1, the reinforcement learning path planning method introducing an artificial potential field according to the present invention includes the following steps:
the first step is as follows: the method comprises the steps of carrying out segmentation processing on an environment image obtained by a mobile robot, segmenting the image into 20 x 20 grids, establishing an environment model by adopting a grid method, and if an obstacle is found in the grids, defining the grids as the position of the obstacle, wherein the robot cannot pass through the grids; if the target point is found in the grid, determining that the grid is the target position and the position to which the mobile robot finally arrives; the other grids are defined as barrier-free grids, and the robot can pass through, and calculate the attraction value of each grid according to formula (1).
Here ζ is a scale factor greater than 0 used to adjust the magnitude of the attraction, |d| is the distance between the current position and the target point, and η is a positive constant that prevents the attraction value at the target point from becoming infinite.
Through the above steps, a simulation environment for training the reinforcement learning agent can be obtained, and the grid map for the mobile robot in the embodiment is shown in fig. 2.
The second step: initialize the algorithm parameters, which include the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the maximum number of iterations, the reward function r, and the greedy-factor dynamic adjustment strategy parameters ε_max, ε_min, T and n.
The Q value function is initialized using equation (2).
where P(s′|s, a) is the probability of transitioning to state s′ given the current state s and the chosen action a, and V(s′) is the state value of the next state, with V(s′) = U_att.
The third step: select the action with the greedy-factor dynamic adjustment strategy, which is as follows:
wherein the specific form of the tanh function is as follows:
where e is the base of the natural logarithm; when the independent variable t is greater than 0, tanh(t) takes values in (0, 1); std_n is the standard deviation of the step count over the last n iterations; T is a coefficient whose effect is opposite to that of the temperature in simulated annealing, so the larger T is, the smaller the randomness; and ε_max and ε_min are the preset maximum and minimum exploration rates, respectively.
The fourth step: performing the action a selected in the third step,Arrival s,Obtaining an instant reward R (s, a), updating the Q value function by using a Q-learning algorithm introducing an artificial potential field, and updating the rule as shown in the formula (5)
where (s, a) is the current state–action pair, (s′, a′) is the state–action pair at the next time step, and R(s, a) is the immediate reward for performing action a in state s.
The third and fourth steps are repeated until a set number of steps or a convergence condition is reached.
The fifth step: and selecting the action with the maximum Q value in each step to obtain the optimal path.
And a sixth step: and sending the optimal path to a controller of the mobile robot, and controlling the mobile robot to walk according to the optimal path.
The parameter settings in this embodiment are as follows: learning rate α = 0.01, discount factor γ = 0.9, maximum number of iterations 20000, scale factor ζ = 0.6, constant η = 1, ε_max = 0.5, ε_min = 0.01, T = 500, n = 10; the reward function is set as:
in this embodiment, we can obtain the optimal path by using the above method and setting the parameters as shown in fig. 5.
Comparing fig. 3 and fig. 4, it can be seen that, relative to the conventional Q-learning algorithm, the improved algorithm shortens the convergence time by 85.1%, reduces the number of iterations by 74.7%, and improves the stability of the convergence result.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
Claims (7)
1. A reinforcement learning path planning method introducing an artificial potential field, characterized by comprising the following steps:
S1, establishing a grid map and initializing the state values with a gravitational field function, obtaining a simulation environment for training the reinforcement learning agent;
S2, initializing the algorithm parameters;
S3, selecting actions with a dynamic greedy-factor adjustment strategy;
S4, executing the action and updating the Q value;
S5, repeating steps S3 and S4 until a set number of steps or a convergence condition is reached;
S6, selecting the action with the maximum Q value at each step to obtain the optimal path;
S7, sending the optimal path to the controller of the mobile robot and controlling the mobile robot to walk along it.
2. The reinforcement learning path planning method introducing an artificial potential field of claim 1, characterized in that the specific process of step S1 is as follows: the environment image obtained by the mobile robot is segmented into 20 × 20 grid cells and an environment model is built with the grid method; if an obstacle is found in a cell, that cell is defined as an obstacle position which the robot cannot pass through; if the target point is found in a cell, that cell is defined as the target position, i.e. the position the mobile robot must finally reach; all other cells are defined as obstacle-free and passable; the attraction value of each cell is calculated according to equation (1);
where ζ is a scale factor greater than 0 used to adjust the magnitude of the attraction, |d| is the distance between the current position and the target point, and η is a positive constant that prevents the attraction value at the target point from becoming infinite.
3. The reinforcement learning path planning method introducing an artificial potential field of claim 1, characterized in that in step S2 the parameters include: the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the maximum number of iterations, the reward function r, and the greedy-factor dynamic adjustment strategy parameters ε_max, ε_min, T and n;
Initializing the Q function using equation (2)
where P(s′|s, a) is the probability of transitioning to state s′ given the current state s and the chosen action a, and V(s′) is the state value of the next state, with V(s′) = U_att.
4. The reinforcement learning path planning method introducing an artificial potential field of claim 1, characterized in that:
in step S3, the greedy factor adjustment strategy is as follows:
wherein the specific form of the tanh function is as follows:
where e is the base of the natural logarithm; when the independent variable t is greater than 0, tanh(t) takes values in (0, 1); std_n is the standard deviation of the step count over the last n iterations; T is a coefficient whose effect is opposite to that of the temperature in simulated annealing, so the larger T is, the smaller the randomness; and ε_max and ε_min are the preset maximum and minimum exploration rates, respectively.
5. The reinforcement learning path planning method introducing an artificial potential field of claim 1, characterized in that:
in step S4, the action a selected in step S3 is executed and the agent arrives at s′, receiving the immediate reward R(s, a); the Q-value function is updated with the Q-learning algorithm into which the artificial potential field has been introduced, using the update rule of equation (5)
where (s, a) is the current state–action pair, (s′, a′) is the state–action pair at the next time step, and R(s, a) is the immediate reward for performing action a in state s.
6. The reinforcement learning path planning method introducing an artificial potential field of claim 2, characterized in that: the scale factor ζ is set to 0.6 and the constant η is set to 1.
7. The reinforcement learning path planning method introducing an artificial potential field according to claim 3, characterized in that: the learning rate α = 0.01, the discount factor γ = 0.9, the maximum number of iterations is set to 20000, ε_max = 0.5, ε_min = 0.01, T = 500, n = 10, and the reward function is set to:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011327198.6A CN112344944B (en) | 2020-11-24 | 2020-11-24 | Reinforced learning path planning method introducing artificial potential field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011327198.6A CN112344944B (en) | 2020-11-24 | 2020-11-24 | Reinforced learning path planning method introducing artificial potential field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112344944A true CN112344944A (en) | 2021-02-09 |
CN112344944B CN112344944B (en) | 2022-08-05 |
Family
ID=74365572
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011327198.6A Active CN112344944B (en) | 2020-11-24 | 2020-11-24 | Reinforced learning path planning method introducing artificial potential field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112344944B (en) |
- 2020-11-24: application CN202011327198.6A filed in China; granted as patent CN112344944B, legal status Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090327011A1 (en) * | 2008-06-30 | 2009-12-31 | Autonomous Solutions, Inc. | Vehicle dispatching method and system |
CN102799179A (en) * | 2012-07-06 | 2012-11-28 | 山东大学 | Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning |
US20180012137A1 (en) * | 2015-11-24 | 2018-01-11 | The Research Foundation for the State University New York | Approximate value iteration with complex returns by bounding |
WO2018120739A1 (en) * | 2016-12-30 | 2018-07-05 | 深圳光启合众科技有限公司 | Path planning method, apparatus and robot |
US20190369637A1 (en) * | 2017-03-20 | 2019-12-05 | Mobileye Vision Technologies Ltd. | Trajectory selection for an autonomous vehicle |
CN110132296A (en) * | 2019-05-22 | 2019-08-16 | 山东师范大学 | Multiple agent sub-goal based on dissolution potential field divides paths planning method and system |
CN110307848A (en) * | 2019-07-04 | 2019-10-08 | 南京大学 | A kind of Mobile Robotics Navigation method |
CN110726416A (en) * | 2019-10-23 | 2020-01-24 | 西安工程大学 | Reinforced learning path planning method based on obstacle area expansion strategy |
CN110794842A (en) * | 2019-11-15 | 2020-02-14 | 北京邮电大学 | Reinforced learning path planning algorithm based on potential field |
CN111896006A (en) * | 2020-08-11 | 2020-11-06 | 燕山大学 | Path planning method and system based on reinforcement learning and heuristic search |
Non-Patent Citations (3)
Title |
---|
YUKIYASU NOGUCHI 等: "Path Planning Method Based on Artificial Potential Field and Reinforcement Learning for Intervention AUVs", 《2019 IEEE UNDERWATER TECHNOLOGY (UT)》, 13 June 2019 (2019-06-13), pages 1 - 6 * |
SONG YONG et al.: "Initialization of reinforcement learning for mobile robot path planning", Control Theory & Applications, vol. 29, no. 12, 15 December 2012 (2012-12-15), pages 1623-1628 *
XU XIAOSU et al.: "Path planning method for mobile robot based on improved reinforcement learning", Journal of Chinese Inertial Technology, vol. 27, no. 3, 30 June 2019 (2019-06-30), pages 314-320 *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112964272A (en) * | 2021-03-16 | 2021-06-15 | 湖北汽车工业学院 | Improved Dyna-Q learning path planning algorithm |
CN113534819A (en) * | 2021-08-26 | 2021-10-22 | 鲁东大学 | Method and storage medium for pilot-follow multi-agent formation path planning |
CN113534819B (en) * | 2021-08-26 | 2024-03-15 | 鲁东大学 | Method and storage medium for pilot following type multi-agent formation path planning |
CN113848911A (en) * | 2021-09-28 | 2021-12-28 | 华东理工大学 | Mobile robot global path planning method based on Q-learning and RRT |
CN114021775A (en) * | 2021-09-30 | 2022-02-08 | 成都海天数联科技有限公司 | Intelligent body handicap device putting method based on optimal solution |
CN114296440A (en) * | 2021-09-30 | 2022-04-08 | 中国航空工业集团公司北京长城航空测控技术研究所 | AGV real-time scheduling method integrating online learning |
CN114296440B (en) * | 2021-09-30 | 2024-04-09 | 中国航空工业集团公司北京长城航空测控技术研究所 | AGV real-time scheduling method integrating online learning |
CN113790729A (en) * | 2021-11-16 | 2021-12-14 | 北京科技大学 | Unmanned overhead traveling crane path planning method and device based on reinforcement learning algorithm |
CN113790729B (en) * | 2021-11-16 | 2022-04-08 | 北京科技大学 | Unmanned overhead traveling crane path planning method and device based on reinforcement learning algorithm |
CN114518758B (en) * | 2022-02-08 | 2023-12-12 | 中建八局第三建设有限公司 | Indoor measurement robot multi-target point moving path planning method based on Q learning |
CN114518758A (en) * | 2022-02-08 | 2022-05-20 | 中建八局第三建设有限公司 | Q learning-based indoor measuring robot multi-target-point moving path planning method |
CN115016499A (en) * | 2022-07-07 | 2022-09-06 | 吉林大学 | Path planning method based on SCA-QL |
CN115016499B (en) * | 2022-07-07 | 2024-10-25 | 吉林大学 | SCA-QL-based path planning method |
CN115542912A (en) * | 2022-09-29 | 2022-12-30 | 福州大学 | Mobile robot path planning method based on improved Q-learning algorithm |
CN115542912B (en) * | 2022-09-29 | 2024-06-07 | 福州大学 | Mobile robot path planning method based on improved Q-learning algorithm |
CN115629607A (en) * | 2022-10-25 | 2023-01-20 | 湖北汽车工业学院 | Reinforced learning path planning method integrating historical information |
CN116700258A (en) * | 2023-06-13 | 2023-09-05 | 重庆市荣冠科技有限公司 | Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning |
CN116700258B (en) * | 2023-06-13 | 2024-05-03 | 万基泰科工集团数字城市科技有限公司 | Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning |
CN118131628A (en) * | 2024-03-12 | 2024-06-04 | 南通大学 | Mobile robot tracking control method based on multi-target point information fusion |
Also Published As
Publication number | Publication date |
---|---|
CN112344944B (en) | 2022-08-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112344944B (en) | Reinforced learning path planning method introducing artificial potential field | |
CN111896006B (en) | Path planning method and system based on reinforcement learning and heuristic search | |
Sherstov et al. | Function approximation via tile coding: Automating parameter choice | |
Tesauro | Extending Q-learning to general adaptive multi-agent systems | |
CN109765893A (en) | Method for planning path for mobile robot based on whale optimization algorithm | |
CN110806759A (en) | Aircraft route tracking method based on deep reinforcement learning | |
CN110632922B (en) | Path planning method based on bat algorithm and reinforcement learning | |
CN108594803B (en) | Path planning method based on Q-learning algorithm | |
CN107703751A (en) | PID controller optimization method based on dragonfly algorithm | |
CN108490965A (en) | Rotor craft attitude control method based on Genetic Algorithm Optimized Neural Network | |
CN106850289B (en) | Service combination method combining Gaussian process and reinforcement learning | |
CN113552891A (en) | Robot multi-target path planning based on improved butterfly optimization algorithm | |
CN115115284B (en) | Energy consumption analysis method based on neural network | |
CN115629607A (en) | Reinforced learning path planning method integrating historical information | |
CN110110380B (en) | Piezoelectric actuator hysteresis nonlinear modeling method and application | |
CN112595326A (en) | Improved Q-learning path planning algorithm with fusion of priori knowledge | |
CN116859903A (en) | Robot smooth path planning method based on improved Harris eagle optimization algorithm | |
CN115373400A (en) | Robot path planning method and system based on dynamic update mechanism ant colony algorithm | |
Khasanov et al. | Gradient descent in machine learning | |
CN109657800A (en) | Intensified learning model optimization method and device based on parametric noise | |
CN117970782B (en) | Fuzzy PID control method based on fish scale evolution GSOM improvement | |
CN115469532A (en) | Controller optimization method based on improved gull algorithm | |
CN109062040A (en) | Predictive PID method based on the optimization of system nesting | |
CN114967713A (en) | Underwater vehicle buoyancy discrete change control method based on reinforcement learning | |
CN115730743A (en) | Battlefield combat trend prediction method based on deep neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||