CN112344944A - Reinforced learning path planning method introducing artificial potential field - Google Patents

Reinforced learning path planning method introducing artificial potential field

Info

Publication number
CN112344944A
Authority
CN
China
Prior art keywords
value
path planning
potential field
artificial potential
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011327198.6A
Other languages
Chinese (zh)
Other versions
CN112344944B (en)
Inventor
王科银
石振
张建辉
杨正才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Automotive Technology
Original Assignee
Hubei University of Automotive Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Automotive Technology filed Critical Hubei University of Automotive Technology
Priority to CN202011327198.6A priority Critical patent/CN112344944B/en
Publication of CN112344944A publication Critical patent/CN112344944A/en
Application granted granted Critical
Publication of CN112344944B publication Critical patent/CN112344944B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 - Instruments for performing navigational calculations
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots, characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G05D1/02 - Control of position or course in two dimensions
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Automation & Control Theory (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a reinforcement learning path planning method introducing an artificial potential field, which comprises the following steps: S1, establishing a grid map, initializing the state values with a gravitational field function, and obtaining a simulation environment for training the reinforcement learning agent; S2, initializing the algorithm parameters; S3, selecting an action by a greedy-factor dynamic adjustment strategy; S4, executing the action and updating the Q value; S5, repeating steps S3 and S4 until a set number of steps or a convergence condition is reached; S6, selecting the action with the maximum Q value at each step to obtain the optimal path; and S7, sending the optimal path to the controller of the mobile robot and controlling the mobile robot to walk along the optimal path. Compared with the traditional algorithm, the improved Q-learning algorithm shortens the path planning time by 85.1%, reduces the number of iterations before convergence by 74.7%, and improves the stability of the convergence result.

Description

Reinforced learning path planning method introducing artificial potential field
Technical Field
The invention relates to the technical field of robot path planning, in particular to a reinforcement learning path planning method introducing an artificial potential field.
Background
With the development of science and technology, more and more mobile robots are entering people's daily lives, and the path planning problem for mobile robots is becoming increasingly important. Path planning technology helps the robot avoid obstacles and plan an optimal movement path from the starting point to the target point with respect to a given criterion. According to how much of the environment is known during planning, path planning can be divided into global path planning and local path planning. Widely used global path planning algorithms include the A* algorithm, Dijkstra's algorithm, the visibility graph method, and the free space method; local path planning algorithms include the artificial potential field algorithm, genetic algorithms, neural network algorithms, and reinforcement learning algorithms. Reinforcement learning is a comparatively adaptive approach that can find an optimal path through continual trial and error in a completely unknown environment, so it has attracted increasing attention in the field of mobile robot path planning.
The most widely used reinforcement learning algorithm in the field of mobile robot path planning is the Q-learning algorithm. The conventional Q-learning algorithm has the following problems: (1) all Q values are set to 0 or to random values during initialization, so the agent can only explore blindly in the initial stage, which causes excessive invalid iterations early in the algorithm; (2) an ε-greedy strategy is adopted for action selection: too large an ε value makes the agent explore the environment so much that convergence becomes difficult, while too small an ε value makes the agent settle on a suboptimal solution due to insufficient exploration, so the trade-off between exploration and exploitation is hard to balance.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a reinforcement learning path planning method that introduces an artificial potential field. The gravitational field function of the artificial potential field is introduced when the Q values are initialized, so that the state value is larger the closer a state is to the target position; the agent therefore searches toward the target position in the initial stage, invalid iterations at the start of the algorithm are reduced, and the path planning time of the mobile robot based on reinforcement learning is shortened.
A reinforcement learning path planning method introducing an artificial potential field comprises the following steps:
S1, establishing a grid map, initializing the state values with a gravitational field function, and obtaining a simulation environment for training the reinforcement learning agent;
S2, initializing the algorithm parameters;
S3, selecting an action by a greedy-factor dynamic adjustment strategy;
S4, executing the action and updating the Q value;
S5, repeating steps S3 and S4 until a set number of steps or a convergence condition is reached;
S6, selecting the action with the maximum Q value at each step to obtain the optimal path;
and S7, sending the optimal path to the controller of the mobile robot and controlling the mobile robot to walk along the optimal path.
Preferably, the specific process of step S1 is as follows: the environment image obtained by the mobile robot is segmented into 20 x 20 grid cells and an environment model is established by the grid method. If an obstacle is found in a grid cell, that cell is defined as an obstacle position through which the robot cannot pass; if the target point is found in a grid cell, that cell is defined as the target position, i.e., the position the mobile robot is finally to reach; all other cells are defined as obstacle-free cells through which the robot can pass, and the attraction value of each cell is calculated according to formula (1);
U_att = ζ / (|d| + η)        (1)
where ζ is a scale factor greater than 0 used to adjust the magnitude of the attraction; |d| is the distance between the current position and the target point; η is a positive constant that prevents the attraction value at the target point from becoming infinite.
Preferably, in step S2, the parameters include: the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the maximum number of iterations, the reward function r, and the greedy-factor dynamic adjustment strategy parameters ε_max, ε_min, T, and n;
Initializing the Q function using equation (2)
Q(s, a) = R(s, a) + γ Σ_{s'} P(s'|s, a) V(s')        (2)
where P(s'|s, a) is the probability of transitioning to state s' from the current state s under action a, and V(s') is the state value of the next state s', with V(s') = U_att.
Preferably, in step S3, the greedy factor adjustment strategy is as follows:
ε = ε_min + (ε_max - ε_min) tanh(std_n / T)        (3)
wherein the specific form of the tanh function is as follows:
tanh(t) = (e^t - e^(-t)) / (e^t + e^(-t))        (4)
where e is the base of the natural logarithm; when the independent variable t is greater than 0, tanh(t) takes values in (0, 1); std_n is the standard deviation of the step counts over the last n iterations; T is a coefficient whose effect is opposite to that of the temperature in the simulated annealing algorithm, so the larger T is, the smaller the randomness; ε_max and ε_min are the set maximum and minimum values of the exploration rate, respectively.
Preferably, in step S4, the action a selected in the third step is executed to arrive at the next state s', the immediate reward R(s, a) is obtained, and the Q-value function is updated with the Q-learning algorithm that introduces the artificial potential field; the update rule is given by formula (5)
Q(s, a) ← Q(s, a) + α [R(s, a) + γ max_{a'} Q(s', a') - Q(s, a)]        (5)
where (s, a) is the current state-action pair, (s', a') is the state-action pair at the next time step, and R(s, a) is the immediate reward for performing action a in state s.
The invention has the beneficial effects that:
In order to solve the problems of slow convergence, excessive iterations, and unstable convergence results that arise when the traditional reinforcement learning algorithm is applied to path planning in an unknown environment, an improved Q-learning algorithm is proposed. The artificial potential field method is introduced during state initialization so that the state value is larger the closer the agent is to the target position, guiding the agent toward the target; the ε-greedy strategy is improved for action selection by dynamically adjusting the greedy factor ε according to the degree of convergence of the algorithm, which balances exploration and exploitation well. Simulation results on the grid map show that, compared with the traditional algorithm, the improved Q-learning algorithm shortens the path planning time by 85.1%, reduces the number of iterations before convergence by 74.7%, and improves the stability of the convergence result.
Drawings
FIG. 1 is a schematic view of the general flow of the process of the present invention.
Fig. 2 is a grid map of the operation of the mobile robot according to the embodiment of the present invention.
FIG. 3 is a diagram of conventional Q-learning convergence.
FIG. 4 is a diagram of improved Q-learning convergence according to an embodiment of the invention.
FIG. 5 is a diagram of an optimized path drawn by the improved Q-learning scheme according to the embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.
Referring to fig. 1, the method for planning a reinforcement learning path by introducing an artificial potential field according to the present invention includes the following steps:
the first step is as follows: the method comprises the steps of carrying out segmentation processing on an environment image obtained by a mobile robot, segmenting the image into 20 x 20 grids, establishing an environment model by adopting a grid method, and if an obstacle is found in the grids, defining the grids as the position of the obstacle, wherein the robot cannot pass through the grids; if the target point is found in the grid, determining that the grid is the target position and the position to which the mobile robot finally arrives; the other grids are defined as barrier-free grids, and the robot can pass through, and calculate the attraction value of each grid according to formula (1).
U_att = ζ / (|d| + η)        (1)
ζ is a scale factor greater than 0 used to adjust the magnitude of the attraction, |d| is the distance between the current position and the target point, and η is a positive constant that prevents the attraction value at the target point from becoming infinite.
Through the above steps, a simulation environment for training the reinforcement learning agent can be obtained, and the grid map for the mobile robot in the embodiment is shown in fig. 2.
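As an illustration of the first step, the following Python sketch builds the 20 x 20 grid environment and computes the attraction value of each free cell according to formula (1) in the inverse-distance form given above; the cell labels, obstacle layout, and goal coordinates are assumptions for illustration, not part of the patent.

```python
import numpy as np

ROWS, COLS = 20, 20                     # grid resolution used in this embodiment
FREE, OBSTACLE, GOAL = 0, 1, 2          # assumed cell labels

def attraction(cell, goal, zeta=0.6, eta=1.0):
    """Attraction value of formula (1): zeta / (|d| + eta), where |d| is the
    distance from the cell to the goal and eta keeps the value finite at the goal."""
    d = np.linalg.norm(np.subtract(cell, goal))
    return zeta / (d + eta)

def build_environment(obstacles, goal):
    """Return the occupancy grid and the per-cell attraction (state value) map."""
    grid = np.full((ROWS, COLS), FREE)
    for r, c in obstacles:
        grid[r, c] = OBSTACLE
    grid[goal] = GOAL
    values = np.zeros((ROWS, COLS))
    for r in range(ROWS):
        for c in range(COLS):
            if grid[r, c] != OBSTACLE:
                values[r, c] = attraction((r, c), goal)
    return grid, values

# example with an assumed obstacle layout; the real map comes from the robot's camera image
grid, values = build_environment(obstacles=[(5, 5), (5, 6), (12, 3)], goal=(19, 19))
```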
The second step: the algorithm parameters are initialized, including: the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the maximum number of iterations, the reward function r, and the greedy-factor dynamic adjustment strategy parameters ε_max, ε_min, T, and n.
The Q value function is initialized using equation (2).
where P(s'|s, a) is the probability of transitioning to state s' from the current state s under action a, and V(s') is the state value of the next state s', with V(s') = U_att.
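A minimal sketch of the second-step initialization follows; it assumes deterministic grid transitions (so the sum over next states in equation (2) collapses to the single successor cell) and a four-action move set. The cell labels, helper names, and the handling of off-grid moves are illustrative assumptions.

```python
import numpy as np

ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}   # up, down, left, right
OBSTACLE = 1                                               # same label as in the previous sketch

def init_q(grid, values, r_step=0.0, gamma=0.9):
    """Initialize Q following equation (2) under a deterministic transition model:
    Q(s, a) = R(s, a) + gamma * V(s'), with V(s') = U_att of the successor cell."""
    rows, cols = grid.shape
    q = np.zeros((rows, cols, len(ACTIONS)))
    for r in range(rows):
        for c in range(cols):
            for a, (dr, dc) in ACTIONS.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] != OBSTACLE:
                    q[r, c, a] = r_step + gamma * values[nr, nc]
                # moves that leave the map or hit an obstacle keep the default value 0
    return q
```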
The third step: the action is selected using the greedy-factor dynamic adjustment strategy, which is as follows:
ε = ε_min + (ε_max - ε_min) tanh(std_n / T)        (3)
wherein the specific form of the tanh function is as follows:
tanh(t) = (e^t - e^(-t)) / (e^t + e^(-t))        (4)
where e is the base of the natural logarithm; when the independent variable t is greater than 0, tanh(t) takes values in (0, 1); std_n is the standard deviation of the step counts over the last n iterations; T is a coefficient whose effect is opposite to that of the temperature in the simulated annealing algorithm, so the larger T is, the smaller the randomness; ε_max and ε_min are the set maximum and minimum values of the exploration rate, respectively.
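The third step can be sketched as follows, assuming the tanh-based schedule of formula (3) as reconstructed above: the exploration rate is driven by the standard deviation of the episode step counts over the last n episodes, and an ε-greedy draw then selects the action. The function names and the fallback used for the first few episodes are assumptions.

```python
import math
import random

def dynamic_epsilon(step_history, n=10, T=500.0, eps_max=0.5, eps_min=0.01):
    """Greedy factor per the tanh schedule of equation (3): a large standard deviation
    of the last n episode step counts (not yet converged) keeps epsilon near eps_max;
    as the step counts stabilise, epsilon decays towards eps_min."""
    if len(step_history) < n:
        return eps_max                               # too few episodes yet: keep exploring
    recent = step_history[-n:]
    mean = sum(recent) / n
    std_n = math.sqrt(sum((x - mean) ** 2 for x in recent) / n)
    return eps_min + (eps_max - eps_min) * math.tanh(std_n / T)

def select_action(q_row, epsilon):
    """Epsilon-greedy choice over the Q values of the current state."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))                      # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])        # exploit
```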
The fourth step: performing the action a selected in the third stepArrival sObtaining an instant reward R (s, a), updating the Q value function by using a Q-learning algorithm introducing an artificial potential field, and updating the rule as shown in the formula (5)
Q(s, a) ← Q(s, a) + α [R(s, a) + γ max_{a'} Q(s', a') - Q(s, a)]        (5)
where (s, a) is the current state-action pair, (s', a') is the state-action pair at the next time step, and R(s, a) is the immediate reward for performing action a in state s.
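A minimal sketch of a single update of the fourth step follows, applying the standard Q-learning rule of formula (5) to a numpy Q table; the function name and default arguments are illustrative.

```python
def q_update(q, state, action, reward, next_state, alpha=0.01, gamma=0.9):
    """One application of equation (5):
    Q(s, a) <- Q(s, a) + alpha * (R(s, a) + gamma * max_a' Q(s', a') - Q(s, a))."""
    r, c = state
    nr, nc = next_state
    td_target = reward + gamma * q[nr, nc].max()
    q[r, c, action] += alpha * (td_target - q[r, c, action])
    return q
```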
The fifth step: the third and fourth steps are repeated until a set number of steps or a convergence condition is reached.
The sixth step: the action with the maximum Q value is selected at each step to obtain the optimal path.
The seventh step: the optimal path is sent to the controller of the mobile robot, and the mobile robot is controlled to walk along the optimal path.
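Once training has converged, the optimal path of the sixth step can be read out greedily from the learned Q table, as in the sketch below; the maximum length and the cycle guard are added safeguards for illustration, not part of the patent.

```python
def extract_path(q, start, goal, max_len=400):
    """Read out the path by following the maximum-Q action from each visited cell."""
    actions = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # same order as in the earlier sketches
    path, state, visited = [start], start, {start}
    while state != goal and len(path) < max_len:
        a = int(q[state].argmax())
        dr, dc = actions[a]
        state = (state[0] + dr, state[1] + dc)
        if state in visited:          # safeguard against cycles in an unconverged table
            break
        visited.add(state)
        path.append(state)
    return path
```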
The parameter settings in this embodiment are as follows: learning rate α = 0.01, discount factor γ = 0.9, maximum number of iterations 20000, scale factor ζ = 0.6, constant η = 1, ε_max = 0.5, ε_min = 0.01, T = 500, n = 10; the reward function is set as:
[Formula (6): reward function, reproduced only as an image in the original document]
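The pieces above can be tied together into a training loop with the parameter values of this embodiment, as in the sketch below; it reuses the helper functions from the earlier sketches, the step_env transition helper is an assumption, and the concrete reward values are placeholders because formula (6) is only available as an image.

```python
def step_env(grid, state, action):
    """Assumed transition helper: returns (next_state, collision_flag).
    The agent stays in place when it would leave the map or enter an obstacle."""
    actions = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
    dr, dc = actions[action]
    nr, nc = state[0] + dr, state[1] + dc
    if not (0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]) or grid[nr, nc] == OBSTACLE:
        return state, True
    return (nr, nc), False

def train(grid, values, start, goal, episodes=20000, max_steps=2000):
    """Improved Q-learning loop: potential-field initialization, dynamically adjusted
    epsilon, standard Q update; reuses init_q, dynamic_epsilon, select_action, q_update
    from the earlier sketches."""
    q = init_q(grid, values, gamma=0.9)
    step_history = []
    for _ in range(episodes):
        state, steps = start, 0
        while state != goal and steps < max_steps:
            eps = dynamic_epsilon(step_history, n=10, T=500, eps_max=0.5, eps_min=0.01)
            action = select_action(q[state], eps)
            next_state, hit = step_env(grid, state, action)
            # placeholder reward values; the patent's reward function (6) is only
            # available as an image in the original document
            reward = 1.0 if next_state == goal else (-1.0 if hit else 0.0)
            q = q_update(q, state, action, reward, next_state, alpha=0.01, gamma=0.9)
            state = next_state
            steps += 1
        step_history.append(steps)
    return q
```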
Using the above method with these parameter settings, the optimal path obtained in this embodiment is shown in Fig. 5.
Comparing Fig. 3 and Fig. 4, it can be seen that, relative to the conventional Q-learning algorithm, the improved algorithm shortens the convergence time by 85.1%, reduces the number of iterations by 74.7%, and improves the stability of the convergence result.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.

Claims (7)

1. A reinforcement learning path planning method introducing an artificial potential field, characterized by comprising the following steps:
S1, establishing a grid map, initializing the state values with a gravitational field function, and obtaining a simulation environment for training the reinforcement learning agent;
S2, initializing the algorithm parameters;
S3, selecting an action by a greedy-factor dynamic adjustment strategy;
S4, executing the action and updating the Q value;
S5, repeating steps S3 and S4 until a set number of steps or a convergence condition is reached;
S6, selecting the action with the maximum Q value at each step to obtain the optimal path;
and S7, sending the optimal path to the controller of the mobile robot and controlling the mobile robot to walk along the optimal path.
2. The method of reinforcement learning path planning incorporating an artificial potential field of claim 1, wherein the specific process of step S1 is as follows: the environment image obtained by the mobile robot is segmented into 20 x 20 grid cells and an environment model is established by the grid method; if an obstacle is found in a grid cell, that cell is defined as an obstacle position through which the robot cannot pass; if the target point is found in a grid cell, that cell is defined as the target position, i.e., the position the mobile robot is finally to reach; all other cells are defined as obstacle-free cells through which the robot can pass, and the attraction value of each cell is calculated according to formula (1);
U_att = ζ / (|d| + η)        (1)
where ζ is a scale factor greater than 0 used to adjust the magnitude of the attraction; |d| is the distance between the current position and the target point; η is a positive constant that prevents the attraction value at the target point from becoming infinite.
3. The method of reinforcement learning path planning incorporating an artificial potential field of claim 1, wherein in step S2 the parameters include: the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the maximum number of iterations, the reward function r, and the greedy-factor dynamic adjustment strategy parameters ε_max, ε_min, T, and n;
Initializing the Q function using equation (2)
Q(s, a) = R(s, a) + γ Σ_{s'} P(s'|s, a) V(s')        (2)
where P(s'|s, a) is the probability of transitioning to state s' from the current state s under action a, and V(s') is the state value of the next state s', with V(s') = U_att.
4. The method of reinforcement learning path planning incorporating an artificial potential field of claim 1, wherein:
in step S3, the greedy factor adjustment strategy is as follows:
ε = ε_min + (ε_max - ε_min) tanh(std_n / T)        (3)
wherein the specific form of the tanh function is as follows:
tanh(t) = (e^t - e^(-t)) / (e^t + e^(-t))        (4)
e is the base of the natural logarithm; when the independent variable t is greater than 0, tanh(t) takes values in (0, 1); std_n is the standard deviation of the step counts over the last n iterations; T is a coefficient whose effect is opposite to that of the temperature in the simulated annealing algorithm, so the larger T is, the smaller the randomness; ε_max and ε_min are the set maximum and minimum values of the exploration rate, respectively.
5. The method of reinforcement learning path planning incorporating an artificial potential field of claim 1, wherein:
in step S4, the action a selected in the third step is executed to arrive at the next state s', the immediate reward R(s, a) is obtained, and the Q-value function is updated with the Q-learning algorithm that introduces the artificial potential field; the update rule is given by formula (5)
Q(s, a) ← Q(s, a) + α [R(s, a) + γ max_{a'} Q(s', a') - Q(s, a)]        (5)
where (s, a) is the current state-action pair, (s', a') is the state-action pair at the next time step, and R(s, a) is the immediate reward for performing action a in state s.
6. The method of reinforcement learning path planning incorporating an artificial potential field of claim 2, wherein: the scale factor ζ is set to 0.6 and the constant η is set to 1.
7. A method of reinforcement learning path planning incorporating an artificial potential field according to claim 3, characterised by: learning rate α = 0.01, discount factor γ = 0.9, maximum number of iterations set to 20000, ε_max = 0.5, ε_min = 0.01, T = 500, n = 10, and the reward function set as:
[Formula (6): reward function, reproduced only as an image in the original document]
CN202011327198.6A 2020-11-24 2020-11-24 Reinforced learning path planning method introducing artificial potential field Active CN112344944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011327198.6A CN112344944B (en) 2020-11-24 2020-11-24 Reinforced learning path planning method introducing artificial potential field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011327198.6A CN112344944B (en) 2020-11-24 2020-11-24 Reinforced learning path planning method introducing artificial potential field

Publications (2)

Publication Number Publication Date
CN112344944A (en) 2021-02-09
CN112344944B (en) 2022-08-05

Family

ID=74365572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011327198.6A Active CN112344944B (en) 2020-11-24 2020-11-24 Reinforced learning path planning method introducing artificial potential field

Country Status (1)

Country Link
CN (1) CN112344944B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112964272A (en) * 2021-03-16 2021-06-15 湖北汽车工业学院 Improved Dyna-Q learning path planning algorithm
CN113534819A (en) * 2021-08-26 2021-10-22 鲁东大学 Method and storage medium for pilot-follow multi-agent formation path planning
CN113790729A (en) * 2021-11-16 2021-12-14 北京科技大学 Unmanned overhead traveling crane path planning method and device based on reinforcement learning algorithm
CN113848911A (en) * 2021-09-28 2021-12-28 华东理工大学 Mobile robot global path planning method based on Q-learning and RRT
CN114296440A (en) * 2021-09-30 2022-04-08 中国航空工业集团公司北京长城航空测控技术研究所 AGV real-time scheduling method integrating online learning
CN114518758A (en) * 2022-02-08 2022-05-20 中建八局第三建设有限公司 Q learning-based indoor measuring robot multi-target-point moving path planning method
CN115542912A (en) * 2022-09-29 2022-12-30 福州大学 Mobile robot path planning method based on improved Q-learning algorithm
CN116700258A (en) * 2023-06-13 2023-09-05 重庆市荣冠科技有限公司 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning
CN115542912B (en) * 2022-09-29 2024-06-07 福州大学 Mobile robot path planning method based on improved Q-learning algorithm

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090327011A1 (en) * 2008-06-30 2009-12-31 Autonomous Solutions, Inc. Vehicle dispatching method and system
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning
US20180012137A1 (en) * 2015-11-24 2018-01-11 The Research Foundation for the State University New York Approximate value iteration with complex returns by bounding
WO2018120739A1 (en) * 2016-12-30 2018-07-05 深圳光启合众科技有限公司 Path planning method, apparatus and robot
CN110132296A (en) * 2019-05-22 2019-08-16 山东师范大学 Multiple agent sub-goal based on dissolution potential field divides paths planning method and system
CN110307848A (en) * 2019-07-04 2019-10-08 南京大学 A kind of Mobile Robotics Navigation method
US20190369637A1 (en) * 2017-03-20 2019-12-05 Mobileye Vision Technologies Ltd. Trajectory selection for an autonomous vehicle
CN110726416A (en) * 2019-10-23 2020-01-24 西安工程大学 Reinforced learning path planning method based on obstacle area expansion strategy
CN110794842A (en) * 2019-11-15 2020-02-14 北京邮电大学 Reinforced learning path planning algorithm based on potential field
CN111896006A (en) * 2020-08-11 2020-11-06 燕山大学 Path planning method and system based on reinforcement learning and heuristic search

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090327011A1 (en) * 2008-06-30 2009-12-31 Autonomous Solutions, Inc. Vehicle dispatching method and system
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning
US20180012137A1 (en) * 2015-11-24 2018-01-11 The Research Foundation for the State University New York Approximate value iteration with complex returns by bounding
WO2018120739A1 (en) * 2016-12-30 2018-07-05 深圳光启合众科技有限公司 Path planning method, apparatus and robot
US20190369637A1 (en) * 2017-03-20 2019-12-05 Mobileye Vision Technologies Ltd. Trajectory selection for an autonomous vehicle
CN110132296A (en) * 2019-05-22 2019-08-16 山东师范大学 Multiple agent sub-goal based on dissolution potential field divides paths planning method and system
CN110307848A (en) * 2019-07-04 2019-10-08 南京大学 A kind of Mobile Robotics Navigation method
CN110726416A (en) * 2019-10-23 2020-01-24 西安工程大学 Reinforced learning path planning method based on obstacle area expansion strategy
CN110794842A (en) * 2019-11-15 2020-02-14 北京邮电大学 Reinforced learning path planning algorithm based on potential field
CN111896006A (en) * 2020-08-11 2020-11-06 燕山大学 Path planning method and system based on reinforcement learning and heuristic search

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YUKIYASU NOGUCHI et al.: "Path Planning Method Based on Artificial Potential Field and Reinforcement Learning for Intervention AUVs", 2019 IEEE Underwater Technology (UT), 13 June 2019 (2019-06-13), pages 1-6 *
宋勇 et al.: "Initialization of Reinforcement Learning for Mobile Robot Path Planning" (移动机器人路径规划强化学习的初始化), Control Theory & Applications (控制理论与应用), vol. 29, no. 12, 15 December 2012 (2012-12-15), pages 1623-1628 *
徐晓苏 et al.: "Mobile Robot Path Planning Method Based on Improved Reinforcement Learning" (基于改进强化学习的移动机器人路径规划方法), Journal of Chinese Inertial Technology (中国惯性技术学报), vol. 27, no. 3, 30 June 2019 (2019-06-30), pages 314-320 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112964272A (en) * 2021-03-16 2021-06-15 湖北汽车工业学院 Improved Dyna-Q learning path planning algorithm
CN113534819B (en) * 2021-08-26 2024-03-15 鲁东大学 Method and storage medium for pilot following type multi-agent formation path planning
CN113534819A (en) * 2021-08-26 2021-10-22 鲁东大学 Method and storage medium for pilot-follow multi-agent formation path planning
CN113848911A (en) * 2021-09-28 2021-12-28 华东理工大学 Mobile robot global path planning method based on Q-learning and RRT
CN114296440A (en) * 2021-09-30 2022-04-08 中国航空工业集团公司北京长城航空测控技术研究所 AGV real-time scheduling method integrating online learning
CN114296440B (en) * 2021-09-30 2024-04-09 中国航空工业集团公司北京长城航空测控技术研究所 AGV real-time scheduling method integrating online learning
CN113790729A (en) * 2021-11-16 2021-12-14 北京科技大学 Unmanned overhead traveling crane path planning method and device based on reinforcement learning algorithm
CN113790729B (en) * 2021-11-16 2022-04-08 北京科技大学 Unmanned overhead traveling crane path planning method and device based on reinforcement learning algorithm
CN114518758A (en) * 2022-02-08 2022-05-20 中建八局第三建设有限公司 Q learning-based indoor measuring robot multi-target-point moving path planning method
CN114518758B (en) * 2022-02-08 2023-12-12 中建八局第三建设有限公司 Indoor measurement robot multi-target point moving path planning method based on Q learning
CN115542912A (en) * 2022-09-29 2022-12-30 福州大学 Mobile robot path planning method based on improved Q-learning algorithm
CN115542912B (en) * 2022-09-29 2024-06-07 福州大学 Mobile robot path planning method based on improved Q-learning algorithm
CN116700258A (en) * 2023-06-13 2023-09-05 重庆市荣冠科技有限公司 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning
CN116700258B (en) * 2023-06-13 2024-05-03 万基泰科工集团数字城市科技有限公司 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning

Also Published As

Publication number Publication date
CN112344944B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN112344944B (en) Reinforced learning path planning method introducing artificial potential field
CN111896006B (en) Path planning method and system based on reinforcement learning and heuristic search
Sherstov et al. Function approximation via tile coding: Automating parameter choice
CN109818775B (en) Short-term network flow prediction method
CN110806759A (en) Aircraft route tracking method based on deep reinforcement learning
CN108594803B (en) Path planning method based on Q-learning algorithm
CN110632922B (en) Path planning method based on bat algorithm and reinforcement learning
CN114460941B (en) Robot path planning method and system based on improved sparrow search algorithm
CN115629607A (en) Reinforced learning path planning method integrating historical information
CN113552891A (en) Robot multi-target path planning based on improved butterfly optimization algorithm
CN110110380B (en) Piezoelectric actuator hysteresis nonlinear modeling method and application
WO2018227820A1 (en) Method and device for controlling manipulator movement, storage medium, and terminal device
Karakuzu Parameter tuning of fuzzy sliding mode controller using particle swarm optimization
CN115115284B (en) Energy consumption analysis method based on neural network
CN116859903A (en) Robot smooth path planning method based on improved Harris eagle optimization algorithm
Khasanov et al. Gradient descent in machine learning
CN115730743A (en) Battlefield combat trend prediction method based on deep neural network
CN108121206A (en) Compound self-adaptive model generation optimization method based on efficient modified differential evolution algorithm
CN114967713A (en) Underwater vehicle buoyancy discrete change control method based on reinforcement learning
CN112595326A (en) Improved Q-learning path planning algorithm with fusion of priori knowledge
CN117471919A (en) Robot path planning method based on improved pelican optimization algorithm
Al_Duais et al. A review on enhancements to speed up training of the batch back propagation algorithm
CN115167419A (en) Robot path planning method based on DQN algorithm
CN114995105A (en) Water turbine regulating system PID parameter optimization method based on improved genetic algorithm
CN114329320A (en) Partial differential equation numerical solution method based on heuristic training data sampling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant