CN112344944B - Reinforced learning path planning method introducing artificial potential field - Google Patents

Reinforced learning path planning method introducing artificial potential field Download PDF

Info

Publication number
CN112344944B
CN112344944B (granted publication of application CN202011327198.6A; application published as CN112344944A)
Authority
CN
China
Prior art keywords
value
action
algorithm
path planning
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011327198.6A
Other languages
Chinese (zh)
Other versions
CN112344944A (en)
Inventor
王科银
石振
张建辉
杨正才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Automotive Technology
Original Assignee
Hubei University of Automotive Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Automotive Technology filed Critical Hubei University of Automotive Technology
Priority to CN202011327198.6A priority Critical patent/CN112344944B/en
Publication of CN112344944A publication Critical patent/CN112344944A/en
Application granted granted Critical
Publication of CN112344944B publication Critical patent/CN112344944B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 - Instruments for performing navigational calculations
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots, characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 - Control of position or course in two dimensions
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Automation & Control Theory (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a reinforcement learning path planning method introducing an artificial potential field, which comprises the following steps: S1, establishing a grid map, initializing the state values with a gravitational (attractive) field function, and obtaining a simulation environment for training the reinforcement learning agent; S2, initializing the algorithm parameters; S3, selecting actions with a dynamic greedy factor adjustment strategy; S4, executing the action and updating the Q value; S5, repeating steps S3 and S4 until a set number of steps or a convergence condition is reached; S6, selecting the action with the largest Q value at each step to obtain the optimal path; and S7, sending the optimal path to the controller of the mobile robot and controlling the mobile robot to walk along it. Compared with the traditional algorithm, the improved Q-learning algorithm shortens the path planning time by 85.1%, reduces the number of iterations before convergence by 74.7%, and improves the stability of the convergence result.

Description

Reinforced learning path planning method introducing artificial potential field
Technical Field
The invention relates to the technical field of robot path planning, in particular to a reinforcement learning path planning method introducing an artificial potential field.
Background
With the development of science and technology, mobile robots are entering more and more of people's daily lives, and the path planning problem for mobile robots is becoming increasingly important. Path planning helps the robot avoid obstacles and plan an optimal route from a start point to a target point with respect to a given index. According to how much is known about the environment, path planning can be divided into global path planning and local path planning. Widely used global path planning algorithms include the A* algorithm, Dijkstra's algorithm, the visibility graph method, and the free space method; local path planning algorithms include the artificial potential field method, genetic algorithms, neural network algorithms, and reinforcement learning algorithms. Reinforcement learning is comparatively adaptable and can find an optimal path by continual trial and error in a completely unknown environment, so it is receiving more and more attention in the field of mobile robot path planning.
The most widely used reinforcement learning algorithm in mobile robot path planning is Q-learning. The conventional Q-learning algorithm has the following problems: (1) all Q values are initialized to 0 or to random values, so the agent can only search blindly in the early stage and the algorithm wastes many invalid iterations at the start; (2) an ε-greedy strategy is used for action selection, and too large a value of ε makes the agent explore the environment so much that it is difficult to converge, while too small a value leads to insufficient exploration and a sub-optimal solution, so the relationship between exploration and exploitation is hard to balance.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a reinforcement learning path planning method introducing an artificial potential field. The gravitational (attractive) field function of the artificial potential field is introduced when initializing the Q values, so that states closer to the target position have larger values; the agent therefore searches toward the target position in the early stage, invalid iterations at the start of the algorithm are reduced, and the path planning time of the reinforcement-learning-based mobile robot is shortened.
A reinforcement learning path planning method introducing an artificial potential field comprises the following steps:
s1, establishing a grid map, introducing a gravitational field function initialization state value, and obtaining a simulation environment for training the reinforcement learning agent;
s2, initializing algorithm parameters;
s3, selecting actions by adopting a dynamic factor adjustment strategy;
s4, executing the action and updating the Q value;
s5, repeating steps S3 and S4 until a set number of steps or a convergence condition is reached;
s6, selecting the action with the maximum Q value in each step to obtain an optimal path;
and S7, sending the optimal path to a controller of the mobile robot, and controlling the mobile robot to walk according to the optimal path.
Preferably, the specific process of step S1 is as follows: the environment image obtained by the mobile robot is segmented into 20 x 20 grids and an environment model is established with the grid method; if an obstacle is found in a grid, that grid is defined as an obstacle position through which the robot cannot pass; if the target point is found in a grid, that grid is defined as the target position, i.e. the position the mobile robot must finally reach; the remaining grids are defined as obstacle-free grids through which the robot can pass, and the attraction value of each grid is calculated according to formula (1);
U_att = ζ / (|d| + η)    (1)
where ζ is a scale factor greater than 0 used to adjust the magnitude of the attraction; |d| is the distance between the current position and the target point; and η is a positive constant that prevents the attraction value at the target point from becoming infinite.
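As an illustration only, the following sketch computes the attraction value of formula (1) for every cell of the grid map, assuming the reconstructed form U_att = ζ / (|d| + η); the goal coordinates and the helper name attraction_field are assumptions of this example, while ζ = 0.6 and η = 1 follow the parameter settings of the embodiment.

```python
import numpy as np

def attraction_field(goal, size=20, zeta=0.6, eta=1.0):
    """Attraction value U_att = zeta / (|d| + eta) for every cell of a size x size grid.

    |d| is the Euclidean distance from the cell to the target cell; eta > 0 keeps the
    value at the target cell finite, and zeta scales the magnitude of the attraction.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    d = np.hypot(xs - goal[1], ys - goal[0])      # distance |d| to the target cell
    return zeta / (d + eta)

# Example: target at the bottom-right corner of the 20 x 20 grid map.
U_att = attraction_field(goal=(19, 19))
print(U_att[19, 19], U_att[0, 0])                 # largest at the target, decaying with distance
```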
Preferably, in step S2, the parameters include: the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the maximum number of iterations, the reward function r, and the greedy factor dynamic adjustment strategy parameters ε_max, ε_min, T, n;
Initializing the Q function using equation (2)
Q(s, a) = R(s, a) + γ Σ_{s′∈S} P(s′|s, a) V(s′)    (2)
where P(s′|s, a) is the probability of transitioning to the next state s′ given the current state s and action a, and V(s′) is the state value of the next state, with V(s′) = U_att.
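A minimal sketch of this initialization on the grid is given below; it assumes deterministic transitions, so P(s′|s, a) = 1 for the single cell reached by action a, and a four-action move set; the function name init_q and the zero reward table in the usage lines are assumptions of this example.

```python
import numpy as np

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # assumed move set: up, down, left, right

def init_q(U_att, R, gamma=0.9):
    """Initialise Q(s, a) = R(s, a) + gamma * sum over s' of P(s'|s, a) * V(s'), V(s') = U_att(s').

    Transitions are assumed deterministic: each action reaches one neighbouring cell
    (the agent stays in place at the border), so the sum collapses to a single term.
    """
    size = U_att.shape[0]
    Q = np.zeros((size, size, len(ACTIONS)))
    for y in range(size):
        for x in range(size):
            for k, (dy, dx) in enumerate(ACTIONS):
                ny = min(max(y + dy, 0), size - 1)
                nx = min(max(x + dx, 0), size - 1)
                Q[y, x, k] = R[y, x, k] + gamma * U_att[ny, nx]
    return Q

# Example with a zero reward table and the attraction field of formula (1).
ys, xs = np.mgrid[0:20, 0:20]
U_att = 0.6 / (np.hypot(xs - 19, ys - 19) + 1.0)
Q0 = init_q(U_att, np.zeros((20, 20, 4)))
```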
Preferably, in step S3, the greedy factor dynamic adjustment strategy is as follows:
ε = ε_min + (ε_max - ε_min) · tanh(std_n / T)    (3)
wherein the specific form of the tanh function is as follows:
tanh(t) = (e^t - e^(-t)) / (e^t + e^(-t))    (4)
where e is the base of the natural logarithm, and for an argument t greater than 0, tanh(t) takes values in (0, 1); std_n is the standard deviation of the step counts over the last n consecutive iterations; T is a coefficient whose effect is opposite to that of the temperature in the simulated annealing algorithm: the larger T is, the smaller the randomness; and ε_max and ε_min are the set maximum and minimum values of the exploration rate.
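The sketch below illustrates this strategy, assuming the reconstructed form ε = ε_min + (ε_max - ε_min) · tanh(std_n / T); the default parameter values follow the embodiment, while the function name dynamic_epsilon and the fallback to ε_max before n episodes of history exist are assumptions of this example.

```python
import math
import statistics

def dynamic_epsilon(recent_steps, eps_max=0.5, eps_min=0.01, T=500, n=10):
    """Greedy factor epsilon = eps_min + (eps_max - eps_min) * tanh(std_n / T).

    std_n is the standard deviation of the episode step counts over the last n
    iterations: while the step counts still fluctuate, epsilon stays near eps_max
    (more exploration); as the algorithm converges, std_n shrinks and epsilon
    approaches eps_min. A larger T shrinks the tanh argument and hence the
    randomness, opposite to the temperature in simulated annealing.
    """
    if len(recent_steps) < n:
        return eps_max                              # not enough history yet: keep exploring
    std_n = statistics.pstdev(recent_steps[-n:])
    return eps_min + (eps_max - eps_min) * math.tanh(std_n / T)

print(dynamic_epsilon([400, 380, 350, 500, 320, 300, 290, 310, 305, 295]))
```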
Preferably, in step S4, the action a selected in step S3 is executed, the next state s′ is reached and the instant reward R(s, a) is obtained; the Q-value function is then updated with the Q-learning algorithm that introduces the artificial potential field, with the update rule given by formula (5):
Q(s, a) ← Q(s, a) + α [ R(s, a) + γ max_{a′} Q(s′, a′) - Q(s, a) ]    (5)
where (s, a) is the current state-action pair, (s′, a′) is the state-action pair at the next time step, and R(s, a) is the instant reward for performing action a in state s.
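For reference, here is a minimal sketch of one such update on a tabular Q function; the tuple-indexed NumPy table and the function name q_update are assumptions of this example, and the attraction-field initialization is not repeated here.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.01, gamma=0.9):
    """One Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()        # bootstrap from the best action in s'
    Q[s][a] += alpha * (td_target - Q[s][a])       # move Q(s, a) toward the TD target
    return Q

# Example: a 20 x 20 x 4 table, one step from cell (0, 0) taking action 3 ("right") to (0, 1).
Q = np.zeros((20, 20, 4))
q_update(Q, s=(0, 0), a=3, r=0.0, s_next=(0, 1))
```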
The invention has the beneficial effects that:
In order to solve the problems of slow convergence, a large number of iterations, and unstable convergence results when the traditional reinforcement learning algorithm is applied to path planning for a mobile robot in an unknown environment, an improved Q-learning algorithm is provided. The artificial potential field method is introduced when initializing the state values, so that states closer to the target position have larger values and the agent is guided to move toward the target; the ε-greedy strategy is improved for action selection, and the greedy factor ε is dynamically adjusted according to the convergence degree of the algorithm, so that the relationship between exploration and exploitation is well balanced. Simulation results on the grid map show that, compared with the traditional algorithm, the improved Q-learning algorithm shortens the path planning time by 85.1% and reduces the number of iterations before convergence by 74.7%, while improving the stability of the convergence result.
Drawings
FIG. 1 is a schematic view of the general flow of the process of the present invention.
Fig. 2 is a grid map of the operation of the mobile robot according to the embodiment of the present invention.
FIG. 3 is a diagram of conventional Q-learning convergence.
FIG. 4 is a diagram of improved Q-learning convergence according to an embodiment of the invention.
FIG. 5 is a diagram of an optimized path drawn by the improved Q-learning scheme according to the embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only used as examples, and the protection scope of the present invention is not limited thereby.
Referring to fig. 1, the method for planning a reinforcement learning path by introducing an artificial potential field according to the present invention includes the following steps:
The first step: the environment image obtained by the mobile robot is segmented into 20 x 20 grids and an environment model is established with the grid method; if an obstacle is found in a grid, that grid is defined as an obstacle position through which the robot cannot pass; if the target point is found in a grid, that grid is defined as the target position, i.e. the position the mobile robot must finally reach; the remaining grids are defined as obstacle-free grids through which the robot can pass. The attraction value of each grid is calculated according to formula (1).
U_att = ζ / (|d| + η)    (1)
where ζ is a scale factor greater than 0 used to adjust the magnitude of the attraction, |d| is the distance between the current position and the target point, and η is a positive constant that prevents the attraction value at the target point from becoming infinite.
Through the above steps, a simulation environment for training the reinforcement learning agent can be obtained, and the grid map for the mobile robot in the embodiment is shown in fig. 2.
The second step: the algorithm parameters are initialized; the parameters include the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the maximum number of iterations, the reward function r, and the greedy factor dynamic adjustment strategy parameters ε_max, ε_min, T, n.
The Q-value function is initialized using equation (2):
Q(s, a) = R(s, a) + γ Σ_{s′∈S} P(s′|s, a) V(s′)    (2)
where P(s′|s, a) is the probability of transitioning to the next state s′ given the current state s and action a, and V(s′) is the state value of the next state, with V(s′) = U_att.
The third step: the action is selected with the greedy factor dynamic adjustment strategy, which is as follows:
ε = ε_min + (ε_max - ε_min) · tanh(std_n / T)    (3)
wherein the specific form of the tanh function is as follows:
tanh(t) = (e^t - e^(-t)) / (e^t + e^(-t))    (4)
where e is the base of the natural logarithm, and for an argument t greater than 0, tanh(t) takes values in (0, 1); std_n is the standard deviation of the step counts over the last n consecutive iterations; T is a coefficient whose effect is opposite to that of the temperature in the simulated annealing algorithm: the larger T is, the smaller the randomness; and ε_max and ε_min are the set maximum and minimum values of the exploration rate.
The fourth step: the action a selected in the third step is executed, the next state s′ is reached and the instant reward R(s, a) is obtained; the Q-value function is updated with the Q-learning algorithm that introduces the artificial potential field, with the update rule given by formula (5):
Q(s, a) ← Q(s, a) + α [ R(s, a) + γ max_{a′} Q(s′, a′) - Q(s, a) ]    (5)
where (s, a) is the current state-action pair, (s′, a′) is the state-action pair at the next time step, and R(s, a) is the instant reward for performing action a in state s.
The third and fourth steps are repeated until a set number of steps or a convergence condition is reached.
The fifth step: the action with the largest Q value is selected at each step to obtain the optimal path.
The sixth step: the optimal path is sent to the controller of the mobile robot, and the mobile robot is controlled to walk along the optimal path.
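To show how the third to sixth steps fit together, the following self-contained sketch runs the whole loop on a small example. The obstacle layout, the +1/-1/0 reward values, the episode and step limits, and the direct seeding of the Q table from the attraction field are assumptions of this illustration rather than the settings of this embodiment; the learning rate, discount factor, ζ, η, ε_max, ε_min, T and n do follow the embodiment's values.

```python
import math
import statistics
import numpy as np

SIZE, START, GOAL = 20, (0, 0), (19, 19)
OBSTACLES = {(5, 5), (5, 6), (10, 12)}                 # assumed obstacle layout
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]           # up, down, left, right

def step(s, a):
    """Deterministic grid transition with an assumed reward: +1 at the goal, -1 at an obstacle, 0 otherwise."""
    y, x = s
    dy, dx = ACTIONS[a]
    ny = min(max(y + dy, 0), SIZE - 1)
    nx = min(max(x + dx, 0), SIZE - 1)
    if (ny, nx) in OBSTACLES:
        return s, -1.0, False                          # blocked cell: stay in place
    return (ny, nx), (1.0 if (ny, nx) == GOAL else 0.0), (ny, nx) == GOAL

# Seed the Q table from the attraction field of formula (1) (a simplification of formula (2)).
ys, xs = np.mgrid[0:SIZE, 0:SIZE]
U_att = 0.6 / (np.hypot(xs - GOAL[1], ys - GOAL[0]) + 1.0)
Q = np.repeat(U_att[:, :, None], len(ACTIONS), axis=2)

steps_hist = []
for episode in range(800):
    # Dynamic greedy factor, formulas (3)-(4), assuming the reconstructed tanh form.
    if len(steps_hist) >= 10:
        eps = 0.01 + (0.5 - 0.01) * math.tanh(statistics.pstdev(steps_hist[-10:]) / 500)
    else:
        eps = 0.5
    s, done, n_steps = START, False, 0
    while not done and n_steps < 300:
        a = np.random.randint(len(ACTIONS)) if np.random.rand() < eps else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        Q[s][a] += 0.01 * (r + 0.9 * Q[s_next].max() - Q[s][a])   # formula (5)
        s, n_steps = s_next, n_steps + 1
    steps_hist.append(n_steps)

# Greedy path extraction: follow the action with the largest Q value in each state.
path, s = [START], START
while s != GOAL and len(path) < SIZE * SIZE:
    s, _, _ = step(s, int(Q[s].argmax()))
    path.append(s)
print("path length:", len(path), "reached goal:", path[-1] == GOAL)
```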
The parameter settings in this embodiment are as follows: learning rate α = 0.01, discount factor γ = 0.9, maximum number of iterations 20000, scale factor ζ = 0.6, constant η = 1, ε_max = 0.5, ε_min = 0.01, T = 500, n = 10; the reward function is set as in formula (6), which appears only as an image in the original publication and is not reproduced here.
With the above method and the above parameter settings, the optimal path obtained in this embodiment is shown in fig. 5.
Comparing fig. 3 and fig. 4 shows that, relative to the conventional Q-learning algorithm, the improved algorithm shortens the convergence time by 85.1%, reduces the number of iterations by 74.7%, and improves the stability of the convergence result.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.

Claims (4)

1. A reinforcement learning path planning method introducing an artificial potential field, characterized by comprising the following steps:
s1, establishing a grid map, introducing a gravitational field function initialization state value, and obtaining a simulation environment for training the reinforcement learning agent;
s2, initializing algorithm parameters;
s3, selecting actions by adopting a dynamic factor adjustment strategy;
s4, executing the action and updating the Q value;
s5, repeating steps S3 and S4 until a set number of steps or a convergence condition is reached;
s6, selecting the action with the maximum Q value in each step to obtain an optimal path;
s7, sending the optimal path to a controller of the mobile robot, and controlling the mobile robot to walk according to the optimal path;
the specific process of step S1 is as follows: the environment image obtained by the mobile robot is segmented into 20 x 20 grids and an environment model is established with the grid method; if an obstacle is found in a grid, that grid is defined as an obstacle position through which the robot cannot pass; if the target point is found in a grid, that grid is defined as the target position, i.e. the position the mobile robot must finally reach; the remaining grids are defined as obstacle-free grids through which the robot can pass, and the attraction value of each grid is calculated according to formula (1);
U_att = ζ / (|d| + η)    (1)
where ζ is a scale factor greater than 0 used to adjust the magnitude of the attraction; |d| is the distance between the current position and the target point; and η is a positive constant that prevents the attraction value at the target point from becoming infinite;
in step S2, the parameters include: the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the maximum number of iterations, the reward function r, and the greedy factor dynamic adjustment strategy parameters ε_max, ε_min, T, n;
Initializing the Q function using equation (2)
Q(s, a) = R(s, a) + γ Σ_{s′∈S} P(s′|s, a) V(s′)    (2)
where P(s′|s, a) is the probability of transitioning to the next state s′ given the current state s and action a; V(s′) is the state value of the next state, with V(s′) = U_att; R(s, a) is the reward value obtained by taking action a in the current state s; S is the state set; and U_att is the attraction value of the current position;
in step S3, the greedy factor adjustment strategy is as follows:
ε = ε_min + (ε_max - ε_min) · tanh(std_n / T)    (3)
wherein the specific form of the tanh function is as follows:
tanh(t) = (e^t - e^(-t)) / (e^t + e^(-t))    (4)
where e is the base of the natural logarithm, and for an argument t greater than 0, tanh(t) takes values in (0, 1); std_n is the standard deviation of the step counts over the last n consecutive iterations; T is a coefficient whose effect is opposite to that of the temperature in the simulated annealing algorithm: the larger T is, the smaller the randomness; and ε_max and ε_min are the set maximum and minimum values of the exploration rate.
2. The method of reinforcement learning path planning incorporating an artificial potential field of claim 1, further comprising:
in step S4, the action a selected in step S3 is executed, the next state s′ is reached and the instant reward R(s, a) is obtained; the Q-value function is updated with the Q-learning algorithm that introduces the artificial potential field, with the update rule given by formula (5):
Q(s, a) ← Q(s, a) + α [ R(s, a) + γ max_{a′} Q(s′, a′) - Q(s, a) ]    (5)
where (s, a) is the current state-action pair; (s′, a′) is the state-action pair at the next time step; R(s, a) is the instant reward for executing action a in state s; α is the learning rate; and γ is the discount factor.
3. The method of reinforcement learning path planning with introduction of an artificial potential field according to claim 1, characterized by: the scale factor ζ is set to 0.6 and the constant η is set to 1.
4. The method of reinforcement learning path planning incorporating an artificial potential field of claim 1, further comprising: the learning rate α = 0.01, the discount factor γ = 0.9, the maximum number of iterations is set to 20000, ε_max = 0.5, ε_min = 0.01, T = 500, n = 10, and the reward function is set to:
(formula (6): the reward function, given only as an image in the original publication)
CN202011327198.6A 2020-11-24 2020-11-24 Reinforced learning path planning method introducing artificial potential field Active CN112344944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011327198.6A CN112344944B (en) 2020-11-24 2020-11-24 Reinforced learning path planning method introducing artificial potential field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011327198.6A CN112344944B (en) 2020-11-24 2020-11-24 Reinforced learning path planning method introducing artificial potential field

Publications (2)

Publication Number Publication Date
CN112344944A CN112344944A (en) 2021-02-09
CN112344944B true CN112344944B (en) 2022-08-05

Family

ID=74365572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011327198.6A Active CN112344944B (en) 2020-11-24 2020-11-24 Reinforced learning path planning method introducing artificial potential field

Country Status (1)

Country Link
CN (1) CN112344944B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112964272A (en) * 2021-03-16 2021-06-15 湖北汽车工业学院 Improved Dyna-Q learning path planning algorithm
CN113534819B (en) * 2021-08-26 2024-03-15 鲁东大学 Method and storage medium for pilot following type multi-agent formation path planning
CN113848911B (en) * 2021-09-28 2023-06-27 华东理工大学 Mobile robot global path planning method based on Q-learning and RRT
CN114296440B (en) * 2021-09-30 2024-04-09 中国航空工业集团公司北京长城航空测控技术研究所 AGV real-time scheduling method integrating online learning
CN113790729B (en) * 2021-11-16 2022-04-08 北京科技大学 Unmanned overhead traveling crane path planning method and device based on reinforcement learning algorithm
CN114518758B (en) * 2022-02-08 2023-12-12 中建八局第三建设有限公司 Indoor measurement robot multi-target point moving path planning method based on Q learning
CN115542912B (en) * 2022-09-29 2024-06-07 福州大学 Mobile robot path planning method based on improved Q-learning algorithm
CN116700258B (en) * 2023-06-13 2024-05-03 万基泰科工集团数字城市科技有限公司 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning
WO2018120739A1 (en) * 2016-12-30 2018-07-05 深圳光启合众科技有限公司 Path planning method, apparatus and robot
CN110132296A (en) * 2019-05-22 2019-08-16 山东师范大学 Multiple agent sub-goal based on dissolution potential field divides paths planning method and system
CN110307848A (en) * 2019-07-04 2019-10-08 南京大学 A kind of Mobile Robotics Navigation method
CN110726416A (en) * 2019-10-23 2020-01-24 西安工程大学 Reinforced learning path planning method based on obstacle area expansion strategy
CN110794842A (en) * 2019-11-15 2020-02-14 北京邮电大学 Reinforced learning path planning algorithm based on potential field
CN111896006A (en) * 2020-08-11 2020-11-06 燕山大学 Path planning method and system based on reinforcement learning and heuristic search

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8626565B2 (en) * 2008-06-30 2014-01-07 Autonomous Solutions, Inc. Vehicle dispatching method and system
US10839302B2 (en) * 2015-11-24 2020-11-17 The Research Foundation For The State University Of New York Approximate value iteration with complex returns by bounding
CN110462544A (en) * 2017-03-20 2019-11-15 御眼视觉技术有限公司 The track of autonomous vehicle selects

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning
WO2018120739A1 (en) * 2016-12-30 2018-07-05 深圳光启合众科技有限公司 Path planning method, apparatus and robot
CN110132296A (en) * 2019-05-22 2019-08-16 山东师范大学 Multiple agent sub-goal based on dissolution potential field divides paths planning method and system
CN110307848A (en) * 2019-07-04 2019-10-08 南京大学 A kind of Mobile Robotics Navigation method
CN110726416A (en) * 2019-10-23 2020-01-24 西安工程大学 Reinforced learning path planning method based on obstacle area expansion strategy
CN110794842A (en) * 2019-11-15 2020-02-14 北京邮电大学 Reinforced learning path planning algorithm based on potential field
CN111896006A (en) * 2020-08-11 2020-11-06 燕山大学 Path planning method and system based on reinforcement learning and heuristic search

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yukiyasu Noguchi et al. Path Planning Method Based on Artificial Potential Field and Reinforcement Learning for Intervention AUVs. 2019 IEEE Underwater Technology (UT). 2019, pp. 1-6. *
Song Yong et al. Initialization of reinforcement learning for mobile robot path planning. Control Theory & Applications. 2012, Vol. 29, No. 12, pp. 1623-1628. *
Xu Xiaosu et al. Mobile robot path planning method based on improved reinforcement learning. Journal of Chinese Inertial Technology. 2019, Vol. 27, No. 3, pp. 314-320. *

Also Published As

Publication number Publication date
CN112344944A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN112344944B (en) Reinforced learning path planning method introducing artificial potential field
CN111896006B (en) Path planning method and system based on reinforcement learning and heuristic search
CN109765893A (en) Method for planning path for mobile robot based on whale optimization algorithm
CN111381600B (en) UUV path planning method based on particle swarm optimization
CN107703751A (en) PID controller optimization method based on dragonfly algorithm
CN113867369B (en) Robot path planning method based on alternating current learning seagull algorithm
CN114460941B (en) Robot path planning method and system based on improved sparrow search algorithm
CN108594803B (en) Path planning method based on Q-learning algorithm
CN115629607A (en) Reinforced learning path planning method integrating historical information
CN113885536A (en) Mobile robot path planning method based on global gull algorithm
CN115115284B (en) Energy consumption analysis method based on neural network
CN114967713B (en) Underwater vehicle buoyancy discrete change control method based on reinforcement learning
CN116859903A (en) Robot smooth path planning method based on improved Harris eagle optimization algorithm
CN114742231A (en) Multi-objective reinforcement learning method and device based on pareto optimization
Khasanov et al. Gradient descent in machine learning
CN108955689A (en) It is looked for food the RBPF-SLAM method of optimization algorithm based on adaptive bacterium
CN108121206A (en) Compound self-adaptive model generation optimization method based on efficient modified differential evolution algorithm
CN110889531A (en) Wind power prediction method and prediction system based on improved GSA-BP neural network
CN115167419A (en) Robot path planning method based on DQN algorithm
CN114548497B (en) Crowd motion path planning method and system for realizing scene self-adaption
CN115655279A (en) Marine unmanned rescue airship path planning method based on improved whale algorithm
CN115344046A (en) Mobile robot path planning based on improved deep Q network algorithm
CN114995105A (en) Water turbine regulating system PID parameter optimization method based on improved genetic algorithm
CN113807505A (en) Method for improving cyclic variation learning rate through neural network
Hewlett et al. Optimization using a modified second-order approach with evolutionary enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant