CN113894787B - Heuristic reward function design method for mechanical arm reinforcement learning motion planning - Google Patents

Heuristic reward function design method for mechanical arm reinforcement learning motion planning Download PDF

Info

Publication number
CN113894787B
Authority
CN
China
Prior art keywords
heuristic
mechanical arm
function
motion planning
planning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111278998.8A
Other languages
Chinese (zh)
Other versions
CN113894787A (en)
Inventor
白成超
张家维
郭继峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202111278998.8A priority Critical patent/CN113894787B/en
Publication of CN113894787A publication Critical patent/CN113894787A/en
Application granted granted Critical
Publication of CN113894787B publication Critical patent/CN113894787B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1656 - Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 - Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666 - Avoiding collision or forbidden zones
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J18/00 - Arms
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1679 - Programme controls characterised by the tasks executed

Abstract

The invention discloses a design method for a heuristic reward function used in mechanical arm reinforcement learning motion planning, and relates to the technical field of robot motion planning and intelligent control. The method aims to solve the problem that reward functions for reinforcement-learning-based mechanical arm motion planning algorithms are usually designed by experience, without a unified guiding method. The invention comprises: establishing a heuristic function for the mechanical arm motion planning problem; constructing a heuristic reward function for mechanical arm motion planning from the heuristic function; determining the parameter values in the heuristic reward function; and training a neural network motion planner for mechanical arm motion planning with the constructed heuristic reward function. The heuristic reward function significantly improves the success rate of motion planning and accelerates convergence. The method is applicable to the field of mechanical arm motion planning and intelligent control.

Description

Heuristic reward function design method for mechanical arm reinforcement learning motion planning
Technical Field
The invention relates to a heuristic reward function design method in mechanical arm reinforcement learning motion planning, and belongs to the technical field of robot motion planning and intelligent control.
Background
Compared with the motion planning of unmanned vehicles or unmanned aerial vehicles, mechanical arm motion planning is usually performed in an abstract high-dimensional joint space (configuration space), which makes some classical planning algorithms difficult to apply to mechanical arm systems because the collision-free configuration space of a mechanical arm is difficult to obtain explicitly. Commonly used mechanical arm motion planning algorithms can be divided into sampling-based motion planning algorithms, trajectory optimization algorithms, artificial potential field methods and graph-search motion planning algorithms. Traditional mechanical arm motion planning algorithms have difficulty achieving fast planning in high-dimensional, complex environments, and in recent years motion planning algorithms based on reinforcement learning have attracted the attention of many researchers. With the continued progress of deep reinforcement learning on tasks with multi-dimensional continuous action spaces, reinforcement-learning-based motion planning algorithms have the potential to simultaneously offer high-dimensional adaptability, adaptability to complex environments, and planning speed. At present, research on the mechanical arm reinforcement learning motion planning problem focuses mainly on neural network design and auxiliary learning strategies, while research on reward function design for the motion planning problem is scarce. Existing work includes directional reward function design, switching strategies between dense and sparse rewards, and the like, and the reward functions for existing mechanical arm motion planning problems are usually designed by experience, without the guidance of a unified design method. Therefore, the prior art does not provide a way to design a heuristic reward function for mechanical arm reinforcement learning motion planning.
The heuristic function is an important concept in informed search algorithms; in a graph-search algorithm, the heuristic function represents the estimated cost of the lowest-cost path from the current node to the target node. With carefully designed heuristic functions, graph-search motion planning achieves high success rates in very complex environments, which is difficult for other planning algorithms to match. The design of the heuristic function has a great influence on the performance of graph-search motion planning, and its meaning is similar to that of the reinforcement learning reward function.
Disclosure of Invention
The technical problem to be solved by the invention is as follows:
the invention provides a heuristic reward function design method used in mechanical arm reinforcement learning motion planning, and aims to solve the problem that convergence speed and success rate of mechanical arm reinforcement learning motion planning are affected because the existing reward function design based on the reinforcement learning mechanical arm motion planning algorithm is designed by means of experience because a unified guidance method is not available.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A design method for a heuristic reward function in mechanical arm reinforcement learning motion planning comprises the following steps:
step one: establishing a heuristic function h(n) for the mechanical arm motion planning problem;
step two: constructing a heuristic reward function for mechanical arm motion planning from the heuristic function;
step three: determining parameter values in a heuristic reward function;
step four: and training the neural network motion planner for mechanical arm motion planning by using the constructed heuristic reward function.
In step one, the heuristic function h(n) is constructed in one of the following three ways:
1) when the motion planning problem has explicit constraints, a relaxed problem whose optimal solution is easy to compute is obtained by removing some constraints of the original problem, and the solution of the relaxed problem is used as the heuristic function of the original problem;
2) when the motion planning problem has sub-problems with a clear structure, the cost of the solution of a sub-problem of the original problem is used as the heuristic of the original problem;
3) when the heuristic function is difficult to construct by 1) or 2), the heuristic function is constructed by directly learning from and generalizing over experience.
The process of constructing the heuristic function h(n) by way 1) is as follows (the invention constructs two heuristic functions for mechanical arm motion planning):
Step 1-1: remove constraints from the original motion planning problem to relax it, using either relaxation mode one or relaxation mode two;
Relaxation mode one: when the obstacles are dense and the environment is complex in the mechanical arm motion planning scene, the kinematic constraints of the mechanical arm in the original problem are removed, and the original problem is relaxed into the problem of moving the end effector of the mechanical arm to the target position without collision;
Relaxation mode two: when the mechanical arm motion planning scene is simple, the kinematic constraints and the collision-free constraint between the end effector and the obstacles are removed, and the original problem is relaxed into the problem of moving the end effector directly to the target position without considering collisions.
Step 1-2: take the solution of the relaxed problem as the heuristic function of the original problem:
the heuristic function corresponding to relaxation mode one is: an estimate of the shortest collision-free path length in the three-dimensional workspace for a sphere representing the end effector of the mechanical arm to reach the target position;
the heuristic function corresponding to relaxation mode two is: an estimate of the straight-line distance between the end effector and the target position.
In step 1-2, the heuristic function corresponding to relaxation mode one has the following specific form:
h_1(s_t) = Σ_{i=1}^{N-1} ||P(i+1) - P(i)||
In the above formula, P represents the sphere motion path calculated by the RRT-Connect motion planning algorithm and consists of N path points, P(i) represents the position of the i-th path point, and h_1(s_t) is called the RRT heuristic function.
The specific form of the heuristic function corresponding to relaxation mode two is as follows:
h_2(s_t) = ||p(s_t) - p_goal||
In the above formula, p(s_t) denotes the end-effector position of the mechanical arm at time t, p_goal denotes the target position of the mechanical arm motion plan, and h_2(s_t) is called the straight-line heuristic function.
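The two heuristic functions can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch: the RRT-Connect planner itself is not implemented here, the waypoints argument is assumed to already be the collision-free sphere path P(1..N) returned by such a planner, and the function names and coordinates are made up for the example.

import numpy as np

def rrt_heuristic(waypoints: np.ndarray) -> float:
    """h_1(s_t): length of the collision-free sphere path P(1..N).

    `waypoints` is an (N, 3) array of Cartesian path points assumed to come
    from an RRT-Connect planner run on the relaxed (end-effector-only) problem.
    """
    segments = np.diff(waypoints, axis=0)               # P(i+1) - P(i)
    return float(np.linalg.norm(segments, axis=1).sum())

def line_heuristic(p_end: np.ndarray, p_goal: np.ndarray) -> float:
    """h_2(s_t): straight-line distance from the end effector to the goal."""
    return float(np.linalg.norm(np.asarray(p_goal) - np.asarray(p_end)))

# Illustrative coordinates only (not from the patent's experiments):
path = np.array([[0.30, 0.00, 0.40],
                 [0.35, 0.10, 0.45],
                 [0.40, 0.20, 0.50]])
print(rrt_heuristic(path))                  # relaxed-problem path length
print(line_heuristic(path[0], path[-1]))    # lower bound that ignores obstacles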
In step two, the heuristic reward function for mechanical arm motion planning is constructed as follows:
r_t = 1, if ||p(s_{t+1}) - p_goal|| ≤ ε (planning success)
r_t = -1, if a collision occurs
r_t = f(h(s_{t+1})), otherwise
In the above equation, ε denotes the distance threshold for judging whether planning is successful, and f(h(s_{t+1})) is defined as follows:
f(h(s_{t+1})) = λ_1 + λ_2(λ_3 - h(s_{t+1}))
The above formula consists of two parts: the first term λ_1 is a time penalty intended to make the mechanical arm move to the target position as quickly as possible; the second term scales h(s_{t+1}) to a suitable range, where λ_3 is a constant adjusting the sign of h(s_{t+1}) and λ_2 is a constant adjusting its magnitude.
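A minimal sketch of evaluating this reward at a single environment step is given below. The +1 success reward and -1 collision penalty are assumptions inferred from the parameter analysis in step three (which bounds the cumulative shaping reward between -1 and 1); the function names and the way success and collision are detected are illustrative, not prescribed by the invention.

def shaping_term(h_next: float, lam1: float, lam2: float, lam3: float) -> float:
    """f(h(s_{t+1})) = lambda_1 + lambda_2 * (lambda_3 - h(s_{t+1}))."""
    return lam1 + lam2 * (lam3 - h_next)

def heuristic_reward(dist_to_goal: float, collided: bool, h_next: float,
                     eps: float, lam1: float, lam2: float, lam3: float) -> float:
    """Per-step reward: terminal values on success/collision, f(h) otherwise."""
    if dist_to_goal <= eps:   # end effector within the success threshold epsilon
        return 1.0            # terminal success reward (assumed +1)
    if collided:              # arm hit an obstacle
        return -1.0           # terminal collision penalty (assumed -1)
    return shaping_term(h_next, lam1, lam2, lam3)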
In step three, the parameter values in the heuristic reward function are determined as follows:
The values of the parameters are adjusted according to the magnitude of the heuristic function and designed according to the constraint relation of the following formula,
-1 < Σ_{t=1}^{T_end} γ^t f(h(s_{t+1})) < 1
In the above formula, T_end represents the total number of interaction steps in a training round, t represents the t-th interaction, and γ^t represents the value discount coefficient of the t-th interaction. If the value of the above expression is less than -1, the agent may choose to actively collide with an obstacle to end the current round in order to increase the cumulative reward; if the value is greater than 1, the agent may linger around the goal state until the training round ends. When the value lies between -1 and 1, the agent is prevented from learning a wrong strategy and the reward plays its intended guiding role.
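The constraint can be checked numerically before training, as sketched below. The discount factor and the decreasing heuristic trace are placeholder values chosen only to show how the bound would be verified; the λ values are the ones used later in the embodiment.

def shaping_return(h_values, gamma, lam1, lam2, lam3):
    """Discounted sum of f(h(s_{t+1})) over the T_end steps of one episode."""
    return sum(gamma ** t * (lam1 + lam2 * (lam3 - h))
               for t, h in enumerate(h_values, start=1))

# Placeholder episode: heuristic value shrinking from 0.6 m toward the goal.
h_trace = [0.6 - 0.005 * t for t in range(100)]
value = shaping_return(h_trace, gamma=0.99, lam1=-0.001, lam2=-0.02, lam3=0.2)
# Parameters are acceptable when the cumulative shaping reward stays in (-1, 1).
assert -1.0 < value < 1.0, "re-tune lambda_1..lambda_3 for this heuristic scale"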
The invention has at least the following beneficial technical effects:
The heuristic function for mechanical arm motion planning is designed based on the heuristic function design methods used in graph-search motion planning algorithms; the heuristic function is then used to design the reward function for mechanical arm reinforcement learning motion planning, and the resulting reward function is called the heuristic reward function.
The invention provides a novel motion planning reward function design framework based on the heuristic function design methods used in graph-search motion planning algorithms, and experiments verify that the two heuristic functions designed with this framework accelerate the reinforcement learning training process and improve the motion planning success rate. The invention solves the problem that reward functions for reinforcement-learning-based mechanical arm motion planning algorithms are usually designed by experience, without a unified guiding method. The invention first establishes a heuristic function through the construction methods of relaxing the original problem, solving a sub-problem, and learning and generalization, and on this basis provides a method for constructing a reward function from the established heuristic function.
Verification shows that the heuristic reward function designed by the invention significantly improves the success rate of motion planning and accelerates convergence. The method is applicable to the field of mechanical arm motion planning and intelligent control.
Drawings
FIG. 1 is a simulation scenario;
FIG. 2 is a graph of the success rate of the training process in a desktop scenario;
FIG. 3 is a graph of the success rate of the training process in a wall obstacle scenario;
fig. 4 is a graph of the success rate of the training process in a cabinet scenario.
Detailed Description
Specific embodiment one:
The heuristic reward function design framework for mechanical arm motion planning in this embodiment comprises the following steps:
Step one: establish a heuristic function h(n) for the mechanical arm motion planning problem; the heuristic function can be constructed in the following three ways:
1) When the motion planning problem has explicit constraints, a relaxed problem whose optimal solution is easy to compute is obtained by removing some constraints of the original problem, and the solution of the relaxed problem is used as the heuristic function of the original problem.
2) When the motion planning problem has sub-problems with a clear structure, the cost of the solution of a sub-problem of the original problem is used as the heuristic of the original problem.
3) When the heuristic function is difficult to construct in either of the above ways, it is constructed by directly learning from and generalizing over experience.
Taking way 1) as an example, the invention constructs two heuristic functions for mechanical arm motion planning according to the following steps:
Step 1-1: remove constraints from the original motion planning problem to relax it.
Relaxation mode one: when the obstacles are dense and the environment is complex in the mechanical arm motion planning scene, relaxation mode one removes the kinematic constraints of the mechanical arm in the original problem, and the original problem becomes the problem of moving the end effector of the mechanical arm to the target position without collision.
Relaxation mode two: when the mechanical arm motion planning scene is simple, the kinematic constraints and the collision-free constraint between the end effector and the obstacles are removed, and the original problem is relaxed into the problem of moving the end effector directly to the target position without considering collisions.
Step 1-2: take the solution of the relaxed problem as the heuristic function of the original problem.
The heuristic function corresponding to relaxation mode one is: an estimate of the shortest collision-free path length in the three-dimensional workspace for a sphere representing the end effector of the mechanical arm to reach the target position. The invention adopts the sampling-based RRT-Connect motion planning algorithm to calculate the collision-free path of the sphere in three-dimensional Euclidean space, and the heuristic function is:
h_1(s_t) = Σ_{i=1}^{N-1} ||P(i+1) - P(i)||
In the above formula, P represents the sphere motion path calculated by the RRT-Connect motion planning algorithm and consists of N path points, P(i) represents the position of the i-th path point, and h_1(s_t) is called the RRT heuristic function.
The heuristic function corresponding to relaxation mode two is: an estimate of the straight-line distance between the end effector and the target position, obtained by directly computing that distance, and the heuristic function is:
h_2(s_t) = ||p(s_t) - p_goal||
In the above formula, p(s_t) denotes the end-effector position of the mechanical arm at time t, p_goal denotes the target position of the mechanical arm motion plan, and h_2(s_t) is called the straight-line heuristic function.
Step two: and constructing a heuristic reward function of the mechanical arm movement plan according to the heuristic function:
r_t = 1, if ||p(s_{t+1}) - p_goal|| ≤ ε (planning success)
r_t = -1, if a collision occurs
r_t = f(h(s_{t+1})), otherwise
In the above equation, ε represents the distance threshold for determining whether planning is successful, and f(h(s_{t+1})) is defined as follows:
f(h(s_{t+1})) = λ_1 + λ_2(λ_3 - h(s_{t+1}))
The above formula consists of two parts: the first term λ_1 is a time penalty intended to make the mechanical arm move to the target position as quickly as possible; the second term scales h(s_{t+1}) to a suitable range, where λ_3 adjusts its sign and λ_2 adjusts its magnitude.
Step three: and determining parameter values in the heuristic reward function, wherein the value of each parameter needs to be adjusted according to the size of the heuristic function and can be designed according to the constraint relation of the following formula.
-1 < Σ_{t=1}^{T_end} γ^t f(h(s_{t+1})) < 1
In the above formula, T_end represents the total number of interaction steps in a training round. If the value of the above expression is less than -1, the agent may choose to actively collide with an obstacle to end the current round in order to increase the cumulative reward; if the value is greater than 1, the agent may linger around the goal state until the training round ends. When the value lies between -1 and 1, the agent is prevented from learning a wrong strategy and the reward plays its guiding role.
Step four: and training the neural network motion planner for mechanical arm motion planning by using the constructed heuristic reward function.
The beneficial effects of the present invention are demonstrated with the following examples:
example (b):
1) experimental setup
In the invention, the values of λ_1, λ_2 and λ_3 are chosen as -0.001, -0.02 and 0.2, respectively. To highlight the role of the heuristic function in the reward function, a sparse reward function without any heuristic term is set up for comparison:
[Equation: sparse reward function without the heuristic term, used for comparison.]
the invention is based on a jaco2 cooperative mechanical arm with 7 degrees of freedom for training and testing. In order to fully test the effect of the heuristic reward function, 3 experimental scenes with increasing difficulty are set, namely a desktop scene, a wall obstacle scene and a cabinet scene, as shown in fig. 1. Each scene establishes a corresponding training environment in MuJoCo, and the task of the mechanical arm is to move from a random initial position of a three-dimensional working space to a random target position.
Experimental results and analysis:
The effectiveness of the proposed heuristic reward function is evaluated in terms of success rate and convergence speed. For success rate, after training, with noise perturbation and the stochastic policy disabled, each neural network motion planner performs 100 planning runs with new initial and target configurations, and the success rate over these 100 runs is used to evaluate the planner's real planning success rate. For convergence speed, the number of training rounds at which the planning success rate first reaches 80% during training is used as the index; if the success rate does not reach 80% before training ends, the training is considered to have failed.
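The two evaluation metrics can be sketched as follows; planner.plan and its boolean return value are hypothetical, and only the metric definitions (success rate over fresh planning runs, first training round reaching 80%) come from the text above.

def success_rate(planner, tasks):
    """Fraction of successful plans over fresh (start, goal) pairs (100 runs in the tests).

    `planner.plan(start, goal)` is assumed to return True when the plan reaches the goal.
    """
    outcomes = [planner.plan(start, goal) for start, goal in tasks]
    return sum(1 for ok in outcomes if ok) / len(outcomes)

def convergence_episode(success_curve, threshold=0.8):
    """First training round whose planning success rate reaches the threshold (80%).

    Returns None if the threshold is never reached, which the text above treats
    as a failed training run.
    """
    for episode, rate in enumerate(success_curve, start=1):
        if rate >= threshold:
            return episode
    return None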
Because the two heuristic functions h_1(s_t) and h_2(s_t) are identical in the desktop scene, only h_1(s_t) is tested there for ease of calculation. Two comparison experiments are run in the desktop scene, and three comparison experiments are run in each of the wall obstacle and cabinet scenes. Each group of experiments is repeated three times with the same training parameters; the specific meaning of each comparison experiment and the success rate curves of the training process are shown in FIGS. 2-4, where the solid line is the mean success rate over the three training runs and the shaded area behind it is the range of success rates over the three runs.
Table 1 Success rate test results
[Table data not reproduced in the text of this publication.]
Table 2 Convergence rate test results
[Table data not reproduced in the text of this publication.]
The success rate test results and convergence rate data after training are shown in tables 1 and 2, respectively. The experimental results show that the success rate of the heuristic reward functions is higher than that of the situation without the heuristic reward functions, and the straight line heuristic reward functions have higher success rate than the RRT heuristic reward functions. In the aspect of convergence speed, the convergence speed of the heuristic reward function is higher than that of the case without the heuristic function, and the RRT heuristic reward function has more advantages than the straight line heuristic function in the convergence speed.
According to the method provided by the invention, the training convergence speed and the motion planning success rate of the mechanical arm motion planning algorithm based on reinforcement learning can be improved, and guidance is provided for the design of a reward function.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A heuristic reward function design method for mechanical arm reinforcement learning motion planning is characterized in that: the method comprises the following steps:
step one: establishing a heuristic function h(n) of a mechanical arm motion planning problem;
step two: constructing a heuristic reward function for mechanical arm motion planning from the heuristic function; the heuristic reward function for mechanical arm motion planning is constructed as follows:
r_t = 1, if ||p(s_{t+1}) - p_goal|| ≤ ε (planning success)
r_t = -1, if a collision occurs
r_t = f(h(s_{t+1})), otherwise
in the above equation, ε denotes the distance threshold for judging whether planning is successful, and f(h(s_{t+1})) is defined as follows:
f(h(s_{t+1})) = λ_1 + λ_2(λ_3 - h(s_{t+1}))
the above formula consists of two parts: the first term λ_1 is a time penalty intended to make the mechanical arm move to the target position as quickly as possible; the second term scales h(s_{t+1}) to a suitable range, where λ_3 is a constant adjusting the sign of h(s_{t+1}) and λ_2 is a constant adjusting its magnitude; p(s_t) denotes the end-effector position of the mechanical arm at time t, and p_goal denotes the target position of the mechanical arm motion plan;
Step three: determining parameter values in a heuristic reward function;
step four: and training the neural network motion planner for mechanical arm motion planning by using the constructed heuristic reward function.
2. The heuristic reward function design method for mechanical arm reinforcement learning motion planning as claimed in claim 1, wherein the heuristic function h(n) in step one is constructed in one of the following three ways:
1) when the motion planning problem has explicit constraints, a relaxed problem whose optimal solution is easy to compute is obtained by removing some constraints of the original problem, and the solution of the relaxed problem is used as the heuristic function of the original problem;
2) when the motion planning problem has sub-problems with a clear structure, the cost of the solution of a sub-problem of the original problem is used as the heuristic of the original problem;
3) when the heuristic function is difficult to construct by 1) or 2), the heuristic function is constructed by directly learning from and generalizing over experience.
3. The heuristic reward function design method for mechanical arm reinforcement learning motion planning as claimed in claim 2, wherein the process of constructing the heuristic function h(n) by way 1) is as follows:
step 1-1: removing constraints from the original motion planning problem to relax it, using either relaxation mode one or relaxation mode two;
relaxation mode one: when the obstacles are dense in the mechanical arm motion planning scene, removing the kinematic constraints of the mechanical arm in the original problem, so that the original problem is relaxed into the problem of moving the end effector of the mechanical arm to the target position without collision;
relaxation mode two: when the mechanical arm motion planning scene is simple, removing the kinematic constraints and the collision-free constraint between the end effector and the obstacles, so that the original problem is relaxed into the problem of moving the end effector directly to the target position without considering collisions between the end effector and the obstacles;
step 1-2: taking the solution of the relaxed problem as the heuristic function of the original problem:
the heuristic function corresponding to relaxation mode one is: an estimate of the shortest collision-free path length in the three-dimensional workspace for a sphere representing the end effector of the mechanical arm to reach the target position;
the heuristic function corresponding to relaxation mode two is: an estimate of the straight-line distance between the end effector and the target position.
4. The heuristic reward function design method for mechanical arm reinforcement learning motion planning as claimed in claim 3, wherein the specific form of the heuristic function corresponding to relaxation mode one in step 1-2 is as follows:
h_1(s_t) = Σ_{i=1}^{N-1} ||P(i+1) - P(i)||
in the above formula, P represents the sphere motion path calculated by the RRT-Connect motion planning algorithm and consists of N path points, P(i) represents the position of the i-th path point, and h_1(s_t) is called the RRT heuristic function.
5. The heuristic reward function design method for mechanical arm reinforcement learning motion planning as claimed in claim 3, wherein the specific form of the heuristic function corresponding to relaxation mode two in step 1-2 is as follows:
h_2(s_t) = ||p(s_t) - p_goal||
in the above formula, p(s_t) denotes the end-effector position of the mechanical arm at time t, p_goal denotes the target position of the mechanical arm motion plan, and h_2(s_t) is called the straight-line heuristic function.
6. The heuristic reward function design method for mechanical arm reinforcement learning motion planning as claimed in claim 1, wherein the parameter values in the heuristic reward function in step three are determined as follows:
the values of the parameters are adjusted according to the size of the heuristic function and are designed according to the constraint relation of the following formula,
-1 < Σ_{m=1}^{T_end} γ^m f(h(s_{m+1})) < 1
in the above formula, T_end represents the total number of interaction steps of a training round, m represents the m-th interaction, and γ^m represents the value discount coefficient of the m-th interaction; if the value of the above expression is less than -1, the mechanical arm will choose to actively collide with an obstacle to end the current round in order to increase the cumulative reward, and if the value is greater than 1, the mechanical arm will linger around the target state until the training round ends; when the value lies between -1 and 1, the mechanical arm is prevented from learning a wrong strategy and the reward plays its guiding role.
7. A computer-readable storage medium, characterized in that: the computer readable storage medium stores a computer program configured to, when invoked by a processor, implement the steps of the heuristic reward function design method for robot arm reinforcement learning motion planning of any of claims 1-6.
CN202111278998.8A 2021-10-31 2021-10-31 Heuristic reward function design method for mechanical arm reinforcement learning motion planning Active CN113894787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111278998.8A CN113894787B (en) 2021-10-31 2021-10-31 Heuristic reward function design method for mechanical arm reinforcement learning motion planning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111278998.8A CN113894787B (en) 2021-10-31 2021-10-31 Heuristic reward function design method for mechanical arm reinforcement learning motion planning

Publications (2)

Publication Number Publication Date
CN113894787A (en) 2022-01-07
CN113894787B (en) 2022-06-14

Family

ID=79027742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111278998.8A Active CN113894787B (en) 2021-10-31 2021-10-31 Heuristic reward function design method for mechanical arm reinforcement learning motion planning

Country Status (1)

Country Link
CN (1) CN113894787B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103295080A (en) * 2013-06-14 2013-09-11 西安工业大学 Three-dimensional path programming method based on elevation diagram and ant colony foraging
CN108363690A (en) * 2018-02-08 2018-08-03 北京十三科技有限公司 Dialog semantics Intention Anticipation method based on neural network and learning training method
KR20190106920A (en) * 2019-08-30 2019-09-18 엘지전자 주식회사 Robot system and Control method of the same
CN111531543A (en) * 2020-05-12 2020-08-14 中国科学院自动化研究所 Robot self-adaptive impedance control method based on biological heuristic neural network
CN112325897A (en) * 2020-11-19 2021-02-05 东北大学 Path planning method based on heuristic deep reinforcement learning


Also Published As

Publication number Publication date
CN113894787A (en) 2022-01-07

Similar Documents

Publication Publication Date Title
CN110083165B (en) Path planning method of robot in complex narrow environment
CN107168324B (en) Robot path planning method based on ANFIS fuzzy neural network
CN109343345B (en) Mechanical arm polynomial interpolation track planning method based on QPSO algorithm
CN112947562A (en) Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
Hu et al. A dynamic adjusting reward function method for deep reinforcement learning with adjustable parameters
Kulchenko et al. First-exit model predictive control of fast discontinuous dynamics: Application to ball bouncing
CN113296496A (en) Multi-sampling-point-based gravitational adaptive step size bidirectional RRT path planning method
CN113721622B (en) Robot path planning method
Yan et al. Path Planning for Mobile Robot's Continuous Action Space Based on Deep Reinforcement Learning
CN113894787B (en) Heuristic reward function design method for mechanical arm reinforcement learning motion planning
Luo et al. Reinforcement learning in robotic motion planning by combined experience-based planning and self-imitation learning
Camuffo et al. Moving drones for wireless coverage in a three-dimensional grid analyzed via game theory
Zhang et al. Vehicle driving longitudinal control based on double deep Q network
Botteghi et al. Entropy-based exploration for mobile robot navigation: a learning-based approach
Liu et al. Path Planning for Mobile Robot Based on Deep Reinforcement Learning and Fuzzy Control
CN113515130A (en) Method and storage medium for agent path planning
Yu et al. An intelligent robot motion planning method and application via lppo in unknown environment
Tang et al. Reinforcement learning for robots path planning with rule-based shallow-trial
Ali et al. Exploration of unknown environment using deep reinforcement learning
Ji et al. Research on Path Planning of Mobile Robot Based on Reinforcement Learning
Zhao et al. Robot Trajectory Planning Optimization Algorithm Based on Improved TD3 Algorithm
Liu et al. Improving learning from demonstrations by learning from experience
Yang et al. Robot path planning based on q-learning Algorithm
Qiu et al. Sub-optimal policy aided multi-agent reinforcement learning for flocking control
Li et al. Visual-Based Deep Reinforcement Learning for Robot Grasping with Pushing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant