CN112799386B - Robot path planning method based on artificial potential field and reinforcement learning

Robot path planning method based on artificial potential field and reinforcement learning

Info

Publication number
CN112799386B
Authority
CN
China
Prior art keywords
potential field
reinforcement learning
field
path planning
obstacle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911020333.XA
Other languages
Chinese (zh)
Other versions
CN112799386A
Inventor
么庆丰
郑泽宇
赵明
潘怡君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Institute of Automation of CAS
Original Assignee
Shenyang Institute of Automation of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Institute of Automation of CAS filed Critical Shenyang Institute of Automation of CAS
Priority to CN201911020333.XA
Publication of CN112799386A
Application granted
Publication of CN112799386B
Legal status: Active

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02: Control of position or course in two dimensions
    • G05D1/021: Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0276: Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Abstract

The invention discloses a robot path planning method based on an artificial potential field and reinforcement learning, and belongs to the field of path planning. First, a map is constructed using the artificial potential field method. Second, a small-range, strong-acting-force domain potential field is added for the multi-target case. Finally, path planning for multiple agents under multi-target conditions is realized using reinforcement learning and distributed curriculum learning. The method combines the artificial potential field method with reinforcement learning, effectively models the multi-target environment, reduces the occurrence of local stable points, uses reinforcement learning to learn to avoid the local stable points that remain, and improves the success rate of path planning. The invention provides high reliability for path planning.

Description

Robot path planning method based on artificial potential field and reinforcement learning
Technical Field
The invention belongs to the field of path planning, and particularly relates to a path planning method based on an artificial potential field that utilizes temporal-difference learning and reinforcement learning.
Background
With the continuous development of intelligent agents and artificial intelligence theory, autonomous mobile agent technology has become increasingly mature and is widely applied in fields such as industry, the military, medical treatment, and services. At the same time, the tasks assigned to agents have become more complex, and the environment has shifted from the original single-agent, deterministic setting to a multi-agent, uncertain one. Therefore, in recent years, research on autonomous intelligent control of agents in complex systems has gained wide attention in academia and industry, and path planning and navigation, as key technologies, have become one of the current research hotspots.
Current path planning techniques fall into two broad categories: global planning based on a known environment and local planning based on sensed information. The former performs path planning in a static, known environment and is also called static path planning; commonly used methods include the greedy algorithm, Dijkstra's algorithm, and the A* algorithm. The latter performs real-time path planning from sensor input when environmental information is unknown; mainstream methods include the artificial potential field method, neural network methods, and fuzzy logic methods.
The artificial potential field method is a virtual force field method: it models the motion of an agent in the environment as motion in an artificial force field, in which the target point generates an attractive force and obstacles generate repulsive forces, and the resultant of the attraction and repulsion controls the motion of the robot. Because of its simple mathematical analysis, small computational cost, and smooth paths, the algorithm is widely applied to real-time obstacle avoidance and path planning.
Disclosure of Invention
The invention provides a new path planning method that adds a small-range, strong-acting-force domain field to the original artificial potential field method and further applies a reinforcement learning algorithm within the domain potential field to solve multi-target-point navigation and obstacle avoidance.
The technical scheme adopted by the invention for realizing the purpose is as follows:
the robot path planning method based on the artificial potential field and the reinforcement learning comprises the following steps:
Step one: construct an artificial potential field formed by superposing an attractive potential field and a repulsive potential field; the target point provides attraction to the agent, forming the attractive potential field; the obstacle provides repulsion to the agent, forming the repulsive potential field;
Step two: pre-train the reinforcement learning in the domain-artificial potential field to obtain a reinforcement learning strategy; the agent avoids obstacles and searches for the target point according to this strategy.
The artificial potential field path planning method further includes an intelligent-algorithm optimization for non-convex obstacles: the agent that has learned the preliminary strategy in step two is further trained on the specific local-stable-point cases, learning to handle environments with complex conditions.
The construction process of the potential field in the first step is as follows:
1) according to the positions of the obstacle and the target point, respectively construct the obstacle's repulsive field and the target point's attractive field; the attractive field is:
U_att(q) = (1/2)·k_att·‖q − q_g‖²

where U_att(q) is the attractive field generated by the target point at position q, k_att is the attraction coefficient of the target point (the larger the coefficient, the stronger the attraction), q is the position coordinate, and q_g is the coordinate of the target point, so the potential at q_g is 0;
2) construct the repulsive field of the obstacle:

U_rep(q) = (1/2)·k_rep·(1/‖q − q_0‖ − 1/p_0)² when ‖q − q_0‖ ≤ p_0, and U_rep(q) = 0 when ‖q − q_0‖ > p_0

where U_rep(q) is the repulsive field generated by the obstacle at position q, k_rep is the repulsion coefficient of the obstacle (the larger the coefficient, the stronger the repulsion around the obstacle), ‖q − q_0‖ is the distance between the current position coordinate and the obstacle, and p_0 is the range of the obstacle's repulsive field; beyond this range, the robot is not subject to the repulsive force of the obstacle.
The method further comprises constructing a domain potential field for the local-stable-point case:

U_str(q) = (1/2)·k_str·‖q − q_g‖² when ‖q − q_g‖ ≤ p_s, and U_str(q) = 0 when ‖q − q_g‖ > p_s

where U_str(q) is the domain potential field, k_str is the strong attraction coefficient, which is greater than k_att, ‖q − q_g‖ is the distance between the current position coordinate and the target point, and p_s is the domain range within which the strong attraction of the target point can be sensed (see the sketch below).
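The following Python sketch illustrates how the three fields can be superposed at a given position. It is a minimal sketch assuming the quadratic attractive and domain potentials and the inverse-distance repulsive potential written above; the coefficient values and the numpy-based helper names are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def attractive_potential(q, q_goal, k_att=1.0):
    """U_att(q): quadratic attraction toward the target point q_g (zero at the goal)."""
    return 0.5 * k_att * np.linalg.norm(q - q_goal) ** 2

def repulsive_potential(q, q_obs, k_rep=1.0, p0=2.0):
    """U_rep(q): repulsion from one obstacle, active only within range p0."""
    d = np.linalg.norm(q - q_obs)
    if d > p0 or d == 0.0:
        return 0.0
    return 0.5 * k_rep * (1.0 / d - 1.0 / p0) ** 2

def domain_potential(q, q_goal, k_str=5.0, ps=1.0):
    """U_str(q): small-range, strong attraction domain field (k_str > k_att)."""
    d = np.linalg.norm(q - q_goal)
    if d > ps:
        return 0.0
    return 0.5 * k_str * d ** 2

def total_potential(q, q_goal, obstacles):
    """Superposition of the attractive, repulsive, and domain fields at position q."""
    u = attractive_potential(q, q_goal) + domain_potential(q, q_goal)
    u += sum(repulsive_potential(q, q_obs) for q_obs in obstacles)
    return u
```

The agent would then descend the negative gradient of `total_potential` (approximated numerically, for example) to move toward the target point while keeping away from obstacles.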
In the second step, the pre-training of reinforcement learning in the domain-artificial potential field is carried out to obtain a strategy for reinforcement learning, and the steps are as follows:
1) establish a Q function to calculate the reward value; the agent obtains a reward when it avoids obstacles and reaches the target point. Under the current action and state, the Q function predicts the total reward obtained by following the current policy until the end of the iteration; the agent obtains the reward value as:
Q^π(s, a) = E[r | s_t = s, a_t = a, π]
where Q^π is the Q function of policy π, s is the current state of the agent (i.e., the current potential field), a is the action taken by the agent, E is the mathematical expectation, r is the obtained reward value, s_t is the state of the agent at time t, a_t is the action taken by the agent at time t, and π is the policy currently adopted by the agent;
2) approximate the Q function with a deep neural network: using the deep Q-learning method, the neural network expresses the target Q value, and the value function is learned in combination with a temporal-difference method:
Y_i = r + γ·max_a′ Q(s′, a′ | θ_i)

where Y_i is the temporal-difference target, γ is the decay (discount) rate, max_a′ Q(s′, a′ | θ_i) takes the action a′ that maximizes Q, s′ is the state of the agent at the next time step, a′ is the action taken by the agent at the next time step, and θ_i are the policy parameters adopted by the agent at the i-th iteration;
training was performed using the following loss function:
L(θ_i) = E_{s,a,r,s′}[(Y_i − Q(s, a | θ_i))²]
where L(θ_i) is the loss function and E_{s,a,r,s′} is the expectation over transitions in which the current state is s, the action taken is a, the reward obtained is r, and the next state is s′;
the parameters θ_i of the deep neural network are updated by gradient descent on the loss function, completing the pre-training;
3) obtain the reward value from the real-time action and state of the agent; the action corresponding to the maximum reward value gives the reinforcement learning strategy.
The inputs of the deep neural network are the action a and the state s, and the output is the predicted reward (Q) value.
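As an illustration of steps 1) to 3), the sketch below builds a small Q network and computes the temporal-difference target Y_i and the squared-error loss L(θ_i) defined above. It is a minimal sketch assuming a PyTorch implementation, a nine-action discretization, and a state-in/Q-per-action network layout (rather than feeding a and s jointly as described above); the network sizes and the replay-batch handling are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_ACTIONS = 9  # front, back, left, right, four diagonals, stationary (assumed discretization)

class QNetwork(nn.Module):
    """Approximates Q(s, a | theta_i); input is the state, output is one Q value per action."""
    def __init__(self, state_dim, n_actions=N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def td_loss(q_net, batch, gamma=0.99):
    """L(theta_i) = E[(Y_i - Q(s, a | theta_i))^2] with Y_i = r + gamma * max_a' Q(s', a')."""
    s, a, r, s_next, done = batch          # tensors: states, actions, rewards, next states, terminal flags
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)              # Q(s, a | theta_i)
    with torch.no_grad():
        y = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values  # TD target Y_i
    return ((y - q_sa) ** 2).mean()

# one gradient-descent update of theta_i (pre-training step), e.g.:
# q_net = QNetwork(state_dim=4); optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
# loss = td_loss(q_net, batch); optimizer.zero_grad(); loss.backward(); optimizer.step()
```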
Training also uses an ε-greedy method: each time the agent selects a new behavior, a random action is chosen with probability equal to the greedy selection coefficient ε, and with probability 1 − ε the best behavior at the current moment, i.e., the action corresponding to the maximum reward value, is chosen.
For the local-stable-point case, the potential field is the superposition of the attractive field, the repulsive field, and the domain potential field.
The method is used for path planning of the industrial intelligent warehousing robot.
The invention has the following beneficial effects and advantages:
1. The invention makes use of reinforcement learning's balance of exploration and exploitation: where a traditional algorithm falls into a local optimum, reinforcement learning can learn how to escape the local stable point.
2. The method adds a domain field for the multi-target-point case to prevent local stable points from appearing, which also helps the reinforcement learning converge.
3. The attractive potential field emitted by the corresponding target point is controlled by the domain information, which avoids the resource waste caused by several warehousing robots advancing toward the same target at the same time and improves the working efficiency of the multi-robot warehouse.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention;
fig. 2 is a block diagram of a robot training flow of the method of the present invention.
Detailed Description
The invention provides a new path planning method that adds a small-range, strong-acting-force domain field to the original artificial potential field method and further applies a reinforcement learning algorithm within the domain potential field to solve multi-target-point navigation and obstacle avoidance. The warehousing robot can automatically adapt to the environment under a partially observable Markov decision process, and uses basic environmental information, through Q-learning and temporal-difference learning, to solve the local-stable-point problem caused by superposing the potential fields of multiple target points. In addition, a distributed curriculum learning method is used to strengthen training on complex problems and help the warehousing robot handle complex environments. Finally, in the multi-robot case, the domain signal is used as a communication signal to help control the range of action of the corresponding target point's potential field, enhancing the working efficiency of the multiple robots.
Embodiment:
the intelligent storage robot adopts intelligent operating system, through the system instruction, removes required goods shelves to operating personnel in the front for selecting, realizes the novel mode of "people is looked for to goods, people is looked for to the goods shelves": through advanced automatic weighing photographing, multi-layer conveying, cross sorting and other systems, productivity doubling can be achieved. The intelligent warehousing robot has the characteristics of stability, flexibility, high efficiency and intelligence. The intelligent storage system is connected by a wireless network, is provided with radar scanning, automatic searching and positioning, automatic charging, can work continuously for 24 hours, and is intelligently adaptive to various storage modes by utilizing big data analysis.
The use of intelligent warehousing robots helps reduce the cost of logistics sorting and handling, reduces staffing requirements, improves logistics management, lowers the probability of goods being damaged in handling, can improve the sorting efficiency of modern logistics, and promotes the development of the logistics industry. It is therefore valuable to enhance the ability of intelligent warehousing robots to cooperate with each other and to improve their robustness in different situations.
The following detailed description of the steps for carrying out the present invention is provided in conjunction with specific procedures:
As shown in fig. 1, the multi-agent path planning method based on the artificial potential field and reinforcement learning mainly adopts path planning based on reinforcement learning and the artificial potential field together with a curriculum learning path optimization method, and includes the following steps:
the method comprises the following steps: constructing an improved artificial potential field, constructing a virtual potential field in an environment, wherein the potential field is formed by superposing two potential fields, and a target point provides the attraction force for the warehousing robot to form an attraction potential field; the obstacle provides repulsive force to form a repulsive force field. Under the drive of the potential field resultant force, the warehousing robot reaches a target point along a collision-free path.
Step two: the warehousing robot uses reinforcement learning to learn the improved artificial potential field, which is free of local stable points, so as to avoid conventional obstacles and search for target points.
Step three: algorithm optimization of the warehousing robot for non-convex obstacles; the warehousing robot that learned the preliminary strategy in step two is further trained on the specific local-stable-point cases, learning to handle environments with complex conditions.
The potential field construction process in the first step is as follows:
1) according to the positions of the obstacle and the target point, respectively construct the obstacle's repulsive field and the target point's attractive field; the attractive field is:
U_att(q) = (1/2)·k_att·‖q − q_g‖²

where U_att(q) is the attractive field generated by the target point at position q, k_att is the attraction coefficient of the target point (the larger the coefficient, the stronger the attraction), q is the position coordinate, and q_g is the coordinate of the target point, so the potential at q_g is 0;
2) construct the repulsive field of the obstacle:

U_rep(q) = (1/2)·k_rep·(1/‖q − q_0‖ − 1/p_0)² when ‖q − q_0‖ ≤ p_0, and U_rep(q) = 0 when ‖q − q_0‖ > p_0

where U_rep(q) is the repulsive field generated by the obstacle at position q, k_rep is the repulsion coefficient of the obstacle (the larger the coefficient, the stronger the repulsion around the obstacle), ‖q − q_0‖ is the distance between the current coordinate and the obstacle, and p_0 is the range of the obstacle's repulsive field; beyond this range, the robot is not subject to the repulsive force of the obstacle;
3) construct a domain potential field with small range and strong acting force for the local-stable-point case caused by multiple target points:

U_str(q) = (1/2)·k_str·‖q − q_g‖² when ‖q − q_g‖ ≤ p_s, and U_str(q) = 0 when ‖q − q_g‖ > p_s

where U_str(q) is the domain potential field, k_str is the strong attraction coefficient, which is greater than k_att, ‖q − q_g‖ is the distance between the current coordinate and the target point, and p_s is the domain range within which the strong attraction of the target point can be sensed. A local stable point is a point where the resultant of the obstacle repulsion and the target-point attraction experienced by the agent is 0 (see the sketch below).
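Given this definition, a local stable point can be flagged numerically as a position where the resultant force (the negative gradient of the total potential) vanishes although the target has not been reached. The sketch below is a minimal illustration under that definition; the central-difference gradient, the tolerance values, and the callable `potential` interface (for example a closure over the `total_potential` sketch given earlier) are assumptions.

```python
import numpy as np

def numerical_force(q, potential, eps=1e-4):
    """Resultant force = negative gradient of the total potential, by central differences."""
    f = np.zeros_like(q, dtype=float)
    for i in range(len(q)):
        dq = np.zeros_like(q, dtype=float)
        dq[i] = eps
        f[i] = -(potential(q + dq) - potential(q - dq)) / (2 * eps)
    return f

def is_local_stable_point(q, q_goal, potential, force_tol=1e-3, goal_tol=1e-2):
    """True if the resultant force vanishes while the agent is still away from the target."""
    at_goal = np.linalg.norm(q - q_goal) < goal_tol
    return (not at_goal) and np.linalg.norm(numerical_force(q, potential)) < force_tol
```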
As shown in fig. 2, the pre-training of the reinforcement learning in the domain artificial potential field in step two is as follows:
1) establish a Q function to calculate the accumulated reward value; given the current action and state, the Q function predicts the total return obtained by following the current policy until the end of the iteration, which is the accumulated return the agent receives:
Q^π(s, a) = E[r | s_t = s, a_t = a, π]
where Q^π is the Q function of policy π, s is the current state of the warehousing robot, i.e., the current potential field condition, a is the action taken by the warehousing robot (front, back, left, right, front-left, back-left, front-right, back-right, or stationary), E is the mathematical expectation, r is the obtained reward value (used to judge whether the warehousing robot avoids obstacles and reaches the target point), and π is the policy currently adopted by the agent.
2) The traditional approach solves the Q function with an iterative Bellman equation, but this is difficult to realize when the state space is large, so a deep neural network is used to approximate the Q function: with the deep Q-learning method, the neural network expresses the target Q value, and the value function is learned in combination with a temporal-difference method:
Y_i = r + γ·max_a′ Q(s′, a′ | θ_i)

where Y_i is the temporal-difference target, γ is the decay (discount) rate, max_a′ Q(s′, a′ | θ_i) takes the action a′ that maximizes Q, s′ is the state of the agent at the next time step, a′ is the action taken by the agent at the next time step, and θ_i are the policy parameters adopted by the agent at the i-th iteration;
training was performed using the following loss function:
L(θ_i) = E_{s,a,r,s′}[(Y_i − Q(s, a | θ_i))²]
where L(θ_i) is the loss function and E_{s,a,r,s′} is the expectation over transitions in which the current state is s, the action taken is a, the reward obtained is r, and the next state is s′.
At the same time, an ε-greedy method is used: when the warehousing robot selects a new behavior, a random action is chosen with probability ε and the current best action with probability 1 − ε; the value of ε decreases as training time increases.
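A minimal ε-greedy selector matching this description might look as follows; the multiplicative decay schedule with a lower bound is an illustrative assumption.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the action with the largest Q value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay_epsilon(epsilon, rate=0.995, min_epsilon=0.05):
    """Reduce epsilon as training time increases, as described above."""
    return max(min_epsilon, epsilon * rate)
```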
The warehousing robot algorithm is optimized for non-convex obstacles (obstacles whose projections on the horizontal ground are U-shaped or L-shaped):
1) apply the reinforcement learning algorithm to pre-train the warehousing robot as in step two, so that it learns preliminary obstacle avoidance and target navigation capabilities.
2) since obstacles of different shapes easily cause the robot to fall into local stable points, once the warehousing robot can avoid square obstacles it is further trained using the step-three algorithm, as sketched below.
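The staged training in steps 1) and 2) amounts to a curriculum: pre-train against simple convex (square) obstacles, then continue training in environments containing L-shaped and U-shaped obstacles. The sketch below only outlines that progression; the `agent`, `train`, and environment objects are assumed interfaces left to the caller.

```python
def curriculum_training(agent, train, envs, episodes_per_stage=1000):
    """Train on progressively harder environments, e.g. square -> L-shaped -> U-shaped obstacles.

    `agent`, `train(agent, env, episodes)`, and the entries of `envs` are assumed interfaces;
    ordering `envs` from simple to complex is what implements the curriculum.
    """
    for env in envs:
        train(agent, env, episodes_per_stage)
    return agent
```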
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit of the invention and the scope of the claims.

Claims (7)

1. The robot path planning method based on the artificial potential field and the reinforcement learning is characterized by comprising the following steps of:
Step one: construct an artificial potential field formed by superposing an attractive potential field and a repulsive potential field; the target point provides attraction to the agent, forming the attractive potential field; the obstacle provides repulsion to the agent, forming the repulsive potential field;
the construction process of the potential field in the first step is as follows:
1) according to the positions of the obstacle and the target point, respectively construct the obstacle's repulsive field and the target point's attractive field; the attractive field is:
U_att(q) = (1/2)·k_att·‖q − q_g‖²

where U_att(q) is the attractive field generated by the target point at position q, k_att is the attraction coefficient of the target point (the larger the coefficient, the stronger the attraction), q is the position coordinate, and q_g is the coordinate of the target point, so the potential at q_g is 0;
2) construct the repulsive field of the obstacle:

U_rep(q) = (1/2)·k_rep·(1/‖q − q_0‖ − 1/p_0)² when ‖q − q_0‖ ≤ p_0, and U_rep(q) = 0 when ‖q − q_0‖ > p_0

where U_rep(q) is the repulsive field generated by the obstacle at position q, k_rep is the repulsion coefficient of the obstacle (the larger the coefficient, the stronger the repulsion around the obstacle), ‖q − q_0‖ is the distance between the current position coordinate and the obstacle, and p_0 is the range of the obstacle's repulsive field; beyond this range, the robot is not subject to the repulsive force of the obstacle;
further comprising:
constructing a domain potential field for a locally stable point condition
U_str(q) = (1/2)·k_str·‖q − q_g‖² when ‖q − q_g‖ ≤ p_s, and U_str(q) = 0 when ‖q − q_g‖ > p_s

where U_str(q) is the domain potential field, k_str is the strong attraction coefficient, which is greater than k_att, ‖q − q_g‖ is the distance between the current position coordinate and the target point, and p_s is the domain range within which the strong attraction of the target point can be sensed;
Step two: pre-train the reinforcement learning in the domain-artificial potential field to obtain a reinforcement learning strategy; the agent avoids obstacles and searches for the target point according to this strategy.
2. The robot path planning method based on artificial potential field and reinforcement learning of claim 1, characterized by further comprising an intelligent-algorithm optimization for non-convex obstacles: the agent that has learned the preliminary strategy in step two is further trained on the specific local-stable-point cases, learning to handle environments with complex conditions.
3. The method for robot path planning based on artificial potential field and reinforcement learning of claim 1, wherein in the second step, the reinforcement learning is pre-trained in a domain-artificial potential field to obtain a strategy for reinforcement learning, and the steps are as follows:
1) establish a Q function to calculate the reward value; the agent obtains a reward when it avoids obstacles and reaches the target point; under the current action and state, the Q function predicts the total reward obtained by following the current policy until the end of the iteration; the agent obtains the reward value as:
Q^π(s, a) = E[r | s_t = s, a_t = a, π]
where Q^π is the Q function of policy π, s is the current state of the agent (i.e., the current potential field), a is the action taken by the agent, E is the mathematical expectation, r is the obtained reward value, s_t is the state of the agent at time t, a_t is the action taken by the agent at time t, and π is the policy currently adopted by the agent;
2) approximate the Q function with a deep neural network: using the deep Q-learning method, the neural network expresses the target Q value, and the value function is learned in combination with a temporal-difference method:
Y_i = r + γ·max_a′ Q(s′, a′ | θ_i)

where Y_i is the temporal-difference target, γ is the decay (discount) rate, max_a′ Q(s′, a′ | θ_i) takes the action a′ that maximizes Q, s′ is the state of the agent at the next time step, a′ is the action taken by the agent at the next time step, and θ_i are the policy parameters adopted by the agent at the i-th iteration;
training was performed using the following loss function:
L(θ_i) = E_{s,a,r,s′}[(Y_i − Q(s, a | θ_i))²]
where L(θ_i) is the loss function and E_{s,a,r,s′} is the expectation over transitions in which the current state is s, the action taken is a, the reward obtained is r, and the next state is s′;
the parameters θ_i of the deep neural network are updated by gradient descent on the loss function, completing the pre-training;
3) obtain the reward value from the real-time action and state of the agent; the action corresponding to the maximum reward value gives the reinforcement learning strategy.
4. The method of claim 3, wherein the inputs of the deep neural network are a and s, and the output is the reward (Q) value.
5. The method for robot path planning based on artificial potential field and reinforcement learning of claim 3, wherein training also uses an ε-greedy method: each time the agent selects a new behavior, a random action is chosen with probability equal to the greedy selection coefficient ε, and with probability 1 − ε the best behavior at the current moment, i.e., the action corresponding to the maximum reward value, is chosen.
6. The method for robot path planning based on artificial potential field and reinforcement learning of claim 1, wherein for a local stable point situation, the potential field is a superposition of an attraction field, a repulsion field and a domain potential field.
7. The robot path planning method based on the artificial potential field and the reinforcement learning of any one of claims 1 to 6, which is used for path planning of industrial intelligent storage robots.
CN201911020333.XA 2019-10-25 2019-10-25 Robot path planning method based on artificial potential field and reinforcement learning Active CN112799386B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911020333.XA CN112799386B (en) 2019-10-25 2019-10-25 Robot path planning method based on artificial potential field and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911020333.XA CN112799386B (en) 2019-10-25 2019-10-25 Robot path planning method based on artificial potential field and reinforcement learning

Publications (2)

Publication Number Publication Date
CN112799386A CN112799386A (en) 2021-05-14
CN112799386B true CN112799386B (en) 2021-11-23

Family

ID=75802949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911020333.XA Active CN112799386B (en) 2019-10-25 2019-10-25 Robot path planning method based on artificial potential field and reinforcement learning

Country Status (1)

Country Link
CN (1) CN112799386B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113341958B (en) * 2021-05-21 2022-02-25 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience
CN113128657B (en) * 2021-06-17 2021-09-14 中国科学院自动化研究所 Multi-agent behavior decision method and device, electronic equipment and storage medium
CN113778097B (en) * 2021-09-15 2023-05-19 龙岩学院 Intelligent warehouse logistics robot path planning method based on L-shaped path trend improved A-STAR algorithm
CN113534669B (en) * 2021-09-17 2021-11-30 中国人民解放军国防科技大学 Unmanned vehicle control method and device based on data driving and computer equipment
CN114055471B (en) * 2021-11-30 2022-05-10 哈尔滨工业大学 Mechanical arm online motion planning method combining neural motion planning algorithm and artificial potential field method
CN114442630B (en) * 2022-01-25 2023-12-05 浙江大学 Intelligent vehicle planning control method based on reinforcement learning and model prediction
CN114518770A (en) * 2022-03-01 2022-05-20 西安交通大学 Unmanned aerial vehicle path planning method integrating potential field and deep reinforcement learning
CN115294698B (en) * 2022-08-05 2023-06-06 东风悦享科技有限公司 Automatic tool delivery and recovery system and method based on unmanned tool vehicle
CN117093010B (en) * 2023-10-20 2024-01-19 清华大学 Underwater multi-agent path planning method, device, computer equipment and medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1883887A (en) * 2006-07-07 2006-12-27 中国科学院力学研究所 Robot obstacle-avoiding route planning method based on virtual scene
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning
CN102819264A (en) * 2012-07-30 2012-12-12 山东大学 Path planning Q-learning initial method of mobile robot
WO2016045615A1 (en) * 2014-09-25 2016-03-31 科沃斯机器人有限公司 Robot static path planning method
WO2018176594A1 (en) * 2017-03-31 2018-10-04 深圳市靖洲科技有限公司 Artificial potential field path planning method for unmanned bicycle
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN109670270A (en) * 2019-01-11 2019-04-23 山东师范大学 Crowd evacuation emulation method and system based on the study of multiple agent deeply
CN110083165A (en) * 2019-05-21 2019-08-02 大连大学 A kind of robot paths planning method under complicated narrow environment
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A multi-agent path planning algorithm based on hierarchical reinforcement learning and artificial potential field;Yanbin Zheng;《2015 11th International Conference on Natural Computation (ICNC)》;20160111;第363-368页 *
Multi-Agent path planning method based on hierarchical reinforcement learning and artificial potential field; Zheng Yanbin; Journal of Computer Applications; 2015-12-31; vol. 35, no. 12; pp. 3491-3496 *
Reinforcement learning path planning algorithm incorporating potential field and trap search; Dong Peifang; Computer Engineering and Applications; 2018-08-31; vol. 54, no. 16; pp. 129-134 *

Also Published As

Publication number Publication date
CN112799386A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112799386B (en) Robot path planning method based on artificial potential field and reinforcement learning
Mohanan et al. A survey of robotic motion planning in dynamic environments
CN113485380B (en) AGV path planning method and system based on reinforcement learning
Liu et al. Mapper: Multi-agent path planning with evolutionary reinforcement learning in mixed dynamic environments
CN102819264B (en) Path planning Q-learning initial method of mobile robot
CN113110509B (en) Warehousing system multi-robot path planning method based on deep reinforcement learning
Xia et al. Neural inverse reinforcement learning in autonomous navigation
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN111780777A (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
US11561544B2 (en) Indoor monocular navigation method based on cross-sensor transfer learning and system thereof
Zhao et al. The experience-memory Q-learning algorithm for robot path planning in unknown environment
CN110442129B (en) Control method and system for multi-agent formation
CN112362066A (en) Path planning method based on improved deep reinforcement learning
CN114003059B (en) UAV path planning method based on deep reinforcement learning under kinematic constraint condition
CN112344945B (en) Indoor distribution robot path planning method and system and indoor distribution robot
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
Ma et al. State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots
CN114020013B (en) Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning
Mokhtari et al. Safe deep q-network for autonomous vehicles at unsignalized intersection
CN112799385A (en) Intelligent agent path planning method based on artificial potential field of guide domain
Liang et al. Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN–LSTM fusion network
Chen et al. Deep reinforcement learning-based robot exploration for constructing map of unknown environment
CN116551703B (en) Motion planning method based on machine learning in complex environment
Zhang et al. Visual navigation of mobile robots in complex environments based on distributed deep reinforcement learning
CN113959446A (en) Robot autonomous logistics transportation navigation method based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant