CN112799386B - Robot path planning method based on artificial potential field and reinforcement learning - Google Patents
- Publication number
- CN112799386B (application CN201911020333.XA)
- Authority
- CN
- China
- Prior art keywords
- potential field
- reinforcement learning
- field
- path planning
- obstacle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
Abstract
The invention discloses a robot path planning method based on an artificial potential field and reinforcement learning, and belongs to the field of path planning. First, a map is constructed using the artificial potential field method. Secondly, a small-range, strong-force domain potential field is added for the multi-target case. Finally, path planning for multiple agents under the multi-target condition is realized using reinforcement learning and distributed curriculum learning. By combining the artificial potential field method with reinforcement learning, the method effectively models a multi-target environment, reduces the occurrence of local stable points, uses reinforcement learning to learn to avoid those that remain, and thereby improves the success rate of path planning. The invention offers high reliability for path planning.
Description
Technical Field
The invention belongs to the field of path planning, and particularly relates to an artificial-potential-field-based path planning method that uses temporal-difference learning and reinforcement learning.
Background
With the continuous development of intelligent agents and artificial intelligence theory, autonomous mobile agent technology has matured steadily and is widely applied in fields such as industry, the military, medical treatment and services. At the same time, agent tasks have become more complex, and the setting has shifted from a single agent in a deterministic environment to multiple agents in an uncertain environment. In recent years, research on autonomous intelligent control of agents in complex systems has therefore gained wide attention in academia and industry, and path planning and navigation, as key technologies, have become one of the current research hotspots.
Current path planning techniques fall into two broad categories: global planning based on a known environment and local planning based on sensed information. The former performs path planning in a static, known environment and is also called static path planning; commonly used methods include the greedy algorithm, Dijkstra's algorithm and the A* algorithm. The latter performs real-time path planning from sensor input when the environment is unknown; mainstream methods include the artificial potential field method, neural network methods and fuzzy logic methods.
The artificial potential field method is a virtual force field method: the motion of an agent in the environment is treated as motion in an artificial force field, in which the target point generates an attractive force and obstacles generate repulsive forces, and the resultant of these forces controls the motion of the robot. Thanks to its simple mathematical analysis, small computational cost and smooth paths, the algorithm is widely applied to real-time obstacle avoidance and path planning.
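As a concrete illustration of the resultant-force idea above, the following is a minimal sketch of one artificial-potential-field step. The gain values `k_att` and `k_rep`, the influence radius `p0`, and the gradient forms are standard textbook choices, not values taken from this patent.

```python
import numpy as np

def apf_force(q, q_goal, obstacles, k_att=1.0, k_rep=100.0, p0=2.0):
    """Resultant virtual force on the robot at position q.

    Attraction pulls toward the goal; each obstacle within radius p0
    pushes the robot away. All coefficients are illustrative.
    """
    q, q_goal = np.asarray(q, float), np.asarray(q_goal, float)
    force = k_att * (q_goal - q)                   # attractive component
    for q_obs in obstacles:
        d_vec = q - np.asarray(q_obs, float)
        d = np.linalg.norm(d_vec)
        if 0.0 < d <= p0:                          # repulsion acts only inside p0
            force += k_rep * (1.0 / d - 1.0 / p0) / d ** 2 * (d_vec / d)
    return force

# One planning step: move a small distance along the resultant force.
position = np.array([0.0, 0.0])
position = position + 0.01 * apf_force(position, [5.0, 5.0], obstacles=[[2.0, 2.5]])
```

Iterating this step moves the robot downhill in the combined potential until it reaches the target or, in unfavorable geometries, a local stable point.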
Disclosure of Invention
The invention provides a new path planning method, which adds a small-range, strong-force domain field to the original artificial potential field method and further applies a reinforcement learning algorithm inside the domain potential field to solve the problem of multi-target-point navigation and obstacle avoidance.
The technical scheme adopted by the invention for realizing the purpose is as follows:
the robot path planning method based on the artificial potential field and the reinforcement learning comprises the following steps:
Step one: construct an artificial potential field formed by superposing a gravitational potential field and a repulsive potential field; the target point provides an attractive force to the agent, forming the attraction potential field, and each obstacle provides a repulsive force to the agent, forming the repulsion potential field;
Step two: pre-train the reinforcement learning in the domain artificial potential field to obtain a reinforcement-learning strategy; the agent then avoids obstacles and searches for the target point according to this strategy.
The method further includes an algorithm optimization step for non-convex obstacles: the agent that has learned the preliminary strategy in step two is additionally trained on specific local-stable-point situations, so that it learns to handle complex environments.
The construction process of the potential field in the first step is as follows:
1) construct the gravitational field of the target point according to the positions of the obstacle and the target point:

Uatt(q) = ½ · katt · ‖q − qg‖²

wherein Uatt(q) is the gravitational field generated by the target point at position q; katt is the gravity coefficient of the target point (the larger it is, the stronger the attraction); q is the position coordinate and qg is the coordinate of the target point, so the potential field at qg is 0;
2) construct the repulsive field of the obstacle:

Urep(q) = ½ · krep · (1/‖q − q0‖ − 1/p0)² if ‖q − q0‖ ≤ p0, and Urep(q) = 0 otherwise

wherein Urep(q) is the repulsive field generated by the obstacle at position q; krep is the repulsion coefficient of the obstacle (the larger it is, the stronger the repulsion around the obstacle); ‖q − q0‖ is the distance between the current position coordinate and the obstacle; p0 is the range of the obstacle's repulsive field, beyond which the robot experiences no repulsion.
A domain potential field is further constructed for the local-stable-point case:

Ustr(q) = ½ · kstr · ‖q − qg‖² if ‖q − qg‖ ≤ ps, and Ustr(q) = 0 otherwise

wherein Ustr(q) is the domain potential field; kstr is the strong-attraction coefficient, which is greater than katt; ‖q − qg‖ is the distance between the current position coordinate and the target point; a range field ps is provided, within which the strong attraction of the target point can be sensed.
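The three fields of step one can be sketched as follows. The quadratic attractive form, the inverse-distance repulsive form, and the bounded strong domain field mirror the definitions above, while every coefficient value is an illustrative assumption.

```python
import numpy as np

def u_att(q, q_g, k_att=1.0):
    """Attractive potential: zero at the target q_g, growing with distance."""
    return 0.5 * k_att * np.linalg.norm(np.asarray(q) - np.asarray(q_g)) ** 2

def u_rep(q, q_obs, k_rep=100.0, p0=2.0):
    """Repulsive potential: active only within range p0 of the obstacle."""
    d = np.linalg.norm(np.asarray(q) - np.asarray(q_obs))
    if d > p0:
        return 0.0                      # beyond p0 no repulsion is felt
    return 0.5 * k_rep * (1.0 / d - 1.0 / p0) ** 2

def u_str(q, q_g, k_str=10.0, ps=1.0):
    """Domain potential: strong attraction (k_str > k_att) within range ps."""
    d = np.linalg.norm(np.asarray(q) - np.asarray(q_g))
    return 0.5 * k_str * d ** 2 if d <= ps else 0.0

def u_total(q, q_g, obstacles):
    """Superposition of the attractive, repulsive and domain fields."""
    return u_att(q, q_g) + sum(u_rep(q, o) for o in obstacles) + u_str(q, q_g)
```

The robot's motion then follows the negative gradient of `u_total`, so the target sits at the global minimum of the superposed field.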
In step two, reinforcement learning is pre-trained in the domain artificial potential field to obtain the reinforcement-learning strategy, as follows:
1) establish a Q function to calculate the reward value; the agent obtains a reward when it avoids obstacles and reaches the target point. The Q function predicts the total reward obtained by following the current strategy, from the current action and state, until the end of the iteration:
Qπ(s, a) = E[r | st = s, at = a, π]
wherein Qπ is the Q function of strategy π; s is the current state of the agent, i.e. the current potential field; a is the action taken by the agent; E is the mathematical expectation; r is the obtained reward value; st is the state of the agent at time t; at is the action taken by the agent at time t; and π is the strategy currently adopted by the agent;
2) approximate the Q function with a deep neural network: using the deep Q-learning method, the neural network expresses the target Q value, and the value function is learned in combination with the temporal-difference method:

Yi = r + γ · maxa′ Q(s′, a′ | θi)

wherein Yi is the temporal-difference target value; γ is the decay rate; maxa′ Q takes the action a′ that maximizes Q; s′ is the state of the agent at the next time step; a′ is the action taken by the agent at the next time step; and θi are the strategy coefficients adopted by the agent at the i-th iteration;
training is performed using the following loss function:
L(θi) = Es,a,r,s′[(Yi − Q(s, a | θi))²]
wherein L(θi) is the loss function and Es,a,r,s′ is the expectation over transitions in which the current state is s, the action taken is a, the obtained reward is r, and the next state is s′;
the parameters θi of the deep neural network are updated by gradient descent on the loss function, completing pre-training;
3) the reward value is obtained from the real-time action and state of the agent; the action corresponding to the maximum reward value constitutes the reinforcement-learning strategy.
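A minimal runnable sketch of the pre-training update in steps 1)–3): to keep the temporal-difference target Yi = r + γ·max Q(s′, a′) and the squared loss visible, Q is represented here by a simple table (linear in a one-hot state) rather than a deep network; the sizes, learning rate and decay rate are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 16, 9      # e.g. grid cells and 9 moves (incl. staying put)
gamma, lr = 0.9, 0.1             # decay rate and gradient-step size (assumed)
theta = np.zeros((n_states, n_actions))   # Q(s, a | theta)

def td_update(s, a, r, s_next):
    """One gradient step on L = (Y - Q(s, a))^2 with temporal-difference target Y."""
    y = r + gamma * np.max(theta[s_next])     # Y_i = r + gamma * max_a' Q(s', a')
    td_error = y - theta[s, a]
    theta[s, a] += lr * td_error              # descend the squared-error loss
    return td_error ** 2                      # current value of the loss
```

Repeating `td_update` on a rewarding transition drives Q(s, a) toward the discounted return r / (1 − γ), which is exactly what gradient descent on the stated loss achieves.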
The input of the deep neural network is a and s, and the output is a reward value.
Training with the above loss function is combined with an ε-greedy method: each time the agent selects a new behavior, a random behavior is chosen with probability ε (the greedy selection coefficient), and the best behavior at the current time, i.e. the action corresponding to the maximum reward value, is chosen with probability 1 − ε.
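The ε-greedy selection rule just described can be written directly; the function below is a generic sketch, not code from the patent.

```python
import random

def epsilon_greedy(q_values, eps, rng=random):
    """Pick a random action with probability eps, else the best-valued action."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit
```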
For the local-stable-point case, the potential field is the superposition of the gravitational field, the repulsive field and the domain potential field.
The method is used for path planning of the industrial intelligent warehousing robot.
The invention has the following beneficial effects and advantages:
1. The invention exploits reinforcement learning's balance of exploration and exploitation: where a traditional algorithm would fall into a local optimum, reinforcement learning can learn how to escape the local stable point.
2. The method adds a domain field for the multi-target-point case to prevent local stable points from appearing, which also helps reinforcement learning converge.
3. The gravitational potential field emitted by each target point is gated by domain information, which avoids the resource waste caused by several warehousing robots advancing in the same direction at the same time and improves the working efficiency of the multi-robot warehouse.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention;
fig. 2 is a block diagram of a robot training flow of the method of the present invention.
Detailed Description
The invention provides a new path planning method, which adds a small-range, strong-force domain field to the original artificial potential field method and further applies a reinforcement learning algorithm inside the domain potential field to solve the problem of multi-target-point navigation and obstacle avoidance. The warehousing robot can adapt automatically to the environment under a partially observable Markov decision process, and the local-stable-point problem caused by superposing the potential fields of multiple target points is solved with Q-learning and temporal-difference learning using basic environmental information. In addition, a distributed curriculum learning method strengthens training on complex problems and helps the warehousing robot handle complex environments. Finally, in the multi-robot case, the domain signal is used as a communication signal that helps limit the range of action of each target point's potential field, improving the working efficiency of the robots.
Example:
the intelligent storage robot adopts intelligent operating system, through the system instruction, removes required goods shelves to operating personnel in the front for selecting, realizes the novel mode of "people is looked for to goods, people is looked for to the goods shelves": through advanced automatic weighing photographing, multi-layer conveying, cross sorting and other systems, productivity doubling can be achieved. The intelligent warehousing robot has the characteristics of stability, flexibility, high efficiency and intelligence. The intelligent storage system is connected by a wireless network, is provided with radar scanning, automatic searching and positioning, automatic charging, can work continuously for 24 hours, and is intelligently adaptive to various storage modes by utilizing big data analysis.
Using intelligent warehousing robots helps reduce the cost of logistics sorting and handling, reduces personnel input, improves logistics management, lowers the probability of goods being damaged in transport, improves the sorting efficiency of modern logistics, and promotes the development of the logistics industry. It is therefore valuable to enhance the robots' ability to cooperate with one another and their robustness to different situations.
The following detailed description of the steps for carrying out the present invention is provided in conjunction with specific procedures:
as shown in fig. 1, the multi-agent path planning method based on the artificial potential field and reinforcement learning mainly combines a path planning method based on reinforcement learning and the artificial potential field with a path optimization method based on curriculum learning, and includes the following steps:
Step one: construct an improved artificial potential field, i.e. a virtual potential field in the environment formed by superposing two fields: the target point provides an attractive force to the warehousing robot, forming the attraction potential field, and each obstacle provides a repulsive force, forming the repulsion field. Driven by the resultant potential field force, the warehousing robot reaches the target point along a collision-free path.
Step two: using reinforcement learning, the warehousing robot learns the improved artificial potential field without local stable points, avoiding conventional obstacles and searching for target points.
Step three: algorithm optimization of the warehousing robot for non-convex obstacles; the warehousing robot that has learned the preliminary strategy in step two is further trained on specific local-stable-point situations, learning to handle complex environments.
The potential field construction process in the first step is as follows:
1) construct the gravitational field of the target point according to the positions of the obstacle and the target point:

Uatt(q) = ½ · katt · ‖q − qg‖²

wherein Uatt(q) is the gravitational field generated by the target point at position q; katt is the gravity coefficient of the target point (the larger it is, the stronger the attraction); q is the position coordinate and qg is the coordinate of the target point, so the potential field at qg is 0;
2) construct the repulsive field of the obstacle:

Urep(q) = ½ · krep · (1/‖q − q0‖ − 1/p0)² if ‖q − q0‖ ≤ p0, and Urep(q) = 0 otherwise

wherein Urep(q) is the repulsive field generated by the obstacle at position q; krep is the repulsion coefficient of the obstacle (the larger it is, the stronger the repulsion around the obstacle); ‖q − q0‖ is the distance between the current coordinate and the obstacle; p0 is the range of the obstacle's repulsive field, beyond which the robot experiences no repulsion;
3) construct a domain potential field with small range and strong force for the local-stable-point case of multiple target points:

Ustr(q) = ½ · kstr · ‖q − qg‖² if ‖q − qg‖ ≤ ps, and Ustr(q) = 0 otherwise

wherein Ustr(q) is the domain potential field; kstr is the strong-attraction coefficient, which is greater than katt; ‖q − qg‖ is the distance between the current coordinate and the target point; within the range field ps, the strong attraction of the target point can be sensed. A local stable point is a point where the resultant of the attraction of the target point and the repulsion of the obstacles experienced by the agent is 0.
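The local-stable-point definition above (resultant force zero away from the target) can be checked numerically. This sketch estimates the force as the negative gradient of any 2-D potential function by central differences; the step and tolerance values are illustrative.

```python
import numpy as np

def is_local_stable_point(potential, q, q_g, eps=1e-3, tol=1e-6):
    """True if the potential gradient vanishes at q although q is not the goal."""
    q = np.asarray(q, float)
    steps = (np.array([eps, 0.0]), np.array([0.0, eps]))
    grad = np.array([(potential(q + d) - potential(q - d)) / (2 * eps)
                     for d in steps])                 # central differences
    at_goal = np.linalg.norm(q - np.asarray(q_g, float)) < eps
    return (not at_goal) and float(np.linalg.norm(grad)) < tol
```

A purely attractive field has no such point, while a field with a stationary point away from the goal (e.g. behind a non-convex obstacle) triggers the check.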
As shown in fig. 2, the pre-training of the reinforcement learning in the domain artificial potential field in step two is as follows:
1) establish a Q function to calculate the accumulated reward value; the Q function predicts, for the current action and state, the total return obtained by following the current strategy until the end of the iteration:
Qπ(s, a) = E[r | st = s, at = a, π]
wherein Qπ is the Q function of strategy π; s is the current state of the warehousing robot, i.e. the current potential field condition; a is the action taken by the warehousing robot (forward, backward, left, right, front-left, back-left, front-right, back-right or stationary); E is the mathematical expectation; r is the obtained reward value (used to judge whether the warehousing robot can avoid obstacles and reach the target point); and π is the strategy currently adopted by the agent.
2) The traditional method solves the Q function with the iterative Bellman equation, but this is infeasible for larger state spaces, so a deep neural network is used to approximate the Q function: with the deep Q-learning method, the neural network expresses the target Q value, and the value function is learned in combination with the temporal-difference method:

Yi = r + γ · maxa′ Q(s′, a′ | θi)

wherein Yi is the temporal-difference target value; γ is the decay rate; maxa′ Q takes the action a′ that maximizes Q; s′ is the state of the agent at the next time step; a′ is the action taken by the agent at the next time step; and θi are the strategy coefficients adopted by the agent at the i-th iteration;
training is performed using the following loss function:
L(θi) = Es,a,r,s′[(Yi − Q(s, a | θi))²]
wherein L(θi) is the loss function and Es,a,r,s′ is the expectation over transitions in which the current state is s, the action taken is a, the obtained reward is r, and the next state is s′.
At the same time, an ε-greedy method is used: when the warehousing robot selects a new behavior, a random behavior is chosen with probability ε and the current best behavior with probability 1 − ε, and the value of ε decreases as training time increases.
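The decreasing ε mentioned above is usually implemented as a decay schedule; the exponential form and its constants below are illustrative assumptions.

```python
def epsilon_at(episode, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Exploration rate for a given episode: decays toward a small floor."""
    return max(eps_end, eps_start * decay ** episode)
```

Early episodes therefore explore almost at random, while late episodes mostly exploit the learned Q values.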
The warehousing-robot algorithm is optimized for non-convex obstacles (obstacles whose projections on the horizontal ground are U-shaped or L-shaped):
1) apply the reinforcement learning algorithm to the pre-trained warehousing robot of step two, so that it learns preliminary obstacle avoidance and target navigation.
2) For obstacle shapes that easily cause the robot to fall into a local stable point, the warehousing robot that can already avoid square obstacles is further trained using the algorithm of step three.
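The two-stage training in 1)–2) is a curriculum: easy obstacle shapes first, non-convex ones only after the easy stage is mastered. The sketch below shows only the schedule; the stage names, the pass-rate criterion and both callbacks are hypothetical placeholders.

```python
def curriculum(train_stage, evaluate, stages=("square", "u_shape", "l_shape"),
               pass_rate=0.9):
    """Train stage by stage, advancing only once the current stage is mastered."""
    for stage in stages:
        train_stage(stage)               # e.g. further DQN training in this world
        if evaluate(stage) < pass_rate:  # success rate too low: report the block
            return stage
    return "done"
```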
The above-described embodiments are intended to illustrate rather than limit the invention; modifications and variations that remain within the spirit of the invention and the scope of the claims are covered by it.
Claims (7)
1. The robot path planning method based on the artificial potential field and reinforcement learning is characterized by comprising the following steps:
Step one: construct an artificial potential field formed by superposing a gravitational potential field and a repulsive potential field; the target point provides an attractive force to the agent, forming the attraction potential field, and each obstacle provides a repulsive force to the agent, forming the repulsion potential field;
the construction process of the potential field in the first step is as follows:
1) construct the gravitational field of the target point according to the positions of the obstacle and the target point:

Uatt(q) = ½ · katt · ‖q − qg‖²

wherein Uatt(q) is the gravitational field generated by the target point at position q; katt is the gravity coefficient of the target point (the larger it is, the stronger the attraction); q is the position coordinate and qg is the coordinate of the target point, so the potential field at qg is 0;
2) construct the repulsive field of the obstacle:

Urep(q) = ½ · krep · (1/‖q − q0‖ − 1/p0)² if ‖q − q0‖ ≤ p0, and Urep(q) = 0 otherwise

wherein Urep(q) is the repulsive field generated by the obstacle at position q; krep is the repulsion coefficient of the obstacle (the larger it is, the stronger the repulsion around the obstacle); ‖q − q0‖ is the distance between the current position coordinate and the obstacle; p0 is the range of the obstacle's repulsive field, beyond which the robot experiences no repulsion;
further comprising:
constructing a domain potential field for the local-stable-point case:

Ustr(q) = ½ · kstr · ‖q − qg‖² if ‖q − qg‖ ≤ ps, and Ustr(q) = 0 otherwise

wherein Ustr(q) is the domain potential field; kstr is the strong-attraction coefficient, which is greater than katt; ‖q − qg‖ is the distance between the current position coordinates and the target point; a range field ps is provided, within which the strong attraction of the target point can be sensed;
Step two: pre-train the reinforcement learning in the domain artificial potential field to obtain a reinforcement-learning strategy; the agent then avoids obstacles and searches for the target point according to this strategy.
2. The robot path planning method based on the artificial potential field and reinforcement learning of claim 1, further comprising an algorithm optimization step for non-convex obstacles: the agent that has learned the preliminary strategy in step two is additionally trained on specific local-stable-point situations, so that it learns to handle complex environments.
3. The robot path planning method based on the artificial potential field and reinforcement learning of claim 1, wherein in step two the reinforcement learning is pre-trained in the domain artificial potential field to obtain the reinforcement-learning strategy, as follows:
1) establish a Q function to calculate the reward value; the agent obtains a reward when it avoids obstacles and reaches the target point; the Q function predicts the total reward obtained by following the current strategy, from the current action and state, until the end of the iteration:
Qπ(s, a) = E[r | st = s, at = a, π]
wherein Qπ is the Q function of strategy π; s is the current state of the agent, i.e. the current potential field; a is the action taken by the agent; E is the mathematical expectation; r is the obtained reward value; st is the state of the agent at time t; at is the action taken by the agent at time t; and π is the strategy currently adopted by the agent;
2) approximate the Q function with a deep neural network: using the deep Q-learning method, the neural network expresses the target Q value, and the value function is learned in combination with the temporal-difference method:

Yi = r + γ · maxa′ Q(s′, a′ | θi)

wherein Yi is the temporal-difference target value; γ is the decay rate; maxa′ Q takes the action a′ that maximizes Q; s′ is the state of the agent at the next time step; a′ is the action taken by the agent at the next time step; and θi are the strategy coefficients adopted by the agent at the i-th iteration;
training is performed using the following loss function:
L(θi) = Es,a,r,s′[(Yi − Q(s, a | θi))²]
wherein L(θi) is the loss function and Es,a,r,s′ is the expectation over transitions in which the current state is s, the action taken is a, the obtained reward is r, and the next state is s′;
the parameters θi of the deep neural network are updated by gradient descent on the loss function, completing pre-training;
3) the reward value is obtained from the real-time action and state of the agent; the action corresponding to the maximum reward value constitutes the reinforcement-learning strategy.
4. The method of claim 3, wherein the inputs of the deep neural network are a and s and its output is the reward value.
5. The robot path planning method based on the artificial potential field and reinforcement learning of claim 3, wherein training uses an ε-greedy method: each time the agent selects a new behavior, a random behavior is chosen with probability ε (the greedy selection coefficient), and the best behavior at the current time, i.e. the action corresponding to the maximum reward value, is chosen with probability 1 − ε.
6. The method for robot path planning based on artificial potential field and reinforcement learning of claim 1, wherein for a local stable point situation, the potential field is a superposition of an attraction field, a repulsion field and a domain potential field.
7. The robot path planning method based on the artificial potential field and the reinforcement learning of any one of claims 1 to 6, which is used for path planning of industrial intelligent storage robots.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911020333.XA CN112799386B (en) | 2019-10-25 | 2019-10-25 | Robot path planning method based on artificial potential field and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112799386A CN112799386A (en) | 2021-05-14 |
CN112799386B true CN112799386B (en) | 2021-11-23 |
Family
ID=75802949
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911020333.XA Active CN112799386B (en) | 2019-10-25 | 2019-10-25 | Robot path planning method based on artificial potential field and reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112799386B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113341958B (en) * | 2021-05-21 | 2022-02-25 | 西北工业大学 | Multi-agent reinforcement learning movement planning method with mixed experience |
CN113128657B (en) * | 2021-06-17 | 2021-09-14 | 中国科学院自动化研究所 | Multi-agent behavior decision method and device, electronic equipment and storage medium |
CN113778097B (en) * | 2021-09-15 | 2023-05-19 | 龙岩学院 | Intelligent warehouse logistics robot path planning method based on L-shaped path trend improved A-STAR algorithm |
CN113534669B (en) * | 2021-09-17 | 2021-11-30 | 中国人民解放军国防科技大学 | Unmanned vehicle control method and device based on data driving and computer equipment |
CN114055471B (en) * | 2021-11-30 | 2022-05-10 | 哈尔滨工业大学 | Mechanical arm online motion planning method combining neural motion planning algorithm and artificial potential field method |
CN114442630B (en) * | 2022-01-25 | 2023-12-05 | 浙江大学 | Intelligent vehicle planning control method based on reinforcement learning and model prediction |
CN114518770A (en) * | 2022-03-01 | 2022-05-20 | 西安交通大学 | Unmanned aerial vehicle path planning method integrating potential field and deep reinforcement learning |
CN115294698B (en) * | 2022-08-05 | 2023-06-06 | 东风悦享科技有限公司 | Automatic tool delivery and recovery system and method based on unmanned tool vehicle |
CN117093010B (en) * | 2023-10-20 | 2024-01-19 | 清华大学 | Underwater multi-agent path planning method, device, computer equipment and medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1883887A (en) * | 2006-07-07 | 2006-12-27 | 中国科学院力学研究所 | Robot obstacle-avoiding route planning method based on virtual scene |
CN102799179A (en) * | 2012-07-06 | 2012-11-28 | 山东大学 | Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning |
CN102819264A (en) * | 2012-07-30 | 2012-12-12 | 山东大学 | Path planning Q-learning initial method of mobile robot |
WO2016045615A1 (en) * | 2014-09-25 | 2016-03-31 | 科沃斯机器人有限公司 | Robot static path planning method |
WO2018176594A1 (en) * | 2017-03-31 | 2018-10-04 | 深圳市靖洲科技有限公司 | Artificial potential field path planning method for unmanned bicycle |
CN108762281A (en) * | 2018-06-08 | 2018-11-06 | 哈尔滨工程大学 | It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory |
CN109670270A (en) * | 2019-01-11 | 2019-04-23 | 山东师范大学 | Crowd evacuation emulation method and system based on the study of multiple agent deeply |
CN109726866A (en) * | 2018-12-27 | 2019-05-07 | 浙江农林大学 | Unmanned boat paths planning method based on Q learning neural network |
CN110083165A (en) * | 2019-05-21 | 2019-08-02 | 大连大学 | A kind of robot paths planning method under complicated narrow environment |
CN110345948A (en) * | 2019-08-16 | 2019-10-18 | 重庆邮智机器人研究院有限公司 | Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm |
- 2019-10-25: application CN201911020333.XA filed; granted as CN112799386B (active)
Non-Patent Citations (3)
Title |
---|
A multi-agent path planning algorithm based on hierarchical reinforcement learning and artificial potential field; Yanbin Zheng; 2015 11th International Conference on Natural Computation (ICNC); 2016-01-11; pp. 363-368 * |
Multi-agent path planning method based on hierarchical reinforcement learning and artificial potential field; Zheng Yanbin; Journal of Computer Applications; 2015-12-31; Vol. 35, No. 12; pp. 3491-3496 * |
Reinforcement learning path planning algorithm incorporating potential field and trap search; Dong Peifang; Computer Engineering and Applications; 2018-08-31; Vol. 54, No. 16; pp. 129-134 * |
Also Published As
Publication number | Publication date |
---|---|
CN112799386A (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112799386B (en) | Robot path planning method based on artificial potential field and reinforcement learning | |
Mohanan et al. | A survey of robotic motion planning in dynamic environments | |
CN113485380B (en) | AGV path planning method and system based on reinforcement learning | |
Liu et al. | Mapper: Multi-agent path planning with evolutionary reinforcement learning in mixed dynamic environments | |
CN102819264B (en) | Q-learning initialization method for mobile robot path planning | |
CN113110509B (en) | Warehousing system multi-robot path planning method based on deep reinforcement learning | |
Xia et al. | Neural inverse reinforcement learning in autonomous navigation | |
CN112465151A (en) | Multi-agent federated cooperation method based on deep reinforcement learning | |
CN111780777A (en) | Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning | |
US11561544B2 (en) | Indoor monocular navigation method based on cross-sensor transfer learning and system thereof | |
Zhao et al. | The experience-memory Q-learning algorithm for robot path planning in unknown environment | |
CN110442129B (en) | Control method and system for multi-agent formation | |
CN112362066A (en) | Path planning method based on improved deep reinforcement learning | |
CN114003059B (en) | UAV path planning method based on deep reinforcement learning under kinematic constraint condition | |
CN112344945B (en) | Indoor distribution robot path planning method and system and indoor distribution robot | |
CN113391633A (en) | Urban environment-oriented mobile robot fusion path planning method | |
Ma et al. | State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots | |
CN114020013B (en) | Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning | |
Mokhtari et al. | Safe deep q-network for autonomous vehicles at unsignalized intersection | |
CN112799385A (en) | Intelligent agent path planning method based on artificial potential field of guide domain | |
Liang et al. | Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN–LSTM fusion network | |
Chen et al. | Deep reinforcement learning-based robot exploration for constructing map of unknown environment | |
CN116551703B (en) | Motion planning method based on machine learning in complex environment | |
Zhang et al. | Visual navigation of mobile robots in complex environments based on distributed deep reinforcement learning | |
CN113959446A (en) | Robot autonomous logistics transportation navigation method based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||