CN113177664B - Self-learning path planning method taking safety and distance cost as constraint - Google Patents
Self-learning path planning method taking safety and distance cost as constraint
- Publication number
- CN113177664B (application CN202110550501.7A)
- Authority
- CN
- China
- Prior art keywords
- distance
- path planning
- self
- agent
- planning method
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/047—Optimisation of routes or paths, e.g. travelling salesman problem
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a self-learning path planning method taking safety and distance cost as constraints. Drawing on the design idea of the A* heuristic function, it designs a heuristic function that jointly considers safety cost and distance cost and introduces this function into the design of the reward function in the reinforcement-learning DQN algorithm, so that the new reward function can guide an intelligent agent to find a safe and shortest path.
Description
Technical Field
The invention relates to a self-learning path planning method taking safety and distance cost as constraints, and belongs to the field of intelligent cabin displays.
Background
Reinforcement learning is a closed-loop, experience-driven learning method: a robot continuously exchanges information with its environment and thereby achieves an autonomous learning process. The interaction between the robot and the environment can be described as a Markov decision process.
The Q_learning algorithm in reinforcement learning is widely applied to robot path planning: the robot interacts with the environment through Q_learning to achieve autonomous path planning. Because Q_learning computes values in a Q table and then selects the action with the larger Q value as the action to execute, it easily suffers from slow computation and dimensional explosion. The Deep Q_learning algorithm, i.e. the DQN algorithm, was therefore proposed; on top of Q_learning it adds a deep neural network to compute Q values, which resolves the dimensional-explosion problem of Q_learning.
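As a minimal sketch of the tabular Q_learning update described above (the table size, learning rate, and reward values here are illustrative assumptions, not from the patent):

```python
import numpy as np

# Tabular Q_learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9

def q_update(Q, s, a, r, s_next):
    td_target = r + gamma * Q[s_next].max()   # bootstrapped return estimate
    Q[s, a] += alpha * (td_target - Q[s, a])  # move the table entry toward it
    return Q

Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.5 after one update from an all-zero table
```

Because the table grows with the number of states and actions, this formulation hits exactly the dimensional-explosion problem the paragraph mentions, which motivates replacing the table with a network.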
The basic idea of the DQN algorithm is to combine the reinforcement-learning Q_learning algorithm with a deep neural network: the network computes return values in place of a Q table, continual learning reduces the error between the Q estimate and the Q target, the target-Q network is updated accordingly, and the weights are optimized until autonomous path planning is achieved. However, the DQN algorithm must explore the learning space continuously, and this exploration is largely blind and wasteful, so the algorithm suffers from low environment utilization and low search efficiency, which in turn leads to slow learning, long search times, and long search paths.
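The target-network mechanism the paragraph describes can be sketched with a toy linear Q-approximator (the network form, sizes, and weights are illustrative assumptions; a real DQN uses a deep network plus experience replay):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 3, 4
gamma = 0.9

# Online and target networks as simple linear Q-approximators
W_online = rng.normal(size=(n_features, n_actions))
W_target = W_online.copy()  # target network starts as a frozen copy

def q_values(W, state):
    return state @ W  # linear stand-in for Q(s, ·)

def td_target(reward, next_state, done):
    # "Q reality": r + gamma * max_a Q_target(s', a); no bootstrap at terminals
    return reward + (0.0 if done else gamma * q_values(W_target, next_state).max())

def sync_target():
    # Periodically copy the online weights into the target network
    global W_target
    W_target = W_online.copy()

s_next = np.array([1.0, 0.0, -1.0])
y = td_target(reward=1.0, next_state=s_next, done=False)
print(y)  # r + gamma * max Q_target(s')
```

Training would regress the online network's Q(s, a) toward `y`; freezing the target between syncs is what stabilizes the error between "Q estimate" and "Q reality".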
The A* (A-Star) algorithm is the most effective direct search method for finding the shortest path in a static road network, and is also an effective algorithm for many other search problems. The closer the distance estimate in the algorithm is to the true value, the faster the final search. This path-finding algorithm is a typical heuristic search: each node encountered is bound to an estimated value (the heuristic), nodes are traversed in order of estimated value, and nodes with better estimates are expanded first. The definition of the estimation function is therefore important, since it significantly affects the algorithm's efficiency.
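A small runnable sketch of the estimated-value-priority traversal described above, using the standard f = g + h ordering with an admissible Manhattan-distance heuristic (the grid and unit costs are illustrative):

```python
import heapq

def a_star(grid, start, goal):
    """A* on a 4-connected grid; 0 = free cell, 1 = obstacle."""
    def h(p):  # estimated remaining cost to goal (Manhattan distance)
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start)]  # entries: (f = g + h, g, node)
    g_cost = {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)  # best estimated value first
        if node == goal:
            return g  # cost of the shortest path
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0):
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))  # 6: must detour around the wall
```

Here g plays the role of D (cost from the start) and h the role of E (estimated cost to the goal) in the patent's later notation.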
In the prior art, distance cost is an important index for evaluating a path and plays an important role in path planning, but existing algorithms designed around distance cost as the core idea are mostly used in global path planning and cannot perform well in dynamic environments. Safety, meanwhile, is the primary criterion of path planning and its importance is self-evident, yet considering safety alone easily leads to local optima.
Disclosure of Invention
Aiming at the above problems, the invention provides a self-learning path planning method taking safety and distance cost as constraints. Drawing on the design idea of the A* heuristic function, it designs a heuristic function that jointly considers safety cost and distance cost and introduces this function into the design of the reward function in the reinforcement-learning DQN algorithm, so that the new reward function can guide an intelligent agent to find a safe and shortest path.
The invention adopts the following technical scheme for solving the technical problems:
a self-learning path planning method constrained by safety and distance cost, the method comprising the steps of:
acquiring position data of an intelligent agent at the current moment and a preset track of the intelligent agent;
acquiring the current expected running direction of the intelligent agent by using a trained DQN model according to the intelligent agent position data and the preset track;
and controlling the running direction of the intelligent agent according to the current expected running direction.
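The three acquisition/control steps above can be sketched as a closed loop; the policy below is a greedy stand-in for the trained DQN model, which this text does not reproduce, and the grid actions are illustrative assumptions:

```python
# Toy closed loop: observe position, query a policy for a heading, apply it.
def policy(position, waypoint):
    # Greedy stand-in for the trained DQN: head toward the next waypoint
    # of the preset trajectory.
    dx, dy = waypoint[0] - position[0], waypoint[1] - position[1]
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"

def step(position, action):
    # Apply the expected running direction to the agent's position
    moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    dx, dy = moves[action]
    return (position[0] + dx, position[1] + dy)

pos, preset = (0, 0), [(2, 0), (2, 2)]  # preset trajectory waypoints
for waypoint in preset:
    while pos != waypoint:
        pos = step(pos, policy(pos, waypoint))
print(pos)  # (2, 2): the agent has tracked the preset trajectory
```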
Further, training the DQN model according to historical data of the agent and a preset track.
Further, the reward function of the DQN model is:
where k is the distance boundary threshold, obs is the distance between the agent and the nearest obstacle, E is the distance between the agent and the target point, D is the distance between the starting point and the agent, and H is the straight-line distance between the starting point and the target point.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
the invention designs a heuristic function which comprehensively considers the safety cost and the distance cost, and introduces the heuristic function into the design of a reward function in the reinforcement learning DQN algorithm, and can guide an intelligent agent to find a safe and shortest path through a new reward function.
Detailed Description
In the prior art, distance cost is an important index for evaluating a path and plays an important role in path planning, but existing algorithms designed around distance cost as the core idea are mostly used in global path planning and cannot perform well in dynamic environments. Safety, meanwhile, is the primary criterion of path planning and its importance is self-evident, yet considering safety alone easily leads to local optima.
This patent designs a heuristic function that jointly considers safety cost and distance cost and introduces it into the design of the reward function in the reinforcement-learning DQN algorithm; the new reward function can guide an intelligent agent to find a safe and shortest path. The specific steps are as follows:
step 1: heuristic function design of distance cost
Drawing on the design idea of the A* heuristic function, a heuristic function based mainly on distance cost is designed to guide the intelligent agent to learn the shortest path. It is designed as follows:
wherein D is the distance between the starting point and the agent, H is the straight line distance between the starting point and the target point, and E is the distance between the agent and the target point.
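Since the formula itself is not reproduced in this text, the sketch below assumes one plausible form, F1 = H / (D + E), which equals 1 when the agent lies on the straight start-to-goal line and shrinks as the path detours; this normalization is an assumption, not the patent's stated equation:

```python
import math

def f_distance(agent, start, goal):
    """Illustrative distance-cost heuristic (assumed form F1 = H / (D + E);
    the patent's exact formula is not reproduced here)."""
    D = math.dist(start, agent)   # distance: starting point -> agent
    E = math.dist(agent, goal)    # distance: agent -> target point
    H = math.dist(start, goal)    # straight-line distance: start -> target
    return H / (D + E) if (D + E) > 0 else 1.0

print(f_distance((1, 0), (0, 0), (2, 0)))  # 1.0: agent on the straight line
print(f_distance((1, 1), (0, 0), (2, 0)))  # < 1.0: detour is penalized
```

Any form that rewards small D + E relative to H would serve the same guiding purpose of steering the agent toward the shortest path.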
Step 2: heuristic function design of security cost
In order to ensure the safety of the intelligent agent during learning, a potential-field method is designed to represent the distance between the agent and an obstacle and the distance between the agent and the target point. It is designed as follows:
A threshold k is designed to represent a distance boundary: when the distance between the agent and an obstacle or the target point is smaller than k, the agent enters the potential field; otherwise it does not. After entering the field, the agent is subject to a repulsive field caused by the obstacle or an attractive field caused by the target point:
where obs is the distance between the agent and the nearest obstacle; O(s) is negative, directing the agent away from the obstacle; and T(s) is positive, directing the agent toward the target point.
Step 3: a new heuristic function is designed.
Steps 1 and 2 are combined to integrate safety and distance cost, so that the intelligent agent learns the shortest path on the premise of ensuring safety:
step 4: combining the heuristic function designed in the step 3 with the reinforcement learning DQN algorithm.
The new heuristic function designed in step 3 is introduced into the reward function of the DQN algorithm to guide the intelligent agent to learn a safe and shortest path:
where the return value when the robot reaches the target point is T(s); the return value when the robot reaches an obstacle is O(s); and in the remaining cases the return value is F1.
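Putting steps 1-3 together, the piecewise reward described above (T(s) at the target, O(s) at an obstacle, F1 otherwise) might be sketched as follows; all magnitudes and thresholds here are assumptions, since the patent's formula images are not reproduced in this text:

```python
def reward(obs_dist, goal_dist, f1, k=2.0, eps=1e-3):
    """Piecewise reward: T(s) at the goal, O(s) at an obstacle, else F1."""
    if goal_dist < eps:                # reached the target point
        return (k - goal_dist) / k     # T(s) > 0: attraction term
    if obs_dist < eps:                 # collided with an obstacle
        return -(k - obs_dist) / k     # O(s) < 0: repulsion term
    return f1                          # distance-cost heuristic elsewhere

print(reward(obs_dist=5.0, goal_dist=0.0, f1=0.8))  # 1.0: goal reward
print(reward(obs_dist=0.0, goal_dist=5.0, f1=0.8))  # -1.0: collision penalty
print(reward(obs_dist=5.0, goal_dist=5.0, f1=0.8))  # 0.8: ordinary step
```

Used in place of the usual sparse DQN reward, a shaped signal of this kind is what lets the new reward function guide the agent toward a path that is both safe and short.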
The application also provides a self-learning path planning system constrained by safety and distance cost, comprising: a memory and a processor. The memory stores a computer program which, when executed by the processor, implements the self-learning path planning method constrained by safety and distance cost described above.
The application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the self-learning path planning method constrained by safety and distance cost described above. The computer-readable storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It should be noted that the above embodiments are only for aiding in understanding the method of the present application and its core idea, and that it will be obvious to those skilled in the art that several improvements and modifications can be made to the present application without departing from the principle of the present application, and these improvements and modifications are also within the scope of the claims of the present application.
Claims (4)
1. A self-learning path planning method taking safety and distance cost as constraints, characterized by comprising the following steps:
acquiring position data of an intelligent agent at the current moment and a preset track of the intelligent agent;
acquiring the current expected running direction of the intelligent agent by using a trained DQN model according to the intelligent agent position data and the preset track;
controlling the running direction of the intelligent agent according to the current expected running direction;
the reward function of the DQN model is:
where k is the distance boundary threshold, obs is the distance between the agent and the nearest obstacle, E is the distance between the agent and the target point, D is the distance between the starting point and the agent, and H is the straight-line distance between the starting point and the target point.
2. The self-learning path planning method of claim 1 wherein the DQN model is trained based on historical data of the agent and a predetermined trajectory.
3. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the self-learning path planning method constrained by safety and distance cost according to any one of claims 1 to 2.
4. A self-learning path planning system constrained by safety and distance cost, comprising: a memory and a processor; the memory stores a computer program which, when executed by the processor, implements the self-learning path planning method constrained by safety and distance cost according to any one of claims 1 to 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110550501.7A CN113177664B (en) | 2021-05-20 | 2021-05-20 | Self-learning path planning method taking safety and distance cost as constraint |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110550501.7A CN113177664B (en) | 2021-05-20 | 2021-05-20 | Self-learning path planning method taking safety and distance cost as constraint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113177664A CN113177664A (en) | 2021-07-27 |
CN113177664B true CN113177664B (en) | 2024-03-19 |
Family
ID=76929400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110550501.7A Active CN113177664B (en) | 2021-05-20 | 2021-05-20 | Self-learning path planning method taking safety and distance cost as constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113177664B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017173990A1 (en) * | 2016-04-07 | 2017-10-12 | 北京进化者机器人科技有限公司 | Method for planning shortest path in robot obstacle avoidance |
CN108776483A (en) * | 2018-08-16 | 2018-11-09 | 圆通速递有限公司 | AGV paths planning methods and system based on ant group algorithm and multiple agent Q study |
CN110632931A (en) * | 2019-10-09 | 2019-12-31 | 哈尔滨工程大学 | Mobile robot collision avoidance planning method based on deep reinforcement learning in dynamic environment |
CN110646009A (en) * | 2019-09-27 | 2020-01-03 | 北京邮电大学 | DQN-based vehicle automatic driving path planning method and device |
CN110883776A (en) * | 2019-11-29 | 2020-03-17 | 河南大学 | Robot path planning algorithm for improving DQN under quick search mechanism |
CN111780777A (en) * | 2020-07-13 | 2020-10-16 | 江苏中科智能制造研究院有限公司 | Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning |
CN112665603A (en) * | 2020-12-16 | 2021-04-16 | Multi-vehicle path planning method based on improved A* with time window |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10695911B2 (en) * | 2018-01-12 | 2020-06-30 | Futurewei Technologies, Inc. | Robot navigation and object tracking |
-
2021
- 2021-05-20 CN CN202110550501.7A patent/CN113177664B/en active Active
Non-Patent Citations (2)
- The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method; Xing Wu, Haolei Chen, Hamido Fujita; Knowledge-Based Systems; full text *
- Research on AUV behavior re-planning method based on reinforcement learning (基于强化学习的AUV行为重规划方法研究); Wang Lifeng; China Masters' Theses Full-text Database; full text *
Also Published As
Publication number | Publication date |
---|---|
CN113177664A (en) | 2021-07-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109540159B (en) | Rapid and complete automatic driving track planning method | |
CN109791409B (en) | Motion control decision for autonomous vehicles | |
CN111332285B (en) | Method and device for vehicle to avoid obstacle, electronic equipment and storage medium | |
US10324469B2 (en) | System and method for controlling motion of vehicle in shared environment | |
JP2022516383A (en) | Autonomous vehicle planning | |
Stefansson et al. | Human-robot interaction for truck platooning using hierarchical dynamic games | |
WO2017197170A1 (en) | Safely controlling an autonomous entity in presence of intelligent agents | |
CN113177664B (en) | Self-learning path planning method taking safety and distance cost as constraint | |
CN110688920A (en) | Unmanned control method and device and server | |
US20220155732A9 (en) | System and Method of Efficient, Continuous, and Safe Learning Using First Principles and Constraints | |
CN105956704A (en) | Destination identification method for plug-in type hybrid vehicle | |
CN115826581A (en) | Mobile robot path planning algorithm combining fuzzy control and reinforcement learning | |
RU2019143947A (en) | METHOD AND SYSTEM FOR CALCULATING DATA FOR CONTROLLING THE OPERATION OF AN UNMOUNTED VEHICLE | |
CN113341941A (en) | Control method and device of unmanned equipment | |
CN112415995A (en) | Planning control method based on real-time safety boundary | |
CN112255628A (en) | Obstacle trajectory prediction method, apparatus, device, and medium | |
CN116182884A (en) | Intelligent vehicle local path planning method based on transverse and longitudinal decoupling of frenet coordinate system | |
CN105109485A (en) | Driving method and system | |
CN113110449B (en) | Simulation system of vehicle automatic driving technology | |
Han et al. | Reinforcement learning guided by double replay memory | |
CN112612267B (en) | Automatic driving path planning method and device | |
CN114132340A (en) | Lane change trajectory prediction method and device and computer storage medium | |
CN112947495A (en) | Model training method, unmanned equipment control method and device | |
US20230162539A1 (en) | Driving decision-making method and apparatus and chip | |
CN116734877A (en) | Robot dynamic obstacle avoidance method based on improved A-algorithm and dynamic window method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||