CN113177664B - Self-learning path planning method taking safety and distance cost as constraint - Google Patents


Info

Publication number
CN113177664B
CN113177664B, CN202110550501.7A, CN202110550501A
Authority
CN
China
Prior art keywords
distance
path planning
self-learning
agent
planning method
Prior art date
Legal status
Active
Application number
CN202110550501.7A
Other languages
Chinese (zh)
Other versions
CN113177664A (en)
Inventor
Chen Tianxing (陈天星)
Current Assignee
Dilu Technology Co Ltd
Original Assignee
Dilu Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Dilu Technology Co Ltd filed Critical Dilu Technology Co Ltd
Priority to CN202110550501.7A
Publication of CN113177664A
Application granted
Publication of CN113177664B


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 — Administration; Management
    • G06Q 10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 10/047 — Optimisation of routes or paths, e.g. travelling salesman problem
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Manipulator (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a self-learning path planning method taking safety and distance cost as constraints. Drawing on the design idea of the A* heuristic function, it designs a heuristic function that comprehensively considers the safety cost and the distance cost and introduces this function into the design of the reward function in the reinforcement-learning DQN algorithm, so that the new reward function can guide an agent to find a safe and shortest path.

Description

Self-learning path planning method taking safety and distance cost as constraint
Technical Field
The invention relates to a self-learning path planning method taking safety and distance cost as constraints, and belongs to the field of intelligent cockpit display.
Background
Reinforcement learning is a closed-loop, experience-driven learning method: the robot continuously interacts with the environment and thereby achieves an autonomous learning process. The interaction between the robot and the environment can be described as a Markov decision process.
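For reference, this Markov decision process has the standard textbook form below (a general definition, not something specified by the patent):

```latex
% Standard MDP tuple; S: states, A: actions, P: transition kernel,
% R: reward function, gamma: discount factor (general definition, not from the patent).
\[
  \mathcal{M} = (S, A, P, R, \gamma), \qquad
  P(s' \mid s, a) = \Pr\bigl(s_{t+1} = s' \mid s_t = s,\; a_t = a\bigr), \qquad
  \gamma \in [0, 1)
\]
```

At each step the robot observes a state s, selects an action a, receives a reward R(s, a), and moves to the next state s' with probability P(s' | s, a); the discount factor γ weights future rewards against immediate ones.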
The Q_learning algorithm in reinforcement learning is widely applied in robot path planning: the robot interacts with the environment through Q_learning so as to achieve autonomous path planning. However, because Q_learning computes values in a Q table and then selects the action with the larger Q value as the action to execute, it easily suffers from slow computation and dimension explosion. The Deep Q_learning algorithm, i.e. the DQN algorithm, was therefore proposed: it adds a deep neural network on top of Q_learning to compute the Q value, which solves the dimension-explosion problem of Q_learning.
The basic idea of the DQN algorithm is to combine the reinforcement-learning Q_learning algorithm with a deep neural network: a return value is computed by the neural network instead of a Q table, the error between the Q estimate and the Q target is reduced through continuous learning, the target Q network is continuously updated and its weights optimized, and the purpose of autonomous path planning is finally achieved. However, the DQN algorithm needs to explore the learning space continuously, and this exploration is largely blind and often unnecessary, so the algorithm suffers from low utilization of the environment and low search efficiency, which in turn easily leads to slow learning, long search times, and long search paths.
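As a concrete illustration of the idea just described, a minimal DQN update step might look as follows; the use of PyTorch, the two-layer network, and the hyper-parameters are assumptions for illustration, since the patent does not specify an architecture.

```python
# Minimal DQN update sketch (assumed PyTorch implementation; network shape
# and hyper-parameters are illustrative, not taken from the patent).
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def dqn_update(q_net, target_net, optimizer, batch, gamma: float = 0.99) -> float:
    """One gradient step that reduces the error between the Q estimate and
    the Q target computed from the (periodically synchronized) target network."""
    states, actions, rewards, next_states, dones = batch
    q_est = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # Q target: r + gamma * max_a' Q_target(s', a')
        q_tgt = rewards + gamma * (1.0 - dones) * target_net(next_states).max(1).values
    loss = nn.functional.mse_loss(q_est, q_tgt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```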
The A* (A-Star) algorithm is the most effective direct search method for finding the shortest path in a static road network, and it is also an effective algorithm for many other search problems. The closer the distance estimate in the algorithm is to the actual value, the faster the final search. A* is a typical example of heuristic search: each node encountered during path finding is assigned an estimated value (the heuristic, typically f(n) = g(n) + h(n), where g(n) is the cost from the start to node n and h(n) estimates the cost from n to the goal), nodes are traversed in order of best estimate, and the node with the better estimate is expanded first. The definition of the estimation function is therefore critical, as it significantly affects the algorithm's efficiency.
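For reference, a minimal textbook A* over a grid (a generic illustration, not the algorithm claimed by this patent) shows the estimate-first node ordering described above:

```python
# Textbook A* on a grid: nodes are expanded in order of the estimate
# f(n) = g(n) + h(n). Generic illustration only, not the patent's algorithm.
import heapq
import itertools
import math

def a_star(blocked, width, height, start, goal):
    """blocked: set of impassable (x, y) cells; start, goal: (x, y)."""
    h = lambda p: math.dist(p, goal)          # straight-line distance heuristic
    tie = itertools.count()                   # tie-breaker so the heap never compares nodes
    heap = [(h(start), next(tie), 0.0, start, None)]
    parent, g_best = {}, {start: 0.0}
    while heap:
        _, _, g, node, prev = heapq.heappop(heap)
        if node in parent:
            continue                          # already expanded via a better estimate
        parent[node] = prev
        if node == goal:                      # walk back through parents
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in blocked:
                continue
            ng = g + 1.0
            if ng < g_best.get(nxt, math.inf):
                g_best[nxt] = ng
                heapq.heappush(heap, (ng + h(nxt), next(tie), ng, nxt, node))
    return None                               # goal unreachable
```

Calling a_star({(1, 1)}, 3, 3, (0, 0), (2, 2)) returns a shortest path around the blocked cell.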
In the prior art, distance cost serves as an important index for evaluating a path and plays an important role in path planning, but existing algorithms designed around distance cost as the core idea (such as the A* algorithm) are mostly used in global path planning and therefore cannot complete the task well in a dynamic environment. Safety, in turn, is the primary criterion of path planning and its importance is self-evident, yet considering safety alone easily leads to local optima.
Disclosure of Invention
Aiming at the above problems, the invention provides a self-learning path planning method taking safety and distance cost as constraints. Drawing on the design idea of the A* heuristic function, the method designs a heuristic function that comprehensively considers the safety cost and the distance cost and introduces it into the design of the reward function in the reinforcement-learning DQN algorithm; the new reward function can guide the agent to find a safe and shortest path.
To solve the above technical problems, the invention adopts the following technical scheme:
A self-learning path planning method taking safety and distance cost as constraints, the method comprising the steps of:
acquiring position data of the agent at the current moment and a preset track of the agent;
acquiring the current expected running direction of the agent by using a trained DQN model according to the agent position data and the preset track;
controlling the running direction of the agent according to the current expected running direction.
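A minimal sketch of these three online steps follows; every name in it (the eight-direction action set, the track-offset feature, the q_net parameter) is a hypothetical illustration, since the patent does not disclose an implementation.

```python
# Hedged sketch of the three online steps above. All names are hypothetical.
import torch

ACTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]  # assumed discretization

def track_offset(position, preset_track):
    """Hypothetical feature: offset to the nearest preset-track waypoint."""
    nearest = min(preset_track,
                  key=lambda w: (w[0] - position[0]) ** 2 + (w[1] - position[1]) ** 2)
    return (nearest[0] - position[0], nearest[1] - position[1])

def desired_direction(q_net, position, preset_track):
    """Step 2: feed the current position plus track features to the trained
    DQN and return the action with the highest Q value; step 3 then steers
    the agent in that direction."""
    dx, dy = track_offset(position, preset_track)
    state = torch.tensor([position[0], position[1], dx, dy], dtype=torch.float32)
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0)).squeeze(0)
    return ACTIONS[int(q_values.argmax())]
```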
Further, the DQN model is trained according to historical data of the agent and the preset track.
Further, the reward function of the DQN model is:
R(s) = T(s), when the agent reaches the target point; R(s) = O(s), when the agent reaches an obstacle; R(s) = F1(s), otherwise;
where k is the distance boundary threshold, obs is the distance between the agent and the nearest obstacle, E is the distance between the agent and the target point, D is the distance between the starting point and the agent, and H is the straight-line distance between the starting point and the target point.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
the invention designs a heuristic function which comprehensively considers the safety cost and the distance cost, and introduces the heuristic function into the design of a reward function in the reinforcement learning DQN algorithm, and can guide an intelligent agent to find a safe and shortest path through a new reward function.
Detailed Description
In the prior art, distance cost serves as an important index for evaluating a path and plays an important role in path planning, but existing algorithms designed around distance cost as the core idea (such as the A* algorithm) are mostly used in global path planning and therefore cannot complete the task well in a dynamic environment. Safety, in turn, is the primary criterion of path planning and its importance is self-evident, yet considering safety alone easily leads to local optima.
This patent designs a heuristic function that comprehensively considers the safety cost and the distance cost and introduces it into the design of the reward function in the reinforcement-learning DQN algorithm; the new reward function can guide the agent to find a safe and shortest path. The specific steps are as follows:
step 1: heuristic function design of distance cost
Mainly by taking the thought of A-heuristic function design as reference, a heuristic function mainly based on distance cost is designed, and an intelligent agent is guided to learn out the shortest path, and the design is as follows:
wherein D is the distance between the starting point and the agent, H is the straight line distance between the starting point and the target point, and E is the distance between the agent and the target point.
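A sketch of such a distance term follows. The patent's exact formula is not reproduced in this text, so the concrete form below (H normalized by the path bound D + E) is an assumption chosen only to match the stated intent: the value is maximal when the agent stays on the straight start-target line (D + E = H) and decays as the detour grows.

```python
# Assumed distance-cost term F1; the patent's actual expression is not
# reproduced here, so this form is illustrative only.
def distance_heuristic(D: float, E: float, H: float) -> float:
    """D: start-to-agent distance, E: agent-to-target distance,
    H: straight-line start-to-target distance."""
    return H / (D + E) if (D + E) > 0 else 1.0
```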
Step 2: Heuristic function design for safety cost
In order to ensure the safety of the agent during learning, a potential field method is designed to represent the distance between the agent and the obstacles and between the agent and the target point. A threshold k is designed to represent the distance boundary: when the distance between the agent and an obstacle or the target point is smaller than k, the agent enters the potential field; otherwise it does not. After entering the potential field, the agent is subjected to a repulsive field caused by the obstacle or an attractive field caused by the target point:
O(s) is negative, directing the agent away from the obstacle; T(s) is positive, directing the agent towards the target point; where obs denotes the distance between the agent and the nearest obstacle.
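A sketch of these two terms follows. The threshold behaviour (no field beyond distance k, repulsion O(s) < 0 near an obstacle, attraction T(s) > 0 near the target) comes from the text above; the 1/d-style magnitude of O(s), the linear form of T(s), and the gain eta are assumptions.

```python
# Assumed potential-field terms for step 2; only the sign and threshold
# behaviour are taken from the text, the magnitudes are illustrative.
def obstacle_potential(obs: float, k: float, eta: float = 1.0) -> float:
    """obs: distance to the nearest obstacle; negative inside the field."""
    if obs >= k:
        return 0.0
    obs = max(obs, 1e-6)                 # avoid division by zero on contact
    return -eta * (1.0 / obs - 1.0 / k)  # magnitude grows as obs -> 0

def target_potential(E: float, k: float, eta: float = 1.0) -> float:
    """E: distance to the target point; positive inside the field."""
    if E >= k:
        return 0.0
    return eta * (1.0 - E / k)           # grows as the agent nears the target
```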
Step 3: A new heuristic function is designed.
Steps 1 and 2 are combined to integrate safety and distance cost, enabling the agent to learn the shortest path on the premise of ensuring safety: the new heuristic function merges the distance term F1(s) with the potential-field terms O(s) and T(s).
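A sketch of one possible combination follows, reusing the functions from the step-1 and step-2 sketches; the additive form is an assumption, since the patent's combining formula is not reproduced in this text.

```python
# Assumed additive combination of the distance term and the two
# potential-field terms from the earlier sketches.
def combined_heuristic(D, E, H, obs, k, eta=1.0):
    return (distance_heuristic(D, E, H)
            + obstacle_potential(obs, k, eta)
            + target_potential(E, k, eta))
```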
step 4: combining the heuristic function designed in the step 3 with the reinforcement learning DQN algorithm.
Introducing the new heuristic function designed in the step 3 into a reward function of the DQN algorithm to guide the intelligent agent to learn a safe and shortest path:
wherein, the return value of the robot reaching the target point is T(s); the return value of the obstacle reached by the robot is O(s); the remaining case is F1.
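Putting the pieces together, a sketch of this reward follows; the piecewise structure matches the text above, while the terminal-check thresholds (goal_eps, hit_eps) are illustrative assumptions. It reuses the helper functions from the earlier sketches.

```python
# Assumed assembly of the step-4 reward: T(s) at the target, O(s) at an
# obstacle, the combined heuristic otherwise. Thresholds are illustrative.
def reward(D, E, H, obs, k, eta=1.0, goal_eps=0.1, hit_eps=0.05):
    if E <= goal_eps:                    # agent has reached the target point
        return target_potential(E, k, eta)
    if obs <= hit_eps:                   # agent has reached an obstacle
        return obstacle_potential(obs, k, eta)
    return combined_heuristic(D, E, H, obs, k, eta)
```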
The application also provides a self-learning path planning system constrained by safety and distance cost, comprising a memory and a processor; the memory stores a computer program which, when executed by the processor, implements the self-learning path planning method constrained by safety and distance cost described above.
The application also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the self-learning path planning method constrained by safety and distance cost described above. The computer-readable storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous-link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that the above embodiments are only for aiding in understanding the method of the present application and its core idea, and that it will be obvious to those skilled in the art that several improvements and modifications can be made to the present application without departing from the principle of the present application, and these improvements and modifications are also within the scope of the claims of the present application.

Claims (4)

1. A self-learning path planning method taking safety and distance cost as constraints, characterized by comprising the following steps:
acquiring position data of the agent at the current moment and a preset track of the agent;
acquiring the current expected running direction of the agent by using a trained DQN model according to the agent position data and the preset track;
controlling the running direction of the agent according to the current expected running direction;
the reward function of the DQN model is:
R(s) = T(s), when the agent reaches the target point; R(s) = O(s), when the agent reaches an obstacle; R(s) = F1(s), otherwise;
where k is the distance boundary threshold, obs is the distance between the agent and the nearest obstacle, E is the distance between the agent and the target point, D is the distance between the starting point and the agent, and H is the straight-line distance between the starting point and the target point.
2. The self-learning path planning method of claim 1, wherein the DQN model is trained based on historical data of the agent and the preset track.
3. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the self-learning path planning method taking safety and distance cost as constraints according to any one of claims 1 to 2.
4. A self-learning path planning system taking safety and distance cost as constraints, comprising: a memory and a processor; the memory stores a computer program which, when executed by the processor, implements the self-learning path planning method taking safety and distance cost as constraints according to any one of claims 1 to 2.
CN202110550501.7A 2021-05-20 2021-05-20 Self-learning path planning method taking safety and distance cost as constraint Active CN113177664B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110550501.7A CN113177664B (en) 2021-05-20 2021-05-20 Self-learning path planning method taking safety and distance cost as constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110550501.7A CN113177664B (en) 2021-05-20 2021-05-20 Self-learning path planning method taking safety and distance cost as constraint

Publications (2)

Publication Number Publication Date
CN113177664A CN113177664A (en) 2021-07-27
CN113177664B (en) 2024-03-19

Family

ID=76929400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110550501.7A Active CN113177664B (en) 2021-05-20 2021-05-20 Self-learning path planning method taking safety and distance cost as constraint

Country Status (1)

Country Link
CN (1) CN113177664B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10695911B2 (en) * 2018-01-12 2020-06-30 Futurewei Technologies, Inc. Robot navigation and object tracking

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017173990A1 (en) * 2016-04-07 2017-10-12 北京进化者机器人科技有限公司 Method for planning shortest path in robot obstacle avoidance
CN108776483A (en) * 2018-08-16 2018-11-09 AGV path planning method and system based on ant colony algorithm and multi-agent Q-learning
CN110646009A (en) * 2019-09-27 2020-01-03 北京邮电大学 DQN-based vehicle automatic driving path planning method and device
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Mobile robot collision avoidance planning method based on deep reinforcement learning in dynamic environment
CN110883776A (en) * 2019-11-29 2020-03-17 河南大学 Robot path planning algorithm for improving DQN under quick search mechanism
CN111780777A (en) * 2020-07-13 2020-10-16 江苏中科智能制造研究院有限公司 Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN112665603A (en) * 2020-12-16 2021-04-16 Multi-vehicle path planning method based on improved A* with time window

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xing Wu; Haolei Chen; Hamido Fujita. The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method. Knowledge-Based Systems. Full text. *
Wang Lifeng. Research on reinforcement-learning-based AUV behavior re-planning method (基于强化学习的AUV行为重规划方法研究). China Master's Theses Full-text Database. Full text. *

Also Published As

Publication number Publication date
CN113177664A (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN109540159B (en) Rapid and complete automatic driving track planning method
CN109791409B (en) Motion control decision for autonomous vehicles
CN111332285B (en) Method and device for vehicle to avoid obstacle, electronic equipment and storage medium
US10324469B2 (en) System and method for controlling motion of vehicle in shared environment
JP2022516383A (en) Autonomous vehicle planning
Stefansson et al. Human-robot interaction for truck platooning using hierarchical dynamic games
WO2017197170A1 (en) Safely controlling an autonomous entity in presence of intelligent agents
CN113177664B (en) Self-learning path planning method taking safety and distance cost as constraint
CN110688920A (en) Unmanned control method and device and server
US20220155732A9 (en) System and Method of Efficient, Continuous, and Safe Learning Using First Principles and Constraints
CN105956704A (en) Destination identification method for plug-in type hybrid vehicle
CN115826581A (en) Mobile robot path planning algorithm combining fuzzy control and reinforcement learning
RU2019143947A (en) METHOD AND SYSTEM FOR CALCULATING DATA FOR CONTROLLING THE OPERATION OF AN UNMOUNTED VEHICLE
CN113341941A (en) Control method and device of unmanned equipment
CN112415995A (en) Planning control method based on real-time safety boundary
CN112255628A (en) Obstacle trajectory prediction method, apparatus, device, and medium
CN116182884A Intelligent vehicle local path planning method based on transverse and longitudinal decoupling in the Frenet coordinate system
CN105109485A (en) Driving method and system
CN113110449B (en) Simulation system of vehicle automatic driving technology
Han et al. Reinforcement learning guided by double replay memory
CN112612267B (en) Automatic driving path planning method and device
CN114132340A (en) Lane change trajectory prediction method and device and computer storage medium
CN112947495A (en) Model training method, unmanned equipment control method and device
US20230162539A1 (en) Driving decision-making method and apparatus and chip
CN116734877A (en) Robot dynamic obstacle avoidance method based on improved A-algorithm and dynamic window method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant