CN113177664B - Self-learning path planning method taking safety and distance cost as constraint - Google Patents


Info

Publication number
CN113177664B
CN113177664B, CN202110550501.7A, CN202110550501A
Authority
CN
China
Prior art keywords
distance
path planning
self-learning
agent
planning method
Prior art date
Legal status
Active
Application number
CN202110550501.7A
Other languages
Chinese (zh)
Other versions
CN113177664A (en)
Inventor
Chen Tianxing (陈天星)
Current Assignee
Dilu Technology Co Ltd
Original Assignee
Dilu Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Dilu Technology Co Ltd filed Critical Dilu Technology Co Ltd
Priority to CN202110550501.7A
Publication of CN113177664A
Application granted
Publication of CN113177664B


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 — Administration; Management
    • G06Q 10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 10/047 — Optimisation of routes or paths, e.g. travelling salesman problem
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Manipulator (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a self-learning path planning method taking safety and distance cost as constraints. Drawing on the design idea of the A* heuristic function, it designs a heuristic function that comprehensively considers the safety cost and the distance cost and introduces this function into the design of the reward function in the reinforcement-learning DQN algorithm, so that the new reward function can guide an agent to find a safe and shortest path.

Description

Self-learning path planning method taking safety and distance cost as constraint
Technical Field
The invention relates to a self-learning path planning method taking safety and distance cost as constraints, and belongs to the field of intelligent cockpit display.
Background
Reinforcement learning is a closed-loop, experience-driven learning method: the robot continuously interacts with the environment and thereby achieves an autonomous learning process. The interaction between the robot and the environment can be described as a Markov decision process.
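For reference, this Markov decision process has the standard textbook form below (a general definition, not something specified by the patent):

```latex
% Standard MDP tuple; S: states, A: actions, P: transition kernel,
% R: reward function, gamma: discount factor (general definition, not from the patent).
\[
  \mathcal{M} = (S, A, P, R, \gamma), \qquad
  P(s' \mid s, a) = \Pr\bigl(s_{t+1} = s' \mid s_t = s,\; a_t = a\bigr), \qquad
  \gamma \in [0, 1)
\]
```

At each step the robot observes a state s, selects an action a, receives a reward R(s, a), and moves to the next state s' with probability P(s' | s, a); the discount factor γ weights future rewards against immediate ones.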
The Q_learning algorithm in reinforcement learning is widely applied in robot path planning: the robot interacts with the environment through Q_learning so as to achieve autonomous path planning. However, because Q_learning computes values in a Q table and then selects the action with the larger Q value as the action to execute, it easily suffers from slow computation and dimension explosion. The Deep Q_learning algorithm, i.e. the DQN algorithm, was therefore proposed: it adds a deep neural network on top of Q_learning to compute the Q value, which solves the dimension-explosion problem of Q_learning.
The basic idea of the DQN algorithm is to combine the reinforcement-learning Q_learning algorithm with a deep neural network: a return value is computed by the neural network instead of a Q table, the error between the Q estimate and the Q target is reduced through continuous learning, the target Q network is continuously updated and its weights optimized, and the purpose of autonomous path planning is finally achieved. However, the DQN algorithm needs to explore the learning space continuously, and this exploration is largely blind and often unnecessary, so the algorithm suffers from low utilization of the environment and low search efficiency, which in turn easily leads to slow learning, long search times, and long search paths.
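As a concrete illustration of the idea just described, a minimal DQN update step might look as follows; the use of PyTorch, the two-layer network, and the hyper-parameters are assumptions for illustration, since the patent does not specify an architecture.

```python
# Minimal DQN update sketch (assumed PyTorch implementation; network shape
# and hyper-parameters are illustrative, not taken from the patent).
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def dqn_update(q_net, target_net, optimizer, batch, gamma: float = 0.99) -> float:
    """One gradient step that reduces the error between the Q estimate and
    the Q target computed from the (periodically synchronized) target network."""
    states, actions, rewards, next_states, dones = batch
    q_est = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # Q target: r + gamma * max_a' Q_target(s', a')
        q_tgt = rewards + gamma * (1.0 - dones) * target_net(next_states).max(1).values
    loss = nn.functional.mse_loss(q_est, q_tgt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```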
The A* (A-Star) algorithm is the most effective direct search method for finding the shortest path in a static road network, and it is also an effective algorithm for many other search problems. The closer the distance estimate in the algorithm is to the actual value, the faster the final search. A* is a typical example of heuristic search: each node encountered during path finding is assigned an estimated value (the heuristic, typically f(n) = g(n) + h(n), where g(n) is the cost from the start to node n and h(n) estimates the cost from n to the goal), nodes are traversed in order of best estimate, and the node with the better estimate is expanded first. The definition of the estimation function is therefore critical, as it significantly affects the algorithm's efficiency.
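For reference, a minimal textbook A* over a grid (a generic illustration, not the algorithm claimed by this patent) shows the estimate-first node ordering described above:

```python
# Textbook A* on a grid: nodes are expanded in order of the estimate
# f(n) = g(n) + h(n). Generic illustration only, not the patent's algorithm.
import heapq
import itertools
import math

def a_star(blocked, width, height, start, goal):
    """blocked: set of impassable (x, y) cells; start, goal: (x, y)."""
    h = lambda p: math.dist(p, goal)          # straight-line distance heuristic
    tie = itertools.count()                   # tie-breaker so the heap never compares nodes
    heap = [(h(start), next(tie), 0.0, start, None)]
    parent, g_best = {}, {start: 0.0}
    while heap:
        _, _, g, node, prev = heapq.heappop(heap)
        if node in parent:
            continue                          # already expanded via a better estimate
        parent[node] = prev
        if node == goal:                      # walk back through parents
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in blocked:
                continue
            ng = g + 1.0
            if ng < g_best.get(nxt, math.inf):
                g_best[nxt] = ng
                heapq.heappush(heap, (ng + h(nxt), next(tie), ng, nxt, node))
    return None                               # goal unreachable
```

Calling a_star({(1, 1)}, 3, 3, (0, 0), (2, 2)) returns a shortest path around the blocked cell.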
In the prior art, distance cost serves as an important index for evaluating a path and plays an important role in path planning, but existing algorithms designed around distance cost as the core idea (such as the A* algorithm) are mostly used in global path planning and therefore cannot complete the task well in a dynamic environment. Safety, in turn, is the primary criterion of path planning and its importance is self-evident, yet considering safety alone easily leads to local optima.
Disclosure of Invention
Aiming at the above problems, the invention provides a self-learning path planning method taking safety and distance cost as constraints. Drawing on the design idea of the A* heuristic function, the method designs a heuristic function that comprehensively considers the safety cost and the distance cost and introduces it into the design of the reward function in the reinforcement-learning DQN algorithm; the new reward function can guide the agent to find a safe and shortest path.
To solve the above technical problems, the invention adopts the following technical scheme:
A self-learning path planning method taking safety and distance cost as constraints, the method comprising the steps of:
acquiring position data of the agent at the current moment and a preset track of the agent;
acquiring the current expected running direction of the agent by using a trained DQN model according to the agent position data and the preset track;
controlling the running direction of the agent according to the current expected running direction.
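A minimal sketch of these three online steps follows; every name in it (the eight-direction action set, the track-offset feature, the q_net parameter) is a hypothetical illustration, since the patent does not disclose an implementation.

```python
# Hedged sketch of the three online steps above. All names are hypothetical.
import torch

ACTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]  # assumed discretization

def track_offset(position, preset_track):
    """Hypothetical feature: offset to the nearest preset-track waypoint."""
    nearest = min(preset_track,
                  key=lambda w: (w[0] - position[0]) ** 2 + (w[1] - position[1]) ** 2)
    return (nearest[0] - position[0], nearest[1] - position[1])

def desired_direction(q_net, position, preset_track):
    """Step 2: feed the current position plus track features to the trained
    DQN and return the action with the highest Q value; step 3 then steers
    the agent in that direction."""
    dx, dy = track_offset(position, preset_track)
    state = torch.tensor([position[0], position[1], dx, dy], dtype=torch.float32)
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0)).squeeze(0)
    return ACTIONS[int(q_values.argmax())]
```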
Further, the DQN model is trained according to historical data of the agent and the preset track.
Further, the reward function of the DQN model is:
R(s) = T(s), when the agent reaches the target point; R(s) = O(s), when the agent reaches an obstacle; R(s) = F1(s), otherwise;
where k is the distance boundary threshold, obs is the distance between the agent and the nearest obstacle, E is the distance between the agent and the target point, D is the distance between the starting point and the agent, and H is the straight-line distance between the starting point and the target point.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
the invention designs a heuristic function which comprehensively considers the safety cost and the distance cost, and introduces the heuristic function into the design of a reward function in the reinforcement learning DQN algorithm, and can guide an intelligent agent to find a safe and shortest path through a new reward function.
Detailed Description
In the prior art, distance cost serves as an important index for evaluating a path and plays an important role in path planning, but existing algorithms designed around distance cost as the core idea (such as the A* algorithm) are mostly used in global path planning and therefore cannot complete the task well in a dynamic environment. Safety, in turn, is the primary criterion of path planning and its importance is self-evident, yet considering safety alone easily leads to local optima.
This patent designs a heuristic function that comprehensively considers the safety cost and the distance cost and introduces it into the design of the reward function in the reinforcement-learning DQN algorithm; the new reward function can guide the agent to find a safe and shortest path. The specific steps are as follows:
step 1: heuristic function design of distance cost
Mainly by taking the thought of A-heuristic function design as reference, a heuristic function mainly based on distance cost is designed, and an intelligent agent is guided to learn out the shortest path, and the design is as follows:
wherein D is the distance between the starting point and the agent, H is the straight line distance between the starting point and the target point, and E is the distance between the agent and the target point.
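A sketch of such a distance term follows. The patent's exact formula is not reproduced in this text, so the concrete form below (H normalized by the path bound D + E) is an assumption chosen only to match the stated intent: the value is maximal when the agent stays on the straight start-target line (D + E = H) and decays as the detour grows.

```python
# Assumed distance-cost term F1; the patent's actual expression is not
# reproduced here, so this form is illustrative only.
def distance_heuristic(D: float, E: float, H: float) -> float:
    """D: start-to-agent distance, E: agent-to-target distance,
    H: straight-line start-to-target distance."""
    return H / (D + E) if (D + E) > 0 else 1.0
```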
Step 2: Heuristic function design for safety cost
In order to ensure the safety of the agent during learning, a potential field method is designed to represent the distance between the agent and the obstacles and between the agent and the target point. A threshold k is designed to represent the distance boundary: when the distance between the agent and an obstacle or the target point is smaller than k, the agent enters the potential field; otherwise it does not. After entering the potential field, the agent is subjected to a repulsive field caused by the obstacle or an attractive field caused by the target point:
O(s) is negative, directing the agent away from the obstacle; T(s) is positive, directing the agent towards the target point; where obs denotes the distance between the agent and the nearest obstacle.
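A sketch of these two terms follows. The threshold behaviour (no field beyond distance k, repulsion O(s) < 0 near an obstacle, attraction T(s) > 0 near the target) comes from the text above; the 1/d-style magnitude of O(s), the linear form of T(s), and the gain eta are assumptions.

```python
# Assumed potential-field terms for step 2; only the sign and threshold
# behaviour are taken from the text, the magnitudes are illustrative.
def obstacle_potential(obs: float, k: float, eta: float = 1.0) -> float:
    """obs: distance to the nearest obstacle; negative inside the field."""
    if obs >= k:
        return 0.0
    obs = max(obs, 1e-6)                 # avoid division by zero on contact
    return -eta * (1.0 / obs - 1.0 / k)  # magnitude grows as obs -> 0

def target_potential(E: float, k: float, eta: float = 1.0) -> float:
    """E: distance to the target point; positive inside the field."""
    if E >= k:
        return 0.0
    return eta * (1.0 - E / k)           # grows as the agent nears the target
```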
Step 3: A new heuristic function is designed.
Steps 1 and 2 are combined to integrate safety and distance cost, enabling the agent to learn the shortest path on the premise of ensuring safety: the new heuristic function merges the distance term F1(s) with the potential-field terms O(s) and T(s).
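A sketch of one possible combination follows, reusing the functions from the step-1 and step-2 sketches; the additive form is an assumption, since the patent's combining formula is not reproduced in this text.

```python
# Assumed additive combination of the distance term and the two
# potential-field terms from the earlier sketches.
def combined_heuristic(D, E, H, obs, k, eta=1.0):
    return (distance_heuristic(D, E, H)
            + obstacle_potential(obs, k, eta)
            + target_potential(E, k, eta))
```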
step 4: combining the heuristic function designed in the step 3 with the reinforcement learning DQN algorithm.
Introducing the new heuristic function designed in the step 3 into a reward function of the DQN algorithm to guide the intelligent agent to learn a safe and shortest path:
wherein, the return value of the robot reaching the target point is T(s); the return value of the obstacle reached by the robot is O(s); the remaining case is F1.
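Putting the pieces together, a sketch of this reward follows; the piecewise structure matches the text above, while the terminal-check thresholds (goal_eps, hit_eps) are illustrative assumptions. It reuses the helper functions from the earlier sketches.

```python
# Assumed assembly of the step-4 reward: T(s) at the target, O(s) at an
# obstacle, the combined heuristic otherwise. Thresholds are illustrative.
def reward(D, E, H, obs, k, eta=1.0, goal_eps=0.1, hit_eps=0.05):
    if E <= goal_eps:                    # agent has reached the target point
        return target_potential(E, k, eta)
    if obs <= hit_eps:                   # agent has reached an obstacle
        return obstacle_potential(obs, k, eta)
    return combined_heuristic(D, E, H, obs, k, eta)
```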
The application also provides a self-learning path planning system constrained by safety and distance cost, comprising a memory and a processor; the memory stores a computer program which, when executed by the processor, implements the self-learning path planning method constrained by safety and distance cost described above.
The application also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the self-learning path planning method constrained by safety and distance cost described above. The computer-readable storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous-link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that the above embodiments are only for aiding in understanding the method of the present application and its core idea, and that it will be obvious to those skilled in the art that several improvements and modifications can be made to the present application without departing from the principle of the present application, and these improvements and modifications are also within the scope of the claims of the present application.

Claims (4)

1. A self-learning path planning method taking safety and distance cost as constraints, characterized by comprising the following steps:
acquiring position data of the agent at the current moment and a preset track of the agent;
acquiring the current expected running direction of the agent by using a trained DQN model according to the agent position data and the preset track;
controlling the running direction of the agent according to the current expected running direction;
the reward function of the DQN model is:
R(s) = T(s), when the agent reaches the target point; R(s) = O(s), when the agent reaches an obstacle; R(s) = F1(s), otherwise;
where k is the distance boundary threshold, obs is the distance between the agent and the nearest obstacle, E is the distance between the agent and the target point, D is the distance between the starting point and the agent, and H is the straight-line distance between the starting point and the target point.
2. The self-learning path planning method of claim 1, wherein the DQN model is trained based on historical data of the agent and the preset track.
3. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the self-learning path planning method taking safety and distance cost as constraints according to any one of claims 1 to 2.
4. A self-learning path planning system taking safety and distance cost as constraints, comprising: a memory and a processor; the memory stores a computer program which, when executed by the processor, implements the self-learning path planning method taking safety and distance cost as constraints according to any one of claims 1 to 2.
CN202110550501.7A 2021-05-20 2021-05-20 Self-learning path planning method taking safety and distance cost as constraint Active CN113177664B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110550501.7A CN113177664B (en) 2021-05-20 2021-05-20 Self-learning path planning method taking safety and distance cost as constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110550501.7A CN113177664B (en) 2021-05-20 2021-05-20 Self-learning path planning method taking safety and distance cost as constraint

Publications (2)

Publication Number Publication Date
CN113177664A CN113177664A (en) 2021-07-27
CN113177664B (en) 2024-03-19

Family

ID=76929400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110550501.7A Active CN113177664B (en) 2021-05-20 2021-05-20 Self-learning path planning method taking safety and distance cost as constraint

Country Status (1)

Country Link
CN (1) CN113177664B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10695911B2 (en) * 2018-01-12 2020-06-30 Futurewei Technologies, Inc. Robot navigation and object tracking

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017173990A1 (en) * 2016-04-07 2017-10-12 北京进化者机器人科技有限公司 Method for planning shortest path in robot obstacle avoidance
CN108776483A (en) * 2018-08-16 2018-11-09 AGV path planning method and system based on ant colony algorithm and multi-agent Q-learning
CN110646009A (en) * 2019-09-27 2020-01-03 北京邮电大学 DQN-based vehicle automatic driving path planning method and device
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Mobile robot collision avoidance planning method based on deep reinforcement learning in dynamic environment
CN110883776A (en) * 2019-11-29 2020-03-17 河南大学 Robot path planning algorithm for improving DQN under quick search mechanism
CN111780777A (en) * 2020-07-13 2020-10-16 江苏中科智能制造研究院有限公司 Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN112665603A (en) * 2020-12-16 2021-04-16 Multi-vehicle path planning method based on improved A* with time window

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xing Wu; Haolei Chen; Hamido Fujita. The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method. Knowledge-Based Systems. Full text. *
Wang Lifeng. Research on reinforcement-learning-based AUV behavior re-planning method (基于强化学习的AUV行为重规划方法研究). China Master's Theses Full-text Database. Full text. *

Also Published As

Publication number Publication date
CN113177664A (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN109540159B (en) Rapid and complete automatic driving track planning method
CN109791409B (en) Motion control decision for autonomous vehicles
CN111332285B (en) Method and device for vehicle to avoid obstacle, electronic equipment and storage medium
US10324469B2 (en) System and method for controlling motion of vehicle in shared environment
JP2022516383A (en) Autonomous vehicle planning
Stefansson et al. Human-robot interaction for truck platooning using hierarchical dynamic games
WO2017197170A1 (en) Safely controlling an autonomous entity in presence of intelligent agents
CN113177664B (en) Self-learning path planning method taking safety and distance cost as constraint
CN110688920A (en) Unmanned control method and device and server
US20220155732A9 (en) System and Method of Efficient, Continuous, and Safe Learning Using First Principles and Constraints
CN105956704A (en) Destination identification method for plug-in type hybrid vehicle
CN115826581A (en) Mobile robot path planning algorithm combining fuzzy control and reinforcement learning
RU2019143947A (en) METHOD AND SYSTEM FOR CALCULATING DATA FOR CONTROLLING THE OPERATION OF AN UNMOUNTED VEHICLE
CN113341941A (en) Control method and device of unmanned equipment
CN112415995A (en) Planning control method based on real-time safety boundary
CN112255628A (en) Obstacle trajectory prediction method, apparatus, device, and medium
CN116182884A Intelligent vehicle local path planning method based on transverse and longitudinal decoupling in the Frenet coordinate system
CN105109485A (en) Driving method and system
CN113110449B (en) Simulation system of vehicle automatic driving technology
Han et al. Reinforcement learning guided by double replay memory
CN112612267B (en) Automatic driving path planning method and device
CN114132340A (en) Lane change trajectory prediction method and device and computer storage medium
CN112947495A (en) Model training method, unmanned equipment control method and device
US20230162539A1 (en) Driving decision-making method and apparatus and chip
CN116734877A (en) Robot dynamic obstacle avoidance method based on improved A-algorithm and dynamic window method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant