CN114326734A - Path planning method and device - Google Patents

Path planning method and device

Info

Publication number
CN114326734A
CN114326734A (application CN202111635189.8A)
Authority
CN
China
Prior art keywords
decision
behavior
value
relative distance
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111635189.8A
Other languages
Chinese (zh)
Other versions
CN114326734B (en)
Inventor
薛均晓 (Xue Junxiao)
董博威 (Dong Bowei)
万里红 (Wan Lihong)
冷洁 (Leng Jie)
张世文 (Zhang Shiwen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongyuan Power Intelligent Robot Co ltd
Original Assignee
Zhongyuan Power Intelligent Robot Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongyuan Power Intelligent Robot Co ltd filed Critical Zhongyuan Power Intelligent Robot Co ltd
Priority to CN202111635189.8A
Publication of CN114326734A
Application granted
Publication of CN114326734B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a path planning method and device. The method first acquires a plurality of first relative distances between an intelligent device to be planned and a plurality of first obstacles; a second relative distance is then screened out of the first relative distances according to a local perception condition, set as the local environment state, and input into a neural network, so that the neural network plans a path for the intelligent device to be planned according to that local environment state. The method and device improve the accuracy of obstacle avoidance in dense scenes.

Description

Path planning method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a path planning method and device.
Background
Path planning is a core capability for realizing the autonomous movement of intelligent devices. It aims to find an optimal path from a starting point to an end point within a preset area under optimization goals such as minimum time or shortest distance.
In dense scenarios, path planning becomes harder: a large number of obstacles not only creates a large observation space but also requires the agent to plan paths and avoid obstacles in real time. Along a global path in a dense scene, many obstacles must be avoided, which demands a large amount of environment exploration to learn obstacle avoidance behaviors; this slows convergence, or prevents it altogether, and in turn lowers the accuracy of path planning.
In summary, existing path planning methods suffer from low obstacle avoidance accuracy in dense scenes.
Disclosure of Invention
The embodiment of the invention provides a path planning method and device, which improve the accuracy of obstacle avoidance in a dense scene.
A first aspect of an embodiment of the present application provides a path planning method, including:
acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
and after the second relative distance is obtained by screening from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network carries out path planning on the intelligent device to be planned according to the local environment state.
In a possible implementation manner of the first aspect, the second relative distance is set as a local environment state and is input into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the local environment state into a decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent device to be planned is controlled to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, learning of the decision behavior is reinforced through the value network and a first reward value is given to the decision behavior;
inputting the first reward value into the value network so that the value network calculates a strategy evaluation value of the decision behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches the target position, and finishing the path planning of the intelligent equipment to be planned.
In a possible implementation manner of the first aspect, determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the first aspect, the method further includes:
acquiring a real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, the second reward value is awarded.
In a possible implementation manner of the first aspect, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, that first relative distance is taken as a second relative distance.
A second aspect of the embodiments of the present application provides a path planning apparatus, including: an acquisition module and a planning module;
the acquisition module is used for acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
the planning module is used for screening a second relative distance from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network plans a path of the intelligent device to be planned according to the local environment state.
In a possible implementation manner of the second aspect, the second relative distance is set as a local environment state and is input into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the local environment state into a decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent device to be planned is controlled to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, learning of the decision behavior is reinforced through the value network and a first reward value is given to the decision behavior;
inputting the first reward value into the value network so that the value network calculates a strategy evaluation value of the decision behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches the target position, and finishing the path planning of the intelligent equipment to be planned.
In a possible implementation manner of the second aspect, determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the second aspect, the method further includes:
acquiring a real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, the second reward value is awarded.
In a possible implementation manner of the second aspect, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, that first relative distance is taken as a second relative distance.
Compared with the prior art, the path planning method and device provided by the embodiment of the invention first acquire a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles; a second relative distance is then screened out of the first relative distances according to the local perception condition, set as the local environment state, and input into the neural network, so that the neural network plans a path for the intelligent device to be planned according to the local environment state.
The beneficial effects are as follows: after the second relative distance is screened out according to the local perception condition, it is set as the local environment state and input into the neural network. This keeps the key environment state while reducing environmental complexity, shortens the time the intelligent device to be planned needs to learn obstacle avoidance behaviors in high-density obstacle scenes, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
Secondly, the embodiment of the invention introduces global guidance by adding an angle constraint: the intelligent device to be planned is guided from the global environment, a penalty is given when the movement angle exceeds the preset constraint angle, and a suitable reward is given when the movement angle is smaller than it. The intelligent device to be planned thus gradually learns to move within a fixed angular range, which effectively prevents it from getting stuck in a local region and failing to advance.
Moreover, after obstacle avoidance behaviors are screened out of the decision behaviors, they are reinforced and given corresponding reward values, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
Finally, the reward value is calculated with approaching the target position as the optimization goal, so that every decision step other than obstacle avoidance heads directly toward the target position. The finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
Drawings
Fig. 1 is a schematic flow chart of a path planning method according to an embodiment of the present invention;
FIG. 2 is a schematic view of a moving angle provided by an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a path planning apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flow chart of a path planning method according to an embodiment of the present invention is shown, which includes: S101-S102:
s101: and acquiring a plurality of first relative distances between the intelligent equipment to be planned and the first obstacles respectively.
Preferably, the first obstacle is an obstacle in a dense scene.
S102: and after the second relative distance is obtained by screening from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network carries out path planning on the intelligent device to be planned according to the local environment state.
Specifically, in a scenario with high-density obstacles, taking every first relative distance as the environment state produces a large state space, yet obstacles at different distances do not influence the agent (i.e. the intelligent device to be planned) equally: an obstacle far from the device is unlikely to collide with it in the next step. Therefore, the second relative distances, i.e. the relative distances of the nearer obstacles, are screened out of the first relative distances according to the local perception condition. Setting the second relative distances as the local environment state and inputting them into the neural network keeps the key environment state while reducing environmental complexity; this shortens the time needed to learn obstacle avoidance behaviors in high-density obstacle scenes, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
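Purely as an illustration of this screening, the following sketch assumes each first relative distance is stored as an (x, y) offset and that the local perception condition compares its Euclidean norm with the first preset value (the representation and the helper names are assumptions, not taken from the patent):

import numpy as np

def local_environment_state(relative_offsets, first_preset_value):
    # relative_offsets: (N, 2) array of (dx, dy) first relative distances
    # between the device to be planned and each of the N first obstacles.
    distances = np.linalg.norm(relative_offsets, axis=1)
    mask = distances < first_preset_value   # the local perception condition
    return relative_offsets[mask]           # the second relative distances

Note that a network input normally needs a fixed size, so in practice the screened set would still be padded or truncated to a fixed length; the patent leaves this detail open.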
In this embodiment, setting the second relative distance as the local environment state and inputting it into the neural network so that the neural network performs path planning on the intelligent device to be planned according to the local environment state specifically includes:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
Specifically, the decision behavior includes a preset advance distance of the intelligent device to be planned, which may be represented by the following coordinates:
a = (STEP * a[0], STEP * a[1]);
wherein a represents the decision behavior, and STEP is a preset fixed step length used to scale the action space; the preset advance distance of the intelligent device to be planned comprises a movement distance STEP * a[0] in the X direction and a movement distance STEP * a[1] in the Y direction.
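A minimal sketch of this scaling; the STEP value and the assumption that the decision network outputs components in [-1, 1] are illustrative only:

STEP = 0.5  # preset fixed step length; value assumed for illustration

def scale_decision_behavior(a):
    # a: raw decision network output (a[0], a[1])
    return (STEP * a[0], STEP * a[1])  # movement in the X and Y directions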
In a specific embodiment, setting the second relative distance as the local environment state and inputting it into the decision network specifically includes:
normalizing the second relative distance, setting the normalized second relative distance as the local environment state, and inputting it into the decision network. The normalization is as follows:
Δx_i = (X_i − X_0) / W;
Δy_i = (Y_i − Y_0) / H;
wherein the second relative distance comprises the component Δx_i in the x-direction and the component Δy_i in the y-direction, [X_0, Y_0] is the coordinate position of the intelligent device to be planned, [X_i, Y_i] is the coordinate position of the first obstacle, W is the width of the environment, and H is the height of the environment.
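Under the reconstruction above, the normalization can be sketched directly (variable names are illustrative):

def normalize_second_relative_distance(x0, y0, xi, yi, W, H):
    # [x0, y0]: position of the device to be planned; [xi, yi]: obstacle position
    dx = (xi - x0) / W  # second relative distance in the x-direction
    dy = (yi - y0) / H  # second relative distance in the y-direction
    return dx, dy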
In an embodiment, the determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
The method specifically comprises the following steps: before each movement behavior is executed, the previous movement behavior (i.e. the historical movement behavior) is recorded as a_{t-1}. Under the current environment state, a_{t-1} is compared with the current decision behavior a_t. If executing the previous movement behavior in the current state would lead to a collision but executing the current movement behavior (i.e. the decision behavior) does not, the current movement behavior is judged to be an obstacle avoidance behavior, completing the screening of obstacle avoidance behaviors.
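A sketch of this screening test, assuming a hypothetical helper would_collide(state, action) that simulates one step and reports whether the action hits a first obstacle:

def is_obstacle_avoidance(state, a_prev, a_curr, would_collide):
    # Avoidance: repeating the historical behavior a_{t-1} would collide
    # in the current state, but the current decision behavior a_t does not.
    return would_collide(state, a_prev) and not would_collide(state, a_curr)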
After the second relative distance is screened out of the first relative distances according to the local perception condition and set as the local environment state input to the neural network, the environment is only observed locally: the intelligent device to be planned interacts with its surroundings within a small range. Path planning, however, requires a global path, which is hard to discover through purely local interaction. A global guidance mode is therefore introduced by adding an angle constraint, so that the behavior of the intelligent device to be planned is guided from the global environment. The angle constraint mode is specifically: acquire the real-time moving direction of the intelligent device to be planned and calculate a movement angle from the real-time moving direction and the target position; calculate a second reward value from the movement angle and a preset constraint angle; and when the movement angle is smaller than the preset constraint angle, award the second reward value.
Furthermore, the intelligent device to be planned gradually searches for feasible paths by exploring the environment, while the angle constraint limits the agent's moving direction: a certain penalty is given when the movement angle exceeds the preset constraint angle, and a suitable reward is given when the movement angle is smaller than it, so that the intelligent device to be planned gradually learns to move within a fixed angular range.
Specifically, the calculation process of the second prize value may be represented by the following formula:
R=(15-θ)*γ;
wherein R is the second reward value, 15 is the preset constraint angle (in degrees), θ is the movement angle, i.e. the included angle between the real-time moving direction and the target direction, and γ is a scale factor. The smaller the included angle, the closer the advancing direction is to the target direction and the larger the reward value; the larger the included angle, the smaller the reward value.
Further, when the included angle exceeds the preset constraint angle, (15 − θ) becomes negative, so a punishment is given; the larger the included angle, the heavier the punishment.
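A sketch of the second reward under this formula. Note that (15 − θ) * γ is automatically negative once θ exceeds the preset constraint angle, so the same expression also yields the punishment described above (the value of γ is assumed for illustration):

CONSTRAINT_ANGLE = 15.0  # preset constraint angle, degrees
GAMMA = 0.1              # scale factor; value assumed for illustration

def second_reward(theta):
    # Positive reward when theta < 15 degrees, growing as the moving
    # direction aligns with the target; negative (a punishment) beyond it.
    return (CONSTRAINT_ANGLE - theta) * GAMMA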
In this embodiment, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, that first relative distance is taken as a second relative distance.
In this embodiment, the path planning problem is modeled as a reinforcement learning problem, and global path planning is realized through sequential decision-making. Concretely: the intelligent device to be planned acquires the environment state and produces a decision behavior through the decision network (the decision behavior comprising a preset advance distance and a preset advance direction); the device is controlled to move according to the decision behavior; and whenever the environment state changes, the changed state is fed into the decision network again, repeating the decision process until the intelligent device to be planned reaches the target position.
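This sequential decision loop could be sketched as follows; the environment and network interfaces are assumptions rather than the patent's API:

def plan_path(env, decision_network, max_steps=1000):
    state = env.local_environment_state()        # screened + normalized distances
    for _ in range(max_steps):
        action = decision_network.decide(state)  # decision behavior
        env.move(action)                         # control the device to move
        if env.reached_target():
            break                                # path planning is complete
        state = env.local_environment_state()    # re-observe the changed state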
In model-free reinforcement learning, the transition probabilities between states are unknown, and the learning process consists mainly of policy evaluation and policy improvement. 1. Policy evaluation: the current policy is evaluated by computing value functions, including the state value function and the action value function, using random-sample estimates as the evaluation standard. A neural network fits the value function and directly outputs a concrete value, and the gap between that value and the actual value is reduced by updating the network parameters. 2. Policy improvement: once the policy evaluation value is obtained, the policy is updated according to it and gradually improved to obtain higher value; this improvement maps concretely to updates of the network parameters.
Network updating: the neural network used in the embodiment of the invention consists mainly of two parts, a decision network and a value network. The decision network outputs decision behaviors, and the value network evaluates them. Both networks are updated by gradient descent.
The parameter gradient of the decision network, ∇_{θ^μ} J, is as follows:
∇_{θ^μ} J ≈ −(1/n) · Σ_{i=1..n} ∇_a Q(s_i, a | θ^Q)|_{a=μ(s_i)} · ∇_{θ^μ} μ(s_i | θ^μ);
wherein ∇_a Q(s_i, a | θ^Q) is the value gradient; this term comes from the value network, whose update goal is to maximize value. Further, s_i is the environment state at the i-th time, a_i is the action at the i-th time, μ(· | θ^μ) is the decision network with parameters θ^μ, θ^Q is the parameter of the value network, and n is the number of samples drawn from the experience pool each time. Because gradient descent is used, the negative gradient is taken as the update so that value is maximized.
The parameter gradient of the value network is as follows:
∇_{θ^Q} L = (1/n) · Σ_{i=1..n} ∇_{θ^Q} (y_i − Q(s_i, a_i | θ^Q))²;
wherein y_i = r_i + γ Q′(s_{i+1}, a_{i+1} | θ^{Q′}) is the value target at the current moment, and r_i is the reward value fed back by the environment at the current moment. The network is updated so as to minimize the gap to this target value. Further, s_{i+1} is the environment state at time i+1, a_{i+1} is the action at time i+1, θ^{Q′} is the parameter of the target value network, and n is the number of samples drawn from the experience pool each time.
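These two gradients match the standard DDPG actor-critic update (the non-patent citations below also describe the method as improved DDPG). A minimal PyTorch sketch under that reading, with the network modules, optimizers and the experience-pool batch assumed to exist:

import torch
import torch.nn.functional as F

def update_networks(decision_net, value_net, target_decision_net, target_value_net,
                    decision_opt, value_opt, batch, gamma=0.99):
    s, a, r, s_next = batch  # n samples drawn from the experience pool

    # Value network: minimize the gap to the target
    # y_i = r_i + gamma * Q'(s_{i+1}, a_{i+1} | theta^Q')
    with torch.no_grad():
        y = r + gamma * target_value_net(s_next, target_decision_net(s_next))
    value_loss = F.mse_loss(value_net(s, a), y)
    value_opt.zero_grad()
    value_loss.backward()
    value_opt.step()

    # Decision network: gradient descent on the negative value realizes
    # value maximization, as in the description above.
    decision_loss = -value_net(s, decision_net(s)).mean()
    decision_opt.zero_grad()
    decision_loss.backward()
    decision_opt.step()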
To further explain the calculation process of the movement angle, please refer to fig. 2, and fig. 2 is a schematic diagram of the movement angle according to an embodiment of the present invention.
wherein a[x, y] represents the decision behavior, (x0, y0) represents the departure position of the intelligent device to be planned, and (xi, yi) represents the target position of the intelligent device to be planned.
The calculation of the movement angle θ is represented by the following equation:
θ = arctan(y / x) − arctan((yi − y0) / (xi − x0));
wherein x and y are the movement distances of the intelligent device to be planned in the X and Y directions (the components of the decision behavior a[x, y]), so that arctan(y / x) gives the real-time moving direction and arctan((yi − y0) / (xi − x0)) gives the direction from the departure position to the target position.
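A sketch of this angle computation; atan2 is used instead of a bare arctangent to keep quadrant information, and the result is folded into [0°, 180°], both of which are robustness choices assumed beyond the formula above:

import math

def movement_angle(x, y, x0, y0, xi, yi):
    # (x, y): decision displacement; (x0, y0) -> (xi, yi): departure -> target
    move_dir = math.atan2(y, x)
    target_dir = math.atan2(yi - y0, xi - x0)
    theta = abs(math.degrees(move_dir - target_dir)) % 360.0
    return min(theta, 360.0 - theta)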
To further explain the path planning device, please refer to fig. 3, where fig. 3 is a schematic structural diagram of a path planning device according to an embodiment of the present invention, including: an acquisition module 301 and a planning module 302;
the obtaining module 301 is configured to obtain a plurality of first relative distances between the intelligent device to be planned and the plurality of first obstacles, respectively.
The planning module 302 is configured to, after a second relative distance is obtained by screening from the plurality of first relative distances according to a local perception condition, set the second relative distance as a local environment state and input the local environment state into a neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
In this embodiment, setting the second relative distance as the local environment state and inputting it into the neural network so that the neural network performs path planning on the intelligent device to be planned according to the local environment state specifically includes:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
In this embodiment, the determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior; wherein the historical movement behavior is the movement behavior executed immediately before the decision behavior.
In this embodiment, the method further includes:
acquiring the real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, the second reward value is awarded.
In this embodiment, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, that first relative distance is taken as a second relative distance.
According to the embodiment of the invention, the acquisition module acquires a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles; then, after the planning module screens a second relative distance out of the first relative distances according to the local perception condition, the second relative distance is set as the local environment state and input into the neural network, so that the neural network plans a path for the intelligent device to be planned according to the local environment state.
According to the method and the device, after the second relative distance is screened out according to the local perception condition, it is set as the local environment state and input into the neural network. This keeps the key environment state while reducing environmental complexity, shortens the time the intelligent device to be planned needs to learn obstacle avoidance behaviors in high-density obstacle scenes, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
Secondly, the embodiment of the invention introduces global guidance by adding an angle constraint: the intelligent device to be planned is guided from the global environment, a penalty is given when the movement angle exceeds the preset constraint angle, and a suitable reward is given when the movement angle is smaller than it. The intelligent device to be planned thus gradually learns to move within a fixed angular range, which effectively prevents it from getting stuck in a local region and failing to advance.
Moreover, after obstacle avoidance behaviors are screened out of the decision behaviors, they are reinforced and given corresponding reward values, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
Finally, the reward value is calculated with approaching the target position as the optimization goal, so that every decision step other than obstacle avoidance heads directly toward the target position. The finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
The foregoing is a preferred embodiment of the present invention, and it should be noted that it would be apparent to those skilled in the art that various modifications and enhancements can be made without departing from the principles of the invention, and such modifications and enhancements are also considered to be within the scope of the invention.

Claims (10)

1. A method of path planning, comprising:
acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
and after a second relative distance is obtained by screening from the plurality of first relative distances according to a local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into a neural network, so that the neural network plans a path of the intelligent equipment to be planned according to the local environment state.
2. The path planning method according to claim 1, wherein the second relative distance is set as a local environment state and is input to a neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
3. The path planning method according to claim 2, wherein the determining the decision behavior is an obstacle avoidance behavior, and specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, judging that the decision behavior is an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
4. A path planning method according to claim 3, further comprising:
acquiring the real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, awarding the second reward value.
5. The path planning method according to claim 4, wherein the second relative distance is obtained according to the local perception condition and a plurality of the first relative distances, and specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
6. A path planning apparatus, comprising: an acquisition module and a planning module;
the acquisition module is used for acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
the planning module is used for screening a second relative distance from the plurality of first relative distances according to local perception conditions, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network plans the path of the intelligent device to be planned according to the local environment state.
7. The path planning device according to claim 6, wherein the second relative distance is set as a local environment state and is input to a neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
8. The path planning device according to claim 7, wherein the determining the decision behavior is an obstacle avoidance behavior, specifically:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, judging that the decision behavior is an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
9. The path planning device according to claim 8, further comprising:
acquiring the real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, awarding the second reward value.
10. The path planning device according to claim 9, wherein the second relative distance is obtained according to the local perception condition and a plurality of the first relative distances, specifically:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
CN202111635189.8A 2021-12-29 2021-12-29 Path planning method and device Active CN114326734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111635189.8A CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111635189.8A CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device

Publications (2)

Publication Number Publication Date
CN114326734A (en) 2022-04-12
CN114326734B (en) 2024-03-08

Family

ID=81016080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111635189.8A Active CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device

Country Status (1)

Country Link
CN (1) CN114326734B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168324A (en) * 2017-06-08 2017-09-15 中国矿业大学 A kind of robot path planning method based on ANFIS fuzzy neural networks
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory
CN110083165A (en) * 2019-05-21 2019-08-02 大连大学 A kind of robot paths planning method under complicated narrow environment
CN111061277A (en) * 2019-12-31 2020-04-24 歌尔股份有限公司 Unmanned vehicle global path planning method and device
CN111399541A (en) * 2020-03-30 2020-07-10 西北工业大学 Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network
CN112977146A (en) * 2021-02-24 2021-06-18 中原动力智能机器人有限公司 Charging method and system for automatic driving vehicle and charging pile
CN113341958A (en) * 2021-05-21 2021-09-03 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liang Hongwei et al., "Genetic multi-point path planning for mobile robots", Engineering Technology, pages 584-587 *
Xue Junxiao et al., "Dynamic obstacle avoidance method for carrier-based aircraft based on deep reinforcement learning", Journal of Computer-Aided Design & Computer Graphics, pages 1102-1112 *
Xue Junxiao et al., "Robot path planning method based on improved DDPG in dense obstacle scenes", Proceedings of the 2022 China Automation Congress, pages 1-6 *

Also Published As

Publication number Publication date
CN114326734B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
WO2021208771A1 (en) Reinforced learning method and device
CN112937564B (en) Lane change decision model generation method and unmanned vehicle lane change decision method and device
US11189171B2 (en) Traffic prediction with reparameterized pushforward policy for autonomous vehicles
US20170139423A1 (en) Control system and method for multi-vehicle systems
US11604469B2 (en) Route determining device, robot, and route determining method
CN112180950B (en) Intelligent ship autonomous collision avoidance and path planning method based on reinforcement learning
JP2022506404A (en) Methods and devices for determining vehicle speed
Wang et al. Autonomous ramp merge maneuver based on reinforcement learning with continuous action space
CN112669345B (en) Cloud deployment-oriented multi-target track tracking method and system
CN114489059A (en) Mobile robot path planning method based on D3QN-PER
Rafieisakhaei et al. Feedback motion planning under non-gaussian uncertainty and non-convex state constraints
Mohamed et al. Autonomous navigation of agvs in unknown cluttered environments: log-mppi control strategy
CN112255628A (en) Obstacle trajectory prediction method, apparatus, device, and medium
CN114442630A (en) Intelligent vehicle planning control method based on reinforcement learning and model prediction
CN114326734A (en) Path planning method and device
CN117141520A (en) Real-time track planning method, device and equipment
Wiering Reinforcement learning in dynamic environments using instantiated information
Wang et al. Tracking moving target for 6 degree-of-freedom robot manipulator with adaptive visual servoing based on deep reinforcement learning PID controller
Gopalakrishnan et al. Chance constraint based multi agent navigation under uncertainty
Ward et al. Towards risk minimizing trajectory planning in on-road scenarios
CN114169463A (en) Autonomous prediction lane information model training method and device
Hakobyan et al. Toward improving the distributional robustness of risk-aware controllers in learning-enabled environments
CN113158539A (en) Method for long-term trajectory prediction of traffic participants
CN112904837A (en) Data processing method, device and computer readable storage medium
Raj et al. Dynamic Obstacle Avoidance Technique for Mobile Robot Navigation Using Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant