CN114326734A - Path planning method and device - Google Patents

Path planning method and device

Info

Publication number
CN114326734A
CN114326734A (application CN202111635189.8A)
Authority
CN
China
Prior art keywords
decision
behavior
value
relative distance
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111635189.8A
Other languages
Chinese (zh)
Other versions
CN114326734B (en)
Inventor
薛均晓 (Xue Junxiao)
董博威 (Dong Bowei)
万里红 (Wan Lihong)
冷洁 (Leng Jie)
张世文 (Zhang Shiwen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongyuan Power Intelligent Robot Co ltd
Original Assignee
Zhongyuan Power Intelligent Robot Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongyuan Power Intelligent Robot Co ltd filed Critical Zhongyuan Power Intelligent Robot Co ltd
Priority to CN202111635189.8A
Publication of CN114326734A
Application granted
Publication of CN114326734B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a path planning method and device. The method first acquires a plurality of first relative distances between an intelligent device to be planned and a plurality of first obstacles; a second relative distance is then screened out of the first relative distances according to a local perception condition, set as the local environment state, and input into a neural network, so that the neural network plans a path for the intelligent device to be planned according to that local environment state. The method and device improve the accuracy of obstacle avoidance in dense scenes.

Description

Path planning method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a path planning method and device.
Background
Path planning is a core capability for realizing the autonomous movement of intelligent devices. It aims to find an optimal path from a starting point to an end point within a preset area under optimization goals such as minimum time or shortest distance.
In dense scenarios, path planning becomes harder: a large number of obstacles not only creates a large observation space but also requires the agent to plan paths and avoid obstacles in real time. Along a global path in a dense scene, many obstacles must be avoided, which demands a large amount of environment exploration to learn obstacle avoidance behaviors; this slows convergence, or prevents it altogether, and in turn lowers the accuracy of path planning.
In summary, existing path planning methods suffer from low obstacle avoidance accuracy in dense scenes.
Disclosure of Invention
The embodiment of the invention provides a path planning method and device, which improve the accuracy of obstacle avoidance in a dense scene.
A first aspect of an embodiment of the present application provides a path planning method, including:
acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
and after the second relative distance is obtained by screening from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network carries out path planning on the intelligent device to be planned according to the local environment state.
In a possible implementation manner of the first aspect, the second relative distance is set as a local environment state and is input into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the local environment state into a decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent device to be planned is controlled to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, learning of the decision behavior is reinforced through the value network and a first reward value is given to the decision behavior;
inputting the first reward value into the value network so that the value network calculates a strategy evaluation value of the decision behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches the target position, and finishing the path planning of the intelligent equipment to be planned.
In a possible implementation manner of the first aspect, determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the first aspect, the method further includes:
acquiring a real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, the second reward value is awarded.
In a possible implementation manner of the first aspect, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, that first relative distance is taken as a second relative distance.
A second aspect of the embodiments of the present application provides a path planning apparatus, including: an acquisition module and a planning module;
the acquisition module is used for acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
the planning module is used for screening a second relative distance from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network plans a path of the intelligent device to be planned according to the local environment state.
In a possible implementation manner of the second aspect, the second relative distance is set as a local environment state and is input into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the local environment state into a decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent device to be planned is controlled to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, learning of the decision behavior is reinforced through the value network and a first reward value is given to the decision behavior;
inputting the first reward value into the value network so that the value network calculates a strategy evaluation value of the decision behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches the target position, and finishing the path planning of the intelligent equipment to be planned.
In a possible implementation manner of the second aspect, determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the second aspect, the method further includes:
acquiring a real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, the second reward value is awarded.
In a possible implementation manner of the second aspect, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, that first relative distance is taken as a second relative distance.
Compared with the prior art, the path planning method and device provided by the embodiment of the invention first acquire a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles; a second relative distance is then screened out of the first relative distances according to the local perception condition, set as the local environment state, and input into the neural network, so that the neural network plans a path for the intelligent device to be planned according to the local environment state.
The beneficial effects are as follows: after the second relative distance is screened out according to the local perception condition, it is set as the local environment state and input into the neural network. This keeps the key environment state while reducing environmental complexity, shortens the time the intelligent device to be planned needs to learn obstacle avoidance behaviors in high-density obstacle scenes, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
Secondly, the embodiment of the invention introduces global guidance by adding an angle constraint: the intelligent device to be planned is guided from the global environment, a penalty is given when the movement angle exceeds the preset constraint angle, and a suitable reward is given when the movement angle is smaller than it. The intelligent device to be planned thus gradually learns to move within a fixed angular range, which effectively prevents it from getting stuck in a local region and failing to advance.
Moreover, after obstacle avoidance behaviors are screened out of the decision behaviors, they are reinforced and given corresponding reward values, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
Finally, the reward value is calculated with approaching the target position as the optimization goal, so that every decision step other than obstacle avoidance heads directly toward the target position. The finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
Drawings
Fig. 1 is a schematic flow chart of a path planning method according to an embodiment of the present invention;
FIG. 2 is a schematic view of a moving angle provided by an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a path planning apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flow chart of a path planning method according to an embodiment of the present invention is shown, which includes: S101-S102:
s101: and acquiring a plurality of first relative distances between the intelligent equipment to be planned and the first obstacles respectively.
Preferably, the first obstacle is an obstacle in a dense scene.
S102: and after the second relative distance is obtained by screening from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network carries out path planning on the intelligent device to be planned according to the local environment state.
Specifically, in a scenario with high-density obstacles, taking every first relative distance as the environment state produces a large state space, yet obstacles at different distances do not influence the agent (i.e. the intelligent device to be planned) equally: an obstacle far from the device is unlikely to collide with it in the next step. Therefore, the second relative distances, i.e. the relative distances of the nearer obstacles, are screened out of the first relative distances according to the local perception condition. Setting the second relative distances as the local environment state and inputting them into the neural network keeps the key environment state while reducing environmental complexity; this shortens the time needed to learn obstacle avoidance behaviors in high-density obstacle scenes, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
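Purely as an illustration of this screening, the following sketch assumes each first relative distance is stored as an (x, y) offset and that the local perception condition compares its Euclidean norm with the first preset value (the representation and the helper names are assumptions, not taken from the patent):

import numpy as np

def local_environment_state(relative_offsets, first_preset_value):
    # relative_offsets: (N, 2) array of (dx, dy) first relative distances
    # between the device to be planned and each of the N first obstacles.
    distances = np.linalg.norm(relative_offsets, axis=1)
    mask = distances < first_preset_value   # the local perception condition
    return relative_offsets[mask]           # the second relative distances

Note that a network input normally needs a fixed size, so in practice the screened set would still be padded or truncated to a fixed length; the patent leaves this detail open.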
In this embodiment, setting the second relative distance as the local environment state and inputting it into the neural network so that the neural network performs path planning on the intelligent device to be planned according to the local environment state specifically includes:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
Specifically, the decision behavior includes a preset advance distance of the intelligent device to be planned, which may be represented by the following coordinates:
a = (STEP * a[0], STEP * a[1]);
wherein a represents the decision behavior, and STEP is a preset fixed step length used to scale the action space; the preset advance distance of the intelligent device to be planned comprises a movement distance STEP * a[0] in the X direction and a movement distance STEP * a[1] in the Y direction.
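A minimal sketch of this scaling; the STEP value and the assumption that the decision network outputs components in [-1, 1] are illustrative only:

STEP = 0.5  # preset fixed step length; value assumed for illustration

def scale_decision_behavior(a):
    # a: raw decision network output (a[0], a[1])
    return (STEP * a[0], STEP * a[1])  # movement in the X and Y directions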
In a specific embodiment, setting the second relative distance as the local environment state and inputting it into the decision network specifically includes:
normalizing the second relative distance, setting the normalized second relative distance as the local environment state, and inputting it into the decision network. The normalization is as follows:
Δx_i = (X_i − X_0) / W;
Δy_i = (Y_i − Y_0) / H;
wherein the second relative distance comprises the component Δx_i in the x-direction and the component Δy_i in the y-direction, [X_0, Y_0] is the coordinate position of the intelligent device to be planned, [X_i, Y_i] is the coordinate position of the first obstacle, W is the width of the environment, and H is the height of the environment.
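Under the reconstruction above, the normalization can be sketched directly (variable names are illustrative):

def normalize_second_relative_distance(x0, y0, xi, yi, W, H):
    # [x0, y0]: position of the device to be planned; [xi, yi]: obstacle position
    dx = (xi - x0) / W  # second relative distance in the x-direction
    dy = (yi - y0) / H  # second relative distance in the y-direction
    return dx, dy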
In an embodiment, the determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
The method specifically comprises the following steps: before each movement behavior is executed, the previous movement behavior (i.e. the historical movement behavior) is recorded as a_{t-1}. Under the current environment state, a_{t-1} is compared with the current decision behavior a_t. If executing the previous movement behavior in the current state would lead to a collision but executing the current movement behavior (i.e. the decision behavior) does not, the current movement behavior is judged to be an obstacle avoidance behavior, completing the screening of obstacle avoidance behaviors.
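A sketch of this screening test, assuming a hypothetical helper would_collide(state, action) that simulates one step and reports whether the action hits a first obstacle:

def is_obstacle_avoidance(state, a_prev, a_curr, would_collide):
    # Avoidance: repeating the historical behavior a_{t-1} would collide
    # in the current state, but the current decision behavior a_t does not.
    return would_collide(state, a_prev) and not would_collide(state, a_curr)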
After the second relative distance is screened out of the first relative distances according to the local perception condition and set as the local environment state input to the neural network, the environment is only observed locally: the intelligent device to be planned interacts with its surroundings within a small range. Path planning, however, requires a global path, which is hard to discover through purely local interaction. A global guidance mode is therefore introduced by adding an angle constraint, so that the behavior of the intelligent device to be planned is guided from the global environment. The angle constraint mode is specifically: acquire the real-time moving direction of the intelligent device to be planned and calculate a movement angle from the real-time moving direction and the target position; calculate a second reward value from the movement angle and a preset constraint angle; and when the movement angle is smaller than the preset constraint angle, award the second reward value.
Furthermore, the intelligent device to be planned gradually searches for feasible paths by exploring the environment, while the angle constraint limits the agent's moving direction: a certain penalty is given when the movement angle exceeds the preset constraint angle, and a suitable reward is given when the movement angle is smaller than it, so that the intelligent device to be planned gradually learns to move within a fixed angular range.
Specifically, the calculation process of the second prize value may be represented by the following formula:
R=(15-θ)*γ;
wherein R is the second reward value, 15 is the preset constraint angle (in degrees), θ is the movement angle, i.e. the included angle between the real-time moving direction and the target direction, and γ is a scale factor. The smaller the included angle, the closer the advancing direction is to the target direction and the larger the reward value; the larger the included angle, the smaller the reward value.
Further, when the included angle exceeds the preset constraint angle, (15 − θ) becomes negative, so a punishment is given; the larger the included angle, the heavier the punishment.
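A sketch of the second reward under this formula. Note that (15 − θ) * γ is automatically negative once θ exceeds the preset constraint angle, so the same expression also yields the punishment described above (the value of γ is assumed for illustration):

CONSTRAINT_ANGLE = 15.0  # preset constraint angle, degrees
GAMMA = 0.1              # scale factor; value assumed for illustration

def second_reward(theta):
    # Positive reward when theta < 15 degrees, growing as the moving
    # direction aligns with the target; negative (a punishment) beyond it.
    return (CONSTRAINT_ANGLE - theta) * GAMMA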
In this embodiment, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, that first relative distance is taken as a second relative distance.
In this embodiment, the path planning problem is modeled as a reinforcement learning problem, and global path planning is realized through sequential decision-making. Concretely: the intelligent device to be planned acquires the environment state and produces a decision behavior through the decision network (the decision behavior comprising a preset advance distance and a preset advance direction); the device is controlled to move according to the decision behavior; and whenever the environment state changes, the changed state is fed into the decision network again, repeating the decision process until the intelligent device to be planned reaches the target position.
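This sequential decision loop could be sketched as follows; the environment and network interfaces are assumptions rather than the patent's API:

def plan_path(env, decision_network, max_steps=1000):
    state = env.local_environment_state()        # screened + normalized distances
    for _ in range(max_steps):
        action = decision_network.decide(state)  # decision behavior
        env.move(action)                         # control the device to move
        if env.reached_target():
            break                                # path planning is complete
        state = env.local_environment_state()    # re-observe the changed state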
In model-free reinforcement learning, the transition probabilities between states are unknown, and the learning process consists mainly of policy evaluation and policy improvement. 1. Policy evaluation: the current policy is evaluated by computing value functions, including the state value function and the action value function, using random-sample estimates as the evaluation standard. A neural network fits the value function and directly outputs a concrete value, and the gap between that value and the actual value is reduced by updating the network parameters. 2. Policy improvement: once the policy evaluation value is obtained, the policy is updated according to it and gradually improved to obtain higher value; this improvement maps concretely to updates of the network parameters.
Network updating: the neural network used in the embodiment of the invention consists mainly of two parts, a decision network and a value network. The decision network outputs decision behaviors, and the value network evaluates them. Both networks are updated by gradient descent.
The parameter gradient of the decision network, ∇_{θ^μ} J, is as follows:
∇_{θ^μ} J ≈ −(1/n) · Σ_{i=1..n} ∇_a Q(s_i, a | θ^Q)|_{a=μ(s_i)} · ∇_{θ^μ} μ(s_i | θ^μ);
wherein ∇_a Q(s_i, a | θ^Q) is the value gradient; this term comes from the value network, whose update goal is to maximize value. Further, s_i is the environment state at the i-th time, a_i is the action at the i-th time, μ(· | θ^μ) is the decision network with parameters θ^μ, θ^Q is the parameter of the value network, and n is the number of samples drawn from the experience pool each time. Because gradient descent is used, the negative gradient is taken as the update so that value is maximized.
The parameter gradient of the value network is as follows:
∇_{θ^Q} L = (1/n) · Σ_{i=1..n} ∇_{θ^Q} (y_i − Q(s_i, a_i | θ^Q))²;
wherein y_i = r_i + γ Q′(s_{i+1}, a_{i+1} | θ^{Q′}) is the value target at the current moment, and r_i is the reward value fed back by the environment at the current moment. The network is updated so as to minimize the gap to this target value. Further, s_{i+1} is the environment state at time i+1, a_{i+1} is the action at time i+1, θ^{Q′} is the parameter of the target value network, and n is the number of samples drawn from the experience pool each time.
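These two gradients match the standard DDPG actor-critic update (the non-patent citations below also describe the method as improved DDPG). A minimal PyTorch sketch under that reading, with the network modules, optimizers and the experience-pool batch assumed to exist:

import torch
import torch.nn.functional as F

def update_networks(decision_net, value_net, target_decision_net, target_value_net,
                    decision_opt, value_opt, batch, gamma=0.99):
    s, a, r, s_next = batch  # n samples drawn from the experience pool

    # Value network: minimize the gap to the target
    # y_i = r_i + gamma * Q'(s_{i+1}, a_{i+1} | theta^Q')
    with torch.no_grad():
        y = r + gamma * target_value_net(s_next, target_decision_net(s_next))
    value_loss = F.mse_loss(value_net(s, a), y)
    value_opt.zero_grad()
    value_loss.backward()
    value_opt.step()

    # Decision network: gradient descent on the negative value realizes
    # value maximization, as in the description above.
    decision_loss = -value_net(s, decision_net(s)).mean()
    decision_opt.zero_grad()
    decision_loss.backward()
    decision_opt.step()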
To further explain the calculation process of the movement angle, please refer to fig. 2, and fig. 2 is a schematic diagram of the movement angle according to an embodiment of the present invention.
wherein a[x, y] represents the decision behavior, (x0, y0) represents the departure position of the intelligent device to be planned, and (xi, yi) represents the target position of the intelligent device to be planned.
The calculation of the movement angle θ is represented by the following equation:
θ = arctan(y / x) − arctan((yi − y0) / (xi − x0));
wherein x and y are the movement distances of the intelligent device to be planned in the X and Y directions (the components of the decision behavior a[x, y]), so that arctan(y / x) gives the real-time moving direction and arctan((yi − y0) / (xi − x0)) gives the direction from the departure position to the target position.
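A sketch of this angle computation; atan2 is used instead of a bare arctangent to keep quadrant information, and the result is folded into [0°, 180°], both of which are robustness choices assumed beyond the formula above:

import math

def movement_angle(x, y, x0, y0, xi, yi):
    # (x, y): decision displacement; (x0, y0) -> (xi, yi): departure -> target
    move_dir = math.atan2(y, x)
    target_dir = math.atan2(yi - y0, xi - x0)
    theta = abs(math.degrees(move_dir - target_dir)) % 360.0
    return min(theta, 360.0 - theta)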
To further explain the path planning device, please refer to fig. 3, where fig. 3 is a schematic structural diagram of a path planning device according to an embodiment of the present invention, including: an acquisition module 301 and a planning module 302;
the obtaining module 301 is configured to obtain a plurality of first relative distances between the intelligent device to be planned and the plurality of first obstacles, respectively.
The planning module 302 is configured to, after a second relative distance is obtained by screening from the plurality of first relative distances according to a local perception condition, set the second relative distance as a local environment state and input the local environment state into a neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
In this embodiment, setting the second relative distance as the local environment state and inputting it into the neural network so that the neural network performs path planning on the intelligent device to be planned according to the local environment state specifically includes:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
In this embodiment, the determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior; wherein the historical movement behavior is the movement behavior executed immediately before the decision behavior.
In this embodiment, the method further includes:
acquiring the real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, the second reward value is awarded.
In this embodiment, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, that first relative distance is taken as a second relative distance.
According to the embodiment of the invention, the acquisition module acquires a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles; then, after the planning module screens a second relative distance out of the first relative distances according to the local perception condition, the second relative distance is set as the local environment state and input into the neural network, so that the neural network plans a path for the intelligent device to be planned according to the local environment state.
According to the method and the device, after the second relative distance is screened out according to the local perception condition, it is set as the local environment state and input into the neural network. This keeps the key environment state while reducing environmental complexity, shortens the time the intelligent device to be planned needs to learn obstacle avoidance behaviors in high-density obstacle scenes, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
Secondly, the embodiment of the invention introduces global guidance by adding an angle constraint: the intelligent device to be planned is guided from the global environment, a penalty is given when the movement angle exceeds the preset constraint angle, and a suitable reward is given when the movement angle is smaller than it. The intelligent device to be planned thus gradually learns to move within a fixed angular range, which effectively prevents it from getting stuck in a local region and failing to advance.
Moreover, after obstacle avoidance behaviors are screened out of the decision behaviors, they are reinforced and given corresponding reward values, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
Finally, the reward value is calculated with approaching the target position as the optimization goal, so that every decision step other than obstacle avoidance heads directly toward the target position. The finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
The foregoing is a preferred embodiment of the present invention, and it should be noted that it would be apparent to those skilled in the art that various modifications and enhancements can be made without departing from the principles of the invention, and such modifications and enhancements are also considered to be within the scope of the invention.

Claims (10)

1. A method of path planning, comprising:
acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
and after a second relative distance is obtained by screening from the plurality of first relative distances according to a local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into a neural network, so that the neural network plans a path of the intelligent equipment to be planned according to the local environment state.
2. The path planning method according to claim 1, wherein the second relative distance is set as a local environment state and is input to a neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
3. The path planning method according to claim 2, wherein the determining the decision behavior is an obstacle avoidance behavior, and specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, judging that the decision behavior is an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
4. A path planning method according to claim 3, further comprising:
acquiring the real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, awarding the second reward value.
5. The path planning method according to claim 4, wherein the second relative distance is obtained according to the local perception condition and a plurality of the first relative distances, and specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
6. A path planning apparatus, comprising: an acquisition module and a planning module;
the acquisition module is used for acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
the planning module is used for screening a second relative distance from the plurality of first relative distances according to local perception conditions, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network plans the path of the intelligent device to be planned according to the local environment state.
7. The path planning device according to claim 6, wherein the second relative distance is set as a local environment state and is input to a neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
8. The path planning device according to claim 7, wherein the determining the decision behavior is an obstacle avoidance behavior, specifically:
if continuing to execute the historical movement behavior would collide with the first obstacle while executing the decision behavior does not, judging that the decision behavior is an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
9. The path planning device according to claim 8, further comprising:
acquiring the real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, awarding the second reward value.
10. The path planning device according to claim 9, wherein the second relative distance is obtained according to the local perception condition and a plurality of the first relative distances, specifically:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
CN202111635189.8A 2021-12-29 2021-12-29 Path planning method and device Active CN114326734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111635189.8A CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111635189.8A CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device

Publications (2)

Publication Number Publication Date
CN114326734A (en) 2022-04-12
CN114326734B (en) 2024-03-08

Family

ID=81016080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111635189.8A Active CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device

Country Status (1)

Country Link
CN (1) CN114326734B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168324A (en) * 2017-06-08 2017-09-15 中国矿业大学 A kind of robot path planning method based on ANFIS fuzzy neural networks
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory
CN110083165A (en) * 2019-05-21 2019-08-02 大连大学 A kind of robot paths planning method under complicated narrow environment
CN111061277A (en) * 2019-12-31 2020-04-24 歌尔股份有限公司 Unmanned vehicle global path planning method and device
CN111399541A (en) * 2020-03-30 2020-07-10 西北工业大学 Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network
CN112977146A (en) * 2021-02-24 2021-06-18 中原动力智能机器人有限公司 Charging method and system for automatic driving vehicle and charging pile
CN113341958A (en) * 2021-05-21 2021-09-03 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liang Hongwei et al., "Genetic multi-point path planning for mobile robots", Engineering Technology, pages 584-587 *
Xue Junxiao et al., "Dynamic obstacle avoidance method for carrier-based aircraft based on deep reinforcement learning", Journal of Computer-Aided Design & Computer Graphics, pages 1102-1112 *
Xue Junxiao et al., "Robot path planning method based on improved DDPG in dense obstacle scenes", Proceedings of the 2022 China Automation Congress, pages 1-6 *

Also Published As

Publication number Publication date
CN114326734B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
WO2021208771A1 (en) Reinforced learning method and device
CN112937564B (en) Lane change decision model generation method and unmanned vehicle lane change decision method and device
US11189171B2 (en) Traffic prediction with reparameterized pushforward policy for autonomous vehicles
US20170139423A1 (en) Control system and method for multi-vehicle systems
US11604469B2 (en) Route determining device, robot, and route determining method
CN112180950B (en) Intelligent ship autonomous collision avoidance and path planning method based on reinforcement learning
JP2022506404A (en) Methods and devices for determining vehicle speed
Wang et al. Autonomous ramp merge maneuver based on reinforcement learning with continuous action space
CN112669345B (en) Cloud deployment-oriented multi-target track tracking method and system
CN114489059A (en) Mobile robot path planning method based on D3QN-PER
Rafieisakhaei et al. Feedback motion planning under non-gaussian uncertainty and non-convex state constraints
Mohamed et al. Autonomous navigation of agvs in unknown cluttered environments: log-mppi control strategy
CN112255628A (en) Obstacle trajectory prediction method, apparatus, device, and medium
CN114442630A (en) Intelligent vehicle planning control method based on reinforcement learning and model prediction
CN114326734A (en) Path planning method and device
CN117141520A (en) Real-time track planning method, device and equipment
Wiering Reinforcement learning in dynamic environments using instantiated information
Wang et al. Tracking moving target for 6 degree-of-freedom robot manipulator with adaptive visual servoing based on deep reinforcement learning PID controller
Gopalakrishnan et al. Chance constraint based multi agent navigation under uncertainty
Ward et al. Towards risk minimizing trajectory planning in on-road scenarios
CN114169463A (en) Autonomous prediction lane information model training method and device
Hakobyan et al. Toward improving the distributional robustness of risk-aware controllers in learning-enabled environments
CN113158539A (en) Method for long-term trajectory prediction of traffic participants
CN112904837A (en) Data processing method, device and computer readable storage medium
Raj et al. Dynamic Obstacle Avoidance Technique for Mobile Robot Navigation Using Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant