CN114326734A - Path planning method and device - Google Patents
- Publication number: CN114326734A (application CN202111635189.8A)
- Authority: CN (China)
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a path planning method and device. The method first acquires a plurality of first relative distances between an intelligent device to be planned and a plurality of first obstacles; it then screens a second relative distance out of the first relative distances according to a local perception condition, sets the second relative distance as the local environment state, and inputs that state into a neural network, so that the neural network plans a path for the intelligent device according to the local environment state. The method and device improve the accuracy of obstacle avoidance in dense scenes.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a path planning method and device.
Background
The path planning problem is a core component of autonomous movement for intelligent devices. Its aim is to find an optimal path from a start point to an end point within a preset area under an optimization goal such as minimum travel time or shortest distance.
In dense scenes, path planning becomes more difficult: a large number of obstacles not only produces a large observation space but also requires the agent to plan paths and avoid obstacles in real time. A global path in a dense scene typically has to avoid many obstacles, which demands a large amount of environment exploration to learn obstacle-avoidance behavior; this makes convergence slow, or prevents it entirely, and in turn lowers path-planning accuracy.
In summary, existing path planning methods suffer from low obstacle-avoidance accuracy in dense scenes.
Disclosure of Invention
The embodiment of the invention provides a path planning method and device, which improve the accuracy of obstacle avoidance in a dense scene.
A first aspect of an embodiment of the present application provides a path planning method, including:
acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
and after the second relative distance is obtained by screening from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network carries out path planning on the intelligent device to be planned according to the local environment state.
In a possible implementation manner of the first aspect, the second relative distance is set as a local environment state and is input into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the local environment state into a decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent device to be planned is controlled to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, the decision behavior is reinforced and learned through the value network, and a first reward value is given to the decision behavior;
inputting the first reward value into the value network so that the value network calculates a strategy evaluation value of the decision behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches the target position, and finishing the path planning of the intelligent equipment to be planned.
In a possible implementation manner of the first aspect, the determining the decision behavior is an obstacle avoidance behavior, which specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle, while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the first aspect, the method further includes:
acquiring a real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle.
In a possible implementation manner of the first aspect, the screening of the second relative distance from the plurality of first relative distances according to the local perception condition specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
A second aspect of the embodiments of the present application provides a path planning apparatus, including: an acquisition module and a planning module;
the acquisition module is used for acquiring a plurality of first relative distances between the intelligent equipment to be planned and a plurality of first obstacles respectively;
the planning module is used for screening a second relative distance from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network plans a path of the intelligent device to be planned according to the local environment state.
In a possible implementation manner of the second aspect, the second relative distance is set as a local environment state and is input into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the local environment state into a decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent device to be planned is controlled to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, the decision behavior is reinforced and learned through the value network, and a first reward value is given to the decision behavior;
inputting the first reward value into the value network so that the value network calculates a strategy evaluation value of the decision behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches the target position, and finishing the path planning of the intelligent equipment to be planned.
In a possible implementation manner of the second aspect, the determining the decision behavior is an obstacle avoidance behavior, which specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle, while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the second aspect, the method further includes:
acquiring a real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle.
In a possible implementation manner of the second aspect, the screening of the second relative distance from the plurality of first relative distances according to the local perception condition specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
Compared with the prior art, the path planning method and the path planning device provided by the embodiment of the invention comprise the following steps: firstly, acquiring a plurality of first relative distances between intelligent equipment to be planned and a plurality of first obstacles respectively; and screening a second relative distance from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state, and inputting the local environment state into the neural network, so that the neural network carries out path planning on the intelligent equipment to be planned according to the local environment state.
The beneficial effects are that: after the second relative distance is screened out according to the local perception condition, it is set as the local environment state and input into the neural network. This preserves the key environment state while reducing environment complexity, which shortens the time the intelligent device to be planned needs to learn obstacle-avoidance behavior in high-density obstacle scenes, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle-avoidance accuracy of the intelligent device to be planned.
Secondly, the embodiment of the invention introduces global guidance by adding an angle constraint, guiding the intelligent device to be planned from the global environment: a certain punishment is given when the movement angle exceeds the preset constraint angle, and an appropriate reward is given when the movement angle is smaller than it, so that the device gradually learns to move at angles within a fixed range. This effectively prevents the intelligent device to be planned from becoming stuck in a local environment and unable to advance.
Moreover, after obstacle avoidance behaviors are screened out of the decision behaviors, they are reinforced and given corresponding reward values, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
Finally, the reward value is calculated with approaching the target position as the optimization goal, so that every decision behavior other than an obstacle avoidance behavior heads directly toward the target position. The finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
Drawings
Fig. 1 is a schematic flow chart of a path planning method according to an embodiment of the present invention;
FIG. 2 is a schematic view of a moving angle provided by an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a path planning apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flow chart of a path planning method according to an embodiment of the present invention is shown, which includes: S101-S102:
s101: and acquiring a plurality of first relative distances between the intelligent equipment to be planned and the first obstacles respectively.
Preferably, the first obstacle is an obstacle in a dense scene.
S102: and after the second relative distance is obtained by screening from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into the neural network, so that the neural network carries out path planning on the intelligent device to be planned according to the local environment state.
Specifically, in a scene with high-density obstacles, taking every first relative distance as the environment state produces a large state space; however, obstacles at different distances affect the agent (i.e., the intelligent device to be planned) differently: an obstacle far from the device is unlikely to collide with the agent in the next step. Therefore, the second relative distance is obtained by screening the plurality of first relative distances according to the local perception condition; the second relative distance is the relative distance between a nearby obstacle and the intelligent device to be planned. Setting the second relative distance as the local environment state and inputting it into the neural network both preserves the key environment state and reduces environment complexity. This shortens the time needed to learn obstacle-avoidance behavior in a high-density obstacle scene, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle-avoidance accuracy of the intelligent device to be planned.
In this embodiment, the setting the second relative distance as a local environment state and inputting the local environment state into a neural network so that the neural network performs path planning on the intelligent device to be planned according to the local environment state specifically includes:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
Specifically, the decision behavior includes a preset advance distance of the smart device to be planned, and the preset advance distance may be represented by the following coordinates:
a(STEP*a[0],STEP*a[1]);
wherein a represents a decision behavior; STEP is a preset fixed step length used to scale the action space; and the preset advance distance of the intelligent device to be planned comprises a movement distance STEP*a[0] in the X direction and a movement distance STEP*a[1] in the Y direction.
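A minimal sketch of this scaling, assuming the decision network outputs a two-component behavior a = [a[0], a[1]] (the concrete STEP value below is hypothetical):

```python
# Scale a decision behavior into X/Y movement distances.
# STEP is a hypothetical preset fixed step length; a[0] and a[1]
# are the decision network's two outputs.

STEP = 0.5  # illustrative value, not from the patent

def scale_action(a):
    """Return (STEP*a[0], STEP*a[1]): movement in the X and Y directions."""
    return (STEP * a[0], STEP * a[1])

dx, dy = scale_action([1.0, -0.5])
print(dx, dy)  # movement distances in the X and Y directions
```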
In a specific embodiment, the setting the second relative distance as a local environment state and inputting the local environment state into the decision network specifically includes:
and normalizing the second relative distance, setting the normalized second relative distance as the local environment state, and inputting it into the decision network, wherein the normalization is:
d_x = (X_i - X_0) / W;  d_y = (Y_i - Y_0) / H;
wherein the second relative distance comprises the component d_x in the x direction and the component d_y in the y direction; [X_0, Y_0] is the coordinate position of the intelligent device to be planned, [X_i, Y_i] is the coordinate position of the first obstacle, W is the width of the environment, and H is the height of the environment.
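A hedged sketch of this normalization, assuming the x offset is divided by the environment width W and the y offset by the height H (function and parameter names are illustrative):

```python
# Normalize a second relative distance by the environment dimensions so
# each component of the local environment state is bounded by [-1, 1].

def normalize_relative_distance(x0, y0, xi, yi, W, H):
    """(x0, y0): device position; (xi, yi): obstacle position."""
    dx = (xi - x0) / W  # normalized second relative distance, x direction
    dy = (yi - y0) / H  # normalized second relative distance, y direction
    return dx, dy
```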
In an embodiment, the determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if the historical movement behavior is continuously executed, the first obstacle will be collided with, and the decision-making behavior is executed without being collided with, the decision-making behavior is judged to be an obstacle avoidance behavior; otherwise, judging that the decision-making behavior is not an obstacle avoidance behavior.
Specifically: before each movement behavior is executed, the previous movement behavior (i.e., the historical movement behavior) is recorded as a_{t-1}. Under the current environment state, a_{t-1} is compared with the current decision behavior a_t. If executing the previous movement behavior in the current state would result in a collision, but executing the current movement behavior (i.e., the decision behavior) does not, the current movement behavior is judged to be an obstacle avoidance behavior, completing the screening of obstacle avoidance behaviors.
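The comparison above can be sketched as a single predicate; `would_collide` is a hypothetical environment query (not named in the patent) that reports whether executing a behavior from the current state collides with an obstacle:

```python
# A behavior counts as an obstacle avoidance behavior only when repeating
# the previous behavior a_{t-1} would collide in the current state while
# the current decision behavior a_t does not.

def is_obstacle_avoidance(state, a_prev, a_curr, would_collide):
    return would_collide(state, a_prev) and not would_collide(state, a_curr)
```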
After the second relative distance is screened from the plurality of first relative distances according to the local perception condition, it is set as the local environment state and input into the neural network; this amounts to local observation of the environment, i.e., the intelligent device to be planned interacts with its surroundings only within a small range. Path planning, however, requires a global path, and a global path is difficult to explore through interaction restricted to a local range. A global guidance mode is therefore introduced by adding an angle constraint, so that the behavior of the intelligent device to be planned is guided from the global environment. The angle constraint mode specifically includes: acquiring the real-time moving direction of the intelligent device to be planned and calculating a movement angle from the real-time moving direction and the target position; calculating a second reward value from the movement angle and a preset constraint angle; and, when the movement angle is smaller than the preset constraint angle, giving the second reward value.
Furthermore, the intelligent device to be planned gradually searches for feasible paths by exploring the environment. The moving direction of the agent is limited through the angle constraint: when the movement angle exceeds the preset constraint angle, a certain punishment is given, and when the movement angle is smaller than the preset constraint angle, an appropriate reward is given, so that the intelligent device to be planned gradually learns to move at angles within a fixed range.
Specifically, the calculation process of the second prize value may be represented by the following formula:
R=(15-θ)*γ;
wherein R is the second reward value, 15 is the preset constraint angle (in degrees), θ is the movement angle, and γ is a scale factor. The smaller the movement angle, the closer the advancing direction is to the target direction and the larger the reward value; the larger the movement angle, the smaller the reward value.
Further, when the movement angle exceeds the preset constraint angle, a punishment is given, and the larger the angle, the heavier the punishment.
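A sketch of this reward, using the 15-degree constraint angle from the formula; the value of the scale factor γ below is a hypothetical choice:

```python
# Second reward value R = (15 - theta) * gamma: positive (a reward) when
# the movement angle is inside the constraint angle, negative (a
# punishment) that grows with the angle when it is outside.

CONSTRAINT_ANGLE = 15.0  # degrees, from the formula in the text
GAMMA = 0.1              # hypothetical scale factor

def angle_reward(theta):
    return (CONSTRAINT_ANGLE - theta) * GAMMA
```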
In this embodiment, the screening of the second relative distance from the plurality of first relative distances according to the local perception condition specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
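The local perception condition reduces to a threshold filter; a minimal sketch (the preset value below is an illustrative perception radius, not from the patent):

```python
# Keep only the first relative distances that fall below the first preset
# value; the survivors are the second relative distances used as the
# local environment state.

FIRST_PRESET_VALUE = 2.0  # hypothetical perception radius

def screen_distances(first_relative_distances, threshold=FIRST_PRESET_VALUE):
    return [d for d in first_relative_distances if d < threshold]

print(screen_distances([0.5, 3.0, 1.2, 7.4]))  # → [0.5, 1.2]
```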
In this embodiment, the path planning problem is modeled as a reinforcement learning problem, and global path planning is realized through sequential decisions. Concretely: the intelligent device to be planned acquires the environment state and produces a decision behavior through the decision network (the decision behavior comprises a preset advance distance and a preset advance direction); the device is controlled to move according to the decision behavior; when the environment state changes, the changed state is input into the decision network again; and this decision process repeats until the intelligent device to be planned reaches the target position.
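The sequential decision process above can be sketched as a loop; `decide`, `step_env`, and `reached` are hypothetical stand-ins for the decision network, the environment transition, and the goal test:

```python
# Sequential decision loop: observe state, decide, move, repeat until the
# device reaches the target position (with a step cap as a safety bound).

def plan_path(state, decide, step_env, reached, max_steps=1000):
    path = [state]
    for _ in range(max_steps):
        if reached(state):
            break
        action = decide(state)            # decision network outputs a behavior
        state = step_env(state, action)   # device moves; environment state changes
        path.append(state)
    return path
```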
In model-free reinforcement learning, the transition probabilities between states are unknown, and the learning process mainly consists of policy evaluation and policy improvement. 1. Policy evaluation: the current policy is evaluated by computing value functions, including a state value function and a behavior value function, with values estimated from random samples serving as the evaluation standard. A neural network fits the value function and directly outputs a concrete value; the gap between this value and the actual value is then reduced by updating the network parameters. 2. Policy improvement: after the policy evaluation value is obtained, the policy is updated according to that value and gradually improved to obtain higher value; the improvement process maps concretely onto updates of the network parameters.
Network updating: the neural network used in the embodiment of the invention mainly comprises two parts: decision networks and value networks. The decision network is used for outputting decision behaviors, and the value network is used for evaluating the decision behaviors. And the decision network and the value network adopt a gradient descent mode to update the network.
wherein the value gradient used to update the decision network is:
∇_{θ^μ}J ≈ (1/n) Σ_i ∇_a Q(s_i, a|θ^Q)|_{a=μ(s_i)} * ∇_{θ^μ} μ(s_i|θ^μ);
this gradient is derived from the value network, with an update goal of maximizing value. Here s_i is the environment state at the i-th time, a_i is the action at the i-th time, θ^Q is the parameter of the value network, μ(s|θ^μ) is the decision network, and n is the number of samples extracted from the experience pool each time. Because gradient descent is used, value maximization is realized by updating with the negative gradient.
The gradient of the parameters of the value network minimizes the loss:
L = (1/n) Σ_i (y_i - Q(s_i, a_i|θ^Q))²;
wherein y_i = r_i + γQ′(s_{i+1}, a_{i+1}|θ^{Q′}) is the value criterion at the current moment, and r_i is the reward value fed back by the environment at the current time. The network is updated by minimizing the gap to this target value. Further, s_{i+1} is the environment state at the (i+1)-th time, a_{i+1} is the action at the (i+1)-th time, and θ^{Q′} is the parameter of the target value network.
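Assuming the DDPG-style update the surrounding text describes, the target value y_i and the critic loss can be sketched numerically; all function names are illustrative and the Q-values are taken as plain numbers rather than network outputs:

```python
# Compute value-network targets y_i = r_i + gamma * Q'(s_{i+1}, a_{i+1})
# and the mean squared gap between current Q-values and those targets.

def critic_targets(rewards, next_q_values, gamma=0.9):
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]

def critic_loss(q_values, targets):
    n = len(targets)
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / n
```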
To further explain the calculation process of the movement angle, please refer to fig. 2, and fig. 2 is a schematic diagram of the movement angle according to an embodiment of the present invention.
wherein a[x, y] represents a decision behavior, (x0, y0) represents the departure position of the intelligent device to be planned, and (xi, yi) represents the target position of the intelligent device to be planned.
The calculation of the movement angle θ is represented by the following equation:
θ = arctan(y/x) - arctan((yi - y0)/(xi - x0));
wherein x represents the movement distance of the intelligent device to be planned in the X direction, y represents its movement distance in the Y direction, and (xi - x0) and (yi - y0) give the direction from the departure position to the target position of the intelligent device to be planned.
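A sketch of this angle calculation using the quadrant-aware `atan2` (more robust than arctan of a ratio when x or the target offset is zero); the function name is illustrative:

```python
import math

# Movement angle: the angle, in degrees, between the decision behavior
# (x, y) and the direction from the departure position (x0, y0) to the
# target position (xi, yi).

def movement_angle(x, y, x0, y0, xi, yi):
    move_dir = math.atan2(y, x)
    target_dir = math.atan2(yi - y0, xi - x0)
    return math.degrees(move_dir - target_dir)
```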
To further explain the path planning device, please refer to fig. 3, where fig. 3 is a schematic structural diagram of a path planning device according to an embodiment of the present invention, including: an acquisition module 301 and a planning module 302;
the obtaining module 301 is configured to obtain a plurality of first relative distances between the intelligent device to be planned and the plurality of first obstacles, respectively.
The planning module 302 is configured to, after a second relative distance is obtained by screening from the plurality of first relative distances according to a local perception condition, set the second relative distance as a local environment state and input the local environment state into a neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
In this embodiment, the setting the second relative distance as a local environment state and inputting the local environment state into a neural network so that the neural network performs path planning on the intelligent device to be planned according to the local environment state specifically includes:
the neural network includes: a decision network and a value network;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates to obtain a decision behavior according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be an obstacle avoidance behavior, the decision-making behavior is enhanced and learned through the value network, and a first reward value is given to the decision-making behavior;
inputting the first reward value into the value network to enable the value network to calculate a strategy evaluation value of the decision-making behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, and completing path planning of the intelligent equipment to be planned.
In this embodiment, the determining that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with the first obstacle, while executing the decision behavior does not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior; wherein the historical movement behavior is the movement behavior executed immediately before the decision behavior.
In this embodiment, the method further includes:
acquiring the real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating to obtain a second reward value according to the moving angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle.
In this embodiment, the screening of the second relative distance from the plurality of first relative distances according to the local perception condition specifically includes:
the local perception conditions include: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
In the embodiment of the invention, the obtaining module acquires a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles; the planning module then screens a second relative distance from the plurality of first relative distances according to the local perception condition, sets the second relative distance as the local environment state, and inputs it into the neural network, so that the neural network plans a path for the intelligent device to be planned according to the local environment state.
According to the method and the device, after the second relative distance is obtained through screening according to the local sensing condition, the second relative distance is set to be in the local environment state and is input into the neural network, so that the key environment state is kept, the complexity of the environment is reduced, the time for learning the obstacle avoidance behavior of the intelligent device environment to be planned can be shortened in the scene of high-density obstacles, the convergence efficiency and the real-time performance of the neural network are improved, and finally the obstacle avoidance accuracy of the intelligent device to be planned is improved.
Secondly, the embodiment of the invention introduces global guidance by adding an angle constraint: the intelligent device to be planned is guided from the global environment, a penalty is given when the movement angle exceeds the preset constraint angle, and a reward is given when the movement angle is smaller than the preset constraint angle. The device thereby gradually learns to move at angles within a fixed range, which effectively prevents it from becoming trapped in a local environment and unable to advance.
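The angle-constraint reward can be sketched as below. The 30° constraint angle and the ±0.5 reward/penalty magnitudes are illustrative assumptions, since the patent leaves the concrete values unspecified:

```python
import math

def angle_constraint_reward(real_time_direction, position, target_position,
                            preset_constraint_angle=math.radians(30),
                            reward=0.5, penalty=-0.5):
    """Compute the movement angle between the device's real-time moving
    direction (radians) and the bearing toward the target, then reward
    motion inside the preset constraint angle and penalise motion
    outside it, providing the global guidance described above."""
    target_bearing = math.atan2(target_position[1] - position[1],
                                target_position[0] - position[0])
    movement_angle = abs(target_bearing - real_time_direction)
    movement_angle = min(movement_angle, 2 * math.pi - movement_angle)  # wrap to [0, pi]
    return reward if movement_angle < preset_constraint_angle else penalty

ok = angle_constraint_reward(0.0, (0.0, 0.0), (1.0, 0.0))        # within 30 deg -> 0.5
bad = angle_constraint_reward(math.pi, (0.0, 0.0), (1.0, 0.0))   # opposite heading -> -0.5
```

Under this shaping, headings inside the constraint cone are consistently rewarded, so the learned policy drifts toward motion within the fixed angular range rather than circling in a local region.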
Moreover, after an obstacle avoidance behavior is screened out from the decision behaviors, it is reinforced and given a corresponding reward value, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
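The obstacle-avoidance test of claim 3 reduces to a boolean condition, and the reinforcement step to granting the first reward value. This is a minimal sketch; the collision flags and the reward magnitude of 1.0 are assumed for illustration only:

```python
def is_obstacle_avoidance(collides_if_history_continued, collides_after_decision):
    """Claim 3's test: a decision behavior is an obstacle avoidance
    behavior only if continuing the historical movement behavior would
    collide with the first obstacle while the decision behavior does not."""
    return collides_if_history_continued and not collides_after_decision

def first_reward_value(decision_is_avoidance, avoidance_reward=1.0):
    """Give the (assumed) first reward value to reinforce avoidance."""
    return avoidance_reward if decision_is_avoidance else 0.0

# The reward fires only in the avoid-and-survive case.
print(first_reward_value(is_obstacle_avoidance(True, False)))   # 1.0
print(first_reward_value(is_obstacle_avoidance(False, False)))  # 0.0
```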
Finally, the reward value is calculated with approaching the target position as the optimization goal, so that every decision behavior other than an obstacle avoidance behavior heads directly toward the target position. The finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
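One common way to realise "approaching the target position as the optimization goal" is distance-based reward shaping: reward each step by how much it shortens the distance to the target. The sketch below illustrates that idea under stated assumptions; it is not the patent's stated formula:

```python
import math

def approach_reward(prev_position, new_position, target_position, scale=1.0):
    """Reward a decision step by the reduction in Euclidean distance to
    the target, so that steps other than obstacle avoidance head
    directly toward the target and the planned route stays short."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    return scale * (dist(prev_position, target_position)
                    - dist(new_position, target_position))

# A step from (0,0) to (1,0) toward a target at (3,0) earns +1.0;
# stepping away to (-1,0) earns -1.0.
print(approach_reward((0.0, 0.0), (1.0, 0.0), (3.0, 0.0)))    # 1.0
print(approach_reward((0.0, 0.0), (-1.0, 0.0), (3.0, 0.0)))   # -1.0
```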
The foregoing is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the invention, and such improvements and modifications are also considered to be within the scope of the invention.
Claims (10)
1. A method of path planning, comprising:
acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles, respectively;
and after screening a second relative distance from the plurality of first relative distances according to a local perception condition, setting the second relative distance as a local environment state and inputting the local environment state into a neural network, so that the neural network performs path planning for the intelligent device to be planned according to the local environment state.
2. The path planning method according to claim 1, wherein setting the second relative distance as a local environment state and inputting it into a neural network, so that the neural network performs path planning for the intelligent device to be planned according to the local environment state, specifically includes:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, performing reinforcement learning on the decision behavior through the value network and giving a first reward value to the decision behavior;
inputting the first reward value into the value network, so that the value network calculates a strategy evaluation value of the decision behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent device to be planned reaches a target position, thereby completing path planning for the intelligent device to be planned.
3. The path planning method according to claim 2, wherein judging that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would result in a collision with the first obstacle while executing the decision behavior would not, judging that the decision behavior is an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
4. The path planning method according to claim 3, further comprising:
acquiring a real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, awarding the second reward value.
5. The path planning method according to claim 4, wherein obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception condition includes: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
6. A path planning apparatus, comprising: an acquisition module and a planning module;
the acquisition module is used for acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles, respectively;
the planning module is used for screening a second relative distance from the plurality of first relative distances according to a local perception condition, setting the second relative distance as a local environment state, and inputting the local environment state into a neural network, so that the neural network performs path planning for the intelligent device to be planned according to the local environment state.
7. The path planning device according to claim 6, wherein setting the second relative distance as a local environment state and inputting it into a neural network, so that the neural network performs path planning for the intelligent device to be planned according to the local environment state, specifically includes:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, performing reinforcement learning on the decision behavior through the value network and giving a first reward value to the decision behavior;
inputting the first reward value into the value network, so that the value network calculates a strategy evaluation value of the decision behavior;
and updating the decision network and the value network according to the strategy evaluation value until the intelligent device to be planned reaches a target position, thereby completing path planning for the intelligent device to be planned.
8. The path planning device according to claim 7, wherein judging that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would result in a collision with the first obstacle while executing the decision behavior would not, judging that the decision behavior is an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
9. The path planning device according to claim 8, further comprising:
acquiring a real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, awarding the second reward value.
10. The path planning device according to claim 9, wherein obtaining the second relative distance according to the local perception condition and the plurality of first relative distances specifically includes:
the local perception condition includes: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111635189.8A CN114326734B (en) | 2021-12-29 | 2021-12-29 | Path planning method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114326734A true CN114326734A (en) | 2022-04-12 |
CN114326734B CN114326734B (en) | 2024-03-08 |
Family
ID=81016080
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111635189.8A Active CN114326734B (en) | 2021-12-29 | 2021-12-29 | Path planning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114326734B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107168324A (en) * | 2017-06-08 | 2017-09-15 | 中国矿业大学 | A kind of robot path planning method based on ANFIS fuzzy neural networks |
CN108762281A (en) * | 2018-06-08 | 2018-11-06 | 哈尔滨工程大学 | It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory |
CN110083165A (en) * | 2019-05-21 | 2019-08-02 | 大连大学 | A kind of robot paths planning method under complicated narrow environment |
CN111061277A (en) * | 2019-12-31 | 2020-04-24 | 歌尔股份有限公司 | Unmanned vehicle global path planning method and device |
CN111399541A (en) * | 2020-03-30 | 2020-07-10 | 西北工业大学 | Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network |
CN112977146A (en) * | 2021-02-24 | 2021-06-18 | 中原动力智能机器人有限公司 | Charging method and system for automatic driving vehicle and charging pile |
CN113341958A (en) * | 2021-05-21 | 2021-09-03 | 西北工业大学 | Multi-agent reinforcement learning movement planning method with mixed experience |
Non-Patent Citations (3)
Title |
---|
Liang Hongwei et al., "Genetic multi-point path planning for mobile robots", Engineering Technology, pages 584 - 587 *
Xue Junxiao et al., "A dynamic obstacle avoidance method for carrier-based aircraft based on deep reinforcement learning", Journal of Computer-Aided Design & Computer Graphics, pages 1102 - 1112 *
Xue Junxiao et al., "A robot path planning method based on improved DDPG in dense-obstacle scenes", Proceedings of the 2022 China Automation Congress, pages 1 - 6 *
Also Published As
Publication number | Publication date |
---|---|
CN114326734B (en) | 2024-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021208771A1 (en) | Reinforced learning method and device | |
CN112937564B (en) | Lane change decision model generation method and unmanned vehicle lane change decision method and device | |
US11189171B2 (en) | Traffic prediction with reparameterized pushforward policy for autonomous vehicles | |
US20170139423A1 (en) | Control system and method for multi-vehicle systems | |
US11604469B2 (en) | Route determining device, robot, and route determining method | |
CN112180950B (en) | Intelligent ship autonomous collision avoidance and path planning method based on reinforcement learning | |
JP2022506404A (en) | Methods and devices for determining vehicle speed | |
Wang et al. | Autonomous ramp merge maneuver based on reinforcement learning with continuous action space | |
CN112669345B (en) | Cloud deployment-oriented multi-target track tracking method and system | |
CN114489059A (en) | Mobile robot path planning method based on D3QN-PER | |
Rafieisakhaei et al. | Feedback motion planning under non-gaussian uncertainty and non-convex state constraints | |
Mohamed et al. | Autonomous navigation of agvs in unknown cluttered environments: log-mppi control strategy | |
CN112255628A (en) | Obstacle trajectory prediction method, apparatus, device, and medium | |
CN114442630A (en) | Intelligent vehicle planning control method based on reinforcement learning and model prediction | |
CN114326734A (en) | Path planning method and device | |
CN117141520A (en) | Real-time track planning method, device and equipment | |
Wiering | Reinforcement learning in dynamic environments using instantiated information | |
Wang et al. | Tracking moving target for 6 degree-of-freedom robot manipulator with adaptive visual servoing based on deep reinforcement learning PID controller | |
Gopalakrishnan et al. | Chance constraint based multi agent navigation under uncertainty | |
Ward et al. | Towards risk minimizing trajectory planning in on-road scenarios | |
CN114169463A (en) | Autonomous prediction lane information model training method and device | |
Hakobyan et al. | Toward improving the distributional robustness of risk-aware controllers in learning-enabled environments | |
CN113158539A (en) | Method for long-term trajectory prediction of traffic participants | |
CN112904837A (en) | Data processing method, device and computer readable storage medium | |
Raj et al. | Dynamic Obstacle Avoidance Technique for Mobile Robot Navigation Using Deep Reinforcement Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||