CN114003059A - UAV path planning method based on deep reinforcement learning under kinematic constraint condition - Google Patents
- Publication number
- CN114003059A (application CN202111282488.8A)
- Authority
- CN
- China
- Prior art keywords
- unmanned aerial vehicle
- neural network
- reinforcement learning
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Abstract
The invention discloses a UAV path planning method based on deep reinforcement learning under a kinematic constraint condition, comprising the following specific steps: S1: a deep reinforcement learning neural network obtains the shortest path according to the vector coordinates of the plurality of task points and the static obstacles; S2: after taking off, the unmanned aerial vehicle flies along the shortest path and executes its tasks; S3: when a dynamic obstacle is detected, the unmanned aerial vehicle sends a signal to the base station, and the supercomputer predicts the position of the unmanned aerial vehicle at the moment it receives the signal; S4: a new flight path is output by the deep reinforcement learning neural network according to the coordinates of the dynamic obstacle and the remaining task points, and is sent to the unmanned aerial vehicle by radio; S5: the unmanned aerial vehicle executes the tasks along the new path and finally returns to the base after all tasks are completed. The invention provides an online-and-offline framework which not only solves the problem that the state and action in Q-Learning are high-dimensional, but also considers a kinematic model and avoids dynamic obstacles while solving the TSP problem.
Description
Technical Field
The invention belongs to the field of unmanned aerial vehicle path planning design, and particularly relates to a UAV path planning method based on deep reinforcement learning under a kinematic constraint condition.
Background
In the civil and military fields, an unmanned aerial vehicle usually needs to perform tasks at multiple target points, and finding an optimal path to traverse all the target points is a key technology of unmanned aerial vehicle application research, namely a path planning problem.
Generally, path planning problems fall into three categories:
1) Numerical methods, such as mixed-integer programming; however, numerical methods usually need to solve non-convex optimization problems, which not only requires special commercial software (such as CPLEX) but also takes a long time.
2) Traditional intelligent algorithms, such as the genetic algorithm, ant colony algorithm, greedy algorithm and simulated annealing. However, swarm-intelligence algorithms easily fall into local optima, and because their operators involve many parameters, such as the crossover rate and mutation rate, the choice of parameters may cause premature convergence; moreover, traditional intelligent algorithms can only provide a solution close to the optimal one and cannot guarantee global optimality.
3) Algorithms based on reinforcement learning. The principle of reinforcement learning is that an agent selects an action by observing the current state and learns according to the obtained reward value. Compared with numerical algorithms and traditional intelligent algorithms, reinforcement learning is based on a Markov process, and it exploits the property that Markov matrices necessarily converge for global planning.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a UAV path planning method based on deep reinforcement learning under a kinematic constraint condition, and provides an online-and-offline framework which not only solves the problem that the state and action in Q-Learning are high-dimensional, but also considers a kinematic model and avoids dynamic obstacles while solving the TSP problem.
The invention mainly adopts the technical scheme that:
a UAV path planning method based on deep reinforcement learning under a kinematic constraint condition comprises the following specific steps:
s1: when the unmanned aerial vehicle is at the base, according to the vector coordinates of the plurality of task points and the static barrier, the shortest path of the unmanned aerial vehicle under the kinematic constraint is obtained by using the deep reinforcement learning neural network;
s2: flying and executing tasks along the shortest path after the unmanned aerial vehicle takes off;
s3: in the process of executing tasks, when the radar on the unmanned aerial vehicle detects that a dynamic obstacle exists within 5 km, the unmanned aerial vehicle transmits the vector coordinates of the dynamic obstacle and the remaining task points to the base station by radio and flies along the original path before receiving the feedback signal of the base station, while the supercomputer of the base station, according to the time t_0 from the unmanned aerial vehicle transmitting the signal to receiving the reply, predicts the position of the unmanned aerial vehicle at the moment it receives the signal;
s4: the super computer of the base outputs Q values of all actions by using a deep reinforcement learning neural network according to the coordinates of the dynamic obstacles and the residual task points, generates a new action selection strategy epsilon-greedy from the Q values, selects actions according to the new action selection strategy epsilon-greedy to obtain a new flight path, and sends the new flight path to the unmanned aerial vehicle through radio;
s5: and after receiving the feedback signal, the unmanned aerial vehicle executes the tasks along the new path, finally returns to the base after all the tasks are executed, and the unmanned aerial vehicle completes the tasks.
Preferably, the specific steps of using the deep reinforcement learning neural network to obtain the shortest path of the drone under the kinematic constraint in the step S1 are as follows:
s1-1: when the unmanned aerial vehicle is at the base, the N target task points are numbered 1, 2, 3, …, N in sequence and the base is numbered 0. The dimension of the state vector of the unmanned aerial vehicle is set to N + 2: the first bit in the state vector is 0 and represents the base number, the last bit is θ_i, the incidence angle at the current task point numbered i, and the middle bits are updated to the task-point numbers in the order in which the unmanned aerial vehicle reaches them, so that the initial state vector of the unmanned aerial vehicle at the base is:
s_initial = [0, 0, 0, …, 0, θ_0]^T   (1);
wherein the first bit 0 represents the base number, the other 0s represent the initial state in which no task point has been reached, and θ_0 represents the incidence angle of the drone at base 0;
s1-2: the unmanned aerial vehicle state vector is used as the input of the deep reinforcement learning neural network, which solves for the action to select so that the total distance under the kinematic constraint is shortest, i.e. the Q value is maximum, and generates the action selection strategy ε-greedy;
s1-3: the deep reinforcement learning neural network selects actions according to the action selection strategy ε-greedy, determining which task point to go to and at what angle to fly out: when a random number is smaller than ε it explores randomly, and when the random number is greater than or equal to ε it selects the action with the maximum Q value, so that the state of the unmanned aerial vehicle is updated as:
s_bcd = [0, b, c, d, 0, …, 0, θ_d]^T   (2);
wherein b, c and d are the numbers of the task points reached by the unmanned aerial vehicle in sequence, and θ_d is the incidence angle at the task point numbered d; the state vector of the unmanned aerial vehicle is the number sequence of the task points it has flown, and the state vector is updated once every time an action is taken.
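Steps S1-2 and S1-3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the state-layout helper and the use of NumPy's `Generator` are assumptions.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Step S1-3: explore with probability epsilon, otherwise pick argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def update_state(state, task_point, angle):
    """Write the reached task-point number into the first free middle slot
    of the state vector [0, b, c, d, 0, ..., 0, theta]^T and store the new
    incidence angle in the last slot (formula (2))."""
    state = state.copy()
    free = np.where(state[1:-1] == 0)[0]   # middle slots still unvisited
    state[1 + free[0]] = task_point
    state[-1] = angle
    return state
```

With ε = 0 the selection is purely greedy, which is the exploitation branch of S1-3; each call to `update_state` appends one task-point number, matching the "updated once every action" rule.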
Preferably, the deep reinforcement learning neural network comprises two neural networks of the same structure: the neural network Q_eval and the neural network Q_target. During initialization the parameter weights of the two neural networks are the same; subsequently, while generating the action selection strategy ε-greedy, the neural network Q_eval back-propagates every h steps to train and update its network parameters ω, obtaining a new neural network Q_eval. The specific steps are as follows:
s1-21: the shortest Dubins curve distance l_Dubins between two points is calculated as:
l_Dubins = min{ l_LSL, l_RSR, l_LSR, l_RSL, l_RLR, l_LRL }   (3);
in formula (3), α and β are the incidence angles at the two points respectively, d is the straight-line distance between the two points, r is the turning radius of the Dubins curve, and each candidate length corresponds to a Dubins word over the motion primitives R (clockwise arc), S (straight segment) and L (anticlockwise arc);
when any two task points P1And P2When no barrier exists between the two task points, substituting the vector coordinates of the two task points into a formula (3) to calculate the shortest Dubins curve distance of the two task points
when there is a static or dynamic obstacle between any two task points P_1 and P_2, the shortest Dubins curve distance l_Dubins(P_1, P_2) is calculated as follows:
firstly, a circle C_2 of radius r is drawn with the centre of the dynamic or static obstacle as its centre, where r is the turning radius of the Dubins curve; then tangent lines are drawn from the unmanned aerial vehicle, along its direction of motion at the current position, to the circle C_2, giving the common tangent points O_1 and O_2, whose vectors are expressed as:
O_i = [x_{O_i}, y_{O_i}, θ_{O_i}]^T, i = 1, 2   (4);
wherein (x_{O_i}, y_{O_i}) are the coordinates of the two common tangent points O_1 and O_2, and θ_{O_i} are their incidence angles;
according to the vector coordinates of the two task points P_1 and P_2 and the vectors of the tangent points, the shortest Dubins curve distance between the two task points P_1 and P_2 is calculated as shown in formula (5):
l_Dubins(P_1, P_2) = min_{i∈{1,2}} [ l_Dubins(P_1, O_i) + l_Dubins(O_i, P_2) ]   (5);
wherein P_1 is the current task point and P_2 is the next task point, and l_Dubins(P_1, O_i) and l_Dubins(O_i, P_2) are obtained by calculation according to formula (3);
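The tangent construction toward circle C_2 can be illustrated with elementary geometry: the sketch below computes the two tangent points from an external point to a circle, which is the building block of the detour in formula (5). Names and the coordinate convention are assumptions for illustration.

```python
import math

def tangent_points(px, py, cx, cy, r):
    """Tangent points from an external point P = (px, py) to the circle
    with centre (cx, cy) and radius r; returns the two tangent points."""
    dx, dy = cx - px, cy - py
    d = math.hypot(dx, dy)
    if d <= r:
        raise ValueError("P must lie outside the circle")
    base = math.atan2(dy, dx)           # direction from P to the centre
    half = math.asin(r / d)             # half-angle of the tangent cone
    length = math.sqrt(d * d - r * r)   # distance from P to each tangent point
    return [(px + length * math.cos(base + s * half),
             py + length * math.sin(base + s * half)) for s in (1, -1)]
```

Each returned point lies exactly on the circle, and the segment from P to the tangent point is perpendicular to the corresponding radius, which is what makes it a valid tangent.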
s1-22: according to the shortest Dubins curve distance l_Dubins(P_1, P_2) between two task points, the reward value ρ is calculated as shown in formula (6):
ρ = −γ_1 · l_Dubins(P_1, P_2)   (6);
in formula (6), γ_1 is a discount coefficient, set to 0.1, used to prevent the gradient explosion caused by too large a spread in the training data;
s1-23: the Loss value of the Loss function is calculated using the reward value ρ obtained in step S1-22, the Loss function being given by formula (7):
Loss(ω) = E[ ( y_j − Q(s_j, a_j; ω) )² ]   (7);
in formula (7), Q(s_j, a_j; ω) is the approximate Q value output by the deep reinforcement learning neural network Q_eval, s_j is the state of the j-th data sample, a_j is the action of the j-th data sample, ω is the parameter of the deep reinforcement learning neural network Q_eval that needs to be trained, and y_j is the Q value calculated for the drone from the instant reward value, as shown in formula (8):
y_j = ρ_j + γ_2 · max_{a'_j} Q_target(s'_{j+1}, a'_j; ω⁻)   (8);
in formula (8), ρ_j represents the instant reward value obtained by taking action a_j in state s_j, γ_2 is a discount coefficient, set to 0.01, and max_{a'_j} Q_target(s'_{j+1}, a'_j; ω⁻) is the maximum Q value that the deep reinforcement learning neural network Q_target predicts can be obtained by taking action a'_j in state s'_{j+1}, where state s'_{j+1} is the state reached after taking action a_j in state s_j, and a'_j is the action with which the drone can obtain the maximum Q value in state s'_{j+1};
s1-24: the network parameters ω of the neural network Q_eval are trained and updated by back-propagating according to the Loss value obtained in step S1-23; in addition, every 5 × h steps the parameters of the neural network Q_eval are copied to the network Q_target to update it.
Preferably, the neural network Q_eval and the neural network Q_target each comprise 3 convolutional layers and 3 fully-connected layers, each convolutional layer having a convolution kernel size of 4 × 4 and a stride of 3 × 3, and the number of output actions is N × 24.
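One plausible reading of the N × 24 output size is N candidate task points times 24 discretized exit angles; the following sketch decodes a flat action index under that assumption (the meaning of the factor 24 is not stated in the patent).

```python
import math

N_ANGLES = 24  # assumed: 24 discretized exit angles, giving N x 24 actions

def decode_action(action_index, n_task_points):
    """Map a flat action index in [0, N*24) to (task-point number, exit angle)."""
    if not 0 <= action_index < n_task_points * N_ANGLES:
        raise ValueError("action index out of range")
    point = action_index // N_ANGLES + 1            # task points numbered 1..N
    angle = (action_index % N_ANGLES) * 2 * math.pi / N_ANGLES
    return point, angle
```

Under this assumed encoding, selecting an action simultaneously fixes the next task point and the departure angle θ used in the state vector and in the Dubins-distance calculation.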
Advantageous effects: the invention provides a UAV path planning method based on deep reinforcement learning under a kinematic constraint condition, which obtains the path plan of the unmanned aerial vehicle by adopting the deep reinforcement learning algorithm DQN (Deep Q-Network), and has the following advantages:
(1) aiming at the problem that reinforcement learning cannot handle high-dimensional state and action spaces, deep reinforcement learning (DRL) adopts a neural network to approximate the Q value, thereby overcoming this defect of reinforcement learning;
(2) due to the existence of the exploration rate epsilon, the algorithm can explore a global optimal solution, and the problem of premature convergence is solved;
(3) compared with the traditional intelligent algorithm, the optimal solution under the kinematic constraint condition can be obtained, the method has a certain obstacle avoidance function, and can be widely applied to the aspects of inspection, detection or logistics dispatching and the like of multiple target points in civil or military affairs.
Drawings
FIG. 1 is a flow chart of a path planning method of the present invention;
fig. 2 is a schematic diagram of path planning when encountering dynamic or static obstacles.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example 1
As shown in fig. 1, a UAV path planning method based on deep reinforcement learning under the kinematic constraint condition includes the following specific steps:
s1: when the unmanned aerial vehicle is at a base, according to the vector coordinates of the plurality of task points and the static barrier, the shortest path of the unmanned aerial vehicle under the kinematic constraint is obtained by using a deep reinforcement learning neural network, wherein the concrete steps of the shortest path are as follows:
s1-1: when the unmanned aerial vehicle is at the base, the N target task points are numbered 1, 2, 3, …, N in sequence and the base is numbered 0. The state vector dimension of the unmanned aerial vehicle is set to N + 2: the first bit in the state vector is 0 and represents the base number, the last bit is θ_i, the incidence angle at the current task point numbered i, and the middle bits are updated to the task-point numbers in the order in which the unmanned aerial vehicle reaches them, so that the initial state vector of the unmanned aerial vehicle at the base is:
s_initial = [0, 0, 0, …, 0, θ_0]^T   (1);
wherein the first bit 0 represents the base number, the other 0s represent the initial state in which no task point has been reached, and θ_0 represents the incidence angle of the drone at base 0;
s1-2: the unmanned aerial vehicle state vector is used as the input of the deep reinforcement learning neural network, which solves for the action to select so that the total distance under the kinematic constraint is shortest, i.e. the Q value is maximum, and generates the action selection strategy ε-greedy;
s1-3: the deep reinforcement learning neural network selects actions according to the action selection strategy ε-greedy, determining which task point to go to and at what angle to fly out: when a random number is smaller than ε it explores randomly, and when the random number is greater than or equal to ε it selects the action with the maximum Q value, so that the state of the unmanned aerial vehicle is updated as:
s_bcd = [0, b, c, d, 0, …, 0, θ_d]^T   (2);
wherein b, c and d are the numbers of the task points reached by the unmanned aerial vehicle in sequence, and θ_d is the incidence angle at the task point numbered d; the state vector of the unmanned aerial vehicle is the number sequence of the task points it has flown, and the state vector is updated once every time an action is taken.
S2: flying and executing tasks along the shortest path after the unmanned aerial vehicle takes off;
s3: in the process of executing the task, when the radar on the unmanned aerial vehicle detects that a dynamic obstacle exists within 5 km, the unmanned aerial vehicle sends the vector coordinates of the dynamic obstacle and the remaining task points to the base by radio and flies along the original path before receiving the feedback signal of the base, while the supercomputer of the base, according to the time t_0 from the unmanned aerial vehicle sending the signal to receiving the reply, predicts the position of the unmanned aerial vehicle at the moment it receives the signal;
s4: the super computer of the base outputs Q values of all actions by using a deep reinforcement learning neural network according to the coordinates of the dynamic obstacles and the residual task points, generates a new action selection strategy epsilon-greedy from the Q values, selects actions according to the new action selection strategy epsilon-greedy to obtain a new flight path, and sends the new flight path to the unmanned aerial vehicle through radio;
s5: and after receiving the feedback signal, the unmanned aerial vehicle executes the tasks along the new path, finally returns to the base after all the tasks are executed, and the unmanned aerial vehicle completes the tasks.
In this embodiment 1, the deep reinforcement learning neural network includes two neural networks with the same structure, namely the neural network Q_eval and the neural network Q_target; the neural network Q_eval and the neural network Q_target each comprise 3 convolutional layers and 3 fully-connected layers, each convolutional layer having a convolution kernel size of 4 × 4 and a stride of 3 × 3, and the number of output actions is N × 24. During initialization the parameter weights of the two neural networks are the same; subsequently, while generating the action selection strategy ε-greedy, the neural network Q_eval back-propagates every h steps to train and update its network parameters ω, obtaining a new neural network Q_eval. The specific steps are as follows:
s1-21: the shortest Dubins curve distance l_Dubins between two points is calculated as:
l_Dubins = min{ l_LSL, l_RSR, l_LSR, l_RSL, l_RLR, l_LRL }   (3);
in formula (3), α and β are the incidence angles at the two points respectively, d is the straight-line distance between the two points, r is the turning radius of the Dubins curve, and each candidate length corresponds to a Dubins word over the motion primitives R (clockwise arc), S (straight segment) and L (anticlockwise arc);
when any two task points P1And P2When no barrier exists between the two task points, substituting the vector coordinates of the two task points into a formula (3) to calculate the shortest Dubins curve distance of the two task points
when there is a static or dynamic obstacle between any two task points P_1 and P_2, the shortest Dubins curve distance l_Dubins(P_1, P_2) is calculated as follows:
as shown in FIG. 2, a circle C_2 of radius r is first drawn with the centre of the dynamic or static obstacle as its centre, where r is the turning radius of the Dubins curve; then tangent lines are drawn from the unmanned aerial vehicle, along its direction of motion at the current position, to the circle C_2, giving the common tangent points O_1 and O_2, whose vectors are expressed as:
O_i = [x_{O_i}, y_{O_i}, θ_{O_i}]^T, i = 1, 2   (4);
wherein (x_{O_i}, y_{O_i}) are the coordinates of the two common tangent points, and θ_{O_i} are their incidence angles;
according to the vector coordinates of the two task points P_1 and P_2 and the vectors of the tangent points, the shortest Dubins curve distance between the two task points P_1 and P_2 is calculated as shown in formula (5):
l_Dubins(P_1, P_2) = min_{i∈{1,2}} [ l_Dubins(P_1, O_i) + l_Dubins(O_i, P_2) ]   (5);
wherein P_1 is the current task point and P_2 is the next task point, and l_Dubins(P_1, O_i) and l_Dubins(O_i, P_2) are obtained by calculation according to formula (3);
s1-22: according to the shortest Dubins curve distance l_Dubins(P_1, P_2) between two task points, the reward value ρ is calculated as shown in formula (6):
ρ = −γ_1 · l_Dubins(P_1, P_2)   (6);
in formula (6), γ_1 is a discount coefficient, set to 0.1, used to prevent the gradient explosion caused by too large a spread in the training data;
s1-23: the Loss value of the Loss function is calculated using the reward value ρ obtained in step S1-22, the Loss function being given by formula (7):
Loss(ω) = E[ ( y_j − Q(s_j, a_j; ω) )² ]   (7);
in formula (7), Q(s_j, a_j; ω) is the approximate Q value output by the deep reinforcement learning neural network Q_eval, s_j is the state of the j-th data sample, a_j is the action of the j-th data sample, ω is the parameter of the deep reinforcement learning neural network Q_eval that needs to be trained, and y_j is the Q value calculated for the drone from the instant reward value, as shown in formula (8):
y_j = ρ_j + γ_2 · max_{a'_j} Q_target(s'_{j+1}, a'_j; ω⁻)   (8);
in formula (8), ρ_j represents the instant reward value obtained by taking action a_j in state s_j, γ_2 is a discount coefficient, set to 0.01, and max_{a'_j} Q_target(s'_{j+1}, a'_j; ω⁻) is the maximum Q value that the deep reinforcement learning neural network Q_target predicts can be obtained by taking action a'_j in state s'_{j+1}, where state s'_{j+1} is the state reached after taking action a_j in state s_j, and a'_j is the action with which the drone can obtain the maximum Q value in state s'_{j+1};
s1-24: the network parameters ω of the neural network Q_eval are trained and updated by back-propagating according to the Loss value obtained in step S1-23; in addition, every 5 × h steps the parameters of the neural network Q_eval are copied to the neural network Q_target to update it.
In the invention, the tangent lines when encountering a static obstacle and when encountering a dynamic obstacle are drawn as shown in FIG. 2, and the calculation of the shortest Dubins curve distance between the two task points is the same in both cases. This is because the action selected by the action selection strategy of the deep reinforcement learning determines at what angle the unmanned aerial vehicle departs for the next point, so the incidence angles of the two points are known, and the coordinates of the obstacle are known, making the calculation identical. Therefore, when a dynamic obstacle is detected, the unmanned aerial vehicle sends the dynamic obstacle coordinates to the base; with the obstacle coordinates and the incidence angles of the two points available, the tangent construction and the calculation method are consistent with the handling of a static obstacle.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.
Claims (4)
1. A UAV path planning method based on deep reinforcement learning under a kinematic constraint condition is characterized by comprising the following specific steps:
s1: when the unmanned aerial vehicle is at the base, according to the vector coordinates of the plurality of task points and the static barrier, the shortest path of the unmanned aerial vehicle under the kinematic constraint is obtained by using the deep reinforcement learning neural network;
s2: flying and executing tasks along the shortest path after the unmanned aerial vehicle takes off;
s3: in the process of executing the task, when the radar on the unmanned aerial vehicle detects that a dynamic obstacle exists within 5 km, the unmanned aerial vehicle sends the vector coordinates of the dynamic obstacle and the remaining task points to the base by radio and flies along the original path before receiving the feedback signal of the base, while the supercomputer of the base, according to the time t_0 from the unmanned aerial vehicle sending the signal to receiving the reply, predicts the position of the unmanned aerial vehicle at the moment it receives the signal;
s4: the super computer of the base outputs Q values of all actions by using a deep reinforcement learning neural network according to the coordinates of the dynamic obstacles and the residual task points, generates a new action selection strategy epsilon-greedy from the Q values, selects actions according to the new action selection strategy epsilon-greedy to obtain a new flight path, and sends the new flight path to the unmanned aerial vehicle through radio;
s5: and after receiving the feedback signal, the unmanned aerial vehicle executes the tasks along the new path, finally returns to the base after all the tasks are executed, and the unmanned aerial vehicle completes the tasks.
2. The method for planning UAV path based on deep reinforcement learning under the kinematic constraint condition of claim 1, wherein the specific steps of using the deep reinforcement learning neural network to derive the shortest path of the drone under the kinematic constraint in step S1 are as follows:
s1-1: when the unmanned aerial vehicle is at the base, the N target task points are numbered 1, 2, 3, …, N in sequence and the base is numbered 0. The dimension of the state vector of the unmanned aerial vehicle is set to N + 2: the first bit in the state vector is 0 and represents the base number, the last bit is θ_i, the incidence angle at the current task point numbered i, and the middle bits are updated to the task-point numbers in the order in which the unmanned aerial vehicle reaches them, so that the initial state vector of the unmanned aerial vehicle at the base is:
s_initial = [0, 0, 0, …, 0, θ_0]^T   (1);
wherein the first bit 0 represents the base number, the other 0s represent the initial state in which no task point has been reached, and θ_0 represents the incidence angle of the drone at base 0;
s1-2: the unmanned aerial vehicle state vector is used as the input of the deep reinforcement learning neural network, which solves for the action to select so that the total distance under the kinematic constraint is shortest, i.e. the Q value is maximum, and generates the action selection strategy ε-greedy;
s1-3: the deep reinforcement learning neural network selects actions according to the action selection strategy ε-greedy, determining which task point to go to and at what angle to fly out: when a random number is smaller than ε it explores randomly, and when the random number is greater than or equal to ε it selects the action with the maximum Q value, so that the state of the unmanned aerial vehicle is updated as:
s_bcd = [0, b, c, d, 0, …, 0, θ_d]^T   (2);
wherein b, c and d are the numbers of the task points reached by the unmanned aerial vehicle in sequence, and θ_d is the incidence angle at the task point numbered d; the state vector of the unmanned aerial vehicle is the number sequence of the task points it has flown, and the state vector is updated once every time an action is taken.
3. The method for planning a UAV path according to claim 2, wherein the deep reinforcement learning neural network comprises two neural networks of the same structure: the neural network Q_eval and the neural network Q_target; during initialization the parameter weights of the two neural networks are the same, and subsequently, while generating the action selection strategy ε-greedy, the neural network Q_eval back-propagates every h steps to train and update its network parameters ω, obtaining a new neural network Q_eval, the specific steps being as follows:
s1-21: the shortest Dubins curve distance l_Dubins between two points is calculated as:
l_Dubins = min{ l_LSL, l_RSR, l_LSR, l_RSL, l_RLR, l_LRL }   (3);
in formula (3), α and β are the incidence angles at the two points respectively, d is the straight-line distance between the two points, r is the turning radius of the Dubins curve, and each candidate length corresponds to a Dubins word over the motion primitives R (clockwise arc), S (straight segment) and L (anticlockwise arc);
when any two task points P1And P2When no barrier exists between the two task points, substituting the vector coordinates of the two task points into a formula (3) to calculate the shortest Dubins curve distance of the two task points
when a static obstacle or a dynamic obstacle exists between any two task points P_1 and P_2, the shortest Dubins curve distance l_Dubins(P_1, P_2) between the two task points is calculated by the following specific steps:
firstly, a circle C_2 with radius r is constructed with the centre of the dynamic or static obstacle as its centre, wherein r is the turning radius of the Dubins curve; then tangent lines are drawn from the position of the unmanned aerial vehicle, along its direction of motion, to the circle C_2, obtaining the common tangent points O_1 and O_2, whose vectors are expressed as:

O_i = [x_Oi, y_Oi, θ_Oi]^T, i = 1, 2 (4)

wherein x_Oi and y_Oi are the coordinates of the two common tangent points, and θ_Oi are their incident angles;

according to the vector coordinates of the two task points P_1 and P_2 and the vectors of the common tangent points, the shortest Dubins curve distance l_Dubins(P_1, P_2) between the two task points is calculated as shown in formula (5):

l_Dubins(P_1, P_2) = min(l_Dubins(P_1, O_1) + l_Dubins(O_1, P_2), l_Dubins(P_1, O_2) + l_Dubins(O_2, P_2)) (5)

wherein [x_P1, y_P1, θ_P1]^T and [x_P2, y_P2, θ_P2]^T represent the vectors of the task points P_1 and P_2 respectively, P_1 being the current task point and P_2 the next task point; each partial distance on the right-hand side of formula (5) is obtained by calculation according to formula (3);
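For illustration only, the common tangent points used to detour around the obstacle circle can be obtained by elementary circle geometry. The sketch below assumes the tangents are drawn from a single external point (the UAV position) to circle C_2:

```python
import math

def tangent_points(px, py, cx, cy, r):
    """Tangent points on circle C2 (centre (cx, cy), radius r) for the
    two tangent lines drawn from the external point (px, py)."""
    dist = math.hypot(px - cx, py - cy)
    if dist <= r:
        raise ValueError("point lies inside or on circle C2: no tangent exists")
    ang_cp = math.atan2(py - cy, px - cx)  # direction from the centre towards the point
    off = math.acos(r / dist)              # angle between that direction and each tangent radius
    t1 = (cx + r * math.cos(ang_cp + off), cy + r * math.sin(ang_cp + off))
    t2 = (cx + r * math.cos(ang_cp - off), cy + r * math.sin(ang_cp - off))
    return t1, t2
```

Each returned point T satisfies the tangency condition (T − P) · (T − C) = 0.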
S1-22: according to the shortest Dubins curve distance l_Dubins(P_1, P_2) between the two task points, the reward value ρ is calculated as shown in formula (6):

ρ = -γ_1 · l_Dubins(P_1, P_2) (6)

wherein γ_1 is a discount coefficient, set to 0.1, used to prevent gradient explosion caused by too large a spread in the training data;
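The body of formula (6) is an image in the source; a sketch consistent with the surrounding text (a reward proportional to the negated Dubins distance, scaled by γ_1 = 0.1 to keep magnitudes small) would be:

```python
def reward(l_dubins, gamma1=0.1):
    """Instant reward value of formula (6): the shortest Dubins distance
    scaled by gamma1 = 0.1 and negated, so that shorter paths earn larger
    rewards. NOTE: the exact form of formula (6) is not visible in the
    source; a negated, scaled distance is an assumption from context."""
    return -gamma1 * l_dubins
```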
S1-23: the reward value ρ calculated in step S1-22 is used to calculate the value of the Loss function, which is expressed by formula (7):

Loss(ω) = Σ_j (y_j − Q(s_j, a_j; ω))^2 (7)

wherein Q(s_j, a_j; ω) is the approximate Q value output by the deep reinforcement learning neural network Q_eval, s_j is the state of the jth data, a_j is the action of the jth data, ω is the parameter of the deep reinforcement learning neural network Q_eval that needs to be trained, and y_j is the Q value calculated for the unmanned aerial vehicle from the instant reward value, as shown in formula (8):

y_j = ρ_j + γ_2 · max_{a'_j} Q_target(s'_{j+1}, a'_j) (8)

wherein, in formula (8), ρ_j is the instant reward value obtained by taking action a_j in state s_j, γ_2 is a discount coefficient, set to 0.01, and max_{a'_j} Q_target(s'_{j+1}, a'_j) is the maximum Q value that the deep reinforcement learning neural network Q_target predicts can be obtained by taking action a'_j in state s'_{j+1}, wherein state s'_{j+1} is the state after taking action a_j in state s_j, and a'_j is the action with which the unmanned aerial vehicle can obtain the maximum Q value in state s'_{j+1};
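A minimal sketch of the Loss of formula (7) and the target y_j of formula (8), assuming batches of Q-value rows from Q_eval and Q_target (one row per transition, one column per action):

```python
import numpy as np

def td_targets(rewards, q_target_next, gamma2=0.01):
    """y_j of formula (8): instant reward plus the discounted maximum
    Q value that Q_target predicts for the successor state s'_{j+1}."""
    return rewards + gamma2 * q_target_next.max(axis=1)

def dqn_loss(q_eval, actions, rewards, q_target_next, gamma2=0.01):
    """Loss of formula (7): squared difference between the targets y_j
    and the Q values Q(s_j, a_j; omega) that Q_eval assigns to the
    actions actually taken, summed over the batch."""
    y = td_targets(rewards, q_target_next, gamma2)
    q_taken = q_eval[np.arange(len(actions)), actions]  # Q(s_j, a_j; omega)
    return float(np.sum((y - q_taken) ** 2))
```

In step S1-24 this Loss is back-propagated through Q_eval only; Q_target stays frozen between its periodic parameter copies.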
S1-24: the network parameters ω of the Q_eval network are trained and updated by back propagation according to the Loss value obtained in step S1-23; in addition, every 5×h steps the parameters of the neural network Q_eval are copied to the network Q_target to update it.
4. The UAV path planning method based on deep reinforcement learning under the kinematic constraint condition according to claim 3, wherein the neural network Q_eval and the neural network Q_target each comprise 3 convolutional layers and 3 fully-connected layers, each convolutional layer has a convolution kernel size of 4×4 and a stride of 3×3, and the number of output actions is N×24.
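As a hedged aside, the spatial arithmetic of the three 4×4-kernel, 3×3-stride convolutional layers can be checked with the standard no-padding formula; the input size 84 used in the test below is an assumption for illustration, not stated in the claim:

```python
def conv_out(size, kernel=4, stride=3):
    """Spatial output size of one convolutional layer with no padding:
    floor((size - kernel) / stride) + 1."""
    return (size - kernel) // stride + 1

def q_net_feature_size(input_size):
    """Spatial size after the 3 convolutional layers of Q_eval / Q_target,
    each with a 4x4 kernel and 3x3 stride (square inputs assumed)."""
    s = input_size
    for _ in range(3):
        s = conv_out(s)
    return s
```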
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111282488.8A CN114003059B (en) | 2021-11-01 | 2021-11-01 | UAV path planning method based on deep reinforcement learning under kinematic constraint condition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114003059A true CN114003059A (en) | 2022-02-01 |
CN114003059B CN114003059B (en) | 2024-04-16 |
Family
ID=79926040
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111282488.8A Active CN114003059B (en) | 2021-11-01 | 2021-11-01 | UAV path planning method based on deep reinforcement learning under kinematic constraint condition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114003059B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106406346A (en) * | 2016-11-01 | 2017-02-15 | 北京理工大学 | Plan method for rapid coverage track search coordinated by multiple UAVs (Unmanned Aerial Vehicles) |
CN109685237A (en) * | 2017-10-19 | 2019-04-26 | 北京航空航天大学 | A kind of real-time planing method of unmanned aerial vehicle flight path based on the path Dubins and branch and bound |
WO2020056875A1 (en) * | 2018-09-20 | 2020-03-26 | 初速度(苏州)科技有限公司 | Parking strategy based on deep reinforcement learning |
CN110362089A (en) * | 2019-08-02 | 2019-10-22 | 大连海事大学 | A method of the unmanned boat independent navigation based on deeply study and genetic algorithm |
CN110470301A (en) * | 2019-08-13 | 2019-11-19 | 上海交通大学 | Unmanned plane paths planning method under more dynamic task target points |
CN110488872A (en) * | 2019-09-04 | 2019-11-22 | 中国人民解放军国防科技大学 | A kind of unmanned plane real-time route planing method based on deeply study |
CN111027143A (en) * | 2019-12-18 | 2020-04-17 | 四川大学 | Shipboard aircraft approach guiding method based on deep reinforcement learning |
CN113064422A (en) * | 2021-03-09 | 2021-07-02 | 河海大学 | Autonomous underwater vehicle path planning method based on double neural network reinforcement learning |
CN112947594A (en) * | 2021-04-07 | 2021-06-11 | 东北大学 | Unmanned aerial vehicle-oriented flight path planning method |
CN113190039A (en) * | 2021-04-27 | 2021-07-30 | 大连理工大学 | Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
MINGSHENG GAO: "UAV path planning with kinematic constraints based on deep reinforcement learning", 4th International Conference on Information Science, 1 August 2022 (2022-08-01), pages 1 - 10 *
XIANG XIAOJIA: "Coordinated control method for fixed-wing UAV formations based on deep reinforcement learning", Acta Aeronautica et Astronautica Sinica, vol. 42, no. 4, 30 April 2021 (2021-04-30), pages 1 - 14 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115268494A (en) * | 2022-07-26 | 2022-11-01 | 江苏科技大学 | Unmanned aerial vehicle path planning method based on layered reinforcement learning |
CN116683349A (en) * | 2023-06-27 | 2023-09-01 | 国网青海省电力公司海北供电公司 | Correction method and system for power equipment sky inspection line and inspection unmanned aerial vehicle |
CN116683349B (en) * | 2023-06-27 | 2024-01-26 | 国网青海省电力公司海北供电公司 | Correction method and system for power equipment sky inspection line and inspection unmanned aerial vehicle |
CN116661501A (en) * | 2023-07-24 | 2023-08-29 | 北京航空航天大学 | Unmanned aerial vehicle cluster high dynamic environment obstacle avoidance and moving platform landing combined planning method |
CN116661501B (en) * | 2023-07-24 | 2023-10-10 | 北京航空航天大学 | Unmanned aerial vehicle cluster high dynamic environment obstacle avoidance and moving platform landing combined planning method |
Also Published As
Publication number | Publication date |
---|---|
CN114003059B (en) | 2024-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109992000B (en) | Multi-unmanned aerial vehicle path collaborative planning method and device based on hierarchical reinforcement learning | |
CN114003059A (en) | UAV path planning method based on deep reinforcement learning under kinematic constraint condition | |
US11794898B2 (en) | Air combat maneuvering method based on parallel self-play | |
CN110991972B (en) | Cargo transportation system based on multi-agent reinforcement learning | |
CN110488872B (en) | Unmanned aerial vehicle real-time path planning method based on deep reinforcement learning | |
Duan et al. | Hybrid particle swarm optimization and genetic algorithm for multi-UAV formation reconfiguration | |
CN110488859B (en) | Unmanned aerial vehicle route planning method based on improved Q-learning algorithm | |
Liu et al. | Multi-UAV path planning based on fusion of sparrow search algorithm and improved bioinspired neural network | |
Sahingoz | Flyable path planning for a multi-UAV system with Genetic Algorithms and Bezier curves | |
US20210325891A1 (en) | Graph construction and execution ml techniques | |
CN113848974B (en) | Aircraft trajectory planning method and system based on deep reinforcement learning | |
CN112947592B (en) | Reentry vehicle trajectory planning method based on reinforcement learning | |
Cao et al. | Hunting algorithm for multi-auv based on dynamic prediction of target trajectory in 3d underwater environment | |
CN113268074B (en) | Unmanned aerial vehicle flight path planning method based on joint optimization | |
CN113093733B (en) | Sea-to-sea striking method for unmanned boat cluster | |
CN113065709A (en) | Cross-domain heterogeneous cluster path planning method based on reinforcement learning | |
Xiang et al. | An effective memetic algorithm for UAV routing and orientation under uncertain navigation environments | |
Ke et al. | Cooperative path planning for air–sea heterogeneous unmanned vehicles using search-and-tracking mission | |
Xue et al. | Multi-agent deep reinforcement learning for uavs navigation in unknown complex environment | |
Jin et al. | Cooperative path planning with priority target assignment and collision avoidance guidance for rescue unmanned surface vehicles in a complex ocean environment | |
CN114967721A (en) | Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet | |
Sun et al. | Cooperative strategy for pursuit-evasion problem with collision avoidance | |
Liang et al. | Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN–LSTM fusion network | |
Zeng et al. | Path planning for rendezvous of multiple AUVs operating in a variable ocean | |
Ali et al. | Feature selection-based decision model for UAV path planning on rough terrains |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||