CN114003059A - UAV path planning method based on deep reinforcement learning under kinematic constraint condition - Google Patents

UAV path planning method based on deep reinforcement learning under kinematic constraint condition Download PDF

Info

Publication number
CN114003059A
CN114003059A (application CN202111282488.8A; granted publication CN114003059B)
Authority
CN
China
Prior art keywords: unmanned aerial vehicle, neural network, reinforcement learning, points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111282488.8A
Other languages
Chinese (zh)
Other versions
CN114003059B (en)
Inventor
高明生 (Mingsheng Gao)
张晓璇 (Xiaoxuan Zhang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Campus of Hohai University
Original Assignee
Changzhou Campus of Hohai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Campus of Hohai University filed Critical Changzhou Campus of Hohai University
Priority to CN202111282488.8A priority Critical patent/CN114003059B/en
Publication of CN114003059A publication Critical patent/CN114003059A/en
Application granted granted Critical
Publication of CN114003059B publication Critical patent/CN114003059B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft

Abstract

The invention discloses a UAV path planning method based on deep reinforcement learning under kinematic constraints, comprising the following specific steps: S1: a deep reinforcement learning neural network derives the shortest path from the vector coordinates of the task points and the static obstacles; S2: after take-off, the unmanned aerial vehicle flies along the shortest path and executes its tasks; S3: when a dynamic obstacle is detected, the unmanned aerial vehicle sends a signal to the base station, and the supercomputer predicts the position of the unmanned aerial vehicle at the moment the signal is received; S4: a new flight path is output by the deep reinforcement learning neural network from the coordinates of the dynamic obstacle and the remaining task points and is sent to the unmanned aerial vehicle by radio; S5: the unmanned aerial vehicle executes its tasks along the new path and finally returns to the base after all tasks are completed. The invention provides an online-plus-offline framework that not only solves the problem of high-dimensional states and actions in Q-Learning, but also, while solving the TSP problem, takes the kinematic model into account and avoids dynamic obstacles.

Description

UAV path planning method based on deep reinforcement learning under kinematic constraint condition
Technical Field
The invention belongs to the field of unmanned aerial vehicle path planning design, and particularly relates to a UAV path planning method based on deep reinforcement learning under a kinematic constraint condition.
Background
In the civil and military fields, an unmanned aerial vehicle usually needs to perform tasks at multiple target points, and finding an optimal path to traverse all the target points is a key technology of unmanned aerial vehicle application research, namely a path planning problem.
Generally, path planning problems fall into three categories:
1) Numerical methods, such as mixed-integer programming. However, numerical methods usually have to solve non-convex optimization problems, which not only requires specialised commercial software (such as CPLEX) but also takes a long time.
2) Traditional intelligent algorithms, such as the genetic algorithm, ant colony algorithm, greedy algorithm and simulated annealing. However, swarm intelligence algorithms easily fall into local optima, and because their operators involve many parameters, such as the crossover rate and mutation rate, poor parameter choices can cause premature convergence; moreover, traditional intelligent algorithms can only provide a solution close to the optimum and cannot guarantee a globally optimal solution.
3) Algorithms based on reinforcement learning. In reinforcement learning, an agent selects an action by observing the current state and learns from the reward it receives. Compared with numerical algorithms and traditional intelligent algorithms, reinforcement learning is based on a Markov process and exploits the property that Markov matrices necessarily converge, which benefits global planning.
Disclosure of Invention
In order to overcome the deficiencies of the prior art, the invention provides a UAV path planning method based on deep reinforcement learning under kinematic constraints. It provides an online-plus-offline framework that not only solves the problem of high-dimensional states and actions in Q-Learning, but also, while solving the TSP problem, takes the kinematic model into account and avoids dynamic obstacles.
The invention mainly adopts the technical scheme that:
a UAV path planning method based on deep reinforcement learning under a kinematic constraint condition comprises the following specific steps:
S1: when the unmanned aerial vehicle is at the base, the shortest path of the unmanned aerial vehicle under the kinematic constraint is obtained with the deep reinforcement learning neural network according to the vector coordinates of the plurality of task points and the static obstacles;
S2: after take-off, the unmanned aerial vehicle flies along the shortest path and executes its tasks;
S3: during task execution, when the radar on the unmanned aerial vehicle detects a dynamic obstacle within 5 km, the unmanned aerial vehicle transmits the vector coordinates of the dynamic obstacle and the remaining task points to the base station by radio and keeps flying along the original path until it receives the base station's feedback signal; the supercomputer of the base, according to the time t_0 from the unmanned aerial vehicle sending the signal to receiving the reply, predicts the position of the unmanned aerial vehicle at the moment it receives the signal;
S4: the supercomputer of the base uses the deep reinforcement learning neural network to output the Q values of all actions from the coordinates of the dynamic obstacle and the remaining task points, generates a new action selection strategy ε-greedy from these Q values, selects actions according to this strategy to obtain a new flight path, and sends the new flight path to the unmanned aerial vehicle by radio;
S5: after receiving the feedback signal, the unmanned aerial vehicle executes its tasks along the new path, finally returns to the base after all tasks have been executed, and thereby completes its mission.
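For illustration, a minimal Python sketch of the online hand-off in steps S3 to S5 is given below. The message fields, the constant ground speed and the straight-leg dead-reckoning used to predict the position from t_0, and the planner interface base_planner are illustrative assumptions and are not specified by the method itself.

```python
import math

def predict_uav_position(last_known_xy, heading_rad, speed_mps, t0_s):
    """Dead-reckoning estimate of where the UAV will be when the reply arrives.

    Assumes the UAV keeps flying its current, locally straight leg at constant
    speed during the delay t0; a fuller version would advance along the stored
    Dubins path instead.
    """
    x, y = last_known_xy
    dist = speed_mps * t0_s
    return (x + dist * math.cos(heading_rad), y + dist * math.sin(heading_rad))

def replan_on_dynamic_obstacle(base_planner, uav_report, speed_mps):
    """Base-side handling of steps S3 to S5 (hypothetical interfaces).

    uav_report carries the detected obstacle, the remaining task points and the
    time stamps from which the delay t0 is computed; base_planner stands for the
    trained deep reinforcement learning planner (Q_eval wrapped in an epsilon-greedy
    policy), whose exact signature is assumed here.
    """
    t0 = uav_report["t_received"] - uav_report["t_sent"]
    start = predict_uav_position(uav_report["position"],
                                 uav_report["heading"], speed_mps, t0)
    # Re-plan from the predicted position over the remaining task points,
    # now treating the reported dynamic obstacle as a known obstacle.
    new_path = base_planner(start=start,
                            task_points=uav_report["remaining_points"],
                            obstacles=uav_report["obstacles"])
    return new_path  # transmitted back to the UAV by radio
```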
Preferably, the specific steps of using the deep reinforcement learning neural network to obtain the shortest path of the drone under the kinematic constraint in the step S1 are as follows:
S1-1: when the unmanned aerial vehicle is at the base, the N target task points are numbered 1, 2, 3 … N in sequence and the base is numbered 0; the dimension of the state vector of the unmanned aerial vehicle is set to N+2, the first entry of the state vector is 0 and represents the base number, the last entry is θ_i, the incidence angle at the current task point numbered i, and the middle entries are updated to the task point numbers as the unmanned aerial vehicle reaches them, so that the initial state vector of the unmanned aerial vehicle at the base is:

s_initial = [0, 0, 0, …, 0, θ_0]^T    (1);

where the first 0 represents the base number, the remaining 0 entries represent the initial state in which no task point has been reached, and θ_0 represents the incidence angle of the unmanned aerial vehicle at base 0;

S1-2: the state vector of the unmanned aerial vehicle is used as the input of the deep reinforcement learning neural network, which works out which action to select so that the total distance under the kinematic constraint is shortest, i.e. the Q value is largest, and generates the action selection strategy ε-greedy;

S1-3: the deep reinforcement learning neural network selects actions according to the action selection strategy ε-greedy, deciding which task point to go to and at what angle to fly out: when the random number is smaller than ε it explores randomly, and when the random number is greater than or equal to ε it selects the action with the largest Q value, so that the state of the unmanned aerial vehicle is updated to:

s_bcd = [0, b, c, d, 0, …, 0, θ_d]^T    (2);

where b, c and d are the numbers of the task points reached by the unmanned aerial vehicle in sequence and θ_d is the incidence angle at the task point numbered d; the state vector of the unmanned aerial vehicle is therefore the sequence of numbers of the task points it has flown to, and it is updated once every time an action is taken.
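To make the state encoding of formulas (1) and (2) and the ε-greedy selection concrete, a minimal Python sketch follows. The value of N, the discretisation of the exit heading into 24 bins, and the way an action index is decoded into a (task point, heading) pair are illustrative assumptions; only the state layout and the ε-greedy rule come from the text above.

```python
import random
import numpy as np

N_POINTS = 5      # number of task points N (example value)
N_HEADINGS = 24   # discretised exit headings, matching the N x 24 action output

def initial_state(theta0=0.0):
    """State vector of formula (1): [0, 0, ..., 0, theta_0]^T with N+2 entries."""
    s = np.zeros(N_POINTS + 2, dtype=np.float32)
    s[-1] = theta0
    return s

def update_state(state, visited, point_id, theta_d):
    """After flying to task point `point_id` with incidence angle theta_d,
    fill the next free middle slot with the point number and record the new
    angle in the last entry, as in formula (2)."""
    s = state.copy()
    s[1 + len(visited)] = point_id
    s[-1] = theta_d
    visited.append(point_id)
    return s

def epsilon_greedy(q_values, epsilon):
    """Random exploration when rand < epsilon, otherwise the action with the
    largest Q value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

def decode_action(a):
    """One possible encoding of an action index in [0, N*24) as a
    (task point number, exit heading) pair (assumed, not from the patent)."""
    point_id = a // N_HEADINGS + 1                      # task points numbered 1..N
    theta = 2 * np.pi * (a % N_HEADINGS) / N_HEADINGS   # discretised heading
    return point_id, theta
```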
Preferably, the deep reinforcement learning neural network comprises two neural networks of the same structure: a neural network Q_eval and a neural network Q_target. At initialization the parameter weights of the two networks are identical; thereafter, while the neural network Q_eval generates the action selection strategy ε-greedy, the network parameters ω of the Q_eval network are trained and updated by back-propagation every h steps, yielding a new neural network Q_eval. The specific steps are as follows:
s1-21: shortest Dubins curve distance l between two pointsDubinsThe calculation formula of (a) is as follows:
Figure BDA0003331688900000041
in the formula (3), alpha and beta are incident angles of two points respectively, d is a linear distance between the two points, R is a turning radius of a Dubins curve, R represents clockwise motion, S represents linear motion, and L represents anticlockwise motion;
when any two task points P1And P2When no barrier exists between the two task points, substituting the vector coordinates of the two task points into a formula (3) to calculate the shortest Dubins curve distance of the two task points
Figure BDA0003331688900000042
When any two task points P1And P2When a static obstacle or a dynamic obstacle exists between the two, the shortest Dubins curve distance of the two task points
Figure BDA0003331688900000043
The specific calculation steps are as follows:
firstly, a circle C_2 of radius r is drawn with the centre of the dynamic or static obstacle as its centre, where r is the turning radius of the Dubins curve; then tangent lines are drawn from the position of the unmanned aerial vehicle, along its direction of motion, to the circle C_2, giving two common tangent points P_t1 and P_t2 and a vector p_t, expressed as

p_t = [x_t1, y_t1, θ_t1, x_t2, y_t2, θ_t2]^T    (4);

where (x_t1, y_t1) and (x_t2, y_t2) are the coordinates of the two common tangent points P_t1 and P_t2, and θ_t1 and θ_t2 are the incidence angles at the two common tangent points P_t1 and P_t2;
according to the vector coordinates of the two task points P_1 and P_2 and the vector p_t, the shortest Dubins curve distance l_Dubins^{P1P2} between the two task points is calculated as shown in formula (5):

l_Dubins^{P1P2} = l_Dubins^{P1,Pt1} + l_Dubins^{Pt1,Pt2} + l_Dubins^{Pt2,P2}    (5);

where the vector coordinates of the task points are those of the current task point P_1 and the next task point P_2, and the segment distances l_Dubins^{P1,Pt1}, l_Dubins^{Pt1,Pt2} and l_Dubins^{Pt2,P2} are each obtained by calculation according to formula (3);
s1-22: according to the shortest Dubins curve distance between two task points
Figure BDA0003331688900000055
Calculating the prize value p as shown in equation (6):
Figure BDA0003331688900000056
in the formula (6), γ1The discount coefficient is set to be 0.1 and is used for preventing gradient explosion caused by too large difference of training data;
s1-23: calculating the Loss value of the Loss function by using the reward value p calculated in the step S1-22, wherein the Loss function is expressed by the formula (7):
Figure BDA0003331688900000057
in the formula (7), the reaction mixture is,
Figure BDA0003331688900000058
neural network Q for deep reinforcement learningevalApproximate Q value of output, sjIs the state of the jth data, ajFor the j-th data, ω is the deep reinforcement learning neural network QevalParameter of (2) need to be trained, yjA Q value calculated for the drone by the instant prize value, as shown in equation (8):
Figure BDA0003331688900000059
in the formula (8), ρjIs shown in state sjTaking action ajInstant prize value, gamma, obtained2For the discount coefficient, set to 0.01,
Figure BDA00033316889000000510
neural network Q for deep reinforcement learningtargetPredicted at state s'j+1Take action a'jMaximum Q value obtainable, wherein, state s'j+1Is the state s in the formula (8)jTaking action ajRear state, a'jIs unmanned plane at state s'j+1An action capable of obtaining a maximum Q value;
S1-24: the network parameters ω of the Q_eval network are trained and updated by back-propagation according to the Loss value obtained in step S1-23; in addition, every 5×h steps the parameters of the neural network Q_eval are copied to the neural network Q_target to update it.
Preferably, the neural network Q_eval and the neural network Q_target each comprise 3 convolutional layers and 3 fully-connected layers; each convolutional layer has a convolution kernel size of 4×4 and a stride of 3×3, and the number of output actions is N×24.
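A minimal PyTorch sketch of such a network is given below. The 1×48×48 input encoding of the state, the channel counts and the fully-connected widths are illustrative assumptions, since the text only fixes the number of layers, the 4×4 kernels, the 3×3 stride and the N×24 output size.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Sketch of Q_eval / Q_target: 3 conv layers (4x4 kernels, stride 3)
    followed by 3 fully-connected layers with N*24 output actions."""

    def __init__(self, n_points, n_headings=24, in_size=48):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=4, stride=3), nn.ReLU(),   # 48 -> 15
            nn.Conv2d(16, 32, kernel_size=4, stride=3), nn.ReLU(),  # 15 -> 4
            nn.Conv2d(32, 64, kernel_size=4, stride=3), nn.ReLU(),  # 4  -> 1
        )
        self.fc = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_points * n_headings),  # one Q value per (point, heading)
        )

    def forward(self, x):  # x: (batch, 1, 48, 48)
        h = self.conv(x).flatten(1)
        return self.fc(h)

# Q_eval and Q_target share the same structure and start from identical weights:
q_eval = QNetwork(n_points=5)
q_target = QNetwork(n_points=5)
q_target.load_state_dict(q_eval.state_dict())
```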
Beneficial effects: the invention provides a UAV path planning method based on deep reinforcement learning under kinematic constraints, which obtains the path plan of the unmanned aerial vehicle by means of the deep reinforcement learning algorithm DQN (deep Q-network), and has the following advantages:
(1) to address the problem that reinforcement learning cannot handle high-dimensional state and action spaces, deep reinforcement learning (DRL) uses a neural network to approximate the Q value, overcoming this shortcoming of reinforcement learning;
(2) thanks to the exploration rate ε, the algorithm can explore towards the globally optimal solution, which alleviates premature convergence;
(3) compared with traditional intelligent algorithms, the method can obtain the optimal solution under the kinematic constraint, has a certain obstacle-avoidance capability, and can be widely applied to multi-target-point inspection, detection or logistics dispatching in civil or military settings.
Drawings
FIG. 1 is a flow chart of a path planning method of the present invention;
FIG. 2 is a schematic diagram of path planning when encountering dynamic or static obstacles.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example 1
As shown in fig. 1, a UAV path planning method based on deep reinforcement learning under the kinematic constraint condition includes the following specific steps:
S1: when the unmanned aerial vehicle is at the base, the shortest path of the unmanned aerial vehicle under the kinematic constraint is obtained with the deep reinforcement learning neural network according to the vector coordinates of the plurality of task points and the static obstacles, wherein the specific steps are as follows:
S1-1: when the unmanned aerial vehicle is at the base, the N target task points are numbered 1, 2, 3 … N in sequence and the base is numbered 0; the dimension of the state vector of the unmanned aerial vehicle is set to N+2, the first entry of the state vector is 0 and represents the base number, the last entry is θ_i, the incidence angle at the current task point numbered i, and the middle entries are updated to the task point numbers as the unmanned aerial vehicle reaches them, so that the initial state vector of the unmanned aerial vehicle at the base is:

s_initial = [0, 0, 0, …, 0, θ_0]^T    (1);

where the first 0 represents the base number, the remaining 0 entries represent the initial state in which no task point has been reached, and θ_0 represents the incidence angle of the unmanned aerial vehicle at base 0;

S1-2: the state vector of the unmanned aerial vehicle is used as the input of the deep reinforcement learning neural network, which works out which action to select so that the total distance under the kinematic constraint is shortest, i.e. the Q value is largest, and generates the action selection strategy ε-greedy;

S1-3: the deep reinforcement learning neural network selects actions according to the action selection strategy ε-greedy, deciding which task point to go to and at what angle to fly out: when the random number is smaller than ε it explores randomly, and when the random number is greater than or equal to ε it selects the action with the largest Q value, so that the state of the unmanned aerial vehicle is updated to:

s_bcd = [0, b, c, d, 0, …, 0, θ_d]^T    (2);

where b, c and d are the numbers of the task points reached by the unmanned aerial vehicle in sequence and θ_d is the incidence angle at the task point numbered d; the state vector of the unmanned aerial vehicle is therefore the sequence of numbers of the task points it has flown to, and it is updated once every time an action is taken.
S2: after take-off, the unmanned aerial vehicle flies along the shortest path and executes its tasks;
S3: during task execution, when the radar on the unmanned aerial vehicle detects a dynamic obstacle within 5 km, the unmanned aerial vehicle transmits the vector coordinates of the dynamic obstacle and the remaining task points to the base by radio and keeps flying along the original path until it receives the base's feedback signal; the supercomputer of the base, according to the time t_0 from the unmanned aerial vehicle sending the signal to receiving the reply, predicts the position of the unmanned aerial vehicle at the moment it receives the signal;
S4: the supercomputer of the base uses the deep reinforcement learning neural network to output the Q values of all actions from the coordinates of the dynamic obstacle and the remaining task points, generates a new action selection strategy ε-greedy from these Q values, selects actions according to this strategy to obtain a new flight path, and sends the new flight path to the unmanned aerial vehicle by radio;
S5: after receiving the feedback signal, the unmanned aerial vehicle executes its tasks along the new path, finally returns to the base after all tasks have been executed, and thereby completes its mission.
In this embodiment 1, the deep reinforcement learning neural network comprises two neural networks of the same structure, namely a neural network Q_eval and a neural network Q_target, each of which comprises 3 convolutional layers and 3 fully-connected layers; each convolutional layer has a convolution kernel size of 4×4 and a stride of 3×3, and the number of output actions is N×24. At initialization the parameter weights of the two networks are identical; thereafter, while the neural network Q_eval generates the action selection strategy ε-greedy, the network parameters ω of the Q_eval network are trained and updated by back-propagation every h steps, yielding a new neural network Q_eval. The specific steps are as follows:
S1-21: the shortest Dubins curve distance l_Dubins between two points is calculated as

l_Dubins = min{ l_LSL, l_RSR, l_LSR, l_RSL, l_RLR, l_LRL }    (3);

in formula (3), α and β are the incidence angles of the two points, d is the straight-line distance between the two points, r is the turning radius of the Dubins curve, R denotes a clockwise turn, S denotes straight-line motion and L denotes an anticlockwise turn, so that each candidate length corresponds to one admissible sequence of turning and straight segments;
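A minimal Python sketch of this distance computation follows. It takes the minimum over the four CSC path words LSL, RSR, LSR and RSL only; omitting the CCC words RLR and LRL, which matter when the two points are very close relative to r, is an assumption made here for brevity, and the segment-length formulas used are the standard Dubins expressions rather than anything quoted from the patent.

```python
import math

def _mod2pi(x):
    return x % (2.0 * math.pi)

def dubins_shortest_length(p1, theta1, p2, theta2, r):
    """Shortest Dubins distance between (p1, theta1) and (p2, theta2) with
    turning radius r, minimising over the CSC words LSL/RSR/LSR/RSL."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    d = math.hypot(dx, dy) / r                 # normalised straight-line distance
    phi = math.atan2(dy, dx)
    a, b = _mod2pi(theta1 - phi), _mod2pi(theta2 - phi)
    sa, ca, sb, cb = math.sin(a), math.cos(a), math.sin(b), math.cos(b)
    cab = math.cos(a - b)

    lengths = []

    # LSL
    p_sq = 2 + d * d - 2 * cab + 2 * d * (sa - sb)
    if p_sq >= 0:
        tmp = math.atan2(cb - ca, d + sa - sb)
        lengths.append(_mod2pi(-a + tmp) + math.sqrt(p_sq) + _mod2pi(b - tmp))

    # RSR
    p_sq = 2 + d * d - 2 * cab + 2 * d * (sb - sa)
    if p_sq >= 0:
        tmp = math.atan2(ca - cb, d - sa + sb)
        lengths.append(_mod2pi(a - tmp) + math.sqrt(p_sq) + _mod2pi(-b + tmp))

    # LSR
    p_sq = -2 + d * d + 2 * cab + 2 * d * (sa + sb)
    if p_sq >= 0:
        p = math.sqrt(p_sq)
        tmp = math.atan2(-ca - cb, d + sa + sb) - math.atan2(-2.0, p)
        lengths.append(_mod2pi(-a + tmp) + p + _mod2pi(-b + tmp))

    # RSL
    p_sq = d * d - 2 + 2 * cab - 2 * d * (sa + sb)
    if p_sq >= 0:
        p = math.sqrt(p_sq)
        tmp = math.atan2(ca + cb, d - sa - sb) - math.atan2(2.0, p)
        lengths.append(_mod2pi(a - tmp) + p + _mod2pi(b - tmp))

    if not lengths:
        raise ValueError("points too close for the CSC-only sketch; CCC words needed")
    return r * min(lengths)
```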
when there is no obstacle between any two task points P_1 and P_2, the vector coordinates of the two task points are substituted into formula (3) to calculate their shortest Dubins curve distance l_Dubins^{P1P2};

when a static obstacle or a dynamic obstacle lies between any two task points P_1 and P_2, their shortest Dubins curve distance l_Dubins^{P1P2} is calculated as follows:
as shown in FIG. 2, a circle C with radius r is first formed by using the center of the dynamic obstacle or the static obstacle as the center of a circle2Wherein r is the turning radius of the Dubins curve; then, the unmanned plane moves towards the circle C from the moving direction of the position2Making tangent lines to obtain common tangent points
Figure BDA0003331688900000094
And a vector
Figure BDA0003331688900000095
(Vector)
Figure BDA0003331688900000096
Expressed as:
Figure BDA0003331688900000097
wherein the content of the first and second substances,
Figure BDA0003331688900000101
respectively two common tangent points
Figure BDA0003331688900000102
Is determined by the coordinate of (a) in the space,
Figure BDA0003331688900000103
is two common tangent points
Figure BDA0003331688900000104
Angle of incidence of;
according to the vector coordinates of the two task points P_1 and P_2 and the vector p_t, the shortest Dubins curve distance l_Dubins^{P1P2} between the two task points is calculated as shown in formula (5):

l_Dubins^{P1P2} = l_Dubins^{P1,Pt1} + l_Dubins^{Pt1,Pt2} + l_Dubins^{Pt2,P2}    (5);

where the vector coordinates of the task points are those of the current task point P_1 and the next task point P_2, and the segment distances l_Dubins^{P1,Pt1}, l_Dubins^{Pt1,Pt2} and l_Dubins^{Pt2,P2} are each obtained by calculation according to formula (3);
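A short Python sketch of the tangent-point construction behind formula (4) is given below. Taking the heading of each tangent leg as the incidence angle at the corresponding tangent point is an assumption made here for illustration; the three legs of formula (5) would then each be evaluated with formula (3).

```python
import math

def tangent_points(p, obstacle_center, r):
    """Tangent points of the two lines from point p to the circle C2 of
    radius r centred on the obstacle.  Returns each tangent point together
    with the heading of its tangent leg, used here as the incidence angle."""
    px, py = p
    ox, oy = obstacle_center
    d = math.hypot(px - ox, py - oy)
    if d <= r:
        raise ValueError("point lies inside the avoidance circle C2")
    # Angle from the obstacle centre to p, and the half-angle subtended by the
    # tangent points as seen from the centre (right triangle with legs r and
    # the tangent length).
    base = math.atan2(py - oy, px - ox)
    half = math.acos(r / d)
    points = []
    for sign in (+1, -1):
        ang = base + sign * half
        tx, ty = ox + r * math.cos(ang), oy + r * math.sin(ang)
        heading = math.atan2(ty - py, tx - px)  # direction of the tangent leg from p
        points.append(((tx, ty), heading))
    return points  # [(P_t1, theta_t1), (P_t2, theta_t2)] as in formula (4)
```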
s1-22: according to the shortest Dubins curve distance between two task points
Figure BDA00033316889000001010
Calculating the prize value p as shown in equation (6):
Figure BDA00033316889000001011
in the formula (6), γ1The discount coefficient is set to be 0.1 and is used for preventing gradient explosion caused by too large difference of training data;
s1-23: calculating the Loss value of the Loss function by using the reward value p calculated in the step S1-22, wherein the Loss function is expressed by the formula (7):
Figure BDA00033316889000001012
in the formula (7), the reaction mixture is,
Figure BDA00033316889000001013
neural network Q for deep reinforcement learningevalApproximate Q value of output, sjIs the state of the jth data, ajFor the j-th data, ω is the deep reinforcement learning neural network QevalParameter of (2) need to be trained, yjA Q value calculated for the drone by the instant prize value, as shown in equation (8):
Figure BDA00033316889000001014
in the formula (8), ρjIs shown in state sjTaking action ajInstant prize value, gamma, obtained2For the discount coefficient, set to 0.01,
Figure BDA0003331688900000111
neural network Q for deep reinforcement learningtargetPredicted at state s'j+1Take action a'jMaximum Q value obtainable, wherein, state s'j+1Is the state s in the formula (8)jTaking action ajRear state, a'jIs unmanned plane at state s'j+1An action capable of obtaining a maximum Q value;
S1-24: the network parameters ω of the Q_eval network are trained and updated by back-propagation according to the Loss value obtained in step S1-23; in addition, every 5×h steps the parameters of the neural network Q_eval are copied to the neural network Q_target to update it.
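A minimal PyTorch sketch of one training step covering formulas (7) and (8) and the periodic copy of step S1-24 follows. The two networks, the optimizer, the replay-sampling machinery and the value of h are assumptions supplied by the caller; only the squared-error loss, the target y_j with γ_2 = 0.01 and the copy every 5×h steps come from the text above.

```python
import torch
import torch.nn as nn

GAMMA2 = 0.01   # discount coefficient gamma_2 of formula (8)
H = 100         # back-propagation interval h (example value)

def dqn_update(q_eval, q_target, optimizer, batch, step):
    """One update of Q_eval; intended to be invoked every h steps.

    `batch` holds tensors (states, actions, rewards, next_states) sampled from
    experience; actions must be int64 indices into the N*24 action outputs.
    """
    states, actions, rewards, next_states = batch

    # Q_eval(s_j, a_j; omega): Q values of the actions actually taken.
    q_sa = q_eval(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # y_j = rho_j + gamma_2 * max_a' Q_target(s'_{j+1}, a')    (formula (8))
    with torch.no_grad():
        y = rewards + GAMMA2 * q_target(next_states).max(dim=1).values

    # Loss(omega) = mean of (y_j - Q_eval(s_j, a_j; omega))^2   (formula (7))
    loss = nn.functional.mse_loss(q_sa, y)

    optimizer.zero_grad()
    loss.backward()        # back-propagate to update omega (step S1-24)
    optimizer.step()

    # Every 5*h steps, copy the Q_eval parameters into Q_target.
    if step % (5 * H) == 0:
        q_target.load_state_dict(q_eval.state_dict())

    return loss.item()
```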
In the invention, the tangent lines are drawn as shown in FIG. 2 both when a static obstacle and when a dynamic obstacle is encountered, and the shortest Dubins curve distance l_Dubins^{P1P2} of the two task points is calculated in the same way. This is because the action selected by the action selection strategy of the deep reinforcement learning determines at what angle the next point is approached, so the incidence angles of the two points are known, the coordinates of the obstacle are known, and the calculation is identical. Therefore, when a dynamic obstacle is detected, the unmanned aerial vehicle sends the coordinates of the dynamic obstacle to the base; with these coordinates and the incidence angles of the two points available, the tangent construction and the calculation method are the same as for a static obstacle.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (4)

1. A UAV path planning method based on deep reinforcement learning under a kinematic constraint condition is characterized by comprising the following specific steps:
S1: when the unmanned aerial vehicle is at the base, the shortest path of the unmanned aerial vehicle under the kinematic constraint is obtained with the deep reinforcement learning neural network according to the vector coordinates of the plurality of task points and the static obstacles;
S2: after take-off, the unmanned aerial vehicle flies along the shortest path and executes its tasks;
S3: during task execution, when the radar on the unmanned aerial vehicle detects a dynamic obstacle within 5 km, the unmanned aerial vehicle transmits the vector coordinates of the dynamic obstacle and the remaining task points to the base by radio and keeps flying along the original path until it receives the base's feedback signal; the supercomputer of the base, according to the time t_0 from the unmanned aerial vehicle sending the signal to receiving the reply, predicts the position of the unmanned aerial vehicle at the moment it receives the signal;
S4: the supercomputer of the base uses the deep reinforcement learning neural network to output the Q values of all actions from the coordinates of the dynamic obstacle and the remaining task points, generates a new action selection strategy ε-greedy from these Q values, selects actions according to this strategy to obtain a new flight path, and sends the new flight path to the unmanned aerial vehicle by radio;
S5: after receiving the feedback signal, the unmanned aerial vehicle executes its tasks along the new path, finally returns to the base after all tasks have been executed, and thereby completes its mission.
2. The method for planning UAV path based on deep reinforcement learning under the kinematic constraint condition of claim 1, wherein the specific steps of using the deep reinforcement learning neural network to derive the shortest path of the drone under the kinematic constraint in step S1 are as follows:
S1-1: when the unmanned aerial vehicle is at the base, the N target task points are numbered 1, 2, 3 … N in sequence and the base is numbered 0; the dimension of the state vector of the unmanned aerial vehicle is set to N+2, the first entry of the state vector is 0 and represents the base number, the last entry is θ_i, the incidence angle at the current task point numbered i, and the middle entries are updated to the task point numbers as the unmanned aerial vehicle reaches them, so that the initial state vector of the unmanned aerial vehicle at the base is:

s_initial = [0, 0, 0, …, 0, θ_0]^T    (1);

where the first 0 represents the base number, the remaining 0 entries represent the initial state in which no task point has been reached, and θ_0 represents the incidence angle of the unmanned aerial vehicle at base 0;

S1-2: the state vector of the unmanned aerial vehicle is used as the input of the deep reinforcement learning neural network, which works out which action to select so that the total distance under the kinematic constraint is shortest, i.e. the Q value is largest, and generates the action selection strategy ε-greedy;

S1-3: the deep reinforcement learning neural network selects actions according to the action selection strategy ε-greedy, deciding which task point to go to and at what angle to fly out: when the random number is smaller than ε it explores randomly, and when the random number is greater than or equal to ε it selects the action with the largest Q value, so that the state of the unmanned aerial vehicle is updated to:

s_bcd = [0, b, c, d, 0, …, 0, θ_d]^T    (2);

where b, c and d are the numbers of the task points reached by the unmanned aerial vehicle in sequence and θ_d is the incidence angle at the task point numbered d; the state vector of the unmanned aerial vehicle is therefore the sequence of numbers of the task points it has flown to, and it is updated once every time an action is taken.
3. The UAV path planning method based on deep reinforcement learning under the kinematic constraint condition according to claim 2, wherein the deep reinforcement learning neural network comprises two neural networks of the same structure: a neural network Q_eval and a neural network Q_target; at initialization the parameter weights of the two networks are identical, and thereafter, while the neural network Q_eval generates the action selection strategy ε-greedy, the network parameters ω of the Q_eval network are trained and updated by back-propagation every h steps, yielding a new neural network Q_eval; the specific steps are as follows:
s1-21: shortest Dubins curve distance l between two pointsDubinsThe calculation formula of (a) is as follows:
Figure FDA0003331688890000031
in the formula (3), alpha and beta are incident angles of two points respectively, d is a linear distance between the two points, R is a turning radius of a Dubins curve, R represents clockwise motion, S represents linear motion, and L represents anticlockwise motion;
when there is no obstacle between any two task points P_1 and P_2, the vector coordinates of the two task points are substituted into formula (3) to calculate their shortest Dubins curve distance l_Dubins^{P1P2};

when a static obstacle or a dynamic obstacle lies between any two task points P_1 and P_2, their shortest Dubins curve distance l_Dubins^{P1P2} is calculated as follows:
firstly, a circle C_2 of radius r is drawn with the centre of the dynamic or static obstacle as its centre, where r is the turning radius of the Dubins curve; then tangent lines are drawn from the position of the unmanned aerial vehicle, along its direction of motion, to the circle C_2, giving two common tangent points P_t1 and P_t2 and a vector p_t, expressed as

p_t = [x_t1, y_t1, θ_t1, x_t2, y_t2, θ_t2]^T    (4);

where (x_t1, y_t1) and (x_t2, y_t2) are the coordinates of the two common tangent points P_t1 and P_t2, and θ_t1 and θ_t2 are the incidence angles at the two common tangent points P_t1 and P_t2;
according to the vector coordinates of the two task points P_1 and P_2 and the vector p_t, the shortest Dubins curve distance l_Dubins^{P1P2} between the two task points is calculated as shown in formula (5):

l_Dubins^{P1P2} = l_Dubins^{P1,Pt1} + l_Dubins^{Pt1,Pt2} + l_Dubins^{Pt2,P2}    (5);

where the vector coordinates of the task points are those of the current task point P_1 and the next task point P_2, and the segment distances l_Dubins^{P1,Pt1}, l_Dubins^{Pt1,Pt2} and l_Dubins^{Pt2,P2} are each obtained by calculation according to formula (3);
s1-22: according to the shortest Dubins curve distance between two task points
Figure FDA0003331688890000043
Calculating the prize value p as shown in equation (6):
Figure FDA0003331688890000044
in the formula (6), γ1The discount coefficient is set to be 0.1 and is used for preventing gradient explosion caused by too large difference of training data;
s1-23: calculating the Loss value of the Loss function by using the reward value p calculated in the step S1-22, wherein the Loss function is expressed by the formula (7):
Figure FDA0003331688890000045
in the formula (7), the reaction mixture is,
Figure FDA0003331688890000046
neural network Q for deep reinforcement learningevalOf the outputApproximate Q value, sjIs the state of the jth data, ajFor the j-th data, ω is the deep reinforcement learning neural network QevalParameter of (2) need to be trained, yjA Q value calculated for the drone by the instant prize value, as shown in equation (8):
Figure FDA0003331688890000047
in the formula (8), ρjIs shown in state sjTaking action ajInstant prize value, gamma, obtained2For the discount coefficient, set to 0.01,
Figure FDA0003331688890000048
neural network Q for deep reinforcement learningtargetPredicted at state s'j+1Take action a'jMaximum Q value obtainable, wherein, state s'j+1Is the state s in the formula (8)jTaking action ajRear state, a'jIs unmanned plane at state s'j+1An action capable of obtaining a maximum Q value;
S1-24: the network parameters ω of the Q_eval network are trained and updated by back-propagation according to the Loss value obtained in step S1-23; in addition, every 5×h steps the parameters of the neural network Q_eval are copied to the neural network Q_target to update it.
4. The UAV path planning method based on deep reinforcement learning under the kinematic constraint condition according to claim 3, wherein the neural network Q_eval and the neural network Q_target each comprise 3 convolutional layers and 3 fully-connected layers; each convolutional layer has a convolution kernel size of 4×4 and a stride of 3×3, and the number of output actions is N×24.
CN202111282488.8A 2021-11-01 2021-11-01 UAV path planning method based on deep reinforcement learning under kinematic constraint condition Active CN114003059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111282488.8A CN114003059B (en) 2021-11-01 2021-11-01 UAV path planning method based on deep reinforcement learning under kinematic constraint condition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111282488.8A CN114003059B (en) 2021-11-01 2021-11-01 UAV path planning method based on deep reinforcement learning under kinematic constraint condition

Publications (2)

Publication Number Publication Date
CN114003059A (en) 2022-02-01
CN114003059B CN114003059B (en) 2024-04-16

Family

ID=79926040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111282488.8A Active CN114003059B (en) 2021-11-01 2021-11-01 UAV path planning method based on deep reinforcement learning under kinematic constraint condition

Country Status (1)

Country Link
CN (1) CN114003059B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115268494A (en) * 2022-07-26 2022-11-01 江苏科技大学 Unmanned aerial vehicle path planning method based on layered reinforcement learning
CN116661501A (en) * 2023-07-24 2023-08-29 北京航空航天大学 Unmanned aerial vehicle cluster high dynamic environment obstacle avoidance and moving platform landing combined planning method
CN116683349A (en) * 2023-06-27 2023-09-01 国网青海省电力公司海北供电公司 Correction method and system for power equipment sky inspection line and inspection unmanned aerial vehicle

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106406346A (en) * 2016-11-01 2017-02-15 北京理工大学 Plan method for rapid coverage track search coordinated by multiple UAVs (Unmanned Aerial Vehicles)
CN109685237A (en) * 2017-10-19 2019-04-26 北京航空航天大学 A kind of real-time planing method of unmanned aerial vehicle flight path based on the path Dubins and branch and bound
CN110362089A (en) * 2019-08-02 2019-10-22 大连海事大学 A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN110470301A (en) * 2019-08-13 2019-11-19 上海交通大学 Unmanned plane paths planning method under more dynamic task target points
CN110488872A (en) * 2019-09-04 2019-11-22 中国人民解放军国防科技大学 A kind of unmanned plane real-time route planing method based on deeply study
WO2020056875A1 (en) * 2018-09-20 2020-03-26 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning
CN111027143A (en) * 2019-12-18 2020-04-17 四川大学 Shipboard aircraft approach guiding method based on deep reinforcement learning
CN112947594A (en) * 2021-04-07 2021-06-11 东北大学 Unmanned aerial vehicle-oriented flight path planning method
CN113064422A (en) * 2021-03-09 2021-07-02 河海大学 Autonomous underwater vehicle path planning method based on double neural network reinforcement learning
CN113190039A (en) * 2021-04-27 2021-07-30 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106406346A (en) * 2016-11-01 2017-02-15 北京理工大学 Plan method for rapid coverage track search coordinated by multiple UAVs (Unmanned Aerial Vehicles)
CN109685237A (en) * 2017-10-19 2019-04-26 北京航空航天大学 A kind of real-time planing method of unmanned aerial vehicle flight path based on the path Dubins and branch and bound
WO2020056875A1 (en) * 2018-09-20 2020-03-26 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning
CN110362089A (en) * 2019-08-02 2019-10-22 大连海事大学 A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN110470301A (en) * 2019-08-13 2019-11-19 上海交通大学 Unmanned plane paths planning method under more dynamic task target points
CN110488872A (en) * 2019-09-04 2019-11-22 中国人民解放军国防科技大学 A kind of unmanned plane real-time route planing method based on deeply study
CN111027143A (en) * 2019-12-18 2020-04-17 四川大学 Shipboard aircraft approach guiding method based on deep reinforcement learning
CN113064422A (en) * 2021-03-09 2021-07-02 河海大学 Autonomous underwater vehicle path planning method based on double neural network reinforcement learning
CN112947594A (en) * 2021-04-07 2021-06-11 东北大学 Unmanned aerial vehicle-oriented flight path planning method
CN113190039A (en) * 2021-04-27 2021-07-30 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MINGSHENG GAO: "UAV path planning with kinematic constraints based on deep reinforcement learning", EVENT: 4TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE, 1 August 2022 (2022-08-01), pages 1 - 10 *
XIANG Xiaojia (相晓嘉): "Coordinated control method for fixed-wing UAV formations based on deep reinforcement learning", Acta Aeronautica et Astronautica Sinica, vol. 42, no. 4, 30 April 2021 (2021-04-30), pages 1 - 14 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115268494A (en) * 2022-07-26 2022-11-01 江苏科技大学 Unmanned aerial vehicle path planning method based on layered reinforcement learning
CN116683349A (en) * 2023-06-27 2023-09-01 国网青海省电力公司海北供电公司 Correction method and system for power equipment sky inspection line and inspection unmanned aerial vehicle
CN116683349B (en) * 2023-06-27 2024-01-26 国网青海省电力公司海北供电公司 Correction method and system for power equipment sky inspection line and inspection unmanned aerial vehicle
CN116661501A (en) * 2023-07-24 2023-08-29 北京航空航天大学 Unmanned aerial vehicle cluster high dynamic environment obstacle avoidance and moving platform landing combined planning method
CN116661501B (en) * 2023-07-24 2023-10-10 北京航空航天大学 Unmanned aerial vehicle cluster high dynamic environment obstacle avoidance and moving platform landing combined planning method

Also Published As

Publication number Publication date
CN114003059B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN109992000B (en) Multi-unmanned aerial vehicle path collaborative planning method and device based on hierarchical reinforcement learning
CN114003059A (en) UAV path planning method based on deep reinforcement learning under kinematic constraint condition
US11794898B2 (en) Air combat maneuvering method based on parallel self-play
CN110991972B (en) Cargo transportation system based on multi-agent reinforcement learning
CN110488872B (en) Unmanned aerial vehicle real-time path planning method based on deep reinforcement learning
Duan et al. Hybrid particle swarm optimization and genetic algorithm for multi-UAV formation reconfiguration
CN110488859B (en) Unmanned aerial vehicle route planning method based on improved Q-learning algorithm
Liu et al. Multi-UAV path planning based on fusion of sparrow search algorithm and improved bioinspired neural network
Sahingoz Flyable path planning for a multi-UAV system with Genetic Algorithms and Bezier curves
US20210325891A1 (en) Graph construction and execution ml techniques
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
CN112947592B (en) Reentry vehicle trajectory planning method based on reinforcement learning
Cao et al. Hunting algorithm for multi-auv based on dynamic prediction of target trajectory in 3d underwater environment
CN113268074B (en) Unmanned aerial vehicle flight path planning method based on joint optimization
CN113093733B (en) Sea-to-sea striking method for unmanned boat cluster
CN113065709A (en) Cross-domain heterogeneous cluster path planning method based on reinforcement learning
Xiang et al. An effective memetic algorithm for UAV routing and orientation under uncertain navigation environments
Ke et al. Cooperative path planning for air–sea heterogeneous unmanned vehicles using search-and-tracking mission
Xue et al. Multi-agent deep reinforcement learning for uavs navigation in unknown complex environment
Jin et al. Cooperative path planning with priority target assignment and collision avoidance guidance for rescue unmanned surface vehicles in a complex ocean environment
CN114967721A (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
Sun et al. Cooperative strategy for pursuit-evasion problem with collision avoidance
Liang et al. Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN–LSTM fusion network
Zeng et al. Path planning for rendezvous of multiple AUVs operating in a variable ocean
Ali et al. Feature selection-based decision model for UAV path planning on rough terrains

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant