CN112435275A - Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm - Google Patents
- Publication number: CN112435275A
- Application number: CN202011440212.3A
- Authority: CN (China)
- Legal status: Pending (status assigned by Google Patents; not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Operations Research (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
- Feedback Control In General (AREA)
Abstract
The invention provides an unmanned aerial vehicle (UAV) maneuvering target tracking method that integrates Kalman filtering with the DDQN algorithm. Kalman filtering is used to accurately estimate the target's motion state and obtain its position and velocity; this estimate is combined with the UAV's own state information as the neural network input, the UAV's acceleration and yaw rate serve as the action output, the flight policy network is trained through DDQN learning, and autonomous tracking decisions of the UAV for a maneuvering target are realized. The method effectively mitigates the error of direct sensor range measurement in traditional UAV target tracking tasks, has high application value, and effectively alleviates the over-estimation problem of the traditional DQN algorithm.
Description
Technical Field
The invention relates to the field of control, in particular to a method for tracking a maneuvering target of an unmanned aerial vehicle. It combines the Kalman filtering algorithm with the DDQN algorithm from deep reinforcement learning in computer science, and is an interdisciplinary method application.
Background
The Unmanned Aerial Vehicle (UAV), as a novel aerial vehicle, has become a practical and effective tool in the military, civil and scientific research fields, and plays an important role in upgrading the aviation industry, integrating military and civil technology, and improving industrial efficiency. In practical applications, the UAV often faces task scenarios such as cooperative cluster flight or ground target tracking, which place high demands on manual operation, task allocation and flight path planning technology. Therefore, a high-precision method for autonomous UAV maneuvering target tracking is of great significance.
When executing a flight task, the UAV often needs to cooperate with a designated target or track a maneuvering target to perform reconnaissance, and the flight paths of multiple aircraft must be planned in advance. Considering that UAV operation places high demands on manual control and frequently faces dynamic external disturbances, enabling the UAV to learn autonomously and complete the tracking of a maneuvering target in an unknown dynamic environment is of great significance. Patent publication CN110610512A proposes a UAV target tracking method based on a BP neural network fused with a Kalman filtering algorithm, which predicts the target flight trajectory through Kalman filtering, predicts the target position with a BP neural network, and controls the UAV through PID. However, that method only exploits the fitting capability of the neural network, does not give the UAV a learning capability, and cannot be applied in a dynamically changing environment. Patent CN110806759A provides an aircraft route tracking method based on deep reinforcement learning, which completes route tracking by constructing a Markov decision process model combined with a deep reinforcement learning algorithm; however, in that method the aircraft can only fly to each task point in sequence according to the provided information, cannot track a maneuvering target with an unknown motion trajectory, and is therefore limited.
The Kalman filtering algorithm is a common method in control theory and control engineering. It estimates the true value from observed and predicted data, can estimate the motion state of a maneuvering target in real time, and effectively improves the accuracy of target state prediction. As an optimization of the deep reinforcement learning DQN algorithm, DDQN not only gives the UAV a learning capability for completing a maneuvering target tracking task, but also effectively alleviates the over-estimation of action values, improving learning precision and efficiency. Therefore, a UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm is of great significance for the UAV to autonomously complete high-precision trajectory prediction and real-time tracking of a maneuvering target.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm. Kalman filtering is used to accurately estimate the target's motion state and obtain its position and velocity; this is combined with the UAV's own state information as the neural network input, the UAV's acceleration and yaw rate serve as the action output, the flight policy network is trained through DDQN learning, and autonomous tracking decisions of the UAV for a maneuvering target are realized.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: constructing a Markov decision process (MDP) model for tracking a maneuvering target of the unmanned aerial vehicle;
step 1-1: determining state variables in the MDP model:
the inertial navigation system is used to fly the unmanned aerial vehicle at a fixed altitude, and the state of the unmanned aerial vehicle in two-dimensional space is set as:
S1 = [x1, y1, v, θ]
wherein x1, y1 are the position coordinates of the unmanned aerial vehicle, v is its flight speed, and θ is its flight yaw angle;
the target state is set according to the sensor information:
S2 = [x2, y2, vx, vy]
wherein x2, y2 are the position coordinates of the target and vx, vy are the velocity components of the target along the X and Y axes; a Kalman filtering method is introduced to predict the target information at the next moment:
S2,t^- = F_t S2,t-1 + B_t u_t + w
wherein F_t is the state transition matrix, Δt is the update step, B_t is the control matrix, u_t is the state control vector, and w is the system process noise, w ~ N(0, Q_w), with Q_w the noise variance; the predicted observation-noise covariance P_t^- at time t is:
P_t^- = F_t P_{t-1} F_t^T + Q_w
where P_{t-1} is the observation-noise covariance matrix at time t-1;
the Kalman coefficient at time t is solved as:
K_t = P_t^- H^T (H P_t^- H^T + O)^{-1}
wherein H is the observation matrix and O is the observation-noise variance; the observed distance Z_t from the unmanned aerial vehicle to the target is calculated from the sensor information, and the noise covariance matrix and the predicted state are updated:
S2,t = S2,t^- + K_t (Z_t - H S2,t^-)
P_t = (I - K_t H) P_t^-
wherein I is the identity matrix, and the components of the updated S2,t are the Kalman-filter-optimized predicted values of the position and velocity components of the target along the X and Y axes; combining the unmanned aerial vehicle state with the target state prediction information, the state input of the MDP model is set as:
S = [x1, y1, v, θ, x̂2, ŷ2, v̂x, v̂y]
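The prediction and update equations above follow the standard Kalman-filter cycle. As a non-authoritative sketch, a minimal constant-velocity implementation is shown below; the concrete matrices F, H, Q_w, O and the omission of the control term B_t·u_t are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Sketch of the target-state Kalman filter for S2 = [x2, y2, vx, vy].
# The matrix values below are illustrative assumptions; the control term
# B_t u_t is omitted because the target's control input is unknown.
class TargetKalmanFilter:
    def __init__(self, dt=0.1):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # state transition F_t
        self.H = np.eye(2, 4)          # observe position (x2, y2) only
        self.Qw = np.eye(4) * 1e-3     # process-noise covariance Q_w (assumed)
        self.O = np.eye(2) * 1e-1      # observation-noise covariance O (assumed)
        self.x = np.zeros(4)           # state estimate S2
        self.P = np.eye(4)             # estimate covariance P

    def predict(self):
        self.x = self.F @ self.x                        # S2,t^- = F_t S2,t-1
        self.P = self.F @ self.P @ self.F.T + self.Qw   # P_t^- = F P F^T + Q_w
        return self.x

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.O         # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman coefficient K_t
        self.x = self.x + K @ (z - self.H @ self.x)     # correct with measurement Z_t
        self.P = (np.eye(4) - K @ self.H) @ self.P      # P_t = (I - K_t H) P_t^-
        return self.x
```

Calling `predict()` and then `update(z)` once per time step reproduces the covariance propagation and correction described above.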
step 1-2: defining the action space of the Markov model, i.e. the output action A of the unmanned aerial vehicle:
the output action A represents the set of actions the unmanned aerial vehicle takes with respect to its own state value after receiving the external feedback value; through the state input, the unmanned aerial vehicle adjusts its motion trajectory by changing its rate of speed change and rate of steering change, and the output action is set as:
A = [a_t, ω_t]
wherein a_t is the acceleration of the unmanned aerial vehicle at time t and ω_t is the flight yaw rate of the unmanned aerial vehicle at time t;
step 1-3: defining the reward function R of the Markov model:
sensor information about the unmanned aerial vehicle and the target position is acquired, and the reward function R is obtained by combining distance reward/penalty and obstacle-avoidance reward/penalty terms for the unmanned aerial vehicle; R represents the feedback value obtained when the unmanned aerial vehicle selects a certain action in the current state;
step 1-4: defining a discount factor γ:
setting a discount factor γ, where a larger discount factor emphasizes longer-term return; the total return of the whole learning process is:
R_all = R_1 + γR_2 + γ²R_3 + … + γ^{n-1}R_n
where R_n represents the reward value obtained by the drone at the nth step;
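As a minimal sketch, the total return formula above can be evaluated directly (the function name is illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    """Compute R_all = R_1 + γR_2 + γ²R_3 + … + γ^{n-1}R_n."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))
```

With γ close to 1, later rewards contribute almost as much as early ones; with small γ, the sum is dominated by immediate rewards.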
step 2: according to the MDP model constructed in the step 1, the DDQN algorithm is used for realizing the tracking training and control of the maneuvering target of the unmanned aerial vehicle:
step 2-1: constructing the main BP neural network of the DDQN algorithm, with parameters θ and state-action value function Q(s, a | θ), and copying the main network parameters to the target network θ', i.e. θ → θ';
step 2-2: setting the maximum number of training episodes E, the maximum number of steps per episode K, the experience replay queue size M, the target network soft-update coefficient τ and the neural network learning rate α; initializing the episode number e = 0;
step 2-3: discretizing the acceleration and the flight yaw rate of the unmanned aerial vehicle into a finite number of values each, forming the discrete action set;
step 2-4: initializing the step counter k = 0, the training time t = 0 and the state input s_0;
step 2-5: feeding each action in the action space together with the state s_t into the main network θ to obtain the Q-value outputs corresponding to all actions of the main network, and selecting the corresponding action a_t from the current Q-value outputs by the ε-greedy method;
step 2-6: executing the action, reading the MDP model and updating the environment information to obtain the current reward r_t and the unmanned aerial vehicle state input s_{t+1} at the next moment, and adding the experience tuple [s_t, a_t, r_t, s_{t+1}] to the experience replay queue;
step 2-7: calculating the DDQN target value Y_t in conjunction with the target network:
Y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a | θ) | θ')
wherein argmax_a Q(s_{t+1}, a | θ) is the action corresponding to the maximum Q-value output of the main network;
step 2-8: updating the main network:
θ ← θ + α [Y_t − Q(s_t, a_t | θ)] ∇_θ Q(s_t, a_t | θ)
wherein Q(s_t, a_t | θ) is the Q-value obtained by taking action a_t in state s_t, ∇_θ denotes the gradient operator, the update is propagated back through the neural network gradient, and α is the neural network learning rate;
step 2-9: updating the target network:
θ′←τθ+(1-τ)θ′
τ represents an update scale factor;
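Steps 2-7 through 2-9 can be sketched as follows, assuming the Q-value outputs are plain arrays and the network parameters are flat vectors; the function names are illustrative, not from the patent:

```python
import numpy as np

def ddqn_target(r_t, q_next_main, q_next_target, gamma=0.9):
    """Double-DQN target: the main network selects the action,
    the target network evaluates it:
    Y_t = r_t + γ · Q(s_{t+1}, argmax_a Q(s_{t+1}, a | θ) | θ')."""
    a_star = int(np.argmax(q_next_main))        # action chosen by main network
    return r_t + gamma * q_next_target[a_star]  # evaluated by target network

def soft_update(theta, theta_target, tau=0.01):
    """Target-network soft update: θ' ← τθ + (1 − τ)θ'."""
    return tau * theta + (1.0 - tau) * theta_target
```

Evaluating the selected action with the separate target network, rather than taking the maximum of the target network's own Q-values, is what suppresses the over-estimation of the plain DQN target.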
step 3: training the model by combining the MDP model and the DDQN algorithm:
step 3-1: incrementing the step counter k by 1 and judging: if k < K, update the time t ← t + Δt and return to step 2-4; otherwise go to step 3-2;
step 3-2: incrementing the episode number e by 1 and judging: if e < E, return to step 2-3; otherwise training is finished, go to step 3-3;
step 3-3: terminating the DDQN network training process and saving the current network parameters; the saved parameters are loaded into the unmanned aerial vehicle maneuvering target tracking system, and at every moment the state of the unmanned aerial vehicle and the Kalman-filtered target state are combined and input into the neural network; through the fitting of the DDQN neural network, an appropriate action is output to complete the tracking of the maneuvering target.
The reward function R is composed as follows:
a tracking reward r1 is set from the distances D_{t-1}, D_t between the unmanned aerial vehicle and the target at the previous moment t-1 and the current moment t;
a heading reward r2 is set from the relative azimuth angle between the unmanned aerial vehicle and the target;
a smoothness reward r3 is set from the speed change rate of the unmanned aerial vehicle at time t;
the MDP model reward function R is then set as the weighted sum:
R = λ1 r1 + λ2 r2 + λ3 r3
wherein λ1, λ2, λ3 are the respective reward weights.
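Since the exact expressions for r1, r2 and r3 appear only in the patent's figures, the sketch below is a hedged illustration in which each individual term is an assumption; only the weighted sum R = λ1r1 + λ2r2 + λ3r3 is taken from the text:

```python
import math

# Hypothetical reading of the three reward terms; the patent's exact
# formulas are not reproduced here, so each term below is an assumption.
def reward(d_prev, d_curr, rel_azimuth, dv, lambdas=(1.0, 0.5, 0.1)):
    r1 = d_prev - d_curr        # assumed distance-shaping term: positive when closing in
    r2 = math.cos(rel_azimuth)  # assumed heading-alignment term: largest when facing target
    r3 = -abs(dv)               # assumed penalty on abrupt speed change
    l1, l2, l3 = lambdas
    return l1 * r1 + l2 * r2 + l3 * r3   # R = λ1 r1 + λ2 r2 + λ3 r3
```

The weights λ1, λ2, λ3 trade off closing distance against smooth, well-aligned flight; the values above are placeholders.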
The discount factor satisfies 0 < γ < 1.
The invention has the beneficial effects that:
(1) in the MDP model constructed by the method, Kalman filtering is introduced to process the state input, which improves the accuracy of target state prediction, effectively mitigates the error of direct sensor range measurement in traditional unmanned aerial vehicle target tracking tasks, and has high application value;
(2) the invention uses the DDQN algorithm, which effectively alleviates the over-estimation problem of the traditional DQN algorithm. The unmanned aerial vehicle obtains an accurate target state estimate through Kalman filtering, combines it with its own discretized state, and outputs an appropriate action through the network to complete the tracking of the maneuvering target; an unmanned aerial vehicle trained with the DDQN algorithm can cope with a dynamically changing environment and complete the tracking of a maneuvering target.
Drawings
Fig. 1 is a flow chart of unmanned aerial vehicle training based on kalman filtering and DDQN algorithm.
Fig. 2 is a schematic diagram of unmanned aerial vehicle maneuvering target tracking based on kalman filtering and DDQN algorithm.
Fig. 3 is a task display diagram of unmanned aerial vehicle maneuvering target tracking.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention provides an unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm, and the whole flow is shown in figure 1. The technical solution is further clearly and completely described below with reference to the accompanying drawings and specific embodiments:
step 1: construction of Markov (MDP) model for unmanned aerial vehicle maneuvering target tracking
Step 1-1: determining state variables in the MDP model:
the inertial navigation system is used for carrying out fixed-height flight on the unmanned aerial vehicle, and the state of the unmanned aerial vehicle in a two-dimensional space is set:
S1=[x1,y1,v,θ]
wherein: x is the number of1,y1The position coordinate of the unmanned aerial vehicle is represented, v is the flight speed of the unmanned aerial vehicle, and theta is the flight yaw angle of the unmanned aerial vehicle;
setting a target state according to the sensor information:
wherein: x is the number of2,y2The position coordinates of the object are represented and,is the velocity component of the target along axis X, Y; introducing a Kalman filtering method, and predicting the information of the target at the next moment:
wherein the content of the first and second substances,is in a stateTransition matrix, Δ t as update step, BtTo control the matrix, utIs a state control vector, w is the system process noise, w-N (0, Q)w),QwIs the noise variance. Prediction of observed noise P at state instantstComprises the following steps:
Pt=FtPt-1Ft T+Qw
solving the Kalman coefficient at the moment t as follows:
according to the sensor information, calculating the observation distance Z of the unmanned aerial vehicle to the target:
wherein H is an observation matrix, and O is an observation noise variance; updating the noise covariance matrix and the prediction state:
wherein, I is an identity matrix,respectively, the predicted values of the position component and the velocity component of the kalman filter optimized target along the X, Y axis. Combining the state of the unmanned aerial vehicle and the target state prediction information, setting the state input in the MDP model as follows:
step 1-2: defining the action space of the Markov model, i.e. the output action A of the unmanned aerial vehicle:
the output action A represents the set of actions the unmanned aerial vehicle takes with respect to its own state value after receiving the external feedback value. Through the state input, the unmanned aerial vehicle adjusts its motion trajectory by changing its rate of speed change and rate of steering change. The output action is set as:
A = [a_t, ω_t]
wherein a_t is the acceleration of the unmanned aerial vehicle at time t and ω_t is the flight yaw rate of the unmanned aerial vehicle at time t;
step 1-3: defining the reward function R of the Markov model:
sensor information about the unmanned aerial vehicle and the target position is acquired, and the reward function R is obtained by combining distance reward/penalty and obstacle-avoidance reward/penalty terms; R represents the feedback value obtained when the unmanned aerial vehicle selects a certain action in the current state;
a tracking reward r1 is set from the distances D_{t-1}, D_t between the unmanned aerial vehicle and the target at the previous and current moments, a heading reward r2 from the relative azimuth angle between the unmanned aerial vehicle and the target, and a smoothness reward r3 from the speed change rate of the unmanned aerial vehicle at time t; combining the weights, the MDP model reward function is set as R = λ1 r1 + λ2 r2 + λ3 r3;
step 1-4: defining a discount factor γ:
setting the discount factor γ = 0.9, the total return of the whole learning process is:
R_all = R_1 + γR_2 + γ²R_3 + … + γ^{n-1}R_n
where R_n represents the reward value obtained by the drone at step n;
step 2: according to the MDP model constructed in the step 1, the DDQN algorithm is used for realizing the tracking training and control of the maneuvering target of the unmanned aerial vehicle, and the schematic diagram of the tracking of the maneuvering target of the unmanned aerial vehicle based on the DDQN algorithm is shown in FIG. 2:
step 2-1: constructing the main BP neural network of the DDQN algorithm, with parameters θ and state-action value function Q(s, a | θ), and copying the main network parameters to the target network θ', i.e. θ → θ';
step 2-2: setting the maximum number of training episodes E = 800, the maximum number of steps per episode K = 400, the experience replay queue size M = 20000, the target network soft-update coefficient τ = 0.01, and the neural network learning rate α = 0.001; initializing the episode number e = 0;
step 2-3: in this embodiment, the acceleration of the unmanned aerial vehicle (unit: m/s²) and its flight yaw rate (unit: deg/s) are each discretized into 7 values;
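A sketch of this discretization, with value ranges assumed for illustration (the patent specifies only the units and the count of 7):

```python
import numpy as np

# Discretize acceleration (m/s^2) and yaw rate (deg/s) into 7 values each,
# as in step 2-3; the ranges [-3, 3] and [-30, 30] are assumptions.
accels = np.linspace(-3.0, 3.0, 7)
yaw_rates = np.linspace(-30.0, 30.0, 7)

# The joint discrete action space: each action is an (acceleration, yaw-rate)
# pair, giving 7 × 7 = 49 actions for the Q-network to rank.
actions = [(a, w) for a in accels for w in yaw_rates]
```

Each Q-network output then corresponds to one of these 49 pairs, and the selected index is mapped back to the continuous control values.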
step 2-4: initializing the step counter k = 0, the training time t = 0, and the state input s_0 = [100, 120, 20, 0, 100, 200, 20, 0];
step 2-5: feeding each action in the action space together with the state s_t into the main network θ to obtain the Q-value outputs corresponding to all actions of the main network, and selecting the corresponding action a_t from the current Q-value outputs by the ε-greedy method;
step 2-6: executing the action, reading the MDP model and updating the environment information to obtain the current reward r_t and the unmanned aerial vehicle state input s_{t+1} at the next moment, and adding the experience tuple [s_t, a_t, r_t, s_{t+1}] to the experience replay queue;
step 2-7: calculating the DDQN target value Y_t in conjunction with the target network:
Y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a | θ) | θ')
wherein argmax_a Q(s_{t+1}, a | θ) is the action corresponding to the maximum Q-value output of the main network;
step 2-8: updating the main network:
θ ← θ + α [Y_t − Q(s_t, a_t | θ)] ∇_θ Q(s_t, a_t | θ)
wherein Q(s_t, a_t | θ) is the Q-value obtained by taking action a_t in state s_t, ∇_θ denotes the gradient operator, and the update is propagated back through the neural network gradient;
step 2-9: updating the target network:
θ′←0.01θ+(1-0.01)θ′
step 3: training the model by combining the MDP model and the DDQN algorithm:
step 3-1: incrementing the step counter k by 1 and judging: if k < K, update the time t ← t + Δt and return to step 2-4; otherwise go to step 3-2;
step 3-2: incrementing the episode number e by 1 and judging: if e < E, return to step 2-3; otherwise training is finished, go to step 3-3;
step 3-3: terminating the DDQN network training process and saving the current network parameters; the saved parameters are loaded into the unmanned aerial vehicle maneuvering target tracking system, and at every moment the state of the unmanned aerial vehicle and the Kalman-filtered target state are combined and input into the neural network; through the fitting of the DDQN neural network, an appropriate action is output to complete the tracking of the maneuvering target.
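The experience replay queue of size M = 20000 (steps 2-2 and 2-6) and the greedy action selection of step 2-5 can be sketched as follows; the ε exploration parameter and the class/function names are assumptions, as the patent does not state them:

```python
import random
from collections import deque

# Minimal experience-replay queue: when more than `capacity` tuples have been
# pushed, the oldest are evicted automatically (deque maxlen behaviour).
class ReplayQueue:
    def __init__(self, capacity=20000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        # store the experience tuple [s_t, a_t, r_t, s_{t+1}]
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform random minibatch for the main-network update
        return random.sample(self.buffer, batch_size)

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action with probability ε, else the argmax action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploit
```

Sampling minibatches from the replay queue decorrelates consecutive experiences, which stabilizes the Q-network training described in steps 2-6 through 2-8.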
In the unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and the DDQN algorithm, the target state is predicted by introducing Kalman filtering, and an MDP model is then constructed and loaded into the DDQN algorithm. In each episode, the unmanned aerial vehicle combines its own state information with the Kalman-filter-optimized target state information as the neural network input, and the network processes the state and outputs the resulting unmanned aerial vehicle action. Through continuous learning, the unmanned aerial vehicle finally completes the maneuvering target tracking task successfully.
With continuous training and learning, the unmanned aerial vehicle gradually learns to accurately predict the maneuvering target using Kalman filtering and to complete the tracking of the target using the DDQN algorithm. The simulation result is shown in fig. 3: the unmanned aerial vehicle trained with the Kalman filtering and DDQN algorithm keeps a small distance error to a complex maneuvering target and completes the tracking task.
The above description is only a preferred embodiment of the present invention, and it should be noted that the embodiments of the present invention are not limited to the implementation described above; other embodiments obtained through equivalent substitution or modification also fall within the scope of the invention.
Claims (4)
1. An unmanned aerial vehicle maneuvering target tracking method fused with Kalman filtering and DDQN algorithm is characterized by comprising the following steps:
step 1: constructing a Markov decision process (MDP) model for tracking a maneuvering target of the unmanned aerial vehicle;
step 1-1: determining state variables in the MDP model:
the inertial navigation system is used to fly the unmanned aerial vehicle at a fixed altitude, and the state of the unmanned aerial vehicle in two-dimensional space is set as:
S1 = [x1, y1, v, θ]
wherein x1, y1 are the position coordinates of the unmanned aerial vehicle, v is its flight speed, and θ is its flight yaw angle;
the target state is set according to the sensor information:
S2 = [x2, y2, vx, vy]
wherein x2, y2 are the position coordinates of the target and vx, vy are the velocity components of the target along the X and Y axes; a Kalman filtering method is introduced to predict the target information at the next moment:
S2,t^- = F_t S2,t-1 + B_t u_t + w
wherein F_t is the state transition matrix, Δt is the update step, B_t is the control matrix, u_t is the state control vector, and w is the system process noise, w ~ N(0, Q_w), with Q_w the noise variance; the predicted observation noise P_t^- at time t is:
P_t^- = F_t P_{t-1} F_t^T + Q_w
where P_{t-1} is the observation-noise covariance matrix at time t-1;
the Kalman coefficient at time t is solved as:
K_t = P_t^- H^T (H P_t^- H^T + O)^{-1}
wherein H is the observation matrix and O is the observation-noise variance; the observed distance Z_t from the unmanned aerial vehicle to the target is calculated from the sensor information, and the noise covariance matrix and the predicted state are updated:
S2,t = S2,t^- + K_t (Z_t - H S2,t^-)
P_t = (I - K_t H) P_t^-
wherein I is the identity matrix, and the components of the updated S2,t are the Kalman-filter-optimized predicted values of the position and velocity components of the target along the X and Y axes; combining the unmanned aerial vehicle state with the target state prediction information, the state input of the MDP model is set as:
S = [x1, y1, v, θ, x̂2, ŷ2, v̂x, v̂y]
step 1-2: defining the action space of the Markov model, i.e. the output action A of the unmanned aerial vehicle:
the output action A represents the set of actions the unmanned aerial vehicle takes with respect to its own state value after receiving the external feedback value; through the state input, the unmanned aerial vehicle adjusts its motion trajectory by changing its rate of speed change and rate of steering change, and the output action is set as:
A = [a_t, ω_t]
wherein a_t is the acceleration of the unmanned aerial vehicle at time t and ω_t is the flight yaw rate of the unmanned aerial vehicle at time t;
step 1-3: a reward function R defining a markov model:
sensor information about the unmanned aerial vehicle and the target position is acquired, and the reward function R is obtained by combining distance reward/penalty and obstacle-avoidance reward/penalty terms for the unmanned aerial vehicle; R represents the feedback value obtained when the unmanned aerial vehicle selects a certain action in the current state;
step 1-4: defining a discount factor γ:
setting a discount factor γ, where a larger discount factor emphasizes longer-term return; the total return of the whole learning process is:
R_all = R_1 + γR_2 + γ²R_3 + … + γ^{n-1}R_n
where R_n represents the reward value obtained by the drone at the nth step;
step 2: according to the MDP model constructed in the step 1, the DDQN algorithm is used for realizing the tracking training and control of the maneuvering target of the unmanned aerial vehicle:
step 2-1: constructing the main BP neural network of the DDQN algorithm, with parameters θ and state-action value function Q(s, a | θ), and copying the main network parameters to the target network θ', i.e. θ → θ';
step 2-2: setting the maximum number of training episodes E, the maximum number of steps per episode K, the experience replay queue size M, the target network soft-update coefficient τ and the neural network learning rate α; initializing the episode number e = 0;
step 2-3: discretizing the acceleration and the flight yaw rate of the unmanned aerial vehicle into a finite number of values each, forming the discrete action set;
step 2-4: initializing the step counter k = 0, the training time t = 0 and the state input s_0;
step 2-5: feeding each action in the action space together with the state s_t into the main network θ to obtain the Q-value outputs corresponding to all actions of the main network, and selecting the corresponding action a_t from the current Q-value outputs by a greedy method;
step 2-6: executing the action, reading the MDP model and updating the environment information to obtain the current reward r_t and the unmanned aerial vehicle state input s_{t+1} at the next moment, and adding the experience tuple [s_t, a_t, r_t, s_{t+1}] to the experience replay queue;
step 2-7: calculating the DDQN target value Y_t in conjunction with the target network:
Y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a | θ) | θ')
wherein argmax_a Q(s_{t+1}, a | θ) is the action corresponding to the maximum Q-value output of the main network;
step 2-8: updating the main network:
θ ← θ + α [Y_t − Q(s_t, a_t | θ)] ∇_θ Q(s_t, a_t | θ)
wherein Q(s_t, a_t | θ) is the Q-value obtained by taking action a_t in state s_t, ∇_θ denotes the gradient operator, the update is propagated back through the neural network gradient, and α is the neural network learning rate;
step 2-9: updating the target network:
θ′←τθ+(1-τ)θ′
wherein τ represents the soft-update proportionality coefficient;
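The soft update θ′ ← τθ + (1−τ)θ′ of step 2-9, applied parameter-wise:

```python
def soft_update(theta, theta_target, tau):
    """theta' <- tau*theta + (1 - tau)*theta', for each named parameter."""
    for name in theta:
        theta_target[name] = tau * theta[name] + (1.0 - tau) * theta_target[name]
    return theta_target
```

A small τ makes the target network trail the main network slowly, which stabilizes the bootstrapped target Y_t.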
and step 3: and (3) training the model by combining the MDP model and the DDQN algorithm:
step 3-1: incrementing the step counter k by 1 and judging: if k < K, updating the time t = t + Δt and returning to step 2-5; otherwise, proceeding to step 3-2;
step 3-2: incrementing the episode counter e by 1 and judging: if e < E, returning to step 2-3; otherwise, ending the training and proceeding to step 3-3;
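The episode/step bookkeeping of steps 3-1 and 3-2 amounts to two nested loops; a skeleton with illustrative values of E, K, and Δt:

```python
# E episodes of at most K steps each; dt is the simulation time step (values illustrative).
E, K, dt = 2, 3, 0.1
steps_run = 0
for e in range(E):           # step 3-2: episode loop, ends when e reaches E
    k, t = 0, 0.0            # step 2-4: reset step counter and training time
    while k < K:             # step 3-1: inner step loop
        # ... steps 2-5 to 2-9: act, store experience, update networks ...
        k += 1
        t += dt
        steps_run += 1
```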
step 3-3: terminating the DDQN network training process and saving the current network parameters; loading the saved parameters into the unmanned aerial vehicle maneuvering target tracking system; at each moment, inputting into the neural network the combined unmanned aerial vehicle state and the Kalman-filtered target state; and outputting a suitable action through the fitting of the DDQN neural network to complete tracking of the maneuvering target.
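Step 3-3 feeds the Kalman-filtered target state into the trained network at each moment. Below is a minimal constant-velocity Kalman filter sketch for a planar target state [x, y, vx, vy]; the actual filter model of claim 1 is not reproduced in this extraction, so the matrices F, H, Q_proc, R_meas here are assumptions.

```python
import numpy as np

dt = 0.1
# Constant-velocity transition and position-only measurement model (assumed)
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q_proc = 0.01 * np.eye(4)   # process noise covariance
R_meas = 0.1 * np.eye(2)    # measurement noise covariance

def kf_step(x, P, z):
    """One predict/update cycle; returns the filtered target state and covariance."""
    x = F @ x                                        # predict state
    P = F @ P @ F.T + Q_proc                         # predict covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R_meas)  # Kalman gain
    x = x + K @ (z - H @ x)                          # correct with measurement z
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

The filtered estimate x would then be concatenated with the UAV's own state to form the network input s_t.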
3. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and DDQN algorithm according to claim 1, characterized in that:
the reward function R is:
setting the tracking reward r1_track as:
r1_track = D_{t-1} − D_t
wherein D_{t-1} and D_t are the distances between the unmanned aerial vehicle and the target at the previous moment t-1 and the current moment t, respectively, so the reward is positive when the distance to the target decreases;
setting a second reward term in terms of the relative azimuth angle between the unmanned aerial vehicle and the target;
setting a third reward term in terms of the rate of change of the unmanned aerial vehicle speed at time t;
setting the MDP model reward function R as the weighted sum of the above reward terms:
R = λ1·r1 + λ2·r2 + λ3·r3
wherein λ1, λ2, λ3 are the respective reward weights.
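A sketch of the weighted reward R = λ1·r1 + λ2·r2 + λ3·r3 of claim 3. The exact azimuth and speed-change terms and the weight values are assumptions here, since their equations are in images not reproduced in this extraction.

```python
def reward(d_prev, d_now, azimuth_err, dv, lambdas=(1.0, 0.5, 0.1)):
    """Weighted MDP reward; lambdas are hypothetical weight values."""
    r_track = d_prev - d_now       # positive when the UAV closes on the target
    r_azimuth = -abs(azimuth_err)  # assumed form: penalize pointing error
    r_speed = -abs(dv)             # assumed form: penalize abrupt speed changes
    l1, l2, l3 = lambdas
    return l1 * r_track + l2 * r_azimuth + l3 * r_speed
```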
4. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and DDQN algorithm according to claim 1, characterized in that:
the discount factor γ satisfies 0 < γ < 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011440212.3A CN112435275A (en) | 2020-12-07 | 2020-12-07 | Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112435275A true CN112435275A (en) | 2021-03-02 |
Family
ID=74692387
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011440212.3A Pending CN112435275A (en) | 2020-12-07 | 2020-12-07 | Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112435275A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200380401A1 (en) * | 2019-05-29 | 2020-12-03 | United States Of America As Represented By The Secretary Of The Navy | Method for Performing Multi-Agent Reinforcement Learning in the Presence of Unreliable Communications Via Distributed Consensus |
CN110610512A (en) * | 2019-09-09 | 2019-12-24 | 西安交通大学 | Unmanned aerial vehicle target tracking method based on BP neural network fusion Kalman filtering algorithm |
CN110958135A (en) * | 2019-11-05 | 2020-04-03 | 东华大学 | Method and system for eliminating DDoS (distributed denial of service) attack in feature self-adaptive reinforcement learning |
CN111580544A (en) * | 2020-03-25 | 2020-08-25 | 北京航空航天大学 | Unmanned aerial vehicle target tracking control method based on reinforcement learning PPO algorithm |
CN111667513A (en) * | 2020-06-01 | 2020-09-15 | 西北工业大学 | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning |
CN111862165A (en) * | 2020-06-17 | 2020-10-30 | 南京理工大学 | Target tracking method for updating Kalman filter based on deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
RAHMAD SADLI: "Object Tracking: 2-D Object Tracking using Kalman Filter in Python", Object Detection, Object Tracking, Python Programming * |
XU HUANG et al.: "Attitude Control of Fixed-wing UAV Based on DDQN", 2019 Chinese Automation Congress (CAC) * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113093803A (en) * | 2021-04-03 | 2021-07-09 | 西北工业大学 | Unmanned aerial vehicle air combat motion control method based on E-SAC algorithm |
CN113093124A (en) * | 2021-04-07 | 2021-07-09 | 哈尔滨工程大学 | DQN algorithm-based real-time allocation method for radar interference resources |
CN113283516A (en) * | 2021-06-01 | 2021-08-20 | 西北工业大学 | Multi-sensor data fusion method based on reinforcement learning and D-S evidence theory |
CN113283516B (en) * | 2021-06-01 | 2023-02-28 | 西北工业大学 | Multi-sensor data fusion method based on reinforcement learning and D-S evidence theory |
CN113554680A (en) * | 2021-07-21 | 2021-10-26 | 清华大学 | Target tracking method and device, unmanned aerial vehicle and storage medium |
CN113625569A (en) * | 2021-08-12 | 2021-11-09 | 中国人民解放军32802部队 | Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving |
CN113625569B (en) * | 2021-08-12 | 2022-02-08 | 中国人民解放军32802部队 | Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model |
CN113962927A (en) * | 2021-09-01 | 2022-01-21 | 北京长木谷医疗科技有限公司 | Acetabulum cup position adjusting method and device based on reinforcement learning and storage medium |
CN113967909A (en) * | 2021-09-13 | 2022-01-25 | 中国人民解放军军事科学院国防科技创新研究院 | Mechanical arm intelligent control method based on direction reward |
CN114018250A (en) * | 2021-10-18 | 2022-02-08 | 杭州鸿泉物联网技术股份有限公司 | Inertial navigation method, electronic device, storage medium, and computer program product |
CN114018250B (en) * | 2021-10-18 | 2024-05-03 | 杭州鸿泉物联网技术股份有限公司 | Inertial navigation method, electronic device, storage medium and computer program product |
CN114089776A (en) * | 2021-11-09 | 2022-02-25 | 南京航空航天大学 | Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning |
CN114089776B (en) * | 2021-11-09 | 2023-10-24 | 南京航空航天大学 | Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning |
CN114954840A (en) * | 2022-05-30 | 2022-08-30 | 武汉理工大学 | Stability changing control method, system and device for stability changing ship and storage medium |
CN114954840B (en) * | 2022-05-30 | 2023-09-05 | 武汉理工大学 | Method, system and device for controlling stability of ship |
CN117111620A (en) * | 2023-10-23 | 2023-11-24 | 山东省科学院海洋仪器仪表研究所 | Autonomous decision-making method for task allocation of heterogeneous unmanned system |
CN117111620B (en) * | 2023-10-23 | 2024-03-29 | 山东省科学院海洋仪器仪表研究所 | Autonomous decision-making method for task allocation of heterogeneous unmanned system |
CN117271967A (en) * | 2023-11-17 | 2023-12-22 | 北京科技大学 | Rescue co-location method and system based on reinforcement learning compensation filtering |
CN117271967B (en) * | 2023-11-17 | 2024-02-13 | 北京科技大学 | Rescue co-location method and system based on reinforcement learning compensation filtering |
CN117707207A (en) * | 2024-02-06 | 2024-03-15 | 中国民用航空飞行学院 | Unmanned aerial vehicle ground target tracking and obstacle avoidance planning method based on deep reinforcement learning |
CN117707207B (en) * | 2024-02-06 | 2024-04-19 | 中国民用航空飞行学院 | Unmanned aerial vehicle ground target tracking and obstacle avoidance planning method based on deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112435275A (en) | Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm | |
CN109655066B (en) | Unmanned aerial vehicle path planning method based on Q (lambda) algorithm | |
CN111667513B (en) | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning | |
CN108803321B (en) | Autonomous underwater vehicle track tracking control method based on deep reinforcement learning | |
CN107479368B (en) | Method and system for training unmanned aerial vehicle control model based on artificial intelligence | |
Wu | Coordinated path planning for an unmanned aerial-aquatic vehicle (UAAV) and an autonomous underwater vehicle (AUV) in an underwater target strike mission | |
CN110320809B (en) | AGV track correction method based on model predictive control | |
CN100591900C (en) | Flight control system having a three control loop design | |
CN110908395A (en) | Improved unmanned aerial vehicle flight path real-time planning method | |
CN113268074B (en) | Unmanned aerial vehicle flight path planning method based on joint optimization | |
Nie et al. | Three-dimensional path-following control of a robotic airship with reinforcement learning | |
CN108664024A (en) | The motion planning and Cooperative Localization Method and device that unmanned vehicle network is formed into columns | |
Mansouri et al. | Distributed model predictive control for unmanned aerial vehicles | |
CN114859910A (en) | Unmanned ship path following system and method based on deep reinforcement learning | |
Wu et al. | An adaptive reentry guidance method considering the influence of blackout zone | |
CN115562357A (en) | Intelligent path planning method for unmanned aerial vehicle cluster | |
CN114967721A (en) | Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet | |
Chajan et al. | GPU based model-predictive path control for self-driving vehicles | |
Steinbrener et al. | Improved state propagation through AI-based pre-processing and down-sampling of high-speed inertial data | |
Miller et al. | Coordinated guidance of autonomous uavs via nominal belief-state optimization | |
Rottmann et al. | Adaptive autonomous control using online value iteration with gaussian processes | |
Wang et al. | Tracking moving target for 6 degree-of-freedom robot manipulator with adaptive visual servoing based on deep reinforcement learning PID controller | |
Wilson et al. | UAV rendezvous: From concept to flight test | |
CN114964268A (en) | Unmanned aerial vehicle navigation method and device | |
Chindhe et al. | Advances in vision-based UAV manoeuvring techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 2021-03-02 |