CN112435275A - Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm
- Publication number: CN112435275A
- Application number: CN202011440212.3A
- Authority: CN (China)
- Prior art keywords: unmanned aerial vehicle, target, state, DDQN
- Prior art date: 2020-12-07
- Legal status: Pending
Classifications
- G06T 7/20: Image analysis; analysis of motion
- G06F 17/11: Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06N 3/045: Neural networks; architecture; combinations of networks
- G06N 3/084: Neural networks; learning methods; backpropagation, e.g. using gradient descent
- G06T 2207/20081: Indexing scheme for image analysis; special algorithmic details; training or learning
Abstract
The invention provides an unmanned aerial vehicle (UAV) maneuvering target tracking method that integrates Kalman filtering with the DDQN algorithm. Kalman filtering accurately estimates the target motion state to obtain the target position and velocity; this estimate is combined with the UAV's own state as the input to a neural network, with the UAV's acceleration and angular velocity as the action outputs. The flight policy network is trained with the DDQN algorithm, enabling the UAV to make autonomous tracking decisions for a maneuvering target. The method effectively mitigates the error introduced by direct sensor distance measurement in conventional UAV target tracking tasks, giving it high application value, and it effectively alleviates the over-estimation problem of the conventional DQN algorithm.
Description
Technical Field
The invention relates to the field of control, and in particular to a method for tracking a maneuvering target with an unmanned aerial vehicle. It draws on the Kalman filtering algorithm and on the DDQN algorithm from deep reinforcement learning in computer science, and is an interdisciplinary method application.
Background
Unmanned aerial vehicles (UAVs), as a new class of aircraft, have become practical and effective tools in military, civil, and scientific research fields, and play an important role in upgrading the aviation industry, integrating military and civil technologies, and improving industrial efficiency. In practical applications, UAVs often face task scenarios such as cooperative swarm flight or ground target tracking, which place high demands on manual operation, task allocation, and flight path planning. An effective, high-precision method for autonomous maneuvering target tracking by UAVs is therefore of great significance.
When performing a flight mission, a UAV often needs to cooperate with a designated target or track a maneuvering target to execute a reconnaissance task, and the flight paths of multiple aircraft traditionally have to be planned in advance. Given the high demands on manual operation and the frequent interference of a dynamic external environment, it is of great significance for a UAV to learn autonomously and complete the tracking of a maneuvering target in an unknown, dynamic environment. Patent publication CN110610512A proposes a UAV target tracking method based on a BP neural network fused with a Kalman filtering algorithm: the target flight trajectory is predicted by Kalman filtering, the target position is predicted with a BP neural network, and the UAV is controlled to track via PID. However, that method only exploits the fitting capability of the neural network, gives the UAV no learning capability, and cannot be applied in a dynamically changing environment. Patent CN110806759A provides an aircraft route tracking method based on deep reinforcement learning, which completes route tracking by constructing a Markov decision process model combined with a deep reinforcement learning algorithm; however, the aircraft can only fly to each task point in sequence according to the provided information, cannot track a maneuvering target with an unknown motion trajectory, and therefore has clear limitations.
The Kalman filtering algorithm is a common method in control theory and control engineering. It estimates the true value from observed and predicted values, can be used to estimate the motion state of a maneuvering target in real time, and effectively improves the accuracy of target state prediction. DDQN, as an optimization of the deep reinforcement learning DQN algorithm, not only gives the UAV learning capability for the maneuvering target tracking task, but also effectively alleviates the over-estimation problem in value prediction, improving learning accuracy and efficiency. Designing a UAV maneuvering target tracking method that fuses Kalman filtering with the DDQN algorithm is therefore of great significance for a UAV to autonomously complete high-precision trajectory prediction and real-time tracking of a maneuvering target.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a UAV maneuvering target tracking method fusing Kalman filtering and the DDQN algorithm. Kalman filtering is used to accurately estimate the motion state of the target, obtaining its position and velocity; this estimate is combined with the UAV's own state as the input to a neural network, with the UAV's acceleration and angular velocity as the action outputs. The flight policy network is trained by DDQN learning, realizing autonomous tracking decisions by the UAV for a maneuvering target.
The technical solution adopted by the invention comprises the following steps:
Step 1: construct a Markov decision process (MDP) model for UAV maneuvering target tracking;
Step 1-1: determine the state variables of the MDP model:
Using the inertial navigation system, the UAV flies at a fixed altitude, and its state in two-dimensional space is set as

$S_1 = [x_1, y_1, v, \theta]$

where $x_1, y_1$ are the position coordinates of the UAV, $v$ is its flight speed, and $\theta$ is its flight yaw angle.
The target state is set according to the sensor information as

$S_2 = [x_2, y_2, v_x, v_y]$

where $x_2, y_2$ are the position coordinates of the target and $v_x, v_y$ are its velocity components along the X and Y axes. A Kalman filtering method is introduced to predict the target information at the next moment:

$S_{2,t}^- = F_t S_{2,t-1} + B_t u_t + w$

where $F_t$ is the state transition matrix, $\Delta t$ is the update step, $B_t$ is the control matrix, $u_t$ is the state control vector, and $w$ is the system process noise, $w \sim N(0, Q_w)$, with $Q_w$ the noise variance. The predicted observation noise covariance $P_t^-$ at time $t$ is

$P_t^- = F_t P_{t-1} F_t^T + Q_w$

where $P_{t-1}$ is the observation noise covariance matrix at time $t-1$.
The Kalman gain at time $t$ is solved as

$K_t = P_t^- H^T (H P_t^- H^T + O)^{-1}$

where $H$ is the observation matrix and $O$ is the observation noise variance. From the sensor information, the observation $Z$ of the UAV's distance to the target is calculated. The noise covariance matrix and the predicted state are then updated:

$S_{2,t} = S_{2,t}^- + K_t (Z - H S_{2,t}^-)$

$P_t = (I - K_t H) P_t^-$

where $I$ is the identity matrix and $x_2', y_2', v_x', v_y'$ are the Kalman-filter-optimized predictions of the target position and velocity components along the X and Y axes. Combining the UAV state with the target state prediction, the state input of the MDP model is set as

$S = [x_1, y_1, v, \theta, x_2', y_2', v_x', v_y']$
Step 1-2: define the action space of the Markov model, i.e., the output action A of the UAV:
The output action A represents the set of actions the UAV takes, based on its own state, after receiving the external feedback value. Through the state input, the UAV adjusts its motion trajectory by changing its rate of speed change and its rate of turning. The output action is set as

$A = [a_t, \omega_t]$

where $a_t$ is the acceleration of the UAV at time $t$ and $\omega_t$ is its flight yaw rate at time $t$.
Step 1-3: define the reward function R of the Markov model:
Sensors acquire the UAV and target position information, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the UAV; R represents the feedback value obtained when the UAV selects a particular action in the current state.
Step 1-4: define the discount factor $\gamma$:
A discount factor $\gamma$ is set; a larger discount factor places more emphasis on long-term return. The total return of the whole learning process is

$R_{all} = R_1 + \gamma R_2 + \gamma^2 R_3 + \dots + \gamma^{n-1} R_n$

where $R_n$ is the reward obtained by the UAV at the $n$-th step.
Step 2: using the MDP model constructed in step 1, apply the DDQN algorithm to train and control UAV maneuvering target tracking:
Step 2-1: construct the main BP neural network of the DDQN algorithm with hidden-layer parameters $\theta$ and state-action value function $Q(s, a \mid \theta)$, and copy the main network parameters into a target network $\theta'$, i.e., $\theta \rightarrow \theta'$;
Step 2-2: set the maximum number of training episodes E, the maximum number of steps per episode K, the experience replay queue size M, the soft-update coefficient $\tau$ of the target network, and the neural network learning rate $\alpha$; initialize the episode counter $e = 0$;
Step 2-3: discretize the acceleration and the flight yaw rate of the UAV each into a number of discrete levels;
Step 2-4: initialize the step counter $k = 0$, the training time $t = 0$, and the state input $s_0$;
Step 2-5: feed each action in the action space together with the state $s_t$ into the main network $\theta$ to obtain the Q-value outputs for all actions, and select the corresponding action $a_t$ from the current Q-value outputs by a greedy method;
Step 2-6: execute the action, read the MDP model and update the environment information to obtain the current reward $r_t$ and the next-moment UAV state input $s_{t+1}$, and add the experience tuple $[s_t, a_t, r_t, s_{t+1}]$ to the experience replay queue;
Step 2-7: calculate the DDQN target value $Y_t$ in conjunction with the target network:

$Y_t = r_t + \gamma\, Q\big(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a \mid \theta) \mid \theta'\big)$

Step 2-8: update the main network:

$\theta \leftarrow \theta + \alpha \big[Y_t - Q(s_t, a_t \mid \theta)\big] \nabla_\theta Q(s_t, a_t \mid \theta)$

where $Q(s_t, a_t \mid \theta)$ is the Q value obtained by taking action $a_t$ in state $s_t$, $\nabla_\theta$ is the gradient (nabla) operator, the gradient term represents the update back-propagated through the neural network, and $\alpha$ is the neural network learning rate;
Step 2-9: update the target network:

$\theta' \leftarrow \tau\theta + (1-\tau)\theta'$

where $\tau$ is the soft-update scale factor;
Step 3: train the model by combining the MDP model and the DDQN algorithm:
Step 3-1: increment the step counter $k$ by 1 and judge: if $k < K$, update the time $t = t + \Delta t$ and return to step 2-4; otherwise, go to step 3-2;
Step 3-2: increment the episode counter $e$ by 1 and judge: if $e < E$, return to step 2-3; otherwise, training is finished and go to step 3-3;
Step 3-3: terminate the DDQN network training process and store the current network parameters. Load the stored parameters into the UAV maneuvering target tracking system; at each moment, the UAV state and the Kalman-filtered target state are combined and fed into the neural network, which outputs a suitable action by fitting, completing autonomous tracking of the maneuvering target.
The reward function R is constructed from $D_{t-1}$ and $D_t$, the distances between the UAV and the target at the previous moment $t-1$ and the current moment $t$, respectively. The overall MDP reward function R is set as a weighted combination of the distance and obstacle-avoidance terms, where $\lambda_1, \lambda_2, \lambda_3$ are the respective reward weights.
The discount factor satisfies $0 < \gamma < 1$.
The beneficial effects of the invention are:
(1) In the MDP model constructed by the method, Kalman filtering is introduced to process the state input, which improves the accuracy of target state prediction, effectively mitigates the error of direct sensor distance measurement in conventional UAV target tracking tasks, and gives the method high application value;
(2) The method uses the DDQN algorithm, which effectively alleviates the over-estimation problem of the conventional DQN algorithm. An accurate target state estimate is obtained with Kalman filtering and combined with the UAV's own state and the discretized action space; after network processing, a suitable action is output to complete tracking of the maneuvering target. A UAV trained with the DDQN algorithm can cope with a dynamically changing environment and complete the tracking of a maneuvering target.
Drawings
Fig. 1 is a flow chart of UAV training based on the Kalman filtering and DDQN algorithms.
Fig. 2 is a schematic diagram of UAV maneuvering target tracking based on the Kalman filtering and DDQN algorithms.
Fig. 3 shows the UAV maneuvering target tracking task.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention provides a UAV maneuvering target tracking method fusing Kalman filtering and the DDQN algorithm; the overall flow is shown in Fig. 1. The technical solution is described clearly and completely below with reference to the accompanying drawings and a specific embodiment:
Step 1: construct a Markov decision process (MDP) model for UAV maneuvering target tracking.
Step 1-1: determine the state variables of the MDP model:
Using the inertial navigation system, the UAV flies at a fixed altitude, and its state in two-dimensional space is set as

$S_1 = [x_1, y_1, v, \theta]$

where $x_1, y_1$ are the position coordinates of the UAV, $v$ is its flight speed, and $\theta$ is its flight yaw angle.
The target state is set according to the sensor information as

$S_2 = [x_2, y_2, v_x, v_y]$

where $x_2, y_2$ are the position coordinates of the target and $v_x, v_y$ are its velocity components along the X and Y axes. A Kalman filtering method is introduced to predict the target information at the next moment:

$S_{2,t}^- = F_t S_{2,t-1} + B_t u_t + w$

where $F_t$ is the state transition matrix, $\Delta t$ is the update step, $B_t$ is the control matrix, $u_t$ is the state control vector, and $w$ is the system process noise, $w \sim N(0, Q_w)$, with $Q_w$ the noise variance. The predicted observation noise covariance at time $t$ is

$P_t^- = F_t P_{t-1} F_t^T + Q_w$

The Kalman gain at time $t$ is solved as

$K_t = P_t^- H^T (H P_t^- H^T + O)^{-1}$

From the sensor information, the observation $Z$ of the UAV's distance to the target is calculated, where $H$ is the observation matrix and $O$ is the observation noise variance. The noise covariance matrix and the predicted state are then updated:

$S_{2,t} = S_{2,t}^- + K_t (Z - H S_{2,t}^-)$

$P_t = (I - K_t H) P_t^-$

where $I$ is the identity matrix and $x_2', y_2', v_x', v_y'$ are the Kalman-filter-optimized predictions of the target position and velocity components along the X and Y axes. Combining the UAV state with the target state prediction, the state input of the MDP model is set as

$S = [x_1, y_1, v, \theta, x_2', y_2', v_x', v_y']$
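For illustration only, the following Python sketch shows one predict/update cycle of the Kalman estimator described in this step. The exact $F_t$, $H$, $Q_w$, and $O$ used in the patent are not reproduced in the text, so a constant-velocity model with a position-only observation and illustrative noise values are assumed here.

```python
import numpy as np

def kalman_step(s_prev, P_prev, z, dt=0.1, q_var=0.01, o_var=1.0):
    """One predict/update cycle for the target state [x2, y2, vx, vy]."""
    F = np.array([[1, 0, dt, 0],               # state transition matrix F_t
                  [0, 1, 0, dt],               # (constant-velocity model, assumed)
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    H = np.array([[1, 0, 0, 0],                # observation matrix H: the sensor
                  [0, 1, 0, 0]], dtype=float)  # observes the target position only
    Q = q_var * np.eye(4)                      # process noise covariance Q_w
    O = o_var * np.eye(2)                      # observation noise variance O

    # Prediction: S_t^- = F_t S_{t-1},  P_t^- = F_t P_{t-1} F_t^T + Q_w
    s_pred = F @ s_prev
    P_pred = F @ P_prev @ F.T + Q

    # Kalman gain: K_t = P_t^- H^T (H P_t^- H^T + O)^{-1}
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + O)

    # Update with the sensor observation z of the target position
    s_new = s_pred + K @ (z - H @ s_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return s_new, P_new
```

Here s_prev is the previous target state estimate, P_prev its covariance, and z the sensor observation; the returned estimate supplies the $x_2', y_2', v_x', v_y'$ components of the MDP state input.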
Step 1-2: define the action space of the Markov model, i.e., the output action A of the UAV:
The output action A represents the set of actions the UAV takes, based on its own state, after receiving the external feedback value. Through the state input, the UAV adjusts its motion trajectory by changing its rate of speed change and its rate of turning. The output action is set as

$A = [a_t, \omega_t]$

where $a_t$ is the acceleration of the UAV at time $t$ and $\omega_t$ is its flight yaw rate at time $t$.
Step 1-3: define the reward function R of the Markov model:
Sensors acquire the UAV and target position information, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the UAV; R represents the feedback value obtained when the UAV selects a particular action in the current state. Here $D_{t-1}$ and $D_t$ denote the distances between the UAV and the target at the previous moment and the current moment, respectively. Combining the weights, the overall MDP reward function R is set as a weighted combination of these terms; a sketch of such a reward is given below.
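The patent's exact piecewise reward formula appears only in the figures. The following Python sketch therefore illustrates only the described form: a weighted combination of a tracking (distance) term and penalty terms. All terms, thresholds, and weight values are illustrative assumptions, including the sign convention that rewards a decrease in the UAV-target distance.

```python
def reward(d_prev, d_curr, collided,
           lam1=1.0, lam2=0.5, lam3=1.0, near_radius=5.0):
    """Illustrative reward: weighted tracking term plus penalty terms."""
    r_track = d_prev - d_curr                        # positive when the UAV closes in
    r_near = 1.0 if d_curr < near_radius else 0.0    # bonus for staying close to the target
    r_obstacle = -1.0 if collided else 0.0           # obstacle-avoidance punishment
    return lam1 * r_track + lam2 * r_near + lam3 * r_obstacle
```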
Step 1-4: define the discount factor $\gamma$:
Setting the discount factor $\gamma = 0.9$, the total return of the whole learning process is

$R_{all} = R_1 + \gamma R_2 + \gamma^2 R_3 + \dots + \gamma^{n-1} R_n$

where $R_n$ is the reward obtained by the UAV at step $n$.
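For illustration, with $\gamma = 0.9$ and an assumed reward of 1 at every step, the first three steps contribute

$$R_{all} = 1 + 0.9 \times 1 + 0.9^2 \times 1 = 2.71$$

so rewards obtained later in an episode are weighted progressively less.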
Step 2: using the MDP model constructed in step 1, apply the DDQN algorithm to train and control UAV maneuvering target tracking. A schematic diagram of UAV maneuvering target tracking based on the DDQN algorithm is shown in Fig. 2:
Step 2-1: construct the main BP neural network of the DDQN algorithm with hidden-layer parameters $\theta$ and state-action value function $Q(s, a \mid \theta)$, and copy the main network parameters into a target network $\theta'$, i.e., $\theta \rightarrow \theta'$;
Step 2-2: set the maximum number of training episodes E = 800, the maximum number of steps per episode K = 400, the experience replay queue size M = 20000, the soft-update coefficient of the target network $\tau$ = 0.01, and the neural network learning rate $\alpha$ = 0.001; initialize the episode counter $e = 0$. A sketch of this network construction is given below;
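A minimal Python (PyTorch) sketch of steps 2-1 and 2-2 follows, using the embodiment's hyperparameters. The hidden-layer sizes are not specified in the text and are chosen here only for illustration; the 8-dimensional input matches the state $[x_1, y_1, v, \theta, x_2', y_2', v_x', v_y']$, and the 49 outputs correspond to the $7 \times 7$ discretized (acceleration, yaw-rate) actions of step 2-3.

```python
import copy
import torch
import torch.nn as nn

E, K, M = 800, 400, 20000            # episodes, steps per episode, replay queue size
TAU, ALPHA, GAMMA = 0.01, 0.001, 0.9  # soft-update coefficient, learning rate, discount

def build_q_network(state_dim=8, n_actions=49, hidden=64):
    """Fully connected (BP) Q-network: state in, one Q value per discrete action out."""
    return nn.Sequential(
        nn.Linear(state_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_actions),
    )

main_net = build_q_network()                  # theta
target_net = copy.deepcopy(main_net)          # theta' <- theta (step 2-1)
optimizer = torch.optim.Adam(main_net.parameters(), lr=ALPHA)
```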
Step 2-3: in this embodiment, the acceleration of the UAV (unit: m/s²) and its flight yaw rate (unit: deg/s) are each discretized into 7 levels, giving the discrete action set sketched below;
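The concrete acceleration and yaw-rate ranges of the embodiment appear only in the figures; the ranges below are symmetric intervals assumed purely to illustrate the 7-level discretization and the resulting $7 \times 7$ joint action space.

```python
import itertools
import numpy as np

accelerations = np.linspace(-3.0, 3.0, 7)   # 7 acceleration levels (m/s^2), assumed range
yaw_rates = np.linspace(-30.0, 30.0, 7)     # 7 yaw-rate levels (deg/s), assumed range

# Joint action space: every (acceleration, yaw rate) pair, 7 x 7 = 49 actions,
# indexed by the Q-network's output neurons.
action_space = list(itertools.product(accelerations, yaw_rates))
```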
Step 2-4: initialize the step counter $k = 0$, the training time $t = 0$, and the state input $s_0 = [100, 120, 20, 0, 100, 200, 20, 0]$;
Step 2-5: feed each action in the action space together with the state $s_t$ into the main network $\theta$ to obtain the Q-value outputs for all actions, and select the corresponding action $a_t$ from the current Q-value outputs by a greedy method;
Step 2-6: execute the action, read the MDP model and update the environment information to obtain the current reward $r_t$ and the next-moment UAV state input $s_{t+1}$, and add the experience tuple $[s_t, a_t, r_t, s_{t+1}]$ to the experience replay queue, as sketched below;
Step 2-7: calculate the DDQN target value $Y_t$ in conjunction with the target network:

$Y_t = r_t + \gamma\, Q\big(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a \mid \theta) \mid \theta'\big)$

Step 2-8: update the main network:

$\theta \leftarrow \theta + \alpha \big[Y_t - Q(s_t, a_t \mid \theta)\big] \nabla_\theta Q(s_t, a_t \mid \theta)$

where $Q(s_t, a_t \mid \theta)$ is the Q value obtained by taking action $a_t$ in state $s_t$, $\nabla_\theta$ is the gradient (nabla) operator, and the gradient term represents the update back-propagated through the neural network;
Step 2-9: update the target network:

$\theta' \leftarrow 0.01\,\theta + (1 - 0.01)\,\theta'$

These three steps are sketched together below.
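A minimal Python sketch of steps 2-7 to 2-9, continuing the sketches above (main_net, target_net, optimizer, replay_buffer, GAMMA, TAU). The mini-batch size is an assumed value; the target uses the standard double-DQN form in which the main network chooses the next action and the target network evaluates it.

```python
import random
import numpy as np
import torch
import torch.nn.functional as nnf

def train_step(batch_size=64):
    """Steps 2-7 to 2-9: double-DQN target, main-network update, soft target update."""
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    s, a, r, s_next = zip(*batch)
    s = torch.as_tensor(np.array(s), dtype=torch.float32)
    a = torch.as_tensor(a, dtype=torch.int64)
    r = torch.as_tensor(r, dtype=torch.float32)
    s_next = torch.as_tensor(np.array(s_next), dtype=torch.float32)

    with torch.no_grad():
        # Double-DQN target Y_t: main network selects, target network evaluates.
        best_next = main_net(s_next).argmax(dim=1, keepdim=True)
        y = r + GAMMA * target_net(s_next).gather(1, best_next).squeeze(1)

    q_sa = main_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s_t, a_t | theta)
    loss = nnf.mse_loss(q_sa, y)          # error back-propagated through the network
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Step 2-9 soft update: theta' <- tau*theta + (1 - tau)*theta'
    for p, p_targ in zip(main_net.parameters(), target_net.parameters()):
        p_targ.data.copy_(TAU * p.data + (1 - TAU) * p_targ.data)
```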
Step 3: train the model by combining the MDP model and the DDQN algorithm (a skeleton of this loop is sketched after step 3-3):
Step 3-1: increment the step counter $k$ by 1 and judge: if $k < K$, update the time $t = t + \Delta t$ and return to step 2-4; otherwise, go to step 3-2;
Step 3-2: increment the episode counter $e$ by 1 and judge: if $e < E$, return to step 2-3; otherwise, training is finished and go to step 3-3;
Step 3-3: terminate the DDQN network training process and store the current network parameters. Load the stored parameters into the UAV maneuvering target tracking system; at each moment, the UAV state and the Kalman-filtered target state are combined and fed into the neural network, which outputs a suitable action by fitting, completing autonomous tracking of the maneuvering target.
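A skeleton of the training procedure of step 3, tying the sketches above together. The environment helpers env_reset and env_step are placeholders standing in for the MDP model, the sensor readings, and the Kalman-filter pipeline; they are not defined in the patent and would have to be supplied by the simulation environment.

```python
def train():
    """Step 3: episode/step loops; env_reset and env_step are placeholders."""
    for e in range(E):                       # episodes, E = 800
        s_t = env_reset()                    # e.g. s0 = [100,120,20,0,100,200,20,0]
        for k in range(K):                   # steps per episode, K = 400
            a_idx = select_action(s_t)
            accel, yaw_rate = action_space[a_idx]
            r_t, s_next = env_step(accel, yaw_rate)   # apply action, read sensors,
            store_transition(s_t, a_idx, r_t, s_next) # Kalman-filter the target state
            train_step()
            s_t = s_next
    torch.save(main_net.state_dict(), "ddqn_tracking.pt")   # store network parameters
```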
In the UAV maneuvering target tracking method fusing Kalman filtering and the DDQN algorithm, the target state is predicted by Kalman filtering, and an MDP model is constructed and loaded into the DDQN algorithm. In each episode, the UAV combines its own state information with the Kalman-filter-optimized target state and feeds them into the neural network, which processes the state and outputs the UAV action. Through continuous learning, the UAV can finally complete the maneuvering target tracking task.
With continued training and learning, the UAV gradually learns to predict the maneuvering target accurately with Kalman filtering and to complete the tracking with the DDQN algorithm. The simulation result is shown in Fig. 3: the UAV trained with the Kalman filtering and DDQN algorithms keeps a small distance error to a complex maneuvering target and completes the tracking task.
The above description is only a preferred embodiment of the present invention. It should be noted that the invention is not limited to the implementation described above; other embodiments obtained by deletions, modifications, and refinements also fall within the scope of the invention.
Claims (4)
1. An unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and the DDQN algorithm, characterized by comprising the following steps:
Step 1: construct a Markov decision process (MDP) model for unmanned aerial vehicle (UAV) maneuvering target tracking;
Step 1-1: determine the state variables of the MDP model:
Using the inertial navigation system, the UAV flies at a fixed altitude, and its state in two-dimensional space is set as

$S_1 = [x_1, y_1, v, \theta]$

where $x_1, y_1$ are the position coordinates of the UAV, $v$ is its flight speed, and $\theta$ is its flight yaw angle.
The target state is set according to the sensor information as

$S_2 = [x_2, y_2, v_x, v_y]$

where $x_2, y_2$ are the position coordinates of the target and $v_x, v_y$ are its velocity components along the X and Y axes. A Kalman filtering method is introduced to predict the target information at the next moment:

$S_{2,t}^- = F_t S_{2,t-1} + B_t u_t + w$

where $F_t$ is the state transition matrix, $\Delta t$ is the update step, $B_t$ is the control matrix, $u_t$ is the state control vector, and $w$ is the system process noise, $w \sim N(0, Q_w)$, with $Q_w$ the noise variance. The predicted observation noise covariance $P_t^-$ at time $t$ is

$P_t^- = F_t P_{t-1} F_t^T + Q_w$

where $P_{t-1}$ is the observation noise covariance matrix at time $t-1$.
The Kalman gain at time $t$ is solved as

$K_t = P_t^- H^T (H P_t^- H^T + O)^{-1}$

where $H$ is the observation matrix and $O$ is the observation noise variance. From the sensor information, the observation $Z$ of the UAV's distance to the target is calculated. The noise covariance matrix and the predicted state are then updated:

$S_{2,t} = S_{2,t}^- + K_t (Z - H S_{2,t}^-)$

$P_t = (I - K_t H) P_t^-$

where $I$ is the identity matrix and $x_2', y_2', v_x', v_y'$ are the Kalman-filter-optimized predictions of the target position and velocity components along the X and Y axes. Combining the UAV state with the target state prediction, the state input of the MDP model is set as

$S = [x_1, y_1, v, \theta, x_2', y_2', v_x', v_y']$
Step 1-2: define the action space of the Markov model, i.e., the output action A of the UAV:
The output action A represents the set of actions the UAV takes, based on its own state, after receiving the external feedback value. Through the state input, the UAV adjusts its motion trajectory by changing its rate of speed change and its rate of turning. The output action is set as

$A = [a_t, \omega_t]$

where $a_t$ is the acceleration of the UAV at time $t$ and $\omega_t$ is its flight yaw rate at time $t$.
Step 1-3: define the reward function R of the Markov model:
Sensors acquire the UAV and target position information, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the UAV; R represents the feedback value obtained when the UAV selects a particular action in the current state.
Step 1-4: define the discount factor $\gamma$:
A discount factor $\gamma$ is set; a larger discount factor places more emphasis on long-term return. The total return of the whole learning process is

$R_{all} = R_1 + \gamma R_2 + \gamma^2 R_3 + \dots + \gamma^{n-1} R_n$

where $R_n$ is the reward obtained by the UAV at the $n$-th step.
Step 2: using the MDP model constructed in step 1, apply the DDQN algorithm to train and control UAV maneuvering target tracking:
Step 2-1: construct the main BP neural network of the DDQN algorithm with hidden-layer parameters $\theta$ and state-action value function $Q(s, a \mid \theta)$, and copy the main network parameters into a target network $\theta'$, i.e., $\theta \rightarrow \theta'$;
Step 2-2: set the maximum number of training episodes E, the maximum number of steps per episode K, the experience replay queue size M, the soft-update coefficient $\tau$ of the target network, and the neural network learning rate $\alpha$; initialize the episode counter $e = 0$;
Step 2-3: discretize the acceleration and the flight yaw rate of the UAV each into a number of discrete levels;
Step 2-4: initialize the step counter $k = 0$, the training time $t = 0$, and the state input $s_0$;
Step 2-5: feed each action in the action space together with the state $s_t$ into the main network $\theta$ to obtain the Q-value outputs for all actions, and select the corresponding action $a_t$ from the current Q-value outputs by a greedy method;
Step 2-6: execute the action, read the MDP model and update the environment information to obtain the current reward $r_t$ and the next-moment UAV state input $s_{t+1}$, and add the experience tuple $[s_t, a_t, r_t, s_{t+1}]$ to the experience replay queue;
Step 2-7: calculate the DDQN target value $Y_t$ in conjunction with the target network:

$Y_t = r_t + \gamma\, Q\big(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a \mid \theta) \mid \theta'\big)$

Step 2-8: update the main network:

$\theta \leftarrow \theta + \alpha \big[Y_t - Q(s_t, a_t \mid \theta)\big] \nabla_\theta Q(s_t, a_t \mid \theta)$

where $Q(s_t, a_t \mid \theta)$ is the Q value obtained by taking action $a_t$ in state $s_t$, $\nabla_\theta$ is the gradient (nabla) operator, the gradient term represents the update back-propagated through the neural network, and $\alpha$ is the neural network learning rate;
Step 2-9: update the target network:

$\theta' \leftarrow \tau\theta + (1-\tau)\theta'$

where $\tau$ is the soft-update scale factor;
Step 3: train the model by combining the MDP model and the DDQN algorithm:
Step 3-1: increment the step counter $k$ by 1 and judge: if $k < K$, update the time $t = t + \Delta t$ and return to step 2-4; otherwise, go to step 3-2;
Step 3-2: increment the episode counter $e$ by 1 and judge: if $e < E$, return to step 2-3; otherwise, training is finished and go to step 3-3;
Step 3-3: terminate the DDQN network training process and store the current network parameters. Load the stored parameters into the UAV maneuvering target tracking system; at each moment, the UAV state and the Kalman-filtered target state are combined and fed into the neural network, which outputs a suitable action by fitting, completing autonomous tracking of the maneuvering target.
3. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and the DDQN algorithm according to claim 1, characterized in that:
the reward function R is constructed as follows: the tracking reward $r_1^{track}$ is set as

$r_1^{track} = D_{t-1} + D_t$

where $D_{t-1}$ and $D_t$ are the distances between the UAV and the target at the previous moment $t-1$ and the current moment $t$, respectively; the overall MDP reward function R is set as a weighted combination of the reward terms, where $\lambda_1, \lambda_2, \lambda_3$ are the respective reward weights.
4. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and the DDQN algorithm according to claim 1, characterized in that:
the discount factor satisfies $0 < \gamma < 1$.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011440212.3A | 2020-12-07 | 2020-12-07 | Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN112435275A | 2021-03-02 |
Family
- ID: 74692387

Family Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011440212.3A (Pending) | 2020-12-07 | 2020-12-07 | Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN112435275A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200380401A1 (en) * | 2019-05-29 | 2020-12-03 | United States Of America As Represented By The Secretary Of The Navy | Method for Performing Multi-Agent Reinforcement Learning in the Presence of Unreliable Communications Via Distributed Consensus |
CN110610512A (en) * | 2019-09-09 | 2019-12-24 | 西安交通大学 | Unmanned aerial vehicle target tracking method based on BP neural network fusion Kalman filtering algorithm |
CN110958135A (en) * | 2019-11-05 | 2020-04-03 | 东华大学 | Method and system for eliminating DDoS (distributed denial of service) attack in feature self-adaptive reinforcement learning |
CN111580544A (en) * | 2020-03-25 | 2020-08-25 | 北京航空航天大学 | Unmanned aerial vehicle target tracking control method based on reinforcement learning PPO algorithm |
CN111667513A (en) * | 2020-06-01 | 2020-09-15 | 西北工业大学 | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning |
CN111862165A (en) * | 2020-06-17 | 2020-10-30 | 南京理工大学 | Target tracking method for updating Kalman filter based on deep reinforcement learning |
Non-Patent Citations (2)

Title |
---|
Rahmad Sadli: "Object Tracking: 2-D Object Tracking using Kalman Filter in Python", Object Detection, Object Tracking, Python Programming * |
Xu Huang et al.: "Attitude Control of Fixed-wing UAV Based on DDQN", 2019 Chinese Automation Congress (CAC) * |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2021-03-02 |