CN112435275A - Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Info

Publication number
CN112435275A
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
target
state
DDQN
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011440212.3A
Other languages
Chinese (zh)
Inventor
张修社
韩春雷
李琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 20 Research Institute
Original Assignee
CETC 20 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 20 Research Institute
Priority to CN202011440212.3A
Publication of CN112435275A
Legal status: Pending

Classifications

    • G06T 7/20 Image analysis; Analysis of motion
    • G06F 17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N 3/045 Neural networks; Combinations of networks
    • G06N 3/084 Learning methods; Backpropagation, e.g. using gradient descent
    • G06T 2207/20081 Training; Learning

Abstract

The invention provides an unmanned aerial vehicle (UAV) maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm. Kalman filtering is used to accurately estimate the motion state of the target and obtain its position and velocity; this estimate is combined with the UAV's own state information as the input to a neural network, with the UAV's acceleration and yaw rate as the action output. The flight policy network is trained through DDQN learning, enabling the UAV to make autonomous tracking decisions for a maneuvering target. The method effectively mitigates the error caused by direct sensor range measurement in conventional UAV target tracking tasks, has high application value, and effectively alleviates the overestimation problem of the traditional DQN algorithm.

Description

Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm
Technical Field
The invention relates to the field of control, and in particular to a method for tracking a maneuvering target with an unmanned aerial vehicle. It combines the Kalman filtering algorithm with the DDQN algorithm from deep reinforcement learning in computer science, and is thus an interdisciplinary application of these methods.
Background
An unmanned aerial vehicle (UAV), as a new type of aerial platform, has become a practical and effective tool in military, civil, and scientific research fields, and plays an important role in upgrading the aviation industry, integrating military and civil technology, and improving industrial efficiency. In practical applications, a UAV often faces task scenarios such as cooperative swarm flight or ground target tracking, which place high demands on manual operation, task allocation, and flight path planning. Therefore, an effective method for high-precision autonomous maneuvering target tracking by a UAV is of great significance.
When executing a flight task, a UAV often needs to work cooperatively with a designated target or track a maneuvering target to carry out reconnaissance, and the flight paths of multiple aircraft need to be planned in advance. Considering that UAV operation places high demands on manual control and is often subject to dynamic external disturbances, it is of great value for a UAV to learn autonomously and complete the tracking of a maneuvering target in an unknown dynamic environment. Patent publication CN110610512A proposes a UAV target tracking method based on a BP neural network fused with a Kalman filtering algorithm, which predicts the target flight trajectory through Kalman filtering, predicts the target position with a BP neural network, and controls the UAV for tracking via PID. However, that method only exploits the fitting capability of the neural network, does not give the UAV any learning capability, and cannot be applied to a dynamically changing environment. Patent CN110806759A provides an aircraft route tracking method based on deep reinforcement learning, which completes route tracking by constructing a Markov decision process model combined with a deep reinforcement learning algorithm; however, in that method the aircraft can only fly to each task point in sequence according to the provided information, cannot track a maneuvering target with an unknown motion trajectory, and therefore has certain limitations.
The Kalman filtering algorithm is a common method in control theory and control engineering. It estimates the true state from observed and predicted values, can be used to estimate the motion state of a maneuvering target in real time, and effectively improves the accuracy of target state prediction. DDQN, as an improvement of the deep reinforcement learning DQN algorithm, not only gives the UAV the learning capability needed to complete a maneuvering target tracking task, but also effectively alleviates the overestimation problem in value prediction and improves learning accuracy and efficiency. Therefore, a UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm is of great significance for a UAV to autonomously complete high-precision trajectory prediction and real-time tracking of a maneuvering target.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm. Kalman filtering is used to accurately estimate the motion state of the target and obtain its position and velocity; this estimate is combined with the UAV's own state information as the input to a neural network, with the UAV's acceleration and yaw rate as the action output. The flight policy network is trained through DDQN learning, realizing autonomous tracking decisions of the UAV for a maneuvering target.
The technical solution adopted by the invention comprises the following steps:
step 1: constructing a Markov decision process (MDP) model for UAV maneuvering target tracking;
step 1-1: determining state variables in the MDP model:
the UAV flies at a fixed altitude using its inertial navigation system, and its state in two-dimensional space is set as:
S_1 = [x_1, y_1, v, θ]
wherein x_1, y_1 are the position coordinates of the UAV, v is its flight speed, and θ is its flight yaw angle;
setting the target state according to the sensor information:
S_2 = [x_2, y_2, vx_2, vy_2]
wherein x_2, y_2 are the position coordinates of the target and vx_2, vy_2 are its velocity components along the X and Y axes; a Kalman filtering method is introduced to predict the target information at the next moment:
S_2,t⁻ = F_t·S_2,t-1 + B_t·u_t + w
wherein F_t is the state transition matrix, Δt is the update step, B_t is the control matrix, u_t is the state control vector, and w is the system process noise, w ~ N(0, Q_w), with Q_w the noise variance; the predicted observation noise covariance P_t⁻ at time t is:
P_t⁻ = F_t·P_t-1·F_tᵀ + Q_w
where P_t-1 is the observation noise covariance matrix at time t-1;
the Kalman gain at time t is solved as:
K_t = P_t⁻·Hᵀ·(H·P_t⁻·Hᵀ + O)⁻¹
wherein H is the observation matrix and O is the observation noise variance; according to the sensor information, the observation Z_t of the target by the UAV is calculated:
Z_t = H·S_2,t + o,  o ~ N(0, O)
the noise covariance matrix and the predicted state are then updated:
P_t = (I − K_t·H)·P_t⁻
Ŝ_2,t = S_2,t⁻ + K_t·(Z_t − H·S_2,t⁻)
wherein I is the identity matrix and Ŝ_2,t = [x̂_2, ŷ_2, v̂x_2, v̂y_2] contains the Kalman-filtered predictions of the target position and velocity components along the X and Y axes; combining the UAV state with the target state prediction, the state input of the MDP model is set as:
s_t = [x_1, y_1, v, θ, x̂_2, ŷ_2, v̂x_2, v̂y_2]
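For illustration, the Kalman prediction and update cycle described above can be sketched in Python with NumPy as follows; the concrete transition matrix, observation matrix H, and noise covariances used here are assumed example values, not values specified by the invention.

```python
import numpy as np

dt = 0.1                                    # update step (assumed value)
F = np.array([[1, 0, dt, 0],                # constant-velocity transition matrix (assumed form)
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.eye(4)                               # observation matrix (assumed: full-state observation)
Qw = 0.01 * np.eye(4)                       # process noise covariance (assumed)
O = 0.1 * np.eye(4)                         # observation noise covariance (assumed)

def kf_step(s_prev, P_prev, z):
    """One prediction/update step for the target state [x2, y2, vx2, vy2]."""
    s_pred = F @ s_prev                     # S_2,t- = F_t * S_2,t-1 (control term B_t*u_t omitted)
    P_pred = F @ P_prev @ F.T + Qw          # P_t- = F_t P_t-1 F_t^T + Q_w
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + O)   # Kalman gain
    s_new = s_pred + K @ (z - H @ s_pred)   # update with the sensor observation z
    P_new = (np.eye(4) - K @ H) @ P_pred    # P_t = (I - K_t H) P_t-
    return s_new, P_new

# Example: one filtering step from a noisy sensor reading
s, P = np.array([100.0, 200.0, 2.0, 0.0]), np.eye(4)
z = np.array([100.3, 200.1, 2.1, 0.05])
s, P = kf_step(s, P, z)
```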
step 1-2: defining the action space of the Markov model, i.e. the output action A of the UAV:
the output action A represents the set of actions the UAV takes, based on its own state, after receiving the external feedback value; through the state input, the UAV adjusts its motion trajectory by changing its speed change rate and steering rate, and the output action is set as:
A_t = [v̇_t, θ̇_t]
wherein v̇_t is the acceleration of the UAV at time t and θ̇_t is its flight yaw rate at time t;
step 1-3: defining the reward function R of the Markov model:
a sensor is used to obtain the positions of the UAV and the target, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the UAV; R represents the feedback value obtained when the UAV selects a certain action in the current state;
step 1-4: defining a discount factor γ:
a discount factor γ is set; a larger discount factor places more emphasis on long-term return, and the total return of the whole learning process is:
R_all = R_1 + γ·R_2 + γ^2·R_3 + … + γ^(n-1)·R_n
where R_n denotes the reward value obtained by the UAV at the n-th step;
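As a small illustration of the return formula above, the discounted total return can be accumulated as follows (the reward sequence is arbitrary example data):

```python
def discounted_return(rewards, gamma=0.9):
    """R_all = R_1 + gamma*R_2 + gamma^2*R_3 + ... + gamma^(n-1)*R_n."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

print(discounted_return([1.0, 0.5, 2.0, -0.2], gamma=0.9))
```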
step 2: according to the MDP model constructed in the step 1, the DDQN algorithm is used for realizing the tracking training and control of the maneuvering target of the unmanned aerial vehicle:
step 2-1: constructing the main BP neural network of the DDQN algorithm with parameters θ and its state-action value function Q(s, a|θ), and copying the main network parameters to the target network θ', i.e. θ → θ';
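One possible realization of the main network θ and its copy θ' is sketched below in PyTorch; the layer sizes, the 8-dimensional state, and the number of discrete actions are illustrative assumptions rather than values fixed by the invention.

```python
import copy
import torch.nn as nn

STATE_DIM = 8      # [x1, y1, v, theta, x2_hat, y2_hat, vx2_hat, vy2_hat]
N_ACTIONS = 49     # e.g. 7 acceleration values x 7 yaw-rate values (assumed)

def build_q_network():
    # A simple fully connected (BP) network mapping the state to Q(s, a) for every discrete action
    return nn.Sequential(
        nn.Linear(STATE_DIM, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, N_ACTIONS),
    )

main_net = build_q_network()            # parameters theta
target_net = copy.deepcopy(main_net)    # theta -> theta'
```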
step 2-2: setting the maximum number of training episodes E, the maximum number of steps per episode K, the experience replay queue size M, the soft-update coefficient τ of the target network, and the neural network learning rate α; setting the episode counter e = 0;
step 2-3: discretizing the UAV acceleration v̇ and flight yaw rate θ̇ into N values each, obtaining the discrete acceleration set and the discrete yaw-rate set that together form the action space;
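A sketch of how the two discretized quantities could be combined into a single discrete action set is given below; the bin values are placeholders, since the concrete discretization values appear only in the equation images of the original.

```python
import itertools

# Assumed example bins: acceleration (m/s^2) and yaw rate (deg/s), 7 values each
ACC_BINS = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
YAW_RATE_BINS = [-30.0, -20.0, -10.0, 0.0, 10.0, 20.0, 30.0]

# Each discrete action is a pair (acceleration, yaw rate); 7 x 7 = 49 actions
ACTIONS = list(itertools.product(ACC_BINS, YAW_RATE_BINS))
print(len(ACTIONS), ACTIONS[0])
```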
step 2-4: initializing the step counter k = 0, the training time t = 0, and the state input s_0;
Step 2-5: move each motion in the motion space
Figure BDA0002821990920000044
And state stTransmitting the data into the main network theta to obtain Q value output corresponding to all actions of the main network,greedy selection of corresponding action a in current Q-value outputt
Step 2-6: executing action, reading MDP model and updating environment information to obtain reward r at current timetAnd unmanned plane state input s at next momentt+1The experience bar [ s ]t,at,rt,st+1]Adding the data into an experience playback queue;
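The experience replay queue of size M used in steps 2-2 and 2-6 can be sketched as follows; this is an illustrative Python implementation, not code taken from the patent.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size queue of experience tuples [s_t, a_t, r_t, s_{t+1}]."""
    def __init__(self, capacity=20000):
        self.queue = deque(maxlen=capacity)   # oldest experiences are dropped when full

    def add(self, s, a, r, s_next):
        self.queue.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        return random.sample(self.queue, batch_size)

    def __len__(self):
        return len(self.queue)
```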
step 2-7: calculating the DDQN target value Y_t in conjunction with the target network:
Y_t = r_t + γ·Q(s_t+1, a*_t+1 | θ'),  a*_t+1 = argmax_a Q(s_t+1, a | θ)
wherein a*_t+1 is the action corresponding to the maximum Q-value output of the main network;
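The key point of the DDQN target is that the main network selects the action while the target network evaluates it; a PyTorch sketch under the same assumptions as above:

```python
import torch

def ddqn_target(main_net, target_net, r, s_next, gamma=0.9, done=False):
    """Y_t = r_t + gamma * Q(s_{t+1}, argmax_a Q(s_{t+1}, a | theta) | theta')."""
    with torch.no_grad():
        a_star = main_net(s_next).argmax(dim=-1, keepdim=True)      # action chosen by the main network
        q_eval = target_net(s_next).gather(-1, a_star).squeeze(-1)  # evaluated by the target network
        return r + gamma * q_eval * (0.0 if done else 1.0)
```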
step 2-8: updating the main network:
θ ← θ + α·(Y_t − Q(s_t, a_t | θ))·∇_θ Q(s_t, a_t | θ)
where Q(s_t, a_t | θ) is the Q-value obtained by taking action a_t in state s_t, ∇_θ is the nabla (gradient) operator, the update is propagated back through the neural network by gradient descent, and α is the neural network learning rate;
step 2-9: updating the target network:
θ′←τθ+(1-τ)θ′
τ represents an update scale factor;
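Steps 2-8 and 2-9 amount to one gradient step on the main network followed by a soft update of the target network. The PyTorch sketch below uses a mean-squared-error loss on (Y_t − Q(s_t, a_t|θ)), which is one common way to realize the written update rule; the optimizer (e.g. SGD or Adam with learning rate α) is assumed to be created by the caller.

```python
import torch
import torch.nn.functional as F_nn

def update_networks(main_net, target_net, optimizer, s, a, y, tau=0.01):
    """One DDQN parameter update: gradient descent on theta, then soft update of theta'."""
    q_sa = main_net(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)   # Q(s_t, a_t | theta)
    loss = F_nn.mse_loss(q_sa, y)                                # (Y_t - Q(s_t, a_t | theta))^2
    optimizer.zero_grad()
    loss.backward()                                              # backpropagate the gradient
    optimizer.step()                                             # theta <- theta - alpha * grad
    with torch.no_grad():                                        # theta' <- tau*theta + (1 - tau)*theta'
        for p_t, p in zip(target_net.parameters(), main_net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
    return loss.item()
```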
step 3: training the model by combining the MDP model and the DDQN algorithm:
step 3-1: increasing the step counter k by 1 and judging: if k < K, updating the time t = t + Δt and returning to step 2-4; otherwise, going to step 3-2;
step 3-2: increasing the episode counter e by 1 and judging: if e < E, returning to step 2-3; otherwise, finishing the training and going to step 3-3;
step 3-3: terminating the DDQN network training process and saving the current network parameters; loading the saved parameters into the UAV maneuvering target tracking system; at each moment, the UAV state and the Kalman-filtered target state are combined and fed into the neural network, which outputs an appropriate action through the fitting of the DDQN network to complete tracking of the maneuvering target.
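Putting steps 2 and 3 together, the episode/step structure of the training procedure can be sketched as follows, reusing the ReplayBuffer, ddqn_target, and update_networks helpers sketched above; env stands for a hypothetical UAV tracking environment implementing the MDP of step 1 and is not something defined in the patent.

```python
import torch

def train(env, main_net, target_net, buffer, optimizer,
          episodes=800, max_steps=400, gamma=0.9, tau=0.01, batch_size=64, epsilon=0.1):
    for e in range(episodes):                                    # episode loop (step 3-2)
        s = torch.as_tensor(env.reset(), dtype=torch.float32)
        for k in range(max_steps):                               # step loop (step 3-1)
            # step 2-5: action selection from the main network (epsilon-greedy exploration assumed)
            if torch.rand(1).item() < epsilon:
                a = int(torch.randint(0, main_net[-1].out_features, (1,)))
            else:
                a = int(main_net(s).argmax())
            # step 2-6: interact with the environment and store the experience
            s_next, r, done = env.step(a)
            s_next = torch.as_tensor(s_next, dtype=torch.float32)
            buffer.add(s, a, r, s_next)
            # steps 2-7 to 2-9: sample a batch and update both networks
            if len(buffer) >= batch_size:
                s_b, a_b, r_b, sn_b = zip(*buffer.sample(batch_size))
                y_b = ddqn_target(main_net, target_net,
                                  torch.as_tensor(r_b, dtype=torch.float32),
                                  torch.stack(sn_b), gamma)
                update_networks(main_net, target_net, optimizer,
                                torch.stack(s_b), torch.as_tensor(a_b), y_b, tau)
            s = s_next
            if done:
                break
    torch.save(main_net.state_dict(), "ddqn_tracking.pt")        # step 3-3: save the trained parameters
```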
The state transition matrix is the constant-velocity model matrix with update step Δt:
F_t =
[ 1  0  Δt  0 ]
[ 0  1  0  Δt ]
[ 0  0  1   0 ]
[ 0  0  0   1 ]
The reward function R is constructed as follows:
a tracking reward r_1^track is set based on D_t-1 and D_t, the distances between the UAV and the target at the previous moment t-1 and the current moment t;
a direction reward r_2 is set based on φ_t, the relative azimuth angle between the UAV and the target;
a stable-flight reward r_3 is set based on the speed change rate of the UAV at time t;
the overall reward function R of the MDP model is set as:
R = λ_1·r_1^track + λ_2·r_2 + λ_3·r_3
wherein λ_1, λ_2, λ_3 are the respective reward weights.
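For illustration, the weighted combination of the three reward terms can be written as below; since the concrete forms of the individual terms are given only as equation images in the available text, the expressions used here are assumed stand-ins, and the weights are arbitrary.

```python
import math

def reward(d_prev, d_curr, rel_azimuth, dv, lam=(1.0, 0.5, 0.2)):
    """R = lambda1*r1_track + lambda2*r2 + lambda3*r3 (individual forms assumed)."""
    r1_track = d_prev - d_curr        # assumed: positive when the UAV closes the distance
    r2 = math.cos(rel_azimuth)        # assumed: reward heading toward the target
    r3 = -abs(dv)                     # assumed: penalize large speed changes
    l1, l2, l3 = lam
    return l1 * r1_track + l2 * r2 + l3 * r3
```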
The discount factor γ satisfies 0 < γ < 1.
The invention has the beneficial effects that:
(1) In the MDP model constructed by the method, Kalman filtering is introduced to process the state input, which improves the accuracy of target state prediction, effectively mitigates the error of direct sensor range measurement in conventional UAV target tracking tasks, and gives the method high application value;
(2) The invention uses the DDQN algorithm, which effectively alleviates the overestimation problem of the traditional DQN algorithm. Accurate target state estimation obtained by Kalman filtering is combined with the UAV's own state, and the network outputs an appropriate discretized action to complete tracking of the maneuvering target; a UAV trained with the DDQN algorithm can cope with a dynamically changing environment and complete the tracking of a maneuvering target.
Drawings
Fig. 1 is a flow chart of unmanned aerial vehicle training based on kalman filtering and DDQN algorithm.
Fig. 2 is a schematic diagram of unmanned aerial vehicle maneuvering target tracking based on kalman filtering and DDQN algorithm.
Fig. 3 is a task display diagram of unmanned aerial vehicle maneuvering target tracking.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention provides a UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm; the overall flow is shown in Fig. 1. The technical solution is further described clearly and completely below with reference to the accompanying drawings and a specific embodiment:
step 1: construction of Markov (MDP) model for unmanned aerial vehicle maneuvering target tracking
Step 1-1: determining state variables in the MDP model:
the UAV flies at a fixed altitude using its inertial navigation system, and its state in two-dimensional space is set as:
S_1 = [x_1, y_1, v, θ]
wherein x_1, y_1 are the position coordinates of the UAV, v is its flight speed, and θ is its flight yaw angle;
setting the target state according to the sensor information:
S_2 = [x_2, y_2, vx_2, vy_2]
wherein x_2, y_2 are the position coordinates of the target and vx_2, vy_2 are its velocity components along the X and Y axes; a Kalman filtering method is introduced to predict the target information at the next moment:
S_2,t⁻ = F_t·S_2,t-1 + B_t·u_t + w
wherein F_t is the state transition matrix (constant-velocity model with update step Δt)
F_t =
[ 1  0  Δt  0 ]
[ 0  1  0  Δt ]
[ 0  0  1   0 ]
[ 0  0  0   1 ]
B_t is the control matrix, u_t is the state control vector, and w is the system process noise, w ~ N(0, Q_w), with Q_w the noise variance. The predicted observation noise covariance P_t⁻ at time t is:
P_t⁻ = F_t·P_t-1·F_tᵀ + Q_w
The Kalman gain at time t is solved as:
K_t = P_t⁻·Hᵀ·(H·P_t⁻·Hᵀ + O)⁻¹
wherein H is the observation matrix and O is the observation noise variance; according to the sensor information, the observation Z_t of the target by the UAV is calculated:
Z_t = H·S_2,t + o,  o ~ N(0, O)
The noise covariance matrix and the predicted state are then updated:
P_t = (I − K_t·H)·P_t⁻
Ŝ_2,t = S_2,t⁻ + K_t·(Z_t − H·S_2,t⁻)
wherein I is the identity matrix and Ŝ_2,t = [x̂_2, ŷ_2, v̂x_2, v̂y_2] contains the Kalman-filtered predictions of the target position and velocity components along the X and Y axes. Combining the UAV state with the target state prediction, the state input of the MDP model is set as:
s_t = [x_1, y_1, v, θ, x̂_2, ŷ_2, v̂x_2, v̂y_2]
step 1-2: defining the action space of the Markov model, i.e. the output action A of the UAV:
the output action A represents the set of actions the UAV takes, based on its own state, after receiving the external feedback value. In the invention, the UAV adjusts its motion trajectory by changing its speed change rate and steering rate according to the state input. The output action is set as:
A_t = [v̇_t, θ̇_t]
wherein v̇_t is the acceleration of the UAV at time t and θ̇_t is its flight yaw rate at time t;
step 1-3: defining the reward function R of the Markov model:
a sensor is used to obtain the positions of the UAV and the target, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the UAV; R represents the feedback value obtained when the UAV selects a certain action in the current state;
a tracking reward r_1^track is set based on D_t-1 and D_t, the distances between the UAV and the target at the previous moment and the current moment;
a direction reward r_2 is set based on φ_t, the relative azimuth angle between the UAV and the target;
a stable-flight reward r_3 is set based on the speed change rate of the UAV at time t;
combining the weights, the overall reward function R of the MDP model is set as:
R = λ_1·r_1^track + λ_2·r_2 + λ_3·r_3
step 1-4: defining a discount factor γ:
setting the discount factor γ = 0.9, the total return of the whole learning process is:
R_all = R_1 + γ·R_2 + γ^2·R_3 + … + γ^(n-1)·R_n
where R_n denotes the reward value obtained by the UAV at step n;
step 2: according to the MDP model constructed in the step 1, the DDQN algorithm is used for realizing the tracking training and control of the maneuvering target of the unmanned aerial vehicle, and the schematic diagram of the tracking of the maneuvering target of the unmanned aerial vehicle based on the DDQN algorithm is shown in FIG. 2:
step 2-1: constructing the main BP neural network of the DDQN algorithm with parameters θ and its state-action value function Q(s, a|θ), and copying the main network parameters to the target network θ', i.e. θ → θ';
step 2-2: setting the maximum number of training episodes E = 800, the maximum number of steps per episode K = 400, the experience replay queue size M = 20000, the target network soft-update coefficient τ = 0.01, and the neural network learning rate α = 0.001; setting the episode counter e = 0;
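For reference, the hyperparameters stated in this embodiment can be collected in a single configuration, for example:

```python
# Hyperparameters of the described embodiment (values taken from the text above)
CONFIG = {
    "episodes": 800,       # maximum number of training episodes E
    "max_steps": 400,      # maximum steps per episode K
    "buffer_size": 20000,  # experience replay queue size M
    "tau": 0.01,           # target network soft-update coefficient
    "lr": 0.001,           # neural network learning rate alpha
    "gamma": 0.9,          # discount factor (step 1-4)
}
```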
step 2-3: in this embodiment, the UAV acceleration v̇ (unit: m/s²) and flight yaw rate θ̇ (unit: degree/s) are each discretized into 7 values, giving the discrete acceleration set and the discrete yaw-rate set that together form the action space;
step 2-4: initializing the step counter k = 0, the training time t = 0, and the state input s_0 = [100, 120, 20, 0, 100, 200, 20, 0];
Step 2-5: move each motion in the motion space
Figure BDA0002821990920000088
And state stAnd transmitting the data to the main network theta to obtain Q value output corresponding to all actions of the main network. Greedy selection of corresponding action a in current Q-value outputt
Step 2-6: executing action, reading MDP model and updating environment information to obtain reward r at current timetAnd unmanned plane state input s at next momentt+1The experience bar [ s ]t,at,rt,st+1]Adding the data into an experience playback queue;
step 2-7: calculating the DDQN target value Y_t in conjunction with the target network:
Y_t = r_t + γ·Q(s_t+1, a*_t+1 | θ'),  a*_t+1 = argmax_a Q(s_t+1, a | θ)
wherein a*_t+1 is the action corresponding to the maximum Q-value output of the main network;
step 2-8: updating the main network:
θ ← θ + α·(Y_t − Q(s_t, a_t | θ))·∇_θ Q(s_t, a_t | θ)
where Q(s_t, a_t | θ) is the Q-value obtained by taking action a_t in state s_t, ∇_θ is the nabla (gradient) operator, and the update is propagated back through the neural network by gradient descent;
step 2-9: updating the target network:
θ′←0.01θ+(1-0.01)θ′
step 3: training the model by combining the MDP model and the DDQN algorithm:
step 3-1: increasing the step counter k by 1 and judging: if k < K, updating the time t = t + Δt and returning to step 2-4; otherwise, going to step 3-2;
step 3-2: increasing the episode counter e by 1 and judging: if e < E, returning to step 2-3; otherwise, finishing the training and going to step 3-3;
step 3-3: terminating the DDQN network training process and saving the current network parameters; loading the saved parameters into the UAV maneuvering target tracking system; at each moment, the UAV state and the Kalman-filtered target state are combined and fed into the neural network, which outputs an appropriate action through the fitting of the DDQN network to complete tracking of the maneuvering target.
In the UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm, the target state is predicted by Kalman filtering, and an MDP model is constructed and loaded into the DDQN algorithm. In each episode, the UAV combines its own state information with the Kalman-filtered target state information and feeds them into the neural network, which processes the state information and outputs the resulting UAV action. Through continuous learning, the UAV can finally complete the maneuvering target tracking task.
With continued training and learning, the UAV gradually learns to use the Kalman-filtered prediction of the maneuvering target and the DDQN algorithm to complete target tracking. The simulation result is shown in Fig. 3; it can be seen that a UAV trained with Kalman filtering and the DDQN algorithm keeps a small distance error to a complex maneuvering target and completes the tracking task.
The above description is only a preferred embodiment of the present invention. It should be noted that the embodiments of the present invention are not limited to the implementation described above; other embodiments obtained by deletion, modification, or refinement are also within the scope of the invention.

Claims (4)

1. An unmanned aerial vehicle maneuvering target tracking method fused with Kalman filtering and DDQN algorithm is characterized by comprising the following steps:
step 1: constructing a Markov decision process (MDP) model for UAV maneuvering target tracking;
step 1-1: determining state variables in the MDP model:
the UAV flies at a fixed altitude using its inertial navigation system, and its state in two-dimensional space is set as:
S_1 = [x_1, y_1, v, θ]
wherein x_1, y_1 are the position coordinates of the UAV, v is its flight speed, and θ is its flight yaw angle;
setting the target state according to the sensor information:
S_2 = [x_2, y_2, vx_2, vy_2]
wherein x_2, y_2 are the position coordinates of the target and vx_2, vy_2 are its velocity components along the X and Y axes; a Kalman filtering method is introduced to predict the target information at the next moment:
S_2,t⁻ = F_t·S_2,t-1 + B_t·u_t + w
wherein F_t is the state transition matrix, Δt is the update step, B_t is the control matrix, u_t is the state control vector, and w is the system process noise, w ~ N(0, Q_w), with Q_w the noise variance; the predicted observation noise covariance P_t⁻ at time t is:
P_t⁻ = F_t·P_t-1·F_tᵀ + Q_w
where P_t-1 is the observation noise covariance matrix at time t-1;
the Kalman gain at time t is solved as:
K_t = P_t⁻·Hᵀ·(H·P_t⁻·Hᵀ + O)⁻¹
wherein H is the observation matrix and O is the observation noise variance; according to the sensor information, the observation Z_t of the target by the UAV is calculated:
Z_t = H·S_2,t + o,  o ~ N(0, O)
the noise covariance matrix and the predicted state are then updated:
P_t = (I − K_t·H)·P_t⁻
Ŝ_2,t = S_2,t⁻ + K_t·(Z_t − H·S_2,t⁻)
wherein I is the identity matrix and Ŝ_2,t = [x̂_2, ŷ_2, v̂x_2, v̂y_2] contains the Kalman-filtered predictions of the target position and velocity components along the X and Y axes; combining the UAV state with the target state prediction, the state input of the MDP model is set as:
s_t = [x_1, y_1, v, θ, x̂_2, ŷ_2, v̂x_2, v̂y_2]
step 1-2: defining the action space of the Markov model, i.e. the output action A of the UAV:
the output action A represents the set of actions the UAV takes, based on its own state, after receiving the external feedback value; through the state input, the UAV adjusts its motion trajectory by changing its speed change rate and steering rate, and the output action is set as:
A_t = [v̇_t, θ̇_t]
wherein v̇_t is the acceleration of the UAV at time t and θ̇_t is its flight yaw rate at time t;
step 1-3: defining the reward function R of the Markov model:
a sensor is used to obtain the positions of the UAV and the target, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the UAV; R represents the feedback value obtained when the UAV selects a certain action in the current state;
step 1-4: defining a discount factor γ:
a discount factor γ is set; a larger discount factor places more emphasis on long-term return, and the total return of the whole learning process is:
R_all = R_1 + γ·R_2 + γ^2·R_3 + … + γ^(n-1)·R_n
where R_n denotes the reward value obtained by the UAV at the n-th step;
step 2: according to the MDP model constructed in the step 1, the DDQN algorithm is used for realizing the tracking training and control of the maneuvering target of the unmanned aerial vehicle:
step 2-1: constructing the main BP neural network of the DDQN algorithm with parameters θ and its state-action value function Q(s, a|θ), and copying the main network parameters to the target network θ', i.e. θ → θ';
step 2-2: setting the maximum number of training episodes E, the maximum number of steps per episode K, the experience replay queue size M, the soft-update coefficient τ of the target network, and the neural network learning rate α; setting the episode counter e = 0;
step 2-3: discretizing the UAV acceleration v̇ and flight yaw rate θ̇ into N values each, obtaining the discrete acceleration set and the discrete yaw-rate set that together form the action space;
step 2-4: initializing the step counter k = 0, the training time t = 0, and the state input s_0;
Step 2-5: move each motion in the motion space
Figure FDA0002821990910000031
And state stTransmitting the Q value outputs to the main network theta to obtain Q value outputs corresponding to all actions of the main network, and selecting corresponding action a from the current Q value outputs by a greedy methodt
Step 2-6: executing action, reading MDP model and updating environment information to obtain reward r at current timetAnd unmanned plane state input s at next momentt+1The experience bar [ s ]t,at,rt,st+1]Adding the data into an experience playback queue;
step 2-7: calculating the DDQN target value Y_t in conjunction with the target network:
Y_t = r_t + γ·Q(s_t+1, a*_t+1 | θ'),  a*_t+1 = argmax_a Q(s_t+1, a | θ)
wherein a*_t+1 is the action corresponding to the maximum Q-value output of the main network;
step 2-8: updating the main network:
θ ← θ + α·(Y_t − Q(s_t, a_t | θ))·∇_θ Q(s_t, a_t | θ)
where Q(s_t, a_t | θ) is the Q-value obtained by taking action a_t in state s_t, ∇_θ is the nabla (gradient) operator, the update is propagated back through the neural network by gradient descent, and α is the neural network learning rate;
step 2-9: updating the target network:
θ′←τθ+(1-τ)θ′
τ represents an update scale factor;
step 3: training the model by combining the MDP model and the DDQN algorithm:
step 3-1: increasing the step counter k by 1 and judging: if k < K, updating the time t = t + Δt and returning to step 2-4; otherwise, going to step 3-2;
step 3-2: increasing the episode counter e by 1 and judging: if e < E, returning to step 2-3; otherwise, finishing the training and going to step 3-3;
step 3-3: terminating the DDQN network training process and saving the current network parameters; loading the saved parameters into the UAV maneuvering target tracking system; at each moment, the UAV state and the Kalman-filtered target state are combined and fed into the neural network, which outputs an appropriate action through the fitting of the DDQN network to complete tracking of the maneuvering target.
2. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and DDQN algorithm according to claim 1, characterized in that the state transition matrix is:
F_t =
[ 1  0  Δt  0 ]
[ 0  1  0  Δt ]
[ 0  0  1   0 ]
[ 0  0  0   1 ]
3. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and DDQN algorithm according to claim 1, characterized in that the reward function R is constructed as follows:
a tracking reward r_1^track is set as:
r_1^track = D_t-1 + D_t
wherein D_t-1 and D_t are the distances between the UAV and the target at the previous moment t-1 and the current moment t, respectively;
a direction reward r_2 is set based on φ_t, the relative azimuth angle between the UAV and the target;
a stable-flight reward r_3 is set based on the speed change rate of the UAV at time t;
the reward function R of the MDP model is set as:
R = λ_1·r_1^track + λ_2·r_2 + λ_3·r_3
wherein λ_1, λ_2, λ_3 are the respective reward weights.
4. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and DDQN algorithm according to claim 1, characterized in that:
the discount factor γ satisfies 0 < γ < 1.
CN202011440212.3A 2020-12-07 2020-12-07 Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm Pending CN112435275A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011440212.3A CN112435275A (en) 2020-12-07 2020-12-07 Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011440212.3A CN112435275A (en) 2020-12-07 2020-12-07 Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Publications (1)

Publication Number Publication Date
CN112435275A true CN112435275A (en) 2021-03-02

Family

ID=74692387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011440212.3A Pending CN112435275A (en) 2020-12-07 2020-12-07 Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Country Status (1)

Country Link
CN (1) CN112435275A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200380401A1 (en) * 2019-05-29 2020-12-03 United States Of America As Represented By The Secretary Of The Navy Method for Performing Multi-Agent Reinforcement Learning in the Presence of Unreliable Communications Via Distributed Consensus
CN110610512A (en) * 2019-09-09 2019-12-24 西安交通大学 Unmanned aerial vehicle target tracking method based on BP neural network fusion Kalman filtering algorithm
CN110958135A (en) * 2019-11-05 2020-04-03 东华大学 Method and system for eliminating DDoS (distributed denial of service) attack in feature self-adaptive reinforcement learning
CN111580544A (en) * 2020-03-25 2020-08-25 北京航空航天大学 Unmanned aerial vehicle target tracking control method based on reinforcement learning PPO algorithm
CN111667513A (en) * 2020-06-01 2020-09-15 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN111862165A (en) * 2020-06-17 2020-10-30 南京理工大学 Target tracking method for updating Kalman filter based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RAHMAD SADLI: "Object Tracking: 2-D Object Tracking using Kalman Filter in Python", 《OBJECT DETECTION, OBJECT TRACKING, PYTHON PROGRAMMING》 *
XU HUANG 等: "Attitude Control of Fixed-wing UAV Based on DDQN", 《2019 CHINESE AUTOMATION CONGRESS (CAC)》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113093803A (en) * 2021-04-03 2021-07-09 西北工业大学 Unmanned aerial vehicle air combat motion control method based on E-SAC algorithm
CN113093124A (en) * 2021-04-07 2021-07-09 哈尔滨工程大学 DQN algorithm-based real-time allocation method for radar interference resources
CN113283516A (en) * 2021-06-01 2021-08-20 西北工业大学 Multi-sensor data fusion method based on reinforcement learning and D-S evidence theory
CN113283516B (en) * 2021-06-01 2023-02-28 西北工业大学 Multi-sensor data fusion method based on reinforcement learning and D-S evidence theory
CN113554680A (en) * 2021-07-21 2021-10-26 清华大学 Target tracking method and device, unmanned aerial vehicle and storage medium
CN113625569A (en) * 2021-08-12 2021-11-09 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving
CN113625569B (en) * 2021-08-12 2022-02-08 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model
CN113962927A (en) * 2021-09-01 2022-01-21 北京长木谷医疗科技有限公司 Acetabulum cup position adjusting method and device based on reinforcement learning and storage medium
CN113967909A (en) * 2021-09-13 2022-01-25 中国人民解放军军事科学院国防科技创新研究院 Mechanical arm intelligent control method based on direction reward
CN114018250A (en) * 2021-10-18 2022-02-08 杭州鸿泉物联网技术股份有限公司 Inertial navigation method, electronic device, storage medium, and computer program product
CN114018250B (en) * 2021-10-18 2024-05-03 杭州鸿泉物联网技术股份有限公司 Inertial navigation method, electronic device, storage medium and computer program product
CN114089776A (en) * 2021-11-09 2022-02-25 南京航空航天大学 Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN114089776B (en) * 2021-11-09 2023-10-24 南京航空航天大学 Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN114954840A (en) * 2022-05-30 2022-08-30 武汉理工大学 Stability changing control method, system and device for stability changing ship and storage medium
CN114954840B (en) * 2022-05-30 2023-09-05 武汉理工大学 Method, system and device for controlling stability of ship
CN117111620A (en) * 2023-10-23 2023-11-24 山东省科学院海洋仪器仪表研究所 Autonomous decision-making method for task allocation of heterogeneous unmanned system
CN117111620B (en) * 2023-10-23 2024-03-29 山东省科学院海洋仪器仪表研究所 Autonomous decision-making method for task allocation of heterogeneous unmanned system
CN117271967A (en) * 2023-11-17 2023-12-22 北京科技大学 Rescue co-location method and system based on reinforcement learning compensation filtering
CN117271967B (en) * 2023-11-17 2024-02-13 北京科技大学 Rescue co-location method and system based on reinforcement learning compensation filtering
CN117707207A (en) * 2024-02-06 2024-03-15 中国民用航空飞行学院 Unmanned aerial vehicle ground target tracking and obstacle avoidance planning method based on deep reinforcement learning
CN117707207B (en) * 2024-02-06 2024-04-19 中国民用航空飞行学院 Unmanned aerial vehicle ground target tracking and obstacle avoidance planning method based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN112435275A (en) Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm
CN109655066B (en) Unmanned aerial vehicle path planning method based on Q (lambda) algorithm
CN111667513B (en) Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
CN107479368B (en) Method and system for training unmanned aerial vehicle control model based on artificial intelligence
Wu Coordinated path planning for an unmanned aerial-aquatic vehicle (UAAV) and an autonomous underwater vehicle (AUV) in an underwater target strike mission
CN110320809B (en) AGV track correction method based on model predictive control
CN100591900C (en) Flight control system having a three control loop design
CN110908395A (en) Improved unmanned aerial vehicle flight path real-time planning method
CN113268074B (en) Unmanned aerial vehicle flight path planning method based on joint optimization
Nie et al. Three-dimensional path-following control of a robotic airship with reinforcement learning
CN108664024A (en) The motion planning and Cooperative Localization Method and device that unmanned vehicle network is formed into columns
Mansouri et al. Distributed model predictive control for unmanned aerial vehicles
CN114859910A (en) Unmanned ship path following system and method based on deep reinforcement learning
Wu et al. An adaptive reentry guidance method considering the influence of blackout zone
CN115562357A (en) Intelligent path planning method for unmanned aerial vehicle cluster
CN114967721A (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
Chajan et al. GPU based model-predictive path control for self-driving vehicles
Steinbrener et al. Improved state propagation through AI-based pre-processing and down-sampling of high-speed inertial data
Miller et al. Coordinated guidance of autonomous uavs via nominal belief-state optimization
Rottmann et al. Adaptive autonomous control using online value iteration with gaussian processes
Wang et al. Tracking moving target for 6 degree-of-freedom robot manipulator with adaptive visual servoing based on deep reinforcement learning PID controller
Wilson et al. UAV rendezvous: From concept to flight test
CN114964268A (en) Unmanned aerial vehicle navigation method and device
Chindhe et al. Advances in vision-based UAV manoeuvring techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210302