CN112435275A - Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Info

Publication number
CN112435275A
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
target
state
DDQN
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011440212.3A
Other languages
Chinese (zh)
Inventor
张修社
韩春雷
李琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 20 Research Institute
Original Assignee
CETC 20 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 20 Research Institute
Priority to CN202011440212.3A
Publication of CN112435275A
Legal status: Pending

Classifications

    • G06T 7/20 Image analysis; Analysis of motion
    • G06F 17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N 3/045 Neural networks; Combinations of networks
    • G06N 3/084 Learning methods; Backpropagation, e.g. using gradient descent
    • G06T 2207/20081 Training; Learning

Abstract

The invention provides an unmanned aerial vehicle (UAV) maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm. Kalman filtering is used to accurately estimate the motion state of the target and obtain its position and velocity; this estimate is combined with the UAV's own state information as the input to a neural network, with the UAV's acceleration and yaw rate as the action output. The flight policy network is trained through DDQN learning, enabling the UAV to make autonomous tracking decisions for a maneuvering target. The method effectively mitigates the error caused by direct sensor range measurement in conventional UAV target tracking tasks, has high application value, and effectively alleviates the overestimation problem of the traditional DQN algorithm.

Description

Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm
Technical Field
The invention relates to the field of control, and in particular to a method for tracking a maneuvering target with an unmanned aerial vehicle. It combines the Kalman filtering algorithm with the DDQN algorithm from deep reinforcement learning in computer science, and is thus an interdisciplinary application of these methods.
Background
An unmanned aerial vehicle (UAV), as a new type of aerial platform, has become a practical and effective tool in military, civil, and scientific research fields, and plays an important role in upgrading the aviation industry, integrating military and civil technology, and improving industrial efficiency. In practical applications, a UAV often faces task scenarios such as cooperative swarm flight or ground target tracking, which place high demands on manual operation, task allocation, and flight path planning. Therefore, an effective method for high-precision autonomous maneuvering target tracking by a UAV is of great significance.
When executing a flight task, a UAV often needs to work cooperatively with a designated target or track a maneuvering target to carry out reconnaissance, and the flight paths of multiple aircraft need to be planned in advance. Considering that UAV operation places high demands on manual control and is often subject to dynamic external disturbances, it is of great value for a UAV to learn autonomously and complete the tracking of a maneuvering target in an unknown dynamic environment. Patent publication CN110610512A proposes a UAV target tracking method based on a BP neural network fused with a Kalman filtering algorithm, which predicts the target flight trajectory through Kalman filtering, predicts the target position with a BP neural network, and controls the UAV for tracking via PID. However, that method only exploits the fitting capability of the neural network, does not give the UAV any learning capability, and cannot be applied to a dynamically changing environment. Patent CN110806759A provides an aircraft route tracking method based on deep reinforcement learning, which completes route tracking by constructing a Markov decision process model combined with a deep reinforcement learning algorithm; however, in that method the aircraft can only fly to each task point in sequence according to the provided information, cannot track a maneuvering target with an unknown motion trajectory, and therefore has certain limitations.
The Kalman filtering algorithm is a common method in control theory and control engineering. It estimates the true state from observed and predicted values, can be used to estimate the motion state of a maneuvering target in real time, and effectively improves the accuracy of target state prediction. DDQN, as an improvement of the deep reinforcement learning DQN algorithm, not only gives the UAV the learning capability needed to complete a maneuvering target tracking task, but also effectively alleviates the overestimation problem in value prediction and improves learning accuracy and efficiency. Therefore, a UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm is of great significance for a UAV to autonomously complete high-precision trajectory prediction and real-time tracking of a maneuvering target.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm. Kalman filtering is used to accurately estimate the motion state of the target and obtain its position and velocity; this estimate is combined with the UAV's own state information as the input to a neural network, with the UAV's acceleration and yaw rate as the action output. The flight policy network is trained through DDQN learning, realizing autonomous tracking decisions of the UAV for a maneuvering target.
The technical solution adopted by the invention comprises the following steps:
step 1: constructing a Markov decision process (MDP) model for UAV maneuvering target tracking;
step 1-1: determining state variables in the MDP model:
the UAV flies at a fixed altitude using its inertial navigation system, and its state in two-dimensional space is set as:
S_1 = [x_1, y_1, v, θ]
wherein x_1, y_1 are the position coordinates of the UAV, v is its flight speed, and θ is its flight yaw angle;
setting the target state according to the sensor information:
S_2 = [x_2, y_2, vx_2, vy_2]
wherein x_2, y_2 are the position coordinates of the target and vx_2, vy_2 are its velocity components along the X and Y axes; a Kalman filtering method is introduced to predict the target information at the next moment:
S_2,t⁻ = F_t·S_2,t-1 + B_t·u_t + w
wherein F_t is the state transition matrix, Δt is the update step, B_t is the control matrix, u_t is the state control vector, and w is the system process noise, w ~ N(0, Q_w), with Q_w the noise variance; the predicted observation noise covariance P_t⁻ at time t is:
P_t⁻ = F_t·P_t-1·F_tᵀ + Q_w
where P_t-1 is the observation noise covariance matrix at time t-1;
the Kalman gain at time t is solved as:
K_t = P_t⁻·Hᵀ·(H·P_t⁻·Hᵀ + O)⁻¹
wherein H is the observation matrix and O is the observation noise variance; according to the sensor information, the observation Z_t of the target by the UAV is calculated:
Z_t = H·S_2,t + o,  o ~ N(0, O)
the noise covariance matrix and the predicted state are then updated:
P_t = (I − K_t·H)·P_t⁻
Ŝ_2,t = S_2,t⁻ + K_t·(Z_t − H·S_2,t⁻)
wherein I is the identity matrix and Ŝ_2,t = [x̂_2, ŷ_2, v̂x_2, v̂y_2] contains the Kalman-filtered predictions of the target position and velocity components along the X and Y axes; combining the UAV state with the target state prediction, the state input of the MDP model is set as:
s_t = [x_1, y_1, v, θ, x̂_2, ŷ_2, v̂x_2, v̂y_2]
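For illustration, the Kalman prediction and update cycle described above can be sketched in Python with NumPy as follows; the concrete transition matrix, observation matrix H, and noise covariances used here are assumed example values, not values specified by the invention.

```python
import numpy as np

dt = 0.1                                    # update step (assumed value)
F = np.array([[1, 0, dt, 0],                # constant-velocity transition matrix (assumed form)
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.eye(4)                               # observation matrix (assumed: full-state observation)
Qw = 0.01 * np.eye(4)                       # process noise covariance (assumed)
O = 0.1 * np.eye(4)                         # observation noise covariance (assumed)

def kf_step(s_prev, P_prev, z):
    """One prediction/update step for the target state [x2, y2, vx2, vy2]."""
    s_pred = F @ s_prev                     # S_2,t- = F_t * S_2,t-1 (control term B_t*u_t omitted)
    P_pred = F @ P_prev @ F.T + Qw          # P_t- = F_t P_t-1 F_t^T + Q_w
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + O)   # Kalman gain
    s_new = s_pred + K @ (z - H @ s_pred)   # update with the sensor observation z
    P_new = (np.eye(4) - K @ H) @ P_pred    # P_t = (I - K_t H) P_t-
    return s_new, P_new

# Example: one filtering step from a noisy sensor reading
s, P = np.array([100.0, 200.0, 2.0, 0.0]), np.eye(4)
z = np.array([100.3, 200.1, 2.1, 0.05])
s, P = kf_step(s, P, z)
```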
step 1-2: defining the action space of the Markov model, i.e. the output action A of the UAV:
the output action A represents the set of actions the UAV takes, based on its own state, after receiving the external feedback value; through the state input, the UAV adjusts its motion trajectory by changing its speed change rate and steering rate, and the output action is set as:
A_t = [v̇_t, θ̇_t]
wherein v̇_t is the acceleration of the UAV at time t and θ̇_t is its flight yaw rate at time t;
step 1-3: defining the reward function R of the Markov model:
a sensor is used to obtain the positions of the UAV and the target, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the UAV; R represents the feedback value obtained when the UAV selects a certain action in the current state;
step 1-4: defining a discount factor γ:
a discount factor γ is set; a larger discount factor places more emphasis on long-term return, and the total return of the whole learning process is:
R_all = R_1 + γ·R_2 + γ^2·R_3 + … + γ^(n-1)·R_n
where R_n denotes the reward value obtained by the UAV at the n-th step;
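As a small illustration of the return formula above, the discounted total return can be accumulated as follows (the reward sequence is arbitrary example data):

```python
def discounted_return(rewards, gamma=0.9):
    """R_all = R_1 + gamma*R_2 + gamma^2*R_3 + ... + gamma^(n-1)*R_n."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

print(discounted_return([1.0, 0.5, 2.0, -0.2], gamma=0.9))
```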
step 2: according to the MDP model constructed in the step 1, the DDQN algorithm is used for realizing the tracking training and control of the maneuvering target of the unmanned aerial vehicle:
step 2-1: constructing the main BP neural network of the DDQN algorithm with parameters θ and its state-action value function Q(s, a|θ), and copying the main network parameters to the target network θ', i.e. θ → θ';
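One possible realization of the main network θ and its copy θ' is sketched below in PyTorch; the layer sizes, the 8-dimensional state, and the number of discrete actions are illustrative assumptions rather than values fixed by the invention.

```python
import copy
import torch.nn as nn

STATE_DIM = 8      # [x1, y1, v, theta, x2_hat, y2_hat, vx2_hat, vy2_hat]
N_ACTIONS = 49     # e.g. 7 acceleration values x 7 yaw-rate values (assumed)

def build_q_network():
    # A simple fully connected (BP) network mapping the state to Q(s, a) for every discrete action
    return nn.Sequential(
        nn.Linear(STATE_DIM, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, N_ACTIONS),
    )

main_net = build_q_network()            # parameters theta
target_net = copy.deepcopy(main_net)    # theta -> theta'
```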
step 2-2: setting the maximum number of training episodes E, the maximum number of steps per episode K, the experience replay queue size M, the soft-update coefficient τ of the target network, and the neural network learning rate α; setting the episode counter e = 0;
step 2-3: discretizing the UAV acceleration v̇ and flight yaw rate θ̇ into N values each, obtaining the discrete acceleration set and the discrete yaw-rate set that together form the action space;
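A sketch of how the two discretized quantities could be combined into a single discrete action set is given below; the bin values are placeholders, since the concrete discretization values appear only in the equation images of the original.

```python
import itertools

# Assumed example bins: acceleration (m/s^2) and yaw rate (deg/s), 7 values each
ACC_BINS = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
YAW_RATE_BINS = [-30.0, -20.0, -10.0, 0.0, 10.0, 20.0, 30.0]

# Each discrete action is a pair (acceleration, yaw rate); 7 x 7 = 49 actions
ACTIONS = list(itertools.product(ACC_BINS, YAW_RATE_BINS))
print(len(ACTIONS), ACTIONS[0])
```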
step 2-4: initializing the step counter k = 0, the training time t = 0, and the state input s_0;
Step 2-5: move each motion in the motion space
Figure BDA0002821990920000044
And state stTransmitting the data into the main network theta to obtain Q value output corresponding to all actions of the main network,greedy selection of corresponding action a in current Q-value outputt
Step 2-6: executing action, reading MDP model and updating environment information to obtain reward r at current timetAnd unmanned plane state input s at next momentt+1The experience bar [ s ]t,at,rt,st+1]Adding the data into an experience playback queue;
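The experience replay queue of size M used in steps 2-2 and 2-6 can be sketched as follows; this is an illustrative Python implementation, not code taken from the patent.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size queue of experience tuples [s_t, a_t, r_t, s_{t+1}]."""
    def __init__(self, capacity=20000):
        self.queue = deque(maxlen=capacity)   # oldest experiences are dropped when full

    def add(self, s, a, r, s_next):
        self.queue.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        return random.sample(self.queue, batch_size)

    def __len__(self):
        return len(self.queue)
```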
step 2-7: calculating the DDQN target value Y_t in conjunction with the target network:
Y_t = r_t + γ·Q(s_t+1, a*_t+1 | θ'),  a*_t+1 = argmax_a Q(s_t+1, a | θ)
wherein a*_t+1 is the action corresponding to the maximum Q-value output of the main network;
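The key point of the DDQN target is that the main network selects the action while the target network evaluates it; a PyTorch sketch under the same assumptions as above:

```python
import torch

def ddqn_target(main_net, target_net, r, s_next, gamma=0.9, done=False):
    """Y_t = r_t + gamma * Q(s_{t+1}, argmax_a Q(s_{t+1}, a | theta) | theta')."""
    with torch.no_grad():
        a_star = main_net(s_next).argmax(dim=-1, keepdim=True)      # action chosen by the main network
        q_eval = target_net(s_next).gather(-1, a_star).squeeze(-1)  # evaluated by the target network
        return r + gamma * q_eval * (0.0 if done else 1.0)
```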
step 2-8: updating the main network:
θ ← θ + α·(Y_t − Q(s_t, a_t | θ))·∇_θ Q(s_t, a_t | θ)
where Q(s_t, a_t | θ) is the Q-value obtained by taking action a_t in state s_t, ∇_θ is the nabla (gradient) operator, the update is propagated back through the neural network by gradient descent, and α is the neural network learning rate;
step 2-9: updating the target network:
θ′←τθ+(1-τ)θ′
τ represents an update scale factor;
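Steps 2-8 and 2-9 amount to one gradient step on the main network followed by a soft update of the target network. The PyTorch sketch below uses a mean-squared-error loss on (Y_t − Q(s_t, a_t|θ)), which is one common way to realize the written update rule; the optimizer (e.g. SGD or Adam with learning rate α) is assumed to be created by the caller.

```python
import torch
import torch.nn.functional as F_nn

def update_networks(main_net, target_net, optimizer, s, a, y, tau=0.01):
    """One DDQN parameter update: gradient descent on theta, then soft update of theta'."""
    q_sa = main_net(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)   # Q(s_t, a_t | theta)
    loss = F_nn.mse_loss(q_sa, y)                                # (Y_t - Q(s_t, a_t | theta))^2
    optimizer.zero_grad()
    loss.backward()                                              # backpropagate the gradient
    optimizer.step()                                             # theta <- theta - alpha * grad
    with torch.no_grad():                                        # theta' <- tau*theta + (1 - tau)*theta'
        for p_t, p in zip(target_net.parameters(), main_net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
    return loss.item()
```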
step 3: training the model by combining the MDP model and the DDQN algorithm:
step 3-1: increasing the step counter k by 1 and judging: if k < K, updating the time t = t + Δt and returning to step 2-4; otherwise, going to step 3-2;
step 3-2: increasing the episode counter e by 1 and judging: if e < E, returning to step 2-3; otherwise, finishing the training and going to step 3-3;
step 3-3: terminating the DDQN network training process and saving the current network parameters; loading the saved parameters into the UAV maneuvering target tracking system; at each moment, the UAV state and the Kalman-filtered target state are combined and fed into the neural network, which outputs an appropriate action through the fitting of the DDQN network to complete tracking of the maneuvering target.
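Putting steps 2 and 3 together, the episode/step structure of the training procedure can be sketched as follows, reusing the ReplayBuffer, ddqn_target, and update_networks helpers sketched above; env stands for a hypothetical UAV tracking environment implementing the MDP of step 1 and is not something defined in the patent.

```python
import torch

def train(env, main_net, target_net, buffer, optimizer,
          episodes=800, max_steps=400, gamma=0.9, tau=0.01, batch_size=64, epsilon=0.1):
    for e in range(episodes):                                    # episode loop (step 3-2)
        s = torch.as_tensor(env.reset(), dtype=torch.float32)
        for k in range(max_steps):                               # step loop (step 3-1)
            # step 2-5: action selection from the main network (epsilon-greedy exploration assumed)
            if torch.rand(1).item() < epsilon:
                a = int(torch.randint(0, main_net[-1].out_features, (1,)))
            else:
                a = int(main_net(s).argmax())
            # step 2-6: interact with the environment and store the experience
            s_next, r, done = env.step(a)
            s_next = torch.as_tensor(s_next, dtype=torch.float32)
            buffer.add(s, a, r, s_next)
            # steps 2-7 to 2-9: sample a batch and update both networks
            if len(buffer) >= batch_size:
                s_b, a_b, r_b, sn_b = zip(*buffer.sample(batch_size))
                y_b = ddqn_target(main_net, target_net,
                                  torch.as_tensor(r_b, dtype=torch.float32),
                                  torch.stack(sn_b), gamma)
                update_networks(main_net, target_net, optimizer,
                                torch.stack(s_b), torch.as_tensor(a_b), y_b, tau)
            s = s_next
            if done:
                break
    torch.save(main_net.state_dict(), "ddqn_tracking.pt")        # step 3-3: save the trained parameters
```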
The state transition matrix is the constant-velocity model matrix with update step Δt:
F_t =
[ 1  0  Δt  0 ]
[ 0  1  0  Δt ]
[ 0  0  1   0 ]
[ 0  0  0   1 ]
The reward function R is constructed as follows:
a tracking reward r_1^track is set based on D_t-1 and D_t, the distances between the UAV and the target at the previous moment t-1 and the current moment t;
a direction reward r_2 is set based on φ_t, the relative azimuth angle between the UAV and the target;
a stable-flight reward r_3 is set based on the speed change rate of the UAV at time t;
the overall reward function R of the MDP model is set as:
R = λ_1·r_1^track + λ_2·r_2 + λ_3·r_3
wherein λ_1, λ_2, λ_3 are the respective reward weights.
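For illustration, the weighted combination of the three reward terms can be written as below; since the concrete forms of the individual terms are given only as equation images in the available text, the expressions used here are assumed stand-ins, and the weights are arbitrary.

```python
import math

def reward(d_prev, d_curr, rel_azimuth, dv, lam=(1.0, 0.5, 0.2)):
    """R = lambda1*r1_track + lambda2*r2 + lambda3*r3 (individual forms assumed)."""
    r1_track = d_prev - d_curr        # assumed: positive when the UAV closes the distance
    r2 = math.cos(rel_azimuth)        # assumed: reward heading toward the target
    r3 = -abs(dv)                     # assumed: penalize large speed changes
    l1, l2, l3 = lam
    return l1 * r1_track + l2 * r2 + l3 * r3
```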
The discount factor γ satisfies 0 < γ < 1.
The invention has the beneficial effects that:
(1) In the MDP model constructed by the method, Kalman filtering is introduced to process the state input, which improves the accuracy of target state prediction, effectively mitigates the error of direct sensor range measurement in conventional UAV target tracking tasks, and gives the method high application value;
(2) The invention uses the DDQN algorithm, which effectively alleviates the overestimation problem of the traditional DQN algorithm. Accurate target state estimation obtained by Kalman filtering is combined with the UAV's own state, and the network outputs an appropriate discretized action to complete tracking of the maneuvering target; a UAV trained with the DDQN algorithm can cope with a dynamically changing environment and complete the tracking of a maneuvering target.
Drawings
Fig. 1 is a flow chart of unmanned aerial vehicle training based on kalman filtering and DDQN algorithm.
Fig. 2 is a schematic diagram of unmanned aerial vehicle maneuvering target tracking based on kalman filtering and DDQN algorithm.
Fig. 3 is a task display diagram of unmanned aerial vehicle maneuvering target tracking.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention provides a UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm; the overall flow is shown in Fig. 1. The technical solution is further described clearly and completely below with reference to the accompanying drawings and a specific embodiment:
step 1: construction of Markov (MDP) model for unmanned aerial vehicle maneuvering target tracking
Step 1-1: determining state variables in the MDP model:
the UAV flies at a fixed altitude using its inertial navigation system, and its state in two-dimensional space is set as:
S_1 = [x_1, y_1, v, θ]
wherein x_1, y_1 are the position coordinates of the UAV, v is its flight speed, and θ is its flight yaw angle;
setting the target state according to the sensor information:
S_2 = [x_2, y_2, vx_2, vy_2]
wherein x_2, y_2 are the position coordinates of the target and vx_2, vy_2 are its velocity components along the X and Y axes; a Kalman filtering method is introduced to predict the target information at the next moment:
S_2,t⁻ = F_t·S_2,t-1 + B_t·u_t + w
wherein F_t is the state transition matrix (constant-velocity model with update step Δt)
F_t =
[ 1  0  Δt  0 ]
[ 0  1  0  Δt ]
[ 0  0  1   0 ]
[ 0  0  0   1 ]
B_t is the control matrix, u_t is the state control vector, and w is the system process noise, w ~ N(0, Q_w), with Q_w the noise variance. The predicted observation noise covariance P_t⁻ at time t is:
P_t⁻ = F_t·P_t-1·F_tᵀ + Q_w
The Kalman gain at time t is solved as:
K_t = P_t⁻·Hᵀ·(H·P_t⁻·Hᵀ + O)⁻¹
wherein H is the observation matrix and O is the observation noise variance; according to the sensor information, the observation Z_t of the target by the UAV is calculated:
Z_t = H·S_2,t + o,  o ~ N(0, O)
The noise covariance matrix and the predicted state are then updated:
P_t = (I − K_t·H)·P_t⁻
Ŝ_2,t = S_2,t⁻ + K_t·(Z_t − H·S_2,t⁻)
wherein I is the identity matrix and Ŝ_2,t = [x̂_2, ŷ_2, v̂x_2, v̂y_2] contains the Kalman-filtered predictions of the target position and velocity components along the X and Y axes. Combining the UAV state with the target state prediction, the state input of the MDP model is set as:
s_t = [x_1, y_1, v, θ, x̂_2, ŷ_2, v̂x_2, v̂y_2]
step 1-2: defining the action space of the Markov model, i.e. the output action A of the UAV:
the output action A represents the set of actions the UAV takes, based on its own state, after receiving the external feedback value. In the invention, the UAV adjusts its motion trajectory by changing its speed change rate and steering rate according to the state input. The output action is set as:
A_t = [v̇_t, θ̇_t]
wherein v̇_t is the acceleration of the UAV at time t and θ̇_t is its flight yaw rate at time t;
step 1-3: defining the reward function R of the Markov model:
a sensor is used to obtain the positions of the UAV and the target, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the UAV; R represents the feedback value obtained when the UAV selects a certain action in the current state;
a tracking reward r_1^track is set based on D_t-1 and D_t, the distances between the UAV and the target at the previous moment and the current moment;
a direction reward r_2 is set based on φ_t, the relative azimuth angle between the UAV and the target;
a stable-flight reward r_3 is set based on the speed change rate of the UAV at time t;
combining the weights, the overall reward function R of the MDP model is set as:
R = λ_1·r_1^track + λ_2·r_2 + λ_3·r_3
step 1-4: defining a discount factor γ:
setting the discount factor γ = 0.9, the total return of the whole learning process is:
R_all = R_1 + γ·R_2 + γ^2·R_3 + … + γ^(n-1)·R_n
where R_n denotes the reward value obtained by the UAV at step n;
step 2: according to the MDP model constructed in the step 1, the DDQN algorithm is used for realizing the tracking training and control of the maneuvering target of the unmanned aerial vehicle, and the schematic diagram of the tracking of the maneuvering target of the unmanned aerial vehicle based on the DDQN algorithm is shown in FIG. 2:
step 2-1: constructing the main BP neural network of the DDQN algorithm with parameters θ and its state-action value function Q(s, a|θ), and copying the main network parameters to the target network θ', i.e. θ → θ';
step 2-2: setting the maximum number of training episodes E = 800, the maximum number of steps per episode K = 400, the experience replay queue size M = 20000, the target network soft-update coefficient τ = 0.01, and the neural network learning rate α = 0.001; setting the episode counter e = 0;
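For reference, the hyperparameters stated in this embodiment can be collected in a single configuration, for example:

```python
# Hyperparameters of the described embodiment (values taken from the text above)
CONFIG = {
    "episodes": 800,       # maximum number of training episodes E
    "max_steps": 400,      # maximum steps per episode K
    "buffer_size": 20000,  # experience replay queue size M
    "tau": 0.01,           # target network soft-update coefficient
    "lr": 0.001,           # neural network learning rate alpha
    "gamma": 0.9,          # discount factor (step 1-4)
}
```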
step 2-3: in this embodiment, the UAV acceleration v̇ (unit: m/s²) and flight yaw rate θ̇ (unit: degree/s) are each discretized into 7 values, giving the discrete acceleration set and the discrete yaw-rate set that together form the action space;
step 2-4: initializing the step counter k = 0, the training time t = 0, and the state input s_0 = [100, 120, 20, 0, 100, 200, 20, 0];
Step 2-5: move each motion in the motion space
Figure BDA0002821990920000088
And state stAnd transmitting the data to the main network theta to obtain Q value output corresponding to all actions of the main network. Greedy selection of corresponding action a in current Q-value outputt
Step 2-6: executing action, reading MDP model and updating environment information to obtain reward r at current timetAnd unmanned plane state input s at next momentt+1The experience bar [ s ]t,at,rt,st+1]Adding the data into an experience playback queue;
step 2-7: calculating the DDQN target value Y_t in conjunction with the target network:
Y_t = r_t + γ·Q(s_t+1, a*_t+1 | θ'),  a*_t+1 = argmax_a Q(s_t+1, a | θ)
wherein a*_t+1 is the action corresponding to the maximum Q-value output of the main network;
step 2-8: updating the main network:
θ ← θ + α·(Y_t − Q(s_t, a_t | θ))·∇_θ Q(s_t, a_t | θ)
where Q(s_t, a_t | θ) is the Q-value obtained by taking action a_t in state s_t, ∇_θ is the nabla (gradient) operator, and the update is propagated back through the neural network by gradient descent;
step 2-9: updating the target network:
θ′←0.01θ+(1-0.01)θ′
step 3: training the model by combining the MDP model and the DDQN algorithm:
step 3-1: increasing the step counter k by 1 and judging: if k < K, updating the time t = t + Δt and returning to step 2-4; otherwise, going to step 3-2;
step 3-2: increasing the episode counter e by 1 and judging: if e < E, returning to step 2-3; otherwise, finishing the training and going to step 3-3;
step 3-3: terminating the DDQN network training process and saving the current network parameters; loading the saved parameters into the UAV maneuvering target tracking system; at each moment, the UAV state and the Kalman-filtered target state are combined and fed into the neural network, which outputs an appropriate action through the fitting of the DDQN network to complete tracking of the maneuvering target.
In the UAV maneuvering target tracking method integrating Kalman filtering and the DDQN algorithm, the target state is predicted by Kalman filtering, and an MDP model is constructed and loaded into the DDQN algorithm. In each episode, the UAV combines its own state information with the Kalman-filtered target state information and feeds them into the neural network, which processes the state information and outputs the resulting UAV action. Through continuous learning, the UAV can finally complete the maneuvering target tracking task.
With continued training and learning, the UAV gradually learns to use the Kalman-filtered prediction of the maneuvering target and the DDQN algorithm to complete target tracking. The simulation result is shown in Fig. 3; it can be seen that a UAV trained with Kalman filtering and the DDQN algorithm keeps a small distance error to a complex maneuvering target and completes the tracking task.
The above description is only a preferred embodiment of the present invention. It should be noted that the embodiments of the present invention are not limited to the implementation described above; other embodiments obtained by deletion, modification, or refinement are also within the scope of the invention.

Claims (4)

1. An unmanned aerial vehicle maneuvering target tracking method fused with Kalman filtering and DDQN algorithm is characterized by comprising the following steps:
step 1: constructing a Markov decision process (MDP) model for UAV maneuvering target tracking;
step 1-1: determining state variables in the MDP model:
the UAV flies at a fixed altitude using its inertial navigation system, and its state in two-dimensional space is set as:
S_1 = [x_1, y_1, v, θ]
wherein x_1, y_1 are the position coordinates of the UAV, v is its flight speed, and θ is its flight yaw angle;
setting the target state according to the sensor information:
S_2 = [x_2, y_2, vx_2, vy_2]
wherein x_2, y_2 are the position coordinates of the target and vx_2, vy_2 are its velocity components along the X and Y axes; a Kalman filtering method is introduced to predict the target information at the next moment:
S_2,t⁻ = F_t·S_2,t-1 + B_t·u_t + w
wherein F_t is the state transition matrix, Δt is the update step, B_t is the control matrix, u_t is the state control vector, and w is the system process noise, w ~ N(0, Q_w), with Q_w the noise variance; the predicted observation noise covariance P_t⁻ at time t is:
P_t⁻ = F_t·P_t-1·F_tᵀ + Q_w
where P_t-1 is the observation noise covariance matrix at time t-1;
the Kalman gain at time t is solved as:
K_t = P_t⁻·Hᵀ·(H·P_t⁻·Hᵀ + O)⁻¹
wherein H is the observation matrix and O is the observation noise variance; according to the sensor information, the observation Z_t of the target by the UAV is calculated:
Z_t = H·S_2,t + o,  o ~ N(0, O)
the noise covariance matrix and the predicted state are then updated:
P_t = (I − K_t·H)·P_t⁻
Ŝ_2,t = S_2,t⁻ + K_t·(Z_t − H·S_2,t⁻)
wherein I is the identity matrix and Ŝ_2,t = [x̂_2, ŷ_2, v̂x_2, v̂y_2] contains the Kalman-filtered predictions of the target position and velocity components along the X and Y axes; combining the UAV state with the target state prediction, the state input of the MDP model is set as:
s_t = [x_1, y_1, v, θ, x̂_2, ŷ_2, v̂x_2, v̂y_2]
step 1-2: defining the action space of the Markov model, i.e. the output action A of the UAV:
the output action A represents the set of actions the UAV takes, based on its own state, after receiving the external feedback value; through the state input, the UAV adjusts its motion trajectory by changing its speed change rate and steering rate, and the output action is set as:
A_t = [v̇_t, θ̇_t]
wherein v̇_t is the acceleration of the UAV at time t and θ̇_t is its flight yaw rate at time t;
step 1-3: defining the reward function R of the Markov model:
a sensor is used to obtain the positions of the UAV and the target, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the UAV; R represents the feedback value obtained when the UAV selects a certain action in the current state;
step 1-4: defining a discount factor γ:
a discount factor γ is set; a larger discount factor places more emphasis on long-term return, and the total return of the whole learning process is:
R_all = R_1 + γ·R_2 + γ^2·R_3 + … + γ^(n-1)·R_n
where R_n denotes the reward value obtained by the UAV at the n-th step;
step 2: according to the MDP model constructed in the step 1, the DDQN algorithm is used for realizing the tracking training and control of the maneuvering target of the unmanned aerial vehicle:
step 2-1: constructing the main BP neural network of the DDQN algorithm with parameters θ and its state-action value function Q(s, a|θ), and copying the main network parameters to the target network θ', i.e. θ → θ';
step 2-2: setting the maximum number of training episodes E, the maximum number of steps per episode K, the experience replay queue size M, the soft-update coefficient τ of the target network, and the neural network learning rate α; setting the episode counter e = 0;
step 2-3: discretizing the UAV acceleration v̇ and flight yaw rate θ̇ into N values each, obtaining the discrete acceleration set and the discrete yaw-rate set that together form the action space;
step 2-4: initializing the step counter k = 0, the training time t = 0, and the state input s_0;
Step 2-5: move each motion in the motion space
Figure FDA0002821990910000031
And state stTransmitting the Q value outputs to the main network theta to obtain Q value outputs corresponding to all actions of the main network, and selecting corresponding action a from the current Q value outputs by a greedy methodt
Step 2-6: executing action, reading MDP model and updating environment information to obtain reward r at current timetAnd unmanned plane state input s at next momentt+1The experience bar [ s ]t,at,rt,st+1]Adding the data into an experience playback queue;
step 2-7: calculating the DDQN target value Y_t in conjunction with the target network:
Y_t = r_t + γ·Q(s_t+1, a*_t+1 | θ'),  a*_t+1 = argmax_a Q(s_t+1, a | θ)
wherein a*_t+1 is the action corresponding to the maximum Q-value output of the main network;
step 2-8: updating the main network:
θ ← θ + α·(Y_t − Q(s_t, a_t | θ))·∇_θ Q(s_t, a_t | θ)
where Q(s_t, a_t | θ) is the Q-value obtained by taking action a_t in state s_t, ∇_θ is the nabla (gradient) operator, the update is propagated back through the neural network by gradient descent, and α is the neural network learning rate;
step 2-9: updating the target network:
θ′←τθ+(1-τ)θ′
τ represents an update scale factor;
step 3: training the model by combining the MDP model and the DDQN algorithm:
step 3-1: increasing the step counter k by 1 and judging: if k < K, updating the time t = t + Δt and returning to step 2-4; otherwise, going to step 3-2;
step 3-2: increasing the episode counter e by 1 and judging: if e < E, returning to step 2-3; otherwise, finishing the training and going to step 3-3;
step 3-3: terminating the DDQN network training process and saving the current network parameters; loading the saved parameters into the UAV maneuvering target tracking system; at each moment, the UAV state and the Kalman-filtered target state are combined and fed into the neural network, which outputs an appropriate action through the fitting of the DDQN network to complete tracking of the maneuvering target.
2. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and DDQN algorithm according to claim 1, characterized in that the state transition matrix is:
F_t =
[ 1  0  Δt  0 ]
[ 0  1  0  Δt ]
[ 0  0  1   0 ]
[ 0  0  0   1 ]
3. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and DDQN algorithm according to claim 1, characterized in that the reward function R is constructed as follows:
a tracking reward r_1^track is set as:
r_1^track = D_t-1 + D_t
wherein D_t-1 and D_t are the distances between the UAV and the target at the previous moment t-1 and the current moment t, respectively;
a direction reward r_2 is set based on φ_t, the relative azimuth angle between the UAV and the target;
a stable-flight reward r_3 is set based on the speed change rate of the UAV at time t;
the reward function R of the MDP model is set as:
R = λ_1·r_1^track + λ_2·r_2 + λ_3·r_3
wherein λ_1, λ_2, λ_3 are the respective reward weights.
4. The unmanned aerial vehicle maneuvering target tracking method fusing Kalman filtering and DDQN algorithm according to claim 1, characterized in that:
the discount factor γ satisfies 0 < γ < 1.
CN202011440212.3A 2020-12-07 2020-12-07 Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm Pending CN112435275A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011440212.3A CN112435275A (en) 2020-12-07 2020-12-07 Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011440212.3A CN112435275A (en) 2020-12-07 2020-12-07 Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Publications (1)

Publication Number Publication Date
CN112435275A true CN112435275A (en) 2021-03-02

Family

ID=74692387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011440212.3A Pending CN112435275A (en) 2020-12-07 2020-12-07 Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm

Country Status (1)

Country Link
CN (1) CN112435275A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200380401A1 (en) * 2019-05-29 2020-12-03 United States Of America As Represented By The Secretary Of The Navy Method for Performing Multi-Agent Reinforcement Learning in the Presence of Unreliable Communications Via Distributed Consensus
CN110610512A (en) * 2019-09-09 2019-12-24 西安交通大学 Unmanned aerial vehicle target tracking method based on BP neural network fusion Kalman filtering algorithm
CN110958135A (en) * 2019-11-05 2020-04-03 东华大学 Method and system for eliminating DDoS (distributed denial of service) attack in feature self-adaptive reinforcement learning
CN111580544A (en) * 2020-03-25 2020-08-25 北京航空航天大学 Unmanned aerial vehicle target tracking control method based on reinforcement learning PPO algorithm
CN111667513A (en) * 2020-06-01 2020-09-15 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN111862165A (en) * 2020-06-17 2020-10-30 南京理工大学 Target tracking method for updating Kalman filter based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RAHMAD SADLI: "Object Tracking: 2-D Object Tracking using Kalman Filter in Python", 《OBJECT DETECTION, OBJECT TRACKING, PYTHON PROGRAMMING》 *
XU HUANG 等: "Attitude Control of Fixed-wing UAV Based on DDQN", 《2019 CHINESE AUTOMATION CONGRESS (CAC)》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113093803A (en) * 2021-04-03 2021-07-09 西北工业大学 Unmanned aerial vehicle air combat motion control method based on E-SAC algorithm
CN113093124A (en) * 2021-04-07 2021-07-09 哈尔滨工程大学 DQN algorithm-based real-time allocation method for radar interference resources
CN113283516A (en) * 2021-06-01 2021-08-20 西北工业大学 Multi-sensor data fusion method based on reinforcement learning and D-S evidence theory
CN113283516B (en) * 2021-06-01 2023-02-28 西北工业大学 Multi-sensor data fusion method based on reinforcement learning and D-S evidence theory
CN113554680A (en) * 2021-07-21 2021-10-26 清华大学 Target tracking method and device, unmanned aerial vehicle and storage medium
CN113625569A (en) * 2021-08-12 2021-11-09 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving
CN113625569B (en) * 2021-08-12 2022-02-08 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model
CN113962927A (en) * 2021-09-01 2022-01-21 北京长木谷医疗科技有限公司 Acetabulum cup position adjusting method and device based on reinforcement learning and storage medium
CN113967909A (en) * 2021-09-13 2022-01-25 中国人民解放军军事科学院国防科技创新研究院 Mechanical arm intelligent control method based on direction reward
CN114018250A (en) * 2021-10-18 2022-02-08 杭州鸿泉物联网技术股份有限公司 Inertial navigation method, electronic device, storage medium, and computer program product
CN114018250B (en) * 2021-10-18 2024-05-03 杭州鸿泉物联网技术股份有限公司 Inertial navigation method, electronic device, storage medium and computer program product
CN114089776A (en) * 2021-11-09 2022-02-25 南京航空航天大学 Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN114089776B (en) * 2021-11-09 2023-10-24 南京航空航天大学 Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN114954840A (en) * 2022-05-30 2022-08-30 武汉理工大学 Stability changing control method, system and device for stability changing ship and storage medium
CN114954840B (en) * 2022-05-30 2023-09-05 武汉理工大学 Method, system and device for controlling stability of ship
CN117111620A (en) * 2023-10-23 2023-11-24 山东省科学院海洋仪器仪表研究所 Autonomous decision-making method for task allocation of heterogeneous unmanned system
CN117111620B (en) * 2023-10-23 2024-03-29 山东省科学院海洋仪器仪表研究所 Autonomous decision-making method for task allocation of heterogeneous unmanned system
CN117271967A (en) * 2023-11-17 2023-12-22 北京科技大学 Rescue co-location method and system based on reinforcement learning compensation filtering
CN117271967B (en) * 2023-11-17 2024-02-13 北京科技大学 Rescue co-location method and system based on reinforcement learning compensation filtering
CN117707207A (en) * 2024-02-06 2024-03-15 中国民用航空飞行学院 Unmanned aerial vehicle ground target tracking and obstacle avoidance planning method based on deep reinforcement learning
CN117707207B (en) * 2024-02-06 2024-04-19 中国民用航空飞行学院 Unmanned aerial vehicle ground target tracking and obstacle avoidance planning method based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN112435275A (en) Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm
CN109655066B (en) Unmanned aerial vehicle path planning method based on Q (lambda) algorithm
CN111667513B (en) Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
CN107479368B (en) Method and system for training unmanned aerial vehicle control model based on artificial intelligence
Wu Coordinated path planning for an unmanned aerial-aquatic vehicle (UAAV) and an autonomous underwater vehicle (AUV) in an underwater target strike mission
CN110320809B (en) AGV track correction method based on model predictive control
CN100591900C (en) Flight control system having a three control loop design
CN110908395A (en) Improved unmanned aerial vehicle flight path real-time planning method
CN113268074B (en) Unmanned aerial vehicle flight path planning method based on joint optimization
Nie et al. Three-dimensional path-following control of a robotic airship with reinforcement learning
CN108664024A (en) The motion planning and Cooperative Localization Method and device that unmanned vehicle network is formed into columns
Mansouri et al. Distributed model predictive control for unmanned aerial vehicles
CN114859910A (en) Unmanned ship path following system and method based on deep reinforcement learning
Wu et al. An adaptive reentry guidance method considering the influence of blackout zone
CN115562357A (en) Intelligent path planning method for unmanned aerial vehicle cluster
CN114967721A (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
Chajan et al. GPU based model-predictive path control for self-driving vehicles
Steinbrener et al. Improved state propagation through AI-based pre-processing and down-sampling of high-speed inertial data
Miller et al. Coordinated guidance of autonomous uavs via nominal belief-state optimization
Rottmann et al. Adaptive autonomous control using online value iteration with gaussian processes
Wang et al. Tracking moving target for 6 degree-of-freedom robot manipulator with adaptive visual servoing based on deep reinforcement learning PID controller
Wilson et al. UAV rendezvous: From concept to flight test
CN114964268A (en) Unmanned aerial vehicle navigation method and device
Chindhe et al. Advances in vision-based UAV manoeuvring techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210302