CN112363519A

CN112363519A - Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method

Info

Publication number: CN112363519A
Application number: CN202011125416.8A
Authority: CN
Inventors: 鲜斌; 张诗婧
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-10-20
Filing date: 2020-10-20
Publication date: 2021-02-12
Anticipated expiration: 2040-10-20
Also published as: CN112363519B

Abstract

The invention relates to a quadrotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method, aiming at the problem of quadrotor unmanned aerial vehicle attitude control of a quadrotor unmanned aerial vehicle kinetic model with an unmodeled part, a reinforcement learning controller based on an execution-evaluation neural network is designed for estimating the unmodeled part of the model, and a nonlinear robust controller based on multivariable super-twisting is designed at the same time, so that attitude stability control of the quadrotor unmanned aerial vehicle is realized.

Description

Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method

Technical Field

The invention relates to attitude precision control of a quad-rotor unmanned aerial vehicle. Aiming at the influence of an unmodeled part in a system dynamics model of the quad-rotor unmanned aerial vehicle on the system control performance and the dependence of a control method based on the system dynamics model on an accurate model, the nonlinear attitude controller based on reinforcement learning and a second-order sliding mode control algorithm is provided, and the result of finite time convergence of the attitude control error of the unmanned aerial vehicle is realized. In particular to a finite time convergence attitude control method for a quad-rotor unmanned aerial vehicle.

Background

Traditional linear control algorithms, such as PID algorithms, LQR algorithms, etc., have been used in a wider range of applications in quad-rotor drones. However, the linear control algorithm only ensures that the system has a good control effect in the state near the balance point, and is difficult to obtain satisfactory effects in the aspects of processing the nonlinear multivariable control system, ensuring the anti-interference capability of the system and the like, so that the improvement of the dynamic performance and the steady-state performance of the system is also limited (journal: flight mechanics; prey: li yi ji sha, zhang xiao east; published year-month: 2011-4 month; article title: the current situation and development of the unmanned aerial vehicle flight control method research; page number: 1-5). To this end, a number of non-linear control algorithms are used for quad-rotor drone control. If the Control of the quad-rotor unmanned aerial vehicle is realized by adopting the adaptive sliding mode Control method, the experimental result shows that the adaptive sliding mode Control method has better performance in the aspects of processing sensor noise and model uncertainty (Journal: International Journal of Control, Automation and Systems; Renders: Daewon Lee, H Jin Kim, Shankar science; published month: 2009, 5 months; article title: Feedback linkage v.adaptive sliding mode Control for a quadrotor satellite; page number: 419: 42). However, the conventional first-order sliding mode control algorithm has the buffeting problem and is not beneficial to long-term stable operation of the system. Thus, some researchers began to utilize super-twisting robust control design methods. Theoretically, this algorithm can eliminate buffeting and is used by many researchers for quad-rotor unmanned controls (Journal: Journal of Franklin Institute; Rev: Laloui Defafa, Abdelaziz Benalleguo, LFridman; published month: 3 2012; article title: Super twisting control algorithm for the attute tracking of the four rotors utore uav; page number: 685-. Still other researchers have proposed a Multivariable super-like algorithm considering the Multivariable characteristics of quad-rotor drones and used it for quad-rotor drone attitude control (journal: IEEE Transactions on Industrial Electronics; authors: Bailing tie, Lihong Liu, Hanchen Lu, et al; published month: 2017 month 8; article title: Multivariable fine time attribute control for quadruperator u: Theory and experiment; page number: 2567 + 2577).

With the rapid development of machine learning research work, learning algorithms such as reinforcement learning are also used for the control design of the quad-rotor unmanned aerial vehicle. In consideration of the problems of safe flight and the like, researchers firstly use actual flight data to carry out model identification to obtain an offline learning state transfer model or a random Markov model, then use an enhanced learning algorithm to carry out offline iteration to obtain an optimal control strategy, and finally use the optimal control strategy for unmanned aerial vehicle control (Conference: IEEE RSJ International reference on Intelligent Robots and Systems; authors: Wallander, Gabriel M Hoffmann, Jung Soon Jang, etc.; publication year/month: 2005; article title: Multi-agent quadrat test control design: Integrated sizing mode v.e. recovery leaving; page number: 3712 + 3717). In the simulation environment of the quad-rotor unmanned aerial vehicle, a trainee trains a neural network by using reinforcement learning, and applies the trained neural network to unmanned aerial vehicle Control, so that the flight task of unmanned aerial vehicle throwing and hovering is realized (journal: IEEE Transactions on Industrial Electronics; Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and the like; published month: 2017, 6 months; article title: Control of quadra with recovery learning, and page number: 2096-. Although the offline learning method achieves a good unmanned aerial vehicle control effect, the research provides stability proof less, and the offline learning is long in time consumption and large in calculation amount. On the other hand, part of off-line learning methods are performed in a simulation environment, and various disturbances in a real environment cannot be completely simulated, so that the generalization capability of the learned control algorithm is insufficient. The experiment of Hwangbo et al, although it works well on hover tasks, the tracking effect is not as good as that of the non-linear controller. For this reason, online learning reinforcement learning algorithms are also used for quad-rotor drone control. For example, Sugimoto and the like installs a camera at the bottom of an unmanned aerial vehicle for identifying mark Information on the ground, and then controls the unmanned aerial vehicle to keep the ground mark always in the center of the visual field of the camera by using a reinforcement learning algorithm on a ground station, thereby realizing the hovering experiment of the quadrotor unmanned aerial vehicle (Conference: 20163rd International Conference on Information Science and Control Engineering (ICISCE); authors: Takuya Sugimoto, Manabu Gouko; publication year: 2016; article title: Acquisition of knowledge by actual use of Information retrieval and leaving; page number: 148-.

In consideration of the problems of long calculation time, large calculation amount and the like when the reinforcement learning algorithm is used for unmanned aerial vehicle control, a learner designs a controller based on a Robust Integral of an error sign function (RISE) control algorithm and the reinforcement learning algorithm, and uses the controller for unmanned helicopter attitude control to obtain a good control effect (journal: control theory and application; writer: peaceful, fresh; published year, year 2019, month 4; article title: attitude reinforcement learning control design and verification of an unmanned helicopter; page number: 516-. However, this approach has less application on quad-rotor drones.

With regard to the research on quad-rotor drone control, researchers have achieved some success today, but there are also some limitations: 1) the existing control design usually ignores unmodeled parts in a four-rotor unmanned aerial vehicle dynamic model, but a control method based on a sixuanyiwure basis-tolerant dynamic model has high dependence on an accurate model. Therefore, when the attitude of the quad-rotor unmanned aerial vehicle is accurately controlled, the influence of the unmodeled part of the quad-rotor unmanned aerial vehicle is considered. 2) Some control methods based on reinforcement learning generally utilize flight data to carry out off-line training, and the controller generalization ability that obtains from this is not enough, is difficult to guarantee the flight effect of four rotor unmanned aerial vehicle under special environment.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention aims to provide a nonlinear attitude controller based on reinforcement learning for a quad-rotor unmanned aerial vehicle. The invention considers the unmodeled part in the dynamics model of the quad-rotor unmanned aerial vehicle, and applies a reinforcement learning method and a multivariable super-twisting algorithm to carry out on-line training on the quad-rotor unmanned aerial vehicle to solve the problem of insufficient generalization capability of the controller. Therefore, the invention adopts the technical scheme that the finite time convergence attitude control method of the quad-rotor unmanned aerial vehicle comprises the following steps:

1) establishing a four-rotor unmanned aerial vehicle dynamics model

The unmanned aerial vehicle is an X-shaped quadrotor unmanned aerial vehicle, and a dynamics model of the quadrotor unmanned aerial vehicle is established by adopting a Newton-Euler method, wherein the expression is as follows:

the invention adopts Newton-Euler method to establish a four-rotor unmanned plane dynamics model, and the expression is as follows:

the variables in formula (1) are defined as follows: m (eta) represents an inertia matrix,

representing a matrix of coriolis forces and centrifugal forces,

representing a matrix of rotational damping coefficients, where K₁、K₂And K₃Are all unknown constants. And delta (eta) represents unmodeled dynamics in the dynamics model of the quadrotor unmanned aerial vehicle, and meets the condition that | delta (eta) | is less than or equal to rho (| eta |) | | eta | | |, wherein rho is a positive real number, and norms involved in the situation are 2 norms.

And representing the attitude angle of the unmanned aerial vehicle, wherein phi (t) is a roll angle, theta (t) is a pitch angle, and psi (t) is a yaw angle.

Representing control input torque, wherein_φ(t) represents the roll angle channel control input torque, τ_θ(t) represents the pitch channel control input torque, τ_ψ(t) represents the yaw path control input torque. Angular velocity transfer matrix R from inertial coordinate system to body coordinate system in formula (1)_r(t) is defined as follows:

the dynamical model in formula (1) has a parameter uncertainty, which can be represented by the following formula:

in formula (3), M₀、C₀Is M (η) and

best estimate of, M_△And C_△Is a parameter uncertainty part.

Formula (1) can be rewritten as follows:

wherein:

to achieve attitude angle control of the drone, a quad-rotor drone attitude tracking error vector is defined

And a slip form surface

The following were used:

wherein

In order to be able to adjust the positive real gain,

is the desired gesture trajectory. The first time derivative is obtained for sigma (t) and substituted by equation (4):

to facilitate subsequent calculations, functions are defined

Is an unmodeled part of a quadrotor drone dynamics model, and is of the form:

therefore, the quad-rotor drone dynamics model can be rewritten as:

then, the design of the nonlinear controller based on the reinforcement learning and multivariable super-twisting control algorithm is carried out for the quadrotor unmanned aerial vehicle dynamics model of the formula (9).

2) Reinforcement learning controller part design

The reinforcement learning controller is designed using an implement-evaluate (Actor-Critic) neural network approach, and thus the section includes two neural networks — the design of the implement neural network and the evaluate neural network. Before two neural network designs are carried out, a performance index function needs to be designed to evaluate the result. The form is as follows:

wherein,

and is

Are all positive definite symmetric constant matrixes.

The minimum of equation (10) is in the form of the Bellman equation:

wherein

According to equation (11), the Hamiltonian function is defined as follows:

defining an optimal control strategy tau^*The corresponding optimal state value function is:

then sigma^*The following Hamiltonian equation is satisfied:

order to

Substituting formula (9) for formula (14) yields the HJB (Hamilton-Jacobi-bellman) equation, which is of the form:

solving the HJB equation to obtain the optimal control quantity tau^*The following were used:

non-modeling part in dynamics model of quad-rotor unmanned aerial vehicle

The impact on a quad-rotor drone is indicated by B. As can be seen from equations (6) - (9), the control objective herein is to give the command within a limited time

Therefore, unmodeled part in four-rotor unmanned aerial vehicle dynamic model

The optimal compensation value of (1) is:

the quad-rotor unmanned aerial vehicle system is a nonlinear system, and for the nonlinear system, the HJB equation is a nonlinear partial differential equation, so that an analytic solution is difficult to obtain. The present specification therefore uses a method of performing-evaluating neural networks to estimate B^*. Wherein evaluating the output values of the neural network is used to approximate an optimum state value function ∑^*(σ), a specific form thereof is represented as follows:

wherein, W_cTo evaluate the neural network weights ideally, μ_c(σ) to evaluate the neural network excitation function,

to evaluate approximation error of a neural network.

Order to

Is W_cThe optimal estimated values of (1) are:

defining a weight estimation error as

Can be substituted by formula (11):

wherein

Design of

The update rate of (c) is:

wherein, beta_cIn order to evaluate the learning rate of the neural network,

to facilitate subsequent analysis, define

This gives:

according to the foregoing, implementing a neural network for compensating unmodeled parts of a dynamics model of a quad-rotor drone

Influence B (x) on quad-rotor drone, wherein

Representing a state variable. Use executive spiritThe form of the representation b (x) over the network is as follows:

wherein W_aTo implement an ideal weight matrix for the neural network, μ_a(x) In order to perform the neural network excitation function,

to perform approximation error of the neural network. The execution neural network is designed as follows:

substitution of formula (19) for formula (17) can give:

substituting equation (25) for equation (24) defines an error as:

according to the gradient descent algorithm, the update rate of the weight of the executed neural network is designed as follows:

wherein beta is_a>0 is the learning rate for executing neural networks. Defining weight errors for performing neural networks

And substituting it into formula (27) to obtain

The update rate of (c) is:

wherein

3) Non-linear controller part design

According to the execution-evaluation neural network design, the execution neural network can compensate the unmodeled part in the dynamics model of the quadrotor unmanned aerial vehicle

The impact of the process. Bringing formula (23) into formula (9) can yield:

the control quantity τ is designed as:

wherein

Is a virtual control quantity.

Designing by using a multivariate super-twisting algorithm:

wherein

k₁,k₂,k₃,k₄The gain is positively controlled.

The invention has the characteristics and beneficial effects that:

the invention establishes a dynamics model containing an unmodeled part aiming at the quad-rotor unmanned aerial vehicle, designs a reinforcement learning nonlinear attitude controller based on reinforcement learning and a multivariable super-twisting control algorithm, realizes the finite time convergence control of the attitude error of the quad-rotor unmanned aerial vehicle, improves the robustness of the quad-rotor unmanned aerial vehicle system, and realizes the accurate control of the attitude of the quad-rotor unmanned aerial vehicle.

Description of the drawings:

FIG. 1 is a schematic diagram of a quad-rotor drone system for use with the present invention;

FIG. 2 is a graph of three attitude angles of a quad-rotor drone during flight using a control scheme;

fig. 3 is a graph of three attitude angles of a quad-rotor drone in flight when subjected to external disturbances after the control scheme is employed.

Detailed Description

The technical scheme adopted by the invention is as follows: the method for establishing the dynamics model of the quad-rotor unmanned aerial vehicle comprising the unmodeled part of the system and designing the corresponding reinforcement learning nonlinear attitude controller comprises the following steps:

first, a quad-rotor drone dynamics model needs to be built. Fig. 1 is a schematic diagram of a quad-rotor drone system as used herein. The unmanned aerial vehicle is an X-shaped quadrotor unmanned aerial vehicle, and a dynamics model of the quadrotor unmanned aerial vehicle is established by adopting a Newton-Euler method, wherein the expression is as follows:

representing a matrix of coriolis forces and centrifugal forces,

representing a matrix of rotational damping coefficients, where K₁、K₂And K₃Are all unknown constants. Delta (η) represents unmodeled dynamics in the quad-rotor drone dynamics model and satisfies | | | delta (η) | ≦ rho (| | η |) | | | η | | |, where ρ is a positive real number and the norms referred to in the claims are all 2 norms.

in formula (3), M₀、C₀Is M (η) and

best estimate of, M_△And C_△Is a parameter uncertainty part.

Formula (1) can be rewritten as follows:

wherein:

And a slip form surface

The following were used:

wherein

In order to be able to adjust the positive real gain,

to facilitate subsequent calculations, functions are defined

Is an unmodeled part of a dynamics model of a quadrotor unmanned aerial vehicle and has the following form：

Therefore, the quad-rotor drone dynamics model can be rewritten as:

wherein,

and is

Are all positive definite symmetric constant matrixes.

The minimum of equation (10) is in the form of the Bellman equation:

wherein

According to equation (11), the Hamiltonian function is defined as followsFormula (II):

then sigma^*The following Hamiltonian equation is satisfied:

order to

non-modeling part in dynamics model of quad-rotor unmanned aerial vehicle

Therefore, unmodeled part in four-rotor unmanned aerial vehicle dynamic model

The optimal compensation value of (1) is:

to evaluate approximation error of a neural network.

Order to

Is W_cThe optimal estimated values of (1) are:

defining a weight estimation error as

Can be substituted by formula (11):

wherein

Design of

The update rate of (c) is:

wherein, beta_cIn order to evaluate the learning rate of the neural network,

to facilitate subsequent analysis, define

This gives:

Influence B (x) on quad-rotor drone, wherein

Representing a state variable. The form of the representation b (x) using the executive neural network is as follows:

wherein W_aTo implement the ideal weight matrix for the neural network,μ_a(x) In order to perform the neural network excitation function,

substitution of formula (19) for formula (17) can give:

substituting equation (25) for equation (24) defines an error as:

And substituting it into formula (27) to obtain

The update rate of (c) is:

wherein

The impact of the process. Bringing formula (23) into formula (9) can yield:

the control quantity τ is designed as:

wherein

Is a virtual control quantity.

Designing by using a multivariate super-twisting algorithm:

wherein

k₁,k₂,k₃,k₄The gain is positively controlled.

Obtained by substituting formula (31) for formula (29):

wherein,

it can be shown that when the gain k is₁、k₂、k₃And k₄When equation (33) is satisfied, the attitude tracking error of the quad-rotor drone can converge to zero in a limited time.

In the formula (33)

And

the specific form is as follows:

in the formula (34)

Specific examples of implementation are given below:

first, introduction of experiment platform

The experiment platform adopts a real quad-rotor unmanned aerial vehicle as a controlled object, and a real attitude sensor is loaded on the unmanned aerial vehicle, so that a real and visual unmanned aerial vehicle attitude control effect can be obtained, and the result is closer to the actual flight condition. Meanwhile, the platform establishes communication among the upper computer, the target computer and the monitoring computer by utilizing a network, and is convenient for data interaction and control.

Second, flight experiment results

In order to verify the effectiveness and the feasibility of the nonlinear attitude controller provided by the invention, the four-rotor unmanned aerial vehicle attitude stabilization experiment is carried out on the experimental platform. The control target is that three attitude angles of the unmanned aerial vehicle approach to zero in limited time, namely:

and can still be recovered to a stable state when being interfered by the outside.

The experimental platform relates to the parameter values of inertia moment J ═ diag [1.34,1.31,2.54 ]]^T×10^-2kg·m²The half-axle distance l is 0.225m, the lift-torque coefficient c is 0.25, and the mass m is 1.5 kg.

As can be seen from fig. 2, using the reinforcement learning nonlinear attitude controller, the error can be controlled to within ± 1 °. It can be seen from fig. 3 that the steady state can still be reached when the external disturbance reaches 40 °. Therefore, the quadrotor unmanned aerial vehicle reinforcement learning nonlinear attitude controller designed by the invention has good robustness and can accurately control the attitude angle.

Claims

1. Aiming at the problem of attitude control of a quadrotor unmanned aerial vehicle with an unmodeled part in a quadrotor unmanned aerial vehicle kinetic model, a reinforcement learning controller based on an execution-evaluation neural network is designed for estimating the unmodeled part of the model, and a nonlinear robust controller based on multivariable super-twisting is designed at the same time, so that attitude stability control of the quadrotor unmanned aerial vehicle is realized, and the method comprises the following design steps:

step 1) establishing a four-rotor unmanned aerial vehicle dynamic model;

a Newton-Euler method is adopted to establish a four-rotor unmanned plane dynamic model, and the expression formula is as follows:

representing a matrix of coriolis forces and centrifugal forces,

representing a matrix of rotational damping coefficients, where K₁、K₂And K₃Are all unknown constants; Δ (η) represents unmodeled dynamics in a quadrotor drone dynamics model;

representing the attitude angle of the unmanned aerial vehicle, wherein phi (t) is a roll angle, theta (t) is a pitch angle, and psi (t) is a yaw angle;

representing control input torque, wherein_φ(t) represents the roll angle channel control input torque, τ_θ(t) represents the pitch channel control input torque, τ_ψ(t) represents a yaw angle channel control input torque; angular velocity transfer matrix R from inertial coordinate system to body coordinate system in formula (1)_r(t) is defined as follows:

the kinetic model in formula (1) has a parameter uncertainty represented by the following formula:

in formula (3), M₀、C₀Is M (η) and

best estimated value of，M_△And C_△Is a parameter uncertainty portion;

formula (1) is rewritten as follows:

wherein:

And a slip form surface

The following were used:

wherein

In order to be able to adjust the positive real gain,

a desired pose trajectory; the first time derivative is obtained for sigma (t) and substituted by equation (4):

is a squareThen subsequently calculating and defining function

Is an unmodeled part of a quadrotor drone dynamics model, and is of the form:

therefore, the quadrotor drone dynamics model is rewritten as:

then, designing a nonlinear controller based on reinforcement learning and a multivariable super-twisting control algorithm aiming at the four-rotor unmanned aerial vehicle dynamic model of the formula (9);

step 2) designing a reinforcement learning controller part;

the reinforcement learning controller is designed by adopting an executive-evaluation (Actor-critical) neural network method, the part comprises two neural networks, namely the executive neural network and the evaluation neural network, before the two neural networks are designed, a performance index function needs to be designed to evaluate the result, and the form of the performance index function is as follows:

wherein,

and is

Are positive definite symmetric constant matrixes;

the minimum of equation (10) is in the form of the Bellman equation:

wherein

According to equation (11), the Hamiltonian function is defined as follows:

then sigma^*The following Hamiltonian equation is satisfied:

order to

minH＝r+(▽Σ^*)^T(γ+Gτ^*)＝0. (15)

non-modeling part in dynamics model of quad-rotor unmanned aerial vehicle

The impact on a quad-rotor drone is represented by B; the control target is to order within a limited time

Therefore, unmodeled part in four-rotor unmanned aerial vehicle dynamic model

The optimal compensation value of (1) is:

quad-rotor drone systems are nonlinear systems, for which B is estimated using a method of performing-evaluating a neural network^*Wherein evaluating the output values of the neural network is used to approximate an optimum state value function ∑^*(σ), a specific form thereof is represented as follows:

to evaluate approximation error of a neural network;

order to

Is W_cThe optimal estimated values of (1) are:

defining weightsEstimate error of

Can be substituted by formula (11):

wherein

Design of

The update rate of (c) is:

wherein, beta_cTo evaluate the learning rate of neural networks, beta_c>0，

To facilitate subsequent analysis, define

This gives:

implementation of neural network for compensation of unmodeled part of quadrotor drone dynamics model

Influence B (x) brought for quad-rotor unmanned aerial vehicle, whichIn

Representing a state variable; the form of the representation b (x) using the executive neural network is as follows:

wherein W_aTo implement an ideal weight matrix for the neural network, μ_α(x) In order to perform the neural network excitation function,

to perform approximation error of the neural network; the execution neural network is designed as follows:

substitution of formula (19) for formula (17) can give:

substituting equation (25) for equation (24) defines an error as:

wherein beta is_a>0 is the learning rate of the execution neural network; defining weight errors for performing neural networks

And substituting it into formula (27) to obtain

The update rate of (c) is:

wherein

Step 3), designing a control rate;

performing neural network compensation for unmodeled portions of a quad-rotor drone dynamics model based on execution-evaluation neural network design

The influence of this is obtained by bringing formula (23) into formula (9):

the control quantity τ is designed as:

wherein

Is a virtual control quantity;

designing by using a multivariate super-twisting algorithm:

wherein

k₁,k₂,k₃,k₄Is a positive control gain;

obtained by substituting formula (31) for formula (29):

wherein,

when gain k₁、k₂、k₃And k₄When the formula (33) is satisfied, the attitude tracking error of the quad-rotor unmanned aerial vehicle can be converged to zero within a limited time;

in the formula (33)

And

the specific form is as follows:

in the formula (34)