CN112363519A - Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method - Google Patents

Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method Download PDF

Info

Publication number
CN112363519A
CN112363519A CN202011125416.8A CN202011125416A CN112363519A CN 112363519 A CN112363519 A CN 112363519A CN 202011125416 A CN202011125416 A CN 202011125416A CN 112363519 A CN112363519 A CN 112363519A
Authority
CN
China
Prior art keywords
formula
neural network
aerial vehicle
unmanned aerial
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011125416.8A
Other languages
Chinese (zh)
Other versions
CN112363519B (en
Inventor
鲜斌
张诗婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202011125416.8A priority Critical patent/CN112363519B/en
Publication of CN112363519A publication Critical patent/CN112363519A/en
Application granted granted Critical
Publication of CN112363519B publication Critical patent/CN112363519B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/08Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
    • G05D1/0816Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability
    • G05D1/0825Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability using mathematical models

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention relates to a quadrotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method, aiming at the problem of quadrotor unmanned aerial vehicle attitude control of a quadrotor unmanned aerial vehicle kinetic model with an unmodeled part, a reinforcement learning controller based on an execution-evaluation neural network is designed for estimating the unmodeled part of the model, and a nonlinear robust controller based on multivariable super-twisting is designed at the same time, so that attitude stability control of the quadrotor unmanned aerial vehicle is realized.

Description

Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method
Technical Field
The invention relates to attitude precision control of a quad-rotor unmanned aerial vehicle. Aiming at the influence of an unmodeled part in a system dynamics model of the quad-rotor unmanned aerial vehicle on the system control performance and the dependence of a control method based on the system dynamics model on an accurate model, the nonlinear attitude controller based on reinforcement learning and a second-order sliding mode control algorithm is provided, and the result of finite time convergence of the attitude control error of the unmanned aerial vehicle is realized. In particular to a finite time convergence attitude control method for a quad-rotor unmanned aerial vehicle.
Background
Traditional linear control algorithms, such as PID algorithms, LQR algorithms, etc., have been used in a wider range of applications in quad-rotor drones. However, the linear control algorithm only ensures that the system has a good control effect in the state near the balance point, and is difficult to obtain satisfactory effects in the aspects of processing the nonlinear multivariable control system, ensuring the anti-interference capability of the system and the like, so that the improvement of the dynamic performance and the steady-state performance of the system is also limited (journal: flight mechanics; prey: li yi ji sha, zhang xiao east; published year-month: 2011-4 month; article title: the current situation and development of the unmanned aerial vehicle flight control method research; page number: 1-5). To this end, a number of non-linear control algorithms are used for quad-rotor drone control. If the Control of the quad-rotor unmanned aerial vehicle is realized by adopting the adaptive sliding mode Control method, the experimental result shows that the adaptive sliding mode Control method has better performance in the aspects of processing sensor noise and model uncertainty (Journal: International Journal of Control, Automation and Systems; Renders: Daewon Lee, H Jin Kim, Shankar science; published month: 2009, 5 months; article title: Feedback linkage v.adaptive sliding mode Control for a quadrotor satellite; page number: 419: 42). However, the conventional first-order sliding mode control algorithm has the buffeting problem and is not beneficial to long-term stable operation of the system. Thus, some researchers began to utilize super-twisting robust control design methods. Theoretically, this algorithm can eliminate buffeting and is used by many researchers for quad-rotor unmanned controls (Journal: Journal of Franklin Institute; Rev: Laloui Defafa, Abdelaziz Benalleguo, LFridman; published month: 3 2012; article title: Super twisting control algorithm for the attute tracking of the four rotors utore uav; page number: 685-. Still other researchers have proposed a Multivariable super-like algorithm considering the Multivariable characteristics of quad-rotor drones and used it for quad-rotor drone attitude control (journal: IEEE Transactions on Industrial Electronics; authors: Bailing tie, Lihong Liu, Hanchen Lu, et al; published month: 2017 month 8; article title: Multivariable fine time attribute control for quadruperator u: Theory and experiment; page number: 2567 + 2577).
With the rapid development of machine learning research work, learning algorithms such as reinforcement learning are also used for the control design of the quad-rotor unmanned aerial vehicle. In consideration of the problems of safe flight and the like, researchers firstly use actual flight data to carry out model identification to obtain an offline learning state transfer model or a random Markov model, then use an enhanced learning algorithm to carry out offline iteration to obtain an optimal control strategy, and finally use the optimal control strategy for unmanned aerial vehicle control (Conference: IEEE RSJ International reference on Intelligent Robots and Systems; authors: Wallander, Gabriel M Hoffmann, Jung Soon Jang, etc.; publication year/month: 2005; article title: Multi-agent quadrat test control design: Integrated sizing mode v.e. recovery leaving; page number: 3712 + 3717). In the simulation environment of the quad-rotor unmanned aerial vehicle, a trainee trains a neural network by using reinforcement learning, and applies the trained neural network to unmanned aerial vehicle Control, so that the flight task of unmanned aerial vehicle throwing and hovering is realized (journal: IEEE Transactions on Industrial Electronics; Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and the like; published month: 2017, 6 months; article title: Control of quadra with recovery learning, and page number: 2096-. Although the offline learning method achieves a good unmanned aerial vehicle control effect, the research provides stability proof less, and the offline learning is long in time consumption and large in calculation amount. On the other hand, part of off-line learning methods are performed in a simulation environment, and various disturbances in a real environment cannot be completely simulated, so that the generalization capability of the learned control algorithm is insufficient. The experiment of Hwangbo et al, although it works well on hover tasks, the tracking effect is not as good as that of the non-linear controller. For this reason, online learning reinforcement learning algorithms are also used for quad-rotor drone control. For example, Sugimoto and the like installs a camera at the bottom of an unmanned aerial vehicle for identifying mark Information on the ground, and then controls the unmanned aerial vehicle to keep the ground mark always in the center of the visual field of the camera by using a reinforcement learning algorithm on a ground station, thereby realizing the hovering experiment of the quadrotor unmanned aerial vehicle (Conference: 20163rd International Conference on Information Science and Control Engineering (ICISCE); authors: Takuya Sugimoto, Manabu Gouko; publication year: 2016; article title: Acquisition of knowledge by actual use of Information retrieval and leaving; page number: 148-.
In consideration of the problems of long calculation time, large calculation amount and the like when the reinforcement learning algorithm is used for unmanned aerial vehicle control, a learner designs a controller based on a Robust Integral of an error sign function (RISE) control algorithm and the reinforcement learning algorithm, and uses the controller for unmanned helicopter attitude control to obtain a good control effect (journal: control theory and application; writer: peaceful, fresh; published year, year 2019, month 4; article title: attitude reinforcement learning control design and verification of an unmanned helicopter; page number: 516-. However, this approach has less application on quad-rotor drones.
With regard to the research on quad-rotor drone control, researchers have achieved some success today, but there are also some limitations: 1) the existing control design usually ignores unmodeled parts in a four-rotor unmanned aerial vehicle dynamic model, but a control method based on a sixuanyiwure basis-tolerant dynamic model has high dependence on an accurate model. Therefore, when the attitude of the quad-rotor unmanned aerial vehicle is accurately controlled, the influence of the unmodeled part of the quad-rotor unmanned aerial vehicle is considered. 2) Some control methods based on reinforcement learning generally utilize flight data to carry out off-line training, and the controller generalization ability that obtains from this is not enough, is difficult to guarantee the flight effect of four rotor unmanned aerial vehicle under special environment.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a nonlinear attitude controller based on reinforcement learning for a quad-rotor unmanned aerial vehicle. The invention considers the unmodeled part in the dynamics model of the quad-rotor unmanned aerial vehicle, and applies a reinforcement learning method and a multivariable super-twisting algorithm to carry out on-line training on the quad-rotor unmanned aerial vehicle to solve the problem of insufficient generalization capability of the controller. Therefore, the invention adopts the technical scheme that the finite time convergence attitude control method of the quad-rotor unmanned aerial vehicle comprises the following steps:
1) establishing a four-rotor unmanned aerial vehicle dynamics model
The unmanned aerial vehicle is an X-shaped quadrotor unmanned aerial vehicle, and a dynamics model of the quadrotor unmanned aerial vehicle is established by adopting a Newton-Euler method, wherein the expression is as follows:
the invention adopts Newton-Euler method to establish a four-rotor unmanned plane dynamics model, and the expression is as follows:
Figure BDA0002733450100000021
the variables in formula (1) are defined as follows: m (eta) represents an inertia matrix,
Figure BDA0002733450100000022
representing a matrix of coriolis forces and centrifugal forces,
Figure BDA0002733450100000023
Figure BDA0002733450100000024
representing a matrix of rotational damping coefficients, where K1、K2And K3Are all unknown constants. And delta (eta) represents unmodeled dynamics in the dynamics model of the quadrotor unmanned aerial vehicle, and meets the condition that | delta (eta) | is less than or equal to rho (| eta |) | | eta | | |, wherein rho is a positive real number, and norms involved in the situation are 2 norms.
Figure BDA0002733450100000025
And representing the attitude angle of the unmanned aerial vehicle, wherein phi (t) is a roll angle, theta (t) is a pitch angle, and psi (t) is a yaw angle.
Figure BDA0002733450100000031
Representing control input torque, whereinφ(t) represents the roll angle channel control input torque, τθ(t) represents the pitch channel control input torque, τψ(t) represents the yaw path control input torque. Angular velocity transfer matrix R from inertial coordinate system to body coordinate system in formula (1)r(t) is defined as follows:
Figure BDA0002733450100000032
the dynamical model in formula (1) has a parameter uncertainty, which can be represented by the following formula:
Figure BDA0002733450100000033
in formula (3), M0、C0Is M (η) and
Figure BDA0002733450100000034
best estimate of, MAnd CIs a parameter uncertainty part.
Formula (1) can be rewritten as follows:
Figure BDA0002733450100000035
wherein:
Figure BDA0002733450100000036
to achieve attitude angle control of the drone, a quad-rotor drone attitude tracking error vector is defined
Figure BDA0002733450100000037
Figure BDA0002733450100000038
And a slip form surface
Figure BDA0002733450100000039
The following were used:
Figure BDA00027334501000000310
wherein
Figure BDA00027334501000000311
In order to be able to adjust the positive real gain,
Figure BDA00027334501000000312
is the desired gesture trajectory. The first time derivative is obtained for sigma (t) and substituted by equation (4):
Figure BDA00027334501000000313
to facilitate subsequent calculations, functions are defined
Figure BDA00027334501000000314
Is an unmodeled part of a quadrotor drone dynamics model, and is of the form:
Figure BDA00027334501000000315
therefore, the quad-rotor drone dynamics model can be rewritten as:
Figure BDA00027334501000000316
then, the design of the nonlinear controller based on the reinforcement learning and multivariable super-twisting control algorithm is carried out for the quadrotor unmanned aerial vehicle dynamics model of the formula (9).
2) Reinforcement learning controller part design
The reinforcement learning controller is designed using an implement-evaluate (Actor-Critic) neural network approach, and thus the section includes two neural networks — the design of the implement neural network and the evaluate neural network. Before two neural network designs are carried out, a performance index function needs to be designed to evaluate the result. The form is as follows:
Figure BDA00027334501000000317
wherein,
Figure BDA00027334501000000318
and is
Figure BDA00027334501000000319
Are all positive definite symmetric constant matrixes.
The minimum of equation (10) is in the form of the Bellman equation:
Figure BDA00027334501000000320
wherein
Figure BDA0002733450100000041
According to equation (11), the Hamiltonian function is defined as follows:
Figure BDA0002733450100000042
defining an optimal control strategy tau*The corresponding optimal state value function is:
Figure BDA0002733450100000043
then sigma*The following Hamiltonian equation is satisfied:
Figure BDA0002733450100000044
order to
Figure BDA0002733450100000045
Substituting formula (9) for formula (14) yields the HJB (Hamilton-Jacobi-bellman) equation, which is of the form:
Figure BDA0002733450100000046
solving the HJB equation to obtain the optimal control quantity tau*The following were used:
Figure BDA0002733450100000047
non-modeling part in dynamics model of quad-rotor unmanned aerial vehicle
Figure BDA0002733450100000048
The impact on a quad-rotor drone is indicated by B. As can be seen from equations (6) - (9), the control objective herein is to give the command within a limited time
Figure BDA0002733450100000049
Therefore, unmodeled part in four-rotor unmanned aerial vehicle dynamic model
Figure BDA00027334501000000410
The optimal compensation value of (1) is:
Figure BDA00027334501000000411
the quad-rotor unmanned aerial vehicle system is a nonlinear system, and for the nonlinear system, the HJB equation is a nonlinear partial differential equation, so that an analytic solution is difficult to obtain. The present specification therefore uses a method of performing-evaluating neural networks to estimate B*. Wherein evaluating the output values of the neural network is used to approximate an optimum state value function ∑*(σ), a specific form thereof is represented as follows:
Figure BDA00027334501000000412
wherein, WcTo evaluate the neural network weights ideally, μc(σ) to evaluate the neural network excitation function,
Figure BDA00027334501000000413
to evaluate approximation error of a neural network.
Order to
Figure BDA00027334501000000414
Is WcThe optimal estimated values of (1) are:
Figure BDA00027334501000000415
defining a weight estimation error as
Figure BDA00027334501000000416
Can be substituted by formula (11):
Figure BDA00027334501000000417
wherein
Figure BDA00027334501000000418
Design of
Figure BDA00027334501000000419
The update rate of (c) is:
Figure BDA00027334501000000420
wherein, betacIn order to evaluate the learning rate of the neural network,
Figure BDA00027334501000000421
to facilitate subsequent analysis, define
Figure BDA00027334501000000422
Figure BDA00027334501000000423
This gives:
Figure BDA00027334501000000424
according to the foregoing, implementing a neural network for compensating unmodeled parts of a dynamics model of a quad-rotor drone
Figure BDA0002733450100000051
Influence B (x) on quad-rotor drone, wherein
Figure BDA0002733450100000052
Representing a state variable. Use executive spiritThe form of the representation b (x) over the network is as follows:
Figure BDA0002733450100000053
wherein WaTo implement an ideal weight matrix for the neural network, μa(x) In order to perform the neural network excitation function,
Figure BDA0002733450100000054
to perform approximation error of the neural network. The execution neural network is designed as follows:
Figure BDA0002733450100000055
substitution of formula (19) for formula (17) can give:
Figure BDA0002733450100000056
substituting equation (25) for equation (24) defines an error as:
Figure BDA0002733450100000057
according to the gradient descent algorithm, the update rate of the weight of the executed neural network is designed as follows:
Figure BDA0002733450100000058
wherein beta isa>0 is the learning rate for executing neural networks. Defining weight errors for performing neural networks
Figure BDA0002733450100000059
And substituting it into formula (27) to obtain
Figure BDA00027334501000000510
The update rate of (c) is:
Figure BDA00027334501000000511
wherein
Figure BDA00027334501000000512
3) Non-linear controller part design
According to the execution-evaluation neural network design, the execution neural network can compensate the unmodeled part in the dynamics model of the quadrotor unmanned aerial vehicle
Figure BDA00027334501000000513
The impact of the process. Bringing formula (23) into formula (9) can yield:
Figure BDA00027334501000000514
the control quantity τ is designed as:
Figure BDA00027334501000000515
wherein
Figure BDA00027334501000000516
Is a virtual control quantity.
Figure BDA00027334501000000517
Designing by using a multivariate super-twisting algorithm:
Figure BDA00027334501000000518
wherein
Figure BDA00027334501000000519
k1,k2,k3,k4The gain is positively controlled.
The invention has the characteristics and beneficial effects that:
the invention establishes a dynamics model containing an unmodeled part aiming at the quad-rotor unmanned aerial vehicle, designs a reinforcement learning nonlinear attitude controller based on reinforcement learning and a multivariable super-twisting control algorithm, realizes the finite time convergence control of the attitude error of the quad-rotor unmanned aerial vehicle, improves the robustness of the quad-rotor unmanned aerial vehicle system, and realizes the accurate control of the attitude of the quad-rotor unmanned aerial vehicle.
Description of the drawings:
FIG. 1 is a schematic diagram of a quad-rotor drone system for use with the present invention;
FIG. 2 is a graph of three attitude angles of a quad-rotor drone during flight using a control scheme;
fig. 3 is a graph of three attitude angles of a quad-rotor drone in flight when subjected to external disturbances after the control scheme is employed.
Detailed Description
The technical scheme adopted by the invention is as follows: the method for establishing the dynamics model of the quad-rotor unmanned aerial vehicle comprising the unmodeled part of the system and designing the corresponding reinforcement learning nonlinear attitude controller comprises the following steps:
first, a quad-rotor drone dynamics model needs to be built. Fig. 1 is a schematic diagram of a quad-rotor drone system as used herein. The unmanned aerial vehicle is an X-shaped quadrotor unmanned aerial vehicle, and a dynamics model of the quadrotor unmanned aerial vehicle is established by adopting a Newton-Euler method, wherein the expression is as follows:
Figure BDA0002733450100000061
the variables in formula (1) are defined as follows: m (eta) represents an inertia matrix,
Figure BDA0002733450100000062
representing a matrix of coriolis forces and centrifugal forces,
Figure BDA0002733450100000063
Figure BDA0002733450100000064
representing a matrix of rotational damping coefficients, where K1、K2And K3Are all unknown constants. Delta (η) represents unmodeled dynamics in the quad-rotor drone dynamics model and satisfies | | | delta (η) | ≦ rho (| | η |) | | | η | | |, where ρ is a positive real number and the norms referred to in the claims are all 2 norms.
Figure BDA0002733450100000065
And representing the attitude angle of the unmanned aerial vehicle, wherein phi (t) is a roll angle, theta (t) is a pitch angle, and psi (t) is a yaw angle.
Figure BDA0002733450100000066
Figure BDA0002733450100000067
Representing control input torque, whereinφ(t) represents the roll angle channel control input torque, τθ(t) represents the pitch channel control input torque, τψ(t) represents the yaw path control input torque. Angular velocity transfer matrix R from inertial coordinate system to body coordinate system in formula (1)r(t) is defined as follows:
Figure BDA0002733450100000068
the dynamical model in formula (1) has a parameter uncertainty, which can be represented by the following formula:
Figure BDA0002733450100000069
in formula (3), M0、C0Is M (η) and
Figure BDA00027334501000000610
best estimate of, MAnd CIs a parameter uncertainty part.
Formula (1) can be rewritten as follows:
Figure BDA00027334501000000611
wherein:
Figure BDA00027334501000000612
to achieve attitude angle control of the drone, a quad-rotor drone attitude tracking error vector is defined
Figure BDA0002733450100000071
Figure BDA0002733450100000072
And a slip form surface
Figure BDA0002733450100000073
The following were used:
Figure BDA0002733450100000074
wherein
Figure BDA0002733450100000075
In order to be able to adjust the positive real gain,
Figure BDA0002733450100000076
is the desired gesture trajectory. The first time derivative is obtained for sigma (t) and substituted by equation (4):
Figure BDA0002733450100000077
to facilitate subsequent calculations, functions are defined
Figure BDA0002733450100000078
Is an unmodeled part of a dynamics model of a quadrotor unmanned aerial vehicle and has the following form:
Figure BDA0002733450100000079
Therefore, the quad-rotor drone dynamics model can be rewritten as:
Figure BDA00027334501000000710
then, the design of the nonlinear controller based on the reinforcement learning and multivariable super-twisting control algorithm is carried out for the quadrotor unmanned aerial vehicle dynamics model of the formula (9).
The reinforcement learning controller is designed using an implement-evaluate (Actor-Critic) neural network approach, and thus the section includes two neural networks — the design of the implement neural network and the evaluate neural network. Before two neural network designs are carried out, a performance index function needs to be designed to evaluate the result. The form is as follows:
Figure BDA00027334501000000711
wherein,
Figure BDA00027334501000000712
and is
Figure BDA00027334501000000713
Are all positive definite symmetric constant matrixes.
The minimum of equation (10) is in the form of the Bellman equation:
Figure BDA00027334501000000714
wherein
Figure BDA00027334501000000715
According to equation (11), the Hamiltonian function is defined as followsFormula (II):
Figure BDA00027334501000000716
defining an optimal control strategy tau*The corresponding optimal state value function is:
Figure BDA00027334501000000717
then sigma*The following Hamiltonian equation is satisfied:
Figure BDA00027334501000000718
order to
Figure BDA00027334501000000719
Substituting formula (9) for formula (14) yields the HJB (Hamilton-Jacobi-bellman) equation, which is of the form:
Figure BDA00027334501000000720
solving the HJB equation to obtain the optimal control quantity tau*The following were used:
Figure BDA00027334501000000721
non-modeling part in dynamics model of quad-rotor unmanned aerial vehicle
Figure BDA00027334501000000722
The impact on a quad-rotor drone is indicated by B. As can be seen from equations (6) - (9), the control objective herein is to give the command within a limited time
Figure BDA00027334501000000723
Therefore, unmodeled part in four-rotor unmanned aerial vehicle dynamic model
Figure BDA00027334501000000724
The optimal compensation value of (1) is:
Figure BDA00027334501000000725
the quad-rotor unmanned aerial vehicle system is a nonlinear system, and for the nonlinear system, the HJB equation is a nonlinear partial differential equation, so that an analytic solution is difficult to obtain. The present specification therefore uses a method of performing-evaluating neural networks to estimate B*. Wherein evaluating the output values of the neural network is used to approximate an optimum state value function ∑*(σ), a specific form thereof is represented as follows:
Figure BDA0002733450100000081
wherein, WcTo evaluate the neural network weights ideally, μc(σ) to evaluate the neural network excitation function,
Figure BDA0002733450100000082
to evaluate approximation error of a neural network.
Order to
Figure BDA0002733450100000083
Is WcThe optimal estimated values of (1) are:
Figure BDA0002733450100000084
defining a weight estimation error as
Figure BDA0002733450100000085
Can be substituted by formula (11):
Figure BDA0002733450100000086
wherein
Figure BDA0002733450100000087
Design of
Figure BDA0002733450100000088
The update rate of (c) is:
Figure BDA0002733450100000089
wherein, betacIn order to evaluate the learning rate of the neural network,
Figure BDA00027334501000000810
to facilitate subsequent analysis, define
Figure BDA00027334501000000811
Figure BDA00027334501000000812
This gives:
Figure BDA00027334501000000813
according to the foregoing, implementing a neural network for compensating unmodeled parts of a dynamics model of a quad-rotor drone
Figure BDA00027334501000000814
Influence B (x) on quad-rotor drone, wherein
Figure BDA00027334501000000815
Representing a state variable. The form of the representation b (x) using the executive neural network is as follows:
Figure BDA00027334501000000816
wherein WaTo implement the ideal weight matrix for the neural network,μa(x) In order to perform the neural network excitation function,
Figure BDA00027334501000000817
to perform approximation error of the neural network. The execution neural network is designed as follows:
Figure BDA00027334501000000818
substitution of formula (19) for formula (17) can give:
Figure BDA00027334501000000819
substituting equation (25) for equation (24) defines an error as:
Figure BDA00027334501000000820
according to the gradient descent algorithm, the update rate of the weight of the executed neural network is designed as follows:
Figure BDA00027334501000000821
wherein beta isa>0 is the learning rate for executing neural networks. Defining weight errors for performing neural networks
Figure BDA0002733450100000091
And substituting it into formula (27) to obtain
Figure BDA0002733450100000092
The update rate of (c) is:
Figure BDA0002733450100000093
wherein
Figure BDA0002733450100000094
According to the execution-evaluation neural network design, the execution neural network can compensate the unmodeled part in the dynamics model of the quadrotor unmanned aerial vehicle
Figure BDA0002733450100000095
The impact of the process. Bringing formula (23) into formula (9) can yield:
Figure BDA0002733450100000096
the control quantity τ is designed as:
Figure BDA0002733450100000097
wherein
Figure BDA0002733450100000098
Is a virtual control quantity.
Figure BDA0002733450100000099
Designing by using a multivariate super-twisting algorithm:
Figure BDA00027334501000000910
wherein
Figure BDA00027334501000000911
k1,k2,k3,k4The gain is positively controlled.
Obtained by substituting formula (31) for formula (29):
Figure BDA00027334501000000912
wherein,
Figure BDA00027334501000000913
it can be shown that when the gain k is1、k2、k3And k4When equation (33) is satisfied, the attitude tracking error of the quad-rotor drone can converge to zero in a limited time.
Figure BDA00027334501000000914
In the formula (33)
Figure BDA00027334501000000915
And
Figure BDA00027334501000000916
the specific form is as follows:
Figure BDA00027334501000000917
in the formula (34)
Figure BDA00027334501000000918
Specific examples of implementation are given below:
first, introduction of experiment platform
The experiment platform adopts a real quad-rotor unmanned aerial vehicle as a controlled object, and a real attitude sensor is loaded on the unmanned aerial vehicle, so that a real and visual unmanned aerial vehicle attitude control effect can be obtained, and the result is closer to the actual flight condition. Meanwhile, the platform establishes communication among the upper computer, the target computer and the monitoring computer by utilizing a network, and is convenient for data interaction and control.
Second, flight experiment results
In order to verify the effectiveness and the feasibility of the nonlinear attitude controller provided by the invention, the four-rotor unmanned aerial vehicle attitude stabilization experiment is carried out on the experimental platform. The control target is that three attitude angles of the unmanned aerial vehicle approach to zero in limited time, namely:
Figure BDA0002733450100000101
and can still be recovered to a stable state when being interfered by the outside.
The experimental platform relates to the parameter values of inertia moment J ═ diag [1.34,1.31,2.54 ]]T×10-2kg·m2The half-axle distance l is 0.225m, the lift-torque coefficient c is 0.25, and the mass m is 1.5 kg.
As can be seen from fig. 2, using the reinforcement learning nonlinear attitude controller, the error can be controlled to within ± 1 °. It can be seen from fig. 3 that the steady state can still be reached when the external disturbance reaches 40 °. Therefore, the quadrotor unmanned aerial vehicle reinforcement learning nonlinear attitude controller designed by the invention has good robustness and can accurately control the attitude angle.

Claims (1)

1. Aiming at the problem of attitude control of a quadrotor unmanned aerial vehicle with an unmodeled part in a quadrotor unmanned aerial vehicle kinetic model, a reinforcement learning controller based on an execution-evaluation neural network is designed for estimating the unmodeled part of the model, and a nonlinear robust controller based on multivariable super-twisting is designed at the same time, so that attitude stability control of the quadrotor unmanned aerial vehicle is realized, and the method comprises the following design steps:
step 1) establishing a four-rotor unmanned aerial vehicle dynamic model;
a Newton-Euler method is adopted to establish a four-rotor unmanned plane dynamic model, and the expression formula is as follows:
Figure FDA0002733450090000011
the variables in formula (1) are defined as follows: m (eta) represents an inertia matrix,
Figure FDA0002733450090000012
representing a matrix of coriolis forces and centrifugal forces,
Figure FDA0002733450090000013
Figure FDA0002733450090000014
representing a matrix of rotational damping coefficients, where K1、K2And K3Are all unknown constants; Δ (η) represents unmodeled dynamics in a quadrotor drone dynamics model;
Figure FDA0002733450090000015
representing the attitude angle of the unmanned aerial vehicle, wherein phi (t) is a roll angle, theta (t) is a pitch angle, and psi (t) is a yaw angle;
Figure FDA0002733450090000016
Figure FDA0002733450090000017
representing control input torque, whereinφ(t) represents the roll angle channel control input torque, τθ(t) represents the pitch channel control input torque, τψ(t) represents a yaw angle channel control input torque; angular velocity transfer matrix R from inertial coordinate system to body coordinate system in formula (1)r(t) is defined as follows:
Figure FDA0002733450090000018
the kinetic model in formula (1) has a parameter uncertainty represented by the following formula:
Figure FDA0002733450090000019
in formula (3), M0、C0Is M (η) and
Figure FDA00027334500900000110
best estimated value of,MAnd CIs a parameter uncertainty portion;
formula (1) is rewritten as follows:
Figure FDA00027334500900000111
wherein:
Figure FDA00027334500900000112
to achieve attitude angle control of the drone, a quad-rotor drone attitude tracking error vector is defined
Figure FDA00027334500900000113
Figure FDA00027334500900000114
And a slip form surface
Figure FDA00027334500900000115
The following were used:
Figure FDA00027334500900000116
wherein
Figure FDA00027334500900000117
In order to be able to adjust the positive real gain,
Figure FDA00027334500900000118
a desired pose trajectory; the first time derivative is obtained for sigma (t) and substituted by equation (4):
Figure FDA00027334500900000119
is a squareThen subsequently calculating and defining function
Figure FDA00027334500900000120
Is an unmodeled part of a quadrotor drone dynamics model, and is of the form:
Figure FDA00027334500900000121
therefore, the quadrotor drone dynamics model is rewritten as:
Figure FDA00027334500900000122
then, designing a nonlinear controller based on reinforcement learning and a multivariable super-twisting control algorithm aiming at the four-rotor unmanned aerial vehicle dynamic model of the formula (9);
step 2) designing a reinforcement learning controller part;
the reinforcement learning controller is designed by adopting an executive-evaluation (Actor-critical) neural network method, the part comprises two neural networks, namely the executive neural network and the evaluation neural network, before the two neural networks are designed, a performance index function needs to be designed to evaluate the result, and the form of the performance index function is as follows:
Figure FDA0002733450090000021
wherein,
Figure FDA0002733450090000022
and is
Figure FDA0002733450090000023
Are positive definite symmetric constant matrixes;
the minimum of equation (10) is in the form of the Bellman equation:
Figure FDA0002733450090000024
wherein
Figure FDA0002733450090000025
According to equation (11), the Hamiltonian function is defined as follows:
Figure FDA0002733450090000026
defining an optimal control strategy tau*The corresponding optimal state value function is:
Figure FDA0002733450090000027
then sigma*The following Hamiltonian equation is satisfied:
Figure FDA0002733450090000028
order to
Figure FDA0002733450090000029
Substituting formula (9) for formula (14) yields the HJB (Hamilton-Jacobi-bellman) equation, which is of the form:
minH=r+(▽Σ*)T(γ+Gτ*)=0. (15)
solving the HJB equation to obtain the optimal control quantity tau*The following were used:
Figure FDA00027334500900000210
non-modeling part in dynamics model of quad-rotor unmanned aerial vehicle
Figure FDA00027334500900000211
The impact on a quad-rotor drone is represented by B; the control target is to order within a limited time
Figure FDA00027334500900000212
Therefore, unmodeled part in four-rotor unmanned aerial vehicle dynamic model
Figure FDA00027334500900000213
The optimal compensation value of (1) is:
Figure FDA00027334500900000214
quad-rotor drone systems are nonlinear systems, for which B is estimated using a method of performing-evaluating a neural network*Wherein evaluating the output values of the neural network is used to approximate an optimum state value function ∑*(σ), a specific form thereof is represented as follows:
Figure FDA00027334500900000215
wherein, WcTo evaluate the neural network weights ideally, μc(σ) to evaluate the neural network excitation function,
Figure FDA00027334500900000216
to evaluate approximation error of a neural network;
order to
Figure FDA00027334500900000217
Is WcThe optimal estimated values of (1) are:
Figure FDA00027334500900000218
defining weightsEstimate error of
Figure FDA0002733450090000031
Can be substituted by formula (11):
Figure FDA0002733450090000032
wherein
Figure FDA0002733450090000033
Design of
Figure FDA0002733450090000034
The update rate of (c) is:
Figure FDA0002733450090000035
wherein, betacTo evaluate the learning rate of neural networks, betac>0,
Figure FDA0002733450090000036
To facilitate subsequent analysis, define
Figure FDA0002733450090000037
Figure FDA0002733450090000038
This gives:
Figure FDA0002733450090000039
implementation of neural network for compensation of unmodeled part of quadrotor drone dynamics model
Figure FDA00027334500900000310
Influence B (x) brought for quad-rotor unmanned aerial vehicle, whichIn
Figure FDA00027334500900000311
Representing a state variable; the form of the representation b (x) using the executive neural network is as follows:
Figure FDA00027334500900000312
wherein WaTo implement an ideal weight matrix for the neural network, μα(x) In order to perform the neural network excitation function,
Figure FDA00027334500900000313
to perform approximation error of the neural network; the execution neural network is designed as follows:
Figure FDA00027334500900000314
substitution of formula (19) for formula (17) can give:
Figure FDA00027334500900000315
substituting equation (25) for equation (24) defines an error as:
Figure FDA00027334500900000316
according to the gradient descent algorithm, the update rate of the weight of the executed neural network is designed as follows:
Figure FDA00027334500900000317
wherein beta isa>0 is the learning rate of the execution neural network; defining weight errors for performing neural networks
Figure FDA00027334500900000318
And substituting it into formula (27) to obtain
Figure FDA00027334500900000319
The update rate of (c) is:
Figure FDA00027334500900000320
wherein
Figure FDA00027334500900000321
Step 3), designing a control rate;
performing neural network compensation for unmodeled portions of a quad-rotor drone dynamics model based on execution-evaluation neural network design
Figure FDA0002733450090000041
The influence of this is obtained by bringing formula (23) into formula (9):
Figure FDA0002733450090000042
the control quantity τ is designed as:
Figure FDA0002733450090000043
wherein
Figure FDA0002733450090000044
Is a virtual control quantity;
Figure FDA0002733450090000045
designing by using a multivariate super-twisting algorithm:
Figure FDA0002733450090000046
wherein
Figure FDA0002733450090000047
k1,k2,k3,k4Is a positive control gain;
obtained by substituting formula (31) for formula (29):
Figure FDA0002733450090000048
wherein,
Figure FDA0002733450090000049
when gain k1、k2、k3And k4When the formula (33) is satisfied, the attitude tracking error of the quad-rotor unmanned aerial vehicle can be converged to zero within a limited time;
Figure FDA00027334500900000410
in the formula (33)
Figure FDA00027334500900000411
And
Figure FDA00027334500900000412
the specific form is as follows:
Figure FDA00027334500900000413
in the formula (34)
Figure FDA00027334500900000414
CN202011125416.8A 2020-10-20 2020-10-20 Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method Active CN112363519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011125416.8A CN112363519B (en) 2020-10-20 2020-10-20 Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011125416.8A CN112363519B (en) 2020-10-20 2020-10-20 Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method

Publications (2)

Publication Number Publication Date
CN112363519A true CN112363519A (en) 2021-02-12
CN112363519B CN112363519B (en) 2021-12-07

Family

ID=74507738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011125416.8A Active CN112363519B (en) 2020-10-20 2020-10-20 Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method

Country Status (1)

Country Link
CN (1) CN112363519B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113359473A (en) * 2021-07-06 2021-09-07 天津大学 Microminiature unmanned helicopter nonlinear control method based on iterative learning
CN113721655A (en) * 2021-08-26 2021-11-30 南京大学 Control period self-adaptive reinforcement learning unmanned aerial vehicle stable flight control method
CN113900440A (en) * 2021-07-21 2022-01-07 中国电子科技集团公司电子科学研究院 Unmanned aerial vehicle control law design method and device and readable storage medium
CN113985924A (en) * 2021-12-27 2022-01-28 中国科学院自动化研究所 Aircraft control method, device, equipment and computer program product
CN114063453A (en) * 2021-10-26 2022-02-18 广州大学 Helicopter system control method, system, device and medium based on reinforcement learning
CN114545979A (en) * 2022-03-16 2022-05-27 哈尔滨逐宇航天科技有限责任公司 Aircraft intelligent sliding mode formation control method based on reinforcement learning
CN115061371A (en) * 2022-06-20 2022-09-16 中国航空工业集团公司沈阳飞机设计研究所 Unmanned aerial vehicle control strategy reinforcement learning generation method for preventing strategy jitter
CN116661478A (en) * 2023-07-27 2023-08-29 安徽大学 Four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109696830A (en) * 2019-01-31 2019-04-30 天津大学 The reinforcement learning adaptive control method of small-sized depopulated helicopter
CN110908281A (en) * 2019-11-29 2020-03-24 天津大学 Finite-time convergence reinforcement learning control method for attitude motion of unmanned helicopter
CN111625019A (en) * 2020-05-18 2020-09-04 天津大学 Trajectory planning method for four-rotor unmanned aerial vehicle suspension air transportation system based on reinforcement learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109696830A (en) * 2019-01-31 2019-04-30 天津大学 The reinforcement learning adaptive control method of small-sized depopulated helicopter
CN110908281A (en) * 2019-11-29 2020-03-24 天津大学 Finite-time convergence reinforcement learning control method for attitude motion of unmanned helicopter
CN111625019A (en) * 2020-05-18 2020-09-04 天津大学 Trajectory planning method for four-rotor unmanned aerial vehicle suspension air transportation system based on reinforcement learning

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
NODLAND D 等: "Neural network-based optimal adaptive output feedback control of a helicopter UAV", 《IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS》 *
安航 等: "无人直升机的姿态增强学习控制设计与验证", 《控制理论与应用》 *
宋占魁 等: "小型四旋翼无人飞行器非线性控制方法研究", 《中国优秀博硕士学位论文全文数据库(博士)工程科技Ⅱ辑》 *
李辰: "面向四旋翼无人机的非线性控制方法与实现", 《中国优秀博硕士学位论文全文数据库(博士)工程科技Ⅱ辑》 *
郝伟 等: "四旋翼无人机姿态系统的非线性容错控制设计", 《控制理论与应用》 *
鲜斌 等: "基于强化学习的小型无人直升机有限时间收敛控制设计", 《控制与决策》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113359473A (en) * 2021-07-06 2021-09-07 天津大学 Microminiature unmanned helicopter nonlinear control method based on iterative learning
CN113359473B (en) * 2021-07-06 2022-03-11 天津大学 Microminiature unmanned helicopter nonlinear control method based on iterative learning
CN113900440B (en) * 2021-07-21 2023-03-14 中国电子科技集团公司电子科学研究院 Unmanned aerial vehicle control law design method and device and readable storage medium
CN113900440A (en) * 2021-07-21 2022-01-07 中国电子科技集团公司电子科学研究院 Unmanned aerial vehicle control law design method and device and readable storage medium
CN113721655A (en) * 2021-08-26 2021-11-30 南京大学 Control period self-adaptive reinforcement learning unmanned aerial vehicle stable flight control method
CN114063453A (en) * 2021-10-26 2022-02-18 广州大学 Helicopter system control method, system, device and medium based on reinforcement learning
CN114063453B (en) * 2021-10-26 2023-04-25 广州大学 Helicopter system control method, system, device and medium based on reinforcement learning
CN113985924A (en) * 2021-12-27 2022-01-28 中国科学院自动化研究所 Aircraft control method, device, equipment and computer program product
CN113985924B (en) * 2021-12-27 2022-04-08 中国科学院自动化研究所 Aircraft control method, device, equipment and computer readable storage medium
CN114545979A (en) * 2022-03-16 2022-05-27 哈尔滨逐宇航天科技有限责任公司 Aircraft intelligent sliding mode formation control method based on reinforcement learning
CN115061371A (en) * 2022-06-20 2022-09-16 中国航空工业集团公司沈阳飞机设计研究所 Unmanned aerial vehicle control strategy reinforcement learning generation method for preventing strategy jitter
CN115061371B (en) * 2022-06-20 2023-08-04 中国航空工业集团公司沈阳飞机设计研究所 Unmanned plane control strategy reinforcement learning generation method capable of preventing strategy jitter
CN116661478A (en) * 2023-07-27 2023-08-29 安徽大学 Four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning
CN116661478B (en) * 2023-07-27 2023-09-22 安徽大学 Four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning

Also Published As

Publication number Publication date
CN112363519B (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN112363519B (en) Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method
CN106444799B (en) Four-rotor unmanned aerial vehicle control method based on fuzzy extended state observer and self-adaptive sliding mode
Bou-Ammar et al. Controller design for quadrotor uavs using reinforcement learning
CN105912009B (en) Four-rotor aircraft control method based on pole allocation and fuzzy active disturbance rejection control technology
CN110908281A (en) Finite-time convergence reinforcement learning control method for attitude motion of unmanned helicopter
CN105607473B (en) The attitude error Fast Convergent self-adaptation control method of small-sized depopulated helicopter
Mueller et al. Iterative learning of feed-forward corrections for high-performance tracking
CN110442020B (en) Novel fault-tolerant control method based on whale optimization algorithm
CN113759979B (en) Event-driven-based online track planning method for unmanned aerial vehicle hanging system
CN112947518B (en) Four-rotor robust attitude control method based on disturbance observer
Cheng et al. Neural-networks control for hover to high-speed-level-flight transition of ducted fan uav with provable stability
CN111367182A (en) Hypersonic aircraft anti-interference backstepping control method considering input limitation
CN112578805A (en) Attitude control method of rotor craft
CN107817818B (en) Finite time control method for flight path tracking of uncertain model airship
Razzaghian et al. Adaptive fuzzy sliding mode control for a model-scaled unmanned helicopter
CN115576341A (en) Unmanned aerial vehicle trajectory tracking control method based on function differentiation and adaptive variable gain
CN117742156B (en) Four-rotor unmanned aerial vehicle control method and system based on RBF neural network
Brahim et al. Finite Time Adaptive SMC for UAV Trajectory Tracking Under Unknown Disturbances and Actuators Constraints
CN113805481A (en) Four-rotor aircraft self-adaptive neural network positioning control method based on visual feedback
Spitzer et al. Inverting learned dynamics models for aggressive multirotor control
Bouzid et al. 3d trajectory tracking control of quadrotor UAV with on-line disturbance compensation
CN116203840A (en) Adaptive gain scheduling control method for reusable carrier
Sheng et al. Multivariable MRAC for a quadrotor UAV with a non-diagonal interactor matrix
Dasgupta Adaptive attitude tracking of a quad-rotorcraft using nonlinear control hierarchy
CN115268475A (en) Robot fish accurate terrain tracking control method based on finite time disturbance observer

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant