CN112363519A - Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method - Google Patents
- Publication number
- CN112363519A (application CN202011125416.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/08—Control of attitude, i.e. control of roll, pitch, or yaw
- G05D1/0808—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
- G05D1/0816—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability
- G05D1/0825—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability using mathematical models
Abstract
The invention relates to a reinforcement learning nonlinear attitude control method for a quadrotor unmanned aerial vehicle. To address the attitude control problem of a quadrotor whose dynamics model contains an unmodeled part, a reinforcement learning controller based on an execution-evaluation (Actor-Critic) neural network is designed to estimate the unmodeled part of the model, and a nonlinear robust controller based on multivariable super-twisting is designed at the same time, thereby achieving stable attitude control of the quadrotor unmanned aerial vehicle.
Description
Technical Field
The invention relates to precise attitude control of a quad-rotor unmanned aerial vehicle. To address the influence of the unmodeled part of the quad-rotor dynamics model on control performance, and the dependence of model-based control methods on an accurate model, a nonlinear attitude controller based on reinforcement learning and a second-order sliding mode control algorithm is proposed, achieving finite-time convergence of the attitude control error of the unmanned aerial vehicle. In particular, the invention relates to a finite-time-convergent attitude control method for a quad-rotor unmanned aerial vehicle.
Background
Traditional linear control algorithms, such as PID and LQR, have been widely applied to quad-rotor drones. However, a linear control algorithm only guarantees good control performance near the equilibrium point, and it is difficult to obtain satisfactory results in handling nonlinear multivariable systems and ensuring disturbance rejection, so the achievable improvement in dynamic and steady-state performance is limited (Journal: Flight Dynamics; published April 2011; article title: Current status and development of research on UAV flight control methods; pages 1-5). For this reason, many nonlinear control algorithms have been applied to quad-rotor control. For example, adaptive sliding mode control has been used to control a quad-rotor, and experimental results show that it performs well in handling sensor noise and model uncertainty (Journal: International Journal of Control, Automation and Systems; Authors: Daewon Lee, H. Jin Kim, Shankar Sastry; published May 2009; article title: Feedback linearization vs. adaptive sliding mode control for a quadrotor helicopter; pages 419-428). However, the conventional first-order sliding mode control algorithm suffers from chattering, which is detrimental to long-term stable operation of the system. Therefore, some researchers have turned to super-twisting robust control design methods.
In theory, this algorithm can eliminate chattering, and it has been used by many researchers for quad-rotor control (Journal: Journal of the Franklin Institute; Authors: Lotfi Derafa, Abdelaziz Benallegue, Leonid Fridman; published March 2012; article title: Super twisting control algorithm for the attitude tracking of a four rotors UAV; pages 685-699). Other researchers, considering the multivariable characteristics of quad-rotors, have proposed a multivariable super-twisting algorithm and applied it to quad-rotor attitude control (Journal: IEEE Transactions on Industrial Electronics; Authors: Bailing Tian, Lihong Liu, Hanchen Lu, et al.; published August 2017; article title: Multivariable finite time attitude control for quadrotor UAV: Theory and experimentation; pages 2567-2577).
With the rapid development of machine learning research, learning algorithms such as reinforcement learning have also been used in control design for quad-rotor drones. Considering flight safety, researchers first used actual flight data for model identification to obtain an offline-learned state transfer model or stochastic Markov model, then iterated offline with a reinforcement learning algorithm to obtain an optimal control strategy, and finally applied that strategy to UAV control (Conference: IEEE/RSJ International Conference on Intelligent Robots and Systems; Authors: Steven L. Waslander, Gabriel M. Hoffmann, Jung Soon Jang, et al.; published 2005; article title: Multi-agent quadrotor testbed control design: Integral sliding mode vs. reinforcement learning; pages 3712-3717). In a quad-rotor simulation environment, researchers trained a neural network with reinforcement learning and applied the trained network to UAV control, realizing throw-and-hover flight tasks (Journal: IEEE Robotics and Automation Letters; Authors: Jemin Hwangbo, Inkyu Sa, Roland Siegwart, et al.; published June 2017; article title: Control of a quadrotor with reinforcement learning; pages 2096-2103). Although offline learning methods achieve good UAV control performance, such studies rarely provide stability proofs, and offline learning is time-consuming and computationally expensive. Moreover, some offline learning is performed in simulation, which cannot fully reproduce the disturbances of a real environment, so the generalization ability of the learned control algorithm is insufficient.
Although the experiments of Hwangbo et al. work well for hover tasks, the tracking performance is not as good as that of a nonlinear controller. For this reason, online reinforcement learning algorithms have also been used for quad-rotor control. For example, Sugimoto et al. mounted a camera at the bottom of a UAV to identify a marker on the ground, and then used a reinforcement learning algorithm on the ground station to control the UAV so that the ground marker remained in the center of the camera's field of view, thereby realizing a quad-rotor hovering experiment (Conference: 2016 3rd International Conference on Information Science and Control Engineering (ICISCE); Authors: Takuya Sugimoto, Manabu Gouko; published 2016; article title: Acquisition of hovering by actual UAV using reinforcement learning; page number: 148-).
Considering the long computation time and heavy computational load when reinforcement learning is used for UAV control, researchers have designed a controller combining the Robust Integral of the Sign of the Error (RISE) control algorithm with a reinforcement learning algorithm, and applied it to unmanned helicopter attitude control with good results (Journal: Control Theory & Applications; published April 2019; article title: Attitude reinforcement learning control design and verification for an unmanned helicopter; page number: 516-). However, this approach has seen little application on quad-rotor drones.
Regarding research on quad-rotor drone control, researchers have achieved some success to date, but limitations remain: 1) Existing control designs usually ignore the unmodeled part of the quad-rotor dynamics model, yet control methods based on the quad-rotor dynamics model depend heavily on an accurate model. Therefore, precise attitude control of a quad-rotor must account for the influence of its unmodeled part. 2) Some reinforcement-learning-based control methods rely on offline training with flight data; the resulting controllers generalize poorly, and it is difficult to guarantee the flight performance of the quad-rotor in special environments.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide a nonlinear attitude controller based on reinforcement learning for a quad-rotor unmanned aerial vehicle. The invention accounts for the unmodeled part of the quad-rotor dynamics model and applies a reinforcement learning method together with a multivariable super-twisting algorithm, training the quad-rotor online to address the insufficient generalization ability of offline-trained controllers. To this end, the invention adopts the following technical scheme: a finite-time-convergent attitude control method for a quad-rotor unmanned aerial vehicle, comprising the following steps:
1) establishing a four-rotor unmanned aerial vehicle dynamics model
The unmanned aerial vehicle is an X-configuration quad-rotor, and its dynamics model is established using the Newton-Euler method, giving the expression in equation (1):
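The equation is not reproduced in this text. A plausible reconstruction, assuming the standard Newton-Euler attitude form with the symbols defined below (the placement of the transfer matrix R_r, mapping the attitude rate to the body angular velocity ω, is an assumption):

```latex
M(\eta)\ddot{\eta}(t) + C(\eta,\dot{\eta})\dot{\eta}(t) + K\dot{\eta}(t) + \Delta(\eta) = \tau(t),
\qquad \omega(t) = R_r(t)\,\dot{\eta}(t)
\tag{1}
```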
The variables in equation (1) are defined as follows: M(η) denotes the inertia matrix; C(η, η̇) denotes the Coriolis and centrifugal force matrix; K = diag(K1, K2, K3) denotes the rotational damping coefficient matrix, where K1, K2 and K3 are all unknown constants. Δ(η) denotes the unmodeled dynamics in the quad-rotor dynamics model and satisfies ‖Δ(η)‖ ≤ ρ(‖η‖)‖η‖, where ρ is a positive real number and all norms involved are 2-norms. η(t) = [φ(t), θ(t), ψ(t)]ᵀ denotes the attitude angles of the UAV, where φ(t) is the roll angle, θ(t) the pitch angle, and ψ(t) the yaw angle. τ(t) = [τφ(t), τθ(t), τψ(t)]ᵀ denotes the control input torque, where τφ(t), τθ(t) and τψ(t) are the roll, pitch and yaw channel control input torques, respectively. The angular velocity transfer matrix Rr(t) from the inertial frame to the body frame in equation (1) is defined as follows:
the dynamical model in formula (1) has a parameter uncertainty, which can be represented by the following formula:
Formula (1) can be rewritten as follows:
wherein:
To achieve attitude angle control of the drone, the quad-rotor attitude tracking error vector e(t) and sliding mode surface σ(t) are defined as follows:
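The definitions themselves are not reproduced in this text; a hedged reconstruction of the standard form, with Λ and η_d introduced here as assumed symbols for the adjustable gain and the desired trajectory (the equation numbering (6)-(7) is inferred):

```latex
e(t) = \eta_d(t) - \eta(t) \tag{6}
```
```latex
\sigma(t) = \dot{e}(t) + \Lambda\, e(t) \tag{7}
```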
wherein the first symbol denotes an adjustable positive real gain and the second denotes the desired attitude trajectory. Taking the first time derivative of σ(t) and substituting equation (4) yields:
To facilitate subsequent calculation, a function is defined as the unmodeled part of the quad-rotor dynamics model; it has the following form:
therefore, the quad-rotor drone dynamics model can be rewritten as:
then, the design of the nonlinear controller based on the reinforcement learning and multivariable super-twisting control algorithm is carried out for the quadrotor unmanned aerial vehicle dynamics model of the formula (9).
2) Reinforcement learning controller part design
The reinforcement learning controller is designed using the execution-evaluation (Actor-Critic) neural network method, so this part comprises the design of two neural networks: the execution neural network and the evaluation neural network. Before designing the two networks, a performance index function must be designed to evaluate the result. Its form is as follows:
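The index itself is not reproduced in this text; a hedged reconstruction of equation (10), assuming a conventional quadratic utility with positive-definite weighting matrices Q and R (both introduced here as assumptions):

```latex
J(\sigma(t)) = \int_{t}^{\infty} r\big(\sigma(s), \tau(s)\big)\, ds,
\qquad
r(\sigma, \tau) = \sigma^{T} Q \sigma + \tau^{T} R \tau
\tag{10}
```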
The minimum of equation (10) satisfies the Bellman equation:
Defining the optimal control strategy τ*, the corresponding optimal state value function is:
Then Σ* satisfies the following Hamiltonian equation:
Substituting equation (9) into equation (14) yields the HJB (Hamilton-Jacobi-Bellman) equation, of the form:
Solving the HJB equation, the optimal control quantity τ* is obtained as follows:
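Assuming the quadratic utility above, minimizing the Hamiltonian over τ gives the usual closed form; a hedged reconstruction of the optimal control quantity (16), consistent with the HJB condition min H = r + (∇Σ*)ᵀ(γ + Gτ*) = 0 stated in the claims:

```latex
\frac{\partial H}{\partial \tau}\bigg|_{\tau=\tau^{*}} = 0
\;\;\Longrightarrow\;\;
\tau^{*} = -\tfrac{1}{2}\, R^{-1} G^{T} \nabla \Sigma^{*}(\sigma)
\tag{16}
```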
The influence of the unmodeled part of the quad-rotor dynamics model on the quad-rotor is denoted by B. As can be seen from equations (6)-(9), the control objective here is to drive the sliding variable to zero within finite time. Therefore, the optimal compensation value for the unmodeled part of the quad-rotor dynamics model is:
The quad-rotor system is nonlinear, and for a nonlinear system the HJB equation is a nonlinear partial differential equation whose analytic solution is difficult to obtain. This specification therefore uses the Actor-Critic neural network method to estimate B*. The output value of the evaluation neural network is used to approximate the optimal state value function Σ*(σ); its specific form is as follows:
wherein Wc is the ideal weight of the evaluation neural network, μc(σ) is the evaluation network excitation function, and the last term is the approximation error of the evaluation network.
wherein βc is the learning rate of the evaluation neural network. To facilitate subsequent analysis, a further definition is made, which gives:
As stated above, the execution neural network is used to compensate for the influence B(x) of the unmodeled part of the quad-rotor dynamics model on the quad-rotor, where x denotes the state variable. The form of B(x) represented by the execution neural network is as follows:
wherein Wa is the ideal weight matrix of the execution neural network, μa(x) is the execution network excitation function, and the last term is the approximation error of the execution network. The execution neural network estimate is designed as follows:
Substituting equation (19) into equation (17) gives:
Substituting equation (25) into equation (24), an error is defined as:
According to the gradient descent algorithm, the update law for the weights of the execution neural network is designed as follows:
wherein βa > 0 is the learning rate of the execution neural network. Defining the weight error of the execution neural network and substituting it into equation (27), its update law is obtained as:
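The gradient-descent weight updates described above can be sketched numerically. The following is a minimal, hedged illustration rather than the patent's implementation: the feature vectors, error signals, and learning rates are stand-in assumptions, and only the gradient-step structure of the evaluation (critic) and execution (actor) update laws is mirrored:

```python
import numpy as np

def critic_update(W_c, mu_c, bellman_err, beta_c):
    """One gradient-descent step on the evaluation (critic) weights.

    W_c         : current critic weight vector
    mu_c        : critic excitation/feature vector evaluated at sigma
    bellman_err : scalar Bellman (temporal-difference) error
    beta_c      : critic learning rate, beta_c > 0
    """
    return W_c - beta_c * bellman_err * mu_c

def actor_update(W_a, mu_a, comp_err, beta_a):
    """One gradient-descent step on the execution (actor) weights.

    W_a      : current actor weight matrix, shape (features, outputs)
    mu_a     : actor excitation/feature vector evaluated at state x
    comp_err : compensation error vector, one entry per output channel
    beta_a   : actor learning rate, beta_a > 0
    """
    return W_a - beta_a * np.outer(mu_a, comp_err)

# Example step: both updates move the weights opposite the error gradient.
W_c = critic_update(np.zeros(3), np.array([1.0, 0.5, 0.0]), 2.0, 0.1)
W_a = actor_update(np.zeros((2, 1)), np.array([1.0, 1.0]), np.array([1.0]), 0.5)
```

Repeated application of these steps drives the weight estimates toward the ideal weights Wc and Wa, provided the learning rates are small enough and the excitation functions are persistently exciting.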
3) Non-linear controller part design
According to the execution-evaluation neural network design, the execution neural network can compensate for the influence of the unmodeled part of the quad-rotor dynamics model. Substituting equation (23) into equation (9) yields:
the control quantity τ is designed as:
The invention has the characteristics and beneficial effects that:
Aiming at the quad-rotor unmanned aerial vehicle, the invention establishes a dynamics model containing an unmodeled part and designs a nonlinear attitude controller based on reinforcement learning and the multivariable super-twisting control algorithm, achieving finite-time convergence control of the quad-rotor attitude error, improving the robustness of the quad-rotor system, and realizing precise attitude control.
Description of the drawings:
FIG. 1 is a schematic diagram of a quad-rotor drone system for use with the present invention;
FIG. 2 is a graph of three attitude angles of a quad-rotor drone during flight using a control scheme;
fig. 3 is a graph of three attitude angles of a quad-rotor drone in flight when subjected to external disturbances after the control scheme is employed.
Detailed Description
The technical scheme adopted by the invention is as follows: establish a quad-rotor dynamics model including the unmodeled part of the system and design the corresponding reinforcement learning nonlinear attitude controller, comprising the following steps:
First, the quad-rotor dynamics model must be built. Fig. 1 is a schematic diagram of the quad-rotor system used herein. The UAV is an X-configuration quad-rotor, and its dynamics model is established using the Newton-Euler method, giving the expression in equation (1):
The variables in equation (1) are defined as follows: M(η) denotes the inertia matrix; C(η, η̇) denotes the Coriolis and centrifugal force matrix; K = diag(K1, K2, K3) denotes the rotational damping coefficient matrix, where K1, K2 and K3 are all unknown constants. Δ(η) denotes the unmodeled dynamics in the quad-rotor dynamics model and satisfies ‖Δ(η)‖ ≤ ρ(‖η‖)‖η‖, where ρ is a positive real number and all norms referred to herein are 2-norms. η(t) = [φ(t), θ(t), ψ(t)]ᵀ denotes the attitude angles of the UAV, where φ(t) is the roll angle, θ(t) the pitch angle, and ψ(t) the yaw angle. τ(t) = [τφ(t), τθ(t), τψ(t)]ᵀ denotes the control input torque, where τφ(t), τθ(t) and τψ(t) are the roll, pitch and yaw channel control input torques, respectively. The angular velocity transfer matrix Rr(t) from the inertial frame to the body frame in equation (1) is defined as follows:
the dynamical model in formula (1) has a parameter uncertainty, which can be represented by the following formula:
Formula (1) can be rewritten as follows:
wherein:
To achieve attitude angle control of the drone, the quad-rotor attitude tracking error vector e(t) and sliding mode surface σ(t) are defined as follows:
wherein the first symbol denotes an adjustable positive real gain and the second denotes the desired attitude trajectory. Taking the first time derivative of σ(t) and substituting equation (4) yields:
To facilitate subsequent calculation, a function is defined as the unmodeled part of the quad-rotor dynamics model; it has the following form:
Therefore, the quad-rotor drone dynamics model can be rewritten as:
then, the design of the nonlinear controller based on the reinforcement learning and multivariable super-twisting control algorithm is carried out for the quadrotor unmanned aerial vehicle dynamics model of the formula (9).
The reinforcement learning controller is designed using the execution-evaluation (Actor-Critic) neural network method, so this part comprises the design of two neural networks: the execution neural network and the evaluation neural network. Before designing the two networks, a performance index function must be designed to evaluate the result. Its form is as follows:
The minimum of equation (10) satisfies the Bellman equation:
Defining the optimal control strategy τ*, the corresponding optimal state value function is:
Then Σ* satisfies the following Hamiltonian equation:
Substituting equation (9) into equation (14) yields the HJB (Hamilton-Jacobi-Bellman) equation, of the form:
Solving the HJB equation, the optimal control quantity τ* is obtained as follows:
The influence of the unmodeled part of the quad-rotor dynamics model on the quad-rotor is denoted by B. As can be seen from equations (6)-(9), the control objective here is to drive the sliding variable to zero within finite time. Therefore, the optimal compensation value for the unmodeled part of the quad-rotor dynamics model is:
The quad-rotor system is nonlinear, and for a nonlinear system the HJB equation is a nonlinear partial differential equation whose analytic solution is difficult to obtain. This specification therefore uses the Actor-Critic neural network method to estimate B*. The output value of the evaluation neural network is used to approximate the optimal state value function Σ*(σ); its specific form is as follows:
wherein Wc is the ideal weight of the evaluation neural network, μc(σ) is the evaluation network excitation function, and the last term is the approximation error of the evaluation network.
wherein βc is the learning rate of the evaluation neural network. To facilitate subsequent analysis, a further definition is made, which gives:
As stated above, the execution neural network is used to compensate for the influence B(x) of the unmodeled part of the quad-rotor dynamics model on the quad-rotor, where x denotes the state variable. The form of B(x) represented by the execution neural network is as follows:
wherein Wa is the ideal weight matrix of the execution neural network, μa(x) is the execution network excitation function, and the last term is the approximation error of the execution network. The execution neural network estimate is designed as follows:
Substituting equation (19) into equation (17) gives:
Substituting equation (25) into equation (24), an error is defined as:
According to the gradient descent algorithm, the update law for the weights of the execution neural network is designed as follows:
wherein βa > 0 is the learning rate of the execution neural network. Defining the weight error of the execution neural network and substituting it into equation (27), its update law is obtained as:
According to the execution-evaluation neural network design, the execution neural network can compensate for the influence of the unmodeled part of the quad-rotor dynamics model. Substituting equation (23) into equation (9) yields:
the control quantity τ is designed as:
Substituting equation (31) into equation (29) gives:
It can be shown that when the gains k1, k2, k3 and k4 satisfy equation (33), the attitude tracking error of the quad-rotor converges to zero within finite time.
Specific examples of implementation are given below:
first, introduction of experiment platform
The experimental platform uses a real quad-rotor UAV as the controlled object, with a real attitude sensor mounted on the vehicle, so that a realistic and intuitive attitude control result can be obtained, close to actual flight conditions. The platform also uses a network to establish communication among the host computer, the target computer, and the monitoring computer, facilitating data interaction and control.
Second, flight experiment results
To verify the effectiveness and feasibility of the proposed nonlinear attitude controller, a quad-rotor attitude stabilization experiment was carried out on the experimental platform. The control target is that the three attitude angles of the UAV approach zero within finite time, namely:
and that the vehicle can still recover to a stable state when subjected to external disturbance.
The parameter values of the experimental platform are: moment of inertia J = diag([1.34, 1.31, 2.54]) × 10⁻² kg·m², half axle distance l = 0.225 m, lift-torque coefficient c = 0.25, and mass m = 1.5 kg.
As can be seen from Fig. 2, using the reinforcement learning nonlinear attitude controller, the error is controlled to within ±1°. As can be seen from Fig. 3, a steady state is still reached when the external disturbance reaches 40°. Therefore, the reinforcement learning nonlinear attitude controller designed by the invention has good robustness and accurately controls the attitude angles.
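As a rough numerical illustration of why the super-twisting structure tolerates persistent disturbance (a scalar toy model, not the patent's multivariable attitude controller; the gains and disturbance are assumptions):

```python
import numpy as np

def simulate_super_twisting(k1=3.0, k2=4.0, dt=1e-3, T=5.0):
    """Scalar super-twisting loop on sigma_dot = u + d(t).

    Control: u = -k1*sqrt(|sigma|)*sign(sigma) + v,  v_dot = -k2*sign(sigma).
    d(t) is a bounded, Lipschitz disturbance; when k2 exceeds the bound on
    d's derivative, sigma reaches a small neighborhood of zero in finite
    time and stays there, without the chattering of first-order sliding mode.
    """
    sigma, v = 1.0, 0.0
    for i in range(int(T / dt)):
        t = i * dt
        d = 0.5 * np.sin(2.0 * np.pi * t)            # bounded disturbance
        u = -k1 * np.sqrt(abs(sigma)) * np.sign(sigma) + v
        sigma += (u + d) * dt                        # explicit Euler step
        v += -k2 * np.sign(sigma) * dt
    return abs(sigma)

final_error = simulate_super_twisting()
```

The integral term v continuously reconstructs the disturbance, which is the mechanism behind the disturbance rejection observed in Fig. 3.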
Claims (1)
1. A quadrotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method, characterized in that, aiming at the attitude control problem of a quadrotor whose dynamics model contains an unmodeled part, a reinforcement learning controller based on an execution-evaluation (Actor-Critic) neural network is designed to estimate the unmodeled part of the model, and a nonlinear robust controller based on multivariable super-twisting is designed at the same time, thereby achieving stable attitude control of the quadrotor, comprising the following design steps:
step 1) establishing a four-rotor unmanned aerial vehicle dynamic model;
a Newton-Euler method is adopted to establish a four-rotor unmanned plane dynamic model, and the expression formula is as follows:
the variables in equation (1) are defined as follows: M(η) denotes the inertia matrix; C(η, η̇) denotes the Coriolis and centrifugal force matrix; K = diag(K1, K2, K3) denotes the rotational damping coefficient matrix, where K1, K2 and K3 are all unknown constants; Δ(η) denotes the unmodeled dynamics in the quadrotor dynamics model; η(t) = [φ(t), θ(t), ψ(t)]ᵀ denotes the attitude angles of the UAV, where φ(t) is the roll angle, θ(t) the pitch angle, and ψ(t) the yaw angle; τ(t) = [τφ(t), τθ(t), τψ(t)]ᵀ denotes the control input torque, where τφ(t), τθ(t) and τψ(t) are the roll, pitch and yaw channel control input torques respectively; the angular velocity transfer matrix Rr(t) from the inertial frame to the body frame in equation (1) is defined as follows:
the kinetic model in formula (1) has a parameter uncertainty represented by the following formula:
in equation (3), M0 and C0 are the best estimates of M(η) and C(η, η̇), and MΔ and CΔ are the parameter uncertainty portions;
formula (1) is rewritten as follows:
wherein:
to achieve attitude angle control of the drone, the quadrotor attitude tracking error vector e(t) and sliding mode surface σ(t) are defined as follows:
wherein the first symbol denotes an adjustable positive real gain and the second denotes the desired attitude trajectory; taking the first time derivative of σ(t) and substituting equation (4) yields:
to facilitate subsequent calculation, a function is defined as the unmodeled part of the quadrotor dynamics model, of the form:
therefore, the quadrotor drone dynamics model is rewritten as:
then, designing a nonlinear controller based on reinforcement learning and a multivariable super-twisting control algorithm aiming at the four-rotor unmanned aerial vehicle dynamic model of the formula (9);
step 2) designing a reinforcement learning controller part;
the reinforcement learning controller is designed using the execution-evaluation (Actor-Critic) neural network method; this part comprises two neural networks, namely the execution neural network and the evaluation neural network; before designing the two networks, a performance index function must be designed to evaluate the result, of the following form:
the minimum of equation (10) satisfies the Bellman equation:
defining the optimal control strategy τ*, the corresponding optimal state value function is:
then Σ* satisfies the following Hamiltonian equation:
substituting equation (9) into equation (14) yields the HJB (Hamilton-Jacobi-Bellman) equation, of the form:
min H = r + (∇Σ*)ᵀ(γ + Gτ*) = 0 (15)
solving the HJB equation to obtain the optimal control quantity tau*The following were used:
non-modeling part in dynamics model of quad-rotor unmanned aerial vehicleThe impact on a quad-rotor drone is represented by B; the control target is to order within a limited timeTherefore, unmodeled part in four-rotor unmanned aerial vehicle dynamic modelThe optimal compensation value of (1) is:
The quadrotor is a nonlinear system, so the actor-critic neural network method is used to estimate B*, where the output of the critic network approximates the optimal state-value function Σ*(σ), in the following specific form:
where Wc is the ideal critic network weight vector, μc(σ) is the critic network activation (excitation) function, and the remaining term is the critic approximation error;
where βc > 0 is the critic network learning rate; to facilitate the subsequent analysis, a weight-error term is defined, which gives:
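The critic's update equations are image-only in the source, so the sketch below uses a discrete temporal-difference surrogate of the Bellman residual with a gradient-descent step at rate βc > 0, as the text describes. The quadratic basis μc(σ) = [σᵢ²] and the residual definition are assumptions standing in for the patent's exact law.

```python
def critic_value(Wc, sigma):
    """Critic output V(sigma) = Wc^T mu_c(sigma) with an assumed
    quadratic basis mu_c(sigma) = [sigma_i^2]."""
    return sum(w * s * s for w, s in zip(Wc, sigma))

def critic_update(Wc, sigma, sigma_next, cost, beta_c=0.05, dt=0.01):
    """One gradient-descent step on the squared residual
    e_c = cost*dt + V(sigma_next) - V(sigma), a discrete stand-in for
    the patent's critic law (whose exact equations are image-only)."""
    mu = [s * s for s in sigma]
    mu_next = [s * s for s in sigma_next]
    e_c = cost * dt + critic_value(Wc, sigma_next) - critic_value(Wc, sigma)
    grad = [mn - m for mn, m in zip(mu_next, mu)]  # d(e_c)/d(Wc)
    return [w - beta_c * e_c * g for w, g in zip(Wc, grad)]

# One update on a single-axis transition sigma: 1.0 -> 0.5 with unit cost.
Wc0 = [0.0]
Wc1 = critic_update(Wc0, [1.0], [0.5], cost=1.0)
```

One gradient step shrinks the residual on this fixed transition, which is the behavior the learning rate βc is tuned for.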
The actor network compensates the influence B(x) that the unmodeled part of the quadrotor dynamics model exerts on the vehicle, where x denotes the state variable; B(x) is represented with the actor network as follows:
where Wa is the ideal actor network weight matrix, μa(x) is the actor network activation function, and the remaining term is the actor approximation error; the actor network estimate is designed as follows:
Substituting equation (19) into equation (17) gives:
Substituting equation (25) into equation (24), an error is defined as:
According to the gradient descent algorithm, the update law for the actor network weights is designed as follows:
where βa > 0 is the actor network learning rate; defining the actor weight error and substituting it into equation (27), the weight-error update law is:
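The actor error (26) and update law (27) are image-only in the source; the sketch below assumes the actor error is the mismatch between the actor output and its compensation target, updated by gradient descent at rate βa > 0 as the text describes. The tanh basis and the target formulation are illustrative assumptions.

```python
import math

def actor_output(Wa, x):
    """Actor output B_hat(x) = Wa^T mu_a(x) with an assumed tanh basis
    mu_a(x) = [tanh(x_i)] (the patent's excitation function is
    image-only and not reproduced)."""
    return sum(w * math.tanh(xi) for w, xi in zip(Wa, x))

def actor_update(Wa, x, target, beta_a=0.1):
    """Gradient-descent step on the squared actor error
    e_a = B_hat(x) - target: a surrogate for the patent's update
    law (27), whose exact error definition (26) is image-only."""
    e_a = actor_output(Wa, x) - target
    return [w - beta_a * e_a * math.tanh(xi) for w, xi in zip(Wa, x)]

# Repeated updates drive the actor output toward the compensation target.
Wa = [0.0]
for _ in range(300):
    Wa = actor_update(Wa, [1.0], target=1.0)
```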
Step 3) designing the control law;
The actor network designed above compensates the influence of the unmodeled part of the quadrotor dynamics model; substituting equation (23) into equation (9) gives:
The control quantity τ is designed as:
Substituting equation (31) into equation (29) yields:
When the gains k1, k2, k3, and k4 satisfy equation (33), the attitude tracking error of the quadrotor converges to zero within finite time.
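The control law (31) and the gain condition (33) are image-only in the source; the sketch below uses the standard multivariable super-twisting structure with four positive gains k1..k4, plus an optional subtraction of the actor-network compensation. This structure is an assumption consistent with the algorithm the patent names, not a reproduction of equation (31).

```python
import math

def supertwisting_step(sigma, z, k1, k2, k3, k4, dt, b_hat=None):
    """One Euler-integration step of a multivariable super-twisting
    controller in its standard form:
        tau   = -k1*sigma/||sigma||^(1/2) - k2*sigma + z - B_hat
        z_dot = -k3*sigma/||sigma||       - k4*sigma
    b_hat is the (assumed) actor-network compensation of the
    unmodeled dynamics; pass None to omit it."""
    n = max(math.sqrt(sum(s * s for s in sigma)), 1e-9)  # avoid divide-by-zero
    tau = [-k1 * s / math.sqrt(n) - k2 * s + zi for s, zi in zip(sigma, z)]
    if b_hat is not None:
        tau = [t - b for t, b in zip(tau, b_hat)]
    z_next = [zi + dt * (-k3 * s / n - k4 * s) for s, zi in zip(sigma, z)]
    return tau, z_next

# Unit sliding error on the roll axis, integrator state at zero.
tau, z = supertwisting_step([1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                            k1=1.0, k2=1.0, k3=1.0, k4=1.0, dt=0.01)
```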
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011125416.8A CN112363519B (en) | 2020-10-20 | 2020-10-20 | Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112363519A true CN112363519A (en) | 2021-02-12 |
CN112363519B CN112363519B (en) | 2021-12-07 |
Family
ID=74507738
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011125416.8A Active CN112363519B (en) | 2020-10-20 | 2020-10-20 | Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112363519B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109696830A (en) * | 2019-01-31 | 2019-04-30 | 天津大学 | The reinforcement learning adaptive control method of small-sized depopulated helicopter |
CN110908281A (en) * | 2019-11-29 | 2020-03-24 | 天津大学 | Finite-time convergence reinforcement learning control method for attitude motion of unmanned helicopter |
CN111625019A (en) * | 2020-05-18 | 2020-09-04 | 天津大学 | Trajectory planning method for four-rotor unmanned aerial vehicle suspension air transportation system based on reinforcement learning |
Non-Patent Citations (6)
Title |
---|
Nodland D et al.: "Neural network-based optimal adaptive output feedback control of a helicopter UAV", IEEE Transactions on Neural Networks and Learning Systems * |
An Hang et al.: "Attitude reinforcement learning control design and verification for unmanned helicopters", Control Theory & Applications * |
Song Zhankui et al.: "Research on nonlinear control methods for small quadrotor unmanned aerial vehicles", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II * |
Li Chen: "Nonlinear control methods and implementation for quadrotor UAVs", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II * |
Hao Wei et al.: "Nonlinear fault-tolerant control design for quadrotor UAV attitude systems", Control Theory & Applications * |
Xian Bin et al.: "Finite-time convergence control design for small unmanned helicopters based on reinforcement learning", Control and Decision * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113359473A (en) * | 2021-07-06 | 2021-09-07 | 天津大学 | Microminiature unmanned helicopter nonlinear control method based on iterative learning |
CN113359473B (en) * | 2021-07-06 | 2022-03-11 | 天津大学 | Microminiature unmanned helicopter nonlinear control method based on iterative learning |
CN113900440B (en) * | 2021-07-21 | 2023-03-14 | 中国电子科技集团公司电子科学研究院 | Unmanned aerial vehicle control law design method and device and readable storage medium |
CN113900440A (en) * | 2021-07-21 | 2022-01-07 | 中国电子科技集团公司电子科学研究院 | Unmanned aerial vehicle control law design method and device and readable storage medium |
CN113721655A (en) * | 2021-08-26 | 2021-11-30 | 南京大学 | Control period self-adaptive reinforcement learning unmanned aerial vehicle stable flight control method |
CN114063453A (en) * | 2021-10-26 | 2022-02-18 | 广州大学 | Helicopter system control method, system, device and medium based on reinforcement learning |
CN114063453B (en) * | 2021-10-26 | 2023-04-25 | 广州大学 | Helicopter system control method, system, device and medium based on reinforcement learning |
CN113985924A (en) * | 2021-12-27 | 2022-01-28 | 中国科学院自动化研究所 | Aircraft control method, device, equipment and computer program product |
CN113985924B (en) * | 2021-12-27 | 2022-04-08 | 中国科学院自动化研究所 | Aircraft control method, device, equipment and computer readable storage medium |
CN114545979A (en) * | 2022-03-16 | 2022-05-27 | 哈尔滨逐宇航天科技有限责任公司 | Aircraft intelligent sliding mode formation control method based on reinforcement learning |
CN115061371A (en) * | 2022-06-20 | 2022-09-16 | 中国航空工业集团公司沈阳飞机设计研究所 | Unmanned aerial vehicle control strategy reinforcement learning generation method for preventing strategy jitter |
CN115061371B (en) * | 2022-06-20 | 2023-08-04 | 中国航空工业集团公司沈阳飞机设计研究所 | Unmanned plane control strategy reinforcement learning generation method capable of preventing strategy jitter |
CN116661478A (en) * | 2023-07-27 | 2023-08-29 | 安徽大学 | Four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning |
CN116661478B (en) * | 2023-07-27 | 2023-09-22 | 安徽大学 | Four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN112363519B (en) | 2021-12-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112363519B (en) | Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method | |
CN106444799B (en) | Four-rotor unmanned aerial vehicle control method based on fuzzy extended state observer and self-adaptive sliding mode | |
Bou-Ammar et al. | Controller design for quadrotor uavs using reinforcement learning | |
CN105912009B (en) | Four-rotor aircraft control method based on pole allocation and fuzzy active disturbance rejection control technology | |
CN110908281A (en) | Finite-time convergence reinforcement learning control method for attitude motion of unmanned helicopter | |
CN105607473B (en) | The attitude error Fast Convergent self-adaptation control method of small-sized depopulated helicopter | |
Mueller et al. | Iterative learning of feed-forward corrections for high-performance tracking | |
CN110442020B (en) | Novel fault-tolerant control method based on whale optimization algorithm | |
CN113759979B (en) | Event-driven-based online track planning method for unmanned aerial vehicle hanging system | |
CN112947518B (en) | Four-rotor robust attitude control method based on disturbance observer | |
Cheng et al. | Neural-networks control for hover to high-speed-level-flight transition of ducted fan uav with provable stability | |
CN111367182A (en) | Hypersonic aircraft anti-interference backstepping control method considering input limitation | |
CN112578805A (en) | Attitude control method of rotor craft | |
CN107817818B (en) | Finite time control method for flight path tracking of uncertain model airship | |
Razzaghian et al. | Adaptive fuzzy sliding mode control for a model-scaled unmanned helicopter | |
CN115576341A (en) | Unmanned aerial vehicle trajectory tracking control method based on function differentiation and adaptive variable gain | |
CN117742156B (en) | Four-rotor unmanned aerial vehicle control method and system based on RBF neural network | |
Brahim et al. | Finite Time Adaptive SMC for UAV Trajectory Tracking Under Unknown Disturbances and Actuators Constraints | |
CN113805481A (en) | Four-rotor aircraft self-adaptive neural network positioning control method based on visual feedback | |
Spitzer et al. | Inverting learned dynamics models for aggressive multirotor control | |
Bouzid et al. | 3d trajectory tracking control of quadrotor UAV with on-line disturbance compensation | |
CN116203840A (en) | Adaptive gain scheduling control method for reusable carrier | |
Sheng et al. | Multivariable MRAC for a quadrotor UAV with a non-diagonal interactor matrix | |
Dasgupta | Adaptive attitude tracking of a quad-rotorcraft using nonlinear control hierarchy | |
CN115268475A (en) | Robot fish accurate terrain tracking control method based on finite time disturbance observer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||