CN116661478A

CN116661478A - Four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning

Info

Publication number: CN116661478A
Application number: CN202310930078.2A
Authority: CN
Inventors: 赵冬; 苏延旭; 黄大荣
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2023-07-27
Filing date: 2023-07-27
Publication date: 2023-08-29
Anticipated expiration: 2043-07-27
Also published as: CN116661478B

Abstract

The invention discloses a four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning, which comprises the following steps: constructing an attitude tracking error model; constructing a long-term cost function of the quadrotor unmanned aerial vehicle based on the discretized attitude tracking error model, and forming a real-time reward function of integral reinforcement learning; constructing an evaluation neural network, constructing an error model of integral reinforcement learning based on an estimated value of the evaluation neural network on a long-term cost function, and constructing an evaluation neural network-action neural network integral reinforcement learning control model by combining a real-time rewarding function; and respectively designing a weight update law for the evaluation neural network and the action neural network in the control model, and tracking and controlling the gesture of the four-rotor unmanned aerial vehicle by using an integral reinforcement learning control model adopting the weight update law. The invention can ensure that the transient performance, the closed loop stability and the output tracking of the four-rotor unmanned aerial vehicle are improved, and the autonomy and the adaptability to new scenes of the four-rotor unmanned aerial vehicle are improved.

Description

Four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning

Technical Field

The invention belongs to the technical field of unmanned aerial vehicle automatic control, and particularly relates to a four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning.

Background

With the development and progress of aerospace technology, in various types of quadrotors unmanned aerial vehicle, the quadrotors are taken as a special aircraft in an unmanned aerial vehicle family, and the quadrotors are mainly used for monitoring and reconnaissance, emergency rescue, aerial photography, atmosphere monitoring and other purposes due to the characteristics of low cost, small size, simple structure and strong maneuverability, so that huge application prospects are shown in military and civil fields, research hot flashes are formed worldwide, and the research of a control system is the core in the research of the quadrotors.

Considering that a four-rotor aircraft is a multivariable, under-actuated and strongly coupled nonlinear system, some students adopt intelligent control strategies to identify and compensate the nonlinear system, but transient performance under the condition of strong nonlinearity is not considered yet, and insufficient control over the transient performance can lead to poor system response, including overshoot, convergence rate and other relevant factors, endanger the stability of the system and even possibly lead to system faults. Therefore, the comprehensive research on the transient performance of the four-rotor unmanned aerial vehicle control system has a vital meaning, and how to enhance the capability of the control system for effectively processing dynamic abrupt changes so as to improve the safety performance of the system becomes a research hot spot.

At present, the tracking control method of the four-rotor unmanned aerial vehicle mainly focuses on the following aspects: (1) external tamper-resistant control based on a disturbance observer; (2) Self-adaptive obstacle avoidance control based on potential functions or images; and (3) attitude control based on adaptive dynamic programming, and the like. In the past four rotor unmanned aerial vehicle design process, the robustness, security and maneuverability to four rotor unmanned aerial vehicle flight under the general circumstances are studied, aim at improving four rotor unmanned aerial vehicle complex environment's adaptability, but the past method still has fresh research in the aspect of improving the transient performance and intelligent autonomy of system.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides the four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning, which improves the performance of the four-rotor unmanned aerial vehicle in terms of dynamic performance and autonomy on the basis of the traditional steady-control four-rotor unmanned aerial vehicle and provides powerful support for subsequent intelligent autonomous application.

In order to achieve the technical purpose, the invention adopts the following technical scheme:

a four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning comprises the following steps:

step 1: building a posture power model of the four-rotor unmanned aerial vehicle, building a posture angle state constraint on the posture power model by adopting a preset performance function, and building a posture tracking error model meeting the requirement of transient response performance of the four-rotor unmanned aerial vehicle by combining a posture angle error variable;

step 2: discretizing the attitude tracking error model constructed in the step 1, constructing a long-term cost function of the four-rotor unmanned aerial vehicle based on the discretized attitude tracking error model, and forming a real-time reward function of integral reinforcement learning;

step 3: constructing an evaluation neural network for controlling the performance of the four-rotor unmanned aerial vehicle system, constructing an error model of integral reinforcement learning based on an estimated value of the evaluation neural network on a long-term cost function, and constructing an evaluation neural network-action neural network integral reinforcement learning control model by combining the real-time reward function formed in the step 2;

step 4: and respectively designing weight updating rules for the evaluation neural network and the action neural network in the evaluation neural network-action neural network integral reinforcement learning control model, and tracking and controlling the gesture of the four-rotor unmanned aerial vehicle by using the integral reinforcement learning control model adopting the weight updating rules.

In order to optimize the technical scheme, the specific measures adopted further comprise:

the step 1 comprises the following substeps:

step 11: building a gesture power model of the four-rotor unmanned aerial vehicle:

；

wherein ,the change rate of the attitude angle of the four-rotor unmanned aerial vehicle is the change rate;

the change rate of the attitude angular rate of the quadrotor unmanned aerial vehicle is the change rate of the attitude angular rate of the quadrotor unmanned aerial vehicle;

a rotation matrix of an attitude angle system of the quadrotor unmanned aerial vehicle;

、/>the attitude angular rate and the moment of inertia of the four-rotor unmanned aerial vehicle are adopted;

the attitude angular rate matrix is the attitude angular rate matrix of the quadrotor unmanned aerial vehicle;

the control moment of the four-rotor unmanned aerial vehicle is;

external bounded interference to the quad-rotor unmanned helicopter;

step 12: establishing attitude angle state constraint on an attitude power model by adopting a preset performance function:

；

wherein The attitude angle of the four-rotor unmanned aerial vehicle under i is;

，/>、/>、/>roll angle, pitch angle and yaw angle, respectively, represent the subscript +.>Refers to one of a roll angle, a pitch angle, and a yaw angle;

for presetting performance index function, satisfy->，/>；/>，/>Is constant, satisfy->，/>Is a time variable;

the amplitude adjusting parameter of the function for presetting the performance index satisfies +.>；

Step 13: combining with an attitude angle error variable, constructing an attitude tracking error model meeting the transient response performance requirement of the four-rotor unmanned aerial vehicle:

；

wherein , and />The preset performance tracking error vectors respectively +.>About time->First and second derivatives of (2);

the state constraint and control model of the four-rotor unmanned aerial vehicle is considered for the attitude angle error variable, and specifically comprises the following steps:

；

for four rotor unmanned aerial vehicle attitude angle vector, +.>Intermediate variable +.>Wherein the auxiliary attitude angle constraint variable +.>，/>Is->First derivative with respect to time, < >>Is aboutBundle posture angle->Is a preset performance index function vector of +.>Is->First derivative with respect to time, < >>Is->Second derivative with respect to time,/>Rotation matrix of attitude angle system for quadrotor unmanned aerial vehicle, +.>For the first derivative of the rotation matrix of the attitude angle system of a quadrotor unmanned aerial vehicle with respect to time,is an intermediate variable introduced.

Above mentioned，/>And->The method comprises the following steps of:

；

wherein The roll angle rate, pitch angle rate and yaw angle rate, respectively; />、/>、/>The roll angle, pitch angle and yaw angle, respectively.

Above mentionedThe method comprises the following steps:

；

wherein ,representing a hyperbolic secant function->；

，/> and />For performance parameters selected according to the transient performance of a quadrotor unmanned aerial vehicle>，/>Determining the initial boundary and the end boundary of the attitude angle operation of the quadrotor unmanned aerial vehicle, < ->And determining the convergence speed of the attitude angle of the quadrotor unmanned aerial vehicle under the constraint of the preset performance function.

The step 2 comprises the following substeps:

step 21: discretizing the attitude tracking error model constructed in the step 1 to obtain a discretized attitude tracking error model:

；

wherein ,discretizing the first part based on the forward difference method>Step preset Performance tracking error vector +.>And presetting a first derivative of the performance tracking error vector with respect to time；

、/>Discretized according to forward difference method>Step preset Performance tracking error vector +.>And presetting a first derivative of the performance tracking error vector with respect to time；

Control input torque for discretization;

a model matrix and a control distribution matrix of the discretization model respectively;

the external bounded disturbance of the discretized four-rotor unmanned aerial vehicle;

step 22: constructing a long-term cost function of the four-rotor unmanned aerial vehicle based on the error state quantity and the control quantity in the discretized attitude tracking error model obtained in the step 21:

；

wherein ,as a positive function, reflecting whether the attitude angle of the current quadrotor unmanned aerial vehicle is out of range;

to be at the present->Based on the steps, the control performance prediction step number is carried out backwards in time;

is discretized->Step (3) a preset performance tracking error vector and a first derivative thereof;

for->Is->The function value of the power of the second order>Is a discount factor, satisfy->；

The weight matrix is a positive definite matrix, and the tracking error performance and the energy consumption of the four-rotor unmanned aerial vehicle model are balanced;

is discretized->Controlling input moment;

the initial moment is based on a forward difference discretization error model;

step 23: long term cost function according to step 22Form integration reinforcement learning->Real-time bonus function of steps->：

；

wherein ,representing the output quantity of the four-rotor unmanned aerial vehicle attitude model; />A weight matrix is positively defined; />Is a desired four-rotor unmanned aerial vehicle attitude angle signal.

The step 3 comprises the following sub-steps:

step 31: constructing an evaluation neural network for controlling behavior of the four-rotor unmanned aerial vehicle model:

；

wherein ,the method is an ideal weight matrix for evaluating the neural network;

for a desired long-term performance index function, +.>A vector that is all zeros;

to evaluate the activation function of the neural network;

to evaluate the neural network for the desired long-term performance index function +.>Is determined by the estimation error of (a);

satisfy the following requirements，/> and />，/>Are all unknown constants;

step 32: based on the estimated value of the long-term cost function of the evaluation neural network, constructing an error model of integral reinforcement learning:

；

wherein ,to evaluate the long-term cost function of a neural network>Estimated value of ∈10->Error for integral reinforcement learning;

step 33: an evaluation neural network-action neural network integral reinforcement learning control model is established based on an error model of integral reinforcement learning and a real-time rewarding function:

error model based on integral reinforcement learning, in the firstStep, establish four rotor unmanned aerial vehicle attitude angle tracking error +.>：

；

wherein , and />；/>Tracking signals for the attitude angles of the expected four-rotor unmanned aerial vehicle;

furthermore, the first step is introduced on the basis of the attitude angle tracking errorStep attitude angular rate tracking error:

；

according to the design method of the state feedback control law, an ideal controller is designed as follows:

；

wherein ,control gain for the design;

the following action neural network design is introduced:

；

wherein ,weights for ideal action neural network, +.>An activation function for the action neural network;

action neural networkThe input of (1) is defined as +.>；

Representing the weight of hidden layers in the action neural network,thus, the establishment of the evaluation neural network-action neural network integral reinforcement learning control model is completed.

In the step 4, the weight update law design process for evaluating the neural network is as follows:

for evaluating neural networksThe following approximation errors of the evaluation neural network are introduced: />；

Wherein, the weight approximation error of the neural network；/>Ideal weight for evaluating neural network>Is a function of the estimated value of (2);

the error of integral reinforcement learning is further evolved by combining the Bellman iterative equation and the empirical difference method:

；

to minimize long-term cost functionIt is mapped to the error cost function +.>In (a):

；

further, in combination with the self-adaptive gradient descent method of the discrete model, the following weight update gradient of the evaluation neural network is designed:

；

wherein ,in combination with the chain law,/->The method comprises the following steps:

；

wherein ,learning gain for evaluating the neural network weight update law;

thus, the following evaluation neural network weight update law can be obtained:

。

in the step 4, the weight update law design process of the action neural network is as follows:

aiming at an action neural network with ideal weight, the following neural network estimation strategy is designed to solve the problem that the action neural network cannot be directly applied:

；

wherein ,the output of the action neural network;

，/> and />，/>Is an unknown constant value;

further, by combining the tracking error of the attitude angular rate of the quadrotor unmanned aerial vehicle, the following steps are obtained:

；

wherein ,，/>is the desired attitude angular rate;

meanwhile, the estimation error of the action neural network weightDefined as->；

Combining external bounded interference and action neural network estimation errors received by the four-rotor unmanned aerial vehicle, and defining the attitude angle tracking error of the four-rotor unmanned aerial vehicle caused by the external bounded interference and action neural network estimation errors received by the unmanned aerial vehicle as；

Further, an estimation error of the action neural network is introducedIn combination with the aim of minimizing the output of the evaluation neural network, the following action neural network errors are designed:

；

wherein ,，/>an estimated value that is an output of the ideal evaluation neural network;

according to minimizing action neural network errorsIs subject to introduction of quadratic errors +.>In combination with the chain rule, the update gradient of the action neural network weight is designed as follows:

；

further, an action neural network weight update law can be obtained:

。

the invention has the following beneficial effects:

aiming at the strict transient response performance requirements faced by a four-rotor unmanned aerial vehicle attitude control model, the tracking error of the four-rotor unmanned aerial vehicle attitude is measured through a preset performance function, and the tracking error with performance constraint is dynamically converted into an equivalent 'state constraint' model; further, constructing a performance index function based on integral reinforcement learning, and balancing the long-term optimal performance and flexible transient response performance of the attitude control of the four-rotor unmanned aerial vehicle; and then, by designing a motion neural network aiming at minimizing the long-term performance and the transient performance cost function of the gesture, the integrated reinforcement learning preset performance four-rotor unmanned aerial vehicle gesture controller under the evaluation neural network-motion neural network self-adaptive neural network control architecture is formed.

According to the four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning, a fusion application framework for fusion preset performance control and integral reinforcement learning is established, on one hand, the transient performance, system closed loop stability and output tracking of the four-rotor unmanned aerial vehicle can be guaranteed to be improved, on the other hand, the autonomy and the adaptability to new scenes of the four-rotor unmanned aerial vehicle can be improved, and the adaptive neural network tracking control framework based on the preset performance integral reinforcement learning has a simple structure and is easy to realize.

Drawings

FIG. 1 is a frame diagram of a four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning;

FIG. 2 is a graph of circular position trajectory tracking based on preset performance integral reinforcement learning;

FIG. 3 is a graph of attitude angle preset constraint control for preset performance integral reinforcement learning;

FIG. 4 is a control output closed loop response curve for preset performance integral reinforcement learning.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Although the steps of the present invention are arranged by reference numerals, the order of the steps is not limited, and the relative order of the steps may be adjusted unless the order of the steps is explicitly stated or the execution of a step requires other steps as a basis. It is to be understood that the term "and/or" as used in this disclosure relates to and encompasses any and all possible combinations of one or more of the associated listed items.

As shown in fig. 1, the preset performance tracking control method of the four-rotor unmanned aerial vehicle based on reinforcement learning specifically comprises the following steps:

according to the structure and the flight environment of the four-rotor unmanned aerial vehicle, the characteristic of a gesture dynamics system of the four-rotor unmanned aerial vehicle is described by adopting a normal differential equation, the tracking error of the gesture of the four-rotor unmanned aerial vehicle is measured by adopting a preset performance function, the tracking error with performance constraint is dynamically converted into an equivalent 'state constraint' system, and a gesture tracking error system meeting the harsh transient response performance requirement of the four-rotor unmanned aerial vehicle is constructed;

step 1 comprises the following sub-steps:

according to the structure and flying environment of the quadrotor unmanned aerial vehicle, the gesture motion of the quadrotor unmanned aerial vehicle is represented by the gesture angle of the quadrotor unmanned aerial vehicleAttitude angular rate->Moment of inertia->And control moment->The composition, therefore, the gestural motion of a quadrotor unmanned can be described by the following nonlinear very differential model:

；

external bounded interference to the quad-rotor unmanned helicopter;

rotation matrix of attitude angle model of four-rotor unmanned aerial vehicleAnd attitude angular rate matrix->The method comprises the following steps of:

；

in addition, the attitude angle matrix of the quadrotor unmanned aerial vehicle consists of a rolling angle, a pitch angle and a yaw angle, namely,；、/>、/>respectively a roll angle, a pitch angle and a yaw angle;

likewise, the attitude angular rate of a quad-rotor unmanned helicopter is comprised of a roll angle rate, a pitch angle rate, and a yaw angle rate, specifically；/>The roll angle rate, pitch angle rate and yaw angle rate, respectively;

the control moment of the four-rotor unmanned aerial vehicle is;

in order to meet the harsh transient response performance requirement of the quadrotor unmanned aerial vehicle, a preset performance function is adopted to measure the tracking error of the posture of the quadrotor unmanned aerial vehicle, and the method specifically comprises the following steps:

；

wherein Representing one of the four-rotor unmanned aerial vehicle attitude angles; i.e. < ->，/>、/>、/>Roll angle, pitch angle and yaw angle, respectively, represent the subscript +.>Refers to one of a roll angle, a pitch angle, and a yaw angle;

presetting a performance index functionIs a function of positive definite and monotonically increasing, satisfying +.>，；/>，/>Is constant, satisfy->，/>Is a time variable;

In the present invention, the following is designedFunction:

；

wherein ,representing a hyperbolic secant function->；

，/> and />For performance parameters selected according to the transient performance of a quadrotor unmanned aerial vehicle>，/>Determining the initial boundary and the end boundary of the attitude angle operation of the quadrotor unmanned aerial vehicle, < ->Determining the convergence speed of the attitude angle of the four-rotor unmanned aerial vehicle under the constraint of a preset performance function;

because the attitude angle state of the quadrotor unmanned aerial vehicle is constrained in the step 12, the segmentation characteristic of the method enables the attitude angle of the quadrotor unmanned aerial vehicle with the constraint not to be directly applied to the control model design based on the differential equation of the attitude dynamics of the quadrotor unmanned aerial vehicle, and the invention designs the following attitude angle error variablesMeanwhile, the four-rotor unmanned aerial vehicle state constraint and control model design are considered:

；

further, based on the introduced variablesDynamically converting tracking errors with performance constraints into an equivalent 'state constraint' model, and constructing an attitude tracking error model meeting the severe transient response performance requirements of the four-rotor unmanned aerial vehicle:

；

wherein ,refers to the attitude angle tracking error variable +.>Regarding variables->Is a partial derivative of (c).

In addition, in the case of the optical fiber,，；

for convenience of explanation, the time variable in the four-rotor unmanned aerial vehicle attitude angle dynamic model isRemoving, such as: />In the present invention, the abbreviations described above are all carried out unless otherwise specified.

For four rotor unmanned aerial vehicle attitude angle vector, +.>Intermediate variable +.>Wherein the auxiliary attitude angle constraint variable +.>，/>Is->First derivative with respect to time, < >>For constraining attitude angle->Is a preset performance index function vector of +.>Is->First derivative with respect to time, < >>Is->Second derivative with respect to time,/>Rotation matrix of attitude angle system for quadrotor unmanned aerial vehicle, +.>For the first derivative of the rotation matrix of the attitude angle system of a quadrotor unmanned aerial vehicle with respect to time,is an intermediate variable introduced.

Step 2: discretizing the attitude tracking error model constructed in the step 1, constructing a long-term cost function integrating the four-rotor unmanned aerial vehicle state constraint out-of-range penalty term and controlling energy consumption based on the discretized attitude tracking error model, and forming a real-time reward function of integral reinforcement learning;

according to the attitude dynamic system model of the four-rotor unmanned aerial vehicle attitude tracking error 'state constraint' established in the step 1, based on analysis of the long-term optimal control performance of the four-rotor unmanned aerial vehicle attitude, constructing a long-term cost function integrating the four-rotor unmanned aerial vehicle state constraint out-of-range penalty term and the control energy consumption, and forming real-time rewards of integral reinforcement learning;

step 2 comprises the following sub-steps:

according to the attitude tracking error model established in the step 1, before integral reinforcement learning real-time rewarding design is carried out, in order to improve the calculation efficiency of the model, the continuous model equation established in the step 1 is discretized into the following discrete model by a forward difference method:

；

wherein ,step number of discretization model; />Discretizing the first part based on the forward difference method>Step preset Performance tracking error vector +.>And presetting the first derivative of the performance tracking error vector with respect to time +.>；

Control input torque for discretization;

is a discretized four-rotor unmanned aerial vehicle external bounded disturbance.

Step 22: based on the error state quantity in the discretized attitude tracking error model obtained in the step 21And control amount->Constructing a long-term cost function for fusing the state constraint out-of-range penalty term of the four-rotor unmanned aerial vehicle and controlling energy consumption:

according to the discretized four-rotor unmanned aerial vehicle attitude error tracking model, based on analysis of the four-rotor unmanned aerial vehicle attitude long-term optimal control performance, a long-term cost function integrating the four-rotor unmanned aerial vehicle state constraint out-of-range penalty term and the control energy consumption is constructed：

；

wherein ,as a positive function, reflecting whether the attitude angle of the current quadrotor unmanned aerial vehicle is out of range or not>To be at the present->Based on the steps, the control performance prediction step number is carried out backwards in time;

is a discount factor, satisfy->；/>For->Is->The value of the power function;

weight matrix of positive definite matrix, balance fourTracking error performance and energy consumption of the rotor unmanned aerial vehicle model;

is discretized->Controlling input moment;

the initial moment is based on a forward difference discretization error model;

step 23: based on the long-term cost function designed in step 22Integral reinforcement learning->Real-time bonus function of steps->The design is as follows:

；

Note that based on Lyapunov stability theorem, in the present invention, the desired long-term cost function is set to translate the reinforcement learning maximization rewards mechanism into an evaluation-control mechanism that aims at minimizing the long-term cost function.

establishing an evaluation neural network-action neural network control architecture of integral reinforcement learning: based on the real-time rewarding function obtained in the step 2, an evaluation neural network for integrating reinforcement learning taking account of rewarding values at all future moments is constructed, and further, a four-rotor unmanned aerial vehicle gesture tracking control strategy based on an action neural network is provided with the aim of minimizing system evaluation neural network output, so that an integration reinforcement learning control framework of the evaluation neural network-the action neural network is formed, and the transient performance and autonomous intelligence of the four-rotor unmanned aerial vehicle are improved;

step 3 comprises the following sub-steps:

based on the real-time rewarding function established in the step 2And long-term cost function->Because the state quantity of all the four-rotor unmanned aerial vehicle in the future cannot be directly obtained, the neural network is designed to estimate and predict, and the control behavior of the current four-rotor unmanned aerial vehicle model is evaluated. Further, the specific design of the evaluation neural network is as follows:

；

wherein ,for a desired long-term performance index function, +.>A vector that is all zeros;

weight matrix for ideal evaluation of neural network, < ->To evaluate the activation function of the neural network, +.>To evaluate the neural network for the desired long-term performance index function +.>Is used for the estimation error of (a). Furthermore, assume that +.>，/> and />，/>Are all unknown constants.

Note that: based on Lyapunov stability theorem, in the present invention, a long-term cost function is expectedIs set asThe reinforcement learning maximization rewards mechanism is converted into an evaluation-control mechanism that aims at minimizing the long-term cost function.

；

wherein ,to evaluate the long-term cost function of a neural network>Estimated value of ∈10->Is the error of integral reinforcement learning.

on the basis of establishing an evaluation neural network, a control strategy network of the four-rotor unmanned aerial vehicle based on reinforcement learning is further designed;

；

according to the design method of the state feedback control law, the ideal controller designed by the invention is as follows:

；

wherein ,control gain for the design;

as a result of the fact that,，/>unknown, including model subject to external bounded disturbance, model uncertainty, etc., ideal state feedback controller->Are not directly available. To solve this problem, the following action neural network design was introduced:

；

action neural networkThe input of (1) is defined as +.>；

The weight of the hidden layer in the action neural network is represented, and in the invention, the weight of the fixed hidden layer network is a fixed constant value. Thus, the design of the evaluation neural network-action neural network integral reinforcement learning control architecture aiming at minimizing the output of the model evaluation neural network is completed.

On the basis of the step 3, the factors such as uncertainty, external environment interference and the like existing in the four-rotor unmanned aerial vehicle system and evaluation neural network output containing unknown rewarding values are considered, the self-adaptive neural network is adopted to respectively approximate the evaluation neural network and the action neural network, in addition, strict theoretical analysis is provided based on the Lyapunov theory, stability of the closed-loop system and semi-global consistency and bounty of all states are proved, and the safety of the four-rotor unmanned aerial vehicle is ensured while the performance of the four-rotor unmanned aerial vehicle is improved.

Step 4 comprises the following sub-steps:

step 41: evaluating a neural network weight update law:

because the evaluation neural network and the action neural network established in the step 3 both contain ideal neural network weights, the method cannot be directly applied to an attitude angle tracking error model of the quadrotor unmanned aerial vehicle, and in addition, the calculation load problem caused by numerous neural network weight parameters is considered, the method adopts a less-parameterized neural network weight update law design method, and the specific steps are as follows:

evaluation neural network for designIntroduction ofThe approximation error of the neural network is evaluated as follows: />；

combining the Bellman iterative equation and the empirical difference method, the Bellman error equation is further developed as:

；

；/>

wherein ,learning gain for evaluating the neural network weight update law;

。

step 42: action neural network weight update law:

aiming at the action neural network with ideal weight, the invention designs the following neural network estimation strategy to solve the problem that the neural network estimation strategy cannot be directly applied:

；

wherein ,the output of the action neural network; furthermore, assume +.>，/> and />，/>Is an unknown constant value;

；

wherein ,，/>is the desired attitude angular rate; meanwhile, the estimation error of the action neural network weight is +.>Defined as->The method comprises the steps of carrying out a first treatment on the surface of the Meanwhile, the attitude angle tracking error of the four-rotor unmanned aerial vehicle caused by external bounded interference and action neural network estimation error of the four-rotor unmanned aerial vehicle is defined as；

；

according to minimizing action neural network errorsIs subject to introduction of quadratic errors +.>Binding chainThe rule, the update gradient of the action neural network weight is designed as follows:

；

further, an action neural network weight update law can be obtained:

。/>

therefore, the design of the four-rotor unmanned aerial vehicle gesture preset performance integral reinforcement learning self-adaptive neural network tracking control algorithm is realized, and the four-rotor unmanned aerial vehicle gesture can be tracked and controlled by using the integral reinforcement learning control model adopting the weight updating law. Meanwhile, lyapunov functions can be combined, and the stability of the model and the bouncy of all closed-loop model signals are proved, so that the performance of the four-rotor unmanned aerial vehicle is improved, and meanwhile, the safety of the four-rotor unmanned aerial vehicle is guaranteed.

The following simulation experiment is carried out on the preset performance integral reinforcement learning self-adaptive neural network tracking control method:

initial conditions of the four-rotor unmanned aerial vehicle attitude dynamic model are respectively as follows:，

the control input constraint of the unmanned aerial vehicle is that，/>. The track tracked is: />；

In addition, in the case of the optical fiber,，/>，/>，，/>。

and (3) constructing a four-rotor unmanned aerial vehicle model and a corresponding actuator fault model in the Matlab/Simulink by adopting Matlab/Simulink simulation, and performing simulation verification based on designing a corresponding controller.

According to the parameters designed in the method, the preset performance tracking control method of the four-rotor unmanned aerial vehicle based on reinforcement learning is simulated, so that an output tracking error curve of a four-rotor unmanned aerial vehicle motion model is obtained, as shown in fig. 2 and 3-4, the position change curve of the four-rotor unmanned aerial vehicle is shown in fig. 2, wherein the tracking effect is good, the method designed by the method is designed, the other curve is a comparison tracking curve, and the stability and the tracking performance of the four-rotor unmanned aerial vehicle are ensured; the preset performance of the attitude angle and the control moment output of the quadrotor unmanned aerial vehicle are shown in fig. 3-4, and the state safety constraint boundary of the quadrotor unmanned aerial vehicle is shown from top to bottom in fig. 3-4, and the attitude angle motion curve and the comparison group curve of the quadrotor unmanned aerial vehicle under the method can be found that the quadrotor unmanned aerial vehicle can keep good attitude angle constraint and energy consumption, and the comparison method can exceed the state constraint boundary and cause larger control quantity fluctuation, so that the energy consumption of the unmanned aerial vehicle is increased and the performance is reduced. Simulation results show that the real-time effectiveness of the design method provided by the invention expands a new visual angle for subsequent researches.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although the present disclosure describes embodiments, not every embodiment is provided with a separate embodiment, and that this description is provided for clarity only, and that the disclosure is not limited to the embodiments described in detail below, and that the embodiments described in the examples may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims

1. The four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning is characterized by comprising the following steps of:

2. The four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning according to claim 1, wherein the step 1 comprises the following substeps:

；

the control moment of the four-rotor unmanned aerial vehicle is;

external bounded interference to the quad-rotor unmanned helicopter;

；

four-rotor-wing nothing is considered for attitude angle error variableThe state constraint and control model of the man-machine is specifically as follows:

；

3. The four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning according to claim 2, wherein,，/>and->The method comprises the following steps of:

；

4. The four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning according to claim 2, wherein the four-rotor unmanned aerial vehicle preset performance tracking control method is characterized in that：

；

wherein ,representing a hyperbolic secant function->；

，/> and />For performance parameters selected according to the transient performance of a quadrotor unmanned aerial vehicle>，/>Determining the initial boundary and the end boundary of the attitude angle operation of the quadrotor unmanned aerial vehicle, < ->Determining attitude angle of quadrotor unmanned aerial vehicle under preset performance function constraintConvergence speed.

5. The four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning according to claim 1, wherein the step 2 comprises the following substeps:

；

Control input torque for discretization;

；

is discretized->Controlling input moment;

the initial moment is based on a forward difference discretization error model;

step 23: long term cost function according to step 22Form integration reinforcement learning->Real-time rewarding function of steps：

；

wherein ,representing the output quantity of the four-rotor unmanned aerial vehicle attitude model; />A weight matrix is positively defined;is a desired four-rotor unmanned aerial vehicle attitude angle signal.

6. The four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning according to claim 1, wherein the step 3 comprises the following sub-steps:

；

to evaluate the activation function of the neural network;

satisfy the following requirements，/> and />，/>Are all unknown constants;

；

error model based on integral reinforcement learning, in the firstStep, the following four-rotor unmanned aerial vehicle attitude angle tracking error is established：

；

wherein ,control gain for the design;

the following action neural network design is introduced:

；

action neural networkThe input of (1) is defined as +.>；

Representing the weight of hidden layers in the action neural network, and thus finishing the establishment of the evaluation neural network-action neural network integral reinforcement learning control model.

7. The four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning according to claim 1, wherein in the step 4, the weight update law design process for evaluating the neural network is as follows:

；

wherein ,learning gain for evaluating the neural network weight update law;

。

8. the four-rotor unmanned aerial vehicle preset performance tracking control method based on reinforcement learning according to claim 1, wherein in the step 4, the weight update law design process of the action neural network is as follows:

；

wherein ,the output of the action neural network;

，/> and />，/>Is an unknown constant value;

；

wherein ,，/>is the desired attitude angular rate;

；

further, an action neural network weight update law can be obtained:

。