CN111190429B - Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning - Google Patents

Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning

Info

Publication number
CN111190429B
CN111190429B
Authority
CN
China
Prior art keywords: unmanned aerial vehicle, fault, current, reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010030358.4A
Other languages
Chinese (zh)
Other versions
CN111190429A (en)
Inventor
任坚
刘剑慰
杨蒲
葛志文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202010030358.4A priority Critical patent/CN111190429B/en
Publication of CN111190429A publication Critical patent/CN111190429A/en
Application granted granted Critical
Publication of CN111190429B publication Critical patent/CN111190429B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/08 Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808 Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • G05D1/101 Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a reinforcement-learning-based active fault-tolerant control method for unmanned aerial vehicles. The method comprises two stages. In the early offline training stage, the evaluation network of the reinforcement-learning fault-tolerant controller is trained and updated on the historical attitude data and controller outputs collected while the unmanned aerial vehicle operates; the evaluation network is an extreme learning machine optimized by a genetic algorithm, which improves training speed and accuracy. In the system operation and online training stage, the reinforcement-learning evaluation network is updated online in real time during flight, so that the fault-tolerant controller learns and improves itself during the active fault-tolerant control of the unmanned aerial vehicle, and real-time online updating of the extreme learning machine is achieved through a dynamic capacity-expansion updating algorithm. The invention optimizes the reinforcement-learning method with an incremental strategy, asymptotically approaches the optimal fault-tolerant control strategy, and thereby achieves better fault-tolerant control of the unmanned aerial vehicle.

Description

Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning
Technical Field
The invention relates to an active fault-tolerant control method for unmanned aerial vehicles based on reinforcement learning, in particular to an active fault-tolerant control method based on an extreme learning machine and incremental-strategy reinforcement learning, and belongs to the technical field of active fault-tolerant control of unmanned aerial vehicles.
Background
With the continuous development of aerospace technology, flight control systems have grown steadily in scale and complexity. As they advance, guaranteeing system stability becomes increasingly challenging: any fault can degrade or even wreck system performance, destabilize the control system and cause significant losses. How to reduce or even eliminate the risk caused by system failure is therefore a problem worth studying, and to cope with failures of sensors, actuators and other components, many scholars at home and abroad have devoted great effort to fault diagnosis and fault-tolerant control.
Most research in recent years has focused on controller design, typically reconstructing the controller with model-based methods. As technology advances, however, flight control systems keep growing in complexity, which makes their mathematical modeling increasingly difficult. Data-based methods therefore have high engineering application value and have drawn growing attention in recent years; reinforcement learning, as a data-driven control method, is of particular research interest.
At present, reinforcement learning is applied mainly in the field of optimal control theory; research results on applying reinforcement learning algorithms to the active fault-tolerant control of unmanned aerial vehicles are still scarce.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: provide a reinforcement-learning-based active fault-tolerant control method for unmanned aerial vehicles that overcomes the shortcomings of the prior art, in which the fault-tolerant effect depends heavily on the accuracy of mathematical modeling and traditional deterministic-strategy reinforcement learning yields a poor fault-tolerant control effect, while offering strong real-time performance and adaptability.
The invention adopts the following technical scheme for solving the technical problems:
an active fault-tolerant control method of an unmanned aerial vehicle based on reinforcement learning comprises the following steps:
step 1, establishing an unmanned aerial vehicle dynamic model, and performing fault injection on an unmanned aerial vehicle to obtain an unmanned aerial vehicle aircraft fault model under the fault condition;
step 2, defining five different incremental strategies, including a non-compensation action, a positive action for compensating actuator faults, a negative action for compensating actuator faults, a positive action for compensating sensor faults and a negative action for compensating sensor faults, traversing an unmanned aerial vehicle fault model by one incremental strategy in sequence, and acquiring unmanned aerial vehicle attitude data under each incremental strategy through a sensor;
step 3, training a reinforcement learning evaluation network based on a genetic algorithm-extreme learning machine by using the attitude data of the unmanned aerial vehicle to obtain a trained reinforcement learning evaluation network;
step 4, when the unmanned aerial vehicle aircraft fault model is traversed according to the uncompensated action strategy in the step 2, the acquired unmanned aerial vehicle attitude data is used for training the state transition prediction network to obtain a trained state transition prediction network;
step 5, setting the training data set to be empty, acquiring attitude angle data S_k once in each sampling period during the operation of the unmanned aerial vehicle flight control system, respectively combining the five different incremental strategies with the attitude angle data S_k to form input data, and inputting them to the current reinforcement learning evaluation network to obtain the reward values corresponding to the different incremental strategies under the current attitude angle;
step 6, selecting the optimal incremental strategy under the current attitude angle according to the reward values corresponding to the different incremental strategies in combination with the ε-Greedy strategy, and executing it to obtain the system immediate return value Q(S_current, A_current);
Step 7, predicting the attitude angle of the next sampling period according to the current attitude angle data and the current state transition prediction network to obtain the attitude angle predicted value of the next sampling period;
step 8, repeating step 5 and step 6 on the attitude angle predicted value of the next sampling period to obtain the optimal incremental strategy corresponding to the next sampling period and the system immediate return value Q(S_next, A_next), and calculating the reward value to be updated, Q̂(S_current, A_current);
step 9, taking the current attitude angle data S_k, the optimal incremental strategy under the current attitude angle and the reward value to be updated Q̂(S_current, A_current) as a new data sample, expanding the capacity of the current training data set with it, and updating the current reinforcement learning evaluation network by using the current training data set;
step 10, repeating steps 5 to 9 for each sampling period until the flight mission is completed; a minimal sketch of this online loop is given below.
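For readers who prefer code, the following is a minimal Python sketch of the online stage (steps 5 to 10). The callables `get_attitude`, `eval_net`, `predict_next_state`, `execute_strategy` and `retrain_eval_net` are hypothetical stand-ins for the flight-control interface, the evaluation network, the state transition prediction network, strategy execution and the evaluation-network update; the ε and λ values are illustrative, not taken from the patent.

```python
import numpy as np

EPSILON = 0.1   # exploration probability of the epsilon-Greedy policy (illustrative value)
LAMBDA_ = 0.9   # discount factor lambda, 0 < lambda < 1 (illustrative value)
N_ACTIONS = 5   # the five incremental strategies defined in step 2

def select_action(q_values, epsilon=EPSILON):
    """epsilon-Greedy choice over the reward values of the five incremental strategies."""
    if np.random.rand() < epsilon:
        return int(np.random.randint(N_ACTIONS))
    return int(np.argmax(q_values))

def online_stage(get_attitude, eval_net, predict_next_state, execute_strategy,
                 retrain_eval_net, n_periods):
    """Steps 5-10 for one flight mission.  The five callables are hypothetical
    stand-ins: eval_net(s, a) -> reward value, predict_next_state(s, a) -> next
    attitude, execute_strategy(s, a) -> immediate return Q(S_current, A_current),
    retrain_eval_net(dataset) -> updated evaluation network."""
    dataset = []                                          # step 5: training set starts empty
    for _ in range(n_periods):
        s_k = get_attitude()                              # step 5: attitude angles S_k
        q_now = np.array([eval_net(s_k, a) for a in range(N_ACTIONS)])
        a_k = select_action(q_now)                        # step 6: epsilon-Greedy selection
        q_cur = execute_strategy(s_k, a_k)                # step 6: Q(S_current, A_current)
        s_next = predict_next_state(s_k, a_k)             # step 7: predicted next attitude
        q_next = max(eval_net(s_next, a) for a in range(N_ACTIONS))  # step 8: Q(S_next, A_next)
        q_hat = q_cur + LAMBDA_ * q_next                  # step 8: reward value to be updated
        dataset.append((s_k, a_k, q_hat))                 # step 9: expand the training set
        retrain_eval_net(dataset)                         # step 9: update the evaluation network
    return dataset
```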
As a preferred scheme of the present invention, the fault model of the unmanned aerial vehicle under the fault condition in step 1 is specifically:

ẋ(t) = Ax(t) + Bu(t) + φ(t-t1)f_a(t)
y(t) = Cx(t) + Du(t) + φ(t-t2)F f_s(t)

wherein x ∈ R^{4×1} is the state variable of the system, x = [θ  θ̇  φ  φ̇]^T, θ is the pitch angle variable, φ is the roll angle variable, and θ̇, φ̇ are their derivatives; u is the control input; A, B, C, D are all system matrices; y is the output of the control system; φ(t-t1)f_a(t) and φ(t-t2)F f_s(t) indicate an actuator fault and a sensor fault in the flight control system, respectively; f_a(t) is the unknown actuator fault offset value and F f_s(t) is the unknown sensor fault offset value; φ(t-t_f) is the fault-onset time function, with

φ(t-t_f) = 0 for t < t_f, and φ(t-t_f) = 1 for t ≥ t_f,

where t_f is the time at which an unknown fault occurs in the flight control system and t represents time.
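As an illustration of how such a fault model can be exercised for fault injection and data collection, the following is a minimal Euler-discretised simulation sketch; the matrices, fault offsets and onset times passed in are placeholders rather than the values used in the patent, and `u_seq` is assumed to be a sequence of control-input vectors.

```python
import numpy as np

def phi(t, t_f):
    """Fault-onset step function: 0 before the fault time t_f, 1 afterwards."""
    return 1.0 if t >= t_f else 0.0

def simulate_fault_model(A, B, C, D, u_seq, f_a, F, f_s, t1, t2, x0, dt=0.01):
    """Euler-discretised run of the faulty model
        x_dot = A x + B u + phi(t - t1) f_a(t)
        y     = C x + D u + phi(t - t2) F f_s(t)
    f_a and f_s are callables returning the (unknown, here illustrative) actuator
    and sensor fault offsets; A, B, C, D, F are placeholder system matrices."""
    x = np.asarray(x0, dtype=float)
    outputs = []
    for k, u in enumerate(u_seq):
        t = k * dt
        x = x + dt * (A @ x + B @ u + phi(t, t1) * f_a(t))   # state update with actuator fault
        y = C @ x + D @ u + phi(t, t2) * (F @ f_s(t))        # output with sensor fault
        outputs.append(y)
    return np.array(outputs)
```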
As a preferred embodiment of the present invention, the specific process of step 3 is:
step 31, sorting the unmanned aerial vehicle attitude data acquired in the step 2 according to a time sequence order to form a training sample set;
step 32, the reinforcement learning evaluation network based on the genetic algorithm-extreme learning machine contains a single hidden layer; a random parameter population for the hidden-layer parameters of the extreme learning machine is created by the genetic algorithm, the population is culled according to a fitness function, and the remaining individuals undergo the inheritance, crossover and mutation operations of the genetic algorithm; the elimination-inheritance-crossover-mutation process is repeated until the fitness function reaches its optimal value, yielding the trained reinforcement learning evaluation network.
As a preferred scheme of the invention, the reward value to be updated in step 8, Q̂(S_current, A_current), is calculated as:

Q̂(S_current, A_current) = Q(S_current, A_current) + λ·Q(S_next, A_next)

wherein Q(S_current, A_current) represents the system immediate return value obtained by executing the optimal incremental strategy at the current attitude angle S_k, λ represents a discount factor with 0 < λ < 1, and Q(S_next, A_next) represents the system immediate return value obtained by executing the optimal incremental strategy at the next attitude angle S_next.
As a preferred embodiment of the present invention, the specific updating method in step 9 is: the current reinforcement learning evaluation network is updated by the training algorithm of the genetic-algorithm-optimized extreme learning machine, which solves the Moore-Penrose generalized inverse.
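For a given set of hidden-layer parameters (chosen by the genetic algorithm), the extreme learning machine update reduces to a single linear solve. A minimal sketch, assuming a sigmoid hidden layer; `np.linalg.pinv` supplies the Moore-Penrose generalized inverse.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train_output_weights(X, T, W_in, b):
    """Extreme learning machine output-weight solve.
    X: N x d inputs, T: N x m targets, (W_in, b): hidden-layer parameters
    chosen by the genetic algorithm.  The output weights beta follow from
    H beta = T  =>  beta = pinv(H) T, with pinv the Moore-Penrose
    generalized inverse."""
    H = sigmoid(X @ W_in + b)        # N x L hidden-layer output matrix
    return np.linalg.pinv(H) @ T     # L x m output weights

def elm_predict(X, W_in, b, beta):
    """Evaluate the trained extreme learning machine on new inputs."""
    return sigmoid(X @ W_in + b) @ beta
```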
As a preferred scheme of the present invention, the current state transition prediction network described in step 7 is updated once every 10 sampling periods, and if the current sampling period is to update the state transition prediction network, the training data adopted during the update is the attitude angle data acquired in the current sampling period and the attitude angle data acquired in the first 9 sampling periods of the current sampling period.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
1. The invention uses the evaluation network of the reinforcement learning controller to extract features from the real-time data generated by the system, thereby obtaining fault information and adjusting the system controller on that basis. Compared with traditional model-based fault-tolerant control methods, this data-based active fault-tolerant control method breaks through the limitation that complex systems are difficult to model, and the controller design is simplified by extracting data features instead of using a fault detection subsystem.
2. The invention provides an incremental-strategy reinforcement learning controller for the case of uncertain faults, improving on the limitation of the deterministic fixed strategy adopted in traditional reinforcement learning algorithms, so that the optimal fault-tolerant strategy of the current faulty system is approached.
3. The invention estimates the next state through the state transition prediction network, enabling real-time update of the strategy network for a continuous control system.
4. By optimizing the reinforcement learning evaluation network with the genetic algorithm-extreme learning machine model, the capability of the reinforcement learning model to extract data features is greatly enhanced compared with traditional reinforcement learning methods.
5. The invention provides a dynamic capacity-expansion updating algorithm for the online updating of the extreme learning machine model, and realizes fast online updating of the reinforcement learning evaluation network by exploiting the speed of extreme learning machine retraining.
Drawings
FIG. 1 is a flow chart of a control method of the present invention.
FIG. 2 is a block diagram of an active fault-tolerant controller for reinforcement learning according to the present invention.
FIG. 3 is a flow chart of the reinforcement learning evaluation network training process of the present invention.
FIG. 4 is a schematic diagram of a dynamic capacity-expansion updating algorithm of the extreme learning network model according to the present invention.
FIG. 5 illustrates the effect of fault-tolerant control of an active fault-tolerant controller in the event of actuator failure, in accordance with an embodiment of the present invention.
FIG. 6 illustrates the effect of fault-tolerant control of an active fault-tolerant controller in the event of a sensor failure in accordance with an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1 and fig. 2, the present invention provides an active fault-tolerant control method for an unmanned aerial vehicle based on reinforcement learning, which includes the steps of:
step S1, early off-line training stage: an unmanned aerial vehicle dynamics model is established, and training and updating are carried out on an evaluation network of a fault-tolerant controller for reinforcement learning by collecting historical postures generated when an unmanned aerial vehicle runs and data output by a controller.
Step S2, system operation and online training stage: during the operation of the unmanned aerial vehicle, the reinforcement learning evaluation network is updated online in real time; through this online updating, the reinforcement learning fault-tolerant controller learns and improves itself during the active fault-tolerant control of the unmanned aerial vehicle, and real-time online updating of the extreme learning machine is achieved through a dynamic capacity-expansion updating algorithm. The invention optimizes the reinforcement learning method with an incremental strategy, asymptotically approaches the optimal fault-tolerant control strategy, and thus achieves better fault-tolerant control of the unmanned aerial vehicle.
The specific implementation steps of the early off-line training stage of step S1 are as follows:
s11, establishing a dynamic model of the unmanned aerial vehicle; considering that the drone flies at high altitude at constant speed, the dynamical model is described using a simplified three-degree-of-freedom model. The embodiment of the invention adopts an aircraft fault diagnosis experimental platform of ' advanced aircraft navigation, control and health management ' Ministry of industry and communications ' key laboratory of Nanjing aerospace university, and the fault model of the unmanned aerial vehicle under the established fault condition is as follows:
ẋ(t) = Ax(t) + Bu(t) + φ(t-t1)f_a(t)
y(t) = Cx(t) + Du(t) + φ(t-t2)F f_s(t)

wherein x ∈ R^{4×1} is the state of the system, x = [θ  θ̇  φ  φ̇]^T, θ is the pitch angle variable and φ is the roll angle variable; u = [u1 u2 u3 u4]^T is the control input; A ∈ R^{4×4}, B ∈ R^{4×1}, C ∈ R^{1×4}, D ∈ R^{1×1} are the system matrices; y ∈ R is the output of the control system; φ(t-t1)f_a(t) and φ(t-t2)F f_s(t) indicate actuator and sensor faults, respectively, in the flight control system, where f_a(t) is the unknown actuator fault offset value and F f_s(t) is the bias value of the unknown sensor fault, with F ∈ R^{1×4} and f_s(t) ∈ R^{4×1}; φ(t-t_f) is the definition function of the fault occurrence time, defined as follows:

φ(t-t_f) = 0 for t < t_f, and φ(t-t_f) = 1 for t ≥ t_f,

where t_f is the time at which an unknown fault occurs in the flight control system; in the built model the function φ(t-t_f) represents an abrupt fault of the system (the fault occurs after time t_f). The output matrices of the system are C = [0 1 0 0] and D = 0.
and S12, acquiring operation data in the unmanned aerial vehicle control system through the established mathematical model, acquiring the data through a sensor when the unmanned aerial vehicle operates normally and under the condition of failure through fault injection, wherein specific data labels are attitude Euler angle data, serial numbers of fault-tolerant strategies and actual output of the control system, and the acquired data are used as training data of an evaluation network. The selected data are variables which are selected from the control system and can reflect the running state of the control system, the variables can reflect the current running state of the system, and the fault-tolerant controller extracts useful characteristics through the system state and uses the useful characteristics as an important basis for decision making of the fault-tolerant controller. Defining the fault-tolerant control system state as the attitude angle of the flight control system:
the state S, where S is the label attribute of the data set, n is the serial number of the strategy action, and the acquired data are used as the training data of the evaluation network.
Step S13, optimizing the Q-learning reinforcement learning algorithm with the extreme learning machine method, where the evaluation network of the reinforcement learning fault-tolerant controller is a three-layer extreme learning network with a single hidden layer; the specific structure is shown in fig. 3.
And step S14, performing off-line training and updating on the constructed extreme learning machine network according to the collected operation data, and optimizing the extreme learning machine network through a genetic algorithm. The process is as follows:
and step S141, forming a training data sample set by the acquired training data samples according to a time sequence.
S142, establishing a random parameter population of the hidden-layer parameters of the extreme learning machine through the genetic algorithm, and optimizing the population under the fitness function f_fitness through the inheritance, crossover and mutation of the genetic algorithm. The fitness function f_fitness is expressed as follows:
f_fitness = 1 / Σ_i (y_i - y_i')²

in the formula, y_i indicates the expected output value of the i-th sample and y_i' represents the actual output value after the i-th sample is input into the model. After a certain number of iterations, once the fitness function reaches the optimal value and no longer changes, the evaluation network model with the highest accuracy is obtained by training.
In the early offline training stage, the training data are constructed through step S141. For the extreme learning machine algorithm, the output-layer weights are updated by solving a linear equation with the Moore-Penrose generalized inverse. For the training process of the genetic-algorithm-optimized extreme learning machine model, a population of hidden-layer random parameter samples of a certain size is first randomly initialized; all individuals in the population are then trained, and the error of each individual serves as its genetic-algorithm fitness; individuals are eliminated according to their fitness, and crossover and mutation operations are applied to the remaining individuals; after crossover and mutation, training continues on the next generation of samples, and the iteration proceeds in this way. The specific training process is shown in fig. 3.
A historical experience quadruple (S_k, A_k, R_k, S_{k+1}) is defined, where S_k is the current state value of the unmanned aerial vehicle flight control system, A_k is the fault-tolerant strategy action made by the control system in the current state, R_k is the reward value obtained by taking action A_k in the current state, and S_{k+1} is the next state value reached by the control system after taking action A_k in the current state. In the training update of the reinforcement learning evaluation network, S_{k+1} must be obtained from S_k and A_k so as to update the Q function Q(S_k, A_k); the invention realizes the prediction of S_{k+1} through a state transition prediction network.
The training process of the extreme learning machine is iteratively optimized by the genetic algorithm; the initial population size of the genetic algorithm is 2000 and the number of iterations is 200; the fitness function is the reciprocal of the sum of squared model training errors, and since the algorithm maximizes the fitness function, the error is minimized; the number of hidden-layer nodes of the extreme learning machine network is 128.
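A compact sketch of this genetic-algorithm search over the hidden-layer parameters is given below; it reuses `elm_train_output_weights` and `elm_predict` from the earlier extreme learning machine sketch, uses the reciprocal of the sum of squared training errors as the fitness, and scales the population size and iteration count down from the 2000 and 200 of the embodiment so it stays cheap to run.

```python
import numpy as np

def fitness(X, T, W_in, b):
    """Reciprocal of the sum of squared training errors of the resulting ELM."""
    beta = elm_train_output_weights(X, T, W_in, b)     # from the earlier ELM sketch
    err = elm_predict(X, W_in, b, beta) - T
    return 1.0 / (np.sum(err ** 2) + 1e-12)

def ga_optimize_elm(X, T, n_hidden=128, pop_size=50, n_iter=40,
                    keep_frac=0.5, mut_scale=0.1, seed=0):
    """Evolve the hidden-layer parameters (W_in, b) by elimination, crossover
    and mutation, returning the fittest individual and its output weights."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    pop = [(rng.standard_normal((n_in, n_hidden)), rng.standard_normal(n_hidden))
           for _ in range(pop_size)]
    for _ in range(n_iter):
        pop.sort(key=lambda p: fitness(X, T, *p), reverse=True)
        survivors = pop[:int(keep_frac * pop_size)]            # elimination step
        children = []
        while len(survivors) + len(children) < pop_size:
            i, j = rng.choice(len(survivors), size=2, replace=False)
            (W1, b1), (W2, b2) = survivors[i], survivors[j]
            mask_W = rng.random(W1.shape) < 0.5                # uniform crossover
            mask_b = rng.random(b1.shape) < 0.5
            W_c = np.where(mask_W, W1, W2) + mut_scale * rng.standard_normal(W1.shape)
            b_c = np.where(mask_b, b1, b2) + mut_scale * rng.standard_normal(b1.shape)
            children.append((W_c, b_c))                        # mutation added above
        pop = survivors + children
    best_W, best_b = max(pop, key=lambda p: fitness(X, T, *p))
    return best_W, best_b, elm_train_output_weights(X, T, best_W, best_b)
```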
The specific implementation steps of the system operation and on-line training stage of the step S2 are as follows:
s21, sorting the data of the unmanned aerial vehicle according to time sequence and inputting the data into SkThe output is Sk+1The training data sample set is formed according to the time progressive sequence of the training samples.
Step S22, a BP neural network is trained on the training data sample set obtained in step S21.
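A minimal sketch of such a state transition prediction network, written as a small back-propagation (BP) network trained with plain batch gradient descent; the layer sizes, learning rate and epoch count are illustrative assumptions rather than values from the patent.

```python
import numpy as np

class StateTransitionNet:
    """Tiny one-hidden-layer BP network: input [S_k, action index],
    output predicted attitude angles S_{k+1}."""
    def __init__(self, n_in, n_out, n_hidden=32, lr=0.01, seed=1):
        rng = np.random.default_rng(seed)
        self.W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = 0.1 * rng.standard_normal((n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def predict(self, X):
        self._h = np.tanh(X @ self.W1 + self.b1)
        return self._h @ self.W2 + self.b2

    def fit(self, X, Y, epochs=200):
        """Plain batch gradient descent on the mean squared error."""
        for _ in range(epochs):
            err = self.predict(X) - Y                          # output-layer error
            grad_W2 = self._h.T @ err / len(X)
            grad_b2 = err.mean(axis=0)
            grad_h = (err @ self.W2.T) * (1.0 - self._h ** 2)  # back-prop through tanh
            grad_W1 = X.T @ grad_h / len(X)
            grad_b1 = grad_h.mean(axis=0)
            self.W2 -= self.lr * grad_W2
            self.b2 -= self.lr * grad_b2
            self.W1 -= self.lr * grad_W1
            self.b1 -= self.lr * grad_b1
```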
After the state transition prediction network has been trained and while the unmanned aerial vehicle flight control system is running, fault-tolerant tracking control of the control system is carried out by the incremental-strategy-based reinforcement learning method. In this process the evaluation network is updated online from the decision and the immediate reward value of each step; the evaluation criterion of the immediate reward value is the absolute value of the error between the expected output and the actual output of the control system. A reward function J(S_t) is defined, whose specific functional form is as follows:

J(S_t) = Σ_j γ^j · U(S_{t-j}, A_{t-j})

wherein γ is a discount factor satisfying 0 < γ ≤ 1, and U(S_t, A_t) is the utility function of the reinforcement learning algorithm, of the specific form:

U(S_t, A_t) = Q(S_t, A_t)

and the Q(S_t, A_t) function takes the mathematical form:

Q(S_t, A_t) = |y(t, A_t) - y_d(t)|

where t is the system running time, y(t, A_t) is the actual output of the control system after the system makes decision A_t at the current time, and y_d(t) is the desired control system output at the current time.
Step S23, whether the current system has failed is determined from the attitude angle, actuator current and voltage and other data acquired by the sensors during the operation of the unmanned aerial vehicle control system; if the current system has failed, the reward values that the evaluation network assigns to the actions of the policy action set in the current state change accordingly. The mathematical expression of the policy action set is as follows:

Ω = {Λ1, Λ2, Λ3, Λ4, Λ5}

wherein Λ_a is the a-th selectable configuration in the system, a = 1, 2, 3, 4, 5. In the specific application embodiment, an incremental strategy is adopted to asymptotically approach the optimal fault-tolerant control strategy: the strategy made by the fault-tolerant controller at each moment is superposed onto the current strategy signal. For the application embodiment of the invention, the following five incremental strategies are defined (see the sketch after this list):
1. Action taken when the system is normal: Λ1 = [0 0 0 0]
2. Positive action compensating an actuator fault: Λ2 = +[0 0.0002 0 0]
3. Negative action compensating an actuator fault: Λ3 = -[0 0.0002 0 0]
4. Positive action compensating a sensor fault: Λ4 = +[0 0 0.0002 0]
5. Negative action compensating a sensor fault: Λ5 = -[0 0 0.0002 0]
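A minimal sketch of this action set and of superposing the chosen increment onto the current control signal; the four-element control-signal layout simply mirrors the vectors above.

```python
import numpy as np

# The five incremental strategy actions Lambda_1 ... Lambda_5 listed above.
INCREMENTAL_ACTIONS = np.array([
    [0.0,  0.0,     0.0,     0.0],   # 1: no compensation (normal operation)
    [0.0,  0.0002,  0.0,     0.0],   # 2: positive compensation of an actuator fault
    [0.0, -0.0002,  0.0,     0.0],   # 3: negative compensation of an actuator fault
    [0.0,  0.0,     0.0002,  0.0],   # 4: positive compensation of a sensor fault
    [0.0,  0.0,    -0.0002,  0.0],   # 5: negative compensation of a sensor fault
])

def superpose_strategy(control_signal, action_index):
    """Superpose the chosen incremental action onto the current control signal."""
    return np.asarray(control_signal, dtype=float) + INCREMENTAL_ACTIONS[action_index]
```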
Step S24, the evaluation network takes the current operating state value S_k of the control system and each incremental strategy action in the policy action set as model inputs; the model outputs are combined with the ε-Greedy strategy to select an action, and the incremental strategy action decided by the evaluation network is then superposed onto the existing action and applied to the current control signal to achieve fault tolerance.
Online updating of the extreme learning machine network is realized through a dynamic capacity-expansion updating algorithm. The method does not need to propagate the error of the current sample with a gradient-descent-like algorithm; fast online updating is achieved by directly expanding the training data and exploiting the speed of the extreme learning machine updating algorithm. The specific steps are shown in fig. 4.
For the reinforcement learning updating process, the Q-learning evaluation network is first initialized and the parameters of the neural network are initialized randomly; the inputs of the neural network are the state of the system and the serial number of the action currently taken, and the output is the reward value U(S_current, A_current) obtained by taking that action in the current state.
Next, the current state S_t is collected: with probability ε an action is chosen at random from the whole action set, and with probability (1 - ε) the action A_t = argmax Q(S_t) that maximizes the reward value (which in this context is the error between the actual output and the desired output of the system) is chosen; the current state S_t, the action A_t and the reward value U(S_current, A_current) are recorded.
Step S25, after the decision module of the reinforcement learning active fault-tolerant controller has given the fault-tolerant strategy, the reward value function of the current state and the chosen strategy is computed: the current immediate return value Q(S_current, A_current) and the discounted value λ·Q(S_next, A_next) are summed to obtain the accumulated discounted return value. The mathematical expression of the updated reward value is as follows:

Q̂(S_current, A_current) = Q(S_current, A_current) + λ·Q(S_next, A_next)

where Q(S_next, A_next) is the return value output by the current evaluation network for the predicted next state.
Step S26, the current state value, the policy action taken and the updated reward value Q̂(S_current, A_current) obtained in step S25 are combined into a new data sample and appended to the existing training data set.
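A minimal sketch of this dynamic capacity-expansion update, assuming the same sigmoid-hidden-layer extreme learning machine as in the earlier sketches: the new sample is appended and the output weights are re-solved in one Moore-Penrose pseudoinverse step, with no gradient pass over the old samples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def capacity_expansion_update(train_X, train_T, new_x, new_t, W_in, b):
    """Dynamic capacity-expansion update of the evaluation network: append the
    new sample and re-solve the ELM output weights in one pseudoinverse step."""
    train_X = np.vstack([train_X, np.atleast_2d(new_x)])   # expand the training set
    train_T = np.vstack([train_T, np.atleast_2d(new_t)])
    H = sigmoid(train_X @ W_in + b)                         # hidden-layer outputs
    beta = np.linalg.pinv(H) @ train_T                      # updated output weights
    return train_X, train_T, beta
```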
Step S27, the latest training model is obtained through the training algorithm of the genetic-algorithm-optimized extreme learning machine, which solves the Moore-Penrose generalized inverse.
And step S28, repeating the above process for each sampling period until the flight mission is completed.
The state transition prediction network is updated at intervals from the historical operation data of the control system, and predicts the next state value from the current state and action values. To reduce the processor load and keep the system fast, the state transition prediction network is updated once every 10 sampling periods, which does not affect the accuracy of the fault-tolerant controller's judgment.
To verify the fault-tolerant control effect, verification experiments were carried out on the aircraft fault diagnosis experimental platform of the Key Laboratory of Advanced Aircraft Navigation, Control and Health Management (Ministry of Industry and Information Technology) at Nanjing University of Aeronautics and Astronautics. When an actuator fault is injected into the experimental platform, under the fault-tolerant control of the active fault-tolerant controller based on the extreme learning machine and incremental-strategy reinforcement learning, the system attitude first deviates and then continues to track the expected signal; the output residual of the unmanned aerial vehicle is shown in fig. 5. When a sensor fault is injected into the experimental platform, the output residual of the unmanned aerial vehicle is shown in fig. 6.
According to the simulation results, when actuator or sensor faults occur during the flight of the unmanned aerial vehicle, the active fault-tolerant control method based on the extreme learning machine and incremental-strategy reinforcement learning achieves a good fault-tolerant effect without relying on a system model during operation, and realizes online self-learning and updating. The method has important reference value for the fault-tolerant control of unmanned aerial vehicles with faults.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (4)

1. An active fault-tolerant control method of an unmanned aerial vehicle based on reinforcement learning is characterized by comprising the following steps:
step 1, establishing an unmanned aerial vehicle dynamic model, and performing fault injection on an unmanned aerial vehicle to obtain an unmanned aerial vehicle aircraft fault model under the fault condition;
the unmanned aerial vehicle aircraft fault model under the fault condition is specifically:

ẋ(t) = Ax(t) + Bu(t) + φ(t-t1)f_a(t)
y(t) = Cx(t) + Du(t) + φ(t-t2)F f_s(t)

wherein x ∈ R^{4×1} is the state variable of the system, x = [θ  θ̇  φ  φ̇]^T, θ is the pitch angle variable, φ is the roll angle variable, and θ̇, φ̇ are their derivatives; u is the control input; A, B, C, D are all system matrices; y is the output of the control system; φ(t-t1)f_a(t) and φ(t-t2)F f_s(t) indicate an actuator fault and a sensor fault in the flight control system, respectively; f_a(t) is the unknown actuator fault offset value and F f_s(t) is the unknown sensor fault offset value; φ(t-t_f) is the fault-onset time function, with

φ(t-t_f) = 0 for t < t_f, and φ(t-t_f) = 1 for t ≥ t_f,

t_f is the time at which an unknown fault occurs in the flight control system, and t represents time;
step 2, defining five different incremental strategies, including a non-compensation action, a positive action for compensating actuator faults, a negative action for compensating actuator faults, a positive action for compensating sensor faults and a negative action for compensating sensor faults, traversing an unmanned aerial vehicle fault model by one incremental strategy in sequence, and acquiring unmanned aerial vehicle attitude data under each incremental strategy through a sensor;
step 3, training a reinforcement learning evaluation network based on a genetic algorithm-extreme learning machine by using the attitude data of the unmanned aerial vehicle to obtain a trained reinforcement learning evaluation network;
step 4, when the unmanned aerial vehicle aircraft fault model is traversed according to the uncompensated action strategy in the step 2, the acquired unmanned aerial vehicle attitude data is used for training the state transition prediction network to obtain a trained state transition prediction network;
step 5, setting the training data set to be empty, acquiring attitude angle data S_k once in each sampling period during the operation of the unmanned aerial vehicle flight control system, respectively combining the five different incremental strategies with the attitude angle data S_k to form input data, and inputting them to the current reinforcement learning evaluation network to obtain the reward values corresponding to the different incremental strategies under the current attitude angle;
step 6, selecting the optimal incremental strategy under the current attitude angle according to the reward values corresponding to the different incremental strategies in combination with the ε-Greedy strategy, and executing it to obtain the system immediate return value Q(S_current, A_current);
Step 7, predicting the attitude angle of the next sampling period according to the current attitude angle data and the current state transition prediction network to obtain the attitude angle predicted value of the next sampling period;
step 8, repeating step 5 and step 6 on the attitude angle predicted value of the next sampling period to obtain the optimal incremental strategy corresponding to the next sampling period and the system immediate return value Q(S_next, A_next), and calculating the reward value to be updated Q̂(S_current, A_current), wherein the calculation formula of the reward value to be updated is:

Q̂(S_current, A_current) = Q(S_current, A_current) + λ·Q(S_next, A_next)

wherein Q(S_current, A_current) represents the system immediate return value obtained by executing the optimal incremental strategy at the current attitude angle S_k, λ represents a discount factor, 0 < λ < 1, and Q(S_next, A_next) represents the system immediate return value obtained by executing the optimal incremental strategy at the next attitude angle S_next;
step 9, taking the current attitude angle data S_k, the optimal incremental strategy under the current attitude angle and the reward value to be updated Q̂(S_current, A_current) as a new data sample, expanding the capacity of the current training data set with it, and updating the current reinforcement learning evaluation network by using the current training data set;
and step 10, repeating the steps 5-9 for each sampling period until the flight mission is completed.
2. The active fault-tolerant control method for the unmanned aerial vehicle based on reinforcement learning of claim 1, wherein the specific process of the step 3 is as follows:
step 31, sorting the unmanned aerial vehicle attitude data acquired in the step 2 according to a time sequence order to form a training sample set;
and step 32, the reinforcement learning evaluation network based on the genetic algorithm-extreme learning machine comprises a single hidden layer, a random parameter population of parameters of the hidden layer of the extreme learning machine is created through the genetic algorithm, the random parameter population is eliminated through a fitness function, the rest random parameter population is subjected to inheritance, crossing and mutation operations of the genetic algorithm, the elimination-inheritance-crossing-mutation process is repeated until the fitness function reaches an optimal value, and the trained reinforcement learning evaluation network is obtained.
3. The active fault-tolerant control method for the unmanned aerial vehicle based on reinforcement learning of claim 1, wherein the specific updating method in step 9 is as follows: and (3) solving a training algorithm of Moore-Penrose generalized inverse through a genetic algorithm optimization extreme learning machine, and updating the current reinforcement learning evaluation network.
4. The active fault-tolerant control method for unmanned aerial vehicles based on reinforcement learning of claim 1, wherein the current state transition prediction network is updated once every 10 sampling periods in step 7, and if the current sampling period is to update the state transition prediction network, the training data adopted during the update is the attitude angle data acquired in the current sampling period and the attitude angle data acquired in the first 9 sampling periods of the current sampling period.
CN202010030358.4A 2020-01-13 2020-01-13 Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning Active CN111190429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010030358.4A CN111190429B (en) 2020-01-13 2020-01-13 Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010030358.4A CN111190429B (en) 2020-01-13 2020-01-13 Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN111190429A CN111190429A (en) 2020-05-22
CN111190429B true CN111190429B (en) 2022-03-18

Family

ID=70708146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010030358.4A Active CN111190429B (en) 2020-01-13 2020-01-13 Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN111190429B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111679579B (en) * 2020-06-10 2021-10-12 南京航空航天大学 Sliding mode prediction fault-tolerant control method for fault system of sensor and actuator
CN111783250A (en) * 2020-07-03 2020-10-16 上海航天控制技术研究所 Flexible robot end arrival control method, electronic device, and storage medium
CN112180960B (en) * 2020-09-29 2021-09-14 西北工业大学 Unmanned aerial vehicle fault-tolerant flight method and flight system for actuator faults
CN113467248A (en) * 2021-07-22 2021-10-01 南京大学 Fault-tolerant control method for unmanned aerial vehicle sensor during fault based on reinforcement learning
CN114153640A (en) * 2021-11-26 2022-03-08 哈尔滨工程大学 System fault-tolerant strategy method based on deep reinforcement learning

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104914851A (en) * 2015-05-21 2015-09-16 北京航空航天大学 Adaptive fault detection method for airplane rotation actuator driving device based on deep learning
CN105915294A (en) * 2016-06-20 2016-08-31 中国人民解放军军械工程学院 Unmanned aerial vehicle airborne transmitter fault forecasting method and system
CN107315892A (en) * 2017-08-10 2017-11-03 北京交通大学 A kind of Method for Bearing Fault Diagnosis based on extreme learning machine
CN107316046A (en) * 2017-03-09 2017-11-03 河北工业大学 A kind of method for diagnosing faults that Dynamic adaptiveenhancement is compensated based on increment
CN108256173A (en) * 2017-12-27 2018-07-06 南京航空航天大学 A kind of Gas path fault diagnosis method and system of aero-engine dynamic process
CN109001982A (en) * 2018-10-19 2018-12-14 西安交通大学 A kind of nonlinear system adaptive neural network fault tolerant control method
CN109408552A (en) * 2018-08-08 2019-03-01 南京航空航天大学 The monitoring of the civil aircraft system failure and recognition methods based on LSTM-AE deep learning frame
CN109799802A (en) * 2018-12-06 2019-05-24 郑州大学 Sensor fault diagnosis and fault tolerant control method in a kind of control of molecular weight distribution
KR20190064111A (en) * 2017-11-30 2019-06-10 한국에너지기술연구원 Energy management system and energy management method including fault tolerant function
CN110244689A (en) * 2019-06-11 2019-09-17 哈尔滨工程大学 A kind of AUV adaptive failure diagnostic method based on identification feature learning method
CN110413000A (en) * 2019-05-28 2019-11-05 北京航空航天大学 A kind of hypersonic aircraft based on deep learning reenters prediction and corrects fault-tolerant method of guidance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10628491B2 (en) * 2016-11-09 2020-04-21 Cognitive Scale, Inc. Cognitive session graphs including blockchains

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104914851A (en) * 2015-05-21 2015-09-16 北京航空航天大学 Adaptive fault detection method for airplane rotation actuator driving device based on deep learning
CN105915294A (en) * 2016-06-20 2016-08-31 中国人民解放军军械工程学院 Unmanned aerial vehicle airborne transmitter fault forecasting method and system
CN107316046A (en) * 2017-03-09 2017-11-03 河北工业大学 A kind of method for diagnosing faults that Dynamic adaptiveenhancement is compensated based on increment
CN107315892A (en) * 2017-08-10 2017-11-03 北京交通大学 A kind of Method for Bearing Fault Diagnosis based on extreme learning machine
KR20190064111A (en) * 2017-11-30 2019-06-10 한국에너지기술연구원 Energy management system and energy management method including fault tolerant function
CN108256173A (en) * 2017-12-27 2018-07-06 南京航空航天大学 A kind of Gas path fault diagnosis method and system of aero-engine dynamic process
CN109408552A (en) * 2018-08-08 2019-03-01 南京航空航天大学 The monitoring of the civil aircraft system failure and recognition methods based on LSTM-AE deep learning frame
CN109001982A (en) * 2018-10-19 2018-12-14 西安交通大学 A kind of nonlinear system adaptive neural network fault tolerant control method
CN109799802A (en) * 2018-12-06 2019-05-24 郑州大学 Sensor fault diagnosis and fault tolerant control method in a kind of control of molecular weight distribution
CN110413000A (en) * 2019-05-28 2019-11-05 北京航空航天大学 A kind of hypersonic aircraft based on deep learning reenters prediction and corrects fault-tolerant method of guidance
CN110244689A (en) * 2019-06-11 2019-09-17 哈尔滨工程大学 A kind of AUV adaptive failure diagnostic method based on identification feature learning method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A New Method for Fault Tolerant Control through Q-Learning; Changsheng Hua et al.; IFAC-PapersOnLine; 2018-12-31; Vol. 51, No. 24; pp. 38-45 *
Active fault-tolerant control of a quadrotor UAV based on gain-scheduled PID; 蒋银行 et al.; Journal of Shandong University of Science and Technology (Natural Science Edition); 2017-04-30; Vol. 36, No. 4; pp. 31-37 *
Fault diagnosis of WSN nodes based on reinforcement learning and ant colony algorithm; 常峰 et al.; Computer Measurement & Control; 2015-03-31; Vol. 23, No. 3; pp. 755-758 *

Also Published As

Publication number Publication date
CN111190429A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN111190429B (en) Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning
CN110222371B (en) Bayes and neural network-based engine residual life online prediction method
CN111241952A (en) Reinforced learning reward self-learning method in discrete manufacturing scene
Wang et al. Neural-network-based fault-tolerant control of unknown nonlinear systems
Cen et al. A gray-box neural network-based model identification and fault estimation scheme for nonlinear dynamic systems
CN112439794B (en) Hot rolling bending force prediction method based on LSTM
CN112947385B (en) Aircraft fault diagnosis method and system based on improved Transformer model
Xie et al. A novel deep belief network and extreme learning machine based performance degradation prediction method for proton exchange membrane fuel cell
Ma et al. Deep auto-encoder observer multiple-model fast aircraft actuator fault diagnosis algorithm
Nasser et al. A hybrid of convolutional neural network and long short-term memory network approach to predictive maintenance
CN114692310A (en) Virtual-real integration-two-stage separation model parameter optimization method based on Dueling DQN
Yin et al. Dynamic behavioral assessment model based on Hebb learning rule
CN112146879A (en) Rolling bearing fault intelligent diagnosis method and system
Wu et al. Ensemble recurrent neural network-based residual useful life prognostics of aircraft engines
CN116432359A (en) Variable topology network tide calculation method based on meta transfer learning
Precup et al. A survey on fuzzy control for mechatronics applications
CN114880767B (en) Aero-engine residual service life prediction method based on attention mechanism Dense-GRU network
Long et al. A data fusion fault diagnosis method based on LSTM and DWT for satellite reaction flywheel
CN115972211A (en) Control strategy offline training method based on model uncertainty and behavior prior
Tirovolas et al. Introducing fuzzy cognitive map for predicting engine’s health status
Ji et al. Data preprocessing method and fault diagnosis based on evaluation function of information contribution degree
Liu et al. Aero-Engines Remaining Useful Life Prognostics Based on Multi-Hierarchical Gated Recurrent Graph Convolutional Network
Zhou et al. Bearing life prediction method based on parallel multichannel recurrent convolutional neural network
CN113821012B (en) Fault diagnosis method for variable-working-condition satellite attitude control system
Mao et al. Fault Diagnosis for Underactuated Surface Vessel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant