CN111190429A - Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning - Google Patents
Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning
- Publication number
- CN111190429A (application CN202010030358.4A)
- Authority
- CN
- China
- Prior art keywords
- unmanned aerial
- aerial vehicle
- fault
- current
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000002787 reinforcement Effects 0.000 title claims abstract description 58
- 238000000034 method Methods 0.000 title claims abstract description 54
- 238000012549 training Methods 0.000 claims abstract description 53
- 238000011156 evaluation Methods 0.000 claims abstract description 35
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 31
- 230000008569 process Effects 0.000 claims abstract description 21
- 230000002068 genetic effect Effects 0.000 claims abstract description 20
- 230000009471 action Effects 0.000 claims description 31
- 238000005070 sampling Methods 0.000 claims description 24
- 230000006870 function Effects 0.000 claims description 21
- 230000007704 transition Effects 0.000 claims description 15
- 238000005457 optimization Methods 0.000 claims description 5
- 230000008092 positive effect Effects 0.000 claims description 5
- 230000035772 mutation Effects 0.000 claims description 4
- 238000002347 injection Methods 0.000 claims description 3
- 239000007924 injection Substances 0.000 claims description 3
- 238000004364 calculation method Methods 0.000 claims description 2
- 238000013459 approach Methods 0.000 abstract description 3
- 238000011217 control strategy Methods 0.000 abstract description 3
- 230000036544 posture Effects 0.000 abstract description 2
- 230000000694 effects Effects 0.000 description 7
- 238000011160 research Methods 0.000 description 5
- 238000013528 artificial neural network Methods 0.000 description 3
- 238000003745 diagnosis Methods 0.000 description 3
- 230000008859 change Effects 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 230000036541 health Effects 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/08—Control of attitude, i.e. control of roll, pitch, or yaw
- G05D1/0808—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Feedback Control In General (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention discloses an unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning, which comprises two stages. Early off-line training stage: the evaluation network of the reinforcement learning fault-tolerant controller is trained and updated using historical attitude data generated during unmanned aerial vehicle operation together with the controller output data; the evaluation network is an extreme learning machine optimized by a genetic algorithm, which improves the training speed and accuracy. System operation and on-line training stage: during operation of the unmanned aerial vehicle, the reinforcement learning evaluation network is updated on line in real time, so that the reinforcement learning fault-tolerant controller self-learns and self-improves during active fault-tolerant control, and real-time on-line updating of the extreme learning machine is achieved through a dynamic capacity-expansion updating algorithm. The invention optimizes the reinforcement learning method with an incremental strategy, achieves asymptotic approach to the optimal fault-tolerant control strategy, and thus better realizes fault-tolerant control of the unmanned aerial vehicle.
Description
Technical Field
The invention relates to an unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning, in particular to an unmanned aerial vehicle active fault-tolerant control method based on extreme learning machine and incremental strategy reinforcement learning, and belongs to the technical field of unmanned aerial vehicle active fault-tolerant control.
Background
With the continuous development of aerospace technology, flight control systems have grown ever larger and more complex. As flight control systems advance, guaranteeing system stability also becomes a significant challenge: any type of fault can degrade or even break down system performance, destabilizing the control system and causing major losses. How to reduce or even eliminate the risk caused by system faults is therefore a problem worth studying, and to cope with failures of sensors, actuators and other components, many scholars at home and abroad have devoted great effort to the research directions of fault diagnosis and fault-tolerant control.
Most research work in recent years has focused on the design of the system controller, which is usually reconfigured using model-based methods. Advances in science and technology have made flight control systems increasingly complex, which poses great challenges to their mathematical modeling. Because data-based methods have high engineering application value, they have attracted more and more attention from industry in recent years, and reinforcement learning, as a data-based control method, has high research value.
At present, reinforcement learning is mainly applied in the field of optimal control theory, and there are still few research results applying reinforcement learning algorithms to active fault-tolerant control of unmanned aerial vehicles.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: the unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning overcomes problems of the prior art, such as the strong dependence of the fault-tolerant effect on the accuracy of the mathematical model and the poor fault-tolerant performance of traditional deterministic-strategy reinforcement learning, and offers high real-time performance and adaptability.
The invention adopts the following technical scheme for solving the technical problems:
an active fault-tolerant control method of an unmanned aerial vehicle based on reinforcement learning comprises the following steps:
step 1, establishing an unmanned aerial vehicle dynamic model, and performing fault injection on the unmanned aerial vehicle to obtain an unmanned aerial vehicle aircraft fault model under the fault condition;
step 2, defining five different incremental strategies, namely a non-compensation action, a positive action for compensating actuator faults, a negative action for compensating actuator faults, a positive action for compensating sensor faults and a negative action for compensating sensor faults, traversing the unmanned aerial vehicle fault model with each incremental strategy in turn, and acquiring unmanned aerial vehicle attitude data under each incremental strategy through a sensor;
step 3, training a reinforcement learning evaluation network based on a genetic algorithm-extreme learning machine with the unmanned aerial vehicle attitude data to obtain a trained reinforcement learning evaluation network;
step 4, using the unmanned aerial vehicle attitude data acquired while traversing the unmanned aerial vehicle aircraft fault model with the non-compensation action strategy of step 2 to train the state transition prediction network, obtaining a trained state transition prediction network;
step 5, setting the training data set to be empty; during operation of the unmanned aerial vehicle flight control system, acquiring attitude angle data S_k once per sampling period, combining each of the five different incremental strategies with the attitude angle data S_k to form input data for the current reinforcement learning evaluation network, and obtaining the reward values corresponding to the different incremental strategies under the current attitude angle;
step 6, selecting, according to the reward values corresponding to the different incremental strategies and in combination with the ε-greedy strategy, the optimal incremental strategy under the current attitude angle and executing it, to obtain the system instant return value Q(S_current, A_current);
step 7, predicting the attitude angle of the next sampling period from the current attitude angle data and the current state transition prediction network to obtain the attitude angle predicted value of the next sampling period;
step 8, repeating step 5 and step 6 on the attitude angle predicted value of the next sampling period to obtain the optimal incremental strategy corresponding to the next sampling period and the system instant return value Q(S_next, A_next), and calculating the reward value to be updated Q̃(S_current, A_current);
step 9, taking the current attitude angle data S_k, the optimal incremental strategy under the current attitude angle and the reward value to be updated Q̃(S_current, A_current) as a new data sample to expand the capacity of the current training data set, and updating the current reinforcement learning evaluation network with the current training data set;
step 10, repeating steps 5-9 for each sampling period until the flight mission is completed.
As a preferred scheme of the present invention, the fault model of the unmanned aerial vehicle under the fault condition in step 1 is specifically:
ẋ(t) = Ax(t) + Bu(t) + φ(t−t₁)f_a(t)
y(t) = Cx(t) + Du(t) + φ(t−t₂)Ff_s(t)
wherein x ∈ R^{4×1} is the state variable of the system, consisting of the pitch angle θ, the roll angle and their time derivatives; u is the control input; A, B, C, D are the system matrices; y is the output of the control system; φ(t−t₁)f_a(t) and φ(t−t₂)Ff_s(t) denote an actuator fault and a sensor fault in the flight control system, respectively, f_a(t) being the unknown actuator fault offset value and Ff_s(t) the unknown sensor fault offset value; φ(t−t_f) is the fault occurrence time function, with
φ(t−t_f) = 0 for t < t_f and φ(t−t_f) = 1 for t ≥ t_f,
where t_f is the time at which an unknown fault occurs in the flight control system and t denotes time.
As a preferred embodiment of the present invention, the specific process of step 3 is:
step 31, sorting the unmanned aerial vehicle attitude data acquired in the step 2 according to a time sequence order to form a training sample set;
and step 32, the reinforcement learning evaluation network based on the genetic algorithm-extreme learning machine contains a single hidden layer; a random parameter population of the extreme learning machine hidden-layer parameters is created by the genetic algorithm, individuals are eliminated according to a fitness function, and the remaining individuals undergo the inheritance, crossover and mutation operations of the genetic algorithm; the elimination-inheritance-crossover-mutation process is repeated until the fitness function reaches its optimal value, yielding the trained reinforcement learning evaluation network.
As a preferred scheme of the invention, the updated reward value Q̃(S_current, A_current) of step 8 is calculated as:
Q̃(S_current, A_current) = Q(S_current, A_current) + λ·Q(S_next, A_next)
wherein Q(S_current, A_current) denotes the system instant return value obtained by executing the optimal incremental strategy under the current attitude angle S_k, λ denotes a discount factor with 0 < λ < 1, and Q(S_next, A_next) denotes the system instant return value obtained by executing the optimal incremental strategy under the next attitude angle S_next.
As a preferred embodiment of the present invention, the specific updating method in step 9 is: the current reinforcement learning evaluation network is updated by the training algorithm of the genetic-algorithm-optimized extreme learning machine, which solves the output weights through the Moore-Penrose generalized inverse.
As a preferred scheme of the present invention, the current state transition prediction network described in step 7 is updated once every 10 sampling periods; if the state transition prediction network is to be updated in the current sampling period, the training data used for the update are the attitude angle data acquired in the current sampling period together with the attitude angle data acquired in the 9 sampling periods preceding it.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
1. The invention uses a reinforcement learning controller whose evaluation network extracts features from the real-time data generated by the system, thereby acquiring fault information and adjusting the system controller on that basis; compared with traditional model-based fault-tolerant control methods, this data-based active fault-tolerant control method breaks through the limitation that complex systems are difficult to model, and extracting data features in place of a fault detection subsystem simplifies the controller design.
2. The invention provides an incremental-strategy reinforcement learning controller for the premise of uncertain faults, overcoming the limitation of the deterministic fixed strategies adopted in traditional reinforcement learning algorithms, and thereby approaching the optimal fault-tolerant strategy of the current faulty system.
3. The invention estimates the next state through a state transition prediction network, realizing real-time strategy network updating for the continuous control system.
4. The invention optimizes the reinforcement learning evaluation network with a genetic algorithm-extreme learning machine model; compared with traditional reinforcement learning methods, the optimized reinforcement learning model has a greatly enhanced ability to extract features from the data.
5. The invention provides a dynamic capacity-expansion updating algorithm for on-line updating of the extreme learning machine model, exploiting the rapidity of extreme learning machine training to achieve fast on-line updating of the reinforcement learning evaluation network.
Drawings
FIG. 1 is a flow chart of a control method of the present invention.
FIG. 2 is a block diagram of an active fault-tolerant controller for reinforcement learning according to the present invention.
FIG. 3 is a flow chart of the reinforcement learning evaluation network training process of the present invention.
FIG. 4 is a schematic diagram of a dynamic capacity-expansion updating algorithm of the extreme learning network model according to the present invention.
FIG. 5 illustrates the effect of fault-tolerant control of an active fault-tolerant controller in the event of actuator failure, in accordance with an embodiment of the present invention.
FIG. 6 illustrates the effect of fault-tolerant control of an active fault-tolerant controller in the event of a sensor failure in accordance with an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1 and fig. 2, the present invention provides an active fault-tolerant control method for an unmanned aerial vehicle based on reinforcement learning, which includes the steps of:
Step S1, early off-line training stage: an unmanned aerial vehicle dynamics model is established, and the evaluation network of the reinforcement learning fault-tolerant controller is trained and updated using historical attitude data generated during unmanned aerial vehicle operation together with the controller output data.
Step S2, system operation and on-line training stage: during operation of the unmanned aerial vehicle, the reinforcement learning evaluation network is updated on line in real time, so that the reinforcement learning fault-tolerant controller self-learns and self-improves during active fault-tolerant control, and real-time on-line updating of the extreme learning machine is achieved through a dynamic capacity-expansion updating algorithm. The invention optimizes the reinforcement learning method with an incremental strategy, achieves asymptotic approach to the optimal fault-tolerant control strategy, and thus better realizes fault-tolerant control of the unmanned aerial vehicle.
The specific implementation steps of the early off-line training stage of step S1 are as follows:
Step S11, establishing a dynamics model of the unmanned aerial vehicle. Considering that the drone flies at high altitude at constant speed, the dynamics model is described using a simplified three-degree-of-freedom model. The embodiment of the invention adopts the aircraft fault diagnosis experimental platform of the Key Laboratory of Advanced Aircraft Navigation, Control and Health Management (Ministry of Industry and Information Technology) of Nanjing University of Aeronautics and Astronautics; the fault model of the unmanned aerial vehicle under the established fault condition is as follows:
ẋ(t) = Ax(t) + Bu(t) + φ(t−t₁)f_a(t)
y(t) = Cx(t) + Du(t) + φ(t−t₂)Ff_s(t)
wherein x ∈ R^{4×1} is the state of the system, comprising the pitch angle θ, the roll angle and their time derivatives; u = [u₁ u₂ u₃ u₄]ᵀ is the control input; A ∈ R^{4×4}, B ∈ R^{4×1}, C ∈ R^{1×4}, D ∈ R^{1×1} are the system matrices; y ∈ R is the output of the control system; φ(t−t₁)f_a(t) and φ(t−t₂)Ff_s(t) denote actuator and sensor faults in the flight control system, respectively, where f_a(t) is the unknown actuator fault offset value and Ff_s(t) the bias value of the unknown sensor fault, with F ∈ R^{1×4} and f_s(t) ∈ R^{4×1}; φ(t−t_f) is the fault occurrence time function, defined as:
φ(t−t_f) = 0 for t < t_f, φ(t−t_f) = 1 for t ≥ t_f
where t_f is the time at which an unknown fault occurs in the flight control system; in the model built, the φ(t−t_f) function represents a sudden system fault (a fault occurring after time t_f). The system matrices are specifically represented as follows:
C = [0 1 0 0], D = 0
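The faulty state-space model above lends itself to a direct simulation. The following sketch is a minimal, assumption-laden illustration rather than the patent's implementation: it integrates the dynamics with the step fault function φ, but the numerical A and B matrices, the input signal and the fault offsets are placeholders, since only C and D are specified here.

```python
import numpy as np

def phi(t, t_f):
    """Sudden-fault time function: 0 before the fault time t_f, 1 afterwards."""
    return 1.0 if t >= t_f else 0.0

def simulate(A, B, C, D, u, f_a, F, f_s, t1, t2, dt=0.01, steps=1000):
    """Euler integration of  x' = A x + B u + phi(t - t1) f_a,
                             y  = C x + D u + phi(t - t2) F f_s."""
    x = np.zeros((A.shape[0], 1))
    outputs = []
    for k in range(steps):
        t = k * dt
        x = x + dt * (A @ x + B * u(t) + phi(t, t1) * f_a)
        y = (C @ x).item() + D * u(t) + phi(t, t2) * (F @ f_s).item()
        outputs.append(y)
    return np.array(outputs)

# Placeholder example -- A, B, the input and the fault biases are assumptions, not the patent's values.
A = np.array([[0, 1, 0, 0], [-2, -1, 0, 0], [0, 0, 0, 1], [0, 0, -2, -1]], dtype=float)
B = np.array([[0], [1], [0], [1]], dtype=float)
C = np.array([[0, 1, 0, 0]], dtype=float)   # per the embodiment: C = [0 1 0 0]
D = 0.0                                      # per the embodiment: D = 0
f_a = np.array([[0], [0.05], [0], [0]])      # assumed actuator fault offset
F = np.array([[0, 1, 0, 0]], dtype=float)    # F in R^{1x4}
f_s = np.array([[0], [0.1], [0], [0]])       # assumed sensor fault offset
y_trace = simulate(A, B, C, D, u=lambda t: 0.1, f_a=f_a, F=F, f_s=f_s, t1=3.0, t2=1e9)
```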
Step S12, operation data of the unmanned aerial vehicle control system are acquired through the established mathematical model: data are collected by the sensors both when the unmanned aerial vehicle operates normally and under fault conditions produced by fault injection. The specific data labels are the attitude Euler angle data, the serial number of the fault-tolerant strategy and the actual output of the control system, and the acquired data serve as the training data of the evaluation network. The selected data are variables of the control system that reflect its current operating state; the fault-tolerant controller extracts useful features from the system state and uses them as an important basis for its decisions. The fault-tolerant control system state S is defined as the attitude angles of the flight control system (pitch angle and roll angle); S is the label attribute of the data set, n is the serial number of the strategy action, and the acquired data are used as the training data of the evaluation network.
Step S13, the reinforcement learning Q-learning algorithm is optimized with the extreme learning machine: the evaluation network of the reinforcement learning fault-tolerant controller is a three-layer extreme learning machine network containing a single hidden layer, whose specific structure is shown in FIG. 3.
And step S14, performing off-line training and updating on the constructed extreme learning machine network according to the collected operation data, and optimizing the extreme learning machine network through a genetic algorithm. The process is as follows:
and step S141, forming a training data sample set by the acquired training data samples according to a time sequence.
S142, establishing a random parameter population of hidden layer parameters of the extreme learning machine through a genetic algorithm, and passing through a fitness function f through heredity, intersection and variation of the genetic algorithm processfitnessAnd (4) optimizing the population, and after a certain number of iterations, training to obtain an evaluation network model with the highest accuracy after the fitness function reaches the optimal value and does not change any more. Wherein the fitness function ffitnessIs represented as follows:
in the formula, yiIndicates the i-th sample expected output value, yi' represents an actual output value after the i-th sample is input into the model. After a certain number of iterations, after the fitness function reaches an optimal value and does not change any more, training to obtain an evaluation network model with the highest accuracy.
In the early off-line training stage, the training data are constructed as in step S141. For the extreme learning machine algorithm, the output-layer weights are updated by solving a linear equation through the Moore-Penrose generalized inverse. For the training process of the genetic-algorithm-optimized extreme learning machine model, a population of hidden-layer random parameter samples of a certain scale is first initialized randomly; all individuals in the population are then trained, and the error of each individual is computed to serve as its genetic-algorithm fitness; individuals are eliminated according to their fitness, and the surviving individuals undergo crossover, mutation and related operations; training of the next generation then continues, and the iteration proceeds in this manner. The specific training process is shown in FIG. 3.
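To make this training loop concrete, here is a minimal sketch of a genetic-algorithm-optimized extreme learning machine along the lines just described: each individual encodes random hidden-layer weights and biases, the output weights are solved in one shot by the Moore-Penrose generalized inverse, and the fitness is the reciprocal of the sum of squared training errors. Function names, the tanh activation and the population/generation sizes below are illustrative assumptions (the embodiment cites a population of 2000, 200 iterations and 128 hidden nodes).

```python
import numpy as np

def elm_output_weights(X, y, W, b):
    """Solve ELM output weights by the Moore-Penrose generalized inverse."""
    H = np.tanh(X @ W + b)            # hidden-layer output matrix
    return np.linalg.pinv(H) @ y

def fitness(X, y, W, b):
    """Reciprocal of the sum of squared training errors (to be maximized)."""
    beta = elm_output_weights(X, y, W, b)
    err = y - np.tanh(X @ W + b) @ beta
    return 1.0 / (np.sum(err ** 2) + 1e-12)

def ga_elm_train(X, y, n_hidden=128, pop_size=40, n_gen=30, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = [(rng.standard_normal((d, n_hidden)), rng.standard_normal(n_hidden))
           for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = np.array([fitness(X, y, W, b) for W, b in pop])
        keep = np.argsort(scores)[::-1][:pop_size // 2]
        survivors = [pop[i] for i in keep]                      # elimination
        children = []
        while len(survivors) + len(children) < pop_size:
            i, j = rng.choice(len(survivors), size=2, replace=False)
            (W1, b1), (W2, b2) = survivors[i], survivors[j]
            mask = rng.random(W1.shape) < 0.5                   # crossover
            W = np.where(mask, W1, W2) + 0.01 * rng.standard_normal(W1.shape)  # mutation
            b = np.where(rng.random(b1.shape) < 0.5, b1, b2)
            children.append((W, b))
        pop = survivors + children
    W, b = max(pop, key=lambda ind: fitness(X, y, *ind))
    return W, b, elm_output_weights(X, y, W, b)
```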
A historical experience quadruple (S_k, A_k, R_k, S_{k+1}) is defined, where S_k is the current state value of the unmanned aerial vehicle flight control system, A_k is the fault-tolerant strategy action made by the flight control system in the current state, R_k is the reward value obtained by taking action A_k in the current state, and S_{k+1} is the next state value reached by the flight control system after taking action A_k in the current state. During the training update of the reinforcement learning evaluation network, S_{k+1} must be obtained from S_k and A_k in order to update the Q function Q(S_k, A_k); the invention realizes the prediction of S_{k+1} through a state transition prediction network.
The training process of the extreme learning machine is iteratively optimized by the genetic algorithm: the initial population size of the genetic algorithm is 2000 and the number of iterations is 200; the fitness function is the reciprocal of the sum of squared model training errors, and since the algorithm maximizes the fitness function, the error is minimized; the number of hidden-layer nodes of the extreme learning machine network is 128.
The specific implementation steps of the system operation and on-line training stage of the step S2 are as follows:
Step S21, the unmanned aerial vehicle data are sorted in time order, with S_k as the input and S_{k+1} as the output; the training data sample set is formed from the training samples in time-progressive order.
And step S22, training the training data sample set obtained in the step S21 through a BP neural network.
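As a rough illustration of this step, the sketch below implements a one-hidden-layer BP network in numpy that maps the current attitude state S_k to the predicted next state S_{k+1}; the layer size, learning rate, epoch count and the two-dimensional [pitch, roll] state layout are assumptions.

```python
import numpy as np

class StateTransitionNet:
    """One-hidden-layer BP network:  S_k -> predicted S_{k+1}."""
    def __init__(self, n_in, n_out, n_hidden=32, lr=0.01, seed=1):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_hidden, n_out)) * 0.1
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def predict(self, X):
        self.H = np.tanh(X @ self.W1 + self.b1)
        return self.H @ self.W2 + self.b2

    def train(self, X, Y, epochs=2000):
        for _ in range(epochs):
            err = self.predict(X) - Y                       # mean-squared-error gradient
            dW2 = self.H.T @ err / len(X)
            dH = err @ self.W2.T * (1 - self.H ** 2)        # backprop through tanh
            dW1 = X.T @ dH / len(X)
            self.W2 -= self.lr * dW2
            self.b2 -= self.lr * err.mean(axis=0)
            self.W1 -= self.lr * dW1
            self.b1 -= self.lr * dH.mean(axis=0)

# Example usage: time-ordered attitude samples, input S_k, target S_{k+1}
# states = np.asarray(attitude_log)     # assumed shape (T, 2): [pitch, roll] per sample
# net = StateTransitionNet(n_in=2, n_out=2)
# net.train(states[:-1], states[1:])
```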
The operation state of the unmanned aerial vehicle flight control system is predicted; once training of the state transition prediction network is finished, the control system performs fault-tolerant tracking control with the incremental-strategy-based reinforcement learning method. In this process the evaluation network is updated on line through the decision of each step and the instant reward value. The real-time reward evaluation criterion is the absolute value of the error between the expected output and the actual output of the control system, and a reward function J(S_t) is defined in the following specific form:
J(S_t) = Σ_j γ^j · U(S_{t−j}, A_{t−j})
wherein γ is the discount factor, satisfying 0 < γ ≤ 1, and U(S_{t−j}, A_{t−j}) is the utility function of the reinforcement learning algorithm, whose specific form is:
U(S_t, A_t) = Q(S_t, A_t)
and the Q(S_t, A_t) function has the mathematical form:
Q(S_t, A_t) = |y(t, A_t) − y_d(t)|
where t is the system running time, y(t, A_t) is the actual output obtained by the control system after the system makes decision A_t at the current time, and y_d(t) is the desired control system output at the current time.
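A small sketch of these two quantities follows, with `y_actual`, `y_desired` and the discount factor value as illustrative placeholders; the discounted sum follows the reconstructed form of J(S_t) given above.

```python
def immediate_return(y_actual, y_desired):
    """U(S_t, A_t) = Q(S_t, A_t) = |y(t, A_t) - y_d(t)|."""
    return abs(y_actual - y_desired)

def discounted_reward(utilities, gamma=0.95):
    """J(S_t): discounted sum of the utilities U(S_{t-j}, A_{t-j}), 0 < gamma <= 1.
    `utilities` is a time-ordered list, most recent sample last."""
    return sum(gamma ** j * u for j, u in enumerate(reversed(utilities)))
```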
Step S23, whether the current system has a fault is judged from the attitude angle, actuator current and voltage and other data acquired by the sensors during operation of the unmanned aerial vehicle control system; if a fault has occurred, the evaluation network changes the reward value corresponding to each action of the strategy action set in the current state. The strategy action set is mathematically expressed as:
Ω = {Λ₁, Λ₂, Λ₃, Λ₄, Λ₅}
wherein Λ_a is the a-th optional configuration in the system, a = 1, 2, 3, 4, 5. In the specific application embodiment, an incremental strategy is adopted to achieve asymptotic approximation of the optimal fault-tolerant control strategy: the strategy made by the fault-tolerant controller at each moment is superposed onto the current strategy signal. For this application embodiment of the invention, the following five incremental strategies are defined:
1. Action taken when the system is normal: Λ₁ = [0 0 0 0]
2. Positive action compensating an actuator fault: Λ₂ = +[0 0.0002 0 0]
3. Negative action compensating an actuator fault: Λ₃ = −[0 0.0002 0 0]
4. Positive action compensating a sensor fault: Λ₄ = +[0 0 0.0002 0]
5. Negative action compensating a sensor fault: Λ₅ = −[0 0 0.0002 0]
Step S24, the evaluation network takes the current operating state value S_k of the control system together with each incremental strategy action in the strategy action set as the model input; action selection is performed on the model output in combination with the ε-greedy strategy, and the incremental strategy action decided by the evaluation network is then superposed onto the existing action and applied to the current control signal to achieve fault tolerance.
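The sketch below shows how the five incremental actions and the ε-greedy selection of step S24 could look in code. `evaluate_q(state, action_id)` stands in for the trained GA-ELM evaluation network and is an assumed interface; since the evaluation value Q is defined above as the tracking error |y − y_d|, this sketch treats the smallest predicted value as the greedy choice.

```python
import numpy as np

# The five incremental strategy actions from the description (control-signal increments)
ACTIONS = {
    1: np.array([0.0, 0.0,     0.0,    0.0]),   # no compensation (system normal)
    2: np.array([0.0, 0.0002,  0.0,    0.0]),   # + compensation for an actuator fault
    3: np.array([0.0, -0.0002, 0.0,    0.0]),   # - compensation for an actuator fault
    4: np.array([0.0, 0.0,     0.0002, 0.0]),   # + compensation for a sensor fault
    5: np.array([0.0, 0.0,    -0.0002, 0.0]),   # - compensation for a sensor fault
}

def select_action(state, evaluate_q, epsilon=0.1, rng=np.random.default_rng(2)):
    """epsilon-greedy choice over the evaluation network's predicted values.

    `evaluate_q(state, action_id)` is an assumed interface to the trained
    evaluation network; because Q is the tracking error here, the smallest
    predicted value is taken as the greedy (exploiting) choice.
    """
    if rng.random() < epsilon:
        return int(rng.choice(list(ACTIONS)))          # explore
    values = {a: evaluate_q(state, a) for a in ACTIONS}
    return min(values, key=values.get)                 # exploit

def apply_increment(current_control, action_id):
    """Superpose the chosen incremental action onto the existing control signal."""
    return current_control + ACTIONS[action_id]
```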
On-line updating of the extreme learning machine network is achieved through a dynamic capacity-expansion updating algorithm: the method does not need to propagate the current sample error through a gradient-descent-like algorithm; instead, fast on-line updating is achieved by directly expanding the training data and exploiting the rapidity of the extreme learning machine updating algorithm. The specific steps are shown in FIG. 4.
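A minimal sketch of the dynamic capacity-expansion idea: each new sample is appended to the stored training set and the ELM output weights are re-solved directly by the Moore-Penrose pseudoinverse, with no gradient-descent-style error propagation. The class name and the convention that the GA-fixed hidden parameters W, b are passed in are assumptions carried over from the earlier GA-ELM sketch.

```python
import numpy as np

class OnlineELM:
    """Evaluation network with dynamic capacity-expansion updating."""
    def __init__(self, W, b):
        self.W, self.b = W, b                   # hidden-layer parameters fixed by the GA stage
        self.X = np.empty((0, W.shape[0]))      # stored training inputs
        self.y = np.empty((0,))                 # stored training targets
        self.beta = np.zeros(W.shape[1])        # output weights

    def predict(self, x):
        return np.tanh(np.atleast_2d(x) @ self.W + self.b) @ self.beta

    def expand_and_update(self, x_new, q_new):
        """Append the new (state + action, updated reward) sample and re-solve beta."""
        self.X = np.vstack([self.X, np.atleast_2d(x_new)])
        self.y = np.append(self.y, q_new)
        H = np.tanh(self.X @ self.W + self.b)
        self.beta = np.linalg.pinv(H) @ self.y
```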
For the reinforcement learning update process, the Q-learning evaluation network is first initialized and the neural network parameters are initialized randomly; the network input is the state of the system and the serial number of the action currently taken, and the output is the reward value U(S_current, A_current) obtained by taking that action in the current state.
Next, the current state S_t is collected: with probability ε an action is chosen at random from the full action set, and with probability (1 − ε) the action A_t = argmax Q(S_t) that maximizes the reward value (which in this context is the error between the actual output and the desired output of the system) is chosen; the current state S_t and action A_t are recorded, with reward value U(S_current, A_current).
Step S25, after the decision module of the reinforcement learning active fault-tolerant controller gives the fault-tolerant strategy, the reward value function of the current state and the given strategy is solved: the cumulative discounted return value is obtained from the current instant return value Q(S_current, A_current) plus the discounted historical value, and the updated reward value is mathematically expressed as:
Q̃(S_current, A_current) = Q(S_current, A_current) + λ·Q(S_next, A_next)
where Q(S_next, A_next) is the return value output by the current evaluation network for the predicted next state.
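In code, the training target of step S25 is a one-liner; `q_current` and `q_next_pred` (the evaluation network's output for the predicted next state) are assumed to have been computed already, and the value of λ is illustrative.

```python
def updated_reward(q_current, q_next_pred, lam=0.9):
    """Q~(S_current, A_current) = Q(S_current, A_current) + lambda * Q(S_next, A_next), 0 < lambda < 1."""
    return q_current + lam * q_next_pred
```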
Step S26, the current state value, the strategy action taken and the updated reward value Q̃(S_current, A_current) obtained in step S25 are combined into a new data sample and added to the existing training data set.
Step S27, the latest training model is obtained through the training algorithm of the genetic-algorithm-optimized extreme learning machine, which solves the output weights via the Moore-Penrose generalized inverse.
And step S28, repeating the above process for each sampling period until the flight mission is completed.
The state transition prediction network is updated at intervals using the historical operation data of the control system, and the next state value is predicted from the current state and action values. In order to reduce the processor load and guarantee the rapidity of the system, and on the premise of not affecting the accurate judgment of the fault-tolerant controller, the state transition prediction network is updated once every 10 sampling periods.
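A sketch of this every-10-sampling-periods refresh is given below; `StateTransitionNet` refers to the earlier sketch, and the buffer handling is an assumption.

```python
import numpy as np

UPDATE_INTERVAL = 10        # sampling periods between state-transition-network updates
attitude_buffer = []        # most recent attitude-angle samples

def on_sampling_period(k, attitude, net):
    """Collect the current attitude sample; every 10th period retrain the
    state transition prediction network on the last 10 samples."""
    attitude_buffer.append(attitude)
    if (k + 1) % UPDATE_INTERVAL == 0 and len(attitude_buffer) >= UPDATE_INTERVAL:
        recent = np.asarray(attitude_buffer[-UPDATE_INTERVAL:])
        net.train(recent[:-1], recent[1:])          # inputs S_k, targets S_{k+1}
```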
In order to verify the fault-tolerant control effect, verification experiments were carried out on the aircraft fault diagnosis experimental platform of the Key Laboratory of Advanced Aircraft Navigation, Control and Health Management (Ministry of Industry and Information Technology) of Nanjing University of Aeronautics and Astronautics. When an actuator fault is injected into the experimental platform, under the fault-tolerant control of the active fault-tolerant controller based on the extreme learning machine and incremental-strategy reinforcement learning method, the system attitude deviates and then continues to track the expected signal; the output residual of the unmanned aerial vehicle aircraft is shown in FIG. 5. When a sensor fault is injected into the experimental platform, the output residual of the unmanned aerial vehicle is shown in FIG. 6.
According to the simulation results, when an actuator fault or a sensor fault occurs during flight of the unmanned aerial vehicle aircraft, the unmanned aerial vehicle active fault-tolerant control method based on the extreme learning machine and incremental-strategy reinforcement learning achieves a good fault-tolerant effect without depending on a system model during operation, and realizes on-line self-learning and updating. The method has important reference value for fault-tolerant control of unmanned aerial vehicle aircraft with faults.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.
Claims (6)
1. An active fault-tolerant control method of an unmanned aerial vehicle based on reinforcement learning is characterized by comprising the following steps:
step 1, establishing an unmanned aerial vehicle dynamic model, and performing fault injection on an unmanned aerial vehicle to obtain an unmanned aerial vehicle aircraft fault model under the fault condition;
step 2, defining five different incremental strategies, including a non-compensation action, a positive action for compensating actuator faults, a negative action for compensating actuator faults, a positive action for compensating sensor faults and a negative action for compensating sensor faults, traversing an unmanned aerial vehicle fault model by one incremental strategy in sequence, and acquiring unmanned aerial vehicle attitude data under each incremental strategy through a sensor;
step 3, training a reinforcement learning evaluation network based on a genetic algorithm-extreme learning machine by using the attitude data of the unmanned aerial vehicle to obtain a trained reinforcement learning evaluation network;
step 4, when the unmanned aerial vehicle aircraft fault model is traversed according to the uncompensated action strategy in the step 2, the acquired unmanned aerial vehicle attitude data is used for training the state transition prediction network to obtain a trained state transition prediction network;
step 5, setting the training data set to be empty, and acquiring attitude angle data S_k once in each sampling period during operation of the unmanned aerial vehicle flight control system; combining each of the five different incremental strategies with the attitude angle data S_k to form input data that are input to the current reinforcement learning evaluation network, and obtaining the reward values corresponding to the different incremental strategies under the current attitude angle;
step 6, selecting, according to the reward values corresponding to the different incremental strategies and in combination with the ε-greedy strategy, the optimal incremental strategy under the current attitude angle and executing it, to obtain the system instant return value Q(S_current, A_current);
Step 7, predicting the attitude angle of the next sampling period according to the current attitude angle data and the current state transition prediction network to obtain the attitude angle predicted value of the next sampling period;
step 8, repeating step 5 and step 6 on the attitude angle predicted value of the next sampling period to obtain the optimal incremental strategy corresponding to the next sampling period and the system instant return value Q(S_next, A_next), and calculating the reward value to be updated Q̃(S_current, A_current);
step 9, taking the current attitude angle data S_k, the optimal incremental strategy under the current attitude angle and the reward value to be updated Q̃(S_current, A_current) as a new data sample to expand the capacity of the current training data set, and updating the current reinforcement learning evaluation network with the current training data set;
and step 10, repeating the steps 5-9 for each sampling period until the flight mission is completed.
2. The active fault-tolerant control method for the unmanned aerial vehicle based on reinforcement learning of claim 1, wherein the fault model of the unmanned aerial vehicle under the fault condition in step 1 is specifically:
ẋ(t) = Ax(t) + Bu(t) + φ(t−t₁)f_a(t)
y(t) = Cx(t) + Du(t) + φ(t−t₂)Ff_s(t)
wherein x ∈ R^{4×1} is the state variable of the system, consisting of the pitch angle θ, the roll angle and their time derivatives; u is the control input; A, B, C, D are the system matrices; y is the output of the control system; φ(t−t₁)f_a(t) and φ(t−t₂)Ff_s(t) denote an actuator fault and a sensor fault in the flight control system, respectively, f_a(t) being the unknown actuator fault offset value and Ff_s(t) the unknown sensor fault offset value; φ(t−t_f) is the fault occurrence time function, with
φ(t−t_f) = 0 for t < t_f and φ(t−t_f) = 1 for t ≥ t_f,
where t_f is the time at which an unknown fault occurs in the flight control system and t denotes time.
3. The active fault-tolerant control method for the unmanned aerial vehicle based on reinforcement learning of claim 1, wherein the specific process of the step 3 is as follows:
step 31, sorting the unmanned aerial vehicle attitude data acquired in the step 2 according to a time sequence order to form a training sample set;
and step 32, the reinforcement learning evaluation network based on the genetic algorithm-extreme learning machine contains a single hidden layer; a random parameter population of the extreme learning machine hidden-layer parameters is created by the genetic algorithm, individuals are eliminated according to a fitness function, and the remaining individuals undergo the inheritance, crossover and mutation operations of the genetic algorithm; the elimination-inheritance-crossover-mutation process is repeated until the fitness function reaches its optimal value, yielding the trained reinforcement learning evaluation network.
4. The active fault-tolerant control method for unmanned aerial vehicle based on reinforcement learning of claim 1, wherein the updated reward value Q̃(S_current, A_current) of step 8 is calculated as:
Q̃(S_current, A_current) = Q(S_current, A_current) + λ·Q(S_next, A_next)
wherein Q(S_current, A_current) denotes the system instant return value obtained by executing the optimal incremental strategy under the current attitude angle S_k, λ denotes a discount factor with 0 < λ < 1, and Q(S_next, A_next) denotes the system instant return value obtained by executing the optimal incremental strategy under the next attitude angle S_next.
5. The active fault-tolerant control method for the unmanned aerial vehicle based on reinforcement learning of claim 1, wherein the specific updating method in step 9 is: the current reinforcement learning evaluation network is updated by the training algorithm of the genetic-algorithm-optimized extreme learning machine, which solves the output weights through the Moore-Penrose generalized inverse.
6. The active fault-tolerant control method for unmanned aerial vehicles based on reinforcement learning of claim 1, wherein the current state transition prediction network in step 7 is updated once every 10 sampling periods; if the state transition prediction network is to be updated in the current sampling period, the training data used for the update are the attitude angle data acquired in the current sampling period together with the attitude angle data acquired in the 9 sampling periods preceding it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010030358.4A CN111190429B (en) | 2020-01-13 | 2020-01-13 | Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010030358.4A CN111190429B (en) | 2020-01-13 | 2020-01-13 | Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111190429A true CN111190429A (en) | 2020-05-22 |
CN111190429B CN111190429B (en) | 2022-03-18 |
Family
ID=70708146
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010030358.4A Active CN111190429B (en) | 2020-01-13 | 2020-01-13 | Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111190429B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111679579A (en) * | 2020-06-10 | 2020-09-18 | 南京航空航天大学 | Sliding mode prediction fault-tolerant control method for fault system of sensor and actuator |
CN111783250A (en) * | 2020-07-03 | 2020-10-16 | 上海航天控制技术研究所 | Flexible robot end arrival control method, electronic device, and storage medium |
CN112180960A (en) * | 2020-09-29 | 2021-01-05 | 西北工业大学 | Unmanned aerial vehicle fault-tolerant flight method and flight system for actuator faults |
CN113467248A (en) * | 2021-07-22 | 2021-10-01 | 南京大学 | Fault-tolerant control method for unmanned aerial vehicle sensor during fault based on reinforcement learning |
CN113919495A (en) * | 2021-10-11 | 2022-01-11 | 浙江理工大学 | Multi-agent fault-tolerant consistency method and system based on reinforcement learning |
CN114153640A (en) * | 2021-11-26 | 2022-03-08 | 哈尔滨工程大学 | System fault-tolerant strategy method based on deep reinforcement learning |
CN114756038A (en) * | 2022-03-23 | 2022-07-15 | 北京理工大学 | Data-driven unmanned aerial vehicle wind disturbance model online wind disturbance estimation method |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104914851A (en) * | 2015-05-21 | 2015-09-16 | 北京航空航天大学 | Adaptive fault detection method for airplane rotation actuator driving device based on deep learning |
CN105915294A (en) * | 2016-06-20 | 2016-08-31 | 中国人民解放军军械工程学院 | Unmanned aerial vehicle airborne transmitter fault forecasting method and system |
CN107315892A (en) * | 2017-08-10 | 2017-11-03 | 北京交通大学 | A kind of Method for Bearing Fault Diagnosis based on extreme learning machine |
CN107316046A (en) * | 2017-03-09 | 2017-11-03 | 河北工业大学 | A kind of method for diagnosing faults that Dynamic adaptiveenhancement is compensated based on increment |
US20180129958A1 (en) * | 2016-11-09 | 2018-05-10 | Cognitive Scale, Inc. | Cognitive Session Graphs Including Blockchains |
CN108256173A (en) * | 2017-12-27 | 2018-07-06 | 南京航空航天大学 | A kind of Gas path fault diagnosis method and system of aero-engine dynamic process |
CN109001982A (en) * | 2018-10-19 | 2018-12-14 | 西安交通大学 | A kind of nonlinear system adaptive neural network fault tolerant control method |
CN109408552A (en) * | 2018-08-08 | 2019-03-01 | 南京航空航天大学 | The monitoring of the civil aircraft system failure and recognition methods based on LSTM-AE deep learning frame |
CN109799802A (en) * | 2018-12-06 | 2019-05-24 | 郑州大学 | Sensor fault diagnosis and fault tolerant control method in a kind of control of molecular weight distribution |
KR20190064111A (en) * | 2017-11-30 | 2019-06-10 | 한국에너지기술연구원 | Energy management system and energy management method including fault tolerant function |
CN110244689A (en) * | 2019-06-11 | 2019-09-17 | 哈尔滨工程大学 | A kind of AUV adaptive failure diagnostic method based on identification feature learning method |
CN110413000A (en) * | 2019-05-28 | 2019-11-05 | 北京航空航天大学 | A kind of hypersonic aircraft based on deep learning reenters prediction and corrects fault-tolerant method of guidance |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104914851A (en) * | 2015-05-21 | 2015-09-16 | 北京航空航天大学 | Adaptive fault detection method for airplane rotation actuator driving device based on deep learning |
CN105915294A (en) * | 2016-06-20 | 2016-08-31 | 中国人民解放军军械工程学院 | Unmanned aerial vehicle airborne transmitter fault forecasting method and system |
US20180129958A1 (en) * | 2016-11-09 | 2018-05-10 | Cognitive Scale, Inc. | Cognitive Session Graphs Including Blockchains |
CN107316046A (en) * | 2017-03-09 | 2017-11-03 | 河北工业大学 | A kind of method for diagnosing faults that Dynamic adaptiveenhancement is compensated based on increment |
CN107315892A (en) * | 2017-08-10 | 2017-11-03 | 北京交通大学 | A kind of Method for Bearing Fault Diagnosis based on extreme learning machine |
KR20190064111A (en) * | 2017-11-30 | 2019-06-10 | 한국에너지기술연구원 | Energy management system and energy management method including fault tolerant function |
CN108256173A (en) * | 2017-12-27 | 2018-07-06 | 南京航空航天大学 | A kind of Gas path fault diagnosis method and system of aero-engine dynamic process |
CN109408552A (en) * | 2018-08-08 | 2019-03-01 | 南京航空航天大学 | The monitoring of the civil aircraft system failure and recognition methods based on LSTM-AE deep learning frame |
CN109001982A (en) * | 2018-10-19 | 2018-12-14 | 西安交通大学 | A kind of nonlinear system adaptive neural network fault tolerant control method |
CN109799802A (en) * | 2018-12-06 | 2019-05-24 | 郑州大学 | Sensor fault diagnosis and fault tolerant control method in a kind of control of molecular weight distribution |
CN110413000A (en) * | 2019-05-28 | 2019-11-05 | 北京航空航天大学 | A kind of hypersonic aircraft based on deep learning reenters prediction and corrects fault-tolerant method of guidance |
CN110244689A (en) * | 2019-06-11 | 2019-09-17 | 哈尔滨工程大学 | A kind of AUV adaptive failure diagnostic method based on identification feature learning method |
Non-Patent Citations (3)
Title |
---|
CHANGSHENG HUA et al.: "A New Method for Fault Tolerant Control through Q-Learning", IFAC-PapersOnLine *
CHANG Feng et al.: "WSN node fault diagnosis based on reinforcement learning and ant colony algorithm", Computer Measurement & Control *
JIANG Yinhang et al.: "Active fault-tolerant control of quadrotor UAV based on gain-scheduled PID", Journal of Shandong University of Science and Technology (Natural Science Edition) *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111679579A (en) * | 2020-06-10 | 2020-09-18 | 南京航空航天大学 | Sliding mode prediction fault-tolerant control method for fault system of sensor and actuator |
CN111783250A (en) * | 2020-07-03 | 2020-10-16 | 上海航天控制技术研究所 | Flexible robot end arrival control method, electronic device, and storage medium |
CN112180960A (en) * | 2020-09-29 | 2021-01-05 | 西北工业大学 | Unmanned aerial vehicle fault-tolerant flight method and flight system for actuator faults |
CN112180960B (en) * | 2020-09-29 | 2021-09-14 | 西北工业大学 | Unmanned aerial vehicle fault-tolerant flight method and flight system for actuator faults |
CN113467248A (en) * | 2021-07-22 | 2021-10-01 | 南京大学 | Fault-tolerant control method for unmanned aerial vehicle sensor during fault based on reinforcement learning |
CN113919495A (en) * | 2021-10-11 | 2022-01-11 | 浙江理工大学 | Multi-agent fault-tolerant consistency method and system based on reinforcement learning |
CN113919495B (en) * | 2021-10-11 | 2024-09-17 | 浙江理工大学 | Multi-agent fault tolerance consistency method and system based on reinforcement learning |
CN114153640A (en) * | 2021-11-26 | 2022-03-08 | 哈尔滨工程大学 | System fault-tolerant strategy method based on deep reinforcement learning |
CN114153640B (en) * | 2021-11-26 | 2024-05-31 | 哈尔滨工程大学 | System fault-tolerant strategy method based on deep reinforcement learning |
CN114756038A (en) * | 2022-03-23 | 2022-07-15 | 北京理工大学 | Data-driven unmanned aerial vehicle wind disturbance model online wind disturbance estimation method |
CN114756038B (en) * | 2022-03-23 | 2024-09-17 | 北京理工大学 | Data-driven unmanned aerial vehicle wind disturbance model online wind disturbance estimation method |
Also Published As
Publication number | Publication date |
---|---|
CN111190429B (en) | 2022-03-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111190429B (en) | Unmanned aerial vehicle active fault-tolerant control method based on reinforcement learning | |
CN112149316B (en) | Aero-engine residual life prediction method based on improved CNN model | |
CN110222371B (en) | Bayes and neural network-based engine residual life online prediction method | |
CN111241952A (en) | Reinforced learning reward self-learning method in discrete manufacturing scene | |
Wang et al. | Neural-network-based fault-tolerant control of unknown nonlinear systems | |
CN112439794B (en) | Hot rolling bending force prediction method based on LSTM | |
CN112947385B (en) | Aircraft fault diagnosis method and system based on improved Transformer model | |
CN109885916B (en) | Mixed test online model updating method based on LSSVM | |
CN114692310B (en) | Dueling DQN-based virtual-real fusion primary separation model parameter optimization method | |
Xie et al. | A novel deep belief network and extreme learning machine based performance degradation prediction method for proton exchange membrane fuel cell | |
Cen et al. | A gray-box neural network-based model identification and fault estimation scheme for nonlinear dynamic systems | |
Nasser et al. | A hybrid of convolutional neural network and long short-term memory network approach to predictive maintenance | |
Ma et al. | Deep auto-encoder observer multiple-model fast aircraft actuator fault diagnosis algorithm | |
Precup et al. | A survey on fuzzy control for mechatronics applications | |
CN115972211A (en) | Control strategy offline training method based on model uncertainty and behavior prior | |
CN112146879A (en) | Rolling bearing fault intelligent diagnosis method and system | |
Yin et al. | Dynamic behavioral assessment model based on Hebb learning rule | |
CN116432359A (en) | Variable topology network tide calculation method based on meta transfer learning | |
CN114880767B (en) | Aero-engine residual service life prediction method based on attention mechanism Dense-GRU network | |
Long et al. | A data fusion fault diagnosis method based on LSTM and DWT for satellite reaction flywheel | |
Liu et al. | Aero-Engines Remaining Useful Life Prognostics Based on Multi-Hierarchical Gated Recurrent Graph Convolutional Network | |
CN114491790A (en) | MAML-based pneumatic modeling method and system | |
Zhou et al. | A health status estimation method based on interpretable neural network observer for HVs | |
Ji et al. | Data preprocessing method and fault diagnosis based on evaluation function of information contribution degree | |
Vladov et al. | Control and diagnostics of TV3-117 aircraft engine technical state in flight modes using the matrix method for calculating dynamic recurrent neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |