CN114296350A - Unmanned ship fault-tolerant control method based on model reference reinforcement learning - Google Patents
Unmanned ship fault-tolerant control method based on model reference reinforcement learning Download PDFInfo
- Publication number
- CN114296350A CN114296350A CN202111631716.8A CN202111631716A CN114296350A CN 114296350 A CN114296350 A CN 114296350A CN 202111631716 A CN202111631716 A CN 202111631716A CN 114296350 A CN114296350 A CN 114296350A
- Authority
- CN
- China
- Prior art keywords
- unmanned ship
- model
- fault
- reinforcement learning
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T90/00—Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Landscapes
- Feedback Control In General (AREA)
Abstract
The invention discloses an unmanned ship fault-tolerant control method based on model reference reinforcement learning, which comprises the following steps: analyzing uncertainty factors of the unmanned ship and constructing a nominal dynamics model of the unmanned ship; designing a nominal controller of the unmanned ship based on the nominal dynamics model; constructing a fault-tolerant controller based on model reference reinforcement learning, using a maximum-entropy Actor-Critic method, from the difference between the state variables of the actual unmanned ship system and the nominal dynamics model together with the output of the nominal controller; and, according to the control task requirements, building a reinforcement learning evaluation function and a control strategy model and training the fault-tolerant controller to obtain a trained control strategy. The method can significantly improve the safety and reliability of the unmanned ship system, and can be widely applied in the field of unmanned ship control.
Description
Technical Field
The invention relates to the field of unmanned ship control, in particular to an unmanned ship fault-tolerant control method based on model reference reinforcement learning.
Background
With the remarkable progress of guidance, navigation and control technologies, unmanned ships (ASVs) have come to carry a considerable share of maritime operations. In most applications, unmanned ships are expected to operate safely without human intervention for extended periods of time. They therefore require sufficient safety and reliability to guarantee proper operation and avoid catastrophic consequences. However, unmanned ships are prone to faults, structural degradation, sensor failures and the like, and may consequently suffer performance degradation, instability, or even catastrophic loss.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide an unmanned ship fault-tolerant control method based on model reference reinforcement learning, which can recover system performance or maintain system operation after a fault occurs, thereby significantly improving system safety and reliability.
The first technical scheme adopted by the invention is as follows: a fault-tolerant control method of an unmanned ship based on model reference reinforcement learning comprises the following steps:
s1, analyzing uncertainty factors of the unmanned ship and constructing a nominal dynamics model of the unmanned ship;
s2, designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship;
s3, constructing a fault-tolerant controller based on model reference reinforcement learning, using a maximum-entropy Actor-Critic method, from the difference between the state variables of the actual unmanned ship system and the nominal dynamics model together with the output of the unmanned ship nominal controller;
and S4, building a reinforcement learning evaluation function and a control strategy model according to the control task requirements, and training a fault-tolerant controller to obtain a trained control strategy.
Further, the formula of the unmanned ship name meaning dynamic model is as follows:
in the above formula, η represents the generalized coordinate vector, v represents the generalized velocity vector, u represents the control force and moment, M represents the inertia matrix, C(v) comprises the Coriolis and centripetal terms, D(v) represents the damping matrix, G(v) represents the unmodeled dynamics due to gravity, buoyancy and moment, and B represents a preset input matrix.
Further, the formula of the nominal controller of the unmanned ship is as follows:
in the above formula, N_m and H_m comprise all known constant parameters of the unmanned ship dynamics model, η_m represents the generalized coordinate vector of the nominal model, u_m represents the control law, and x_m represents the state of the reference model.
Further, the formula of the fault-tolerant controller is as follows:
in the above formula, H_m − L represents a Hurwitz matrix, u_l represents the control strategy from the deep learning module, β(v) represents the set of all model uncertainties in the inner-loop dynamics, n_v represents the noise vector on the generalized velocity measurement, and f_v represents the sensor fault acting on the generalized velocity vector.
Further, the formula of the reinforcement learning evaluation function is expressed as follows:
Q^π(s_t, u_{l,t}) = T^π Q^π(s_t, u_{l,t})
in the above formula, u_{l,t} denotes the control excitation from the RL, s_t denotes the state signal at time step t, T^π denotes the Bellman operator under a fixed policy π, E_π denotes the expectation operator, γ denotes the discount factor, α denotes the temperature coefficient, and Q^π(s_t, u_{l,t}) denotes the reinforcement learning evaluation function.
Further, the formula of the control strategy model is expressed as follows:
in the above formula, Π represents the policy set, π_old represents the policy from the previous update, Q^{π_old} denotes the Q value of π_old, D_KL denotes the Kullback–Leibler (KL) divergence, Z^{π_old}(s_t) represents a normalization factor, π(·|s_t) represents a control strategy, and the dot denotes the omitted argument.
Further, the step of building a reinforcement learning evaluation function and a control strategy model and training the fault-tolerant controller to obtain a trained control strategy according to the control task requirements specifically comprises:
S41, building a reinforcement learning evaluation function and a control strategy model for the fault-tolerant controller based on model reference reinforcement learning according to the control task requirements.
S42, training the fault-tolerant controller based on model reference reinforcement learning to obtain an initial control strategy;
and S43, injecting faults into the unmanned ship system, retraining the initial control strategy and returning to the step S41 until the reinforcement learning evaluation function network model and the control strategy model converge.
Further, the method also comprises:
introducing a double-evaluation function model, and adding the entropy value of the strategy to the expected return function of the control strategy, wherein R_t is the reward function, R_t = R(s_t, u_{l,t}).
The method has the following beneficial effects: for an unmanned ship system subject to model uncertainty and sensor faults, the invention provides a reinforcement-learning-based fault-tolerant control algorithm that combines model reference reinforcement learning with a fault diagnosis and estimation mechanism. Considering Monte Carlo sampling efficiency, an Actor-Critic model is used to convert the accumulated return into a Q function. Through this new reinforcement-learning-based fault-tolerant control, the unmanned ship can learn to adapt to different sensor faults and recover its trajectory-tracking performance under fault conditions.
Drawings
FIG. 1 is a flow chart of steps of an unmanned ship fault-tolerant control method based on model reference reinforcement learning according to the invention;
fig. 2 is a block diagram of an Actor-Critic network according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
As shown in fig. 1, the present invention provides a fault-tolerant control method for an unmanned ship based on model reference Reinforcement Learning (RL), which includes the following steps:
s1, analyzing the inherent uncertainty factors of the unmanned ship, neglecting all nonlinear terms in the inner-loop dynamics to obtain a linear and decoupled model of the dynamics of the generalized velocity vector, and establishing a nominal dynamics model of the unmanned ship;
the dynamic model is specifically as follows:
η̇ = J(ψ)v, Mv̇ + C(v)v + D(v)v + G(v) = Bu (1)

where η = [x_p, y_p, ψ]^T is the generalized coordinate vector, x_p and y_p represent the horizontal coordinates of the ASV in the inertial frame, and ψ is the heading angle; v = [u_p, v_p, r_p]^T ∈ R^3 is the generalized velocity vector, u_p and v_p are the linear velocities in the x and y directions, and r_p is the heading angular rate; u = [τ_u, τ_r]^T is the vector of control force and moment; G(v) = [g_1(v), g_2(v), g_3(v)]^T ∈ R^3 is the unmodeled dynamics due to gravity, buoyancy and moment; and M ∈ R^{3×3} is an inertia matrix satisfying M = M^T > 0. The Coriolis and centripetal matrix C(v) is skew-symmetric with nonzero entries

C_13(v) = −M_22 v_p − M_23 r_p, C_23(v) = M_11 u_p.

The damping matrix D(v) has nonzero entries

D_11(v) = −X_u − X_{|u|u}|u_p| − X_{uuu} u_p^2, D_22(v) = −Y_v − Y_{|v|v}|v_p| − Y_{|r|v}|r_p|, D_23(v) = −Y_r − Y_{|v|r}|v_p| − Y_{|r|r}|r_p|, D_32(v) = −N_v − N_{|v|v}|v_p| − N_{|r|v}|r_p|, D_33(v) = −N_r − N_{|v|r}|v_p| − N_{|r|r}|r_p|,

where X_(·), Y_(·) and N_(·) are hydrodynamic coefficients whose definitions are detailed in handbooks of ship hydrodynamics and motion control. J(ψ) denotes the rotation matrix and B the input matrix.

Defining x = [η^T, v^T]^T, we have

η̇ = J(ψ)v, v̇ = −H(v)v − M^{-1}G(v) + Nu (5)

where H(v) = M^{-1}(C(v) + D(v)) and N = M^{-1}B.

The state measurement of the ASV system (1) is corrupted by noise and sensor faults and is therefore denoted y = x + n + f(t), where n ∈ R^6 is the measurement noise vector and f(t) ∈ R^6 represents a possible sensor fault vector. In the invention, only a sensor fault on the heading angular rate r_p is considered, so that f(t) = [0, 0, 0, 0, 0, f_r(t)]^T. The sensor fault f_r(t) is given by:

f_r(t) = β(t − T_f)φ(t − T_f)

where φ(t − T_f) is an unknown function of the sensor fault occurring at the time instant T_f, and β(t − T_f) is the fault time profile, with β(t − T_f) = 0 for t < T_f and β(t − T_f) = 1 − e^{−k(t−T_f)} for t ≥ T_f (k is the evolution rate of the fault). Note that if the sensor fault occurs abruptly, such as a bias fault, k → ∞. The object of the invention is to design a controller that allows the state x to track the reference state trajectory x_r in the presence of model uncertainties, possible sensor faults and measurement noise.
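To make the measurement model y = x + n + f(t) and the fault profile f_r(t) concrete, the following is a minimal numerical sketch. The fault magnitude, onset time T_f, evolution rate k and noise level are placeholder values chosen only for illustration, and the smooth 1 − e^{−k(t−T_f)} profile is the assumed form consistent with k → ∞ modelling an abrupt bias fault.

```python
import numpy as np

def fault_profile(t, T_f=20.0, k=0.5):
    """Fault time profile beta(t - T_f): 0 before the onset T_f, then growing toward 1.
    The 1 - exp(-k (t - T_f)) form is assumed; k -> infinity models an abrupt bias fault."""
    return 0.0 if t < T_f else 1.0 - np.exp(-k * (t - T_f))

def measure_state(x, t, rng, fault_size=0.3, noise_std=0.01):
    """Faulty measurement y = x + n + f(t); the fault acts only on the heading
    angular rate r_p (last component of x = [eta^T, v^T]^T), as in the text."""
    n = rng.normal(0.0, noise_std, size=6)      # measurement noise vector n
    f = np.zeros(6)
    f[5] = fault_profile(t) * fault_size        # f(t) = [0, 0, 0, 0, 0, f_r(t)]^T
    return x + n + f

rng = np.random.default_rng(0)
x = np.zeros(6)                                 # true state (placeholder)
print("healthy measurement (t = 5 s): ", measure_state(x, 5.0, rng))
print("faulty measurement  (t = 30 s):", measure_state(x, 30.0, rng))
```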
S2, designing a nominal controller of the unmanned ship based on the nominal dynamics model, and ensuring the basic stability of the unmanned ship system under the condition of no fault. And analyzing the nominal model of the unmanned ship.
The nominal controller design process is as follows:
the proposed RL-based FTC algorithm follows a model reference control structure. For most ASV systems, accurate nonlinear dynamical models are rarely available, with the main uncertainties coming from M, C (v) and d (v) due to fluid mechanics, and g (v) due to gravity and buoyancy and moments. Despite the uncertainty in the ASV dynamics, the nominal model (5) can still be used based on the known information of the ASV dynamics. The nominal model of the uncertain ASV model (5) is as follows:
wherein N ismAnd HmContains all known constant parameters of the ASV dynamics (5),is the generalized coordinate vector of the nominal model. In the present invention, MmIs formed by Mm=diag{M11,M22,M33< derived, Hm=Mm -1DmFrom Dm=diag{-Xu,-Yv,-NrAnd Nm=Mm -1And B, obtaining. Therefore, in the nominal model, all nonlinear terms in the inner loop dynamics are ignored, and therefore, the linear solution of the generalized velocity vector v kinetic equation is finally obtainedAnd (4) coupling the model. Since the dynamics of the nominal model (6) are known, it is possible to design the control law umTo allow the state of the nominal system (6) to converge to the reference signal xrE.g., | | x when t → ∞ time | | xm-xr||2→ 0. Control law umCan also be used as a nominal controller by the whole ASV dynamics (5).
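As a minimal illustration of how the nominal inner-loop model v̇_m = −H_m v_m + N_m u_m can be assembled from the known constant parameters and simulated, consider the sketch below. The numerical values of M_m, D_m and B are placeholders for demonstration only and are not taken from the patent.

```python
import numpy as np

# Placeholder known constants (illustrative values only, not from the patent)
M_m = np.diag([25.8, 33.8, 2.76])        # M_m = diag{M11, M22, M33}
D_m = np.diag([2.0, 7.0, 0.5])           # D_m = diag{-Xu, -Yv, -Nr}
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])               # input matrix: [tau_u, tau_r] -> 3-DOF forces

H_m = np.linalg.solve(M_m, D_m)          # H_m = M_m^{-1} D_m
N_m = np.linalg.solve(M_m, B)            # N_m = M_m^{-1} B

def nominal_step(v_m, u_m, dt=0.01):
    """One Euler step of the linear, decoupled nominal inner loop:
    v_m_dot = -H_m v_m + N_m u_m."""
    return v_m + dt * (-H_m @ v_m + N_m @ u_m)

v_m = np.zeros(3)
for _ in range(1000):                    # 10 s rollout under a constant nominal input
    v_m = nominal_step(v_m, np.array([1.0, 0.05]))
print("nominal generalized velocity after 10 s:", v_m)
```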
In the model reference control architecture, the goal is to design a control law that allows the states of (5) to track the state trajectory of the nominal model (6). The overall control law of the ASV system (5) has the following expression:
u = u_b + u_l (7)

where u_b is the model-based nominal control and u_l is the control strategy from the deep learning module. The baseline control u_b ensures some basic properties (i.e., local stability), while u_l compensates for all system uncertainties and sensor faults.
S3, constructing the fault-tolerant controller based on model reference reinforcement learning by taking as inputs the difference between the state variables of the actual unmanned ship system and the nominal model, together with the output of the nominal controller.
A network block diagram of the Actor-Critic is shown in fig. 2; the specific derivation of the fault-tolerant controller is as follows:
the formula for RL is based on a Markov decision process MDP represented by a tuple<S,U,P,R,γ>Where S is the state space, U specifies the operation/input space, P: S × U × S → R defines the transition probability, R: S × U → R is a reward function, γ ∈ [0,1) is a discount coefficient. In MDP, the state vector S ∈ S contains the influence RL control ulAll available signals of e U. For the tracking control of the ASV system in the invention, the transition probability is determined by the ASV dynamic and the reference signal x in (1)rAnd (5) characterizing. In the RL, the control strategy is learning using data samples collected in the discrete time domain. Let stIs the state signal s at the time step t, respectively ul,tIs the input of the RL-based control at time step t. The RL algorithm of the present invention aims to maximize a cost of action function, also called the Q function, such asShown below:
wherein R istIs a reward function, Rt=R(st,ul,t),And V isπ(st+1) is called s under strategy πtA state value function of +1, wherein
Wherein pi (u)l,t|st) Is a control strategy that is based on the fact that,is the entropy of the strategy and α is the temperature parameter. Control strategy pi (u) in RLl,t|st) Is to select action ul,tE.g. U in state stE.g., the probability under S. In the present invention, a control strategy is employed that satisfies a Gaussian distribution, i.e.
π(u_l|s) = N(u_l(s), σ) (10)

where N(·, ·) denotes a Gaussian distribution, u_l(s) is the mean, and σ is the covariance matrix. The covariance matrix σ controls the exploration behavior of the learning phase.
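A short sketch of such a Gaussian control strategy π(u_l|s) = N(u_l(s), σ) is given below: actions are sampled from the distribution during learning (the covariance drives exploration), and the mean is used once the covariance has shrunk. The state/action dimensions and the linear mean map are illustrative assumptions; in the invention the mean is a learned neural network.

```python
import numpy as np

class GaussianPolicy:
    """pi(u_l | s) = N(mean(s), Sigma): sample while learning, use the mean at deployment."""
    def __init__(self, state_dim=8, action_dim=2, sigma=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        # Placeholder linear mean map; a learned network would be used in practice.
        self.W = 0.01 * self.rng.standard_normal((action_dim, state_dim))
        self.Sigma = (sigma ** 2) * np.eye(action_dim)   # exploration covariance

    def mean(self, s):
        return self.W @ s

    def sample(self, s):
        return self.rng.multivariate_normal(self.mean(s), self.Sigma)

policy = GaussianPolicy()
s = np.ones(8)
print("exploratory action:  ", policy.sample(s))
print("deterministic action:", policy.mean(s))   # used once sigma -> 0 after learning
```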
The goal of RL is to find an optimal control strategy π* that maximizes Q^π(s_t, u_{l,t}) in (8), i.e.,

π* = argmax_π Q^π(s_t, u_{l,t}) (11)

Note that once the optimal strategy π*(u_l*|s) = N(u_l*(s), σ*) is obtained, the variance σ* converges to 0, and the mean function u_l*(s) is the optimal control law to be learned. The deep neural network Q_θ(s_t, u_{l,t}) is called the critic, and the control strategy π_φ(u_{l,t}|s_t) is called the actor. Rewriting the uncertain inner-loop dynamics of the ASV model (5) as:
v̇ = −H_m v + N_m u + β(v) (12)

where β(v) is the set of all model uncertainties in the inner-loop dynamics. The uncertainty term β(v) is assumed to be bounded. Let e_v = v − v_m; according to (6) and (12), the error dynamics are:

ė_v = −H_m e_v + N_m u_l + β(v) (13)

Under healthy conditions, the model uncertainty term β(v) can be fully compensated by the learning-based control u_l. This means ||e_v(t)||_2 ≤ ε as t → ∞, where ε is some small positive constant. If a sensor fault occurs, the error signal e_v will become larger than ε. A naive idea for learning-based fault-tolerant control (FTC) is to treat sensor faults as part of an external disturbance. However, treating sensor faults as disturbances leads to a conservative learning-based control, similar to robust control. Therefore, we introduce a fault diagnosis and estimation mechanism that allows the learning-based control to adapt to different scenarios: healthy and faulty conditions.

Let y_v = v + n_v + f_v, where n_v is the noise vector on the generalized velocity measurement and, correspondingly, f_v is the sensor fault acting on the generalized velocity vector. In addition, we define ẽ_v = y_v − v_m as the fault tracking error vector. In practical applications, ẽ_v is measurable instead of e_v. Finally, the following fault diagnosis and estimation mechanism (14) is introduced:

where L is selected such that H_m − L is Hurwitz. The output signal of the mechanism serves as an indicator of the occurrence and intensity of a sensor fault. Substituting the above definitions, we obtain the residual dynamics, in which H_m − L represents a Hurwitz matrix, u_l represents the control strategy from the deep learning module, β(v) represents the set of all model uncertainties in the inner-loop dynamics, n_v represents the noise vector on the generalized velocity measurement, and f_v represents the sensor fault acting on the generalized velocity vector.
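The exact filter equations of the mechanism are given by the patent's formula (14); purely as a simplified, assumed illustration of the underlying idea (flag a fault once the measurable error ẽ_v = y_v − v_m persistently exceeds its healthy-case bound ε), one could use something like the following sketch. The threshold, window length and signal values are placeholders, and this is not the patent's estimator.

```python
import numpy as np

def fault_indicator(e_tilde_history, eps=0.05, window=50):
    """Simplified fault indicator: average the norm of the measurable fault tracking
    error e~_v = y_v - v_m over a sliding window and compare it with the healthy-case
    bound eps. Returns (fault_detected, filtered_magnitude)."""
    recent = np.asarray(e_tilde_history[-window:])
    magnitude = float(np.mean(np.linalg.norm(recent, axis=1)))
    return magnitude > eps, magnitude

# Example: a healthy phase followed by a fault on the heading-rate channel
rng = np.random.default_rng(1)
healthy = [rng.normal(0.0, 0.01, size=3) for _ in range(200)]
faulty = [rng.normal(0.0, 0.01, size=3) + np.array([0.0, 0.0, 0.2]) for _ in range(200)]
print("healthy segment:", fault_indicator(healthy))
print("faulty segment: ", fault_indicator(healthy + faulty))
```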
S4, designing a corresponding reward function according to the control task requirements, and building a reinforcement learning evaluation function model (Q-value) and a control strategy model using fully connected networks.

The reward function, the evaluation function and the control strategy model are derived as follows:
The RL-based fault-tolerant control is derived using the output of the fault diagnosis and estimation mechanism. The RL learns the control strategy at discrete time steps using data samples (including input and state data). The sampling time step is assumed to be fixed and is denoted by δt. Without loss of generality, let y_t, u_{b,t}, u_{l,t} and the estimator output denote the ASV state, the nominal controller excitation, the control excitation from the RL, and the output of the fault diagnosis and estimation mechanism at time step t, respectively; the state signal s_t at time step t is composed of these signals. The training process of the RL repeatedly performs policy evaluation and policy improvement. In policy evaluation, the Q-value is obtained through the Bellman operator, Q^π(s_t, u_{l,t}) = T^π Q^π(s_t, u_{l,t}), where

T^π Q^π(s_t, u_{l,t}) = R_t + γ E_π[Q^π(s_{t+1}, u_{l,t+1}) − α log π(u_{l,t+1}|s_{t+1})]

In the above formula, u_{l,t} denotes the control excitation from the RL, s_t denotes the state signal at time step t, T^π denotes the Bellman operator under the fixed policy π, E_π denotes the expectation operator, γ denotes the discount factor, α denotes the temperature coefficient, and Q^π(s_t, u_{l,t}) denotes the reinforcement learning evaluation function.
In policy refinement, the policy is updated by:
π_new = argmin_{π'∈Π} D_KL(π'(·|s_t) ‖ exp(Q^{π_old}(s_t, ·)) / Z^{π_old}(s_t))

where Π represents the policy set, π_old the policy from the previous update, Q^{π_old} the Q value of π_old, D_KL the Kullback–Leibler (KL) divergence, and Z^{π_old}(s_t) a normalization factor. Through mathematical manipulation, this objective is converted into the parameterized form minimized in the policy improvement step below.
S5, introducing a double-evaluation function model idea into an evaluation function training framework, and adding an entropy value of a strategy into a control strategy expected return function to improve the reinforcement learning training efficiency.
The dual-evaluation function model is derived as follows:
parameterizing the Q function by theta, and using Qθ(st,ul,t) And (4) showing. The parameterization strategy consists ofφ(ul,t|st) Where phi is the parameter set to be trained. Note that both θ and φ are a set of parameters, the size of which is determined by the deep neural network settings. For example, if QθRepresented by an MLP with K hidden layers and L neurons per hidden layer, the parameter set θ is then θ ═ θ0,θ1,...,θKAnd at 1. ltoreq. i.ltoreq.K-1θK∈R1×(L+1),θi∈R(L)×(L+1)Wherein dimsSize of state s, dimuRepresenting input ulThe size of (c).
The training session runs offline, and data samples are collected at each time step t + 1, namely the input u_{l,t} from the previous time step, the state s_t of the previous time step, the reward R_t, and the current state s_{t+1}. These historical data are stored in the memory pool D as tuples (s_t, u_{l,t}, R_t, s_{t+1}). In each policy evaluation or improvement step, a batch of historical data B is randomly drawn from the memory pool D to train the parameters θ and φ. At the beginning of training, the nominal control strategy u_b is applied to the ASV system to collect the initial data set D_0, as shown in Algorithm 1. The initial data set D_0 is used for the initial fitting of the Q function. After initialization, both u_b and the newly updated reinforcement learning strategy π_φ(u_{l,t}|s_t) are executed to operate the ASV system.
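A minimal sketch of the memory pool D and the mini-batch sampling described above (tuples (s_t, u_{l,t}, R_t, s_{t+1}) stored and drawn at random) is given below; the capacity and batch size are illustrative.

```python
import random
from collections import deque

class MemoryPool:
    """Replay memory D storing tuples (s_t, u_l_t, R_t, s_t+1)."""
    def __init__(self, capacity=100_000, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, s, u_l, reward, s_next):
        self.buffer.append((s, u_l, reward, s_next))

    def sample_batch(self, batch_size=256):
        """Randomly draw a mini-batch B used to train the critic (theta) and actor (phi)."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

# Usage: seed D with data collected under the nominal controller u_b (the set D_0),
# then keep appending transitions while training proceeds.
pool = MemoryPool()
pool.store(s=[0.0] * 8, u_l=[0.0, 0.0], reward=-1.0, s_next=[0.1] * 8)
print(len(pool.sample_batch(batch_size=1)), "transition(s) sampled")
```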
The parameters θ of the Q function are trained to minimize the Bellman residual:

J_Q(θ) = E_{(s_t, u_{l,t})∼D}[(Q_θ(s_t, u_{l,t}) − Y_target)^2 / 2] (15)

where (s_t, u_{l,t}) ∼ D denotes samples (s_t, u_{l,t}) drawn randomly from the memory pool D, and θ̄ is a target parameter that is updated slowly. The DNN parameter θ is obtained by applying stochastic gradient descent to (15) on the data batch B, whose size is denoted by |B|. The invention uses two critics, parameterized by θ_1 and θ_2, respectively; both are introduced to reduce the overestimation problem in critic neural network training. Under the double evaluation function, the target value Y_target is:

Y_target = R_t + γ(min_{j=1,2} Q_{θ̄_j}(s_{t+1}, u_{l,t+1}) − α log π_φ(u_{l,t+1}|s_{t+1})), with u_{l,t+1} ∼ π_φ(·|s_{t+1}) (16)
the policy refinement step is to use the data samples in memory pool D to achieve the following parameterized objective function minimization:
the parameter phi is trained to a minimum using a stochastic gradient descent method, and in the training phase, the actor neural network is represented as:
whereinIs the parameterized control law to be learned,is the standard deviation of the detection noise, ξ -N (0, I) are the detection noise, and "" is the Hadamard product. Note that the detection noise ξ is only applicable in the training phase, and once training is complete, only needs to be in useTherefore, u in the training phaselEquivalent to ul,φ. Once training is over, get
The temperature parameter α is also updated during the training phase. The update is obtained by minimizing the following objective function:
where H̄ denotes the target entropy of the strategy. In the invention, H̄ = −2 is used, where "2" represents the action dimension.
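To make the twin-critic target, the reparameterized actor update and the temperature update concrete, below is a compact PyTorch sketch of one gradient step of a generic maximum-entropy Actor-Critic (SAC-style) update. The network sizes, learning rates, log-standard-deviation clamp range and target entropy are illustrative assumptions; the sketch shows the general technique rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

S_DIM, A_DIM, GAMMA, TAU = 8, 2, 0.99, 0.005      # illustrative sizes and constants

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, out_dim))

q1, q2 = mlp(S_DIM + A_DIM, 1), mlp(S_DIM + A_DIM, 1)            # twin critics (theta_1, theta_2)
q1_targ, q2_targ = mlp(S_DIM + A_DIM, 1), mlp(S_DIM + A_DIM, 1)  # slowly updated targets
q1_targ.load_state_dict(q1.state_dict()); q2_targ.load_state_dict(q2.state_dict())
actor = mlp(S_DIM, 2 * A_DIM)                                    # outputs mean and log-std (phi)
log_alpha = torch.zeros(1, requires_grad=True)                   # temperature parameter alpha
target_entropy = -float(A_DIM)                                   # e.g. -2 for a 2-D action
opt_q = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)
opt_pi = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_alpha = torch.optim.Adam([log_alpha], lr=3e-4)

def sample_action(s):
    """Reparameterized Gaussian policy: u_l = mean(s) + std(s) * xi, xi ~ N(0, I)."""
    mean, log_std = actor(s).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
    u = dist.rsample()
    return u, dist.log_prob(u).sum(-1, keepdim=True)

def update(s, u, r, s_next):
    alpha = log_alpha.exp()
    with torch.no_grad():                                         # twin-critic target value
        u_next, logp_next = sample_action(s_next)
        q_next = torch.min(q1_targ(torch.cat([s_next, u_next], -1)),
                           q2_targ(torch.cat([s_next, u_next], -1)))
        y = r + GAMMA * (q_next - alpha * logp_next)
    sa = torch.cat([s, u], -1)
    loss_q = ((q1(sa) - y) ** 2).mean() + ((q2(sa) - y) ** 2).mean()   # Bellman residual
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()

    u_pi, logp = sample_action(s)                                 # actor (policy improvement)
    q_pi = torch.min(q1(torch.cat([s, u_pi], -1)), q2(torch.cat([s, u_pi], -1)))
    loss_pi = (alpha.detach() * logp - q_pi).mean()
    opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()

    loss_alpha = -(log_alpha * (logp.detach() + target_entropy)).mean()  # temperature update
    opt_alpha.zero_grad(); loss_alpha.backward(); opt_alpha.step()

    with torch.no_grad():                                         # slow (Polyak) target update
        for net, targ in ((q1, q1_targ), (q2, q2_targ)):
            for p, pt in zip(net.parameters(), targ.parameters()):
                pt.mul_(1 - TAU).add_(TAU * p)

update(torch.randn(32, S_DIM), torch.randn(32, A_DIM), torch.randn(32, 1), torch.randn(32, S_DIM))
print("one SAC-style gradient step completed")
```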
And S6, training the controller based on model reference reinforcement learning under the condition of no fault, obtaining an initial control strategy, and ensuring the robustness of the overall controller to the model uncertainty.
And S7, injecting faults into the unmanned ship system, retraining the acquired initial control strategy based on model reference reinforcement learning, and realizing the adaptability of the overall controller to partial sensor faults.
And S8, under different initial state conditions, continuously repeating the step S6 and the step S7 until the reinforcement learning evaluation function network model and the control strategy model converge.
Specifically, the training process of steps S6-S8 is as follows:
1) Initialize the critic parameters θ_1 and θ_2 and the actor network parameters φ;
2) Assign values to the target parameters: θ̄_1 ← θ_1, θ̄_2 ← θ_2;
3) Run the system (5) with u_l = 0, i.e., u = u_b, to obtain the data set D_0;
4) At the end of this initial phase, use the data set D_0 to train the initial critic parameters θ_1^0 and θ_2^0;
5) Initialize the memory pool D ← D_0;
6) Assign initial values to the critic parameters and their targets: θ_1 ← θ_1^0, θ_2 ← θ_2^0, θ̄_1 ← θ_1^0, θ̄_2 ← θ_2^0;
7) Repeat;
8) For each data collection step, do:
9) Select an action u_{l,t} according to π_φ(u_{l,t}|s_t);
10) Operate the nominal system (6), the entire system (5) and the fault diagnosis and estimation mechanism (14), and collect s_{t+1} = {x_{t+1}, x_{m,t+1}, u_{b,t+1}};
11) D ← D ∪ {s_t, u_{l,t}, R(s_t, u_{l,t}), s_{t+1}};
12) End the loop;
13) For each gradient update step, do:
14) Draw a batch of data B from D;
15) θ_j ← θ_j − ι_Q ∇_θ J_Q(θ_j), j = 1, 2;
16) φ ← φ − ι_π ∇_φ J_π(φ);
17) α ← α − ι_α ∇_α J_α(α);
18) θ̄_j ← κ θ_j + (1 − κ) θ̄_j, j = 1, 2;
19) End the loop;
20) Until convergence (e.g., J_Q(θ) below a small threshold).
In the algorithm, ι_Q, ι_π and ι_α are positive learning rates (scalars), and κ > 0 is a constant scalar.
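For orientation, the control flow of Algorithm 1 can be sketched as follows. Every function here (step_systems, reward, sample_action, gradient_updates) is a deliberately trivial placeholder standing in for the nominal system (6), the full system (5), the fault diagnosis mechanism (14), the policy π_φ and the gradient updates of J_Q, J_π and J_α; the sketch only shows how the data-collection and gradient-update loops interleave.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Trivial placeholders for the ASV systems and the learning updates ----------------
def step_systems(s, u_l):               # stands in for nominal (6), full system (5) and FDI (14)
    return s + 0.01 * rng.standard_normal(s.shape)

def reward(s, u_l):                     # stands in for the task reward R(s_t, u_l,t)
    return -float(np.linalg.norm(s))

def sample_action(phi, s):              # stands in for pi_phi(u_l | s)
    return phi @ s + 0.1 * rng.standard_normal(2)

def gradient_updates(batch, params):    # stands in for steps 15-18 (J_Q, J_pi, J_alpha, targets)
    return params                       # a real update would follow the SAC-style step above

# --- Algorithm 1 skeleton --------------------------------------------------------------
params, phi, s = {}, np.zeros((2, 8)), np.zeros(8)
D = []                                                  # memory pool (steps 3-6: seed with u_b data)
for iteration in range(5):                              # step 7: repeat until convergence
    for _ in range(20):                                 # steps 8-12: data collection loop
        u_l = sample_action(phi, s)                     # step 9
        s_next = step_systems(s, u_l)                   # step 10
        D.append((s, u_l, reward(s, u_l), s_next))      # step 11
        s = s_next
    for _ in range(20):                                 # steps 13-19: gradient update loop
        idx = rng.choice(len(D), size=min(8, len(D)), replace=False)
        params = gradient_updates([D[i] for i in idx], params)   # steps 14-18
print("collected", len(D), "transitions over 5 iterations")
```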
An unmanned ship fault-tolerant control system based on model reference reinforcement learning comprises:
the dynamic model building module is used for analyzing uncertainty factors of the unmanned ship and building a nominal dynamic model of the unmanned ship;
the controller design module is used for designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship;
the fault-tolerant controller building module is used for constructing a fault-tolerant controller based on model reference reinforcement learning, using a maximum-entropy Actor-Critic method, from the difference between the state variables of the actual unmanned ship system and the nominal dynamics model together with the output of the unmanned ship nominal controller;
and the training module is used for building a reinforcement learning evaluation function and a control strategy model and training the fault-tolerant controller according to the control task requirements to obtain a trained control strategy.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
A model reference reinforcement learning-based unmanned ship fault-tolerant control device comprises:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the unmanned ship fault-tolerant control method based on model reference reinforcement learning as described above.
The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.
A storage medium having stored therein instructions executable by a processor, the storage medium comprising: the processor-executable instructions, when executed by the processor, are for implementing a model reference reinforcement learning based unmanned ship fault tolerance control method as described above.
The contents in the above method embodiments are all applicable to the present storage medium embodiment, the functions specifically implemented by the present storage medium embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present storage medium embodiment are also the same as those achieved by the above method embodiments.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A fault-tolerant control method of an unmanned ship based on model reference reinforcement learning is characterized by comprising the following steps:
s1, analyzing uncertainty factors of the unmanned ship and constructing a nominal dynamics model of the unmanned ship;
s2, designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship;
s3, constructing a fault-tolerant controller based on model reference reinforcement learning, using a maximum-entropy Actor-Critic method, from the difference between the state variables of the actual unmanned ship system and the nominal dynamics model together with the output of the unmanned ship nominal controller;
and S4, building a reinforcement learning evaluation function and a control strategy model according to the control task requirements, and training a fault-tolerant controller to obtain a trained control strategy.
2. The unmanned ship fault-tolerant control method based on model reference reinforcement learning of claim 1, wherein the formula of the unmanned ship nominal dynamics model is as follows:
in the above formula, η represents the generalized coordinate vector, v represents the generalized velocity vector, u represents the control force and moment, M represents the inertia matrix, C(v) comprises the Coriolis and centripetal terms, D(v) represents the damping matrix, G(v) represents the unmodeled dynamics due to gravity, buoyancy and moment, and B represents a preset input matrix.
3. The unmanned ship fault-tolerant control method based on model reference reinforcement learning as claimed in claim 2, wherein the formula of the unmanned ship nominal controller is as follows:
in the above formula, N_m and H_m comprise all known constant parameters of the unmanned ship dynamics model, η_m represents the generalized coordinate vector of the nominal model, u_m represents the control law, and x_m represents the state of the reference model.
4. The unmanned ship fault-tolerant control method based on model reference reinforcement learning as claimed in claim 3, wherein the formula of the fault-tolerant controller is as follows:
in the above formula, H_m − L represents a Hurwitz matrix, u_l represents the control strategy from the deep learning module, β(v) represents the set of all model uncertainties in the inner-loop dynamics, n_v represents the noise vector on the generalized velocity measurement, and f_v represents the sensor fault acting on the generalized velocity vector.
5. The unmanned ship fault-tolerant control method based on model reference reinforcement learning as claimed in claim 4, wherein the formula of the reinforcement learning evaluation function is expressed as follows:
Q^π(s_t, u_{l,t}) = T^π Q^π(s_t, u_{l,t})
in the above formula, u_{l,t} denotes the control excitation from the RL, s_t denotes the state signal at time step t, T^π denotes the Bellman operator under a fixed policy π, E_π denotes the expectation operator, γ denotes the discount factor, α denotes the temperature coefficient, and Q^π(s_t, u_{l,t}) denotes the reinforcement learning evaluation function.
6. The unmanned ship fault-tolerant control method based on model reference reinforcement learning as claimed in claim 4, wherein the formula of the control strategy model is as follows:
7. The unmanned ship fault-tolerant control method based on model reference reinforcement learning according to claim 1, wherein the step of constructing a reinforcement learning evaluation function and a control strategy model and training a fault-tolerant controller to obtain a trained control strategy according to control task requirements specifically comprises:
S41, building a reinforcement learning evaluation function and a control strategy model for the fault-tolerant controller based on model reference reinforcement learning according to the control task requirements.
S42, training the fault-tolerant controller based on model reference reinforcement learning to obtain an initial control strategy;
and S43, injecting faults into the unmanned ship system, retraining the initial control strategy and returning to the step S41 until the reinforcement learning evaluation function network model and the control strategy model converge.
8. The unmanned ship fault-tolerant control method based on model reference reinforcement learning according to claim 7, further comprising:
and introducing a double-evaluation function model, and adding an entropy value of the strategy into the expected return function of the control strategy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111631716.8A CN114296350B (en) | 2021-12-28 | 2021-12-28 | Unmanned ship fault-tolerant control method based on model reference reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111631716.8A CN114296350B (en) | 2021-12-28 | 2021-12-28 | Unmanned ship fault-tolerant control method based on model reference reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114296350A true CN114296350A (en) | 2022-04-08 |
CN114296350B CN114296350B (en) | 2023-11-03 |
Family
ID=80972328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111631716.8A Active CN114296350B (en) | 2021-12-28 | 2021-12-28 | Unmanned ship fault-tolerant control method based on model reference reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114296350B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110109355A (en) * | 2019-04-29 | 2019-08-09 | 山东科技大学 | A kind of unmanned boat unusual service condition self-healing control method based on intensified learning |
CN111694365A (en) * | 2020-07-01 | 2020-09-22 | 武汉理工大学 | Unmanned ship formation path tracking method based on deep reinforcement learning |
-
2021
- 2021-12-28 CN CN202111631716.8A patent/CN114296350B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110109355A (en) * | 2019-04-29 | 2019-08-09 | 山东科技大学 | A kind of unmanned boat unusual service condition self-healing control method based on intensified learning |
CN111694365A (en) * | 2020-07-01 | 2020-09-22 | 武汉理工大学 | Unmanned ship formation path tracking method based on deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
ZHANG QINGRUI et al.: "Fault tolerant control for autonomous surface vehicles via model reference reinforcement learning", 2021 60th IEEE Conference on Decision and Control (CDC) *
ZHANG QINGRUI et al.: "Model-reference reinforcement learning control of autonomous surface vehicles", 2020 59th IEEE Conference on Decision and Control (CDC) *
Also Published As
Publication number | Publication date |
---|---|
CN114296350B (en) | 2023-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Peng et al. | Predictor-based neural dynamic surface control for uncertain nonlinear systems in strict-feedback form | |
Elhaki et al. | Reinforcement learning-based saturated adaptive robust neural-network control of underactuated autonomous underwater vehicles | |
Chen et al. | Adaptive optimal tracking control of an underactuated surface vessel using actor–critic reinforcement learning | |
Gong et al. | Lyapunov-based model predictive control trajectory tracking for an autonomous underwater vehicle with external disturbances | |
Hassanein et al. | Model-based adaptive control system for autonomous underwater vehicles | |
Alessandri | Fault diagnosis for nonlinear systems using a bank of neural estimators | |
Buciakowski et al. | A quadratic boundedness approach to robust DC motor fault estimation | |
Wang et al. | Extended state observer-based fixed-time trajectory tracking control of autonomous surface vessels with uncertainties and output constraints | |
Bejarbaneh et al. | Design of robust control based on linear matrix inequality and a novel hybrid PSO search technique for autonomous underwater vehicle | |
Zhang et al. | Disturbance observer-based prescribed performance super-twisting sliding mode control for autonomous surface vessels | |
Zhang et al. | Adaptive asymptotic tracking control for autonomous underwater vehicles with non-vanishing uncertainties and input saturation | |
CN113110430A (en) | Model-free fixed-time accurate trajectory tracking control method for unmanned ship | |
Li et al. | Finite-time composite learning control for trajectory tracking of dynamic positioning vessels | |
Wang et al. | Event-triggered model-parameter-free trajectory tracking control for autonomous underwater vehicles | |
CN116880184A (en) | Unmanned ship track tracking prediction control method, unmanned ship track tracking prediction control system and storage medium | |
CN114296350A (en) | Unmanned ship fault-tolerant control method based on model reference reinforcement learning | |
Liu et al. | Robust Adaptive Self‐Structuring Neural Network Bounded Target Tracking Control of Underactuated Surface Vessels | |
Gao et al. | Data-driven model-free resilient speed control of an autonomous surface vehicle in the presence of actuator anomalies | |
CN114755917B (en) | Model-free self-adaptive anti-interference ship speed controller and design method | |
Sola et al. | Evaluation of a deep-reinforcement-learning-based controller for the control of an autonomous underwater vehicle | |
Chen et al. | An optimization approach to extend control period for dynamics control of Autonomous Underwater Vehicles with X-form rudders | |
Bao et al. | Model-free control design using policy gradient reinforcement learning in lpv framework | |
Alme | Autotuned dynamic positioning for marine surface vessels | |
Caiti et al. | Enhancing autonomy: Fault detection, identification and optimal reaction for over—Actuated AUVs | |
He et al. | Gaussian process based robust trajectory tracking of autonomous underwater vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |