CN114296350A - Unmanned ship fault-tolerant control method based on model reference reinforcement learning - Google Patents

Unmanned ship fault-tolerant control method based on model reference reinforcement learning

Info

Publication number
CN114296350A
CN114296350A (application CN202111631716.8A)
Authority
CN
China
Prior art keywords
unmanned ship
model
fault
reinforcement learning
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111631716.8A
Other languages
Chinese (zh)
Other versions
CN114296350B (en)
Inventor
张清瑞
熊培轩
张雷
朱波
胡天江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202111631716.8A priority Critical patent/CN114296350B/en
Publication of CN114296350A publication Critical patent/CN114296350A/en
Application granted granted Critical
Publication of CN114296350B publication Critical patent/CN114296350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Feedback Control In General (AREA)

Abstract

The invention discloses an unmanned ship fault-tolerant control method based on model reference reinforcement learning, which comprises the following steps: analyzing uncertainty factors of the unmanned ship and constructing a nominal dynamics model of the unmanned ship; designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship; constructing a fault-tolerant controller based on model reference reinforcement learning, using a maximum-entropy Actor-Critic method, from the difference between the state variables of the actual unmanned ship system and the unmanned ship nominal dynamics model and from the output of the unmanned ship nominal controller; and building a reinforcement learning evaluation function and a control strategy model according to the control task requirements and training the fault-tolerant controller to obtain a trained control strategy. By using the method, the safety and reliability of the unmanned ship system can be obviously improved. The unmanned ship fault-tolerant control method based on model reference reinforcement learning can be widely applied in the field of unmanned ship control.

Description

Unmanned ship fault-tolerant control method based on model reference reinforcement learning
Technical Field
The invention relates to the field of unmanned ship control, in particular to an unmanned ship fault-tolerant control method based on model reference reinforcement learning.
Background
With the remarkable progress of guidance, navigation and control technologies, unmanned ships (autonomous surface vehicles, ASVs) have come to play an increasingly important role in marine applications. In most applications, unmanned ships are expected to operate safely without human intervention for extended periods of time. They therefore need sufficient safety and reliability to guarantee proper operation and avoid catastrophic consequences. However, unmanned ships are prone to faults, structural degradation of the system, sensor failures, and the like, and may consequently suffer performance degradation, instability, or even catastrophic loss.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide an unmanned ship fault-tolerant control method based on model reference reinforcement learning, which can recover system performance or maintain system operation after a fault occurs, thereby significantly improving system safety and reliability.
The first technical scheme adopted by the invention is as follows: a fault-tolerant control method of an unmanned ship based on model reference reinforcement learning comprises the following steps:
s1, analyzing uncertainty factors of the unmanned ship and constructing a nominal dynamics model of the unmanned ship;
s2, designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship;
s3, constructing a fault-tolerant controller based on model reference reinforcement learning by a maximum-entropy Actor-Critic method, according to the difference between the state variables of the actual unmanned ship system and the unmanned ship nominal dynamics model and the output of the unmanned ship nominal controller;
and S4, building a reinforcement learning evaluation function and a control strategy model according to the control task requirements, and training a fault-tolerant controller to obtain a trained control strategy.
Further, the formula of the unmanned ship nominal dynamics model is as follows:

dη/dt = J(η) v
M dv/dt + C(v) v + D(v) v + G(v) = B u

In the above formula, η represents the generalized coordinate vector, v represents the generalized velocity vector, u represents the control forces and moments, M represents the inertia matrix, C(v) comprises the Coriolis and centripetal forces, D(v) represents the damping matrix, G(v) represents the unmodeled dynamics due to gravity, buoyancy and moments, and B represents a preset input matrix.
Further, the formula of the nominal controller of the unmanned ship is as follows:

dx_m/dt = [ J(η_m) v_m ; -H_m v_m + N_m u_m ]

In the above formula, N_m and H_m comprise all known constant parameters of the unmanned ship dynamics model, η_m represents the generalized coordinate vector of the nominal model, u_m represents the control law, and x_m = [η_m^T, v_m^T]^T represents the state of the reference model.
Further, the formula of the fault-tolerant controller is as follows:

[fault diagnosis and estimation formula, provided as an image in the original publication]

In the above formula, H_m - L represents a Hurwitz matrix, u_l represents the control strategy from the deep reinforcement learning module, β(v) represents the set of all model uncertainties in the inner-loop dynamics, n_v represents the noise vector on the generalized velocity measurement, and f_v represents the sensor fault acting on the generalized velocity vector.
Further, the formula of the reinforcement learning evaluation function is expressed as follows:

Q^π(s_t, u_{l,t}) = T^π Q^π(s_t, u_{l,t})
T^π Q^π(s_t, u_{l,t}) = E_π[ R_t + γ ( Q^π(s_{t+1}, u_{l,t+1}) - α log π(u_{l,t+1} | s_{t+1}) ) ]

In the above formula, u_{l,t} denotes the control excitation from the RL, s_t denotes the state signal at time step t, T^π denotes the Bellman backup operator under a fixed policy π, E_π denotes the expectation operator, γ denotes the discount factor, α denotes the temperature coefficient, and Q^π(s_t, u_{l,t}) denotes the reinforcement learning evaluation function.
Further, the formula of the control strategy model is expressed as follows:

π_new = argmin_{π'∈Π} D_KL( π'(·|s_t) || exp( Q^{π_old}(s_t, ·) / α ) / Z^{π_old}(s_t) )

In the above formula, Π represents the policy set, π_old represents the previously updated policy, Q^{π_old} denotes the Q value of π_old, D_KL denotes the Kullback-Leibler (KL) divergence, Z^{π_old}(s_t) represents a normalization factor, and π'(·|s_t) represents a control strategy, where the dot denotes an omitted argument.
Further, the step of building a reinforcement learning evaluation function and a control strategy model and training the fault-tolerant controller according to the control task requirements to obtain a trained control strategy specifically comprises:
S41, building a reinforcement learning evaluation function and a control strategy model for the fault-tolerant controller based on model reference reinforcement learning according to the control task requirements;
S42, training the fault-tolerant controller based on model reference reinforcement learning to obtain an initial control strategy;
and S43, injecting faults into the unmanned ship system, retraining the initial control strategy and returning to the step S41 until the reinforcement learning evaluation function network model and the control strategy model converge.
Further, the method also comprises:
introducing a dual evaluation function model, and adding the entropy value of the strategy into the expected return function of the control strategy, wherein R_t is the reward function, R_t = R(s_t, u_{l,t}).
The method has the following beneficial effects: aiming at unmanned ship systems with model uncertainty and sensor faults, the invention provides a reinforcement learning-based fault-tolerant control algorithm that combines model reference reinforcement learning with a fault diagnosis and estimation mechanism. Taking Monte Carlo sampling efficiency into consideration, an Actor-Critic model is used to convert the accumulated return into a Q function. Through this new reinforcement learning-based fault-tolerant control, the unmanned ship can learn to adapt to different sensor faults and recover its trajectory tracking performance under fault conditions.
Drawings
FIG. 1 is a flow chart of steps of an unmanned ship fault-tolerant control method based on model reference reinforcement learning according to the invention;
fig. 2 is a block diagram of the Actor-Critic network according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
As shown in fig. 1, the present invention provides a fault-tolerant control method for an unmanned ship based on model reference Reinforcement Learning (RL), which includes the following steps:
s1, analyzing the inherent uncertainty factors of the unmanned ship, neglecting all nonlinear terms in the inner loop dynamics, obtaining a linear and decoupling model of a dynamics equation of the generalized velocity vector, and establishing a nominal dynamics model of the unmanned ship;
the dynamic model is specifically as follows:
Figure BDA0003440428850000031
wherein
Figure BDA0003440428850000032
Is a generalized coordinate vector, xpAnd ypRepresenting the horizontal coordinates of the ASV in the inertial system,
Figure BDA0003440428850000033
is the heading angle. v ═ up,vp,rp]T∈R3Is a generalized velocity vector, upAnd vpLinear velocities in the x-and y-directions, rpIs the heading angular rate. u ═ τur]∈R3Control of force and moment, g (v) ═ g1(v),g2(v),g3(v)]T∈R3Is unmodeled dynamics due to gravity, buoyancy and moment, M ∈ R3×3Is provided with M ═ MTAn inertia matrix > 0 and
Figure BDA0003440428850000041
wherein
Figure BDA0003440428850000047
Matrix C (v) ═ CT(v) Including coriolis forces and centripetal forces, are given by:
Figure BDA0003440428850000042
wherein C is13(v)=-M22v-M23r,C23(v)=M11u. Damping matrix
Figure BDA0003440428850000043
Wherein D11(v)=-Xu-X|u|u|u|-Xuuuu2,D22(v)=-Yv-Y|v|v|v|-Y|r|v|r|,D23(v)=-Yr-Y|v|r|v|-Y|r|r|r|,D32(v)=-Nv-N|v|v|v|-N|r|v|r|,D33(v)=-Nr-N|v|r|v|-N|r|rAnd | r |, X (·), Y (·), N (·) are hydrodynamic coefficients, and the definitions are detailed in ship hydrodynamic and motion control manuals. Rotation matrix
Figure BDA0003440428850000044
Input matrix
Figure BDA0003440428850000045
Definition x ═ ηT vT]TIs provided with
Figure BDA0003440428850000046
Wherein H (v) ═ M-1(c (v) + d (v)) and N ═ M-1B。
The state measurement of the ASV system (1) is corrupted by noise and sensor faults and is therefore denoted as y = x + n + f(t), where n ∈ R^6 is the measurement noise vector and f(t) ∈ R^6 represents possible sensor fault vectors. In the invention, only a sensor fault on the heading angular rate r_p is considered, so that f(t) = [0, 0, 0, 0, 0, f_r(t)]^T. The sensor fault f_r(t) is given by f_r(t) = β(t - T_f) φ(t - T_f), where φ(t - T_f) is an unknown function describing the sensor fault occurring at the instant T_f, and β(t - T_f) is the time profile of the fault, with β(t - T_f) = 0 for t < T_f and β(t - T_f) = 1 - e^{-k(t - T_f)} for t ≥ T_f, where k is the evolution rate of the fault. Note that if the sensor fault occurs abruptly, such as a bias fault, k → ∞. The object of the invention is to design a controller that allows the state x to track the reference state trajectory x_r in the presence of model uncertainties, possible sensor faults and measurement noise.
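To make the model above concrete, the following Python sketch simulates a simplified version of the 3-DOF dynamics together with an incipient sensor fault on the heading angular rate. The numerical parameter values, the linear damping used in place of D(v), and the identity input matrix are illustrative assumptions only, not the vessel parameters of the invention.

```python
import numpy as np

# Simplified 3-DOF ASV dynamics sketch; parameter values are placeholders,
# not the vessel model used by the invention.
M = np.diag([25.8, 33.8, 2.76])   # inertia matrix M (illustrative values)
D = np.diag([2.0, 7.0, 1.5])      # linear damping used in place of D(v)
B = np.eye(3)                     # input matrix (assumed identity here)

def J(psi):
    """Rotation matrix from the body frame to the inertial frame."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def C(v):
    """Coriolis/centripetal matrix with C13 = -M22*v, C23 = M11*u (M23 = 0 here)."""
    u_p, v_p, r_p = v
    c13, c23 = -M[1, 1] * v_p, M[0, 0] * u_p
    return np.array([[0.0, 0.0, c13], [0.0, 0.0, c23], [-c13, -c23, 0.0]])

def step(eta, v, u, dt=0.01):
    """Euler step of d(eta)/dt = J(eta) v and M dv/dt = B u - C(v) v - D v."""
    eta_dot = J(eta[2]) @ v
    v_dot = np.linalg.solve(M, B @ u - C(v) @ v - D @ v)
    return eta + dt * eta_dot, v + dt * v_dot

def fault_profile(t, T_f=20.0, k=0.5):
    """Time profile beta(t - T_f) = 1 - exp(-k (t - T_f)) for t >= T_f, else 0."""
    return 0.0 if t < T_f else 1.0 - np.exp(-k * (t - T_f))

def measure_velocity(v, t, noise_std=0.01, phi=0.2):
    """Measured generalized velocity y_v = v + n_v + f_v with a fault on r_p only."""
    n_v = noise_std * np.random.randn(3)
    f_v = np.array([0.0, 0.0, fault_profile(t) * phi])
    return v + n_v + f_v
```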
S2, designing a nominal controller of the unmanned ship based on the nominal dynamics model, and ensuring the basic stability of the unmanned ship system under the condition of no fault. And analyzing the nominal model of the unmanned ship.
The nominal controller design process is as follows:
the proposed RL-based FTC algorithm follows a model reference control structure. For most ASV systems, accurate nonlinear dynamical models are rarely available, with the main uncertainties coming from M, C (v) and d (v) due to fluid mechanics, and g (v) due to gravity and buoyancy and moments. Despite the uncertainty in the ASV dynamics, the nominal model (5) can still be used based on the known information of the ASV dynamics. The nominal model of the uncertain ASV model (5) is as follows:
Figure BDA0003440428850000052
wherein N ismAnd HmContains all known constant parameters of the ASV dynamics (5),
Figure BDA0003440428850000053
is the generalized coordinate vector of the nominal model. In the present invention, MmIs formed by Mm=diag{M11,M22,M33< derived, Hm=Mm -1DmFrom Dm=diag{-Xu,-Yv,-NrAnd Nm=Mm -1And B, obtaining. Therefore, in the nominal model, all nonlinear terms in the inner loop dynamics are ignored, and therefore, the linear solution of the generalized velocity vector v kinetic equation is finally obtainedAnd (4) coupling the model. Since the dynamics of the nominal model (6) are known, it is possible to design the control law umTo allow the state of the nominal system (6) to converge to the reference signal xrE.g., | | x when t → ∞ time | | xm-xr||2→ 0. Control law umCan also be used as a nominal controller by the whole ASV dynamics (5).
In the model reference control architecture, the goal is to design a control law that allows the states of (5) to track the state trajectory of the nominal model (6). The overall control law of the ASV system (5) has the following expression:

u = u_b + u_l        (7)

where u_b is the nominal model-based control and u_l is the control strategy from the deep reinforcement learning module. The baseline control u_b is used to guarantee certain basic properties (i.e., local stability), while u_l compensates for all system uncertainties and sensor faults.
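As a minimal sketch of the control structure in (7), the overall input is simply the sum of the baseline control and the learned correction; the function and variable names below are illustrative assumptions, not the inventors' code.

```python
import numpy as np

def overall_control(u_b, rl_policy, s):
    """Model reference control structure u = u_b + u_l (equation (7)).

    u_b      : nominal (baseline) control from the model-based controller
    rl_policy: callable mapping the RL state signal s to the correction u_l
    """
    u_l = rl_policy(s)      # learned compensation for uncertainties and faults
    return u_b + u_l

# usage sketch with a zero policy, which reduces to pure baseline control
u_b = np.array([1.0, 0.0, 0.1])
s = np.zeros(12)            # RL state signal (dimension is an assumption)
u = overall_control(u_b, lambda state: np.zeros(3), s)
```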
S3, constructing the fault-tolerant controller based on model reference reinforcement learning by taking the difference value of the state variables of the actual unmanned ship system and the nominal model and the output of the nominal controller as input.
The block diagram of the Actor-Critic network is shown in fig. 2. The specific derivation of the fault-tolerant controller is as follows:
the formula for RL is based on a Markov decision process MDP represented by a tuple<S,U,P,R,γ>Where S is the state space, U specifies the operation/input space, P: S × U × S → R defines the transition probability, R: S × U → R is a reward function, γ ∈ [0,1) is a discount coefficient. In MDP, the state vector S ∈ S contains the influence RL control ulAll available signals of e U. For the tracking control of the ASV system in the invention, the transition probability is determined by the ASV dynamic and the reference signal x in (1)rAnd (5) characterizing. In the RL, the control strategy is learning using data samples collected in the discrete time domain. Let stIs the state signal s at the time step t, respectively ul,tIs the input of the RL-based control at time step t. The RL algorithm of the present invention aims to maximize a cost of action function, also called the Q function, such asShown below:
Q^π(s_t, u_{l,t}) = E_π[ R_t + γ V^π(s_{t+1}) ]        (8)

where R_t is the reward function, R_t = R(s_t, u_{l,t}), the expectation E_π is taken over the system transition and the strategy π, and V^π(s_{t+1}) is the state value function of s_{t+1} under the strategy π, defined as

V^π(s) = E_{u_l ~ π}[ Q^π(s, u_l) - α log π(u_l | s) ]

where π(u_{l,t} | s_t) is the control strategy, -log π(u_l | s) gives the entropy of the strategy, and α is the temperature parameter. The control strategy π(u_{l,t} | s_t) in RL is the probability of selecting the action u_{l,t} ∈ U in the state s_t ∈ S. In the present invention, a control strategy satisfying a Gaussian distribution is adopted, i.e.

π(u_l | s) = N(u_l(s), σ)        (10)

where N(·,·) denotes a Gaussian distribution, u_l(s) is the mean and σ is the covariance matrix. The covariance matrix σ controls the exploration behaviour of the learning phase.
The goal of RL is to find an optimal control strategy π* that maximizes Q^π(s_t, u_{l,t}) in (8), i.e.

π* = argmax_π Q^π(s_t, u_{l,t})        (11)
Note that the variance σ* converges to 0 once the optimal strategy π*(u_l* | s) = N(u_l*(s), σ*) is obtained, so the mean function u_l*(s) is the optimal control law to be learned. The deep neural network Q_θ(s_t, u_{l,t}) is called the critic, and the control strategy π_φ(u_{l,t} | s_t) is called the actor. Rewriting the uncertain inner-loop dynamics of the ASV model (5) gives:

dv/dt = -H_m v + N_m u + β(v)        (12)

where β(v) is the set of all model uncertainties in the inner-loop dynamics. The uncertainty term β(v) is assumed to be bounded. Let e_v = v - v_m; according to (6) and (12), the error dynamics are:

de_v/dt = -H_m e_v + N_m u_l + β(v)        (13)

Under healthy conditions, the model uncertainty term β(v) can be fully compensated by the learning-based control u_l. This means that ||e_v(t)||_2 ≤ ε as t → ∞, where ε is some small positive constant. If a sensor fault occurs, the error signal e_v will become larger than ε. A naive idea for learning-based fault-tolerant control (FTC) is to treat sensor faults as part of an external disturbance. However, treating sensor faults as disturbances leads to a conservative learning-based control, similar to robust control. Therefore, a fault diagnosis and estimation mechanism is introduced that allows the learning-based control to adapt to different scenarios: healthy and faulty conditions.
Let y_v = v + n_v + f_v, where n_v represents the noise vector on the generalized velocity measurement and f_v is the sensor fault acting on the generalized velocity vector. In addition, the fault tracking error vector ê_v = y_v - v_m is defined. In practical applications, ê_v is measurable, whereas e_v is not. Finally, the following fault diagnosis and estimation mechanism is introduced:

[observer formula (14), provided as an image in the original publication]

where L is selected such that H_m - L is Hurwitz. The output signal of the mechanism serves as an indicator of the occurrence and intensity of a sensor fault. Substituting the definitions above yields the corresponding residual dynamics (also provided as an image in the original publication). In the above formulas, H_m - L represents a Hurwitz matrix, u_l represents the control strategy from the deep reinforcement learning module, β(v) represents the set of all model uncertainties in the inner-loop dynamics, n_v represents the noise vector on the generalized velocity measurement, and f_v represents the sensor fault acting on the generalized velocity vector.
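Since the observer equation (14) is only available as an image in the original publication, the sketch below shows one plausible Luenberger-style realization of a fault diagnosis and estimation mechanism: it filters the measurable fault tracking error and uses the residual as the fault indicator. The structure, the gain L, and all names are assumptions rather than the inventors' exact formula.

```python
import numpy as np

class FaultDiagnosisObserver:
    """Sketch of a residual-based fault diagnosis/estimation mechanism (assumed form)."""

    def __init__(self, H_m, N_m, L, dt=0.01):
        self.H_m, self.N_m, self.L, self.dt = H_m, N_m, L, dt
        self.e_est = np.zeros(H_m.shape[0])   # internal estimate of the fault tracking error

    def update(self, e_hat_meas, u_l):
        """One step with the measured error e_hat = y_v - v_m and the RL control u_l."""
        e_dot = (-self.H_m @ self.e_est + self.N_m @ u_l
                 + self.L @ (e_hat_meas - self.e_est))
        self.e_est = self.e_est + self.dt * e_dot
        return e_hat_meas - self.e_est        # residual: indicator of fault occurrence/intensity
```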
S4, designing a corresponding callback function according to the control task requirement, and building a reinforcement learning evaluation function model (Q-value) and a control strategy model by using a full-connectivity network.
The reward function, the reinforcement learning evaluation function and the control strategy model are derived as follows:
The RL-based fault-tolerant control is derived using the output of the fault diagnosis and estimation mechanism. The RL learns the control strategy at discrete time steps using data samples (including input and state data). The sampling time step is assumed to be fixed and denoted by δt. Without loss of generality, let y_t, u_{b,t}, u_{l,t} and the observer output denote the ASV state measurement, the nominal controller excitation, the control excitation from the RL, and the output of the fault diagnosis and estimation mechanism at time step t, respectively. The state signal s_t at time step t is then formed by stacking these signals together with the nominal-model state x_{m,t} (the exact composition is given as an image in the original publication).
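A small sketch of how the state signal could be assembled at each sampling instant; the ordering and the inclusion of the nominal-model state follow the algorithm listing below and are assumptions, since the exact expression is an image in the original.

```python
import numpy as np

def build_state_signal(y_t, x_m_t, u_b_t, residual_t):
    """Concatenate the measured ASV state, the nominal-model state, the nominal
    control excitation and the fault-diagnosis output into the RL state s_t."""
    return np.concatenate([y_t, x_m_t, u_b_t, residual_t])

# usage sketch (dimensions are illustrative)
s_t = build_state_signal(np.zeros(6), np.zeros(6), np.zeros(3), np.zeros(3))
```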
the training learning process of the RL will repeatedly perform policy evaluation and policy improvement. In policy evaluation, Q-value is operated by Bellman Qπ(st,ul,t)=TπQπ(st,ul,t) Obtained wherein
Figure BDA0003440428850000083
In the above formula, ul,tIndicating the control excitation, s, from the RLtRepresenting the state signal at a time step T, TπDenotes a fixed policy, EπRepresenting the desired operator, gamma representing the discount factor, alpha representing the temperature coefficient, Qπ(st,ul,t) Representing a reinforcement learning evaluation function.
In policy improvement, the policy is updated by:

π_new = argmin_{π'∈Π} D_KL( π'(·|s_t) || exp( Q^{π_old}(s_t, ·) / α ) / Z^{π_old}(s_t) )

where Π represents the policy set, π_old represents the previously updated policy, Q^{π_old} denotes the Q value of π_old, D_KL denotes the Kullback-Leibler (KL) divergence, and Z^{π_old}(s_t) represents a normalization factor. Through mathematical manipulation, this objective is converted into the parameterized policy objective

J_π = E_{s_t ~ D}[ E_{u_l ~ π}[ α log π(u_l | s_t) - Q^π(s_t, u_l) ] ]

which is minimized in the policy refinement step described below.
S5, introducing a double-evaluation function model idea into an evaluation function training framework, and adding an entropy value of a strategy into a control strategy expected return function to improve the reinforcement learning training efficiency.
The derivation of the dual evaluation function (double-critic) model is as follows:

The Q function is parameterized by θ and denoted Q_θ(s_t, u_{l,t}). The parameterized strategy is denoted π_φ(u_{l,t} | s_t), where φ is the parameter set to be trained. Note that both θ and φ are sets of parameters whose sizes are determined by the deep neural network settings. For example, if Q_θ is represented by an MLP with K hidden layers and L neurons per hidden layer, then the parameter set θ is θ = {θ_0, θ_1, ..., θ_K} with θ_0 ∈ R^{L×(dim_s + dim_u + 1)}, θ_i ∈ R^{L×(L+1)} for 1 ≤ i ≤ K-1, and θ_K ∈ R^{1×(L+1)}, where dim_s is the size of the state s and dim_u is the size of the input u_l.
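For example, a critic with K hidden layers of L neurons each can be sketched in PyTorch as follows (layer sizes are examples, not the settings used by the invention):

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Q_theta(s, u_l): MLP with K hidden layers of `hidden` neurons each."""

    def __init__(self, dim_s, dim_u, hidden=64, K=2):
        super().__init__()
        layers, in_dim = [], dim_s + dim_u
        for _ in range(K):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, 1))      # scalar Q-value output
        self.net = nn.Sequential(*layers)

    def forward(self, s, u_l):
        return self.net(torch.cat([s, u_l], dim=-1)).squeeze(-1)
```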
The training is performed off-line, and data samples are collected at each time step t+1, namely the input u_{l,t} from the previous time step, the previous state s_t, the reward R_t, and the current state s_{t+1}. These historical data are stored as tuples (s_t, u_{l,t}, R_t, s_{t+1}) in a memory pool D. In each policy evaluation or improvement step, a batch of historical data B is randomly drawn from the memory pool D to train the parameters θ and φ. At the beginning of training, the nominal control strategy u_b is applied to the ASV system to collect an initial data set D_0, as shown in Algorithm 1. The initial data set D_0 is used for the initial fitting of the Q function. After initialization, both u_b and the newly updated reinforcement learning strategy π_φ(u_{l,t} | s_t) are executed to operate the ASV system.
The parameters θ of the Q function are trained to minimize the Bellman residual:

J_Q(θ) = E_{(s_t, u_{l,t}) ~ D}[ ( Q_θ(s_t, u_{l,t}) - Y_target )^2 / 2 ]        (15)

where (s_t, u_{l,t}) ~ D means that the samples (s_t, u_{l,t}) are randomly selected from the memory pool D, and

Y_target = R_t + γ ( Q_θ̄(s_{t+1}, u_{l,t+1}) - α log π_φ(u_{l,t+1} | s_{t+1}) )

where θ̄ is a target parameter that is updated slowly. The DNN parameters θ are obtained by applying stochastic gradient descent to (15) on the sampled data batch B, whose size is denoted by |B|. The invention uses two critics, parameterized by θ_1 and θ_2 respectively. Both critics are introduced to reduce the overestimation problem in critic neural network training. Under the dual evaluation function, the target value Y_target becomes:

Y_target = R_t + γ ( min_{j=1,2} Q_θ̄_j(s_{t+1}, u_{l,t+1}) - α log π_φ(u_{l,t+1} | s_{t+1}) )
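A PyTorch sketch of the critic update with two Q-networks, slowly updated targets and the entropy-regularized target value is given below; the actor interface and all hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def critic_loss(q1, q2, q1_targ, q2_targ, actor, batch, gamma=0.99, alpha=0.2):
    """Bellman-residual loss for the twin critics on a batch (s, u_l, r, s_next)."""
    s, u_l, r, s_next = batch
    with torch.no_grad():
        u_next, logp_next = actor.sample(s_next)             # assumed actor interface
        q_next = torch.min(q1_targ(s_next, u_next), q2_targ(s_next, u_next))
        y_target = r + gamma * (q_next - alpha * logp_next)  # double-Q target with entropy term
    return F.mse_loss(q1(s, u_l), y_target) + F.mse_loss(q2(s, u_l), y_target)

def soft_update(target_net, net, kappa=0.005):
    """Slow target update: theta_bar <- kappa * theta + (1 - kappa) * theta_bar."""
    for p_t, p in zip(target_net.parameters(), net.parameters()):
        p_t.data.mul_(1.0 - kappa).add_(kappa * p.data)
```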
the policy refinement step is to use the data samples in memory pool D to achieve the following parameterized objective function minimization:
Figure BDA0003440428850000096
the parameter phi is trained to a minimum using a stochastic gradient descent method, and in the training phase, the actor neural network is represented as:
Figure BDA0003440428850000097
wherein
Figure BDA0003440428850000101
Is the parameterized control law to be learned,
Figure BDA0003440428850000102
is the standard deviation of the detection noise, ξ -N (0, I) are the detection noise, and "" is the Hadamard product. Note that the detection noise ξ is only applicable in the training phase, and once training is complete, only needs to be in use
Figure BDA0003440428850000103
Therefore, u in the training phaselEquivalent to ul,φ. Once training is over, get
Figure BDA0003440428850000104
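The reparameterized Gaussian actor and the policy objective J_π(φ) can be sketched in PyTorch as follows; the layer sizes, the clamping of the log standard deviation and all names are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    """pi_phi(u_l | s): Gaussian policy with learned mean and standard deviation."""

    def __init__(self, dim_s, dim_u, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim_s, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, dim_u)       # parameterized control law mu_phi(s)
        self.log_std = nn.Linear(hidden, dim_u)    # log of the exploration-noise std sigma_phi(s)

    def sample(self, s):
        h = self.body(s)
        dist = Normal(self.mean(h), self.log_std(h).clamp(-5, 2).exp())
        u_l = dist.rsample()                       # mu + sigma * xi with xi ~ N(0, I)
        return u_l, dist.log_prob(u_l).sum(-1)

    def act(self, s):
        """After training only the mean is used (exploration noise dropped)."""
        return self.mean(self.body(s))

def policy_loss(actor, q1, q2, s, alpha=0.2):
    """J_pi(phi) = E[ alpha * log pi(u_l | s) - min(Q1, Q2)(s, u_l) ]."""
    u_l, logp = actor.sample(s)
    q = torch.min(q1(s, u_l), q2(s, u_l))
    return (alpha * logp - q).mean()
```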
The temperature parameter α is also updated during the training phase. The update is obtained by minimizing the following objective function:

J(α) = E_{u_{l,t} ~ π_φ}[ -α log π_φ(u_{l,t} | s_t) - α H̄ ]

where H̄ is the target entropy of the strategy. In the invention, the target entropy is set to H̄ = -2, where "2" represents the action dimension.
And S6, training the controller based on model reference reinforcement learning under the condition of no fault, obtaining an initial control strategy, and ensuring the robustness of the overall controller to the model uncertainty.
And S7, injecting faults into the unmanned ship system, retraining the acquired initial control strategy based on model reference reinforcement learning, and realizing the adaptability of the overall controller to partial sensor faults.
And S8, under different initial state conditions, continuously repeating the step S6 and the step S7 until the reinforcement learning evaluation function network model and the control strategy model converge.
Specifically, the training process of steps S6-S8 is as follows:
1) Initialize the critic parameters θ_1, θ_2 and the actor network parameters φ;
2) Assign values to the target parameters: θ̄_1 ← θ_1, θ̄_2 ← θ_2;
3) Run the system with u_l = 0, i.e. u = u_b in formula (5), to obtain a data set D_0;
4) End the exploration of the learning phase and use the data set D_0 to train the initial critic parameters θ_1^0, θ_2^0;
5) Initialize the memory pool D ← D_0;
6) Assign initial values to the critic parameters and their targets: θ_1 ← θ_1^0, θ_2 ← θ_2^0, θ̄_1 ← θ_1^0, θ̄_2 ← θ_2^0;
7) Repeat;
8) Start a loop in which each data-collection step executes an operation;
9) Select an action u_{l,t} according to π_φ(u_{l,t} | s_t);
10) Operate the nominal system (6), the entire system (5) and the fault diagnosis and estimation mechanism (14), and collect s_{t+1} = {x_{t+1}, x_{m,t+1}, u_{b,t+1}};
11) D ← D ∪ {s_t, u_{l,t}, R(s_t, u_{l,t}), s_{t+1}};
12) End the loop;
13) Start a loop in which each gradient-update step executes an operation;
14) Extract a batch of data B from D;
15) θ_j ← θ_j - ι_Q ∇_{θ_j} J_Q(θ_j), j = 1, 2;
16) φ ← φ - ι_π ∇_φ J_π(φ);
17) α ← α - ι_α ∇_α J(α);
18) θ̄_j ← κ θ_j + (1 - κ) θ̄_j, j = 1, 2;
19) End the loop;
20) Until convergence (e.g. J_Q(θ) is smaller than a small threshold).
In the algorithm, ι_Q, ι_π and ι_α are positive learning rates (scalars) and κ > 0 is a constant scalar.
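Putting these pieces together, a skeleton of the training procedure of steps S6 to S8 might look as follows. The environment interface (reset/step), the fault-injection schedule and all hyperparameters are assumptions; critic_loss, policy_loss, soft_update and the actor refer to the sketches given above.

```python
import random
import numpy as np
import torch

def to_tensors(samples):
    """Convert a list of (s, u_l, r, s_next) tuples into float tensors."""
    s, u, r, s2 = zip(*samples)
    as_t = lambda x: torch.as_tensor(np.array(x), dtype=torch.float32)
    return as_t(s), as_t(u), as_t(r), as_t(s2)

def train(env, actor, q1, q2, q1_t, q2_t, memory,
          iterations=100, steps_per_iter=1000, grad_steps=200, batch_size=256):
    """Alternate data collection (fault-free, then with injected sensor faults)
    with gradient updates until the critic and policy objectives converge."""
    q_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)
    pi_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

    for it in range(iterations):
        inject_fault = (it % 2 == 1)                 # fault-free and faulty runs alternate
        s = env.reset(inject_fault=inject_fault)
        for _ in range(steps_per_iter):              # data collection
            with torch.no_grad():
                u_l, _ = actor.sample(torch.as_tensor(s, dtype=torch.float32))
            s_next, r, done = env.step(u_l.numpy())
            memory.append((s, u_l.numpy(), r, s_next))
            s = env.reset(inject_fault=inject_fault) if done else s_next

        for _ in range(grad_steps):                  # gradient updates
            batch = to_tensors(random.sample(memory, batch_size))
            q_l = critic_loss(q1, q2, q1_t, q2_t, actor, batch)
            q_opt.zero_grad(); q_l.backward(); q_opt.step()

            pi_l = policy_loss(actor, q1, q2, batch[0])
            pi_opt.zero_grad(); pi_l.backward(); pi_opt.step()

            soft_update(q1_t, q1); soft_update(q2_t, q2)
```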
An unmanned ship fault-tolerant control system based on model reference reinforcement learning comprises:
the dynamic model building module is used for analyzing uncertainty factors of the unmanned ship and building a nominal dynamic model of the unmanned ship;
the controller design module is used for designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship;
the fault-tolerant controller building module is used for constructing a fault-tolerant controller based on model reference reinforcement learning, by a maximum-entropy Actor-Critic method, according to the difference between the state variables of the actual unmanned ship system and the unmanned ship nominal dynamics model and the output of the unmanned ship nominal controller;
and the training module is used for building a reinforcement learning evaluation function and a control strategy model and training the fault-tolerant controller according to the control task requirements to obtain a trained control strategy.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
A model reference reinforcement learning-based unmanned ship fault-tolerant control device comprises:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the unmanned ship fault-tolerant control method based on model reference reinforcement learning as described above.
The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.
A storage medium having stored therein instructions executable by a processor, the storage medium comprising: the processor-executable instructions, when executed by the processor, are for implementing a model reference reinforcement learning based unmanned ship fault tolerance control method as described above.
The contents in the above method embodiments are all applicable to the present storage medium embodiment, the functions specifically implemented by the present storage medium embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present storage medium embodiment are also the same as those achieved by the above method embodiments.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A fault-tolerant control method of an unmanned ship based on model reference reinforcement learning is characterized by comprising the following steps:
s1, analyzing uncertainty factors of the unmanned ship and constructing a nominal dynamics model of the unmanned ship;
s2, designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship;
s3, constructing a fault-tolerant controller based on model reference reinforcement learning by a maximum-entropy Actor-Critic method, according to the difference between the state variables of the actual unmanned ship system and the unmanned ship nominal dynamics model and the output of the unmanned ship nominal controller;
and S4, building a reinforcement learning evaluation function and a control strategy model according to the control task requirements, and training a fault-tolerant controller to obtain a trained control strategy.
2. The unmanned ship fault-tolerant control method based on model reference reinforcement learning of claim 1, wherein the formula of the unmanned ship nominal dynamics model is as follows:

dη/dt = J(η) v
M dv/dt + C(v) v + D(v) v + G(v) = B u

in the above formula, η represents the generalized coordinate vector, v represents the generalized velocity vector, u represents the control forces and moments, M represents the inertia matrix, C(v) comprises the Coriolis and centripetal forces, D(v) represents the damping matrix, G(v) represents the unmodeled dynamics due to gravity, buoyancy and moments, and B represents a preset input matrix.
3. The unmanned ship fault-tolerant control method based on model reference reinforcement learning as claimed in claim 2, wherein the formula of the unmanned ship nominal controller is as follows:

dx_m/dt = [ J(η_m) v_m ; -H_m v_m + N_m u_m ]

in the above formula, N_m and H_m comprise all known constant parameters of the unmanned ship dynamics model, η_m represents the generalized coordinate vector of the nominal model, u_m represents the control law, and x_m = [η_m^T, v_m^T]^T represents the state of the reference model.
4. The unmanned ship fault-tolerant control method based on model reference reinforcement learning as claimed in claim 3, wherein the formula of the fault-tolerant controller is as follows:

[fault diagnosis and estimation formula, provided as an image in the original publication]

in the above formula, H_m - L represents a Hurwitz matrix, u_l represents the control strategy from the deep reinforcement learning module, β(v) represents the set of all model uncertainties in the inner-loop dynamics, n_v represents the noise vector on the generalized velocity measurement, and f_v represents the sensor fault acting on the generalized velocity vector.
5. The unmanned ship fault-tolerant control method based on model reference reinforcement learning as claimed in claim 4, wherein the formula of the reinforcement learning evaluation function is expressed as follows:

Q^π(s_t, u_{l,t}) = T^π Q^π(s_t, u_{l,t})
T^π Q^π(s_t, u_{l,t}) = E_π[ R_t + γ ( Q^π(s_{t+1}, u_{l,t+1}) - α log π(u_{l,t+1} | s_{t+1}) ) ]

in the above formula, u_{l,t} denotes the control excitation from the RL, s_t denotes the state signal at time step t, T^π denotes the Bellman backup operator under a fixed policy π, E_π denotes the expectation operator, γ denotes the discount factor, α denotes the temperature coefficient, and Q^π(s_t, u_{l,t}) denotes the reinforcement learning evaluation function.
6. The unmanned ship fault-tolerant control method based on model reference reinforcement learning as claimed in claim 4, wherein the formula of the control strategy model is as follows:

π_new = argmin_{π'∈Π} D_KL( π'(·|s_t) || exp( Q^{π_old}(s_t, ·) / α ) / Z^{π_old}(s_t) )

in the above formula, Π represents the policy set, π_old represents the previously updated policy, Q^{π_old} denotes the Q value of π_old, D_KL represents the KL divergence, Z^{π_old}(s_t) represents a normalization factor, and π'(·|s_t) represents a control strategy.
7. The unmanned ship fault-tolerant control method based on model reference reinforcement learning according to claim 1, wherein the step of constructing a reinforcement learning evaluation function and a control strategy model and training a fault-tolerant controller to obtain a trained control strategy according to control task requirements specifically comprises:
S41, building a reinforcement learning evaluation function and a control strategy model for the fault-tolerant controller based on model reference reinforcement learning according to the control task requirements;
S42, training the fault-tolerant controller based on model reference reinforcement learning to obtain an initial control strategy;
and S43, injecting faults into the unmanned ship system, retraining the initial control strategy and returning to the step S41 until the reinforcement learning evaluation function network model and the control strategy model converge.
8. The unmanned ship fault-tolerant control method based on model reference reinforcement learning according to claim 7, further comprising:
and introducing a double-evaluation function model, and adding an entropy value of the strategy into the expected return function of the control strategy.
CN202111631716.8A 2021-12-28 2021-12-28 Unmanned ship fault-tolerant control method based on model reference reinforcement learning Active CN114296350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111631716.8A CN114296350B (en) 2021-12-28 2021-12-28 Unmanned ship fault-tolerant control method based on model reference reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111631716.8A CN114296350B (en) 2021-12-28 2021-12-28 Unmanned ship fault-tolerant control method based on model reference reinforcement learning

Publications (2)

Publication Number Publication Date
CN114296350A true CN114296350A (en) 2022-04-08
CN114296350B CN114296350B (en) 2023-11-03

Family

ID=80972328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111631716.8A Active CN114296350B (en) 2021-12-28 2021-12-28 Unmanned ship fault-tolerant control method based on model reference reinforcement learning

Country Status (1)

Country Link
CN (1) CN114296350B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109355A (en) * 2019-04-29 2019-08-09 山东科技大学 A kind of unmanned boat unusual service condition self-healing control method based on intensified learning
CN111694365A (en) * 2020-07-01 2020-09-22 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109355A (en) * 2019-04-29 2019-08-09 山东科技大学 A kind of unmanned boat unusual service condition self-healing control method based on intensified learning
CN111694365A (en) * 2020-07-01 2020-09-22 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG QINGRUI et al.: "Fault Tolerant Control for Autonomous Surface Vehicles via Model Reference Reinforcement Learning", 2021 60th IEEE Conference on Decision and Control (CDC)
ZHANG QINGRUI et al.: "Model-Reference Reinforcement Learning Control of Autonomous Surface Vehicles", 2020 59th IEEE Conference on Decision and Control (CDC)

Also Published As

Publication number Publication date
CN114296350B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
Peng et al. Predictor-based neural dynamic surface control for uncertain nonlinear systems in strict-feedback form
Elhaki et al. Reinforcement learning-based saturated adaptive robust neural-network control of underactuated autonomous underwater vehicles
Chen et al. Adaptive optimal tracking control of an underactuated surface vessel using actor–critic reinforcement learning
Gong et al. Lyapunov-based model predictive control trajectory tracking for an autonomous underwater vehicle with external disturbances
Hassanein et al. Model-based adaptive control system for autonomous underwater vehicles
Alessandri Fault diagnosis for nonlinear systems using a bank of neural estimators
Buciakowski et al. A quadratic boundedness approach to robust DC motor fault estimation
Wang et al. Extended state observer-based fixed-time trajectory tracking control of autonomous surface vessels with uncertainties and output constraints
Bejarbaneh et al. Design of robust control based on linear matrix inequality and a novel hybrid PSO search technique for autonomous underwater vehicle
Zhang et al. Disturbance observer-based prescribed performance super-twisting sliding mode control for autonomous surface vessels
Zhang et al. Adaptive asymptotic tracking control for autonomous underwater vehicles with non-vanishing uncertainties and input saturation
CN113110430A (en) Model-free fixed-time accurate trajectory tracking control method for unmanned ship
Li et al. Finite-time composite learning control for trajectory tracking of dynamic positioning vessels
Wang et al. Event-triggered model-parameter-free trajectory tracking control for autonomous underwater vehicles
CN116880184A (en) Unmanned ship track tracking prediction control method, unmanned ship track tracking prediction control system and storage medium
CN114296350A (en) Unmanned ship fault-tolerant control method based on model reference reinforcement learning
Liu et al. Robust Adaptive Self‐Structuring Neural Network Bounded Target Tracking Control of Underactuated Surface Vessels
Gao et al. Data-driven model-free resilient speed control of an autonomous surface vehicle in the presence of actuator anomalies
CN114755917B (en) Model-free self-adaptive anti-interference ship speed controller and design method
Sola et al. Evaluation of a deep-reinforcement-learning-based controller for the control of an autonomous underwater vehicle
Chen et al. An optimization approach to extend control period for dynamics control of Autonomous Underwater Vehicles with X-form rudders
Bao et al. Model-free control design using policy gradient reinforcement learning in lpv framework
Alme Autotuned dynamic positioning for marine surface vessels
Caiti et al. Enhancing autonomy: Fault detection, identification and optimal reaction for over—Actuated AUVs
He et al. Gaussian process based robust trajectory tracking of autonomous underwater vehicle

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant