CN114296350A - A fault-tolerant control method for unmanned ships based on model reference reinforcement learning - Google Patents
A fault-tolerant control method for unmanned ships based on model reference reinforcement learning Download PDFInfo
- Publication number
- CN114296350A CN114296350A CN202111631716.8A CN202111631716A CN114296350A CN 114296350 A CN114296350 A CN 114296350A CN 202111631716 A CN202111631716 A CN 202111631716A CN 114296350 A CN114296350 A CN 114296350A
- Authority
- CN
- China
- Prior art keywords
- model
- unmanned ship
- fault
- reinforcement learning
- tolerant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000002787 reinforcement Effects 0.000 title claims abstract description 52
- 238000000034 method Methods 0.000 title claims abstract description 42
- 230000006870 function Effects 0.000 claims abstract description 43
- 238000011217 control strategy Methods 0.000 claims abstract description 37
- 238000011156 evaluation Methods 0.000 claims abstract description 27
- 238000012549 training Methods 0.000 claims description 19
- 239000011159 matrix material Substances 0.000 claims description 16
- 238000005259 measurement Methods 0.000 claims description 8
- 230000005284 excitation Effects 0.000 claims description 5
- 238000013135 deep learning Methods 0.000 claims description 4
- 230000005484 gravity Effects 0.000 claims description 4
- 238000013016 damping Methods 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 230000009471 action Effects 0.000 description 8
- 238000013461 design Methods 0.000 description 7
- 238000003745 diagnosis Methods 0.000 description 6
- 230000007246 mechanism Effects 0.000 description 6
- 230000008569 process Effects 0.000 description 6
- 238000013528 artificial neural network Methods 0.000 description 4
- 230000009286 beneficial effect Effects 0.000 description 4
- 238000001514 detection method Methods 0.000 description 3
- 230000006872 improvement Effects 0.000 description 3
- 238000009795 derivation Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 238000012614 Monte-Carlo sampling Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 238000013480 data collection Methods 0.000 description 1
- 238000012938 design process Methods 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005312 nonlinear dynamic Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
Images
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T90/00—Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Landscapes
- Feedback Control In General (AREA)
Abstract
本发明公开了一种基于模型参考强化学习的无人船容错控制方法,该方法包括:对无人船的不确定性因素进行分析,构建无人船名义动力学模型;基于无人船名义动力学模型,设计无人船标称控制器;基于最大熵的Actor‑Critic方法,根据实际无人船系统、无人船名义动力学模型的状态变量差值和无人船标称控制器的输出,构建基于模型参考强化学习的容错控制器;根据控制任务需求,搭建强化学习评价函数和控制策略模型并训练容错控制器,得到训练完成的控制策略。通过使用本发明,能够显著提高无人船系统的安全性和可靠性。本发明作为一种基于模型参考强化学习的无人船容错控制方法,可广泛应用于无人船控制领域。
The invention discloses a fault-tolerant control method for an unmanned ship based on model reference reinforcement learning. The method includes: analyzing the uncertainty factors of the unmanned ship and constructing a nominal dynamics model of the unmanned ship; designing a nominal controller for the unmanned ship based on the nominal dynamics model; using the maximum-entropy Actor-Critic method to construct a fault-tolerant controller based on model reference reinforcement learning, taking as inputs the state-variable difference between the actual unmanned ship system and the nominal dynamics model together with the output of the nominal controller; and, according to the control task requirements, building a reinforcement learning evaluation function and a control strategy model and training the fault-tolerant controller to obtain the trained control strategy. The present invention can significantly improve the safety and reliability of the unmanned ship system. As a fault-tolerant control method for unmanned ships based on model reference reinforcement learning, the present invention can be widely used in the field of unmanned ship control.
Description
技术领域technical field
本发明涉及无人船控制领域,尤其涉及一种基于模型参考强化学习的无人船容错控制方法。The invention relates to the field of unmanned ship control, in particular to an unmanned ship fault-tolerant control method based on model reference reinforcement learning.
背景技术Background technique
随着制导、导航和控制技术的显著进步，无人船（autonomous surface vehicles,ASV）的应用已经占据了航空举足轻重的部分。在大多数应用中，无人船预计将在长时间没有人工干预的情况下安全运行。因此，需要无人船具有足够的安全和可靠性属性以提供正常的运作，并避免灾难性的后果。然而，无人船容易出现故障、系统组建退化、传感器故障等问题，从而经历性能恶化，不稳定，甚至灾难性的损失。With significant advances in guidance, navigation, and control technology, autonomous surface vehicles (ASVs) have come to play a pivotal role in maritime applications. In most applications, unmanned ships are expected to operate safely for long periods without human intervention. They are therefore required to have sufficient safety and reliability to sustain normal operation and avoid catastrophic consequences. However, unmanned ships are prone to faults, system component degradation, sensor failures, and similar problems, and can consequently suffer performance degradation, instability, and even catastrophic losses.
发明内容SUMMARY OF THE INVENTION
为了解决上述技术问题，本发明的目的是提供一种基于模型参考强化学习的无人船容错控制方法，可以在遇到故障后恢复系统性能或保持系统运行，从而显著提高系统的安全性和可靠性。In order to solve the above technical problems, the purpose of the present invention is to provide a fault-tolerant control method for unmanned ships based on model reference reinforcement learning, which can restore system performance or keep the system running after a fault occurs, thereby significantly improving the safety and reliability of the system.
本发明所采用的第一技术方案是:一种基于模型参考强化学习的无人船容错控制方法,包括以下步骤:The first technical solution adopted by the present invention is: a fault-tolerant control method for an unmanned ship based on model reference reinforcement learning, comprising the following steps:
S1、对无人船的不确定性因素进行分析,构建无人船名义动力学模型;S1. Analyze the uncertainty factors of the unmanned ship, and construct a nominal dynamic model of the unmanned ship;
S2、基于无人船名义动力学模型,设计无人船标称控制器;S2. Based on the nominal dynamics model of the unmanned ship, design the nominal controller of the unmanned ship;
S3、基于最大熵的Actor-Critic方法，根据实际无人船系统、无人船名义动力学模型的状态变量差值和无人船标称控制器的输出，构建基于模型参考强化学习的容错控制器；S3. Based on the maximum-entropy Actor-Critic method, construct a fault-tolerant controller based on model reference reinforcement learning from the state-variable difference between the actual unmanned ship system and the nominal dynamics model of the unmanned ship, together with the output of the nominal controller of the unmanned ship;
S4、根据控制任务需求,搭建强化学习评价函数和控制策略模型并训练容错控制器,得到训练完成的控制策略。S4. According to the control task requirements, build a reinforcement learning evaluation function and a control strategy model and train a fault-tolerant controller to obtain a trained control strategy.
进一步,所述无人船名义动力学模型的公式表示如下:Further, the formula of the nominal dynamic model of the unmanned ship is expressed as follows:
dη/dt=R(ψ)v，M dv/dt+C(v)v+D(v)v+G(v)=Bu
上式中，η表示广义坐标向量，v表示广义速度向量，u表示控制力和力矩，R(ψ)表示旋转矩阵，M表示惯性矩阵，C(v)包括科氏力和向心力，D(v)表示阻尼矩阵，G(v)表示由于重力和浮力及力矩而产生的未建模动力学，B表示预设的输入矩阵。In the above formulas, η denotes the generalized coordinate vector, v the generalized velocity vector, u the control forces and moments, R(ψ) the rotation matrix, M the inertia matrix, C(v) the Coriolis and centripetal terms, D(v) the damping matrix, G(v) the unmodeled dynamics due to gravity, buoyancy, and their moments, and B the preset input matrix.
进一步,所述无人船标称控制器的公式表示如下:Further, the formula of the nominal controller of the unmanned ship is expressed as follows:
上式中,Nm和Hm包含无人船动力学模型的所有已知常量参数,ηm表示标称模型的广义坐标向量,um表示控制律,xm表示参考模型的状态。In the above formula, N m and H m contain all known constant parameters of the dynamic model of the unmanned ship, η m represents the generalized coordinate vector of the nominal model, u m represents the control law, and x m represents the state of the reference model.
进一步,所述容错控制器的公式表示如下:Further, the formula of the fault-tolerant controller is expressed as follows:
上式中，Hm-L表示Hurwitz矩阵，ul表示来自深度学习模块的控制策略，β(v)表示内环动力学中所有模型不确定性的集合，nv表示广义速度测量值上的噪声矢量，fv表示作用于广义速度矢量的传感器故障。In the above formula, Hm-L is a Hurwitz matrix, ul denotes the control strategy from the deep learning module, β(v) denotes the set of all model uncertainties in the inner-loop dynamics, nv denotes the noise vector on the generalized velocity measurement, and fv denotes the sensor fault acting on the generalized velocity vector.
进一步,所述强化学习评价函数的公式表示如下:Further, the formula of the reinforcement learning evaluation function is expressed as follows:
Qπ(st,ul,t)=TπQπ(st,ul,t)Q π (s t ,u l,t )=T π Q π (s t ,u l,t )
上式中，ul,t表示来自RL的控制激发，st表示时间步长t处的状态信号，Tπ表示固定策略，Eπ表示期望算子，γ表示折扣因子，α表示温度系数，Qπ(st,ul,t)表示强化学习评价函数。In the above formula, ul,t denotes the control excitation from RL, st denotes the state signal at time step t, Tπ denotes the Bellman operator under the fixed policy π, Eπ denotes the expectation operator, γ denotes the discount factor, α denotes the temperature coefficient, and Qπ(st,ul,t) denotes the reinforcement learning evaluation function.
进一步,所述控制策略模型的公式表示如下:Further, the formula of the control strategy model is expressed as follows:
πnew=argminπ′∈Π DKL(π′(·|st)‖exp(Qπold(st,·)/α)/Zπold(st))
上式中，Π表示策略集，πold表示前一次更新的策略，Qπold表示πold的Q值，DKL表示KL散度，Zπold表示归一化因子，(st,·)表示控制策略，点表示省去自变量的写法。In the above formula, Π denotes the policy set, πold the policy from the previous update, Qπold the Q-value of πold, DKL the KL divergence, Zπold the normalization factor, and (st,·) the control strategy, where the dot indicates that the argument is omitted.
进一步,所述根据控制任务需求,搭建强化学习评价函数和控制策略模型并训练容错控制器,得到训练完成的控制策略这一步骤,其具体包括:Further, according to the control task requirements, the step of building a reinforcement learning evaluation function and a control strategy model and training a fault-tolerant controller to obtain a trained control strategy specifically includes:
S41、根据控制任务需求，对基于模型参考强化学习的容错控制器搭建强化学习评价函数和控制策略模型；S41. According to the control task requirements, build a reinforcement learning evaluation function and a control strategy model for the fault-tolerant controller based on model reference reinforcement learning;
S42、对基于模型参考强化学习的容错控制器进行训练,得到初始控制策略;S42, train the fault-tolerant controller based on model reference reinforcement learning to obtain an initial control strategy;
S43、在无人船系统中注入故障,对初始控制策略进行再训练并返回步骤S41,直至强化学习的评价函数网络模型和控制策略模型收敛。S43, inject a fault into the unmanned ship system, retrain the initial control strategy, and return to step S41, until the evaluation function network model and the control strategy model of the reinforcement learning converge.
进一步,还包括:Further, it also includes:
引入双评价函数模型,在控制策略预期回报函数中加入策略的熵值,其中Rt是奖励函数,Rt=R(st,ul,t)。A dual evaluation function model is introduced, and the entropy value of the strategy is added to the expected return function of the control strategy, where R t is the reward function, and R t =R(s t ,u l,t ).
本发明方法的有益效果是：本发明针对存在模型不确定性和传感器故障的无人船系统，提出了一种将模型参考强化学习与故障诊断和估计机制相结合的基于强化学习的容错控制算法，考虑到蒙特卡洛采样效率低，使用Actor-Critic模型，把累计收益换成Q函数，通过新的基于强化学习的容错控制，我们确保无人船能够学习适应不同的传感器故障，并在故障条件下恢复轨迹跟踪性能。The beneficial effects of the method of the present invention are as follows: for unmanned ship systems with model uncertainty and sensor faults, the present invention proposes a reinforcement-learning-based fault-tolerant control algorithm that combines model reference reinforcement learning with a fault diagnosis and estimation mechanism. Considering the low efficiency of Monte Carlo sampling, an Actor-Critic model is used and the cumulative return is replaced by a Q function. Through the new reinforcement-learning-based fault-tolerant control, the unmanned ship can learn to adapt to different sensor faults and recover its trajectory tracking performance under fault conditions.
附图说明Description of drawings
图1是本发明一种基于模型参考强化学习的无人船容错控制方法的步骤流程图;Fig. 1 is the step flow chart of a kind of unmanned ship fault-tolerant control method based on model reference reinforcement learning of the present invention;
图2是本发明具体实施例Actor-Critic网络的结构框图。FIG. 2 is a structural block diagram of an Actor-Critic network according to a specific embodiment of the present invention.
具体实施方式Detailed ways
下面结合附图和具体实施例对本发明做进一步的详细说明。对于以下实施例中的步骤编号，其仅为了便于阐述说明而设置，对步骤之间的顺序不做任何限定，实施例中的各步骤的执行顺序均可根据本领域技术人员的理解来进行适应性调整。The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. The step numbers in the following embodiments are provided only for convenience of description and do not limit the order between the steps; the execution order of the steps in the embodiments can be adjusted adaptively according to the understanding of those skilled in the art.
如图1所示，本发明提供了一种基于模型参考强化学习（reinforcement learning, RL）的无人船容错控制方法，该方法包括以下步骤：As shown in FIG. 1, the present invention provides a fault-tolerant control method for an unmanned ship based on model reference reinforcement learning (RL), and the method includes the following steps:
S1、对无人船内在的不确定性因素进行分析，忽略其中内环动力学中的所有非线性项，得到广义速度向量的动力学方程的线性和解耦模型，建立无人船名义动力学模型；S1. Analyze the inherent uncertainty factors of the unmanned ship, ignore all nonlinear terms in the inner-loop dynamics to obtain a linear, decoupled model of the dynamics equation of the generalized velocity vector, and establish the nominal dynamics model of the unmanned ship;
动力学模型具体为:The dynamic model is specifically:
其中η=[xp,yp,ψ]T∈R3是一个广义坐标向量，xp和yp表示ASV在惯性系中的水平坐标，ψ是航向角。v=[up,vp,rp]T∈R3是广义速度向量，up和vp分别为x轴和y轴方向上的线速度，rp为航向角速率。u=[τu,τr]T是控制力和力矩，G(v)=[g1(v),g2(v),g3(v)]T∈R3是由于重力和浮力及力矩而产生的未建模动力学，M∈R3×3是带有M=MT>0的惯性矩阵。where η=[xp, yp, ψ]T∈R3 is a generalized coordinate vector, xp and yp are the horizontal coordinates of the ASV in the inertial frame, and ψ is the heading angle. v=[up, vp, rp]T∈R3 is the generalized velocity vector, up and vp are the linear velocities along the x-axis and y-axis, respectively, and rp is the heading angular rate. u=[τu, τr]T collects the control forces and moments, G(v)=[g1(v), g2(v), g3(v)]T∈R3 is the unmodeled dynamics due to gravity, buoyancy, and their moments, and M∈R3×3 is the inertia matrix with M=MT>0.
矩阵C(v)=-CT(v)包含科氏力和向心力，由下式给出：The matrix C(v)=-CT(v) contains the Coriolis and centripetal forces and is given by:
其中C13(v)=-M22v-M23r，C23(v)=M11u。阻尼矩阵D(v)由下式给出：where C13(v)=-M22v-M23r and C23(v)=M11u. The damping matrix D(v) is given by:
其中D11(v)=-Xu-X|u|u|u|-Xuuuu2，D22(v)=-Yv-Y|v|v|v|-Y|r|v|r|，D23(v)=-Yr-Y|v|r|v|-Y|r|r|r|，D32(v)=-Nv-N|v|v|v|-N|r|v|r|，D33(v)=-Nr-N|v|r|v|-N|r|r|r|，X(·)、Y(·)、N(·)是水动力系数，定义详见船舶水动力和运动控制手册。R(ψ)为旋转矩阵，B为输入矩阵。where D11(v)=-Xu-X|u|u|u|-Xuuuu2, D22(v)=-Yv-Y|v|v|v|-Y|r|v|r|, D23(v)=-Yr-Y|v|r|v|-Y|r|r|r|, D32(v)=-Nv-N|v|v|v|-N|r|v|r|, and D33(v)=-Nr-N|v|r|v|-N|r|r|r|. X(·), Y(·), and N(·) are hydrodynamic coefficients, defined in detail in standard handbooks on ship hydrodynamics and motion control. R(ψ) denotes the rotation matrix and B the input matrix.
定义x=[ηT vT]T，有Define x=[ηT vT]T; then we have
其中H(v)=-M-1(C(v)+D(v))且N=-M-1B。where H(v)=-M-1(C(v)+D(v)) and N=-M-1B.
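For illustration only, the following minimal Python sketch integrates the 3-DOF ASV model described above. It is not part of the patent: the numerical values of M, the linear damping terms, and B are placeholder assumptions, the nonlinear damping and the unmodeled term G(v) are omitted for brevity, and only the structure of R(ψ) and C(v) follows the definitions given in the text.

```python
# Minimal sketch of the 3-DOF ASV dynamics (placeholder parameters, G(v) omitted).
import numpy as np

M = np.diag([25.8, 33.8, 2.76])             # assumed inertia matrix, M = M^T > 0
D_lin = np.diag([2.0, 7.0, 1.5])            # assumed linear damping (-Xu, -Yv, -Nr)
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])                  # input matrix: surge force and yaw moment

def rotation(psi):
    """Rotation R(psi) from the body-fixed frame to the inertial frame."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def coriolis(v):
    """C(v) with C13 = -M22*v - M23*r and C23 = M11*u (M23 = 0 for this diagonal M)."""
    u, vp, r = v
    c13 = -M[1, 1] * vp
    c23 = M[0, 0] * u
    return np.array([[0.0, 0.0, c13],
                     [0.0, 0.0, c23],
                     [-c13, -c23, 0.0]])

def dynamics(x, tau):
    """x = [eta; v]; returns x_dot for the uncertain model (5) without G(v)."""
    eta, v = x[:3], x[3:]
    eta_dot = rotation(eta[2]) @ v
    v_dot = np.linalg.solve(M, B @ tau - coriolis(v) @ v - D_lin @ v)
    return np.concatenate([eta_dot, v_dot])

# usage: one explicit Euler step with a small surge force and yaw moment
x = np.zeros(6)
x = x + 0.01 * dynamics(x, np.array([1.0, 0.1]))
```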
ASV系统(1)的状态测量值因噪声和传感器故障而损坏,因此表示为y=x+n+f(t),其中n∈R6是测量噪声向量,f(t)∈R6表示可能的传感器故障向量。本发明中,我们只考虑传感器故障对航向角速率rp的测量,所以f(t)=[0,0,0,0,0,fr(t)]T。传感器故障fr(t)由下式给出:The state measurements of the ASV system (1) are corrupted by noise and sensor failures and are therefore denoted as y=x+n+f(t), where n∈R6 is the measurement noise vector and f(t) ∈R6 denotes the possible sensor fault vector. In the present invention, we only consider the measurement of the yaw rate r p due to sensor failure, so f(t)=[0,0,0,0,0,f r (t)] T . The sensor fault f r (t) is given by:
fr(t)=β(t-Tf)φ(t-Tf)，其中φ(t-Tf)是在瞬时Tf发生的传感器故障的未知函数，β(t-Tf)是对于t<Tf时β(t-Tf)=0且t>Tf时随k增长（k是故障的演化速率）的时间剖面。注意如果传感器故障的发生是突然的，例如偏置故障，k→∞。本发明的目的便是设计一个控制器，允许状态x在存在模型不确定性、可能的传感器故障和测量噪声的情况下跟踪由xr表示的参考状态轨迹。fr(t)=β(t-Tf)φ(t-Tf), where φ(t-Tf) is the unknown function of the sensor fault occurring at the instant Tf, and β(t-Tf) is a time profile satisfying β(t-Tf)=0 for t<Tf and rising for t>Tf at a rate determined by k, the evolution rate of the fault. Note that if the sensor fault occurs abruptly, for example a bias fault, then k→∞. The purpose of the present invention is to design a controller that allows the state x to track the reference state trajectory xr in the presence of model uncertainty, possible sensor faults, and measurement noise.
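A small illustrative sketch of this fault model is given below. The explicit expression of β for t>Tf is not shown in the text above, so the common incipient-fault form 1-exp(-k(t-Tf)) is assumed here; the bias magnitude and timing in the usage example are likewise arbitrary.

```python
# Hedged sketch of the sensor-fault time profile f_r(t) = beta(t-Tf) * phi(t-Tf).
# beta is reconstructed as 1 - exp(-k*(t-Tf)), a standard assumption, not the patent's exact form.
import numpy as np

def fault_profile(t, t_f, k):
    """beta(t - Tf): 0 before the fault instant, approaches 1 at rate k afterwards."""
    s = t - t_f
    return np.where(s < 0.0, 0.0, 1.0 - np.exp(-k * s))

def sensor_fault(t, t_f, k, phi):
    """f_r(t) = beta(t - Tf) * phi(t - Tf), with phi the (unknown) fault function."""
    return fault_profile(t, t_f, k) * phi(np.maximum(t - t_f, 0.0))

# example: a 0.2 rad/s bias on the heading-rate channel, injected at t = 30 s
t = np.linspace(0.0, 60.0, 601)
f_r = sensor_fault(t, t_f=30.0, k=5.0, phi=lambda s: 0.2)
```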
S2、基于名义动力学模型,设计无人船标称控制器,保障无人船系统在无故障前提下的基本稳定性。对无人船名义模型进行分析。S2. Based on the nominal dynamics model, design the nominal controller of the unmanned ship to ensure the basic stability of the unmanned ship system under the premise of no faults. Analyze the nominal model of the unmanned ship.
标称控制器设计过程为:The nominal controller design process is:
所提出的基于RL的FTC算法遵循模型参考控制结构。对于大多数ASV系统，精确的非线性动力学模型很少可用，主要的不确定性来自流体力学引起的M、C(v)和D(v)，以及重力和浮力及力矩引起的G(v)。尽管ASV动力学存在不确定性，但基于ASV动力学的已知信息，仍然可以使用标称模型(5)。不确定ASV模型(5)的标称模型如下所示：The proposed RL-based FTC algorithm follows a model reference control structure. For most ASV systems, an accurate nonlinear dynamic model is rarely available; the main uncertainties come from M, C(v), and D(v) caused by the fluid dynamics, and from G(v) caused by gravity, buoyancy, and their moments. Despite the uncertainties in the ASV dynamics, a nominal model can still be used based on the known information about the ASV dynamics (5). The nominal model of the uncertain ASV model (5) is as follows:
其中Nm和Hm包含ASV动力学(5)的所有已知常量参数，ηm是标称模型的广义坐标向量。本发明中，Mm是由Mm=diag{M11,M22,M33}得出的，Hm=Mm-1Dm由Dm=diag{-Xu,-Yv,-Nr}得到，Nm=Mm-1B。因此在标称模型中，忽略了内环动力学中的所有非线性项，因此最终得到了广义速度矢量v动力学方程的线性解耦模型。由于已知标称模型(6)的动力学，因此可以设计控制律um，以允许标称系统(6)的状态收敛到参考信号xr，如当t→∞时||xm-xr||2→0。这种控制律um也可被整个ASV动力学(5)用作标称控制器。where Nm and Hm contain all known constant parameters of the ASV dynamics (5), and ηm is the generalized coordinate vector of the nominal model. In the present invention, Mm is given by Mm=diag{M11, M22, M33}, Hm=Mm-1Dm with Dm=diag{-Xu, -Yv, -Nr}, and Nm=Mm-1B. Therefore, in the nominal model all nonlinear terms of the inner-loop dynamics are ignored, which finally yields a linear, decoupled model of the dynamics equation of the generalized velocity vector v. Since the dynamics of the nominal model (6) are known, the control law um can be designed to make the state of the nominal system (6) converge to the reference signal xr, i.e., ||xm-xr||2→0 as t→∞. This control law um can also be used as a nominal controller for the entire ASV dynamics (5).
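The sketch below illustrates the nominal (reference) model (6). It is not taken from the patent: the parameter values are placeholders, the sign placement of Hm is an assumption (chosen so that the nominal inner loop is stable, which also matches the later requirement that Hm-L be Hurwitz; the extracted formulas are ambiguous on this point), and the PD-style tracking law is only a placeholder since the patent does not commit to a specific design of um.

```python
# Sketch of the nominal model (6): linear, decoupled inner-loop velocity dynamics.
import numpy as np

M_m = np.diag([25.8, 33.8, 2.76])                   # assumed M_m = diag{M11, M22, M33}
D_m = np.diag([2.0, 7.0, 1.5])                      # assumed D_m = diag{-Xu, -Yv, -Nr}
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
H_m = -np.linalg.inv(M_m) @ D_m                     # nominal inner-loop matrix (sign assumed)
N_m = np.linalg.inv(M_m) @ B                        # nominal input matrix

def rotation(psi):
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def nominal_model(x_m, u_m):
    """x_m = [eta_m; v_m]; the inner loop is linear and decoupled."""
    eta_m, v_m = x_m[:3], x_m[3:]
    return np.concatenate([rotation(eta_m[2]) @ v_m, H_m @ v_m + N_m @ u_m])

def nominal_controller(x_m, x_r, k_p=2.0, k_d=1.0):
    """Placeholder PD-style u_m that drives x_m toward the reference x_r."""
    e_eta, e_v = x_m[:3] - x_r[:3], x_m[3:] - x_r[3:]
    # act on the two available channels (surge force, yaw moment)
    return -k_p * np.array([e_eta[0], e_eta[2]]) - k_d * np.array([e_v[0], e_v[2]])
```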
在模型参考控制结构中,目标是设计一个控制律,允许(5)的状态跟踪标称模型(6)的状态轨迹。ASV系统(5)的总体控制律具有以下表达式:In a model-referenced control structure, the goal is to design a control law that allows the state of (5) to track the state trajectory of the nominal model (6). The overall control law of the ASV system (5) has the following expression:
u=ub+ul (7)u=u b +u l (7)
其中ub是基于模型方法的标称控制，ul是来自深度学习模块的控制策略。基线控制ub用于确保一些基本性能（即局部稳定性），而ul用于补偿所有系统不确定性和传感器故障。where ub is the nominal control from the model-based approach and ul is the control policy from the deep learning module. The baseline control ub is used to ensure some basic performance (i.e., local stability), while ul is used to compensate for all system uncertainties and sensor faults.
S3、基于最大熵的Actor-Critic方法,以实际无人船系统和名义模型的状态变量的差值和标称控制器的输出为输入,构建基于模型参考强化学习的容错控制器。S3. The Actor-Critic method based on maximum entropy takes the difference between the state variables of the actual unmanned ship system and the nominal model and the output of the nominal controller as input to construct a fault-tolerant controller based on model reference reinforcement learning.
Actor-Critic的网络框图参照图2,容错控制器的具体推导过程如下:Refer to Figure 2 for the network block diagram of Actor-Critic. The specific derivation process of the fault-tolerant controller is as follows:
RL的公式基于一个由元组表示的马尔可夫决策过程MDP:=<S,U,P,R,γ>,其中S是状态空间,U指定操作/输入空间,P:S×U×S→R定义转移概率,R:S×U→R是一个回奖励函数,γ∈[0,1)是一个折现系数。在MDP中,状态向量s∈S包含影响RL控制ul∈U的所有可用信号。对于本发明中ASV系统的跟踪控制,转移概率由(1)中的ASV动态和参考信号xr表征。在RL中,控制策略是使用在离散时间域中采集的数据样本学习。设st为时间步长t处的状态信号s,相应地,ul,t是时间步长t时基于RL的控制的输入。本发明中的RL算法旨在最大化一个行动价值函数,也称为Q函数,如下所示:The formulation of RL is based on a Markov Decision Process MDP:=<S,U,P,R,γ> represented by a tuple, where S is the state space, U specifies the operation/input space, and P:S×U×S →R defines the transition probability, R:S×U→R is a reward function, and γ∈[0,1) is a discount coefficient. In MDP, the state vector s ∈ S contains all available signals that affect the RL control u l ∈ U. For the tracking control of the ASV system in the present invention, the transition probability is characterized by the ASV dynamics in (1) and the reference signal xr . In RL, control policies are learned using data samples collected in the discrete time domain. Let s t be the state signal s at time step t, and accordingly, u l,t be the input to the RL-based control at time step t. The RL algorithm in the present invention aims to maximize an action value function, also known as the Q-function, as follows:
其中Rt是奖励函数,Rt=R(st,ul,t),且Vπ(st+1)称为策略π下st+1的状态值函数,其中where R t is the reward function, R t =R(s t ,u l,t ), And V π (s t +1) is called the state value function of s t +1 under policy π, where
其中π(ul,t|st)是控制策略，H(π(·|st))是策略的熵，α是温度参数。RL中的控制策略π(ul,t|st)是选择行动ul,t∈U在状态st∈S下的概率。在本发明中，采用满足高斯分布的控制策略，即where π(ul,t|st) is the control policy, H(π(·|st)) is the entropy of the policy, and α is the temperature parameter. The control policy π(ul,t|st) in RL is the probability of choosing action ul,t∈U in state st∈S. In the present invention, a control policy satisfying a Gaussian distribution is adopted, that is,
π(ul|s)=N(ul(s),σ) (10)π(u l |s)=N(u l (s),σ) (10)
其中N(·,·)表示高斯分布,ul(s)为均值,σ为协方差矩阵。协方差矩阵σ控制学习阶段的探索性能。where N(·,·) represents the Gaussian distribution, u l (s) is the mean, and σ is the covariance matrix. The covariance matrix σ controls the exploration performance in the learning phase.
RL的目标是找到一个最优控制策略π*使(8)中的Qπ(st,ul,t)最大化,即The goal of RL is to find an optimal control policy π * that maximizes Q π (s t ,u l,t ) in (8), i.e.
π*=argmaxQπ(st,ul,t) (11)π * = argmaxQ π (s t ,u l,t ) (11)
注意，方差σ*将收敛到0，一旦得到了最优策略π*(ul*|s)=N(ul*(s),σ*)，平均值函数ul*(s)将是学习到的最优控制律。深度神经网络Qθ(st,ul,t)被称为critic，控制策略πφ(ul,t|st)被称为actor，将(5)中的ASV模型不确定内环动力学重写为：Note that the variance σ* will converge to 0; once the optimal policy π*(ul*|s)=N(ul*(s),σ*) is obtained, the mean function ul*(s) will be the learned optimal control law. The deep neural network Qθ(st,ul,t) is called the critic, and the control policy πφ(ul,t|st) is called the actor. The uncertain inner-loop dynamics of the ASV model in (5) is rewritten as:
其中β(v)是内环动力学中所有模型不确定性的集合。假设不确定项β(v)是有界的。使ev=v-vm，根据(6)和(12)，误差动力学为：where β(v) is the set of all model uncertainties in the inner-loop dynamics. Assume that the uncertainty term β(v) is bounded. Let ev=v-vm; then, according to (6) and (12), the error dynamics are:
在健康条件下，模型不确定性项β(v)可使用基于学习的控制ul进行完全补偿。这意味着当t→∞时||ev(t)||2≤ε，其中ε是某个正小常数。如果发生传感器故障，错误信号ev将大于ε。基于学习的容错控制（fault tolerant control, FTC）的一个朴素的想法是将传感器故障视为外部干扰的一部分。然而，将传感器故障视为干扰将导致保守的基于学习的控制，如鲁棒控制。因此，我们引入了一种故障诊断和估计机制，允许基于学习的控制适应不同的场景：健康和故障的条件。Under healthy conditions, the model uncertainty term β(v) can be fully compensated by the learning-based control ul. This means that ||ev(t)||2≤ε as t→∞, where ε is a small positive constant. If a sensor fault occurs, the error signal ev will become larger than ε. A naive idea for learning-based fault-tolerant control (FTC) is to treat sensor faults as part of the external disturbance. However, treating sensor faults as disturbances leads to conservative learning-based control, such as robust control. Therefore, we introduce a fault diagnosis and estimation mechanism that allows the learning-based control to adapt to different scenarios: healthy and faulty conditions.
设yv=v+nv+fv，其中nv表示广义速度测量值上的噪声矢量，并相应地，fv是作用于广义速度矢量的传感器故障。此外我们定义了故障跟踪误差向量，它在实际应用中是可测量的，而ev不可测量。最后，介绍了以下故障诊断和估计机制：Let yv=v+nv+fv, where nv denotes the noise vector on the generalized velocity measurement and, correspondingly, fv is the sensor fault acting on the generalized velocity vector. Furthermore, we define a fault-tracking error vector which, unlike ev, is measurable in practical applications. Finally, the following fault diagnosis and estimation mechanism is introduced:
其中L被选择为使Hm-L为Hurwitz矩阵。该信号作为传感器故障发生和强度的指示器，由此得到where L is chosen such that Hm-L is Hurwitz. The resulting signal serves as an indicator of the occurrence and intensity of sensor faults, and we obtain:
上式中，Hm-L表示Hurwitz矩阵，ul表示来自深度学习模块的控制策略，β(v)表示内环动力学中所有模型不确定性的集合，nv表示广义速度测量值上的噪声矢量，fv表示作用于广义速度矢量的传感器故障。In the above formula, Hm-L is a Hurwitz matrix, ul denotes the control strategy from the deep learning module, β(v) denotes the set of all model uncertainties in the inner-loop dynamics, nv denotes the noise vector on the generalized velocity measurement, and fv denotes the sensor fault acting on the generalized velocity vector.
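The explicit form of the diagnosis and estimation mechanism (14) is not recoverable from the text above. The sketch below therefore assumes a Luenberger-style estimator that is consistent with the stated requirement that Hm-L be Hurwitz; the structure and the names (xi, e_tilde) are illustrative, not the patent's exact definitions.

```python
# Hedged reconstruction of the fault diagnosis and estimation mechanism (14):
# a Luenberger-style estimator driven by the measured velocity y_v (assumed structure).
import numpy as np

class FaultEstimator:
    def __init__(self, H_m, N_m, L, dt):
        self.H_m, self.N_m, self.L, self.dt = H_m, N_m, L, dt
        self.xi = np.zeros(H_m.shape[0])          # estimator state for the velocity loop

    def step(self, y_v, u_b, u_l):
        """Advance one step; e_tilde indicates the occurrence and intensity of sensor faults."""
        e_tilde = y_v - self.xi                   # fault-tracking error signal
        xi_dot = self.H_m @ self.xi + self.N_m @ (u_b + u_l) + self.L @ e_tilde
        self.xi = self.xi + self.dt * xi_dot      # forward-Euler integration
        return e_tilde

# usage with H_m, N_m from the earlier sketch and a simple diagonal gain:
# est = FaultEstimator(H_m, N_m, L=2.0 * np.eye(3), dt=0.01)
```

With this structure, the estimation error v - xi evolves through Hm-L (plus uncertainty, noise, and fault terms), which is why the gain L is chosen to make Hm-L Hurwitz.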
S4、根据控制任务需求，设计相应的回报函数，利用全连通网络搭建强化学习评价函数模型（Q-value）和控制策略模型。S4. Design a corresponding reward function according to the control task requirements, and use fully connected networks to build the reinforcement learning evaluation function model (Q-value) and the control strategy model.
回报函数、学习评价函数、控制策略模型推导如下：The reward function, the learning evaluation function, and the control strategy model are derived as follows:
基于RL的容错控制是使用故障诊断和估计机制的输出得到的。RL使用数据样本（包括输入和状态数据）在离散时间步学习控制策略。假设采样时间步长是固定的，用δt表示。在不丧失一般性的情况下，使yt、ub,t、ul,t和故障估计输出分别代表ASV状态、标称控制器激发、来自RL的控制激发以及时间步长t处故障诊断和估计机制的输出。因此，时间步长t处的状态信号s由这些量组成。RL的训练学习过程将重复执行策略评估和策略改进。在策略评估中，Q-value是通过Bellman操作Qπ(st,ul,t)=TπQπ(st,ul,t)得到的，其中The RL-based fault-tolerant control is obtained using the output of the fault diagnosis and estimation mechanism. RL learns the control policy at discrete time steps using data samples (including input and state data). The sampling time step is assumed to be fixed and is denoted by δt. Without loss of generality, let yt, ub,t, ul,t, and the estimator output denote the ASV state, the nominal controller excitation, the control excitation from RL, and the output of the fault diagnosis and estimation mechanism at time step t, respectively; the state signal s at time step t is assembled from these quantities. The training process of RL repeatedly performs policy evaluation and policy improvement. In policy evaluation, the Q-value is obtained through the Bellman operation Qπ(st,ul,t)=TπQπ(st,ul,t), where
上式中，ul,t表示来自RL的控制激发，st表示时间步长t处的状态信号，Tπ表示固定策略，Eπ表示期望算子，γ表示折扣因子，α表示温度系数，Qπ(st,ul,t)表示强化学习评价函数。In the above formula, ul,t denotes the control excitation from RL, st denotes the state signal at time step t, Tπ denotes the Bellman operator under the fixed policy π, Eπ denotes the expectation operator, γ denotes the discount factor, α denotes the temperature coefficient, and Qπ(st,ul,t) denotes the reinforcement learning evaluation function.
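The displayed definition of the Bellman backup Tπ is lost in this extraction; the short sketch below gives the standard maximum-entropy (soft) backup it describes, in which the bootstrap target subtracts α times the log-probability of the next action.

```python
# Sketch of the soft policy-evaluation backup T^pi Q; works elementwise on
# NumPy arrays or PyTorch tensors. The default gamma and alpha are illustrative.
def soft_bellman_target(reward, q_next, logp_next, gamma=0.99, alpha=0.2, done=None):
    """T^pi Q = R_t + gamma * E[ Q(s_{t+1}, u_{l,t+1}) - alpha * log pi(u_{l,t+1}|s_{t+1}) ]."""
    not_done = 1.0 if done is None else 1.0 - done
    return reward + gamma * not_done * (q_next - alpha * logp_next)

# usage on a batch of transitions:
# y = soft_bellman_target(r, q_of_next_state_action, logp_of_next_action)
```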
在策略改进中，策略由下式更新：In the policy improvement step, the policy is updated by:
πnew=argminπ′∈Π DKL(π′(·|st)‖exp(Qπold(st,·)/α)/Zπold(st))
其中Π表示策略集，πold表示上次更新的策略，Qπold表示πold的Q值，DKL表示Kullback-Leibler(KL)散度，Zπold表示归一化因子。通过数学运算，目标被转化为where Π denotes the policy set, πold denotes the last updated policy, Qπold denotes the Q-value of πold, DKL denotes the Kullback-Leibler (KL) divergence, and Zπold denotes the normalization factor. Through mathematical manipulation, this objective is transformed into
S5、在评价函数训练架构中引入双评价函数模型思想,同时在控制策略预期回报函数中加入策略的熵值,提升强化学习训练效率。S5. Introduce the idea of a dual-evaluation function model into the evaluation function training framework, and at the same time add the entropy value of the strategy to the expected return function of the control strategy to improve the training efficiency of reinforcement learning.
双评价函数模型推导过程:The derivation process of the double evaluation function model:
用θ参数化Q函数，并用Qθ(st,ul,t)表示。参数化策略由πφ(ul,t|st)表示，其中φ是要训练的参数集。注意，θ和φ都是一组参数，其尺寸由深度神经网络设置决定。例如，如果Qθ由具有K个隐藏层和每个隐藏层有L个神经元的MLP表示，则参数集θ为θ={θ0,θ1,...,θK}，其中θK∈R1×(L+1)，在1≤i≤K-1上θi∈RL×(L+1)，θ0的尺寸由dims和dimu决定，dims表示状态s的尺寸，dimu表示输入ul的尺寸。The Q-function is parameterized by θ and denoted by Qθ(st,ul,t). The parameterized policy is denoted by πφ(ul,t|st), where φ is the set of parameters to be trained. Note that both θ and φ are sets of parameters whose dimensions are determined by the deep neural network settings. For example, if Qθ is represented by an MLP with K hidden layers and L neurons per hidden layer, the parameter set is θ={θ0,θ1,...,θK}, where θK∈R1×(L+1), θi∈RL×(L+1) for 1≤i≤K-1, and the dimension of θ0 depends on dims and dimu, which denote the dimensions of the state s and of the input ul, respectively.
训练全程是离线的,在每个时间步t+1收集数据样本,例如来自上一个时间步的输入ul,t,上一时间步st的状态、奖励Rt和当前状态st+1。这些历史数据将作为元组(st,ul,t,Rt,st+1)存储在记忆池D中。在每个策略评估或改进步骤中,我们从记忆池D中随机抽取一批历史数据B,用于训练参数θ和φ。开始训练时,我们将标称控制策略ub应用于ASV系统,以收集初始数据D0,如算法1所示。初始数据集D0用于Q函数的初始拟合。初始化结束后,执行ub和最新更新的强化学习策略πφ(ul,t|st)以运行ASV系统。The whole training process is offline, and data samples are collected at each time step t+1, such as the input u l,t from the previous time step, the state of the previous time step s t , the reward R t and the current state s t+1 . These historical data will be stored in memory pool D as a tuple (s t , u l,t , R t , s t+1 ). At each policy evaluation or improvement step, we randomly sample a batch of historical data B from memory pool D for training parameters θ and φ. When starting training, we apply the nominal control strategy ub to the ASV system to collect initial data D 0 , as shown in Algorithm 1. The initial data set D 0 is used for the initial fitting of the Q-function. After initialization, u b and the newly updated reinforcement learning policy π φ (u l,t |s t ) are executed to run the ASV system.
训练Q函数的参数θ以最小化贝尔曼残差:The parameter θ of the Q-function is trained to minimize the Bellman residual:
其中(st,ul,t)~D意味着我们从记忆池D中随机选取的样本(st,ul,t)，且θ̄是将缓慢更新的目标参数。DNN参数θ是通过将随机梯度下降法应用于数据批次B上的(15)而获得的，数据批次B的大小由|B|表示。本发明中使用了两个分别由θ1和θ2参数化的评价函数。引入这两个评价函数是为了减少评价神经网络训练中的高估问题。在双评价函数下，目标值Ytarget为：where (st,ul,t)~D means that the sample (st,ul,t) is drawn randomly from the memory pool D, and θ̄ denotes the slowly updated target parameters. The DNN parameters θ are obtained by applying stochastic gradient descent to (15) over a data batch B, whose size is denoted by |B|. Two critics, parameterized by θ1 and θ2 respectively, are used in the present invention; they are introduced to reduce the overestimation problem in training the critic neural network. Under the double evaluation function, the target value Ytarget is:
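The displayed expression for Ytarget is lost in this extraction. The sketch below gives the clipped double-Q form the text describes, together with the Bellman-residual loss (15) evaluated on a minibatch; the mse_loss weighting is an implementation assumption.

```python
# Sketch (PyTorch) of the double-critic target Y_target and the Bellman-residual loss (15):
# both critics regress onto one target built from the minimum of the two target critics.
import torch
import torch.nn.functional as F

def critic_targets(reward, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Y_target = R_t + gamma * (min_j Q_theta_bar_j(s', u') - alpha * log pi(u'|s'))."""
    q_next = torch.min(q1_next, q2_next)
    return reward + gamma * (q_next - alpha * logp_next)

def critic_loss(q1_pred, q2_pred, y_target):
    """J_Q(theta), approximated on a minibatch B by the mean squared Bellman residual."""
    y = y_target.detach()                      # no gradient flows through the target
    return F.mse_loss(q1_pred, y) + F.mse_loss(q2_pred, y)
```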
策略改进步骤要使用记忆池D中的数据样本来实现以下参数化目标函数最小化:The policy improvement step uses the data samples in memory pool D to minimize the following parameterized objective function:
使用随机梯度下降法将参数φ训练至使该目标函数最小化。在训练阶段，actor神经网络表示为：The parameter φ is trained to minimize this objective using stochastic gradient descent. In the training phase, the actor neural network is expressed as:
其中ul,φ是要学习的参数化控制律，探测噪声的标准偏差为待学习参数，ξ~N(0,I)是探测噪声，"⊙"是哈达玛积。注意，探测噪声ξ只适用于训练阶段；一旦训练完成，在运用中只需要ul,φ。因此，在训练阶段的ul等价于ul,φ。一旦训练结束，即得到最终的控制律ul,φ。where ul,φ is the parameterized control law to be learned, the standard deviation of the exploration noise is a learned parameter, ξ~N(0,I) is the exploration noise, and "⊙" is the Hadamard product. Note that the exploration noise ξ is applied only in the training phase; once training is completed, only ul,φ is needed in deployment. Therefore, ul in the training phase is equivalent to ul,φ. Once training ends, the final control law ul,φ is obtained.
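The actor update described above is illustrated below: the Gaussian policy is reparameterized as the mean ul,φ(s) plus a learned standard deviation multiplied elementwise (Hadamard product) by ξ~N(0,I), and φ is trained to minimize E[α·log πφ(ul|s) − Q(s,ul)]. The network sizes and the name sigma used for the standard deviation head are assumptions.

```python
# Sketch (PyTorch) of the reparameterized Gaussian actor and the policy-improvement loss.
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    def __init__(self, dim_s, dim_u, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim_s, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, dim_u)       # u_{l,phi}(s), used alone after training
        self.log_std = nn.Linear(hidden, dim_u)    # learned exploration-noise scale

    def sample(self, s):
        h = self.body(s)
        mean, std = self.mean(h), self.log_std(h).clamp(-5, 2).exp()
        xi = torch.randn_like(mean)                # exploration noise, training phase only
        u_l = mean + std * xi                      # Hadamard product of std and xi
        logp = torch.distributions.Normal(mean, std).log_prob(u_l).sum(-1)
        return u_l, logp

def actor_loss(q_of_sampled_u, logp, alpha=0.2):
    """Minibatch policy-improvement objective: E[ alpha * log pi_phi(u_l|s) - Q(s, u_l) ]."""
    return (alpha * logp - q_of_sampled_u).mean()
```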
温度参数α在训练阶段也会更新。其更新是通过最小化以下目标函数获得的:The temperature parameter α is also updated during the training phase. Its update is obtained by minimizing the following objective function:
其中的参考值为策略的目标熵值。本发明中根据动作维度设置该目标熵，其中"2"表示动作维度。where the reference value is the target entropy of the policy. In the present invention, this target entropy is set according to the action dimension, the "2" denoting the number of action dimensions.
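A minimal sketch of this temperature update is given below. The displayed objective and the exact target-entropy value are not shown above, so the usual soft Actor-Critic heuristic is assumed: optimize log(α) by gradient descent on E[-α(log πφ(ul|s) + H̄)] with the target entropy H̄ set to minus the action dimension (here -2).

```python
# Hedged sketch of the temperature (alpha) adaptation; learning rate and target entropy assumed.
import torch

log_alpha = torch.zeros(1, requires_grad=True)     # optimise log(alpha) so that alpha > 0
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_entropy = -2.0                              # "2" = action dimension (tau_u, tau_r)

def update_alpha(logp):
    """One gradient step on J_alpha given log-probabilities of sampled actions."""
    loss = -(log_alpha.exp() * (logp.detach() + target_entropy)).mean()
    alpha_opt.zero_grad()
    loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()
```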
S6、在无故障情况下,对基于模型参考强化学习的控制器进行训练,获得初始控制策略,保证总体控制器对于模型不确定性的鲁棒性。S6. In the case of no fault, the controller based on model reference reinforcement learning is trained to obtain an initial control strategy, so as to ensure the robustness of the overall controller to model uncertainty.
S7、在无人船系统中注入故障,对已获取的基于模型参考强化学习的初始控制策略进行再训练,实现总体控制器对于部分传感器故障的适应性。S7. Inject faults into the unmanned ship system, and retrain the obtained initial control strategy based on model reference reinforcement learning to realize the adaptability of the overall controller to some sensor faults.
S8、在不同初始状态条件下,不断重复步骤S6和步骤S7,直到强化学习的评价函数网络模型和控制策略模型收敛。S8. Repeat step S6 and step S7 continuously under different initial state conditions until the evaluation function network model and the control strategy model of the reinforcement learning converge.
具体地,步骤S6-S8的训练过程具体如下:1)为和分别初始化参数θ1,θ2,用φ表示actor网络;2)为目标参数指定值:3)运行ul=0时公式(5)中的ub,得到数据集D0;4)结束学习阶段的探索,使用数据集D0训练初始critic参数θ1 0,5)初始化记忆池D←D0;6)为critic参数与其目标指定初始值:θ1←θ1 0, 7)重复;8)开始循环,每个数据收集步骤执行操作;9)根据πφ(ul,t|st)选择一个动作ul,t;10)运行标称系统(6)和整个系统(5)以及故障诊断和估计机制(14)&收集st+1={xt+1,xm,t+1,ub,t+1};11)D←D∪{st,ul,t,R(st,ul,t),st+1};12)结束循环;13)开始循环,每个梯度更新步骤执行动作;14)从D中抽取一批数据B;15)θj←θj-ιQ▽θJQ(θj),且j=1,2;16)φ←φ-ιπ▽φJπ(φ);17)α←α-ια▽αJα(α);18)且j=1,2;19)结束循环;20)直至收敛(如JQ(θ)<一个小阈值)。在该算法中,ιQ,ιπ和ια是正学习率(标量),κ>0是常数标量。Specifically, the training process of steps S6-S8 is as follows: 1) is and Initialize the parameters θ 1 and θ 2 respectively, and use φ to represent the actor network; 2) Specify values for the target parameters: 3) Run u b in formula (5) when u l = 0 to obtain the data set D 0 ; 4) End the exploration of the learning phase, use the data set D 0 to train the initial critic parameter θ 1 0 , 5) Initialize memory pool D←D 0 ; 6) Specify initial values for critic parameters and their targets: θ 1 ←θ 1 0 , 7) Repeat; 8) Start the loop, performing actions at each data collection step; 9) Choose an action u l,t according to πφ (u l,t |s t ); 10) Run the nominal system (6) and the entire System (5) and Fault Diagnosis and Estimation Mechanism (14) & Collection s t +1 = {x t+1 , x m, t+1 , u b, t+1 }; 11) D←D∪{s t ,u l,t ,R(s t ,u l,t ),s t+1 }; 12) End the loop; 13) Start the loop, each gradient update step performs actions; 14) Extract a batch of data from D B; 15) θ j ←θ j -ι Q ▽ θ J Q (θ j ), and j=1,2; 16) φ←φ-ι π ▽ φ J π (φ); 17) α←α- ι α ▽ α J α (α); 18) and j=1,2; 19) end the loop; 20) until convergence (eg J Q (θ) < a small threshold). In this algorithm, ι Q , ι π and ι α are positive learning rates (scalars), and κ > 0 is a constant scalar.
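A condensed sketch of this training procedure is shown below. It is only an outline under assumptions: the helper objects (env, nominal_ctrl, estimator, actor, critics, buffer) and their methods are placeholders standing in for steps 1)-20), and the composition of the state s from the model-reference error, the baseline control, and the fault-estimation output follows the description in step S3 rather than an explicit formula from the patent.

```python
# Hedged outline of the S6-S8 training loop with assumed helper objects.
import numpy as np

def train(env, nominal_ctrl, estimator, actor, critics, buffer,
          episodes=200, steps=500, batch_size=256):
    for _ in range(episodes):
        x, x_m = env.reset()                                 # plant state and nominal-model state
        u_l = np.zeros(2)                                    # previous RL action, initialised to zero
        for _ in range(steps):                               # data-collection phase (steps 8-12)
            u_b = nominal_ctrl(x_m)                          # baseline control u_b
            e_tilde = estimator.step(env.measure_velocity(), u_b, u_l)
            s = np.concatenate([x - x_m, u_b, e_tilde])      # assumed state composition
            u_l = actor.act(s)                               # RL control u_l ~ pi_phi(.|s)
            x, x_m, reward, done = env.step(u_b + u_l)       # apply total control u = u_b + u_l
            buffer.add(s, u_l, reward, done)
            if done:
                break
        for _ in range(steps):                               # gradient-update phase (steps 13-19)
            batch = buffer.sample(batch_size)
            critics.update(batch)                            # theta_j <- theta_j - l_Q * grad J_Q
            actor.update(batch, critics)                     # phi <- phi - l_pi * grad J_pi
            actor.update_temperature(batch)                  # alpha <- alpha - l_alpha * grad J_alpha
            critics.soft_update_targets()                    # slow update of target parameters
        if critics.bellman_residual() < 1e-3:                # stop when J_Q falls below a small threshold
            return
```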
一种基于模型参考强化学习的无人船容错控制系统,包括:A fault-tolerant control system for unmanned ships based on model reference reinforcement learning, including:
动力学模型构建模块,用于对无人船的不确定性因素进行分析,构建无人船名义动力学模型;The dynamic model building module is used to analyze the uncertain factors of the unmanned ship and construct the nominal dynamic model of the unmanned ship;
控制器设计模块,基于无人船名义动力学模型,设计无人船标称控制器;The controller design module, based on the nominal dynamics model of the unmanned ship, designs the nominal controller of the unmanned ship;
容错控制器构建模块,基于最大熵的Actor-Critic方法,根据实际无人船系统、无人船名义动力学模型的状态变量差值和无人船标称控制器的输出,构建基于模型参考强化学习的容错控制器;Fault-tolerant controller building module, based on the Actor-Critic method of maximum entropy, according to the actual unmanned ship system, the state variable difference of the unmanned ship's nominal dynamic model and the output of the unmanned ship's nominal controller, the model-based reference enhancement is constructed. Learned fault-tolerant controllers;
训练模块,用于根据控制任务需求,搭建强化学习评价函数和控制策略模型并训练容错控制器,得到训练完成的控制策略。The training module is used to build a reinforcement learning evaluation function and a control strategy model and train a fault-tolerant controller according to the requirements of the control task, and obtain the trained control strategy.
上述方法实施例中的内容均适用于本系统实施例中,本系统实施例所具体实现的功能与上述方法实施例相同,并且达到的有益效果与上述方法实施例所达到的有益效果也相同。The contents in the above method embodiments are all applicable to the present system embodiments, the specific functions implemented by the present system embodiments are the same as the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
一种基于模型参考强化学习的无人船容错控制装置:A fault-tolerant control device for unmanned ships based on model reference reinforcement learning:
至少一个处理器;at least one processor;
至少一个存储器,用于存储至少一个程序;at least one memory for storing at least one program;
当所述至少一个程序被所述至少一个处理器执行,使得所述至少一个处理器实现如上所述一种基于模型参考强化学习的无人船容错控制方法。When the at least one program is executed by the at least one processor, the at least one processor implements the above-mentioned method for fault-tolerant control of an unmanned ship based on model reference reinforcement learning.
上述方法实施例中的内容均适用于本装置实施例中,本装置实施例所具体实现的功能与上述方法实施例相同,并且达到的有益效果与上述方法实施例所达到的有益效果也相同。The contents in the above method embodiments are all applicable to the present device embodiments, the specific functions implemented by the present device embodiments are the same as the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
一种存储介质,其中存储有处理器可执行的指令,其特征在于:所述处理器可执行的指令在由处理器执行时用于实现如上所述一种基于模型参考强化学习的无人船容错控制方法。A storage medium storing processor-executable instructions, wherein the processor-executable instructions, when executed by the processor, are used to implement the above-mentioned model reference reinforcement learning-based unmanned ship Fault-tolerant control methods.
上述方法实施例中的内容均适用于本存储介质实施例中,本存储介质实施例所具体实现的功能与上述方法实施例相同,并且达到的有益效果与上述方法实施例所达到的有益效果也相同。The contents in the foregoing method embodiments are all applicable to this storage medium embodiment, and the specific functions implemented by this storage medium embodiment are the same as those of the foregoing method embodiments, and the beneficial effects achieved are also the same as those achieved by the foregoing method embodiments. same.
以上是对本发明的较佳实施进行了具体说明，但本发明创造并不限于所述实施例，熟悉本领域的技术人员在不违背本发明精神的前提下还可作出种种的等同变形或替换，这些等同的变形或替换均包含在本申请权利要求所限定的范围内。The above is a specific description of the preferred implementation of the present invention, but the present invention is not limited to the described embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are all included within the scope defined by the claims of the present application.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111631716.8A CN114296350B (en) | 2021-12-28 | 2021-12-28 | A fault-tolerant control method for unmanned ships based on model reference reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111631716.8A CN114296350B (en) | 2021-12-28 | 2021-12-28 | A fault-tolerant control method for unmanned ships based on model reference reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114296350A true CN114296350A (en) | 2022-04-08 |
CN114296350B CN114296350B (en) | 2023-11-03 |
Family
ID=80972328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111631716.8A Active CN114296350B (en) | 2021-12-28 | 2021-12-28 | A fault-tolerant control method for unmanned ships based on model reference reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114296350B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110109355A (en) * | 2019-04-29 | 2019-08-09 | 山东科技大学 | A kind of unmanned boat unusual service condition self-healing control method based on intensified learning |
CN111694365A (en) * | 2020-07-01 | 2020-09-22 | 武汉理工大学 | Unmanned ship formation path tracking method based on deep reinforcement learning |
-
2021
- 2021-12-28 CN CN202111631716.8A patent/CN114296350B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110109355A (en) * | 2019-04-29 | 2019-08-09 | 山东科技大学 | A kind of unmanned boat unusual service condition self-healing control method based on intensified learning |
CN111694365A (en) * | 2020-07-01 | 2020-09-22 | 武汉理工大学 | Unmanned ship formation path tracking method based on deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
ZHANG QINGRUI等: "fault tolerant control for autonomous surface vehicles via model reference reinforcement learning", 《2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC)》 *
ZHANG QINGRUI等: "model-reference reinforcement learning control of autonomous surface vehicles", 《2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC)》 *
Also Published As
Publication number | Publication date |
---|---|
CN114296350B (en) | 2023-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Data-driven performance-prescribed reinforcement learning control of an unmanned surface vehicle | |
Fan et al. | Global fixed-time trajectory tracking control of underactuated USV based on fixed-time extended state observer | |
Xue et al. | System identification of ship dynamic model based on Gaussian process regression with input noise | |
Peng et al. | Constrained control of autonomous underwater vehicles based on command optimization and disturbance estimation | |
Liang et al. | Finite-time velocity-observed based adaptive output-feedback trajectory tracking formation control for underactuated unmanned underwater vehicles with prescribed transient performance | |
Chen et al. | Adaptive optimal tracking control of an underactuated surface vessel using actor–critic reinforcement learning | |
Elhaki et al. | Reinforcement learning-based saturated adaptive robust neural-network control of underactuated autonomous underwater vehicles | |
Ji et al. | Model-free fault diagnosis for autonomous underwater vehicles using sequence convolutional neural network | |
Wang et al. | Extended state observer-based fixed-time trajectory tracking control of autonomous surface vessels with uncertainties and output constraints | |
CN101871782B (en) | Position error forecasting method for GPS (Global Position System)/MEMS-INS (Micro-Electricomechanical Systems-Inertial Navigation System) integrated navigation system based on SET2FNN | |
CN114035550B (en) | An ESO-based fault diagnosis method for autonomous underwater robot actuators | |
Jiang et al. | Neural network based adaptive sliding mode tracking control of autonomous surface vehicles with input quantization and saturation | |
Gong et al. | Trajectory tracking control for autonomous underwater vehicles based on dual closed-loop of MPC with uncertain dynamics | |
CN107179693A (en) | Based on the Huber robust adaptive filtering estimated and method for estimating state | |
CN118244770B (en) | Repeated learning composite disturbance-resistant error-tolerant control method for unmanned ship | |
Yue et al. | Online adaptive parameter identification of an unmanned surface vehicle without persistency of excitation | |
Zhang et al. | Adaptive asymptotic tracking control for autonomous underwater vehicles with non-vanishing uncertainties and input saturation | |
Chen et al. | Dynamic positioning for underactuated surface vessel via L1 adaptive backstepping control | |
Zhang et al. | Event-trigger NMPC for 3-D trajectory tracking of UUV with external disturbances | |
Zhang et al. | AUV 3D docking control using deep reinforcement learning | |
Song et al. | Fuzzy optimal tracking control for nonlinear underactuated unmanned surface vehicles | |
Yan et al. | Event-triggered adaptive predefined-time sliding mode control of autonomous surface vessels with unknown dead zone and actuator faults | |
González-Prieto | Adaptive finite time smooth nonlinear sliding mode tracking control for surface vessels with uncertainties and disturbances | |
Yan et al. | Reinforcement learning-based integrated active fault diagnosis and tracking control | |
CN114296350B (en) | A fault-tolerant control method for unmanned ships based on model reference reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |