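WO2024114458A1 - Unmanned system control method and control system based on a Lyapunov neural network - Google Patents
Unmanned system control method and control system based on a Lyapunov neural network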
- Publication number
- WO2024114458A1 (application PCT/CN2023/133088)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- lyapunov
- neural network
- unmanned
- unmanned system
- control method
- Prior art date
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/04—Programme control other than numerical control, i.e. in sequence controllers or logic controllers
- G05B19/042—Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Definitions
- Unmanned ship control technology with safety assurance is of great significance. Ensuring the safety of unmanned ship control can, on the one hand, reduce the possibility of unnecessary damage to unmanned ships and dangerous accidents such as capsizing; on the other hand, it can help unmanned ships eliminate control actions with high risk factors, achieve more stable and effective control, and help unmanned ships get rid of excessive reliance on human prior knowledge and achieve true intelligence. Therefore, the safety assurance of unmanned ships is an important research direction and a key issue that needs to be solved urgently.
- the traditional approach computes the Lyapunov function by polynomial fitting; learning a Lyapunov neural network for a given simple dynamics model fits the Lyapunov function of the given dynamical system with a neural network, which addresses the difficulty of finding a Lyapunov function; learning a Lyapunov neural network controller can be applied to some simple nonlinear systems, finding a suitable control function while verifying the Lyapunov conditions.
- the unmanned ship system is a relatively complex nonlinear system, and none of the above methods can directly accomplish the safety assurance task.
- the traditional approach can compute a suitable Lyapunov function for a simple linear system, but a suitable function is hard to find for an unmanned ship system, and any function found covers only a small part of the Lyapunov-stable region; learning a Lyapunov neural network for a given simple dynamics model is generally suited to low-dimensional, discrete-state dynamical systems and cannot be applied directly to high-dimensional, continuous ones, and related research has largely remained at simple experiments such as the inverted pendulum without being extended to more complex settings; learning a Lyapunov neural network controller verifies the Lyapunov conditions while generating the controller, but it fixes the Lyapunov function, so it cannot obtain a large share of the Lyapunov-stable region, explores insufficiently, and is hard to transfer: it cannot be effectively combined with a system that already has a control algorithm.
- the purpose of the present invention is to provide an unmanned system control method and control system based on a Lyapunov neural network to address the deficiencies of the above-mentioned prior art, so as to at least solve one of the above-mentioned prior art problems.
- the technical solution adopted by the present invention is:
- after the Lyapunov neural network is fused with the model-based reinforcement learning agent of the unmanned system, the unmanned system is controlled.
- the Lyapunov neural network is trained on the set of observed states of the unmanned system, wherein the input of the Lyapunov neural network is the working parameter data and working environment data of the unmanned system corresponding to the state, and the output of the Lyapunov neural network is the Lyapunov value corresponding to the state.
- the state is added to the safe set.
- the Gaussian process model and the Lyapunov neural network are updated based on the latest sample set.
- the model-based reinforcement learning agent of the unmanned system is obtained with a filtered probabilistic model predictive control algorithm; fusing the Lyapunov neural network with the model-based reinforcement learning agent includes training the filtered probabilistic model predictive control algorithm under the guidance of the Lyapunov neural network, obtaining a Lyapunov-guided reward function, and controlling the unmanned system based on that reward function.
- the unmanned system is an unmanned ship, an unmanned vehicle, an unmanned aerial vehicle or a robot.
- the training sample set data includes the real-time positioning data of the unmanned ship, the speed and direction data of the unmanned ship, and the wind speed and direction data of the environment in which the unmanned ship is located; and controlling the unmanned system includes controlling its engine throttle and/or rudder angle.
- the present invention also provides an unmanned system control system based on a Lyapunov neural network, which is characterized by:
- Lyapunov function acquisition module used to obtain the Lyapunov function corresponding to the unmanned system through Lyapunov neural network fitting;
- Iterative training module used to guide the unmanned system to perform iterative training according to the safety area divided by the Lyapunov neural network;
- Control module used to control the unmanned system after integrating the Lyapunov neural network and the model-based reinforcement learning agent of the unmanned system.
- the unmanned system is an unmanned ship, an unmanned vehicle, an unmanned aerial vehicle or a robot.
- the present invention has the following beneficial effects:
- FIG1 is an overall framework diagram of an unmanned system control method based on a Lyapunov neural network according to an embodiment of the present invention (taking the unmanned system as an unmanned ship as an example).
- FIG2 is a diagram of an unmanned system control method based on a Lyapunov neural network according to an embodiment of the present invention (taking an unmanned ship as an example of the unmanned system).
- the present invention proposes a reinforcement learning unmanned system control method and control system based on Lyapunov neural network for iterative learning.
- the present invention proposes a control method for an unmanned system (such as an unmanned ship, etc.) based on reinforcement learning for iterative learning of a Lyapunov neural network.
- the safety of the system is ensured by iterative learning of the Lyapunov neural network.
- the unmanned system (such as an unmanned ship, etc.) can be autonomously learned without the need for human prior knowledge, thereby enabling safer and more effective control of the unmanned system (such as an unmanned ship, etc.).
- the present invention provides an unmanned system control method based on a Lyapunov neural network, comprising:
- after the Lyapunov neural network is fused with the model-based reinforcement learning agent of the unmanned system, the unmanned system is controlled.
- the method further includes training a Lyapunov neural network based on a set of observed states of the unmanned system, wherein the input of the Lyapunov neural network is working parameter data and working environment data of the unmanned system corresponding to the state, and the output of the Lyapunov neural network is a Lyapunov value corresponding to the state.
- the state is in a decreasing region.
- the state is added to the safe set.
- the Gaussian process model and the Lyapunov neural network are updated based on the latest sample set.
- the model-based reinforcement learning agent of the unmanned system is obtained with a filtered probabilistic model predictive control algorithm; fusing the Lyapunov neural network with the model-based reinforcement learning agent includes training the filtered probabilistic model predictive control algorithm under the guidance of the Lyapunov neural network, obtaining a Lyapunov-guided reward function, and controlling the unmanned system based on that reward function.
- the unmanned system is an unmanned ship, an unmanned vehicle, a drone or a robot, etc.
- the training sample set data includes the real-time positioning data of the unmanned ship, the speed and direction data of the unmanned ship, and the wind speed and direction data of the environment in which the unmanned ship is located; controlling the unmanned system includes controlling its engine throttle and/or rudder angle.
- the present invention provides an unmanned system control system based on a Lyapunov neural network, wherein the unmanned system is an unmanned ship, an unmanned vehicle, an unmanned aerial vehicle or a robot.
- the unmanned system control system comprises:
- Lyapunov function acquisition module used to obtain the Lyapunov function corresponding to the unmanned system through Lyapunov neural network fitting;
- Iterative training module used to guide the unmanned system to perform iterative training according to the safety area divided by the Lyapunov neural network;
- Control module used to control the unmanned system after integrating the Lyapunov neural network and the model-based reinforcement learning agent of the unmanned system.
- the present invention improves the problem of lack of consideration of safety assurance in unmanned systems such as unmanned ships, and uses the Lyapunov neural network to solve the safety assurance problem that requires human prior knowledge, expands the Lyapunov neural network to high-dimensional, complex nonlinear systems, and proposes a reinforcement learning unmanned system control method and control system based on the Lyapunov neural network for iterative learning.
- the present invention fits the Lyapunov function of an unmanned system (such as an unmanned ship, etc.) through iterative training of a neural network, and then guides the unmanned system (such as an unmanned ship, etc.) to perform reinforcement learning training according to the safety area divided by the Lyapunov neural network.
- the Lyapunov neural network is integrated with the model-based reinforcement learning algorithm of the unmanned system (such as an unmanned ship, etc.), thereby improving the stable control capability of the unmanned system (such as an unmanned ship, etc.) and achieving more efficient and safe driving.
- the core idea of this invention is to use Lyapunov's second method to express the stability of the unmanned ship system from the perspective of energy.
- the safe set is defined as $S_\pi$.
- any trajectory starting from the region $x \in S_\pi$ remains in this region and gradually approaches the equilibrium point.
- $v_\theta(x)=\phi_\theta(x)\cdot\phi_\theta(x)$, where $\phi_\theta$ is a feedforward neural network.
- the activation function and weight matrix of each layer should have a trivial null space.
- the output dimension of each layer $l$ is defined as $d_l$.
- the weight matrix should be a full-rank matrix.
- the weight matrix satisfies the condition $d_l \ge d_{l-1}$ to meet condition (1).
- the activation function and neural network of each layer meet the Lipschitz continuity condition, which ensures that the Lyapunov neural network meets the Lipschitz continuity condition, that is, condition (3). Therefore, if a state meets condition (2), $\Delta v_\theta(x) < 0$, it can be determined that the state is a Lyapunov stable state. This condition is handled during training.
- the loss function is divided into two parts: the first part penalizes misclassification according to the distance between the Lyapunov value and $C_i$; the second part penalizes states in $S_i$ that violate the decrease condition (2):
- $C_{i+1}=\max v_\theta(x),\ x\in S_i$ (11)
- the largest Lyapunov value in the safe set can be used as a critical value to divide the safe set and the unsafe area.
- the filtered probabilistic model predictive control is fused with the Lyapunov neural network, and the learned Lyapunov neural network is introduced into the reward function of the filtered probabilistic model predictive control algorithm:
- Si is calculated based on the Lyapunov neural network and added to the reward function to guide the training of the unmanned ship.
- the control signal generated by the present invention mainly controls the throttle of the engine and the rudder angle of the rudder.
- the speed of the unmanned ship is controlled by controlling the throttle, and the steering is controlled by controlling the rudder angle.
- in order for the entire system to run smoothly, the unmanned ship must collect data that represent its current state and send them to the control system to generate a control signal. This collection requires dedicated hardware.
- GPS is used to locate the position of the unmanned ship and obtain the coordinates of the ship.
- a direction sensor is used to obtain the speed and direction of the unmanned ship.
- the wind sensor is used to obtain wind speed and wind direction information. The information collected by these hardware is integrated and sent to the control system of the unmanned ship to generate a control signal for the unmanned ship, so that the unmanned ship can sail smoothly on the sea.
- because the scheme of the present invention does not rely on any human prior experience, the unmanned ship must be able to learn through autonomous exploration. Reinforcement learning, as a type of machine learning, has been widely used because it does not rely on prior knowledge. The present invention therefore adds reinforcement learning to the control algorithm of the unmanned ship, so that the ship can explore and learn autonomously and keep optimizing itself during navigation, making the model more accurate and the control performance better.
- FIG2 is a diagram of an unmanned system control method based on a Lyapunov neural network according to an embodiment of the present invention, taking an unmanned ship as the unmanned system.
- the sample set, loss function and other parameters are initialized, and then the initial Gaussian process model is trained through the sample set, and the Lyapunov neural network and C value are initialized, and then the iteration begins.
- the Lyapunov neural network is trained first, and then the filtered probabilistic model predictive control is trained under the guidance of the Lyapunov neural network.
- the Gaussian process model and Lyapunov neural network are updated based on the latest sample set. The entire process ensures that the unmanned ship can conduct autonomous exploration and learning in the ocean, optimize the model through its own "experience" of continuous navigation, and thus obtain better control performance.
- the method of the present invention has been verified by computer simulation, and the results indicate that it is feasible.
- the present invention has good applicability. In addition to unmanned ship control, it can also be extended to unmanned systems such as unmanned vehicles, unmanned aerial vehicles, robots, etc., and has broad application prospects.
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
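The invention discloses an unmanned system control method and control system based on a Lyapunov neural network, comprising: fitting the Lyapunov function corresponding to the unmanned system with a Lyapunov neural network; guiding the unmanned system through iterative training according to the safe region delineated by the Lyapunov neural network; and controlling the unmanned system after fusing the Lyapunov neural network with the model-based reinforcement learning agent of the unmanned system. By fitting the Lyapunov function with a Lyapunov neural network, the invention can cover most of the Lyapunov-stable region and ensures thorough exploration of the safe region; it extends to relatively complex nonlinear systems, so that a Lyapunov neural network can be learned for unmanned systems such as unmanned ships; and it transfers effectively to other control algorithms, which makes it easy to combine with them.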
Description
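The invention belongs to the technical field of unmanned system control, and in particular relates to an unmanned system control method and control system based on a Lyapunov neural network.
In recent years, unmanned ships have developed rapidly in response to the shortage of skilled professionals and to operating-efficiency problems in the maritime transport industry, and a variety of unmanned ship control methods have emerged.
While sailing at sea, a ship is affected by environmental factors such as wind and current disturbances, which creates safety hazards. Safety has always been a core issue in control, but because the safety of an unmanned ship system depends strongly on human prior knowledge of the ship and on manually selected features, existing unmanned ship control methods rarely address it.
Unmanned ship control with safety guarantees is therefore of great significance. On the one hand, guaranteeing safe control reduces the likelihood of unnecessary damage and of dangerous accidents such as capsizing. On the other hand, it helps the unmanned ship rule out high-risk control actions, achieve more stable and effective control, and shed its excessive reliance on human prior knowledge, moving toward genuine autonomy. Safety assurance for unmanned ships is thus an important research direction and a key problem that urgently needs solving.
Researchers have proposed many methods for safe control, which fall roughly into three categories: computing a Lyapunov function by traditional methods; learning a Lyapunov neural network for a given simple dynamics model; and learning a Lyapunov neural network controller. The traditional approach computes the Lyapunov function by polynomial fitting. Learning a Lyapunov neural network for a given simple dynamics model uses a neural network to fit the Lyapunov function of the given dynamical system, which addresses the difficulty of finding a Lyapunov function. Learning a Lyapunov neural network controller can be applied to some simple nonlinear systems; it finds a suitable control function while verifying the Lyapunov conditions.
An unmanned ship is a relatively complex nonlinear system, and none of these methods can directly accomplish the safety assurance task. The traditional approach can compute a suitable Lyapunov function for a simple linear system, but for an unmanned ship a suitable function is hard to find, and any function found covers only a small part of the Lyapunov-stable region. Learning a Lyapunov neural network for a given simple dynamics model is generally suited to low-dimensional, discrete-state dynamical systems and cannot be applied directly to high-dimensional, continuous ones; related research has largely remained at simple experiments such as the inverted pendulum and has not been extended to more complex settings. Learning a Lyapunov neural network controller verifies the Lyapunov conditions while generating the controller, but it fixes the Lyapunov function, so it cannot obtain a large share of the Lyapunov-stable region, explores insufficiently, and is hard to transfer: it cannot be effectively combined with a system that already has a control algorithm.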
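The purpose of the invention is to address the above deficiencies of the prior art by providing an unmanned system control method and control system based on a Lyapunov neural network, so as to solve at least one of the above prior-art problems.
To solve the above technical problems, the invention adopts the following technical solution:
An unmanned system control method based on a Lyapunov neural network, characterized by comprising:
fitting the Lyapunov function corresponding to the unmanned system with a Lyapunov neural network;
guiding the unmanned system through iterative training according to the safe region delineated by the Lyapunov neural network;
controlling the unmanned system after fusing the Lyapunov neural network with the model-based reinforcement learning agent of the unmanned system.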
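Further, the method includes training the Lyapunov neural network on the set of observed states of the unmanned system, where the input of the Lyapunov neural network is the operating-parameter data and operating-environment data of the unmanned system for a given state, and its output is the Lyapunov value of that state.
Preferably, during training of the Lyapunov neural network, the states lie within the decreasing region.
Preferably, during training of the Lyapunov neural network, if a state in the potential safe region satisfies the prescribed safe-set definition after a set number of time steps, that state is added to the safe set.
Preferably, after each training iteration, the Gaussian process model and the Lyapunov neural network are updated based on the latest sample set.
Preferably, the model-based reinforcement learning agent of the unmanned system is obtained with a filtered probabilistic model predictive control algorithm; fusing the Lyapunov neural network with the model-based reinforcement learning agent includes training the filtered probabilistic model predictive control algorithm under the guidance of the Lyapunov neural network, obtaining a Lyapunov-guided reward function, and guiding control of the unmanned system with that reward function.
Preferably, the unmanned system is an unmanned ship, an unmanned vehicle, an unmanned aerial vehicle or a robot.
Preferably, when the unmanned system is an unmanned ship, the training sample set includes real-time positioning data of the ship, its speed and heading data, and wind speed and wind direction data for its environment; controlling the unmanned system includes controlling its engine throttle and/or rudder angle.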
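Based on the same inventive concept, the invention also provides an unmanned system control system based on a Lyapunov neural network, characterized by comprising:
a Lyapunov function acquisition module, used to obtain the Lyapunov function corresponding to the unmanned system by fitting with a Lyapunov neural network;
an iterative training module, used to guide the unmanned system through iterative training according to the safe region delineated by the Lyapunov neural network;
a control module, used to control the unmanned system after the Lyapunov neural network is fused with the model-based reinforcement learning agent of the unmanned system.
Preferably, the unmanned system is an unmanned ship, an unmanned vehicle, an unmanned aerial vehicle or a robot.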
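Compared with the prior art, the invention has the following beneficial effects:
1) Fitting the Lyapunov function with a Lyapunov neural network can cover most of the Lyapunov-stable region and ensures thorough exploration of the safe region.
2) The method extends to relatively complex nonlinear systems, so a Lyapunov neural network can be learned for unmanned systems such as unmanned ships.
3) The method transfers effectively to other control algorithms and is easy to combine with them.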
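FIG. 1 is an overall framework diagram of the unmanned system control method based on a Lyapunov neural network according to an embodiment of the invention (taking an unmanned ship as the unmanned system).
FIG. 2 is a diagram of the unmanned system control method based on a Lyapunov neural network according to an embodiment of the invention (taking an unmanned ship as the unmanned system).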
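To address the problems and deficiencies of the prior art, and to explore the safe region more efficiently and completely so that the control of unmanned systems such as unmanned ships becomes more stable and efficient and can be applied in practice, the invention proposes a reinforcement learning control method and control system for unmanned systems that learns iteratively on the basis of a Lyapunov neural network.
The proposed method guarantees system safety by iteratively learning the Lyapunov neural network, while allowing the unmanned system (such as an unmanned ship) to learn autonomously without human prior knowledge, so that the unmanned system can be operated more safely and effectively.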
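According to a first aspect, the invention provides an unmanned system control method based on a Lyapunov neural network, comprising:
fitting the Lyapunov function corresponding to the unmanned system with a Lyapunov neural network;
guiding the unmanned system through iterative training according to the safe region delineated by the Lyapunov neural network;
controlling the unmanned system after fusing the Lyapunov neural network with the model-based reinforcement learning agent of the unmanned system.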
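In some preferred embodiments, the method further includes training the Lyapunov neural network on the set of observed states of the unmanned system, where the input of the Lyapunov neural network is the operating-parameter data and operating-environment data of the unmanned system for a given state, and its output is the Lyapunov value of that state.
In some preferred embodiments, during training of the Lyapunov neural network, the states lie within the decreasing region.
In some preferred embodiments, during training of the Lyapunov neural network, if a state in the potential safe region satisfies the prescribed safe-set definition after a set number of time steps, that state is added to the safe set.
In some preferred embodiments, after each training iteration, the Gaussian process model and the Lyapunov neural network are updated based on the latest sample set.
In some preferred embodiments, the model-based reinforcement learning agent of the unmanned system is obtained with a filtered probabilistic model predictive control algorithm; fusing the Lyapunov neural network with the model-based reinforcement learning agent includes training the filtered probabilistic model predictive control algorithm under the guidance of the Lyapunov neural network, obtaining a Lyapunov-guided reward function, and guiding control of the unmanned system with that reward function.
The unmanned system is an unmanned ship, an unmanned vehicle, an unmanned aerial vehicle, a robot, or the like.
When the unmanned system is an unmanned ship, the training sample set includes real-time positioning data of the ship, its speed and heading data, and wind speed and wind direction data for its environment; controlling the unmanned system includes controlling its engine throttle and/or rudder angle.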
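According to a second aspect, the invention provides an unmanned system control system based on a Lyapunov neural network, where the unmanned system is an unmanned ship, an unmanned vehicle, an unmanned aerial vehicle, a robot, or the like. The control system comprises:
a Lyapunov function acquisition module, used to obtain the Lyapunov function corresponding to the unmanned system by fitting with a Lyapunov neural network;
an iterative training module, used to guide the unmanned system through iterative training according to the safe region delineated by the Lyapunov neural network;
a control module, used to control the unmanned system after the Lyapunov neural network is fused with the model-based reinforcement learning agent of the unmanned system.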
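The invention remedies the lack of safety assurance in unmanned systems such as unmanned ships, uses a Lyapunov neural network to solve the safety assurance problem that otherwise requires human prior knowledge, and extends the Lyapunov neural network to high-dimensional, complex nonlinear systems, yielding a reinforcement learning control method and control system for unmanned systems that learns iteratively on the basis of a Lyapunov neural network.
The invention fits the Lyapunov function of the unmanned system (such as an unmanned ship) by iteratively training a neural network, then uses the safe region delineated by that Lyapunov neural network to guide reinforcement learning training of the unmanned system. Fusing the Lyapunov neural network with the model-based reinforcement learning algorithm of the unmanned system improves its stable control capability and makes its operation more efficient and safer.
Taking an unmanned ship as the unmanned system, FIG. 1 shows the overall framework of the control method. The technical solution and its principles are described in detail below.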
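I. Unmanned ship control system
(1) Lyapunov neural network
The core idea of the invention is to use Lyapunov's second method to express the stability of the unmanned ship system in terms of energy. For a given policy $\pi$, the safe set is defined as $S_\pi$: under the given system and policy, any trajectory that starts from $x \in S_\pi$ remains in this region and gradually approaches the equilibrium point. At time step $t$, the Lyapunov function $v(x_t)$ must satisfy the following conditions:
$$v(0)=0,\quad v(x_t)>0\ \text{for}\ x_t\neq 0, \tag{1}$$
$$\Delta v(x_t)=v(x_{t+1})-v(x_t)<0,\quad x_t\neq 0, \tag{2}$$
$$\lVert f'(x_{t+1})-f'(x_t)\rVert \le k\,\lVert x_{t+1}-x_t\rVert,\quad k\in\mathbb{R}_{>0}. \tag{3}$$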
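A dedicated neural network is set up to fit the Lyapunov function: $v_\theta(x)=\phi_\theta(x)\cdot\phi_\theta(x)$, where $\phi_\theta$ is a feedforward neural network. The Lyapunov neural network $v_\theta(x)$ must satisfy the Lyapunov function conditions above:
$$v_\theta(0)=0;\quad v_\theta(x)>0,\ x\neq 0;\quad \Delta v_\theta(x)<0. \tag{4}$$
To guarantee a trivial null space, that is, to satisfy condition (1), the activation function and the weight matrix of every layer must each have a trivial null space. Let the output dimension of layer $l$ be $d_l$, so that the weight matrix $W_l$ is a $d_l \times d_{l-1}$ matrix. The weight matrix should have full rank, so that $W_l x = 0$ has only the zero solution; a weight matrix with $d_l \ge d_{l-1}$ can then meet condition (1). If the activation function and the network of every layer are Lipschitz continuous, the Lyapunov neural network as a whole is Lipschitz continuous, which is condition (3). Consequently, a state that satisfies condition (2), $\Delta v_\theta(x)<0$, can be identified as a Lyapunov-stable state; this condition is handled during training.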
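The structural part of this construction can be illustrated with a small sketch. It is not the patent's implementation: the layer sizes, the `tanh` activation, the bias-free layers, and the helper names `make_full_rank_weight`, `phi` and `lyapunov_value` are all assumptions, chosen only so that every layer has a trivial null space. The sketch shows that $v_\theta(0)=0$ and $v_\theta(x)>0$ for $x\neq 0$ follow from the architecture alone; the decrease condition (2) still has to be enforced by training.

```python
import math
import random


def make_full_rank_weight(d_in, d_out, rng, eps=1e-3):
    """Return a (d_out x d_in) weight matrix with full column rank (d_out >= d_in).

    The top d_in rows are G^T G + eps*I, which is symmetric positive definite
    and therefore invertible; any remaining rows are unconstrained.
    (Assumed construction, used here only for illustration.)
    """
    assert d_out >= d_in
    g = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_in)]
    top = [
        [sum(g[k][i] * g[k][j] for k in range(d_in)) + (eps if i == j else 0.0)
         for j in range(d_in)]
        for i in range(d_in)
    ]
    extra = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out - d_in)]
    return top + extra


def phi(x, weights):
    """Bias-free feedforward pass; tanh(z) = 0 only at z = 0."""
    h = list(x)
    for w in weights:
        h = [math.tanh(sum(wij * hj for wij, hj in zip(row, h))) for row in w]
    return h


def lyapunov_value(x, weights):
    """v_theta(x) = phi_theta(x) . phi_theta(x), non-negative by construction."""
    return sum(p * p for p in phi(x, weights))


rng = random.Random(0)
# Hypothetical sizes: 8 inputs, two hidden layers of width 16 (d_l >= d_{l-1}).
weights = [make_full_rank_weight(8, 16, rng), make_full_rank_weight(16, 16, rng)]
v_origin = lyapunov_value([0.0] * 8, weights)
v_other = lyapunov_value([0.3, -0.1, 0.0, 0.2, 0.0, 0.0, 0.5, -0.4], weights)
```

Because no layer has a bias and each weight matrix has full column rank, a nonzero input can never be mapped to the zero vector, which is why `v_other` is strictly positive while `v_origin` is exactly zero.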
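(2) Processing unmanned ship data with the Lyapunov neural network
The 8-dimensional unmanned ship data are the input to the neural network, and the output is the Lyapunov value of the state. The Lyapunov neural network is trained iteratively for $i=1,2,\dots,N_{trial}$. First, $C_i$ is initialized to a small value to describe the safe region, and the initial approximate safe region $S_i$ is given by:
$$S_i \approx V(x,C_i)=\{x \mid v_\theta(x)<C_i\}. \tag{5}$$
In the current sample set $X$, the states that satisfy the decrease condition (2) form the decreasing region $D_i$:
$$D_i=\{x_t \mid \Delta v_\theta(x_t)<0\},\quad x_t\in X. \tag{6}$$
During training, keeping the states within the decreasing region guarantees that the Lyapunov stability condition holds, and the safe set $S_i$ is determined as:
$$S_i=V(x,C_i)=\{x \mid v_\theta(x)<C_i\},\quad x\in D_i. \tag{7}$$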
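As a rough illustration of how the decreasing region and the safe set can be read off a sample set, the sketch below filters one-step transitions. The function names, the representation of samples as `(x, x_next)` pairs, and the toy quadratic `v` are assumptions made for the example; in the method itself `v` would be the trained Lyapunov neural network.

```python
def decreasing_region(transitions, v):
    """D_i: sampled states whose Lyapunov value drops over one time step."""
    return [x for x, x_next in transitions if v(x_next) - v(x) < 0]


def safe_set(transitions, v, c):
    """S_i: states of D_i that also lie inside the level set v(x) < c."""
    return [x for x in decreasing_region(transitions, v) if v(x) < c]


def v(x):
    """Toy stand-in for the Lyapunov network: squared Euclidean norm."""
    return sum(xi * xi for xi in x)


# Hypothetical one-step transitions (state, next state).
transitions = [
    ((1.0, 0.0), (0.5, 0.0)),   # decreasing, but outside the level set for c = 0.5
    ((0.2, 0.1), (0.1, 0.05)),  # decreasing and inside the level set
    ((0.3, 0.0), (0.6, 0.0)),   # not decreasing
]
d_i = decreasing_region(transitions, v)
s_i = safe_set(transitions, v, c=0.5)
```

Only the second state ends up in `s_i`: the first fails the level-set test and the third fails the decrease test.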
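During reinforcement learning training, a parameter $\alpha\in\mathbb{R}_{>1}$ is used to obtain an expanded training set and to explore the potential safe region:
$$G_i=V(x,\alpha C_i)\setminus V(x,C_i),\quad x\in D_i. \tag{8}$$
Within the region $G_i$, if a state satisfies the safe-set definition (7) after $h$ time steps, it is added to the safe set. The states in the potential safe region and in the confirmed safe region together form the training set, $x\in V(x,\alpha C_i)$; safe states are labelled $y=+1$ and all others $y=-1$, according to the following formula:
The loss function has two parts: the first penalizes misclassification according to the distance between the Lyapunov value and $C_i$; the second penalizes states in $S_i$ that violate the decrease condition (2):
where $\lambda\in\mathbb{R}_{>0}$ is the Lagrange multiplier. The model is updated by stochastic gradient descent, and after training $C_i$ is updated as follows:
$$C_{i+1}=\max_{x\in S_i} v_\theta(x), \tag{11}$$
that is, the largest Lyapunov value in the safe set, which serves as the threshold separating the safe set from the unsafe region.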
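The labelling rule, the two-part loss and the update of $C$ can be sketched together. The labelling and loss formulas themselves are not reproduced in this text, so everything below is an assumption rather than the patent's formulas: a hinge-style classification term, a decrease penalty applied to states labelled safe, a hypothetical one-step closed-loop model `step`, and a simplified expansion test that checks only level-set membership after `h` steps. The scalar toy system is likewise illustrative.

```python
def label_training_set(states, v, step, c, alpha, h):
    """Label states inside V(alpha*c): +1 if considered safe, -1 otherwise."""
    labelled = []
    for x in states:
        vx = v(x)
        if vx >= alpha * c:
            continue  # outside V(alpha*c): not part of the training set
        if vx < c:
            labelled.append((x, +1))  # already inside the current level set
            continue
        # x lies in the candidate ring between V(c) and V(alpha*c):
        # roll the (assumed) closed-loop model forward h steps.
        x_h = x
        for _ in range(h):
            x_h = step(x_h)
        labelled.append((x, +1 if v(x_h) < c else -1))
    return labelled


def lyapunov_loss(batch, v, step, c, lam):
    """Assumed two-part loss: classification hinge + lam * decrease violation."""
    total = 0.0
    for x, y in batch:
        vx = v(x)
        classification = max(0.0, -y * (c - vx))
        delta_v = v(step(x)) - vx
        decrease = max(0.0, delta_v) if y == +1 else 0.0
        total += classification + lam * decrease
    return total / len(batch)


def update_c(safe_states, v):
    """C_{i+1}: the largest Lyapunov value over the safe set, as in (11)."""
    return max(v(x) for x in safe_states)


def v(x):
    return x * x  # toy scalar stand-in for the Lyapunov network


def step(x):
    return 0.5 * x if x < 1.8 else 1.1 * x  # contracts near 0, diverges beyond 1.8


batch = label_training_set([0.5, 1.5, 1.9, 3.0], v, step, c=1.0, alpha=4.0, h=2)
loss = lyapunov_loss(batch, v, step, c=1.0, lam=10.0)
c_next = update_c([x for x, y in batch if y == +1], v)
```

In this toy run the state 1.5 is labelled safe even though it lies outside the current level set, so it contributes a positive classification term and raises the next threshold to `c_next = 2.25`, which is how the estimated safe region grows from one iteration to the next.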
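(3) Integrating the Lyapunov neural network into the model-based reinforcement learning framework
The filtered probabilistic model predictive control is fused with the Lyapunov neural network: the learned Lyapunov neural network is introduced into the reward function of the filtered probabilistic model predictive control algorithm:
$S_i$ is computed from the Lyapunov neural network and added to the reward function to guide the training of the unmanned ship.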
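The reward formula is not reproduced in this text, so the following is only one plausible way to let the safe set shape a reward, not the patent's formula. The names `lyapunov_guided_reward` and `task_reward`, the fixed `penalty`, and the level-set membership test are all assumptions.

```python
def lyapunov_guided_reward(x, task_reward, v, c, penalty=1.0):
    """Task reward, reduced by a fixed penalty when x leaves the level set v(x) < c."""
    r = task_reward(x)
    if v(x) >= c:
        r -= penalty
    return r


def v(x):
    return x * x  # toy scalar stand-in for the Lyapunov network


def task_reward(x):
    return -abs(x - 0.2)  # hypothetical tracking reward with target 0.2


r_inside = lyapunov_guided_reward(0.5, task_reward, v, c=1.0)
r_outside = lyapunov_guided_reward(2.0, task_reward, v, c=1.0)
```

A predictive controller that maximizes such a reward over its planning horizon is discouraged from choosing action sequences whose predicted states leave the estimated safe region.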
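II. Unmanned ship hardware system
While an unmanned ship sails at sea, its engine and rudder provide the main means of control. The control signals generated by the invention therefore mainly set the engine throttle and the rudder angle: the throttle controls the ship's speed, and the rudder angle controls its steering. For the whole system to run smoothly, the ship must collect data that represent its current state and send them to the control system to generate control signals, which requires dedicated hardware. GPS is used to locate the ship and obtain its coordinates, a direction sensor provides the ship's speed and heading, and a wind sensor provides wind speed and wind direction. The information collected by this hardware is integrated and fed to the ship's control system, which generates the control signals that allow the ship to sail smoothly at sea.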
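The data flow from sensors to controller can be pictured with two small containers. The field names and units are hypothetical, and only the quantities named above are included; the text states that the network input has 8 dimensions but does not enumerate them, so this sketch does not attempt to reproduce that exact layout.

```python
from dataclasses import dataclass


@dataclass
class ShipObservation:
    """Sensor readings named in the text (field names and units are assumed)."""
    x: float               # GPS coordinate
    y: float               # GPS coordinate
    speed: float           # from the direction sensor
    heading: float         # from the direction sensor
    wind_speed: float      # from the wind sensor
    wind_direction: float  # from the wind sensor

    def as_vector(self):
        """Flatten the readings into a state vector for the controller."""
        return [self.x, self.y, self.speed, self.heading,
                self.wind_speed, self.wind_direction]


@dataclass
class ControlSignal:
    """The two actuated quantities: engine throttle and rudder angle."""
    throttle: float
    rudder_angle: float


obs = ShipObservation(x=10.0, y=-4.0, speed=1.5, heading=0.3,
                      wind_speed=5.0, wind_direction=1.2)
state = obs.as_vector()
command = ControlSignal(throttle=0.4, rudder_angle=-0.1)
```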
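III. Reinforcement learning framework
Because the scheme does not rely on any human prior experience, the unmanned ship must be able to learn through autonomous exploration. Reinforcement learning, a branch of machine learning, is widely used precisely because it learns without prior knowledge. The invention therefore adds reinforcement learning to the ship's control algorithm, so that the ship can explore and learn autonomously and keep optimizing itself during navigation, making the model more accurate and the control performance better.
Taking an unmanned ship as the unmanned system, FIG. 2 shows the control method of one embodiment. First, the sample set, the loss function and other parameters are initialized; the initial Gaussian process model is then trained on the sample set, and the Lyapunov neural network and the value $C$ are initialized, after which iteration begins. In each iteration, the Lyapunov neural network is trained first, and the filtered probabilistic model predictive control is then trained under its guidance. At the end of each iteration, the Gaussian process model and the Lyapunov neural network are updated with the latest sample set. The whole process lets the unmanned ship explore and learn autonomously at sea, optimizing the model through the "experience" of its own continued navigation and thereby achieving better control performance.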
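The iteration described here can be written as a short skeleton. The four callables (`fit_model`, `fit_lyapunov`, `fit_policy`, `rollout`) are hypothetical placeholders for the Gaussian process fit, the Lyapunov network training, the Lyapunov-guided controller training and data collection on the vessel; the sketch fixes only the order of the steps, not how any of them is implemented.

```python
def run_iterations(n_trials, samples, fit_model, fit_lyapunov, fit_policy, rollout):
    """Order of operations for the iterative scheme (all callables are placeholders)."""
    model = fit_model(samples)                # initial Gaussian process model
    v, c = fit_lyapunov(samples, model)       # initial Lyapunov network and C value
    policy = None
    for _ in range(n_trials):
        v, c = fit_lyapunov(samples, model)   # 1. train the Lyapunov network
        policy = fit_policy(model, v, c)      # 2. train the controller under its guidance
        samples = samples + rollout(policy)   # 3. collect new experience
        model = fit_model(samples)            # 4. update the model on the latest samples
        v, c = fit_lyapunov(samples, model)   #    and update the Lyapunov network
    return policy, v, c, samples


# Stub components, used only to show that the skeleton runs end to end.
policy, v, c, samples = run_iterations(
    n_trials=3,
    samples=[1.0],
    fit_model=lambda s: len(s),
    fit_lyapunov=lambda s, m: ((lambda x: x * x), 1.0),
    fit_policy=lambda m, v, c: "policy",
    rollout=lambda p: [0.0],
)
```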
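Compared with the prior art, the proposed reinforcement learning control method for unmanned ships, which learns iteratively on the basis of a Lyapunov neural network, has the following advantages:
1. It extends the Lyapunov neural network to high-dimensional data within a model-based reinforcement learning framework, and hence to high-dimensional, continuous dynamical systems.
2. It uses the Lyapunov neural network to guide the unmanned ship's learning, improving the stability, robustness and safety of unmanned ship control and avoiding phenomena such as loss of control.
The method has been verified by computer simulation, and the results indicate that it is feasible.
The invention is broadly applicable. Beyond unmanned ship control, it can be extended to unmanned systems such as unmanned vehicles, unmanned aerial vehicles and robots, and has broad application prospects.
The embodiments of the invention have been described above with reference to the drawings, but the invention is not limited to these specific embodiments, which are illustrative rather than restrictive. Guided by the invention, a person of ordinary skill in the art can devise many other forms without departing from its spirit or from the scope protected by the claims, and all of these fall within the scope of protection of the invention.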
Claims (10)
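- An unmanned system control method based on a Lyapunov neural network, characterized by comprising: fitting the Lyapunov function corresponding to the unmanned system with a Lyapunov neural network; guiding the unmanned system through iterative training according to the safe region delineated by the Lyapunov neural network; and controlling the unmanned system after fusing the Lyapunov neural network with the model-based reinforcement learning agent of the unmanned system.
- The unmanned system control method based on a Lyapunov neural network according to claim 1, characterized by further comprising training the Lyapunov neural network on the set of observed states of the unmanned system, wherein the input of the Lyapunov neural network is the operating-parameter data and operating-environment data of the unmanned system for a given state, and the output of the Lyapunov neural network is the Lyapunov value of that state.
- The unmanned system control method based on a Lyapunov neural network according to claim 2, characterized in that, during training of the Lyapunov neural network, the states lie within the decreasing region.
- The unmanned system control method based on a Lyapunov neural network according to claim 2, characterized in that, during training of the Lyapunov neural network, if a state in the potential safe region satisfies the prescribed safe-set definition after a set number of time steps, that state is added to the safe set.
- The unmanned system control method based on a Lyapunov neural network according to any one of claims 1 to 4, characterized in that, after each training iteration, the Gaussian process model and the Lyapunov neural network are updated based on the latest sample set.
- The unmanned system control method based on a Lyapunov neural network according to any one of claims 1 to 4, characterized in that the model-based reinforcement learning agent of the unmanned system is obtained with a filtered probabilistic model predictive control algorithm; and fusing the Lyapunov neural network with the model-based reinforcement learning agent of the unmanned system includes training the filtered probabilistic model predictive control algorithm under the guidance of the Lyapunov neural network, obtaining a Lyapunov-guided reward function, and guiding control of the unmanned system with that reward function.
- The unmanned system control method based on a Lyapunov neural network according to any one of claims 1 to 4, characterized in that the unmanned system is an unmanned ship, an unmanned vehicle, an unmanned aerial vehicle or a robot.
- The unmanned system control method based on a Lyapunov neural network according to claim 7, characterized in that, when the unmanned system is an unmanned ship, the training sample set includes real-time positioning data of the ship, its speed and heading data, and wind speed and wind direction data for its environment; and controlling the unmanned system includes controlling its engine throttle and/or rudder angle.
- An unmanned system control system based on a Lyapunov neural network, characterized by comprising: a Lyapunov function acquisition module, used to obtain the Lyapunov function corresponding to the unmanned system by fitting with a Lyapunov neural network; an iterative training module, used to guide the unmanned system through iterative training according to the safe region delineated by the Lyapunov neural network; and a control module, used to control the unmanned system after the Lyapunov neural network is fused with the model-based reinforcement learning agent of the unmanned system.
- The unmanned system control system based on a Lyapunov neural network according to claim 9, characterized in that the unmanned system is an unmanned ship, an unmanned vehicle, an unmanned aerial vehicle or a robot.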
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211535505.9 | 2022-12-02 | ||
CN202211535505.9A CN115933467A (zh) | 2022-12-02 | 2022-12-02 | Unmanned system control method and control system based on a Lyapunov neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024114458A1 (zh) | 2024-06-06 |
Family
ID=86698515
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/133088 WO2024114458A1 (zh) | 2022-12-02 | 2023-11-21 | Unmanned system control method and control system based on a Lyapunov neural network |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115933467A (zh) |
WO (1) | WO2024114458A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115933467A (zh) * | 2022-12-02 | 2023-04-07 | 中国科学院深圳先进技术研究院 | Unmanned system control method and control system based on a Lyapunov neural network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
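CN110928189A (zh) * | 2019-12-10 | 2020-03-27 | 中山大学 | Robust control method based on reinforcement learning and a Lyapunov function |
WO2021071304A1 (ko) * | 2019-10-11 | 2021-04-15 | 서울대학교산학협력단 | Stabilized nonlinear optimal control method |
CN113189867A (zh) * | 2021-03-24 | 2021-07-30 | 大连海事大学 | Self-learning optimal tracking control method for an unmanned ship considering pose and velocity constraints |
CN114859899A (zh) * | 2022-04-18 | 2022-08-05 | 哈尔滨工业大学人工智能研究院有限公司 | Actor-critic stability reinforcement learning method for mobile robot navigation and obstacle avoidance |
CN115016278A (zh) * | 2022-06-22 | 2022-09-06 | 同济大学 | Autonomous driving control method based on BLF-SRL |
CN115933467A (zh) * | 2022-12-02 | 2023-04-07 | 中国科学院深圳先进技术研究院 | Unmanned system control method and control system based on a Lyapunov neural network |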
-
2022
- 2022-12-02 CN CN202211535505.9A patent/CN115933467A/zh active Pending
-
2023
- 2023-11-21 WO PCT/CN2023/133088 patent/WO2024114458A1/zh unknown
Also Published As
Publication number | Publication date |
---|---|
CN115933467A (zh) | 2023-04-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23896609; Country of ref document: EP; Kind code of ref document: A1 |