CN116300949A - Course tracking control method and system for discrete time reinforcement learning unmanned ship - Google Patents

Course tracking control method and system for discrete time reinforcement learning unmanned ship

Info

Publication number
CN116300949A
CN116300949A (application CN202310321522.0A)
Authority
CN
China
Prior art keywords
unmanned ship
course
module
reinforcement learning
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310321522.0A
Other languages
Chinese (zh)
Inventor
白伟伟
章文俊
刘强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN202310321522.0A priority Critical patent/CN116300949A/en
Publication of CN116300949A publication Critical patent/CN116300949A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/0206Control of position or course in two dimensions specially adapted to water vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention provides a course tracking control method and system for a discrete-time reinforcement learning unmanned ship. The method comprises the following steps: establishing an unmanned ship course discrete-time nonlinear dynamics model; performing a system transformation on the established model and establishing an unmanned ship course tracking transformed system; designing an unmanned ship reinforcement learning evaluation module; and designing an unmanned ship course tracking controller, thereby obtaining a rudder angle instruction for the unmanned ship system, which is transmitted to the unmanned ship steering engine to output the unmanned ship course angle, thereby realizing unmanned ship course tracking control. For unmanned ship systems in non-strict-feedback form, the invention uses a reinforcement learning method and constructs a compensator with a neural network, solves the problem that the subsystems are not associated when the backstepping method is used for the controller design of a discrete-time non-strict-feedback unmanned ship system, realizes interaction between the control system and the environment, and reduces the dependence of the control system on the accuracy of the model of the controlled object.

Description

Course tracking control method and system for discrete time reinforcement learning unmanned ship
Technical Field
The invention relates to the technical field of ship automatic control, in particular to a discrete time reinforcement learning unmanned ship course tracking control method and system.
Background
Unmanned ships are important marine equipment for the 21st century, the century of the ocean; they can replace humans in performing complex and dangerous tasks and are widely used in both military and civil fields. Under the influence of factors such as loading conditions and sailing conditions, the unmanned ship course dynamic model becomes an uncertain nonlinear model, which poses certain challenges for unmanned ship course control. For this control problem, many intelligent algorithms have been applied to unmanned ship course control, such as robust control, sliding mode control, adaptive control and model predictive control.
Existing control methods suffer from insufficient control precision because the strong disturbances exerted on the hull by factors such as wind, waves, current and surge are simplified; enhancing the interaction between the unmanned ship system and the environment can provide a new way of addressing this. In addition, most existing research results simplify the unmanned ship motion mathematical model into a strict-feedback form and cannot perform control design for discrete-time systems in the more general non-strict-feedback form. It is therefore urgent to design a general reinforcement learning unmanned ship course control method.
Disclosure of Invention
In view of the above technical problems, a course tracking control method and system for a discrete-time reinforcement learning unmanned ship are provided. The invention is mainly directed to discrete-time unmanned ship systems in non-strict-feedback form; it provides a general discrete-time control design method by means of a neural network compensator, and the interaction between the system and the environment can be improved through the reinforcement learning method.
The invention adopts the following technical means:
a course tracking control method of a discrete time reinforcement learning unmanned ship comprises the following steps:
establishing an unmanned ship course discrete time nonlinear dynamics model;
performing a system transformation on the established unmanned ship course discrete-time nonlinear dynamics model, and establishing an unmanned ship course tracking transformed system;
based on the established unmanned ship course tracking transformed system, designing an unmanned ship reinforcement learning evaluation module;
based on the designed unmanned ship reinforcement learning evaluation module, designing the unmanned ship course tracking controller, thereby obtaining the rudder angle instruction of the unmanned ship system; the rudder angle instruction is transmitted to the unmanned ship steering engine to output the unmanned ship course angle, thereby realizing unmanned ship course tracking control.
Further, the established unmanned ship course discrete-time nonlinear dynamics model is specifically:

x_1(k+1) = f_1(x̄(k)) + g_1(x̄(k)) x_2(k)
x_2(k+1) = f_2(x̄(k)) + g_2(x̄(k)) u(k) + d(k)
y(k) = x_1(k)

wherein x_1(k) is the unmanned ship course angle, the subscript 1 denotes the 1st subsystem, and k is the time instant; x_2(k) is the course angular velocity, and the subscript 2 denotes the 2nd subsystem; u(k) is the rudder angle input; y(k) is the system output; x̄(k) = [x_1(k), x_2(k)]^T is the heading information vector; f_1(x̄(k)) and f_2(x̄(k)) are unknown smooth nonlinear functions; g_1(x̄(k)) and g_2(x̄(k)) are unknown bounded smooth functions satisfying 0 < g_1(x̄(k)) ≤ ḡ_1 and 0 < g_2(x̄(k)) ≤ ḡ_2, where ḡ_1 and ḡ_2 are unknown positive constants; d(k) is an unknown bounded external disturbance satisfying |d(k)| ≤ d̄, where d̄ is an unknown positive constant.
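For illustration only, the following Python sketch simulates a second-order discrete-time heading system of the non-strict-feedback form stated above. The patent leaves f_1, f_2, g_1, g_2 and d(k) unspecified (they are unknown functions by assumption), so the Norrbin-type steering nonlinearity, the sampling period and all numerical coefficients below are assumptions introduced solely to obtain a runnable example.

```python
# Illustrative simulation of a discrete-time non-strict-feedback heading model:
#   x1(k+1) = f1(xbar(k)) + g1(xbar(k)) * x2(k)
#   x2(k+1) = f2(xbar(k)) + g2(xbar(k)) * u(k) + d(k)
# f1, f2, g1, g2, d are unknown in the patent; the expressions below are placeholders.
import numpy as np

T = 0.1            # sampling period [s] (assumed)
K, Tn = 0.5, 10.0  # illustrative Nomoto gain and time constant
a1, a2 = 1.0, 0.4  # illustrative Norrbin nonlinearity coefficients

def f1(x): return x[0]                                   # course angle carried forward
def g1(x): return T                                      # Euler integration of x2
def f2(x): return x[1] - (T / Tn) * (a1 * x[1] + a2 * x[1] ** 3)
def g2(x): return T * K / Tn

def step(x, u, rng):
    d = rng.uniform(-1e-3, 1e-3)                         # bounded external disturbance (illustrative)
    return np.array([f1(x) + g1(x) * x[1],
                     f2(x) + g2(x) * u + d])

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])                                 # [course angle, angular velocity] in rad, rad/s
for k in range(5):
    x = step(x, u=0.1, rng=rng)                          # constant 0.1 rad rudder angle input
    print(k, x.round(6))
```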
Further, performing the system transformation on the established unmanned ship course discrete-time nonlinear dynamics model and establishing the unmanned ship course tracking transformed system comprises:
according to the unmanned ship heading information and the reference signal, calculating the course angle dynamic error and the dynamic error between the course angular velocity and the virtual control law, so as to obtain the unmanned ship system tracking dynamic errors and the transformed unmanned ship dynamic model, specifically as follows:
the unmanned ship onboard computer calculates the course tracking dynamic errors from the heading information:

e_1(k) = x_1(k) - y_d(k)
e_2(k) = x_2(k) - α(k)

wherein e_1(k) is the dynamic error between the unmanned ship course angle and the reference course angle; e_2(k) is the error signal between the unmanned ship course angular velocity and the virtual control law α(k); y_d(k) is a smooth bounded reference signal;
in order to facilitate the course tracking control design of the unmanned ship system and to avoid the problem of unassociated subsystems, the system transformation is performed on the unmanned ship course discrete-time nonlinear dynamics model, and the unmanned ship course tracking transformed system is established:

[transformed-system equations: formula given as an image in the original]

wherein F_1(·) and F_2(·) are nonlinear smooth unknown functions; G_1(·) and G_2(·) are nonlinear smooth functions satisfying 0 < G_1(·) ≤ Ḡ_1 and 0 < G_2(·) ≤ Ḡ_2, where Ḡ_1 and Ḡ_2 are unknown positive constants.
Further, the unmanned ship reinforcement learning evaluation module is designed specifically as follows:
based on the course angle dynamic error e_1(k) computed by the unmanned ship onboard computer and a tracking performance threshold μ, the utility function, denoted r(k), is designed as:

r(k) = 0 if |e_1(k)| ≤ μ, and r(k) = 1 otherwise,

wherein r(k) = 0 indicates that the current tracking performance meets the requirement, and r(k) = 1 indicates that the current tracking performance does not meet the requirement;
according to the Bellman principle, the strategy utility function q(k) is designed from the utility function r(·) as follows:

[strategy utility function q(k): formula given as an image in the original]

wherein 0 < β < 1 is a design parameter and N is the time horizon;
according to the universal approximation theorem of neural networks, the strategy utility function q(k) is expressed as:

q(k) = θ_c^T S_c(z_c(k)) + δ_c(k)

wherein θ_c is the ideal weight vector and satisfies ||θ_c|| ≤ θ̄_c, with θ̄_c an unknown positive constant, and the subscript c denotes the evaluation module; the superscript T denotes transposition; S_c(·) is a bounded Gaussian basis function vector and z_c(k) denotes the evaluation module input vector; δ_c(k) is the approximation error and satisfies |δ_c(k)| ≤ δ̄_c, where δ̄_c is an unknown positive constant;
the Bellman error ξ_c(k) is defined as:

[Bellman error ξ_c(k): formula given as an image in the original]

wherein q̂(k) is the estimate of the strategy utility function q(k), and θ̂_c denotes the estimate of the ideal weight θ_c;
according to the defined Bellman error ξ_c(k), the cost function J_c(k) = (1/2) ξ_c^2(k) is defined; minimizing the cost function J_c(k) by gradient descent yields the evaluation module neural network adaptive law:

θ̂_c(k+1) = θ̂_c(k) − λ_c ∂J_c(k)/∂θ̂_c(k)

wherein λ_c is the learning rate.
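The following sketch shows one possible realisation of such an adaptive-critic evaluation module. The patent gives the strategy utility accumulation, the Bellman error and the weight update as formulas that are not reproduced here, so the discounted utility sum, the squared Bellman-type error and the plain gradient-descent update used below, together with the Gaussian basis parameters, are assumptions of this sketch rather than the patent's exact design.

```python
# Minimal adaptive-critic (evaluation module) sketch; all concrete formulas and
# numerical parameters are illustrative assumptions, not the patent's exact design.
import numpy as np

class Critic:
    def __init__(self, n_basis=10, lam_c=0.05, beta=0.8, mu=0.05, rng=None):
        rng = rng or np.random.default_rng(1)
        self.centers = rng.uniform(-1.0, 1.0, (n_basis, 2))  # Gaussian basis centres over (e1, e2)
        self.width = 0.5
        self.theta = np.zeros(n_basis)   # critic weight estimate theta_c_hat
        self.lam_c = lam_c               # learning rate lambda_c
        self.beta = beta                 # design parameter 0 < beta < 1
        self.mu = mu                     # tracking performance threshold

    def utility(self, e1):
        """r(k) = 0 if tracking performance is satisfactory, 1 otherwise."""
        return 0.0 if abs(e1) <= self.mu else 1.0

    def basis(self, z):
        """Bounded Gaussian basis functions S_c(z)."""
        d2 = np.sum((self.centers - z) ** 2, axis=1)
        return np.exp(-d2 / (2 * self.width ** 2))

    def q_hat(self, z):
        """Critic output q_hat(k) = theta_c_hat^T S_c(z)."""
        return self.theta @ self.basis(z)

    def update(self, z, target_q):
        """Gradient-descent step minimising J_c = 0.5 * xi_c^2 with xi_c = q_hat - target (assumed)."""
        s = self.basis(z)
        xi_c = self.theta @ s - target_q
        self.theta -= self.lam_c * xi_c * s
        return xi_c

critic = Critic()
z = np.array([0.1, 0.02])                          # critic input, e.g. (e1(k), e2(k))
target = sum(critic.beta ** i * critic.utility(0.1) for i in range(1, 4))  # short discounted sum
print(critic.update(z, target), critic.q_hat(z))
```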
Further, designing the unmanned ship course tracking controller based on the designed unmanned ship reinforcement learning evaluation module, thereby obtaining the rudder angle instruction of the unmanned ship system, transmitting the rudder angle instruction to the unmanned ship steering engine to output the unmanned ship course angle, and thus realizing unmanned ship course tracking control, comprises:
designing the virtual control law α(k) and the neural network adaptive law for the weight estimate θ̂_1 in the execution module of the unmanned ship reinforcement learning system;
designing the control law u(k) and the neural network adaptive law for the weight estimate θ̂_2 in the execution module of the unmanned ship reinforcement learning system.
Further, designing the virtual control law α(k) and the neural network adaptive law for θ̂_1 in the execution module of the unmanned ship reinforcement learning system specifically comprises:
the neural network compensator φ_1(k) of the first step in the execution module is defined as:

[neural network compensator φ_1(k): formula given as an image in the original]

wherein θ_1 is the ideal weight vector and satisfies ||θ_1|| ≤ θ̄_1, with θ̄_1 an unknown positive constant, and the subscript 1 denotes the first subsystem; z_1(k) is the input vector of the neural network compensator φ_1(k);
according to the neural network compensator φ_1(k), the virtual control law α(k) is designed as:

[virtual control law α(k): formula given as an image in the original]

wherein θ̂_1 denotes the estimate of the ideal weight θ_1;
the strategy utility function ξ_1(k) in the first execution module is defined as:

[strategy utility function ξ_1(k): formula given as an image in the original]

wherein k_1 = k − 1;
according to the strategy utility function ξ_1(k), the cost function J_1(k) = (1/2) ξ_1^2(k) is defined; minimizing the cost function J_1(k) by gradient descent yields the first execution module neural network adaptive law:

θ̂_1(k+1) = θ̂_1(k) − λ_1 ∂J_1(k)/∂θ̂_1(k)

wherein λ_1 is the learning rate.
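A sketch of the first execution (actor) step is given below. Since the patent's compensator φ_1(k), virtual control law α(k), strategy utility ξ_1(k) and adaptive law are given by formulas not reproduced here, the concrete forms chosen below (φ_1 = θ_1^T S_1(z_1), α(k) = y_d(k+1) − φ_1(k), ξ_1 = φ_1 + q̂) are illustrative assumptions only.

```python
# Sketch of the first execution (actor) step: NN compensator, virtual control law,
# and gradient-descent weight update.  All concrete forms are assumptions.
import numpy as np

class ActorStage1:
    def __init__(self, n_basis=10, lam_1=0.05, rng=None):
        rng = rng or np.random.default_rng(2)
        self.centers = rng.uniform(-1.0, 1.0, (n_basis, 2))
        self.width = 0.5
        self.theta1 = np.zeros(n_basis)   # weight estimate theta_1_hat
        self.lam_1 = lam_1                # learning rate lambda_1

    def basis(self, z1):
        d2 = np.sum((self.centers - z1) ** 2, axis=1)
        return np.exp(-d2 / (2 * self.width ** 2))

    def compensator(self, z1):
        """phi_1(k): neural-network compensation term (assumed form)."""
        return self.theta1 @ self.basis(z1)

    def virtual_control(self, z1, y_d_next):
        """alpha(k): virtual control law for the course angular velocity (assumed form)."""
        return y_d_next - self.compensator(z1)

    def update(self, z1, q_hat):
        """Gradient step minimising J_1 = 0.5 * xi_1^2 with xi_1 = phi_1 + q_hat (assumed)."""
        s = self.basis(z1)
        xi_1 = self.theta1 @ s + q_hat
        self.theta1 -= self.lam_1 * xi_1 * s
        return xi_1

actor1 = ActorStage1()
z1 = np.array([0.1, 0.02])                 # compensator input vector (e.g. tracking errors)
alpha_k = actor1.virtual_control(z1, y_d_next=0.25)
print(alpha_k, actor1.update(z1, q_hat=0.3))
```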
Further, designing the control law u(k) and the neural network adaptive law for θ̂_2 in the execution module of the unmanned ship reinforcement learning system specifically comprises:
the control law u(k) is designed as:

[control law u(k): formula given as an image in the original]

wherein c_1 > 0 and c_2 > 0 are design parameters; θ̂_2 denotes the estimate of the ideal neural network weight θ_2, which satisfies ||θ_2|| ≤ θ̄_2, with θ̄_2 an unknown positive constant, and the subscript 2 denotes the second subsystem; z_2(k) is the input vector of the neural network;
the strategy utility function ξ_2(k) in the second execution module is defined as:

[strategy utility function ξ_2(k): formula given as an image in the original]

wherein k_2 = k;
according to the strategy utility function ξ_2(k), the cost function J_2(k) = (1/2) ξ_2^2(k) is defined; minimizing the cost function J_2(k) by gradient descent yields the second execution module neural network adaptive law:

θ̂_2(k+1) = θ̂_2(k) − λ_2 ∂J_2(k)/∂θ̂_2(k)

wherein λ_2 is the learning rate.
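Similarly, the second execution step that produces the rudder-angle command can be sketched as follows; the feedback-plus-neural-network form of u(k), the choice of ξ_2(k) and all gains are assumptions of this sketch, since the patent's exact expressions are not reproduced here.

```python
# Sketch of the second execution (actor) step producing the rudder-angle command.
# The feedback-plus-NN control law and the update rule below are assumed forms.
import numpy as np

rng = np.random.default_rng(3)
centers = rng.uniform(-1.0, 1.0, (10, 2))
theta2 = np.zeros(10)        # weight estimate theta_2_hat
c1, c2 = 0.4, 0.2            # design parameters c1 > 0, c2 > 0 (values assumed)
lam_2 = 0.05                 # learning rate lambda_2

def S2(z2):
    """Bounded Gaussian basis functions for the second actor network."""
    return np.exp(-np.sum((centers - z2) ** 2, axis=1) / (2 * 0.5 ** 2))

def control_law(e1, e2, z2):
    """u(k): rudder-angle command = error feedback plus NN compensation (assumed form)."""
    return -c1 * e1 - c2 * e2 - theta2 @ S2(z2)

def update(z2, q_hat):
    """Gradient step minimising J_2 = 0.5 * xi_2^2 with xi_2 = theta2^T S2 + q_hat (assumed)."""
    global theta2
    s = S2(z2)
    xi_2 = theta2 @ s + q_hat
    theta2 = theta2 - lam_2 * xi_2 * s
    return xi_2

z2 = np.array([0.1, 0.02])
print(control_law(e1=0.05, e2=0.01, z2=z2), update(z2, q_hat=0.3))
```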
The invention also provides a discrete-time reinforcement learning unmanned ship course control system based on the above discrete-time reinforcement learning unmanned ship course tracking control method, comprising:
the data acquisition unit is used for acquiring heading information of the unmanned ship;
the data transmission unit is used for transmitting the acquired unmanned ship course information to the shipboard computer;
the unmanned ship on-board computer is used for processing the acquired unmanned ship course information and realizing unmanned ship reinforcement learning control;
and the data feedback unit is used for transmitting the rudder angle instruction output by the shipborne computer to the unmanned ship steering engine to output the unmanned ship rudder angle, so as to realize unmanned ship course tracking control.
Further, the unmanned ship onboard computer comprises an unmanned ship course system dynamics model module, an unmanned ship evaluation module, a neural network compensator module, a virtual control law module, a neural network adaptive update law module, a reinforcement learning control law module and a data feedback module, wherein:
the unmanned ship course system dynamics model module is used for constructing, based on the unmanned ship heading information, the unmanned ship discrete-time nonlinear dynamics model and the transformed system between the system input and the system output;
the unmanned ship evaluation module is used for designing, based on the unmanned ship course error and a preset tracking performance threshold, the strategy utility function and the cost function, and for realizing the design of the evaluation module neural network adaptive update law;
the neural network compensator module is used for generating, by means of a neural network, a compensation output for the nonlinear unmanned ship system;
the virtual control law module is used for designing the virtual control function of the unmanned ship system by using the reference signal and the information of the compensator module, and for designing the virtual control law according to the virtual control function;
the neural network adaptive update law module is used for obtaining the neural network adaptive laws based on the information of the evaluation module, the virtual control law module and the control law module, and on the strategy utility functions;
the reinforcement learning control law module is used for designing the controller based on the system error information and the virtual control law module information;
the data feedback module is used for transmitting the output information of the unmanned ship reinforcement learning control law module to the unmanned ship steering engine, so as to realize the control of the unmanned ship course by the reinforcement learning control law module.
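To show how the listed modules could pass information to one another in a single control cycle, a hypothetical wiring sketch is given below; every function and field name in it is a placeholder invented for illustration and does not come from the patent.

```python
# Hypothetical wiring of the onboard-computer modules into one control cycle.
# The module and method names are placeholders; the patent only names the modules
# and the information passed between them.
from types import SimpleNamespace

def control_cycle(modules, heading_info, reference):
    e1, e2 = modules.errors(heading_info, reference)        # dynamics model / error computation
    q_hat = modules.evaluate(e1)                            # evaluation (critic) module
    alpha = modules.virtual_control(e1, reference)          # compensator + virtual control law
    rudder_cmd = modules.control_law(e1, e2, alpha)         # reinforcement learning control law
    modules.update(e1, e2, q_hat)                           # neural network adaptive update laws
    return rudder_cmd                                       # forwarded by the data feedback module

# stand-in modules, just to show the data flow
modules = SimpleNamespace(
    errors=lambda h, r: (h["psi"] - r, h["r"] - 0.0),
    evaluate=lambda e1: 0.0 if abs(e1) <= 0.05 else 1.0,
    virtual_control=lambda e1, r: -0.5 * e1,
    control_law=lambda e1, e2, a: -0.4 * e1 - 0.2 * (e2 - a),
    update=lambda e1, e2, q: None,
)
# heading_info: "psi" = measured course angle, "r" = measured course angular velocity
print(control_cycle(modules, heading_info={"psi": 0.30, "r": 0.02}, reference=0.25))
```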
Compared with the prior art, the invention has the following advantages:
1. For a discrete-time unmanned ship system in non-strict-feedback form, the discrete-time reinforcement learning unmanned ship course tracking control method and system provided by the invention effectively solve, through the system transformation and the neural network compensator, the problem that the subsystems are not associated when the controller is designed by the backstepping method.
2. With the discrete-time reinforcement learning unmanned ship course tracking control method and system provided by the invention, the interaction between the unmanned ship system and the environment can be effectively improved through the reinforcement learning technique, which alleviates the reduction of control accuracy caused by model simplification and environment simplification.
Based on the above reasons, the invention can be widely popularized in fields such as intelligent control of ship motion.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of a control method of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise. Meanwhile, it should be clear that the dimensions of the respective parts shown in the drawings are not drawn in actual scale for convenience of description. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
In the description of the present invention, it should be understood that the azimuth or positional relationships indicated by the azimuth terms such as "front, rear, upper, lower, left, right", "lateral, vertical, horizontal", and "top, bottom", etc., are generally based on the azimuth or positional relationships shown in the drawings, merely to facilitate description of the present invention and simplify the description, and these azimuth terms do not indicate and imply that the apparatus or elements referred to must have a specific azimuth or be constructed and operated in a specific azimuth, and thus should not be construed as limiting the scope of protection of the present invention: the orientation word "inner and outer" refers to inner and outer relative to the contour of the respective component itself.
Spatially relative terms, such as "above … …," "above … …," "upper surface at … …," "above," and the like, may be used herein for ease of description to describe one device or feature's spatial location relative to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "above" or "over" other devices or structures would then be oriented "below" or "beneath" the other devices or structures. Thus, the exemplary term "above … …" may include both orientations of "above … …" and "below … …". The device may also be positioned in other different ways (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
In addition, the terms "first", "second", etc. are used to define the components, and are only for convenience of distinguishing the corresponding components, and the terms have no special meaning unless otherwise stated, and therefore should not be construed as limiting the scope of the present invention.
As shown in FIG. 1, the invention provides a course tracking control method of a discrete time reinforcement learning unmanned ship, which comprises the following steps:
S1, establishing an unmanned ship course discrete-time nonlinear dynamics model;
S2, performing a system transformation on the established unmanned ship course discrete-time nonlinear dynamics model, and establishing an unmanned ship course tracking transformed system;
S3, designing an unmanned ship reinforcement learning evaluation module based on the established unmanned ship course tracking transformed system;
S4, designing an unmanned ship course tracking controller based on the designed unmanned ship reinforcement learning evaluation module, thereby obtaining the rudder angle instruction of the unmanned ship system, transmitting the rudder angle instruction to the unmanned ship steering engine to output the unmanned ship course angle, and thus realizing unmanned ship course tracking control.
In specific implementation, as a preferred embodiment of the present invention, in the step S1, establishing a discrete-time nonlinear dynamics model of unmanned ship heading includes:
acquiring unmanned ship heading information and transmitting the acquired unmanned ship heading information to the onboard computer, the onboard computer establishing the unmanned ship course discrete-time nonlinear dynamics model in consideration of the nonlinear turning characteristics of the unmanned ship; the unmanned ship heading information comprises the rudder angle information measured by the unmanned ship steering engine and the course angle and course angular velocity information measured by the compass.
In specific implementation, as a preferred embodiment of the present invention, in step S1, the established unmanned ship course discrete-time nonlinear dynamics model is specifically:

x_1(k+1) = f_1(x̄(k)) + g_1(x̄(k)) x_2(k)
x_2(k+1) = f_2(x̄(k)) + g_2(x̄(k)) u(k) + d(k)
y(k) = x_1(k)

wherein x_1(k) is the unmanned ship course angle, the subscript 1 denotes the 1st subsystem, and k is the time instant; x_2(k) is the course angular velocity, and the subscript 2 denotes the 2nd subsystem; u(k) is the rudder angle input; y(k) is the system output; x̄(k) = [x_1(k), x_2(k)]^T is the heading information vector; f_1(x̄(k)) and f_2(x̄(k)) are unknown smooth nonlinear functions; g_1(x̄(k)) and g_2(x̄(k)) are unknown bounded smooth functions satisfying 0 < g_1(x̄(k)) ≤ ḡ_1 and 0 < g_2(x̄(k)) ≤ ḡ_2, where ḡ_1 and ḡ_2 are unknown positive constants; d(k) is an unknown bounded external disturbance satisfying |d(k)| ≤ d̄, where d̄ is an unknown positive constant.
In a specific implementation, as a preferred embodiment of the present invention, in step S2, performing the system transformation on the established unmanned ship course discrete-time nonlinear dynamics model and establishing the unmanned ship course tracking transformed system comprises:
according to the unmanned ship heading information and the reference signal, calculating the course angle dynamic error and the dynamic error between the course angular velocity and the virtual control law, so as to obtain the unmanned ship system tracking dynamic errors and the transformed unmanned ship dynamic model, specifically as follows:
S21, the unmanned ship onboard computer calculates the course tracking dynamic errors from the heading information:

e_1(k) = x_1(k) - y_d(k)
e_2(k) = x_2(k) - α(k)

wherein e_1(k) is the dynamic error between the unmanned ship course angle and the reference course angle; e_2(k) is the error signal between the unmanned ship course angular velocity and the virtual control law α(k); y_d(k) is a smooth bounded reference signal;
S22, in order to facilitate the course tracking control design of the unmanned ship system and to avoid the problem of unassociated subsystems, the system transformation is performed on the unmanned ship course discrete-time nonlinear dynamics model, and the unmanned ship course tracking transformed system is established:

[transformed-system equations: formula given as an image in the original]

wherein F_1(·) and F_2(·) are nonlinear smooth unknown functions; G_1(·) and G_2(·) are nonlinear smooth functions satisfying 0 < G_1(·) ≤ Ḡ_1 and 0 < G_2(·) ≤ Ḡ_2, where Ḡ_1 and Ḡ_2 are unknown positive constants.
In specific implementation, as a preferred embodiment of the present invention, in step S3, the unmanned ship reinforcement learning evaluation module is designed, specifically comprising:
S31, based on the course angle dynamic error e_1(k) computed by the unmanned ship onboard computer and a tracking performance threshold μ, the utility function, denoted r(k), is designed as:

r(k) = 0 if |e_1(k)| ≤ μ, and r(k) = 1 otherwise,

wherein r(k) = 0 indicates that the current tracking performance meets the requirement, and r(k) = 1 indicates that the current tracking performance does not meet the requirement;
S32, according to the Bellman principle, the strategy utility function q(k) is designed from the utility function r(·) as follows:

[strategy utility function q(k): formula given as an image in the original]

wherein 0 < β < 1 is a design parameter and N is the time horizon;
S33, according to the universal approximation theorem of neural networks, the strategy utility function q(k) is expressed as:

q(k) = θ_c^T S_c(z_c(k)) + δ_c(k)

wherein θ_c is the ideal weight vector and satisfies ||θ_c|| ≤ θ̄_c, with θ̄_c an unknown positive constant, and the subscript c denotes the evaluation module; the superscript T denotes transposition; S_c(·) is a bounded Gaussian basis function vector and z_c(k) denotes the evaluation module input vector; δ_c(k) is the approximation error and satisfies |δ_c(k)| ≤ δ̄_c, where δ̄_c is an unknown positive constant;
S34, the Bellman error ξ_c(k) is defined as:

[Bellman error ξ_c(k): formula given as an image in the original]

wherein q̂(k) is the estimate of the strategy utility function q(k), and θ̂_c denotes the estimate of the ideal weight θ_c;
S35, according to the defined Bellman error ξ_c(k), the cost function J_c(k) = (1/2) ξ_c^2(k) is defined; minimizing the cost function J_c(k) by gradient descent yields the evaluation module neural network adaptive law:

θ̂_c(k+1) = θ̂_c(k) − λ_c ∂J_c(k)/∂θ̂_c(k)

wherein λ_c is the learning rate.
In a specific implementation, as a preferred embodiment of the present invention, in step S4, the unmanned ship course tracking controller is designed based on the designed unmanned ship reinforcement learning evaluation module, thereby obtaining the rudder angle instruction of the unmanned ship system; the rudder angle instruction is transmitted to the unmanned ship steering engine to output the unmanned ship course angle, thereby realizing unmanned ship course tracking control, comprising:
S41, designing the virtual control law α(k) and the neural network adaptive law for θ̂_1 in the execution module of the unmanned ship reinforcement learning system, specifically comprising:
S411, the neural network compensator φ_1(k) of the first step in the execution module is defined as:

[neural network compensator φ_1(k): formula given as an image in the original]

wherein θ_1 is the ideal weight vector and satisfies ||θ_1|| ≤ θ̄_1, with θ̄_1 an unknown positive constant, and the subscript 1 denotes the first subsystem; z_1(k) is the input vector of the neural network compensator φ_1(k);
S412, according to the neural network compensator φ_1(k), the virtual control law α(k) is designed as:

[virtual control law α(k): formula given as an image in the original]

wherein θ̂_1 denotes the estimate of the ideal weight θ_1;
S413, the strategy utility function ξ_1(k) in the first execution module is defined as:

[strategy utility function ξ_1(k): formula given as an image in the original]

wherein k_1 = k − 1;
S414, according to the strategy utility function ξ_1(k), the cost function J_1(k) = (1/2) ξ_1^2(k) is defined; minimizing the cost function J_1(k) by gradient descent yields the first execution module neural network adaptive law:

θ̂_1(k+1) = θ̂_1(k) − λ_1 ∂J_1(k)/∂θ̂_1(k)

wherein λ_1 is the learning rate.
S42, designing the control law u(k) and the neural network adaptive law for θ̂_2 in the execution module of the unmanned ship reinforcement learning system, specifically comprising:
S421, the control law u(k) is designed as:

[control law u(k): formula given as an image in the original]

wherein c_1 > 0 and c_2 > 0 are design parameters; θ̂_2 denotes the estimate of the ideal neural network weight θ_2, which satisfies ||θ_2|| ≤ θ̄_2, with θ̄_2 an unknown positive constant, and the subscript 2 denotes the second subsystem; z_2(k) is the input vector of the neural network;
S422, the strategy utility function ξ_2(k) in the second execution module is defined as:

[strategy utility function ξ_2(k): formula given as an image in the original]

wherein k_2 = k;
S423, according to the strategy utility function ξ_2(k), the cost function J_2(k) = (1/2) ξ_2^2(k) is defined; minimizing the cost function J_2(k) by gradient descent yields the second execution module neural network adaptive law:

θ̂_2(k+1) = θ̂_2(k) − λ_2 ∂J_2(k)/∂θ̂_2(k)

wherein λ_2 is the learning rate.
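To tie steps S1 to S4 together, the following self-contained sketch runs one simple discrete-time closed loop: it simulates an illustrative heading model, computes the tracking errors, evaluates the utility signal against the threshold μ, and applies a rudder command. The plant coefficients, the simple feedback laws standing in for the patent's virtual control law and reinforcement learning control law, and all gains are assumptions made only for the illustration.

```python
# End-to-end illustration of steps S1-S4 as one discrete-time closed loop.
# The plant, the stand-in feedback laws and all gains are assumptions of this sketch.
import numpy as np

T, K, Tn = 0.1, 0.5, 10.0                    # assumed sampling period and Nomoto parameters
mu, y_d = 0.02, np.deg2rad(10.0)             # performance threshold and constant reference course
x = np.array([0.0, 0.0])                     # [course angle, course angular velocity]

def plant(x, u):                             # S1: discrete-time heading dynamics (illustrative)
    x1 = x[0] + T * x[1]
    x2 = x[1] + (T / Tn) * (K * u - x[1] - 0.4 * x[1] ** 3)
    return np.array([x1, x2])

for k in range(400):
    e1 = x[0] - y_d                          # S2: course-angle tracking error
    alpha = -1.5 * e1                        #     assumed virtual control for the angular velocity
    e2 = x[1] - alpha
    r_k = 0.0 if abs(e1) <= mu else 1.0      # S3: utility signal of the evaluation module
    u = -4.0 * e2 - 0.5 * e1                 # S4: rudder-angle command (assumed feedback law)
    u = float(np.clip(u, -0.6, 0.6))         #     rudder saturation of roughly +/-35 deg
    x = plant(x, u)

print(np.rad2deg(x[0]), r_k)                 # final course angle [deg] and last utility value
```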
Corresponding to the above discrete-time reinforcement learning unmanned ship course tracking control method, the present application also provides a discrete-time reinforcement learning unmanned ship course control system, comprising: a data acquisition unit for acquiring unmanned ship heading information;
the data transmission unit is used for transmitting the acquired unmanned ship course information to the shipboard computer;
the unmanned ship on-board computer is used for processing the acquired unmanned ship course information and realizing unmanned ship reinforcement learning control;
and the data feedback unit is used for transmitting the rudder angle instruction output by the shipborne computer to the unmanned ship steering engine to output the unmanned ship rudder angle, so as to realize unmanned ship course tracking control.
In this embodiment, preferably, the unmanned ship onboard computer comprises an unmanned ship course system dynamics model module, an unmanned ship evaluation module, a neural network compensator module, a virtual control law module, a neural network adaptive update law module, a reinforcement learning control law module and a data feedback module, wherein:
the unmanned ship course system dynamics model module is used for constructing, based on the unmanned ship heading information, the unmanned ship discrete-time nonlinear dynamics model and the transformed system between the system input and the system output;
the unmanned ship evaluation module is used for designing, based on the unmanned ship course error and a preset tracking performance threshold, the strategy utility function and the cost function, and for realizing the design of the evaluation module neural network adaptive update law;
the neural network compensator module is used for generating, by means of a neural network, a compensation output for the nonlinear unmanned ship system;
the virtual control law module is used for designing the virtual control function of the unmanned ship system by using the reference signal and the information of the compensator module, and for designing the virtual control law according to the virtual control function;
the neural network adaptive update law module is used for obtaining the neural network adaptive laws based on the information of the evaluation module, the virtual control law module and the control law module, and on the strategy utility functions;
the reinforcement learning control law module is used for designing the controller based on the system error information and the virtual control law module information;
the data feedback module is used for transmitting the output information of the unmanned ship reinforcement learning control law module to the unmanned ship steering engine, so as to realize the control of the unmanned ship course by the reinforcement learning control law module.
For the embodiments of the present invention, since they correspond to those in the above embodiments, the description is relatively simple, and the relevant similarities will be found in the description of the above embodiments, and will not be described in detail herein.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A course tracking control method for a discrete-time reinforcement learning unmanned ship, characterized by comprising the following steps:
establishing an unmanned ship course discrete time nonlinear dynamics model;
performing a system transformation on the established unmanned ship course discrete-time nonlinear dynamics model, and establishing an unmanned ship course tracking transformed system;
based on the established unmanned ship course tracking transformed system, designing an unmanned ship reinforcement learning evaluation module;
based on the designed unmanned ship reinforcement learning evaluation module, designing the unmanned ship course tracking controller, thereby obtaining the rudder angle instruction of the unmanned ship system; the rudder angle instruction is transmitted to the unmanned ship steering engine to output the unmanned ship course angle, thereby realizing unmanned ship course tracking control.
2. The discrete-time reinforcement learning unmanned ship course tracking control method according to claim 1, characterized in that establishing the unmanned ship course discrete-time nonlinear dynamics model comprises:
acquiring unmanned ship heading information and transmitting the acquired unmanned ship heading information to the onboard computer, the onboard computer establishing the unmanned ship course discrete-time nonlinear dynamics model in consideration of the nonlinear turning characteristics of the unmanned ship; the unmanned ship heading information comprises the rudder angle information measured by the unmanned ship steering engine and the course angle and course angular velocity information measured by the compass.
3. The discrete-time reinforcement learning unmanned ship course tracking control method according to claim 1, characterized in that the established unmanned ship course discrete-time nonlinear dynamics model is specifically:

x_1(k+1) = f_1(x̄(k)) + g_1(x̄(k)) x_2(k)
x_2(k+1) = f_2(x̄(k)) + g_2(x̄(k)) u(k) + d(k)
y(k) = x_1(k)

wherein x_1(k) is the unmanned ship course angle, the subscript 1 denotes the 1st subsystem, and k is the time instant; x_2(k) is the course angular velocity, and the subscript 2 denotes the 2nd subsystem; u(k) is the rudder angle input; y(k) is the system output; x̄(k) = [x_1(k), x_2(k)]^T is the heading information vector; f_1(x̄(k)) and f_2(x̄(k)) are unknown smooth nonlinear functions; g_1(x̄(k)) and g_2(x̄(k)) are unknown bounded smooth functions satisfying 0 < g_1(x̄(k)) ≤ ḡ_1 and 0 < g_2(x̄(k)) ≤ ḡ_2, where ḡ_1 and ḡ_2 are unknown positive constants; d(k) is an unknown bounded external disturbance satisfying |d(k)| ≤ d̄, where d̄ is an unknown positive constant.
4. The discrete-time reinforcement learning unmanned ship course tracking control method according to claim 1, characterized in that performing the system transformation on the established unmanned ship course discrete-time nonlinear dynamics model and establishing the unmanned ship course tracking transformed system comprises:
according to the unmanned ship heading information and the reference signal, calculating the course angle dynamic error and the dynamic error between the course angular velocity and the virtual control law, so as to obtain the unmanned ship system tracking dynamic errors and the transformed unmanned ship dynamic model, specifically as follows:
the unmanned ship onboard computer calculates the course tracking dynamic errors from the heading information:

e_1(k) = x_1(k) - y_d(k)
e_2(k) = x_2(k) - α(k)

wherein e_1(k) is the dynamic error between the unmanned ship course angle and the reference course angle; e_2(k) is the error signal between the unmanned ship course angular velocity and the virtual control law α(k); y_d(k) is a smooth bounded reference signal;
in order to facilitate the course tracking control design of the unmanned ship system and to avoid the problem of unassociated subsystems, the system transformation is performed on the unmanned ship course discrete-time nonlinear dynamics model, and the unmanned ship course tracking transformed system is established:

[transformed-system equations: formula given as an image in the original]

wherein F_1(·) and F_2(·) are nonlinear smooth unknown functions; G_1(·) and G_2(·) are nonlinear smooth functions satisfying 0 < G_1(·) ≤ Ḡ_1 and 0 < G_2(·) ≤ Ḡ_2, where Ḡ_1 and Ḡ_2 are unknown positive constants.
5. The discrete-time reinforcement learning unmanned ship course tracking control method according to claim 1, characterized in that the unmanned ship reinforcement learning evaluation module is designed specifically as follows:
based on the course angle dynamic error e_1(k) computed by the unmanned ship onboard computer and a tracking performance threshold μ, the utility function, denoted r(k), is designed as:

r(k) = 0 if |e_1(k)| ≤ μ, and r(k) = 1 otherwise,

wherein r(k) = 0 indicates that the current tracking performance meets the requirement, and r(k) = 1 indicates that the current tracking performance does not meet the requirement;
according to the Bellman principle, the strategy utility function q(k) is designed from the utility function r(·) as follows:

[strategy utility function q(k): formula given as an image in the original]

wherein 0 < β < 1 is a design parameter and N is the time horizon;
according to the universal approximation theorem of neural networks, the strategy utility function q(k) is expressed as:

q(k) = θ_c^T S_c(z_c(k)) + δ_c(k)

wherein θ_c is the ideal weight vector and satisfies ||θ_c|| ≤ θ̄_c, with θ̄_c an unknown positive constant, and the subscript c denotes the evaluation module; the superscript T denotes transposition; S_c(·) is a bounded Gaussian basis function vector and z_c(k) denotes the evaluation module input vector; δ_c(k) is the approximation error and satisfies |δ_c(k)| ≤ δ̄_c, where δ̄_c is an unknown positive constant;
the Bellman error ξ_c(k) is defined as:

[Bellman error ξ_c(k): formula given as an image in the original]

wherein q̂(k) is the estimate of the strategy utility function q(k), and θ̂_c denotes the estimate of the ideal weight θ_c;
according to the defined Bellman error ξ_c(k), the cost function J_c(k) = (1/2) ξ_c^2(k) is defined; minimizing the cost function J_c(k) by gradient descent yields the evaluation module neural network adaptive law:

θ̂_c(k+1) = θ̂_c(k) − λ_c ∂J_c(k)/∂θ̂_c(k)

wherein λ_c is the learning rate.
6. The discrete-time reinforcement learning unmanned ship course tracking control method according to claim 1, characterized in that designing the unmanned ship course tracking controller based on the unmanned ship reinforcement learning evaluation module, thereby obtaining the rudder angle instruction of the unmanned ship system, transmitting the rudder angle instruction to the unmanned ship steering engine to output the unmanned ship course angle, and thus realizing unmanned ship course tracking control, comprises:
designing the virtual control law α(k) and the neural network adaptive law for the weight estimate θ̂_1 in the execution module of the unmanned ship reinforcement learning system;
designing the control law u(k) and the neural network adaptive law for the weight estimate θ̂_2 in the execution module of the unmanned ship reinforcement learning system.
7. The discrete-time reinforcement learning unmanned ship course tracking control method according to claim 6, characterized in that designing the virtual control law α(k) and the neural network adaptive law for θ̂_1 in the execution module of the unmanned ship reinforcement learning system specifically comprises:
the neural network compensator φ_1(k) of the first step in the execution module is defined as:

[neural network compensator φ_1(k): formula given as an image in the original]

wherein θ_1 is the ideal weight vector and satisfies ||θ_1|| ≤ θ̄_1, with θ̄_1 an unknown positive constant, and the subscript 1 denotes the first subsystem; z_1(k) is the input vector of the neural network compensator φ_1(k);
according to the neural network compensator φ_1(k), the virtual control law α(k) is designed as:

[virtual control law α(k): formula given as an image in the original]

wherein θ̂_1 denotes the estimate of the ideal weight θ_1;
the strategy utility function ξ_1(k) in the first execution module is defined as:

[strategy utility function ξ_1(k): formula given as an image in the original]

wherein k_1 = k − 1;
according to the strategy utility function ξ_1(k), the cost function J_1(k) = (1/2) ξ_1^2(k) is defined; minimizing the cost function J_1(k) by gradient descent yields the first execution module neural network adaptive law:

θ̂_1(k+1) = θ̂_1(k) − λ_1 ∂J_1(k)/∂θ̂_1(k)

wherein λ_1 is the learning rate.
8. The discrete-time reinforcement learning unmanned ship course tracking control method according to claim 6, characterized in that designing the control law u(k) and the neural network adaptive law for θ̂_2 in the execution module of the unmanned ship reinforcement learning system specifically comprises:
the control law u(k) is designed as:

[control law u(k): formula given as an image in the original]

wherein c_1 > 0 and c_2 > 0 are design parameters; θ̂_2 denotes the estimate of the ideal neural network weight θ_2, which satisfies ||θ_2|| ≤ θ̄_2, with θ̄_2 an unknown positive constant, and the subscript 2 denotes the second subsystem; z_2(k) is the input vector of the neural network;
the strategy utility function ξ_2(k) in the second execution module is defined as:

[strategy utility function ξ_2(k): formula given as an image in the original]

wherein k_2 = k;
according to the strategy utility function ξ_2(k), the cost function J_2(k) = (1/2) ξ_2^2(k) is defined; minimizing the cost function J_2(k) by gradient descent yields the second execution module neural network adaptive law:

θ̂_2(k+1) = θ̂_2(k) − λ_2 ∂J_2(k)/∂θ̂_2(k)

wherein λ_2 is the learning rate.
9. A discrete-time reinforcement learning unmanned ship course control system based on the discrete-time reinforcement learning unmanned ship course tracking control method of any one of claims 1 to 8, comprising:
the data acquisition unit is used for acquiring heading information of the unmanned ship;
the data transmission unit is used for transmitting the acquired unmanned ship course information to the shipboard computer;
the unmanned ship on-board computer is used for processing the acquired unmanned ship course information and realizing unmanned ship reinforcement learning control;
and the data feedback unit is used for transmitting the rudder angle instruction output by the shipborne computer to the unmanned ship steering engine to output the unmanned ship rudder angle, so as to realize unmanned ship course tracking control.
10. The discrete-time reinforcement learning unmanned ship course control system according to claim 9, characterized in that the unmanned ship onboard computer comprises an unmanned ship course system dynamics model module, an unmanned ship evaluation module, a neural network compensator module, a virtual control law module, a neural network adaptive update law module, a reinforcement learning control law module and a data feedback module, wherein:
the unmanned ship course system dynamics model module is used for constructing, based on the unmanned ship heading information, the unmanned ship discrete-time nonlinear dynamics model and the transformed system between the system input and the system output;
the unmanned ship evaluation module is used for designing, based on the unmanned ship course error and a preset tracking performance threshold, the strategy utility function and the cost function, and for realizing the design of the evaluation module neural network adaptive update law;
the neural network compensator module is used for generating, by means of a neural network, a compensation output for the nonlinear unmanned ship system;
the virtual control law module is used for designing the virtual control function of the unmanned ship system by using the reference signal and the information of the compensator module, and for designing the virtual control law according to the virtual control function;
the neural network adaptive update law module is used for obtaining the neural network adaptive laws based on the information of the evaluation module, the virtual control law module and the control law module, and on the strategy utility functions;
the reinforcement learning control law module is used for designing the controller based on the system error information and the virtual control law module information;
the data feedback module is used for transmitting the output information of the unmanned ship reinforcement learning control law module to the unmanned ship steering engine, so as to realize the control of the unmanned ship course by the reinforcement learning control law module.
CN202310321522.0A 2023-03-29 2023-03-29 Course tracking control method and system for discrete time reinforcement learning unmanned ship Pending CN116300949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310321522.0A CN116300949A (en) 2023-03-29 2023-03-29 Course tracking control method and system for discrete time reinforcement learning unmanned ship

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310321522.0A CN116300949A (en) 2023-03-29 2023-03-29 Course tracking control method and system for discrete time reinforcement learning unmanned ship

Publications (1)

Publication Number Publication Date
CN116300949A (en) 2023-06-23

Family

ID=86777824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310321522.0A Pending CN116300949A (en) 2023-03-29 2023-03-29 Course tracking control method and system for discrete time reinforcement learning unmanned ship

Country Status (1)

Country Link
CN (1) CN116300949A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062058A (en) * 2018-09-26 2018-12-21 大连海事大学 Ship course track following design method based on adaptive fuzzy optimum control
CN109188909A (en) * 2018-09-26 2019-01-11 大连海事大学 Adaptive fuzzy method for optimally controlling and system towards ship course nonlinear discrete systems
CA3067573A1 (en) * 2019-01-14 2020-07-14 Harbin Engineering University Target tracking systems and methods for uuv
CN111948937A (en) * 2020-07-20 2020-11-17 电子科技大学 Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system
CN112782981A (en) * 2020-12-30 2021-05-11 大连海事大学 Fuzzy self-adaptive output feedback designated performance control method and system for intelligent ship autopilot system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIWEI BAI et al.: "Adaptive Reinforcement Learning Tracking Control for Second-Order Multi-Agent Systems", IEEE Xplore, pages 202-207 *
WEIWEI BAI et al.: "NN Reinforcement Learning Adaptive Control for a Class of Nonstrict-Feedback Discrete-Time Systems", IEEE Transactions on Cybernetics, pages 4573-4583 *

Similar Documents

Publication Publication Date Title
CN109507885B (en) Model-free self-adaptive AUV control method based on active disturbance rejection
CN108008628B (en) Method for controlling preset performance of uncertain underactuated unmanned ship system
Zhang et al. Improved concise backstepping control of course keeping for ships using nonlinear feedback technique
CN111948937B (en) Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system
CN110658814B (en) Self-adaptive ship motion modeling method applied to ship motion control
Muske et al. Identification of a control oriented nonlinear dynamic USV model
CN111198502B (en) Unmanned ship track tracking control method based on interference observer and fuzzy system
CN111897225B (en) Fuzzy self-adaptive output feedback control method and system for intelligent ship autopilot system
Fang et al. Global output feedback control of dynamically positioned surface vessels: an adaptive control approach
Shen et al. Prescribed performance dynamic surface control for trajectory-tracking of unmanned surface vessel with input saturation
CN110703605B (en) Self-adaptive fuzzy optimal control method and system for intelligent ship autopilot system
Mu et al. Path following for podded propulsion unmanned surface vehicle: Theory, simulation and experiment
CN111798702B (en) Unmanned ship path tracking control method, system, storage medium and terminal
CN109240289A (en) Wave glider yawing information self-adapting filtering method
CN111930124A (en) Fuzzy self-adaptive output feedback finite time control method and system for intelligent ship autopilot system
CN110510073A (en) A kind of self-adaptation control method and system of unmanned sailing boat
Wu et al. Adaptive neural network and extended state observer-based non-singular terminal sliding modetracking control for an underactuated USV with unknown uncertainties
CN113110511A (en) Intelligent ship course control method based on generalized fuzzy hyperbolic model
Liu et al. A hierarchical disturbance rejection depth tracking control of underactuated AUV with experimental verification
Xu et al. Event-triggered adaptive target tracking control for an underactuated autonomous underwater vehicle with actuator faults
CN110515387A (en) A kind of above water craft drift angle compensating non-linear course heading control method
CN115248553A (en) Event triggering adaptive PID track tracking fault-tolerant control method for under-actuated ship
Wu et al. An overview of developments and challenges for unmanned surface vehicle autonomous berthing
CN116300949A (en) Course tracking control method and system for discrete time reinforcement learning unmanned ship
CN116400691B (en) Novel discrete time specified performance reinforcement learning unmanned ship course tracking control method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination