CN111308890B - Unmanned ship data-driven reinforcement learning control method with designated performance - Google Patents


Info

Publication number
CN111308890B
CN111308890B (application number CN202010122590.0A)
Authority
CN
China
Prior art keywords
function, follows, coordinate system, unmanned, optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010122590.0A
Other languages
Chinese (zh)
Other versions
CN111308890A (en
Inventor
王宁
李堃
高颖
杨忱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN202010122590.0A priority Critical patent/CN111308890B/en
Publication of CN111308890A publication Critical patent/CN111308890A/en
Application granted granted Critical
Publication of CN111308890B publication Critical patent/CN111308890B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04: Adaptive control systems, electric, involving the use of models or simulators
    • G05B13/042: Adaptive control systems, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The invention provides an unmanned ship data-driven reinforcement learning control method with designated performance for an unmanned surface vessel system, comprising the following steps: S1, establishing a mathematical model of the unmanned surface vessel; S2, introducing a specified performance function; S3, designing an optimal controller for the unmanned ship; S4, designing the weight update laws of the critic and the actor. The method updates the actor and the critic simultaneously and keeps the tracking error within a designated range, so as to obtain the optimal control strategy. Meanwhile, the method accelerates the convergence of the control system and markedly improves the adaptability and reliability of the unmanned ship system when operating in unknown environments.

Description

Unmanned ship data-driven reinforcement learning control method with designated performance
Technical Field
The invention relates to the technical field of reinforcement learning and trajectory tracking of unmanned ships on water, in particular to a data-driven reinforcement learning control method for unmanned ships with designated performance.
Background
Artificial intelligence technology is now widely used in the control field, particularly in unmanned ship systems. Compared with traditional ships, unmanned ships can cope well with complex and variable offshore environments and reduce the influence of human factors and uncertain disturbances. Reinforcement learning is an efficient solution to the optimal control problem: it sidesteps the difficulty of solving the Hamilton-Jacobi-Bellman equation that arises in traditional optimal control. Werbos proposed an optimal control framework based on reinforcement learning using actor-critic neural networks. Cost functions and control strategies can be approximated by actor-critic neural networks, thereby satisfying the optimality criteria and avoiding the curse of dimensionality. In actual operation, the tracking error of the unmanned ship is required to stay within a certain range; although the prior art can achieve tracking control of the unmanned ship, it cannot guarantee that the tracking error remains within the required range.
Disclosure of Invention
In light of the technical problems set forth above, an unmanned ship data-driven reinforcement learning control method with designated performance is provided. The invention updates the actor and the critic simultaneously and keeps the error within the designated range, so as to obtain the optimal control strategy. The convergence of the control system is accelerated, and the adaptability and reliability of the unmanned ship system when operating in unknown environments are markedly improved.
The technical means adopted by the invention are as follows:
the unmanned ship data-driven reinforcement learning control method with the designated performance is characterized by comprising the following steps of:
s1, establishing a mathematical model of the unmanned surface vessel;
s2, introducing a specified performance function;
s3, designing an optimal controller of the unmanned ship;
S4, designing the weight update laws of the critic and the actor.
Further, the step S1 is specifically:
S11, defining a north-east coordinate system OX₀Y₀Z₀ and a body-fixed coordinate system BXYZ;
S12, modeling the unmanned surface vessel to obtain the following vessel motion control mathematical model:

η̇ = R(ψ)v

v̇ = f(η, v) + τ′

wherein η = [x, y, ψ]ᵀ is the vessel position vector in the north-east coordinate system, x and y represent the north and east positions of the unmanned surface vessel, and ψ ∈ [0, 2π] represents the yaw angle; v = [u, v, r]ᵀ represents the velocity vector of the unmanned surface vessel in the body-fixed coordinate system, where u, v and r represent the surge, sway and yaw velocities, respectively; f(η, v) is the completely unknown dynamics vector; τ′ = M⁻¹τ, τ = [τ_u, τ_v, τ_r]ᵀ represents the vessel control input vector, where τ_u, τ_v and τ_r represent the surge, sway and yaw control forces, respectively; R(ψ) represents the transformation matrix between the earth-fixed and body-fixed frames; M(t) = Mᵀ(t) > 0 represents the inertia matrix including added mass;
S13, setting the desired trajectory mathematical model of the unmanned surface vessel as follows:

ẋ_d = f_d(x_d)

wherein x_d = [η_dᵀ, v_dᵀ]ᵀ, and η_d = [x_d, y_d, ψ_d]ᵀ and v_d = [u_d, v_d, r_d]ᵀ are the desired position vector and velocity vector tracked by the unmanned surface vessel, respectively.
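The kinematic half of the model above, η̇ = R(ψ)v, can be exercised numerically. The sketch below uses the standard 3-DOF rotation matrix implied by the definitions of η and v; the plant dynamics f(η, v) are unknown in the invention, so only the kinematics are integrated, and all numerical values are illustrative.

```python
import numpy as np

def rotation(psi):
    """Rotation matrix R(psi) mapping body-fixed velocities (u, v, r)
    to earth-fixed rates (x_dot, y_dot, psi_dot) for a 3-DOF surface vessel."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def step_kinematics(eta, nu, dt):
    """One Euler step of eta_dot = R(psi) * nu."""
    return eta + dt * rotation(eta[2]) @ nu

eta = np.zeros(3)                 # [x, y, psi] in the earth-fixed frame
nu = np.array([1.0, 0.0, 0.1])    # constant surge speed and small yaw rate (assumed)
for _ in range(100):
    eta = step_kinematics(eta, nu, dt=0.01)
```

With a constant surge speed and yaw rate the integrated pose traces a circular arc, which is the expected behaviour of the kinematic model.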
Further, the step S2 is specifically:
S21, defining the trajectory tracking error dynamics of the unmanned ship as follows:

ẋ_e = F(x_e, x_d) + Gτ

wherein x_e = [η_eᵀ, v_eᵀ]ᵀ, η_e = η − η_d, v_e = v − v_d, F(·) is the stacked unknown error dynamics, and G = [0₃ₓ₃, M⁻¹]ᵀ;
S22, defining the specified performance such that the tracking error satisfies:

−δ_{i,min} μ_i(t) ≤ e_i(t) ≤ δ_{i,max} μ_i(t)

wherein δ_{i,min} and δ_{i,max} are constants, and μ_i(t) is the bounded, decreasing smooth function

μ_i(t) = (μ_{i,0} − μ_{i,∞}) e^{−a_i t} + μ_{i,∞}

where μ_{i,0}, a_i and μ_{i,∞} are strictly positive constants;
S23, the performance function μ_i(t) and the constants δ_{i,min}, δ_{i,max} determine the boundary of the error e_i(t); the tracking error is redefined as:

e_i(t) = μ_i(t) Φ_i(z_i(t))

wherein z_i, i = x, y, ψ, u, v, r, is the transformed error and Φ_i(z_i) is a smooth increasing function whose expression is as follows:

Φ_i(z_i) = (δ_{i,max} e^{z_i} − δ_{i,min} e^{−z_i}) / (e^{z_i} + e^{−z_i})

with inverse function

z_i = Φ_i⁻¹(e_i/μ_i) = (1/2) ln[(δ_{i,min} + e_i/μ_i) / (δ_{i,max} − e_i/μ_i)]

Differentiating the transformed error, the transformed tracking error dynamics are as follows:

ż = f̄(z, x_d) + ḡ(z)τ

wherein f̄ is a 6×1 nonlinear function that is Lipschitz continuous on a set Ω containing the origin, with f̄(0) = 0; since the reference variables are bounded, f̄ and ḡ are bounded, i.e. ‖f̄‖ ≤ b_f and ‖ḡ‖ ≤ b_g for constants b_f > 0 and b_g > 0.
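The prescribed-performance envelope of S22 can be checked numerically. The sketch below builds the decreasing envelope μ_i(t) and verifies that a sample decaying error trajectory stays inside the prescribed funnel; the constants δ_{i,min}, δ_{i,max}, μ_{i,0}, μ_{i,∞}, a_i and the sample error are illustrative assumptions, not values from the invention.

```python
import numpy as np

def mu(t, mu0=2.0, mu_inf=0.1, a=1.0):
    """Bounded, decreasing smooth performance function
    mu_i(t) = (mu_{i,0} - mu_{i,inf}) * exp(-a_i * t) + mu_{i,inf}."""
    return (mu0 - mu_inf) * np.exp(-a * t) + mu_inf

t = np.linspace(0.0, 10.0, 1001)
env = mu(t)
delta_min, delta_max = 1.0, 1.0   # example delta_{i,min}, delta_{i,max}

# A sample error trajectory that decays faster than the envelope shrinks
e = 1.5 * np.exp(-2.0 * t) * np.cos(3.0 * t)
inside = np.all((-delta_min * env <= e) & (e <= delta_max * env))
```

Because μ_i(t) decreases monotonically to μ_{i,∞}, any error kept inside the funnel inherits both a guaranteed convergence rate and a steady-state bound.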
Further, the step S3 is specifically:
S31, constructing a cost function, which is as follows:

V(z) = ∫_t^∞ e^{−γ(s−t)} r(z, τ) ds

wherein the transformed tracking error dynamics under the optimal control input τ* are ż = f̄(z, x_d) + ḡ(z)τ*; the optimal cost function is then

V*(z) = min_τ ∫_t^∞ e^{−γ(s−t)} r(z, τ) ds

wherein 0 ≤ γ < 1, r(z, τ*) = zᵀQz + τ*ᵀΛτ*, and Q ∈ R⁶ˣ⁶, Λ ∈ R³ˣ³ are positive definite matrices;
S32, differentiating the cost function, the derivative is as follows:

V̇(z) = γV(z) − r(z, τ)

It can be obtained that:

γV*(z) = r(z, τ*) + (∇V*)ᵀ(f̄ + ḡτ*)

The Hamilton-Jacobi-Bellman equation can be written as:

H(z, τ*, ∇V*) = r(z, τ*) + (∇V*)ᵀ(f̄ + ḡτ*) − γV*(z) = 0

wherein ∇V* = ∂V*/∂z;
S33, solving ∂H/∂τ* = 0, the optimal control law is obtained as follows:

τ* = −(1/2) Λ⁻¹ ḡᵀ ∇V*
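The HJB equation and control law above can be sanity-checked on a scalar toy problem. For ż = az + gτ with r = qz² + λτ² and discount γ, the quadratic ansatz V(z) = pz² reduces the HJB to a scalar quadratic in p; the sketch below (all coefficients are illustrative assumptions, not the vessel model) solves it and verifies that τ* = −(1/2)λ⁻¹g ∂V/∂z makes the HJB residual vanish.

```python
import numpy as np

# Illustrative scalar system z_dot = a*z + g*tau, cost r = q*z^2 + lam*tau^2,
# with discount gamma (numbers are assumptions for the sanity check).
a, g, q, lam, gamma = -1.0, 1.0, 1.0, 1.0, 0.1

# V(z) = p*z^2 turns the HJB into (g^2/lam)*p^2 - (2a - gamma)*p - q = 0;
# take the positive root so V is positive definite.
A = g**2 / lam
B = -(2.0 * a - gamma)
C = -q
p = (-B + np.sqrt(B**2 - 4.0 * A * C)) / (2.0 * A)

def tau_star(z):
    """Optimal control tau* = -(1/2) * lam^{-1} * g * dV/dz, dV/dz = 2*p*z."""
    return -0.5 / lam * g * (2.0 * p * z)

def hjb_residual(z):
    """r(z, tau*) + V'(z)*(a*z + g*tau*) - gamma*V(z); zero at the optimum."""
    t = tau_star(z)
    return q * z**2 + lam * t**2 + 2.0 * p * z * (a * z + g * t) - gamma * p * z**2
```

The residual vanishing for arbitrary z confirms term-by-term that the stationarity condition ∂H/∂τ = 0 yields the stated control law.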
Further, the step S4 is specifically:
S41, solving for the optimal controller by a neural network approximation method based on the critic-actor structure; the optimal cost function is expressed as follows:

V*(z) = W_cᵀ σ(z) + ε_c

wherein W_c ∈ Rᴺ is the ideal weight vector of the critic neural network, N is the number of neurons, σ(z) represents the basis function vector of the neural network input, and ε_c is the bounded neural network approximation error;
S42, considering the Bellman error equation as follows:

e_c = r(z, τ) + Ŵ_cᵀ ∇σ ż − γ Ŵ_cᵀ σ(z)

wherein Ŵ_c denotes the estimate of the ideal critic weight W_c;
S43, designing the approximation of the cost function, as shown in the following formula:

V̂(z) = Ŵ_cᵀ σ(z)

S44, considering the objective function E_c = (1/2) e_cᵀ e_c, the critic update law is obtained by the gradient descent method:

dŴ_c/dt = −Γ_c e_c (∂e_c/∂Ŵ_c)

wherein Γ_c is a positive definite matrix;
S45, adopting reinforcement learning optimal tracking control, the approximate optimal control strategy is as follows:

τ̂ = −(1/2) Λ⁻¹ ḡᵀ (∇σ)ᵀ Ŵ_a

wherein Ŵ_a is an estimate of the ideal weight W_c; the actor adaptation law is as follows:

dŴ_a/dt = −Γ_a k_a (Ŵ_a − Ŵ_c)

wherein Γ_a is a positive definite matrix and k_a is a design parameter driving the actor estimate toward the critic estimate.
Further, the step S11 is specifically:
the north-east coordinate system OX₀Y₀Z₀ is regarded as an inertial coordinate system, taking an arbitrary point O on the earth as the coordinate origin, with OX₀ pointing north, OY₀ pointing east, and OZ₀ pointing toward the center of the earth;
the body-fixed coordinate system BXYZ is regarded as a non-inertial coordinate system; for a port-starboard symmetric vessel, its center is taken as the coordinate origin B, the BX axis points toward the bow along the vessel centerline, the BY axis points toward starboard perpendicular to BX, and the BZ axis points vertically downward, perpendicular to the XY plane.
Compared with the prior art, the invention has the following advantages:
Based on reinforcement learning theory, and aimed at an unmanned ship system containing complex unknown nonlinearities, the method provides a data-driven designated-performance optimal control scheme for unmanned ships using specified performance control, so that the transient and steady-state performance of the controlled system are achieved and the tracking error is kept within the designated boundary at every moment. The specific innovation points are as follows:
1) Conventional ship motion control methods are designed by taking a common ship dynamics model as the controlled object, specifying a desired trajectory, and taking trajectory tracking as the control target. However, most unmanned ship control methods do not consider control optimization. As operational tasks grow increasingly complex, high-precision, high-performance unmanned ships capable of completing optimal control tasks are urgently needed; yet the optimal control theory of unmanned ships is not well developed, so a systematic optimal control method needs to be designed, which also helps expand reinforcement learning control research for unmanned ships.
2) Compared with other vehicles or moving bodies, the unknown navigation environment of a surface unmanned ship, external disturbances such as wind, waves and currents, and unmodeled dynamics are more complex, making research on the autonomous control problem very challenging. Traditional control methods rely entirely on accurate model dynamics and parameters; however, the navigation environment of an unmanned surface vessel often involves high sea states, and the real-time-varying system dynamics cannot be acquired accurately, so control methods based on an accurate model struggle to achieve ideal control performance. Therefore, this project proposes a data-driven control method: only measurable input and output data are used in the control design, completely avoiding the use of system dynamics. The designed control strategy can still complete high-precision tracking control tasks.
3) Traditional adaptive control design methods only improve system performance by analyzing the relationship between certain parameters or correction terms in the controller and the system performance and then modifying the related parameters; they neither quantify the degree of performance improvement nor easily satisfy requirements on control precision and convergence speed simultaneously. This project designs a high-precision, high-performance optimal control method for unmanned ships: the tracking error is kept within the specified boundary at every moment, the size of the performance boundary can be set arbitrarily, the convergence rate is not less than a preset value, the error overshoot does not exceed a preset limit, and the steady-state error remains within set upper and lower bounds. This guarantees system stability, automatically satisfies the specified performance constraints, and achieves the transient and steady-state performance of the system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a view of the position tracking of the unmanned ship according to the present invention.
FIG. 3 is a diagram of the velocity tracking of the unmanned ship of the present invention.
Fig. 4 is a cross-axis position error diagram of the unmanned ship of the present invention.
Fig. 5 is a diagram of the position error of the longitudinal axis of the unmanned ship.
Fig. 6 is a diagram of the yaw angle error of the unmanned ship of the present invention.
FIG. 7 is a graph of the surge velocity error of the unmanned ship of the present invention.
FIG. 8 is a graph of the sway velocity error of the unmanned ship of the present invention.
Fig. 9 is a diagram of the yaw velocity error of the unmanned ship of the present invention.
FIG. 10 is a diagram of unmanned ship trajectory tracking according to the present invention.
FIG. 11 is a critic neural network weight update diagram of the present invention.
Fig. 12 is a diagram of actor neural network weight update according to the present invention.
FIG. 13 is a graph of unmanned ship control rate according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, the invention provides a method for controlling unmanned ship data-driven reinforcement learning with designated performance, which comprises the following steps:
S1, establishing a mathematical model of the unmanned surface vessel; the method specifically comprises the following steps:
S11, defining a north-east coordinate system OX₀Y₀Z₀ and a body-fixed coordinate system BXYZ;
Further, as a preferred embodiment of the present invention, step S11 specifically includes:
the north-east coordinate system OX₀Y₀Z₀ is regarded as an inertial coordinate system, taking an arbitrary point O on the earth as the coordinate origin, with OX₀ pointing north, OY₀ pointing east, and OZ₀ pointing toward the center of the earth;
the body-fixed coordinate system BXYZ is regarded as a non-inertial coordinate system; for a port-starboard symmetric vessel, its center is taken as the coordinate origin B, the BX axis points toward the bow along the vessel centerline, the BY axis points toward starboard perpendicular to BX, and the BZ axis points vertically downward, perpendicular to the XY plane.
S12, modeling the unmanned surface vessel to obtain the following vessel motion control mathematical model:

η̇ = R(ψ)v

v̇ = f(η, v) + τ′

wherein η = [x, y, ψ]ᵀ is the vessel position vector in the north-east coordinate system, x and y represent the north and east positions of the unmanned surface vessel, and ψ ∈ [0, 2π] represents the yaw angle; v = [u, v, r]ᵀ represents the velocity vector of the unmanned surface vessel in the body-fixed coordinate system, where u, v and r represent the surge, sway and yaw velocities, respectively; f(η, v) is the completely unknown dynamics vector; τ′ = M⁻¹τ, τ = [τ_u, τ_v, τ_r]ᵀ represents the vessel control input vector, where τ_u, τ_v and τ_r represent the surge, sway and yaw control forces, respectively; R(ψ) represents the transformation matrix between the earth-fixed and body-fixed frames; M(t) = Mᵀ(t) > 0 represents the inertia matrix including added mass;
S13, setting the desired trajectory mathematical model of the unmanned surface vessel as follows:

ẋ_d = f_d(x_d)

wherein x_d = [η_dᵀ, v_dᵀ]ᵀ, and η_d = [x_d, y_d, ψ_d]ᵀ and v_d = [u_d, v_d, r_d]ᵀ are the desired position vector and velocity vector tracked by the unmanned surface vessel, respectively.
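The desired-trajectory model of S13 only requires η_d and v_d to be dynamically consistent, i.e. η̇_d = R(ψ_d)v_d. A hypothetical circular reference satisfying this consistency is sketched below; the radius and angular rate are illustrative assumptions, not values from the patent.

```python
import numpy as np

def circle_reference(t, radius=5.0, omega=0.2):
    """Hypothetical circular reference: returns (eta_d, v_d) with
    eta_d = [x_d, y_d, psi_d] and body-fixed v_d = [u_d, v_d, r_d]
    chosen so that eta_d_dot = R(psi_d) @ v_d holds exactly."""
    eta_d = np.array([radius * np.cos(omega * t),
                      radius * np.sin(omega * t),
                      omega * t + np.pi / 2.0])   # heading tangent to the circle
    v_d = np.array([radius * omega, 0.0, omega])  # pure surge plus constant yaw rate
    return eta_d, v_d
```

Choosing ψ_d tangent to the path keeps the sway reference v_d zero, which matches how an underactuated surface vessel would naturally follow a circle.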
S2, introducing a specified performance function, specifically:
S21, defining the trajectory tracking error dynamics of the unmanned ship as follows:

ẋ_e = F(x_e, x_d) + Gτ

wherein x_e = [η_eᵀ, v_eᵀ]ᵀ, η_e = η − η_d, v_e = v − v_d, F(·) is the stacked unknown error dynamics, and G = [0₃ₓ₃, M⁻¹]ᵀ;
S22, defining the specified performance such that the tracking error satisfies:

−δ_{i,min} μ_i(t) ≤ e_i(t) ≤ δ_{i,max} μ_i(t)

wherein δ_{i,min} and δ_{i,max} are constants, and μ_i(t) is the bounded, decreasing smooth function

μ_i(t) = (μ_{i,0} − μ_{i,∞}) e^{−a_i t} + μ_{i,∞}

where μ_{i,0}, a_i and μ_{i,∞} are strictly positive constants;
S23, the performance function μ_i(t) and the constants δ_{i,min}, δ_{i,max} determine the boundary of the error e_i(t); the tracking error is redefined as:

e_i(t) = μ_i(t) Φ_i(z_i(t))

wherein z_i, i = x, y, ψ, u, v, r, is the transformed error and Φ_i(z_i) is a smooth increasing function whose expression is as follows:

Φ_i(z_i) = (δ_{i,max} e^{z_i} − δ_{i,min} e^{−z_i}) / (e^{z_i} + e^{−z_i})

with inverse function

z_i = Φ_i⁻¹(e_i/μ_i) = (1/2) ln[(δ_{i,min} + e_i/μ_i) / (δ_{i,max} − e_i/μ_i)]

Differentiating the transformed error, the transformed tracking error dynamics are as follows:

ż = f̄(z, x_d) + ḡ(z)τ

wherein f̄ is a 6×1 nonlinear function that is Lipschitz continuous on a set Ω containing the origin, with f̄(0) = 0; since the reference variables are bounded, f̄ and ḡ are bounded, i.e. ‖f̄‖ ≤ b_f and ‖ḡ‖ ≤ b_g for constants b_f > 0 and b_g > 0.
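The error transformation of S23 and its inverse can be verified directly. The function below is the standard prescribed-performance transformation assumed for Φ_i (strictly increasing, with range (−δ_{i,min}, δ_{i,max})); the constants are illustrative assumptions.

```python
import numpy as np

D_MIN, D_MAX = 0.8, 1.2   # example delta_{i,min}, delta_{i,max}

def phi(z):
    """Smooth increasing transformation with range (-D_MIN, D_MAX):
    Phi(z) = (D_MAX*e^z - D_MIN*e^{-z}) / (e^z + e^{-z})."""
    ez, enz = np.exp(z), np.exp(-z)
    return (D_MAX * ez - D_MIN * enz) / (ez + enz)

def phi_inv(s):
    """Inverse transformation z = Phi^{-1}(s), defined on (-D_MIN, D_MAX):
    z = (1/2) * ln((D_MIN + s) / (D_MAX - s))."""
    return 0.5 * np.log((D_MIN + s) / (D_MAX - s))
```

Because Φ maps the whole real line into the open funnel interval, keeping the transformed error z_i bounded automatically keeps e_i(t) = μ_i(t)Φ_i(z_i) strictly inside the prescribed bounds.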
S3, designing an optimal controller of the unmanned ship; the method specifically comprises the following steps:
S31, constructing a cost function, which is as follows:

V(z) = ∫_t^∞ e^{−γ(s−t)} r(z, τ) ds

wherein the transformed tracking error dynamics under the optimal control input τ* are ż = f̄(z, x_d) + ḡ(z)τ*; the optimal cost function is then

V*(z) = min_τ ∫_t^∞ e^{−γ(s−t)} r(z, τ) ds

wherein 0 ≤ γ < 1, r(z, τ*) = zᵀQz + τ*ᵀΛτ*, and Q ∈ R⁶ˣ⁶, Λ ∈ R³ˣ³ are positive definite matrices.
S32, differentiating the constructed cost function, the derivative is as follows:

V̇(z) = γV(z) − r(z, τ)

It can be obtained that:

γV*(z) = r(z, τ*) + (∇V*)ᵀ(f̄ + ḡτ*)

The Hamilton-Jacobi-Bellman equation can be written as:

H(z, τ*, ∇V*) = r(z, τ*) + (∇V*)ᵀ(f̄ + ḡτ*) − γV*(z) = 0

wherein ∇V* = ∂V*/∂z;
S33, solving ∂H/∂τ* = 0, the optimal control law is obtained as follows:

τ* = −(1/2) Λ⁻¹ ḡᵀ ∇V*
S4, designing the weight update laws of the critic and the actor; the method specifically comprises the following steps:
S41, in order to obtain the ideal optimal control solution, the HJB equation must be solved; however, the HJB equation is highly nonlinear and very difficult to solve, so a neural network approximation method based on the critic-actor structure is adopted to solve for the optimal controller. The optimal cost function is expressed as follows:

V*(z) = W_cᵀ σ(z) + ε_c

wherein W_c ∈ Rᴺ is the ideal weight vector of the critic neural network, N is the number of neurons, σ(z) represents the basis function vector of the neural network input, and ε_c is the bounded neural network approximation error;
S42, considering the Bellman error equation as follows:

e_c = r(z, τ) + Ŵ_cᵀ ∇σ ż − γ Ŵ_cᵀ σ(z)

wherein Ŵ_c denotes the estimate of the ideal critic weight W_c;
S43, designing the approximation of the cost function, as shown in the following formula:

V̂(z) = Ŵ_cᵀ σ(z)

S44, considering the objective function E_c = (1/2) e_cᵀ e_c, the critic update law is obtained by the gradient descent method:

dŴ_c/dt = −Γ_c e_c (∂e_c/∂Ŵ_c)

wherein Γ_c is a positive definite matrix;
S45, because the gradient of the cost function is unknown, the ideal optimal control strategy cannot be obtained directly; the actual optimal control strategy is therefore obtained by approximating the unknown ideal weights. The estimates of the actor and the critic are updated simultaneously through the actor and critic neural networks. Adopting reinforcement learning optimal tracking control, the optimal control strategy is:

τ̂ = −(1/2) Λ⁻¹ ḡᵀ (∇σ)ᵀ Ŵ_a

wherein Ŵ_a is an estimate of the ideal weight W_c; the actor adaptation law is as follows:

dŴ_a/dt = −Γ_a k_a (Ŵ_a − Ŵ_c)

wherein Γ_a is a positive definite matrix and k_a is a design parameter driving the actor estimate toward the critic estimate.
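The simultaneous critic/actor update idea can be sketched on a scalar toy plant with a single quadratic basis function σ(z) = z². Everything here, the plant, the gains, the normalized critic gradient step, and the simple actor law that tracks the critic, is an illustrative assumption, not the patent's USV controller.

```python
import numpy as np

# Scalar toy plant z_dot = a*z + g*tau, cost r = q*z^2 + lam*tau^2, discount gamma.
a, g, q, lam, gamma = -1.0, 1.0, 1.0, 1.0, 0.1
dt, T = 0.01, 10.0
lr_c, k_a = 0.5, 5.0          # critic gain (Gamma_c) and actor gain k_a (assumed)

z, Wc, Wa = 1.0, 0.0, 0.0     # state, critic weight, actor weight
for _ in range(int(T / dt)):
    tau = -(g / lam) * Wa * z            # tau = -(1/2)*lam^{-1}*g*dV/dz with V ~= Wa*z^2
    zdot = a * z + g * tau
    # Bellman residual: r + Wc*(d sigma/dz)*zdot - gamma*Wc*sigma, sigma(z) = z^2
    e_c = q * z**2 + lam * tau**2 + Wc * (2.0 * z * zdot) - gamma * Wc * z**2
    grad = 2.0 * z * zdot - gamma * z**2             # d e_c / d Wc
    Wc -= lr_c * e_c * grad / (1.0 + grad**2) * dt   # normalized critic gradient step
    Wa -= k_a * (Wa - Wc) * dt                       # actor tracks the critic estimate
    z += zdot * dt
```

The critic weight climbs toward the Bellman-consistent value while the actor weight follows it, and the state is regulated to the origin; with richer excitation the pair would converge to the true quadratic value coefficient.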
as shown in fig. 11 and 12, by updating the neural network and the control rate map of fig. 13, the actuator and the evaluator can be updated simultaneously, and the error can be within a specified range, so as to obtain an optimal control strategy. Fig. 1 is a position tracking diagram of an unmanned ship, fig. 2 is a speed tracking diagram of the unmanned ship, it can be seen from the diagram that the tracking effect of the method is good, fig. 3 to fig. 9 are error diagrams, it can be seen that the errors of the ship are all within a specified range, fig. 10 is a track diagram of a reference track tracked by the unmanned ship, it can be seen from the diagram that the reference track is quickly tracked by the unmanned ship, the tracking effect is good, therefore, it can be seen that the convergence speed of the control system is increased by the method, and the adaptability and reliability of the unmanned ship system in unknown environment operation are obviously improved.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (2)

1. The unmanned ship data-driven reinforcement learning control method with the designated performance is characterized by comprising the following steps of:
s1, establishing a mathematical model of the unmanned surface vessel;
the step S1 specifically includes:
S11, defining a north-east coordinate system OX₀Y₀Z₀ and a body-fixed coordinate system BXYZ;
S12, modeling the unmanned surface vessel to obtain the following vessel motion control mathematical model:

η̇ = R(ψ)v

v̇ = f(η, v) + τ′

wherein η = [x, y, ψ]ᵀ is the vessel position vector in the north-east coordinate system, x and y represent the north and east positions of the unmanned surface vessel, and ψ ∈ [0, 2π] represents the yaw angle; v = [u, v, r]ᵀ represents the velocity vector of the unmanned surface vessel in the body-fixed coordinate system, where u, v and r represent the surge, sway and yaw velocities, respectively; f(η, v) is the completely unknown dynamics vector; τ′ = M⁻¹τ, τ = [τ_u, τ_v, τ_r]ᵀ represents the vessel control input vector, where τ_u, τ_v and τ_r represent the surge, sway and yaw control forces, respectively; R(ψ) represents the transformation matrix between the earth-fixed and body-fixed frames; M(t) = Mᵀ(t) > 0 represents the inertia matrix including added mass;
S13, setting the desired trajectory mathematical model of the unmanned surface vessel as follows:

ẋ_d = f_d(x_d)

wherein x_d = [η_dᵀ, v_dᵀ]ᵀ, and η_d = [x_d, y_d, ψ_d]ᵀ and v_d = [u_d, v_d, r_d]ᵀ are respectively the desired position vector and velocity vector tracked by the unmanned surface vessel;
s2, introducing a specified performance function;
the step S2 specifically includes:
s21, defining the track tracking error dynamics of the unmanned ship as follows:
Figure FDA0003744104780000014
wherein, the first and the second end of the pipe are connected with each other,
Figure FDA0003744104780000015
η e =η-η d ,v e =v-v d
Figure FDA0003744104780000016
G=[0 3X3 ,M -1 ] T
s22, defining the specified performance and enabling the tracking error to satisfy the following formula:
i,min μ i (t)≤e i (t)≤δ i,max μ i (t)
wherein, delta i,min ,δ i,max Is a constant, mu i (t) is a bounded, decreasing smooth function of
Figure FDA00037441047800000218
μ i,0 Represents a strictly positive constant, a i Represents a strictly positive constant, μ i,∞ Represents a strictly positive constant;
S23, the bounded decreasing smooth function μ_i(t) and the constants δ_{i,min}, δ_{i,max} determine the bounds of the error e_i(t), and the tracking error is redefined as:

e_i(t) = μ_i(t) Φ_i(z_i(t))

wherein z_i, i = x, y, ψ, u, v, r, are the transformed errors, and Φ_i(z_i) is a smooth increasing function with the following expression:

Φ_i(z_i) = (δ_{i,max} e^{z_i} − δ_{i,min} e^{−z_i}) / (e^{z_i} + e^{−z_i})

having the inverse function

Φ_i^{−1}(e_i/μ_i) = (1/2) ln[(e_i/μ_i + δ_{i,min}) / (δ_{i,max} − e_i/μ_i)]

so that the transformed error is

z_i(t) = Φ_i^{−1}(e_i(t)/μ_i(t))

wherein μ_i denotes the bounded decreasing smooth function and Φ_i denotes the smooth increasing function;
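The error transformation of step S23 and its inverse can be checked numerically; the sketch below (illustrative names and constants) verifies that the smooth increasing map and its inverse are consistent on the open interval between the lower and upper bound constants:

```python
import math

def phi(z, d_min, d_max):
    """Smooth increasing map Phi(z) = (d_max e^z - d_min e^{-z}) / (e^z + e^{-z})."""
    ez, emz = math.exp(z), math.exp(-z)
    return (d_max * ez - d_min * emz) / (ez + emz)

def phi_inv(s, d_min, d_max):
    """Inverse map: z = 0.5 ln((s + d_min) / (d_max - s)), valid for -d_min < s < d_max."""
    return 0.5 * math.log((s + d_min) / (d_max - s))

# Round trip: a constrained normalized error e/mu in (-d_min, d_max) maps to an
# unconstrained transformed error z and back.
s = 0.3
z = phi_inv(s, d_min=0.8, d_max=1.2)
print(abs(phi(z, 0.8, 1.2) - s) < 1e-12)  # True
```

The point of the transformation: keeping z_i bounded automatically keeps e_i inside the prescribed envelope, which is why the controller is designed on z rather than on e.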
the transformed tracking error dynamics are as follows:

ż = f(z) + g(z)τ

wherein f(z) and g(z) denote the nonlinear dynamics obtained from the error transformation, depending on μ_i, Φ_i^{−1} and the original tracking error dynamics; on a set Ω containing the origin, f(z) and g(z) are Lipschitz continuous and f(0) = 0; since the reference variables are bounded, f(z) and g(z) are bounded, i.e.

‖f(z)‖ ≤ b_f ‖z‖, ‖g(z)‖ ≤ b_g

wherein f(z) denotes a 6×1-dimensional nonlinear function, and b_f and b_g each denote a constant greater than zero;
S3, designing an optimal controller of the unmanned ship;
the step S3 specifically includes:
S31, a cost function is constructed as follows:

V(z) = ∫_t^∞ e^{−γ(s−t)} r(z(s), τ(s)) ds

wherein the transformed tracking error dynamics with the optimal control input are:

ż = f(z) + g(z)τ*

wherein τ* denotes the optimal control input; the optimal cost function is then

V*(z) = min_τ ∫_t^∞ e^{−γ(s−t)} r(z(s), τ(s)) ds

wherein 0 ≤ γ < 1 is the discount factor, r(z, τ*) = z^T Q z + τ*^T Λ τ*, and Q ∈ R^{6×6} and Λ ∈ R^{3×3} are positive definite matrices;
S32, differentiating the cost function gives:

V̇(z) = γ V(z) − r(z, τ)

from which it can be obtained that:

(∇V)^T (f(z) + g(z)τ) = γ V(z) − r(z, τ)

and the Hamilton-Jacobi-Bellman equation can be written as:

H(z, τ*, ∇V*) = r(z, τ*) + (∇V*)^T (f(z) + g(z)τ*) − γ V*(z) = 0

wherein ∇V* = ∂V*/∂z denotes the gradient of the optimal cost function, and g(z) denotes the control gain;

S33, solving ∂H/∂τ* = 0, the optimal control law is obtained as follows:

τ* = −(1/2) Λ^{−1} g^T(z) ∇V*(z)
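The optimal control law of step S33 reduces to a matrix computation once the cost gradient is available. A minimal sketch with assumed illustrative values (the inertia entries and gradient vector are made up for demonstration):

```python
import numpy as np

def optimal_control(Lambda, g, grad_V):
    """tau* = -1/2 Lambda^{-1} g(z)^T grad V*(z)."""
    return -0.5 * np.linalg.solve(Lambda, g.T @ grad_V)

# Illustrative 6-state / 3-input case with g = [0_{3x3}; M^{-1}], matching the
# structure of the transformed error dynamics.
M_inv = np.diag([0.04, 0.03, 0.36])                 # assumed inverse inertia
g = np.vstack([np.zeros((3, 3)), M_inv])            # 6x3 input matrix
Lambda = np.eye(3)                                  # control weight matrix
grad_V = np.array([0.0, 0.0, 0.0, 2.0, -1.0, 0.5])  # assumed cost gradient
tau = optimal_control(Lambda, g, grad_V)
```

Because g has zeros in its position rows, only the velocity-error part of the gradient enters the control, which matches the intuition that force acts on velocities directly.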
S4, designing the weight update laws of the critic and the actor;
the step S4 specifically includes:
S41, based on the critic-actor structure, a neural network approximation method is adopted to solve for the optimal controller; the optimal cost function is expressed as follows:

V*(z) = W_c^T φ_c(z) + ε_c

wherein W_c ∈ R^N is the ideal weight vector of the critic neural network, N is the number of neurons, φ_c(z) = [φ_c1, φ_c2, …, φ_cN]^T denotes the basis function vector evaluated at the network input, and ε_c is the bounded neural network approximation error;
S42, the Bellman error equation is considered as follows:

e_c = ∫_{t−T}^{t} e^{−γ(s−t+T)} r(z(s), τ(s)) ds + Ŵ_c^T Δφ_c(z(t))

wherein Δφ_c(z(t)) = e^{−γT} φ_c(z(t)) − φ_c(z(t−T)); γ denotes the discount factor, T denotes the sampling interval, and t denotes time;
S43, the approximation of the cost function is designed as shown in the following formula:

V̂(z) = Ŵ_c^T φ_c(z)

wherein Ŵ_c^T denotes the transpose of the estimated weight vector of the cost function;
S44, considering the objective function

E_c = (1/2) e_c^T e_c

the gradient descent method gives the critic weight update law

dŴ_c/dt = −Γ_c (∂E_c/∂Ŵ_c) = −Γ_c Δφ_c(z(t)) e_c

wherein Γ_c is a positive definite matrix and T denotes the transpose;
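The critic update of steps S42–S44 can be illustrated as gradient descent on the squared Bellman error. The toy example below uses a one-dimensional quadratic basis and synthetic transitions constructed so the true weight is known; it is an illustrative sketch (Euler-discretized, arbitrary constants), not the patent's exact law:

```python
import numpy as np

def critic_update(W_c, phi_now, phi_prev, reward_int, gamma_T, lr):
    """One gradient-descent step on E_c = 0.5 * e_c^2, where
    e_c = reward_int + W_c^T (e^{-gamma*T} phi(now) - phi(prev))."""
    d_phi = np.exp(-gamma_T) * phi_now - phi_prev
    e_c = reward_int + W_c @ d_phi
    return W_c - lr * e_c * d_phi, e_c

# Toy problem: fit V(z) = w * z^2 so the Bellman error vanishes on synthetic
# transitions generated from a known target weight w* = 2.
rng = np.random.default_rng(0)
W = np.zeros(1)
for _ in range(2000):
    z0, z1 = rng.uniform(0.5, 1.5, size=2)
    # reward integral chosen so that e_c = 0 exactly at w* = 2
    r_int = 2.0 * z0**2 - np.exp(-0.1) * 2.0 * z1**2
    W, _ = critic_update(W, np.array([z1**2]), np.array([z0**2]), r_int, 0.1, 0.05)
```

With persistently varying transitions the error contracts at every step, so the estimated weight converges to the target value 2.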
S45, reinforcement learning optimal tracking control is adopted, and the optimal control policy is as follows:

τ̂(z) = −(1/2) Λ^{−1} g^T(z) (∇φ_c(z))^T Ŵ_a

wherein Ŵ_a is the estimate of the ideal weight W_c; the actor adaptation law is as follows:

dŴ_a/dt = −Γ_a k_a (Ŵ_a − Ŵ_c)

wherein Γ_a is a positive definite matrix and k_a is a design parameter.
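The actor adaptation of step S45 drives the actor weights toward the critic weights. Below is a minimal Euler-discretized sketch of the simplified law dŴ_a/dt = −Γ_a k_a (Ŵ_a − Ŵ_c); any additional stabilizing terms in the patent's full law are omitted, and all numbers are illustrative:

```python
import numpy as np

def actor_step(W_a, W_c, Gamma_a, k_a, dt):
    """One Euler step of W_a_dot = -Gamma_a * k_a * (W_a - W_c)."""
    return W_a - dt * Gamma_a @ (k_a * (W_a - W_c))

W_c = np.array([1.0, -0.5, 2.0])   # critic weights, assumed already converged
W_a = np.zeros(3)                  # actor weights start from zero
Gamma_a = np.eye(3)
for _ in range(5000):
    W_a = actor_step(W_a, W_c, Gamma_a, k_a=2.0, dt=0.01)
# W_a converges exponentially toward W_c.
```

The design choice here is the standard actor-critic split: the critic identifies the cost, and the actor tracks the critic so that the implemented control stays close to the current best estimate of the optimal policy.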
2. the unmanned ship data-driven reinforcement learning control method with specified performance according to claim 1, wherein the step S11 is specifically as follows:
the north-east coordinate system (OX₀Y₀Z₀) is regarded as an inertial coordinate system: an arbitrary point O on the earth is taken as the coordinate origin, OX₀ points north, OY₀ points east, and OZ₀ points toward the center of the earth;
the body-fixed coordinate system BXYZ is regarded as a non-inertial coordinate system: for a port-starboard symmetric vessel, the vessel's center is taken as the coordinate origin B, the BX axis points toward the bow along the vessel centerline, the BY axis points perpendicularly toward starboard, and the BZ axis points vertically downward, perpendicular to the XY plane.