CN111308890B - Unmanned ship data-driven reinforcement learning control method with designated performance - Google Patents


Info

Publication number
CN111308890B
CN111308890B (application number CN202010122590.0A)
Authority
CN
China
Prior art keywords
function, follows, coordinate system, unmanned, optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010122590.0A
Other languages
Chinese (zh)
Other versions
CN111308890A (en
Inventor
王宁
李堃
高颖
杨忱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN202010122590.0A priority Critical patent/CN111308890B/en
Publication of CN111308890A publication Critical patent/CN111308890A/en
Application granted granted Critical
Publication of CN111308890B publication Critical patent/CN111308890B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04: Adaptive control systems, electric, involving the use of models or simulators
    • G05B13/042: Adaptive control systems, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The invention provides an unmanned ship data-driven reinforcement learning control method with designated performance for an unmanned surface vessel system, comprising the following steps: S1, establishing a mathematical model of the unmanned surface vessel; S2, introducing a specified performance function; S3, designing an optimal controller for the unmanned ship; S4, designing the weight update laws of the critic and the actor. The method updates the actor and the critic simultaneously and keeps the tracking error within a designated range, so as to obtain the optimal control strategy. Meanwhile, the method accelerates the convergence of the control system and markedly improves the adaptability and reliability of the unmanned ship system when operating in unknown environments.

Description

Unmanned ship data-driven reinforcement learning control method with designated performance
Technical Field
The invention relates to the technical field of reinforcement learning and trajectory tracking of unmanned ships on water, in particular to a data-driven reinforcement learning control method for unmanned ships with designated performance.
Background
Artificial intelligence technology is now widely used in the control field, particularly in unmanned ship systems. Compared with traditional ships, unmanned ships can cope well with complex and variable offshore environments and reduce the influence of human factors and uncertain disturbances. Reinforcement learning is an efficient solution to the optimal control problem: it sidesteps the difficulty of solving the Hamilton-Jacobi-Bellman equation that arises in traditional optimal control. Werbos proposed an optimal control framework based on reinforcement learning using actor-critic neural networks. Cost functions and control strategies can be approximated by actor-critic neural networks, thereby satisfying the optimality criteria and avoiding the curse of dimensionality. In actual operation, the tracking error of the unmanned ship is required to stay within a certain range; although the prior art can achieve tracking control of the unmanned ship, it cannot guarantee that the tracking error remains within the required range.
Disclosure of Invention
In light of the technical problems set forth above, an unmanned ship data-driven reinforcement learning control method with designated performance is provided. The invention updates the actor and the critic simultaneously and keeps the error within the designated range, so as to obtain the optimal control strategy. The convergence of the control system is accelerated, and the adaptability and reliability of the unmanned ship system when operating in unknown environments are markedly improved.
The technical means adopted by the invention are as follows:
the unmanned ship data-driven reinforcement learning control method with the designated performance is characterized by comprising the following steps of:
s1, establishing a mathematical model of the unmanned surface vessel;
s2, introducing a specified performance function;
s3, designing an optimal controller of the unmanned ship;
S4, designing the weight update laws of the critic and the actor.
Further, the step S1 is specifically:
S11, defining a north-east coordinate system OX₀Y₀Z₀ and a body-fixed coordinate system BXYZ;
S12, modeling the unmanned surface vessel to obtain the following vessel motion control mathematical model:

η̇ = R(ψ)v

v̇ = f(η, v) + τ′

wherein η = [x, y, ψ]ᵀ is the vessel position vector in the north-east coordinate system, x and y represent the north and east positions of the unmanned surface vessel, and ψ ∈ [0, 2π] represents the yaw angle; v = [u, v, r]ᵀ represents the velocity vector of the unmanned surface vessel in the body-fixed coordinate system, where u, v and r represent the surge, sway and yaw velocities, respectively; f(η, v) is the completely unknown dynamics vector; τ′ = M⁻¹τ, τ = [τ_u, τ_v, τ_r]ᵀ represents the vessel control input vector, where τ_u, τ_v and τ_r represent the surge, sway and yaw control forces, respectively; R(ψ) represents the transformation matrix between the earth-fixed and body-fixed frames; M(t) = Mᵀ(t) > 0 represents the inertia matrix including added mass;
S13, setting the desired trajectory mathematical model of the unmanned surface vessel as follows:

ẋ_d = f_d(x_d)

wherein x_d = [η_dᵀ, v_dᵀ]ᵀ, and η_d = [x_d, y_d, ψ_d]ᵀ and v_d = [u_d, v_d, r_d]ᵀ are the desired position vector and velocity vector tracked by the unmanned surface vessel, respectively.
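The kinematic half of the model above, η̇ = R(ψ)v, can be exercised numerically. The sketch below uses the standard 3-DOF rotation matrix implied by the definitions of η and v; the plant dynamics f(η, v) are unknown in the invention, so only the kinematics are integrated, and all numerical values are illustrative.

```python
import numpy as np

def rotation(psi):
    """Rotation matrix R(psi) mapping body-fixed velocities (u, v, r)
    to earth-fixed rates (x_dot, y_dot, psi_dot) for a 3-DOF surface vessel."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def step_kinematics(eta, nu, dt):
    """One Euler step of eta_dot = R(psi) * nu."""
    return eta + dt * rotation(eta[2]) @ nu

eta = np.zeros(3)                 # [x, y, psi] in the earth-fixed frame
nu = np.array([1.0, 0.0, 0.1])    # constant surge speed and small yaw rate (assumed)
for _ in range(100):
    eta = step_kinematics(eta, nu, dt=0.01)
```

With a constant surge speed and yaw rate the integrated pose traces a circular arc, which is the expected behaviour of the kinematic model.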
Further, the step S2 is specifically:
S21, defining the trajectory tracking error dynamics of the unmanned ship as follows:

ẋ_e = F(x_e, x_d) + Gτ

wherein x_e = [η_eᵀ, v_eᵀ]ᵀ, η_e = η − η_d, v_e = v − v_d, F(·) is the stacked unknown error dynamics, and G = [0₃ₓ₃, M⁻¹]ᵀ;
S22, defining the specified performance such that the tracking error satisfies:

−δ_{i,min} μ_i(t) ≤ e_i(t) ≤ δ_{i,max} μ_i(t)

wherein δ_{i,min} and δ_{i,max} are constants, and μ_i(t) is the bounded, decreasing smooth function

μ_i(t) = (μ_{i,0} − μ_{i,∞}) e^{−a_i t} + μ_{i,∞}

where μ_{i,0}, a_i and μ_{i,∞} are strictly positive constants;
S23, the performance function μ_i(t) and the constants δ_{i,min}, δ_{i,max} determine the boundary of the error e_i(t); the tracking error is redefined as:

e_i(t) = μ_i(t) Φ_i(z_i(t))

wherein z_i, i = x, y, ψ, u, v, r, is the transformed error and Φ_i(z_i) is a smooth increasing function whose expression is as follows:

Φ_i(z_i) = (δ_{i,max} e^{z_i} − δ_{i,min} e^{−z_i}) / (e^{z_i} + e^{−z_i})

with inverse function

z_i = Φ_i⁻¹(e_i/μ_i) = (1/2) ln[(δ_{i,min} + e_i/μ_i) / (δ_{i,max} − e_i/μ_i)]

Differentiating the transformed error, the transformed tracking error dynamics are as follows:

ż = f̄(z, x_d) + ḡ(z)τ

wherein f̄ is a 6×1 nonlinear function that is Lipschitz continuous on a set Ω containing the origin, with f̄(0) = 0; since the reference variables are bounded, f̄ and ḡ are bounded, i.e. ‖f̄‖ ≤ b_f and ‖ḡ‖ ≤ b_g for constants b_f > 0 and b_g > 0.
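The prescribed-performance envelope of S22 can be checked numerically. The sketch below builds the decreasing envelope μ_i(t) and verifies that a sample decaying error trajectory stays inside the prescribed funnel; the constants δ_{i,min}, δ_{i,max}, μ_{i,0}, μ_{i,∞}, a_i and the sample error are illustrative assumptions, not values from the invention.

```python
import numpy as np

def mu(t, mu0=2.0, mu_inf=0.1, a=1.0):
    """Bounded, decreasing smooth performance function
    mu_i(t) = (mu_{i,0} - mu_{i,inf}) * exp(-a_i * t) + mu_{i,inf}."""
    return (mu0 - mu_inf) * np.exp(-a * t) + mu_inf

t = np.linspace(0.0, 10.0, 1001)
env = mu(t)
delta_min, delta_max = 1.0, 1.0   # example delta_{i,min}, delta_{i,max}

# A sample error trajectory that decays faster than the envelope shrinks
e = 1.5 * np.exp(-2.0 * t) * np.cos(3.0 * t)
inside = np.all((-delta_min * env <= e) & (e <= delta_max * env))
```

Because μ_i(t) decreases monotonically to μ_{i,∞}, any error kept inside the funnel inherits both a guaranteed convergence rate and a steady-state bound.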
Further, the step S3 is specifically:
S31, constructing a cost function, which is as follows:

V(z) = ∫_t^∞ e^{−γ(s−t)} r(z, τ) ds

wherein the transformed tracking error dynamics under the optimal control input τ* are ż = f̄(z, x_d) + ḡ(z)τ*; the optimal cost function is then

V*(z) = min_τ ∫_t^∞ e^{−γ(s−t)} r(z, τ) ds

wherein 0 ≤ γ < 1, r(z, τ*) = zᵀQz + τ*ᵀΛτ*, and Q ∈ R⁶ˣ⁶, Λ ∈ R³ˣ³ are positive definite matrices;
S32, differentiating the cost function, the derivative is as follows:

V̇(z) = γV(z) − r(z, τ)

It can be obtained that:

γV*(z) = r(z, τ*) + (∇V*)ᵀ(f̄ + ḡτ*)

The Hamilton-Jacobi-Bellman equation can be written as:

H(z, τ*, ∇V*) = r(z, τ*) + (∇V*)ᵀ(f̄ + ḡτ*) − γV*(z) = 0

wherein ∇V* = ∂V*/∂z;
S33, solving ∂H/∂τ* = 0, the optimal control law is obtained as follows:

τ* = −(1/2) Λ⁻¹ ḡᵀ ∇V*
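The HJB equation and control law above can be sanity-checked on a scalar toy problem. For ż = az + gτ with r = qz² + λτ² and discount γ, the quadratic ansatz V(z) = pz² reduces the HJB to a scalar quadratic in p; the sketch below (all coefficients are illustrative assumptions, not the vessel model) solves it and verifies that τ* = −(1/2)λ⁻¹g ∂V/∂z makes the HJB residual vanish.

```python
import numpy as np

# Illustrative scalar system z_dot = a*z + g*tau, cost r = q*z^2 + lam*tau^2,
# with discount gamma (numbers are assumptions for the sanity check).
a, g, q, lam, gamma = -1.0, 1.0, 1.0, 1.0, 0.1

# V(z) = p*z^2 turns the HJB into (g^2/lam)*p^2 - (2a - gamma)*p - q = 0;
# take the positive root so V is positive definite.
A = g**2 / lam
B = -(2.0 * a - gamma)
C = -q
p = (-B + np.sqrt(B**2 - 4.0 * A * C)) / (2.0 * A)

def tau_star(z):
    """Optimal control tau* = -(1/2) * lam^{-1} * g * dV/dz, dV/dz = 2*p*z."""
    return -0.5 / lam * g * (2.0 * p * z)

def hjb_residual(z):
    """r(z, tau*) + V'(z)*(a*z + g*tau*) - gamma*V(z); zero at the optimum."""
    t = tau_star(z)
    return q * z**2 + lam * t**2 + 2.0 * p * z * (a * z + g * t) - gamma * p * z**2
```

The residual vanishing for arbitrary z confirms term-by-term that the stationarity condition ∂H/∂τ = 0 yields the stated control law.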
Further, the step S4 is specifically:
S41, solving for the optimal controller by a neural network approximation method based on the critic-actor structure; the optimal cost function is expressed as follows:

V*(z) = W_cᵀ σ(z) + ε_c

wherein W_c ∈ Rᴺ is the ideal weight vector of the critic neural network, N is the number of neurons, σ(z) represents the basis function vector of the neural network input, and ε_c is the bounded neural network approximation error;
S42, considering the Bellman error equation as follows:

e_c = r(z, τ) + Ŵ_cᵀ ∇σ ż − γ Ŵ_cᵀ σ(z)

wherein Ŵ_c denotes the estimate of the ideal critic weight W_c;
S43, designing the approximation of the cost function, as shown in the following formula:

V̂(z) = Ŵ_cᵀ σ(z)

S44, considering the objective function E_c = (1/2) e_cᵀ e_c, the critic update law is obtained by the gradient descent method:

dŴ_c/dt = −Γ_c e_c (∂e_c/∂Ŵ_c)

wherein Γ_c is a positive definite matrix;
S45, adopting reinforcement learning optimal tracking control, the approximate optimal control strategy is as follows:

τ̂ = −(1/2) Λ⁻¹ ḡᵀ (∇σ)ᵀ Ŵ_a

wherein Ŵ_a is an estimate of the ideal weight W_c; the actor adaptation law is as follows:

dŴ_a/dt = −Γ_a k_a (Ŵ_a − Ŵ_c)

wherein Γ_a is a positive definite matrix and k_a is a design parameter driving the actor estimate toward the critic estimate.
Further, the step S11 is specifically:
the north-east coordinate system OX₀Y₀Z₀ is regarded as an inertial coordinate system, taking an arbitrary point O on the earth as the coordinate origin, with OX₀ pointing north, OY₀ pointing east, and OZ₀ pointing toward the center of the earth;
the body-fixed coordinate system BXYZ is regarded as a non-inertial coordinate system; for a port-starboard symmetric vessel, its center is taken as the coordinate origin B, the BX axis points toward the bow along the vessel centerline, the BY axis points toward starboard perpendicular to BX, and the BZ axis points vertically downward, perpendicular to the XY plane.
Compared with the prior art, the invention has the following advantages:
Based on reinforcement learning theory, and aimed at an unmanned ship system containing complex unknown nonlinearities, the method provides a data-driven designated-performance optimal control scheme for unmanned ships using specified performance control, so that the transient and steady-state performance of the controlled system are achieved and the tracking error is kept within the designated boundary at every moment. The specific innovation points are as follows:
1) Conventional ship motion control methods are designed by taking a common ship dynamics model as the controlled object, specifying a desired trajectory, and taking trajectory tracking as the control target. However, most unmanned ship control methods do not consider control optimization. As operational tasks grow increasingly complex, high-precision, high-performance unmanned ships capable of completing optimal control tasks are urgently needed; yet the optimal control theory of unmanned ships is not well developed, so a systematic optimal control method needs to be designed, which also helps expand reinforcement learning control research for unmanned ships.
2) Compared with other vehicles or moving bodies, the unknown navigation environment of a surface unmanned ship, external disturbances such as wind, waves and currents, and unmodeled dynamics are more complex, making research on the autonomous control problem very challenging. Traditional control methods rely entirely on accurate model dynamics and parameters; however, the navigation environment of an unmanned surface vessel often involves high sea states, and the real-time-varying system dynamics cannot be acquired accurately, so control methods based on an accurate model struggle to achieve ideal control performance. Therefore, this project proposes a data-driven control method: only measurable input and output data are used in the control design, completely avoiding the use of system dynamics. The designed control strategy can still complete high-precision tracking control tasks.
3) Traditional adaptive control design methods only improve system performance by analyzing the relationship between certain parameters or correction terms in the controller and the system performance and then modifying the related parameters; they neither quantify the degree of performance improvement nor easily satisfy requirements on control precision and convergence speed simultaneously. This project designs a high-precision, high-performance optimal control method for unmanned ships: the tracking error is kept within the specified boundary at every moment, the size of the performance boundary can be set arbitrarily, the convergence rate is not less than a preset value, the error overshoot does not exceed a preset limit, and the steady-state error remains within set upper and lower bounds. This guarantees system stability, automatically satisfies the specified performance constraints, and achieves the transient and steady-state performance of the system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a view of the position tracking of the unmanned ship according to the present invention.
FIG. 3 is a diagram of the velocity tracking of the unmanned ship of the present invention.
Fig. 4 is a cross-axis position error diagram of the unmanned ship of the present invention.
Fig. 5 is a diagram of the position error of the longitudinal axis of the unmanned ship.
Fig. 6 is a diagram of the yaw angle error of the unmanned ship of the present invention.
FIG. 7 is a graph of the surge velocity error of the unmanned ship of the present invention.
FIG. 8 is a graph of the sway velocity error of the unmanned ship of the present invention.
Fig. 9 is a diagram of the yaw velocity error of the unmanned ship of the present invention.
FIG. 10 is a diagram of unmanned ship trajectory tracking according to the present invention.
FIG. 11 is a critic neural network weight update diagram of the present invention.
Fig. 12 is a diagram of actor neural network weight update according to the present invention.
FIG. 13 is a graph of unmanned ship control rate according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, the invention provides a method for controlling unmanned ship data-driven reinforcement learning with designated performance, which comprises the following steps:
S1, establishing a mathematical model of the unmanned surface vessel; the method specifically comprises the following steps:
S11, defining a north-east coordinate system OX₀Y₀Z₀ and a body-fixed coordinate system BXYZ;
Further, as a preferred embodiment of the present invention, step S11 specifically includes:
the north-east coordinate system OX₀Y₀Z₀ is regarded as an inertial coordinate system, taking an arbitrary point O on the earth as the coordinate origin, with OX₀ pointing north, OY₀ pointing east, and OZ₀ pointing toward the center of the earth;
the body-fixed coordinate system BXYZ is regarded as a non-inertial coordinate system; for a port-starboard symmetric vessel, its center is taken as the coordinate origin B, the BX axis points toward the bow along the vessel centerline, the BY axis points toward starboard perpendicular to BX, and the BZ axis points vertically downward, perpendicular to the XY plane.
S12, modeling the unmanned surface vessel to obtain the following vessel motion control mathematical model:

η̇ = R(ψ)v

v̇ = f(η, v) + τ′

wherein η = [x, y, ψ]ᵀ is the vessel position vector in the north-east coordinate system, x and y represent the north and east positions of the unmanned surface vessel, and ψ ∈ [0, 2π] represents the yaw angle; v = [u, v, r]ᵀ represents the velocity vector of the unmanned surface vessel in the body-fixed coordinate system, where u, v and r represent the surge, sway and yaw velocities, respectively; f(η, v) is the completely unknown dynamics vector; τ′ = M⁻¹τ, τ = [τ_u, τ_v, τ_r]ᵀ represents the vessel control input vector, where τ_u, τ_v and τ_r represent the surge, sway and yaw control forces, respectively; R(ψ) represents the transformation matrix between the earth-fixed and body-fixed frames; M(t) = Mᵀ(t) > 0 represents the inertia matrix including added mass;
S13, setting the desired trajectory mathematical model of the unmanned surface vessel as follows:

ẋ_d = f_d(x_d)

wherein x_d = [η_dᵀ, v_dᵀ]ᵀ, and η_d = [x_d, y_d, ψ_d]ᵀ and v_d = [u_d, v_d, r_d]ᵀ are the desired position vector and velocity vector tracked by the unmanned surface vessel, respectively.
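The desired-trajectory model of S13 only requires η_d and v_d to be dynamically consistent, i.e. η̇_d = R(ψ_d)v_d. A hypothetical circular reference satisfying this consistency is sketched below; the radius and angular rate are illustrative assumptions, not values from the patent.

```python
import numpy as np

def circle_reference(t, radius=5.0, omega=0.2):
    """Hypothetical circular reference: returns (eta_d, v_d) with
    eta_d = [x_d, y_d, psi_d] and body-fixed v_d = [u_d, v_d, r_d]
    chosen so that eta_d_dot = R(psi_d) @ v_d holds exactly."""
    eta_d = np.array([radius * np.cos(omega * t),
                      radius * np.sin(omega * t),
                      omega * t + np.pi / 2.0])   # heading tangent to the circle
    v_d = np.array([radius * omega, 0.0, omega])  # pure surge plus constant yaw rate
    return eta_d, v_d
```

Choosing ψ_d tangent to the path keeps the sway reference v_d zero, which matches how an underactuated surface vessel would naturally follow a circle.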
S2, introducing a specified performance function, specifically:
S21, defining the trajectory tracking error dynamics of the unmanned ship as follows:

ẋ_e = F(x_e, x_d) + Gτ

wherein x_e = [η_eᵀ, v_eᵀ]ᵀ, η_e = η − η_d, v_e = v − v_d, F(·) is the stacked unknown error dynamics, and G = [0₃ₓ₃, M⁻¹]ᵀ;
S22, defining the specified performance such that the tracking error satisfies:

−δ_{i,min} μ_i(t) ≤ e_i(t) ≤ δ_{i,max} μ_i(t)

wherein δ_{i,min} and δ_{i,max} are constants, and μ_i(t) is the bounded, decreasing smooth function

μ_i(t) = (μ_{i,0} − μ_{i,∞}) e^{−a_i t} + μ_{i,∞}

where μ_{i,0}, a_i and μ_{i,∞} are strictly positive constants;
S23, the performance function μ_i(t) and the constants δ_{i,min}, δ_{i,max} determine the boundary of the error e_i(t); the tracking error is redefined as:

e_i(t) = μ_i(t) Φ_i(z_i(t))

wherein z_i, i = x, y, ψ, u, v, r, is the transformed error and Φ_i(z_i) is a smooth increasing function whose expression is as follows:

Φ_i(z_i) = (δ_{i,max} e^{z_i} − δ_{i,min} e^{−z_i}) / (e^{z_i} + e^{−z_i})

with inverse function

z_i = Φ_i⁻¹(e_i/μ_i) = (1/2) ln[(δ_{i,min} + e_i/μ_i) / (δ_{i,max} − e_i/μ_i)]

Differentiating the transformed error, the transformed tracking error dynamics are as follows:

ż = f̄(z, x_d) + ḡ(z)τ

wherein f̄ is a 6×1 nonlinear function that is Lipschitz continuous on a set Ω containing the origin, with f̄(0) = 0; since the reference variables are bounded, f̄ and ḡ are bounded, i.e. ‖f̄‖ ≤ b_f and ‖ḡ‖ ≤ b_g for constants b_f > 0 and b_g > 0.
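The error transformation of S23 and its inverse can be verified directly. The function below is the standard prescribed-performance transformation assumed for Φ_i (strictly increasing, with range (−δ_{i,min}, δ_{i,max})); the constants are illustrative assumptions.

```python
import numpy as np

D_MIN, D_MAX = 0.8, 1.2   # example delta_{i,min}, delta_{i,max}

def phi(z):
    """Smooth increasing transformation with range (-D_MIN, D_MAX):
    Phi(z) = (D_MAX*e^z - D_MIN*e^{-z}) / (e^z + e^{-z})."""
    ez, enz = np.exp(z), np.exp(-z)
    return (D_MAX * ez - D_MIN * enz) / (ez + enz)

def phi_inv(s):
    """Inverse transformation z = Phi^{-1}(s), defined on (-D_MIN, D_MAX):
    z = (1/2) * ln((D_MIN + s) / (D_MAX - s))."""
    return 0.5 * np.log((D_MIN + s) / (D_MAX - s))
```

Because Φ maps the whole real line into the open funnel interval, keeping the transformed error z_i bounded automatically keeps e_i(t) = μ_i(t)Φ_i(z_i) strictly inside the prescribed bounds.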
S3, designing an optimal controller of the unmanned ship; the method specifically comprises the following steps:
S31, constructing a cost function, which is as follows:

V(z) = ∫_t^∞ e^{−γ(s−t)} r(z, τ) ds

wherein the transformed tracking error dynamics under the optimal control input τ* are ż = f̄(z, x_d) + ḡ(z)τ*; the optimal cost function is then

V*(z) = min_τ ∫_t^∞ e^{−γ(s−t)} r(z, τ) ds

wherein 0 ≤ γ < 1, r(z, τ*) = zᵀQz + τ*ᵀΛτ*, and Q ∈ R⁶ˣ⁶, Λ ∈ R³ˣ³ are positive definite matrices.
S32, differentiating the constructed cost function, the derivative is as follows:

V̇(z) = γV(z) − r(z, τ)

It can be obtained that:

γV*(z) = r(z, τ*) + (∇V*)ᵀ(f̄ + ḡτ*)

The Hamilton-Jacobi-Bellman equation can be written as:

H(z, τ*, ∇V*) = r(z, τ*) + (∇V*)ᵀ(f̄ + ḡτ*) − γV*(z) = 0

wherein ∇V* = ∂V*/∂z;
S33, solving ∂H/∂τ* = 0, the optimal control law is obtained as follows:

τ* = −(1/2) Λ⁻¹ ḡᵀ ∇V*
S4, designing the weight update laws of the critic and the actor; the method specifically comprises the following steps:
S41, in order to obtain the ideal optimal control solution, the HJB equation must be solved; however, the HJB equation is highly nonlinear and very difficult to solve, so a neural network approximation method based on the critic-actor structure is adopted to solve for the optimal controller. The optimal cost function is expressed as follows:

V*(z) = W_cᵀ σ(z) + ε_c

wherein W_c ∈ Rᴺ is the ideal weight vector of the critic neural network, N is the number of neurons, σ(z) represents the basis function vector of the neural network input, and ε_c is the bounded neural network approximation error;
S42, considering the Bellman error equation as follows:

e_c = r(z, τ) + Ŵ_cᵀ ∇σ ż − γ Ŵ_cᵀ σ(z)

wherein Ŵ_c denotes the estimate of the ideal critic weight W_c;
S43, designing the approximation of the cost function, as shown in the following formula:

V̂(z) = Ŵ_cᵀ σ(z)

S44, considering the objective function E_c = (1/2) e_cᵀ e_c, the critic update law is obtained by the gradient descent method:

dŴ_c/dt = −Γ_c e_c (∂e_c/∂Ŵ_c)

wherein Γ_c is a positive definite matrix;
S45, because the gradient of the cost function is unknown, the ideal optimal control strategy cannot be obtained directly; the actual optimal control strategy is therefore obtained by approximating the unknown ideal weights. The estimates of the actor and the critic are updated simultaneously through the actor and critic neural networks. Adopting reinforcement learning optimal tracking control, the optimal control strategy is:

τ̂ = −(1/2) Λ⁻¹ ḡᵀ (∇σ)ᵀ Ŵ_a

wherein Ŵ_a is an estimate of the ideal weight W_c; the actor adaptation law is as follows:

dŴ_a/dt = −Γ_a k_a (Ŵ_a − Ŵ_c)

wherein Γ_a is a positive definite matrix and k_a is a design parameter driving the actor estimate toward the critic estimate.
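The simultaneous critic/actor update idea can be sketched on a scalar toy plant with a single quadratic basis function σ(z) = z². Everything here, the plant, the gains, the normalized critic gradient step, and the simple actor law that tracks the critic, is an illustrative assumption, not the patent's USV controller.

```python
import numpy as np

# Scalar toy plant z_dot = a*z + g*tau, cost r = q*z^2 + lam*tau^2, discount gamma.
a, g, q, lam, gamma = -1.0, 1.0, 1.0, 1.0, 0.1
dt, T = 0.01, 10.0
lr_c, k_a = 0.5, 5.0          # critic gain (Gamma_c) and actor gain k_a (assumed)

z, Wc, Wa = 1.0, 0.0, 0.0     # state, critic weight, actor weight
for _ in range(int(T / dt)):
    tau = -(g / lam) * Wa * z            # tau = -(1/2)*lam^{-1}*g*dV/dz with V ~= Wa*z^2
    zdot = a * z + g * tau
    # Bellman residual: r + Wc*(d sigma/dz)*zdot - gamma*Wc*sigma, sigma(z) = z^2
    e_c = q * z**2 + lam * tau**2 + Wc * (2.0 * z * zdot) - gamma * Wc * z**2
    grad = 2.0 * z * zdot - gamma * z**2             # d e_c / d Wc
    Wc -= lr_c * e_c * grad / (1.0 + grad**2) * dt   # normalized critic gradient step
    Wa -= k_a * (Wa - Wc) * dt                       # actor tracks the critic estimate
    z += zdot * dt
```

The critic weight climbs toward the Bellman-consistent value while the actor weight follows it, and the state is regulated to the origin; with richer excitation the pair would converge to the true quadratic value coefficient.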
as shown in fig. 11 and 12, by updating the neural network and the control rate map of fig. 13, the actuator and the evaluator can be updated simultaneously, and the error can be within a specified range, so as to obtain an optimal control strategy. Fig. 1 is a position tracking diagram of an unmanned ship, fig. 2 is a speed tracking diagram of the unmanned ship, it can be seen from the diagram that the tracking effect of the method is good, fig. 3 to fig. 9 are error diagrams, it can be seen that the errors of the ship are all within a specified range, fig. 10 is a track diagram of a reference track tracked by the unmanned ship, it can be seen from the diagram that the reference track is quickly tracked by the unmanned ship, the tracking effect is good, therefore, it can be seen that the convergence speed of the control system is increased by the method, and the adaptability and reliability of the unmanned ship system in unknown environment operation are obviously improved.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (2)

1. The unmanned ship data-driven reinforcement learning control method with the designated performance is characterized by comprising the following steps of:
s1, establishing a mathematical model of the unmanned surface vessel;
the step S1 specifically includes:
S11, defining a north-east coordinate system OX₀Y₀Z₀ and a body-fixed coordinate system BXYZ;
S12, modeling the unmanned surface vessel to obtain the following vessel motion control mathematical model:

η̇ = R(ψ)v

v̇ = f(η, v) + τ′

wherein η = [x, y, ψ]ᵀ is the vessel position vector in the north-east coordinate system, x and y represent the north and east positions of the unmanned surface vessel, and ψ ∈ [0, 2π] represents the yaw angle; v = [u, v, r]ᵀ represents the velocity vector of the unmanned surface vessel in the body-fixed coordinate system, where u, v and r represent the surge, sway and yaw velocities, respectively; f(η, v) is the completely unknown dynamics vector; τ′ = M⁻¹τ, τ = [τ_u, τ_v, τ_r]ᵀ represents the vessel control input vector, where τ_u, τ_v and τ_r represent the surge, sway and yaw control forces, respectively; R(ψ) represents the transformation matrix between the earth-fixed and body-fixed frames; M(t) = Mᵀ(t) > 0 represents the inertia matrix including added mass;
S13, setting the desired trajectory mathematical model of the unmanned surface vessel as follows:

ẋ_d = f_d(x_d)

wherein x_d = [η_dᵀ, v_dᵀ]ᵀ, and η_d = [x_d, y_d, ψ_d]ᵀ and v_d = [u_d, v_d, r_d]ᵀ are respectively the desired position vector and velocity vector tracked by the unmanned surface vessel;
s2, introducing a specified performance function;
the step S2 specifically includes:
s21, defining the track tracking error dynamics of the unmanned ship as follows:
Figure FDA0003744104780000014
wherein, the first and the second end of the pipe are connected with each other,
Figure FDA0003744104780000015
η e =η-η d ,v e =v-v d
Figure FDA0003744104780000016
G=[0 3X3 ,M -1 ] T
s22, defining the specified performance and enabling the tracking error to satisfy the following formula:
i,min μ i (t)≤e i (t)≤δ i,max μ i (t)
wherein, delta i,min ,δ i,max Is a constant, mu i (t) is a bounded, decreasing smooth function of
Figure FDA00037441047800000218
μ i,0 Represents a strictly positive constant, a i Represents a strictly positive constant, μ i,∞ Represents a strictly positive constant;
S23, the bounded decreasing smooth function μ_i(t) and the constants δ_{i,min}, δ_{i,max} determine the bounds of the error e_i(t), and the tracking error is redefined as:

e_i(t) = μ_i(t) Φ_i(z_i(t))

wherein z_i, i = x, y, ψ, u, v, r, are the transformed errors, and Φ_i(z_i) is a smooth increasing function with the following expression:

Φ_i(z_i) = (δ_{i,max} e^{z_i} − δ_{i,min} e^{−z_i}) / (e^{z_i} + e^{−z_i})

having the inverse function

Φ_i^{−1}(e_i/μ_i) = (1/2) ln[(e_i/μ_i + δ_{i,min}) / (δ_{i,max} − e_i/μ_i)]

so that the transformed error is

z_i(t) = Φ_i^{−1}(e_i(t)/μ_i(t))

wherein μ_i denotes the bounded decreasing smooth function and Φ_i denotes the smooth increasing function;
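The error transformation of step S23 and its inverse can be checked numerically; the sketch below (illustrative names and constants) verifies that the smooth increasing map and its inverse are consistent on the open interval between the lower and upper bound constants:

```python
import math

def phi(z, d_min, d_max):
    """Smooth increasing map Phi(z) = (d_max e^z - d_min e^{-z}) / (e^z + e^{-z})."""
    ez, emz = math.exp(z), math.exp(-z)
    return (d_max * ez - d_min * emz) / (ez + emz)

def phi_inv(s, d_min, d_max):
    """Inverse map: z = 0.5 ln((s + d_min) / (d_max - s)), valid for -d_min < s < d_max."""
    return 0.5 * math.log((s + d_min) / (d_max - s))

# Round trip: a constrained normalized error e/mu in (-d_min, d_max) maps to an
# unconstrained transformed error z and back.
s = 0.3
z = phi_inv(s, d_min=0.8, d_max=1.2)
print(abs(phi(z, 0.8, 1.2) - s) < 1e-12)  # True
```

The point of the transformation: keeping z_i bounded automatically keeps e_i inside the prescribed envelope, which is why the controller is designed on z rather than on e.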
the transformed tracking error dynamics are as follows:

ż = f(z) + g(z)τ

wherein f(z) and g(z) denote the nonlinear dynamics obtained from the error transformation, depending on μ_i, Φ_i^{−1} and the original tracking error dynamics; on a set Ω containing the origin, f(z) and g(z) are Lipschitz continuous and f(0) = 0; since the reference variables are bounded, f(z) and g(z) are bounded, i.e.

‖f(z)‖ ≤ b_f ‖z‖, ‖g(z)‖ ≤ b_g

wherein f(z) denotes a 6×1-dimensional nonlinear function, and b_f and b_g each denote a constant greater than zero;
S3, designing an optimal controller of the unmanned ship;
the step S3 specifically includes:
S31, a cost function is constructed as follows:

V(z) = ∫_t^∞ e^{−γ(s−t)} r(z(s), τ(s)) ds

wherein the transformed tracking error dynamics with the optimal control input are:

ż = f(z) + g(z)τ*

wherein τ* denotes the optimal control input; the optimal cost function is then

V*(z) = min_τ ∫_t^∞ e^{−γ(s−t)} r(z(s), τ(s)) ds

wherein 0 ≤ γ < 1 is the discount factor, r(z, τ*) = z^T Q z + τ*^T Λ τ*, and Q ∈ R^{6×6} and Λ ∈ R^{3×3} are positive definite matrices;
S32, differentiating the cost function gives:

V̇(z) = γ V(z) − r(z, τ)

from which it can be obtained that:

(∇V)^T (f(z) + g(z)τ) = γ V(z) − r(z, τ)

and the Hamilton-Jacobi-Bellman equation can be written as:

H(z, τ*, ∇V*) = r(z, τ*) + (∇V*)^T (f(z) + g(z)τ*) − γ V*(z) = 0

wherein ∇V* = ∂V*/∂z denotes the gradient of the optimal cost function, and g(z) denotes the control gain;

S33, solving ∂H/∂τ* = 0, the optimal control law is obtained as follows:

τ* = −(1/2) Λ^{−1} g^T(z) ∇V*(z)
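The optimal control law of step S33 reduces to a matrix computation once the cost gradient is available. A minimal sketch with assumed illustrative values (the inertia entries and gradient vector are made up for demonstration):

```python
import numpy as np

def optimal_control(Lambda, g, grad_V):
    """tau* = -1/2 Lambda^{-1} g(z)^T grad V*(z)."""
    return -0.5 * np.linalg.solve(Lambda, g.T @ grad_V)

# Illustrative 6-state / 3-input case with g = [0_{3x3}; M^{-1}], matching the
# structure of the transformed error dynamics.
M_inv = np.diag([0.04, 0.03, 0.36])                 # assumed inverse inertia
g = np.vstack([np.zeros((3, 3)), M_inv])            # 6x3 input matrix
Lambda = np.eye(3)                                  # control weight matrix
grad_V = np.array([0.0, 0.0, 0.0, 2.0, -1.0, 0.5])  # assumed cost gradient
tau = optimal_control(Lambda, g, grad_V)
```

Because g has zeros in its position rows, only the velocity-error part of the gradient enters the control, which matches the intuition that force acts on velocities directly.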
S4, designing the weight update laws of the critic and the actor;
the step S4 specifically includes:
S41, based on the critic-actor structure, a neural network approximation method is adopted to solve for the optimal controller; the optimal cost function is expressed as follows:

V*(z) = W_c^T φ_c(z) + ε_c

wherein W_c ∈ R^N is the ideal weight vector of the critic neural network, N is the number of neurons, φ_c(z) = [φ_c1, φ_c2, …, φ_cN]^T denotes the basis function vector evaluated at the network input, and ε_c is the bounded neural network approximation error;
S42, the Bellman error equation is considered as follows:

e_c = ∫_{t−T}^{t} e^{−γ(s−t+T)} r(z(s), τ(s)) ds + Ŵ_c^T Δφ_c(z(t))

wherein Δφ_c(z(t)) = e^{−γT} φ_c(z(t)) − φ_c(z(t−T)); γ denotes the discount factor, T denotes the sampling interval, and t denotes time;
S43, the approximation of the cost function is designed as shown in the following formula:

V̂(z) = Ŵ_c^T φ_c(z)

wherein Ŵ_c^T denotes the transpose of the estimated weight vector of the cost function;
S44, considering the objective function

E_c = (1/2) e_c^T e_c

the gradient descent method gives the critic weight update law

dŴ_c/dt = −Γ_c (∂E_c/∂Ŵ_c) = −Γ_c Δφ_c(z(t)) e_c

wherein Γ_c is a positive definite matrix and T denotes the transpose;
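The critic update of steps S42–S44 can be illustrated as gradient descent on the squared Bellman error. The toy example below uses a one-dimensional quadratic basis and synthetic transitions constructed so the true weight is known; it is an illustrative sketch (Euler-discretized, arbitrary constants), not the patent's exact law:

```python
import numpy as np

def critic_update(W_c, phi_now, phi_prev, reward_int, gamma_T, lr):
    """One gradient-descent step on E_c = 0.5 * e_c^2, where
    e_c = reward_int + W_c^T (e^{-gamma*T} phi(now) - phi(prev))."""
    d_phi = np.exp(-gamma_T) * phi_now - phi_prev
    e_c = reward_int + W_c @ d_phi
    return W_c - lr * e_c * d_phi, e_c

# Toy problem: fit V(z) = w * z^2 so the Bellman error vanishes on synthetic
# transitions generated from a known target weight w* = 2.
rng = np.random.default_rng(0)
W = np.zeros(1)
for _ in range(2000):
    z0, z1 = rng.uniform(0.5, 1.5, size=2)
    # reward integral chosen so that e_c = 0 exactly at w* = 2
    r_int = 2.0 * z0**2 - np.exp(-0.1) * 2.0 * z1**2
    W, _ = critic_update(W, np.array([z1**2]), np.array([z0**2]), r_int, 0.1, 0.05)
```

With persistently varying transitions the error contracts at every step, so the estimated weight converges to the target value 2.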
S45, reinforcement learning optimal tracking control is adopted, and the optimal control policy is as follows:

τ̂(z) = −(1/2) Λ^{−1} g^T(z) (∇φ_c(z))^T Ŵ_a

wherein Ŵ_a is the estimate of the ideal weight W_c; the actor adaptation law is as follows:

dŴ_a/dt = −Γ_a k_a (Ŵ_a − Ŵ_c)

wherein Γ_a is a positive definite matrix and k_a is a design parameter.
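The actor adaptation of step S45 drives the actor weights toward the critic weights. Below is a minimal Euler-discretized sketch of the simplified law dŴ_a/dt = −Γ_a k_a (Ŵ_a − Ŵ_c); any additional stabilizing terms in the patent's full law are omitted, and all numbers are illustrative:

```python
import numpy as np

def actor_step(W_a, W_c, Gamma_a, k_a, dt):
    """One Euler step of W_a_dot = -Gamma_a * k_a * (W_a - W_c)."""
    return W_a - dt * Gamma_a @ (k_a * (W_a - W_c))

W_c = np.array([1.0, -0.5, 2.0])   # critic weights, assumed already converged
W_a = np.zeros(3)                  # actor weights start from zero
Gamma_a = np.eye(3)
for _ in range(5000):
    W_a = actor_step(W_a, W_c, Gamma_a, k_a=2.0, dt=0.01)
# W_a converges exponentially toward W_c.
```

The design choice here is the standard actor-critic split: the critic identifies the cost, and the actor tracks the critic so that the implemented control stays close to the current best estimate of the optimal policy.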
2. the unmanned ship data-driven reinforcement learning control method with specified performance according to claim 1, wherein the step S11 is specifically as follows:
the north-east coordinate system (OX₀Y₀Z₀) is regarded as an inertial coordinate system: an arbitrary point O on the earth is taken as the coordinate origin, OX₀ points north, OY₀ points east, and OZ₀ points toward the center of the earth;
the body-fixed coordinate system BXYZ is regarded as a non-inertial coordinate system: for a port-starboard symmetric vessel, the vessel's center is taken as the coordinate origin B, the BX axis points toward the bow along the vessel centerline, the BY axis points perpendicularly toward starboard, and the BZ axis points vertically downward, perpendicular to the XY plane.