CN111290270A - Underwater robot backstepping speed and heading control method based on Q-learning parameter adaptive technology - Google Patents

Underwater robot backstepping speed and heading control method based on Q-learning parameter adaptive technology

Info

Publication number
CN111290270A
CN111290270A (application number CN202010087509.XA)
Authority
CN
China
Prior art keywords
speed
controller
heading
learning
underwater robot
Prior art date
Legal status
Granted
Application number
CN202010087509.XA
Other languages
Chinese (zh)
Other versions
CN111290270B (en)
Inventor
王卓
张佩
孙延超
秦洪德
朱仲本
张宇昂
曹禹
景锐洁
Current Assignee
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN202010087509.XA priority Critical patent/CN111290270B/en
Publication of CN111290270A publication Critical patent/CN111290270A/en
Application granted granted Critical
Publication of CN111290270B publication Critical patent/CN111290270B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04 Adaptive control systems, electric, involving the use of models or simulators
    • G05B13/042 Adaptive control systems, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

A method for controlling the backstepping speed and heading of an underwater robot based on a Q-learning parameter-adaptive technology, belonging to the technical field of robot control. The method aims to solve the problems that existing underwater robot control methods require prior knowledge and cannot adjust the controller parameters in real time. The invention designs a parameter-adaptive backstepping speed and heading controller based on the Q-learning algorithm: the deviation and the deviation change rate are taken as the inputs of Q-learning, adjustment parameters are output, and the control parameters determined from these adjustment parameters are combined with a controller designed by the backstepping method to realize speed and heading control. The method is mainly used for controlling the speed and heading of an underwater robot.

Description

Underwater robot backstepping speed and heading control method based on Q-learning parameter adaptive technology
Technical Field
The invention relates to a speed and heading control method for an underwater robot, and belongs to the technical field of robot control.
Background
As the premise of and guarantee for an underwater robot completing its expected tasks and underwater operations, the motion control system is receiving attention from more and more researchers. As a complex nonlinear system, an underwater robot has uncertain and time-varying model characteristics and is easily disturbed by external environments such as wind, waves and currents when moving in the complex and changeable ocean, which greatly affects its motion control performance. Therefore, for a controller designed in advance, there is room for improvement in the selection of the control parameters.
At present, for the problem of automatic optimization of controller parameters, there are many algorithms that combine neural networks, fuzzy control and evolutionary algorithms with traditional control, and all have a certain adaptive capacity, but they still have significant disadvantages. For example, neural-network adaptive control needs a large number of teacher signals, which are difficult to obtain in practical applications; fuzzy adaptive control needs expert prior knowledge, which is not conducive to large-scale popularization and use; and genetic algorithms cannot learn online and therefore cannot adjust the controller parameters in real time.
Disclosure of Invention
The invention aims to solve the problems that existing underwater robot control methods require prior knowledge and cannot adjust the controller parameters in real time.
A method for controlling the backstepping speed and the heading of an underwater robot based on a Q-learning parameter adaptive technology is characterized by comprising the following steps of:
based on a kinematic model and a dynamic model of the underwater robot, a speed controller and a heading controller are established by utilizing a backstepping method;
τ_u = (speed control law designed by the backstepping method, formula (11) in the detailed description; given as a formula image in the original)
τ_r = (heading control law designed by the backstepping method, formula (21) in the detailed description; given as a formula image in the original)
wherein k_u, k_ψ1 and k_ψ2 are all positive numbers and are the control parameters to be designed; τ_u is the longitudinal thrust of the propeller; τ_r is the bow-turning moment; x, y, z are the positions along the three axes of the underwater robot coordinate system; φ, θ and ψ are respectively the roll angle, the pitch angle and the heading angle; u, v and w are respectively the longitudinal, transverse and vertical linear velocities, and p, q and r are respectively the roll, pitch and yaw angular velocities; |·| denotes the absolute value; X_u, X_{\dot u}, X_{u|u|}, Y_v, Y_{\dot v}, Y_{v|v|}, N_r, N_{\dot r}, N_{r|r|} are all dimensionless hydrodynamic coefficients; I_z is the moment of inertia of the underwater robot about the z axis of the body coordinate system, and m is the mass of the underwater robot;
and the control parameters of the backstepping method are optimized and adjusted by using Q learning:
for the speed controller, the input state vector is S_u = {s_1u, s_2u}; s_1u and s_2u are respectively the space-transformed values corresponding to e_u and \dot{e}_u; the output of Q-learning is the speed-controller parameter k'_u, whose range is the action space to be divided, from which the Q-value table of the speed controller is established;
for the heading controller, the input state vector is S_ψ = {s_1ψ, s_2ψ, s_3ψ}; s_1ψ and s_2ψ are respectively the space-transformed values corresponding to e_ψ and \dot{e}_ψ, and s_3ψ is the space-transformed value corresponding to the longitudinal velocity u; the output of Q-learning is the two heading-controller control parameters k'_ψ1 and k'_ψ2, whose ranges are the action space to be divided, from which the Q-value table of the heading controller is established;
the input of the Q-learning speed controller is the speed deviation and the speed-deviation change rate, and the output, obtained through the Q-learning algorithm, is the adjustment parameter k'_u of the speed controller; similarly, the inputs of the Q-learning heading controller are the heading-angle deviation, the heading-angle-deviation change rate and the real-time speed of the underwater robot, and the outputs, obtained through the Q-learning algorithm, are the two adjustment parameters k'_ψ1 and k'_ψ2 of the heading controller;
then k_u, k_ψ1 and k_ψ2 are determined from k'_u, k'_ψ1 and k'_ψ2:
k_u = k_u0 + k'_u
k_ψ1 = k_ψ1,0 + k'_ψ1
k_ψ2 = k_ψ2,0 + k'_ψ2
wherein k_ψ1,0 and k_ψ2,0 are the initial values of the two parameters of the heading controller;
the control parameters k_u, k_ψ1 and k_ψ2 are then substituted into the speed controller and the heading controller to realize control of the underwater robot.
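As an illustration of how the Q-learning outputs are combined with the backstepping gains at each control step, the following Python sketch is provided. It is only a hedged outline: the function and variable names are hypothetical, the greedy table look-up stands in for the full ε-greedy logic described later, and the interpretation that a single heading action adjusts one of the two parameters at a time (actions 1 to 16 for k'_ψ1, 17 to 32 for k'_ψ2) is an assumption based on the action-space layout.

```python
import numpy as np

def control_step_gains(Q_u, Q_psi, s_u, s_psi,
                       k_u0, k_psi1_0, k_psi2_0,
                       actions_u, actions_psi1, actions_psi2):
    """Look up the adjustment parameters in the Q tables and add them to the
    initial backstepping gains (k_u = k_u0 + k'_u, etc.).  The returned gains
    are then used in the backstepping control laws for tau_u and tau_r."""
    a_u = int(np.argmax(Q_u[s_u]))                 # greedy speed-controller action
    k_u = k_u0 + actions_u[a_u]                    # k_u = k_u0 + k'_u

    a_psi = int(np.argmax(Q_psi[s_psi]))           # greedy heading-controller action
    if a_psi < len(actions_psi1):                  # assumed: actions 1..16 set k'_psi1
        k_psi1 = k_psi1_0 + actions_psi1[a_psi]
        k_psi2 = k_psi2_0
    else:                                          # assumed: actions 17..32 set k'_psi2
        k_psi1 = k_psi1_0
        k_psi2 = k_psi2_0 + actions_psi2[a_psi - len(actions_psi1)]
    return k_u, k_psi1, k_psi2
```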
Advantageous effects:
when the underwater robot is controlled by using the method, a large amount of prior knowledge is not needed, and the parameters of the controller can be adjusted in real time through a control scheme, so that the method has stronger practicability.
Simulation experiments show that, without ocean current interference, the invention achieves a shorter rise time than the traditional speed and heading controller. This is mainly because the parameters of the adaptive backstepping controller are variable rather than fixed over the whole reinforcement-learning process; according to the characteristics of reinforcement learning, different optimal actions can be selected in different states, and the changes of the three parameter values of the speed and heading controllers reflect this. Similarly, under ocean current interference, the controller achieves a better control effect than the traditional controller, with smaller overshoot and better anti-interference capability.
The deviations of the speed and heading controllers of the invention decrease continuously as the number of training episodes increases, and finally stabilize at a fixed value or wander among a few small values, at which point the reinforcement-learning training has converged. The embodiment shows that, without ocean current interference, the speed controller of the invention is basically stable by the 150th episode and the heading controller by the 100th; similarly, under ocean current interference, the speed and heading controllers stabilize by the 400th and 350th episodes respectively.
Drawings
FIG. 1 is a Q learning parameter adaptive backstepping controller of an underwater robot speed;
FIG. 2 is a Q learning parameter adaptive backstepping controller of the heading of an underwater robot;
FIG. 3 is a Q-learning based adaptive back-stepping speed controller;
FIG. 4 is a Q-learning based adaptive backstepping heading controller;
FIG. 5 is a graph of adaptive backstepping speed controller longitudinal thrust based on Q learning;
FIG. 6 adaptive backstepping heading controller yaw moment based on Q learning;
FIG. 7 adaptive backstepping speed controller parameter value variation based on Q-learning;
FIG. 8 adaptive backstepping heading controller parameter value 1 change based on Q-learning;
FIG. 9 adaptive backstepping heading controller parameter value 2 changes based on Q-learning;
FIG. 10 is a graph of a Q-learning based adaptive back-stepping speed controller speed offset sum;
FIG. 11 is a diagram of a Q-learning based adaptive back-stepping heading controller heading angle deviation sum;
FIG. 12 is a Q-learning based adaptive back-stepping speed controller;
FIG. 13 is a Q-learning based adaptive backstepping heading controller;
FIG. 14 is a graph of adaptive backstepping speed controller longitudinal thrust based on Q learning;
FIG. 15 adaptive backstepping heading controller yaw moment based on Q learning;
FIG. 16 adaptive backstepping speed controller parameter value variation based on Q-learning;
FIG. 17 adaptive backstepping heading controller parameter value 1 change based on Q-learning;
FIG. 18 adaptive backstepping heading controller parameter value 2 changes based on Q-learning;
FIG. 19 is a graph of the adaptive backstepping speed controller speed offset sum based on Q learning;
FIG. 20 is a graph of an adaptive backstepping heading controller heading angle deviation sum based on Q learning.
Detailed Description
The purpose of the following embodiments is to improve the autonomy and intelligence of the motion control of the underwater robot and to ensure that the parameters of the backstepping controller can be adjusted online in real time, so that the motion control performance of the underwater robot under the interference of wind, waves and currents is improved.
Before describing the embodiments, the parameters used in the embodiments are first explained: M_RB - inertial force matrix; C_RB - Coriolis centripetal force matrix; M_A - additional mass force matrix; C_A - additional damping force matrix; D - damping force matrix; g - gravity-buoyancy (restoring) force; τ_u - propeller longitudinal thrust; τ_r - bow-turning moment; η = [x y z φ θ ψ]^T - six-degree-of-freedom position and attitude of the underwater robot in the inertial frame; v = [u v w p q r]^T - six-degree-of-freedom linear and angular velocities of the underwater robot in the body-fixed frame; S_u = {s_1u, s_2u} - input vector of the reinforcement-learning-based speed controller; k'_u - output of the Q-learning parameter-adaptive backstepping speed controller; s_1u - deviation of the actual speed from the desired value; s_2u - change rate of the deviation of the actual speed from the desired value; S_ψ = {s_1ψ, s_2ψ, s_3ψ} - input vector of the Q-learning-based heading controller; {k'_ψ1, k'_ψ2} - output vector of the Q-learning parameter-adaptive backstepping heading controller; e_ψ - deviation of the actual heading angle from the desired value; \dot{e}_ψ - change rate of the deviation of the actual heading angle from the desired value; u - real-time speed of the autonomous underwater robot; ψ_d - desired value of the heading angle; u_d - desired value of the speed; a_t - optimal output action of the Q-learning controller; r_t - reward and punishment return value of the Q-learning controller; α - learning rate; γ - discount rate; ε - greedy rate.
The first embodiment is as follows: the structure of the underwater robot speed and heading Q learning parameter adaptive backstepping method controller is respectively shown in fig. 1 and fig. 2;
the method for controlling the backstepping speed and the heading of the underwater robot based on the Q-learning parameter adaptive technology comprises the following steps:
based on a kinematic model and a dynamic model of the underwater robot, a speed controller and a heading controller are established by utilizing a backstepping method;
τ_u = (speed control law designed by the backstepping method, formula (11) below; given as a formula image in the original)
τ_r = (heading control law designed by the backstepping method, formula (21) below; given as a formula image in the original)
wherein k_u, k_ψ1 and k_ψ2 are all positive numbers and are the control parameters to be designed; τ_u is the longitudinal thrust of the propeller; τ_r is the bow-turning moment; x, y, z are the positions along the three axes of the underwater robot coordinate system; φ, θ and ψ are respectively the roll angle, the pitch angle and the heading angle; u, v and w are respectively the longitudinal, transverse and vertical linear velocities, and p, q and r are respectively the roll, pitch and yaw angular velocities; |·| denotes the absolute value; X_u, X_{\dot u}, X_{u|u|}, Y_v, Y_{\dot v}, Y_{v|v|}, N_r, N_{\dot r}, N_{r|r|} are all dimensionless hydrodynamic coefficients; I_z is the moment of inertia of the underwater robot about the z axis of the body coordinate system, and m is the mass of the underwater robot.
The control parameters of the backstepping method are then optimized and adjusted by Q-learning; using the learning characteristics of Q-learning, the optimal decision is found through continuous trial-and-error learning, which satisfies the requirement of adjusting the controller parameters in real time. The input of the Q-learning parameter-adaptive backstepping speed controller is the speed deviation and the speed-deviation change rate, and the output, obtained through the Q-learning algorithm, is the adjustment parameter k'_u of the speed controller; similarly, the inputs of the Q-learning parameter-adaptive backstepping heading controller are the heading-angle deviation, its change rate and the real-time speed of the underwater robot, and the outputs, obtained through the Q-learning algorithm, are the two adjustment parameters k'_ψ1 and k'_ψ2 of the heading controller.
the parameter adaptive backstepping speed and heading controller based on Q learning are designed based on the form of a Q value table, and meanwhile, the state space division size has great influence on the learning speed of Q learning, so that the states of the two controllers are divided into 21 equal parts:
For the speed controller, the input state vector is S_u = {s_1u, s_2u}; s_1u and s_2u are respectively the transformed values corresponding to e_u and \dot{e}_u; e_u ∈ [-2, 2] is the speed deviation and \dot{e}_u is the speed-deviation change rate. On the interval [-2, 2] the speed deviation is divided evenly every 0.2 into 21 state values, i.e. [-2, -1.8, ..., 1.8, 2]; s_1u is the value of e_u in the corresponding new universe of discourse, i.e. the transformed value. On the interval [-1, 1] the speed-deviation change rate is divided evenly every 0.1 into 21 states, i.e. [-1, -0.9, ..., 0.9, 1]; s_2u is the value of \dot{e}_u in the corresponding new universe of discourse, i.e. the transformed value. The output of reinforcement learning is the speed-controller parameter k'_u; k'_u ∈ [-1, 2] is the action space to be divided, and it is divided evenly every 0.2 into 16 action values, i.e. [-1, -0.8, ..., 1.8, 2]. In summary, a 21 × 21 × 16 Q-value table is established for the speed controller.
For a heading controller, the input state vector is
Figure BDA0002382564880000056
Are respectively as
Figure BDA0002382564880000057
A corresponding conversion value;
Figure BDA0002382564880000058
in order to be able to determine the deviation of the yaw angle,
Figure BDA0002382564880000059
is the rate of change of deviation of yaw angle; will [ - π, π]Is expressed as [ -3.14,3.14 [ -3.14 [ ]]And is approximately [ -3,3 [)]In the interval [ -3,3 [)]The deviation from the yaw angle is divided on average every 0.3 into 21 state values, namely [ -3, -2.7.., 2.7,3],
Figure BDA00023825648800000510
Is composed of
Figure BDA00023825648800000511
The value in the corresponding new theoretical domain, i.e. the conversion value; in the interval [ -1,1 [)]The rate of change of deviation of the yaw angle is divided on average every 0.1 into 21 states, namely [ -1, -0.9.., 0.9,1 [ ]],
Figure BDA00023825648800000512
Is composed of
Figure BDA00023825648800000513
The value in the corresponding new theoretical domain, i.e. the conversion value;
Figure BDA00023825648800000514
new theory for longitudinal speed u correspondenceA value in the domain; the longitudinal speed interval [ -2,2 [ ]]Dividing the real-time speed into 21 states on average every 0.2 pairs; the output of reinforcement learning is two control parameters of the heading controller
Figure BDA00023825648800000515
And
Figure BDA00023825648800000516
and
Figure BDA00023825648800000517
is the action space to be divided, and selects
Figure BDA00023825648800000518
And
Figure BDA00023825648800000519
divide it into 16 action values on average every 0.2 and 0.1, respectively; actions 1 to 16 mean
Figure BDA00023825648800000520
The motions 17 to 32 represent
Figure BDA0002382564880000061
So far, a 21 × 21 × 21 × 32Q value table is established for the heading controller.
Values falling between the defined states are rounded into the corresponding nearest defined state, as illustrated in the sketch below;
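A minimal Python sketch of this state discretization and Q-table construction follows; the helper name to_state_index and the use of clipping for out-of-range values are illustrative assumptions, while the grid bounds, step sizes and table dimensions are taken from the text.

```python
import numpy as np

def to_state_index(value, low, high, n_states=21):
    """Map a continuous value to the index of the nearest of n_states evenly
    spaced grid points on [low, high]; out-of-range values are clipped first."""
    grid = np.linspace(low, high, n_states)               # e.g. [-2, -1.8, ..., 2]
    return int(np.argmin(np.abs(grid - np.clip(value, low, high))))

# Q-value tables with the dimensions stated in the text:
Q_u   = np.zeros((21, 21, 16))        # speed controller: 21 x 21 states, 16 actions
Q_psi = np.zeros((21, 21, 21, 32))    # heading controller: 21 x 21 x 21 states, 32 actions

# example: a speed deviation of 0.37 m/s and a deviation change rate of -0.12
s_u = (to_state_index(0.37, -2.0, 2.0), to_state_index(-0.12, -1.0, 1.0))
print(Q_u[s_u].shape)                 # (16,): one Q value per candidate k'_u action
```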
Then k_u, k_ψ1 and k_ψ2 are determined from k'_u, k'_ψ1 and k'_ψ2:
k_u = k_u0 + k'_u
k_ψ1 = k_ψ1,0 + k'_ψ1
k_ψ2 = k_ψ2,0 + k'_ψ2
wherein k_ψ1,0 and k_ψ2,0 are the initial values of the two parameters of the heading controller;
the control parameters k_u, k_ψ1 and k_ψ2 are then substituted into the speed controller and the heading controller to realize control of the underwater robot.
Reward and punishment function design for the Q-learning-based parameter-adaptive backstepping controller: the reward and punishment function needs a relatively clear target for evaluating the performance of the controller. In general, the quality of a controller is judged by its stability, accuracy and rapidity; it is expected that the controller reaches the desired value quickly and accurately, which is reflected in a response curve with a faster rise and less overshoot and oscillation. Therefore, the invention defines the reward function as a function of the control error and the error change rate, as follows:
r = f(σ; λ, Λ, a)    (reward-function expression, given as a formula image in the original)
wherein λ is a coefficient controlling the magnitude of the reward and punishment function, λ > 0; for the speed controller σ = [e_u, \dot{e}_u]^T, and for the heading controller σ = [e_ψ, \dot{e}_ψ]^T; e_u is the speed deviation, \dot{e}_u is the speed-deviation change rate, e_ψ is the heading deviation, and \dot{e}_ψ is the heading-deviation change rate; Λ is a second-order diagonal matrix representing the influence factor of each component of σ on the reward and punishment function; a is a magnitude control parameter of the reward function, and from the expression it can be seen that when a decreases, the interval of the reward function becomes smaller.
Determining an objective function: the control objectives of the speed and heading of the underwater robot are such that the underwater robot reaches and maintains the desired speed and desired heading, i.e. maximizes the desired cumulative reward function, so the objective function of the markov decision process model is as follows:
max E[ Σ_t γ^t r_{t+1}(s_{t+1}) ]
wherein r_{t+1}(s_{t+1}) is the reward value corresponding to the state s_{t+1} at time t+1; γ is the discount rate; and E[·] denotes the expectation;
the parameter selection mechanism of the speed controller and the heading controller is as follows: the invention relates to a speed controller and a heading controller based on Q learning, wherein the parameter selection mode of the speed controller and the heading controller is an epsilon greedy strategy, epsilon is an epsilon (0,1), the epsilon greedy strategy that random probability epsilon is continuously attenuated along with the increase of iteration round number is adopted, when the value of epsilon is closer to 0, the training is shown to be in the last stage, and a Q learning system is more biased to utilize the learned experience; when the value of epsilon is closer to 1, the Q learning system is more inclined to explore the epsilon greedy strategy when the training is started; searching with the probability of epsilon and utilizing with the probability of 1-epsilon (if there are a plurality of actions with the same Q value, selecting one action at random) each time the action is selected; the concrete form is as follows:
π(a|s) = argmax_a Q(s, a) with probability 1-ε; a random action with probability ε
ε = ε_0 · e^{(μ-step)/ξ}
wherein Q(s, a) is the value of the Q function, and π(a|s) is the strategy of taking action a in state s; ε_0 is the initial value, μ is the attenuation factor, ξ is a control factor keeping ε within the (0, 1) interval, and step represents the number of control rounds;
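The following Python sketch illustrates the decaying ε-greedy selection just described; the function names are hypothetical and the numeric arguments in the usage line are only illustrative.

```python
import numpy as np

def epsilon(step, eps0, mu, xi):
    """Decaying exploration rate: eps = eps0 * exp((mu - step) / xi), kept inside (0, 1)."""
    return float(np.clip(eps0 * np.exp((mu - step) / xi), 1e-6, 1.0 - 1e-6))

def select_action(q_row, step, eps0, mu, xi, rng=None):
    """epsilon-greedy selection: explore with probability eps, otherwise exploit;
    ties between equal Q values are broken at random."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon(step, eps0, mu, xi):
        return int(rng.integers(len(q_row)))           # explore
    best = np.flatnonzero(q_row == q_row.max())        # all greedy actions
    return int(rng.choice(best))                       # exploit, random tie-break

# illustrative call: 16 candidate actions, training round 120
a = select_action(np.zeros(16), step=120, eps0=0.4, mu=200.0, xi=400.0)
```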
the learning updating process of the parameter self-adaptive backstepping controller based on Q learning comprises the following steps: initializing a Q value table of divided states and actions, initializing all Q values in the table to 0, setting an initial speed and an initial heading angle, and obtaining an initial state s through state conversiontThen selects action a via an epsilon greedy strategytTo the next state st+1And obtaining the real-time report given by the environmentt+1Based on these information, the Q-value table can be updated, and the specific update formula is as follows:
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]
where α is the learning rate and γ is the discount rate.
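A direct Python transcription of this update rule is sketched below; the tuple-indexing convention for the multi-dimensional Q table is an assumption.

```python
def q_update(Q, s, a, r_next, s_next, alpha=0.9, gamma=0.9):
    """One-step Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * [ r_{t+1} + gamma * max_a' Q(s',a') - Q(s,a) ].
    `s` and `s_next` are tuples of discretized state indices, `a` an action index;
    alpha and gamma default to the values used in the embodiment (0.9, 0.9)."""
    td_target = r_next + gamma * Q[s_next].max()
    Q[(*s, a)] += alpha * (td_target - Q[(*s, a)])
    return Q
```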
Basic flow of the Q-learning algorithm: a tabular algorithm is used. When learning starts, the Q-value table Q(s, a) is initialized arbitrarily; then, in state s_t, the agent determines an action a_t according to the ε-greedy policy and obtains the experience and training sample (s_t, a_t, s_{t+1}, r_{t+1}); the Q value is then updated. When the agent reaches the target state, the algorithm terminates one cycle. Finally, the algorithm starts a new iteration loop from the initial state until the end of the learning period; note that the Q table used at the start of each loop is already the Q table updated in the previous loop. The specific procedure of applying the Q-learning algorithm to the control problem is given in Table 1 (single-step Q-learning algorithm procedure; the table is provided as images in the original).
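As a rough Python rendering of that single-step flow, reusing the helpers sketched above: the `env` object, its reset/step interface and the episode-termination test are assumptions, not part of the patent.

```python
def train(env, Q, n_episodes, alpha=0.9, gamma=0.9):
    """Tabular Q-learning training loop: for each episode, start from the
    initial state, act epsilon-greedily, observe reward and next state,
    update the Q table, and continue until the episode (one control period) ends."""
    for episode in range(n_episodes):
        s = env.reset()                        # initial speed / heading -> state tuple
        done = False
        while not done:
            a = select_action(Q[s], episode,
                              eps0=0.4, mu=200.0, xi=400.0)  # see epsilon-greedy sketch
            s_next, r_next, done = env.step(a)
            Q = q_update(Q, s, a, r_next, s_next, alpha, gamma)
            s = s_next
    return Q
```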
The second embodiment is as follows:
in the method for controlling backstepping speed and heading of an underwater robot based on a Q-learning parameter adaptive technique according to the embodiment, the process of establishing a speed controller and a heading controller by using a backstepping method based on a kinematic model and a dynamic model of the underwater robot comprises the following steps:
underwater robot coordinate system: two right-hand coordinate systems, namely an inertial coordinate system and a satellite coordinate system, are adopted.
The inertial coordinate system is used to describe the position and attitude of the underwater robot; it is fixed to the earth and denoted E-ξηζ. The origin E is generally chosen as a fixed point on the sea surface; the ξ axis and the η axis are perpendicular to each other, with the ξ axis defined as positive towards geographic north and the η axis as positive towards geographic east; the ζ axis is positive towards the centre of the earth; the ξ, η and ζ axes conform to the right-hand rule.
The body-fixed coordinate system is used to describe the motion information of the underwater robot; it is fixed to the underwater robot and denoted O-xyz, where the origin O is usually chosen as the centre of gravity of the underwater robot. The x axis and the y axis are perpendicular to each other, with the x axis defined as positive towards the heading of the underwater robot and the y axis as positive towards its starboard; the z axis is positive towards the bottom of the underwater robot; the x, y, z axes also conform to the right-hand rule.
The underwater robot kinematic model: the kinematic model reflects the transformation between the pose and the (angular) velocities; in compact form it can be written as \dot{η} = J(η) v (the six scalar transformation equations are given as formula images in the original).
In the formulas, η = [x y z φ θ ψ]^T represents the position and attitude of the underwater robot, where x, y, z are the positions along the three axes of the underwater robot coordinate system and φ, θ, ψ are respectively the roll, pitch and heading angles; v = [u v w p q r]^T represents the linear and angular velocities of the underwater robot, where u, v, w are respectively the longitudinal, transverse and vertical linear velocities, and p, q, r are respectively the roll, pitch and yaw angular velocities.
The underwater robot dynamic model: the six-degree-of-freedom dynamic model of an underwater robot proposed by Fossen in the Handbook of Marine Craft Hydrodynamics and Motion Control is adopted:
M_RB \dot{v} + C_RB(v) v + M_A \dot{v} + C_A(v) v + D(v) v + g(η) = τ
wherein M_RB \dot{v} + C_RB(v) v are the inertial force and the Coriolis centripetal force, M_RB being the inertial force matrix and C_RB the Coriolis centripetal force matrix; M_A \dot{v} + C_A(v) v are the additional mass force and the additional damping force, M_A being the additional mass force matrix and C_A the additional damping force matrix; D(v) v is the damping force, D being the damping force matrix; g(η) is the gravity-buoyancy (restoring) force; and τ is the thruster force.
The six-degree-of-freedom dynamic model of the underwater robot is then simplified into horizontal-plane kinematic and dynamic models. It is assumed that the underwater robot is symmetrical fore-aft, port-starboard and top-bottom, that the centre of gravity and the centre of buoyancy are on the same vertical line, and that gravity and buoyancy are balanced; the horizontal-plane kinematic and dynamic models are then formulas (1) and (2) (given as formula images in the original),
wherein τ_u is the longitudinal thrust, generated by the main propeller; τ_r is the yaw moment, generated by a group of vertical rudders; because the underwater robot is underactuated and has no transverse propeller, there is no transverse thrust; |·| denotes the absolute value; X_u, X_{\dot u}, X_{u|u|}, Y_v, Y_{\dot v}, Y_{v|v|}, N_r, N_{\dot r}, N_{r|r|} are dimensionless hydrodynamic coefficients; I_z is the moment of inertia of the underwater robot about the z axis of the body coordinate system, and m is the mass of the underwater robot.
According to formula (2), the speed control system of the underwater robot is obtained (the formula is given as an image in the original).
The speed deviation is defined as e = u_d - u; taking the derivative of e gives \dot{e} = \dot{u}_d - \dot{u}.
At the same time, consider the following Lyapunov function:
V = (1/2) e^2    (9)
Differentiating equation (9) gives:
\dot{V} = e \dot{e} = e (\dot{u}_d - \dot{u})    (10)
In order to ensure convergence, \dot{V} must be negative definite, so according to equation (10) the longitudinal thrust τ_u is designed as formula (11) (given as a formula image in the original).
Substituting formula (11) into formula (10) gives:
\dot{V} = -k_u e^2 ≤ 0
It can be seen that as long as the design parameter k_u is a positive number, the Lyapunov stability theory is satisfied, \dot{V} is guaranteed to be negative, and the asymptotic stability of the speed controller is finally ensured.
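For reference, the generic Lyapunov argument behind the speed loop can be written compactly as follows. This is a sketch of the standard backstepping step; the patent's exact formulas are given only as images, so the closed-loop form assumed in the comment is illustrative.

```latex
% Generic sketch of the speed-loop Lyapunov argument
\begin{aligned}
  e &= u_d - u, \qquad V = \tfrac{1}{2}e^{2},\\
  \dot V &= e\,\dot e = e\,(\dot u_d - \dot u).
\end{aligned}
% If \tau_u is chosen so that the closed-loop surge dynamics give
% \dot u = \dot u_d + k_u e, then
\dot V = -k_u e^{2} \le 0 \qquad (k_u > 0),
% and the speed error converges asymptotically by Lyapunov's direct method.
```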
Similarly, according to formula (2), the heading control system of the underwater robot can be obtained (the formula is given as an image in the original).
Likewise, define the heading deviations z_1 = ψ_d - ψ and z_2 = α - r, where α is the intermediate virtual control quantity; taking the derivative of z_1 gives \dot{z}_1 = \dot{ψ}_d - \dot{ψ} = \dot{ψ}_d - r.
At the same time, consider the following Lyapunov function:
V_1 = (1/2) z_1^2    (13)
Differentiating equation (13) gives:
\dot{V}_1 = z_1 \dot{z}_1 = z_1 (\dot{ψ}_d - r)    (14)
To ensure that V_1 converges asymptotically, \dot{V}_1 must be negative definite, so the intermediate virtual control quantity is designed as α = \dot{ψ}_d + k_ψ1 z_1. Substituting α into formula (14) (with r = α - z_2) gives:
\dot{V}_1 = -k_ψ1 z_1^2 + z_1 z_2    (15)
As can be seen from equation (15), as long as the design parameter k_ψ1 is a positive number, the Lyapunov stability condition is satisfied, so that the z_1 subsystem is stabilized. For the z_2 subsystem, the Lyapunov function is defined as follows:
V_2 = V_1 + (1/2) z_2^2    (16)
Differentiating equation (16) gives:
\dot{V}_2 = \dot{V}_1 + z_2 \dot{z}_2    (17)
Substituting formula (14) together with z_1, \dot{z}_1, z_2 and \dot{z}_2 into (17) gives:
\dot{V}_2 = -k_ψ1 z_1^2 + z_1 z_2 + z_2 \dot{z}_2    (with \dot{z}_2 expanded using the heading dynamics; formula (18) in the original)
To make \dot{V}_2 negative definite, the control moment τ_r must be designed accordingly; the specific form is formula (19) (given as a formula image in the original).
Finally, substituting formula (19) into formula (18) gives:
\dot{V}_2 = -k_ψ1 z_1^2 - k_ψ2 z_2^2    (20)
As can be seen from equation (20), as long as the design parameters k_ψ1 and k_ψ2 are positive numbers, the stability of the heading controller is ensured; substituting z_1 and z_2 back, the final control moment of the heading controller is obtained as formula (21) (given as a formula image in the original).
In summary, as long as the speed and heading control laws are designed according to formulas (11) and (21), and the controller parameters k_u, k_ψ1 and k_ψ2 for speed and heading are positive numbers, the speed and heading of the autonomous underwater robot can be well controlled.
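Analogously, the heading-loop argument can be summarized in the following sketch. It shows the generic backstepping structure under the planar assumption that \dot{ψ} = r; the patent's formulas (13)-(21) appear only as images, so the specific choice of \dot{z}_2 in the comment is illustrative.

```latex
% Generic sketch of the heading-loop Lyapunov argument
\begin{aligned}
  z_1 &= \psi_d - \psi, \qquad z_2 = \alpha - r, \qquad
  \alpha = \dot\psi_d + k_{\psi 1} z_1,\\
  V_1 &= \tfrac{1}{2}z_1^{2}, \qquad
  \dot V_1 = z_1(\dot\psi_d - r) = -k_{\psi 1} z_1^{2} + z_1 z_2,\\
  V_2 &= V_1 + \tfrac{1}{2}z_2^{2}, \qquad
  \dot V_2 = -k_{\psi 1} z_1^{2} + z_1 z_2 + z_2\dot z_2 .
\end{aligned}
% If \tau_r is chosen so that \dot z_2 = -k_{\psi 2} z_2 - z_1, then
\dot V_2 = -k_{\psi 1} z_1^{2} - k_{\psi 2} z_2^{2} \le 0 ,
% so z_1 and z_2 converge for any positive k_{\psi 1}, k_{\psi 2}.
```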
Other steps and parameters are the same as in the first embodiment.
Examples
The invention is applicable to virtually any form of autonomous underwater robot, i.e. any corresponding autonomous underwater robot can be modeled. Because the content of the invention needs to be simulated in a simulation environment to verify the control effect of the speed and heading controllers, the autonomous underwater robot must first be modeled mathematically for the simulation experiments.
Setting the simulation parameters: in order to observe the effect of the Q-learning-based adaptive backstepping speed and heading controllers, corresponding simulation tests are carried out on the speed controller and the heading controller using the kinematic and dynamic models of an autonomous underwater robot. The desired value of the speed is set to u_d = 1 m/s and the desired value of the heading to ψ_d = 1 rad, and the control effect on speed and heading is observed. The return functions of the speed and heading controllers and the parameters related to the ε-greedy strategy are set as follows: λ = 10, Λ = diag([0.8, 0.2]), ε_0 = 0.4, μ = 200, a = 800, ξ = 400; the single control step is T_s = 0.5 s, the simulation time of a single control period is M = 50 s, the discount rate is γ = 0.9, the learning rate is α = 0.9, k_u0 = 3, and the initial heading-controller parameters k_ψ1,0 and k_ψ2,0 are set (their values are given as a formula image in the original). The speed and heading are initialized as u_0 = 0 m/s, v_0 = 0 m/s, ψ_0 = 0 rad, r_0 = 0 rad/s.
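Collected as a Python configuration sketch: the dictionary layout is illustrative, the initial heading-controller gains are left as placeholders because their values appear only as a formula image in the original, and the assignment of 200 and 800 to μ and a follows the reading adopted above.

```python
sim_config = {
    "u_d": 1.0,             # desired speed [m/s]
    "psi_d": 1.0,           # desired heading [rad]
    "lambda": 10.0,         # reward magnitude coefficient
    "Lambda": [0.8, 0.2],   # diagonal of the second-order weight matrix
    "eps0": 0.4, "mu": 200.0, "xi": 400.0,  # epsilon-greedy decay constants (see text)
    "a": 800.0,             # reward magnitude control parameter (assumed mapping)
    "T_s": 0.5,             # control step [s]
    "M": 50.0,              # simulation time of one control period [s]
    "gamma": 0.9, "alpha": 0.9,
    "k_u0": 3.0,            # initial speed-controller gain
    "k_psi1_0": None, "k_psi2_0": None,     # given only as a formula image in the original
    "init": {"u0": 0.0, "v0": 0.0, "psi0": 0.0, "r0": 0.0},
}
```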
Meanwhile, in order to better observe the control effect of the speed and heading controllers based on the reinforcement-learning adaptive backstepping method, two groups of simulation experiments are carried out under different conditions: the first group simulates the speed and heading controllers without ocean current interference, and the second group simulates them with ocean current interference; the parameter settings are as described above.
Simulation results and analysis:
the simulation effect of the speed and heading controller based on the Q learning adaptive backstepping method under the condition of no ocean current interference is shown in the following graphs, fig. 3 and 4 are respectively comparison graphs of the speed and heading controller based on the Q learning adaptive backstepping method and a traditional backstepping controller under the condition of no ocean current interference, fig. 5 and 6 are respectively graphs of longitudinal thrust and yaw moment based on the speed and heading controller based on the Q learning adaptive backstepping method, fig. 7, 8 and 9 are respectively comparison graphs of parameter value changes of the speed controller and the heading controller after 800 times of training and parameter value changes of the speed controller and the heading controller when the training is started, and fig. 10 and 11 are respectively graphs of speed deviation, deviation of yaw angle and change along with the increase of the training times of the speed controller and the heading controller based on the Q learning adaptive backstepping method.
The speed and heading controller simulation effect based on the Q learning adaptive backstepping method under the condition of ocean current interference is shown in the following graph, figures 12 and 13 are graphs comparing a Q-learning based adaptive back-stepping speed and heading controller with a conventional back-stepping controller in the presence of ocean current disturbances, figures 14 and 15 show respectively the longitudinal thrust and yaw moment diagrams of the adaptive backstepping speed and heading controller based on Q-learning in the presence of a sea current disturbance, FIGS. 16, 17 and 18 are graphs comparing the variation of the parameter values of the speed controller and the heading controller after 800 times of training with the variation of the parameter values of the speed controller and the heading controller at the beginning of training, fig. 19 and 20 show the speed deviation and the yaw angle deviation of the adaptive backstepping method speed controller and the heading controller based on Q learning under the condition of ocean current interference and change graphs along with the increase of training times respectively.
It can be seen from figs. 3, 4, 12 and 13 that the control effect of the Q-learning-based controller is better than that of the traditional controller both with and without ocean current. Without ocean current interference, the Q-learning-based speed and heading controllers have a shorter rise time than the traditional ones, because the parameters of the adaptive backstepping controller are variable rather than fixed over the whole reinforcement-learning process; according to the characteristics of reinforcement learning, the Q-learning-based controller can select different optimal actions in different states, and the changes of the three parameter values of the speed and heading controllers reflect this. Similarly, under ocean current interference, the Q-learning-based speed and heading controllers achieve a better control effect than the traditional controllers, with smaller overshoot and better anti-interference capability.
It can be seen from figs. 7, 8, 9, 16, 17 and 18 that the parameter values of the speed and heading controllers have changed from their initial values over the course of the reinforcement-learning training, which shows that the effect of combining Q-learning with the traditional controller is obvious. In addition, in the environment without ocean current interference, the speed-controller parameter k_u finally stabilizes at 4.2 and the heading-controller parameters k_ψ1 and k_ψ2 stabilize at 2 and 1.7 respectively; under ocean current interference, k_u finally stabilizes at 4.6 and k_ψ1 and k_ψ2 stabilize at 4.6 and 0.9 respectively. By contrast, in the two different cases the stabilized parameters of both the speed controller and the heading controller are different, which indicates that reinforcement learning does have the ability to self-learn and adapt in different environments.
The deviations of the speed and heading controllers based on reinforcement learning decrease continuously as the number of training episodes increases and finally stabilize at a fixed value or wander among a few small values, at which point the reinforcement-learning training has converged. Therefore, without ocean current interference, although 800 training episodes were performed, the reinforcement-learning-based speed controller is basically stable by the 150th episode and the heading controller by the 100th; similarly, under ocean current interference, as shown in figs. 19 and 20, the reinforcement-learning-based speed and heading controllers reach stability by the 400th and 350th episodes respectively.
Compared with the prior art:
the invention aims to improve the autonomy and intelligence of the motion control of the underwater robot, and combines a reinforcement learning method with a traditional controller, so that the parameters of the traditional controller can be adjusted on line in real time. There are many methods for adjusting parameters, but most of them are performed in a constant and structured environment, and typical methods for adjusting parameters mainly include a fuzzy algorithm, a genetic algorithm, and the like. These two schemes are briefly described below and compared to the algorithm of the present invention.
1. Fuzzy algorithm adjusting parameter
Mohammad Hedayati Khodayari et al., in "Modeling and control of autonomous underwater vehicle (AUV) in heading and depth attitude via self-adaptive fuzzy PID controller", designed an adaptive fuzzy PID controller applied to the two-channel tracking control of the heading and depth of an underwater robot and obtained good robustness, dynamics and stability. "Design of a fuzzy-like PD controller for an underwater robot" applies fuzzy PD control to the heading and depth control of an underwater robot; the fuzzy part of the controller is optimized in structure and parameters, and the scale factors of the PD part are optimized based on a minimum number of experiments in a real environment. "Application of adaptive fuzzy PID control in AUV control" adopts an adaptive fuzzy PID control method and uses fuzzy reasoning to tune the PID parameters online. However, parameter adjustment based on fuzzy algorithms needs a large amount of expert prior knowledge, which is not conducive to application and popularization; in contrast, the reinforcement-learning-based parameter-adjustment method adopted by the invention can adjust parameters online without any prior knowledge, obtaining knowledge only through continuous interaction between its sensors and the environment and thereby selecting actions autonomously.
2. Biological intelligent algorithm adjusting parameter
Biologically inspired intelligent algorithms are a series of optimization algorithms derived by imitating biological evolution, animal foraging, plant growth, natural phenomena, ecological balance and so on, and mainly include the genetic algorithm, particle swarm optimization, ant colony optimization, bacterial foraging optimization and the like. "Improved particle swarm optimization of the S-surface controller of an underwater robot" proposes an improved particle swarm optimization algorithm to optimize the parameters of an S-surface controller: dynamic compression factors are adopted to accelerate particle convergence and an annealing algorithm is introduced to improve the local search capability; simulation and pool experiments on an underwater robot show that the algorithm works well for parameter optimization of the nonlinear controller of an underwater robot. "Research on fuzzy control technology for underwater robot path tracking based on genetic algorithm optimization" uses a genetic algorithm to optimize the parameters of a path-tracking controller of an underwater robot and compares the tracking effect under time-varying continuous ocean current interference and constant discontinuous ocean current interference; simulation experiments prove the effectiveness of the controller optimized by the genetic algorithm. However, such optimization algorithms converge slowly, search for a long time, easily fall into local optima, and mainly adjust parameters in an offline environment, so the parameters need to be adjusted and corrected on site according to the real-time environment and requirements. The reinforcement-learning-based adaptive backstepping speed and heading control method proposed by the invention can adjust the parameters online in real time according to changes in the environment.
It should be noted that the detailed description is only for illustrating and explaining the technical solution of the invention, and the scope of protection of the claims is not limited thereby. All modifications and variations falling within the scope of the following claims and the description are intended to be included within the scope of the invention.

Claims (7)

1. A method for controlling the backstepping speed and the heading of an underwater robot based on a Q-learning parameter adaptive technology is characterized by comprising the following steps of:
based on a kinematic model and a dynamic model of the underwater robot, a speed controller and a heading controller are established by utilizing a backstepping method;
τ_u = (speed control law designed by the backstepping method; given as a formula image in the original)
τ_r = (heading control law designed by the backstepping method; given as a formula image in the original)
wherein k_u, k_ψ1 and k_ψ2 are all positive numbers and are the control parameters to be designed; τ_u is the longitudinal thrust of the propeller; τ_r is the bow-turning moment; x, y, z are the positions along the three axes of the underwater robot coordinate system; φ, θ and ψ are respectively the roll angle, the pitch angle and the heading angle; u, v and w are respectively the longitudinal, transverse and vertical linear velocities, and p, q and r are respectively the roll, pitch and yaw angular velocities; |·| denotes the absolute value; X_u, X_{\dot u}, X_{u|u|}, Y_v, Y_{\dot v}, Y_{v|v|}, N_r, N_{\dot r}, N_{r|r|} are all dimensionless hydrodynamic coefficients; I_z is the moment of inertia of the underwater robot about the z axis of the body coordinate system, and m is the mass of the underwater robot;
and the control parameters of the backstepping method are optimized and adjusted by using Q learning:
for the speed controller, the input state vector is S_u = {s_1u, s_2u}; s_1u and s_2u are respectively the space-transformed values corresponding to e_u and \dot{e}_u; the output of Q-learning is the speed-controller parameter k'_u, whose range is the action space to be divided, from which the Q-value table of the speed controller is established;
for the heading controller, the input state vector is S_ψ = {s_1ψ, s_2ψ, s_3ψ}; s_1ψ and s_2ψ are respectively the space-transformed values corresponding to e_ψ and \dot{e}_ψ, and s_3ψ is the space-transformed value corresponding to the longitudinal velocity u; the output of Q-learning is the two heading-controller control parameters k'_ψ1 and k'_ψ2, whose ranges are the action space to be divided, from which the Q-value table of the heading controller is established;
the input of the Q-learning speed controller is the speed deviation and the speed-deviation change rate, and the output, obtained through the Q-learning algorithm, is the adjustment parameter k'_u of the speed controller; similarly, the inputs of the Q-learning heading controller are the heading-angle deviation, the heading-angle-deviation change rate and the real-time speed of the underwater robot, and the outputs, obtained through the Q-learning algorithm, are the two adjustment parameters k'_ψ1 and k'_ψ2 of the heading controller;
then k_u, k_ψ1 and k_ψ2 are determined from k'_u, k'_ψ1 and k'_ψ2:
k_u = k_u0 + k'_u
k_ψ1 = k_ψ1,0 + k'_ψ1
k_ψ2 = k_ψ2,0 + k'_ψ2
wherein k_ψ1,0 and k_ψ2,0 are the initial values of the two parameters of the heading controller;
the control parameters k_u, k_ψ1 and k_ψ2 are then substituted into the speed controller and the heading controller to realize control of the underwater robot.
2. The method for controlling the backstepping speed and the heading of the underwater robot based on the Q-learning parameter adaptive technology as claimed in claim 1, wherein the specific process of establishing the Q value table of the speed controller and the Q value table of the heading controller comprises the following steps:
for the speed controller, the input state vector is S_u = {s_1u, s_2u}; s_1u and s_2u are respectively the transformed values corresponding to e_u and \dot{e}_u; e_u ∈ [-2, 2] is the speed deviation and \dot{e}_u is the speed-deviation change rate; on the interval [-2, 2] the speed deviation is divided evenly every 0.2 into 21 state values, i.e. [-2, -1.8, ..., 1.8, 2], and s_1u is the value of e_u in the corresponding new universe of discourse, i.e. the transformed value; on the interval [-1, 1] the speed-deviation change rate is divided evenly every 0.1 into 21 states, i.e. [-1, -0.9, ..., 0.9, 1], and s_2u is the value of \dot{e}_u in the corresponding new universe of discourse, i.e. the transformed value; the output of reinforcement learning is the speed-controller parameter k'_u; k'_u ∈ [-1, 2] is the action space to be divided, and it is divided evenly every 0.2 into 16 action values, i.e. [-1, -0.8, ..., 1.8, 2]; in summary, a 21 × 21 × 16 Q-value table is established for the speed controller;
for the heading controller, the input state vector is S_ψ = {s_1ψ, s_2ψ, s_3ψ}; s_1ψ and s_2ψ are respectively the transformed values corresponding to e_ψ and \dot{e}_ψ; e_ψ is the heading-angle deviation and \dot{e}_ψ is its change rate; the interval [-π, π] is written as [-3.14, 3.14] and approximated as [-3, 3]; on [-3, 3] the heading-angle deviation is divided evenly every 0.3 into 21 state values, i.e. [-3, -2.7, ..., 2.7, 3], and s_1ψ is the value of e_ψ in the corresponding new universe of discourse, i.e. the transformed value; on the interval [-1, 1] the heading-angle-deviation change rate is divided evenly every 0.1 into 21 states, i.e. [-1, -0.9, ..., 0.9, 1], and s_2ψ is the value of \dot{e}_ψ in the corresponding new universe of discourse, i.e. the transformed value; s_3ψ is the value of the longitudinal velocity u in the corresponding new universe of discourse, the longitudinal-velocity interval [-2, 2] being divided evenly every 0.2 into 21 states; the output of reinforcement learning is the two heading-controller control parameters k'_ψ1 and k'_ψ2, whose ranges (given as formula images in the original) form the action space to be divided; they are divided evenly every 0.2 and every 0.1 respectively into 16 action values each, actions 1 to 16 corresponding to k'_ψ1 and actions 17 to 32 corresponding to k'_ψ2; a 21 × 21 × 21 × 32 Q-value table is thus established for the heading controller.
3. The method for controlling the backstepping speed and the heading of the underwater robot based on the Q-learning parameter adaptive technology as claimed in claim 2, wherein in the Q learning process, a reward function is as follows:
r = f(σ; λ, Λ, a)    (reward-function expression, given as a formula image in the original)
wherein λ is a coefficient controlling the magnitude of the reward and punishment function, λ > 0; for the speed controller σ = [e_u, \dot{e}_u]^T, and for the heading controller σ = [e_ψ, \dot{e}_ψ]^T; e_u is the speed deviation, \dot{e}_u is the speed-deviation change rate, e_ψ is the heading deviation, and \dot{e}_ψ is the heading-deviation change rate; Λ is a second-order diagonal matrix representing the influence factor of each component of σ on the reward and punishment function; a is a magnitude control parameter of the reward function.
4. The method for controlling the backstepping speed and the heading of the underwater robot based on the Q-learning parameter adaptive technology as claimed in claim 3, wherein in the Q learning process, the objective function is as follows:
the control objectives of the speed and heading of the underwater robot are such that the underwater robot reaches and maintains the desired speed and desired heading, i.e. maximizes the desired cumulative reward function, so the objective function of the markov decision process model is as follows:
max E[ Σ_t γ^t r_{t+1}(s_{t+1}) ]
wherein r_{t+1}(s_{t+1}) is the reward value corresponding to the state s_{t+1} at time t+1; γ is the discount rate; and E[·] denotes the expectation.
5. The method for controlling the backstepping speed and the heading of the underwater robot based on the Q-learning parameter adaptive technology as claimed in claim 4, wherein in the Q learning process, the parameter selection process of the speed controller and the heading controller is as follows:
the invention relates to a speed controller and a heading controller based on Q learning, wherein the parameter selection mode of the speed controller and the heading controller is an epsilon greedy strategy, epsilon is an epsilon (0,1), the epsilon greedy strategy that random probability epsilon is continuously attenuated along with the increase of iteration round number is adopted, when the value of epsilon is closer to 0, the training is shown to be in the last stage, and a Q learning system is more biased to utilize the learned experience; when the value of epsilon is closer to 1, the Q learning system is more inclined to explore the epsilon greedy strategy when the training is started; searching with the probability of epsilon and utilizing with the probability of 1-epsilon (if there are a plurality of actions with the same Q value, selecting one action at random) each time the action is selected; the concrete form is as follows:
π(a|s) = argmax_a Q(s, a) with probability 1 − ε (exploitation); a random action a ∈ A with probability ε (exploration)
ε = ε_0 · e^{(μ − step)/ξ}
wherein Q(s, a) is the value of the Q function and π(a|s) is the policy of taking action a in state s; ε_0 is the initial value of ε, μ is the decay factor, ξ is the control factor that keeps ε within the (0, 1) interval, and step represents the current iteration round number.
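A small sketch of the decaying ε-greedy selection described above; ε_0, μ, ξ and the random seed are illustrative values, not those of the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
eps0, mu, xi = 1.0, 0.0, 200.0      # illustrative decay parameters

def epsilon(step):
    """Decay epsilon towards 0 as the iteration round number grows, kept inside (0, 1)."""
    return float(np.clip(eps0 * np.exp((mu - step) / xi), 1e-3, 1.0 - 1e-3))

def select_action(q_row, step):
    """Explore with probability epsilon, otherwise exploit (ties broken at random)."""
    if rng.random() < epsilon(step):
        return int(rng.integers(len(q_row)))
    best = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(best))

q_row = np.zeros(32)                       # Q values of the current state
print(select_action(q_row, step=10))       # early rounds: mostly exploration
```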
6. The method for controlling the backstepping speed and heading of the underwater robot based on the Q-learning parameter adaptive technology as claimed in claim 5, wherein in the Q learning process, the learning and updating process of the parameter adaptive backstepping controller based on Q learning comprises the following steps:
initializing the Q-value table over the divided states and actions, setting all Q values in the table to 0, setting an initial speed and an initial heading angle, and obtaining the initial state s_t through state conversion; an action a_t is then selected via the ε-greedy strategy, the system transitions to the next state s_{t+1}, and the real-time reward r_{t+1} given by the environment is obtained; based on this information the Q-value table can be updated, and the specific update formula is as follows:
Q(s_t, a_t) ← Q(s_t, a_t) + α[ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
where α is the learning rate and γ is the discount rate.
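A sketch of this tabular Q-value update, assuming the 21 × 21 × 21 × 32 table of claim 2 and an illustrative transition; α and γ are as defined in the claim, the state and action indices are placeholders.

```python
import numpy as np

alpha, gamma = 0.1, 0.9
q_table = np.zeros((21, 21, 21, 32))

def q_update(s, a, r, s_next):
    """One-step Q-learning update: move Q(s,a) towards r + gamma * max_a' Q(s',a')."""
    td_target = r + gamma * q_table[s_next].max()
    q_table[s + (a,)] += alpha * (td_target - q_table[s + (a,)])

s, a, r, s_next = (10, 10, 10), 5, -0.4, (10, 11, 10)   # illustrative transition
q_update(s, a, r, s_next)
print(q_table[s + (a,)])                                 # updated Q value
```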
7. The method for controlling the backstepping speed and the heading of the underwater robot based on the Q-learning parameter adaptive technology as claimed in claim 6, wherein the kinematic model and the dynamic model of the underwater robot are as follows:
the underwater robot kinematics model: the kinematics model reflects the conversion relation between the pose and the velocity:

η̇ = J(η) · v

in the formula, η = [x y z φ θ ψ]^Τ represents the position and attitude of the underwater robot, v = [u v w p q r]^Τ represents the linear and angular velocities of the underwater robot, and J(η) is the Euler-angle transformation matrix from the body-fixed frame to the earth-fixed frame;
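A sketch of the pose/velocity conversion η̇ = J(η)·v using the standard Euler-angle transformation; this reproduces the usual form of the 6-DOF kinematics rather than the exact matrices behind the claim's formula images, and the numbers are illustrative.

```python
import numpy as np

def J(phi, theta, psi):
    """6x6 transformation from body-frame velocities v to earth-frame rates eta_dot."""
    c, s, t = np.cos, np.sin, np.tan
    R = np.array([  # linear-velocity rotation matrix
        [c(psi)*c(theta), c(psi)*s(theta)*s(phi) - s(psi)*c(phi), c(psi)*s(theta)*c(phi) + s(psi)*s(phi)],
        [s(psi)*c(theta), s(psi)*s(theta)*s(phi) + c(psi)*c(phi), s(psi)*s(theta)*c(phi) - c(psi)*s(phi)],
        [-s(theta),       c(theta)*s(phi),                        c(theta)*c(phi)],
    ])
    T = np.array([  # angular-velocity transformation (singular at theta = +-90 deg)
        [1.0, s(phi)*t(theta),  c(phi)*t(theta)],
        [0.0, c(phi),          -s(phi)],
        [0.0, s(phi)/c(theta),  c(phi)/c(theta)],
    ])
    out = np.zeros((6, 6))
    out[:3, :3], out[3:, 3:] = R, T
    return out

v = np.array([1.5, 0.0, 0.0, 0.0, 0.0, 0.1])   # [u v w p q r], illustrative
eta_dot = J(0.0, 0.0, 0.3) @ v                  # pose rates [x_dot ... psi_dot]
print(eta_dot)
```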
the underwater robot dynamics model: a six-degree-of-freedom dynamic model of the underwater robot is adopted:

M_RB · v̇ + C_RB(v) · v + M_A · v̇ + C_A(v) · v + D(v) · v + g(η) = τ

wherein M_RB · v̇ + C_RB(v) · v is the inertial force and the Coriolis centripetal force, M_RB is the rigid-body inertia matrix and C_RB is the Coriolis centripetal force matrix; M_A · v̇ + C_A(v) · v is the added mass force, M_A is the added mass matrix and C_A is the added-mass Coriolis centripetal matrix; D(v) · v is the damping force and D is the damping force matrix; g(η) is the gravity and buoyancy (restoring) force; τ is the thruster thrust;
the six-degree-of-freedom dynamic model of the underwater robot is simplified into a horizontal-plane kinematics and dynamics model: the underwater robot is assumed to be symmetric fore-and-aft, port-and-starboard and top-to-bottom, the centre of gravity and the centre of buoyancy lie on the same vertical line, and gravity and buoyancy are balanced; the horizontal-plane kinematics and dynamics model is then as follows:
ẋ = u·cos ψ − v·sin ψ,  ẏ = u·sin ψ + v·cos ψ,  ψ̇ = r

[surge, sway and yaw dynamic equations: formula image not recoverable; the longitudinal thrust τ_u enters the surge equation and the yaw moment τ_r enters the yaw equation, while the sway motion is unactuated]
wherein τ_u is the longitudinal thrust, generated by the main propeller; τ_r is the yaw moment, generated by a group of vertical rudders; the underwater robot is underactuated.
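An Euler-integration sketch of a horizontal-plane model of this kind, with τ_u driving surge and τ_r driving yaw while sway is unactuated; the mass and damping coefficients and the linear-damping form are assumptions for illustration, not the patent's identified values.

```python
import numpy as np

m11, m22, m33 = 25.0, 30.0, 3.0       # inertia terms including added mass (illustrative)
d11, d22, d33 = 8.0, 10.0, 2.0        # linear damping terms (illustrative)

def step(state, tau_u, tau_r, dt=0.05):
    """Advance [x, y, psi, u, v, r] by one Euler step of an assumed 3-DOF model."""
    x, y, psi, u, v, r = state
    u_dot = ( m22 * v * r - d11 * u + tau_u) / m11           # surge, actuated by tau_u
    v_dot = (-m11 * u * r - d22 * v) / m22                   # sway, unactuated
    r_dot = ((m11 - m22) * u * v - d33 * r + tau_r) / m33    # yaw, actuated by tau_r
    x_dot = u * np.cos(psi) - v * np.sin(psi)
    y_dot = u * np.sin(psi) + v * np.cos(psi)
    return np.array([x + dt*x_dot, y + dt*y_dot, psi + dt*r,
                     u + dt*u_dot, v + dt*v_dot, r + dt*r_dot])

state = np.zeros(6)
for _ in range(100):                   # constant thrust and rudder moment, illustrative
    state = step(state, tau_u=20.0, tau_r=1.0)
print(state)                           # final [x, y, psi, u, v, r]
```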
CN202010087509.XA 2020-02-11 2020-02-11 Underwater robot backstepping speed and heading control method based on Q-learning parameter adaptive technology Active CN111290270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010087509.XA CN111290270B (en) 2020-02-11 2020-02-11 Underwater robot backstepping speed and heading control method based on Q-learning parameter adaptive technology

Publications (2)

Publication Number Publication Date
CN111290270A true CN111290270A (en) 2020-06-16
CN111290270B CN111290270B (en) 2022-06-03

Family

ID=71021357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010087509.XA Active CN111290270B (en) 2020-02-11 2020-02-11 Underwater robot backstepping speed and heading control method based on Q-learning parameter adaptive technology

Country Status (1)

Country Link
CN (1) CN111290270B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107544256A (en) * 2017-10-17 2018-01-05 西北工业大学 Underwater robot sliding-mode control based on adaptive Backstepping
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory
CN108873687A (en) * 2018-07-11 2018-11-23 哈尔滨工程大学 A kind of Intelligent Underwater Robot behavior system knot planing method based on depth Q study
CN109739090A (en) * 2019-01-15 2019-05-10 哈尔滨工程大学 A kind of autonomous type underwater robot neural network intensified learning control method
CN110333739A (en) * 2019-08-21 2019-10-15 哈尔滨工程大学 A kind of AUV conduct programming and method of controlling operation based on intensified learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马琼雄 et al.: "基于深度强化学习的水下机器人最优轨迹控制" [Optimal trajectory control of underwater robots based on deep reinforcement learning], 《华南师范大学学报(自然科学版)》 [Journal of South China Normal University (Natural Science Edition)], vol. 50, no. 1, 25 February 2018 (2018-02-25), pages 118-123 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111966118A (en) * 2020-08-14 2020-11-20 哈尔滨工程大学 ROV thrust distribution and reinforcement learning-based motion control method
CN113033815A (en) * 2021-02-07 2021-06-25 广州杰赛科技股份有限公司 Intelligent valve cooperation control method, device, equipment and storage medium
CN112947054A (en) * 2021-02-22 2021-06-11 武汉理工大学 Ship PID control parameter setting method and system based on Q-learning and storage medium
CN112947505A (en) * 2021-03-22 2021-06-11 哈尔滨工程大学 Multi-AUV formation distributed control method based on reinforcement learning algorithm and unknown disturbance observer
CN113325857A (en) * 2021-06-08 2021-08-31 西北工业大学 Simulated bat ray underwater vehicle depth control method based on centroid and buoyancy system
CN113325857B (en) * 2021-06-08 2022-08-05 西北工业大学 Simulated bat ray underwater vehicle depth control method based on centroid and buoyancy system
CN113639755A (en) * 2021-08-20 2021-11-12 江苏科技大学苏州理工学院 Fire scene escape-rescue combined system based on deep reinforcement learning
CN117079118A (en) * 2023-10-16 2023-11-17 广州华夏汇海科技有限公司 Underwater walking detection method and system based on visual detection
CN117079118B (en) * 2023-10-16 2024-01-16 广州华夏汇海科技有限公司 Underwater walking detection method and system based on visual detection

Also Published As

Publication number Publication date
CN111290270B (en) 2022-06-03

Similar Documents

Publication Publication Date Title
CN111290270B (en) Underwater robot backstepping speed and heading control method based on Q-learning parameter adaptive technology
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
CN110806756B (en) Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN111240345B (en) Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN111240344B (en) Autonomous underwater robot model-free control method based on reinforcement learning technology
CN112462792B (en) Actor-Critic algorithm-based underwater robot motion control method
CN111176122B (en) Underwater robot parameter self-adaptive backstepping control method based on double BP neural network Q learning technology
CN108255060B (en) Ship dynamic positioning active disturbance rejection control method based on extreme learning machine
CN111966118A (en) ROV thrust distribution and reinforcement learning-based motion control method
CN114815626B (en) Prediction active disturbance rejection and stabilization reduction control method of rudder fin system
CN111580387B (en) Time-lag fractional order based ship motion adaptive sliding mode control method and system
CN112947505B (en) Multi-AUV formation distributed control method based on reinforcement learning algorithm and unknown disturbance observer
Stevšić et al. Sample efficient learning of path following and obstacle avoidance behavior for quadrotors
CN113885534A (en) Intelligent prediction control-based water surface unmanned ship path tracking method
CN111273677B (en) Autonomous underwater robot speed and heading control method based on reinforcement learning technology
CN112650233A (en) Unmanned ship trajectory tracking optimal control method based on backstepping method and self-adaptive dynamic programming under dead zone limitation
Gao et al. Online optimal control for dynamic positioning of vessels via time-based adaptive dynamic programming
CN115903820A (en) Multi-unmanned-boat pursuit and escape game control method
Deng et al. Data-driven unmanned surface vessel path following control method based on reinforcement learning
Yuan et al. Deep reinforcement learning-based controller for dynamic positioning of an unmanned surface vehicle
CN110703792B (en) Underwater robot attitude control method based on reinforcement learning
CN114578819A (en) Control method for multi-surface ship distributed formation based on artificial potential field method
Cao et al. A realtime Q-Learning method for unmanned surface vehicle target tracking
Song et al. Enhanced fireworks algorithm-auto disturbance rejection control algorithm for robot fish path tracking
Wang et al. Course tracking control for smart ships based on a deep deterministic policy gradient-based algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant