CN111930141A

CN111930141A - Three-dimensional path visual tracking method for underwater robot

Info

Publication number: CN111930141A
Application number: CN202010703073.2A
Authority: CN
Inventors: 张国成; 孙玉山; 柴璞鑫; 吴新雨; 张宸鸣; 马陈飞
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2020-07-21
Filing date: 2020-07-21
Publication date: 2020-11-13

Abstract

The invention discloses a three-dimensional path visual tracking method for an underwater robot and belongs to the technical field of three-dimensional path planning of underwater robots. A geodetic coordinate system, a carrier coordinate system and a curve coordinate system are established, and a six-degree-of-freedom model of the underwater robot is established according to these coordinate systems; a path tracking error model is established according to the six-degree-of-freedom model, and the course angle deviation and submergence angle deviation are determined; three-dimensional path tracking is performed on the six-degree-of-freedom model by a backstepping sliding mode control method; and the three-dimensional path tracking is trained by a deep reinforcement learning method to complete the visual tracking of the three-dimensional path of the underwater robot. The invention keeps the tracking error continuous and improves the stability of path tracking; an integral term is added to the line-of-sight guidance law to introduce the influence of time; and a boundary reward function is added to accelerate the convergence of path tracking, reduce overshoot and improve precision.

Description

Three-dimensional path visual tracking method for underwater robot

Technical Field

The invention relates to the technical field of three-dimensional path planning of underwater robots, in particular to a visual tracking method for a three-dimensional path of an underwater robot.

Background

The ocean is the cradle of life on earth. It covers 71 percent of the earth's surface and holds rich water, biological and mineral resources. As mineral resources on land decline and water shortages become apparent, countries around the world have recognized the importance of developing ocean resources; their development and utilization has become a necessary path to sustainable development and a new field of cooperation and competition among nations. Meanwhile, the world economy cannot do without shipping, and ocean transportation is an important channel for the circulation of bulk commodities, so protecting the safety of navigation channels and keeping ocean transportation stable and smooth is of great significance for the sustained and healthy development of the national economy. To meet the needs of the economic and military fields, underwater mobile platforms that are small, long-range, rich in functions and intelligent to a certain extent need to be developed. Driven by these demands, autonomous underwater vehicles (AUVs) have developed rapidly, are widely used in ocean development and have become a research hotspot of many research institutes. The intelligent underwater robot features long range, long endurance, small size and high flexibility, and has wide application and good development prospects in ocean resource exploration, hydrological observation, underwater operation and underwater target search.
Following a set program, the AUV can monitor hydrological information underwater for a long time, cruise along a set route, scan and model the seabed topography, search for targets autonomously, and inspect and maintain submarine pipelines and cables, which saves considerable manpower and material resources while improving operating efficiency and safety. In the military field, unmanned underwater vehicles can carry out anti-submarine, maritime blockade, early warning and communication tasks; multiple AUVs can be interconnected to form a powerful underwater cluster and, through centralized command and information sharing, perform various complex combat tasks over a wide sea area.

As for the current state of research, domestic and foreign scholars have proposed various methods for AUV path tracking guidance, time-varying nonlinearity, uncertainty of model parameters and external environmental interference, and have obtained significant results, such as the line-of-sight method, the virtual target method, time delay estimation, virtual control quantities and energy dissipation theory. However, most of these methods are complex and adapt poorly. Controllers designed with deep reinforcement learning have better adaptive capacity, but problems remain when deep reinforcement learning is applied to the three-dimensional path tracking of an under-actuated AUV. First, the deep neural network often outputs actions with only a single objective and neglects the maneuvering characteristics of the under-actuated AUV. Second, a controller using reinforcement learning may not be sensitive enough to small errors in path tracking, which limits further improvement of tracking accuracy. Based on this analysis, the design of the AUV path tracking controller needs to take into account the multi-input, multi-output, nonlinear and strongly coupled nature of the system while reducing the influence of external ocean currents as much as possible. The designed controller should have strong robustness and adaptability while ensuring tracking precision.

Disclosure of Invention

The invention provides a three-dimensional path visual tracking method for an underwater robot, which aims to keep the tracking error continuous and improve the stability of path tracking. The invention provides the following technical scheme:

a three-dimensional path visual tracking method for an underwater robot comprises the following steps:

step 1: establishing a geodetic coordinate system, a carrier coordinate system and a curve coordinate system, and establishing a six-degree-of-freedom model of the underwater robot according to the coordinate system;

step 2: according to the established six-degree-of-freedom model of the underwater robot, a path tracking error model is established, and course angle deviation and submergence angle deviation are determined;

step 3: performing three-dimensional path tracking on the established six-degree-of-freedom model of the underwater robot by adopting a backstepping sliding mode control method;

step 4: training the three-dimensional path tracking by adopting a deep reinforcement learning method to complete the visual tracking of the three-dimensional path of the underwater robot.

Preferably, the step 1 specifically comprises:

step 1.1: establishing a geodetic coordinate system, wherein the origin of the geodetic coordinate system is a point on the sea level, the positive direction of the ξ axis is the same as the main course of the underwater robot AUV, the ζ axis points to the center of the earth, and the ξ axis, the η axis and the ζ axis form a right-hand coordinate system;

establishing a carrier coordinate system, wherein the origin of the carrier coordinate system is the center of mass of the AUV (autonomous underwater vehicle), the x_B axis is fixed along the heading of the AUV, the y_B axis is fixed toward the starboard side of the AUV, and the x_B axis, y_B axis and z_B axis form a right-hand coordinate system;

establishing a curve coordinate system, wherein the origin of the curve coordinate system is a point P on the desired path, the x_SF axis is along the tangential direction of the desired path, the y_SF axis is along the normal direction, and the x_SF axis, y_SF axis and z_SF axis form a right-hand coordinate system;

step 1.2: establishing a six-degree-of-freedom model of the underwater robot according to the geodetic coordinate system, the carrier coordinate system and the curve coordinate system, wherein the six-degree-of-freedom model comprises a dynamic equation and a kinematic equation, and the dynamic equation is expressed by the following formula:

the kinematic equation is represented by:

wherein m is the mass of the underwater robot, I_y is the moment of inertia about the y-axis, I_z is the moment of inertia about the z-axis, u, v, w are the longitudinal, transverse and vertical velocities, respectively, q, r are the pitch and yaw angular velocities, θ, ψ are the pitch and heading angles, X_(·), Y_(·), Z_(·), M_(·), N_(·) are all hydrodynamic coefficients, z_g, z_b are the positions of the center of gravity and the center of buoyancy, X is the longitudinal thrust, M and N are the torques about the y-axis and z-axis generated by the combined action of the propeller and rudder, ψ_B is the heading angle of the underwater robot, θ_B is the submergence angle of the underwater robot, α is the attack angle, β is the drift angle, and v_t is the resultant velocity of the underwater robot.
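As an illustration of how the quantities defined above relate to one another, the following Python sketch uses the kinematic form commonly found in the AUV literature for a vehicle whose roll is neglected. It is an assumption made for illustration and does not reproduce the formulas of the invention; the function names kinematics, resultant_speed and attack_and_drift are hypothetical, as are the definitions used for the attack angle and the drift angle.

```python
import math


def kinematics(u, v, w, q, r, theta, psi):
    """Rates of (xi, eta, zeta, theta, psi) for an AUV with roll neglected.

    Assumed textbook form: the body-frame velocities (u, v, w) are rotated
    into the geodetic frame by the heading psi and the pitch theta.
    """
    c_psi, s_psi = math.cos(psi), math.sin(psi)
    c_th, s_th = math.cos(theta), math.sin(theta)
    xi_dot = u * c_psi * c_th - v * s_psi + w * c_psi * s_th
    eta_dot = u * s_psi * c_th + v * c_psi + w * s_psi * s_th
    zeta_dot = -u * s_th + w * c_th
    theta_dot = q
    psi_dot = r / c_th  # singular when the pitch reaches +/- 90 degrees
    return xi_dot, eta_dot, zeta_dot, theta_dot, psi_dot


def resultant_speed(u, v, w):
    """Resultant velocity v_t of the vehicle."""
    return math.sqrt(u * u + v * v + w * w)


def attack_and_drift(u, v, w):
    """Attack angle alpha and drift angle beta (assumed common definitions)."""
    alpha = math.atan2(w, u)
    beta = math.atan2(v, math.hypot(u, w))
    return alpha, beta
```

Because the rotation only changes the direction of the velocity vector, the norm of (xi_dot, eta_dot, zeta_dot) equals the resultant speed, which is a convenient sanity check when such a model is coded.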

Preferably, the step 2 specifically comprises:

step 2.1: defining a virtual underwater robot AUV on the tracking path according to the established six-degree-of-freedom model of the underwater robot, and expressing the kinematic equation of the virtual underwater robot AUV by the following formula:

wherein ψ_p and θ_p are the attitude angles of the virtual target, and V_p is the resultant velocity of the virtual robot;

step 2.2: converting the position errors of the real underwater robot AUV and the virtual underwater robot AUV in the inertial coordinate system into a curve coordinate system, and expressing the conversion process by the following formula:

differentiating the transformed position errors to obtain an error dynamic equation, and expressing the error dynamic equation by the following formula:

step 2.3: neglecting errors caused by non-linearity in a three-dimensional space, determining course angle deviation and submergence angle deviation, and expressing the course angle deviation and the submergence angle deviation by the following formula:

wherein the two quantities given by the formula are the course angle deviation and the submergence angle deviation, respectively.
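The transformation of step 2.2 can be sketched in Python as follows. The sketch assumes that the curve coordinate system is obtained from the inertial system by a rotation through the virtual target's heading ψ_p followed by its pitch θ_p, which is a common convention and not necessarily the exact formula of the invention; the function name position_error_in_path_frame is hypothetical.

```python
import math


def position_error_in_path_frame(auv_pos, virtual_pos, psi_p, theta_p):
    """Express the inertial position error in the curve (path) frame.

    auv_pos and virtual_pos are (xi, eta, zeta) tuples. The rotation order
    (heading first, then pitch) is an assumption of this sketch.
    """
    d_xi = auv_pos[0] - virtual_pos[0]
    d_eta = auv_pos[1] - virtual_pos[1]
    d_zeta = auv_pos[2] - virtual_pos[2]
    c_psi, s_psi = math.cos(psi_p), math.sin(psi_p)
    c_th, s_th = math.cos(theta_p), math.sin(theta_p)
    # Transpose of the rotation Rz(psi_p) * Ry(theta_p) applied to the error.
    x_e = c_psi * c_th * d_xi + s_psi * c_th * d_eta - s_th * d_zeta
    y_e = -s_psi * d_xi + c_psi * d_eta
    z_e = c_psi * s_th * d_xi + s_psi * s_th * d_eta + c_th * d_zeta
    return x_e, y_e, z_e
```

With a path heading of 90 degrees and a real AUV one unit further along the η axis than the virtual AUV, the error appears purely as an along-track component x_e, which matches the intuition behind the curve coordinate system.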

Preferably, the step 3 specifically comprises:

step 3.1: adopting a backstepping sliding mode control method, introducing a horizontal plane approach angle and a vertical plane approach angle based on a Lyapunov function to adjust the path tracking process of the underwater robot, and expressing the horizontal plane approach angle (y_e) by the following formula:

the vertical plane approach angle χ(z_e) is expressed by the following formula:

wherein Δ_j is the horizontal look-ahead distance and Δ_k is the vertical look-ahead distance;

step 3.2: determining the tracking errors according to the horizontal plane approach angle and the vertical plane approach angle, the tracking errors being expressed by the following formulas:

_ψ＝ψ_e-(y_e)

_θ＝θ_e-χ(z_e)

wherein _ψ is the horizontal plane tracking error and _θ is the vertical plane tracking error;

testing the path tracking effect of the backstepping sliding mode control method with a three-dimensional spiral path, the three-dimensional spiral being described by a parametric equation expressed by the following formula:

wherein S is the path parameter, the initial value of the target position is S(0) = 0, the initial position of the AUV is ξ(0) = 65, η(0) = 500, ζ(0) = 50, the initial heading angle is ψ(0) = 0, the initial pitch angle is θ(0) = 0, the initial speed is 0.1 m/s, the initial angular speeds are 0 and 1 m/s, and a steady water flow is added to test the resistance to current interference, the speed of the water flow being 0.3 m/s in the ξ direction, 0.3 m/s in the η direction and 0.15 m/s in the ζ direction, thereby completing the three-dimensional path tracking.
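The approach angles and tracking errors of step 3 can be illustrated with the arctangent form that is widely used in line-of-sight guidance. The sketch below is an assumption made for illustration: the exact approach-angle formulas of the invention are not reproduced here, the signs depend on the axis conventions (here a positive y_e calls for a negative heading correction and a positive z_e for a positive pitch correction), and all function names are hypothetical.

```python
import math


def horizontal_approach_angle(y_e, delta_j):
    """Assumed LOS form: steer against the cross-track error y_e."""
    return -math.atan(y_e / delta_j)


def vertical_approach_angle(z_e, delta_k):
    """Assumed LOS form for the vertical plane (zeta axis pointing down)."""
    return math.atan(z_e / delta_k)


def tracking_errors(psi_e, theta_e, y_e, z_e, delta_j, delta_k):
    """Angle deviation minus approach angle, in each plane."""
    err_psi = psi_e - horizontal_approach_angle(y_e, delta_j)
    err_theta = theta_e - vertical_approach_angle(z_e, delta_k)
    return err_psi, err_theta
```

A larger look-ahead distance gives a gentler approach to the path, while a smaller one turns the vehicle toward the path more aggressively; in both cases the approach angle stays strictly within ±90 degrees.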

Preferably, the step 4 specifically includes:

step 4.1: when the deviation is calculated by the LOS (line-of-sight) method, an integral term is added to eliminate the periodic error; the integral term introduces time, so that time is effectively taken into account in the control loop, and the deviation after the integral term is added is expressed by the following formula:

wherein k_ψ and k_θ are control gains;

the basic idea of the disturbance observer is to correct the estimated value by the difference between the estimated output and the actual output, and the disturbance observer is expressed by the following formula:

step 4.2: adding penalty terms to the reward function, wherein the penalty terms cover the rudder angles of the vertical rudder and the horizontal rudder and the rudder angle change rates, the penalty terms are set by a modified second-order Gaussian function, and the improved reward function is expressed by the following formula:

adding a boundary reward to the reward function, that is, when the AUV is within a set boundary range, adding 1 to the reward value of the step as an extra reward, the boundary reward function being expressed by the following formula:

step 4.3: performing parameter optimization on the Actor neural network learning rate LR_A, the Critic neural network learning rate LR_C, the reward discount coefficient γ and the parameter update discount coefficient τ;

debugging repeatedly with different parameters, the final parameters being LR_A = 0.001, LR_C = 0.003, γ = 0.95 and τ = 0.05;

after the reward function is improved and the ocean current disturbance observer is added, the new path tracking controller is trained, and the three-dimensional path visual tracking of the underwater robot is completed.
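The rudder penalty of step 4.2 can be pictured with the following Python sketch. It shows one possible shape of a penalty built from a two-variable Gaussian of the rudder angle and its rate of change; the actual modified second-order Gaussian function of the invention is not reproduced here, and the names rudder_penalty and step_reward as well as the widths sigma_angle, sigma_rate and the weight are hypothetical.

```python
import math


def rudder_penalty(angle, rate, sigma_angle=0.3, sigma_rate=0.5, weight=1.0):
    """Penalty in [-weight, 0] that grows with rudder angle and rudder rate.

    Assumed shape: zero for a centred, motionless rudder and saturating at
    -weight for large deflections or fast changes.
    """
    exponent = angle ** 2 / (2.0 * sigma_angle ** 2) + rate ** 2 / (2.0 * sigma_rate ** 2)
    return -weight * (1.0 - math.exp(-exponent))


def step_reward(tracking_reward, rudders):
    """Add the penalty of every rudder to the tracking reward of one step.

    rudders is an iterable of (angle, rate) pairs, for example one pair for
    the vertical rudder and one for the horizontal rudder.
    """
    return tracking_reward + sum(rudder_penalty(a, r) for a, r in rudders)
```

Because the penalty saturates, a single large deflection cannot dominate the tracking reward, while repeated back-and-forth rudder motion is penalised at every step.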

The invention has the following beneficial effects:

Because the integration over time is continuous, past errors accumulate gradually and influence the current error, so that tracking is adjusted dynamically and static error is suppressed. To adapt to a complex ocean current environment, further reduce the influence of water flow interference and enhance the robustness and anti-interference capability of the controller, a disturbance observer is added to the control system, and the output of the controller is actively adjusted in real time according to the observed form and characteristics of the disturbance.
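The effect of the integral term can be illustrated with a minimal Python sketch of an integral line-of-sight law. It assumes the widely used form in which the accumulated cross-track error is added inside the arctangent; this is not necessarily the formula of the invention, and the class name IntegralLOS, the sign convention and the numeric values in the usage note are hypothetical.

```python
import math


class IntegralLOS:
    """Line-of-sight approach angle with an accumulated-error term (assumed form)."""

    def __init__(self, lookahead, gain):
        self.lookahead = lookahead  # look-ahead distance
        self.gain = gain            # control gain of the integral term
        self.integral = 0.0         # running integral of the cross-track error

    def update(self, cross_track_error, dt):
        """Accumulate the error over dt and return the commanded approach angle."""
        self.integral += cross_track_error * dt
        corrected = cross_track_error + self.gain * self.integral
        return -math.atan(corrected / self.lookahead)
```

For example, with IntegralLOS(lookahead=5.0, gain=0.1) and a constant cross-track error of 1 m, each call to update(1.0, 1.0) returns a slightly larger correction than the previous one, which is how a persistent offset caused by a steady current is gradually removed.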

The invention addresses the slow convergence and large early deviation of path tracking, as well as the situation in training in which the heading stays correct while the position deviation remains large. A boundary reward is therefore added: when the AUV is outside the boundary range, no extra reward is given, which makes the neural network more sensitive to position deviation and improves the tracking effect.
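The boundary reward described above amounts to a simple rule, sketched below in Python. The rule itself (an extra reward of 1 inside the boundary and none outside) follows the description; the function names and the default boundary width of 5.0 are hypothetical values chosen for illustration.

```python
def boundary_reward(distance_to_path, boundary):
    """Extra reward of 1 when the AUV is within the boundary range, else 0."""
    return 1.0 if distance_to_path <= boundary else 0.0


def step_reward_with_boundary(base_reward, distance_to_path, boundary=5.0):
    """Reward of one step with the boundary bonus added.

    The default boundary width is an illustrative assumption.
    """
    return base_reward + boundary_reward(distance_to_path, boundary)
```

Since the bonus depends only on position, a policy that keeps the correct heading but stays far from the path no longer collects it, which is the sensitivity to position deviation that the boundary reward is meant to provide.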

The Actor neural network learning rate LR_A determines how much experience is learned in one update of the Actor network parameters: the larger LR_A is, the more experience is learned in each round of learning, and vice versa. The Critic neural network learning rate LR_C plays the same role for the Critic network parameters: the larger LR_C is, the more experience is learned in each round of learning, and vice versa. The reward discount coefficient γ reduces the influence of the returns of later states on the evaluation of the current state in the Markov decision process: the smaller γ is, the less the returns of later states count when the current state is evaluated. The parameter update discount coefficient τ determines the weight of the new parameters when the new network parameters update the old ones: the larger τ is, the larger the weight of the new parameters and the greater the change of the parameters, and vice versa. The efficiency and effect of deep reinforcement learning are closely related to these four parameters, and the optimal setting has to be obtained through theoretical analysis and practice.
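The roles of the two discount coefficients can be made concrete with a short Python sketch, using the final values 0.95 and 0.05 given for gamma and tau. The functions discounted_return and soft_update are hypothetical helpers written for illustration; they show the usual meaning of these coefficients in DDPG and operate on plain lists instead of real network weights.

```python
GAMMA = 0.95  # reward discount coefficient from the final parameter set
TAU = 0.05    # parameter update discount coefficient from the final parameter set


def discounted_return(rewards, gamma=GAMMA):
    """Sum of rewards in which each later reward is weighted by one more factor gamma."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def soft_update(target_params, online_params, tau=TAU):
    """Blend new parameters into the old ones with weight tau."""
    return [tau * new + (1.0 - tau) * old
            for old, new in zip(target_params, online_params)]
```

With tau = 0.05 the target parameters move only five percent of the way toward the newly learned parameters at each update, which keeps the training targets stable; with gamma = 0.95 a reward received two steps later still counts for about ninety percent of an immediate one.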

The invention designs the AUV approach angle by the line-of-sight method, and designs a virtual AUV and the control law of its position by the virtual guide method. An AUV three-dimensional motion error model is established, and the virtual AUV continuously guides the AUV by adjusting its own speed, so that the tracking error is continuous and the stability of path tracking is improved.

A design method for the deep reinforcement learning path tracking controller is explored, and the most appropriate deep reinforcement learning algorithm is selected. The learning and simulation environments are built in the Python language, exploration parameters are added to the DDPG algorithm to strengthen its exploration, and a training cycle flow is designed. The current state of the AUV is used as the input, the action of the AUV motion actuators is used as the output, and a deep neural network is built as the core of the controller. Finally, AUV three-dimensional path tracking based on the DDPG algorithm is realized in simulation, and stable three-dimensional path tracking is basically achieved.
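One common way to add exploration to DDPG is to perturb the action of the Actor with random noise whose magnitude shrinks as training proceeds. The Python sketch below illustrates this idea only; the description does not state which exploration scheme is used, so the Gaussian noise, the exponential decay and the names explore and noise_schedule are assumptions.

```python
import random


def noise_schedule(initial_std, decay, episode):
    """Standard deviation of the exploration noise after a number of episodes."""
    return initial_std * decay ** episode


def explore(action, noise_std, low, high, rng):
    """Add Gaussian noise to an action and clip it to the actuator limits."""
    noisy = action + rng.gauss(0.0, noise_std)
    return max(low, min(high, noisy))
```

A typical use would create rng = random.Random(0) once, compute noise_schedule(0.5, 0.99, episode) at the start of each episode and pass every Actor output through explore before it is sent to the rudders, so that early episodes explore widely and later episodes follow the learned policy closely.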

An integral term is added to the line-of-sight guidance law to introduce the influence of time; a boundary reward function is added to accelerate the convergence of path tracking, reduce overshoot and improve precision. A second-order Gaussian function of the rudder angle and the rudder angle change rate is added to the reward function of the reinforcement learning algorithm, which suppresses frequent back-and-forth changes of the rudder angle; an ocean current disturbance observer is added to the control loop, so that the ocean current is observed in real time, its disturbance on the controller is suppressed, and the periodic error is reduced.
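The principle of the disturbance observer, correcting an estimate by the difference between the estimated and the actual output, can be illustrated in one dimension. The Python sketch below is a simplified assumption and not the observer of the invention: a vehicle moves with a known speed through an unknown constant current, and the observer recovers the current from the measured position. The function name estimate_current and the gains l1 and l2 are hypothetical.

```python
def estimate_current(current=0.3, v_body=1.0, l1=2.0, l2=1.0, dt=0.01, steps=2000):
    """Estimate a constant current from position measurements (1-D sketch).

    The observer integrates its own position estimate x_hat and corrects both
    x_hat and the current estimate d_hat with the output error x - x_hat.
    """
    x = 0.0      # true position
    x_hat = 0.0  # estimated position
    d_hat = 0.0  # estimated current speed
    for _ in range(steps):
        innovation = x - x_hat  # actual output minus estimated output
        x_hat += dt * (v_body + d_hat + l1 * innovation)
        d_hat += dt * l2 * innovation
        x += dt * (v_body + current)  # true motion including the current
    return d_hat
```

With the default gains the estimation error decays with a time constant of about one second, so after the simulated 20 seconds the returned value is very close to the true current of 0.3 m/s used in the simulation.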

Drawings

FIG. 1 is a flow chart of the three-dimensional path visual tracking method for an underwater robot;

FIG. 2 is a schematic view of a coordinate system;

FIG. 3 is a low pass filtering block diagram;

FIG. 4 is a three-dimensional path tracking trajectory diagram;

FIG. 5 is a graph of position error curves;

FIG. 6 is a schematic diagram of path tracking training;

FIG. 7 is a deep learning three-dimensional path tracking trajectory diagram;

FIG. 8 is a deep learning three-dimensional path tracking position error map;

FIG. 9 is a graph of deep reinforcement learning round rewards.

Detailed Description

The present invention will be described in detail with reference to specific examples.

The first embodiment is as follows:

as shown in FIG. 1, the application provides a three-dimensional path visual tracking method for an underwater robot, which comprises the following steps:

the step 1 specifically comprises the following steps:

as shown in FIG. 2, step 1.1: establishing a geodetic coordinate system { I }, wherein the origin of the geodetic coordinate system is a point on the sea level, the positive direction of the ξ axis is the same as the main course of the AUV, the ζ axis points to the center of the earth, and the ξ axis, the η axis and the ζ axis form a right-hand coordinate system;

establishing a carrier coordinate system { B }, wherein the origin of the carrier coordinate system is the center of mass of the AUV (autonomous underwater vehicle), the x_B axis is fixed along the heading of the AUV, the y_B axis is fixed toward the starboard side of the AUV, and the x_B axis, y_B axis and z_B axis form a right-hand coordinate system;

establishing a curve coordinate system S-F, wherein the origin of the curve coordinate system is a point P on the desired path, the x_SF axis is along the tangential direction of the desired path, the y_SF axis is along the normal direction, and the x_SF axis, y_SF axis and z_SF axis form a right-hand coordinate system;

the kinematic equation is represented by:

wherein m is the mass of the underwater robot, I_y is the moment of inertia about the y-axis, I_z is the moment of inertia about the z-axis, u, v, w are the longitudinal, transverse and vertical velocities, respectively, q, r are the pitch and yaw angular velocities, θ, ψ are the pitch and yaw angles, X_(·), Y_(·), Z_(·), M_(·), N_(·) are all hydrodynamic coefficients, z_g, z_b are the positions of the center of gravity and the center of buoyancy, X is the longitudinal thrust, M and N are the torques about the y-axis and z-axis generated by the combined action of the propeller and rudder, ψ_B is the heading angle of the underwater robot, θ_B is the submergence angle of the underwater robot, α is the attack angle, β is the drift angle, and v_t is the resultant velocity of the underwater robot.

the step 2 specifically comprises the following steps:

wherein the two quantities given by the formula are the course angle deviation and the submergence angle deviation, respectively.

as shown in fig. 3, the step 3 specifically includes:

The vertical plane approach angle χ(z_e) is expressed by the following formula:

_ψ＝ψ_e-(y_e)

_θ＝θ_e-χ(z_e)

The step 4 specifically comprises the following steps:

wherein k_ψ and k_θ are control gains;

The slow convergence and large early deviation of path tracking are considered, as well as the situation in training in which the heading stays correct while the position deviation remains large.

A boundary reward is therefore added to the reward function: when the AUV is within a set boundary range, 1 is added to the reward value of the step as an extra reward, and when the AUV is outside the boundary range, the step receives no extra reward. This makes the neural network more sensitive to position deviation and improves the tracking effect. The boundary reward function is expressed by the following formula:

step 4.3: performing parameter optimization on the Actor neural network learning rate LR_A, the Critic neural network learning rate LR_C, the reward discount coefficient γ and the parameter update discount coefficient τ. The Actor neural network learning rate LR_A determines how much experience is learned in one update of the Actor network parameters: the larger LR_A is, the more experience is learned in each round of learning, and vice versa. The Critic neural network learning rate LR_C plays the same role for the Critic network parameters: the larger LR_C is, the more experience is learned in each round of learning, and vice versa. The reward discount coefficient γ reduces the influence of the returns of later states on the evaluation of the current state in the Markov decision process: the smaller γ is, the less the returns of later states count when the current state is evaluated. The parameter update discount coefficient τ determines the weight of the new parameters when the new network parameters update the old ones: the larger τ is, the larger the weight of the new parameters and the greater the change of the parameters, and vice versa. The efficiency and effect of deep reinforcement learning are closely related to these four parameters, and the optimal setting has to be obtained through theoretical analysis and practice. Different parameters are therefore debugged repeatedly, and the final parameters are selected as LR_A = 0.001, LR_C = 0.003, γ = 0.95 and τ = 0.05;

In the simulation process, a simulation result with fast convergence and good effect is finally obtained through a large amount of parameter debugging and a plurality of tests. The simulation result is shown in fig. 4, in which the dotted line represents the target path and the solid line represents the tracking path under the control of the backstepping sliding mode technique.

Fig. 5 shows the position error of AUV in three directions during tracking, and it can be seen that the tracking error curve obviously changes periodically and reciprocates around 0 due to the action of ocean current.

The initial value of the target position is S(0) = 0, the initial position of the AUV is ξ(0) = 650, η(0) = 500, ζ(0) = 50, the initial heading angle is ψ(0) = 0, and the initial pitch angle is θ(0) = 0. The initial speed is 0.1 m/s, desired forward speed u_d6/s. An interfering water flow is added to the environment; in the inertial coordinate system its speed is 0.3 m/s in the ξ direction, 0.3 m/s in the η direction and 0.15 m/s in the ζ direction. The path tracking training scenario is shown in FIG. 6.

As is clear from FIG. 7 and FIG. 8, the AUV as a whole achieves path tracking. Over 10,000 iterations, the average distance between the AUV position and the target position is 5.5 m. However, in the initial phase there is a large overshoot because the neural network controller is not sensitive enough. The maximum deviations in two of the directions reach 36 m and 30 m, and due to the water flow the AUV oscillates on both sides of the target path and a static error remains.

Because the DDPG reinforcement learning algorithm incorporates the policy gradient idea, the neural network can learn from both successful and failed experiences. As can be seen from the reward curve in FIG. 9, the reward value remains good most of the time after learning starts, but low reward values persist for a long time between steps 600 and 700. This shows that the neural network controller has learned a lot from successful experiences but not enough from failed ones, which tends to make the controller fall into a local optimum and lack exploration. A good neural network controller should learn from both success and failure, so that it can succeed and avoid failure.

The above is only a preferred embodiment of the three-dimensional path visual tracking method for an underwater robot. The protection scope of the method is not limited to the above embodiment, and all technical solutions falling under this idea belong to the protection scope of the present invention. It should be noted that modifications and variations made by those skilled in the art without departing from the gist of the invention are also within the protection scope of the invention.

Claims

1. A three-dimensional path visual tracking method for an underwater robot, characterized by comprising the following steps:

2. The underwater robot three-dimensional path visual tracking method as claimed in claim 1, wherein: the step 1 specifically comprises the following steps:

the kinematic equation is represented by:

3. The underwater robot three-dimensional path visual tracking method as claimed in claim 1, wherein: the step 2 specifically comprises the following steps:

wherein the two quantities given by the formula are the course angle deviation and the submergence angle deviation, respectively.

4. The underwater robot three-dimensional path visual tracking method as claimed in claim 1, wherein: the step 3 specifically comprises the following steps:

step 3.1: adopting a backstepping sliding mode control method, introducing a horizontal plane approach angle and a vertical plane approach angle based on a Lyapunov function to adjust the path tracking process of the underwater robot, and expressing the horizontal plane approach angle (y_e) by the following formula:

the vertical plane approach angle χ(z_e) is expressed by the following formula:

_ψ＝ψ_e-(y_e)

_θ＝θ_e-χ(z_e)

5. The underwater robot three-dimensional path visual tracking method as claimed in claim 1, wherein: the step 4 specifically comprises the following steps:

wherein k_ψ and k_θ are control gains;

because the integration over time is continuous, past errors accumulate gradually and influence the current error, so that tracking is adjusted dynamically and static error is suppressed; to adapt to a complex ocean current environment, further reduce the influence of water flow interference and enhance the robustness and anti-interference capability of the controller, a disturbance observer is added to the control system, and the output of the controller is actively adjusted in real time according to the observed form and characteristics of the disturbance;

determining a boundary reward function, the boundary reward function being represented by:

step 4.3: performing parameter optimization on the Actor neural network learning rate LR_A, the Critic neural network learning rate LR_C, the reward discount coefficient γ and the parameter update discount coefficient τ; different parameters are debugged repeatedly, and the final parameters are selected as LR_A = 0.001, LR_C = 0.003, γ = 0.95 and τ = 0.05.