CN108803313B - Path planning method based on ocean current prediction model - Google Patents


Info

Publication number
CN108803313B
CN108803313B (application CN201810589190.3A)
Authority
CN
China
Prior art keywords
current
ocean current
point
time
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810589190.3A
Other languages
Chinese (zh)
Other versions
CN108803313A (en)
Inventor
王卓
姚淑香
冯晓宁
隋炎橙
胡磊
徐沈方
张士伟
张佩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201810589190.3A
Publication of CN108803313A
Application granted
Publication of CN108803313B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/048 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators using a predictor
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention belongs to the field of underwater robot control and discloses a path planning method based on an ocean current prediction model, comprising the following steps: rasterizing the navigation area according to the path key points; predicting the ocean currents of the navigation area with a regional ocean model and fitting the results to obtain real-time ocean current information; marking no-navigation zones using electronic chart information; storing, in plane grids at different depths, the no-navigation information and the start and end positions, together with whether each grid point lies in a no-navigation area and whether the end point has been reached; calculating the direction from the current position to the end point and determining the selectable actions among all next headings; and using Q learning to seek the optimal strategy of the Markov decision process plan and to output the path. The invention fully considers the influence of real-time ocean currents on path planning, performs the fitting with a BP neural network and a bagging algorithm, and seeks the optimal solution with reinforcement learning, thereby accelerating convergence and reducing computational complexity.

Description

Path planning method based on ocean current prediction model
Technical Field
The invention belongs to the field of underwater robot control, and particularly relates to a path planning method based on an ocean current prediction model.
Background
An underwater robot, also called an unmanned remotely operated vehicle, is an extreme-operation robot that works underwater. Because the underwater environment is harsh and dangerous and human diving depth is limited, underwater robots have become an important tool for ocean development.
The underwater robot can replace human divers for long-duration underwater work in high-risk environments, polluted environments and zero-visibility waters. It is generally equipped with a sonar system, cameras, lights, a mechanical arm and other devices, can provide real-time video and sonar images, and its mechanical arm can perform grasping and hoisting operations. Underwater robots are widely applied in oil development, marine law enforcement and evidence collection, scientific research, military affairs and other fields.
Because the operating environment of the underwater robot is complex, underwater acoustic signals are noisy, and the various underwater acoustic sensors generally suffer from poor precision and frequent jumps, filtering technology is very important in the motion control system of an underwater robot. The position sensor commonly adopted in underwater robot motion control is a short-baseline or long-baseline underwater acoustic positioning system, and the speed sensor is a Doppler velocimeter. The factors that affect the accuracy of the underwater acoustic positioning system mainly include the sound speed error, the measurement error of the transponder response time, and the calibration error of the transponder position, i.e. distance. The factors that affect the accuracy of the Doppler velocimeter mainly include the sound speed c, the physical and chemical properties of the seawater medium, the pitching of the carrier, and so on.
Path planning is therefore particularly important for underwater robots; it is one of the basic links of intelligent underwater robot navigation. When an underwater robot navigates in a large-scale marine environment, the influence of that environment on its navigation must be considered in addition to obstacle avoidance and energy consumption. Time-varying ocean currents pose great challenges to the safety and mission success of the underwater robot, so during path planning the robot should use predicted ocean current elements to exploit the energy of the flow field as much as possible and plan a feasible, safe path with low energy consumption.
From the perspective of algorithm strategy, current path planning algorithms can be divided into path planning based on intelligent computation, path planning based on behavior and learning psychology, and random-sampling path planning. These algorithms mainly aim at improving solution-space search efficiency and speeding up convergence, or are proposed for unknown environments or dynamic spaces; at present, more and more scholars have begun to study path planning under the influence of ocean currents. The present invention forecasts the ocean current field of a given area at future times with a regional ocean model and combines it with the AUV position and control instructions to estimate the ocean current in real time, so that the ocean current field used for path planning is more accurate and up to date.
The patent with application number 201710538828.6 discloses a path planning device and method for an unmanned underwater vehicle (UUV) based on a detection threat domain. Using a detection-threat-domain path planning algorithm, it solves UUV path planning in terrain-obstacle environments and can satisfy the UUV's kinematic, collision-avoidance and covert-detection constraints. Given an initial position, an end position, a maximum curvature constraint, a path discrete-point resolution, a concealment safety index, and so on, it plans a path from the motion start point to the motion end point; the path is smooth, continuous and differentiable, and satisfies the UUV's turning-curvature constraint and concealment safety index, so the UUV can reach the end point safely and covertly in the shortest time. That method applies detection-threat theory and the geometric theory of turning-curvature constraints to UUV path planning for the first time; it realizes path planning quickly, is simple, reliable and easy to implement, has a small computation load and good real-time performance, meets path planning requirements, improves the practicality of path planning, and has positive significance for the future development of underwater path planning. However, when applied to path planning for an underwater vehicle, it still suffers from an overly complex calculation process and poor real-time performance.
Disclosure of Invention
The invention aims to provide a path planning method based on an ocean current prediction model that has low energy consumption and high safety.
The purpose of the invention is realized as follows:
A path planning method based on an ocean current prediction model comprises the following steps:
step (1): determining a navigation area according to the path key points, and rasterizing the navigation area;
step (2): performing ocean current prediction with time step ΔT over the navigation area using a regional ocean model, fitting with a bagging algorithm and BP neural networks according to the AUV's real-time pose changes, the control instructions and the ocean current information at the corresponding moments, and calculating the real-time ocean current information:
the control instruction corresponds to the AUV pose at the last moment, the AUV pose at the last moment and ocean current information, a bagging algorithm is used for generating T training sets, T BP neural networks are used for training T base learners based on each sampling set, the BP neural networks are three layers, the speed and the angle of the AUV at the last moment, the voltage of a rudder, a wing and a propeller and the speed and the angle of the AUV at the moment are input and output, the meridional speed and the latitudinal speed of ocean current are output, the input layer comprises 7 input neurons and 2 output neurons, the number of the neurons in the hidden layer is one of 5, 8, 10, 12 and 15, the number of the hidden layers is determined through a 10-fold cross-validation method, the final real-time ocean current element is obtained according to the error rate in proportion, and the ocean current element obtained at the moment is taken as the ocean current element at the next moment, so that the real-time ocean current information is obtained.
Step (3): using the electronic chart information, marking the areas that endanger the safe navigation of the underwater robot as no-navigation areas in the grid;
Step (4): storing the no-navigation information and the start and end position information in plane grids at different depths, recording for each grid point its longitude and latitude, whether it lies in a no-navigation area, and whether the end point has been reached;
and (5): calculating the direction from the current position to the terminal point and determining the optional action in all the next driving directions:
according to the structure diagram of the rectangular grid, assuming that a black point in the middle of the rectangular grid is the current position of the underwater robot, 16 possible actions from a1, a2 to a16 exist in the current action, the possible actions are the actions from the current position of the underwater robot to the positions of the two layers of the underwater robot, and assuming that the position where the underwater robot is located after the current action is executed is in a no-navigation area;
Let a_st be the motion from the current point position to the target point position; the motion selection formula is as follows:
[equation shown as an image in the original patent: the selection criterion A_i, defined from the candidate action a_i and the goal motion a_st]
In the above formula, i is an integer with i ∈ [1,16]. Actions with A_i > 0 are selected. If an obstacle occupies one of the 8 grid points nearest the current point, the action corresponding to that grid point and the adjacent actions are abandoned; if the obstacle lies on the outer layer of grid points around the current point, only the action corresponding to that grid point is abandoned.
Step (6): adopting a focused learning mode, using Q learning to seek the optimal strategy of the Markov decision process plan and outputting the path.
Step (6.1): initialize the value function Q(s, a) = 0 and the original strategy π(s) = argmax_a Q(s, a);
Step (6.2): initialize state S_0 as the initial position and determine the initial time t_0;
Step (6.3): calculating the real-time ocean current speed of the current position through a neural network;
Step (6.4): use the focused exploration strategy to select action a, generate reward r_{t+1} and transition to state S_{t+1};
Focused exploration strategy μ(x):
[equation shown as an image in the original patent: μ(x)]
In the above formula,
[equation shown as an image in the original patent: the selection probability p_i]
where w_1 is the weight coefficient of the distance influence and w_2 is the weight coefficient of the ocean current influence;
[equation shown as an image in the original patent]
v_c is the ocean current speed at the grid point of the current position at time t, and a_i is an optional action selected with probability p_i.
Step (6.5): according to the original strategy π, select and execute action a_{t+1} in state S_{t+1};
Step (6.6): update the state-action value function:
Q(s_t, a_t) ← Q(s_t, a_t) + β[r_{t+1} + γQ(s_{t+1}, a_{t+1}) - Q(s_t, a_t)];
In the above formula, β denotes the learning rate, with value range [0,1], and γ denotes the discount factor.
Step (6.7): update the current strategy with the greedy strategy:
π(s) ← argmax_a Q(s, a).
Step (6.8): judge whether the underwater robot has reached the target position; if not, go to step (6.3); if so, go to step (6.9).
Step (6.9): judge whether the iteration limit has been reached or all state-action value functions have converged; if neither, go to step (6.2); if either, output the optimal strategy to obtain the optimal planned path.
The invention has the beneficial effects that:
the method fully considers the influence of real-time ocean current on path planning, predicts future ocean elements through regional ocean modes, and performs fitting by using a BP neural network and a bagging algorithm to obtain real-time ocean current information. Meanwhile, planning is carried out according to a Markov decision process, and an optimal solution is sought by using reinforcement learning, so that the convergence speed is increased, the complexity of operation is reduced, and a planned path is obtained better and faster.
Drawings
FIG. 1 is a flow chart of a path planning method based on an ocean current prediction model;
FIG. 2 is a diagram of a rectangular grid structure;
FIG. 3 is a schematic view of action selection;
FIG. 4 is a flow chart of the Markov decision process planning.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
example 1:
As shown in FIG. 1, a path planning method based on an ocean current prediction model includes the following steps:
step (1): determining a navigation area according to the path key points, and rasterizing the navigation area;
A rectangular navigation area is determined from the start and end points of the underwater robot's path. An orthogonal curvilinear grid is adopted in the horizontal direction with the grid spacing set between 2 km and 30 km, and the vertical direction is divided into 20-30 layers at equal depth intervals.
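As an illustration, the following minimal Python sketch builds such a horizontal grid, the depth layers and a per-layer no-navigation mask; the function name, the rough 111 km-per-degree conversion and all default values are assumptions made for illustration, not taken from the patent.

```python
# Minimal sketch of the step (1) rasterization: an orthogonal horizontal
# grid plus 20-30 equal-depth layers.
import numpy as np

def build_grid(lat_min, lat_max, lon_min, lon_max,
               spacing_km=10.0, n_layers=25, max_depth_m=500.0):
    """Return latitude/longitude/depth axes and a per-layer no-go mask.

    spacing_km should lie in the 2-30 km range given in the patent and
    n_layers in 20-30, split at equal depth intervals.
    """
    km_per_deg = 111.0                          # rough meridional scale
    d_deg = spacing_km / km_per_deg
    lats = np.arange(lat_min, lat_max + d_deg, d_deg)
    lons = np.arange(lon_min, lon_max + d_deg, d_deg)
    depths = np.linspace(0.0, max_depth_m, n_layers)
    # One boolean plane per depth layer; True marks a no-navigation cell.
    no_go = np.zeros((n_layers, lats.size, lons.size), dtype=bool)
    return lats, lons, depths, no_go
```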
Step (2): performing ocean current prediction with time step ΔT over the navigation area using a regional ocean model, fitting with a bagging algorithm and BP neural networks according to the AUV's real-time pose changes, the control instructions and the ocean current information at the corresponding moments, and calculating the real-time ocean current information:
A sigma coordinate is adopted in the vertical direction; the vertical coordinate is scaled to [-1,0] through a vertical transformation function and a stretching function, and the number of vertical layers is set;
vertical transformation function:
z(x,y,s,t)=η(x,y,t)+[η(x,y,t)+h(x,y)]×Z0(x,y,s);
[equation shown as an image in the original patent: the definition of Z_0(x, y, s)]
In the above formula, z is the height in the Cartesian coordinate system, x is the longitudinal coordinate value, y is the latitudinal coordinate value, s is the vertical distance from the water surface, t is time, η(x, y, t) is the time-varying free sea surface, h(x, y) is the thickness of the undisturbed water body, and h_c is a conversion parameter;
stretching function:
[equation shown as an image in the original patent: the stretching function]
In the above formula, θ_s is a surface control parameter with 0 < θ_s ≤ 10.
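A minimal sketch of this sigma-coordinate mapping follows. The patent gives the vertical transformation z = η + (η + h)·Z_0 but shows Z_0 and the stretching function only as images, so the ROMS-style forms below are assumptions standing in for those images.

```python
# Sketch of the sigma-coordinate mapping in step (2).
import numpy as np

def stretching(s, theta_s=5.0):
    """Assumed surface-controlled stretching C(s), 0 < theta_s <= 10."""
    return np.sinh(theta_s * s) / np.sinh(theta_s)

def sigma_to_z(eta, h, s, hc=50.0, theta_s=5.0):
    """Map a sigma level s in [-1, 0] to Cartesian height z.

    eta : free sea surface eta(x, y, t)
    h   : thickness of the undisturbed water body h(x, y)
    hc  : conversion parameter
    """
    z0 = (hc * s + h * stretching(s, theta_s)) / (hc + h)   # assumed Z0
    return eta + (eta + h) * z0     # vertical transformation from the patent
```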
The initial conditions of the regional ocean model are produced by four-dimensional assimilation, and the boundary conditions are obtained from the forecast field of a global model. A central difference scheme is adopted in space and a leapfrog scheme in time, the time step is set to 5 min, and the ocean current field of the navigation area is forecast and stored to a file.
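The named numerics can be illustrated with a toy example: the Python sketch below applies central differences in space and a leapfrog scheme in time to one-dimensional linear advection as a stand-in for the full ocean model equations. All parameter values are illustrative assumptions (chosen so that c·dt/dx < 1).

```python
# Toy illustration of central differences in space + leapfrog in time.
import numpy as np

def leapfrog_advection(u0, c=0.5, dx=2000.0, dt=300.0, n_steps=100):
    """Advance du/dt + c du/dx = 0 from initial field u0 (periodic)."""
    u_prev = u0.copy()
    # Bootstrap the three-level scheme with one forward Euler step.
    u_curr = u0 - c * dt / (2 * dx) * (np.roll(u0, -1) - np.roll(u0, 1))
    for _ in range(n_steps - 1):
        u_next = u_prev - c * dt / dx * (np.roll(u_curr, -1) - np.roll(u_curr, 1))
        u_prev, u_curr = u_curr, u_next
    return u_curr
```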
Each control instruction is associated with the AUV pose at the previous moment, the AUV pose at the current moment and the ocean current information. A bagging algorithm generates T training sets, and T BP neural networks train T base learners, one per sampled set. Each BP network has three layers: the inputs are the AUV's speed and angle at the previous moment, the voltages of the rudder, wings and propeller, and the AUV's speed and angle at the current moment; the outputs are the meridional and zonal ocean current speeds. The network therefore has 7 input neurons and 2 output neurons; the number of hidden-layer neurons is one of 5, 8, 10, 12 and 15, determined by 10-fold cross-validation. The final real-time ocean current element is obtained by combining the base learners in proportion to their error rates, and the ocean current element obtained at the current moment is taken as the element for the next moment, thereby yielding real-time ocean current information.
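A minimal sketch of this fitting stage follows: bagging over three-layer BP networks with 7 inputs and 2 outputs. scikit-learn and the inverse-error vote are implementation assumptions; the patent only states that the learners are combined in proportion to their error rates.

```python
# Sketch of step (2)'s fitting: bagging over three-layer BP networks.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.utils import resample

def fit_bagged_bp(X, y, n_learners=10, hidden=10, seed=0):
    """Train n_learners BP nets on bootstrap samples of (X, y).

    X has 7 columns (previous speed and angle; rudder, wing and propeller
    voltages; current speed and angle); y has 2 columns (meridional and
    zonal current speeds).
    """
    rng = np.random.RandomState(seed)
    models, weights = [], []
    for t in range(n_learners):
        Xb, yb = resample(X, y, random_state=rng)     # one bagging sample
        net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=2000,
                           random_state=t).fit(Xb, yb)
        err = np.mean((net.predict(X) - y) ** 2) + 1e-9
        models.append(net)
        weights.append(1.0 / err)         # lower error, larger vote (assumed)
    w = np.asarray(weights) / np.sum(weights)
    return models, w

def predict_current(models, w, x):
    """Weighted combination of the base learners; x has shape (1, 7)."""
    preds = np.stack([m.predict(x) for m in models])  # shape (T, 1, 2)
    return np.tensordot(w, preds, axes=1)[0]          # (u_current, v_current)
```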
Step (3): using the electronic chart information, marking the areas that endanger the safe navigation of the underwater robot as no-navigation areas in the grid;
Step (4): storing the no-navigation information and the start and end position information in plane grids at different depths, recording for each grid point its longitude and latitude, whether it lies in a no-navigation area, and whether the end point has been reached;
and (5): calculating the direction from the current position to the terminal point and determining the optional action in all the next driving directions:
As shown in FIG. 2, according to the rectangular grid structure diagram, assume the black point in the middle of the rectangular grid is the current position of the underwater robot. There are 16 possible current actions a1, a2, ..., a16, namely the moves from the robot's current position to the grid points of the surrounding two layers; if the position occupied after executing an action lies in a no-navigation area, that action is abandoned;
As shown in FIG. 3, let a_st be the motion from the current point position to the target point position; the motion selection formula is as follows:
[equation shown as an image in the original patent: the selection criterion A_i, defined from the candidate action a_i and the goal motion a_st]
In the above formula, i is an integer with i ∈ [1,16]. Actions with A_i > 0 are selected. If an obstacle occupies one of the 8 grid points nearest the current point, the action corresponding to that grid point and the adjacent actions are abandoned; if the obstacle lies on the outer layer of grid points around the current point, only the action corresponding to that grid point is abandoned.
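The sketch below enumerates the 16 candidate actions (the outer ring of the 5x5 neighbourhood) and filters them. Using the cosine between a candidate action a_i and the goal motion a_st as A_i is an assumption, since the patent's exact selection formula appears only as an image; the midpoint-cell test is a simplification of the "corresponding and adjacent actions" rule.

```python
# Sketch of step (5): enumerate and filter the 16 candidate actions.
import numpy as np

OUTER_RING = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)
              if max(abs(dx), abs(dy)) == 2]          # the 16 actions

def selectable_actions(pos, goal, is_blocked):
    """Return the outer-ring offsets that head toward the goal and are safe.

    is_blocked(x, y) -> bool tells whether a grid cell is a no-navigation
    cell or holds an obstacle.
    """
    a_st = np.asarray(goal, dtype=float) - np.asarray(pos, dtype=float)
    keep = []
    for a in OUTER_RING:
        A_i = np.dot(a, a_st) / (np.linalg.norm(a) * np.linalg.norm(a_st))
        if A_i <= 0:                                  # keep only A_i > 0
            continue
        if is_blocked(pos[0] + a[0], pos[1] + a[1]):  # obstacle on target cell
            continue
        # Inner-ring cell crossed on the way out (rounded midpoint).
        mid = (pos[0] + round(a[0] / 2), pos[1] + round(a[1] / 2))
        if is_blocked(*mid):
            continue
        keep.append(a)
    return keep
```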
Step (6): adopting a focused learning mode, using Q learning to seek the optimal strategy of the Markov decision process plan and outputting the path:
The Markov decision process is described by the quintuple (S, A, P, R, γ), where:
S is a finite set of states, A is a finite set of actions, P is the state transition probability, R is the return function, and γ is the discount factor used to compute the cumulative return.
The goal of reinforcement learning is to seek the optimal strategy for a given Markov decision process. A strategy is a mapping from states to actions, usually denoted by the symbol π. For an unknown environment, the underwater robot explores actions under the strategy and optimizes the strategy according to the returns: when an action produces a positive return, the action is reinforced and will be selected with higher probability when the same state recurs; otherwise it is weakened. The optimal strategy is sought through continuous interaction with the environment. Owing to its inherent adaptability, reactivity and online learning capability, Q learning is widely used in path planning for unknown environments and has the broadest application. The specific steps are shown in FIG. 4.
Step (6.1): initialize the value function Q(s, a) = 0 and the original strategy π(s) = argmax_a Q(s, a);
Step (6.2): initialize state S_0 as the initial position and determine the initial time t_0;
Step (6.3): calculating the real-time ocean current speed of the current position through a neural network;
Step (6.4): use the focused exploration strategy to select action a, generate reward r_{t+1} and transition to state S_{t+1};
The focused exploration strategy is:
[equation shown as an image in the original patent: μ(x)]
In the above formula,
[equation shown as an image in the original patent: the selection probability p_i]
where w_1 is the weight coefficient of the distance influence and w_2 is the weight coefficient of the ocean current influence;
[equation shown as an image in the original patent]
v_c is the ocean current speed at the grid point of the current position at time t, and a_i is an optional action selected with probability p_i;
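Because the p_i formula is shown only as an image, the sketch below is an assumed stand-in built from the same ingredients: w_1 weights progress toward the goal, w_2 weights help from the current v_c, and an action is drawn with probability p_i.

```python
# Assumed sketch of the focused exploration strategy in step (6.4).
import numpy as np

def focused_explore(actions, pos, goal, v_c, current_dir,
                    w1=0.7, w2=0.3, rng=np.random):
    """Pick one of `actions` at random with probability p_i."""
    g = np.asarray(goal, dtype=float) - np.asarray(pos, dtype=float)
    scores = []
    for a in actions:
        a = np.asarray(a, dtype=float)
        toward_goal = np.dot(a, g) / (np.linalg.norm(a) * np.linalg.norm(g))
        with_current = v_c * np.cos(np.arctan2(a[1], a[0]) - current_dir)
        scores.append(w1 * toward_goal + w2 * with_current)
    p = np.exp(scores) / np.sum(np.exp(scores))       # softmax normalization
    return actions[rng.choice(len(actions), p=p)]
```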
the immediate reward function:
r_{t+1} = w_d·r_d + w_r·r_r + w_c·r_c;
In the above formula, w_d is the weight coefficient of the distance reward-punishment function, w_r is the weight coefficient of the danger reward-punishment function, and w_c is the weight coefficient of the ocean current reward-punishment function;
Distance reward-punishment function r_d = d(t) - d(t+1), where d(t) denotes the distance from the robot position to the target point at time t, and d(t+1) the distance at time t+1.
Danger reward and punishment function:
[equation shown as an image in the original patent: the piecewise danger reward-punishment function of d_o]
In the above formula, d_o is the grid distance between the underwater robot's current position and the obstacle;
Ocean current reward-punishment function r_c = v_c·cos|α - θ|, where α is the heading angle and θ is the ocean current direction;
According to the action a selected by the focused exploration strategy, the reward r_{t+1} is generated and the state transitions to S_{t+1}.
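A minimal sketch of this immediate reward follows: the weighted sum of the distance, danger and ocean-current terms defined above. The exact piecewise form of the danger term is shown only as an image in the patent, so the inverse-distance penalty used here is an assumption, as are all default weights.

```python
# Assumed sketch of the immediate reward in step (6.4).
import math

def reward(d_t, d_t1, d_obstacle, v_c, alpha, theta,
           w_d=0.5, w_r=0.3, w_c=0.2):
    r_d = d_t - d_t1                                  # progress toward goal
    r_r = -1.0 / d_obstacle if d_obstacle > 0 else -1.0   # assumed form
    r_c = v_c * math.cos(abs(alpha - theta))          # current helps if aligned
    return w_d * r_d + w_r * r_r + w_c * r_c
```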
Step (6.5): according to the original strategy π, select and execute action a_{t+1} in state S_{t+1};
Step (6.6): update the state-action value function:
Q(s_t, a_t) ← Q(s_t, a_t) + β[r_{t+1} + γQ(s_{t+1}, a_{t+1}) - Q(s_t, a_t)];
In the above formula, β denotes the learning rate, with value range [0,1], and γ denotes the discount factor.
Step (6.7): update the current strategy with the greedy strategy:
π(s) ← argmax_a Q(s, a).
Step (6.8): judge whether the underwater robot has reached the target position; if not, go to step (6.3); if so, go to step (6.9).
Step (6.9): judge whether the iteration limit has been reached or all state-action value functions have converged; if neither, go to step (6.2); if either, output the optimal strategy to obtain the optimal planned path.
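The loop in steps (6.2)-(6.9) can be sketched end to end as below. `env` is a hypothetical wrapper assumed to expose start, goal, actions(s), step(s, a) -> (next_state, reward), current_speed(s, t) and current_dir(s, t); it and the helpers reused from the sketches above (focused_explore) are illustrative assumptions, not the patent's API.

```python
# End-to-end sketch of step (6): tabular Q updates over episodes.
from collections import defaultdict

def train(env, n_episodes=500, max_steps=10000, beta=0.1, gamma=0.9):
    Q = defaultdict(float)              # step (6.1): Q(s, a) = 0 for all s, a
    for _ in range(n_episodes):         # step (6.2): restart from the start
        s, t = env.start, 0
        for _ in range(max_steps):
            if s == env.goal:
                break
            # Step (6.4): focused exploration chooses the executed action.
            a = focused_explore(env.actions(s), s, env.goal,
                                env.current_speed(s, t), env.current_dir(s, t))
            s1, r = env.step(s, a)
            # Step (6.5): the greedy strategy pi picks a_{t+1} in s_{t+1}.
            acts1 = env.actions(s1)
            a1 = max(acts1, key=lambda b: Q[(s1, b)]) if acts1 else a
            # Step (6.6): value update with learning rate beta, discount gamma.
            Q[(s, a)] += beta * (r + gamma * Q[(s1, a1)] - Q[(s, a)])
            s, t = s1, t + 1
    return Q    # steps (6.7)-(6.9): pi(s) = argmax_a Q[(s, a)] once converged
```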
Compared with the prior art, the method fully considers the influence of real-time ocean currents on path planning: it predicts future ocean elements with a regional ocean model and fits them with a BP neural network and a bagging algorithm to obtain real-time ocean current information. Meanwhile, planning is carried out as a Markov decision process and the optimal solution is sought with reinforcement learning, which accelerates convergence, reduces computational complexity, and obtains the planned path better and faster.
The above description is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A path planning method based on an ocean current prediction model, characterized by comprising the following steps:
step (1): determining a navigation area according to the path key points, and rasterizing the navigation area;
determining a rectangular navigation area from the start and end points of the underwater robot's path; adopting an orthogonal curvilinear grid in the horizontal direction with the grid spacing set between 2 km and 30 km, and dividing the vertical direction into 20-30 layers at equal depth intervals;
adopting a sigma coordinate in the vertical direction, scaling the vertical coordinate to [-1,0] through a vertical transformation function and a stretching function, and setting the number of vertical layers;
vertical transformation function:
z(x,y,s,t)=η(x,y,t)+[η(x,y,t)+h(x,y)]×Z0(x,y,s);
[equation shown as an image in the original patent: the definition of Z_0(x, y, s)]
in the above formula, z is the height in the Cartesian coordinate system, x is the longitudinal coordinate value, y is the latitudinal coordinate value, s is the vertical distance from the water surface, t is time, η(x, y, t) is the time-varying free sea surface, h(x, y) is the thickness of the undisturbed water body, and h_c is a conversion parameter;
stretching function:
[equation shown as an image in the original patent: the stretching function]
in the above formula, θ_s is a surface control parameter with 0 < θ_s ≤ 10;
Step (2): carrying out ocean current prediction with the time step length delta T on the navigation area by using an area ocean mode, fitting by using a bagging algorithm and a BP neural network according to the AUV real-time pose change, the control instruction and ocean current information at a corresponding moment, and calculating to obtain real-time ocean current information;
the step (2) is specifically as follows:
each control instruction is associated with the AUV pose at the previous moment, the AUV pose at the current moment and the ocean current information; a bagging algorithm generates T training sets, and T BP neural networks train T base learners, one per sampled set; each BP network has three layers, whose inputs are the AUV's speed and angle at the previous moment, the voltages of the rudder, wings and propeller, and the AUV's speed and angle at the current moment, and whose outputs are the meridional and zonal ocean current speeds, giving 7 input neurons and 2 output neurons; the number of hidden-layer neurons is one of 5, 8, 10, 12 and 15, determined by 10-fold cross-validation; the final real-time ocean current element is obtained by combining the base learners in proportion to their error rates, and the ocean current element obtained at the current moment is taken as the element for the next moment, thereby yielding the real-time ocean current information;
step (3): using the electronic chart information, marking the areas that endanger the safe navigation of the underwater robot as no-navigation areas in the grid;
step (4): storing the no-navigation information and the start and end position information in plane grids at different depths, recording for each grid point its longitude and latitude, whether it lies in a no-navigation area, and whether the end point has been reached;
step (5): calculating the direction from the current position to the end point and determining the selectable actions among all next headings;
the step (5) is specifically as follows:
according to the rectangular grid structure diagram, assume the black point in the middle of the rectangular grid is the current position of the underwater robot; there are 16 possible current actions a1, a2, ..., a16, namely the moves from the robot's current position to the grid points of the surrounding two layers; if the position occupied after executing an action lies in a no-navigation area, that action is abandoned;
let a_st be the motion from the current point position to the target point position; the motion selection formula is as follows:
[equation shown as an image in the original patent: the selection criterion A_i, defined from the candidate action a_i and the goal motion a_st]
in the above formula, i is an integer with i ∈ [1,16]; actions with A_i > 0 are selected; if an obstacle occupies one of the 8 grid points nearest the current point, the action corresponding to that grid point and the adjacent actions are abandoned; if the obstacle lies on the outer layer of grid points around the current point, only the action corresponding to that grid point is abandoned;
step (6): adopting a focused learning mode, using Q learning to seek the optimal strategy of the Markov decision process plan and outputting the path;
the step (6) comprises the following steps:
step (6.1): initializing value function Q (s, a) ═ 0, initializing original strategy pi (s, a) ═ argmaxaQ(s,a);
Step (6.2): initialization state S0For the initial position, an initial time t is determined0
Step (6.3): calculating the real-time ocean current speed of the current position through a neural network;
Step (6.4): use the focused exploration strategy to select action a, generate reward r_{t+1} and transition to state S_{t+1};
Step (6.5): according to the original strategy π, select and execute action a_{t+1} in state S_{t+1};
Step (6.6): update the state-action value function:
Q(s_t, a_t) ← Q(s_t, a_t) + β[r_{t+1} + γQ(s_{t+1}, a_{t+1}) - Q(s_t, a_t)];
in the above formula, β denotes the learning rate, with value range [0,1], and γ denotes the discount factor;
immediate reward function:
r_{t+1} = w_d·r_d + w_r·r_r + w_c·r_c;
wherein w_d, w_r and w_c are the weight coefficients of the distance, danger and ocean current reward-punishment functions, respectively;
distance reward-punishment function r_d = d(t) - d(t+1), where d(t) denotes the distance from the robot position to the target point at time t, and d(t+1) the distance at time t+1;
danger reward and punishment function:
[equation shown as an image in the original patent: the piecewise danger reward-punishment function of d_o]
d_o is the grid distance between the underwater robot's current position and the obstacle;
ocean current reward-punishment function r_c = v_c·cos|α_0 - θ|, where α_0 is the heading angle and θ is the ocean current direction;
Step (6.7): update the current strategy with the greedy strategy:
π(s) ← argmax_a Q(s, a);
Step (6.8): judge whether the underwater robot has reached the target position; if not, go to step (6.3); if so, go to step (6.9);
Step (6.9): judge whether the iteration limit has been reached or all state-action value functions have converged; if neither, go to step (6.2); if either, output the optimal strategy to obtain the optimal planned path;
the focused exploration strategy μ(x) is as follows:
[equation shown as an image in the original patent: μ(x)]
in the above formula,
[equation shown as an image in the original patent: the selection probability p_i]
where w_1 is the weight coefficient of the distance influence and w_2 is the weight coefficient of the ocean current influence,
[equation shown as an image in the original patent]
v_c is the ocean current speed at the grid point of the current position at time t, and a_i is an optional action selected with probability p_i.
CN201810589190.3A 2018-06-08 2018-06-08 Path planning method based on ocean current prediction model Active CN108803313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810589190.3A CN108803313B (en) 2018-06-08 2018-06-08 Path planning method based on ocean current prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810589190.3A CN108803313B (en) 2018-06-08 2018-06-08 Path planning method based on ocean current prediction model

Publications (2)

Publication Number Publication Date
CN108803313A CN108803313A (en) 2018-11-13
CN108803313B (en) 2022-07-12

Family

ID=64088958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810589190.3A Active CN108803313B (en) 2018-06-08 2018-06-08 Path planning method based on ocean current prediction model

Country Status (1)

Country Link
CN (1) CN108803313B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445437A (en) * 2018-11-30 2019-03-08 电子科技大学 A kind of paths planning method of unmanned electric vehicle
CN109657863B (en) * 2018-12-20 2021-06-25 智慧航海(青岛)科技有限公司 Firefly algorithm-based unmanned ship global path dynamic optimization method
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN109948054A (en) * 2019-03-11 2019-06-28 北京航空航天大学 A kind of adaptive learning path planning system based on intensified learning
CN110555584B (en) * 2019-07-17 2021-04-06 浙江工业大学 Automatic parking lot scheduling method based on deep reinforcement learning
CN110543171B (en) * 2019-08-27 2020-07-31 华中科技大学 Storage multi-AGV path planning method based on improved BP neural network
CN110763234B (en) * 2019-10-15 2022-10-28 哈尔滨工程大学 Submarine topography matching navigation path planning method for underwater robot
CN111645079B (en) * 2020-08-04 2020-11-10 天津滨电电力工程有限公司 Device and method for planning and controlling mechanical arm path of live working robot
CN111958601A (en) * 2020-08-19 2020-11-20 西南交通大学 Automatic path finding and material identification method based on deep learning
CN112215395B (en) * 2020-09-02 2023-04-18 中国船舶重工集团公司第七研究院 Underwater equipment adaptability information guarantee system based on ocean big data
CN112698646B (en) * 2020-12-05 2022-09-13 西北工业大学 Aircraft path planning method based on reinforcement learning
CN112581026B (en) * 2020-12-29 2022-08-12 杭州趣链科技有限公司 Joint path planning method for logistics robot on alliance chain
CN113052370A (en) * 2021-03-15 2021-06-29 哈尔滨工程大学 Marine environment element statistical prediction method based on space-time experience orthogonal function
CN113064440B (en) * 2021-03-15 2022-08-02 哈尔滨工程大学 Self-adaptive observation method based on ocean mode
CN113325856B (en) * 2021-05-31 2022-07-08 中国船舶工业集团公司第七0八研究所 UUV optimal operation path planning method based on countercurrent approximation strategy
CN114200929B (en) * 2021-11-24 2023-10-20 中国科学院沈阳自动化研究所 Rapid comb-type path planning method for maximum detection coverage rate of multi-underwater robot
CN116700315B (en) * 2023-07-03 2024-02-06 苏州优世达智能科技有限公司 Unmanned ship track tracking control method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006122030A2 (en) * 2005-05-07 2006-11-16 Thaler Stephen L Device for the autonomous bootstrapping of useful information
CN102175245A (en) * 2011-01-28 2011-09-07 哈尔滨工程大学 Underwater vehicle path planning method based on ocean current historical statistic information
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning
CN102819264A (en) * 2012-07-30 2012-12-12 山东大学 Path planning Q-learning initial method of mobile robot
CN106970615A (en) * 2017-03-21 2017-07-21 西北工业大学 A kind of real-time online paths planning method of deeply study
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Ocean Current Velocity Prediction Based on an ARIMA-BP Neural Network Model; Dong Shichao; China Science and Technology Information; 2014-02, No. 02, pp. 86-88, sections 1-4 *
Research on Key Technologies of Path Planning for Underwater Robots; Cao Jiangli; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2011-02, No. 02, pp. I140-42, chapter 5 *

Also Published As

Publication number Publication date
CN108803313A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108803313B (en) Path planning method based on ocean current prediction model
CN109540151B (en) AUV three-dimensional path planning method based on reinforcement learning
Chen et al. Path planning and obstacle avoiding of the USV based on improved ACO-APF hybrid algorithm with adaptive early-warning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
Singh et al. A constrained A* approach towards optimal path planning for an unmanned surface vehicle in a maritime environment containing dynamic obstacles and ocean currents
Chen et al. A hybrid path planning algorithm for unmanned surface vehicles in complex environment with dynamic obstacles
Wang et al. A COLREGs-based obstacle avoidance approach for unmanned surface vehicles
CN109976349B (en) Design method of path tracking guidance and control structure of constraint-containing unmanned ship
CN109765929B (en) UUV real-time obstacle avoidance planning method based on improved RNN
CN109241552A (en) A kind of underwater robot motion planning method based on multiple constraint target
Hadi et al. Deep reinforcement learning for adaptive path planning and control of an autonomous underwater vehicle
CN111026135B (en) High-performance sailing feedforward control system and control method for unmanned ship
CN106338919A (en) USV (Unmanned Surface Vehicle) track tracking control method based on enhanced learning type intelligent algorithm
Lan et al. Improved RRT algorithms to solve path planning of multi-glider in time-varying ocean currents
Lan et al. Path planning for underwater gliders in time-varying ocean current using deep reinforcement learning
Yan et al. Reinforcement learning-based autonomous navigation and obstacle avoidance for USVs under partially observable conditions
Hedjar et al. An automatic collision avoidance algorithm for multiple marine surface vehicles
Yan et al. Real-world learning control for autonomous exploration of a biomimetic robotic shark
CN109916400B (en) Unmanned ship obstacle avoidance method based on combination of gradient descent algorithm and VO method
Wang et al. Path following with prescribed performance for under-actuated autonomous underwater vehicles subjects to unknown actuator dead-zone
CN115793639A (en) Unmanned ship complex path planning method and device based on reinforcement learning algorithm
CN114610046A (en) Unmanned ship dynamic safety trajectory planning method considering dynamic water depth
CN115185262A (en) Dynamic obstacle avoidance path rapid planning method based on minimum safe meeting distance
Guo et al. Path planning for autonomous underwater vehicles based on an improved artificial jellyfish search algorithm in multi-obstacle ocean current environment
Yiming et al. Variable-structure filtering method for an unmanned wave glider

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant