CN108803313B - Path planning method based on ocean current prediction model - Google Patents


Info

Publication number
CN108803313B
CN108803313B (application CN201810589190.3A)
Authority
CN
China
Prior art keywords
current
ocean current
point
time
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810589190.3A
Other languages
Chinese (zh)
Other versions
CN108803313A (en)
Inventor
王卓
姚淑香
冯晓宁
隋炎橙
胡磊
徐沈方
张士伟
张佩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201810589190.3A
Publication of CN108803313A
Application granted
Publication of CN108803313B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/048 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators using a predictor
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention belongs to the field of underwater robot control and discloses a path planning method based on an ocean current prediction model, comprising the following steps: rasterizing the navigation area according to the path key points; predicting the ocean currents of the navigation area with a regional ocean model and fitting the results to obtain real-time ocean current information; marking no-navigation zones using electronic chart information; storing, in plane grids at different depths, the no-navigation information and the start and end positions, together with whether each grid point lies in a no-navigation area and whether the end point has been reached; calculating the direction from the current position to the end point and determining the selectable actions among all next headings; and using Q learning to seek the optimal strategy of the Markov decision process plan and to output the path. The invention fully considers the influence of real-time ocean currents on path planning, performs the fitting with a BP neural network and a bagging algorithm, and seeks the optimal solution with reinforcement learning, thereby accelerating convergence and reducing computational complexity.

Description

Path planning method based on ocean current prediction model
Technical Field
The invention belongs to the field of underwater robot control, and particularly relates to a path planning method based on an ocean current prediction model.
Background
An underwater robot, also called an unmanned remotely operated vehicle, is an extreme-operation robot that works underwater. Because the underwater environment is harsh and dangerous and human diving depth is limited, underwater robots have become an important tool for ocean development.
The underwater robot can replace human divers for long-duration underwater work in high-risk environments, polluted environments and zero-visibility waters. It is generally equipped with a sonar system, cameras, lights, a mechanical arm and other devices, can provide real-time video and sonar images, and its mechanical arm can perform grasping and hoisting operations. Underwater robots are widely applied in oil development, marine law enforcement and evidence collection, scientific research, military affairs and other fields.
Because the operating environment of the underwater robot is complex, underwater acoustic signals are noisy, and the various underwater acoustic sensors generally suffer from poor precision and frequent jumps, filtering technology is very important in the motion control system of an underwater robot. The position sensor commonly adopted in underwater robot motion control is a short-baseline or long-baseline underwater acoustic positioning system, and the speed sensor is a Doppler velocimeter. The factors that affect the accuracy of the underwater acoustic positioning system mainly include the sound speed error, the measurement error of the transponder response time, and the calibration error of the transponder position, i.e. distance. The factors that affect the accuracy of the Doppler velocimeter mainly include the sound speed c, the physical and chemical properties of the seawater medium, the pitching of the carrier, and so on.
Path planning is therefore particularly important for underwater robots; it is one of the basic links of intelligent underwater robot navigation. When an underwater robot navigates in a large-scale marine environment, the influence of that environment on its navigation must be considered in addition to obstacle avoidance and energy consumption. Time-varying ocean currents pose great challenges to the safety and mission success of the underwater robot, so during path planning the robot should use predicted ocean current elements to exploit the energy of the flow field as much as possible and plan a feasible, safe path with low energy consumption.
From the perspective of algorithm strategy, current path planning algorithms can be divided into path planning based on intelligent computation, path planning based on behavior and learning psychology, and random-sampling path planning. These algorithms mainly aim at improving solution-space search efficiency and speeding up convergence, or are proposed for unknown environments or dynamic spaces; at present, more and more scholars have begun to study path planning under the influence of ocean currents. The present invention forecasts the ocean current field of a given area at future times with a regional ocean model and combines it with the AUV position and control instructions to estimate the ocean current in real time, so that the ocean current field used for path planning is more accurate and up to date.
The patent with application number 201710538828.6 discloses a path planning device and method for an unmanned underwater vehicle (UUV) based on a detection threat domain. Using a detection-threat-domain path planning algorithm, it solves UUV path planning in terrain-obstacle environments and can satisfy the UUV's kinematic, collision-avoidance and covert-detection constraints. Given an initial position, an end position, a maximum curvature constraint, a path discrete-point resolution, a concealment safety index, and so on, it plans a path from the motion start point to the motion end point; the path is smooth, continuous and differentiable, and satisfies the UUV's turning-curvature constraint and concealment safety index, so the UUV can reach the end point safely and covertly in the shortest time. That method applies detection-threat theory and the geometric theory of turning-curvature constraints to UUV path planning for the first time; it realizes path planning quickly, is simple, reliable and easy to implement, has a small computation load and good real-time performance, meets path planning requirements, improves the practicality of path planning, and has positive significance for the future development of underwater path planning. However, when applied to path planning for an underwater vehicle, it still suffers from an overly complex calculation process and poor real-time performance.
Disclosure of Invention
The invention aims to provide a path planning method based on an ocean current prediction model that has low energy consumption and high safety.
The purpose of the invention is realized as follows:
A path planning method based on an ocean current prediction model comprises the following steps:
step (1): determining a navigation area according to the path key points, and rasterizing the navigation area;
step (2): performing ocean current prediction with time step ΔT over the navigation area using a regional ocean model, fitting with a bagging algorithm and BP neural networks according to the AUV's real-time pose changes, the control instructions and the ocean current information at the corresponding moments, and calculating the real-time ocean current information:
the control instruction corresponds to the AUV pose at the last moment, the AUV pose at the last moment and ocean current information, a bagging algorithm is used for generating T training sets, T BP neural networks are used for training T base learners based on each sampling set, the BP neural networks are three layers, the speed and the angle of the AUV at the last moment, the voltage of a rudder, a wing and a propeller and the speed and the angle of the AUV at the moment are input and output, the meridional speed and the latitudinal speed of ocean current are output, the input layer comprises 7 input neurons and 2 output neurons, the number of the neurons in the hidden layer is one of 5, 8, 10, 12 and 15, the number of the hidden layers is determined through a 10-fold cross-validation method, the final real-time ocean current element is obtained according to the error rate in proportion, and the ocean current element obtained at the moment is taken as the ocean current element at the next moment, so that the real-time ocean current information is obtained.
Step (3): using the electronic chart information, marking the areas that endanger the safe navigation of the underwater robot as no-navigation areas in the grid;
Step (4): storing the no-navigation information and the start and end position information in plane grids at different depths, recording for each grid point its longitude and latitude, whether it lies in a no-navigation area, and whether the end point has been reached;
and (5): calculating the direction from the current position to the terminal point and determining the optional action in all the next driving directions:
according to the structure diagram of the rectangular grid, assuming that a black point in the middle of the rectangular grid is the current position of the underwater robot, 16 possible actions from a1, a2 to a16 exist in the current action, the possible actions are the actions from the current position of the underwater robot to the positions of the two layers of the underwater robot, and assuming that the position where the underwater robot is located after the current action is executed is in a no-navigation area;
Let a_st be the motion from the current point position to the target point position; the motion selection formula is as follows:
[equation shown as an image in the original patent: the selection criterion A_i, defined from the candidate action a_i and the goal motion a_st]
In the above formula, i is an integer with i ∈ [1,16]. Actions with A_i > 0 are selected. If an obstacle occupies one of the 8 grid points nearest the current point, the action corresponding to that grid point and the adjacent actions are abandoned; if the obstacle lies on the outer layer of grid points around the current point, only the action corresponding to that grid point is abandoned.
Step (6): adopting a focused learning mode, using Q learning to seek the optimal strategy of the Markov decision process plan and outputting the path.
Step (6.1): initialize the value function Q(s, a) = 0 and the original strategy π(s) = argmax_a Q(s, a);
Step (6.2): initialize state S_0 as the initial position and determine the initial time t_0;
Step (6.3): calculating the real-time ocean current speed of the current position through a neural network;
Step (6.4): use the focused exploration strategy to select action a, generate reward r_{t+1} and transition to state S_{t+1};
Focused exploration strategy μ(x):
[equation shown as an image in the original patent: μ(x)]
In the above formula,
[equation shown as an image in the original patent: the selection probability p_i]
where w_1 is the weight coefficient of the distance influence and w_2 is the weight coefficient of the ocean current influence;
[equation shown as an image in the original patent]
v_c is the ocean current speed at the grid point of the current position at time t, and a_i is an optional action selected with probability p_i.
Step (6.5): according to the original strategy π, select and execute action a_{t+1} in state S_{t+1};
Step (6.6): update the state-action value function:
Q(s_t, a_t) ← Q(s_t, a_t) + β[r_{t+1} + γQ(s_{t+1}, a_{t+1}) - Q(s_t, a_t)];
In the above formula, β denotes the learning rate, with value range [0,1], and γ denotes the discount factor.
Step (6.7): update the current strategy with the greedy strategy:
π(s) ← argmax_a Q(s, a).
Step (6.8): judge whether the underwater robot has reached the target position; if not, go to step (6.3); if so, go to step (6.9).
Step (6.9): judge whether the iteration limit has been reached or all state-action value functions have converged; if neither, go to step (6.2); if either, output the optimal strategy to obtain the optimal planned path.
The invention has the beneficial effects that:
the method fully considers the influence of real-time ocean current on path planning, predicts future ocean elements through regional ocean modes, and performs fitting by using a BP neural network and a bagging algorithm to obtain real-time ocean current information. Meanwhile, planning is carried out according to a Markov decision process, and an optimal solution is sought by using reinforcement learning, so that the convergence speed is increased, the complexity of operation is reduced, and a planned path is obtained better and faster.
Drawings
FIG. 1 is a flow chart of a path planning method based on an ocean current prediction model;
FIG. 2 is a diagram of a rectangular grid structure;
FIG. 3 is a schematic view of action selection;
FIG. 4 is a flow chart of the Markov decision process planning.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
example 1:
As shown in FIG. 1, a path planning method based on an ocean current prediction model includes the following steps:
step (1): determining a navigation area according to the path key points, and rasterizing the navigation area;
A rectangular navigation area is determined from the start and end points of the underwater robot's path. An orthogonal curvilinear grid is adopted in the horizontal direction with the grid spacing set between 2 km and 30 km, and the vertical direction is divided into 20-30 layers at equal depth intervals.
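As an illustration, the following minimal Python sketch builds such a horizontal grid, the depth layers and a per-layer no-navigation mask; the function name, the rough 111 km-per-degree conversion and all default values are assumptions made for illustration, not taken from the patent.

```python
# Minimal sketch of the step (1) rasterization: an orthogonal horizontal
# grid plus 20-30 equal-depth layers.
import numpy as np

def build_grid(lat_min, lat_max, lon_min, lon_max,
               spacing_km=10.0, n_layers=25, max_depth_m=500.0):
    """Return latitude/longitude/depth axes and a per-layer no-go mask.

    spacing_km should lie in the 2-30 km range given in the patent and
    n_layers in 20-30, split at equal depth intervals.
    """
    km_per_deg = 111.0                          # rough meridional scale
    d_deg = spacing_km / km_per_deg
    lats = np.arange(lat_min, lat_max + d_deg, d_deg)
    lons = np.arange(lon_min, lon_max + d_deg, d_deg)
    depths = np.linspace(0.0, max_depth_m, n_layers)
    # One boolean plane per depth layer; True marks a no-navigation cell.
    no_go = np.zeros((n_layers, lats.size, lons.size), dtype=bool)
    return lats, lons, depths, no_go
```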
Step (2): performing ocean current prediction with time step ΔT over the navigation area using a regional ocean model, fitting with a bagging algorithm and BP neural networks according to the AUV's real-time pose changes, the control instructions and the ocean current information at the corresponding moments, and calculating the real-time ocean current information:
A sigma coordinate is adopted in the vertical direction; the vertical coordinate is scaled to [-1,0] through a vertical transformation function and a stretching function, and the number of vertical layers is set;
vertical transformation function:
z(x,y,s,t)=η(x,y,t)+[η(x,y,t)+h(x,y)]×Z0(x,y,s);
[equation shown as an image in the original patent: the definition of Z_0(x, y, s)]
In the above formula, z is the height in the Cartesian coordinate system, x is the longitudinal coordinate value, y is the latitudinal coordinate value, s is the vertical distance from the water surface, t is time, η(x, y, t) is the time-varying free sea surface, h(x, y) is the thickness of the undisturbed water body, and h_c is a conversion parameter;
stretching function:
[equation shown as an image in the original patent: the stretching function]
In the above formula, θ_s is a surface control parameter with 0 < θ_s ≤ 10.
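A minimal sketch of this sigma-coordinate mapping follows. The patent gives the vertical transformation z = η + (η + h)·Z_0 but shows Z_0 and the stretching function only as images, so the ROMS-style forms below are assumptions standing in for those images.

```python
# Sketch of the sigma-coordinate mapping in step (2).
import numpy as np

def stretching(s, theta_s=5.0):
    """Assumed surface-controlled stretching C(s), 0 < theta_s <= 10."""
    return np.sinh(theta_s * s) / np.sinh(theta_s)

def sigma_to_z(eta, h, s, hc=50.0, theta_s=5.0):
    """Map a sigma level s in [-1, 0] to Cartesian height z.

    eta : free sea surface eta(x, y, t)
    h   : thickness of the undisturbed water body h(x, y)
    hc  : conversion parameter
    """
    z0 = (hc * s + h * stretching(s, theta_s)) / (hc + h)   # assumed Z0
    return eta + (eta + h) * z0     # vertical transformation from the patent
```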
The initial conditions of the regional ocean model are produced by four-dimensional assimilation, and the boundary conditions are obtained from the forecast field of a global model. A central difference scheme is adopted in space and a leapfrog scheme in time, the time step is set to 5 min, and the ocean current field of the navigation area is forecast and stored to a file.
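The named numerics can be illustrated with a toy example: the Python sketch below applies central differences in space and a leapfrog scheme in time to one-dimensional linear advection as a stand-in for the full ocean model equations. All parameter values are illustrative assumptions (chosen so that c·dt/dx < 1).

```python
# Toy illustration of central differences in space + leapfrog in time.
import numpy as np

def leapfrog_advection(u0, c=0.5, dx=2000.0, dt=300.0, n_steps=100):
    """Advance du/dt + c du/dx = 0 from initial field u0 (periodic)."""
    u_prev = u0.copy()
    # Bootstrap the three-level scheme with one forward Euler step.
    u_curr = u0 - c * dt / (2 * dx) * (np.roll(u0, -1) - np.roll(u0, 1))
    for _ in range(n_steps - 1):
        u_next = u_prev - c * dt / dx * (np.roll(u_curr, -1) - np.roll(u_curr, 1))
        u_prev, u_curr = u_curr, u_next
    return u_curr
```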
Each control instruction is associated with the AUV pose at the previous moment, the AUV pose at the current moment and the ocean current information. A bagging algorithm generates T training sets, and T BP neural networks train T base learners, one per sampled set. Each BP network has three layers: the inputs are the AUV's speed and angle at the previous moment, the voltages of the rudder, wings and propeller, and the AUV's speed and angle at the current moment; the outputs are the meridional and zonal ocean current speeds. The network therefore has 7 input neurons and 2 output neurons; the number of hidden-layer neurons is one of 5, 8, 10, 12 and 15, determined by 10-fold cross-validation. The final real-time ocean current element is obtained by combining the base learners in proportion to their error rates, and the ocean current element obtained at the current moment is taken as the element for the next moment, thereby yielding real-time ocean current information.
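A minimal sketch of this fitting stage follows: bagging over three-layer BP networks with 7 inputs and 2 outputs. scikit-learn and the inverse-error vote are implementation assumptions; the patent only states that the learners are combined in proportion to their error rates.

```python
# Sketch of step (2)'s fitting: bagging over three-layer BP networks.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.utils import resample

def fit_bagged_bp(X, y, n_learners=10, hidden=10, seed=0):
    """Train n_learners BP nets on bootstrap samples of (X, y).

    X has 7 columns (previous speed and angle; rudder, wing and propeller
    voltages; current speed and angle); y has 2 columns (meridional and
    zonal current speeds).
    """
    rng = np.random.RandomState(seed)
    models, weights = [], []
    for t in range(n_learners):
        Xb, yb = resample(X, y, random_state=rng)     # one bagging sample
        net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=2000,
                           random_state=t).fit(Xb, yb)
        err = np.mean((net.predict(X) - y) ** 2) + 1e-9
        models.append(net)
        weights.append(1.0 / err)         # lower error, larger vote (assumed)
    w = np.asarray(weights) / np.sum(weights)
    return models, w

def predict_current(models, w, x):
    """Weighted combination of the base learners; x has shape (1, 7)."""
    preds = np.stack([m.predict(x) for m in models])  # shape (T, 1, 2)
    return np.tensordot(w, preds, axes=1)[0]          # (u_current, v_current)
```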
Step (3): using the electronic chart information, marking the areas that endanger the safe navigation of the underwater robot as no-navigation areas in the grid;
Step (4): storing the no-navigation information and the start and end position information in plane grids at different depths, recording for each grid point its longitude and latitude, whether it lies in a no-navigation area, and whether the end point has been reached;
and (5): calculating the direction from the current position to the terminal point and determining the optional action in all the next driving directions:
As shown in FIG. 2, according to the rectangular grid structure diagram, assume the black point in the middle of the rectangular grid is the current position of the underwater robot. There are 16 possible current actions a1, a2, ..., a16, namely the moves from the robot's current position to the grid points of the surrounding two layers; if the position occupied after executing an action lies in a no-navigation area, that action is abandoned;
As shown in FIG. 3, let a_st be the motion from the current point position to the target point position; the motion selection formula is as follows:
[equation shown as an image in the original patent: the selection criterion A_i, defined from the candidate action a_i and the goal motion a_st]
In the above formula, i is an integer with i ∈ [1,16]. Actions with A_i > 0 are selected. If an obstacle occupies one of the 8 grid points nearest the current point, the action corresponding to that grid point and the adjacent actions are abandoned; if the obstacle lies on the outer layer of grid points around the current point, only the action corresponding to that grid point is abandoned.
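The sketch below enumerates the 16 candidate actions (the outer ring of the 5x5 neighbourhood) and filters them. Using the cosine between a candidate action a_i and the goal motion a_st as A_i is an assumption, since the patent's exact selection formula appears only as an image; the midpoint-cell test is a simplification of the "corresponding and adjacent actions" rule.

```python
# Sketch of step (5): enumerate and filter the 16 candidate actions.
import numpy as np

OUTER_RING = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)
              if max(abs(dx), abs(dy)) == 2]          # the 16 actions

def selectable_actions(pos, goal, is_blocked):
    """Return the outer-ring offsets that head toward the goal and are safe.

    is_blocked(x, y) -> bool tells whether a grid cell is a no-navigation
    cell or holds an obstacle.
    """
    a_st = np.asarray(goal, dtype=float) - np.asarray(pos, dtype=float)
    keep = []
    for a in OUTER_RING:
        A_i = np.dot(a, a_st) / (np.linalg.norm(a) * np.linalg.norm(a_st))
        if A_i <= 0:                                  # keep only A_i > 0
            continue
        if is_blocked(pos[0] + a[0], pos[1] + a[1]):  # obstacle on target cell
            continue
        # Inner-ring cell crossed on the way out (rounded midpoint).
        mid = (pos[0] + round(a[0] / 2), pos[1] + round(a[1] / 2))
        if is_blocked(*mid):
            continue
        keep.append(a)
    return keep
```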
Step (6): adopting a focused learning mode, using Q learning to seek the optimal strategy of the Markov decision process plan and outputting the path:
The Markov decision process is described by the quintuple (S, A, P, R, γ), where:
S is a finite set of states, A is a finite set of actions, P is the state transition probability, R is the return function, and γ is the discount factor used to compute the cumulative return.
The goal of reinforcement learning is to seek the optimal strategy for a given Markov decision process. A strategy is a mapping from states to actions, usually denoted by the symbol π. For an unknown environment, the underwater robot explores actions under the strategy and optimizes the strategy according to the returns: when an action produces a positive return, the action is reinforced and will be selected with higher probability when the same state recurs; otherwise it is weakened. The optimal strategy is sought through continuous interaction with the environment. Owing to its inherent adaptability, reactivity and online learning capability, Q learning is widely used in path planning for unknown environments and has the broadest application. The specific steps are shown in FIG. 4.
Step (6.1): initialize the value function Q(s, a) = 0 and the original strategy π(s) = argmax_a Q(s, a);
Step (6.2): initialize state S_0 as the initial position and determine the initial time t_0;
Step (6.3): calculating the real-time ocean current speed of the current position through a neural network;
Step (6.4): use the focused exploration strategy to select action a, generate reward r_{t+1} and transition to state S_{t+1};
The focused exploration strategy is:
[equation shown as an image in the original patent: μ(x)]
In the above formula,
[equation shown as an image in the original patent: the selection probability p_i]
where w_1 is the weight coefficient of the distance influence and w_2 is the weight coefficient of the ocean current influence;
[equation shown as an image in the original patent]
v_c is the ocean current speed at the grid point of the current position at time t, and a_i is an optional action selected with probability p_i;
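Because the p_i formula is shown only as an image, the sketch below is an assumed stand-in built from the same ingredients: w_1 weights progress toward the goal, w_2 weights help from the current v_c, and an action is drawn with probability p_i.

```python
# Assumed sketch of the focused exploration strategy in step (6.4).
import numpy as np

def focused_explore(actions, pos, goal, v_c, current_dir,
                    w1=0.7, w2=0.3, rng=np.random):
    """Pick one of `actions` at random with probability p_i."""
    g = np.asarray(goal, dtype=float) - np.asarray(pos, dtype=float)
    scores = []
    for a in actions:
        a = np.asarray(a, dtype=float)
        toward_goal = np.dot(a, g) / (np.linalg.norm(a) * np.linalg.norm(g))
        with_current = v_c * np.cos(np.arctan2(a[1], a[0]) - current_dir)
        scores.append(w1 * toward_goal + w2 * with_current)
    p = np.exp(scores) / np.sum(np.exp(scores))       # softmax normalization
    return actions[rng.choice(len(actions), p=p)]
```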
the immediate reward function:
r_{t+1} = w_d·r_d + w_r·r_r + w_c·r_c;
In the above formula, w_d is the weight coefficient of the distance reward-punishment function, w_r is the weight coefficient of the danger reward-punishment function, and w_c is the weight coefficient of the ocean current reward-punishment function;
Distance reward-punishment function r_d = d(t) - d(t+1), where d(t) denotes the distance from the robot position to the target point at time t, and d(t+1) the distance at time t+1.
Danger reward and punishment function:
[equation shown as an image in the original patent: the piecewise danger reward-punishment function of d_o]
In the above formula, d_o is the grid distance between the underwater robot's current position and the obstacle;
Ocean current reward-punishment function r_c = v_c·cos|α - θ|, where α is the heading angle and θ is the ocean current direction;
According to the action a selected by the focused exploration strategy, the reward r_{t+1} is generated and the state transitions to S_{t+1}.
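A minimal sketch of this immediate reward follows: the weighted sum of the distance, danger and ocean-current terms defined above. The exact piecewise form of the danger term is shown only as an image in the patent, so the inverse-distance penalty used here is an assumption, as are all default weights.

```python
# Assumed sketch of the immediate reward in step (6.4).
import math

def reward(d_t, d_t1, d_obstacle, v_c, alpha, theta,
           w_d=0.5, w_r=0.3, w_c=0.2):
    r_d = d_t - d_t1                                  # progress toward goal
    r_r = -1.0 / d_obstacle if d_obstacle > 0 else -1.0   # assumed form
    r_c = v_c * math.cos(abs(alpha - theta))          # current helps if aligned
    return w_d * r_d + w_r * r_r + w_c * r_c
```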
Step (6.5): according to the original strategy π, select and execute action a_{t+1} in state S_{t+1};
Step (6.6): update the state-action value function:
Q(s_t, a_t) ← Q(s_t, a_t) + β[r_{t+1} + γQ(s_{t+1}, a_{t+1}) - Q(s_t, a_t)];
In the above formula, β denotes the learning rate, with value range [0,1], and γ denotes the discount factor.
Step (6.7): update the current strategy with the greedy strategy:
π(s) ← argmax_a Q(s, a).
Step (6.8): judge whether the underwater robot has reached the target position; if not, go to step (6.3); if so, go to step (6.9).
Step (6.9): judge whether the iteration limit has been reached or all state-action value functions have converged; if neither, go to step (6.2); if either, output the optimal strategy to obtain the optimal planned path.
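The loop in steps (6.2)-(6.9) can be sketched end to end as below. `env` is a hypothetical wrapper assumed to expose start, goal, actions(s), step(s, a) -> (next_state, reward), current_speed(s, t) and current_dir(s, t); it and the helpers reused from the sketches above (focused_explore) are illustrative assumptions, not the patent's API.

```python
# End-to-end sketch of step (6): tabular Q updates over episodes.
from collections import defaultdict

def train(env, n_episodes=500, max_steps=10000, beta=0.1, gamma=0.9):
    Q = defaultdict(float)              # step (6.1): Q(s, a) = 0 for all s, a
    for _ in range(n_episodes):         # step (6.2): restart from the start
        s, t = env.start, 0
        for _ in range(max_steps):
            if s == env.goal:
                break
            # Step (6.4): focused exploration chooses the executed action.
            a = focused_explore(env.actions(s), s, env.goal,
                                env.current_speed(s, t), env.current_dir(s, t))
            s1, r = env.step(s, a)
            # Step (6.5): the greedy strategy pi picks a_{t+1} in s_{t+1}.
            acts1 = env.actions(s1)
            a1 = max(acts1, key=lambda b: Q[(s1, b)]) if acts1 else a
            # Step (6.6): value update with learning rate beta, discount gamma.
            Q[(s, a)] += beta * (r + gamma * Q[(s1, a1)] - Q[(s, a)])
            s, t = s1, t + 1
    return Q    # steps (6.7)-(6.9): pi(s) = argmax_a Q[(s, a)] once converged
```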
Compared with the prior art, the method fully considers the influence of real-time ocean currents on path planning: it predicts future ocean elements with a regional ocean model and fits them with a BP neural network and a bagging algorithm to obtain real-time ocean current information. Meanwhile, planning is carried out as a Markov decision process and the optimal solution is sought with reinforcement learning, which accelerates convergence, reduces computational complexity, and obtains the planned path better and faster.
The above description is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A path planning method based on an ocean current prediction model, characterized by comprising the following steps:
step (1): determining a navigation area according to the path key points, and rasterizing the navigation area;
determining a rectangular navigation area from the start and end points of the underwater robot's path; adopting an orthogonal curvilinear grid in the horizontal direction with the grid spacing set between 2 km and 30 km, and dividing the vertical direction into 20-30 layers at equal depth intervals;
adopting a sigma coordinate in the vertical direction, scaling the vertical coordinate to [-1,0] through a vertical transformation function and a stretching function, and setting the number of vertical layers;
vertical transformation function:
z(x,y,s,t)=η(x,y,t)+[η(x,y,t)+h(x,y)]×Z0(x,y,s);
[equation shown as an image in the original patent: the definition of Z_0(x, y, s)]
in the above formula, z is the height in the Cartesian coordinate system, x is the longitudinal coordinate value, y is the latitudinal coordinate value, s is the vertical distance from the water surface, t is time, η(x, y, t) is the time-varying free sea surface, h(x, y) is the thickness of the undisturbed water body, and h_c is a conversion parameter;
stretching function:
[equation shown as an image in the original patent: the stretching function]
in the above formula, θ_s is a surface control parameter with 0 < θ_s ≤ 10;
Step (2): carrying out ocean current prediction with the time step length delta T on the navigation area by using an area ocean mode, fitting by using a bagging algorithm and a BP neural network according to the AUV real-time pose change, the control instruction and ocean current information at a corresponding moment, and calculating to obtain real-time ocean current information;
the step (2) is specifically as follows:
each control instruction is associated with the AUV pose at the previous moment, the AUV pose at the current moment and the ocean current information; a bagging algorithm generates T training sets, and T BP neural networks train T base learners, one per sampled set; each BP network has three layers, whose inputs are the AUV's speed and angle at the previous moment, the voltages of the rudder, wings and propeller, and the AUV's speed and angle at the current moment, and whose outputs are the meridional and zonal ocean current speeds, giving 7 input neurons and 2 output neurons; the number of hidden-layer neurons is one of 5, 8, 10, 12 and 15, determined by 10-fold cross-validation; the final real-time ocean current element is obtained by combining the base learners in proportion to their error rates, and the ocean current element obtained at the current moment is taken as the element for the next moment, thereby yielding the real-time ocean current information;
step (3): using the electronic chart information, marking the areas that endanger the safe navigation of the underwater robot as no-navigation areas in the grid;
step (4): storing the no-navigation information and the start and end position information in plane grids at different depths, recording for each grid point its longitude and latitude, whether it lies in a no-navigation area, and whether the end point has been reached;
step (5): calculating the direction from the current position to the end point and determining the selectable actions among all next headings;
the step (5) is specifically as follows:
according to the rectangular grid structure diagram, assume the black point in the middle of the rectangular grid is the current position of the underwater robot; there are 16 possible current actions a1, a2, ..., a16, namely the moves from the robot's current position to the grid points of the surrounding two layers; if the position occupied after executing an action lies in a no-navigation area, that action is abandoned;
let a_st be the motion from the current point position to the target point position; the motion selection formula is as follows:
[equation shown as an image in the original patent: the selection criterion A_i, defined from the candidate action a_i and the goal motion a_st]
in the above formula, i is an integer with i ∈ [1,16]; actions with A_i > 0 are selected; if an obstacle occupies one of the 8 grid points nearest the current point, the action corresponding to that grid point and the adjacent actions are abandoned; if the obstacle lies on the outer layer of grid points around the current point, only the action corresponding to that grid point is abandoned;
step (6): adopting a focused learning mode, using Q learning to seek the optimal strategy of the Markov decision process plan and outputting the path;
the step (6) comprises the following steps:
step (6.1): initializing value function Q (s, a) ═ 0, initializing original strategy pi (s, a) ═ argmaxaQ(s,a);
Step (6.2): initialization state S0For the initial position, an initial time t is determined0
Step (6.3): calculating the real-time ocean current speed of the current position through a neural network;
Step (6.4): use the focused exploration strategy to select action a, generate reward r_{t+1} and transition to state S_{t+1};
Step (6.5): according to the original strategy π, select and execute action a_{t+1} in state S_{t+1};
Step (6.6): update the state-action value function:
Q(s_t, a_t) ← Q(s_t, a_t) + β[r_{t+1} + γQ(s_{t+1}, a_{t+1}) - Q(s_t, a_t)];
in the above formula, β denotes the learning rate, with value range [0,1], and γ denotes the discount factor;
immediate reward function:
r_{t+1} = w_d·r_d + w_r·r_r + w_c·r_c;
wherein w_d, w_r and w_c are the weight coefficients of the distance, danger and ocean current reward-punishment functions, respectively;
distance reward-punishment function r_d = d(t) - d(t+1), where d(t) denotes the distance from the robot position to the target point at time t, and d(t+1) the distance at time t+1;
danger reward and punishment function:
[equation shown as an image in the original patent: the piecewise danger reward-punishment function of d_o]
d_o is the grid distance between the underwater robot's current position and the obstacle;
ocean current reward-punishment function r_c = v_c·cos|α_0 - θ|, where α_0 is the heading angle and θ is the ocean current direction;
Step (6.7): update the current strategy with the greedy strategy:
π(s) ← argmax_a Q(s, a);
Step (6.8): judge whether the underwater robot has reached the target position; if not, go to step (6.3); if so, go to step (6.9);
Step (6.9): judge whether the iteration limit has been reached or all state-action value functions have converged; if neither, go to step (6.2); if either, output the optimal strategy to obtain the optimal planned path;
the focused exploration strategy μ(x) is as follows:
[equation shown as an image in the original patent: μ(x)]
in the above formula,
[equation shown as an image in the original patent: the selection probability p_i]
where w_1 is the weight coefficient of the distance influence and w_2 is the weight coefficient of the ocean current influence,
[equation shown as an image in the original patent]
v_c is the ocean current speed at the grid point of the current position at time t, and a_i is an optional action selected with probability p_i.
CN201810589190.3A 2018-06-08 2018-06-08 Path planning method based on ocean current prediction model Active CN108803313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810589190.3A CN108803313B (en) 2018-06-08 2018-06-08 Path planning method based on ocean current prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810589190.3A CN108803313B (en) 2018-06-08 2018-06-08 Path planning method based on ocean current prediction model

Publications (2)

Publication Number Publication Date
CN108803313A CN108803313A (en) 2018-11-13
CN108803313B (en) 2022-07-12

Family

ID=64088958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810589190.3A Active CN108803313B (en) 2018-06-08 2018-06-08 Path planning method based on ocean current prediction model

Country Status (1)

Country Link
CN (1) CN108803313B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445437A (en) * 2018-11-30 2019-03-08 电子科技大学 A kind of paths planning method of unmanned electric vehicle
CN109657863B (en) * 2018-12-20 2021-06-25 智慧航海(青岛)科技有限公司 Firefly algorithm-based unmanned ship global path dynamic optimization method
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN109948054A (en) * 2019-03-11 2019-06-28 北京航空航天大学 A kind of adaptive learning path planning system based on intensified learning
CN110555584B (en) * 2019-07-17 2021-04-06 浙江工业大学 Automatic parking lot scheduling method based on deep reinforcement learning
CN110543171B (en) * 2019-08-27 2020-07-31 华中科技大学 Storage multi-AGV path planning method based on improved BP neural network
CN110763234B (en) * 2019-10-15 2022-10-28 哈尔滨工程大学 Submarine topography matching navigation path planning method for underwater robot
CN111645079B (en) * 2020-08-04 2020-11-10 天津滨电电力工程有限公司 Device and method for planning and controlling mechanical arm path of live working robot
CN111958601A (en) * 2020-08-19 2020-11-20 西南交通大学 Automatic path finding and material identification method based on deep learning
CN112215395B (en) * 2020-09-02 2023-04-18 中国船舶重工集团公司第七研究院 Underwater equipment adaptability information guarantee system based on ocean big data
CN112698646B (en) * 2020-12-05 2022-09-13 西北工业大学 Aircraft path planning method based on reinforcement learning
CN112581026B (en) * 2020-12-29 2022-08-12 杭州趣链科技有限公司 Joint path planning method for logistics robot on alliance chain
CN113052370A (en) * 2021-03-15 2021-06-29 哈尔滨工程大学 Marine environment element statistical prediction method based on space-time experience orthogonal function
CN113064440B (en) * 2021-03-15 2022-08-02 哈尔滨工程大学 Self-adaptive observation method based on ocean mode
CN113325856B (en) * 2021-05-31 2022-07-08 中国船舶工业集团公司第七0八研究所 UUV optimal operation path planning method based on countercurrent approximation strategy
CN114200929B (en) * 2021-11-24 2023-10-20 中国科学院沈阳自动化研究所 Rapid comb-type path planning method for maximum detection coverage rate of multi-underwater robot
CN116700315B (en) * 2023-07-03 2024-02-06 苏州优世达智能科技有限公司 Unmanned ship track tracking control method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006122030A2 (en) * 2005-05-07 2006-11-16 Thaler Stephen L Device for the autonomous bootstrapping of useful information
CN102175245A (en) * 2011-01-28 2011-09-07 哈尔滨工程大学 Underwater vehicle path planning method based on ocean current historical statistic information
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning
CN102819264A (en) * 2012-07-30 2012-12-12 山东大学 Path planning Q-learning initial method of mobile robot
CN106970615A (en) * 2017-03-21 2017-07-21 西北工业大学 A kind of real-time online paths planning method of deeply study
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Ocean Current Velocity Prediction Based on an ARIMA-BP Neural Network Model; Dong Shichao; China Science and Technology Information; 2014-02, No. 02, pp. 86-88, sections 1-4 *
Research on Key Technologies of Path Planning for Underwater Robots; Cao Jiangli; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2011-02, No. 02, pp. I140-42, chapter 5 *

Also Published As

Publication number Publication date
CN108803313A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108803313B (en) Path planning method based on ocean current prediction model
CN109540151B (en) AUV three-dimensional path planning method based on reinforcement learning
Chen et al. Path planning and obstacle avoiding of the USV based on improved ACO-APF hybrid algorithm with adaptive early-warning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
Singh et al. A constrained A* approach towards optimal path planning for an unmanned surface vehicle in a maritime environment containing dynamic obstacles and ocean currents
Chen et al. A hybrid path planning algorithm for unmanned surface vehicles in complex environment with dynamic obstacles
Wang et al. A COLREGs-based obstacle avoidance approach for unmanned surface vehicles
CN109976349B (en) Design method of path tracking guidance and control structure of constraint-containing unmanned ship
CN109765929B (en) UUV real-time obstacle avoidance planning method based on improved RNN
CN109241552A (en) A kind of underwater robot motion planning method based on multiple constraint target
Hadi et al. Deep reinforcement learning for adaptive path planning and control of an autonomous underwater vehicle
CN111026135B (en) High-performance sailing feedforward control system and control method for unmanned ship
CN106338919A (en) USV (Unmanned Surface Vehicle) track tracking control method based on enhanced learning type intelligent algorithm
Lan et al. Improved RRT algorithms to solve path planning of multi-glider in time-varying ocean currents
Lan et al. Path planning for underwater gliders in time-varying ocean current using deep reinforcement learning
Yan et al. Reinforcement learning-based autonomous navigation and obstacle avoidance for USVs under partially observable conditions
Hedjar et al. An automatic collision avoidance algorithm for multiple marine surface vehicles
Yan et al. Real-world learning control for autonomous exploration of a biomimetic robotic shark
CN109916400B (en) Unmanned ship obstacle avoidance method based on combination of gradient descent algorithm and VO method
Wang et al. Path following with prescribed performance for under-actuated autonomous underwater vehicles subjects to unknown actuator dead-zone
CN115793639A (en) Unmanned ship complex path planning method and device based on reinforcement learning algorithm
CN114610046A (en) Unmanned ship dynamic safety trajectory planning method considering dynamic water depth
CN115185262A (en) Dynamic obstacle avoidance path rapid planning method based on minimum safe meeting distance
Guo et al. Path planning for autonomous underwater vehicles based on an improved artificial jellyfish search algorithm in multi-obstacle ocean current environment
Yiming et al. Variable-structure filtering method for an unmanned wave glider

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant