CN112925319A - Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning - Google Patents


Info

Publication number
CN112925319A
CN112925319A
Authority
CN
China
Prior art keywords
autonomous vehicle
underwater autonomous
model
underwater
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110098934.3A
Other languages
Chinese (zh)
Other versions
CN112925319B (en)
Inventor
孙玉山
罗孝坤
张国成
李岳明
薛源
于鑫
张红星
Current Assignee
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN202110098934.3A
Publication of CN112925319A
Application granted
Publication of CN112925319B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/0206 Control of position or course in two dimensions specially adapted to water vehicles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

A deep-reinforcement-learning-based dynamic obstacle avoidance method for an underwater autonomous vehicle, relating to the technical field of underwater robot obstacle avoidance. The invention aims to address the current lack of research on avoidance of dynamic obstacles by underwater autonomous vehicles. The method comprises: establishing an underwater autonomous vehicle model and a kinematic model, and acquiring information about surrounding obstacles; acquiring the motion state information of maneuvering obstacles around the vehicle and constructing a dynamic obstacle state equation; predicting a dynamic obstacle kinematic model from the dynamic obstacle state equation; generating an obstacle avoidance strategy from the surrounding obstacle information and the dynamic obstacle kinematic model by fusing a multi-dynamic-obstacle avoidance method, and converting the strategy into an MDP model; training the MDP model with a deep deterministic policy gradient algorithm until the underwater autonomous vehicle can reach the target area without collision; and guiding the navigation of the underwater autonomous vehicle with the trained MDP model.

Description

Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of obstacle avoidance of underwater robots.
Background
In recent years, driven by the needs of ocean development and naval applications and by continuous progress in new materials, new energy sources, artificial intelligence and other technologies, major maritime nations have accelerated research on Autonomous Underwater Vehicles (AUVs). Compared with manned underwater vehicles, the AUV has attracted wide attention from researchers thanks to its strong maneuverability, wide operating range, absence of casualty risk, high adaptability and survivability, and low manufacturing and maintenance costs. Autonomous underwater vehicles are no longer limited to the marine environment; they are gradually being applied in channel waters, water-conveyance tunnels, harbour waters and other water areas, and have become key equipment for underwater exploration, underwater environment detection, underwater rescue and similar tasks.
The underwater environment is complex and changeable. During underwater navigation, an underwater autonomous vehicle faces obstacles large and small, both static and moving, which seriously threaten its operational safety. Most researchers have so far made progress on avoidance of static obstacles by underwater autonomous vehicles, but research on avoidance of dynamic obstacles remains scarce. Many kinds of dynamic obstacles, such as underwater floating objects and navigating ships, exist under water, and an underwater autonomous vehicle can complete an assigned task and return safely only if it has a high autonomous obstacle avoidance capability. Therefore, autonomous obstacle avoidance by an underwater autonomous vehicle in environments with multiple dynamic obstacles is one of the important technologies in the field of underwater autonomous vehicles.
Disclosure of Invention
The invention provides a deep-reinforcement-learning-based dynamic obstacle avoidance method for an underwater autonomous vehicle, aiming to address the current lack of research on avoidance of dynamic obstacles by underwater autonomous vehicles.
An underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning comprises the following steps:
step one: establishing an underwater autonomous vehicle model and a kinematics model, so as to obtain information about the obstacles around the underwater autonomous vehicle;
step two: acquiring motion state information of maneuvering obstacles around an underwater autonomous vehicle, and constructing a dynamic obstacle state equation, wherein the motion state information comprises: motion state vectors, state transition matrices, process noise and input control matrices;
step three: predicting a dynamic obstacle kinematics model from the dynamic obstacle state equation by using a probabilistic data association particle filtering method;
step four: establishing an online training environment of multiple dynamic obstacles in a Cartesian coordinate system according to the information of the obstacles around the underwater autonomous vehicle obtained in the step one and the dynamic obstacle kinematics model obtained in the step three, and fusing a multi-dynamic obstacle avoiding method to generate an obstacle avoiding strategy;
step five: converting the obstacle avoidance strategy generated in the step four into an MDP model, and establishing a state set and an action set of the MDP model when the underwater autonomous vehicle faces a plurality of dynamic obstacles;
step six: taking the state set as the input of the MDP model and the action set as its output, and training the MDP model in combination with a deep deterministic policy gradient (DDPG) algorithm until the underwater autonomous vehicle under the MDP model can reach the target area without collision;
step seven: and guiding the underwater autonomous vehicle to navigate by using the trained MDP model.
Further, the underwater autonomous vehicle model of step one comprises one tail thruster, two side thrusters and 7 obstacle avoidance sonars; the ranging sonars of the underwater autonomous vehicle model have a sampling frequency of 2 Hz and a detection distance of 150 m to 200 m, and are distributed at the following angles in the body coordinate system: −90°, −60°, −30°, 0°, 30°, 60° and 90°;
the kinematic model is a 3-degree-of-freedom model in the horizontal plane, with the equation:

η̇ = R(ψ)·υ

where η = [x, y, ψ]^T is the horizontal-plane position vector of the underwater autonomous vehicle in the geodetic coordinate system, υ is the horizontal-plane velocity vector of the vehicle in the body-fixed frame, R(ψ) is the transformation matrix, ψ is the yaw angle of the vehicle, and r is the yaw angular velocity of the vehicle in the body-fixed coordinate system.
Further, the dynamic obstacle state equation of step two comprises a discrete-time state equation of the uniform (constant-velocity) motion model with sampling interval T, and a discrete-time state equation of the uniform-acceleration motion model with sampling interval T.

The discrete-time state equation of the uniform motion model with sampling interval T is:

X_{k+1} = F_CV·X_k + ω_{k+1}

where X_{k+1} and X_k are the states of the uniform motion model at times k+1 and k respectively, F_CV is the state transition matrix of the uniform motion model, and ω_{k+1} is the process noise of the uniform motion model in discrete time.

The discrete-time state equation of the uniform-acceleration motion model with sampling interval T is:

X^a_{k+1} = F_CA·X^a_k + ω^a_{k+1}

where X^a_{k+1} and X^a_k are the states of the uniform-acceleration motion model at times k+1 and k respectively, F_CA is the state transition matrix of the uniform-acceleration motion model, and ω^a_{k+1} is the process noise of the uniform-acceleration motion model in discrete time.
Further, the state transition matrix F_CV of the uniform motion model, for the state [x, ẋ, y, ẏ]^T, is:

F_CV =
[1 T 0 0]
[0 1 0 0]
[0 0 1 T]
[0 0 0 1]

and the state transition matrix F_CA of the uniform-acceleration motion model, for the state [x, ẋ, ẍ, y, ẏ, ÿ]^T, is:

F_CA =
[1 T T²/2 0 0 0   ]
[0 1 T    0 0 0   ]
[0 0 1    0 0 0   ]
[0 0 0    1 T T²/2]
[0 0 0    0 1 T   ]
[0 0 0    0 0 1   ]
further, in the fourth step, a training environment map model is constructed by combining terrain information of the water area environment where the underwater autonomous vehicle is located, and then a plurality of dynamic obstacles are loaded in the training environment map model according to the dynamic obstacle kinematics model, so that the online training environment of the plurality of dynamic obstacles under a Cartesian coordinate system is obtained.
Further, in step four, the target-seeking behavior of the underwater autonomous vehicle is modeled by a gravitational (attractive) potential field function, and the dynamic-obstacle-avoidance behavior of the underwater autonomous vehicle by a repulsive potential field function,
the obstacle avoidance strategy is as follows:
when the sonar of the underwater autonomous vehicle detects the dynamic barrier, whether the dynamic barrier enters the action area of the repulsive potential field of the underwater autonomous vehicle is judged,
if so, the obstacle avoidance subtask priority is greater than the target tendency subtask priority, the course angle is continuously changed until the dynamic obstacle is separated from the repulsive force field action domain of the underwater autonomous vehicle,
and if not, the target trend subtask priority is greater than the obstacle avoidance subtask priority, and the heading is adjusted to be a pointing target, so that the underwater autonomous vehicle drives to a target area.
Further, the gravitational potential field function is:

U_att(q_t) = (1/2)·k_1·[(x_t − x_goal)² + (y_t − y_goal)²]

where k_1 is the attractive potential gain coefficient, x_t and y_t are respectively the abscissa and ordinate of the underwater autonomous vehicle position at time t in the Cartesian coordinate system, and x_goal and y_goal are respectively the abscissa and ordinate of the centre of the target area in the Cartesian coordinate system;

the repulsive potential field function is:

U_rep(q_t) = (1/2)·k_2·(1/d(q_t, q′_t) − 1/d_0)²  if d(q_t, q′_t) ≤ d_0,  otherwise 0

where k_2 is the repulsive potential gain coefficient, x′_t and y′_t are respectively the abscissa and ordinate of the dynamic obstacle position at time t in the Cartesian coordinate system, d(q_t, q′_t) is the distance between the underwater autonomous vehicle and the dynamic obstacle at time t, q_t = (x_t, y_t), q′_t = (x′_t, y′_t), d_0 is the influence distance of the repulsive potential field of the underwater autonomous vehicle, and L_1 and L_2 are respectively the major-axis length and the minor-axis length of the ellipse to which the underwater autonomous vehicle is expanded.
Further, in the fifth step, the MDP model expression is:
MDP = (S, A, P_sa, R),

where S is the state set, A is the action set, P_sa is the state transition probability, and R is the reward function.
Further, in step five, when facing a plurality of dynamic obstacles, the state set of the MDP model is S = {S_1, S_2, ..., S_t, ..., S_T}, where S_t = {d_1(t), d_2(t), ..., d_7(t)} is the set of signals collected by the 7 obstacle avoidance sonars of the underwater autonomous vehicle at time t;

in step five, when facing a plurality of dynamic obstacles, the action set of the MDP model is A = {a_1, a_2, ..., a_t, ..., a_T}, where a_t = {ω(t), v(t)}, and ω(t) and v(t) are respectively the yaw angular velocity and the horizontal velocity of the underwater autonomous vehicle at time t.
Further, the reward value r_t of the MDP reward function R at time t is:

r_t = τ_1·r_1(s_t, a_t, s_{t+1}) + τ_2·r_2(s_t, a_t, s_{t+1}) + τ_3·r_3(s_t, a_t, s_{t+1}),

where τ_1, τ_2 and τ_3 are respectively the proportionality coefficients of the target module, the safety module and the stability module, and r_1(s_t, a_t, s_{t+1}), r_2(s_t, a_t, s_{t+1}) and r_3(s_t, a_t, s_{t+1}) are the reward values at time t of the target module, the safety module and the stability module respectively.
The deep-reinforcement-learning-based dynamic obstacle avoidance method can improve the dynamic obstacle avoidance capability of the underwater autonomous vehicle, effectively handles situations in which several dynamic obstacles simultaneously threaten its safety, and improves its safety in complex dynamic water environments.
At the same time, because obstacle avoidance is trained in the online training environment, collision damage to the underwater autonomous vehicle is avoided; and since the dynamic obstacle avoidance strategy is obtained in combination with the vehicle's kinematic model, it can be applied directly to the actual vehicle without a secondary motion-planning stage. Compared with the traditional separation of path planning and motion planning, this saves manpower and material resources.
Drawings
Fig. 1 is a flowchart of the dynamic obstacle avoidance method for an autonomous underwater vehicle based on deep reinforcement learning according to an embodiment;
Fig. 2 is a flow chart of path planning and obstacle avoidance for an autonomous underwater vehicle in a multiple dynamic obstacle environment;
Fig. 3 is a schematic illustration of an underwater autonomous vehicle in a multiple dynamic obstacle environment;
Fig. 4 is a schematic view of the expansion (outline inflation) of the autonomous underwater vehicle;
Fig. 5 is a schematic block diagram of an obstacle avoidance controller for an underwater autonomous vehicle in a multi-dynamic-obstacle environment, based on the DDPG algorithm.
Detailed Description
The deep deterministic policy gradient (DDPG) algorithm has good online adaptability and learning capacity for nonlinear systems, and is widely studied in the fields of artificial intelligence, machine learning and automatic control. It can therefore be applied to the control system of an underwater autonomous vehicle to realize an autonomous obstacle avoidance function and improve adaptability to the environment. In addition, the DDPG algorithm mitigates problems of other planning methods such as the curse of dimensionality, long planning times and low accuracy, and is of practical significance for research on avoidance of multiple dynamic obstacles by an autonomous underwater vehicle.
The first embodiment is as follows: specifically describing the embodiment with reference to fig. 1 to 5, the method for dynamically avoiding obstacles of an autonomous underwater vehicle based on deep reinforcement learning in the embodiment includes the following steps:
the method comprises the following steps: and establishing an underwater autonomous vehicle model and a kinematics model so as to obtain the information of obstacles around the underwater autonomous vehicle.
In this embodiment, the underwater autonomous vehicle model comprises one tail thruster, two side thrusters and 7 obstacle avoidance sonars. The tail thruster and the two side thrusters realize turning, advancing and retreating of the vehicle, and information about surrounding obstacles is obtained through the 7 obstacle avoidance sonars. The ranging sonars of the model have a sampling frequency of 2 Hz and a detection distance of 200 m, and are distributed at the following angles in the body coordinate system: −90°, −60°, −30°, 0°, 30°, 60° and 90°.
The kinematic model is a 3-degree-of-freedom model in the horizontal plane, with the equation:

η̇ = R(ψ)·υ

where η = [x, y, ψ]^T ∈ R³ is the horizontal-plane position vector of the underwater autonomous vehicle in the geodetic coordinate system, comprising the horizontal position coordinates and the yaw angle; υ = [u, v, r]^T ∈ R³ is the horizontal-plane velocity vector of the vehicle in the body-fixed frame, with u, v and r respectively the X-axis component, the Y-axis component and the yaw angular velocity of the velocity in the body-fixed coordinate system; ψ is the yaw angle of the vehicle; and R(ψ) is the transformation matrix:

R(ψ) =
[cos ψ  −sin ψ  0]
[sin ψ   cos ψ  0]
[0       0      1]
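The kinematic equation above can be sketched numerically in Python. This is a minimal illustration only; the use of `numpy` and explicit Euler integration with step `dt` are assumptions, not part of the patent:

```python
import numpy as np

def rotation_matrix(psi):
    """R(psi): body-fixed to geodetic horizontal-plane transformation."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def step_kinematics(eta, nu, dt):
    """One explicit-Euler step of eta_dot = R(psi) @ nu.

    eta = [x, y, psi] in the geodetic frame; nu = [u, v, r] in the body frame.
    """
    return eta + rotation_matrix(eta[2]) @ nu * dt
```

For example, with yaw ψ = π/2 a pure surge velocity u moves the vehicle along the geodetic Y axis, as the transformation matrix requires.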
in order to facilitate on-line training, the underwater autonomous vehicle is simplified into a rectangle in the embodiment, and the fact that the actual navigation of the underwater autonomous vehicle can be directly guided by a strategy trained by an on-line training system is guaranteed by combining a kinematics model of the underwater autonomous vehicle.
Step two: the precondition for the underwater autonomous vehicle to successfully avoid several dynamic obstacles is as follows: after the vehicle has accurately acquired the motion state of a dynamic obstacle at certain moments through its onboard sensors, the sensors continue to observe the obstacle, and the observation data are analysed and processed to obtain a kinematic model that accurately predicts and estimates the dynamic obstacle; this model is then imported into the obstacle avoidance planning system for obstacle avoidance. Specifically:
Motion state information of the maneuvering obstacles around the underwater autonomous vehicle is acquired with the vehicle's onboard sensors, comprising: motion state vectors, state transition matrices, process noise, and input control matrices. For the motion characteristics of dynamic obstacles, and considering the influence of random disturbances and system noise on the obstacle's maneuvering in actual operation, this embodiment adds random noise to linear-Gaussian uniform motion (CV) and uniform-acceleration motion (CA) models, and establishes a kinematic model for predicting and estimating the future motion state of the dynamic obstacle. Specifically, a dynamic obstacle state equation is constructed from the motion state information, comprising: a discrete-time state equation of the uniform motion model with sampling interval T, and a discrete-time state equation of the uniform-acceleration motion model with sampling interval T.
The dynamic obstacle state equation is expressed in terms of the position and velocity of the dynamic obstacle, with an acceleration term of Gaussian white noise added to represent the slight velocity changes caused by external disturbances in the underwater environment. The state vector of the dynamic obstacle at time t is:

X(t) = [x(t), ẋ(t), ẍ(t), y(t), ẏ(t), ÿ(t)]^T

where x(t) and y(t) are respectively the abscissa and ordinate of the obstacle position, ẋ(t) and ẏ(t) are respectively the lateral and longitudinal velocities of the obstacle, and ẍ(t) and ÿ(t) are respectively the lateral and longitudinal accelerations of the obstacle.
The discrete-time state equation of the uniform motion model with sampling interval T is:

X_{k+1} = F_CV·X_k + ω_{k+1}

where X_{k+1} and X_k are the states of the uniform motion model at times k+1 and k respectively, and F_CV is the state transition matrix of the uniform motion model; for the state [x, ẋ, y, ẏ]^T,

F_CV =
[1 T 0 0]
[0 1 0 0]
[0 0 1 T]
[0 0 0 1]

and ω_{k+1} is the process noise of the uniform motion model in discrete time.

The discrete-time state equation of the uniform-acceleration motion model with sampling interval T is:

X^a_{k+1} = F_CA·X^a_k + ω^a_{k+1}

where X^a_{k+1} and X^a_k are the states of the uniform-acceleration motion model at times k+1 and k respectively, and F_CA is the state transition matrix of the uniform-acceleration motion model; for the state [x, ẋ, ẍ, y, ẏ, ÿ]^T,

F_CA =
[1 T T²/2 0 0 0   ]
[0 1 T    0 0 0   ]
[0 0 1    0 0 0   ]
[0 0 0    1 T T²/2]
[0 0 0    0 1 T   ]
[0 0 0    0 0 1   ]

and ω^a_{k+1} is the process noise of the uniform-acceleration motion model in discrete time.
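The CV/CA propagation above can be sketched as follows. The state orderings ([x, ẋ, y, ẏ] for CV, [x, ẋ, ẍ, y, ẏ, ÿ] for CA) and the isotropic Gaussian noise model are assumptions made for the sketch:

```python
import numpy as np

def f_cv(T):
    """CV transition matrix for the state [x, x_dot, y, y_dot]."""
    block = np.array([[1.0, T], [0.0, 1.0]])
    return np.kron(np.eye(2), block)  # block-diagonal over the two axes

def f_ca(T):
    """CA transition matrix for the state [x, x_dot, x_ddot, y, y_dot, y_ddot]."""
    block = np.array([[1.0, T, 0.5 * T ** 2],
                      [0.0, 1.0, T],
                      [0.0, 0.0, 1.0]])
    return np.kron(np.eye(2), block)

def propagate(X, F, noise_std=0.0, rng=None):
    """One step of X_{k+1} = F @ X_k + w_{k+1}, with w ~ N(0, noise_std^2 I)."""
    w = 0.0 if rng is None else rng.normal(0.0, noise_std, X.shape)
    return F @ X + w
```

With T = 1 and a CA state having ẋ = 1 and ẍ = 2, one step advances x by T·ẋ + T²/2·ẍ = 2, matching the F_CA block above.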
step three: and predicting a dynamic obstacle kinematics model according to a dynamic obstacle state equation by utilizing a particle filtering method associated with probability data. Namely: after the motion state of the dynamic barrier at some moments is accurately acquired through a sensor carried by the underwater autonomous vehicle, the sensor continuously observes the barrier, observation data is analyzed and processed, and a kinematics model describing the kinematics characteristic of the dynamic barrier is established and estimating the dynamics characteristic of the dynamic barrier is established by combining the established uniform velocity motion (CV) model and uniform acceleration motion (CA) model of the dynamic barrier. And leading the kinematic model for predicting and estimating the dynamic barrier into an obstacle avoidance planning system to carry out obstacle avoidance control on the underwater autonomous vehicle.
Step four: establish an online training environment with multiple dynamic obstacles in a Cartesian coordinate system according to the obstacle information obtained in step one and the dynamic obstacle kinematic model obtained in step three, and generate an obstacle avoidance strategy by fusing a multi-dynamic-obstacle avoidance method. Specifically:
A training environment map model is constructed by combining terrain information of the water area in which the underwater autonomous vehicle operates; several dynamic obstacles are then loaded into the map model according to the dynamic obstacle kinematic model, yielding the online training environment with multiple dynamic obstacles in a Cartesian coordinate system.
The multi-dynamic-obstacle avoidance strategy is designed by combining the idea of the artificial potential field method, a local path planning algorithm. The target-seeking behavior of the underwater autonomous vehicle is taken as a gravitational (attractive) potential field function:

U_att(q_t) = (1/2)·k_1·[(x_t − x_goal)² + (y_t − y_goal)²]

where k_1 is the attractive potential gain coefficient (0.01 may be taken in practice), x_t and y_t are respectively the abscissa and ordinate of the vehicle position at time t in the Cartesian coordinate system, and x_goal and y_goal are respectively the abscissa and ordinate of the centre of the target area in the Cartesian coordinate system;

the dynamic-obstacle-avoidance behavior of the underwater autonomous vehicle is taken as its repulsive potential field function:

U_rep(q_t) = (1/2)·k_2·(1/d(q_t, q′_t) − 1/d_0)²  if d(q_t, q′_t) ≤ d_0,  otherwise 0

where k_2 is the repulsive potential gain coefficient (0.1 may be taken in practice), x′_t and y′_t are respectively the abscissa and ordinate of the dynamic obstacle position at time t in the Cartesian coordinate system, d(q_t, q′_t) is the distance between the underwater autonomous vehicle and the dynamic obstacle at time t, q_t = (x_t, y_t), q′_t = (x′_t, y′_t), and d_0 is the influence distance of the repulsive potential field of the underwater autonomous vehicle.
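The two potential fields can be sketched directly. The quadratic attractive and inverse-distance repulsive forms are the classical artificial-potential-field shapes and are assumed here (the source renders the exact equations as images); the gains 0.01 and 0.1 follow the embodiment, while the influence distance `D0` is a hypothetical value:

```python
import numpy as np

K1, K2 = 0.01, 0.1  # gain coefficients suggested in the embodiment
D0 = 50.0           # assumed influence distance d0 of the repulsive field (m)

def u_att(q, q_goal, k1=K1):
    """Attractive potential toward the target area (classical quadratic form)."""
    q, q_goal = np.asarray(q, float), np.asarray(q_goal, float)
    return 0.5 * k1 * np.sum((q - q_goal) ** 2)

def u_rep(q, q_obs, k2=K2, d0=D0):
    """Repulsive potential of a dynamic obstacle; zero outside the influence domain."""
    d = np.linalg.norm(np.asarray(q, float) - np.asarray(q_obs, float))
    if d > d0:
        return 0.0
    return 0.5 * k2 * (1.0 / d - 1.0 / d0) ** 2
```

As required by the strategy, the repulsive potential grows as the obstacle closes in and vanishes once the obstacle leaves the influence domain.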
Because the shape of a dynamic obstacle is uncertain while the shape of the underwater autonomous vehicle is known, this embodiment expands the vehicle's outline to give it a certain safety margin:

L_1 = α·L,  L_2 = α·B

where α is a constant greater than 1; L and B are respectively the maximum length and the width of the underwater vehicle; and L_1 and L_2 are respectively the major-axis length and the minor-axis length of the ellipse to which the underwater autonomous vehicle is expanded.
Unlike the repulsive field of the obstacle in the classical artificial potential field method, here the repulsive potential field domain is attached to the underwater autonomous vehicle itself. When a dynamic obstacle enters this domain, the smaller the distance between them, the larger the repulsive force on the vehicle; conversely, the larger the distance, the smaller the repulsive force. When a heading action of the vehicle takes the obstacle out of the repulsive potential field domain, the repulsive force on the vehicle is zero. The obstacle avoidance strategy is as follows:
when the sonar of the underwater autonomous vehicle detects the dynamic barrier, whether the dynamic barrier enters the action area of the repulsive potential field of the underwater autonomous vehicle is judged,
if so, the obstacle avoidance subtask priority is greater than the target tendency subtask priority, the course angle is continuously changed until the dynamic obstacle is separated from the repulsive force field action domain of the underwater autonomous vehicle,
and if not, the target trend subtask priority is greater than the obstacle avoidance subtask priority, and the heading is adjusted to be a pointing target, so that the underwater autonomous vehicle drives to a target area.
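The priority rule above reduces to a simple decision over the currently detected obstacle distances. A minimal sketch (the string labels for the two subtasks are illustrative):

```python
def choose_subtask(obstacle_distances, d0):
    """Obstacle avoidance has priority while any detected dynamic obstacle is
    inside the repulsive influence domain (distance <= d0); otherwise the
    vehicle points its heading at the target area."""
    if any(d <= d0 for d in obstacle_distances):
        return "avoid_obstacle"
    return "seek_goal"
```

In the full system the "avoid_obstacle" branch keeps changing the heading angle until every obstacle leaves the influence domain, after which "seek_goal" re-points the heading at the target.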
Step five: in order to carry out deep reinforcement learning training, the obstacle avoidance strategy generated in step four is converted into an MDP (Markov Decision Process) model, which is the quadruple:

MDP = (S, A, P_sa, R),

where S is the state set, A is the action set, P_sa is the state transition probability, and R is the reward function.
The autonomous underwater vehicle in this embodiment is fully actuated, so its heading angle ranges over [−π, +π] rad. Considering the limits of the vehicle's own maneuverability, the action space of the final MDP model is defined as the yaw angular velocity and the horizontal velocity; the action a_t at time t is:

a_t = {ω(t), v(t)},

where a_t is the action at time t, and ω(t) and v(t) are respectively the yaw angular velocity and the horizontal velocity of the underwater autonomous vehicle at time t.
The vehicle interacts with the environment directly through its sonar sensors, so the state is defined as the signals collected at time t by the 7 obstacle avoidance sonars of the underwater autonomous vehicle. Considering the limits of the vehicle's detection equipment, each detection distance d_i(t) ranges over [0, 200] m; the state S_t at time t is:

S_t = {d_1(t), d_2(t), ..., d_7(t)}
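Building the state vector from the sonar returns is then mechanical. A minimal sketch: the 200 m clipping limit comes from the embodiment, while the normalization to [0, 1] is an added convenience for network training, not part of the patent:

```python
import numpy as np

D_MAX = 200.0  # sonar detection limit (m)

def make_state(sonar_ranges):
    """Build the MDP state s_t from the 7 obstacle-avoidance sonar ranges,
    clipping each reading to [0, D_MAX] and scaling to [0, 1]."""
    d = np.clip(np.asarray(sonar_ranges, dtype=float), 0.0, D_MAX)
    return d / D_MAX
```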
in the embodiment, the path planning and obstacle avoidance control method in the multiple dynamic obstacle environments proposed in the step two is fused into the specific setting of the reward function of the deep reinforcement learning MDP model, and the steps are mainly considered as follows;
setting the tendency target behavior of the autonomous underwater vehicle as the reward value r of the target module at the moment t1(st,at,st+1) Combining the gravitational potential field function in step 2, r1(st,at,st+1) The method specifically comprises the following steps:
Figure BDA0002915018900000092
when the underwater autonomous vehicle arrives at the target area, updating the reward value of the target module:
r1(st,at,st+1)←r1(st,at,st+1)+R
wherein R is a normal number.
Setting barrier avoiding behavior of the autonomous underwater vehicle as an award value r of a safety module at the moment t2(st,at,st+1) When the next state of the autonomous underwater vehicle heading from a safe area is not a safe area, namely: enabling the dynamic barrier to enter the repulsive force field action domain of the underwater autonomous vehicle, combining the repulsive force field function of the underwater autonomous vehicle in the step 2 and the reward value r of the underwater autonomous vehicle2(st,at,st+1) Comprises the following steps:
[equation shown as an image in the original publication]
When the distance between the underwater autonomous vehicle and an obstacle is smaller than the minimum safe distance, a collision with the obstacle is indicated, and the reward value of the safety module is updated:
r_2(s_t, a_t, s_{t+1}) ← r_2(s_t, a_t, s_{t+1}) − R,  when d(q_t, q_0) ≤ r_s
wherein r_s is the minimum safe distance of the underwater autonomous vehicle.
In order to avoid large fluctuations of the speed and heading of the underwater vehicle when it navigates in a safe area, and to reduce ocean-current interference, the invention sets the reward value r_3(s_t, a_t, s_{t+1}) of the stability module at time t:
r_3(s_t, a_t, s_{t+1}) = −0.01 × (|v_{t+1} − v_t| + |ω_{t+1} − ω_t| + |sin(ψ_t − φ)|)
wherein v_t and v_{t+1} are respectively the horizontal speeds of the underwater autonomous vehicle at times t and t+1, ω_t and ω_{t+1} are respectively the yaw angular velocities of the vehicle at times t and t+1, and ψ_t and φ are respectively the yaw angle of the vehicle at time t and the water-flow direction angle in the Cartesian coordinate system.
The final reward value r_t of the reward function R in the MDP model at time t is:
r_t = τ_1 r_1(s_t, a_t, s_{t+1}) + τ_2 r_2(s_t, a_t, s_{t+1}) + τ_3 r_3(s_t, a_t, s_{t+1}),
wherein τ_1 is the scale factor of the target module, τ_2 is the scale factor of the safety module, and τ_3 is the scale factor of the stability module.
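A minimal Python sketch of the modular reward composition above; the τ values are illustrative placeholders, since the patent does not disclose concrete coefficients, and r_1 and r_2 are assumed to be computed elsewhere by the target and safety modules:

```python
import math

# Illustrative placeholder values for (tau_1, tau_2, tau_3); not from the patent.
TAU = (0.5, 0.4, 0.1)  # (target, safety, stability) scale factors

def stability_reward(v_t, v_t1, omega_t, omega_t1, psi_t, phi):
    """r_3 = -0.01 * (|v_{t+1}-v_t| + |omega_{t+1}-omega_t| + |sin(psi_t - phi)|):
    penalises speed and heading fluctuation and misalignment with the current."""
    return -0.01 * (abs(v_t1 - v_t) + abs(omega_t1 - omega_t)
                    + abs(math.sin(psi_t - phi)))

def total_reward(r1, r2, r3, tau=TAU):
    """r_t = tau_1*r_1 + tau_2*r_2 + tau_3*r_3."""
    return tau[0] * r1 + tau[1] * r2 + tau[2] * r3
```

The stability term reproduces the r_3 formula above; the weighted sum reproduces r_t.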
The MDP model is set as follows: the motion-planning task is in the horizontal plane with three degrees of freedom. Time is discretized, and the obstacle-avoidance system outputs periodically at its sampling period T_S = 0.5 s. Thus, after receiving the state information s_t at time t, the underwater autonomous vehicle outputs an action u_t ∈ A; the reward value generated at time t is r_t = f(s_t), and the state changes to s_{t+1}. That is, the output action u_t is determined by the strategy π, which maps the state s_t to a probability over the actions: π: S → P(A).
The state set and action set of the MDP model when the underwater autonomous vehicle faces a plurality of dynamic obstacles are then established. The state set of the MDP model in the face of multiple dynamic obstacles is S = {S_1, S_2, ..., S_t, ..., S_T}, with
S_t = [D_1(t), D_2(t), ..., D_7(t)]
where S_t comprises the signals collected by the 7 obstacle-avoidance sonars of the underwater autonomous vehicle at time t. In step five, when facing a plurality of dynamic obstacles, the action set of the MDP model is A = {a_1, a_2, ..., a_t, ..., a_T}.
Step six: set of states S ═ S1,S2,...,St,...,STAs the input of the MDP model, the action set a ═ a1,a2,...,at,...,aTAnd (4) as the output of the MDP model, training the MDP model by combining with a deterministic depth strategy gradient algorithm until the underwater autonomous vehicle under the MDP model can reach a target area without collision.
The multi-dynamic-obstacle avoidance MDP model of the underwater autonomous vehicle is fused into an online training environment with a plurality of dynamic obstacles, thereby establishing the theoretical framework of the simulation training platform; the simulation environment is then built with the Pyglet library in a Python programming environment. On the basis of the simulation environment module, a deep reinforcement learning training module is written, and the DDPG-based obstacle-avoidance controller for the multi-dynamic-obstacle environment of the underwater autonomous vehicle, written in Python, is imported, as shown in fig. 3. The initial parameters of the underwater autonomous vehicle, the initial parameters of the dynamic obstacles and the neural-network training hyper-parameters are set.
Training is carried out: the underwater autonomous vehicle moves at its initial speed and initial yaw angle in the multi-dynamic-obstacle environment, and the environmental data detected by its 7 sonars serve as the deep-reinforcement-learning state. When no obstacle is within the detection range of the 7 sonars, the vehicle is allowed to continue learning and exploring; the target module continuously updates its reward, and the vehicle keeps tending toward the target until it reaches the target area, at which point the episode of learning ends.
When there are obstacles within the detection range of the 7 sonars and an obstacle is detected to have entered the repulsive-field action domain of the underwater autonomous vehicle, a potential collision is indicated and the vehicle obtains continuous negative reward values; it changes its heading and speed and continuously tries to move the dynamic obstacle out of the repulsive-field action domain. If the vehicle collides with an obstacle during this exploratory process, the episode ends and the vehicle returns to the starting point to restart learning; if the maneuver succeeds in avoiding the obstacle and the vehicle returns to a safe area, it continues to learn and explore toward the target area.
These operations run in a continuous loop; each episode ends when the target area is reached without collision, which indicates that the training has converged. After 10000 episodes have been run, training ends and the learned strategy is saved. The test module is then run, the trained deep-reinforcement-learning strategy is called, and a collision-free path is generated.
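The training procedure above can be sketched as the following episode loop (the environment, agent and method names are hypothetical stand-ins for the Pyglet simulation and the DDPG networks, not an interface disclosed by the patent):

```python
def train(env, agent, episodes=10000, max_steps=2000):
    """Episode loop: explore, store transitions, learn, reset on termination."""
    for ep in range(episodes):
        s = env.reset()                      # vehicle back at the start point
        for t in range(max_steps):
            a = agent.act(s)                 # yaw rate and horizontal speed
            s_next, r, done, info = env.step(a)
            agent.remember(s, a, r, s_next)  # add transition to replay pool
            agent.learn()                    # DDPG update from replay samples
            s = s_next
            if done:                         # collision or target area reached
                break
    agent.save("obstacle_avoidance_policy")  # store the learned strategy
```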
The episode counter Ep is initialized to 1, with a maximum of 10000 episodes; within episode Ep, the time step is initialized to t = 1, with a maximum of 2000 time steps. The online Actor policy network selects, according to the current state s_t and the policy, an action set comprising the yaw rate and the horizontal speed of the underwater autonomous vehicle; the action in the current state is expressed by: a_t = μ(s_t | θ^μ) + N_t
According to the output action a_t, the differential expression is obtained by combining the 3-degree-of-freedom horizontal-plane kinematic model of the underwater autonomous vehicle, expressed by the following formula:
η̇(t) = R[ψ(t)] υ(t)
wherein η(t) = [x(t), y(t), ψ(t)]^T is the horizontal-plane position vector of the underwater autonomous vehicle in the geodetic coordinate system, comprising the horizontal-plane position coordinates and the yaw angle; υ(t) = [u(t), v(t), r(t)]^T is the horizontal-plane velocity vector of the vehicle in the body-fixed frame, comprising the horizontal speed and the yaw angular velocity; R[ψ(t)] is the transformation matrix; ψ(t) is the yaw angle of the vehicle at time step t; and u(t), v(t) and r(t) respectively denote the X-axis component and Y-axis component of the horizontal velocity vector and the yaw rate of the vehicle in the body-fixed coordinate system at time step t. The differential expression is solved with the fourth-order Runge-Kutta method to obtain the new position vector η(t+1) after the action is executed, expressed by the following formula:
η(t+1) = [x(t+1), y(t+1), ψ(t+1)]^T ∈ R³
The next state s_{t+1} is obtained from the new position vector after the action is executed.
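A sketch of this integration step, assuming the body-fixed velocity vector υ(t) is held constant over the 0.5 s sampling interval (function names are illustrative):

```python
import numpy as np

def eta_dot(eta, nu):
    """Horizontal-plane kinematics eta_dot = R(psi) * nu, with
    eta = [x, y, psi] earth-fixed and nu = [u, v, r] body-fixed."""
    psi = eta[2]
    R = np.array([[np.cos(psi), -np.sin(psi), 0.0],
                  [np.sin(psi),  np.cos(psi), 0.0],
                  [0.0,          0.0,         1.0]])
    return R @ nu

def rk4_step(eta, nu, dt=0.5):
    """One fourth-order Runge-Kutta step, giving eta(t+1) from eta(t);
    nu is held constant over the step (fixed by the action a_t)."""
    k1 = eta_dot(eta, nu)
    k2 = eta_dot(eta + 0.5 * dt * k1, nu)
    k3 = eta_dot(eta + 0.5 * dt * k2, nu)
    k4 = eta_dot(eta + dt * k3, nu)
    return eta + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

For example, with ψ = 0 and υ = [2, 0, 0], one 0.5 s step advances the vehicle 1 m along the X axis.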
When the amount of experience samples stored in the experience pool is greater than or equal to the maximum data storage capacity of the memory bank, len_Max(Data) = M, a small batch of N experience samples is sampled:
{(s_t^k, a_t^k, r_t^k, s_{t+1}^k)}, k = 1, 2, ..., N
where (s_t^k, a_t^k, r_t^k, s_{t+1}^k) denotes the k-th experience sample at time step t, k = 1, 2, ..., N, with N the total number of samples in the small batch. These samples form a data set that is sent to the online policy network, the target policy network, the online evaluation network and the target evaluation network. From the sampled data set, the target policy network outputs the action a'_{t+1} according to the state s_{t+1}, and the target Q value, denoted y_i, is calculated:
y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1} | θ^μ') | θ^Q');
The target evaluation network, according to the state s_{t+1}, the action a'_{t+1} output by the target policy network, and the target Q value y_i, updates the critic's online evaluation network parameter θ through the loss function, according to the following formula:
L = (1/N) Σ_i (y_i − Q(s_i, a_i | θ^Q))²
where L is a loss function.
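A numerical sketch of the target-Q and loss computation for a minibatch, with the networks replaced by hypothetical callables (no specific network architecture is implied by the patent):

```python
import numpy as np

def critic_targets(rewards, next_states, target_actor, target_critic, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for each minibatch sample;
    target_actor and target_critic stand in for the target networks."""
    return np.array([r + gamma * target_critic(s, target_actor(s))
                     for r, s in zip(rewards, next_states)])

def critic_loss(y, states, actions, critic):
    """L = (1/N) * sum_i (y_i - Q(s_i, a_i | theta_Q))^2 (mean squared error)."""
    q = np.array([critic(s, a) for s, a in zip(states, actions)])
    return float(np.mean((y - q) ** 2))
```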
Combining the small batch of N experience samples with the stochastic gradient descent method, the actor network strategy and the online policy network parameter δ are updated by the following formula:
∇_δ J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q)|_{s=s_i, a=μ(s_i)} ∇_δ μ(s | δ)|_{s=s_i}
where ∇_δ J is the sampled policy gradient.
θ' and δ' are updated in the form of soft updates according to the online network parameters θ and δ:
θ' ← τθ + (1 − τ)θ'
δ' ← τδ + (1 − τ)δ'
wherein τ is the weight of the online network parameter;
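A sketch of the soft update applied per parameter array (names illustrative):

```python
import numpy as np

def soft_update(online, target, tau=0.001):
    """theta' <- tau*theta + (1 - tau)*theta', applied element-wise to each
    parameter array; tau is the weight given to the online network parameters."""
    return [tau * o + (1.0 - tau) * t for o, t in zip(online, target)]
```

With a small τ such as 0.001, the target networks track the online networks slowly, which stabilises training.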
When t ≤ 2000 and the underwater autonomous vehicle collides with an obstacle or reaches the target area during exploration, the procedure goes to step 5.2.5 and the episode counter becomes Ep = Ep + 1; when Ep = 10000, the training of the underwater autonomous vehicle in the large-scale continuous obstacle environment is complete, and the learned obstacle-avoidance strategy is saved.
Step seven: and guiding the underwater autonomous vehicle to navigate by using the trained MDP model.
The above description is only a preferred embodiment of the two-dimensional obstacle-avoidance control method for an underwater autonomous vehicle in a complex multi-dynamic-obstacle environment. The scope of protection of this method is not limited to the above embodiments, and all technical schemes belonging to this concept fall within the scope of protection of the invention. It should be noted that modifications and variations that do not depart from the gist of the invention, made by those skilled in the art to which the invention pertains, are also intended to be within the scope of the invention.

Claims (10)

1. An underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning is characterized by comprising the following steps:
the method comprises the following steps: establishing an underwater autonomous vehicle model and a kinematics model so as to obtain the information of obstacles around the underwater autonomous vehicle;
step two: acquiring motion state information of maneuvering obstacles around an underwater autonomous vehicle, and constructing a dynamic obstacle state equation, wherein the motion state information comprises: motion state vectors, state transition matrices, process noise and input control matrices;
step three: predicting a dynamic obstacle kinematics model from the dynamic obstacle state equation by utilizing a probabilistic-data-association particle filtering method;
step four: establishing an online training environment of multiple dynamic obstacles in a Cartesian coordinate system according to the information of the obstacles around the underwater autonomous vehicle obtained in the step one and the dynamic obstacle kinematics model obtained in the step three, and fusing a multi-dynamic obstacle avoiding method to generate an obstacle avoiding strategy;
step five: converting the obstacle avoidance strategy generated in the step four into an MDP model, and establishing a state set and an action set of the MDP model when the underwater autonomous vehicle faces a plurality of dynamic obstacles;
step six: taking the state set as the input of the MDP model and the action set as the output of the MDP model, and training the MDP model in combination with the deep deterministic policy gradient algorithm until the underwater autonomous vehicle under the MDP model can reach the target area without collision;
step seven: and guiding the underwater autonomous vehicle to navigate by using the trained MDP model.
2. The underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning as claimed in claim 1, wherein the underwater autonomous vehicle model in step one comprises: one tail propeller, two side propellers and 7 obstacle-avoidance sonars; the ranging-sonar sampling frequency of the underwater autonomous vehicle model is 2 Hz, the detection distance is 150 m to 200 m, and the distribution angles in the body-fixed coordinate system are: -90 degrees, -60 degrees, -30 degrees, 0 degrees, 30 degrees, 60 degrees and 90 degrees;
the kinematic model is a kinematic model with 3 degrees of freedom on the horizontal plane, and the equation is as follows:
η̇ = R(ψ) υ
wherein η is the horizontal-plane position vector of the underwater autonomous vehicle in the geodetic coordinate system, υ is the horizontal-plane velocity vector of the underwater autonomous vehicle in the body-fixed frame, R(ψ) is the transformation matrix, ψ is the yaw angle of the underwater autonomous vehicle, and r is the yaw angular velocity of the underwater autonomous vehicle in the body-fixed coordinate system.
3. The method for dynamically avoiding obstacles of the underwater autonomous vehicle based on the deep reinforcement learning of claim 1, wherein the dynamic obstacle state equation in the second step comprises: a discrete time state equation of the uniform motion model when the sampling interval is T and a discrete time state equation of the uniform acceleration motion model when the sampling interval is T,
the expression of the discrete time state equation of the uniform motion model when the sampling interval is T is as follows:
X_{k+1} = F_CV X_k + ω_{k+1}
wherein X_{k+1} and X_k are the states of the uniform velocity motion model at times k+1 and k respectively, F_CV is the state transition matrix of the uniform velocity motion model, and ω_{k+1} is the process noise of the uniform velocity motion model in discrete time;
the expression of the discrete time state equation of the uniform acceleration motion model when the sampling interval is T is as follows:
X'_{k+1} = F_CA X'_k + ω'_{k+1}
wherein X'_{k+1} and X'_k are the states of the uniform acceleration motion model at times k+1 and k respectively, F_CA is the state transition matrix of the uniform acceleration motion model, and ω'_{k+1} is the process noise of the uniform acceleration motion model in discrete time.
4. The underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning as claimed in claim 3, wherein the expression of the state transition matrix F_CV of the uniform velocity motion model is:
F_CV = diag(F_1, F_1),
wherein
F_1 = [1 T; 0 1];
and the expression of the state transition matrix F_CA of the uniform acceleration motion model is:
F_CA = diag(F_2, F_2),
wherein
F_2 = [1 T T²/2; 0 1 T; 0 0 1].
5. the method for dynamically avoiding the obstacle of the underwater autonomous vehicle based on the deep reinforcement learning is characterized in that in the fourth step, a training environment map model is constructed by combining terrain information of a water area environment where the underwater autonomous vehicle is located, and then a plurality of dynamic obstacles are loaded in the training environment map model according to a dynamic obstacle kinematics model to obtain an online training environment of the plurality of dynamic obstacles in a Cartesian coordinate system.
6. The dynamic obstacle avoidance method of the underwater autonomous vehicle based on the depth reinforcement learning according to claim 1 or 2, characterized in that in the fourth step, the behavior of the underwater autonomous vehicle toward the target is taken as a gravitational potential field function, the behavior of the underwater autonomous vehicle for avoiding the dynamic obstacle is taken as a repulsive potential field function of the underwater autonomous vehicle,
the obstacle avoidance strategy is as follows:
when the sonar of the underwater autonomous vehicle detects the dynamic barrier, whether the dynamic barrier enters the action area of the repulsive potential field of the underwater autonomous vehicle is judged,
if so, the obstacle avoidance subtask priority is greater than the target tendency subtask priority, the course angle is continuously changed until the dynamic obstacle is separated from the repulsive force field action domain of the underwater autonomous vehicle,
and if not, the target trend subtask priority is greater than the obstacle avoidance subtask priority, and the heading is adjusted to be a pointing target, so that the underwater autonomous vehicle drives to a target area.
7. The underwater autonomous vehicle dynamic obstacle avoidance method based on the depth reinforcement learning of claim 6 is characterized in that the gravitational potential field function expression is as follows:
U_att(q_t) = (1/2) k_1 [(x_t − x_goal)² + (y_t − y_goal)²]
wherein k_1 is the gravitational potential energy gain coefficient, x_t and y_t are respectively the abscissa and ordinate of the position of the underwater autonomous vehicle in the Cartesian coordinate system at time t, and x_goal and y_goal are respectively the abscissa and ordinate of the centre of the target area in the Cartesian coordinate system;
the repulsive force potential field function expression is as follows:
[equation shown as an image in the original publication]
wherein k_2 is the repulsive potential energy gain coefficient, x'_t and y'_t are respectively the abscissa and ordinate of the position of the dynamic obstacle at time t in the Cartesian coordinate system, d(q_t, q'_t) is the distance between the underwater autonomous vehicle and the dynamic obstacle at time t, q_t = (x_t, y_t), q'_t = (x'_t, y'_t), d_0 is the influence distance of the repulsive potential field action domain of the underwater autonomous vehicle, and L_1 and L_2 are respectively the major-axis and minor-axis lengths of the ellipse into which the underwater autonomous vehicle is expanded.
8. The underwater autonomous vehicle dynamic obstacle avoidance method based on the depth reinforcement learning of claim 1 is characterized in that in the fifth step, the MDP model expression is as follows:
MDP=(S,A,Psa,R),
wherein S is the state set, A is the action set, P_sa is the state transition probability, and R is the reward function.
9. The method for dynamically avoiding obstacles of the autonomous underwater vehicle based on deep reinforcement learning as claimed in claim 2, wherein the state set of the MDP model in the face of multiple dynamic obstacles in step five is S = {S_1, S_2, ..., S_t, ..., S_T},
S_t = [D_1(t), D_2(t), ..., D_7(t)]
being the signals collected by the 7 obstacle-avoidance sonars of the underwater autonomous vehicle at time t,
and in step five, in the face of multiple dynamic obstacles, the action set of the MDP model is A = {a_1, a_2, ..., a_t, ..., a_T}, where a_t = {ω(t), v(t)}, with ω(t) and v(t) respectively the yaw rate and the horizontal speed of the underwater autonomous vehicle at time t.
10. The method for dynamically avoiding obstacles of the autonomous underwater vehicle based on deep reinforcement learning as claimed in claim 9, wherein the reward value r_t of the reward function R at time t in the MDP model is:
r_t = τ_1 r_1(s_t, a_t, s_{t+1}) + τ_2 r_2(s_t, a_t, s_{t+1}) + τ_3 r_3(s_t, a_t, s_{t+1}),
wherein τ_1 is the scale factor of the target module, τ_2 is the scale factor of the safety module, τ_3 is the scale factor of the stability module, r_1(s_t, a_t, s_{t+1}) is the reward value of the target module at time t, r_2(s_t, a_t, s_{t+1}) is the reward value of the safety module at time t, and r_3(s_t, a_t, s_{t+1}) is the reward value of the stability module at time t.
CN202110098934.3A 2021-01-25 2021-01-25 Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning Active CN112925319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110098934.3A CN112925319B (en) 2021-01-25 2021-01-25 Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning


Publications (2)

Publication Number Publication Date
CN112925319A true CN112925319A (en) 2021-06-08
CN112925319B CN112925319B (en) 2022-06-07

Family

ID=76167486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110098934.3A Active CN112925319B (en) 2021-01-25 2021-01-25 Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112925319B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113625704A (en) * 2021-06-30 2021-11-09 北京旷视科技有限公司 Obstacle avoidance method and device and automatic navigation device
CN114360266A (en) * 2021-12-20 2022-04-15 东南大学 Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
CN114559439A (en) * 2022-04-27 2022-05-31 南通科美自动化科技有限公司 Intelligent obstacle avoidance control method and device for mobile robot and electronic equipment
CN115657683A (en) * 2022-11-14 2023-01-31 中国电子科技集团公司第十研究所 Unmanned and cableless submersible real-time obstacle avoidance method capable of being used for inspection task
CN116578102A (en) * 2023-07-13 2023-08-11 清华大学 Obstacle avoidance method and device for autonomous underwater vehicle, computer equipment and storage medium
CN117406757A (en) * 2023-12-12 2024-01-16 西北工业大学宁波研究院 Underwater autonomous navigation method based on three-dimensional global vision
CN117406757B (en) * 2023-12-12 2024-04-19 西北工业大学宁波研究院 Underwater autonomous navigation method based on three-dimensional global vision

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001078951A1 (en) * 2000-04-13 2001-10-25 Zhimin Lin Semi-optimal path finding in a wholly unknown environment
CN101795460A (en) * 2009-12-23 2010-08-04 大连理工大学 Markov mobility model suitable for mobile Ad Hoc network in obstacle environment
CN107193009A (en) * 2017-05-23 2017-09-22 西北工业大学 A kind of many UUV cooperative systems underwater target tracking algorithms of many interaction models of fuzzy self-adaption
US20170364081A1 (en) * 2016-01-08 2017-12-21 King Fahd University Of Petroleum And Minerals Seismic sensor deployment with a stereographically configured robot
CN107844460A (en) * 2017-07-24 2018-03-27 哈尔滨工程大学 A kind of underwater multi-robot based on P MAXQ surrounds and seize method
CN108594803A (en) * 2018-03-06 2018-09-28 吉林大学 Paths planning method based on Q- learning algorithms
CN109241552A (en) * 2018-07-12 2019-01-18 哈尔滨工程大学 A kind of underwater robot motion planning method based on multiple constraint target
CN109976340A (en) * 2019-03-19 2019-07-05 中国人民解放军国防科技大学 Man-machine cooperation dynamic obstacle avoidance method and system based on deep reinforcement learning
CN110254422A (en) * 2019-06-19 2019-09-20 中汽研(天津)汽车工程研究院有限公司 A kind of automobile barrier-avoiding method enhancing study and Bezier based on multiple target
CN110673600A (en) * 2019-10-18 2020-01-10 武汉理工大学 Unmanned ship-oriented automatic driving integrated system
US10649453B1 (en) * 2018-11-15 2020-05-12 Nissan North America, Inc. Introspective autonomous vehicle operational management
CN111880535A (en) * 2020-07-23 2020-11-03 上海交通大学 Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning
CN112141098A (en) * 2020-09-30 2020-12-29 上海汽车集团股份有限公司 Obstacle avoidance decision method and device for intelligent driving automobile
CN112188503A (en) * 2020-09-30 2021-01-05 南京爱而赢科技有限公司 Dynamic multichannel access method based on deep reinforcement learning and applied to cellular network
CN112241176A (en) * 2020-10-16 2021-01-19 哈尔滨工程大学 Path planning and obstacle avoidance control method of underwater autonomous vehicle in large-scale continuous obstacle environment


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HIROSHI KAWANO,等: "Real-time Obstacle Avoidance for Underactuated Autonomous Underwater Vehicles in Unknown Vortex Sea Flow by the MDP Approach", 《PROCEEDINGS OF THE 2006 IEEE/RSJ》 *
HIROSHI KAWANO,等: "Real-time Obstacle Avoidance for Underactuated Autonomous Underwater Vehicles in Unknown Vortex Sea Flow by the MDP Approach", 《PROCEEDINGS OF THE 2006 IEEE/RSJ》, 31 December 2006 (2006-12-31) *
邢炜: "基于前视声呐的AUV避障方法研究", 《中国优秀博硕士学位论文全文数据库(硕士)工程科技Ⅱ辑》 *
邢炜: "基于前视声呐的AUV避障方法研究", 《中国优秀博硕士学位论文全文数据库(硕士)工程科技Ⅱ辑》, no. 2020, 15 January 2020 (2020-01-15) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113625704A (en) * 2021-06-30 2021-11-09 北京旷视科技有限公司 Obstacle avoidance method and device and automatic navigation device
CN114360266A (en) * 2021-12-20 2022-04-15 东南大学 Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
CN114360266B (en) * 2021-12-20 2022-12-13 东南大学 Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
CN114559439A (en) * 2022-04-27 2022-05-31 南通科美自动化科技有限公司 Intelligent obstacle avoidance control method and device for mobile robot and electronic equipment
CN114559439B (en) * 2022-04-27 2022-07-26 南通科美自动化科技有限公司 Mobile robot intelligent obstacle avoidance control method and device and electronic equipment
CN115657683A (en) * 2022-11-14 2023-01-31 中国电子科技集团公司第十研究所 Unmanned and cableless submersible real-time obstacle avoidance method capable of being used for inspection task
CN116578102A (en) * 2023-07-13 2023-08-11 清华大学 Obstacle avoidance method and device for autonomous underwater vehicle, computer equipment and storage medium
CN116578102B (en) * 2023-07-13 2023-09-19 清华大学 Obstacle avoidance method and device for autonomous underwater vehicle, computer equipment and storage medium
CN117406757A (en) * 2023-12-12 2024-01-16 西北工业大学宁波研究院 Underwater autonomous navigation method based on three-dimensional global vision
CN117406757B (en) * 2023-12-12 2024-04-19 西北工业大学宁波研究院 Underwater autonomous navigation method based on three-dimensional global vision

Also Published As

Publication number Publication date
CN112925319B (en) 2022-06-07

Similar Documents

Publication Publication Date Title
CN112241176B (en) Path planning and obstacle avoidance control method of underwater autonomous vehicle in large-scale continuous obstacle environment
CN112925319B (en) Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
CN110333739B (en) AUV (autonomous Underwater vehicle) behavior planning and action control method based on reinforcement learning
Chen et al. Path planning and obstacle avoiding of the USV based on improved ACO-APF hybrid algorithm with adaptive early-warning
Cheng et al. Path planning and obstacle avoidance for AUV: A review
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
CN108319293B (en) UUV real-time collision avoidance planning method based on LSTM network
Sun et al. Mapless motion planning system for an autonomous underwater vehicle using policy gradient-based deep reinforcement learning
Lin et al. An improved recurrent neural network for unmanned underwater vehicle online obstacle avoidance
CN108334677B (en) UUV real-time collision avoidance planning method based on GRU network
CN111240345B (en) Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN109765929B (en) UUV real-time obstacle avoidance planning method based on improved RNN
Hadi et al. Deep reinforcement learning for adaptive path planning and control of an autonomous underwater vehicle
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
CN113534668B (en) Maximum entropy based AUV (autonomous Underwater vehicle) motion planning method for actor-critic framework
Pinheiro et al. Trajectory planning for hybrid unmanned aerial underwater vehicles with smooth media transition
Huo et al. Model-free recurrent reinforcement learning for AUV horizontal control
Hadi et al. Adaptive formation motion planning and control of autonomous underwater vehicles using deep reinforcement learning
Liang et al. Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN–LSTM fusion network
CN116069023B (en) Multi-unmanned vehicle formation control method and system based on deep reinforcement learning
CN108459614B (en) UUV real-time collision avoidance planning method based on CW-RNN network
Zang et al. A machine learning enhanced algorithm for the optimal landing problem
CN115657683A (en) Unmanned and cableless submersible real-time obstacle avoidance method capable of being used for inspection task
Zhang et al. Q-learning Based Obstacle Avoidance Control of Autonomous Underwater Vehicle with Binocular Vision
CN117590756B (en) Motion control method, device, equipment and storage medium for underwater robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant