CN111880535B - Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning - Google Patents

Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning

Info

Publication number
CN111880535B
CN111880535B
Authority
CN
China
Prior art keywords
unmanned ship
network
reward
target
coordinate system
Prior art date
Legal status
Active
Application number
CN202010715076.8A
Other languages
Chinese (zh)
Other versions
CN111880535A (en
Inventor
Zhang Weidong (张卫东)
Wang Xuechun (王雪纯)
Xu Xinli (徐鑫莉)
Cai Yunze (蔡云泽)
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010715076.8A priority Critical patent/CN111880535B/en
Publication of CN111880535A publication Critical patent/CN111880535A/en
Application granted granted Critical
Publication of CN111880535B publication Critical patent/CN111880535B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 - Control of position or course in two dimensions
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0257 - using a radar
    • G05D1/0212 - with means for defining a desired trajectory
    • G05D1/0221 - involving a learning process
    • G05D1/0223 - involving speed control of the vehicle
    • G05D1/0276 - using signals provided by a source external to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to a reinforcement learning-based hybrid perception autonomous obstacle avoidance method and system for an unmanned ship, wherein the method comprises the following steps: 1) building a marine environment; 2) setting an action space according to the propeller configuration of the unmanned ship, and learning a reinforcement learning state code from the global planning information provided by the static chart and the obstacle information within the detection radius of the radar system; 3) setting reward target weights to obtain a composite reward function; 4) building and training an evaluation network and a policy network; 5) feeding the reinforcement learning state code into the evaluation network and the policy network respectively, feeding the composite reward function into the evaluation network, and determining the controller output from the action corresponding to the mean of the learned policy. Compared with the prior art, the invention has a high self-learning ability and can adapt to different large-scale complex environments after simple deployment training, thereby realizing autonomous perception, autonomous navigation and autonomous obstacle avoidance.

Description

Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning
Technical Field
The invention relates to an unmanned ship autonomous obstacle avoidance method and system, in particular to an unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning.
Background
An unmanned ship is an unmanned surface vehicle capable of autonomous navigation, autonomous obstacle avoidance and autonomous surface operation, with the advantages of small size, high speed, good stealth and no risk of casualties. It is well suited both to surface operation tasks in dangerous sea areas that pose a high casualty risk to crews and to simple surface tasks requiring little human involvement. With a good cost-effectiveness ratio, unmanned ships have been widely and effectively applied in ocean monitoring, ocean survey, maritime search and rescue, unmanned freight and other fields.
At present, the mainstream approach to autonomous navigation of unmanned ships is to deploy separate algorithms for autonomous perception, autonomous navigation and autonomous obstacle avoidance, which cooperate to complete navigation and operation tasks. For example, vision-based perception involves pattern recognition and target detection algorithms; global-planning autonomous navigation is typically realized with grid maps, the A* algorithm or genetic algorithms; and local dynamic collision avoidance mainly applies methods such as artificial potential fields and optimal reciprocal collision avoidance. Although these methods perform well in their respective application settings, each functional module must be carefully designed, and the combined algorithm requires global configuration and parameter tuning, making the unmanned ship's intelligent algorithm complex and tedious to implement. Moreover, because these methods lack the ability to learn autonomously, they adapt poorly to large-scale complex environments, and the algorithm modules must be redesigned and recombined for each new environment.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provide a reinforcement learning-based unmanned ship hybrid perception autonomous obstacle avoidance method and system with autonomous learning and environmental characteristic adaptation capabilities.
The purpose of the invention can be realized by the following technical scheme:
a reinforcement learning-based unmanned ship hybrid perception autonomous obstacle avoidance method, comprising the following steps:
1) building a marine environment: establishing an interaction rule between the unmanned ship and the marine environment, generating random obstacles, and randomly generating an initial point and a target point for the unmanned ship;
2) setting an action space and a state space: setting the action space according to the propeller configuration of the unmanned ship, and learning a reinforcement learning state code from the global planning information provided by the static chart and the obstacle information within the detection radius of the radar system;
3) determining the reward function: setting reward target weights to obtain a composite reward function;
4) establishing and training an evaluation network and a policy network: the evaluation network and the policy network are each formed by connecting a state-encoding network with a perceptron, and the network parameters are initialized and trained;
5) agent decision controller output: feeding the reinforcement learning state code into the evaluation network and the policy network respectively, feeding the composite reward function into the evaluation network, and determining the controller output from the action corresponding to the mean of the learned policy.
Preferably, the interaction rule between the unmanned ship and the marine environment in step 1) follows the unmanned ship's own dynamic equations.
Preferably, the random obstacles generated in step 1) are of four kinds: random static obstacles that can be depicted on a chart, random dynamic obstacles that cannot be depicted on a chart, random dynamic obstacles with autonomous control capability, and random dynamic obstacles without autonomous control capability.
Preferably, the action space in step 2) comprises the discretized surge force and yaw moment.
Preferably, the reinforcement learning state code in step 2) is obtained through deep network learning, specifically:
the features of the static chart are learned through a convolutional neural network combined with fully connected layers to obtain a static planning state code; this static planning state code, together with the dynamic obstacle avoidance state code fed back by the radar system, serves as the key features of the reinforcement learning state code, whose importance is re-allocated through a learned overall weight matrix to obtain the final reinforcement learning state code.
Preferably, the dynamic obstacle avoidance state code is the nine-tuple

$$s_t^{dyn} = \left(\sigma_t,\ d_t^{target},\ \varphi_t^{target},\ \psi_t,\ u_t,\ v_t,\ r_t,\ d_t^{obs},\ \varphi_t^{obs}\right)$$

where the subscript $t$ denotes time $t$; $\sigma_t$ is the obstacle-detected flag within the detection radius; $d_t^{target}$ and $\varphi_t^{target}$ are the distance and angle from the unmanned ship to the target in the world coordinate system; $\psi_t$ is the yaw angle of the unmanned ship in the world coordinate system; $u_t$, $v_t$ and $r_t$ are the surge speed, sway speed and yaw rate in the unmanned ship's body coordinate system; and $d_t^{obs}$ and $\varphi_t^{obs}$ are the distance and angle of the nearest obstacle in the world coordinate system.
Preferably, the composite reward function in step 3) is the product of a reward target weight matrix and the reward targets, which comprise: a distance reward target, an obstacle avoidance reward target, a speed reward target and an energy consumption reward target.
Preferably, the reward targets are obtained as follows:

in the task of navigating the unmanned ship to the target point, if $d_{t+1}^{target} < d_t^{target}$, the distance reward target $R_{distance} = 1$; otherwise $R_{distance} = 0$, where $d_t^{target}$ is the distance from the unmanned ship to the target in the world coordinate system, the subscript $t$ denoting time $t$ and $t+1$ denoting time $t+1$;

when the radar detects an obstacle and the unmanned ship is within the range threatened by the obstacle, if $d_{t+1}^{obs} > d_t^{obs}$, the obstacle avoidance reward target $R_{obstacle} = 1$; otherwise $R_{obstacle} = 0$, where $d_t^{obs}$ is the distance to the nearest obstacle in the world coordinate system;

if $\sqrt{u_t^2 + v_t^2} \geq v_{th}$, the speed reward target $R_{speed} = 1$; otherwise $R_{speed} = 0$, where $u_t$ is the surge speed and $v_t$ the sway speed in the unmanned ship's body coordinate system, and $v_{th}$ is the set speed threshold;

if $|\tau_u| + |\tau_r| \leq \tau_{th}$, the energy consumption reward target $R_{consumption} = 1$; otherwise $R_{consumption} = 0$, where $\tau_u$ is the surge force and $\tau_r$ the yaw moment of the unmanned ship, and $\tau_{th}$ is the set energy consumption threshold.
Preferably, step 4) is completed based on the A3C (asynchronous advantage actor-critic) algorithm.
An unmanned ship hybrid perception autonomous obstacle avoidance system based on reinforcement learning comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the autonomous obstacle avoidance method when running the computer program.
Compared with the prior art, the invention has the following advantages:
the algorithm has a high self-learning capacity and can adapt to different large-scale complex environments through simple deployment training, thereby realizing autonomous perception, autonomous navigation and autonomous obstacle avoidance;
the algorithm integrates environmental perception with navigation and obstacle avoidance, eliminating the heavy burden of separate configuration and global parameter tuning caused by modular algorithm design;
the algorithm has both static planning and dynamic collision avoidance capabilities: on the one hand, trajectory planning is realized by learning the static sea chart; on the other hand, the algorithm copes with real-time sea-surface threats with a reliable and stable avoidance capability.
Drawings
Fig. 1 is a schematic diagram of the overall structure of the unmanned surface vehicle hybrid sensing autonomous obstacle avoidance method based on reinforcement learning.
Fig. 2 is a schematic diagram of state coding of the unmanned ship hybrid perception reinforcement learning algorithm.
Fig. 3 is a parameter explanatory diagram of dynamic obstacle avoidance coding.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. Note that the following embodiments are merely illustrative; the invention is not limited to the applications or uses described, nor to the embodiments below.
Examples
As shown in fig. 1, an unmanned surface vehicle hybrid perception autonomous obstacle avoidance method based on reinforcement learning includes the following steps:
1) building a marine environment: establishing an interaction rule between the unmanned ship and a marine environment, generating random obstacles, and randomly generating an initial point and a final point of the unmanned ship;
the unmanned ship and marine environment interaction rule follows the self-kinetic equation of the unmanned ship:
Figure BDA0002597880580000041
Figure BDA0002597880580000042
wherein eta is [ x, y, psi ═ x, y, psi]TContaining unmanned boat position and yaw angle information, v ═ u, upsilon, r]TContaining yaw, surge, yaw speed information, [ tau ═u,0,τt]TThe pitching force and the yawing force of the unmanned boat, M is the mass of the unmanned boat, R (psi) is a function of the yaw angle psi, and C (v) and g (v) are functions of v respectively;
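For illustration, the interaction rule can be simulated by numerically integrating the above equations. The following Python sketch assumes a diagonal inertia matrix, rigid-body Coriolis terms and linear damping for $C(\nu)$ and $g(\nu)$; the mass, inertia and damping values are placeholders rather than parameters taken from the invention.

```python
import numpy as np

def usv_step(eta, nu, tau, dt=0.1, m=50.0, Iz=20.0,
             d_lin=np.array([25.0, 40.0, 10.0])):
    """One forward-Euler step of the 3-DOF model
    eta_dot = R(psi) * nu,  M * nu_dot + C(nu) * nu + g(nu) = tau.
    eta = [x, y, psi] (world frame), nu = [u, v, r] (body frame),
    tau = [tau_u, 0, tau_r] for the under-actuated unmanned ship.
    m, Iz and the linear damping d_lin are illustrative placeholders."""
    psi = eta[2]
    u, v, r = nu
    M = np.diag([m, m, Iz])                   # assumed diagonal inertia matrix
    C = np.array([[0.0,   -m * r, 0.0],       # rigid-body Coriolis/centripetal terms
                  [m * r,  0.0,   0.0],
                  [0.0,    0.0,   0.0]])
    g = d_lin * nu                            # g(nu) modelled here as linear damping
    R = np.array([[np.cos(psi), -np.sin(psi), 0.0],
                  [np.sin(psi),  np.cos(psi), 0.0],
                  [0.0,          0.0,         1.0]])
    nu_dot = np.linalg.solve(M, tau - C @ nu - g)
    return eta + dt * (R @ nu), nu + dt * nu_dot
```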
the random obstacles generated include 4 kinds: random static obstacles that can be delineated by a chart, random dynamic obstacles that cannot be delineated by a chart, random dynamic obstacles with autonomous control capability, and random dynamic obstacles without autonomous control capability.
For each generated marine environment, four pairs of initial and target points are set at random, and the agent interacts 500 times with the marine environment for each pair of initial and target points.
2) Setting an action space and a state space: setting the action space according to the propeller configuration of the unmanned ship, and learning a reinforcement learning state code from the global planning information provided by the static chart and the obstacle information within the detection radius of the radar system.
The action space comprises the discretized surge force and yaw moment; a construction sketch is given below.
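A minimal sketch of how such a discrete action set can be constructed follows; the thrust and moment limits are assumptions, while the 20 levels per force follow the embodiment described later.

```python
import itertools
import numpy as np

TAU_U_MAX = 100.0   # assumed surge-force limit
TAU_R_MAX = 50.0    # assumed yaw-moment limit
N_LEVELS = 20       # each propulsion force discretized into 20 levels

surge_levels = np.linspace(-TAU_U_MAX, TAU_U_MAX, N_LEVELS)
yaw_levels = np.linspace(-TAU_R_MAX, TAU_R_MAX, N_LEVELS)

# Joint discrete action set tau = [tau_u, 0, tau_r]: 20 x 20 = 400 actions
ACTIONS = [np.array([tu, 0.0, tr])
           for tu, tr in itertools.product(surge_levels, yaw_levels)]
```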
The reinforcement learning state code is obtained through deep network learning, specifically:
the features of the static chart are learned through a convolutional neural network combined with fully connected layers to obtain a static planning state code; this static planning state code, together with the dynamic obstacle avoidance state code fed back by the radar system, serves as the key features of the reinforcement learning state code, whose importance is re-allocated through a learned overall weight matrix to obtain the final reinforcement learning state code.
The dynamic obstacle avoidance state code is the nine-tuple

$$s_t^{dyn} = \left(\sigma_t,\ d_t^{target},\ \varphi_t^{target},\ \psi_t,\ u_t,\ v_t,\ r_t,\ d_t^{obs},\ \varphi_t^{obs}\right)$$

where the subscript $t$ denotes time $t$; $\sigma_t$ is the obstacle-detected flag within the detection radius; $d_t^{target}$ and $\varphi_t^{target}$ are the distance and angle from the unmanned ship to the target in the world coordinate system; $\psi_t$ is the yaw angle of the unmanned ship in the world coordinate system; $u_t$, $v_t$ and $r_t$ are the surge speed, sway speed and yaw rate in the unmanned ship's body coordinate system; and $d_t^{obs}$ and $\varphi_t^{obs}$ are the distance and angle of the nearest obstacle in the world coordinate system.
The action space of the under-actuated unmanned ship consists of the discretized outputs of the surge force and the yaw moment, each discretized into 20 levels according to the thrust magnitude. Referring to fig. 2, the static planning state code is obtained by learning the sea chart features through a combined CNN and FC network and is finally compressed into a 256-dimensional vector. The nine-tuple of the dynamic obstacle avoidance state code is illustrated in fig. 3. The reinforcement learning state code is the 265-dimensional vector formed by combining the two codes and multiplying them by a learned weight matrix.
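A minimal PyTorch sketch of this hybrid state encoder follows; the chart resolution and convolutional channel counts are assumptions, while the 256-dimensional static code, the 9-dimensional dynamic code and the learned 265 x 265 weight matrix follow the description above.

```python
import torch
import torch.nn as nn

class HybridStateEncoder(nn.Module):
    """Sketch of the hybrid perception state encoder: a CNN + FC stack
    compresses the static chart into a 256-dim planning code, which is
    concatenated with the 9-dim dynamic obstacle-avoidance tuple and
    re-weighted by a learned matrix into the final 265-dim state code.
    The 64x64 chart size and channel counts are assumptions."""
    def __init__(self, chart_size=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * (chart_size // 4) ** 2
        self.fc = nn.Linear(feat, 256)                    # static planning code
        self.reweight = nn.Linear(265, 265, bias=False)   # learned overall weight matrix

    def forward(self, chart, dyn_code):
        # chart: (B, 1, H, W) static sea chart; dyn_code: (B, 9) radar nine-tuple
        static_code = torch.relu(self.fc(self.cnn(chart)))
        s = torch.cat([static_code, dyn_code], dim=1)     # 256 + 9 = 265 dims
        return self.reweight(s)                           # importance re-allocation
```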
3) Determining the reward function: setting reward target weights to obtain a composite reward function.
The composite reward function is the product of a reward target weight matrix and the reward targets, which comprise: a distance reward target, an obstacle avoidance reward target, a speed reward target and an energy consumption reward target.
The reward targets are obtained as follows:

in the task of navigating the unmanned ship to the target point, if $d_{t+1}^{target} < d_t^{target}$, the distance reward target $R_{distance} = 1$; otherwise $R_{distance} = 0$, where $d_t^{target}$ is the distance from the unmanned ship to the target in the world coordinate system, the subscript $t$ denoting time $t$ and $t+1$ denoting time $t+1$;

when the radar detects an obstacle and the unmanned ship is within the range threatened by the obstacle, if $d_{t+1}^{obs} > d_t^{obs}$, the obstacle avoidance reward target $R_{obstacle} = 1$; otherwise $R_{obstacle} = 0$, where $d_t^{obs}$ is the distance to the nearest obstacle in the world coordinate system;

if $\sqrt{u_t^2 + v_t^2} \geq v_{th}$, the speed reward target $R_{speed} = 1$; otherwise $R_{speed} = 0$, where $u_t$ is the surge speed and $v_t$ the sway speed in the unmanned ship's body coordinate system, and $v_{th}$ is the set speed threshold;

if $|\tau_u| + |\tau_r| \leq \tau_{th}$, the energy consumption reward target $R_{consumption} = 1$; otherwise $R_{consumption} = 0$, where $\tau_u$ is the surge force and $\tau_r$ the yaw moment of the unmanned ship, and $\tau_{th}$ is the set energy consumption threshold.
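Taken together, the composite reward can be sketched as the product of the weight matrix (here a weight vector) with the four binary reward targets; the weights and thresholds below are illustrative placeholders, not values fixed by the invention.

```python
import numpy as np

def composite_reward(d_tgt, d_tgt_next, d_obs, d_obs_next, obstacle_seen,
                     u, v, tau_u, tau_r,
                     v_th=1.0, tau_th=80.0,
                     weights=np.array([1.0, 1.0, 0.3, 0.2])):
    """Composite reward = weight vector . [R_distance, R_obstacle,
    R_speed, R_consumption]; all weights/thresholds are placeholders."""
    r_distance = 1.0 if d_tgt_next < d_tgt else 0.0                      # closed in on target
    r_obstacle = 1.0 if (obstacle_seen and d_obs_next > d_obs) else 0.0  # backed off nearest obstacle
    r_speed = 1.0 if np.hypot(u, v) >= v_th else 0.0                     # speed above threshold
    r_consume = 1.0 if abs(tau_u) + abs(tau_r) <= tau_th else 0.0        # low energy use
    return float(weights @ np.array([r_distance, r_obstacle, r_speed, r_consume]))
```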
4) Establishing and training the evaluation network and the policy network: this step is completed based on the A3C algorithm. The evaluation network and the policy network are each formed by connecting the state-encoding network with a perceptron, and the network parameters are initialized and trained. During training, the gradient accumulation of the evaluation network follows the update rule

$$d\theta \leftarrow d\theta + \frac{\partial \left(r_t + \gamma V(s_{t+1};\theta) - V(s_t;\theta)\right)^2}{\partial \theta}$$

and the gradient accumulation of the policy network follows

$$d\omega \leftarrow d\omega + \nabla_{\omega} \log \pi(a_t|s_t;\omega)\left(r_t + \gamma V(s_{t+1};\theta) - V(s_t;\theta)\right)$$

where $\omega$ is the network parameter of the policy network, $\theta$ is the network parameter of the evaluation network, $s_t$ is the state code of the unmanned ship at time $t$, $a_t$ is the decision of the unmanned ship at time $t$, $\pi(a_t|s_t;\omega)$ is the action output by the policy network in state $s_t$, $r_t$ is the reward value given by the environment after the unmanned ship makes decision $a_t$, $V(s_t;\theta)$ is the value predicted by the evaluation network in state $s_t$, and $\gamma$ is the discount factor.
Training updates the parameters of the two networks to obtain $V(s)$ and $\pi(a|s)$, and at the same time yields the hybrid perception state encoding.
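The two update rules correspond to one-step advantage actor-critic losses. The sketch below, written for a single transition over a discrete action set, treats the TD target as a constant for the evaluation network and the advantage as a constant for the policy network; the discount factor value is an assumption.

```python
import torch
import torch.nn.functional as F

def a3c_losses(logits, v_t, v_t1, action, r_t, gamma=0.99):
    """One-step A3C losses for a single transition.
    logits: policy-network outputs over the discrete actions in state s_t;
    v_t, v_t1: evaluation-network values V(s_t), V(s_{t+1});
    action: index of a_t; r_t: reward returned by the environment."""
    td_target = r_t + gamma * v_t1.detach()       # r_t + gamma * V(s_{t+1}; theta)
    advantage = td_target - v_t                   # TD error shared by both networks
    critic_loss = advantage.pow(2)                # squared TD error, minimized over theta
    log_pi = F.log_softmax(logits, dim=-1)[action]
    actor_loss = -log_pi * advantage.detach()     # ascend log pi * advantage over omega
    return critic_loss, actor_loss
```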
5) Agent decision controller output: the reinforcement learning state code is fed into the evaluation network and the policy network respectively, the composite reward function is fed into the evaluation network, and the controller output is determined by the action corresponding to the mean of the learned policy.
In this embodiment, during training the controller output, i.e. the action selection, is obtained by sampling from the learned mean-variance policy distribution. When the unmanned ship collides, the current training episode ends early; once 500 training episodes have been completed for the current pair of initial and target points, the method returns to step 1 and regenerates the initial and target points; and once four pairs of initial and target points have been set for the current environment, a new marine environment is generated.
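The training schedule of this embodiment can be summarized by the following skeleton; `make_random_marine_env` and the `agent` methods are hypothetical placeholders standing in for the environment of step 1) and the A3C learner of step 4).

```python
def train(agent, num_environments=10):
    """Training skeleton implied by the embodiment: per random environment,
    4 start/goal pairs; 500 episodes per pair; episodes end early on
    collision. The environment and agent APIs below are hypothetical."""
    for _ in range(num_environments):
        env = make_random_marine_env()             # step 1): random obstacles
        for _ in range(4):                         # 4 initial/target pairs per environment
            env.reset_start_and_goal()             # new random initial and target points
            for _ in range(500):                   # 500 interactions per pair
                state, done = env.reset(), False
                while not done:                    # collision or arrival ends the episode
                    action = agent.sample_action(state)  # sample from mean-variance policy
                    state, reward, done = env.step(action)
                    agent.accumulate_gradients(state, action, reward)
                agent.apply_and_sync()             # asynchronous A3C parameter update
```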
In the actual test environment, the marine environment, initial point and target point are regenerated; the unmanned ship interacts with the marine environment to observe the global planning and local obstacle avoidance information, obtains the reinforcement learning state code through the network trained in step 4), and executes the action corresponding to the mean of the policy distribution under that state code, i.e. the controller output, so as to complete the assigned marine operation task.
An unmanned ship hybrid sensing autonomous obstacle avoidance system based on reinforcement learning comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the autonomous obstacle avoidance method is realized when the processor runs the computer program.
The above embodiments are merely examples and do not limit the scope of the present invention. These embodiments may be implemented in other various manners, and various omissions, substitutions, and changes may be made without departing from the scope of the technical idea of the present invention.

Claims (2)

1. An unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning is characterized by comprising the following steps:
1) building a marine environment: establishing an interaction rule between the unmanned ship and the marine environment, generating random obstacles, and randomly generating an initial point and a final point of the unmanned ship;
2) setting an action space and a state space: setting the action space according to the propeller configuration of the unmanned ship, and learning a reinforcement learning state code from the global planning information provided by the static chart and the obstacle information within the detection radius of the radar system;
3) determining the reward function: setting reward target weights to obtain a composite reward function;
4) establishing and training an evaluation network and a policy network: the evaluation network and the policy network are each formed by connecting a state-encoding network with a perceptron, and the network parameters are initialized and trained;
5) agent decision controller output: feeding the reinforcement learning state code into the evaluation network and the policy network respectively, feeding the composite reward function into the evaluation network, and determining the controller output from the action corresponding to the mean of the learned policy;
in step 1), the interaction rule between the unmanned ship and the marine environment follows the unmanned ship's own dynamic equations;
the random obstacles generated in step 1) are of four kinds: random static obstacles that can be depicted on a chart, random dynamic obstacles that cannot be depicted on a chart, random dynamic obstacles with autonomous control capability, and random dynamic obstacles without autonomous control capability;
the action space in step 2) comprises the discretized surge force and yaw moment;
the reinforcement learning state code in step 2) is obtained through deep network learning, specifically:
the features of the static chart are learned through a convolutional neural network combined with fully connected layers to obtain a static planning state code; this static planning state code, together with the dynamic obstacle avoidance state code fed back by the radar system, serves as the key features of the reinforcement learning state code, whose importance is re-allocated through a learned overall weight matrix to obtain the final reinforcement learning state code;
the dynamic obstacle avoidance state code is the nine-tuple

$$s_t^{dyn} = \left(\sigma_t,\ d_t^{target},\ \varphi_t^{target},\ \psi_t,\ u_t,\ v_t,\ r_t,\ d_t^{obs},\ \varphi_t^{obs}\right)$$

where the subscript $t$ denotes time $t$; $\sigma_t$ is the obstacle-detected flag within the detection radius; $d_t^{target}$ and $\varphi_t^{target}$ are the distance and angle from the unmanned ship to the target in the world coordinate system; $\psi_t$ is the yaw angle of the unmanned ship in the world coordinate system; $u_t$, $v_t$ and $r_t$ are the surge speed, sway speed and yaw rate in the unmanned ship's body coordinate system; and $d_t^{obs}$ and $\varphi_t^{obs}$ are the distance and angle of the nearest obstacle in the world coordinate system;
the composite reward function in step 3) is the product of a reward target weight matrix and the reward targets, which comprise: a distance reward target, an obstacle avoidance reward target, a speed reward target and an energy consumption reward target;
the reward objectives are obtained by:
in the task of navigating the unmanned ship to the target point, if
Figure FDA0003366163700000023
Then the distance to the reward target Rdistance1, otherwise Rdistance=0,
Figure FDA0003366163700000024
The distance between the unmanned ship and the target in the world coordinate system is shown, subscript t represents the time t, and subscript t +1 represents the time t + 1;
when the radar detects an obstacle and is within the range threatened by the obstacle, if the radar detects the obstacle
Figure FDA0003366163700000025
Obstacle avoidance reward target Robstance1, otherwise Robstance=0,
Figure FDA0003366163700000026
The subscript t represents the time t, and the subscript t +1 represents the time t +1, wherein the distance is the nearest barrier in a world coordinate system;
if it is used
Figure FDA0003366163700000027
Then the speed reward target Rspeed1, otherwise Rspeed=0,utIs the surging speed, v, of the coordinate system of the unmanned shiptIs the swaying speed, v, of the coordinate system of the unmanned shipthSetting a speed threshold;
if it is used
Figure FDA0003366163700000028
Then the energy consumption awards the target RconsumptionNot all right 1, otherwise Rconsumption=0,τuIs the surging force, tau, of the unmanned boatrIs the bow shaking force, tau, of the unmanned boatthSetting a threshold value for energy consumption;
step 4) is completed based on the A3C algorithm.
2. An unmanned boat hybrid perception autonomous obstacle avoidance system based on reinforcement learning, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the autonomous obstacle avoidance method of claim 1 when running the computer program.
CN202010715076.8A 2020-07-23 2020-07-23 Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning Active CN111880535B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010715076.8A CN111880535B (en) 2020-07-23 2020-07-23 Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010715076.8A CN111880535B (en) 2020-07-23 2020-07-23 Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN111880535A (en) 2020-11-03
CN111880535B (en) 2022-07-15

Family

ID=73155952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010715076.8A Active CN111880535B (en) 2020-07-23 2020-07-23 Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN111880535B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112540614B (en) * 2020-11-26 2022-10-25 江苏科技大学 Unmanned ship track control method based on deep reinforcement learning
CN112698646B (en) * 2020-12-05 2022-09-13 西北工业大学 Aircraft path planning method based on reinforcement learning
CN112925319B (en) * 2021-01-25 2022-06-07 哈尔滨工程大学 Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
CN113176776B (en) * 2021-03-03 2022-08-19 上海大学 Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning
CN114077258B (en) * 2021-11-22 2023-11-21 江苏科技大学 Unmanned ship pose control method based on reinforcement learning PPO2 algorithm
CN114721409B (en) * 2022-06-08 2022-09-20 山东大学 Underwater vehicle docking control method based on reinforcement learning
CN114942643B (en) * 2022-06-17 2024-05-14 华中科技大学 Construction method and application of USV unmanned ship path planning model


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319276A (en) * 2017-12-26 2018-07-24 上海交通大学 Underwater robot attitude regulation control device and method based on Boolean network
CN108489491A (en) * 2018-02-09 2018-09-04 上海交通大学 A kind of Three-dimensional Track Intelligent planning method of autonomous underwater vehicle
CN109540151A (en) * 2018-03-25 2019-03-29 哈尔滨工程大学 A kind of AUV three-dimensional path planning method based on intensified learning
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Mobile robot collision avoidance planning method based on deep reinforcement learning in dynamic environment
CN110775200A (en) * 2019-10-23 2020-02-11 上海交通大学 AUV quick laying and recovering device under high sea condition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Path planning for unmanned ships based on Q-Learning; Wang Chengbo et al.; Ship & Ocean Engineering (《船海工程》); 2018-10-31; full text *

Also Published As

Publication number Publication date
CN111880535A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN111880535B (en) Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning
Zhou et al. The review unmanned surface vehicle path planning: Based on multi-modality constraint
Statheros et al. Autonomous ship collision avoidance navigation concepts, technologies and techniques
Perera et al. Experimental evaluations on ship autonomous navigation and collision avoidance by intelligent guidance
CN101408772B (en) AUV intelligent touching-avoiding method
CN109765929B (en) UUV real-time obstacle avoidance planning method based on improved RNN
Wang et al. Ship route planning based on double-cycling genetic algorithm considering ship maneuverability constraint
CN108416152A (en) The optimal global path planning method of unmanned boat ant colony energy consumption based on electronic chart
Oh et al. Development of collision avoidance algorithms for the c-enduro usv
CN112925319B (en) Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
Wang et al. Cooperative collision avoidance for unmanned surface vehicles based on improved genetic algorithm
CN111123923A (en) Unmanned ship local path dynamic optimization method
Xinchi et al. A research on intelligent obstacle avoidance for unmanned surface vehicles
Zhuang et al. Navigating high‐speed unmanned surface vehicles: System approach and validations
CN109416373A (en) Flow measurement device for structural body
Xia et al. Research on collision avoidance algorithm of unmanned surface vehicle based on deep reinforcement learning
Patil et al. Deep reinforcement learning for continuous docking control of autonomous underwater vehicles: A benchmarking study
Sun et al. Collision avoidance control for unmanned surface vehicle with COLREGs compliance
Wu et al. An overview of developments and challenges for unmanned surface vehicle autonomous berthing
Hinostroza et al. Experimental and numerical simulations of zig-zag manoeuvres of a self-running ship model
CN111694880A (en) Unmanned ship platform health management method and system based on multi-source data
Hayner et al. Halo: Hazard-aware landing optimization for autonomous systems
Ayob et al. Neuroevolutionary autonomous surface vehicle simulation in restricted waters
Stelzer Autonomous sailboat navigation
Cheng et al. Trajectory optimization for ship navigation safety using genetic annealing algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant