CN111880535B - Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning - Google Patents
Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning
- Publication number: CN111880535B (application CN202010715076.8A)
- Authority
- CN
- China
- Prior art keywords
- unmanned ship
- network
- reward
- target
- coordinate system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G05D1/0257 — Control of position or course in two dimensions specially adapted to land vehicles, using a radar
- G05D1/0221 — Control of position or course in two dimensions specially adapted to land vehicles, with means for defining a desired trajectory involving a learning process
- G05D1/0223 — Control of position or course in two dimensions specially adapted to land vehicles, with means for defining a desired trajectory involving speed control of the vehicle
- G05D1/0276 — Control of position or course in two dimensions specially adapted to land vehicles, using signals provided by a source external to the vehicle
Abstract
The invention relates to an unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning, wherein the method comprises the following steps of: 1) building a marine environment; 2) setting an action space according to the condition of the unmanned ship propeller, and learning according to global planning information provided by the static chart and obstacle information in the detection radius range of the radar system to obtain a reinforcement learning state code; 3) setting reward target weight to obtain a comprehensive reward function; 4) building and training an evaluation network and a strategy network; 5) and respectively inputting the reinforcement learning state codes into an evaluation network and a strategy network, inputting the comprehensive reward function into the evaluation network, and determining the output of the controller according to the action corresponding to the learned mean value of the strategy network. Compared with the prior art, the invention has high self-learning ability, can adapt to different large-scale complex environments through simple deployment training, and further realizes autonomous perception, autonomous navigation and autonomous obstacle avoidance.
Description
Technical Field
The invention relates to an unmanned ship autonomous obstacle avoidance method and system, in particular to an unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning.
Background
The unmanned ship is an unmanned surface vehicle capable of autonomous navigation, autonomous obstacle avoidance and autonomous water surface operation, with the advantages of small size, high speed, good stealth and no risk of casualties. It is well suited to water surface operation tasks in dangerous sea areas that pose a high casualty risk to personnel, and to simple water surface operation tasks requiring little human participation; it offers a good cost-effectiveness ratio and has been widely and effectively applied in ocean monitoring, ocean survey, maritime search and rescue, unmanned freight transport and other fields.
At present, the mainstream approach to autonomous navigation of unmanned ships is to deploy autonomous perception, autonomous navigation and autonomous obstacle avoidance algorithms separately, with the algorithms cooperating to complete navigation and operation tasks. For example, vision-system perception involves algorithms such as pattern recognition and target detection; the main approaches to global-planning autonomous navigation include the grid map method, the A* algorithm and genetic algorithms; and local dynamic collision avoidance mainly applies methods such as the artificial potential field method and optimal reciprocal collision avoidance. Although these methods perform well in their respective application settings, the different functional modules must be elaborately designed and their parameters configured and tuned as a whole, making the unmanned ship's intelligent algorithm complex and tedious to implement. Furthermore, because these methods lack the ability to learn autonomously, they adapt poorly to large-scale complex environments, and different algorithm modules must be redesigned and recombined to cope with different environments.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provide a reinforcement learning-based unmanned ship hybrid perception autonomous obstacle avoidance method and system with autonomous learning and environmental characteristic adaptation capabilities.
The purpose of the invention can be realized by the following technical scheme:
An unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning comprises the following steps:
1) building a marine environment: establishing an interaction rule between the unmanned ship and the marine environment, generating random obstacles, and randomly generating an initial point and a final point of the unmanned ship;
2) setting an action space and a state space: setting an action space according to the condition of the unmanned ship propeller, and learning according to global planning information provided by the static chart and obstacle information in the detection radius range of the radar system to obtain a reinforcement learning state code;
3) determining a reward function: setting reward target weight to obtain a comprehensive reward function;
4) establishing and training an evaluation network and a strategy network: the evaluation network and the strategy network are respectively formed by connecting a state coding network and a perceptron, and network parameters are initialized and trained;
5) and the intelligent agent decision controller outputs: and respectively inputting the reinforcement learning state codes into an evaluation network and a strategy network, inputting the comprehensive reward function into the evaluation network, and determining the output of the controller according to the action corresponding to the learned mean value of the strategy network.
Preferably, the interaction rule between the unmanned ship and the marine environment in step 1) follows the unmanned ship's own dynamics equations.
Preferably, the random obstacles generated in step 1) include 4 kinds: random static obstacles that can be delineated by a chart, random static obstacles that cannot be delineated by a chart, random dynamic obstacles with autonomous control capability, and random dynamic obstacles without autonomous control capability.
Preferably, the action space in step 2) comprises discretized surge force, sway force and yaw moment.
Preferably, the reinforcement learning state code in step 2) is obtained through deep network learning, specifically:
The features of the static chart are learned through a convolutional neural network combined with fully connected layers to obtain a static planning state code. The static planning state code, together with the dynamic obstacle avoidance state code fed back by the radar system's processing, serves as the key features of the reinforcement learning state code; the importance of these features is then redistributed by learning an overall weight matrix, yielding the final reinforcement learning state code.
Preferably, the dynamic obstacle avoidance state code is:

$$s_t^{dyn} = \left[\sigma_t,\ d_t^{g},\ \varphi_t^{g},\ \psi,\ u_t,\ v_t,\ r_t,\ d_t^{o},\ \varphi_t^{o}\right]$$

where $\sigma_t$ is the obstacle-detected flag within the detection radius, $d_t^{g}$ is the distance between the unmanned ship and the target in the world coordinate system, $\varphi_t^{g}$ is the angle from the unmanned ship to the target in the world coordinate system, $\psi$ is the yaw angle of the unmanned ship in the world coordinate system, $u_t$ is the surge velocity in the unmanned ship's body-fixed coordinate system, $v_t$ is the sway velocity in the unmanned ship's body-fixed coordinate system, $r_t$ is the yaw rate in the unmanned ship's body-fixed coordinate system, $d_t^{o}$ is the distance to the nearest obstacle in the world coordinate system, $\varphi_t^{o}$ is the angle to the nearest obstacle in the world coordinate system, and the subscript $t$ denotes time $t$.
Preferably, the comprehensive reward function in step 3) is a product of a reward target weight matrix and a reward target, and the reward target includes: a distance reward objective, an obstacle avoidance reward objective, a speed reward objective, and an energy consumption reward objective.
Preferably, the reward targets are obtained as follows:

In the task of navigating the unmanned ship to the target point, if $d_{t+1}^{g} < d_{t}^{g}$, the distance reward target $R_{distance} = 1$; otherwise $R_{distance} = 0$, where $d_t^{g}$ is the distance between the unmanned ship and the target in the world coordinate system, subscript $t$ denotes time $t$ and subscript $t+1$ denotes time $t+1$;

When the radar detects an obstacle and the unmanned ship is within the obstacle's threat range, if $d_{t+1}^{o} > d_{t}^{o}$, the obstacle avoidance reward target $R_{obstacle} = 1$; otherwise $R_{obstacle} = 0$, where $d_t^{o}$ is the distance to the nearest obstacle in the world coordinate system;

If $\sqrt{u_t^2 + v_t^2} > v_{th}$, the speed reward target $R_{speed} = 1$; otherwise $R_{speed} = 0$, where $u_t$ is the surge velocity and $v_t$ the sway velocity in the unmanned ship's body-fixed coordinate system, and $v_{th}$ is a set speed threshold;

If $|\tau_u| + |\tau_r| < \tau_{th}$, the energy consumption reward target $R_{consumption} = 1$; otherwise $R_{consumption} = 0$, where $\tau_u$ is the surge force of the unmanned ship, $\tau_r$ is the yaw moment of the unmanned ship, and $\tau_{th}$ is a set energy consumption threshold.
Preferably, step 4) is done based on the A3C algorithm.
An unmanned ship hybrid perception autonomous obstacle avoidance system based on reinforcement learning comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the autonomous obstacle avoidance method when running the computer program.
Compared with the prior art, the invention has the following advantages:
the algorithm has high self-learning capacity, and can adapt to different large-scale complex environments through simple deployment training, so that autonomous perception, autonomous navigation and autonomous obstacle avoidance are realized;
the algorithm integrates the functions of environmental perception and navigation obstacle avoidance, and gets rid of the heavy burden of respective configuration and overall parameter adjustment caused by modular algorithm design;
the algorithm has the static planning and dynamic collision avoidance capabilities, on one hand, the track planning can be realized by learning a static sea chart, on the other hand, the algorithm can deal with sea surface real-time threats, and has reliable and stable threat avoidance capabilities.
Drawings
Fig. 1 is a schematic diagram of the overall structure of the unmanned surface vehicle hybrid sensing autonomous obstacle avoidance method based on reinforcement learning.
Fig. 2 is a schematic diagram of state coding of the unmanned ship hybrid perception reinforcement learning algorithm.
Fig. 3 is a parameter explanatory diagram of dynamic obstacle avoidance coding.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. Note that the following embodiments are merely illustrative examples; the invention is not limited to the applications or uses described, nor to the embodiments below.
Examples
As shown in fig. 1, an unmanned surface vehicle hybrid perception autonomous obstacle avoidance method based on reinforcement learning includes the following steps:
1) building a marine environment: establishing an interaction rule between the unmanned ship and a marine environment, generating random obstacles, and randomly generating an initial point and a final point of the unmanned ship;
the unmanned ship and marine environment interaction rule follows the self-kinetic equation of the unmanned ship:
wherein eta is [ x, y, psi ═ x, y, psi]TContaining unmanned boat position and yaw angle information, v ═ u, upsilon, r]TContaining yaw, surge, yaw speed information, [ tau ═u,0,τt]TThe pitching force and the yawing force of the unmanned boat, M is the mass of the unmanned boat, R (psi) is a function of the yaw angle psi, and C (v) and g (v) are functions of v respectively;
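As a rough illustration, dynamics of this form can be integrated numerically. The sketch below assumes a simple forward-Euler step and uses placeholder values for $M$, $C(\nu)$ and $g(\nu)$; these are not the patent's actual vessel parameters.

```python
# Sketch: one Euler step of a 3-DOF surface-vessel model of the form
#   eta_dot = R(psi) * nu,   M * nu_dot = tau - C(nu)*nu - g(nu).
# C(nu) and g(nu) are stubbed to zero here (illustrative assumption only).
import numpy as np

def rotation(psi):
    """Body-to-world rotation for the planar pose [x, y, psi]."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def step(eta, nu, tau, M, dt=0.1):
    """Advance pose eta=[x, y, psi] and body velocity nu=[u, v, r] by dt."""
    C = np.zeros((3, 3))   # placeholder Coriolis term C(nu)
    g = np.zeros(3)        # placeholder restoring/damping term g(nu)
    nu_dot = np.linalg.solve(M, tau - C @ nu - g)
    eta_dot = rotation(eta[2]) @ nu
    return eta + dt * eta_dot, nu + dt * nu_dot
```

A pure surge force on a vessel at rest then increases only the surge velocity $u$ on the first step, as expected from the equations above.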
the random obstacles generated include 4 kinds: random static obstacles that can be delineated by a chart, random dynamic obstacles that cannot be delineated by a chart, random dynamic obstacles with autonomous control capability, and random dynamic obstacles without autonomous control capability.
For each generated marine environment, 4 pairs of initial and target points are set at random, and the agent interacts with the environment for 500 episodes under each configuration of initial and target points.
2) Setting an action space and a state space: setting an action space according to the situation of the propeller of the unmanned ship, and learning according to global planning information provided by the static chart and obstacle information in the detection radius range of the radar system to obtain a reinforcement learning state code;
the motion space comprises discretized swaying force, discretized surging force and discretized yawing force;
the reinforcement learning state code is obtained through deep network learning, and specifically comprises the following steps:
and learning the characteristics of the static chart by combining the convolutional neural network and full connection with learning to obtain a static programming state code, using the static programming state code and the dynamic obstacle avoidance state code fed back by the radar system processing as key characteristics of the reinforcement learning state code, and reallocating the importance by learning the whole weight matrix to obtain the final reinforcement learning state code.
The dynamic obstacle avoidance state code is:

$$s_t^{dyn} = \left[\sigma_t,\ d_t^{g},\ \varphi_t^{g},\ \psi,\ u_t,\ v_t,\ r_t,\ d_t^{o},\ \varphi_t^{o}\right]$$

where $\sigma_t$ is the obstacle-detected flag within the detection radius, $d_t^{g}$ is the distance between the unmanned ship and the target in the world coordinate system, $\varphi_t^{g}$ is the angle from the unmanned ship to the target in the world coordinate system, $\psi$ is the yaw angle of the unmanned ship in the world coordinate system, $u_t$ is the surge velocity in the unmanned ship's body-fixed coordinate system, $v_t$ is the sway velocity in the unmanned ship's body-fixed coordinate system, $r_t$ is the yaw rate in the unmanned ship's body-fixed coordinate system, $d_t^{o}$ is the distance to the nearest obstacle in the world coordinate system, $\varphi_t^{o}$ is the angle to the nearest obstacle in the world coordinate system, and the subscript $t$ denotes time $t$.
The action space of the under-actuated unmanned ship is the discretized output of the surge force and the yaw moment, with each actuation force discretized into 20 levels according to thrust magnitude. Referring to fig. 2, the state code of reinforcement learning is obtained as follows: the static planning state code, i.e. the sea chart features, is learned by a combined CNN and FC network and finally compressed into a 256-dimensional vector. The nine-tuple of the dynamic obstacle avoidance state code is illustrated in fig. 3. The reinforcement learning state code is the 265-dimensional vector formed by combining the two codes, obtained by multiplying the concatenated state codes by a learned weight matrix.
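The fusion step above can be sketched as follows. This is a hedged illustration only: the CNN+FC chart encoder is stubbed with a fixed random projection (the real encoder and the fusion matrix would be learned), and all dimensions follow the 256 + 9 = 265 layout described in this embodiment.

```python
# Sketch: fuse a 256-dim static planning code with the 9-dim dynamic
# obstacle-avoidance code, then re-weight by a learned 265x265 matrix.
# The chart encoder below is a stub, not the patent's trained CNN+FC network.
import numpy as np

rng = np.random.default_rng(0)

def static_code(chart_patch):
    """Stub for the CNN+FC chart encoder: random projection to 256 dims."""
    W = rng.standard_normal((256, chart_patch.size))
    return np.tanh(W @ chart_patch.ravel())

def fuse(static_256, dynamic_9, W_fuse):
    combined = np.concatenate([static_256, dynamic_9])  # 265-dim key features
    return W_fuse @ combined                            # importance re-weighting

chart = rng.standard_normal((32, 32))        # toy chart patch
s_static = static_code(chart)
s_dynamic = rng.standard_normal(9)           # stand-in for the nine-tuple
W_fuse = rng.standard_normal((265, 265)) / np.sqrt(265)
state = fuse(s_static, s_dynamic, W_fuse)
```

In training, `W_fuse` would be updated jointly with the policy and evaluation networks rather than fixed.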
3) Determining a reward function: setting reward target weight to obtain a comprehensive reward function;
the composite reward function is the product of a reward target weight matrix and reward targets, and the reward targets comprise: a distance reward objective, an obstacle avoidance reward objective, a speed reward objective, and an energy consumption reward objective.
The reward targets are obtained as follows:

In the task of navigating the unmanned ship to the target point, if $d_{t+1}^{g} < d_{t}^{g}$, the distance reward target $R_{distance} = 1$; otherwise $R_{distance} = 0$, where $d_t^{g}$ is the distance between the unmanned ship and the target in the world coordinate system, subscript $t$ denotes time $t$ and subscript $t+1$ denotes time $t+1$;

When the radar detects an obstacle and the unmanned ship is within the obstacle's threat range, if $d_{t+1}^{o} > d_{t}^{o}$, the obstacle avoidance reward target $R_{obstacle} = 1$; otherwise $R_{obstacle} = 0$, where $d_t^{o}$ is the distance to the nearest obstacle in the world coordinate system;

If $\sqrt{u_t^2 + v_t^2} > v_{th}$, the speed reward target $R_{speed} = 1$; otherwise $R_{speed} = 0$, where $u_t$ is the surge velocity and $v_t$ the sway velocity in the unmanned ship's body-fixed coordinate system, and $v_{th}$ is a set speed threshold;

If $|\tau_u| + |\tau_r| < \tau_{th}$, the energy consumption reward target $R_{consumption} = 1$; otherwise $R_{consumption} = 0$, where $\tau_u$ is the surge force of the unmanned ship, $\tau_r$ is the yaw moment of the unmanned ship, and $\tau_{th}$ is a set energy consumption threshold.
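The four binary reward targets and their weighted combination can be sketched as below. The threshold values and the weight vector are illustrative assumptions, not values from the patent.

```python
# Sketch: composite reward = weight vector . (distance, obstacle, speed,
# energy) binary targets.  Thresholds and weights are illustrative only.
import math

def composite_reward(d_goal_t, d_goal_t1, d_obs_t, d_obs_t1, obs_in_threat,
                     u, v, tau_u, tau_r, v_th=0.5, tau_th=2.0,
                     weights=(1.0, 1.0, 0.5, 0.5)):
    r_distance = 1.0 if d_goal_t1 < d_goal_t else 0.0          # closing on goal
    r_obstacle = 1.0 if (obs_in_threat and d_obs_t1 > d_obs_t) else 0.0
    r_speed = 1.0 if math.hypot(u, v) > v_th else 0.0          # keep moving
    r_consumption = 1.0 if abs(tau_u) + abs(tau_r) < tau_th else 0.0
    targets = (r_distance, r_obstacle, r_speed, r_consumption)
    return sum(w * r for w, r in zip(weights, targets))
```

Tuning the weight vector trades off goal progress, threat avoidance, speed keeping and energy use, which is the role the reward target weight matrix plays in step 3).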
4) Establishing and training an evaluation network and a strategy network: the evaluation network and the strategy network are built based on the A3C algorithm, each formed by connecting the state coding network with a perceptron; the network parameters are initialized and trained. During network training, the gradient of the evaluation network satisfies the following update:

$$d\theta \leftarrow d\theta + \frac{\partial \big(R_t - V(s_t; \theta)\big)^2}{\partial \theta}$$

and the gradient of the policy network satisfies the following update:

$$dw \leftarrow dw + \nabla_w \log \pi(a_t \mid s_t; w)\,\big(R_t - V(s_t; \theta)\big)$$

where $R_t = r_t + \gamma V(s_{t+1}; \theta)$ is the bootstrapped return, $w$ is the network parameter of the policy network, $\theta$ is the network parameter of the evaluation network, $s_t$ is the state code of the unmanned ship at time $t$, $a_t$ is the decision of the unmanned ship at time $t$, $\pi(a_t \mid s_t; w)$ is the action output by the policy network in state $s_t$, $r_t$ is the reward value given by the environment after the unmanned ship makes decision $a_t$, and $V(s_t; \theta)$ is the value predicted by the evaluation network in state $s_t$.
Training the parameters of the two networks yields $V(s)$ and $\pi(a \mid s)$, and at the same time yields the hybrid perception state encoding method.
5) Outputting by the agent decision controller: and respectively inputting the reinforcement learning state codes into an evaluation network and a strategy network, inputting the comprehensive reward function into the evaluation network, and determining the output of the controller according to the action corresponding to the learned mean value of the strategy network.
In this embodiment, during training the controller output, i.e. the action selection, is obtained by sampling from the learned mean-variance policy distribution. When the unmanned ship collides, the current training episode ends early. If 500 training episodes have been completed for the current target point and initial point, return to step 1) and regenerate the target point and initial point; if 4 pairs of initial and target points have already been set for the current environment, regenerate the marine environment.
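The episode schedule just described can be sketched as a nested loop; the rollout itself is stubbed here, and the collision probability is an arbitrary stand-in.

```python
# Sketch: training schedule from this embodiment -- each random marine
# environment gets 4 random start/goal pairs, each trained for 500 episodes;
# an episode may also end early on collision.  Rollouts are stubbed.
import random

def train_schedule(n_envs=2, pairs_per_env=4, episodes_per_pair=500):
    total_episodes = 0
    for env_id in range(n_envs):                 # regenerate marine environment
        for pair in range(pairs_per_env):        # regenerate start/goal pair
            for episode in range(episodes_per_pair):
                collided = random.random() < 0.1     # stand-in for a rollout
                # on collision the episode simply terminates early; either
                # way it counts toward the 500-episode budget for this pair
                total_episodes += 1
    return total_episodes
```

The budget accounting (500 episodes per pair, 4 pairs per environment) matches the regeneration rule stated above.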
In the actual test environment, the marine environment, initial point and target point are regenerated; the unmanned ship interacts with the marine environment to observe the global planning and local obstacle avoidance information, obtains the reinforcement learning state code through the networks trained in step 4), and executes the action corresponding to the mean of the policy distribution under that state code, i.e. the controller output, thereby completing the set marine operation task.
An unmanned ship hybrid sensing autonomous obstacle avoidance system based on reinforcement learning comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the autonomous obstacle avoidance method is realized when the processor runs the computer program.
The above embodiments are merely examples and do not limit the scope of the present invention. These embodiments may be implemented in other various manners, and various omissions, substitutions, and changes may be made without departing from the scope of the technical idea of the present invention.
Claims (2)
1. An unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning is characterized by comprising the following steps:
1) building a marine environment: establishing an interaction rule between the unmanned ship and the marine environment, generating random obstacles, and randomly generating an initial point and a final point of the unmanned ship;
2) setting an action space and a state space: setting an action space according to the condition of the unmanned ship propeller, and learning according to global planning information provided by the static chart and obstacle information in the detection radius range of the radar system to obtain a reinforcement learning state code;
3) determining a reward function: setting reward target weight to obtain a comprehensive reward function;
4) establishing and training an evaluation network and a strategy network: the evaluation network and the strategy network are respectively formed by connecting a state coding network and a perceptron, and network parameters are initialized and trained;
5) and the intelligent agent decision controller outputs: respectively inputting the reinforcement learning state codes into an evaluation network and a strategy network, inputting the comprehensive reward function into the evaluation network, and determining the output of the controller according to the action corresponding to the mean value of the learned strategy network;
in the step 1), the unmanned ship and marine environment interaction rule follows the own kinetic equation of the unmanned ship;
the random obstacles generated in step 1) include 4 kinds: random static obstacles that can be delineated by a chart, random static obstacles that cannot be delineated by a chart, random dynamic obstacles with autonomous control capability, and random dynamic obstacles without autonomous control capability;
the action space in step 2) comprises discretized sway force, surge force and yaw moment;
the reinforcement learning state code in step 2) is obtained through deep network learning, specifically:
a static planning state code is obtained by learning the features of the static chart through a convolutional neural network combined with fully connected layers; the static planning state code and the dynamic obstacle avoidance state code fed back by the radar system serve as the key features of the reinforcement learning state code, and the final reinforcement learning state code is obtained by learning an overall weight matrix to redistribute the importance of these features;
the dynamic obstacle avoidance state code is:

$$s_t^{dyn} = \left[\sigma_t,\ d_t^{g},\ \varphi_t^{g},\ \psi,\ u_t,\ v_t,\ r_t,\ d_t^{o},\ \varphi_t^{o}\right]$$

wherein $\sigma_t$ is the obstacle-detected flag within the detection radius, $d_t^{g}$ is the distance between the unmanned ship and the target in the world coordinate system, $\varphi_t^{g}$ is the angle from the unmanned ship to the target in the world coordinate system, $\psi$ is the yaw angle of the unmanned ship in the world coordinate system, $u_t$ is the surge velocity in the unmanned ship's body-fixed coordinate system, $v_t$ is the sway velocity in the unmanned ship's body-fixed coordinate system, $r_t$ is the yaw rate in the unmanned ship's body-fixed coordinate system, $d_t^{o}$ is the distance to the nearest obstacle in the world coordinate system, $\varphi_t^{o}$ is the angle to the nearest obstacle in the world coordinate system, and the subscript $t$ denotes time $t$;
the comprehensive reward function in the step 3) is the product of a reward target weight matrix and a reward target, and the reward target comprises: a distance reward target, an obstacle avoidance reward target, a speed reward target and an energy consumption reward target;
the reward targets are obtained as follows:
in the task of navigating the unmanned ship to the target point, if $d_{t+1}^{g} < d_{t}^{g}$, the distance reward target $R_{distance} = 1$, otherwise $R_{distance} = 0$, wherein $d_t^{g}$ is the distance between the unmanned ship and the target in the world coordinate system, subscript $t$ denotes time $t$ and subscript $t+1$ denotes time $t+1$;
when the radar detects an obstacle and the unmanned ship is within the obstacle's threat range, if $d_{t+1}^{o} > d_{t}^{o}$, the obstacle avoidance reward target $R_{obstacle} = 1$, otherwise $R_{obstacle} = 0$, wherein $d_t^{o}$ is the distance to the nearest obstacle in the world coordinate system;
if $\sqrt{u_t^2 + v_t^2} > v_{th}$, the speed reward target $R_{speed} = 1$, otherwise $R_{speed} = 0$, wherein $u_t$ is the surge velocity and $v_t$ the sway velocity in the unmanned ship's body-fixed coordinate system, and $v_{th}$ is a set speed threshold;
if $|\tau_u| + |\tau_r| < \tau_{th}$, the energy consumption reward target $R_{consumption} = 1$, otherwise $R_{consumption} = 0$, wherein $\tau_u$ is the surge force of the unmanned ship, $\tau_r$ is the yaw moment of the unmanned ship, and $\tau_{th}$ is a set energy consumption threshold;
step 4) is completed based on the A3C algorithm.
2. An unmanned boat hybrid perception autonomous obstacle avoidance system based on reinforcement learning, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the autonomous obstacle avoidance method of claim 1 when running the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010715076.8A CN111880535B (en) | 2020-07-23 | 2020-07-23 | Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010715076.8A CN111880535B (en) | 2020-07-23 | 2020-07-23 | Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111880535A CN111880535A (en) | 2020-11-03 |
CN111880535B true CN111880535B (en) | 2022-07-15 |
Family
ID=73155952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010715076.8A Active CN111880535B (en) | 2020-07-23 | 2020-07-23 | Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111880535B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112540614B (en) * | 2020-11-26 | 2022-10-25 | 江苏科技大学 | Unmanned ship track control method based on deep reinforcement learning |
CN112698646B (en) * | 2020-12-05 | 2022-09-13 | 西北工业大学 | Aircraft path planning method based on reinforcement learning |
CN112925319B (en) * | 2021-01-25 | 2022-06-07 | 哈尔滨工程大学 | Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning |
CN113176776B (en) * | 2021-03-03 | 2022-08-19 | 上海大学 | Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning |
CN114077258B (en) * | 2021-11-22 | 2023-11-21 | 江苏科技大学 | Unmanned ship pose control method based on reinforcement learning PPO2 algorithm |
CN114721409B (en) * | 2022-06-08 | 2022-09-20 | 山东大学 | Underwater vehicle docking control method based on reinforcement learning |
CN114942643B (en) * | 2022-06-17 | 2024-05-14 | 华中科技大学 | Construction method and application of USV unmanned ship path planning model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108319276A (en) * | 2017-12-26 | 2018-07-24 | 上海交通大学 | Underwater robot attitude regulation control device and method based on Boolean network |
CN108489491A (en) * | 2018-02-09 | 2018-09-04 | 上海交通大学 | A kind of Three-dimensional Track Intelligent planning method of autonomous underwater vehicle |
CN109540151A (en) * | 2018-03-25 | 2019-03-29 | 哈尔滨工程大学 | A kind of AUV three-dimensional path planning method based on intensified learning |
CN110632931A (en) * | 2019-10-09 | 2019-12-31 | 哈尔滨工程大学 | Mobile robot collision avoidance planning method based on deep reinforcement learning in dynamic environment |
CN110775200A (en) * | 2019-10-23 | 2020-02-11 | 上海交通大学 | AUV quick laying and recovering device under high sea condition |
Non-Patent Citations (1)
Title |
---|
Path planning for unmanned ships based on Q-Learning; Wang Chengbo et al.; Ship & Ocean Engineering; 2018-10-31; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111880535A (en) | 2020-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111880535B (en) | Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning | |
Zhou et al. | The review unmanned surface vehicle path planning: Based on multi-modality constraint | |
Statheros et al. | Autonomous ship collision avoidance navigation concepts, technologies and techniques | |
Perera et al. | Experimental evaluations on ship autonomous navigation and collision avoidance by intelligent guidance | |
CN101408772B (en) | AUV intelligent touching-avoiding method | |
CN109765929B (en) | UUV real-time obstacle avoidance planning method based on improved RNN | |
Wang et al. | Ship route planning based on double-cycling genetic algorithm considering ship maneuverability constraint | |
CN108416152A (en) | The optimal global path planning method of unmanned boat ant colony energy consumption based on electronic chart | |
Oh et al. | Development of collision avoidance algorithms for the c-enduro usv | |
CN112925319B (en) | Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning | |
Wang et al. | Cooperative collision avoidance for unmanned surface vehicles based on improved genetic algorithm | |
CN111123923A (en) | Unmanned ship local path dynamic optimization method | |
Xinchi et al. | A research on intelligent obstacle avoidance for unmanned surface vehicles | |
Zhuang et al. | Navigating high‐speed unmanned surface vehicles: System approach and validations | |
CN109416373A (en) | Flow measurement device for structural body | |
Xia et al. | Research on collision avoidance algorithm of unmanned surface vehicle based on deep reinforcement learning | |
Patil et al. | Deep reinforcement learning for continuous docking control of autonomous underwater vehicles: A benchmarking study | |
Sun et al. | Collision avoidance control for unmanned surface vehicle with COLREGs compliance | |
Wu et al. | An overview of developments and challenges for unmanned surface vehicle autonomous berthing | |
Hinostroza et al. | Experimental and numerical simulations of zig-zag manoeuvres of a self-running ship model | |
CN111694880A (en) | Unmanned ship platform health management method and system based on multi-source data | |
Hayner et al. | Halo: Hazard-aware landing optimization for autonomous systems | |
Ayob et al. | Neuroevolutionary autonomous surface vehicle simulation in restricted waters | |
Stelzer | Autonomous sailboat navigation | |
Cheng et al. | Trajectory optimization for ship navigation safety using genetic annealing algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |