CN112800545B - Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN - Google Patents

Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN

Info

Publication number
CN112800545B
Authority
CN
China
Prior art keywords
unmanned ship
network
path planning
adaptive path
current state
Prior art date
Legal status
Expired - Fee Related
Application number
CN202110118727.XA
Other languages
Chinese (zh)
Other versions
CN112800545A (en)
Inventor
胡潇文
刘峰
陈畅
杨茜
Current Assignee
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date
Filing date
Publication date
Application filed by China University of Geosciences filed Critical China University of Geosciences
Priority to CN202110118727.XA priority Critical patent/CN112800545B/en
Publication of CN112800545A publication Critical patent/CN112800545A/en
Application granted granted Critical
Publication of CN112800545B publication Critical patent/CN112800545B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
        • G06 - COMPUTING; CALCULATING OR COUNTING
            • G06F - ELECTRIC DIGITAL DATA PROCESSING
                • G06F 30/00 - Computer-aided design [CAD]
                    • G06F 30/10 - Geometric CAD
                        • G06F 30/15 - Vehicle, aircraft or watercraft design
                    • G06F 30/20 - Design optimisation, verification or simulation
                        • G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
            • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 - Computing arrangements based on biological models
                    • G06N 3/02 - Neural networks
                        • G06N 3/04 - Architecture, e.g. interconnection topology
                            • G06N 3/045 - Combinations of networks
                            • G06N 3/047 - Probabilistic or stochastic networks
                        • G06N 3/08 - Learning methods
            • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
                • G06Q 10/00 - Administration; Management
                    • G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
                        • G06Q 10/047 - Optimisation of routes or paths, e.g. travelling salesman problem
                • G06Q 50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
                    • G06Q 50/40 - Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Computer Hardware Design (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Computational Mathematics (AREA)
  • Development Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Game Theory and Decision Science (AREA)
  • Automation & Control Theory (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computer Vision & Pattern Recognition (AREA)

Abstract

The invention belongs to the field of unmanned ship path planning and provides a method that enables an unmanned ship to perform adaptive path planning through learning. The method mainly comprises the following steps: constructing an unmanned ship model and placing it in a simulation environment for navigation; letting the unmanned ship explore randomly using behaviors from its behavior space; acquiring environment image information through the unmanned ship's depth camera and the unmanned ship's position information through a positioning system, and storing the data obtained by exploration in a priority experience playback pool; extracting data from the playback pool to train the D3QN network; and loading the trained network model onto the actual unmanned ship to plan paths in the real environment. The method achieves high path planning precision, a low collision rate and strong adaptive capability of the unmanned ship without requiring prior information.

Description

Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN
Technical Field
The invention relates to the technical field of unmanned ship path planning, and in particular to an unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN.
Background
With the rise of the artificial intelligence era, unmanned ship technology has developed rapidly. China has many regions with severe marine environments, domestic unmanned ships adapt poorly to their environment, and various external interference factors exist, so domestic unmanned ship technology still does not meet the expected requirements. A path planning algorithm with strong adaptive capability that can cope with emergencies is urgently needed to break through the current bottleneck.
The design principle of the traditional unmanned ship path planning method is to plan an optimized obstacle-free path from a prior map, with the unmanned ship simply following the algorithm's instructions; once the environment changes, the algorithm cannot give optimal guidance. Traditional methods can be quite stable in simple environments. In future research, however, people will probe increasingly complex deep-sea areas that contain complex dynamic and static obstacles and dangerous conditions and in which the environment can change suddenly; only with an adaptive autonomous decision-making system that does not rely on a pre-surveyed map can the unmanned ship adapt to such changes.
To improve the adaptive capability of the unmanned ship, its control system must have good cognitive and recognition abilities with respect to the ship's spatial information and the state of the surrounding environment. According to the existing literature, methods such as the genetic algorithm, the ant colony algorithm and the A* algorithm can converge and perform well in simple environments, but they lack the adaptive capability to react in time when an emergency occurs; under strong interference the path planning effect is greatly degraded, and collisions with serious consequences may even occur.
Disclosure of Invention
The invention aims to overcome the defects of the prior art so that the path planning algorithm can react in time and retain good adaptive capability when an emergency occurs. An unmanned ship self-adaptive path planning method based on D3QN is provided, so that the unmanned ship can avoid collisions in time and has a high safety factor.
In order to achieve the purpose, the unmanned ship self-adaptive path planning method based on D3QN provided by the invention comprises the following steps:
s1, constructing an unmanned ship model and an underwater simulation environment, designing a D3QN network, and putting the unmanned ship model in the underwater simulation environment for autonomous navigation;
s2, selecting a behavior A from the current state S according to an epsilon-greedy algorithm;
s3, enabling the unmanned ship to reach the next state S ' by adopting a PID position and speed error control algorithm according to the behavior A, acquiring a first position relation between the position of the next state S ' and the obstacle, acquiring a second position relation between the position of the next state S ' and a terminal point, and acquiring a return R by utilizing a reward and punishment mechanism according to the first position relation and the second position relation;
s4, acquiring environment information and position information of the current state S and merging them into current state data s, acquiring environment information and position information of the next state S' and merging them into next state data s', storing the current state data s, the behavior A, the next state data s' and the return R in a priority experience playback pool in the form of an array D, and calculating the sampling probability of the array D in the priority experience playback pool through TD-error (the difference between the current state-action value and the target value, calculated by the temporal-difference method);
s5, extracting the array D in the experience playback pool to a D3QN network according to the sampling probability, carrying out gradient descent error training of the D3QN network, judging whether a termination condition is met, if so, obtaining a trained unmanned ship self-adaptive path planning model, and executing a step S6, otherwise, taking the next state S' as the current state S, and returning to the step S2;
and S6, importing the trained unmanned ship self-adaptive path planning model into an unmanned ship path planning system, planning the unmanned ship path in a real environment, and obtaining the unmanned ship path.
Further, the step of constructing the unmanned ship model and the underwater simulation environment and designing the D3QN network comprises the following steps:
building the unmanned ship model and the underwater simulation environment through the ROS and the Gazebo;
forming a main network and a target network, each composed of an LSTM network, a convolutional neural network and a dueling fully-connected network;
and forming the D3QN network by the main network, the target network and the experience playback pool.
Further, a depth camera and a positioning system are arranged on the unmanned ship model;
the depth camera is used for acquiring current environment information;
the positioning system is used for acquiring the position information of the unmanned ship.
Further, step S5 specifically includes:
dividing the space of the whole priority experience playback pool into M small ranges according to the minimum sample size M;
randomly extracting one sample from each small range according to the sampling probability;
obtaining current state data s and next state data s' according to the sample data;
processing the environmental information in the current state data s through the convolutional neural network of the main network to obtain first environmental information;
processing the position information in the current state data s through the LSTM network of the main network to obtain first position information;
combining the first environment information and the first position information and inputting the combined features into the dueling fully-connected network of the main network to obtain an output Q of the main network;
processing the environmental information in the next state data s' through the convolutional neural network of the target network to obtain second environmental information;
processing the position information in the next state data s' through the LSTM network of the target network to obtain second position information;
combining the second environment information and the second position information and inputting the combined features into the dueling fully-connected network of the target network to obtain an output Q1 of the target network;
calculating to obtain a target output Qt according to the Q1 and the Q;
calculating to obtain an error function according to the Q and the Qt;
and training the D3QN network by adopting a gradient descent method based on the error function, judging whether the error function meets a termination condition, if so, obtaining a trained unmanned ship self-adaptive path planning model, and executing a step S6, otherwise, taking the next state S' as the current state S, returning to the step S2, and retraining.
Further, the epsilon-greedy algorithm is:
A = a behavior selected at random from the behavior space, with probability ε
A = argmax_a Q(S, a), the behavior with the maximum main-network output Q, with probability 1 - ε
That is, with probability ε a behavior is selected at random from the behavior space, and with probability 1 - ε the behavior that maximizes the main-network output Q is selected.
Further, the reward and punishment mechanism is as follows:
R = large reward, when the unmanned ship reaches the end point
R = large punishment, when the unmanned ship approaches an obstacle
R = small reward, when dt < do (the unmanned ship moves closer to the end point)
R = small punishment, when dt > do (the unmanned ship moves away from the end point)
wherein R is the return, do represents the distance between the unmanned ship and the terminal in the current state S, and dt represents the distance between the unmanned ship and the terminal in the next state S'.
Further, the PID position and velocity error control algorithm is:
Ep=[P(x′,y′,z′)-P(x,y,z),O(r′,p′,y′)-O(r,p,y)]
Ev=[v(x′,y′,z′)-v(x,y,z),ω(x′,y′,z′)-ω(x,y,z)]
where Ep is the position and attitude deviation, Ev is the velocity deviation, r, p and y are the angles by which the unmanned ship deviates about the x, y and z axes respectively, P(x′,y′,z′) and O(r′,p′,y′) are the position and deviation angles of the unmanned ship in the next state S′, v(x′,y′,z′) and ω(x′,y′,z′) are the linear velocity and angular velocity of the given target in the behavior A, P(x,y,z) and O(r,p,y) are the position and deviation angles of the unmanned ship in the current state S, and v(x,y,z) and ω(x,y,z) are the linear velocity and angular velocity of the unmanned ship in the current state S.
In addition, in order to achieve the above object, the present invention further provides an unmanned ship adaptive path planning apparatus based on D3QN, which includes a memory, a processor, and an unmanned ship adaptive path planning program stored in the memory and operable on the processor, and when executed by the processor, the unmanned ship adaptive path planning program implements the steps of any of the unmanned ship adaptive path planning methods.
In addition, in order to achieve the above object, the present invention further provides a storage medium having an unmanned ship adaptive path planning program stored thereon, wherein the unmanned ship adaptive path planning program, when executed by a processor, implements the steps of any one of the unmanned ship adaptive path planning methods.
The invention has the beneficial effects that: according to the method, a D3QN algorithm is adopted, sample information does not need to be given in advance, the network can be trained autonomously through experience obtained by autonomous exploration, and an optimal solution is obtained until training is finished; the main network based on the fusion of the LSTM and the convolutional neural network can realize the feature fusion of the unmanned ship environment, and the unmanned ship has the self-adaptive capacity to the environment change by adopting a learning mode, thereby conforming to the more intelligent development direction of the unmanned ship in the future.
Drawings
FIG. 1 is a flow chart of the implementation of the unmanned ship adaptive path planning method based on D3 QN;
FIG. 2 is a flow chart of a specific algorithm corresponding to FIG. 1;
fig. 3 is a D3QN network processing drone position and image information framework diagram.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to fig. 1 and fig. 2, fig. 1 is a flowchart illustrating an implementation of the unmanned ship adaptive path planning method based on D3QN according to the present invention, and fig. 2 is a flowchart illustrating a specific algorithm corresponding to fig. 1.
The embodiment of the invention provides a D3 QN-based unmanned ship self-adaptive path planning method, which comprises the following steps:
s1, constructing an unmanned ship model and an underwater simulation environment, designing a D3QN network, and putting the unmanned ship model in the underwater simulation environment for autonomous navigation;
The unmanned ship and the underwater environment are built with ROS and Gazebo; a depth camera and a positioning system are arranged on the unmanned ship, and ROS provides the Topic communication function;
the method comprises the steps of combining an LSTM network, a convolutional neural network, a full-connection network with a Dueling structure, a main network and a target network with parameters lagging behind the main network by a certain number of steps, and preferentially playing back an experience pool.
Acquiring image information of the underwater simulation environment through a depth camera of the unmanned ship;
acquiring the position information of the unmanned ship through a positioning system;
and transmitting the position information of the unmanned ship and the image information of the unmanned ship from the Gazebo to an adaptive path planning algorithm for storage by adopting a Topic function of the ROS.
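The data acquisition described above can be illustrated with the following Python sketch. It is an illustration only, not part of the original disclosure: the topic names, message types and node layout are assumptions, and a real system would subscribe to whatever topics the Gazebo model actually publishes.

```python
import rospy
import numpy as np
from sensor_msgs.msg import Image
from nav_msgs.msg import Odometry
from cv_bridge import CvBridge

class StateCollector:
    """Gathers the environment image and position information published by the Gazebo simulation."""
    def __init__(self):
        self.bridge = CvBridge()
        self.depth_image = None     # latest environment image from the depth camera
        self.position = None        # latest unmanned-ship position from the positioning system
        rospy.Subscriber("/usv/depth_camera/image_raw", Image, self.image_cb)  # assumed topic name
        rospy.Subscriber("/usv/odom", Odometry, self.odom_cb)                   # assumed topic name

    def image_cb(self, msg):
        # convert the ROS image message into a numpy array for the convolutional network
        self.depth_image = self.bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")

    def odom_cb(self, msg):
        p = msg.pose.pose.position
        self.position = np.array([p.x, p.y, p.z], dtype=np.float32)

    def current_state(self):
        # the state data s merges the environment image and the position information
        return self.depth_image, self.position

if __name__ == "__main__":
    rospy.init_node("usv_state_collector")
    collector = StateCollector()
    rospy.spin()
```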
S2, selecting behavior A from the current state S according to an epsilon-greedy algorithm, wherein the epsilon-greedy algorithm is as follows:
A = a behavior selected at random from the behavior space, with probability ε
A = argmax_a Q(S, a), the behavior with the maximum main-network output Q, with probability 1 - ε
That is, with probability ε a behavior is selected at random from the behavior space, and with probability 1 - ε the behavior that maximizes the main-network output Q is selected.
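A minimal sketch of this selection rule is shown below, assuming a discrete behavior space indexed from 0 and a vector of main-network Q values for the current state S; it is an illustration, not the patent's code.

```python
import random
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """Select behavior A: a random behavior with probability epsilon,
    otherwise the behavior with the maximum main-network output Q."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # explore the behavior space
    return int(np.argmax(q_values))              # exploit: argmax_a Q(S, a)

# example: epsilon_greedy(np.array([0.1, 0.7, 0.3]), epsilon=0.1) usually returns 1
```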
S3, enabling the unmanned ship to reach the next state S ' by adopting a PID position and speed error control algorithm according to the behavior A, obtaining a first position relation between the position of the next state S ' and the obstacle, obtaining a second position relation between the position of the next state S ' and the terminal point, and obtaining a return R by utilizing a reward and punishment mechanism according to the first position relation and the second position relation;
the PID position and speed error control algorithm is as follows:
Ep=[P(x′,y′,z′)-P(x,y,z),O(r′,p′,y′)-O(r,p,y)]
Ev=[v(x′,y′,z′)-v(x,y,z),ω(x′,y′,z′)-ω(x,y,z)]
where Ep is the position and attitude deviation, Ev is the velocity deviation, r, p and y are the angles by which the unmanned ship deviates about the x, y and z axes respectively, P(x′,y′,z′) and O(r′,p′,y′) are the position and deviation angles of the unmanned ship in the next state S′, v(x′,y′,z′) and ω(x′,y′,z′) are the linear velocity and angular velocity of the given target in the behavior A, P(x,y,z) and O(r,p,y) are the position and deviation angles of the unmanned ship in the current state S, and v(x,y,z) and ω(x,y,z) are the linear velocity and angular velocity of the unmanned ship in the current state S.
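The error terms Ep and Ev, and a generic PID loop driving them toward zero, can be sketched as follows. This is only an illustrative reading of the formulas above; the gains and time step are placeholder values, not the patent's tuning.

```python
import numpy as np

def pose_velocity_errors(P_next, O_next, v_target, w_target, P_cur, O_cur, v_cur, w_cur):
    """Ep and Ev exactly as defined above: Ep stacks the position and attitude deviations,
    Ev stacks the linear- and angular-velocity deviations."""
    Ep = np.concatenate([np.asarray(P_next, float) - np.asarray(P_cur, float),
                         np.asarray(O_next, float) - np.asarray(O_cur, float)])
    Ev = np.concatenate([np.asarray(v_target, float) - np.asarray(v_cur, float),
                         np.asarray(w_target, float) - np.asarray(w_cur, float)])
    return Ep, Ev

class PID:
    """Generic PID loop that drives an error vector toward zero (gains are placeholders)."""
    def __init__(self, kp=1.0, ki=0.0, kd=0.1, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        error = np.asarray(error, float)
        self.integral = self.integral + error * self.dt
        derivative = np.zeros_like(error) if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```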
The reward calculation by utilizing the reward and punishment mechanism is specifically as follows:
when the unmanned ship approaches the terminal, a small amount of reward is obtained;
when the unmanned ship is far away from the terminal point, a small amount of punishment is obtained;
when the unmanned ship arrives at the terminal, a large amount of rewards are obtained;
when the unmanned ship approaches the obstacle, a large amount of punishments are obtained;
the reward and punishment mechanism calculates the reward according to the formula:
R = large reward, when the unmanned ship reaches the end point
R = large punishment, when the unmanned ship approaches an obstacle
R = small reward, when dt < do (the unmanned ship moves closer to the end point)
R = small punishment, when dt > do (the unmanned ship moves away from the end point)
wherein R is the return, do represents the distance between the unmanned ship and the terminal in the current state S, and dt represents the distance between the unmanned ship and the terminal in the next state S'.
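A sketch of this reward and punishment mechanism is given below. The branches follow the qualitative description above; the numeric magnitudes are illustrative assumptions, not the values of the original formula.

```python
def compute_reward(d_o, d_t, reached_end_point, near_obstacle,
                   r_big=10.0, r_small=1.0):
    """Return R per the reward and punishment mechanism above (magnitudes are assumptions)."""
    if reached_end_point:
        return r_big          # large reward on reaching the end point
    if near_obstacle:
        return -r_big         # large punishment when approaching an obstacle
    if d_t < d_o:
        return r_small        # small reward: the ship moved closer to the end point
    return -r_small           # small punishment: the ship moved away from the end point
```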
S4, acquiring environment information and position information of the current state S and merging them into current state data s, acquiring environment information and position information of the next state S' and merging them into next state data s', storing the current state data s, the behavior A, the next state data s' and the return R as an array D in the priority experience playback pool, and calculating the sampling probability of the array D in the priority experience playback pool through TD-error;
The priority (sampling probability) is calculated from the TD-error (temporal-difference error) value δi of the i-th sample, whose calculation formula is:
δi = Ri + γ·Qt(si, argmax_a Q(si, a)) - Q(si-1, ai-1)
where argmax_a Q(si, a) denotes the behavior A that obtains the maximum main-network output Q value Q(si, a) under the state data si of sample i, Qt(si, argmax_a Q(si, a)) denotes the target-network output Q value obtained by selecting that behavior under the state data si, Q(si-1, ai-1) denotes the main-network output Q value obtained by selecting behavior ai-1 under the state data si-1 of the (i-1)-th sample, and γ is the attenuation coefficient, with a value of 0.8. The probability of random sampling according to priority is:
P(i) = pi^α / Σk pk^α
where the exponent α represents the degree to which sampling follows priority (α = 0 reduces to uniform random sampling) and pi denotes the priority of the i-th sample. When proportional sampling is used, pi is:
pi = |δi| + ε
where ε is a quantity greater than 0, which prevents samples whose TD-error is 0 from never getting a playback opportunity.
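The priority and sampling-probability computation can be sketched in Python as follows; the values of α and the small constant ε are illustrative assumptions.

```python
import numpy as np

def priorities_from_td_error(td_errors, eps=1e-3):
    """p_i = |delta_i| + eps, so a sample with zero TD-error can still be replayed."""
    return np.abs(np.asarray(td_errors, float)) + eps

def sampling_probabilities(priorities, alpha=0.6):
    """P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 reduces to uniform random sampling."""
    scaled = np.power(np.asarray(priorities, float), alpha)
    return scaled / scaled.sum()
```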
S5, extracting the array D in the experience playback pool to a D3QN network according to the sampling probability, carrying out gradient descent error training on the D3QN network, judging whether a termination condition is met, if so, obtaining a trained unmanned ship self-adaptive path planning model, and executing a step S6, otherwise, taking the next state S' as the current state S, and returning to the step S2; the method comprises the following specific steps:
dividing the space of the whole priority experience playback pool into M small ranges according to the minimum sample size M;
randomly extracting one sample from each small range according to the sampling probability;
obtaining current state data s and next state data s' according to the sample data;
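A sketch of this range-wise extraction is given below, under the assumptions that the pool holds at least M samples stored in insertion order and that the probabilities come from the function sketched above.

```python
import numpy as np

def stratified_sample(probabilities, m):
    """Divide the replay pool (indexed 0..N-1) into m equal ranges and draw one sample
    from each range according to the priority-based sampling probabilities."""
    probabilities = np.asarray(probabilities, float)
    n = len(probabilities)
    bounds = np.linspace(0, n, m + 1, dtype=int)
    indices = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        segment = probabilities[lo:hi]
        segment = segment / segment.sum()            # renormalise within the small range
        indices.append(lo + int(np.random.choice(hi - lo, p=segment)))
    return indices
```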
referring to fig. 3, fig. 3 is a diagram of a D3QN network processing unmanned ship position and image information framework;
processing the environmental information in the current state data s through the convolutional neural network of the main network to obtain first environmental information;
processing the position information in the current state data s through the LSTM network of the main network to obtain first position information;
combining the first environment information and the first position information and inputting the combined features into the dueling fully-connected network of the main network to obtain an output Q of the main network;
processing the environmental information in the next state data s' through the convolutional neural network of the target network to obtain second environmental information;
processing the position information in the next state data s' through the LSTM network of the target network to obtain second position information;
combining the second environment information and the second position information and inputting the combined features into the dueling fully-connected network of the target network to obtain an output Q1 of the target network;
calculating to obtain a target output Qt according to the Q1 and the Q;
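The structure of the main network and the target network described above (a convolutional branch for the environment image, an LSTM branch for the position information, and a dueling fully-connected head) can be sketched with PyTorch as follows. The layer sizes, the 84x84 single-channel image and the three-dimensional position input are illustrative assumptions, not the configuration disclosed in the patent.

```python
import torch
import torch.nn as nn

class D3QNNetwork(nn.Module):
    """Sketch of the main/target network: CNN for the environment image,
    LSTM for the position information, and a dueling fully-connected head."""
    def __init__(self, n_actions, pos_dim=3, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),                                   # 32 * 9 * 9 features for an 84x84 input
        )
        self.lstm = nn.LSTM(input_size=pos_dim, hidden_size=hidden, batch_first=True)
        self.fuse = nn.Linear(32 * 9 * 9 + hidden, hidden)  # merge image and position features
        self.value = nn.Linear(hidden, 1)                   # state-value branch V(s)
        self.advantage = nn.Linear(hidden, n_actions)       # advantage branch A(s, a)

    def forward(self, image, positions):
        # image: (batch, 1, 84, 84); positions: (batch, seq_len, pos_dim)
        img_feat = self.cnn(image)                          # environment information
        _, (h, _) = self.lstm(positions)                    # position information
        fused = torch.relu(self.fuse(torch.cat([img_feat, h[-1]], dim=1)))
        v = self.value(fused)
        a = self.advantage(fused)
        return v + a - a.mean(dim=1, keepdim=True)          # dueling combination -> Q(s, a)
```

The target network is simply a second instance of the same module whose parameters are periodically copied from the main network, as described in step S5.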
calculating an error function L according to the Q and the Qt;
the calculation formula of the error function L is as follows:
L(θ) = E[(R + γ·Qt(s′, argmax_a′ Q(s′, a′; θ); θ⁻) - Q(s, a; θ))²]
training network weight parameters by adopting a gradient descent method according to an error function; the implementation formula is as follows:
θ ← θ - η·∂L(θ)/∂θ, where η is the gradient-descent learning rate
where θ is the main-network weight parameter, θ⁻ is the target-network weight parameter, γ is the attenuation coefficient with a value of 0.8, Q(s, a; θ) denotes the main-network Q value obtained by selecting behavior A in state s when the main-network weight parameter is θ, argmax_a′ Q(s′, a′; θ) is the behavior A′ that obtains the maximum main-network output Q value under the state data s′, and Qt(s′, argmax_a′ Q(s′, a′; θ); θ⁻) denotes the target-network output Q value obtained by selecting behavior A′ under the state data s′;
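One gradient-descent update on L(θ) can be sketched as follows, reusing the network module sketched above. The double-DQN target Qt follows the formula; the choice of optimizer, the mean-squared-error form of the expectation and the masking of terminal transitions are implementation assumptions.

```python
import torch
import torch.nn.functional as F

def d3qn_update(main_net, target_net, optimizer, batch, gamma=0.8):
    """One gradient-descent step on
    L(theta) = E[(R + gamma * Qt(s', argmax_a' Q(s', a'; theta); theta^-) - Q(s, a; theta))^2]."""
    img, pos, actions, rewards, next_img, next_pos, done = batch   # tensors; actions is int64

    q = main_net(img, pos).gather(1, actions.unsqueeze(1)).squeeze(1)        # Q(s, a; theta)

    with torch.no_grad():
        best_next = main_net(next_img, next_pos).argmax(dim=1)               # argmax_a' Q(s', a'; theta)
        q_next = target_net(next_img, next_pos).gather(1, best_next.unsqueeze(1)).squeeze(1)
        qt = rewards + gamma * (1.0 - done) * q_next                         # target output Qt
        # masking with (1 - done) on terminal transitions is a standard detail assumed here

    loss = F.mse_loss(q, qt)          # expectation in L(theta) taken as the batch mean
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return (qt - q).detach()          # TD-errors, usable to refresh the priorities p_i = |delta_i| + eps
```

The returned TD-errors can be fed back into the priority experience playback pool to update the sampling probabilities of the extracted arrays D.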
judging whether a collision is imminent; if so, returning to the previous state S and reselecting the behavior A, otherwise continuing to execute the training step;
judging whether the end point is reached, if so, resetting to the starting point, and continuing training, otherwise, continuing to execute the training step;
judging whether the target network weights should be updated (the condition being that they are updated once every 500 steps); if so, copying all the main-network weight parameters to the target network, and otherwise leaving the target-network weight parameters unchanged;
and judging whether the iteration times are reached, if so, terminating the training, obtaining a trained unmanned ship self-adaptive path planning model, and executing the step S6, otherwise, taking the next state S' as the current state S, and continuing to retrain from S2.
In the training process, if the collision is approached, returning to the past state S, and reselecting the behavior A; if the end point is reached, resetting to the starting point, and continuing training; if the target network weight is updated (the judgment condition is that the updating is performed once every 500 steps), all the main network weight parameters are copied to the target network; and judging whether the iteration times are reached, if so, terminating the training, obtaining a trained unmanned ship self-adaptive path planning model, executing the step S6, otherwise, taking the next state S' as the current state S, and continuing to retrain from S2.
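The periodic synchronisation and the outer training loop can be outlined as follows; the loop body is shown only as comments because the environment bindings, replay-pool implementation and episode bookkeeping are assumptions outside the patent text.

```python
def sync_target(main_net, target_net):
    """Copy all main-network weight parameters to the target network (done once every 500 steps)."""
    target_net.load_state_dict(main_net.state_dict())

# Outline of the training loop described above (steps S2 to S5); names are illustrative:
# for step in range(max_iterations):
#     A = epsilon_greedy(main_net_q_values(s), epsilon)            # step S2
#     s_next, R = execute_behavior_with_pid(A)                     # step S3
#     replay_pool.add((s, A, s_next, R), priority=abs(td) + eps)   # step S4
#     td = d3qn_update(main_net, target_net, optimizer,
#                      replay_pool.stratified_sample(m))           # step S5
#     if step % 500 == 0:
#         sync_target(main_net, target_net)                        # periodic target update
#     s = s_next  # or reset to the start point after reaching the end point / a collision
```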
And S6, importing the trained unmanned ship self-adaptive path planning model into an unmanned ship path planning system, planning the unmanned ship path in a real environment, and obtaining the unmanned ship path.
In addition, the embodiment of the invention also provides unmanned ship adaptive path planning equipment based on D3QN, which comprises a memory, a processor and an unmanned ship adaptive path planning program stored on the memory and capable of running on the processor, wherein the unmanned ship adaptive path planning program realizes the steps of the unmanned ship adaptive path planning method when executed by the processor.
In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium is stored with an unmanned ship self-adaptive path planning program, and the unmanned ship self-adaptive path planning program realizes the steps of the unmanned ship self-adaptive path planning method when being executed by a processor.
The invention has the beneficial effects that: according to the method, a D3QN algorithm is adopted, sample information does not need to be given in advance, the network can be trained autonomously through experience obtained by autonomous exploration, and an optimal solution is obtained until training is finished; the main network based on the fusion of the LSTM and the convolutional neural network can realize the feature fusion of the unmanned ship environment, and the unmanned ship has the self-adaptive capacity to the environment change by adopting a learning mode, thereby conforming to the more intelligent development direction of the unmanned ship in the future.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. An unmanned ship adaptive path planning method based on D3QN is characterized by comprising the following steps:
s1, constructing an unmanned ship model and an underwater simulation environment, designing a D3QN network, and putting the unmanned ship model in the underwater simulation environment for autonomous navigation;
s2, selecting a behavior A from the current state S according to an epsilon-greedy algorithm;
s3, enabling the unmanned ship to reach the next state S ' by adopting a PID position and speed error control algorithm according to the behavior A, obtaining a first position relation between the position of the next state S ' and an obstacle, obtaining a second position relation between the position of the next state S ' and a terminal point, and obtaining a return R by utilizing a reward and punishment mechanism according to the first position relation and the second position relation;
s4, obtaining environment information and position information of the current state S and merging them into current state data s, obtaining environment information and position information of the next state S′ and merging them into next state data s′, storing the current state data s, the behavior A, the next state data s′ and the return R into a priority experience playback pool in the form of an array D, and calculating the sampling probability of the array D in the priority experience playback pool through TD-error;
s5, extracting the array D in the experience playback pool to a D3QN network according to the sampling probability, carrying out gradient descent error training of the D3QN network, judging whether a termination condition is met, if so, obtaining a trained unmanned ship self-adaptive path planning model, and executing a step S6, otherwise, taking the next state S' as the current state S, and returning to the step S2;
and S6, importing the trained unmanned ship self-adaptive path planning model into an unmanned ship path planning system, planning the unmanned ship path in a real environment, and obtaining the unmanned ship path.
2. The unmanned ship adaptive path planning method according to claim 1, wherein the step of constructing the unmanned ship model and the underwater simulation environment and designing the D3QN network comprises:
building the unmanned ship model and the underwater simulation environment through the ROS and the Gazebo;
forming a main network and a target network, each composed of an LSTM network, a convolutional neural network and a dueling fully-connected network;
and forming the D3QN network by the main network, the target network and the experience playback pool.
3. The unmanned ship self-adaptive path planning method according to claim 1, wherein a depth camera and a positioning system are arranged on the unmanned ship model;
the depth camera is used for acquiring current environment information;
the positioning system is used for acquiring the position information of the unmanned ship.
4. The unmanned ship adaptive path planning method according to claim 2, wherein the step S5 specifically includes:
dividing the space of the whole priority experience playback pool into M small ranges according to the minimum sample size M;
randomly extracting one sample from each small range according to the sampling probability;
obtaining current state data s and next state data s' according to the sample data;
respectively processing the current state data s and the next state data s' through the main network and the target network to obtain an output Q of the main network and an output Q1 of the target network;
calculating to obtain a target output Qt according to the Q1 and the Q;
calculating to obtain an error function according to the Q and the Qt;
and training the D3QN network by adopting a gradient descent method based on the error function, judging whether the error function meets a termination condition, if so, obtaining a trained unmanned ship self-adaptive path planning model, and executing a step S6, otherwise, taking the next state S' as the current state S, returning to the step S2, and retraining.
5. The unmanned ship adaptive path planning method according to claim 4, wherein the step of processing the current state data s and the next state data s' through the main network and the target network respectively to obtain the output Q of the main network and the output Q1 of the target network comprises:
processing the environmental information in the current state data s through the convolutional neural network of the main network to obtain first environmental information;
processing the position information in the current state data s through the LSTM network of the main network to obtain first position information;
combining the first environment information and the first position information and inputting the combined features into the dueling fully-connected network of the main network to obtain an output Q of the main network;
processing the environmental information in the next state data s' through the convolutional neural network of the target network to obtain second environmental information;
processing the position information in the next state data s' through the LSTM network of the target network to obtain second position information;
and combining the second environment information and the second position information and inputting the combined features into the dueling fully-connected network of the target network to obtain an output Q1 of the target network.
6. The unmanned ship adaptive path planning method according to claim 2, wherein the epsilon-greedy algorithm is:
A = a behavior selected at random from the behavior space, with probability ε
A = argmax_a Q(S, a), the behavior with the maximum main-network output Q, with probability 1 - ε
wherein with probability ε a behavior is selected at random from the behavior space, and with probability 1 - ε the behavior with the maximum main-network output Q is selected.
7. The unmanned ship adaptive path planning method according to claim 1, wherein the reward and punishment mechanism is:
R = large reward, when the unmanned ship reaches the end point
R = large punishment, when the unmanned ship approaches an obstacle
R = small reward, when dt < do (the unmanned ship moves closer to the end point)
R = small punishment, when dt > do (the unmanned ship moves away from the end point)
wherein R is the return, do represents the distance between the unmanned ship and the terminal in the current state S, and dt represents the distance between the unmanned ship and the terminal in the next state S'.
8. The unmanned ship adaptive path planning method according to claim 1, wherein the PID position and velocity error control algorithm is:
Ep=[P(x′,y′,z′)-P(x,y,z),O(r′,p′,y′)-O(r,p,y)]
Ev=[v(x′,y′,z′)-v(x,y,z),ω(x′,y′,z′)-ω(x,y,z)]
where Ep is the position and attitude deviation, Ev is the velocity deviation, r, p and y are the angles by which the unmanned ship deviates about the x, y and z axes respectively, P(x′,y′,z′) and O(r′,p′,y′) are the position and deviation angles of the unmanned ship in the next state S′, v(x′,y′,z′) and ω(x′,y′,z′) are the linear velocity and angular velocity of the given target in the behavior A, P(x,y,z) and O(r,p,y) are the position and deviation angles of the unmanned ship in the current state S, and v(x,y,z) and ω(x,y,z) are the linear velocity and angular velocity of the unmanned ship in the current state S.
9. An unmanned ship adaptive path planning device based on D3QN, characterized in that the unmanned ship adaptive path planning device comprises a memory, a processor and an unmanned ship adaptive path planning program stored on the memory and operable on the processor, wherein the unmanned ship adaptive path planning program, when executed by the processor, implements the steps of the unmanned ship adaptive path planning method according to any one of claims 1 to 8.
10. A storage medium having stored thereon an unmanned ship adaptive path planning program, which when executed by a processor implements the steps of the unmanned ship adaptive path planning method according to any one of claims 1 to 8.
CN202110118727.XA 2021-01-28 2021-01-28 Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN Expired - Fee Related CN112800545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110118727.XA CN112800545B (en) 2021-01-28 2021-01-28 Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110118727.XA CN112800545B (en) 2021-01-28 2021-01-28 Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN

Publications (2)

Publication Number Publication Date
CN112800545A CN112800545A (en) 2021-05-14
CN112800545B true CN112800545B (en) 2022-06-24

Family

ID=75812443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110118727.XA Expired - Fee Related CN112800545B (en) 2021-01-28 2021-01-28 Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN

Country Status (1)

Country Link
CN (1) CN112800545B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113411099B (en) * 2021-05-28 2022-04-29 杭州电子科技大学 Double-change frequency hopping pattern intelligent decision method based on PPER-DQN
CN113503878B (en) * 2021-07-07 2023-04-07 大连海事大学 Unmanned ship path planning method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362089A (en) * 2019-08-02 2019-10-22 大连海事大学 A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN110472738A (en) * 2019-08-16 2019-11-19 北京理工大学 A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study
CN110488872A (en) * 2019-09-04 2019-11-22 中国人民解放军国防科技大学 A kind of unmanned plane real-time route planing method based on deeply study
WO2019241022A1 (en) * 2018-06-13 2019-12-19 Nvidia Corporation Path detection for autonomous machines using deep neural networks
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN111829527A (en) * 2020-07-23 2020-10-27 中国石油大学(华东) Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements
CN111880549A (en) * 2020-09-14 2020-11-03 大连海事大学 Unmanned ship path planning-oriented deep reinforcement learning reward function optimization method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190184561A1 (en) * 2017-12-15 2019-06-20 The Regents Of The University Of California Machine Learning based Fixed-Time Optimal Path Generation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019241022A1 (en) * 2018-06-13 2019-12-19 Nvidia Corporation Path detection for autonomous machines using deep neural networks
CN110362089A (en) * 2019-08-02 2019-10-22 大连海事大学 A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN110472738A (en) * 2019-08-16 2019-11-19 北京理工大学 A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study
CN110488872A (en) * 2019-09-04 2019-11-22 中国人民解放军国防科技大学 A kind of unmanned plane real-time route planing method based on deeply study
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN111829527A (en) * 2020-07-23 2020-10-27 中国石油大学(华东) Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements
CN111880549A (en) * 2020-09-14 2020-11-03 大连海事大学 Unmanned ship path planning-oriented deep reinforcement learning reward function optimization method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Path planning of an unmanned surface vehicle in a dynamic environment based on an improved Q-learning algorithm; Wang Meng et al.; Instrument Technique; 2020-04-15 (No. 04); pp. 17-21 *

Also Published As

Publication number Publication date
CN112800545A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112241176B (en) Path planning and obstacle avoidance control method of underwater autonomous vehicle in large-scale continuous obstacle environment
CN111694365B (en) Unmanned ship formation path tracking method based on deep reinforcement learning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
Xiaofei et al. Global path planning algorithm based on double DQN for multi-tasks amphibious unmanned surface vehicle
JP2021034050A (en) Auv action plan and operation control method based on reinforcement learning
CN112800545B (en) Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN
CN113176776B (en) Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning
Cao et al. Hunting algorithm for multi-auv based on dynamic prediction of target trajectory in 3d underwater environment
CN113010963B (en) Variable-quality underwater vehicle obstacle avoidance method and system based on deep reinforcement learning
CN109784201A (en) AUV dynamic obstacle avoidance method based on four-dimensional risk assessment
CN115016496A (en) Water surface unmanned ship path tracking method based on deep reinforcement learning
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
Yao et al. A hierarchical architecture using biased min-consensus for USV path planning
CN113848984B (en) Unmanned aerial vehicle cluster control method and system
CN113190037A (en) Unmanned aerial vehicle optimal path searching method based on improved fluid disturbance and sparrow algorithm
CN114879671A (en) Unmanned ship trajectory tracking control method based on reinforcement learning MPC
CN117590867B (en) Underwater autonomous vehicle connection control method and system based on deep reinforcement learning
Wang et al. Path-following optimal control of autonomous underwater vehicle based on deep reinforcement learning
Wang et al. A greedy navigation and subtle obstacle avoidance algorithm for USV using reinforcement learning
CN116679711A (en) Robot obstacle avoidance method based on model-based reinforcement learning and model-free reinforcement learning
Amendola et al. Navigation in restricted channels under environmental conditions: Fast-time simulation by asynchronous deep reinforcement learning
CN114910072A (en) Unmanned aerial vehicle navigation method, device, equipment and medium based on deep reinforcement learning
Gao et al. An optimized path planning method for container ships in Bohai bay based on improved deep Q-learning
CN117406716A (en) Unmanned ship collision avoidance control method, unmanned ship collision avoidance control device, terminal equipment and medium
Pereira et al. Reinforcement learning based robot navigation using illegal actions for autonomous docking of surface vehicles in unknown environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Hu Xiaowen

Inventor after: Liu Feng

Inventor after: Chen Chang

Inventor after: Yang Qian

Inventor before: Liu Feng

Inventor before: Hu Xiaowen

Inventor before: Chen Chang

Inventor before: Yang Qian

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220624
