CN109726866A - Unmanned surface vehicle path planning method based on Q-learning neural network - Google Patents


Info

Publication number
CN109726866A
Authority
CN
China
Prior art keywords
state
deviation
function
USV
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811612058.6A
Other languages
Chinese (zh)
Inventor
冯海林
吕扬民
方益明
周国模
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang A&F University ZAFU
Original Assignee
Zhejiang A&F University ZAFU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang A&F University ZAFU filed Critical Zhejiang A&F University ZAFU
Priority to CN201811612058.6A priority Critical patent/CN109726866A/en
Publication of CN109726866A publication Critical patent/CN109726866A/en
Pending legal-status Critical Current


Abstract

The invention discloses an unmanned surface vehicle (USV) path planning method based on a Q-learning neural network, comprising the following steps: a) initialize the memory block D; b) initialize the Q network and the initial values of the state and action; c) set a training target at random; d) randomly select an action a_t, obtain the current reward r_t and the next-moment state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the memory block D; e) randomly sample a batch of data (s_t, a_t, r_t, s_{t+1}) from the memory block D for training, where the state reached when the USV arrives at the target position, or when the maximum time of each round is exceeded, is regarded as the terminal state; f) if s_{t+1} is not a terminal state, return to step d); if s_{t+1} is a terminal state, update the Q network parameters and then return to step d); the algorithm ends after n rounds have been repeated; g) set a target and perform path planning with the trained Q network until the USV reaches the target position. The decision time of the invention is short, the planned route is closer to optimal, and the method meets the real-time requirements of online planning.

Description

Unmanned surface vehicle path planning method based on a Q-learning neural network
Technical field
The invention belongs to the field of intelligent control of unmanned surface vehicles, and specifically relates to an unmanned surface vehicle path planning method based on a Q-learning neural network.
Background art
Water quality monitoring is the main means of water quality assessment and water pollution prevention. With the increase of industrial wastewater, the problem of water pollution is becoming increasingly serious, and the demand for dynamic monitoring of water pollution is very urgent. However, traditional water quality monitoring methods involve numerous steps and are time-consuming, and the diversity and accuracy of the data obtained are far from meeting the needs of decision making. In view of the above problems, a variety of water quality monitoring methods have been proposed; for example, Cao Lijie et al. proposed establishing a sensor network to obtain a more accurate water quality inversion model, and Field et al. proposed inverting satellite data with a water quality model to obtain the distribution map of water quality parameters in the monitored waters. However, the above methods cannot flexibly change the monitored waters, and their engineering workload is large and their steps are numerous. In contrast, an unmanned surface vehicle used for water quality monitoring is small and easy to carry, is not affected by the terrain of the monitoring site, and can continuously carry out in-situ monitoring of multiple water quality parameters, so that the monitoring results have greater diversity and accuracy.
An unmanned surface vehicle (Unmanned Surface Vehicle, USV) is a surface motion platform that can navigate autonomously in an unknown water environment and complete various tasks. Because its application fields are extensive, research covers many aspects such as automatic piloting, automatic obstacle avoidance, navigation planning and pattern recognition. It can be used not only for mine clearance, reconnaissance and anti-submarine operations in the military field, but also for hydrometeorological detection, environmental monitoring and water rescue in the civilian field. Because of the mobility of water, a water body may flow through various complex terrains that staff cannot reach, for example when water flows through a cave; or, because the weather is changeable, for example when the waters are foggy for a long time, staff cannot see clearly and cannot operate the USV accurately in real time. In such cases the autonomous navigation of the USV can be used to reach the target waters for detection, and the autonomous navigation function is realized through path planning technology.
USV path planning technology means that the USV, in its operating waters, searches for a collision-free path from the starting point to the target point according to certain performance indicators (such as shortest distance, shortest time, etc.). It is a core component of USV navigation technology and, moreover, represents the level of intelligence of the USV. Currently used planning methods mainly include the particle swarm algorithm, the A* algorithm, the visibility graph method, the artificial potential field method, the ant colony algorithm, etc., but these methods are mostly used under known environment conditions.
The path planning problem in a known environment has been solved relatively well, but in unknown waters a USV cannot obtain the environmental information of the waters to be monitored before performing its task, so path planning methods based on known environment information cannot be used to plan the USV navigation path. In addition, because the monitored water environment is complex and the amount of sensor information is large, the computational workload of the system is heavy, which causes the USV to suffer from disadvantages such as poor real-time performance and oscillation in front of obstacles. Therefore, USV path planning urgently needs algorithms that are simple, have strong real-time performance and can control the uncertainty in the system, and it is therefore necessary to introduce methods with autonomous learning ability, among which path planning based on the Q-learning algorithm is suitable for path planning in unknown environments. In existing research, Guo Na et al., on the basis of the traditional Q-learning algorithm, used simulated annealing for action selection to solve the balance problem between exploration and exploitation. Chen Zili et al. proposed using a genetic algorithm to establish a new Q-value table for static global path planning. Dong Peifang et al. added the artificial potential field method to the Q-learning algorithm, using the attractive potential field as the initial prior information of the environment and then searching the environment step by step, which accelerates the Q-value iteration.
The Chinese patent with publication number CN108106623A discloses an unmanned vehicle path planning method based on a flow field, comprising the following steps: establishing a flow field calculation model according to the starting point and end point of the vehicle and the obstacles in the environment; establishing a vehicle kinematics model with the front wheel angle as the input quantity and the coordinates and heading angle as the state quantities; taking the vehicle kinematics model as the rolling equation, solving the receding horizon optimization problem of the flow field, and using the flow field velocity vector distribution as the guidance information for path planning to obtain the planned path, where the optimized quantity is the front wheel angle, the optimization objectives include making the vehicle motion consistent with the flow field motion and avoiding collision with obstacles during vehicle motion, and the constraint condition is that the front wheel angle does not exceed the maximum steering wheel angle. This method can find a smooth, collision-free path connecting the starting point and the end point in complex terrain, and under the premise of obstacle avoidance good smoothness and completeness of the path are obtained at the same time. However, this method needs to know the terrain of the environment and the positions of obstacles, and cannot perform path planning for an unknown field.
Summary of the invention
Goal of the invention: in order to overcome the deficiencies in the prior art, the invention proposes one kind to be based on BP neural network Q learn intensified learning path planning algorithm, with neural network fitting Q learning method in Q function, enable it to continuous System mode as input, and by experience replay and be arranged target network method significantly improve network in the training process Convergence rate.By experiment simulation, the feasibility of modified two-step method planing method presented here is demonstrated.
Technical solution: to achieve the above object, the unmanned surface vehicle path planning method based on a Q-learning neural network of the invention comprises the following steps (a minimal code sketch of the resulting training loop is given after step g):
a) initialize the memory block D;
b) initialize the Q network and the initial values of the state and action; the Q network involves the following elements: S, A, P_{s,a}, R, where S denotes the set of system states in which the USV can be, A denotes the set of actions the USV can take, P_{s,a} denotes the system state transition probability, and R denotes the reward function;
c) set a training target at random;
d) randomly select an action a_t, obtain the current reward r_t and the next-moment state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the memory block D;
e) randomly sample a batch of data (s_t, a_t, r_t, s_{t+1}) from the memory block D for training; the state reached when the USV arrives at the target position, or when the maximum time of each round is exceeded, is regarded as the terminal state;
f) if s_{t+1} is not a terminal state, return to step d); if s_{t+1} is a terminal state, update the Q network parameters and then return to step d); the algorithm ends after n rounds have been repeated;
g) set a target and perform path planning with the trained Q network until the USV reaches the target position.
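The training procedure of steps a) to g) can be sketched in Python as follows; this is a minimal illustration, assuming a toy grid environment in place of the monitored waters and a generic q_update callback standing in for the Q-network update of step f). The names GridEnv, train and q_update, the obstacle cells and the reward values are illustrative assumptions, not part of the patent.

import random
from collections import deque

import numpy as np


class GridEnv:
    """Hypothetical 20x20 environment: state = USV position plus target position."""

    def __init__(self, size=20):
        self.size = size
        self.obstacles = {(5, 5), (5, 6), (12, 3)}   # illustrative obstacle cells

    def _state(self):
        return np.array(self.pos + self.goal, dtype=np.float32)

    def reset(self):
        self.pos = (0, 0)
        # step c): set a training target at random
        self.goal = (random.randrange(self.size), random.randrange(self.size))
        return self._state()

    def step(self, action):
        dx, dy = [(0, 1), (0, -1), (1, 0), (-1, 0)][action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        if self.pos in self.obstacles:          # collision: punish, terminal state
            return self._state(), -1.0, True
        if self.pos == self.goal:               # target reached: reward, terminal state
            return self._state(), 1.0, True
        return self._state(), -0.01, False      # small step cost otherwise


def train(env, q_update, n_rounds=1000, max_steps=200, batch_size=32):
    memory = deque(maxlen=40000)                 # step a): initialize memory block D
    for episode in range(n_rounds):              # algorithm ends after n rounds (step f)
        state = env.reset()                      # steps b), c)
        for t in range(max_steps):               # exceeding max_steps is also terminal
            action = random.randrange(4)         # step d): random action selection
            next_state, reward, done = env.step(action)
            memory.append((state, action, reward, next_state, done))  # store in D
            if len(memory) >= batch_size:        # step e): sample a batch and train
                q_update(random.sample(memory, batch_size))
            state = next_state
            if done:                             # step f): terminal -> next round
                break


# usage: a no-op update just to exercise the loop
train(GridEnv(), q_update=lambda batch: None, n_rounds=3)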
Preferably, in step a), the memory block D is an experience replay memory block used to store the training samples acquired during USV navigation; the existence of experience replay means that the samples used in each training step are not continuous in time.
Preferably, the update rule of the Q network is:
Q(s_t, a_t) = Q(s_t, a_t) + α·δ'_t
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ'_t is the TD(0) deviation; the 0 in TD(0) means looking 1 step forward from the current state:
δ'_t = R(s_t) + γ·V(s_{t+1}) - Q(s_t, a_t)
where γ is the discount factor, R(s) is the reward function, and V(s) is the value function. In addition, the TD(0) deviation can also be defined as
δ_{t+1} = R(s_{t+1}) + γ·V(s_{t+2}) - V(s_{t+1})
where δ_{t+1} is the TD(0) deviation, R(s) is the reward function, and V(s) is the value function.
Another discount factor λ ∈ [0,1] is used to discount the TD deviations of future steps:
Q(s_t, a_t) = Q(s_t, a_t) + α·δ_t^λ
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ_t^λ is the TD(λ) deviation; TD(λ) means looking λ steps forward from the current state.
Here the TD(λ) deviation δ_t^λ is defined in terms of the one-step deviations: δ'_t represents the deviation obtained from past learning, δ_t^λ is the deviation of multi-step learning, γ is the discount factor, λ ∈ [0,1] is the second discount factor, and δ_{t+i} are the deviations of current learning.
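The defining formula for δ_t^λ is not reproduced in the text above (it appears only as an image in the source). A standard TD(λ) form consistent with the definitions given, stated here as an assumption, is:
δ_t^λ = Σ_{i=0}^{N-t} (γλ)^i · δ_{t+i}
where N is the last time step of the current episode, so that the multi-step deviation is the γλ-discounted sum of the one-step deviations of the current and future steps.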
Preferably, η_t(s, a) is defined as a characteristic function: if (s, a) occurs at time t it returns 1, otherwise it returns 0; for simplicity, ignoring the learning efficiency, an eligibility trace e_t(s, a) is defined for each pair (s, a).
The online update at time t is then
Q(s, a) = Q(s, a) + α·[δ'_t·η_t(s, a) + δ_t·e_t(s, a)]
where the function Q(s, a) is the value of executing action a in state s, α is the learning rate, η_t(s, a) is the characteristic function, e_t(s, a) is the eligibility trace, δ'_t represents the deviation of past learning, and δ_t is the deviation of current learning; δ_t is obtained from the deviation between the accumulated return R(s) and the current estimate V(s), and the update is made by multiplying this deviation by the learning rate.
Preferably, reinforcement learning seeks to maximize the expected overall return harvested while the system runs, i.e. to maximize E(R(s_0) + γR(s_1) + γ²R(s_2) + ...); to this end an optimal policy π must be found such that, when the USV makes decisions and moves according to π, the total return obtained is maximal.
The objective function of reinforcement learning is one of the following:
V^π(s) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, π)
Q^π(s, a) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, a_0 = a, π)
where V^π(s) denotes the expected return that can be obtained from the current initial state s when actions are chosen according to the decisions of policy π; Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and then acting according to the decisions of policy π in all later states; E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return harvested while the system runs, R(s_t) denotes the reward at time t, and γ is the discount factor.
The purpose of Q-learning is exactly to find the optimal policy π* that maximizes this expected return for every state.
Preferably, Q*(s, a) = Q^{π*}(s, a) is defined, where Q^{π*}(s, a) refers to the expected return harvested by executing action a in state s and thereafter always moving according to the decisions of the optimal policy. Assuming Q*(s, a) is known, π* can easily be generated from Q*(s, a): it suffices that for each s the action maximizing Q*(s, ·) is selected. In this way, the problem of finding the optimal policy is converted into finding Q*(s, a), since:
Q*(s, a) = R(s_0) + γ·E(R(s_1) + γR(s_2) + ... | s_1, a_1)
where Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and then acting according to the decisions of policy π in all later states, E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return harvested while the system runs, R(s_t) denotes the reward at time t, and γ is the discount factor;
and a_1 is determined by π*, i.e. a_1 = π*(s_1), where a_1 denotes the action taken under the optimal policy and π*(s_1) denotes the optimal policy.
Then, according to the Bellman equation, the Q function can be found iteratively.
Preferably, the Bellman equation defines Q*(s, a) in recursive form, so that the Q function can be found iteratively. In the Bellman equation, Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and then acting according to the decisions of policy π in all later states, R(s_0) represents the reward function, and γ is the discount factor.
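The recursive form itself is not reproduced in the text (it appears only as an image in the source); the standard Bellman optimality equation for the elements S, A, P_{s,a}, R defined above, given here as an assumption, is:
Q*(s, a) = R(s) + γ · Σ_{s'} P(s' | s, a) · max_{a'} Q*(s', a')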
Preferably, the reward function is divided into 3 kinds: the first rewards the USV according to its distance from the target position; the second rewards the USV for reaching the target position; the third punishes the USV for colliding with an obstacle.
Preferably, in step f) the number of repeated rounds n ranges from 3000 to 5000.
Beneficial effects:
Compared with the prior art, the present invention has the following advantages:
1. The reinforcement learning method of the invention solves the autonomous path planning problem of a water quality monitoring USV carrying out water quality monitoring in unknown waters; the Q function is fitted by a BP neural network, so that the trained strategy can make decisions according to the real-time information of the obstacles in the current environment.
2. The method of the invention enables the water quality monitoring USV to plan feasible paths in an unknown environment according to different states, with a short decision time and a route closer to optimal, and can meet the real-time requirements of online planning, thereby overcoming the disadvantages of traditional Q-learning path planning methods, namely heavy computation and slow convergence, so that problem waters can be monitored at the earliest time.
3. The invention fits the Q function of the Q-learning method with a neural network, so that a continuous system state can be taken as input, and significantly improves the convergence speed of the network during training through experience replay and by setting a target network.
4. The invention improves traditional Q-learning by using a BP neural network to realize the Q-value iteration: the output of the network corresponds to the Q value of each action, and the input of the network corresponds to the state describing the environment.
5. Through the design of the reward function, the invention returns different reward values for different situations, so that the USV learns and explores more efficiently.
Description of the drawings
Fig. 1 is the overall flow chart of the invention;
Fig. 2 is the simulation diagram of the complex-waters terrain;
Fig. 3 is the absolute-error plot of the distance between the actually reached point and the target point for the complex-waters terrain;
Fig. 4 is the simulation diagram of the simple concentric-circle maze;
Fig. 5 is the absolute-error plot of the distance between the actually reached point and the target point for the simple concentric-circle maze;
Fig. 6 is the simulation diagram of the complex maze;
Fig. 7 is the absolute-error plot of the distance between the actually reached point and the target point for the complex maze;
Fig. 8 is the simulation result diagram for the East Lake background;
Fig. 9 is the absolute-error plot of the distance between the actually reached point and the target point for the East Lake background;
Fig. 10 is the iteration-number plot for the East Lake background.
Specific embodiment
The present invention will be further explained below with reference to the accompanying drawings and embodiments.
Embodiment one:
The unmanned surface vehicle path planning method based on a Q-learning neural network of this embodiment comprises the following steps:
a) initialize the memory block D;
b) initialize the Q network and the initial values of the state and action; the Q network involves the following elements: S, A, P_{s,a}, R, where S denotes the set of system states in which the USV can be, A denotes the set of actions the USV can take, P_{s,a} denotes the system state transition probability, and R denotes the reward function;
c) set a training target at random;
d) randomly select an action a_t, obtain the current reward r_t and the next-moment state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the memory block D;
e) randomly sample a batch of data (s_t, a_t, r_t, s_{t+1}) from the memory block D for training; the state reached when the USV arrives at the target position, or when the maximum time of each round is exceeded, is regarded as the terminal state;
f) if s_{t+1} is not a terminal state, return to step d); if s_{t+1} is a terminal state, update the Q network parameters and then return to step d); the algorithm ends after n rounds have been repeated;
g) set a target and perform path planning with the trained Q network until the USV reaches the target position.
Here, D is the experience replay memory block used to store the training samples acquired during USV navigation. The existence of experience replay means that the samples used in each training step are not continuous in time, which minimizes the correlation between samples and enhances the stability and accuracy of the samples.
Embodiment two:
The unmanned surface vehicle path planning method based on a Q-learning neural network of this embodiment is based on embodiment one. The traditional Q-learning algorithm is as follows:
Q-learning describes the problem on the basis of a Markov decision process (Markov Decision Process). A Markov decision process contains 4 elements: S, A, P_{s,a}, R. Here S denotes the set of system states in which the USV can be, i.e. the current state of the USV and the current state of the environment, such as the size and position of obstacles; A denotes the set of actions the USV can take, i.e. the rotation direction of the USV; P_{s,a} denotes the system model, i.e. the system state transition probability, where P(s' | s, a) describes the probability that the system reaches state s' after executing action a in the current state s; and R denotes the reward function, which is determined by the current state and the action taken. Q-learning can be regarded as incremental planning that finds a policy maximizing the overall return. The idea of Q-learning is not to consider the environment model, but to directly optimize a Q function that can be computed iteratively, where the function Q(s_t, a_t) is defined as the discounted accumulated reinforcement value obtained by executing action a_t in state s_t and thereafter executing the optimal action sequence.
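The defining formula itself is not reproduced in the text (it appears only as an image in the source); a standard form consistent with the variables listed in the next paragraph, given here as an assumption, is:
Q(s_t, a_t) = R(s_t) + γ · max_{a_{t+1}} Q(s_{t+1}, a_{t+1})    (1)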
In the formula, s_t is the state of the USV at time t, s_{t+1} is the state of the USV at the next moment, a_t is the action executed at time t, γ is the discount factor with 0 ≤ γ ≤ 1, and R(s_t) is the reward function, whose value may be positive or negative. In the initial stage of learning the Q values may not correctly reflect the policy they define, and the initial Q_0(s, a) is assumed to be given for all states and actions. For a given state set S of the environment, the set A of actions the USV may take is large, the amount of data is large, a large amount of system memory is needed for storage, and the method cannot generalize. In order to overcome these drawbacks, traditional Q-learning is improved: a BP neural network is used to realize the Q-value iteration, the output of the network corresponds to the Q value of each action, and the input of the network corresponds to the state describing the environment (a minimal sketch of such a network is given below).
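A minimal PyTorch sketch of such a network is given below, assuming a 4-dimensional continuous state (USV position plus target position) and 4 discrete actions; the hidden layer sizes of 64 and 32 neurons follow the experiment settings of embodiment eight, while the class name and all other details are illustrative assumptions rather than the patent's exact implementation.

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Maps a continuous environment state to one Q value per action."""

    def __init__(self, state_dim=4, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),   # first layer: 64 neurons
            nn.ReLU(),
            nn.Linear(64, 32),          # second layer: 32 neurons
            nn.ReLU(),
            nn.Linear(32, n_actions),   # one output per action
        )

    def forward(self, state):
        return self.net(state)


# usage: Q values for a single state, greedy action by argmax
q_net = QNetwork()
state = torch.tensor([[0.0, 0.0, 12.0, 7.0]])
q_values = q_net(state)               # shape (1, n_actions)
best_action = q_values.argmax(dim=1)  # action with the largest Q value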
Improved Q-learning path planning algorithm
The Q(λ) algorithm is obtained by drawing on the TD(λ) algorithm; through the idea of backtracking it lets data be passed on continuously, so that the action decision of a certain state is influenced by its succeeding states. If a certain future decision π is a failed decision, the current decision will also bear the corresponding punishment, and this influence can be appended to the current decision; if a certain future decision π is a correct decision, the current decision will also be rewarded accordingly and will likewise be influenced. With this combination the improved algorithm converges faster and meets the practicality of learning. The update rule of the improved Q(λ) algorithm is
Q(s_t, a_t) = Q(s_t, a_t) + α·δ'_t    (2)
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ'_t is the TD(0) deviation; the 0 in TD(0) means looking 1 step forward from the current state:
δ'_t = R(s_t) + γ·V(s_{t+1}) - Q(s_t, a_t)    (3)
where γ is the discount factor, R(s) is the reward function, and V(s) is the value function. In addition, the TD(0) deviation can also be defined as
δ_{t+1} = R(s_{t+1}) + γ·V(s_{t+2}) - V(s_{t+1})    (4)
where δ_{t+1} is the TD(0) deviation, R(s) is the reward function, V(s) is the value function, and the 0 in TD(0) means looking 1 step forward from the current state.
Another discount factor λ ∈ [0,1] is also used to discount the TD deviations of future steps:
Q(s_t, a_t) = Q(s_t, a_t) + α·δ_t^λ
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t and α is the learning rate.
A new parameter λ is introduced here; with this new parameter the predictions over all step numbers can be taken into account comprehensively without increasing the computational complexity, and, like the parameter γ before, it is used to control the weights. TD(λ) means looking λ steps forward from the current state.
Here the TD(λ) deviation δ_t^λ is defined in terms of the one-step deviations: δ'_t represents the deviation obtained from past learning, δ_t^λ is the deviation of multi-step learning, γ is the discount factor, λ ∈ [0,1] is the second discount factor, and δ_{t+i} are the deviations of current learning.
Embodiment three
The unmanned surface vehicle path planning method based on a Q-learning neural network of this embodiment is based on embodiment two. As long as the future TD deviations are unknown, the above update cannot be carried out, but they can be calculated gradually by using eligibility traces. Below, η_t(s, a) is defined as a characteristic function: if (s, a) occurs at time t it returns 1, otherwise it returns 0. For simplicity, ignoring the learning efficiency, an eligibility trace e_t(s, a) is defined for each pair (s, a).
The online update at time t is then
Q(s, a) = Q(s, a) + α·[δ'_t·η_t(s, a) + δ_t·e_t(s, a)]    (8)
where the function Q(s, a) is the value of executing action a in state s, α is the learning rate, η_t(s, a) is the characteristic function, e_t(s, a) is the eligibility trace, δ'_t represents the deviation of past learning, and δ_t is the deviation of current learning; δ_t is obtained from the deviation between the accumulated return R(s) and the current estimate V(s), and the update is made by multiplying this deviation by the learning rate.
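A simplified tabular Python sketch of how eligibility traces let every recently visited state-action pair share in the current deviation is given below. It uses a single one-step deviation with V(s') taken as max_a Q(s', a), and decays the traces by γλ between steps; these choices, and all names, are assumptions for illustration and not the patent's exact update.

from collections import defaultdict

N_ACTIONS = 4

def q_lambda_step(Q, e, s, a, r, s_next, alpha=0.1, gamma=0.9, lam=0.8):
    """One online update with an eligibility trace, in the spirit of rule (8).

    Q and e are dictionaries mapping (state, action) pairs to floats.
    """
    # one-step deviation, with V(s') taken as max over actions (assumption)
    v_next = max(Q[(s_next, b)] for b in range(N_ACTIONS))
    delta = r + gamma * v_next - Q[(s, a)]

    e[(s, a)] += 1.0                      # eta_t(s, a) = 1 for the pair visited at time t
    for key in list(e.keys()):
        Q[key] += alpha * delta * e[key]  # traced pairs share in the current deviation
        e[key] *= gamma * lam             # decay the trace between steps (assumption)


# usage with integer-coded states and actions
Q, e = defaultdict(float), defaultdict(float)
q_lambda_step(Q, e, s=0, a=1, r=-0.01, s_next=2)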
Embodiment four:
The unmanned surface vehicle path planning method based on a Q-learning neural network of this embodiment is based on embodiment three. Reinforcement learning seeks to maximize the expected overall return harvested while the system runs; to this end an optimal policy π must be found such that, when the USV makes decisions and moves according to π, the total return obtained is maximal. In general, the objective function of reinforcement learning is one of the following:
V^π(s) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, π)
Q^π(s, a) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, a_0 = a, π)    (9)
where V^π(s) denotes the expected return that can be obtained from the current initial state s when actions are chosen according to the decisions of policy π; Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and then acting according to the decisions of policy π in all later states; E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return harvested while the system runs, R(s_t) denotes the reward at time t, and γ is the discount factor.
The purpose of Q-learning is exactly to find the optimal policy π* that maximizes this expected return for every state.
Embodiment five:
The unmanned surface vehicle path planning method based on a Q-learning neural network of this embodiment is based on embodiment four. Q*(s, a) = Q^{π*}(s, a) is defined, where Q^{π*}(s, a) refers to the expected return harvested by executing action a in state s and thereafter always moving according to the decisions of the optimal policy. Assuming Q*(s, a) is known, π* can easily be generated from Q*(s, a): it suffices that for each s the action maximizing Q*(s, ·) is selected. In this way, the problem of finding the optimal policy is converted into finding Q*(s, a). Since:
Q*(s, a) = R(s_0) + γ·E(R(s_1) + γR(s_2) + ... | s_1, a_1)    (10)
where Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and then acting according to the decisions of policy π in all later states, E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return harvested while the system runs, R(s_t) denotes the reward at time t, and γ is the discount factor;
and a_1 is determined by π*, i.e. a_1 = π*(s_1), where a_1 denotes the action taken under the optimal policy and π*(s_1) denotes the optimal policy; then, according to the Bellman equation, the Q function can be found iteratively.
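A small sketch of generating π* from Q*(s, a) by selecting, for each state, the action with the largest Q value; the tabular Q* values here are arbitrary illustrative numbers.

import numpy as np

# hypothetical Q* table: 3 states x 4 actions
q_star = np.array([
    [0.1, 0.5, 0.2, 0.0],
    [0.9, 0.3, 0.4, 0.2],
    [0.0, 0.1, 0.8, 0.6],
])

# pi*(s) = action maximizing Q*(s, a) for each state s
pi_star = q_star.argmax(axis=1)
print(pi_star)   # -> [1 0 2]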
Embodiment six
The unmanned surface vehicle path planning method based on a Q-learning neural network of this embodiment is based on embodiment five. The Bellman equation defines Q*(s, a) in recursive form, so that the Q function can be found iteratively. In the Bellman equation, Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and then acting according to the decisions of policy π in all later states, and R(s_0) represents the reward function.
In the traditional Q-learning algorithm the Q function is saved and updated in the form of a table; however, in USV obstacle-avoidance path planning, since obstacles may appear at any position in space, it is difficult for a Q function in table form to describe the obstacles appearing in a continuous space. Therefore, on the basis of Q-learning, deep Q-learning fits the Q function with a BP neural network, and the input state s is a continuous variable. In general, the learning process is difficult to converge when the Q function is approximated with a nonlinear function, so the methods of experience replay and a target network are used to improve the stability of learning (a minimal sketch of one such training step is given below).
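A minimal PyTorch sketch of one training step using experience replay and a separate target network, in the spirit of the description above; the layer sizes follow embodiment eight, while the loss, optimizer and the periodic hard copy of the weights are illustrative assumptions rather than the patent's exact settings.

import random
from collections import deque

import torch
import torch.nn as nn

def make_net(state_dim=4, n_actions=4):
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, 32), nn.ReLU(),
                         nn.Linear(32, n_actions))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())        # target starts as a copy of the Q network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
memory = deque(maxlen=40000)
gamma = 0.9

def train_step(batch_size=32):
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)          # experience replay: decorrelated samples
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    q = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                              # TD target from the frozen target network
        target = r.float() + gamma * target_net(s2.float()).max(1).values * (1 - done.float())
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    target_net.load_state_dict(q_net.state_dict())     # refresh the target network periodically

# usage: push one dummy transition and (once enough are stored) take a step
memory.append(([0., 0., 12., 7.], 1, -0.01, [0., 1., 12., 7.], False))
train_step()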
Embodiment seven:
The unmanned surface vehicle path planning method based on a Q-learning neural network of this embodiment is based on embodiment six. In reinforcement learning, the design of the reward function directly affects the quality of the learning effect. In general, the reward function corresponds to a person's description of a certain task, and through the design of the reward function the prior knowledge needed to solve the task can be incorporated into learning. In USV path planning, it is hoped that the USV reaches the target position as soon as possible while also safely avoiding collisions with obstacles during navigation. Here the reward function is divided into 3 kinds: the first rewards the USV according to its distance from the target position; the second rewards the USV for reaching the target position; the third punishes the USV for colliding with an obstacle.
In terms of magnitude, the second and third reward values are larger than the first. For the USV obstacle-avoidance task, the main goal is to avoid obstacles and get to the target position, rather than merely to shorten the distance between the USV and the target position. The reason the first kind of reward is added is that, if rewards and punishments were given only when the USV reaches the target position or hits an obstacle, the reward of a large number of steps during motion would be 0, so that in most cases the USV would not improve its strategy and the learning efficiency would be low. Adding this reward is equivalent to incorporating human prior knowledge of the task, so that the USV learns and explores more efficiently (one possible form of such a reward function is sketched below).
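The concrete reward formula is not reproduced in the text (it appears only as an image in the source); the Python sketch below shows one possible form consistent with the three kinds described, where the specific magnitudes (+10 for reaching the target, -10 for a collision, and a small distance-based shaping term) are illustrative assumptions only.

import math

def reward(usv_pos, target_pos, prev_pos, collided,
           goal_bonus=10.0, collision_penalty=-10.0, shaping_scale=0.1):
    """Three-part reward: distance shaping, goal bonus, collision penalty."""
    if collided:                       # third kind: punish collision with an obstacle
        return collision_penalty
    if usv_pos == target_pos:          # second kind: reward reaching the target position
        return goal_bonus
    # first kind: small reward for getting closer to the target (shaping term)
    d_prev = math.dist(prev_pos, target_pos)
    d_now = math.dist(usv_pos, target_pos)
    return shaping_scale * (d_prev - d_now)

# usage: one step that moves the USV closer to the target
print(reward(usv_pos=(1, 0), target_pos=(5, 0), prev_pos=(0, 0), collided=False))  # 0.1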
Embodiment eight:
In order to test the path planning algorithm designed here, simulation experiments were carried out in Matlab2014a. In the experiments, the simulation environment is a 20*20 region, the discount factor γ is 0.9, the size of the memory block D is set to 40000, the number of cycles is 1000, the first layer of the neural network has 64 neurons, and the second layer has 32 neurons. In each round of training, whenever the USV hits an obstacle or reaches the target position, the round ends immediately and a reward is returned.
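For reference, the experiment settings listed above can be collected into a single configuration as follows; the key names are illustrative, the values are those stated in this paragraph.

SIM_CONFIG = {
    "grid_size": (20, 20),        # simulation region is 20*20
    "discount_factor": 0.9,       # gamma
    "memory_size": 40000,         # size of memory block D
    "n_rounds": 1000,             # number of training cycles
    "hidden_layers": (64, 32),    # neurons in the first and second layers
}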
To verify the accuracy of the method, maze terrains are used for testing, and three different terrains are designed for algorithm comparison: a complex-waters terrain (as shown in Fig. 2), a simple concentric-circle maze terrain (as shown in Fig. 4), and a complex maze terrain (as shown in Fig. 6). The improved algorithm and the traditional Q-learning algorithm are simulated on the above terrains. As can be seen from the path diagrams, the route of the improved algorithm, drawn in blue, is shorter and more direct than the route simulated by the traditional Q-learning algorithm. As can be seen from the absolute-error plots of the distance between the actually reached point and the target point, the improved algorithm converges and stabilizes one third earlier than the traditional Q-learning algorithm.
Embodiment nine:
A simulation experiment is carried out taking the actual environment of the East Lake waters in Lin'an as the background. As can be seen from Fig. 8, the USV does not collide with any obstacle during the simulation and the path is simple and fast. Fig. 9 is the standard-error curve and Fig. 10 is the learning curve. As can be seen from the figures, when the number of training iterations reaches 56 the curve tends to be stable, which shows that a safe and efficient overall route has basically been planned; at this point the USV can avoid obstacles and reach the target position in most cases. It can therefore be concluded that, compared with the traditional Q-learning algorithm, the improved Q-learning algorithm based on a BP neural network converges faster and produces a more optimal path.
The above is only a preferred embodiment of the present invention. It should be pointed out that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (9)

1. An unmanned surface vehicle path planning method based on a Q-learning neural network, characterized by comprising the following steps:
a) initialize the memory block D;
b) initialize the Q network and the initial values of the state and action; the Q network involves the following elements: S, A, P_{s,a}, R, where S denotes the set of system states in which the USV can be, A denotes the set of actions the USV can take, P_{s,a} denotes the system state transition probability, and R denotes the reward function;
c) set a training target at random;
d) randomly select an action a_t, obtain the current reward r_t and the next-moment state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the memory block D;
e) randomly sample a batch of data (s_t, a_t, r_t, s_{t+1}) from the memory block D for training; the state reached when the USV arrives at the target position, or when the maximum time of each round is exceeded, is regarded as the terminal state;
f) if s_{t+1} is not a terminal state, return to step d); if s_{t+1} is a terminal state, update the Q network parameters and then return to step d); the algorithm ends after n rounds have been repeated;
g) set a target and perform path planning with the trained Q network until the USV reaches the target position.
2. The unmanned surface vehicle path planning method based on a Q-learning neural network according to claim 1, characterized in that, in step a), the memory block D is an experience replay memory block used to store the training samples acquired during USV navigation.
3. The unmanned surface vehicle path planning method based on a Q-learning neural network according to claim 1, characterized in that the update rule of the Q network is:
Q(s_t, a_t) = Q(s_t, a_t) + α·δ'_t
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ'_t is the TD(0) deviation; the 0 in TD(0) means looking 1 step forward from the current state:
δ'_t = R(s_t) + γ·V(s_{t+1}) - Q(s_t, a_t)
where γ is the discount factor, R(s) is the reward function, and V(s) is the value function; in addition, the TD(0) deviation can also be defined as
δ_{t+1} = R(s_{t+1}) + γ·V(s_{t+2}) - V(s_{t+1})
where δ_{t+1} is the TD(0) deviation, R(s) is the reward function, and V(s) is the value function;
another discount factor λ ∈ [0,1] is used to discount the TD deviations of future steps:
Q(s_t, a_t) = Q(s_t, a_t) + α·δ_t^λ
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ_t^λ is the TD(λ) deviation; TD(λ) means looking λ steps forward from the current state;
here the TD(λ) deviation δ_t^λ is defined in terms of the one-step deviations: δ'_t represents the deviation obtained from past learning, δ_t^λ is the deviation of multi-step learning, γ is the discount factor, λ ∈ [0,1] is the second discount factor, and δ_{t+i} are the deviations of current learning.
4. The unmanned surface vehicle path planning method based on a Q-learning neural network according to claim 1, characterized in that η_t(s, a) is defined as a characteristic function: if (s, a) occurs at time t it returns 1, otherwise it returns 0; for simplicity, ignoring the learning efficiency, an eligibility trace e_t(s, a) is defined for each pair (s, a);
the online update at time t is then
Q(s, a) = Q(s, a) + α·[δ'_t·η_t(s, a) + δ_t·e_t(s, a)]
where the function Q(s, a) is the value of executing action a in state s, α is the learning rate, η_t(s, a) is the characteristic function, e_t(s, a) is the eligibility trace, δ'_t represents the deviation of past learning, and δ_t is the deviation of current learning.
5. The unmanned surface vehicle path planning method based on a Q-learning neural network according to claim 4, characterized in that reinforcement learning seeks to maximize the expected overall return harvested while the system runs, for which an optimal policy π must be found such that, when the USV makes decisions and moves according to π, the total return obtained is maximal;
the objective function of reinforcement learning is one of the following:
V^π(s) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, π)
Q^π(s, a) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, a_0 = a, π)
where V^π(s) denotes the expected return that can be obtained from the current initial state s when actions are chosen according to the decisions of policy π; Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and then acting according to the decisions of policy π in all later states; E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return harvested while the system runs, R(s_t) denotes the reward at time t, and γ is the discount factor;
the purpose of Q-learning is exactly to find the optimal policy π* that maximizes this expected return for every state.
6. The unmanned surface vehicle path planning method based on a Q-learning neural network according to claim 5, characterized in that Q*(s, a) = Q^{π*}(s, a) is defined, where Q^{π*}(s, a) refers to the expected return harvested by executing action a in state s and thereafter always moving according to the decisions of the optimal policy; assuming Q*(s, a) is known, π* can easily be generated from Q*(s, a): it suffices that for each s the action maximizing Q*(s, ·) is selected; in this way the problem of finding the optimal policy is converted into finding Q*(s, a), since:
Q*(s, a) = R(s_0) + γ·E(R(s_1) + γR(s_2) + ... | s_1, a_1)
where Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and then acting according to the decisions of policy π in all later states, E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return harvested while the system runs, R(s_t) denotes the reward at time t, and γ is the discount factor;
and a_1 is determined by π*, i.e. a_1 = π*(s_1), where a_1 denotes the action taken under the optimal policy and π*(s_1) denotes the optimal policy;
then, according to the Bellman equation, the Q function is found iteratively.
7. The unmanned surface vehicle path planning method based on a Q-learning neural network according to claim 6, characterized in that the Bellman equation defines Q*(s, a) in recursive form, so that the Q function can be found iteratively; in the Bellman equation, Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and then acting according to the decisions of policy π in all later states, R(s_0) represents the reward function, and γ is the discount factor.
8. The unmanned surface vehicle path planning method based on a Q-learning neural network according to claim 6 or 7, characterized in that the reward function is divided into 3 kinds: the first rewards the USV according to its distance from the target position; the second rewards the USV for reaching the target position; the third punishes the USV for colliding with an obstacle.
9. The unmanned surface vehicle path planning method based on a Q-learning neural network according to claim 1, characterized in that, in step f), the number of repeated rounds n ranges from 3000 to 5000.
CN201811612058.6A 2018-12-27 2018-12-27 Unmanned boat paths planning method based on Q learning neural network Pending CN109726866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811612058.6A CN109726866A (en) 2018-12-27 2018-12-27 Unmanned boat paths planning method based on Q learning neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811612058.6A CN109726866A (en) 2018-12-27 2018-12-27 Unmanned boat paths planning method based on Q learning neural network

Publications (1)

Publication Number Publication Date
CN109726866A true CN109726866A (en) 2019-05-07

Family

ID=66297307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811612058.6A Pending CN109726866A (en) 2018-12-27 2018-12-27 Unmanned boat paths planning method based on Q learning neural network

Country Status (1)

Country Link
CN (1) CN109726866A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180094286A (en) * 2017-02-15 2018-08-23 국방과학연구소 Path Planning System of Unmanned Surface Vehicle for Autonomous Tracking of Underwater Acoustic Target
CN108803321A (en) * 2018-05-30 2018-11-13 清华大学 Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory
CN108803313A (en) * 2018-06-08 2018-11-13 哈尔滨工程大学 A kind of paths planning method based on ocean current prediction model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
周志华 (Zhou Zhihua): "机器学习" [Machine Learning], Tsinghua University Press, 31 January 2016 *
徐莉 (Xu Li): "Q-learning研究及其在AUV局部路径规划中的应用" [Research on Q-learning and its application in AUV local path planning], China Master's Theses Full-text Database *

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113966596A (en) * 2019-06-11 2022-01-21 瑞典爱立信有限公司 Method and apparatus for data traffic routing
CN113966596B (en) * 2019-06-11 2024-03-01 瑞典爱立信有限公司 Method and apparatus for data traffic routing
CN110321666A (en) * 2019-08-09 2019-10-11 重庆理工大学 Multi-robots Path Planning Method based on priori knowledge Yu DQN algorithm
CN110321666B (en) * 2019-08-09 2022-05-03 重庆理工大学 Multi-robot path planning method based on priori knowledge and DQN algorithm
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm
CN112567399A (en) * 2019-09-23 2021-03-26 阿里巴巴集团控股有限公司 System and method for route optimization
CN110716575A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning
CN110716574B (en) * 2019-09-29 2023-05-02 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep Q network
CN110716574A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep Q network
CN112799386B (en) * 2019-10-25 2021-11-23 中国科学院沈阳自动化研究所 Robot path planning method based on artificial potential field and reinforcement learning
CN112799386A (en) * 2019-10-25 2021-05-14 中国科学院沈阳自动化研究所 Robot path planning method based on artificial potential field and reinforcement learning
CN110618686A (en) * 2019-10-30 2019-12-27 江苏科技大学 Unmanned ship track control method based on explicit model predictive control
CN110955239A (en) * 2019-11-12 2020-04-03 中国地质大学(武汉) Unmanned ship multi-target trajectory planning method and system based on inverse reinforcement learning
CN110836518A (en) * 2019-11-12 2020-02-25 上海建科建筑节能技术股份有限公司 System basic knowledge based global optimization control method for self-learning air conditioning system
CN110865539A (en) * 2019-11-18 2020-03-06 华南理工大学 Unmanned ship tracking error constraint control method under random interference
CN111123963A (en) * 2019-12-19 2020-05-08 南京航空航天大学 Unknown environment autonomous navigation system and method based on reinforcement learning
CN111061277A (en) * 2019-12-31 2020-04-24 歌尔股份有限公司 Unmanned vehicle global path planning method and device
US11747155B2 (en) 2019-12-31 2023-09-05 Goertek Inc. Global path planning method and device for an unmanned vehicle
CN111176122A (en) * 2020-02-11 2020-05-19 哈尔滨工程大学 Underwater robot parameter self-adaptive backstepping control method based on double BP neural network Q learning technology
CN111308890B (en) * 2020-02-27 2022-08-26 大连海事大学 Unmanned ship data-driven reinforcement learning control method with designated performance
CN111308890A (en) * 2020-02-27 2020-06-19 大连海事大学 Unmanned ship data-driven reinforcement learning control method with designated performance
CN111273670A (en) * 2020-03-03 2020-06-12 大连海事大学 Unmanned ship collision avoidance method for fast moving barrier
CN111273670B (en) * 2020-03-03 2024-03-15 大连海事大学 Unmanned ship collision prevention method for fast moving obstacle
CN111415048B (en) * 2020-04-10 2024-04-19 大连海事大学 Vehicle path planning method based on reinforcement learning
CN111415048A (en) * 2020-04-10 2020-07-14 大连海事大学 Vehicle path planning method based on reinforcement learning
CN111694365A (en) * 2020-07-01 2020-09-22 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning
CN111694365B (en) * 2020-07-01 2021-04-20 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning
CN112327821A (en) * 2020-07-08 2021-02-05 东莞市均谊视觉科技有限公司 Intelligent cleaning robot path planning method based on deep reinforcement learning
CN111829527A (en) * 2020-07-23 2020-10-27 中国石油大学(华东) Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements
CN111829527B (en) * 2020-07-23 2021-07-20 中国石油大学(华东) Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements
CN112202848A (en) * 2020-09-15 2021-01-08 中国科学院计算技术研究所 Unmanned system network self-adaptive routing method and system based on deep reinforcement learning
CN112202848B (en) * 2020-09-15 2021-11-30 中国科学院计算技术研究所 Unmanned system network self-adaptive routing method and system based on deep reinforcement learning
CN112188600B (en) * 2020-09-22 2023-05-30 南京信息工程大学滨江学院 Method for optimizing heterogeneous network resources by reinforcement learning
CN112188600A (en) * 2020-09-22 2021-01-05 南京信息工程大学滨江学院 Method for optimizing heterogeneous network resources by using reinforcement learning
CN112215290B (en) * 2020-10-16 2024-04-09 苏州大学 Fisher score-based Q learning auxiliary data analysis method and Fisher score-based Q learning auxiliary data analysis system
CN112215290A (en) * 2020-10-16 2021-01-12 苏州大学 Q learning auxiliary data analysis method and system based on Fisher score
CN112163720A (en) * 2020-10-22 2021-01-01 哈尔滨工程大学 Multi-agent unmanned electric vehicle battery replacement scheduling method based on Internet of vehicles
CN112543038A (en) * 2020-11-02 2021-03-23 杭州电子科技大学 Intelligent anti-interference decision method of frequency hopping system based on HAQL-PSO
CN112543038B (en) * 2020-11-02 2022-03-11 杭州电子科技大学 Intelligent anti-interference decision method of frequency hopping system based on HAQL-PSO
CN112698646A (en) * 2020-12-05 2021-04-23 西北工业大学 Aircraft path planning method based on reinforcement learning
CN112698646B (en) * 2020-12-05 2022-09-13 西北工业大学 Aircraft path planning method based on reinforcement learning
CN112600759A (en) * 2020-12-10 2021-04-02 东北大学 Multipath traffic scheduling method and system based on deep reinforcement learning under Overlay network
CN112600759B (en) * 2020-12-10 2022-06-03 东北大学 Multipath traffic scheduling method and system based on deep reinforcement learning under Overlay network
CN112880663A (en) * 2021-01-19 2021-06-01 西北工业大学 AUV reinforcement learning path planning method considering accumulated errors
CN112947431A (en) * 2021-02-03 2021-06-11 海之韵(苏州)科技有限公司 Unmanned ship path tracking method based on reinforcement learning
CN112525213B (en) * 2021-02-10 2021-05-14 腾讯科技(深圳)有限公司 ETA prediction method, model training method, device and storage medium
CN112525213A (en) * 2021-02-10 2021-03-19 腾讯科技(深圳)有限公司 ETA prediction method, model training method, device and storage medium
CN113721604B (en) * 2021-08-04 2024-04-12 哈尔滨工业大学 Intelligent track control method of unmanned surface vehicle considering sea wave encountering angle
CN113721604A (en) * 2021-08-04 2021-11-30 哈尔滨工业大学 Intelligent track control method of unmanned surface vehicle considering sea wave encountering angle
CN113415195A (en) * 2021-08-11 2021-09-21 国网江苏省电力有限公司苏州供电分公司 Alignment guiding and visualization method for wireless charging system of electric vehicle
CN113720346A (en) * 2021-09-02 2021-11-30 重庆邮电大学 Vehicle path planning method and system based on potential energy field and hidden Markov model
CN113720346B (en) * 2021-09-02 2023-07-04 重庆邮电大学 Vehicle path planning method and system based on potential energy field and hidden Markov model
CN113848974B (en) * 2021-09-28 2023-08-15 西安因诺航空科技有限公司 Aircraft trajectory planning method and system based on deep reinforcement learning
CN113848974A (en) * 2021-09-28 2021-12-28 西北工业大学 Aircraft trajectory planning method and system based on deep reinforcement learning
CN114518758B (en) * 2022-02-08 2023-12-12 中建八局第三建设有限公司 Indoor measurement robot multi-target point moving path planning method based on Q learning
CN114518758A (en) * 2022-02-08 2022-05-20 中建八局第三建设有限公司 Q learning-based indoor measuring robot multi-target-point moving path planning method
CN116596174B (en) * 2023-04-28 2023-10-20 北京大数据先进技术研究院 Path planning method, device, equipment and storage medium for integrating cost and benefit
CN116596174A (en) * 2023-04-28 2023-08-15 北京大数据先进技术研究院 Path planning method, device, equipment and storage medium for integrating cost and benefit
CN116523165B (en) * 2023-06-30 2023-12-01 吉林大学 Collaborative optimization method for AMR path planning and production scheduling of flexible job shop
CN116523165A (en) * 2023-06-30 2023-08-01 吉林大学 Collaborative optimization method for AMR path planning and production scheduling of flexible job shop

Similar Documents

Publication Publication Date Title
CN109726866A (en) Unmanned boat paths planning method based on Q learning neural network
Li et al. Path planning for UAV ground target tracking via deep reinforcement learning
CN110083165B (en) Path planning method of robot in complex narrow environment
CN108820157B (en) Intelligent ship collision avoidance method based on reinforcement learning
Xia et al. Neural inverse reinforcement learning in autonomous navigation
CN111399506A (en) Global-local hybrid unmanned ship path planning method based on dynamic constraints
Xia et al. Cooperative task assignment and track planning for multi-UAV attack mobile targets
CN108803321A (en) Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
CN109597425B (en) Unmanned aerial vehicle navigation and obstacle avoidance method based on reinforcement learning
CN112034887A (en) Optimal path training method for unmanned aerial vehicle to avoid cylindrical barrier to reach target point
CN108319293A (en) A kind of UUV Realtime collision free planing methods based on LSTM networks
CN110926477A (en) Unmanned aerial vehicle route planning and obstacle avoidance method
Wang et al. Cooperative collision avoidance for unmanned surface vehicles based on improved genetic algorithm
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
CN111240345A (en) Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN112947594B (en) Unmanned aerial vehicle-oriented track planning method
CN112824998A (en) Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process
CN115143970B (en) Obstacle avoidance method and system of underwater vehicle based on threat degree evaluation
Bai et al. USV path planning algorithm based on plant growth
Yao et al. Multi-USV cooperative path planning by window update based self-organizing map and spectral clustering
Sood et al. Meta-heuristic techniques for path planning: recent trends and advancements
Ramezani et al. UAV path planning employing MPC-reinforcement learning method considering collision avoidance
Kong et al. An FM*-based comprehensive path planning system for robotic floating garbage cleaning
Li et al. Deep reinforcement learning based adaptive real-time path planning for UAV
CN114609925B (en) Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190507)