CN109726866A - Unmanned surface vehicle path planning method based on Q-learning neural network - Google Patents
Unmanned surface vehicle path planning method based on Q-learning neural network
- Publication number: CN109726866A
- Application number: CN201811612058.6A
- Authority: CN (China)
- Prior art keywords: state, deviation, function, USV, neural network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses an unmanned surface vehicle (USV) path planning method based on a Q-learning neural network, comprising the following steps: a) initialize the memory block D; b) initialize the Q network and the initial values of state and action; c) set a training target at random; d) randomly select an action a_t, obtain the current reward r_t and the next-moment state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the memory block D; e) randomly sample a batch of data (s_t, a_t, r_t, s_{t+1}) from the memory block D for training; the state when the USV reaches the target position, or when the maximum time per round is exceeded, is regarded as the final state; f) if s_{t+1} is not a final state, return to step d; if s_{t+1} is a final state, update the Q network parameters and return to step d; the algorithm ends after repeating for n rounds; g) set a target and perform path planning with the trained Q network until the USV reaches the target position. The decision time of the invention is short, the route is better optimized, and the requirement of real-time online planning can be satisfied.
Description
Technical field
The invention belongs to the field of intelligent control of unmanned surface vehicles, and relates specifically to an unmanned surface vehicle path planning method based on a Q-learning neural network.
Background technique
Water quality monitoring is the main method of water quality assessment and water pollution prevention. With the increase of industrial wastewater, water pollution is getting worse, and the demand for dynamic water pollution monitoring is very urgent. However, traditional water quality monitoring methods involve many steps and are time-consuming, and the diversity and accuracy of the data obtained fall far short of decision-making needs. In response to the above problems, a variety of water quality monitoring methods have been proposed: Cao Lijie et al. proposed establishing a sensor network to obtain a more accurate water quality inversion model; Field et al. proposed performing inversion on satellite data with a water quality model to obtain a distribution map of water quality parameters in the monitored waters. However, the above methods cannot flexibly change the monitored waters, the engineering workload is large, and the steps are numerous. By contrast, a water quality monitoring unmanned surface vehicle is small and easy to carry, is unaffected by terrain in the monitoring field, and can continuously perform in-situ monitoring of multiple water quality parameters, making the monitoring results more diverse and accurate.
An unmanned surface vehicle (Unmanned Surface Vehicle, USV) is a water-surface motion platform that can navigate autonomously in unknown water environments and complete various tasks. Because of its wide range of applications, its research content covers automatic piloting, automatic obstacle avoidance, navigation planning, pattern recognition and other aspects. It can be used not only for minesweeping, reconnaissance and anti-submarine operations in the military field, but also for hydrometeorological detection, environmental monitoring, water rescue and other tasks in the civil field. However, because of the mobility of water, it can flow through various complex terrains that staff cannot survey, such as when water flows through a cave; or, because of changeable weather, if the waters are foggy for long periods, staff cannot see and cannot operate the USV accurately in real time. In such cases the autonomous navigation of the USV can be used to reach the target waters for detection, and the autonomous navigation function is realized through path planning technology.
USV path planning technology means that, in the operating waters, the USV searches for a collision-free path from a starting point to a target point according to certain performance indicators (such as shortest distance or shortest time). It is a core component of USV navigation technology and a benchmark of the USV's level of intelligence. Currently used planning methods mainly include the particle swarm algorithm, the A* algorithm, the visibility graph method, the artificial potential field method, the ant colony algorithm, etc., but these methods are mostly used under known-environment conditions.
The trajectory planning problem under a known environment has by now been solved fairly well, but a USV working in unknown waters cannot obtain the environmental information of the waters to be monitored before executing the task, so path planning methods based on known environment information cannot be used to plan the USV's navigation path. Secondly, because the monitored water environment is complex and the sensor information is abundant, the computational workload of the system is large, causing shortcomings such as poor real-time performance and oscillation in front of obstacles. Therefore, USV path planning urgently needs algorithms that are simple, have strong real-time performance, and can control the uncertainty in the system, and it is therefore necessary to introduce methods with autonomous learning ability; among these, path planning based on the Q-learning algorithm is suitable for path planning in unknown environments. In existing research, Guo Na et al., building on the traditional Q-learning algorithm, used simulated annealing for action selection, solving the balance between exploration and exploitation. Chen Zili et al. proposed using a genetic algorithm to establish a new Q-value table for static global path planning. Dong Peifang et al. added the artificial potential field method to the Q-learning algorithm, using the gravitational potential field as initial environment prior information and then searching the environment successively, accelerating the Q-value iteration.
The Chinese patent with publication number CN108106623A discloses an unmanned-vehicle path planning method based on a flow field, comprising the following steps: establishing a flow field calculation model according to the starting point and end point of the vehicle and the obstacles in the environment; establishing a vehicle kinematics model with the front wheel angle as the input quantity and the coordinates and heading angle as the state quantities; and, with the vehicle kinematics model as the rolling equation, solving the receding-horizon optimization problem of the flow field, using the flow field velocity vector distribution as the guidance information for path planning, to obtain the planned path. Here the optimized quantity is the front wheel angle; the optimization objectives are that the vehicle motion agrees with the flow field motion and that the vehicle does not collide with obstacles during motion; and the constraint condition is that the front wheel angle does not exceed the maximum steering wheel angle. This method can find a smooth, obstacle-avoiding path connecting the start and end points in complex terrain, and achieves good path smoothness and completeness under the premise of obstacle avoidance. However, this method needs to know the terrain of the environment and the positions of the obstacles, and cannot perform path planning for an unknown field.
Summary of the invention
Purpose of the invention: in order to overcome the deficiencies in the prior art, the invention proposes a Q-learning reinforcement-learning path planning algorithm based on a BP neural network, which fits the Q function of the Q-learning method with a neural network, enabling it to take a continuous system state as input, and which significantly improves the convergence rate of the network during training through experience replay and a target network. Experimental simulation verifies the feasibility of the improved planning method presented here.
Technical solution: to achieve the above object, an unmanned surface vehicle path planning method based on a Q-learning neural network of the invention comprises the following steps:
a) initialize the memory block D;
b) initialize the Q network and the initial values of state and action; the Q network involves the following elements: S, A, P_{s,a}, R, where S represents the set of system states the USV may occupy, A represents the set of actions the USV can take, P_{s,a} represents the system state transition probability, and R represents the reward function;
c) set a training target at random;
d) randomly select an action a_t, obtain the current reward r_t and the next-moment state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the memory block D;
e) randomly sample a batch of data (s_t, a_t, r_t, s_{t+1}) from the memory block D for training; the state when the USV reaches the target position, or when the maximum time per round is exceeded, is regarded as the final state;
f) if s_{t+1} is not a final state, return to step d; if s_{t+1} is a final state, update the Q network parameters and return to step d; the algorithm ends after repeating for n rounds;
g) set a target and perform path planning with the trained Q network until the USV reaches the target position.
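Steps a)–g) can be sketched as a minimal, self-contained training loop. This is an illustrative sketch under stated assumptions, not the patent's implementation: a tabular Q array stands in for the Q network, and the 5×5 grid world, goal position, rewards and hyperparameters (alpha, gamma, epsilon, batch size, number of rounds) are all made up for the example.

```python
import random
from collections import deque

import numpy as np

random.seed(0)

GRID = 5                                    # toy 5x5 world (assumption)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
GOAL = (4, 4)
MAX_STEPS = 50

def step(s, a):
    """One environment transition: (next_state, reward, done)."""
    nxt = (min(max(s[0] + a[0], 0), GRID - 1),
           min(max(s[1] + a[1], 0), GRID - 1))
    if nxt == GOAL:
        return nxt, 10.0, True              # reaching the target position is rewarded
    return nxt, -0.1, False                 # small step cost stands in for distance shaping

D = deque(maxlen=10_000)                    # a) initialise the memory block D
Q = np.zeros((GRID, GRID, len(ACTIONS)))    # b) tabular stand-in for the Q network
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(400):                  # the algorithm ends after n rounds (f)
    s = (0, 0)
    for t in range(MAX_STEPS):              # exceeding the per-round time limit is terminal (e)
        if random.random() < epsilon:       # d) select an action (epsilon-greedy here)
            a = random.randrange(len(ACTIONS))
        else:
            a = int(np.argmax(Q[s]))
        s1, r, done = step(s, ACTIONS[a])
        D.append((s, a, r, s1, done))       # d) store (s_t, a_t, r_t, s_{t+1}) in D
        for bs, ba, br, bs1, bdone in random.sample(D, min(32, len(D))):
            # e)-f) train on a randomly sampled batch: update the (tabular) Q
            target = br if bdone else br + gamma * np.max(Q[bs1])
            Q[bs][ba] += alpha * (target - Q[bs][ba])
        s = s1
        if done:
            break

# g) plan a path with the trained Q: greedy rollout from the start
s, path = (0, 0), [(0, 0)]
while s != GOAL and len(path) <= MAX_STEPS:
    s, _, _ = step(s, ACTIONS[int(np.argmax(Q[s]))])
    path.append(s)
```

In the patent's method, the tabular update in the inner loop would be replaced by a gradient step on the BP network's parameters.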
Preferably, in step a) the memory block D is an experience replay memory block, used to store training samples acquired during USV navigation; experience replay ensures that the samples used in each training step are not consecutive in time.
Preferably, the update rule of the Q network is:
Q(s_t, a_t) = Q(s_t, a_t) + α·δ'_t
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ'_t is the TD(0) deviation; the 0 in TD(0) indicates looking 1 step ahead from the current state:
δ'_t = R(s_t) + γ·V(s_{t+1}) − Q(s_t, a_t)
where γ is the discount factor, R(s) is the reward function, and V(s) is the value function.
Alternatively, the TD(0) deviation can also be defined as
δ_{t+1} = R(s_{t+1}) + γ·V(s_{t+2}) − V(s_{t+1})
where δ_{t+1} is the TD(0) deviation, R(s) is the reward function, and V(s) is the value function.
Another discount factor λ ∈ [0, 1] is used to discount the TD deviations of future steps:
Q(s_t, a_t) = Q(s_t, a_t) + α·(δ'_t + δ̄_t)
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ̄_t is the TD(λ) deviation; TD(λ) looks λ steps further ahead from the current state.
The TD(λ) deviation δ̄_t here is defined as
δ̄_t = Σ_{i=1}^{∞} (γλ)^i·δ_{t+i}
where δ'_t represents the deviation of past learning, δ̄_t the deviation of multi-step learning, γ is the discount factor, λ is a discount factor with λ ∈ [0, 1], and δ_{t+i} is the deviation learned now.
Preferably, η_t(s, a) is defined as the characteristic function: it returns 1 if (s, a) occurs at moment t, and 0 otherwise. For simplicity, ignoring the learning efficiency, an eligibility trace e_t(s, a) is defined for each (s, a):
e_t(s, a) = Σ_{k=1}^{t} (γλ)^{t−k}·η_k(s, a)
Then the online update at moment t is
Q(s, a) = Q(s, a) + α·[δ'_t·η_t(s, a) + δ_t·e_t(s, a)]
where the function Q(s, a) is the value of executing action a in state s, α is the learning rate, η_t(s, a) is the characteristic function, e_t(s, a) is the eligibility trace, δ'_t represents the deviation of past learning, and δ_t is the deviation learned now, obtained from the accumulated return R(s) and the deviation from the current estimate V(s), the update being the deviation multiplied by the learning rate.
Preferably, reinforcement learning seeks to maximize the expected overall return harvested while the system runs, i.e. to maximize E(R(s_0) + γ·R(s_1) + γ²·R(s_2) + …); for this an optimal policy π must be found such that, when the USV makes decisions and acts according to π, the total return obtained is maximal.
The objective function of reinforcement learning is one of the following:
V^π(s) = E(R(s_0) + γ·R(s_1) + γ²·R(s_2) + … | s_0 = s, π)
Q^π(s, a) = E(R(s_0) + γ·R(s_1) + γ²·R(s_2) + … | s_0 = s, a_0 = a, π)
where V^π(s) represents the expected return obtainable by making decisions according to policy π from the current initial state s, and Q^π(s, a) represents the expected return obtainable by taking action a in the current state s and then making decisions according to policy π in all subsequent states; E(R(s_0) + γ·R(s_1) + γ²·R(s_2) + …) is the expected overall return harvested while the system runs, R(s_t) denotes the reward function at moment t, and γ is the discount factor.
The purpose of Q-learning is precisely to find the optimal policy π* such that V^{π*}(s) ≥ V^π(s) for every state s and every policy π.
Preferably, define Q*(s, a) = Q^{π*}(s, a): the expected return harvested by executing action a in state s and then making all subsequent decisions according to the optimal policy. Assuming Q*(s, a) is known, π* can easily be generated from Q*(s, a): simply let π*(s) = argmax_a Q*(s, a) hold for each s. In this way, the problem of finding the optimal policy translates into finding Q*(s, a), since:
Q*(s, a) = R(s_0) + γ·E(R(s_1) + γ·R(s_2) + … | s_1, a_1)
where Q^π(s, a) represents the expected return obtainable by taking action a in the current state s and then making decisions according to policy π in all subsequent states, E(R(s_0) + γ·R(s_1) + γ²·R(s_2) + …) is the expected overall return harvested while the system runs, R(s_t) denotes the reward function at moment t, and γ is the discount factor;
and a_1 is determined by π*:
a_1 = π*(s_1)
where a_1 denotes the action taken under the optimal policy and π*(s_1) denotes the optimal policy's choice in state s_1. Then, according to the Bellman equation, the Q function can be found iteratively.
Preferably, the Bellman equation defines Q*(s, a) in recursive form, so that the Q function can be found iteratively. The Bellman equation is:
Q*(s, a) = R(s) + γ·Σ_{s'} P(s' | s, a)·max_{a'} Q*(s', a')
where Q*(s, a) represents the expected return obtainable by taking action a in the current state s and then making all subsequent decisions according to the optimal policy, R(s) is the reward function, P(s' | s, a) is the state transition probability, and γ is the discount factor.
Preferably, the reward function is divided into 3 kinds: the first rewards the USV according to its distance from the target position; the second rewards the USV for reaching the target position; the third punishes the USV for colliding with an obstacle.
Preferably, the number of repeated rounds n in step f ranges over 3000-5000.
Beneficial effects:
Compared with the prior art, the invention has the following advantages:
1. The reinforcement learning method of the invention solves the autonomous path planning problem of a water quality monitoring USV performing water quality monitoring in unknown waters; the Q function is fitted by a BP neural network, so that the trained policy can make decisions according to the real-time information of the obstacles in the current environment.
2. The method of the invention enables the water quality monitoring USV to plan feasible paths in an unknown environment according to different states, with a short decision time and a better-optimized route, satisfying the requirement of real-time online planning, thereby overcoming the large computational load and slow convergence of traditional Q-learning path planning methods and allowing problem waters to be monitored at the first opportunity.
3. The invention fits the Q function of the Q-learning method with a neural network, enabling it to take a continuous system state as input, and significantly improves the convergence rate of the network during training through experience replay and a target network.
4. The invention improves traditional Q-learning, realizing Q-value iteration with a BP neural network: the output of the network corresponds to the Q value of each action, and the input of the network corresponds to the state describing the environment.
5. Through the design of the reward function, the invention returns different reward values for different situations, making the USV more efficient in learning and exploration.
Brief description of the drawings
Fig. 1 is: the overall flow chart of the invention;
Fig. 2 is: the simulation diagram of the complex-waters terrain;
Fig. 3 is: the absolute-error plot of the distance between the actually reached point and the target point for the complex-waters terrain;
Fig. 4 is: the simulation diagram of the simple concentric-circle maze;
Fig. 5 is: the absolute-error plot of the distance between the actually reached point and the target point for the simple concentric-circle maze;
Fig. 6 is: the simulation diagram of the complex maze;
Fig. 7 is: the absolute-error plot of the distance between the actually reached point and the target point for the complex maze;
Fig. 8 is: the simulation result diagram of the East Lake background;
Fig. 9 is: the absolute-error plot of the distance between the actually reached point and the target point for the East Lake background;
Fig. 10 is: the iteration-count plot of the East Lake background.
Detailed description of embodiments
The invention will be further explained below with reference to the accompanying drawings and examples.
Embodiment one:
This embodiment is an unmanned surface vehicle path planning method based on a Q-learning neural network, comprising the following steps:
a) initialize the memory block D;
b) initialize the Q network and the initial values of state and action; the Q network involves the following elements: S, A, P_{s,a}, R, where S represents the set of system states the USV may occupy, A represents the set of actions the USV can take, P_{s,a} represents the system state transition probability, and R represents the reward function;
c) set a training target at random;
d) randomly select an action a_t, obtain the current reward r_t and the next-moment state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the memory block D;
e) randomly sample a batch of data (s_t, a_t, r_t, s_{t+1}) from the memory block D for training; the state when the USV reaches the target position, or when the maximum time per round is exceeded, is regarded as the final state;
f) if s_{t+1} is not a final state, return to step d; if s_{t+1} is a final state, update the Q network parameters and return to step d; the algorithm ends after repeating for n rounds;
g) set a target and perform path planning with the trained Q network until the USV reaches the target position.
Here D is the experience replay memory block, used to store training samples acquired during USV navigation. Experience replay ensures that the samples used in each training step are not consecutive in time, minimizing the correlation between samples and enhancing the stability and accuracy of training.
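The experience replay memory block D described above can be sketched as a small bounded buffer. The class name and interface below are assumptions made for illustration; only the capacity of 40000 used later in the experiments comes from the document.

```python
import random
from collections import deque

class ReplayMemory:
    """Experience replay memory block D: stores (s_t, a_t, r_t, s_{t+1})
    transitions gathered while the USV navigates, and serves uniformly
    random minibatches so that the samples used in one training step are
    not consecutive in time."""

    def __init__(self, capacity=40000):       # 40000 is the size used in the experiments
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform random sampling breaks the temporal correlation between samples
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

Bounding the buffer with `deque(maxlen=...)` means old experience is discarded automatically once the capacity is reached.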
Embodiment two:
This embodiment, based on embodiment one, describes the traditional Q-learning algorithm. Q-learning describes the problem on the basis of a Markov Decision Process. A Markov decision process comprises 4 elements: S, A, P_{s,a}, R. Here S represents the set of system states the USV may occupy, i.e. the current state of the USV and of the environment, such as the size and position of obstacles; A represents the set of actions the USV can take, i.e. the USV's direction of rotation; P_{s,a} represents the system model, i.e. the state transition probability: P(s' | s, a) describes the probability that, at the current state s, after executing action a, the system reaches state s'; R represents the reward function, determined by the current state and the action taken. Q-learning can be regarded as incremental planning that finds a policy maximizing the overall return. The idea of Q-learning is not to model environmental factors but to directly optimize a Q function that can be computed iteratively. The function Q(s_t, a_t) is defined as the accumulated discounted reinforcement value obtained by executing action a_t in state s_t and following the optimal action sequence thereafter, that is:
Q(s_t, a_t) = R(s_t) + γ·max_{a_{t+1}} Q(s_{t+1}, a_{t+1})    (1)
In the formula, s_t is the state of the USV at moment t, s_{t+1} the state at the next moment, a_t the action executed at moment t, γ the discount factor with 0 ≤ γ ≤ 1, and R(s_t) the reward function, whose value is positive or negative. In the initial stage of learning, the Q values may not correctly reflect the policy they define; the initial Q_0(s, a) is assumed given for all states and actions. Given the state set S of the environment, the USV's possible action set A offers many choices and the data volume is large, requiring a large amount of system memory for storage, and the table cannot generalize. To overcome the above drawbacks, traditional Q-learning is improved: Q-value iteration is realized with a BP neural network, the output of the network corresponding to the Q value of each action, the input of the network corresponding to the state describing the environment.
Improved Q-learning path planning algorithm
The Q(λ) algorithm is derived from the TD(λ) algorithm; through the idea of backtracking it lets data be passed on continuously, so that the action decision of a given state is influenced by its successor states. If a certain future decision π is a failed decision, the current decision will also bear the corresponding punishment, and this influence can be appended to the current decision; if a certain future decision π is a correct decision, the current decision will likewise be rewarded accordingly. This improvement increases the convergence rate of the algorithm and serves the practicality of learning. The update rule of the improved Q(λ) algorithm is
Q(s_t, a_t) = Q(s_t, a_t) + α·δ'_t    (2)
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ'_t is the TD(0) deviation; the 0 in TD(0) indicates looking 1 step ahead from the current state:
δ'_t = R(s_t) + γ·V(s_{t+1}) − Q(s_t, a_t)    (3)
where γ is the discount factor, R(s) is the reward function, and V(s) is the value function.
Alternatively, the TD(0) deviation can also be defined as
δ_{t+1} = R(s_{t+1}) + γ·V(s_{t+2}) − V(s_{t+1})    (4)
where δ_{t+1} is the TD(0) deviation, R(s) is the reward function, and V(s) is the value function; the 0 in TD(0) indicates looking 1 step ahead from the current state.
Another discount factor λ ∈ [0, 1] is also used to discount the TD deviations of future steps:
Q(s_t, a_t) = Q(s_t, a_t) + α·(δ'_t + δ̄_t)    (5)
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t and α is the learning rate.
A new parameter λ is introduced here; through this new parameter, the predictions of all step numbers can be considered comprehensively without increasing computational complexity; like the parameter γ before it, it is used to control weights. TD(λ) looks λ steps further ahead from the current state.
The TD(λ) deviation δ̄_t here is defined as
δ̄_t = Σ_{i=1}^{∞} (γλ)^i·δ_{t+i}    (6)
where δ'_t represents the deviation obtained from past learning, δ̄_t is the deviation of multi-step learning, γ is the discount factor, λ is a discount factor with λ ∈ [0, 1], and δ_{t+i} is the deviation learned now.
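The two deviations above can be written as two small helper functions. The one-step form follows equation (3) directly; the summation form of the multi-step deviation is an assumed reading of the TD(λ) deviation, whose formula this text leaves garbled, and the default parameter values are illustrative.

```python
def td0_deviation(r_t, v_next, q_sa, gamma=0.9):
    """Equation (3): delta'_t = R(s_t) + gamma * V(s_{t+1}) - Q(s_t, a_t)."""
    return r_t + gamma * v_next - q_sa

def td_lambda_deviation(future_deltas, gamma=0.9, lam=0.5):
    """Assumed multi-step deviation: discount each future one-step deviation
    delta_{t+i} by (gamma * lam)^i and sum, truncated to the deltas given."""
    return sum((gamma * lam) ** i * d
               for i, d in enumerate(future_deltas, start=1))
```

With λ = 0 the multi-step term vanishes and only the one-step TD(0) deviation remains, which matches the text's description of TD(0) as looking a single step ahead.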
Embodiment three:
This embodiment is based on embodiment two. As long as the future TD deviations are unknown, the update above cannot be carried out; but they can be computed gradually by using eligibility traces. η_t(s, a) is defined as the characteristic function: it returns 1 if (s, a) occurs at moment t, and 0 otherwise. For simplicity, ignoring the learning efficiency, an eligibility trace e_t(s, a) is defined for each (s, a):
e_t(s, a) = Σ_{k=1}^{t} (γλ)^{t−k}·η_k(s, a)    (7)
Then the online update at moment t is
Q(s, a) = Q(s, a) + α·[δ'_t·η_t(s, a) + δ_t·e_t(s, a)]    (8)
where the function Q(s, a) is the value of executing action a in state s, α is the learning rate, η_t(s, a) is the characteristic function, e_t(s, a) is the eligibility trace, δ'_t represents the deviation of past learning, and δ_t is the deviation learned now, obtained from the accumulated return R(s) and the deviation from the current estimate V(s), the update being the deviation multiplied by the learning rate.
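One online step of the update in equation (8) can be sketched as follows. The recursive trace decay (e ← γλe, then adding 1 for the pair visited at time t, i.e. η_t(s, a) = 1) is an assumed accumulating-trace form of equation (7); the argument names and default parameter values are illustrative.

```python
import numpy as np

def q_lambda_update(Q, e, s, a, delta_prime, delta_now,
                    alpha=0.1, gamma=0.9, lam=0.5):
    """One online Q(lambda) step per equation (8):
    Q(s,a) += alpha * (delta'_t * eta_t(s,a) + delta_t * e_t(s,a))."""
    e *= gamma * lam                 # decay every trace by (gamma * lam)
    e[s, a] += 1.0                   # eta_t(s, a) = 1 for the visited pair
    Q += alpha * delta_now * e       # spread the current deviation along the traces
    Q[s, a] += alpha * delta_prime   # the one-step term applies only to the visited pair
    return Q, e

# One illustrative step on a 2-state, 2-action table
Q = np.zeros((2, 2))
e = np.zeros((2, 2))
Q, e = q_lambda_update(Q, e, s=0, a=1, delta_prime=1.0, delta_now=0.5)
```

Because the trace array keeps a decaying record of every visited pair, one deviation computed now also corrects the values of pairs visited several steps earlier, which is exactly the backtracking effect the Q(λ) description above relies on.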
Embodiment four:
This embodiment is based on embodiment three. Reinforcement learning seeks to maximize the expected overall return harvested while the system runs; for this, an optimal policy π must be found such that, when the USV makes decisions and acts according to π, the total return obtained is maximal. In general, the objective function of reinforcement learning is one of the following:
V^π(s) = E(R(s_0) + γ·R(s_1) + γ²·R(s_2) + … | s_0 = s, π)
Q^π(s, a) = E(R(s_0) + γ·R(s_1) + γ²·R(s_2) + … | s_0 = s, a_0 = a, π)    (9)
where V^π(s) represents the expected return obtainable by making decisions according to policy π from the current initial state s, and Q^π(s, a) represents the expected return obtainable by taking action a in the current state s and then making decisions according to policy π in all subsequent states; E(R(s_0) + γ·R(s_1) + γ²·R(s_2) + …) is the expected overall return harvested while the system runs, R(s_t) denotes the reward function at moment t, and γ is the discount factor.
The purpose of Q-learning is precisely to find the optimal policy π* such that V^{π*}(s) ≥ V^π(s) for every state s and every policy π.
Embodiment five:
This embodiment is based on embodiment four. Define Q*(s, a) = Q^{π*}(s, a): the expected return harvested by executing action a in state s and then making all subsequent decisions according to the optimal policy. Assuming Q*(s, a) is known, π* can easily be generated from Q*(s, a): simply let π*(s) = argmax_a Q*(s, a) hold for each s. In this way, the problem of finding the optimal policy translates into finding Q*(s, a). This is because:
Q*(s, a) = R(s_0) + γ·E(R(s_1) + γ·R(s_2) + … | s_1, a_1)    (10)
where Q^π(s, a) represents the expected return obtainable by taking action a in the current state s and then making decisions according to policy π in all subsequent states, E(R(s_0) + γ·R(s_1) + γ²·R(s_2) + …) is the expected overall return harvested while the system runs, R(s_t) denotes the reward function at moment t, and γ is the discount factor;
and a_1 is determined by π*:
a_1 = π*(s_1)
where a_1 denotes the action taken under the optimal policy and π*(s_1) denotes the optimal policy's choice in state s_1. Then, according to the Bellman equation, the Q function can be found iteratively.
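The step from Q* to π* described above can be shown on a toy problem. The 2-state, 2-action MDP below (its transition probabilities P[s, a, s'] and rewards R[s]) is entirely made up for illustration; the point is the repeated Bellman backup followed by π*(s) = argmax_a Q*(s, a).

```python
import numpy as np

# Toy MDP: state 1 pays reward 1; action 1 tends to move toward / keep state 1.
P = np.array([[[0.9, 0.1],
               [0.1, 0.9]],
              [[0.8, 0.2],
               [0.2, 0.8]]])          # P[s, a, s']
R = np.array([0.0, 1.0])              # R[s]
gamma = 0.9

Q = np.zeros((2, 2))
for _ in range(200):
    # Bellman backup: Q*(s,a) = R(s) + gamma * sum_s' P(s'|s,a) * max_a' Q*(s',a')
    Q = R[:, None] + gamma * P @ Q.max(axis=1)

pi_star = Q.argmax(axis=1)            # pi*(s) = argmax_a Q*(s, a)
```

Since γ < 1, each backup is a contraction, so the iteration converges to the fixed point Q* regardless of the starting values; here the recovered policy chooses action 1 in both states, steering toward the rewarding state.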
Embodiment six:
This embodiment is based on embodiment five. The Bellman equation defines Q*(s, a) in recursive form, so that the Q function can be found iteratively. The Bellman equation is:
Q*(s, a) = R(s) + γ·Σ_{s'} P(s' | s, a)·max_{a'} Q*(s', a')
where Q*(s, a) represents the expected return obtainable by taking action a in the current state s and then making all subsequent decisions according to the optimal policy, R(s) is the reward function, P(s' | s, a) is the state transition probability, and γ is the discount factor.
In the traditional Q-learning algorithm, the Q function is saved and updated in the form of a table; but in USV obstacle-avoidance path planning, since obstacles may appear at any position in space, a Q function in table form has difficulty describing obstacles that appear in continuous space. Therefore, on the basis of Q-learning, deep Q-learning here fits the Q function with a BP neural network, with the input state s a continuous variable. In general, the learning process is difficult to converge when the Q function is approximated with a nonlinear function, so the methods of experience replay and a target network are used to improve the stability of learning.
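The target-network idea mentioned above can be sketched in a few lines. This is not the patent's BP network: a linear Q function stands in for the network, the transitions are dummy random data, and the dimensions, learning rate, and sync period are all assumptions. What the sketch shows is the mechanism: the TD target is computed from a frozen copy of the parameters, which is only resynchronized periodically.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, SYNC_EVERY = 4, 8, 50     # sizes and sync period are assumptions

W = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))  # online Q "network" (linear stand-in)
W_target = W.copy()                                     # frozen target network

def q_values(weights, s):
    return s @ weights                                  # Q(s, .) for all actions

for step_i in range(200):
    s = rng.normal(size=STATE_DIM)                      # dummy state
    a = int(np.argmax(q_values(W, s)))
    r, s_next = -0.1, rng.normal(size=STATE_DIM)        # dummy transition
    y = r + 0.9 * np.max(q_values(W_target, s_next))    # TD target from the *frozen* net
    td_error = q_values(W, s)[a] - y
    W[:, a] -= 0.01 * td_error * s                      # gradient step on 0.5*(Q - y)^2
    if (step_i + 1) % SYNC_EVERY == 0:
        W_target = W.copy()                             # periodic target-network sync
```

Because the target y does not move at every gradient step, the regression problem the online parameters chase stays fixed between syncs, which is what stabilizes training compared with bootstrapping from the constantly changing online network.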
Embodiment seven:
This embodiment is based on embodiment six. In reinforcement learning, the design of the reward function directly affects the quality of the learning effect. In general, the reward function corresponds to a person's description of the task, and through the design of the reward function, prior knowledge about solving the task can be incorporated into learning. In USV path planning, it is hoped that during navigation the USV safely avoids hitting obstacles while also reaching the target position as early as possible. Here the reward function is divided into 3 kinds: the first rewards the USV according to its distance from the target position; the second rewards the USV for reaching the target position; the third punishes the USV for colliding with an obstacle.
In magnitude, the second and third kinds of reward value are larger than the first kind. For the USV obstacle-avoidance task, the main goal is to avoid obstacles and get to the target position, not merely to shorten the distance between the USV and the target position. The reason the first kind is added is that, if the USV were rewarded and punished only for reaching the target position or hitting an obstacle, a large number of steps in the motion process would carry a reward of 0, which in most cases would keep the USV from improving its policy and make learning inefficient. Adding this reward is equivalent to adding a person's prior knowledge of the task, making the USV more efficient in learning and exploration.
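The three-part reward can be sketched as a single function. The concrete formula and numeric values are not reproduced in this text, so the magnitudes below are illustrative assumptions chosen only to respect the stated ordering: the terminal rewards (goal, collision) dominate the distance-shaping term.

```python
import math

def reward(pos, goal, collided, goal_radius=0.5):
    """Three-part reward (values are illustrative assumptions):
    kind 3 punishes collision, kind 2 rewards reaching the target, and
    kind 1 is a small distance-based shaping term, kept roughly an order
    of magnitude below the terminal rewards."""
    if collided:
        return -10.0                        # kind 3: collision with an obstacle
    if math.dist(pos, goal) < goal_radius:
        return 10.0                         # kind 2: target position reached
    return -math.dist(pos, goal) / 100.0    # kind 1: distance-based shaping
```

The shaping term gives the USV a nonzero learning signal on the many steps that neither reach the goal nor hit an obstacle, which is exactly the rationale given above for the first kind of reward.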
Embodiment eight:
To test the path planning algorithm designed here, simulation experiments were carried out in Matlab 2014a. In the experiments, the simulation environment is a 20×20 region, the discount factor γ is 0.9, the size of the memory block D is set to 40000, the number of cycles is 1000, the first layer of the neural network has 64 neurons, and the second layer has 32 neurons. In each round of training, whenever the USV hits an obstacle or reaches the target position, the round ends immediately and a reward is returned.
To verify the accuracy of the present method, it is tested with maze terrains; three different terrains are designed for algorithm comparison: the complex-waters terrain (shown in Fig. 2), the simple concentric-circle maze terrain (shown in Fig. 4), and the complex maze terrain (shown in Fig. 6). The improved algorithm and the traditional Q-learning algorithm are simulated on the above terrains. As can be seen from the path diagrams, the route of the improved algorithm, shown in blue, is shorter and more direct than the route simulated by the traditional Q-learning algorithm. As can be seen from the absolute-error plots of the distance between the actually reached point and the target point, the improved algorithm converges and stabilizes one third earlier than the traditional Q-learning algorithm.
Embodiment 9:
An experimental simulation is carried out against the actual environment of the Lin'an East Lake waters. As seen from Figure 8, the USV never collides with an obstacle during the simulation and its path is simple and fast. Figure 9 shows the standard-error curve and Figure 10 the learning curve. As the figures show, once the number of training iterations reaches 56 the curves level off, indicating that a safe and efficient overall route has essentially been planned; at this point the USV can avoid obstacles and reach the target position in most cases. It can therefore be concluded that the Q-learning algorithm based on a BP neural network learns and converges faster than the traditional Q-learning algorithm, and its path is more optimized.
The above is only a preferred embodiment of the present invention. It should be pointed out that those skilled in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (9)
1. An unmanned boat path planning method based on a Q-learning neural network, characterized by comprising the following steps:
a) initializing a memory block D;
b) initializing the Q network and the initial values of the state and the action; the Q network includes the following elements: S, A, Ps,a, R, where S denotes the set of system states the USV may occupy, A denotes the set of actions the USV can take, Ps,a denotes the system state-transition probability, and R denotes the reward function;
c) setting a training target at random;
d) randomly selecting an action at, obtaining the current reward rt and the next-moment state st+1, and storing (st,at,rt,st+1) in the memory block D;
e) randomly sampling a batch of data (st,at,rt,st+1) from the memory block D for training; the state when the USV reaches the target position, or when the maximum time per round is exceeded, is regarded as a terminal state;
f) if st+1 is not a terminal state, returning to step d); if st+1 is a terminal state, updating the Q network parameters and returning to step d); the algorithm ends after n rounds have been repeated;
g) setting a target and performing path planning with the trained Q network until the USV reaches the target position.
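Steps a)–g) can be sketched as a generic training loop; the environment interface (`env.reset`, `env.step`, `env.actions`) and the `q_update` callback are illustrative assumptions standing in for the USV simulator and the Q-network parameter update:

```python
import random
from collections import deque

def train(env, q_update, n_rounds=10, max_steps=50, batch_size=8, mem_size=40000):
    """Steps a)-g): fill replay memory D and train from random minibatches."""
    D = deque(maxlen=mem_size)                 # a) initialize memory block D
    for _ in range(n_rounds):                  # repeat for n rounds
        s = env.reset()                        # c) new round, (random) target
        for _ in range(max_steps):             # per-round time limit
            a = random.choice(env.actions)     # d) randomly select action a_t
            s_next, r, terminal = env.step(a)  #    observe r_t and s_{t+1}
            D.append((s, a, r, s_next))        #    store (s_t,a_t,r_t,s_{t+1})
            if len(D) >= batch_size:           # e) sample a minibatch from D
                batch = random.sample(D, batch_size)
                q_update(batch)                # f) update Q-network parameters
            if terminal:                       #    target reached / timed out
                break
            s = s_next
    return D
```

Step g) would then reuse the trained network greedily for planning instead of `random.choice`.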
2. The unmanned boat path planning method based on a Q-learning neural network according to claim 1, characterized in that in step a), the memory block D is an experience-replay memory block used to store training samples collected during USV navigation.
3. The unmanned boat path planning method based on a Q-learning neural network according to claim 1, characterized in that the update rule of the Q network is:
Q(st,at) = Q(st,at) + αδ′t
where the function Q(st,at) is the value of executing action at in state st, α is the learning rate, and δ′t is the TD(0) deviation; the 0 in TD(0) means looking 1 more step forward from the current state, as follows:
δ′t = R(st) + γV(st+1) − Q(st,at)
where γ is the discount factor, R(s) is the reward function, and V(s) is the value function; in addition, the TD(0) deviation can also be defined as
δt+1 = R(st+1) + γV(st+2) − V(st+1)
where δt+1 is the TD(0) deviation, R(s) is the reward function, and V(s) is the value function;
using another discount factor λ ∈ [0,1] to discount the TD deviations of future steps gives
Q(st,at) = Q(st,at) + αδtλ
where the function Q(st,at) is the value of executing action at in state st, α is the learning rate, and δtλ is the TD(λ) deviation; TD(λ) means looking λ more steps forward from the current state;
the TD(λ) deviation δtλ here is defined as
where δ′t denotes the deviation obtained from past learning, δtλ denotes the deviation of multi-step learning, γ and λ are discount factors with λ ∈ [0,1], and δt+i denotes the deviation learned now.
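The defining formula for δtλ is not reproduced in this text; under the common convention δtλ = Σi (γλ)^i δt+i (an assumption consistent with the variables glossed above), discounting the future-step deviations can be sketched as:

```python
def td_lambda_deviation(deltas, gamma=0.9, lam=0.8):
    """Combine one-step TD deviations delta_{t+i}, i = 0, 1, ..., into a
    single TD(lambda) deviation, discounting each future step by gamma*lam."""
    return sum(((gamma * lam) ** i) * d for i, d in enumerate(deltas))
```

With lam = 0 this reduces to the one-step TD(0) deviation δ′t, matching the claim's description of TD(0) as looking only one step forward.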
4. The unmanned boat path planning method based on a Q-learning neural network according to claim 1, characterized in that ηt(s,a) is defined as a characteristic function: it returns 1 if (s,a) occurs at moment t, and 0 otherwise; for simplicity, ignoring the learning rate, an eligibility trace et(s,a) is defined for each (s,a),
so the online update at moment t is
Q(s,a) = Q(s,a) + α[δ′tηt(s,a) + δtet(s,a)]
where the function Q(s,a) is the value of executing action a in state s, α is the learning rate, ηt(s,a) is the characteristic function, et(s,a) is the eligibility trace, δ′t is the deviation from past learning, and δt is the deviation learned now.
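A tabular sketch of the online update in this claim, with ηt as the characteristic function of the visited pair and et as an accumulating eligibility trace; the decay rule e ← γλ·e with +1 at the visited pair follows the standard Q(λ) formulation and is an assumption here, since the trace definition itself is not reproduced in this text:

```python
def q_lambda_step(Q, e, s, a, delta_past, delta_now,
                  alpha=0.1, gamma=0.9, lam=0.8):
    """One online update: Q(s,a) += alpha*[delta'_t*eta_t(s,a) + delta_t*e_t(s,a)].

    Q and e are dicts keyed by (state, action) pairs."""
    for key in list(e):                       # decay all existing traces
        e[key] *= gamma * lam
    e[(s, a)] = e.get((s, a), 0.0) + 1.0      # visited pair accumulates
    for key in e:
        eta = 1.0 if key == (s, a) else 0.0   # characteristic function eta_t
        Q[key] = Q.get(key, 0.0) + alpha * (delta_past * eta + delta_now * e[key])
    return Q, e
```

Pairs visited recently keep a non-zero trace, so a single deviation updates the whole recent path, not only the current (s,a).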
5. The unmanned boat path planning method based on a Q-learning neural network according to claim 4, characterized in that reinforcement learning seeks to maximize the expected total return harvested while the system runs; to this end an optimal policy π must be found, such that when the USV makes decisions and moves according to π, the total return obtained is maximal;
the objective function of reinforcement learning is one of:
Vπ(s) = E(R(s0) + γR(s1) + γ²R(s2) + … | s0 = s, π)
Qπ(s,a) = E(R(s0) + γR(s1) + γ²R(s2) + … | s0 = s, a0 = a, π)
where Vπ(s) denotes the expected return obtainable from the current initial state s when moving according to the decisions of policy π; Qπ(s,a) denotes the expected return obtainable by taking action a in the current state s and then, in all subsequent states, moving according to the decisions of policy π; E(R(s0) + γR(s1) + γ²R(s2) + …) is the expected total return harvested while the system runs, R(st) denotes the reward function at moment t, and γ is the discount factor;
the purpose of Q-learning is precisely to find the optimal policy π*, such that
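The inner discounted sum R(s0) + γR(s1) + γ²R(s2) + … in these objective functions can be computed for a single sampled trajectory as:

```python
def discounted_return(rewards, gamma=0.9):
    """R(s0) + gamma*R(s1) + gamma^2*R(s2) + ..., accumulated Horner-style
    from the tail so no explicit powers of gamma are needed."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

The objective functions above are then expectations of this quantity over the trajectories induced by policy π.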
6. The unmanned boat path planning method based on a Q-learning neural network according to claim 5, characterized in that Qπ(s,a) is defined to refer to the expected return that can be harvested by executing action a in state s and then, in all subsequent states, moving according to the decisions made by the optimal policy; assuming Q*(s,a) is known, π* can easily be generated from Q*(s,a), as long as the corresponding relation holds for each s; in this way, the problem of finding the optimal policy is transformed into finding Q*(s,a), since:
Q*(s,a) = R(s0) + γE(R(s1) + γR(s2) + … | s1, a1)
where Qπ(s,a) denotes the expected return obtainable by taking action a in the current state s and then moving according to the decisions of policy π in all subsequent states, E(R(s0) + γR(s1) + γ²R(s2) + …) is the expected total return harvested while the system runs, R(st) denotes the reward function at moment t, and γ is the discount factor;
and a1 is determined by π*, then:
a1 denotes the action taken under the optimal policy, and π*(s1) denotes the optimal policy applied to state s1;
then, according to the Bellman equation, the Q function is iterated and found out.
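Iterating the Bellman equation to its fixed point can be sketched with value iteration on Q, using the standard optimality recursion Q*(s,a) = R(s,a) + γ·max over the next state's actions (consistent with the Q* expansion above; a reward that may depend on both s and a is assumed for generality). The two-state deterministic MDP in the usage below is purely illustrative:

```python
def q_iteration(states, actions, step, reward, gamma=0.9, sweeps=100):
    """Sweep Q(s,a) = R(s,a) + gamma * max_b Q(step(s,a), b) to a fixed point.

    step(s, a) returns the deterministic next state; reward(s, a) the reward."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        Q = {(s, a): reward(s, a) + gamma * max(Q[(step(s, a), b)] for b in actions)
             for s in states for a in actions}
    return Q
```

For example, with states {0, 1}, actions {"stay", "go"}, a reward of 1 for leaving state 0, and state 1 absorbing, the iteration converges to Q(0,"go") = 1 and Q(0,"stay") = γ.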
7. The unmanned boat path planning method based on a Q-learning neural network according to claim 6, characterized in that the Bellman equation defines Q*(s,a) in recursive form, so that the Q function can be iterated and found out; the Bellman equation is:
where Qπ(s,a) denotes the expected return obtainable by taking action a in the current state s and then moving according to the decisions of policy π in all subsequent states, R(s0) denotes the reward function, ηt(s,a) is the characteristic function, et(s,a) denotes the eligibility trace, δ′t is the deviation obtained from past learning, and δt is the deviation learned now; δt is obtained by accumulating the return R(s) with the deviation δ′t of the current estimate V(s), the update being performed by multiplying the deviation by the learning rate.
8. The unmanned boat path planning method based on a Q-learning neural network according to claim 6 or 7, characterized in that the reward function is divided into 3 kinds: the first rewards the USV according to its distance from the target position; the second rewards the USV for reaching the target position; the third penalizes the USV for colliding with an obstacle; specifically:
9. The unmanned boat path planning method based on a Q-learning neural network according to claim 1, characterized in that in step f), the value range of the number of repeated rounds n is 3000-5000.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811612058.6A CN109726866A (en) | 2018-12-27 | 2018-12-27 | Unmanned boat paths planning method based on Q learning neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109726866A true CN109726866A (en) | 2019-05-07 |
Family
ID=66297307
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811612058.6A Pending CN109726866A (en) | 2018-12-27 | 2018-12-27 | Unmanned boat paths planning method based on Q learning neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109726866A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180094286A (en) * | 2017-02-15 | 2018-08-23 | 국방과학연구소 | Path Planning System of Unmanned Surface Vehicle for Autonomous Tracking of Underwater Acoustic Target |
CN108762281A (en) * | 2018-06-08 | 2018-11-06 | 哈尔滨工程大学 | It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory |
CN108803321A (en) * | 2018-05-30 | 2018-11-13 | 清华大学 | Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study |
CN108803313A (en) * | 2018-06-08 | 2018-11-13 | 哈尔滨工程大学 | A kind of paths planning method based on ocean current prediction model |
Non-Patent Citations (2)
Title |
---|
周志华 (Zhou Zhihua): 《机器学习》 (Machine Learning), 31 January 2016, Tsinghua University Press *
徐莉 (Xu Li): "Q-learning研究及其在AUV局部路径规划中的应用" (Research on Q-learning and its application to AUV local path planning), 《中国优秀博硕士学位论文全文数据库(硕士)》 (China Masters' Theses Full-text Database) *
Cited By (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113966596A (en) * | 2019-06-11 | 2022-01-21 | 瑞典爱立信有限公司 | Method and apparatus for data traffic routing |
CN113966596B (en) * | 2019-06-11 | 2024-03-01 | 瑞典爱立信有限公司 | Method and apparatus for data traffic routing |
CN110321666A (en) * | 2019-08-09 | 2019-10-11 | 重庆理工大学 | Multi-robots Path Planning Method based on priori knowledge Yu DQN algorithm |
CN110321666B (en) * | 2019-08-09 | 2022-05-03 | 重庆理工大学 | Multi-robot path planning method based on priori knowledge and DQN algorithm |
CN110345948A (en) * | 2019-08-16 | 2019-10-18 | 重庆邮智机器人研究院有限公司 | Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm |
CN112567399A (en) * | 2019-09-23 | 2021-03-26 | 阿里巴巴集团控股有限公司 | System and method for route optimization |
CN110716575A (en) * | 2019-09-29 | 2020-01-21 | 哈尔滨工程大学 | UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning |
CN110716574B (en) * | 2019-09-29 | 2023-05-02 | 哈尔滨工程大学 | UUV real-time collision avoidance planning method based on deep Q network |
CN110716574A (en) * | 2019-09-29 | 2020-01-21 | 哈尔滨工程大学 | UUV real-time collision avoidance planning method based on deep Q network |
CN112799386B (en) * | 2019-10-25 | 2021-11-23 | 中国科学院沈阳自动化研究所 | Robot path planning method based on artificial potential field and reinforcement learning |
CN112799386A (en) * | 2019-10-25 | 2021-05-14 | 中国科学院沈阳自动化研究所 | Robot path planning method based on artificial potential field and reinforcement learning |
CN110618686A (en) * | 2019-10-30 | 2019-12-27 | 江苏科技大学 | Unmanned ship track control method based on explicit model predictive control |
CN110955239A (en) * | 2019-11-12 | 2020-04-03 | 中国地质大学(武汉) | Unmanned ship multi-target trajectory planning method and system based on inverse reinforcement learning |
CN110836518A (en) * | 2019-11-12 | 2020-02-25 | 上海建科建筑节能技术股份有限公司 | System basic knowledge based global optimization control method for self-learning air conditioning system |
CN110865539A (en) * | 2019-11-18 | 2020-03-06 | 华南理工大学 | Unmanned ship tracking error constraint control method under random interference |
CN111123963A (en) * | 2019-12-19 | 2020-05-08 | 南京航空航天大学 | Unknown environment autonomous navigation system and method based on reinforcement learning |
CN111061277A (en) * | 2019-12-31 | 2020-04-24 | 歌尔股份有限公司 | Unmanned vehicle global path planning method and device |
US11747155B2 (en) | 2019-12-31 | 2023-09-05 | Goertek Inc. | Global path planning method and device for an unmanned vehicle |
CN111176122A (en) * | 2020-02-11 | 2020-05-19 | 哈尔滨工程大学 | Underwater robot parameter self-adaptive backstepping control method based on double BP neural network Q learning technology |
CN111308890B (en) * | 2020-02-27 | 2022-08-26 | 大连海事大学 | Unmanned ship data-driven reinforcement learning control method with designated performance |
CN111308890A (en) * | 2020-02-27 | 2020-06-19 | 大连海事大学 | Unmanned ship data-driven reinforcement learning control method with designated performance |
CN111273670A (en) * | 2020-03-03 | 2020-06-12 | 大连海事大学 | Unmanned ship collision avoidance method for fast moving barrier |
CN111273670B (en) * | 2020-03-03 | 2024-03-15 | 大连海事大学 | Unmanned ship collision prevention method for fast moving obstacle |
CN111415048B (en) * | 2020-04-10 | 2024-04-19 | 大连海事大学 | Vehicle path planning method based on reinforcement learning |
CN111415048A (en) * | 2020-04-10 | 2020-07-14 | 大连海事大学 | Vehicle path planning method based on reinforcement learning |
CN111694365A (en) * | 2020-07-01 | 2020-09-22 | 武汉理工大学 | Unmanned ship formation path tracking method based on deep reinforcement learning |
CN111694365B (en) * | 2020-07-01 | 2021-04-20 | 武汉理工大学 | Unmanned ship formation path tracking method based on deep reinforcement learning |
CN112327821A (en) * | 2020-07-08 | 2021-02-05 | 东莞市均谊视觉科技有限公司 | Intelligent cleaning robot path planning method based on deep reinforcement learning |
CN111829527A (en) * | 2020-07-23 | 2020-10-27 | 中国石油大学(华东) | Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements |
CN111829527B (en) * | 2020-07-23 | 2021-07-20 | 中国石油大学(华东) | Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements |
CN112202848A (en) * | 2020-09-15 | 2021-01-08 | 中国科学院计算技术研究所 | Unmanned system network self-adaptive routing method and system based on deep reinforcement learning |
CN112202848B (en) * | 2020-09-15 | 2021-11-30 | 中国科学院计算技术研究所 | Unmanned system network self-adaptive routing method and system based on deep reinforcement learning |
CN112188600B (en) * | 2020-09-22 | 2023-05-30 | 南京信息工程大学滨江学院 | Method for optimizing heterogeneous network resources by reinforcement learning |
CN112188600A (en) * | 2020-09-22 | 2021-01-05 | 南京信息工程大学滨江学院 | Method for optimizing heterogeneous network resources by using reinforcement learning |
CN112215290B (en) * | 2020-10-16 | 2024-04-09 | 苏州大学 | Fisher score-based Q learning auxiliary data analysis method and Fisher score-based Q learning auxiliary data analysis system |
CN112215290A (en) * | 2020-10-16 | 2021-01-12 | 苏州大学 | Q learning auxiliary data analysis method and system based on Fisher score |
CN112163720A (en) * | 2020-10-22 | 2021-01-01 | 哈尔滨工程大学 | Multi-agent unmanned electric vehicle battery replacement scheduling method based on Internet of vehicles |
CN112543038A (en) * | 2020-11-02 | 2021-03-23 | 杭州电子科技大学 | Intelligent anti-interference decision method of frequency hopping system based on HAQL-PSO |
CN112543038B (en) * | 2020-11-02 | 2022-03-11 | 杭州电子科技大学 | Intelligent anti-interference decision method of frequency hopping system based on HAQL-PSO |
CN112698646A (en) * | 2020-12-05 | 2021-04-23 | 西北工业大学 | Aircraft path planning method based on reinforcement learning |
CN112698646B (en) * | 2020-12-05 | 2022-09-13 | 西北工业大学 | Aircraft path planning method based on reinforcement learning |
CN112600759A (en) * | 2020-12-10 | 2021-04-02 | 东北大学 | Multipath traffic scheduling method and system based on deep reinforcement learning under Overlay network |
CN112600759B (en) * | 2020-12-10 | 2022-06-03 | 东北大学 | Multipath traffic scheduling method and system based on deep reinforcement learning under Overlay network |
CN112880663A (en) * | 2021-01-19 | 2021-06-01 | 西北工业大学 | AUV reinforcement learning path planning method considering accumulated errors |
CN112947431A (en) * | 2021-02-03 | 2021-06-11 | 海之韵(苏州)科技有限公司 | Unmanned ship path tracking method based on reinforcement learning |
CN112525213B (en) * | 2021-02-10 | 2021-05-14 | 腾讯科技(深圳)有限公司 | ETA prediction method, model training method, device and storage medium |
CN112525213A (en) * | 2021-02-10 | 2021-03-19 | 腾讯科技(深圳)有限公司 | ETA prediction method, model training method, device and storage medium |
CN113721604B (en) * | 2021-08-04 | 2024-04-12 | 哈尔滨工业大学 | Intelligent track control method of unmanned surface vehicle considering sea wave encountering angle |
CN113721604A (en) * | 2021-08-04 | 2021-11-30 | 哈尔滨工业大学 | Intelligent track control method of unmanned surface vehicle considering sea wave encountering angle |
CN113415195A (en) * | 2021-08-11 | 2021-09-21 | 国网江苏省电力有限公司苏州供电分公司 | Alignment guiding and visualization method for wireless charging system of electric vehicle |
CN113720346A (en) * | 2021-09-02 | 2021-11-30 | 重庆邮电大学 | Vehicle path planning method and system based on potential energy field and hidden Markov model |
CN113720346B (en) * | 2021-09-02 | 2023-07-04 | 重庆邮电大学 | Vehicle path planning method and system based on potential energy field and hidden Markov model |
CN113848974B (en) * | 2021-09-28 | 2023-08-15 | 西安因诺航空科技有限公司 | Aircraft trajectory planning method and system based on deep reinforcement learning |
CN113848974A (en) * | 2021-09-28 | 2021-12-28 | 西北工业大学 | Aircraft trajectory planning method and system based on deep reinforcement learning |
CN114518758B (en) * | 2022-02-08 | 2023-12-12 | 中建八局第三建设有限公司 | Indoor measurement robot multi-target point moving path planning method based on Q learning |
CN114518758A (en) * | 2022-02-08 | 2022-05-20 | 中建八局第三建设有限公司 | Q learning-based indoor measuring robot multi-target-point moving path planning method |
CN116596174B (en) * | 2023-04-28 | 2023-10-20 | 北京大数据先进技术研究院 | Path planning method, device, equipment and storage medium for integrating cost and benefit |
CN116596174A (en) * | 2023-04-28 | 2023-08-15 | 北京大数据先进技术研究院 | Path planning method, device, equipment and storage medium for integrating cost and benefit |
CN116523165B (en) * | 2023-06-30 | 2023-12-01 | 吉林大学 | Collaborative optimization method for AMR path planning and production scheduling of flexible job shop |
CN116523165A (en) * | 2023-06-30 | 2023-08-01 | 吉林大学 | Collaborative optimization method for AMR path planning and production scheduling of flexible job shop |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109726866A (en) | Unmanned boat paths planning method based on Q learning neural network | |
Li et al. | Path planning for UAV ground target tracking via deep reinforcement learning | |
CN110083165B (en) | Path planning method of robot in complex narrow environment | |
CN108820157B (en) | Intelligent ship collision avoidance method based on reinforcement learning | |
Xia et al. | Neural inverse reinforcement learning in autonomous navigation | |
CN111399506A (en) | Global-local hybrid unmanned ship path planning method based on dynamic constraints | |
Xia et al. | Cooperative task assignment and track planning for multi-UAV attack mobile targets | |
CN108803321A (en) | Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study | |
CN109597425B (en) | Unmanned aerial vehicle navigation and obstacle avoidance method based on reinforcement learning | |
CN112034887A (en) | Optimal path training method for unmanned aerial vehicle to avoid cylindrical barrier to reach target point | |
CN108319293A (en) | A kind of UUV Realtime collision free planing methods based on LSTM networks | |
CN110926477A (en) | Unmanned aerial vehicle route planning and obstacle avoidance method | |
Wang et al. | Cooperative collision avoidance for unmanned surface vehicles based on improved genetic algorithm | |
CN113848974B (en) | Aircraft trajectory planning method and system based on deep reinforcement learning | |
CN111240345A (en) | Underwater robot trajectory tracking method based on double BP network reinforcement learning framework | |
CN112947594B (en) | Unmanned aerial vehicle-oriented track planning method | |
CN112824998A (en) | Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process | |
CN115143970B (en) | Obstacle avoidance method and system of underwater vehicle based on threat degree evaluation | |
Bai et al. | USV path planning algorithm based on plant growth | |
Yao et al. | Multi-USV cooperative path planning by window update based self-organizing map and spectral clustering | |
Sood et al. | Meta-heuristic techniques for path planning: recent trends and advancements | |
Ramezani et al. | UAV path planning employing MPC-reinforcement learning method considering collision avoidance | |
Kong et al. | An FM*-based comprehensive path planning system for robotic floating garbage cleaning | |
Li et al. | Deep reinforcement learning based adaptive real-time path planning for UAV | |
CN114609925B (en) | Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190507 |
|