CN102929281A - Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment - Google Patents
Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment
- Publication number
- CN102929281A (application CN 201210455666)
- Authority
- CN
- China
- Prior art keywords
- state
- robot
- knn
- value
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Feedback Control In General (AREA)
- Manipulator (AREA)
Abstract
Robot path planning in unknown dynamic environments has important application value. Accordingly, the invention discloses a robot k-nearest-neighbor (kNN) path planning method for incompletely perceived environments. The method comprises building a partially observable Markov decision process (POMDP) model, solving the POMDP model, and constructing an iterative learning model. Because an iterative learning model is used, the robot's learning ability and its adaptability to the environment during path planning are improved, and the path planning performance can be improved.
Description
Technical field
The present invention is a robot path planning method for unknown dynamic environments, relates to the technical field of robot navigation, and more particularly to robot path planning algorithms.
Background art
With the development of robotics, robot capabilities are constantly improving and the fields of robot application keep expanding, especially in dangerous, special, or otherwise inaccessible settings such as nuclear emergency response and space operations, where robot intervention is required. Path planning is an important step in robot navigation. The robot path planning problem is commonly defined as follows: given a start point and a goal point for the robot, plan a collision-free path that satisfies a given optimality criterion in an environment containing fixed or moving obstacles, so that the robot moves to the goal along this path. Typical optimality criteria include minimum energy consumption, shortest travel time, and shortest path length. Finding a collision-free, optimal path therefore plays a crucial role in research on path planning methods.
To complete path planning safely and reliably in an unknown dynamic environment, a robot must be able to handle various uncertain conditions so as to improve its adaptability to the environment. A robot path planner with learning ability is therefore particularly important. Reinforcement learning algorithms are attractive for robot path planning because they are unsupervised, online learning methods that do not require an accurate model of the environment, so they have received attention in mobile robot path planning under dynamic unknown environments. For example, Mohammad Abdel Kareem Jaradat's paper "Reinforcement based mobile robot navigation in dynamic environment" compares reinforcement learning with the artificial potential field method, and the experimental results show that a path planning method based on reinforcement learning has better applicability. Hoang-huu Viet's paper "Extended Dyna-Q Algorithm for Path Planning of Mobile Robots" builds on the Dyna-Q reinforcement learning algorithm and uses a maximum likelihood model to select actions and update the Q value function, improving the convergence speed of the algorithm.
In these methods, the robot performs path planning in a fully observable environment. Wu Feng's dissertation "Research on multi-agent planning problems based on decision theory" builds a DEC-POMDP model from the perspective of decision theory to solve multi-agent planning problems over large state spaces; that approach accounts for the incomplete observability of environmental information, but the resulting model and algorithm have high complexity.
To address these problems, the present invention proposes a robot kNN path planning method for incompletely perceived environments. The method adopts a local value-iteration learning model based on the k-nearest-neighbor classification idea, takes into account the uncertainty of actions in an unknown environment and the incompleteness of the acquired environmental information, and improves the adaptability of the robot path planning algorithm in real environments.
Summary of the invention
The object of the invention is to solve the difficulties that robot path planning faces under unknown dynamic environments, namely the incomplete observability of environmental information and the difficulty of solving over a large state space, and thereby effectively improve the applicability of the path planning algorithm. The method uses local point-based value iteration based on k-nearest-neighbor classification in place of value iteration over all states, which effectively alleviates the curse of dimensionality in solving the POMDP model and at the same time improves the convergence of reinforcement learning during path planning.
To achieve the above object, the technical scheme adopted by the present invention is a robot kNN path planning method for incompletely perceived environments, comprising the following steps:
One, building the POMDP model:
A grid map is used to divide the robot's planning environment into small cells. The environment map is built with the grid method, and each grid cell corresponds to a state s in the state set S of the POMDP model. The action set A contains four actions: east (East), west (West), south (South), and north (North), and at the next time step the robot can be in one of the 4 adjacent reachable grid cells. The robot receives a reward of 0 when it reaches the goal state, and a reward of -1 in all other cases. Because action execution is uncertain during the robot's continuous interaction with the environment, the transition probabilities are set so that the action selected by the optimal strategy is executed correctly with high probability, while the robot slides to the left or right of that action with a small probability.
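For illustration only, the following Python sketch shows one way such a grid-world POMDP model could be encoded. The grid size, the concrete success/slip probabilities (0.8/0.1), and all helper names are assumptions; the patent only states that the selected action is executed correctly with high probability and slides sideways with low probability.

```python
ROWS, COLS = 20, 20                                            # assumed grid size
STATES = [(r, c) for r in range(ROWS) for c in range(COLS)]    # state set S: one state per grid cell
ACTIONS = ["East", "West", "South", "North"]                   # action set A
MOVE = {"East": (0, 1), "West": (0, -1), "South": (1, 0), "North": (-1, 0)}
LEFT_OF = {"East": "North", "West": "South", "South": "East", "North": "West"}
RIGHT_OF = {"East": "South", "West": "North", "South": "West", "North": "East"}

def step_cell(s, a):
    """Deterministic neighbor of cell s in direction a, clipped to the grid."""
    r, c = s
    dr, dc = MOVE[a]
    return (min(max(r + dr, 0), ROWS - 1), min(max(c + dc, 0), COLS - 1))

def transition(s, a, p_ok=0.8, p_slip=0.1):
    """Transition distribution: the chosen action succeeds with high probability,
    otherwise the robot slips to the left or right of the intended direction
    (the 0.8/0.1 split is an assumed example)."""
    dist = {}
    for direction, p in ((a, p_ok), (LEFT_OF[a], p_slip), (RIGHT_OF[a], p_slip)):
        nxt = step_cell(s, direction)
        dist[nxt] = dist.get(nxt, 0.0) + p
    return dist

def reward(s, goal):
    """Reward 0 at the goal state, -1 otherwise, as described above."""
    return 0 if s == goal else -1
```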
Two, solving the POMDP model:
In the POMDP for robot path planning, the robot's sensors cannot observe all environmental information completely. To solve for the optimal strategy, the robot needs the complete sequence of actions taken and states observed, i.e. the history. The history can be replaced by a belief state (Belief State): the belief state b(s) is a probability distribution over the state set S, represented as an |S|-dimensional vector. When the state is replaced by the belief state during solution, the POMDP problem is converted into an MDP problem over belief states, and the action selection strategy π becomes a mapping from belief states to actions: π(b) → a. Under the optimal strategy π*, the accumulated discounted reward of all belief states forms the optimal value function Q(b, a), so the POMDP problem can be solved with a method for solving MDP problems, namely the kNN-Sarsa(λ) algorithm.
Three, constructing the iterative learning model:
After the start position and the goal position of the robot are set, a robot path planning method based on a reinforcement learning algorithm is used to search for a collision-free, optimal path from the start position to the goal position. In the search for the optimal path, the present invention defines each grid cell the robot may reach as a state s of the iterative learning model, and defines the action a as a concrete direction of motion: East, West, South, or North; the purpose of action selection is to shorten the path from the start position to the goal position as much as possible. The iterative model of the reinforcement learning algorithm defines a state-action value function Q for each pair (s, a), which is the accumulated reward the robot obtains when it selects a certain action in the current state and transitions to the next state; the action selection strategy selects the optimal action according to this Q value so that the accumulated reward is maximized.
Four, the table structures used by the iterative learning model:
To implement the method of the invention, the following table structures need to be built (an illustrative sketch of the four structures is given after the list):
(1) Q Table
For robot path planning based on the iterative learning model, a state-action value function table Q Table must first be built. The Q Table is a two-dimensional matrix with |S| rows and |A| columns (|S| is the number of elements of the state set S, |A| is the number of elements of the action set A). It stores the accumulated reward of each state-action pair, i.e. Q(s, a) is the maximum accumulated reward obtained when the optimal action a is selected in state s and the robot is updated to the next state.
(2) Transfer function tables T
The transfer function T: S × A → Π(S) describes the effect of an action on the environment state. The robot has four elementary actions, so four transfer function tables T_E, T_W, T_S, T_N are built; they give, respectively, the probability of transitioning from state s_t to state s_{t+1} after selecting the east (East), west (West), south (South), or north (North) action.
(3) Observation function table O
The robot makes decisions according to the information detected by its on-board sensors. The observation function O: S × A → Π(Z) expresses the probability that the robot obtains the observation z after executing action a in the current state s_t and transitioning to the new state s_{t+1}; that is, a probability distribution table over observations is built.
(4) Return value table R
Whether the robot has completed one path search is detected by checking whether it has reached the goal position: when the robot reaches the goal position it receives a reward of 0, otherwise it receives a reward of -1. The reward function R: S × A → R describes that if a certain action obtains a higher reward from the environment, the tendency to produce that action in later path searches is strengthened; otherwise the tendency to produce that action is weakened.
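As a concrete, purely illustrative layout of these four structures, the sketch below stores them as arrays, continuing the grid-model sketch above. The dense-matrix layout and the assumption that observations range over the grid cells themselves are choices made for the example, not requirements of the patent.

```python
import numpy as np

# Continuing the grid-model sketch above: STATES and ACTIONS define |S| and |A|.
NUM_S, NUM_A = len(STATES), len(ACTIONS)

# (1) Q Table: |S| x |A| matrix of accumulated returns Q(s, a).
Q_table = np.zeros((NUM_S, NUM_A))

# (2) Transfer function tables T_E, T_W, T_S, T_N: one |S| x |S| matrix per action,
#     where T[a][s, s'] is the probability of moving from state s to state s'.
T = {a: np.zeros((NUM_S, NUM_S)) for a in ACTIONS}

# (3) Observation function table O: probability of obtaining observation z after
#     executing action a and arriving in state s'. Here |Z| = |S| is assumed
#     (the robot observes a grid cell, possibly noisily): an |S| x |A| x |S| array.
O = np.zeros((NUM_S, NUM_A, NUM_S))

# (4) Return value table R: reward 0 at the goal state, -1 everywhere else.
def build_reward_table(goal_index):
    R = -np.ones(NUM_S)
    R[goal_index] = 0.0
    return R
```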
Five, the iterative learning process:
The iterative learning process consists of the following steps:
Step 1: Initialization
Initialize the state-action value function table Q Table with Q(s, a) = 0, the eligibility trace e(s, a) = 0, and the initial belief state b(s) = 0.001076; set the parameter k = 5 (5 nearest neighbors are selected), the learning rate α = 0.95, the discount factor γ = 0.99, and λ = 0.95, where γλ is the rate at which the eligibility trace e decays; ε = 0.001 is the probability of selecting a random action.
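A minimal sketch of Step 1 with the parameter values stated above; flat dictionaries keyed by state or state-action pair are just one convenient way to hold Q, the eligibility trace e, and the belief b.

```python
# Step 1: initialization with the parameter values stated above
# (STATES and ACTIONS come from the grid-model sketch).
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}   # Q(s, a) = 0
e = {(s, a): 0.0 for s in STATES for a in ACTIONS}   # eligibility trace e(s, a) = 0
b = {s: 0.001076 for s in STATES}                    # initial belief state b(s) = 0.001076

k = 5            # number of nearest neighbors
alpha = 0.95     # learning rate
gamma = 0.99     # discount factor
lam = 0.95       # lambda; the eligibility trace decays at rate gamma * lam
epsilon = 0.001  # probability of selecting a random action
```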
Step 2: Obtain the current state s_t and the belief state set B of its k nearest-neighbor states
1) Take the robot's start position as the current state s_t;
2) Compute the state set knn consisting of the k states in the state set S with the smallest Euclidean distance to s_t;
3) Compute the belief state value b_t(s) of each state in the state set knn: b_t(s) = 1/|S|.
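A sketch of Step 2, assuming states are grid coordinates so that the Euclidean distance can be computed directly on cell indices; the uniform belief value 1/|S| follows the text above.

```python
import math

def k_nearest_states(s_t, states, k=5):
    """The k states in `states` closest to s_t in Euclidean distance."""
    return sorted(states, key=lambda s: math.dist(s, s_t))[:k]

def belief_over_neighbors(knn, num_states):
    """Belief value b_t(s) = 1/|S| for each state in the k-nearest-neighbor set."""
    return {s: 1.0 / num_states for s in knn}

# Illustrative use: the start position serves as the current state s_t.
# knn = k_nearest_states((0, 0), STATES, k)
# b_t = belief_over_neighbors(knn, len(STATES))
```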
Step 3: Obtain the belief state value function
The value function corresponding to the belief state b_t(s) is computed by the following formula:
Q(b_t, a) = Σ_{i ∈ knn} Q(i, a) · b_t(i),
i.e. the sum of the products of the value functions Q(i, a) in the Q(s, a) table and the belief states b(i), over all states i in the k-nearest-neighbor set knn of the current state s_t.
Step 4: Select an action
An action is selected according to the ε-greedy action selection policy: if U < ε a random action is selected, otherwise the action maximizing Q(b_t, a) is selected, where U is a random number uniformly distributed on (0, 1). The probability ε decays at a rate of 0.99 per learning cycle (Episode); that is, in the early stage of learning a random action is selected with larger probability to keep the algorithm from falling into a local optimum, and as the effective information in the Q values increases, ε gradually decreases, which guarantees the convergence of the algorithm.
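A sketch of the ε-greedy rule of Step 4, using belief_q_value from the previous sketch; the per-episode decay line reflects one reading of "ε decays at a rate of 0.99 per learning cycle".

```python
import random

def epsilon_greedy(Q, b_t, knn, actions, epsilon):
    """With probability epsilon pick a random action (U < epsilon), otherwise
    the action maximizing the belief value function Q(b_t, a)."""
    if random.random() < epsilon:                      # U ~ Uniform(0, 1)
        return random.choice(actions)
    return max(actions, key=lambda a: belief_q_value(Q, b_t, knn, a))

# At the end of each learning cycle (Episode): epsilon *= 0.99
```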
Step 5: Execute the action
After executing action a_t, the robot transitions to the new state s_{t+1} and at the same time obtains the observation z and the reward R.
Step 6: Compute the reward R
After executing action a_t, the robot arrives at a new position and checks whether this position is the goal position. If not, it receives a reward of -1 and proceeds to Step 7; otherwise, it receives a reward of 0 and proceeds to Step 10.
Step 7: Obtain the belief state set B' corresponding to the next state s_{t+1}
1) Compute the state set knn' consisting of the k states in the state set S with the smallest Euclidean distance to s_{t+1};
2) Compute the belief state value b_{t+1}(s') of each state in the state set knn';
3) Repeat Step 3 and Step 4.
Step 8: Update
1) The eligibility trace e(s, a) is updated according to its defining formula;
2) Update the state-action value function Q(i, a) of all k nearest-neighbor states of the robot's state:
Q_{t+1}(s, a) = Q_t(s, a) + Δq_a(s),  s ∈ knn
3) Set s_t = s_{t+1}, a_t = a_{t+1}, knn = knn', e_{t+1} = γλ e_t, b_t(s) = b_{t+1}(s').
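The patent does not reproduce the formulas for the eligibility trace or for the increment Δq_a(s), so the sketch below fills them in with a standard Sarsa(λ)-style update (TD error on belief values, accumulating traces restricted to the k nearest neighbors); it should be read as an assumed realization of Step 8, not as the patent's exact formulas.

```python
def step8_update(Q, e, b_t, b_next, knn, knn_next, a_t, a_next, r,
                 alpha=0.95, gamma=0.99, lam=0.95):
    """Assumed Sarsa(lambda)-style realization of Step 8."""
    # TD error between the current and next belief-state values.
    delta = (r + gamma * belief_q_value(Q, b_next, knn_next, a_next)
               - belief_q_value(Q, b_t, knn, a_t))
    for s in knn:
        e[(s, a_t)] += b_t[s]                          # assumed trace increment
        Q[(s, a_t)] += alpha * delta * e[(s, a_t)]     # Q_{t+1}(s, a) = Q_t(s, a) + Δq_a(s)
    for key in e:                                      # e_{t+1} = gamma * lambda * e_t
        e[key] *= gamma * lam
    return delta
```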
Step 9: Go to Step 5.
Step 10: One iterative learning episode ends; go to Step 2 and begin the next iterative learning episode, until the Q values converge to the optimum or a near-optimum.
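Putting the sketches together, one learning episode (Steps 2 through 10) could be driven by a loop like the following; env_step is a hypothetical stand-in for the robot executing an action and sensing its new position, observation, and reward.

```python
def run_episode(Q, e, states, actions, start, env_step, k=5, epsilon=0.001):
    """One iterative learning episode built from the illustrative helpers above;
    env_step(s, a) -> (s_next, z, r) is hypothetical."""
    s_t = start                                            # Step 2
    knn = k_nearest_states(s_t, states, k)
    b_t = belief_over_neighbors(knn, len(states))
    a_t = epsilon_greedy(Q, b_t, knn, actions, epsilon)    # Steps 3-4
    while True:
        s_next, z, r = env_step(s_t, a_t)                  # Steps 5-6
        if r == 0:                                         # goal reached: Step 10
            break
        knn_next = k_nearest_states(s_next, states, k)     # Step 7
        b_next = belief_over_neighbors(knn_next, len(states))
        a_next = epsilon_greedy(Q, b_next, knn_next, actions, epsilon)
        step8_update(Q, e, b_t, b_next, knn, knn_next, a_t, a_next, r)  # Step 8
        s_t, a_t, knn, b_t = s_next, a_next, knn_next, b_next           # Step 9
```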
Claims (1)
1. A robot kNN path planning method applicable to an incompletely perceived environment, characterized by three steps: building a POMDP model, solving the POMDP model, and constructing an iterative model:
(a) Building the POMDP model: a grid map is used to divide the robot's planning environment into small cells, each grid cell corresponding to a state s in the state set S of the POMDP model; the action set A contains four actions: east (East), west (West), south (South), and north (North), and at the next time step the robot can be in one of the 4 adjacent reachable grid cells; the robot receives a reward of 0 when it reaches the goal state and a reward of -1 in all other cases; during the interaction between the robot and the environment, the transition probabilities are set so that the action selected by the optimal strategy is executed correctly with high probability, while the robot slides to the left or right of that action with a small probability;
(b) Solving the POMDP model: solving for the optimal strategy requires the history of actions the robot has taken and states it has observed; the history can be replaced by a belief state (Belief State), where the belief state b(s) is a probability distribution over the state set S; when the state is replaced by the belief state during solution, the POMDP problem is converted into an MDP problem over belief states, and the action selection strategy π becomes a mapping from belief states to actions: π(b) → a; under the optimal strategy π*, the accumulated discounted reward of all belief states forms the optimal value function Q(b, a);
(c) Constructing the iterative model: after the start position and the goal position of the robot are set, a robot path planning method based on a reinforcement learning algorithm is used; the reinforcement learning algorithm defines a state-action value function Q for each pair (s, a), which is the accumulated reward the robot obtains when it selects a certain action in the current state and transitions to the next state; the action selection strategy selects the optimal action according to this Q value so that the accumulated reward is maximized; the specific steps of the iterative algorithm are as follows:
Step 1: Initialization
Initialize the state-action value function table Q Table; assign initial values to Q(s, a), the eligibility trace e(s, a), the initial belief state b(s), the parameter k, the learning rate α, and the random action selection probability ε,
Step 2: Obtain the current state s_t and the belief state set B of its k nearest-neighbor states
1) Take the robot's start position as the current state s_t;
2) Compute the state set knn consisting of the k states in the state set S with the smallest Euclidean distance to s_t;
3) Compute the belief state value b_t(s) of each state in the state set knn: b_t(s) = 1/|S|,
Step 3: Obtain the belief state value function
The value function corresponding to the belief state b_t(s) is computed by the following formula:
Q(b_t, a) = Σ_{i ∈ knn} Q(i, a) · b_t(i),
i.e. the sum of the products of the value functions Q(i, a) in the Q(s, a) table and the belief states b(i), over all states i in the k-nearest-neighbor set knn of the current state s_t,
Step 4: Select an action
An action is selected according to the ε-greedy action selection policy: if U < ε a random action is selected, otherwise the action maximizing Q(b_t, a) is selected, where U is a random number uniformly distributed on (0, 1); the probability ε decays at a rate of 0.99 per learning cycle (Episode), that is, in the early stage of learning a random action is selected with larger probability to keep the algorithm from falling into a local optimum, and as the effective information in the Q values increases, ε gradually decreases, which guarantees the convergence of the algorithm,
Step 5: Execute the action
After executing action a_t, the robot transitions to the new state s_{t+1} and at the same time obtains the observation z and the reward R,
Step 6: Compute the reward R
After executing action a_t, the robot arrives at a new position and checks whether this position is the goal position; if not, it receives a reward of -1 and proceeds to Step 7, otherwise it receives a reward of 0 and proceeds to Step 10,
Step 7: Obtain the belief state set B' corresponding to the next state s_{t+1}
1) Compute the state set knn' consisting of the k states in the state set S with the smallest Euclidean distance to s_{t+1},
2) Compute the belief state value b_{t+1}(s') of each state in the state set knn',
3) Repeat Step 3 and Step 4,
Step 8: Update
1) The eligibility trace e(s, a) is updated according to its defining formula,
2) Update the state-action value function Q(i, a) of all k nearest-neighbor states of the robot's state:
Q_{t+1}(s, a) = Q_t(s, a) + Δq_a(s),  s ∈ knn
3) Set s_t = s_{t+1}, a_t = a_{t+1}, knn = knn', e_{t+1} = γλ e_t, b_t(s) = b_{t+1}(s'),
Step 9: Go to Step 5,
Step 10: One iterative learning episode ends; go to Step 2 and begin the next iterative learning episode, until the Q values converge to the optimum or a near-optimum.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201210455666 CN102929281A (en) | 2012-11-05 | 2012-11-05 | Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201210455666 CN102929281A (en) | 2012-11-05 | 2012-11-05 | Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102929281A true CN102929281A (en) | 2013-02-13 |
Family
ID=47644109
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201210455666 Pending CN102929281A (en) | 2012-11-05 | 2012-11-05 | Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102929281A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20130213 |