WO2001078951A1 - Semi-optimal path finding in a wholly unknown environment - Google Patents

Semi-optimal path finding in a wholly unknown environment

Info

Publication number
WO2001078951A1
WO2001078951A1 (application PCT/JP2000/002427)
Authority
WO
WIPO (PCT)
Prior art keywords
robot
goal
task
theory
environment
Prior art date
Application number
PCT/JP2000/002427
Other languages
French (fr)
Inventor
Zhimin Lin
Original Assignee
Zhimin Lin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhimin Lin filed Critical Zhimin Lin
Priority to PCT/JP2000/002427 priority Critical patent/WO2001078951A1/en
Priority to AU36785/00A priority patent/AU3678500A/en
Publication of WO2001078951A1 publication Critical patent/WO2001078951A1/en

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/31From computer integrated manufacturing till monitoring
    • G05B2219/31005Detect obstacles on path of vehicle
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/40Robotics, robotics mapping to robotics vision
    • G05B2219/40473Using genetic algorithm GA
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/40Robotics, robotics mapping to robotics vision
    • G05B2219/40499Reinforcement learning algorithm
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/40Robotics, robotics mapping to robotics vision
    • G05B2219/40501Using sub goal method of options for semi optimal path planning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

A mobile robot guided by a new learning algorithm is presented. The first task for the robot is searching a path like a human being, that is, finding a semi-optimal path to a given goal in a wholly unknown, unpredictable and partly dynamic large-scale environment. Therefore, it demands a high technique of environment modeling through learning. Because the robot should realize its missions within the framework of the task knowledge, a new phenomenon and a Dead Corner theory appear. A genetic algorithm and a reinforcement learning algorithm are the basis for the presented theory. Powerful methods are described with the mathematical tool of MDP models. The essential method, the sub-goal method, is similar to the sub-goal method of the Options over MDP theory and has been successfully proved. The presented theory is a refined theory of the Learning Classifier System theory. Most of the theory has been proved on a visualization platform of an animation simulator.

Description

DESCRIPTION
SEMI-OPTIMAL PATH FINDING IN A WHOLLY UNKNOWN ENVIRONMENT
TECHNICAL FIELD
The first task for our autonomous learning robot is searching a path like a human being in a wholly unknown environment. In other words, it is finding a semi-optimal path to a given goal in a wholly unknown environment. The task description for the robot is to follow the goal's direction and approach the goal. Thus there is only one problem, which is collision avoidance in a wholly unknown environment.
Therefore it demands a high technique of learning for environment modeling. Being similar to the theory of A Learning Classifier System (MIT Press, 1998), our theory also has two basic algorithms: the genetic learning algorithm and the reinforcement learning algorithm.
The learning algorithm is an essential topic in artificial intelligence. Utilizing the theory of the Markov Decision Process (MDP), which is a discrete-time process, is a powerful approach for analyzing human decision problems in artificial intelligence. How to avoid obstacles is a kind of decision process. Moreover, the motion of a robot can be considered as a discrete-time process. Therefore we utilize the MDP theory to describe and analyze the basis of our HiMBS theory.
Temporal abstraction is an essential topic in the artificial intelligence area for analyzing the human decision process. Options over MDP is proposed by R.S. Sutton based on MDP theory and semi-MDP theory; it is a theory of temporal abstraction within the framework of reinforcement learning. We also utilize the Option concept to describe our powerful novel environment modeling method, the sub-goal task method.
A greatly important note is that although our HiMBS theory is also an approach of temporal abstraction in artificial intelligence, our approach is not an Action Replay Process (ARP) or a Synchronous Value Iteration (SVI) process as the theory of Options over MDP and Q-learning are. Therefore the most important things for understanding the MDP part of our HiMBS theory are the understandings of several fundamental MDP concepts: MDP, MDP tree, State, Action, State value, Policy and Options.
BACKGROUND ART
Our HiMBS algorithm is a new learning algorithm for a mobile robot; it is designed for making an authoritative autonomous learning mobile robot. A conventional opinion on the role of the autonomous learning robot is that it should be able to interact with a wholly unknown natural environment, which is unpredictable in both time and space and which, perhaps, sometimes contains dynamic obstacles. This is one of the most difficult problems for most man-made machines all over the world.
Most conventional learning algorithms for mobile robots demand much human empirical knowledge to enable a robot to explore and exploit a wholly unknown environment. Mostly, the algorithm's designer not only gives the task question to a machine but also spends tremendous time implanting non-task-method answer keys into this machine; that is, these methods cannot be realized as the task description does but greatly depend on the human's other, non-task-method empirical knowledge to realize the task. Strictly speaking, the environment in most of these conventional learning theories is not exactly wholly unknown. Some representative algorithms are discussed later.
Once more, the human's other empirical knowledge we mention here means that the answer keys of these kinds of knowledge do not utilize the knowledge of the task to realize the task. For instance, for the general task of path finding, that is, the task of following the goal's direction from a start point and approaching this given goal, the method of pre-giving an environment map to a robot and the method of wall following are not considered as task-method answer keys. They are the human's other non-task empirical knowledge answer keys to realize the task.
In our HiMBS algorithm, we utilize task knowledge, which is the task description of following the goal's direction and approaching the goal, to realize the task of finding a semi-optimal path from a start point to an arbitrary given goal.
Our Background of Some Conventional Learning Algorithm Theories and Path Finding Theories
The supervising algorithm is a conventional, famous learning algorithm. However, it often wastes much time on designing a rarely occurring architecture in an actual experiment. It cannot deal with a wholly unknown environment exactly.
A learning classifier system affords a very good algorithm framework. It consists of a reinforcement learning algorithm and a genetic algorithm. However, it is impossible to make a robot avoid arbitrary obstacles by just constructing a blank algorithm structure that shows the avoidance of several simple obstacles. We are not sure that this algorithm can deal with a wholly unknown environment and make a robot show more complex behaviors.
Jong Hwan Lim and Dong Woo Cho, in Robotica 1998, propose the method of employing the concept of a Quadtree to solve the path finding task. "Robust Subgoal Planning and Motion Execution for Robots in Fuzzy Environments", which is proposed by Jianwei Zhang and J. Raczkowsky in IROS '94, also aims at realizing the same task as ours. Both of them also utilize the same sub-goal method as ours to solve the environment-modeling problem.
The environment in path finding can be described as a tree structure: a particular Quadtree or a general MDP tree for a whole environment (see Fig. 1). In the general MDP tree, each node represents a possible passing point for the robot. Every group of nodes that runs from the starting point to the final point of the task goal forms a path. An optimal path can be derived from all of the found paths. Therefore, how to find the nodes of a path is the first essential topic for every path finding theory.
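To make this node-and-path view concrete, here is a small Python sketch (ours, not part of the original disclosure; the toy tree and step costs are invented for the example). It enumerates every root-to-goal path of such a tree and derives the optimal path from the found paths:

```python
# A toy MDP tree: each node maps to (child, step_cost) pairs.
# "S" is the start point, "G" the task goal; interior nodes are
# possible passing points for the robot.
TREE = {
    "S": [("A", 2), ("B", 3)],
    "A": [("G", 5), ("C", 1)],
    "B": [("G", 4)],
    "C": [("G", 2)],
}

def all_paths(node, path=("S",), cost=0):
    """Enumerate every path from the start node to the goal."""
    if node == "G":
        yield path, cost
        return
    for child, step in TREE.get(node, []):
        yield from all_paths(child, path + (child,), cost + step)

paths = list(all_paths("S"))
optimal = min(paths, key=lambda p: p[1])   # derive the optimal path
print(paths)    # every group of nodes from start to goal is a path
print(optimal)  # (('S', 'A', 'C', 'G'), 5)
```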
The method of employing a Quadtree to find nodes is not a task method; that is, it is additional human empirical knowledge beyond the task knowledge. This method assumes that the robot has Quadtree knowledge and can decompose its workspace into several sub-nodes. These nodes are chosen as sub-goals to be reached successively. However, this decomposition method cannot solve the obstacle avoidance problem; in other words, this method cannot model the environment. In order to solve this problem, the designer implants another piece of human empirical knowledge into the robot, which is following the boundaries of objects; that means the robot has the empirical knowledge of wall following to pass obstacles. In a word, this theory employs much human non-task empirical knowledge to enable a robot to successfully find a path in an unknown environment.
The method of the robust sub-goal planning theory also does not utilize a task method to find nodes. This method assumes that obstacles are polygons and that these polygons are pre-known. Thus, it is possible for this method to utilize the method of searching a tangent graph to find nodes. Obviously, the environment in this method is not an actually unknown environment.
In a word, although these two methods also treat the nodes of a tree as sub-goals to find a path, they utilize too much human non-task empirical knowledge to find the nodes of an environment's Markov tree. However, our method of finding these nodes is a novel one. These nodes are found by the robot's own experience of learning and interacting with its environment, not by any pre-assumption of ours as human beings. The robot is guided by its task knowledge: following the goal's direction and approaching the goal.
Therefore, our invention here focuses on how our task method can find the nodes of a semi-optimal path by the robot itself. That is, how can our HiMBS robot realize the problem of environment modeling through learning? An optimal path can be derived after all semi-optimal paths are found in our HiMBS theory. Our invention here firstly considers how our robot can find a semi-optimal path.
Our Background of Options over MDP
The theory of Options over MDP is newer than Dyna-Q learning, which was also developed by R.S. Sutton. It introduces many methods for exploring and exploiting an environment and for calculating an optimal policy, such as: 1. the reinforcement learning (MDP) framework; 2. SMDP (option-to-option) methods; 3. sub-goals for learning options; etc. They have proposed many mathematical methods to describe and solve the human decision process as a Markov Decision Process. Collision avoidance is a kind of human decision process and a discrete-time process. Therefore, we utilize MDP to describe our method.
However, as mentioned by R.S. Sutton himself, the theory of Options over MDP has offered a lot, but the most interesting cases are beyond it. We consider that this refers to the details of evaluating actions' credits, the actual methods of how to interact with a large-scale wholly unknown environment, and one of the most difficult actual problems: the robot's position navigation system problem. They also say that many key issues remain incompletely understood, for example the source of sub-goals (and, of course, the method of realizing them) and the integration with state abstraction, etc. Although our approach is different from these ARP processes, a similar sub-goal method, which is the only essential method for solving our problem of environment modeling through learning, is proposed by us to successfully realize the task of environment modeling.
Benchmark Mobile Learning Robot Problems: the Environment Problem, the Learning Problem and the Position Navigation Problem.
The benchmark mobile learning robot problem is how to make an authoritative autonomous learning robot. That is, whether a mobile robot is judged to be this kind of robot or not is according to the fuzzy degree of its interacting environment. A mobile robot can be called an autonomous learning robot if its interacting environment is wholly unknown, as complex and arbitrary as an actual space will be, with every possibility of containing dynamic obstacles at every corner. However, being capable of interacting with a wholly unknown environment is not the only judging criterion of an autonomous learning robot. This environment capability has been a final pursued goal throughout long human history. That is, in order to enable a robot to interact with a really wholly unknown environment, we, as designers, should implant a highly certain learning capability into the robot. What kind of learning capability we should implant into a robot is a great problem. Moreover, there is one of the most difficult actual problems for a mobile robot: setting up an accurate and robust position-localizing system is always arguably one of the essential necessities to meet the objectives of an autonomous guided mobile robot in these environments. There are two representative existing position navigation systems: the Global Positioning System (GPS) and the Inertial Navigation System (INS). Realizing these localization systems in a small-range area may be accurate and reasonable in cost. On the contrary, the accuracy of these navigation systems in a large-range arbitrary natural environment will greatly decrease; they will be expensive and will destroy the correctness of the robot's interactions.
DESCRIPTION OF THE INVENTION
1. Our robot's conditions, improvements and the structure of this description of the invention.
Perhaps there exist many other methods to realize the same task as our system: the task of environment modeling through learning. However, our methods are realized under a robot's worst conditions: the robot only has one kind of external sensor, which is only capable of detecting obstacles one step around the robot. Here, one step refers to the length between the robot's two footmarks. The robot does not have any outer position navigation system. Furthermore, the robot should realize its task within the framework of the task knowledge, which is the least knowledge that we should give to a robot when it is put into an environment. However, the robot has to interact with a wholly unknown environment.
Therefore, what is the task description for our robot? We give an example in Fig. 2. In such an empty and lightless environment, except for the Goal, the task is that the robot should approach the goal and reach the goal by following the goal's direction. One criterion for judging that the goal has been reached is that the robot is stopped; in other words, the robot can judge that there are two same states, and we can then inform the robot that it has reached its task goal. Corresponding to the theory of Options over MDP [R.S. Sutton], the task can be described in such an MDP Policy model:

Py : S × A | goal = TG → [0,1], method = θ    — Task MDPP model (1)
Here θ means following the goal's direction and approaching the goal. We will describe more details of this model later. The theory of Options over MDP is a framework of temporal abstraction in reinforcement learning. It is as general as a mathematical tool for analyzing methods of exploring an environment and exploiting it by calculating policy-values, action-values and state-values for a robot. We have made some important improvements over these two theories:
1. We touch on some of the most important actual issues: how to enable a robot to interact with a large-scale wholly unknown environment. We propose a particular method of the framework of the task knowledge to realize environment modeling through sub-goal learning.
2. We propose a Zhinan Zen Inertial Navigation System with only one inertial sensor, an encoder, to guide a robot and evaluate its actions' credits.
3. We propose the essential issue of the sub-goal method to enable the robot to overcome the complexity of an environment and to find a semi-optimal path quickly.
4. Our sub-goal method can absorb the actually great mobile robot problem of motor slipping (because the state in which slipping occurs will be treated as a deadlock state) and the accumulative dead reckoning errors (because the robot moves according to orientation).
Our method is quite novel because we try to use the task knowledge to finish the task, and this method is as natural as a human's daily intelligent activities. We compare our HiMBS robot with a human being under the same task of finding a semi-optimal path to a given goal to show our robot's naturalness. (See Table 1.0.)
Table 1.0. The conditions of a natural autonomous robot.

A human | Our HiMBS robot
0. Can interact with a wholly unknown environment. | 0. The same as a human.
1. No need of any position navigation system. | 1. The same as a human.
2. Only needs to know the goal's direction; better to have an eigenvalue or some literature features of the goal. | 2. The same as a human; an estimated position of the goal.
3. Has a strong memory problem. | 3. Has a strong memory.
4. Has empirical knowledge about the environment and methods, e.g. walking along a wall to pass. | 4. No knowledge of the environment.
5. Has knowledge of its front, back, etc. | 5. Does not have such knowledge.
6. Can see further than one footstep around. | 6. Can only detect obstacles one step around it.
The structure of this description of our invention is: 1. Brief introduction of our robot's conditions, improvements and the structure of this description of the invention. 2. Our HiMBS theory, its program procedure and an essential sub-goal theorem. 3. Experiment results and an error function. 4. Conclusions and our future work.
2. Our HiMBS theory, its program procedure and an essential sub-goal theorem.
An autonomous learning system should be able to endow a robot with human-level intelligence to interact with a natural environment. Mostly, the natural environment is wholly unknown, unpredictable in both time and space. Sometimes it also contains dynamic objects, for instance an obstacle, an active predator or a locomotive goal. Of course, it consists of arbitrary obstacles, which have uncertain size and shape. Conventionally, if an autonomous robot cannot successfully realize this mission in such a natural environment, it is not considered to be an authoritative autonomous learning robot, since an autonomous learning robot interacts with an environment as uncertain as a natural creature does. Therefore, its interaction is a process from sub-optimization to optimization, a process from trials to planning, a process from errors to correctness. The genetic algorithm is quite suitable for such a process. We utilize it as the control center of the robot's actions. However, the environment is complex; there are various different stimuli, whereas there is only one standard of credit for the genetic algorithm to run its rules (actions, etc.) process. The reinforcement learning algorithm is suitable for translating various stimuli into a single credit. It is a good evaluating tool. Thus, we use the reinforcement learning algorithm as a cooperative algorithm of the GA.
The architecture of our HiMBS system is shown in Fig. 3. Since the environment is wholly unknown to our robot, the genetic algorithm makes our robot sure to explore and exploit the environment thoroughly. The genetic algorithm has four modules: reproduction, selection, crossover and mutation. Reproduction produces some fundamental strategies and, of course, primary actions at each footmark of the robot. For instance, the strategy of the framework of the task method knowledge is executed in this module, as well as the primary strategy of the sub-goal method. Selection acts as an auction selector to select every action and every strategy to command the actuator or other component systems to produce actions or rules. The roles of both crossover and mutation are only realized in finding other paths in order to find an optimal path, which is not the central topic of this paper. They both change the genes of a path to find other paths. The genes of a path mean the important points of a path, not all the points of a path. The roles of the reinforcement learning algorithm are evaluating the credits of each action and each control rule in order to let the Selection module work, and evaluating the stimuli of the robot, for example the stimulus of encountering an obstacle, the stimulus of reaching the task goal, etc.
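This division of labor can be sketched in a few lines of Python (our illustration only; the function names, the stimuli dictionary and the numeric credits are assumptions, not the patent's program):

```python
import random

def reinforcement_credit(action, stimuli):
    """Translate the various stimuli into one scalar credit (RL's role)."""
    credit = stimuli.get(action, 0.0)            # e.g. progress toward goal
    if stimuli.get("obstacle_hit") == action:    # punishing stimulus
        credit -= 1.0
    return credit

def selection(actions, stimuli):
    """Auction selector (GA's Selection module): highest credit wins."""
    return max(actions, key=lambda a: reinforcement_credit(a, stimuli))

def mutation(path_genes):
    """Reorder the genes of a path (its important points) to find other paths."""
    genes = list(path_genes)
    random.shuffle(genes)
    return genes

# Example: two candidate actions; "N" hit an obstacle, so "NE" wins the auction.
print(selection(["N", "NE"], {"N": 0.9, "NE": 0.7, "obstacle_hit": "N"}))
```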
We use the sub-goal's MDP model to describe the most fundamental strategy, which is the framework of the task method knowledge. For instance, as we show in the figure, when a robot is stopped, the genetic algorithm transfers the current goal and the termination condition of the stopped process to the MDP sub-goal model; this model then produces sub-goals as the next current goal for the reinforcement learning algorithm. The reinforcement learning module uses the current goal to continue to evaluate the robot's every action. The MDP sub-goal model judges the termination condition of the sub-goal process. The procedures of our whole system's programming are:
• In the 1st whole loop, the robot utilizes the task method knowledge to explore the environment; it can then find one quite short path, and can find the optimal path at once if there is no obstacle, because of our task method. Our robot utilizes an inertial navigation method to build a two-dimensional coordinate plane for its environment. Moreover, in the first whole loop, after the robot arrives at its task goal, it can correct the estimated coordinates of the task goal's position.
• During the 2nd whole loop, the robot will try to optimize the found path by calling the mutation module and will calculate the path's policy value according to its learning experience of the environment's model.
We have mentioned that our HiMBS theory is a refined theory of Options over MDP. One of the biggest improvements concerns the most fundamental problem of the credit model. As with the Q-learning theory and the Dyna-Q learning theory, their credit models live only in a real number space; that is, they only consider the award part of a credit, which is c_t = r_t, t = 0,1,2,3... However, according to the newest reinforcement learning theory, a credit consists of not only an award but also a punishment. Therefore, we upgrade this most primary credit model into a complex number space, which is
c_t = r_t + i * ω_t, t = 0,1,2,3... Here, ω_t represents the punishment part of a credit. Thus, from our credit model, we can have more accurate information on the robot's interaction, and it enables us to separate two paths that have the same policy-value under Q-learning but lie in two different environments. We can then find the optimal path. Therefore, all of the value calculations of policy-values (in the mobile robot case, simply speaking, path-values), state-values and action-values are changed. For example, the calculation of state-values is changed from formulation (2) to (3).
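The upgrade can be illustrated with Python's built-in complex type (our sketch; the credit values are invented for the example): two paths whose award sums agree, and which a real-valued credit model therefore cannot separate, split apart once punishments ride on the imaginary axis.

```python
# Credits c_t = r_t + i*w_t: real part = award, imaginary part = punishment.
path_a = [1 + 0j, 1 + 0j, 1 + 0j]   # same awards, no punishment
path_b = [1 + 1j, 1 + 0j, 1 + 2j]   # same awards, but punished twice

def policy_value(credits):
    return sum(credits)

# Real (award) parts agree, so a real-valued model cannot separate the paths...
assert policy_value(path_a).real == policy_value(path_b).real
# ...but the imaginary (punishment) part does, so the better path is found.
better = min([path_a, path_b], key=lambda p: policy_value(p).imag)
assert better is path_a
```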
V^π(s) = Σ_{a∈A} π(s,a) * [ r_s^a + γ Σ_{s'} P_{ss'}^a V^π(s') ]    (2)
However, in the whole loop of exploring an environment, we do not utilize the SVI process (Synchronous Value Iteration process). In other words, we do not start with an arbitrary approximation calculation to find paths and then make path planning, because exploring every point of an environment by starting with an arbitrary approximation calculation demands numerous memories and takes plenty of time.

V^π(s_t) = Σ_{a∈A} π(s_t,a) * [ r_s^a + γ Σ_{s'} P_{ss'}^a Re(V^π(s')) ] + i Σ_{a∈A} π(s_t,a) * [ ω_s^a + γ Σ_{s'} P_{ss'}^a Im(V^π(s')) ]    (3)
Our method of utilizing the framework of the task method knowledge has an immediate approximation. In addition, we utilize the cumulative calculation formulations, such as formulation (3), in the second whole loop, that is, soon after the robot has finished its task. Therefore, our calculation of state-values in the first whole loop of exploring an environment is simplified to formulation (4). Here, π(s_t, a) means the probability of taking action a in the state s_t under the policy π (task MDPP model (1)); r_s^a means a one-step forward reward of taking action a in the state s.

V^π(s_t) = Σ_{a∈A} π(s_t,a) r_s^a + i Σ_{a∈A} π(s_t,a) ω_s^a    (4)
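As a minimal illustration of formulation (4) (our own sketch; the probabilities, rewards and punishments are made up), the first-loop state value just sums the action probabilities times the one-step awards, with punishments accumulated on the imaginary axis:

```python
# V(s) = sum_a pi(s,a)*r_s^a  +  i * sum_a pi(s,a)*w_s^a   -- formulation (4)
def state_value(pi, r, w):
    real = sum(pi[a] * r[a] for a in pi)   # award part
    imag = sum(pi[a] * w[a] for a in pi)   # punishment part
    return complex(real, imag)

pi = {"N": 0.5, "NE": 0.3, "E": 0.2}       # policy over three of the 8 actions
r  = {"N": 1.0, "NE": 0.8, "E": 0.2}       # one-step forward rewards
w  = {"N": 0.0, "NE": 0.5, "E": 1.0}       # punishments (e.g. near an obstacle)
print(state_value(pi, r, w))               # ≈ (0.78+0.35j)
```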
Thus, how to evaluate the actions' credits and how to realize our three objectives are described below. To realize the first objective of environment modeling, there is a new dead-corner theory that appears in the first whole loop.
As we mentioned before, the method condition of our robot is described in an empty and lightless environment. In the task MDPP model (1), Py means a policy; S means the state space of the robot's
environment; A means the action unit of the robot; [0,1] means that the probability of a policy is between 0 and 1; θ represents a certain method to realize the policy. A policy means the mapping from states to the probabilities of taking each available primitive action. A brief and more accurate explanation of a policy is that a policy is a possible sequence of action-state pairs. In path finding theory, a policy can simply be called a path.
Therefore, what does the model (1) mean exactly? When a robot is put into a wholly unknown environment for the goal, it receives the only environment knowledge: the goal's direction and the estimated coordinates of the goal. (Strictly speaking, the coordinates of the goal can possibly be obtained from the goal's direction.) The task for the robot, even in an empty and lightless environment (containing nothing except the robot and the goal), is described as following the goal's direction. If the robot is not stopped, the robot keeps on approaching. The initiating conditions are written as I. If the goal is treated as an obstacle, the robot will fall into an oscillating state, which is called a deadlock state. Thus, the deadlock state conditions can be a criterion for judging the task goal. The deadlock termination condition of a global task process is represented by β.
Thus, in order not to use an outer navigation system and to evaluate the actions' credits, we propose a Zhinan Zen Inertial Navigation System and Evaluation method (abbreviated as ZZINS EF; see Fig. 4).
Our robot has eight moving freedoms. We use the ZZINS EF to evaluate the credits of these eight possible actions according to whether the direction of an action takes the robot farther from or nearer to the task goal. An action has the highest credit among the eight actions if its direction is the same as the current direction of the task goal. Because of our ZZINS EF method, our robot has the immediate efficiency of finding the shortest path at once in an empty and lightless environment, and it does not need any outer position navigation system. Our robot is initiated by the direction of the current goal and not by the coordinates of the robot. Therefore, our robot can absorb a quite high level of the motors' slipping errors.
Thus, the eight possible moving directions have their own corresponding credits.
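A minimal sketch of the evaluation idea follows (ours, not the patent's code; the cosine-alignment credit is an assumption consistent with "farther or nearer to the task goal"):

```python
import math

# Eight moving freedoms as unit heading vectors (multiples of 45 degrees).
ACTIONS = {i * 45: (math.cos(math.radians(i * 45)),
                    math.sin(math.radians(i * 45))) for i in range(8)}

def zzins_credits(goal_bearing_deg):
    """Credit each action by how well it points toward the current goal.

    Sketch of the ZZINS EF idea: only the goal's direction is needed,
    no outer position system; credit = cosine alignment with that direction."""
    gx = math.cos(math.radians(goal_bearing_deg))
    gy = math.sin(math.radians(goal_bearing_deg))
    return {ang: gx * ax + gy * ay for ang, (ax, ay) in ACTIONS.items()}

credits = zzins_credits(90)                  # goal lies due "north"
assert max(credits, key=credits.get) == 90   # aligned action gets top credit
```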
Therefore, what do the initiating conditions I and the deadlock termination conditions β mean exactly in this empty and lightless environment? The most simplified meaning of I is that if the initiating conditions are not equivalent to the deadlock termination conditions β at the goal, the robot can keep on going. We define the state of the robot's i-th footmark as s_i, i = 0,1,2,3,4... Therefore, its next
state is s_{i+1}. The deadlock termination condition β means that there exist at least two same footmark states at the same position around the goal in this empty and lightless environment. Mostly, a wholly unknown environment is quite complex. Plenty of particular literature characteristics of an object (obstacle) can represent one unique state element to a human, but this is difficult for a computer. Since the coordinates of a position are not difficult for a computer, we consider the coordinates of a footmark as a state element of the state at that footmark.
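The deadlock criterion can be sketched directly from this definition (our illustration; the coordinate tuples are invented):

```python
def is_deadlock(footmarks):
    """Deadlock test sketched from the text: the robot is 'stopped' when
    two footmark states share the same coordinates state element."""
    seen = set()
    for x, y in footmarks:        # coordinates C(s_i) = (x_i, y_i)
        if (x, y) in seen:
            return True           # two same states -> oscillating/deadlock
        seen.add((x, y))
    return False

assert is_deadlock([(0, 0), (1, 0), (0, 0)])      # robot oscillates
assert not is_deadlock([(0, 0), (1, 0), (2, 0)])  # robot keeps approaching
```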
Thus, the value of the state at the i-th footmark is written as V_i^π(s_i), i = 0,1,... V_i^π(s_i) is the
same as formulation (4): it summarizes all of the possible actions' values at the state s_i.
C(s_i) = (x_i, y_i) is the particular coordinates state element of the state s_i.
Therefore, corresponding to the theory of Options over MDP, the Options-MDP (abbreviated as OMDP) model of our framework of the task method knowledge can be written as below.
Here, the task can be described by the option O. It consists of the initiating condition I at the start point and the deadlock termination condition β at the position of the task goal. CG means the current goal, TG means the task goal, and Py means the MDP policy model of (1).
O(I, Py, β)
Py : S × A | CG = TG, method = θ
I : V_i^π(s_i) ≠ V_{i+k}^π(s_{i+k}) | CG = TG
β : V_i^π(s_i) = V_{i+k}^π(s_{i+k}) | CG = TG, i = 0,1,2,3,4...; k = 2,3,4,5,6...; j = 0,1.    — Task OMDP model (TOMDP) (5)
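Read this way, model (5) can be sketched as a small data structure (our reading only; the field names and the toy state dictionary are assumptions, not the patent's notation):

```python
from dataclasses import dataclass
from typing import Callable

# An option as the triple O(I, Py, beta): an initiating condition,
# a policy, and a deadlock termination condition.
@dataclass
class Option:
    initiate:  Callable[[dict], bool]   # I: may the option keep running here?
    policy:    Callable[[dict], str]    # Py: state -> action, method theta
    terminate: Callable[[dict], bool]   # beta: deadlock at the task goal?

task_option = Option(
    initiate=lambda s: not s["deadlock"],    # V_i != V_{i+k} -> keep going
    policy=lambda s: s["goal_direction"],    # follow the goal's direction
    terminate=lambda s: s["deadlock"],       # two same states near the goal
)
```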
We assume the state now is s_{i+1}. Because of the method condition of our OMDP model framework, a Dead Corner problem appears. However, in trying to utilize the task knowledge to solve the Dead Corners and to realize the given task, we find that it is still possible to use the OMDP model to finish the task. In our following paper, we will introduce our dead corner theory.
To solve these dead-corner problems, in other words the dead corners' environment modeling, we propose a sub-goal theorem to realize it, which is mentioned as an essential issue by R.S. Sutton. In order to construct a sub-goal OMDP model, we change the task OMDP model only a little. We assume that s_l is the state at the position of judging a deadlock state. When a robot is stopped by a dead corner at the point (x1, y1), it will set up sub-goals near its possible moving directions at a changeable distance len. Normally, a len of 80 step-lengths is enough for passing most arbitrary dead corners and obstacles if the environment does not contain a particularly big obstacle. We give an example of a sub-goal in Fig. 4.2. The following (6) is our sub-goal's OMDP model.
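A sketch of this sub-goal set-up follows (ours; the trigonometry and the open-direction list are assumptions beyond the text):

```python
import math

def make_subgoals(x1, y1, move_dirs_deg, length=80.0):
    """Sketch of the sub-goal set-up: when stopped at (x1, y1), put sub-goals
    near the possible moving directions at distance `length` (about 80
    step-lengths). Some may land inside an obstacle and prove unreachable,
    which is one reason k stays dynamically changeable."""
    return [(x1 + length * math.cos(math.radians(d)),
             y1 + length * math.sin(math.radians(d)))
            for d in move_dirs_deg]

# e.g. sub-goals toward the three directions still open at a dead corner
print(make_subgoals(97.0, 440.0, [0, 90, 180]))
```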
O(β, Py, I)
Py : S × A | CG = SG ≠ TG, method = θ
β : V_i^π(s_i) ≠ V_{i+k}^π(s_{i+k}) | CG = SG
I : V_i^π(s_i) = V_{i+k}^π(s_{i+k}) | CG = SG, i = 0,1,2,3,4...; k = dynamic const ∈ {2,3,4,5,6,7,...}; j = 0,1.    — Sub-goal OMDP model (6)
We only switch the current goal between TG (task goal) and SG (sub-goal), switch the positions of β and I, and limit k to a dynamically changeable constant rather than the arbitrary variable k of the task OMDP model (5). We do not change the essential part of the task OMDP model, which is the θ method.
An important note is why k is dynamically changeable. There are several reasons: one is that the environment is wholly unknown; one is that the robot has a non-zero size; one is that there exist dead corners of arbitrary size; one is that sub-goals may be set up inside the body of an obstacle, which means those sub-goals are unreachable. The final reason is that we want to enable the robot to learn an accurate dead-corner model.
We summarize a sub-goal theorem as below.
In order to use the natural sub-goal method, and in order to use the task knowledge to solve the Environment Modeling problem, we set up a Sub-goal OMDP model (SOMDP) from the task OMDP model by changing little of it.
When a robot is falling into a deadlock state, some sub-goals will be set up near the possible moving directions at a certain distance. We assume the state at the footmark of judging a deadlock state is s_l. The SOMDP model will begin to be set up according to the termination condition β of the TOMDP. That means the current goal will be changed to a sub-goal (SG), and the robot will then begin to move for k steps after s_l under the sub-goal policy Py. If the dead corner is passed and the state-value V^π(s_{l+k}) satisfies the termination condition I, the current goal will be switched back to the task goal (TG); that is, the option O of the SOMDP is finished. However, the option O will be set up again at the 2k-th step after s_l if the state value V^π(s_{l+2k}) satisfies the initial condition β under the same task
goal policy Py. If at the 2k-th step footmark β is incorrect, the SOMDP will be over; that means the robot has passed the dead corner and this dead corner's environment modeling has been finished. If β is correct, k will increase to k + 1. The same SOMDP is set up again and the sub-goal process will begin as the former sub-goal process did.
Therefore, there occurs an interesting learning feature: the TRIAL. The robot usually tries many times to make an accurate dead-corner model, which is shown by our experiment.
Thus, the first objective of our environment modeling through learning by the robot itself is realized. A model of a dead corner normally consists of three points. Please see the example in Fig. 4.
We treat the model points of a dead corner as the genes of a path. Therefore, we activate the Mutation module of the Genetic Algorithm to set every gene of a path as the current goal for some time, in different orders, during the robot's approach to the task goal. With different orders of the genes treated as the current goal, different paths will be found. Additionally, our sub-goal method in path finding is also guaranteed to explore the environment thoroughly and to find an optimal path. More details of these topics will be presented in our following papers.
3. Experiment results and an error function.
In order to show the correctness of most issues of our sub-goal learning theory, we present a dynamic experiment result here. In this experiment, the robot encounters a dynamic obstacle first, a theoretical dead corner second and a particular physical dead corner finally. (See Fig. 6.)
Because the interacting environment is wholly unknown to the robot and our robot has to learn to get a correct model of its environment, the robot cannot get a correct model of this dynamic obstacle without trying. Fig. 5 is the TRIAL graphic of this dynamic experiment.
In Table 3.0, DO means the dynamic obstacle. The red coordinates (97, 440) mean that the 1st sub-goal, which is chosen first, is unreachable: it is in the body of Obstacle 1. DC1 is the first Dead Corner. DL means the state of Dead Lock. Under DL, S means the start step of the Dead Lock and E is the last step of this Dead Lock. TRIALs 1(3) means that the first TRIAL is 3 steps long. SG1 means the 1st Sub-goal of this Dead Corner. SP means the position of the Dead Lock. MP means the last point of this Dead Corner's model.
Table 3.0. A dynamic experiment result
[Table 3.0 appears only as an image in the original document.]
From this experiment, we can derive a TRIAL function on Environment Modeling through learning, as formulation (7) below, which, in the sense of the modeling task, is an error function. Here, S_n means the number of steps of the robot from its initial position to the position where it encounters a Dead Corner; t means time. A process is considered a TRIAL if during it the current goal is a sub-goal and not the task goal. A TRIAL consists of several steps.

S(t) = S_n + 2 * S_min * T(t) + P_min * T(t) * [1 + T(t)]    (7)

P_min means the minimal number of moving steps during one TRIAL. T(t) means the number of trying times at
time t, that is, the number of TRIALs. S_min means the minimal step-size of a dead corner; it can be any
integer number. In this dynamic experiment, we let S_min = 3 steps to speed up the process.
According to this error function, we can get the size of one side of a Dead Corner from the corresponding error function, which is

Length_DC = (S_min + 1) * T(t)    (8)
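As a numeric illustration (using our reconstruction of the OCR-damaged formulas (7) and (8), so the exact coefficients are uncertain):

```python
def trial_steps(s_n, t_trials, s_min=3, p_min=1):
    """Error function (7) as we reconstruct it:
    S(t) = S_n + 2*S_min*T(t) + P_min*T(t)*(1 + T(t))."""
    return s_n + 2 * s_min * t_trials + p_min * t_trials * (1 + t_trials)

def dead_corner_length(t_trials, s_min=3):
    """Formulation (8): one side of a Dead Corner from the TRIAL count."""
    return (s_min + 1) * t_trials

print(trial_steps(s_n=10, t_trials=3))   # total steps after 3 TRIALs: 40
print(dead_corner_length(3))             # estimated side length: 12
```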
We compare the steps in the first loop of exploring and exploiting the environment with the steps in the second loop of original path optimization in Fig. 5.1 and Fig. 5.2. The results show the correctness of the learned dynamic obstacle's model and the rightness of our sub-goal OMDP theorem.
4. Conclusions and our future work.
Our HiMBS algorithm is a novel learning approach to Environment Modeling in a wholly unknown environment. It enables a robot to learn the obstacle model independently and truly autonomously. It thoroughly frees a mobile robot from the restrictions of its environment.
We made an animation simulator that simulates what an actual wholly unknown environment will be, and we successfully tested our HiMBS algorithm on it. We tried to pre-design all of the possible problems of an actual mobile robot experiment in its two-dimensional plane. For instance, the robot has a non-zero size, of arbitrary size and shape; an obstacle also has an arbitrary size and shape, and furthermore it is sometimes in a state of dynamic motion. Moreover, there is one of the most difficult problems: a difficult dead reckoning problem, etc. These problems have been successfully overcome by our HiMBS algorithm. Therefore, although there may still be some defects in our HiMBS, we believe the HiMBS algorithm is a powerful and successful algorithm for a mobile robot to find a semi-optimal path to a given goal in a wholly unknown environment. This animation test platform has also testified to our Zhinan Zen INS navigation theorem and evaluation theory. In a word, the industrial programming experience enabled us to find the essential actual issue of the sub-goal method and the new Environment Modeling learning method of TRYING for mobile robots.
A three-dimensional simulation environment will make our algorithm's behaviors more interesting and complex. Our algorithm may fail more times in the first loop; however, it will quickly correct its errors and learn successful models. Additionally, in order to make an authoritative autonomous learning robot, it is necessary for us to test our algorithm on other tasks; otherwise we cannot be sure of making a general algorithm.
How can our sub-goal method in path planning enable a robot to find other paths besides the first found path? Is our sub-goal method in path planning guaranteed to explore and exploit a wholly unknown environment thoroughly? What are the details of our path optimizing theory? Etc. These topics will be made clear in our following papers.
Finally, many essential concepts in robotics are left for us to develop continuously, such as the definition of intelligence and the least knowledge for a learning algorithm, the state concept in a computer, and how to judge whether one algorithm is better than another. Etc. This paper of ours is a beginning of the research approach of discussing the least knowledge for a learning algorithm and the state concept in a computer. This invention of ours will be the start of a new artificial intelligence approach.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1: The Markov tree structure of every path finding environment.
Fig. 2: The task description in an empty and lightless environment.
Fig. 3: The architecture of our HiMBS.
Fig. 4: Zhinan Zen Inertial Navigation System and Evaluation model (abbreviated as ZZINS EF).
Fig. 5: The model of a dead corner.
Fig. 6: Learn to model a dynamic obstacle.
Fig. 7: A successful dynamic experiment result.
BEST MODE FOR CARRYING OUT THE INVENTION
The least hardware conditions of our robot are: one step-around-robot external sensor, or a better one, for detecting obstacles; motors and wheels or other instruments for moving; an inertial encoder sensor, with no need of a gyro sensor, accelerometer or tachometer; and a controller for running our algorithm. The spirit software condition of the robot is our HiMBS executable file. Etc.
INDUSTRIAL APPLICABILITY
If a one-step-around-robot external sensor, or a better one, for detecting obstacles is available, our algorithm can be utilized in industrial applications, for example a waiter robot in restaurants, a mine searching machine, a deep-marine-object searching submarine, etc.
References
[1] R.S. Sutton, "Options over MDP", Artificial Intelligence, 1998.
[2] Marco Dorigo and Marco Colombetti, "Robot Shaping".
[3] L.P. Kaelbling, M.L. Littman and A.W. Moore, "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research, No. 4, pp. 237-285, 1996.
[4] K. Miyazaki, M. Yamamura and S. Kobayashi, "k-Certainty Exploration Method: An Action Selector on Reinforcement Learning to Identify the Environment," Journal of Japanese Society for Artificial Intelligence, Vol. 10, No. 3, pp. 454-463, 1995.
[5] J.J. Grefenstette, "Credit Assignment in Rule Discovery Systems Based on Genetic Algorithms," Machine Learning, Vol. 3, pp. 225-245, 1988.
[6] C. Breazeal, "A Motivational System for Regulating Human-Robot Interaction," Proceedings of the National Conference on Artificial Intelligence, AAAI, Menlo Park, CA, USA, 1998, pp. 54-61.
[7] T. Poggio and D. Beymer, "Learning to See," IEEE Spectrum, 1996.

Claims

Claim 1. We hope to be protected in any industrial path finding field. Any program that utilizes our sub-goal task method should obey the invention law. Any program which has a trying behavior to avoid collision and uses our sub-goal method in a wholly unknown environment is considered as utilizing our invention.
Claim 2. Any program should obey the laws when utilizing any protected idea in this invention of ours.
PCT/JP2000/002427 2000-04-13 2000-04-13 Semi-optimal path finding in a wholly unknown environment WO2001078951A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2000/002427 WO2001078951A1 (en) 2000-04-13 2000-04-13 Semi-optimal path finding in a wholly unknown environment
AU36785/00A AU3678500A (en) 2000-04-13 2000-04-13 Semi-optimal path finding in a wholly unknown environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2000/002427 WO2001078951A1 (en) 2000-04-13 2000-04-13 Semi-optimal path finding in a wholly unknown environment

Publications (1)

Publication Number Publication Date
WO2001078951A1 true WO2001078951A1 (en) 2001-10-25

Family

ID=11735918

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2000/002427 WO2001078951A1 (en) 2000-04-13 2000-04-13 Semi-optimal path finding in a wholly unknown environment

Country Status (2)

Country Link
AU (1) AU3678500A (en)
WO (1) WO2001078951A1 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2912234A1 (en) * 2007-02-06 2008-08-08 Vimades Sarl Remote controlled or self guided traveling object's i.e. vehicle, navigation determining method, involves searching viable trajectories which prohibit obstacle, while reaching target, and selecting and following effective trajectory
CN102819264A (en) * 2012-07-30 2012-12-12 山东大学 Path planning Q-learning initial method of mobile robot
CN103914560A (en) * 2014-04-16 2014-07-09 中国科学院合肥物质科学研究院 Q-learning-based multi-agent initiative recommendation method for agriculture capital electronic commerce
CN103955222A (en) * 2014-05-05 2014-07-30 无锡普智联科高新技术有限公司 Mobile robot path planning method based on multi-barrier environment
EP2827211A1 (en) * 2013-07-15 2015-01-21 BAE Systems PLC Route planning
WO2015008032A1 (en) * 2013-07-15 2015-01-22 Bae Systems Plc Route planning
US9588520B2 (en) 2015-06-18 2017-03-07 Kinpo Electronics, Inc. Positioning navigation method and electronic apparatus thereof
CN107168054A (en) * 2017-05-10 2017-09-15 沈阳工业大学 Multi-robotic task is distributed and paths planning method
CN107491068A (en) * 2017-08-29 2017-12-19 歌尔股份有限公司 Method for planning path for mobile robot, device and route design device
CN107860389A (en) * 2017-11-07 2018-03-30 金陵科技学院 Robot chamber expert walks intensified learning path navigation algorithm
CN107891423A (en) * 2017-11-08 2018-04-10 石家庄铁道大学 Intelligent exploration robot and its detection method based on Multi-sensor Fusion detection
CN108287565A (en) * 2018-01-09 2018-07-17 山东科技大学 A kind of automatic collision avoidance method of unmanned plane based on rolling window
WO2019047646A1 (en) * 2017-09-05 2019-03-14 百度在线网络技术(北京)有限公司 Obstacle avoidance method and device for vehicle
CN109799820A (en) * 2019-01-22 2019-05-24 智慧航海(青岛)科技有限公司 Unmanned ship local paths planning method based on the random road sign figure method of comparison expression
CN110908377A (en) * 2019-11-26 2020-03-24 南京大学 Robot navigation space reduction method
CN111781922A (en) * 2020-06-15 2020-10-16 中山大学 Multi-robot collaborative navigation method based on deep reinforcement learning and suitable for complex dynamic scene
CN112327862A (en) * 2020-11-16 2021-02-05 北京理工大学 Path planning method for multi-robot collaborative search in uncertain environment
CN112925319A (en) * 2021-01-25 2021-06-08 哈尔滨工程大学 Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
US11263545B2 (en) 2016-06-30 2022-03-01 Microsoft Technology Licensing, Llc Control of cyber-physical systems under uncertainty
CN114782595A (en) * 2022-06-21 2022-07-22 山东捷瑞数字科技股份有限公司 Three-dimensional engine-based shortest distance track determination method, device and medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106647757B (en) * 2016-12-23 2019-06-18 江西理工大学 Based on the robot path planning method for combining individual differential evolution
CN112947421B (en) * 2021-01-28 2022-09-13 西北工业大学 AUV autonomous obstacle avoidance method based on reinforcement learning

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
BARSHAN B. ET AL: "Inertial navigation systems for mobile robots", IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, vol. 11, no. 3, June 1995 (1995-06-01), pages 328-342, XP002160801 *
DEL R ET AL: "Reinforcement learning of goal-directed obstacle-avoiding reaction strategies in an autonomous mobile robot", ROBOTICS AND AUTONOMOUS SYSTEMS, NL, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, vol. 15, no. 4, 1 October 1995 (1995-10-01), pages 275-299, XP004001929, ISSN: 0921-8890 *
KANG D ET AL: "Path generation for mobile robot navigation using genetic algorithm", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INDUSTRIAL ELECTRONICS, CONTROL, AND INSTRUMENTATION (IECON), US, NEW YORK, IEEE, vol. CONF. 21, 6 November 1995 (1995-11-06), pages 167-172, XP000586452, ISBN: 0-7803-3027-7 *
SAMEJIMA K ET AL: "Adaptive internal state space construction method for reinforcement learning of a real-world agent", NEURAL NETWORKS, ELSEVIER SCIENCE PUBLISHERS, BARKING, GB, vol. 12, no. 7-8, October 1999 (1999-10-01), pages 1143-1155, XP004180502, ISSN: 0893-6080 *
SMITH R E ET AL: "Combined biological paradigms: A neural, genetics-based autonomous systems strategy", ROBOTICS AND AUTONOMOUS SYSTEMS, NL, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, vol. 22, no. 1, 10 November 1997 (1997-11-10), pages 65-74, XP004100314, ISSN: 0921-8890 *
SMITHERS T: "On quantitative performance measures of robot behaviour", ROBOTICS AND AUTONOMOUS SYSTEMS, NL, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, vol. 15, no. 1, 1 July 1995 (1995-07-01), pages 107-133, XP004062099, ISSN: 0921-8890 *
THRUN S ET AL: "Lifelong robot learning", ROBOTICS AND AUTONOMOUS SYSTEMS, NL, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, vol. 15, no. 1, 1 July 1995 (1995-07-01), pages 25-46, XP004062095, ISSN: 0921-8890 *
ZHANG J. ET AL: "Robust subgoal planning and motion execution for robots in fuzzy environments", PROCEEDINGS OF INTELLIGENT ROBOTS AND SYSTEMS '94 (IROS '94), 1994, pages 447-453, XP002160783 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2912234A1 (en) * 2007-02-06 2008-08-08 Vimades Sarl Method for determining the navigation of a remote-controlled or self-guided traveling object, e.g. a vehicle, which involves searching for viable trajectories that avoid obstacles while reaching the target, then selecting and following an effective trajectory (a minimal sketch of this idea follows this table)
CN102819264A (en) * 2012-07-30 2012-12-12 山东大学 Q-learning initialization method for mobile robot path planning
US9733090B2 (en) 2013-07-15 2017-08-15 Bae Systems Plc Route planning
EP2827211A1 (en) * 2013-07-15 2015-01-21 BAE Systems PLC Route planning
WO2015008032A1 (en) * 2013-07-15 2015-01-22 Bae Systems Plc Route planning
CN103914560A (en) * 2014-04-16 2014-07-09 中国科学院合肥物质科学研究院 Q-learning-based multi-agent active recommendation method for agricultural-materials e-commerce
CN103955222A (en) * 2014-05-05 2014-07-30 无锡普智联科高新技术有限公司 Mobile robot path planning method based on multi-obstacle environment
CN103955222B (en) * 2014-05-05 2016-05-11 无锡普智联科高新技术有限公司 Mobile robot path planning method based on multi-obstacle environment
US9588520B2 (en) 2015-06-18 2017-03-07 Kinpo Electronics, Inc. Positioning navigation method and electronic apparatus thereof
TWI577968B (en) * 2015-06-18 2017-04-11 金寶電子工業股份有限公司 Positioning navigation method and electronic apparatus thereof
US11263545B2 (en) 2016-06-30 2022-03-01 Microsoft Technology Licensing, Llc Control of cyber-physical systems under uncertainty
CN107168054A (en) * 2017-05-10 2017-09-15 沈阳工业大学 Multi-robot task allocation and path planning method
CN107168054B (en) * 2017-05-10 2020-11-10 沈阳工业大学 Multi-robot task allocation and path planning method
CN107491068B (en) * 2017-08-29 2020-12-04 歌尔股份有限公司 Mobile robot path planning method and device and path planning equipment
CN107491068A (en) * 2017-08-29 2017-12-19 歌尔股份有限公司 Mobile robot path planning method and device and path planning equipment
WO2019047646A1 (en) * 2017-09-05 2019-03-14 百度在线网络技术(北京)有限公司 Obstacle avoidance method and device for vehicle
CN107860389A (en) * 2017-11-07 2018-03-30 金陵科技学院 Reinforcement learning path navigation algorithm for indoor robot walking
CN107891423B (en) * 2017-11-08 2024-01-26 石家庄铁道大学 Intelligent exploration robot based on multi-sensor fusion detection and detection method thereof
CN107891423A (en) * 2017-11-08 2018-04-10 石家庄铁道大学 Intelligent exploration robot based on multi-sensor fusion detection and detection method thereof
CN108287565B (en) * 2018-01-09 2020-09-04 山东科技大学 Unmanned aerial vehicle automatic collision avoidance method based on rolling window
CN108287565A (en) * 2018-01-09 2018-07-17 山东科技大学 Unmanned aerial vehicle automatic collision avoidance method based on rolling window
CN109799820B (en) * 2019-01-22 2020-12-22 智慧航海(青岛)科技有限公司 Unmanned ship local path planning method based on comparative random road map method
CN109799820A (en) * 2019-01-22 2019-05-24 智慧航海(青岛)科技有限公司 Unmanned ship local path planning method based on comparative random road map method
CN110908377B (en) * 2019-11-26 2021-04-27 南京大学 Robot navigation space reduction method
CN110908377A (en) * 2019-11-26 2020-03-24 南京大学 Robot navigation space reduction method
CN111781922A (en) * 2020-06-15 2020-10-16 中山大学 Multi-robot collaborative navigation method based on deep reinforcement learning and suitable for complex dynamic scenes
CN111781922B (en) * 2020-06-15 2021-10-26 中山大学 Multi-robot collaborative navigation method based on deep reinforcement learning
CN112327862A (en) * 2020-11-16 2021-02-05 北京理工大学 Path planning method for multi-robot collaborative search in uncertain environment
CN112925319A (en) * 2021-01-25 2021-06-08 哈尔滨工程大学 Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
CN114782595A (en) * 2022-06-21 2022-07-22 山东捷瑞数字科技股份有限公司 Three-dimensional engine-based shortest distance track determination method, device and medium
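As a loose illustration of the "search viable trajectories, discard those that meet obstacles, follow the best survivor" scheme summarized for FR2912234A1 above, the following Python sketch rolls out candidate straight-line trajectories, filters out those that come too close to an obstacle, and follows the viable candidate ending nearest the goal. All names and parameters (candidate_trajectories, is_viable, clearance, and so on) are assumptions for illustration only; this is not the method of that patent or of any other document cited here.

import math

# Hypothetical sketch: enumerate candidate trajectories, keep those that
# stay clear of all obstacles, and pick the one ending nearest the goal.

def candidate_trajectories(pose, speeds, headings, horizon=10, dt=0.1):
    # Roll out straight-line candidates from the current (x, y) pose.
    x, y = pose
    for v in speeds:
        for th in headings:
            yield [(x + v * k * dt * math.cos(th),
                    y + v * k * dt * math.sin(th)) for k in range(1, horizon + 1)]

def is_viable(traj, obstacles, clearance=0.5):
    # Viable = every waypoint keeps at least `clearance` from every obstacle.
    return all(math.hypot(px - ox, py - oy) > clearance
               for (px, py) in traj for (ox, oy) in obstacles)

def select_trajectory(pose, goal, obstacles):
    # Among viable candidates, follow the one whose endpoint is nearest the goal.
    speeds = [0.5, 1.0]
    headings = [i * math.pi / 8.0 for i in range(16)]
    viable = [t for t in candidate_trajectories(pose, speeds, headings)
              if is_viable(t, obstacles)]
    if not viable:
        return None  # no safe motion found; caller should stop or re-plan
    return min(viable, key=lambda t: math.hypot(t[-1][0] - goal[0],
                                                t[-1][1] - goal[1]))

# Example: pick a safe trajectory from the origin toward goal (5, 5)
# around a single obstacle at (2, 2).
best = select_trajectory(pose=(0.0, 0.0), goal=(5.0, 5.0), obstacles=[(2.0, 2.0)])
print(best[-1] if best else "no viable trajectory")

Re-running this selection at each control step, from the newly observed pose, gives a simple receding-horizon behavior; richer variants would score candidates on more than endpoint distance.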

Also Published As

Publication number Publication date
AU3678500A (en) 2001-10-30

Similar Documents

Publication Publication Date Title
WO2001078951A1 (en) Semi-optimal path finding in a wholly unknown environment
Sun et al. Motion planning for mobile robots—Focusing on deep reinforcement learning: A systematic review
Montiel et al. Optimal path planning generation for mobile robots using parallel evolutionary artificial potential field
Mohanty et al. Controlling the motion of an autonomous mobile robot using various techniques: a review
Niroui et al. Robot exploration in unknown cluttered environments when dealing with uncertainty
Hatch et al. The value of planning for infinite-horizon model predictive control
Gochev et al. Path planning and collision avoidance regime for a multi-agent system in industrial robotics
Akbarimajd et al. Autonomously implemented versatile path planning for mobile robots based on cellular automata and ant colony
Lyu et al. Ttr-based reward for reinforcement learning with implicit model priors
Mousa et al. Path Planning for a 6 DoF Robotic Arm Based on Whale Optimization Algorithm and Genetic Algorithm
Cabreira et al. An extended evolutionary learning approach for multiple robot path planning in a multi-agent environment
Krothapalli et al. Mobile robot navigation using variable grid size based reinforcement learning
Egerstedt Linguistic control of mobile robots
Xiaowei et al. The path planning of mobile manipulator with genetic-fuzzy controller in flexible manufacturing cell
Garrote et al. Improving Local Motion Planning with a Reinforcement Learning Approach
Yeasmin et al. GA-based adaptive fuzzy logic controller for a robotic arm in the presence of moving obstacle
Shiltagh et al. A comparative study: Modified particle swarm optimization and modified genetic algorithm for global mobile robot navigation
Arista et al. Development of an efficient path planning algorithm for indoor navigation
Rybalov et al. Fuzzy model of control for quantum-controlled mobile robots
Zaman Robot Evolution for Autonomous Behavior in Dynamic Environments
Santos Learning from Demonstration using Hierarchical Inverse Reinforcement Learning
Mitić et al. Empirical control system development for intelligent mobile robot based on the elements of the reinforcement machine learning and axiomatic design theory
Nguyen Navigation under Obstacle Motion Uncertainty using Markov Decision Processes
Özdemır et al. Analysis and Design of a Guided Sampling based Path Planning using CNNs
EP3955081A1 (en) Method and device for determining an optimized control strategy of a robotic device in a dynamic objects environment

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC. EPO FORM 1205A DATED 09.04.2003

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase