JP4964255B2

JP4964255B2 - Autonomous mobile robot operation planning apparatus, method, program and recording medium, and autonomous mobile robot operation control apparatus and method

Info

Publication number: JP4964255B2
Application number: JP2009004996A
Authority: JP
Inventors: 洋川野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-01-13
Filing date: 2009-01-13
Publication date: 2012-06-27
Anticipated expiration: 2029-01-13
Also published as: JP2010165050A

Description

The present invention relates to a technique for planning the operation of an autonomous mobile robot such as an airship or a submarine, and in particular to a technique for planning operations that allow the autonomous mobile robot to avoid obstacles.

As a technique for planning the operation of an underactuated autonomous mobile robot with high inertia, a technique that uses a planning method based on a Markov decision process is known (see, for example, Patent Document 1 and Non-Patent Document 1).

In this technique, under an assumed flow velocity, the state transition probability P^a_{ss'} that the autonomous mobile robot in each state s ∈ {s_1, ..., s_N} transitions to each state s' ∈ {s_1, ..., s_N} when it takes each action a ∈ {a_1, ..., a_M}, and the reward R^a_{ss'} obtained on that transition, are determined first. For example, the reward R^a_{ss'} is set to 1 for a transition to a state s' that contains the arrival point, to -1 for a transition to a state s' that contains an obstacle, and to 0 for a transition to a state s' that contains no obstacle.

Next, using the state transition probability P^a_{ss'} and the reward R^a_{ss'}, the state value function V^π(s) is obtained by dynamic programming in the Markov decision process. The action a that maximizes the state value function V^π(s) is then selected, taking into account the difference between the assumed flow velocity and the actual flow velocity, and the autonomous mobile robot is controlled according to the selected action a.

Patent Document 1: JP 2007-317165 A

Non-Patent Document 1: H. Kawano, “Three Dimensional Obstacle Avoidance of Autonomous Blimp Flying in Unknown Disturbance”, Proceeding of 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 123-130, October 2006

However, in the techniques described in Non-Patent Document 1 and Patent Document 1, the reward is not determined with consideration for unexpected temporary flows that may occur in the environment in which the autonomous mobile robot acts.

Consequently, there is a problem that the operation plan is likely to fail when an unexpected temporary flow occurs in the environment in which the autonomous mobile robot acts.

To solve the above problem, for each of a plurality of mutually different temporary flows, it is determined whether the autonomous mobile robot in each state collides with an obstacle when it moves based on each action while that temporary flow is occurring. A collision probability is calculated as the sum of the occurrence probabilities of the temporary flows for which a collision is determined, a reward is determined in consideration of the collision probability, and dynamic programming is performed based on that reward.

Because the present invention determines the reward in consideration of unexpected temporary flows, the operation plan becomes less likely to fail.

FIG. 1 is a functional block diagram of an example of an operation planning apparatus for an autonomous mobile robot.
FIG. 2 is a block diagram of an example of a state transition probability calculation unit.
FIG. 3 is a functional block diagram of an example of an operation control apparatus for an autonomous mobile robot.
FIG. 4 is a schematic diagram of an autonomous mobile robot; (a) is a front view and (b) is a side view.
FIG. 5 is a diagram for explaining the displacement of the position in the horizontal direction.
FIG. 6 is a diagram for explaining the calculation of the state transition probability.
FIG. 7 is a conceptual diagram for explaining the collision determination and its omission.
FIG. 8 is a continuous conceptual diagram of the set S_g of temporary flows for which a collision occurs.
FIG. 9 is a discrete conceptual diagram of the set S_g of temporary flows for which a collision occurs.
FIG. 10 is a flowchart showing an example of an operation control method for an autonomous mobile robot.
FIG. 11 is a diagram for explaining the processing of the collision determination unit.
FIG. 12 is a flowchart showing an example of an operation control method for an autonomous mobile robot.

[Markov decision process]
First, an outline of the Markov decision process in reinforcement learning, which is the background knowledge needed to understand the present invention, is given.

Let S = {s_1, s_2, ..., s_N} denote the set of discrete states that make up the environment, and let A = {a_1, a_2, ..., a_M} denote the set of actions the agent can take. When the agent executes an action a ∈ A in a state s ∈ S of the environment, the environment transitions stochastically to a state s' ∈ S. This transition probability (also called the state transition probability) is expressed as
P^a_{ss'} = Pr{s_{t+1} = s' | s_t = s, a_t = a}.
At this time a reward r is given stochastically from the environment to the agent, and its expected value is written as
R^a_{ss'} = E{r_t | s_t = s, a_t = a, s_{t+1} = s'}.

The prime symbol attached to the state s' serves to distinguish it from the state s. The prime symbol is also sometimes used to denote a time derivative, but its meaning can easily be told from whether the object it is attached to is a state of the Markov state transition model, so this notation is followed in the rest of the description.

To evaluate how much an action taken at a given time step t contributed to subsequent reward acquisition, consider the time series of rewards obtained afterwards. The evaluation of this reward time series is called the value. The goal of the agent is to maximize the value, or equivalently to find a policy π(s, a) that maximizes the value. The policy π(s, a) means taking the action a in the state s and is defined for each of the pairs of state s and action a. The value is the sum of the rewards, each discounted over time by the discount rate γ (0 ≦ γ < 1). That is, the state value function V^π(s), which is the value of the state s under a policy π, is defined as follows, where E_π denotes the expected value.
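The defining equation is not reproduced in this text. A standard form consistent with the description above (rewards discounted by γ and summed, with the expectation taken under the policy π) would be the following; the exact indexing of the reward sequence is an assumption, chosen here so that r_t is the reward for the transition from s_t to s_{t+1}.

```latex
V^{\pi}(s) = E_{\pi}\left\{ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k} \;\middle|\; s_t = s \right\}
```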

Here the state value function V^π(s), which is the value of the state s under the policy π, is adopted as the value function, but the action value function Q^π(s, a), which is the value of taking the action a in the state s under the policy π, may be adopted instead.

The goal of the agent is to find an optimal policy π, that is, a policy π whose value function (the state value function V^π(s) in the above example) is, for every state s, not inferior to that obtained under any other policy π. The search for this policy π is expressed by the Bellman equation. If the values of the state transition probability P^a_{ss'} and the reward R^a_{ss'} are fixed for every combination of state s, action a, and transition destination state s', the optimal state value function V^π(s), action value function Q^π(s, a), and policy π can be calculated by dynamic programming (see, for example, R. S. Sutton and A. G. Barto, "Reinforcement Learning", translated by Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing, 1998, pp. 94-118). Because dynamic programming is a well-known technique, its description is omitted.
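Although the text omits the dynamic programming procedure, a minimal sketch may help make the role of P^a_{ss'} and R^a_{ss'} concrete. The following uses value iteration, one common dynamic programming method; the patent does not say which variant it uses, and the function name, the nested-list layout `P[s][a][s2]`, and the default parameters are all assumptions made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-9, max_iter=10000):
    """P[s][a][s2]: transition probability, R[s][a][s2]: expected reward.

    Returns (V, policy) where V[s] is the state value and policy[s] the greedy action index.
    """
    n_states = len(P)
    n_actions = len(P[0])

    def q_value(V, s, a):
        # Expected return of taking action a in state s, then following V
        return sum(P[s][a][s2] * (R[s][a][s2] + gamma * V[s2]) for s2 in range(n_states))

    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [max(q_value(V, s, a) for a in range(n_actions)) for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:  # stop once the values no longer change
            break
    policy = [max(range(n_actions), key=lambda a: q_value(V, s, a)) for s in range(n_states)]
    return V, policy
```

For a two-state toy problem in which action 1 moves from state 0 to an absorbing goal state with reward 1, this returns a policy that selects action 1 in state 0.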

[Operation planning apparatus and method for autonomous mobile robot]
An embodiment of the operation planning apparatus and method for an autonomous mobile robot is described.

An example of the autonomous mobile robot that serves as the agent is described with reference to FIGS. 4(a) and 4(b). The autonomous mobile robot has a main thruster 101, a vertical thruster 102, a rudder 103, a gondola 104, a flow velocity difference acquisition unit 21, and a position measurement unit 25. This autonomous mobile robot cannot move directly sideways. Because the robot has more degrees of freedom of motion than can be controlled by its onboard actuators (the main thruster 101, the vertical thruster 102, and the rudder 103), it is an underactuated robot. This embodiment uses an airship-type autonomous mobile robot, but any autonomous mobile robot, such as an underwater robot like an unmanned underwater explorer, may be used. The actions the autonomous mobile robot can take in each action unit time T are determined according to its onboard actuators.

The autonomous mobile robot navigates a space filled with a fluid whose flow velocity is not fixed. The space is modeled discretely by a Markov state transition model and consists of four dimensions: the two-dimensional coordinates (x, y), the azimuth angle ψ, and the turning speed ψ' of the autonomous mobile robot. Each dimension is discretized according to the resolution of the sensor that measures the physical quantity of that dimension. In this space, the departure position and arrival position of the autonomous mobile robot are determined in advance, and obstacles are placed.

The autonomous mobile robot located in the state s that contains the departure position selects one action a from a predetermined set of actions. It then moves according to that action a for the predetermined action unit time T and arrives at the transition destination state s'. In this state s', it again selects one action a from the predetermined set of actions, moves according to that action for the action unit time T, and arrives at the transition destination state s''. By repeating this action selection and state transition, the autonomous mobile robot, which starts in the state containing the departure point, tries to reach the state containing the predetermined arrival point placed in the space. The operation planning apparatus for the autonomous mobile robot makes the operation plan for this purpose.

<Step S1 (FIG. 10)>
The state transition probability calculation unit 1 (FIG. 1) calculates, under a predetermined flow, the state transition probability P^a_{ss'} that the autonomous mobile robot in each state s transitions to each state s' when it takes each action a (step S1). That is, it calculates the state transition probability P^a_{ss'} for every combination of state s, action a, and transition destination state s'. The calculated state transition probability P^a_{ss'} is sent to the dynamic planning unit 2.

An example of a method for calculating the state transition probability P^a_{ss'} is described. In this example, the state transition probability calculation unit 1 includes a target speed calculation unit 11, a displacement amount calculation unit 12, and a probability calculation unit 13, as illustrated in FIG. 2.

<< Step S11 >>
The target speed calculation unit 11 (FIG. 2) of the state transition probability calculation unit 1 determines the target speed when the autonomous mobile robot takes each action a in each state s (step S11). The target speed is sent to the displacement amount calculation unit 12. For example, for each action a, the turning speed ψ'_τ(t) and the longitudinal speed v_{xwτ}(t) of the autonomous mobile robot are set as its target speed according to the following formula. Here (b_1, b_2) is the two-dimensional vector corresponding to the action a in each state s of the Markov state transition model, α is a predetermined turning acceleration, β is a predetermined longitudinal acceleration, τ is the elapsed time from the start of each action a, ψ'_{τ0} is the turning speed of the autonomous mobile robot at the start of the action a, and v_{x0} is its longitudinal speed at the start of the action a.

The turning acceleration α and the longitudinal acceleration β are set so as not to exceed the performance limits of the autonomous mobile robot. The longitudinal speed v_{x0}(t) and the longitudinal acceleration β are expressed as the speed and the acceleration of the airframe relative to the fluid, respectively.
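The formula for the target speed in step S11 is not reproduced in this text. One plausible reading of the listed symbols, assumed here purely for illustration, is a constant-acceleration profile in which the action vector (b_1, b_2) scales the turning acceleration α and the longitudinal acceleration β. The function name `target_speeds` and this functional form are not taken from the patent.

```python
def target_speeds(b, alpha, beta, tau, psi_dot0, v_x0):
    """Assumed constant-acceleration target speeds at elapsed time tau.

    b        : (b1, b2), the two-dimensional vector associated with action a
    alpha    : predetermined turning acceleration
    beta     : predetermined longitudinal acceleration (relative to the fluid)
    psi_dot0 : turning speed at the start of the action
    v_x0     : longitudinal speed at the start of the action
    """
    b1, b2 = b
    psi_dot = psi_dot0 + b1 * alpha * tau  # target turning speed
    v_x = v_x0 + b2 * beta * tau           # target longitudinal speed
    return psi_dot, v_x
```

Under this assumption, an action such as b = (1, -1) would mean "turn harder while slowing down", and b = (0, 0) would hold the current turning speed and longitudinal speed.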

<< Step S12 >>
The displacement amount calculation unit 12 calculates how much the X coordinate and Y coordinate of the robot's position in the horizontal plane of the world coordinate system, the azimuth angle ψ, and the turning speed ψ' each change when the autonomous mobile robot in each state s moves according to each action a under the predetermined flow (step S12). The calculated displacements are sent to the probability calculation unit 13.

The predetermined flow is, for example, the assumed flow or a flow faster than the assumed flow. As described later in the section [Operation control apparatus and method for autonomous mobile robot], a correction is applied as appropriate when the predetermined flow differs from the measured flow velocity, so an approximate flow is sufficient. The predetermined flow may also be set to zero. However, the closer the assumed flow is to the measured flow velocity, the more accurate the operation plan and the operation control based on it become. Setting the predetermined flow to the assumed flow therefore increases the accuracy of the operation plan and of the operation control based on it. Furthermore, when the calculation is performed under a flow that points in the same direction as the direction from the departure position to the arrival position and is faster than the assumed flow, the robot receives the flow from behind and its turning radius becomes larger. The operation plan is then made under conditions stricter than reality, so a safer operation plan is obtained.

Let D_X(ψ_0, a) denote the displacement of the X coordinate of the robot's position in the horizontal plane, D_Y(ψ_0, a) the displacement of the Y coordinate, D_ψ(ψ_0, a) the displacement of the azimuth angle ψ, and D_{ψ'}(ψ_0, a) the displacement of the turning speed ψ'. These displacements are given by the following equations (see FIG. 5).

Here ψ_0 is the azimuth angle at the start of each state s, T is the action unit time taken to transition from the state s to the next state s', f_mx is the X component of the predetermined flow, and f_my is the Y component of the predetermined flow. No correction for the influence of wind is applied to the displacement D_ψ(ψ_0, a) of the azimuth angle ψ or to the displacement D_{ψ'}(ψ_0, a) of the turning speed ψ', because the turning speed ψ' is itself controlled. The action unit time can be set to, for example, 15 seconds.
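The displacement equations themselves are not reproduced in this text. As a rough sketch of what step S12 computes, the following integrates assumed planar kinematics over one action unit time T: the robot moves forward at its fluid-relative speed along its heading and additionally drifts with the predetermined flow (f_mx, f_my), while the heading and turning speed evolve without any flow correction, as the text states. The constant-acceleration model, the Euler integration, the step `dt`, and the function name `displacement` are all assumptions; angles are in radians.

```python
import math

def displacement(psi0, psi_dot0, v_x0, b, alpha, beta, T, f_mx, f_my, dt=0.01):
    """Return (D_X, D_Y, D_psi, D_psi_dot) for one action under an assumed model."""
    b1, b2 = b
    x = y = 0.0
    psi, psi_dot, v_x = psi0, psi_dot0, v_x0
    steps = int(round(T / dt))
    for _ in range(steps):
        # Fluid-relative forward motion plus drift with the predetermined flow
        x += (v_x * math.cos(psi) + f_mx) * dt
        y += (v_x * math.sin(psi) + f_my) * dt
        # Heading and turning speed are not corrected for the flow
        psi += psi_dot * dt
        psi_dot += b1 * alpha * dt
        v_x += b2 * beta * dt
    return x, y, psi - psi0, psi_dot - psi_dot0
```

With no turning and no acceleration, the result reduces to straight-line motion at the sum of the robot's speed and the flow, which is an easy sanity check for the sketch.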

<< Step S13 >>
The probability calculation unit 13 calculates the state transition probability P^a_{ss'} based on the displacement D_X(ψ_0, a) of the X coordinate of the robot's position in the horizontal plane, the displacement D_Y(ψ_0, a) of the Y coordinate, the displacement D_ψ(ψ_0, a) of the azimuth angle ψ, and the displacement D_{ψ'}(ψ_0, a) of the turning speed ψ' (step S13).

First, suppose that the state s is represented by a grid cell spanning the four dimensions of the X coordinate and Y coordinate of the robot's position in the horizontal plane, the azimuth angle ψ, and the turning speed ψ', and define that cell as R(s) (see FIG. 6). Then define R_t(s) as the cell R(s) translated by the displacement vector (D_X(ψ_0, a), D_Y(ψ_0, a), D_ψ(ψ_0, a), D_{ψ'}(ψ_0, a)) composed of the above displacements.

Here, when the autonomous mobile robot is in the state s, it is assumed to be located with equal probability at any point of the four-dimensional cell R(s) that represents the state s. Under this assumption, the state transition probability P^a_{ss'} can be obtained in proportion to the volume of the overlap between R_t(s) and each R(s'). Here R(s') is a cell that overlaps R_t(s); that is, R(s') is the four-dimensional cell corresponding to a candidate transition destination state s' when the action a is taken in the state s. R_t(s) may overlap up to eight cells R(s').

Let V_0(s, s', a) denote the volume of the overlap between R_t(s) and a given R(s'), and let Σ_{s'} V_0(s, s', a) denote the total volume of the overlaps between R_t(s) and all R(s'). The state transition probability P^a_{ss'} can then be obtained by the following equation.
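From the definitions above, the equation amounts to P^a_{ss'} = V_0(s, s', a) / Σ_{s'} V_0(s, s', a). A minimal sketch of this overlap-volume calculation for an axis-aligned grid follows. The integer cell indexing, the `widths` argument, and the function name are assumptions for illustration; the sketch also ignores the edges of the space and the wrap-around of the azimuth angle.

```python
import math
from itertools import product

def transition_probabilities(cell, displacement, widths):
    """Return {target_cell: probability} for a grid cell shifted by `displacement`.

    cell         : tuple of integer grid indices, one per dimension
    displacement : shift along each dimension, in physical units
    widths       : grid cell width along each dimension
    """
    per_dim = []
    for k, d, w in zip(cell, displacement, widths):
        shift = d / w
        n = math.floor(shift)
        frac = shift - n
        # Along this axis the shifted cell covers (1 - frac) of cell k+n
        # and frac of cell k+n+1.
        options = [(k + n, 1.0 - frac)]
        if frac > 0.0:
            options.append((k + n + 1, frac))
        per_dim.append(options)
    probs = {}
    for combo in product(*per_dim):
        target = tuple(idx for idx, _ in combo)
        # Overlap volume as a fraction of the cell volume; these fractions sum to 1,
        # so they equal V_0 divided by the total overlap volume.
        probs[target] = math.prod(weight for _, weight in combo)
    return probs
```

For example, shifting a unit cell by half a cell width along X alone splits the probability evenly between two neighbouring cells.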

By repeating the processing of steps S11 to S13 as appropriate, the state transition probability P^a_{ss'} is obtained for every combination of state s, action a, and transition destination state s' (this concludes the description of step S1).

Next, the data stored in the flow occurrence probability storage unit 3 is described. The flow occurrence probability storage unit 3 (FIG. 1) stores a plurality of mutually different temporary flows (f_g, ψ_g) and the probability that each temporary flow (f_g, ψ_g) occurs. A temporary flow may be described in a polar, orthogonal, or oblique coordinate system; in this example it is described in a polar coordinate system. f_g is the speed of the temporary flow and ψ_g is its direction. A temporary flow is specified by the pair of the speed f_g and the direction ψ_g, and the temporary flow with speed f_g and direction ψ_g is also written (f_g, ψ_g). The range of temporary flows that can occur is defined by a set S in the f_g-ψ_g plane.

For example, with f_gmax denoting the maximum flow velocity that has occurred in the past in the environment in which the autonomous mobile robot acts, the set S is defined as follows.

S = {(f_g, ψ_g) | 0 ≦ f_g ≦ f_gmax, 0 ≦ ψ_g ≦ 360°}
p_g(f_g, ψ_g) is the probability distribution of the temporary flow. It is a discrete probability when handled on a computer, but p_g(f_g, ψ_g) is sometimes called a probability density function for the sake of conceptual explanation. When temporary flows are treated as continuous, their probability distribution can be expressed by the probability density function p_g(f_g, ψ_g). By definition, integrating the probability density function p_g(f_g, ψ_g) over the set S gives 1.

1 = ∫ p_g(f_g, ψ_g) df_g dψ_g
Because the direction ψ_g of a temporary flow that may actually occur cannot be predicted, the probability density function p_g(f_g, ψ_g) is assumed to follow a normal distribution with respect to the speed f_g only, and is defined, for example, by the following equation, where σ is the variance and μ is the mean.

Alternatively, μ = 0 may be used with the domain of the speed f_g restricted to f_g > 0. In this case, the probability density function p_g(f_g, ψ_g) is defined as follows so that 1 = ∫ p_g(f_g, ψ_g) df_g dψ_g holds.

The probability density function p_g(f_g, ψ_g) may also be defined using statistical data on flow velocities measured in the past in the environment in which the autonomous mobile robot acts.

The flow occurrence probability storage unit 3 stores data for representing these probability density functions p_g(f_g, ψ_g). For example, the f_g-ψ_g plane is represented discretely by dividing it into grid cells of width Δf_g and Δψ_g, and the value of the temporary flow probability p_g(f_g, ψ_g) is defined for the values f_g, ψ_g at the representative point of each cell (for example, the center point of the cell). The sum of p_g(f_g, ψ_g) is then 1, that is, Σ_S p_g(f_g, ψ_g) = 1.
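A minimal sketch of how such a discrete table might be built is shown below. It assumes a uniform distribution over direction and a zero-mean normal shape over speed, evaluated at cell centers and then normalized so that the probabilities sum to 1. The density equations are not reproduced in this text, so this shape is an assumption. The function name, the dictionary layout, and the treatment of `sigma` as a standard deviation are also assumptions (the text calls σ the variance).

```python
import math

def flow_probability_table(f_gmax, sigma, n_f, n_psi):
    """Return {(f_g, psi_g): probability} over cell centers of the f_g-psi_g plane."""
    d_f = f_gmax / n_f        # cell width in speed
    d_psi = 360.0 / n_psi     # cell width in direction, degrees
    weights = {}
    for i in range(n_f):
        f_g = (i + 0.5) * d_f                          # cell-center speed
        w = math.exp(-f_g ** 2 / (2.0 * sigma ** 2))   # zero-mean normal shape in speed
        for j in range(n_psi):
            psi_g = (j + 0.5) * d_psi                  # cell-center direction
            weights[(f_g, psi_g)] = w                  # same weight for every direction
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}
```

Normalizing after discretization guarantees that the sum over S of p_g equals 1 regardless of the grid resolution, which is the only property the later collision probability calculation relies on.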

<Step S2>
The collision determination unit 4 determines whether the autonomous mobile robot in each state s collides with an obstacle when it moves based on each action a under the predetermined flow (f_x, f_y) and each temporary flow (f_g, ψ_g) (step S2). Information about the temporary flows (f_g, ψ_g) for which a collision with an obstacle is determined is stored in the collision flow storage unit 5. For example, as described later, information about the set S_g of grid cells of the temporary flows (f_g, ψ_g) for which a collision is determined is stored in the collision flow storage unit 5.

A specific example of the processing of step S2 is described with reference to FIG. 11. First, the collision determination unit 4 selects, from the set S in the f_g-ψ_g space, the grid cell for which both the speed f_g and the direction ψ_g take their minimum values (step S21). It sets τ = 0 (step S22). It calculates the values of D_Xg(s, a, f_g, ψ_g) and D_Yg(s, a, f_g, ψ_g) defined by the above equations and adds to them the values of X and Y in the state s, respectively (step S23); that is, it calculates D_Xg(s, a, f_g, ψ_g) + X and D_Yg(s, a, f_g, ψ_g) + Y. It then determines whether the position given by D_Xg(s, a, f_g, ψ_g) + X and D_Yg(s, a, f_g, ψ_g) + Y is an obstacle (step S24). If the position is determined not to be an obstacle, it determines whether τ < T (step S25). If τ < T, it increments τ by a predetermined value (step S26) and returns to step S23. If τ < T does not hold, it determines whether the speed f_g has reached its maximum value f_gmax, that is, whether f_g < f_gmax still holds (step S27). If f_g < f_gmax, it increments the speed f_g (step S28) and proceeds to step S22. If f_g < f_gmax does not hold, it proceeds to step S210.

If the position is determined to be an obstacle in step S24, the collision determination unit 4 determines that the autonomous mobile robot in the currently selected state s collides with an obstacle when it moves based on the currently selected action a under the flow (f_x, f_y) assumed in the environment in which the robot acts and the temporary flow (f_g, ψ_g) of the currently selected grid cell, and it adds the grid cell of the currently selected temporary flow (f_g, ψ_g) to the set S_g of temporary flows for which a collision occurs (step S29). If necessary, it may add to the set S_g not only the grid cell of the currently selected temporary flow (f_g, ψ_g) but also the grid cells of the temporary flows that have the same direction ψ_g as the currently selected temporary flow and a higher speed (step S29').

For example, if a collision is determined to occur under the temporary flow with speed f_g2 in FIG. 7, it is clear without any calculation that a collision also occurs under the temporary flow with speed f_g4, which has the same direction as f_g2 and a higher speed. The calculation of whether a collision occurs under the temporary flow f_g4 can therefore be omitted. Omitting the collision determination for such temporary flows reduces the amount of calculation. For the situation in FIG. 7, FIG. 8 shows a continuous conceptual diagram of the set S_g of temporary flows for which a collision occurs, and FIG. 9 shows a discrete conceptual diagram of the same set.

In step S210, it is determined whether the direction ψ_g > 360°. If ψ_g > 360° does not hold, the direction ψ_g is incremented by a predetermined value, the speed f_g is set to 0 (step S211), and the process proceeds to step S22. If ψ_g > 360° holds, the process ends. This process is performed for each pair of state s and action a.
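The sweep over temporary flows with the shortcut of step S29′ can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `colliding_flows`, the index-based grid, and the callback `collides` (standing in for the trajectory and obstacle check of the preceding steps) are all assumptions introduced here. It relies on the property stated above that a faster flow in the same direction also causes a collision.

```python
from typing import Callable, Set, Tuple

# A temporary-flow grid cell as (speed index, direction index); hypothetical encoding.
Flow = Tuple[int, int]


def colliding_flows(
    n_speeds: int,
    n_directions: int,
    collides: Callable[[int, int], bool],
) -> Set[Flow]:
    """Build the set S_g for one (state, action) pair.

    `collides(k, d)` is a hypothetical stand-in for the collision check
    under the temporary flow with speed index k and direction index d.
    Speed indices are assumed to be ordered from slowest to fastest.
    """
    s_g: Set[Flow] = set()
    for d in range(n_directions):
        for k in range(n_speeds):  # speed restarts from the slowest value
            if collides(k, d):
                # Step S29': every faster flow in the same direction is
                # added without running the collision check again.
                s_g.update((j, d) for j in range(k, n_speeds))
                break
    return s_g
```

With this shortcut, the check runs at most once per colliding direction after the first hit, which is where the reduction in computation comes from.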

<Step S3>
The collision probability calculation unit 6 reads from the flow occurrence probability storage unit 3 the occurrence probabilities of the temporary flows determined to cause a collision with an obstacle and adds them together, thereby calculating the collision probability P_c(s, a) that the autonomous mobile robot in each state collides with an obstacle when it moves according to each action (step S3). For example, the collision probability P_c(s, a) is calculated as the sum of the occurrence probabilities p_g(f_g, ψ_g) of the collision-causing temporary flows (f_g, ψ_g) contained in the set S_g.

Information on the temporary flows determined to cause a collision with an obstacle (for example, information on the set S_g) is read from the collision flow storage unit 5. The calculated collision probability P_c(s, a) is sent to the reward determination unit 7.
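As a rough illustration of step S3, the sum over S_g might look like the sketch below. The dictionary `p_g`, keyed by a flow's grid cell, is an assumed representation of the contents of the flow occurrence probability storage unit; the function name is likewise hypothetical.

```python
from typing import Dict, Iterable, Tuple

# A temporary-flow grid cell as (speed index, direction index); hypothetical encoding.
Flow = Tuple[int, int]


def collision_probability(s_g: Iterable[Flow], p_g: Dict[Flow, float]) -> float:
    """P_c(s, a): sum of the occurrence probabilities p_g of the
    temporary flows in S_g, for one (state, action) pair."""
    return sum(p_g[flow] for flow in s_g)
```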

<Step S4>
The reward determination unit 7 determines the reward r(s, a) obtained when the autonomous mobile robot in each state s takes each action a and transitions to each state s′, such that the reward is highest when the transition destination state s′ includes the arrival position and, when s′ does not include the arrival position, the reward is higher the smaller the collision probability P_c(s, a) (step S4). The determined reward r(s, a) is sent to the dynamic planning unit 2.

Let the function F be a monotonically decreasing function. Here, a monotonically decreasing function means a function f that satisfies f(x_1) ≥ f(x_2) for any x_1, x_2 with x_1 < x_2. By the definition of P_c(s, a), its minimum value is 0 and its maximum value is 1, so the maximum value of F is F(0) and the minimum value is F(1). Let R_max be an arbitrary constant satisfying R_max > F(0). For example, F(x) = −x and R_max = 1. The reward R^a_ss′ is then determined, for example, as follows.

When the transition destination state s′ includes the arrival position ⇒ R^a_ss′ = R_max
Otherwise ⇒ R^a_ss′ = F(P_c(s, a))
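The reward rule can be written out directly using the example choices F(x) = −x and R_max = 1 given in the text. The function and parameter names below are hypothetical and only illustrate the two cases.

```python
def reward(p_c: float, reaches_arrival: bool, r_max: float = 1.0) -> float:
    """Reward R^a_ss' for one (s, a, s') combination.

    p_c             -- collision probability P_c(s, a), in [0, 1]
    reaches_arrival -- True if the destination state s' includes the
                       arrival position
    r_max           -- constant R_max; must exceed F(0)
    """
    def F(x: float) -> float:
        # Example monotonically decreasing function from the text.
        return -x

    if reaches_arrival:
        return r_max
    return F(p_c)
```

Because R_max > F(0) ≥ F(P_c(s, a)), reaching the arrival position always yields a strictly higher reward than any other transition.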
<Step S5>
Using the state transition probabilities and the rewards, the dynamic planning unit 2 obtains, for each state, an index representing how easily the autonomous mobile robot reaches the arrival position, based on dynamic programming in the Markov decision process (step S5). As this index, any of the state value function V^π(s), the action value function Q^π(s, a), and the policy π can be used, for example. The calculated index is stored in the index storage unit 17.

As explained in the section [Markov decision process], once the state transition probability P^a_ss′ and the reward R^a_ss′ have been calculated for each combination of state s, action a, and transition destination state s′, the state value function V^π(s), the action value function Q^π(s, a), and the policy π can be calculated based on dynamic programming.
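One common dynamic-programming scheme that fits this description is value iteration; the sketch below is offered only as an illustration and is not necessarily the procedure used in the embodiment. The nested-list layout `P[s][a][t]` and `R[s][a][t]`, the discount factor `gamma`, the tolerance `tol`, and the iteration cap are assumptions introduced here.

```python
from typing import List

Table = List[List[List[float]]]  # indexed as [state][action][next state]


def value_iteration(
    P: Table,
    R: Table,
    gamma: float = 0.9,   # assumed discount factor
    tol: float = 1e-9,    # assumed convergence tolerance
    max_iter: int = 10000,
) -> List[float]:
    """Return a state value table V computed from P^a_ss' and R^a_ss'."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        new_V = []
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: best expected return over actions.
            best = max(
                sum(
                    P[s][a][t] * (R[s][a][t] + gamma * V[t])
                    for t in range(n_states)
                )
                for a in range(len(P[s]))
            )
            new_V.append(best)
            delta = max(delta, abs(best - V[s]))
        V = new_V
        if delta < tol:
            break
    return V
```

The resulting table plays the role of the index stored for each state; a larger value means the arrival position is easier to reach from that state.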

By determining the reward R^a_ss′ with unexpected temporary flows (f_g, ψ_g) taken into account in this way, and performing dynamic programming based on this reward R^a_ss′, the generated dynamic plan becomes less likely to break down.

This concludes the description of the embodiment of the operation planning apparatus and method for an autonomous mobile robot.

[Operation control apparatus and method for autonomous mobile robot]
An embodiment of an operation control apparatus and method for an autonomous mobile robot, which controls the operation of the robot using the generated dynamic plan (= index), is described below with reference to FIG. 3 and FIG. 12.

<Step A1 (FIG. 12)>
The flow velocity difference acquisition unit 21 (FIG. 3) obtains the flow velocity difference, which is the difference between the predetermined flow assumed at the time of operation planning and the measured value of the flow velocity (step A1). The obtained flow velocity difference is sent to the transition destination prediction unit 22. Let f_x and f_y be the X and Y components of the predetermined flow, and let f_xa and f_ya be the X and Y components of the actual flow velocity. The flow velocity differences d_fx and d_fy are then expressed as follows.

d_fx = f_x − f_xa
d_fy = f_y − f_ya
When the predetermined flow assumed at the time of operation planning is set to 0, the flow velocity difference acquisition unit 21 can obtain the flow velocity difference from the measured value of the current flow velocity alone.
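Step A1 amounts to a component-wise subtraction. A minimal sketch, with a hypothetical function name and tuple-based arguments, following the sign convention of the two formulas above:

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def flow_velocity_difference(planned: Vec2, measured: Vec2) -> Vec2:
    """Return (d_fx, d_fy) = (f_x - f_xa, f_y - f_ya).

    planned  -- (f_x, f_y), the predetermined flow assumed at planning time
    measured -- (f_xa, f_ya), the measured flow velocity
    """
    f_x, f_y = planned
    f_xa, f_ya = measured
    return (f_x - f_xa, f_y - f_ya)
```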

<Step A2>
The transition destination prediction unit 22 obtains the transition destination state s′ for the case where the autonomous mobile robot takes each action, by shifting the position of the autonomous mobile robot by the flow velocity differences d_fx and d_fy (step A2). The obtained transition destination state s′ is sent to the action determination unit 23.

An example of how to obtain the transition destination state s′ is described below.
The displacement D_Xa(ψ_0, a) of the position of the autonomous mobile robot in the X-axis direction when the flow velocity difference d_fx is taken into account, and the displacement D_Ya(ψ_0, a) of its position in the Y-axis direction when the flow velocity difference d_fy is taken into account, are respectively expressed as follows.

The transition destination prediction unit 22 first obtains the actual displacement D_Xa(ψ_0, a) in the X-axis direction and the actual displacement D_Ya(ψ_0, a) in the Y-axis direction using the above equations, that is, by shifting the position of the autonomous mobile robot by the flow velocity differences d_fx and d_fy.

The transition destination prediction unit 22 then obtains the transition destination state s′ for the case where action a is taken, using the following equations. Specifically, the transition destination state s′ is obtained by adding the actual displacement D_Xa(ψ_0, a) in the X-axis direction, the actual displacement D_Ya(ψ_0, a) in the Y-axis direction, the displacement D_ψ(ψ_0, a) of the azimuth angle, and the displacement D_ψ′(ψ_0, a) of the turning speed ψ′ to the X-axis position X(s), the Y-axis position Y(s), the azimuth angle ψ_0(s), and the turning speed ψ′_0(s) at the start of action a, respectively.

For the X-axis position X(s), the Y-axis position Y(s), the azimuth angle ψ_0(s), and the turning speed ψ′_0(s), the values measured by the position measurement unit 25 are used. For D_X(ψ_0, a) and D_Y(ψ_0, a), the values calculated at the time of operation planning may be reused. In that case, D_X(ψ_0, a) and D_Y(ψ_0, a) are stored in a storage unit (not shown), and the transition destination prediction unit 22 reads them as needed. The transition destination prediction unit 22 may of course calculate them again instead.

<Step A3>
The action determination unit 23 compares the indices for the transition destination states s′, obtained by the transition destination prediction unit 22 for movement according to each action a that can be taken in state s, and determines the action a that most easily reaches the arrival point (step A3). The determined action a is sent to the control unit 24.

When the state value function V^π(s) is used as the index, the rewards have been determined so that the reward obtained on transitioning to a state that includes the arrival point is the highest. The action a that maximizes the value of the state value function V^π(s) is therefore the action that most easily reaches the arrival point.

Accordingly, the action determination unit 23 refers to the index storage unit 17, obtains the state value function V^π(s′) in the transition destination state s′ for movement according to each action a that can be taken in state s, and compares these values to determine the action a that maximizes V^π(s′).
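The selection in step A3 can be sketched as an argmax over the predicted destination states. In the sketch below, `predict_next_state` is a hypothetical stand-in for the transition destination prediction of step A2 (its internals are not reproduced here), and `V` is an assumed dictionary view of the index storage unit.

```python
from typing import Callable, Dict, Hashable, Sequence

State = Hashable
Action = Hashable


def choose_action(
    state: State,
    actions: Sequence[Action],
    predict_next_state: Callable[[State, Action], State],
    V: Dict[State, float],
) -> Action:
    """Pick the action a whose predicted destination s' has the largest V(s')."""
    return max(actions, key=lambda a: V[predict_next_state(state, a)])
```

If several actions lead to the same maximal value, Python's `max` returns the first of them in the order given by `actions`; the source does not specify a tie-breaking rule.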

<Step A4>
The control unit 24 controls the autonomous mobile robot so that it moves according to the determined action a (step A4). Specifically, it controls the main thruster 101 and the rudder 103 of the autonomous mobile robot so that the target speed corresponding to action a can be maintained.

[Modifications, etc.]
As mentioned above, the action value function Q^π(s, a) or the policy π may be used as the index. For example, when the action value function Q^π(s, a) is used as the index, the dynamic planning unit 2 calculates Q^π(s, a) based on dynamic programming, using the state transition probability P^a_ss′ and the reward R^a_ss′ for each combination of state s, action a, and transition destination state s′. The action determination unit 23 (FIG. 3) then compares the values of the action value function Q^π(s, a) to determine the action that most easily reaches the arrival point. Specifically, when Q^π(s, a) is defined so that a larger value indicates that the arrival point is reached more easily, let a be an action that can be taken in the pre-transition state s and a′ be an action that can be taken in the transition destination state s′. The values of max_a′ Q^π(s′, a′) are compared, and the action a that maximizes max_a′ Q^π(s′, a′) is selected.
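The Q-based variant differs from the V-based selection only in how a destination state is scored. A minimal sketch, assuming Q is held as a nested dictionary `Q[state][action]` and that `predict_next_state` is a hypothetical stand-in for the transition destination prediction:

```python
from typing import Callable, Dict, Hashable, Sequence

State = Hashable
Action = Hashable


def choose_action_q(
    state: State,
    actions: Sequence[Action],
    predict_next_state: Callable[[State, Action], State],
    Q: Dict[State, Dict[Action, float]],
) -> Action:
    """Pick the action a that maximizes max_a' Q(s', a'),
    where s' is the predicted destination of taking a in `state`."""
    def best_next_value(a: Action) -> float:
        s_next = predict_next_state(state, a)
        return max(Q[s_next].values())

    return max(actions, key=best_next_value)
```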

The processing functions of the operation planning apparatus and the operation control apparatus for an autonomous mobile robot described above can be realized by a computer. In that case, the processing that each apparatus should perform is described by a program, and executing this program on a computer realizes the processing functions of these apparatuses on the computer.

The program describing this processing can be recorded on a computer-readable recording medium. In this embodiment, the apparatuses are configured by executing a predetermined program on a computer, but at least part of the processing may instead be realized in hardware.

The present invention is not limited to the embodiments described above and may be modified as appropriate without departing from the spirit of the invention.

DESCRIPTION OF SYMBOLS
1 State transition probability calculation unit
11 Target speed calculation unit
12 Displacement calculation unit
13 Probability calculation unit
2 Dynamic planning unit
3 Occurrence probability storage unit
4 Collision determination unit
5 Storage unit
6 Collision probability calculation unit
7 Reward determination unit
17 Index storage unit
21 Flow velocity difference acquisition unit
22 Transition destination prediction unit
23 Action determination unit
24 Control unit
25 Position measurement unit

Claims

An operation planning apparatus for an autonomous mobile robot, which performs operation planning so that an autonomous mobile robot reaches an arrival position from a departure position in a space that is filled with a fluid of indefinite flow velocity and in which obstacles are arranged, the autonomous mobile robot repeatedly moving for an action unit time according to an action determined every action unit time, wherein
the space is discretely modeled by a Markov state transition model having four dimensions, namely the two-dimensional coordinates (x, y), the azimuth angle ψ, and the turning speed ψ′ of the autonomous mobile robot, the apparatus comprising:
a state transition probability calculation unit that calculates a state transition probability, which is the probability of transitioning to each state when the autonomous mobile robot in each state takes each action and moves for the action unit time under a predetermined flow;
a flow occurrence probability storage unit that stores the probability of occurrence of each of a plurality of mutually different temporary flows;
a collision determination unit that determines, for each temporary flow, whether the autonomous mobile robot in each state collides with an obstacle when it moves according to each action under the predetermined flow and that temporary flow;
a collision probability calculation unit that calculates the collision probability of the autonomous mobile robot in each state colliding with an obstacle when it moves according to each action, by reading from the flow occurrence probability storage unit and adding together the probabilities of occurrence of the temporary flows determined to cause a collision with an obstacle;
a reward determination unit that determines the reward obtained when the autonomous mobile robot in each state takes each action and transitions to each state, such that the reward is highest when the transition destination state includes the arrival position and, when the transition destination state does not include the arrival position, the reward is higher the lower the collision probability; and
a dynamic planning unit that obtains, for each state, an index representing how easily the autonomous mobile robot reaches the arrival position, based on dynamic programming in a Markov decision process using the state transition probability and the reward.

The operation planning apparatus for an autonomous mobile robot according to claim 1, wherein
when the collision determination unit determines that a collision with an obstacle occurs under a certain temporary flow, it also determines that a temporary flow having the same direction as that temporary flow and a higher speed is a temporary flow that causes a collision with the obstacle.

An operation control apparatus for an autonomous mobile robot, which, based on the operation plan determined by the operation planning apparatus for an autonomous mobile robot according to claim 1 or 2, controls an autonomous mobile robot so that it reaches an arrival position from a departure position in a space that is filled with a fluid of indefinite flow velocity and in which obstacles are arranged, the autonomous mobile robot repeatedly moving for an action unit time according to an action determined every action unit time, the apparatus comprising:
a flow velocity difference acquisition unit that obtains a flow velocity difference, which is the difference between the predetermined flow and a measured value of the flow velocity;
a transition destination prediction unit that obtains the transition destination state for the case where the autonomous mobile robot takes each action, by shifting the position of the autonomous mobile robot by the flow velocity difference;
an action determination unit that compares the indices for the transition destination states obtained by the transition destination prediction unit and determines the action that most easily reaches the arrival position; and
a control unit that controls the autonomous mobile robot so that it moves according to the determined action.

An operation planning method for an autonomous mobile robot, which performs operation planning so that an autonomous mobile robot reaches an arrival position from a departure position in a space that is filled with a fluid of indefinite flow velocity and in which obstacles are arranged, the autonomous mobile robot repeatedly moving for an action unit time according to an action determined every action unit time, wherein
the space is discretely modeled by a Markov state transition model having four dimensions, namely the two-dimensional coordinates (x, y), the azimuth angle ψ, and the turning speed ψ′ of the autonomous mobile robot, and
a flow occurrence probability storage unit stores the probability of occurrence of each of a plurality of mutually different temporary flows, the method comprising:
a state transition probability calculation step in which a state transition probability calculation unit calculates a state transition probability, which is the probability of transitioning to each state when the autonomous mobile robot in each state takes each action and moves for the action unit time under a predetermined flow;
a collision determination step in which a collision determination unit determines, for each temporary flow, whether the autonomous mobile robot in each state collides with an obstacle when it moves according to each action under the predetermined flow and that temporary flow;
a collision probability calculation step in which a collision probability calculation unit calculates the collision probability of the autonomous mobile robot in each state colliding with an obstacle when it moves according to each action, by reading from the flow occurrence probability storage unit and adding together the probabilities of occurrence of the temporary flows determined to cause a collision with an obstacle;
a reward determination step in which a reward determination unit determines the reward obtained when the autonomous mobile robot in each state takes each action and transitions to each state, such that the reward is highest when the transition destination state includes the arrival position and, when the transition destination state does not include the arrival position, the reward is higher the lower the collision probability; and
a dynamic planning step in which a dynamic planning unit obtains, for each state, an index representing how easily the autonomous mobile robot reaches the arrival position, based on dynamic programming in a Markov decision process using the state transition probability and the reward.

The operation planning method for an autonomous mobile robot according to claim 4, wherein
in the collision determination step, when it is determined that a collision with an obstacle occurs under a certain temporary flow, a temporary flow having the same direction as that temporary flow and a higher speed is also determined to be a temporary flow that causes a collision with the obstacle.

An operation control method for an autonomous mobile robot, which, based on the operation plan determined by the operation planning method for an autonomous mobile robot according to claim 4 or 5, controls an autonomous mobile robot so that it reaches an arrival position from a departure position in a space that is filled with a fluid of indefinite flow velocity and in which obstacles are arranged, the autonomous mobile robot repeatedly moving for an action unit time according to an action determined every action unit time, the method comprising:
a flow velocity difference acquisition step in which a flow velocity difference acquisition unit obtains a flow velocity difference, which is the difference between the predetermined flow and a measured value of the flow velocity;
a transition destination prediction step in which a transition destination prediction unit obtains the transition destination state for the case where the autonomous mobile robot takes each action, by shifting the position of the autonomous mobile robot by the flow velocity difference;
an action determination step in which an action determination unit compares the indices for the transition destination states obtained by the transition destination prediction unit and determines the action that most easily reaches the arrival position; and
a control step in which a control unit controls the autonomous mobile robot so that it moves according to the determined action.

An operation planning program for an autonomous mobile robot, for causing a computer to function as the operation planning apparatus for an autonomous mobile robot according to claim 1.

A computer-readable recording medium on which the operation planning program for an autonomous mobile robot according to claim 7 is recorded.