JP2014079819A

JP2014079819A - Robot cooperative conveyance planning device, method and program

Info

Publication number: JP2014079819A
Application number: JP2012227349A
Authority: JP
Inventors: Hiroshi Kawano; 洋川野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-10-12
Filing date: 2012-10-12
Publication date: 2014-05-08
Anticipated expiration: 2032-10-12
Also published as: JP5931685B2

Abstract

PROBLEM TO BE SOLVED: To provide a robot cooperative conveyance planning technology capable of reducing a calculation time required for plan calculation or a memory capacity of a computer, while taking existence of a plurality of robots into consideration.SOLUTION: A Markov state space is constituted hierarchically (an article orbital calculation processing is defined as a first hierarchy, a changeover position determination processing is defined as a second hierarchy, and a moving route planning processing is defined as a third hierarchy), and search for a movement plan is performed from a hierarchy having a low change frequency (first hierarchy), and a search range in a hierarchy having a high change frequency is restricted by using a result of search calculation in a hierarchy having a low change frequency, and search in a lower rank hierarchy (a second hierarchy, a third hierarchy) having a high change frequency is performed in the restricted state space.

Description

本発明は、ロボット協調搬送計画技術に関し、詳しくは、複数のロボットが協調して移動対象物を開始位置から目標位置まで移動させるための各ロボットの行動計画を求めるロボット協調搬送計画技術に関する。 The present invention relates to a robot cooperative transfer planning technique, and more particularly, to a robot cooperative transfer planning technique for obtaining an action plan of each robot for moving a moving object from a start position to a target position in cooperation with a plurality of robots.

近年、自律移動ロボットによる物品搬送の研究が活発に行われている。搬送の対象となる物品や搬送を行うロボットの種類もさまざまな中、複数のロボットを協調動作させて効率的な物品搬送を行う技術が求められており盛んに研究が行われている（非特許文献３）。複数のロボットによる効率的な物品の協調搬送を実現するには、複数のロボットの配置、動作順序などを事前に計画することが重要である。このような計画においては、当然ながら、複数のロボットが動作する実環境における障害物の存在や経路の形状なども十分に考慮しなければならない。 In recent years, research on article transportation by autonomous mobile robots has been actively conducted. Among various types of articles to be transported and the robots that transport them, there is a need for technology to efficiently transport articles by cooperating multiple robots (non-patented). Reference 3). In order to realize efficient cooperative conveyance of articles by a plurality of robots, it is important to plan in advance the arrangement and operation order of the plurality of robots. In such a plan, as a matter of course, it is necessary to sufficiently consider the presence of obstacles and the shape of a route in an actual environment where a plurality of robots operate.

このような計画計算を行うための効果的な手法の一つとして、マルコフ決定過程における動的計画法や強化学習の手法があり、さまざまな研究が行われている。
しかしながら、マルコフ決定過程における動的計画法や強化学習を使用してこのような計画を行おうとすると、単体のロボットを使用する場合に比べて複数のロボットを使用する場合には、その計算に要する時間や計算機の記憶容量がロボットの数に対して指数関数的に増大してしまうという問題があり、その主たる原因となるのが、探索計算のためのマルコフ状態空間内の状態数の莫大な増加である。このような問題の解決方法の一つに、たとえば、動的計画法や強化学習を使用した探索計算を行うための状態空間を階層的な構造を使用して構成する方法も考えられている（非特許文献１、非特許文献２）。 As an effective method for performing such a plan calculation, there are a dynamic programming method and a reinforcement learning method in the Markov decision process, and various studies have been conducted.
However, if such a plan is made using dynamic programming or reinforcement learning in the Markov decision process, the calculation is required when using a plurality of robots as compared to using a single robot. There is a problem that the time and the memory capacity of the computer increase exponentially with respect to the number of robots, and the main cause is the enormous increase in the number of states in the Markov state space for search calculation. It is. As one of the solutions to such a problem, for example, a method of constructing a state space for performing search calculation using dynamic programming or reinforcement learning using a hierarchical structure is also considered ( Non-patent document 1, Non-patent document 2).

Rajbara Makar, Sridhar Mahadevan, Mohammad Ghavamzaden, Hierarchical Multi-Agent Reinforcement Learning, AGENTS’01, pp246-253,2001.Rajbara Makar, Sridhar Mahadevan, Mohammad Ghavamzaden, Hierarchical Multi-Agent Reinforcement Learning, AGENTS’01, pp246-253, 2001. Bernhard Hengst, Discovering Hierarchy in Reinforcement Learning with HEXQ, 19th International Conference on Machine Learning, pp.243-250.Bernhard Hengst, Discovering Hierarchy in Reinforcement Learning with HEXQ, 19th International Conference on Machine Learning, pp.243-250. Natsuki Miyata, Jun Ota, Tamio Arai, and Hajime Asama, Cooperative Transport by Multiple Mobile Robots in Unknown Static Environments Associated With Real-Time Task Assignment , IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, VOL.18, NO.5, pp.769-780, OCTOBER 2002.Natsuki Miyata, Jun Ota, Tamio Arai, and Hajime Asama, Cooperative Transport by Multiple Mobile Robots in Unknown Static Environments Associated With Real-Time Task Assignment, IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, VOL.18, NO.5, pp.769- 780, OCTOBER 2002.

上記非特許文献１の手法では、階層的状態空間の構成そのものを人間が考案しなければならないという問題がある。上記非特許文献２の手法は、マルコフ状態空間内の状態変数のうち値の変化の頻度の低い変数を最下層として変化頻度の順に階層を構成していく手法であり、単純な問題であれば自動的な階層的状態空間の構成が可能であるが、複雑な問題に対しては自動的に階層的状態空間を構成することは難しく、実用的な問題に適用することが難しい。また、これらの従来技術は、単体のロボットを使用する場合について考慮されたものがほとんどであり、複数のロボットの使用を考慮した効果的な階層モデルの構築方法は提案されていないのが現状である。 The method of Non-Patent Document 1 has a problem that a human must devise the structure of the hierarchical state space itself. The method of Non-Patent Document 2 is a method of configuring a hierarchy in order of change frequency with a variable whose value change frequency is low among the state variables in the Markov state space as the lowest layer. Although it is possible to automatically construct a hierarchical state space, it is difficult to automatically construct a hierarchical state space for complex problems, and it is difficult to apply it to practical problems. In addition, most of these conventional technologies are considered when a single robot is used, and no effective hierarchical model construction method that considers the use of multiple robots has been proposed. is there.

このような現状に鑑みて、本発明では、複数のロボットの存在を考慮しつつも、計画計算に必要な計算時間や計算機の記憶容量を低減可能な、ロボット協調搬送計画技術を提供することを目的とする。 In view of such a current situation, the present invention provides a robot cooperative transport planning technology that can reduce the calculation time and the storage capacity of a computer necessary for the plan calculation while considering the presence of a plurality of robots. Objective.

本発明であるロボット協調搬送計画技術は次のとおりである。複数のロボットが協調して移動対象物（以下、物品という）を開始位置から目標位置まで移動させるための各ロボットの行動計画を求めるロボット協調搬送計画技術であって、物品の位置を状態変数とし、物品の移動方向を行動変数とする、マルコフ状態遷移モデルに基づき、開始位置から目標位置までの物品の移動経路をなす位置の連鎖（以下、最適経路点列という）を求める物品軌道計算処理と、最適経路点列に含まれる各位置を定義域とする物品の位置と、少なくとも各ロボットの取り得る行動の上記複数のロボット全てについての和集合に含まれる各行動を取り得る理想的なロボット（以下、理想ロボットという）の位置と、の組を状態変数とし、当該理想ロボットが取り得る行動を行動変数とする、マルコフ状態遷移モデルに基づき、最適経路点列に含まれる各位置に物品があるときに当該理想ロボットが取る行動（以下、最適行動列という）を決定して、最適経路点列のうち一つの理想ロボットが物品を移動させることのできる区間に最適経路点列を区分する位置（以下、ロボット切り替え位置という）と、最適行動列を分割して当該理想ロボットが各切り替え位置により区分された各区間に対応する行動列（以下、第１区間最適行動列という）とを、求める切り替え位置決定処理と、ロボット切り替え位置に応じて取り得る区間ごとに、当該区間に含まれる各位置を定義域とする物品の位置と、理想ロボットであって当該区間にて物品の搬送を担当していない理想ロボット（以下、第２理想ロボットという）の位置と、の組を状態変数とし、当該第２理想ロボットが取り得る行動を行動変数とする、マルコフ状態遷移モデルに基づき、当該区間よりも後の区間において物品を移動させることが可能な位置まで第２理想ロボットが移動するために、当該区間において第２理想ロボットが取るべき行動の系列（以下、第２区間最適行動列という）を求める移動経路計画処理と、各ロボットに、ロボット相互間で矛盾が無いように、第１区間最適行動列と第２区間最適行動列を割り当てるスケジューリング処理とを含む。 The robot cooperative transfer planning technique according to the present invention is as follows. This is a robot cooperative transfer planning technique for obtaining an action plan of each robot for moving a moving object (hereinafter referred to as an article) from a start position to a target position in cooperation with a plurality of robots, and using the position of the article as a state variable. An article trajectory calculation process for obtaining a chain of positions (hereinafter referred to as an optimum path point sequence) that forms a movement path of an article from a start position to a target position, based on a Markov state transition model in which the movement direction of the article is an action variable; An ideal robot that can take each action included in the union of all of the plurality of robots with the position of the article having each position included in the optimum path point sequence as a domain and at least the action that each robot can take ( Based on the Markov state transition model, in which a pair of the position of the ideal robot) is a state variable, and an action that the ideal robot can take is a behavior variable. , The action taken by the ideal robot when there is an article at each position included in the optimum route point sequence (hereinafter referred to as the optimum behavior sequence), and one ideal robot in the optimum route point sequence moves the article A position (hereinafter referred to as a robot switching position) that divides the optimal path point sequence into sections that can be used, and an action sequence (hereinafter referred to as the robot) that divides the optimal behavior sequence and divides the ideal robot by each switching position For each section that can be taken according to the robot switching position, the position of the article having each position included in the section as a domain, and the ideal robot And a state variable is a set of a position of an ideal robot (hereinafter referred to as a second ideal robot) that is not in charge of conveying articles in the section, and the second ideal robot Since the second ideal robot moves to a position where the article can be moved in the section after the section based on the Markov state transition model using the possible behavior as a behavior variable, the second ideal in the section The movement path planning process for obtaining a sequence of actions to be taken by the robot (hereinafter referred to as the second section optimum action sequence) and the first section optimum action sequence and the second section so that there is no contradiction among the robots. And a scheduling process for assigning an optimal action sequence.

スケジューリング処理では、ロボット切り替え位置に応じて取り得る区間ごとに、当該区間の始点位置を定義域とする物品の位置と、当該区間に含まれる各位置にて物品を押せるかもしくは各位置まで物品を押してきた位置を定義域とする各ロボットの位置と、各ロボットが当該区間の直前の区間における行動を識別するための情報と、の組を状態変数とし、各ロボットが物品を移動させる行動あるいは物品を移動させることが可能な位置までの移動あるいは待機行動を行動変数とする、マルコフ状態遷移モデルに基づき、当該状態変数に含まれる各変数に対応する各ロボットの行動を決定し、最適経路点列の最初から順に各変数に対応する行動を各ロボットに割り当てることによって、各ロボットに第１区間最適行動列と第２区間最適行動列を割り当てるようにしてもよい。 In the scheduling process, for each section that can be taken according to the robot switching position, the position of the article with the start point position of the section as the domain of definition and the article can be pushed at each position included in the section, or the article can be moved to each position. Actions or articles in which each robot moves an article with a state variable as a set of the position of each robot whose position is the pressed position and information for each robot to identify the action in the section immediately preceding the section Based on the Markov state transition model, where the behavior variable is the movement to the position where the robot can be moved or the standby behavior, the behavior of each robot corresponding to each variable included in the state variable is determined, and the optimal path point sequence By assigning actions corresponding to each variable to each robot in order from the beginning, the first section optimum action sequence and the second section optimum action are assigned to each robot. The may be allocated.

また、切り替え位置決定処理では、ロボット切り替え位置を、さらに、複数のロボットに含まれる一つのロボットが物品を移動させることのできる区間に区分する位置で書き換えたものを、ロボット切り替え位置としてもよい。 In the switching position determination process, the robot switching position may be rewritten at a position where the robot switching position is further divided into sections where one robot included in the plurality of robots can move the article.

また、移動経路計画処理では、各区間において、同じ位置で待機している、上記複数のロボットに含まれるロボットが、当該区間において移動させられている物品との衝突を回避するための行動も求めるようにしてもよい。 Further, in the movement route planning process, the robot included in the plurality of robots waiting at the same position in each section also obtains an action for avoiding a collision with an article moved in the section. You may do it.

本発明に拠れば、詳しくは後述するが、マルコフ状態空間を階層的に構成し（物品軌道計算処理を第１階層、切り替え位置決定処理を第２階層、移動経路計画処理を第３階層とする）、動作計画の探索を変化頻度の低い階層（第１階層）から行い、そして、変化頻度の低い階層での探索計算の結果を使用して、変化頻度の高い階層における探索の範囲を限定し、限られた状態空間において変化頻度の高い下位の階層（第２階層、第３階層）における探索を行う。つまり、階層化によって階層が増えるに従い空間に含まれる変数が増えていくが、本発明によると各変数の定義域が狭められていく。このため、動作計画の計算に必要な計算時間や計算機の記憶容量を低減可能となっている。 According to the present invention, as will be described in detail later, the Markov state space is hierarchically configured (the article trajectory calculation process is the first hierarchy, the switching position determination process is the second hierarchy, and the movement route planning process is the third hierarchy. ), The search of the action plan is performed from the layer with the low change frequency (the first layer), and the search calculation result in the layer with the low change frequency is used to limit the search range in the layer with the high change frequency. In the limited state space, a search is performed in a lower hierarchy (second hierarchy, third hierarchy) having a high change frequency. That is, the number of variables included in the space increases as the number of hierarchies increases, but according to the present invention, the definition area of each variable is narrowed. For this reason, it is possible to reduce the calculation time required for calculating the operation plan and the storage capacity of the computer.

ロボットの移動方向と物品搬送可能方向を例示する図。The figure which illustrates the moving direction of a robot, and an article conveyance possible direction. 搬送される物品の形状例と、物品を移動方向に移動させるための必要条件となるロボットの位置を説明するための図。The figure for demonstrating the example of the shape of the articles | goods conveyed, and the position of the robot used as a necessary condition for moving an article | item in a moving direction. 複数のロボットによる物品搬送任務の環境を例示する図。The figure which illustrates the environment of the goods conveyance mission by a some robot. 搬送計画装置の機能構成例を示す図。The figure which shows the function structural example of a conveyance planning apparatus. 物品軌道計算部の機能構成例を示す図。The figure which shows the function structural example of an article | item trajectory calculation part. 切り替え位置決定部の機能構成例を示す図。The figure which shows the function structural example of a switching position determination part. 移動経路計算部の機能構成例を示す図。The figure which shows the function structural example of a movement route calculation part.

[問題設定]
複数のロボットが協調して移動対象物を開始位置から目標位置まで搬送する任務は、例えば図３に例示するような壁で区切られた部屋にある物品（移動対象物）を開始位置から目標位置まで複数のロボットの協調行動によって搬送するものである。任務を行うロボットは、p台（p≧２）であり、各々のロボットは、物品を二次元平面におけるY軸方向（この例では、図面の紙面上下方向）にのみ押して動かすことができるロボット（図１(a)参照）、または、物品を二次元平面におけるX軸方向（この例では、図面の紙面左右方向）にのみ押して動かすことができるロボット（図１(b)参照)、または、物品を二次元平面におけるX-Y軸方向（この例では、図面の紙面上下左右の四方向）に押して動かすことができるロボット（図１（c）参照）のいずれかであるとする。なお、各ロボットは、二次元平面におけるX-Y軸方向（この例では、図面の紙面上下左右の四方向）に移動可能とする。 [Problem settings]
The task of a plurality of robots cooperating to transport a moving object from the start position to the target position is, for example, an article (moving object) in a room partitioned by walls as illustrated in FIG. 3 from the start position to the target position. It is transported by cooperative action of multiple robots. The robots that perform the mission are p units (p ≧ 2), and each robot can push and move an article only in the Y-axis direction on the two-dimensional plane (in this example, the vertical direction of the drawing). 1 (see FIG. 1 (a)), or a robot (see FIG. 1 (b)) that can push and move the article only in the X-axis direction (in this example, the left-right direction on the drawing) in the two-dimensional plane, or the article Is one of the robots (see FIG. 1 (c)) that can be pushed and moved in the XY-axis direction (in this example, the four directions up, down, left and right in the drawing) in the two-dimensional plane. Each robot is movable in the XY-axis directions on the two-dimensional plane (in this example, the four directions up, down, left and right in the drawing).

また、本発明を具体的に説明する観点から物品の平面図での形状は、例えば図２に示すように、複数の正方形のブロックが組み合わさった形状をしているものと仮定する。物品の位置は、物品を構成するブロックのうちのいずれか一つの格子位置（物品位置基準点）によって定義される。本実施形態では、図２に示すように、物品を構成するブロックのうち左下端のブロックを物品位置基準点とする。 Further, from the viewpoint of specifically describing the present invention, it is assumed that the shape of the article in a plan view is a shape in which a plurality of square blocks are combined as shown in FIG. The position of the article is defined by the grid position (article position reference point) of any one of the blocks constituting the article. In the present embodiment, as shown in FIG. 2, the block at the lower left corner among the blocks constituting the article is used as the article position reference point.

このような問題設定において、物品を目標位置に適切に移動させるには、ロボット同士の協調動作が不可欠であり、かつ障害物が存在しているので、どのようなタイミングで、どのロボットを「物品を押すことが可能な位置」に移動させておくか、を正確に計画しなければならない。 In such a problem setting, in order to properly move the article to the target position, cooperative operation between the robots is indispensable, and obstacles exist. You have to plan exactly whether you want to move it to a "possible position".

それぞれのロボットは、移動しようとする方向に障害物がある場合、もしくは物品を押そうとする方向に障害物がある場合に物品をその方向に押そうとした場合には、静止をするものと仮定する。複数のロボットは同一の位置に存在可能であると仮定し、ロボット同士の衝突はないものとする。 Each robot will be stationary if there is an obstacle in the direction of movement or if there is an obstacle in the direction of pushing the article and the article is pushed in that direction. Assume. It is assumed that a plurality of robots can exist at the same position, and there is no collision between the robots.

それぞれのロボットi（iはロボット番号を表す）の初期位置を（Xri0,Yri0）、物品の開始位置を（Xb0,Yb0）、物品の搬送先の目標位置を（Xbe,Ybe）とするとき、本問題は、初期位置に配置されたロボットが、物品を開始位置から目標位置まで移動させるための行動計画を求めることと定義できる。 When the initial position of each robot i (i represents the robot number) is (Xri0, Yri0), the start position of the article is (Xb0, Yb0), and the target position of the article transport destination is (Xbe, Ybe), This problem can be defined as a behavior plan for a robot placed at an initial position to move an article from a start position to a target position.

このような問題に対して単純にマルコフ状態遷移モデルを適用しようとする場合、マルコフ状態空間は、iをロボット番号としたとき、ロボットiの位置（Xri,Yri）、ロボットiの行動di、物品の位置（Xb,Yb）によって構成される。各状態（ロボットの位置と行動、物品の位置）は離散値で表現される。部屋をX,Yの直交座標系からなる２次元平面で表すと、X軸、Y軸をそれぞれ離散化表現した値により各位置を表現する。つまり、図３のように部屋（２次元平面）は格子で区切られ、各格子が各位置に対応する。また、各格子において、障害物の「ある／なし」が予め設定されている。 When a Markov state transition model is simply applied to such a problem, the Markov state space is the position of robot i (Xri, Yri), the action di of robot i, the article, where i is the robot number. (Xb, Yb). Each state (robot position and action, article position) is represented by discrete values. When a room is represented by a two-dimensional plane composed of an orthogonal coordinate system of X and Y, each position is represented by a discrete representation of the X axis and the Y axis. That is, as shown in FIG. 3, the room (two-dimensional plane) is divided by a grid, and each grid corresponds to each position. In each grid, “present / none” of the obstacle is set in advance.

また、行動主体は部屋に配置されている各ロボットとなる。ロボットi（iはロボット番号）の行動hiは、静止、上下左右方向への１格子分の移動、の計５種類のうちのいずれかを取る。例えば、hi∈{0,1,2,3,4}として、
0: 静止
1: 二次元平面上で右方向に１格子だけ移動する
2: 二次元平面上で上方向に１格子だけ移動する
3: 二次元平面上で左方向に１格子だけ移動する
4: 二次元平面上で下方向に１格子だけ移動する
とする。 The action subject is each robot arranged in the room. The action hi of the robot i (i is a robot number) takes one of a total of five types: stationary, movement of one lattice in the vertical and horizontal directions. For example, hi hi {0,1,2,3,4}
0: stationary
1: Move one grid to the right on the 2D plane
2: Move one grid upward on a two-dimensional plane
3: Move one grid to the left on the 2D plane
4: Suppose that one grid moves downward on a two-dimensional plane.

このような任務環境におけるマルコフ状態空間は、物品数×2＋ロボット数×2の次元数の状態を持ち、かつ選択可能な行動数は、ロボットの行動（＝５通り）のロボット数乗だけ存在する。例えば、物品数が１、ロボット数が２で、部屋の縦横方向の格子数がそれぞれ１０であるとすれば状態数は10×10×10×10×10×10=1000000にもなり、探索計算に要する資源の量は膨大なものとなる。さらにロボット数が１台増えるごとに、その状態数は100倍増加していくことになり、複数ロボットを使用する場合の大きな問題となっている。 The Markov state space in such a mission environment has a state of dimensions of the number of articles x 2 + the number of robots x 2, and the number of selectable actions exists by the number of robots that is the number of robot actions (= 5). . For example, if the number of articles is 1, the number of robots is 2, and the number of grids in the vertical and horizontal directions of the room is 10, the number of states will be 10 × 10 × 10 × 10 × 10 × 10 = 1000000, and search calculation The amount of resources required for this is enormous. Further, every time the number of robots increases, the number of states increases 100 times, which is a big problem when using multiple robots.

また、このような問題において上記非特許文献２にあるＨＥＸＱのような手法を適用すると、値の変化の頻度が低い物品位置の変数（Xb,Yb）を最上位の階層の状態空間を構成する変数とし、変化の頻度の高いロボットの位置の変数（Xri,Yri）を下層の階層の状態空間を構成する変数として階層化されることになる。しかし、物品位置が変化することでロボットの動作可能な範囲は刻々と変化する。そのような場合、ＨＥＸＱによれば最下層から探索を行うこととなるのだが、最下層においてロボットの動作計画をした場合に、刻々と変化する物品位置変数の値を考慮しないで探索を行っても不適切な結果しか得られないのは明白である。このように、最下層においても上位の階層の変数を考慮しなければならない、もしくは逆のケースも発生することが多いのが一般的な探索の問題である。或る変数は、最上位の階層にも含まれうるし、最下位の階層にも含まれうるのであり、単純に変数を各階層に割り振ることは容易ではない。 In addition, when a technique such as HEXQ in Non-Patent Document 2 is applied in such a problem, the variable (Xb, Yb) of the article position whose value change frequency is low constitutes the state space of the highest hierarchy. The variable (Xri, Yri) of the position of the robot with a high change frequency is hierarchized as a variable constituting the state space of the lower hierarchy. However, the range in which the robot can operate changes every moment as the article position changes. In such a case, according to HEXQ, the search is performed from the lowermost layer. However, when the robot motion plan is performed in the lowermost layer, the search is performed without considering the value of the article position variable that changes every moment. It is clear that only inappropriate results can be obtained. As described above, it is a general search problem that a variable in an upper hierarchy must be taken into consideration at the lowest layer or vice versa. A certain variable can be included in the highest hierarchy or in the lowest hierarchy, and it is not easy to simply assign a variable to each hierarchy.

[本発明における行動の扱いについて]
本来、物品の移動は、物品とは離れた位置にあるロボットが、物品を押すことのできる適切な位置まで移動した後に、「押す動作」を行うことで実現される。それを完璧に考慮するためには、物品位置（Xb,Yb）とその周囲に存在する各ロボットｉ位置（Xri,Yri）との相対的な位置関係を、障害物の存在も含めて考慮しなければならない。しかし、それをはじめからすべて考慮することは、探索計算の効率上適切ではない。そこで、本発明では、階層的に、ロボットの物品周りでの行動範囲を広げて考慮していく方法を採用する。以下、物品周りでの行動範囲を制限されたロボットを仮想ロボットとも呼ぶことにし、それぞれの階層にて仮想ロボットが物品を押すという前提で探索計算を行っていく。 [Handling of behavior in the present invention]
Originally, movement of an article is realized by performing a “pushing operation” after a robot located at a position away from the article moves to an appropriate position where the article can be pushed. In order to consider it perfectly, consider the relative positional relationship between the position of the article (Xb, Yb) and the surrounding robot i position (Xri, Yri), including the presence of obstacles. There must be. However, considering all of this from the beginning is not appropriate for the efficiency of search computation. Therefore, in the present invention, a method is adopted in which the range of action around the robot article is expanded and considered hierarchically. Hereinafter, a robot whose action range around the article is limited is also referred to as a virtual robot, and search calculation is performed on the assumption that the virtual robot presses the article at each level.

まず、第１階層（物品軌道計算処理）においては、行動主体を物品とし、物品が自分で移動するかのように扱う。つまり、第１階層では、物品と仮想ロボットの位置関係のうち、仮想ロボットが物品を押せる位置（仮想ロボットが物品に接し、かつ物品を適切な方向に押せる位置）だけを考慮する。そして、物品が開始位置から目標位置に移動するための最適な移動経路（最適経路点列）を求める。 First, in the first hierarchy (article trajectory calculation process), an action subject is an article, and the article is handled as if it were moving by itself. In other words, in the first hierarchy, only the position where the virtual robot can push the article (position where the virtual robot touches the article and pushes the article in an appropriate direction) among the positional relationship between the article and the virtual robot is considered. Then, an optimum movement route (optimum route point sequence) for the article to move from the start position to the target position is obtained.

次に、第２階層（切り替え位置決定処理）では、物品が仮想ロボットｖに押されて移動した後、当該仮想ロボットｖが引き続き次の物品移動をできる位置まで移動可能かを考慮する。つまり、仮想ロボットｖが物品に接してはいるが、指定の方向に物品を押せる場所にはいない場合も考慮され、仮想ロボットｖの取りうる行動として「仮想ロボットが物品を押せる位置に至るまでの移動」という行動が含まれる分、最上位階層よりも仮想ロボットｖの物品周りでの行動範囲が拡大される。そして、第１階層で求めた物品の最適な移動経路の系列の中で、１台の仮想ロボットｖが継続して物品を移動できる区間内の仮想ロボットｖの行動と、別のロボットに引き継がざるを得なくなる位置（切り替え位置）を求める。 Next, in the second hierarchy (switching position determination process), after the article is pushed and moved by the virtual robot v, it is considered whether or not the virtual robot v can continue to move to a position where the next article can be moved. In other words, a case where the virtual robot v is in contact with the article but is not in a place where the article can be pushed in a specified direction is considered, and the action that the virtual robot v can take is “until the virtual robot reaches a position where the article can be pushed. Since the action “movement” is included, the action range around the article of the virtual robot v is expanded more than the highest hierarchy. Then, in the sequence of the optimal movement route of the article obtained in the first hierarchy, the action of the virtual robot v in the section in which one virtual robot v can continuously move the article and the other robot are not taken over. Find the position (switching position) that no longer gets.

さらに、第３階層（移動経路計画処理）では、物品が仮想ロボットｖによって何度か移動させられた後に、その移動先にある物品の次の移動を他の仮想ロボットｗが実現することが可能な位置まで仮想ロボットｗが移動可能かどうかを考慮する。つまり、仮想ロボットｗに関して、物品から離れた位置にある状態が考慮される。そして、第１階層で求めた物品の最適な移動経路の系列のうち、１台のロボットが継続して移動可能な区間において、物品が当該区間の終端点に到達したとき（移動は仮想ロボットｖが行う）に、仮想ロボットｗが次の区間の開始位置に物品を移動させることが可能な位置に到達するための、仮想ロボットｗの行動（移動経路）を求める。なお、仮想ロボットｖが物品を押している間、仮想ロボットｗは、所定の位置で待機しなければならない場合も起こりうるが、その場合に、仮想ロボットｗが物品の移動の邪魔にならないように、適切に移動中物品との衝突を回避するための行動も同様に第３階層にて求める。 Further, in the third hierarchy (movement route planning process), after the article is moved several times by the virtual robot v, another virtual robot w can realize the next movement of the article at the destination. Consider whether the virtual robot w can move to a certain position. That is, a state in which the virtual robot w is located away from the article is considered. Then, when the article reaches the end point of the section in the section in which one robot can continuously move in the sequence of the optimal movement route of the article obtained in the first hierarchy (movement is a virtual robot v The action (movement route) of the virtual robot w for reaching the position where the virtual robot w can move the article to the start position of the next section is obtained. In addition, while the virtual robot v is pushing the article, the virtual robot w may have to wait at a predetermined position. In this case, the virtual robot w does not interfere with the movement of the article. An action for appropriately avoiding a collision with the moving article is also obtained in the third hierarchy.

このように、第１階層、第２階層、第３階層と処理が進むにつれて、仮想ロボット（ｖまたはｗ）と、仮想ロボット（ｖまたはｗ）が動かそうとしている物品と、の位置的な距離は広がっていく。しかし、第２階層での探索は、物品が第１階層において求められた軌道に沿ってのみ移動することを前提として行われ、第３階層での探索は、第２階層にて求められた単一のロボットで連続して物品が移動させられる区間の中での探索を行うため、探索において考慮しなければならない物品位置の範囲は徐々に狭められていくのである。そして、第２階層では物品を押すロボット（仮想ロボットｖ）の行動を決定し、第３階層では、仮想ロボットｖから引き継いで物品移動を担当するロボット（仮想ロボットｗ）が、物品を押せる位置に移動するための行動を決定する。最後に、仮想ロボットｖ及び仮想ロボットｗとして求まった行動を、実環境におけるどのロボットがどの順序で行うかをスケジューリング処理で決定する。 Thus, the positional distance between the virtual robot (v or w) and the article that the virtual robot (v or w) is about to move as the processing proceeds from the first hierarchy, the second hierarchy, and the third hierarchy. Will spread. However, the search in the second hierarchy is performed on the premise that the article moves only along the trajectory obtained in the first hierarchy, and the search in the third hierarchy is performed in the single hierarchy obtained in the second hierarchy. Since a search is performed in a section in which an article is continuously moved by one robot, the range of article positions that must be considered in the search is gradually narrowed. Then, in the second hierarchy, the behavior of the robot (virtual robot v) pushing the article is determined. In the third hierarchy, the robot (virtual robot w) taking over the virtual robot v and taking charge of the article movement is in a position where the article can be pushed. Determine actions to move. Finally, it is determined by scheduling processing which robot in the real environment performs in which order the behavior obtained as the virtual robot v and the virtual robot w.

[本発明の実施形態の概要]
本実施形態のロボット協調搬送計画装置Ａの構成を図４に示す。ロボット協調搬送計画装置Ａは、物品軌道計算部１００、切り替え位置決定部２００、移動経路計画部３００、スケジューリング部４００を含む。 [Outline of Embodiment of the Present Invention]
The configuration of the robot cooperative transfer planning apparatus A of this embodiment is shown in FIG. The robot cooperative transfer planning apparatus A includes an article trajectory calculation unit 100, a switching position determination unit 200, a movement route planning unit 300, and a scheduling unit 400.

物品軌道計算部１００（第１階層）、切り替え位置決定部２００（第２階層）、移動経路計画部（第３階層)では、探索に使用するマルコフ状態空間を構成する変数を以下のようにとる。 In the article trajectory calculation unit 100 (first layer), the switching position determination unit 200 (second layer), and the movement route planning unit (third layer), variables constituting the Markov state space used for the search are as follows. .

物品軌道計算部１００（第１階層）:
状態変数s1=（Xb,Yb），行動変数a1∈{0,1,2,3,4} Article trajectory calculation unit 100 (first layer):
State variable s1 = (Xb, Yb), action variable a1∈ {0,1,2,3,4}

切り替え位置決定部２００（第２階層）：
状態変数s2=(Xb,Yb,Xr,Yr)，行動変数a2∈{0,1,2,3,4} Switching position determination unit 200 (second layer):
State variable s2 = (Xb, Yb, Xr, Yr), action variable a2∈ {0,1,2,3,4}

移動経路計画部（第３階層)：
状態変数s3=(Xb,Yb,Xr,Yr)，行動変数a3∈{0,1,2,3,4,5} Travel route planning department (third layer):
State variable s3 = (Xb, Yb, Xr, Yr), action variable a3∈ {0,1,2,3,4,5}

物品軌道計算部１００は、物品位置を表す状態変数s1と、物品の移動方向を表す行動変数a1のみを考慮する。つまり、ここでの行動主体は物品であり、物品軌道計算部１００は、あたかも物品が自ら移動するものとして計算を行い、物品を押すロボットの位置は考慮しない。物品の移動方向を表す行動変数は、静止と、上下左右方向への１格子分の移動と、の５種類のいずれかを取れるものとする。物品軌道計算部１００は、これらの変数を利用して、（ロボットの位置やロボットが物品を押せる方向を考慮しないで）開始位置から目的位置までの物品移動の位置の連鎖を、最適経路点列（理想軌道）s1_trj=(s1_trj[0], s1_trj[1], …, s1_trj[e])として算出する。 The article trajectory calculation unit 100 considers only the state variable s1 representing the article position and the behavior variable a1 representing the movement direction of the article. That is, the action subject here is an article, and the article trajectory calculation unit 100 performs the calculation as if the article moves by itself, and does not consider the position of the robot pushing the article. It is assumed that the behavior variable representing the moving direction of the article can take any one of five types, that is, stationary and movement of one lattice in the vertical and horizontal directions. The article trajectory calculation unit 100 uses these variables to calculate the chain of positions of the article movement from the start position to the target position (without considering the position of the robot and the direction in which the robot can push the article), and the optimum path point sequence. (Ideal trajectory) s1_trj = (s1_trj [0], s1_trj [1], ..., s1_trj [e]).

切り替え位置決定部２００は、物品位置（Xb,Yb）と仮想ロボットの位置（Xr,Yr）からなる状態変数s2と、仮想ロボットが物品を押す方向を表す行動変数a2を考慮する。ここでの行動主体は、１台の仮想ロボットｖ（物品を押して移動させるロボット）である。 The switching position determination unit 200 considers a state variable s2 composed of an article position (Xb, Yb) and a virtual robot position (Xr, Yr), and an action variable a2 representing the direction in which the virtual robot pushes the article. The action subject here is one virtual robot v (a robot that pushes and moves an article).

切り替え位置決定部２００では、物品軌道計算部１００が求めた最適経路点列s1_trjに含まれる位置すべてが、状態変数s2に含まれる物品位置座標（Xb,Yb）の取りうる範囲となり、それ以外の物品位置は考慮されない。また、仮想ロボットｖが物品を押せる方向（行動変数a2の取りうる値）は、実環境における各ロボットが物品を押せる方向の和集合とする。 In the switching position determination unit 200, all the positions included in the optimum path point sequence s1_trj obtained by the article trajectory calculation unit 100 are within the range that can be taken by the article position coordinates (Xb, Yb) included in the state variable s2, and other than that. The article position is not considered. In addition, the direction in which the virtual robot v can push the article (the value that the action variable a2 can take) is a union of the directions in which each robot in the real environment can push the article.

切り替え位置決定部２００は、最適経路点列s1_trjの各位置の中で、仮想ロボットｖが継続して移動させることのできる最大範囲を計算する。例えば、s1_trj[5]からs1_trj[6]に物品を移動させたロボットが、さらにs1_trj[7]に物品を移動させるための位置に移動しようとしたときに、移動方向に障害物があって移動できない場合には、他の位置にいる別のロボットに行動を引き継ぐ必要がある。切り替え位置決定部２００は、最適経路点列s1_trjの各位置のうち、このように担当ロボットを切り替える必要のある場所（切り替え点）を決定する。 The switching position determination unit 200 calculates the maximum range in which the virtual robot v can continuously move in each position of the optimum route point sequence s1_trj. For example, when a robot that has moved an article from s1_trj [5] to s1_trj [6] tries to move it to a position for moving the article to s1_trj [7], it moves with an obstacle in the movement direction. If this is not possible, it is necessary to take over the action to another robot in another position. The switching position determination unit 200 determines a place (switching point) where the responsible robot needs to be switched in this way, among the positions of the optimum route point sequence s1_trj.

移動経路計画部３００は、物品位置（Xb,Yb）と仮想ロボット位置（Xr,Yr）からなる状態変数s3と、仮想ロボットの移動方向を表す行動変数a3を考慮する。ここでの行動主体は、物品を押せる位置まで移動する仮想ロボットｗ（物品を押していないロボット）である。 The movement path planning unit 300 considers a state variable s3 composed of the article position (Xb, Yb) and the virtual robot position (Xr, Yr), and an action variable a3 representing the movement direction of the virtual robot. The action subject here is a virtual robot w that moves to a position where the article can be pushed (a robot that does not push the article).

移動経路計画部３００は、s1_trj中の各位置のうち切り替え位置決定部２００によって決定された切り替え点で区切られた個々の範囲内の位置座標が、状態変数s3に含まれる物品位置（Xb,Yb）の取りうる範囲となり、それ以外の物品位置は考慮されない。また、仮想ロボットが移動できる方向（行動変数a3の取りうる値）は、a3∈｛0,1,2,3,4,5｝である。ここで、a3=5は、仮想ロボットｗは静止し、物品を押す仮想ロボットｖが物品を一回だけ押す（つまり、仮想ロボットｗが物品や物品を押すロボットｖの移動（行動）を邪魔しないように、静止する）ことを意味する。 The movement path planning unit 300 uses the position coordinates within the individual ranges delimited by the switching points determined by the switching position determination unit 200 among the positions in s1_trj as the article position (Xb, Yb) included in the state variable s3. ), And other article positions are not considered. Further, the direction in which the virtual robot can move (values that the behavior variable a3 can take) is a3ε {0, 1, 2, 3, 4, 5}. Here, when a3 = 5, the virtual robot w stops and the virtual robot v pushing the article pushes the article only once (that is, the virtual robot w does not disturb the movement (action) of the robot v pushing the article or the article) It means to be stationary.

移動経路計画部３００は、或るロボット（仮想ロボットｖ）が担当する区間の開始位置から終了位置（切り替え点）まで物品を移動を終えたときに、引き続き物品を押せる位置に仮想ロボットｗが到達しているように、仮想ロボットｗを移動させるための行動系列を決定する。 When the movement path planning unit 300 finishes moving the article from the start position to the end position (switching point) of the section handled by a certain robot (virtual robot v), the virtual robot w reaches the position where the article can be continuously pressed. As shown, the action sequence for moving the virtual robot w is determined.

物品軌道計算処理を上位の階層（第１階層）とし、切り替え位置決定処理、移動経路計画処理の順に、第２階層、第３階層とすると、本発明は、上位の階層（第１階層）で使用する変数を変化頻度の低い変数（物品位置の変数）とし、下位の階層（第３階層）で使用する変数を変化頻度の高い変数（ロボットの位置の変数）とする点では、上記非特許文献２の考え方とほぼ同じといえる。しかし、探索の計算を上記非特許文献２のように変化頻度の高い階層（第３階層）からではなく、変化頻度の低い階層（第１階層）から行う点が異なる。そして、変化頻度の低い階層での探索計算の結果を使用して、変化頻度の高い階層における探索の範囲を限定し、限られた空間（狭い空間）において変化頻度の高い下位の階層における変数の探索を行う。つまり、階層が増えるに従い空間に含まれる変数が増えていくが、その一方で各変数の定義域が狭められていく。 Assuming that the article trajectory calculation process is the upper hierarchy (first hierarchy), and the switching position determination process and the movement route planning process are the second hierarchy and the third hierarchy in this order, the present invention is the upper hierarchy (first hierarchy). The above-mentioned non-patent point is that the variable to be used is a variable with a low change frequency (article position variable), and the variable to be used at the lower hierarchy (third hierarchy) is a variable with a high change frequency (robot position variable). It can be said that it is almost the same as the concept of Reference 2. However, the difference is that the search calculation is performed not from the layer with high change frequency (third layer) as in Non-Patent Document 2, but from the layer with low change frequency (first layer). Then, by using the search calculation result in the hierarchy with low change frequency, the search range in the hierarchy with high change frequency is limited, and the variable in the lower hierarchy with high change frequency is limited in a limited space (narrow space). Perform a search. In other words, as the hierarchy increases, the number of variables included in the space increases, but on the other hand, the definition area of each variable is narrowed.

[実施形態の具体例]
以下、各部での探索計算の具体的な処理を叙述する。 [Specific Example of Embodiment]
Hereinafter, specific processing of search calculation in each unit will be described.

[物品軌道計算部；ステップＳ１]
物品軌道計算部１００は、どの物品位置（状態）s1=(Xb,Yb)がどれだけ目標位置に到達しやすい位置であるかを表す価値関数Ｖ^π(s1)を以下のマルコフ状態遷移モデルを利用して動的計画法により求め、開始位置から目標位置までの物品移動の最短経路を示す最適経路点列を出力する。 [Article Trajectory Calculation Unit; Step S1]
The article trajectory calculation unit 100 expresses a value function V ^π (s1) representing how much article position (state) s1 = (Xb, Yb) is likely to reach the target position by the following Markov state transition model. The optimum route point sequence indicating the shortest route for moving the article from the start position to the target position is output by using the dynamic programming method.

ここでは、マルコフ状態遷移モデルにおける「環境のとりうる離散的な状態の集合」を物品位置s1_i=（Xb_i,Yb_i）の集合S1=｛s1₀,s1₁,…,s1_n-1｝とする。例えば、対象とする部屋をＸ軸方向にＭ個の区間に区切り、各離散区間のインデックスを0,1,…,M-1とする。同様に、Ｙ軸方向にＮ個の区間に区切り、各離散区間のインデックスを0,1,…,N-1で表す。そして、各格子の位置をＸ軸とＹ軸のインデックス（Xb_i,Yb_j）で表す（0≦Xb_i≦M-1，0≦Yb_j≦N-1）。 Here, the set S1 = {s1 ₀ of "a set of discrete states that could be taken of the environment" the article position _{_{s1 i = (Xb i, Yb}} i) in Markov state transition _{_{model, s1 1, ..., s1 n}} -1 }. For example, the target room is divided into M sections in the X-axis direction, and the index of each discrete section is set to 0, 1,. Similarly, it is divided into N sections in the Y-axis direction, and the indices of the discrete sections are represented by 0, 1,..., N−1. The position of each lattice is represented by an index (Xb _i , Yb _j ) of the X axis and the Y axis (0 ≦ Xb _i ≦ M−1, 0 ≦ Yb _j ≦ N−1).

また、行動主体（物品）が取り得る行動a1をa1∈{0,1,2,3,4}とし、
0: 静止
1: 二次元平面上で右方向に物品が移動
2: 二次元平面上で上方向に物品が移動
3: 二次元平面上で左方向に物品が移動
4: 二次元平面上で下方向に物品が移動
とする。また、a1の取りうる値の集合A1は、A1=｛0,1,2,3,4｝である。 Also, the action a1 that the action subject (article) can take is a1∈ {0,1,2,3,4},
0: stationary
1: The article moves to the right on the two-dimensional plane.
2: The article moves upward on a two-dimensional plane
3: The article moves to the left on the 2D plane
4: An article moves downward on a two-dimensional plane. A set of values A1 that a1 can take is A1 = {0, 1, 2, 3, 4}.

物品軌道計算部１００のマルコフ状態空間では、実環境におけるどのロボットが（ロボット１とロボット２のどちらが）物品を押すのかは考慮しない。また、ロボットが物品を押せる位置まで移動するというプロセスも考慮しない。ただし、ロボットが物品を押すために必要な物品周りの位置に障害物があり、物品を押すロボットのためのスペースが確保されていない場合には、ロボットが物品を移動させることは不可能なので、その点のみを考慮する。このような場合とは、例えば、物品を右方向に移動させることを考えた場合、物品に左側に接する位置のすべてが障害物によって占められている場合などである。 In the Markov state space of the article trajectory calculation unit 100, which robot in the real environment (which robot 1 or robot 2) pushes the article is not considered. Also, the process of moving the robot to a position where the article can be pushed is not considered. However, if there is an obstacle around the position necessary for the robot to push the article, and there is not enough space for the robot pushing the article, the robot cannot move the article. Only that point is considered. Such a case is, for example, a case where the article is moved in the right direction, and the position where the article is in contact with the left side is occupied by an obstacle.

状態s1において行動a1∈A1を実行すると、状態（物品位置）は確率的に位置s1’∈S1へ遷移するものとする。その状態遷移確率P1を
P1(s1’|s1,a1)=Pr(s1_t+1=s1’|s1_t=s1,a1_t=a1)
によって表す。tは時刻を表す。そして、各s1,a1,s1’に対する状態遷移確率P(s1’|s1,a1)を次式のように設定する。
When the action a1εA1 is executed in the state s1, the state (article position) is assumed to transition to the position s1′εS1 in a probabilistic manner. The state transition probability P1
P1 (s1 '| s1, a1) = Pr (s1 _{t + 1} = s1' | s1 _t = s1, a1 _t = a1)
Is represented by t represents time. Then, the state transition probability P (s1 ′ | s1, a1) for each s1, a1, s1 ′ is set as follows:

ただし、＜条件Ａ＞は以下の(1)〜(5)のいずれかの条件を満たす場合である。
(1) a1=1 & Xb’=Xb+1 & Yb’=Yb & Judge(s1,a1)=YES
(2) a1=2 & Xb’=Xb & Yb’=Yb+1 & Judge(s1,a1)=YES
(3) a1=3 & Xb’=Xb-1 & Yb’=Yb & Judge(s1,a1)=YES
(4) a1=4 & Xb’=Xb & Yb’=Yb-1 & Judge(s1,a1)=YES
(5) a1=0 & Xb’=Xb & Yb’=Yb & Judge(s1,a1)=YES
ここで、「&」はＡＮＤ条件を示す。 However, <condition A> is a case where any one of the following conditions (1) to (5) is satisfied.
(1) a1 = 1 & Xb '= Xb + 1 &Yb' = Yb & Judge (s1, a1) = YES
(2) a1 = 2 & Xb '= Xb &Yb' = Yb + 1 & Judge (s1, a1) = YES
(3) a1 = 3 & Xb '= Xb-1 &Yb' = Yb & Judge (s1, a1) = YES
(4) a1 = 4 & Xb '= Xb &Yb' = Yb-1 & Judge (s1, a1) = YES
(5) a1 = 0 & Xb '= Xb &Yb' = Yb & Judge (s1, a1) = YES
Here, “&” indicates an AND condition.

また、Judge(s1, a1)を次式で与える。
Judge (s1, a1) is given by the following equation.

Judge(s1,a1)は、位置s1にある物品が行動a1に従った方向に移動できる場合にはYES、移動できない場合にはNOを返す関数である。実際には、物品が移動するためには、ロボットが物品をその方向に押さなければならないので、物品を押すことが可能な位置にロボットが入るスペースが空いている（障害物がない）ことが必要条件となる。Judge(s1,a1)は、この必要条件を満たすか否かをチェックする関数である。Ready_Pos_k(s1,a1)は、図２に示すように、物品位置s1にある物品を行動a1の示す方向に移動させるための、物品に接するロボットの位置のk個の候補の集合を表す。このうち、障害物に重ならないロボット位置がある（障害物位置に一致しないReady_Pos_k(s1,a1)の要素がある）場合には、位置s1にある物品をロボットが行動a1の示す方向に移動させることができる。そうでない場合には、移動させることができない。 Judge (s1, a1) is a function that returns YES if the article at position s1 can move in the direction according to action a1, and returns NO if it cannot move. Actually, in order for the article to move, the robot must push the article in that direction, so there is a space where the robot can enter the position where the article can be pushed (no obstacles). It becomes a necessary condition. Judge (s1, a1) is a function that checks whether or not this requirement is satisfied. As shown in FIG. 2, Ready_Pos_k (s1, a1) represents a set of k candidates for the position of the robot in contact with the article for moving the article at the article position s1 in the direction indicated by the action a1. Among these, when there is a robot position that does not overlap the obstacle (there is an element of Ready_Pos_k (s1, a1) that does not match the obstacle position), the robot moves the article at position s1 in the direction indicated by action a1. be able to. Otherwise it cannot be moved.

なお、Ready_Pos_k(s1,a1)の候補位置を計算するためには、例えば、物品位置基準点((Xb,Yb)=(0,0))において、行動a1の示す方向に物品を移動させるためのロボットの位置の候補Pos_1(a1), Pos_2(a1), …, Pos_k(a1)を予め記憶しておき、各位置Pos_j(a1) (j=1,2,…,k)とs1の位置座標を要素ごとに足し合わせることで計算できる。Pos_j(a1)=(XPos_j(a1),YPos_j(a1)), s1=(Xb,Yb)であれば、(XPos_j(a1)+Xb,YPos_j(a1)+Yb)を全てのjについて計算すればよい。 In order to calculate the candidate position of Ready_Pos_k (s1, a1), for example, at the article position reference point ((Xb, Yb) = (0, 0)), to move the article in the direction indicated by the action a1. Pos_1 (a1), Pos_2 (a1),…, Pos_k (a1) are stored in advance, and each position Pos_j (a1) (j = 1,2,…, k) and the position of s1 It can be calculated by adding the coordinates for each element. If Pos_j (a1) = (XPos_j (a1), YPos_j (a1)), s1 = (Xb, Yb), (XPos_j (a1) + Xb, YPos_j (a1) + Yb) is calculated for all j. That's fine.

上記＜条件Ａ＞の(1)〜(5)における前半３つの条件は、現在の物品位置s1と物品の移動方向a1の組み合わせと、状態遷移後の物品位置s1’とが整合しているか否かを判定するための条件である。例えば、＜条件Ａ＞の(1)の場合、a1=1(右方向に物品が移動）かつs1=(Xb,Yb)の場合、状態遷移後の物品位置はs1’=(Xb+1,Yb)となる。それ以外の位置に物品が移動することは有り得ない。 The first three conditions in (1) to (5) of <Condition A> are based on whether or not the combination of the current article position s1 and the article movement direction a1 matches the article position s1 ′ after the state transition. This is a condition for determining whether or not. For example, in the case of (1) of <Condition A>, when a1 = 1 (the article moves in the right direction) and s1 = (Xb, Yb), the article position after the state transition is s1 ′ = (Xb + 1, Yb). The article cannot move to any other position.

上記式（１）は、現在の物品位置s1と物品の移動方向a1の組み合わせと、状態遷移後の物品位置s1’とが整合していて、かつ、物品を押すことが可能な位置にロボットが入るスペースが空いている（障害物がない）場合には、状態遷移確率を１とする。また、この条件を満たさない状態に遷移することはできないため、状態遷移確率を０とすることを意味している。 The above formula (1) indicates that the combination of the current article position s1 and the article movement direction a1 is aligned with the article position s1 ′ after the state transition, and the robot is at a position where the article can be pushed. When the space to enter is vacant (there is no obstacle), the state transition probability is set to 1. Moreover, since it cannot change to the state which does not satisfy | fill this condition, it means making a state transition probability 0.

また、状態s1において行動a1∈A1を実行し、状態（物品位置）がs1’∈S1へ遷移したとき、環境から行動主体へ与えられる報酬R1を次式で設定する。
Further, when the action a1εA1 is executed in the state s1 and the state (article position) transits to s1′εS1, a reward R1 given from the environment to the action subject is set by the following equation.

物品軌道計算部１００は、上記のマルコフ状態空間を用いて、最適な価値関数Ｖ^π1（ｓ１）及び方策π１をダイナミックプログラミング（Dynamic Programming）法により計算する（例えば、三上貞芳、皆川雅章共訳、R.S.Sutton、A.G.Barto 原著「強化学習」森北出版、1998、pp.94-118参照）。 The article trajectory calculation unit 100 calculates the optimum value function V ^π1 (s1) and policy π1 by the dynamic programming method using the Markov state space (for example, Sadayoshi Mikami, Masaaki Minagawa) , RSSutton, AGBarto Original work “Reinforcement Learning” Morikita Publishing, 1998, pp.94-118).

物品軌道計算部１００の詳細構成を図５に示す。物品軌道計算部１００は、物品軌道計算用状態遷移モデル設定部１０１、物品軌道計算用価値関数初期化部１０２、物品軌道計算用方策更新部１０３、物品軌道計算用価値関数更新部１０４、物品軌道計算用価値関数収束判定部１０５、最適経路点列計算部１０６を含む。 A detailed configuration of the article trajectory calculation unit 100 is shown in FIG. The article trajectory calculation unit 100 includes an article trajectory calculation state transition model setting unit 101, an article trajectory calculation value function initialization unit 102, an article trajectory calculation policy update unit 103, an article trajectory calculation value function update unit 104, and an article trajectory. A calculation value function convergence determination unit 105 and an optimum path point sequence calculation unit 106 are included.

物品軌道計算用状態遷移モデル設定部１０１は、各物品位置s1と物品の移動方向a1と、状態遷移後の物品位置s1’と、の全ての組み合わせについて、上記式（１）の状態遷移確率と、上記式（２）の報酬とを設定する。 The state transition model setting unit 101 for article trajectory calculation uses the state transition probability of the above equation (1) for all combinations of each article position s1, the movement direction a1 of the article, and the article position s1 ′ after the state transition. The reward of the above formula (2) is set.

物品軌道計算用価値関数初期化部１０２は、各s1∈S1についての価値関数Ｖ^π1（s1)の初期値を設定する。初期値としては適当な値を設定すればよいが、本実施形態では、初期値はすべてのs1についてＶ^π1（s1)＝０と設定する。 The article trajectory calculation value function initialization unit 102 sets an initial value of the value function V ^π1 (s1) for each s1εS1. An appropriate value may be set as the initial value, but in this embodiment, the initial value is set to V ^π1 (s1) = 0 for all s1.

物品軌道計算用方策更新部１０３は、すべてのs1∈S1及びa1∈A1の組み合わせについて式（３）を計算し、Ｖ^π1（s1）の値が最大となる行動a1を方策π１(s1)の値に設定する。
The article trajectory calculation policy updating unit 103 calculates Expression (3) for all combinations of s1∈S1 and a1∈A1, and determines the action a1 having the maximum value of V ^π1 (s1) as the policy π1 (s1). Set to value.

ここで、Σは取りうるすべてのs1’についての総和を表し、γは予め定められた0<γ<1.0を満たす定数である。また、式（３）の右辺におけるＶ^π1(s1)の値は、物品軌道計算用方策更新部１０３の最初の実行時は物品軌道計算用価値関数初期化部１０２で設定された初期値が利用され、２回目以降の実行では、以下の物品軌道計算用価値関数更新部１０４で更新された価値関数の値が用いられる。 Here, Σ represents the sum of all possible s1 ′, and γ is a constant that satisfies a predetermined 0 <γ <1.0. The value of V ^π1 (s1) on the right side of the equation (3) is the initial value set by the article trajectory calculation value function initialization unit 102 when the article trajectory calculation policy update unit 103 is executed for the first time. In the second and subsequent executions, the value of the value function updated by the following item trajectory calculation value function updating unit 104 is used.

物品軌道計算用価値関数更新部１０４は、各s1∈S1について、式（４）により価値関数Ｖ^π（ｓ１）を更新する。
The article trajectory calculation value function updating unit 104 updates the value function V ^π (s1) with respect to each s1εS1 according to the equation (4).

物品軌道計算用価値関数収束判定部１０５は、物品軌道計算用価値関数更新部１０４における更新後のＶ^π1(s1)の値と更新前のＶ^π1(s1)との誤差が所定の閾値以下となるまで、上記物品軌道計算用方策更新部１０３に戻り、方策と価値関数の更新処理を繰り返す。所定の閾値以下となった場合には、価値関数Ｖ^π1(s1)が収束したと判定し、価値関数Ｖ^π1(s1)と方策π１(s1)を最適経路点列計算部１０６に出力する。 Article trajectory calculation value function convergence determining unit 105, and the following error is a predetermined threshold value and updating the previous V of V ^.pi.1 updated in article trajectory calculation value function update unit 104 (s1) ^π1 (s1) Until this time, the process returns to the article trajectory calculation policy updating unit 103, and the policy and value function update processing is repeated. When the value is equal to or less than the predetermined threshold, it is determined that the value function V ^π1 (s1) has converged, and the value function V ^π1 (s1) and the policy π1 (s1) are output to the optimum path point sequence calculation unit 106.

こうして求まった価値関数Ｖ^π1(s1)は、物品位置s1が、どれだけ目標位置に到達しやすい位置であるかを表す指標である。これにより、部屋内の任意の位置から目標位置までの物品移動の最短経路を示す物品位置の系列を求めることができるようになる。 The value function V ^π1 (s1) thus obtained is an index representing how easily the article position s1 can reach the target position. As a result, a series of article positions indicating the shortest path of article movement from an arbitrary position in the room to the target position can be obtained.

最適経路点列計算部１０６は、物品軌道計算用価値関数収束判定部１０５から出力された価値関数Ｖ^π1(s1)と方策π１(s1)を用いて、開始位置から目標位置までの最短経路となる物品位置の系列（最適経路点列）を算出する。例えば、開始位置をs1_trj[0]∈S1とした場合、π１(s1_trj[0])の値が指す物品の移動を行い、位置s1_trj[1]∈S1に移動する。続いてs1_trj[1]にてπ１（s1_trj[1]）の値が指す物品の移動を行い位置s1_trj[2]に移動する。これを目標位置s1_trj[e]∈S1に到達するまで繰り返すことで、開始位置から目標位置に至る物品の最短経路に含まれる途中点の点列（最適経路点列）：s1_trj=(s1_trj[0], s1_trj[1], …, s1_trj[e])を求める（s1_trj[n]=(Xb_trj[n],Yb_trj[n]) (n=0, 1, …, e)）。また、このときの物品の移動方向を示す点列a1_trj=(a1_trj[0], a1_trj[1], …, a1_trj[e])=(π１(a1_trj[0]), π１(a1_trj[1]), …, π１(a1_trj[e])を求める。そして、最適経路点列計算部１０６は、最適経路点列s1_trjと移動方向の系列a1_trjを出力する。 The optimum path point sequence calculation unit 106 uses the value function V ^π1 (s1) and the policy π1 (s1) output from the article function calculation value function convergence determination unit 105 to determine the shortest path from the start position to the target position. A series of article positions (optimum route point sequence) is calculated. For example, when the start position is s1_trj [0] ∈S1, the article indicated by the value of π1 (s1_trj [0]) is moved and moved to the position s1_trj [1] ∈S1. Subsequently, in s1_trj [1], the article indicated by the value of π1 (s1_trj [1]) is moved to move to position s1_trj [2]. By repeating this until the target position s1_trj [e] ∈S1 is reached, a point sequence of the midpoints included in the shortest path of the article from the start position to the target position (optimum path point sequence): s1_trj = (s1_trj [0 ], s1_trj [1], ..., s1_trj [e]) (s1_trj [n] = (Xb_trj [n], Yb_trj [n]) (n = 0, 1, ..., e)). In addition, the point sequence a1_trj = (a1_trj [0], a1_trj [1], ..., a1_trj [e]) = (π1 (a1_trj [0]), π1 (a1_trj [1]) indicating the moving direction of the article at this time ,..., Π1 (a1_trj [e]), and the optimum route point sequence calculation unit 106 outputs the optimum route point sequence s1_trj and the movement direction sequence a1_trj.

[切り替え位置決定部；ステップＳ２]
切り替え位置決定部２００は、最適経路点列s1_trj=(s1_trj[0], s1_trj[1], …, s1_trj[e])の各位置の中で一台のロボットが継続して押し続けることができる位置がどの区間であるかを、マルコフ状態遷移モデルを利用して動的計画法により計算する。 [Switching position determination unit; step S2]
The switching position determination unit 200 can be continuously pushed by one robot in each position of the optimum route point sequence s1_trj = (s1_trj [0], s1_trj [1],..., S1_trj [e]). Which section the position is in is calculated by dynamic programming using a Markov state transition model.

例えば、一台のロボットが継続してs1_trj[0]からs1_trj[4]までの各位置を経由して物品を移動させることが可能であれば、s1_trj[0], s1_trj[1], s1_trj[2], s1_trj[3], s1_trj[4]の位置については一台のロボットに担当させた方が得策である。逆にたとえばs1_trj[3]からs1_trj[4]に物品を移動させたロボットが、次のs1_trj[5]に物品を移動させるための方向に物品を押せる位置に移動することが障害物によって阻まれている場合、他の位置にいる別のロボットに物品移動を引き継いだ方が得策である。すなわち、切り替え位置決定部２００は、最適経路点列s1_trj=(s1_trj[0], s1_trj[1], …, s1_trj[e])の各位置での移動の中で、担当ロボットを切り替える位置（別のロボットに移動を引き継がざるを得ない位置）を計算する。 For example, if one robot can continuously move an article through each position from s1_trj [0] to s1_trj [4], s1_trj [0], s1_trj [1], s1_trj [ 2), s1_trj [3], and s1_trj [4] are better served by one robot. Conversely, for example, a robot that has moved an article from s1_trj [3] to s1_trj [4] is prevented by an obstacle from moving to a position where the article can be pushed to the next s1_trj [5] to move the article. If it is, it is better to take over the movement of the article to another robot in another position. That is, the switching position determination unit 200 switches positions of the robot in charge during the movement at each position of the optimum route point sequence s1_trj = (s1_trj [0], s1_trj [1],..., S1_trj [e]) The position where the robot must take over the movement) is calculated.

ここでは、マルコフ状態遷移モデルにおける「環境のとりうる離散的な状態の集合」を、最適経路点列s1_trjに含まれる各物品位置（Xb,Yb）とロボットの位置（Xrv,Yrv）の組の取りうる値の集合とする。つまり、s2=(Xb,Yb,Xrv,Yrv)としたとき、(Xb,Yb)∈s1_trjとする。s1_trj以外の物品位置の値は探索にて考慮されない。また、行動主体（ロボット）は、実環境における全てのロボットの行動の集合を取りうる行動とするような仮想ロボット一台のみを考慮する。 Here, the “set of discrete states that the environment can take” in the Markov state transition model is defined as the set of each article position (Xb, Yb) and robot position (Xrv, Yrv) included in the optimal path point sequence s1_trj. A set of possible values. That is, when s2 = (Xb, Yb, Xrv, Yrv), (Xb, Yb) ∈ s1_trj. Article position values other than s1_trj are not considered in the search. Further, only one virtual robot that considers a set of actions of all robots in the real environment is considered as the action subject (robot).

上述の問題設定の場合は、
ロボット１…Ｙ軸（上下）方向に物品を押す、または、静止
ロボット２…Ｘ軸（左右）方向へ物品を押す、または、静止
の行動をとることができるので、切り替え位置決定部２００における行動主体（ロボット）は、上下左右の４方向すべてに物品を押せるとする。行動主体（ロボット）が取り得る行動a2をa2∈{0,1,2,3,4}とし、
0: 静止
1: 二次元平面上で右方向にロボットが移動
2: 二次元平面上で上方向にロボットが移動
3: 二次元平面上で左方向にロボットが移動
4: 二次元平面上で下方向にロボットが移動
とする。また、a2の取りうる値の集合Ａ２をA2={0,1,2,3,4}とする。 For the above problem setting,
Since the robot 1... Pushes the article in the Y-axis (up and down) direction, or the stationary robot 2... Pushes the article in the X-axis (left and right) direction, or can take a stationary action. Assume that the subject (robot) can push the article in all four directions, up, down, left, and right. The action a2 that the action subject (robot) can take is a2∈ {0,1,2,3,4},
0: stationary
1: Robot moves to the right on the 2D plane
2: Robot moves upward on 2D plane
3: Robot moves to the left on the 2D plane
4: The robot moves downward on a two-dimensional plane. A set A2 of values that a2 can take is set to A2 = {0,1,2,3,4}.

状態s2∈S2において行動a2∈A2を実行したとき、状態s2’∈S2へ遷移する状態遷移確率を
P2(s2’|s2,a2)=Pr(s2_t+1=s2’|s2_t=s2,a2_t=a2)
とし、各s2, a2, s2’に対する状態遷移確率P2(s2’|s2,a2)を次式のように設定する。
When the action a2∈A2 is executed in the state s2∈S2, the state transition probability to transit to the state s2′∈S2 is
P2 (s2 '| s2, a2) = Pr (s2 _{t + 1} = s2' | s2 _t = s2, a2 _t = a2)
And the state transition probability P2 (s2 ′ | s2, a2) for each s2, a2, s2 ′ is set as follows:

ただし、＜条件Ｂ＞は下記(1)〜(6)のいずれかを満たす場合である。
(1) Xb=Xb’ & Yb=Yb’& a2=1 & Xrv’=Xrv+1 & Yrv’=Yrv
(2) Xb=Xb’ & Yb=Yb’& a2=2 & Xrv’=Xrv & Yrv’=Yrv+1
(3) Xb=Xb’ & Yb=Yb’& a2 =3 & Xrv’=Xrv-1 & Yrv’=Yrv
(4) Xb=Xb’ & Yb=Yb’& a2 =4 & Xrv’=Xrv & Yrv’=Yrv-1
(5) Xb=Xb’ & Yb=Yb’& a2=0 & Xrv’=Xrv & Yrv’=Yrv
(6) （Xb,Yb）=s1_trj[l+1] & (Xb’,Yb’)=s1_trj[l+2] & a2=a1_trj[l+1] & (Xrv,Yrv)=Ready_Pos_k(s1_trj[l+1],a1_trj[l+1]) && （Xrv’,Yrv’）=a1_trj[l+1]により移動した後の位置 However, <Condition B> is when the following conditions (1) to (6) are satisfied.
(1) Xb = Xb '& Yb = Yb'& a2 = 1 & Xrv '= Xrv + 1 &Yrv' = Yrv
(2) Xb = Xb '& Yb = Yb'& a2 = 2 & Xrv '= Xrv &Yrv' = Yrv + 1
(3) Xb = Xb '& Yb = Yb'& a2 = 3 & Xrv '= Xrv-1 &Yrv' = Yrv
(4) Xb = Xb '& Yb = Yb'& a2 = 4 & Xrv '= Xrv &Yrv' = Yrv-1
(5) Xb = Xb '& Yb = Yb'& a2 = 0 & Xrv '= Xrv &Yrv' = Yrv
(6) (Xb, Yb) = s1_trj [l + 1] & (Xb ', Yb') = s1_trj [l + 2] & a2 = a1_trj [l + 1] & (Xrv, Yrv) = Ready_Pos_k (s1_trj [ l + 1], a1_trj [l + 1]) && (Xrv ', Yrv') = position after moving by a1_trj [l + 1]

また、状態s2∈S2において行動a2∈A2を実行し、状態（物品位置）がs2’∈S2へ遷移したとき、環境から行動主体へ与えられる報酬R2を次式で設定する。
Further, when the action a2∈A2 is executed in the state s2∈S2 and the state (article position) transits to s2′∈S2, the reward R2 given to the action subject from the environment is set by the following equation.

以上のように定義されたP2(s2’|s2,a2)とR2(s2’,s2,a2)を使用して動的計画法にて方策π２(s2)を求めると、π２(s2)は、物品が位置s1_trj[l+1]にあるときに、物品を位置s1_trj[l+1]から位置s1_trj[l+2]に移動させるための行動a1_trj[l+1]を取れる位置の集合(Ready_Pos_k(s1_trj[l+1],a1_trj[l+1]))のうちの少なくとも一箇所に仮想ロボットが到達するための、仮想ロボットｖの行動方策（移動のための方策）となる。すなわち、s2=(Xb_trj[l+1],Yb_trj[l+1],Xrv,Yrv)における価値関数値Ｖ^π2（s2）の値が0以上の場合、状態s2が示す位置（Xrv,Yrv）にある仮想ロボットｖは、状態s2が示す物品位置s1_trj_l+1=(Xb_trj[l+1],Yb_trj[l+1])にある物品を次の位置に押すことが可能な位置（Ready_Pos_k(s1_trj[l+1],a1_trj[l+1])のいずれかの位置）に移動することが可能であることを示す。また、その時の移動は、π２(s2)が示す行動変数の値に従えばよい。 Using the above defined P2 (s2 '| s2, a2) and R2 (s2', s2, a2) to find policy π2 (s2) by dynamic programming, π2 (s2) is , When the article is at position s1_trj [l + 1], a set of positions that can take action a1_trj [l + 1] to move the article from position s1_trj [l + 1] to position s1_trj [l + 2] ( Ready_Pos_k (s1_trj [l + 1], a1_trj [l + 1])) is an action policy (a policy for movement) of the virtual robot v for the virtual robot to reach at least one place. That is, when the value of the value function value V ^π2 (s2) in s2 = (Xb_trj [l + 1], Yb_trj [l + 1], Xrv, Yrv) is 0 or more, the position (Xrv, Yrv) indicated by the state s2 The virtual robot v in the position s1_trj _{l + 1} = (Xb_trj [l + 1], Yb_trj [l + 1]) indicated by the state s2 can push the article at the next position (Ready_Pos_k ( s1_trj [l + 1] and a1_trj [l + 1])). Further, the movement at that time may follow the value of the behavior variable indicated by π2 (s2).

ここで今、位置Ready_Pos_k(s1_trj[l],a1_trj[l])にあったロボットが位置s1_trj[l]にある物品を行動a1_trj[l]の指す方向に押してロボット自身も移動したときに到達するはずの位置は最大K個存在し、それらのうち、障害物に重ならない位置は、Ready_Pos_k(s1_trj[l+1],a1_trj[l])=(Xrl[k],Yrl[k]) (k=0,1,…,K-1)と表現できる。すると、位置s1_trj[l+1]の物品位置において、K個の点s2[k]=(Xb_trj[l+1],Yb_trj[l+1],Xrl[k],Yrl[k]) (ただし、k=0,1,…,K-1、(Xrl[k],Yrl[k])∈Ready_Pos_k(s1_trj[l+1],a1_trj[l])、s2[k]∈S2)における価値関数値Ｖ^π2（s2[k]）=Ｖ^π2（Xb_trj[l+1],Yb_trj[l+1],Xrvl[k],Yrvl[k]）のうち全ての値が０より大きければ、同じロボットが続けて物品をs1_trj[l+2]に移動させることが可能であり、そうでない場合には物品をs1_trj[l+1]に移動させたロボットは障害物に阻まれてこれ以上物品を動かすことができないので、別のロボットに切り替える必要があることが分かる。 Here, it arrives when the robot at the position Ready_Pos_k (s1_trj [l], a1_trj [l]) pushes the article at the position s1_trj [l] in the direction indicated by the action a1_trj [l] and moves itself There are a maximum of K positions, and the position that does not overlap the obstacle is Ready_Pos_k (s1_trj [l + 1], a1_trj [l]) = (Xrl [k], Yrl [k]) (k = 0,1, ..., K-1). Then, at the article position at the position s1_trj [l + 1], K points s2 [k] = (Xb_trj [l + 1], Yb_trj [l + 1], Xrl [k], Yrl [k]) (however, , K = 0,1,…, K-1, (Xrl [k], Yrl [k]) ∈Ready_Pos_k (s1_trj [l + 1], a1_trj [l]), value function in s2 [k] ∈S2) If all of the values V ^π2 (s2 [k]) = V ^π2 (Xb_trj [l + 1], Yb_trj [l + 1], Xrvl [k], Yrvl [k]) are greater than 0, the same robot Can continue to move the item to s1_trj [l + 2], otherwise the robot that moved the item to s1_trj [l + 1] will move the item further due to obstruction Since it is not possible, it turns out that it is necessary to switch to another robot.

切り替え位置決定部２００の詳細構成を図６に示す。切り替え位置決定部２００は、切り替え位置決定用状態遷移モデル設定部２０１、切り替え位置決定用価値関数初期化部２０２、切り替え位置決定用方策更新部２０３、切り替え位置決定用価値関数更新部２０４、切り替え位置決定用価値関数収束判定部２０５、切り替え位置判定部２０６、方策分割部２０７を含む。 A detailed configuration of the switching position determination unit 200 is shown in FIG. The switching position determination unit 200 includes a switching position determination state transition model setting unit 201, a switching position determination value function initialization unit 202, a switching position determination policy update unit 203, a switching position determination value function update unit 204, and a switching position. A decision value function convergence determination unit 205, a switching position determination unit 206, and a policy division unit 207 are included.

切り替え位置決定用状態遷移モデル設定部２０１は、上記式（５）の状態遷移確率と式（６）の報酬とを設定する。 The switching position determination state transition model setting unit 201 sets the state transition probability of Expression (5) and the reward of Expression (6).

切り替え位置決定用価値関数初期化部２０２、切り替え位置決定用方策更新部２０３、切り替え位置決定用価値関数更新部２０４、切り替え位置決定用価値関数収束判定部２０５の各処理は、状態遷移確率と報酬が式（５）と式（６）にそれぞれ置き換わる点を除いては、物品軌道計算部１００で説明した処理と同じである。 Each process of the switching position determination value function initialization unit 202, the switching position determination policy update unit 203, the switching position determination value function update unit 204, and the switching position determination value function convergence determination unit 205 includes a state transition probability and a reward. Is the same as the processing described in the article trajectory calculation unit 100, except that is replaced by Equation (5) and Equation (6), respectively.

切り替え位置決定用価値関数初期化部２０２は、各s2∈S2についての価値関数Ｖ^π2（s2)の初期値を0に設定する。 The switching position determination value function initialization unit 202 sets the initial value of the value function V ^π2 (s2) for each s2εS2 to zero.

切り替え位置決定用方策更新部２０３は、すべてのs2∈S2及びa2∈A2の組み合わせについて次式を計算し、Ｖ^π2（s2）の値が最大となる行動a2を方策π２(s2)の値に設定する。
The switching position determination policy updating unit 203 calculates the following expression for all combinations of s2∈S2 and a2∈A2, and sets the action a2 having the maximum value of V ^π2 (s2) to the value of the policy π2 (s2). Set.

ここで、Σは取りうるすべてのs2’についての総和を表し、γは予め定められた0<γ<1.0の定数である。また、式（７）におけるＶ^π2(s2’)の値は、切り替え位置決定用方策更新部２０３の最初の実行時は切り替え位置決定用価値関数初期化部２０２で設定された初期値が利用され、２回目以降の実行では、以下の切り替え位置決定用価値関数更新部２０４で更新された価値関数の値が用いられる。 Here, Σ represents the sum of all possible s2 ′, and γ is a predetermined constant of 0 <γ <1.0. In addition, as the value of V ^π2 (s2 ′) in Expression (7), the initial value set by the switching position determination value function initialization unit 202 is used when the switching position determination policy update unit 203 is first executed. In the second and subsequent executions, the value of the value function updated by the following switching position determination value function updating unit 204 is used.

切り替え位置決定用価値関数更新部２０４は、各s2∈S2について、次式により価値関数Ｖ^π2（ｓ２）を更新する。
The switching position determining value function updating unit 204 updates the value function V ^π2 (s2) according to the following equation for each s2εS2.

切り替え位置決定用価値関数収束判定部２０５は、切り替え位置決定用価値関数更新部２０４における更新後のＶ^π2(s2)の値と更新前のＶ^π2(s2)との誤差が所定の閾値以下となるまで、上記切り替え位置決定用方策更新部２０３に戻り、方策と価値関数の更新処理を繰り返す。所定の閾値以下となった場合には、価値関数Ｖ^π2(s2)が収束したと判定し、切り替え位置決定用価値関数収束判定部２０５は、価値関数Ｖ^π2(s2)と方策π２(s2)を切り替え位置判定部２０６に出力する。 Value for switching the position determination function convergence determining unit 205, the error value and updates the previous ^V π2 (s2) of the V ^.pi.2 updated in the switching position determination value function update unit 204 (s2) and a predetermined threshold value or less Until that time, the process returns to the policy position updating unit 203 for determining the switching position, and the policy and value function update processing is repeated. When the value is equal to or less than the predetermined threshold, it is determined that the value function V ^π2 (s2) has converged, and the switching position determination value function convergence determination unit 205 determines the value function V ^π2 (s2) and the policy π2 (s2). Is output to the switching position determination unit 206.

切り替え位置判定部２０６は、上記価値関数を用いて、各l=0,1,…,eについて次式を計算する。
The switching position determination unit 206 calculates the following equation for each l = 0, 1,..., E using the value function.

ここで、lは、最適経路点列s1_trjのインデックスである。Partition[l]の値が０のとき、s1_trj[l]にある物品をs1_trj[l+1]に移動したのち、担当ロボットを変更しなければならないことを示す。 Here, l is an index of the optimum route point sequence s1_trj. When the value of Partition [l] is 0, it indicates that the assigned robot must be changed after the article in s1_trj [l] is moved to s1_trj [l + 1].

なお、これまでの処理ではロボットは全ての方向に物品を押すことが可能な仮想ロボットを想定して計算を行った。しかし、実環境におけるロボットが物品を移動させることが可能な方向が制限されている場合がある（例えば、本実施形態の設定では各ロボットは、図１に示す物品の移動方向に制限されている）。実環境におけるロボットの、物品を移動させることが可能な方向が制限されている場合は、上記Partition[l]を求めた後、さらに以下の処理を行う。 In the processing so far, the calculation is performed assuming a virtual robot that can push an article in all directions. However, the direction in which the robot in the real environment can move the article may be restricted (for example, in the setting of the present embodiment, each robot is restricted to the movement direction of the article shown in FIG. 1). ). When the direction in which the robot can move the article in the real environment is restricted, the following processing is further performed after obtaining the above Partition [l].

まず、l=0とし、Partition[0]の値が示す物品の移動方向に、物品を押すことが可能な実環境におけるロボットを１つ選択する。ここでは、ロボットiが選択されたとする。次に、lの値を一つ増やし(l←l+1とする)、Partition[l]が指す物品の移動方向が、Partition[0]で選択したロボットｉが可能とする物品移動方向の集合の中に含まれる場合のみ、ロボットiが物品の移動を継続可能であるのでPartition[l]の値を変更しない。そうでない場合は、ロボットを切り替える必要があるので、Partition[l]=0に更新する。これを、l=2,3, …,eについて順に繰り返すことで、Partition[l]の値を更新する。こうして求められたPartition[l]の値は、１台のロボットが継続して押すことのできる区間を示す。つまり、Partition[l]=0で区切られた区間が１台のロボットが継続して押すことのできる区間である。以下では、各区間の開始点のインデックスの値をp₀, p₁, p₂, …, p_fとする。なお、区間の開始点とは、l=0、および、Partition[l]=0となるlの値である。 First, l = 0 is set, and one robot in an actual environment that can push the article is selected in the movement direction of the article indicated by the value of Partition [0]. Here, it is assumed that robot i is selected. Next, the value of l is incremented by 1 (l ← l + 1), and the movement direction of the article pointed to by Partition [l] is a set of article movement directions that can be performed by the robot i selected in Partition [0]. Since the robot i can continue to move the article only when it is included in the list, the value of Partition [l] is not changed. If not, the robot needs to be switched, so update Partition [l] = 0. By repeating this in order for l = 2, 3,..., E, the value of Partition [l] is updated. The value of Partition [l] obtained in this way indicates a section that can be continuously pressed by one robot. That is, a section divided by Partition [l] = 0 is a section that can be continuously pressed by one robot. In the following, it is assumed that the index value of the start point of each section is p ₀ , p ₁ , p ₂ _,. Note that the start point of the section is a value of l at which l = 0 and Partition [l] = 0.

例えば、
Partition[0]=1,
Partition[1]=1,
Partition[2]=0,
Partition[3]=2,
Partition[4]=2,
Partition[5]=3,
Partition[6]=0,
Partition[7]=4,
Partition[8]=5, …
の場合、区間の端点をp₀=0, p₁=2, p₂=6である。 For example,
Partition [0] = 1,
Partition [1] = 1,
Partition [2] = 0,
Partition [3] = 2,
Partition [4] = 2,
Partition [5] = 3,
Partition [6] = 0,
Partition [7] = 4,
Partition [8] = 5,…
In this case, the end points of the section are p ₀ = 0, p ₁ = 2, and p ₂ = 6.

方策分割部２０７は、上述の如く求められた方策π２(s2)を、切り替え位置判定部２０６が求めた区間ごとに分割する。方策分割部２０７は、取りうるp_m, p_nの組み合わせ（p_m, p_n∈｛p₀, p₁, p₂, …, p_f｝，p_m＜ p_n）毎に、方策π２(s2)を分割して、π2[p_m][p_n](s2)とする。π2[p_m][p_n](s2)は、方策π2(s2)のうち区間[p_m,p_n]の最適経路点列s1_trjに対応するものであり、物品の移動を担当するロボットが、位置s1_trj[p_m+1]からs1_trj[p_n+1]まで物品を継続して動かすための行動方策を表す。なお、切り替え位置決定部２０６では、仮想ロボットの移動が実ロボットと同じ行動として扱われているので、計算された方策π2[p_m][p_n](s2)が、そのまま実環境において物品移動を担当するロボットの行動方策となる。 The policy division unit 207 divides the policy π2 (s2) obtained as described above for each section obtained by the switching position determination unit 206. Measures dividing unit 207 can take p _m, p _n combinations _{_{(p m, p n ∈ {}} p 0, p 1, p 2, ..., p f}, p m <p n) for each, measures .pi.2 ( s2) is divided into π2 [p _m ] [p _n ] (s2). π2 [p _m ] [p _n ] (s2) corresponds to the optimal path point sequence s1_trj in the interval [p _m , p _n ] of the policy π2 (s2), and the robot in charge of moving the article is , Represents an action policy for continuously moving the article from position s1_trj [p _m +1] to s1_trj [p _n +1]. In the switching position determination unit 206, since the movement of the virtual robot is treated as the same action as the real robot, the calculated policy π2 [p _m ] [p _n ] (s2) is moved as it is in the real environment. It becomes the action policy of the robot in charge.

[移動経路計画部；ステップＳ３]
移動経路計画部３００においては、第２階層で計算された各区間内での物品位置及び部屋内全域のロボット位置に探索区域が限定される。すなわち、担当ロボットの切り替え時における次担当ロボットの誘導についての行動方策を、マルコフ状態遷移モデルを利用した動的計画法により計算する。切り替えを行う担当ロボット以外のロボットは、同時に行われる連続物品移動を行っているか、もしくは物品との衝突を避けつつ待機しているものとして扱われる。 [Movement route planning unit; Step S3]
In the movement path planning unit 300, the search area is limited to the article position in each section calculated in the second hierarchy and the robot position in the entire room. That is, the action policy for guiding the next assigned robot when the assigned robot is switched is calculated by a dynamic programming method using a Markov state transition model. The robots other than the robot in charge for switching are treated as performing continuous article movement performed simultaneously or waiting while avoiding a collision with the article.

移動経路計画部３００は、物品の理想軌道s1_trjの系列うち、切り替え位置決定部２０６で計算されたPartition[l]の値が０となるインデックス(lの値）で区切られた各区間について、先の区間を担当するロボットが、先の区間の初期位置にある物品を押せる位置まで移動するための行動方策を計算する。また、物品を押す行動を終えた後、そのまま同じ位置で待機しているロボットｚが、ロボットｖによって押されている物品との衝突を回避するための行動方策の計算も行う。 The movement path planning unit 300 calculates, for each section divided by an index (value of l) in which the value of Partition [l] calculated by the switching position determination unit 206 is 0 in the series of the ideal trajectory s1_trj of the article. The robot in charge of this section calculates an action policy for moving to a position where the article at the initial position of the previous section can be pushed. In addition, after finishing the action of pushing the article, the robot z waiting at the same position as it is also calculates an action policy for avoiding a collision with the article pushed by the robot v.

まず、時間的に後の区間を担当するロボットが、時間的に後の区間の初期位置にある物品を押せる位置まで移動するための行動方策の計算について説明する。ここでは、他のロボットｖが区間[p_m,p_m+1]の物品の移動をしている間に、仮想ロボットｗが[p_m,p_m+1]よりも時間的に後の区間（p_nから始まる区間)における物品を押すことが可能な位置まで移動するために、仮想ロボットｗが区間[p_s,p_n]において取るべき行動の系列（行動方策π3[p_m][p_s][p_n](s3)）を計算する。言い換えれば、s1_trj[p_m+1]からs1_trj[p_m+1+1]まで仮想ロボットｖが物品を移動させている間に、s1_trj[p_s+1]まで物品を押して移動させた後待機している仮想ロボットｗが、s1_trj[p_n+1]にある物品をs1_trj[p_n+2]に移動させることが可能な位置まで移動するために、仮想ロボットｗが取るべき行動の系列を計算する。ここで、p_m, p_s, p_n∈｛p₀, p₁, p₂, …, p_f｝，p_s≦p_m, p_s＜p_n，p_m+1≦p_nである。 First, calculation of an action policy for a robot in charge of a later section in time to move to a position where an article at an initial position in the later section can be pushed will be described. Here, while the other robot v is moving the article in the section [p _m , p _{m + 1} ], the virtual robot w is a section later in time than [p _m , p _{m + 1} ]. A sequence of actions (action policy π3 [p _m ] [p] that the virtual robot w should take in the section [p _s , p _n ] in order to move to a position where the article can be pushed in the section (starting from p _n ). _s ] [p _n ] (s3)). In other words, while the virtual robot v is moving the article from s1_trj [p _m +1] to s1_trj [p _{m + 1} +1], it waits after pushing and moving the article to s1_trj [p _s +1]. virtual robot w that is, to move the article in the s1_trj [p _n +1] s1_trj to [p _n +2] that can be moved to a position, a sequence of actions to be taken by the virtual robot w calculate. Here, p _m , p _s , p _n ∈ {p ₀ , p ₁ , p ₂ ,..., P _f }, p _s ≦ p _m, p _s <p _n , p _{m + 1} ≦ p _n .

例えば、p_m=2, p_s=1, p_n=5とすると、π3[2][1][5](s3)は、仮想ロボットｖが区間[2,3]において物品を移動させている間に、区間[0,1]まで物品を移動させた後待機していた仮想ロボットｗが、区間[5,6]における物品移動を行える開始位置まで移動するための行動方策となる。 For example, if p _m = 2, p _s = 1, and _pn = 5, π3 [2] [1] [5] (s3) indicates that the virtual robot v moves the article in the interval [2,3]. In the meantime, the virtual robot w that has been waiting after moving the article to the section [0, 1] is an action policy for moving to the start position where the article can be moved in the section [5, 6].

なお、この設定において、物品移動を担当している仮想ロボットｖの移動区間はp_mから始まる１区間[p_m,p_m+1]とすることにより、後で説明するスケジューリング層における状態数を少なくすることができ、効率的にスケジューリングを行うことができるという効果もある。 Note that in this setting, one section [p _m, p _{m + 1]} moving section starting at p _m of the virtual robot v in charge of an article moved by it to, the number of states in the scheduling layer to be described later There is also an effect that scheduling can be reduced and scheduling can be performed efficiently.

マルコフ状態遷移モデルにおける「環境のとりうる離散的な状態の集合」をs3=(Xb,Yb,Xrw,Yrw)とし、行動主体（ロボットｗ）が取りうる行動a3∈{0,1,2,3,4,5}とする（マルコフ連鎖）。行動変数a3=0,1,2,3,4は、行動変数a2と同じくロボットの静止と上下左右方向の移動を表す。a3=5は、「待機」を示す。「待機」と「静止」の違いは、「待機」は物品を移動中の他のロボットの行動を考慮している点である。つまり、a3=5は、他のロボットが物品を移動中であるため、物品の移動が1ステップ分完了するまで待機することを意味する。また、状態変数s3に含まれる物品位置(Xb,Yb)は、
(Xb,Yb)∈{s1_trj[p_m+1], s1_trj[p_m+2] …, s1_trj[p_m+1+1]}
である。 Let s3 = (Xb, Yb, Xrw, Yrw) be a set of discrete states that the environment can take in the Markov state transition model, and the action a3∈ {0,1,2, which the action subject (robot w) can take 3,4,5} (Markov chain). The behavior variable a3 = 0, 1, 2, 3, and 4 represents the stationary motion of the robot and the movement in the up / down / left / right directions, like the behavior variable a2. a3 = 5 indicates “standby”. The difference between “standby” and “stationary” is that “standby” takes into account the behavior of other robots moving the article. That is, a3 = 5 means that another robot is moving the article, and thus waits until the movement of the article is completed for one step. In addition, the article position (Xb, Yb) included in the state variable s3 is
(Xb, Yb) ∈ {s1_trj [p _m +1], s1_trj [p _m +2]…, s1_trj [p _{m + 1} +1]}
It is.

また、状態変数s3に含まれるロボットｗの位置（Xrw,Yrw）は、部屋全体を定義域とする。そして、状態変数s3=(s1_trj[l],Xrw,Yrw)∈S3において行動a3∈A3を選択したときに、状態s3’=(s1_trj[l]’,Xrw’,Yrw’)∈S3に遷移する状態遷移確率P3[p_m][p_s][p_n]を、以下のように設定する。
Further, the position (Xrw, Yrw) of the robot w included in the state variable s3 is defined as the entire room. Then, when the action a3∈A3 is selected in the state variable s3 = (s1_trj [l], Xrw, Yrw) ∈S3, the state transitions to the state s3 ′ = (s1_trj [l] ', Xrw', Yrw ') ∈S3 The state transition probability P3 [p _m ] [p _s ] [p _n ] to be set is set as follows.

ただし、＜条件Ｃ＞は以下のいずれかを満たす場合である。
(1) a3=1 & Xrw’=Xrw+1 & Yrw’=Yrw & s1_trj[l]’=s1_trj[l]
(2) a3=2 & Xrw’=Xrw & Yrw’=Yrw+1 & s1_trj[l]’=s1_trj[l]
(3) a3=3 & Xrw’=Xrw-1 & Yrw’=Yrw & s1_trj[l]’=s1_trj[l]
(4) a3=4 & Xrw’=Xrw & Yrw’=Yrw-1 & s1_trj[l]’=s1_trj[l]
(5) a3=0 & Xrw’=Xrw & Yrw’=Yrw & s1_trj[l]’=s1_trj[l]
(6) a3=5 & Xrw’=Xrw & Yrw’=Yrw & s1_trj[l]’=s1_trj[l+1] However, <Condition C> is a case where any of the following is satisfied.
(1) a3 = 1 & Xrw '= Xrw + 1 &Yrw' = Yrw & s1_trj [l] '= s1_trj [l]
(2) a3 = 2 & Xrw '= Xrw &Yrw' = Yrw + 1 & s1_trj [l] '= s1_trj [l]
(3) a3 = 3 & Xrw '= Xrw-1 &Yrw' = Yrw & s1_trj [l] '= s1_trj [l]
(4) a3 = 4 & Xrw '= Xrw &Yrw' = Yrw-1 & s1_trj [l] '= s1_trj [l]
(5) a3 = 0 & Xrw '= Xrw &Yrw' = Yrw & s1_trj [l] '= s1_trj [l]
(6) a3 = 5 & Xrw '= Xrw &Yrw' = Yrw & s1_trj [l] '= s1_trj [l + 1]

また、このときの報酬R3を以下のように設定する。
Also, the reward R3 at this time is set as follows.

以上のように定義されたP3[p_m][p_s][p_n](s3’｜s3,a3)とR3[p_m][p_s][p_n](s3’,s3,a3)を使用して動的計画法にて方策π３[p_m][p_s][p_n](s3)を求めると、π３[p_m][p_s][p_n](s3)は、別のロボットが物品をs1_trj[p_m+1]からs1_trj[p_m+1+1]まで移動させている間に、位置s1_trj[p_n+1]にある物品をa1_trj[p_n+1]の示す方向に押すことが可能な位置(Ready_Pos_k(s1_trj[p_n+1],a1_trj[p_n+1])のいずれか)に、仮想ロボットｗが到達するための、仮想ロボットｗの行動方策（移動のための方策）となる。すなわち、他のロボットが物品を連続移動させる間に、どのタイミングでReady_Pos_k(s1_trj[p_n＋１],a1_trj[p_n+1])のいずれかの位置に行動主体（仮想ロボットｗ）を移動させるかの動作計画を行うことになる。実際、他のロボットは、ここでは物品とともに動くものとしてしか評価されていない。移動経路計画部３００によって、他のロボットとの連携を初めて考慮したロボットの移動が計画されるが、他のロボットの動作は切り替え位置決定部２００によって決定されており、他のロボット（物品を移動させるロボット）は物品とともに移動する前提なので、他のロボットの状態を記述する状態変数を移動経路計画部３００においては考慮しなくてもよくなっている。つまり、状態変数s3に含まれるロボットの座標（Xrw,Yrw）は、物品を移動させていないロボットで、次の切り替え位置に移動するロボットの座標を扱っている。 P3 [p _m ] [p _s ] [p _n ] (s3 ′ | s3, a3) and R3 [p _m ] [p _s ] [p _n ] (s3 ′, s3, a3) defined as above If we find the policy π3 [p _m ] [p _s ] [p _n ] (s3) using dynamic programming, π3 [p _m ] [p _s ] [p _n ] (s3) While the robot moves the article from s1_trj [p _m +1] to s1_trj [p _{m + 1} +1], the article at position s1_trj [p _n +1] is moved to a1_trj [p _n +1] Action policy (virtual robot w) for the virtual robot w to reach a position (Ready_Pos_k (s1_trj [p _n +1], a1_trj [p _n +1]) that can be pushed in the direction shown) Measures for movement). That is, while the other robot continuously moves the article, the action subject (virtual robot w) is moved to any position of Ready_Pos_k (s1_trj [p _n +1], a1_trj [p _n +1]) while moving the article continuously. I will do some operation plan. In fact, other robots are only evaluated here as moving with the article. The movement path planning unit 300 plans the movement of the robot considering the cooperation with other robots for the first time, but the operation of the other robots is determined by the switching position determination unit 200, and the other robots (moving articles) Therefore, the movement path planning unit 300 does not need to consider state variables describing the states of other robots. That is, the coordinates (Xrw, Yrw) of the robot included in the state variable s3 handle the coordinates of the robot that has not moved the article and moves to the next switching position.

以上の処理を、取りうる各p_m, p_s, p_n∈｛p₀, p₁, p₂, …, p_f｝，p_s≦p_m，p_s＜p_n，p_m+1<p_nについて行い、方策π３[p_m][p_s][p_n](s3)を求める。 The above processing can be performed for each possible p _m , p _s , p _n ∈ {p ₀ , p ₁ , p ₂ ,..., P _f }, p _s ≦ p _m , p _s <p _n , p _{m + 1} < performed for p _n, measures _{_{π3 [p m] [p s}} ] [p n] Request (s3).

次に、s1_trj[p_s+1]まで物品を押す行動を終えた後、そのまま同じ位置で待機しているロボットｚが、ロボットｖに押されて区間[p_m,p_m+1]を移動している物品との衝突を回避するための行動方策（π３[p_m][p_s][p_s](s3)）の計算について説明する。位置s1_trj[p_m+1]からs1_trj[p_m+1+1]）を物品が連続移動するときに、位置s1_trj[p_s+1]に物品を押してきた後の位置でロボットｚが待機していると、ロボットｚが物品移動の邪魔になることがある。その際に、ロボットｚは物品移動の妨げとならない位置に一時的に場所を移動する必要がある。方策π3[p_m][p_s][p_s](s3)は、そのためのロボットｚの行動（衝突回避行動）を表す。なお、衝突回避行動のための行動方策は、π３[p_m][p_s][p_s](s3)の形で、ロボットｚの行先と出発点が同じ（ｐ_sである）移動のための方策として記述できる。ただし、衝突回避の場合の目標位置は、Ready_Pos_k(s1_trj[p_s+1], a1_trj[p_s])のいずれかの位置である。 Next, after finishing the action of pushing the object to s1_trj [p _s +1], the robot z waiting at the same position is moved by the robot v and moves in the section [p _m , p _{m + 1} ]. The calculation of an action policy (π3 [p _m ] [p _s ] [p _s ] ( _{s 3} )) for avoiding a collision with the article being operated will be described. When the article continuously moves from the position s1_trj [p _m +1] to the position s1_trj [p _m _{+ 1} +1]), the robot z waits at the position after pushing the article to the position s1_trj [p _s +1]. The robot z may interfere with the movement of the article. At that time, it is necessary for the robot z to temporarily move to a position that does not hinder the movement of the article. The policy π3 [p _m ] [p _s ] [p _s ] ( _{s 3} ) represents the action (collision avoidance action) of the robot z for that purpose. The action strategy for collision avoidance behavior is in the form of π3 [p _m ] [p _s ] [p _s ] (s3), because the destination of robot z is the same as the starting point (p _s ). It can be described as a measure. However, the target position in the case of collision _{avoidance, Ready_Pos_k (s1_trj [p s +1} ], a1_trj [p s]) is any position.

状態遷移確率は式（９）をそのまま使用し（ただし、P3[p_m][p_s][p_s]とする）、報酬は、式（１０）の報酬の設定におけるロボットの目標位置をReady_Pos_k(s1_trj[p_s+1], a1_trj[p_s])に変更したもの、即ち、次式を使用して、動的計画法にて方策π３[p_m][p_s][p_s](s3)を求めればよい。
The state transition probability uses Equation (9) as it is (where P3 [p _m ] [p _s ] [p _s ]), and the reward is the robot's target position in the reward setting of Equation (10) Ready_Pos_k Change to (s1_trj [p _s +1], a1_trj [p _s ]), that is, using the following equation, π3 [p _m ] [p _s ] [p _s ] ( Find s3).

以上の処理を、取りうる各p_m, p_s∈｛p₀, p₁, p₂, …, p_f｝，p_s≦p_mについて行い、方策π3[p_m][p_s][p_s](s3)を求める。 The above processing is performed for each possible p _m , p _s ∈ {p ₀ , p ₁ , p ₂ ,…, p _f }, p _s ≦ p _m , and policy π3 [p _m ] [p _s ] [p _s ] (s3) is obtained.

移動経路計画部３００の詳細構成を図７に示す。移動経路計画部３００は、次担当ロボット移動計画用状態遷移モデル設定部３０１、次担当ロボット移動計画用価値関数初期化部３０２、次担当ロボット移動計画用方策更新部３０３、次担当ロボット移動計画用価値関数更新部３０４、次担当ロボット移動計画用価値関数収束判定部３０５、衝突回避用状態遷移モデル設定部３０６、衝突回避用価値関数初期化部３０７、衝突回避用移動計画用方策更新部３０８、衝突回避用移動計画用価値関数更新部３０９、衝突回避用移動計画用価値関数収束判定部３１０を含む。 A detailed configuration of the movement route planning unit 300 is shown in FIG. The movement route planning unit 300 includes a state transition model setting unit 301 for the next assigned robot movement plan, a value function initializing unit 302 for the next assigned robot movement plan, a policy updating unit 303 for the next assigned robot movement plan, and a next assigned robot movement plan. A value function update unit 304, a value function convergence determination unit 305 for the next assigned robot movement plan, a collision avoidance state transition model setting unit 306, a collision avoidance value function initialization unit 307, a collision avoidance movement plan policy update unit 308, A collision avoidance movement planning value function update unit 309 and a collision avoidance movement plan value function convergence determination unit 310 are included.

次担当ロボット移動計画用状態遷移モデル設定部３０１は、上記式（９）の状態遷移確率と、式（１０）の報酬とを設定する。 The state transition model setting unit 301 for the next assigned robot movement plan sets the state transition probability of the above equation (9) and the reward of the equation (10).

次担当ロボット移動計画用価値関数初期化部３０２、次担当ロボット移動計画用方策更新部３０３、次担当ロボット移動計画用価値関数更新部３０４、次担当ロボット移動計画用価値関数収束判定部３０５の各処理は、状態遷移確率と報酬が式（９）と式（１０）にそれぞれ置き換わる点を除いては、物品軌道計算部１００の場合と同じである。 The next assigned robot movement plan value function initialization unit 302, the next assigned robot movement plan policy update unit 303, the next assigned robot movement plan value function update unit 304, and the next assigned robot movement plan value function convergence determination unit 305. The processing is the same as in the case of the article trajectory calculation unit 100 except that the state transition probability and the reward are replaced with the equations (9) and (10), respectively.

次担当ロボット移動計画用価値関数初期化部３０２は、各s3∈S3についての価値関数Ｖ^π（s3)の初期値を設定する。これは適当な値を設定すればよい。 The next assigned robot movement plan value function initialization unit 302 sets an initial value of the value function V ^π (s3) for each s3εS3. An appropriate value may be set for this.

次担当ロボット移動計画用方策更新部３０３は、すべてのs3∈S3及びa3∈A3の組み合わせについて次式を計算し、Ｖ^π3[p_m][p_s][p_n]（s3）の値が最大となる行動a2を方策π３[p_m][p_s][p_n](s3)の値に設定する。
The next robot movement plan policy updating unit 303 calculates the following equation for all combinations of s3∈S3 and a3∈A3, and the value of V ^π3 [p _m ] [p _s ] [p _n ] (s3) is calculated. The maximum action a2 is set to the value of policy π3 [p _m ] [p _s ] [p _n ] (s3).

ここで、Σは取りうるすべてのs3’についての総和を表し、γは予め定められた0<γ<1.0を満たす定数である。また、式（１１）におけるＶ^π3[p_m][p_s][p_n](s3’)の値は、次担当ロボット移動計画用方策更新部３０３の最初の実行時は次担当ロボット移動計画用価値関数初期化部３０２で設定された初期値が利用され、２回目以降の実行では、以下の次担当ロボット移動計画用価値関数更新部３０４で更新された価値関数の値が用いられる。 Here, Σ represents the total sum of all possible s3 ′, and γ is a constant that satisfies a predetermined 0 <γ <1.0. In addition, the value of V ^π3 [p _m ] [p _s ] [p _n ] ( _{s 3} ′) in the expression (11) is the next assigned robot movement plan at the first execution time of the next assigned robot movement plan policy update unit 303. The initial value set by the value function initialization unit 302 is used, and the value of the value function updated by the following assigned robot movement plan value function update unit 304 is used in the second and subsequent executions.

次担当ロボット移動計画用価値関数更新部３０４は、各s3∈S3について、次式により価値関数Ｖ^π3[p_m][p_s][p_n](s3)を更新する。
The next assigned robot movement plan value function updating unit 304 updates the value function V ^π3 [p _m ] [p _s ] [p _n ] (s 3) for each s 3 ∈ S 3 according to the following equation.

次担当ロボット移動計画用価値関数収束判定部３０５は、次担当ロボット移動計画用価値関数更新部３０４による更新後のＶ^π3[p_m][p_s][p_n](s3)の値と更新前のＶ^π3[p_m][p_s][p_n](s3)との誤差が所定の閾値以下となるまで、上記次担当ロボット移動計画用方策更新部３０３に戻り、方策と価値関数の更新処理を繰り返す。所定の閾値以下となった場合には、価値関数Ｖ^π3[p_m][p_s][p_n](s3)が収束したと判定し、価値関数Ｖ^π3[p_m][p_s][p_n](s3)と方策π３[p_m][p_s][p_n](s3)を出力する。 The next assigned robot movement plan value function convergence determination unit 305 updates and updates the value of V ^π3 [p _m ] [p _s ] [p _n ] (s3) updated by the next assigned robot movement plan value function update unit 304. ^{Until the error from the} previous V ^π3 [p _m ] [p _s ] [p _n ] ( _{s 3} ) is less than or equal to a predetermined threshold value, the process returns to the policy update unit 303 for the next assigned robot movement plan, and the policy and value function Repeat the update process. When the value is equal to or less than a predetermined threshold, it is determined that the value function V ^π3 [p _m ] [p _s ] [p _n ] ( _{s 3} ) has converged, and the value function V ^π3 [p _m ] [p _s ] [ p _n ] (s3) and policy π3 [p _m ] [p _s ] [p _n ] (s3) are output.

同様に、衝突回避用状態遷移モデル設定部３０６は、上記式（９）の状態遷移確率と、式（１０．２）の報酬とを設定する。 Similarly, the collision avoidance state transition model setting unit 306 sets the state transition probability of the above equation (9) and the reward of the equation (10.2).

衝突回避用価値関数初期化部３０７、衝突回避用方策更新部３０８、衝突回避用価値関数更新部３０９、衝突回避用価値関数収束判定部３１０の処理は、Ｖ^π3[p_m][p_s][p_n](s3)がＶ^π3[p_m][p_s][p_s](s3)に置き換わり、π３[p_m][p_s][p_n](s3)がπ３[p_m][p_s][p_s](s3)に置き換わり、状態遷移確率と報酬が式（９）と式（１０．２）にそれぞれ置き換わる点を除いては、次担当ロボット移動計画用価値関数初期化部３０２、次担当ロボット移動計画用方策更新部３０３、次担当ロボット移動計画用価値関数更新部３０４、次担当ロボット移動計画用価値関数収束判定部３０５の処理と同じであるから説明を略する。 The processes of the collision avoidance value function initialization unit 307, the collision avoidance policy update unit 308, the collision avoidance value function update unit 309, and the collision avoidance value function convergence determination unit 310 are V ^π3 [p _m ] [p _s ]. [p _n ] (s3) is ^replaced by V ^π3 [p _m ] [p _s ] [p _s ] (s3), and π3 [p _m ] [p _s ] [p _n ] (s3) is π3 [p _m ] [p _s ] [p _s ] (s3) is replaced, and the state function probabilities and rewards are replaced by equation (9) and equation (10.2), respectively. Since the processing is the same as that of the unit 302, the next assigned robot movement plan policy updating unit 303, the next assigned robot movement plan value function updating unit 304, and the next assigned robot movement plan value function convergence determining unit 305, description thereof is omitted.

移動経路計画部３００においても、マルコフ状態空間で使用している行動値はロボットの移動そのものなので、計算された行動方策π３[p_m][p_s][p_n](s3)は、担当の切り替えが行われる位置にある物品を仮想ロボットが押せる位置まで、仮想ロボットが移動するための行動方策として使用できる。π2[p_m][p_n](s2)と異なるのは、物品を移動させる他ロボットの行動（a3=5）が定義されていることである。 Also in the movement route planning unit 300, the action value used in the Markov state space is the movement of the robot itself, so the calculated action policy π3 [p _m ] [p _s ] [p _n ] (s3) It can be used as an action policy for the virtual robot to move to the position where the virtual robot can press the article at the position where switching is performed. The difference from π2 [p _m ] [p _n ] (s2) is that an action (a3 = 5) of another robot that moves the article is defined.

[変形例]
上記の説明では、移動経路計画部３００における行動変数a3の取りうる値は、０〜５の６種類としたが、仮想ロボットｗと仮想ロボットｖが同時に動く（つまり、仮想ロボットｗが物品を１ステップ移動させる行動と、仮想ロボットｖが所定方向に移動する行動とが同時に行われる）ことを意味する行動変数（ａ３＝６、７、８、９の４種類をさらに増やした合計１０種類として、計算を行ってもよい。ここで、
ａ３＝６：仮想ロボットは右方向へ移動
ａ３＝７：仮想ロボットは左方向へ移動
ａ３＝８：仮想ロボットは上方向へ移動
ａ３＝９：仮想ロボットは下方向へ移動
とする。この場合は上記＜条件Ｃ＞を、上述の(1)〜(6)に以下(7)〜(10)を加えたものに置き換える。
(7) a3=6 & Xrw’=Xrw+1 & Yrw’=Yrw & s1_trj[l]’=s1_trj[l+1]
(8) a3=7 & Xrw’=Xrw & Yrw’=Yrw+1 & s1_trj[l]’=s1_trj[l+1]
(9) a3=8 & Xrw’=Xrw-1 & Yrw’=Yrw & s1_trj[l]’=s1_trj[l+1]
(10) a3=9 & Xrw’=Xrw & Yrw’=Yrw-1 & s1_trj[l]’=s1_trj[l+1] [Modification]
In the above description, the possible values of the action variable a3 in the movement route planning unit 300 are six types of 0 to 5, but the virtual robot w and the virtual robot v move at the same time (that is, the virtual robot w takes 1 item). Action variables (a3 = 6, 7, 8, 9) are further increased to 10 kinds of action variables that mean that the action of step movement and the action of the virtual robot v moving in a predetermined direction are performed simultaneously, You can do the calculation, where
a3 = 6: The virtual robot moves to the right a3 = 7: The virtual robot moves to the left a3 = 8: The virtual robot moves upward a3 = 9: The virtual robot moves downward. In this case, the above <Condition C> is replaced with the above (1) to (6) plus the following (7) to (10).
(7) a3 = 6 & Xrw '= Xrw + 1 &Yrw' = Yrw & s1_trj [l] '= s1_trj [l + 1]
(8) a3 = 7 & Xrw '= Xrw &Yrw' = Yrw + 1 & s1_trj [l] '= s1_trj [l + 1]
(9) a3 = 8 & Xrw '= Xrw-1 &Yrw' = Yrw & s1_trj [l] '= s1_trj [l + 1]
(10) a3 = 9 & Xrw '= Xrw &Yrw' = Yrw-1 & s1_trj [l] '= s1_trj [l + 1]

行動変数a3∈{0,1,2,3,4,5}は、仮想ロボットｗまたは物品を押す仮想ロボットｖのいずれか一方が移動している、もしくは、両方とも静止していることを示す行動である。これを用いて行動を行った場合、実際にはロボットｗとロボットｖが並列に（同時に）行動可能な状況にあっても、逐次的に行動を行うような動作計画となるが、状態変数が少ないので移動経路計画部の計算を早く計算を行うことができるという利点がある。
一方、a3∈{0,1,2,3,4,5,6,7,8,9}とした場合は、a3∈{6,7,8,9}により、ロボットｗとロボットｖとが並列に行う行動も考慮することができる。行動変数が多くなるので移動経路計画部の計算に時間がかかるが、実際に計画される動作に無駄がなく、より適切な（所要時間の短い）行動系列を獲得できるという利点がある。 The action variable a3∈ {0,1,2,3,4,5} indicates that either the virtual robot w or the virtual robot v pushing the article is moving or both are stationary. It is action. When an action is performed using this, even if the robot w and the robot v can actually act in parallel (simultaneously), the action plan is such that the action is performed sequentially. Since there are few, there exists an advantage that the calculation of a movement path | route plan part can be calculated quickly.
On the other hand, when a3∈ {0,1,2,3,4,5,6,7,8,9}, the robot w and the robot v are obtained by a3∈ {6,7,8,9}. Parallel actions can also be considered. Since the number of action variables increases, it takes time to calculate the movement route planning unit. However, there is an advantage that an action sequence that is actually planned is not wasted and a more appropriate action sequence (short required time) can be obtained.

[スケジューリング部；ステップＳ４]
以上の処理により、あるロボットiが軌道区間s1_trj[p_m+1]からs1_trj[p_n+1]に物品を連続移動させるときのロボット動作のための行動方策π２[p_m][p_n](s2)と、ロボットiがs1_trj_[p_m+1]からs1_trj[p_m+1+1]に物品を移動させている間に、次の区間を担当するロボットｊが切り替え位置（次の区間の先頭位置にある物品を押せる位置）に移動するための次担当ロボットjの行動方策π３[p_m][p_s][p_n](s3)が計算できた。これにより、たとえば、π２[p₀][p₁](s2)に従ってロボット１が物品を連続移動している間に、π３[p₀][p₀][p₁](s3)に従ってロボット２がs1_trj[p₁+1]にある物品をs1_trj[p₁+2]に移動可能な位置に移動するとともに、π３[p₀][p₀][p₂](s3)に従ってロボット３がs1_trj[p₂+1]にある物品をs1_trj[p₂+2]に移動可能な位置に移動し、ロボット１が物品をs1_trj[p₁+1]に移動後に、ロボット２が行動方策をπ２[p₁][p₂](s2)に切り替えて物品を移動させ、その後ロボット３が行動方策をπ２[p₂][p₃](s2)に切り替えて物品を移動させるという形で、ロボットの協調動作が実現可能である。あとは、これらの行動方策の切り替えとロボットの割り振りのつじつまが合うように、スケジューリング部４００において、使用する行動方策とロボット番号の組み合わせを求めればよい。 [Scheduling unit; step S4]
With the above processing, an action policy π2 [p _m ] [p _n ] for robot operation when a robot i continuously moves an article from the trajectory section s1_trj [p _m +1] to s1_trj [p _n +1]. (s2) and while robot i moves the article from s1_trj_ [p _m +1] to s1_trj [p _{m + 1} +1], the robot j in charge of the next section The action policy π3 [p _m ] [p _s ] [p _n ] ( _{s 3} ) of the next robot j to move to the position where the article at the top position of the robot can be pushed) can be calculated. Thus, for example, while the robot 1 continuously moves the article according to π2 [p ₀ ] [p ₁ ] (s2), the robot 2 according to π3 [p ₀ ] [p ₀ ] [p ₁ ] (s3). Is moved to a position where s1_trj [p ₁ +1] can be moved to s1_trj [p ₁ +2], and the robot 3 moves s1_trj according to π3 [p ₀ ] [p ₀ ] [p ₂ ] (s3). move the article in the [p ₂ +1] to movable positioned s1_trj [p ₂ +2], after moving the robot 1 article s1_trj [p ₁ +1], the robot 2 action policy .pi.2 [ p ₁ ] [p ₂ ] (s2) to move the article, and then the robot 3 switches the action policy to π2 [p ₂ ] [p ₃ ] (s2) to move the article. Cooperative operation is feasible. After that, the scheduling unit 400 may obtain a combination of the action policy to be used and the robot number so that the switching of the action policy and the allocation of the robot match.

スケジューリングの方法は、周知のグラフ探索などの技術を用いればよいが、例えば、マルコフ状態空間での動的計画法によってスケジューリングを行うことができる。今、マルコフ状態空間をp個のロボットを使用するとして
s4=(Cb,C1,…,Cp,CA1,…,CAp)
a4=（d1,…,dp）
とする。 As a scheduling method, a known technique such as a graph search may be used. For example, the scheduling can be performed by dynamic programming in a Markov state space. Now, using Markov state space with p robots
s4 = (Cb, C1,…, Cp, CA1,…, CAp)
a4 = (d1,…, dp)
And

ここで、Cbは物品位置であり、その定義域はPartition[l]=0となる区間の端点、つまり、Cb∈{p₀, p₁, …,p_f}である。また、Ci(i=1,2,…,p)はロボットiの位置であるが、実際のロボットのX,Y座標ではなく、ロボットがその位置まで押してきた物品位置（イ）、もしくはこれから押そうとしている物品の位置（ロ）を指し、Cbと同様に端点番号p_mで表現される（（イ）または（ロ）の位置は、物品の位置が分かれば特定できるため、端点番号でよい）。同じCiの値でも実際のロボットの位置は２通りの場合がありうる。Ciの値の指す物品の位置まで、ロボットがどのような行動をして到達したかにより実際のロボット位置は（イ）か（ロ）のいずれかの値となる。実際のロボット位置を(Xri,Yri)とすれば、(Xri,Yri)∈{Ready_Pos_k(s1_trj[p_m+1],a1_trj[p_m])∨Ready_Pos_k(s1_trj[p_m+1],a1_trj[p_m+1])} (m=0,1,…,f) である。ここで、Ready_Pos_k(s1_trj[p_m+1],a1_trj[p_m])は、区間の始点位置(s1_trj[p_m+1])にs1_trj[p_m]から物品を押した後のロボットの位置を表し、Ready_Pos_k(s1_trj[p_m+1],a1_trj[p_m+1])は、区間の終点(s1_trj[p_m+1])にある物品をa1_trj[p_m+1]の示す方向に押せる位置である。CAiは、これを識別するための情報であり、ロボットiが一ステップ前で実行した行動diが物品押し行動π２であるか、もしくは切り替え移動行動π３であるかを示す識別値をとる。この識別値は、例えば、ブール値（１＝物品押し行動π２、０＝切替え移動行動π３）である。 Here, Cb is the article position, and its domain is the end point of the section where Partition [l] = 0, that is, Cb∈ {p ₀ , p ₁ ,..., P _f }. Ci (i = 1, 2, ..., p) is the position of robot i, but not the actual robot X and Y coordinates, but the position of the article that the robot has pushed to that position (i), or from now on. This refers to the position (b) of the article that is about to be expressed, and is expressed by the end point number p _m as in Cb (the position of (b) or (b) can be specified if the position of the article is known, so it may be the end point number. ). There can be two actual robot positions even with the same Ci value. The actual robot position is either (A) or (B) depending on how the robot has reached the position of the article indicated by the Ci value. If the actual robot position is (Xri, Yri), then (Xri, Yri) ∈ {Ready_Pos_k (s1_trj [p _m +1], a1_trj [p _m ]) ∨Ready_Pos_k (s1_trj [p _m +1], a1_trj [ p _m +1])} (m = 0,1, ..., f). Here, Ready_Pos_k (s1_trj [p _m +1], a1_trj [p _m ]) is the position of the robot after pushing the article from s1_trj [p _m ] to the start position (s1_trj [p _m +1]) of the section Ready_Pos_k (s1_trj [p _m +1], a1_trj [p _m +1]) indicates the article at the end of the section (s1_trj [p _m +1]) in the direction indicated by a1_trj [p _m +1] It is a position that can be pressed. CAi is information for identifying this, and takes an identification value indicating whether the action di executed by the robot i one step before is the article pushing action π2 or the switching movement action π3. This identification value is, for example, a Boolean value (1 = article pushing behavior π2, 0 = switching movement behavior π3).

di (i=1,…,p)は、ロボットiが選択する行動方策を表す。diの値は、π２の物品押し行動か、π３の移動行動か、もしくは待機を指す値に分かれる。最適経路点列s1_trjにおける区間の始点の個数はｆ＋１なので、di=0,1,…,fまでは、π３の移動か待機を指す（diの値が現在ロボット位置を指すならば待機）。di=f+1のときは、物品押し行動π２の選択を指す。待機行動については、CAi=1の時に選択されたときは、第３層で計算された衝突回避行動π３を行い、CAi＝０の時に選択されたときは、何もしない。Cb=p_r,Ci=p_qのとき、diの値が0,1,…,fの場合はロボットは使用する行動方策としてπ３[p_r][p_q][p_di](s3)を選択し、diの値がf+1のときは、ロボットは使用する行動方策としてπ２[p_r](s2)を選択するが、選択しうるdiの値は、ロボットの状態Ci、CAi及び物品の位置Cbによって制限がある。その制限のルールは以下のとおりである。 di (i = 1,..., p) represents an action policy selected by the robot i. The value of di is divided into a value indicating an article pushing action of π2, a movement action of π3, or waiting. Since the number of start points of the section in the optimum path point sequence s1_trj is f + 1, until di = 0, 1,..., F indicates the movement or waiting of π3 (waiting if the value of di indicates the current robot position). When di = f + 1, it indicates the selection of the article pushing action π2. When the waiting action is selected when CAi = 1, the collision avoidance action π3 calculated in the third layer is performed, and when it is selected when CAi = 0, nothing is done. When Cb = p _r , Ci = p _q and the value of di is 0, 1,…, f, the robot uses π3 [p _r ] [p _q ] [p _di ] (s3) as the action strategy to use. When the value of di is f + 1, the robot selects π2 [p _r ] (s2) as the action policy to be used. The value of di that can be selected depends on the robot state Ci, CAi and the article. There is a limit depending on the position Cb. The restriction rules are as follows.

（１）Ci=Cb かつ CAi＝0のときのみ、di値として物品押し行動π２を選択する値を選択可能である。
（２）CAi＝1のときのみ、di値として移動行動π3を選択する値を選択可能である。
（３）待機行動は、いかなる場合でも選択可能である。 (1) Only when Ci = Cb and CAi = 0, a value for selecting the article pushing action π2 can be selected as the di value.
(2) Only when CAi = 1, a value for selecting the moving action π3 as the di value can be selected.
(3) The standby action can be selected in any case.

制限（１）は、ロボットは物品を押す行動π２を連続で選択できないことを示し、制限（２）は、物品を押した後のロボットのみ移動行動π３を選択可能であることを示す。 The restriction (1) indicates that the robot cannot select the action π2 that presses the article continuously, and the restriction (2) indicates that only the robot after pressing the article can select the movement action π3.

以上は単体のロボットに対するdi値の制限であるが、さらに、行動a4全体の制限として、
要素(d1,…,dp)のいずれか一つは物品を押す行動π２でなければならない、
という制限も加わる。 The above is the limit of the di value for a single robot, but further, as the limit of the entire action a4,
Any one of the elements (d1, ..., dp) must be the action π2 that pushes the article,
This also adds a restriction.

マルコフ状態遷移モデルにおける状態遷移確率は、以下のような条件により定義される。
The state transition probability in the Markov state transition model is defined by the following conditions.

ただし、＜条件Ｄ＞は以下のいずれかを満たす場合である。
Cb=p_rとしたときCb’=p_r、Ci=p_qとしたときCi’=p_q’とし、以下のいずれかの条件のとき
[１] di=π2(押し) & r’=r+1 & q’=q+1 & CAi’=1
[２] di=π3(移動) & r’=r+1 & q’=di & CAi’=0
[３] di=π3(待機) & r’=r+1 & q’=q & CAi’=CAi However, <Condition D> is a case where any of the following is satisfied.
Cb = p _r and the time _{Cb '= p r, Ci =} p and the Ci when _q' and = p _{q ',} when any of the following conditions
[1] di = π2 (push) & r '= r + 1 &q' = q + 1 & CAi '= 1
[2] di = π3 (move) & r '= r + 1 &q' = di & CAi '= 0
[3] di = π3 (standby) & r '= r + 1 &q' = q & CAi '= CAi

以上の条件において、[１]は、ロボットが物品押し行動を選択したときには、ロボットと物品の位置はともに一つ先の区間端点に移動し、CAi＝１（物品押し行動）に変えることを意味する。[２]は、移動行動を選択したときには、ロボットの位置は行動の示す目標に移動し、物品位置は一つ先の区間端点に移動し、CAi＝０（移動行動）に変えることを意味する。[３]は、待機行動によりロボット位置CiとCAiはともに変化せず、物品位置のみ一つ先の区間端点に移動することを示す。 Under the above conditions, [1] means that when the robot selects an article pushing action, both the robot and the article position move to the next section end point and change to CAi = 1 (article pushing action). To do. [2] means that when the moving action is selected, the position of the robot moves to the target indicated by the action, the article position moves to the next section end point, and changes to CAi = 0 (moving action). . [3] indicates that the robot positions Ci and CAi are not changed by the standby action, and only the article position moves to the next section end point.

また、状態s4において行動a4∈A4を実行し、状態がs4’∈S4へ遷移したとき、環境から行動主体へ与えられる報酬R4を次式で設定する。
Further, when the action a4εA4 is executed in the state s4 and the state transits to s4′εS4, a reward R4 given from the environment to the action subject is set by the following equation.

物品が最終区間の開始点(Cb’=p_f)に到達し、いずれかのロボットの位置がp_fでありかつそこに至った行動が移動行動(CAi’=0)であれば、そのロボットが最終区間にて物品押しを開始できることが確定するので、物品は目標位置に到達できることが分かる。そのときのみ、報酬R4の値を１とし、それ以外の場合は報酬を０とすることを意味する。以上のように定義されたP4(s4’｜s4,a4)とR4(s4’,s4,a4)を使用して動的計画法にて方策π４(s4)を求めると、方策π４(s4)は、区間の端点Cbにおいて各ロボットdiが選択する行動方策を示す。これを、Cb=s1_trj[0]から順に、区間の端点に対応する状態変数s4に対する方策π４(s4)を求めることで、各区間の端点（切り替え位置）において各ロボットdiが選択する行動方策が求まる。スケジューリング部４００は、これをスケジューリング結果として出力する。 If the article reaches the start point (Cb '= p _f ) of the last section, the position of one of the robots is p _f , and the action that reaches there is a moving action (CAi' = 0), the robot Since it is determined that the pushing of the article can be started in the final section, it can be seen that the article can reach the target position. Only at that time, the value of the reward R4 is set to 1, and in other cases, the reward is set to 0. When policy π4 (s4) is obtained by dynamic programming using P4 (s4 ′ | s4, a4) and R4 (s4 ′, s4, a4) defined as above, policy π4 (s4) Indicates the action policy selected by each robot di at the end point Cb of the section. From this, in order from Cb = s1_trj [0], a policy π4 (s4) for the state variable s4 corresponding to the end point of the section is obtained, so that the action policy selected by each robot di at the end point (switching position) of each section is obtained. I want. The scheduling unit 400 outputs this as a scheduling result.

なお、スケジューリング部４００の行動方策を求める際には、物品軌道計画部１００、切り替え位置決定部２００、移動経路計算部３００と同様にマルコフモデル決定過程（ＭＤＰ）での動的計画法を用いてもよいが、MDPの拡張的な手法である準マルコフ決定過程（SMDP）での動的計画法を使用すると、より実環境での所要時間を短縮できるようなスケジューリング結果を得ることができる。SMDPでは、各状態遷移における所要時間の長さを考慮して価値関数を計算する点が、MDPと異なる。 When the action policy of the scheduling unit 400 is obtained, the dynamic planning method in the Markov model determination process (MDP) is used similarly to the article trajectory planning unit 100, the switching position determination unit 200, and the movement route calculation unit 300. However, using dynamic programming in the quasi-Markov decision process (SMDP), which is an extended method of MDP, can obtain scheduling results that can further reduce the time required in the real environment. SMDP differs from MDP in that the value function is calculated in consideration of the length of time required for each state transition.

状態s4にて行動a4を選択したときにs4’に状態遷移するための所要時間をτ(s4,s4’,a4)とする。なお、この所要時間はシミュレーション等により求めればよい。 Let τ (s4, s4 ', a4) be the time required for state transition to s4' when action a4 is selected in state s4. The required time may be obtained by simulation or the like.

スケジューリング用価値関数初期化部は、各s4∈S4についての価値関数Ｖ^π４（s4)の初期値を0に設定する。 The scheduling value function initialization unit sets the initial value of the value function V ^π4 (s4) for each s4εS4 to zero.

スケジューリング用方策更新部は、すべてのs4∈S4及びa4∈A4の組み合わせについて次式を計算し、Ｖ^π4（s4）の値が最大となる行動a4を方策π4(s4)の値に設定する。
The scheduling policy updater calculates the following equation for all combinations of s4εS4 and a4εA4, and sets the action a4 that maximizes the value of V ^π4 (s4) to the value of the policy π4 (s4).

ここで、Σは取りうるすべてのs4’についての総和を表し、γは予め定められた0<γ<1.0を満たす定数である。また、式（１３）におけるＶ^π4(s4’)の値は、スケジューリング用方策更新部の最初の実行時はスケジューリング用価値関数初期化部で設定された初期値が利用され、２回目以降の実行では、以下のスケジューリング用価値関数更新部で更新された価値関数の値が用いられる。 Here, Σ represents the total sum of all possible s4 ′, and γ is a constant that satisfies a predetermined 0 <γ <1.0. In addition, the value of V ^π4 (s4 ′) in equation (13) uses the initial value set by the scheduling value function initialization unit at the first execution of the scheduling policy update unit, and the second and subsequent executions Then, the value of the value function updated by the following scheduling value function updating unit is used.

スケジューリング用価値関数更新部は、各s４∈S４について、次式によって価値関数Ｖ^π4(s4)を更新する。
The scheduling value function updating unit updates the value function V ^π4 (s4) according to the following equation for each s4εS4.

[エラーフィードバック部；ステップＳ５]
以上に述べたとおり、物品軌道計画部１００、切り替え位置決定部２００、移動経路計算部３００、スケジューリング部４００に至るまでの階層的な動作計画計算において、物品軌道計画部１００によって決定された条件（最適経路点列）では、それより下の階層（切り替え位置決定部２００、移動経路計算部３００、スケジューリング部４００）での動作計画において適切な解を計算できないことも起こりうる。そのような場合に対処するため、エラーフィードバック部（図示せず）をさらに設け、再計算を行う構成としてもよい。 [Error feedback unit; Step S5]
As described above, in the hierarchical operation plan calculation up to the article trajectory planning unit 100, the switching position determining unit 200, the movement route calculating unit 300, and the scheduling unit 400, the conditions ( In the optimum route point sequence, it may happen that an appropriate solution cannot be calculated in the operation plan in the lower hierarchy (switching position determination unit 200, movement route calculation unit 300, scheduling unit 400). In order to cope with such a case, an error feedback unit (not shown) may be further provided to perform recalculation.

フィードバック部は、移動経路計算部３００で求めた担当ロボットの切り替え動作における価値関数の値を使用する。例えば、価値関数Ｖ^π3[p_m][p_s][p_n](s3)の状態変数s3=(Xb,Yb,Xrw,Yrw)において、(Xb,Yb)=s1_trj[p_m+1]かつ(Xrw,Yrw)をロボットｗが物品を位置s1_trj[p_s]からs1_trj[p_s+1]に押した後の位置としたときのＶ^π3[p_m][p_s][p_n](s3)の値をm<n,s<=mの全てのｓ、ｍの組についてチェックし、それらすべての場合においてＶ^π3[p_m][p_s][p_n](s3)の値が0以上でない場合には、方策π３[p_m][p_s][p_n](s3)に従って、ロボットを適切に担当切り替えさせることは不可能である。すると、物品位置s1_trj[p_n+1]での担当ロボット切り替えは、現時点での物品軌道では不可能ということになるので、物品位置s1_trj[p_n]からs1_trj[p_n+1]までの移動をs1_trj内に含めることは適切ではない。そこで、物品軌道計算部３００において、状態遷移確率P1(s1_trj[p_n+1]|s1_trj[p_n],a1)=0（つまり、いかなる行動a1においてもs1_trj[p_n]からs1_trj[p_n+1]へ移動しない）とし、P1(s1_trj[p_n]|s1_trj[p_n],a1)=1（s1_trj[p_n-1]に留まる場合の遷移確率を１）とするように、マルコフ状態遷移モデルの状態遷移確率P1の値を変更したうえで再計画を行う。 The feedback unit uses the value of the value function in the switching operation of the assigned robot obtained by the movement route calculation unit 300. For example, in the state variable s3 = (Xb, Yb, Xrw, Yrw) of the value function V ^π3 [p _m ] [p _s ] [p _n ] (s3), (Xb, Yb) = s1_trj [p _m +1] And V ^π3 [p _m ] [p _s ] [p _n ] when (Xrw, Yrw) is set to a position after the robot w has pushed the article from position s1_trj [p _s ] to s1_trj [p _s +1]. The value of (s3) is checked for all s, m pairs where m <n, s <= m, and in all these cases the value of ^Vπ3 [p _m ] [p _s ] [p _n ] (s3) If is not greater than or equal to 0, it is impossible to switch the robot appropriately according to the policy π3 [p _m ] [p _s ] [p _n ] (s3). Then, it is impossible to switch the assigned robot at the article position s1_trj [p _n +1] in the article trajectory at the current time, so the movement from the article position s1_trj [p _n ] to s1_trj [p _n +1] It is not appropriate to include in s1_trj. Therefore, in the article trajectory calculation unit 300, the state transition probability P1 (s1_trj [p _n +1] | s1_trj [p _n ], a1) = 0 (that is, s1_trj [p _n ] to s1_trj [p _{n in} any action a1) +1]) and P1 (s1_trj [p _n ] | s1_trj [p _n ], a1) = 1 (the transition probability when staying at s1_trj [p _n -1] is 1) Re-plan after changing the value of the state transition probability P1 of the state transition model.

このように、フィードバック部では、移動経路計算部３００で求めた方策π３[p_m][p_s][p_n](s3)に対応する価値関数Ｖ^π3[p_m][p_s][p_n](s3)について、(Xb,Yb)=s1_trj[p_m+1]かつ(Xrw,Yrw)をロボットｗが物品を位置s1_trj[p_s]からs1_trj[p_s+1]に押した後の位置としたときのＶ^π3[p_m][p_s][p_n](s3)の値が０以上でない場合には、第１状態遷移モデル設定部（図示せず）において、
P1(s1_trj[p_n+1]|s1_trj[p_n],a1)=0
P1(s1_trj[p_n]|s1_trj[p_n],a1)=1
と変更したうえで、物品軌道計算部１００（物品軌道計算用価値関数初期化部から最適経路点列計算部）の処理を行うことにより最適経路点列を更新する。更新された最適経路点列を用いて、切り替え位置決定部２００、移動経路計算部３００、スケジューリング部４００、エラーフィードバック部５００の処理を繰り返す。 Thus, in the feedback unit, the value function V ^π3 [p _m ] [p _s ] [p corresponding to the policy π3 [p _m ] [p _s ] [p _n ] ( _s 3) obtained by the movement path calculation unit 300 is obtained. _n ] (s3), after (Xb, Yb) = s1_trj [p _m +1] and (Xrw, Yrw) push the article from position s1_trj [p _s ] to s1_trj [p _s +1] If the value of V ^π3 [p _m ] [p _s ] [p _n ] ( _{s 3} ) is not greater than or equal to 0, the first state transition model setting unit (not shown)
P1 (s1_trj [p _n +1] | s1_trj [p _n ], a1) = 0
P1 (s1_trj [p _n ] | s1_trj [p _n ], a1) = 1
Then, the optimal path point sequence is updated by performing processing of the article trajectory calculation unit 100 (from the article trajectory calculation value function initialization unit to the optimal path point sequence calculation unit). The process of the switching position determination unit 200, the movement route calculation unit 300, the scheduling unit 400, and the error feedback unit 500 is repeated using the updated optimum route point sequence.

〔変形例：担当ロボット数の調整〕
上記のように、担当ロボットの切り替えの可、不可を価値関数Ｖ³[p_m][p_s][p_n](s3)を使用して判定し、不可の場合には物品軌道計算部３００に物品軌道の再計算を要求する代わりに、担当ロボットを追加する構成としてもよい。例えば、物品位置をs1_trj[p_s]からs1_trj[p_s+1]に押した後の位置に存在するロボットがs1_trj[p_n+1]の位置にある物品を押せる位置まで移動ができないのであれば、物品がs1_trj[p_m+1]の位置からs1_trj[p_m+1+1]の位置に移動する間のいずれかの途中位置にて、任務環境内の価値関数Ｖ^π3[p_m][p_s][p_n](s3)の値を正にできるロボット位置に新しいロボットを追加し、その新しいロボットにs1_trj[p_n+1]の位置にある物品を押せる位置まで移動を行わせ、s1_trj[p_n+1]から先の物品移動を担当させる。 [Variation: Adjustment of the number of assigned robots]
As described above, whether or not the assigned robot can be switched is determined using the value function V ³ [p _m ] [p _s ] [p _n ] ( _s ³ ). Instead of requesting recalculation of the article trajectory, a configuration may be adopted in which a responsible robot is added. For example, if the robot that exists at the position after pushing the article position from s1_trj [p _s ] to s1_trj [p _s + 1] cannot move to the position where the article at the position of s1_trj [p _n +1] can be pushed For example, the value function V ^π3 [p _m ] in the mission environment at any intermediate position while the article moves from the position s1_trj [p _m +1] to the position s1_trj [p _{m + 1} +1]. Add a new robot to the robot position where the value of [p _s ] [p _n ] (s3) can be positive, and move the new robot to a position where it can push the item at the position of s1_trj [p _n +1]. , S1_trj [p _n +1] is in charge of the previous article movement.

なお、以上の処理で得られた行動方策を強化学習におけるgreedy行動選択のための行動方策として使用し、学習そのものを効率化することも有効である。 It is also effective to use the action policy obtained by the above processing as an action policy for selecting a greedy action in reinforcement learning, and to make the learning itself more efficient.

＜ロボット協調搬送計画装置のハードウェア構成例＞
上述の実施形態に関わるロボット協調搬送計画装置は、キーボードなどが接続可能な入力部、液晶ディスプレイなどが接続可能な出力部、ＣＰＵ（Central Processing Unit）〔キャッシュメモリなどを備えていてもよい。〕、メモリであるＲＡＭ（Random Access Memory）やＲＯＭ（Read Only Memory）と、ハードディスクである外部記憶装置、並びにこれらの入力部、出力部、ＣＰＵ、ＲＡＭ、ＲＯＭ、外部記憶装置間のデータのやり取りが可能なように接続するバスなどを備えている。また必要に応じて、各装置に、ＣＤ−ＲＯＭなどの記憶媒体を読み書きできる装置（ドライブ）などを設けるとしてもよい。このようなハードウェア資源を備えた物理的実体としては、汎用コンピュータなどがある。 <Hardware configuration example of robot cooperative transport planning device>
The robot cooperative transfer planning apparatus according to the above-described embodiment may include an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a CPU (Central Processing Unit) [cache memory, or the like. ] RAM (Random Access Memory) or ROM (Read Only Memory) and external storage device as a hard disk, and data exchange between these input unit, output unit, CPU, RAM, ROM, and external storage device It has a bus that can be connected. If necessary, each device may be provided with a device (drive) that can read and write a storage medium such as a CD-ROM. A physical entity having such hardware resources includes a general-purpose computer.

ロボット協調搬送計画装置の外部記憶装置には、搬送計画のためのプログラム並びにこのプログラムの処理において必要となるデータなどが記憶されている〔外部記憶装置に限らず、例えばプログラムを読み出し専用記憶装置であるＲＯＭに記憶させておくなどでもよい。〕。また、これらのプログラムの処理によって得られるデータなどは、ＲＡＭや外部記憶装置などに適宜に記憶される。以下、データやその格納領域のアドレスなどを記憶する記憶装置を単に「記憶部」と呼ぶことにする。 The external storage device of the robot cooperative transport planning device stores a program for transport planning and data necessary for processing of this program [not limited to the external storage device, for example, the program is read by a dedicated storage device. It may be stored in a certain ROM. ]. Data obtained by the processing of these programs is appropriately stored in a RAM or an external storage device. Hereinafter, a storage device that stores data, addresses of storage areas, and the like is simply referred to as a “storage unit”.

ロボット協調搬送計画装置では、記憶部に記憶されたプログラムとプログラムの処理に必要なデータが必要に応じてＲＡＭに読み込まれて、ＣＰＵで解釈実行・処理される。この結果、ＣＰＵが所定の機能（物品軌道計算部、切り替え位置決定部、移動経路計画部、スケジューリング部、エラーフィードバック部）を実現することで搬送計画が実現される。 In the robot cooperative transfer planning apparatus, the program stored in the storage unit and the data necessary for processing the program are read into the RAM as necessary, and are interpreted and executed by the CPU. As a result, the transportation plan is realized by the CPU realizing predetermined functions (article trajectory calculation unit, switching position determination unit, movement path planning unit, scheduling unit, error feedback unit).

＜補記＞
本発明は上述の実施形態に限定されるものではなく、本発明の趣旨を逸脱しない範囲で適宜変更が可能である。また、上記実施形態において説明した処理は、記載の順に従って時系列に実行されるのみならず、処理を実行する装置の処理能力あるいは必要に応じて並列的にあるいは個別に実行されるとしてもよい。 <Supplementary note>
The present invention is not limited to the above-described embodiment, and can be appropriately changed without departing from the spirit of the present invention. In addition, the processing described in the above embodiment may be executed not only in time series according to the order of description but also in parallel or individually as required by the processing capability of the apparatus that executes the processing. .

また、上記実施形態において説明したハードウェアエンティティにおける処理機能をコンピュータによって実現する場合、ハードウェアエンティティが有すべき機能の処理内容はプログラムによって記述される。そして、このプログラムをコンピュータで実行することにより、上記ハードウェアエンティティにおける処理機能がコンピュータ上で実現される。 Further, when the processing functions in the hardware entity described in the above embodiment are realized by a computer, the processing contents of the functions that the hardware entity should have are described by a program. Then, by executing this program on a computer, the processing functions in the hardware entity are realized on the computer.

この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、例えば、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等どのようなものでもよい。具体的には、例えば、磁気記録装置として、ハードディスク装置、フレキシブルディスク、磁気テープ等を、光ディスクとして、ＤＶＤ（Digital Versatile Disc）、ＤＶＤ−ＲＡＭ（Random Access Memory）、ＣＤ−ＲＯＭ（Compact Disc Read Only Memory）、ＣＤ−Ｒ（Recordable）／ＲＷ（ReWritable）等を、光磁気記録媒体として、ＭＯ（Magneto-Optical disc）等を、半導体メモリとしてＥＥＰ−ＲＯＭ（Electronically Erasable and Programmable-Read Only Memory）等を用いることができる。 The program describing the processing contents can be recorded on a computer-readable recording medium. As the computer-readable recording medium, for example, any recording medium such as a magnetic recording device, an optical disk, a magneto-optical recording medium, and a semiconductor memory may be used. Specifically, for example, as a magnetic recording device, a hard disk device, a flexible disk, a magnetic tape or the like, and as an optical disk, a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only). Memory), CD-R (Recordable) / RW (ReWritable), etc., magneto-optical recording medium, MO (Magneto-Optical disc), etc., semiconductor memory, EEP-ROM (Electronically Erasable and Programmable-Read Only Memory), etc. Can be used.

また、このプログラムの流通は、例えば、そのプログラムを記録したＤＶＤ、ＣＤ−ＲＯＭ等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。 The program is distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM in which the program is recorded. Furthermore, the program may be distributed by storing the program in a storage device of the server computer and transferring the program from the server computer to another computer via a network.

このようなプログラムを実行するコンピュータは、例えば、まず、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、一旦、自己の記憶装置に格納する。そして、処理の実行時、このコンピュータは、自己の記録媒体に格納されたプログラムを読み取り、読み取ったプログラムに従った処理を実行する。また、このプログラムの別の実行形態として、コンピュータが可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することとしてもよく、さらに、このコンピュータにサーバコンピュータからプログラムが転送されるたびに、逐次、受け取ったプログラムに従った処理を実行することとしてもよい。また、サーバコンピュータから、このコンピュータへのプログラムの転送は行わず、その実行指示と結果取得のみによって処理機能を実現する、いわゆるＡＳＰ（Application Service Provider）型のサービスによって、上述の処理を実行する構成としてもよい。なお、本形態におけるプログラムには、電子計算機による処理の用に供する情報であってプログラムに準ずるもの（コンピュータに対する直接の指令ではないがコンピュータの処理を規定する性質を有するデータ等）を含むものとする。 A computer that executes such a program first stores, for example, a program recorded on a portable recording medium or a program transferred from a server computer in its own storage device. When executing the process, the computer reads a program stored in its own recording medium and executes a process according to the read program. As another execution form of the program, the computer may directly read the program from a portable recording medium and execute processing according to the program, and the program is transferred from the server computer to the computer. Each time, the processing according to the received program may be executed sequentially. Also, the program is not transferred from the server computer to the computer, and the above-described processing is executed by a so-called ASP (Application Service Provider) type service that realizes the processing function only by the execution instruction and result acquisition. It is good. Note that the program in this embodiment includes information that is used for processing by an electronic computer and that conforms to the program (data that is not a direct command to the computer but has a property that defines the processing of the computer).

また、この形態では、コンピュータ上で所定のプログラムを実行させることにより、ハードウェアエンティティを構成することとしたが、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。 In this embodiment, a hardware entity is configured by executing a predetermined program on a computer. However, at least a part of these processing contents may be realized by hardware.

Claims

A robot cooperative transfer planning apparatus for obtaining an action plan of each robot for moving a moving object (hereinafter referred to as an article) from a start position to a target position in cooperation with a plurality of robots,
Based on a Markov state transition model in which the position of the article is a state variable and the movement direction of the article is a behavior variable, a chain of positions (hereinafter referred to as optimal) that forms the movement path of the article from the start position to the target position. An article trajectory calculation unit for obtaining a route point sequence),
Ideal position that can take each action included in the union of all of the plurality of robots with the position of the article having each position included in the optimum path point sequence as a domain and at least the action that each robot can take Based on a Markov state transition model in which a pair of a position of a robot (hereinafter referred to as an ideal robot) is a state variable and an action that the ideal robot can take is an action variable, each position included in the optimal path point sequence is The action taken by the ideal robot when the article is present (hereinafter referred to as the optimum action sequence) is determined, and the optimum robot is moved to the section where the ideal robot can move the article in the optimum path point sequence. The ideal robot is divided according to each switching position by dividing the route point sequence (hereinafter referred to as the robot switching position) and the optimal action sequence. Action column (hereinafter referred to as a first interval optimal action sequence) corresponding to the section and a switching position determination unit and determines,
For each section that can be taken according to the robot switching position, the position of the article having each position included in the section as a definition area and the ideal robot that is not in charge of transporting the article in the section A section after the section based on a Markov state transition model in which the pair of the position of the robot (hereinafter referred to as the second ideal robot) is a state variable and the action that the second ideal robot can take is an action variable. In order to move the second ideal robot to a position where the article can be moved in step S2, a series of actions to be taken by the second ideal robot in the section (hereinafter referred to as a second section optimum action sequence) is obtained. A travel route planning department;
A robot cooperative transfer planning apparatus including a scheduling unit that assigns the first section optimum action sequence and the second section optimum action sequence to each robot so that there is no contradiction between the robots.

The robot cooperative transfer planning device according to claim 1,
The scheduling unit
For each section that can be taken according to the robot switching position, the article can be pushed to the position or the position of the article with the starting point position of the section as the definition area and each position included in the section. The action of each robot moving the article, with a state variable as a set of the position of each robot whose position is defined as the position and information for identifying the action of each robot immediately before the section. Alternatively, based on the Markov state transition model, the behavior of each robot included in the state variable is determined based on a Markov state transition model, where the behavior variable is a movement to a position where the article can be moved or a standby behavior. By assigning to each robot an action corresponding to each of the variables in order from the beginning of the optimum path point sequence, each robot is assigned the first section maximum. Robot Cooperative Transportation Planning and wherein the assigning the action sequence and the second segment optimal action sequence.

The robot cooperative transfer planning device according to claim 1 or 2,
The switching position determination unit
The robot switching position obtained by rewriting the robot switching position at a position that is further divided into sections where one robot included in the plurality of robots can move the article is used as the robot switching position. Cooperative transport planning device.

The robot cooperative transfer planning device according to any one of claims 1 to 3,
The travel route planning department
A robot that is waiting at the same position in each of the sections, and that a robot included in the plurality of robots also obtains an action for avoiding a collision with the article moved in the section. Cooperative transport planning device.

A robot cooperative transport planning method for obtaining an action plan of each robot for moving a moving object (hereinafter referred to as an article) from a start position to a target position in cooperation with a plurality of robots,
The position where the article trajectory calculation unit forms the movement path of the article from the start position to the target position based on a Markov state transition model in which the position of the article is a state variable and the movement direction of the article is a behavior variable. Article trajectory calculation step for obtaining a chain (hereinafter referred to as an optimal path point sequence) of
Each action included in the union of all of the plurality of robots of the position of the article whose definition area is each position included in the optimal path point sequence and the action that each robot can take Based on a Markov state transition model, the optimal path point sequence is based on a combination of the position of an ideal robot (hereinafter referred to as an ideal robot) that can take a state as a state variable, and an action that the ideal robot can take as a behavior variable. And determining an action (hereinafter referred to as an optimal action sequence) that the ideal robot takes when the article is at each position included in the position, and moving the article by one ideal robot in the optimal path point sequence. The ideal robot is switched by dividing the optimal action sequence by dividing the optimal route point sequence into sections where the optimal route point sequence is divided (hereinafter referred to as the robot switching position). Action column for each section is divided by the position (hereinafter, referred to as first zone optimal action sequence) and a switching position determination step of determining,
For each section that the movement path planning unit can take in accordance with the robot switching position, the position of the article having each position included in the section as a domain of definition, and the ideal robot that has the article in the section. Based on a Markov state transition model in which a pair of the position of a robot that is not in charge of conveyance (hereinafter referred to as a second ideal robot) is a state variable, and an action that the second ideal robot can take is an action variable. In order to move the second ideal robot to a position where the article can be moved in a section after the section, a sequence of actions to be taken by the second ideal robot in the section (hereinafter, second section optimum) A travel route planning step for obtaining an action sequence),
A robot cooperative transfer planning method, wherein the scheduling unit includes a scheduling step in which the first section optimum action sequence and the second section optimum action sequence are assigned to the robots so that there is no contradiction among the robots.

A program for causing a computer to function as the robot cooperative transfer planning apparatus according to any one of claims 1 to 4.