JP2023168996A

JP2023168996A - Movement control system, movement control method, movement controller, moving apparatus, movement control program, and program for moving apparatuses

Info

Publication number: JP2023168996A
Application number: JP2022080431A
Authority: JP
Inventors: 知也川部; Tomoya Kawabe; 竜志西; Tatsuyuki Nishi
Original assignee: Okayama University NUC
Current assignee: Okayama University NUC
Priority date: 2022-05-16
Filing date: 2022-05-16
Publication date: 2023-11-29

Abstract

To provide a movement control system, movement control method, movement controller, moving apparatus, movement control program, and program for moving apparatuses capable of coping with an unexpected event while suppressing the cost of calculation. SOLUTION: A movement control system searches a moving route of a moving apparatus (S204), allows the moving apparatus to move along the moving route (S205), and acquires an ambient situation of the moving apparatus (S207). An area laid out around the moving apparatus is defined as a neighbor area. The moving apparatus acts on the basis of a measure associated with each condition pattern signifying whether entry to each of unit areas included in the neighbor area is permitted (S214). SELECTED DRAWING: Figure 20

Description

The present invention relates to a movement control system and a movement control method for controlling a mobile device that moves along a route by guided movement, autonomous movement, or the like; to a movement control device and a mobile device used in such a movement control system; and to a movement control program and a mobile device program for realizing such a movement control device and mobile device.

In facilities such as large-scale factories, mobile devices such as AGVs (Automated Guided Vehicles) and AMRs (Automated Mobile Robots) are used. To set a route by which a mobile device reaches a target position within its movement range in such a facility, route planning is required. Various methods have been proposed for solving the route planning problem so as to avoid collisions with obstacles such as walls and other mobile devices while keeping travel cost low. For example, studies have proposed rule-based approaches using Dijkstra's algorithm (Non-Patent Document 1), multi-agent nets (Non-Patent Document 2), Petri nets (Non-Patent Document 3), and the like. Methods that solve the route planning problem using reinforcement learning have also been proposed (Non-Patent Documents 4 and 5).

Non-Patent Document 1: K.A. Dowsland and A.M. Greaves, Collision avoidance in bi-directional AGV systems, Journal of the Operational Research Society, Vol. 45, No. 7, pp. 817-826 (1994).
Non-Patent Document 2: Kunihiko Hiraishi, "Modeling multi-agent systems using Petri nets," Transactions of the Institute of Systems, Control and Information Engineers, Vol. 45, No. 8, pp. 439-444 (2001).
Non-Patent Document 3: M. Dotoli and M.P. Fanti, Coloured timed Petri net model for real-time control of automated guided vehicle systems, International Journal of Production Research, Vol. 42, No. 9, pp. 1787-1814 (2004).
Non-Patent Document 4: Michiko Watanabe, Masashi Furukawa, Masahiro Kinoshita, Yukinori Kakazu, "Research on autonomous transportation of multiple AGVs using Q-learning," Journal of the Japan Society for Precision Engineering, Vol. 67, No. 10, pp. 1609-1614 (2001).
Non-Patent Document 5: S.M. Jeon, K.H. Kim and H. Kopfer, Routing automated guided vehicles in container terminals through the Q-learning technique, Logistics Research, Vol. 3, No. 1, pp. 19-27 (2011).

However, rule-based methods for solving the route planning problem have the drawback that, as the movement range grows, the number of required simulations increases and so does the computational cost. Moreover, when multiple mobile devices are used, rule-based methods can have difficulty immediately obtaining a near-optimal alternative route plan in response to unexpected events such as the failure of another mobile device.

Methods that solve the route planning problem using reinforcement learning can plan routes flexibly in response to unexpected events such as the appearance of obstacles or delays of mobile devices. However, because reinforcement learning searches for an optimal solution by trial and error, the computational cost of the trial-and-error and learning processing becomes a problem. In addition, such methods do not guarantee the optimality of the planned route.

The present invention has been made in view of these circumstances, and its main object is to provide a movement control system that can cope with unexpected events while keeping computational cost low.

The present application also discloses a movement control method carried out in the movement control system according to the present invention. The present application further discloses a movement control device and a mobile device used in the movement control system according to the present invention, as well as a movement control program and a mobile device program for realizing them.

To solve the above problem, the movement control system disclosed in the present application is a movement control system that includes a mobile device and controls the movement of the mobile device, the system comprising: route search means for searching for a movement route of the mobile device from a movement start position to a target position; situation acquisition means for acquiring the situation around the mobile device; entry permission determination means for defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device are arranged around the mobile device at intervals of a unit movement distance that serves as the reference for the movement of the mobile device, applying the surrounding situation acquired by the situation acquisition means to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device can enter it; policy acquisition means for acquiring, based on policies of the mobile device that are each associated with a state pattern indicating whether each unit area included in the neighborhood area can be entered, the policy associated with the state pattern that corresponds to the entry permission results determined by the entry permission determination means; movement permission determination means for determining, based on the policy acquired by the policy acquisition means, whether movement along the movement route found by the route search means is possible; and action determination means for deciding, when movement along the movement route is determined to be impossible, to take the action indicated by the policy acquired by the policy acquisition means.

Further, the movement control system includes a movement control device capable of communicating with the mobile device, and there are a plurality of mobile devices. The movement control device comprises the route search means, the situation acquisition means, the entry permission determination means, the policy acquisition means, the movement permission determination means, and the action determination means, together with means for transmitting, to each corresponding mobile device, an action command that causes that mobile device to take the action decided by the action determination means. For one of the plurality of mobile devices, the entry permission determination means determines that a unit area in which another mobile device is located and/or a unit area into which another mobile device will move cannot be entered.

Further, in the movement control system, there are a plurality of mobile devices, and one of the plurality of mobile devices comprises at least the situation acquisition means, the entry permission determination means, the policy acquisition means, the movement permission determination means, and the action determination means. The entry permission determination means determines that a unit area in which another mobile device is located and/or a unit area into which another mobile device will move cannot be entered.

Further, in the movement control system, the route search means represents the range in which the mobile device can move as a graph whose connectivity is defined with unit areas as nodes and traversable paths between unit areas as edges, and uses a graph search algorithm to search this graph for a movement route from the start node corresponding to the movement start position of the mobile device to the goal node corresponding to the target position.

Further, in the movement control system, the neighborhood area is defined based on a movement of two units of the unit movement distance.
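As an illustration of a neighborhood area based on two unit moves, the following minimal Python sketch enumerates the unit areas reachable from the mobile device within a given number of unit moves. It assumes 4-directional movement on a square grid, so reachability becomes Manhattan distance; the function name `neighborhood_offsets` and the diamond shape it produces are assumptions made for illustration and are not taken from the application, whose figures may define a different shape.

```python
from itertools import product


def neighborhood_offsets(reach: int = 2) -> list[tuple[int, int]]:
    """Offsets (dx, dy) of unit areas reachable within `reach` unit moves.

    Assumption: 4-directional movement, so reachability is measured by
    Manhattan distance. The device's own unit area (0, 0) is excluded.
    """
    return [
        (dx, dy)
        for dy, dx in product(range(-reach, reach + 1), repeat=2)
        if 0 < abs(dx) + abs(dy) <= reach
    ]


# Two unit moves give 4 cells at distance 1 plus 8 cells at distance 2.
offsets = neighborhood_offsets(2)
```

Under these assumptions the neighborhood contains 12 unit areas, so a binary enterable/not-enterable flag per unit area would yield at most 2^12 state patterns.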

Further, in the movement control system, the relationship between state patterns and policies acquired by the policy acquisition means is a learning model obtained by reinforcement learning.
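To make the idea of a learning model obtained by reinforcement learning concrete, here is a minimal sketch of the standard tabular Q-learning update, keyed by state pattern. The action set, the learning rate `alpha`, the discount factor `gamma`, the reward value, and the 4-element state tuples are all hypothetical values chosen for illustration; the application's actual training setup is not described in this passage.

```python
from collections import defaultdict

# Hypothetical action set: four movement directions plus stopping.
ACTIONS = ("up", "down", "left", "right", "stop")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Q-table: state pattern -> {action: value}, initialised to zero on first use.
q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

# Illustrative state patterns (1 = enterable, 0 = not enterable).
s, s_next = (1, 0, 1, 1), (1, 1, 1, 1)
q_update(q, s, "stop", reward=-1.0, next_state=s_next)
```

Because the table is keyed by the local state pattern rather than by absolute position, a table of this kind would not grow with the size of the movement range.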

Further, in the movement control system, the learning model indicates, for each assumed state pattern, the action value of moving the mobile device in each movement direction and the action value of stopping, and the policy acquisition means acquires, as the policy, either movement in the direction with the highest action value or stopping, whichever has the highest action value.
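Selecting the action with the highest action value can be sketched as a simple lookup followed by an arg-max. The table layout, the state tuple, and the numeric values below are invented for illustration only.

```python
def greedy_action(q_values: dict[str, float]) -> str:
    """Return the action with the highest action value (first one wins ties)."""
    return max(q_values, key=q_values.get)


# Hypothetical Q-table entry for a single state pattern.
q_table = {
    (1, 1, 0, 1): {"up": 0.2, "down": -0.5, "left": 0.1, "right": -1.0, "stop": 0.4},
}

action = greedy_action(q_table[(1, 1, 0, 1)])
```

Here stopping has the highest value, so the acquired policy for this state pattern would be to stop.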

Further, the movement control method disclosed in the present application is a movement control method using a mobile device and a movement control device that controls the movement of the mobile device, the method executing: a step of searching for a movement route of the mobile device from a movement start position to a target position; a step of acquiring the situation around the mobile device; a step of defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device are arranged around the mobile device at intervals of a unit movement distance that serves as the reference for the movement of the mobile device, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device can enter it; a step of acquiring, based on policies of the mobile device that are each associated with a state pattern indicating whether each unit area included in the neighborhood area can be entered, the policy associated with the state pattern that corresponds to the entry permission results for the unit areas; a step of determining, based on the acquired policy, whether movement along the movement route is possible; a step in which the mobile device moves along the movement route when movement is determined to be possible; and a step in which the mobile device takes the action indicated by the policy when movement is determined to be impossible.

Further, the movement control device disclosed in the present application is a movement control device that controls the movement of a mobile device, the device comprising: means for searching for a movement route of the mobile device from a movement start position to a target position; means for acquiring the situation around the mobile device; means for defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device are arranged around the mobile device at intervals of a unit movement distance that serves as the reference for the movement of the mobile device, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device can enter it; means for acquiring, based on policies of the mobile device that are each associated with a state pattern indicating whether each unit area included in the neighborhood area can be entered, the policy associated with the state pattern that corresponds to the entry permission results for the unit areas; means for determining, based on the acquired policy, whether movement along the movement route is possible; and means for transmitting to the mobile device an action command that causes it to move along the movement route when movement is determined to be possible and to take the decided action when movement is determined to be impossible.

Further, the mobile device disclosed in the present application is a mobile device that moves along a movement route, the device comprising: means for acquiring the surrounding situation; means for defining, as a neighborhood area, an area in which unit areas set based on the size of the device itself are arranged around it at intervals of a unit movement distance that serves as the reference for its movement, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the device itself can enter it; means for acquiring, based on policies that are each associated with a state pattern indicating whether each unit area included in the neighborhood area can be entered, the policy associated with the state pattern that corresponds to the entry permission results for the unit areas; means for determining, based on the acquired policy, whether movement along the movement route is possible; means for moving along the movement route when movement is determined to be possible; and means for taking the action indicated by the acquired policy when movement is determined to be impossible.

Further, the movement control program disclosed in the present application is a movement control program that causes a computer having communication means for communicating with a mobile device to control the movement of the mobile device, the program causing the computer to execute: a step of searching for a movement route of the mobile device from a movement start position to a target position; a step of acquiring the situation around the mobile device; a step of defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device are arranged around the mobile device at intervals of a unit movement distance that serves as the reference for the movement of the mobile device, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device can enter it; a step of acquiring, based on policies of the mobile device that are each associated with a state pattern indicating whether each unit area included in the neighborhood area can be entered, the policy associated with the state pattern that corresponds to the entry permission results for the unit areas; a step of determining, based on the acquired policy, whether movement along the movement route is possible; and a step of transmitting to the mobile device an action command that causes it to move along the movement route when movement is determined to be possible and to take the decided action when movement is determined to be impossible.

Further, the mobile device program disclosed in the present application is a program to be executed by a mobile device that moves along a movement route, the program causing the mobile device to execute: a step of acquiring the surrounding situation; a step of defining, as a neighborhood area, an area in which unit areas set based on the size of the device itself are arranged around it at intervals of a unit movement distance that serves as the reference for its movement, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the device itself can enter it; a step of acquiring, based on policies that are each associated with a state pattern indicating whether each unit area included in the neighborhood area can be entered, the policy associated with the state pattern that corresponds to the entry permission results for the unit areas; a step of determining, based on the acquired policy, whether movement along the movement route is possible; a step of moving along the movement route when movement is determined to be possible; and a step of taking the action indicated by the acquired policy when movement is determined to be impossible.

The movement control system and related aspects disclosed in the present application decide the action of the mobile device based on the policy associated with a state pattern, where the state pattern is obtained by reflecting the situation around the mobile device in a neighborhood area of unit areas arranged around the mobile device. As a result, a mobile device moving along a movement route can capture changes in its surroundings as a state pattern and reflect them in its actions, among other advantageous effects.

FIG. 1 shows an example of a graph fitted to a layout in the movement control system disclosed in the present application.
FIG. 2 is an explanatory diagram conceptually showing an example of the neighborhood area defined around a mobile device in the movement control system.
FIG. 3 is an explanatory diagram illustrating whether unit areas included in the neighborhood area can be entered in the movement control system.
FIG. 4 is an explanatory diagram conceptually showing an example of the Q-learning used to decide the action of a mobile device in the movement control system.
FIG. 5 is an explanatory diagram showing an example of the neighborhood area and movement route of a mobile device superimposed on the layout in the movement control system.
FIG. 6 is an explanatory diagram showing an example of the state of the neighborhood area around a mobile device in the movement control system.
FIG. 7 is an explanatory diagram showing an example of the neighborhood area and movement route of a mobile device superimposed on the layout in the movement control system.
FIG. 8 is an explanatory diagram showing an example of the state of the neighborhood area around a mobile device in the movement control system.
FIG. 9 is an explanatory diagram showing an example of the neighborhood area and movement route of a mobile device superimposed on the layout in the movement control system.
FIG. 10 is an explanatory diagram showing an example of the state of the neighborhood area around a mobile device in the movement control system.
FIG. 11 is an explanatory diagram conceptually showing an example of action updating by Q-learning in the movement control system.
FIG. 12 is an explanatory diagram conceptually showing an example of the relationship between a state pattern and an array of Q-values in the movement control system.
FIG. 13 is an explanatory diagram conceptually showing an example of the relationship between a state pattern and an array of Q-values in the movement control system.
FIG. 14 is an explanatory diagram conceptually showing an example of the relationship between a state pattern and an array of Q-values in the movement control system.
FIG. 15 is a conceptual diagram showing an example of the system configuration of the movement control system.
FIG. 16 is a block diagram showing an example of the hardware configuration of a mobile device used in the movement control system.
FIG. 17 is a block diagram showing an example of the hardware configuration of a movement control device used in the movement control system.
FIG. 18 is a block diagram showing an example of the hardware configuration of the machine learning device disclosed in the present application.
FIG. 19 is a flowchart showing an example of the machine learning processing of the machine learning device included in the movement control system.
FIG. 20 is a flowchart showing an example of the movement control processing of the movement control device included in the movement control system.
FIG. 21 is a flowchart showing an example of the movement instruction processing of the movement control device included in the movement control system.
FIG. 22 is a flowchart showing an example of the movement control processing of a mobile device included in the movement control system.
FIG. 23 is an explanatory diagram showing the graph used in a simulation test of the movement control system.
FIG. 24 is an explanatory diagram showing the movement routes of the mobile devices superimposed on the graph used in the simulation test of the movement control system.
FIG. 25 is a graph showing an example of the results of the simulation test of the movement control system.
FIG. 26 is a graph showing an example of the results of the simulation test of the movement control system.
FIG. 27 is an explanatory diagram conceptually showing an example of the neighborhood area defined around a mobile device in the movement control system.

Embodiments of the present invention are described in detail below. The following embodiments are examples that embody the present invention and are not intended to limit its technical scope.

<Application example>
The movement control system disclosed in the present application is applied to controlling the movement of mobile devices such as AGVs (Automated Guided Vehicles) and AMRs (Automated Mobile Robots). For example, it is applied to on-site or off-site logistics in facilities where multiple AGVs are used simultaneously, such as large-scale factories (for example, semiconductor factories) and warehouses (for example, distribution centers). Specifically, it is applied to a configuration in which multiple AGVs move along guide paths laid on the floor of the facility and carry out assigned tasks while avoiding collisions with obstacles, such as installed objects and placed objects, and with other AGVs. Here, an installed object is a fixed obstacle such as a wall, a pillar, or a building, and a placed object is a repositionable obstacle such as a stand or a conveyed item. Objects located somewhere other than planned, such as a placed object that has fallen over or another mobile device that has broken down, are also treated as obstacles. An example of a task is movement that involves work such as loading or unloading at a designated position.

<Theory>
The basic theory of the movement control system disclosed in the present application is described here, beginning with an overview. The movement control system maps the range in which a mobile device such as an AGV can move onto a graph whose connectivity is defined by nodes and edges. The movement control system then searches for the shortest route from the start node, which is the position where the mobile device starts moving, to the goal node, which is the target position, using a graph search algorithm such as the A* algorithm.
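The route search described above can be sketched with a textbook A* search over a grid of unit areas. This is a minimal illustration, not the application's implementation: it assumes 4-connected nodes, unit edge costs, and a Manhattan-distance heuristic, and the names `a_star`, `free`, and `walls` as well as the 4 × 4 example layout are hypothetical.

```python
import heapq


def a_star(free, start, goal):
    """Shortest path over 4-connected unit areas.

    `free` is the set of (x, y) nodes the device may occupy. Returns the
    list of nodes from start to goal, or None if the goal is unreachable.
    """
    def h(n):
        # Manhattan distance: admissible for unit-cost 4-connected grids.
        return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f, g, node)
    came_from = {start: None}
    g = {start: 0}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:  # walk parents back to the start
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if cost > g[node]:
            continue  # stale heap entry; a cheaper route was found later
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in free and cost + 1 < g.get(nxt, float("inf")):
                g[nxt] = cost + 1
                came_from[nxt] = node
                heapq.heappush(open_heap, (cost + 1 + h(nxt), cost + 1, nxt))
    return None


# Hypothetical 4 x 4 layout with a wall that leaves a gap only at (1, 3).
WIDTH = HEIGHT = 4
walls = {(1, 0), (1, 1), (1, 2)}
free = {(x, y) for x in range(WIDTH) for y in range(HEIGHT)} - walls
path = a_star(free, start=(0, 0), goal=(2, 0))
```

In this layout the only way around the wall is through the gap, so the route takes eight unit moves.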

In addition, a neighborhood area (state space) is defined around the mobile device by arranging unit areas that are set based on the size of the mobile device. If an obstacle such as an installed object, a placed object, or another mobile device is present in a unit area included in the defined neighborhood area, that unit area is determined to be one that the mobile device cannot enter. For other mobile devices, a unit area is determined to be non-enterable not only when another mobile device is present in it but also when another mobile device will move into it. A unit area in which no obstacle or other mobile device is present, and into which no other mobile device will move, is determined to be enterable. The movement control system associates a policy of the mobile device with each of the various state patterns that indicate whether each unit area in the neighborhood area can be entered. The policies associated with the state patterns are obtained in advance by reinforcement learning such as Q-learning, for example as a learning model such as a Q-function (action value function).
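The entry permission determination can be sketched as follows. The encoding of a state pattern as a tuple of 0/1 flags, the function name `state_pattern`, and the example coordinates are assumptions for illustration; the sketch also uses only the four adjacent unit areas to stay short, and it does not handle unit areas that fall outside the movable range.

```python
def state_pattern(position, offsets, obstacles, others_now, others_next):
    """Return one flag per unit area: 1 = enterable, 0 = not enterable.

    A unit area is not enterable if it holds an obstacle, currently holds
    another mobile device, or is the cell another device will move into.
    """
    blocked = set(obstacles) | set(others_now) | set(others_next)
    x, y = position
    return tuple(0 if (x + dx, y + dy) in blocked else 1 for dx, dy in offsets)


# Illustrative neighborhood: only the four adjacent unit areas.
offsets = [(0, 1), (1, 0), (0, -1), (-1, 0)]

pattern = state_pattern(
    position=(5, 5),
    offsets=offsets,
    obstacles={(5, 6)},    # e.g. a placed object directly above
    others_now={(9, 9)},   # another device, outside the neighborhood
    others_next={(6, 5)},  # another device about to move in on the right
)
```

A tuple of this kind is hashable, so it can serve directly as the key under which a learned policy is stored.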

The mobile device moves along the searched movement route. When an obstacle or another mobile device is present in, or is moving into, the neighborhood area, the mobile device acts by referring to the policy associated with the state pattern that corresponds to the current enterable or non-enterable status of each unit area in the neighborhood area. Specifically, the mobile device either continues moving along the movement route or takes a collision-avoiding action indicated by the policy, such as decelerating, stopping, or changing course.

Next, the movement control system is described in detail using specific examples. FIG. 1 shows an example of a graph fitted to a layout in the movement control system disclosed herein. FIG. 1 shows an example in which 16 × 16 nodes are assigned to a square horizontal area modeled on a factory layout. In the figure, the white circles numbered "0" to "5" indicate the positions of mobile devices such as AGVs. The squares with vertical hatching, horizontal hatching, and combined vertical and horizontal cross-hatching indicate, respectively, nodes for standby charging, nodes for loading, and nodes for unloading. The areas shown with diagonal cross-hatching are obstacles such as installed or placed objects, which the mobile device cannot enter.

In the example of FIG. 1, mobile device "0" is on standby while charging, and mobile devices "1" to "5" are executing assigned tasks. In the example of FIG. 1, a task is a series of operations in which a mobile device moves from its start position to a designated loading node, loads cargo there, moves to a designated unloading node, and unloads the cargo there, all while avoiding abnormalities such as collisions with obstacles and other mobile devices, interference, and deadlock.

FIG. 2 is an explanatory diagram conceptually showing an example of the neighborhood area defined around a mobile device in the movement control system disclosed herein. FIG. 2 illustrates a neighborhood area defined on the horizontal plane on which the mobile device moves. The neighborhood area contains a plurality of unit areas, shown as squares. In FIG. 2, the mobile device is the white circle, and the arrow extending from the white circle indicates its traveling direction. The size of each unit area is set based on the size of the mobile device; FIG. 2 shows an example in which a square large enough to contain the horizontal extent of the mobile device is used as the unit area. In the neighborhood area, unit areas are arranged at intervals of the unit movement distance, which is the basic step of the mobile device's movement. In FIG. 2, the unit movement distance is the length of one side of a unit area, and each time the mobile device moves by one unit movement distance, it judges from the surrounding state whether it can move. The neighborhood area illustrated in FIG. 2 is defined to cover two unit movement distances.

FIG. 3 is an explanatory diagram illustrating whether the unit areas in a neighborhood area can be entered in the movement control system disclosed herein. FIG. 3 illustrates the neighborhood area defined around mobile device "0". In FIG. 3, the cross-hatched unit areas contain obstacles such as installed or placed objects and cannot be entered by the mobile device. The white circles marked "3" and "5" are other mobile devices "3" and "5"; each white circle indicates the current position, and the arrow extending from it indicates the traveling direction. To prevent collisions, a unit area where another mobile device is located and a unit area into which another mobile device is moving should not be entered. That is, for mobile device "0", unit areas containing an obstacle, unit areas where another mobile device is located, and unit areas into which another mobile device is moving are non-enterable, and all other unit areas are enterable. In the example of FIG. 3, if mobile device "0" advances forward in its traveling direction, it enters the unit area into which mobile device "3" is moving, and a collision occurs. Mobile device "0" therefore needs to take an action that avoids the collision.
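The enterable and non-enterable judgment described for FIG. 3 might be sketched as below. The shape of the neighborhood area is an assumption here: the sketch uses all unit areas reachable within two unit moves (a diamond of 13 cells including the device's own cell), which the text does not confirm. The function and parameter names are hypothetical.

```python
def neighborhood_offsets(radius=2):
    """Offsets (d_row, d_col) of unit areas within `radius` unit moves.

    Assumption: diamond-shaped neighborhood that includes the own cell (0, 0).
    """
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if abs(dr) + abs(dc) <= radius]

def entry_flags(own_pos, obstacles, others, radius=2):
    """Map each neighborhood offset to True (enterable) or False (non-enterable).

    obstacles: set of (row, col) cells holding installed or placed objects.
    others: list of (current_cell, next_cell) pairs, one per other mobile device.
    """
    blocked = set(obstacles)
    for current_cell, next_cell in others:
        blocked.add(current_cell)   # unit area where another device is located
        blocked.add(next_cell)      # unit area into which it is moving
    r, c = own_pos
    return {(dr, dc): (r + dr, c + dc) not in blocked
            for dr, dc in neighborhood_offsets(radius)}
```

Both the current cell and the next cell of every other device are marked as blocked, which mirrors the rule that a unit area is non-enterable when another device is in it or is moving into it.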

A learning model such as a Q function, created in advance by reinforcement learning such as Q-learning, is used to decide the action of a mobile device for avoiding a collision. FIG. 4 is an explanatory diagram conceptually showing an example of the Q-learning used to decide the action of a mobile device in the movement control system disclosed herein. Q-learning is one of the representative methods of reinforcement learning. Reinforcement learning consists of two elements: a dynamic environment and an agent. When the agent takes an action in an environment, it receives a reward that depends on how the action changes the environment; reinforcement learning finds, through repeated trial and error, a policy that maximizes this reward. FIG. 4 conceptually shows the relationship among the agent, the environment, the state s_t, the reward r_t, and the action a_t. When the agent takes action a_t, the environment changes, and the changed state s_t and its reward r_t are calculated. The agent then takes its next action a_t according to the changed state s_t. Maximizing the reward r_t amounts to expressing the value of taking a given action in a given state s_t as a Q value given by a Q function and selecting actions so that the Q value is maximized. From this viewpoint, Q-learning proceeds so as to maximize the action-value function Q_π(s_t, a_t) defined by Equation (1) below.

$$Q_{\pi}(s_t, a_t) = E_{\pi}[\, r_t \mid s_t = s,\ a_t = a \,] \qquad \text{(1)}$$
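The text gives only the definition in Equation (1) and does not state how the Q values are updated during learning. As a hedged illustration, the sketch below uses the standard textbook tabular Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max Q(s', ·) − Q(s, a)). The learning rate `alpha`, the discount factor `gamma`, their default values, and all function names are assumptions, not values from the disclosure.

```python
from collections import defaultdict

# Five actions, matching the ones the document lists for a four-direction layout.
ACTIONS = ("stop", "up", "down", "left", "right")

def new_q_table():
    """Q table mapping state -> {action: Q value}, initialised to zero."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new Q value.

    alpha and gamma are hypothetical hyperparameters chosen for illustration.
    """
    best_next = max(q_table[next_state].values())   # greedy value of the next state
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

Repeating this update over many simulated steps is what would fill the table of Q values that the movement control system later only reads.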

In the movement control system disclosed herein, the neighborhood area illustrated in FIG. 2 is defined as the state space used for Q-learning. The neighborhood area illustrated in FIG. 2 is defined by arranging unit areas covering two unit movement distances of the mobile device, which is the minimum configuration when the movement of other mobile devices is taken into account. In this way, the movement control system defines as the neighborhood area a state space that is relative, in that it looks only at the surroundings with the position of the mobile device as the reference, that is simple, and that does not require a large storage capacity. This relative and simple state space differs greatly from route planning by conventional, general Q-learning, which defines the entire layout of the range in which the mobile device can move as the state space. Because the neighborhood area is defined as such a state space, the Q function that is to be learned and acquired in advance does not need to be relearned even when the environment changes, for example when the layout of the movable range or the number of other mobile devices changes. In addition, the relative, simple neighborhood area that does not require a large storage capacity makes it possible to shorten the time required for Q-learning itself.

Next, specific examples of how the movement control system decides the action of a mobile device are described. FIG. 5 is an explanatory diagram showing an example of the neighborhood area and movement route of a mobile device superimposed on the layout in the movement control system disclosed herein. FIG. 6 is an explanatory diagram showing an example of the state of the neighborhood area around the mobile device. FIG. 5 superimposes, on the graph fitted to the layout illustrated in FIG. 1, the extent of the neighborhood area of mobile device "0" and the movement route, found by the A* algorithm, from the current position of mobile device "0" to the target unloading node. FIG. 6 shows the state of the neighborhood area defined around mobile device "0" in FIG. 5. The neighborhood area defined around mobile device "0", which is located at a loading node, contains unit areas that cannot be entered because of obstacles, but the unit area in the traveling direction indicated by the arrow contains no obstacle. Furthermore, no other mobile device is present in or moving into the neighborhood area defined around mobile device "0". The Q function is therefore not updated, and mobile device "0" moves along the movement route.

FIG. 7 is an explanatory diagram showing an example of the neighborhood area and movement route of a mobile device superimposed on the layout in the movement control system disclosed herein. FIG. 8 is an explanatory diagram showing an example of the state of the neighborhood area around the mobile device. FIG. 7 shows the neighborhood area and movement route for mobile device "1", and FIG. 8 shows the state of that neighborhood area. Because another mobile device "3" is present in the neighborhood area defined around mobile device "1", the Q function is updated. However, the unit area in front of mobile device "1" is neither the unit area where mobile device "3" is located nor the unit area into which it is moving. In the situation illustrated in FIGS. 7 and 8, the action decided for mobile device "1" is therefore to keep moving along the movement route without changing it.

FIG. 9 is an explanatory diagram showing an example of the neighborhood area and movement route of a mobile device superimposed on the layout in the movement control system disclosed herein. FIG. 10 is an explanatory diagram showing an example of the state of the neighborhood area around the mobile device. FIG. 9 shows the neighborhood area and movement route for mobile device "2", and FIG. 10 shows the state of that neighborhood area. Because another mobile device "4" is present in the neighborhood area defined around mobile device "2", the Q function is updated. In the situation illustrated in FIGS. 9 and 10, to avoid a collision with the moving mobile device "4", the action decided for mobile device "2" is to stop following the movement route and switch to an avoidance action such as decelerating, stopping, turning left, turning right, or moving backward.

FIG. 11 is an explanatory diagram conceptually showing an example of an action update by Q-learning in the movement control system disclosed herein. The upper-right diagram in FIG. 11 shows the observation of the surroundings fitted to the neighborhood area defined around mobile device "1". In the example of FIG. 11, mobile device "1" is moving downward in the figure. The movement control system rotates the state fitted to the neighborhood area so that the direction of movement along the searched movement route points upward in the figure. This rotation is shown in FIG. 11 as the transition from the upper-right diagram to the left diagram. In the example of FIG. 11, the traveling direction is converted from downward to upward, so the state is rotated by 180°. The rotation of the neighborhood area is a process for applying the observation to the learning model created in advance by reinforcement learning such as Q-learning.
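The rotation step can be illustrated with a short sketch. It assumes the observation is held as a dictionary from (d_row, d_col) offsets to an enterable flag, with rows increasing downward as on screen; that representation and the names `QUARTER_TURNS` and `rotate_to_heading_up` are hypothetical.

```python
# Number of counterclockwise quarter turns (as seen on screen) that bring
# each traveling direction to "up".
QUARTER_TURNS = {"up": 0, "right": 1, "down": 2, "left": 3}

def rotate_to_heading_up(flags, heading):
    """Rotate neighborhood offsets so that `heading` points up (negative row direction).

    flags: dict mapping (d_row, d_col) -> bool (True means enterable).
    """
    rotated = {}
    for (dr, dc), enterable in flags.items():
        for _ in range(QUARTER_TURNS[heading]):
            dr, dc = -dc, dr   # one counterclockwise quarter turn: right becomes up
        rotated[(dr, dc)] = enterable
    return rotated
```

With a downward heading, as in FIG. 11, two quarter turns are applied, so the unit area directly below the device in the raw observation becomes the unit area directly above it in the rotated state pattern. One learned table can then serve every traveling direction.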

The lower diagram in FIG. 11 conceptually shows the contents of the learning model. The learning model assigns a state number, which serves as an identification number, to each of the various states of the neighborhood area, and stores each state number in association with Q values, which are the per-action rewards for the state pattern representing that state of the neighborhood area. FIG. 11 shows N state patterns (N is a natural number), numbered 1, 2, 3, 4, ..., N-3, N-2, N-1, N, arranged vertically, with the Q values of the actions for each state pattern arranged horizontally. For a layout in which movement is possible only in the four directions up, down, left, and right (forward, backward, left, and right), for example, there are five actions: stop, up (forward), down (backward), left (left turn), and right (right turn). A Q value is calculated and assigned to each action.
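One way to obtain a state number from a state pattern is to pack the enterable flags into an integer. The text does not say how state numbers are assigned, so the bit-packing scheme, the fixed `cell_order`, and the function name below are assumptions made for illustration only.

```python
def state_number(flags, cell_order):
    """Pack entry flags into an integer in the range 1 .. 2**len(cell_order).

    flags: dict mapping (d_row, d_col) -> bool (True means enterable).
    cell_order: fixed list of offsets that defines the bit position of each unit area.
    Assumption: one bit per unit area, set when entry is not permitted.
    """
    bits = 0
    for i, offset in enumerate(cell_order):
        if not flags[offset]:
            bits |= 1 << i
    return bits + 1   # the document numbers state patterns from 1
```

Under this scheme a neighborhood with k unit areas yields N = 2**k state patterns, and a learning model can be held as a table from state number to one row of five Q values.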

The movement control system takes the state pattern that results from rotating the neighborhood area built from the observation of the surroundings, looks up the trained Q-learning model by the identification number of that state pattern, and obtains from the Q-learning model the array of Q values assigned to the actions. The right-hand diagram in FIG. 11 conceptually shows the array of Q values obtained from the Q-learning model. In the example of FIG. 11, the Q value is "1" for up, which is movement along the searched movement route; "-1" for left and right, where obstacles are present and a collision would occur; and "0" for stop and down, which cause no collision but are wasted motion. Following the obtained array, the movement control system controls the mobile device to take the action with the highest Q value, which is up.

The relationship between the state pattern of the neighborhood area and the action of the mobile device is explained with several examples. FIG. 12 is an explanatory diagram conceptually showing an example of the relationship between a state pattern and an array of Q values in the movement control system disclosed herein. In FIG. 12, the left diagram shows the observation of the surroundings fitted to the neighborhood area, and the right diagram shows the state pattern rotated so that the traveling direction along the searched movement route points upward, together with the array of Q values obtained from the Q-learning model for that state pattern. FIG. 12 shows an example in which an obstacle, mobile device "1", and mobile device "2" are present around mobile device "0", the device whose action is being controlled. In FIG. 12, mobile device "0" collides with the obstacle if it moves left (left turn), with mobile device "2" if it moves up (forward), and with mobile device "1" if it moves right (right turn). The Q values for stop, up, down, left, and right are "0", "-1", "0", "-1", and "-1", respectively. In the example of FIG. 12, the maximum Q value is "0", so stop and down are the candidates, and the action of mobile device "0" is decided, for example by a random number, as either stopping or moving down (backward).
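Selecting the action with the highest Q value, with a random choice among ties as in the FIG. 12 example, can be sketched as follows. The action order (stop, up, down, left, right) follows the order in which the text lists the Q values; the function name and the `rng` parameter are hypothetical.

```python
import random

ACTIONS = ("stop", "up", "down", "left", "right")

def select_action(q_values, rng=random):
    """Return the action with the highest Q value, breaking ties at random.

    q_values: sequence of five Q values ordered as in ACTIONS.
    rng: any object with a choice() method, such as the random module.
    """
    best = max(q_values)
    candidates = [a for a, q in zip(ACTIONS, q_values) if q == best]
    return rng.choice(candidates)
```

For the FIG. 12 array `[0, -1, 0, -1, -1]` the result is either "stop" or "down", while an array with a single maximum, such as `[-1, -1, -1, 0, -1]`, always yields "left".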

FIG. 13 is an explanatory diagram conceptually showing an example of the relationship between a state pattern and an array of Q values in the movement control system disclosed herein. FIG. 13 shows another example of the kind of state illustrated in FIG. 12. FIG. 13 shows an example in which an obstacle and mobile devices "1", "2", and "3" are present around mobile device "0", the device whose action is being controlled. In FIG. 13, mobile device "0" collides with the obstacle if it moves down, with mobile device "2" if it moves up, and with mobile device "1" if it moves right or stops. The Q values for stop, up, down, left, and right are "-1", "-1", "-1", "0", and "-1", respectively. In the example of FIG. 13, the maximum Q value is "0", so the action of mobile device "0" is decided as moving left.

FIG. 14 is an explanatory diagram conceptually showing an example of the relationship between a state pattern and an array of Q values in the movement control system disclosed herein. FIG. 14 shows yet another example of the kind of state illustrated in FIGS. 12 and 13. FIG. 14 shows an example in which an obstacle, mobile device "1", and mobile device "2" are present around mobile device "0", the device whose action is being controlled. Mobile device "2" is stopped above (in front of) mobile device "0". In FIG. 14, mobile device "0" collides with the obstacle if it moves down, with the stopped mobile device "2" if it moves up, and with mobile device "1" if it moves right or stops. The Q values for stop, up, down, left, and right are "-1", "-1", "-1", "0", and "-1", respectively. In the example of FIG. 14, the maximum Q value is "0", so the action of mobile device "0" is decided as moving left.

As described above, the movement control system disclosed herein searches for the movement route of a mobile device using a graph search algorithm such as the A* algorithm, and decides the action of the mobile device after it starts moving based on the Q values that the Q-learning model returns for the state pattern corresponding to the observation of the neighborhood area.

<Implementation example>
An implementation example of the movement control system S disclosed herein, as shown in the drawings, is described below with reference to the drawings.

<System configuration>
FIG. 15 is a conceptual diagram showing an example of the system configuration of the movement control system S disclosed herein. The movement control system S includes a plurality of mobile devices 1, a movement control device 2 that controls the mobile devices 1, and a machine learning device 3 that provides a learning model M to the movement control device 2. The movement control system S is applied to logistics inside or outside the premises of a facility such as a large-scale factory or warehouse. Inside the facility, guide paths P such as magnetic tape are laid on the ground in a grid, and the mobile devices 1 move along the guide paths P. Each mobile device 1 performs bidirectional communication with the movement control device 2. The machine learning device 3 generates the learning model M, such as a Q-learning model, by reinforcement learning such as Q-learning. The machine learning device 3 provides the generated learning model M to the movement control device 2 via communication or via a recording medium such as a semiconductor memory.

<Hardware configuration of various devices>
Next, configuration examples of the various devices included in the movement control system S are described. FIG. 16 is a block diagram showing an example of the hardware configuration of the mobile device 1 used in the movement control system S disclosed herein. The mobile device 1 is a device such as an AGV capable of autonomous movement and, in addition to moving, performs transport work such as loading and unloading transported objects such as parts and products. The mobile device 1 includes components such as a control unit 10, a storage unit 11, a communication unit 12, a position acquisition unit 13, a route detection unit 14, a surrounding situation acquisition unit 15, a drive unit 16, a drive control unit 17, a transport unit 18, and a transport control unit 19.

The control unit 10 is a processor such as a CPU (Central Processing Unit) that executes processing to control the entire device, and includes various circuits such as an information processing circuit, a timekeeping circuit, and a register circuit.

The storage unit 11 is a storage unit configured using nonvolatile memory, such as a hard disk, an SSD (Solid State Drive), a RAID (Redundant Arrays of Inexpensive Disks), or flash memory, and volatile memory such as various types of RAM (Random Access Memory). The storage unit 11 stores various programs, including a mobile device program 110 for realizing the mobile device 1 of the movement control system S disclosed herein. When the control unit 10 executes the steps included in the programs stored in the storage unit 11, such as the mobile device program 110, the AGV functions as the mobile device 1 disclosed herein.

The communication unit 12 is a unit configured using hardware and software for performing communication under a wireless communication standard such as Wi-Fi (registered trademark) as defined by the IEEE. The communication unit 12 performs transmission processing, in which information such as position information indicating the position of the mobile device 1 is sent to the movement control device 2, and reception processing, in which information such as action commands sent from the movement control device 2 is received.

The position acquisition unit 13 is a unit that acquires position information indicating the position of the device itself. The position acquisition unit 13 is configured using hardware and software such as GNSS (Global Navigation Satellite System) and various position detection sensors. Alternatively, in the movement control system S, detection devices such as image sensors, optical sensors, or magnetic sensors may be placed in the facility, and position information indicating the position of the mobile device 1, obtained from the detection results, may be sent from the detection devices to the mobile device 1. When position information is sent from the detection devices to the mobile device 1, the position acquisition unit 13 is configured using a unit that receives position information. As a further alternative, position information indicating the position of each mobile device 1, obtained from the detection results, may be sent from the detection devices to the movement control device 2, which manages the positions of all mobile devices 1, and each mobile device 1 may access the movement control device 2 to obtain position information. In this case as well, the position acquisition unit 13 is configured using a unit that receives position information.

The route detection unit 14 is a unit, such as various sensors, that detects that the mobile device 1 is moving along the route. For example, when a guide path P made of magnetic tape is laid, a sensor such as a magnetic sensor is used as the route detection unit 14. When tape with a distinguishing feature such as a specific color or light emission is laid as the guide path P, a sensor such as an image sensor is used.

The surrounding situation acquisition unit 15 is a unit that acquires the surrounding situation using sensors that detect the surroundings and an analysis circuit that analyzes the sensor results. The sensors used to detect the surroundings include an image sensor that images the surroundings to obtain the state of other mobile devices 1 and obstacles, a laser sensor that detects the surroundings by laser, and a short-range wireless unit that communicates with other mobile devices 1.

The drive unit 16 is the unit needed for movement, such as tires and the motors that drive them, and the drive control unit 17 is configured using a driver that controls the drive unit 16. The transport unit 18 is the unit needed for transport, such as a robot arm or belt conveyor, and the transport control unit 19 is configured using a driver that controls the transport unit 18.

FIG. 17 is a block diagram showing an example of the hardware configuration of the movement control device 2 used in the movement control system S disclosed herein. The movement control device 2 is a device that controls the mobile device 1, and is realized using a computer such as a server computer. The movement control device 2 includes various components such as a control unit 20, a storage unit 21, a first communication unit 22, a second communication unit 23, and a task acquisition unit 24.

The control unit 20 is a processor such as a CPU that controls the entire device, and includes various circuits such as an information processing circuit, a clock circuit, and a register circuit.

The storage unit 21 is a storage unit configured using nonvolatile memory and volatile memory. It stores various programs, such as a basic program (OS: Operating System) and application programs that run on the basic program, as well as various data. As one of the application programs, the storage unit 21 stores a movement control program 210 for realizing the movement control device 2 of the movement control system S disclosed herein. The storage unit 21 also stores a learning model M, such as a Q-learning model, acquired from the machine learning device 3.

The first communication unit 22 is a unit for communicating with the mobile device 1. It receives information such as position information indicating the position of the mobile device 1 from the mobile device 1, and transmits information such as action commands to the mobile device 1.

The second communication unit 23 is a unit for communicating with the machine learning device 3, and does so through communication means such as a WAN (Wide Area Network), a LAN (Local Area Network), a dedicated communication network, or a communication line. The movement control device 2 receives information such as the learning model M from the machine learning device 3 via the second communication unit 23, but the information can also be exchanged via a medium such as a semiconductor memory.

The task acquisition unit 24 is a unit that acquires a task to be executed by the mobile device 1. Such a task is, for example, the operation of transporting objects such as parts or products from a designated loading node to a designated unloading node. Tasks are acquired by methods such as automatic generation by a scheduling program, input by a person in charge, or reception of a command from another device.

Under the control of the control unit 20, the computer configured as described above reads various programs such as the movement control program 210 from the storage unit 21 and executes the steps included in those programs, thereby operating as the movement control device 2 disclosed herein.

FIG. 18 is a block diagram showing an example of the hardware configuration of the machine learning device 3 disclosed herein. The machine learning device 3 is a device that generates a learning model M, such as a Q-learning model, used to determine the behavior of the mobile device 1, and is realized using a computer such as a general-purpose computer. The machine learning device 3 includes various components such as a control unit 30, a storage unit 31, and a communication unit 32.

The storage unit 31 stores various programs such as a machine learning program 310 for realizing the machine learning device 3 of the movement control system S disclosed herein. The storage unit 31 also stores the learning model M, such as a Q-learning model, generated by machine learning. The learning model M stored in the storage unit 31 is provided to the movement control device 2, for example, via the communication unit 32.

<Software processing of various devices>
Next, the software processing of the various devices included in the movement control system S will be described. FIG. 19 is a flowchart showing an example of the machine learning process of the machine learning device 3 included in the movement control system S disclosed herein. The machine learning process is a process that generates the learning model M. The control unit 30 of the machine learning device 3 executes the machine learning process by executing the machine learning program 310.

The control unit 30 of the machine learning device 3 initializes a Q function defined over the neighborhood area that governs the behavior of the mobile device 1 (S101). The control unit 30 then creates a state pattern assumed to occur in the neighborhood area (S102). In step S102, for the neighborhood area illustrated in FIG. 2, in which the traveling direction points upward, a state pattern is created that reflects a situation assuming the arrangement of obstacles and the movement of other mobile devices 1, as illustrated in FIG. 3 and elsewhere.

The control unit 30 observes the current state of the mobile device 1 (S103), decides on actions of various patterns (S104), observes the next situation that results when the decided action is taken (S105), calculates from the observed result a reward that serves as the Q value (S106), and updates the Q function based on the calculated reward (S107). In step S104, actions such as moving forward, turning left (moving to the left), turning right (moving to the right), moving backward, and stopping are decided relative to movement along the set movement route. In step S106, the reward is calculated so that, for example, moving forward, which corresponds to movement along the set movement route, is "1", turning left and turning right are "-1", and moving backward and stopping are "0".
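The update rule applied in step S107 is not written out in the text. As a sketch only, assuming the standard tabular Q-learning update, with s the state pattern, a the action decided in S104, r the reward calculated in S106, and s' the situation observed in S105, it would take the following form. The learning rate α and the discount factor γ are assumptions introduced here and are not values given in this description.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

With γ = 0 the Q value converges to the reward itself, which would match the statement that the reward serves as the Q value.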

The control unit 30 determines whether reinforcement learning has been completed for all assumed state patterns (S108).

If it is determined in step S108 that a state pattern remains for which reinforcement learning has not been completed (S108: NO), the control unit 30 returns to step S102 to perform reinforcement learning on another state pattern, and repeats the subsequent processing.

If it is determined in step S108 that reinforcement learning has been completed for all state patterns (S108: YES), the control unit 30 ends the machine learning process. The Q function updated by reinforcement learning is provided from the machine learning device 3 to the movement control device 2 as the learning model M, such as a Q-learning model.

The machine learning device 3 executes the machine learning process as described above.
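The loop of steps S101 to S108 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the machine learning program 310: the neighborhood is reduced to four hypothetical unit areas (front, left, right, back), the collision reward of -10, the learning rate, and the epsilon-greedy exploration are all assumptions, and the update is the one-step form in which the Q value converges to the reward. Only the rewards 1, -1, and 0 come from the example given for step S106.

```python
import random

ACTIONS = ["forward", "left", "right", "backward", "stop"]
# Example rewards given for step S106.
STEP_REWARD = {"forward": 1.0, "left": -1.0, "right": -1.0,
               "backward": 0.0, "stop": 0.0}
# Hypothetical mapping: index of the unit area each action would enter.
ENTERED_CELL = {"forward": 0, "left": 1, "right": 2, "backward": 3}
COLLISION_REWARD = -10.0  # assumption: penalty for entering a blocked unit area


def reward(state, action):
    """state[i] is True when unit area i cannot be entered."""
    cell = ENTERED_CELL.get(action)  # "stop" enters no unit area
    if cell is not None and state[cell]:
        return COLLISION_REWARD
    return STEP_REWARD[action]


def train(trials_per_state=200, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}                                            # S101: initialise Q
    for bits in range(16):                            # S102: next state pattern
        state = tuple(bool((bits >> i) & 1) for i in range(4))
        q[state] = {a: 0.0 for a in ACTIONS}
        for _ in range(trials_per_state):
            if rng.random() < epsilon:                # S104: explore ...
                action = rng.choice(ACTIONS)
            else:                                     # ... or exploit
                action = max(ACTIONS, key=q[state].get)
            r = reward(state, action)                 # S105-S106
            q[state][action] += alpha * (r - q[state][action])  # S107
    return q                                          # S108: all patterns done


def best_action(q, state):
    return max(ACTIONS, key=q[state].get)
```

Calling `train()` returns a table keyed by state pattern, which plays the role of the learning model M in this sketch; `best_action` then reads out the action with the highest Q value for a given pattern.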

FIG. 20 is a flowchart showing an example of the movement control process of the movement control device 2 included in the movement control system S disclosed herein. The movement control process is a process that controls the behavior of the mobile device 1 according to the surrounding situation. The control unit 20 of the movement control device 2 executes the movement control process by executing the movement control program 210.

The control unit 20 of the movement control device 2 determines whether the task acquisition unit 24 has acquired a task (S201).

If it is determined in step S201 that no task has been acquired (S201: NO), the control unit 20 returns to step S201 and repeats the subsequent processing.

If it is determined in step S201 that a task has been acquired (S201: YES), the control unit 20 determines whether, among the plurality of mobile devices 1 under its control, there is a mobile device 1 that has not been given a task and is on standby (S202). In step S202, the movement control device 2 communicates with each mobile device 1, acquires its status, such as on standby, executing a task, or returning to the standby location, and determines from the acquired statuses whether a mobile device 1 on standby exists.

If it is determined in step S202 that there is no mobile device 1 on standby (S202: NO), the control unit 20 returns to step S202 and repeats the subsequent processing. This repeated determination amounts to waiting until a mobile device 1 on standby becomes available. The processing performed when no mobile device 1 is on standby can be designed as appropriate, for example by replying that the acquired task cannot be executed.

If it is determined in step S202 that a mobile device 1 on standby exists (S202: YES), the control unit 20 selects, from among the mobile devices 1 on standby, the mobile device 1 to which the task is to be assigned, and assigns the task to the selected mobile device 1 (S203).

The control unit 20 acquires the standby position of the mobile device 1 to which the task has been assigned, and searches for a route from the acquired standby position to the target position of the movement (S204). Step S204 is a process of searching for a movement route from the start node, which corresponds to the movement start position of the mobile device 1, to the goal node, which corresponds to the target position, using a graph search algorithm such as the A* algorithm. In step S204, if the acquired task is, for example, the transport of an object, a route search is first performed from the node where standby charging takes place to the loading node, then from the loading node to the unloading node, and finally from the unloading node back to the node where standby charging takes place.
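The route search of step S204 can be sketched as follows, assuming a grid-shaped graph with unit edge costs and a Manhattan-distance heuristic. The function names (`a_star`, `grid_neighbors`, `plan_task`) and the grid layout are hypothetical and introduced only for illustration; the description itself specifies only that a graph search algorithm such as A* is used and that a transport task is searched in three legs.

```python
import heapq


def a_star(neighbors, start, goal, heuristic):
    """Return the shortest node list from start to goal, or None if unreachable."""
    open_heap = [(heuristic(start, goal), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:        # walk back to the start node
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if g > cost[node]:
            continue                       # stale heap entry, already improved
        for nxt in neighbors(node):
            new_g = g + 1                  # every edge costs one unit move
            if new_g < cost.get(nxt, float("inf")):
                cost[nxt] = new_g
                came_from[nxt] = node
                heapq.heappush(open_heap, (new_g + heuristic(nxt, goal), new_g, nxt))
    return None


def grid_neighbors(width, height, blocked):
    """Neighbor function for a width x height grid; blocked nodes hold obstacles."""
    def neighbors(node):
        x, y = node
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in blocked:
                yield (nx, ny)
    return neighbors


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def plan_task(neighbors, standby, loading, unloading):
    """Three legs: standby -> loading -> unloading -> standby."""
    legs = [(standby, loading), (loading, unloading), (unloading, standby)]
    return [a_star(neighbors, s, g, manhattan) for s, g in legs]
```

Because obstacles are excluded from the graph at this stage, the later neighborhood check only has to deal with what the search could not know in advance, such as other mobile devices.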

The control unit 20 performs a movement instruction process in which route information indicating the route obtained by the search is transmitted as a movement command from the first communication unit 22 to the mobile device 1, instructing it to move along the movement route (S205). In step S205, the control unit 20 transmits the movement command from the first communication unit 22 to the mobile device 1 to which the task has been assigned. The mobile device 1 that has received the movement command starts moving in accordance with it.

The control unit 20 determines whether the mobile device 1 has completed the task (S206).

If it is determined in step S206 that the task has not been completed (S206: NO), the control unit 20 acquires the situation around the mobile device 1 (S207). If the movement control device 2 already knows the positions and statuses of all the mobile devices 1 and the status of obstacles, the control unit 20 executes the acquisition process of step S207 by reading out the surrounding situation. If the movement control device 2 does not know the positions and statuses of the other mobile devices 1, the control unit 20 executes the acquisition process of step S207 by transmitting, from the first communication unit 22 to the mobile device 1 subject to behavior control, a command requesting acquisition of the surrounding situation, and receiving at the first communication unit 22 the information indicating the surrounding situation that the mobile device 1 acquired with its surrounding situation acquisition unit 15. The movement control device 2 may also combine these two acquisition processes, which can improve the accuracy of the acquired situation.

Based on the acquired situation around the mobile device 1, the control unit 20 determines, for every unit area included in the neighborhood area, whether entry is possible (S208). In step S208, the control unit 20 determines that entry is not possible for a unit area in which another mobile device 1 is located, a unit area into which another mobile device 1 will move, and a unit area in which an obstacle exists.

Based on the determination result of step S208, the control unit 20 determines whether the neighborhood area contains a unit area that cannot be entered (S209). Because unit areas that cannot be entered due to obstacles have already been taken into account at the route search stage, step S209 can be sped up by considering only the unit areas that cannot be entered due to other mobile devices 1. Alternatively, step S209 may determine the presence of unit areas that cannot be entered with obstacles included, to allow for the appearance of unexpected obstacles such as an overturned fixture or a dropped transported object. When obstacles are included in this determination, it is desirable to acquire the situation around the mobile device 1 from the detection results obtained by the surrounding situation acquisition unit of the mobile device 1 itself.
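Steps S208 and S209 can be sketched as follows. This is an illustration under assumptions: the neighborhood is reduced to four hypothetical unit areas given as offsets from the device with the traveling direction pointing up, whereas the actual neighborhood area is the one illustrated in FIG. 2. The `include_obstacles` flag is a hypothetical name for the choice described above between considering only other mobile devices and also considering obstacles.

```python
# Hypothetical neighborhood: front, left, right, back of the device (heading up).
NEIGHBOR_OFFSETS = [(0, 1), (-1, 0), (1, 0), (0, -1)]


def entry_flags(position, other_positions, other_next_positions,
                obstacles, include_obstacles=False):
    """S208: one flag per unit area; True means the area cannot be entered."""
    x, y = position
    # Areas where another device is located, or into which it will move.
    no_entry = set(other_positions) | set(other_next_positions)
    if include_obstacles:
        # Optional: also react to obstacles that appeared after route search.
        no_entry |= set(obstacles)
    return tuple((x + dx, y + dy) in no_entry for dx, dy in NEIGHBOR_OFFSETS)


def has_no_entry_area(flags):
    """S209: is there at least one unit area that cannot be entered?"""
    return any(flags)
```

The returned tuple is the state pattern in this sketch, so it can be used directly as the key when looking up a policy in the learning model.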

If it is determined in step S209 that the neighborhood area contains no unit area that cannot be entered due to another mobile device 1 (S209: NO), the process returns to step S205, and the control unit 20 performs the movement instruction process that instructs movement along the movement route (S205). In step S205, the control unit 20 transmits a movement command to continue moving along the movement route from the first communication unit 22 to the mobile device 1, and repeats the processing from step S206 onward.

If it is determined in step S209 that the unit areas in the neighborhood area include one that cannot be entered (S209: YES), the control unit 20 acquires from the learning model M the policy corresponding to the state pattern, which indicates whether each unit area in the neighborhood area can be entered (S210). Based on the acquired policy, the control unit 20 updates the action of the mobile device 1 (S211), and instructs the mobile device 1 to act by transmitting an action command indicating the updated action from the first communication unit 22 to the mobile device 1 (S212). The processing of steps S210 to S212 causes the mobile device 1 to take the action with the highest Q value, based on the learning model M described with reference to FIG. 11. The actions indicated by an action command include not only turning left, turning right, and moving backward, but also decelerating and stopping.
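The lookup in steps S210 and S211 amounts to selecting, for the current state pattern, the action with the highest Q value. A minimal sketch follows, assuming the learning model M is held as a dictionary that maps each state pattern to a dictionary of Q values per action. This table layout and the fallback to "stop" for an unknown pattern are assumptions made for illustration.

```python
def select_action(q_table, state_pattern, default="stop"):
    """Return the action with the highest Q value for the given state pattern."""
    action_values = q_table.get(state_pattern)
    if not action_values:
        # Assumption: fall back to a safe action when the pattern is unknown.
        return default
    return max(action_values, key=action_values.get)
```

Since the table is prepared in advance by the machine learning device 3, the work done during actual movement is a single lookup per decision.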

The mobile device 1 that has received the action command at its communication unit 12 performs an action such as moving, decelerating, or stopping in accordance with the action command.

The control unit 20 determines whether the mobile device 1, after the action, has deviated from the movement route found in the route search process (S213). In step S213, if the updated action taken was a left turn or a right turn, the mobile device 1 is determined to have deviated from the movement route; if the action was decelerating, stopping, or moving backward, it is determined to be on the movement route.

If it is determined in step S213 that the mobile device 1 has not deviated from the movement route and is located on it (S213: NO), the control unit 20 returns to step S205 and repeats the subsequent processing.

If it is determined in step S213 that the mobile device 1 has deviated from the movement route (S213: YES), the control unit 20 resets the route (S214). The route is reset by a process such as a return process, in which the mobile device 1 goes back to the position where it left the movement route and resumes movement from there, or a re-search process, in which a movement route is searched for again from the deviated position.
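The two ways of resetting the route in step S214 can be sketched as follows. This is an illustration under assumptions: the route is a list of nodes, `last_on_route_index` is a hypothetical name for the index of the node at which the device left the route, and a breadth-first search stands in for the graph search algorithm (on a graph with unit edge costs it yields a shortest path, as A* would).

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search on an adjacency dict; stand-in for the route search."""
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in came_from:
                came_from[nxt] = node
                queue.append(nxt)
    return None


def reset_route(route, last_on_route_index, current, adjacency, mode="return"):
    if mode == "return":
        # Return process: go back to where the route was left, then resume.
        return [current] + route[last_on_route_index:]
    # Re-search process: search again from the deviated position to the goal.
    return shortest_path(adjacency, current, route[-1])
```

The return process keeps the original route intact, while the re-search process may find a shorter way to the goal from the deviated position.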

If it is determined in step S206 that the task has been completed (S206: YES), the control unit 20 ends the movement control process.

The movement control device 2 executes the movement control process as described above.

The movement control process of the movement control device 2 described above is a form in which the movement control device 2 centrally controls the behavior of the mobile devices 1. The movement control system S disclosed herein can be deployed in various forms, and can also be realized as distributed control in which each mobile device 1 decides its own behavior. An example of a form in which the individual mobile devices 1 decide their behavior is described below.

FIG. 21 is a flowchart showing an example of the movement instruction process of the movement control device 2 included in the movement control system S disclosed herein. The movement instruction process is a process in which the movement control device 2 assigns a task to a mobile device 1 and instructs it to move. The control unit 20 of the movement control device 2 executes the movement instruction process by executing the movement control program 210.

The control unit 20 of the movement control device 2 determines whether the task acquisition unit 24 has acquired a task (S301). The processing of step S301 is repeated until a task is acquired.

If it is determined in step S301 that a task has been acquired (S301: YES), the control unit 20 determines whether a mobile device 1 on standby exists (S302). The processing of step S302 is repeated until a mobile device 1 on standby is found.

If it is determined in step S302 that a mobile device 1 on standby exists (S302: YES), the control unit 20 assigns the task to the mobile device 1 on standby (S303), searches for a route to the target position (S304), and instructs movement along the movement route obtained by the search (S305). The movement instruction of step S305 is given by the movement control device 2 transmitting a movement command containing the movement route from the first communication unit 22 to the mobile device 1. Alternatively, the movement control device 2 may notify the mobile device 1 of the target position as the movement command, and the mobile device 1 may perform the route search itself.

The movement control device 2 executes the movement instruction process as described above.

FIG. 22 is a flowchart showing an example of the movement control process of the mobile device 1 included in the movement control system S disclosed herein. This movement control process is a process in which the mobile device 1 controls its own movement in accordance with a movement command. The control unit 10 of the mobile device 1 executes the movement control process by executing the mobile device program 110.

The control unit 10 of the mobile device 1 receives a movement command at the communication unit 12 (S401), moves along the movement route based on the received movement command (S402), and determines whether the task has been completed (S403).

If it is determined that the task has not been completed (S403: NO), the control unit 10 acquires the surrounding situation (S404) and determines, for each unit area included in the neighborhood area, whether entry is possible (S405). In step S404, if the movement control device 2 knows the statuses of all the mobile devices 1, the mobile device 1 accesses the movement control device 2 to acquire the surrounding situation. If the movement control device 2 does not know the statuses of the mobile devices 1, the mobile device 1 acquires the surrounding situation, including the statuses of other mobile devices 1, with its surrounding situation acquisition unit 15.

Based on the determination result of step S405, the control unit 10 determines whether the neighborhood area contains a unit area that cannot be entered (S406). If it determines that there is no unit area that cannot be entered due to another mobile device 1 (S406: NO), it returns to step S402 and continues moving along the movement route (S402).

If it is determined in step S406 that the unit areas in the neighborhood area include one that cannot be entered (S406: YES), the control unit 10 acquires the policy corresponding to the state pattern from the learning model M (S407). The learning model M may be stored in the storage unit 11 of the mobile device 1, or the mobile device 1 may access the movement control device 2 and use the learning model M stored there.

The control unit 10 updates its action based on the acquired policy (S408) and executes the updated action (S409). After the action, the control unit 10 determines whether the mobile device 1 has deviated from the movement route (S410). If it determines that the mobile device 1 is on the movement route (S410: NO), it returns to step S402 and continues moving along the movement route (S402).

If it is determined in step S410 that the mobile device 1 has deviated from the movement route (S410: YES), the control unit 10 resets the route (S411), returns to step S402, and continues moving (S402).

If it is determined in step S403 that the task has been completed (S403: YES), the control unit 10 ends the movement control process.

The mobile device 1 executes the movement control process as described above.
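The device-side loop of steps S402 to S409 can be sketched as a small discrete-step simulation. This is a deliberately reduced illustration under assumptions: only the unit area directly ahead on the route is checked, the policy can only answer "forward" or "stop", and the route deviation and reset of steps S410 and S411 are omitted. The names `run_device`, `blocked_at`, and `policy` are hypothetical.

```python
def run_device(route, blocked_at, policy, max_steps=100):
    """Follow route step by step and return the node occupied after each step.

    blocked_at(step) -> set of nodes that cannot be entered at that step.
    policy(front_blocked) -> "forward" or "stop".
    """
    index = 0
    trace = [route[0]]
    for step in range(max_steps):
        if index == len(route) - 1:                # S403: task completed
            return trace
        next_node = route[index + 1]
        front_blocked = next_node in blocked_at(step)    # S404-S406
        if front_blocked:
            action = policy(front_blocked)         # S407-S408: consult the policy
        else:
            action = "forward"                     # keep following the route
        if action == "forward":
            index += 1                             # S402 / S409: move one node
        trace.append(route[index])                 # "stop" repeats the same node
    return trace
```

With a policy that always stops when the node ahead is blocked, a device whose next node is occupied for one step waits for that step and then resumes along its route.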

<Simulation results>
Next, the results of a simulation test of the movement control system disclosed herein will be described. FIG. 23 is an explanatory diagram showing the graph used in the simulation test. FIG. 24 is an explanatory diagram showing the movement route of each mobile device superimposed on that graph. The diagram in FIG. 23 is a graph that defines the range in which mobile devices such as AGVs can move as a relationship between nodes and edges. In FIG. 23, black circles indicate nodes, and the solid lines connecting the nodes indicate edges. In FIG. 24, the white circles numbered "1" to "7" indicate the start nodes of the mobile devices, the arrows extending from the white circles indicate the routes the mobile devices traveled, and the arrow tips indicate the goal nodes. In FIGS. 23 and 24, mobile device "3" waited for one step at node "76" to avoid a collision with mobile device "7". Mobile device "1", after first reaching node "131", temporarily retreated to node "113" to avoid a collision with mobile device "2", and then moved to node "131" again. Thus, in the movement control system disclosed herein, the mobile devices moved to their goal nodes while avoiding collisions with other mobile devices.

FIG. 25 is a graph showing an example of the results of the simulation test of the movement control system disclosed herein. FIG. 25 plots the number of mobile devices on the horizontal axis against the calculation time required for route planning on the vertical axis. The number of nodes was fixed at 30. As shown in FIG. 25, as the number of mobile devices increases, the increase in calculation time tends to remain within linear growth in the number of mobile devices.

FIG. 26 is a graph showing an example of the results of the simulation test of the movement control system disclosed herein. FIG. 26 plots the number of nodes on one side of a grid-shaped layout on the horizontal axis against the calculation time required for route planning on the vertical axis. The number of mobile devices was fixed at five. As shown in FIG. 26, the calculation time grows as the number of nodes increases, but tends to remain within growth proportional to the square of the number of nodes.

As described above, the movement control system disclosed herein determines the behavior of a mobile device based on a state pattern in which the situation around the mobile device is reflected in a neighborhood area made up of unit areas arranged around the mobile device. Determining the action by comparison with state patterns makes it possible to respond flexibly to changes in the surrounding situation. For example, even when a situation arises that differs from the one at the time of route search, such as another mobile device being at an unplanned position because of a failure or other abnormality, or an obstacle having changed position because a fixture has overturned, the mobile device can take an action that avoids a collision, which is an excellent effect. Furthermore, when the size of the neighborhood area is defined as two units of the unit movement distance, the movement control system disclosed herein can take into account the positions of other mobile devices after they move, while suppressing the increase in state patterns and in calculation load that a larger neighborhood area would cause, which is another excellent effect.
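The trade-off between neighborhood size and the number of state patterns can be made concrete with a short calculation. The sketch below assumes that each unit area has two states (entry possible or not), so that a neighborhood of k unit areas gives 2^k state patterns, and it considers two hypothetical neighborhood shapes, a square and a diamond (Manhattan distance). The actual shape is the one illustrated in FIG. 2, which is not reproduced here, so the figures are illustrative only.

```python
def neighborhood_cells(radius, shape="square"):
    """Number of unit areas around the device, excluding the device's own area."""
    if shape == "square":
        return (2 * radius + 1) ** 2 - 1
    if shape == "diamond":
        return 2 * radius * (radius + 1)
    raise ValueError(f"unknown shape: {shape}")


def state_pattern_count(radius, shape="square"):
    """Two states per unit area gives 2**k patterns for k unit areas."""
    return 2 ** neighborhood_cells(radius, shape)
```

Under these assumptions, the pattern count grows exponentially in the number of unit areas, which is why extending the neighborhood beyond two units quickly becomes costly to learn and store.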

In addition, the movement control system disclosed in the present application obtains the relationship between state patterns and actions using a reinforcement learning method such as Q-learning, and the learning model can be created by reinforcement learning separately, in advance. This makes it possible to suppress an exponential increase, with respect to the number of mobile devices and the number of nodes, in computational costs such as computational load and computation time during actual movement, among other excellent effects. Moreover, the same learning model can be reused even when the layout that forms the movement range of the mobile devices is different.
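Because the learning model is prepared in advance, choosing an action at run time can reduce to a table lookup. The following is a minimal sketch under several assumptions: the learning model is held as a Python dictionary keyed by an integer state pattern, the action set is four movement directions plus stopping, and an unseen state pattern falls back to stopping. The name `acquire_policy`, the action labels, and the sample values are hypothetical.

```python
ACTIONS = ("up", "down", "left", "right", "stop")


def acquire_policy(q_table, state_pattern, default="stop"):
    """Return the action with the highest action value for a state pattern.

    Assumption: patterns missing from the table fall back to `default`.
    """
    values = q_table.get(state_pattern)
    if values is None:
        return default
    return max(ACTIONS, key=lambda action: values[action])


# Hypothetical learned values for one state pattern (made up for illustration).
q_table = {
    0b000000000001: {"up": -1, "down": 0, "left": 1, "right": 0, "stop": 0},
}

print(acquire_policy(q_table, 0b000000000001))  # left
print(acquire_policy(q_table, 0b111111111111))  # stop (pattern not in table)
```

The table is keyed only by the local state pattern, not by an absolute position in the layout, which is consistent with the statement that the same learning model can be reused for a different layout.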

In addition, because the movement control system disclosed in the present application uses a graph search algorithm such as the A* algorithm for the route search, it can derive the shortest route. As a result, when no interference occurs between mobile devices, movement along the shortest route can be achieved, among other excellent effects.
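The route search can be sketched as a textbook A* search over a grid graph. The sketch assumes a four-connected grid with unit edge costs and a Manhattan-distance heuristic; the function name `a_star` and the representation of the layout as a set of passable cells are illustrative choices, not details from the specification.

```python
import heapq


def a_star(passable, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    `passable` is the set of (x, y) cells the device may occupy.
    """
    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in passable:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# 3 x 3 layout with two blocked cells between start and goal.
layout = {(x, y) for x in range(3) for y in range(3)} - {(1, 0), (1, 1)}
route = a_star(layout, (0, 0), (2, 0))
print(route)
```

With unit edge costs the Manhattan heuristic never overestimates the remaining distance, so the returned route is a shortest one; in this layout it detours around the blocked cells in six moves.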

The present invention is not limited to the embodiments described above and can be implemented in various other forms. The embodiments are therefore merely illustrative in every respect and must not be interpreted restrictively. The scope of the present invention is indicated by the claims and is not bound in any way by the text of the specification. Furthermore, all modifications and changes falling within the range of equivalents of the claims are within the scope of the present invention.

For example, the embodiments and simulations above apply an AGV that moves on a plane as the mobile device, but the movement control system disclosed in the present application is not limited to this. Devices that move in three-dimensional space, including up and down, such as a UAV (unmanned aerial vehicle; a so-called "drone") or an underwater mobile robot, can also be applied as the mobile device. When the system is applied to a mobile device that moves three-dimensionally, the neighborhood area is likewise defined as a three-dimensional space.

FIG. 27 is an explanatory diagram conceptually showing an example of the neighborhood area defined around a mobile device in the movement control system disclosed in the present application. FIG. 27 illustrates a neighborhood area defined in the three-dimensional space in which the mobile device moves. The neighborhood area includes a plurality of unit areas, shown as cubes. The size of each unit area included in the neighborhood area is set based on the size of the mobile device. FIG. 27 shows an example in which a cube large enough to contain the length, width, and height of the mobile device is set as the unit area. The neighborhood area illustrated in FIG. 27 is defined based on movement of two units of the unit movement distance. The drone image at the center of the neighborhood area represents the mobile device itself, and the drone images in the surrounding unit areas represent other mobile devices.
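A three-dimensional neighborhood area can be enumerated in a few lines. This sketch assumes, for illustration only, that the device moves one unit area at a time along the six axis directions, so that the neighborhood area for two units of movement is the set of cubes within Manhattan distance 2; the exact shape shown in FIG. 27 may differ. The names `offsets_3d` and `occupied_flags` are hypothetical.

```python
from itertools import product


def offsets_3d(reach=2):
    """Offsets of cubic unit areas reachable within `reach` axis-aligned moves,
    excluding the device's own cell."""
    return [
        off
        for off in product(range(-reach, reach + 1), repeat=3)
        if 0 < sum(abs(c) for c in off) <= reach
    ]


def occupied_flags(own_cell, other_cells, reach=2):
    """Map each neighboring offset to True when another device occupies it."""
    others = set(other_cells)
    flags = {}
    for off in offsets_3d(reach):
        cell = (own_cell[0] + off[0], own_cell[1] + off[1], own_cell[2] + off[2])
        flags[off] = cell in others
    return flags


flags = occupied_flags((5, 5, 5), [(5, 5, 6), (6, 6, 5), (9, 9, 9)])
print(len(flags))           # 24 unit areas around the device
print(sum(flags.values()))  # 2 of them hold another device
```

Under this assumption the three-dimensional neighborhood has 24 unit areas, compared with 12 for the planar case, so a binary entry flag per unit area yields 2^24 possible state patterns.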

In the embodiments above, the unit area is defined as a square, but the movement control system disclosed in the present application is not limited to this, and the unit area may be defined in other shapes, such as a regular hexagon.

In the embodiments above, the range of values that the Q value can take is expressed as integers from "-1" to "1", but the movement control system disclosed in the present application is not limited to this, and various reward rules can be set. For example, various conditions can be set, such as assigning a Q value of "-10" to the case in which the mobile device collides and additionally selects the stop action.
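How a different reward rule feeds into learning can be sketched with the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max Q(s', a') − Q(s, a)). The sketch treats the "-10" in the example above as a reward signal, which is an interpretation rather than something the text states; the learning rate `alpha`, the discount factor `gamma`, the action labels, and the function names are all assumptions.

```python
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right", "stop")


def make_q_table():
    """Q-table that starts every state pattern with zero action values."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new value."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


q = make_q_table()
# Hypothetical experience: in state pattern 5 the device collided and chose "stop".
new_value = q_update(q, state=5, action="stop", reward=-10.0, next_state=5)
print(new_value)
```

Starting from zero, a single update with reward -10 and a learning rate of 0.1 moves the stored value to -1.0; a harsher penalty therefore pushes the corresponding action value down faster than a penalty of -1 would.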

In the embodiments above, a learning model in which the relationship between state patterns and actions is obtained by reinforcement learning is used, but the movement control system disclosed in the present application is not limited to this and can be developed in various ways. For example, the relationship between state patterns and actions may be obtained by value-based reinforcement learning other than Q-learning, or by a method other than value-based reinforcement learning, and stored in association in a table-format database, and the action may then be determined using that database.

In the embodiments above, the route search is performed with the A* algorithm, but the movement control system disclosed in the present application is not limited to this and can be developed in various ways. For example, a graph search algorithm other than the A* algorithm may be used, such as depth-first search, beam search, the Bellman-Ford method, or a heuristic method.

The division of processing among the mobile device, the movement control device, and the reinforcement learning device shown in the embodiments above is merely an example and can be developed in various forms. For example, the processing of the movement control device and the reinforcement learning device may be implemented on a single server computer, or one of a plurality of mobile devices may be given the function of a movement control device that controls the movement of the other mobile devices.

S Movement control system
1 Mobile device
10 Control unit
11 Storage unit
110 Mobile device program
13 Position acquisition unit
15 Surrounding situation acquisition unit
2 Movement control device
20 Control unit
21 Storage unit
210 Movement control program
3 Machine learning device
30 Control unit
31 Storage unit
310 Machine learning program
M Learning model

Claims

A movement control system that comprises a mobile device and controls movement of the mobile device, the system comprising:
route search means for searching for a movement route of the mobile device from a movement start position to a target position;
situation acquisition means for acquiring the surrounding situation of the mobile device;
entry permission determination means for defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device are arranged around the mobile device at intervals of a unit movement distance serving as a reference for movement of the mobile device, applying the surrounding situation acquired by the situation acquisition means to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device can enter;
policy acquisition means for acquiring, from among policies of the mobile device each associated with a state pattern indicating whether entry into each unit area included in the neighborhood area is possible, the policy associated with the state pattern that corresponds to the results determined for each unit area by the entry permission determination means;
movement possibility determination means for determining, based on the policy acquired by the policy acquisition means, whether movement along the movement route searched by the route search means is possible; and
action determination means for determining, when movement along the movement route is determined not to be possible, that the action indicated by the policy acquired by the policy acquisition means is to be taken.

The movement control system according to claim 1,
further comprising a movement control device capable of communicating with the mobile device, wherein
a plurality of the mobile devices are provided,
the movement control device includes:
the route search means, the situation acquisition means, the entry permission determination means, the policy acquisition means, the movement possibility determination means, and the action determination means; and
means for transmitting, to each corresponding mobile device, an action command that causes the mobile device to take the action determined by the action determination means, and
the entry permission determination means determines that one of the plurality of mobile devices cannot enter a unit area in which another mobile device is located and/or a unit area into which the other mobile device moves.

The movement control system according to claim 1, wherein
a plurality of the mobile devices are provided,
one of the plurality of mobile devices includes
at least the situation acquisition means, the entry permission determination means, the policy acquisition means, the movement possibility determination means, and the action determination means, and
the entry permission determination means determines that entry is not possible into a unit area in which another mobile device is located and/or a unit area into which the other mobile device moves.

The movement control system according to any one of claims 1 to 3, wherein
the route search means
uses a graph search algorithm to search, on a graph that defines the range in which the mobile device can move by connection relationships with unit areas as nodes and movable paths between unit areas as edges, for a movement route from a start node corresponding to the movement start position of the mobile device to a goal node corresponding to the target position.

The movement control system according to any one of claims 1 to 3, wherein
the neighborhood area is defined based on movement of two units of the unit movement distance.

The movement control system according to any one of claims 1 to 3, wherein
the relationship between the state pattern and the policy acquired by the policy acquisition means is given by a learning model obtained by reinforcement learning.

The movement control system according to claim 6, wherein
the learning model
indicates, for each assumed state pattern, the action value of each of moving the mobile device in each movement direction and stopping the mobile device, and
the policy acquisition means
acquires, as the policy, the movement in a direction or the stop that has the highest action value.

A movement control method using a mobile device and a movement control device that controls movement of the mobile device, the method comprising the steps of:
searching for a movement route from a movement start position of the mobile device to a target position;
acquiring the surrounding situation of the mobile device;
defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device are arranged around the mobile device at intervals of a unit movement distance serving as a reference for movement of the mobile device, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device can enter;
acquiring, from among policies of the mobile device each associated with a state pattern indicating whether entry into each unit area included in the neighborhood area is possible, the policy associated with the state pattern that corresponds to the determination results for each unit area;
determining, based on the acquired policy, whether movement along the movement route is possible;
moving the mobile device along the movement route when movement is determined to be possible; and
causing the mobile device to take the action indicated by the policy when movement is determined not to be possible.

A movement control device that controls movement of a mobile device, the movement control device comprising:
means for searching for a movement route of the mobile device from a movement start position to a target position;
means for acquiring the surrounding situation of the mobile device;
means for defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device are arranged around the mobile device at intervals of a unit movement distance serving as a reference for movement of the mobile device, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device can enter;
means for acquiring, from among policies of the mobile device each associated with a state pattern indicating whether entry into each unit area included in the neighborhood area is possible, the policy associated with the state pattern that corresponds to the determination results for each unit area;
means for determining, based on the acquired policy, whether movement along the movement route is possible; and
means for transmitting, to the mobile device, an action command that causes the mobile device to move along the movement route when movement is determined to be possible and to take the determined action when movement is determined not to be possible.

A mobile device that moves along a movement route, the mobile device comprising:
means for acquiring the surrounding situation;
means for defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device itself are arranged at intervals of a unit movement distance serving as a reference for movement of the mobile device itself, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device itself can enter;
means for acquiring, from among policies each associated with a state pattern indicating whether entry into each unit area included in the neighborhood area is possible, the policy associated with the state pattern that corresponds to the determination results for each unit area;
means for determining, based on the acquired policy, whether movement along the movement route is possible;
means for moving along the movement route when movement is determined to be possible; and
means for taking the action indicated by the acquired policy when movement is determined not to be possible.

A movement control program that causes a computer equipped with communication means for communicating with a mobile device to control movement of the mobile device, the program causing the computer to execute the steps of:
searching for a movement route from a movement start position of the mobile device to a target position;
acquiring the surrounding situation of the mobile device;
defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device are arranged around the mobile device at intervals of a unit movement distance serving as a reference for movement of the mobile device, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device can enter;
acquiring, from among policies of the mobile device each associated with a state pattern indicating whether entry into each unit area included in the neighborhood area is possible, the policy associated with the state pattern that corresponds to the determination results for each unit area;
determining, based on the acquired policy, whether movement along the movement route is possible; and
transmitting, to the mobile device, an action command that causes the mobile device to move along the movement route when movement is determined to be possible and to take the determined action when movement is determined not to be possible.

A mobile device program to be executed by a mobile device that moves along a movement route, the program causing the mobile device to execute the steps of:
acquiring the surrounding situation;
defining, as a neighborhood area, an area in which unit areas set based on the size of the mobile device itself are arranged at intervals of a unit movement distance serving as a reference for movement of the mobile device itself, applying the surrounding situation to the neighborhood area, and determining, for each unit area included in the neighborhood area, whether the mobile device itself can enter;
acquiring, from among policies each associated with a state pattern indicating whether entry into each unit area included in the neighborhood area is possible, the policy associated with the state pattern that corresponds to the determination results for each unit area;
determining, based on the acquired policy, whether movement along the movement route is possible;
moving along the movement route when movement is determined to be possible; and
taking the action indicated by the acquired policy when movement is determined not to be possible.