JP7263987B2

JP7263987B2 - Control device, control method, and control program

Info

Publication number: JP7263987B2
Application number: JP2019161195A
Authority: JP
Inventors: 洋平大川; 剣之介林; 義也柴田
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2019-08-22
Filing date: 2019-09-04
Publication date: 2023-04-25
Anticipated expiration: 2039-09-04
Also published as: JP2021035714A

Description

The present invention relates to a control device, a control method, and a control program.

Various types of manipulators are used on production lines that manufacture products. The components involved, such as the manipulator mechanism, the end effector, and the workpiece, come in many variations depending on the task to be performed, and it is difficult to manually create operation procedures that cover all of them and to teach the manipulator each target task. For this reason, the conventional approach has been to first fix the types of components such as the mechanism, end effector, and workpiece, and then to teach the task directly by moving the manipulator by hand while recording the postures in the series of operations to be executed.

With this method, however, the manipulator must be taught the task again every time a component such as the mechanism, end effector, or workpiece is changed. Teaching the manipulator the tasks to be performed is therefore excessively costly. Accordingly, ways of making it more efficient for a manipulator to learn a task have been studied in recent years. For example, Patent Document 1 proposes a control method that determines the moving speed of a hand gripping a flexible object, such as a seal, based on the relative speed between the hand and the flexible object. This control method makes it possible to automate at least part of the work of designing or teaching the movement of the hand, which reduces the cost of generating or teaching manipulator operations.

Patent Document 1: JP 2015-174172 A

The present inventors have found that conventional manipulator control methods such as the one described above have the following problem. In a conventional control method, the hand of the manipulator is observed by a sensor, and the coordinates of the hand are estimated from the sensing data obtained by the sensor. The coordinates of the hand are then controlled based on the result of this estimation.

One way to estimate the coordinates of the manipulator's hand is forward kinematics calculation. In this method, an encoder that measures the angle of each joint is an example of the sensor. An estimate of the coordinates of the hand can be calculated analytically from the joint angle measurements obtained by the encoders.
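The forward kinematics calculation mentioned above can be sketched as follows. This is a minimal illustration that assumes a planar serial arm with revolute joints; the function name `forward_kinematics`, the link lengths, and the joint angles are hypothetical and do not come from the patent.

```python
import math


def forward_kinematics(joint_angles, link_lengths):
    """Return (x, y, heading) of the hand of a planar serial arm.

    Assumption for illustration: each joint is revolute, and each angle
    (in radians, as an encoder might report it) is relative to the
    previous link.
    """
    x = y = 0.0
    heading = 0.0
    for angle, length in zip(joint_angles, link_lengths):
        heading += angle                 # accumulate joint rotations
        x += length * math.cos(heading)  # advance along the current link
        y += length * math.sin(heading)
    return x, y, heading


# Two unit-length links: the first points up, the second bends back to horizontal.
x, y, heading = forward_kinematics([math.pi / 2, -math.pi / 2], [1.0, 1.0])
```

Any error in the encoder readings or in the assumed link lengths propagates directly into the estimated hand coordinates, which is the kind of error the description goes on to discuss.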

Another way is image analysis. In this method, a camera that captures the task environment, including the manipulator's hand, is an example of the sensor. The coordinates of the hand can be estimated by performing image analysis on the image data obtained by the camera. A known method such as pattern matching may be used for the image analysis.

With either method, the coordinates of the manipulator's hand can be estimated from the sensing data obtained by the sensor. However, the sensing data obtained from the sensor may contain noise. Noise may also be introduced during the analysis, for example in the form of the tolerance of pattern matching. Because of such environment-dependent noise, the error between the estimated value and the true value of the hand coordinates can become large, which may degrade the accuracy with which the coordinates of the hand are controlled.

In one aspect, the present invention has been made in view of these circumstances, and its object is to provide a technique for improving the accuracy of controlling the coordinates of a manipulator's hand.

To solve the problem described above, the present invention adopts the following configurations.

That is, a control device according to one aspect of the present invention is a control device for controlling the operation of a manipulator, and includes: a first data acquisition unit that acquires first sensing data from a first sensor system that observes the hand of the manipulator; a first estimation unit that uses a first estimation model to calculate, from the acquired first sensing data, a first estimated value of the current coordinates of the hand in an observation space; a second data acquisition unit that acquires second sensing data from a second sensor system that observes the hand of the manipulator; a second estimation unit that uses a second estimation model to calculate, from the acquired second sensing data, a second estimated value of the current coordinates of the hand in the observation space; an adjustment unit that calculates the gradient of the error between the first estimated value and the second estimated value and, based on the calculated gradient, adjusts the parameter values of at least one of the first estimation model and the second estimation model so that the error becomes smaller; a command determination unit that determines, based on at least one of the first estimated value and the second estimated value, a control command to be given to the manipulator so that the coordinates of the hand approach a target value; and a drive unit that drives the manipulator by giving the determined control command to the manipulator.

The control device according to this configuration estimates the coordinates of the manipulator's hand through two paths, the first sensor system and the second sensor system. That is, the control device uses the first estimation model to calculate a first estimated value of the hand coordinates from the first sensing data obtained by the first sensor system. It also uses the second estimation model to calculate a second estimated value of the hand coordinates from the second sensing data obtained by the second sensor system.

The coordinates of the manipulator's hand have a single true value. If the calculation process from each sensor system were free of noise, the first estimated value and the second estimated value would coincide. In practice, noise specific to each sensor system can make the two estimated values differ. The gradient of the error between the obtained first and second estimated values is therefore calculated, and, based on the calculated gradient, the parameter values of at least one of the first estimation model and the second estimation model are adjusted so that the error becomes smaller. Because this adjustment brings the two estimation results (estimated values) closer to a single value, the accuracy with which each estimation model estimates the hand coordinates can be expected to improve.
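The adjustment described above can be sketched with two deliberately simple estimation models. The sketch assumes that each model is a scalar gain-and-offset map from a sensor reading to a coordinate, that the error is half the squared difference of the two estimates, and that plain gradient descent is used; the names `estimate` and `adjust_step`, the learning rate, and the readings are all hypothetical, and the patent does not prescribe any of these choices.

```python
def estimate(params, reading):
    """Hypothetical estimation model: coordinate = gain * reading + offset."""
    gain, offset = params
    return gain * reading + offset


def adjust_step(params1, params2, reading1, reading2, lr=0.05):
    """One gradient-descent step on E = 0.5 * (estimate1 - estimate2) ** 2."""
    diff = estimate(params1, reading1) - estimate(params2, reading2)
    # Analytic gradients of E with respect to (gain, offset) of each model.
    grad1 = (diff * reading1, diff)
    grad2 = (-diff * reading2, -diff)
    new1 = tuple(p - lr * g for p, g in zip(params1, grad1))
    new2 = tuple(p - lr * g for p, g in zip(params2, grad2))
    return new1, new2, 0.5 * diff ** 2


# Two models whose offsets disagree for the same underlying hand position.
params1, params2 = (1.0, 0.1), (1.0, -0.1)
errors = []
for _ in range(200):
    params1, params2, err = adjust_step(params1, params2, 0.5, 0.5)
    errors.append(err)
```

In this sketch both models are updated, which corresponds to adjusting "at least one" of them. Agreement between the two estimates does not by itself show that either is close to the true value, which is why a separate evaluation is still needed.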

Whether the estimated value from each estimation model has approached the true value after the adjustment can be evaluated as appropriate. As one example, the noise that may be contained in the sensing data obtained by each sensor system is assumed to be white noise. Noise in the sensing data can therefore be removed or reduced by acquiring sensing data over a predetermined period and averaging it. Whether the estimated value from each estimation model is approaching the true value can then be evaluated by whether the averaged sensing data contains a component related to the estimated value from that model. According to this configuration, therefore, the accuracy with which each estimation model estimates the hand coordinates can be improved, which in turn improves the accuracy of controlling the coordinates of the manipulator's hand.
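The averaging step can be illustrated as follows. This sketch only shows how averaging readings collected over a period reduces zero-mean noise; the Gaussian noise model, the standard deviation, the sample count, and the names are assumptions made for illustration, and the sketch does not implement the evaluation criterion itself.

```python
import random
import statistics


def averaged_reading(samples):
    """Average sensing data collected over a predetermined period."""
    return statistics.fmean(samples)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
true_value = 1.0        # hypothetical true sensor value
# Readings corrupted by zero-mean Gaussian noise (a stand-in for white noise).
noisy = [true_value + rng.gauss(0.0, 0.05) for _ in range(1000)]

smoothed = averaged_reading(noisy)
worst_single = max(abs(s - true_value) for s in noisy)
```

Under this noise model, the deviation of `smoothed` from `true_value` shrinks roughly with the square root of the number of samples, whereas an individual reading can deviate by as much as `worst_single`.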

The type of manipulator is not particularly limited and may be selected as appropriate for the embodiment. The manipulator may be, for example, a vertical articulated robot, a SCARA robot, a parallel link robot, a Cartesian robot, or a collaborative robot. Each sensor system includes one or more sensors; as long as it can observe the manipulator's hand, its configuration is not particularly limited and may be selected as appropriate for the embodiment. Sensors that may be used in each sensor system include, for example, cameras, encoders, tactile sensors, force sensors, proximity sensors, torque sensors, and pressure sensors. Each estimation model has parameters for calculating the hand coordinates from sensing data. The type of each estimation model is not particularly limited and may be selected as appropriate for the embodiment. Each estimation model may be expressed, for example, as a functional expression or a data table. When expressed as a functional expression, each estimation model may be a machine learning model such as a neural network, a support vector machine, a regression model, or a decision tree.

In the control device according to the above aspect, the adjustment unit may further, when the hand of the manipulator contacts an object: acquire a boundary value of the coordinates of the hand on the boundary surface of contact with the object; calculate the gradient of a first error between the first estimated value estimated at the time of contact and the acquired boundary value and, based on the calculated gradient of the first error, adjust the parameter values of the first estimation model so that the first error becomes smaller; and calculate the gradient of a second error between the second estimated value estimated at the time of contact and the acquired boundary value and, based on the calculated gradient of the second error, adjust the parameter values of the second estimation model so that the second error becomes smaller.

Because contact with an object imposes a physical constraint, the boundary value obtained on the boundary surface at the time of contact is a highly reliable value for the true coordinates of the manipulator's hand. In this configuration, adjusting the parameter values of each estimation model based on this highly reliable value increases the accuracy with which each model estimates the hand coordinates. This configuration can therefore improve the accuracy of controlling the coordinates of the manipulator's hand.

The method of obtaining the boundary value is not particularly limited and may be selected as appropriate for the embodiment. For example, the boundary value may be specified by an operator. Alternatively, the boundary value may be selected from points on the boundary surface of contact between the manipulator's hand and the object that lie near at least one of the first estimated value and the second estimated value. As one example, the point nearest to at least one of the first estimated value and the second estimated value may be adopted as the point that gives the boundary value. The type of object is not particularly limited and may be selected as appropriate for the embodiment. The object may be, for example, a workpiece, an object to which the workpiece is to be assembled (for example, another workpiece), or an obstacle.
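The nearest-point option described above can be sketched as follows, assuming the boundary surface is available as a finite set of sampled 3D points. The function name `nearest_boundary_point` and the sample coordinates are hypothetical; how the boundary surface is actually represented is not specified here.

```python
import math


def nearest_boundary_point(estimated_value, boundary_points):
    """Pick the sampled boundary-surface point closest to an estimated value."""
    return min(boundary_points, key=lambda p: math.dist(p, estimated_value))


# Hypothetical samples of a contact boundary surface and a noisy estimate.
boundary_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
first_estimated_value = (0.9, 0.1, 0.05)

boundary_value = nearest_boundary_point(first_estimated_value, boundary_points)
```

The selected `boundary_value` could then serve as the reference against which the first error and the second error are computed.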

In the control device according to the above aspect, the manipulator may have one or more joints. The first sensor system may include an encoder that measures the angle of each joint. The second sensor system may include a camera. This configuration can improve the accuracy of controlling the coordinates of the manipulator's hand in situations where the hand is observed by encoders and a camera.

In the control device according to the above aspect, the manipulator may further include an end effector for holding a workpiece. When the end effector is not holding the workpiece, a point of interest on the end effector may be set as the hand. When the end effector is holding the workpiece, a point of interest on the workpiece may be set as the hand. The first sensor system may further include a tactile sensor for estimating the positional relationship of the workpiece relative to the end effector.

This configuration can improve the accuracy of controlling the coordinates of the manipulator's hand in situations where the point treated as the hand changes depending on whether the end effector is holding a workpiece. In addition, setting the hand in this way allows movement while the end effector is not holding the workpiece and movement while it is holding the workpiece to both be treated as a common task of moving the manipulator's hand. The control processing of the manipulator can therefore be simplified, which reduces the cost of generating or teaching manipulator operations.

The types of end effector and workpiece are not particularly limited and may be selected as appropriate for the type of task and other factors. The type of task is not particularly limited, as long as at least part of the process involves moving the manipulator's hand, and may be selected as appropriate for the embodiment. The task to be performed by the manipulator may be, for example, holding a workpiece with the end effector and assembling the held workpiece to another workpiece. In this case, the end effector may be, for example, a gripper, a suction device, or a screwdriver. The workpiece may be, for example, a connector or a peg. The other workpiece may be, for example, a socket or a hole. The task may be performed in real space or in a virtual space.

As another form of the control device according to each of the above aspects, one aspect of the present invention may be an information processing method that realizes each of the above configurations of the control device, a program, or a storage medium readable by a computer or the like that stores such a program. A storage medium readable by a computer or the like is a medium that stores information such as a program by electrical, magnetic, optical, mechanical, or chemical action.

For example, a control method according to one aspect of the present invention is an information processing method for controlling the operation of a manipulator, in which a computer executes the steps of: acquiring first sensing data from a first sensor system that observes the hand of the manipulator; calculating, using a first estimation model, a first estimated value of the current coordinates of the hand in an observation space from the acquired first sensing data; acquiring second sensing data from a second sensor system that observes the hand of the manipulator; calculating, using a second estimation model, a second estimated value of the current coordinates of the hand in the observation space from the acquired second sensing data; calculating the gradient of the error between the first estimated value and the second estimated value; adjusting, based on the calculated gradient, the parameter values of at least one of the first estimation model and the second estimation model so that the error becomes smaller; determining, based on at least one of the first estimated value and the second estimated value, a control command to be given to the manipulator so that the coordinates of the hand approach a target value; and driving the manipulator by giving the determined control command to the manipulator.

Further, for example, a control program according to one aspect of the present invention is a program for controlling the operation of a manipulator, and causes a computer to execute the steps of: acquiring first sensing data from a first sensor system that observes the hand of the manipulator; calculating, using a first estimation model, a first estimated value of the current coordinates of the hand in an observation space from the acquired first sensing data; acquiring second sensing data from a second sensor system that observes the hand of the manipulator; calculating, using a second estimation model, a second estimated value of the current coordinates of the hand in the observation space from the acquired second sensing data; calculating the gradient of the error between the first estimated value and the second estimated value; adjusting, based on the calculated gradient, the parameter values of at least one of the first estimation model and the second estimation model so that the error becomes smaller; determining, based on at least one of the first estimated value and the second estimated value, a control command to be given to the manipulator so that the coordinates of the hand approach a target value; and driving the manipulator by giving the determined control command to the manipulator.

According to the present invention, the accuracy of controlling the coordinates of a manipulator's hand can be improved.

FIG. 1 schematically illustrates an example of a scene to which the present invention is applied.
FIG. 2A schematically illustrates an example of the positional relationship between two objects according to the embodiment.
FIG. 2B schematically illustrates an example of the positional relationship between two objects according to the embodiment.
FIG. 3 schematically illustrates an example of the hardware configuration of the first model generation device according to the embodiment.
FIG. 4 schematically illustrates an example of the hardware configuration of the second model generation device according to the embodiment.
FIG. 5 schematically illustrates an example of the hardware configuration of the control device according to the embodiment.
FIG. 6 schematically illustrates an example of the manipulator according to the embodiment.
FIG. 7 schematically illustrates an example of the software configuration of the first model generation device according to the embodiment.
FIG. 8 schematically illustrates an example of the software configuration of the second model generation device according to the embodiment.
FIG. 9 schematically illustrates an example of the software configuration of the control device according to the embodiment.
FIG. 10 illustrates an example of the processing procedure of the first model generation device according to the embodiment.
FIG. 11 illustrates an example of the processing procedure of the second model generation device according to the embodiment.
FIG. 12A schematically illustrates an example of a task space according to the embodiment.
FIG. 12B schematically illustrates an example of a task space according to the embodiment.
FIG. 12C schematically illustrates an example of a task space according to the embodiment.
FIG. 13 schematically illustrates an example of the configuration of an inference model and a method of generating it according to the embodiment.
FIG. 14 schematically illustrates an example of the configuration of an inference model and a method of generating it according to the embodiment.
FIG. 15A schematically illustrates an example of learning data according to the embodiment.
FIG. 15B schematically illustrates an example of the configuration of an inference model according to the embodiment.
FIG. 16A illustrates an example of the processing procedure for operation control of the manipulator by the control device according to the embodiment.
FIG. 16B illustrates an example of the processing procedure for operation control of the manipulator by the control device according to the embodiment.
FIG. 17 illustrates an example of the calculation process of each element according to the embodiment.
FIG. 18 schematically illustrates the positional relationship of the objects according to the embodiment.
FIG. 19A schematically shows an example of the relationship between each joint and the hand when the end effector is not holding a workpiece.
FIG. 19B schematically shows an example of the relationship between each joint and the hand when the end effector is holding a workpiece.
FIG. 20 illustrates an example of the processing procedure for parameter adjustment of the estimation models by the control device according to the embodiment.
FIG. 21 illustrates an example of the processing procedure for parameter adjustment of the estimation models by the control device according to the embodiment.
FIG. 22 schematically illustrates an example of a scene in which a boundary value is acquired on a contact boundary surface.
FIG. 23 schematically illustrates an example of the relationship between control timing and adjustment timing.
FIG. 24 schematically illustrates an example of a form in which a value indicating whether two objects contact each other is held for each coordinate point.
FIG. 25 illustrates an example of the processing procedure of a subroutine for target determination by the control device according to a modification.

An embodiment according to one aspect of the present invention (hereinafter also referred to as "this embodiment") is described below with reference to the drawings. The embodiment described below is, however, merely an illustration of the present invention in every respect. Needless to say, various improvements and modifications can be made without departing from the scope of the present invention. That is, in implementing the present invention, a specific configuration suited to the embodiment may be adopted as appropriate. Although the data appearing in this embodiment is described in natural language, it is more specifically specified in a pseudo-language, commands, parameters, machine language, or the like that a computer can recognize.

§1 Application Example
First, an example of a scene to which the present invention is applied is described with reference to FIG. 1. FIG. 1 schematically illustrates an example of such a scene. As shown in FIG. 1, the control system 100 according to this embodiment includes a first model generation device 1, a second model generation device 2, and a control device 3. The first model generation device 1, the second model generation device 2, and the control device 3 may be connected to one another via a network. The type of network may be selected as appropriate from, for example, the Internet, a wireless communication network, a mobile communication network, a telephone network, and a dedicated network.

<First model generation device>
The first model generation device 1 according to this embodiment is a computer configured to generate a judgment model 50 for judging whether two objects contact each other in a given positional relationship. Specifically, the first model generation device 1 acquires a plurality of learning data sets 121, each consisting of a combination of training data 122 indicating a positional relationship between two objects and correct answer data 123 indicating whether the two objects contact each other in that positional relationship.

In this embodiment, the positional relationship between two objects is expressed by relative coordinates. Relative coordinates are the coordinates of one object as seen from the other. Either of the two objects may be selected as the reference for the relative coordinates. "Coordinates" may include at least one of position and orientation. In three-dimensional space, position may be expressed by three axes (front-back, left-right, and up-down), and orientation may be expressed by rotation about each axis (roll, pitch, and yaw). In this embodiment, the relative coordinates may be expressed in six dimensions: a three-dimensional relative position and a three-dimensional relative orientation. The number of dimensions of the relative coordinates is not limited to six and may be reduced as appropriate.
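As an illustration of relative coordinates with a reduced number of dimensions, the following sketch computes the pose of one object as seen from another in a plane, using (x, y, yaw) instead of the full six dimensions. The function name `relative_pose` and the sample poses are hypothetical.

```python
import math


def relative_pose(reference, other):
    """Pose of `other` expressed in the frame of `reference`.

    Each pose is (x, y, yaw) in a common world frame. This is a
    three-dimensional reduction of the six-dimensional case.
    """
    rx, ry, ryaw = reference
    ox, oy, oyaw = other
    dx, dy = ox - rx, oy - ry
    # Rotate the world-frame offset into the reference object's frame.
    rel_x = math.cos(ryaw) * dx + math.sin(ryaw) * dy
    rel_y = -math.sin(ryaw) * dx + math.cos(ryaw) * dy
    # Wrap the yaw difference into (-pi, pi].
    dyaw = oyaw - ryaw
    rel_yaw = math.atan2(math.sin(dyaw), math.cos(dyaw))
    return rel_x, rel_y, rel_yaw


# The reference object faces the +y direction; the other object sits one unit ahead of it.
rel = relative_pose((1.0, 1.0, math.pi / 2), (1.0, 2.0, math.pi / 2))
```

Swapping the arguments gives the relative coordinates with the other object as the reference, reflecting that either object may serve as the reference.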

The first model generation device 1 according to the present embodiment then performs machine learning of the determination model 50 using the acquired learning data sets 121. Performing the machine learning consists of training the determination model 50 so that, for each learning data set 121, it outputs a value that matches the corresponding correct answer data 123 in response to input of the training data 122. This machine learning makes it possible to construct a trained determination model 50 that has acquired the ability to determine whether or not two objects contact each other in a given positional relationship.
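The training step above can be illustrated with a deliberately small stand-in. The sketch below assumes that the relative coordinates have been reduced to a single scalar (a distance), and it uses logistic regression in place of whatever model type the determination model 50 actually takes; the data set is synthetic and all names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_contact_classifier(dataset, lr=0.5, epochs=5000):
    """Fit p(contact) = sigmoid(w * x + b) by full-batch gradient descent.

    dataset: list of (training_value, correct_label) pairs, where the label
    is 1.0 for contact and 0.0 for no contact.
    """
    w, b = 0.0, 0.0
    n = len(dataset)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in dataset:
            err = sigmoid(w * x + b) - y  # gradient of cross-entropy w.r.t. logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predicts_contact(model, x):
    """Return True when the trained model judges that contact occurs."""
    w, b = model
    return sigmoid(w * x + b) >= 0.5

# Synthetic learning data sets: small distances are labelled as contact.
dataset = [(0.2 * i, 1.0) for i in range(5)] + \
          [(1.2 + 0.2 * i, 0.0) for i in range(5)]
model = train_contact_classifier(dataset)
```

Each pair in `dataset` plays the role of one learning data set 121: the first element corresponds to the training data 122 and the second to the correct answer data 123.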

In the present embodiment, the trained determination model 50 is used, in a space where a manipulator 4 equipped with an end effector T, a workpiece W, and another workpiece G are present, to determine whether or not contact occurs between the workpiece W and the end effector T and whether or not contact occurs between the workpiece W and the other workpiece G. The end effector T, the workpiece W, and the other workpiece G are examples of "objects". The types of the end effector T, the workpiece W, and the other workpiece G need not be particularly limited and may be selected as appropriate according to the task. The end effector T may be, for example, a gripper, a suction device, a driver, or the like. The workpiece W may be, for example, a connector, a peg, or the like. The other workpiece G may be, for example, a socket, a hole, or the like. The other workpiece G is an example of an object to which the workpiece W is assembled. Holding the workpiece W with the end effector T may be, for example, gripping the workpiece with a gripper, holding the workpiece by suction with a suction device, holding the workpiece at the tip of a driver, or the like.

More specifically, the manipulator 4 according to the present embodiment performs, as an example, a task of holding the workpiece W with the end effector T and assembling the held workpiece W to the other workpiece G. This task can be divided into two: a first task of holding the workpiece W with the end effector T, and a second task of transporting the held workpiece W to the other workpiece G. When the end effector T is moved to perform the first task of holding the workpiece W, the trained determination model 50 is used to determine whether or not unwanted contact occurs between the workpiece W and the end effector T. When, after the workpiece W is held, the end effector T is moved to perform the second task of transporting the held workpiece W to the other workpiece G, the trained determination model 50 is used to determine whether or not unwanted contact occurs between the workpiece W and the other workpiece G.

That is, in the present embodiment, at least one of the two objects for which the trained determination model 50 determines whether or not contact occurs is an object moved by the operation of the manipulator 4. Only one of the two objects may be moved by the operation of the manipulator 4, or both may be. However, the application of the first model generation device 1 need not be limited to such an example. The first model generation device 1 may be applied to any scene in which contact between two objects is determined.

When, as described above, there are multiple pairs of objects for which contact is to be determined, a plurality of trained determination models 50 may be prepared, each determining whether or not contact occurs between a different pair of objects. Alternatively, the trained determination model 50 may be configured to further accept input of information indicating conditions of the objects, such as the type or identifier of each object, and to determine whether or not contact occurs between the two objects corresponding to the input conditions. Either approach may be adopted. In the following, for convenience of explanation, the determination targets of the trained determination model 50 are not distinguished.

<Second model generation device>
The second model generation device 2 according to the present embodiment is a computer configured to generate an inference model 55 for determining the target task state to be given to the manipulator 4 when controlling the operation of the manipulator 4. The manipulator 4 according to the present embodiment can perform a task of moving a first object relative to a second object in an environment where the first object and the second object are present. The first task and the second task above are examples of "a task of moving a first object relative to a second object". In the scene of performing the first task, the end effector T is an example of the first object, and the workpiece W is an example of the second object. In the scene of performing the second task, the workpiece W is an example of the first object, and the other workpiece G is an example of the second object. In the present embodiment, the task state is defined by the positional relationship between the first object and the second object (that is, the two objects).

Here, a specific example of how the task state is defined by the positional relationship between the first object and the second object will be described with further reference to FIGS. 2A and 2B. FIG. 2A schematically illustrates an example of the positional relationship between the end effector T and the workpiece W in the scene of performing the first task. FIG. 2B schematically illustrates an example of the positional relationship between the workpiece W and the other workpiece G in the scene of performing the second task. As described above, in the present embodiment, the positional relationship between two objects is expressed by relative coordinates.

In the present embodiment, as shown in FIG. 2A, while the end effector T is not holding the workpiece W, as in the scene of performing the first task, the point of interest T0 of the end effector T is treated as the hand of the manipulator 4. In the first task, the workpiece W is the target object of the movement of the end effector T. The positional relationship between the end effector T and the workpiece W is expressed by relative coordinates RC1 of the workpiece W with respect to the end effector T. The relative coordinates RC1 represent the local coordinate system CW, whose origin is the point of interest W0 of the workpiece W, as seen from the local coordinate system CT, whose origin is the point of interest T0 of the end effector T. In the present embodiment, the task state of the manipulator 4 in the scene of performing the first task is defined by these relative coordinates RC1.

On the other hand, as shown in FIG. 2B, while the end effector T is holding the workpiece W, as in the scene of performing the second task, the point of interest W0 of the workpiece W is treated as the hand of the manipulator 4. In the second task, the other workpiece G is the target object of the movement of the end effector T. The other workpiece G is an example of an object to which the workpiece W is assembled. The positional relationship between the workpiece W and the other workpiece G is expressed by relative coordinates RC2 of the other workpiece G with respect to the workpiece W. The relative coordinates RC2 represent the local coordinate system CG, whose origin is the point of interest G0 of the other workpiece G, as seen from the local coordinate system CW, whose origin is the point of interest W0 of the workpiece W. In the present embodiment, the task state of the manipulator 4 in the scene of performing the second task is defined by these relative coordinates RC2.

That is, in the present embodiment, in both the scene of performing the first task and the scene of performing the second task, the task state is defined by the positional relationship (relative coordinates, in the present embodiment) between the hand of the manipulator 4 and the target object. The hand of the manipulator 4 corresponds to the first object, and the target object corresponds to the second object. Both the first task and the second task can thereby be treated as a task of moving the hand of the manipulator 4 relative to the target object. Therefore, according to the present embodiment, the control processing of the manipulator 4 can be simplified, which in turn reduces the cost of generating or teaching the operation of the manipulator 4.
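The unified treatment of the two tasks can be sketched as a single function that picks the "hand" and the "target object" according to whether the workpiece is held. This is an illustrative simplification: it uses positions only and assumes all local frames are axis-aligned with the observation space, and the `Scene` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Scene:
    t0: Vec3       # point of interest T0 of the end effector T
    w0: Vec3       # point of interest W0 of the workpiece W
    g0: Vec3       # point of interest G0 of the other workpiece G
    holding: bool  # True while the end effector T holds the workpiece W

def task_state(scene: Scene) -> Vec3:
    """Relative position of the target object as seen from the hand.

    Not holding: hand = T0, target = W0 (corresponds to RC1).
    Holding:     hand = W0, target = G0 (corresponds to RC2).
    """
    if scene.holding:
        hand, target = scene.w0, scene.g0
    else:
        hand, target = scene.t0, scene.w0
    return tuple(t - h for h, t in zip(hand, target))

rc1 = task_state(Scene((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), False))
rc2 = task_state(Scene((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), True))
```

Because both branches return the same kind of quantity, downstream control code does not need to distinguish the first task from the second.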

Each point of interest (T0, W0, G0) may be set arbitrarily. The way the relative coordinates are given need not be limited to the above example and may be determined as appropriate according to the embodiment. For example, the relationship of each set of relative coordinates (RC1, RC2) may be inverted, such that the relative coordinates RC1 represent the local coordinate system CT, whose origin is the point of interest T0 of the end effector T, as seen from the local coordinate system CW, whose origin is the point of interest W0 of the workpiece W. Moving the hand need not be limited to bringing the hand closer to the target object and may be determined as appropriate according to the embodiment. Moving the hand may be, for example, moving the hand away from the target object, moving the hand to a predetermined position relative to the target object, or the like.

The second model generation device 2 according to the present embodiment gives information indicating a given task state of the first object and the second object to the trained determination model 50, thereby determining whether or not the first object and the second object contact each other in that task state. Using the results of this determination by the trained determination model 50, the second model generation device 2 generates an inference model 55 configured to determine the target task state to transition to next so that the first object does not contact the second object.
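How a contact determination can steer the choice of the next target task state may be sketched as below. This is only a hypothetical greedy rule on a two-dimensional grid of task states, not the way the inference model 55 is generated in the embodiment; `in_contact` stands in for a query to the trained determination model 50.

```python
def next_target_state(current, goal, in_contact, step=1):
    """Pick the neighbouring task state closest to the goal that is contact-free.

    current, goal: (x, y) task states on a grid (illustrative 2-D reduction).
    in_contact: callable returning True if the state is judged to be in contact.
    """
    candidates = [
        (current[0] + dx, current[1] + dy)
        for dx in (-step, 0, step)
        for dy in (-step, 0, step)
        if (dx, dy) != (0, 0)
    ]
    free = [c for c in candidates if not in_contact(c)]
    if not free:
        return current  # no contact-free neighbour: stay in place
    return min(free, key=lambda c: (c[0] - goal[0]) ** 2 + (c[1] - goal[1]) ** 2)

blocked = {(1, 0)}  # states where the two objects would touch
nxt = next_target_state((0, 0), (3, 0), lambda s: s in blocked)
```

In this example the direct neighbour toward the goal is blocked, so the function returns a diagonal neighbour that still reduces the distance to the goal.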

<Control device>
The control device 3 according to the present embodiment is a computer configured to control the operation of the manipulator 4. Specifically, the control device 3 first acquires first sensing data from a first sensor system that observes the hand of the manipulator 4. Using a first estimation model, the control device 3 then calculates, from the acquired first sensing data, a first estimated value of the current coordinates of the hand in the observation space. The control device 3 also acquires second sensing data from a second sensor system that observes the hand of the manipulator 4. Using a second estimation model, the control device 3 then calculates, from the acquired second sensing data, a second estimated value of the current coordinates of the hand in the observation space. Calculating each estimated value of the current coordinates of the hand corresponds to estimating the current value of the coordinates of the hand (hereinafter also referred to as "hand coordinates").

Each sensor system includes one or more sensors and is configured as appropriate to observe the hand of the manipulator 4. In the present embodiment, the first sensor system is composed of encoders S2 for measuring the angle of each joint and a tactile sensor S3 for measuring the force acting on the end effector T. The measurement data (angle data and pressure distribution data) obtained by the encoders S2 and the tactile sensor S3 are an example of the first sensing data. The second sensor system is composed of a camera S1. The image data obtained by the camera S1 is an example of the second sensing data. The control device 3 according to the present embodiment uses each estimation model to calculate each estimated value of the current hand coordinates from the sensing data obtained from the corresponding sensor system.

The hand coordinates of the manipulator 4 have a single true value. If the calculation from each sensor system involves no noise and the parameters of each estimation model are appropriate, the first estimated value and the second estimated value coincide. In contrast, when noise specific to each sensor system arises, the first estimated value and the second estimated value may differ from each other. The control device 3 according to the present embodiment therefore calculates the gradient of the error between the first estimated value and the second estimated value and, based on the calculated gradient, adjusts the value of a parameter of at least one of the first estimation model and the second estimation model so that the error decreases. Each calculated estimated value can thereby be expected to approach the true value.
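The adjustment described above can be sketched with a toy pair of estimation models. The assumption here, made purely for illustration, is that each model adds a learnable per-axis offset to a raw estimate and that the error is half the squared difference between the two estimates; the actual estimation models and error function of the embodiment may differ.

```python
def consensus_step(raw1, raw2, offset1, offset2, lr=0.1):
    """One gradient step that pulls two hand-coordinate estimates together.

    raw1, raw2: raw estimates derived from the first and second sensor systems.
    offset1, offset2: adjustable parameters of the two (toy) estimation models.
    Returns the updated offsets and the error before the update.
    """
    est1 = [r + o for r, o in zip(raw1, offset1)]
    est2 = [r + o for r, o in zip(raw2, offset2)]
    diff = [a - b for a, b in zip(est1, est2)]
    error = 0.5 * sum(d * d for d in diff)
    # Gradients: dE/d(offset1) = diff, dE/d(offset2) = -diff
    new_offset1 = [o - lr * d for o, d in zip(offset1, diff)]
    new_offset2 = [o + lr * d for o, d in zip(offset2, diff)]
    return new_offset1, new_offset2, error

raw_a, raw_b = [1.0, 0.0, 0.0], [1.5, 0.0, 0.0]
off_a, off_b = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
off_a, off_b, err_before = consensus_step(raw_a, raw_b, off_a, off_b)
_, _, err_after = consensus_step(raw_a, raw_b, off_a, off_b)
```

Updating only one of the two offsets corresponds to adjusting the parameters of just one estimation model, which the text also allows. Note that agreement between the two estimates does not by itself guarantee that they equal the true value; the text only states that an approach to the true value can be expected.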

Based on at least one of the first estimated value and the second estimated value, the control device 3 according to the present embodiment determines a control command to be given to the manipulator 4 so that the hand coordinates approach a target value. The control device 3 then drives the manipulator 4 by giving it the determined control command. The control device 3 according to the present embodiment thereby controls the operation of the manipulator 4.

The method of determining the target value of the hand coordinates need not be particularly limited and may be selected as appropriate according to the embodiment. In the present embodiment, the inference model 55 described above can be used to determine the target value of the hand coordinates. That is, the control device 3 according to the present embodiment acquires the current task state of the manipulator 4. As described above, the task state is defined by the positional relationship between the hand of the manipulator 4 and the target object. Using the inference model 55, the control device 3 determines the target task state to transition to next from the acquired current task state so as to approach the task state of the final goal. The control device 3 then calculates the target value of the hand coordinates from the target task state to transition to next. In the present embodiment, the target value of the hand coordinates can thereby be determined appropriately in the course of performing the task.
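The last step, converting the next target task state into a target value of the hand coordinates, can be sketched as follows. The example handles positions only and assumes axis-aligned frames, so the task state is simply "target object position minus hand position". `straight_line_step` is a hypothetical placeholder for the inference model 55 and performs no contact check.

```python
import math

def straight_line_step(current_state, final_state, max_step=0.1):
    """Placeholder for the inference model 55: move the task state toward
    the final goal by at most max_step (no contact avoidance here)."""
    delta = [f - c for c, f in zip(current_state, final_state)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist <= max_step:
        return list(final_state)
    scale = max_step / dist
    return [c + d * scale for c, d in zip(current_state, delta)]

def hand_target_from_task_state(target_object_pos, task_state):
    """Invert task_state = target_object_pos - hand_pos to get the hand position."""
    return [p - r for p, r in zip(target_object_pos, task_state)]

current = [0.3, 0.0, 0.0]        # target object is 0.3 ahead of the hand
final = [0.0, 0.0, 0.0]          # final goal: hand coincides with the target
next_state = straight_line_step(current, final)
hand_target = hand_target_from_task_state([1.0, 0.0, 0.0], next_state)
```

With orientation included, the same inversion would be done with homogeneous transforms instead of a vector subtraction.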

<Operation and effects>
As described above, the control device 3 according to the present embodiment adjusts the parameters of at least one of the first estimation model and the second estimation model so that their estimation results (estimated values) approach a single value. This adjustment can be expected to improve the accuracy with which each estimation model estimates the hand coordinates. Therefore, this configuration improves the accuracy of estimating the hand coordinates by each estimation model and thereby improves the accuracy of controlling the hand coordinates of the manipulator.

In conventional manipulator control methods, the time-series control commands given to the manipulator are directly associated with the task to be performed. That is, in conventional control methods, the task to be performed is described directly by a sequence of control commands. Consequently, if at least one of the environment in which the task is performed and the objects changes even slightly, the learning result cannot adapt to the change, and the task may no longer be performed properly.

For example, suppose that a manipulator is taught a task of holding a workpiece with an end effector. In this case, if the workpiece is placed exactly at the intended location, the manipulator can hold the workpiece with the end effector based on the learning result. If, on the other hand, the orientation of the workpiece differs from that at the time of learning, or the workpiece is placed at a position different from that at the time of learning, the coordinates at which the end effector holds the workpiece change. The content of the task that the manipulator should perform in this scene thereby changes substantially. With the sequence of control commands obtained from the learning result, the manipulator may therefore be unable to hold the workpiece properly with the end effector.

Thus, conventional control methods have the problem that, if at least one of the environment in which the task is performed and the objects changes even slightly, the learning result cannot adapt to the change, and the manipulator may be unable to perform the task properly unless the task is learned anew. As a result, to operate the manipulator in a general-purpose manner, control commands must be learned for each different state even for the same task, and the cost of teaching tasks to the manipulator remains high.

In contrast, in the present embodiment, the state of the task executed by the manipulator 4 is expressed by the relative relationship between objects such as the end effector T, the workpiece W, and the other workpiece G, specifically, by the positional relationship between the objects. The control commands given to the manipulator 4 are thereby associated not directly with the task but with the amount of change in the relative positional relationship between the objects. That is, time-series control commands to be given to the manipulator 4 can be generated or taught for changing the relative positional relationship between objects, independently of the content of the task. In the above example, even if the coordinates of the workpiece change, the change is taken into account when the positional relationship between the end effector and the workpiece is grasped. The manipulator can therefore hold the workpiece properly based on the learning result. Thus, according to the present embodiment, the versatility of the acquired ability to perform a task can be increased, which in turn reduces the cost of teaching tasks to the manipulator 4.

Furthermore, the first model generation device 1 according to the present embodiment generates, by machine learning, the determination model 50 for determining whether or not two objects contact each other in a given positional relationship. With the trained determination model 50 generated by machine learning, even when the positional relationship in question (relative coordinates, in the present embodiment) is given as continuous values, it is possible to determine whether or not the two objects contact each other in that positional relationship without a large increase in the data amount of the determination model 50. Therefore, according to the present embodiment, the amount of data representing the boundary at which two objects contact each other can be greatly reduced.
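A rough sense of the data-amount argument can be obtained with simple arithmetic. The comparison below assumes, as one conceivable alternative, a lookup table that stores a contact flag for every cell of a discretized six-dimensional relative-coordinate space; the resolution and the layer sizes of the small fully connected network are hypothetical numbers chosen only for illustration.

```python
def table_entries(bins_per_axis: int, dims: int = 6) -> int:
    """Number of cells in a lookup table over a discretized coordinate space."""
    return bins_per_axis ** dims

def mlp_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

cells = table_entries(100)                  # 100 bins on each of 6 axes
params = mlp_parameters([6, 64, 64, 1])     # 6 inputs, two hidden layers, 1 output
```

Under these assumptions the table has 10^12 cells, while the network has 4,673 parameters, and the table grows further whenever a finer resolution is needed, whereas the network accepts continuous inputs as they are.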

§2 Configuration Example
[Hardware configuration]
<First model generation device>
Next, an example of the hardware configuration of the first model generation device 1 according to the present embodiment will be described with reference to FIG. 3. FIG. 3 schematically illustrates an example of the hardware configuration of the first model generation device 1 according to the present embodiment.

As shown in FIG. 3, the first model generation device 1 according to the present embodiment is a computer in which a control unit 11, a storage unit 12, a communication interface 13, an external interface 14, an input device 15, an output device 16, and a drive 17 are electrically connected. In FIG. 3, the communication interface and the external interface are labeled "communication I/F" and "external I/F".

The control unit 11 includes a CPU (Central Processing Unit), which is a hardware processor, a RAM (Random Access Memory), a ROM (Read Only Memory), and the like, and is configured to execute information processing based on programs and various data. The storage unit 12 is an example of a memory and is composed of, for example, a hard disk drive, a solid state drive, or the like. In the present embodiment, the storage unit 12 stores various information such as a model generation program 81, CAD (computer-aided design) data 120, a plurality of learning data sets 121, and learning result data 125.

The model generation program 81 is a program for causing the first model generation device 1 to execute the information processing (FIG. 10), described later, relating to the machine learning of the determination model 50. The model generation program 81 includes a series of instructions for this information processing. The CAD data 120 includes configuration information indicating the geometric configuration, such as a model (for example, a three-dimensional model), of each object (the end effector T, the workpiece W, and the other workpiece G). The CAD data 120 may be generated by known software. The plurality of learning data sets 121 are used for the machine learning of the determination model 50. The learning result data 125 indicates information about the trained determination model 50 generated by the machine learning. The learning result data 125 is obtained as a result of executing the model generation program 81. Details will be described later.

The communication interface 13 is, for example, a wired LAN (Local Area Network) module, a wireless LAN module, or the like, and is an interface for performing wired or wireless communication via a network. By using the communication interface 13, the first model generation device 1 can perform data communication via the network with other information processing devices (for example, the second model generation device 2 and the control device 3).

The external interface 14 is, for example, a USB (Universal Serial Bus) port, a dedicated port, or the like, and is an interface for connecting to an external device. The type and number of external interfaces 14 may be selected as appropriate according to the type and number of external devices to be connected. The first model generation device 1 may be connected to the manipulator 4 and the camera S1 via the external interface 14 in order to determine whether or not objects contact each other in real space.

The input device 15 is a device for performing input, such as a mouse or a keyboard. The output device 16 is a device for performing output, such as a display or a speaker. An operator can operate the first model generation device 1 by using the input device 15 and the output device 16.

The drive 17 is, for example, a CD drive, a DVD drive, or the like, and is a drive device for reading a program stored in a storage medium 91. The type of the drive 17 may be selected as appropriate according to the type of the storage medium 91. At least one of the model generation program 81, the CAD data 120, and the plurality of learning data sets 121 may be stored in the storage medium 91.

The storage medium 91 is a medium that stores information such as a program by electrical, magnetic, optical, mechanical, or chemical action so that a computer, other device, machine, or the like can read the stored information. The first model generation device 1 may acquire at least one of the model generation program 81, the CAD data 120, and the plurality of learning data sets 121 from the storage medium 91.

In FIG. 3, a disk-type storage medium such as a CD or a DVD is illustrated as an example of the storage medium 91. However, the type of the storage medium 91 is not limited to the disk type and may be other than the disk type. An example of a storage medium other than the disk type is a semiconductor memory such as a flash memory.

Regarding the specific hardware configuration of the first model generation device 1, components may be omitted, replaced, or added as appropriate according to the embodiment. For example, the control unit 11 may include a plurality of hardware processors. The hardware processor may be composed of a microprocessor, an FPGA (field-programmable gate array), a DSP (digital signal processor), or the like. The storage unit 12 may be composed of the RAM and the ROM included in the control unit 11. At least one of the communication interface 13, the external interface 14, the input device 15, the output device 16, and the drive 17 may be omitted. The first model generation device 1 may be composed of a plurality of computers. In this case, the hardware configurations of the computers may or may not be identical. The first model generation device 1 may be an information processing device designed exclusively for the service provided, or may be a general-purpose server device, a PC (Personal Computer), or the like.

<Second model generation device>
Next, an example of the hardware configuration of the second model generation device 2 according to this embodiment will be described with reference to FIG. 4. FIG. 4 schematically illustrates an example of the hardware configuration of the second model generation device 2 according to this embodiment.

As shown in FIG. 4, the second model generation device 2 according to this embodiment is a computer in which a control unit 21, a storage unit 22, a communication interface 23, an external interface 24, an input device 25, an output device 26, and a drive 27 are electrically connected. In FIG. 4, as in FIG. 3, the communication interface and the external interface are labeled "communication I/F" and "external I/F".

The control unit 21 through the drive 27 of the second model generation device 2 may be configured in the same manner as the control unit 11 through the drive 17 of the first model generation device 1, respectively. That is, the control unit 21 includes a CPU, which is a hardware processor, a RAM, a ROM, and the like, and is configured to execute various kinds of information processing based on programs and data. The storage unit 22 is constituted by, for example, a hard disk drive, a solid state drive, or the like. The storage unit 22 stores various kinds of information such as the model generation program 82, the CAD data 220, the learning result data 125, the learning data 223, and the inference model data 225.

The model generation program 82 is a program for causing the second model generation device 2 to execute the information processing described later (FIG. 11) for generating the inference model 55, which infers the goal task state. The model generation program 82 includes a series of instructions for that information processing. Like the CAD data 120, the CAD data 220 includes configuration information indicating the geometric configuration, such as a model, of each object (the end effector T, the workpiece W, and the other workpiece G). The learning result data 125 is used to set up the trained determination model 50. The learning data 223 is used to generate the inference model 55. The inference model data 225 indicates information on the generated inference model 55 and is obtained as a result of executing the model generation program 82. Details will be described later.

The communication interface 23 is, for example, a wired LAN module, a wireless LAN module, or the like, and is an interface for wired or wireless communication over a network. Using the communication interface 23, the second model generation device 2 can exchange data over the network with other information processing devices (for example, the first model generation device 1 and the control device 3).

The external interface 24 is, for example, a USB port, a dedicated port, or the like, and is an interface for connecting to an external device. The type and number of external interfaces 24 may be selected as appropriate according to the type and number of external devices to be connected. The second model generation device 2 may be connected to the manipulator 4 and the camera S1 via the external interface 24 in order to reproduce task states in real space.

The input device 25 is a device for input, such as a mouse or a keyboard. The output device 26 is a device for output, such as a display or a speaker. An operator can operate the second model generation device 2 by using the input device 25 and the output device 26.

The drive 27 is, for example, a CD drive, a DVD drive, or the like, and is a drive device for reading a program stored in the storage medium 92. Like the storage medium 91, the storage medium 92 may be of the disc type or of another type. At least one of the model generation program 82, the CAD data 220, the learning result data 125, and the learning data 223 may be stored in the storage medium 92, and the second model generation device 2 may acquire at least one of them from the storage medium 92.

In the specific hardware configuration of the second model generation device 2, components may be omitted, replaced, or added as appropriate for the embodiment. For example, the control unit 21 may include a plurality of hardware processors. A hardware processor may be a microprocessor, an FPGA, a DSP, or the like. The storage unit 22 may be constituted by the RAM and ROM included in the control unit 21. At least one of the communication interface 23, the external interface 24, the input device 25, the output device 26, and the drive 27 may be omitted. The second model generation device 2 may be composed of a plurality of computers, in which case the hardware configurations of the computers may or may not be identical. The second model generation device 2 may be an information processing device designed exclusively for the service to be provided, or a general-purpose server device, a general-purpose PC, or the like.

<Control device>
Next, an example of the hardware configuration of the control device 3 according to this embodiment will be described with reference to FIG. 5. FIG. 5 schematically illustrates an example of the hardware configuration of the control device 3 according to this embodiment.

As shown in FIG. 5, the control device 3 according to this embodiment is a computer in which a control unit 31, a storage unit 32, a communication interface 33, an external interface 34, an input device 35, an output device 36, and a drive 37 are electrically connected. In FIG. 5, as in FIGS. 3 and 4, the communication interface and the external interface are labeled "communication I/F" and "external I/F".

The control unit 31 through the drive 37 of the control device 3 may be configured in the same manner as the control unit 11 through the drive 17 of the first model generation device 1, respectively. That is, the control unit 31 includes a CPU, which is a hardware processor, a RAM, a ROM, and the like, and is configured to execute various kinds of information processing based on programs and data. The storage unit 32 is constituted by, for example, a hard disk drive, a solid state drive, or the like. The storage unit 32 stores various kinds of information such as the control program 83, the CAD data 320, the robot data 321, and the inference model data 225.

The control program 83 is a program for causing the control device 3 to execute the information processing described later (FIGS. 16A, 16B, 20, and 21) for controlling the operation of the manipulator 4. The control program 83 includes a series of instructions for that information processing. Like the CAD data 120, the CAD data 320 includes configuration information indicating the geometric configuration, such as a model, of each object (the end effector T, the workpiece W, and the other workpiece G). The robot data 321 includes configuration information indicating the configuration of the manipulator 4, such as the parameters of each joint. The inference model data 225 is used to set up the generated inference model 55. Details will be described later.

The communication interface 33 is, for example, a wired LAN module, a wireless LAN module, or the like, and is an interface for wired or wireless communication over a network. Using the communication interface 33, the control device 3 can exchange data over the network with other information processing devices (for example, the first model generation device 1 and the second model generation device 2).

The external interface 34 is, for example, a USB port, a dedicated port, or the like, and is an interface for connecting to an external device. The type and number of external interfaces 34 may be selected as appropriate according to the type and number of external devices to be connected. The control device 3 may be connected to the camera S1 and the manipulator 4 via the external interface 34. In this embodiment, the manipulator 4 includes encoders S2 that measure the angle of each joint and a tactile sensor S3 that measures the force acting on the end effector T.

The types of the camera S1, the encoders S2, and the tactile sensor S3 are not particularly limited and may be determined as appropriate for the embodiment. The camera S1 may be, for example, an ordinary digital camera configured to acquire RGB images, a depth camera configured to acquire depth images, an infrared camera configured to image the amount of infrared radiation, or the like. The tactile sensor S3 may be, for example, a so-called tactile sensor or the like.

The control device 3 can acquire sensing data from each sensor (the camera S1, each encoder S2, and the tactile sensor S3) via the external interface 34. The method of connecting to the camera S1 and the manipulator 4 is not limited to this example. For example, when the camera S1 and the manipulator 4 each have a communication interface, the control device 3 may be connected to the camera S1 and the manipulator 4 via the communication interface 33.

The input device 35 is a device for input, such as a mouse or a keyboard. The output device 36 is a device for output, such as a display or a speaker. An operator can operate the control device 3 by using the input device 35 and the output device 36.

The drive 37 is, for example, a CD drive, a DVD drive, or the like, and is a drive device for reading a program stored in the storage medium 93. Like the storage medium 91, the storage medium 93 may be of the disc type or of another type. At least one of the control program 83, the CAD data 320, the robot data 321, and the inference model data 225 may be stored in the storage medium 93, and the control device 3 may acquire at least one of them from the storage medium 93.

In the specific hardware configuration of the control device 3, components may be omitted, replaced, or added as appropriate for the embodiment. For example, the control unit 31 may include a plurality of hardware processors. A hardware processor may be a microprocessor, an FPGA, a DSP, or the like. The storage unit 32 may be constituted by the RAM and ROM included in the control unit 31. At least one of the communication interface 33, the external interface 34, the input device 35, the output device 36, and the drive 37 may be omitted. The control device 3 may be composed of a plurality of computers, in which case the hardware configurations of the computers may or may not be identical. The control device 3 may be an information processing device designed exclusively for the service to be provided, or a general-purpose server device, a general-purpose PC, a PLC (programmable logic controller), or the like.

<Manipulator>
Next, an example of the hardware configuration of the manipulator 4 according to this embodiment will be described with reference to FIG. 6. FIG. 6 schematically illustrates an example of the hardware configuration of the manipulator 4 according to this embodiment.

The manipulator 4 according to this embodiment is a six-axis vertical articulated industrial robot and includes a pedestal 40 and six joints 41 to 46. Each of the joints 41 to 46 incorporates a servomotor (not shown) and is thereby rotatable about its axis. The first joint 41 is connected to the pedestal 40 and rotates the portion on its distal side about the axis of the pedestal. The second joint 42 is connected to the first joint 41 and rotates the distal portion in the front-rear direction. The third joint 43 is connected to the second joint 42 via a link 491 and rotates the distal portion in the vertical direction. The fourth joint 44 is connected to the third joint 43 via a link 492 and rotates the distal portion about the axis of the link 492. The fifth joint 45 is connected to the fourth joint 44 via a link 493 and rotates the distal portion in the vertical direction. The sixth joint 46 is connected to the fifth joint 45 via a link 494 and rotates the distal portion about the axis of the link 494. The end effector T is attached to the distal side of the sixth joint 46 together with the tactile sensor S3.

Each of the joints 41 to 46 further incorporates an encoder S2. Each encoder S2 is configured to measure the angle (controlled variable) of the corresponding joint 41 to 46. The measurement data (angle data) of each encoder S2 can be used to control the angles of the joints 41 to 46. The tactile sensor S3 is configured to detect the force acting on the end effector T. The measurement data (pressure distribution data) of the tactile sensor S3 may be used to estimate the position and posture of the workpiece W held by the end effector T, or to detect whether an abnormal force is acting on the end effector T.
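As a rough illustration of how the two kinds of measurement data might be consumed, the sketch below applies a simple proportional correction to encoder angles and flags an abnormal force from a pressure distribution. The function names, the proportional gain, and the summed-pressure criterion are all assumptions made for this example; the source does not specify the control law or the anomaly criterion.

```python
from typing import List, Sequence


def joint_commands(target_angles: Sequence[float],
                   measured_angles: Sequence[float],
                   gain: float = 0.5) -> List[float]:
    """Proportional correction per joint (hypothetical control law).

    measured_angles stands in for the angle data from the encoders S2.
    """
    return [gain * (t - m) for t, m in zip(target_angles, measured_angles)]


def abnormal_force(pressure_map: Sequence[Sequence[float]], limit: float) -> bool:
    """Flag an abnormal force when the summed pressure exceeds a limit.

    pressure_map stands in for the pressure distribution data from the
    tactile sensor S3; the summation criterion is an assumption.
    """
    total = sum(sum(row) for row in pressure_map)
    return total > limit
```

In practice the threshold and gain would have to be tuned to the actual sensor and servomotor characteristics.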

The hardware configuration of the manipulator 4 is not limited to this example. Components of the manipulator 4 may be omitted, replaced, or added as appropriate for the embodiment. For example, the manipulator 4 may include sensors other than the encoders S2 and the tactile sensor S3 in order to observe controlled variables or other attributes. For example, the manipulator 4 may further include a torque sensor. In this case, the manipulator 4 may measure the force acting on the end effector T with the torque sensor and may be controlled, based on the measured value, so that no excessive force acts on the end effector T. The number of axes of the manipulator 4 is not limited to six. A known industrial robot may be adopted as the manipulator 4.

[Software configuration]
<First model generation device>
Next, an example of the software configuration of the first model generation device 1 according to this embodiment will be described with reference to FIG. 7. FIG. 7 schematically illustrates an example of the software configuration of the first model generation device 1 according to this embodiment.

The control unit 11 of the first model generation device 1 loads the model generation program 81 stored in the storage unit 12 into the RAM. The control unit 11 then has the CPU interpret and execute the instructions included in the loaded model generation program 81 to control each component. As a result, as shown in FIG. 7, the first model generation device 1 according to this embodiment operates as a computer that includes a data acquisition unit 111, a machine learning unit 112, and a storage processing unit 113 as software modules. That is, in this embodiment, each software module of the first model generation device 1 is realized by the control unit 11 (CPU).

The data acquisition unit 111 acquires a plurality of learning data sets 121. Each learning data set 121 is a combination of training data 122 indicating a positional relationship between two objects and correct answer data 123 indicating whether the two objects contact each other in that positional relationship. The training data 122 is used as input data for machine learning. The correct answer data 123 is used as the teacher signal (label) for machine learning. The formats of the training data 122 and the correct answer data 123 are not particularly limited and may be selected as appropriate for the embodiment. For example, the relative coordinates between the two objects may be used as the training data 122 as they are, or values obtained by converting the relative coordinates into feature quantities may be used. The CAD data 120 makes it possible to determine whether two given objects contact each other in a given positional relationship. Each learning data set 121 can therefore be generated by using the CAD data 120.
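The pairing of training data and correct answer data can be sketched as follows. This is a minimal illustration only: an overlap test between two axis-aligned boxes stands in for the CAD-based contact check, and the names `LearningDataSet`, `boxes_touch`, and `generate_learning_data_sets` are hypothetical, as are the sampling range and box sizes.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LearningDataSet:
    training_data: Tuple[float, float, float]  # relative coordinates (dx, dy, dz)
    correct_data: int                          # 1 = contact, 0 = no contact


def boxes_touch(rel: Sequence[float],
                half_a: Sequence[float],
                half_b: Sequence[float]) -> bool:
    """Stand-in for a CAD contact check (assumption).

    Box A is centred at the origin and box B at `rel`; the boxes overlap or
    touch when |rel_i| <= half_a_i + half_b_i holds on every axis.
    """
    return all(abs(r) <= a + b for r, a, b in zip(rel, half_a, half_b))


def generate_learning_data_sets(n: int,
                                half_a: Sequence[float],
                                half_b: Sequence[float],
                                span: float = 0.2,
                                seed: int = 0) -> List[LearningDataSet]:
    """Sample relative coordinates and label each one with the contact check."""
    rng = random.Random(seed)
    data_sets = []
    for _ in range(n):
        rel = tuple(rng.uniform(-span, span) for _ in range(3))
        label = 1 if boxes_touch(rel, half_a, half_b) else 0
        data_sets.append(LearningDataSet(rel, label))
    return data_sets
```

A real implementation would replace `boxes_touch` with a collision query against the object models in the CAD data, and the relative coordinates would typically include orientation as well as position.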

The machine learning unit 112 performs machine learning of the determination model 50 using the acquired learning data sets 121. Performing the machine learning consists of training the determination model 50 so that, for each learning data set 121, it outputs a value that matches the corresponding correct answer data 123 when the training data 122 is input. This machine learning yields a trained determination model 50 that has acquired the ability to determine whether two objects contact each other. The storage processing unit 113 generates information on the trained determination model 50 as learning result data 125 and stores the generated learning result data 125 in a predetermined storage area.

(Configuration of the determination model)
Next, an example of the configuration of the determination model 50 will be described. The determination model 50 according to this embodiment is a multilayer neural network of the kind used for deep learning. In the example of FIG. 7, the determination model 50 is a fully connected neural network with a three-layer structure comprising an input layer 501, an intermediate (hidden) layer 502, and an output layer 503. However, the structure of the determination model 50 is not limited to this example and may be determined as appropriate for the embodiment. For example, the number of intermediate layers in the determination model 50 is not limited to one and may be two or more. Alternatively, the intermediate layer 502 may be omitted.

The number of neurons (nodes) in each of the layers 501 to 503 may be determined as appropriate for the embodiment. For example, the number of neurons in the input layer 501 may be determined according to the number of dimensions of the relative coordinates that express the positional relationship between the two objects. The number of neurons in the output layer 503 may be determined according to how contact between the two objects is represented. For example, when contact is represented by a single numerical value (for example, a value in the range [0, 1]), the output layer 503 may have one neuron. When contact is represented by two numerical values, a first value indicating the probability of contact and a second value indicating the probability of no contact, the output layer 503 may have two neurons.

Neurons in adjacent layers are connected to each other as appropriate. In this embodiment, each neuron is connected to all neurons in the adjacent layers. However, the connections between neurons are not limited to this example and may be set as appropriate for the embodiment. A weight (connection weight) is set for each connection. A threshold is set for each neuron, and the output of each neuron is basically determined by whether the sum of the products of each input and the corresponding weight exceeds the threshold. The threshold may be expressed by an activation function. In this case, the output of each neuron is determined by feeding the sum of the products of each input and the corresponding weight into the activation function and evaluating it. The type of activation function is not particularly limited and may be selected as appropriate for the embodiment. The connection weights between the neurons and the thresholds of the neurons in the layers 501 to 503 are examples of the computational parameters of the determination model 50.
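The neuron computation described above can be sketched in a few lines. This is an illustrative forward pass only, assuming one common convention in which the threshold is subtracted from the weighted sum before the activation function is applied; the function names and the choice of a sigmoid are assumptions, since the source leaves the activation function open.

```python
import math
from typing import Callable, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  thresholds: Sequence[float],
                  activation: Callable[[float], float] = sigmoid) -> List[float]:
    """Compute one fully connected layer.

    weights[j][i] is the connection weight from input i to neuron j, and
    thresholds[j] is subtracted from the weighted sum of neuron j.
    """
    return [activation(sum(w * x for w, x in zip(row, inputs)) - th)
            for row, th in zip(weights, thresholds)]


def forward(inputs: Sequence[float],
            params: Sequence[Tuple[Sequence[Sequence[float]], Sequence[float]]]
            ) -> List[float]:
    """Propagate inputs through successive (weights, thresholds) layers."""
    out = list(inputs)
    for weights, thresholds in params:
        out = layer_forward(out, weights, thresholds)
    return out
```

Passing a step function as `activation` reproduces the plain threshold behaviour, in which a neuron fires only when the weighted sum exceeds its threshold.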

In this embodiment, the machine learning unit 112 performs machine learning of the determination model 50, which is constituted by the neural network described above, using the plurality of learning data sets 121. Specifically, the machine learning unit 112 adjusts the values of the computational parameters of the determination model 50 so that, for each learning data set 121, an output value matching the correct answer data 123 is output from the output layer 503 when the training data 122 is input to the input layer 501. The machine learning unit 112 can thereby generate a trained determination model 50 that has acquired the ability to determine whether two objects contact each other.
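One way to adjust the computational parameters is gradient descent with backpropagation. The sketch below is a minimal example under stated assumptions: a single hidden layer with tanh units, one sigmoid output, cross-entropy loss, full-batch updates, and biases used in place of thresholds (a bias is the negative of a threshold). None of these choices, nor the function names, come from the source.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases for a one-hidden-layer network."""
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def predict(params, x):
    """Return the hidden activations and the contact probability for input x."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["w1"], params["b1"])]
    y = sigmoid(sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"])
    return hidden, y


def mean_loss(params, data_sets):
    """Mean cross-entropy between outputs and correct labels (0 or 1)."""
    total = 0.0
    for x, t in data_sets:
        _, y = predict(params, x)
        y = min(max(y, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return total / len(data_sets)


def train(params, data_sets, epochs=300, lr=0.5):
    """Full-batch gradient descent; gradients are averaged over all data sets."""
    n = len(data_sets)
    for _ in range(epochs):
        g_w1 = [[0.0] * len(row) for row in params["w1"]]
        g_b1 = [0.0] * len(params["b1"])
        g_w2 = [0.0] * len(params["w2"])
        g_b2 = 0.0
        for x, t in data_sets:
            hidden, y = predict(params, x)
            d_out = y - t  # derivative of the loss w.r.t. the output pre-activation
            g_b2 += d_out
            for j, h in enumerate(hidden):
                g_w2[j] += d_out * h
                d_hid = d_out * params["w2"][j] * (1.0 - h * h)  # tanh derivative
                g_b1[j] += d_hid
                for i, xi in enumerate(x):
                    g_w1[j][i] += d_hid * xi
        params["b2"] -= lr * g_b2 / n
        for j in range(len(g_w2)):
            params["w2"][j] -= lr * g_w2[j] / n
            params["b1"][j] -= lr * g_b1[j] / n
            for i in range(len(g_w1[j])):
                params["w1"][j][i] -= lr * g_w1[j][i] / n
    return params
```

Here each element of `data_sets` is a pair of an input tuple (for example, relative coordinates) and a 0/1 label. A practical implementation would more likely use an established machine learning framework with mini-batches and a held-out validation set.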

The storage processing unit 113 generates, as the learning result data 125, information indicating the structure and computational parameters of the trained determination model 50, and stores the generated learning result data 125 in a predetermined storage area. The content of the learning result data 125 is not limited to this example as long as the trained determination model 50 can be reproduced from it. For example, when the structure of the determination model 50 is shared among the devices, the information indicating the structure of the determination model 50 may be omitted from the learning result data 125.
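A possible layout for such learning result data is sketched below, assuming JSON as the file format. The functions `save_learning_result` and `load_learning_result` and the keys `structure` and `parameters` are hypothetical; the source does not specify a serialization format. The structure entry is optional, mirroring the case in which the devices share the model structure.

```python
import json


def save_learning_result(path, parameters, structure=None):
    """Write the computational parameters and, optionally, the model structure."""
    record = {"parameters": parameters}
    if structure is not None:
        record["structure"] = structure
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)


def load_learning_result(path, shared_structure=None):
    """Read the record back, falling back to a structure shared between devices."""
    with open(path, encoding="utf-8") as f:
        record = json.load(f)
    structure = record.get("structure", shared_structure)
    if structure is None:
        raise ValueError("model structure is neither stored nor shared")
    return structure, record["parameters"]
```

A device that receives a record without a structure entry must already know the layer sizes, which is why the loader refuses to proceed when neither source provides them.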

<Second model generation device>
Next, an example of the software configuration of the second model generation device 2 according to this embodiment will be described with reference to FIG. 8. FIG. 8 schematically illustrates an example of the software configuration of the second model generation device 2 according to this embodiment.

The control unit 21 of the second model generation device 2 loads the model generation program 82 stored in the storage unit 22 into the RAM. The control unit 21 then has the CPU interpret and execute the instructions included in the loaded model generation program 82 to control each component. As a result, as shown in FIG. 8, the second model generation device 2 according to this embodiment operates as a computer that includes a contact determination unit 211, a data collection unit 212, a model generation unit 213, and a storage processing unit 214 as software modules. That is, in this embodiment, as with the first model generation device 1, each software module of the second model generation device 2 is realized by the control unit 21 (CPU).

The contact determination unit 211 holds the learning result data 125 and thereby includes the trained determination model 50. The contact determination unit 211 sets up the trained determination model 50 by referring to the learning result data 125. Through the machine learning described above, the trained determination model 50 has acquired the ability to determine whether the first object and the second object contact each other. The contact determination unit 211 gives the trained determination model 50 information indicating a target task state of the first object and the second object, and thereby determines whether the first object and the second object contact each other in that task state.
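The role of the contact determination unit can be sketched as a thin wrapper around a trained model. In this minimal example, `ContactDeterminer` is a hypothetical name, the model is any callable that maps a task state to the output-layer values, and the 0.5 cut-off for a single output is an assumption. Both output encodings mentioned for the output layer (one value, or a contact/no-contact pair) are handled.

```python
from typing import Callable, Sequence


class ContactDeterminer:
    """Wraps a trained model and turns its output into a yes/no contact decision."""

    def __init__(self,
                 model: Callable[[Sequence[float]], Sequence[float]],
                 threshold: float = 0.5):
        self.model = model          # stand-in for the trained determination model
        self.threshold = threshold  # assumed cut-off for the single-output encoding

    def is_contact(self, task_state: Sequence[float]) -> bool:
        output = self.model(task_state)
        if len(output) == 1:
            # Single value in [0, 1]: compare against the threshold.
            return output[0] >= self.threshold
        if len(output) == 2:
            # Two values: probability of contact and probability of no contact.
            p_contact, p_no_contact = output
            return p_contact >= p_no_contact
        raise ValueError("unsupported output layout")
```

For instance, `ContactDeterminer(lambda s: [0.9]).is_contact([0.0, 0.0, 0.0])` evaluates to `True` because the stub model reports a value above the cut-off.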

Using the results of determinations by the trained determination model 50, the data collection unit 212 and the model generation unit 213 generate an inference model 55 configured to determine, from the task state of the final goal and the current task state, the goal task state to transition to next such that the first object does not contact the second object. That is, the data collection unit 212 uses the results of determinations by the trained determination model 50 to collect the learning data 223 used for generating the inference model 55. The CAD data 220 may additionally be used to collect the learning data 223. The model generation unit 213 generates the inference model 55 using the collected learning data 223. Details of the learning data 223 and the inference model 55 will be described later. The storage processing unit 214 generates information on the generated inference model 55 as the inference model data 225 and stores the generated inference model data 225 in a predetermined storage area.

<Control device>
Next, an example of the software configuration of the control device 3 according to this embodiment is described with reference to FIG. 9. FIG. 9 schematically illustrates an example of the software configuration of the control device 3 according to this embodiment.

The control unit 31 of the control device 3 loads the control program 83 stored in the storage unit 32 into the RAM. The CPU of the control unit 31 then interprets and executes the instructions included in the control program 83 loaded in the RAM to control each component. As shown in FIG. 9, the control device 3 according to this embodiment thereby operates as a computer that includes, as software modules, a goal setting unit 310, a first data acquisition unit 311, a second data acquisition unit 312, a first estimation unit 313, a second estimation unit 314, a state acquisition unit 315, an action determination unit 316, a command determination unit 317, a drive unit 318, and an adjustment unit 319. That is, in this embodiment, as with the first model generation device 1 described above, each software module of the control device 3 is implemented by the control unit 31 (CPU).

The goal setting unit 310 sets the final-goal task state according to the task to be performed. In this embodiment, a task state is defined by the positional relationship between the first object and the second object in the task to be performed, more specifically between the hand of the manipulator 4 and the target object. In this embodiment, the positional relationship is expressed by the relative coordinates described above. The "final goal" is the end point (goal) and is realized at the time the task is completed.

The target object may be set as appropriate according to the task to be performed. As an example, when the manipulator 4 (end effector T) is not holding the workpiece W, the target object may be the workpiece W. When the manipulator 4 (end effector T) is holding the workpiece W, the target object may be the object to which the workpiece W is to be assembled (in this embodiment, the other workpiece G).

The first data acquisition unit 311 acquires first sensing data 323 from a first sensor system that observes the hand of the manipulator 4. In this embodiment, the first data acquisition unit 311 acquires measurement data from each encoder S2 and the tactile sensor S3 as the first sensing data 323. The second data acquisition unit 312 acquires second sensing data 324 from a second sensor system that observes the hand of the manipulator 4. In this embodiment, the second data acquisition unit 312 acquires image data from the camera S1 as the second sensing data 324.

Using the first estimation model 61, the first estimation unit 313 calculates, from the acquired first sensing data 323, a first estimated value of the current coordinates of the hand in the observation space. Using the second estimation model 62, the second estimation unit 314 calculates, from the acquired second sensing data 324, a second estimated value of the current coordinates of the hand in the observation space. The adjustment unit 319 calculates the gradient of the error between the first estimated value and the second estimated value and, based on the calculated gradient, adjusts the parameter values of at least one of the first estimation model 61 and the second estimation model 62 so that the error decreases.
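As an illustration of the adjustment described above, the following is a minimal sketch, not the embodiment's implementation. It assumes that the first estimation model is a plain linear map, that the error is half the squared distance between the two estimates, and that only the first model is adjusted; the function names and the numbers are hypothetical.

```python
def estimate(weights, sensing):
    """Linear stand-in for an estimation model: returns weights @ sensing."""
    return [sum(w * s for w, s in zip(row, sensing)) for row in weights]


def squared_error(first, second):
    """Half the squared distance between the two coordinate estimates."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(first, second))


def adjust_first_model(weights, sensing, second_estimate, learning_rate=0.1):
    """One gradient-descent step on the first model's weights (assumed update rule)."""
    first_estimate = estimate(weights, sensing)
    residual = [a - b for a, b in zip(first_estimate, second_estimate)]
    # d(error)/d(weights[i][j]) = residual[i] * sensing[j]
    return [
        [w - learning_rate * r * s for w, s in zip(row, sensing)]
        for row, r in zip(weights, residual)
    ]


weights_1 = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical parameters of the first model
sensing_1 = [1.0, 2.0]                 # hypothetical first sensing data
second_estimate = [1.5, 1.5]           # pretend output of the second model

before = squared_error(estimate(weights_1, sensing_1), second_estimate)
weights_1 = adjust_first_model(weights_1, sensing_1, second_estimate)
after = squared_error(estimate(weights_1, sensing_1), second_estimate)
print(before, after)  # the error shrinks after the step
```

Adjusting both models, or weighting the more trusted sensor system more heavily, would follow the same pattern with a gradient computed for each model.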

The state acquisition unit 315 acquires information indicating the current task state of the manipulator 4. "Current" refers to the point in time at which the operation of the manipulator 4 is controlled, that is, the point in time immediately before the control command to be given to the manipulator 4 is determined.

The action determination unit 316 determines, for the current task state indicated by the acquired information, the goal task state to transition to next so as to approach the final-goal task state. A "goal" includes the final goal and may be set as appropriate to accomplish the task. The number of goals set up to and including the final goal may be one (in which case only the final goal is set) or more than one. Goals other than the final goal are waypoints passed through between the start point and the end point of the task. The final goal may therefore simply be called the "goal", and goals other than the final goal may be called "sub-goals". A sub-goal may also be called a "waypoint". The "goal to transition to next" is the task state to aim for next from the current task state (a provisional task state if it is a goal other than the final goal); for example, it is the goal closest to the current task state in the direction of the final goal.

In this embodiment, the action determination unit 316 holds the inference model data 225 and thereby includes the generated inference model 55. Using the generated inference model 55, the action determination unit 316 determines, from the final-goal task state and the current task state, the goal task state to transition to next.

Based on at least one of the first estimated value and the second estimated value, the command determination unit 317 determines a control command to be given to the manipulator 4 so that the coordinates of the hand approach the target value. The drive unit 318 drives the manipulator 4 by giving the determined control command to the manipulator 4.

In this embodiment, the control command consists of a command value for each joint. The command determination unit 317 identifies the current value of the hand coordinates of the manipulator 4 based on at least one of the first estimated value and the second estimated value. The command determination unit 317 also calculates the target value of the hand coordinates from the determined goal task state to transition to next. Next, the command determination unit 317 calculates the amount of change in the angle of each joint from the difference between the current value and the target value of the hand coordinates. The command determination unit 317 then determines the command value for each joint based on the calculated amount of change in the angle of each joint. The drive unit 318 drives each joint according to the determined command value. Through this processing, the control device 3 according to this embodiment controls the operation of the manipulator 4.
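The step from a hand-coordinate difference to joint-angle changes can be sketched as follows. This is only an illustrative example under stated assumptions: a planar two-link arm with unit link lengths stands in for the manipulator 4, the joint-angle change is obtained by inverting the 2x2 Jacobian, and the command value is taken to be the current angle plus the change. None of these specifics come from the description above, and all function names are hypothetical.

```python
import math


def forward_kinematics(q, l1=1.0, l2=1.0):
    """Hand coordinates (x, y) of a planar two-link arm for joint angles q."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return (x, y)


def joint_angle_changes(q, target, l1=1.0, l2=1.0):
    """Solve J(q) * dq = (target - current) for the joint-angle changes dq."""
    x, y = forward_kinematics(q, l1, l2)
    dx, dy = target[0] - x, target[1] - y
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    # Jacobian of the hand coordinates with respect to the joint angles.
    j11, j12 = -l1 * s1 - l2 * s12, -l2 * s12
    j21, j22 = l1 * c1 + l2 * c12, l2 * c12
    det = j11 * j22 - j12 * j21
    if abs(det) < 1e-9:
        raise ValueError("arm is at a singular configuration")
    dq1 = (j22 * dx - j12 * dy) / det
    dq2 = (-j21 * dx + j11 * dy) / det
    return (dq1, dq2)


def command_values(q, target):
    """Command value per joint: current angle plus the computed change."""
    dq = joint_angle_changes(q, target)
    return (q[0] + dq[0], q[1] + dq[1])


current_q = (0.3, 0.8)                          # hypothetical current joint angles
target_xy = forward_kinematics((0.4, 0.9))      # hypothetical hand-coordinate target
next_q = command_values(current_q, target_xy)
```

Because the Jacobian is only a local linearization, a real controller would typically repeat this step each control cycle rather than rely on a single update.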

<Others>
Each software module of the first model generation device 1, the second model generation device 2, and the control device 3 is described in detail in the operation examples below. This embodiment describes an example in which every software module of the first model generation device 1, the second model generation device 2, and the control device 3 is implemented by a general-purpose CPU. However, some or all of these software modules may be implemented by one or more dedicated processors. In addition, software modules may be omitted, replaced, or added in the software configurations of the first model generation device 1, the second model generation device 2, and the control device 3 as appropriate according to the embodiment.

§3 Operation example
[First model generation device]
Next, an operation example of the first model generation device 1 is described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of a processing procedure for the machine learning of the determination model 50 by the first model generation device 1 according to this embodiment. The processing procedure described below is an example of a model generation method for generating the determination model 50. However, the processing procedure described below is merely an example, and each step may be changed to the extent possible. Furthermore, steps may be omitted, replaced, or added in the processing procedure described below as appropriate according to the embodiment.

(Step S101)
In step S101, the control unit 11 operates as the data acquisition unit 111 and acquires a plurality of learning data sets 121 to be used for the machine learning of the determination model 50. Each learning data set 121 consists of a combination of training data 122 indicating a positional relationship between two objects and correct answer data 123 indicating whether the two objects contact each other in that positional relationship.

The method of generating each learning data set 121 is not particularly limited and may be selected as appropriate according to the embodiment. For example, the CAD data 120 is used to place the two objects in various positional relationships in a virtual space. In this embodiment, the positional relationship is expressed by relative coordinates. Also in this embodiment, at least one of the two objects is an object moved by the operation of the manipulator 4. When the first task is assumed, the end effector T and the workpiece W are examples of the two objects. When the second task is assumed, the workpiece W held by the end effector T and the other workpiece G are examples of the two objects. One of the two objects is the hand of the manipulator 4, and the other is the target object. The placement of each object may be specified by an operator or determined at random. Alternatively, various positional relationships may be realized by fixing the position of one object and changing the position of the other object according to a rule. The rule that gives the placement of the other object may be set as appropriate. The relative coordinates in each positional relationship can thus be acquired as the training data 122 of each learning data set 121. The CAD data 120 also includes a model of each object, so the CAD data 120 can be used to determine whether the two objects contact each other in a given positional relationship. The result of determining, using the CAD data 120, whether the two objects contact each other in each positional relationship is associated with the corresponding training data 122 as the correct answer data 123. Each learning data set 121 can be generated in this way. The method of generating each learning data set 121 is not limited to this example. Each learning data set 121 may be generated in real space using the actual objects.
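The random-placement variant of this procedure can be sketched as below. The sketch makes strong simplifying assumptions that are not part of the description: the two CAD models are replaced by two spheres, the relative coordinates contain only a translation (no orientation), and contact is decided by comparing the center distance with the sum of the radii. All names and values are hypothetical.

```python
import math
import random


def in_contact(relative_xyz, radius_a=0.5, radius_b=0.5):
    """Stand-in for a CAD-based contact check: two spheres touch or overlap."""
    distance = math.sqrt(sum(c * c for c in relative_xyz))
    return distance <= radius_a + radius_b


def generate_learning_datasets(count, extent=2.0, seed=0):
    """Return a list of (training data, correct answer data) pairs."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(count):
        # Training data: relative coordinates of one object seen from the other.
        relative_xyz = tuple(rng.uniform(-extent, extent) for _ in range(3))
        # Correct answer data: 1 if the objects contact each other, else 0.
        label = 1 if in_contact(relative_xyz) else 0
        datasets.append((relative_xyz, label))
    return datasets


learning_datasets = generate_learning_datasets(100)
```

Replacing `in_contact` with a collision query against the actual object models would keep the rest of the loop unchanged.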

Each learning data set 121 may be generated automatically by the operation of a computer, or manually in the sense that the generation at least partly involves operations by an operator. Each learning data set 121 may be generated by the first model generation device 1 or by a computer other than the first model generation device 1. When the first model generation device 1 generates the learning data sets 121, the control unit 11 acquires the plurality of learning data sets 121 by executing the above series of processes automatically, or manually through operations performed by the operator via the input device 15. When another computer generates the learning data sets 121, the control unit 11 acquires the plurality of learning data sets 121 generated by that computer via, for example, a network or the storage medium 91. In this case, the CAD data 120 may be omitted from the first model generation device 1. Some of the learning data sets 121 may be generated by the first model generation device 1 while the others are generated by one or more other computers.

The number of learning data sets 121 to be acquired is not particularly limited and may be selected as appropriate according to the embodiment. After acquiring the plurality of learning data sets 121, the control unit 11 advances the processing to the next step S102.

(Step S102)
In step S102, the control unit 11 operates as the machine learning unit 112 and performs machine learning of the determination model 50 using the acquired plurality of learning data sets 121. In this embodiment, the control unit 11 trains the determination model 50 by machine learning so that, for each learning data set 121, when the training data 122 is input to the input layer 501, an output value that matches the corresponding correct answer data 123 is output from the output layer 503. The control unit 11 thereby constructs a trained determination model 50 that has acquired the ability to determine whether two objects contact each other in a given positional relationship.

The processing procedure of the machine learning may be determined as appropriate according to the embodiment. As an example, the control unit 11 first prepares the determination model 50 to be processed. The structure of the determination model 50 to be prepared (for example, the number of layers, the number of neurons in each layer, and the connections between neurons in adjacent layers), the initial values of the connection weights between neurons, and the initial value of the threshold of each neuron may be given by a template or by operator input. When re-learning is performed, the control unit 11 may prepare the determination model 50 based on learning result data obtained by past machine learning.

Next, the control unit 11 executes the learning process of the determination model 50 (neural network), using the training data 122 included in each learning data set 121 as input data and the correct answer data 123 as the teacher signal. Batch gradient descent, stochastic gradient descent, mini-batch gradient descent, or the like may be used for this learning process.

For example, in the first step, the control unit 11 inputs the training data 122 of each learning data set 121 to the determination model 50 and executes the arithmetic processing of the determination model 50. That is, the control unit 11 inputs the training data 122 to the input layer 501 and determines the firing of each neuron in the layers 501 to 503 in order from the input side (that is, performs the forward propagation computation). Through this arithmetic processing, the control unit 11 acquires from the output layer 503 of the determination model 50 an output value corresponding to the result of determining whether the two objects contact each other in the positional relationship indicated by the training data 122.

In the second step, the control unit 11 calculates the error (loss) between the output value acquired from the output layer 503 and the correct answer data 123 based on a loss function. The loss function evaluates the difference (that is, the degree of discrepancy) between the output of the learning model and the correct answer: the larger the difference between the output value acquired from the output layer 503 and the correct answer data 123, the larger the error calculated by the loss function. The type of loss function used to calculate the error is not particularly limited and may be selected as appropriate according to the embodiment.

In the third step, the control unit 11 uses the gradient of the calculated error of the output value to calculate, by the error backpropagation method, the error in the value of each computational parameter of the determination model 50 (the connection weights between neurons, the threshold of each neuron, and so on). In the fourth step, the control unit 11 updates the values of the computational parameters of the determination model 50 based on the calculated errors. The degree to which the values of the computational parameters are updated may be adjusted by a learning rate. The learning rate may be specified by the operator or given as a set value in the program.

By repeating the first to fourth steps, the control unit 11 adjusts the values of the computational parameters of the determination model 50 so that, over the learning data sets 121, the sum of the errors between the output values output from the output layer 503 and the correct answer data 123 decreases. For example, the control unit 11 may repeat the first to fourth steps until the sum of the errors becomes equal to or less than a threshold. The threshold may be set as appropriate according to the embodiment. As a result of this machine learning, the control unit 11 can construct a trained determination model 50 that has been trained so that, for each learning data set 121, when the training data 122 is input to the input layer 501, an output value that matches the corresponding correct answer data 123 is output from the output layer 503. Here, "matches" may include the case where a difference that is tolerable under a threshold or the like arises between the output value of the output layer 503 and the teacher signal (correct answer data 123). When the machine learning of the determination model 50 is complete, the control unit 11 advances the processing to the next step S103.
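The four-step loop and its stopping rule can be illustrated with a deliberately small example. To keep it short, a single sigmoid neuron stands in for the multi-layer determination model 50, the input is a one-dimensional distance rather than full relative coordinates, the loss is binary cross-entropy, and the update is stochastic gradient descent; these are assumptions made for the sketch, and the data values are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def total_loss(params, datasets):
    """Sum of cross-entropy errors over all learning data sets."""
    w, b = params
    loss = 0.0
    for x, y in datasets:
        p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss


def train(datasets, learning_rate=0.5, threshold=0.5, max_epochs=5000):
    w, b = 0.0, 0.0                      # initial parameter values
    for _ in range(max_epochs):
        for x, y in datasets:
            p = sigmoid(w * x + b)       # step 1: forward propagation
            error = p - y                # steps 2-3: gradient of the loss at the output
            w -= learning_rate * error * x   # step 4: update scaled by the learning rate
            b -= learning_rate * error
        if total_loss((w, b), datasets) <= threshold:
            break                        # stop once the summed error is small enough
    return w, b


# Hypothetical data: (distance between objects, 1 = contact, 0 = no contact).
datasets = [(0.2, 1), (0.5, 1), (0.8, 1), (1.2, 0), (1.5, 0), (1.8, 0)]
w, b = train(datasets)
```

With a multi-layer network, step 3 would propagate the output error backward through each layer, but the repeat-until-threshold structure stays the same.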

(Step S103)
In step S103, the control unit 11 operates as the storage processing unit 113 and saves information on the trained determination model 50 constructed by the machine learning in a predetermined storage area as the learning result data 125. In this embodiment, the control unit 11 generates, as the learning result data 125, information indicating the structure and the computational parameters of the trained determination model 50 constructed in step S102. The control unit 11 then saves the generated learning result data 125 in the predetermined storage area.

The predetermined storage area may be, for example, the RAM in the control unit 11, the storage unit 12, an external storage device, a storage medium, or a combination of these. The storage medium may be, for example, a CD or a DVD, and the control unit 11 may store the learning result data 125 on the storage medium via the drive 17. The external storage device may be, for example, a data server such as a NAS (Network Attached Storage). In this case, the control unit 11 may use the communication interface 13 to store the learning result data 125 on the data server via a network. The external storage device may also be, for example, an external storage device connected to the first model generation device 1.
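One possible shape for the learning result data is sketched below, purely as an assumption: the description does not specify a file format, so JSON, the field names, and the helper functions here are all hypothetical.

```python
import json


def save_learning_result(path, layer_sizes, weights, thresholds):
    """Write the model structure and computational parameters to a JSON file."""
    record = {
        "layer_sizes": layer_sizes,   # structure: number of neurons per layer
        "weights": weights,           # connection weights between neurons
        "thresholds": thresholds,     # threshold of each neuron
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)


def load_learning_result(path):
    """Read the structure and parameters back, e.g. to set up the model again."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A device that receives such a file could rebuild the model by reading `layer_sizes` first and then filling in the stored parameter values.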

When the learning result data 125 has been saved, the control unit 11 ends the series of processes for generating the trained determination model 50.

The generated learning result data 125 may be provided to the second model generation device 2 at any time. For example, the control unit 11 may transfer the learning result data 125 to the second model generation device 2 as part of the processing of step S103 or separately from it. The second model generation device 2 may acquire the learning result data 125 by receiving this transfer. Alternatively, the second model generation device 2 may acquire the learning result data 125 by using the communication interface 23 to access the first model generation device 1 or a data server via a network. The second model generation device 2 may also acquire the learning result data 125 via the storage medium 92. The learning result data 125 may also be incorporated in the second model generation device 2 in advance.

Furthermore, the control unit 11 may update or newly generate the learning result data 125 by repeating the processing of steps S101 to S103 regularly or irregularly. During this repetition, at least some of the plurality of learning data sets 121 may be changed, corrected, added to, or deleted as appropriate. The control unit 11 may then update the learning result data 125 held by the second model generation device 2 by providing the updated or newly generated learning result data 125 to the second model generation device 2 each time the learning process is executed.

[Second model generation device]
Next, an operation example in which the second model generation device 2 generates the inference model 55 is described with reference to FIG. 11. FIG. 11 is a flowchart showing an example of a processing procedure for generating the inference model 55 by the second model generation device 2 according to this embodiment. The processing procedure described below is merely an example, and each step may be changed to the extent possible. Furthermore, steps may be omitted, replaced, or added in the processing procedure described below as appropriate according to the embodiment.

(Step S201)
In step S201, the control unit 21 accepts the designation of the final-goal task state for the task to be performed by the manipulator 4. A task state is expressed by the positional relationship between the first object and the second object, more specifically between the hand of the manipulator 4 and the target object. In this embodiment, the positional relationship is expressed by relative coordinates.

The method of specifying the relative coordinates in the final task state is not particularly limited and may be selected as appropriate according to the embodiment. For example, the relative coordinates in the final task state may be specified directly by operator input via the input device 25. Alternatively, the task to be performed may be selected by operator input, and the relative coordinates in the final task state may be specified according to the selected task. The relative coordinates at the final goal may also be specified by using the CAD data 220 to place the models of the objects in the virtual space in the positional relationship of the final goal. The models of the objects may be placed automatically by a simulator or manually by operator input. When the final-goal task state has been specified, the control unit 21 advances the processing to the next step S202.

(Steps S202 to S204)
In step S202, the control unit 21 sets an arbitrary task state as the start point. The task state set as the start point corresponds to the task state at the time the task starts to be performed. The start-point task state may be set at random or specified by operator input. The method by which the operator specifies the start point may be the same as the method of specifying the final goal described above. The start-point task state may also be determined by any algorithm. As an example, the actual objects may be placed in real space and photographed with a camera to acquire image data showing the objects. The start-point task state may then be determined by performing image processing (for example, matching against the CAD data 220) on the acquired image data. The start-point task state may also be determined in other ways as appropriate using the CAD data 220.

In step S203, the control unit 21 operates as the contact determination unit 211 and uses the trained determination model 50 to determine whether the two objects contact each other in the task state set as the start point. Specifically, the control unit 21 refers to the learning result data 125 to set up the trained determination model 50. The control unit 21 then inputs the relative coordinates of the task state set in step S202 to the input layer 501 of the trained determination model 50. As the arithmetic processing of the trained determination model 50, the control unit 21 determines the firing of each neuron in the layers 501 to 503 in order from the input side. The control unit 21 thereby acquires from the output layer 503 of the trained determination model 50 an output value corresponding to the result of determining whether the two objects contact each other in the task state set as the start point.

In step S204, the control unit 21 determines the branch destination of the process based on the determination result of step S203. If it is determined in step S203 that the two objects contact each other in the task state set as the starting point, the control unit 21 returns the process to step S202 and sets the starting task state again. On the other hand, if it is determined that the two objects do not contact each other in the task state set as the starting point, the control unit 21 recognizes the set starting task state as the current task state of the manipulator 4 and advances the process to the next step S205.
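The loop formed by steps S202 to S204 amounts to rejection sampling of the starting task state. The following is a minimal sketch under stated assumptions: `is_contact` is a hypothetical stand-in for the learned determination model 50 (here a simple distance test, not a neural network), relative coordinates are reduced to two dimensions, and the sampling bounds and retry limit are illustrative values that do not appear in the source.

```python
import random


def is_contact(rel_coord):
    """Hypothetical stand-in for the learned determination model 50.

    Treats the two objects as touching when their relative distance is
    below 1.0. The model in the source is a trained neural network.
    """
    x, y = rel_coord
    return (x * x + y * y) ** 0.5 < 1.0


def sample_start_state(rng, bounds=(-5.0, 5.0), max_tries=1000):
    """Draw a random starting task state and redraw while it is in contact."""
    lo, hi = bounds
    for _ in range(max_tries):
        candidate = (rng.uniform(lo, hi), rng.uniform(lo, hi))  # step S202
        if not is_contact(candidate):                           # steps S203, S204
            return candidate
    raise RuntimeError("no contact-free starting task state found")


start = sample_start_state(random.Random(0))
```

The retry limit is a design choice of this sketch; the source simply returns to step S202 whenever contact is determined.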

FIG. 12A schematically illustrates an example of a scene in which the starting and final goal task states have been set in the task space SP by the processing of steps S201 to S204. The task space SP represents the set of relative coordinates that define task states. Information indicating the task space SP may or may not be held in the second model generation device 2. Each node (point) belonging to the task space SP corresponds to relative coordinates between the two objects. In the example of FIG. 12A, the node Ns corresponds to the relative coordinates in the starting task state, and the node Ng corresponds to the relative coordinates in the final goal task state. In the present embodiment, the boundary surface in the task space SP that separates contact from non-contact between the two objects (the contact boundary surface) is derived based on the determination results of the learned determination model 50.

(Steps S205 to S207)
In step S205, the control unit 21 determines the target task state to transition to next from the current task state so as to approach the final goal task state.

The method of determining the target task state is not particularly limited and may be selected as appropriate for the embodiment. For example, the relative coordinates in the target task state may be determined by operator input. As with the setting of the starting task state, the relative coordinates in the target task state may be determined by any algorithm or may be determined as appropriate using the CAD data 220. As another example, the control unit 21 may determine the relative coordinates in the target task state by randomly changing the relative coordinates in the starting task state. As another example, the control unit 21 may select, within the task space SP, a node that is a predetermined distance away from the node Ns in a direction approaching the node Ng, and may acquire the task state corresponding to the selected node as the target task state. As yet another example, when the inference model 55 is generated by the reinforcement learning described later, the target task state may be determined using the inference model 55 as it stands during the reinforcement learning process.

As another example, a known method such as path planning may be employed to determine the target task state. As an example, the control unit 21 may set nodes that are candidates for the target task state in the task space SP. The nodes may be set automatically by a method such as random sampling or manually by operator input. Some of the nodes may be set automatically and the rest manually. After setting the candidate nodes, the control unit 21 may select combinations of nodes between which transition is possible as appropriate. A nearest neighbor method, for example, may be employed to select these combinations. Within the task space SP, a combination of nodes between which transition is possible may be represented by an edge connecting the nodes. Next, the control unit 21 searches for a path from the starting node Ns to the final goal node Ng. Dijkstra's algorithm or the like may be employed for the path search. The control unit 21 may acquire the task states corresponding to the nodes included in the path obtained by the search as target task states.
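The path-planning variant described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: `is_contact` is a hypothetical stand-in for the learned determination model 50, the task space is reduced to a two-dimensional plane, the sample count and the number of neighbors `k` are arbitrary, and only nodes (not the edges between them) are checked for contact.

```python
import heapq
import math
import random


def is_contact(p):
    # Hypothetical stand-in for the learned determination model 50:
    # contact occurs inside a circle of radius 1 around the origin.
    return math.hypot(p[0], p[1]) < 1.0


def build_graph(nodes, k=6):
    """Connect each node to its k nearest neighbors with distance-weighted edges."""
    edges = {i: [] for i in range(len(nodes))}
    for i, p in enumerate(nodes):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(nodes) if j != i)[:k]
        for d, j in nearest:
            edges[i].append((j, d))
            edges[j].append((i, d))
    return edges


def dijkstra(edges, src, dst):
    """Return the node indices on a shortest path from src to dst, or None."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in edges[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


rng = random.Random(0)
start, goal = (-3.0, 0.0), (3.0, 0.0)  # nodes Ns and Ng
samples = []
while len(samples) < 200:  # random sampling of contact-free candidate nodes
    p = (rng.uniform(-4.0, 4.0), rng.uniform(-4.0, 4.0))
    if not is_contact(p):
        samples.append(p)
nodes = [start, goal] + samples
path = dijkstra(build_graph(nodes), 0, 1)  # None if Ns and Ng are not connected
```

When a path is found, the task states at `nodes[i]` for each index `i` in `path` would serve as the sequence of target task states.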

In step S206, the control unit 21 operates as the contact determination unit 211 and uses the learned determination model 50 to determine whether the two objects contact each other in the determined target task state. Except that the subject of the determination is the target task state instead of the starting task state, the control unit 21 may execute step S206 in the same manner as step S203. That is, the control unit 21 inputs the relative coordinates of the target task state to the learned determination model 50 and executes the arithmetic processing of the learned determination model 50. The control unit 21 thereby acquires, from the learned determination model 50, an output value corresponding to the result of determining whether the two objects contact each other in the target task state.

In step S207, the control unit 21 determines the branch destination of the process based on the determination result of step S206. If it is determined that the two objects contact each other in the target task state, the control unit 21 returns the process to step S205 and determines the target task state again. On the other hand, if it is determined that the two objects do not contact each other in the target task state, the control unit 21 advances the process to the next step S208.

Note that the branch destination in step S207 is not limited to this example. For example, if it is determined that the two objects contact each other in the target task state, the control unit 21 may return the process to step S202 and redo the process from the setting of the starting point. As another example, if target task states have been determined a plurality of times and it is then determined that the two objects contact each other in the most recently determined target task state, the control unit 21 may return the process to step S205 and determine the target task states to transition to again from the starting point. The sequence of target task states determined up to the contact may be collected as a failure case in which the final goal task state cannot be reached.

FIG. 12B schematically illustrates an example of a scene in which a target task state has been determined in the task space SP by the processing of steps S205 to S207. In the example of FIG. 12B, the node N1 corresponds to the relative coordinates in the task state determined as the target task state to transition to next from the starting task state (node Ns). The example of FIG. 12B assumes that the target task state for a single transition has been determined in step S205. However, the number of target task states determined in step S205 is not limited to one. In step S205, the control unit 21 may determine target task states for a plurality of transitions (a sequence of target task states) toward the final goal task state.

(Step S208)
In step S208, the control unit 21 transitions the current task state of the manipulator 4 to the target task state determined in step S205. The control unit 21 then determines whether the task state of the manipulator 4 has reached the final goal task state, that is, whether the task state after the transition is the final goal task state. The task state transition may be performed in a virtual space by simulation. If it determines that the final goal task state has been reached, the control unit 21 advances the process to the next step S209. On the other hand, if it determines that the final goal task state has not been reached, the control unit 21 returns the process to step S205 and determines a further target task state.

FIG. 12C schematically illustrates an example of a scene in which a sequence of task states transitioning from the starting task state to the final goal task state has been determined in the task space SP by the processing up to step S208. The nodes N1 to N4 correspond to the relative coordinates in the task states determined as target task states between the starting node Ns and the final goal node Ng. The node N(k+1) indicates the target task state to transition to next from the node N(k) (k is 1 to 3). As illustrated in FIG. 12C, through the processing up to step S208, the control unit 21 can obtain a sequence of target task states transitioning from the starting point to the final goal.
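The loop over steps S205 to S208 can be sketched as below, using the "node a predetermined distance away, approaching the final goal" strategy with a random direction perturbation. Everything beyond the step structure is an assumption made for illustration: `is_contact` is a hypothetical stand-in for the learned determination model 50, the task space is two-dimensional, and the step size, angular spread, and transition limit are arbitrary values. The starting state is assumed to be contact-free.

```python
import math
import random


def is_contact(p):
    # Hypothetical stand-in for the learned determination model 50.
    return math.hypot(p[0], p[1]) < 1.0


def next_target(current, goal, step, rng, spread=math.pi / 2):
    """Step S205: propose a state `step` away from `current`, roughly toward `goal`."""
    dx, dy = goal[0] - current[0], goal[1] - current[1]
    if math.hypot(dx, dy) <= step:
        return goal  # the final goal is within reach of one transition
    angle = math.atan2(dy, dx) + rng.uniform(-spread, spread)
    return (current[0] + step * math.cos(angle), current[1] + step * math.sin(angle))


def collect_sequence(start, goal, rng, step=0.5, max_transitions=500):
    """Repeat steps S205 to S208; return the visited states, or None on failure."""
    sequence = [start]
    current = start
    for _ in range(max_transitions):
        candidate = next_target(current, goal, step, rng)  # step S205
        if is_contact(candidate):                          # steps S206, S207
            continue                                       # back to step S205
        current = candidate                                # step S208: transition
        sequence.append(current)
        if current == goal:
            return sequence
    return None  # the final goal was not reached within the limit


sequence = collect_sequence((-3.0, 0.0), (3.0, 0.0), random.Random(0))
```

A returned list corresponds to the nodes Ns, N1, ..., Ng of FIG. 12C; a `None` result plays the role of an unreachable case in this sketch.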

(Step S209)
In step S209, the control unit 21 determines whether to repeat the processing of steps S202 to S208. The criterion for repeating the processing may be determined as appropriate for the embodiment.

For example, a specified number of repetitions may be set. The specified number may be given, for example, as a set value or by operator designation. In this case, the control unit 21 determines whether the number of times the processing of steps S202 to S208 has been executed has reached the specified number. If it determines that the number of executions has not reached the specified number, the control unit 21 returns the process to step S202 and repeats the processing of steps S202 to S208. On the other hand, if it determines that the number of executions has reached the specified number, the control unit 21 advances the process to the next step S210.

As another example, the control unit 21 may ask the operator whether to repeat the processing. In this case, the control unit 21 determines whether to repeat the processing of steps S202 to S208 according to the operator's answer. If the operator answers that the processing is to be repeated, the control unit 21 returns the process to step S202 and repeats the processing of steps S202 to S208. On the other hand, if the operator answers that the processing is not to be repeated, the control unit 21 advances the process to the next step S210.

Through the processing up to step S209, one or more sequences of target task states transitioning from the starting point to the final goal, as illustrated in FIG. 12C, can be obtained. The control unit 21 operates as the data collection unit 212 and collects these one or more sequences of target task states. The control unit 21 then generates the learning data 223 from the collected sequences. The control unit 21 may acquire the collected sequences as the learning data 223 as they are, or may generate the learning data 223 by executing some information processing on the collected sequences. The configuration of the learning data 223 may be determined as appropriate for the method of generating the inference model 55. The configuration of the learning data 223 will be described later.

(Steps S210 and S211)
In step S210, the control unit 21 operates as the model generation unit 213. That is, using the learning data 223 obtained by using the determination results of the learned determination model 50, the control unit 21 generates the inference model 55 for inferring, from the current task state and the final goal task state, the target task state to transition to next such that the first object does not contact the second object. The method of generating the inference model 55 will be described later.

In step S211, the control unit 21 operates as the saving processing unit 214. That is, the control unit 21 generates information about the generated inference model 55 as the inference model data 225 and saves the generated inference model data 225 in a predetermined storage area. The predetermined storage area may be, for example, the RAM in the control unit 21, the storage unit 22, an external storage device, a storage medium, or a combination of these. The storage medium may be, for example, a CD or a DVD, and the control unit 21 may store the inference model data 225 on the storage medium via the drive 27. The external storage device may be, for example, a data server such as a NAS. In this case, the control unit 21 may use the communication interface 23 to store the inference model data 225 on the data server via a network. The external storage device may also be, for example, an external storage device connected to the second model generation device 2.

When the saving of the inference model data 225 is thus completed, the control unit 21 ends the series of processes for generating the inference model 55.

Note that the generated inference model data 225 may be provided to the control device 3 at any time. For example, the control unit 21 may transfer the inference model data 225 to the control device 3 as part of the processing of step S211 or separately from it. The control device 3 may acquire the inference model data 225 by receiving this transfer. As another example, the control device 3 may acquire the inference model data 225 by using the communication interface 33 to access the second model generation device 2 or the data server via a network. As another example, the control device 3 may acquire the inference model data 225 via the storage medium 93. As yet another example, the inference model data 225 may be incorporated in the control device 3 in advance.

Furthermore, the control unit 21 may update or newly generate the inference model data 225 by repeating the processing of steps S201 to S211 periodically or irregularly. During this repetition, at least part of the learning data 223 may be changed, corrected, added to, or deleted as appropriate. The control unit 21 may then update the inference model data 225 held by the control device 3 by providing the updated or newly generated inference model data 225 to the control device 3 each time the learning process is executed.

<Inference model generation method>
Next, specific examples of the method of generating the inference model 55 in step S210 will be described. In the present embodiment, the control unit 21 can generate the inference model 55 by at least one of the following two methods.

(1) First Method
In the first method, the control unit 21 generates the inference model 55 by performing machine learning. In this case, the inference model 55 is constituted by a machine learning model. The type of machine learning model is not particularly limited and may be selected as appropriate for the embodiment. The inference model 55 may be expressed by, for example, a functional expression or a data table. When expressed by a functional expression, the inference model 55 may be constituted by, for example, a neural network, a support vector machine, a regression model, or a decision tree. The machine learning method is likewise not particularly limited and may be selected as appropriate for the configuration of the inference model 55. Supervised learning or reinforcement learning, for example, may be employed as the machine learning method for the inference model 55. Two examples of the machine learning model constituting the inference model 55 and of the machine learning method are described below.

(1-1) First Example
FIG. 13 schematically shows a first example of the machine learning model constituting the inference model 55 and of the machine learning method. In the first example, a neural network is employed as the inference model 55, and supervised learning is employed as the machine learning method. In the example of FIG. 13, for convenience of explanation, examples of the inference model 55, the learning data 223, and the inference model data 225 are denoted as the inference model 551, the learning data 2231, and the inference model data 2251, respectively.

(1-1-1) Configuration Example of Inference Model
In the first example, the inference model 551 is constituted by a recurrent neural network with a three-layer structure. Specifically, the inference model 551 includes an input layer N51, an LSTM (long short-term memory) block N52, and an output layer N53. The LSTM block N52 corresponds to the intermediate layer.

The LSTM block N52 is a block that includes an input gate and an output gate and is configured to be able to learn the timing of storing and outputting information (S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, 9(8):1735-1780, November 15, 1997). The LSTM block N52 may further include a forget gate that adjusts the timing of forgetting information (Felix A. Gers, Jurgen Schmidhuber and Fred Cummins, "Learning to Forget: Continual Prediction with LSTM," Neural Computation, pages 2451-2471, October 2000). The configuration of the LSTM block N52 may be set as appropriate for the embodiment.
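For reference, one commonly used formulation of an LSTM block with input, forget, and output gates is given below. The source does not specify the internal equations of the LSTM block N52, so the symbols here are assumptions for illustration: x_t is the input at time t, h_t the block output, c_t the memory cell, W, U, and b the learned weights and biases, \sigma the logistic sigmoid, and \odot the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Fixing f_t = 1 gives a block with only the input and output gates, which matches the base configuration described above.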

Note that the structure of the inference model 551 is not limited to this example and may be determined as appropriate for the embodiment. The inference model 551 may be constituted by a recurrent neural network with a different structure. Alternatively, the inference model 551 may be constituted not by a recurrent network but by a fully connected neural network, like the determination model 50, or by a convolutional neural network. Alternatively, the inference model 551 may be constituted by a combination of multiple types of neural networks. The number of intermediate layers in the inference model 551 is not limited to one and may be two or more. Alternatively, the intermediate layer may be omitted. In other respects, the configuration of the inference model 551 may be the same as that of the determination model 50.

(1-1-2) Configuration Example of Learning Data
The learning data 2231 used for supervised learning of the inference model 551 consists of a plurality of learning data sets L30, each including a combination of training data (input data) and correct answer data (teacher signal). The training data may consist of the relative coordinates in a current task state L31 for training and the relative coordinates in a final goal task state L32 for training. The correct answer data may consist of the relative coordinates in a target task state L33 for training. The formats of the training data and the correct answer data are not particularly limited and may be selected as appropriate for the embodiment. For example, the relative coordinates may be used as the training data as they are, or values obtained by converting the relative coordinates into feature quantities may be used.

The control unit 21 can generate each learning data set L30 from the one or more sequences of target task states obtained by the processing up to step S209. For example, the final goal task state indicated by the node Ng can be used as the final goal task state L32 for training. When the starting task state indicated by the node Ns is set as the current task state L31 for training, the control unit 21 may set the task state indicated by the node N1 as the target task state L33 for training in the corresponding correct answer data. Similarly, when the task state indicated by the node N(k) is set as the current task state L31 for training, the control unit 21 may set the task state indicated by the node N(k+1) as the target task state L33 for training in the corresponding correct answer data. When the task state indicated by the node N4 is set as the current task state L31 for training, the control unit 21 may set the final goal task state indicated by the node Ng as the target task state L33 for training in the corresponding correct answer data. In this way, each learning data set L30 can be generated from the one or more obtained sequences of target task states.
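The pairing rule described above can be sketched in a few lines. The function name `make_learning_datasets` and the two-dimensional tuples are assumptions for illustration; in the embodiment each task state is a set of relative coordinates between the two objects.

```python
def make_learning_datasets(sequence):
    """Turn one sequence [Ns, N1, ..., Ng] into learning data sets.

    Each entry is ((current task state, final goal task state), next target
    task state), i.e. (training data, correct answer data).
    """
    goal = sequence[-1]
    return [((current, goal), nxt) for current, nxt in zip(sequence, sequence[1:])]


# Illustrative sequence: Ns, N1, N2, N3, N4, Ng.
sequence = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2)]
datasets = make_learning_datasets(sequence)
```

A sequence of six states yields five learning data sets: the first pairs Ns with N1, and the last pairs N4 with Ng.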

(1-1-3) Regarding Step S210
In step S210, the control unit 21 performs machine learning (supervised learning) of the inference model 551 using the acquired plurality of learning data sets L30. In the first example, the control unit 21 trains the inference model 551 by machine learning so that, for each learning data set L30, when the training data is input to the input layer N51, an output value matching the correct answer data is output from the output layer N53. This makes it possible to generate a learned inference model 551 that has acquired the ability to infer, from the current task state and the final goal task state, the target task state to transition to next.

The machine learning method for the inference model 551 may be the same as the machine learning method for the determination model 50. That is, in the first step, for each learning data set L30, the control unit 21 inputs the training data to the input layer N51 of the inference model 551 and executes the arithmetic processing of the inference model 551. The control unit 21 thereby acquires, from the output layer N53 of the inference model 551, an output value corresponding to the result of inferring the target task state to transition to next from the current task state. In the second step, the control unit 21 calculates the error between the output value of the output layer N53 and the correct answer data based on a loss function.

Next, in the third step, the control unit 21 uses the backpropagation method to calculate, from the gradient of the calculated error in the output value, the error in the value of each calculation parameter of the inference model 551 (for example, the weights of the connections between neurons and the threshold of each neuron). In the fourth step, the control unit 21 updates the values of the calculation parameters of the inference model 551 based on the calculated errors. The degree of the update may be adjusted by a learning rate. The learning rate may be given by operator designation or as a set value in the program.

By repeating the first to fourth steps, the control unit 21 adjusts the values of the calculation parameters of the inference model 551 so that, for each learning data set L30, the sum of the errors between the output values output from the output layer N53 and the correct answer data becomes small. For example, the control unit 21 may repeat the first to fourth steps until the sum of the errors becomes equal to or less than a threshold. The threshold may be set as appropriate for the embodiment. Alternatively, the control unit 21 may repeat the first to fourth steps a predetermined number of times. The number of repetitions may be specified, for example, by a set value in the program or by operator input.
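The four-step loop and its two stopping criteria can be illustrated with a deliberately simplified model. In this sketch a single linear layer stands in for the recurrent network of the inference model 551, so the gradient in the third step is written out directly (for one layer this is what backpropagation would compute). The data, learning rate, threshold, and epoch limit are all assumptions chosen for illustration.

```python
def train(datasets, lr=0.05, threshold=1e-4, max_epochs=5000):
    """Fit y = W x + b by gradient descent on the summed squared error."""
    n_in, n_out = len(datasets[0][0]), len(datasets[0][1])
    W = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    total = float("inf")
    for _ in range(max_epochs):  # criterion 2: predetermined number of repetitions
        grad_W = [[0.0] * n_in for _ in range(n_out)]
        grad_b = [0.0] * n_out
        total = 0.0
        for x, t in datasets:
            # Step 1: forward computation of the model output.
            y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
            # Step 2: error between output and correct answer data (squared loss).
            err = [yi - ti for yi, ti in zip(y, t)]
            total += sum(e * e for e in err)
            # Step 3: gradient of the loss with respect to each parameter.
            for o in range(n_out):
                grad_b[o] += 2.0 * err[o]
                for i in range(n_in):
                    grad_W[o][i] += 2.0 * err[o] * x[i]
        if total <= threshold:  # criterion 1: sum of errors at or below threshold
            break
        # Step 4: parameter update scaled by the learning rate.
        for o in range(n_out):
            b[o] -= lr * grad_b[o] / len(datasets)
            for i in range(n_in):
                W[o][i] -= lr * grad_W[o][i] / len(datasets)
    return W, b, total


# Training data: (current x, current y, goal x, goal y); correct data: next state.
datasets = [
    ((0.0, 0.0, 1.0, 1.0), (0.5, 0.5)),
    ((0.5, 0.5, 1.0, 1.0), (0.75, 0.75)),
    ((1.0, 0.0, 0.0, 1.0), (0.5, 0.5)),
    ((0.0, 1.0, 1.0, 0.0), (0.5, 0.5)),
    ((0.2, 0.8, 0.6, 0.4), (0.4, 0.6)),
]
W, b, final_error = train(datasets)
```

The illustrative targets lie halfway between the current state and the goal, which a linear model can represent; real sequences that detour around a contact boundary would need a nonlinear model such as the network described above.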

As a result of this machine learning (supervised learning), the control unit 21 can construct a learned inference model 551 trained so that, for each learning data set L30, when the training data is input to the input layer N51, an output value matching the corresponding correct answer data is output from the output layer N53. That is, a learned inference model 551 can be constructed that has acquired the ability to infer, from the current task state and the final goal task state, the target task state to transition to next.

In step S211, the control unit 21 generates, as the inference model data 2251, information indicating the structure and calculation parameters of the trained inference model 551 constructed by supervised learning. The control unit 21 then saves the generated inference model data 2251 in a predetermined storage area. The content of the inference model data 2251 need not be limited to this example as long as the trained inference model 551 can be reproduced from it. For example, when the structure of the inference model 551 is shared among the devices, the information indicating the structure of the inference model 551 may be omitted from the inference model data 2251.
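As one way to picture the content of such model data, the following sketch stores the calculation parameters and, optionally, the structure, and falls back to a shared structure when the stored data omits it. The JSON layout and the names `build_model_data`, `restore_model`, and `shared_structure` are hypothetical and are not taken from the embodiment.

```python
import json


def build_model_data(params, structure=None):
    """Serialize the parameters; include the structure only when given."""
    data = {"params": params}
    if structure is not None:
        data["structure"] = structure
    return json.dumps(data)


def restore_model(model_data, shared_structure=None):
    """Recover (structure, params); use the shared structure if none is stored."""
    data = json.loads(model_data)
    structure = data.get("structure", shared_structure)
    if structure is None:
        raise ValueError("structure is neither stored nor shared")
    return structure, data["params"]
```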

(1-1-4) Others
When supervised learning is adopted as the machine learning method, the configuration of the inference model 551 need not be limited to a neural network. A machine learning model other than a neural network may be adopted as the inference model 551. For example, a support vector machine, a regression model, or a decision tree may be adopted as the machine learning model constituting the inference model 551. The supervised learning method need not be limited to the above example and may be selected as appropriate according to the configuration of the machine learning model.

(1-2) Second Example
FIG. 14 schematically shows a second example of the machine learning model constituting the inference model 55 and of the machine learning method. In the second example, reinforcement learning is adopted as the machine learning method. In the example of FIG. 14, for convenience of explanation, examples of the inference model 55, the learning data 223, and the inference model data 225 are denoted as the inference model 552, the learning data 2232, and the inference model data 2252, respectively.

(1-2-1) Configuration Example of Inference Model
In the second example, the inference model 552 may be value-based, policy-based, or both. When a value-based approach is adopted, the inference model 552 may be composed of a value function such as a state value function or an action value function (Q function). A state value function is configured to output the value of a given state. An action value function is configured to output the value of each action for a given state. When a policy-based approach is adopted, the inference model 552 may be composed of, for example, a policy function. A policy function is configured to output the probability of selecting each action for a given state. When both are adopted, the inference model 552 may be composed of, for example, a value function (Critic) and a policy function (Actor). Each function may be expressed by, for example, a data table or a functional expression. When expressed by a functional expression, each function may be composed of a neural network, a linear function, a decision tree, or the like. Deep reinforcement learning may be carried out by composing each function of a multilayer neural network having a plurality of intermediate (hidden) layers.

(1-2-2) Configuration Example of Learning Data
Reinforcement learning basically assumes an agent that interacts with the learning environment by acting according to a policy. The entity acting as the agent is, for example, a CPU. With the above configuration, the inference model 552 operates as the policy that determines actions. Within the given learning environment, the agent observes the state relevant to the action to be reinforced. In this embodiment, the state to be observed is the task state defined by relative coordinates, and the action to be executed is a transition from the current task state to a target task state. The policy is configured to determine (infer) the target task state to transition to next from the current task state and the final target task state.

The agent may give the observed current task state (input data) to the inference model 552 and infer the target task state to transition to next. The agent may determine the target task state based on the result of this inference. Alternatively, the target task state may be determined randomly. In this way, the agent can determine the action to adopt. When the agent executes the action of transitioning to the determined target task state, the observed task state transitions to the next task state. In some cases, the agent obtains an immediate reward from the learning environment.

While repeating this trial and error of determining and executing actions, the agent updates the inference model 552 so as to maximize the sum of the immediate rewards (that is, the value). This reinforces optimal actions, that is, actions expected to yield high value, and yields a policy (the trained inference model 552) that enables such actions to be selected.

Therefore, in reinforcement learning, the learning data 2232 is composed of state transition data obtained through this trial and error, that is, data indicating state transitions in which an executed action causes a transition from the current task state to the next task state and, in some cases, yields an immediate reward. One piece of state transition data may consist of data indicating the trajectory of all state transitions in one episode, or of data indicating a predetermined number (one or more) of state transitions. In the course of steps S202 to S209, the control unit 21 can acquire the state transition data by performing the trial and error using the inference model 552 being trained.
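A state transition record of the kind described above could be represented as in the following sketch. The `Transition` class, its field names, and the `split_episode` helper are hypothetical names introduced for illustration; the embodiment does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

State = Tuple[float, ...]  # a task state expressed as relative coordinates


@dataclass(frozen=True)
class Transition:
    state: State                    # current task state
    action: State                   # target task state chosen as the action
    next_state: State               # task state observed after the action
    reward: Optional[float] = None  # immediate reward, if one was obtained


def split_episode(episode: Sequence[Transition], n: int) -> List[List[Transition]]:
    """Split a full-episode trajectory into records of at most n transitions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [list(episode[i:i + n]) for i in range(0, len(episode), n)]
```

Passing `n = len(episode)` keeps the whole episode as one record, which corresponds to the first of the two options described above.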

A reward function may be used to calculate the immediate reward according to the state transition. The reward function may be expressed by a data table, a functional expression, or a rule. When expressed by a functional expression, the reward function may be composed of a neural network, a linear function, a decision tree, or the like. The reward function may be set manually by an operator or the like.

Alternatively, the reward function may be set so as to give an immediate reward according to (i) the result of determining, by the trained determination model 50, whether the first object and the second object contact each other in the task state to transition to, and (ii) the distance between that task state and the final target task state. Specifically, the immediate reward may be set larger when the first object and the second object do not contact each other and the distance between that task state and the final target task state is shorter, and smaller when the first object and the second object contact each other or the distance is longer. Equation 1 below is an example of a reward function that gives the immediate reward in this way.

Here, s_c denotes the target task state determined by the policy, and s_g denotes the final target task state. F(s_c) denotes the result of determining, by the trained determination model 50, whether the first object and the second object contact each other in the task state s_c. F(s_c) may be set to take a small value (for example, 0) when the objects are determined to contact each other and a large value (for example, 1) when they are determined not to contact each other. When the output value of the trained determination model 50 conforms to this setting, the output value may be used directly as F(s_c).
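For illustration only, the following sketch shows one function with the qualitative behavior described above: the reward is larger when no contact is determined and the distance to the final target is shorter, and it is zero when contact is determined. It is an assumed form chosen for the sketch and is not a reproduction of Equation 1; the name `immediate_reward`, the Euclidean distance, and the `1 + distance` denominator are all assumptions.

```python
import math


def immediate_reward(s_c, s_g, f_value):
    """Hypothetical reward: f_value plays the role of F(s_c)
    (1.0 when no contact is determined, 0.0 when contact is determined)."""
    distance = math.dist(s_c, s_g)     # distance between s_c and s_g
    return f_value / (1.0 + distance)  # shorter distance gives a larger reward
```

The added 1 in the denominator simply keeps the value finite when s_c coincides with s_g.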

Alternatively, the reward function may be estimated by inverse reinforcement learning from case data obtained from an expert. The case data may be composed of data indicating (the trajectory of) a demonstration by the expert. In this embodiment, the case data may be composed of, for example, data indicating a path along which the first object was actually moved so as to reach the final target task state from the task state of an arbitrary starting point. The method of generating the case data is not particularly limited and may be selected as appropriate according to the embodiment. The case data may be generated, for example, by recording the trajectory of the expert's demonstration with a sensor or the like.

The method of inverse reinforcement learning is not particularly limited and may be selected as appropriate according to the embodiment. For inverse reinforcement learning, for example, a method based on the maximum entropy principle, a method based on minimization of relative entropy, or a method using a generative adversarial network (for example, Justin Fu, et al., "Learning Robust Rewards with Adversarial Inverse Reinforcement Learning", arXiv:1710.11248, 2018) may be used. When the reward function is obtained by inverse reinforcement learning, the learning data 2232 may further include the case data used for the inverse reinforcement learning.

(1-2-3) Step S210
In step S210, the control unit 21 updates the values of the calculation parameters of the inference model 552 so as to maximize the value, based on the obtained state transition data. The method of adjusting the values of the calculation parameters of the inference model 552 may be selected as appropriate according to the configuration of the inference model 552. For example, when the inference model 552 is composed of a neural network, the values of its calculation parameters may be adjusted by error backpropagation or the like in the same manner as in the first example.

The control unit 21 adjusts the values of the calculation parameters of the inference model 552 so that the (expected) value obtained is maximized (for example, until the update amount becomes equal to or less than a threshold). That is, training the inference model 552 includes repeatedly correcting the values of the calculation parameters constituting the inference model 552 so that more reward is obtained, until a predetermined condition (for example, the update amount becoming equal to or less than a threshold) is satisfied. In this way, the control unit 21 can generate a trained inference model 552 that has acquired the ability to infer the target task state to transition to next from the current task state and the final target task state.

The control unit 21 may adjust the values of the calculation parameters of the inference model 552 after it has finished collecting the learning data 2232 through the processing of steps S202 to S209. Alternatively, the control unit 21 may adjust the values of the calculation parameters of the inference model 552 while repeating the processing of steps S202 to S210.

When the inference model 552 is value-based, the reinforcement learning method may be a TD (temporal difference) method, a TD(λ) method, a Monte Carlo method, dynamic programming, or the like. The determination of actions during trial and error may be on-policy or off-policy. As specific examples, Q-learning, Sarsa, or the like may be used as the reinforcement learning method. During trial and error, a random action may be adopted with probability ε (the ε-greedy method).
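As a concrete instance of the value-based case, the following sketch runs tabular Q-learning (an off-policy TD method) with ε-greedy action selection. The environment is a toy assumption made for the sketch: task states are cells 0 to 4 on a line, the final target is cell 4, and a reward of 1 is given on reaching it. None of the names or numerical settings come from the embodiment.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 1)  # move one cell left or right (toy action set)


def epsilon_greedy(q, state, epsilon, rng):
    """Adopt a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties randomly so the untrained table does not favor one action.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=200, goal=4, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # action value function held as a data table
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on the episode length
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state = min(max(state + action, 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # TD target uses the best next action regardless of the one taken.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if state == goal:
                break
    return q
```

Replacing `best_next` with the value of the action actually selected in the next state would turn the update into Sarsa, the on-policy counterpart.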

When the inference model 552 is policy-based, the reinforcement learning method may be a policy gradient method, TRPO (trust region policy optimization), PPO (proximal policy optimization), or the like. In this case, the control unit 21 calculates the gradient of the calculation parameters of the policy function in the direction in which the obtained value increases, and updates the values of the calculation parameters of the policy function based on the calculated gradient. For example, the REINFORCE algorithm may be used to calculate the gradient of the policy function.

When the inference model 552 is composed of both, the reinforcement learning method may be an Actor-Critic method, A2C (Advantage Actor Critic), A3C (Asynchronous Advantage Actor Critic), or the like.

Furthermore, when inverse reinforcement learning is carried out, the control unit 21 additionally acquires case data before executing the reinforcement learning processing. The case data may be generated by the second model generation device 2 or by another computer. When the case data is generated by another computer, the control unit 21 may acquire it via a network, the storage medium 92, or the like. Next, the control unit 21 sets the reward function by executing inverse reinforcement learning using the acquired case data. The control unit 21 then executes the reinforcement learning processing using the reward function set by the inverse reinforcement learning. In this way, using the reward function set by inverse reinforcement learning, the control unit 21 can generate a trained inference model 552 that has acquired the ability to infer the target task state to transition to next from the current task state and the final target task state.

In step S211, the control unit 21 generates, as the inference model data 2252, information indicating the trained inference model 552 constructed by reinforcement learning. The information indicating the trained inference model 552 may include information indicating calculation parameters such as the value of each entry of the data table or the values of the coefficients of the functional expression. The control unit 21 then saves the generated inference model data 2252 in a predetermined storage area. According to the second example, it is possible to generate an inference model 55 capable of determining the target task state so that unnecessary contact between the first object and the second object is avoided and the task state of the manipulator 4 quickly reaches the final target task state.

(1-3) Summary
In this embodiment, when the inference model 55 is composed of a machine learning model, at least one of the above two examples may be adopted as its configuration. By adopting at least one of the above two machine learning methods, the control unit 21 can generate a trained inference model 55 that has acquired the ability to infer the target task state to transition to next from the current task state and the final target task state so that the first object does not contact the second object. Therefore, according to the first method, an inference model 55 usable for performing the task can be appropriately generated.

(2) Second Method
FIG. 15A schematically illustrates an example of the learning data 223 in the second method. FIG. 15B schematically illustrates an example of the configuration of the inference model 55 in the second method. In the second method, the inference model 55 is composed of a potential field that defines the potential of each coordinate in the task space SP representing the set of task states. In FIGS. 15A and 15B, for convenience of explanation, examples of the inference model 55, the learning data 223, and the inference model data 225 are denoted as the inference model 553, the learning data 2233, and the inference model data 2253, respectively.

Through the processing of steps S202 to S209, the control unit 21 carries out path planning in the task space SP, using the trained determination model 50, so that the first object does not contact the second object. As illustrated in FIG. 15A, the control unit 21 can thereby generate learning data 2233 indicating a path Hb from each of a plurality of task states, each given as a starting point (node Ns), to the final target task state. Each starting point (node Ns) may be given randomly.

In step S210, the control unit 21 generates the potential field by setting the potential of each coordinate according to the frequency with which the paths Hb indicated by the generated learning data 2233 pass through it. The method of deriving the potential field is not particularly limited and may be selected as appropriate according to the embodiment. The control unit 21 may derive the potential field from the learning data 2233 by, for example, kernel density estimation or estimation using a Gaussian mixture model (GMM). In this way, the potential field (inference model 553) illustrated in FIG. 15B can be obtained.

The potential of each coordinate in the potential field indicates an evaluation value of the positional relationship between the first object and the second object at that coordinate with respect to reaching the final target. That is, a higher potential indicates that the positional relationship at that coordinate is more likely to reach the final target, and a lower potential indicates that it is less likely to do so. Therefore, by transitioning toward the side on which the potential gradient rises, the final target task state can be appropriately reached from an arbitrary task state serving as the starting point. Thus, according to the second method, an inference model 55 usable for performing the task can be appropriately generated.
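The derivation and use of such a potential field can be sketched as follows, assuming a two-dimensional task space, a Gaussian kernel, and a discrete hill climb over the eight neighboring grid coordinates in place of a true gradient. The names `potential`, `next_state`, and `follow`, the bandwidth, and the step size are assumptions made for the sketch.

```python
import math


def potential(point, path_points, bandwidth=1.0):
    """Unnormalized Gaussian kernel density of the path points at `point`."""
    total = 0.0
    for p in path_points:
        total += math.exp(-math.dist(point, p) ** 2 / (2.0 * bandwidth ** 2))
    return total / len(path_points)


def next_state(current, path_points, step=1.0, bandwidth=1.0):
    """Return the neighboring coordinate (or the current one) with the highest potential."""
    x, y = current
    neighbours = [(x + dx, y + dy)
                  for dx in (-step, 0.0, step)
                  for dy in (-step, 0.0, step)]
    return max(neighbours, key=lambda c: potential(c, path_points, bandwidth))


def follow(start, path_points, max_steps=20):
    """Climb the potential field until no neighbor has a higher potential."""
    state = tuple(float(v) for v in start)
    for _ in range(max_steps):
        nxt = next_state(state, path_points)
        if nxt == state:
            break
        state = nxt
    return state
```

Because every path ends at the final target, the density of path points, and hence the potential, peaks there, which is why climbing the field tends to lead toward the final target in this toy setting.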

In step S211, the control unit 21 generates information indicating the generated potential field as the inference model data 2253. The potential field may be expressed by a data table, a functional expression, or the like. The control unit 21 then saves the generated inference model data 2253 in a predetermined storage area.

(3) Summary
In this embodiment, at least one of the above two methods may be adopted as the method of generating the inference model 55. By adopting at least one of the above two methods, the control unit 21 can generate an inference model 55 configured to infer the target task state to transition to next from the current task state and the final target task state so that the first object does not contact the second object.

Note that inferring the target task state so that the first object does not contact the second object means determining the target task state while avoiding unintended contact between the first object and the second object. It may include determining, as the target task state, a task state in which the first object properly contacts the second object, such as the end effector T holding the workpiece W. That is, the "contact" state to be avoided is a state of improper contact, for example, one in which excessive force acts between the first object and the second object, or one in which one of the objects contacts the other in a state other than being assembled to it in the correct posture. Therefore, "the first object does not contact the second object" may be read as "the first object avoids contacting the second object in an improper state".

[Control Device]
(A) Operation Control
Next, an operation example relating to the control of the operation of the manipulator 4 by the control device 3 according to this embodiment is described with reference to FIGS. 16A, 16B, and 17. FIGS. 16A and 16B are flowcharts showing an example of the processing procedure for the control of the operation of the manipulator 4 by the control device 3 according to this embodiment. FIG. 17 shows an example of the flow of the calculation processing of each element in the course of the operation control. The processing procedures described below are merely examples, and each step may be modified to the extent possible. Furthermore, in the processing procedures described below, steps may be omitted, replaced, or added as appropriate according to the embodiment. The control of the operation of the manipulator 4 described below may be carried out in real space or in virtual space.

(Step S301 and Step S302)
In step S301, the control unit 31 accepts the designation of the task to be performed. The method of accepting the designation is not particularly limited and may be selected as appropriate according to the embodiment. For example, the control unit 31 may accept the designation of the task to be performed through input of the task name via the input device 35. As another example, the control unit 31 may output a list of candidate tasks to the output device 36 and accept the designation by having the operator select the task to be performed from the list.

In this embodiment, the control unit 31 accepts the performance of a task of moving the first object relative to the second object in an environment in which the first object and the second object are present. Specifically, a series of operations of driving the manipulator 4, holding the workpiece W with the end effector T, and assembling the held workpiece W to another workpiece G is an example of the designated task. In this embodiment, in the course of the first task of going to hold the workpiece W, the point of interest T0 of the end effector T is treated as the hand of the manipulator 4, and the workpiece W is the target of the movement of the hand. In the course of the second task of assembling the workpiece W to the other workpiece G after the workpiece W has been held, the point of interest W0 of the workpiece W held by the end effector T is treated as the hand of the manipulator 4, and the other workpiece G, to which the workpiece W is to be assembled, is the target of the movement of the hand. In each task, the hand of the manipulator 4 corresponds to the first object, and the target corresponds to the second object.

In step S302, the control unit 31 operates as the target setting unit 310 and sets the final target task state s_g according to the designated task. As described above, in this embodiment, the task state is defined by the positional relationship between the hand of the manipulator 4 and the target, and the positional relationship is expressed by relative coordinates. The relative coordinates in the final target task state s_g may be given by a simulator such as CAD or by the operator's designation. The relative coordinates in the final target task state s_g may be set by the same method as in step S201. After setting the final target task state s_g, the control unit 31 advances the processing to the next step S303.

(Step S303)
In step S303, the control unit 31 operates as the first data acquisition unit 311 and acquires the first sensing data 323 from the first sensor system. The control unit 31 also operates as the second data acquisition unit 312 and acquires the second sensing data 324 from the second sensor system.

In this embodiment, the first sensor system is composed of the encoders S2, which measure the angle of each joint (joints 41 to 46), and the tactile sensor S3, which measures the force acting on the end effector T. As the first sensing data 323, the control unit 31 can acquire the current value q_(j) of the angle of each joint of the manipulator 4 (that is, the current measured value) from each encoder S2. The control unit 31 can further acquire, as the first sensing data 323, measurement data of the force acting on the end effector T from the tactile sensor S3. In this embodiment, the second sensor system is composed of the camera S1. As the second sensing data 324, the control unit 31 can acquire from the camera S1 image data showing the environment in which the task is performed. In the following, for convenience of explanation, a symbol indicating timing, such as (j), is attached when a timing such as "current" needs to be distinguished, and is omitted otherwise.

制御部３１は、各センサ（カメラＳ１、エンコーダＳ２、触覚センサＳ３）から各センシングデータ（３２３、３２４）を直接的に取得してもよいし、或いは、例えば、他のコンピュータを経由する等して、各センシングデータ（３２３、３２４）を間接的に取得してもよい。カメラＳ１及び触覚センサＳ３はそれぞれ、エンドエフェクタＴに対するワークＷの状態を観測する観測センサの一例である。各センシングデータ（３２３、３２４）を取得すると、制御部３１は、次のステップＳ３０４に処理を進める。 The control unit 31 may acquire each item of sensing data (323, 324) directly from each sensor (camera S1, encoder S2, tactile sensor S3), or may acquire it indirectly, for example, via another computer. The camera S1 and the tactile sensor S3 are each an example of an observation sensor that observes the state of the work W with respect to the end effector T. After acquiring each item of sensing data (323, 324), the control unit 31 advances the process to the next step S304.

（ステップＳ３０４）
ステップＳ３０４では、制御部３１は、ステップＳ３０３により得られた上記観測センサのセンシングデータに基づいて、エンドエフェクタＴがワークＷを保持しているか否かを判定する。判定方法は、特に限定されなくてもよく、センシングデータに応じて適宜決定されてよい。 (Step S304)
At step S304, the control unit 31 determines whether or not the end effector T holds the workpiece W based on the sensing data of the observation sensor obtained at step S303. The determination method is not particularly limited, and may be determined as appropriate according to sensing data.

例えば、本実施形態では、第２センシングデータ３２４として、タスクの環境が写る画像データをカメラＳ１から取得することができる。そこで、制御部３１は、ＣＡＤデータ３２０を利用して、取得された画像データに対してエンドエフェクタＴ及びワークＷのモデルをマッチングしてもよい。そして、制御部３１は、当該マッチングの結果により特定されたエンドエフェクタＴ及びワークＷの位置関係に基づいて、エンドエフェクタＴがワークＷを保持しているか否かを判定してもよい。マッチングの方法には、公知の画像処理方法が用いられてよい。 For example, in the present embodiment, as the second sensing data 324, image data representing the environment of the task can be acquired from the camera S1. Therefore, the control unit 31 may use the CAD data 320 to match models of the end effector T and the work W with the acquired image data. Then, the control unit 31 may determine whether or not the end effector T holds the work W based on the positional relationship between the end effector T and the work W specified as a result of the matching. A known image processing method may be used as the matching method.

また、例えば、本実施形態では、第１センシングデータ３２３として、エンドエフェクタＴに作用する力の測定データを取得することができる。そこで、制御部３１は、測定データにより表れる力の分布に基づいて、エンドエフェクタＴがワークＷを保持しているか否かを判定してもよい。エンドエフェクタＴがワークＷを保持していると認められる力がエンドエフェクタＴに作用していると測定データから推定される場合、制御部３１は、エンドエフェクタＴがワークＷを保持していると判定してもよい。一方、そうではない場合、制御部３１は、エンドエフェクタＴはワークＷを保持していないと判定してもよい。 Further, for example, in the present embodiment, measurement data of the force acting on the end effector T can be acquired as the first sensing data 323. Therefore, the control unit 31 may determine whether or not the end effector T holds the work W based on the distribution of force appearing in the measurement data. If it is estimated from the measurement data that a force consistent with the end effector T holding the work W is acting on the end effector T, the control unit 31 may determine that the end effector T holds the work W. Otherwise, the control unit 31 may determine that the end effector T does not hold the work W.

センシングデータに基づいて、エンドエフェクタＴがワークＷを保持しているか否かの判定が完了すると、制御部３１は、次のステップＳ３０５に処理を進める。 After completing the determination of whether or not the end effector T holds the workpiece W based on the sensing data, the control unit 31 proceeds to the next step S305.

（ステップＳ３０５）
ステップＳ３０５では、制御部３１は、ステップＳ３０４の判定の結果に基づいて、マニピュレータ４の動作モードを設定する。具体的には、エンドエフェクタＴがワークＷを保持していないと判定した場合、制御部３１は、エンドエフェクタＴの注目点Ｔ０をマニピュレータ４の手先に設定し、エンドエフェクタＴによりワークＷを保持する第１タスクを遂行するモードに動作モードを設定する。一方、エンドエフェクタＴがワークＷを保持していると判定した場合、制御部３１は、ワークＷの注目点Ｗ０をマニピュレータ４の手先に設定し、エンドエフェクタＴにより保持されたワークＷを他のワークＧに組み付ける第２タスクを遂行するモードに動作モードを設定する。動作モードの設定が完了すると、制御部３１は、次のステップＳ３０６に処理を進める。 (Step S305)
In step S305, the control unit 31 sets the operation mode of the manipulator 4 based on the determination result of step S304. Specifically, when it is determined that the end effector T does not hold the work W, the control unit 31 sets the point of interest T0 of the end effector T as the hand of the manipulator 4, and sets the operation mode to the mode for performing the first task of holding the work W with the end effector T. On the other hand, when it is determined that the end effector T holds the work W, the control unit 31 sets the point of interest W0 of the work W as the hand of the manipulator 4, and sets the operation mode to the mode for performing the second task of assembling the work W held by the end effector T to another work G. When the setting of the operation mode is completed, the control unit 31 advances the process to the next step S306.
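
The holding determination of step S304 and the mode setting of step S305 can be sketched as follows. This is a minimal illustration only: the names `Mode`, `ModeSetting`, `is_holding`, and `set_operation_mode`, as well as the force threshold rule, are assumptions introduced here and do not appear in the embodiment, which leaves the determination method open.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FIRST_TASK = 1   # hold the work W with the end effector T
    SECOND_TASK = 2  # assemble the held work W to the other work G


@dataclass(frozen=True)
class ModeSetting:
    mode: Mode
    hand_point: str  # point of interest treated as the hand: "T0" or "W0"


def is_holding(tactile_forces, threshold=1.0):
    """Hypothetical S304 rule: holding if the summed force reaches a threshold."""
    return sum(tactile_forces) >= threshold


def set_operation_mode(holding):
    """S305: pick the hand point and the task mode from the S304 result."""
    if holding:
        return ModeSetting(Mode.SECOND_TASK, "W0")
    return ModeSetting(Mode.FIRST_TASK, "T0")
```

Any other determination method, such as the CAD model matching described for step S304, could replace `is_holding` without changing `set_operation_mode`.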

（ステップＳ３０６）
ステップＳ３０６では、制御部３１は、状態取得部３１５として動作し、マニピュレータ４の現在のタスク状態ｓ_(j)を取得する。 (Step S306)
In step S306, the control unit 31 operates as the state acquisition unit 315 and acquires the current task state s_(j) of the manipulator 4.

上記のとおり、本実施形態では、エンドエフェクタＴがワークＷを保持していない場合、タスク状態ｓは、エンドエフェクタＴに対するワークＷの相対座標により規定される。一方、エンドエフェクタＴがワークＷを保持している場合、タスク状態ｓは、ワークＷに対する他のワークＧの相対座標により規定される。本実施形態では、制御部３１は、ＣＡＤデータ３２０を利用して、カメラＳ１により得られた画像データに対して各対象物をマッチングする。制御部３１は、このマッチングの結果から、現在のタスク状態ｓ_(j)を取得することができる。 As described above, in this embodiment, when the end effector T does not hold the work W, the task state s is defined by the relative coordinates of the work W with respect to the end effector T. On the other hand, when the end effector T holds the work W, the task state s is defined by the relative coordinates of another work G with respect to the work W. In this embodiment, the control unit 31 uses the CAD data 320 to match each object with the image data obtained by the camera S1. The control unit 31 can obtain the current task state s _(j) from the result of this matching.

ここで、図１８を更に用いて、現在のタスク状態ｓ_(j)を取得する方法の一例について説明する。図１８は、各対象物の位置関係の一例を模式的に例示する。図１８の例では、マニピュレータ４の台座部４０に観測空間の原点が設定されている。ただし、原点の位置は、このような例に限定されなくてもよく、実施の形態に応じて決定されてよい。原点に対するカメラＳ１の同次座標（Ｔ_C）は、以下の式２により表現することができる。 An example of a method for acquiring the current task state s_(j) will now be described with further reference to FIG. 18. FIG. 18 schematically illustrates an example of the positional relationship of the objects. In the example of FIG. 18, the origin of the observation space is set at the pedestal 40 of the manipulator 4. However, the position of the origin need not be limited to such an example, and may be determined according to the embodiment. The homogeneous coordinates (T_C) of the camera S1 with respect to the origin can be expressed by Equation 2 below.

Ｒ_rcは、原点の座標系からカメラＳ１の座標系を見た回転成分を示し、ｔ_rcは、平行移動成分を示す。以下では、説明の便宜上、原点の同次座標（Ｔ_R）以下の式３を満たすようにカメラＳ１がキャリブレーションされていると想定する。

R_rc denotes the rotation component of the coordinate system of the camera S1 as seen from the coordinate system of the origin, and t_rc denotes the translation component. In the following, for convenience of explanation, it is assumed that the camera S1 has been calibrated so that the homogeneous coordinates (T_R) of the origin satisfy Equation 3 below.
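
Assuming the standard block layout of a homogeneous transformation, Equations 2 and 3 can be written out as follows. The layout is inferred from the definitions of R_rc, t_rc, and I in the surrounding text rather than taken from the original figures.

```latex
\begin{align}
T_C &= \begin{pmatrix} R_{rc} & t_{rc} \\ 0 & 1 \end{pmatrix} \tag{2} \\
T_R &= I \tag{3}
\end{align}
```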

Ｉは、単位行列を示す。図１８の例では、原点に対するエンドエフェクタＴの注目点Ｔ０の相対座標がエンドエフェクタＴの座標（Ｔ_t）である。原点に対するワークＷの注目点Ｗ０の相対座標がワークＷの座標（Ｔ_w）である。原点に対する他のワークＧの注目点Ｇ０の相対座標が他のワークＧの座標（Ｔ_g）である。制御部３１は、ＣＡＤデータ３２０を利用して、画像データに対して各対象物のモデルをマッチングすることで、各座標（Ｔ_t、Ｔ_w、Ｔ_g）の推定値を得ることができる。制御部３１は、処理タイミングにおいて得られた各座標の推定値を各座標の現在値として利用することができる。

I indicates an identity matrix. In the example of FIG. 18, the coordinates of the target point T0 of the end effector T relative to the origin are the coordinates of the end effector T (T _t ). The relative coordinates of the target point W0 of the work W with respect to the origin are the coordinates of the work W (T _w ). The relative coordinates of the target point G0 of another work G with respect to the origin are the coordinates of the other work G (T _g ). Using the CAD data 320, the control unit 31 can obtain an estimated value of each coordinate (T _t , T _w , T _g ) by matching the model of each object to the image data. The control unit 31 can use the estimated value of each coordinate obtained at the processing timing as the current value of each coordinate.

エンドエフェクタＴがワークＷを保持していない場合には、タスク状態ｓとエンドエフェクタＴ及びワークＷの各座標（Ｔ_t、Ｔ_w）との関係は、上記式４により表現することができる。そのため、制御部３１は、マッチングの結果により推定されたエンドエフェクタＴ及びワークＷの各座標の現在値（Ｔ_t(j)、Ｔ_w(j)）を上記式４に代入し、上記式４の演算処理を実行することで、現在のタスク状態ｓ_(j)の推定値を算出することができる。この現在のタスク状態ｓ_(j)の推定値を算出することが、現在のタスク状態ｓ_(j)を取得することに相当する。

When the end effector T does not hold the work W, the relationship between the task state s and the coordinates (T_t, T_w) of the end effector T and the work W can be expressed by Equation 4 above. Therefore, the control unit 31 can calculate an estimated value of the current task state s_(j) by substituting the current values (T_t(j), T_w(j)) of the coordinates of the end effector T and the work W estimated from the matching result into Equation 4 and executing the arithmetic processing of Equation 4. Calculating this estimated value of the current task state s_(j) corresponds to acquiring the current task state s_(j).

一方、エンドエフェクタＴがワークＷを保持している場合には、タスク状態ｓとワークＷ及び他のワークＧの各座標（Ｔ_w、Ｔ_g）との関係は、上記式５により表現することができる。そのため、制御部３１は、マッチングの結果により推定されたワークＷ及び他のワークＧの各座標の現在値（Ｔ_w(j)、Ｔ_g(j)）を上記式５に代入し、上記式５の演算処理を実行することで、現在のタスク状態ｓ_(j)の推定値を算出することができる。なお、各座標（Ｔ_t、Ｔ_w、Ｔ_g）の表現は、適宜選択されてよい。各座標（Ｔ_t、Ｔ_w、Ｔ_g）の表現には、例えば、同次座標系が用いられてよい。以下についても同様である。

On the other hand, when the end effector T holds the work W, the relationship between the task state s and the coordinates (T_w, T_g) of the work W and the other work G can be expressed by Equation 5 above. Therefore, the control unit 31 can calculate an estimated value of the current task state s_(j) by substituting the current values (T_w(j), T_g(j)) of the coordinates of the work W and the other work G estimated from the matching result into Equation 5 and executing the arithmetic processing of Equation 5. The representation of each coordinate (T_t, T_w, T_g) may be selected as appropriate. For example, a homogeneous coordinate system may be used to represent each coordinate (T_t, T_w, T_g). The same applies hereinafter.
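
The calculation of the current task state from the matched coordinates can be sketched as follows. Equations 4 and 5 are not reproduced in this section, so the sketch assumes the forms s = T_w^(-1)·T_t and s = T_g^(-1)·T_w, chosen because they are the inverses of transformation functions given by T_w and T_g. The helper names and the sample coordinates are hypothetical.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def inverse_rigid(t):
    """Invert a homogeneous transform [R p; 0 1] as [R^T, -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translation(x, y, z):
    """Pure-translation homogeneous coordinates, used only as sample data."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def task_state(holding, T_t, T_w, T_g):
    """Current task state s_(j) from the matched coordinates."""
    if holding:
        return matmul(inverse_rigid(T_g), T_w)  # assumed form of Equation 5
    return matmul(inverse_rigid(T_w), T_t)      # assumed form of Equation 4
```

With the camera calibrated so that T_R = I, the matched coordinates can be used directly as shown; otherwise the estimate of T_R would first have to be applied to each coordinate.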

カメラＳ１がキャリブレーションされていない場合、制御部３１は、カメラＳ１により得られる画像データ内で原点の座標（Ｔ_R）の推定値を更に算出してもよい。原点の検出には、マーカ等の目印が用いられてよい。すなわち、画像データ内で目印をマッチングすることで、原点の座標（Ｔ_R）の推定値が算出されてよい。制御部３１は、算出された原点の座標（Ｔ_R）の推定値を上記各演算に適用することで、現在のタスク状態ｓ_(j)の推定値を算出することができる。以降のステップで、ＣＡＤデータ３２０によるマッチングを実行するケースも同様に処理されてよい。 If the camera S1 is not calibrated, the control unit 31 may further calculate an estimate of the origin coordinates (T _R ) within the image data obtained by the camera S1. A mark such as a marker may be used to detect the origin. That is, by matching landmarks within the image data, an estimate of the coordinates of the origin (T _R ) may be calculated. The control unit 31 can calculate the estimated value of the current task state s _(j) by applying the calculated estimated value of the coordinates (T _R ) of the origin to each of the above calculations. In subsequent steps, the case of performing matching with CAD data 320 may be similarly handled.

現在のタスク状態ｓ_(j)を取得すると、制御部３１は、次のステップＳ３０７に処理を進める。なお、ステップＳ３０６の処理を実行するタイミングは、このような例に限定されなくてもよい。ステップＳ３０６の処理は、後述するステップＳ３０８を実行する前の任意のタイミングで実行されてよい。例えば、上記ステップＳ３０４でもＣＡＤデータ３２０によるマッチングを行う場合、当該ステップＳ３０６の処理は、上記ステップＳ３０４の処理と共に実行されてよい。 After obtaining the current task state s _(j) , the control unit 31 advances the process to the next step S307. Note that the timing of executing the process of step S306 need not be limited to such an example. The processing of step S306 may be performed at any timing before step S308, which will be described later, is performed. For example, when matching is performed using the CAD data 320 also in step S304, the process of step S306 may be performed together with the process of step S304.

（ステップＳ３０７）
ステップＳ３０７では、制御部３１は、第１推定部３１３として動作し、第１推定モデル６１を利用して、取得された第１センシングデータ３２３から、観測空間内における手先の現在の座標の第１推定値を算出する。また、制御部３１は、第２推定部３１４として動作し、第２推定モデル６２を利用して、取得された第２センシングデータ３２４から、観測空間内における手先の現在の座標の第２推定値を算出する。 (Step S307)
In step S307, the control unit 31 operates as the first estimation unit 313 and uses the first estimation model 61 to calculate, from the acquired first sensing data 323, a first estimated value of the current coordinates of the hand in the observation space. In addition, the control unit 31 operates as the second estimation unit 314 and uses the second estimation model 62 to calculate, from the acquired second sensing data 324, a second estimated value of the current coordinates of the hand in the observation space.

（１）第１推定値の算出過程
まず、第１推定値の算出過程の一例について説明する。図１７に示されるとおり、制御部３１は、順運動学計算により、各エンコーダＳ２により得られる関節空間におけるマニピュレータ４の各関節の角度の現在値ｑ_(j)（第１センシングデータ３２３）から、観測空間におけるマニピュレータ４の手先座標の第１推定値を算出する（換言すると、現在値ｘ_(j)を推定する）。以下、エンドエフェクタＴがワークＷを保持していない場合とワークＷを保持している場合とに分けて説明する。 (1) First Estimated Value Calculation Process
First, an example of the process of calculating the first estimated value will be described. As shown in FIG. 17, the control unit 31 calculates, by forward kinematics calculation, a first estimated value of the hand coordinates of the manipulator 4 in the observation space (in other words, estimates the current value x_(j)) from the current value q_(j) of the angle of each joint of the manipulator 4 in the joint space obtained by each encoder S2 (first sensing data 323). In the following, the case where the end effector T does not hold the work W and the case where it holds the work W are described separately.

（１－１）ワークＷを保持していない場面
エンドエフェクタＴがワークＷを保持していない場合、エンドエフェクタＴの注目点Ｔ０が手先に設定されている。この場合、制御部３１は、各関節の第１同次変換行列により導出される第１変換行列群（φ）を変換関数として用いた順運動学計算により、各関節の角度の現在値ｑ_(j)から設定された手先座標の第１推定値を算出する。 (1-1) Scene where the work W is not held
When the end effector T does not hold the work W, the point of interest T0 of the end effector T is set as the hand. In this case, the control unit 31 calculates the first estimated value of the set hand coordinates from the current value q_(j) of the angle of each joint by forward kinematics calculation using, as the transformation function, the first transformation matrix group (φ) derived from the first homogeneous transformation matrix of each joint.

具体的には、順運動学により、エンドエフェクタＴの注目点Ｔ０の座標（ｘ_t）と各関節の角度（ｑ）との関係は、上記式６により表現することができる。角度（ｑ）は、関節数に応じた次元数を有する変数である。また、各関節の第１同次変換行列（_m-1Ｔ^m）と第１変換行列群（φ）との関係は、上記式７により与えられる（ｍは、０～ｎ。ｎは、関節数）。第１同次変換行列は、対象の関節よりも手元側の座標系から見た対象の関節の座標系の相対座標を表し、手元側の座標系から対象の関節の座標系に座標を変換するのに利用される。

Specifically, according to forward kinematics, the relationship between the coordinates (x_t) of the point of interest T0 of the end effector T and the angle (q) of each joint can be expressed by Equation 6 above. The angle (q) is a variable whose number of dimensions corresponds to the number of joints. The relationship between the first homogeneous transformation matrix (_m-1T^m) of each joint and the first transformation matrix group (φ) is given by Equation 7 above (m is 0 to n, where n is the number of joints). The first homogeneous transformation matrix represents the relative coordinates of the coordinate system of the target joint as seen from the coordinate system on the proximal (base) side of the target joint, and is used to transform coordinates from the proximal-side coordinate system to the coordinate system of the target joint.

各関節の第１同次変換行列のパラメータの値は、各関節の角度を除いて既知であり、本実施形態では、ロボットデータ３２１に含まれている。当該パラメータは、ＤＨ（Denavit-Hartenberg）記法、修正ＤＨ記法等の公知の方法で設定されてよい。制御部３１は、ロボットデータ３２１を参照することで、上記式７に示される第１変換行列群（φ）を導出する。そして、制御部３１は、上記式６のとおり、導出された第１変換行列群（φ）に各関節の角度の現在値ｑ_(j)を代入し、第１変換行列群（φ）の演算処理を実行する。この順運動学計算の結果により、制御部３１は、エンドエフェクタＴ（の注目点Ｔ０）の現在の座標の推定値を算出する（換言すると、座標の現在値ｘ_t(j)を推定する）ことができる。制御部３１は、算出された推定値を現在の手先座標の第１推定値として取得する。 The parameter values of the first homogeneous transformation matrix of each joint are known except for the angle of each joint, and in this embodiment are included in the robot data 321. The parameters may be set by a known method such as the DH (Denavit-Hartenberg) notation or the modified DH notation. The control unit 31 refers to the robot data 321 to derive the first transformation matrix group (φ) shown in Equation 7 above. Then, as in Equation 6 above, the control unit 31 substitutes the current value q_(j) of the angle of each joint into the derived first transformation matrix group (φ) and executes the arithmetic processing of the first transformation matrix group (φ). From the result of this forward kinematics calculation, the control unit 31 can calculate an estimated value of the current coordinates of (the point of interest T0 of) the end effector T (in other words, estimate the current value x_t(j) of the coordinates). The control unit 31 acquires the calculated estimated value as the first estimated value of the current hand coordinates.
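
The forward kinematics calculation described above can be sketched as follows. This is an illustration under stated assumptions: it uses the standard (not the modified) DH convention, the function names are hypothetical, and the two-joint planar arm `ARM` stands in for the parameters that the embodiment would read from the robot data 321.

```python
import math


def dh_matrix(theta, d, a, alpha):
    """Standard DH transform from joint frame m-1 to joint frame m."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [[ct, -st * ca, st * sa, a * ct],
            [st, ct * ca, -ct * sa, a * st],
            [0.0, sa, ca, d],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_kinematics(q, dh_params):
    """phi(q): chain the per-joint transforms from the base to the hand."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, (d, a, alpha) in zip(q, dh_params):
        T = matmul(T, dh_matrix(theta, d, a, alpha))
    return T


# Hypothetical two-joint planar arm with unit-length links (d=0, a=1, alpha=0).
ARM = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
x_t = forward_kinematics([math.pi / 2, 0.0], ARM)
```

Only the joint angles change between calls; the remaining DH parameters are fixed, which matches the statement that they are known in advance.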

（１－２）ワークＷを保持している場面
一方、エンドエフェクタＴがワークＷを保持している場合、ワークＷの注目点Ｗ０が手先に設定されている。この場合、まず、制御部３１は、エンドエフェクタＴの注目点Ｔ０の座標系からワークＷの注目点Ｗ０の座標系に座標を変換するための第２同次変換行列（_tＴ^w）を取得する。 (1-2) Scene where the work W is held
On the other hand, when the end effector T holds the work W, the point of interest W0 of the work W is set as the hand. In this case, the control unit 31 first acquires the second homogeneous transformation matrix (_tT^w) for transforming coordinates from the coordinate system of the point of interest T0 of the end effector T to the coordinate system of the point of interest W0 of the work W.

第２同次変換行列（_tＴ^w）を取得する方法は、特に限られなくてもよく、実施の形態に応じて適宜選択されてよい。例えば、エンドエフェクタＴにワークＷが保持されると、エンドエフェクタＴに対するワークＷの位置及び姿勢が一定になるケースが存在する。そこで、第２同次変換行列（_tＴ^w）は、定数で与えられてもよい。 The method for obtaining the second homogeneous transformation matrix (_tT^w) is not particularly limited, and may be selected as appropriate according to the embodiment. For example, there are cases where, once the work W is held by the end effector T, the position and orientation of the work W with respect to the end effector T are constant. In such cases, the second homogeneous transformation matrix (_tT^w) may be given as a constant.

或いは、制御部３１は、ステップＳ３０３において取得されたセンシングデータから第２同次変換行列（_tＴ^w）を推定してもよい。推定方法の一例として、制御部３１は、ＣＡＤデータ３２０を利用して、カメラＳ１により得られた画像データに対してエンドエフェクタＴ及びワークＷのモデルをマッチングしてもよい。制御部３１は、このマッチングの結果により、エンドエフェクタＴの座標（Ｔ_t）及びワークＷの座標（Ｔ_w）の推定値を得ることができる。上記と同様に、カメラＳ１がキャリブレーションされていると想定すると、制御部３１は、以下の式８により、エンドエフェクタＴの座標（Ｔ_t）及びワークＷの座標（Ｔ_w）それぞれの推定値から第２同次変換行列（_tＴ^w）を推定することができる。 Alternatively, the control unit 31 may estimate the second homogeneous transformation matrix (_tT^w) from the sensing data acquired in step S303. As an example of the estimation method, the control unit 31 may use the CAD data 320 to match models of the end effector T and the work W against the image data obtained by the camera S1. From the result of this matching, the control unit 31 can obtain estimated values of the coordinates (T_t) of the end effector T and the coordinates (T_w) of the work W. Assuming, as above, that the camera S1 has been calibrated, the control unit 31 can estimate the second homogeneous transformation matrix (_tT^w) from the respective estimated values of the coordinates (T_t) of the end effector T and the coordinates (T_w) of the work W by Equation 8 below.
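
One form of Equation 8 that is consistent with the surrounding definitions is sketched below. It assumes the calibrated case (T_R = I) and that _tT^w maps the coordinate system of T0 to that of W0, so that T_t · _tT^w = T_w. It is an inferred reconstruction, not a quotation of the original equation.

```latex
\begin{equation}
{}_{t}T^{w} = T_t^{-1} \cdot T_w \tag{8}
\end{equation}
```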

ＣＡＤデータ３２０によるマッチングは、上記順運動学計算により算出されるエンドエフェクタＴの注目点Ｔ０の座標（ｘ_t）付近で実施されてもよい。また、制御部３１は、順運動学計算により算出される座標（ｘ_t）の推定値を座標（Ｔ_t）の推定値として利用してもよい。これにより、制御部３１は、第２同次変換行列（_tＴ^w）を推定することができる。なお、上記ステップＳ３０６の上記式４の演算処理でも同様に、マッチングにより推定されたエンドエフェクタＴの座標の現在値（Ｔ_t(j)）の代わりに、順運動学計算により推定された座標の現在値（ｘ_t(j)）が用いられてよい。

Matching using the CAD data 320 may be performed in the vicinity of the coordinates (x_t) of the point of interest T0 of the end effector T calculated by the above forward kinematics calculation. The control unit 31 may also use the estimated value of the coordinates (x_t) calculated by the forward kinematics calculation as the estimated value of the coordinates (T_t). In this way, the control unit 31 can estimate the second homogeneous transformation matrix (_tT^w). Similarly, in the arithmetic processing of Equation 4 in step S306, the current value (x_t(j)) of the coordinates estimated by the forward kinematics calculation may be used in place of the current value (T_t(j)) of the coordinates of the end effector T estimated by matching.

また、推定方法の他の例として、触覚センサＳ３により測定されるエンドエフェクタＴに作用する力の分布は、エンドエフェクタＴに対するワークＷの位置及び姿勢に依存し得る。そこで、制御部３１は、触覚センサＳ３により得られた測定データ（第１センシングデータ３２３）に基づいて、エンドエフェクタＴに対するワークＷの相対座標（相対位置及び相対姿勢）を推定してもよい。制御部３１は、この推定の結果から第２同次変換行列（_tＴ^w）を推定することができる。 As another example of the estimation method, the distribution of force acting on the end effector T measured by the tactile sensor S3 can depend on the position and orientation of the work W with respect to the end effector T. Therefore, the control unit 31 may estimate the relative coordinates (relative position and relative orientation) of the work W with respect to the end effector T based on the measurement data (first sensing data 323) obtained by the tactile sensor S3. The control unit 31 can estimate the second homogeneous transformation matrix (_tT^w) from the result of this estimation.

なお、センシングデータ３２２から第２同次変換行列（_tＴ^w）を推定する方法は、上記の解析的な方法に限られなくてもよい。第２同次変換行列（_tＴ^w）の推定には、例えば、判定モデル５０、推論モデル５５１等と同様に、機械学習により、センシングデータ３２２から第２同次変換行列（_tＴ^w）を推定する能力を習得した学習済みの機械学習モデルが利用されてもよい。この場合、制御部３１は、取得されたセンシングデータ３２２を学習済みの機械学習モデルに与えて、学習済みの機械学習モデルの演算処理を実行する。これにより、制御部３１は、第２同次変換行列（_tＴ^w）を推定した結果に対応する出力値を学習済みの機械学習モデルから取得することができる。 Note that the method of estimating the second homogeneous transformation matrix (_tT^w) from the sensing data 322 need not be limited to the analytical methods described above. For estimating the second homogeneous transformation matrix (_tT^w), for example, as with the judgment model 50, the inference model 551, and the like, a trained machine learning model that has acquired, through machine learning, the ability to estimate the second homogeneous transformation matrix (_tT^w) from the sensing data 322 may be used. In this case, the control unit 31 gives the acquired sensing data 322 to the trained machine learning model and executes the arithmetic processing of the trained machine learning model. The control unit 31 can thereby acquire, from the trained machine learning model, an output value corresponding to the result of estimating the second homogeneous transformation matrix (_tT^w).

次に、制御部３１は、得られた第２同次変換行列（_tＴ^w）を第１変換行列群（φ）に掛けることで、第２変換行列群（φ（ｑ）・_tＴ^w）を算出する。第２変換行列群は、以下の式９により表現することができる。なお、第１同次変換行列は、第１変換式の一例であり、第２同次変換行列は、第２変換式の一例である。第１変換行列群（φ）は、第１変換式群の一例であり、第２変換行列群（φ（ｑ）・_tＴ^w）は、第２変換式群の一例である。各変換式の形式は、手先座標の演算に利用可能であれば、特に限定されなくてもよい。例えば、各変換式は、同次座標系以外の形式の変換行列で表現されてもよいし、或いは、行列以外の形式の数式で表現されてもよい。 Next, the control unit 31 multiplies the obtained second homogeneous transformation matrix ( _t T ^w ) by the first transformation matrix group (φ) to obtain the second transformation matrix group (φ(q)· _t T ^w ) is calculated. The second transformation matrix group can be expressed by Equation 9 below. The first homogeneous transformation matrix is an example of a first transformation formula, and the second homogeneous transformation matrix is an example of a second transformation formula. The first transformation matrix group (φ) is an example of a first transformation equation group, and the second transformation matrix group (φ(q)· _t T ^w ) is an example of a second transformation equation group. The format of each conversion formula does not have to be particularly limited as long as it can be used for calculation of hand coordinates. For example, each transform formula may be represented by a transform matrix in a format other than a homogeneous coordinate system, or may be represented by a formula in a format other than a matrix.

制御部３１は、算出された第２変換行列群を変換関数として用いた順運動学計算により、各関節の角度の現在値ｑ_(j)から設定された手先座標の第１推定値を算出する。すなわち、制御部３１は、第２変換行列群（φ（ｑ）・_tＴ^w）に各関節の角度の現在値ｑ_(j)を代入し、第２変換行列群（φ（ｑ）・_tＴ^w）の演算処理を実行する。この順運動学計算の結果により、制御部３１は、ワークＷ（の注目点Ｗ０）の現在の座標の推定値を算出する（換言すると、座標の現在値を推定する）ことができる。制御部３１は、算出されたワークＷの現在の座標の推定値を現在の手先座標の第１推定値として取得する。

The control unit 31 calculates the first estimated value of the set hand coordinates from the current value q_(j) of the angle of each joint by forward kinematics calculation using the calculated second transformation matrix group as the transformation function. That is, the control unit 31 substitutes the current value q_(j) of the angle of each joint into the second transformation matrix group (φ(q)·_tT^w) and executes the arithmetic processing of the second transformation matrix group (φ(q)·_tT^w). From the result of this forward kinematics calculation, the control unit 31 can calculate an estimated value of the current coordinates of (the point of interest W0 of) the work W (in other words, estimate the current value of the coordinates). The control unit 31 acquires the calculated estimated value of the current coordinates of the work W as the first estimated value of the current hand coordinates.
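
The use of the second transformation matrix group can be sketched as follows. The sketch assumes that the second homogeneous transformation matrix is obtained as T_t^(-1)·T_w from matched coordinates, and it replaces the first transformation matrix group with a hypothetical one-joint arm `phi`. All function names are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def inverse_rigid(t):
    """Invert a homogeneous transform [R p; 0 1] as [R^T, -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def phi(q):
    """Hypothetical phi(q): one revolute joint about z with a unit-length link."""
    c, s = math.cos(q[0]), math.sin(q[0])
    return [[c, -s, 0.0, c], [s, c, 0.0, s],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def grasp_transform(T_t, T_w):
    """Second homogeneous transformation matrix, assumed to be T_t^-1 * T_w."""
    return matmul(inverse_rigid(T_t), T_w)


def work_coordinates(q, t_T_w):
    """Second transformation matrix group phi(q) * tTw evaluated at q."""
    return matmul(phi(q), t_T_w)


# The work W is observed 0.5 beyond the hand while the joint is at 0 rad.
T_w_observed = [[1.0, 0.0, 0.0, 1.5], [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
t_T_w = grasp_transform(phi([0.0]), T_w_observed)
x_w = work_coordinates([math.pi / 2], t_T_w)
```

Re-estimating `t_T_w` from fresh sensing data at each cycle, rather than fixing it as a constant, is what allows a shifted grasp to be reflected in the first estimated value.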

なお、上記において、第２同次変換行列（_tＴ^w）をセンシングデータから推定することにより、エンドエフェクタＴにおけるワークＷの保持状態が変動しても、その変動が反映された第２同次変換行列（_tＴ^w）を取得することができる。これにより、エンドエフェクタＴにおけるワークＷの保持状態が変動し得る場合でも、ワークＷの座標の現在値、すなわち、マニピュレータ４の手先座標の現在値を適切に推定することができる。 In the above, by estimating the second homogeneous transformation matrix ( _t T ^w ) from the sensing data, even if the holding state of the workpiece W in the end effector T fluctuates, the second homogeneous transformation matrix reflecting the fluctuation can be obtained. A transformation matrix ( _t T ^w ) can be obtained. As a result, even if the holding state of the work W on the end effector T may fluctuate, the current values of the coordinates of the work W, that is, the current values of the hand coordinates of the manipulator 4 can be appropriately estimated.

（１－３）小括
以上のとおり、上記各場面では、制御部３１は、導出された変換関数を用いた順運動学計算により、各関節の角度の現在値ｑ_(j)から、マニピュレータ４の現在の手先座標の第１推定値を算出することができる。この順運動学計算に用いられる変換関数が、第１推定モデル６１の一例である。すなわち、エンドエフェクタＴがワークＷを保持していない場面では、第１変換行列群（φ）が、第１推定モデル６１の一例に相当する。また、エンドエフェクタＴがワークＷを保持している場面では、第２変換行列群（φ（ｑ）・_tＴ^w）が、第１推定モデル６１の一例に相当する。各変換関数の各パラメータが、第１推定モデル６１のパラメータの一例に相当する。 (1-3) Summary
As described above, in each of the above scenes, the control unit 31 can calculate the first estimated value of the current hand coordinates of the manipulator 4 from the current value q_(j) of the angle of each joint by forward kinematics calculation using the derived transformation function. The transformation function used for this forward kinematics calculation is an example of the first estimation model 61. That is, in the scene where the end effector T does not hold the work W, the first transformation matrix group (φ) corresponds to an example of the first estimation model 61. In the scene where the end effector T holds the work W, the second transformation matrix group (φ(q)·_tT^w) corresponds to an example of the first estimation model 61. Each parameter of each transformation function corresponds to an example of a parameter of the first estimation model 61.

（２）第２推定値の算出過程
次に、第２推定値の算出過程の一例について説明する。制御部３１は、ＣＡＤデータ３２０を利用して、カメラＳ１により得られた画像データ（第２センシングデータ３２４）に対して各対象物のモデルをマッチングする。これにより、制御部３１は、マニピュレータ４の現在の手先座標の第２推定値を算出する（換言すると、現在値ｘ_(j)を推定する）ことができる。この場合、制御部３１は、上記ステップＳ３０６によりタスク空間で推定された現在のタスク状態ｓ_(j)から、マニピュレータ４の現在の手先座標の第２推定値を算出してもよい。 (2) Second Estimated Value Calculation Process
Next, an example of the process of calculating the second estimated value will be described. The control unit 31 uses the CAD data 320 to match the model of each object against the image data (second sensing data 324) obtained by the camera S1. The control unit 31 can thereby calculate the second estimated value of the current hand coordinates of the manipulator 4 (in other words, estimate the current value x_(j)). In this case, the control unit 31 may calculate the second estimated value of the current hand coordinates of the manipulator 4 from the current task state s_(j) estimated in the task space in step S306.

図１８に示される各対象物の位置関係に基づいて、エンドエフェクタＴがワークＷを保持していない場合におけるタスク状態ｓ及び手先の座標ｘの間の関係は、上記式１０により表現することができる。この場合には、タスク空間から観測空間への変換関数（ψ）は、ワークＷの座標（Ｔ_w）により与えられる。制御部３１は、上記ステップＳ３０６により取得された現在のタスク状態ｓ_(j)及びマッチングにより推定されたワークＷの座標の現在値（Ｔ_w(j)）を式１０に代入し、上記式１０の演算処理を実行することで、マニピュレータ４の現在の手先座標の第２推定値を算出することができる。

Based on the positional relationship of the objects shown in FIG. 18, the relationship between the task state s and the hand coordinates x when the end effector T does not hold the work W can be expressed by Equation 10 above. In this case, the transformation function (ψ) from the task space to the observation space is given by the coordinates (T_w) of the work W. The control unit 31 can calculate the second estimated value of the current hand coordinates of the manipulator 4 by substituting the current task state s_(j) acquired in step S306 and the current value (T_w(j)) of the coordinates of the work W estimated by matching into Equation 10 and executing the arithmetic processing of Equation 10.

同様に、エンドエフェクタＴがワークＷを保持している場合におけるタスク状態ｓ及び手先の座標ｘの間の関係は、上記式１１により表現することができる。この場合には、タスク空間から観測空間への変換関数（ψ）は、他のワークＧの座標（Ｔ_g）により与えられる。制御部３１は、上記ステップＳ３０６により取得された現在のタスク状態ｓ_(j)及びマッチングにより推定された他のワークＧの座標の現在値（Ｔ_g(j)）を式１１に代入し、上記式１１の演算処理を実行することで、マニピュレータ４の現在の手先座標の第２推定値を算出することができる。

Similarly, the relationship between the task state s and the hand coordinates x when the end effector T holds the work W can be expressed by Equation 11 above. In this case, the transformation function (ψ) from the task space to the observation space is given by the coordinates (T_g) of the other work G. The control unit 31 can calculate the second estimated value of the current hand coordinates of the manipulator 4 by substituting the current task state s_(j) acquired in step S306 and the current value (T_g(j)) of the coordinates of the other work G estimated by matching into Equation 11 and executing the arithmetic processing of Equation 11.
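
The conversion from the task space back to the observation space can be sketched as follows. Equations 10 and 11 are not reproduced in this section, so the sketch assumes the forms x = T_w·s and x = T_g·s; the text states only that the transformation function is given by T_w or T_g. The function names and sample values are hypothetical.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    """Pure-translation homogeneous coordinates, used only as sample data."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def second_estimate(holding, s, T_w, T_g):
    """psi(s): map the task state to hand coordinates in the observation space."""
    base = T_g if holding else T_w  # psi is given by T_g when holding, else T_w
    return matmul(base, s)


# Not holding: hand (end effector T) is 2 units short of the work W along x.
x_free = second_estimate(False, translation(-2.0, 0.0, 0.0),
                         translation(3.0, 0.0, 0.0), translation(3.0, 2.0, 0.0))
# Holding: hand (work W) is 2 units short of the other work G along y.
x_held = second_estimate(True, translation(0.0, -2.0, 0.0),
                         translation(3.0, 0.0, 0.0), translation(3.0, 2.0, 0.0))
```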

上記式１０及び式１１の変換関数（ψ）が、第２推定モデル６２の一例である。各変換関数（ψ）の各パラメータが、第２推定モデル６２のパラメータに相当する。なお、ＣＡＤデータ３２０を用いたマッチングによりマニピュレータ４の手先座標の現在値ｘ_(j)を推定する方法はこのような例に限定されなくてもよい。エンドエフェクタＴがワークＷを保持していない場合、制御部３１は、上記マッチングにより、エンドエフェクタＴの座標の現在値（Ｔ_t(j)）を推定し、推定された現在値（Ｔ_t(j)）を現在の手先座標の第２推定値として取得してもよい。同様に、エンドエフェクタＴがワークＷを保持している場合、制御部３１は、上記マッチングにより、ワークＷの座標の現在値（Ｔ_w(j)）を推定し、推定された現在値（Ｔ_w(j)）を現在の手先座標の第２推定値として取得してもよい。すなわち、制御部３１は、上記マッチングにより、マニピュレータ４の現在の手先座標の第２推定値を直接的に導出してもよい。この場合、各座標（Ｔ_t、Ｔ_w）が、第２推定モデル６２の一例である。また、各座標（Ｔ_t、Ｔ_w）の各項が、第２推定モデル６２のパラメータの一例である。 The transformation function (ψ) of Equations 10 and 11 above is an example of the second estimation model 62. Each parameter of each transformation function (ψ) corresponds to a parameter of the second estimation model 62. Note that the method of estimating the current value x_(j) of the hand coordinates of the manipulator 4 by matching using the CAD data 320 need not be limited to this example. When the end effector T does not hold the work W, the control unit 31 may estimate the current value (T_t(j)) of the coordinates of the end effector T by the above matching and acquire the estimated current value (T_t(j)) as the second estimated value of the current hand coordinates. Similarly, when the end effector T holds the work W, the control unit 31 may estimate the current value (T_w(j)) of the coordinates of the work W by the above matching and acquire the estimated current value (T_w(j)) as the second estimated value of the current hand coordinates. That is, the control unit 31 may directly derive the second estimated value of the current hand coordinates of the manipulator 4 by the above matching. In this case, each coordinate (T_t, T_w) is an example of the second estimation model 62, and each term of each coordinate (T_t, T_w) is an example of a parameter of the second estimation model 62.

（３）小括
以上により、制御部３１は、マニピュレータ４の現在の手先座標の第１推定値及び第２推定値を算出することができる。制御部３１は、第１推定部及び第２推定部の少なくとも一方に基づいて、マニピュレータ４の手先の座標の現在値ｘ_(j)を認定する。この認定は適宜行われてよい。例えば、制御部３１は、第１推定値及び第２推定値のいずれかを手先座標の現在値ｘ_(j)としてそのまま採用してもよい。また、例えば、制御部３１は、第１推定値及び第２推定値の平均値を算出し、算出された平均値を手先座標の現在値ｘ_(j)として取得してもよい。この場合、平均値は、重み付け平均により算出されてよい。各推定値に対する重み付けは、推定の精度の高いと想定される推定値が優先されるように行われてよい。一例として、順運動学計算による手先座標の推定精度が、カメラＳ１の画像データに対するマッチングによる手先座標の推定精度よりも高いと想定する。このケースでは、第２推定値よりも第１推定値の方が優先されるように各推定値の重み付けが行われてもよい。手先座標の現在値ｘ_(j)を取得すると、制御部３１は、次のステップＳ３０８に処理を進める。 (3) Summary
As described above, the control unit 31 can calculate the first estimated value and the second estimated value of the current hand coordinates of the manipulator 4. The control unit 31 determines the current value x_(j) of the hand coordinates of the manipulator 4 based on the output of at least one of the first estimation unit and the second estimation unit. This determination may be made as appropriate. For example, the control unit 31 may adopt either the first estimated value or the second estimated value as it is as the current value x_(j) of the hand coordinates. Alternatively, for example, the control unit 31 may calculate the average of the first estimated value and the second estimated value and acquire the calculated average as the current value x_(j) of the hand coordinates. In this case, the average may be calculated as a weighted average. The weighting may be applied so that the estimated value assumed to have the higher estimation accuracy is given priority. As an example, suppose that the accuracy of estimating the hand coordinates by forward kinematics calculation is higher than the accuracy of estimating the hand coordinates by matching against the image data of the camera S1. In this case, the estimated values may be weighted so that the first estimated value is given priority over the second estimated value. After acquiring the current value x_(j) of the hand coordinates, the control unit 31 advances the process to the next step S308.
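
The weighted averaging of the two estimated values can be sketched as follows. The sketch treats each estimated value as a plain position vector and uses a hypothetical weight `w_first`; the embodiment does not specify weight values, and averaging of orientation components is not addressed here.

```python
def fuse_estimates(first, second, w_first=0.75):
    """Weighted average of two position estimates (w_first is hypothetical)."""
    if not 0.0 <= w_first <= 1.0:
        raise ValueError("w_first must lie in [0, 1]")
    w_second = 1.0 - w_first
    return [w_first * a + w_second * b for a, b in zip(first, second)]


# Forward kinematics is assumed more accurate here, so it gets the larger weight.
x_current = fuse_estimates([1.0, 2.0, 0.0], [3.0, 2.0, 0.0])
```

Setting `w_first` to 1.0 or 0.0 reduces to adopting the first or the second estimated value as it is.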

なお、後述する調整処理を実行しない場合、第１推定値及び第２推定値のいずれかの算出処理は省略されてよい。また、ステップＳ３０７の処理を実行するタイミングは、このような例に限定されなくてもよい。ステップＳ３０７の処理は、後述するステップＳ３１０の処理を実行する前の任意のタイミングで実行されてよい。例えば、ステップＳ３０７の処理は、上記ステップＳ３０６よりも前に実行されてよい。また、例えば、ＣＡＤデータ３２０を用いたマッチングを行うケースでは、ステップＳ３０７の処理は、上記ステップＳ３０６又はステップＳ３０４の処理と共に実行されてよい。 Note that when the adjustment process described later is not executed, the process of calculating either the first estimated value or the second estimated value may be omitted. Also, the timing of executing the process of step S307 need not be limited to such an example. The process of step S307 may be executed at any timing before executing the process of step S310, which will be described later. For example, the process of step S307 may be performed before step S306. Further, for example, in the case of performing matching using the CAD data 320, the process of step S307 may be executed together with the process of step S306 or step S304.

（ステップＳ３０８）
ステップＳ３０８では、制御部３１は、行動決定部３１６として動作し、最終目標のタスク状態ｓ_gに近付くように、取得された現在のタスク状態ｓ_(j)に対して次に遷移する目標のタスク状態ｓ_s(j)を決定する。本実施形態では、制御部３１は、推論モデルデータ２２５を参照し、上記ステップＳ２１０の処理により生成された推論モデル５５を利用して、現在のタスク状態ｓ_(j)に対して次に遷移する目標のタスク状態ｓ_s(j)を決定する。 (Step S308)
In step S308, the control unit 31 operates as the action determination unit 316 and determines, for the acquired current task state s_(j), the target task state s_s(j) to transition to next so as to approach the final target task state s_g. In this embodiment, the control unit 31 refers to the inference model data 225 and uses the inference model 55 generated by the process of step S210 to determine the target task state s_s(j) to transition to next from the current task state s_(j).

次に遷移する目標のタスク状態ｓ_s(j)を推論するための推論モデル５５の演算処理は、当該推論モデル５５の構成に応じて適宜実行されてよい。推論モデル５５が上記第１の方法により生成されており、推論モデル５５が関数式により構成される場合、制御部３１は、現在のタスク状態ｓ_(j)及び最終目標のタスク状態ｓ_gを関数式に代入し、当該関数式の演算処理を実行する。推論モデル５５がニューラルネットワークにより構成される場合、制御部３１は、現在のタスク状態ｓ_(j)及び最終目標のタスク状態ｓ_gを入力層に入力し、入力側から順に各層に含まれる各ニューロンの発火判定を行う。推論モデル５５がデータテーブルにより構成される場合、制御部３１は、現在のタスク状態ｓ_(j)及び最終目標のタスク状態ｓ_gをデータテーブルに照合する。これにより、制御部３１は、推論モデル５５の出力として、次に遷移する目標のタスク状態ｓ_s(j)を推論した結果を取得する。制御部３１は、この推論結果により、次に遷移する目標のタスク状態ｓ_s(j)を決定することができる。 The arithmetic processing of the inference model 55 for inferring the target task state s_s(j) to transition to next may be executed as appropriate for the configuration of the inference model 55. When the inference model 55 has been generated by the first method and is configured as a functional expression, the control unit 31 substitutes the current task state s_(j) and the final target task state s_g into the functional expression and executes its arithmetic processing. When the inference model 55 is configured as a neural network, the control unit 31 inputs the current task state s_(j) and the final target task state s_g to the input layer and determines the firing of each neuron in each layer in order from the input side. When the inference model 55 is configured as a data table, the control unit 31 checks the current task state s_(j) and the final target task state s_g against the data table. As a result, the control unit 31 acquires, as the output of the inference model 55, the result of inferring the target task state s_s(j) to transition to next. Based on this inference result, the control unit 31 can determine the target task state s_s(j) to transition to next.

また、推論モデル５５が上記第２の方法により生成される、すなわち、推論モデル５５がポテンシャル場により構成される場合、制御部３１は、生成されたポテンシャル場における、現在のタスク状態ｓ_(j)に対応する座標に設定されたポテンシャルの値を参照する。そして、制御部３１は、現在のタスク状態ｓ_(j)に対応する座標に設定されたポテンシャルの勾配に応じて、次に遷移する目標のタスク状態ｓ_s(j)を決定する。具体的には、制御部３１は、ポテンシャルの勾配の高い方に遷移する（例えば、勾配の最も高い方に所定の距離分だけ遷移する）ように目標のタスク状態ｓ_s(j)を決定する。 When the inference model 55 is generated by the second method, that is, when the inference model 55 is configured as a potential field, the control unit 31 refers to the potential value set at the coordinates corresponding to the current task state s_(j) in the generated potential field. The control unit 31 then determines the target task state s_s(j) to transition to next according to the gradient of the potential set at the coordinates corresponding to the current task state s_(j). Specifically, the control unit 31 determines the target task state s_s(j) so as to transition toward the side where the potential gradient is higher (for example, to move a predetermined distance in the direction where the gradient is highest).
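The gradient-based choice of the next target state can be sketched as follows. This is an illustration under assumptions rather than the method of the publication: the potential field is represented by a hypothetical callable `potential`, the gradient is approximated by central differences, and the names `next_target_state`, `step`, and `eps` are invented for the example.

```python
import math


def next_target_state(potential, s, step=0.1, eps=1e-6):
    """Move a fixed distance `step` from state `s` in the direction in
    which `potential` increases most steeply.

    potential: callable mapping a tuple of floats to a float
               (hypothetical stand-in for the potential field).
    s: current task state as a tuple of floats.
    """
    grad = []
    for i in range(len(s)):
        hi = list(s)
        lo = list(s)
        hi[i] += eps
        lo[i] -= eps
        # Central-difference approximation of the i-th partial derivative.
        grad.append((potential(tuple(hi)) - potential(tuple(lo))) / (2 * eps))
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return tuple(s)  # flat region: no preferred direction, stay put
    return tuple(x + step * g / norm for x, g in zip(s, grad))
```

For a potential with its peak at the final target, repeated calls walk the state toward that target in steps of length `step`.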

決定する目標のタスク状態の数は、１つに限られなくてもよい。ステップＳ３０８では、制御部３１は、決定した目標のタスク状態を現在のタスク状態として用いて、更に次に遷移する目標のタスク状態を決定してもよい。制御部３１は、この処理を繰り返すことで、目標のタスク状態を複数回決定してもよい。次に遷移する目標のタスク状態ｓ_s(j)を決定すると、制御部３１は、次のステップＳ３０９に処理を進める。 The number of target task states to be determined may not be limited to one. In step S308, the control unit 31 may use the determined target task state as the current task state to further determine the next target task state to transition to. The control unit 31 may determine the target task state multiple times by repeating this process. After determining the next target task state s _s(j) to transition to, the control unit 31 advances the process to the next step S309.

（ステップＳ３０９）
ステップＳ３０９では、制御部３１は、指令決定部３１７として動作し、決定された目標のタスク状態ｓ_s(j)から手先の座標の目標値ｘ_s(j)を算出する。図１７に示されるとおり、制御部３１は、上記変換関数（ψ）を利用することで、タスク空間における目標のタスク状態ｓ_s(j)を観測空間における手先の座標の目標値ｘ_s(j)に変換することができる。 (Step S309)
In step S309, the control unit 31 operates as the command determination unit 317 and calculates the target value x_s(j) of the hand coordinates from the determined target task state s_s(j). As shown in FIG. 17, by using the conversion function (ψ), the control unit 31 can convert the target task state s_s(j) in the task space into the target value x_s(j) of the hand coordinates in the observation space.

すなわち、エンドエフェクタＴがワークＷを保持していない場合におけるタスク空間から観測空間への変換関数（ψ）は、上記式１０により与えられる。制御部３１は、決定された目標のタスク状態ｓ_s(j)を上記式１０に代入し、上記式１０の演算処理を実行することで、手先の座標の目標値ｘ_s(j)を算出することができる。一方、エンドエフェクタＴがワークＷを保持している場合におけるタスク空間から観測空間への変換関数（ψ）は、上記式１１により与えられる。制御部３１は、決定された目標のタスク状態ｓ_s(j)を上記式１１に代入し、上記式１１の演算処理を実行することで、手先の座標の目標値ｘ_s(j)を算出することができる。手先の座標の目標値ｘ_s(j)を算出すると、制御部３１は、次のステップＳ３１０に処理を進める。 That is, the conversion function (ψ) from the task space to the observation space when the end effector T is not holding the workpiece W is given by Equation 10 above. The control unit 31 can calculate the target value x_s(j) of the hand coordinates by substituting the determined target task state s_s(j) into Equation 10 and executing the arithmetic processing of Equation 10. On the other hand, the conversion function (ψ) from the task space to the observation space when the end effector T is holding the workpiece W is given by Equation 11 above. The control unit 31 can calculate the target value x_s(j) of the hand coordinates by substituting the determined target task state s_s(j) into Equation 11 and executing the arithmetic processing of Equation 11. After calculating the target value x_s(j) of the hand coordinates, the control unit 31 advances the process to the next step S310.

（ステップＳ３１０）
ステップＳ３１０では、制御部３１は、指令決定部３１７として動作し、手先座標の現在値ｘ_(j)及び手先座標の目標値ｘ_s(j)から手先座標の変化量（Δｘ_(j)）を決定する。具体的には、図１７に示されるとおり、制御部３１は、手先座標の現在値（ｘ_(j)）及び目標値（ｘ_s(j)）の偏差に基づいて手先座標の変化量（Δｘ_(j)）を決定する。例えば、手先座標の現在値及び目標値の偏差（ｘ_s－ｘ）と変化量（Δｘ）との関係は、以下の式１２により与えられてよい。なお、手先座標の変化量（Δｘ）は、手先座標の現在値及び目標値の差分の一例である。 (Step S310)
In step S310, the control unit 31 operates as the command determination unit 317 and determines the amount of change (Δx_(j)) of the hand coordinates from the current value x_(j) and the target value x_s(j) of the hand coordinates. Specifically, as shown in FIG. 17, the control unit 31 determines the amount of change (Δx_(j)) of the hand coordinates based on the deviation between the current value (x_(j)) and the target value (x_s(j)) of the hand coordinates. For example, the relationship between the deviation (x_s − x) of the current value from the target value of the hand coordinates and the amount of change (Δx) may be given by Equation 12 below. Note that the amount of change (Δx) of the hand coordinates is an example of the difference between the current value and the target value of the hand coordinates.
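Equation 12 appears only as a figure in the publication and is not reproduced in this text. A form consistent with the surrounding description of the deviation (x_s − x) and the coefficient α, offered here as an assumed reconstruction, is:

```latex
\Delta x = \alpha \, (x_s - x) \tag{12}
```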

αは任意の係数である。例えば、αの値は、１以下でかつ０を超える範囲内で適宜決定されてよい。αは省略されてよい。制御部３１は、ステップＳ３０７及びステップＳ３０９により得られた手先座標の現在値ｘ_(j)及び手先座標の目標値ｘ_s(j)を上記式１２に代入し、上記式１２の演算処理を実行することで、手先座標の変化量（Δｘ_(j)）を決定することができる。手先座標の変化量（Δｘ_(j)）を決定すると、制御部３１は、次のステップＳ３１１に処理を進める。

α is an arbitrary coefficient. For example, the value of α may be appropriately determined within a range of 1 or less and greater than 0. α may be omitted. The control unit 31 substitutes the current value x _(j) of the hand coordinates and the target value x _s(j) of the hand coordinates obtained in steps S307 and S309 into the above equation 12, and executes the arithmetic processing of the above equation 12. By doing so, the change amount (Δx _(j) ) of the hand coordinates can be determined. After determining the change amount (Δx _(j) ) of the hand coordinates, the control unit 31 advances the process to the next step S311.

（ステップＳ３１１）
ステップＳ３１１では、制御部３１は、指令決定部３１７として動作し、上記順運動学計算における変換関数の逆関数を用いた逆運動学計算により、決定された手先座標の変化量（Δｘ_(j)）から各関節の角度の変化量（Δｑ_(j)）を算出する。具体的には、手先座標の変化量（Δｘ）と各関節の角度の変化量（Δｑ）とは、以下の式１３により表現することができる。 (Step S311)
In step S311, the control unit 31 operates as the command determination unit 317 and calculates the amount of change (Δq_(j)) in the angle of each joint from the determined amount of change (Δx_(j)) of the hand coordinates by inverse kinematics calculation using the inverse of the conversion function used in the forward kinematics calculation. Specifically, the relationship between the amount of change (Δx) of the hand coordinates and the amount of change (Δq) in the angle of each joint can be expressed by Equation 13 below.
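Equation 13 is likewise not reproduced in this text. Based on the symbols J, j_i, and Δq_i defined around it, it presumably has the following form, where the number of joints n is a symbol assumed for this reconstruction:

```latex
\Delta x = J \, \Delta q
= \begin{bmatrix} j_1 & \cdots & j_n \end{bmatrix}
  \begin{bmatrix} \Delta q_1 \\ \vdots \\ \Delta q_n \end{bmatrix} \tag{13}
```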

Ｊは、上記順運動学計算における変換関数から導出されるヤコビ行列である。ｊ_iは、ｉ番目の関節の行列成分を示し、Δｑ_iは、ｉ番目の関節の変化量を示す。

J is the Jacobian matrix derived from the transformation function in the above forward kinematics calculation. j _i indicates the matrix element of the i-th joint, and Δq _i indicates the amount of change of the i-th joint.

ここで、図１９Ａ及び図１９Ｂを更に用いて、ヤコビ行列の計算方法の一例について説明する。図１９Ａは、エンドエフェクタＴがワークＷを保持していない時における各関節と手先との関係の一例を模式的に例示する。図１９Ｂは、エンドエフェクタＴがワークＷを保持している時における各関節と手先との関係の一例を模式的に例示する。 Here, an example of a method for calculating the Jacobian matrix will be described with further reference to FIGS. 19A and 19B. FIG. 19A schematically illustrates an example of the relationship between each joint and the hand when the end effector T does not hold the work W. FIG. FIG. 19B schematically illustrates an example of the relationship between each joint and the hand while the end effector T holds the work W. FIG.

図１９Ａに示されるとおり、エンドエフェクタＴがワークＷを保持していない時には、ヤコビ行列の各関節の成分は、各関節とエンドエフェクタＴとの位置関係に基づいて算出される。例えば、制御部３１は、以下の式１４により、各関節の成分を算出することができる。一方、図１９Ｂに示されるとおり、エンドエフェクタＴがワークＷを保持しているときには、ヤコビ行列の各関節の成分は、各関節とワークＷとの位置関係に基づいて算出される。例えば、制御部３１は、以下の式１５により、各関節の成分を算出することができる。 As shown in FIG. 19A, when the end effector T does not hold the work W, the component of each joint of the Jacobian matrix is calculated based on the positional relationship between each joint and the end effector T. For example, the control unit 31 can calculate the component of each joint using Equation 14 below. On the other hand, as shown in FIG. 19B, when the end effector T holds the work W, the component of each joint in the Jacobian matrix is calculated based on the positional relationship between each joint and the work W. For example, the control unit 31 can calculate the component of each joint using Equation 15 below.
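Equations 14 and 15 are not reproduced in this text. The sketch below assumes the standard geometric Jacobian column for a revolute joint, written with the symbols z_i, a_i, a_t, and a_w used in this description; the published equations may differ in layout or detail.

```latex
\begin{align}
j_i &= \begin{bmatrix} z_i \times (a_t - a_i) \\ z_i \end{bmatrix} \tag{14} \\
j_i &= \begin{bmatrix} z_i \times (a_w - a_i) \\ z_i \end{bmatrix} \tag{15}
\end{align}
```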

ｚ_iは、ｉ番目の関節の同次座標における回転軸の成分を示し、ａ_iは、ｉ番目の関節の同次座標における平行移動成分を示す。ｚ_i及びａ_iは、ｉ番目の関節の第１同次変換行列から抽出される。ａ_tは、エンドエフェクタＴの同次座標における平行移動成分を示す。ａ_wは、ワークＷの同次座標における平行移動成分を示す。ａ_tは、エンドエフェクタＴの座標（Ｔ_t）から抽出される。ａ_wは、ワークＷの座標（Ｔ_w）から抽出される。ヤコビ行列の各成分ｊ_iは、各関節の第１同次変換行列の微分成分を示す。

z_i indicates the rotation-axis component in the homogeneous coordinates of the i-th joint, and a_i indicates the translation component in the homogeneous coordinates of the i-th joint. z_i and a_i are extracted from the first homogeneous transformation matrix of the i-th joint. a_t indicates the translation component in the homogeneous coordinates of the end effector T. a_w indicates the translation component in the homogeneous coordinates of the workpiece W. a_t is extracted from the coordinates (T_t) of the end effector T. a_w is extracted from the coordinates (T_w) of the workpiece W. Each component j_i of the Jacobian matrix represents a differential component of the first homogeneous transformation matrix of each joint.

制御部３１は、上記式１４及び式１５に従って、動作モードに応じてヤコビ行列を算出する。なお、本実施形態では、エンドエフェクタＴがワークＷを保持していない場合とエンドエフェクタＴがワークＷを保持している場合との間で、ヤコビ行列の各成分において、エンドエフェクタＴの成分（ａ_t）及びワークＷの成分（ａ_w）が入れ替わるに過ぎない。そのため、制御部３１は、単純な計算処理により、それぞれの場合におけるヤコビ行列を算出することができる。 The control unit 31 calculates the Jacobian matrix for the current operation mode in accordance with Equations 14 and 15 above. In this embodiment, the only difference in each component of the Jacobian matrix between the case where the end effector T is not holding the workpiece W and the case where it is holding the workpiece W is that the component (a_t) of the end effector T and the component (a_w) of the workpiece W are interchanged. Therefore, the control unit 31 can calculate the Jacobian matrix in each case by simple calculation processing.

次に、制御部３１は、算出されたヤコビ行列の逆行列（Ｊ^-1）を算出する。制御部３１は、算出された逆行列（Ｊ^-1）を用いて、逆運動学計算を実行する。具体的には、各変化量（Δｘ、Δｑ）と逆行列（Ｊ^-1）との関係は、上記式１３から以下の式１６のとおり導出される。 Next, the control unit 31 calculates an inverse matrix (J ⁻¹ ) of the calculated Jacobian matrix. The control unit 31 performs inverse kinematics calculation using the calculated inverse matrix (J ⁻¹ ). Specifically, the relationship between each amount of change (Δx, Δq) and the inverse matrix (J ⁻¹ ) is derived from Equation 13 above as Equation 16 below.
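Equation 16 is not reproduced in this text. Solving Equation 13 for Δq with the inverse matrix, as the description states, gives the following reconstruction:

```latex
\Delta q = J^{-1} \, \Delta x \tag{16}
```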

制御部３１は、算出された逆行列（Ｊ^-1）及び手先座標の変化量（Δｘ_(j)）を式１６に代入し、上記式１６の演算処理を実行することで、各関節の角度の変化量（Δｑ_(j)）を算出することができる。各関節の角度の変化量（Δｑ_(j)）を算出すると、制御部３１は、次のステップＳ３１２に処理を進める。

The control unit 31 can calculate the amount of change (Δq_(j)) in the angle of each joint by substituting the calculated inverse matrix (J⁻¹) and the amount of change (Δx_(j)) of the hand coordinates into Equation 16 and executing the arithmetic processing of Equation 16. After calculating the amount of change (Δq_(j)) in the angle of each joint, the control unit 31 advances the process to the next step S312.
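The inverse kinematics step can be illustrated with a small example. The sketch below is not the manipulator of the publication: it assumes a hypothetical planar arm with two revolute joints and link lengths `l1` and `l2`, for which the Jacobian is a 2x2 matrix that can be inverted in closed form. The function names are invented for the example.

```python
import math


def jacobian_2link(q, l1=1.0, l2=1.0):
    """Jacobian of the hand position (x, y) of a planar two-link arm
    with joint angles q = (q1, q2) in radians."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def joint_delta(J, dx):
    """Return dq = J^-1 dx for a 2x2 Jacobian J and hand change dx."""
    (a, b), (c, d) = J
    det = a * d - b * c
    if abs(det) < 1e-9:
        # At a singular configuration the inverse does not exist.
        raise ValueError("Jacobian is singular at this configuration")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return tuple(inv[r][0] * dx[0] + inv[r][1] * dx[1] for r in range(2))
```

The description speaks only of the inverse matrix. When the Jacobian is not square or is close to singular, a pseudo-inverse is a common substitute, although that is outside what this text states.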

（ステップＳ３１２）
ステップＳ３１２では、制御部３１は、指令決定部３１７として動作し、算出された各関節の角度の変化量に基づいて、各関節に対する指令値を決定する。指令値を決定する方法には、例えば、ＰＩＤ（Proportional-Integral-Differential）制御、ＰＩ制御等の公知の方法が採用されてよい。各関節に対する指令値は、マニピュレータ４に与える制御指令の一例である。本実施形態では、制御部３１は、ステップＳ３０９～ステップＳ３１２の処理により、手先座標が目標値に近付くように（更には、マニピュレータ４のタスク状態を現在のタスク状態ｓ_(j)から目標のタスク状態ｓ_s(j)に変化させるように）、マニピュレータ４に与える制御指令を決定することができる。制御指令を決定すると、制御部３１は、次のステップＳ３１３に処理を進める。 (Step S312)
In step S312, the control unit 31 operates as the command determination unit 317 and determines a command value for each joint based on the calculated amount of change in the angle of each joint. A known method such as PID (Proportional-Integral-Differential) control or PI control may be employed to determine the command values. The command value for each joint is an example of a control command given to the manipulator 4. In this embodiment, through the processing of steps S309 to S312, the control unit 31 can determine the control command to be given to the manipulator 4 so that the hand coordinates approach the target value (and, in turn, so that the task state of the manipulator 4 changes from the current task state s_(j) to the target task state s_s(j)). After determining the control command, the control unit 31 advances the process to the next step S313.

（ステップＳ３１３）
ステップＳ３１３では、制御部３１は、駆動部３１８として動作し、決定された制御指令をマニピュレータ４に与えることで、マニピュレータ４を駆動する。本実施形態では、制御部３１は、決定された各指令値により、マニピュレータ４の各関節を駆動する。なお、駆動方法は、特に限定されなくてもよく、実施の形態に応じて適宜選択されてよい。例えば、制御部３１は、マニピュレータ４の各関節を直接的に駆動してもよい。或いは、マニピュレータ４は、コントローラ（不図示）を備えてもよい。この場合、制御部３１は、各関節に対する指令値をコントローラに与えることで、マニピュレータ４の各関節を間接的に駆動してもよい。決定された制御指令に従って、マニピュレータ４を駆動すると、制御部３１は、次のステップＳ３１４に処理を進める。 (Step S313)
In step S313 , the control unit 31 operates as the driving unit 318 and drives the manipulator 4 by giving the determined control command to the manipulator 4 . In this embodiment, the controller 31 drives each joint of the manipulator 4 according to each determined command value. Note that the driving method is not particularly limited, and may be appropriately selected according to the embodiment. For example, the controller 31 may directly drive each joint of the manipulator 4 . Alternatively, manipulator 4 may comprise a controller (not shown). In this case, the control unit 31 may indirectly drive each joint of the manipulator 4 by giving a command value for each joint to the controller. After driving the manipulator 4 according to the determined control command, the control unit 31 proceeds to the next step S314.

（ステップＳ３１４～ステップＳ３１６）
ステップＳ３１４～ステップＳ３１６の処理は、サイクルが（ｊ）から（ｊ＋１）に進んでいる点を除き、上記ステップＳ３０３、ステップＳ３０６及びステップＳ３０７の処理と同様である。すなわち、ステップＳ３１４では、制御部３１は、各センサ系から各センシングデータ（３２３、３２４）を取得する。ステップＳ３１５では、制御部３１は、状態取得部３１５として動作し、マニピュレータ４の現在のタスク状態ｓ_(j+1)を取得する。ステップＳ３１６では、制御部３１は、各推定部（３１３、３１４）として動作し、取得された各センシングデータ（３２３、３２４）からマニピュレータ４の現在の手先座標の各推定値を算出する。制御部３１は、算出された第１推定値及び第２推定値の少なくとも一方に基づいて、手先座標の現在値ｘ_(j+1)を認定する。これにより、手先座標の現在値ｘ_(j+1)を取得すると、制御部３１は、次のステップＳ３１７に処理を進める。 (Steps S314 to S316)
The processing of steps S314 to S316 is the same as the processing of steps S303, S306 and S307 except that the cycle proceeds from (j) to (j+1). That is, in step S314, the control unit 31 acquires each sensing data (323, 324) from each sensor system. In step S315 , the control unit 31 operates as the state acquisition unit 315 and acquires the current task state s _(j+1) of the manipulator 4 . In step S316, the control unit 31 operates as each estimation unit (313, 314) and calculates each estimated value of the current hand coordinates of the manipulator 4 from each acquired sensing data (323, 324). The control unit 31 recognizes the current hand coordinate value x _(j+1) based on at least one of the calculated first estimated value and second estimated value. Thus, when the current value x _(j+1) of the hand coordinates is acquired, the control unit 31 advances the process to the next step S317.

（ステップＳ３１７）
ステップＳ３１７では、制御部３１は、ステップＳ３１３による駆動の結果、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移したか否かを判定する。 (Step S317)
In step S317, the control unit 31 determines whether or not the task state of the manipulator 4 has transitioned to the target task state s _s(j) as a result of the driving in step S313.

判定方法は、特に限定されなくてもよく、実施の形態に応じて適宜選択されてよい。例えば、図１７に示されるとおり、駆動後の各関節の角度（ｑ_(j+1)）と駆動前の各関節の角度（ｑ_(j)）との関係は、以下の式１７により表現することができる。 The determination method is not particularly limited and may be selected as appropriate for the embodiment. For example, as shown in FIG. 17, the relationship between the angle of each joint after driving (q_(j+1)) and the angle of each joint before driving (q_(j)) can be expressed by Equation 17 below.
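Equation 17 is not reproduced in this text. The later description of the sum (q_(j) + Δq_(j)) indicates the following form, given here as a reconstruction:

```latex
q_{(j+1)} = q_{(j)} + \Delta q_{(j)} \tag{17}
```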

そこで、制御部３１は、ステップＳ３１４において各エンコーダＳ２により得られた各関節の角度の値が、駆動前に各エンコーダＳ２により得られた各関節の角度の値（ｑ_(j)）及びステップＳ３１１において算出された変化量（Δｑ_(j)）の和と一致するか否かを判定してもよい。駆動後の各関節の角度が駆動前の各関節の角度及び算出された変化量の和（ｑ_(j)＋Δｑ_(j)）と一致する場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移したと判定してもよい。一方、そうではない場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移していないと判定してもよい。

Therefore, the control unit 31 may determine whether the joint angle values obtained by the encoders S2 in step S314 match the sum of the joint angle values (q_(j)) obtained by the encoders S2 before driving and the amount of change (Δq_(j)) calculated in step S311. When the angle of each joint after driving matches the sum (q_(j) + Δq_(j)) of the angle of each joint before driving and the calculated amount of change, the control unit 31 may determine that the task state of the manipulator 4 has transitioned to the target task state s_s(j). Otherwise, the control unit 31 may determine that the task state of the manipulator 4 has not transitioned to the target task state s_s(j).

また、例えば、順運動学計算における変換関数と同様に、変換関数（ψ）に関しても、ヤコビ行列Ｊ_ψが導出されてよい。ヤコビ行列Ｊ_ψは変換関数（ψ）の微分成分を示す。導出されたヤコビ行列Ｊ_ψから逆行列（Ｊ_ψ ^-1）が算出されてよい。手先座標の変化量（Δｘ）及びタスク状態の変化量（Δｓ）と逆行列（Ｊ_ψ ^-1）との関係は、以下の式１８により表現することができる。 Also, for example, the Jacobian matrix J _ψ may be derived for the transformation function (ψ) as well as the transformation function in the forward kinematics calculation. The Jacobian matrix J _ψ indicates the differential component of the transformation function (ψ). An inverse matrix (J _ψ ^-1 ) may be calculated from the derived Jacobian matrix J _ψ . The relationship between the amount of change (Δx) in the hand coordinates and the amount of change (Δs) in the task state and the inverse matrix (J _ψ ⁻¹ ) can be expressed by Equation 18 below.
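Equation 18 is not reproduced in this text. By analogy with Equation 16 and from the quantities named in the description, it presumably reads as follows (a reconstruction):

```latex
\Delta s = J_{\psi}^{-1} \, \Delta x \tag{18}
```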

制御部３１は、算出された逆行列（Ｊ_ψ ^-1）及び手先座標の変化量（Δｘ_(j)）を式１８に代入し、上記式１８の演算処理を実行することで、タスク状態の変化量（Δｓ_(j)）を算出することができる。駆動後のタスク状態ｓ_(j+1)と駆動前のタスク状態ｓ_(j)との関係との関係は、上記式１７と同様に、以下の式１９により表現することができる。

The control unit 31 can calculate the amount of change (Δs_(j)) of the task state by substituting the calculated inverse matrix (J_ψ⁻¹) and the amount of change (Δx_(j)) of the hand coordinates into Equation 18 and executing the arithmetic processing of Equation 18. As with Equation 17 above, the relationship between the task state s_(j+1) after driving and the task state s_(j) before driving can be expressed by Equation 19 below.
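Equation 19 is not reproduced in this text. The later description of the sum (s_(j) + Δs_(j)) indicates the following form, given here as a reconstruction:

```latex
s_{(j+1)} = s_{(j)} + \Delta s_{(j)} \tag{19}
```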

そこで、制御部３１は、ステップＳ３１５により駆動後に得られた現在のタスク状態が、ステップＳ３０６により駆動前に得られた現在のタスク状態ｓ_(j)及び上記により算出された変化量（Δｓ_(j)）の和と一致するか否かを判定してもよい。駆動後に得られた現在のタスク状態が、駆動前に得られた現在のタスク状態及び算出された変化量の和（ｓ_(j)＋Δｓ_(j)）と一致する場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移したと判定してもよい。一方、そうではない場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移していないと判定してもよい。なお、本実施形態では、タスク空間は２つの対象物間の相対座標で規定されるため、タスク空間及び観測空間は互いに共通の次元で表現可能である。そのため、場合によっては、式１８の逆行列（Ｊ_ψ ^-1）は、単位行列に置き換えられ、手先座標の変化量（Δｘ）がそのままタスク状態の変化量（Δｓ）として取り扱われてもよい。一例として、他のワークＧから見たワークＷの相対座標によりタスク状態を規定した場合、式１８の逆行列（Ｊ_ψ ^-1）は、単位行列に置き換えられてよい。

Therefore, the control unit 31 may determine whether the current task state obtained after driving in step S315 matches the sum of the current task state s_(j) obtained before driving in step S306 and the amount of change (Δs_(j)) calculated as above. When the current task state obtained after driving matches the sum (s_(j) + Δs_(j)) of the current task state obtained before driving and the calculated amount of change, the control unit 31 may determine that the task state of the manipulator 4 has transitioned to the target task state s_s(j). Otherwise, the control unit 31 may determine that the task state of the manipulator 4 has not transitioned to the target task state s_s(j). In this embodiment, since the task space is defined by relative coordinates between two objects, the task space and the observation space can be expressed in a common dimension. Therefore, in some cases, the inverse matrix (J_ψ⁻¹) of Equation 18 may be replaced with the identity matrix, and the amount of change (Δx) of the hand coordinates may be treated directly as the amount of change (Δs) of the task state. As an example, when the task state is defined by the relative coordinates of the workpiece W as seen from the other workpiece G, the inverse matrix (J_ψ⁻¹) of Equation 18 may be replaced with the identity matrix.

或いは、制御部３１は、ステップＳ３１５により得られた現在のタスク状態がステップＳ３０８により決定された目標のタスク状態ｓ_s(j)と一致するか否かを判定してもよい。得られた現在のタスク状態が目標のタスク状態ｓ_s(j)と一致する場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移したと判定してもよい。一方、そうではない場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移していないと判定してもよい。 Alternatively, the control unit 31 may determine whether the current task state obtained in step S315 matches the target task state s _s(j) determined in step S308. If the obtained current task state matches the target task state s _s(j) , the control unit 31 determines that the task state of the manipulator 4 has transitioned to the target task state s _s(j). good. On the other hand, otherwise, the control unit 31 may determine that the task state of the manipulator 4 has not transitioned to the target task state s _s(j) .

また、例えば、駆動後の手先座標の現在値（ｘ_(j+1)）と駆動前の手先座標の現在値（ｘ_(j)）との関係は、上記式１７と同様に、以下の式２０により表現することができる。 Also, for example, as with Equation 17 above, the relationship between the current value of the hand coordinates after driving (x_(j+1)) and the current value of the hand coordinates before driving (x_(j)) can be expressed by Equation 20 below.
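Equation 20 is not reproduced in this text. The later description of the sum (x_(j) + Δx_(j)) indicates the following form, given here as a reconstruction:

```latex
x_{(j+1)} = x_{(j)} + \Delta x_{(j)} \tag{20}
```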

そこで、制御部３１は、ステップＳ３１６により取得された駆動後の手先座標の現在値が、ステップＳ３０７により取得された駆動前の手先座標の現在値（ｘ_(j)）とステップＳ３１０により決定された変化量（Δｘ_(j)）の和と一致するか否かを判定してもよい。駆動後の手先座標の現在値が駆動前の手先座標の現在値及び算出された変化量の和（ｘ_(j)＋Δｘ_(j)）と一致する場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移したと判定してもよい。一方、そうではない場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移していないと判定してもよい。

Therefore, the control unit 31 may determine whether the current value of the hand coordinates after driving, acquired in step S316, matches the sum of the current value (x_(j)) of the hand coordinates before driving, acquired in step S307, and the amount of change (Δx_(j)) determined in step S310. When the current value of the hand coordinates after driving matches the sum (x_(j) + Δx_(j)) of the current value of the hand coordinates before driving and the calculated amount of change, the control unit 31 may determine that the task state of the manipulator 4 has transitioned to the target task state s_s(j). Otherwise, the control unit 31 may determine that the task state of the manipulator 4 has not transitioned to the target task state s_s(j).

或いは、制御部３１は、ステップＳ３１６により取得された手先座標の現在値が、ステップＳ３０９により算出された手先座標の目標値（ｘ_s(j)）と一致するか否かを判定してもよい。駆動後の手先座標の現在値が駆動前に算出された手先座標の目標値（ｘ_s(j)）と一致する場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移したと判定してもよい。一方、そうではない場合、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移していないと判定してもよい。 Alternatively, the control unit 31 may determine whether the current value of the hand coordinates acquired in step S316 matches the target value (x_s(j)) of the hand coordinates calculated in step S309. When the current value of the hand coordinates after driving matches the target value (x_s(j)) of the hand coordinates calculated before driving, the control unit 31 may determine that the task state of the manipulator 4 has transitioned to the target task state s_s(j). Otherwise, the control unit 31 may determine that the task state of the manipulator 4 has not transitioned to the target task state s_s(j).

以上のいずれかの方法により、制御部３１は、マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移したか否かを判定することができる。なお、以上の各判定において「一致すること」は、両者の値が完全に一致することの他、両者の値の差分が閾値（許容誤差）以下であることを含んでよい。マニピュレータ４のタスク状態が目標のタスク状態ｓ_s(j)に遷移したと判定した場合、制御部３１は、次のステップＳ３１８に処理を進める。一方、そうではない場合、制御部３１は、ステップＳ３１０に戻って、マニピュレータ４の駆動を再度実行する。このとき、制御部３１は、ステップＳ３１６で算出された手先座標の現在値を現在値ｘ_(j)として利用して、ステップＳ３１０以降の処理を実行してもよい。 By any of the above methods, the control unit 31 can determine whether the task state of the manipulator 4 has transitioned to the target task state s_s(j). In each of the above determinations, "matching" may include not only the case where the two values match exactly but also the case where the difference between the two values is at or below a threshold (permissible error). When it determines that the task state of the manipulator 4 has transitioned to the target task state s_s(j), the control unit 31 advances the process to the next step S318. Otherwise, the control unit 31 returns to step S310 and drives the manipulator 4 again. At this time, the control unit 31 may use the current value of the hand coordinates calculated in step S316 as the current value x_(j) when executing the processing from step S310 onward.
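The tolerance-based notion of "matching" can be sketched as follows, using the joint-angle check as the example. The function names and the default tolerance `1e-3` are assumptions made for illustration; the publication does not specify a threshold value.

```python
def matches(actual, expected, tol=1e-3):
    """Return True if every component of `actual` is within `tol` of
    the corresponding component of `expected`."""
    if len(actual) != len(expected):
        raise ValueError("vectors must have the same dimension")
    return all(abs(a - e) <= tol for a, e in zip(actual, expected))


def reached_target(q_before, dq, q_after, tol=1e-3):
    """Check whether the measured joint angles after driving agree with
    the angles before driving plus the commanded change."""
    predicted = [q + d for q, d in zip(q_before, dq)]
    return matches(q_after, predicted, tol)
```

The same `matches` helper applies unchanged to the task-state and hand-coordinate variants of the check, since each compares a measured vector with a predicted or target vector.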

（ステップＳ３１８）
ステップＳ３１８では、制御部３１は、最終目標のタスク状態ｓ_gを実現することができたか否かを判定する。 (Step S318)
In step S318, the control unit 31 determines whether or not the final target task state s _g has been achieved.

判定方法は、特に限定されなくてもよく、実施の形態に応じて適宜選択されてよい。例えば、制御部３１は、ステップＳ３１５により得られた現在のタスク状態ｓ_(j+1)が最終目標のタスク状態ｓ_gと一致するか否かを判定してもよい。現在のタスク状態ｓ_(j+1)が最終目標のタスク状態ｓ_gと一致する場合、制御部３１は、最終目標のタスク状態ｓ_gを実現することができたと判定する。一方、そうではない場合、制御部３１は、最終目標のタスク状態ｓ_gを実現することができていないと判定する。上記と同様に、当該判定における「一致すること」も、両者の値が完全に一致することの他、両者の値の差分が閾値（許容誤差）以下であることを含んでよい。 The determination method is not particularly limited, and may be appropriately selected according to the embodiment. For example, the control unit 31 may determine whether or not the current task state s _(j+1) obtained in step S315 matches the final target task state s _g . When the current task state s _(j+1) matches the final target task state s _g , the control unit 31 determines that the final target task state s _g has been achieved. On the other hand, otherwise, the control unit 31 determines that the final target task state s _g cannot be achieved. Similar to the above, "matching" in the determination may include not only the values of both being completely matching, but also the difference between the values being equal to or less than a threshold (permissible error).

最終目標のタスク状態ｓ_gを実現することができたと判定した場合、制御部３１は、マニピュレータ４の動作制御に関する一連の処理を終了する。一方、最終目標のタスク状態ｓ_gを実現することができていないと判定した場合、制御部３１は、ステップＳ３０８に処理を戻す。そして、制御部３１は、ステップＳ３１５及びステップＳ３１６の結果を利用して、ステップＳ３０８～ステップＳ３１３の処理を再度実行する。制御部３１は、上記一連の処理を繰り返すことで、最終目標のタスク状態ｓ_gを実現する。これにより、本実施形態に係る制御装置３は、指定されたタスクを遂行するようにマニピュレータ４の動作を制御することができる。 When determining that the final target task state s _g has been achieved, the control unit 31 terminates a series of processes related to motion control of the manipulator 4 . On the other hand, when determining that the final target task state s _g cannot be achieved, the control unit 31 returns the process to step S308. Then, the control unit 31 uses the results of steps S315 and S316 to execute the processes of steps S308 to S313 again. The control unit 31 realizes the final target task state s _g by repeating the series of processes described above. Thereby, the control device 3 according to this embodiment can control the operation of the manipulator 4 so as to perform the designated task.
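The overall loop of steps S308 to S318 can be sketched in a heavily simplified form. Everything in this example is an assumption made for illustration: the task state is a single number, the conversion function and the kinematics are taken to be the identity, the actuator is ideal, the next target is simply a bounded step toward the goal, and the correction uses the proportional form Δx = α(x_s − x) assumed for Equation 12.

```python
def run_control(s0, s_goal, alpha=0.5, step=0.2, tol=1e-3, max_cycles=1000):
    """Drive a one-dimensional state from s0 to s_goal through a chain
    of intermediate targets."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    s = s0
    for _ in range(max_cycles):
        # S318: stop once the final target task state is reached.
        if abs(s_goal - s) <= tol:
            return s
        # S308: choose the next target, at most `step` toward the goal.
        target = s + max(-step, min(step, s_goal - s))
        # S310 to S317: apply alpha-scaled corrections until the target
        # is reached within the tolerance (ideal actuator assumed).
        while abs(target - s) > tol:
            s += alpha * (target - s)
    raise RuntimeError("final target not reached within max_cycles")
```

With `alpha` below 1 each correction covers only part of the remaining deviation, so the inner loop repeats, which mirrors the return to step S310 when the target state has not yet been reached.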

なお、最終目標のタスク状態ｓ_gを実現することができていないと判定した場合の分岐先は、上記ステップＳ３０８に限られなくてもよい。例えば、複数のタスクにより構成される一連のタスクをマニピュレータ４に遂行させる場合に、最終目標のタスク状態ｓ_gには、最後に遂行されるタスクにおける最終目標のタスク状態が設定されてよい。本実施形態では、エンドエフェクタＴによりワークＷを保持し、保持したワークＷを他のワークＧに組み付けるタスクを遂行する場合に、最終目標のタスク状態ｓ_gには、ワークＷを他のワークＧに組み付けた状態が採用されてよい。この場合に、一連のタスクの遂行は、最初のタスクの開始点から開始されてよい。これに応じて、最終目標のタスク状態ｓ_gを実現することができていないと判定した場合の分岐先は、上記ステップＳ３０８ではなく、上記ステップＳ３０３であってよい。これにより、制御部３１は、ステップＳ３０４及びステップＳ３０５の処理により、動作モードを確認しながら、マニピュレータ４を駆動することができる。その結果、各タスクの切り替えをスムーズに行いながら、一連のタスクを遂行することができる。本実施形態では、エンドエフェクタＴによりワークＷを保持した際に、ワークＷを他のワークＧに運搬するタスクに動作モードをスムーズに切り替えることができる。 Note that the branch destination when it is determined that the final target task state s_g has not been achieved need not be limited to step S308. For example, when the manipulator 4 is made to perform a series of tasks composed of a plurality of tasks, the final target task state s_g may be set to the final target task state of the task performed last. In this embodiment, when performing the task of holding the workpiece W with the end effector T and assembling the held workpiece W to the other workpiece G, the state in which the workpiece W has been assembled to the other workpiece G may be adopted as the final target task state s_g. In this case, the series of tasks may be started from the starting point of the first task. Accordingly, the branch destination when it is determined that the final target task state s_g has not been achieved may be step S303 instead of step S308. This allows the control unit 31 to drive the manipulator 4 while checking the operation mode through the processing of steps S304 and S305. As a result, the series of tasks can be performed while switching smoothly between tasks. In this embodiment, when the workpiece W is held by the end effector T, the operation mode can be switched smoothly to the task of transporting the workpiece W to the other workpiece G.

（Ｂ）調整処理
次に、図２０を用いて、本実施形態に係る制御装置３の上記各推定モデル（６１、６２）のパラメータ調整に関する動作例について説明する。図２０は、本実施形態に係る制御装置３による各推定モデル（６１、６２）のパラメータ調整に関する処理手順の一例を示すフローチャートである。本実施形態では、第１推定モデル６１は、上記順運動学計算に用いる変換関数（第１変換行列群又は第２変換行列群）である。また、第２推定モデル６２は、タスク空間の値を観測空間の値に変換する変換関数（ψ）又は各座標（Ｔ_t、Ｔ_w）である。このパラメータ調整に関する情報処理は、上記マニピュレータ４の動作制御に関する情報処理と共に実行されてもよいし、或いは、別個に実行されてもよい。上記動作制御に関する処理手順を含め、以下で説明する処理手順は、本発明の「制御方法」の一例である。ただし、以下で説明する各処理手順は一例に過ぎず、各ステップは可能な限り変更されてよい。更に、以下で説明する各処理手順について、実施の形態に応じて、適宜、ステップの省略、置換、及び追加が可能である。 (B) Adjustment Processing Next, an operation example regarding parameter adjustment of the estimation models (61, 62) of the control device 3 according to the present embodiment will be described with reference to FIG. FIG. 20 is a flowchart showing an example of a processing procedure regarding parameter adjustment of each estimation model (61, 62) by the control device 3 according to this embodiment. In this embodiment, the first estimation model 61 is a transformation function (first transformation matrix group or second transformation matrix group) used for the forward kinematics calculation. Also, the second estimation model 62 is a transformation function (ψ) or each coordinate (T _t , T _w ) that transforms values in the task space into values in the observation space. The information processing related to parameter adjustment may be executed together with the information processing related to motion control of the manipulator 4, or may be executed separately. The processing procedures described below, including the processing procedures relating to the above operation control, are examples of the "control method" of the present invention. However, each processing procedure described below is merely an example, and each step may be changed as much as possible. Furthermore, for each processing procedure described below, it is possible to omit, replace, or add steps as appropriate according to the embodiment.

（ステップＳ４０１及びステップＳ４０２） (Step S401 and Step S402)
ステップＳ４０１では、制御部３１は、各データ取得部（３１１、３１２）として動作し、各センサ系から各センシングデータ（３２３、３２４）を取得する。ステップＳ４０１の処理は、上記ステップＳ３０３及びステップＳ３１４の処理と同様である。ステップＳ４０２では、制御部３１は、各推定部（３１３、３１４）として動作し、各推定モデル（６１、６２）を利用して、取得された各センシングデータ（３２３、３２４）から現在の手先座標の各推定値を算出する。ステップＳ４０２の処理は、上記ステップＳ３０７及びステップＳ３１６の処理と同様である。各推定値を算出すると、制御部３１は、次のステップＳ４０３に処理を進める。 In step S401, the control unit 31 operates as each data acquisition unit (311, 312) and acquires each sensing data (323, 324) from each sensor system. The processing of step S401 is the same as the processing of steps S303 and S314. In step S402, the control unit 31 operates as each estimation unit (313, 314) and, using each estimation model (61, 62), calculates each estimated value of the current hand coordinates from the acquired sensing data (323, 324). The processing of step S402 is the same as the processing of steps S307 and S316. After calculating each estimated value, the control unit 31 advances the processing to the next step S403.

なお、本パラメータ調整に関する情報処理が上記動作制御に関する情報処理と共に実行される場合、ステップＳ４０１の実行は、上記ステップＳ３０３又はステップＳ３１４の実行に相当する。また、ステップＳ４０２の実行は、上記ステップＳ３０７又はステップＳ３１６の実行に相当する。この場合、ステップＳ３０７又はステップＳ３１６を実行した後、制御部３１は、任意のタイミングで次にステップＳ４０３の処理を実行してもよい。 Note that when the information processing regarding this parameter adjustment is executed together with the information processing regarding the operation control, the execution of step S401 corresponds to the execution of step S303 or step S314. Execution of step S402 corresponds to execution of step S307 or step S316. In this case, after executing step S307 or step S316, the control unit 31 may next execute the process of step S403 at any timing.

（ステップＳ４０３及びステップＳ４０４） (Step S403 and Step S404)
ステップＳ４０３では、制御部３１は、調整部３１９として動作し、算出された第１推定値及び第２推定値の間の誤差の勾配を算出する。誤差の計算には、誤差関数等の関数式が用いられてよい。例えば、制御部３１は、第１推定値及び第２推定値の差分を算出し、算出された差分の累乗（例えば、２乗）を計算することで、得られた値を誤差として取得することができる。そして、制御部３１は、算出された誤差に対して偏微分を計算することで、各推定モデル（６１、６２）の各パラメータに関する誤差の勾配を算出することができる。 In step S403, the control unit 31 operates as the adjustment unit 319 and calculates the gradient of the error between the calculated first estimated value and second estimated value. A functional expression such as an error function may be used to calculate the error. For example, the control unit 31 can calculate the difference between the first estimated value and the second estimated value, raise the calculated difference to a power (for example, square it), and take the resulting value as the error. The control unit 31 can then calculate the gradient of the error with respect to each parameter of each estimation model (61, 62) by computing the partial derivatives of the calculated error.

ステップＳ４０４では、制御部３１は、調整部３１９として動作し、算出された勾配に基づいて、第１推定値及び第２推定値の間の誤差が小さくなるように、第１推定モデル６１及び第２推定モデル６２の少なくとも一方のパラメータの値を調整する。一例として、制御部３１は、各パラメータに関して算出された勾配を各パラメータの値から引算することで、各パラメータの値を更新する。これにより、制御部３１は、算出された勾配に基づいて、各パラメータの値を調整することができる。 In step S404, the control unit 31 operates as the adjustment unit 319 and, based on the calculated gradient, adjusts the parameter values of at least one of the first estimation model 61 and the second estimation model 62 so that the error between the first estimated value and the second estimated value becomes smaller. As an example, the control unit 31 updates the value of each parameter by subtracting the gradient calculated for that parameter from its value. The control unit 31 can thereby adjust the value of each parameter based on the calculated gradient.

両方の推定モデル（６１、６２）のパラメータの値を調整してもよい。或いは、推定モデル（６１、６２）のうちのいずれか一方のみのパラメータの値を調整してもよい。両方にノイズが発生し得る場合、或いは両方の推定モデル（６１、６２）のパラメータが共に適正ではない可能性がある場合、両方の推定モデル（６１、６２）のパラメータを調整するのが好ましい。パラメータの調整が完了すると、制御部３１は、次のステップＳ４０５に処理を進める。 The parameter values of both estimation models (61, 62) may be adjusted. Alternatively, the parameter values of only one of the estimation models (61, 62) may be adjusted. It is preferable to adjust the parameters of both estimation models (61, 62) if noise may occur in both, or if both estimation models (61, 62) may have incorrect parameters. When the parameter adjustment is completed, the control unit 31 advances the process to the next step S405.
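The gradient-based adjustment of steps S403 and S404 can be sketched as follows. This is only an illustrative sketch under simplifying assumptions that are not part of the embodiment: each estimation model is reduced to a raw estimate plus an adjustable offset vector (`b1`, `b2`), the error is the squared difference of the two estimates, and a step size `lr` is introduced (the text itself describes subtracting the gradient directly, which corresponds to `lr = 1`). All names are hypothetical.

```python
import numpy as np

def squared_error(x1, x2):
    """Error between the first and second estimated values (squared difference)."""
    d = x1 - x2
    return float(d @ d)

def error_gradients(x1, x2):
    """Partial derivatives of the squared error w.r.t. the offsets of each model.

    Assumes x1 = raw1 + b1 and x2 = raw2 + b2, so dE/db1 = 2(x1 - x2)
    and dE/db2 = -2(x1 - x2).
    """
    d = x1 - x2
    return 2.0 * d, -2.0 * d

def adjust_step(raw1, raw2, b1, b2, lr=0.1, adjust_first=True, adjust_second=True):
    """One pass of S403 (gradient) and S404 (update) for one or both models."""
    g1, g2 = error_gradients(raw1 + b1, raw2 + b2)
    if adjust_first:
        b1 = b1 - lr * g1  # subtract the (scaled) gradient from the parameter
    if adjust_second:
        b2 = b2 - lr * g2
    return b1, b2

# Hypothetical raw estimates of the hand coordinates from the two sensor systems.
raw1 = np.array([0.50, 0.20, 0.30])
raw2 = np.array([0.53, 0.18, 0.31])
b1, b2 = np.zeros(3), np.zeros(3)

error_before = squared_error(raw1 + b1, raw2 + b2)
for _ in range(20):
    b1, b2 = adjust_step(raw1, raw2, b1, b2)
error_after = squared_error(raw1 + b1, raw2 + b2)
```

Setting `adjust_first=False` or `adjust_second=False` corresponds to adjusting only one of the two estimation models.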

（ステップＳ４０５） (Step S405)
ステップＳ４０５では、制御部３１は、各推定モデル（６１、６２）のパラメータを調整する処理を終了するか否かを判定する。パラメータ調整の処理を終了する基準は、実施の形態に応じて適宜決定されてよい。 In step S405, the control unit 31 determines whether or not to end the processing of adjusting the parameters of each estimation model (61, 62). The criterion for ending the parameter adjustment processing may be determined as appropriate according to the embodiment.

例えば、終了するまでにパラメータ調整を繰り返す規定回数が設定されていてもよい。規定回数は、例えば、設定値により与えられてもよいし、オペレータの指定により与えられてもよい。この場合、制御部３１は、ステップＳ４０１～ステップＳ４０４の処理を実行した回数が規定回数に到達したか否かを判定する。実行回数が規定回数に到達してないと判定した場合、制御部３１は、ステップＳ４０１に処理を戻し、ステップＳ４０１～ステップＳ４０４の処理を繰り返す。一方、実行回数が規定回数に到達していると判定した場合には、制御部３１は、第１推定値及び第２推定値の誤差の勾配に基づくパラメータ調整に関する一連の処理を終了する。 For example, a specified number of repetitions of parameter adjustment may be set until completion. The specified number of times may be given by a set value or may be given by an operator's designation, for example. In this case, the control unit 31 determines whether or not the number of times the processes of steps S401 to S404 have been executed has reached a specified number of times. When determining that the number of times of execution has not reached the specified number of times, the control unit 31 returns the process to step S401 and repeats the processes of steps S401 to S404. On the other hand, when determining that the number of times of execution has reached the specified number of times, the control unit 31 terminates a series of processes related to parameter adjustment based on the gradient of the error between the first estimated value and the second estimated value.

また、例えば、制御部３１は、処理を繰り返すか否かをオペレータに問い合わせてもよい。この場合、制御部３１は、オペレータの回答に応じて、パラメータ調整に関する処理を繰り返すか否かを判定する。オペレータが処理を繰り返すと回答した場合、制御部３１は、ステップＳ４０１に処理を戻し、ステップＳ４０１～ステップＳ４０４の処理を繰り返す。一方、オペレータが処理を繰り返さないと回答した場合、制御部３１は、第１推定値及び第２推定値の誤差の勾配に基づくパラメータ調整に関する一連の処理を終了する。 Also, for example, the control unit 31 may inquire of the operator whether to repeat the process. In this case, the control unit 31 determines whether or not to repeat the processing related to parameter adjustment according to the operator's answer. When the operator replies that the process will be repeated, the control unit 31 returns the process to step S401 and repeats the processes of steps S401 to S404. On the other hand, if the operator answers not to repeat the process, the control unit 31 terminates a series of processes related to parameter adjustment based on the gradient of the error between the first estimated value and the second estimated value.
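The two termination criteria of step S405 described above (a specified number of repetitions, or an answer from the operator) can be sketched as a simple loop. The function and parameter names below are hypothetical, and `step` stands in for one pass of steps S401 to S404.

```python
from typing import Callable, Optional

def run_adjustment(
    step: Callable[[], None],
    max_iterations: Optional[int] = None,
    ask_operator: Optional[Callable[[int], bool]] = None,
) -> int:
    """Repeat `step` until a termination criterion of S405 is met.

    max_iterations: specified number of repetitions (set value or operator input).
    ask_operator:   callback returning True if the operator wants to repeat.
    If neither criterion is given, the step runs exactly once.
    Returns the number of times `step` was executed.
    """
    count = 0
    while True:
        step()  # S401 to S404
        count += 1
        if max_iterations is not None and count >= max_iterations:
            break
        if ask_operator is not None and not ask_operator(count):
            break
        if max_iterations is None and ask_operator is None:
            break
    return count

# Example: stop after a specified count of 5.
executed = run_adjustment(lambda: None, max_iterations=5)
```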

＜接触発生時の調整処理＞ <Adjustment Processing When Contact Occurs>
次に、図２１を用いて、本実施形態に係る制御装置３の他の方法による各推定モデル（６１、６２）のパラメータ調整に関する動作例について説明する。図２１は、他の方法による各推定モデル（６１、６２）のパラメータ調整に関する処理手順の一例を示すフローチャートである。 Next, with reference to FIG. 21, an operation example regarding parameter adjustment of each estimation model (61, 62) by another method of the control device 3 according to the present embodiment will be described. FIG. 21 is a flowchart showing an example of a processing procedure for parameter adjustment of each estimation model (61, 62) by the other method.

本実施形態に係る制御装置３は、上記図２０に例示される方法のパラメータ調整の他、マニピュレータ４の手先が何らかの対象物に接触した時に、図２１に例示されるパラメータ調整の処理を実行する。本実施形態では、エンドエフェクタＴがワークＷを保持していない場合、エンドエフェクタＴがマニピュレータ４の手先であり、ワークＷが、マニピュレータ４の手先が接触する対象物の一例である。一方、エンドエフェクタＴがワークＷを保持している場合、ワークＷがマニピュレータ４の手先であり、他のワークＧが、マニピュレータ４の手先が接触する対象物の一例である。 In addition to the parameter adjustment by the method illustrated in FIG. 20, the control device 3 according to the present embodiment executes the parameter adjustment processing illustrated in FIG. 21 when the hand of the manipulator 4 comes into contact with some object. In the present embodiment, when the end effector T is not holding the work W, the end effector T is the hand of the manipulator 4, and the work W is an example of the object with which the hand of the manipulator 4 comes into contact. On the other hand, when the end effector T is holding the work W, the work W is the hand of the manipulator 4, and the other work G is an example of the object with which the hand of the manipulator 4 comes into contact.

この図２１に例示される方法のパラメータ調整に関する情報処理も、上記と同様に、マニピュレータ４の動作制御に関する情報処理と共に実行されてもよいし、或いは、別個に実行されてもよい。上記動作制御に関する処理手順を含め、以下で説明する処理手順は、本発明の「制御方法」の一例である。ただし、以下で説明する各処理手順は一例に過ぎず、各ステップは可能な限り変更されてよい。更に、以下で説明する各処理手順について、実施の形態に応じて、適宜、ステップの省略、置換、及び追加が可能である。 The information processing related to parameter adjustment of the method illustrated in FIG. 21 may be executed together with the information processing related to motion control of the manipulator 4, or may be executed separately. The processing procedures described below, including the processing procedures relating to the above operation control, are examples of the "control method" of the present invention. However, each processing procedure described below is merely an example, and each step may be changed as much as possible. Furthermore, for each processing procedure described below, it is possible to omit, replace, or add steps as appropriate according to the embodiment.

（ステップＳ４１１及びステップＳ４１２） (Step S411 and Step S412)
ステップＳ４１１では、制御部３１は、各データ取得部（３１１、３１２）として動作し、各センサ系から各センシングデータ（３２３、３２４）を取得する。ステップＳ４１１の処理は、上記ステップＳ４０１の処理と同様である。ステップＳ４１２では、制御部３１は、各推定部（３１３、３１４）として動作し、各推定モデル（６１、６２）を利用して、取得された各センシングデータ（３２３、３２４）から現在の手先座標の各推定値を算出する。ステップＳ４１２の処理は、上記ステップＳ４０２の処理と同様である。各推定値を算出すると、制御部３１は、次のステップＳ４１３に処理を進める。 In step S411, the control unit 31 operates as each data acquisition unit (311, 312) and acquires each sensing data (323, 324) from each sensor system. The processing of step S411 is the same as the processing of step S401. In step S412, the control unit 31 operates as each estimation unit (313, 314) and, using each estimation model (61, 62), calculates each estimated value of the current hand coordinates from the acquired sensing data (323, 324). The processing of step S412 is the same as the processing of step S402. After calculating each estimated value, the control unit 31 advances the processing to the next step S413.

（ステップＳ４１３） (Step S413)
ステップＳ４１３では、制御部３１は、調整部３１９として動作し、対象物との接触の境界面上で手先座標の境界値を取得する。手先座標の境界値を取得する方法は、特に限定されなくてもよく、実施の形態に応じて適宜選択されてよい。例えば、手先座標の境界値は、入力装置３５を介したオペレータの指定により得られてもよい。また、本実施形態では、マニピュレータ４の手先及び目標物の間の相対座標によりタスク状態が規定される。このタスク状態（相対座標）の集合を表現するタスク空間ＳＰを利用して、手先座標の境界値が得られてもよい。 In step S413, the control unit 31 operates as the adjustment unit 319 and acquires the boundary value of the hand coordinates on the boundary surface of contact with the object. The method of acquiring the boundary value of the hand coordinates is not particularly limited and may be selected as appropriate according to the embodiment. For example, the boundary value of the hand coordinates may be obtained by the operator's designation via the input device 35. In the present embodiment, the task state is defined by the relative coordinates between the hand of the manipulator 4 and the target. The boundary value of the hand coordinates may be obtained by using the task space SP, which represents the set of task states (relative coordinates).

ここで、図２２を用いて、タスク空間ＳＰを利用して、手先座標の境界値を取得する方法の一例について説明する。図２２は、タスク空間ＳＰにおいて接触の境界面上で境界値を取得する場面の一例を模式的に例示する。まず、制御部３１は、ステップＳ４１２により算出された手先座標の各推定値を変換関数（ψ）の逆関数に入力し、当該逆関数の演算処理を実行することで、各推定値に対応するタスク空間ＳＰ内の座標を算出する。図２２の例では、ノードＮｅ１が、第１推定値に対応する座標を示し、ノードＮｅ２が、第２推定値に対応する座標を示す。 Here, an example of a method of acquiring the boundary value of the hand coordinates using the task space SP will be described with reference to FIG. 22. FIG. 22 schematically illustrates an example of a scene in which a boundary value is acquired on the contact boundary surface in the task space SP. First, the control unit 31 inputs each estimated value of the hand coordinates calculated in step S412 to the inverse function of the transformation function (ψ) and executes the arithmetic processing of the inverse function, thereby calculating the coordinates in the task space SP corresponding to each estimated value. In the example of FIG. 22, the node Ne1 indicates the coordinates corresponding to the first estimated value, and the node Ne2 indicates the coordinates corresponding to the second estimated value.

次に、制御部３１は、タスク空間ＳＰにおける接触の境界面を導出する。接触の境界面の導出には、学習済みの判定モデル５０が用いられてよい。この場合、制御装置３（記憶部３２）は、学習結果データ１２５を保持することで、学習済みの判定モデル５０を備えてもよい。或いは、制御部３１は、ネットワークを介して第２モデル生成装置２に問い合わせることで、境界面の導出結果を第２モデル生成装置２から取得してもよい。 Next, the control unit 31 derives a contact boundary surface in the task space SP. A learned judgment model 50 may be used to derive the contact boundary surface. In this case, the control device 3 (storage unit 32 ) may include the learned determination model 50 by holding the learning result data 125 . Alternatively, the control unit 31 may acquire the interface derivation result from the second model generation device 2 by inquiring of the second model generation device 2 via the network.

続いて、制御部３１は、導出された接触の境界面上において、第１推定値に対応する座標（ノードＮｅ１）及び第２推定値に対応する座標（ノードＮｅ２）の少なくとも一方に近傍のノードＮｂを選択する。例えば、制御部３１は、両方のノード（Ｎｅ１、Ｎｅ２）に最近傍のノードをノードＮｂとして選択してもよい。制御部３１は、選択されたノードＮｂの座標を変換関数（ψ）に入力し、当該変換関数（ψ）の演算処理を実行することで、手先座標の境界値を算出することができる。 Subsequently, on the derived contact boundary surface, the control unit 31 selects a node Nb in the vicinity of at least one of the coordinates corresponding to the first estimated value (node Ne1) and the coordinates corresponding to the second estimated value (node Ne2). For example, the control unit 31 may select the node nearest to both nodes (Ne1, Ne2) as the node Nb. The control unit 31 can calculate the boundary value of the hand coordinates by inputting the coordinates of the selected node Nb to the transformation function (ψ) and executing the arithmetic processing of the transformation function (ψ).
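The selection of the node Nb can be sketched as a nearest-neighbour search over boundary nodes in the task space. The sketch below makes several assumptions that the embodiment does not specify: the boundary surface is available as a finite array of nodes, "nearest to both nodes" is interpreted as minimizing the sum of the Euclidean distances to Ne1 and Ne2, and the transformation function ψ is replaced by a hypothetical translation `psi` so that the example is self-contained.

```python
import numpy as np

def select_boundary_node(boundary_nodes, ne1, ne2):
    """Return the boundary node with the smallest summed distance to Ne1 and Ne2."""
    d1 = np.linalg.norm(boundary_nodes - ne1, axis=1)
    d2 = np.linalg.norm(boundary_nodes - ne2, axis=1)
    return boundary_nodes[int(np.argmin(d1 + d2))]

def psi(task_coords, target_position=np.array([10.0, 5.0])):
    """Hypothetical stand-in for the transformation function (psi):
    maps relative (task-space) coordinates to observation-space coordinates."""
    return target_position + task_coords

# Two-dimensional toy task space; the boundary lies along the x axis.
boundary_nodes = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
ne1 = np.array([1.2, 0.3])   # task-space coordinates of the first estimated value
ne2 = np.array([0.9, -0.2])  # task-space coordinates of the second estimated value

nb = select_boundary_node(boundary_nodes, ne1, ne2)
boundary_value = psi(nb)  # boundary value of the hand coordinates
```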

以上のいずれかの方法により、制御部３１は、手先座標の境界値を取得することができる。手先座標の境界値を取得すると、制御部３１は、次のステップＳ４１４に処理を進める。なお、手先座標の境界値を取得する方法は、これらの例に限定されなくてもよい。例えば、制御部３１は、観測空間において直接的に接触の境界面を導出してもよい。制御部３１は、導出された接触の境界面上で、第１推定値の座標及び第２推定値の座標の少なくとも一方に近傍の点を選択してもよい。そして、制御部３１は、選択された点の座標を境界値として取得してもよい。 By any of the above methods, the control unit 31 can acquire the boundary value of the hand coordinates. After acquiring the boundary value of the hand coordinates, the control unit 31 advances the process to the next step S414. Note that the method of acquiring the boundary value of the hand coordinates need not be limited to these examples. For example, the control unit 31 may directly derive the interface of contact in the observation space. The control unit 31 may select a point near at least one of the coordinates of the first estimated value and the coordinates of the second estimated value on the derived contact boundary surface. Then, the control unit 31 may acquire the coordinates of the selected point as the boundary value.

（ステップＳ４１４及びステップＳ４１５） (Step S414 and Step S415)
ステップＳ４１４では、制御部３１は、調整部３１９として動作し、接触時に推定される第１推定値及び取得された境界値の間の第１誤差の勾配を算出する。また、制御部３１は、接触時に推定される第２推定値及び取得された境界値の間の第２誤差の勾配を算出する。各勾配を算出する方法は、上記ステップＳ４０３と同様であってよい。 In step S414, the control unit 31 operates as the adjustment unit 319 and calculates the gradient of the first error between the first estimated value estimated at the time of contact and the acquired boundary value. The control unit 31 also calculates the gradient of the second error between the second estimated value estimated at the time of contact and the acquired boundary value. The method of calculating each gradient may be the same as in step S403.

ステップＳ４１５では、制御部３１は、調整部３１９として動作し、算出された第１誤差の勾配に基づいて、第１誤差が小さくなるように第１推定モデル６１のパラメータの値を調整する。また、制御部３１は、算出された第２誤差の勾配に基づいて、第２誤差が小さくなるように第２推定モデル６２のパラメータの値を調整する。パラメータの値を調整する方法は、上記ステップＳ４０４と同様であってよい。パラメータの調整が完了すると、制御部３１は、境界値を利用したパラメータ調整に関する一連の処理を終了する。 In step S415, the control unit 31 operates as the adjustment unit 319, and adjusts the parameter values of the first estimation model 61 based on the calculated slope of the first error so that the first error becomes smaller. Also, based on the calculated gradient of the second error, the control unit 31 adjusts the parameter values of the second estimation model 62 so that the second error becomes smaller. The method for adjusting the parameter values may be the same as in step S404 above. When the parameter adjustment is completed, the control unit 31 terminates a series of processes related to parameter adjustment using the boundary value.
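Steps S414 and S415 differ from steps S403 and S404 in that each estimation model is pulled toward the shared boundary value rather than toward the other model's estimate. A minimal sketch, again assuming (hypothetically) that each model is a raw estimate plus an adjustable offset and that a step size `lr` is used:

```python
import numpy as np

def contact_adjust(raw1, raw2, boundary, b1, b2, lr=0.25, steps=30):
    """Adjust the offsets of both models so each estimate approaches the boundary value.

    First error:  E1 = ||(raw1 + b1) - boundary||^2, gradient 2 * ((raw1 + b1) - boundary)
    Second error: E2 = ||(raw2 + b2) - boundary||^2, gradient 2 * ((raw2 + b2) - boundary)
    """
    for _ in range(steps):
        g1 = 2.0 * ((raw1 + b1) - boundary)  # S414: gradient of the first error
        g2 = 2.0 * ((raw2 + b2) - boundary)  # S414: gradient of the second error
        b1 = b1 - lr * g1                    # S415: update the first model
        b2 = b2 - lr * g2                    # S415: update the second model
    return b1, b2

# Hypothetical estimates at the moment of contact and the acquired boundary value.
raw1 = np.array([0.50, 0.20, 0.30])
raw2 = np.array([0.53, 0.18, 0.31])
boundary = np.array([0.52, 0.19, 0.30])

b1, b2 = contact_adjust(raw1, raw2, boundary, np.zeros(3), np.zeros(3))
```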

＜処理タイミング＞ <Processing Timing>
次に、図２３を用いて、上記動作制御のタイミングとパラメータ調整のタイミングとの関係の一例を説明する。図２３は、動作制御のタイミングとパラメータ調整のタイミングとの関係の一例を模式的に例示する。 Next, an example of the relationship between the timing of the motion control and the timing of the parameter adjustment will be described with reference to FIG. 23. FIG. 23 schematically illustrates an example of the relationship between the timing of motion control and the timing of parameter adjustment.

第１センサ系及び第２センサ系それぞれの処理周期、換言すると、各センシングデータ（３２３、３２４）を取得する周期は必ずしも同じとは限らない。それぞれの処理周期が異なる場合には、第１センサ系及び第２センサ系の少なくとも一方からセンシングデータを得ることができるタイミングで、上記マニピュレータ４の動作制御に関する情報処理を実行してもよい。これに対して、制御部３１は、両方のセンシングデータ（３２３、３２４）を取得可能なタイミングで、上記パラメータ調整を実行してもよい。 The processing cycle of each of the first sensor system and the second sensor system, in other words, the cycle of acquiring each sensing data (323, 324) is not necessarily the same. If the respective processing cycles are different, the information processing regarding the motion control of the manipulator 4 may be executed at the timing when sensing data can be obtained from at least one of the first sensor system and the second sensor system. On the other hand, the control unit 31 may perform the above parameter adjustment at a timing when both sensing data (323, 324) can be acquired.

図２３の例では、各エンコーダＳ２及び触覚センサＳ３により構成される第１センサ系の処理周期が、カメラＳ１により構成される第２センサ系の処理周期よりも短いと想定している。一例として、第１センサ系の処理周期が１０ｍｓ（ミリ秒）であり、第２センサ系の処理周期が３０ｍｓであると想定する。この場合、第２センサ系により第２センシングデータ３２４を１回取得する間に、第１センサ系により第１センシングデータ３２３を３回取得することができる。 In the example of FIG. 23, it is assumed that the processing cycle of the first sensor system composed of the encoders S2 and the tactile sensors S3 is shorter than the processing cycle of the second sensor system composed of the camera S1. As an example, assume that the processing cycle of the first sensor system is 10 ms (milliseconds) and the processing cycle of the second sensor system is 30 ms. In this case, the first sensing data 323 can be acquired three times by the first sensor system while the second sensing data 324 is acquired once by the second sensor system.

この図２３の例では、第１センサ系から第１センシングデータ３２３のみ取得可能なタイミングでは、制御部３１は、第１センシングデータ３２３から算出される現在の手先座標の第１推定値を利用して、マニピュレータ４の動作を制御してもよい。一方、第１センサ系及び第２センサ系から両方のセンシングデータ（３２３、３２４）を取得可能なタイミングでは、制御部３１は、それぞれ算出される現在の手先座標の第１推定値及び第２推定値の少なくとも一方を利用して、マニピュレータ４の動作を制御してもよい。また、このタイミングで、制御部３１は、上記図２０に示されるパラメータ調整の情報処理を実行してもよい。 In the example of FIG. 23, at a timing when only the first sensing data 323 can be acquired from the first sensor system, the control unit 31 may control the operation of the manipulator 4 using the first estimated value of the current hand coordinates calculated from the first sensing data 323. On the other hand, at a timing when both sensing data (323, 324) can be acquired from the first sensor system and the second sensor system, the control unit 31 may control the operation of the manipulator 4 using at least one of the first estimated value and the second estimated value of the current hand coordinates calculated from the respective data. At this timing, the control unit 31 may also execute the parameter adjustment information processing shown in FIG. 20.
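The relationship between the two timings can be illustrated with a small schedule generator using the example periods of 10 ms and 30 ms. This is a sketch only: it assumes that both sensor systems start at time 0 and that the second period is an integer multiple of the first; the labels are hypothetical.

```python
def schedule(total_ms, period1_ms=10, period2_ms=30):
    """List (time, action) pairs for each tick of the first sensor system.

    "control"        : only the first sensing data is available (motion control only)
    "control+adjust" : both sensing data are available, so the parameter adjustment
                       may be executed in addition to motion control
    """
    events = []
    for t in range(0, total_ms + 1, period1_ms):
        both_available = (t % period2_ms == 0)
        events.append((t, "control+adjust" if both_available else "control"))
    return events

events = schedule(60)
adjust_times = [t for t, action in events if action == "control+adjust"]
```

With these periods, the first sensing data is acquired three times for each acquisition of the second sensing data, so the adjustment can run on every third tick.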

更に、マニピュレータ４の手先が何らかの対象物に接触した場合には、制御部３１は、マニピュレータ４の動作制御を停止し、第１センサ系及び第２センサ系から両方のセンシングデータ（３２３、３２４）を取得可能なタイミングまで待機してもよい。そして、両方のセンシングデータ（３２３、３２４）を取得可能なタイミングで、制御部３１は、上記図２１に示されるパラメータ調整の情報処理を実行してもよい。加えて、制御部３１は、上記図２０に示されるパラメータ調整の情報処理も共に実行してもよい。上記図２０及び図２１に示される両方のパラメータ調整を実行する場合、上記ステップＳ４０１は、ステップＳ４１１と共通の処理として実行され、上記ステップＳ４０２は、ステップＳ４１２と共通の処理として実行されてよい。これにより、マニピュレータ４の動作を制御すると共に、適切なタイミングで各推定モデル（６１、６２）のパラメータの値を調整することができる。 Furthermore, when the hand of the manipulator 4 comes into contact with some object, the control unit 31 may stop the motion control of the manipulator 4 and wait until a timing at which both sensing data (323, 324) can be acquired from the first sensor system and the second sensor system. Then, at the timing when both sensing data (323, 324) can be acquired, the control unit 31 may execute the parameter adjustment information processing shown in FIG. 21. In addition, the control unit 31 may also execute the parameter adjustment information processing shown in FIG. 20. When both parameter adjustments shown in FIGS. 20 and 21 are executed, step S401 may be executed as processing common to step S411, and step S402 may be executed as processing common to step S412. This makes it possible to control the operation of the manipulator 4 and to adjust the parameter values of each estimation model (61, 62) at appropriate timings.

なお、図２０及び図２１に示される各方法により、各推定モデル（６１、６２）のパラメータの値を調整した後、各推定モデル（６１、６２）による各推定値が真値に近付いているか否かは適宜評価されてよい。一例として、各センサ系により得られる各センシングデータ（３２３、３２４）に含まれ得るノイズはホワイトノイズと想定される。そのため、少なくともいずれかの所定時間分のセンシングデータを取得し、得られたセンシングデータを平均化することにより、センシングデータに含まれるノイズを除去又は低減することができる。この平均化されたセンシングデータにおいて、各推定モデル（６１、６２）による各推定値に関する成分が含まれているか否かにより、各推定モデル（６１、６２）による各推定値が真値に近付いているか否かを評価することができる。 Note that after the parameter values of each estimation model (61, 62) have been adjusted by the methods shown in FIGS. 20 and 21, whether each estimated value produced by each estimation model (61, 62) has come closer to the true value may be evaluated as appropriate. As an example, the noise that may be contained in each sensing data (323, 324) obtained by each sensor system is assumed to be white noise. Therefore, by acquiring at least one of the sensing data over a predetermined period of time and averaging the acquired sensing data, the noise contained in the sensing data can be removed or reduced. Whether each estimated value by each estimation model (61, 62) has come closer to the true value can then be evaluated according to whether the averaged sensing data contains a component related to each estimated value.
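The effect of averaging under the white-noise assumption can be checked numerically. The sketch below uses synthetic data only; the true position, noise level, and sample count are arbitrary assumptions chosen for illustration and do not come from the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

true_position = np.array([0.40, 0.10, 0.25])  # hypothetical true hand coordinates
noise_std = 0.05                               # assumed standard deviation of the white noise
num_samples = 1000                             # samples collected over the predetermined time

# Each row is one noisy measurement: true value plus zero-mean Gaussian noise.
samples = true_position + rng.normal(0.0, noise_std, size=(num_samples, 3))

# Averaging reduces the noise standard deviation by a factor of sqrt(num_samples).
averaged = samples.mean(axis=0)
averaged_error = float(np.abs(averaged - true_position).max())
expected_std_of_mean = noise_std / np.sqrt(num_samples)
```

Because the noise is zero-mean, the standard deviation of the averaged value shrinks in proportion to the inverse square root of the number of samples (about 0.0016 here), which is why averaging over a predetermined time makes the comparison with the estimated values more reliable.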

例えば、カメラＳ１として深度カメラを利用し、第２センシングデータ３２４として深度マップ（深度情報を含む画像データ）を取得したと想定する。この場合、制御部３１は、各推定値を深度マップ上にプロットし、各推定値に対応する座標と深度マップ上のマニピュレータ４の手先の座標とを比較する。この比較の結果、互いの座標が一致している（又は近似している）場合に、制御部３１は、各推定モデル（６１、６２）による各推定値が真値に近付いていると評価することができる。一方、互いの座標が乖離している場合、各推定モデル（６１、６２）による各推定値が真値に近付いていない可能性がある。この場合、制御部３１は、上記各推定値が真値に近付いていると評価できるようになるまで、上記図２０及び図２１の少なくともいずれかの方法によるパラメータ調整を繰り返してもよい。 For example, assume that a depth camera is used as the camera S1 and a depth map (image data including depth information) is acquired as the second sensing data 324 . In this case, the control unit 31 plots each estimated value on the depth map and compares the coordinates corresponding to each estimated value with the coordinates of the hand of the manipulator 4 on the depth map. As a result of this comparison, when the coordinates match (or approximate), the control unit 31 evaluates that each estimated value by each estimation model (61, 62) is close to the true value. be able to. On the other hand, when the coordinates are deviated from each other, there is a possibility that each estimated value by each estimation model (61, 62) is not close to the true value. In this case, the control unit 31 may repeat the parameter adjustment by at least one of the methods of FIGS. 20 and 21 until it can be evaluated that each estimated value is close to the true value.

［特徴］ [Features]
以上のとおり、本実施形態では、上記ステップＳ４０４において、互いの推定結果（推定値）が一つの値に近付くように、第１推定モデル６１及び第２推定モデル６２の少なくとも一方のパラメータの値を調整する。また、マニピュレータ４の手先が対象物に接触したときには、上記ステップＳ４１５において、各推定値が接触の境界値に近付くように各推定モデル（６１、６２）のパラメータの値を調整する。これらの調整により、各推定モデル（６１、６２）による手先座標の推定精度の改善を期待することができる。特に、ステップＳ４１５によれば、対象物と接触するという物理的制約を伴うため、確度の高い情報（境界値）に基づいて、各推定モデル（６１、６２）のパラメータの値を調整することで、各推定モデル（６１、６２）による手先座標の推定精度の改善することができる。したがって、本実施形態によれば、マニピュレータ４の手先座標を制御する精度の向上を図ることができる。 As described above, in the present embodiment, in step S404, the parameter values of at least one of the first estimation model 61 and the second estimation model 62 are adjusted so that the two estimation results (estimated values) approach a single value. In addition, when the hand of the manipulator 4 comes into contact with the object, in step S415, the parameter values of each estimation model (61, 62) are adjusted so that each estimated value approaches the boundary value of the contact. These adjustments can be expected to improve the accuracy with which each estimation model (61, 62) estimates the hand coordinates. In particular, because step S415 involves the physical constraint of contact with the object, adjusting the parameter values of each estimation model (61, 62) based on highly reliable information (the boundary value) can improve the accuracy with which each estimation model (61, 62) estimates the hand coordinates. Therefore, according to the present embodiment, the accuracy of controlling the hand coordinates of the manipulator 4 can be improved.

また、本実施形態では、上記ステップＳ３０７及びステップＳ３１６において、順運動学計算により、マニピュレータ４の現在の手先座標の第１推定値を算出することができる。エンドエフェクタＴがワークＷを保持していない場合、エンドエフェクタＴが手先に設定され、順運動学計算には、各関節（関節部４１～４６）の第１同次変換行列により導出される第１変換行列群（φ）が変換関数として用いられる。一方、エンドエフェクタＴがワークＷを保持している場合、ワークＷが手先に設定され、順運動学計算に用いる変換関数が拡張される。具体的には、順運動学計算には、エンドエフェクタＴの座標系からワークＷの座標系に座標を変換するための第２同次変換行列（_tＴ^w）を第１変換行列群（φ）に掛けることで得られる第２変換行列群（φ（ｑ）・_tＴ^w）が変換関数として用いられる。すなわち、本実施形態では、エンドエフェクタＴによりワークＷを保持した時に、エンドエフェクタＴからワークＷに運動学の基準点を変更する。 Further, in the present embodiment, the first estimated value of the current hand coordinates of the manipulator 4 can be calculated by forward kinematics calculation in steps S307 and S316. When the end effector T is not holding the work W, the end effector T is set as the hand, and the first transformation matrix group (φ) derived from the first homogeneous transformation matrices of the joints (joint portions 41 to 46) is used as the transformation function in the forward kinematics calculation. On the other hand, when the end effector T is holding the work W, the work W is set as the hand, and the transformation function used for the forward kinematics calculation is extended. Specifically, the second transformation matrix group (φ(q)·_tT^w), obtained by multiplying the first transformation matrix group (φ) by the second homogeneous transformation matrix (_tT^w) for converting coordinates from the coordinate system of the end effector T to the coordinate system of the work W, is used as the transformation function in the forward kinematics calculation. That is, in the present embodiment, when the work W is held by the end effector T, the kinematic reference point is changed from the end effector T to the work W.
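The extension of the forward kinematics from φ(q) to φ(q)·_tT^w can be sketched with 4×4 homogeneous transformation matrices. The example below is a simplified illustration rather than the embodiment's kinematic model: it assumes a hypothetical two-joint planar arm whose per-joint transformation is a rotation about the z axis followed by a translation along the link, whereas the manipulator 4 described above has six joints.

```python
import numpy as np

def rot_z(theta):
    """Homogeneous rotation about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0, 0.0],
                     [s,  c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def trans_x(d):
    """Homogeneous translation along the x axis."""
    m = np.eye(4)
    m[0, 3] = d
    return m

def phi(q, link_lengths):
    """First transformation matrix group: product of the per-joint transforms."""
    pose = np.eye(4)
    for theta, length in zip(q, link_lengths):
        pose = pose @ rot_z(theta) @ trans_x(length)
    return pose

def hand_pose(q, link_lengths, t_T_w=None):
    """Pose of the hand.

    Without a held work, the end effector is the hand: phi(q).
    With a held work, the work is the hand: phi(q) @ t_T_w.
    """
    pose = phi(q, link_lengths)
    if t_T_w is not None:
        pose = pose @ t_T_w
    return pose

q = [np.pi / 2, 0.0]
links = [1.0, 1.0]
t_T_w = trans_x(0.1)  # hypothetical end-effector-to-work transform

end_effector_position = hand_pose(q, links)[:3, 3]
work_position = hand_pose(q, links, t_T_w)[:3, 3]
```

The same `hand_pose` function serves both cases, which mirrors the point made in the text: only the reference point of the kinematics changes when the work is held.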

これにより、エンドエフェクタＴがワークＷを保持していない場合とワークＷを保持している場合とで、ステップＳ３０７及びステップＳ３１６の順運動学計算、並びにステップＳ３１１の逆運動学計算をほぼ同様に処理することができる。つまり、エンドエフェクタＴによりワークＷを保持する第１タスク及びエンドエフェクタＴにより保持されたワークＷを他のワークＧに組み付ける第２タスクを「マニピュレータ４の手先を目標物に対して移動する」共通のタスクとして取り扱うことができる。したがって、本実施形態によれば、エンドエフェクタＴがワークＷを保持していない場合とエンドエフェクタＴがワークＷを保持している場合とを区別することなく汎用的かつ統一的に制御処理を規定することができる。そのため、制御処理を単純化することができ、これによって、マニピュレータ４の動作を生成又は教示するコストを低減することができる。上記実施形態では、ワークＷをエンドエフェクタＴにより保持し、保持したワークＷを他のワークＧに組み付ける一連の動作を生成又は教示するコストを低減することができる。 As a result, the forward kinematics calculation in steps S307 and S316 and the inverse kinematics calculation in step S311 can be processed in substantially the same way whether or not the end effector T is holding the work W. In other words, the first task of holding the work W with the end effector T and the second task of assembling the work W held by the end effector T to the other work G can be handled as the common task of "moving the hand of the manipulator 4 with respect to the target". Therefore, according to the present embodiment, the control processing can be defined in a general-purpose and unified manner without distinguishing between the case where the end effector T is not holding the work W and the case where it is holding the work W. The control processing can thus be simplified, which reduces the cost of generating or teaching the operation of the manipulator 4. In the above embodiment, the cost of generating or teaching the series of operations of holding the work W with the end effector T and assembling the held work W to the other work G can be reduced.

また、本実施形態では、マニピュレータ４により実行するタスクの状態が、エンドエフェクタＴ（エンドエフェクタ）、ワークＷ、他のワークＧ等の対象物間の相対的な位置関係により表現される。これにより、制御指令は、タスクに直接的に関連付けられるのではなく、対象物間の相対的な位置関係の変化量に関連付けられる。すなわち、タスクの内容に依存せずに、対象物の相対的な位置関係を変化させることに対してマニピュレータ４に与える時系列の制御指令を生成又は教示することができる。例えば、ワークＷの座標が変化しても、上記ステップＳ３０６及びステップＳ３１５において、エンドエフェクタＴとワークＷとの間の位置関係（タスク状態）を把握する際に、そのワークＷの座標の変化が考慮される。そのため、マニピュレータ４は、学習結果に基づいて、エンドエフェクタＴによりワークＷを適切に保持することができる。したがって、本実施形態によれば、習得されるタスクを遂行する能力の汎用性を高めることができ、これによって、マニピュレータ４にタスクを教示するのにかかるコストを低減することができる。 Also, in this embodiment, the state of the task executed by the manipulator 4 is represented by the relative positional relationship among objects such as the end effector T (end effector), the work W, another work G, and the like. Thereby, the control command is not directly related to the task, but is related to the amount of change in the relative positional relationship between the objects. That is, it is possible to generate or teach a time-series control command to be given to the manipulator 4 for changing the relative positional relationship of the object without depending on the content of the task. For example, even if the coordinates of the work W change, when grasping the positional relationship (task state) between the end effector T and the work W in steps S306 and S315, the change in the coordinates of the work W is considered. Therefore, the manipulator 4 can appropriately hold the workpiece W with the end effector T based on the learning result. Therefore, according to this embodiment, it is possible to increase the versatility of the ability to perform the learned task, thereby reducing the cost of teaching the manipulator 4 the task.

また、本実施形態では、対象物間の位置関係は、相対座標により表現される。これにより、２つの対象物の間の位置関係を適切かつ端的に表現することができる。そのため、２つの対象物の間の位置関係（制御の場面では、タスク状態）を把握し易くすることができる。 Further, in this embodiment, the positional relationship between objects is represented by relative coordinates. This makes it possible to express the positional relationship between the two objects appropriately and concisely. Therefore, it is possible to easily grasp the positional relationship (task state in the case of control) between the two objects.

また、本実施形態に係る第１モデル生成装置１は、上記ステップＳ１０１及びステップＳ１０２の処理により、機械学習を実施することで、対象の位置関係において２つの対象物が接触するか否かを判定するための判定モデル５０を生成する。機械学習により生成された学習済みの判定モデル５０によれば、対象の位置関係が連続値で与えられても、判定モデル５０のデータ量の大きな増加を伴うことなく、その位置関係で２つの対象物が互いに接触するか否かを判定することができる。したがって、本実施形態によれば、２つの対象物が接触する境界を表現する情報のデータ量を大幅に低減することができる。 Further, the first model generation device 1 according to the present embodiment performs machine learning through the processing of steps S101 and S102, thereby generating the judgment model 50 for determining whether two objects come into contact with each other in a target positional relationship. With the learned judgment model 50 generated by machine learning, even if the target positional relationship is given as continuous values, it is possible to determine whether the two objects come into contact with each other in that positional relationship without a large increase in the data amount of the judgment model 50. Therefore, according to the present embodiment, the data amount of the information representing the boundary where two objects come into contact can be greatly reduced.

ここで、図２４を更に用いて、この作用効果の具体例について説明する。図２４は、２つの対象物が互いに接触するか否かを示す値を座標点毎に保持する形態の一例を模式的に例示する。白丸が、その座標に対応する位置関係において２つの対象物が互いに接触しないことを示し、黒丸が、その座標に対応する位置関係において２つの対象物が互いに接触することを示す。図２４では、２次元により各座標点を表現しているが、上記６次元の相対座標の空間では、各座標点は、６次元で表現される。この場合、空間の解像度（分解能）を上げると、６乗のオーダでデータ量が増加してしまう。例えば、実空間での運用に利用可能な解像度で座標点を設定すると、当該情報のデータ量は、簡単に、ギガバイト単位になり得る。 Here, a specific example of this effect will be described with further reference to FIG. 24. FIG. 24 schematically illustrates an example of a form in which a value indicating whether or not two objects come into contact with each other is held for each coordinate point. A white circle indicates that the two objects do not contact each other in the positional relationship corresponding to those coordinates, and a black circle indicates that the two objects contact each other in the positional relationship corresponding to those coordinates. In FIG. 24, each coordinate point is expressed in two dimensions, but in the six-dimensional relative coordinate space described above, each coordinate point is expressed in six dimensions. In this case, raising the spatial resolution increases the data amount on the order of the sixth power. For example, if coordinate points are set at a resolution usable for operation in real space, the data amount of this information can easily reach the order of gigabytes.

In contrast, in the present embodiment, the information indicating whether or not two objects come into contact with each other in a target positional relationship is held by the trained determination model 50. Although the number of computational parameters of the trained determination model 50 may depend on the number of dimensions of the relative coordinates, continuous values can be handled without increasing the number of computational parameters. Therefore, for example, when the determination model 50 is configured by a neural network with a three-layer structure as described later, the data amount of the trained determination model 50 can be kept to about several megabytes. Thus, according to the present embodiment, the data amount of the information representing the boundary at which two objects come into contact can be greatly reduced.
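As a rough check of this point, the parameter count of a fully connected network with an input layer, one intermediate layer, and an output layer can be computed directly. The following is a minimal sketch; the intermediate-layer widths and the 32-bit parameter size are assumptions chosen for illustration, and the actual size of the determination model 50 depends on its design.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


BYTES_PER_PARAM = 4  # assumption: 32-bit floating-point parameters

# Input: six-dimensional relative coordinates. Output: one contact score.
for hidden in (256, 4096, 65536):  # assumed intermediate-layer widths
    params = mlp_param_count([6, hidden, 1])
    size_mb = params * BYTES_PER_PARAM / 1e6
    print(f"hidden={hidden:>6}: {params:>7} parameters, {size_mb:.3f} MB")
```

The count depends only on the layer sizes, not on how finely the input coordinates are resolved, which is why continuous values do not enlarge the model.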

In the present embodiment, the hand of the manipulator 4 and the target object are the two objects for which the trained determination model 50 determines whether or not contact occurs. Therefore, in the scene of defining the operation of the manipulator 4, the data amount of the information representing the boundary at which two objects come into contact can be greatly reduced. In the second model generation device 2, the trained determination model 50 can be used even if the capacities of the RAM, the ROM, and the storage unit 22 are relatively small, and this makes it possible to generate the inference model 55 for determining the target task state such that the hand does not needlessly come into contact with the target object.

In addition, the second model generation device 2 according to the present embodiment uses the trained determination model 50 in the processing of steps S201 to S210 to generate the inference model 55 for determining the target task state such that the first object does not come into contact with the second object. In step S308, the control device 3 according to the present embodiment determines the target task state by using the generated inference model 55. As a result, the control device 3 according to the present embodiment can determine the target task state such that the first object does not come into contact with the second object, that is, such that the hand of the manipulator 4 does not needlessly come into contact with the target object, even without the computation of the trained determination model 50. Therefore, the computational cost of controlling the operation of the manipulator 4 can be reduced.

§4 Modifications
Although the embodiment of the present invention has been described in detail above, the foregoing description is in all respects merely an illustration of the present invention. Needless to say, various improvements or modifications can be made without departing from the scope of the present invention. For example, the following changes are possible. In the following, the same reference signs are used for components similar to those of the above embodiment, and description of points similar to the above embodiment is omitted as appropriate. The following modifications can be combined as appropriate.

<4.1>
In the above embodiment, the end effector T, the workpiece W, and the other workpiece G are each an example of an object. In particular, the other workpiece G is an example of an object to which the workpiece W is to be assembled. When the end effector T is not holding the workpiece W, the workpiece W is an example of the target object with which the hand of the manipulator 4 comes into contact; when the end effector T is holding the workpiece W, the other workpiece G is an example of the target object with which the hand of the manipulator 4 comes into contact. However, the objects need not be limited to such examples. The objects may include any kind of object that can be handled in real space or virtual space. Besides the end effector T, the workpiece W, and the other workpiece G, an object may be, for example, an object that can be related to the operation of the manipulator, such as an obstacle.

Note that one object may be composed of a single physical body or of a plurality of bodies. When three or more bodies are present, the determination model 50 may be configured to regard a plurality of the bodies as one object and to determine whether or not contact occurs between that plurality of bodies and another body. Alternatively, the determination model 50 may be configured to regard each individual body as one object and to determine whether or not contact occurs between the respective bodies.
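The two ways of handling three or more bodies can be sketched as follows. This is an illustration under stated assumptions: the bodies are reduced to one-dimensional intervals, and the function `intervals_touch` is a hypothetical stand-in for a query to the determination model 50, which in the embodiment takes relative coordinates rather than intervals.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Tuple

Interval = Tuple[float, float]


def intervals_touch(p: Interval, q: Interval) -> bool:
    """Hypothetical stand-in for a contact query between two bodies."""
    return p[0] <= q[1] and q[0] <= p[1]


def contacting_pairs(bodies: Dict[str, Interval],
                     in_contact: Callable[[Interval, Interval], bool]
                     ) -> List[Tuple[str, str]]:
    """Each body is its own object: test every pair of bodies."""
    return [(a, b) for a, b in combinations(sorted(bodies), 2)
            if in_contact(bodies[a], bodies[b])]


def group_contacts(group: Iterable[Interval], other: Interval,
                   in_contact: Callable[[Interval, Interval], bool]) -> bool:
    """Several bodies form one object: contact if any member touches `other`."""
    return any(in_contact(member, other) for member in group)


bodies = {"T": (0.0, 1.0), "W": (1.0, 2.0), "G": (2.5, 3.0)}
pairs = contacting_pairs(bodies, intervals_touch)
held_vs_g = group_contacts([bodies["T"], bodies["W"]], bodies["G"], intervals_touch)
print(pairs, held_vs_g)
```

In this toy layout the bodies labeled T and W touch each other, while neither reaches G, so treating T and W as one object reports no contact with G.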

In the above embodiment, at least one of the two objects is an object that is moved by the operation of the manipulator. The object moved by the operation of the manipulator may be a component of the manipulator such as an end effector, may be the manipulator itself, or may be an object other than a component of the manipulator, such as a workpiece held by the end effector. However, the types of the objects need not be limited to such examples. Both of the two objects may be objects other than those moved by the operation of the manipulator.

In the above embodiment, the manipulator 4 is a vertical articulated robot. However, when the hand coordinates of the manipulator 4 are estimated from the current values of the joint angles obtained by the encoders S2, the type of the manipulator 4 is not particularly limited as long as it has one or more joints, and may be selected as appropriate according to the embodiment. In other cases, the manipulator 4 may include components other than joints. Besides the vertical articulated robot, the manipulator 4 may include a SCARA robot, a parallel link robot, a Cartesian robot, a collaborative robot, and the like. In addition, in the above embodiment, the control command is composed of command values for the angles of the joints. However, the configuration of the control command need not be limited to such an example and may be determined as appropriate according to the type of the manipulator 4.

In the above embodiment, the work of holding the workpiece W with the end effector T and the work of assembling the held workpiece W to the other workpiece G are each an example of a task performed by the manipulator. The type of task is not particularly limited as long as at least part of its process involves movement of the hand of the manipulator, and may be selected as appropriate according to the embodiment. Besides holding the workpiece W and transporting the workpiece W, the task may be, for example, fitting parts or turning a screw. The task may be a simple job such as holding a workpiece or releasing a workpiece. The task may be changing the coordinates of a target workpiece, for example, holding the target workpiece and placing it at designated coordinates (position and orientation). The task may be, for example, using a spray as the end effector and spraying paint onto a workpiece with the spray from designated relative coordinates. The task may also be, for example, placing a camera attached to the end effector at designated coordinates. The task may be given in advance or may be given by designation by an operator.

In the above embodiment, the first sensor system is composed of the encoders S2 and the tactile sensor S3, and the second sensor system is composed of the camera S1. However, as long as each sensor system can observe the hand of the manipulator 4, the types of sensors used for each sensor system need not be limited to such examples and may be selected as appropriate according to the embodiment. At least some of the sensors may be shared between the first sensor system and the second sensor system. Besides a camera, an encoder, and a tactile sensor, the sensors may be, for example, a proximity sensor, a force sensor, a torque sensor, or a pressure sensor. The proximity sensor may be placed in a range from which the surroundings of the end effector T can be observed and may be used to observe the presence or absence of an object approaching the end effector T. The force sensor, the torque sensor, and the pressure sensor may, like the tactile sensor S3, be placed in a range in which the force acting on the end effector T can be measured and may be used to observe the force acting on the end effector T. At least one of the proximity sensor, the force sensor, the torque sensor, and the pressure sensor may be used as a sensor that observes the state of the workpiece W with respect to the end effector T. Note that the camera S1 may be configured to be freely movable by the manipulator 4 or another robot device. In this case, the coordinates of the camera S1 may be calibrated as appropriate. This allows the range observed by the camera S1 to be controlled freely.

In the above embodiment, the tactile sensor S3 may be omitted from the first sensor system. When the tactile sensor S3 is omitted, a sensor other than the tactile sensor S3 (for example, the camera S1) may be used to estimate the second homogeneous transformation matrix (_tT^w). The first sensor system may include another sensor for observing the state of the workpiece W with respect to the end effector T. Alternatively, the second homogeneous transformation matrix (_tT^w) may be given as a constant. In addition, in the above embodiment, the setting of the hand of the manipulator 4 according to whether or not the workpiece W is held may be omitted. In this case, the hand of the manipulator 4 may be set as appropriate. For example, the end effector T may be set as the hand of the manipulator 4 regardless of whether or not the workpiece W is held.

<4.2>
In the above embodiment, the trained determination model 50 is used when the second model generation device 2 generates the inference model 55. However, the way the trained determination model 50 is used need not be limited to such an example. The control device 3 according to the above embodiment may use the trained determination model 50 when controlling the operation of the manipulator 4. In this case, the learning result data 125 may be provided to the control device 3 at any timing, as described above. In addition, the control device 3 is configured to further include a contact determination unit as a software module.

FIG. 25 illustrates an example of the processing procedure of a subroutine for determining the target task state according to this modification. The processing of determining the target task state in step S308 above may be replaced with the processing of the subroutine illustrated in FIG. 25.

In step S501, the control unit 31 operates as the action determination unit 316 and determines, for the acquired current task state, the target task state to transition to next so as to approach the final target task state. Step S501 may be processed in the same manner as step S308 above.

In step S502, the control unit 31 operates as the contact determination unit and uses the trained determination model 50 to determine whether or not the two objects come into contact with each other in the determined target task state. Step S502 may be processed in the same manner as steps S203 and S206 above.

In step S503, the control unit 31 determines the branch destination of the processing on the basis of the determination result of step S502. When it is determined in step S502 that the two objects come into contact with each other in the target task state, the control unit 31 returns the processing to step S501 and determines the target task state again. On the other hand, when it is determined that the two objects do not come into contact with each other in the target task state, the control unit 31 executes the processing of the next step S309. In this way, when controlling the operation of the manipulator 4, the control device 3 can use the trained determination model 50 to determine the operation of the manipulator 4 such that the hand of the manipulator 4 does not needlessly come into contact with the target object.
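The flow of steps S501 to S503 can be sketched as a retry loop. The following is a minimal sketch, not the implementation of the embodiment: the function names, the one-dimensional toy task state, the `attempt` argument passed to the proposal function, and the retry limit are all assumptions introduced for illustration.

```python
from typing import Callable, Tuple

State = Tuple[float, ...]


def decide_contact_free_target(
    current: State,
    final: State,
    propose: Callable[[State, State, int], State],
    contacts: Callable[[State], bool],
    max_retries: int = 100,  # assumed guard; the described flow simply loops
) -> State:
    """Repeat S501 and S502 until a contact-free target task state is found."""
    for attempt in range(max_retries):
        candidate = propose(current, final, attempt)  # step S501
        if not contacts(candidate):                   # steps S502 and S503
            return candidate                          # hand over to step S309
    raise RuntimeError("no contact-free target task state found")


# Hypothetical 1-D stand-ins: the step shrinks on each retry, and an
# obstacle occupies the interval [0.3, 0.5].
def propose_toward_goal(current: State, final: State, attempt: int) -> State:
    step = 0.4 / (attempt + 1)
    return tuple(c + step * (1 if f >= c else -1) for c, f in zip(current, final))


def hits_obstacle(state: State) -> bool:
    return 0.3 <= state[0] <= 0.5


target = decide_contact_free_target((0.0,), (1.0,), propose_toward_goal, hits_obstacle)
print(target)
```

In this toy run the first candidate falls inside the obstacle interval and is rejected, and the second, shorter step is accepted. In practice the contact check would be a query to the trained determination model 50.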

<4.3>
In the above embodiment, the control device 3 determines the target task state by using the inference model 55 in step S308. However, the method of determining the target task state need not be limited to such an example. The inference model 55 need not be used to determine the target task state. For example, in step S308, the target task state may be determined in the same manner as in step S205 above. As one example, the control unit 31 may determine the target task state by a known method such as path planning. Alternatively, for example, a sequence of target task states may be given in advance. In this case, in step S308, the control unit 31 may determine the target task state to transition to next by referring to data indicating that sequence. The same applies to step S501 above.

In the above embodiment, the generation of the inference model 55 (steps S201 to S211) may be omitted. Alternatively, in the above embodiment, the control device 3 may incorporate the components of the second model generation device 2. The control device 3 may thereby be configured to further execute the series of processes for generating the inference model 55 (steps S201 to S211). In these cases, the second model generation device 2 may be omitted from the control system 100.

In the above embodiment, the determination results of the trained determination model 50 are used to collect the learning data 223. However, the collection of the learning data 223 need not be limited to such an example. For example, the learning data 223 may be collected without using the trained determination model 50, such as by using the real objects. The inference model 55 may thereby be generated without using the trained determination model 50.

In the above embodiment, the processing for collecting the learning data 223 in steps S201 to S209 may be performed by another computer. In this case, the second model generation device 2 according to the above embodiment may acquire the learning data 223 generated by the other computer and execute steps S210 and S211 by using the acquired learning data 223.

<4.4>
In the above embodiment, the positional relationship between two objects is expressed by relative coordinates. However, the method of expressing the positional relationship need not be limited to such an example. For example, the positional relationship may be expressed by the absolute coordinates of each of the two objects. In this case, the absolute coordinates may be converted into relative coordinates before each of the above information processing operations is executed.
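The conversion from absolute coordinates to relative coordinates can be illustrated in a simplified setting. The following is a minimal sketch assuming planar poses (x, y, heading) instead of the six-dimensional coordinates of the embodiment; the function name `relative_pose` is hypothetical.

```python
import math


def relative_pose(pose_a, pose_b):
    """Pose of object B expressed in the local frame of object A.

    Each pose is (x, y, theta) in a common world frame, theta in radians.
    """
    xa, ya, ta = pose_a
    xb, yb, tb = pose_b
    dx, dy = xb - xa, yb - ya
    c, s = math.cos(ta), math.sin(ta)
    # Rotate the world-frame offset into A's frame (transpose of A's rotation).
    x_rel = c * dx + s * dy
    y_rel = -s * dx + c * dy
    # Wrap the heading difference into (-pi, pi].
    t_rel = math.atan2(math.sin(tb - ta), math.cos(tb - ta))
    return (x_rel, y_rel, t_rel)


# A faces along the world +y axis; B lies two units further along +y.
rel = relative_pose((1.0, 0.0, math.pi / 2), (1.0, 2.0, math.pi / 2))
print(rel)
```

In this example object B sits two units ahead of object A along A's own x axis with no relative rotation, up to floating-point rounding. The same idea extends to three dimensions by using homogeneous transformation matrices.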

<4.5>
In the above embodiment, the control device 3 calculates the target value of the hand coordinates from the target task state in step S309. However, the method of acquiring the target value of the hand coordinates need not be limited to such an example. The target value of the hand coordinates may be determined as appropriate so as to approach the final target task state.

For example, the target value of the hand coordinates may be determined directly from the current value of the hand coordinates and the value of the hand coordinates in the final target task state. As one example, reference data such as a data table may be used to determine the target value of the hand coordinates. In this case, the control unit 31 can acquire the target value of the hand coordinates from the reference data by collating the current value of the hand coordinates and the value of the hand coordinates in the final target task state against the reference data. As another example, the control unit 31 may determine the target value of the hand coordinates by the nearest neighbor method or the like such that the hand coordinates reach, over the shortest distance, the value in the final target task state from the current value. As yet another example, as with the determination model 50, the inference model 55, and the like, a trained machine learning model that has acquired through machine learning the ability to determine the target value of the hand coordinates from the current value of the hand coordinates and the value of the hand coordinates in the final target task state may be used. In this case, the control unit 31 gives the current value of the hand coordinates and the value of the hand coordinates in the final target task state to the trained machine learning model and executes the computation of the trained machine learning model. The control unit 31 can thereby acquire, from the trained machine learning model, an output value corresponding to the result of determining the target value of the hand coordinates.
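One simple way to determine the target value directly from the current and final hand coordinates is to step along the straight line between them. The following is a minimal sketch of that rule only; it is not the nearest neighbor method named above, it handles position without orientation, and the function name and the `max_step` parameter are assumptions.

```python
import math


def next_hand_target(current, final, max_step):
    """Next target on the straight line from `current` toward `final`.

    Moves at most `max_step` (same length unit as the coordinates).
    """
    delta = [f - c for c, f in zip(current, final)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist <= max_step:
        return tuple(final)  # the final value is within reach
    scale = max_step / dist
    return tuple(c + scale * d for c, d in zip(current, delta))


step = next_hand_target((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), max_step=1.0)
print(step)
```

Calling the function repeatedly with its own output as the new current value traces the shortest path to the final value in fixed-length increments.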

In this case, steps S306, S308, S309, and S315 may be omitted from the processing procedure of the control device 3. In addition, the state acquisition unit 315 and the action determination unit 316 may be omitted from the software configuration of the control device 3.

In the above embodiment, when the final target task state is set by another method, for example, when it is set in advance, the processing of steps S301 and S302 may be omitted from the processing procedure of the control device 3. In this case, the target setting unit 310 may be omitted from the software configuration of the control device 3.

In the above embodiment, the first model generation device 1 may be omitted from the control system 100. In this case, a form in which the information indicating whether or not two objects come into contact with each other is held by a method other than the trained determination model 50 may be adopted. For example, a form in which a value indicating whether or not two objects come into contact with each other is held for each coordinate point may be adopted.
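A form that holds a value for each coordinate point can be sketched as a lookup keyed by quantized coordinates. The following is an illustration only; the class name `ContactTable`, the grid step, and the choice of storing only the contacting cells in a set are assumptions.

```python
class ContactTable:
    """Holds contact flags on a regular grid of relative coordinates."""

    def __init__(self, step):
        self.step = step              # grid spacing (assumed uniform on all axes)
        self._contact_cells = set()   # only cells flagged as "contact" are stored

    def _cell(self, coords):
        # Snap continuous coordinates to the nearest grid point.
        return tuple(round(c / self.step) for c in coords)

    def mark_contact(self, coords):
        self._contact_cells.add(self._cell(coords))

    def contacts(self, coords):
        return self._cell(coords) in self._contact_cells


table = ContactTable(step=0.01)
table.mark_contact((0.10, 0.20))
print(table.contacts((0.101, 0.199)), table.contacts((0.30, 0.20)))
```

Unlike a trained model, such a table answers only at the resolution of its grid, so a finer answer requires more stored points.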

<4.6>
In the above embodiment, the determination model 50 is configured by a fully connected neural network. However, the type of neural network constituting the determination model 50 need not be limited to such an example. Besides a fully connected neural network, the determination model 50 may be configured by, for example, a convolutional neural network or a recurrent neural network. The determination model 50 may also be configured by a combination of multiple types of neural networks.

The type of machine learning model constituting the determination model 50 need not be limited to a neural network and may be selected as appropriate according to the embodiment. Besides a neural network, a machine learning model such as a support vector machine, a regression model, or a decision tree may be adopted as the determination model 50. Whether or not two objects come into contact with each other may be determined for real space or for virtual space.

In the above embodiment, the inference model 55 may be prepared for each type of task performed by the manipulator 4. That is, a plurality of inference models 55 may be prepared, each trained to infer the target task state in a different task. In this case, the control unit 31 of the control device 3 may select the inference model 55 to be used for inference from among the prepared inference models 55 according to the operation mode set in step S305 above. The control unit 31 may thereby switch the inference model 55 according to the operation mode. Alternatively, the inference model 55 may be configured to further accept input of information indicating the conditions of the task, such as the type of object, an identifier of the object, an identifier of the task, or the type of task, and to infer the target task state in the task corresponding to the input conditions. In this case, when determining the target task state to transition to next, the control unit 31 may further input information indicating the operation mode set in step S305 to the inference model 55 and execute the computation of step S308.
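Switching the inference model according to the operation mode can be sketched as a dictionary lookup. The following is a minimal sketch: the mode names "approach" and "assemble" and the simple proportional-step functions standing in for trained inference models 55 are hypothetical.

```python
from typing import Callable, Dict, Tuple

State = Tuple[float, ...]
InferenceModel = Callable[[State, State], State]


def make_step_model(gain: float) -> InferenceModel:
    """Hypothetical stand-in for a trained inference model."""
    def model(current: State, final: State) -> State:
        return tuple(c + gain * (f - c) for c, f in zip(current, final))
    return model


# One prepared model per operation mode (mode names are assumptions).
MODELS: Dict[str, InferenceModel] = {
    "approach": make_step_model(0.5),
    "assemble": make_step_model(0.1),
}


def decide_target_state(mode: str, current: State, final: State) -> State:
    """Pick the model for the current operation mode and run it."""
    try:
        model = MODELS[mode]
    except KeyError:
        raise ValueError(f"no inference model prepared for mode {mode!r}") from None
    return model(current, final)


print(decide_target_state("approach", (0.0,), (1.0,)))
print(decide_target_state("assemble", (0.0,), (1.0,)))
```

The alternative described above, a single model that receives the task condition as an additional input, would replace the dictionary with one model whose input includes the mode.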

In the above embodiment, the input and output formats of the determination model 50 and the inference model 55 are not particularly limited and may be determined as appropriate according to the embodiment. For example, the determination model 50 may be configured to further accept input of information other than the information indicating the task state. Similarly, the inference model 55 may be configured to further accept input of information other than the current task state and the final target task state. When the final target task state is fixed, the information indicating the final target task state may be omitted from the input of the inference model 55. The output format of the determination model 50 and the inference model 55 may be either classification or regression.

In the above embodiment, the transformation function used for the forward kinematics calculation (the first transformation matrix group or the second transformation matrix group) is an example of the first estimation model 61. The transformation function (ψ) that converts values in the task space into values in the observation space, or each of the coordinates (T_t, T_w), is an example of the second estimation model 62. The configuration of each estimation model (61, 62) need not be limited to such examples. Each estimation model (61, 62) may be configured as appropriate so that the hand coordinates of the manipulator 4 can be calculated from the corresponding sensing data (323, 324).

Each estimation model (61, 62) includes parameters for calculating the hand coordinates from the corresponding sensing data (323, 324). The type of each estimation model (61, 62) is not particularly limited and may be selected as appropriate according to the embodiment. Each estimation model (61, 62) may be expressed by, for example, a function expression or a data table. When expressed by a function expression, each estimation model (61, 62) may be configured by a machine learning model such as a neural network, a support vector machine, a regression model, or a decision tree.
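An estimation model expressed as a function with parameters can be illustrated by a forward kinematics calculation. The following is a minimal sketch assuming a planar serial arm rather than the six-joint manipulator 4; the link lengths play the role of the model parameters, and the function name is hypothetical.

```python
import math


def forward_kinematics(joint_angles, link_lengths):
    """Hand pose (x, y, heading) of a planar serial arm.

    joint_angles: joint angles in radians, for example from encoders.
    link_lengths: model parameters, one length per link.
    """
    x = y = heading = 0.0
    for angle, length in zip(joint_angles, link_lengths):
        heading += angle                 # accumulate rotation along the chain
        x += length * math.cos(heading)  # advance along the current link
        y += length * math.sin(heading)
    return (x, y, heading)


hand = forward_kinematics((math.pi / 2, -math.pi / 2), (1.0, 1.0))
print(hand)
```

In this example the first link points straight up and the second points along the x axis, so the hand ends up one unit right and one unit up from the base. Adjusting the link-length parameters changes the estimate without changing the function itself.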

<4.7>
In the above embodiment, the control device 3 is configured to be able to execute the information processing for parameter adjustment of both FIG. 20 and FIG. 21. However, the configuration of the control device 3 need not be limited to such an example. In the above embodiment, the control device 3 may be configured to execute only one of FIG. 20 and FIG. 21. For example, the control device 3 may be configured to omit the information processing for parameter adjustment shown in FIG. 20 and to execute the information processing for parameter adjustment shown in FIG. 21 when the hand of the manipulator 4 comes into contact with an object.

1 ... first model generation device,
11 ... control unit, 12 ... storage unit, 13 ... communication interface,
14 ... external interface,
15 ... input device, 16 ... output device, 17 ... drive,
91 ... storage medium, 81 ... model generation program,
111 ... data acquisition unit, 112 ... machine learning unit,
113 ... storage processing unit,
120 ... CAD data,
121 ... learning data set,
122 ... training data, 123 ... correct answer data,
125 ... learning result data,
2 ... second model generation device,
21 ... control unit, 22 ... storage unit, 23 ... communication interface,
24 ... external interface,
25 ... input device, 26 ... output device, 27 ... drive,
92 ... storage medium, 82 ... model generation program,
211 ... contact determination unit, 212 ... data collection unit,
213 ... model generation unit, 214 ... storage processing unit,
220 ... CAD data, 223 ... learning data,
225 ... inference model data,
3 ... control device,
31 ... control unit, 32 ... storage unit, 33 ... communication interface,
34 ... external interface,
35 ... input device, 36 ... output device, 37 ... drive,
93 ... storage medium, 83 ... control program,
310 ... target setting unit, 311 ... first data acquisition unit,
312 ... second data acquisition unit, 313 ... first estimating unit,
314 ... second estimating unit, 315 ... state acquisition unit,
316 ... action determination unit, 317 ... command determination unit,
318 ... drive unit, 319 ... adjustment unit,
320 ... CAD data, 321 ... robot data,
323 ... first sensing data,
324 ... second sensing data,
4 ... manipulator,
40 ... pedestal,
41 to 46 ... joints, 491 to 494 ... links,
T ... end effector,
T0 ... point of interest, CT ... local coordinate system,
W ... workpiece,
W0 ... point of interest, CW ... local coordinate system,
G ... other workpiece, CG ... local coordinate system,
RC1, RC2 ... relative coordinates,
S1 ... camera, S2 ... encoder, S3 ... tactile sensor,
50 ... determination model,
501 ... input layer, 502 ... intermediate (hidden) layer,
503 ... output layer,
55 ... inference model,
61 ... first estimation model, 62 ... second estimation model

Claims

A control device for controlling the operation of a manipulator, the control device comprising:
a first data acquisition unit that acquires first sensing data from a first sensor system that observes the hand of the manipulator;
a first estimation unit that calculates a first estimated value of the current coordinates of the hand in an observation space from the acquired first sensing data using a first estimation model;
a second data acquisition unit that acquires second sensing data from a second sensor system that observes the hand of the manipulator;
a second estimation unit that calculates a second estimated value of the current coordinates of the hand in the observation space from the acquired second sensing data using a second estimation model;
an adjustment unit that calculates a gradient of an error between the first estimated value and the second estimated value and, based on the calculated gradient, adjusts the value of a parameter of at least one of the first estimation model and the second estimation model so that the error is reduced;
a command determination unit that determines a control command to be given to the manipulator, based on at least one of the first estimated value and the second estimated value, so that the coordinates of the hand approach a target value; and
a drive unit that drives the manipulator by giving the determined control command to the manipulator.

The control device according to claim 1, wherein the adjustment unit further:
acquires, when the hand of the manipulator contacts an object, a boundary value of the coordinates of the hand on the boundary surface of contact with the object;
calculates a gradient of a first error between the first estimated value estimated at the time of contact and the acquired boundary value, and adjusts the value of the parameter of the first estimation model, based on the calculated gradient of the first error, so that the first error is reduced; and
calculates a gradient of a second error between the second estimated value estimated at the time of contact and the acquired boundary value, and adjusts the value of the parameter of the second estimation model, based on the calculated gradient of the second error, so that the second error is reduced.

The control device according to claim 1 or 2, wherein:
the manipulator comprises one or more joints;
the first sensor system comprises an encoder that measures the angle of each joint; and
the second sensor system comprises a camera.

The control device according to claim 3, wherein:
the manipulator further comprises an end effector for holding a workpiece;
when the end effector is not holding the workpiece, a point of interest of the end effector is set as the hand;
when the end effector is holding the workpiece, a point of interest of the workpiece is set as the hand; and
the first sensor system further comprises a tactile sensor for estimating the positional relationship of the workpiece with respect to the end effector.

A control method for controlling the operation of a manipulator, in which a computer executes the steps of:
acquiring first sensing data from a first sensor system that observes the hand of the manipulator;
calculating a first estimated value of the current coordinates of the hand in an observation space from the acquired first sensing data using a first estimation model;
acquiring second sensing data from a second sensor system that observes the hand of the manipulator;
calculating a second estimated value of the current coordinates of the hand in the observation space from the acquired second sensing data using a second estimation model;
calculating a gradient of an error between the first estimated value and the second estimated value;
adjusting, based on the calculated gradient, the value of a parameter of at least one of the first estimation model and the second estimation model so that the error is reduced;
determining a control command to be given to the manipulator, based on at least one of the first estimated value and the second estimated value, so that the coordinates of the hand approach a target value; and
driving the manipulator by giving the determined control command to the manipulator.

A control program for controlling the operation of a manipulator, the control program causing a computer to execute the steps of:
acquiring first sensing data from a first sensor system that observes the hand of the manipulator;
calculating a first estimated value of the current coordinates of the hand in an observation space from the acquired first sensing data using a first estimation model;
acquiring second sensing data from a second sensor system that observes the hand of the manipulator;
calculating a second estimated value of the current coordinates of the hand in the observation space from the acquired second sensing data using a second estimation model;
calculating a gradient of an error between the first estimated value and the second estimated value;
adjusting, based on the calculated gradient, the value of a parameter of at least one of the first estimation model and the second estimation model so that the error is reduced;
determining a control command to be given to the manipulator, based on at least one of the first estimated value and the second estimated value, so that the coordinates of the hand approach a target value; and
driving the manipulator by giving the determined control command to the manipulator.
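
For illustration only, the sequence of steps recited for the control method can be traced in a one-dimensional toy simulation. This is a sketch under stated assumptions, not the claimed implementation: the sensor biases, the additive `offset` parameters standing in for the estimation models, the averaging of the two estimates, the proportional `gain`, and the function name `control_step` are all hypothetical.

```python
def control_step(true_pos, target, bias1, bias2, offset1, offset2, lr=0.2, gain=0.5):
    """One pass through the recited steps for a 1-D toy manipulator.

    bias1 / bias2 model fixed sensor errors (assumption); offset1 / offset2
    are the adjustable parameters of the two toy estimation models.
    """
    # Acquire sensing data and estimate the hand coordinate with each model.
    est1 = (true_pos + bias1) + offset1
    est2 = (true_pos + bias2) + offset2

    # Gradient of E = 0.5 * (est1 - est2)^2: dE/d(offset1) = diff, dE/d(offset2) = -diff.
    diff = est1 - est2
    offset1 -= lr * diff
    offset2 += lr * diff

    # Determine a command that moves the estimated coordinate toward the target.
    # Averaging the two estimates is an assumption; either one alone could be used.
    fused = 0.5 * (est1 + est2)
    command = gain * (target - fused)

    # Drive the (simulated) manipulator with the command.
    true_pos += command
    return true_pos, offset1, offset2, abs(diff), fused


if __name__ == "__main__":
    pos, o1, o2 = 0.0, 0.0, 0.0
    for _ in range(20):
        pos, o1, o2, gap, fused = control_step(pos, 1.0, 0.05, -0.03, o1, o2)
    print(f"disagreement={gap:.2e}, fused estimate={fused:.4f}, true position={pos:.4f}")
```

In this toy setting the disagreement between the two estimates shrinks geometrically, but any bias common to both estimates remains, so the fused estimate reaches the target while the true position stays slightly off.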