JP6895563B2

JP6895563B2 - Robot system, model generation method, and model generation program

Info

Publication number: JP6895563B2
Application number: JP2020075789A
Authority: JP
Inventors: 岳山▲崎▼; 大輔岡野原; 叡一松元
Original assignee: Preferred Networks Inc
Current assignee: Preferred Networks Inc
Priority date: 2017-09-25
Filing date: 2020-04-22
Publication date: 2021-06-30
Anticipated expiration: 2037-09-25
Also published as: JP2020110920A

Description

The present invention relates to an apparatus, a robot system, a model generation method, and a model generation program that perform processing for taking out an object.

Conventionally, to detect the take-out position of an object (hereinafter also referred to as a "workpiece"), teaching has been performed using a distance image of the workpiece measured by a three-dimensional measuring unit that includes a distance image sensor. Commonly used methods for teaching with a distance image include, for example, CAD (Computer-Aided Design) matching and searching based on set parameters. Here, a distance image is an image obtained by measuring the surface of a measurement target (workpiece), in which each pixel of the captured image carries depth information measured from the three-dimensional measuring unit. In other words, each pixel of the distance image can be said to have three-dimensional coordinate information in the three-dimensional coordinate system of the three-dimensional measuring unit.
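The relationship between a depth pixel and its three-dimensional coordinates can be illustrated with a minimal sketch. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`; the patent text does not specify the sensor model, so these parameters and the function name `depth_to_points` are hypothetical.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]


def depth_to_points(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[List[Optional[Point3D]]]:
    """Back-project each depth pixel to (X, Y, Z) in the sensor frame.

    Assumption: pinhole model (not stated in the source text).
    Pixels with non-positive depth are treated as invalid and mapped to None.
    """
    points: List[List[Optional[Point3D]]] = []
    for v, row in enumerate(depth):          # v: pixel row index
        out_row: List[Optional[Point3D]] = []
        for u, z in enumerate(row):          # u: pixel column index
            if z <= 0:
                out_row.append(None)
            else:
                out_row.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
        points.append(out_row)
    return points
```

With such a mapping, every valid pixel of the distance image corresponds to one point of a three-dimensional point cloud.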

In the CAD matching method, CAD data corresponding to the shape of the workpiece is created first. Next, take-out positions at which the robot's hand can grip the workpiece are taught on the CAD data. The distance image obtained by measuring the workpiece is then searched for a location that matches the CAD data, and the position corresponding to the take-out position taught on the CAD data is selected at the matched location. The workpiece can be taken out by controlling the gripping by the robot's hand based on this selection result.
Patent Document 1, for example, discloses a technique for taking out workpieces piled in bulk by teaching workpiece take-out positions using such CAD data.

When searching based on set parameters, the instructor selects, based on experience, a search algorithm for finding take-out positions at which the robot's hand can grip the workpiece. The instructor then adjusts the parameters of the selected search algorithm while checking whether the workpiece is found in actual distance images, which makes it possible to take out the workpiece.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2017-124450 (JP-A-2017-124450)

As described above, the CAD data matching method is widely used. However, this method requires teaching the CAD model a plurality of take-out positions at which the hand can grip the workpiece, so the teaching takes time.
In addition, a CAD model is required for matching, but a CAD model cannot be created in the first place for an irregularly shaped workpiece. Furthermore, when the object to be taken out is not a finished product but a workpiece in the middle of machining, a CAD model of the in-process workpiece must be created additionally just for the take-out.

On the other hand, when searching a distance image based on parameters without using CAD data, a search algorithm must be selected based on the instructor's experience, taking into account the relationship between the workpiece and the hand, as described above.
Thus, with the conventional methods, it has not been easy to select the take-out position of a workpiece in a robot system.

Accordingly, an object of the present invention is to provide a robot system and a workpiece take-out method for selecting the take-out position of a workpiece by a simpler method.

(1) An apparatus of the present invention (for example, the image processing device 10a described later) includes: a reception unit (for example, the operation reception unit 13 described later) that accepts teaching of a position, in an image including a plurality of objects (for example, the workpieces 50 described later), for taking out at least one of the plurality of objects with a hand; and a learning unit (for example, the learning unit 14 described later) that trains, based on the teaching, a learning model for taking out at least one of the plurality of objects.

(2) The apparatus according to (1) above may further include a display unit (for example, the display unit 12 described later) that displays the image.
(3) In the apparatus according to (2) above, the display unit may draw the displayed position.
(4) In the apparatus according to any one of (1) to (3) above, the teaching of the position may designate the position as either a point or a region.

(5) The apparatus according to any one of (1) to (4) above may include a take-out position selection unit (for example, the selection processing unit 11 described later) that stores information on the point cloud at and near the taught position as search information, and selects a new take-out position for the hand by searching the image with the search information.
(6) In the apparatus according to any one of (1) to (5) above, the learning unit may train the learning model based on at least one of an evaluation value based on the teaching and the success or failure of taking out an object based on the teaching.

(7) In the apparatus according to any one of (1) to (5) above, the learning unit may train the learning model so that it outputs an evaluation value for point cloud information given as input data, by performing machine learning in which the information on the point cloud at and near the taught position is used as input data and at least one of an evaluation value according to the teaching for that point cloud information and an evaluation value according to the success or failure of the take-out is used as a label.

(8) The apparatus according to (7) above may further include a take-out position selection unit (for example, the selection processing unit 11 described later) that cuts out an image of a predetermined region from the image, inputs the point cloud information of the cut-out image into the learning model as input data, and selects a new take-out position for the hand based on the evaluation value that the learning model outputs for that point cloud information.
(9) In the apparatus according to any one of (1) to (8) above, the learning model may be a neural network.

(10) A robot system of the present invention (for example, the robot system 1a described later) includes: the apparatus according to any one of (1) to (8) above; a measuring machine (for example, the three-dimensional measuring machine 40 described later) that generates an image of the plurality of objects; and a robot (for example, the robot 30 described later) having the hand.

(11) A robot system of the present invention (for example, the robot system 1a described later) includes: a measuring machine (for example, the three-dimensional measuring machine 40 described later) that generates an image of a plurality of objects; a robot (for example, the robot 30 described later) having a hand for taking out at least one of the plurality of objects; and a reception unit (for example, the operation reception unit 13 described later) that accepts, based on the image, teaching of a take-out position for the take-out by the hand. The robot takes out at least one of the plurality of objects with the hand based on the point cloud information corresponding to the taught take-out position.

(12) The robot system according to (11) above may further include a display unit that displays the image.
(13) In the robot system according to (12) above, the display unit may draw the taught position on the displayed image.
(14) In the robot system according to any one of (11) to (13) above, the teaching of the position may designate the position as either a point or a region.
(15) The robot system according to any one of (11) to (14) above may further include a take-out position selection unit (for example, the selection processing unit 11 described later) that stores information on the point cloud at and near the taught position as search information, and selects a new take-out position for the hand by searching the image with the search information.

(16) A workpiece take-out method of the present invention is an object take-out method performed by a robot system (for example, the robot system 1a described later) that includes a measuring machine (for example, the three-dimensional measuring machine 40 described later) that generates an image of a plurality of objects and a robot (for example, the robot 30 described later) having a hand for taking out at least one of the plurality of objects. The method includes a reception step of accepting, based on the image, teaching of a take-out position for the take-out by the hand, and a step of training a learning model based on the teaching of the take-out position. The robot takes out at least one of the plurality of objects with the hand based on the learning model.

(17) In the object take-out method according to (16) above, the training step may train the learning model based on at least one of an evaluation value based on the teaching and the success or failure of taking out an object based on the teaching.
(18) In the object take-out method according to (16) above, the training step may train the learning model so that it outputs an evaluation value for point cloud information given as input data, by performing machine learning in which the information on the point cloud at and near the taught take-out position is used as input data and at least one of an evaluation value according to the teaching for that point cloud information and an evaluation value according to the success or failure of the take-out is used as a label.
(19) The object take-out method according to (18) above may further include a take-out position selection step of cutting out an image of a predetermined region from the image, inputting the point cloud information of the cut-out image into the learning model as input data, and selecting a new take-out position based on the evaluation value that the learning model outputs for that point cloud information. The robot may take out at least one of the plurality of objects with the hand based on the taught take-out position, and take out each object with the hand at the new take-out position selected in the take-out position selection step.

(20) Another robot system of the present invention (for example, the robot system 1b described later) includes: a measuring machine (for example, the three-dimensional measuring machine 40 described later) that generates an image of a plurality of objects (for example, the workpieces 50 described later); a robot having a hand for taking out at least one of the plurality of objects; and a position selection unit (for example, the take-out position selection unit 153 described later) that selects a take-out position of at least one of the plurality of objects based on an evaluation value map that is obtained by inputting the image into a learning model and that indicates regions where objects that can be taken out by the hand exist. The robot takes out at least one of the plurality of objects with the hand based on the selected take-out position.
(21) The robot system according to (20) above (for example, the robot system 1b described later) further includes: a display unit (for example, the display unit 12 described later) that displays the image; a reception unit (for example, the operation reception unit 13 described later) that accepts teaching of at least one teaching position based on the image displayed on the display unit; an annotation processing unit (for example, the annotation processing unit 152 described later) that generates a label map indicating the at least one teaching position based on the teaching position accepted by the reception unit, and stores the label map and the image in association with each other as a data set in a teacher data storage unit (for example, the teacher data storage unit 151 described later); and a learning processing unit (for example, the learning processing unit 141 described later) that performs machine learning using the data set stored in the teacher data storage unit as input and outputs the learning model.

(22) Another workpiece take-out method of the present invention is an object take-out method performed by a robot system (for example, the robot system 1b described later) that includes a measuring machine (for example, the three-dimensional measuring machine 40 described later) that generates an image of a plurality of objects (for example, the workpieces 50 described later) and a robot (for example, the robot 30 described later) having a hand for taking out at least one of the plurality of objects. The method includes a position selection step of selecting a take-out position of at least one of the plurality of objects based on an evaluation value map that is obtained by inputting the image into a learning model and that indicates regions where objects that can be taken out by the hand exist. The robot takes out at least one of the plurality of objects with the hand based on the selected take-out position.
(23) The object take-out method according to (22) above may further include: a display step of displaying the image; a reception step of accepting teaching of at least one teaching position based on the image displayed in the display step; an annotation processing step of generating a label map indicating the at least one teaching position based on the teaching position accepted in the reception step, and storing the label map and the image in association with each other as a data set in a teacher data storage unit; and a learning processing step of performing machine learning using the data set stored in the teacher data storage unit as input and outputting the learning model.
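The label map and the evaluation value map mentioned in items (20) to (23) can be pictured with a small sketch. It rests on assumptions that are not stated in the text above: the label map is binary, each teaching position marks a small square of radius `radius`, and the take-out position is taken at the maximum of the evaluation value map. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Position = Tuple[int, int]  # (row, column) in image coordinates


def make_label_map(
    height: int, width: int, teaching_positions: Sequence[Position], radius: int = 1
) -> List[List[int]]:
    """Build a binary label map: 1 near a teaching position, 0 elsewhere.

    Assumption: a square neighborhood of the given radius is marked; the
    source does not specify the shape or size of the labeled area.
    """
    label_map = [[0] * width for _ in range(height)]
    for r0, c0 in teaching_positions:
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                label_map[r][c] = 1
    return label_map


def select_from_value_map(value_map: List[List[float]]) -> Tuple[Position, float]:
    """Return the pixel with the highest evaluation value and that value."""
    best_pos, best_val = (0, 0), value_map[0][0]
    for r, row in enumerate(value_map):
        for c, v in enumerate(row):
            if v > best_val:
                best_pos, best_val = (r, c), v
    return best_pos, best_val
```

In this picture, pairs of an image and its label map would form the data set used for training, and the trained model would produce a value map of the same size as its input image.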

According to the present invention, the take-out position of a workpiece can be selected by a simpler method.

A schematic diagram showing the overall configuration of each embodiment of the present invention.
A block diagram showing the functional blocks of the image processing device in the first embodiment of the present invention.
A schematic diagram showing workpieces piled in bulk that are to be taken out in each embodiment of the present invention.
A diagram showing a distance image generated by measuring the workpieces piled in bulk in each embodiment of the present invention.
A diagram showing a distance image on which the taught take-out positions are drawn in each embodiment of the present invention.
A diagram showing the three-dimensional point cloud information around the drawn position that is recorded in the first embodiment of the present invention.
A diagram showing the cutout performed in the first embodiment of the present invention.
A diagram showing the scanning during the cutout performed in the first embodiment of the present invention.
A flowchart (1/2) showing the operation in the first embodiment of the present invention.
A flowchart (2/2) showing the operation in the first embodiment of the present invention.
A block diagram showing the functional blocks of the image processing device in the second embodiment of the present invention.
A flowchart showing the operation in the second embodiment of the present invention.

Next, embodiments of the present invention will be described in detail with reference to the drawings.
Two embodiments, a first embodiment and a second embodiment, are described below. The embodiments share the configuration of selecting the take-out position of a workpiece.
They differ, however, in the processing for selecting the take-out position: the first embodiment performs preprocessing such as image matching and image cutout, whereas the second embodiment omits such preprocessing.
In the following, the first embodiment is described in detail first, and then the parts of the second embodiment that differ from the first embodiment are described.

<Overall configuration of the embodiment>
The configuration of the robot system 1a according to the first embodiment will be described with reference to FIG. 1. The robot system 1a includes an image processing device 10a, a robot control device 20, a robot 30, a three-dimensional measuring machine 40, a plurality of workpieces 50, and a container 60.
The image processing device 10a is communicably connected to the robot control device 20 and the three-dimensional measuring machine 40. The robot control device 20 is communicably connected to the robot 30 as well as to the image processing device 10a.

First, an outline of the robot system 1a is given.
In the robot system 1a, the three-dimensional measuring machine 40 measures a plurality of workpieces 50 that are placed in disorder, including workpieces piled in bulk, and generates a distance image.

The image processing device 10a is then used to give the teaching for selecting the take-out position of a workpiece 50 from the distance image. This teaching is performed by the user directly indicating the take-out position of the workpiece 50 on the distance image. Specifically, the image processing device 10a displays the distance image on the image display unit, for example in grayscale or with an RGB gradation, so that the difference in height between the pixels that make up the distance image can be recognized.
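As an illustration of the grayscale display described above, the following minimal sketch maps height values linearly to 8-bit gray levels so that higher points appear brighter. The linear mapping, the use of `None` for missing measurements, and the function name are assumptions made for this example.

```python
from typing import List, Optional


def height_to_grayscale(height_image: List[List[Optional[float]]]) -> List[List[int]]:
    """Map heights linearly to gray levels 0..255 (higher point = brighter).

    Pixels without a measurement (None) are rendered as 0 (black).
    """
    valid = [h for row in height_image for h in row if h is not None]
    if not valid:
        return [[0] * len(row) for row in height_image]
    lo, hi = min(valid), max(valid)
    span = hi - lo

    def to_gray(h: Optional[float]) -> int:
        if h is None:
            return 0
        if span == 0:
            return 255  # flat surface: a single level is enough
        return round(255 * (h - lo) / span)

    return [[to_gray(h) for h in row] for row in height_image]
```

An RGB gradation would follow the same idea, with the normalized height indexing a color ramp instead of a gray ramp.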

When the user, referring to this display, selects a location suitable as a take-out position candidate, for example by operating a mouse, and designates it as the take-out position of a workpiece 50, the image processing device 10a draws the take-out position taught by the user (hereinafter referred to as the "teaching position") on the displayed distance image. The image processing device 10a also acquires and stores the distance image data around the take-out position taught by the user (hereinafter referred to as "teaching position neighborhood image data"). A location suitable as a take-out position candidate is, for example, a location where the three-dimensional points are high.
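Acquiring the teaching position neighborhood image data can be sketched as cropping a window around the taught pixel. The square window, its `half_size` parameter, and the clamping at the image border are assumptions for illustration; the text above does not fix the size or shape of the neighborhood.

```python
from typing import List


def crop_neighborhood(
    image: List[List[float]], center_row: int, center_col: int, half_size: int
) -> List[List[float]]:
    """Return the pixels within half_size of the teaching position.

    The window is clamped to the image bounds, so patches taken near the
    border are smaller than (2 * half_size + 1) pixels per side.
    """
    rows, cols = len(image), len(image[0])
    top = max(0, center_row - half_size)
    bottom = min(rows, center_row + half_size + 1)
    left = max(0, center_col - half_size)
    right = min(cols, center_col + half_size + 1)
    return [row[left:right] for row in image[top:bottom]]
```

The returned patch is what a later matching step would compare against new distance images.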

When the take-out position of a workpiece 50 is selected thereafter, the image processing device 10a searches the distance image for an image region that matches the stored teaching position neighborhood image data. The image processing device 10a then treats the matched image region as new teaching position neighborhood image data and selects, for example, the center position of that image data as a new take-out position. Because the image processing device 10a can select a new take-out position from the teaching position neighborhood image data stored based on the user's teaching, it can select the take-out position of a workpiece 50 without receiving new teaching from the user.
With this configuration, the robot system 1a can select the take-out position of a workpiece 50 by a simpler method than before.
The robot control device 20 generates a control signal for performing the take-out at the take-out position of the workpiece 50 selected by the image processing device 10a. The robot 30 then takes out the workpiece 50 based on the control signal generated by the robot control device 20. With this configuration, the robot system 1a can also actually take out the workpiece 50 based on the selected take-out position.
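The search for a region that matches the stored teaching position neighborhood image data can be sketched as exhaustive template matching. The sum of squared differences used as the matching score is an assumption chosen for simplicity; this overview does not name a matching criterion, and `find_best_match` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def find_best_match(
    image: List[List[float]], template: List[List[float]]
) -> Tuple[Optional[Tuple[int, int]], Optional[float]]:
    """Slide the template over the image and return (center, score) of the
    window with the smallest sum of squared differences (SSD).

    Returns (None, None) if the template does not fit inside the image.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_center: Optional[Tuple[int, int]] = None
    best_score: Optional[float] = None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = sum(
                (image[top + r][left + c] - template[r][c]) ** 2
                for r in range(th)
                for c in range(tw)
            )
            if best_score is None or score < best_score:
                best_score = score
                # The center of the matched window is the new take-out position.
                best_center = (top + th // 2, left + tw // 2)
    return best_center, best_score
```

A practical system would likely use an optimized routine and a score threshold to reject poor matches; this sketch only shows the principle of reusing the taught patch to find new take-out positions.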

Furthermore, the image processing device 10a performs machine learning based on the teaching results and builds a learning model for selecting the take-out position of a workpiece 50. The image processing device 10a then selects a new take-out position from the distance image based on the learning model it has built.
With this configuration, the robot system 1a can select the take-out position with higher accuracy.
The above is the outline of the robot system 1a. Next, each device included in the robot system 1a is described.
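Selecting a take-out position with a learning model can be sketched along the lines of item (8): cut out windows from the distance image, score each window, and keep the best one. In the sketch below, `score_fn` stands in for the trained model, and `mean_height` is only a placeholder scorer for demonstration; the window size, the stride, and all names are assumptions.

```python
from typing import Callable, List, Optional, Tuple

Patch = List[List[float]]


def select_by_sliding_window(
    image: List[List[float]],
    window: int,
    stride: int,
    score_fn: Callable[[Patch], float],
) -> Tuple[Optional[Tuple[int, int]], Optional[float]]:
    """Scan the image with a square window and return (center, score) of the
    window with the highest evaluation value given by score_fn."""
    best_center: Optional[Tuple[int, int]] = None
    best_score: Optional[float] = None
    for top in range(0, len(image) - window + 1, stride):
        for left in range(0, len(image[0]) - window + 1, stride):
            patch = [row[left:left + window] for row in image[top:top + window]]
            score = score_fn(patch)
            if best_score is None or score > best_score:
                best_score = score
                # Approximate window center (exact for odd window sizes).
                best_center = (top + window // 2, left + window // 2)
    return best_center, best_score


def mean_height(patch: Patch) -> float:
    """Placeholder scorer: a real system would call the trained model here."""
    return sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))
```

Replacing `mean_height` with a model's prediction function leaves the scanning logic unchanged.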

The image processing device 10a is a device for performing teaching and machine learning using distance images. The details of the image processing device 10a will be described later with reference to the functional block diagram of FIG. 2.

The robot control device 20 is a device for controlling the operation of the robot 30. The robot control device 20 generates a control signal for controlling the operation of the robot 30 based on information such as the take-out position of the workpiece 50 selected by the image processing device 10a, and outputs the generated control signal to the robot 30.

Note that the image processing device 10a and the robot control device 20 are assumed to have associated, through calibration performed in advance, the machine coordinate system used to control the robot 30 with the camera coordinate system that expresses the take-out position of the workpiece 50.
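One common way to express such a correspondence between coordinate systems is a 4x4 homogeneous transform obtained from calibration. The sketch below assumes that representation; the text does not state how the calibration result is stored, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def camera_to_machine(
    point_cam: Tuple[float, float, float],
    transform: Sequence[Sequence[float]],
) -> Tuple[float, float, float]:
    """Map a point from the camera frame to the machine frame.

    transform is a 4x4 homogeneous matrix (rotation plus translation),
    assumed to come from a calibration performed in advance.
    """
    x, y, z = point_cam
    vec = (x, y, z, 1.0)
    mx, my, mz = (
        sum(transform[i][j] * vec[j] for j in range(4)) for i in range(3)
    )
    return (mx, my, mz)
```

For example, a transform that rotates 90 degrees about the z axis and translates by (10, 20, 30) would be written as `[[0, -1, 0, 10], [1, 0, 0, 20], [0, 0, 1, 30], [0, 0, 0, 1]]`.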

The robot 30 is a robot that operates under the control of the robot control device 20. The robot 30 includes a base that rotates about a vertical axis, an arm that moves and rotates, and a hand attached to the arm for gripping the workpiece 50.
In response to the control signal output by the robot control device 20, the robot 30 drives the arm and the hand, moves the hand to the teaching position, grips a workpiece 50 piled in bulk, and takes it out of the container 60.
The transfer destination of the workpiece 50 that has been taken out is not illustrated. The specific configurations of the robot 30 and the hand are well known to those skilled in the art, so a detailed description is omitted.

The three-dimensional measuring machine 40 generates a distance image by measuring the workpieces 50 in the container 60. The three-dimensional measuring machine 40 can be realized by, for example, a camera equipped with a distance image sensor as mentioned above, or a three-dimensional measuring machine using a stereo camera. However, it is not limited to these: any other type of three-dimensional measuring machine may be used as the three-dimensional measuring machine 40 as long as it can generate a distance image of a three-dimensional point cloud in the course of measurement.
The distance image generated by the three-dimensional measuring machine 40 is output to the image processing device 10a.

The workpieces 50 are placed in disorder in the container 60, including in a state of being piled in bulk. The workpieces 50 may be anything that can be gripped by the hand attached to the arm of the robot 30, and their shape and the like are not particularly limited.

<Functional blocks of the image processing device 10a>
Next, the functional blocks of the image processing device 10a will be described with reference to FIG. 2. In FIG. 2, the components of the robot system 1a other than the image processing device 10a are shown collectively as the environment 100.

The image processing device 10a includes a selection processing unit 11, a display unit 12, an operation reception unit 13, and a learning unit 14.

The selection processing unit 11 selects the take-out position of a work 50 by performing various kinds of image processing, as described above in the outline of the robot system 1a. The image processing performed by the selection processing unit 11 is broadly divided into two kinds: the "matching process" and the "cutout process".
To perform these processes, the selection processing unit 11 includes a selection data storage unit 111, an annotation processing unit 112, a matching unit 113, and a cutout unit 114.
The selection data storage unit 111 stores the various data used by the functional blocks in the selection processing unit 11.
The annotation processing unit 112 performs processing for receiving, from the user, teaching of the take-out position of a work 50.
The matching unit 113 performs the matching process, and the cutout unit 114 performs the cutout process.
The functions of these functional blocks and the details of the matching process and the cutout process are described later.

The display unit 12 displays the images output by the selection processing unit 11. These images are, for example, a distance image generated by measuring the works 50 with the three-dimensional measuring machine 40, or an image in which the teaching positions taught by the user are drawn on the distance image. The display unit 12 is realized by, for example, a liquid crystal display or an organic EL display.

The operation reception unit 13 receives operations from the user. For example, the operation reception unit 13 receives, from a user who has referred to the distance image displayed on the display unit 12, an operation that designates a take-out position in order to teach the take-out position of a work 50. The operation reception unit 13 is realized by, for example, a mouse or a keyboard.

For example, the user may use the mouse or keyboard to move a cursor displayed on the display unit 12 onto the take-out position (that is, onto the part of the image showing it) and designate it. Alternatively, the display unit 12 and the operation reception unit 13 may be realized integrally as a touch panel, so that the take-out position is designated by, for example, tapping the part of the image showing it.

The learning unit 14 performs processing related to machine learning. The learning unit 14 includes a learning processing unit 141, a trained model storage unit 142, and an estimation processing unit 143.
The learning processing unit 141 executes machine learning, for example deep learning using a convolutional neural network.
The trained model storage unit 142 stores the parameters of a learning model that the learning processing unit 141 is currently training, as well as the parameters of trained models.
The estimation processing unit 143 performs estimation using a trained model stored in the trained model storage unit 142 so that the selection processing unit 11 can select a take-out position.
The details of the machine learning performed by these units are described later under <Machine learning>.

The functional blocks included in the image processing device 10a have been described above.
To realize these functional blocks, the image processing device 10a includes an arithmetic processing unit such as a CPU (Central Processing Unit). The image processing device 10a also includes an auxiliary storage device such as an HDD (Hard Disk Drive), which stores various control programs such as application software and an OS (Operating System), and a main storage device such as a RAM (Random Access Memory), which stores data that the arithmetic processing unit needs temporarily while executing a program.

In the image processing device 10a, the arithmetic processing unit reads the application software and the OS from the auxiliary storage device and, while loading them into the main storage device, performs arithmetic processing based on them. Based on the results of this processing, the various hardware included in each device is controlled. The functional blocks of the first embodiment are realized in this way. In other words, the first embodiment can be realized by hardware and software working together.

As a specific example, the image processing device 10a can be realized by installing application software for realizing the first embodiment on a general personal computer or server device.
However, because the machine learning performed by the learning unit 14 involves a large amount of computation, it is preferable, for example, to mount a GPU (Graphics Processing Unit) on the computer and to use the GPU for the arithmetic processing associated with machine learning by means of the technique called GPGPU (General-Purpose computing on Graphics Processing Units), since this enables high-speed processing. Likewise, it is preferable to mount an FPGA (Field-Programmable Gate Array) on the computer and to use the FPGA for the arithmetic processing associated with machine learning, since this also enables high-speed processing.

Furthermore, for even faster processing, a computer cluster may be built from a plurality of computers equipped with such GPUs or FPGAs, and the computers in the cluster may perform the processing in parallel.

<Matching process>
Next, the processing performed by the selection processing unit 11 will be described in detail. As described above, the selection processing unit 11 performs two processes, the "matching process" and the "cutout process". First, the matching process will be described with reference to FIGS. 3 to 6.

FIG. 3 shows an overhead view of the works 50 piled in the container 60. As shown in FIG. 3, the works 50 are piled in bulk in disorder, so it is difficult for the hand of the robot 30 to grip and take out a work 50 unless the take-out position of that work 50 has been taught. In the figure, for convenience of illustration, reference numerals are given to only some of the works 50.

The three-dimensional measuring machine 40 generates a distance image by measuring the works 50 piled in bulk in the container 60, and transmits the generated distance image to the selection processing unit 11 of the image processing device 10a. The distance image received by the selection processing unit 11 is stored in the selection data storage unit 111.
To receive teaching of the take-out positions of the works 50 from the user, the annotation processing unit 112 acquires the distance image from the selection data storage unit 111 and outputs it to the display unit 12. The display unit 12 displays the input distance image as shown in FIG. 4. In the figure, for convenience of illustration, reference numerals are given to only some of the works 50.

Three-dimensional measurement yields a distance image in which each position in a two-dimensional pixel grid carries height information. The annotation processing unit 112 displays the distance image by expressing the height information of each pixel either as RGB color tones or as grayscale density. By referring to a distance image rendered in this way, the user who performs the teaching can recognize the depth information.

In FIG. 4, the height information of each pixel is represented by hatching because of the constraints on patent drawings, but the representation is not limited to this. For example, the height information of each pixel may be expressed by color tones, gradation, or the like. In FIG. 4, points close to the three-dimensional measuring machine 40 (that is, points high above the ground) are rendered white, and points far from the three-dimensional measuring machine 40 (that is, points low above the ground) are rendered black.
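The grayscale rendering described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `depth_to_gray`, the linear scaling between the nearest and farthest valid pixel, and the use of `None` for unmeasured pixels are all assumptions introduced here.

```python
def depth_to_gray(depth_rows, invalid=None):
    """Map each pixel's distance from the sensor to an 8-bit gray level.

    Pixels near the sensor (high above the ground) become white (255) and
    pixels far from it (low) become black (0). Pixels equal to `invalid`
    (unmeasured points; an assumption of this sketch) are rendered black.
    """
    valid = [d for row in depth_rows for d in row if d != invalid]
    if not valid:
        return [[0 for _ in row] for row in depth_rows]
    near, far = min(valid), max(valid)
    span = far - near
    image = []
    for row in depth_rows:
        out = []
        for d in row:
            if d == invalid:
                out.append(0)
            elif span == 0:
                out.append(255)  # a perfectly flat scene is rendered uniformly
            else:
                out.append(round(255 * (far - d) / span))
        image.append(out)
    return image


# Hypothetical 2x2 distance image in millimetres; None marks a missing point.
depth = [[800.0, 900.0], [1000.0, None]]
gray = depth_to_gray(depth)
```

An RGB color-tone rendering would follow the same pattern, with the normalized value mapped through a color table instead of a single gray channel.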

For example, because the work 50a is located higher above the ground than the work 50b, the work 50a is hatched so as to appear whiter than the work 50b. In addition, the surface corresponding to the work 50b is partly hidden by the surface corresponding to the work 50a. From this, the user can tell that the work 50a is lying on top of the work 50b.
Further, the surface corresponding to the work 50c, for example, differs in height from place to place. From this, the user can tell that the work 50c is lying at an incline.

Referring to the distance image displayed on the display unit 12 in this way, the user teaches the take-out position of a work 50 by designating it with the operation reception unit 13, which is realized by a mouse or a touch panel as described above. The operation reception unit 13 notifies the annotation processing unit 112 of the teaching position, that is, the taught take-out position of the work 50. The annotation processing unit 112 draws on the distance image at the notified teaching position so that the user can recognize it. The drawing is done in a way that is easy for the user to grasp, for example by changing the color of the pixels at the teaching position on the distance image. FIG. 5 shows a display example after such drawing.

The user teaches a take-out position for each work 50 that is in a position from which it can be taken out. As shown by the teaching positions 71 in the figure, drawing is performed so that the user can identify the positions that have been taught. In the figure, for convenience of illustration, reference numerals are given to only some of the teaching positions 71.

The annotation processing unit 112 acquires the three-dimensional point information of the teaching position, together with teaching position neighborhood image data, which is the three-dimensional point cloud information within a predetermined range around the teaching position and centered on it. The annotation processing unit 112 then associates the acquired three-dimensional point information of the teaching position with the teaching position neighborhood image data and stores them in the selection data storage unit 111 as matching point cloud information.

FIG. 6 shows an example of matching point cloud information as matching point cloud information 80. In the example of FIG. 6, the teaching point is drawn for the sake of explanation, but the teaching position neighborhood image data included in the matching point cloud information 80 does not include any drawing that indicates the teaching position. In the following description, the reference numeral "80" of the matching point cloud information is omitted.

The size of the range of the three-dimensional point cloud acquired as matching point cloud information is set in advance based on, for example, the size of the work 50. Alternatively, three-dimensional point cloud information covering a somewhat larger range may be stored as matching point cloud information and, when it is used in the matching described later, trimmed to the required size by adjusting a setting.

The annotation processing unit 112 may also add supplementary information to the matching point cloud information. For example, it may add information indicating the features of the matching point cloud information. Such feature information is, for example, the average height of the three-dimensional points of the pixels included in the matching point cloud information, or the height of the three-dimensional point of the pixel corresponding to the teaching position.
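The creation of matching point cloud information described above (teaching position, neighborhood data, later trimming, and supplementary features) can be sketched as follows. The text does not specify a data layout, so the class `MatchingPointCloudInfo`, the functions `extract_patch` and `make_info`, the square window, and the representation of the distance image as a grid of heights are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MatchingPointCloudInfo:
    # (row, col, height) of the taught take-out position
    teach_point: Tuple[int, int, float]
    # heights in the neighborhood of the taught position; no marker is drawn in it
    patch: List[List[float]]
    # optional supplementary features
    features: Dict[str, float] = field(default_factory=dict)


def extract_patch(height_map, row, col, half):
    """Return the window of side 2*half+1 centered on (row, col), clipped at borders."""
    r0, r1 = max(0, row - half), min(len(height_map), row + half + 1)
    c0, c1 = max(0, col - half), min(len(height_map[0]), col + half + 1)
    return [r[c0:c1] for r in height_map[r0:r1]]


def make_info(height_map, row, col, half):
    """Associate the taught point with its neighborhood and two example features."""
    patch = extract_patch(height_map, row, col, half)
    values = [h for r in patch for h in r]
    features = {
        "mean_height": sum(values) / len(values),
        "teach_height": height_map[row][col],
    }
    return MatchingPointCloudInfo((row, col, height_map[row][col]), patch, features)


# Hypothetical 5x5 height map with a single raised work in the middle.
height_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 2.0, 2.0, 0.0],
    [0.0, 2.0, 5.0, 2.0, 0.0],
    [0.0, 2.0, 2.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
info = make_info(height_map, 2, 2, half=1)

# Trimming a stored patch to a smaller size later reuses the same window logic.
trimmed = extract_patch(info.patch, 1, 1, half=0)
```

Storing a generous window and trimming at matching time, as in the last line, keeps the stored data reusable when the window size is tuned.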

In the first embodiment, a plurality of works 50 are piled in bulk. Therefore, when the user teaches a plurality of teaching positions on a single distance image measured by the three-dimensional measuring machine 40, as described above, take-out positions can be taught for each of the various postures that a work 50 can assume.

Teaching on a CAD model, described above as background art, assumes that three-dimensional points can be acquired uniformly over the shape. In actual imaging, however, three-dimensional points cannot be acquired uniformly over the shape because of optical conditions and the like. Since the ease of acquiring three-dimensional points thus differs between the assumption and actual imaging, this discrepancy can affect the degree of matching.

In the first embodiment, by contrast, a distance image made up of three-dimensional points acquired under the actual optical conditions is taken as input, and the three-dimensional point cloud information around the drawn position can be saved. This prevents the problem that arises with teaching on a CAD model, in which the teaching position on the work 50 corresponding to the teaching position on the CAD model cannot be acquired because of optical conditions and the like.

Next, the matching that the matching unit 113 performs using the matching point cloud information when a work 50 is taken out will be described.
When a work 50 is to be taken out, the matching unit 113 acquires from the three-dimensional measuring machine 40 a distance image of the works 50 piled in bulk in the container 60. The matching unit 113 then searches the acquired distance image based on the teaching position neighborhood image data included in each piece of matching point cloud information stored in the selection data storage unit 111, using a three-dimensional point cloud matching technique such as ICP (Iterative Closest Point) matching. The matching unit 113 then selects, for example, the center position of an image region in the distance image with a high matching evaluation as the take-out position of the work 50 to be taken out. Alternatively, a plurality of image regions whose matching evaluation is at or above a threshold may be selected, and the image region that is highest above the ground among them may be used as new teaching position neighborhood image data.
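The selection logic above (keep regions whose matching evaluation clears a threshold, then prefer the one highest above the ground) can be sketched as follows. Note the substitution: instead of ICP, which the text names only as an example, this sketch uses a plain sliding-window comparison of height patches after removing each patch's mean height. The function names, the score formula, and the square template are assumptions for illustration only.

```python
def window(height_map, top, left, size):
    """Cut a size-by-size window out of the height map."""
    return [row[left:left + size] for row in height_map[top:top + size]]


def match_score(template, candidate):
    """Score in (0, 1]; 1.0 means identical shape once the height offset is removed."""
    t = [h for row in template for h in row]
    c = [h for row in candidate for h in row]
    t_mean, c_mean = sum(t) / len(t), sum(c) / len(c)
    mad = sum(abs((a - t_mean) - (b - c_mean)) for a, b in zip(t, c)) / len(t)
    return 1.0 / (1.0 + mad)


def select_pick_position(height_map, template, threshold):
    """Among windows scoring at least `threshold`, return the one whose center
    is highest, as (row, col, height, score), or None if nothing matches."""
    size = len(template)  # square template assumed
    best = None
    for top in range(len(height_map) - size + 1):
        for left in range(len(height_map[0]) - size + 1):
            score = match_score(template, window(height_map, top, left, size))
            if score < threshold:
                continue
            row, col = top + size // 2, left + size // 2
            height = height_map[row][col]
            if best is None or height > best[2]:
                best = (row, col, height, score)
    return best


# Hypothetical template: a small bump. The scene holds the same bump twice,
# once on the floor and once on top of another layer of works.
template = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
scene = [
    [0, 0, 0, 0, 3, 3, 3],
    [0, 1, 0, 0, 3, 4, 3],
    [0, 0, 0, 0, 3, 3, 3],
]
pick = select_pick_position(scene, template, threshold=0.9)
```

Both bumps match the template, and the one on the upper layer is selected because it is less likely to be covered by other works.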

The matching unit 113 transmits the selected take-out position of the work 50 to the robot control device 20. The robot control device 20 then attempts to take out the work 50 by controlling the robot 30 based on the received take-out position.

The matching point cloud information is created based on the user's teaching as described above, but not every piece of matching point cloud information is necessarily appropriate. For example, taking out may succeed when the take-out position is a region that matches one piece of matching point cloud information well, yet fail when the take-out position is a region that matches another piece well.
Because success or failure can thus depend on the matching point cloud information, the matching unit 113 preferably evaluates each piece of matching point cloud information and assigns it an evaluation value. The matching unit 113 then preferably uses matching point cloud information with a high evaluation value.
Further, because the matching point cloud information with evaluation values assigned is used as teacher data in the machine learning described later, the matching unit 113 stores it in the selection data storage unit 111. Matching point cloud information with a low evaluation value is also needed as teacher data (failure data) for machine learning. The matching unit 113 therefore stores not only matching point cloud information with a high evaluation value but also matching point cloud information with a low evaluation value in the selection data storage unit 111 as teacher data.

The matching unit 113 can assign an evaluation value according to whether taking out the work 50 succeeded. For example, when the take-out position is a region that matches a given piece of matching point cloud information well, the matching unit 113 assigns a higher evaluation value if the work 50 was taken out successfully than if taking out failed. For example, the matching unit 113 assigns a value at or above a first predetermined value (for example, 60 points or more) when taking out succeeds, and a value at or below a second predetermined value (for example, 50 points or less) when taking out fails. When taking out succeeds, the matching unit 113 may further vary the evaluation value according to the time required, for example so that the shorter the time required to take out the work 50, the higher the evaluation value. When taking out fails, the matching unit 113 may vary the evaluation value according to the degree of failure. For example, when the work 50 was gripped but dropped during removal, the evaluation value may be made higher than when the work 50 could not be gripped at all.
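One way to realize the scoring rules above is sketched below. Only the example thresholds (60 points or more on success, 50 points or less on failure) come from the text; the `Outcome` names, the linear time scaling, the 10-second cap, and the specific failure scores of 50 and 20 are assumptions chosen for illustration.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    DROPPED = "dropped"          # gripped, but fell during removal
    NOT_GRIPPED = "not_gripped"  # could not be gripped at all


def evaluation_value(outcome, seconds=None, max_seconds=10.0):
    """Return an evaluation value for one take-out attempt.

    Success: 60..100 points, higher when the take-out was faster.
    Failure: at most 50 points, with a dropped work scoring above a miss.
    """
    if outcome is Outcome.SUCCESS:
        if seconds is None:
            return 60.0  # no timing available: lowest successful score
        frac = min(max(seconds / max_seconds, 0.0), 1.0)
        return 100.0 - 40.0 * frac
    if outcome is Outcome.DROPPED:
        return 50.0
    return 20.0


fast = evaluation_value(Outcome.SUCCESS, seconds=2.0)
slow = evaluation_value(Outcome.SUCCESS, seconds=10.0)
dropped = evaluation_value(Outcome.DROPPED)
missed = evaluation_value(Outcome.NOT_GRIPPED)
```

Keeping a gap between the lowest success score and the highest failure score makes the two classes separable when the values are later used as labels.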

The matching unit 113 performs matching on the currently measured distance image based on each piece of matching point cloud information. For each matched location, an attempt is made to take out the work 50, and the matching unit 113 assigns the evaluation value described above. This processing is repeated each time a new distance image of the works 50 piled in bulk in the container 60 is acquired from the three-dimensional measuring machine 40.

In this way, by repeating matching with the matching point cloud information, taking out of the works 50, and assignment of evaluation values according to the results, the matching unit 113 can evaluate each piece of matching point cloud information. By selecting matching point cloud information with a high evaluation value, the matching unit 113 can then raise the probability of successfully taking out a work 50.

Further, for each part of the distance image where matching succeeded, the matching unit 113 preferably acquires three-dimensional point cloud information over a range of the same size as the matching point cloud information and treats it as new matching point cloud information.
In this way, a work 50 can be taken out using not only the matching point cloud information used for the matching but also the new matching point cloud information acquired from the matched part. In other words, the matching unit 113 can automatically increase the amount of matching point cloud information, which makes it possible to collect matching point cloud information with higher evaluation values.

As described above, in the first embodiment, the matching unit 113 can select the take-out position of a work 50 by repeating matching with the matching point cloud information, taking out of the works 50, and assignment of evaluation values according to the results. The matching point cloud information can also be increased automatically.
Therefore, in the first embodiment, a work 50 can be taken out based on the selected take-out position without any new teaching from the user.

Unlike conventional teaching on a model such as a CAD model, the first embodiment is based on teaching on a distance image acquired by actually measuring the works 50 with the three-dimensional measuring machine 40. When such a measured distance image is used, matching often fails for reasons such as the presence of disturbances. For this reason, the threshold for deciding whether matching has succeeded is sometimes set loosely so that matches are found.
However, loosening the threshold in this way causes another problem: matches are also found at locations that are not actually suitable to be detected as take-out positions. To reduce this problem, the processing of the first embodiment may be extended as follows: matching is also attempted for locations that were not taught by drawing, and if a match is found for such an untaught location, it is regarded as inappropriate as a detected position and excluded from the final detected positions.
The matching process performed by the matching unit 113 has been described above. The matching process described above makes it possible to select the take-out position of a work 50.
By further combining this configuration with machine learning by the learning unit 14 and the cutout process by the cutout unit 114, the accuracy of selecting the take-out position of a work 50 can be improved. Machine learning by the learning unit 14 is described first, followed by the cutout process by the cutout unit 114.

<Machine learning>
As described above, the matching unit 113 creates matching point cloud information with evaluation values assigned by repeating matching with the matching point cloud information, taking out of the works 50, and assignment of evaluation values according to the results. As also described above, the matching unit 113 stores the matching point cloud information with evaluation values assigned in the selection data storage unit 111.
The learning processing unit 141 of the learning unit 14 performs supervised machine learning using, as teacher data, the matching point cloud information with evaluation values assigned that is stored in the selection data storage unit 111. Through this supervised learning, the learning processing unit 141 builds a learning model for improving the accuracy of selecting the take-out position of a work 50, and stores the built learning model in the trained model storage unit 142.
In the cutout process described later, the estimation processing unit 143 uses the learning model stored in the trained model storage unit 142, which makes it possible to select the take-out position of a work 50 without performing matching against the distance image measured by the three-dimensional measuring machine 40. The learning processing unit 141 can also update a learning model that has already been built and stored in the trained model storage unit 142 by performing further machine learning based on the success or failure of taking out in the cutout process.

Next, the construction of the learning model is described in detail. The learning processing unit 141 performs supervised learning using, as input data, the image data near the teaching position that is included in the matching point cloud information stored in the selection data storage unit 111, and using, as the label, the evaluation value assigned to that matching point cloud information. As the supervised learning method, the learning processing unit 141 performs deep learning with a convolutional neural network (CNN), a type of neural network well suited to learning from image data. For this purpose, a convolutional neural network is prepared that has three or more layers and includes at least one image convolution operation. This is not intended, however, to limit the machine learning applied in the first embodiment to convolutional neural networks. Machine learning that uses a deep learning model other than a convolutional neural network, a linear model, or the like may also be applied to the first embodiment.

Here, the convolutional neural network has a structure that includes a convolution layer, a pooling layer, a fully connected layer, and an output layer. This is only an example structure for the purpose of explanation, and the pooling layer, for example, may be omitted. In addition, when learning is performed using an image as the label, as mentioned above, a deconvolution layer may also be provided.

In the convolution layer, a filter with predetermined parameters is applied to the input three-dimensional point cloud information in order to extract features such as edges. The predetermined parameters of this filter correspond to the weights of an ordinary neural network and are learned as training is repeated.
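As a rough illustration of the filtering performed in a convolution layer, the sketch below slides a small filter over a 2D grid of depth values. It is a minimal sketch and not the implementation described in this document: the function name `conv2d_valid`, the representation of the input as a plain list of rows, and the fixed filter values are assumptions made for illustration (in an actual network the filter values would be learned).

```python
def conv2d_valid(image, kernel):
    """Apply `kernel` to `image` with stride 1 and no padding.

    Both arguments are lists of rows. As in most CNN libraries, the
    kernel is not flipped (strictly speaking, cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical depth patch with a step between columns 1 and 2.
depth = [[1.0, 1.0, 5.0, 5.0] for _ in range(3)]
# Hypothetical hand-written horizontal edge filter (would normally be learned).
edge_filter = [[-1.0, 1.0]]
response = conv2d_valid(depth, edge_filter)
```

With this hypothetical filter, the response is nonzero only where the depth changes between neighboring columns, which is the kind of edge feature the paragraph above refers to.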

In the pooling layer, for example, the image output from the convolution layer is divided into small windows, and a feature of each window (for example, the maximum value within the window) is output.
By combining the convolution layer and the pooling layer, feature quantities can be extracted from the three-dimensional point cloud information.
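The window-wise maximum mentioned for the pooling layer can be sketched as follows. This is an illustrative sketch only: the function name `max_pool`, the non-overlapping square windows, and the default window size of 2 are assumptions and are not specified in this document.

```python
def max_pool(feature_map, window=2):
    """Split `feature_map` into non-overlapping window x window blocks
    and keep the maximum value of each block."""
    pooled = []
    for top in range(0, len(feature_map) - window + 1, window):
        row = []
        for left in range(0, len(feature_map[0]) - window + 1, window):
            row.append(max(
                feature_map[top + dy][left + dx]
                for dy in range(window)
                for dx in range(window)
            ))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map output by a convolution layer.
features = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool(features)
```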

In the fully connected layer, the features extracted through the convolution layer and the pooling layer are combined into a single node, and the value transformed by an activation function is output. Here, the activation function sets every output value below 0 to 0 and is used so that only the portion at or above a certain threshold is passed to the output layer as meaningful information.
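The activation function described above (every value below 0 becomes 0) matches what is commonly called a rectified linear unit (ReLU). The sketch below combines a feature vector into one node and applies that activation. The names `relu` and `fully_connected`, the weight values, and the bias term are assumptions introduced for illustration.

```python
def relu(value):
    """Return 0 for negative inputs and the input itself otherwise."""
    return value if value > 0.0 else 0.0


def fully_connected(features, weights, bias=0.0):
    """Combine all features into one node (weighted sum plus bias)
    and pass the result through the activation function."""
    weighted_sum = sum(f * w for f, w in zip(features, weights)) + bias
    return relu(weighted_sum)


# Hypothetical pooled features and weights.
suppressed = fully_connected([1.0, 2.0], [0.5, -1.0])      # weighted sum is -1.5
passed_on = fully_connected([1.0, 2.0], [0.5, 1.0], 0.5)   # weighted sum is 3.0
```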

The output layer outputs, based on the output from the fully connected layer, an evaluation value for a take-out based on the matching point cloud information used as input data. The error between the output of the output layer and the label is then calculated. As described above, the label is the evaluation value assigned to the matching point cloud information used as input data.

At the start of learning, the parameters of the convolutional neural network are not yet appropriately weighted, so this error is likely to be large. The learning processing unit 141 therefore corrects the weight values so as to reduce the calculated error. Specifically, it changes the weight value of each perceptron in the convolutional neural network by repeating the processes known as forward propagation and backpropagation.
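The idea of repeatedly computing an output, measuring its error against the label, and correcting the weights can be shown on a much smaller model than a CNN. The sketch below uses a single linear unit with a squared-error loss and plain gradient descent. The function name `train_step`, the learning rate, and the sample values are assumptions for illustration and do not come from this document.

```python
def train_step(weights, inputs, label, learning_rate=0.1):
    """One forward pass and one gradient update for a single linear unit."""
    # Forward propagation: compute the predicted evaluation value.
    prediction = sum(w * x for w, x in zip(weights, inputs))
    error = prediction - label
    # Backward step: for squared error, d(error**2)/dw_i = 2 * error * x_i.
    updated = [w - learning_rate * 2.0 * error * x
               for w, x in zip(weights, inputs)]
    return updated, error ** 2


# Hypothetical feature vector and evaluation-value label.
weights = [0.0, 0.0]
losses = []
for _ in range(5):
    weights, loss = train_step(weights, [1.0, 2.0], 1.0)
    losses.append(loss)
```

In this toy setting the squared error starts large because the weights are uninitialized and shrinks as the update is repeated, which mirrors the behavior described in the paragraph above.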

In this way, the learning processing unit 141 learns the features of the teacher data and inductively acquires a learning model that outputs an evaluation value from the matching point cloud information used as input data. The learning processing unit 141 stores the constructed learning model in the trained model storage unit 142 as a trained model.

The learning described above may be performed as online learning, or the supervised learning may be performed as batch learning or mini-batch learning.
Online learning is a method in which supervised learning is performed immediately each time teacher data is created. Batch learning is a method in which multiple items of teacher data are collected while teacher data is repeatedly created, and supervised learning is then performed using all of the collected teacher data. Mini-batch learning lies between online learning and batch learning: supervised learning is performed each time a certain amount of teacher data has accumulated. When batch learning or mini-batch learning is performed, the collected teacher data may be stored in the selection data storage unit 111 until learning starts.
When new teacher data is acquired, the estimation accuracy of the trained model may be improved by performing learning with the parameters of the trained model as initial values. Alternatively, when new teacher data is acquired, a separate learning model may be constructed from scratch, independently of the trained model.
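The difference between online learning and mini-batch learning comes down to how much teacher data is accumulated before learning is triggered. The sketch below models this with a small buffer. The class name `TeacherDataBuffer` and its `batch_size` parameter are hypothetical and only illustrate the scheduling, not the learning itself.

```python
class TeacherDataBuffer:
    """Accumulate (input, label) pairs and release them in groups.

    batch_size=1 behaves like online learning; a larger value behaves
    like mini-batch learning. (Hypothetical helper for illustration.)
    """

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []

    def add(self, sample):
        """Store one sample. Return a full batch when enough samples
        have accumulated, otherwise return None."""
        self.pending.append(sample)
        if len(self.pending) >= self.batch_size:
            batch, self.pending = self.pending, []
            return batch
        return None


online = TeacherDataBuffer(batch_size=1)
mini_batch = TeacherDataBuffer(batch_size=3)
```

Batch learning in the sense used above would correspond to collecting every sample first and running learning once on the full collection.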

<Cutout process>
Next, the cutout process performed by the cutout unit 114 is described.
By using the learning model constructed by the learning processing unit 141, the cutout unit 114 selects the take-out position of the work 50 without matching against the distance image measured by the three-dimensional measuring machine 40. In the following description, this process of selecting the take-out position without matching is called the "cutout process".

When performing the cutout process, the cutout unit 114, on acquiring a new distance image during take-out work, cuts out from the entire area of the acquired distance image distance images of the same size as the input to the trained model (that is, the same size as the matching point cloud information). The cutout unit 114 then acquires the three-dimensional point cloud information of each cut-out portion (hereinafter called a "take-out position candidate").

The process by which the cutout unit 114 extracts take-out position candidates (the cutout process) is described with reference to FIGS. 7 and 8. FIG. 7 shows the region to be cut out as a take-out position candidate 90. As described above, the take-out position candidate 90 has the same size as the input to the trained model (that is, the same size as the matching point cloud information).

The cutout unit 114 cuts out the entire image by scanning the target region for the take-out position candidate 90, for example as shown in FIG. 8, shifting the target position a few pixels at a time starting from the upper left of the image. Although FIG. 8 shows a linear scan, rotation may be added to the scan, and the scan start position and scan direction may also be set arbitrarily. When cutting out, the cutout unit 114 also acquires the position of the take-out position candidate 90 on the distance image (for example, the center position of the cut-out image data), associates it with the take-out position candidate, and stores it in the selection data storage unit 111.
The estimation processing unit 143 acquires from the selection data storage unit 111 the three-dimensional point cloud information of every take-out position candidate extracted in this way. The estimation processing unit 143 then inputs the three-dimensional point cloud information of each acquired candidate into the trained model in place of the matching point cloud information described above, and obtains as output an evaluation value for each take-out position candidate. The estimation processing unit 143 notifies the cutout unit 114 of this output.
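The scan described above, in which fixed-size regions are cut out while the target position is shifted a few pixels at a time and the center of each region is recorded, can be sketched as a sliding window. The function name `crop_candidates`, the list-of-rows image representation, and the `size` and `stride` parameters are assumptions for illustration. Rotation and alternative scan orders, which the text also permits, are not covered.

```python
def crop_candidates(distance_image, size, stride):
    """Cut out size x size patches from the upper left, row by row.

    Returns a list of (center, patch) pairs, where center is the
    (row, column) of the patch center in the original image.
    """
    height, width = len(distance_image), len(distance_image[0])
    candidates = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patch = [row[left:left + size]
                     for row in distance_image[top:top + size]]
            center = (top + size // 2, left + size // 2)
            candidates.append((center, patch))
    return candidates


# Hypothetical 4x4 distance image whose pixel values are 0..15.
image = [[4 * r + c for c in range(4)] for r in range(4)]
candidates = crop_candidates(image, size=2, stride=2)
```

Each patch could then be passed to the trained model in turn, with the recorded center serving as the position associated with that candidate.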

The cutout unit 114 selects, as the take-out position, the position stored in association with a cut-out image for which the trained model outputs a high evaluation value. The cutout unit 114 transmits the selected take-out position to the robot control device 20. The robot control device 20 then controls the robot 30 based on the received take-out position and thereby attempts to take out the work 50.
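One simple way to realize this selection is to take the candidate with the single highest evaluation value. The text only requires a "high" value, so picking the maximum is an assumption, as are the function name `select_take_out_position` and the stand-in scoring function used in place of a trained model.

```python
def select_take_out_position(candidates, evaluate):
    """Return the stored position of the candidate with the highest score.

    `candidates` is a list of (position, patch) pairs and `evaluate`
    is any callable that maps a patch to an evaluation value.
    """
    best_position, _ = max(candidates, key=lambda item: evaluate(item[1]))
    return best_position


# Hypothetical candidates and a stand-in for the trained model:
# here the "evaluation value" is just the first pixel of the patch.
example_candidates = [
    ((1, 1), [[0.1]]),
    ((3, 3), [[0.9]]),
    ((5, 5), [[0.4]]),
]
chosen = select_take_out_position(example_candidates, lambda patch: patch[0][0])
```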

As when a take-out is attempted using the matching point cloud information, a take-out attempted using the three-dimensional point cloud information of a take-out position candidate may succeed or fail. The cutout unit 114 therefore assigns an evaluation value to the take-out position candidate according to the success or failure of the take-out of the work 50, in the same way as for the matching point cloud information in the matching process. The cutout unit 114 also stores the take-out position candidate with its assigned evaluation value in the selection data storage unit 111. The learning processing unit 141 can use these evaluated take-out position candidates as new teacher data for updating the trained model.

Specifically, the learning processing unit 141 preferably performs the online learning or mini-batch learning described above, using the three-dimensional point cloud data of a take-out position candidate as input data and the evaluation value assigned to that candidate as the label. This allows the trained model to be updated in real time to a more accurate trained model while take-out operations on the work 50 are being carried out. The update is not limited to online learning or mini-batch learning, and the learning model may also be updated by batch learning.

<Operation of the first embodiment>
Next, the operation of the first embodiment is described with reference to the flowcharts of FIGS. 9A and 9B. FIG. 9A is a flowchart of the operation corresponding to the matching process described above, and FIG. 9B is a flowchart of the operation corresponding to the cutout process described above.

In step S11, the annotation processing unit 112 acquires from the three-dimensional measuring machine 40 a distance image generated by measuring the works 50 piled in bulk.
In step S12, the annotation processing unit 112 causes the display unit 12 to display the distance image.
In step S13, the annotation processing unit 112 draws the teaching position on the distance image based on the user's teaching of the take-out position of the work 50, as received by the operation reception unit 13.

In step S14, the annotation processing unit 112 sets the size of the matching point cloud information. The size is set according to a preset value or a user operation.
In step S15, the annotation processing unit 112 generates the matching point cloud information based on the setting made in step S14, and stores the generated matching point cloud information in the selection data storage unit 111.
In step S16, the matching unit 113 assigns an evaluation value to the matching point cloud information by performing matching and taking out the work 50 using the matching point cloud information stored in the selection data storage unit 111. The matching unit 113 stores the matching point cloud information with its assigned evaluation value in the selection data storage unit 111.

In step S17, the matching unit 113 determines whether additional matching point cloud information is needed. If at least a predetermined number of items of matching point cloud information whose evaluation value is at or above a predetermined value are stored, the determination in step S17 is No and the process proceeds to step S18. Otherwise, the determination in step S17 is Yes, the process returns to step S11, and the processing is repeated.
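The decision in step S17 can be expressed as a count of stored items whose evaluation value reaches a threshold. In the sketch below, the function name `needs_more_matching_data` and the parameters `min_value` and `min_count` are hypothetical stand-ins for the "predetermined value" and "predetermined number", which this document does not quantify.

```python
def needs_more_matching_data(evaluation_values, min_value, min_count):
    """Return True (the Yes branch of step S17) while fewer than
    `min_count` stored items have an evaluation value >= `min_value`."""
    good_items = sum(1 for value in evaluation_values if value >= min_value)
    return good_items < min_count


# Hypothetical evaluation values of the stored matching point cloud information.
stored_values = [0.9, 0.2, 0.8]
collect_again = needs_more_matching_data(stored_values, min_value=0.5, min_count=3)
ready_to_learn = not needs_more_matching_data(stored_values, min_value=0.5, min_count=2)
```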

In step S18, the learning processing unit 141 performs learning using the matching point cloud information stored by the learning unit 14 as input data and the evaluation value assigned to that matching point cloud information as the label. A trained model is thereby created and stored in the trained model storage unit 142.

Next, the operation during the cutout process is described with reference to FIG. 9B.
In step S19, the cutout unit 114 acquires from the three-dimensional measuring machine 40 a distance image generated by measuring the works 50 piled in bulk.

In step S20, the cutout unit 114 cuts out, from the entire area of the acquired distance image, distance images of the same size as the input to the trained model (that is, the same size as the matching point cloud information) as take-out position candidates. The cutout unit 114 then acquires the three-dimensional point cloud information of these take-out position candidates and stores it in the selection data storage unit 111. The estimation processing unit 143 inputs the three-dimensional point cloud information of every take-out position candidate stored in the selection data storage unit 111 into the trained model and obtains as output an evaluation value for each candidate. The estimation processing unit 143 notifies the cutout unit 114 of this output.

In step S21, the cutout unit 114 selects, as the take-out position, the position stored in association with a cut-out image for which the trained model outputs a high evaluation value. The cutout unit 114 transmits the selected take-out position to the robot control device 20. The robot control device 20 then controls the robot 30 based on the received take-out position and thereby attempts to take out the work 50. As described above, the attempt may succeed or fail. The cutout unit 114 therefore assigns an evaluation value to the take-out position candidate according to the success or failure of the take-out of the work 50.

In step S22, the learning processing unit 141 determines whether to update the trained model by performing learning that uses, as teacher data, the take-out position candidates to which evaluation values were assigned in step S21. In the case of mini-batch learning, the determination in step S22 is Yes when a predetermined number of items of teacher data have been recorded or a predetermined time has elapsed since the previous learning, and the process proceeds to step S23. Conversely, when fewer than the predetermined number of items of teacher data have been recorded, or the predetermined time has not elapsed since the previous learning, the determination in step S22 is No and the process proceeds to step S24. In the case of online learning, the determination in step S22 is Yes and the process proceeds to step S23.

In step S23, the learning processing unit 141 performs the learning described above, using the three-dimensional point cloud data of a take-out position candidate as input data and the evaluation value assigned to that candidate as the label. The trained model stored in the trained model storage unit 142 is thereby updated.

In step S24, the cutout unit 114 determines whether to continue taking out. If a take-out position candidate with a high evaluation value exists and no take-out has yet been attempted for it, a work 50 that has not been taken out is presumed to remain, so the determination in step S24 is Yes and the process returns to step S20. If take-outs have been performed for all take-out position candidates with high evaluation values, all works 50 are presumed to have been taken out, so the determination in step S24 is No and the process ends.
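The decision on whether to update the trained model can be sketched as a small predicate. Everything named here is an assumption for illustration: the function `should_update`, the mode strings, and the thresholds `min_samples` and `min_seconds` standing in for the "predetermined number" and "predetermined time". The sketch also reads the mini-batch condition as "either threshold reached", which is one possible interpretation of the text.

```python
def should_update(mode, new_samples, seconds_since_last,
                  min_samples=8, min_seconds=60.0):
    """Decide whether the trained model should be updated now.

    Online learning always updates. Mini-batch learning updates when
    enough new teacher data has been recorded or enough time has
    passed since the previous learning (assumed interpretation).
    """
    if mode == "online":
        return True
    return new_samples >= min_samples or seconds_since_last >= min_seconds


# Hypothetical situations during a take-out run.
decisions = [
    should_update("online", new_samples=1, seconds_since_last=0.5),
    should_update("mini-batch", new_samples=3, seconds_since_last=10.0),
    should_update("mini-batch", new_samples=8, seconds_since_last=10.0),
    should_update("mini-batch", new_samples=1, seconds_since_last=120.0),
]
```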

According to the matching process of the first embodiment described above, once the user has taught the take-out position in step S13, the take-out position can be selected without any further teaching from the user, by performing a search that matches the stored matching point cloud information (the three-dimensional point information of the teaching position and the image data near the teaching position, centered on the teaching position) against the distance image measured by the three-dimensional measuring machine 40. In addition, the user only teaches the take-out position and needs no experience-based knowledge, such as how to choose a search algorithm. Furthermore, because the matching process of the first embodiment does not use CAD data, the effort of preparing CAD data is saved.
In short, the matching process of the first embodiment makes it possible to select the take-out position of the work 50 by a simpler method than conventional approaches.

Furthermore, according to the cutout process of the first embodiment, a learning model is constructed based on the matching point cloud information, and the take-out position can be selected, based on the constructed learning model, from image data cut out automatically from the distance image. The take-out position can therefore be selected more efficiently and more accurately.
In addition, because new teacher data can be acquired as take-outs continue, the cutout process of the first embodiment allows the constructed learning model to be updated in real time.

<Second embodiment>
Next, the second embodiment is described in detail. The basic configuration of the second embodiment is common to that of the first embodiment. For example, the overall configuration of the robot system 1b according to the second embodiment is the same as that of the robot system 1a according to the first embodiment shown in FIG. 1. The configuration of the second embodiment is obtained by replacing the image processing device 10a with the image processing device 10b, which is the image processing device of the second embodiment.
To avoid duplication, points common to both embodiments are not described again below, and the points on which the two embodiments differ are described in detail.

<Outline of the differences between the first embodiment and the second embodiment>
In the first embodiment, before learning, the image processing device 10a performed preprocessing in which it matched against the distance image acquired from the three-dimensional measuring machine 40 and created matching point cloud information. The image processing device 10a then used this matching point cloud information as input data for the learning model and the evaluation value assigned to it as the label, and thereby constructed a learning model by machine learning and selected take-out positions using the constructed learning model.

Alternatively, in the first embodiment, the image processing device 10a performed preprocessing in which it cut out, from the entire area of the distance image acquired from the three-dimensional measuring machine 40, distance images of the same size as the input to the trained model (that is, the same size as the matching point cloud information) as take-out position candidates. The image processing device 10a then used the three-dimensional point cloud information of these cut-out take-out position candidates as input data for the learning model and the evaluation value assigned to that three-dimensional point cloud information as the label, and thereby constructed a learning model by machine learning and selected take-out positions using the constructed learning model.

In contrast, the second embodiment omits preprocessing such as this matching and cutting out, and uses the entire distance image acquired from the three-dimensional measuring machine 40 as input data for the learning model, both to construct a learning model by machine learning and to select take-out positions using the constructed learning model.

Although it depends on the environment in which the second embodiment is applied, omitting the preprocessing in this way allows the second embodiment to perform computation efficiently and simplifies implementation. In addition, because the entire distance image is used as input data for the learning model, the second embodiment can also take into account the influence between pixels that are far apart in the image.

<Functional blocks of the image processing device 10b>
Next, the functional blocks that the image processing device 10b includes in order to omit the preprocessing and use the entire distance image as input data for the learning model are described with reference to FIG. 10.
In FIG. 10, as in FIG. 2, the components of the robot system 1b other than the image processing device 10b are shown collectively as the environment 100.

The image processing device 10b differs from the image processing device 10a in that it includes a selection processing unit 15 in place of the selection processing unit 11. That is, the image processing device 10b includes the selection processing unit 15, the display unit 12, the operation reception unit 13, and the learning unit 14.
The selection processing unit 15 includes a teacher data storage unit 151, an annotation processing unit 152, and a take-out position selection unit 153.

The teacher data storage unit 151 stores teacher data for machine learning. A distance image input from the three-dimensional measuring machine 40 is stored in the teacher data storage unit 151 as the input data of the teacher data. A label generated by the annotation processing unit 152 is stored in the teacher data storage unit 151 as the label of the teacher data.
The input data and the label are associated with each other when the annotation processing unit 152 stores the label.
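The association between a stored distance image and the label that is added later can be pictured as two tables sharing a key. The sketch below is one possible arrangement and not the structure used by the teacher data storage unit 151: the class name `TeacherDataStore`, its methods, and the use of string identifiers are all assumptions.

```python
class TeacherDataStore:
    """Hold distance images and link each label to its image when the
    label is stored. (Hypothetical structure for illustration.)"""

    def __init__(self):
        self._images = {}
        self._labels = {}

    def add_image(self, image_id, distance_image):
        """Store a distance image as input data."""
        self._images[image_id] = distance_image

    def add_label(self, image_id, label):
        """Store a label and associate it with an already stored image."""
        if image_id not in self._images:
            raise KeyError(f"no distance image stored for {image_id!r}")
        self._labels[image_id] = label

    def pairs(self):
        """Return the (input data, label) pairs that are complete."""
        return [(self._images[key], self._labels[key])
                for key in self._images if key in self._labels]


store = TeacherDataStore()
store.add_image("img-1", [[1.0, 2.0]])
pairs_before = store.pairs()          # image has no label yet
store.add_label("img-1", [[0, 1]])
pairs_after = store.pairs()
```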

The annotation processing unit 152 generates the labels included in the teacher data. To generate a label, the annotation processing unit 152 acquires a distance image from the teacher data storage unit 151 and displays the acquired distance image on the display unit 12. A display example of the distance image is as described above with reference to FIG. 4.

Referring to the displayed distance image, the user teaches the take-out position in the same way as in the first embodiment. Specifically, the user teaches the take-out position by designating it with the operation reception unit 13, which is implemented by a mouse or a touch panel.

The operation reception unit 13 notifies the annotation processing unit 152 of the teaching position, that is, the take-out position that was taught. The annotation processing unit 152 draws on the teaching position in the distance image so that the user can recognize the notified teaching position. The drawing is done in a way that is easy for the user to see, for example by changing the color of the pixels at the teaching position in the distance image. A display example with such drawing is as described above with reference to FIG. 5.

ここで、第１の実施形態では、ユーザが取り出し位置を点により教示していた。これに対して、第２の実施形態では、ユーザが取り出し位置を所定の領域として教示する。例えば、第２の実施形態では、ユーザは取り出し可能な領域に色を塗ることにより取り出し位置の教示を行う。 Here, in the first embodiment, the user taught the take-out position as a point. In the second embodiment, by contrast, the user teaches the take-out position as a predetermined region. For example, in the second embodiment, the user teaches the take-out position by painting the region from which a work can be taken out.

アノテーション処理部１５２は、ユーザによる教示に基づいて、距離画像全体に含まれる各画素について、ワーク５０の取り出し位置（すなわち、教示位置）であるか否かを示す属性（例えば、ワーク５０の取り出し位置は「１」。ワーク５０の取り出し位置以外の位置については「０」）を割り当てた画像を生成する。以下では、この画像を「ラベル用マップ」と呼ぶ。 Based on the user's teaching, the annotation processing unit 152 generates an image in which each pixel of the entire distance image is assigned an attribute indicating whether or not that pixel is a take-out position of the work 50 (that is, a teaching position); for example, "1" for a take-out position of the work 50 and "0" for any other position. In the following, this image is referred to as a "label map".
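
As a rough illustration of the label map described above, the following Python sketch marks the pixels painted by the user with 1 and all others with 0. The function name `make_label_map`, the (row, column) pixel convention, and the nested-list image representation are assumptions made for this example and do not appear in the source.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int]  # (row, column); a hypothetical convention for this sketch


def make_label_map(height: int, width: int, painted: Iterable[Pixel]) -> List[List[int]]:
    """Return a height x width map: 1 where the user painted, 0 elsewhere."""
    label_map = [[0] * width for _ in range(height)]
    for row, col in painted:
        # Ignore strokes that fall outside the distance image.
        if 0 <= row < height and 0 <= col < width:
            label_map[row][col] = 1
    return label_map


# A 4 x 5 distance image in which the user painted a 2 x 2 patch.
label_map = make_label_map(4, 5, [(1, 1), (1, 2), (2, 1), (2, 2)])
for row in label_map:
    print(row)
```

The map has the same height and width as the distance image, so each attribute can be looked up by the two-dimensional coordinates of its pixel.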

第２の実施形態では、このラベル用マップをラベルとして利用する。また、アノテーション処理部１５２は、ラベル用マップを生成するにあたり、各画素についてワーク５０の取り出し位置であるか否かを示す属性を割り当てるのではなく、更に１／ｓ（ｓは任意の自然数）に分解能を上げて、１／ｓ画素毎にワーク５０の取り出し位置であるか否かを示す属性を割り当てるようにしてもよい。また、ラベルとして３次元点群情報は不要であるので、ラベル用マップでは距離画像に含まれる３次元点群情報を削除してよい。従って、ラベル用マップは、距離画像における、各画素（又は１／ｓとした各画素）の２次元座標の情報と、各画素（又は１／ｓとした各画素）についてワーク５０の取り出し位置であるか否かを示す属性の情報とを含んだ画像となる。 In the second embodiment, this label map is used as the label. When generating the label map, instead of assigning to each pixel the attribute indicating whether or not it is a take-out position of the work 50, the annotation processing unit 152 may raise the resolution further to 1/s (s is an arbitrary natural number) and assign the attribute for every 1/s pixel. Further, since three-dimensional point cloud information is not needed in the label, the three-dimensional point cloud information contained in the distance image may be deleted from the label map. The label map is therefore an image containing the two-dimensional coordinates of each pixel (or each 1/s pixel) in the distance image and, for each pixel (or each 1/s pixel), the attribute indicating whether or not it is a take-out position of the work 50.

アノテーション処理部１５２は、生成したラベル用マップをラベルとし、ユーザからの教示のために用いた距離画像を入力データとして、両者を紐付けることにより教師データを生成する。そして、アノテーション処理部１５２は、生成した教師データを教師データ格納部１５１に格納する。なお、アノテーション処理部１５２は、入力データとラベルとを紐付けて教師データ格納部１５１に格納するのではなく、別途に教師データ格納部１５１に格納し、両者をリンクさせることにより教師データとするようにしてもよい。
第２の実施形態の学習処理部１４１は、機械学習を行う場合に、教師データ格納部１５１に格納されている教師データを用いて機械学習を行う。具体的には、第２の実施形態では、学習処理部１４１による機械学習により、距離画像を入力とした場合に、ラベル用マップと同様の画像が出力される学習モデルを構築する。つまり、取り出し可能な領域をセグメンテーションした画像が出力される学習モデルを構築する。 The annotation processing unit 152 generates teacher data by using the generated label map as the label and the distance image used for the user's teaching as the input data, and associating the two. The annotation processing unit 152 then stores the generated teacher data in the teacher data storage unit 151. Instead of storing the input data and the label in the teacher data storage unit 151 in association with each other, the annotation processing unit 152 may store them in the teacher data storage unit 151 separately and link them to form the teacher data.
When performing machine learning, the learning processing unit 141 of the second embodiment uses the teacher data stored in the teacher data storage unit 151. Specifically, in the second embodiment, machine learning by the learning processing unit 141 builds a learning model that, given a distance image as input, outputs an image similar to the label map. That is, it builds a learning model that outputs an image in which the regions from which a work can be taken out are segmented.

この機械学習は、例えば画像を入力し、画像中の全画素ピクセルに対して何らかの推定（例えばクラス分類など）を行う手法により実現することができる。このような手法としては、例えば、ＳｅｍａｎｔｉｃＳｅｇｍｅｎｔａｔｉｏｎが挙げられる。ＳｅｍａｎｔｉｃＳｅｇｍｅｎｔａｔｉｏｎは、自動車の自動運転への応用を目指した技術であり、例えば写真の画像データを入力として、車、歩行者などの領域毎に色分けすることができる。
第２の実施形態では、上述したように、入力データを距離画像、ラベルを距離画像に対するアノテーション（例えばワーク５０を取り出せそうな位置に人間が色を塗った画像）にして機械学習を行うことで、全画素に対する評価値推定を一度に行うことができる。そのため、第１の実施形態での、マッチング処理や切り抜き処理といった前処理を省略することができる。 This machine learning can be realized by, for example, a method that takes an image as input and performs some estimation (for example, class classification) for every pixel in the image. One example of such a method is semantic segmentation. Semantic segmentation is a technology aimed at application to the automated driving of automobiles; for example, it can take photographic image data as input and color-code it by region, such as cars and pedestrians.
In the second embodiment, as described above, machine learning is performed with the distance image as the input data and an annotation of the distance image as the label (for example, an image in which a person has painted the positions where the work 50 is likely to be taken out), so that evaluation values for all pixels can be estimated at once. Therefore, pre-processing such as the matching process and the clipping process of the first embodiment can be omitted.
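
To make "estimating evaluation values for all pixels at once" concrete, the sketch below compares a predicted evaluation value map with a label map pixel by pixel. The source does not state which loss function is used for training; the mean per-pixel binary cross-entropy shown here is a common choice for segmentation and is only an assumption, as are the function name `pixelwise_bce` and the nested-list representation.

```python
import math
from typing import List


def pixelwise_bce(pred: List[List[float]], label: List[List[int]], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over all pixels (assumed loss, not from the source)."""
    total = 0.0
    count = 0
    for pred_row, label_row in zip(pred, label):
        for p, y in zip(pred_row, label_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


label = [[1, 0], [0, 0]]
good = [[0.9, 0.1], [0.2, 0.1]]  # close to the label map
bad = [[0.1, 0.9], [0.8, 0.9]]   # far from the label map
print(pixelwise_bce(good, label) < pixelwise_bce(bad, label))
```

A single forward pass yields the whole predicted map, so one loss value covers every pixel of the distance image without cropping patches first.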

具体的に、第２の実施形態の機械学習は、例えば以下の参考文献に開示されているコンボリューションエンコーダ−デコーダを利用することにより実現できる。 Specifically, the machine learning of the second embodiment can be realized by using, for example, the convolutional encoder-decoder disclosed in the following reference.

＜参考文献＞
Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla、"SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation"、［online］、平成２８年８月１０日、［平成２９年９月１０日検索］、インターネット〈URL：https://arxiv.org/pdf/1511.00561.pdf〉 <References>
Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation", [online], August 10, 2016, [retrieved September 10, 2017], Internet <URL: https://arxiv.org/pdf/1511.00561.pdf>

学習処理部１４１は、機械学習により学習済モデルを構築する。そして、学習処理部１４１は、構築した学習済モデルを、学習済モデル格納部１４２に格納する。 The learning processing unit 141 builds a trained model by machine learning. Then, the learning processing unit 141 stores the constructed trained model in the trained model storage unit 142.

取り出し位置選択部１５３は、推定処理部１４３が出力する評価値マップに基づいて、取り出し位置を選択する部分である。取り出し位置選択部１５３による取り出し位置の選択と、評価値マップの詳細について以下説明する。
取り出し位置選択部１５３は、取り出し位置を選択するために次元計測機４０から距離画像を取得すると、取得した距離画像を推定処理部１４３に対して出力する。推定処理部１４３は、入力された距離画像を入力データとして、学習済モデル格納部１４２に格納されている学習済モデルに入力する。この入力に応じて、学習済モデルからは、取り出し可能な領域をセグメンテーションした画像が出力される。この出力された画像を「評価値マップ」と呼ぶ。
評価値マップは、ラベル用マップと同様のデータ構造であり、各画素（又は１／ｓとした各画素）の２次元座標の情報と、各画素（又は１／ｓとした各画素）についてワーク５０の取り出し位置であるか否かを示す属性の情報とを含んだ画像となる。
推定処理部１４３は、評価値マップを取り出し位置選択部１５３に対して出力する。 The extraction position selection unit 153 is a portion that selects the extraction position based on the evaluation value map output by the estimation processing unit 143. The selection of the extraction position by the extraction position selection unit 153 and the details of the evaluation value map will be described below.
When the take-out position selection unit 153 acquires a distance image from the three-dimensional measuring machine 40 in order to select a take-out position, it outputs the acquired distance image to the estimation processing unit 143. The estimation processing unit 143 inputs the received distance image as input data to the trained model stored in the trained model storage unit 142. In response to this input, the trained model outputs an image in which the regions from which a work can be taken out are segmented. This output image is called an "evaluation value map".
The evaluation value map has the same data structure as the label map: it is an image containing the two-dimensional coordinates of each pixel (or each 1/s pixel) and, for each pixel (or each 1/s pixel), an attribute indicating whether or not it is a take-out position of the work 50.
The estimation processing unit 143 outputs the evaluation value map to the take-out position selection unit 153.

取り出し位置選択部１５３は、評価値マップに基づいてワーク５０の取り出し位置を選択する。具体的には、ワーク５０の取り出し位置としてセグメンテーションされている各区画を取り出し位置の候補とする。そして、取り出し位置の候補の領域を示す座標情報に基づいて、入力データとした距離画像上で、取り出し位置の候補に対応する領域を特定する。そして、この特定した領域に対して、既知の点群処理や画像処理を行うことにより取り出し位置を選択する。 The take-out position selection unit 153 selects the take-out position of the work 50 based on the evaluation value map. Specifically, each section segmented as the take-out position of the work 50 is a candidate for the take-out position. Then, based on the coordinate information indicating the region of the candidate of the extraction position, the region corresponding to the candidate of the extraction position is specified on the distance image as the input data. Then, the extraction position is selected by performing known point cloud processing or image processing on the specified area.
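
The selection step described above can be sketched as follows: group the pixels marked as take-out positions into connected regions, look up each region on the distance image, and choose one position. The source leaves the final choice to "known point cloud processing or image processing"; the rule used here (take the region nearest the sensor and return the region pixel closest to its centroid) is only a stand-in assumption, as are the function names, the 0.5 threshold, and the 4-neighbor connectivity.

```python
from collections import deque
from typing import List, Optional, Tuple

Pixel = Tuple[int, int]
Image = List[List[float]]


def candidate_regions(eval_map: Image, threshold: float = 0.5) -> List[List[Pixel]]:
    """Group pixels whose evaluation value is >= threshold into 4-connected regions."""
    height, width = len(eval_map), len(eval_map[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r0 in range(height):
        for c0 in range(width):
            if seen[r0][c0] or eval_map[r0][c0] < threshold:
                continue
            seen[r0][c0] = True
            queue, region = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and not seen[nr][nc] and eval_map[nr][nc] >= threshold:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(region)
    return regions


def select_takeout_position(eval_map: Image, depth: Image,
                            threshold: float = 0.5) -> Optional[Pixel]:
    """Assumed rule: nearest region by mean depth, then the pixel closest to its centroid."""
    regions = candidate_regions(eval_map, threshold)
    if not regions:
        return None
    best = min(regions, key=lambda reg: sum(depth[r][c] for r, c in reg) / len(reg))
    cr = sum(r for r, _ in best) / len(best)
    cc = sum(c for _, c in best) / len(best)
    return min(best, key=lambda p: (p[0] - cr) ** 2 + (p[1] - cc) ** 2)


eval_map = [[0.9, 0.8, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.7],
            [0.0, 0.0, 0.8, 0.9]]
depth = [[0.50, 0.52, 0.90, 0.90],
         [0.90, 0.90, 0.90, 0.30],
         [0.90, 0.90, 0.31, 0.32]]
print(select_takeout_position(eval_map, depth))
```

Because the evaluation value map and the distance image share pixel coordinates, a region found on the map can be read directly off the distance image.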

取り出し位置選択部１５３は、選択した取り出し位置をロボット制御装置２０に対して出力する。その後、第１の実施形態と同様に、ロボット制御装置２０が、取り出し位置に基づいて制御信号を生成する。そして、ロボット３０が制御信号に基づいてハンドによりワーク５０を取り出す。 The take-out position selection unit 153 outputs the selected take-out position to the robot control device 20. After that, the robot control device 20 generates a control signal based on the take-out position as in the first embodiment. Then, the robot 30 takes out the work 50 by hand based on the control signal.

以上説明した構成により、第２の実施形態では、マッチングや切り抜きといった前処理を省略し、３次元計測機４０から取得した距離画像全体を学習モデルの入力データとすることにより、機械学習による学習モデルの構築や、構築した学習モデルを利用した取り出し位置の選択を行うことができる。 With the configuration described above, in the second embodiment, preprocessing such as matching and clipping is omitted, and the entire distance image acquired from the three-dimensional measuring machine 40 is used as the input data of the learning model, so that the learning model by machine learning is used. Can be constructed and the extraction position can be selected using the constructed learning model.

＜第２の実施形態の動作＞
次に、図１１のフローチャートを参照して、第２の実施形態の動作について説明をする。
ステップＳ３１において、教師データ格納部１５１は、３次元計測機４０がバラ積みされたワーク５０を計測することにより生成した距離画像を格納する。
ステップＳ３２において、アノテーション処理部１５２は、教師データ格納部１５１に格納されている距離画像を表示部１２に表示させる。
ステップＳ３３において、アノテーション処理部１５２は、操作受付部１３が受け付けたユーザからの取り出し位置の教示に基づいて、距離画像上に教示位置を描画する。 <Operation of the second embodiment>
Next, the operation of the second embodiment will be described with reference to the flowchart of FIG. 11.
In step S31, the teacher data storage unit 151 stores the distance image generated by the three-dimensional measuring machine 40 measuring the works 50 piled up in bulk.
In step S32, the annotation processing unit 152 causes the display unit 12 to display the distance image stored in the teacher data storage unit 151.
In step S33, the annotation processing unit 152 draws the teaching position on the distance image based on the teaching of the taking-out position from the user received by the operation receiving unit 13.

ステップＳ３４において、アノテーション処理部１５２は、ユーザからの教示に基づいてラベル用マップを生成する。アノテーション処理部１５２は、ラベル用マップと距離画像とを教師データとして教師データ格納部１５１に格納する。 In step S34, the annotation processing unit 152 generates a label map based on the instruction from the user. The annotation processing unit 152 stores the label map and the distance image as teacher data in the teacher data storage unit 151.
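
Steps S31 to S34 store the distance image first and attach the label map later. One hedged way to picture the "store separately and link" option is a store keyed by record ID, as in the Python sketch below. The class names `TeacherRecord` and `TeacherDataStore`, the integer IDs, and the shape check (which assumes a per-pixel label map rather than the 1/s variant) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Image = List[List[float]]
LabelMap = List[List[int]]


@dataclass
class TeacherRecord:
    distance_image: Image
    label_map: Optional[LabelMap] = None  # filled in once the user has taught positions


class TeacherDataStore:
    """Hypothetical store that links each distance image to its label map by ID."""

    def __init__(self) -> None:
        self._records: Dict[int, TeacherRecord] = {}
        self._next_id = 0

    def add_distance_image(self, image: Image) -> int:
        record_id = self._next_id
        self._records[record_id] = TeacherRecord(image)
        self._next_id += 1
        return record_id

    def attach_label(self, record_id: int, label_map: LabelMap) -> None:
        record = self._records[record_id]
        image = record.distance_image
        same_shape = len(label_map) == len(image) and all(
            len(l_row) == len(i_row) for l_row, i_row in zip(label_map, image))
        if not same_shape:
            raise ValueError("label map must have the same shape as the distance image")
        record.label_map = label_map

    def training_pairs(self) -> List[Tuple[Image, LabelMap]]:
        """Return only the records that already have a label attached."""
        return [(r.distance_image, r.label_map)
                for r in self._records.values() if r.label_map is not None]


store = TeacherDataStore()
image_id = store.add_distance_image([[0.5, 0.6]])  # as in step S31
store.attach_label(image_id, [[1, 0]])             # as in step S34
print(store.training_pairs())
```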

ステップＳ３５において学習処理部１４１は、教師データ格納部１５１に格納されている教師データにおける、距離画像を入力データとし、この距離画像に対応するラベル用マップをラベルとした学習を行う。これにより、学習済モデルが作成され、学習済モデル格納部１４２に格納される。 In step S35, the learning processing unit 141 performs learning using the distance image as input data in the teacher data stored in the teacher data storage unit 151 and the label map corresponding to the distance image as a label. As a result, the trained model is created and stored in the trained model storage unit 142.

ステップＳ３６において、取り出し位置選択部１５３は、３次元計測機４０から、バラ積みされたワーク５０を計測することにより生成された距離画像を取得する。そして、取り出し位置選択部１５３は、取得した距離画像を推定処理部１４３に対して出力する。 In step S36, the take-out position selection unit 153 acquires a distance image generated by measuring the works 50 piled up in bulk from the three-dimensional measuring machine 40. Then, the extraction position selection unit 153 outputs the acquired distance image to the estimation processing unit 143.

ステップＳ３７において、推定処理部１４３は、入力された距離画像を、学習済モデル格納部１４２に格納されている学習済モデルに入力する。そして、推定処理部１４３は、学習済モデルの出力として評価値マップを取得する。推定処理部１４３は、取得した評価値マップを取り出し位置選択部１５３に対して出力する。 In step S37, the estimation processing unit 143 inputs the received distance image to the trained model stored in the trained model storage unit 142. The estimation processing unit 143 then acquires the evaluation value map as the output of the trained model, and outputs the acquired evaluation value map to the take-out position selection unit 153.

ステップＳ３８において、取り出し位置選択部１５３は評価値マップに基づいて取り出し位置を選択する。取り出し位置選択部１５３は、選択した取り出し位置をロボット制御装置２０に対して送信する。 In step S38, the extraction position selection unit 153 selects the extraction position based on the evaluation value map. The take-out position selection unit 153 transmits the selected take-out position to the robot control device 20.

ステップＳ３９において、ロボット制御装置２０が、受信した取り出し位置に基づいてロボット３０を制御することにより、ワーク５０の取り出しを行う。なお、ワーク５０を複数取り出す場合は、ステップＳ３８において複数の取り出し位置を選択し、ステップＳ３９においてこの複数の取り出し位置それぞれについて取り出しを実行するようにしてもよい。ただし、ワーク５０を取り出すことにより、ワーク５０のバラ積みの状態が変わるので、ワーク５０を取り出す度に、ステップＳ３６に戻って処理を繰り返すようにしてもよい。 In step S39, the robot control device 20 takes out the work 50 by controlling the robot 30 based on the received take-out position. When a plurality of work 50s are taken out, a plurality of take-out positions may be selected in step S38, and take-out may be executed for each of the plurality of take-out positions in step S39. However, since the state of bulk loading of the work 50 changes by taking out the work 50, the process may be repeated by returning to step S36 each time the work 50 is taken out.
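
The option of returning to step S36 after every take-out can be sketched as a simple loop. In the Python sketch below, the four callables stand in for the three-dimensional measuring machine 40, the estimation processing unit 143, the take-out position selection unit 153, and the robot control device 20; their names, the stop condition (no position selected), and the `max_picks` safety bound are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]
Pixel = Tuple[int, int]


def pick_until_empty(measure: Callable[[], Image],
                     estimate: Callable[[Image], Image],
                     select: Callable[[Image, Image], Optional[Pixel]],
                     take_out: Callable[[Pixel], None],
                     max_picks: int = 100) -> int:
    """Repeat S36 to S39, re-measuring after each take-out; return the number of picks."""
    picks = 0
    while picks < max_picks:
        distance_image = measure()                   # S36: new distance image
        eval_map = estimate(distance_image)          # S37: evaluation value map
        position = select(eval_map, distance_image)  # S38: choose a take-out position
        if position is None:                         # assumed stop condition
            break
        take_out(position)                           # S39: robot takes out the work
        picks += 1
    return picks


# Stub demonstration: two works in a 1 x 2 "bin".
bin_contents = {(0, 0), (0, 1)}


def measure() -> Image:
    return [[0.3 if (0, c) in bin_contents else 0.9 for c in range(2)]]


def estimate(image: Image) -> Image:
    return [[1.0 if d < 0.5 else 0.0 for d in row] for row in image]


def select(eval_map: Image, image: Image) -> Optional[Pixel]:
    for r, row in enumerate(eval_map):
        for c, value in enumerate(row):
            if value >= 0.5:
                return (r, c)
    return None


picked = pick_until_empty(measure, estimate, select, bin_contents.discard)
print(picked)
```

Re-measuring on every iteration reflects the point made above: each take-out changes how the remaining works are piled, so positions chosen from an older distance image may no longer be valid.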

＜ハードウェアとソフトウェアの協働＞
なお、上記のロボットシステムに含まれる各装置のそれぞれは、ハードウェア、ソフトウェア又はこれらの組み合わせにより実現することができる。また、上記のロボットシステムに含まれる各装置のそれぞれの協働により行なわれる機械学習方法も、ハードウェア、ソフトウェア又はこれらの組み合わせにより実現することができる。ここで、ソフトウェアによって実現されるとは、コンピュータがプログラムを読み込んで実行することにより実現されることを意味する。 <Collaboration of hardware and software>
Each of the devices included in the above robot system can be realized by hardware, software, or a combination thereof. In addition, a machine learning method performed in collaboration with each device included in the above robot system can also be realized by hardware, software, or a combination thereof. Here, what is realized by software means that it is realized by a computer reading and executing a program.

プログラムは、様々なタイプの非一時的なコンピュータ可読媒体(non-transitory computer readable medium)を用いて格納され、コンピュータに供給することができる。非一時的なコンピュータ可読媒体は、様々なタイプの実体のある記録媒体(tangible storage medium)を含む。非一時的なコンピュータ可読媒体の例は、磁気記録媒体（例えば、フレキシブルディスク、磁気テープ、ハードディスクドライブ）、光磁気記録媒体（例えば、光磁気ディスク）、ＣＤ−ＲＯＭ(Read Only Memory)、ＣＤ−Ｒ、ＣＤ−Ｒ／Ｗ、半導体メモリ（例えば、マスクＲＯＭ、ＰＲＯＭ(Programmable ROM)、ＥＰＲＯＭ(Erasable PROM)、フラッシュＲＯＭ、ＲＡＭ(random access memory）)を含む。また、プログラムは、様々なタイプの一時的なコンピュータ可読媒体(transitory computer readable medium)によってコンピュータに供給されてもよい。一時的なコンピュータ可読媒体の例は、電気信号、光信号、及び電磁波を含む。一時的なコンピュータ可読媒体は、電線及び光ファイバ等の有線通信路、又は無線通信路を介して、プログラムをコンピュータに供給できる。 Programs can be stored on various types of non-transitory computer readable media and supplied to a computer. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (random access memory)). The program may also be supplied to the computer by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to the computer via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.

＜実施形態の変形＞
また、上述した各実施形態は、本発明の好適な実施形態ではあるが、上記実施形態のみに本発明の範囲を限定するものではなく、各実施形態を組み合わせた形態や、本発明の要旨を逸脱しない範囲において種々の変更を施した形態での実施が可能である。 <Modification of the embodiment>
Although each of the above-described embodiments is a preferred embodiment of the present invention, the scope of the present invention is not limited to those embodiments alone; the invention can be carried out in forms that combine the embodiments or in forms with various modifications that do not depart from the gist of the present invention.

＜変形例１＞
上述した各実施形態では、画像処理装置１０ａ又は画像処理装置１０ｂと、ロボット制御装置２０を別体の装置としていたが、これら装置を一体として実現してもよい。 <Modification example 1>
In each of the above-described embodiments, the image processing device 10a or the image processing device 10b and the robot control device 20 are separate devices, but these devices may be realized as one.

＜変形例２＞
また、上述した各実施形態では、ロボット制御装置２０と、画像処理装置１０ａ又は画像処理装置１０ｂが近傍にいるように図示したが、これらがＬＡＮ（Local Area Network）やインターネット等のネットワークを介した遠方に位置していてもよい。 <Modification 2>
Further, in each of the above-described embodiments, the robot control device 20 and the image processing device 10a or the image processing device 10b are illustrated as being near each other, but they may be located far apart and connected via a network such as a LAN (Local Area Network) or the Internet.

また、１台の画像処理装置１０ａ又は画像処理装置１０ｂが、複数台のロボット制御装置２０と接続されていてもよい。そして、１台の画像処理装置１０ａ又は画像処理装置１０ｂが、この複数台のロボット制御装置２０のそれぞれから取得した教師データに基づいて学習を行うようにしてもよい。 Further, one image processing device 10a or an image processing device 10b may be connected to a plurality of robot control devices 20. Then, one image processing device 10a or an image processing device 10b may perform learning based on the teacher data acquired from each of the plurality of robot control devices 20.

＜変形例３＞
また、上述した各実施形態では、３次元計測機４０は所定位置に固定して設置されているものとして説明したが、必ずしも所定位置に固定されていなくともよい。例えば、３次元計測機４０はロボット３０を制御するための機械座標系における位置がわかるのであれば、ロボット３０のアームに取り付けるなど、設置位置が実施中に変更される形態でもよい。 <Modification example 3>
Further, in each of the above-described embodiments, the three-dimensional measuring machine 40 has been described as being fixedly installed at a predetermined position, but it does not necessarily have to be fixed at a predetermined position. For example, if the position of the three-dimensional measuring machine 40 in the machine coordinate system for controlling the robot 30 is known, the installation position may be changed during the implementation such as attaching to the arm of the robot 30.

＜変形例４＞
更に、上述した各実施形態では、画像処理装置１０ａ又は画像処理装置１０ｂを１台の装置で実現することを想定したが、画像処理装置１０ａ又は画像処理装置１０ｂの各機能を、適宜複数の装置に分散する、分散処理システムとしてもよい。例えば、画像処理装置１０ａの選択処理部１１又は画像処理装置１０ｂの選択処理部１５の機能と、学習部１４の機能とを適宜複数の装置に分散する、分散処理システムとしてもよい。この場合に、学習部１４に含まれる各機能ブロック単位で適宜複数の装置に分散する、分散処理システムとしてもよい。また、クラウド上での仮想サーバ機能等を利用して、画像処理装置１０ａ又は画像処理装置１０ｂの各機能を実現してもよい。 <Modification example 4>
Further, each of the above-described embodiments assumes that the image processing device 10a or the image processing device 10b is realized by a single device, but a distributed processing system may be used in which the functions of the image processing device 10a or the image processing device 10b are distributed across a plurality of devices as appropriate. For example, a distributed processing system may be used in which the functions of the selection processing unit 11 of the image processing device 10a or the selection processing unit 15 of the image processing device 10b and the functions of the learning unit 14 are distributed across a plurality of devices as appropriate. In this case, the distribution across a plurality of devices may be done per functional block included in the learning unit 14. Further, each function of the image processing device 10a or the image processing device 10b may be realized by using a virtual server function or the like on the cloud.

＜変形例５＞
更に、上述した第１の実施形態では、マッチング処理を行い、マッチングした位置において実際に試みた取り出し結果の成否に基づいて学習済モデルを構築していた。また、このようにマッチング処理により構築した学習モデルを利用して、その後切り抜き処理を行っていた。
しかしながら、切り抜き処理を行う場合には、必ずしもマッチング処理により構築した学習モデルを利用しなくてもよい。 <Modification 5>
Further, in the first embodiment described above, the matching process is performed, and the trained model is constructed based on the success or failure of the extraction result actually tried at the matched position. In addition, the learning model constructed by the matching process in this way was used, and then the clipping process was performed.
However, when performing the clipping process, it is not always necessary to use the learning model constructed by the matching process.

この場合、学習処理部１４１が、ユーザにより教示された教示点における取り出しは、成功するものであるとみなして学習モデルを構築する。つまり、学習部１４が、教示位置近傍画像データを入力データとし、取り出しが成功した（とみなした）ことを示す評価値をラベルとして、教師データを作成する。そして、学習処理部１４１が、この教師データによる学習によって学習モデルを構築する。 In this case, the learning processing unit 141 constructs a learning model assuming that the extraction at the teaching point taught by the user is successful. That is, the learning unit 14 creates the teacher data using the image data in the vicinity of the teaching position as the input data and the evaluation value indicating that the extraction is successful (considered) as the label. Then, the learning processing unit 141 constructs a learning model by learning with the teacher data.

選択処理部１１は、このようにして構築した学習モデルを用いて、切り抜き処理を行うことができる。これにより、選択処理部１１は、マッチング処理や、実際の取り出しを行うことなく、学習モデルを構築して、取り出し位置を選択することが可能となる。
なお、この場合に学習処理部１４１が、ユーザからの教示位置とは異なる位置（非教示位置）の近傍のデータである非教示位置近傍画像データを取得するようにしてもよい。そして、学習処理部１４１が、非教示位置近傍画像データを入力データとし、取り出しが失敗した（とみなした）ことを示す評価値をラベルとして、失敗例を示す教師データを作成するようにしてもよい。そして、学習処理部１４１が、この失敗例を示す教師データを更に用いて学習を行うようにしてもよい。この場合に、非教示位置は、ユーザにより選択されるようにしてもよいし、教示位置以外の位置からランダムに選択されるようにしてもよい。 The selection processing unit 11 can perform the clipping process using the learning model constructed in this way. As a result, the selection processing unit 11 can construct the learning model and select the extraction position without performing the matching process or the actual extraction.
In this case, the learning processing unit 141 may acquire non-teaching position neighborhood image data, which is data in the vicinity of a position different from the position taught by the user (a non-teaching position). The learning processing unit 141 may then create teacher data representing a failure example by using the non-teaching position neighborhood image data as the input data and an evaluation value indicating that the take-out failed (or is regarded as having failed) as the label, and may additionally use this failure-example teacher data for learning. In this case, the non-teaching position may be selected by the user, or may be selected at random from positions other than the teaching position.
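
A hedged sketch of how the teacher data of this modification could be assembled: patches around taught positions receive a "success" evaluation value, and patches around randomly chosen non-teaching positions receive a "failure" value. The two-dimensional patch stands in for the neighborhood image data, and the label values 1.0 and 0.0, the zero padding at the image border, and the function names are assumptions made for this example.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int]
Image = List[List[float]]


def crop_around(image: Image, center: Pixel, half: int) -> Image:
    """Cut a (2*half+1)-square patch around center, padding with 0.0 outside the image."""
    height, width = len(image), len(image[0])
    r0, c0 = center
    return [[image[r][c] if 0 <= r < height and 0 <= c < width else 0.0
             for c in range(c0 - half, c0 + half + 1)]
            for r in range(r0 - half, r0 + half + 1)]


def build_teacher_data(image: Image, taught: List[Pixel], n_negative: int,
                       half: int = 1, seed: int = 0) -> List[Tuple[Image, float]]:
    """Pair patches with labels: 1.0 for taught positions, 0.0 for random other positions."""
    rng = random.Random(seed)
    taught_set = set(taught)
    others = [(r, c) for r in range(len(image)) for c in range(len(image[0]))
              if (r, c) not in taught_set]
    negatives = rng.sample(others, min(n_negative, len(others)))
    data = [(crop_around(image, p, half), 1.0) for p in taught]     # regarded as success
    data += [(crop_around(image, p, half), 0.0) for p in negatives]  # regarded as failure
    return data


image = [[0.9, 0.8, 0.9],
         [0.8, 0.3, 0.8],
         [0.9, 0.8, 0.9]]
teacher_data = build_teacher_data(image, taught=[(1, 1)], n_negative=2)
print([label for _, label in teacher_data])
```

Sampling negatives at random is the second of the two options named above; a user-selected list of non-teaching positions could be passed in its place.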

ここで、以上の実施形態に関し、更に以下の付記を開示する。
（付記１）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
前記３次元計測機により生成された前記距離画像を表示する表示部と、
前記表示された距離画像上で、前記ハンドによる取り出しを行うための取り出し位置の教示を受け付ける受付部と、を備え、
前記ロボットは、前記教示された取り出し位置に基づいて前記ハンドにより前記複数のワークの少なくとも１つを取り出すロボットシステム。
（付記２）
前記表示部は、前記表示された距離画像上に前記教示された取り出し位置を描画する付記１に記載のロボットシステム。
（付記３）
前記教示された取り出し位置及びその近傍の３次元点群の情報を探索用情報として記憶し、前記距離画像に対して、前記探索用情報による探索を行うことにより、新たな取り出し位置を選択する取り出し位置選択部を備え、
前記ロボットは、前記取り出し位置選択部によって選択された前記新たな取り出し位置で各ワークを前記ハンドにより取り出す付記１又は付記２に記載のロボットシステム。
（付記４）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
前記３次元計測機により生成された前記距離画像と、該距離画像上に教示された取り出し位置とを表示する表示部と、
前記表示された距離画像上で、前記ハンドによる取り出しを行うための取り出し位置の教示を受け付ける受付部と、
前記教示された取り出し位置及びその近傍の３次元点群の情報を入力データとし、該入力データとした３次元点群の情報に対する教示に応じた評価値または取り出しの成否に応じた評価値の少なくともいずれか一方をラベルとした機械学習を行うことにより、入力データとして入力された３次元点群の情報についての評価値を出力する学習モデルを構築する学習部と、
を備え、
前記ロボットは、前記教示された取り出し位置に基づいて前記ハンドにより前記複数のワークの少なくとも１つを取り出すロボットシステム。
（付記５）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
前記３次元計測機により生成された前記距離画像と、該距離画像上に教示された取り出し位置とを表示する表示部と、
前記表示された距離画像上で、前記ハンドによる取り出しを行うための取り出し位置の教示を受け付ける受付部と、
前記教示された取り出し位置及びその近傍の３次元点群の情報を入力データとし、該入力データとした３次元点群の情報に対する教示に応じた評価値または取り出しの成否に応じた評価値の少なくともいずれか一方をラベルとした機械学習を行うことにより、入力データとして入力された３次元点群の情報についての評価値を出力する学習モデルを構築する学習部と、
前記距離画像から所定領域の距離画像を切り抜き、切り抜かれた前記距離画像の３次元点群の情報を前記学習モデルに入力データとして入力することにより出力される前記３次元点群の情報についての評価値に基づいて、新たな取り出し位置を選択する取り出し位置選択部と、
を備え、
前記ロボットは、前記教示された取り出し位置に基づいて前記ハンドにより前記複数のワークの少なくとも１つを取り出し、前記取り出し位置選択部によって選択された前記新たな取り出し位置で各ワークを前記ハンドにより取り出すロボットシステム。
（付記６）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
前記３次元計測機により生成された前記距離画像に基づいて前記複数のワークの少なくとも１つを前記ハンドにより取り出しを行うための取り出し位置の推定を行うことにより生成した評価値マップであって少なくとも１つの評価値からなる評価値マップを出力する推定部と、
前記推定部により出力された前記評価値マップに基づいて前記ハンドが取り出す前記複数のワークの少なくとも１つの取り出し位置を選択する位置選択部と、
を備え、
前記ロボットは、前記位置選択部により選択した前記取り出し位置に基づいて前記ハンドにより前記複数のワークの少なくとも１つを取り出すロボットシステム。
（付記７）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
前記３次元計測機により生成された前記距離画像を機械学習用の教師データとして格納する教師データ格納部と、
前記教師データ格納部に格納された前記距離画像を表示する表示部と、
前記表示部に表示された前記距離画像に基づく少なくとも１つの教示位置の教示を受け付ける受付部と、
前記受付部で受け付けた前記教示位置に基づき少なくとも１つの教示位置を示すラベルマップを生成し、該ラベルマップと前記教師データ格納部に格納された前記距離画像とを関連づけてデータセットとして前記教師データ格納部に保存するアノテーション処理部と、
前記教師データ格納部に格納された前記データセットを入力として、機械学習を行い、学習済モデルを出力する学習処理部と、
前記学習処理部から出力された前記学習済モデルを格納する学習済モデル格納部と、
前記学習済モデル格納部に格納された前記学習済モデルと、前記３次元計測機により新たに生成された新たな複数のワークの距離画像に基づいて前記新たな複数のワークの少なくとも１つを前記ハンドにより取り出す際の取り出し位置の推定を行うことにより生成した評価値マップであって少なくとも１つの評価値からなる評価値マップを出力する推定部と、
前記推定部により出力された前記評価値マップに基づいて前記ハンドが取り出す前記新たな複数のワークの少なくとも１つの取り出し位置を選択する位置選択部と、
を備え、
前記ロボットは、前記位置選択部により選択した前記取り出し位置に基づいて前記ハンドにより前記新たな複数のワークの少なくとも１つを取り出すロボットシステム。
（付記８）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
を備えたロボットシステムが行うワーク取り出し方法であって、
前記３次元計測機により生成された前記距離画像を表示する表示ステップと、
前記表示された距離画像上で、前記ハンドによる取り出しを行うための取り出し位置の教示を受け付ける受付ステップと、
を備え、
前記ロボットが前記教示された取り出し位置に基づいて前記ハンドにより前記複数のワークの少なくとも１つを取り出すワーク取り出し方法。
（付記９）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
を備えたロボットシステムが行うワーク取り出し方法であって、
前記３次元計測機により生成された前記距離画像と、該距離画像上に教示された取り出し位置とを表示する表示ステップと、
前記表示された距離画像上で、前記ハンドによる取り出しを行うための取り出し位置の教示を受け付ける受付ステップと、
前記教示された取り出し位置及びその近傍の３次元点群の情報を入力データとし、該入力データとした３次元点群の情報に対する教示に応じた評価値または取り出しの成否に応じた評価値の少なくともいずれか一方をラベルとした機械学習を行うことにより、入力データとして入力された３次元点群の情報についての評価値を出力する学習モデルを構築する学習ステップと、
を備え、
前記ロボットは、前記教示された取り出し位置に基づいて前記ハンドにより前記複数のワークの少なくとも１つを取り出すワーク取り出し方法。
（付記１０）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
を備えたロボットシステムが行うワーク取り出し方法であって、
前記３次元計測機により生成された前記距離画像と、該距離画像上に教示された取り出し位置とを表示する表示ステップと、
前記表示された距離画像上で、前記ハンドによる取り出しを行うための取り出し位置の教示を受け付ける受付ステップと、
前記教示された取り出し位置及びその近傍の３次元点群の情報を入力データとし、該入力データとした３次元点群の情報に対する教示に応じた評価値または取り出しの成否に応じた評価値の少なくともいずれか一方をラベルとした機械学習を行うことにより、入力データとして入力された３次元点群の情報についての評価値を出力する学習モデルを構築する学習ステップと、
前記距離画像から所定領域の距離画像を切り抜き、切り抜かれた前記距離画像の３次元点群の情報を前記学習モデルに入力データとして入力することにより出力される前記３次元点群の情報についての評価値に基づいて、新たな取り出し位置を選択する取り出し位置選択ステップと、
を備え、
前記ロボットは、前記教示された取り出し位置に基づいて前記ハンドにより前記複数のワークの少なくとも１つを取り出し、前記取り出し位置選択ステップによって選択された前記新たな取り出し位置で各ワークを前記ハンドにより取り出すワーク取り出し方法。
（付記１１）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
を備えたロボットシステムが行うワーク取り出し方法であって、
前記３次元計測機により生成された前記距離画像に基づいて前記複数のワークの少なくとも１つを前記ハンドにより取り出しを行うための取り出し位置の推定を行うことにより生成した評価値マップであって少なくとも１つの評価値からなる評価値マップを出力する推定ステップと、
前記推定ステップにて出力された前記評価値マップに基づいて前記ハンドが取り出す前記複数のワークの少なくとも１つの取り出し位置を選択する位置選択ステップと、
を備え、
前記ロボットが前記位置選択ステップにて選択した前記取り出し位置に基づいて前記ハンドにより前記複数のワークの少なくとも１つを取り出すワーク取り出し方法。
（付記１２）
複数のワークの距離画像を生成する３次元計測機と、
前記複数のワークの少なくとも１つを取り出すためのハンドを有するロボットと、
を備えたロボットシステムが行うワーク取り出し方法であって、
前記３次元計測機により生成された前記距離画像を機械学習用の教師データとして教師データ格納部に格納する教師データ格納ステップと、
前記教師データ格納部に格納された前記距離画像を表示する表示ステップと、
前記表示ステップにて表示された前記距離画像に基づく少なくとも１つの教示位置の教示を受け付ける受付ステップと、
前記受付ステップにて受け付けた前記教示位置に基づき少なくとも１つの教示位置を示すラベルマップを生成し、該ラベルマップと前記教師データ格納部に格納された前記距離画像とを関連づけてデータセットとして前記教師データ格納部に保存するアノテーション処理ステップと、
前記教師データ格納部に格納された前記データセットを入力として、機械学習を行い、学習済モデルを出力する学習処理ステップと、
前記学習処理ステップにて出力された前記学習済モデルを学習済モデル格納部に格納する学習済モデル格納ステップと、
前記学習済モデル格納部に格納された前記学習済モデルと、前記３次元計測機により新たに生成された新たな複数のワークの距離画像に基づいて前記新たな複数のワークの少なくとも１つを前記ハンドにより取り出す際の取り出し位置の推定を行うことにより生成した評価値マップであって少なくとも１つの評価値からなる評価値マップを出力する推定ステップと、
前記推定ステップにて出力された前記評価値マップに基づいて前記ハンドが取り出す前記新たな複数のワークの少なくとも１つの取り出し位置を選択する位置選択ステップと、
を備え、
前記ロボットが前記位置選択ステップにより選択した前記取り出し位置に基づいて前記ハンドにより前記新たな複数のワークの少なくとも１つを取り出すワーク取り出し方法。 Here, the following additional notes will be further disclosed with respect to the above embodiments.
(Appendix 1)
A robot system comprising:
a three-dimensional measuring machine that generates a distance image of a plurality of workpieces;
a robot having a hand for taking out at least one of the plurality of workpieces;
a display unit that displays the distance image generated by the three-dimensional measuring machine; and
a reception unit that receives, on the displayed distance image, teaching of a take-out position at which the hand is to perform take-out,
wherein the robot takes out at least one of the plurality of workpieces with the hand based on the taught take-out position.
(Appendix 2)
The robot system according to Appendix 1, wherein the display unit draws the taught take-out position on the displayed distance image.
(Appendix 3)
The robot system according to Appendix 1 or Appendix 2, further comprising a take-out position selection unit that stores information on the three-dimensional point cloud at and near the taught take-out position as search information and selects a new take-out position by searching the distance image using the search information,
wherein the robot takes out each workpiece with the hand at the new take-out position selected by the take-out position selection unit.
(Appendix 4)
A robot system comprising:
a three-dimensional measuring machine that generates a distance image of a plurality of workpieces;
a robot having a hand for taking out at least one of the plurality of workpieces;
a display unit that displays the distance image generated by the three-dimensional measuring machine and the take-out position taught on the distance image;
a reception unit that receives, on the displayed distance image, teaching of a take-out position at which the hand is to perform take-out; and
a learning unit that builds a learning model which outputs an evaluation value for three-dimensional point cloud information given as input data, by performing machine learning in which the three-dimensional point cloud information at and near the taught take-out position is the input data and at least one of an evaluation value according to the teaching for that point cloud information and an evaluation value according to the success or failure of the take-out is the label,
wherein the robot takes out at least one of the plurality of workpieces with the hand based on the taught take-out position.
(Appendix 5)
A robot system comprising:
a three-dimensional measuring machine that generates a distance image of a plurality of workpieces;
a robot having a hand for taking out at least one of the plurality of workpieces;
a display unit that displays the distance image generated by the three-dimensional measuring machine and the take-out position taught on the distance image;
a reception unit that receives, on the displayed distance image, teaching of a take-out position at which the hand is to perform take-out;
a learning unit that builds a learning model which outputs an evaluation value for three-dimensional point cloud information given as input data, by performing machine learning in which the three-dimensional point cloud information at and near the taught take-out position is the input data and at least one of an evaluation value according to the teaching for that point cloud information and an evaluation value according to the success or failure of the take-out is the label; and
a take-out position selection unit that cuts out a distance image of a predetermined region from the distance image, inputs the three-dimensional point cloud information of the cut-out distance image to the learning model as input data, and selects a new take-out position based on the evaluation value output for that point cloud information,
wherein the robot takes out at least one of the plurality of workpieces with the hand based on the taught take-out position, and takes out each workpiece with the hand at the new take-out position selected by the take-out position selection unit.
(Appendix 6)
A robot system comprising:
A three-dimensional measuring machine that generates distance images of a plurality of workpieces,
A robot having a hand for taking out at least one of the plurality of workpieces,
An estimation unit that estimates, based on the distance image generated by the three-dimensional measuring machine, a take-out position for taking out at least one of the plurality of workpieces by the hand, and outputs the result as an evaluation value map consisting of at least one evaluation value, and
A position selection unit that selects, based on the evaluation value map output by the estimation unit, a take-out position of at least one of the plurality of workpieces to be taken out by the hand,
Wherein the robot takes out at least one of the plurality of workpieces by the hand based on the take-out position selected by the position selection unit.
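
Selecting take-out positions from an evaluation value map can be illustrated with the short Python sketch below. The threshold, the limit on the number of positions, and the tie-breaking order are assumptions made for this example; the text above only states that the selection is based on the evaluation value map.

```python
from typing import List, Tuple


def select_positions(
    value_map: List[List[float]],
    threshold: float = 0.5,
    max_count: int = 1,
) -> List[Tuple[int, int]]:
    """Return up to max_count (row, col) positions, best evaluation value first."""
    candidates = [
        (value, row, col)
        for row, line in enumerate(value_map)
        for col, value in enumerate(line)
        if value >= threshold  # assumed cut-off for a usable take-out position
    ]
    # Highest evaluation value first; ties are broken by row, then column.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(row, col) for _, row, col in candidates[:max_count]]
```

An empty result means that no position in the map reached the assumed threshold, which a caller could treat as "nothing can be taken out from this distance image".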
(Appendix 7)
A robot system comprising:
A three-dimensional measuring machine that generates distance images of a plurality of workpieces,
A robot having a hand for taking out at least one of the plurality of workpieces,
A teacher data storage unit that stores the distance image generated by the three-dimensional measuring machine as teacher data for machine learning,
A display unit that displays the distance image stored in the teacher data storage unit,
A reception unit that receives teaching of at least one taught position based on the distance image displayed on the display unit,
An annotation processing unit that generates a label map indicating the at least one taught position based on the taught position received by the reception unit, associates the label map with the distance image stored in the teacher data storage unit, and saves them as a data set in the teacher data storage unit,
A learning processing unit that performs machine learning with the data set stored in the teacher data storage unit as input, and outputs a trained model,
A trained model storage unit that stores the trained model output from the learning processing unit,
An estimation unit that, based on the trained model stored in the trained model storage unit and a distance image of a new plurality of workpieces newly generated by the three-dimensional measuring machine, estimates a take-out position for taking out at least one of the new plurality of workpieces by the hand, and outputs the result as an evaluation value map consisting of at least one evaluation value, and
A position selection unit that selects, based on the evaluation value map output by the estimation unit, a take-out position of at least one of the new plurality of workpieces to be taken out by the hand,
Wherein the robot takes out at least one of the new plurality of workpieces by the hand based on the take-out position selected by the position selection unit.
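
The annotation processing described above (a label map generated from taught positions and stored together with the distance image as a data set) might look like the following Python sketch. The binary labels, the square neighbourhood given by `radius`, and the dictionary layout of a data set entry are assumptions chosen for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def make_label_map(
    height: int,
    width: int,
    taught_positions: Sequence[Tuple[int, int]],
    radius: int = 1,
) -> List[List[int]]:
    """Mark each taught (row, col) position and its neighbourhood with 1."""
    label_map = [[0] * width for _ in range(height)]
    for r0, c0 in taught_positions:
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                label_map[r][c] = 1
    return label_map


def make_dataset_entry(
    distance_image: List[List[float]],
    taught_positions: Sequence[Tuple[int, int]],
    radius: int = 1,
) -> Dict[str, list]:
    """Pair a distance image with the label map generated for it."""
    height, width = len(distance_image), len(distance_image[0])
    return {
        "distance_image": distance_image,
        "label_map": make_label_map(height, width, taught_positions, radius),
    }
```

A list of such entries plays the role of the data set that the learning processing unit takes as input.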
(Appendix 8)
A workpiece take-out method performed by a robot system that comprises
A three-dimensional measuring machine that generates distance images of a plurality of workpieces, and
A robot having a hand for taking out at least one of the plurality of workpieces,
The method comprising:
A display step of displaying the distance image generated by the three-dimensional measuring machine, and
A reception step of receiving, on the displayed distance image, teaching of a take-out position for taking out by the hand,
Wherein the robot takes out at least one of the plurality of workpieces by the hand based on the taught take-out position.
(Appendix 9)
A workpiece take-out method performed by a robot system that comprises
A three-dimensional measuring machine that generates distance images of a plurality of workpieces, and
A robot having a hand for taking out at least one of the plurality of workpieces,
The method comprising:
A display step of displaying the distance image generated by the three-dimensional measuring machine and the take-out position taught on the distance image,
A reception step of receiving, on the displayed distance image, teaching of a take-out position for taking out by the hand, and
A learning step of using the three-dimensional point cloud information at the taught position and its vicinity as input data, performing machine learning using as a label at least one of an evaluation value according to the teaching for that point cloud information and an evaluation value according to the success or failure of the take-out, and thereby building a learning model that outputs an evaluation value for three-dimensional point cloud information given as input data,
Wherein the robot takes out at least one of the plurality of workpieces by the hand based on the taught take-out position.
(Appendix 10)
A workpiece take-out method performed by a robot system that comprises
A three-dimensional measuring machine that generates distance images of a plurality of workpieces, and
A robot having a hand for taking out at least one of the plurality of workpieces,
The method comprising:
A display step of displaying the distance image generated by the three-dimensional measuring machine and the take-out position taught on the distance image,
A reception step of receiving, on the displayed distance image, teaching of a take-out position for taking out by the hand,
A learning step of using the three-dimensional point cloud information at the taught position and its vicinity as input data, performing machine learning using as a label at least one of an evaluation value according to the teaching for that point cloud information and an evaluation value according to the success or failure of the take-out, and thereby building a learning model that outputs an evaluation value for three-dimensional point cloud information given as input data, and
A take-out position selection step of cutting out a distance image of a predetermined region from the distance image, inputting the three-dimensional point cloud information of the cut-out distance image into the learning model as input data, and selecting a new take-out position based on the evaluation value output for that point cloud information,
Wherein the robot takes out at least one of the plurality of workpieces by the hand based on the taught take-out position, and takes out each workpiece by the hand at the new take-out position selected in the take-out position selection step.
(Appendix 11)
A workpiece take-out method performed by a robot system that comprises
A three-dimensional measuring machine that generates distance images of a plurality of workpieces, and
A robot having a hand for taking out at least one of the plurality of workpieces,
The method comprising:
An estimation step of estimating, based on the distance image generated by the three-dimensional measuring machine, a take-out position for taking out at least one of the plurality of workpieces by the hand, and outputting the result as an evaluation value map consisting of at least one evaluation value, and
A position selection step of selecting, based on the evaluation value map output in the estimation step, a take-out position of at least one of the plurality of workpieces to be taken out by the hand,
Wherein the robot takes out at least one of the plurality of workpieces by the hand based on the take-out position selected in the position selection step.
(Appendix 12)
A workpiece take-out method performed by a robot system that comprises
A three-dimensional measuring machine that generates distance images of a plurality of workpieces, and
A robot having a hand for taking out at least one of the plurality of workpieces,
The method comprising:
A teacher data storage step of storing the distance image generated by the three-dimensional measuring machine in a teacher data storage unit as teacher data for machine learning,
A display step of displaying the distance image stored in the teacher data storage unit,
A reception step of receiving teaching of at least one taught position based on the distance image displayed in the display step,
An annotation processing step of generating a label map indicating the at least one taught position based on the taught position received in the reception step, associating the label map with the distance image stored in the teacher data storage unit, and saving them as a data set in the teacher data storage unit,
A learning processing step of performing machine learning with the data set stored in the teacher data storage unit as input, and outputting a trained model,
A trained model storage step of storing the trained model output in the learning processing step in a trained model storage unit,
An estimation step of estimating, based on the trained model stored in the trained model storage unit and a distance image of a new plurality of workpieces newly generated by the three-dimensional measuring machine, a take-out position for taking out at least one of the new plurality of workpieces by the hand, and outputting the result as an evaluation value map consisting of at least one evaluation value, and
A position selection step of selecting, based on the evaluation value map output in the estimation step, a take-out position of at least one of the new plurality of workpieces to be taken out by the hand,
Wherein the robot takes out at least one of the new plurality of workpieces by the hand based on the take-out position selected in the position selection step.

1a, 1b Robot system
10a, 10b Image processing device
11 Selection processing unit
111 Selection data storage unit
112 Annotation processing unit
113 Matching unit
114 Clipping unit
12 Display unit
13 Operation reception unit
14 Learning unit
141 Learning processing unit
142 Trained model storage unit
143 Estimation processing unit
15 Selection processing unit
151 Teacher data storage unit
152 Annotation processing unit
153 Take-out position selection unit
20 Robot control device
30 Robot
40 Three-dimensional measuring machine
50 Workpiece
60 Container

Claims

A robot system comprising:
A robot,
An image processing device that acquires a distance image, generated by a measuring machine, that includes a plurality of objects, and analyzes the distance image, and
A control device that controls the operation of taking out an object by the robot,
Wherein the image processing device is configured to:
Acquire teaching of a position in the distance image for taking out at least one of the plurality of objects by the robot,
Assign, to point cloud information of the object at least at the taught position, obtained from the distance image, an evaluation value according to at least one of the teaching and a result of taking out the object based on the teaching, and
Learn a learning model that outputs an evaluation value for input point cloud information, by performing machine learning using the point cloud information of the object as input data and the evaluation value as a label,
And wherein the control device controls the operation of taking out an object by the robot based on the evaluation value output by the learning model of the image processing device.
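
The learning described in this claim (point cloud information as input data, the evaluation value as the label) can be illustrated with a deliberately small Python sketch. A logistic regression trained by stochastic gradient descent stands in for the learning model here; that choice, the feature layout, the learning rate, and the number of epochs are assumptions for illustration and are not taken from the claims.

```python
import math
from typing import List, Sequence, Tuple

# One training sample: (point cloud features, evaluation value used as the label).
Sample = Tuple[Sequence[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Output an evaluation value in (0, 1) for the given point cloud features."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(
    samples: Sequence[Sample],
    epochs: int = 2000,
    learning_rate: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit the weights by per-sample gradient descent on the cross-entropy loss."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    return weights, bias
```

For example, samples whose features are height differences around the taught position could be labelled 1.0 where the take-out succeeded and 0.0 where it failed; after `train`, calling `predict` on features from a new position yields the evaluation value on which the control device would act.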

The teaching of the position indicates the position as either a point or a region.
The robot system according to claim 1.

When the extraction fails, the image processing device gives a lower evaluation value than when the extraction is successful.
The robot system according to claim 1 or 2.

The evaluation value output by the learning model is information indicating a retrievable region in the input point cloud information.
The robot system according to any one of claims 1 to 3.

The learning model is a neural network,
The robot system according to any one of claims 1 to 4.

The result of taking out the object based on the teaching is obtained by attempting to take out the object at the taught position by the robot.
The robot system according to any one of claims 1 to 5.

The measuring machine is either a distance image sensor or a stereo camera.
The robot system according to any one of claims 1 to 6.

The size of the range of the point cloud input to the learning model is preset based on the size of the object to be taken out.
The robot system according to any one of claims 1 to 7.
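
As a rough illustration of presetting the input range from the object size, the Python sketch below converts an object size into a cut-out width in pixels. It assumes an approximately constant scale `mm_per_pixel` in the distance image and a safety `margin`; both parameters, and the choice of an odd width so that the region has a centre pixel, are hypothetical and not stated in the claim.

```python
import math


def crop_size_pixels(
    object_size_mm: float,
    mm_per_pixel: float,
    margin: float = 1.5,
) -> int:
    """Return an odd cut-out width (in pixels) that covers the object plus a margin."""
    if object_size_mm <= 0 or mm_per_pixel <= 0 or margin <= 0:
        raise ValueError("all arguments must be positive")
    size = math.ceil(object_size_mm * margin / mm_per_pixel)
    # An odd width gives the cut-out region a well-defined centre pixel.
    return size if size % 2 == 1 else size + 1
```

With a real sensor the scale varies with depth, so a fixed `mm_per_pixel` is only a first approximation.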

The image processing device gives the evaluation value according to the time required for taking out.
The robot system according to any one of claims 1 to 8.
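
The claims state that a failed take-out receives a lower evaluation value than a successful one, and that the evaluation value may depend on the time required. One possible scoring rule that satisfies both is sketched below in Python; the concrete value ranges and the `time_limit` parameter are assumptions made for this example.

```python
def evaluation_value(success: bool, seconds: float, time_limit: float = 10.0) -> float:
    """Return 0.0 for a failed take-out and a value in [0.5, 1.0] for a successful one.

    Faster successful take-outs score higher; times beyond time_limit are clipped.
    """
    if time_limit <= 0:
        raise ValueError("time_limit must be positive")
    if not success:
        return 0.0
    clipped = min(max(seconds, 0.0), time_limit)
    return 1.0 - 0.5 * clipped / time_limit
```

Because every successful take-out scores at least 0.5, even the slowest success is still rated above any failure.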

The learning model outputs an image in which the retrievable region is segmented.
The robot system according to any one of claims 1 to 9.

The image processing device selects a take-out position of an object based on the evaluation value output by the learning model.
The control device controls the take-out operation of the object by the robot based on the selected take-out position.
The robot system according to any one of claims 1 to 10.

The image processing device and the control device are composed of one device.
The robot system according to any one of claims 1 to 11.

The measuring machine is attached to the robot,
The robot system according to any one of claims 1 to 12.

Based on the result of taking out the object based on the teaching, the evaluation value is given to the information of the point cloud of the object.
The robot system according to any one of claims 1 to 13.

A model generation method in which the learning model is learned by the image processing device in the robot system according to any one of claims 1 to 14.

A model generation method executed by at least one processor, comprising:
A step of acquiring teaching of a position in a distance image, generated by a measuring machine, that includes a plurality of objects, for taking out at least one of the plurality of objects by a robot,
A step of assigning, to point cloud information of the object at least at the taught position, obtained from the distance image, an evaluation value according to at least one of the teaching and a result of taking out the object based on the teaching, and
A step of learning a learning model that outputs an evaluation value for input point cloud information, by performing machine learning using the point cloud information of the object as input data and the evaluation value as a label,
Wherein the operation of taking out an object by the robot is controlled based on the evaluation value output by the learning model.

The step of assigning the evaluation value is to assign the evaluation value to the information of the point cloud of the object based on the result of taking out the object based on the teaching.
The model generation method according to claim 16.

A model generation program that causes at least one processor to execute:
A step of acquiring teaching of a position in a distance image, generated by a measuring machine, that includes a plurality of objects, for taking out at least one of the plurality of objects by a robot,
A step of assigning, to point cloud information of the object at least at the taught position, obtained from the distance image, an evaluation value according to at least one of the teaching and a result of taking out the object based on the teaching, and
A step of learning a learning model that outputs an evaluation value for input point cloud information, by performing machine learning using the point cloud information of the object as input data and the evaluation value as a label,
Wherein the operation of taking out an object by the robot is controlled based on the evaluation value output by the learning model.

The step of assigning the evaluation value is to assign the evaluation value to the information of the point cloud of the object based on the result of taking out the object based on the teaching.
The model generation program according to claim 18.