JP2018189510A

JP2018189510A - Method and device for estimating position and posture of three-dimensional object

Info

Publication number: JP2018189510A
Application number: JP2017092407A
Authority: JP
Inventors: 翔悟荒井; Shogo Arai; 橋本　浩一; Koichi Hashimoto; 浩一橋本; 明宇李; Mingyu Sun; 哲也泉; Tetsuya Izumi; 恭嗣原田; Yasushi Harada; 巌尾方; Iwao Ogata
Original assignee: Tohoku University NUC; Micro Technica Co Ltd
Current assignee: Tohoku University NUC; Micro Technica Co Ltd
Priority date: 2017-05-08
Filing date: 2017-05-08
Publication date: 2018-11-29

Abstract

PROBLEM TO BE SOLVED: To recognize a target in a three-dimensional scene more quickly. SOLUTION: Data of a model point group representing the three-dimensional shape of the surface of the target is acquired (S100). A point pair feature quantity that defines the geometrical relation of a point pair selected from the model point group is calculated for each point pair and recorded in a first table in association with that point pair (S110). A plurality of key points are selected from the model point group on the basis of the data of the model point group and the first table (S120). A second table is constructed that retains, out of the data recorded in the first table, only the data pertaining to the plurality of key points (S130). Data of a scene point group representing the surface shape of one or more objects included in a three-dimensional scene is acquired (S140). The position and posture of the target in the three-dimensional scene are estimated on the basis of the data of the scene point group and the second table (S150). SELECTED DRAWING: Figure 5

Description

本願は、３次元シーンの中から対象物を認識してその姿勢を推定するための技術に関する。 The present application relates to a technique for recognizing an object from a three-dimensional scene and estimating its posture.

３次元シーンの中から、対象物を認識してその位置および姿勢を推定する様々な技術が開発されている。例えば、特許文献１および非特許文献１は、ポイントペア特徴量（ＰｏｉｎｔＰａｉｒＦｅａｔｕｒｅ：ＰＰＦ）と呼ばれる特徴量を用いて、３次元物体を効果的に認識する方法を開示している。 Various techniques for recognizing an object from a three-dimensional scene and estimating its position and posture have been developed. For example, Patent Literature 1 and Non-Patent Literature 1 disclose a method for effectively recognizing a three-dimensional object using a feature amount called a point pair feature (PPF).

Patent Literature 1: U.S. Pat. No. 8,830,229

Non-Patent Literature 1: Drost et al., "Model globally, match locally: Efficient and robust 3D object recognition", 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 13 June 2010, pp. 998-1005

本願は、３次元シーン中の対象物を、より高速に認識することが可能な技術を提供する。 The present application provides a technology capable of recognizing an object in a three-dimensional scene at higher speed.

A method according to one aspect of the present invention estimates the position and posture of an object in a three-dimensional scene. The method includes: acquiring data of a model point group representing the three-dimensional shape of the surface of the object; calculating, for each point pair selected from the model point group, a point pair feature (PPF) that defines the geometric relationship of the point pair, and recording the point pair feature in a first table in association with the point pair; selecting a plurality of key points from the model point group based on the data of the model point group and the first table; constructing a second table that retains, from the data recorded in the first table, only the data relating to the plurality of key points; acquiring data of a scene point group representing the surface shape of one or more objects included in the three-dimensional scene; and estimating the position and posture of the object in the three-dimensional scene based on the data of the scene point group and the second table.

上記の包括的または具体的な態様は、装置、システム、方法、集積回路、コンピュータプログラム、記録媒体、またはこれらの任意の組み合わせによって実現され得る。 The comprehensive or specific aspect described above can be realized by an apparatus, a system, a method, an integrated circuit, a computer program, a recording medium, or any combination thereof.

本発明の実施形態によれば、３次元シーン中の対象物をより高速に認識することができる。 According to the embodiment of the present invention, an object in a three-dimensional scene can be recognized at a higher speed.

A diagram schematically showing a robot system in an exemplary embodiment.
A block diagram showing, in simplified form, an example of the configuration of the control device 100.
A diagram showing an example of the robot 200 equipped with the three-dimensional sensor 300 and the control device 100.
A diagram outlining the object recognition process in the embodiment.
A flowchart showing the overall flow of operations executed by the signal processing circuit 120.
A diagram for explaining the point pair feature (PPF).
A diagram showing an example of a hash table.
The diagram shown in FIG. 6 of Patent Literature 1.
A flowchart showing the key point selection process in the embodiment.
A diagram showing an example of a cube-shaped model.
A diagram showing that the PPF calculated from a point m0 and another point m1 in a region MK is matched against the PPFs in the hash table.
A diagram showing an example of the correspondence between the combinations of point pairs having the same PPF detected for the point m0 and the postures calculated for each of those point pairs.
A diagram showing an example in which the entire model is coordinate-transformed using the coordinate transformation parameters (local coordinates) associated with the point pair (mp, mq) in the model.
A diagram showing an example in which the entire model is coordinate-transformed using the coordinate transformation parameters (local coordinates) associated with the point pair (mx, my) in the model.
A diagram showing an example of a reference region at a corner or an edge.
A flowchart showing the process of step S150 in FIG. 5 in more detail.
A diagram for explaining a two-dimensional integral image.
A diagram for explaining a three-dimensional integral image.
A diagram showing the models used in the simulation experiments.
A graph showing simulation results for synthetic scenes.
A graph showing simulation results for synthetic scenes.
An image showing an example of a recognition result for an industrial part.
A graph showing an example of how the recognition rate changes as the occlusion ratio is varied in a real scene.

(Findings underlying the present invention)
Before describing embodiments of the present invention, the findings on which the invention is based are described.

３次元シーンの中から特定の物体を認識してその位置および姿勢を推定する技術は、例えばビンピッキングを行うロボットを実現する上で重要である。ビンピッキングとは、不規則な位置および姿勢で積み重ねられた複数の物体の中から特定の物体を把持して指定の場所に運ぶことを指す。そのような作業を行うロボットを実現するためには、乱雑に積み重ねられた複数の物体の中から、特定の物体の位置および姿勢を正確かつ迅速に認識することが求められる。 A technique for recognizing a specific object from a three-dimensional scene and estimating its position and orientation is important for realizing a robot that performs bin picking, for example. Bin picking refers to gripping a specific object from a plurality of objects stacked at irregular positions and postures and transporting the object to a designated place. In order to realize a robot that performs such work, it is required to accurately and quickly recognize the position and orientation of a specific object from among a plurality of objects stacked randomly.

従来、３次元データから円または四角形などの単純な幾何形状を検出することによって物体を認識することは比較的容易に実現可能であった。例えば円筒形状のボルト、リング形状のベアリング、あるいは直方体形状の箱などの単純な形状の物体については、比較的正確に認識することができた。しかし、単純な幾何形状ではない形状をもつ物体（例えば工業用部品等）については、上記のような方法では正しく検出することが困難であった。 Conventionally, it has been relatively easy to recognize an object by detecting a simple geometric shape such as a circle or a rectangle from three-dimensional data. For example, simple shaped objects such as cylindrical bolts, ring shaped bearings, and rectangular parallelepiped boxes could be recognized relatively accurately. However, it has been difficult to correctly detect an object having a shape other than a simple geometric shape (for example, an industrial part) by the above method.

Patent Literature 1 and Non-Patent Literature 1 (hereinafter "Drost") disclose a method that estimates the posture of an object more effectively than earlier approaches by using a feature called the point pair feature (PPF). The PPF is a four-dimensional quantity describing the geometric relationship between the displacement vector between two points in a three-dimensional point cloud and the normal vectors at those two points. It consists of the magnitude of the displacement vector, the angle between the displacement vector and the normal vector at the first point, the angle between the displacement vector and the normal vector at the second point, and the angle between the two normal vectors. In Drost's method, a PPF is calculated for every pair of points in a model point group (model cloud) representing the three-dimensional shape of the object's surface, and each PPF is recorded in a hash table together with the information on its point pair. Using the PPFs recorded in this hash table, the portion of the scene point group (scene cloud) that corresponds to the model point group is detected.

The method disclosed in Drost can be applied to objects of arbitrary shape and can recognize an object with reasonable accuracy. However, the present inventors found that recognition becomes slower for objects in which many point pairs have similar PPFs, such as industrial parts.

以上の知見に基づき、本発明者らは、上記課題を解決し得る新規な物体認識技術を開発するに至った。 Based on the above findings, the present inventors have developed a novel object recognition technology that can solve the above-mentioned problems.

A method in one aspect of the present invention is
a method for estimating the position and posture of an object in a three-dimensional scene, the method including:
acquiring data of a model point group representing the three-dimensional shape of the surface of the object;
calculating, for each point pair selected from the model point group, a point pair feature (PPF) that defines the geometric relationship of the point pair, and recording the point pair feature in a first table in association with the point pair;
selecting a plurality of key points from the model point group based on the data of the model point group and the first table;
constructing a second table that retains, from the data recorded in the first table, only the data relating to the plurality of key points;
acquiring data of a scene point group representing the surface shape of one or more objects included in the three-dimensional scene; and
estimating the position and posture of the object in the three-dimensional scene based on the data of the scene point group and the second table.

According to the above aspect, a second table is constructed that retains, from the data recorded in the first table, only the data relating to the plurality of key points. The position and posture of the object in the three-dimensional scene are then estimated based on the data of the scene point group and the second table. This speeds up the step of estimating the position and posture of the object.

The first table corresponds to the hash table disclosed in Drost. Because this hash table contains a large amount of data, it requires a large amount of computation. If the number of points in the model point group is Nm, the number of point pairs recorded in the hash table is Nm(Nm - 1). In contrast, the second table contains only data relating to a subset of points (key points) selected from the model point group, and therefore holds fewer entries. If the number of key points is Nk (< Nm), the number of point pairs recorded in the hash table can be reduced to Nk(Nm - 1). The number of key points Nk can be set, for example, to between 1/10,000 and 1/2 of the number of points Nm in the model point group. In one example, Nk can be set to between 1/1,000 and 1/100 of Nm. When such a small number of key points is selected, the computation time required for posture estimation using the hash table can be shortened significantly.
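The reduction in table size can be checked with simple arithmetic. The following is a minimal sketch; the function name `pair_counts` and the example sizes are illustrative assumptions, and the second count assumes that the retained pairs are those whose first point is a key point, which is one reading consistent with the figure Nk(Nm - 1).

```python
def pair_counts(nm, nk):
    """Return (pairs in the first table, pairs in the second table).

    nm: number of points in the model point group (Nm)
    nk: number of key points (Nk), assumed to satisfy 0 < nk < nm
    """
    if not 0 < nk < nm:
        raise ValueError("expected 0 < nk < nm")
    full = nm * (nm - 1)     # every ordered pair of distinct model points
    reduced = nk * (nm - 1)  # assumed: only pairs that start at a key point
    return full, reduced


# Illustrative sizes only: 1000 model points, 10 key points (Nk = Nm / 100).
full, reduced = pair_counts(nm=1000, nk=10)
print(full, reduced, full // reduced)  # 999000 9990 100
```

With Nk set to 1/100 of Nm, the table shrinks by the same factor of 100.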

Key points are a set of preferred points carefully selected from the model point group according to a predetermined criterion. Key points are selected, for example, from points judged to have high posture uniqueness. The process of selecting the plurality of key points can include, for example, the following steps.
(a) For a point mi in the model point group (i is an integer from 1 to Nm, where Nm is the number of points in the model point group), calculate the point pair feature between mi and each point mj (j is an integer from 1 to M, where M is the number of points in the reference region) in a reference region whose distance from mi is shorter than a threshold.
(b) Referring to the first table, search for at least one point pair whose point pair feature is similar to the calculated point pair feature.
(c) For each point pair found, determine the coordinate transformation parameters that align it with the pair of points mi and mj.
(d) Select as a key point the point mi that minimizes the number of cases in which, when the coordinates of the model point group are transformed using each of the determined coordinate transformation parameters, some point of the transformed model point group lies in the vicinity of every point in the reference region.
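The final selection in step (d) can be sketched as follows. This is only an illustration: it assumes that steps (a) to (c) have already produced, for each model point, the number of transformations under which the reference region is fully covered by the transformed model (called `ambiguity_counts` here, a hypothetical name), and it assumes that the plurality of key points is obtained by taking the Nk points with the smallest counts.

```python
import heapq


def select_key_points(ambiguity_counts, nk):
    """Return the nk point indices with the smallest ambiguity counts.

    ambiguity_counts: dict mapping a model point index to the number of
        transformations under which its reference region is fully covered
        (assumed to be precomputed by steps (a) to (c)).
    nk: number of key points to keep.
    """
    return heapq.nsmallest(nk, ambiguity_counts, key=ambiguity_counts.get)


# Illustrative counts for four model points.
counts = {0: 12, 1: 3, 2: 7, 3: 1}
key_points = select_key_points(counts, nk=2)
print(key_points)  # [3, 1]
```

A low count means few alternative alignments reproduce the neighbourhood of the point, which is the sense in which the posture at that point is unique.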

上記の処理によれば、姿勢の独自性の高い点がキーポイントとして選択されるので、後に続く推定処理を高速化することができる。 According to the above processing, a point with a high attitude uniqueness is selected as a key point, so that the subsequent estimation processing can be speeded up.

The process of estimating the position and posture of the object can include, for example, the following steps.
(e) For each point si in the scene point group (i is an integer from 1 to Ns, where Ns is the number of points in the scene point group), select a subset Sr from the set of points whose distance from si is no greater than the distance between the two most distant points in the model point group.
(f) Calculate the point pair feature between si and each point Srj in Sr (j is an integer from 1 to Ms).
(g) Referring to the second table, search for at least one point pair whose point pair feature is similar to the calculated point pair feature.
(h) For each point pair found, determine the coordinate transformation parameters that align it with the pair of points si and Srj, and determine a count for each posture indicated by the coordinate transformation parameters.
(i) From the postures for which counts have been determined, select a plurality of candidate postures in descending order of count.
(j) From the plurality of candidate postures, determine as the final posture the one that gives the highest degree of matching between the model point group and the scene point group.
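Steps (i) and (j) amount to a two-stage selection: rank postures by vote count, then verify only the top candidates. The sketch below illustrates that structure under stated assumptions: postures are represented by hashable labels standing in for discretized coordinate transformation parameters, and `match_score` is a hypothetical callable standing in for whatever measure of agreement between the model and scene point groups is used.

```python
from collections import Counter


def select_final_posture(votes, num_candidates, match_score):
    """Pick the best-matching posture among the most-voted candidates.

    votes: iterable of hashable posture labels, one per vote (step (h)).
    num_candidates: how many top-voted postures to verify (step (i)).
    match_score: hypothetical function mapping a posture to a matching score.
    """
    counts = Counter(votes)
    if not counts:
        raise ValueError("no votes were cast")
    # Step (i): candidates in descending order of vote count.
    candidates = [posture for posture, _ in counts.most_common(num_candidates)]
    # Step (j): keep the candidate with the highest matching score.
    return max(candidates, key=match_score)


# Illustrative data: "C" has the best score but too few votes to be a candidate.
votes = ["A", "A", "A", "B", "B", "C"]
scores = {"A": 0.6, "B": 0.9, "C": 0.95}
best = select_final_posture(votes, num_candidates=2, match_score=scores.get)
print(best)  # B
```

Verifying several candidates rather than only the top-voted one guards against a posture that collects many votes yet fits the scene poorly.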

上記の処理によれば、シーン点群の全ての点について、モデル点群との照合が行われるため、より正確に物体の位置および姿勢を推定することができる。さらに、データ量の少ない第２のテーブルを参照するため、照合処理に要する時間を短縮することができる。 According to the above processing, since all points in the scene point group are collated with the model point group, the position and orientation of the object can be estimated more accurately. Furthermore, since the second table with a small amount of data is referred to, the time required for the collation process can be shortened.

本明細書に開示される方法は、コンピュータプログラムを実行するコンピュータ、プロセッサ、または処理回路によって実現され得る。そのようなコンピュータプログラムは、例えば装置内のメモリに格納される。本開示は、そのようなコンピュータプログラムを格納したメモリと、当該コンピュータプログラムを実行するプロセッサまたは処理回路とを備える処理装置またはロボットを含む。 The methods disclosed herein may be implemented by a computer, processor, or processing circuit that executes a computer program. Such a computer program is stored in a memory in the apparatus, for example. The present disclosure includes a processing device or a robot including a memory storing such a computer program and a processor or a processing circuit that executes the computer program.

More specific embodiments of the present invention are described below. Unnecessarily detailed description may be omitted; for example, detailed descriptions of well-known matters and repeated descriptions of substantially identical configurations may be left out. This avoids needless redundancy in the following description and makes it easier for those skilled in the art to understand. The inventors provide the accompanying drawings and the following description so that those skilled in the art can fully understand the present disclosure, and do not intend them to limit the subject matter described in the claims. In the following description, identical or similar components are denoted by the same reference numerals.

(Embodiment)
[1. Configuration]
FIG. 1 is a diagram schematically showing a robot system in an exemplary embodiment of the present invention. The system includes a robot 200, a three-dimensional sensor 300, and a control device 100. In the present embodiment, the signal processing circuit 120 of the control device 100 carries out a method according to one aspect of the present disclosure by executing a program stored in the memory 130.

The robot 200 includes a plurality of arms and an end effector (hand), and can grip and move objects. On receiving a command from the control device 100, the robot 200 performs bin picking: it grips a specific object from among a plurality of objects 800 piled randomly in a box and carries it to a designated position. The robot 200 contains a plurality of electric motors and at least one control circuit that controls the motors. These control circuits control each motor appropriately in accordance with commands from the control device 100, causing the robot 200 to perform the desired operation.

３次元センサ３００は、箱の中の複数の物体８００を撮影し、３次元点群データを生成して出力する。この３次元点群データを、以下、「シーン点群データ」と称する。３次元センサ３００は、撮影対象の領域における距離分布などの３次元データを取得し、そのデータに基づいて３次元点群データを生成して出力する。３次元センサ３００は、例えばレーザレンジファインダ、ＴＯＦ（ＴｉｍｅｏｆＦｌｉｇｈｔ）カメラ、またはステレオカメラによって実現され得る。 The three-dimensional sensor 300 images a plurality of objects 800 in the box, generates three-dimensional point cloud data, and outputs it. This three-dimensional point cloud data is hereinafter referred to as “scene point cloud data”. The three-dimensional sensor 300 acquires three-dimensional data such as a distance distribution in the region to be imaged, and generates and outputs three-dimensional point cloud data based on the data. The three-dimensional sensor 300 can be realized by, for example, a laser range finder, a TOF (Time of Flight) camera, or a stereo camera.

「３次元点群」とは、３次元空間中に存在する１つ以上の物体の表面に分布する複数の点の集合を意味する。本実施形態においては、３次元センサ３００によって取得されるシーン点群データの他に、検出対象の物体（以下、「対象物」と称することがある）の３次元点群を示すモデル点群データが予め用意される。そのようなモデル点群データは、制御装置１００が有する記録媒体に格納されている。 The “three-dimensional point group” means a set of a plurality of points distributed on the surface of one or more objects existing in the three-dimensional space. In the present embodiment, in addition to the scene point cloud data acquired by the three-dimensional sensor 300, model point cloud data indicating a three-dimensional point cloud of an object to be detected (hereinafter sometimes referred to as “target”). Are prepared in advance. Such model point cloud data is stored in a recording medium included in the control device 100.

制御装置１００は、ロボット２００および３次元センサ３００に有線または無線で接続され、これらの動作を制御する。制御装置１００は、３次元センサ３００によって取得されたシーン点群データと、予め用意されたモデル点群データとに基づいて、３次元シーンにおける検出対象の物体を認識し、その姿勢を推定する。 The control device 100 is connected to the robot 200 and the three-dimensional sensor 300 by wire or wirelessly and controls these operations. The control device 100 recognizes an object to be detected in the three-dimensional scene based on the scene point cloud data acquired by the three-dimensional sensor 300 and the model point cloud data prepared in advance, and estimates its posture.

図２は、制御装置１００の構成の一例を簡易的に示すブロック図である。制御装置１００は、例えば汎用または専用のコンピュータであり、制御回路１１０と、信号処理回路１２０と、メモリ１３０とを備えている。制御装置１００は、他にも、通信回路、入／出力インタフェース、ハードディスクドライブなどの記録装置、電源回路などの構成要素を備え得るが、図２では省略されている。図２には、ロボット２００および３Ｄセンサ３００以外に、ディスプレイ４００が示されている。ディスプレイ４００は、制御装置１００に接続され、信号処理回路１２０による処理結果を表示する。 FIG. 2 is a block diagram schematically illustrating an example of the configuration of the control device 100. The control device 100 is a general-purpose or dedicated computer, for example, and includes a control circuit 110, a signal processing circuit 120, and a memory 130. The control device 100 may include other components such as a communication circuit, an input / output interface, a recording device such as a hard disk drive, and a power supply circuit, which are omitted in FIG. In FIG. 2, in addition to the robot 200 and the 3D sensor 300, a display 400 is shown. The display 400 is connected to the control device 100 and displays a processing result by the signal processing circuit 120.

制御回路１１０は、例えばＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）などのプロセッサによって実現され得る。制御回路１１０は、メモリ１３０（例えばＤＲＡＭまたはＳＲＡＭ）に格納されたコンピュータプログラムを実行することにより、ロボット２００、３Ｄセンサ３００、および信号処理回路１２０を制御する。 The control circuit 110 can be realized by a processor such as a CPU (Central Processing Unit). The control circuit 110 controls the robot 200, the 3D sensor 300, and the signal processing circuit 120 by executing a computer program stored in the memory 130 (for example, DRAM or SRAM).

信号処理回路１２０は、例えばＤＳＰ（ＤｉｇｉｔａｌＳｉｇｎａｌＰｒｏｃｅｓｓｏｒ）などのプロセッサによって実現され得る。信号処理回路１２０は、メモリ１３０に格納されたコンピュータプログラムを実行することにより、本実施形態における対象物の認識処理を行う。 The signal processing circuit 120 can be realized by a processor such as a DSP (Digital Signal Processor). The signal processing circuit 120 executes the computer program stored in the memory 130 to perform the object recognition process in the present embodiment.

図１および図２に示すシステムの構成は一例にすぎず、多様な変形が可能である。例えば、３次元センサ３００および制御装置１００の一方または両方は、ロボット２００内部に組み込まれていてもよい。 The configuration of the system shown in FIGS. 1 and 2 is merely an example, and various modifications are possible. For example, one or both of the three-dimensional sensor 300 and the control device 100 may be incorporated in the robot 200.

図３は、３次元センサ３００および制御装置１００を備えるロボット２００の一例を示す図である。このロボット２００は、エンドエフェクタ２２０の近傍に３次元センサ３００を備えている。ロボット２００はまた、筐体内に、制御装置１００を備えている。この例では、ロボット２００内の制御装置１００が、後述する認識処理を実行し、その結果に応じてアームおよびエンドエフェクタ２２０を制御する。ロボット２００はまた、３次元データを取得するために用いられる光源を備えていてもよい。そのような光源は、例えば可視光または近赤外線を発するレーザまたは発光ダイオードを含み得る。 FIG. 3 is a diagram illustrating an example of the robot 200 including the three-dimensional sensor 300 and the control device 100. The robot 200 includes a three-dimensional sensor 300 in the vicinity of the end effector 220. The robot 200 also includes a control device 100 in the housing. In this example, the control device 100 in the robot 200 executes a recognition process described later, and controls the arm and the end effector 220 according to the result. The robot 200 may also include a light source used to acquire 3D data. Such light sources may include, for example, lasers or light emitting diodes that emit visible or near infrared light.

[2. Operation]
Next, the object recognition process executed by the signal processing circuit 120 in the control device 100 is described.

図４は、本実施形態における物体認識処理の概要を示す図である。図示されるように、本実施形態では、３次元センサ３００による３次元計測により、シーン点群データが取得される。また、モデルの３次元点群の分布を示すモデル点群データが予め用意されている。信号処理回路１２０は、これらの２種類の点群データを取得し、両者を照合する。これにより、３次元シーンの中から、モデルと同一の物体を認識し、結果を出力する。検出結果は、例えばディスプレイ４００に画像データとして出力され得る。 FIG. 4 is a diagram showing an outline of object recognition processing in the present embodiment. As illustrated, in this embodiment, scene point cloud data is acquired by three-dimensional measurement by the three-dimensional sensor 300. Further, model point cloud data indicating the distribution of the three-dimensional point cloud of the model is prepared in advance. The signal processing circuit 120 acquires these two types of point cloud data and collates them. As a result, the same object as the model is recognized from the three-dimensional scene, and the result is output. The detection result can be output to the display 400 as image data, for example.

図５は、信号処理回路１２０が実行する動作の全体の流れを示すフローチャートである。本実施形態における処理は、オフラインフェーズとオンラインフェーズとを含む。オフラインフェーズは、３次元計測を行う前に行われる準備段階のフェーズである。オンラインフェーズは、３次元計測を行った後に行われるフェーズである。 FIG. 5 is a flowchart showing the overall flow of the operation executed by the signal processing circuit 120. The processing in this embodiment includes an offline phase and an online phase. The offline phase is a preparatory phase that is performed before the three-dimensional measurement. The online phase is a phase that is performed after performing three-dimensional measurement.

The offline phase includes steps S100, S110, S120, and S130. In step S100, the signal processing circuit 120 acquires the model point cloud data from the memory 130. Next, in step S110, it calculates the PPF for every pair of points in the model point group and records the PPFs in a hash table. This hash table corresponds to the first table described above. In step S120, the signal processing circuit 120 selects a plurality of key points from the model point group based on the hash table. The key points are special points in the model point group that are judged to have high posture uniqueness. The detailed processing of step S120 is described later. In step S130, the signal processing circuit 120 rebuilds the hash table, excluding the data relating to points other than the key points. This rebuilt hash table corresponds to the second table described above and holds less data than the original hash table.
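The rebuilding in step S130 can be pictured as a filter over the first table. The following is a minimal sketch, not the implementation of the embodiment: the table is modelled as a dictionary from a discretized PPF key to a list of point-index pairs, and a pair is assumed to relate to a key point when its first point is a key point. The name `rebuild_table` is illustrative.

```python
def rebuild_table(first_table, key_points):
    """Build the second table by keeping only pairs that start at a key point.

    first_table: dict mapping a discretized PPF key to a list of
        (first_point_index, second_point_index) pairs.
    key_points: iterable of model point indices selected in step S120.
    """
    keys = set(key_points)
    second_table = {}
    for ppf_key, pairs in first_table.items():
        kept = [(a, b) for (a, b) in pairs if a in keys]
        if kept:  # drop PPF entries that no longer have any pair
            second_table[ppf_key] = kept
    return second_table


# Illustrative first table with two PPF entries; only point 0 is a key point.
first = {(10, 2, 4, 5): [(0, 1), (2, 3)], (25, 0, 0, 1): [(1, 2)]}
second = rebuild_table(first, key_points=[0])
print(second)  # {(10, 2, 4, 5): [(0, 1)]}
```

Because entries without any remaining pair are removed, later lookups in the online phase touch fewer buckets and shorter lists.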

以上のオフラインフェーズにおける処理により、オンラインフェーズで利用されるハッシュテーブルが構築される。このハッシュテーブルは、物体を正確に認識する上で好ましいと考えられる少数の厳選されたキーポイントに関するデータのみを含む。このため、オンラインフェーズにおける動作をより高速かつ正確に行うことができる。 The hash table used in the online phase is constructed by the processing in the offline phase. This hash table contains only data relating to a small number of carefully selected key points that are considered desirable for accurately recognizing objects. For this reason, the operation in the online phase can be performed more quickly and accurately.

The online phase includes steps S140 and S150. In step S140, the signal processing circuit 120 first acquires the scene point cloud data. The scene point cloud data is acquired by the three-dimensional sensor 300 at the start of three-dimensional recognition in response to an instruction from the control circuit 110, and is stored in the memory 130. The signal processing circuit 120 then reads the scene point cloud data from the memory 130 in accordance with an instruction from the control circuit 110. In step S150, the signal processing circuit 120 estimates the position and posture of the object in the scene based on the scene point cloud data and the rebuilt hash table. The present embodiment uses an improved version of the voting scheme disclosed in Drost. As described in detail later, many posture candidates are selected for each point of the scene point group, and the posture estimated to be most correct is determined from among them. Unlike Drost's method, postures are verified for all points in the scene point group, so the posture of the object can be estimated more accurately.

［２−１．ＰＰＦおよびハッシュテーブル］ [2-1. PPF and hash table]
本実施形態において用いられるポイントペア特徴量（ＰＰＦ）およびハッシュテーブルを説明する。 The point pair feature (PPF) and the hash table used in this embodiment are described below.

図６は、ＰＰＦを説明するための図である。ＰＰＦは、法線ベクトルをもつ２つの点の相対的な位置および向きを表す。２つの点をｍ１およびｍ２とし、それらの点の法線ベクトルを、それぞれｎ１およびｎ２とする。点ｍ１およびｍ２のＰＰＦをＦ（ｍ１、ｍ２）と表現する。Ｆ（ｍ１、ｍ２）は、次の式（１）で表される。 FIG. 6 is a diagram for explaining the PPF. The PPF represents the relative position and orientation of two points that have normal vectors. Let the two points be m1 and m2, and let their normal vectors be n1 and n2, respectively. The PPF of the points m1 and m2 is written F(m1, m2) and is given by the following formula (1).

$$F(m_1, m_2) = (f_1, f_2, f_3, f_4) = \left(\lVert d \rVert_2,\ \angle(n_1, d),\ \angle(n_2, d),\ \angle(n_1, n_2)\right) \tag{1}$$

ｄは、点ｍ１から点ｍ２へのベクトルを表す。∠（ａ，ｂ）は、ベクトルａおよびｂのなす角度（０以上π以下の実数）を表す。ｆ１、ｆ２、ｆ３、ｆ４の意味は、以下のとおりである。 d represents the vector from the point m1 to the point m2. ∠(a, b) represents the angle (a real number from 0 to π inclusive) between the vectors a and b. The meanings of f1, f2, f3, and f4 are as follows.
ｆ１：点ｍ１およびｍ２の間の距離 f1: distance between the points m1 and m2
ｆ２：ベクトルｄと法線ベクトルｎ１との角度 f2: angle between the vector d and the normal vector n1
ｆ３：ベクトルｄと法線ベクトルｎ２との角度 f3: angle between the vector d and the normal vector n2
ｆ４：法線ベクトルｎ１、ｎ２間の角度 f4: angle between the normal vectors n1 and n2
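The four components f1 to f4 can be computed directly from two oriented points. The following is a minimal sketch in plain Python; the function names (`ppf`, `angle`, `norm`, `sub`) and the handling of zero-length vectors are assumptions made for illustration and do not come from the source.

```python
import math


def sub(a, b):
    """Component-wise difference a - b of two 3D tuples."""
    return tuple(x - y for x, y in zip(a, b))


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def angle(a, b):
    """Angle between vectors a and b, as a real number in [0, pi]."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0  # assumption: a degenerate vector is mapped to angle 0
    cos = sum(x * y for x, y in zip(a, b)) / (na * nb)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding


def ppf(m1, n1, m2, n2):
    """Point pair feature (f1, f2, f3, f4) of two points with normals."""
    d = sub(m2, m1)  # vector from m1 to m2
    return (norm(d), angle(n1, d), angle(n2, d), angle(n1, n2))
```

For two points one unit apart on a plane whose normal is perpendicular to d, this gives f1 = 1, f2 = f3 = π/2 and f4 = 0.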

本実施形態では、ステップＳ１１０において、モデル点群における全ての２点間のＰＰＦが計算され、ハッシュテーブルに格納される。 In this embodiment, in step S110, the PPF of every pair of points in the model point cloud is calculated and stored in the hash table.

図７は、ハッシュテーブルの一例を示す図である。図示されるように、ハッシュテーブルは、ＰＰＦをキーとして、ＰＰＦと、対応するポイントペアとの関係を規定する。ＰＰＦにおける各成分ｆ１、ｆ２、ｆ３、ｆ４は、離散化されており、各成分の値が近い複数のＰＰＦは、同一のＰＰＦとして処理される。 FIG. 7 is a diagram illustrating an example of a hash table. As shown in the figure, the hash table defines the relationship between the PPF and the corresponding point pair using the PPF as a key. The components f1, f2, f3, and f4 in the PPF are discretized, and a plurality of PPFs whose values are close to each other are processed as the same PPF.
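A discretized PPF can serve directly as a dictionary key, so that pairs with nearby feature values fall into the same entry. The sketch below assumes that the features have already been computed; the names `quantize` and `build_table` and the step sizes `dist_step` and `angle_step` are hypothetical, since the source does not state the discretization intervals.

```python
from collections import defaultdict


def quantize(feature, dist_step, angle_step):
    """Discretize (f1, f2, f3, f4) into a tuple of integer bin indices."""
    f1, f2, f3, f4 = feature
    return (int(f1 // dist_step), int(f2 // angle_step),
            int(f3 // angle_step), int(f4 // angle_step))


def build_table(pair_features, dist_step, angle_step):
    """Map each discretized PPF to the list of point pairs that produce it.

    pair_features: dict {(i, j): (f1, f2, f3, f4)} for model point indices i, j.
    """
    table = defaultdict(list)
    for pair, feature in pair_features.items():
        table[quantize(feature, dist_step, angle_step)].append(pair)
    return table
```

With these step sizes, two pairs whose features differ by less than one bin width in every component are retrieved together by a single lookup.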

［２−２．投票スキーム］ [2-2. Voting scheme]
本実施形態におけるキーポイントの選択は、Drostに開示された投票スキームを一部利用している。このため、本実施形態におけるキーポイントの選択処理を説明する前に、Drostに開示された投票スキームを説明する。 The key point selection in this embodiment partly uses the voting scheme disclosed in Drost. For this reason, the voting scheme disclosed in Drost is explained before the key point selection process of this embodiment.

Drostは、投票空間を２次元に減少させる効果的な投票スキームを提案している。Drostの方法では、オンラインフェーズにおいて、シーン点群における参照点ｓｒと他の点ｓｉとから、ＰＰＦすなわちＦ（ｓｒ，ｓｉ）が計算され、このＰＰＦがハッシュテーブルの中から探索される。これにより、類似するＰＰＦをもつモデルポイントペア（ｍｒ，ｍｉ）が探索される。参考のため、特許文献１の図６を図８として引用する。図８に示すように、モデルポイントペアをシーンポイントペアに揃えるために、次の２つのステップが実行される。 Drost proposes an efficient voting scheme that reduces the voting space to two dimensions. In Drost's method, in the online phase, the PPF F(sr, si) is calculated from a reference point sr and another point si in the scene point cloud, and this PPF is looked up in the hash table. This yields model point pairs (mr, mi) that have a similar PPF. For reference, FIG. 6 of Patent Document 1 is reproduced as FIG. 8. As shown in FIG. 8, the following two steps are performed to align a model point pair with the scene point pair.
（ａ）ポイントペア（ｓｒ，ｓｉ）および（ｍｒ，ｍｉ）の各参照点を原点に並進移動させ、それらの法線をｘ軸に合せる。それらの並進行列をＴｍ、Ｔｓとする。 (a) The reference point of each of the point pairs (sr, si) and (mr, mi) is translated to the origin, and its normal is aligned with the x-axis. Let the corresponding transformation matrices be Tm and Ts.
（ｂ）モデルポイントペア（ｍｒ，ｍｉ）をｘ軸のまわりに、シーンポイントペア（ｓｒ，ｓｉ）に一致するまで回転させる。 (b) The model point pair (mr, mi) is rotated around the x-axis until it coincides with the scene point pair (sr, si).

モデルポイントペア（ｍｒ，ｍｉ）と、シーンポイントペア（ｓｒ，ｓｉ）は、類似するＰＰＦをもつので、ｘ軸のまわりに回転させることにより、点ｍｉの位置および法線を、点ｓｉの位置および法線にほぼ一致させることができる。回転角をαとし、回転行列をＲｘ（α）とする。ｍｒとαとのペア（ｍｒ，α）は、局所座標（local coordinate）と呼ばれる。上記の操作により、２次元の累積テーブル（accumulator table)Ｔ（ｍｒ，α）が生成される。このテーブルに、局所座標の投票、つまり局所座標が何回現れるかが記録される。局所座標は、点Ｓｒの姿勢の候補を表す。回転角αを用いた座標変換は、以下の式（２）で表される。 Since the model point pair (mr, mi) and the scene point pair (sr, si) have similar PPFs, rotating around the x-axis brings the position and normal of the point mi approximately into agreement with the position and normal of the point si. Let the rotation angle be α and the rotation matrix be Rx(α). The pair (mr, α) of mr and α is called a local coordinate. The above operation produces a two-dimensional accumulator table T(mr, α), which records the votes for each local coordinate, that is, how many times each local coordinate occurs. A local coordinate represents a candidate posture for the point sr. The coordinate transformation using the rotation angle α is expressed by the following equation (2).
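Equation (2) is not reproduced in this text. Under the definitions above, with Tm and Ts the transformations of step (a) and Rx(α) the rotation of step (b), a form consistent with the description and with the alignment used in Drost's voting scheme would be the following; the exact notation of the original equation is an assumption.

```latex
s_i = T_s^{-1} \, R_x(\alpha) \, T_m \, m_i \tag{2}
```

Read from right to left: the model point is moved into the intermediate frame by Tm, rotated by α about the x-axis, and then mapped into the scene by the inverse of Ts.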

Drostの方法においては、シーン点群の中から一部の参照点ｓｒが選択され、参照点ｓｒごとに、ｓｉを変えながら局所座標の投票が行われる。投票数の多い複数の局所座標に対応する複数の姿勢をクラスタリングして平均化するなどの処理を行うことにより、最終的な姿勢が決定される。 In the Drost method, some reference points sr are selected from the scene point group, and local coordinates are voted while changing si for each reference point sr. A final posture is determined by performing processing such as clustering and averaging a plurality of postures corresponding to a plurality of local coordinates having a large number of votes.
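The accumulation of votes over local coordinates can be sketched as a counter keyed by (mr, discretized α). This is an illustration only: the function names and the number of angle bins `n_bins` are assumptions, and the matches are taken as already computed (mr, α) pairs rather than derived from a hash table lookup.

```python
import math
from collections import Counter


def alpha_bin(alpha, n_bins):
    """Map an angle in radians to one of n_bins equal bins over [0, 2*pi)."""
    two_pi = 2.0 * math.pi
    return int((alpha % two_pi) / two_pi * n_bins) % n_bins


def vote(matches, n_bins=30):
    """Accumulate votes for local coordinates.

    matches: iterable of (mr, alpha), one entry per matched model point pair.
    Returns a Counter keyed by (mr, alpha_bin), playing the role of T(mr, alpha).
    """
    acc = Counter()
    for mr, alpha in matches:
        acc[(mr, alpha_bin(alpha, n_bins))] += 1
    return acc
```

`vote(matches).most_common(k)` then returns the k local coordinates with the most votes for the current reference point.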

［２−３．キーポイントの選択］ [2-3. Key point selection]
オフラインフェーズにおけるキーポイント選択処理（ステップＳ１２０）の詳細を説明する。キーポイントの選択は、本実施形態における姿勢推定アルゴリズムにおいて極めて重要である。本実施形態においては、モデル点群の中から、いくつかのキーポイントが選択され、それらに対応する点および姿勢がシーン点群の中から探索される。このため、良好なキーポイントを選択することが重要である。本実施形態では、モデル点群の中から、独自性の高い点が選択される。これにより、物体認識の精度を高めることができる。 The key point selection process (step S120) in the offline phase is described in detail. The selection of key points is extremely important in the posture estimation algorithm of this embodiment. In this embodiment, several key points are selected from the model point cloud, and the points and postures corresponding to them are searched for in the scene point cloud. It is therefore important to select good key points. In this embodiment, highly distinctive points are selected from the model point cloud, which improves the accuracy of object recognition.

本実施形態では、オフラインフェーズにおいて、上記の投票スキームの考え方を利用して、モデル点群の中から独自性が高いと考えられるキーポイントが選択される。キーポイントを選択するために、モデル点群における点ｍｉおよびその周辺の点群を、シーン点群であるものとみなす。そして、そのシーンクラウドの姿勢を、前述の投票スキームを用いて推定する。仮に少数の姿勢のみが推定できた場合、そのシーンクラウドは十分に特別である、つまりそのｍｉはキーポイントとして適していると考える。 In the present embodiment, in the offline phase, key points that are considered to be highly unique are selected from the model point group using the concept of the voting scheme described above. In order to select a key point, the point mi in the model point group and the surrounding point group are regarded as the scene point group. Then, the posture of the scene cloud is estimated using the voting scheme described above. If only a small number of poses can be estimated, the scene cloud is sufficiently special, that is, the mi is suitable as a key point.

図９は、本実施形態におけるキーポイントの選択処理を示すフローチャートである。図示されるように、キーポイントは、例えば次の処理によって選択される。 FIG. 9 is a flowchart showing key point selection processing in the present embodiment. As illustrated, the key points are selected by, for example, the following process.

１）モデル点群における全ての点ｍｉ（ｉは、１以上Ｎｍ以下の整数、Ｎｍはモデル点群に含まれる点の数）について、ｍｉからの距離が閾値よりも短い領域（参照領域と呼ぶ）内に含まれる点の集合を、ＭＫｉとする（ステップＳ１２１）。 1) For all the points mi (i is an integer of 1 to Nm, Nm is the number of points included in the model point group) in the model point group, an area where the distance from mi is shorter than a threshold (referred to as a reference area) ) Is set as MKi (step S121).

２）ｍｉとＭＫｉ内の他の各点ｍｊ（ｊは、１以上Ｍ以下の整数、Ｍは参照領域に含まれる点の数）とのＰＰＦをそれぞれ計算する（ステップＳ１２２）。計算した各ＰＰＦを作成済みのハッシュテーブルの中から探索する（ステップＳ１２３）。この場合、対応するモデルポイントペアが１つ以上見つかる。その数をＣｊとする。Ｃｊ個のモデルポイントペアの各々について、局所座標（ｍｒ、α）を計算し、累積テーブルに投票する（ステップＳ１２４）。この投票は、ＭＫｉ内の最後の点ｍ_Ｍについて完了するまで繰り返される。 2) The PPF between mi and each other point mj in MKi (j is an integer from 1 to M, where M is the number of points in the reference region) is calculated (step S122). Each calculated PPF is looked up in the previously created hash table (step S123). In this case, one or more corresponding model point pairs are found; let their number be Cj. For each of the Cj model point pairs, the local coordinate (mr, α) is calculated and a vote is cast in the accumulator table (step S124). This voting is repeated until it has been completed for the last point m_M in MKi.

３）投票数がゼロではない全ての局所座標（姿勢の候補）について、全てのモデル点をシーン空間に移動（並進および回転）させる（ステップＳ１２５）。この際、式（２）が用いられる。ＭＫｉ内の全ての点の近傍（距離が閾値未満の位置）に、移動後のモデル点が位置する場合、ｍｉのスコアに１を加算する（ステップＳ１２６、Ｓ１２７）。ｍｉごとにこのスコアが決定される。 3) For all local coordinates (posture candidates) for which the number of votes is not zero, all model points are moved (translated and rotated) into the scene space (step S125). At this time, Equation (2) is used. When the model point after movement is located near all the points in MKi (position where the distance is less than the threshold), 1 is added to the score of mi (steps S126 and S127). This score is determined for each mi.

４）ｍ_１からｍ_Ｎｍのうち、最小のスコアの点（群）をキーポイントとして選択する（ステップＳ１２８）。 4) Among m_1 to m_Nm, the point or points with the lowest score are selected as key points (step S128).
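Steps 3) and 4) above can be sketched as follows. The sketch assumes that the model point cloud has already been transformed by each candidate posture, so that scoring reduces to a coverage test; the function names, the brute-force neighbour search and the parameter `tol` are assumptions for illustration, not the implementation of the embodiment.

```python
import math


def covered(region, moved_model, tol):
    """True if every region point has some moved model point closer than tol."""
    return all(any(math.dist(p, q) < tol for q in moved_model) for p in region)


def score(region, candidate_clouds, tol):
    """Number of candidate postures whose transformed model covers the region.

    region: points of MKi; candidate_clouds: one transformed model cloud
    per local coordinate that received at least one vote.
    """
    return sum(1 for cloud in candidate_clouds if covered(region, cloud, tol))


def select_keypoints(scores, count=None):
    """Indices with the minimum score, or the `count` lowest-scoring indices."""
    if count is None:
        lowest = min(scores)
        return [i for i, s in enumerate(scores) if s == lowest]
    return sorted(range(len(scores)), key=lambda i: scores[i])[:count]
```

A low score means that few postures explain the neighbourhood of the point, which is the sense in which the point is distinctive.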

表１は、図９のフローチャートが示すアルゴリズムの一例を示している。このようなアルゴリズムを用いることにより、信号処理回路１２０は、姿勢の独自性の高い良好なキーポイントを選択することができる。 Table 1 shows an example of the algorithm represented by the flowchart of FIG. 9. By using such an algorithm, the signal processing circuit 120 can select good key points whose postures are highly distinctive.

なお、本実施形態では、エッジ上の点はキーポイントとして選択されない。常に低いスコアになるが、エッジ上の点における法線は信頼性が低いと考えられるからである。ただし、性能上問題がない場合は、エッジ上の点をキーポイントに含めてもよい。 In this embodiment, points on edges are not selected as key points. Although such points always receive low scores, the normals at points on edges are considered unreliable. However, if this causes no performance problem, points on edges may be included among the key points.

図１０Ａから図１０Ｆは、本実施形態におけるスコア算出の考え方を説明するための図である。ここでは、簡単のため、モデルの表面形状が立方体である場合を考える。 FIG. 10A to FIG. 10F are diagrams for explaining the concept of score calculation in this embodiment. Here, for simplicity, consider the case where the surface shape of the model is a cube.

図１０Ａは、立方体形状のモデルの例を示す図である。この例では、エッジ上の点を除き、各点の法線は、面に垂直である。ｍ_０は、スコア算出対象の点である。図中の円は、点ｍ_０からの距離が閾値以下である領域ＭＫを表している。本実施形態では、この領域ＭＫが、シーン中に存在すると仮定して、ＰＰＦの計算およびハッシュテーブルの探索が行われる。 FIG. 10A is a diagram illustrating an example of a cube-shaped model. In this example, except for points on edges, the normal at each point is perpendicular to the face. m_0 is the point whose score is being calculated. The circle in the figure represents the region MK in which the distance from the point m_0 is equal to or less than a threshold. In this embodiment, the PPF calculation and the hash table search are performed on the assumption that this region MK exists in the scene.

図１０Ｂは、点ｍ_０と、領域ＭＫ内の他の点ｍ_１とから計算されるＰＰＦが、ハッシュテーブルにおけるＰＰＦと照合されることを示す図である。この例では、（ｍ_０，ｍ_１）のＰＰＦ（Ｆ０）と共通のＰＰＦをもつポイントペア（ｍｐ，ｍｑ）および（ｍｘ，ｍｙ）が検出されている。このようなＰＰＦの照合が、点ｍ_０と、領域ＭＫ内の他の全ての点との組み合わせについて行われる。 FIG. 10B is a diagram showing that the PPF calculated from the point m_0 and another point m_1 in the region MK is matched against the PPFs in the hash table. In this example, the point pairs (mp, mq) and (mx, my), which share the PPF (F0) of (m_0, m_1), are detected. This PPF matching is performed for the combination of the point m_0 with every other point in the region MK.

図１０Ｃは、点ｍ_０について検出された同一のＰＰＦをもつポイントペアの組み合わせと、それらの各ポイントペアについて算出された姿勢との対応の例を示す図である。姿勢Ｐ０〜Ｐｎは、上記の局所座標が示す姿勢を表している。姿勢Ｐ０〜Ｐｎのそれぞれについて、モデル点群が仮想的なシーンである領域ＭＫを含む座標系に座標変換される。 FIG. 10C is a diagram illustrating an example of the correspondence between the point pairs with the same PPF detected for the point m_0 and the postures calculated for each of those point pairs. The postures P0 to Pn are the postures indicated by the local coordinates described above. For each of the postures P0 to Pn, the model point cloud is transformed into the coordinate system containing the region MK, which serves as the virtual scene.

図１０Ｄは、モデル中のポイントペア（ｍｐ，ｍｑ）に対応付けられた座標変換パラメータ（局所座標）を用いてモデル全体を座標変換した場合の例を示している。この例では、領域ＭＫに含まれる全ての点の近傍（距離が閾値未満の位置）に、変換後のモデル点群のいずれかの点が存在する。このような場合は、ｍ_０のスコアに１を加算する。 FIG. 10D shows an example in which the entire model is transformed using the coordinate transformation parameters (local coordinate) associated with the point pair (mp, mq) in the model. In this example, some point of the transformed model point cloud lies in the vicinity (at a distance less than the threshold) of every point in the region MK. In such a case, 1 is added to the score of m_0.

図１０Ｅは、モデル中のポイントペア（ｍｘ，ｍｙ）に対応付けられた座標変換パラメータ（局所座標）を用いてモデル全体を座標変換した場合の例を示している。この例では、領域ＭＫに含まれる一部の点については、その近傍（距離が閾値未満の位置）に、変換後のいずれのモデル点も存在しない。このような場合は、ｍ_０のスコアは増加しない。 FIG. 10E shows an example in which the entire model is transformed using the coordinate transformation parameters (local coordinate) associated with the point pair (mx, my) in the model. In this example, for some points in the region MK, no transformed model point lies in the vicinity (at a distance less than the threshold). In such a case, the score of m_0 does not increase.

以上のような点数付けが、図１０Ｃに示す全ての姿勢Ｐ０〜Ｐｎについて行われ、ｍ_０の最終的なスコアが決定される。同様の処理をモデル点群における全ての点について実行することにより、各点のスコアが決定される。最終的に、ステップＳ１２８において、最小のスコアの点または点群がキーポイントとして選出される。なお、最小のスコアの点に限らず、スコアが低い点から順に所定数の点をキーポイントとして選出してもよい。 This scoring is performed for all the postures P0 to Pn shown in FIG. 10C, and the final score of m_0 is determined. The same process is executed for every point in the model point cloud to determine the score of each point. Finally, in step S128, the point or points with the lowest score are selected as key points. The selection need not be limited to the points with the minimum score; a predetermined number of points may be selected as key points in ascending order of score.

図１０Ｆは、角またはエッジにおける参照領域の例を示す図である。角またはエッジである点Ａをキーポイントとして選出する場合、図１０Ｆにおいてハッチングされている領域が参照領域になる。このような点Ａについては、スコアが前述の点ｍ_０よりも遥かに小さくなる傾向がある。 FIG. 10F is a diagram illustrating an example of reference regions at a corner or an edge. When a point A on a corner or an edge is selected as a key point, the hatched area in FIG. 10F becomes the reference region. The score of such a point A tends to be much smaller than that of the point m_0 described above.

［２−３．ハッシュテーブルの再構築］ [2-3. Reconstruction of the hash table]
信号処理回路１２０は、複数のキーポイントを選択した後、ハッシュテーブルを再構築する。すなわち、ハッシュテーブルから、複数のキーポイントに関連しないデータが削除される。本実施形態では、全てのモデル点群間のＰＰＦをハッシュテーブルに保存するDrostのアプローチとは異なり、キーポイント（複数）と、他の各モデル点との間のＰＰＦのみがハッシュテーブルに残る。モデル点群の点の数をＮｍ、キーポイントの数をＮｋとすると、再構築後のハッシュテーブルに保存されるモデルポイントペアの数は、Ｎｋ（Ｎｍ−１）である。DrostのアプローチにおけるＮｍ（Ｎｍ−１）個と比較すると、Ｎｋ／Ｎｍ倍のデータ量で済む。Ｎｋ／Ｎｍは、例えば１／１０００から１／１００程度の低い値に抑えることができる。一例として、Ｎｍ＝３０００、Ｎｋ＝１０の場合、ハッシュテーブルのデータ量は、１／３００にまで減少する。これにより、オンラインフェーズにおける計算時間を短縮することができる。 After selecting the key points, the signal processing circuit 120 reconstructs the hash table. That is, data unrelated to the key points is deleted from the hash table. Unlike Drost's approach, in which the PPFs of all pairs of model points are stored in the hash table, in this embodiment only the PPFs between the key points and each of the other model points remain in the hash table. If the number of points in the model point cloud is Nm and the number of key points is Nk, the number of model point pairs stored in the reconstructed hash table is Nk(Nm−1). Compared with the Nm(Nm−1) pairs of Drost's approach, the amount of data is reduced by a factor of Nk/Nm. Nk/Nm can be kept to a low value of, for example, about 1/1000 to 1/100. As an example, when Nm = 3000 and Nk = 10, the amount of data in the hash table is reduced to 1/300. This shortens the calculation time in the online phase.
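The reconstruction and the resulting pair counts can be sketched as below. The sketch assumes that the table maps a discretized PPF to a list of (reference index, other index) pairs and that a pair "relates to" a key point when its reference point is a key point, which matches the count Nk(Nm−1); both the layout and the function names are assumptions for illustration.

```python
def rebuild_table(table, keypoints):
    """Keep only the pairs whose reference (first) point is a key point."""
    keep = set(keypoints)
    rebuilt = {}
    for key, pairs in table.items():
        kept = [pair for pair in pairs if pair[0] in keep]
        if kept:  # drop entries that become empty
            rebuilt[key] = kept
    return rebuilt


def pair_counts(n_model, n_key):
    """Numbers of stored pairs before and after reconstruction."""
    full = n_model * (n_model - 1)    # Nm(Nm-1): all ordered pairs
    reduced = n_key * (n_model - 1)   # Nk(Nm-1): key points as reference only
    return full, reduced
```

For Nm = 3000 and Nk = 10, `pair_counts` gives 8,997,000 and 29,990 pairs, a ratio of exactly 300.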

［２−４．姿勢推定処理］ [2-4. Posture estimation processing]
次に、図５のステップＳ１５０における姿勢推定処理の具体例を説明する。 Next, a specific example of the posture estimation process in step S150 of FIG. 5 is described.

図１１は、ステップＳ１５０の処理をより詳細に示すフローチャートである。本実施形態では、Drostの投票スキームよりもビンピッキングにより適した投票スキームが用いられる。シーン点群の一部ではなく、全ての点が選択され、周辺の点群との間でＰＰＦが計算される。 FIG. 11 is a flowchart showing the process of step S150 in more detail. This embodiment uses a voting scheme that is better suited to bin picking than Drost's voting scheme. All points of the scene point cloud, rather than a subset, are selected, and PPFs are calculated between each of them and its surrounding points.

各シーン点ｓｉについて、ｓｉからの距離がモデルサイズよりも短い点群をＳｐｈｅｒｅ_ｉとする。モデルサイズとは、モデル点群のうち、最も離れた２点間の距離を意味する。信号処理回路１２０は、点群Ｓｐｈｅｒｅ_ｉのうちの一部をランダムに選択する（ステップＳ１５１）。ランダムに選択する代わりに、所定の規則（例えば１０個おきなど）に従って一部の点を選択してもよい。ここで選択される点群をＳｒとする。選択された点群Ｓｒにおける点の数は、点群Ｓｐｈｅｒｅ_ｉに含まれる点の数のＮｒ倍とする。Ｎｒは、例えば１よりも十分に小さい数（Ｎｒ＜＜１）であり得る。 For each scene point si, the set of points whose distance from si is shorter than the model size is denoted Sphere_i. The model size is the distance between the two most distant points in the model point cloud. The signal processing circuit 120 randomly selects a subset of Sphere_i (step S151). Instead of random selection, a subset may be selected according to a predetermined rule (for example, every tenth point). The subset selected here is denoted Sr. The number of points in Sr is Nr times the number of points in Sphere_i, where Nr may be a number much smaller than 1 (Nr << 1).
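Step S151 can be sketched as a neighbourhood query followed by random subsampling. The brute-force distance test, the exclusion of si itself from Sphere_i, the rounding rule for the sample size and the function names are assumptions made for illustration.

```python
import math
import random


def sphere(scene, i, model_size):
    """Indices of scene points closer than model_size to scene[i], excluding i."""
    return [j for j, p in enumerate(scene)
            if j != i and math.dist(scene[i], p) < model_size]


def sample_sr(scene, i, model_size, nr, rng):
    """Randomly pick about nr * |Sphere_i| neighbour indices (the set Sr).

    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    neighbours = sphere(scene, i, model_size)
    if not neighbours:
        return []
    k = max(1, round(nr * len(neighbours)))  # assumption: keep at least one point
    return rng.sample(neighbours, k)
```

With Nr = 0.1 and 40 neighbours, four points are drawn, and PPFs are then computed only between si and those four points.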

信号処理回路１２０は、点ｓｉと、Ｓｒに含まれるｊ番目の点Ｓｒｊ（ｊは１以上Ｍｓ以下の整数、ＭｓはＳｒ内の点の数）とのＰＰＦを計算する（ステップＳ１５２）。その後、前述の投票スキームを実行する。信号処理回路１２０は、計算したＰＰＦに近いＰＰＦをもつモデルポイントペアを、再構築したハッシュテーブルの中から探索する（ステップＳ１５３）。Ｃｊ個のモデルポイントペアが見つかったとする。信号処理回路１２０は、Ｃｊ個のモデルポイントペアの各々について、（ｓｉ，Ｓｒｊ）に整合させるための局所座標を決定し、各局所座標の投票数を決定する（ステップＳ１５４）。ステップＳ１５２からＳ１５４の処理を、ｊが１からＭｓになるまで繰り返す。 The signal processing circuit 120 calculates the PPF between the point si and the j-th point Srj in Sr (j is an integer from 1 to Ms, where Ms is the number of points in Sr) (step S152). It then executes the voting scheme described above. The signal processing circuit 120 searches the reconstructed hash table for model point pairs whose PPF is close to the calculated PPF (step S153). Suppose that Cj model point pairs are found. For each of the Cj model point pairs, the signal processing circuit 120 determines the local coordinate that aligns it with (si, Srj), and counts the votes for each local coordinate (step S154). Steps S152 to S154 are repeated for j = 1 to Ms.

以上の処理がシーン点群に含まれる全ての点について実行される。全てのシーン点から局所座標を探索することにより、正しい姿勢が除外される可能性が低くなる。このスキームは、シーン内の複数の物体を検出するビンピッキングのような用途に適している。さらに、ハッシュテーブルが従来よりも遥かに小さいため、より多くのシーンポイントペアについてＰＰＦが計算されるにも関わらず、投票スキームに要する計算時間が増大することはない。 The above processing is executed for all points included in the scene point group. By searching for local coordinates from all scene points, the possibility that a correct posture is excluded is reduced. This scheme is suitable for applications such as bin picking to detect multiple objects in a scene. Furthermore, since the hash table is much smaller than before, the calculation time required for the voting scheme does not increase even though the PPF is calculated for more scene point pairs.

局所座標の投票数が多いほど、その局所座標が示す姿勢は多くのポイントペアに整合し、故に正しい姿勢である可能性が高い。したがって、投票数の多いいくつかの局所座標が検証のために選択される。信号処理回路１２０は、投票数が多い順にＮｖ個の局所座標を姿勢の検証のために選択する（ステップＳ１５５）。これらの局所座標は、対象物の姿勢の候補を表す。信号処理回路１２０は、各局所座標が示す姿勢を検証し、姿勢ごとに正確さを示すスコアを算出する（ステップＳ１５６）。そして、スコアが最も高い姿勢を、最終的な姿勢として決定する（ステップＳ１５７）。 As the number of votes of local coordinates increases, the posture indicated by the local coordinates is more consistent with many point pairs, and thus is more likely to be the correct posture. Therefore, some local coordinates with a large number of votes are selected for verification. The signal processing circuit 120 selects Nv local coordinates in order from the largest number of votes for posture verification (step S155). These local coordinates represent candidates for the posture of the object. The signal processing circuit 120 verifies the posture indicated by each local coordinate, and calculates a score indicating accuracy for each posture (step S156). Then, the posture having the highest score is determined as the final posture (step S157).
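Steps S155 to S157 amount to a top-Nv selection by vote count followed by an argmax over verification scores. The sketch below treats the verification of step S156 as a caller-supplied function; the names `top_candidates`, `best_pose` and `verify` are hypothetical.

```python
import heapq


def top_candidates(votes, nv):
    """The nv local coordinates with the most votes, highest first.

    votes: dict mapping a local coordinate to its vote count.
    """
    return heapq.nlargest(nv, votes, key=votes.get)


def best_pose(candidates, verify):
    """The candidate with the highest verification score.

    verify: function mapping a candidate to its score (step S156).
    """
    return max(candidates, key=verify)
```

Because only the ranking matters, a candidate with few absolute votes can still be verified as long as it ranks among the top Nv.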

なお、本実施形態では、投票の段階で、シーン点群から一部の点群Ｓｒがランダムに選択される。これは、シーン点群中の全ての２点間のＰＰＦを計算すると計算時間が長くなるからである。本実施形態では、絶対投票数ではなくランキングによって局所座標の候補が選択されるため、全ての２点間のＰＰＦを計算する必要はない。各ｓｉについて、Ｎｒが一定であれば、ランキングの結果は大きくは変わらない。誤差はＮｖ（検証される姿勢の数）が大きければ修正される。Ｎｒは、例えば３〜１０％程度の小さい数に抑えることができる。Ｎｖは、例えば８０００〜２００００程度の値である。 In the present embodiment, at the voting stage, some point groups Sr are randomly selected from the scene point group. This is because calculating the PPF between all two points in the scene point group increases the calculation time. In the present embodiment, local coordinate candidates are selected by ranking rather than the absolute number of votes, so it is not necessary to calculate PPF between all two points. If Nr is constant for each si, the ranking result does not change significantly. The error is corrected if Nv (number of postures to be verified) is large. Nr can be suppressed to a small number of about 3 to 10%, for example. Nv is a value of about 8000 to 20000, for example.

表２は、図１１のフローチャートが示すアルゴリズムの一例を示している。このようなアルゴリズムを用いることにより、信号処理回路１２０は、信頼性の高い姿勢を、比較的短時間で決定することができる。 Table 2 shows an example of the algorithm shown in the flowchart of FIG. By using such an algorithm, the signal processing circuit 120 can determine a highly reliable posture in a relatively short time.

［２−５．姿勢の検証］ [2-5. Posture verification]
次に、ステップＳ１５６における姿勢の検証処理をより詳細に説明する。 Next, the posture verification process in step S156 is described in more detail.

実施形態では、特許文献１および非特許文献１に開示された方法とは異なり、候補の姿勢のクラスタリングは行わない。より多くの候補姿勢の正確性を検証する。 In the embodiment, unlike the methods disclosed in Patent Document 1 and Non-Patent Document 1, clustering of candidate postures is not performed. Verify the accuracy of more candidate poses.

従来の姿勢検証方法では、モデル点群がシーン点群に変換され、全てのモデル点について、最も近いシーン点が探索される。これらの２つの点の間の距離とそれらの法線の間の角度とが比較される。当該距離および角度のそれぞれが閾値よりも小さければ、そのモデル点はシーン点に整合していると考えられる。整合するモデル点の数が十分に多ければ、その姿勢は正しいと考えられる。例えば、Nguyen et al.（"Determination of 3d object pose in point cloud with cad model", 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision, pp. 16, 2015）は、そのような考え方で、モデル点とシーン点とのフィッティングを行う方法を開示している。 In conventional posture verification methods, the model point cloud is transformed into the scene, and the nearest scene point is searched for every model point. The distance between these two points and the angle between their normals are evaluated. If both the distance and the angle are smaller than their thresholds, the model point is considered to match the scene point. If the number of matching model points is sufficiently large, the posture is considered correct. For example, Nguyen et al. ("Determination of 3d object pose in point cloud with cad model", 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision, pp. 16, 2015) disclose a method of fitting model points to scene points based on this idea.

この方法は、直感的ではあるが、多くの時間を要する。全ての姿勢の候補について、全てのモデル点に最も近い点を探索する必要があるからである。 This method is intuitive but takes a lot of time. This is because it is necessary to search for a point closest to all model points for all the posture candidates.

本実施形態では、モデル点に最も近い点を見つける必要のない方法が用いられる。具体的には、３次元積分画像（ｉｎｔｅｇｒａｌｉｍａｇｅ）が用いられる。これにより、姿勢検証の効率を大きく改善することができる。 In the present embodiment, a method that does not require finding a point closest to the model point is used. Specifically, a three-dimensional integrated image (integral image) is used. Thereby, the efficiency of posture verification can be greatly improved.

図１２Ａは、２次元の積分画像を説明するための図である。２次元の積分画像においては、点（ｘ，ｙ）における画素値は、元画像における（ｘ，ｙ）の位置から左上に位置する領域における全ての画素値の総和である。すなわち、積分画像の画素値は、以下の式（３）で表される。 FIG. 12A is a diagram for explaining a two-dimensional integral image. In a two-dimensional integral image, the pixel value at the point (x, y) is the sum of all pixel values in the region of the original image located above and to the left of the position (x, y). That is, the pixel value of the integral image is expressed by the following formula (3).

ｉｉ（ｘ，ｙ）は、積分画像の画素値を、ｉ（ｘ，ｙ）は、元画像の画素値を表している。 ii(x, y) represents the pixel value of the integral image, and i(x, y) represents the pixel value of the original image.

積分画像の画素値は、次の式（４）によって計算される。 The pixel value of the integral image is calculated by the following equation (4).

積分画像を用いることにより、以下の式（５）に示すように、任意の長方形の画素値の和を、４つの配列の参照によって計算できる。 By using the integral image, the sum of the pixel values in an arbitrary rectangle can be calculated with four array references, as shown in the following formula (5).
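Formulas (3) to (5) are not reproduced in this text. The standard integral image (summed-area table) relations that are consistent with the description above are sketched below. The rectangle corner names x_1, x_2, y_1, y_2 are assumptions introduced for illustration, and ii is taken to be 0 whenever an index is negative; the notation of the original formulas may differ.

```latex
\begin{align}
ii(x, y) &= \sum_{x' \le x,\; y' \le y} i(x', y') \tag{3} \\
ii(x, y) &= i(x, y) + ii(x-1, y) + ii(x, y-1) - ii(x-1, y-1) \tag{4} \\
\sum_{\substack{x_1 \le x \le x_2 \\ y_1 \le y \le y_2}} i(x, y)
  &= ii(x_2, y_2) - ii(x_1 - 1, y_2) - ii(x_2, y_1 - 1) + ii(x_1 - 1, y_1 - 1) \tag{5}
\end{align}
```

In (5) the region above the rectangle and the region to its left are each subtracted once, so their overlap is added back; this is why exactly four references are needed.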

３次元の積分画像についても、基本的な考え方は２次元と同様である。異なる点は、探索される空間が、長方形から直方体に変わることのみである。 The basic concept of the three-dimensional integral image is the same as that of the two-dimensional image. The only difference is that the searched space changes from a rectangle to a rectangular parallelepiped.

図１２Ｂは、３次元積分画像を説明するための図である。３次元積分画像は、任意の直方体の値の総和を計算するのに用いられる。ここで、直方体の各辺は、座標軸に平行である。式（４）と同様に、３次元積分画像は、以下の式（６）によって計算できる。 FIG. 12B is a diagram for explaining a three-dimensional integral image. The three-dimensional integral image is used to calculate the sum of the values inside an arbitrary rectangular parallelepiped whose edges are parallel to the coordinate axes. As with equation (4), the three-dimensional integral image can be calculated by the following equation (6).

３次元積分画像を構築した後は、任意の直方体の値の総和を、次の式（７）によって計算できる。 Once the three-dimensional integral image has been constructed, the sum of the values inside an arbitrary rectangular parallelepiped can be calculated by the following equation (7).
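Equations (6) and (7) are likewise not reproduced in this text. The three-dimensional analogues of the two-dimensional recurrence and rectangle sum, obtained by inclusion and exclusion over the three axes, would read as follows. The corner names x_1, x_2, y_1, y_2, z_1, z_2 are assumptions, and ii is taken to be 0 for any negative index.

```latex
\begin{align}
ii(x,y,z) ={}& i(x,y,z) + ii(x-1,y,z) + ii(x,y-1,z) + ii(x,y,z-1) \notag \\
 & - ii(x-1,y-1,z) - ii(x-1,y,z-1) - ii(x,y-1,z-1) + ii(x-1,y-1,z-1) \tag{6} \\
\sum_{\substack{x_1 \le x \le x_2 \\ y_1 \le y \le y_2 \\ z_1 \le z \le z_2}} i(x,y,z)
 ={}& ii(x_2,y_2,z_2) - ii(x_1-1,y_2,z_2) - ii(x_2,y_1-1,z_2) - ii(x_2,y_2,z_1-1) \notag \\
 & + ii(x_1-1,y_1-1,z_2) + ii(x_1-1,y_2,z_1-1) + ii(x_2,y_1-1,z_1-1) \notag \\
 & - ii(x_1-1,y_1-1,z_1-1) \tag{7}
\end{align}
```

A box sum therefore costs eight array references regardless of the size of the box.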

３次元積分画像の考え方を用いることにより、アルゴリズムを高速化することができる。本実施形態では、シーン点群が存在する３次元空間を、３次元画像として取り扱う。１つの画素値は、その画素内のシーン点の数とする。これにより、３次元空間の３次元積分画像を生成する。任意の直方体の内部に含まれるシーン点の数は、式（７）で計算できる。 By using the concept of a three-dimensional integral image, the algorithm can be speeded up. In the present embodiment, a three-dimensional space in which a scene point group exists is handled as a three-dimensional image. One pixel value is the number of scene points in the pixel. Thereby, a three-dimensional integrated image of the three-dimensional space is generated. The number of scene points included in an arbitrary rectangular parallelepiped can be calculated by Expression (7).
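The counting of scene points inside a box can be sketched with a voxel grid and a summed-volume table. This is a minimal pure-Python illustration: the function names, the zero-padding layer, the voxel size `cell` and the assumption that all points lie inside the grid are not taken from the source.

```python
def voxel_counts(points, cell, shape):
    """Count points per voxel; points are assumed to lie in [0, n * cell) per axis."""
    nx, ny, nz = shape
    grid = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for px, py, pz in points:
        grid[int(px // cell)][int(py // cell)][int(pz // cell)] += 1
    return grid


def integral_3d(grid):
    """Summed-volume table with one layer of zero padding on each axis."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    ii = [[[0] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]
    for x in range(1, nx + 1):
        for y in range(1, ny + 1):
            for z in range(1, nz + 1):
                ii[x][y][z] = (
                    grid[x - 1][y - 1][z - 1]
                    + ii[x - 1][y][z] + ii[x][y - 1][z] + ii[x][y][z - 1]
                    - ii[x - 1][y - 1][z] - ii[x - 1][y][z - 1]
                    - ii[x][y - 1][z - 1]
                    + ii[x - 1][y - 1][z - 1]
                )
    return ii


def box_sum(ii, lo, hi):
    """Sum over voxels with lo <= index <= hi on every axis (inclusive)."""
    (x1, y1, z1), (x2, y2, z2) = lo, hi
    x2, y2, z2 = x2 + 1, y2 + 1, z2 + 1  # shift for the padding layer
    return (
        ii[x2][y2][z2]
        - ii[x1][y2][z2] - ii[x2][y1][z2] - ii[x2][y2][z1]
        + ii[x1][y1][z2] + ii[x1][y2][z1] + ii[x2][y1][z1]
        - ii[x1][y1][z1]
    )
```

The padding layer removes the need for special cases at the grid boundary: a box touching index 0 simply reads zeros.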

シーン点群が含まれる座標系に変換された全てのモデル点ｍｉについて、各点ｍｉは、各辺が座標軸に平行な立方体の中心に配置される。点ｍｉのまわりにシーン点があるか否かが判断される。この立方体中の点の数がゼロでない場合、その立方体中のいずれかの点は、ｍｉに対応する可能性がある。 For all model points mi converted to a coordinate system including a scene point group, each point mi is arranged at the center of a cube whose sides are parallel to the coordinate axis. It is determined whether there are scene points around the point mi. If the number of points in the cube is not zero, any point in the cube may correspond to mi.

法線間の角度を計算するために、３つの３次元積分マップが生成される。３次元積分マップは、シーン点群の法線のｘ、ｙ、ｚ成分を個別に格納する。同じ立方体中の３つの３次元積分マップの合計Σnx、Σny、Σnzは、その立方体中のシーン点全体の法線と考えることができる。ｍｉの法線と、（Σnx，Σny，Σnz）との角度が閾値よりも小さい場合、このモデル点は対応するシーン点をもち、このモデル点を正確点（correct point）と呼ぶことができる。この正確点の数（正確点数）を、各姿勢のスコアとする。 To calculate the angle between normals, three three-dimensional integral maps are generated. These maps store the x, y, and z components of the normals of the scene point cloud separately. The sums Σnx, Σny, and Σnz of the three maps over the same cube can be regarded as the overall normal of the scene points in that cube. If the angle between the normal of mi and (Σnx, Σny, Σnz) is smaller than a threshold, the model point has a corresponding scene point and is called a correct point. The number of correct points is used as the score of each posture.

この方法では、正確点数は、認識性能を評価する上で重要なパラメータである。姿勢が既に与えられているため、全てのモデル点がシーン空間に変換される。もしその姿勢が正確なら、変換後、多くのモデル点がシーン点に整合する。つまり正確点数が多くなる。複数の候補の姿勢について、正確点数を比較することで、どの姿勢が最も正確かを容易に判別することができる。 In this method, the number of correct points is an important parameter for evaluating recognition performance. Since the posture is already given, all model points are transformed into the scene space. If the posture is accurate, many model points match scene points after the transformation; that is, the number of correct points is large. By comparing the number of correct points across the candidate postures, the most accurate posture can be identified easily.
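The correct-point test can be sketched as follows, taking the point count and the summed normal of the cube around a transformed model point as inputs (these would come from box sums over the count map and the three normal-component maps). The function names and the treatment of a zero-length vector as a non-match are assumptions for illustration.

```python
import math


def is_correct_point(model_normal, count, normal_sum, max_angle):
    """True if the cube holds scene points whose summed normal agrees with the model.

    count: number of scene points in the cube around the model point.
    normal_sum: (sum nx, sum ny, sum nz) over the same cube.
    max_angle: angle threshold in radians.
    """
    if count == 0:
        return False
    s_len = math.sqrt(sum(c * c for c in normal_sum))
    m_len = math.sqrt(sum(c * c for c in model_normal))
    if s_len == 0.0 or m_len == 0.0:
        return False  # assumption: a zero-length vector counts as no match
    cos = sum(a * b for a, b in zip(model_normal, normal_sum)) / (s_len * m_len)
    return math.acos(max(-1.0, min(1.0, cos))) < max_angle


def pose_score(checks, max_angle):
    """Number of correct points; checks holds (normal, count, normal_sum) per model point."""
    return sum(1 for n, c, s in checks if is_correct_point(n, c, s, max_angle))
```

No nearest-neighbour search appears anywhere in the test, which is what makes the verification of many candidate postures affordable.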

３次元積分画像を用いることの利点は、計算の効率化が可能な点にある。本発明者らの実験によれば、例えば２秒間に１００００回以上の姿勢の検証が可能である。 An advantage of using a three-dimensional integral image is that calculation efficiency can be improved. According to the experiments by the present inventors, for example, the posture can be verified 10,000 times or more in 2 seconds.

本実施形態における姿勢検証法は、非常に高速であるため、Drostに開示された方法とは異なり、姿勢のクラスタリングは不要である。さらに、検証の正確性も向上する。類似する姿勢をクラスタリングして平均値をとるのではなく、最も高い正確点数をもつ姿勢を採用することにより、多くの場合、従来よりも正確に姿勢を推定することができる。 Because the posture verification method of this embodiment is very fast, posture clustering is unnecessary, unlike in the method disclosed in Drost. The accuracy of verification also improves. By adopting the posture with the largest number of correct points, instead of clustering similar postures and averaging them, the posture can in many cases be estimated more accurately than before.

［２−６．複数物体の検出］ [2-6. Detection of multiple objects]
本実施形態におけるアルゴリズムの他の利点は、シーン中に複数の物体を見つけることが容易であることである。本実施形態では、多数の姿勢が検証される。これらの検証結果は、シーン中の複数の物体の姿勢を検証する際にも利用することができる。例えば、以下の方法を用いることができる。 Another advantage of the algorithm of this embodiment is that it is easy to find multiple objects in a scene. In this embodiment, a large number of postures are verified. These verification results can also be used when verifying the postures of multiple objects in the scene. For example, the following method can be used.
１）全ての姿勢をスコアに従ってランク付けする。 1) Rank all postures according to score.
２）最初に選択された姿勢をＰ１とする。Ｐ１に属する点群をシーン点群から削除する。 2) Let P1 be the posture selected first. The points belonging to P1 are deleted from the scene point cloud.
３）積分画像空間をリフレッシュする。 3) Refresh the integral image space.
４）新たな積分画像空間において、ステップ１）においてランク付けされた姿勢群を検証する。 4) In the new integral image space, verify the postures ranked in step 1).
５）最高スコアの姿勢Ｐ２を新たに選ぶ。Ｐ２に属する点群をシーン点群から削除する。 5) Select the posture P2 that now has the highest score. The points belonging to P2 are deleted from the scene point cloud.
６）３）に戻る。 6) Return to 3).
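The loop of steps 1) to 6) can be sketched as a greedy procedure. Here the scene is abstracted as a set of point identifiers, and the verification and the assignment of points to a posture are caller-supplied functions; refreshing the integral image corresponds to evaluating `verify` against the remaining points. All names and the stopping conditions are assumptions for illustration.

```python
def detect_objects(poses, scene, verify, explained, n_objects):
    """Greedily pick up to n_objects postures, removing explained points each time.

    poses: candidate postures (any hashable values).
    scene: iterable of scene point identifiers.
    verify(pose, remaining) -> score of the posture against the remaining points.
    explained(pose, remaining) -> set of remaining points belonging to the posture.
    """
    remaining = set(scene)
    candidates = list(poses)
    found = []
    for _ in range(n_objects):
        if not candidates or not remaining:
            break
        # Re-verify every candidate against the refreshed scene.
        best = max(candidates, key=lambda p: verify(p, remaining))
        if verify(best, remaining) <= 0:
            break  # assumption: stop when nothing is explained any more
        found.append(best)
        remaining -= explained(best, remaining)
        candidates.remove(best)
    return found
```

Because the points of each accepted posture are removed, a second candidate that overlaps an already detected object loses most of its score in the next round.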

［３．効果］ [3. Effects]
以上のように、本実施形態によれば、信頼性の高い厳選された点（キーポイント）のみを用いたハッシュテーブルが作成される。これにより、対象物の姿勢推定処理を高速化することができる。さらに、モデル点群の各点ｍｉの周囲の点群についてＰＰＦを計算し、座標計算によって座標変換パラメータ（局所座標）の投票を重ねる。投票数の多い姿勢から、前述の検証処理によって最善の姿勢が決定される。これにより、高速かつ信頼性の高い認識が可能となる。 As described above, according to this embodiment, a hash table is created that uses only carefully selected, highly reliable points (key points). This speeds up the posture estimation processing of the object. Furthermore, PPFs are calculated for the points around each point mi of the model point cloud, and votes for the coordinate transformation parameters (local coordinates) are accumulated through coordinate calculation. The best posture is then determined from the postures with many votes by the verification process described above. This enables fast and highly reliable recognition.

本発明者らは、多数の合成シーンおよび現実のシーンについて実験を行い、本実施形態の効果を検証した。各入力シーンおよびモデルについて、点群は、ダウンサンプルされ、モデル点数は約３０００である。キーポイント数Ｎｋは、８から２０とし、Ｎｖ＝８０００〜２００００の姿勢を検証した。Ｎｒは３％から１０％とした。 The inventors conducted experiments on a large number of synthetic and real scenes to verify the effects of this embodiment. For each input scene and model, the point cloud was downsampled, and the number of model points was about 3000. The number of key points Nk was set to 8 to 20, and Nv = 8000 to 20000 postures were verified. Nr was set to 3% to 10%.

工業部品および非工業部品を含む１６個のモデルを用いた。図１３は、それらのモデルを示している。図１３において、（ａ）〜（ｆ）は、Hinterstoisser et al. ("Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes", Asian conference on computer vision, pp. 548-562, 2012)のデータセットから利用した。（ｇ）〜（ｊ）は、Mian et al. （"Three-dimensional model-based object recognition and segmentation in cluttered scenes", IEEE Transactions on Pattern Analysis & Machine Intelligence, Vol. 28(10): pp. 1584-1601, 2006、および"On the repeatability and quality of keypoints for local feature-based 3d object retrieval from cluttered scenes", Int. Journal of Computer Vision, Vol. 89(2): pp. 348-361, 2010)のデータセットから利用した。（ｋ）〜（ｐ）は工業用部品である。 Sixteen models, including industrial and non-industrial parts, were used. FIG. 13 shows these models. In FIG. 13, (a) to (f) were taken from the dataset of Hinterstoisser et al. ("Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes", Asian conference on computer vision, pp. 548-562, 2012). (g) to (j) were taken from the datasets of Mian et al. ("Three-dimensional model-based object recognition and segmentation in cluttered scenes", IEEE Transactions on Pattern Analysis & Machine Intelligence, Vol. 28(10): pp. 1584-1601, 2006, and "On the repeatability and quality of keypoints for local feature-based 3d object retrieval from cluttered scenes", Int. Journal of Computer Vision, Vol. 89(2): pp. 348-361, 2010). (k) to (p) are industrial parts.

In each experiment, a result was judged correct when its error was smaller than the thresholds. The threshold for displacement was 1/20 of the distance between the two most distant points in the model, and the threshold for rotation was 12 degrees.
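The correctness criterion above can be sketched as follows. This is a minimal illustration, not the inventors' evaluation code: the function names are hypothetical, and the rotation error is assumed to be the angle of the relative rotation between the estimated and true rotation matrices, which the text does not specify.

```python
import numpy as np


def model_diameter(points: np.ndarray) -> float:
    """Distance between the two most distant model points (O(N^2) time, O(N) memory)."""
    return float(max(np.linalg.norm(points - p, axis=1).max() for p in points))


def rotation_angle_deg(r_est: np.ndarray, r_true: np.ndarray) -> float:
    """Angle of the relative rotation r_est^T r_true, in degrees (assumed error metric)."""
    cos_theta = (np.trace(r_est.T @ r_true) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def is_pose_correct(r_est, t_est, r_true, t_true, diameter,
                    max_rot_deg=12.0, trans_fraction=1.0 / 20.0) -> bool:
    """True if both the displacement and rotation errors are below their thresholds."""
    trans_err = float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_true)))
    rot_err = rotation_angle_deg(np.asarray(r_est), np.asarray(r_true))
    return trans_err < trans_fraction * diameter and rot_err < max_rot_deg
```

Under these assumptions, a pose estimate is accepted only when both error terms pass. A different rotation metric would change which borderline estimates count as correct.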

FIGS. 14A and 14B are graphs showing simulation results for the synthetic scenes. FIG. 14A shows the rate of correct recognition (recognition rate) for each model. FIG. 14B shows the calculation time for each model. For comparison, the method disclosed in Drost was also evaluated under the same conditions. Each synthetic scene was generated so as to contain multiple instances of the same object. The number of objects in each scene was 5 to 15, and four objects were detected.

As can be seen from FIGS. 14A and 14B, for the non-industrial parts, both the method of this embodiment and the Drost method achieved a good recognition rate and recognition speed. For the industrial parts, however, the method of this embodiment gave better results than the Drost method in both recognition rate and recognition speed.

FIG. 15 is an image showing an example of recognition results for an industrial part. (a) shows the result obtained with the Drost method, and (b) shows the result obtained with the method of this embodiment. In the figure, the hatched areas indicate the recognized locations. It can be seen that the method of this embodiment recognizes the industrial parts more accurately.

FIG. 16 is a graph showing the results of an experiment performed on real scenes. FIG. 16 shows an example of how the recognition rate changes as the occlusion ratio is varied. The occlusion ratio is obtained by dividing the total surface area of the model in the scene by the total surface area of the model and subtracting that quotient from 1. As shown in FIG. 16A, compared with the Drost algorithm, the algorithm of this embodiment suppresses the drop in recognition rate, particularly when the occlusion ratio is high.
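Reading the definition as one minus the visible fraction of the model surface, the occlusion ratio can be written as below. The symbols are introduced here for illustration only and do not appear in the original text: A_scene is the total surface area of the model that appears in the scene, and A_model is the total surface area of the model.

```latex
\text{occlusion ratio} = 1 - \frac{A_{\text{scene}}}{A_{\text{model}}}
```

For example, if 30% of the model surface appears in the scene, the occlusion ratio under this reading is 1 - 0.3 = 0.7.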

The processing described above in [2-3. Key point selection], [2-4. Posture estimation processing], [2-5. Posture verification], and [2-6. Detection of multiple objects] is merely illustrative, and each process can be modified in various ways. For example, the method of calculating the score in the key point selection process shown in FIG. 9 may be modified as appropriate. Further, instead of the object posture estimation method described with reference to FIGS. 11 to 12B, the method disclosed in Drost, for example, may be applied. Even when a known method such as that of Drost is applied, the calculation time is reduced as long as a hash table containing only the PPF information relating to the key points, as in the embodiment of the present invention, is used.

As described above, the present disclosure includes the methods, programs, and devices described in the following items.

[Item 1]
A method for estimating the position and orientation of an object in a three-dimensional scene, the method comprising:
acquiring data of a model point group representing the three-dimensional shape of the surface of the object;
calculating, for each point pair selected from the model point group, a point pair feature (PPF) that defines the geometric relationship of the point pair, and recording the point pair feature and the point pair in association with each other in a first table;
selecting a plurality of key points from the model point group based on the data of the model point group and the first table;
constructing a second table that retains, of the data recorded in the first table, only the data relating to the plurality of key points;
acquiring data of a scene point group representing the surface shape of one or more objects included in the three-dimensional scene; and
estimating the position and orientation of the object in the three-dimensional scene based on the data of the scene point group and the second table.
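The first and second tables of Item 1 can be illustrated with a small sketch. It rests on several assumptions that are not stated in this section: the PPF is taken to be the four-component feature of Drost et al. (pair distance and three angles), the quantization steps `dist_step` and `angle_step` are hypothetical parameters, and "data relating to the key points" is interpreted as the pairs whose first point is a key point. All function names are illustrative.

```python
from collections import defaultdict

import numpy as np


def angle(a, b):
    """Angle in radians between two nonzero 3D vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def ppf_key(p1, n1, p2, n2, dist_step, angle_step):
    """Quantized PPF (|d|, ang(n1,d), ang(n2,d), ang(n1,n2)) as a hashable key.

    Assumes p1 != p2, so that the difference vector d is nonzero.
    """
    d = p2 - p1
    feature = (np.linalg.norm(d) / dist_step,
               angle(n1, d) / angle_step,
               angle(n2, d) / angle_step,
               angle(n1, n2) / angle_step)
    return tuple(int(v) for v in feature)


def build_first_table(points, normals, dist_step, angle_step):
    """Map each quantized PPF to the list of ordered model point pairs (i, j)."""
    table = defaultdict(list)
    for i in range(len(points)):
        for j in range(len(points)):
            if i != j:
                key = ppf_key(points[i], normals[i], points[j], normals[j],
                              dist_step, angle_step)
                table[key].append((i, j))
    return table


def build_second_table(first_table, keypoint_indices):
    """Keep only pairs whose first point is a key point (assumed interpretation)."""
    keep = set(keypoint_indices)
    second = {}
    for key, pairs in first_table.items():
        kept = [(i, j) for (i, j) in pairs if i in keep]
        if kept:
            second[key] = kept
    return second
```

Because the second table holds only a small fraction of the entries of the first, each lookup during scene matching returns fewer candidate pairs, which is where the reduction in calculation time described above would come from.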

[Item 2]
The method according to item 1, wherein the step of selecting the plurality of key points includes:
calculating a point pair feature between each point mi in the model point group (i is an integer from 1 to Nm, where Nm is the number of points included in the model point group) and each point mj included in a reference region whose distance from the point mi is shorter than a threshold (j is an integer from 1 to M, where M is the number of points included in the reference region);
searching the first table for at least one point pair having a point pair feature similar to the calculated point pair feature;
determining, for each of the point pairs found, a coordinate transformation parameter for aligning that point pair with the pair of the point mi and the point mj; and
selecting, as a key point, the point mi that minimizes the number of cases in which, when the coordinates of the model point group are transformed using each of the determined coordinate transformation parameters, some point of the transformed model point group lies in the vicinity of every point included in the reference region.
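The final selection step of Item 2 can be sketched as a counting problem. The sketch below assumes that the candidate transformations for a point mi are already available as (rotation matrix, translation vector) pairs, that "vicinity" means a Euclidean distance below a hypothetical tolerance `tol`, and that the desired number of key points is obtained by taking the points with the smallest counts. None of these details is fixed by the item itself.

```python
import numpy as np


def count_matching_transforms(model, region, transforms, tol):
    """Count transforms under which every region point has a transformed
    model point within distance tol."""
    count = 0
    for rot, trans in transforms:
        moved = model @ rot.T + trans
        # Pairwise distances: rows are region points, columns are moved model points.
        dists = np.linalg.norm(region[:, None, :] - moved[None, :, :], axis=-1)
        if np.all(dists.min(axis=1) < tol):
            count += 1
    return count


def select_keypoints(scores, num_keypoints):
    """Indices of the num_keypoints points with the smallest counts."""
    order = np.argsort(np.asarray(scores), kind="stable")
    return order[:num_keypoints].tolist()
```

A low count means that few alternative alignments reproduce the neighbourhood of mi, so the point is distinctive; a high count indicates a neighbourhood that repeats elsewhere on the model.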

[Item 3]
The method according to item 1 or 2, wherein the number of the plurality of key points is at least 1/10000 and at most 1/2 of the number of points in the model point group.

[Item 4]
The method according to any one of items 1 to 3, wherein the number of the plurality of key points is at least 1/1000 and at most 1/100 of the number of points in the model point group.
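As a worked check of the range in Item 4 against the experimental settings reported above (about 3000 model points and 8 to 20 key points), the bounds can be tested with integer arithmetic. The function name and its parameters are illustrative, not part of the disclosure.

```python
def keypoint_count_in_range(num_keypoints, num_model_points,
                            min_den=1000, max_den=100):
    """True if num_model_points/min_den <= num_keypoints <= num_model_points/max_den.

    The comparison is done on integers to avoid floating-point rounding.
    """
    return (num_keypoints * min_den >= num_model_points
            and num_keypoints * max_den <= num_model_points)
```

For 3000 model points the range of Item 4 corresponds to 3 to 30 key points, so both 8 and 20 fall inside it.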

[Item 5]
The method according to any one of items 1 to 4, wherein the step of estimating the position and orientation of the object includes:
selecting a subset of points Sr from the set of points whose distance from each point si in the scene point group (i is an integer from 1 to Ns, where Ns is the number of points included in the scene point group) is less than or equal to the distance between the two most distant points included in the model point group;
calculating a point pair feature between the point si and each point Srj in the point group Sr (j is an integer from 1 to Ms);
searching the second table for at least one point pair having a point pair feature similar to the calculated point pair feature;
determining, for each of the point pairs found, a coordinate transformation parameter for aligning that point pair with the pair of the point si and the point Srj, and determining a count for each posture indicated by the coordinate transformation parameters;
selecting a plurality of candidate postures, in descending order of count, from among the postures for which a count has been determined; and
determining, as the final posture, the candidate posture for which the degree of matching between the model point group and the scene point group is highest.
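The last three steps of Item 5 (counting votes per posture, taking the top candidates, and choosing the best match) can be sketched as follows. The sketch assumes that each vote is already a hashable, discretized posture label and that a scoring function for the degree of matching is supplied by the caller. The names `top_candidates`, `pick_final_pose`, and `match_score` are hypothetical.

```python
from collections import Counter


def top_candidates(votes, num_candidates):
    """Return the num_candidates postures with the most votes as (posture, count)."""
    return Counter(votes).most_common(num_candidates)


def pick_final_pose(candidates, match_score):
    """Among candidate postures, return the one with the highest matching score."""
    return max((pose for pose, _count in candidates), key=match_score)
```

Separating the two stages reflects the point made in the item: the posture with the most votes is not necessarily the final answer, because the verification step may prefer a candidate with fewer votes but a better fit between the model and scene point groups.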

[Item 6]
A computer program that causes a computer to execute the method according to any one of items 1 to 5.

[Item 7]
A processing device comprising:
a memory storing the computer program according to item 6; and
a processor that executes the computer program.

[Item 8]
A robot comprising:
the processing device according to item 7;
an end effector; and
a control circuit that controls the end effector based on information on the position and orientation of the object estimated by the processing device.

[Item 9]
A robot system comprising:
the processing device according to item 7;
a robot;
a three-dimensional sensor that acquires the data of the scene point group; and
a control circuit that controls the robot based on information on the position and orientation of the object estimated by the processing device.

The method for estimating the posture of a three-dimensional object according to the embodiment of the present disclosure can be used, for example, in the field of image recognition for robots that perform bin picking.

DESCRIPTION OF SYMBOLS
100 Control device
110 Control circuit
120 Image processing circuit
130 Memory
140 Input interface
150 Output interface
200 Robot
300 Three-dimensional sensor
400 Display
800 Objects piled in bulk

Claims

1. A method for estimating the position and orientation of an object in a three-dimensional scene, the method comprising:
acquiring data of a model point group representing the three-dimensional shape of the surface of the object;
calculating, for each point pair selected from the model point group, a point pair feature that defines the geometric relationship of the point pair, and recording the point pair feature and the point pair in association with each other in a first table;
selecting a plurality of key points from the model point group based on the data of the model point group and the first table;
constructing a second table that retains, of the data recorded in the first table, only the data relating to the plurality of key points;
acquiring data of a scene point group representing the surface shape of one or more objects included in the three-dimensional scene; and
estimating the position and orientation of the object in the three-dimensional scene based on the data of the scene point group and the second table.

2. The method according to claim 1, wherein the step of selecting the plurality of key points includes:
calculating a point pair feature between each point mi in the model point group (i is an integer from 1 to Nm, where Nm is the number of points included in the model point group) and each point mj included in a reference region whose distance from the point mi is shorter than a threshold (j is an integer from 1 to M, where M is the number of points included in the reference region);
searching the first table for at least one point pair having a point pair feature similar to the calculated point pair feature;
determining, for each of the point pairs found, a coordinate transformation parameter for aligning that point pair with the pair of the point mi and the point mj; and
selecting, as a key point, the point mi that minimizes the number of cases in which, when the coordinates of the model point group are transformed using each of the determined coordinate transformation parameters, some point of the transformed model point group lies in the vicinity of every point included in the reference region.

3. The method according to claim 1, wherein the number of the plurality of key points is at least 1/10000 and at most 1/2 of the number of points in the model point group.

4. The method according to claim 1, wherein the number of the plurality of key points is at least 1/1000 and at most 1/100 of the number of points in the model point group.

5. The method according to claim 1, wherein the step of estimating the position and orientation of the object includes:
selecting a subset of points Sr from the set of points whose distance from each point si in the scene point group (i is an integer from 1 to Ns, where Ns is the number of points included in the scene point group) is less than or equal to the distance between the two most distant points included in the model point group;
calculating a point pair feature between the point si and each point Srj in the point group Sr (j is an integer from 1 to Ms);
searching the second table for at least one point pair having a point pair feature similar to the calculated point pair feature;
determining, for each of the point pairs found, a coordinate transformation parameter for aligning that point pair with the pair of the point si and the point Srj, and determining a count for each posture indicated by the coordinate transformation parameters;
selecting a plurality of candidate postures, in descending order of count, from among the postures for which a count has been determined; and
determining, as the final posture, the candidate posture for which the degree of matching between the model point group and the scene point group is highest.

6. A computer program for causing a computer to execute the method according to claim 1.

7. A processing device comprising:
a memory storing the computer program according to claim 6; and
a processor that executes the computer program.

8. A robot comprising:
the processing device according to claim 7;
an end effector; and
a control circuit that controls the end effector based on information on the position and orientation of the object estimated by the processing device.

9. A robot system comprising:
the processing device according to claim 7;
a robot;
a three-dimensional sensor that acquires the data of the scene point group; and
a control circuit that controls the robot based on information on the position and orientation of the object estimated by the processing device.