JP4649559B2

JP4649559B2 - 3D object recognition apparatus, 3D object recognition program, and computer-readable recording medium on which the same is recorded

Info

Publication number: JP4649559B2
Application number: JP2009051500A
Authority: JP
Inventors: 剛徐
Original assignee: 3D Media Co Ltd
Current assignee: Kyoto Robotics Corp
Priority date: 2009-03-05
Filing date: 2009-03-05
Publication date: 2011-03-09
Anticipated expiration: 2029-03-05
Also published as: JP2010205095A

Description

The present invention relates to a three-dimensional object recognition apparatus that recognizes a three-dimensional object of known shape from features, such as contours, in a two-dimensional image captured by a camera or the like. It also relates to a three-dimensional object recognition program and to a computer-readable recording medium on which the program is recorded.

In recent years, three-dimensional object recognition apparatuses have been developed that individually identify parts in a pile and recognize the position and orientation of each part, so that robot arms on production lines can handle the parts accurately. Such an apparatus first extracts features such as the edges (that is, the contours) of the three-dimensional object from an image of the object captured by a camera from a predetermined direction, and calculates, for each pixel of the captured image, the distance to the nearest edge. Next, the apparatus projects the three-dimensional object onto the captured image in a variety of positions and orientations, and calculates the coordinates of each point that makes up the projected edges. The apparatus then evaluates each position and orientation based on the error obtained by comparing the two, and takes the most highly evaluated position and orientation as the position and orientation of the three-dimensional object.

However, recalculating the distance from every pixel of the captured image to its nearest edge each time the position and orientation of the three-dimensional object are changed requires a high-performance processor and raises costs. To solve this problem, it has been proposed to create a distance map in advance (see, for example, Patent Document 1). A distance map assigns to each pixel of the captured image, as its pixel value, the distance to the nearest edge. By referring to this map, the distance to the nearest edge no longer has to be calculated anew for every evaluation.
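To make the saving concrete, the following is a minimal sketch, not the method of Patent Document 1: the distance map is built once (here by brute force over the edge pixels, an assumption chosen for clarity; practical systems use a linear-time distance transform), and scoring any candidate pose then costs one table lookup per projected pixel. The function names `distance_map` and `score_pose` are hypothetical.

```python
import math


def distance_map(edge_mask):
    """Return, for every pixel, the Euclidean distance to the nearest edge pixel.

    edge_mask is a list of rows of booleans. Brute force over all edge pixels,
    O(pixels x edges), which is only suitable for small illustrative images.
    """
    h, w = len(edge_mask), len(edge_mask[0])
    edges = [(r, c) for r in range(h) for c in range(w) if edge_mask[r][c]]
    if not edges:
        raise ValueError("no edge pixels in mask")
    return [[min(math.hypot(r - er, c - ec) for er, ec in edges)
             for c in range(w)] for r in range(h)]


def score_pose(dmap, projected_pixels):
    """Sum of squared nearest-edge distances, read from the precomputed map.

    projected_pixels are integer (row, col) positions of a projected model edge;
    each lookup is O(1), so no per-pose distance computation is needed.
    """
    return sum(dmap[r][c] ** 2 for r, c in projected_pixels)
```

For example, with a vertical edge in column 2 of a 5 x 5 image, the projected pixels `(0, 2)`, `(1, 3)` and `(4, 0)` score 0 + 1 + 4 = 5.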

Patent Document 1: Japanese Patent Laid-Open No. 11-025291

However, conventional three-dimensional object recognition apparatuses suffer from poor robustness because of occlusion of the recognition target. When parts are piled up, part of the three-dimensional object to be recognized may be hidden by other objects as seen from the camera. If the edge nearest to a given pixel is hidden, the distance to the nearest edge is calculated incorrectly, so the position and orientation of the three-dimensional object cannot be evaluated correctly and misrecognition occurs.

In addition, because the conventional system merely evaluates a plurality of predetermined positions and orientations and selects the best of them, its result may deviate from the true position and orientation of the three-dimensional object, which limits recognition accuracy.

The present invention has been made in view of these problems. For a three-dimensional object recognition apparatus that recognizes a three-dimensional object from features such as contours in a two-dimensional image, it provides means for improving robustness by eliminating the influence of occlusion, and for increasing recognition accuracy by optimizing the position and orientation.

To achieve the above object, a three-dimensional object recognition apparatus according to claim 1 of the present invention comprises: a camera that captures an image of the three-dimensional object to be recognized from a predetermined direction; projection point coordinate calculation means that, while varying the position and orientation of the three-dimensional object, projects onto the camera image those sampling points on the edges of the object that are visible from the camera, and calculates the coordinates of each projection point and the direction of the edge at each projection point; lookup table storage means that stores a lookup table associating each position and orientation of the object with the coordinates of the projection points and the edge direction at each projection point; pyramid image creation means that creates, from the original image acquired by the camera, a plurality of pyramid images whose resolution is reduced from that of the original image by different ratios; edge extraction means that extracts the edges of the object from the lowest-resolution pyramid image; directional distance map creation means that creates a directional distance map in which each pixel of the lowest-resolution pyramid image holds, as pixel values, the distance to the nearest of the extracted edges and the direction of that nearest edge; projection point mapping means that maps each projection point stored in the lookup table onto the directional distance map; position and orientation evaluation means that compares the edge direction at each projection point with the direction of the nearest edge held by the corresponding pixel of the directional distance map, calculates the sum of squares of the nearest-edge distances held by the pixels corresponding to the projection points whose two directions substantially match, and evaluates the position and orientation of the object based on that sum; and position and orientation optimization means that optimizes a position and orientation evaluated by the position and orientation evaluation means as close to the actual position and orientation of the object, so that the sum of squares is minimized.

In the three-dimensional object recognition apparatus according to claim 2, the edge extraction means extracts the edges of the three-dimensional object with sub-pixel accuracy, and the position and orientation evaluation means uses, as the distance to the nearest edge, the length of the perpendicular dropped from each pixel of the pixel group to the sub-pixel edge.

A three-dimensional object recognition program according to claim 3 causes a computer to function as: projection point coordinate calculation means that, while varying the position and orientation of the three-dimensional object to be recognized, projects onto the camera image those sampling points on the edges of the object that are visible from the camera, and calculates the coordinates of each projection point and the direction of the edge at each projection point; lookup table storage means that stores a lookup table associating each position and orientation of the object with the coordinates of the projection points and the edge direction at each projection point; pyramid image creation means that creates, from the original image acquired by the camera, a plurality of pyramid images whose resolution is reduced from that of the original image by different ratios; edge extraction means that extracts the edges of the object from the lowest-resolution pyramid image; directional distance map creation means that creates a directional distance map in which each pixel of the lowest-resolution pyramid image holds, as pixel values, the distance to the nearest of the extracted edges and the direction of that nearest edge; projection point mapping means that maps each projection point stored in the lookup table onto the directional distance map; position and orientation evaluation means that compares the edge direction at each projection point with the direction of the nearest edge held by the corresponding pixel of the directional distance map, calculates the sum of squares of the nearest-edge distances held by the pixels corresponding to the projection points whose two directions substantially match, and evaluates the position and orientation of the object based on that sum; and position and orientation optimization means that optimizes a position and orientation evaluated by the position and orientation evaluation means as close to the actual position and orientation of the object, so that the sum of squares is minimized.

A computer-readable recording medium according to claim 4 has the three-dimensional object recognition program according to claim 3 recorded on it.

With the three-dimensional object recognition apparatus according to claim 1, the position and orientation evaluation means calculates the sum of squares only for projection points whose edge direction substantially matches the direction stored in the directional distance map. Pixels whose nearest-edge distance was calculated incorrectly, because part of the three-dimensional object is hidden by another object as seen from the camera, therefore show mismatched edge directions and are excluded from the sum of squares. This reduces the effect of occlusion and improves robustness. Because the positions and orientations stored in the lookup table are evaluated on the lowest-resolution pyramid image, processing is fast. And because the position and orientation optimization means further refines the position and orientation evaluated as close to the actual one, the accuracy with which the position and orientation are recognized is improved.

With the three-dimensional object recognition apparatus according to claim 2, edges are extracted from the pyramid image with sub-pixel accuracy, which improves the accuracy with which the position and orientation are recognized.

The three-dimensional object recognition program according to claim 3 provides the same effects as the three-dimensional object recognition apparatus according to claim 1.

The computer-readable recording medium according to claim 4, on which the three-dimensional object recognition program is recorded, likewise provides the same effects as the three-dimensional object recognition apparatus according to claim 1.

FIG. 1 is a schematic diagram showing the configuration of the three-dimensional object recognition apparatus 1 according to the embodiment of the present invention.
FIG. 2 is a flowchart showing the flow of processing by the three-dimensional object recognition program 8.
FIG. 3 is an explanatory diagram of the pyramid images 15.
FIG. 4 is an explanatory diagram of edge extraction, showing part of the original image 16 enlarged to the pixel level.
FIG. 5 is an explanatory diagram of how the lookup table is created.

FIG. 1 is a schematic diagram showing the configuration of the three-dimensional object recognition apparatus 1 according to the present embodiment. The apparatus 1 comprises a three-dimensional object 3 to be recognized, placed on a work table 2; two cameras 4 that photograph the object 3 from different directions; a robot arm 5 for gripping the object 3; and a computer 6 that controls the operation of the robot arm 5 based on the images input from the cameras 4.

As shown in FIG. 1, the computer 6 includes an image memory 7 that stores image data captured by the cameras 4 and other data; a hard disk 9 that stores a three-dimensional object recognition program 8; a RAM (Random Access Memory) 10 that temporarily holds the program 8 read from the hard disk 9; a CPU (Central Processing Unit) 11 that calculates the position and orientation of the object 3 according to the program 8; a display unit 12 for displaying the image data stored in the image memory 7 and the results calculated by the CPU 11; an operation unit 13 consisting of a mouse, a keyboard, and the like; and a system bus 14 that interconnects these units. Although the program 8 is stored on the hard disk 9 in this embodiment, it may instead be stored on a computer-readable recording medium (not shown) and read from that medium.

The processing procedure of the three-dimensional object recognition program 8 is described below. FIG. 2 is a flowchart showing the flow of processing by the program 8. First, when an original image of the three-dimensional object 3 is input from a camera 4, the CPU 11 creates a plurality of pyramid images from the original image (S1) and stores them in the image memory 7 shown in FIG. 1. FIG. 3 is an explanatory diagram of the pyramid images 15. A pyramid image 15 is the original image 16 with its resolution reduced by a predetermined ratio. For example, when an original image 16 with n pixels in each of the vertical and horizontal directions is input, the CPU 11 creates a first pyramid image 15A with n/2 pixels in each direction, a second pyramid image 15B with n/4 pixels in each direction, and a third pyramid image 15C with n/8 pixels in each direction. Although three pyramid levels are created in this embodiment, the number of levels can be changed as appropriate according to the size of the input image.
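The halving scheme above (n/2, n/4, n/8) can be sketched as repeated 2 x 2 downsampling. The patent does not say how the resolution is reduced; averaging each 2 x 2 block is an assumption made here, and `downsample_half` and `build_pyramid` are hypothetical names.

```python
def downsample_half(img):
    """Halve the resolution by averaging each 2x2 block (assumed filter).

    img is a list of rows of numbers; an odd trailing row or column is dropped.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]


def build_pyramid(original, levels=3):
    """Return images at n/2, n/4, n/8, ... resolution (15A, 15B, 15C for levels=3)."""
    pyramid = []
    current = original
    for _ in range(levels):
        current = downsample_half(current)
        pyramid.append(current)
    return pyramid
```

For a 16 x 16 original, this yields pyramid images of 8 x 8, 4 x 4 and 2 x 2 pixels; `levels` corresponds to the adjustable number of stages mentioned above.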

Next, as shown in FIG. 2, the CPU 11 extracts the edges of the three-dimensional object 3 from the third pyramid image 15C, which has the lowest resolution (S2). Edge extraction here is performed with pixel accuracy. FIG. 4 is an explanatory diagram of edge extraction, showing part of the original image 16 enlarged to the pixel level. With pixel-accuracy edge extraction, an edge is extracted as a set of edge-constituting pixels 17, shown filled in black in the figure (this edge is hereinafter called the "pixel edge 18"). In this embodiment, pixel-accuracy extraction is used to give priority to processing speed; where high recognition accuracy is required, edges may instead be extracted with sub-pixel accuracy. With sub-pixel extraction, as shown by the straight line in FIG. 4, an edge is located with a precision at or below the adjacent pixel spacing D (this edge is hereinafter called the "sub-pixel edge 19").
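The patent does not name an edge operator. As one possible reading of pixel-accuracy extraction, the sketch below marks pixels whose Sobel gradient magnitude reaches a threshold and records each edge pixel's direction as the tangent angle folded into [0, pi). The Sobel operator, the threshold, and the name `sobel_edges` are all assumptions.

```python
import math


def sobel_edges(img, threshold):
    """Pixel-accuracy edge extraction with a 3x3 Sobel operator (assumed).

    Returns (mask, orient): mask[r][c] is True for edge-constituting pixels,
    orient[r][c] is the edge tangent angle in [0, pi), with x = column, y = row.
    Border pixels are left as non-edges.
    """
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    orient = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            if math.hypot(gx, gy) >= threshold:
                mask[r][c] = True
                # The edge runs perpendicular to the gradient; edge directions are
                # undirected, so fold the angle into [0, pi).
                orient[r][c] = (math.atan2(gy, gx) + math.pi / 2) % math.pi
    return mask, orient
```

On an image that is dark in columns 0 to 2 and bright from column 3 on, this marks columns 2 and 3 as a vertical pixel edge with direction pi/2.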

Next, as shown in FIG. 2, the CPU 11 creates a directional distance map (S3) and stores it in the RAM 10 shown in FIG. 1. Although it is not shown in detail in the figures, the directional distance map gives each pixel of the edge-extracted third pyramid image 15C two pixel values: the distance from that pixel to the nearest pixel edge 18, and the direction of that nearest pixel edge 18.
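A directional distance map can be sketched by extending an ordinary distance map so that each pixel also stores the direction of the edge pixel it is closest to. The brute-force nearest-neighbour search and the name `directional_distance_map` are assumptions made for readability. The input format (an edge mask plus per-pixel edge directions) matches the output of a pixel-accuracy edge extractor such as the one assumed for S2.

```python
import math


def directional_distance_map(edge_mask, edge_orient):
    """Build (dist, orient) maps: distance to, and direction of, the nearest edge pixel.

    edge_mask: rows of booleans; edge_orient: rows of edge directions in [0, pi).
    Brute force over edge pixels; ties go to the first edge pixel in row-major order.
    """
    h, w = len(edge_mask), len(edge_mask[0])
    edges = [(r, c) for r in range(h) for c in range(w) if edge_mask[r][c]]
    if not edges:
        raise ValueError("no edge pixels in mask")
    dist = [[0.0] * w for _ in range(h)]
    orient = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            er, ec = min(edges, key=lambda e: (e[0] - r) ** 2 + (e[1] - c) ** 2)
            dist[r][c] = math.hypot(r - er, c - ec)
            orient[r][c] = edge_orient[er][ec]
    return dist, orient
```

With a vertical edge in column 1 (rows 0 to 2) and a horizontal edge in row 4, pixel (0, 4) is nearest the vertical edge (distance 3), while pixel (3, 4) is nearest the horizontal one (distance 1), and each inherits that edge's direction.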

Next, the CPU 11 maps each projection point stored in a previously prepared lookup table onto the directional distance map (S4). The lookup table is created beforehand by the CPU 11 according to the shape of the three-dimensional object 3, the position of the camera 4, and so on, and is stored in the RAM 10 or elsewhere. To create it, as shown in FIG. 5, sampling points 21 are set on each edge 20 of the three-dimensional object 3, and it is determined whether each sampling point 21 is visible from the camera 4. Only the sampling points 21 judged visible from the camera 4 are projected onto the camera image 22, and the coordinates of each projection point 23 and the direction of the edge 24 at that projection point 23 are calculated. This operation is repeated while varying the position (three degrees of freedom) and orientation (three degrees of freedom) of the object 3 in sufficiently fine steps over the entire range that is possible given the position of the camera 4 and other factors. The lookup table is then created by storing the coordinates of the projection points 23 and the direction of the edge 24 at each projection point 23 in association with the position and orientation of the object 3.
The CPU 11 places the projection points 23 stored in the lookup table one after another on the directional distance map according to their coordinates. Because the directional distance map stores the distance to the nearest pixel edge 18 only once per pixel, when the coordinates of a projection point 23 have a fractional part, the value at the position of the projection point 23 may be determined by bilinear interpolation.
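The lookup-table construction can be sketched as follows, under several simplifying assumptions that are not in the patent: a pinhole camera with hypothetical intrinsics (`f`, `cx`, `cy`), a reduced pose of one rotation about the camera Z axis plus a 3D translation instead of the full six degrees of freedom, and a caller-supplied `is_visible` callback standing in for the visibility test (by default every sampling point counts as visible). The edge direction at each projection point is obtained by projecting a short step along the 3D edge tangent. All names are hypothetical.

```python
import math


def transform(point, pose):
    """Apply a simplified pose (theta, tx, ty, tz): rotate about Z, then translate."""
    theta, tx, ty, tz = pose
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


def project(point, f, cx, cy):
    """Pinhole projection of a camera-frame point (z > 0) to image coordinates (u, v)."""
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy)


def build_lut(samples, poses, f=500.0, cx=320.0, cy=240.0,
              is_visible=lambda point, pose: True):
    """Map each pose to a list of ((u, v), edge_direction) for visible sampling points.

    samples: list of (point, tangent) pairs on the model edges, in the model frame.
    Edge directions are undirected angles in [0, pi) in image coordinates.
    """
    lut = {}
    for pose in poses:
        entries = []
        for point, tangent in samples:
            if not is_visible(point, pose):
                continue  # hidden sampling points are never stored
            u0, v0 = project(transform(point, pose), f, cx, cy)
            # Project a short step along the 3D edge tangent to get the 2D direction.
            tip = tuple(a + 1e-3 * t for a, t in zip(point, tangent))
            u1, v1 = project(transform(tip, pose), f, cx, cy)
            direction = math.atan2(v1 - v0, u1 - u0) % math.pi
            entries.append(((u0, v0), direction))
        lut[pose] = entries
    return lut
```

Keying the table by the pose tuple mirrors the association between each position and orientation and its projection points; a real table would enumerate poses on a fine 6-DOF grid.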

Next, the CPU 11 compares the direction of the edge 24 at each mapped projection point 23 with the direction of the nearest pixel edge 18 held as a pixel value by the pixel corresponding to that projection point 23 on the directional distance map. For the group of projection points at which the two directions match, the CPU 11 calculates the sum of squares of the distances to the nearest pixel edge 18 held by the corresponding group of pixels, and evaluates the position and orientation of the three-dimensional object 3 based on the result (S5). In other words, based on the size of the error between the edge 24 formed by the projection points 23 and the pixel edges 18 in the pyramid image 15 being processed, the CPU 11 evaluates how close the position and orientation determined from the lookup table are to the actual position and orientation of the object 3.
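The direction-gated evaluation in S5 can be sketched as follows. Distances at fractional coordinates are bilinearly interpolated, as described for S4; the nearest-edge direction is read from the nearest pixel. The 15-degree tolerance used for "match", the skipping of points projected outside the image, and the names `bilinear`, `angle_diff` and `gated_sum_of_squares` are assumptions for illustration.

```python
import math


def bilinear(grid, u, v):
    """Sample grid at fractional (u = column, v = row) by bilinear interpolation."""
    c0, r0 = int(math.floor(u)), int(math.floor(v))
    c1 = min(c0 + 1, len(grid[0]) - 1)
    r1 = min(r0 + 1, len(grid) - 1)
    fu, fv = u - c0, v - r0
    top = grid[r0][c0] * (1 - fu) + grid[r0][c1] * fu
    bottom = grid[r1][c0] * (1 - fu) + grid[r1][c1] * fu
    return top * (1 - fv) + bottom * fv


def angle_diff(a, b):
    """Smallest difference between two undirected directions (angles modulo pi)."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def gated_sum_of_squares(points, dist_map, orient_map, tol=math.radians(15)):
    """Sum squared nearest-edge distances over points whose directions agree.

    points: ((u, v), edge_direction) pairs from the lookup table for one pose.
    Returns (sum_of_squares, number_of_points_used).
    """
    h, w = len(dist_map), len(dist_map[0])
    total, used = 0.0, 0
    for (u, v), direction in points:
        if not (0 <= u <= w - 1 and 0 <= v <= h - 1):
            continue  # projected outside the image
        nearest_dir = orient_map[int(round(v))][int(round(u))]
        if angle_diff(direction, nearest_dir) > tol:
            continue  # directions disagree: likely an occluded edge, so exclude it
        total += bilinear(dist_map, u, v) ** 2
        used += 1
    return total, used
```

Returning the count of contributing points alongside the sum is a design choice here, so that a caller could, for example, reject poses where too few points survive the direction test.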

In calculating the sum of squares, the distance La from the target pixel 25 to the pixel edge 18 shown in FIG. 4 is used as the distance to the nearest edge. The distance La is the shortest distance to an edge-constituting pixel 17 (filled in black in the figure). When edge extraction on the pyramid image 15 is performed with sub-pixel accuracy as described above, the distance Lb from the target pixel 25 to the sub-pixel edge 19 shown in FIG. 4 may be used instead. The distance Lb is the length of the perpendicular 26 dropped from the target pixel 25 to the sub-pixel edge 19. Depending on the required balance between processing speed and recognition accuracy, La and Lb may also be used in combination as the distance to the nearest edge: for example, Lb may be used when it is less than the adjacent pixel spacing D, and La when Lb is equal to or greater than D.
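The two distance definitions and the mixed rule can be sketched as below, assuming the sub-pixel edge 19 is locally modelled as a straight line given by a point and an angle, and that coordinates are (x, y) in pixel units so that D = 1. The function names and the `pixel_pitch` parameter are hypothetical.

```python
import math


def dist_to_edge_pixel(p, edge_pixels):
    """La: shortest distance from pixel p to the centre of any edge-constituting pixel."""
    return min(math.hypot(p[0] - e[0], p[1] - e[1]) for e in edge_pixels)


def dist_to_subpixel_edge(p, line_point, line_angle):
    """Lb: length of the perpendicular from p to the line through line_point at line_angle."""
    dx, dy = p[0] - line_point[0], p[1] - line_point[1]
    # Project the offset onto the line's unit normal (-sin, cos).
    return abs(-math.sin(line_angle) * dx + math.cos(line_angle) * dy)


def mixed_distance(p, edge_pixels, line_point, line_angle, pixel_pitch=1.0):
    """Use Lb when it is below the adjacent pixel spacing D, otherwise fall back to La."""
    lb = dist_to_subpixel_edge(p, line_point, line_angle)
    return lb if lb < pixel_pitch else dist_to_edge_pixel(p, edge_pixels)
```

For a vertical sub-pixel edge at x = 2.3 whose edge pixels lie in column 2, the pixel at (3, 1) gets Lb = 0.7 (below D), while the pixel at (5, 1) has Lb = 2.7 and so uses La = 3.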

If the evaluation shows that a position and orientation determined from the lookup table are close to the actual position and orientation of the object 3, the CPU 11 optimizes that position and orientation so that the sum of squares is minimized (S6). The well-known Levenberg-Marquardt method is used for this optimization. Because the sum of squares is calculated only for projection points 23 whose edge direction substantially matches the direction in the directional distance map, pixels whose nearest-edge distance was calculated incorrectly, because part of the object 3 is hidden by another object as seen from the camera 4, show mismatched edge directions and are excluded from the sum of squares. This reduces the effect of occlusion and improves robustness. Optimizing the position and orientation to minimize the sum of squares also improves the accuracy with which they are recognized. The optimization method is not limited to the Levenberg-Marquardt method; other well-known nonlinear optimization methods may be used.
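A compact Levenberg-Marquardt loop can illustrate step S6. This is a generic textbook sketch, not the patent's implementation: it uses a forward-difference Jacobian, Marquardt's diagonal damping, and a simple accept/reject rule for the damping factor. In the patent the residuals would be the direction-gated nearest-edge distances; to keep the sketch self-contained and testable, the usage below substitutes point-to-point residuals for a 2D rigid pose (rotation, tx, ty), which is an assumption.

```python
import math


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def levenberg_marquardt(residuals, params, iters=50, lam=1e-3, eps=1e-6):
    """Minimise sum(residuals(params)**2); returns (params, final_cost)."""
    params = list(params)
    n = len(params)
    cost = sum(r * r for r in residuals(params))
    for _ in range(iters):
        r0 = residuals(params)
        # Forward-difference Jacobian, stored column by column.
        cols = []
        for j in range(n):
            p = params[:]
            p[j] += eps
            cols.append([(a - b) / eps for a, b in zip(residuals(p), r0)])
        jtj = [[sum(x * y for x, y in zip(cols[i], cols[j])) for j in range(n)] for i in range(n)]
        jtr = [sum(x * y for x, y in zip(cols[i], r0)) for i in range(n)]
        # Marquardt damping: scale the diagonal of J^T J by (1 + lam).
        a = [[jtj[i][j] * (1 + lam) if i == j else jtj[i][j] for j in range(n)] for i in range(n)]
        step = solve(a, [-g for g in jtr])
        trial = [x + s for x, s in zip(params, step)]
        trial_cost = sum(r * r for r in residuals(trial))
        if trial_cost < cost:   # accept: move towards Gauss-Newton
            params, cost, lam = trial, trial_cost, lam * 0.3
        else:                   # reject: move towards gradient descent
            lam *= 10
    return params, cost
```

Starting from the coarse pose chosen in S5 (here `[0, 0, 0]`), the loop refines it until the sum of squares stops decreasing; any other nonlinear least-squares solver could be dropped in, as the text allows.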

The CPU 11 then determines whether the position and orientation optimized in S6 meet the required accuracy (S7). If they do (S7: Yes), it outputs the position and orientation obtained for the first pyramid image 15A as the final result (S8) and ends processing. If they do not (S7: No), it determines whether any pyramid image 15 remains unprocessed (S9). If none remains (S9: No), it outputs the result for the first pyramid image 15A as the final result (S8) and ends processing. If an unprocessed pyramid image 15 remains (S9: Yes), processing returns to S2 and the same steps are applied to a remaining pyramid image 15, for example the second pyramid image 15B. This is repeated until no unprocessed pyramid image 15 remains. By processing progressively higher-resolution pyramid images 15 until the required accuracy is reached in this way, the position and orientation of the three-dimensional object 3 can be recognized with higher accuracy.
Where high processing speed is required, processing may of course be set in advance to end at a predetermined pyramid level, or to be performed only on a predetermined pyramid level.
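The control flow of S2 to S9 can be sketched as a coarse-to-fine loop. `process_level(image, initial_pose) -> (pose, error)` is a hypothetical stand-in for steps S2 to S6 on one pyramid level. Passing the previous level's pose as the starting point for the next level is an assumption; the text only says processing returns to S2. `max_levels` models the option of stopping at a predetermined level.

```python
def coarse_to_fine(pyramid, process_level, required_error, max_levels=None):
    """Run S2-S6 from the lowest-resolution pyramid image upwards.

    pyramid: images ordered from highest to lowest resolution (15A, 15B, 15C).
    Stops as soon as the error meets required_error (S7: Yes) or when no
    unprocessed level remains (S9: No); returns the last (pose, error).
    """
    levels = list(reversed(pyramid))  # start with 15C, the lowest resolution
    if max_levels is not None:
        levels = levels[:max_levels]  # optionally stop at a predetermined level
    pose, error = None, float("inf")
    for image in levels:
        pose, error = process_level(image, pose)  # S2-S6 on this level
        if error <= required_error:               # S7: required accuracy met
            break
    return pose, error                            # S8: output final result
```

With stand-in levels whose errors are 5.0 (15C), 2.0 (15B) and 0.5 (15A), a requirement of 3.0 stops after 15B, while a requirement of 0.1 processes every level and returns the 15A result.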

The three-dimensional object recognition apparatus according to the present invention can also be used to control the operation of equipment other than robot arms.

DESCRIPTION OF SYMBOLS
1 Three-dimensional object recognition apparatus
3 Three-dimensional object
4 Camera
8 Three-dimensional object recognition program
15 Pyramid image
15C Third pyramid image
16 Original image
18 Pixel edge
19 Sub-pixel edge
21 Sampling point
22 Camera image
23 Projection point

Claims

1. A three-dimensional object recognition apparatus comprising:
a camera that acquires an image by photographing a three-dimensional object to be recognized from a predetermined direction;
projection point coordinate calculating means for, while changing the position and orientation of the three-dimensional object, projecting onto the camera image each of the sampling points that constitute the edge of the three-dimensional object and are visible from the camera, and calculating the coordinates of each projection point and the direction of the edge at each projection point;
look-up table storage means for storing a look-up table in which each position and orientation of the three-dimensional object is associated with the coordinates of the projection points and the direction of the edge at each projection point;
pyramid image creation means for creating, from an original image acquired by the camera, a plurality of pyramid images in which the resolution of the original image is reduced at different ratios;
edge extraction means for extracting an edge of the three-dimensional object from the pyramid image having the lowest resolution;
directional distance map creation means for creating a directional distance map in which each pixel constituting the pyramid image having the lowest resolution holds, as pixel values, the distance to the nearest of the extracted edges and the direction of that nearest edge;
projection point mapping means for mapping each of the projection points stored in the look-up table onto the directional distance map;
position and orientation evaluation means for comparing the direction of the edge at each projection point with the direction of the nearest edge held by the pixel corresponding to that projection point in the directional distance map, calculating the sum of squares of the distances to the nearest edge over the group of pixels corresponding to the group of projection points for which the two directions substantially coincide, and evaluating the position and orientation of the three-dimensional object based on the calculation result; and
position and orientation optimization means for optimizing a position and orientation that the position and orientation evaluation means has evaluated as being close to the actual position and orientation of the three-dimensional object, so that the sum of squares is minimized.
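The directional distance map and the direction-gated sum of squares in claim 1 can be illustrated with the following minimal NumPy sketch. It assumes edges are given as (row, col) pixels with an orientation angle in radians, treats orientations as undirected (compared modulo pi), uses a hypothetical 10-degree tolerance for "substantially the same", and finds nearest edges by brute force for clarity rather than with a distance transform. How the sum and the number of matched points are combined into a final evaluation is not specified here, so the sketch returns both.

```python
import numpy as np


def directional_distance_map(shape, edge_pts, edge_dirs):
    """Per pixel: distance to the nearest edge pixel and that edge's direction.

    edge_pts: (N, 2) float array of (row, col); edge_dirs: (N,) angles in radians.
    Brute force, O(pixels * N); fine for a small, low-resolution pyramid image.
    """
    rows, cols = np.indices(shape)
    pix = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)  # (P, 2)
    d = np.linalg.norm(pix[:, None, :] - edge_pts[None, :, :], axis=2)   # (P, N)
    nearest = d.argmin(axis=1)
    dist = d[np.arange(len(pix)), nearest].reshape(shape)
    direction = edge_dirs[nearest].reshape(shape)
    return dist, direction


def angle_diff(a, b):
    """Smallest difference between two undirected orientations (modulo pi)."""
    diff = np.abs(a - b) % np.pi
    return np.minimum(diff, np.pi - diff)


def evaluate_pose(dist, direction, proj_pts, proj_dirs, tol=np.deg2rad(10)):
    """Sum of squared nearest-edge distances over direction-consistent projection points."""
    r = np.rint(proj_pts[:, 0]).astype(int)
    c = np.rint(proj_pts[:, 1]).astype(int)
    # Drop projection points that fall outside the map (an assumed policy).
    inside = (r >= 0) & (r < dist.shape[0]) & (c >= 0) & (c < dist.shape[1])
    r, c, pd = r[inside], c[inside], proj_dirs[inside]
    ok = angle_diff(pd, direction[r, c]) <= tol  # directions "substantially the same"
    return float(np.sum(dist[r, c][ok] ** 2)), int(ok.sum())
```

In the apparatus, the projection points and their directions would come from the look-up table entry for each candidate position and orientation; the gating step is what keeps a projected edge from being scored against a nearby edge of a different orientation.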

2. The three-dimensional object recognition apparatus according to claim 1, wherein:
the edge extraction means extracts the edge of the three-dimensional object with sub-pixel accuracy; and
the position and orientation evaluation means uses, as the distance to the nearest edge, the length of the perpendicular dropped from the pixel group to the sub-pixel-accurate edge.
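For claim 2, if the sub-pixel edge near a pixel is locally approximated by a straight line through a sub-pixel edge point with a known tangent direction (an assumption made for this sketch, using (x, y) coordinates), the length of the perpendicular from the pixel is the absolute projection of their offset onto the unit edge normal:

```python
import math


def perpendicular_distance(px, py, ex, ey, theta):
    """Length of the perpendicular from pixel (px, py) to the line through the
    sub-pixel edge point (ex, ey) whose tangent direction is theta (radians).

    Assumes the edge is locally straight around (ex, ey).
    """
    nx, ny = -math.sin(theta), math.cos(theta)  # unit normal to the edge
    return abs((px - ex) * nx + (py - ey) * ny)
```

For a vertical edge located at x = 2.3, a pixel at x = 3 is 0.7 away, whereas a pixel-accurate edge at x = 2 would give 1.0; this finer distance is what allows the sum of squares to resolve sub-pixel offsets.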

3. A three-dimensional object recognition program for causing a computer to function as:
projection point coordinate calculating means for, while changing the position and orientation of a three-dimensional object to be recognized, projecting onto a camera image each of the sampling points that constitute the edge of the three-dimensional object and are visible from the camera, and calculating the coordinates of each projection point and the direction of the edge at each projection point;
look-up table storage means for storing a look-up table in which each position and orientation of the three-dimensional object is associated with the coordinates of the projection points and the direction of the edge at each projection point;
pyramid image creation means for creating, from an original image acquired by the camera, a plurality of pyramid images in which the resolution of the original image is reduced at different ratios;
edge extraction means for extracting an edge of the three-dimensional object from the pyramid image having the lowest resolution;
directional distance map creation means for creating a directional distance map in which each pixel constituting the pyramid image having the lowest resolution holds, as pixel values, the distance to the nearest of the extracted edges and the direction of that nearest edge;
projection point mapping means for mapping each of the projection points stored in the look-up table onto the directional distance map;
position and orientation evaluation means for comparing the direction of the edge at each projection point with the direction of the nearest edge held by the pixel corresponding to that projection point in the directional distance map, calculating the sum of squares of the distances to the nearest edge over the group of pixels corresponding to the group of projection points for which the two directions substantially coincide, and evaluating the position and orientation of the three-dimensional object based on the calculation result; and
position and orientation optimization means for optimizing a position and orientation that the position and orientation evaluation means has evaluated as being close to the actual position and orientation of the three-dimensional object, so that the sum of squares is minimized.
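The claims do not name the optimizer used to minimize the sum of squares. As a deliberately simplified stand-in, the sketch below solves one linear least-squares step for a 2D image-plane translation only, not the full position and orientation, using point-to-line residuals n_i . (p_i + t - e_i), where p_i are projection points, e_i their matched nearest edge points, and n_i unit edge normals. All names are hypothetical, and this point-to-line formulation is a swapped-in technique rather than the method stated in the source.

```python
import numpy as np


def refine_translation(points, edge_pts, edge_normals):
    """One least-squares step for a 2D translation t minimizing
    sum_i (n_i . (p_i + t - e_i))**2 (point-to-line residuals).

    points, edge_pts, edge_normals: (N, 2) arrays; normals are unit length.
    """
    A = edge_normals                                            # d(residual)/dt
    b = -np.einsum("ij,ij->i", edge_normals, points - edge_pts)  # -(n_i . (p_i - e_i))
    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t
```

Repeating such a step with correspondences re-read from the directional distance map resembles point-to-line ICP; a version over the full pose would instead differentiate the projection with respect to the position and orientation parameters.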

4. A computer-readable recording medium on which the three-dimensional object recognition program according to claim 3 is recorded.