JP6124566B2

JP6124566B2 - Image recognition method and image recognition apparatus

Info

Publication number: JP6124566B2
Application number: JP2012260886A
Authority: JP
Inventors: 修一榎田; 俊朗江島; 雄大市野; 央出口; 智之堀内; 寿之河野
Original assignee: Kyushu Institute of Technology NUC; Yaskawa Electric Corp
Current assignee: Kyushu Institute of Technology NUC; Yaskawa Electric Corp
Priority date: 2012-11-29
Filing date: 2012-11-29
Publication date: 2017-05-10
Anticipated expiration: 2032-11-29
Also published as: JP2014106856A

Description

The present invention relates to an image recognition method and an image recognition apparatus.

Conventionally, an image recognition method is known in which a classifier is created from registered images registered in advance, and the created classifier is used to determine the attributes of a newly input image and thereby recognize the image (see, for example, Patent Document 1). In Patent Document 1, a registered image registered in advance (for example, an image of a workpiece) is decomposed into partial images, the luminance difference between two arbitrary points on each partial image is extracted as a feature amount, and the registered image is learned. The new image is then recognized (for example, the position and orientation of the workpiece are recognized) based on matching between the learned registered image (partial images) and the partial images of the newly input image (for example, an image of a workpiece). When the luminance difference between two arbitrary points on a partial image is used as the feature amount, the feature amount is not invariant to rotation of the partial image, so the registered images must include images in which the workpiece is rotated around the optical axis (the axis perpendicular to the lens that photographs the workpiece).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2011-22991 (JP2011-22991A)

However, in the image recognition method described in Patent Document 1, images of the workpiece rotated around the optical axis (images of the workpiece rotated in steps of a predetermined angle) are required in order to obtain the position and orientation of the workpiece. The number of registered images therefore increases accordingly, and recognizing an input new image (matching the registered images against the new image) takes time.

The present invention has been made to solve the above problem, and one object of the present invention is to provide an image recognition method and an image recognition apparatus capable of reducing the time required for image recognition.

To achieve the above object, an image recognition method according to a first aspect includes: a step of extracting a plurality of feature points from a learning image; a step of calculating, for the extracted feature points, a feature amount using a rotation-invariant feature amount; a step of creating a classifier for determining the attributes of feature points based on the calculated feature amounts of the feature points of the learning image; a step of extracting a plurality of feature points from an estimated image; and a step of recognizing the estimated image by aggregating the attributes, determined using the classifier, of the plurality of feature points extracted from the estimated image to determine the position of the estimation target.

In the image recognition method according to the first aspect, because the method includes the step of calculating feature amounts for the extracted feature points using a rotation-invariant feature amount as described above, there is no need to learn learning images rotated in steps of a predetermined angle, unlike the case where the feature amounts of feature points are extracted using a feature amount that is not rotation-invariant. That is, the classifier can be created from fewer learning images than when a classifier for determining the attributes of feature points is created based on a feature amount that is not rotation-invariant. Consequently, when the attributes of the feature points of the estimated image are determined using the classifier, the number of learning-image feature points matched against the feature points of the estimated image can be reduced, and the amount of calculation is reduced accordingly. As a result, the time required for image recognition can be reduced.

An image recognition apparatus according to a second aspect includes: first feature point extraction means for extracting a plurality of feature points from a learning image; feature amount calculation means for calculating, for the extracted feature points, a feature amount using a rotation-invariant feature amount; classifier creation means for creating a classifier for determining the attributes of feature points based on the calculated feature amounts of the feature points of the learning image; second feature point extraction means for extracting a plurality of feature points from an estimated image; and recognition means for recognizing the estimated image by aggregating the attributes, determined using the classifier, of the plurality of feature points extracted from the estimated image to determine the position of the estimation target.

In the image recognition apparatus according to the second aspect, because the apparatus includes the feature amount calculation means for calculating feature amounts for the extracted feature points using a rotation-invariant feature amount as described above, there is no need to learn learning images rotated in steps of a predetermined angle, unlike the case where the feature amounts of feature points are extracted using a feature amount that is not rotation-invariant. That is, the classifier can be created from fewer learning images than when a classifier for determining the attributes of feature points is created based on a feature amount that is not rotation-invariant. Consequently, when the attributes of the feature points of the estimated image are determined using the classifier, the number of learning-image feature points matched against the feature points of the estimated image can be reduced, and the amount of calculation is reduced accordingly. As a result, an image recognition apparatus capable of reducing the time required for image recognition can be provided.

With the above configuration, the time required for image recognition can be reduced.

FIG. 1 is an overall view of a robot system according to an embodiment of the present invention.
FIG. 2 is a block diagram of the robot system according to the embodiment of the present invention.
FIG. 3 is a flowchart of the learning phase of an image recognition method according to the embodiment of the present invention.
FIG. 4 is a conceptual diagram of a classification tree of the image recognition method according to the embodiment of the present invention.
FIG. 5 is a flowchart of the estimation phase of the image recognition method according to the embodiment of the present invention.
FIG. 6 is a conceptual diagram of the estimation phase of the image recognition method according to the embodiment of the present invention.
FIG. 7 is a perspective view of the workpiece in Experiment 1, which was performed using the image recognition method according to the embodiment of the present invention.
FIG. 8 shows the learning image in Experiment 1.
FIG. 9 shows the estimated scene (workpieces piled in bulk) in Experiment 1.
FIG. 10 shows the voting plane (voting result) at estimation time in Experiment 1.
FIG. 11 is a perspective view of the workpiece in Experiment 2, which was performed using the image recognition method according to the embodiment of the present invention.
FIG. 12 shows the learning image (front surface of the workpiece) in Experiment 2.
FIG. 13 shows the learning image (back surface of the workpiece) in Experiment 2.
FIG. 14 shows the estimated scene (workpieces piled in bulk) in Experiment 2.
FIG. 15 shows the voting plane (voting result) when estimating the front surface of the workpiece in Experiment 2.
FIG. 16 shows the voting plane (voting result) when estimating the back surface of the workpiece in Experiment 2.

The present embodiment is described below with reference to the drawings.

First, the configuration of a robot system 100 according to the present embodiment is described with reference to FIGS. 1 and 2.

As shown in FIGS. 1 and 2, the robot system 100 includes a robot 1, a robot controller 2, and a sensor unit (image sensor unit) 3. The sensor unit 3 is an example of the "image recognition apparatus" of the present invention.

As shown in FIG. 1, the robot 1 includes a base 11, a robot arm 12 attached to the base 11, and an end effector 13 attached to the tip of the robot arm 12. The robot arm 12 has six degrees of freedom and consists of a plurality of arm structures. The arm structure 12a is connected to the base 11 so as to be rotatable around a rotation axis A1 perpendicular to the installation surface of the robot 1. The arm structure 12b is connected to the arm structure 12a so as to be rotatable around a rotation axis A2 perpendicular to the rotation axis A1. The arm structure 12c is connected to the arm structure 12b so as to be rotatable around a rotation axis A3 parallel to the rotation axis A2. The arm structure 12d is connected to the arm structure 12c so as to be rotatable around a rotation axis A4 perpendicular to the rotation axis A3. The arm structure 12e is connected to the arm structure 12d so as to be rotatable around a rotation axis A5 perpendicular to the rotation axis A4. The arm structure 12f is connected to the arm structure 12e so as to be rotatable around a rotation axis A6 perpendicular to the rotation axis A5. Here, "parallel" and "perpendicular" are used in a broad sense that covers not only strictly parallel and perpendicular arrangements but also arrangements that deviate slightly from them. A servo motor (joint) is provided on each of the rotation axes A1 to A6, and each servo motor has an encoder that detects its rotational position. Each servo motor is connected to the robot controller 2 and operates based on commands from the robot controller 2.

As shown in FIG. 2, the sensor unit 3 includes a camera 31 that captures two-dimensional images and a laser scanner 32. A sensor controller 33 including an image processing unit 34 and a memory 35 is provided inside the sensor unit 3. The sensor unit 3 is configured to measure the three-dimensional shape of workpieces 200 (estimation targets, see FIG. 1) piled in bulk by irradiating them with laser light from the laser scanner 32 and capturing the light reflected from the workpieces 200 with the camera 31. The sensor unit 3 can thus measure the three-dimensional shape of a workpiece 200 (that is, estimate the distance to the workpiece 200 and its detailed position and orientation). In the present embodiment, however, before the detailed position and orientation are estimated, the image processing unit 34 estimates the approximate position of the workpiece 200 (and of a workpiece 201 described later, see FIG. 11) based on a two-dimensional image captured by the camera 31. The image processing unit 34 is an example of the "first feature point extraction means", the "feature amount calculation means", the "classifier creation means", the "second feature point extraction means", and the "recognition means". The workpieces 200 and 201 are examples of the "estimation target". The image recognition method according to the present embodiment (a method of estimating the approximate positions of the workpieces 200 and 201) is described below.

(Rotation-invariant feature amount based on DoG)
First, the rotation-invariant feature amount used in the image recognition method is described. In the present embodiment, DoG (Difference-of-Gaussian) is used as the rotation-invariant feature amount. DoG (the DoG value) is described below. First, at an arbitrary feature point (x, y), the Gaussian function G(u, v, σ) given by equation (1) below is convolved with the learning image I(x, y) to generate the smoothed image L(x, y, σ) given by equation (2) below.

Next, the difference image D^{(i,j)}(x, y) of the two smoothed images obtained with two smoothing parameters σ_i and σ_j is generated by equation (3) below.

Then, using equation (3) above, D^{(i,j)}(x, y) is obtained for every pair σ_i, σ_j (σ_i < σ_j) with σ_i, σ_j ∈ [σ_1, σ_2, ..., σ_m], and each is used as an element of the feature vector V(x, y) given by equation (4) below.
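
Equations (1) to (4) are not reproduced in this text. One plausible form, assuming the standard two-dimensional Gaussian kernel and the usual difference-of-Gaussians definition, is sketched below. The sign convention in equation (3) and the ordering of the elements in equation (4) are assumptions and may differ from the published equations.

```latex
\begin{align}
G(u, v, \sigma) &= \frac{1}{2\pi\sigma^{2}}
    \exp\!\left(-\frac{u^{2} + v^{2}}{2\sigma^{2}}\right) \tag{1} \\
L(x, y, \sigma) &= \sum_{u}\sum_{v} G(u, v, \sigma)\, I(x - u,\, y - v) \tag{2} \\
D^{(i,j)}(x, y) &= L(x, y, \sigma_{j}) - L(x, y, \sigma_{i}) \tag{3} \\
V(x, y) &= \left[\, D^{(1,2)}(x, y),\; D^{(1,3)}(x, y),\; \dots,\; D^{(m-1,m)}(x, y) \,\right] \tag{4}
\end{align}
```

With m smoothing parameters and the condition σ_i < σ_j, the vector V(x, y) has m(m − 1)/2 elements.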

The feature vector V given by equation (4) above has DoG values as its elements. In the present embodiment, a DoG value is the difference between the summed luminance values of two concentric circular regions of different extent in the learning image or the estimated scene (estimated image), that is, the difference between the two smoothed images obtained with σ_i and σ_j, and it is therefore a rotation-invariant feature amount. The feature vector V (DoG) resembles a wavelet feature amount (frequency analysis) that uses a mother wavelet function (a waveform of finite length). Whereas a wavelet feature amount has resolution and orientation as its elements, DoG has no orientation element. However, the feature vector V (DoG) offers many variations in resolution (size), namely D for various values of σ. A feature amount of a size appropriate for the estimation target can therefore be selected by using the dimension reduction technique for the feature vector V described below.
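
The construction of V(x, y) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the embodiment: it assumes a grayscale image stored as a list of rows, a Gaussian window truncated at 3σ, clamped image borders, and the sign convention L(σ_j) − L(σ_i). All function names are hypothetical.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights on a (2*radius+1)^2 window, keyed by offset (u, v)."""
    k = {(u, v): math.exp(-(u * u + v * v) / (2.0 * sigma * sigma))
         for u in range(-radius, radius + 1)
         for v in range(-radius, radius + 1)}
    total = sum(k.values())
    return {uv: w / total for uv, w in k.items()}

def smoothed_value(image, x, y, sigma):
    """L(x, y, sigma): Gaussian-weighted mean around (x, y); borders are clamped (assumption)."""
    h, w = len(image), len(image[0])
    radius = max(1, int(math.ceil(3 * sigma)))  # truncation at 3*sigma is an assumption
    acc = 0.0
    for (u, v), weight in gaussian_kernel(sigma, radius).items():
        yy = min(max(y + v, 0), h - 1)
        xx = min(max(x + u, 0), w - 1)
        acc += weight * image[yy][xx]
    return acc

def dog_feature_vector(image, x, y, sigmas):
    """V(x, y): one DoG value D^(i,j) for every pair sigma_i < sigma_j (sigmas sorted ascending)."""
    smoothed = [smoothed_value(image, x, y, s) for s in sigmas]
    return [smoothed[j] - smoothed[i]
            for i in range(len(sigmas))
            for j in range(i + 1, len(sigmas))]
```

Because the Gaussian weight depends only on u² + v², the value at a point does not change when the image is rotated about that point. On a pixel grid this holds exactly only for rotations by multiples of 90 degrees and approximately for other angles.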

(Dimension reduction of feature vector V)
Having many DoG values (dimensions) improves the discriminating power of the feature vector V given by equation (4) above. However, generating the DoG values at estimation (recognition) time may then take a long time, and redundant DoG values (ones with similar characteristics) may be generated. In the present embodiment, therefore, a plurality of vectors whose elements are DoG values (f, described later) are generated, DoG values with low mutual correlation are selected based on the Hamming distances between these vectors (that is, the dimension of the feature vector V is reduced), and the classifier is created based on the selected DoG values. The dimension reduction technique for the feature vector V is described in detail below.

First, to select the optimal elements D for the feature vector V, the classification performance must be compared over all feature points (n points on the learning image). For given σ_i and σ_j, a vector f^{(i,j)} whose elements are D^{(i,j)}(x, y) for every (x, y) ∈ [(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)] is newly defined by equation (5) below.

In equation (5) above, d^{(i,j)}(x, y) is D^{(i,j)}(x, y) binarized by equation (6) above, and D_med^{(i,j)} in equation (6) is the median obtained by equation (7) above. A plurality of vectors f^{(i,j)} are generated. The elements D of f^{(i,j)} are real-valued. The median of the elements D of f^{(i,j)} is therefore used as a threshold, and each element D is binarized to "0" or "1" according to whether it is larger than this threshold. As a result, "0" and "1" occur in equal numbers among the bits of f^{(i,j)}, and the binarized elements provide suitable information for dividing the n feature points into two groups.
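
The median binarization can be illustrated with a short Python sketch. Equations (5) to (7) are not reproduced in this text, so the mapping "larger than the median gives 1, otherwise 0" is an assumption about which side receives which bit, and the function name is hypothetical.

```python
from statistics import median

def binarize_by_median(values):
    """Binarize the D values of one (sigma_i, sigma_j) pair over all n feature points.

    ASSUMPTION: a value strictly larger than the median maps to 1, all others to 0.
    """
    threshold = median(values)
    return [1 if v > threshold else 0 for v in values]

# Four feature points: the median is 0.2, so two bits are set and two are cleared.
bits = binarize_by_median([0.3, -1.2, 0.8, 0.1])
print(bits)  # prints [1, 0, 1, 0]
```

When n is odd, the median element itself falls on the "0" side, so the split is equal only to within one element.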

Next, the algorithm for determining F, the optimal set of vectors f^{(i,j)}, is described. First, the first element f_1 (t = 1) of the set F is selected at random from all f^{(i,j)}. Then, while t satisfies 2 ≤ t ≤ T_max, the following processing is performed sequentially. Specifically, when the t-th element f_t is selected, H^{(i,j)} given by equation (8) below is calculated for every f^{(i,j)} not included in the set F.

Here, the function ω_H(f^{(i,j)}, f^{(k,l)}) in equation (8) above is expressed by equation (9) below.

Here, d_H(f^{(i,j)}, f^{(k,l)}) denotes the Hamming distance between f^{(i,j)} and f^{(k,l)}. The Hamming distance is the number of positions at which the elements ("0" or "1") of f^{(i,j)} and the elements ("0" or "1") of f^{(k,l)} differ. Then, among all H^{(i,j)}, the f^{(i,j)} from which the smallest H^{(i,j)} was calculated (that is, the f^{(i,j)} corresponding to the minimum H^{(i,j)}) is added as an element of the set F.

The above algorithm sequentially adds to the set F, until t reaches T_max, the unselected vector f whose elements D have the lowest correlation with the group of already selected elements D. Because each selected f consists of elements D carrying different information, the selected f (D) are considered to form a feature amount without redundancy (feature amounts with similar characteristics are eliminated).
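
The greedy selection can be sketched in Python as follows. Equations (8) and (9) are not reproduced in this text, so two assumptions are made and should be read as placeholders: H^{(i,j)} is taken to be the sum of ω_H over the vectors already in F, and ω_H is taken to be |n/2 − d_H|, which is 0 for uncorrelated bit vectors and largest for identical or complementary ones. The function names are hypothetical.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def omega(a, b):
    """ASSUMED stand-in for equation (9): 0 when a and b are uncorrelated
    (distance n/2), largest when they are identical or complementary."""
    return abs(len(a) / 2.0 - hamming(a, b))

def select_vectors(candidates, t_max, rng=None):
    """Greedily pick up to t_max keys from `candidates` ({(i, j): bit vector})."""
    rng = rng or random.Random()
    remaining = sorted(candidates)
    # f_1 is chosen at random, as in the text.
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    while len(selected) < t_max and remaining:
        # ASSUMED form of equation (8): H = summed omega against the selected set F.
        scores = {key: sum(omega(candidates[key], candidates[s]) for s in selected)
                  for key in remaining}
        best = min(remaining, key=scores.get)  # the smallest H is added to F
        remaining.remove(best)
        selected.append(best)
    return selected
```

With two identical candidates and one uncorrelated candidate, the sketch never selects both duplicates when T_max is 2, which is the redundancy-removal behavior the text describes.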

(Approximate position estimation of estimation target using ensemble classification tree)
Next, estimation of the approximate position of the estimation target (recognition target) using an ensemble classification tree is described with reference to FIGS. 3 to 6.

(During learning)
First, the learning phase is described. In the present embodiment, as shown in step S1 of FIG. 3, feature points are extracted at random from the learning image. In step S2, a feature amount (feature vector V) based on DoG (DoG values), a rotation-invariant feature amount, is calculated at each feature point using equation (4) above. Next, in step S3, the dimension of the feature vector V is reduced using equations (5) to (9) above. In step S4, a classification tree is created using the generated feature amounts (feature vector V, selected DoG values) as the classification criterion. The procedure for creating the classification tree is described below.

As shown in FIG. 4, first, all feature points stored in a node (a node of a binary tree) are binarized using the median of each element (each element of the feature vector V) as a threshold. That is, each element v (a DoG value, see equation (10)) of the feature vector V (see equation (10) below) is binarized to "0" or "1" according to whether it is larger than the threshold, giving the feature vector V_bin (see equation (11) below).

Next, an arbitrary distance d is generated, and the Hamming distance d_H(V_bin(x_i, y_i), d) between the arbitrary distance d and the binarized feature vector V_bin (see equation (11) above) is calculated. The feature vectors V_bin are then classified by comparing the calculated Hamming distance d_H(V_bin(x_i, y_i), d) with a threshold d_th that divides the elements evenly between the child nodes. A classification tree is created by performing the above processing recursively. In addition, because the extraction of feature points (step S1) is performed anew each time a classification tree is created, a plurality of mutually independent classification trees are created in the present embodiment. These classification trees are collectively called an ensemble classification tree. The position of the estimation target is estimated, as described later, using the created ensemble classification tree. The ensemble classification tree is an example of the "classifier" of the present invention.
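
The recursive tree construction can be sketched in Python as follows. The sketch assumes that d is a randomly generated bit vector of the same length as V_bin (the reading under which a Hamming distance to V_bin is defined), that d_th is the median of the distances at the node, and that recursion stops at a small leaf size. The leaf size, the dictionary layout, and the function names are hypothetical, and equations (10) and (11) are not reproduced in this text.

```python
import random
from statistics import median

def build_tree(points, rng, min_leaf=2):
    """points: list of (label, real-valued feature vector V). Returns a nested dict."""
    labels = [label for label, _ in points]
    if len(points) <= min_leaf:
        return {"leaf": labels}
    dims = len(points[0][1])
    # Per-element median over the feature points stored in this node.
    thresholds = [median(v[k] for _, v in points) for k in range(dims)]
    d = [rng.randint(0, 1) for _ in range(dims)]  # ASSUMPTION: d is a random bit vector

    def dist(v):
        bits = [1 if v[k] > thresholds[k] else 0 for k in range(dims)]  # V_bin
        return sum(b != r for b, r in zip(bits, d))                     # d_H(V_bin, d)

    dists = sorted(dist(v) for _, v in points)
    d_th = dists[len(dists) // 2]  # median distance: as even a split as ties allow
    left = [p for p in points if dist(p[1]) < d_th]
    right = [p for p in points if dist(p[1]) >= d_th]
    if not left or not right:      # no useful split for this d: stop here
        return {"leaf": labels}
    return {"thresholds": thresholds, "d": d, "d_th": d_th,
            "left": build_tree(left, rng, min_leaf),
            "right": build_tree(right, rng, min_leaf)}

def classify(tree, v):
    """Follow the same comparisons down to a leaf; returns the labels stored there."""
    while "leaf" not in tree:
        bits = [1 if x > t else 0 for x, t in zip(v, tree["thresholds"])]
        distance = sum(b != r for b, r in zip(bits, tree["d"]))
        tree = tree["left"] if distance < tree["d_th"] else tree["right"]
    return tree["leaf"]
```

An ensemble would then be a list of such trees, each built from a fresh random sample of feature points, for example `[build_tree(rng.sample(points, 300), rng) for _ in range(16)]` for the 300 feature points and 16 trees reported in the experiments.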

(During estimation)
In the present embodiment, as shown in FIG. 5, in step S11 an exhaustive search is performed over the estimated scene (estimated image) and feature points are extracted (see FIG. 6). That is, the estimated scene is scanned, for example by raster scan, and the feature vector V (see equation (4) above) is calculated at each scanned point. Next, in step S12, the ensemble classification tree created during learning is used to find, as corresponding points, the feature points of the learning image whose feature amounts are similar to those of the feature points extracted from the estimated scene in step S11 (that is, the attributes of the feature points extracted in step S11 are determined; see FIG. 6).

Next, in step S13, as shown in FIG. 6, votes are cast at the positions on the voting plane (the (x, y) plane) that correspond to the corresponding points (attributes). Because DoG is a rotation-invariant feature amount, the direction of the estimation target in the estimated scene cannot be determined uniquely. The votes are therefore cast in a circle on the voting plane. As a result, the estimation target (recognition target) can be judged (estimated) to be present where votes accumulate on the voting plane (where many votes have been cast).
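
The circular voting can be sketched in Python as follows. The sketch assumes that each matched feature point contributes a circle whose radius equals the distance between the corresponding learning-image feature point and the workpiece center. The text does not state how the radius is obtained, so this is an assumption, and the function name and sampling density are hypothetical.

```python
import math

def vote_circle(plane, cx, cy, radius, weight=1.0, samples=None):
    """Add `weight` once to every voting-plane cell on the circle around (cx, cy)."""
    h, w = len(plane), len(plane[0])
    samples = samples or max(8, 2 * int(2 * math.pi * radius))
    cells = set()  # a set, so that each cell receives at most one vote per circle
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        x = int(round(cx + radius * math.cos(angle)))
        y = int(round(cy + radius * math.sin(angle)))
        if 0 <= x < w and 0 <= y < h:
            cells.add((x, y))
    for x, y in cells:
        plane[y][x] += weight

# Three scene points, each 10 pixels away from a common center at (20, 20).
plane = [[0.0] * 41 for _ in range(41)]
for px, py in [(30, 20), (20, 30), (10, 20)]:
    vote_circle(plane, px, py, 10)
print(plane[20][20])  # prints 3.0: all three circles pass through the center
```

Each individual circle is ambiguous about the direction of the center, but the circles from different feature points of the same workpiece intersect at its center, which is why the votes accumulate there.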

Next, with reference to FIGS. 7 to 16, experiments conducted to confirm the effectiveness of the image recognition method according to the present embodiment are described. In these experiments, the approximate center positions of the workpieces 200 and 201 were estimated for workpieces 200 and 201 piled in bulk.

(Experiment 1)
In Experiment 1, as shown in FIG. 7, the approximate center position of a flat plate-shaped workpiece 200 having three holes 200a was estimated. The conditions used during learning are described below. The same conditions were used in Experiment 2 described later.

During learning, images of the workpiece 200 in a virtual environment created from three-dimensional CAD data were used as learning images. As shown in FIG. 8, in the present embodiment the learning image is an image of a single workpiece 200 placed on a flat surface. The learning image (and the estimated scene, see FIG. 9) is a two-dimensional image. The number of feature points on the learning image used to create one classification tree was 300, and the number of classification trees was 16. During estimation, for an estimated scene of 256 × 256 pixels, feature points were extracted (feature amounts were calculated) every 4 pixels rather than performing an exhaustive search at every pixel. That is, in the present embodiment (Experiments 1 and 2), the feature points of the estimated scene are local images of the estimated scene.
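
The effect of sampling every 4 pixels can be checked with a short Python sketch. It assumes that the scan grid starts at pixel (0, 0) and that no border margin is excluded, neither of which is stated in the text. The function name is hypothetical.

```python
def scan_points(width, height, stride):
    """Raster-scan grid of feature-point positions, one every `stride` pixels."""
    return [(x, y) for y in range(0, height, stride) for x in range(0, width, stride)]

full = scan_points(256, 256, 1)    # exhaustive search at every pixel
coarse = scan_points(256, 256, 4)  # the 4-pixel interval used in the experiments
print(len(full), len(coarse), len(full) // len(coarse))  # prints "65536 4096 16"
```

Under these assumptions the 4-pixel interval evaluates 4,096 feature vectors per scene instead of 65,536, a 16-fold reduction.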

FIG. 9 is the image of the workpieces 200 piled in bulk that was used for estimation. Numbers 1 to 5 in FIG. 9 indicate the locations estimated to be center positions of workpieces 200 based on the voting result shown in FIG. 10. FIG. 10 shows the result of voting on the voting plane based on the feature point attributes determined by the ensemble classification tree. Specifically, the contour lines show the result of casting votes in a circle on the voting plane at the positions where the center of a workpiece 200 is considered to lie, based on the attribute of each local image of the estimated scene determined by the ensemble classification tree (the result of voting based on the attribute of each local image). The numbers in FIG. 10 indicate the heights of the contour lines. The positions corresponding to the local maxima of the voting plane were estimated to be the center positions of the workpieces 200. In FIG. 9, the positions corresponding to the local maxima of the voting plane are ranked in descending order of the maxima (voting rank), and the top five (numbers 1 to 5) are shown.
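
Ranking the local maxima of the voting plane can be sketched in Python as follows. The sketch assumes an 8-neighborhood and a strict comparison, so flat plateaus are not reported. The text does not specify how maxima are detected, so both choices and the function name are hypothetical.

```python
def top_local_maxima(plane, k):
    """Return up to k (votes, x, y) tuples for strict local maxima, highest first."""
    h, w = len(plane), len(plane[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            value = plane[y][x]
            if value <= 0:
                continue
            neighbours = [plane[yy][xx]
                          for yy in range(max(0, y - 1), min(h, y + 2))
                          for xx in range(max(0, x - 1), min(w, x + 2))
                          if (xx, yy) != (x, y)]
            if all(value > n for n in neighbours):
                peaks.append((value, x, y))
    peaks.sort(key=lambda p: -p[0])  # descending number of votes = voting rank
    return peaks[:k]

plane = [[0] * 5 for _ in range(5)]
plane[1][1], plane[3][3], plane[3][1] = 4, 7, 2
print(top_local_maxima(plane, 2))  # prints [(7, 3, 3), (4, 1, 1)]
```

Calling the function with k = 5 corresponds to the five ranked positions marked in FIG. 9.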

As shown in FIG. 9, it was confirmed that the top-ranked results (numbers 1 to 4) estimated the actual center positions of the workpieces 200 largely accurately. That is, the image recognition method of this embodiment was confirmed to have high accuracy. This is thought to be because the workpiece 200 has many flat surfaces, so that its possible postures are limited even when stacked in bulk (there are relatively few posture variations), and because it has the distinctive feature of including three holes 200a. On the other hand, it was confirmed that when a workpiece 200 is tilted with respect to the surface on which the workpieces are stacked, the estimated center position may deviate from the actual center position because such a posture has not been learned.

(Experiment 2)
In Experiment 2, as shown in FIG. 11, an experiment was performed to estimate the approximate center position of a flat plate-shaped workpiece 201 having six holes 201a. As shown in FIGS. 12 and 13, the front surface and the back surface of the workpiece 201 differ in shape. Specifically, the front surface of the workpiece 201 has a periodic uneven shape, whereas the back surface is flat.

In Experiment 2, as shown in FIGS. 12 and 13, an image of the front surface and an image of the back surface of the workpiece 201 placed on a plane were used as learning images.

FIG. 14 is an image of the bulk-stacked workpieces 201 used during estimation. Numbers 1 to 3 in FIG. 14 indicate the locations estimated to be center positions of the workpieces 201 based on the voting results shown in FIGS. 15 and 16. As shown in FIGS. 15 and 16, separate voting planes were prepared for the front surface and the back surface of the workpiece 201. An ensemble classification tree was created based on the learning image of the front surface of the workpiece 201 (see FIG. 12). Based on the attributes of the local images of the estimated scene determined by this ensemble classification tree, positions where the center position of the front surface of a workpiece 201 is considered to exist were voted in a circle on the voting plane (see FIG. 15). Similarly, an ensemble classification tree was created based on the learning image of the back surface of the workpiece 201 (see FIG. 13). Based on the attributes of the local images of the estimated scene determined by this ensemble classification tree, positions where the center position of the back surface of a workpiece 201 is considered to exist were voted in a circle on the voting plane (see FIG. 16). In FIG. 14, for each of the front surface and the back surface of the workpiece 201, the positions corresponding to the local maxima of the voting plane are ranked in descending order of the local maxima (in order of voting rank), and the top three (numbers 1 to 3) are shown.

As shown in FIG. 14, it was confirmed that the actual center positions of the workpieces 201 were estimated largely accurately. That is, the image recognition method of this embodiment was confirmed to have high accuracy even for the workpiece 201, whose front and back surfaces differ in shape. This is thought to be because the workpiece 201, like the workpiece 200, has many flat surfaces, so that its possible postures are limited even when stacked in bulk, and because it has the distinctive features of including six holes 201a and periodic unevenness.

In this embodiment, as described above, the feature amounts of the extracted feature points are calculated using DoG, which is a rotation-invariant feature amount. Unlike the case where the feature amounts of the feature points are extracted using a feature amount that is not rotation-invariant, there is therefore no need to learn learning images rotated at every predetermined angle. That is, compared with creating an ensemble classification tree for determining the attributes of feature points based on a feature amount that is not rotation-invariant, the ensemble classification tree can be created from fewer learning images. Consequently, when the attributes of the feature points of the estimated scene are determined using the ensemble classification tree, the number of learning-image feature points matched against the feature points of the estimated image can be reduced, and the amount of calculation is reduced accordingly. As a result, the time required to recognize (estimate) the image (estimation target) can be kept from increasing.

In this embodiment, as described above, a DoG value is used as the rotation-invariant feature amount. The DoG value is obtained by generating a plurality of smoothed images by convolving a Gaussian function with a feature point of the learning image and taking the difference between two of the generated smoothed images. This makes it easy to calculate rotation-invariant feature amounts for the feature points extracted from the learning image.

In this embodiment, as described above, the DoG value is calculated as the difference between the respective sums of luminance values of two concentric circular regions of different extents at a feature point. Because the luminance sum of each of the two concentric circular regions is a rotation-invariant value, a rotation-invariant feature amount can be calculated for the feature points extracted from the learning image.
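
As a rough illustration of this calculation, the sketch below computes the difference between the luminance sums of two concentric discs and checks that it is unchanged when the image is rotated by 90 degrees about the feature point. The function names, the sign convention (outer sum minus inner sum), the pixel-centre membership test, and the test image are all assumptions introduced for the example; the source does not specify them.

```python
def disk_sum(image, cx, cy, radius):
    """Sum the luminance of pixels whose centre lies within `radius` of (cx, cy)."""
    total = 0
    r2 = radius * radius
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r2:
                    total += image[y][x]
    return total


def dog_value(image, cx, cy, r_inner, r_outer):
    """Difference of the two concentric sums (sign convention is an assumption)."""
    return disk_sum(image, cx, cy, r_outer) - disk_sum(image, cx, cy, r_inner)


def rotate90(image):
    """Rotate a square image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


# Arbitrary 9 x 9 test pattern; the feature point is the centre pixel (4, 4).
image = [[(3 * x + 7 * y * y) % 11 for x in range(9)] for y in range(9)]

original = dog_value(image, 4, 4, 1, 3)
rotated = dog_value(rotate90(image), 4, 4, 1, 3)
print(original == rotated)
```

A 90-degree rotation maps each disc of pixels onto itself, so both sums, and therefore their difference, are preserved exactly. For arbitrary angles on a discrete pixel grid the invariance would hold only approximately.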

In this embodiment, as described above, DoG values having low correlation with each other are selected from among a plurality of DoG values, and the classifier is created based on the selected DoG values. Unlike the case where the classifier is created using all of the DoG values, this further reduces the amount of calculation required when the attributes of the feature points of the estimated scene are determined using the classifier. As a result, the time required for image recognition can be further reduced.

In this embodiment, as described above, a plurality of vectors f having DoG values as elements are generated, DoG values having low correlation with each other are selected based on the Hamming distances between the vectors f, and the ensemble classification tree is created based on the selected DoG values. Because DoG values having low correlation with each other are selected, the number of dimensions of the feature vector V can be reduced effectively.
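
One way such a selection could work is sketched below. The details here are assumptions rather than the method of the embodiment: each candidate DoG value is represented by a binary vector f (for example, a thresholded response over the learning feature points), and a greedy rule keeps a candidate only if its Hamming distance to every already selected vector is at least `min_distance`. The names `hamming` and `select_low_correlation` and the sample vectors are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def select_low_correlation(vectors, min_distance):
    """Greedily keep indices whose vectors differ enough from all kept vectors."""
    selected = []
    for i, v in enumerate(vectors):
        if all(hamming(v, vectors[j]) >= min_distance for j in selected):
            selected.append(i)
    return selected


# Hypothetical binary vectors f, one per candidate DoG value.
f = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1, 1],  # differs from f[0] in 1 position: redundant
    [0, 1, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0, 1, 1],  # differs from f[2] in 1 position: redundant
]

kept = select_low_correlation(f, 3)
print(kept)  # [0, 2]
```

In this sketch a small Hamming distance is treated as high correlation, so near-duplicate candidates are dropped and only the kept indices would contribute dimensions to the feature vector V.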

In this embodiment, as described above, an ensemble classification tree having a plurality of classification trees for determining the attributes of feature points is created from the feature amounts of the feature points of the learning image calculated using the rotation-invariant feature amount. Even when the discrimination performance (accuracy) of a single classification tree is relatively low, the ensemble classification tree with its plurality of classification trees improves the performance of determining the attributes of feature points.
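
The benefit of combining several weak trees can be illustrated with a minimal sketch. It assumes that the ensemble combines the trees by majority vote, which this passage does not state, and the three one-rule "trees" and the attribute labels "A" and "B" are stand-ins invented for the example.

```python
from collections import Counter


def ensemble_predict(trees, feature_vector):
    """Return the attribute chosen by most trees (majority vote is an assumption)."""
    votes = Counter(tree(feature_vector) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical one-rule trees, each looking at a different DoG value.
trees = [
    lambda v: "A" if v[0] > 0 else "B",
    lambda v: "A" if v[1] > 0 else "B",
    lambda v: "A" if v[2] > 0 else "B",
]

# The second tree disagrees, but the ensemble still answers "A".
result = ensemble_predict(trees, [1, -1, 2])
print(result)
```

An individual tree that misjudges a feature point is outvoted as long as most of the other trees judge it correctly.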

In this embodiment, as described above, the position of the estimation target is estimated by voting in a circle on the voting plane based on the attributes of the feature points determined by the ensemble classification tree. Thus, even when the feature amounts of the feature points are calculated using DoG, which is a rotation-invariant feature amount, the estimation target (the center positions of the workpieces 200 and 201) can be estimated easily by casting the votes in a circle on the voting plane and judging that the estimation target exists where the votes on the voting plane are concentrated.

In this embodiment, as described above, the estimated scene is an image of a plurality of bulk-stacked workpieces 200 and 201, and the center positions of the bulk-stacked workpieces 200 and 201 are estimated by voting in a circle on the voting plane based on the attributes of the feature points determined using the ensemble classification tree. The ensemble classification tree created based on the rotation-invariant feature amount (DoG) thus allows the center positions of the bulk-stacked workpieces 200 and 201 to be estimated quickly.

In this embodiment, as described above, the learning image consists of an image of a single workpiece 200. Unlike the case where a plurality of learning images of a workpiece rotated at every predetermined angle are prepared and the ensemble classification tree is created from those learning images, this keeps the time required to estimate (recognize) the estimation target from increasing.

In this embodiment, as described above, the learning image and the estimated scene consist of two-dimensional images. Unlike the case where the learning image and the estimated scene consist of three-dimensional images, this allows the ensemble classification tree to be created and the estimation target to be estimated quickly.

In this embodiment, as described above, the feature points of the estimated scene consist of local images of the estimated scene. Unlike the case where feature amounts are calculated at every point (pixel) of the estimated scene, this allows the estimation target to be recognized quickly.

The embodiment disclosed here should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the above description of the embodiment, and includes all modifications within the meaning and scope equivalent to the claims.

For example, although the above embodiment uses DoG (DoG values) as the rotation-invariant feature amount, a feature amount other than DoG may be used as the rotation-invariant feature amount. For example, the distance from the center of the estimation target to the edge (contour) of the estimation target may be calculated at predetermined angular intervals around the optical axis, and a feature amount obtained by frequency analysis of the result may be used as the rotation-invariant feature amount.
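
To show why frequency analysis of such a distance profile can yield a rotation-invariant quantity, the sketch below uses the magnitudes of a discrete Fourier transform. Choosing the DFT magnitude spectrum as the "frequency analysis" is an assumption, as are the function name `dft_magnitudes` and the sample distances. A rotation of the target by a multiple of the angular step circularly shifts the profile, which changes only the phase of each DFT coefficient and leaves the magnitudes unchanged.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


# Hypothetical centre-to-edge distances sampled every 45 degrees around the optical axis.
distances = [5.0, 6.0, 8.0, 6.0, 5.0, 4.0, 3.0, 4.0]

# The same contour rotated by 3 * 45 degrees: a circular shift of the profile.
rotated = distances[3:] + distances[:3]

same = all(
    abs(a - b) < 1e-9
    for a, b in zip(dft_magnitudes(distances), dft_magnitudes(rotated))
)
print(same)
```

For rotations that are not a multiple of the angular step, the magnitudes would match only approximately, depending on how finely the contour is sampled.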

Although the above embodiment reduces the dimensionality of the feature vector V by selecting DoGs (DoG values) having low correlation with each other, the dimensionality of the feature vector V need not be reduced.

Although the above embodiment uses, as the DoG value, the difference between the respective sums of luminance values of two concentric circular regions of different extents at a feature point, the difference between the average luminance values, for example, may be used as the DoG value instead.

Although the above embodiment uses an ensemble classification tree as the classifier, a classifier other than an ensemble classification tree (for example, a single classification tree or a support vector machine (SVM)) may be used.

Although the above embodiment estimates the center positions of workpieces from an image of a plurality of bulk-stacked workpieces, estimation targets other than bulk-stacked workpieces (such as a person in a photograph or a specific building in an aerial photograph) can also be estimated.

Although the above embodiment uses an image of one workpiece placed on a plane (images of the front surface and the back surface of one workpiece) as the learning image, an image of the workpiece tilted with respect to the placement surface or an image of a side surface of the workpiece, for example, may be used as a learning image in addition to the image of the workpiece placed on the plane. This makes it possible to accurately estimate a plurality of workpieces stacked in bulk in various postures.

Although in the above embodiment the approximate position of the workpiece is estimated by the image processing unit of the sensor unit, the approximate position of the workpiece may be estimated by a component other than the image processing unit of the sensor unit (for example, a robot controller or a separately provided personal computer (PC)).

Although the above embodiment applies the image recognition method described above to a robot system, the image recognition method may be applied to systems other than robot systems.

3 Sensor unit (image recognition apparatus)
34 Image processing unit (first feature point extraction means, feature amount calculation means, classifier creation means, second feature point extraction means, recognition means)
200, 201 Workpiece (estimation target)

Claims

An image recognition method comprising:
a step of extracting a plurality of feature points from a learning image;
a step of calculating feature amounts of the extracted feature points using a rotation-invariant feature amount;
a step of creating a classifier for determining attributes of feature points, based on the calculated feature amounts of the feature points of the learning image;
a step of extracting a plurality of feature points from an estimated image; and
a step of recognizing the estimated image by using the classifier to aggregate the attributes of the plurality of feature points extracted from the estimated image and determine a position of an estimation target.

The image recognition method according to claim 1, wherein the rotation-invariant feature amount includes a DoG (Difference-of-Gaussian) value obtained by generating a plurality of smoothed images by convolving a Gaussian function with a feature point of the learning image and taking the difference between two of the generated smoothed images.

The image recognition method according to claim 2, wherein the DoG value is the difference between the respective sums of luminance values of two concentric circular regions of different extents at the feature point.

The image recognition method according to claim 2, wherein the DoG value includes a plurality of DoG values, and
the step of creating a classifier for determining the attributes of the feature points includes a step of selecting, from among the plurality of DoG values, DoG values having low correlation with each other and creating the classifier based on the selected DoG values.

The image recognition method according to claim 4, wherein the step of creating the classifier based on the selected DoG values includes a step of generating a plurality of vectors having the DoG values as elements, selecting DoG values having low correlation with each other based on the Hamming distances between the plurality of vectors, and creating the classifier based on the selected DoG values.

The image recognition method according to any one of claims 1 to 5, wherein the step of creating a classifier for determining the attributes of the feature points includes a step of creating, from the feature amounts of the feature points of the learning image calculated using the rotation-invariant feature amount, an ensemble classification tree having a plurality of classification trees for determining the attributes of the feature points.

The step of determining the attribute of the feature point of the extracted learning image using the classifier and recognizing the estimated image is performed on the voting surface based on the attribute of the feature point determined by the classifier. The image recognition method according to claim 1, comprising a step of estimating a position of an estimation target by voting in a circle.

The image recognition method according to claim 7, wherein the estimated image includes images of a plurality of workpieces stacked in bulk, and
the step of estimating the position of the estimation target includes a step of estimating the positions of the plurality of workpieces stacked in bulk by voting in a circle on a voting surface based on the attributes of the feature points determined using the classifier.

The image recognition method according to claim 8, wherein the learning image is composed of an image of one of the workpieces.

The image recognition method according to claim 1, wherein the learning image and the estimated image are two-dimensional images.

The image recognition method according to claim 1, wherein the feature point of the estimated image is a local image of the estimated image.

An image recognition apparatus comprising:
first feature point extracting means for extracting a plurality of feature points from a learning image;
feature amount calculating means for calculating feature amounts of the extracted feature points using a rotation-invariant feature amount;
classifier creating means for creating a classifier for determining attributes of feature points, based on the calculated feature amounts of the feature points of the learning image;
second feature point extracting means for extracting a plurality of feature points from an estimated image; and
recognizing means for recognizing the estimated image by using the classifier to aggregate the attributes of the plurality of feature points extracted from the estimated image and determine a position of an estimation target.