JP2015102928A

JP2015102928A - Processing apparatus, robot, position and attitude detection method, and program

Info

Publication number: JP2015102928A
Application number: JP2013241619A
Authority: JP
Inventors: 良至岸; Ryoji Kishi
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2013-11-22
Filing date: 2013-11-22
Publication date: 2015-06-04

Abstract

PROBLEM TO BE SOLVED: To provide a processing apparatus, robot, position and orientation detection method, program, and the like that accurately detect the six-degree-of-freedom position and orientation of an object by combining a phase-only correlation method with a technique that updates the position and orientation on the basis of similarity (in a narrow sense, an asymptotic method). SOLUTION: A processing apparatus 100 includes: an input image acquisition unit 110 that acquires an input image; a template image acquisition unit 120 that acquires a template image by setting posture information for three-dimensional model data of an object; and a processing unit 130. The processing unit 130 performs phase-only correlation processing on the input image and the template image to determine the similarity between them, and updates the posture information on the basis of that similarity. It then performs phase-only correlation processing on the input image and a new template image acquired from the updated posture information, and in this way detects the position and orientation of the object.

Description

The present invention relates to a processing apparatus, a robot, a position and orientation detection method, a program, and the like.

Position and orientation detection methods determine where in space a given object is placed and in what orientation. They are useful in many situations. For example, when a robot performs a task, the position and orientation of the work object are detected so that the robot can act on it appropriately. Many typical robot tasks require the position and orientation of the work object to be known accurately, such as tightening a screw at a desired position on the work object or gripping a desired position with a desired hand orientation.

Separately, the phase-only correlation method is known as a highly robust image matching technique with high accuracy (for example, sub-pixel accuracy). Non-Patent Document 1, for example, describes the phase-only correlation method in detail. Patent Document 1 discloses a technique for sub-pixel estimation that is not limited to the phase-only correlation method.

Patent Document 1: JP 2009-282635 A

Non-Patent Document 1: Takafumi Aoki, Koichi Ito, Takuma Shibahara, Kiyoshi Nagashima, "High-Precision Machine Vision Based on Phase-Only Correlation: Toward Image Sensing Technology Beyond the Pixel-Resolution Barrier," IEICE Fundamentals Review, Vol. 1, No. 1, pp. 30-40, July 2007.

Conventionally, the three-dimensional position and orientation of an object has been estimated in two steps. First, the object's three-dimensional shape is measured with a laser range sensor, with pattern light projected by a projector, or with a time-of-flight range image sensor. Second, the measurement is matched against three-dimensional model data (model data of a three-dimensional shape, such as CAD data). In other words, conventional methods rely on dedicated hardware: a dedicated sensor, or a projector that projects pattern light. Methods that use only an input image consisting of ordinary two-dimensional image data (for example, a captured image of the object) have not been the mainstream.

The phase-only correlation method performs image matching with high accuracy. However, no method has been disclosed that uses it to estimate the position and orientation of an object. In a narrow sense, this means six-degree-of-freedom information: the position along each of three axes and the rotation about each axis. Furthermore, no disclosure has been found of a technique that combines the phase-only correlation method with an asymptotic method (nonlinear programming, hill climbing), which searches for a solution (an extremum) while updating parameters.

One aspect of the present invention relates to a processing apparatus including an input image acquisition unit, a template image acquisition unit, and a processing unit. The input image acquisition unit acquires an input image. The template image acquisition unit acquires a template image from three-dimensional model data of an object by setting posture information for that model data. The posture information represents the position along the third axis of a three-dimensional space defined by first to third axes, together with first to third rotation angles, which are the rotation angles about the first to third axes, respectively. The processing unit performs phase-only correlation processing on the input image and the template image to determine the similarity between the images, and updates the posture information on the basis of the similarity. It then performs the phase-only correlation processing on the input image and a new template image acquired from the updated posture information, and in this way detects the position and orientation of the object in the input image.

In this aspect of the present invention, a template image is acquired by setting the position and orientation of the three-dimensional model data. Phase-only correlation processing on the template image and the input image yields a similarity, from which the position and orientation of the object are detected. The position and orientation are then updated using the similarity, and the phase-only correlation processing is performed again with a template image acquired from the updated position and orientation. This makes it possible to detect the six-degree-of-freedom position and orientation of the object accurately. It does so by combining the phase-only correlation method with a technique that updates the position and orientation on the basis of similarity (in a narrow sense, an asymptotic method).

In one aspect of the present invention, the posture information that maximizes the similarity may be found through multiple rounds of phase-only correlation processing as the posture information is updated. In that case, the processing unit may detect the following information as the position and orientation of the object in the input image:
- the positions along the first and second axes, obtained by the phase-only correlation processing on the input image and the template image acquired from that posture information;
- the position along the third axis represented by that posture information;
- the first to third rotation angles represented by that posture information.

This makes it possible to determine each of the six degrees of freedom of the position and orientation appropriately, based on the position and orientation that maximizes the similarity.

In one aspect of the present invention, the template image acquisition unit may acquire five template images:
- a first template image obtained from the posture information;
- a second template image obtained by changing the position along the third axis in the posture information;
- a third template image obtained by changing the first rotation angle;
- a fourth template image obtained by changing the second rotation angle;
- a fifth template image obtained by changing the third rotation angle.
The processing unit may then update the posture information on the basis of the similarities obtained by the phase-only correlation processing on the input image and each of the first to fifth template images.

This makes it possible to update the position and orientation using multiple template images, each acquired by changing one of four of the degrees of freedom of the position and orientation.
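The text does not state the update rule itself. Below is a minimal sketch that assumes a simple forward-difference gradient ascent. Under that assumption, the five templates are one evaluation at the current posture plus one perturbed evaluation for each of z, u, v, and w. The callable `similarity` is a hypothetical stand-in for "render a template at this posture and take the peak of its phase-only correlation surface with the input image". The step sizes `deltas`, the `rate`, and the stopping rule are illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple

Pose4 = Tuple[float, float, float, float]  # (z, u, v, w); x and y come from POC


def hill_climb(similarity: Callable[[Pose4], float],
               start: Pose4,
               deltas: Sequence[float] = (1e-3, 1e-3, 1e-3, 1e-3),
               rate: float = 0.25,
               max_iter: int = 200,
               tol: float = 1e-6) -> Tuple[Pose4, float]:
    """Forward-difference gradient ascent over (z, u, v, w).

    Each iteration uses five similarity evaluations, i.e. five templates:
    the current posture and one perturbation of each of z, u, v, w.
    """
    pose = list(start)
    for _ in range(max_iter):
        r0 = similarity(tuple(pose))                 # first template: current posture
        grad = []
        for i, h in enumerate(deltas):               # second to fifth templates
            probe = list(pose)
            probe[i] += h
            grad.append((similarity(tuple(probe)) - r0) / h)
        step = [rate * g for g in grad]
        pose = [p + s for p, s in zip(pose, step)]   # move uphill in similarity
        if max(abs(s) for s in step) < tol:          # converged: update has vanished
            break
    final = tuple(pose)
    return final, similarity(final)
```

A forward difference only estimates the slope, so the converged posture is accurate to roughly the probe size `deltas`. Separate step sizes per parameter are used because z and the rotation angles have different units.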

In one aspect of the present invention, the template image acquisition unit may acquire the template image by applying a perspective transformation to the three-dimensional model data. The third axis may then be the axis corresponding to the depth direction of the perspective transformation.

This makes it possible to acquire a template image by perspective transformation of the three-dimensional model data, with the third axis as the depth axis.
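As an illustration only, assume an ideal pinhole virtual camera with focal length f (in pixels) and principal point (c_x, c_y); the source specifies neither. A model point with camera coordinates (X, Y, Z), where Z is the depth along the third axis, then projects as shown on the first line below. The second line shows why moving along the third axis changes the size of the object in the template: image extents scale with 1/Z.

```latex
\begin{aligned}
x_{\mathrm{img}} &= f\,\frac{X}{Z} + c_x, \qquad y_{\mathrm{img}} = f\,\frac{Y}{Z} + c_y,\\
\Delta x_{\mathrm{img}} &= f\,\frac{\Delta X}{Z}
\end{aligned}
```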

In one aspect of the present invention, the processing unit may determine the similarity in three steps. First, it performs phase-only synthesis of an input phase image, obtained by frequency transformation of the input image, and a template phase image, obtained by the same frequency transformation of the template image. Second, it applies a given weighting to the result. Third, it applies the inverse of the frequency transformation.

This makes it possible to apply weighting within the phase-only correlation processing.
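The sketch below shows phase-only correlation with an optional weighting step. It is written in pure Python so that it has no dependencies. The source only says that a "given weighting" is applied, so the weighting function here is an assumption: a Gaussian low-pass weight with a hypothetical width `sigma`. The naive DFT is only suitable for tiny images; a real implementation would use an FFT library. The function returns the peak height of the correlation surface as the similarity r and the peak location as the image-plane shift.

```python
import cmath
import math
from typing import List, Optional, Tuple

Grid = List[List[complex]]


def dft2(a, inverse: bool = False) -> Grid:
    """Naive separable 2-D DFT: rows first, then columns (tiny images only)."""
    rows, cols = len(a), len(a[0])
    sign = 1 if inverse else -1

    def dft1(v):
        n = len(v)
        return [sum(v[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
                for m in range(n)]

    by_rows = [dft1(row) for row in a]
    by_cols = [dft1([by_rows[r][c] for r in range(rows)]) for c in range(cols)]
    out = [[by_cols[c][r] for c in range(cols)] for r in range(rows)]
    if inverse:  # the inverse transform carries the 1/(rows*cols) factor
        out = [[x / (rows * cols) for x in row] for row in out]
    return out


def poc(input_img, template, sigma: Optional[float] = None,
        eps: float = 1e-12) -> Tuple[int, int, float]:
    """Return (dy, dx, r): shift of input_img relative to template and peak height."""
    rows, cols = len(input_img), len(input_img[0])
    F, G = dft2(input_img), dft2(template)
    R: Grid = [[0j] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            cross = F[i][j] * G[i][j].conjugate()
            mag = abs(cross)
            # Phase-only synthesis: product of the two phase images (unit magnitude).
            R[i][j] = cross / mag if mag > eps else 0j
            if sigma is not None:
                # Hypothetical weighting: Gaussian low-pass over signed frequency indices.
                ki = i if i <= rows // 2 else i - rows
                kj = j if j <= cols // 2 else j - cols
                R[i][j] *= math.exp(-(ki * ki + kj * kj) / (2.0 * sigma * sigma))
    surface = dft2(R, inverse=True)  # correlation surface; ideally real-valued
    r, pi, pj = max((surface[i][j].real, i, j) for i in range(rows) for j in range(cols))
    dy = pi if pi <= rows // 2 else pi - rows  # undo wrap-around of negative shifts
    dx = pj if pj <= cols // 2 else pj - cols
    return dy, dx, r
```

Without weighting, a circularly shifted copy of the template yields a peak of exactly 1 at the shift. With the Gaussian weight, the peak stays at the same location, but its height drops to the mean of the weights because high-frequency components are attenuated.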

In one aspect of the present invention, the processing unit may apply resolution reduction to the input image to obtain a low-resolution input image, which has a lower resolution than the input image. It may then perform the phase-only correlation processing on the low-resolution input image.

This makes it possible to use a low-resolution input image as the target of the phase-only correlation processing.

In one aspect of the present invention, the processing unit may also apply the resolution reduction to the template image. This yields a low-resolution template image whose resolution corresponds to that of the low-resolution input image. The processing unit may then perform the phase-only correlation processing on the low-resolution input image and the low-resolution template image.

This makes it possible to match the resolution of the low-resolution template image to that of the low-resolution input image.

In one aspect of the present invention, the template image acquisition unit may acquire, as the template image, an image in which the resolution of the object corresponds to the resolution of the object in the low-resolution input image. The processing unit may then perform the phase-only correlation processing on the low-resolution input image and this template image.

This makes it possible to match the resolution of the template image to that of the low-resolution input image.

In one aspect of the present invention, the processing unit may first perform the phase-only correlation processing on the low-resolution input image, and afterwards perform it on the input image and the template image.

This makes it possible to choose, depending on the situation, whether or not to apply the resolution reduction.
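Below is a minimal sketch of the resolution reduction. The source does not say which downsampling filter is used, so simple block averaging is assumed here. The same function could be applied to the template image to obtain a low-resolution template of matching resolution. In the coarse-to-fine order described above, the correlation would run on the reduced images first and on the full-resolution images afterwards.

```python
from typing import List


def reduce_resolution(img: List[List[float]], factor: int = 2) -> List[List[float]]:
    """Average non-overlapping factor x factor blocks.

    Trailing rows or columns that do not fill a whole block are dropped.
    """
    rows, cols = len(img) // factor, len(img[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [img[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))  # mean intensity of the block
        out.append(row)
    return out
```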

Another aspect of the present invention relates to a robot including an input image acquisition unit, a template image acquisition unit, and a processing unit. The input image acquisition unit acquires an input image. The template image acquisition unit acquires a template image from three-dimensional model data of an object by setting posture information for that model data. The posture information represents the position along the third axis of a three-dimensional space defined by first to third axes, together with first to third rotation angles about the first to third axes, respectively. The processing unit performs phase-only correlation processing on the input image and the template image to determine the similarity between the images, and updates the posture information on the basis of the similarity. It then performs the phase-only correlation processing on the input image and a new template image acquired from the updated posture information, and in this way detects the position and orientation of the object in the input image.

Another aspect of the present invention relates to a position and orientation detection method including the following steps:
- performing input image acquisition processing that acquires an input image;
- performing template image acquisition processing that acquires a template image from three-dimensional model data of an object by setting posture information for that model data, the posture information representing the position along the third axis of a three-dimensional space defined by first to third axes together with first to third rotation angles about the first to third axes, respectively;
- performing phase-only correlation processing on the input image and the template image to determine the similarity between the images, updating the posture information on the basis of the similarity, and performing the phase-only correlation processing on the input image and a new template image acquired from the updated posture information, thereby detecting the position and orientation of the object in the input image.

Another aspect of the present invention relates to a program that causes a computer to function as an input image acquisition unit, a template image acquisition unit, and a processing unit. The input image acquisition unit acquires an input image. The template image acquisition unit acquires a template image from three-dimensional model data of an object by setting posture information for that model data. The posture information represents the position along the third axis of a three-dimensional space defined by first to third axes, together with first to third rotation angles about the first to third axes, respectively. The processing unit performs phase-only correlation processing on the input image and the template image to determine the similarity between the images, and updates the posture information on the basis of the similarity. It then performs the phase-only correlation processing on the input image and a new template image acquired from the updated posture information, and in this way detects the position and orientation of the object in the input image.

In this aspect of the present invention, a computer is caused to execute a detection method. In that method, a template image is acquired by setting the position and orientation of the three-dimensional model data, and phase-only correlation processing on the template image and the input image yields a similarity, from which the position and orientation of the object are detected. The position and orientation are then updated using the similarity, and the phase-only correlation processing is performed again with a template image acquired from the updated position and orientation. This makes it possible to detect the six-degree-of-freedom position and orientation of the object accurately. It does so by combining the phase-only correlation method with a technique that updates the position and orientation on the basis of similarity (in a narrow sense, an asymptotic method).

Thus, some aspects of the present invention provide a processing apparatus, a robot, a position and orientation detection method, a program, and the like that accurately detect the six-degree-of-freedom position and orientation of an object. They do so by combining the phase-only correlation method with a technique that updates the position and orientation on the basis of similarity (in a narrow sense, an asymptotic method).

FIG. 1(A) and FIG. 1(B) illustrate changes in the position and orientation of three-dimensional model data and the resulting changes of the object in a template image.
FIG. 2 is a three-dimensional graph showing the relationship between posture information and similarity.
FIG. 3 illustrates the sensitivity of the similarity to changes in posture information when the phase-only correlation method is used.
FIG. 4 shows an example in which the peak position is obtained by interpolation.
FIG. 5 illustrates the asymptotic method (hill climbing).
FIG. 6 shows a configuration example of the processing apparatus according to the present embodiment.
FIG. 7 compares positions from which the asymptotic method can and cannot be executed.
FIG. 8 illustrates ordinary phase-only correlation processing.
FIG. 9 shows an example in which weighting is applied to the result of the phase-only synthesis.
FIG. 10(A) shows an example of similarity versus posture information without weighting, and FIG. 10(B) shows the same with weighting.
FIGS. 11(A) to 11(C) show examples of similarity versus posture information when the weighting parameters are changed.
FIGS. 12(A) to 12(C) show examples of an input image and low-resolution input images.
FIGS. 13(A) to 13(C) illustrate how similarity versus posture information differs with and without resolution reduction.
FIG. 14 shows an example of convergence of the posture information when the method of the present embodiment is executed.
FIG. 15 shows another configuration example of the processing apparatus according to the present embodiment.
FIG. 16 shows a configuration example of a robot according to the present embodiment.
FIG. 17 shows an example of the structure of the robot according to the present embodiment.
FIG. 18 shows another example of the structure of the robot according to the present embodiment.
FIG. 19 shows an example in which the processing apparatus and the like according to the present embodiment are implemented by a server system.
FIGS. 20(A) to 20(C) illustrate the robustness of the phase-only correlation method.
FIG. 21 illustrates the processing of the phase-only correlation method.
FIG. 22 is a flowchart of the processing according to the present embodiment.
FIG. 23 illustrates how template images are generated and the direction of change in each degree of freedom.
FIGS. 24(A) to 24(C) illustrate the first and second template images.
FIGS. 25(A) to 25(F) illustrate the third to fifth template images.

The present embodiment is described below. The embodiment does not unduly limit the scope of the invention set forth in the claims. In addition, not all of the configurations described in the embodiment are necessarily essential elements of the invention.

1. Method of the Present Embodiment
First, the method of the present embodiment is described. As noted above, detecting the three-dimensional position and orientation of an object is required in various situations. One known approach acquires the object's three-dimensional shape information with a dedicated sensor such as a laser range sensor. Another uses the time-of-flight method, which measures the time from emission of light to reception of its reflection. Both approaches require dedicated sensors or equipment (for example, the infrared light emitter used in the time-of-flight method) to acquire the shape information.

Another known approach projects light with a specific pattern onto the target. The object's three-dimensional shape is then measured by image processing of that pattern as it appears in the captured image. Although the shape is measured from a two-dimensional captured image in this case, a projector that generates and projects the specific pattern light is required.

The present applicant therefore proposes a method that needs no dedicated hardware for three-dimensional position and orientation measurement. It detects the object's three-dimensional position and orientation from two inputs: an input image containing the object (in a narrow sense, a captured image), and three-dimensional model data, which is ideal three-dimensional shape information of the object. Specifically, a two-dimensional template image is generated from the three-dimensional model data, and the position and orientation of the object are detected by matching the input image against the template image.

Various methods of acquiring (generating) a template image from the three-dimensional model data are conceivable. For example, consider the three-dimensional space defined by the x-, y-, and z-axes shown in FIG. 1(A). A virtual camera may be placed at a given position on the z-axis, and an image it captures toward the origin may be used as the template image. If the upward direction of the template image is taken as the positive y direction, the virtual camera captures the image shown in FIG. 1(B). Capture by the virtual camera is implemented by, for example, a perspective transformation.

In this case, each pose parameter has a distinct effect on the template image:
- Changing the position of the three-dimensional model data along the x-axis moves the object horizontally in the template image. For example, moving the object in the direction of the arrow in FIG. 1(A) also moves the object in the template image in the direction of the arrow.
- Changing the position along the y-axis moves the object vertically in the image.
- Moving the object along the z-axis changes its distance from the virtual camera, and therefore changes the size of the object in the template image.
- Changing the rotation angle u about the x-axis, the rotation angle v about the y-axis, or the rotation angle w about the z-axis changes the orientation of the object relative to the virtual camera. The shape of the object in the template image therefore generally changes, unless, for example, the object has rotational symmetry.
FIGS. 1(A) and 1(B) fix the virtual camera in the coordinate system and move the three-dimensional model data. Alternatively, the object may be fixed and the virtual camera moved.
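To make the geometry above concrete, the sketch below projects model points for a posture (x, y, z, u, v, w). It assumes a pinhole virtual camera at the origin looking along +z, with a hypothetical focal length `f`. The rotations are composed as Rz(w) Ry(v) Rx(u). The camera placement and the rotation order are illustrative choices, not taken from the source.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def matmul3(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation(u: float, v: float, w: float) -> Mat3:
    """Rz(w) Ry(v) Rx(u): rotations about the x-, y- and z-axes (order assumed)."""
    cu, su = math.cos(u), math.sin(u)
    cv, sv = math.cos(v), math.sin(v)
    cw, sw = math.cos(w), math.sin(w)
    rx = [[1.0, 0.0, 0.0], [0.0, cu, -su], [0.0, su, cu]]
    ry = [[cv, 0.0, sv], [0.0, 1.0, 0.0], [-sv, 0.0, cv]]
    rz = [[cw, -sw, 0.0], [sw, cw, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))


def project(points: Sequence[Vec3], pose, f: float = 500.0) -> List[Tuple[float, float]]:
    """Pinhole projection of model points for pose = (x, y, z, u, v, w).

    Hypothetical virtual camera at the origin looking along +z, so z is depth.
    """
    x, y, z, u, v, w = pose
    rot = rotation(u, v, w)
    out = []
    for p in points:
        # rotate the model point, then translate it by (x, y, z)
        cx, cy, cz = (sum(rot[i][k] * p[k] for k in range(3)) + t
                      for i, t in enumerate((x, y, z)))
        out.append((f * cx / cz, f * cy / cz))
    return out
```

For example, with f = 500 a 10 x 10 square at z = 100 projects to an outline 50 pixels wide, and at z = 200 to one 25 pixels wide. Translating it by 10 along x at z = 100 shifts every projected point 50 pixels horizontally.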

In other words, the position and orientation of the object can be detected from the input image and template images acquired from the three-dimensional model data as follows. The position and orientation (x, y, z, u, v, w) of the three-dimensional model data is varied to acquire multiple template images in which the object differs in position, size, and shape. These template images are then searched for the one closest to the input image. When a template image is close to (in a narrow sense, matches) the input image, two relative poses can be considered sufficiently close to (in a narrow sense, to match) each other. The first is the position and orientation of the three-dimensional model data relative to the virtual camera. The second is the position and orientation of the actual object relative to the imaging unit that captured the input image.

Image matching generally computes a similarity, a parameter indicating how close two images are. Position and orientation detection therefore reduces to finding the position and orientation (x, y, z, u, v, w) of the three-dimensional model data that maximizes the similarity. Once (x, y, z, u, v, w) is found, the position and orientation of the model data relative to the virtual camera at that point gives the position and orientation of the actual object relative to the imaging unit that captured the input image. Furthermore, if the position and orientation of the imaging unit in a given coordinate system are known, the object's position and orientation can easily be converted into that coordinate system.
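Assume 4x4 homogeneous transforms. Let T_CO be the object pose relative to the camera, built from the detected (x, y, z, u, v, w), and let T_WC be the known pose of the imaging unit in the given coordinate system W. The conversion is then a single matrix product. The source does not specify the rotation convention R(u, v, w), so it is left abstract here.

```latex
T_{WO} = T_{WC}\,T_{CO}, \qquad
T_{CO} = \begin{pmatrix} R(u, v, w) & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{pmatrix}, \qquad
\mathbf{t} = (x,\ y,\ z)^{\top}
```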

The above, however, describes a general approach. The method proposed by the applicant is characterized by using the phase-only correlation method described above as the image matching method for matching the template image with the input image.

As disclosed in Non-Patent Document 1, the phase-only correlation method is an image matching method with high robustness and high accuracy. As described later with reference to FIG. 21 and other figures, it can detect the positional shift between the input image and the template image on the image plane. Therefore, for two of the three position axes, for example the x- and y-axes, there is no need to acquire multiple template images with various values and search for the (x, y) closest to the input image. Suppose a predetermined value such as (x, y) = (0, 0) is used when acquiring the template image. The offset between that value and the actual position of the object is then obtained directly as a result of the phase-only correlation processing. In other words, the method of the present embodiment changes the detection problem. It no longer searches for the position and orientation (x, y, z, u, v, w) of the three-dimensional model data that maximizes the similarity. It only searches for the (z, u, v, w) that maximizes the similarity.
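An illustrative count makes the saving concrete. Suppose each degree of freedom were sampled at K values, where K is a hypothetical sampling density and not a figure from the source. An exhaustive search would then need the following numbers of template images:

```latex
N_{\text{6-DoF}} = K^{6}, \qquad N_{\text{4-DoF}} = K^{4}, \qquad
\frac{N_{\text{6-DoF}}}{N_{\text{4-DoF}}} = K^{2}
```

For K = 10 that is 10^6 versus 10^4 template images, although each remaining template still requires one phase-only correlation.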

Specifically, one template image is acquired by determining one set of values of (z, u, v, w). At this time, predetermined values may be used for (x, y). Performing the phase-only correlation method on this template image and the input image yields the relationship between (x, y) and r shown in FIG. 8, and the maximum value of r is taken as the similarity r corresponding to that (z, u, v, w). Thus, a single phase-only correlation process yields one pairing of a (z, u, v, w) value set with its similarity r. This corresponds, for example, to obtaining one of the bars (cuboids) in the bar graph of FIG. 2. Note that FIG. 2 shows the relationship between (u, v) and r as a three-dimensional graph for convenience; in actual processing, the computation takes place in the five-dimensional space of (z, u, v, w) and r. Likewise, although this specification uses three-dimensional graphs of (u, v) and r in FIG. 10(A) and elsewhere, the processing is in each case extended to the five-dimensional space of (z, u, v, w) and r.

This process is performed for each of a plurality of different (z, u, v, w) value sets, yielding a plurality of pairings of parameters (z, u, v, w) and similarity r, from which a graph of the parameters against r can be generated as shown in FIG. 2. It then suffices to determine from this graph the parameters that maximize the similarity; in the example of FIG. 2, these are (u, v) = (0, 0). For the remaining (x, y) components of the six-degree-of-freedom position and orientation, the values obtained by the phase-only correlation processing for the similarity-maximizing parameters may be used.
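The exhaustive procedure just described (one template and one POC per candidate (z, u, v, w), keep the best r, and read (x, y) from that POC) can be sketched as follows. `render_template` is a hypothetical callback standing in for rendering the three-dimensional model at a pose with (x, y) fixed at (0, 0); the `toy_render` used in the example simply maps each pose label to a distinct random pattern, so the pose values here are arbitrary integer labels rather than physical units.

```python
import numpy as np

def poc_peak(f, g, eps=1e-12):
    """POC between f and g; returns ((dy, dx), r) as integer shift and peak height."""
    cross = np.fft.fft2(f) * np.conj(np.fft.fft2(g))
    surface = np.real(np.fft.ifft2(cross / np.maximum(np.abs(cross), eps)))
    iy, ix = np.unravel_index(np.argmax(surface), surface.shape)
    h, w = surface.shape
    dy = iy - h if iy > h // 2 else iy
    dx = ix - w if ix > w // 2 else ix
    return (int(dy), int(dx)), float(surface[iy, ix])

def search_poses(input_image, candidate_poses, render_template):
    """Return (best_pose, r, (dy, dx)) over all candidate (z, u, v, w) poses."""
    best = None
    for pose in candidate_poses:
        shift, r = poc_peak(input_image, render_template(pose))
        if best is None or r > best[1]:
            best = (pose, r, shift)
    return best

def toy_render(pose, size=64):
    """Hypothetical stand-in renderer: a distinct random pattern per pose label."""
    return np.random.default_rng(list(pose)).random((size, size))

poses = [(z, u, 0, 0) for z in range(3) for u in range(3)]
observed = np.roll(toy_render((1, 2, 0, 0)), shift=(4, 7), axis=(0, 1))
pose, r, shift = search_poses(observed, poses, toy_render)
print(pose, shift)  # -> (1, 2, 0, 0) (4, 7)
```

The cost of this approach is one rendering plus one POC per grid point, which is what makes dense grids expensive.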

It is known that the phase-only correlation method is highly sensitive to changes in position and orientation. For example, as described later with reference to FIG. 8, the similarity value on the xy plane is very large at the position corresponding to the actual displacement, but becomes very small when x or y deviates even slightly from it. The same applies to the shape and orientation of the object in the image: for example, when the object in the input image is rotated relative to the template image, the similarity obtained by the phase-only correlation method becomes very small.

Therefore, when a change in the values of the posture information (z, u, v, w) changes the size or shape of the object in the template image, the similarity responds to that change very steeply, as shown in FIG. 3. Although the posture information is actually four-dimensional, FIG. 3 represents it in one dimension for simplicity. Consequently, if template images are generated statically in advance, for example by varying each of z, u, v, and w at predetermined intervals to produce template images for many combinations of (z, u, v, w), an enormous number of template images is required.

For example, for the rotation angles u, v, w, it was confirmed that a deviation of only about ±1 degree from the actual object angle makes the similarity value so small that it cannot be distinguished from error. In FIG. 3, let r_min be the lowest similarity value that can still be distinguished from error, and let p1 and p2 be the parameter values (here, rotation angles) corresponding to r_min; the interval between p1 and p2 is then only about 2 degrees (±1 degree around p_max). Therefore, unless the template images for u, v, w are generated at intervals of 2 degrees or less, there may be no template image at all that yields a similarity value significant enough to be distinguished from error, that is, no template image whose parameter value is greater than p1 and less than p2.

Furthermore, since the parameter actually sought is p_max, even if at least one template image lies between p1 and p2, there is no guarantee that it corresponds to p_max, that is, that it is the template image that maximizes the similarity. To determine the similarity-maximizing parameters with high accuracy, the parameter sampling interval must therefore be made very narrow so that a sufficient number of template images fall between p1 and p2. For example, to have at least 10 template images between p1 and p2, template images must be generated at intervals of about 2/10 of a degree for each of u, v, and w. The same holds for z: although its unit differs, its sampling interval must likewise be very narrow. As a result, the number of template images that must be generated and stored becomes enormous.
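To put numbers on "enormous", the following sketch counts grid templates for the three rotation angles alone. It assumes, purely for illustration, that each of u, v, w spans a full 360 degrees; the source does not state the search range, and including z would multiply the count further by the number of z samples.

```python
def grid_templates(range_deg, step_deg, n_angles=3):
    """Templates needed to sample n_angles rotation angles on a uniform grid.

    Assumes every angle spans range_deg (an illustrative assumption) and
    ignores z, so the true count is larger by the number of z samples.
    """
    per_angle = round(range_deg / step_deg)
    return per_angle ** n_angles

for step in (2.0, 0.2):
    print(f"step {step} deg -> {grid_templates(360, step):,} templates")
# step 2.0 deg -> 5,832,000 templates
# step 0.2 deg -> 5,832,000,000 templates
```

Under this assumption, narrowing the interval from 2 degrees to 0.2 degrees multiplies the count by 10^3 = 1000.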

Another conceivable approach is to obtain the extremum by interpolation. For example, suppose that, for parameter values pi, pj, and pk as shown in FIG. 4, the similarities between the corresponding template images and the input image are ri, rj, and rk, respectively. In this case, the parameter value pmax at which the similarity peaks can be estimated by interpolating through the three points (pi, ri), (pj, rj), and (pk, rk), for example with a quadratic curve. Because the parameter values of the template images (pi and so on) then need not lie very close to pmax, the parameter sampling interval can be widened and the number of template images reduced. For example, to have three points between p1 and p2, the template images for u, v, w would be generated at intervals of 2/3 of a degree or less. Even so, although this reduces the number of template images compared with the example above, the number that must be generated and stored is still impractical.
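The quadratic interpolation mentioned above has a closed form: the vertex of the parabola through three points. The sketch below uses the standard three-point vertex formula, which does not require equally spaced parameter values; the function name is illustrative, and the source does not specify which interpolation formula it uses.

```python
def parabolic_peak(p_i, p_j, p_k, r_i, r_j, r_k):
    """Parameter value at the vertex of the parabola through three (p, r) points.

    The p values need not be equally spaced. Raises ValueError if the three
    points are collinear, in which case no unique vertex exists.
    """
    a = (p_j - p_i) * (r_j - r_k)
    b = (p_j - p_k) * (r_j - r_i)
    denom = a - b
    if denom == 0:
        raise ValueError("points are collinear; no parabolic peak")
    return p_j - 0.5 * ((p_j - p_i) * a - (p_j - p_k) * b) / denom

# Samples of r(p) = 5 - (p - 1.3)**2 at p = 0, 1, 2.
print(parabolic_peak(0.0, 1.0, 2.0, 3.31, 4.91, 4.51))  # approximately 1.3
```

The estimate is exact only when the similarity curve is locally quadratic, so it still relies on the three samples lying on the peak itself, i.e. between p1 and p2.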

By contrast, the asymptotic method (nonlinear programming, hill climbing) is widely known as a technique for determining the parameter that maximizes a given index value (here, the similarity). For example, as shown in FIG. 5, with the parameter on the horizontal axis and the index value (the similarity in this embodiment) on the vertical axis, the method searches for the parameter value p_max at which the index value reaches its maximum (local maximum) while varying the value on the horizontal axis. First, the index value q_n for a given parameter p_n is obtained; the parameter is then changed to p_{n+1} and the index value q_{n+1} is obtained. If q_{n+1} > q_n, the parameter can be judged to be moving toward the solution; if q_{n+1} < q_n, it can be judged to be moving away from it. In other words, the asymptotic method uses the computed index values to determine the direction of parameter change that approaches the solution, moves the parameter toward the correct value, and terminates when the index value reaches an extremum. With this approach, template images need only be generated dynamically for parameters in the neighborhood of the current estimate, starting from the initial point of the asymptotic method. It is therefore no longer necessary to generate and store an enormous number of template images as described above; the processing load of image generation is reduced, and the capacity of the storage unit or the like is not strained.
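A minimal sketch of such an update loop is coordinate-wise hill climbing: each parameter is nudged up or down, a move is kept only if the index value increases, and the step shrinks when no move helps. The step-halving schedule and stopping rule are assumptions of this sketch, not details given in the source. In the embodiment, `score` would render a template at (z, u, v, w) and return the POC similarity r; here it is any callable.

```python
import numpy as np

def hill_climb(score, start, steps, min_step=1e-3, max_iter=1000):
    """Maximize score(params) by coordinate-wise hill climbing.

    Each iteration tries +/- step on every parameter and keeps a move only
    if the score increases. When no move helps, all steps are halved; the
    search stops once every step is below min_step.
    """
    params = np.asarray(start, dtype=float).copy()
    steps = np.asarray(steps, dtype=float).copy()
    best = score(params)
    for _ in range(max_iter):
        improved = False
        for i in range(params.size):
            for direction in (+1.0, -1.0):
                trial = params.copy()
                trial[i] += direction * steps[i]
                value = score(trial)
                if value > best:
                    params, best, improved = trial, value, True
                    break
        if not improved:
            steps /= 2.0
            if np.all(steps < min_step):
                break
    return params, best

# Usage with a smooth stand-in objective peaked at a known (z, u, v, w).
target = np.array([0.3, -1.2, 2.0, 0.7])
params, best = hill_climb(lambda p: -np.sum((p - target) ** 2), np.zeros(4), np.ones(4))
print(np.round(params, 2))  # close to target
```

Only poses near the current estimate are ever evaluated, which is why templates can be rendered on demand instead of being precomputed.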

Accordingly, this embodiment detects the position and orientation of the object by combining the phase-only correlation method with the asymptotic method. Specifically, as shown in FIG. 6, the processing apparatus 100 according to this embodiment includes an input image acquisition unit 110 that acquires an input image, a template image acquisition unit 120 that acquires a template image based on three-dimensional model data of the object, and a processing unit 130 that detects the position and orientation of the object in the input image based on the input image and the template image. The template image acquisition unit 120 acquires the template image from the three-dimensional model data by setting posture information for the three-dimensional model data. The posture information represents the position along the third axis, among positions in the three-dimensional space defined by the first to third axes, and the first to third rotation angles, which are the rotation angles about the first to third axes, respectively. The processing unit 130 obtains the similarity between the input image and the template image by performing phase-only correlation processing on them, updates the posture information based on the similarity, and performs phase-only correlation processing on the input image and a new template image acquired using the updated posture information.

Further, when posture information that maximizes the similarity has been obtained through multiple rounds of phase-only correlation processing with posture information updates, the processing unit 130 detects the following as the position and orientation of the object in the input image: the positions along the first and second axes, obtained by phase-only correlation processing between the input image and the template image acquired from that posture information; the position along the third axis represented by the posture information; and the first to third rotation angles represented by the posture information.

Here, the position along the first axis is, for example, the x-axis coordinate x, and the position along the second axis is, for example, the y-axis coordinate y. As described above, x and y are determined by the phase-only correlation method. The position along the third axis is, for example, the z-axis coordinate z, and the first to third rotation angles are the rotation angles u, v, w about the x, y, and z axes, respectively. As described above, (z, u, v, w) is determined by the asymptotic method.

In this way, the phase-only correlation method can be used to detect the position and orientation of the object (in particular, its position and orientation in six degrees of freedom). As disclosed in Non-Patent Document 1, the phase-only correlation method is highly robust and highly accurate, so the method of this embodiment can likewise achieve high robustness and high accuracy. Specifically, the position and orientation can be detected even when objects other than the desired object appear in the input image, and the detected position and orientation can be made highly accurate (for example, with angular errors below 1 degree).

However, when the phase-only correlation method and the asymptotic method are used together, the range of parameters (posture information) over which the asymptotic method can be applied becomes a problem. As described above with reference to FIG. 5, the asymptotic method observes how the index value changes and steers the parameter toward the extremum. In other words, if the starting point has not even reached the foot of the hill in the hill-climbing sense, that is, if the direction in which the index value changes is not sufficiently clear, the parameter cannot be updated toward the extremum in the first place. For example, when searching for the maximum of the similarity r by varying the two parameters u and v as shown in FIG. 7, starting from the position indicated by A1 it is possible to determine whether a parameter change increases or decreases the index value (that is, to determine the direction of the slope), so the asymptotic method can be executed. At the position indicated by A2, on the other hand, the starting parameters are too far from the correct values; even if the parameters are changed, it cannot be clearly determined in which direction the change moves the index value, and the asymptotic method cannot be executed.

In light of the above, the steep change in similarity with respect to parameter changes in the phase-only correlation method, shown in FIG. 3, becomes a problem: even a slight deviation of the parameters from the correct values makes it impossible to search for the extremum by the asymptotic method, as in the example at A2 in FIG. 7. For example, if the limit points at which the asymptotic method can still be started (the foot of the hill) correspond to p1 and p2 in FIG. 3, the parameters at the start of the asymptotic method must lie between p1 and p2. As described above, this range is very narrow: about ±1 degree for a rotation angle.

In other words, to execute the asymptotic method reliably, an enormous number of template images would still have to be generated and stored, for example with a parameter interval of 2 degrees or less as in the example above. Compared with not using the asymptotic method, this does allow the similarity-maximizing parameters, that is, the position and orientation of the object, to be detected more accurately, or reduces the number of template images needed for the same accuracy; nevertheless, the large number of template images remains a problem that must be considered.

Another conceivable approach is to perform a relatively low-accuracy position and orientation detection process as a preliminary stage before the object position and orientation detection process of this embodiment. In this case, the processing of this embodiment is performed using the result of the preliminary detection, for example by using the position and orientation detected in the preliminary stage as the starting parameters of the asymptotic method. This eliminates the need to store a large number of template images. In this case, however, the object position and orientation produced by the preliminary stage must be accurate to within an error of ±1 degree. The requirements on the preliminary stage thus become strict, which is again undesirable.

The present applicant therefore proposes a method of widening the foot of the hill in the asymptotic method (hill climbing), that is, a method of widening the range of parameters (posture information) from which the asymptotic method can be executed properly.

Specifically, the processing unit 130 obtains the similarity as follows: it performs phase-only synthesis of an input phase image, obtained by applying a frequency transform to the input image, and a template phase image, obtained by applying the frequency transform to the template image; applies a given weighting process to the result; and then applies the inverse of the frequency transform.
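One way to realize the weighting step is sketched below: the phase-only cross spectrum is multiplied by the frequency response of a spatial Gaussian before the inverse FFT, which is equivalent to (circularly) smoothing the correlation surface. The parameterization by a spatial standard deviation `sigma` in pixels is an assumption of this sketch; the source only says a Gaussian filter may be applied and does not give its exact form.

```python
import numpy as np

def weighted_poc(f, g, sigma=None, eps=1e-12):
    """POC with optional Gaussian weighting; returns ((dy, dx), r).

    sigma=None gives ordinary POC. Otherwise the phase-only cross spectrum
    is multiplied by the frequency response of a unit-sum spatial Gaussian
    with standard deviation sigma (pixels) before the inverse transform.
    """
    cross = np.fft.fft2(f) * np.conj(np.fft.fft2(g))
    cross /= np.maximum(np.abs(cross), eps)        # phase-only synthesis
    if sigma is not None:
        fy = np.fft.fftfreq(f.shape[0])[:, None]   # cycles per pixel
        fx = np.fft.fftfreq(f.shape[1])[None, :]
        cross *= np.exp(-2.0 * np.pi**2 * sigma**2 * (fx**2 + fy**2))
    surface = np.real(np.fft.ifft2(cross))         # inverse transform
    iy, ix = np.unravel_index(np.argmax(surface), surface.shape)
    h, w = surface.shape
    dy = iy - h if iy > h // 2 else iy
    dx = ix - w if ix > w // 2 else ix
    return (int(dy), int(dx)), float(surface[iy, ix])

rng = np.random.default_rng(1)
template = rng.random((64, 64))
image = np.roll(template, shift=(3, 8), axis=(0, 1))
for s in (None, 1.0, 2.0):
    shift, r = weighted_poc(image, template, s)
    print(s, shift, round(r, 3))
```

In this toy case the detected shift stays at (3, 8) while the peak height r falls as sigma grows, matching the qualitative trade-off described for FIG. 11: a blunter, broader peak at the cost of a lower maximum.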

Various forms of weighting process are conceivable; for example, a Gaussian filter may be applied. FIG. 8 shows an example in which one template image is created by determining the values of (z, u, v, w), and ordinary phase-only correlation (without a Gaussian) is performed on that template image and the input image. As FIG. 8 shows, a very sharp peak appears at a given point on the xy plane (the similarity value becomes very large), and the similarity drops sharply if x or y deviates even slightly. FIG. 10(A) shows the relationship between the similarity and the parameters (z, u, v, w), obtained by applying the processing of FIG. 8 to each of multiple template images generated from multiple (z, u, v, w) value sets. Here, r is very high at the correct (u, v) and drops sharply as (u, v) moves away from the correct values. In other words, as described above with reference to FIG. 3 and elsewhere, the result is a steep hill on which the range where the asymptotic method can be used is very narrow.

In contrast, FIG. 9 shows an example in which a Gaussian filter is applied after the phase-only synthesis process. Compared with FIG. 8, the peak in the three-dimensional (x, y, r) space is blunted, and as a result the similarity r for that (u, v) is smaller. FIG. 10(B) shows the relationship between the similarity and the parameters (z, u, v, w), obtained by applying the processing of FIG. 9 to each of multiple template images generated from multiple (z, u, v, w) value sets. As FIG. 10(B) clearly shows, applying the Gaussian filter makes the slope of the hill gentler than in FIG. 10(A). The asymptotic method can therefore be executed even from positions farther from the summit than in FIG. 10(A), that is, from parameters that differ relatively greatly from the correct values.

Various weighting methods are conceivable. For example, the parameter σ that determines the characteristics of the Gaussian filter may be changed. FIG. 11(A) shows an example without a Gaussian filter, FIG. 11(B) an example with a Gaussian filter of σ = 1, and FIG. 11(C) an example with a Gaussian filter of σ = 2. As FIGS. 11(A) to 11(C) show, the larger σ is, the more the peak is blunted and the wider the foot of the hill tends to become. However, comparing the values of r in FIGS. 11(A) to 11(C) shows that increasing σ also lowers the peak similarity value itself, which risks making the peak difficult to distinguish from various errors such as uniform errors. In other words, the weighting process should not simply blunt the peak as much as possible; its strength (here, a measure of how much it blunts the peak relative to the unweighted result) must be set appropriately. Alternatively, a filter other than a Gaussian filter may be used for the weighting process, or the weighting may be realized by processing other than filtering.

The method of widening the foot of the hill is not limited to weighting the result of the phase-only synthesis process. Specifically, the processing unit 130 may apply a resolution reduction process to the input image to obtain a low-resolution input image with a lower resolution than the input image, and perform the phase-only correlation processing on the low-resolution input image.

Specific examples are shown in FIGS. 12(A) to 13(C). Here, when the input image has a resolution of 512 × 512 pixels as shown in FIG. 12(A), a 256 × 256 low-resolution input image is obtained as shown in FIG. 12(B), or a 128 × 128 low-resolution input image is obtained as shown in FIG. 12(C). FIG. 13(A) plots the similarities obtained by phase-only correlation processing for various (z, u, v, w) using the input image of FIG. 12(A) as is. Similarly, FIG. 13(B) shows the result of phase-only correlation processing using the 256 × 256 low-resolution input image of FIG. 12(B), and FIG. 13(C) shows the result using the 128 × 128 low-resolution input image of FIG. 12(C).
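The source does not say how the low-resolution images are produced. A simple assumption is block averaging, sketched below for the 512 → 256 → 128 example; since phase-only correlation needs two equal-size inputs, the template would presumably be reduced by the same factor.

```python
import numpy as np

def downsample(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

full = np.random.default_rng(2).random((512, 512))
for factor in (1, 2, 4):
    print(downsample(full, factor).shape)
# (512, 512)
# (256, 256)
# (128, 128)
```

Averaging acts as a low-pass filter before decimation, so, like the Gaussian weighting, it removes the high-frequency phase information responsible for the very sharp POC peak.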

Because FIG. 13(A) shows the original state, its peak is steep, as described with reference to FIG. 10(A) and elsewhere. In contrast, as FIGS. 13(B) and 13(C) show, using a low-resolution input image blunts the similarity peak; the lower the resolution, the more the peak is blunted. In this case too, as with the weighting process, the range of parameters from which the asymptotic method can be executed is widened.

FIG. 14 shows how the parameters change during the asymptotic method when the method of this embodiment, using both the weighting process and a low-resolution input image, is executed. As can be seen in particular from the change in the rotation angle u about the x axis in FIG. 14, it was confirmed that the phase-only correlation method combined with the asymptotic method converges to the solution even from an initial error of about ±5 degrees. Thus, even when template images are generated and stored in advance, a parameter interval of about ±5 degrees (10 degrees) suffices, so the number of template images can be reduced. Specifically, for each of the four parameters (z, u, v, w), the number of template images can be reduced to about 1/5 of that in the ±1 degree example above, and the total number can consequently be reduced to about 1/5^4.
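As a quick check of that figure, take the ±1 degree case as a 2-degree sampling interval and the ±5 degree case as a 10-degree interval, and, following the text, apply the same factor of about 5 to each of the four parameters (z, u, v, w), even though z has a different unit. With N denoting the number of pre-generated template images:

$$
\frac{N_{\pm 5^\circ}}{N_{\pm 1^\circ}} \approx \left(\frac{2^\circ}{10^\circ}\right)^{4} = \left(\frac{1}{5}\right)^{4} = \frac{1}{625}
$$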

Likewise, when a relatively low-accuracy position and orientation detection process is performed as a preliminary stage to the processing of this embodiment, that process may tolerate errors of up to about ±5 degrees, which makes it possible to reduce its processing load or shorten its processing time.

The following sections describe a system configuration example of the processing apparatus 100 and related devices according to this embodiment, and then briefly explain the phase-only correlation method. The processing of this embodiment is then described in detail with reference to the flowchart of FIG. 22.

2. System Configuration Example
FIG. 15 shows a detailed system configuration example of the processing apparatus 100 according to this embodiment. The processing apparatus 100 is not limited to the configuration of FIG. 15, however; various modifications are possible, such as omitting some of these components or adding other components. The same applies to FIG. 16 and the other configuration figures.

The processing apparatus 100 includes an input image acquisition unit 110, a template image acquisition unit 120, a processing unit 130, and a storage unit 140. The input image acquisition unit 110, template image acquisition unit 120, and processing unit 130 are the same as those described above with reference to FIG. 6, so their detailed description is omitted.

The storage unit 140 serves as a work area for the processing unit 130 and other units, and its function can be realized by a memory such as RAM, a hard disk drive (HDD), or the like. The storage unit 140 stores, for example, three-dimensional model data of the object that is the target of position and orientation detection. The three-dimensional model data is, for example, CAD data. The storage unit 140 is connected to the template image acquisition unit 120, which reads the three-dimensional model data from the storage unit 140. The storage unit 140 may also store template images acquired (generated) by the template image acquisition unit 120.
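Because the hill-climbing search often revisits poses it has already evaluated (for example, stepping +Δ and later back), storing generated templates can avoid re-rendering. The sketch below is a hypothetical cache keyed by (z, u, v, w); `render` stands in for the template image acquisition unit and is not an API from the source, and rounding the key to a fixed number of decimals is an assumption so that tiny floating-point differences hit the same entry.

```python
from typing import Callable, Dict
import numpy as np

class TemplateCache:
    """Memoize rendered template images keyed by pose (z, u, v, w).

    render is a hypothetical callback that renders the 3D model data at a
    pose. Pose values are rounded to `decimals` places to form the key.
    """

    def __init__(self, render: Callable[[tuple], np.ndarray], decimals: int = 6):
        self._render = render
        self._decimals = decimals
        self._store: Dict[tuple, np.ndarray] = {}
        self.renders = 0  # number of actual render calls, for inspection

    def get(self, pose) -> np.ndarray:
        key = tuple(round(float(p), self._decimals) for p in pose)
        if key not in self._store:
            self._store[key] = self._render(key)
            self.renders += 1
        return self._store[key]

# Usage with a trivial stand-in renderer.
cache = TemplateCache(lambda pose: np.full((8, 8), sum(pose)))
cache.get((0.0, 1.0, 0.0, 0.0))
cache.get((0.0, 1.0000000001, 0.0, 0.0))  # same key after rounding
print(cache.renders)  # -> 1
```

An unbounded dictionary is the simplest choice here; a size-limited cache would be a natural variant if memory in the storage unit is a concern.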

FIG. 15 illustrates an example in which the input image acquisition unit 110 acquires a captured image from an imaging unit 150 as the input image. Although the imaging unit 150 is provided outside the processing apparatus 100 in FIG. 15, this is not limiting; the processing apparatus 100 may include the imaging unit 150.

Although not shown in FIG. 15, the processing apparatus 100 may also include an output unit or the like that outputs the position and orientation of the object detected by the processing unit 130 to other devices.

As shown in FIG. 16, the method of this embodiment can also be applied to a robot 30. The robot 30 includes an input image acquisition unit 110 that acquires an input image, a template image acquisition unit 120 that acquires a template image based on three-dimensional model data of the object, and a processing unit 130 that detects the position and orientation of the object in the input image based on the input image and the template image. The template image acquisition unit 120 of the robot 30 acquires the template image from the three-dimensional model data by setting posture information for the three-dimensional model data; this posture information represents the position along the third axis, among positions in the three-dimensional space defined by the first to third axes, and the first to third rotation angles about the first to third axes. The processing unit 130 obtains the similarity between the input image and the template image by performing phase-only correlation processing on them, updates the posture information based on the similarity, and performs phase-only correlation processing on the input image and a new template image acquired using the updated posture information.

Here, the processing unit 130 controls the robot mechanism (the arm 310, etc.) using the position and orientation of the object detected, for example, by the phase-only correlation method combined with the asymptotic method. For example, it is conceivable to detect the position and orientation of an object to be gripped, machined, assembled, or otherwise handled, move the end effector 319 to a work position (a gripping position, etc.) determined from the detected position and orientation, and control the end effector 319 so that it performs work at that work position on the object.

The robot 30 may also include a storage unit 140, like the processing apparatus 100 shown in FIG. 15. As shown in FIG. 16, the robot 30 may further include an arm 310, an end effector 319, and a hand-eye camera 330, and the arm 310 and the like may be controlled by the processing unit 130. In FIG. 16, the input image acquisition unit 110 acquires an image captured by the hand-eye camera 330 as the input image; however, this is not limiting. An image from another imaging unit provided on the robot 30, or from an imaging unit provided outside the robot, may also be used as the input image.

The robot 30 may be a robot 30 that includes a control device 200 and a robot body 300, as shown in FIG. 17. In the configuration of FIG. 17, the control device 200 includes the processing unit 130 and the other units of FIG. 16, and the robot body 300 includes the arm 310 and the end effector 319. This makes it possible to realize a robot that detects the position and orientation of an object and operates under control that uses that position and orientation.

The configuration of the robot according to the present embodiment is not limited to that of FIG. 17. For example, as shown in FIG. 18, the robot may include a robot body 300 and a base unit 350. The robot according to the present embodiment may be a dual-arm robot as shown in FIG. 18, which includes a first arm 310-1 and a second arm 310-2 in addition to parts corresponding to a head and a torso. In FIG. 18, the first arm 310-1 consists of joints 311 and 313 and frames 315 and 317 provided between the joints, and the second arm 310-2 is configured in the same way; however, this is not limiting. Although FIG. 18 shows a dual-arm robot with two arms, the robot of the present embodiment may have three or more arms.

The base unit 350 is provided at the lower part of the robot body 300 and supports it. In the example of FIG. 18, the base unit 350 is provided with wheels or the like so that the entire robot can move. Alternatively, the base unit 350 may have no wheels and be fixed to a floor surface or the like. In the robot system of FIG. 18, the control device 200 is housed in the base unit 350, so that the robot body 300 and the control device 200 form a single unit.

Alternatively, instead of providing a dedicated control device such as the control device 200, the processing unit 130 and the like may be realized by a circuit board built into the robot (more specifically, an IC or the like mounted on that board).

As shown in FIG. 19, the functions of the processing device 100, or of the processing unit 130 and other units included in the robot 30 (hereinafter collectively referred to as "the processing device 100 etc."), may be realized by a server 500 that is communicatively connected, via a network 400 including at least one of a wired and a wireless network, to a terminal device (in a narrow sense the robot 30, hence referred to below as the robot).

Alternatively, in the present embodiment, the server 500 serving as a processing device may perform only part of the processing of the processing device etc. of the present invention. In this case, the processing is realized by distributed processing shared with a processing device or the like provided on the robot side.

In this case, the server 500 performs those processes of the processing device etc. of the present invention that are assigned to the server 500, while the processing device etc. provided on the robot performs those assigned to the robot's processing device etc.

For example, suppose the processing device etc. of the present invention performs first to M-th processes (M is an integer), and each of them can be divided into a plurality of sub-processes, so that the first process is realized by sub-processes 1a and 1b, the second process by sub-processes 2a and 2b, and so on. In this case, distributed processing is conceivable in which the server 500 performs sub-processes 1a, 2a, ..., Ma, and the processing device etc. on the robot side performs sub-processes 1b, 2b, ..., Mb. The processing device etc. according to the present embodiment, that is, the device that executes the first to M-th processes, may then be the device that executes sub-processes 1a to Ma, the device that executes sub-processes 1b to Mb, or a device that executes all of sub-processes 1a to Ma and 1b to Mb. More generally, the processing device etc. according to the present embodiment is a robot control device that executes at least one sub-process of each of the first to M-th processes.
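The split of each process into an "a" part for the server and a "b" part for the robot can be sketched as follows. This is a minimal illustration only: everything runs in one Python process, the names `run_split` and the toy sub-processes are hypothetical, and the network transport over the network 400 is omitted.

```python
from typing import Callable, Dict, List, Tuple

SubProcess = Callable[[dict], dict]

def run_split(processes: List[Tuple[SubProcess, SubProcess]],
              state: dict) -> Tuple[dict, Dict[str, List[int]]]:
    """Run processes 1..M, each given as (sub-process a, sub-process b).

    Sub-process a is assigned to the server, sub-process b to the robot side,
    so every process has at least one sub-process executed on each side.
    """
    log: Dict[str, List[int]] = {"server": [], "robot": []}
    for m, (sub_a, sub_b) in enumerate(processes, start=1):
        state = sub_a(state)      # would run on the server 500
        log["server"].append(m)
        state = sub_b(state)      # would run on the robot-side processing device
        log["robot"].append(m)
    return state, log

# Hypothetical two-process pipeline (M = 2) on a toy state.
processes = [
    (lambda s: {**s, "a1": 1}, lambda s: {**s, "b1": s["a1"] + 1}),
    (lambda s: {**s, "a2": s["b1"] * 10}, lambda s: {**s, "b2": s["a2"] + 1}),
]
final, log = run_split(processes, {})
print(final["b2"], log)  # 21 {'server': [1, 2], 'robot': [1, 2]}
```

In a real deployment the heavy sub-processes (for example, rendering many template images) would be the natural candidates for the server side.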

As a result, the server 500, which has higher processing capability than the robot-side terminal device (for example, the control device 200 of FIG. 17), can take on processing with a high load. Furthermore, the server 500 can control the operation of the robots collectively, which makes it easy, for example, to have a plurality of robots operate cooperatively.

In recent years, high-mix, low-volume production of parts has become increasingly common, and whenever the type of part being manufactured changes, the operations performed by the robots must be changed. With a configuration such as that of FIG. 19, the server 500 can change the operations of the robots collectively without re-teaching each robot individually. Furthermore, compared with providing one processing device etc. for each robot, the effort needed for software updates of the processing device etc. can be greatly reduced.

3. Phase-Only Correlation Method
Next, the phase-only correlation (POC) method will be briefly described. Of the amplitude and phase information obtained by applying a two-dimensional Fourier transform to each of two images, the phase-only correlation method uses only the phase information to detect the amount of translation between the two images. The method is known to respond sharply to the amount of image movement and to be robust against changes in image luminance, background variation, occlusion, and the like.

FIGS. 20(A) to 20(C) illustrate this robustness. Here, the template image contains only the object against a plain single-color background, as shown in FIG. 20(A), whereas the input image, as shown in FIG. 20(B), has a background with complex colors and shapes. Even in such a situation, the phase-only correlation method can detect the amount of movement accurately, and, as shown in FIG. 20(C), the object can in fact be detected at the correct position in the input image.

FIG. 21 outlines the phase-only correlation method. In FIG. 21, a template image and an input image are used as the two images, and the amount of movement of the object in the input image relative to the object in the template image is detected.

In the phase-only correlation method, a two-dimensional Fourier transform is first applied to each of the two images. FIG. 21 uses the discrete Fourier transform (DFT), but the method is not limited to it. The frequency-domain information obtained by applying the two-dimensional Fourier transform to the template image is G_ref, and that obtained from the input image is G_inp.

For example, when the template image is expressed as a function f(x) of an image coordinate x (this x is distinct from the x that is one of the six-degree-of-freedom position and orientation parameters of the object), G_ref can be obtained by equation (1) below. Here N is a constant coefficient that normalizes the transform, and ω is a variable representing frequency.
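Equation (1) itself did not survive extraction. A form consistent with equations (2), (5), and (6), in which a shift by d multiplies the spectrum by exp(−2πiωd), is sketched below; writing N as a divisor of the integral (rather than a multiplier) is an assumption.

```latex
G_{\mathrm{ref}}(\omega) = \frac{1}{N}\int_{-\infty}^{\infty} f(x)\, e^{-2\pi i \omega x}\, dx
```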

The right-hand side of equation (1) can be separated into an amplitude part and a phase part and written as equation (2) below.

$$G_{\mathrm{ref}} = A(\omega)\exp\bigl(i\theta(\omega)\bigr) \qquad (2)$$
Equations (1) and (2) describe a one-dimensional Fourier transform of one-dimensional data. Image information is actually two-dimensional and the Fourier transform is also performed in two dimensions, so strictly equation (3) below must be considered. However, because extending the Fourier transform to two dimensions is straightforward, the following explanation uses the one-dimensional equations (1) and (2).
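The two-dimensional transform referred to as equation (3) is likewise missing. Under the same assumptions as for equation (1) (sign convention taken from equation (6), N placed as a divisor), a plausible form for an image f(x, y) with spatial frequencies (ω_x, ω_y) is:

```latex
G_{\mathrm{ref}}(\omega_x, \omega_y) = \frac{1}{N}\iint f(x, y)\, e^{-2\pi i (\omega_x x + \omega_y y)}\, dx\, dy
```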

Now consider the case where f(x) is shifted by d, for example when an object in the image is moved by d. The Fourier transform of the shifted image signal is then given by equation (4):
$$\mathcal{F}\bigl[f(x-d)\bigr](\omega) = A(\omega)\exp\bigl(i\theta(\omega) - 2\pi i\omega d\bigr) \qquad (4)$$

Comparing equation (4) with equation (2) shows that a shift in position becomes a change of phase in the frequency domain and does not affect the amplitude A(ω). In other words, a change of position in the image is mapped to the phase and a change of brightness to the amplitude, so processing only the phase allows the position to be detected without being affected by image brightness and the like.
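The shift property can be checked numerically with a discrete transform, where ω corresponds to k/N. The sketch below uses a naive DFT (chosen only to stay dependency-free) and a circular shift, which is the discrete analogue of f(x − d).

```python
import cmath
import random

def dft(f):
    """Naive forward DFT: G[k] = sum_x f[x] * exp(-2*pi*i*k*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

rng = random.Random(0)
N, d = 32, 5
f = [rng.random() for _ in range(N)]
g = [f[(x - d) % N] for x in range(N)]  # f shifted circularly by d samples

F, G = dft(f), dft(g)
# Largest deviation of |G| from |F|: the amplitude is unchanged by the shift ...
amp_err = max(abs(abs(G[k]) - abs(F[k])) for k in range(N))
# ... while G equals F times exp(-2*pi*i*k*d/N): only the phase changes.
phase_err = max(abs(G[k] - F[k] * cmath.exp(-2j * cmath.pi * k * d / N)) for k in range(N))
print(amp_err < 1e-9, phase_err < 1e-9)  # True True
```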

Accordingly, the phase part ∠G_ref, obtained by removing the amplitude part A(ω) from equation (2), is used for the processing. ∠G_ref is expressed by equation (5):

$$\angle G_{\mathrm{ref}} = \exp\bigl(i\theta(\omega)\bigr) \qquad (5)$$
As with the template image, the phase part ∠G_inp is obtained for the input image. The phase-only correlation method then performs phase-only synthesis using ∠G_ref and ∠G_inp and applies an inverse Fourier transform to the result. Here, phase-only synthesis means multiplying one set of phase information by the complex conjugate of the other.

If the input image is displaced by d relative to the template image, the same reasoning as for equation (4) applies, so ∠G_inp is given by equation (6):

$$\angle G_{\mathrm{inp}} = \exp\bigl(i\theta(\omega) - 2\pi i\omega d\bigr) \qquad (6)$$
Phase-only synthesis therefore multiplies the complex conjugate of equation (5) by equation (6), so exp(iθ(ω)) cancels and exp(−2πiωd) remains. Applying the inverse Fourier transform to this result yields equation (7):
$$\mathcal{F}^{-1}\bigl[\exp(-2\pi i\omega d)\bigr] = C\,\delta(x-d) \qquad (7)$$

Here C is a constant and δ denotes the delta function. That is, performing phase-only correlation on a template image and an input image shifted by d relative to it yields a delta function with its peak at distance d from the origin. In FIG. 21 as well, the input image is shifted to the left relative to the template image, and in the processing result the peak appears at a position shifted to the left of the origin.
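Equations (5) to (7) translate directly into a one-dimensional POC routine. The sketch below is an illustration under stated assumptions, not the patented implementation: it uses a naive DFT, a small `eps` to avoid dividing by zero-magnitude bins, and circular shifts. Peak indices past N/2 are read as negative shifts, which matches the leftward shift in FIG. 21.

```python
import cmath
import random

def dft(f, inverse=False):
    """Naive DFT; the inverse includes the 1/N factor."""
    n = len(f)
    sign = 1 if inverse else -1
    out = [sum(f[x] * cmath.exp(sign * 2j * cmath.pi * k * x / n) for x in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def poc(ref, inp, eps=1e-12):
    """1-D phase-only correlation: returns (signed shift, peak height, surface)."""
    g_ref, g_inp = dft(ref), dft(inp)
    # Keep only the phase of each spectrum (eq. (5)), then multiply the complex
    # conjugate of the template phase by the input phase (phase-only synthesis).
    cross = [(a / (abs(a) + eps)).conjugate() * (b / (abs(b) + eps))
             for a, b in zip(g_ref, g_inp)]
    r = [v.real for v in dft(cross, inverse=True)]  # eq. (7): ideally delta(x - d)
    n = len(r)
    peak = max(range(n), key=r.__getitem__)
    shift = peak if peak <= n // 2 else peak - n  # indices past N/2 are negative shifts
    return shift, r[peak], r

rng = random.Random(1)
ref = [rng.random() for _ in range(64)]
inp = [ref[(x + 7) % 64] for x in range(64)]  # same signal moved 7 samples to the left
shift, height, _ = poc(ref, inp)
print(shift, round(height, 6))  # -7 1.0
```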

On the other hand, if the template image and the input image are uncorrelated, ∠G_ref and ∠G_inp are also uncorrelated; for example, ∠G_inp = exp(iφ(ω)). The result of phase-only synthesis is then exp(−iθ(ω) + iφ(ω)), which behaves as a random variable, so its inverse Fourier transform is uniform noise.

If the objects in the template image and the input image are not identical but are highly correlated, the processing result lies between a delta function and uniform noise. In the method of the present embodiment, this corresponds to the case where the value of (z, u, v, w) differs from the solution, so the shape, orientation, and size of the object in the template image differ slightly from those in the input image: a peak appears, but it is less sharp than a delta function. As described above, the present embodiment searches for the value of (z, u, v, w) that constitutes the solution by the asymptotic method, based on this difference in peak sharpness (that is, in the similarity value) caused by the orientation deviation.
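The three regimes (delta function, intermediate, uniform noise) can be seen by using the POC peak height as the similarity value. In this sketch, a small amount of additive noise is a hypothetical stand-in for the slight shape and size differences caused by a wrong (z, u, v, w); the naive DFT and `eps` are the same assumptions as in a plain POC routine.

```python
import cmath
import random

def dft(f, inverse=False):
    """Naive DFT; the inverse includes the 1/N factor."""
    n = len(f)
    sign = 1 if inverse else -1
    out = [sum(f[x] * cmath.exp(sign * 2j * cmath.pi * k * x / n) for x in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def poc_peak(ref, inp, eps=1e-12):
    """Height of the phase-only correlation peak, used here as the similarity."""
    g_ref, g_inp = dft(ref), dft(inp)
    cross = [(a / (abs(a) + eps)).conjugate() * (b / (abs(b) + eps))
             for a, b in zip(g_ref, g_inp)]
    return max(v.real for v in dft(cross, inverse=True))

rng = random.Random(2)
N = 128
ref = [rng.random() for _ in range(N)]
shifted = [ref[(x - 10) % N] for x in range(N)]             # identical object, moved
similar = [v + 0.05 * rng.gauss(0.0, 1.0) for v in shifted]  # nearly the same object
unrelated = [rng.random() for _ in range(N)]                 # no correlation

s_same = poc_peak(ref, shifted)        # sharp delta-like peak, height close to 1
s_similar = poc_peak(ref, similar)     # lower but still distinct peak
s_unrelated = poc_peak(ref, unrelated) # only noise-level values
print(round(s_same, 3), round(s_similar, 3), round(s_unrelated, 3))
```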

4. Details of Processing
The processing of the present embodiment will be described in detail with reference to the flowchart of FIG. 22. When this processing starts, a template image of the object is first generated from the relatively low-accuracy position and orientation obtained by preliminary processing (S101).

Specifically, the template image acquisition unit 120 acquires the template image by applying perspective transformation to the three-dimensional model data. The third axis may be the axis corresponding to the depth direction of the perspective transformation. For example, as shown in FIG. 23, the direction of the line of sight from the set viewpoint (the optical-axis direction of an assumed virtual camera) may be taken along the z axis.
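A minimal sketch of the perspective transformation with z as the depth axis is shown below. It only projects model vertices; rasterizing them into a template image is omitted. The focal length, principal point, and the rotation order Rz(w)·Ry(v)·Rx(u) are hypothetical choices, and x and y are left at zero because POC recovers them as an in-image shift.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def rotation(u: float, v: float, w: float) -> List[List[float]]:
    """Rotation by u, v, w (radians) about the x, y, z axes.
    The composition order Rz(w) @ Ry(v) @ Rx(u) is an assumption."""
    cu, su = math.cos(u), math.sin(u)
    cv, sv = math.cos(v), math.sin(v)
    cw, sw = math.cos(w), math.sin(w)
    rx = [[1, 0, 0], [0, cu, -su], [0, su, cu]]
    ry = [[cv, 0, sv], [0, 1, 0], [-sv, 0, cv]]
    rz = [[cw, -sw, 0], [sw, cw, 0], [0, 0, 1]]

    def mul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    return mul(rz, mul(ry, rx))

def project(points: List[Vec3], z: float, u: float, v: float, w: float,
            focal: float = 500.0, cx: float = 64.0,
            cy: float = 64.0) -> List[Tuple[float, float]]:
    """Pinhole projection of model points rotated by (u, v, w) and placed at depth z."""
    r = rotation(u, v, w)
    out = []
    for p in points:
        X, Y, Z = [sum(r[i][j] * p[j] for j in range(3)) for i in range(3)]
        Z += z  # translate along the viewing direction (third axis); must stay > 0
        out.append((cx + focal * X / Z, cy + focal * Y / Z))
    return out

square = [(-5.0, -5.0, 0.0), (5.0, -5.0, 0.0), (5.0, 5.0, 0.0), (-5.0, 5.0, 0.0)]
near = project(square, z=200.0, u=0.0, v=0.0, w=0.0)
far = project(square, z=400.0, u=0.0, v=0.0, w=0.0)
print(near[1][0] - near[0][0], far[1][0] - far[0][0])  # 25.0 12.5
```

Doubling z halves the projected size, which is why z must be searched explicitly while x and y can be read off the POC peak.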

Next, processing using the phase-only correlation method (POC) is performed on the generated template image and the input image (S102). At this point, it is preferable to use the technique described above for widening the foot of the mountain. Specifically, resolution reduction may be applied to the input image to obtain a low-resolution input image, and POC may be performed on this low-resolution input image.

Here, the resolution of the input image and that of the template image must correspond (in a narrow sense, match). The template image used for POC therefore needs to have approximately the same resolution as the low-resolution input image.

For example, the processing unit 130 may apply resolution reduction to the template image to obtain a low-resolution template image whose resolution corresponds to that of the low-resolution input image, and then perform phase-only correlation on the low-resolution input image and the low-resolution template image. In this case, a high-resolution template image (for example, with roughly the same resolution as the input image before resolution reduction) is acquired in S101, and resolution reduction is also applied to the template image in S102.
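The text does not specify how the resolution is reduced. One simple possibility, assumed here, is averaging non-overlapping blocks; applying the same `factor` to both the input image and the template image keeps their resolutions matched as required above.

```python
from typing import List

Image = List[List[float]]

def downsample(img: Image, factor: int) -> Image:
    """Reduce resolution by averaging non-overlapping factor x factor blocks.
    Trailing rows/columns that do not fill a whole block are dropped."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[by * factor + i][bx * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for bx in range(w)]
            for by in range(h)]

img = [[1, 1, 3, 3],
       [1, 1, 3, 3],
       [5, 5, 7, 7],
       [5, 5, 7, 7]]
print(downsample(img, 2))  # [[1.0, 3.0], [5.0, 7.0]]
```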

Alternatively, at the time of S101, the template image acquisition unit 120 may acquire, as the template image, an image in which the resolution of the object corresponds to the resolution of the object in the low-resolution input image. In this case, in S102 the processing unit 130 performs phase-only correlation on the low-resolution input image and this template image. In other words, all that is required is that the resolutions of the template image and the low-resolution input image (in a narrow sense, of the objects they contain) correspond, so the resolution may be adjusted when the template image is acquired.

In S102, as another technique for widening the foot of the mountain, weighting may be applied to the result of the phase-only synthesis.
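A minimal sketch of such weighting, assuming a Gaussian weight over frequency (consistent with the Gaussian filter mentioned later in this section, though the exact filter and its `sigma` are hypothetical): damping high frequencies in the phase-only synthesis result turns the delta-like peak into a wider one at the same position.

```python
import cmath
import math
import random

def dft(f, inverse=False):
    """Naive DFT; the inverse includes the 1/N factor."""
    n = len(f)
    sign = 1 if inverse else -1
    out = [sum(f[x] * cmath.exp(sign * 2j * cmath.pi * k * x / n) for x in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def weighted_poc(ref, inp, sigma=None, eps=1e-12):
    """1-D POC surface; if sigma is given, the phase-only synthesis result is
    weighted by a Gaussian in frequency before the inverse transform."""
    n = len(ref)
    g_ref, g_inp = dft(ref), dft(inp)
    cross = [(a / (abs(a) + eps)).conjugate() * (b / (abs(b) + eps))
             for a, b in zip(g_ref, g_inp)]
    if sigma is not None:
        # min(k, n - k) is the distance from DC on the circular frequency axis,
        # so the weight is symmetric and the correlation surface stays real.
        cross = [c * math.exp(-min(k, n - k) ** 2 / (2 * sigma ** 2))
                 for k, c in enumerate(cross)]
    return [v.real for v in dft(cross, inverse=True)]

rng = random.Random(3)
ref = [rng.random() for _ in range(64)]
inp = [ref[(x - 5) % 64] for x in range(64)]  # shifted by d = 5
plain = weighted_poc(ref, inp)
smooth = weighted_poc(ref, inp, sigma=4.0)
print(max(range(64), key=plain.__getitem__), max(range(64), key=smooth.__getitem__))  # 5 5
print(round(plain[6] / plain[5], 3), round(smooth[6] / smooth[5], 3))
```

The peak stays at d = 5 in both cases; with weighting, the neighbouring value is a large fraction of the peak instead of nearly zero.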

After S102, processing is performed to find the direction in which the parameters (position and orientation) should be changed so that they approach the solution by the asymptotic method. Specifically, each of z, u, v, and w is changed slightly from the position and orientation used in S102, and a template image is generated for each change (S103).

That is, the template image acquisition unit 120 acquires a first template image obtained from the posture information, a second template image obtained by changing the position along the third axis in the posture information, a third template image obtained by changing the first rotation angle, a fourth template image obtained by changing the second rotation angle, and a fifth template image obtained by changing the third rotation angle.

POC is then performed with each of the acquired template images in the same manner as in S102 to obtain the similarities (S104), and from the results it is determined how each degree of freedom of (z, u, v, w) should be changed to increase the similarity (S105).

In S105, for example, the processing unit 130 updates the posture information based on the plurality of similarities obtained by phase-only correlation between the input image and each of the acquired first to fifth template images.

Here, the first template image is the image corresponding to S102, that is, the one with zero change. If the position and orientation in S102 are as shown in FIG. 23, the first template image is as shown in FIG. 24(A). The second template image is obtained by slightly changing the position along the z axis in FIG. 23. Whether a change in the positive or the negative direction along the z axis brings the pose closer to the solution is not known in advance, so the position should be changed slightly in both directions. Specifically, small changes in the directions B1 and B2 in FIG. 23 yield two template images, shown in FIGS. 24(B) and 24(C), as the second template image.

Similarly, taking the rotation angle u about the x axis as the first rotation angle, u is changed slightly in both directions B3 and B4, so the two images of FIGS. 25(A) and 25(B) are acquired as the third template image. Likewise, changing the second rotation angle v slightly in both directions B5 and B6 yields the two images of FIGS. 25(C) and 25(D) as the fourth template image, and changing the third rotation angle w slightly in both directions B7 and B8 yields the two images of FIGS. 25(E) and 25(F) as the fifth template image.
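The nine poses behind the first to fifth template images (the base pose plus a positive and a negative change of each of z, u, v, w) can be enumerated as below. The dictionary keys, step sizes, and units (z in mm, angles in degrees) are hypothetical.

```python
from typing import Dict, Tuple

Pose = Tuple[float, float, float, float]  # (z, u, v, w)

def perturbed_poses(base: Pose, steps: Pose) -> Dict[str, Pose]:
    """Base pose (first template) plus a +/- change of each of z, u, v, w
    (second to fifth templates, two images each)."""
    poses: Dict[str, Pose] = {"base": base}
    for i, name in enumerate(("z", "u", "v", "w")):
        for sign, tag in ((1, "+"), (-1, "-")):
            p = list(base)
            p[i] += sign * steps[i]
            poses[name + tag] = tuple(p)
    return poses

poses = perturbed_poses((300.0, 10.0, -20.0, 5.0), (1.0, 0.5, 0.5, 0.5))
print(len(poses), poses["u+"])  # 9 (300.0, 10.5, -20.0, 5.0)
```

Each pose would then be passed to the template renderer and matched against the input image by POC.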

In FIGS. 24(B) to 25(F), the amount of change is exaggerated to make clear how the shape, orientation, and size of the object in the image change with the posture information; in practice, the changes in (z, u, v, w) and the resulting changes of the object in the image are very small.

This gives, for each of (z, u, v, w), the change in similarity when the value is changed in both the positive and the negative direction. In S105, these results are used to determine, for each of (z, u, v, w), in which direction (positive or negative) and by how much the value should be changed to move toward maximum similarity, and the position and orientation (z, u, v, w) are updated accordingly.
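The text leaves open how "by how much" is computed. One common choice, assumed here rather than taken from the source, is to fit a parabola through the three similarities at −Δ, 0, and +Δ and step to its vertex, falling back to a plain step toward the better side when the three values do not form a peak.

```python
def parabolic_step(s_minus: float, s0: float, s_plus: float, delta: float) -> float:
    """Offset (clamped to +/-delta) to the vertex of the parabola through
    (-delta, s_minus), (0, s0), (+delta, s_plus)."""
    curvature = s_minus - 2 * s0 + s_plus
    if curvature >= 0:
        # No peak shape: step toward the better neighbour, or stay put.
        if max(s_minus, s_plus) <= s0:
            return 0.0
        return delta if s_plus > s_minus else -delta
    offset = delta * (s_minus - s_plus) / (2 * curvature)
    return max(-delta, min(delta, offset))

# Similarity s(t) = -(t - 0.3)**2 sampled at t = -1, 0, +1.
print(round(parabolic_step(-1.69, -0.09, -0.49, 1.0), 6))  # 0.3
```

Applied independently to z, u, v, and w, this gives one update of the posture information per round of S103 to S105.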

It is then determined whether the similarity has reached its maximum (S106). If so, the processing ends, and the (z, u, v, w) at that time is taken as corresponding to the position and orientation of the object. As described above, x and y may be determined from the result of POC using the template image at that (z, u, v, w), namely as the (x, y) that maximizes the similarity.
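Reading x and y off the POC result requires the two-dimensional version of the correlation. The sketch below builds a 2-D DFT from row and column 1-D transforms (naive, for illustration only) and returns the signed peak offset; for real images an FFT library would be used instead.

```python
import cmath
import random

def dft1(f, inverse=False):
    """Naive 1-D DFT; the inverse includes the 1/N factor."""
    n = len(f)
    sign = 1 if inverse else -1
    out = [sum(f[x] * cmath.exp(sign * 2j * cmath.pi * k * x / n) for x in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def dft2(img, inverse=False):
    """Separable 2-D DFT: transform every row, then every column."""
    rows = [dft1(r, inverse) for r in img]
    cols = [dft1(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # back to row-major [ky][kx]

def poc_2d(ref, inp, eps=1e-12):
    """Return (dx, dy, peak height) of the 2-D phase-only correlation."""
    g_ref, g_inp = dft2(ref), dft2(inp)
    cross = [[(a / (abs(a) + eps)).conjugate() * (b / (abs(b) + eps))
              for a, b in zip(ra, rb)] for ra, rb in zip(g_ref, g_inp)]
    r = dft2(cross, inverse=True)
    h, w = len(r), len(r[0])
    py, px = max(((y, x) for y in range(h) for x in range(w)),
                 key=lambda p: r[p[0]][p[1]].real)
    dy = py if py <= h // 2 else py - h  # signed offsets
    dx = px if px <= w // 2 else px - w
    return dx, dy, r[py][px].real

rng = random.Random(4)
H = W = 16
ref = [[rng.random() for _ in range(W)] for _ in range(H)]
inp = [[ref[(y - 3) % H][(x + 2) % W] for x in range(W)] for y in range(H)]  # down 3, left 2
dx, dy, height = poc_2d(ref, inp)
print(dx, dy, round(height, 6))  # -2 3 1.0
```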

If the result of S106 is No, the posture information is updated further and POC is executed again (S107). Specifically, the posture information updated in S105 is taken as the new reference, each of (z, u, v, w) is changed slightly from that reference to acquire the second to fifth template images, and the above processing is repeated.
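The loop S103 to S107 can be sketched as a coordinate-wise hill climb. Everything here is a hypothetical stand-in: `toy_similarity` replaces "render a template at this pose and take the POC peak height", and halving the step size before stopping is an added refinement not stated in the source. The loop stops (S106) when no ± change improves the similarity at the smallest step.

```python
import math
from typing import Callable, Tuple

Pose = Tuple[float, float, float, float]  # (z, u, v, w)

def refine_pose(similarity: Callable[[Pose], float], pose: Pose, steps: Pose,
                min_scale: float = 1e-3, max_iter: int = 1000) -> Pose:
    """Greedy search over (z, u, v, w) driven only by similarity values."""
    scale = 1.0
    best = similarity(pose)
    for _ in range(max_iter):
        improved = False
        for i in range(4):                  # S103/S104: +/- change of each parameter
            for sign in (1, -1):
                cand = list(pose)
                cand[i] += sign * steps[i] * scale
                s = similarity(tuple(cand))
                if s > best:                # S105: keep changes that raise similarity
                    pose, best, improved = tuple(cand), s, True
        if not improved:                    # S106: no better neighbour at this step
            scale /= 2                      # hypothetical: refine the step first
            if scale < min_scale:
                break
    return pose

target = (300.0, 10.0, -20.0, 5.0)  # hypothetical true pose

def toy_similarity(p: Pose) -> float:
    # Smooth stand-in for the POC peak height, maximal (1.0) at the target.
    scales = (20.0, 5.0, 5.0, 5.0)
    return math.exp(-sum(((a - b) / s) ** 2 for a, b, s in zip(p, target, scales)))

est = refine_pose(toy_similarity, (290.0, 8.0, -17.0, 4.0), steps=(2.0, 0.5, 0.5, 0.5))
print([round(v, 2) for v in est])  # [300.0, 10.0, -20.0, 5.0]
```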

Although omitted from the description above, the processing of the present embodiment need not reduce the resolution of the input image at all times. For example, the processing unit 130 may perform POC-based processing on the full-resolution input image after first performing POC-based processing on the low-resolution input image.

As described above, the resolution of the input image is reduced in order to widen the range over which the asymptotic method can operate. At or immediately after the start of the asymptotic method, the posture information is likely to be far from the solution, so resolution reduction is very useful. By contrast, once the asymptotic method has run normally and the posture information has been updated a certain number of times, the posture information at that stage can be considered sufficiently close to the solution. For example, if the error with respect to the solution has fallen to about ±1 degree, the asymptotic method can converge to the solution without lowering the resolution to widen the foot of the mountain.

As shown in FIGS. 13(A) to 13(C), lowering the resolution widens the foot of the mountain but blurs the peak correspondingly. In other words, resolution reduction may lower the accuracy of position and orientation detection. It is therefore advisable to switch resolution reduction on or off depending on whether widening the range over which the asymptotic method can operate or achieving accurate position and orientation detection is more important. Specifically, as described above, if the period from the start up to a certain time (or number of updates) is defined as a first phase and the period after it as a second phase, resolution reduction is applied to the input image in the first phase and skipped in the second phase.

As described above, the template image must match the resolution of the image it is matched against. In the first phase, therefore, a relatively high-resolution template image may be acquired and reduced in resolution to obtain a low-resolution template image, or an image whose resolution corresponds to the low-resolution input image may be acquired directly as the template image. In the second phase, no resolution reduction is applied to the input image, so an image whose resolution corresponds to the input image may be acquired as the template image; this resolution is higher than that of the template image used for phase-only correlation in the first phase.
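The two-phase schedule can be sketched as a function that returns the image pair to feed into POC for a given update. The switch point `switch_after`, the reduction `factor`, the `render_template` callable, and block averaging as the reduction method are all hypothetical; the point shown is that input and template are always reduced together, so their resolutions stay matched.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]

def downsample(img: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks (assumed reduction method)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[by * factor + i][bx * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for bx in range(w)]
            for by in range(h)]

def images_for_update(inp: Image, render_template: Callable[[], Image],
                      update_index: int, switch_after: int = 10,
                      factor: int = 4) -> Tuple[Image, Image]:
    """Return the (input, template) pair used for POC at a given update."""
    tpl = render_template()                 # rendered at the input's full resolution
    if update_index < switch_after:         # first phase: widen the foot of the mountain
        return downsample(inp, factor), downsample(tpl, factor)
    return inp, tpl                         # second phase: full resolution for accuracy

inp = [[float(x + y) for x in range(8)] for y in range(8)]
render = lambda: [[1.0] * 8 for _ in range(8)]  # hypothetical renderer
a, b = images_for_update(inp, render, update_index=0)
c, d = images_for_update(inp, render, update_index=10)
print(len(a), len(b), len(c), len(d))  # 2 2 8 8
```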

As for the weighting process (in a narrow sense, filtering with a Gaussian filter), because it may also serve to reduce noise, skipping it depending on the situation is basically not envisaged. However, various modifications are possible in this respect, and skipping the weighting process depending on the situation is not precluded.
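A minimal sketch of phase-only correlation with a weighting step, assuming NumPy, images of equal size, and a Gaussian weight applied to the normalized cross-power spectrum before the inverse FFT. This is one plausible reading of the Gaussian filtering mentioned above; `weighted_poc`, `sigma`, and `eps` are hypothetical. The result is divided by the mean weight so that an exactly shifted copy of the template scores a peak of 1; the peak height serves as the similarity and its location as the in-plane shift.

```python
import numpy as np


def weighted_poc(input_image: np.ndarray, template_image: np.ndarray,
                 sigma: float = 0.25, eps: float = 1e-12):
    """Weighted phase-only correlation between two equal-size images.

    Returns (similarity, (dy, dx)): the normalized peak height and the circular
    shift that maps the template onto the input image.
    """
    F = np.fft.fft2(input_image)          # frequency transform of the input image
    G = np.fft.fft2(template_image)       # frequency transform of the template image
    cross = F * np.conj(G)
    R = cross / (np.abs(cross) + eps)     # phase-only synthesis: keep phase, drop magnitude

    # Gaussian weight over normalized frequencies, attenuating high-frequency (noisy) terms
    fy = np.fft.fftfreq(input_image.shape[0])[:, None]
    fx = np.fft.fftfreq(input_image.shape[1])[None, :]
    W = np.exp(-(fx ** 2 + fy ** 2) / (2 * sigma ** 2))

    # Inverse transform; dividing by mean(W) makes an exact shifted copy peak at 1
    r = np.real(np.fft.ifft2(R * W)) / W.mean()

    peak = np.unravel_index(np.argmax(r), r.shape)
    # Indices past the midpoint correspond to negative shifts
    shift = tuple(int(p) - n if p > n // 2 else int(p) for p, n in zip(peak, r.shape))
    return float(r[peak]), shift
```

Lowering `sigma` suppresses more high-frequency components, which reduces noise but broadens the correlation peak, much like the trade-off between convergence range and accuracy discussed for resolution reduction.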

Note that part or most of the processing of the processing apparatus 100 and the like of the present embodiment may be implemented by a program. In this case, the processing apparatus 100 and the like of the present embodiment are realized by a processor such as a CPU executing the program. Specifically, a program stored in a non-transitory information storage medium is read out, and a processor such as a CPU executes the read program. Here, the information storage medium (a computer-readable medium) stores programs, data, and the like, and its function can be realized by an optical disc (DVD, CD, etc.), an HDD (hard disk drive), a memory (card-type memory, ROM, etc.), or the like. The processor, such as a CPU, performs the various processes of the present embodiment based on the program (data) stored in the information storage medium. That is, the information storage medium stores a program that causes a computer (an apparatus including an operation unit, a processing unit, a storage unit, and an output unit) to function as each unit of the present embodiment (a program that causes the computer to execute the processing of each unit).

The processing apparatus and the like of the present embodiment may also include a processor and a memory. The processor here may be, for example, a CPU (Central Processing Unit). However, the processor is not limited to a CPU; various processors such as a GPU (Graphics Processing Unit) or a DSP (Digital Signal Processor) can be used. The processor may also be a hardware circuit implemented as an ASIC. The memory stores computer-readable instructions, and each unit of the processing apparatus and the like according to the present embodiment is realized when the processor executes those instructions. The memory here may be a semiconductor memory such as SRAM or DRAM, or may be a register, a hard disk, or the like. The instructions here may be instructions of an instruction set constituting a program, or instructions that direct the operation of the processor's hardware circuits.

Although the present embodiment has been described in detail above, those skilled in the art will readily understand that many modifications are possible without substantially departing from the novel matters and effects of the present invention. Accordingly, all such modifications are intended to be included within the scope of the present invention. For example, a term that is described at least once in the specification or drawings together with a different term having a broader or equivalent meaning can be replaced with that different term anywhere in the specification or drawings. Furthermore, the configuration and operation of the processing apparatus 100 and the like are not limited to those described in the present embodiment, and various modifications are possible.

30 robot, 100 processing device, 110 input image acquisition unit,
120 template image acquisition unit, 130 processing unit, 140 storage unit, 150 imaging unit, 200 control device, 300 robot body, 310 arm, 311, 313 joints,
315, 317 frames, 319 end effector, 330 hand-eye camera,
350 base unit, 400 network, 500 server

Claims

A processing apparatus comprising:
an input image acquisition unit that acquires an input image;
a template image acquisition unit that acquires a template image from three-dimensional model data of an object by setting, for the three-dimensional model data, posture information representing the position on the third axis, among positions in a three-dimensional space defined by first to third axes, and first to third rotation angles, which are the rotation angles about the first to third axes, respectively; and
a processing unit that performs a phase-only correlation process on the input image and the template image to obtain a similarity between the images, performs an update process of the posture information based on the similarity, and performs the phase-only correlation process on the input image and a new template image acquired according to the updated posture information, thereby detecting the position and orientation of the object in the input image.

The processing apparatus according to claim 1, wherein
when the posture information that maximizes the similarity has been obtained through a plurality of the phase-only correlation processes by the update process of the posture information,
the processing unit detects, as the position and orientation of the object in the input image, information represented by:
the position on the first axis and the position on the second axis obtained by the phase-only correlation process between the input image and the template image acquired based on that posture information;
the position on the third axis represented by the posture information; and
the first to third rotation angles represented by the posture information.

The processing apparatus according to claim 1 or 2, wherein
the template image acquisition unit acquires:
a first template image acquired from the posture information;
a second template image acquired by changing the position on the third axis in the posture information;
a third template image acquired by changing the first rotation angle in the posture information;
a fourth template image acquired by changing the second rotation angle in the posture information; and
a fifth template image acquired by changing the third rotation angle in the posture information, and
the processing unit performs the update process of the posture information based on the plurality of similarities obtained by the phase-only correlation processes between each of the acquired first to fifth template images and the input image.

The processing apparatus according to any one of claims 1 to 3, wherein
the template image acquisition unit acquires the template image by performing a perspective transformation process on the three-dimensional model data, and
the third axis is an axis corresponding to the depth direction in the perspective transformation.

The processing apparatus according to any one of claims 1 to 4, wherein
the processing unit obtains the similarity by performing a given weighting process on the result of a phase-only synthesis process between an input phase image, obtained by a frequency transform process on the input image, and a template phase image, obtained by the frequency transform process on the template image, and then performing the inverse of the frequency transform process.

The processing apparatus according to any one of claims 1 to 5, wherein
the processing unit performs a resolution reduction process on the input image to obtain a low-resolution input image having a lower resolution than the input image, and performs the phase-only correlation process using the low-resolution input image.

The processing apparatus according to claim 6, wherein
the processing unit performs the resolution reduction process on the template image to obtain a low-resolution template image having a resolution corresponding to the low-resolution input image, and performs the phase-only correlation process on the low-resolution input image and the low-resolution template image.

The processing apparatus according to claim 6, wherein
the template image acquisition unit acquires, as the template image, an image in which the resolution of the object corresponds to the resolution of the object in the low-resolution input image, and
the processing unit performs the phase-only correlation process on the low-resolution input image and the template image.

The processing apparatus according to any one of claims 6 to 8, wherein
the processing unit performs the phase-only correlation process using the input image after performing the phase-only correlation process using the low-resolution input image.

A robot comprising:
an input image acquisition unit that acquires an input image;
a template image acquisition unit that acquires a template image from three-dimensional model data of an object by setting, for the three-dimensional model data, posture information representing the position on the third axis, among positions in a three-dimensional space defined by first to third axes, and first to third rotation angles, which are the rotation angles about the first to third axes, respectively; and
a processing unit that performs a phase-only correlation process on the input image and the template image to obtain a similarity between the images, performs an update process of the posture information based on the similarity, and performs the phase-only correlation process on the input image and a new template image acquired according to the updated posture information, thereby detecting the position and orientation of the object in the input image.

A position and orientation detection method comprising:
performing an input image acquisition process that acquires an input image;
performing a template image acquisition process that acquires a template image from three-dimensional model data of an object by setting, for the three-dimensional model data, posture information representing the position on the third axis, among positions in a three-dimensional space defined by first to third axes, and first to third rotation angles, which are the rotation angles about the first to third axes, respectively; and
performing a detection process of the position and orientation of the object in the input image by performing a phase-only correlation process on the input image and the template image to obtain a similarity between the images, performing an update process of the posture information based on the similarity, and performing the phase-only correlation process on the input image and a new template image acquired according to the updated posture information.

A program causing a computer to function as:
an input image acquisition unit that acquires an input image;
a template image acquisition unit that acquires a template image from three-dimensional model data of an object by setting, for the three-dimensional model data, posture information representing the position on the third axis, among positions in a three-dimensional space defined by first to third axes, and first to third rotation angles, which are the rotation angles about the first to third axes, respectively; and
a processing unit that performs a phase-only correlation process on the input image and the template image to obtain a similarity between the images, performs an update process of the posture information based on the similarity, and performs the phase-only correlation process on the input image and a new template image acquired according to the updated posture information, thereby detecting the position and orientation of the object in the input image.
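The iterative update in claims 1 to 3 and the pose assembly in claim 2 can be sketched as follows. This is a sketch under assumptions, not the embodiment's method. The posture vector is taken to be `[z, rx, ry, rz]`. The hypothetical callable `similarity` stands in for rendering a template from the three-dimensional model at a given posture and returning the phase-only correlation peak against the input image. Forward-difference gradient ascent is used as the update rule. The claims only state that the posture is updated based on the five similarities, so this rule, along with the hypothetical `update_posture`, `assemble_pose`, `delta`, `rate`, and `iterations`, is an illustrative choice. Converting the first- and second-axis positions from image coordinates into scene coordinates, which would require camera parameters, is not covered.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

# Assumed posture layout: [z, rx, ry, rz]
#   z      : position on the third (depth) axis
#   rx..rz : rotation angles about the first to third axes


def update_posture(posture: Sequence[float],
                   similarity: Callable[[np.ndarray], float],
                   delta: float = 1e-3, rate: float = 1.0,
                   iterations: int = 50) -> np.ndarray:
    """Refine the posture using five similarity evaluations per iteration.

    similarity(p) stands in for: render a template from the 3D model at posture p,
    run phase-only correlation against the input image, return the peak height.
    """
    p = np.asarray(posture, dtype=float).copy()
    for _ in range(iterations):
        s0 = similarity(p)                        # first template: current posture
        grad = np.empty_like(p)
        for i in range(p.size):                   # second to fifth templates: one parameter changed each
            q = p.copy()
            q[i] += delta
            grad[i] = (similarity(q) - s0) / delta  # forward-difference slope
        p += rate * grad                          # step uphill in similarity
    return p


def assemble_pose(shift_xy: Tuple[float, float], posture: Sequence[float]):
    """Claim-2 style result: first/second-axis positions from POC, the rest from the posture."""
    x, y = shift_xy
    z, rx, ry, rz = posture
    return (x, y, z, rx, ry, rz)
```

A similarity function with a narrow peak, or a starting posture far from the true one, may need a smaller `rate` or the low-resolution first phase to keep the update within the base of the peak.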