JP2008309595A

JP2008309595A - Object recognizing device and program used for it

Info

Publication number: JP2008309595A
Application number: JP2007156908A
Authority: JP
Inventors: Loic Merckel; ロイックメルケル
Original assignee: Horiba Ltd
Current assignee: Horiba Ltd
Priority date: 2007-06-13
Filing date: 2007-06-13
Publication date: 2008-12-25

Abstract

PROBLEM TO BE SOLVED: To calculate the pose of a three-dimensional object reliably and in a short time from captured images, without requiring a complicated configuration, in an object recognition device that computes the pose of the three-dimensional object using a plurality of fixed points whose positions relative to the object are fixed and known.

SOLUTION: The device includes: an imaging device 3 that images a three-dimensional object 2 on which two or three fixed points are set; an inclinometer 4 that detects the inclination of the imaging device 3 about each of two horizontal axes; and a calculation unit that extracts each fixed point from the images captured by the imaging device 3 and computes the pose of the three-dimensional object 2 using, as parameters, the position of each fixed point on the image and the angle information about the two axes output from the inclinometer 4.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an object recognition device and the like that identifies the pose of a three-dimensional object from its appearance in a two-dimensional captured image.

Techniques for identifying the pose of a three-dimensional object from a two-dimensional captured image in which it appears are used in a wide range of fields such as photogrammetry, robotics, and machine vision, and many researchers have proposed various methods.

One example is the perspective-n-point method, which calculates the pose of an object from n known points on the object (hereinafter referred to as fixed points). The perspective-n-point problem reduces to finding the rotation matrix and translation vector that relate the camera coordinate system to the object coordinate system. Because this is usually expressed as a fourth-order polynomial, the conventional approach solves the inverse problem by iterative optimization based on the image positions of four or more fixed points and thereby calculates the pose of the object. Here, the pose means the orientation and/or position of the object relative to the camera; once the three-dimensional coordinates of each fixed point in the camera-referenced spatial coordinate system are known, the pose can be calculated.

However, calculation from four fixed points takes considerable time, which makes it difficult, for example, to calculate the pose of an object in each frame of a moving image in real time.

For this reason, a perspective-three-point method that uses only three fixed points has been proposed, as described in the non-patent document cited below. With this method, however, multiple solutions are often obtained or no solution can be found. In other words, a solution can be obtained only under certain specific conditions.

Furthermore, both methods solve the inverse problem by optimization, and depending on how the initial values are set, the calculation may take a very long time and, in some cases, may yield a wrong solution or fail to find one.

Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6) (1981) 381-395

The present invention has been made to solve these problems, and its main object is to make it possible to calculate the pose of a three-dimensional object more reliably and in a shorter time from two or three fixed points on the object.

To solve this problem, the present invention adopts the following means.

That is, the object recognition device according to the present invention calculates the pose of a three-dimensional object using a plurality of fixed points whose positions relative to the three-dimensional object are fixed and known, and is characterized by comprising: an imaging device that images the three-dimensional object on which two or three fixed points are set; an inclinometer that detects the inclination of the imaging device about each of two horizontal axes; and a calculation unit that extracts the fixed points from an image captured by the imaging device and calculates the pose of the three-dimensional object using, as parameters, the position information of the fixed points on the image and the angle information about the two axes output from the inclinometer.

With this configuration, because the angle information from the inclinometer is used, the reliability of the pose calculation result can be ensured even with only two or three fixed points, and because few fixed points are needed, the pose can be calculated in a short time. This makes it easy, for example, to calculate the pose of an object in each frame of a moving image in real time, which in turn enables various applications. For example, for equipment that requires complicated operation, such as analytical instruments, information equipment, and system equipment, when the user films the equipment (that is, the three-dimensional object) with a video camera, the equipment can be recognized in real time and its operating locations and operating procedures can be indicated on the image with arrows or the like. Moreover, in terms of configuration, the invention can be realized simply by attaching an inclinometer to the imaging device and handling the rest in software, so it does not make existing devices significantly more complicated.

Where the calculation unit calculates the pose of the three-dimensional object by iterative optimization, the calculation can be made faster and more reliable as follows. Preferably, three fixed points are used and positioned so that the straight line connecting two of them is substantially vertical and the straight line connecting the remaining fixed point to one of those two is substantially horizontal. The calculation unit then provisionally calculates the fixed point coordinates, which should properly be calculated from the angle between the vertical axis and the image plane of the imaging device, by using one item of angle information from the inclinometer in place of that angle, and performs the iterative optimization with the provisionally calculated coordinates as the initial value to obtain the fixed point coordinates.

This is because, with the three fixed points set as described above, the angle between the vertical axis and the image plane of the imaging device is close to the value of that one item of angle information output from the inclinometer. An initial value calculated from it therefore does not deviate far from the final solution, which shortens the iterative optimization and makes it more reliable.

According to the present invention configured as described above, fast and reliable pose calculation of a three-dimensional object based on a captured image becomes possible without a complicated configuration, which greatly widens the range of technologies to which it can be applied.

An embodiment of the present invention is described below with reference to the drawings.

The object recognition device 1 according to this embodiment recognizes the pose of a three-dimensional object 2 from an image of that object. As shown in FIG. 1, it comprises an imaging device 3 for imaging the three-dimensional object 2, an inclinometer 4 that detects the inclination of the imaging device 3 about each of two horizontal axes, and an information processing device 5 that receives the image data captured by the imaging device 3 and calculates the orientation of the three-dimensional object 2 from that data. The object recognition device 1 is applicable to three cases: the imaging device 3 is stationary and the three-dimensional object 2 moves; the imaging device 3 moves and the three-dimensional object 2 is stationary; and both move. The following describes the second case, in which the object 2 is recognized while the imaging device 3 moves and the three-dimensional object 2 is stationary.

The imaging device 3 is here, for example, a video camera that captures a moving image of the three-dimensional object 2. As described later, the object recognition device 1 calculates the orientation of the three-dimensional object 2 from the data of each individual still image (frame) making up the moving image, so a device that captures still images, such as a still camera, may also be used as the imaging device 3; in that case the images are captured with both the camera and the object stationary.

The inclinometer 4, also called a clinometer, is attached to the imaging device 3 and detects the inclination of the imaging device 3 about two orthogonal horizontal axes. Various types of inclinometer based on different measurement principles have been developed, and in principle any type may be used. Naturally, one with high detection accuracy and a short measurement time is preferable; this embodiment uses one with an accuracy of 0.001° to 0.1°.

The information processing device 5 is, for example, a general-purpose computer equipped with a CPU, a memory, output devices such as a display, and input devices such as a keyboard. The CPU and peripheral devices cooperate according to an object recognition program stored in the memory, so that the computer functions as a calculation unit that calculates the orientation of the three-dimensional object 2 from the image data.

In outline, the calculation unit uses the perspective-n-point method to calculate the orientation of the three-dimensional object 2 shown in the image, here using three fixed points set on the three-dimensional object 2. A fixed point is a characteristic point that the information processing device 5 (computer) can recognize in the image data of the three-dimensional object 2, whose position relative to the three-dimensional object 2 does not change and is known, that is, stored in advance in the memory of the computer. For example, a characteristic part of the shape of the three-dimensional object 2 (such as a corner) may be set as a fixed point, or a marker may be placed on the surface of the object 2 and used as a fixed point. The fixed points must, however, be individually identifiable: it is not enough to detect that a point is a fixed point; each must be unique so that the information processing device 5 can tell from the image data or the like which fixed point it is. In this embodiment, the information processing device 5 is provided with a recognition unit (not shown) as a functional unit, which identifies each fixed point by image processing of the image data.

Next, the orientation calculation performed by the calculation unit is described in more detail, together with its underlying principle.

<Premise>
As shown in FIG. 2, let the coordinate system of the object 2 be (M₀, u, v, w) and that of the imaging device 3 be (O, i, j, k). A fixed point Mi = (Ui, Vi, Wi)^T on the object 2, expressed in the object coordinate system, is converted into coordinates (Xi, Yi, Zi)^T in the coordinate system of the imaging device 3 by the following conversion formula (Equation 1).

Here, R is a rotation matrix and T is a translation vector. The subscript i distinguishes the three fixed points and takes the values 0, 1, and 2.

First, the rotation matrix R is, as is well known, defined by Equation 2.
α, β, and γ are the rotation angles about the x, y, and z coordinate axes of the world coordinate system (object coordinate system), respectively, and Rx(α), Ry(β), and Rz(γ) are the transformation matrices for rotation about those axes.

Specifically, they are as shown in Equation 3. Here, the x axis and the z axis are the two orthogonal horizontal axes, and the y axis is the vertical axis.

Since the inclinometer 4 outputs the angles α and γ, only β needs to be calculated to obtain the rotation matrix R.
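The rotation and coordinate conversion described above can be sketched in Python as follows. Equations 1 to 3 are not reproduced in this text, so the composition order R = Rx(α)·Ry(β)·Rz(γ) and the form P = R·M + T are assumptions, and the function names are illustrative only. The point of the sketch is that α and γ come from the inclinometer, leaving β as the only unknown rotation angle.

```python
import math

def rot_x(a):
    """Rotation about the x axis (one of the two horizontal axes)."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(b):
    """Rotation about the y axis (the vertical axis)."""
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(g):
    """Rotation about the z axis (the other horizontal axis)."""
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(alpha, beta, gamma):
    """Assumed composition order: R = Rx(alpha) * Ry(beta) * Rz(gamma)."""
    return matmul(rot_x(alpha), matmul(rot_y(beta), rot_z(gamma)))

def to_camera(R, T, M):
    """Assumed form of Equation 1: camera coordinates P = R * M + T."""
    return [sum(R[i][k] * M[k] for k in range(3)) + T[i] for i in range(3)]
```

With α and γ read from the inclinometer, `rotation(alpha, beta, gamma)` depends only on the candidate value of β.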

The translation vector T, on the other hand, is given by Equation 4.
Here, R³ denotes the three-dimensional space in the coordinate system of the imaging device 3.

In the image G (shown in FIG. 4(a) and elsewhere), that is, in the two-dimensional coordinate system (C, i, j) of the video image, the coordinates at which a fixed point appears, that is, its position information m_i = (x_i, y_i), can be calculated from the pinhole camera model by Equation 5.
Here, f is the focal length of the imaging device 3 and is a known value.
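As a minimal illustration of the pinhole model mentioned above, the following Python sketch projects a point given in camera coordinates onto the image. Equation 5 is not reproduced in this text; the sketch assumes the standard form x = f·X/Z, y = f·Y/Z, with image coordinates measured from the point C, and the function name is hypothetical.

```python
def project(point_cam, f):
    """Project a camera-frame point (X, Y, Z) onto the image plane.

    Assumes the standard pinhole relations x = f*X/Z and y = f*Y/Z,
    with (x, y) measured from the image origin C.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)
```

For example, with f = 550 a point at (9, -1, 300) projects to x = 16.5 and y of about -1.83.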

<Orientation calculation using an optimization method>
Based on the premises above, the calculation unit calculates the orientation of the three-dimensional object 2, which requires solving a nonlinear function of second or higher order. In this embodiment, the solution is found with an optimization algorithm such as the Gauss-Newton method or the Levenberg-Marquardt algorithm. These methods normally require four fixed points, but because the detection result of the inclinometer 4 is used here, a solution can be obtained with three fixed points.

First, Equation 6 can be derived from Equation 5.
Therefore, once Z₀ and β are known, the orientation of the three-dimensional object 2 can be calculated.

If ψ_i is defined as a function R² → R, that is, a mapping from two dimensions to one dimension, then ψ_i can be expressed by Equation 7.

Z₀ and β, which are needed to calculate the orientation, are then obtained by solving the minimization problem for the following cost function σ.

In this embodiment, the Levenberg-Marquardt algorithm is used as the solver because of its high robustness, but other methods may be used.
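The two-parameter minimization over (Z₀, β) can be sketched in pure Python as below. This is an illustration under stated assumptions, not the cost function σ of the embodiment, whose defining equations are not reproduced in this text: the sketch substitutes the squared reprojection error of M₁ and M₂, assumes R = Rx(α)·Ry(β)·Rz(γ), and assumes that M₀ is the object origin so that T = Z₀·(x₀/f, y₀/f, 1). All function names are hypothetical, and the solver is a small damped Gauss-Newton loop in the Levenberg-Marquardt style with a finite-difference Jacobian.

```python
import math

def rotation(alpha, beta, gamma):
    """R = Rx(alpha) * Ry(beta) * Rz(gamma); this composition order is an assumption."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rx = [[1, 0, 0], [0, ca, -sa], [0, sa, ca]]
    ry = [[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]]
    rz = [[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]]
    def mul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    return mul(rx, mul(ry, rz))

def project_points(z0, beta, alpha, gamma, f, obj_pts, m0):
    """Predicted image positions of obj_pts for a candidate (Z0, beta)."""
    # Assumption: M0 is the object origin, so T = Z0 * (x0/f, y0/f, 1).
    t = (z0 * m0[0] / f, z0 * m0[1] / f, z0)
    r = rotation(alpha, beta, gamma)
    out = []
    for m in obj_pts:
        x, y, z = (sum(r[i][k] * m[k] for k in range(3)) + t[i] for i in range(3))
        if z <= 0:
            return None  # point behind the camera: candidate is invalid
        out.append((f * x / z, f * y / z))
    return out

def residuals(params, alpha, gamma, f, obj_pts, img_pts):
    """Reprojection residuals of M1 and M2 (a stand-in for the cost sigma)."""
    pred = project_points(params[0], params[1], alpha, gamma, f,
                          obj_pts[1:], img_pts[0])
    if pred is None:
        return [1e6] * (2 * (len(obj_pts) - 1))
    res = []
    for (px, py), (x, y) in zip(pred, img_pts[1:]):
        res += [px - x, py - y]
    return res

def levenberg_marquardt(fun, p0, iters=100, lam=1e-3, eps=1e-6):
    """Minimise sum(fun(p)**2) over two parameters with damped Gauss-Newton steps."""
    p = list(p0)
    r = fun(p)
    cost = sum(v * v for v in r)
    for _ in range(iters):
        cols = []  # finite-difference Jacobian, one column per parameter
        for j in range(2):
            q = list(p)
            q[j] += eps
            cols.append([(a - b) / eps for a, b in zip(fun(q), r)])
        a11 = sum(v * v for v in cols[0]) * (1 + lam)
        a22 = sum(v * v for v in cols[1]) * (1 + lam)
        a12 = sum(u * v for u, v in zip(cols[0], cols[1]))
        g1 = sum(u * v for u, v in zip(cols[0], r))
        g2 = sum(u * v for u, v in zip(cols[1], r))
        det = a11 * a22 - a12 * a12
        if det == 0:
            break
        d1 = (-g1 * a22 + g2 * a12) / det  # solve the damped 2x2 normal equations
        d2 = (-g2 * a11 + g1 * a12) / det
        trial = [p[0] + d1, p[1] + d2]
        rt = fun(trial)
        ct = sum(v * v for v in rt)
        if ct < cost:
            p, r, cost, lam = trial, rt, ct, lam / 10  # accept, relax damping
        else:
            lam *= 10                                   # reject, increase damping
        if abs(d1) + abs(d2) < 1e-12:
            break
    return p, cost
```

A caller would wrap `residuals` in a function of `params` only, for example `lambda p: residuals(p, alpha, gamma, f, obj_pts, img_pts)`, and pass it to `levenberg_marquardt` together with an initial guess for (Z₀, β).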

<Setting the initial values>
Solving the minimization problem requires initial values for Z₀ and β. Not just any initial values will do: the closer they are to the final solution, the faster and more accurately the iterative optimization search proceeds.

Specifically, the solution obtained for the immediately preceding frame (still image) of the moving image is given as the initial value. The orientation of the three-dimensional object 2 is unlikely to change greatly in the very short time between frames, so the solution for the preceding frame is usually close to the solution for the current frame.
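The frame-to-frame seeding described above amounts to a simple warm-start loop, sketched here in Python. The names `track_poses`, `solve_frame`, and `first_guess` are hypothetical; `solve_frame` stands for any iterative solver that takes a frame and an initial value and returns the solution for that frame.

```python
def track_poses(frames, solve_frame, first_guess):
    """Solve each frame in order, seeding the solver with the previous solution."""
    guess = first_guess
    poses = []
    for frame in frames:
        pose = solve_frame(frame, guess)  # iterative optimization for this frame
        poses.append(pose)
        guess = pose                      # warm start for the next frame
    return poses
```

Only the first frame needs a separately computed initial value, which is the case treated next.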

When a preceding frame exists, the initial value can thus be set easily, but the question of how to set the initial value for the first frame remains.

In this embodiment, the calculation unit sets that initial value by the following method.

Here, since the three-dimensional object 2 is stationary, the fixed points are positioned so that the straight line connecting two of the three fixed points is substantially vertical and the straight line connecting the remaining fixed point to one of those two is substantially horizontal. In other words, as shown in FIG. 3, the three fixed points are set so that the lines connecting them form a triangle with one side vertical and another side horizontal.

With this configuration, Z₀ can be expressed by Equation 9, using Thales' theorem and Equation 5.
Here, H is the point defined by (M₀M₁) // (Hm₀) and H ∈ (OM₁), as shown in FIG. 4(a). As Equation 9 shows, calculating Z₀ comes down to calculating the translation vector T, and for that it suffices to calculate ‖Hm₀‖.

‖Hm₀‖ can in turn be expressed by Equation 10.
Here, δ and s are determined by the value of α and the relative positions of y₁, y_c, and y₀; Table 1 shows the relationship.

For example, if θ < 0 (that is, α < 0) and y₁ ≤ y_c ≤ y₀ (see FIG. 5), then s = 1 and δ = |θ| - φ.

As can be seen intuitively from FIG. 4(b), the difference between α (obtained from the output of the inclinometer 4) and θ is considered to be very small. In this embodiment, therefore, Z₀ is calculated by first assuming that θ equals α, and the resulting Z₀ is used as the initial value for the calculation.

Next, β. To obtain this value, the rotation matrix R must first be calculated. From Equations 1, 3, and 4, Equation 11 can be derived, where the subscript i is 1 or 2.

The coefficients are defined as follows.

As this equation shows, there are two linear equations in the two unknowns cos β and sin β, so in this embodiment β is obtained using the fixed point M₁ or M₂.
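Recovering β from two linear equations in cos β and sin β can be sketched as follows. The coefficients of Equation 11 are not reproduced in this text, so the sketch uses a generic system a1·cos β + b1·sin β = c1, a2·cos β + b2·sin β = c2 with hypothetical coefficient names, solves it by Cramer's rule, and takes β from the two-argument arctangent.

```python
import math

def solve_beta(a1, b1, c1, a2, b2, c2):
    """Solve a1*cos(b) + b1*sin(b) = c1 and a2*cos(b) + b2*sin(b) = c2 for b."""
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("the two equations are (nearly) linearly dependent")
    cos_b = (c1 * b2 - c2 * b1) / det  # Cramer's rule for the first unknown
    sin_b = (a1 * c2 - a2 * c1) / det  # Cramer's rule for the second unknown
    return math.atan2(sin_b, cos_b)    # atan2 resolves the quadrant of beta
```

Using `atan2` rather than `acos` or `asin` keeps the sign of β, and it tolerates a solved pair (cos β, sin β) that is slightly off the unit circle because of noise.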

Next, more specific simulation results are described.
In this example, three fixed points were set so as to form a right isosceles triangle whose equal sides are each 10 cm long, and a model equivalent to the three-dimensional object 2 being stationary while the camera moves randomly, that is, a model in which the three-dimensional object 2 rotates randomly in the image, was set up virtually. The imaging device 3 was modeled as a pinhole camera (focal length 550 pixels, principal point at (305, 225)). The output signal of the inclinometer 4 was an ideal signal with added Gaussian noise of mean 0 and standard deviation 0.01. As imaging noise, Gaussian noise of mean 0 and standard deviation 0, 1, and 2 pixels, respectively, was superimposed.

The detailed simulation conditions and results are given below.
Condition 1: The position of the three-dimensional object 2 relative to the imaging device 3 was varied. Specifically, in the translation vector T = (Tx, Ty, Tz)^T, Tx = 9 cm and Ty = -1 cm were fixed and Tz/10 was varied from 15 to 50 in steps of 1.
Condition 2: The position and orientation of the three-dimensional object 2 relative to the imaging device 3 were varied. Specifically, in the translation vector T = (Tx, Ty, Tz)^T, Tz was fixed at 30 cm and Tx/10 was varied from 2 to 20 in steps of 1; at each step, Ty was set so that the angle between the translation vector T and the unit vector i varied from 0 to 2π in steps of π/4.
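The noise model and the Condition 1 sweep can be sketched in Python as below. The function names are hypothetical, and the unit of the inclinometer noise (standard deviation 0.01) is not stated in the text, so it is simply applied in whatever unit the angle is given.

```python
import random

def tz_sweep():
    """Condition 1: Tz/10 runs from 15 to 50 in steps of 1, so Tz = 150, 160, ..., 500."""
    return [10 * k for k in range(15, 51)]

def add_pixel_noise(point, sigma_px, rng):
    """Add zero-mean Gaussian imaging noise (standard deviation in pixels) to (x, y)."""
    x, y = point
    return (x + rng.gauss(0.0, sigma_px), y + rng.gauss(0.0, sigma_px))

def noisy_tilt(angle, rng, sigma=0.01):
    """Add zero-mean Gaussian noise to an ideal inclinometer reading (unit assumed)."""
    return angle + rng.gauss(0.0, sigma)
```

Passing a seeded `random.Random` instance as `rng` makes each trial repeatable, which is convenient when averaging errors over many trials.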

FIG. 6 shows an example of the simulation results for Condition 1, and FIG. 7 an example for Condition 2. The graphs plot on the vertical axis the error of the calculated orientation of the three-dimensional object 2 from its ideal value (mean error in degrees over 500 trials) and the error of the calculated position of the three-dimensional object 2 from its ideal value (mean translation error over 500 trials).

These results show that, even in a noisy environment, the error in the orientation angle β stays at or below 4° when the imaging device 3 is close to the three-dimensional object 2, and the errors in the other angles (α, γ) can be presumed to be correspondingly small. The position error is also very small and is not expected to cause problems in practice.

Next, simulation results comparing pose calculation by other conventional methods with pose calculation by the object recognition device 1 of this embodiment are shown. The other methods chosen, neither of which uses the output of the inclinometer 4, were a method called POSIT and a method based on the conventional Levenberg-Marquardt algorithm (denoted Opt.Algo.). The results are shown in Table 2. Each simulation result is the average of 1000 runs, with Gaussian noise of mean 0 and standard deviation 1 pixel superimposed as imaging noise.

With two or three fixed points, the conventional methods could not obtain a solution at all, whereas the object recognition device 1 obtains a solution with a probability of 60% for two fixed points, 93% for three fixed points, and 99% or more for four or more fixed points (these probabilities are not shown in Table 2). In Table 2, r_e denotes the mean orientation angle error and t_e the mean position error. As for the number of optimization iterations needed (the number of iterations until the solution settles within a given range), in a comparison with Opt.Algo. for four or more fixed points, for example, the object recognition device 1 needs 60 iterations or fewer, whereas Opt.Algo. needs 120 or more.

Accordingly, because the object recognition device 1 and the object recognition program use the angle information from the inclinometer 4, the reliability of the pose calculation result can be ensured even with two or three fixed points, and because few fixed points are needed, the calculation time can be shortened.

In the above embodiment in particular, the fixed points are set so that the straight line connecting two of the three is substantially vertical, and this arrangement is exploited to calculate a suitable initial value from one item of angle information from the inclinometer 4, which makes the effect even more pronounced.

The present invention is not limited to the above embodiment or example. For instance, the embodiment uses three fixed points, but two fixed points may be used if a slight loss of accuracy is acceptable, and four or more may be used to improve accuracy at the expense of calculation time. Even with four or more fixed points, as described in the example, accuracy is higher and calculation time shorter than with conventional methods that do not use the inclinometer 4.

If only three fixed points are set, one of them may be hidden depending on the imaging direction, making the calculation impossible. It is therefore better to set many fixed points (that is, four or more) so that at least three appear in the image whatever the imaging direction, and to have the information processing device 5 select three of the fixed points that appear in the image for the actual calculation of the orientation of the three-dimensional object 2. A recognition unit that identifies which fixed point each selected point is, distinguishing one fixed point from another, is of course required.
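Selecting three of the visible fixed points can be sketched as below. The text does not say how the three are chosen, so this sketch simply takes the three detected points with the lowest identifiers; the function name, the identifier format, and that selection rule are all assumptions.

```python
def select_fixed_points(detected, needed=3):
    """Pick `needed` fixed points from those detected in the current image.

    `detected` maps a fixed-point identifier (as reported by the recognition
    unit) to its image position (x, y). The lowest identifiers are chosen,
    which is an arbitrary rule used here for illustration only.
    """
    if len(detected) < needed:
        raise ValueError("not enough fixed points visible in this image")
    chosen = sorted(detected)[:needed]
    return {pid: detected[pid] for pid in chosen}
```

For example, if M0, M2, M4, and M7 are detected, the function returns the entries for M0, M2, and M4, and it raises an error when fewer than three fixed points are visible.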

Furthermore, by incorporating the object recognition device 1 or the object recognition program of the present invention, a real-time video manual system can be built for equipment that requires complicated operation, such as analytical instruments, information equipment, and system equipment: when the user films the equipment (that is, the three-dimensional object 2) with a video camera, the system recognizes the equipment in real time and indicates its operating locations and operating procedures on the image with arrows or the like.

In addition, the present invention can be modified in various ways without departing from its spirit.

FIG. 1 is an overall schematic diagram of an object recognition device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the coordinate systems in the embodiment.
FIG. 3 is an explanatory diagram showing how the fixed points are set in the embodiment.
FIG. 4 is an auxiliary diagram to aid understanding of the initial value setting in the embodiment.
FIG. 5 is an auxiliary diagram to aid understanding of the initial value setting in the embodiment.
FIG. 6 is a graph showing an example of the simulation results, with orientation (angle) error and position error on the vertical axis.
FIG. 7 is a graph showing an example of the simulation results, with orientation (angle) error and position error on the vertical axis.

Explanation of symbols

1 ... Object recognition device
2 ... Three-dimensional object
3 ... Imaging device
4 ... Inclinometer
5 ... Information processing device

Claims

1. An object recognition device that calculates the pose of a three-dimensional object using a plurality of fixed points whose positions relative to the three-dimensional object are fixed and known, the device comprising:
an imaging device that images the three-dimensional object on which at least two to three fixed points are set;
an inclinometer that detects the inclination of the imaging device about each of two horizontal axes; and
a calculation unit that extracts the fixed points from an image captured by the imaging device and calculates the pose of the three-dimensional object using, as parameters, the position information of the fixed points on the image and the angle information about the two axes output from the inclinometer.

2. The object recognition device according to claim 1, wherein
the calculation unit calculates the pose using an iterative optimization method,
three fixed points are used and positioned so that the straight line connecting two of the three fixed points is substantially vertical and the straight line connecting the remaining fixed point to one of those two fixed points is substantially horizontal, and
the calculation unit provisionally calculates the three-dimensional spatial coordinates of the fixed points, which should properly be calculated from the angle between the vertical axis and the image plane of the imaging device, by using the value of one item of angle information output from the inclinometer in place of that angle, and performs the iterative optimization with the provisional result as the initial value to calculate the three-dimensional spatial coordinates of the fixed points.

3. An object recognition program for use with an imaging device that images a three-dimensional object and an inclinometer that detects the inclination of the imaging device about each of two horizontal axes, the program causing a computer to function as a calculation unit that extracts the fixed points from a captured image of the three-dimensional object on which two to three fixed points are set and calculates the pose of the three-dimensional object using, as parameters, the position information of the fixed points on the image and the angle information about the two axes output from the inclinometer.