JP2000041173A

JP2000041173A - Deciding method for visual point position attitude, camera device and visual point position sensor

Info

Publication number: JP2000041173A
Application number: JP10208307A
Authority: JP
Inventors: Kiyohide Sato; 清秀佐藤; Hiroyuki Yamamoto; 裕之山本
Original assignee: MR SYSTEM KENKYUSHO KK; MR System Kenkyusho KK
Current assignee: MR SYSTEM KENKYUSHO KK; MR System Kenkyusho KK
Priority date: 1998-07-23
Filing date: 1998-07-23
Publication date: 2000-02-08
Anticipated expiration: 2018-07-23
Also published as: JP3976900B2

Abstract

PROBLEM TO BE SOLVED: To uniquely determine the camera parameters that represent the position and orientation of a camera. Images of three marks lying in a prescribed plane are captured with the camera and their image coordinates are obtained. Parameters that determine the camera position and orientation are then calculated and output from the depth information of the three marks and those image coordinates. SOLUTION: A camera parameter determining device consists of a depth estimation module, a coordinate detection module and a parameter estimation module. The image coordinates of three landmarks are acquired together with the depth information up to each landmark, and the matrix U' of C' = U'W'^{-1} is obtained. The camera position in AR (augmented reality) is determined in this way. The matrices U' and W' are given by expressions I and II, in which (x1, y1), (x2, y2) and (x3, y3) are the image coordinates of the three marks, z1, z2 and z3 are their depth information, and (X_W1, Y_W1, 0), (X_W2, Y_W2, 0) and (X_W3, Y_W3, 0) are their world coordinates.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a method for determining viewpoint position and orientation, a camera device, and a position and orientation sensor. More particularly, it relates to an improvement in methods that determine the viewpoint position of a camera from three marks.

[0002]

2. Description of the Related Art: In recent years, research on mixed reality (MR), which aims to merge real space and virtual space, has become very active. Among these techniques, the superimposed display of virtual-space information on real space is called "augmented reality" (AR).

[0003] Means of realizing AR fall roughly into two classes. One uses a see-through HMD (head-mounted display) to superimpose images of virtual objects on the real-world scene seen through the display surface; this is called the "optical see-through" method. The other draws virtual objects over images shot with a video camera; this is called the "video see-through" method. In either method, a natural fusion of the two spaces requires addressing factors such as "alignment" (registration), "consistency of image quality", and "three-dimensional composition" (rendering of occlusion order and collisions). Of these, alignment is the most basic and important factor in realizing AR.

[0004] Alignment in AR basically means measuring or estimating parameters such as the position and orientation of the observer's viewpoint (in the optical see-through case) or of the camera (in the video see-through case). Two broad approaches are used. One is a "sensor-based" approach that uses a three-dimensional position and orientation sensor such as a magnetic or ultrasonic sensor. The other is an "image-based" approach used mainly in video see-through AR.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION: The sensor-based alignment approach is excellent in operational stability, but its accuracy is often insufficient for AR. The image-based approach, on the other hand, uses the image information of the real scene to be merged directly for alignment and can therefore achieve high accuracy; the various camera calibration methods studied in computer vision could be applied here. In AR, however, all processing must be implemented under real-time constraints, and such algorithms are prone to errors in landmark extraction and identification, which makes operation unstable.

[0006] A conventional alignment method is described below. For simplicity, the projection from the camera coordinate system to the image coordinate system is assumed to follow an ideal perspective projection model. That is, factors such as image distortion, offset of the image center and aspect ratio are assumed to have been measured in advance and removed at the stage of image coordinate extraction.

[0007] First, the basic form of camera parameter estimation is described. Suppose that a landmark Q_i in three-dimensional space, with world coordinates Q_Wi = (X_Wi, Y_Wi, Z_Wi, 1), is imaged by the camera at image coordinates q_i = (x_i, y_i). Assuming perspective projection for this camera, the projection can be expressed with a 3 × 4 transformation matrix C as

[0008]

$$h_i \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix} = C \begin{pmatrix} X_{Wi} \\ Y_{Wi} \\ Z_{Wi} \\ 1 \end{pmatrix}, \qquad C = \{a_{jk}\} \tag{1}$$

where h_i is a parameter. Expanding equation (1) gives

[0009]

$$\begin{aligned} h_i x_i &= a_{11}X_{Wi} + a_{12}Y_{Wi} + a_{13}Z_{Wi} + a_{14} \\ h_i y_i &= a_{21}X_{Wi} + a_{22}Y_{Wi} + a_{23}Z_{Wi} + a_{24} \\ h_i &= a_{31}X_{Wi} + a_{32}Y_{Wi} + a_{33}Z_{Wi} + a_{34} \end{aligned} \tag{2}$$

The parameter h_i can be eliminated using the third equation of (2). Since the world coordinates (X_Wi, Y_Wi, Z_Wi) of a landmark are known and its observed coordinates (x_i, y_i) have been obtained on the image, each pair of world and observed coordinates for one landmark yields two equations, the first and second equations of (2).

[0010] Since the matrix C is 3 × 4, it has 12 unknowns, namely its elements. Because one landmark gives two equations, C can be determined if six or more known landmarks (i = 1, 2, ..., 6) that do not all lie in one plane are observed in the image. How to find C is the problem of camera parameter estimation, that is, the alignment problem.

[0011] Estimating the camera parameters using depth information has also been proposed; such a method is described below. The parameter h_i in equation (1) is proportional to the depth value Z_Ci of landmark Q_i in the camera coordinate system and, for some constant k, can be written as

[0012]

$$h_i = k\,Z_{Ci} \tag{3}$$

Any value of k may be chosen as long as this proportional relation holds. Now suppose that, as a measure of the depth of landmark Q_i, a value z_i satisfying

[0013]

$$z_i = k\,Z_{Ci} \tag{4}$$

has been obtained. Substituting z_i for h_i in equation (1) then gives the following three equations for one landmark:

[0014]

$$\begin{aligned} x_i z_i &= a_{11}X_{Wi} + a_{12}Y_{Wi} + a_{13}Z_{Wi} + a_{14} \\ y_i z_i &= a_{21}X_{Wi} + a_{22}Y_{Wi} + a_{23}Z_{Wi} + a_{24} \\ z_i &= a_{31}X_{Wi} + a_{32}Y_{Wi} + a_{33}Z_{Wi} + a_{34} \end{aligned} \tag{5}$$

Now suppose that the world coordinates of four (or more) landmarks that do not lie in one plane are given as

$$(X_{W1}, Y_{W1}, Z_{W1}),\ (X_{W2}, Y_{W2}, Z_{W2}),\ (X_{W3}, Y_{W3}, Z_{W3}),\ (X_{W4}, Y_{W4}, Z_{W4}) \tag{6}$$

and that the values

$$(x_1, y_1, z_1),\ (x_2, y_2, z_2),\ (x_3, y_3, z_3),\ (x_4, y_4, z_4) \tag{7}$$

are observed in the image coordinate system. Writing

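[0015]

$$U = \begin{pmatrix} x_1 z_1 & x_2 z_2 & x_3 z_3 & x_4 z_4 \\ y_1 z_1 & y_2 z_2 & y_3 z_3 & y_4 z_4 \\ z_1 & z_2 & z_3 & z_4 \end{pmatrix} \tag{8}$$

[0016]

$$W = \begin{pmatrix} X_{W1} & X_{W2} & X_{W3} & X_{W4} \\ Y_{W1} & Y_{W2} & Y_{W3} & Y_{W4} \\ Z_{W1} & Z_{W2} & Z_{W3} & Z_{W4} \\ 1 & 1 & 1 & 1 \end{pmatrix} \tag{9}$$

equation (5) can be expressed as

$$U = CW \tag{10}$$

so the matrix C is obtained from the following equation.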

[0017]

$$C = UW^{-1} \tag{11}$$

Here W^{-1} is the inverse of the matrix W formed from the world coordinates of the known landmarks, and it can be computed in advance. In the conventional approach, therefore, the problem of estimating the camera parameters (C = {a_ij}) reduces to the problem of how to obtain the matrix U, that is, the image coordinates (x_i, y_i) of the four landmarks and their depth measures z_i.

[0018] As one way of obtaining the landmark depth measure z_i, Mellor (J. P. Mellor: Realtime camera calibration for enhanced reality visualization, Proc. CVRMed 95, pp. 471-475, 1995) proposed a method that uses the apparent size of the landmarks. Mellor's method exploits the fact that the apparent size s_i of a landmark is inversely proportional to the distance from the viewpoint to the landmark. The reciprocal 1/s_i obtained in this way is used as the depth measure z_i, and alignment is performed with four landmarks.

[0019] As described above, even when landmark depth information is used to estimate the camera parameters, four landmarks are required. Camera parameters can also be estimated by imposing constraints on the geometric configuration. Nakazawa et al. (Nakazawa, Nakano, Komatsu, Saito: "A moving-image synthesis system of real images and CG images based on feature points in images", Journal of the Institute of Image Information and Television Engineers, Vol. 51, No. 7, pp. 1086-1095, 1997) proposed a method that estimates the camera parameters using the Z = 0 plane. That is, assuming that all landmarks are placed on the plane Z = 0 of the world coordinate system, it imposes the constraint that the four landmarks lie in the same plane and then estimates the camera parameters. Okuma et al. (Okuma, Kiyokawa, Takemura, Yokoya: "Real-time estimation of camera parameters from real images for video see-through augmented reality", IEICE Technical Report, PRMU97-113, 1997) further simplified the method of Nakazawa et al. by treating the focal length as known.

[0020]

PROBLEMS TO BE SOLVED BY THE INVENTION: However, all three of the prior-art methods above require four landmarks, which is a problem for real-time processing. Methods using three landmarks have been proposed (for example, by Fisher), but they require solving complicated nonlinear equations that have multiple solutions, so a unique solution cannot be obtained. It is no exaggeration to say that the camera parameters could not be determined in this way.

[0021] SUMMARY OF THE INVENTION: The present invention has been made in view of these drawbacks of the prior art. Its object is to propose a method for determining a viewpoint position, a camera device, and a method for detecting the position and orientation of a camera, each of which can uniquely determine the camera parameters representing the position and orientation of the camera from three landmarks.

[0022]

MEANS FOR SOLVING THE PROBLEMS: To achieve the above object, the method according to claim 1 of the present invention acquires, using a camera, an image of three marks placed in a predetermined plane; acquires the image coordinates of each of the three marks in that image; obtains depth information for the three marks; and calculates and outputs parameters for determining the position and orientation of the camera based on the image coordinates and depth information obtained for the marks.

[0023] The present invention is applicable to various combinations of cameras and sensors. According to claim 2, a preferred aspect of the present invention, the depth information of the three marks is detected based on a monocular camera and the output of a position and orientation sensor mounted on that camera. According to claim 3, another preferred aspect of the present invention, the depth information of the three marks is obtained from a stereo camera and the output of a three-dimensional position and orientation sensor.

[0024] Depth information need not always be obtained completely for every mark. If the purpose is merely display, then, as in claim 4, a preferred aspect of the present invention, depth information for a virtual mark is set when depth information is obtained for at least one mark. According to claim 5, a preferred aspect of the present invention, the three marks are chosen as marks on the Z = 0 plane in world coordinates that do not lie on one straight line.

[0025] According to claim 6, a preferred aspect of the present invention, when the three marks, which do not lie on one straight line, are chosen on an arbitrary plane other than Z = 0 in world coordinates, a transformation matrix from that plane to the Z = 0 plane is obtained, and the camera parameters are determined using the world coordinates of the landmarks transformed by this matrix.

[0026] The above object is also achieved by the camera device of claim 6. For example, a camera device that outputs position and orientation data together with an image comprises a camera that captures an image of three marks placed in a predetermined plane, means for calculating the coordinates of the mark images captured by the camera, means for obtaining depth information for the three marks, and means for calculating parameters for determining the position and orientation of the camera based on the image coordinates and depth information obtained for the marks.

[0027] The above object can also be achieved by a head position and orientation sensor. As in claim 7, such a sensor comprises a camera mounted near the head that captures an image of three marks placed in a predetermined plane, means for obtaining the image coordinates of the marks captured by the camera, means for obtaining depth information for the three marks, and means for calculating and outputting parameters for determining the position and orientation of the camera based on the image coordinates and depth information obtained for the marks.

[0028] The above object is also achieved, as in claim 8, as follows. Let the image coordinates of the three marks be (x_1, y_1), (x_2, y_2), (x_3, y_3), their depth information be z_1, z_2, z_3, and their world coordinates be (X_W1, Y_W1, 0), (X_W2, Y_W2, 0), (X_W3, Y_W3, 0), and set

[0029]

$$U' = \begin{pmatrix} x_1 z_1 & x_2 z_2 & x_3 z_3 \\ y_1 z_1 & y_2 z_2 & y_3 z_3 \\ z_1 & z_2 & z_3 \end{pmatrix}$$

[0030]

$$W' = \begin{pmatrix} X_{W1} & X_{W2} & X_{W3} \\ Y_{W1} & Y_{W2} & Y_{W3} \\ 1 & 1 & 1 \end{pmatrix}$$

[0031]

The camera parameter matrix is then obtained from the matrix C' given by

$$C' = U' W'^{-1}$$

[0032]

DESCRIPTION OF THE EMBODIMENTS: Embodiments of the present invention are described below with reference to the accompanying drawings. The inventors regard the method of these embodiments as an extension of the method of Nakazawa et al. It enables alignment, that is, estimation of the camera parameters, using three landmarks.

[0033] First, the method of Nakazawa et al. is systematized from the inventors' point of view.

<Estimation Using the Z = 0 Plane> If the Z coordinates of all landmarks in the world coordinate system are 0, the coordinate transformation that describes the projection of the landmarks onto the image can be expressed by a 3 × 3 matrix alone, obtained by omitting the third column (the component for the Z coordinate) of the matrix C in equation (1). This matrix is denoted C'. The projection of a landmark from the world coordinate system to the image coordinate system is then simpler than equation (1) and can be written as

[0034]

$$h_i \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix} = C' \begin{pmatrix} X_{Wi} \\ Y_{Wi} \\ 1 \end{pmatrix} \tag{12}$$

Expanding this equation gives

[0035]

$$\begin{aligned} h_i x_i &= a_{11}X_{Wi} + a_{12}Y_{Wi} + a_{14} \\ h_i y_i &= a_{21}X_{Wi} + a_{22}Y_{Wi} + a_{24} \\ h_i &= a_{31}X_{Wi} + a_{32}Y_{Wi} + a_{34} \end{aligned} \tag{13}$$

Substituting the third equation of (13) into the first and second to eliminate the parameter h_i gives two equations for each landmark (X_Wi, Y_Wi). If a_34 = 1, the matrix C' has eight unknowns, a_11, a_12, a_14, a_21, a_22, a_24, a_31 and a_32, so C' can be obtained by observing four or more landmarks (X_Wi, Y_Wi) (i = 1, 2, 3, 4).
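Under the Z = 0 constraint, the eight unknowns of C' (with a_34 = 1) follow from four landmarks by solving the linear system obtained from equation (13). A minimal NumPy sketch, assuming exact synthetic observations; numpy.linalg.lstsq is used so that more than four landmarks could be passed unchanged (for four landmarks it reduces to an exact solve). The function name estimate_C_prime and the test camera are hypothetical.

```python
import numpy as np


def estimate_C_prime(XY, xy):
    """Estimate the 3 x 3 matrix C' (with a34 = 1) from >= 4 landmarks on the plane Z = 0."""
    A, b = [], []
    for (X, Y), (x, y) in zip(XY, xy):
        # h_i eliminated from (13) with a34 = 1; unknowns ordered as
        # (a11, a12, a14, a21, a22, a24, a31, a32).
        A.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y])
        b.append(x)
        A.append([0, 0, 0, X, Y, 1, -y * X, -y * Y])
        b.append(y)
    a = np.linalg.lstsq(np.array(A, dtype=float), np.array(b, dtype=float), rcond=None)[0]
    return np.array([[a[0], a[1], a[2]],
                     [a[3], a[4], a[5]],
                     [a[6], a[7], 1.0]])


# Synthetic ground truth: focal length f, camera tilted about its x axis.
f, t = 800.0, np.array([0.3, -0.2, 4.0])
c, s = np.cos(0.4), np.sin(0.4)
R = np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
M = np.vstack([np.hstack([R, t[:, None]]), [0, 0, 0, 1]])
C_true = np.array([[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]]) @ M
Cp_true = C_true[:, [0, 1, 3]]             # drop the Z column: the matrix C'

XY = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)   # no three collinear
h = np.hstack([XY, np.ones((4, 1))]) @ Cp_true.T
xy = h[:, :2] / h[:, 2:]
Cp_est = estimate_C_prime(XY, xy)
print(np.allclose(Cp_est, Cp_true / Cp_true[2, 2]))
```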

[0036] The camera parameters C can be estimated by obtaining the third-column components (a_13, a_23, a_33) of C from the matrix C' obtained in this way. The procedure for obtaining C from C' is described in more detail below. In general, the matrix C (3 × 4), which represents the coordinate transformation from the world coordinate system to the image coordinate system, can be written as follows, using the perspective transformation matrix P (3 × 4) from the camera coordinate system to the image coordinate system, where f is the focal length of the camera, and the coordinate transformation matrix M (4 × 4) from the world coordinate system to the camera coordinate system:

[0037]

$$C = PM, \qquad P = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tag{14}$$

Similarly, the matrix C' can be written using P and the matrix M' (4 × 3), obtained by omitting the third column of M, as

[0038]

$$C' = PM' \tag{15}$$

Since each element of C' has been obtained as described above, each element of M' can easily be obtained from C' if the focal length f of the camera is known. The third column of M represents the z axis of the camera coordinate system and can be obtained as a vector orthogonal to the two vectors represented by the first column (x axis) and the second column (y axis) of M (that is, of M'). M can therefore be estimated from M', and substituting the M obtained in this way into equation (14) gives the matrix C representing the camera parameters. In other words, the camera parameters C can be obtained by constraining four landmarks to the Z = 0 plane.

[0039] <Estimation of the Camera Parameter Matrix C Using Three Points> As shown in FIG. 1, the projection of three landmarks (Q_1, Q_2, Q_3) from the world coordinate system to the image coordinate system can be written, as in equation (12), as

[0040]

$$h_i \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix} = C' \begin{pmatrix} X_{Wi} \\ Y_{Wi} \\ 1 \end{pmatrix}, \qquad i = 1, 2, 3 \tag{16}$$

The parameter h_i in this equation is proportional to the depth value Z_Ci of landmark Q_i (Q_1, Q_2, Q_3) in the camera coordinate system and, for some constant k, can be written as

[0041]

$$h_i = k\,Z_{Ci} \tag{17}$$

Any value of k may be chosen as long as this proportional relation holds. Now suppose that, as a measure of the depth of landmark Q_i, values z_i (z_1, z_2, z_3) satisfying

[0042]

$$z_i = k\,Z_{Ci} \tag{18}$$

have been obtained. Substituting z_i for h_i in equation (16) then gives the following three equations for each landmark:

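[0043]

$$\begin{aligned} x_i z_i &= a_{11}X_{Wi} + a_{12}Y_{Wi} + a_{14} \\ y_i z_i &= a_{21}X_{Wi} + a_{22}Y_{Wi} + a_{24} \\ z_i &= a_{31}X_{Wi} + a_{32}Y_{Wi} + a_{34} \end{aligned} \tag{19}$$

When three or more landmarks that do not lie on one straight line are observed, writing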

[0044]

$$U' = \begin{pmatrix} x_1 z_1 & x_2 z_2 & x_3 z_3 \\ y_1 z_1 & y_2 z_2 & y_3 z_3 \\ z_1 & z_2 & z_3 \end{pmatrix} \tag{20}$$

[0045]

$$W' = \begin{pmatrix} X_{W1} & X_{W2} & X_{W3} \\ Y_{W1} & Y_{W2} & Y_{W3} \\ 1 & 1 & 1 \end{pmatrix} \tag{21}$$

the relation of equation (16) can be expressed as

[0046]

$$U' = C'W' \tag{22}$$

Therefore the matrix C', the 3 × 3 matrix obtained by omitting the third column (the component for the Z coordinate) of C, can be obtained by

[0047]

$$C' = U'W'^{-1} \tag{23}$$

The camera parameters C can then be obtained from this C' in the same way as in the method described above. That is, letting P (3 × 4) be the perspective transformation matrix from the camera coordinate system to the image coordinate system and M (4 × 4) the coordinate transformation matrix from the world coordinate system to the camera coordinate system,

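[0048]

$$P = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tag{24}$$

$$M = \begin{pmatrix} R & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{pmatrix} \tag{25}$$

where R is a 3 × 3 rotation and t a translation, and the matrix C (3 × 4) can be written as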

[0049]

$$C = PM \tag{26}$$

Likewise, the matrix C' can be written as

[0050]

$$C' = PM' \tag{27}$$

where M' (4 × 3) is M with its third column omitted. If the focal length f of the camera is known, the elements of M' can easily be obtained from C', as described above. The third column of M can then be obtained as a vector orthogonal to the two vectors represented by the first column (x axis) and the second column (y axis) of M (that is, of M'). M can therefore be estimated from M', and substituting the M obtained in this way into equation (26) gives the matrix C representing the camera parameters. In other words, the camera parameters C could be obtained by constraining the three landmarks to the Z = 0 plane.
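Combining the three-point formula C' = U'W'^{-1} of equation (23) with the recovery of C described above gives the following end-to-end NumPy sketch. It assumes noise-free data, a known focal length, and a hypothetical depth sensor that reports z_i = k Z_Ci for an arbitrary k. The recovery step repeats the scale and sign normalization of a sketch that assumes the world origin is in front of the camera. Since det W' equals twice the signed area of the triangle formed by the three landmarks, it is zero exactly when they lie on one straight line; the sketch rejects that case. The function names are hypothetical.

```python
import numpy as np


def three_point_C_prime(xy, z, XY):
    """C' = U' W'^{-1} from three landmarks on Z = 0 with depth measures z_i = k Z_Ci."""
    xy, z, XY = (np.asarray(v, dtype=float) for v in (xy, z, XY))
    Wp = np.vstack([XY[:, 0], XY[:, 1], np.ones(3)])
    # det(W') is twice the signed area of the landmark triangle: zero means collinear marks.
    if abs(np.linalg.det(Wp)) < 1e-12:
        raise ValueError("the three landmarks must not lie on one straight line")
    Up = np.vstack([xy[:, 0] * z, xy[:, 1] * z, z])
    return Up @ np.linalg.inv(Wp)


def recover_C(Cp, f):
    """Full 3 x 4 matrix C from C' (any scale) and known focal length f; origin assumed in front."""
    G = np.diag([1.0 / f, 1.0 / f, 1.0]) @ Cp
    G = G / (0.5 * (np.linalg.norm(G[:, 0]) + np.linalg.norm(G[:, 1])))
    if G[2, 2] < 0:
        G = -G
    M = np.eye(4)
    M[:3, :3] = np.column_stack([G[:, 0], G[:, 1], np.cross(G[:, 0], G[:, 1])])
    M[:3, 3] = G[:, 2]
    return np.array([[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]]) @ M


# Synthetic check with a hypothetical depth sensor reporting z_i = k Z_Ci, k = 0.01.
f, k = 800.0, 0.01
c, s = np.cos(0.5), np.sin(0.5)
M_true = np.eye(4)
M_true[:3, :3] = [[c, 0, s], [0, 1, 0], [-s, 0, c]]
M_true[:3, 3] = [-0.3, 0.2, 4.0]
C_true = np.array([[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]]) @ M_true

XY = np.array([[0.0, 0.0], [1.0, 0.0], [0.2, 0.8]])           # three marks on Z = 0
h = np.hstack([XY, np.zeros((3, 1)), np.ones((3, 1))]) @ C_true.T
xy, z = h[:, :2] / h[:, 2:], k * h[:, 2]                       # h_i = Z_Ci for this P
Cp_est = three_point_C_prime(xy, z, XY)
C_est = recover_C(Cp_est, f)
print(np.allclose(Cp_est, k * C_true[:, [0, 1, 3]]), np.allclose(C_est, C_true))
```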

[0051] The matrix W'^{-1} in equation (23) is formed from the world coordinates of the three known landmarks and can be computed in advance. The camera parameter estimation problem therefore reduces to obtaining the matrix U', that is, the image coordinates of the three landmarks and their depth measures z_i. The three landmarks Q_i always lie in one plane, but, as shown in FIG. 2, that plane may not be the Z = 0 plane of the world coordinate system. Even in that case, a coordinate transformation matrix N (4 × 4) from the plane containing the three landmarks Q_i to the Z = 0 plane always exists and is easy to obtain. The world coordinates of each landmark transformed by such a matrix N,

$$P^{N}_{Wi} = N Q_i \tag{28}$$

must therefore satisfy equations (16) through (27). That is, if C^(N) denotes the camera parameter matrix obtained by solving equations (16) through (27) for the transformed coordinates P^N_Wi, the correct camera parameters C are derived as

$$C = C^{(N)} N \tag{29}$$
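The plane transformation of equations (28) and (29) can be illustrated with the sketch below. One convenient choice of N, assumed here rather than taken from the publication, is the rigid transform whose new origin is Q_1 and whose new Z axis is the unit normal of the plane. After applying it, the three-point procedure runs on the transformed coordinates, and the result is mapped back with C = C^(N) N. The helpers plane_to_z0 and recover_C are hypothetical, and the data are synthetic.

```python
import numpy as np


def plane_to_z0(Q):
    """One rigid 4 x 4 matrix N mapping the plane through Q[0], Q[1], Q[2] onto Z = 0."""
    e1 = (Q[1] - Q[0]) / np.linalg.norm(Q[1] - Q[0])
    n = np.cross(Q[1] - Q[0], Q[2] - Q[0])
    n = n / np.linalg.norm(n)
    Rn = np.vstack([e1, np.cross(n, e1), n])   # rows: new X, Y, Z axes in world coordinates
    N = np.eye(4)
    N[:3, :3] = Rn
    N[:3, 3] = -Rn @ Q[0]                      # Q[0] becomes the new origin
    return N


def recover_C(Cp, f):
    """Full 3 x 4 C from C' (any scale) and focal length f; origin assumed in front of camera."""
    G = np.diag([1.0 / f, 1.0 / f, 1.0]) @ Cp
    G = G / (0.5 * (np.linalg.norm(G[:, 0]) + np.linalg.norm(G[:, 1])))
    if G[2, 2] < 0:
        G = -G
    M = np.eye(4)
    M[:3, :3] = np.column_stack([G[:, 0], G[:, 1], np.cross(G[:, 0], G[:, 1])])
    M[:3, 3] = G[:, 2]
    return np.array([[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]]) @ M


# Synthetic camera and three landmarks on a plane that is not Z = 0.
f = 800.0
c, s = np.cos(0.3), np.sin(0.3)
M_true = np.eye(4)
M_true[:3, :3] = [[1, 0, 0], [0, c, -s], [0, s, c]]
M_true[:3, 3] = [0.1, -0.3, 5.0]
C_true = np.array([[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]]) @ M_true
Q = np.array([[0.2, 0.1, 0.5], [1.0, 0.3, 0.9], [0.4, 1.2, 1.1]])
Qh = np.hstack([Q, np.ones((3, 1))])
h = Qh @ C_true.T
xy, z = h[:, :2] / h[:, 2:], 0.1 * h[:, 2]          # depth measures z_i = k Z_Ci, k = 0.1

N = plane_to_z0(Q)
QN = (Qh @ N.T)[:, :3]                               # transformed landmarks: Z components ~ 0
Wp = np.vstack([QN[:, 0], QN[:, 1], np.ones(3)])
Up = np.vstack([xy[:, 0] * z, xy[:, 1] * z, z])
C_N = recover_C(Up @ np.linalg.inv(Wp), f)           # camera matrix C^(N) in the plane frame
C_est = C_N @ N                                      # equation (29): C = C^(N) N
print(np.allclose(QN[:, 2], 0), np.allclose(C_est, C_true))
```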

[0052]

EXAMPLES: FIG. 3 shows the configuration of the camera parameter determining device of the embodiment. As shown in the figure, the device consists of a depth estimation module 100, a coordinate detection module 200 and a parameter estimation module 300. As described above, the essence of the present invention is to determine the camera position in AR by obtaining the image coordinates of three landmarks and the depth information up to those landmarks, and thereby the matrix U' of equation (23). To obtain the depth information, the device of FIG. 3 can be equipped with, for example, a three-dimensional position and orientation sensor (magnetic sensor) and one or more cameras. The operation of the device therefore differs depending on whether a three-dimensional sensor is connected, how many cameras are connected, and whether the target landmarks are imaged well enough for their depth information to be obtained. The operation of the device is described below for various configurations of the input devices.

[0053] The determining device of the embodiment can be implemented in software or in hardware; the configuration of FIG. 3 is merely one example.

<Alignment by Stereo> First Embodiment

The first embodiment is a method of determining the camera parameters when the device of FIG. 3 has a stereo camera for inputting images of the landmarks.

[0054] In video see-through AR, presenting parallax images to the observer's left and right eyes requires mounting a stereo camera on the HMD and performing alignment for the image of each camera. The first embodiment uses the information obtained from these two cameras as a cue for alignment. When alignment is performed with a stereo camera, the distance information z_i to each landmark is obtained by finding the correspondence of the landmarks between the images from the two cameras.

[0055] For simplicity, assume that the two stereo cameras are normalized as follows: their optical axes are parallel to each other and perpendicular to the baseline, and their epipolar lines are parallel to the x-axis of the image coordinate system. Suppose that landmark Q_i is observed as the point q^R_i = (x^R_i, y^R_i) in the right image and as the point q^L_i = (x^L_i, y^L_i) in the left image (where y^R_i = y^L_i). Then, as shown in FIG. 4, the disparity d_i (= x^L_i - x^R_i) between the corresponding points is inversely proportional to the depth value Z_Ci of Q_i.
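For this rectified geometry, the inverse proportionality is commonly written Z_Ci = f·B/d_i. Here f is the focal length in pixels and B is the baseline length. The text names neither f nor B, and Eq. (30) itself is not reproduced here, so the following Python sketch is an illustration under those assumptions:

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline):
    """Depth Z = f * B / d of a landmark seen in a rectified stereo pair.

    x_left, x_right: x-coordinates (pixels) of the corresponding points,
    which share the same y-coordinate after rectification.
    """
    d = x_left - x_right  # disparity d_i = x^L_i - x^R_i
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline / d
```

Halving the disparity doubles the estimated depth, which is the inverse proportionality described above.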

[0056] [Equation 29] …(30)

Therefore, obtain the corresponding points of the three landmarks and set the matrix U as

[0057] [Equation 30] …(31)

This gives the parameters of the right camera, that is, the coordinate transformation matrix C^R.
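Under the same rectified assumption, the left-camera matrix can be derived from C^R. The sketch below assumes:

- C^R is the 3×3 matrix that maps (X_W, Y_W, 1) to (x·z, y·z, z);
- both cameras share the same intrinsic parameters;
- the left camera is displaced by the baseline B along the negative x-axis of the right camera.

Then x^L·z = x^R·z + f·B, while the y and z rows are unchanged. Only the constant term of the first row differs. This illustrates the idea behind Eqs. (32) and (33), which are not reproduced here. The symbols f and B and the function name are hypothetical.

```python
def left_from_right(C_right, focal_px, baseline):
    """Left-camera matrix C^L derived from the right-camera matrix C^R.

    Assumes a rectified pair with shared intrinsics, the left camera at
    -baseline along the right camera's x-axis, so x_L*z = x_R*z + f*B.
    """
    C_left = [row[:] for row in C_right]  # copy; y and z rows are unchanged
    C_left[0][2] += focal_px * baseline   # constant term of the x*z row
    return C_left
```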

[0058] Further, the following relation holds:

[Equation 31] …(32)

Since it holds, the parameters of the left camera, that is, the coordinate transformation matrix C^L, can easily be obtained as

[0059] [Equation 32] …(33)

Here, a^R_jk denotes the elements of the coordinate transformation matrix C^R of the right camera. The optical axes of the stereo cameras may also converge instead of being parallel. Even then, the depth value Z^R_Ci of landmark Q_i in the right camera coordinate system can easily be obtained from its correspondence in the stereo images. This requires that the perspective transformation matrix P (3×4) is known and that the relative position between the cameras is given. Therefore, by setting the matrix U as

[0060] [Equation 33] …(34)

the coordinate transformation matrix C^R of the right camera is obtained.

<Registration Using a Monocular Image and a Sensor> Second Embodiment

In the first embodiment, the present invention was applied to a system in which the apparatus of FIG. 3 has a stereo camera for capturing images of the landmarks. The second embodiment is a method of determining camera parameters when the present invention is applied to a system having a monocular camera and a three-dimensional position and orientation sensor.

[0061] Image-based and sensor-based registration methods each have weaknesses that the other can offset. To exploit this, attempts have been made to perform registration using both image and sensor information. There are two approaches. One uses sensor information to stabilize image-based registration. The other treats sensor-based registration as primary and corrects its error using image information. In the second embodiment, a method of correcting the misregistration of sensor-based registration using monocular image information is described below.

[0062] <Case Where Three Points Are Observed> Embodiment 2-1

Assume that three landmarks are extracted from the image. As described above, if depth information is available for each landmark, Eq. (23) can be solved using the three landmarks. Here, the three-dimensional position and orientation sensor provides approximate camera position and orientation information. The depth information of each landmark is therefore derived from this information.

[0063] Now assume that the landmarks Q_i (i = 1, 2, 3) are extracted as points q_i = (x_i, y_i) (i = 1, 2, 3) in the image. The three-dimensional position and orientation sensor gives the camera position and orientation M^WC, expressed as a 4×4 coordinate transformation matrix from the world coordinate system to the camera coordinate system. Using M^WC, the camera coordinates of landmark Q_i can be estimated as

[0064] [Equation 34] …(35)

The Z component Z^(c)_Ci of this estimate is used as the depth information of landmark Q_i. As shown in FIG. 5, assume that landmarks Q1, Q2 and Q3 are observed in the image. The matrix U can then be set as follows, based on the image coordinates of each landmark and the depth information obtained from Eq. (35):

[0065] [Equation 35] …(36)

The camera parameter matrix C obtained from this matrix U is a corrected version of the camera parameter matrix C^(c) (= P M^WC) derived from the three-dimensional position and orientation sensor output. The correction removes the misregistration at the three landmarks.
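The sensor-based depth of [0063] and [0064] can be sketched as follows. The sketch assumes that Eq. (35) transforms the homogeneous world coordinates of Q_i by M^WC, and that the camera looks along its +Z axis. The helper names and the column layout of U, (x_i·z_i, y_i·z_i, z_i), are assumptions for illustration.

```python
def camera_coords(M_wc, q_w):
    """Camera coordinates of world point q_w under the 4x4 world-to-camera matrix M_wc."""
    h = [sum(M_wc[r][k] * v for k, v in enumerate((*q_w, 1.0))) for r in range(4)]
    return tuple(h[r] / h[3] for r in range(3))


def sensor_depths(M_wc, world_pts):
    """Depth Z^(c)_Ci of each landmark: the Z component of its predicted camera coordinates."""
    return [camera_coords(M_wc, q)[2] for q in world_pts]


def build_U(image_pts, depths):
    """Assumed layout of U: column i is (x_i * z_i, y_i * z_i, z_i)."""
    return [[x * z for (x, _), z in zip(image_pts, depths)],
            [y * z for (_, y), z in zip(image_pts, depths)],
            [float(z) for z in depths]]
```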

[0066] <Case Where Two Points Are Observed> Embodiment 2-2

Assume that two landmarks are extracted from the image. In this case, a third landmark is set virtually, and the camera parameters can then be estimated in the same way as above. As shown in FIG. 6, assume that landmarks Q1 and Q2 are observed. The third (virtual) landmark Q3 is assumed to lie at a point Q_W3 on the Z = 0 plane that is not collinear with Q_W1 and Q_W2. The depth values Z^(c)_Ci of landmarks Q1, Q2 and Q3 are obtained from Eq. (35). The projected coordinates (x^(c)_3, y^(c)_3) of landmark Q3 on the image plane are then estimated by the following equations.

[0067] [Equation 36] …(37)

[0068] [Equation 37] …(38)

Using these, the matrix U is set as in Eq. (36). The camera parameter matrix C obtained in this way is a corrected version of the camera parameter matrix derived from the three-dimensional position and orientation sensor output. The correction removes the misregistration at the two landmarks.
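One way to realize the virtual landmark of [0066] to [0068] is sketched below.

- The choice of Q_W3 is a hypothetical example of a non-collinear point on the Z = 0 plane: the midpoint of Q_W1Q_W2, shifted perpendicular to the segment.
- The projection assumes a simple pinhole with focal length `focal` and no principal-point offset.
- Eqs. (37) and (38) are not reproduced here.

```python
def virtual_third_landmark(q_w1, q_w2, offset=1.0):
    """A hypothetical point on the Z = 0 plane that is not collinear with q_w1, q_w2.

    Takes the midpoint of the segment and shifts it perpendicular to the segment.
    """
    (x1, y1), (x2, y2) = q_w1, q_w2
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5
    if norm == 0:
        raise ValueError("the two landmarks coincide")
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    return (mx - dy / norm * offset, my + dx / norm * offset)


def predict_observation(M_wc, q_w, focal=1.0):
    """Predicted image coordinates and depth of the world point (X, Y, 0).

    M_wc is the 4x4 world-to-camera matrix from the sensor (last row assumed
    [0, 0, 0, 1]); the projection is an assumed pinhole x = f * X_C / Z_C.
    """
    X, Y = q_w
    cam = [M_wc[r][0] * X + M_wc[r][1] * Y + M_wc[r][3] for r in range(3)]
    z = cam[2]
    return focal * cam[0] / z, focal * cam[1] / z, z
```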

[0069] <Case Where One Point Is Observed> Embodiment 2-3

Even when only one landmark is extracted from the image, two virtual landmarks can be assumed in the same manner as in the two-point case, and the misregistration at the landmark can then be corrected. In the one-point case, however, there is a simpler correction, shown in FIG. 7. First, find the point Q_C1 on the straight line through the origin of the camera coordinate system and the point q1 at which Z = Z^(c)_C1. Then translate the camera position information by

Q^(c)_C1 - Q_C1 …(39)
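The one-point correction of Eq. (39) can be sketched as below. The sketch assumes:

- image coordinates are measured from the principal point, with focal length `focal`, so the ray through q1 contains the points t·(x1/f, y1/f, 1);
- the offset is applied by shifting the translation column of M^WC.

The second point is one interpretation of "translating the camera position information". All names are hypothetical.

```python
def one_point_correction(M_wc, q_w1, q1, focal=1.0):
    """Correct the sensor pose M^WC from one observed landmark (sketch of Eq. (39)).

    q_w1: landmark world coordinates (X, Y, Z); q1: observed image point (x, y).
    Returns (offset, corrected_M), where offset = Q^(c)_C1 - Q_C1.
    """
    X, Y, Z = q_w1
    # Sensor-predicted camera coordinates Q^(c)_C1 (last row of M assumed [0, 0, 0, 1]).
    pred = [M_wc[r][0] * X + M_wc[r][1] * Y + M_wc[r][2] * Z + M_wc[r][3]
            for r in range(3)]
    z = pred[2]
    # Q_C1: the point on the ray through the camera origin and q1 at the same depth.
    x, y = q1
    on_ray = [x / focal * z, y / focal * z, z]
    offset = [p - o for p, o in zip(pred, on_ray)]
    # Shift the translation part so that the landmark re-projects onto q1.
    corrected = [row[:] for row in M_wc]
    for r in range(3):
        corrected[r][3] -= offset[r]
    return offset, corrected
```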

[0070] <Registration Using Stereo and a Three-Dimensional Sensor> Third Embodiment

This embodiment proposes a method that integrates the image-based registration method described above with the sensor-based misregistration correction method. Both methods take the image coordinates (x_i, y_i) and the depth information of three input landmarks (including virtual ones), obtain from them the matrix U in Eq. (23), and solve it to estimate the matrix C representing the camera parameters. In the third embodiment, these methods are integrated into a registration method that uses both a stereo camera and sensor information. The integration adaptively switches the method of estimating each landmark's depth value, according to which landmarks are extracted in the left and right images. The method of estimating the camera coordinates is described below for each landmark-extraction situation.

[0071] <All Three Points Extracted Binocularly> Embodiment 3-1

In this case, all three points are extracted by the stereo camera. The depth value of each landmark is estimated from the stereo information. In other words, the method of the first embodiment (Eqs. (30) to (33)) is applied as it is.

<Two Points Extracted Binocularly, One Point Monocularly> Embodiment 3-2

In this case, two of the three points (Q1, Q2) are extracted by the stereo camera, and the remaining point (Q3) is extracted by only one camera. The depth information of Q3 then cannot be obtained directly.

[0072] Therefore, the depth values Z_C1 and Z_C2 of the two points Q1 and Q2 are estimated from their stereo information. Meanwhile, the sensor-based depth values Z^(s)_C1, Z^(s)_C2 and Z^(s)_C3 of the landmarks are estimated using Eq. (35). Further, for landmarks Q1 and Q2,

[0073] [Equation 38] …(40)

the coefficients k_i satisfying Eq. (40) are obtained, and their mean k_av is calculated. Using this coefficient k_av,

[0074] [Equation 39] …(41)

the value Z_C3 obtained from Eq. (41) is used as the depth value of Q3, and the matrix U is obtained from Eq. (36).

<One Point Extracted Binocularly, Two Points Monocularly> Embodiment 3-3

In this case, the depth value Z_C1 of Q1 is estimated from the stereo information. Meanwhile, the sensor-based depth values Z^(s)_C1, Z^(s)_C2 and Z^(s)_C3 of the landmarks are estimated using Eq. (35). Further, for landmark Q1,

[0075] [Equation 40] …(42)

the coefficient k_av satisfying Eq. (42) is obtained. The depth values of Q2 and Q3 are then calculated in the same way as in Eq. (41) and substituted into Eq. (36) to obtain the matrix U.

<Two Points Extracted Binocularly> Embodiment 3-4

In this case, the depth values Z_C1 and Z_C2 of Q1 and Q2 are estimated from the stereo information. The coefficient k_av is calculated from these values and the sensor-based depth values Z^(s)_C1 and Z^(s)_C2. Next, the method of the second embodiment is used to estimate the image coordinates (x^(s)_3, y^(s)_3) and the depth value Z^(s)_C3 of the third (virtual) landmark Q3. The value Z_C3 obtained from Eq. (41) is used as the depth value of Q3. These values are substituted into Eq. (36) to obtain the matrix U.
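The scale-correction step of Embodiments 3-2 to 3-5 can be sketched as follows. The text does not reproduce Eqs. (40) to (42). This sketch assumes they take the form Z_Ci = k_i·Z^(s)_Ci, where:

- k_av is the mean of the k_i over the binocularly observed landmarks;
- the remaining depths are obtained as k_av·Z^(s)_Ci.

The function names are hypothetical.

```python
def stereo_sensor_scale(stereo_depths, sensor_depths):
    """Mean ratio k_av of stereo depth to sensor depth over binocular landmarks.

    Assumes Eq. (40) has the form Z_Ci = k_i * Z^(s)_Ci.
    """
    if not stereo_depths or len(stereo_depths) != len(sensor_depths):
        raise ValueError("need matching, non-empty depth lists")
    ratios = [z_st / z_se for z_st, z_se in zip(stereo_depths, sensor_depths)]
    return sum(ratios) / len(ratios)


def rescale_depths(sensor_depths, k_av):
    """Depths of monocular or virtual landmarks, Z_Ci = k_av * Z^(s)_Ci (assumed Eq. (41))."""
    return [k_av * z for z in sensor_depths]
```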

[0076] <One Point Extracted Binocularly, One Point Monocularly> Embodiment 3-5

The depth value Z_C1 of Q1 is estimated from the stereo information. The coefficient k_av is calculated from this value and the sensor-based depth value Z^(s)_C1. Next, the following are estimated from the sensor information: the depth value Z^(s)_C2 of Q2, and the image coordinates (x^(s)_3, y^(s)_3) and depth value Z^(s)_C3 of the third (virtual) landmark Q3. The depth values of Q2 and Q3 are then calculated using Eq. (40). These values are substituted into Eq. (36) to obtain the matrix U.

[0077] <One Point Extracted Binocularly> Embodiment 3-6

The camera coordinates Q_C1 of Q1 are estimated from the stereo information, and the camera position is corrected by the method of Embodiment 2-3.

<No Stereo Information Available> Embodiment 3-7

No stereo information is available when all m extracted points are observed monocularly. In this case, the depth value of each landmark is estimated from the sensor information. In other words, the method of the second embodiment is applied as it is.

[0078] <Control of Process Selection>

As described above, the processing of the camera parameter determination device of the present invention differs according to whether sensors or cameras are mounted and how many. Even when the sensors and cameras do not change, the number of landmarks detected in the images also varies. The device must therefore adaptively adopt one of the forms of the first to third embodiments. This selection is performed, for example, by the parameter estimation module 300.

[0079] That is, the module 300 can learn through an interface (not shown) which devices are attached to the determination device. Once it knows the types and number of devices, the module 300 queries the coordinate detection module 200. It asks how many landmarks are captured in the image currently being taken in from the camera. Based on the result, the module 300 instructs the depth estimation module 100 to switch the processing algorithm (first to third embodiments).
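The switching performed by module 300 can be summarized as a lookup on the numbers of binocularly and monocularly extracted landmarks. The following Python sketch is a hypothetical dispatch table built from the cases in [0071] to [0077] and Embodiments 2-1 to 2-3. The function name, the cap of three landmarks and the "sensor-only" fallback (condition (j) of the experiments) are assumptions.

```python
def select_method(has_sensor, n_binocular, n_monocular):
    """Label of the embodiment to apply, given the devices and extracted landmarks.

    n_binocular: landmarks extracted in both stereo images;
    n_monocular: landmarks extracted in only one image.
    """
    # Use at most three landmarks, preferring binocular ones.
    b = min(n_binocular, 3)
    m = min(n_monocular, 3 - b)
    if not has_sensor:
        if b == 3:
            return "1"  # first embodiment: stereo only
        raise ValueError("without a sensor, three binocular landmarks are needed")
    if b == 0:
        if m == 0:
            return "sensor-only"  # no image correction, as in condition (j)
        return {3: "2-1", 2: "2-2", 1: "2-3"}[m]  # Embodiment 3-7 = second embodiment
    table = {(3, 0): "3-1", (2, 1): "3-2", (1, 2): "3-3",
             (2, 0): "3-4", (1, 1): "3-5", (1, 0): "3-6"}
    return table[(b, m)]
```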

[0080] <Experimental Results>

Experiments were conducted to evaluate the effectiveness of the registration methods described above. The experiments used an HMD fitted with a six-degree-of-freedom magnetic position and orientation sensor (Polhemus Fastrak) and two small color CCD cameras (ELMO MN-421). The presented images were generated by two Silicon Graphics O2 workstations, one for each of the left and right images. Image processing for landmark tracking was performed by two Hitachi IP5005 image processing boards mounted in a PC. The video from the cameras was split and fed to both the O2s and the image processing boards. Data were transferred from the PC to the O2s by packet communication over Ethernet.

[0081] Red marks were placed at a number of points with known world coordinates to serve as landmarks in the real space. The image processing boards binarize and label the input image according to a preset range of the mark color (in YUV space). They then extract the centroid coordinates of each cluster at video rate. The extracted coordinate data are transferred to the O2. There, each landmark is identified by comparing the data with the predicted observation position of each landmark, obtained from the sensor information.
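The landmark extraction of [0081] can be sketched in pure Python as below. It covers color thresholding, labeling, centroid computation and identification against sensor predictions. The YUV bounds, the 4-connectivity and the nearest-neighbor identification are illustrative assumptions. In the experiment, the image processing boards performed these steps in hardware at video rate.

```python
from collections import deque


def extract_centroids(yuv_image, low, high):
    """Binarize by a YUV range, label 4-connected clusters, return their centroids.

    yuv_image: list of rows of (Y, U, V) tuples; low, high: per-channel bounds.
    Centroids are returned as (x, y) = (column, row) means.
    """
    h, w = len(yuv_image), len(yuv_image[0])
    inside = [[all(lo <= c <= hi for c, lo, hi in zip(px, low, high)) for px in row]
              for row in yuv_image]
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for r0 in range(h):
        for c0 in range(w):
            if not inside[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill of one cluster, accumulating pixel sums.
            queue, sx, sy, n = deque([(r0, c0)]), 0.0, 0.0, 0
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                sx, sy, n = sx + c, sy + r, n + 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and inside[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            centroids.append((sx / n, sy / n))
    return centroids


def identify(centroids, predicted):
    """Assign each centroid to the nearest predicted landmark position (by name)."""
    result = {}
    for x, y in centroids:
        name = min(predicted,
                   key=lambda k: (predicted[k][0] - x) ** 2 + (predicted[k][1] - y) ** 2)
        result[name] = (x, y)
    return result
```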

[0082] The update rate of the presented images in the constructed system averaged 10 Hz. The update rate when only the magnetic sensor was used was also 10 Hz. This confirms that the registration computation has a negligible effect on overall system performance.

To evaluate the proposed methods quantitatively, several registration algorithms were applied to the same situation and the resulting misregistration was measured. In addition to the landmarks, many reference points with known three-dimensional positions were prepared. Misregistration was taken as the mean magnitude of the misregistration over all landmarks and reference points.

The conditions (a) to (i) combined the following landmark information with the three-dimensional position and orientation sensor information:

(a) three points binocular;
(b) two points binocular + one point monocular;
(c) one point binocular + two points monocular;
(d) three points monocular;
(e) two points binocular;
(f) one point binocular + one point monocular;
(g) two points monocular;
(h) one point binocular;
(i) one point monocular.

Condition (j) used only the three-dimensional position and orientation sensor.
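The evaluation measure of [0082] reduces to a simple mean, assuming the misregistration at each point is the Euclidean image distance between its projected and observed positions. The function name and the units (pixels) are assumptions:

```python
import math


def mean_misregistration(projected, observed):
    """Mean image-plane distance between projected and observed point positions."""
    if not projected or len(projected) != len(observed):
        raise ValueError("need matching, non-empty point lists")
    return sum(math.dist(p, q) for p, q in zip(projected, observed)) / len(projected)
```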

[0083] FIG. 8 shows the input image used in the experiments (data A: right-eye image). FIGS. 9 to 12 show the fusion results under conditions (a), (d), (e) and (j) above. In the figures, "□" marks indicate the extracted landmark positions. The virtual object was a CG wireframe model of a real cube, superimposed on that cube. FIG. 13 shows the misregistration under each condition. In FIGS. 13 to 15, the horizontal axis represents methods (a) to (j) above and the vertical axis represents the resulting misregistration.

FIGS. 14 and 15 show the results of similar experiments with different positional relationships among the observed object, the HMD and the emitter of the magnetic sensor. For data B, the HMD was placed far from the emitter and closer to the observed object. For data C, the HMD was placed far from the observed object.

Among the methods that use only one camera, method (d), with three feature points, achieves accurate registration in all cases. Misregistration correction with two points (g) or one point (i) also markedly improves accuracy over no correction (j).

[0084] Regarding the effect of registration using stereo information, the results for data A (FIG. 13) and data C (FIG. 15) differ from those for data B. For data A and C, the error was smaller without stereo information, except in the one-point case (h). This is thought to be due to errors in the relative position between the cameras, which was given as known, and to errors in feature-point extraction by image processing. For data B, by contrast, using stereo information improves registration accuracy. The likely reason is that the accuracy of stereo distance estimation is inversely related to the distance from the camera to the object. In data B the observed object was close, so the distance information from stereo image processing was relatively accurate.

[0085] <Implementation as Software>

The present determination device accurately detects the viewpoint position of the camera. It then outputs the coordinate transformation parameters at the detected viewpoint, that is, the camera parameters. Outputting the camera parameters amounts to outputting the matrix elements of the coordinate transformation matrix C. Software can determine and output these elements. In that case, the processing routine can be incorporated into an application program for AR or MR. It may also be stored as ROM in the HMD body or the camera body. This option exists because the manufacturer of the HMD device or the position and orientation sensor is better placed than the user to develop the matrix-element determination process.

[0086] When applied to a personal computer or workstation, the routine is incorporated as driver software.

<Effects of the Embodiments and Examples>

In this specification, a registration method that combines a stereo camera and a three-dimensional sensor has been proposed, in connection with techniques for merging real and virtual spaces. This method makes it possible to handle sensor-based and image-based registration methods within the same framework.

[0087] The methods described in the second and third embodiments assume that information obtained by image processing is always the most reliable. However, as the experimental results show, the accuracy of distance information obtained from stereo depends on the distance to the landmarks. Likewise, the accuracy of the three-dimensional position and orientation sensor varies with the sensor's own measurement range. An important task for future work is therefore a registration method that does two things. It should evaluate the reliability of each source of information according to these image-processing and sensor characteristics, and select the optimal solution accordingly.

[0088] Further, the coordinate transformation obtained by this method does not preserve the orthogonality of the coordinate axes. As a result, an unnatural deformation may be applied to the virtual space. Dealing with this is also a task for future work.
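The loss of orthogonality noted in [0088] can be quantified with a small diagnostic. This sketch assumes normalized image coordinates, with the intrinsic parameters removed. In that case an exact rigid pose gives C = s·[r1 r2 t], where r1 and r2 are orthonormal columns. The cosine between the first two columns of C, and their length ratio, then measure the distortion. The function name is hypothetical.

```python
import math


def orthogonality_defect(C):
    """How far the first two columns of a 3x3 plane-to-image matrix are from
    being orthogonal and of equal length.

    Assumes normalized image coordinates, so an exact rigid pose gives
    C = s * [r1 r2 t] with orthonormal r1, r2.
    Returns (cosine of the angle between the columns, length ratio).
    """
    c1 = [C[r][0] for r in range(3)]
    c2 = [C[r][1] for r in range(3)]
    n1 = math.sqrt(sum(v * v for v in c1))
    n2 = math.sqrt(sum(v * v for v in c2))
    cosine = sum(a * b for a, b in zip(c1, c2)) / (n1 * n2)
    return cosine, n1 / n2
```

For a matrix built from an exact rotation, the result is close to (0, 1). Noise in the landmark depths or image coordinates moves it away from these values.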

[0089]

[Effects of the Invention] As described above, according to the present invention, the position and orientation can be detected accurately using three landmarks.

[Brief description of the drawings]

FIG. 1 is a view explaining the arrangement of one viewpoint and three landmarks in the position and orientation detection apparatus of the embodiment.

FIG. 2 is a view explaining the correction method when generalized to an arbitrary plane other than the Z = 0 plane.

FIG. 3 is a diagram illustrating the configuration of the apparatus of the embodiment.

FIG. 4 is a view explaining the relationship between two viewpoint positions and one landmark.

FIG. 5 is a view explaining the principle of determining the camera orientation parameters when three landmarks are used.

FIG. 6 is a view explaining the principle of determining the camera orientation parameters when two landmarks are used and a third is set virtually.

FIG. 7 is a view explaining the principle of determining the camera orientation parameters when one landmark is used and two are set virtually.

FIG. 8 is a perspective view of the object used in experiments with the embodiments of the present invention.

FIG. 9 is a view showing a virtual figure superimposed on the experimental object in an experiment using three landmarks captured with a stereo camera.

FIG. 10 is a view showing a virtual figure superimposed on the experimental object in an experiment using three landmarks with a monocular camera and a three-dimensional sensor.

FIG. 11 is a view showing a virtual figure superimposed on the experimental object in an experiment using two landmarks with a stereo camera and a three-dimensional sensor.

FIG. 12 is a view showing a virtual figure superimposed on the experimental object in an experiment using only a three-dimensional sensor.

FIG. 13 is a graph showing the misregistration that occurred for data A under each of conditions (a) to (j).

FIG. 14 is a graph showing the misregistration that occurred for data B under each of conditions (a) to (j).

FIG. 15 is a graph showing the misregistration that occurred for data C under each of conditions (a) to (j).

Claims

[Claims]

1. A method for determining a viewpoint position and orientation, comprising: obtaining, using a camera, an image of three marks placed in a predetermined plane; obtaining image coordinates of the three marks in the image; obtaining depth information of the three marks; and calculating and outputting parameters for determining the position and orientation of the camera based on the image coordinates and depth information obtained for the marks.

2. The method for determining a viewpoint position and orientation according to claim 1, wherein the depth information of the three marks is detected based on a monocular camera and the output of a position and orientation sensor provided on the camera.

3. The method for determining a viewpoint position and orientation according to claim 1, wherein the depth information of the three marks is obtained from the outputs of a stereo camera and a three-dimensional position and orientation sensor.

4. The method according to claim 1, wherein when depth information of at least one mark is obtained, depth information of a virtual mark is set.

5. The method for determining a viewpoint position and orientation according to claim 1, wherein the three marks are selected as marks on the Z = 0 plane in world coordinates that do not lie on the same straight line.

6. The method for determining a viewpoint position and orientation according to any one of claims 1 to 3, wherein, when the three marks are selected as non-collinear marks on an arbitrary plane other than the Z = 0 plane in world coordinates, a transformation matrix from the arbitrary plane to the Z = 0 plane is obtained, and the camera parameters are determined using the world coordinates of the landmarks transformed by the transformation matrix.

7. A camera device that outputs position and orientation data together with an image, comprising: a camera that captures images of three marks placed in a predetermined plane; means for obtaining coordinate values of the images of the marks captured by the camera; means for obtaining depth information of the three marks; and means for calculating and outputting parameters for determining the position and orientation of the camera based on the image coordinates and depth information obtained for the marks.

8. A head position and orientation sensor comprising: a camera that is attached near the head and captures images of three marks placed in a predetermined plane; means for obtaining coordinate values of the images of the marks captured by the camera; means for obtaining depth information of the three marks; and means for calculating and outputting parameters for determining the position and orientation of the camera based on the image coordinates and depth information obtained for the marks.

9. A storage medium storing a program for causing a computer to execute the method for determining a viewpoint position and orientation according to claim 1.

10. A method for detecting a camera position and orientation, wherein the following notation is used: the image coordinates of the three marks are (x1, y1), (x2, y2) and (x3, y3); their depth information is z1, z2 and z3; their world coordinates are (X_W1, Y_W1, 0), (X_W2, Y_W2, 0) and (X_W3, Y_W3, 0); and the camera parameter matrices are set as [Equation 2]. The position and orientation of the camera are detected from the matrix C given by C = U·W^-1.