JP2011118724A

JP2011118724A - Apparatus and program for estimating posture of camera

Info

Publication number: JP2011118724A
Application number: JP2009276182A
Authority: JP
Inventors: Kankun Boku; 漢薫朴; Hideki Mitsumine; 秀樹三ッ峰; Masato Fujii; 真人藤井; Masahiro Shibata; 正啓柴田
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2009-12-04
Filing date: 2009-12-04
Publication date: 2011-06-16
Anticipated expiration: 2029-12-04
Also published as: JP5291605B2

Abstract

PROBLEM TO BE SOLVED: To improve accuracy, robustness, and flexibility with respect to the shooting environment in estimating the posture of a camera.

SOLUTION: An apparatus 1 for estimating the posture of a camera includes: a tracking state measuring part 40 which measures a tracking state including an error err_1 corresponding to edges and an error err_2 corresponding to feature points, on the basis of a three-dimensional model 11 of a subject, a feature point database 21, and an input photographic image; a reliability calculation part 50 which calculates a reliability f of integrated tracking, obtained by integrating edge-based tracking and feature-point-based tracking, as an index for prorating the error err_1 corresponding to edges and the error err_2 corresponding to feature points in accordance with the tracking state; a reliability correction part 60 which generates a correction reliability η when necessary; and a camera posture estimation means 70 which varies the rate of prorating the error err_1 and the error err_2 in accordance with the correction reliability η to generate an integrated error err, and estimates a camera posture E^{k+1} so as to minimize the integrated error.

COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a camera posture estimation device and a camera posture estimation program used in virtual reality, augmented reality, robot control, video composition, and the like, and more particularly to a model-based camera posture estimation device and camera posture estimation program.

Estimating camera parameters (the camera posture) is a technique required by apparatuses for video composition, virtual reality (VR), and augmented reality (AR). In recent years, methods have been proposed that estimate the camera posture from captured video without using special sensors, but they have problems with the stability and accuracy of camera tracking and with flexibility in changing shooting environments, and improvements are desired.

Among these, marker-based methods, which calculate camera parameters from the coordinates of markers in the captured image, have a large body of prior art, and some have already been put into practical use. However, placing markers in the shooting direction imposes many operational restrictions. For example, a marker may be visually obtrusive, and in outdoor environments it may be difficult to install at all in a distant view, or may need to be correspondingly large. For these reasons, many markerless methods have been proposed. Among them, several methods built on the model-based approach have been proposed because of its robustness and flexibility.

The model-based camera posture estimation method assumes that the system implementing it holds, in advance, a three-dimensional model of the shooting scene such as the one shown in FIG. 3(a). In this example, the three-dimensional model consists of a rectangular parallelepiped 301, a cylinder 302, the floor and walls around them, and information on their three-dimensional coordinates.

FIG. 3(b) shows a camera image (projected image) generated by projecting information indicating the features of the subject in the region denoted by reference numeral 303 in FIG. 3(a) onto the camera coordinate space, using a camera posture obtained in the past or a roughly obtained camera posture. Here, the projected image contains edges 304, 305, and 306. The estimation assumes that the three-dimensional model and the rough camera posture are known, for example because they were measured in advance.

Then, as shown in FIG. 3(c), visual cues such as feature points (for example, corners of patterns contained in the surface texture of the subject) and edges are extracted from the current video (camera image). Here, the current video contains edges 314, 315, and 316. Next, as shown in FIG. 3(d), the positional displacement is obtained between the visual cues extracted from the current video and the camera image produced by projecting the system's three-dimensional model of the shooting scene with the rough camera posture. In this example, camera tracking yields the displacement between edges 304, 305, and 306 and edges 314, 315, and 316. A function that transforms the structure (a translational movement Δt and a rotational movement ΔR) is then found so that this displacement is minimized. Through this series of steps, the model-based camera posture estimation method can estimate the camera posture with high accuracy.
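The displacement that model-based tracking minimizes can be illustrated with a minimal sketch. It assumes a simple pinhole camera with unit focal length and point samples taken along model edges; the function names `project`, `displacement`, and `rot_y`, and all numeric values, are hypothetical and are not taken from the patent.

```python
import math

def project(point, rotation, translation, focal=1.0):
    """Project a 3D world point with a pinhole camera (assumed model)."""
    # Camera coordinates: X_c = R * X_w + t
    xc = [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
          for r in range(3)]
    return (focal * xc[0] / xc[2], focal * xc[1] / xc[2])

def displacement(model_points, observed_2d, rotation, translation):
    """Sum of squared image distances between projected model points and
    the points observed in the current camera image."""
    total = 0.0
    for p, (u, v) in zip(model_points, observed_2d):
        pu, pv = project(p, rotation, translation)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total

def rot_y(angle):
    """Rotation matrix about the y axis (illustrative helper)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Hypothetical samples on model edges, placed in front of the camera.
model = [(-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (1.0, 1.0, 6.0), (-1.0, 1.0, 6.0)]

# The "current image" is simulated by projecting with the true posture.
true_R, true_t = rot_y(0.05), [0.1, 0.0, 0.0]
observed = [project(p, true_R, true_t) for p in model]

# Rough posture (e.g., the previous frame) versus the true posture.
rough_R, rough_t = rot_y(0.0), [0.0, 0.0, 0.0]
err_rough = displacement(model, observed, rough_R, rough_t)
err_true = displacement(model, observed, true_R, true_t)
```

In this toy setting the displacement vanishes at the true posture and is positive at the rough posture, which is the quantity a tracker would reduce by searching for Δt and ΔR.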

Conventionally, model-based camera posture estimation methods that use either feature points or edges in the video are known. Model-based camera posture estimation methods that use both feature points and edges in the video are also known (see Non-Patent Documents 1 to 5). The techniques described in Non-Patent Documents 1 to 4 relate to estimation methods that minimize the sum of the displacements of edges and feature points. The technique described in Non-Patent Document 5 relates to a method based on a theoretical analysis of both edges and feature points and on the continuity of both kinds of feature.

Non-Patent Document 1: Vacchetti L, Lepetit V, Fua P. Combining edge and texture information for real-time accurate 3D camera tracking. In Proc. of IEEE and ACM Intl. Sym. on Mixed and Augmented Reality, 2004; 48-57.
Non-Patent Document 2: Dornaika F, Garcia C. Pose estimation using point and line correspondences. Real-Time Imaging 1999; 5: 215-230.
Non-Patent Document 3: Pressigout M, Marchand E. Real-time 3D model-based tracking: Combining edge and texture information. In Proc. of Intl. Conf. on Robotics and Automation, 2006: 2726-2731.
Non-Patent Document 4: Park H, Oh J, Seo BK, Park JI. Object-adaptive tracking for AR guidance system. In Proc. of Intl. Conf. on Virtual-Reality Continuum and Its Applications in Industry, 2008.
Non-Patent Document 5: Rosten E, Drummond T. Fusing points and lines for high performance tracking. In Proc. of Intl. Conf. on Computer Vision, 2005; 2: 1508-1511.

However, existing model-based camera posture estimation methods that use either feature points or edges in the video must, in order to estimate reliably, either be confined to a limited environment or have the environmental conditions constrained to some extent. Specifically, restrictions such as the following (1) to (3) are necessary.
(1) The shooting scene contains clear patterns.
(2) The shooting scene contains a plurality of strong edges.
(3) As a shooting condition, the environment, such as the illumination, does not change.

Among the model-based camera posture estimation methods that use both feature points and edges in the video, the techniques described in Non-Patent Documents 1 to 4 do not weight the two by considering an error that integrates the feature-point error and the edge error. Even if they are regarded as performing such integration, they determine the weights heuristically (that is, in a simplified or approximate manner rather than as an exact solution). In other words, the techniques described in Non-Patent Documents 1 to 4 do not analytically fuse the contributions of feature points and edges, and therefore need to constrain the environment.

The technique described in Non-Patent Document 5 analyzes both feature points and edges theoretically, but it does not go as far as considering how each of them depends on environmental conditions, so it also needs to constrain the environment.

Furthermore, in broadcast video the shooting scene is rarely static and the lighting conditions are not constant, so it is difficult to constrain the environment.

The present invention has been made in view of the above problems, and its object is to improve the accuracy and robustness of camera tracking and the freedom of the shooting environment in camera posture estimation.

In order to solve the above problems, the camera posture estimation device according to claim 1 of the present invention is a model-based camera posture estimation device that estimates a camera posture using edges and feature points in a captured image of a subject, and comprises three-dimensional model storage means, feature point database storage means, tracking state measuring means, reliability calculation means, and camera posture estimation means.

According to this configuration, the camera posture estimation device stores, in the three-dimensional model storage means, a three-dimensional model representing information on the features of a subject located in the shooting direction of the camera. The position information of this three-dimensional model is described, for example, in three-dimensional coordinates in the world coordinate space. The camera posture estimation device also stores, in the feature point database storage means, a feature point database holding feature point information that includes descriptors and three-dimensional information of the feature points of the three-dimensional model. The position information of the feature points in this database is described, for example, in three-dimensional coordinates in the camera coordinate space.

The camera posture estimation device then uses the tracking state measuring means to measure a tracking state, based on the three-dimensional model and the feature point database created in advance and on the input captured image containing the subject. The tracking state includes at least a first tracking error obtained on the basis of the edges and a second tracking error obtained on the basis of the feature points.

The camera posture estimation device then uses the reliability calculation means to calculate, in accordance with the measured tracking state, the reliability of integrated tracking, which integrates edge-based tracking and feature-point-based tracking, as an index for apportioning the first error and the second error. A function that determines the ratio for apportioning the first error and the second error according to the shooting environment can be obtained in advance. This function can be one that returns, according to the measured tracking state, a value at an arbitrary position in an interval [a, b] on the x axis, for example. In that case, using only feature-point-based tracking may be represented by the position x = a, and using only edge-based tracking by the position x = b. In general, in a well-controlled environment without environmental changes, edge-based tracking is more accurate than feature-point-based tracking. In the simplest case, the interval [a, b] can be taken to be [0, 1]. In this case, for example, if the reliability of integrated tracking is 0.4, the value lies closer to x = 0 than to x = 1, so the second error contributes more than the first error; that is, feature-point-based tracking contributes more than edge-based tracking.
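The role of the reliability as an apportioning index can be sketched as follows. The linear blend used here is an assumption made for illustration (the passage above only fixes the endpoints: 0 means feature points only, 1 means edges only); the names `apportion_weights` and `integrated_error` are hypothetical.

```python
def apportion_weights(reliability):
    """Return (edge_weight, point_weight) for a reliability in [0, 1].

    Assumption: a linear split, so 0 uses only the feature-point error
    and 1 uses only the edge error, matching the interval endpoints.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return reliability, 1.0 - reliability

def integrated_error(err_edge, err_point, reliability):
    """Blend the first (edge) error and second (feature-point) error."""
    w_edge, w_point = apportion_weights(reliability)
    return w_edge * err_edge + w_point * err_point

# With a reliability of 0.4, the feature-point error carries more weight.
w_edge, w_point = apportion_weights(0.4)
blended = integrated_error(2.0, 4.0, 0.4)
```

Under this assumed blend, a reliability of 0.4 gives the edge error a weight of 0.4 and the feature-point error a weight of 0.6.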

The camera posture estimation device then uses the camera posture estimation means to generate an integrated error by varying the ratio for apportioning the first error and the second error in accordance with the reliability of integrated tracking, and estimates the current camera posture so that the integrated error is minimized. For example, the following operation is performed for each frame of the input video. The first error, the second error, and so on are measured as the tracking state, and an integrated error is generated from these results. The error term in the objective function prepared in advance for the camera tracking that estimates the camera posture is then replaced with this integrated error, and the camera posture at which the error in the objective function is minimized is taken as the estimate of the current camera posture.
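As a rough illustration of the minimization step, the sketch below picks, from a set of candidate postures, the one with the smallest integrated error. It is not the optimizer used by the invention: the one-dimensional "posture", the grid search, the quadratic error functions, and the linear blend are all assumptions chosen to keep the example small.

```python
def estimate_posture(candidates, err_edge, err_point, reliability):
    """Return the candidate posture minimizing the integrated error.

    err_edge and err_point are callables mapping a posture to the first
    (edge) and second (feature-point) tracking errors. The linear blend
    by `reliability` is an illustrative assumption.
    """
    def integrated(posture):
        return (reliability * err_edge(posture)
                + (1.0 - reliability) * err_point(posture))
    return min(candidates, key=integrated)

# Hypothetical 1-D posture parameter sampled on a grid from -1.0 to 1.0.
candidates = [i / 10 for i in range(-10, 11)]

def edge_error(posture):
    return (posture - 0.3) ** 2   # edges alone would favor 0.3

def point_error(posture):
    return (posture - 0.1) ** 2   # feature points alone would favor 0.1

best_edges_only = estimate_posture(candidates, edge_error, point_error, 1.0)
best_points_only = estimate_posture(candidates, edge_error, point_error, 0.0)
best_balanced = estimate_posture(candidates, edge_error, point_error, 0.5)
```

Changing the reliability moves the minimizer between the posture favored by the edges and the one favored by the feature points, which is the per-frame behavior the paragraph describes.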

The camera posture estimation device according to claim 2 of the present invention is the camera posture estimation device according to claim 1, in which the tracking state measuring means measures, as the tracking state: the first error; the second error; the number of edges; the number of feature points; a motion blur intensity indicating the number of edges blurred by the motion of the camera; a reliability of the initial camera posture, indicating the reliability of the camera posture value before estimation; and the number of edge corresponding point candidates existing around the edge in the captured image that corresponds to a model edge of the three-dimensional model.

According to this configuration, in the camera posture estimation device, when the value of the motion blur intensity measured as the tracking state is larger than a predetermined first threshold, the reliability calculation means calculates the reliability of integrated tracking so that only the first error is used.
When the value of the motion blur intensity is equal to or less than the first threshold and the value of the reliability of the initial camera posture measured as the tracking state is larger than a predetermined second threshold, the reliability calculation means calculates the reliability of integrated tracking by apportioning so that it is proportional to the motion blur intensity and inversely proportional to each of the reliability of the initial camera posture and the number of feature points.
When the value of the motion blur intensity is equal to or less than the first threshold and the value of the reliability of the initial camera posture is equal to or less than the second threshold, the reliability calculation means calculates the reliability of integrated tracking by apportioning so that it is proportional to each of the number of edges and the motion blur intensity, and inversely proportional to each of the reliability of the initial camera posture, the number of feature points, and the number of edge corresponding point candidates.
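The three cases above can be sketched as a single function. This is only an illustration of the case structure: the passage states which quantities the reliability is proportional or inversely proportional to, but not the actual formulas, so the plain products and quotients, the scale constants `k1` and `k2`, the clamping to [0, 1], and all names below are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrackingState:
    """Measured quantities used by the reliability calculation (names assumed)."""
    num_edges: int
    num_feature_points: int
    motion_blur: float            # motion blur intensity
    init_pose_reliability: float  # reliability of the initial camera posture
    num_edge_candidates: int      # edge corresponding point candidates

def clamp01(x):
    return max(0.0, min(1.0, x))

def integrated_reliability(s, blur_threshold, pose_threshold, k1=1.0, k2=1.0):
    """Reliability of integrated tracking in [0, 1] (1 = edge error only).

    k1 and k2 are hypothetical scale constants; the real weighting
    formulas are not given in this passage.
    """
    if s.motion_blur > blur_threshold:
        return 1.0  # case 1: use only the first (edge-based) error
    if s.init_pose_reliability <= 0 or s.num_feature_points <= 0:
        raise ValueError("denominator quantities must be positive")
    if s.init_pose_reliability > pose_threshold:
        # case 2: proportional to blur, inverse to pose reliability and points
        raw = k1 * s.motion_blur / (s.init_pose_reliability * s.num_feature_points)
    else:
        if s.num_edge_candidates <= 0:
            raise ValueError("num_edge_candidates must be positive")
        # case 3: also proportional to edges, inverse to edge candidates
        raw = (k2 * s.num_edges * s.motion_blur
               / (s.init_pose_reliability * s.num_feature_points
                  * s.num_edge_candidates))
    return clamp01(raw)

blurry = TrackingState(40, 10, 0.9, 0.8, 20)
confident = TrackingState(40, 10, 0.2, 0.8, 20)
uncertain = TrackingState(40, 10, 0.2, 0.4, 20)
f_blurry = integrated_reliability(blurry, 0.5, 0.5)
f_confident = integrated_reliability(confident, 0.5, 0.5)
f_uncertain = integrated_reliability(uncertain, 0.5, 0.5)
```

The two thresholds are passed in as parameters because, as the text notes, they depend on the shooting environment.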

Here, the first threshold for the motion blur intensity and the second threshold for the reliability of the initial camera posture take different values depending on the shooting environment. These thresholds can be determined from experiments carried out in advance in the actual environment of the shooting scene. These experiments also make it possible to analyze how the two kinds of feature in the video, edges and feature points, should be evaluated in a unified way. Through this analysis, suitable weighting derivation formulas were obtained by dividing the two different feature quantities, edges and feature points, into cases by condition, taking into account how each is affected by changes in the shooting environment. The reliability calculation means uses these weighting derivation formulas and can therefore use edges and feature points cooperatively, which makes the model-based camera posture estimation method robust. By switching the weighting derivation formula according to the conditions in this way, the different features, edges and feature points, can be handled and evaluated in a unified manner. As a result, the influence of changes in the shooting environment is reduced and the camera posture can be estimated stably and with high accuracy.

The camera posture estimation device according to claim 3 of the present invention is the camera posture estimation device according to claim 2, further comprising reliability correction means, in which the camera posture estimation means varies the ratio for apportioning the first error and the second error in accordance with the correction reliability and estimates the current camera posture.

According to this configuration, the camera posture estimation device uses the reliability correction means to generate a correction reliability obtained by correcting the reliability of integrated tracking. When the reliability of integrated tracking is larger than 0.5 and the sample ratio, which is the ratio of the number of edges to the number of feature points, is smaller than 1, the reliability correction means calculates the correction reliability with a first correction formula that is proportional to the reliability of integrated tracking and inversely proportional to the sample ratio. A reliability of integrated tracking larger than 0.5 indicates that the shooting environment is favorable for edge-based tracking. A sample ratio smaller than 1 indicates that the number of edges is relatively small. Even when the shooting environment is favorable for edge-based tracking, the calculated reliability may not be satisfactory if relatively few edges are used. Calculating the correction reliability with the first correction formula, however, makes it possible to prevent false correspondences between model edges and edges detected in the captured image.

When the reliability of integrated tracking is smaller than 0.5 and the sample ratio is larger than 1, the reliability correction means calculates the correction reliability with a second correction formula that is proportional to the reliability of integrated tracking and inversely proportional to the sample ratio. A reliability of integrated tracking smaller than 0.5 indicates that the shooting environment is favorable for feature-point-based tracking. A sample ratio larger than 1 indicates that the number of feature points is relatively small. Even when the shooting environment is favorable for feature-point-based tracking, the calculated reliability may not be satisfactory if relatively few feature points are used. Calculating the correction reliability with the second correction formula, however, makes it possible to prevent false correspondences between feature points in the feature point database and feature points detected in the captured image.
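The conditions under which a correction applies can be summarized in a short sketch. It only selects which correction formula is relevant; the formulas themselves are not given in this passage and are deliberately not reproduced. The function name, the string labels, and the treatment of all remaining cases as "no correction" are assumptions.

```python
def correction_case(reliability, num_edges, num_feature_points):
    """Decide which correction formula applies to the integrated reliability.

    Returns "first", "second", or "none". Treating every other case as
    "none" (reliability used uncorrected) is an assumption.
    """
    if num_feature_points <= 0:
        raise ValueError("num_feature_points must be positive")
    # Sample ratio: number of edges relative to number of feature points.
    sample_ratio = num_edges / num_feature_points
    if reliability > 0.5 and sample_ratio < 1.0:
        # Edge-friendly environment, but relatively few edges.
        return "first"
    if reliability < 0.5 and sample_ratio > 1.0:
        # Feature-point-friendly environment, but relatively few points.
        return "second"
    return "none"

case_few_edges = correction_case(0.7, num_edges=5, num_feature_points=10)
case_few_points = correction_case(0.3, num_edges=20, num_feature_points=10)
case_consistent = correction_case(0.7, num_edges=20, num_feature_points=10)
```

Both corrections target the same situation: the reliability favors one kind of feature while the sample counts show that kind to be scarce.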

The camera posture estimation device according to claim 4 is the camera posture estimation device according to claim 2 or claim 3, in which the tracking state measuring means comprises edge detection means, edge matching means, motion blur calculation means, feature point detection and matching means, and initial camera posture calculation means.

According to this configuration, in the camera posture estimation device, the tracking state measuring means uses the edge detection means to detect edges in the input captured video. The tracking state measuring means then uses the edge matching means to calculate the number of edge corresponding point candidates and the number of edges by matching the model edges contained in the three-dimensional model against the detected edges. It uses the motion blur calculation means to calculate the motion blur intensity based on the motion blur in the input captured video. It uses the feature point detection and matching means to detect feature points in the input captured video and to calculate the number of feature points by matching the feature points stored in the feature point database against the detected feature points. Finally, it uses the initial camera posture calculation means to obtain the initial camera posture from the result of the feature point matching and to calculate the reliability of the initial camera posture for the posture thus obtained. Here, the initial camera posture means the camera posture before estimation.

The camera posture estimation program according to claim 5 is a program for estimating a camera posture on a model basis using edges and feature points in a captured image of a subject. It causes a computer, which is provided with three-dimensional model storage means storing a three-dimensional model representing information on the features of a subject located in the shooting direction of the camera and with feature point database storage means storing a feature point database holding feature point information that includes descriptors and three-dimensional information of the feature points of the three-dimensional model, to function as tracking state measuring means, reliability calculation means, and camera posture estimation means.

According to this configuration, the camera posture estimation program uses the tracking state measuring means to measure a tracking state, based on the three-dimensional model and the feature point database created in advance and on the input captured image containing the subject. The tracking state includes at least a first tracking error obtained on the basis of the edges and a second tracking error obtained on the basis of the feature points.

The camera posture estimation program then uses the reliability calculation means to calculate, in accordance with the measured tracking state, the reliability of integrated tracking, which integrates edge-based tracking and feature-point-based tracking, as an index for apportioning the first error and the second error.

The camera posture estimation program then uses the camera posture estimation means to generate an integrated error by varying the ratio for apportioning the first error and the second error in accordance with the reliability of integrated tracking, and estimates the current camera posture so that the integrated error is minimized.

According to the invention of claim 1, the camera posture estimation device generates an integrated error by varying the ratio for apportioning the first tracking error, obtained on the basis of edges, and the second tracking error, obtained on the basis of feature points, in accordance with the tracking state measured from the model created in advance, the feature point database, and the captured image, and estimates the current camera posture. Compared with using only one of the errors, or apportioning the two errors in a fixed ratio, this improves the accuracy and robustness of camera tracking and the freedom of the shooting environment.

According to the invention of claim 2, the camera posture estimation device measures, as the tracking state, the first tracking error obtained on the basis of edges, the second tracking error obtained on the basis of feature points, the motion blur intensity, the number of edges, the number of feature points, the reliability of the initial camera posture, and the number of edge corresponding point candidates, and calculates the reliability with the conditional formula best suited to the measurement results. The camera posture estimation can therefore adapt flexibly even when the shooting environment changes.

According to the invention of claim 3, the camera posture estimation device generates a correction reliability by correcting the reliability of integrated tracking in accordance with the sample ratio, which is the ratio of the number of edges to the number of feature points, and the value of the reliability of integrated tracking. False correspondences of edges or feature points can therefore be prevented.

According to the invention of claim 4, the camera posture estimation device can calculate the number of edge corresponding point candidates, the number of edges, and the number of feature points by matching processing, and can calculate the motion blur intensity and the reliability of the initial camera posture from the captured video. Therefore, the camera posture can be estimated stably.

According to the invention of claim 5, the camera posture estimation program can generate an integrated error by varying the proportion in which the first tracking error obtained from edges and the second tracking error obtained from feature points are prorated, in accordance with a tracking state measured from a model created in advance, a feature point database, and a captured image, and can thereby estimate the current camera posture.

A block diagram schematically showing the configuration of a camera posture estimation device according to an embodiment of the present invention.
An explanatory diagram schematically showing a video composition system that includes the camera posture estimation device according to the embodiment, in which (a) is a block diagram of the system configuration and (b) shows an example of the system worn by a person.
An explanatory diagram outlining model-based camera tracking according to the present invention, in which (a) shows a three-dimensional model, (b) a projected image using a past camera posture, (c) the current camera image, and (d) the amount of movement.
Graphs showing simulation results of the camera posture estimation error when Gaussian noise is added to the initial value of the camera posture using conventional methods, in which (a) shows a point-based tracking method and (b) an edge-based tracking method.
Graphs showing simulation results of the relationship between the amount of visual cues, which corresponds to scene complexity, and the camera posture estimation error using conventional methods, in which (a) shows a point-based tracking method and (b) an edge-based tracking method.
A graph showing simulation results of the camera posture estimation error related to edge search performance, which is affected by scene complexity, in a conventional edge-based tracking method.
Graphs showing simulation results of the camera posture estimation error when motion blur is added to the camera video using conventional methods, in which (a) shows a point-based tracking method and (b) an edge-based tracking method.
Graphs showing simulation results of camera posture estimation when Gaussian noise is added to the camera video using conventional methods, in which (a) shows a point-based tracking method and (b) an edge-based tracking method.
A flowchart showing an example of the procedure of the reliability calculation processing and the reliability correction processing of the camera posture estimation device according to the embodiment.
Graphs showing estimation results for the translation term t1 of the camera posture, in which (a) shows a conventional point-based method, (b) a conventional edge-based method, (c) the case where η is fixed, (d) the case where η is varied according to the present invention, and (e) the change in η.
Graphs showing estimation results for the translation term t2 of the camera posture, in which (a) shows a conventional point-based method, (b) a conventional edge-based method, (c) the case where η is fixed, and (d) the case where η is varied according to the present invention.
Graphs showing estimation results for the translation term t3 of the camera posture, in which (a) shows a conventional point-based method, (b) a conventional edge-based method, (c) the case where η is fixed, and (d) the case where η is varied according to the present invention.
Diagrams schematically showing video composition results obtained with the present invention when the camera posture is estimated while the illumination changes as the camera moves, in which (a) shows normal illumination, (b) darker illumination, and (c) brighter illumination.

Hereinafter, a mode for carrying out the camera posture estimation device of the present invention (hereinafter referred to as the "embodiment") will be described in detail with reference to the drawings. The description is divided into the following sections, presented in order: 1. Overview of the camera posture estimation device; 2. Video composition system; 3. Configuration of the camera posture estimation device; 4. Model-based camera tracking by a linear iterative method; 5. Method of analyzing the shooting environment; and 6. Operation of the camera posture estimation device.

[1. Overview of the camera posture estimation device]
The camera posture estimation device 1 shown in FIG. 1 is a model-based camera posture estimation device that estimates the camera posture using edges and feature points in a captured image of a subject. Here, the camera posture refers to the so-called external parameters of the camera. In this embodiment it may include, for example, the shooting angle of the camera lens; the three-dimensional coordinates indicating the installation location and height of the camera lens; the axis movement angles and axis movement distances that describe the camera posture and reflect operation of the camera's pan, tilt, zoom, and focus axes; the focal length of the camera lens; and the pixel pitch of the camera's image sensor.

As shown in FIG. 3, the camera posture estimation device 1 estimates the camera posture by minimizing the deviation between a projected image, obtained by projecting a three-dimensional model of the subject onto a virtual screen, and the features of the subject in the captured video. In this embodiment, information from both edges and feature points is used as the features of the subject in the video, and the two are weighted according to the video. As described later, the influence of the shooting environment on the edge information and the feature point information was analyzed in advance from captured video. The camera posture estimation device 1 uses a weight derivation formula derived from this analysis: it evaluates the deviation between the camera image and the projected image and reflects the result in the weighting. This provides robustness against changes in commonly expected shooting conditions and environments, such as illumination conditions and fast camera motion.
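As a rough illustration of weighting the two kinds of cues, the sketch below blends an edge-based residual and a feature-point-based residual with a single weight. The linear blend, the function name `integrated_error`, and the convention that a weight of 1 means "edges only" are assumptions made for illustration; this section does not state the actual combination formula.

```python
def integrated_error(err_edge: float, err_point: float, weight: float) -> float:
    """Blend an edge-based and a point-based residual.

    Assumed linear form (not taken from the source):
        err = weight * err_edge + (1 - weight) * err_point
    where weight = 1 uses edges only and weight = 0 uses feature points only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * err_edge + (1.0 - weight) * err_point
```

With a fixed weight this reduces to the fixed-ratio case; the point of the embodiment is that the weight changes with the measured tracking state.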

FIG. 1 shows the camera posture estimation device 1 only as a component of the video composition system 100 shown in FIG. 2; the device may also include the camera 2 and other elements omitted from FIG. 1. The video composition system 100 is described next.

[2. Video composition system]
The video composition system 100 shown in FIG. 2(a) is a system that lets a user experience augmented reality, and it illustrates one application of the camera posture estimation device 1 shown in FIG. 1.
The video composition system 100 includes the camera posture estimation device 1, a camera 2, virtual three-dimensional object model storage means 3, a rendering device 4, and a video composition device 5. Here, the video composition system 100 is worn by the user as shown in FIG. 2(b). The wearer P shown in FIG. 2(b) wears the camera 2 and an HMD (Head Mounted Display) 7 on the head, and a wearable PC (Personal Computer) 8 on the waist. The wearable PC 8 includes the camera posture estimation device 1, the virtual three-dimensional object model storage means 3, the rendering device 4, and the video composition device 5.

With this video composition system 100, the user can experience augmented reality without wearing any special sensor other than the camera, which serves as the visual sensor.
The camera 2 is a small camera mounted on the user's head, and it outputs the captured video to the camera posture estimation device 1 and the video composition device 5.
The camera posture estimation device 1 estimates the camera posture by analytically integrating two kinds of visual cues in the video, namely edges and feature points, and outputs the estimated camera posture to the rendering device 4. Details of the camera posture estimation device 1 are described later.

The virtual three-dimensional object model storage means 3 stores CG (Computer Graphics) data 6 and consists of a general-purpose memory or the like.
The rendering device 4 generates virtual three-dimensional space data based on the CG data 6, renders a CG object (CG image) and an alpha plane based on the input camera posture, and outputs the rendered CG object together with the alpha plane to the video composition device 5. The alpha plane is an image carrying information that distinguishes the subject region of the CG object from the remaining region.

The video composition device 5 composites the image of the CG object rendered by the rendering device 4 onto the captured image output from the camera 2, using the alpha plane. The composite image (composite video) output by the video composition device 5 is displayed, for example, on the HMD 7 shown in FIG. 2(b) and presented to the user. The video composition device 5 may be implemented with a known CG composition device for virtual studios.
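The role of the alpha plane in composition can be sketched as a per-pixel blend. The sketch assumes the conventional "over" blend with alpha values in [0, 1], where 1 marks the CG object region; the source does not specify the blend formula, and the function names are hypothetical.

```python
def composite_pixel(cg: float, camera: float, alpha: float) -> float:
    """Blend one CG sample over one camera sample (alpha = 1 keeps the CG object)."""
    return alpha * cg + (1.0 - alpha) * camera


def composite_image(cg, camera, alpha_plane):
    """Apply composite_pixel to every pixel of equally sized 2-D lists."""
    return [
        [composite_pixel(c, p, a) for c, p, a in zip(cg_row, cam_row, alpha_row)]
        for cg_row, cam_row, alpha_row in zip(cg, camera, alpha_plane)
    ]
```

Where the alpha plane is 0 the camera pixel passes through unchanged, so the virtual object appears only inside its own region.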

With the video composition system 100, the user can, for example, move around while visually perceiving a virtual object that is not actually placed there as if it were present. In other words, the user can act while experiencing augmented reality.
The video composition system 100 does not control the environment. For example, in the environment in which the wearer P moves, the environment is not controlled externally so as to artificially satisfy conditions such as (1) to (3) below.
(1) The shooting scene contains clearly defined patterns.
(2) The shooting scene contains multiple strong edges.
(3) Shooting conditions such as illumination do not change.
That is, when there are no clear patterns or edges in the direction of the wearer P's line of sight (the shooting direction of the camera 2), the video composition system 100 does not detect this and automatically place clear patterns or edges in that direction. Likewise, when the wearer P looks toward a dark area, the system does not detect this and brighten the illumination in that direction.
Nevertheless, because the video composition system 100 includes the camera posture estimation device 1 of this embodiment, it can estimate the camera posture robustly, even in an environment that changes over time.

[3. Configuration of the camera posture estimation device]
The camera posture estimation device 1 shown in FIG. 1 consists of, for example, a computer and a program installed on that computer. The computer includes an arithmetic device such as a CPU, a storage device (storage means) such as a memory or hard disk, and an interface device that exchanges various kinds of information with the outside.

The camera posture estimation device 1 is realized through cooperation between hardware and software, with the hardware resources described above controlled by the program. As shown in FIG. 1, it includes three-dimensional model storage means 10, feature point database storage means 20, video input means 30, a tracking state measurement unit (tracking state measurement means) 40, a reliability calculation unit (reliability calculation means) 50, a reliability correction unit (reliability correction means) 60, camera posture estimation means 70, and output means 80. The block diagram of FIG. 1 directly reflects the processing flow of the camera parameter estimation algorithm.

<Three-dimensional model storage means>
The three-dimensional model storage means 10 consists of a storage device such as a memory (RAM (Random Access Memory), ROM (Read Only Memory), or the like) or a hard disk, and stores a three-dimensional model 11 created in advance. The three-dimensional model 11 represents feature information of a subject located in the shooting direction of the camera. The feature information of the subject is, for example, information indicating the shape of the subject and the positions of feature points such as corners of patterns contained in its surface texture. The position information of the three-dimensional model 11 is described, for example, in three-dimensional coordinates in the world coordinate space.

<Feature point database storage means>
The feature point database storage means 20 consists of a storage device such as a memory or hard disk, and stores a feature point database 21 created in advance. The feature point database 21 stores feature point information, which comprises descriptors and three-dimensional information for the feature points of the three-dimensional model 11 of the subject. Here, a feature point descriptor describes what kind of feature point exists in the three-dimensional model of the subject, and is a name or identifier by which the feature point can be identified. The three-dimensional information is the x, y, and z coordinates of the feature point in the three-dimensional model. The three-dimensional information in the feature point database is described in coordinates (camera coordinates) obtained by projecting from the world coordinate space of the three-dimensional model into the camera coordinate space. The feature point database 21 also stores the result of projecting this information onto a virtual screen at camera postures assumed on the basis of the three-dimensional model 11, together with the camera parameters used for that projection.
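One possible in-memory layout for such a database is sketched below. The class names, field names, and the `find` helper are hypothetical; the source only states that descriptors, three-dimensional coordinates, and projection camera parameters are stored, not how.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FeaturePoint:
    descriptor: str                  # name or identifier of the feature point
    xyz: Tuple[float, float, float]  # x, y, z in camera coordinates


@dataclass
class FeaturePointDatabase:
    points: List[FeaturePoint] = field(default_factory=list)
    # Camera parameters used when the model was projected (keys are illustrative).
    camera_parameters: Dict[str, float] = field(default_factory=dict)

    def find(self, descriptor: str) -> Optional[FeaturePoint]:
        """Return the first stored point with this descriptor, or None."""
        for point in self.points:
            if point.descriptor == descriptor:
                return point
        return None
```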

<Video input means>
The video input means 30 receives captured video containing the subject and consists of a predetermined input interface or the like. The video input means 30 may instead consist of a communication interface or the like that receives captured video from a communication network (not shown). The captured video received here is output to the edge detection matching unit 41, the motion blur calculation unit 42, and the feature point detection matching unit 43 of the tracking state measurement unit 40.

<Tracking state measurement unit>
The tracking state measurement unit (tracking state measurement means) 40 measures a tracking state, which includes an edge correspondence error err₁ and a feature point correspondence error err₂, based on the three-dimensional model 11, the feature point database 21, and the input captured image.

Here, the edge correspondence error err₁ is the tracking error obtained from edges (the first error) and indicates how far corresponding edges are displaced from each other.
The feature point correspondence error err₂ is the tracking error obtained from feature points (the second error) and indicates how far corresponding feature points are displaced from each other.

In this embodiment, the tracking state measurement unit 40 additionally measures, as the tracking state, the number of edges a_e, the number of edge corresponding point candidates c_c, the number of feature points a_p, the reliability v of the initial camera posture, and the motion blur intensity b.
The number of edges a_e is the number of edges in the captured image that correspond to model edges of the three-dimensional model 11.
The number of edge corresponding point candidates c_c is the number of edge corresponding point candidates that exist around an edge in the captured image corresponding to a model edge. In other words, c_c indicates the number of other edges that exist around the target edge.
The number of feature points a_p is the number of feature points in the captured image that correspond to feature points in the feature point database 21.
The reliability v of the initial camera posture indicates the reliability of the integrated error with respect to the value of the camera posture before estimation (the initial value of the camera posture). This parameter accounts for the fact that, depending on the initial value of the optimization, the error can become large.
The motion blur intensity b indicates the number of edges that are blurred by the motion of the camera 2. Here, a blurred edge is one that, visually, spreads out softly when the camera is swung.

In other words, in this embodiment the tracking state measurement unit 40 measures, as the tracking state, five parameters (v, b, c_c, a_e, a_p) from the input image in addition to the edge correspondence error err₁ and the feature point correspondence error err₂. These five parameters are used by the downstream reliability calculation unit 50 to calculate the reliability.
To measure these parameters, the tracking state measurement unit 40 includes, as shown in FIG. 1, an edge detection matching unit 41, a motion blur calculation unit (motion blur calculation means) 42, a feature point detection matching unit (feature point detection matching means) 43, and an initial camera posture calculation unit (initial camera posture calculation means) 44. Note that "measurement" here covers both direct measurement of the target quantity and indirect measurement, in which the quantity is calculated from the directly measured value of a related quantity.

The edge detection matching unit 41 detects edges in the input captured video and matches the detected edges against the model edges stored in the three-dimensional model 11. Here, the edge detection matching unit 41 includes an edge detection unit (edge detection means) 45 and an edge matching unit (edge matching means) 46.
The edge detection unit 45 detects edges in the input captured image and outputs them to the edge matching unit 46.

The edge matching unit 46 matches the model edges contained in the three-dimensional model 11 of the subject against the edges detected by the edge detection unit 45, and counts the number of edge corresponding point candidates c_c and the number of edges a_e. The operator used by the edge matching unit 46 is not particularly limited; for example, an edge detection operator can be used to calculate the number of edges a_e and the like. The edge detection operator may be the Sobel operator, or any of various other edge detection methods may be used, including differential-type methods such as the Prewitt and Roberts operators and template-type methods such as the Robinson and Kirsch edge detection operators. The calculated number of edge corresponding point candidates c_c and number of edges a_e are output to the reliability calculation unit 50. The value of a_e is also used by the reliability correction unit 60.
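For reference, a Sobel operator of the kind mentioned above can be sketched in plain Python as follows. The thresholded pixel count is only a stand-in chosen for illustration: the a_e of the embodiment counts edges matched to model edges, which this sketch does not attempt, and the function names and threshold are hypothetical.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Return gradient magnitudes of a 2-D grayscale list (border pixels stay 0)."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out


def count_edge_pixels(image, threshold):
    """Count pixels whose gradient magnitude exceeds the threshold."""
    return sum(m > threshold for row in sobel_magnitude(image) for m in row)
```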

The edge matching unit 46 also obtains the edge correspondence error err₁ through the matching processing and outputs it to the posture movement amount calculation unit 71 of the camera posture estimation means 70. In addition, the edge matching unit 46 outputs the camera posture in the state before estimation (referred to as state k), that is, the initial camera posture E^(k) acquired from the initial camera posture calculation unit 44, to the camera posture calculation unit 72 of the camera posture estimation means 70.

The motion blur calculation unit (motion blur calculation means) 42 calculates, from the motion blur in the input captured video, the motion blur intensity b, which indicates the number of edges blurred in the video by the motion of the camera. The motion blur intensity b is not measured directly but is calculated from the average edge width w_avg. A conventionally known method can be used for this conversion. The calculated motion blur intensity b is output to the reliability calculation unit 50.

The feature point detection matching unit (feature point detection matching means) 43 detects feature points in the input captured video and counts the number of feature points a_p by matching the detected feature points against the feature points stored in the feature point database 21. The operator used by the feature point detection matching unit 43 is not particularly limited, as long as it allows the number of feature points a_p to be counted. The calculated value of a_p is output to the reliability calculation unit 50, and it is also used by the reliability correction unit 60.

The initial camera posture calculation unit (initial camera posture calculation means) 44 acquires, from the feature point detection matching unit 43, the result of the feature point matching processing and information such as the camera parameters stored in the feature point database 21. From these it calculates the camera posture in the state before estimation (state k), that is, the initial camera posture E^(k), and outputs it to the edge matching unit 46. The state k corresponds, for example, to the frame number of the input video.
The initial camera posture calculation unit 44 also calculates the reliability v of the initial camera posture for the obtained initial camera posture E^(k). The calculated value of v is output to the reliability calculation unit 50.

<Reliability calculation unit>
The reliability calculation unit (reliability calculation means) 50 calculates the reliability f of integrated tracking, in which edge-based tracking and feature-point-based tracking are integrated. The reliability f serves as the index for prorating the edge correspondence error err₁, obtained from edges in the captured image, and the feature point correspondence error err₂, obtained from feature points in the captured image, and it is calculated in accordance with the tracking state measured by the tracking state measurement unit 40.

In this embodiment, when the measured motion blur intensity b is greater than a predetermined first threshold th_b, the reliability calculation unit 50 calculates the reliability f of integrated tracking so that only the edge correspondence error err₁ is used.
When the measured motion blur intensity b is less than or equal to the first threshold th_b and the measured reliability v of the initial camera posture is greater than a predetermined second threshold th_v, the reliability calculation unit 50 calculates f so that it is proportional to the motion blur intensity b and inversely proportional to each of the reliability v of the initial camera posture and the number of feature points a_p.

When the measured motion blur intensity b is less than or equal to the first threshold th_b and the measured reliability v of the initial camera posture is less than or equal to the second threshold th_v, the reliability calculation unit 50 calculates f so that it is proportional to each of the number of edges a_e and the motion blur intensity b, and inversely proportional to each of the reliability v of the initial camera posture, the number of feature points a_p, and the number of edge corresponding point candidates c_c.

The reliability f of integrated tracking is described, for example, by Equation (1). Here, f is the reliability of integrated tracking, b is the motion blur intensity, v is the reliability of the initial camera posture, a_e is the number of edges, a_p is the number of feature points, c_c is the number of edge corresponding point candidates, Κ₀ and Κ₁ are adjustment constants, th_b is the threshold for the motion blur intensity b (first threshold), and th_v is the threshold for the reliability of the initial camera posture (second threshold). Κ₀, Κ₁, th_b, and th_v can be specified by the user.

As described later, the if-clauses of Equation (1) can also be applied in order from the top so that only the necessary parameters are computed. Here, as an example, the tracking state measurement unit 40 calculates all five parameters (v, b, c_c, a_e, a_p), so Equation (1) is rewritten as Equations (1a), (1b), and (1c).
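The three cases of the reliability calculation can be sketched in Python as follows. Equation (1) itself is not reproduced in this text, so the function assumes the simplest form consistent with the stated proportionalities: the proportional terms multiplied together, divided by the inversely proportional terms, and scaled by Κ₀ or Κ₁. The clamp to [0, 1], the positivity checks, and all names (`integrated_reliability`, `k0`, `k1`) are illustrative assumptions, not the patent's definition.

```python
def integrated_reliability(b, v, a_e, a_p, c_c, th_b, th_v, k0=1.0, k1=1.0):
    """Sketch of the piecewise reliability f (assumed form, not the exact Equation (1))."""

    def clamp01(x):
        # Assumption: f is kept in [0, 1] so it can act as a weight.
        return max(0.0, min(1.0, x))

    if b > th_b:
        # Strong motion blur: rely on the edge correspondence error only (f = 1).
        return 1.0

    if v <= 0 or a_p <= 0:
        raise ValueError("v and a_p must be positive in this sketch")

    if v > th_v:
        # Proportional to b; inversely proportional to v and a_p.
        return clamp01(k0 * b / (v * a_p))

    if c_c <= 0:
        raise ValueError("c_c must be positive in this sketch")

    # Proportional to a_e and b; inversely proportional to v, a_p and c_c.
    return clamp01(k1 * a_e * b / (v * a_p * c_c))
```

With f = 1 the integrated error reduces to the edge term alone, which is how the first case "uses only err₁".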

<Reliability correction unit>
The reliability correction unit (reliability correction means) 60 generates a correction reliability η by correcting the reliability f of integrated tracking calculated by the reliability calculation unit 50. Equations (1b) and (1c) above are derived on the assumption that the number of feature points equals the number of edges, so problems can arise when the two numbers are not comparable. The reliability f of integrated tracking is therefore corrected in that case. In the present embodiment, the reliability correction unit 60 obtains a sample ratio γ, which is the ratio of the number of edges a_e to the number of feature points a_p, and generates the correction reliability η from γ and the reliability f at that time. This improves the accuracy of camera posture estimation even when the number of feature points and the number of edges are not comparable. The correction reliability η generated here is output to the camera posture estimation means 70.

In the present embodiment, when the reliability f of integrated tracking is greater than 0.5 and the sample ratio γ is less than 1, the reliability correction unit 60 calculates the correction reliability η by a first correction formula that is proportional to f and inversely proportional to γ.
When the reliability f of integrated tracking is less than 0.5 and the sample ratio γ is greater than 1, the reliability correction unit 60 calculates the correction reliability η by a second correction formula that is likewise proportional to f and inversely proportional to γ.

However, when no correction is needed, that is, when neither of the two conditions above is satisfied, the reliability correction unit 60 uses the acquired reliability f of integrated tracking as the correction reliability η without change (η = f). In this case, the reliability f of integrated tracking is output to the camera posture estimation means 70 as the correction reliability η.

For example, the reliability correction unit 60 calculates the correction reliability η by the following Equation (2) and the sample ratio γ by Equation (3).

Equation (2) can be decomposed into the first correction formula shown in Equation (2a), the second correction formula shown in Equation (2b), and Equations (2c) and (2d).
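The branching of the reliability correction can be sketched as below. Equations (2), (2a), and (2b) are not reproduced in this text. Because both correction formulas are described only as proportional to f and inversely proportional to γ, the sketch uses `f / gamma` clamped to [0, 1] as a stand-in for both; this default and the names `sample_ratio`, `corrected_reliability`, and `correct` are assumptions for illustration. The sample ratio follows the description of Equation (3) as the ratio of the number of edges to the number of feature points.

```python
def sample_ratio(a_e, a_p):
    """Sample ratio gamma: number of edges relative to number of feature points."""
    if a_p <= 0:
        raise ValueError("a_p must be positive")
    return a_e / a_p


def corrected_reliability(f, gamma, correct=lambda f, g: f / g):
    """Return eta; `correct` is a hypothetical stand-in for Equations (2a)/(2b)."""
    needs_correction = (f > 0.5 and gamma < 1.0) or (f < 0.5 and gamma > 1.0)
    if needs_correction:
        # Assumption: the corrected value is clamped so it remains a valid weight.
        return max(0.0, min(1.0, correct(f, gamma)))
    # All other cases: pass f through unchanged (eta = f).
    return f
```

Under this stand-in, edges weighted above 0.5 but under-sampled (γ < 1) receive a larger weight, and edges weighted below 0.5 but over-sampled (γ > 1) receive a smaller one.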

<Camera posture estimation means>
The camera posture estimation means 70 generates an integrated error err by apportioning the edge correspondence error err₁ and the feature point correspondence error err₂ according to the reliability f of integrated tracking calculated by the reliability calculation unit 50, and estimates the current camera posture so that the integrated error err is minimized. Here, the camera posture in state k before estimation is the initial camera posture E^(k), and the current camera posture is written E^(k+1). The state k denotes, for example, the state at each frame of the input video. In the present embodiment, the camera posture estimation means 70 obtains the integrated error err according to the correction reliability η generated by the reliability correction unit 60 and estimates the current camera posture E^(k+1).

The camera posture estimation means 70 estimates the camera posture at which the camera tracking error is minimized, for example by the linear iterative method described later, and includes a posture movement amount calculation unit 71 and a camera posture calculation unit 72.
The posture movement amount calculation unit 71 forms a weighted sum of the edge correspondence error err₁ acquired from the edge matching unit 46 and the feature point correspondence error err₂ acquired from the feature point detection matching unit 43, weighted according to the correction reliability η acquired from the reliability correction unit 60, and calculates the camera posture movement amount ΔE so that the resulting integrated error err is minimized. The posture movement amount calculation unit 71 obtains the integrated error err by, for example, Equation (4). In the error function to be minimized, the integrated error err is governed by a single weight value determined from the correspondence errors (reliability) of both edges and feature points. As described later, this weight value was derived analytically.

err = η·err₁ + (1 − η)·err₂ … Equation (4)
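As a minimal sketch of Equation (4), the function below treats err₁ and err₂ as scalar error values and η as a weight in [0, 1]. The function name and the range check are assumptions added for illustration.

```python
def integrated_error(eta, err1, err2):
    """Weighted sum of the edge error err1 and the feature point error err2."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta is assumed to lie in [0, 1]")
    return eta * err1 + (1.0 - eta) * err2
```

At η = 1 only the edge correspondence error contributes, and at η = 0 only the feature point correspondence error contributes.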

The camera posture calculation unit 72 estimates the current camera posture E^(k+1) by Equation (5), based on the camera posture movement amount ΔE calculated by the posture movement amount calculation unit 71 and the initial camera posture E^(k) before estimation (the camera posture in state k) acquired from the edge matching unit 46. The camera postures E^(k), E^(k+1), and so on are represented as matrices, as described later, and "·" in Equation (5) denotes matrix multiplication.

E^(k+1) = ΔE · E^(k) … Equation (5)
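The update in Equation (5) is a left-multiplication of the previous posture by the movement amount. The sketch below assumes both are 4 × 4 homogeneous matrices stored as nested lists; the helpers `matmul4`, `translation`, and `update_pose` are hypothetical names used only for this illustration.

```python
def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    """Homogeneous 4x4 matrix for a pure translation."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def update_pose(delta_e, e_k):
    """E^(k+1) = delta_E . E^(k); the order matters because matrices do not commute."""
    return matmul4(delta_e, e_k)
```

For example, `update_pose(translation(1, 0, 0), translation(0, 2, 0))` gives the same matrix as `translation(1, 2, 0)`.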

The tracking state measurement unit 40, the reliability calculation unit 50, the reliability correction unit 60, and the camera posture estimation means shown in FIG. 1 are realized, for example, by a CPU loading a predetermined program stored in a ROM or the like of the storage means into RAM and executing it.

<Output means>
The output means 80 outputs the estimated current camera posture E^(k+1) to the rendering device 4 (see FIG. 2) or to an output device (not shown), and consists of a predetermined output interface or the like. The output device (not shown) is, for example, an HMD, a CRT (Cathode Ray Tube), a liquid crystal display (LCD), a PDP (Plasma Display Panel), or an EL (electroluminescence) display. The output device (not shown) can also switch between and display the information output from the rendering device 4 (see FIG. 2) and from the video composition device 5 (see FIG. 2).

[4. Model-based camera tracking by a linear iterative method]
<4.0.>
This section explains the principle of the linear iterative method performed by the camera posture estimation means 70. The camera posture estimation process is based on Drummond's method, which builds on Lie groups and Lie algebras. Drummond's method is described in "Drummond T. and Cipolla R.: Real-time visual tracking of complex structures, IEEE Trans. on Pattern Analysis and Machine Intelligence 2002; 24(7): 932-946". The specific method is described below.

<4.1. Camera projection matrix>
The camera projection matrix P relates three-dimensional coordinates (X, Y, Z) in the scene (such as feature points on the subject) to the two-dimensional coordinates (u/w, v/w) of their projection on the captured image. It is defined as the product of the camera's internal matrix K and the external parameters E, as expressed in the following Equation (6). The internal matrix represents the unknown optical internal parameters, including, for example, lens distortion.

Here, R and t are the rotation matrix and the translation matrix of the camera, respectively.
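Equation (6) is not reproduced in this text. As a sketch under the standard pinhole model, assuming (u, v, w)ᵀ = K [R | t] (X, Y, Z, 1)ᵀ with a 3 × 3 internal matrix K and no lens distortion, the projection of one scene point could look like this. The function name and the example intrinsics are illustrative assumptions.

```python
def project(K, R, t, point):
    """Project a 3D point to image coordinates (u/w, v/w) with P = K [R | t]."""
    # Camera coordinates: R * (X, Y, Z) + t
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    # Homogeneous image coordinates: K * cam
    u, v, w = (sum(K[i][j] * cam[j] for j in range(3)) for i in range(3))
    if w == 0:
        raise ValueError("point lies on the camera's principal plane")
    return u / w, v / w


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
print(project(K, R, t, (1.0, 2.0, 10.0)))  # (370.0, 340.0)
```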

<4.2. Estimation by iterative update of the camera posture>
Given the internal parameters of the camera and the 3D-2D correspondences of the visual cues, the external parameters E of the camera can be calculated by solving Equation (6). In the present embodiment, the following Equation (7) is applied repeatedly to update the camera posture matrix from a posture obtained for an earlier frame, or obtained roughly, to the posture of the current frame. In other words, the motion matrix toward the current frame is optimized by iteration.

Here, M is a 4 × 4 motion matrix composed of small translations along the x, y, and z axes and small rotations about the x, y, and z axes, and is expressed by the following Equation (8).

To reduce the computational cost, the motion matrix M is approximated as in the following equation.

Estimating the camera posture is therefore equivalent to estimating α_i (i = 0, 1, …, 5), which can be done by repeating the above computation.
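Equation (8) and the approximation that follows it are not reproduced in this text. In Drummond and Cipolla's formulation the motion is the exponential of a weighted sum of six generator matrices G_i, and its first-order approximation is M ≈ I + Σ α_i G_i. The sketch below builds that approximation. The ordering (α₀ to α₂ for translations, α₃ to α₅ for rotations) and the right-handed sign convention of the rotation generators are assumptions.

```python
def generators():
    """Six 4x4 generators: translations along x, y, z, then rotations about x, y, z."""
    G = [[[0.0] * 4 for _ in range(4)] for _ in range(6)]
    G[0][0][3] = 1.0                      # translation along x
    G[1][1][3] = 1.0                      # translation along y
    G[2][2][3] = 1.0                      # translation along z
    G[3][1][2], G[3][2][1] = -1.0, 1.0    # rotation about x
    G[4][0][2], G[4][2][0] = 1.0, -1.0    # rotation about y
    G[5][0][1], G[5][1][0] = -1.0, 1.0    # rotation about z
    return G


def motion_matrix(alpha):
    """First-order motion matrix M ~ I + sum_i alpha_i * G_i for six small parameters."""
    if len(alpha) != 6:
        raise ValueError("expected six parameters alpha_0 .. alpha_5")
    M = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    for a, g in zip(alpha, generators()):
        for r in range(4):
            for c in range(4):
                M[r][c] += a * g[r][c]
    return M
```

The approximation avoids evaluating a matrix exponential at every iteration, at the cost of being valid only for small α_i.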

<4.3. Estimation of α_i by iterative calculation>
The sum of squared errors between the projected model (the coordinates of the visual cues) and the corresponding visual cues (corresponding points) on the captured image can be formulated as follows.

Here, N is the number of visual cues such as edges and feature points, and d is the distance between the coordinates of a visual cue and its corresponding point. f_i is the motion component between the coordinates of the visual cue and the corresponding point (it relates to the displacement when the point is projected with G_i; see Drummond's method).
If α_i equals the correct value α_i^GT, the partial-derivative expression of Equation (11) becomes zero.

In a real environment, however, α_i usually contains an error term ε, so Equation (11) does not become zero.
That is, α_i = α_i^GT + ε. Equation (11) can therefore be rewritten as follows.

From Equation (12), the following linear equation for obtaining the error term ε is derived.

Subtracting the obtained error term ε from α_i gives α_i^Ext, and iterating the process converges to the vicinity of the correct value. In other words, the motion matrix M is optimized by nested iteration: the loop that solves the linear equation of Section 4.3 runs inside the loop that repeats Equation (7) described in Section 4.2.

<4.4. Analytical fusion of visual cues>
To use the visual cues of both edges and feature points in a complementary manner, Equation (10) above is modified as follows.

Here, N_e + N_f is the number of visual cues (= N), N_e is the number of edges (a_e), and N_f is the number of feature points (a_p). η is the correction reliability. Equations (11), (12), and (13) above are modified in the same way. For these equations to hold, the correction reliability η must be determined. In Equation (14), the correction reliability η may be replaced with the reliability f of integrated tracking.
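Equations (10) and (14) are not reproduced in this text. As a hedged reconstruction only, assuming Equation (10) has the least-squares form used in Drummond and Cipolla's method, with d^j the distance and f_i^j the motion component of the j-th cue, the weighted fusion described above could be written as follows. The superscripts e and p (edge and feature point) are notation introduced here for illustration.

```latex
S = \sum_{j=1}^{N} \Bigl( d^{j} - \sum_{i=0}^{5} \alpha_i f_i^{j} \Bigr)^{2}
\quad\longrightarrow\quad
S = \eta \sum_{j=1}^{N_e} \Bigl( d^{e,j} - \sum_{i=0}^{5} \alpha_i f_i^{e,j} \Bigr)^{2}
  + (1-\eta) \sum_{j=1}^{N_f} \Bigl( d^{p,j} - \sum_{i=0}^{5} \alpha_i f_i^{p,j} \Bigr)^{2}
```

With η = 1 only the edge term remains and with η = 0 only the feature point term remains, which matches the role of η in Equation (4).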

[5. Method for analyzing the shooting environment]
<5.0. Premise>
This chapter analyzes the conditions desirable for robustly tracking each visual cue (edges and feature points) and describes an example of a method for deriving an evaluation formula that dynamically adjusts the reliability f of integrated tracking or the correction reliability η in an arbitrary environment. The inherent properties of edges and feature points are analyzed below, with a focus on how both depend on environmental conditions.

For this analysis, CG video was produced with known camera work. The camera was placed away from the CG objects so that as many of the CG objects in the scene as possible could be captured. Using the resulting video, the camera posture was estimated separately by a feature-point-only method and by an edge-only method. To allow a careful analysis, the known (ground-truth) posture was used as the initial (or immediately preceding) posture. Four analyses (Analysis 1 to Analysis 4) were performed.

<5.1. Analysis 1: Gaussian noise added to the initial value of camera posture estimation>
To analyze the robustness of the feature-point-based (hereinafter also simply "point-based") and edge-based camera tracking methods with respect to the accuracy (reliability v) of the initial value of camera posture estimation (the value before posture estimation), simulations were performed in which different levels of Gaussian noise were added to the translation term of the initial value. The results are shown in FIG. 4.

FIG. 4(a) shows the results of point-based tracking, and FIG. 4(b) shows the results of edge-based tracking. The horizontal axis of each graph shows the camera translations t₁, t₂, t₃ and the camera rotations r₁, r₂, r₃ measured as the camera posture, together with the reprojection error. The vertical axis of each graph shows the error, expressed as a standard deviation. Rotation angles of the camera posture are in radians, and the reprojection error is in pixels on the camera screen.

In FIG. 4, v denotes not the value (0 to 1) of the reliability v of the initial camera posture itself but the corresponding "added noise". In Analysis 1, the estimation error was obtained while varying v over integer values from 0 to 30.

When the added noise is small, edge-based tracking is not greatly affected, as shown in FIG. 4(b). When the added noise is large, point-based tracking shows no noticeable effect, as shown in FIG. 4(a), whereas the performance of edge-based tracking degrades exponentially, as shown in FIG. 4(b). Point-based tracking is therefore more robust to the accuracy of the posture used as the initial value of the estimation. Accordingly, the accuracy of the posture used as the initial value is considered a decisive condition for determining the reliability of edge-based tracking.

<5.2. Analysis 2: Scene complexity>
≪5.2.1≫
Scene complexity refers to the fineness of the textures and edges contained in the scene itself, and relates directly to the amounts of the two visual cues (the number of edges a_e and the number of feature points a_p). The performance of point-based tracking and edge-based tracking was therefore analyzed with respect to the amounts of the two visual cues. The results are shown in FIG. 5. The graphs in FIG. 5(a) and FIG. 5(b) have the same horizontal and vertical axes as FIG. 4(a) and FIG. 4(b). In FIG. 5(a), a corresponds to the number of feature points a_p, and in FIG. 5(b), a corresponds to the number of edges a_e. Each graph shows the experimental results obtained when the parameter a was varied.

The amounts of the two visual cues (the number of edges a_e and the number of feature points a_p) were adjusted by changing the sampling interval along the edges of the model (the three-dimensional model of the subject) and the degree of subsampling of the reference feature points (the feature point database), in such a way that the cues remained evenly distributed. However, the amounts of visual cues must be kept above a certain minimum for a reasonable camera posture estimate. In this example, the number of feature points a_p was kept at 15 or more and the number of edges a_e at 50 or more. When the amounts of visual cues are sufficient, performance is not greatly affected by changes in those amounts, as shown in FIG. 5. When the amounts of visual cues become small, however, performance degrades linearly.

≪5.2.2≫
Scene complexity also relates to the distribution of edges, which readily affects reliability during the search for points in the captured image that correspond to the model edges. For example, if the captured image contains many spurious edges (edges that do not correspond to any model edge), the number of false correspondences (the number of edge corresponding point candidates c_c) is more likely to increase. The performance of edge-based tracking was therefore analyzed with respect to the reliability of the search for points corresponding to the model edges. For this purpose, l randomly placed red lines (l is the lowercase letter el) were additionally drawn on the camera image as distractors. As shown in FIG. 6, the experiment showed that increasing l markedly degrades the performance of edge-based tracking. The graphs of FIG. 6 have the same axes as those of FIG. 4.

<5.3. Analysis 3: Motion blur added to the camera image>
Simulations were performed in which different levels of horizontal motion blur were added to the camera image. To simulate motion blur, a moving-average filter [1/b … 1/b] was used (the brackets contain b entries of 1/b). FIG. 7 shows the difference between the estimated posture and the ground-truth camera posture. The graphs of FIG. 7 have the same axes as those of FIG. 4. In FIG. 7, b denotes the motion blur intensity. In this analysis, the estimation error was obtained while varying b over integer values from 0 to 11.
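The moving-average filter described above can be sketched for a single image row as follows. The text specifies only the kernel [1/b … 1/b]; the window centering, the replication of border pixels, and treating b ≤ 1 as "no blur" are assumptions made for this illustration, as is the function name.

```python
def motion_blur_row(row, b):
    """Apply a horizontal moving average of length b to one row of pixel values."""
    if b <= 1:
        # Assumption: b = 0 or 1 leaves the row unchanged.
        return list(row)
    n = len(row)
    blurred = []
    for x in range(n):
        total = 0.0
        for k in range(b):
            # Window roughly centered on x; indices are clamped at the borders.
            idx = min(max(x + k - b // 2, 0), n - 1)
            total += row[idx]
        blurred.append(total / b)
    return blurred


print(motion_blur_row([0, 0, 9, 0, 0], 3))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

Applying the function to every row of every color channel would yield the horizontally blurred image; a larger b spreads each edge over more pixels.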

As the motion blur (motion blur intensity b) increases, the performance of both the point-based tracking in FIG. 7(a) and the edge-based tracking in FIG. 7(b) degrades proportionally. Point-based tracking, however, is affected more strongly. The reliability f of integrated tracking or the correction reliability η should therefore increase in proportion to the amount of motion blur. It was also observed that increasing motion blur reduces the number of feature points found in the camera image. Consequently, for enough feature points to be extracted for a reasonable camera posture estimate, the motion blur must be at or below a predetermined threshold (b = 11 in this experiment).

<5.4. Analysis 4: Noise added to the camera image>
Simulations were performed in which different levels of Gaussian noise N(0, n²) were added to the pixel values (R, G, B) of the camera image. FIG. 8 shows the difference between the estimated posture and the ground truth. The graphs of FIG. 8 have the same axes as those of FIG. 4. In this analysis, the estimation error was obtained while varying n over integer values from 0 to 100.

For both the point-based tracking in FIG. 8(a) and the edge-based tracking in FIG. 8(b), varying n had no significant effect on the estimation error. The results were similar when uniform noise U(−n, n) was used instead of Gaussian noise. These results are attributed to the robustness to noise of SURF (Speeded Up Robust Features) and of the Canny operator.

<5.5. Summary of the analyses>
From the analyses, the reliability f of integrated tracking or the correction reliability η (hereinafter simply f) is inferred to be proportional to the number of edges a_e and the motion blur intensity b, inversely proportional to v shown in FIG. 4, the number of feature points a_p, and l shown in FIG. 6, and unrelated to n shown in FIG. 8. Here, v shown in FIG. 4 corresponds to the reliability v of the initial camera posture.

For simplicity, the relationship between the reliability f of integrated tracking and each parameter is assumed here to be linear. In practice, v shown in FIG. 4 is not measured directly but is calculated from the average distance (d_avg) between the feature points and their corresponding points. Likewise, l shown in FIG. 6 is not measured directly; the number of edge corresponding point candidates c_c is assumed to be proportional to l, and l is calculated from c_c. The motion blur intensity b is also not measured directly but is calculated from the average edge width (w_avg). Such conversion formulas are known.

Based on the above analyses, the reliability f of integrated tracking calculated by the reliability calculation unit 50 is described as in Equation (1) above. Because the inherent properties of edges and feature points were analyzed experimentally with the shooting environment taken into account, the derived Equations (1) and (2) can be applied optimally and flexibly to arbitrary environments.

[6. Operation of the camera posture estimation device]
Next, the operation of the camera posture estimation device 1 shown in FIG. 1 is described. The block diagram of FIG. 1 directly represents the camera parameter estimation algorithm of the device: the tracking state measurement unit 40 calculates all five parameters (v, b, c_c, a_e, a_p) for the input video, and Equations (1a), (1b), and (1c) above are used. A description of this typical processing flow is omitted. Instead, an example of the reliability calculation and reliability correction procedure in which the tracking state measurement unit 40 uses Equation (1) above and calculates only the necessary parameters among the five for the input video is described with reference to FIG. 9 (and FIG. 1 as appropriate).

In this case, the camera posture estimation device 1 first measures the motion blur intensity b from the motion blur of the input captured video, using the motion blur calculation unit 42 of the tracking state measurement unit 40 (step S1). The reliability calculation unit 50 then determines whether the motion blur intensity b is larger than the first threshold th_b (step S2). If b is less than or equal to the first threshold th_b (step S2: No), the feature point detection matching unit 43 of the tracking state measurement unit 40 measures the number of feature points a_p, and the initial camera posture calculation unit 44 measures the reliability v of the initial camera posture (step S3).

The reliability calculation unit 50 then determines whether the reliability v of the initial camera posture is larger than the second threshold th_v (step S4). If v is less than or equal to the second threshold th_v (step S4: No), the edge matching unit 46 of the tracking state measurement unit 40 measures the number of edges a_e and the number of edge corresponding point candidates c_c (step S5).

The reliability calculation unit 50 then calculates the reliability f of the integrated tracking from the five parameters (b, v, a_p, a_e, c_c) according to equation (1c) described above, and the reliability correction unit 60 calculates the sample ratio γ from the two parameters (a_p, a_e) according to equation (3) described above (step S6).

The reliability correction unit 60 then evaluates the relationship between the integrated tracking reliability f and the sample ratio γ (step S7). If f is larger than 0.5 and γ is smaller than 1 (f > 0.5 and γ < 1), the reliability correction unit 60 calculates the correction reliability η by equation (2a) described above (step S8).

If, in step S7, the integrated tracking reliability f is smaller than 0.5 and the sample ratio γ is larger than 1 (f < 0.5 and γ > 1), the reliability correction unit 60 calculates the correction reliability η by equation (2b) described above (step S9).
In all other cases in step S7, the reliability correction unit 60 uses the integrated tracking reliability f as the correction reliability η without change (η = f) (step S10). That is, when f ≤ 0.5 and γ ≥ 1, or when f ≥ 0.5 and γ ≤ 1, the correction reliability η is calculated by equation (2c) or equation (2d) described above.
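The branch structure of steps S7 to S10 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the reliability correction unit 60: equations (2a) and (2b) are defined elsewhere in the specification, so they are passed in as callables, and the names correct_reliability, eq_2a, and eq_2b are hypothetical.

```python
from typing import Callable, Tuple


def correct_reliability(
    f: float,
    gamma: float,
    eq_2a: Callable[[float, float], float],
    eq_2b: Callable[[float, float], float],
) -> Tuple[float, str]:
    """Return (eta, step label) following the branches of steps S7 to S10."""
    if f > 0.5 and gamma < 1.0:
        return eq_2a(f, gamma), "S8"   # corrected by equation (2a)
    if f < 0.5 and gamma > 1.0:
        return eq_2b(f, gamma), "S9"   # corrected by equation (2b)
    return f, "S10"                    # all other cases: eta = f


# Stand-in formulas for illustration only; they are NOT equations (2a)/(2b).
placeholder_2a = lambda f, gamma: min(1.0, f / gamma)
placeholder_2b = lambda f, gamma: f / gamma

for f, gamma in [(0.8, 0.5), (0.3, 2.0), (0.5, 1.0)]:
    print(f, gamma, correct_reliability(f, gamma, placeholder_2a, placeholder_2b))
```

Passing the formulas as arguments keeps the sketch independent of their exact form; only the three-way decision is taken from the text.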

On the other hand, if the reliability v of the initial camera posture is larger than the second threshold th_v in step S4 (step S4: Yes), the reliability calculation unit 50 calculates the integrated tracking reliability f according to equation (1b) described above, using the three parameters (b, v, a_p) obtained up to that point (step S11). Then, in step S10 described above, the reliability correction unit 60 uses the integrated tracking reliability f as the correction reliability η without change (η = f).

Thus, when the reliability v of the initial camera posture is larger than the second threshold th_v (step S4: Yes), there is no need to measure (or calculate) the number of edges a_e and the number of edge corresponding point candidates c_c, nor to correct the integrated tracking reliability f, so the computational load is reduced and processing is faster. Real-time performance is an important factor in applications such as virtual reality (VR), so the lower the computational load, the better. Calculating only the necessary parameters, as in the flow of FIG. 9, is therefore effective for VR applications.

If the motion blur intensity b is larger than the first threshold th_b in step S2 (step S2: Yes), the reliability calculation unit 50 sets the integrated tracking reliability f to 1 according to equation (1a) described above (step S12). Then, in step S10 described above, the reliability correction unit 60 uses f as the correction reliability η without change (η = f = 1).

Thus, when the motion blur intensity b is larger than the first threshold th_b (step S2: Yes), there is no need to measure (or calculate) any parameter other than b, nor to correct the integrated tracking reliability f, so the computational load is reduced and processing is faster.
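The way the flow of FIG. 9 measures only the parameters each branch needs can be sketched as below. This is an illustrative Python sketch, not the patented implementation: the function name integrated_reliability and the measure dictionary are hypothetical, the measuring units 42, 43, 44, and 46 are represented by zero-argument callables, and equations (1b) and (1c) are passed in as callables because their exact form is given elsewhere in the specification.

```python
from typing import Callable, Dict, Tuple


def integrated_reliability(
    measure: Dict[str, Callable[[], float]],
    eq_1b: Callable[[float, float, float], float],
    eq_1c: Callable[[float, float, float, float, float], float],
    th_b: float,
    th_v: float,
) -> Tuple[float, Dict[str, float]]:
    """Return f and the parameters that were actually measured."""
    p = {"b": measure["b"]()}                      # step S1
    if p["b"] > th_b:                              # step S2: Yes
        return 1.0, p                              # step S12, equation (1a)
    p["a_p"] = measure["a_p"]()                    # step S3
    p["v"] = measure["v"]()
    if p["v"] > th_v:                              # step S4: Yes
        return eq_1b(p["b"], p["v"], p["a_p"]), p  # step S11
    p["a_e"] = measure["a_e"]()                    # step S5
    p["c_c"] = measure["c_c"]()
    f = eq_1c(p["b"], p["v"], p["a_p"], p["a_e"], p["c_c"])  # step S6
    return f, p


# Constant stand-ins for the measuring units; a_p and a_e are made-up values.
measure = {"b": lambda: 7.0, "v": lambda: 10.0, "a_p": lambda: 120.0,
           "a_e": lambda: 80.0, "c_c": lambda: 300.0}
f, measured = integrated_reliability(
    measure,
    eq_1b=lambda b, v, a_p: 0.5,            # placeholder, NOT equation (1b)
    eq_1c=lambda b, v, a_p, a_e, c_c: 0.5,  # placeholder, NOT equation (1c)
    th_b=95.0,
    th_v=6.0,
)
print(f, sorted(measured))
```

With these stand-in values the edge measurements of step S5 are never invoked, which is the saving the text attributes to the "step S4: Yes" branch.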

According to the present embodiment, the camera posture estimation apparatus 1 generates the integrated error err by varying the ratio at which the edge correspondence error err₁ and the feature point correspondence error err₂ are apportioned, in accordance with the tracking state measured from the three-dimensional model 11 of the subject, the feature point database 21, and the captured image, and then estimates the current camera posture. Consequently, as shown in the examples described later, the accuracy and robustness of camera tracking and the flexibility in the shooting environment are improved compared with using only one of the two errors or apportioning both errors at a fixed ratio.
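To illustrate how a varying apportionment ratio differs from a fixed one, the following Python sketch blends the two errors. The exact form of the integrated error is defined elsewhere in the specification; the convex combination used here is an assumption, chosen because it is consistent with η = 1 meaning that only the edge error is used. The function name integrated_error and the numeric error values are hypothetical.

```python
def integrated_error(err1: float, err2: float, eta: float) -> float:
    """Blend edge error err1 and feature point error err2 (assumed convex form)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return eta * err1 + (1.0 - eta) * err2


err1, err2 = 2.0, 4.0  # made-up error values for illustration
print(integrated_error(err1, err2, 1.0))  # edge error only
print(integrated_error(err1, err2, 0.0))  # feature point error only
print(integrated_error(err1, err2, 0.5))  # fixed, equal apportionment
```

Under this assumed form, using either error alone or a fixed ratio are special cases of a constant η, whereas the embodiment recomputes η from the tracking state for each input frame.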

Furthermore, in the present embodiment, in order to use edges and feature points cooperatively, the environment was analyzed and the properties specific to edges and feature points were analyzed experimentally, so as to determine how the two kinds of features should be evaluated in a unified manner. Equations (1) and (2) derived in this way can be applied optimally and flexibly to any environment. This has the effect of making the model-based camera posture estimation method robust.

The present invention has been described above on the basis of this embodiment, but the present invention is not limited to it. For example, the camera posture estimation apparatus shown in FIG. 1 can be realized by running, on a general-purpose computer, a program that causes the computer to function as each of the means described above. This program (camera posture estimation program) can be distributed via a communication line, or written to a recording medium such as a CD-ROM and distributed.

In the present embodiment, the camera posture estimation means 70 estimates the camera posture by, for example, a linear iterative solution method. However, the minimization problem may instead be solved using, for example, a linear programming method such as the Lagrange undetermined multiplier method, or a nonlinear programming method such as the penalty method.

To confirm the effects of the present invention, computer simulation experiments were conducted to verify the performance of the camera posture estimation apparatus of the present invention. Specifically, two experiments were performed: one comparing the camera posture estimated by the apparatus of the present invention with the camera posture estimated using edges or feature points alone (Experiment 1), and one confirming the ability to adapt to environmental changes in a system applied to augmented reality (Experiment 2).

[Experiment 1]
(Experimental conditions)
In equation (1) described above, the user-specified constants th_b, th_v, Κ₀, and Κ₁ were set to th_b = 95, th_v = 6, Κ₀ = 0.8, and Κ₁ = 0.12, respectively.

(Comparative examples)
The camera posture was estimated using feature points alone (point-based method: Comparative Example 1).
The camera posture was also estimated using edges alone (edge-based method: Comparative Example 2). In addition, the camera posture was estimated using both edge tracking and feature point tracking, with the errors weighted by a fixed value of 0.5 regardless of the situation (method with the reliability fixed at η = 0.5: Comparative Example 3).

(Experimental results)
As the camera posture, the camera translations t₁, t₂, and t₃ and the camera rotations r₁, r₂, and r₃ were measured. Table 1 shows the mean values and standard deviations of the measurement results.
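A summary of this kind can be computed per pose component as sketched below. This is only an illustration: the text does not state whether the population or the sample standard deviation was used (the population form is assumed here), the function name summarize is hypothetical, and the numbers are placeholders rather than values from Table 1.

```python
import statistics
from typing import Dict, List, Tuple


def summarize(errors: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each pose component to (mean, population standard deviation)."""
    return {
        name: (statistics.fmean(values), statistics.pstdev(values))
        for name, values in errors.items()
    }


# Placeholder per-frame errors; NOT measurements from the experiment.
per_frame_errors = {
    "t1": [1.0, 2.0, 3.0],
    "r1": [0.5, 0.5, 0.5],
}
for name, (mean, std) in summarize(per_frame_errors).items():
    print(name, round(mean, 3), round(std, 3))
```

Replacing statistics.pstdev with statistics.stdev would give the sample standard deviation instead.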

The experimental results for the camera translation t₁ are shown in FIG. 10, those for t₂ in FIG. 11, and those for t₃ in FIG. 12. In FIGS. 10 to 12, (a) shows Comparative Example 1, (b) Comparative Example 2, (c) Comparative Example 3, and (d) the example of the present invention. FIG. 10(e) shows the change in the correction reliability η in the example of the present invention for the camera translation t₁. In each graph of FIGS. 10 to 12, the horizontal axis indicates the frame number (time axis) and the vertical axis indicates the error of the estimated camera position (difference from the ground truth), except that the vertical axis of FIG. 10(e) indicates the correction reliability η.

The symbol l (lowercase L) shown in the graph of FIG. 10(e) is the parameter described in section 5.2.2 above: the number of red lines drawn additionally at random positions in the camera image as disturbances. It corresponds to the number of edge corresponding point candidates c_c. Here, l = 300 for all frames. The reliability v of the initial camera posture was calculated from the average distance d_avg between the feature points and their corresponding points; v = 5 in frames 101 to 151 and v = 10 in frames 151 to 201. The motion blur intensity b was calculated from the average edge width w_avg; b = 7 in frames 126 to 176 (b was smaller than 11). As a result, the correction reliability η changed as shown in the graph of FIG. 10(e).

As shown in Table 1 and FIGS. 10 to 12, the example of the present invention was more accurate than Comparative Examples 1 and 2, which estimated the camera posture using edges or feature points alone. It was also more accurate, and more stable in accuracy over time, than Comparative Example 3, which uses both edges and feature points but simply averages their contributions. It can therefore be said that the camera posture estimation apparatus of the present invention is more accurate and more robust than conventional approaches.

[Experiment 2]
The video composition output results of the video composition system 100 shown in FIG. 2 were compared. In the shooting environment, both the illumination and the camera direction changed over time. FIG. 13 schematically shows the video composition results obtained when the camera posture was estimated by this video composition system 100 while the illumination changed together with the camera motion. FIGS. 13(a), 13(b), and 13(c) show the video composition output as time elapsed, in that order. In the output, only the pot is a CG object; the stand on which the pot is placed and the wall are live-action video. The changes in illumination are exaggerated in FIG. 13. FIG. 13(a) shows normal illumination, FIG. 13(b) shows the illumination darkened, and FIG. 13(c) shows the illumination brightened.

Even when the illumination suddenly became dark, as shown in FIG. 13(b), both edges and feature points were detected, the camera posture was estimated stably, and the CG object and the live-action video were composited accurately. The same held when the illumination suddenly became bright, as shown in FIG. 13(c). Conventional model-based camera posture estimation methods are built on the assumption that the illumination and the camera direction do not change over time, so they could not composite the CG object and the live-action video this accurately. It can therefore be said that a video composition system using the camera posture estimation apparatus of the present invention adapts more effectively and flexibly to environmental changes than conventional systems.

The present invention can be used in industrial applications of virtual reality and augmented reality, and in video production that uses video composition. It can also be used as a sensor in various robots that make use of posture information.

DESCRIPTION OF SYMBOLS
100 Video composition system
1 Camera posture estimation apparatus
2 Camera
3 Virtual three-dimensional object model storage means
4 Rendering apparatus
5 Video composition apparatus
6 CG data
10 Three-dimensional model storage means
11 Three-dimensional model
20 Feature point database storage means
21 Feature point database
30 Video input means
40 Tracking state measurement unit (tracking state measuring means)
41 Edge detection matching unit
42 Motion blur calculation unit (motion blur calculation means)
43 Feature point detection matching unit (feature point detection matching means)
44 Initial camera posture calculation unit (initial camera posture calculating means)
45 Edge detection unit (edge detection means)
46 Edge matching unit (edge matching means)
50 Reliability calculation unit (reliability calculation means)
60 Reliability correction unit (reliability correction means)
70 Camera posture estimation means
71 Posture movement amount calculation unit
72 Camera posture calculation unit
80 Output means

Claims

A model-based camera posture estimation apparatus for estimating a camera posture using edges and feature points in a captured image of a subject, the apparatus comprising:
three-dimensional model storage means for storing a three-dimensional model indicating information on features of a subject existing in the shooting direction of the camera;
feature point database storage means for storing a feature point database that stores feature point information including feature point descriptors and three-dimensional information of the three-dimensional model;
tracking state measuring means for measuring, on the basis of the three-dimensional model and the feature point database created in advance and of an input captured image including the subject, a tracking state including at least a first error, which is a tracking error determined on the basis of edges, and a second error, which is a tracking error determined on the basis of feature points;
reliability calculation means for calculating, in accordance with the measured tracking state, a reliability of integrated tracking in which edge-based tracking and feature point-based tracking are integrated, as an index for apportioning the first error and the second error; and
camera posture estimation means for generating an integrated error by varying, in accordance with the reliability of the integrated tracking, the ratio at which the first error and the second error are apportioned, and estimating the current camera posture so that the integrated error is minimized.

The camera posture estimation apparatus according to claim 1, wherein
the tracking state measuring means measures, as the tracking state, the first error, the second error, the number of edges, the number of feature points, a motion blur intensity indicating the number of edges blurred by the motion of the camera, a reliability of an initial camera posture indicating the reliability of the value before the camera posture is estimated, and the number of edge corresponding point candidates existing around an edge in the captured image that corresponds to a model edge of the three-dimensional model, and
the reliability calculation means:
calculates the reliability of the integrated tracking so that only the first error is used, when the motion blur intensity measured as the tracking state is larger than a predetermined first threshold;
calculates the reliability of the integrated tracking so as to be proportional to the motion blur intensity and inversely proportional to each of the reliability of the initial camera posture and the number of feature points, when the motion blur intensity is equal to or less than the first threshold and the reliability of the initial camera posture measured as the tracking state is larger than a predetermined second threshold; and
calculates the reliability of the integrated tracking so as to be proportional to each of the number of edges and the motion blur intensity and inversely proportional to each of the reliability of the initial camera posture, the number of feature points, and the number of edge corresponding point candidates, when the motion blur intensity is equal to or less than the first threshold and the reliability of the initial camera posture is equal to or less than the second threshold.

The camera posture estimation apparatus according to claim 2, further comprising reliability correction means for generating a correction reliability obtained by correcting the reliability of the integrated tracking, wherein
the camera posture estimation means varies the ratio at which the first error and the second error are apportioned in accordance with the correction reliability and estimates the current camera posture, and
the reliability correction means:
calculates the correction reliability by a first correction formula that is proportional to the reliability of the integrated tracking and inversely proportional to a sample ratio, the sample ratio indicating the ratio of the number of edges to the number of feature points, when the reliability of the integrated tracking is larger than 0.5 and the sample ratio is smaller than 1; and
calculates the correction reliability by a second correction formula that is proportional to the reliability of the integrated tracking and inversely proportional to the sample ratio, when the reliability of the integrated tracking is smaller than 0.5 and the sample ratio is larger than 1.

The camera posture estimation apparatus according to claim 2 or claim 3, wherein the tracking state measuring means includes:
edge detection means for detecting edges from the input captured video;
edge matching means for calculating the number of edge corresponding point candidates and the number of edges by a matching process between model edges included in the three-dimensional model and the detected edges;
motion blur calculation means for calculating the motion blur intensity on the basis of the motion blur of the input captured video;
feature point detection matching means for detecting feature points from the input captured video and calculating the number of feature points by a matching process between the feature points stored in the feature point database and the detected feature points; and
initial camera posture calculating means for calculating the initial camera posture from the result of the feature point matching process and calculating the reliability of the initial camera posture for the obtained initial camera posture.

A camera posture estimation program for estimating a camera posture on a model basis using edges and feature points in a captured image of a subject, the program causing a computer that comprises three-dimensional model storage means for storing a three-dimensional model indicating information on features of a subject existing in the shooting direction of the camera, and feature point database storage means for storing a feature point database that stores feature point information including feature point descriptors and three-dimensional information of the three-dimensional model, to function as:
tracking state measuring means for measuring, on the basis of the three-dimensional model and the feature point database created in advance and of an input captured image including the subject, a tracking state including at least a first error, which is a tracking error determined on the basis of edges, and a second error, which is a tracking error determined on the basis of feature points;
reliability calculation means for calculating, in accordance with the measured tracking state, a reliability of integrated tracking in which edge-based tracking and feature point-based tracking are integrated, as an index for apportioning the first error and the second error; and
camera posture estimation means for generating an integrated error by varying, in accordance with the reliability of the integrated tracking, the ratio at which the first error and the second error are apportioned, and estimating the current camera posture so that the integrated error is minimized.