JP2014032623A

JP2014032623A - Image processor

Info

Publication number: JP2014032623A
Application number: JP2012174320A
Authority: JP
Inventors: Tatsuya Kobayashi; 達也小林; Haruhisa Kato; 晴久加藤; Akio Yoneyama; 暁夫米山
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2012-08-06
Filing date: 2012-08-06
Publication date: 2014-02-20
Anticipated expiration: 2032-08-06
Also published as: JP5833507B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of corresponding point detection between an observation object and a camera image of that object without increasing calculation cost.

SOLUTION: An image processor includes a probability density DB 122b that learns the local feature information and three-dimensional coordinates of first feature points detected from an observation object and gives, for arbitrary local feature information, a probability density over the three-dimensional coordinates; a normal management part that manages the normal of each first feature point; a feature point detection part 121 that detects second feature points from a camera image obtained by photographing the observation object and extracts their local feature information; a camera visual line estimation device 14 that estimates the visual line of the camera image; a matching part 122 that associates each second feature point with the first feature point having the higher matching score on the basis of the local feature information and the probability density database; and a second culling processing part 124 that compares the normal of the associated first feature point with the visual line of the camera image and eliminates corresponding point candidates whose angular difference exceeds a predetermined threshold.

Description

The present invention relates to an image processing apparatus that associates the three-dimensional coordinates of an observation object with the two-dimensional coordinates of a camera image obtained by photographing that object with a camera, and in particular to an image processing apparatus that enables highly accurate association without increasing the calculation cost or the amount of memory used.

In recent years, AR (augmented reality) technology, which processes video of real space by computer and adds further information to it, has come to be realized on PCs connected to web cameras and on camera-equipped mobile phone terminals. AR technology needs to estimate the camera posture (the external parameters of the camera) with respect to an object in the camera image, and methods using sensors or reference markers are used for this purpose. Techniques that estimate the camera posture using a three-dimensional object whose shape and image information are known are also being studied.

Patent Document 1 discloses a technique that matches feature points detected from both the registered images of a plurality of objects and an input image, identifies a highly correlated registered image, and estimates the camera posture from the combination of feature points and the shape of the object.

Non-Patent Document 1 discloses a technique that raises the accuracy of corresponding points (the proportion of correctly associated points) by setting a threshold on the output of the classifier.

Patent Document 1: Japanese Patent No. 4715539

Non-Patent Document 1: D. Wagner et al., "Real-Time Detection and Tracking for Augmented Reality on Mobile Phones," IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 3, pp. 355-367, May/June 2010.

Because Patent Document 1 performs feature point matching between the input image and the registered images, the number of corresponding points drops and feature point matching becomes difficult when the database contains no registered image recorded in a state close to the posture of the three-dimensional object in the input image.

In addition, when the number of registered images is increased by photographing the object from various directions, identifying the highly correlated registered image becomes computationally expensive, and the larger classifier increases the amount of memory used by the computer.

Non-Patent Document 1 yields highly accurate corresponding points when the object is a planar image, but it does not achieve sufficient accuracy for three-dimensional objects.

An object of the present invention is to solve the above technical problems and to provide an image processing apparatus that can obtain a large number of highly accurate corresponding points without increasing the calculation cost or the amount of memory used.

To achieve the above object, the present invention is an image processing apparatus that associates the three-dimensional coordinates of an observation object with the feature points of a camera image obtained by photographing that object, and is characterized by the following configurations.

(1) The apparatus comprises: a probability density database that learns the local feature information and three-dimensional coordinates of first feature points detected from the observation object and gives, for arbitrary local feature information, a probability density over the three-dimensional coordinates; means for detecting the normal of each first feature point; feature point detection means for detecting second feature points from a camera image obtained by photographing the observation object; local feature information extraction means for extracting local feature information from each second feature point; means for estimating the line of sight of the camera image; matching means for associating each second feature point, on the basis of its local feature information and the probability density database, with the first feature point having the higher matching score; and corresponding point output means for comparing the normal of the associated first feature point with the line of sight of the camera image and outputting, as corresponding points, the correspondences whose angle difference is below a predetermined threshold.

(2) The apparatus further comprises a line-of-sight database that learns the local feature information and line of sight of feature points detected from a large number of two-dimensional images obtained by projecting the observation object along different lines of sight, and that gives, for arbitrary local feature information, a probability density over the line of sight. The means for estimating the line of sight of the camera image estimates it on the basis of the local feature information of the second feature points detected from the camera image and this line-of-sight probability density.

(3) The apparatus comprises means for generating, on the basis of the angle difference between the normal of each first feature point and the line of sight of the camera image, a mask that excludes from matching any first feature point whose angle difference exceeds a predetermined threshold, and the corresponding point output means performs matching only on the first feature points that are not masked. For each first feature point, the mask holds a multi-valued mask value that depends on the angle difference between its normal and the line of sight of the camera image, and the corresponding point output means performs matching using the multi-valued mask values as weights.

(4) The apparatus comprises camera posture estimation means for estimating the posture of the camera using only those corresponding points, among the corresponding points output from the corresponding point output means, that satisfy a geometric constraint condition. The corresponding point output means changes the threshold dynamically so that the threshold is relaxed as the number of corresponding points that the camera posture estimation means could use for estimating the camera posture decreases.

According to the present invention, the following effects are achieved.

(1) Corresponding points are detected by considering not only the matching score between the local feature information of the feature points detected from the observation object and that of the feature points detected from the camera image, but also the angle difference between the orientation of the observation object (the normal of the feature point) and the orientation of the camera that photographed it (the camera line of sight). Highly accurate corresponding point detection therefore becomes possible with only a small amount of computation.

(2) Because a line-of-sight database is provided that learns the local feature information and line of sight of feature points detected from a large number of two-dimensional images obtained by projecting the observation object along different lines of sight, and that gives a line-of-sight probability density for arbitrary local feature information, the line of sight of the camera image can be estimated accurately without a GPS function or an orientation sensor.

(3) If the matching score between the local feature information of the feature points detected from the observation object and that of the feature points detected from the camera image is weighted according to the angle difference between the orientation of the observation object (the normal of the feature point) and the orientation of the camera (the camera line of sight), flexible corresponding point detection becomes possible. For example, a pair can still be treated as a corresponding point when the angle difference is large, provided the matching score is sufficiently high.

(4) Because the angle threshold is controlled dynamically according to whether the output corresponding points satisfy the geometric constraint condition, a necessary and sufficient number of corresponding points can be obtained.

FIG. 1 is a block diagram of an AR system to which the first and second embodiments of the present invention are applied.
FIG. 2 is a block diagram of the first embodiment of the present invention.
FIG. 3 is a diagram for explaining the function of the second culling process.
FIG. 4 is a diagram showing a method of dynamically setting the angle difference threshold θref.
FIG. 5 is a block diagram of the second embodiment of the present invention.
FIG. 6 is a diagram showing the relationship between the number of corresponding points (horizontal axis) and the matching accuracy (vertical axis) for an embodiment that does not use a mask and an embodiment that uses a mask.
FIG. 7 is a block diagram of an AR system to which the third and fourth embodiments of the present invention are applied.
FIG. 8 is a block diagram of the third embodiment of the present invention.
FIG. 9 is a block diagram of the fourth embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an AR system 1 to which the present invention is applied. The system is used by being mounted on an information terminal such as a mobile phone, a smartphone, a PDA, or a notebook PC.

The imaging device (camera) 10 is a camera module mounted on a mobile terminal or the like, or a web camera device. It photographs the observation object 2 and outputs the resulting camera image Ica to the display device 11 and the camera posture estimation device 12.

The camera line-of-sight estimation device 14 roughly estimates the line of sight of the camera 10 relative to the observation object 2 (the camera line of sight) on the basis of, for example, its own position and orientation as detected by an orientation sensor and a GPS function mounted on the information terminal (neither is shown) and the position of the observation object 2, which is given in advance. As described in detail later, the present invention improves the accuracy of corresponding points by narrowing down the corresponding point candidates in advance on the basis of the camera line of sight.

The camera posture estimation device 12 performs feature point matching between the local feature amounts extracted from the feature points Pm of the observation object 2 and the local feature amounts extracted from the feature points Pc of the camera image Ica. It then adopts as corresponding points those pairs that include a feature point Pm with a higher matching score and a smaller angle difference between the orientation of the observation object (the normal of the feature point) and the orientation of the camera that photographed it (the camera line of sight). Of these corresponding points, only the truly correct ones that satisfy the geometric constraint condition are used to estimate the camera posture.

The camera posture is expressed as a matrix called the external parameters of the camera and contains the position and orientation of the camera in three-dimensional space. How an object appears on the screen is determined by this matrix together with the internal parameters of the camera, a matrix containing camera-specific information such as the focal length and the position of the principal axis, and other optical distortion parameters.

In the present embodiment, the internal parameters and distortion parameters are assumed to have been acquired in advance by camera calibration or the like, and the distortion is assumed to have been removed. The camera posture estimated by the camera posture estimation device 12 is output to the display device 11. When the AR system 1 handles a plurality of types of three-dimensional objects, the type of the corresponding observation object 2 is output to the display device 11 together with the camera posture.

The display device 11 is a monitor device that can present to the user the camera images Ica continuously acquired by the imaging device 10, and may be the display of a mobile terminal. It may also take a form such as a head-mounted display (HMD). With a see-through HMD in particular, it is possible to superimpose only the additional information on the field of view without displaying the camera image Ica. When the display device 11 is a display, the additional information input from the additional information database 13 is superimposed on the camera image Ica at a position corrected by the camera posture input from the camera posture estimation device 12.

The additional information database 13 is a storage device composed of a hard disk drive, a semiconductor memory module, or the like. It holds the CG and two-dimensional images that are superimposed on the observation object 2 on the display device 11 when the AR system 1 recognizes the position of the observation object 2, and it outputs to the display device 11 the additional information on the observation object 2 that corresponds to the camera posture estimated by the camera posture estimation device 12.

FIG. 1 uses a globe as an example of the observation object 2, but a similar AR system can be built for a three-dimensional object with a primitive structure such as a rectangular parallelepiped, a cylinder, or a sphere, and also for an object with a complex structure, as long as its three-dimensional model is given. Envisaged uses of such an AR system include superimposing the terrain so that altitude is displayed visually, superimposing past continental shapes, superimposing updated information when borders or country names change, and displaying the name of a country the user points to in combination with gesture recognition.

In the camera posture estimation device 12, the feature point identification unit 12b, as described in detail later, detects feature points from the camera image Ica output by the imaging device 10 using a feature point detector and classifies each feature point on the basis of the local feature information of the feature point and its neighborhood. It then compares the three-dimensional coordinates, local feature amount, and normal of each feature point learned in advance from the observation object 2 with the two-dimensional coordinates, local feature amount, and line of sight of each feature point detected from the camera image Ica, and outputs the corresponding points with higher accuracy.

The camera posture calculation unit 12a calculates the camera posture on the basis of the plurality of corresponding points output from the feature point identification unit 12b and outputs the result to the display device 11. Methods for estimating the camera posture (the external parameters of the camera) from the two-dimensional and three-dimensional coordinates of corresponding points have long been studied, and the relationship between the three-dimensional and two-dimensional coordinates is generally expressed by the following equation (1).

[u, v, 1]^T = s A W [X, Y, Z, 1]^T   ... (1)

Here, [u, v] and [X, Y, Z] represent the two-dimensional pixel coordinates and the three-dimensional coordinates, respectively, [·]^T denotes the transpose, and s is a scale factor. A and W represent the internal parameters and the external parameters (camera posture) of the camera, respectively. The internal parameters of the camera are obtained in advance by camera calibration.

The camera posture is W = [R, t] = [r1, r2, r3, t], that is, it is expressed by a rotation matrix R and a translation vector t. The camera posture W can be estimated from the matches between three-dimensional coordinates [X, Y, Z, 1]^T and two-dimensional coordinates [u, v, 1]^T together with the internal parameters of the camera.

Because the input correspondences between two-dimensional and three-dimensional coordinates include incorrect ones, it is common to extract only the correct corresponding points (inliers) from the input by a technique such as a sampling method and then estimate the camera posture. The camera posture calculation unit 12a can feed back the absolute number of inliers obtained by the sampling method to the feature point identification unit 12b.
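As an illustration of equation (1) and of the inlier test, the following is a minimal sketch in plain Python. It only projects a three-dimensional point with a given A and W and counts the correspondences whose reprojection error is small; the sampling loop that proposes candidate postures is not shown. The function names, the 3-pixel tolerance, and the numeric values of A and W are assumptions made for this example and do not come from the patent.

```python
import math

def project(A, W, point):
    """Project a 3D point (X, Y, Z) to pixel coordinates (u, v) as in equation (1)."""
    homogeneous = [point[0], point[1], point[2], 1.0]
    # W is the 3x4 external parameter matrix [R | t].
    cam = [sum(W[r][c] * homogeneous[c] for c in range(4)) for r in range(3)]
    # A is the 3x3 internal parameter matrix.
    img = [sum(A[r][c] * cam[c] for c in range(3)) for r in range(3)]
    # The scale s in equation (1) normalises the third component to 1.
    return img[0] / img[2], img[1] / img[2]

def count_inliers(A, W, matches, max_error_px=3.0):
    """Count matches ((u, v), (X, Y, Z)) whose reprojection error is within tolerance.

    max_error_px is a hypothetical tolerance chosen for this sketch.
    """
    inliers = 0
    for (u, v), point in matches:
        pu, pv = project(A, W, point)
        if math.hypot(pu - u, pv - v) <= max_error_px:
            inliers += 1
    return inliers

# Illustrative values only: focal length 500 px, principal point (320, 240),
# identity rotation and translation t = (0, 0, 5).
A = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 5.0]]
matches = [
    ((320.0, 240.0), (0.0, 0.0, 0.0)),  # consistent with W
    ((420.0, 240.0), (1.0, 0.0, 0.0)),  # consistent with W
    ((100.0, 100.0), (1.0, 1.0, 0.0)),  # wrong correspondence
]
num_inliers = count_inliers(A, W, matches)
print(num_inliers)
```

In a full system this count would be evaluated for each candidate posture produced by the sampling method, and the final count is the quantity that could be fed back to the feature point identification unit.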

[First embodiment]
FIG. 2 is a functional block diagram of the first embodiment of the feature point identification unit 12b. The feature point detection unit 121 detects feature points Pc from the camera image Ica and outputs each feature point Pc and its local feature information to the matching unit 122. The camera image Ica may be either a frame captured in real time or one of a series of recorded frames. It does not even need to be part of a sequence and may be a single photograph showing a pre-registered object.

Any kind of feature point detector can be used as long as it can identify two-dimensional coordinates that have distinctive features, for example the Harris corner detector, the Hessian keypoint detector, or the FAST corner detector. The present embodiment uses the FAST corner detector adopted in the method of Non-Patent Document 1.

The local feature information is information for identifying a feature point, such as a SIFT descriptor or a SURF descriptor. Any kind of general local feature information can be used in the present invention. The present embodiment uses as local feature information the patch image adopted in the method of Non-Patent Document 1.

Here, a patch image is an image cut out of the original image at a specific size. In the present embodiment it is an image of arbitrary width and height (for example, 32 pixels wide and 32 pixels high) centered on the feature point. When a feature point also carries size (scale) information, the width and height of the patch image can be changed according to that size. Compared with general local feature information, obtaining a patch image requires only cropping the image, so it is very fast.
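The cropping step can be sketched as follows, assuming a grayscale image stored as a list of rows. The function name `extract_patch` and the choice to return `None` for feature points too close to the image border are assumptions of this example; the patent does not specify border handling.

```python
def extract_patch(image, cx, cy, width=32, height=32):
    """Return the width x height patch centred on (cx, cy), or None near the border."""
    x0 = cx - width // 2
    y0 = cy - height // 2
    rows = len(image)
    cols = len(image[0]) if rows else 0
    # Reject feature points whose patch would extend outside the image.
    if x0 < 0 or y0 < 0 or x0 + width > cols or y0 + height > rows:
        return None
    return [row[x0:x0 + width] for row in image[y0:y0 + height]]

# Synthetic 64 x 48 grayscale image used only for illustration.
image = [[(x + y) % 256 for x in range(64)] for y in range(48)]
patch = extract_patch(image, 32, 24)
print(len(patch), len(patch[0]))
```

For a feature point with scale information, the same function could be called with a width and height derived from that scale.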

The matching unit 122 includes a classifier 122a and a probability density database 122b. In advance, the three-dimensional model of the observation object 2 is projected into two dimensions from various directions, and the local feature information (patch images) of all feature points Pm detected from the resulting large number of projection images is applied to the classifier 122a. By tallying the correspondence between the classification result of each piece of local feature information and its three-dimensional coordinates, a probability density distribution is built that centrally manages the relationship between local feature information and the probability density of its three-dimensional coordinates. This distribution is registered in the probability density database 122b as the learning model.

A Ferns classifier, which processes patch images at high speed, can be used as the classifier 122a for this matching method. A Ferns classifier consists of a plurality of decision trees (Ferns). Each Fern consists of multi-stage branch points (nodes), each with a decision rule that routes the patch image, and terminal nodes (leaves). A decision rule sends the patch image left or right according to which of two randomly selected pixels of the patch image has the greater luminance. Nodes at later stages route the patch image further using different decision rules, but all nodes at the same stage share the same rule, so the number of distinct node types equals the number of stages. The node that the patch image finally reaches (the leaf node) is the classification result for that patch image.

For each leaf node, the probability density DB 122b holds the probability density of the three-dimensional coordinates corresponding to the patch images, acquired in advance by learning. Because a Ferns classifier has a plurality of Ferns, the patch image is input to each Fern, the probability density corresponding to the leaf node reached in each Fern is obtained, and these probabilities are multiplied together as in a naive Bayes classifier to determine the final probability density of the three-dimensional coordinates corresponding to the patch image.
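The structure described above can be sketched as a small Ferns-style classifier. This is a simplified illustration, not the patent's implementation: three-dimensional coordinates are represented by integer labels, each leaf stores counts that are normalised into a distribution over those labels, counts start at 1 to avoid zero probabilities, and the product over Ferns is computed as a sum of logarithms. The class names and all parameter values are assumptions of this example.

```python
import math
import random

class Fern:
    def __init__(self, depth, patch_size, num_labels, rng):
        # One pixel-pair test per stage; every node at a stage shares that test.
        self.tests = [
            (rng.randrange(patch_size), rng.randrange(patch_size),
             rng.randrange(patch_size), rng.randrange(patch_size))
            for _ in range(depth)
        ]
        # Per-leaf label counts, initialised to 1 (assumed smoothing).
        self.counts = [[1] * num_labels for _ in range(2 ** depth)]

    def leaf(self, patch):
        """Route the patch through the stages and return the leaf index."""
        index = 0
        for x1, y1, x2, y2 in self.tests:
            bit = 1 if patch[y1][x1] < patch[y2][x2] else 0
            index = (index << 1) | bit
        return index

    def train(self, patch, label):
        self.counts[self.leaf(patch)][label] += 1

    def log_probs(self, patch):
        row = self.counts[self.leaf(patch)]
        total = sum(row)
        return [math.log(count / total) for count in row]

class FernsClassifier:
    def __init__(self, num_ferns, depth, patch_size, num_labels, seed=0):
        rng = random.Random(seed)
        self.num_labels = num_labels
        self.ferns = [Fern(depth, patch_size, num_labels, rng)
                      for _ in range(num_ferns)]

    def train(self, patch, label):
        for fern in self.ferns:
            fern.train(patch, label)

    def classify(self, patch):
        """Return (best label, log score); the score sums log-probabilities over Ferns."""
        scores = [0.0] * self.num_labels
        for fern in self.ferns:
            for label, log_p in enumerate(fern.log_probs(patch)):
                scores[label] += log_p
        best = max(range(self.num_labels), key=lambda label: scores[label])
        return best, scores[best]

# Two synthetic 8 x 8 patches stand in for two learned 3D points.
rising = [[x for x in range(8)] for _ in range(8)]
falling = [[7 - x for x in range(8)] for _ in range(8)]
classifier = FernsClassifier(num_ferns=10, depth=4, patch_size=8, num_labels=2)
for _ in range(20):
    classifier.train(rising, 0)
    classifier.train(falling, 1)
print(classifier.classify(rising)[0], classifier.classify(falling)[0])
```

The returned log score plays the role of the matching score: the label with the highest combined probability is taken as the corresponding three-dimensional point.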

The matching unit 122 applies the patch image of each feature point Pc detected from the camera image Ica to the Ferns classifier and thereby obtains the probability density of the three-dimensional coordinates corresponding to that patch image. For each patch image (feature point), it associates the three-dimensional coordinates with the highest probability (likelihood) to form a corresponding point, and outputs that likelihood to the first culling processing unit 123 as the matching score.

The first culling processing unit 123 sets a fixed threshold (score threshold) on the matching score. It removes pairs whose matching score is below the threshold from the corresponding point candidates and outputs pairs at or above the threshold to the second culling processing unit 124 as corresponding point candidates. In general, incorrectly associated corresponding points do not have a very high matching score in absolute terms, so this first culling process raises the proportion of correct corresponding points (inliers) among the corresponding point candidates.

The second culling processing unit 124 performs a second culling process on the corresponding point candidates that have passed the first culling process, using the camera line of sight given by the camera line-of-sight estimation device 14, and outputs the candidates remaining after the second culling process as corresponding points. Specifically, of the candidates remaining after the first culling process, it keeps those whose feature point Pm has a normal facing the camera 10. That is, it compares the normal of the feature point Pm detected from the observation object 2 with the line of sight of the camera image Ica and outputs as corresponding points only the candidates whose angle difference θ is below the threshold θref (angle threshold).

FIG. 3 is a diagram for explaining the function of the second culling processing unit 124. In general, the way the region around a point on the surface of a three-dimensional object appears in (is projected onto) the camera image becomes more deformed, and the matching accuracy worse, as the angle difference θ between the surface normal and the straight line from the camera focus to the point on the surface (the line of sight) grows. When the angle difference θ exceeds a right angle, the point (and its surrounding local region) no longer appears in the camera image and matching becomes impossible.

In other words, obtaining information about the camera line of sight from an external source makes it possible to predict the matching accuracy of the three-dimensional coordinates. The second culling process therefore excludes corresponding point candidates that are presumed to be inaccurate and raises the proportion of inliers among the corresponding points.
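The two culling stages can be sketched as a pair of filters. In this example the line of sight is represented by a vector pointing from the surface point toward the camera, so that a normal facing the camera gives θ = 0 and a normal seen edge-on gives θ = 90 degrees. That sign convention, the dictionary layout of a candidate, and the threshold values 0.5 and 60 degrees are all assumptions made for illustration.

```python
import math

def angle_between_deg(a, b):
    """Angle in degrees between two 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))

def first_culling(candidates, score_threshold):
    """Keep candidates whose matching score is at or above the score threshold."""
    return [c for c in candidates if c["score"] >= score_threshold]

def second_culling(candidates, to_camera, theta_ref_deg):
    """Keep candidates whose normal is within theta_ref of the line of sight."""
    return [c for c in candidates
            if angle_between_deg(c["normal"], to_camera) < theta_ref_deg]

candidates = [
    {"id": 0, "score": 0.9, "normal": (0.0, 0.0, 1.0)},  # faces the camera
    {"id": 1, "score": 0.8, "normal": (1.0, 0.0, 0.0)},  # seen edge-on
    {"id": 2, "score": 0.2, "normal": (0.0, 0.0, 1.0)},  # low matching score
    {"id": 3, "score": 0.7, "normal": (0.0, 1.0, 1.0)},  # tilted by about 45 degrees
]
to_camera = (0.0, 0.0, 1.0)
kept = second_culling(first_culling(candidates, 0.5), to_camera, 60.0)
kept_ids = [c["id"] for c in kept]
print(kept_ids)
```

Candidate 2 is removed by the score threshold and candidate 1 by the angle threshold, leaving the two candidates whose surface faces the camera closely enough.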

When the observation object 2 is composed of planes, like a rectangular parallelepiped, or has a shape that includes planes, the angle difference θ between the line of sight of the camera image Ica and the normal of a feature point Pm is expected to take similar values for all feature points lying on the same plane. The second culling processing unit 124 can therefore reduce the calculation cost by computing the angle difference θ between the camera line of sight and the normal at the center of each plane (polygon) making up the observation object 2, and using that value as the angle difference θ for every feature point Pm on the polygon. For example, if the object is a rectangular parallelepiped (hexahedron) and the corresponding points contain 100 three-dimensional coordinates, this reduces 100 angle calculations to 6.
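The per-polygon shortcut can be illustrated with the following sketch. It assumes, hypothetically, that each candidate records the identifier of the face its feature point lies on and that `to_camera` is the direction from the object toward the camera; none of these names come from the description.

```python
import math

def angle_deg(a, b):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def cull_by_face(candidates, face_normals, to_camera, theta_ref_deg=70.0):
    """Second culling with one angle computation per face instead of per point."""
    # One angle per polygon: 6 computations for a cuboid, however many
    # feature points lie on its faces.
    face_angle = {face_id: angle_deg(normal, to_camera)
                  for face_id, normal in face_normals.items()}
    return [c for c in candidates if face_angle[c["face"]] < theta_ref_deg]

# Axis-aligned cuboid used purely as example data.
CUBOID_FACES = {
    "+z": (0.0, 0.0, 1.0), "-z": (0.0, 0.0, -1.0),
    "+x": (1.0, 0.0, 0.0), "-x": (-1.0, 0.0, 0.0),
    "+y": (0.0, 1.0, 0.0), "-y": (0.0, -1.0, 0.0),
}
```

With 100 candidates spread over the six faces, the dictionary comprehension runs six angle computations and the filter becomes a table lookup.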

Although the above embodiment was described with a single threshold θref for the angle difference θ set in the second culling processing unit 124, the present invention is not limited to this. A different threshold θref may be set for each angle from which the camera 10 views the observation object 2, or for each feature point Pm of the observation object 2.

For example, if the observation object 2 is a globe, ocean may dominate when it is viewed from one angle while land dominates from another, so the texture of the observation object 2 can differ greatly with viewing angle. In such cases the matching accuracy as a function of the angle difference between the normal and the line of sight also changes substantially, so the threshold θref may be varied according to the angle from which the observation object 2 is viewed.

Likewise, the matching accuracy as a function of the angle difference between the normal and the line of sight depends strongly on the texture at the feature point: for some textures the local feature information changes greatly as the viewing direction changes, and for others it changes little. For feature points whose local feature information is robust to changes in the line of sight, the threshold θref may therefore be set larger than for other feature points.

FIG. 4 shows a method for dynamically setting the threshold θref as described above. The vertical axis represents matching accuracy (%), and the horizontal axis represents the angle difference θ between the normal and the line of sight.

To set the threshold θref for each feature point, that is, for each three-dimensional coordinate of a feature point Pm detected from the observation object 2, matching is performed on test images of the observation object 2 photographed from various angles, and the relationship between matching accuracy and angle difference θ is plotted for each three-dimensional coordinate x. For each three-dimensional coordinate x, the threshold θrefx (θref1, θref2, ...) is then obtained as the intersection of the curve connecting the plotted points with the target accuracy, and any pair containing that feature point is subjected to the second culling process using the threshold θrefx for its three-dimensional coordinate x.

The test images used for evaluation may be generated artificially from a 3D model of the observation object 2. When test images are created by 3D rendering, the camera parameters are known, so the angle difference between the normal and the line of sight can be determined exactly and each threshold θrefx can be calculated accurately.

When finding the intersection of the curve connecting the plotted points with the target accuracy, the plot may fluctuate if the number of test images is small, making the position of the intersection unstable. In that case, a straight line approximating the plot may be fitted by linear regression to the points near the target accuracy, and the intersection of that line with the target accuracy may be used instead.
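Both ways of deriving a per-point threshold from the accuracy plot can be sketched as below. The code is illustrative only: it assumes the plot for one three-dimensional coordinate is given as a list of (angle in degrees, accuracy in %) pairs sorted by angle, that the curve is piecewise linear between plotted points, and that "near the target accuracy" means within a caller-chosen `window`; the function names and sample data are hypothetical.

```python
def threshold_by_interpolation(points, target):
    """Angle at which the piecewise-linear accuracy curve first drops below target.

    Returns None if the accuracy never falls below the target.
    """
    for (a0, p0), (a1, p1) in zip(points, points[1:]):
        if p0 >= target > p1:
            return a0 + (p0 - target) * (a1 - a0) / (p0 - p1)
    return None

def threshold_by_regression(points, target, window):
    """Fit a least-squares line to points near the target accuracy and intersect it."""
    near = [(a, p) for a, p in points if abs(p - target) <= window]
    n = len(near)
    if n < 2:
        return None
    mean_a = sum(a for a, _ in near) / n
    mean_p = sum(p for _, p in near) / n
    sxx = sum((a - mean_a) ** 2 for a, _ in near)
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in near)
    if sxx == 0.0 or sxy == 0.0:
        return None  # vertical or flat fit: no usable intersection
    slope = sxy / sxx
    intercept = mean_p - slope * mean_a
    return (target - intercept) / slope

# Hypothetical measurements for one model feature point.
SAMPLE_PLOT = [(0.0, 100.0), (30.0, 90.0), (60.0, 70.0), (90.0, 30.0)]
```

The regression variant trades exactness at the plotted points for stability when individual measurements are noisy, which is the situation the paragraph above describes.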

As for the relationship between the threshold θref and matching accuracy, setting a small threshold for the angle difference θ keeps only feature points photographed from nearly head-on, which raises accuracy. However, inlier correspondences may also be culled, so while the proportion of inliers among the output corresponding points increases, their absolute number decreases. When enough inliers remain, the higher proportion speeds up the subsequent camera posture calculation, but when too few remain, the camera posture cannot be calculated at all.

The second culling processing unit 124 may therefore receive feedback from the camera posture calculation unit 12a on the absolute number of inliers used in the camera posture calculation, and adjust the threshold accordingly.

In general, the camera posture cannot be calculated from fewer than four pairs of corresponding points, and at least 8 to 10 pairs are needed for a stable result. The second culling processing unit 124 can therefore evaluate the absolute number of corresponding points for each frame and, when it falls to or below a fixed number (for example, 10), relax (increase) the threshold θref to keep the camera posture calculation stable.
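A per-frame relaxation rule of this kind might look like the following sketch. The minimum count of 10 comes from the example in the text; the step size of 5° and the cap at 90° (beyond which a point is no longer visible) are assumptions chosen for illustration, as is the function name.

```python
def adjust_threshold(theta_ref_deg, num_corresponding_points,
                     min_points=10, step_deg=5.0, max_deg=90.0):
    """Relax the angle threshold when a frame yields too few corresponding points."""
    if num_corresponding_points <= min_points:
        # Too few points for a stable pose: accept more oblique views next frame.
        return min(theta_ref_deg + step_deg, max_deg)
    return theta_ref_deg
```

Calling this once per frame with the count reported for that frame gradually widens the threshold until enough corresponding points survive.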

The relationship between the threshold and matching accuracy also depends on the threshold used in the first culling processing unit 123, so the second culling processing unit 124 may adjust its own angle threshold θref according to the threshold set in the first culling processing unit 123.

Specifically, the relationship between angle and matching accuracy, and the angle threshold calculated from it, are recorded separately for each threshold set in the first culling processing unit 123, and the angle threshold is then selected according to the threshold currently set in the first culling processing unit 123.

[Second Embodiment]
FIG. 5 is a block diagram showing the configuration of the feature point identification unit 12b according to the second embodiment of the present invention; reference numerals identical to those used above denote identical or equivalent parts. Compared with the feature point identification unit 12b of the first embodiment described with reference to FIG. 2, this embodiment omits the second culling processing unit 124 and adds a mask generation unit 126 in its place.

The mask generation unit 126 calculates the angle difference θ between the camera line of sight estimated by the camera line-of-sight estimation device 14 and the normal detected and registered for each three-dimensional coordinate of the feature points Pm detected from the observation object 2. It then generates a mask, a binary array whose length equals the number of feature points of the observation object 2, with the value "1" for three-dimensional coordinates whose angle difference θ exceeds the predetermined threshold θref and "0" for those that do not.

When the matching unit 122 performs corresponding point matching between the feature points Pm of the observation object 2 and the feature points Pc of the camera image Ica, it first applies to each feature point Pm the mask value for its three-dimensional coordinate, which determines whether that feature point Pm is included in the matching.

That is, feature points Pm with a mask value of "1" are removed from the matching candidates, and feature points Pm with a mask value of "0" remain matching candidates. If every mask value is "0", no masking takes place at all.
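The binary mask and its use can be sketched as follows. The names and the `to_camera` convention (direction from the object toward the camera, so that 0° is a head-on view) are assumptions for illustration; the 70° default is taken from the FIG. 6 example in this description.

```python
import math

def angle_deg(a, b):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def build_mask(normals, to_camera, theta_ref_deg=70.0):
    """One entry per model feature point: 1 = excluded from matching, 0 = kept."""
    return [1 if angle_deg(n, to_camera) > theta_ref_deg else 0 for n in normals]

def matching_candidates(mask):
    """Indices of the model feature points that remain matching candidates."""
    return [i for i, m in enumerate(mask) if m == 0]
```

An all-zero mask returns every index, which corresponds to the case where no masking takes place.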

The first culling processing unit 123 performs corresponding point matching only between patch images extracted from the unmasked feature points Pm and patch images extracted from the feature points Pc, and outputs the combinations whose matching score exceeds a predetermined threshold as corresponding points.

Adopting mask processing in place of the second culling process, as in this embodiment, may yield more corresponding points than performing the second culling process.
FIG. 6 shows a case in which mask processing increases the number of corresponding points. A feature point Pc has been detected from the camera image Ica and matched against the three-dimensional coordinates by the matching unit 122. There exists a feature point Pm1 with a pattern similar to that of the feature point Pm2 to which Pc should properly correspond; the matching score of the incorrect pair Pc-Pm1 is the highest among all feature points Pm, and that of the correct pair Pc-Pm2 is the second highest.

In the first embodiment, the matching unit 122 first matches the feature point Pc with the three-dimensional coordinate of the feature point Pm1, which has the highest matching score, to form a first corresponding point. The matching score of the first corresponding point satisfies the score threshold, so it passes the first culling process, and its angle is then evaluated in the second culling process. Because the angle of the first corresponding point (105°) does not satisfy the angle threshold (70°), the first corresponding point is removed, and Pc does not appear among the corresponding points after culling.

In contrast, when mask processing is performed first in place of the second culling process, as in this embodiment, the feature point Pm1 is masked, and the feature point Pc is matched with Pm2, which has the highest matching score among the unmasked three-dimensional coordinates, to form a second corresponding point. The score (8 points) and angle (0°) of the second corresponding point satisfy the score threshold (7 points) and the angle threshold (70°) respectively, so the feature point Pc is included among the corresponding points after culling.
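The FIG. 6 scenario can be replayed with a small sketch that contrasts the two orderings. The angles (105° and 0°), the score of Pm2 (8 points), and the thresholds (7 points, 70°) are the values given above; the score of 9 for Pm1 is an assumed value, since the description states only that it is the highest. Function names and data layout are hypothetical.

```python
MODEL_POINTS = [
    {"name": "Pm1", "score": 9, "angle_deg": 105.0},  # score 9 is an assumption
    {"name": "Pm2", "score": 8, "angle_deg": 0.0},
]

def match_then_cull(model_points, score_ref=7, theta_ref_deg=70.0):
    """First embodiment: pick the best score, then apply score and angle culling."""
    best = max(model_points, key=lambda p: p["score"])
    if best["score"] <= score_ref:
        return None  # rejected by the first culling process
    if best["angle_deg"] >= theta_ref_deg:
        return None  # rejected by the second culling process
    return best["name"]

def mask_then_match(model_points, score_ref=7, theta_ref_deg=70.0):
    """Second embodiment: mask by angle first, then match among what remains."""
    unmasked = [p for p in model_points if p["angle_deg"] <= theta_ref_deg]
    if not unmasked:
        return None
    best = max(unmasked, key=lambda p: p["score"])
    return best["name"] if best["score"] > score_ref else None
```

With this data, culling after matching discards Pc entirely, whereas masking before matching lets Pc pair with Pm2.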

Thus, mask processing can yield more corresponding points than the second culling process, so the number of correct corresponding points is expected to increase. On the other hand, the number of incorrect corresponding points may also increase; in general there is a trade-off between the number of corresponding points and their accuracy.

Although the above embodiment was described with binary mask values, the present invention is not limited to this. Mask values with three or more levels may be used according to the magnitude of the angle difference θ, for example taking larger values as θ becomes smaller. The matching unit 122 then uses the mask value as a weight, correcting the matching score upward for larger mask values and downward for smaller ones. In this way, even a combination with a large angle difference θ is accepted as a corresponding point if its matching score is sufficiently high, which allows flexible culling.
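One possible reading of the multi-value mask is a small lookup of weights that multiply the matching score, sketched below. The three levels, their angle boundaries (30° and 70°), the weight values, and the use of multiplication are all assumptions made for the example; the description specifies only that larger mask values raise the score and smaller ones lower it.

```python
def mask_weight(theta_deg):
    """Three-level mask value: larger for smaller angle differences (assumed levels)."""
    if theta_deg < 30.0:
        return 1.25
    if theta_deg < 70.0:
        return 1.0
    return 0.75

def weighted_score(raw_score, theta_deg):
    """Matching score corrected by the mask value used as a weight."""
    return raw_score * mask_weight(theta_deg)

def is_corresponding_point(raw_score, theta_deg, score_ref=7.0):
    """Accept the pair if the corrected score exceeds the score threshold."""
    return weighted_score(raw_score, theta_deg) > score_ref
```

Under these assumed weights, a pair seen at 80° with a raw score of 8 is rejected, but the same view with a raw score of 10 is still accepted, which is the flexibility the paragraph above refers to.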

FIG. 7 is a block diagram showing the configuration of the AR system 1 to which the third and fourth embodiments of the present invention are applied; reference numerals identical to those used above denote identical or equivalent parts, and their description is omitted. Compared with the configuration of FIG. 1, the camera line-of-sight estimation device 14 is omitted and an equivalent function is implemented in the feature point identification unit 12b.

[Third Embodiment]
FIG. 8 is a functional block diagram showing the configuration of the main part of the third embodiment of the present invention, namely the configuration of the feature point identification unit 12b. This embodiment is characterized in that, in place of the camera line-of-sight estimation device 14, it provides a camera line-of-sight estimation unit 125 that acquires the local feature information of each feature point of the camera image from the feature point detection unit 121 and associates that local feature information with the line of sight of the camera image (camera line of sight).

Like the matching unit 122, the camera line-of-sight estimation unit 125 estimates the camera line of sight using a Ferns classifier. Because the quantity being estimated is the camera line of sight rather than a three-dimensional coordinate, each leaf node of the Ferns classifier holds in advance a probability density that gives the probability density of the camera line of sight for arbitrary local feature information. This density is learned from the local feature information and line of sight of each feature point Pm detected in a large number of two-dimensional images obtained by projecting the observation object 2 from various, evenly divided lines of sight.

At runtime, the patch image cut out around each feature point Pc of the camera image is applied to the classifier, which assigns the patch image to a camera line of sight. The camera line of sight to which the most patch images are assigned is output to the second culling processing unit 124 as the final camera line-of-sight estimate.
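The final voting step can be sketched as below. The Ferns classifier itself is not reproduced; the sketch assumes it has already assigned each patch to a discrete line-of-sight class, represented here by hypothetical string labels.

```python
from collections import Counter

def estimate_camera_gaze(per_patch_classes):
    """Return the line-of-sight class assigned to the largest number of patches.

    per_patch_classes holds one class label per feature point Pc, as produced
    by a classifier that is assumed to exist elsewhere. Returns None when the
    camera image yielded no feature points.
    """
    if not per_patch_classes:
        return None
    label, _count = Counter(per_patch_classes).most_common(1)[0]
    return label
```

Because the estimate is a majority vote over all patches, a few misclassified patches do not change the result as long as most patches agree.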

Here, the line-of-sight estimation unit 125 can reduce the data size required for line-of-sight estimation by sharing the decision rules of its classifier with the matching unit 122. In that case, the line-of-sight estimation unit 125 and the matching unit 122 use a classifier that has two kinds of classes and probability densities but a single set of decision trees.

According to this embodiment, the camera line of sight can be estimated with high accuracy even on an information terminal that has no orientation sensor, GPS function, or the like. Feature point detection can therefore take into account not only the matching score between local feature information but also the angle difference between the orientation of the observation object (the normals of its feature points) and the orientation of the camera that photographed it (the camera line of sight).

[Fourth Embodiment]
FIG. 9 is a functional block diagram showing the configuration of the main part of the fourth embodiment of the present invention, namely the configuration of the feature point identification unit 12b. This embodiment is characterized by a mask generation unit 126 that creates a mask for corresponding point matching based on the line-of-sight information estimated by the line-of-sight estimation unit 125 and provides it to the matching unit 122.

DESCRIPTION OF SYMBOLS: 1 ... AR system, 2 ... observation object, 10 ... imaging device, 12 ... camera posture estimation device, 12a ... camera posture calculation unit, 12b ... feature point identification unit, 13 ... additional information database, 14 ... camera line-of-sight estimation device

Claims

An image processing apparatus that associates three-dimensional coordinates of an observation object with feature points of a camera image obtained by photographing the observation object, comprising:
a probability density database that has learned local feature information and three-dimensional coordinates of first feature points detected from the observation object, and that gives a probability density of the three-dimensional coordinates for arbitrary local feature information;
means for detecting a normal of each first feature point;
feature point detecting means for detecting second feature points from a camera image obtained by photographing the observation object;
local feature information extracting means for extracting local feature information from each second feature point;
means for estimating the line of sight of the camera image;
matching means for associating each second feature point with a first feature point having a higher matching score, based on the local feature information and the probability density database; and
corresponding point output means for comparing the normal of the associated first feature point with the line of sight of the camera image and outputting, as corresponding points, pairs whose angle difference is below a predetermined threshold.

The image processing apparatus according to claim 1, further comprising means for excluding pairs other than those whose matching score is higher than a predetermined threshold,
wherein the corresponding point output means outputs, as corresponding points, those corresponding point candidates with a high matching score whose angle difference is below a predetermined threshold.

The image processing apparatus according to claim 1, further comprising means for generating, based on the angle difference between the normal of each first feature point and the line of sight of the camera image, a mask that excludes from the matching targets any first feature point whose angle difference exceeds a predetermined threshold,
wherein the corresponding point output means performs matching only on the unmasked first feature points.

The image processing apparatus according to claim 3, wherein, in the mask, a multi-value mask value is set for each first feature point according to the angle difference between its normal and the line of sight of the camera image,
and the corresponding point output means performs matching using the multi-value mask value as a weight.

The image processing apparatus according to claim 1, further comprising means for detecting the position and orientation of the apparatus itself,
wherein the means for estimating the line of sight of the camera image estimates the line of sight of the camera image based on the position and orientation of the apparatus itself and a separately provided position of the observation object.

The image processing apparatus according to any one of claims 1 to 4, further comprising a line-of-sight database that has learned the local feature information and line of sight of feature points detected from a large number of two-dimensional images obtained by projecting the observation object from different lines of sight, and that gives a probability density of the line of sight for arbitrary local feature information,
wherein the means for estimating the line of sight of the camera image estimates the line of sight of the camera image based on the local feature information of the second feature points detected from the camera image and the probability density of the line of sight.

The image processing apparatus, wherein the means for detecting a normal represents the normals of a plurality of feature points belonging to the same plane of the observation object by the normal of any one of those feature points.

The image processing apparatus according to claim 1, wherein, in the corresponding point output means, the threshold for the angle difference is set to an individual value for each first feature point.

The image processing apparatus according to any one of claims 1 to 8, further comprising camera posture estimation means for estimating the camera posture using only those corresponding points, among the corresponding points output from the corresponding point output means, that satisfy a geometric constraint condition,
wherein the corresponding point output means dynamically changes the threshold so that it becomes looser as fewer corresponding points could be used for camera posture estimation by the camera posture estimation means.