JP2014106641A

JP2014106641A - Image processing apparatus

Info

Publication number: JP2014106641A
Application number: JP2012257917A
Authority: JP
Inventors: Tatsuya Kobayashi; 達也小林; Haruhisa Kato; 晴久加藤; Akio Yoneyama; 暁夫米山
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2012-11-26
Filing date: 2012-11-26
Publication date: 2014-06-09
Anticipated expiration: 2032-11-26
Also published as: JP5975484B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy and speed of feature point matching between an object and a camera image without increasing calculation cost or memory consumption.

SOLUTION: An image processing apparatus for associating three-dimensional coordinates of first feature points, detected from images obtained by observing an object M from a plurality of directions, with second feature points detected from a camera image Ica obtained by capturing the object includes: a classifier 32a which classifies local feature quantities on the basis of a predetermined classification rule; a probability distribution database 32b which assigns probability distributions over three-dimensional coordinates and over viewpoints to each classification result of a local feature quantity; an identification part 32d which identifies the correspondence between the local feature quantity of a second feature point and the three-dimensional coordinates and the viewpoint, on the basis of the probability distributions and the classification result obtained by applying the local feature quantity of the second feature point to the classifier 32a; and accuracy improving means 3b which improves the accuracy of the correspondence between the local feature quantity of the second feature point and the three-dimensional coordinates on the basis of the correspondence between the local feature quantity of the second feature point and the viewpoint.

Description

The present invention relates to an image processing apparatus that associates the three-dimensional coordinates of an observation object with the two-dimensional coordinates of a camera image obtained by photographing that object, for use in camera pose estimation and the like. In particular, it relates to an image processing apparatus that enables fast, high-accuracy association without increasing computational cost or memory usage.

In recent years, AR (augmented reality) technology, which processes video of real space by computer and adds further information to it, has come to be realized on PCs connected to web cameras and on camera-equipped mobile phone terminals. AR requires estimating the camera pose (the camera's extrinsic parameters) relative to an object in the camera image, and methods using sensors or fiducial markers are used for this. Techniques that estimate the camera pose using a three-dimensional object of known shape and image information as the target are also being studied.

Patent Document 1 discloses a technique in which stored images of a three-dimensional object photographed from a plurality of viewpoints are prepared, and the most similar stored image (viewpoint) is identified by feature point matching between the input image and the stored images. Once the viewpoint has been identified, corresponding points are obtained between that stored image and the input image, and the camera pose is calculated.

Non-Patent Document 1 discloses a technique that estimates the viewpoint by using class classification of feature points to assign each feature point to a candidate viewpoint relative to the object. Patent Document 2 discloses a technique for obtaining highly accurate corresponding points by using information on the viewpoint relative to the object.

Patent Document 1: JP 2012-83855 A
Patent Document 2: Japanese Patent Application No. 2012-174320

Non-Patent Document 1: Yuki Hirai, Satoru Suzuki, Hironobu Fujiyoshi, "High-speed 3D object recognition by two-stage Randomized Trees", Image Sensing Symposium, IS1-15, 2011.

In Patent Documents 1 and 2, the accuracy of viewpoint estimation depends on the angles at which the stored images were photographed, so sufficient viewpoint estimation accuracy cannot be obtained for an object photographed from certain directions.

In Non-Patent Document 1, sufficient viewpoint estimation accuracy is obtained for a three-dimensional object photographed within a limited range of poses, but the accuracy degrades as the number of candidate viewpoints increases. Consequently, sufficient viewpoint estimation accuracy cannot be obtained for input images in which the target three-dimensional object is photographed from an arbitrary direction.

Furthermore, when the conventional techniques of Patent Document 1 and Non-Patent Document 1 are applied to the camera pose estimation problem, two stages of matching are required, one to estimate the viewpoint and one to obtain corresponding points, which increases the computational cost of the camera pose estimation process as a whole.

An object of the present invention is to solve the above technical problems and to provide an image processing apparatus that can improve the accuracy and speed of feature point matching between an observation object and its camera image without increasing computational cost or memory usage.

To achieve the above object, the present invention provides an image processing apparatus that associates the three-dimensional coordinates of first feature points, detected from images obtained by observing an observation object from a plurality of viewpoints, with second feature points detected from a camera image of the observation object. The apparatus comprises: classification means for classifying local feature quantities according to a predetermined classification rule; a database of probability distributions that give, for each classification result of a local feature quantity, its correspondence with three-dimensional coordinates and with viewpoints; identification means for identifying the correspondence between the local feature quantity of a second feature point and the three-dimensional coordinates and the viewpoint, based on the probability distributions and on the classification result obtained by applying the local feature quantity of the second feature point to the classification means; and means for improving the accuracy of the correspondence between the local feature quantity of the second feature point and the three-dimensional coordinates, based on the correspondence between the local feature quantity of the second feature point and the viewpoint.

According to the present invention, applying the local feature quantity of a feature point detected from the camera image to the classifier only once yields both the 3D coordinates and the viewpoint corresponding to that local feature quantity at the same time. The accuracy and speed of feature point matching between the observation object and its camera image can therefore be improved without increasing computational cost or memory usage.

FIG. 1 is a functional block diagram showing the configuration of an AR system to which the present invention is applied.
FIG. 2 is a diagram schematically showing the function of the camera viewpoint estimation unit 3a.
FIG. 3 is a functional block diagram of the first embodiment of the camera viewpoint estimation unit 3a.
FIG. 4 is a diagram showing the method of classifying patch images with the Ferns classifier in the first embodiment.
FIG. 5 is a flowchart of the learning process.
FIG. 6 is a schematic diagram of the learning process.
FIG. 7 is a diagram showing the method of selecting the feature points used as teacher data.
FIG. 8 is a diagram showing the method of setting viewpoint candidates.
FIG. 9 is a diagram showing the method of calculating the focal position of the camera.
FIG. 10 is a flowchart of the identification process.
FIG. 11 is a diagram showing how the 3D coordinates and the viewpoint are obtained.
FIG. 12 is a functional block diagram of the second embodiment of the camera viewpoint estimation unit 3a.
FIG. 13 is a diagram showing the method of classifying patch images with the Ferns classifier in the second embodiment.
FIG. 14 is a diagram showing the method of setting viewpoint candidates with discretization in the distance direction.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the main part of an AR system to which the present invention is applied. The system is implemented on an information terminal such as a mobile phone, smartphone, tablet terminal, PDA, or notebook PC.

The imaging device (camera) 1 is a digital camera module or web camera device mounted on a portable terminal or the like. It photographs the observation object M and outputs the camera image Ica to the display device 2 and the camera pose estimation device 3. The camera pose estimation device 3 mainly comprises: a camera viewpoint estimation unit 3a that estimates the viewpoint of the camera 1 relative to the observation object M at the time of photographing (the camera viewpoint); an accuracy improvement unit 3b that, based on the estimated camera viewpoint, improves the accuracy of the correspondence between the three-dimensional (3D) coordinates of the observation object M and the two-dimensional (2D) coordinates of the camera image; and a camera pose calculation unit 3c that calculates the camera pose from the refined 2D-3D correspondence.

The additional information database (DB) 4 is a storage device such as a hard disk drive or semiconductor memory module. It holds the CG, two-dimensional images, or text information to be superimposed on the observation object M on the display device 2 when the AR system 1 recognizes the position of the observation object M, and it supplies the display device 2 with the additional information for the observation object M that corresponds to the camera pose estimated by the camera pose estimation device 3.

The display device 2 is a monitor device that can present to the user the camera images Ica continuously acquired by the camera 1, and may be the display of a portable terminal. It may also take the form of a head-mounted display (HMD); with a see-through HMD in particular, it is possible to display only the additional information superimposed on the field of view, without displaying the camera image Ica. When the display device 2 is a display, the additional information supplied from the additional information DB 4 is superimposed on the camera image Ica at a position corrected according to the camera pose supplied from the camera pose estimation device 3.

In the camera pose estimation device 3, when a camera image Ica of the observation object M is input, the camera viewpoint estimation unit 3a estimates, as described in detail later, (i) the correspondence between the 2D coordinates of each feature point detected from the camera image Ica and the 3D coordinates of the observation object M, together with its likelihood (the 3D coordinate correspondence), and (ii) the correspondence between each feature point and a viewpoint, together with its likelihood (the viewpoint correspondence), and outputs both to the accuracy improvement unit 3b. Based on the estimated viewpoint, the accuracy improvement unit 3b refines the 3D coordinate correspondence by excluding from the corresponding point candidates the 3D coordinates of feature points that could not be obtained from the observation object at that viewpoint.
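The exclusion step performed by the accuracy improvement unit 3b can be pictured as a simple filter. The following is a minimal sketch, not the patented implementation: the `visible_from` table (viewpoint label to the set of 3D coordinate labels observed from that viewpoint during learning), the dictionary layout of a match, and the function name `refine` are all hypothetical.

```python
def refine(matches, visible_from, viewpoint):
    """Keep only matches whose 3D coordinate label was observed from `viewpoint`."""
    allowed = visible_from[viewpoint]
    return [m for m in matches if m["coord_label"] in allowed]

# Hypothetical table: viewpoint label -> 3D coordinate labels visible from it.
visible_from = {0: {"p1", "p2"}, 1: {"p2", "p3"}}

# Hypothetical matches: 2D position, matched 3D coordinate label, likelihood.
matches = [
    {"uv": (120, 80), "coord_label": "p1", "likelihood": 0.7},
    {"uv": (200, 95), "coord_label": "p3", "likelihood": 0.6},
    {"uv": (64, 150), "coord_label": "p2", "likelihood": 0.9},
]

# With viewpoint 0 estimated, the match to "p3" is dropped as a candidate.
refined = refine(matches, visible_from, viewpoint=0)
```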

The camera pose calculation unit 3c estimates the pose of the AR system 1 relative to the observation object M based on the refined correspondence (the 2D-3D correspondence) and the intrinsic and extrinsic parameters of the camera.

Methods for estimating, from matches between 3D coordinates and 2D coordinates, the camera pose (the camera's extrinsic parameters) that explains their relationship have long been studied. The relationship between 3D coordinates and 2D coordinates is generally expressed by equation (1).

[u, v, 1]^T = s A W [X, Y, Z, 1]^T … (1)

Here, [u, v] and [X, Y, Z] denote the 2D pixel coordinates and the 3D coordinates, respectively, and [·]^T denotes the transpose. A and W denote the camera's intrinsic parameters and extrinsic parameters (camera pose), respectively. The intrinsic parameters are obtained in advance by camera calibration. The camera pose is W = [R, t] = [r1, r2, r3, t], consisting of a rotation matrix R and a translation vector t. Given matches between 3D coordinates [X, Y, Z, 1]^T and 2D coordinates [u, v, 1]^T and the camera's intrinsic parameters, the camera pose W can be estimated.
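To make equation (1) concrete, the sketch below evaluates its forward direction: projecting a known 3D point to pixel coordinates. It assumes that s is the scale factor normalizing the third homogeneous component to 1 (the text does not define s), and the intrinsic values in `A` are made-up illustration values, not taken from the patent.

```python
def project(A, R, t, X):
    """Project 3D point X to pixels via [u, v, 1]^T = s A W [X, Y, Z, 1]^T, W = [R | t]."""
    # Camera-frame coordinates: Xc = R X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Homogeneous image coordinates: h = A Xc
    h = [sum(A[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # s = 1 / h[2] scales the third component to 1
    return h[0] / h[2], h[1] / h[2]

# Assumed intrinsics: focal length 800 px, principal point (320, 240).
A = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
t = [0.0, 0.0, 5.0]                                       # object 5 units ahead

u, v = project(A, R, t, [0.5, -0.25, 0.0])  # (400.0, 200.0)
```

Pose estimation is the inverse problem: given several such (u, v) and (X, Y, Z) pairs and A, find the R and t that best reproduce the observed pixels.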

Although FIG. 1 uses a globe as an example of the observation object M, a similar AR system can be built for three-dimensional objects with primitive structures such as cuboids, cylinders, and spheres, and for objects with more complex three-dimensional structures, provided that a 3D model of the object is available. By superimposing content suited to the observation object, such an AR system can present information to the user intuitively. For a globe, conceivable uses include superimposing the terrain so that altitude is displayed visually, superimposing the shapes of past continents, superimposing updated information when borders or country names change, and, in combination with gesture recognition, displaying the name of the country the user points at.

FIG. 2 schematically shows the function of the camera viewpoint estimation unit 3a. Feature points are detected from the camera image Ica of the observation object M, and the local feature quantity extracted from each feature point is applied to a classifier. The classifier has learned in advance, for coordinate matching, the relationship between the local feature quantities of many feature points, obtained by observing a 3D model of the observation object M from many viewpoints, and their 3D coordinates (the 3D coordinate probability distribution). It has also learned in advance, for viewpoint classification, the relationship between the local feature quantities and the viewpoints (the viewpoint probability distribution).

In this embodiment, as described in detail later, the rule for coordinate matching and the rule for viewpoint classification are unified. Applying a local feature quantity to this common rule just once outputs the 3D coordinates and the viewpoint corresponding to the local feature quantity of the feature point, each with its likelihood. As for viewpoints, as described in detail later, a virtual polyhedron U that can contain the observation object M at its center is assumed, and its vertices are defined as the viewpoint candidates.

[First embodiment]

FIG. 3 is a functional block diagram of the first embodiment of the camera viewpoint estimation unit 3a, and FIG. 4 schematically illustrates its operation. The feature point detection unit 31 uses a feature point detector 31a to detect many feature points from the camera image Ica output by the camera 1, and outputs each feature point and the local feature information of its neighborhood to the matching unit 32.

Any type of feature point detector 31a can be used, such as a Harris corner detector, a Hessian keypoint detector, or a FAST corner detector, as long as it can identify distinctive 2D coordinates. This embodiment uses the FAST corner detector adopted in the method of Patent Document 2.

The local feature information is information for identifying a feature point, such as a SIFT descriptor or a SURF descriptor, and any common type of local feature information can be used. This embodiment uses as local feature information the patch image adopted in Patent Document 2.

Here, a patch image is an image cut out of the original image at a specific size; in this embodiment it is an image of arbitrary width and height (for example, 32 pixels wide and 32 pixels high) centered on the feature point. If a feature point also carries size (scale) information, the width and height of the patch image can be varied according to that size. Because obtaining a patch image requires only cropping the image, it is very fast compared with computing typical local feature information.
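Cutting out a patch amounts to slicing the image around the feature point, as the minimal sketch below shows. The image is represented as a list of rows of grayscale values; the function name `cut_patch` and the choice to return `None` for feature points too close to the border are assumptions, since the text does not say how border cases are handled.

```python
def cut_patch(image, cx, cy, width=32, height=32):
    """Return the width x height patch centered on (cx, cy), or None near the border."""
    x0, y0 = cx - width // 2, cy - height // 2
    if x0 < 0 or y0 < 0 or x0 + width > len(image[0]) or y0 + height > len(image):
        return None  # assumed policy: skip feature points without a full patch
    return [row[x0:x0 + width] for row in image[y0:y0 + height]]

# Synthetic 64 x 48 grayscale image used only for illustration.
image = [[(x + y) % 256 for x in range(64)] for y in range(48)]
patch = cut_patch(image, 40, 20)
```

With an even patch size the feature point lands at index (16, 16) of the 32 x 32 patch, one pixel past the geometric center.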

In the matching unit 32, the 3D coordinate DB 32e stores in advance the patch images of many feature points detected from each of many images obtained by observing the 3D model of the observation object M from various viewpoints, each together with a label representing its 3D coordinates and a label representing its viewpoint. In this embodiment, the matching unit 32 has a learning function and an identification function, and the patch image of each feature point detected from the camera image Ica is associated with the 3D coordinates and viewpoint of the observation object M in the following two broad stages.

(1) In the learning stage, the learning unit 32c takes the lead. From among the many feature points stored in the 3D coordinate DB 32e, the selection unit 32f selects in advance those feature points whose 3D coordinates are likely to be detected as feature points in a camera image, and each selected patch image is applied to the classifier 32a with its 3D coordinate label and viewpoint label attached.
The classifier 32a tallies the correspondence between the classification result of each patch image and its 3D coordinate label and viewpoint label, thereby obtaining the probability distribution relating patch images (feature points) to 3D coordinates and the probability distribution relating patch images to viewpoints. These are registered in advance in the probability distribution DB 32b as the learned model (the 3D coordinate probability distribution and the viewpoint probability distribution).

(2) In the identification stage, the identification unit 32d takes the lead. After the learned model has been registered, the patch images of the feature points detected from the camera image Ica of the observation object M are applied to the classifier 32a. Based on the classification results and the probability distributions already learned in the probability distribution DB 32b, the identification unit 32d identifies the correspondence X1 between each feature point of the camera image Ica and the 3D coordinates of the observation object M together with its likelihood X2, and the correspondence Y1 between each feature point of the camera image Ica and a viewpoint together with its likelihood Y2. Based on the correspondence Y1 between each feature point (patch image) of the camera image Ica and the viewpoint and on its likelihood Y2, the viewpoint determination unit 33 selects at least one high-likelihood viewpoint as a viewpoint candidate for the camera image Ica.
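One plausible way for the viewpoint determination unit 33 to turn per-feature-point results (Y1, Y2) into image-level viewpoint candidates is a likelihood-weighted vote. The sketch below is an illustration under that assumption; the text only states that at least one high-likelihood viewpoint is chosen, so the aggregation rule, the function name `decide_viewpoints`, and the `top_k` parameter are hypothetical.

```python
from collections import defaultdict

def decide_viewpoints(per_point_viewpoints, top_k=1):
    """per_point_viewpoints: one (viewpoint_label, likelihood) pair per feature point."""
    score = defaultdict(float)
    for label, likelihood in per_point_viewpoints:
        score[label] += likelihood  # likelihood-weighted vote (assumed rule)
    ranked = sorted(score, key=score.get, reverse=True)
    return ranked[:top_k]

# Hypothetical (Y1, Y2) results for five feature points of one camera image.
votes = [(7, 0.9), (7, 0.6), (12, 0.8), (3, 0.2), (12, 0.3)]
best = decide_viewpoints(votes)                 # [7]
best_two = decide_viewpoints(votes, top_k=2)    # [7, 12]
```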

A Ferns classifier, which can classify patch images into classes, can be used as the matching unit 32. As shown in FIG. 4, a Ferns classifier consists of a plurality of decision trees (ferns). Each fern consists of multiple stages of branch points (nodes), each with a decision rule for branching a patch image, and terminal nodes (leaves). A decision rule branches left or right according to which of two pixels, chosen at random from the patch image, has the greater luminance. Nodes at later stages branch the patch image further using different decision rules, but all nodes at the same stage share the same decision rule, so the number of distinct node types equals the number of stages. The node at which a patch image finally arrives (the leaf node) is the classification result for that patch image.
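Because every node at the same stage shares one decision rule, a fern reduces to an ordered list of pixel-pair comparisons, and the leaf reached is simply the resulting bit string. The sketch below illustrates this; the function names, the depth of 8, and the convention that "first pixel darker" maps to bit 1 are assumptions made for illustration.

```python
import random

def make_fern(depth, width=32, height=32, rng=random):
    """One fern = `depth` binary tests, each comparing two randomly chosen pixels."""
    return [((rng.randrange(width), rng.randrange(height)),
             (rng.randrange(width), rng.randrange(height))) for _ in range(depth)]

def leaf_index(fern, patch):
    """Apply every test in order; the resulting bit string is the leaf index."""
    index = 0
    for (x1, y1), (x2, y2) in fern:
        bit = 1 if patch[y1][x1] < patch[y2][x2] else 0  # assumed bit convention
        index = (index << 1) | bit
    return index

rng = random.Random(0)
fern = make_fern(depth=8, rng=rng)
patch = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
leaf = leaf_index(fern, patch)  # an integer in [0, 2**8)
```

A fern of depth d therefore has 2**d leaves, and classifying a patch costs only d pixel comparisons.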

Each leaf node holds a probability distribution over classes for the patch images, acquired through learning. Because a Ferns classifier has multiple ferns, a patch image is input to each fern, the probability distribution of the leaf node reached in each fern is retrieved, and these probabilities are multiplied together using a naive Bayes classifier to determine the final probability distribution over classes for the patch image.
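The naive Bayes combination can be sketched as follows. The per-fern distributions are made-up numbers, and the code assumes every stored probability is strictly positive (for example through smoothing during learning); the product is computed as a sum of logarithms to avoid underflow when many ferns are used.

```python
import math

def combine(leaf_distributions):
    """Multiply per-fern class distributions (as a sum of logs) and normalize."""
    n_classes = len(leaf_distributions[0])
    log_post = [sum(math.log(dist[c]) for dist in leaf_distributions)
                for c in range(n_classes)]
    peak = max(log_post)
    unnorm = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnorm)
    return [p / total for p in unnorm]

# Distributions read from the leaves reached in three ferns (illustrative values).
dists = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1],
         [0.2, 0.5, 0.3]]
posterior = combine(dists)
best = max(range(len(posterior)), key=posterior.__getitem__)  # class 0
```

Here class 0 wins because 0.7 × 0.6 × 0.2 = 0.084 exceeds the products for the other two classes (0.030 and 0.003).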

Each leaf node holds a discrete probability distribution over the candidate classes, and the total number of bins equals the total number of candidate classes. The probability distributions are learned by classifying patch images to which teacher labels (the correct classes) have been assigned in advance; in this embodiment this is done using the 3D model of the observation object M, in the same manner as in Patent Document 2.

Thus, the classification result in this embodiment is the information of which leaf each patch image, labeled with a 3D coordinate label and a viewpoint label, reached, and a probability distribution is obtained separately for the 3D coordinate labels and for the viewpoint labels.

This embodiment is characterized in that two correspondences are learned in advance, namely the correspondence between the patch image of a feature point and its 3D coordinates with its likelihood [X1, X2], and the correspondence between the patch image and its viewpoint with its likelihood [Y1, Y2], so that applying the patch image cut out at each feature point of the camera image Ica to the matching unit 32 only once yields the 3D coordinates and viewpoint corresponding to that patch image.

Next, the processing of the learning stage is described with reference to the flowchart of FIG. 5 and the schematic diagram of FIG. 6. The description assumes that a 3D model faithfully representing the shape, pattern, color, and size of the observation object M and the 3D coordinates of each of its parts has been built and prepared in advance by a known method.

It is also assumed that many feature points have been detected from images obtained by observing the 3D model from various viewpoints, and that the patch image cut out at each feature point has been stored in a database in advance, linked to a 3D coordinate label that uniquely identifies its 3D coordinates and a viewpoint label that uniquely identifies its viewpoint. In this embodiment, the observation object M is photographed from about 10,000 viewpoints, and the 3D coordinates of each feature point and its patch image are stored for each viewpoint.

The 3D coordinates and viewpoints handled as classes must be discrete, and classification accuracy degrades if there are too many classes, so the number of 3D coordinates and viewpoints to be learned must be kept to an appropriate value. For the 3D coordinates, a fixed number is therefore selected in descending order of how readily they are detected as feature points when they appear in a camera image Ica. In this embodiment, as in the example of FIG. 7, all feature points detected from the image obtained at each viewpoint are grouped by their 3D coordinates, and the top N by detection count, that is, the feature points that are readily detected regardless of viewpoint, are selected as teacher data.
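The top-N selection can be sketched with a simple tally. This is an illustration only: it assumes 3D coordinates have already been quantized into hashable keys (tuples here), and the function name `select_top_n` and the toy detections are hypothetical.

```python
from collections import Counter

def select_top_n(detections_per_viewpoint, n):
    """Count in how many viewpoints each 3D coordinate was detected; keep the n most frequent."""
    counts = Counter()
    for coords in detections_per_viewpoint:
        counts.update(set(coords))  # count each coordinate at most once per viewpoint
    return [coord for coord, _ in counts.most_common(n)]

# Toy data: the 3D coordinates detected as feature points in each of four viewpoints.
detections = [
    [(0, 0, 1), (1, 0, 0)],
    [(0, 0, 1), (0, 1, 0)],
    [(0, 0, 1), (1, 0, 0), (0, 1, 0)],
    [(1, 0, 0)],
]
teacher_coords = select_top_n(detections, 2)
```

In this toy data (0, 0, 1) and (1, 0, 0) are each detected in three viewpoints and (0, 1, 0) in two, so the first two are kept as teacher data.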

As for the viewpoints, they need to be arranged at equal distances from the center of the 3D model and evenly spaced from one another. Therefore, as shown by way of example in FIG. 8, a polyhedron capable of accommodating the 3D model at its center (an 80-hedron in this embodiment) is assumed, and its vertices (42 in this embodiment) are defined as viewpoint candidates.
The approximately 10,000 viewpoints are then each associated with the nearest viewpoint candidate (vertex) on the basis of their focal positions and classified into the same group, and the viewpoints in the same group are given the same viewpoint label representing that viewpoint candidate. That is, in this embodiment, the approximately 10,000 viewpoints are each classified into the same group as one of the 42 viewpoint candidates, and the viewpoints in the same group are given the same viewpoint label. The focal position P of the camera can be calculated using the extrinsic parameter matrix W of the camera, as shown in FIG. 9.
In this embodiment, the vertices of a polyhedron are used to discretize the viewpoints with respect to the observation target M at substantially equal intervals in direction. If the viewpoints are also to be discretized at substantially equal intervals in distance, a plurality of polyhedra of different sizes may be arranged hierarchically, as shown in FIG. 14, and each viewpoint may be classified to the nearest vertex of any of the polyhedra.
Such discretization with respect to distance is realized by arranging, for example, three 80-hedra whose distances from the center of the 3D model increase stepwise, as shown in FIG. 14, and using their 126 vertices (42 × 3) as viewpoint candidates.
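The viewpoint discretization can be sketched as below, under several stated assumptions. The text does not say how the 80-hedron is constructed; this sketch uses one known way to obtain 80 faces and 42 vertices, namely an icosahedron with each edge midpoint added and projected onto the sphere. The formula of FIG. 9 is not reproduced in the text, so the camera position is computed here with the standard relation C = -Rᵀt for an extrinsic matrix W = [R | t]. All function names and the shell radii are hypothetical.

```python
import itertools
import math

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def viewpoint_candidates():
    """42 unit directions: 12 icosahedron vertices plus 30 edge midpoints."""
    phi = (1 + math.sqrt(5)) / 2
    base = []
    for a, b in itertools.product((-1, 1), (-phi, phi)):
        base += [(0, a, b), (a, b, 0), (b, 0, a)]
    # In this scaling the icosahedron edges are the vertex pairs at distance 2.
    mids = [
        tuple((p + q) / 2 for p, q in zip(u, v))
        for u, v in itertools.combinations(base, 2)
        if abs(math.dist(u, v) - 2) < 1e-9
    ]
    return [unit(v) for v in base + mids]

def camera_center(W):
    """Camera centre C = -R^T t for a 3x4 extrinsic matrix W = [R | t] (assumed form)."""
    return tuple(-sum(W[r][c] * W[r][3] for r in range(3)) for c in range(3))

def viewpoint_label(position, candidates, radii=(1.0,)):
    """Index of the nearest candidate vertex over all shells (shell-major order)."""
    best = None
    for s, radius in enumerate(radii):
        for i, d in enumerate(candidates):
            vertex = tuple(radius * c for c in d)
            dist = math.dist(position, vertex)
            if best is None or dist < best[0]:
                best = (dist, s * len(candidates) + i)
    return best[1]

cands = viewpoint_candidates()
# Identity rotation with t = (0, 0, -2.9): the camera sits at (0, 0, 2.9).
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, -2.9]]
label = viewpoint_label(camera_center(W), cands, radii=(1.0, 2.0, 3.0))
```

With three shells the labels run from 0 to 125, matching the 126 candidates (42 × 3) mentioned above; passing a single radius gives the 42-candidate case.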

Referring to FIG. 5, in step S1, one unselected viewpoint candidate is selected from the 42 viewpoint candidates as the current viewpoint candidate of interest. In step S2, one unselected viewpoint belonging to the group of the viewpoint candidate of interest is selected as the current viewpoint of interest. In step S3, one unselected feature point detected from the image of the current viewpoint of interest is selected as the current feature point of interest.

In step S4, as shown in FIG. 6, the patch image cut out at the current feature point of interest is applied to the classifier 32a together with its viewpoint label and 3D coordinate label, and the classification result is voted to one of the leaves. In step S5, it is determined whether voting has been completed for the patch images of all feature points detected for the current viewpoint of interest. If not, the process returns to step S3, another unselected feature point is selected, and the above processes are repeated.

In step S6, it is determined whether voting has been completed for the patch images cut out from the images of all viewpoints belonging to the same group as the current viewpoint of interest. If not, the process returns to step S2, another unselected viewpoint is selected, and the above processes are repeated. In step S7, it is determined whether the above processes have been completed for all viewpoint candidates. If not, the process returns to step S1, another unselected viewpoint candidate is selected, and the above processes are repeated.

When voting of the 3D coordinate labels and viewpoint labels based on the patch images of all feature points of all viewpoints belonging to the group of each viewpoint candidate has thus been completed for all viewpoint candidates, the voting results are registered in the probability distribution DB 32b as the "3D coordinate probability distribution" and the "viewpoint probability distribution".
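The voting in steps S1 to S7 can be sketched as follows. This is only an illustration under assumptions: the text states that classifier 32a consists of Ferns whose leaves receive votes, but the pixel-pair comparison used as the node test, the flat-list patch format, the additive smoothing in `to_distribution`, and all names are hypothetical choices for this example.

```python
import random
from collections import defaultdict

class Fern:
    """One fern: `depth` random pixel-pair tests mapping a patch to a leaf index."""

    def __init__(self, depth, patch_len, rng):
        self.pairs = [(rng.randrange(patch_len), rng.randrange(patch_len))
                      for _ in range(depth)]

    def leaf(self, patch):
        index = 0
        for i, j in self.pairs:
            # Each binary test contributes one bit of the leaf index.
            index = (index << 1) | int(patch[i] < patch[j])
        return index

def train(ferns, samples):
    """Vote every labelled patch into one leaf per fern.

    samples: iterable of (patch, coord_label, view_label).
    Returns two tables indexed [fern][leaf][label] -> vote count.
    """
    coord_votes = [defaultdict(lambda: defaultdict(int)) for _ in ferns]
    view_votes = [defaultdict(lambda: defaultdict(int)) for _ in ferns]
    for patch, coord_label, view_label in samples:
        for f, fern in enumerate(ferns):
            leaf = fern.leaf(patch)  # a single classification per patch ...
            coord_votes[f][leaf][coord_label] += 1  # ... feeds both tables
            view_votes[f][leaf][view_label] += 1
    return coord_votes, view_votes

def to_distribution(counts, labels, alpha=1.0):
    """Turn the vote counts of one leaf into probabilities (additive smoothing assumed)."""
    total = sum(counts.get(l, 0) for l in labels) + alpha * len(labels)
    return {l: (counts.get(l, 0) + alpha) / total for l in labels}

rng = random.Random(0)
ferns = [Fern(depth=4, patch_len=16, rng=rng) for _ in range(3)]
# Synthetic patches with 5 coordinate labels and 2 viewpoint labels.
samples = [([rng.random() for _ in range(16)], k % 5, k % 2) for k in range(40)]
coord_votes, view_votes = train(ferns, samples)
```

The point of the sketch is that each patch is classified once per fern and that single result updates both the 3D coordinate table and the viewpoint table.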

Next, the processing of the identification stage will be described with reference to the flowchart of FIG. 10 and the schematic diagram of FIG. 11. In step S31, all feature points are detected from the input camera image Ica, and their 2D coordinates are determined. In step S32, a patch image is cut out at each feature point.

In step S33, the patch image of each feature point is applied in turn to each Fern (Fern1 to FernN) of the classifier 32a, the node branching proceeds in each Fern, and the leaf reached is determined. In step S34, as shown in FIG. 11, the probability distributions registered in the leaves reached are aggregated, and the log sum of the probabilities (likelihoods) of the 3D coordinates and of the viewpoints is calculated for each feature point.

In step S35, the identification unit 32d identifies the 3D coordinates and the viewpoint corresponding to the current feature point on the basis of the class ID whose log sum is the maximum. In step S36, it is determined whether identification has been completed for all feature points detected from the camera image Ica. If not, the process returns to step S33, and the same identification procedure is repeated for the remaining feature points.
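Steps S34 and S35 for a single feature point can be sketched as follows. The sketch assumes that the distributions stored in the leaves are already available as dictionaries with non-zero probabilities; the names `argmax_log_sum` and `identify` and the example numbers are hypothetical.

```python
import math

def argmax_log_sum(distributions):
    """Sum log-probabilities over the ferns and return (best_class, log_sum)."""
    classes = distributions[0].keys()
    log_sums = {c: sum(math.log(d[c]) for d in distributions) for c in classes}
    best = max(log_sums, key=log_sums.get)
    return best, log_sums[best]

def identify(reached_leaves):
    """reached_leaves: one (coord_distribution, view_distribution) pair per fern,
    read from the leaf that the patch reached in that fern."""
    coord = argmax_log_sum([c for c, _ in reached_leaves])
    view = argmax_log_sum([v for _, v in reached_leaves])
    return coord, view

reached = [
    ({"p1": 0.6, "p2": 0.3, "p3": 0.1}, {"v1": 0.8, "v2": 0.2}),
    ({"p1": 0.5, "p2": 0.4, "p3": 0.1}, {"v1": 0.3, "v2": 0.7}),
    ({"p1": 0.2, "p2": 0.7, "p3": 0.1}, {"v1": 0.6, "v2": 0.4}),
]
(coord_id, coord_ll), (view_id, view_ll) = identify(reached)
```

In this example the products are 0.06 for p1 and 0.084 for p2, so p2 is identified even though p1 wins in two of the three ferns; for the viewpoints, v1 (0.144) beats v2 (0.056).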

When the correspondences between the patch image of each feature point detected from the camera image Ica of the observation target M and the 3D coordinates of the observation target M, and the correspondences between each patch image and the viewpoint, have thus been acquired, the viewpoint determination unit 33 evaluates them comprehensively and determines the final estimate of the viewpoint.

In this embodiment, the probability distributions of the viewpoints associated with the respective correspondences are combined into an overall viewpoint probability distribution by computing their product, or their log sum, using a naive Bayes classifier. Alternatively, the overall viewpoint probability distribution may be obtained by averaging the probability distributions of the viewpoints. Then, on the basis of this probability distribution (the probability of each viewpoint with respect to the object in the camera image Ica), either the single most likely viewpoint or the plurality of viewpoints with the top N likelihoods is identified and output as the viewpoint determination result.
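The two ways of combining the per-feature-point viewpoint distributions, followed by the top-N selection, can be sketched as follows. Function names, the `mode` parameter, and the example distributions are assumptions for illustration; the log-sum branch also assumes that no probability is zero.

```python
import math

def combine_viewpoints(distributions, mode="log_sum"):
    """Combine one viewpoint distribution per feature point into overall scores."""
    views = distributions[0].keys()
    if mode == "log_sum":  # naive Bayes: product of probabilities, computed in log space
        return {v: sum(math.log(d[v]) for d in distributions) for v in views}
    if mode == "mean":     # alternative: average the distributions
        return {v: sum(d[v] for d in distributions) / len(distributions) for v in views}
    raise ValueError(f"unknown mode: {mode}")

def top_viewpoints(scores, n=1):
    """Return the n viewpoint labels with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

per_feature = [
    {0: 0.7, 1: 0.2, 2: 0.1},
    {0: 0.4, 1: 0.5, 2: 0.1},
    {0: 0.5, 1: 0.3, 2: 0.2},
]
best_two = top_viewpoints(combine_viewpoints(per_feature), n=2)
best_mean = top_viewpoints(combine_viewpoints(per_feature, mode="mean"), n=1)
```

Both modes rank viewpoint 0 first here. The product punishes a single very small probability much more heavily than the average does, which is one reason to consider the averaging variant.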

Returning to FIG. 1, the accuracy improving unit 3b improves the accuracy of the 3D coordinate correspondences by keeping only those 3D coordinate correspondences that correspond to the determined viewpoint or viewpoints (the single viewpoint or the N viewpoints with the highest likelihoods) and deleting the others. That is, the 3D coordinate correspondences include, as corresponding point candidates, the three-dimensional coordinates of feature points detected from the images obtained by observing the observation target M in advance from various viewpoints; here, the three-dimensional coordinates of feature points that were not obtained in the prior learning stage when the observation target M was observed from the determined viewpoint are excluded from the corresponding point candidates. As a result, three-dimensional coordinates of feature points that cannot be corresponding points this time, such as feature points obtained in the prior learning stage when the observation target M was observed from a viewpoint diametrically opposite to the determined viewpoint, are excluded from the corresponding point candidates. The camera posture calculation unit 3c estimates the camera posture on the basis of the refined 3D coordinate correspondences.
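The pruning performed by the accuracy improving unit can be sketched as a simple filter. The data layout is an assumption: `visible_from` stands for a hypothetical table, recorded during learning, of the viewpoint labels from which each 3D coordinate was detected.

```python
def refine_correspondences(correspondences, visible_from, determined_views):
    """Keep only 3D correspondences whose coordinate was observed, during learning,
    from at least one of the determined viewpoints.

    correspondences:  list of (feature_id, coord_id, likelihood)
    visible_from:     dict coord_id -> set of viewpoint labels (hypothetical table)
    determined_views: iterable of viewpoint labels chosen by the viewpoint determination
    """
    allowed = set(determined_views)
    return [c for c in correspondences if visible_from.get(c[1], set()) & allowed]

correspondences = [("f1", "p1", -1.2), ("f2", "p7", -0.8), ("f3", "p3", -2.0)]
visible_from = {"p1": {0, 1}, "p7": {20, 21}, "p3": {1, 2}}
refined = refine_correspondences(correspondences, visible_from, determined_views=[1])
```

Feature f2 is dropped although it has the highest likelihood, because coordinate p7 was only ever seen from viewpoints 20 and 21, not from the determined viewpoint 1.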

According to this embodiment, in the learning stage, a learning model can be constructed simply by applying the patch image cut out at each feature point of the observation target M, together with its 3D coordinate label and viewpoint label, once to the common rule of the classifier 32a.

Likewise, in the recognition stage, the correspondence between each patch image and the 3D coordinates of the observation target M, and the correspondence between each patch image and the viewpoint, can be acquired simply by applying the patch image of each feature point detected from the camera image Ica of the observation target M once to the common rule of the classifier. Therefore, the 3D coordinates of the observation target and the 2D coordinates of its camera image can be associated at high speed and with high accuracy without increasing the calculation cost.

[Second Embodiment]
FIG. 12 is a functional block diagram of a second embodiment of the camera viewpoint estimation unit 3a, and FIG. 13 schematically illustrates the operation of this embodiment. In both figures, the same reference numerals as above denote the same or equivalent parts, and their description is omitted.

This embodiment is characterized by the provision of a threshold processing unit 34 that applies threshold processing to the correspondences [Y1, Y2] between each feature point (patch image) and its viewpoint output from the matching unit 32, on the basis of the correspondences [X1, X2] between each feature point and its 3D coordinates, likewise output from the matching unit 32.

The threshold processing unit 34 selects, from the correspondences [X1, X2] between the patch image of each feature point detected from the camera image Ica and its 3D coordinates, those whose likelihood X2 falls below a predetermined threshold, that is, the low-likelihood correspondences, and deletes the viewpoint correspondences [Y1, Y2] of those feature points. In other words, a viewpoint correspondence [Y1, Y2] obtained as the classification result of a patch image whose likelihood is low in the 3D coordinate correspondence [X1, X2] is deleted regardless of its own likelihood Y2.
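The threshold processing can be sketched as follows, assuming the two kinds of correspondence are held in dictionaries keyed by a feature-point ID and that the likelihoods are log sums (hence negative). The function name, the data layout, and the threshold value are hypothetical.

```python
def threshold_viewpoint_correspondences(coord_corr, view_corr, threshold):
    """Delete [Y1, Y2] for every feature point whose 3D likelihood X2 is below the threshold.

    coord_corr: dict feature_id -> (X1, X2)  3D coordinate class and its likelihood
    view_corr:  dict feature_id -> (Y1, Y2)  viewpoint class and its likelihood
    """
    # Y2 is deliberately ignored: only X2 decides whether [Y1, Y2] survives.
    return {f: y for f, y in view_corr.items() if coord_corr[f][1] >= threshold}

coord_corr = {"f1": ("p1", -1.0), "f2": ("p4", -9.5), "f3": ("p2", -2.0)}
view_corr = {"f1": ("v3", -0.5), "f2": ("v3", -0.1), "f3": ("v7", -3.0)}
kept = threshold_viewpoint_correspondences(coord_corr, view_corr, threshold=-5.0)
```

Feature f2 has the highest viewpoint likelihood Y2 of the three, yet its viewpoint correspondence is removed because its 3D coordinate likelihood X2 is below the threshold.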

According to experiments by the inventors, in which the relationship between corresponding point accuracy and the likelihoods X2 and Y2 was compared for the 3D coordinate correspondences [X1, X2] and the viewpoint correspondences [Y1, Y2], the corresponding point accuracy of the 3D coordinate correspondences [X1, X2] was found to correlate strongly with the likelihood X2: the higher the likelihood X2, the higher the corresponding point accuracy. For the viewpoint correspondences [Y1, Y2], by contrast, the correlation between corresponding point accuracy and the likelihood Y2 was low, and it was found that the likelihood Y2 may fail to represent the corresponding point accuracy correctly.

In general, the likelihood X2 of a 3D coordinate correspondence differs from the likelihood Y2 of the viewpoint correspondence, but when the likelihood X2 is low, the feature point is likely to be one that was not used for learning, such as a point detected from the background or a point on the model surface that was not selected as a class. In this embodiment, therefore, the viewpoint correspondences are thresholded on the basis of the 3D coordinate correspondences, so that correspondences estimated to have low corresponding point accuracy are removed from the viewpoint correspondences [Y1, Y2], thereby raising the proportion of feature points that were used in learning the probability distributions.

The viewpoint determination unit 33 then obtains an overall viewpoint probability distribution from the probability distributions of the viewpoints associated with the viewpoint correspondences that remain after the threshold processing. On the basis of this probability distribution (the probability of each viewpoint with respect to the object in the camera image Ica), either the single most likely viewpoint or the plurality of viewpoints with the top N likelihoods is identified and taken as the viewpoint determination result. The accuracy improving unit 3b improves the accuracy of the 3D coordinate correspondences by keeping only those that correspond to the determined viewpoint or viewpoints (the single viewpoint or the N viewpoints with the highest likelihoods) and deleting the others.

According to this embodiment, the viewpoint correspondences, for which the correlation between corresponding point accuracy and likelihood may be low, are thresholded on the basis of the 3D coordinate correspondences, for which that correlation is high, and the viewpoint is determined on the basis of the likelihoods of the viewpoint correspondences after the threshold processing, so the accuracy of viewpoint estimation can be improved.

[Third Embodiment]
In each of the above embodiments, classification by the matching unit 32 was described as being performed only once during learning as well as during recognition. However, since more time is available during advance learning than during recognition, the learning of the 3D coordinate probability distribution and the learning of the viewpoint probability distribution may be performed separately. In this embodiment, the 3D coordinate probability distribution and the viewpoint probability distribution are learned separately, which enables more reliable 3D coordinate estimation.

That is, in each of the above embodiments, when learning the 3D coordinate probability distribution and the viewpoint probability distribution, only the patch images of 3D coordinates (feature points) that are readily detected as feature points are selected in advance by the selection unit 32f of the learning unit 32c; these are used as teacher data (first teacher data), given 3D coordinate and viewpoint teacher labels, and applied to the classifier, so that the 3D coordinate probability distribution and the viewpoint probability distribution are obtained at the same time.
In this embodiment, by contrast, only the 3D coordinate probability distribution is first obtained from the first teacher data. The viewpoint probability distribution is learned next, but here all patch images, including those not selected by the selection unit 32f, that is, all patch images stored in the 3D coordinate DB 32e, are used as teacher data (second teacher data), given viewpoint labels, and applied to the classifier to learn the viewpoint probability distribution.
In the recognition stage, some feature points may be detected at locations other than the 3D coordinates that are readily detected as feature points, and their patch images may be applied to the matching unit 32. With the above procedure, however, the teacher data used to learn the viewpoint probability distribution can be made closer to what may actually be input in the recognition stage, enabling more accurate viewpoint estimation. As a result, the accuracy of the accuracy improving unit 3b improves, and more reliable 3D coordinate estimation becomes possible.
Furthermore, when the threshold processing unit 34 described in the second embodiment is used in the recognition stage, using it in the learning stage as well enables still more accurate viewpoint estimation.
When learning the viewpoint probability distribution, the patch images of the second teacher data are first applied to the matching unit 32 to obtain the correspondence [X1, X2] between each patch image and the 3D coordinates. The threshold processing unit 34 is then applied to these, and patch images with low likelihood are deleted from the second teacher data.
The second teacher data is corrected by the above processing, and the viewpoint probability distribution is learned using the corrected second teacher data. Because the threshold processing unit 34 is applied in the recognition stage, patch images whose correspondence [X1, X2] with the 3D coordinates has a low likelihood are not actually used for viewpoint estimation.
In this embodiment, correcting the teacher data as described above brings the teacher data used to learn the viewpoint probability distribution closer to what may actually be input in the recognition stage, enabling more accurate viewpoint estimation. As a result, the accuracy of the accuracy improving unit 3b improves, and more reliable 3D coordinate estimation becomes possible.
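The correction of the second teacher data can be sketched as a filtering step placed between the two learning passes. Everything here is illustrative: `match_3d` is a hypothetical stand-in for the matching unit 32 once the 3D coordinate probability distribution has been learned, and the stub used in the example is not a real classifier.

```python
def correct_second_teacher_data(second_teacher_data, match_3d, threshold):
    """Remove patches whose 3D-coordinate likelihood X2 falls below the threshold.

    second_teacher_data: list of (patch, view_label) covering all stored patches
    match_3d:            callable patch -> (X1, X2); hypothetical stand-in for
                         matching unit 32 after the 3D distribution is learned
    """
    return [(patch, view) for patch, view in second_teacher_data
            if match_3d(patch)[1] >= threshold]

def stub_match_3d(patch):
    # Toy matcher for the example only: brighter patches get a higher likelihood.
    return "p0", sum(patch) / len(patch) - 1.0

second_teacher_data = [([0.9, 0.8], 3), ([0.1, 0.2], 3), ([0.7, 0.9], 5)]
corrected = correct_second_teacher_data(second_teacher_data, stub_match_3d, threshold=-0.5)
```

The corrected list would then be used to learn the viewpoint probability distribution, so that the training patches resemble those that survive the same threshold at recognition time.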

In each of the above embodiments, the camera viewpoint estimation unit 3a of the AR system 1 was described as itself constructing the learning model of the probability distributions, on the basis of the patch images of feature points detected from projection images of a three-dimensional model simulating the observation target M, and registering it in the probability distribution DB 32b in advance; however, the present invention is not limited to this.

That is, a dedicated system capable of constructing a learning model by the same or a similar method as the learning unit 32c may be prepared separately, the probability distributions may be established in advance in that dedicated system by the same method and procedure as above, and these may be registered in the database of the AR system 1 for use. If the learning model is constructed separately and registered in the database of the AR system 1 in this way, no learning operation is required on the user side.

DESCRIPTION OF SYMBOLS 1 ... imaging device (camera), 2 ... display device, 3 ... camera posture estimation device, 3a ... camera viewpoint estimation unit, 3b ... accuracy improving unit, 3c ... camera posture calculation unit, 4 ... additional information database (DB), 31 ... feature point detection unit, 31a ... feature point detector, 32 ... matching unit, 32a ... classifier, 32b ... probability distribution DB, 32c ... learning unit, 32d ... identification unit, 32e ... 3D coordinate DB, 33 ... viewpoint determination unit, 34 ... threshold processing unit

Claims

An image processing apparatus that associates the three-dimensional coordinates of first feature points, detected from images obtained by observing an observation target from a plurality of viewpoints, with second feature points detected from a camera image obtained by photographing the observation target, the apparatus comprising:
classifying means for classifying local feature amounts according to a predetermined classification rule;
a database of probability distributions that give, for the classification results of the local feature amounts, the respective correspondences with the three-dimensional coordinates and with the viewpoints relative to the observation target in the camera image;
identification means for identifying the correspondences between each second feature point and the three-dimensional coordinates and the viewpoint, on the basis of the probability distributions and the classification result obtained by applying the local feature amount of the second feature point to the classifying means; and
accuracy improving means for improving the accuracy of the correspondence between the second feature points and the three-dimensional coordinates on the basis of the correspondence between the second feature points and the viewpoints.

The image processing apparatus according to claim 1, further comprising means for determining at least one viewpoint with respect to the observation target in the camera image on the basis of the correspondence between the second feature points and the viewpoints,
wherein the accuracy improving means excludes, from the corresponding point candidates, the three-dimensional coordinates of first feature points that cannot be obtained from the observation target at the determined viewpoint.

The image processing apparatus according to claim 2, wherein the accuracy improving means excludes, from the corresponding point candidates, the three-dimensional coordinates of first feature points that were not detected from the image of the determined viewpoint when the three-dimensional coordinates of the first feature points were detected from the images obtained by observing the observation target from the plurality of viewpoints.

The image processing apparatus according to claim 1, further comprising means for thresholding the correspondence between the second feature points and the viewpoints on the basis of each likelihood in the correspondence between the second feature points and the three-dimensional coordinates,
wherein the means for determining the viewpoint determines the viewpoint with respect to the observation target in the camera image on the basis of the correspondence between the second feature points and the viewpoints after the threshold processing.

The image processing apparatus according to any one of claims 1 to 4, further comprising learning means for learning each probability distribution by applying the local feature amount of each first feature point, labeled with a three-dimensional coordinate label and a viewpoint label, to the classifying means.

The image processing apparatus according to claim 5, wherein each first feature point detected from the images obtained by observing the observation target from the plurality of viewpoints is classified, on the basis of its viewpoint position, into the nearest of a plurality of viewpoint candidates that are fewer in number than the viewpoints and are discretized at substantially equal intervals and at substantially equal distances from the observation target, and first feature points classified into the same viewpoint candidate are given the same viewpoint label.

The image processing apparatus according to claim 5, wherein each image from which the first feature points are detected is obtained by observing the observation target from a plurality of directions and distances,
and each of the first feature points is classified, on the basis of its viewpoint position, into the nearest of a plurality of viewpoint candidates discretized at equal intervals, including in distance from the observation target, and first feature points classified into the same viewpoint candidate are given the same viewpoint label.

Means for selecting three-dimensional coordinates that are easy to detect as feature points in a camera image from the first feature points;
The learning means learns a probability distribution of the three-dimensional coordinates based on local feature quantities of feature points corresponding to the three-dimensional coordinates that are easily detected. Image processing device.

Means for selecting three-dimensional coordinates that are easy to detect as feature points in a camera image from the first feature points;
The learning means learns each probability distribution of the three-dimensional coordinate and the viewpoint based on a local feature amount of a feature point corresponding to the three-dimensional coordinate that is easily detected. An image processing apparatus according to 1.

The image processing apparatus according to claim 1, wherein the local feature information is a patch image cut out from each feature point and its vicinity.