JP2015032256A

JP2015032256A - Image processing device and database construction device therefor

Info

Publication number: JP2015032256A
Application number: JP2013163446A
Authority: JP
Inventors: Tatsuya Kobayashi (小林 達也); Haruhisa Kato (加藤 晴久); Hiromasa Yanagihara (柳原 広昌)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2013-08-06
Filing date: 2013-08-06
Publication date: 2015-02-16
Anticipated expiration: 2033-08-06
Also published as: JP6086491B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing device, and a database construction device therefor, that improve robustness against changes in shooting distance and angle without increasing the processing load, and that also reduce offline learning time. SOLUTION: When an algorithm that is highly robust to changes in shooting distance is adopted, equivalent matching accuracy is obtained regardless of shooting distance, so there is no need to prepare many representative images at different distances for each angle; that is, the viewpoint interval is made coarse with respect to distance. Conversely, when an algorithm with low robustness to changes in shooting distance is used, matching accuracy varies greatly with shooting distance, so many representative images at different distances must be prepared for each angle; that is, the viewpoint interval is made fine with respect to distance. Likewise, when a recognition algorithm that is highly robust to changes in angle or rotation is adopted, equivalent matching accuracy is obtained regardless of angle or rotation, so the viewpoint interval is made coarse with respect to angle or rotation.

Description

The present invention relates to an image processing device that identifies a recognition target in an input image acquired by an imaging means such as a camera and estimates the relative position and orientation between the recognition target and the camera, and to a database construction device therefor.

In recent years, AR (augmented reality) technology, which processes video of real space by computer and adds further information to it, has been realized on information terminals equipped with a camera module, such as mobile phones. Even on a terminal with limited processing resources, such AR technology must identify recognition targets in an input image acquired by an imaging means such as a camera and estimate, in real time, their relative position and orientation (hereinafter also referred to as posture parameters).

Patent Document 1 discloses a technique for estimating posture parameters relative to a reference marker in an input image by image recognition processing. Patent Document 2 discloses a technique for identifying an arbitrary object in an input image, on a terminal with limited processing resources, by image recognition processing that uses feature point matching between a server and an information terminal. Patent Document 3 discloses a technique for estimating posture parameters of an arbitrary object in an input image, on a terminal with limited processing resources, by hybrid image recognition processing that combines initialization and tracking.

Patent Document 1: JP 2012-212460 A
Patent Document 2: JP 2012-203669 A
Patent Document 3: Japanese Translation of PCT Application Publication No. 2013-508844

Non-Patent Document 1: M. Ozuysal, M. Calonder, V. Lepetit, and P. Fua, "Fast keypoint recognition using random ferns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 3, pp. 448-461, March 2010.

The prior art described in Patent Document 1 has the problem that arbitrary objects, such as prints with complex designs, cannot be handled as recognition targets.

The prior art described in Patent Document 2 suffers from reduced identification speed due to communication with the server and from insufficient robustness to changes in the distance and direction from which the object is photographed.

The prior art described in Patent Document 3 suffers from insufficient robustness, at initialization, to changes in the distance and direction from which the object is photographed.

Patent Documents 2 and 3 also assume the use of the method called Random Ferns described in Non-Patent Document 1, which can run even on terminals with limited processing resources. In that case, however, the offline processing required before execution takes a long time, and the database required for execution is large.

On the other hand, when local feature amounts that do not require time-consuming offline processing, such as SURF, ORB, or BRISK, are used, the run-time processing load and the lack of robustness to changes in shooting distance and direction become even more pronounced.

The present invention has been made in view of the above problems of the prior art, and its object is to provide an image processing device, and a database construction device therefor, that improve robustness to changes in shooting distance and direction without increasing the processing load and that also reduce offline processing time.

(1) The image processing device of the present invention comprises: representative image generation means for image-sampling a recognition target from a plurality of viewpoints to generate a representative image specific to each viewpoint; feature point detection means for detecting feature points and their local feature amounts from a camera image of the recognition target; corresponding point candidate detection means for detecting corresponding point candidates and their local feature amounts from each representative image; a database that manages, in association with each other, the local feature amount of each corresponding point candidate and its coordinates back-projected onto the recognition target; recognition means for recognizing corresponding points based on the result of matching the local feature amount of each feature point detected from the camera image against the local feature amount of each corresponding point candidate managed in the database; and posture parameter calculation means for estimating, based on the recognition result, the posture parameters at the time the camera image was captured.

(2) The image processing device of the present invention comprises: a database that manages, in association with each other, the local feature amount of each corresponding point candidate detected from representative images generated by image-sampling a recognition target from a plurality of viewpoints and the coordinates of that candidate back-projected onto the recognition target; feature point detection means for acquiring feature points and their local feature amounts from a camera image of the recognition target; recognition means for recognizing corresponding points based on the result of matching the local feature amount of each feature point detected from the camera image against the local feature amount of each corresponding point candidate managed in the database; and posture parameter calculation means for estimating, based on the recognition result, the posture parameters at the time the camera image was captured.

(3) The database construction device of the present invention comprises: representative image generation means for image-sampling a recognition target from a plurality of viewpoints to generate a representative image specific to each viewpoint; corresponding point candidate detection means for acquiring corresponding point candidates and their local feature amounts from each representative image; and a database that associates, for each corresponding point candidate, its local feature amount with its coordinates back-projected onto the recognition target.

(4) The representative image generation means determines the image sampling density separately for each parameter that changes the viewpoint, such as distance, angle, and rotation, according to the robustness to viewpoint changes of the algorithm that detects the feature points and/or feature point candidates.

According to the present invention, the following effects are achieved.
(1) The database does not store the representative images themselves, which differ in projection parameters. It stores only the local feature amounts of the corresponding point candidates detected from each representative image and the coordinate values obtained by back-projecting those candidates onto the recognition target. Moreover, each local feature amount is handled uniformly, regardless of the representative image from which its corresponding point candidate was detected. This reduces the database size, simplifies the database structure, and improves recognition speed.

(2) The density of viewpoint placement used when acquiring representative images with varied projection parameter values is determined according to the robustness of the feature point detection and corresponding point candidate detection algorithm to each kind of viewpoint change, so as to compensate for its weaknesses. This avoids both placing viewpoints densely along a viewpoint change to which the algorithm is already robust, which wastefully accumulates corresponding point candidates from many representative images that do not improve accuracy, and placing viewpoints sparsely along a viewpoint change to which the algorithm is not robust, which degrades accuracy. As a result, the processing load and the database size can be reduced while sufficient robustness to viewpoint changes is maintained.
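The density rule in effect (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the invention: the robustness scores in [0, 1], the `min_steps` and `max_steps` bounds, the parameter ranges, and the linear mapping from robustness to sample count are all hypothetical and introduced here only to make the idea concrete.

```python
from itertools import product

def steps_for(robustness, min_steps=2, max_steps=8):
    """Map a robustness score in [0, 1] to a number of samples.

    High robustness gives few samples (coarse interval);
    low robustness gives many samples (fine interval).
    The linear mapping is a hypothetical choice for illustration.
    """
    r = min(max(robustness, 0.0), 1.0)
    return round(max_steps - r * (max_steps - min_steps))

def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    if n == 1:
        return [(lo + hi) / 2.0]
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def viewpoint_grid(ranges, robustness):
    """Build the viewpoint list as a Cartesian product over the parameters."""
    axes = [linspace(*ranges[name], steps_for(robustness[name]))
            for name in ("distance", "angle", "rotation")]
    return list(product(*axes))

# Hypothetical scores: robust to distance and rotation, weak on angle.
robustness = {"distance": 0.9, "angle": 0.2, "rotation": 1.0}
ranges = {"distance": (0.3, 1.5), "angle": (0.0, 60.0), "rotation": (0.0, 180.0)}
grid = viewpoint_grid(ranges, robustness)
print(len(grid), "viewpoints")
```

With these assumed scores, most of the sampling budget goes to the angle axis, where the detector is weakest, while distance and rotation are sampled only coarsely.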

FIG. 1 is a block diagram showing the configuration of an AR system to which the present invention is applied.
FIG. 2 is a block diagram showing the configuration of the main part of the posture parameter estimation device 2.
FIG. 3 is a diagram showing an example of representative images generated from a recognition target.
FIG. 4 is a diagram showing a method of acquiring the coordinates and gradient direction on the recognition target corresponding to each corresponding point candidate by back-projecting each corresponding point candidate of a representative image onto the recognition target.
FIG. 5 is a diagram showing a method of setting virtual viewpoints.
FIG. 6 is a diagram showing an example in which the registered contents of the feature point database are varied according to the robustness of the feature point detection algorithm.
FIG. 7 is a block diagram showing the configuration of a database construction device for the image processing device.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an AR system to which the present invention is applied; the system is implemented and used on an information terminal such as a mobile phone, smartphone, PDA, or notebook PC.

The imaging device 4 is a camera module mounted on a portable terminal or the like, or a web camera device. It photographs the recognition target 5 and outputs the camera image Ica to the display device 1 and the posture parameter estimation device 2. The recognition target 5 is any three-dimensional object whose shape and pattern are known, and also includes two-dimensional objects (images) such as printed matter.

As described in detail later, the posture parameter estimation device 2 performs feature point matching between the corresponding point candidates (feature points) of the recognition target 5 and the feature points of the camera image acquired from the imaging device 4. Based on the matched feature points and the internal parameters of the camera, it estimates the relative position and orientation between the recognition target 5 and the AR system (or the imaging device 4).

In general, the relative position and orientation are expressed as a matrix called the posture parameters, or the external parameters of the camera, which contains position and direction information in three-dimensional space. How an object appears on the screen is determined by these posture parameters together with the internal parameters of the camera (a matrix containing the camera-specific focal length and principal point position) and other optical distortion parameters.

In this embodiment, the internal parameters and distortion parameters are assumed to have been obtained in advance by calibration or the like, and distortion is assumed to have been removed. The estimated posture parameters are provided to the display device 1. When the AR system handles multiple kinds of objects as recognition targets, one pair of posture parameters and object ID is provided to the display device 1 for each recognized object.

The additional information database 3 is a storage device composed of a hard disk drive, a semiconductor memory module, or the like. It holds the CG and two-dimensional images to be superimposed on the recognition target 5 on the display device 1 when the AR system recognizes the position of the recognition target 5. It outputs to the display device 1 the additional information about the recognition target 5 that corresponds to the camera posture parameters estimated by the posture parameter estimation device 2.

The display device 1 is a monitor device capable of presenting to the user the camera images continuously acquired by the imaging device 4 and the additional information acquired from the additional information database 3; it may be the display of a portable terminal. It may also take the form of a head-mounted display (HMD). With a see-through HMD in particular, it is possible to superimpose only the additional information on the field of view without displaying the camera image. When the display device 1 is a display, the additional information input from the additional information database 3 is superimposed on the camera image. The additional information is displayed with its position and orientation corrected according to the posture parameters input from the posture parameter estimation device 2.

FIG. 2 is a block diagram showing the configuration of the main part of the posture parameter estimation device 2, which comprises a feature point detection unit 21, a matching unit 22, and a posture parameter calculation unit 23.

In the feature point detection unit 21, the feature point detector 201 detects feature points from a camera image of the recognition target 5, and the feature amount extractor 202 extracts a local feature amount from each feature point. Algorithms such as Harris, Hessian, SIFT, SURF, FAST, BRIEF, ORB, BRISK, and FREAK can be used for the feature point detector 201 and the feature amount extractor 202.

In general, each of these algorithms has advantages and disadvantages. Algorithms such as SIFT and SURF, which are robust to shooting distance, angle, and rotation, impose a heavy processing load, whereas algorithms such as FAST and BRIEF, which impose a light processing load, have poor robustness to distance, angle, and rotation. Algorithms typified by Random Ferns, which train a classifier for feature points in advance, can achieve both a low processing load and robustness, but they require lengthy training and a large database (classifier).

In the matching unit 22, the learning unit 203 includes a representative image generation unit 203a, a corresponding point candidate detection unit 203b, and a coordinate association unit 203c.

The representative image generation unit 203a generates representative images either by photographing the recognition target 5 from a plurality of positions and directions (a plurality of viewpoints) or by projecting it with a plurality of projection parameters corresponding to different viewpoints (hereinafter also referred to as image sampling). In this embodiment, the representative images are generated using a plurality of projection parameters. Here, a viewpoint is defined by the distance between the recognition target 5 and the imaging device 4, the angle of the imaging device 4 relative to the recognition target 5, the rotation of the imaging device 4 about its optical axis relative to the recognition target 5, and so on. The representative image generation unit 203a converts a plurality of different viewpoints into projection parameters. By projecting the recognition target 5 onto two dimensions with these projection parameters, a plurality of images differing in distance, angle, and rotation are generated as representative images.

FIG. 3 shows an example of representative images generated from the recognition target 5. Here, the representative images are generated using projection parameters converted from combinations of several angles (horizontal axis) and distances (vertical axis).

The representative images can also be acquired by photographing the recognition target 5 from a plurality of viewpoints. From the standpoint of simplifying offline processing, however, it is preferable to create them artificially: if the recognition target 5 is a planar object, by applying a projective transformation to a reference image (with the transformation parameters calculated from the projection parameters), and if it is a complex three-dimensional object, by projecting its three-dimensional model.
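For the planar case, the projective transformation follows from the projection parameters by a standard identity: for points on the plane Z = 0, projecting with A[R|t] reduces to the 3x3 homography H = A[r1 r2 t]. The sketch below illustrates that identity only. The intrinsic values, the tilt angle, and the helper names (`planar_homography`, `apply_h`, `rotation_x`) are assumptions made for illustration, and the plane coordinates are taken in object units rather than reference-image pixels.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_x(deg):
    """Rotation about the x axis by `deg` degrees (tilts the view)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def planar_homography(A, R, t):
    """H = A [r1 r2 t]: valid for object points on the plane Z = 0."""
    M = [[R[i][0], R[i][1], t[i]] for i in range(3)]
    return matmul3(A, M)

def apply_h(H, x, y):
    """Map a plane point (x, y) to pixel coordinates (u, v)."""
    u, v, w = (H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3))
    return u / w, v / w

# Assumed intrinsics: focal length 500 px, principal point (320, 240).
A = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]  # assumed distance of 2 units along the optical axis

H_front = planar_homography(A, rotation_x(0.0), t)
H_tilt = planar_homography(A, rotation_x(30.0), t)
print(apply_h(H_front, 0.1, 0.0))
print(apply_h(H_tilt, 0.0, 0.1))
```

Warping every pixel of the reference image through such an H (with an extra scaling from pixels to object units) would yield one representative image per viewpoint.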

The corresponding point candidate detection unit 203b detects feature points from each representative image, by the same method as the feature point detection unit 21, and treats them as corresponding point candidates. It then extracts the local feature amount of each corresponding point candidate.

To increase the accuracy of the corresponding point candidates, known methods include associating only local feature amounts whose distance is at or below a preset threshold and excluding corresponding point candidates that deviate from the overall tendency of the corresponding points. These methods are equally applicable to the matching unit 22 of the present invention.

In this embodiment, it is desirable that the corresponding point candidate detection unit 203b of the matching unit 22 use the same algorithm as the feature point detection unit 21, but fine parameters such as resolution and thresholds need not be identical. Rather, from the standpoint of processing load, it is desirable that the feature point detection unit 21, which runs for every camera image, use parameter settings that prioritize execution speed, and that the corresponding point candidate detection unit 203b, which runs only once at startup or during offline learning, use parameter settings that prioritize accuracy.

As illustrated in FIG. 4, the coordinate association unit 203c back-projects each corresponding point candidate detected from each representative image onto the recognition target 5, thereby acquiring the coordinate values (three-dimensional coordinates), in the object coordinate system of the recognition target 5, that correspond to each corresponding point candidate.
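For a planar recognition target, one way to realize this back-projection is to invert the homography that produced the representative image. The sketch below assumes that approach; the matrix values and the function names `inv3` and `back_project` are illustrative and are not taken from the specification. The example H corresponds to a frontal view with focal length 500 px, principal point (320, 240), and distance 2.

```python
def inv3(m):
    """Invert a 3x3 matrix via the adjugate; raises if it is singular."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def back_project(H, u, v):
    """Map pixel (u, v) of a representative image to object coordinates.

    Assumes the target lies on the plane Z = 0 of its own coordinate system.
    """
    Hi = inv3(H)
    x, y, w = (Hi[r][0] * u + Hi[r][1] * v + Hi[r][2] for r in range(3))
    return (x / w, y / w, 0.0)

# Assumed homography of a frontal representative image.
H = [[500.0, 0.0, 640.0], [0.0, 500.0, 480.0], [0.0, 0.0, 2.0]]
print(back_project(H, 345.0, 240.0))
```

For a non-planar target the same step would instead require intersecting the viewing ray with the three-dimensional model, which is not shown here.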

The feature point database 204 stores, for each corresponding point candidate, the local feature amount detected from the representative image and the back-projected coordinate values acquired by the coordinate association unit 203c, in association with each other. Although the corresponding point candidates carry local feature amounts detected from different images (representative images), in subsequent processing they are all treated uniformly as corresponding point candidates detected from the recognition target 5.
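A minimal sketch of such a database layout follows. The `Entry` type, the use of an integer to hold a binary descriptor, and the `build_database` helper are assumptions made for illustration. The point being shown is only that entries from all representative images are pooled into one flat collection and that no reference to the source image is kept.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]

@dataclass(frozen=True)
class Entry:
    """One corresponding point candidate as stored in the database."""
    descriptor: int   # binary local feature packed into an int (assumed form)
    xyz: Point3       # back-projected coordinates in the object frame

def build_database(
    per_image: Sequence[Sequence[Tuple[int, Point3]]]
) -> List[Entry]:
    """Pool the candidates of every representative image into one flat list.

    The index of the representative image is deliberately discarded.
    """
    return [Entry(descriptor, xyz)
            for candidates in per_image
            for descriptor, xyz in candidates]

# Two hypothetical representative images with three candidates in total.
db = build_database([
    [(0b11110000, (0.10, 0.00, 0.0)), (0b00001111, (0.00, 0.20, 0.0))],
    [(0b11110001, (0.10, 0.00, 0.0))],
])
print(len(db), "entries")
```

Because the representative images themselves are not stored, the size of the database grows with the number of candidates rather than with the number of image pixels.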

After the above learning, the recognition unit 205 matches the local feature amount of each feature point detected from a camera image of the recognition target 5 against the local feature amount of each corresponding point candidate stored in the database 204. In this embodiment, the local feature amounts of the feature points and corresponding point candidates are expressed as vectors, and each feature point is paired with the corresponding point candidate nearest to it in Euclidean or Hamming distance; each such pair is taken as a corresponding point.
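Nearest-neighbor matching under the Hamming distance can be sketched as below for binary descriptors. This is a brute-force illustration under assumptions: descriptors are packed into Python integers, and the optional `max_distance` argument corresponds to the preset-threshold refinement mentioned earlier in the description. The function names and sample values are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def match(queries, database, max_distance=None):
    """Pair each query descriptor with its nearest database descriptor.

    Returns (query_index, database_index, distance) tuples. Pairs whose
    distance exceeds max_distance are dropped when a threshold is given.
    """
    if not database:
        return []
    pairs = []
    for qi, q in enumerate(queries):
        best_j, best_d = min(
            ((j, hamming(q, d)) for j, d in enumerate(database)),
            key=lambda item: item[1],
        )
        if max_distance is None or best_d <= max_distance:
            pairs.append((qi, best_j, best_d))
    return pairs

database = [0b11110000, 0b00001111, 0b10101010]
queries = [0b11110001, 0b01010101]
print(match(queries, database))
print(match(queries, database, max_distance=2))
```

The linear scan costs one comparison per database entry for every feature point; a practical system would likely use an index structure, which is outside the scope of this sketch.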

The pairs of feature points and corresponding point candidates (corresponding points) obtained in this way may include some errors. The reliability of the corresponding points can be improved by screening them before they are input to the posture parameter calculation unit 23.

One example of such screening, for the case where the recognition target is a planar object, is to compare the gradient directions of the feature point and the corresponding point candidate making up each corresponding point and to remove corresponding points whose gradient-direction difference is an outlier. This exploits the property that, for a planar recognition target, the gradient-direction differences are expected to be roughly the same across corresponding points.
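The gradient-direction screening can be sketched with a simple histogram vote. The specification does not say how the outlying differences are identified, so the voting scheme, the `bin_width` and `tolerance` values, and the function names below are assumptions chosen for illustration.

```python
def angle_diff(a, b):
    """Signed smallest difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def screen_by_gradient(pairs, bin_width=20.0, tolerance=15.0):
    """Keep the pairs whose direction difference agrees with the majority.

    pairs: list of (camera_angle_deg, database_angle_deg).
    Returns the indices of the pairs that are kept.
    """
    diffs = [(cam - ref) % 360.0 for cam, ref in pairs]
    if not diffs:
        return []
    n_bins = int(round(360.0 / bin_width))
    votes = [0] * n_bins
    for d in diffs:
        votes[int(d // bin_width) % n_bins] += 1
    peak = votes.index(max(votes))
    center = (peak + 0.5) * bin_width  # dominant rotation, in degrees
    return [i for i, d in enumerate(diffs)
            if abs(angle_diff(d, center)) <= tolerance]

# Four pairs agree on a rotation of about 30 degrees; two do not.
pairs = [(40, 12), (100, 69), (10, 337), (200, 171), (90, 280), (300, 50)]
print(screen_by_gradient(pairs))  # -> [0, 1, 2, 3]
```

A histogram is used instead of a plain mean because angles wrap around at 360 degrees and because a mean would be pulled toward the outliers it is supposed to detect.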

When screening by gradient direction as described above, the coordinate association unit 203c must, during back-projection, acquire not only the coordinate values (three-dimensional coordinates) in the object coordinate system of the recognition target 5 corresponding to each corresponding point candidate but also the gradient direction in that coordinate system, and store both in the database 204.

The posture parameter calculation unit 23 estimates the posture parameters based on the correspondences between the feature points and the corresponding point candidates and on the internal parameters of the camera. Methods for estimating, from matches between two-dimensional and three-dimensional coordinates, the posture parameters that explain their relationship have long been studied, and the relationship between three-dimensional and two-dimensional coordinates is generally expressed by the following equation (1).
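The body of equation (1) is not present in this text. Assuming the standard pinhole camera model, which is consistent with the symbols A, W, [u,v], and [X,Y,Z] used in the surrounding description, the relation would take the following form, where the scale factor \lambda is notation introduced here and is not taken from the original:

```latex
\lambda\,[u, v, 1]^{T} = A\,W\,[X, Y, Z, 1]^{T} \tag{1}
```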

Here, [u,v] and [X,Y,Z] denote the two-dimensional pixel coordinates and the three-dimensional coordinates, respectively, and [・]^T denotes the transpose. A and W denote the internal parameters and the posture parameters of the camera, respectively. The internal parameters of the camera are obtained in advance by camera calibration.

The posture parameters are W=[R,t]=[r1,r2,r3,t], composed of a rotation matrix R and a translation vector t. The posture parameters W can be estimated from matches between three-dimensional coordinates [X,Y,Z,1]^T and two-dimensional coordinates [u,v,1]^T together with the internal parameters of the camera.

Because the input correspondences between two-dimensional and three-dimensional coordinates include some incorrect pairs, it is common to extract only the correct corresponding points (inliers) by a sampling method such as RANSAC or PROSAC and then to estimate the posture parameters that minimize the reprojection error of the three-dimensional coordinates by an iterative method such as the Levenberg-Marquardt method.
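The inlier test at the core of such sampling methods can be sketched as follows. This shows only the consensus step for one candidate pose: generating pose hypotheses from minimal samples and the Levenberg-Marquardt refinement are not shown. The intrinsic values, the 3-pixel threshold, and the function names are assumptions made for illustration.

```python
import math

def project(A, R, t, X):
    """Project the 3D point X with intrinsics A and pose W = [R | t]."""
    cam = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0.0:
        return None  # point is behind the camera
    img = [sum(A[i][k] * cam[k] for k in range(3)) for i in range(3)]
    return img[0] / img[2], img[1] / img[2]

def find_inliers(A, R, t, matches, threshold_px=3.0):
    """Return the indices of matches whose reprojection error is small.

    matches: list of ((u, v), (X, Y, Z)) pairs.
    """
    keep = []
    for idx, ((u, v), X) in enumerate(matches):
        p = project(A, R, t, X)
        if p is not None and math.hypot(p[0] - u, p[1] - v) <= threshold_px:
            keep.append(idx)
    return keep

# Assumed intrinsics and a candidate pose (frontal view at distance 2).
A = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]
matches = [
    ((320.5, 239.8), (0.0, 0.0, 0.0)),   # consistent with the pose
    ((345.0, 241.0), (0.1, 0.0, 0.0)),   # consistent with the pose
    ((100.0, 100.0), (0.0, 0.2, 0.0)),   # wrong correspondence
]
print(find_inliers(A, R, t, matches))  # -> [0, 1]
```

In a RANSAC-style loop, this count would be evaluated for many candidate poses and the pose with the largest inlier set would be kept for refinement.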

Next, the method by which the representative image generation unit 203a generates representative images is described in more detail. In the present invention, to generate many representative images with no bias in the projected viewpoints, virtual viewpoints are placed evenly around the recognition target 5 and the image observed from each viewpoint is generated. More specifically, projection parameters are calculated from each viewpoint, and the representative image is generated by projecting the recognition target 5, or its three-dimensional model, onto two dimensions.

In this embodiment, as illustrated in FIG. 5, the vertices of a polyhedron approximating a hemisphere that covers the recognition target 5 are used as the virtual viewpoints. In this case, the rotation matrix R can be obtained from each viewpoint by computing the following equations (2), (3), and (4).

ただし However,

ここで、Pは仮想視点の三次元座標、rod(・)はロドリゲス変換を表す。さらに各回転行列について任意の並進パラメータtを加え、カメラの内部パラメータAと合わせて投影パラメータP=A［R｜t］を作成する。最終的に、代表画像生成部２０３ａは投影パラメータを用いて認識対象５を投影することで前記代表画像（図３）を生成する。 Here, P represents the three-dimensional coordinates of the virtual viewpoint, and rod (•) represents the Rodrigues transformation. Further, an arbitrary translation parameter t is added to each rotation matrix, and a projection parameter P = A [R | t] is created together with the internal parameter A of the camera. Finally, the representative image generation unit 203a generates the representative image (FIG. 3) by projecting the recognition target 5 using the projection parameters.
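
The construction of a projection parameter from a virtual viewpoint can be sketched as follows. Because equations (2) to (4) are not reproduced in this text, the sketch does not claim to follow them: `rotation_toward` is an assumed convention that uses the Rodrigues formula to rotate the z-axis onto the direction of the viewpoint, and all function names, the camera matrix, and the translation are hypothetical.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(M, N):
    """Multiply matrix M (n x k) by matrix N (k x m)."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def rod(v):
    """Rodrigues formula: axis-angle vector v (angle = |v|) to a rotation matrix."""
    theta = math.sqrt(sum(x * x for x in v))
    if theta < 1e-12:
        return [row[:] for row in IDENTITY]
    kx, ky, kz = (x / theta for x in v)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # cross-product matrix
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    # R = I + sin(theta) K + (1 - cos(theta)) K^2
    return [[IDENTITY[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]

def rotation_toward(viewpoint):
    """Assumed convention: rotation that maps the z-axis onto the viewpoint direction."""
    norm = math.sqrt(sum(x * x for x in viewpoint))
    dx, dy, dz = (x / norm for x in viewpoint)
    sin_a = math.hypot(dx, dy)          # |z x d|, the sine of the rotation angle
    if sin_a < 1e-12:                   # viewpoint lies on the z-axis
        return rod([0.0, 0.0, 0.0]) if dz > 0 else rod([math.pi, 0.0, 0.0])
    angle = math.atan2(sin_a, dz)
    # Rotation axis is z x d = (-dy, dx, 0), normalized and scaled by the angle.
    return rod([-dy / sin_a * angle, dx / sin_a * angle, 0.0])

def projection_matrix(A, R, t):
    """Compose the 3x4 projection parameter A[R|t]."""
    Rt = [R[i] + [t[i]] for i in range(3)]
    return matmul(A, Rt)

A = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]  # hypothetical camera
R = rotation_toward((1.0, 0.0, 0.0))
P_proj = projection_matrix(A, R, [0.0, 0.0, 5.0])
```

Calling `rotation_toward` once per vertex of the viewpoint polygon and pairing each result with one or more translations would yield one projection parameter per representative image.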

なお、代表画像生成部２０３ａは視点に偏りを作ることなく各代表画像を生成しても良いが、特定の視点から撮影される尤度が高いような状況下では、代表画像を生成する視点の範囲を制限しても良い。一般的に、認識対象５が平面物体であれば正面に近い角度から撮影される尤度が高いことが想定される。その場合、代表画像生成部２０３ａは、正面に近い仮想視点のみを用いて代表画像を生成することで、学習時間の削減および信頼性の向上を実現することが可能になる。 Note that the representative image generation unit 203a may generate the representative images without biasing the viewpoints, but where the target is likely to be photographed from particular viewpoints, the range of viewpoints from which representative images are generated may be limited. In general, if the recognition target 5 is a planar object, it is likely to be photographed from an angle close to the front. In that case, the representative image generation unit 203a can reduce the learning time and improve reliability by generating representative images using only virtual viewpoints close to the front.

また、例えば路面上のデザイン等、撮影距離が一定の範囲内に制限されることが想定される場合には、並進パラメータの範囲を狭めることで代表画像数を削減し、学習時間の削減と信頼性の向上を実現することが可能にある。その一方で、認識対象５が三次元形状を持ち、全周囲から撮影される可能性がある場合には、仮想視点も全周囲に配置することが望ましい。 Likewise, when the shooting distance is expected to be limited to a certain range, as with a design on a road surface, narrowing the range of the translation parameter reduces the number of representative images, which reduces the learning time and improves reliability. On the other hand, when the recognition target 5 has a three-dimensional shape and may be photographed from all around, it is desirable to arrange the virtual viewpoints all around it as well.

ところで、特徴点検出部２１および対応点候補検出部２０３ｂが採用する特徴点検出のアルゴリズムが、例えば角度の変化に対して高い頑健性を有している場合、撮影角度に関して視点間隔を密にして代表画像数を増やしてもマッチング精度の向上に余り寄与しない。 Incidentally, when the feature point detection algorithm employed by the feature point detection unit 21 and the corresponding point candidate detection unit 203b is highly robust against, for example, changes in angle, making the viewpoint interval dense with respect to the shooting angle and thereby increasing the number of representative images contributes little to matching accuracy.

したがって、例えばSIFTアルゴリズムのように、距離、角度および回転（以下、「視点を変化させるパラメータ」で総称する場合もある）のいずれの変化に対する頑健性も優れたアルゴリズムを採用するのであれば、視点間隔を距離、角度、回転のいずれに関しても粗にすれば、オフライン処理負荷およびデータベース容量の小型化を、認識対象５を撮影する際の視点の変化に対する十分な頑健性を確保しながら実現できるようになる。 Therefore, if an algorithm with excellent robustness against changes in all of distance, angle, and rotation (hereinafter sometimes collectively called "parameters that change the viewpoint") is adopted, such as the SIFT algorithm, the viewpoint interval can be made coarse with respect to distance, angle, and rotation alike. This reduces the offline processing load and the database size while ensuring sufficient robustness against viewpoint changes when the recognition target 5 is photographed.

一方、例えばFASTアルゴリズムやBRIEFアルゴリズムのように、距離、角度および回転のいずれの変化に対しても頑健性が低いアルゴリズムを採用するのであれば、視点間隔を距離、角度、回転のいずれに関しても密にすることで、マッチング精度の向上を実現できるようになる。 On the other hand, if an algorithm with low robustness against changes in all of distance, angle, and rotation is adopted, such as the FAST algorithm or the BRIEF algorithm, matching accuracy can be improved by making the viewpoint interval dense with respect to distance, angle, and rotation alike.

ただし、必要以上に密に視点を配置してもマッチング精度の向上には寄与しないため、本発明の対応点候補検出部２０３ｂは、各視点変化（距離の変化、角度の変化、回転の変化等）に対する頑健性を補う必要最小限の数の代表画像を生成するようにしている。例えば、BRIEFアルゴリズムであれば、０〜１０度程度の回転であれば問題なくマッチングを行うことが可能であるため、回転に関しては、視点間隔（画像サンプリングの分解能）を２０度（１８パターン）や３０度（１２パターン）に設定することで、オフライン処理負荷およびデータベース容量の小型化を、認識対象５を撮影する際の回転の変化に対する十分な頑健性を確保しながら実現できるようになる。 However, placing viewpoints more densely than necessary does not improve matching accuracy, so the corresponding point candidate detection unit 203b of the present invention generates the minimum number of representative images needed to compensate for the robustness against each viewpoint change (change in distance, change in angle, change in rotation, etc.). For example, the BRIEF algorithm can match without problems under a rotation of about 0 to 10 degrees. For rotation, therefore, setting the viewpoint interval (the resolution of image sampling) to 20 degrees (18 patterns) or 30 degrees (12 patterns) reduces the offline processing load and the database size while ensuring sufficient robustness against changes in rotation when the recognition target 5 is photographed.
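
The pattern counts quoted above follow from dividing a full turn by the viewpoint interval. The sketch below reproduces that arithmetic; the function names are hypothetical, and the worst-case figure assumes that a query image is matched against the nearest sampled rotation.

```python
def rotation_patterns(interval_deg):
    """Number of in-plane rotations needed to cover 360 degrees at the given interval."""
    if 360 % interval_deg != 0:
        raise ValueError("interval must divide 360 evenly")
    return 360 // interval_deg

def max_residual_rotation(interval_deg):
    """Worst-case rotation between a query and its nearest sampled rotation."""
    return interval_deg / 2

print(rotation_patterns(20), max_residual_rotation(20))  # 18 10.0
print(rotation_patterns(30), max_residual_rotation(30))  # 12 15.0
```

Under this nearest-sample assumption, a 20-degree interval leaves at most 10 degrees of residual rotation, which matches the tolerance quoted above, while a 30-degree interval leaves up to 15 degrees in exchange for a third fewer patterns.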

また、例えばORBアルゴリズムやBRISKアルゴリズムのように、回転の変化に対する頑健性に較べて距離や角度の変化に対する頑健性が低いアルゴリズムを採用するのであれば、視点間隔を回転に関しては粗にする一方、距離や角度に関しては密にすれば、処理負荷の軽減およびデータベース容量の小型化を、認識対象５を撮影する際の視点の変化に対する十分な頑健性を確保しながら実現できるようになる。 Further, if an algorithm is adopted whose robustness against changes in distance and angle is lower than its robustness against changes in rotation, such as the ORB algorithm or the BRISK algorithm, the viewpoint interval can be made coarse with respect to rotation and dense with respect to distance and angle. This reduces the processing load and the database size while ensuring sufficient robustness against viewpoint changes when the recognition target 5 is photographed.

一般に、回転の変化に対する頑健性にのみ優れるアルゴリズムは低処理負荷のものが数多く存在しており、距離や角度の変化に対する頑健性を加えると高処理負荷になる傾向がある。したがって、距離、角度の変化に対する頑健性に乏しいORBアルゴリズムやBRISKアルゴリズムを採用し、距離と角度の変化に対する頑健性を補えるように代表画像を生成することが、総合的な処理負荷と各視点変化に対する頑健性の両立を実現する上で望ましい。 In general, many of the algorithms that are robust only against changes in rotation have a low processing load, and adding robustness against changes in distance or angle tends to increase the processing load. It is therefore desirable, for balancing the overall processing load against robustness to each viewpoint change, to adopt the ORB or BRISK algorithm, which has poor robustness against changes in distance and angle, and to generate representative images so as to compensate for that weakness.

上述したように、本発明の対応点候補検出部２０３ｂは、特徴点検出部２１および対応点候補検出部２０３ｂが採用するアルゴリズムの各視点変化（距離の変化、角度の変化、回転の変化等）に対する頑健性に応じて、その頑健性の弱さを補うように代表画像を生成するようにしている。 As described above, the corresponding point candidate detection unit 203b of the present invention generates representative images so as to compensate for weak robustness, according to the robustness of the algorithm employed by the feature point detection unit 21 and the corresponding point candidate detection unit 203b against each viewpoint change (change in distance, change in angle, change in rotation, etc.).

図６は、特徴点検出部２１および対応点候補検出部２０３ｂが採用するアルゴリズムの各視点変化に対する頑健性に応じて、特徴点データベース２０４に対応点候補が登録される代表画像の傾向を異ならせる例を示した図であり、ここでは、角度の変化に対する頑健性が同等であって、距離の変化に対する頑健性の高いアルゴリズムを採用する場合と低いアルゴリズムを採用する場合との比較を示している。 FIG. 6 shows an example in which the tendency of the representative images whose corresponding point candidates are registered in the feature point database 204 is varied according to the robustness, against each viewpoint change, of the algorithm employed by the feature point detection unit 21 and the corresponding point candidate detection unit 203b. Here, with the robustness against changes in angle being equal, it compares the case of adopting an algorithm with high robustness against changes in distance with the case of adopting one with low robustness against changes in distance.

距離の変化に対して頑健性の高い認識アルゴリズムを採用した場合は、撮影距離に関わらず同等のマッチング精度を得られるので、角度ごとに撮影距離の異なる多数の代表画像を用意する必要がない。すなわち、距離に関して視点の間隔を粗にできるので、特徴点が登録される代表画像数を減じることができる。 When a recognition algorithm that is highly robust with respect to a change in distance is used, the same matching accuracy can be obtained regardless of the shooting distance, and it is not necessary to prepare a large number of representative images having different shooting distances for each angle. In other words, since the distance between viewpoints can be made coarse with respect to distance, the number of representative images in which feature points are registered can be reduced.

一方、撮影距離の変化に対して頑健性の低いアルゴリズムを搭載した場合は、撮影距離に応じてマッチング精度が大きく変化するので、角度ごとに距離の異なる多数の代表画像を用意する必要がある。すなわち、距離に関して視点の間隔を密にしなければならないので、代表画像数が増えることになる。 On the other hand, when an algorithm with low robustness against changes in shooting distance is used, matching accuracy varies greatly with the shooting distance, so a large number of representative images at different distances must be prepared for each angle. In other words, the viewpoint interval must be made dense with respect to distance, and the number of representative images increases.

なお、図示は省略するが、角度や回転の変化に対して頑健性の高い認識アルゴリズムを採用した場合は、角度や回転に関わらず同等のマッチング精度を得られるので、撮影距離ごとに角度や回転の異なる多数の代表画像を用意する必要がない。すなわち、角度や回転に関して視点の間隔を粗に配置すれば良い。 Although not illustrated, when a recognition algorithm that is highly robust against changes in angle or rotation is adopted, equivalent matching accuracy is obtained regardless of angle or rotation, so there is no need to prepare a large number of representative images with different angles or rotations for each shooting distance. In other words, the viewpoints may be arranged at coarse intervals with respect to angle and rotation.

一方、角度や回転の変化に対して頑健性の低いアルゴリズムを搭載した場合は、角度や回転に応じてマッチング精度が大きく変化するので、撮影距離ごとに角度や回転の異なる多数の代表画像を用意する必要がある。すなわち、角度や回転に関して視点の間隔を密に設定しなければならない。 On the other hand, when an algorithm with low robustness against changes in angle or rotation is used, matching accuracy varies greatly with angle and rotation, so a large number of representative images with different angles and rotations must be prepared for each shooting distance. In other words, the viewpoint interval must be set densely with respect to angle and rotation.

このように、本実施形態によれば、投影パラメータ値を変えて代表画像を取得する際の視点配置の密度が、特徴点検出部および対応点候補検出部の各視点変化に対する頑健性の弱さを補うように決定される。したがって、頑健性の高い視点変化に関して密に視点配置を行って精度向上に寄与しない多数の代表画像に関して対応点候補を無駄に蓄積したり、その逆に頑健性の低い視点変化に関して粗に視点配置を行って精度を劣化させたりすることなく、処理負荷の軽減およびデータベース容量の小型化を、視点変化に対する十分な頑健性を確保しながら実現できるようになる。 As described above, according to this embodiment, the density of viewpoint placement used when acquiring representative images with varied projection parameter values is determined so as to compensate for the weak robustness of the feature point detection unit and the corresponding point candidate detection unit against each viewpoint change. This avoids both wastefully accumulating corresponding point candidates for many representative images that do not improve accuracy, which happens when viewpoints are placed densely for a viewpoint change with high robustness, and degrading accuracy, which happens when viewpoints are placed coarsely for a viewpoint change with low robustness. As a result, the processing load and the database size can be reduced while ensuring sufficient robustness against viewpoint changes.
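
The effect of the per-parameter sampling density on the number of representative images can be illustrated with a small sketch. The ranges and step sizes below are hypothetical values chosen for the example and do not come from the embodiment; only the idea of choosing a separate interval for distance, angle, and rotation is taken from the description.

```python
from itertools import product

def viewpoint_grid(distance_step, angle_step, rotation_step):
    """Enumerate (distance, angle, rotation) samples, one per representative image."""
    distances = range(100, 301, distance_step)  # shooting distance, hypothetical units
    angles = range(0, 61, angle_step)           # tilt away from frontal view, degrees
    rotations = range(0, 360, rotation_step)    # in-plane rotation, degrees
    return list(product(distances, angles, rotations))

# Detector with weak rotation robustness: rotation sampled every 20 degrees.
dense_rotation = viewpoint_grid(distance_step=50, angle_step=15, rotation_step=20)

# Detector with strong rotation robustness: rotation sampled every 90 degrees.
coarse_rotation = viewpoint_grid(distance_step=50, angle_step=15, rotation_step=90)

print(len(dense_rotation), len(coarse_rotation))  # 450 100
```

With the distance and angle sampling unchanged, coarsening only the rotation interval cuts the number of representative images, and therefore the offline learning work and the number of database entries, from 450 to 100 in this example.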

なお、上記の実施形態では、ARシステムの姿勢パラメータ推定装置２が学習部２０３を備え、自ら特徴点データベース２０４を予め構築するものとして説明したが、本発明はこれのみに限定されるものではなく、図７に一例を示したように、前記学習部２０３と同等の学習機能１０を姿勢パラメータ推定装置の外部に設け、当該学習機能１０がオフラインで構築したデータベースを学習部２０３に移植するようにしても良い。 In the above embodiment, the posture parameter estimation device 2 of the AR system was described as including the learning unit 203 and constructing the feature point database 204 itself in advance. However, the present invention is not limited to this. As illustrated in FIG. 7, a learning function 10 equivalent to the learning unit 203 may be provided outside the posture parameter estimation device, and the database that the learning function 10 constructs offline may be ported to the learning unit 203.

学習機能１０において、代表画像生成部１０１は、認識対象５を様々な投影パラメータで二次元に投影することで代表画像を生成する。対応点候補検出部１０２は、特徴点検出部２１における手法と同様の手法で、前記各代表画像から特徴点を検出して対応点候補とし、さらに各対応点候補の局所特徴量を各代表画像から抽出する。 In the learning function 10, the representative image generation unit 101 generates representative images by projecting the recognition target 5 into two dimensions with various projection parameters. The corresponding point candidate detection unit 102 detects feature points from each representative image, using the same method as the feature point detection unit 21, and treats them as corresponding point candidates; it then extracts the local feature amount of each corresponding point candidate from the representative image.

座標対応付部１０３は、各代表画像から検出された各対応点候補を認識対象５に逆投影することで、各対応点候補に対応する認識対象５の物体座標系における座標値（３次元座標）および勾配方向を取得する。データベース１０４には、各代表画像から検出された各対応点候補の局所特徴量、逆投影後の座標値および勾配方向が、相互に対応付けられて対応点候補ごとに記憶され、その後、ARシステムの特徴点データベース２０４へ移植される。 The coordinate association unit 103 back-projects each corresponding point candidate detected from each representative image onto the recognition target 5, thereby obtaining the coordinate values (three-dimensional coordinates) in the object coordinate system of the recognition target 5 and the gradient direction that correspond to each candidate. The database 104 stores, for each corresponding point candidate detected from each representative image, its local feature amount, its back-projected coordinate values, and its gradient direction in association with one another; the database is then ported to the feature point database 204 of the AR system.
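
The per-candidate record described above can be sketched as a small data structure. The field names, the use of a binary descriptor packed into an integer, and the brute-force Hamming-distance lookup are all assumptions made for illustration; the description only specifies that the local feature amount, the back-projected coordinates, and the gradient direction are stored in association with each other.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateRecord:
    descriptor: int   # binary local feature packed into an int (assumed BRIEF-like)
    xyz: tuple        # back-projected coordinates in the object coordinate system
    gradient: tuple   # gradient direction obtained by back projection

def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def nearest(database, descriptor):
    """Return the stored candidate whose descriptor is closest to the query."""
    return min(database, key=lambda rec: hamming(rec.descriptor, descriptor))

# Hypothetical database built offline from the representative images.
database = [
    CandidateRecord(0b10110010, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    CandidateRecord(0b01001101, (1.0, 2.0, 0.0), (0.0, 1.0, 0.0)),
]

# A feature point from the camera image that differs from the first record by one bit.
match = nearest(database, 0b10110011)
print(match.xyz)  # (0.0, 0.0, 0.0)
```

Each match pairs a two-dimensional feature point in the camera image with the stored three-dimensional coordinates, which is the kind of correspondence from which the posture parameters are estimated.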

１…表示装置，２…姿勢パラメータ推定装置，３…付加情報データベース，４…撮像装置，５…認識対象，２１…特徴点検出部，２２…マッチング部，２３…姿勢パラメータ算出部，２０３…学習部，２０３ａ…代表画像生成部，２０３ｂ…対応点候補検出部，２０３ｃ…座標対応付部，２０４…特徴点データベース，２０５…認識部 DESCRIPTION OF SYMBOLS: 1 ... display device, 2 ... posture parameter estimation device, 3 ... additional information database, 4 ... imaging device, 5 ... recognition target, 21 ... feature point detection unit, 22 ... matching unit, 23 ... posture parameter calculation unit, 203 ... learning unit, 203a ... representative image generation unit, 203b ... corresponding point candidate detection unit, 203c ... coordinate association unit, 204 ... feature point database, 205 ... recognition unit

Claims

An image processing apparatus that estimates posture parameters for a recognition target in a captured camera image, comprising:
representative image generation means for sampling the recognition target from a plurality of viewpoints and generating a representative image unique to each viewpoint;
feature point detection means for detecting feature points and their local feature amounts from a camera image to be recognized;
corresponding point candidate detection means for detecting corresponding point candidates and their local feature amounts from each representative image;
a database that manages, for each corresponding point candidate, its local feature amount and its back-projected coordinates on the recognition target in association with each other;
recognition means for recognizing corresponding points based on the result of matching the local feature amounts of the feature points detected from the camera image against the local feature amounts of the corresponding point candidates managed in the database; and
a posture parameter calculation unit that estimates, based on the recognition result, the posture parameters at the time the camera image was captured.

The image processing apparatus according to claim 1, wherein the representative image generation means determines the density of image sampling for each parameter that changes the viewpoint, according to the robustness, against viewpoint changes, of the algorithm that detects the feature points and/or feature point candidates.

The image processing apparatus according to claim 1, wherein the representative image generation unit performs image sampling more densely for a parameter having lower robustness.

The image processing apparatus according to claim 1, wherein the representative image generation unit performs image sampling more coarsely for a parameter having higher robustness.

An image processing apparatus that estimates posture parameters for a recognition target in a captured camera image, comprising:
a database that manages, for each corresponding point candidate detected from representative images generated by sampling the recognition target from a plurality of viewpoints, its local feature amount and its back-projected coordinates on the recognition target in association with each other;
feature point detection means for acquiring feature points and their local feature amounts from a camera image to be recognized;
recognition means for recognizing corresponding points based on the result of matching the local feature amounts of the feature points detected from the camera image against the local feature amounts of the corresponding point candidates managed in the database; and
a posture parameter calculation unit that estimates, based on the recognition result, the posture parameters at the time the camera image was captured.

The image processing apparatus according to claim 5, wherein, for each parameter that changes the viewpoint, the representative images are sampled at a density corresponding to the robustness, against viewpoint changes, of the algorithm that detects the feature points.

The image processing apparatus according to claim 1, wherein the parameter for changing the viewpoint includes at least one of a distance, an angle, and a rotation.

The image processing apparatus according to claim 1, wherein the representative image is an image observed from a viewpoint virtually arranged without any bias around the recognition target.

The image processing apparatus according to claim 1, wherein the representative image is a projection image obtained by projecting the recognition target into two dimensions, and the parameter for changing the viewpoint is a projection parameter.

A database construction device for an image processing apparatus that estimates posture parameters for a recognition target in a captured camera image, comprising:
representative image generation means for sampling the recognition target from a plurality of viewpoints and generating a representative image unique to each viewpoint;
corresponding point candidate detection means for acquiring corresponding point candidates and their local feature amounts from each representative image; and
a database that associates, for each corresponding point candidate, its local feature amount with its back-projected coordinates on the recognition target.

A database construction device for an image processing device, wherein the representative image generation means determines the density of image sampling for each parameter that changes the viewpoint, according to the robustness, against viewpoint changes, of the algorithm that detects the corresponding point candidates.

The database construction apparatus for an image processing apparatus according to claim 10, wherein the representative image generation means performs image sampling more densely for a parameter having lower robustness.

The database construction apparatus for an image processing apparatus according to claim 10, wherein the representative image generation means performs image sampling more coarsely for a parameter having higher robustness.