JP6126979B2

JP6126979B2 - Feature selection apparatus, method, and program

Info

Publication number: JP6126979B2
Application number: JP2013256015A
Authority: JP
Inventors: 小萌武; 柏野　邦夫
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-12-11
Filing date: 2013-12-11
Publication date: 2017-05-10
Anticipated expiration: 2033-12-11
Also published as: JP2015114819A

Description

The present invention relates to a feature selection device, method, and program, and in particular to a feature selection device, method, and program that perform feature selection.

Feature selection methods for local features include methods that focus on how frequently a local feature appears across multiple images representing the same concept or object, and that select frequently appearing local features as features of interest useful for search, recognition, and the like (Non-Patent Documents 1, 2, and 3).

Specifically, there are three such methods. The first (Non-Patent Document 1) treats the posterior probability of an image classification label, given that a local feature appears in an image, as the appearance frequency of that local feature, and selects local features with a high posterior probability. The second (Non-Patent Document 2) defines a measure called information gain based on the entropy of the posterior probability, treats it as the appearance frequency of a local feature, and selects local features with high information gain. The third (Non-Patent Document 3) casts feature selection as a noise-feature elimination problem: for negative example images that tend markedly to be misrecognized as the classification label of interest (positive examples), it focuses on the appearance frequency of local features in those negative images and eliminates frequently appearing local features as noise.

Feature selection methods that focus on the contextual relationships between feature points include a method that performs spatial-context-based matching on multiple images thought to represent the same concept or object, and selects local features with strong spatial consistency as features of interest useful for search, recognition, and the like (Non-Patent Document 4).

The method of Non-Patent Document 4 makes use of RANSAC (Non-Patent Document 5), one of the spatial context matching methods, and selects the feature points called inliers, which RANSAC outputs, regarding them as having high spatial consistency.

Feature representation methods that take spatial context into account target feature points with large intensity changes in an image, focus on contextual relationships between feature points such as their relative positions and the differences in their geometric properties, and represent these contextual relationships as image features.

Specifically, these include the following: a feature representation method (Non-Patent Document 6) that proposes and uses a multi-scale Delaunay diagram to detect triplets of feature points that are neighbors in image space, and uses each triplet as the basic unit; a feature representation method (Non-Patent Document 7) that uses the K-nearest-neighbor method to detect pairs of feature points that are neighbors in image space, and is based on the differences in the geometric properties (scale and dominant orientation) of each pair; an image representation and matching method called RANSAC, which hypothesizes multiple geometric transformations based on similar feature points between images, defines the feature points that support each hypothesis as inliers, and uses the number of inliers as its measure; and an image representation and matching method called Weak Geometric Consistency (Non-Patent Document 8), which focuses on the geometric properties (scale and dominant orientation) of similar feature points and uses a histogram of the differences in those properties.

The above methods can also be classified by whether or not they depend on the geometric properties of feature points. The method of Non-Patent Document 6 is based only on the neighborhood relationships between feature points and is therefore classified as a method that does not depend on geometric properties; the method of Non-Patent Document 6 regards the elliptical region representing the shape of a feature point as the basic unit of image matching. The methods of Non-Patent Documents 8 and 7 perform image representation based on the scale and dominant orientation of feature points and are therefore classified as methods that depend on geometric properties. Methods that do not depend on geometric properties are robust to errors in feature point detection and description, whereas methods that depend on geometric properties have the advantage of discriminating strongly against incorrect images.

Non-Patent Document 1: Fayin Li and Jana Kosecka. Probabilistic location recognition using reduced feature set. In ICRA, pp. 3405-3410, 2006.
Non-Patent Document 2: Grant Schindler, Matthew Brown, and Richard Szeliski. City-scale location recognition. In CVPR, 2007.
Non-Patent Document 3: Jan Knopp, Josef Sivic, and Tomas Pajdla. Avoiding confusing features in place recognition. In ECCV (1), pp. 748-761, 2010.
Non-Patent Document 4: P. Turcot and D. G. Lowe. Better matching with fewer features: The selection of useful features in large database recognition problems. In Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, pp. 2109-2116, 2009.
Non-Patent Document 5: James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. Object retrieval with large vocabularies and fast spatial matching. In CVPR, 2007.
Non-Patent Document 6: Yannis Kalantidis, Lluis Garcia Pueyo, Michele Trevisiol, Roelof van Zwol, and Yannis S. Avrithis. Scalable triangulation-based logo recognition. In ICMR, p. 20, 2011.
Non-Patent Document 7: Zhen Liu, Houqiang Li, Wengang Zhou, and Qi Tian. Embedding spatial context information into inverted file for large-scale image retrieval. In ACM Multimedia, pp. 199-208, 2012.
Non-Patent Document 8: Herve Jegou, Matthijs Douze, and Cordelia Schmid. Improving bag-of-features for large scale image search. International Journal of Computer Vision, Vol. 87, No. 3, pp. 316-336, 2010.

The methods of Non-Patent Documents 1 to 3 focus only on single feature points and do not consider the contextual relationships among multiple feature points, so they are easily degraded by noise regions such as white noise.

As for feature selection based on feature representations that take spatial context into account, the method of Non-Patent Document 4 can select only those local features that have strong spatial consistency across the input images. As a result, features of interest that are useful for search, recognition, and the like, but whose spatial consistency is low because the set of input images is insufficient, tend to be excluded excessively.

In addition, because that method depends on the geometric properties (shape, etc.) of feature points, it is sensitive to errors in feature point detection and description, and because its spatial consistency analysis relies on computationally expensive RANSAC, it is difficult to apply to large-scale databases.

With the method of Non-Patent Document 7, a spatial context feature representation and feature selection method based on neighborhood detection by the K-nearest-neighbor method is conceivable, but the K-nearest-neighbor method itself is computationally expensive and its processing time is long.

The present invention has been made to solve the above problems, and its object is to provide a feature selection device, method, and program capable of selecting appropriate features of interest.

To achieve the above object, a feature selection device according to a first invention includes: an input unit that receives a plurality of images representing an object; a local feature extraction unit that, for each of the plurality of images received by the input unit, extracts each feature point of the image and extracts the local feature of each feature point; a spatial context feature representation unit that, for each of the plurality of images, detects as a common feature point each feature point of the image whose local feature is shared with a local feature extracted from another image, detects pairs of common feature points from among the detected common feature points of the image, generates for each detected pair a spatial context feature representation representing the local features of the common feature points forming the pair, and detects each detected pair whose spatial context feature representation is shared with a spatial context feature representation generated from another image; a region-of-interest estimation unit that, for each of the plurality of images, estimates a region of interest based on the pairs with shared spatial context feature representations detected for the image by the spatial context feature representation unit; and a feature selection unit that, for each of the plurality of images, selects as features of interest the local features of the feature points contained in the region of interest estimated for the image by the region-of-interest estimation unit.

A feature selection method according to a second invention is a feature selection method in a feature selection device that includes an input unit, a local feature extraction unit, a spatial context feature representation unit, a region-of-interest estimation unit, and a feature selection unit. The input unit receives a plurality of images representing an object. The local feature extraction unit, for each of the plurality of images received by the input unit, extracts each feature point of the image and extracts the local feature of each feature point. The spatial context feature representation unit, for each of the plurality of images, detects as a common feature point each feature point of the image whose local feature is shared with a local feature extracted from another image, detects pairs of common feature points from among the detected common feature points of the image, generates for each detected pair a spatial context feature representation representing the local features of the common feature points forming the pair, and detects each detected pair whose spatial context feature representation is shared with a spatial context feature representation generated from another image. The region-of-interest estimation unit, for each of the plurality of images, estimates a region of interest based on the pairs with shared spatial context feature representations detected for the image by the spatial context feature representation unit. The feature selection unit, for each of the plurality of images, selects as features of interest the local features of the feature points contained in the region of interest estimated for the image by the region-of-interest estimation unit.

According to the first and second inventions, the input unit receives a plurality of images representing an object, and the local feature extraction unit extracts, for each image, each feature point and the local feature of each feature point. The spatial context feature representation unit, for each of the plurality of images, detects as a common feature point each feature point of the image whose local feature is shared with a local feature extracted from another image, detects pairs of common feature points from among the detected common feature points, generates for each detected pair a spatial context feature representation representing the local features of the common feature points forming the pair, and detects each pair whose spatial context feature representation is shared with one generated from another image. The region-of-interest estimation unit estimates, for each image, a region of interest based on the detected pairs with shared spatial context feature representations, and the feature selection unit selects, for each image, the local features of the feature points contained in the estimated region of interest as features of interest.

Thus, according to the first and second inventions, appropriate features of interest can be selected by extracting the local feature of each feature point extracted from each image representing the object, detecting the spatial context feature representations shared across the plurality of images, estimating a region of interest for each image based on the shared spatial context feature representations, and selecting, for each image, the local features of the feature points contained in the estimated region of interest as features of interest.

In the first invention, the spatial context feature representation may represent the local features of pairs of neighboring common feature points obtained based on a multi-scale Delaunay diagram of the common feature points of the image.

In the first invention, the spatial context feature representation may also represent the local features of pairs of neighboring common feature points obtained from the common feature points of the image by the K-nearest-neighbor method.

In the first invention, the spatial context feature representation may also represent the local features of the pairs of common feature points without taking the geometric properties of the pairs into account.

In the first invention, the spatial context feature representation may also represent the local features of the pairs of common feature points together with the differences in the geometric properties of the pairs.

The program of the present invention is a program for causing a computer to function as each unit constituting the feature selection device described above.

As described above, according to the feature selection device, method, and program of the present invention, appropriate features of interest can be selected by extracting the local feature of each feature point extracted from each image representing the object, detecting the spatial context feature representations shared across the plurality of images, estimating a region of interest for each image based on the shared spatial context feature representations, and selecting, for each image, the local features of the feature points contained in the estimated region of interest as features of interest.

A block diagram showing the functional configuration of the feature selection device according to the first embodiment of the present invention.
A block diagram showing the functional configuration of the local feature extraction unit according to the first embodiment of the present invention.
A block diagram showing the functional configuration of the spatial context feature representation unit according to the first embodiment of the present invention.
A block diagram showing the functional configuration of the region-of-interest estimation unit according to the first embodiment of the present invention.
A diagram showing an example of detecting a neighborhood of interest.
A block diagram showing the functional configuration of the codebook learning device according to the first embodiment of the present invention.
A flowchart showing the codebook learning processing routine in the codebook learning device according to the first embodiment of the present invention.
A flowchart showing the feature selection processing routine in the feature selection device according to the first embodiment of the present invention.
A block diagram showing the functional configuration of the feature selection device according to the second embodiment of the present invention.
A block diagram showing the functional configuration of the spatial context feature representation unit according to the second embodiment of the present invention.
A block diagram showing the functional configuration of the region-of-interest estimation unit according to the second embodiment of the present invention.
A flowchart showing the feature selection processing routine in the feature selection device according to the second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Configuration of the Feature Selection Device According to the First Embodiment>
The configuration of the feature selection device according to the first embodiment of the present invention is described first. As shown in FIG. 1, the feature selection device 100 according to the first embodiment can be implemented by a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the feature selection processing routine described later. Functionally, as shown in FIG. 1, the feature selection device 100 includes an input unit 10, a computation unit 20, and an output unit 90.

The input unit 10 receives each of a plurality of images that contain the same object (a concept, a physical object, or the like).

The computation unit 20 includes a local feature extraction unit 30, a spatial context feature representation unit 40, a region-of-interest estimation unit 60, and a feature selection unit 70.

For each image received by the input unit 10, the local feature extraction unit 30 extracts each feature point from the image and extracts the local feature vector of each feature point. As shown in FIG. 2, the local feature extraction unit 30 includes a feature point extraction unit 32, a feature point description unit 34, a codebook storage unit 36, and a Visual Word assignment unit 38. The local feature vector is an example of a local feature.

For each image received by the input unit 10, the feature point extraction unit 32 applies an affine-invariant feature detector such as Hessian-Affine or Harris-Affine to the image and to an inverted version of the image, extracts each elliptical region, takes the center of each elliptical region as a feature point, and obtains the coordinates of each feature point.

For each image, and for each feature point of the image extracted by the feature point extraction unit 32, the feature point description unit 34 computes the local feature vector of the elliptical region of the feature point based on the image, the inverted version of the image, and the coordinates of the feature point. Examples of the local feature vector computed here include SIFT, SURF, and RootSIFT.

The codebook storage unit 36 stores the codebook output by the codebook learning device 200 described later. Here, a codebook is a set of unique Visual Words defined in correspondence with the local feature vectors.

For each image, and for each feature point of the image extracted by the feature point extraction unit 32, the Visual Word assignment unit 38 uses an approximate nearest-neighbor method, based on the local feature vector of the feature point computed by the feature point description unit 34 and the codebook stored in the codebook storage unit 36, to obtain the Visual Word nearest to the local feature vector, and assigns that Visual Word to the local feature vector of the feature point.
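The assignment step can be illustrated with a minimal sketch. It assumes that the codebook is a list of centroid vectors and that a Visual Word is identified by its index in that list; neither detail is stated in the source. An exact brute-force search stands in for the approximate nearest-neighbor method, and the function name and toy data are hypothetical.

```python
import math


def assign_visual_words(descriptors, codebook):
    """Return, for each descriptor, the index of the closest codebook entry.

    Exact brute-force search is used here in place of the approximate
    nearest-neighbor search named in the text (an assumption made to
    keep the sketch free of external dependencies).
    """
    words = []
    for d in descriptors:
        best_word = min(range(len(codebook)),
                        key=lambda w: math.dist(d, codebook[w]))
        words.append(best_word)
    return words


# Hypothetical 2-D toy data; real SIFT descriptors are 128-dimensional.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.2), (4.0, 4.5), (0.9, 1.2)]
visual_words = assign_visual_words(descriptors, codebook)
```

For a realistically sized codebook, the linear scan would presumably be replaced by an approximate index, which is what the description calls for.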

Based on the local feature vectors extracted for each image by the local feature extraction unit 30, the spatial context feature representation unit 40 performs feature representation and matching that take spatial context into account, and detects pairs of common feature points whose spatial context feature representation is shared with another image. As shown in FIG. 3, the spatial context feature representation unit 40 includes a feature point indexing unit 42, a common feature point detection unit 44, a neighborhood detection unit 46, a neighborhood description unit 48, a neighborhood indexing unit 52, and a common neighborhood detection unit 54.

For each feature point extracted from each image by the feature point extraction unit 32, the feature point indexing unit 42 uses inverted indexing or a similar technique to enter the feature point into a feature point index under the Visual Word assigned to it, thereby generating the feature point index.

Based on the feature point index generated by the feature point indexing unit 42, the common feature point detection unit 44 detects, for each image received by the input unit 10, the common feature points of the image: those feature points extracted from the image by the feature point extraction unit 32 for which a feature point assigned the same Visual Word has also been extracted from another image. Specifically, when the feature points listed in the feature point index under a given Visual Word come from multiple images, those feature points are regarded as common feature points and detected.
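The indexing and detection steps can be sketched as follows. This is a minimal illustration, assuming that each image is given as a list of Visual Word identifiers, one per feature point, and that a feature point is identified by its position in that list; the function names and data layout are hypothetical.

```python
from collections import defaultdict


def build_feature_index(image_words):
    """Map each Visual Word to the (image_id, feature_id) entries carrying it."""
    index = defaultdict(list)
    for image_id, words in image_words.items():
        for feature_id, word in enumerate(words):
            index[word].append((image_id, feature_id))
    return index


def detect_common_feature_points(index):
    """Keep feature points whose Visual Word occurs in at least two images."""
    common = defaultdict(list)
    for postings in index.values():
        images = {image_id for image_id, _ in postings}
        if len(images) >= 2:
            for image_id, feature_id in postings:
                common[image_id].append(feature_id)
    return {image_id: sorted(ids) for image_id, ids in common.items()}


# Hypothetical toy input: Visual Word ids per feature point, per image.
image_words = {"A": [3, 7, 7, 9], "B": [7, 2, 3], "C": [5]}
index = build_feature_index(image_words)
common = detect_common_feature_points(index)
```

In this toy input, Visual Words 3 and 7 occur in both images A and B, so the feature points carrying them are kept, while image C contributes no common feature point.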

For each image received by the input unit 10, the neighborhood detection unit 46 applies multi-scale Delaunay triangulation to the common feature points of the image detected by the common feature point detection unit 44 to generate a multi-scale Delaunay diagram, detects pairs of neighboring common feature points, and thereby detects, for each image, a feature point neighborhood set consisting of the detected pairs of neighboring common feature points. Specifically, for each image received by the input unit 10, the common feature points of the image are sorted in ascending order of the size of their corresponding elliptical regions, the resulting list is divided into multiple mutually overlapping subsets, and Delaunay triangulation is applied to each subset to generate the multi-scale Delaunay diagram. Two common feature points joined by an edge of a Delaunay triangle are then detected as a pair of neighboring common feature points.
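The scale-ordered splitting and the edge-to-pair step can be sketched as below. The triangulation itself is left to a library routine (for example `scipy.spatial.Delaunay`) and is not shown. The parameters `subset_size` and `overlap` are hypothetical, since the source does not say how the overlapping subsets are sized, and the function names are illustrative only.

```python
def split_into_overlapping_scale_subsets(points, subset_size, overlap):
    """Sort points by ellipse size and cut the list into overlapping windows.

    points: list of (x, y, scale) tuples, where scale stands for the size
    of the elliptical region. subset_size and overlap are hypothetical
    parameters; the text does not specify how the subsets are sized.
    """
    if not 0 <= overlap < subset_size:
        raise ValueError("overlap must be in [0, subset_size)")
    ordered = sorted(points, key=lambda p: p[2])
    step = subset_size - overlap
    subsets = []
    start = 0
    while True:
        subsets.append(ordered[start:start + subset_size])
        if start + subset_size >= len(ordered):
            break
        start += step
    return subsets


def pairs_from_triangles(triangles):
    """Collect the unordered vertex pairs joined by an edge of any triangle."""
    pairs = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (a, c)):
            pairs.add((min(u, v), max(u, v)))
    return pairs


# Hypothetical toy data: five points whose scales are 5, 1, 4, 2, 3.
points = [(0, 0, 5), (1, 0, 1), (0, 1, 4), (1, 1, 2), (2, 2, 3)]
subsets = split_into_overlapping_scale_subsets(points, subset_size=3, overlap=1)
pairs = pairs_from_triangles([(0, 1, 2), (1, 2, 3)])
```

Each subset would then be triangulated separately, and the union of the resulting edge pairs would form the feature point neighborhood set of the image.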

For each image received by the input unit 10, and for each pair of neighboring common feature points in the feature point neighborhood set detected for the image by the neighborhood detection unit 46, the neighborhood description unit 48 generates a descriptor that concatenates the Visual Words assigned to the common feature points forming the pair, as the spatial context feature representation of the pair, and assigns it to the pair.

For each pair of neighboring common feature points contained in each feature point neighborhood set detected by the neighborhood detection unit 46, the neighborhood indexing unit 52 uses inverted indexing or a similar technique to enter the pair into a neighborhood index under the spatial context feature representation assigned to it, thereby generating the neighborhood index.

Based on the neighborhood index generated by the neighborhood indexing unit 52, the common neighborhood detection unit 54 detects, for each image received by the input unit 10, the pairs whose spatial context feature representation is shared with another image: those pairs of neighboring common feature points detected for the image by the neighborhood detection unit 46 for which a pair of neighboring common feature points assigned the same spatial context feature representation has also been detected in another image.
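The description, indexing, and detection of neighboring pairs can be sketched together. The sketch assumes that the descriptor is the sorted tuple of the two Visual Words, so that it does not depend on the order of the endpoints; the source only says that the Visual Words are concatenated, so this ordering rule, like the function names and data layout, is an assumption.

```python
from collections import defaultdict


def pair_descriptor(word_a, word_b):
    """Concatenate two Visual Words into one descriptor.

    Sorting makes the descriptor independent of endpoint order; that
    ordering rule is an assumption, not something the text states.
    """
    return tuple(sorted((word_a, word_b)))


def detect_common_pairs(image_pairs, image_words):
    """Return, per image, the pairs whose descriptor also occurs in another image."""
    index = defaultdict(list)  # descriptor -> [(image_id, pair), ...]
    for image_id, pairs in image_pairs.items():
        words = image_words[image_id]
        for i, j in pairs:
            index[pair_descriptor(words[i], words[j])].append((image_id, (i, j)))
    common = defaultdict(list)
    for postings in index.values():
        if len({image_id for image_id, _ in postings}) >= 2:
            for image_id, pair in postings:
                common[image_id].append(pair)
    return {image_id: sorted(p) for image_id, p in common.items()}


# Hypothetical toy input: Visual Words per feature point and neighboring pairs.
image_words = {"A": [3, 7, 9], "B": [7, 3, 4]}
image_pairs = {"A": [(0, 1), (1, 2)], "B": [(0, 1), (1, 2)]}
common_pairs = detect_common_pairs(image_pairs, image_words)
```

Here only the pair carrying Visual Words 3 and 7 appears in both images, so it is the only pair kept for each image.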

For each image received by the input unit 10, the region-of-interest estimation unit 60 estimates a region of interest based on the pairs of common feature points that the spatial context feature representation unit 40 detected for the image as sharing a spatial context feature representation with another image. As shown in FIG. 4, the region-of-interest estimation unit 60 includes a common neighborhood extension unit 62, a transitive closure unit 64, and a bounding rectangle calculation unit 66.

For each image received by the input unit 10, the common neighborhood extension unit 62 extends the pairs of common feature points that share a spatial context feature representation with another image, based on the multiscale Delaunay diagram generated by the neighborhood detection unit 46 and on each such pair detected by the common neighborhood detection unit 54, and detects the extended pairs as neighborhoods of interest. Specifically, for each pair detected by the common neighborhood detection unit 54, each Delaunay triangle containing the edge that joins the pair is examined. When the angle opposite that edge is equal to or greater than a predetermined threshold (for example, when it is obtuse), the pair is extended to the two edges that form that angle, and these edges are detected as neighborhoods of interest. FIG. 5 shows an example.
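The extension rule can be illustrated as below. This is a sketch under assumptions: triangles are given as index triples, the threshold defaults to 90 degrees to match the "obtuse" example, and the function names are hypothetical.

```python
import math

def opposite_angle(p, q, r):
    """Angle in degrees at vertex r of triangle (p, q, r), i.e. opposite edge p-q."""
    ax, ay = p[0] - r[0], p[1] - r[1]
    bx, by = q[0] - r[0], q[1] - r[1]
    cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def extend_common_pairs(points, triangles, common, threshold_deg=90.0):
    """Return the edges added by extending common pairs across wide opposite angles."""
    common = {tuple(sorted(edge)) for edge in common}
    extended = set()
    for tri in triangles:
        for k in range(3):
            r = tri[k]                                   # vertex opposite edge u-v
            u, v = tri[(k + 1) % 3], tri[(k + 2) % 3]
            if tuple(sorted((u, v))) not in common:
                continue
            if opposite_angle(points[u], points[v], points[r]) >= threshold_deg:
                extended.add(tuple(sorted((u, r))))
                extended.add(tuple(sorted((v, r))))
    return extended
```

A wide opposite angle means the third vertex lies close to the shared edge, so pulling it in enlarges the neighborhood only slightly in space.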

For each image received by the input unit 10, the transitive closure unit 64 takes the edges corresponding to the pairs of common feature points that the common neighborhood detection unit 54 detected for the image as sharing a spatial context feature representation with another image, combines them with the edges corresponding to the neighborhoods of interest detected by the common neighborhood extension unit 62, treats the result as an undirected graph, and detects one or more connected graphs using transitive closure.
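For an undirected graph, the groups of mutually reachable vertices given by the transitive closure are exactly its connected components. The sketch below computes them with a union-find structure, which is a substitution for illustration; the description names transitive closure and does not prescribe a particular algorithm.

```python
def connected_components(edges):
    """Group the vertices of an undirected edge list into connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())
```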

For each image received by the input unit 10, the bounding rectangle calculation unit 66 calculates the minimum bounding rectangle enclosing each connected graph detected for the image by the transitive closure unit 64, calculates the union of the one or more bounding rectangles obtained, and estimates the region represented by the union as the region of interest of the image.

For each image received by the input unit 10, the feature selection unit 70 selects, as features of interest, the local feature vectors of those feature points extracted by the feature point extraction unit 32 that lie within the region of interest estimated for the image by the bounding rectangle calculation unit 66.
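Region estimation and feature selection can be sketched together. The code assumes axis-aligned rectangles stored as `(x_min, y_min, x_max, y_max)` and represents the union simply as a list of rectangles; both are illustrative assumptions, since the description does not fix a representation.

```python
def bounding_rectangle(component, points):
    """Minimum axis-aligned rectangle enclosing the points of one connected graph."""
    xs = [points[i][0] for i in component]
    ys = [points[i][1] for i in component]
    return (min(xs), min(ys), max(xs), max(ys))

def in_region(xy, rectangles):
    """True if the point lies inside the union of the rectangles."""
    x, y = xy
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in rectangles)

def select_features(all_points, all_vectors, rectangles):
    """Keep the local feature vectors of every feature point inside the region."""
    return [vec for xy, vec in zip(all_points, all_vectors) if in_region(xy, rectangles)]
```

Selection runs over all extracted feature points, not only the common ones, so a feature that matched nothing in another image is still kept when it falls inside the region.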

The output unit 90 outputs each of the features of interest selected by the feature selection unit 70.

<Configuration of the Codebook Learning Device According to the First Embodiment>
Next, the configuration of the codebook learning device according to the first embodiment of the present invention will be described. As shown in FIG. 1, the codebook learning device 200 according to the first embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the codebook construction processing routine described later. Functionally, as shown in FIG. 6, the codebook learning device 200 includes an input unit 210, a calculation unit 220, and an output unit 290.

The input unit 210 receives a plurality of images for codebook learning.

The calculation unit 220 includes a feature point extraction unit 232, a feature point description unit 234, a codebook storage unit 236, and a codebook construction unit 238.

For each image received by the input unit 210, the feature point extraction unit 232 extracts feature points from the image and from an inverted copy of the image using an affine-invariant feature detector such as Hessian-Affine or Harris-Affine, and acquires the coordinates of each feature point.

For each feature point extracted by the feature point extraction unit 232 from each image received by the input unit 210, the feature point description unit 234 calculates the local feature vector of the elliptical region of the feature point, in the same manner as the feature point description unit 34 described above.

The codebook storage unit 236 stores the codebook learned by the codebook construction unit 238.

Based on the local feature vectors of the feature points calculated by the feature point description unit 234, the codebook construction unit 238 learns a codebook using a method such as approximate K-Means or Vocabulary Tree, stores it in the codebook storage unit 236, and outputs it to the output unit 290. Specifically, the Visual Word corresponding to each local feature vector is learned, and the set of pairs of a local feature vector and its Visual Word is used as the codebook.
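As a rough illustration of codebook learning, the sketch below uses plain Lloyd's K-Means in pure Python in place of the approximate K-Means or Vocabulary Tree named above. The codebook is simplified to a list of cluster centers whose list index serves as the Visual Word ID, and the initialization with the first `k` vectors is an assumption made to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(vector, centers):
    """Index of the closest center, used here as the Visual Word ID."""
    return min(range(len(centers)), key=lambda k: squared_distance(vector, centers[k]))

def learn_codebook(vectors, k, iterations=10):
    """Plain Lloyd's K-Means; returns k centers (index = Visual Word ID)."""
    centers = [list(v) for v in vectors[:k]]  # deterministic init (assumption)
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest_word(v, centers)].append(v)
        for idx, bucket in enumerate(buckets):
            if bucket:  # keep the old center if no vector was assigned
                centers[idx] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centers
```

The same `nearest_word` lookup, done exhaustively here, stands in for the approximate nearest-neighbor search that assigns a Visual Word to a new feature point.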

The output unit 290 outputs the codebook learned by the codebook construction unit 238.

<Operation of the Codebook Learning Device According to the First Embodiment>
Next, the operation of the codebook learning device 200 according to the first embodiment of the present invention will be described. When the input unit 210 receives a plurality of images for codebook learning, the codebook learning device 200 executes the codebook learning processing routine shown in FIG. 7.

First, in step S100, feature points are extracted from each of the plurality of images received by the input unit 210.

Next, in step S104, for each feature point acquired in step S100 from each of the plurality of images received by the input unit 210, the local feature vector of the elliptical region of the feature point is calculated.

Next, in step S106, a codebook is learned based on the local feature vectors of the feature points acquired in step S104, and is stored in the codebook storage unit 236.

Next, in step S108, the codebook acquired in step S106 is output from the output unit 290, and the codebook learning processing routine ends.

<Operation of the Feature Selection Device According to the First Embodiment>
Next, the operation of the feature selection device 100 according to the first embodiment of the present invention will be described. In advance, the codebook learned by the codebook learning device 200 is received by the input unit 10 and stored in the codebook storage unit 36 of the feature selection device 100. Then, when the input unit 10 receives a plurality of images containing the same target (a concept, an object, or the like), the feature selection device 100 executes the feature selection processing routine shown in FIG. 8.

First, in step S200, feature points are extracted from each image received by the input unit 10.

Next, in step S204, for each image received by the input unit 10, the local feature vector of each feature point of the image acquired in step S200 is calculated.

Next, in step S206, the codebook stored in the codebook storage unit 36 is read.

Next, in step S208, for each image received by the input unit 10, a Visual Word is assigned to each feature point of the image acquired in step S200 using an approximate nearest neighbor method, based on the local feature vector of the feature point acquired in step S204 and the codebook acquired in step S206.

Next, in step S210, each feature point extracted in step S200 from each image received by the input unit 10 is entered into a feature point index under the Visual Word assigned to it in step S208, using inverted indexing or a similar technique, and the feature point index is thereby generated.

Next, in step S212, based on the feature point index generated in step S210, the feature points acquired in step S200 are examined for each image received by the input unit 10. A feature point is regarded as a common feature point and detected when a feature point assigned the same Visual Word has also been extracted from another image.
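Steps S210 and S212 can be sketched as an inverted index keyed by Visual Word. The input layout (`words_by_image`, one list of Visual Word IDs per image) is a hypothetical choice for illustration.

```python
from collections import defaultdict

def build_feature_point_index(words_by_image):
    """Map each Visual Word to the (image_id, point_id) entries that carry it.

    words_by_image: {image_id: [visual word of feature point 0, 1, ...]}
    """
    index = defaultdict(list)
    for image_id, words in words_by_image.items():
        for point_id, word in enumerate(words):
            index[word].append((image_id, point_id))
    return index

def common_feature_points(index):
    """Return {image_id: sorted point ids} whose Visual Word occurs in another image too."""
    common = defaultdict(list)
    for postings in index.values():
        if len({image_id for image_id, _ in postings}) >= 2:
            for image_id, point_id in postings:
                common[image_id].append(point_id)
    return {image_id: sorted(ids) for image_id, ids in common.items()}
```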

Next, in step S214, for each image received by the input unit 10, multiscale Delaunay triangulation is applied to the common feature points of the image acquired in step S212 to generate a multiscale Delaunay diagram, pairs of neighboring common feature points are detected, and the feature point neighborhood set consisting of the detected pairs is obtained.

Next, in step S216, for each image received by the input unit 10, and for each pair of neighboring common feature points in the feature point neighborhood set of the image acquired in step S214, a descriptor that concatenates the Visual Words assigned in step S208 to the two common feature points of the pair is generated as the spatial context feature representation of the pair and assigned to the pair.

Next, in step S218, each pair is entered into a neighborhood index under the spatial context feature representation assigned to it in step S216, using inverted indexing or a similar technique, and the neighborhood index is thereby generated.

Next, in step S220, based on the neighborhood index acquired in step S218, the pairs of neighboring common feature points detected in step S214 are examined for each image received by the input unit 10. A pair is detected as sharing its spatial context feature representation with another image when a pair of neighboring common feature points assigned the same spatial context feature representation in step S216 has also been detected in another image.

Next, in step S222, for each image received by the input unit 10, the pairs of common feature points that share a spatial context feature representation with another image are extended, based on each such pair detected for the image in step S220 and on the multiscale Delaunay diagram acquired in step S214, and the extended results are detected as neighborhoods of interest.

Next, in step S224, for each image received by the input unit 10, the edges corresponding to the pairs detected for the image in step S220 as sharing a spatial context feature representation with another image are combined with the edges corresponding to the neighborhoods of interest acquired in step S222, the result is treated as an undirected graph, and one or more connected graphs are detected using transitive closure.

Next, in step S226, for each image received by the input unit 10, the region of interest of the image is estimated based on each connected graph of the image acquired in step S224.

Next, in step S228, for each image received by the input unit 10, the local feature vectors acquired in step S204 of the feature points contained in the region of interest of the image acquired in step S226 are selected as features of interest.

Next, in step S230, the features of interest acquired in step S228 are output from the output unit 90, and the feature selection processing routine ends.

As described above, the feature selection device according to the first embodiment of the present invention extracts the local feature of each feature point in each image representing the object; detects common feature points whose local features are shared with another image; detects pairs of neighboring common feature points by multiscale Delaunay triangulation; detects each pair of common feature points whose spatial context feature representation is shared among the plurality of images; estimates, for each of the plurality of images, a region of interest that does not depend on geometric characteristics, based on the shared spatial context feature representations; and selects, for each of the plurality of images, the local features of the feature points contained in the estimated region of interest as features of interest. Appropriate features of interest can thereby be selected.

In addition, multiscale Delaunay triangulation detects pairs of neighboring common feature points more efficiently, which in turn improves the efficiency of feature selection.

In addition, indexing makes feature representation and matching that take spatial context into account more efficient, so that feature selection can be applied to large-scale databases.

In addition, because the spatial context feature representation does not depend on geometric characteristics, feature selection is robust to errors in feature point detection and description.

In addition, the region of interest of an image representing the same concept, object, or the like can be estimated from the plurality of input images. As a result, features of interest that are useful for search, recognition, and the like, but that have low spatial consistency because the input images are insufficient, are not excessively excluded, and more complete feature selection can be achieved.

In addition, extending the feature neighborhoods enables more complete estimation of the region of interest.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the first embodiment has been described for the case where the spatial context feature representation is the concatenation of the Visual Words assigned to the feature points in a pair of neighboring common feature points, but the present invention is not limited to this. As in the second embodiment described later, a spatial context feature representation may be generated that represents both the concatenation of those Visual Words and the differences between the geometric characteristics of the feature points in the pair.

Likewise, the first embodiment has been described for the case where the feature selection unit 70 selects, for each image received by the input unit 10, the local feature vector of each feature point contained in the region of interest as a feature of interest, but the present invention is not limited to this. For example, the feature selection unit 70 may, for each image received by the input unit 10, extract feature points from the region of interest, calculate a local feature vector for each extracted feature point, and select the calculated local feature vectors as features of interest.

Next, a second embodiment will be described. Parts having the same configuration and operation as in the first embodiment are given the same reference numerals, and their description is omitted.

The second embodiment differs from the first embodiment in three respects: the spatial context feature representation additionally represents differences between the geometric characteristics of the feature points; pairs of neighboring common feature points are detected using the K-nearest-neighbor method; and the pairs of common feature points that share a spatial context feature representation with another image are not extended.

<Configuration of the Feature Selection Device According to the Second Embodiment>
Next, the configuration of the feature selection device 300 according to the second embodiment will be described.

As shown in FIG. 9, the feature selection device 300 according to the second embodiment of the present invention includes an input unit 10, a calculation unit 320, and an output unit 90.

The calculation unit 320 includes a local feature extraction unit 30, a spatial context feature representation unit 340, a region-of-interest estimation unit 360, and a feature selection unit 70.

The spatial context feature representation unit 340 performs feature representation and matching that take spatial context into account, based on the local feature vectors extracted for each image by the local feature extraction unit 30, and detects pairs of common feature points that share a spatial context feature representation with another image. As shown in FIG. 10, the spatial context feature representation unit 340 includes a feature point index assigning unit 42, a common feature point detection unit 44, a neighborhood detection unit 346, a neighborhood description unit 348, a neighborhood index assigning unit 52, and a common neighborhood detection unit 54.

For each image received by the input unit 10, the neighborhood detection unit 346 applies the K-nearest-neighbor method to the common feature points of the image detected by the common feature point detection unit 44, detects pairs of neighboring common feature points, and obtains, for each image, the feature point neighborhood set consisting of the detected pairs.
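Pair detection with the K-nearest-neighbor method might look like the following brute-force sketch. The value of `k` and the use of Euclidean distance between point coordinates are assumptions; the description specifies neither.

```python
def knn_pairs(points, k=2):
    """Pair every point with its k nearest other points (brute force, Euclidean)."""
    pairs = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            # ties are broken by index so the result is deterministic
            key=lambda j: ((points[j][0] - p[0]) ** 2 + (points[j][1] - p[1]) ** 2, j),
        )
        for j in others[:k]:
            pairs.add((min(i, j), max(i, j)))
    return pairs
```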

For each image received by the input unit 10, and for each pair of neighboring common feature points in the feature point neighborhood set detected for the image by the neighborhood detection unit 346, the neighborhood description unit 348 generates a spatial context feature representation and assigns it to the pair. The representation expresses both the concatenation of the Visual Words assigned to the two common feature points of the pair and the differences between their geometric characteristics (such as the difference in the size of their elliptical regions, the difference in their main directions, and the shortest distance between the ellipses).
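One way to make such a representation usable as an index key is to quantize the geometric differences into bins, so that pairs with similar geometry receive identical descriptors. The sketch below does this for the size and main-direction differences only. The quantization scheme (log2 size ratio, `angle_bins` direction bins) is an assumption introduced for illustration and is not described in the source; the shortest distance between ellipses is omitted for brevity.

```python
import math

def geometric_pair_descriptor(feat_u, feat_v, angle_bins=8):
    """Build a hashable descriptor from two features.

    Each feature is (visual_word, ellipse_size, main_direction_in_radians).
    The bin layout is a hypothetical choice, not taken from the description.
    """
    (word_u, size_u, dir_u), (word_v, size_v, dir_v) = sorted((feat_u, feat_v))
    size_bin = round(abs(math.log2(size_u / size_v)))      # octaves of size difference
    diff = abs(dir_u - dir_v) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)                   # smallest angle, in [0, pi]
    angle_bin = min(int(diff / math.pi * angle_bins), angle_bins - 1)
    return (word_u, word_v, size_bin, angle_bin)
```

Both differences are symmetric in the two features, so the descriptor is the same whichever point of the pair is passed first.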

For each image received by the input unit 10, the region-of-interest estimation unit 360 estimates a region of interest based on the pairs of common feature points that the spatial context feature representation unit 340 detected for the image as sharing a spatial context feature representation with another image. As shown in FIG. 11, the region-of-interest estimation unit 360 includes a transitive closure unit 364 and a bounding rectangle calculation unit 66.

For each image received by the input unit 10, the transitive closure unit 364 treats the edges corresponding to the pairs of common feature points that the common neighborhood detection unit 54 detected for the image as sharing a spatial context feature representation with another image as an undirected graph, and detects one or more connected graphs using transitive closure.

<Operation of the Feature Selection Device According to the Second Embodiment>
Next, the operation of the feature selection device 300 according to the second embodiment of the present invention will be described. In advance, the codebook learned by the codebook learning device 200 is received by the input unit 10 and stored in the codebook storage unit 36 of the feature selection device 300. Then, when the input unit 10 receives a plurality of images containing the same target (a concept, an object, or the like), the feature selection device 300 executes the feature selection processing routine shown in FIG. 12.

Next, in step S300, for each image received by the input unit 10, the K-nearest-neighbor method is applied to the common feature points of the image acquired in step S212, pairs of neighboring common feature points are detected, and the feature point neighborhood set consisting of the detected pairs is obtained.

Next, in step S302, for each image received by the input unit 10, and for each pair of neighboring common feature points in the feature point neighborhood set of the image acquired in step S300, a spatial context feature representation is generated and assigned to the pair. The representation expresses both the concatenation of the Visual Words assigned in step S208 to the two common feature points of the pair and the differences between their geometric characteristics (such as the difference in the size of their elliptical regions, the difference in their main directions, and the shortest distance between the ellipses).

Next, in step S218, each pair is entered into a neighborhood index under the spatial context feature representation assigned to it in step S302, using inverted indexing or a similar technique, and the neighborhood index is thereby generated.

Next, in step S220, based on the neighborhood index acquired in step S218, the pairs of neighboring common feature points detected in step S300 are examined for each image received by the input unit 10. A pair is detected as sharing its spatial context feature representation with another image when a pair of neighboring common feature points assigned the same spatial context feature representation in step S302 has also been detected in another image.

Next, in step S304, for each image received by the input unit 10, the edges corresponding to the pairs detected for the image in step S220 as sharing a spatial context feature representation with another image are treated as an undirected graph, and one or more connected graphs are detected using transitive closure.

Next, in step S226, for each image received by the input unit 10, the region of interest of the image is estimated based on each connected graph of the image acquired in step S304.

以上説明したように、本発明の第２の実施の形態に係る特徴選択装置によれば、対象物を表す画像の各々について抽出された特徴点毎の局所特徴の各々を抽出し、他の画像と局所特徴が共通する共通特徴点を検出し、Ｋ近傍法により、近傍に存在する共通特徴点のペアの検出を行い、複数の画像において共通する空間文脈特徴表現を有する共通特徴点のペアの各々を検出し、複数の画像の各々について、共通する空間文脈特徴表現に基づいて、幾何学特性を考慮した注目領域を推定し、複数の画像の各々について、推定された注目領域に含まれる特徴点の局所特徴の各々を注目特徴として選択することにより、適切な注目特徴を選択することができる。 As described above, according to the feature selection device according to the second embodiment of the present invention, each local feature for each feature point extracted for each of the images representing the object is extracted, and another image is obtained. Common feature points that are common to local features are detected, a pair of common feature points existing in the neighborhood is detected by the K-neighbor method, and a pair of common feature points having a common spatial context feature representation in a plurality of images is detected. Detecting each of the plurality of images, estimating a region of interest in consideration of geometric characteristics based on a common spatial context feature expression, and including the features included in the estimated region of interest for each of the plurality of images By selecting each local feature of a point as a feature of interest, an appropriate feature of interest can be selected.

例えば、第２の実施の形態においては、近傍に存在する共通特徴点のペアに含まれる特徴点の各々に割当てられたＶｉｓｕａｌＷｏｒｄを連結したもの、及び近傍に存在する共通特徴点のペアに含まれる特徴点の各々の幾何学特性の差分を表す空間文脈特徴表現を生成する場合について説明したがこれに限定されるものではない。第１の実施の形態と同様に、近傍に存在する共通特徴点のペアに含まれる特徴点の各々に割当てられたＶｉｓｕａｌＷｏｒｄを連結したものを空間文脈特徴表現として生成してもよい。 For example, in the second embodiment, a combination of Visual Words assigned to each of feature points included in a pair of common feature points existing in the vicinity, and a pair of common feature points existing in the vicinity Although the case where the spatial context feature expression representing the difference between the geometric characteristics of each feature point to be generated is described has been described, the present invention is not limited to this. As in the first embodiment, a combination of Visual Words assigned to each feature point included in a pair of common feature points existing in the vicinity may be generated as a spatial context feature expression.

Further, although this specification describes embodiments in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium, or provided via a network.

DESCRIPTION OF SYMBOLS
10 Input unit
20 Calculation unit
30 Local feature extraction unit
32 Feature point extraction unit
34 Feature point description unit
36 Codebook storage unit
38 Visual Word assignment unit
40 Spatial context feature representation unit
42 Feature point indexing unit
44 Common feature point detection unit
46 Neighborhood detection unit
48 Neighborhood description unit
52 Neighborhood indexing unit
54 Common neighborhood detection unit
60 Region of interest estimation unit
62 Common neighborhood extension unit
64 Transitive closure unit
66 Bounding rectangle calculation unit
70 Feature selection unit
90 Output unit
100 Feature selection device
200 Codebook learning device
210 Input unit
220 Calculation unit
232 Feature point extraction unit
234 Feature point description unit
236 Codebook storage unit
238 Codebook construction unit
290 Output unit
300 Feature selection device
320 Calculation unit
340 Spatial context feature representation unit
346 Neighborhood detection unit
348 Neighborhood description unit
360 Region of interest estimation unit
364 Transitive closure unit

Claims

A feature selection device comprising:
an input unit that receives a plurality of images representing an object;
a local feature extraction unit that, for each of the plurality of images received by the input unit, extracts each feature point of the image and extracts a Visual Word as the local feature of each feature point;
a spatial context feature representation unit that, for each of the plurality of images, detects as a common feature point each feature point of the image having a local feature in common with a local feature extracted from another image,
for each of the plurality of images, detects each pair of common feature points in the image from among the detected common feature points of the image and generates, for each detected pair, a spatial context feature representation including a descriptor that concatenates the Visual Words that are the local features of the common feature points of the pair, and
for each of the plurality of images, detects each detected pair whose spatial context feature representation is in common with a spatial context feature representation generated from another image;
a region of interest estimation unit that, for each of the plurality of images, estimates a region of interest based on the pairs of the image detected by the spatial context feature representation unit as having common spatial context feature representations; and
a feature selection unit that, for each of the plurality of images, selects as a feature of interest each local feature of the feature points included in the region of interest of the image estimated by the region of interest estimation unit.

A feature selection device comprising:
an input unit that receives a plurality of images representing an object;
a local feature extraction unit that, for each of the plurality of images received by the input unit, extracts each feature point of the image and extracts the local feature of each feature point;
a spatial context feature representation unit that, for each of the plurality of images, detects as a common feature point each feature point of the image having a local feature in common with a local feature extracted from another image,
for each of the plurality of images, detects each pair of common feature points from among the detected common feature points of the image and generates, for each detected pair, a spatial context feature representation representing the local features of the common feature points of the pair, and
for each of the plurality of images, detects each detected pair whose spatial context feature representation is in common with a spatial context feature representation generated from another image;
a region of interest estimation unit that, for each of the plurality of images, estimates a region of interest based on the pairs of the image detected by the spatial context feature representation unit as having common spatial context feature representations; and
a feature selection unit that, for each of the plurality of images, selects as a feature of interest each local feature of the feature points included in the region of interest of the image estimated by the region of interest estimation unit,
wherein the spatial context feature representation represents the local features of pairs of neighboring common feature points obtained based on a multi-scale Delaunay diagram of the common feature points of the image.

The feature selection device according to claim 1, wherein the spatial context feature representation represents the local features of pairs of neighboring common feature points obtained from the common feature points of the image by the K-nearest neighbor method.

4. The feature selection device according to item, wherein the spatial context feature representation represents the local features of the pairs of common feature points and the difference in geometric characteristics within each of the pairs of common feature points.

A feature selection method in a feature selection device including an input unit, a local feature extraction unit, a spatial context feature representation unit, a region of interest estimation unit, and a feature selection unit, the method comprising:
the input unit receiving a plurality of images representing an object;
the local feature extraction unit, for each of the plurality of images received by the input unit, extracting each feature point of the image and extracting a Visual Word as the local feature of each feature point;
the spatial context feature representation unit, for each of the plurality of images, detecting as a common feature point each feature point of the image having a local feature in common with a local feature extracted from another image,
for each of the plurality of images, detecting each pair of common feature points in the image from among the detected common feature points of the image and generating, for each detected pair, a spatial context feature representation including a descriptor that concatenates the Visual Words that are the local features of the common feature points of the pair, and
for each of the plurality of images, detecting each detected pair whose spatial context feature representation is in common with a spatial context feature representation generated from another image;
the region of interest estimation unit estimating, for each of the plurality of images, a region of interest based on the pairs of the image detected by the spatial context feature representation unit as having common spatial context feature representations; and
the feature selection unit selecting, for each of the plurality of images, as a feature of interest each local feature of the feature points included in the region of interest of the image estimated by the region of interest estimation unit.

A feature selection method in a feature selection device including an input unit, a local feature extraction unit, a spatial context feature representation unit, a region of interest estimation unit, and a feature selection unit, the method comprising:
the input unit receiving a plurality of images representing an object;
the local feature extraction unit, for each of the plurality of images received by the input unit, extracting each feature point of the image and extracting the local feature of each feature point;
the spatial context feature representation unit, for each of the plurality of images, detecting as a common feature point each feature point of the image having a local feature in common with a local feature extracted from another image,
for each of the plurality of images, detecting each pair of common feature points from among the detected common feature points of the image and generating, for each detected pair, a spatial context feature representation representing the local features of the common feature points of the pair, and
for each of the plurality of images, detecting each detected pair whose spatial context feature representation is in common with a spatial context feature representation generated from another image;
the region of interest estimation unit estimating, for each of the plurality of images, a region of interest based on the pairs of the image detected by the spatial context feature representation unit as having common spatial context feature representations; and
the feature selection unit selecting, for each of the plurality of images, as a feature of interest each local feature of the feature points included in the region of interest of the image estimated by the region of interest estimation unit,
wherein the spatial context feature representation represents the local features of pairs of neighboring common feature points obtained based on a multi-scale Delaunay diagram of the common feature points of the image.

A program for causing a computer to function as each unit constituting the feature selection device according to any one of claims 1 to 4.