JP4873258B2 - Information processing apparatus and method, and program

Info

Publication number: JP4873258B2
Application number: JP2007273045A
Authority: JP
Inventors: 章中村; 嘉昭岩井; 隆之芦ヶ原
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2007-10-19
Filing date: 2007-10-19
Publication date: 2012-02-08
Anticipated expiration: 2027-10-19
Also published as: JP2009104245A

Description

本発明は、情報処理装置および方法、並びにプログラムに関し、特に、より確実に画像内の物体を認識することができるようになった情報処理装置および方法、並びにプログラムに関する。 The present invention relates to an information processing apparatus, method, and program, and more particularly, to an information processing apparatus, method, and program that can recognize an object in an image more reliably.

従来、局所特徴量を用いたテクスチャベースの一般物体認識手法が存在する（特許文献１参照）。かかる手法は、照明変化に強く、ロバストに物体の認識を可能にする一方で、テクスチャの少ない物体に適用すると識別能力が低下するという特徴がある。 Conventionally, there is a texture-based general object recognition method using local features (see Patent Document 1). Such a method is resistant to changes in illumination and can recognize an object robustly, but has a feature that the discrimination ability is lowered when applied to an object with less texture.

また、本願の出願当初の出願人によって既に特願2006-100705号として出願された願書に添付した明細書等には、エッジ情報やサポート点を使用することにより、テクスチャのない物体に対して局所特徴マッチングの手法の適用を可能にするといった手法が開示されている。即ち、かかる手法とは、モデル画像とクエリ画像とから特徴点を抽出し、その周辺の局所特徴量を記述し、特徴量同士のマッチングを行い、ハフ変換やRANSAC等を用いたアウトライヤ（ミスマッチ）除去を行った後のマッチングペア数で、モデル画像内の物体とクエリ画像内の物体の識別をするという手法である。 In addition, the specification and other documents attached to the application that the original applicant of the present application already filed as Japanese Patent Application No. 2006-100705 disclose a technique that, by using edge information and support points, makes local feature matching applicable to objects without texture. In this technique, feature points are extracted from a model image and a query image, the local feature quantities around them are described, the feature quantities are matched against each other, and the object in the model image and the object in the query image are identified by the number of matching pairs that remain after outliers (mismatches) are removed using a Hough transform, RANSAC, or the like.

特開2004-326693号公報 Patent Document 1: JP 2004-326693 A

しかしながら、これらの従来の手法では、次のような３つの問題点が存在していた。その結果、これらの従来の手法よりも確実に画像内の物体を認識できる手法の実現が期待されている状況である。 However, these conventional techniques have the following three problems. As a result, a technique that can recognize an object in an image more reliably than these conventional techniques is awaited.

即ち、第１の問題点とは、モデル画像の特徴点位置とクエリ画像の特徴点位置の出現再現性が悪い場合には、識別能力が著しく低下するという問題点である。この第１の問題点は、エッジを使用した場合には、モデル画像のエッジとクエリ画像のエッジの再現性が識別能力に大きく影響するというという問題点となる。 That is, the first problem is that when the appearance reproducibility of the feature point position of the model image and the feature point position of the query image is poor, the discrimination ability is remarkably lowered. The first problem is that when edges are used, the reproducibility of the model image edges and the query image edges greatly affects the discrimination ability.

第２の問題点とは、最終的にモデルの識別をインライヤ（ミスマッチペア除去後）のマッチペア数で判断しているため、モデル画像内の物体とクエリ画像内の物体の類似度によらず、複雑なテクスチャや輪郭で、特徴点が多く出る物体同士のマッチペアは多くなり、単純なテクスチャや形状の物体は、マッチペアが少なくなるという傾向がある、という問題点である。 The second problem is that the identification of the model is finally determined by the number of match pairs of the inlier (after mismatched pair removal), so regardless of the similarity between the object in the model image and the object in the query image, The problem is that there are a large number of match pairs between objects that have many feature points due to complex textures and contours, and objects that have simple textures and shapes tend to have fewer match pairs.

第３の問題点とは、ベース点周辺にサポート点を設け、マッチングの精度向上に利用する場合、サポート点の選択基準が、複数のモデル画像間の差異を考慮していない、という問題点である。 The third problem is that when support points are provided around a base point and used to improve matching accuracy, the criterion for selecting the support points does not take into account the differences among multiple model images.

本発明は、このような状況に鑑みてなされたものであり、より確実に画像内の物体を認識することができるようにするものである。 The present invention has been made in view of such a situation, and makes it possible to more reliably recognize an object in an image.

本発明の一側面の情報処理装置は、クエリ画像とモデル画像とを比較し、前記モデル画像の被写体と前記クエリ画像の被写体とを同定する情報処理装置であって、前記モデル画像からＮ個（Ｎは１以上の整数値）の特徴点が抽出され、抽出された前記Ｎ個の特徴点の特徴量がそれぞれ記述された場合、前記Ｎ個の特徴点とそれらの前記特徴量を示す情報が登録されるモデル辞書が自身内部または外部に存在し、前記モデル辞書に登録されている前記モデル画像の前記Ｎ個の特徴点のそれぞれについて、対応する前記特徴量と、前記クエリ画像との相関画像をそれぞれ生成する相関画像生成手段と、前記相関画像生成手段により生成された前記Ｎ個の相関画像のそれぞれについて、対応する前記特徴点の前記モデル画像内の配置位置に応じて、各画素の画素位置をシフトすることで、Ｎ個のシフト相関画像を生成するシフト相関画像生成手段と、前記シフト相関画像生成手段により生成された前記Ｎ個のシフト相関画像の各画素の画素値を加算することで、相関和画像を生成する相関和画像生成手段と、前記相関和画像生成手段により生成された前記相関和画像に基づいて、前記モデル画像の被写体と前記クエリ画像の被写体とが一致するか否かを判定する判定手段とを備える。 An information processing apparatus according to one aspect of the present invention compares a query image with a model image and identifies the subject of the model image with the subject of the query image. When N feature points (N is an integer of 1 or more) have been extracted from the model image and the feature quantity of each of the N extracted feature points has been described, a model dictionary in which information indicating the N feature points and their feature quantities is registered exists inside or outside the apparatus. The apparatus includes: correlation image generation means for generating, for each of the N feature points of the model image registered in the model dictionary, a correlation image between the corresponding feature quantity and the query image; shift correlation image generation means for generating N shift correlation images by shifting, for each of the N correlation images generated by the correlation image generation means, the pixel position of each pixel according to the position of the corresponding feature point in the model image; correlation sum image generation means for generating a correlation sum image by adding the pixel values of the pixels of the N shift correlation images generated by the shift correlation image generation means; and determination means for determining, based on the correlation sum image generated by the correlation sum image generation means, whether the subject of the model image matches the subject of the query image.
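As a rough illustration of the flow described above (correlation images, shifting, summation, determination), the following Python sketch may help. It is not the implementation of this patent: the 3 × 3 luminance patch used as a feature quantity, the similarity measure 1 / (1 + SSD), the reference point `ref`, and the peak threshold in `is_match` are all assumptions made for this example, and every function name is hypothetical.

```python
def patch(img, x, y, r=1):
    """Flattened (2r+1) x (2r+1) luminance patch centred on (x, y), or None near the border."""
    h, w = len(img), len(img[0])
    if x - r < 0 or y - r < 0 or x + r >= w or y + r >= h:
        return None
    return [img[y + dy][x + dx] for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def correlation_image(feature, query, r=1):
    """Similarity of one model feature to every query pixel (assumed measure: 1 / (1 + SSD))."""
    h, w = len(query), len(query[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            p = patch(query, x, y, r)
            if p is not None:
                ssd = sum((a - b) ** 2 for a, b in zip(feature, p))
                out[y][x] = 1.0 / (1.0 + ssd)
    return out


def shift_image(img, dx, dy):
    """Move every pixel by (dx, dy); positions with no source pixel become 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def correlation_sum_image(model, points, query, ref, r=1):
    """Shift each feature's correlation image by (ref - feature position) and add the results."""
    h, w = len(query), len(query[0])
    total = [[0.0] * w for _ in range(h)]
    for px, py in points:
        corr = correlation_image(patch(model, px, py, r), query, r)
        shifted = shift_image(corr, ref[0] - px, ref[1] - py)
        for y in range(h):
            for x in range(w):
                total[y][x] += shifted[y][x]
    return total


def is_match(total, n_points, threshold=0.8):
    """Assumed decision rule: the average peak similarity must reach `threshold`."""
    peak = max(max(row) for row in total)
    return peak / n_points >= threshold


# Toy data: the query is identical to the model, so both peaks align at `ref`.
model = [[(3 * x + 5 * y + x * y) % 11 for x in range(7)] for y in range(7)]
points = [(2, 2), (4, 3)]  # hypothetical feature point positions (x, y)
total = correlation_sum_image(model, points, model, ref=(3, 3))
```

Because each correlation image is shifted by the offset between its feature point and a common reference point, the peaks of all N images pile up at one location when the query contains the model object, which is what makes the sum image usable for the determination.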

前記モデル辞書に登録されている前記モデル画像の１以上の前記特徴点のそれぞれについて、記述された自身の前記特徴量と、自身が抽出された前記モデル画像、および１以上の別モデル画像との相関画像がそれぞれ生成され、それらの相関画像に基づいて、前記モデル画像の前記被写体を識別するための寄与度を示す識別能力値が演算された場合、それらの識別能力値も、対応する前記特徴点を示す情報とともに前記モデル辞書にさらに登録され、前記シフト相関画像生成手段は、さらに、前記Ｎ個のシフト相関画像の各画素の画素値を、前記モデル辞書に登録された前記識別能力値に応じて重み付けをすることで、Ｎ個の重みつきシフト相関画像を生成し、前記相関和画像生成手段は、前記シフト相関画像生成手段により生成された前記Ｎ個の重みつきシフト相関画像の各画素の画素値を加算することで、前記相関和画像を生成する。 For each of the one or more feature points of the model image registered in the model dictionary, when correlation images between the feature quantity described for that feature point and both the model image from which the feature point was extracted and one or more other model images have been generated, and a discrimination capability value indicating the degree to which the feature point contributes to identifying the subject of the model image has been calculated from those correlation images, the discrimination capability values are also registered in the model dictionary together with information indicating the corresponding feature points. The shift correlation image generation means further generates N weighted shift correlation images by weighting the pixel values of the pixels of the N shift correlation images according to the discrimination capability values registered in the model dictionary, and the correlation sum image generation means generates the correlation sum image by adding the pixel values of the pixels of the N weighted shift correlation images generated by the shift correlation image generation means.
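The weighting step can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the shift correlation images are taken as already computed, and the weight is simply the discrimination capability value itself (the text only says the weighting is done "according to" that value).

```python
def weighted_correlation_sum(shifted_images, capability_values):
    """Add N shift correlation images, each scaled by its feature point's
    discrimination capability value (assumed here to be used directly as the weight)."""
    h, w = len(shifted_images[0]), len(shifted_images[0][0])
    total = [[0.0] * w for _ in range(h)]
    for img, weight in zip(shifted_images, capability_values):
        for y in range(h):
            for x in range(w):
                total[y][x] += weight * img[y][x]
    return total


# Two toy 2 x 2 shift correlation images and their (made-up) capability values.
shifted = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
weights = [0.5, 2.0]
weighted_total = weighted_correlation_sum(shifted, weights)
```

With this kind of weighting, a feature point that distinguishes its model image well from the other model images influences the sum more than one that responds similarly to every model.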

前記モデル辞書に登録されている前記モデル画像の１以上の前記特徴点のうちの少なくとも１つがベース点とされ、前記ベース点の一定範囲内に存在する前記特徴点の中から１以上のサポート点が選択された場合、それらのベース点とサポート点を示す情報も前記モデル辞書にさらに登録され、前記相関画像生成手段は、さらに、前記モデル辞書に登録されている前記モデル画像の前記Ｎ個の特徴点のそれぞれについて、対応するmb個（mbは０以上の整数値）のサポート点の前記特徴量と、前記クエリ画像とのmb個のサポート点相関画像をそれぞれ生成し、前記mb個のサポート点相関画像のそれぞれについて、対応する前記サポート点と前記ベース点の前記モデル画像内の配置位置に応じて、各画素の画素位置をシフトすることで、mb個のサポート点シフト相関画像を生成し、前記Ｎ個の相関画像のそれぞれについて、自身および前記mb個のサポート点シフト相関画像の各画素の画素値を加算することで、Ｎ個の相関画像和を生成し、前記シフト相関画像生成手段は、前記相関画像生成手段により生成された前記Ｎ個の相関画像和のそれぞれから、前記Ｎ個のシフト相関画像を生成する。 When at least one of the one or more feature points of the model image registered in the model dictionary is set as a base point and one or more support points are selected from among the feature points existing within a certain range of the base point, information indicating the base points and support points is also registered in the model dictionary. The correlation image generation means further generates, for each of the N feature points of the model image registered in the model dictionary, mb support point correlation images (mb is an integer of 0 or more) between the feature quantities of the corresponding mb support points and the query image; generates mb support point shift correlation images by shifting, for each of the mb support point correlation images, the pixel position of each pixel according to the positions of the corresponding support point and the base point in the model image; and generates N correlation image sums by adding, for each of the N correlation images, the pixel values of the pixels of that correlation image and of the mb support point shift correlation images. The shift correlation image generation means generates the N shift correlation images from the respective N correlation image sums generated by the correlation image generation means.
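The combination of a base point with its support points can be sketched as below. This is an illustrative reading, not the patented implementation: the function names are hypothetical, the correlation images are taken as already computed, and each support point correlation image is assumed to be shifted by the offset from the support point to the base point so that its peak lands on the base point's peak.

```python
def shift_image(img, dx, dy):
    """Move every pixel by (dx, dy); positions with no source pixel become 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def correlation_image_sum(base_corr, base_pos, support_corrs, support_positions):
    """Add the mb support point shift correlation images to the base point's correlation image."""
    total = [row[:] for row in base_corr]
    for corr, (sx, sy) in zip(support_corrs, support_positions):
        shifted = shift_image(corr, base_pos[0] - sx, base_pos[1] - sy)
        for y in range(len(total)):
            for x in range(len(total[0])):
                total[y][x] += shifted[y][x]
    return total


# Toy 5 x 5 example: base point at (2, 2), one support point at (3, 2).
base_corr = [[0.0] * 5 for _ in range(5)]
base_corr[2][2] = 1.0                      # base point responds at x=2, y=2
support_corr = [[0.0] * 5 for _ in range(5)]
support_corr[2][3] = 1.0                   # support point responds at x=3, y=2
summed = correlation_image_sum(base_corr, (2, 2), [support_corr], [(3, 2)])
```

When mb is 0 the loop does nothing and the correlation image sum equals the base point's own correlation image, consistent with mb being allowed to be 0.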

前記モデル辞書に登録されている前記モデル画像の１以上の前記特徴点のそれぞれについて、記述された自身の前記特徴量と、自身が抽出された前記モデル画像、および１以上の別モデル画像との相関画像がそれぞれ生成され、それらの相関画像に基づいて、前記モデル画像の前記被写体を識別するための寄与度を示す識別能力値が演算された場合、それらの識別能力値も、対応する前記特徴点を示す情報とともに前記モデル辞書にさらに登録され、前記モデル辞書に登録されている前記モデル画像の１以上の前記特徴点のうちの少なくとも１つがベース点とされ、前記ベース点の一定範囲内に存在する前記特徴点の中から前記識別能力値が前記ベース点よりも高い前記特徴点がサポート点として選択された場合、それらのベース点とサポート点を示す情報も前記モデル辞書にさらに登録され、前記相関画像生成手段は、さらに、前記モデル辞書に登録されている前記モデル画像の前記Ｎ個の特徴点のそれぞれについて、対応するmb個のサポート点の前記特徴量と、前記クエリ画像とのmb個のサポート点相関画像をそれぞれ生成し、前記mb個のサポート点相関画像のそれぞれについて、対応する前記サポート点と前記ベース点の前記モデル画像内の配置位置に応じて、各画素の画素位置をシフトすることで、mb個のサポート点シフト相関画像を生成し、前記Ｎ個の相関画像のそれぞれについて、自身および前記mb個のサポート点シフト相関画像の各画素の画素値を加算することで、Ｎ個の相関画像和を生成し、前記シフト相関画像生成手段は、前記相関画像生成手段により生成された前記Ｎ個の相関画像和の各画素の画素値を、前記モデル辞書に登録された前記識別能力値に応じて重み付けをすることで、Ｎ個の重みつきシフト相関画像を生成し、前記相関和画像生成手段は、前記シフト相関画像生成手段により生成された前記Ｎ個の重みつきシフト相関画像の各画素の画素値を加算することで、前記相関和画像を生成する。 For each of the one or more feature points of the model image registered in the model dictionary, when correlation images between the feature quantity described for that feature point and both the model image from which the feature point was extracted and one or more other model images have been generated, and a discrimination capability value indicating the degree to which the feature point contributes to identifying the subject of the model image has been calculated from those correlation images, the discrimination capability values are also registered in the model dictionary together with information indicating the corresponding feature points. When at least one of the one or more feature points of the model image registered in the model dictionary is set as a base point and feature points whose discrimination capability value is higher than that of the base point are selected as support points from among the feature points existing within a certain range of the base point, information indicating the base points and support points is also registered in the model dictionary. The correlation image generation means further generates, for each of the N feature points of the model image registered in the model dictionary, mb support point correlation images between the feature quantities of the corresponding mb support points and the query image; generates mb support point shift correlation images by shifting, for each of the mb support point correlation images, the pixel position of each pixel according to the positions of the corresponding support point and the base point in the model image; and generates N correlation image sums by adding, for each of the N correlation images, the pixel values of the pixels of that correlation image and of the mb support point shift correlation images. The shift correlation image generation means generates N weighted shift correlation images by weighting the pixel values of the pixels of the N correlation image sums generated by the correlation image generation means according to the discrimination capability values registered in the model dictionary, and the correlation sum image generation means generates the correlation sum image by adding the pixel values of the pixels of the N weighted shift correlation images generated by the shift correlation image generation means.
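The support point selection criterion stated above (feature points near the base point whose discrimination capability value exceeds that of the base point) can be sketched as follows. The function name, the use of a Euclidean radius for "a certain range", and the sample values are assumptions for illustration only.

```python
import math


def select_support_points(base_index, positions, capability, radius):
    """Return indices of feature points within `radius` of the base point whose
    discrimination capability value is higher than the base point's."""
    bx, by = positions[base_index]
    selected = []
    for i, (x, y) in enumerate(positions):
        if i == base_index:
            continue
        near = math.hypot(x - bx, y - by) <= radius
        if near and capability[i] > capability[base_index]:
            selected.append(i)
    return selected


# Made-up feature point positions (x, y) and discrimination capability values.
positions = [(10, 10), (12, 11), (11, 9), (30, 30)]
capability = [0.4, 0.9, 0.2, 0.95]
support = select_support_points(0, positions, capability, radius=5.0)
```

Here only the point at index 1 qualifies: index 2 is close but less discriminative than the base point, and index 3 is discriminative but too far away.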

本発明の一側面の情報処理方法およびプログラムは、上述した本発明の一側面の情報処理装置に対応する方法およびプログラムである。 An information processing method and program according to one aspect of the present invention are a method and program corresponding to the information processing apparatus according to one aspect of the present invention described above.

本発明の一側面においては、クエリ画像とモデル画像とを比較し、前記モデル画像の被写体と前記クエリ画像の被写体とを同定する情報処理として、次のような処理が実行される。ただし、前記モデル画像からＮ個（Ｎは１以上の整数値）の特徴点が抽出され、抽出された前記Ｎ個の特徴点の特徴量がそれぞれ記述された場合、前記Ｎ個の特徴点とそれらの前記特徴量を示す情報が登録されるモデル辞書が存在する。この場合、前記モデル辞書に登録されている前記モデル画像の前記Ｎ個の特徴点のそれぞれについて、対応する前記特徴量と、前記クエリ画像との相関画像がそれぞれ生成され、生成された前記Ｎ個の相関画像のそれぞれについて、対応する前記特徴点の前記モデル画像内の配置位置に応じて、各画素の画素位置をシフトすることで、Ｎ個のシフト相関画像が生成され、生成された前記Ｎ個のシフト相関画像の各画素の画素値を加算することで、相関和画像が生成され、生成された前記相関和画像に基づいて、前記モデル画像の被写体と前記クエリ画像の被写体とが一致するか否かが判定される。 In one aspect of the present invention, the following processing is executed as information processing for comparing a query image with a model image and identifying the subject of the model image with the subject of the query image. It is assumed that, when N feature points (N is an integer of 1 or more) have been extracted from the model image and the feature quantity of each of the N extracted feature points has been described, a model dictionary exists in which information indicating the N feature points and their feature quantities is registered. In this case, for each of the N feature points of the model image registered in the model dictionary, a correlation image between the corresponding feature quantity and the query image is generated; for each of the N generated correlation images, the pixel position of each pixel is shifted according to the position of the corresponding feature point in the model image, so that N shift correlation images are generated; the pixel values of the pixels of the N generated shift correlation images are added to generate a correlation sum image; and whether the subject of the model image matches the subject of the query image is determined based on the generated correlation sum image.

以上のように、本発明の一側面によれば、画像内の物体を認識することができる。特に、本発明の一側面によれば、より確実に画像内の物体を認識することができる。 As described above, according to one aspect of the present invention, an object in an image can be recognized. In particular, according to one aspect of the present invention, an object in an image can be recognized more reliably.

以下に本発明の実施の形態を説明するが、本発明の構成要件と、明細書又は図面に記載の実施の形態との対応関係を例示すると、次のようになる。この記載は、本発明をサポートする実施の形態が、明細書又は図面に記載されていることを確認するためのものである。したがって、明細書又は図面中には記載されているが、本発明の構成要件に対応する実施の形態として、ここには記載されていない実施の形態があったとしても、そのことは、その実施の形態が、その構成要件に対応するものではないことを意味するものではない。逆に、実施の形態が構成要件に対応するものとしてここに記載されていたとしても、そのことは、その実施の形態が、その構成要件以外の構成要件には対応しないものであることを意味するものでもない。 Embodiments of the present invention will be described below. The correspondence between the constituent elements of the present invention and the embodiments described in the specification or the drawings is exemplified as follows. This description is intended to confirm that embodiments supporting the present invention are described in the specification or the drawings. Therefore, even if an embodiment is described in the specification or the drawings but is not listed here as corresponding to a constituent element of the present invention, that does not mean that the embodiment does not correspond to that constituent element. Conversely, even if an embodiment is listed here as corresponding to a constituent element, that does not mean that the embodiment does not correspond to constituent elements other than that one.

本発明の一側面の情報処理装置は、
クエリ画像（例えば図１２（Ａ）のクエリ画像２２）とモデル画像（例えば図１２（Ｂ）のモデル画像２１）とを比較し、前記モデル画像の被写体と前記クエリ画像の被写体とを同定する情報処理装置において（例えば図１１のクエリ画像認識部１３を有する図１の物体認識装置）、
前記モデル画像からＮ個（Ｎは１以上の整数値）の特徴点が抽出され、抽出された前記Ｎ個の特徴点の特徴量がそれぞれ記述された場合、前記Ｎ個の特徴点とそれらの前記特徴量を示す情報が登録されるモデル辞書（例えば図１１のモデル特徴量辞書１２）が自身内部または外部に存在し、
前記モデル辞書に登録されている前記モデル画像の前記Ｎ個の特徴点のそれぞれについて、対応する前記特徴量と、前記クエリ画像との相関画像をそれぞれ生成する相関画像生成手段（例えば図１１の相関画像生成部５２）と、
前記相関画像生成手段により生成された前記Ｎ個の相関画像のそれぞれについて、対応する前記特徴点の前記モデル画像内の配置位置に応じて、各画素の画素位置をシフトすることで、Ｎ個のシフト相関画像を生成するシフト相関画像生成手段（例えば図１１のシフト相関画像生成部５３）と、
前記シフト相関画像生成手段により生成された前記Ｎ個のシフト相関画像の各画素の画素値を加算することで、相関和画像を生成する相関和画像生成手段（例えば図１１の相関画像和生成部５４）と、
前記相関和画像生成手段により生成された前記相関和画像に基づいて、前記モデル画像の被写体と前記クエリ画像の被写体とが一致するか否かを判定する判定手段（例えば図１１の判定部５５）と
を備える。 An information processing apparatus according to one aspect of the present invention is
an information processing apparatus (for example, the object recognition apparatus of FIG. 1 having the query image recognition unit 13 of FIG. 11) that compares a query image (for example, the query image 22 of FIG. 12A) with a model image (for example, the model image 21 of FIG. 12B) and identifies the subject of the model image with the subject of the query image, wherein
when N feature points (N is an integer of 1 or more) have been extracted from the model image and the feature quantity of each of the N extracted feature points has been described, a model dictionary (for example, the model feature quantity dictionary 12 of FIG. 11) in which information indicating the N feature points and their feature quantities is registered exists inside or outside the apparatus, the apparatus including:
correlation image generation means (for example, the correlation image generation unit 52 of FIG. 11) for generating, for each of the N feature points of the model image registered in the model dictionary, a correlation image between the corresponding feature quantity and the query image;
shift correlation image generation means (for example, the shift correlation image generation unit 53 of FIG. 11) for generating N shift correlation images by shifting, for each of the N correlation images generated by the correlation image generation means, the pixel position of each pixel according to the position of the corresponding feature point in the model image;
correlation sum image generation means (for example, the correlation image sum generation unit 54 of FIG. 11) for generating a correlation sum image by adding the pixel values of the pixels of the N shift correlation images generated by the shift correlation image generation means; and
determination means (for example, the determination unit 55 of FIG. 11) for determining, based on the correlation sum image generated by the correlation sum image generation means, whether the subject of the model image matches the subject of the query image.

前記モデル辞書に登録されている前記モデル画像の１以上の前記特徴点のそれぞれについて、記述された自身の前記特徴量と、自身が抽出された前記モデル画像、および１以上の別モデル画像との相関画像がそれぞれ生成され、それらの相関画像に基づいて、前記モデル画像の前記被写体を識別するための寄与度を示す識別能力値が演算された場合、それらの識別能力値も、対応する前記特徴点を示す情報とともに前記モデル辞書にさらに登録され、
前記シフト相関画像生成手段は、さらに、前記Ｎ個のシフト相関画像の各画素の画素値を、前記モデル辞書に登録された前記識別能力値に応じて重み付けをすることで、Ｎ個の重みつきシフト相関画像を生成し（例えば図１４のステップＳ１４２を実行し）、
前記相関和画像生成手段は、前記シフト相関画像生成手段により生成された前記Ｎ個の重みつきシフト相関画像の各画素の画素値を加算することで、前記相関和画像を生成する。 For each of the one or more feature points of the model image registered in the model dictionary, when correlation images between the feature quantity described for that feature point and both the model image from which the feature point was extracted and one or more other model images have been generated, and a discrimination capability value indicating the degree to which the feature point contributes to identifying the subject of the model image has been calculated from those correlation images, the discrimination capability values are also registered in the model dictionary together with information indicating the corresponding feature points,
the shift correlation image generation means further generates N weighted shift correlation images by weighting the pixel values of the pixels of the N shift correlation images according to the discrimination capability values registered in the model dictionary (for example, executes step S142 of FIG. 14), and
the correlation sum image generation means generates the correlation sum image by adding the pixel values of the pixels of the N weighted shift correlation images generated by the shift correlation image generation means.

前記モデル辞書に登録されている前記モデル画像の１以上の前記特徴点のうちの少なくとも１つがベース点とされ、前記ベース点の一定範囲内に存在する前記特徴点の中から１以上のサポート点が選択された場合、それらのベース点とサポート点を示す情報も前記モデル辞書にさらに登録され、
前記相関画像生成手段は、さらに、
前記モデル辞書に登録されている前記モデル画像の前記Ｎ個の特徴点のそれぞれについて、対応するmb個のサポート点の前記特徴量と、前記クエリ画像とのmb個のサポート点相関画像をそれぞれ生成し、前記mb個のサポート点相関画像のそれぞれについて、対応する前記サポート点と前記ベース点の前記モデル画像内の配置位置に応じて、各画素の画素位置をシフトすることで、mb個のサポート点シフト相関画像を生成し（例えば図１５のＳ１５１−２を実行し）、
前記Ｎ個の相関画像のそれぞれについて、自身および前記mb個のサポート点シフト相関画像の各画素の画素値を加算することで、Ｎ個の相関画像和を生成し（例えば図１５のＳ１５１−３を実行し）、
前記シフト相関画像生成手段は、
前記相関画像生成手段により生成された前記Ｎ個の相関画像和のそれぞれから、前記Ｎ個のシフト相関画像を生成する（例えば図１５のＳ１５２を実行する）。 When at least one of the one or more feature points of the model image registered in the model dictionary is set as a base point and one or more support points are selected from among the feature points existing within a certain range of the base point, information indicating the base points and support points is also registered in the model dictionary,
the correlation image generation means further
generates, for each of the N feature points of the model image registered in the model dictionary, mb support point correlation images between the feature quantities of the corresponding mb support points and the query image, and generates mb support point shift correlation images by shifting, for each of the mb support point correlation images, the pixel position of each pixel according to the positions of the corresponding support point and the base point in the model image (for example, executes S151-2 of FIG. 15), and
generates N correlation image sums by adding, for each of the N correlation images, the pixel values of the pixels of that correlation image and of the mb support point shift correlation images (for example, executes S151-3 of FIG. 15), and
the shift correlation image generation means
generates the N shift correlation images from the respective N correlation image sums generated by the correlation image generation means (for example, executes S152 of FIG. 15).

前記モデル辞書に登録されている前記モデル画像の１以上の前記特徴点のそれぞれについて、記述された自身の前記特徴量と、自身が抽出された前記モデル画像、および１以上の別モデル画像との相関画像がそれぞれ生成され、それらの相関画像に基づいて、前記モデル画像の前記被写体を識別するための寄与度を示す識別能力値が演算された場合、それらの識別能力値も、対応する前記特徴点を示す情報とともに前記モデル辞書にさらに登録され、
前記モデル辞書に登録されている前記モデル画像の１以上の前記特徴点のうちの少なくとも１つがベース点とされ、前記ベース点の一定範囲内に存在する前記特徴点の中から前記識別能力値が前記ベース点よりも高い前記特徴点がサポート点として選択された場合、それらのベース点とサポート点を示す情報も前記モデル辞書にさらに登録され、
前記相関画像生成手段は、さらに、
前記モデル辞書に登録されている前記モデル画像の前記Ｎ個の特徴点のそれぞれについて、対応するmb個のサポート点の前記特徴量と、前記クエリ画像とのmb個のサポート点相関画像をそれぞれ生成し、前記mb個のサポート点相関画像のそれぞれについて、対応する前記サポート点と前記ベース点の前記モデル画像内の配置位置に応じて、各画素の画素位置をシフトすることで、mb個のサポート点シフト相関画像を生成し（例えば図１６のＳ１６１−２を実行し）、
前記Ｎ個の相関画像のそれぞれについて、自身および前記mb個のサポート点シフト相関画像の各画素の画素値を加算することで、Ｎ個の相関画像和を生成し（例えば図１６のＳ１６１−３を実行し）、
前記シフト相関画像生成手段は、
前記相関画像生成手段により生成された前記Ｎ個の相関画像和の各画素の画素値を、前記モデル辞書に登録された前記識別能力値に応じて重み付けをすることで、Ｎ個の重みつきシフト相関画像を生成し（例えば図１６のＳ１６２を実行し）、
前記相関和画像生成手段は、前記シフト相関画像生成手段により生成された前記Ｎ個の重みつきシフト相関画像の各画素の画素値を加算することで、前記相関和画像を生成する（例えば図１６のＳ１６３を実行する）。 For each of the one or more feature points of the model image registered in the model dictionary, when correlation images between the feature quantity described for that feature point and both the model image from which the feature point was extracted and one or more other model images have been generated, and a discrimination capability value indicating the degree to which the feature point contributes to identifying the subject of the model image has been calculated from those correlation images, the discrimination capability values are also registered in the model dictionary together with information indicating the corresponding feature points,
when at least one of the one or more feature points of the model image registered in the model dictionary is set as a base point and feature points whose discrimination capability value is higher than that of the base point are selected as support points from among the feature points existing within a certain range of the base point, information indicating the base points and support points is also registered in the model dictionary,
the correlation image generation means further
generates, for each of the N feature points of the model image registered in the model dictionary, mb support point correlation images between the feature quantities of the corresponding mb support points and the query image, and generates mb support point shift correlation images by shifting, for each of the mb support point correlation images, the pixel position of each pixel according to the positions of the corresponding support point and the base point in the model image (for example, executes S161-2 of FIG. 16), and
generates N correlation image sums by adding, for each of the N correlation images, the pixel values of the pixels of that correlation image and of the mb support point shift correlation images (for example, executes S161-3 of FIG. 16),
the shift correlation image generation means
generates N weighted shift correlation images by weighting the pixel values of the pixels of the N correlation image sums generated by the correlation image generation means according to the discrimination capability values registered in the model dictionary (for example, executes S162 of FIG. 16), and
the correlation sum image generation means generates the correlation sum image by adding the pixel values of the pixels of the N weighted shift correlation images generated by the shift correlation image generation means (for example, executes S163 of FIG. 16).

本発明の一側面の情報処理方法およびプログラムは、上述した本発明の一側面の情報処理装置に対応する方法およびプログラムである。詳細については後述するが、このプログラムは、例えば、図１７のリムーバブルメディア２１１や、記憶部２０８に含まれるハードディスク等の記録媒体に記録され、図１７の構成のコンピュータにより実行される。 An information processing method and program according to one aspect of the present invention are a method and program corresponding to the information processing apparatus according to one aspect of the present invention described above. Although details will be described later, this program is recorded on, for example, a removable medium 211 in FIG. 17 or a recording medium such as a hard disk included in the storage unit 208, and is executed by the computer having the configuration in FIG.

その他、本発明の一側面としては、上述した本発明の一側面のプログラムを記録した記録媒体も含まれる。 In addition, as one aspect of the present invention, a recording medium on which the program according to one aspect of the present invention described above is recorded is also included.

以下、図面を参照しながら本発明の実施の形態について説明する。 Hereinafter, embodiments of the present invention will be described with reference to the drawings.

図１は、本発明の一実施の形態である物体認識装置の機能の構成を示すブロック図である。 FIG. 1 is a block diagram showing a functional configuration of an object recognition apparatus according to an embodiment of the present invention.

図１において、物体認識装置は、モデル特徴量抽出部１１、モデル特徴量辞書１２、およびクエリ画像認識部１３から構成される。 In FIG. 1, the object recognition apparatus includes a model feature quantity extraction unit 11, a model feature quantity dictionary 12, and a query image recognition unit 13.

モデル特徴量抽出部１１は、物体の認識において、認識の対象の物体をそれぞれ含むモデル画像２１−１乃至２１−Ｎ（Ｎは１以上の整数値）から、モデル特徴量をそれぞれ抽出して、モデル特徴量辞書１２に登録する。 In the recognition of an object, the model feature amount extraction unit 11 extracts model feature amounts from model images 21-1 to 21-N (N is an integer value of 1 or more) each including an object to be recognized, It is registered in the model feature dictionary 12.

なお、モデル画像２１−１乃至２１−Ｎは、静止画像そのものまたは動画像のフレーム画像とされる。 The model images 21-1 to 21-N are still images themselves or frame images of moving images.

クエリ画像認識部１３は、モデル画像２１−１乃至２１−Ｎに含まれる各物体と比較され、認識される物体を含むクエリ画像２２から、クエリ特徴量を抽出し、モデル特徴量辞書１２に登録されているモデル特徴量とのマッチングを行い、そのマッチングの結果に基づいて、モデル画像２１−１乃至２１−Ｎ内の各物体とクエリ画像２２内の物体との同定を試みる。 The query image recognition unit 13 extracts query feature quantities from the query image 22, which contains an object to be recognized by comparison with the objects contained in the model images 21-1 to 21-N, matches them against the model feature quantities registered in the model feature quantity dictionary 12, and, based on the matching result, attempts to identify the object in the query image 22 with each object in the model images 21-1 to 21-N.

なお、クエリ画像２２は、モデル画像２１−１乃至２１−Ｎと同様に、静止画像そのものまたは動画像のフレーム画像とされる。 The query image 22 is a still image itself or a frame image of a moving image, like the model images 21-1 to 21-N.

以下、モデル特徴量抽出部１１とクエリ画像認識部１３とのそれぞれの詳細について、その順番で個別に説明していく。 Hereinafter, the details of the model feature quantity extraction unit 11 and the query image recognition unit 13 will be described individually in that order.

なお、以下、モデル画像２１−１乃至２１−Ｎを個々に区別する必要がない場合、換言すると、モデル画像２１−１乃至２１−Ｎのうちの１つについて言及する場合、単にモデル画像２１と称する。 Hereinafter, when the model images 21-1 to 21-N need not be distinguished from one another, in other words, when referring to any one of the model images 21-1 to 21-N, it is simply called the model image 21.

図２は、モデル特徴量抽出部１１の機能の詳細な構成を示すブロック図である。 FIG. 2 is a block diagram showing a detailed configuration of the function of the model feature quantity extraction unit 11.

モデル特徴量抽出部１１は、特徴点抽出部３１、特徴量記述部３２、特徴点識別能力値演算部３３、サポート点選択部３４、および、モデル特徴量情報生成部３５を含むように構成される。 The model feature quantity extraction unit 11 is configured to include a feature point extraction unit 31, a feature quantity description unit 32, a feature point discrimination capability value calculation unit 33, a support point selection unit 34, and a model feature quantity information generation unit 35.

特徴点抽出部３１は、モデル画像２１から特徴点を抽出し、その抽出結果を特徴量記述部３２とモデル特徴量情報生成部３５とに提供する。 The feature point extraction unit 31 extracts feature points from the model image 21 and provides the extraction result to the feature amount description unit 32 and the model feature amount information generation unit 35.

なお、特徴点抽出部３１が採用する特徴点抽出手法自体は特に限定されない。 Note that the feature point extraction method itself employed by the feature point extraction unit 31 is not particularly limited.

具体的には例えば、図３は、Harrisコーナーディテクタ等を使用した特徴点抽出手法が採用された場合の特徴点の抽出結果を示している。図３の○（白丸印）が特徴点を示している。かかる手法では、図３に示されるように、コーナ点が特徴点として抽出される。 Specifically, for example, FIG. 3 shows a feature point extraction result when a feature point extraction method using a Harris corner detector or the like is employed. The circles (white circles) in FIG. 3 indicate feature points. In this method, as shown in FIG. 3, corner points are extracted as feature points.

また例えば、図４は、Cannyエッジディテクタ等を使用した特徴点抽出手法が採用された場合の特徴点の抽出結果を示している。図４の○（白丸印）が特徴点を示している。かかる手法では、図４に示されるように、エッジ点が特徴点として抽出される。 Further, for example, FIG. 4 shows a feature point extraction result when a feature point extraction method using a Canny edge detector or the like is employed. The circles (white circles) in FIG. 4 indicate feature points. In this method, as shown in FIG. 4, edge points are extracted as feature points.

特徴量記述部３２は、特徴点抽出部３１によって抽出された各特徴点の周辺で、局所特徴量記述を行う処理をそれぞれ行い、各処理結果を特徴点識別能力値演算部３３とモデル特徴量情報生成部３５とに提供する。 The feature quantity description unit 32 describes a local feature quantity around each feature point extracted by the feature point extraction unit 31, and provides the results to the feature point discrimination capability value calculation unit 33 and the model feature quantity information generation unit 35.

なお、特徴量記述部３２が採用する局所特徴量記述手法自体は特に限定されない。 Note that the local feature description method itself employed by the feature description unit 32 is not particularly limited.

例えば、画素値の輝度勾配等を利用した、局所特徴量のベクトル記述を行う手法を採用できる。 For example, it is possible to employ a method of performing a vector description of local feature amounts using a luminance gradient of pixel values.

具体的には例えば、図５に示されるように、特徴点周辺で、５ｘ５画素の範囲の輝度勾配をベクトル記述する場合は、各画素の輝度勾配の、ｘ成分、ｙ成分をそれぞれ次元とし、(Vx(0,0),Vy(0,0), Vx(0,1), Vy(0,1),…Vx(4,4), Vy(4,4))で５０次元ベクトルを構成する、といった手法を採用することができる。 Specifically, for example, as shown in FIG. 5, when the luminance gradients in a 5 × 5 pixel range around a feature point are described as a vector, the x component and the y component of the luminance gradient of each pixel can each be treated as one dimension, so that (Vx(0,0), Vy(0,0), Vx(0,1), Vy(0,1), ..., Vx(4,4), Vy(4,4)) forms a 50-dimensional vector.
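The 50-dimensional gradient description above can be sketched as follows. This is only an illustrative sketch, not the method of the specification: the central-difference gradient, the border clamping, the row-by-row ordering of the components, and the name `gradient_descriptor` are all assumptions made for the example.

```python
def gradient_descriptor(image, cy, cx, size=5):
    """Flatten the (Vx, Vy) luminance gradients of a size x size patch
    centred on pixel (cy, cx) into one vector of 2 * size * size values."""
    half = size // 2
    h, w = len(image), len(image[0])

    def px(y, x):
        # Assumption: coordinates are clamped so border patches stay defined.
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    vec = []
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            vx = (px(y, x + 1) - px(y, x - 1)) / 2.0  # central difference in x
            vy = (px(y + 1, x) - px(y - 1, x)) / 2.0  # central difference in y
            vec.extend([vx, vy])
    return vec


# A horizontal ramp: luminance grows by 1 per column, so Vx = 1 and Vy = 0.
ramp = [[float(x) for x in range(9)] for _ in range(9)]
desc = gradient_descriptor(ramp, 4, 4)
print(len(desc))  # 50
```

With a 5 × 5 patch the vector has 25 × 2 = 50 components, matching the dimension count given in the text.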

また、別の手法としては、例えば、輝度勾配ベクトルの方向別にヒストグラムを取った記述手法等も採用できる。例えば特徴点周辺の輝度勾配ベクトルの方向を、１０度ごとにヒストグラムをとった場合３６次元ベクトルとなる。 As another method, for example, a description method that takes a histogram of the luminance gradient vectors by direction can also be employed. For example, when the directions of the luminance gradient vectors around a feature point are binned into a histogram every 10 degrees, a 36-dimensional vector is obtained.
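A minimal sketch of the direction histogram is shown below. The specification does not say whether the bins hold plain counts or magnitude-weighted votes; plain counts are assumed here, and `orientation_histogram` is a hypothetical helper name.

```python
import math


def orientation_histogram(gradients, bin_deg=10):
    """Count gradient directions in bins of bin_deg degrees.
    gradients is an iterable of (vx, vy) pairs; 10-degree bins give 36 values."""
    bins = [0] * (360 // bin_deg)
    for vx, vy in gradients:
        if vx == 0 and vy == 0:
            continue  # a zero gradient has no direction
        angle = math.degrees(math.atan2(vy, vx)) % 360.0
        bins[int(angle // bin_deg) % len(bins)] += 1
    return bins


hist = orientation_histogram([(1, 0), (0, 1), (-1, 0), (1, 1)])
print(len(hist))  # 36
```

Here 360 / 10 = 36 bins, which is where the 36-dimensional vector in the text comes from.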

また例えば、輝度情報をそのまま特徴量とするといった手法等も採用できる。たとえば特徴点周辺の５ｘ５画素の範囲で、輝度情報をそのままベクトル記述する場合は、２５次元ベクトルとなる。 Further, for example, a method of using the luminance information as the feature amount as it is can also be employed. For example, when the luminance information in a 5 × 5 pixel range around a feature point is described directly as a vector, a 25-dimensional vector is obtained.

さらにまた、上述した各種記述手法を組み合わせてもよい。 Furthermore, the various description methods described above may be combined.

特徴点識別能力値演算部３３は、特徴点抽出部３１によって抽出された各特徴点（特徴量記述部３２により特徴量記述化された各特徴点）のそれぞれについて、識別能力値を演算し、それらの各演算結果をサポート点選択部３４とモデル特徴量情報生成部３５に提供する。 The feature point identification capability value calculation unit 33 calculates a discrimination ability value for each feature point extracted by the feature point extraction unit 31 (each feature point whose feature amount has been described by the feature amount description unit 32), and provides the calculation results to the support point selection unit 34 and the model feature amount information generation unit 35.

ここで、識別能力値とは、モデル画像２１に含まれる被写体、即ち認識対象の物体を、他の物体（他のモデル画像に含まれる物体等）と区別して識別する場合において、その特徴点がその識別にどの程度寄与しているのか、即ち、その特徴点がその識別においてどの程度影響を及ぼしているのか、といった被写体を識別するための特徴点の能力（モデル識別の能力）を示す値をいう。 Here, the discrimination ability value is a value indicating the ability of a feature point to identify the subject (the model identification ability). When the subject included in the model image 21, that is, the object to be recognized, is distinguished from other objects (such as objects included in other model images), the value indicates how much the feature point contributes to that identification, that is, how much influence the feature point has on it.

図６は、識別能力値が算出されるまでの一連の処理を説明するフローチャートである。 FIG. 6 is a flowchart for explaining a series of processes until the discrimination ability value is calculated.

なお、以下、図６の記載に併せて、モデル画像２１−１から抽出された各特徴点についての識別能力値が演算される場合の処理について説明をする。ただし、実際には、モデル画像２１−１のみならず、別のモデル画像２１−２乃至２１−Ｎから抽出された各特徴点のそれぞれについても、以下の説明と同様の処理が施されて、識別能力値がそれぞれ演算される。 In the following, in conjunction with the description of FIG. 6, processing in the case where the discrimination ability value for each feature point extracted from the model image 21-1 is calculated will be described. However, in practice, not only the model image 21-1, but also each feature point extracted from the other model images 21-2 to 21-N is subjected to the same processing as described below. Discrimination ability values are respectively calculated.

図６のステップＳ１００において、モデル特徴量抽出部１１は、全モデル画像２１−１乃至２１−Ｎを取得する。 In step S100 of FIG. 6, the model feature amount extraction unit 11 acquires all model images 21-1 to 21-N.

ステップＳ１０２において、特徴点抽出部３１は、上述したように、モデル画像２１−１から１以上の特徴点を抽出する。ステップＳ１０３において、特徴量記述部３２は、上述したように、モデル画像２１−１から抽出された各特徴点について特徴量記述をそれぞれ行う。 In step S102, the feature point extraction unit 31 extracts one or more feature points from the model image 21-1, as described above. In step S103, as described above, the feature quantity description unit 32 performs feature quantity description for each feature point extracted from the model image 21-1.

このようなステップＳ１０２とＳ１０３の処理と並行して、ステップＳ１０４において、特徴点識別能力値演算部３３は、モデル画像２１−１乃至２１−Ｎのそれぞれから、特徴量画像４１−１乃至４１−Ｎをそれぞれ生成する。 In parallel with the processing of steps S102 and S103, in step S104, the feature point identification capability value calculation unit 33 generates feature amount images 41-1 to 41-N from the model images 21-1 to 21-N, respectively.

ここで、特徴量画像４１−Ｋ（Ｋは、１乃至Ｎのうちの何れかの整数値）とは、モデル画像２１−Ｋの全画素を対象として、特徴量記述部３２に採用された局所特徴量記述手法と同一手法に従って特徴量記述がそれぞれ行われた場合、その記述結果、即ち、各特徴量を各画素値として構成された画像をいう。 Here, the feature amount image 41-K (K is an integer from 1 to N) is the image obtained when a feature amount is described for every pixel of the model image 21-K by the same local feature amount description method as that employed by the feature amount description unit 32; that is, an image whose pixel values are the respective feature amounts.
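The idea of a feature amount image can be sketched as below. For brevity the sketch assumes the raw-luminance description with a 3 × 3 patch rather than 5 × 5, and clamps coordinates at the image border; `patch_descriptor` and `feature_image` are hypothetical names, not terms from the specification.

```python
def patch_descriptor(image, cy, cx, size=3):
    """Raw luminance of a size x size patch around (cy, cx), borders clamped."""
    half = size // 2
    h, w = len(image), len(image[0])
    return [image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]


def feature_image(image, describe=patch_descriptor):
    """Apply the same descriptor to every pixel, so that each 'pixel value'
    of the result is a feature vector."""
    return [[describe(image, y, x) for x in range(len(image[0]))]
            for y in range(len(image))]


model = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]
feat = feature_image(model)
print(feat[1][1])  # the 9-element vector stored at pixel (1, 1)
```

The important point is that the descriptor used here must be the same one used for the feature points themselves, so that a feature point and a pixel of the feature amount image can be compared directly.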

ステップＳ１０５において、特徴点識別能力値演算部３３は、モデル画像２１−１の各特徴点（ステップＳ１０２の処理で抽出されて、ステップＳ１０３の処理で特徴量記述化された各特徴点）のうちの、識別能力を演算したいP個（Pは、ステップＳ１０２の処理で抽出された個数以下の整数値）の特徴点についてそれぞれ、相関画像を生成する。 In step S105, the feature point identification capability value calculation unit 33 generates correlation images for each of P feature points whose discrimination ability is to be calculated (P is an integer not greater than the number of feature points extracted in step S102), selected from the feature points of the model image 21-1 (the feature points extracted in step S102 and described in step S103).

ここで、相関画像４２−ＫＬ（Ｋは、上述の特徴量画像４１−ＫのＫと同一値。Ｌは、１乃至Ｐのうちのいずれかの値）とは、次のような画像をいう。即ち、識別能力を演算したいP個の特徴点に１乃至Pの番号を付したとする。そして、そのうちの処理の対象として注目すべき番号Ｌの特徴点を、注目特徴点Ｌと称するとする。この場合、注目特徴点Ｌの特徴量と、特徴量画像４１−Ｋを構成する各画素値（即ち、各特徴量）とのマッチングがそれぞれ行われ、それぞれ相関（距離）値が求められたときに、それらの各相関値を各画素値として構成された画像が、相関画像４２−ＫＬとなる。このとき、相関値としては、例えば、ベクトル同士の正規化相関、距離としてはユークリッド距離等の尺度を採用することができる。 Here, the correlation image 42-KL (K is the same value as K of the feature amount image 41-K described above, and L is any value from 1 to P) refers to the following image. Suppose that the P feature points whose discrimination ability is to be calculated are numbered 1 to P, and that the feature point of number L currently being processed is called the feature point of interest L. The feature amount of the feature point of interest L is matched against each pixel value (that is, each feature amount) of the feature amount image 41-K to obtain a correlation (distance) value for each pixel, and the image whose pixel values are these correlation values is the correlation image 42-KL. As the correlation value, for example, the normalized correlation between vectors can be used; as the distance, a measure such as the Euclidean distance can be used.

即ち、注目特徴点Ｌに対して、Ｎ枚の特徴量画像４１−１，４１−２，・・・，４１−Ｎのそれぞれの各画素との相関を示すＮ枚の相関画像４２−１Ｌ，４２−２Ｌ，・・・，４２−ＮＬが生成される。 That is, for the feature point of interest L, N correlation images 42-1L, 42-2L, ..., 42-NL are generated, which indicate its correlation with each pixel of the N feature amount images 41-1, 41-2, ..., 41-N, respectively.

換言すると、１つの特徴量画像４１−Ｋに対しては、番号１乃至Pがそれぞれ付されたP個の各特徴点毎に１枚ずつの相関画像、即ち、P枚の相関画像４２−Ｋ１，４２−Ｋ２，・・・，４２−ＫＰが生成される。 In other words, for one feature amount image 41-K, one correlation image is generated for each of the P feature points numbered 1 to P, that is, P correlation images 42-K1, 42-K2, ..., 42-KP.
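The generation of a single correlation image can be sketched as follows. The sketch assumes a normalized correlation without mean subtraction (a cosine-style measure), which is only one possible reading of "normalized correlation"; the function names and the tiny 2 × 2 feature amount image are illustrative.

```python
import math


def normalized_correlation(a, b):
    """Normalized correlation of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def correlation_image(feature, feat_img):
    """Match one feature vector against every pixel of a feature amount image;
    each pixel of the result is the correlation value at that position."""
    return [[normalized_correlation(feature, f) for f in row] for row in feat_img]


# A 2 x 2 feature amount image whose pixels hold 2-dimensional features.
feat_img = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 1.0], [2.0, 0.0]]]
corr = correlation_image([1.0, 0.0], feat_img)
print(corr)
```

Calling `correlation_image` once per feature amount image gives the N correlation images for one feature point of interest, and repeating that for all P feature points gives N × P correlation images in total.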

ステップＳ１０６において、特徴点識別能力値演算部３３は、番号１乃至Pが付されたP個の各特徴点毎に、全相関画像の平均または最大値から識別能力値をそれぞれ演算する。即ち、特徴点識別能力値演算部３３は、この平均または最大値の低いものから順に、モデル識別が高いものとして、識別能力値を与えていく。なお、全相関画像とは、注目特徴点Ｌに対して生成された相関画像の全て、即ち、Ｎ枚の相関画像４２−１Ｌ，４２−２Ｌ，・・・，４２−ＮＬをいう。 In step S106, the feature point identification capability value calculation unit 33 calculates, for each of the P feature points numbered 1 to P, a discrimination ability value from the average or maximum value of all of its correlation images. That is, the unit assigns discrimination ability values so that feature points with a lower average or maximum value are treated as having a higher model identification ability. Here, "all correlation images" means all of the correlation images generated for the feature point of interest L, that is, the N correlation images 42-1L, 42-2L, ..., 42-NL.
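Step S106 can be sketched as below. The specification only states that a lower average or maximum corresponds to a higher discrimination ability; how the statistic is mapped to a numeric value is not given, so the rank-based mapping used here (the lowest statistic receives the value P) is purely an assumption, as are the function names.

```python
def summary_statistic(corr_images, use_max=False):
    """Average (or maximum) over every pixel of every correlation image."""
    values = [v for img in corr_images for row in img for v in row]
    return max(values) if use_max else sum(values) / len(values)


def discrimination_values(all_corr, use_max=False):
    """all_corr[l] holds the N correlation images of feature point l.
    Assumption: values are ranks, so the lowest statistic gets the highest value."""
    stats = [summary_statistic(c, use_max) for c in all_corr]
    order = sorted(range(len(stats)), key=lambda l: stats[l])
    values = [0] * len(stats)
    for rank, l in enumerate(order):
        values[l] = len(stats) - rank
    return values


# Three feature points, each with one 1 x 2 correlation image (N = 1).
all_corr = [[[[0.9, 0.8]]], [[[0.1, 0.3]]], [[[0.5, 0.5]]]]
print(discrimination_values(all_corr))  # [1, 3, 2]
```

The second feature point correlates least with the model images overall, so it is the hardest to confuse with other content and receives the highest value.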

例えば、図７や図８には、識別能力値を画像化したものが示されている。ここで、識別能力値が高い特徴点ほど、明るく（白色に）なっている。即ち、図７は、カエルの形状を有する物体（以下、カエルと略称する）を含む画像がモデル画像２１−１とされた場合の識別能力値の例を示している。図７に示されるように、カエルの目の付近が、識別能力値が高い、即ち、カエルであることを識別するために重要な部分であることがわかる。一方、図８は、犬の形状を有する物体（以下、犬と略記する）を含む画像がモデル画像２１−１とされた場合の識別能力値の例を示している。図８に示されるように、犬の尾の付近が、識別能力値が高い、即ち、犬であることを識別するために重要な部分であることがわかる。 For example, FIGS. 7 and 8 show discrimination ability values rendered as images. Here, feature points with higher discrimination ability values appear brighter (whiter). FIG. 7 shows an example of the discrimination ability values when an image containing a frog-shaped object (hereinafter simply referred to as a frog) is used as the model image 21-1. As shown in FIG. 7, the area around the frog's eyes has high discrimination ability values, that is, it is an important part for identifying the object as a frog. FIG. 8, on the other hand, shows an example of the discrimination ability values when an image containing a dog-shaped object (hereinafter simply referred to as a dog) is used as the model image 21-1. As shown in FIG. 8, the area around the dog's tail has high discrimination ability values, that is, it is an important part for identifying the object as a dog.

なお、図示はしないが、特徴点識別能力値演算部３３は、図６のステップＳ１０６の処理後、例えば、Ｐ個の各特徴点の番号を、識別能力値の高い順に並び替える処理を実行してもよい。即ち、かかる処理後のＰ個の各特徴点の番号とは、モデル識別に重要な順番を示すことになる。 Although not illustrated, after the processing of step S106 in FIG. 6, the feature point identification capability value calculation unit 33 may, for example, renumber the P feature points in descending order of discrimination ability value. After such processing, the numbers of the P feature points indicate their order of importance for model identification.

図２に戻り、サポート点選択部３４は、特徴点識別能力値演算部３３により算出された識別能力値を利用してサポート点を選択する。 Returning to FIG. 2, the support point selection unit 34 selects a support point using the discrimination capability value calculated by the feature point discrimination capability value calculation unit 33.

ここで、サポート点とは、次のような点をいう。即ち、特徴点抽出部３１により抽出された特徴点の中から基準となる点として選択された点を、以下、ベース点と称する。この場合、ベース点以外の特徴点であってベース点に従属して決定される点を、サポート点と称する。 Here, the support points refer to the following points. That is, the point selected as a reference point from the feature points extracted by the feature point extraction unit 31 is hereinafter referred to as a base point. In this case, a feature point other than the base point that is determined depending on the base point is referred to as a support point.

サポート点の決定手法自体は特に限定されず、例えば本実施の形態では、モデル画像２１において、ベース点の配置位置から一定範囲内にある特徴点のうちの、識別能力値がベース点よりも高い値を有する特徴点を、サポート点として選択する、といった手法が採用されているとする。かかる手法を採用した場合には、１つのベース点に対して、複数のサポート点が選択される場合もある。図９は、かかる手法に従ったサポート点選択部３４の処理（以下、サポート点選択処理と称する）の一例を説明するフローチャートである。 The support point determination method itself is not particularly limited. For example, in the present embodiment, in the model image 21, among the feature points within a certain range from the base point arrangement position, the discrimination ability value is higher than the base point. Assume that a method of selecting a feature point having a value as a support point is employed. When such a method is adopted, a plurality of support points may be selected for one base point. FIG. 9 is a flowchart for explaining an example of processing of the support point selection unit 34 (hereinafter referred to as support point selection processing) according to such a method.

図９のステップＳ１２１において、サポート点選択部３４は、モデル画像２１におけるＰ個の各特徴点の識別能力値を取得する。 In step S121 of FIG. 9, the support point selection unit 34 acquires the discrimination ability value of each of the P feature points in the model image 21.

ステップＳ１２２において、サポート点選択部３４は、Ｐ個の特徴点から１以上のベース点を選択する。なお、ベース点の選択手法自体は特に限定されない。 In step S122, the support point selection unit 34 selects one or more base points from the P feature points. The base point selection method itself is not particularly limited.

ステップＳ１２３において、サポート点選択部３４は、１以上のベース点のうちの所定の１つを処理対象として、その処理対象のベース点の位置から一定範囲内にある他の特徴点を抽出する。 In step S123, the support point selection unit 34 sets a predetermined one of the one or more base points as a processing target, and extracts other feature points within a certain range from the position of the base point to be processed.

ステップＳ１２４において、サポート点選択部３４は、抽出された特徴点の識別能力値が、ベース点の識別能力値より高いか否かを判定する。 In step S124, the support point selection unit 34 determines whether or not the extracted feature point identification capability value is higher than the base point identification capability value.

ここで、ステップＳ１２３の処理で、１つの特徴点も抽出されない場合がある。かかる場合には、ステップＳ１２４の処理でＮＯであると強制的に判定されて、処理はステップＳ１２６に進むとする。なお、ステップＳ１２６以降の処理については後述する。 Here, there are cases where no feature point at all is extracted in the process of step S123. In such a case, the determination in step S124 is forcibly made NO, and the process proceeds to step S126. The processing from step S126 onward will be described later.

逆に、ステップＳ１２３の処理で、複数の特徴点が抽出される場合がある。かかる場合には、複数の特徴点のうちの所定の１つがステップＳ１２４の処理対象となり、その処理対象の特徴点の識別能力値が、ベース点の識別能力値より高いか否かが判定される。 Conversely, a plurality of feature points may be extracted in the process of step S123. In such a case, a predetermined one of the plurality of feature points becomes the processing target of step S124, and it is determined whether the discrimination ability value of that feature point is higher than that of the base point.

サポート点選択部３４は、ステップＳ１２４において、抽出された特徴点の識別能力値がベース点の識別能力値より高いと判定した場合、ステップＳ１２５において、抽出された特徴点（複数の特徴点が抽出されている場合には、処理対象の特徴点）をサポート点として選択する。これにより、処理はステップＳ１２６に進む。 When the support point selection unit 34 determines in step S124 that the discrimination ability value of the extracted feature point is higher than that of the base point, it selects, in step S125, the extracted feature point (the feature point being processed, when a plurality of feature points have been extracted) as a support point. The process then proceeds to step S126.

これに対して、ステップＳ１２４において、抽出された特徴点の識別能力値がベース点の識別能力値より低いと判定された場合、ステップＳ１２５の処理は実行されずに、即ち、抽出された特徴点（複数の特徴点が抽出されている場合には、処理対象の特徴点）はサポート点として選択されずに、処理はステップＳ１２６に進む。 On the other hand, when it is determined in step S124 that the discrimination ability value of the extracted feature point is lower than that of the base point, the process of step S125 is not executed; that is, the extracted feature point (the feature point being processed, when a plurality of feature points have been extracted) is not selected as a support point, and the process proceeds to step S126.

ステップＳ１２６において、サポート点選択部３４は、他に抽出された特徴点があるか否かを判定する。 In step S126, the support point selection unit 34 determines whether there are other extracted feature points.

即ち、上述したように、ステップＳ１２３の処理で複数の特徴点が抽出された場合には、ステップＳ１２６の処理でＹＥＳであると判定され、処理はステップＳ１２４に戻され、それ以降の処理が繰り返される。即ち、複数の特徴点のそれぞれが順次処理対象となり、ステップＳ１２４，Ｓ１２５，Ｓ１２６のループ処理が繰り返し実行される。その結果、複数の特徴点のうちの、ベース点よりも識別能力値が高い特徴点のみがサポート点として選択されることになる。複数の特徴点の全てについて上述のループ処理が実行されると、最後のループ処理のステップＳ１２６においてＮＯであると判定されて、処理はステップＳ１２７に進む。 That is, as described above, when a plurality of feature points are extracted in the process of step S123, it is determined YES in step S126, the process returns to step S124, and the subsequent processing is repeated. In other words, each of the plurality of feature points becomes the processing target in turn, and the loop of steps S124, S125, and S126 is executed repeatedly. As a result, among the plurality of feature points, only those having a higher discrimination ability value than the base point are selected as support points. When the loop has been executed for all of the plurality of feature points, it is determined NO in step S126 of the last iteration, and the process proceeds to step S127.

また、ステップＳ１２３の処理で１つの特徴点のみが抽出された場合または１つの特徴点も抽出されなかった場合におけるステップＳ１２６の処理では、直ちにＮＯであると判定され、処理はステップＳ１２７に進む。 Further, when only one feature point, or no feature point, was extracted in the process of step S123, it is immediately determined NO in the process of step S126, and the process proceeds to step S127.

ステップＳ１２７において、サポート点選択部３４は、他にベース点があるか否かを判定する。 In step S127, the support point selection unit 34 determines whether there is another base point.

まだ、処理対象となっていないベース点が存在する場合には、ステップＳ１２７の処理でＹＥＳであると判定されて、処理はステップＳ１２３に戻されてそれ以降の処理が繰り返される。 If there is still a base point that is not subject to processing, it is determined YES in step S127, the process returns to step S123, and the subsequent processing is repeated.

このようにして、１以上のベース点のそれぞれについて、サポート点が０以上選択されると、ステップＳ１２７の処理でＮＯであると判定されて、サポート点選択処理は終了する。 In this manner, when 0 or more support points are selected for each of one or more base points, it is determined NO in the process of step S127, and the support point selection process ends.
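The support point selection process of FIG. 9 can be sketched as follows. The specification leaves the "certain range" unspecified; a Euclidean radius is assumed here, and `select_support_points` and its parameters are hypothetical names.

```python
import math


def select_support_points(points, values, base_indices, radius):
    """points: list of (x, y) feature point positions.
    values: discrimination ability value of each feature point.
    Returns {base point index: [support point indices]}."""
    result = {}
    for b in base_indices:  # repeat for every base point (S127)
        bx, by = points[b]
        # S123: other feature points within the range around the base point.
        nearby = [i for i, (x, y) in enumerate(points)
                  if i != b and math.hypot(x - bx, y - by) <= radius]
        # S124 to S126: keep only points whose value exceeds the base point's.
        result[b] = [i for i in nearby if values[i] > values[b]]
    return result


points = [(0, 0), (1, 0), (0, 2), (10, 10)]
values = [2, 5, 1, 9]
print(select_support_points(points, values, base_indices=[0, 3], radius=3))
# {0: [1], 3: []}
```

Base point 0 has two neighbours in range but only one with a higher value, while base point 3 has no neighbour in range at all, which corresponds to the case where zero support points are selected.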

具体的には例えば、図１０（Ａ）乃至（Ｃ）のそれぞれには、ベース点とサポート点の選択結果が示されている。即ち、同一のモデル画像２１から、３つのベース点が選択されており、それぞれのベース点が○（白丸）として、図１０（Ａ）乃至（Ｃ）のそれぞれに示されている。そして、それらの３つのベース点に対して選択された複数のサポート点が、図１０（Ａ）乃至（Ｃ）に、ベース点を示す○（白丸）よりも小さな●（黒丸）としてそれぞれ示されている。 Specifically, for example, FIGS. 10A to 10C each show a selection result of a base point and its support points. That is, three base points are selected from the same model image 21, and each base point is shown as a ○ (white circle) in FIGS. 10A to 10C, respectively. The plurality of support points selected for each of these three base points are shown in FIGS. 10A to 10C as ● (black circles) smaller than the ○ (white circle) indicating the base point.

図２に戻り、モデル特徴量情報生成部３５は、以上説明した特徴点抽出部３１乃至サポート点選択部３４の各種処理結果を示すモデル特徴量情報（ベース点＋サポート点）を生成し、モデル特徴量辞書１２に登録する。即ち、モデル特徴量情報とは、モデル画像２１−１乃至２１−Ｎのそれぞれについて、抽出された各特徴点に関する情報をいう。具体的には例えば、それらの特徴点がベース点とサポート点との別に区別されて、それぞれについての局所特徴量や識別能力値、また、サポート点情報等から構成される情報が、モデル特徴量情報である。 Returning to FIG. 2, the model feature amount information generation unit 35 generates model feature amount information (base points + support points) indicating the various processing results of the feature point extraction unit 31 through the support point selection unit 34 described above, and registers it in the model feature amount dictionary 12. That is, the model feature amount information is information on each feature point extracted from each of the model images 21-1 to 21-N. Specifically, for example, the feature points are distinguished as base points or support points, and the model feature amount information consists of the local feature amount and discrimination ability value of each, together with support point information and the like.

以上、図１の物体認識装置のうちのモデル特徴量抽出部１１の詳細について説明した。以下、クエリ画像認識部１３の詳細について説明する。 The details of the model feature quantity extraction unit 11 in the object recognition apparatus of FIG. 1 have been described above. Details of the query image recognition unit 13 will be described below.

図１１は、クエリ画像認識部１３の機能の詳細な構成を示すブロック図である。 FIG. 11 is a block diagram illustrating a detailed configuration of functions of the query image recognition unit 13.

クエリ画像認識部１３は、特徴画像生成部５１、相関画像生成部５２、シフト相関画像生成部５３、相関画像和生成部５４、および判定部５５を含むように構成される。 The query image recognition unit 13 is configured to include a feature image generation unit 51, a correlation image generation unit 52, a shift correlation image generation unit 53, a correlation image sum generation unit 54, and a determination unit 55.

特徴画像生成部５１は、認識させたい物体を含むクエリ画像２２が入力されると、そのクエリ画像２２から特徴量画像を生成する。即ち、上述した図６のステップＳ１０４と同様の処理がクエリ画像２２に施されることになる。 When a query image 22 including an object to be recognized is input, the feature image generation unit 51 generates a feature amount image from the query image 22. That is, the processing similar to step S104 in FIG. 6 described above is performed on the query image 22.

相関画像生成部５２は、クエリ画像２２の特徴量画像の各画素値（即ち、各画素の特徴量）と、モデル特徴量辞書１２に記録された各モデル画像２１−１乃至２１−Ｎの各特徴点（以下、各モデル特徴点と称する）の特徴量とのマッチングを行い、それぞれの相関（距離）値を各画素値として構成する画像、即ち、相関画像を生成する。 The correlation image generation unit 52 matches each pixel value of the feature amount image of the query image 22 (that is, the feature amount of each pixel) against the feature amount of each feature point (hereinafter referred to as a model feature point) of the model images 21-1 to 21-N recorded in the model feature amount dictionary 12, and generates an image whose pixel values are the resulting correlation (distance) values, that is, a correlation image.

シフト相関画像生成部５３は、各モデル特徴点の位置に応じて、それぞれ対応する相関画像の各画素位置をシフトさせた画像（以下、シフト相関画像と称する）を生成する。なお、シフト相関画像の生成手法については、図１２乃至図１６を参照して後述する。 The shift correlation image generation unit 53 generates an image (hereinafter referred to as a shift correlation image) obtained by shifting each pixel position of the corresponding correlation image according to the position of each model feature point. A method for generating a shift correlation image will be described later with reference to FIGS.

相関画像和生成部５４は、モデル画像２１−１乃至２１−Ｎ毎に、各モデル特徴点の各シフト相関画像、若しくは、それらに対して各種画像処理を施した後の各画像の和を取った画像（以下、相関和画像と称する）を生成する。即ち、相関和画像とは、２以上の画像の各画素値の総和を、それぞれの画素値として構成される画像をいう。 For each of the model images 21-1 to 21-N, the correlation image sum generation unit 54 generates an image (hereinafter referred to as a correlation sum image) by summing the shift correlation images of the model feature points, or the images obtained by applying various kinds of image processing to them. That is, a correlation sum image is an image whose pixel values are the sums of the corresponding pixel values of two or more images.

なお、相関和画像の生成手法（シフト相関画像に対して施される各種画像処理の例含む）の具体例については、図１２乃至図１６を参照して後述する。 A specific example of the correlation sum image generation method (including examples of various image processes performed on the shift correlation image) will be described later with reference to FIGS.

判定部５５は、モデル画像２１−１乃至２１−Ｎのそれぞれに対して生成された各相関和画像に基づいて、モデル画像２１−１乃至２１−Ｎに含まれる各物体がクエリ画像２２に含まれている物体と同一であるか否かを判定し、その判定結果を出力する。 Based on the correlation sum images generated for the model images 21-1 to 21-N, the determination unit 55 determines whether each object included in the model images 21-1 to 21-N is the same as the object included in the query image 22, and outputs the determination result.

即ち、所定のモデル画像２１−Ｋについての相関和画像のうちの、シフト相関画像の生成時のシフト位置（後述する例では中央付近の位置）の画素値が、相関和画像のローカルピークとなる。そして、かかるローカルピークが、モデル画像２１−Ｋに含まれる物体が、クエリ画像２２においてどの程度の割合で存在するのかを示す存在推定度を表すことになる。よって、判定部５５は、モデル画像２１−Ｋの相関和画像のローカルピークが閾値以上の場合、モデル画像２１−Ｋに含まれる物体がクエリ画像２２に含まれる画像と一致すると判定すること、即ち、その物体を認識することができる。 That is, in the correlation sum image for a given model image 21-K, the pixel value at the shift position used when generating the shift correlation images (a position near the center in the examples described later) becomes a local peak of the correlation sum image. This local peak represents an estimated degree of presence, indicating to what extent the object included in the model image 21-K is present in the query image 22. Therefore, when the local peak of the correlation sum image of the model image 21-K is equal to or greater than a threshold, the determination unit 55 can determine that the object included in the model image 21-K matches the object included in the query image 22, that is, it can recognize the object.
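The threshold decision made by the determination unit 55 can be sketched as below. As a simplification, the sketch uses the global maximum of each correlation sum image as a stand-in for the local peak; that substitution, the model identifiers, and the threshold value are all assumptions made for illustration.

```python
def recognize(sum_images, threshold):
    """sum_images maps a model identifier to its correlation sum image.
    Returns the identifiers whose peak value reaches the threshold."""
    hits = []
    for model_id, img in sum_images.items():
        peak = max(v for row in img for v in row)  # stand-in for the local peak
        if peak >= threshold:
            hits.append(model_id)
    return hits


sum_images = {
    "frog": [[0.2, 3.6], [0.1, 0.4]],
    "dog": [[0.3, 0.9], [1.1, 0.2]],
}
print(recognize(sum_images, threshold=3.0))  # ['frog']
```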

以下、図１２乃至図１６を参照して、クエリ画像認識部１３の動作のうちの、主に相関画像生成部５２乃至相関画像和生成部５４の動作について説明する。 Hereinafter, the operations of the correlation image generation unit 52 to the correlation image sum generation unit 54 among the operations of the query image recognition unit 13 will be described with reference to FIGS.

即ち、図１３乃至図１６は、図１２（Ａ）の画像がクエリ画像２２として入力された場合における、図１２（Ｂ）のモデル画像２１との相関和画像が生成されるまでの各種処理結果の具体例を示している。 That is, FIGS. 13 to 16 show specific examples of the various processing results obtained up to the generation of the correlation sum image with the model image 21 of FIG. 12B, when the image of FIG. 12A is input as the query image 22.

図１３の例では、モデル画像２１の特徴量情報として、４つのベース点b1乃至b4の特徴量のみが利用されて、相関和画像が生成される。換言すると、図１３の例では、後述する他の例のように、サポート点の情報や、識別能力値は一切利用されない。なお、ベース点b1乃至b4は例示にしか過ぎず、ベース点の位置や個数は図１３の例に限定されず任意であることは言うまでもない。 In the example of FIG. 13, only the feature amounts of the four base points b1 to b4 are used as the feature amount information of the model image 21 to generate the correlation sum image. In other words, unlike the other examples described later, the example of FIG. 13 uses neither the support point information nor the discrimination ability values. Note that the base points b1 to b4 are merely examples, and it goes without saying that the positions and number of base points are arbitrary and not limited to the example of FIG. 13.

図１３のステップＳ１３１において、相関画像生成部５２は、クエリ画像２２の特徴量画像の各画素値（即ち、各画素の特徴量）と、モデル画像２１のベース点b1乃至b4の各特徴量とのマッチングを行うことで、図１３のＳ１３１の枠内に示されるような４つの相関画像を生成する。 In step S131 of FIG. 13, the correlation image generation unit 52 determines each pixel value of the feature amount image of the query image 22 (that is, a feature amount of each pixel) and each feature amount of the base points b1 to b4 of the model image 21. By performing this matching, four correlation images as shown in the frame of S131 of FIG. 13 are generated.

ステップＳ１３２において、シフト相関画像生成部５３は、各ベース点b1乃至b4の位置に応じて、それぞれ対応する相関画像の各画素位置をシフトさせることで、図１３のＳ１３２の枠内に示されるような４つのシフト相関画像を生成する。 In step S132, the shift correlation image generation unit 53 shifts each pixel position of the corresponding correlation image in accordance with the positions of the base points b1 to b4, as shown in the frame of S132 in FIG. Four shift correlation images are generated.

図１３の例のシフト相関画像は、モデル画像２１におけるベース点bn（nは整数値であって、図１３の例ではnは１乃至４のうちの何れかの値）の存在位置（相関画像の対応画素位置）が、画像の中央位置にシフトするように、相関画像の各画素位置がシフトされた結果得られる画像となっている。 The shift correlation image in the example of FIG. 13 is the image obtained by shifting each pixel position of the correlation image so that the location of the base point bn in the model image 21 (the corresponding pixel position in the correlation image) moves to the center position of the image. Here, n is an integer; in the example of FIG. 13, n is any value from 1 to 4.

ステップＳ１３３において、相関画像和生成部５４は、これらの４つのシフト相関画像を単純に足し合わせることで、図１３のＳ１３３の枠内に示されるような相関和画像を生成する。なお、「足し合わせるとは」、上述の如く、各画素毎に、各画素値を足し合わせることを意味する。このことは、以下の説明でも同様である。 In step S133, the correlation image sum generation unit 54 simply adds these four shift correlation images to generate a correlation sum image as shown in the frame of S133 in FIG. Note that “adding” means adding the pixel values for each pixel as described above. The same applies to the following description.
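Steps S132 and S133 can be sketched as follows. The zero fill for pixels shifted in from outside the image, the (row, column) coordinate convention, and the function names are assumptions; the specification does not state how image borders are handled.

```python
def shift_image(img, dy, dx, fill=0.0):
    """Move every pixel by (dy, dx); positions with no source pixel get fill."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def correlation_sum(corr_images, base_points):
    """Shift each correlation image so that its base point (row, col) lands on
    the image centre (S132), then add the shifted images pixel by pixel (S133)."""
    h, w = len(corr_images[0]), len(corr_images[0][0])
    cy, cx = h // 2, w // 2
    total = [[0.0] * w for _ in range(h)]
    for img, (by, bx) in zip(corr_images, base_points):
        shifted = shift_image(img, cy - by, cx - bx)
        for y in range(h):
            for x in range(w):
                total[y][x] += shifted[y][x]
    return total


# Two base points at opposite corners, each matching the query at its own position.
corr1 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
corr2 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
total = correlation_sum([corr1, corr2], [(0, 0), (2, 2)])
print(total[1][1])  # 2.0
```

Because both base points match the query at the positions they occupy in the model image, their responses pile up at the centre after shifting, which is what produces the local peak used for the decision.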

このような図１３の例に対して、図１４の例では、モデル画像２１の特徴量情報として、４つのベース点b1乃至b4の特徴量に加えて、それらの識別能力値に基づく重み値α１乃至α４が利用されて、相関和画像が生成される。 In contrast to the example of FIG. 13, in the example of FIG. 14, the weight values α1 to α4 based on the discrimination ability values of the four base points b1 to b4 are used in addition to their feature amounts as the feature amount information of the model image 21 to generate the correlation sum image.

即ち、ステップＳ１４１において、相関画像生成部５２は、クエリ画像２２の特徴量画像の各画素値（即ち、各画素の特徴量）と、モデル画像２１のベース点b1乃至b4の各特徴量とのマッチングを行うことで、図１４のＳ１４１の枠内に示されるような４つの相関画像を生成する。 That is, in step S141, the correlation image generation unit 52 matches each pixel value of the feature amount image of the query image 22 (that is, the feature amount of each pixel) against the feature amounts of the base points b1 to b4 of the model image 21, thereby generating four correlation images as shown in the frame of S141 in FIG. 14.

なお、図１４のＳ１４１の枠内に示される４つの相関画像とは、図１３のＳ１３１の枠内に示される４つの相関画像と同一である。即ち、ステップＳ１４１の処理とステップＳ１３１の処理とは同様の処理である。 Note that the four correlation images shown in the frame of S141 in FIG. 14 are the same as the four correlation images shown in the frame of S131 in FIG. That is, the process of step S141 and the process of step S131 are the same process.

ステップＳ１４２の処理では、シフト相関画像の生成処理が実行される。ただし、ステップＳ１４２の処理は、図１３のステップＳ１３２の処理とは異なる。 In the process of step S142, a shift correlation image generation process is executed. However, the process of step S142 is different from the process of step S132 of FIG.

即ち、ステップＳ１４２−１において、シフト相関画像生成部５３は、各ベース点b1乃至b4の位置に応じて、それぞれ対応する相関画像の各画素位置をシフトさせることで、図１４のＳ１４２−１の点線枠内に示されるような４つのシフト相関画像を生成する。 That is, in step S142-1, the shift correlation image generation unit 53 shifts each pixel position of the corresponding correlation image in accordance with the position of each base point b1 to b4, so that the process of S142-1 in FIG. Four shift correlation images as shown in the dotted frame are generated.

なお、図１４のＳ１４２−１の点線枠内に示される４つのシフト相関画像とは、図１３のＳ１３２の枠内に示される４つのシフト相関画像と同一である。即ち、ステップＳ１４２−１の処理とは、図１３のステップＳ１３２の処理と同様の処理である。 Note that the four shift correlation images shown in the dotted frame in S142-1 in FIG. 14 are the same as the four shift correlation images shown in the frame in S132 in FIG. That is, the process of step S142-1 is the same process as the process of step S132 of FIG.

換言すると、ステップＳ１４２の処理とは、図１３のステップＳ１３２（＝ステップＳ１４２−１）の処理に加えて、さらに次のようなステップＳ１４２−２の処理が付加された処理であるといえる。なお、ステップＳ１４２−２の処理で最終的に得られるシフト相関画像と、ステップＳ１４２−１の処理の結果得られるシフト相関画像とを個々に区別すべく、以下、前者を重みつきシフト相関画像と称し、後者を単純シフト相関画像と称する。 In other words, the process of step S142 can be said to be the process of step S132 of FIG. 13 (= step S142-1) with the following process of step S142-2 added. In order to distinguish the shift correlation image finally obtained by the process of step S142-2 from the shift correlation image obtained by the process of step S142-1, the former is hereinafter referred to as a weighted shift correlation image and the latter as a simple shift correlation image.

即ち、ステップＳ１４２−１の処理では、図１４のＳ１４２−１の点線枠内に示される４つの単純シフト相関画像が生成される。そこで、ステップＳ１４２−２において、シフト相関画像生成部５３は、各ベース点b1乃至b4のそれぞれに対応する各単純シフト相関画像の各画素値に対して、各ベース点b1乃至b4の識別能力値に基づく重み値α1乃至α4をそれぞれ掛けることで、識別能力値に応じた重み付けがなされた各画素値により構成される画像、即ち、図１４のＳ１４２−２の点線枠内に示されるような４つの重みつきシフト相関画像を生成する。 That is, in the process of step S142-1, the four simple shift correlation images shown in the dotted frame of S142-1 in FIG. 14 are generated. Then, in step S142-2, the shift correlation image generation unit 53 multiplies each pixel value of the simple shift correlation image corresponding to each of the base points b1 to b4 by the corresponding weight value α1 to α4 based on the discrimination ability value of that base point, thereby generating images composed of pixel values weighted according to the discrimination ability values, that is, the four weighted shift correlation images shown in the dotted frame of S142-2 in FIG. 14.

ステップＳ１４３において、相関画像和生成部５４は、これらの４つの重みつきシフト相関画像を単純に足し合わせることで、図１４のＳ１４３の枠内に示されるような相関和画像を生成する。 In step S143, the correlation image sum generation unit 54 generates a correlation sum image as shown in the frame of S143 in FIG. 14 by simply adding these four weighted shift correlation images.
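The weighting of step S142-2 followed by the summation of step S143 can be sketched as below. The specification does not state how the weight values α1 to α4 are derived from the discrimination ability values, so the weights here are simply given as inputs; `weighted_sum` is a hypothetical name.

```python
def weighted_sum(shifted_images, weights):
    """Multiply each simple shift correlation image by its weight and add the
    results pixel by pixel, giving the correlation sum image."""
    h, w = len(shifted_images[0]), len(shifted_images[0][0])
    return [[sum(a * img[y][x] for img, a in zip(shifted_images, weights))
             for x in range(w)]
            for y in range(h)]


# Two 1 x 2 simple shift correlation images with weights alpha1 = 2.0, alpha2 = 0.5.
images = [[[1.0, 0.5]], [[0.2, 0.4]]]
result = weighted_sum(images, [2.0, 0.5])
print(result)
```

Setting every weight to 1.0 reduces this to the plain summation of the FIG. 13 example.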

このような図１３，図１４の例に対して、図１５の例では、モデル画像２１の特徴量情報として、４つのベース点b1乃至b4の特徴量に加えてさらに、各ベース点b1乃至b4のサポート点の情報が利用されて、相関和画像が生成される。ただし、図１５の例では、図１４の例のように、識別能力値に基づく重み値α１乃至α４は利用されない。 In contrast to the examples of FIGS. 13 and 14, in the example of FIG. 15, the information on the support points of each of the base points b1 to b4 is used in addition to the feature amounts of the four base points b1 to b4 as the feature amount information of the model image 21 to generate the correlation sum image. However, unlike the example of FIG. 14, the example of FIG. 15 does not use the weight values α1 to α4 based on the discrimination ability values.

ステップＳ１５１の処理では、相関画像の生成処理が実行される。ただし、ステップＳ１５１の処理は、図１３のステップＳ１３１や図１４のステップＳ１４１の処理とは異なる。 In the process of step S151, a correlation image generation process is executed. However, the process of step S151 is different from the process of step S131 of FIG. 13 and step S141 of FIG.

即ち、ステップＳ１５１−１において、相関画像生成部５２は、クエリ画像２２の特徴量画像の各画素値（即ち、各画素の特徴量）と、モデル画像２１のベース点b1乃至b4の各特徴量とのマッチングを行うことで、図１５のＳ１５１−１の枠内に示されるような４つの相関画像を生成する。 That is, in step S151-1, the correlation image generation unit 52 matches each pixel value of the feature amount image of the query image 22 (that is, the feature amount of each pixel) against the feature amounts of the base points b1 to b4 of the model image 21, thereby generating four correlation images as shown in the frame of S151-1 in FIG. 15.

なお、図１５のＳ１５１−１の枠内に示される４つの相関画像とは、図１３のＳ１３１の枠内に示される４つの相関画像と同一、即ち図１４のＳ１４１の枠内に示される４つの相関画像と同一である。即ち、ステップＳ１５１−１の処理とは、図１３のステップＳ１３１や図１４のステップＳ１４１の処理と同様の処理である。 Note that the four correlation images shown in the frame of S151-1 in FIG. 15 are identical to the four correlation images shown in the frame of S131 in FIG. 13, and hence to the four correlation images shown in the frame of S141 in FIG. 14. That is, the process of step S151-1 is the same as the process of step S131 of FIG. 13 and step S141 of FIG. 14.

換言すると、ステップＳ１５１の処理とは、図１３のステップＳ１３１（＝図１４のステップＳ１４１＝図１５のステップＳ１５１−１）の処理に加えて、さらに次のようなステップＳ１５１−２，Ｓ１５１−３の処理が付加された処理であるといえる。なお、以下、各ステップＳ１５１−１乃至Ｓ１５１−３の各処理の結果得られる相関画像を個々に区別すべく、ステップＳ１５１−１の処理の結果得られる相関画像をベース点相関画像と称し、ステップＳ１５１−２の処理の結果得られる相関画像をサポート点シフト相関画像と称し、ステップＳ１５１−３の処理の結果得られる相関画像を、ベース点bnを中心としたサポート点シフト相関画像和と称する。 In other words, the process of step S151 can be said to be the process of step S131 in FIG. 13 (= step S141 in FIG. 14 = step S151-1 in FIG. 15) with the following steps S151-2 and S151-3 added. Hereinafter, to distinguish the correlation images obtained in steps S151-1 to S151-3 from one another, the correlation image obtained in step S151-1 is referred to as a base point correlation image, the correlation image obtained in step S151-2 as a support point shift correlation image, and the correlation image obtained in step S151-3 as a support point shift correlation image sum centered on the base point bn.

即ち、ステップＳ１５１−１の処理では、図１５のＳ１５１−１の点線枠内に示される４つのベース点相関画像が生成される。 That is, in the process of step S151-1, four base point correlation images shown in the dotted line frame of S151-1 of FIG. 15 are generated.
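The generation of the base point correlation images in step S151-1 can be sketched as follows. This passage does not specify the matching measure, so the sketch assumes, purely for illustration, that each feature amount is a D-dimensional vector and that matching is the normalized correlation (cosine similarity) between vectors. The function names are hypothetical.

```python
import numpy as np

def correlation_image(feature_image, feature):
    """Per-pixel similarity between an (H, W, D) feature image and a (D,) feature."""
    feature_image = np.asarray(feature_image, dtype=float)
    feature = np.asarray(feature, dtype=float)
    dot = feature_image @ feature                      # (H, W) dot products
    norms = np.linalg.norm(feature_image, axis=-1) * np.linalg.norm(feature)
    corr = np.zeros_like(dot)
    np.divide(dot, norms, out=corr, where=norms > 0)   # stays 0 where undefined
    return corr

def base_point_correlation_images(feature_image, base_features):
    """One correlation image per base point feature (four in the example of FIG. 15)."""
    return [correlation_image(feature_image, f) for f in base_features]
```

Any other similarity measure could be substituted in `correlation_image` without changing the later shifting and summation steps.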

ステップＳ１５１−２において、相関画像生成部５２は、モデル画像２１のベース点bnについて、クエリ画像２２の特徴量画像の各画素値（即ち、各画素の特徴量）と、ベース点bnにおけるサポート点snm（mは、１以上の整数値）の各特徴量とのマッチングをそれぞれ行うことで、m個の相関画像を生成する。さらに、相関画像生成部５２は、サポート点snmの存在位置（相関画像の対応画素位置）を、ベース点bnの存在位置（相関画像の対応画素位置）にシフトすることで、ベース点b1乃至b4のそれぞれについて、図１５のＳ１５１−２の枠内に示されるようなm個のサポート点シフト相関画像をそれぞれ生成する。 In step S151-2, for the base point bn of the model image 21, the correlation image generation unit 52 matches each pixel value of the feature amount image of the query image 22 (that is, the feature amount of each pixel) against the feature amount of each support point snm of the base point bn (m is an integer of 1 or more), thereby generating m correlation images. The correlation image generation unit 52 then shifts each of these images so that the position of the support point snm (the corresponding pixel position in the correlation image) moves to the position of the base point bn (the corresponding pixel position in the correlation image). In this way, m support point shift correlation images such as those shown in the frame of S151-2 in FIG. 15 are generated for each of the base points b1 to b4.

即ち、ベース点b1には、2個のサポート点s11，s12が存在する。よって、サポート点s11についてのサポート点シフト相関画像と、サポート点s12についてのサポート点シフト相関画像が生成される。 That is, there are two support points s11 and s12 at the base point b1. Therefore, a support point shift correlation image for the support point s11 and a support point shift correlation image for the support point s12 are generated.

以下同様に、ベース点b2には、3個のサポート点s21，s22，s23が存在する。よって、サポート点s21についてのサポート点シフト相関画像、サポート点s22についてのサポート点シフト相関画像、および、サポート点s23についてのサポート点シフト相関画像が生成される。 Similarly, there are three support points s21, s22, and s23 at the base point b2. Therefore, a support point shift correlation image for the support point s21, a support point shift correlation image for the support point s22, and a support point shift correlation image for the support point s23 are generated.

ベース点b3には、2個のサポート点s31，s32が存在する。よって、サポート点s31についてのサポート点シフト相関画像と、サポート点s32についてのサポート点シフト相関画像が生成される。 There are two support points s31 and s32 at the base point b3. Therefore, a support point shift correlation image for the support point s31 and a support point shift correlation image for the support point s32 are generated.

ベース点b4には、1個のサポート点s41が存在する。よって、サポート点s41についてのサポート点シフト相関画像が生成される。 There is one support point s41 at the base point b4. Therefore, a support point shift correlation image for the support point s41 is generated.

ステップＳ１５１−３において、相関画像生成部５２は、モデル画像２１のベース点bnについて、対応するベース点相関画像（ステップＳ１５１−１の処理の結果得られる画像）と、対応するm個のサポート点シフト相関画像（ステップＳ１５１−２の処理の結果得られる画像）を単純に足し合わせることで、図１５のＳ１５１−３の枠内に示されるような、ベース点bnを中心としたサポート点シフト相関画像和を生成する。 In step S151-3, for the base point bn of the model image 21, the correlation image generation unit 52 simply adds together the corresponding base point correlation image (the image obtained in step S151-1) and the corresponding m support point shift correlation images (the images obtained in step S151-2), thereby generating the support point shift correlation image sum centered on the base point bn, as shown in the frame of S151-3 in FIG. 15.

即ち、ベース点b1については、ベース点b1についてのベース点相関画像、並びに、サポート点s11についてのサポート点シフト相関画像およびサポート点s12についてのサポート点シフト相関画像が足し合わされ、その結果、ベース点b1を中心としたサポート点シフト相関画像和が生成される。 That is, for the base point b1, the base point correlation image for the base point b1, and the support point shift correlation image for the support point s11 and the support point shift correlation image for the support point s12 are added together. A support point shift correlation image sum centered on b1 is generated.

以下同様に、ベース点b2については、ベース点b2についてのベース点相関画像、並びに、サポート点s21についてのサポート点シフト相関画像、サポート点s22についてのサポート点シフト相関画像、および、サポート点s23についてのサポート点シフト相関画像が足し合わされ、その結果、ベース点b2を中心としたサポート点シフト相関画像和が生成される。 Similarly, for the base point b2, the base point correlation image for the base point b2, the support point shift correlation image for the support point s21, the support point shift correlation image for the support point s22, and the support point s23. These support point shift correlation images are added together, and as a result, a support point shift correlation image sum centered on the base point b2 is generated.

ベース点b3については、ベース点b3についてのベース点相関画像、並びに、サポート点s31についてのサポート点シフト相関画像およびサポート点s32についてのサポート点シフト相関画像が足し合わされ、その結果、ベース点b3を中心としたサポート点シフト相関画像和が生成される。 For the base point b3, the base point correlation image for the base point b3, the support point shift correlation image for the support point s31, and the support point shift correlation image for the support point s32 are added together. As a result, a support point shift correlation image sum centered on the base point b3 is generated.

ベース点b4については、ベース点b4についてのベース点相関画像、並びに、サポート点s41についてのサポート点シフト相関画像が足し合わされ、その結果、ベース点b4を中心としたサポート点シフト相関画像和が生成される。 For the base point b4, the base point correlation image for the base point b4 and the support point shift correlation image for the support point s41 are added together. As a result, a support point shift correlation image sum centered on the base point b4 is generated.
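Steps S151-2 and S151-3 for one base point can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, coordinates are (row, column) pixel positions in the model image, and pixels shifted in from outside the image are taken to be 0, which this passage does not specify.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Move image content by (dy, dx) pixels; vacated pixels become 0."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # everything is shifted out of the frame
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def support_point_shift_sum(base_corr, base_point, support_corrs, support_points):
    """Base point correlation image plus every support point correlation image,
    each shifted so that its support point position lands on the base point."""
    by, bx = base_point
    total = np.asarray(base_corr, dtype=float).copy()
    for corr, (sy, sx) in zip(support_corrs, support_points):
        total += shift_image(corr, by - sy, bx - sx)   # step S151-2
    return total                                       # step S151-3
```

For the base point b4 of the example, `support_corrs` and `support_points` would each hold a single entry (for s41); for b2 they would hold three.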

その後のステップＳ１５２，Ｓ１５３の処理は、図１３のステップＳ１３２，Ｓ１３３の処理と基本的に同様の処理が実行される。ただし、図１３のステップＳ１３２の処理対象は、図１５のステップＳ１５１−１の処理結果であるベース点相関画像となっていた。これに対して、図１５のステップＳ１５２の処理対象は、上述の如く、図１５のステップＳ１５１−１の処理結果であるベース点相関画像に対して、ステップＳ１５１−２の処理結果であるサポート点シフト相関画像が足し合わされた結果得られる画像、即ち、ベース点を中心としたサポート点シフト相関画像和である。 Subsequent processes in steps S152 and S153 are basically the same as the processes in steps S132 and S133 in FIG. However, the processing target in step S132 in FIG. 13 is the base point correlation image that is the processing result in step S151-1 in FIG. On the other hand, as described above, the processing target of step S152 in FIG. 15 is the support point that is the processing result of step S151-2 with respect to the base point correlation image that is the processing result of step S151-1 of FIG. This is an image obtained as a result of adding the shift correlation images, that is, a support point shift correlation image sum centered on the base point.

図１６の例は、図１４の例と図１５の例とを組み合わせた例である。即ち、図１６の例では、モデル画像２１の特徴量情報として、４つのベース点b1乃至b4の特徴量に加えて、それらの識別能力値に基づく重み値α１乃至α４と、各ベース点b1乃至b4のサポート点の情報との両者が利用されて、相関和画像が生成される。 The example of FIG. 16 combines the example of FIG. 14 with the example of FIG. 15. That is, in the example of FIG. 16, the correlation sum image is generated using, as the feature amount information of the model image 21, the feature amounts of the four base points b1 to b4 together with both the weight values α1 to α4 based on their discrimination capability values and the information on the support points of each of the base points b1 to b4.

換言すると、図１６のステップＳ１６１の処理が、図１５のステップＳ１５１の処理と同様の処理である。即ち、図１６のステップＳ１６１−１乃至Ｓ１６１−３のそれぞれが、図１５のステップＳ１５１−１乃至Ｓ１５１−３のそれぞれと同様の処理である。 In other words, the process in step S161 in FIG. 16 is the same as the process in step S151 in FIG. That is, each of steps S161-1 to S161-3 in FIG. 16 is the same processing as each of steps S151-1 to S151-3 in FIG.

一方、図１６のステップＳ１６２の処理が、図１４のステップＳ１４２の処理と同様の処理である。即ち、図１６のステップＳ１６２−１，Ｓ１６２−２のそれぞれが、図１４のステップＳ１４２−１，Ｓ１４２−２のそれぞれと同様の処理である。 On the other hand, the process of step S162 in FIG. 16 is the same as the process of step S142 in FIG. 14. That is, steps S162-1 and S162-2 in FIG. 16 are the same processes as steps S142-1 and S142-2 in FIG. 14, respectively.

式で表すと、図１６のステップＳ１６１の処理結果は、次の式（１）のように表される。 Expressed as an expression, the processing result of step S161 in FIG. 16 is expressed as the following expression (1).

[Expression (1): equation image not reproduced in this text]

式（１）の左辺のSumSpCor_bn(x,y)が、ベース点bnを中心としたサポート点シフト相関画像和の座標（x,y）における画素値を示している。なお、nは、図１６の例では１乃至4のうちの何れかの値とされているが、任意の整数値に一般化できることはいうまでもない。 SumSpCor_bn(x, y) on the left side of Expression (1) indicates the pixel value at the coordinates (x, y) of the support point shift correlation image sum centered on the base point bn. In the example of FIG. 16, n takes one of the values 1 to 4, but it can of course be generalized to an arbitrary integer.

また、式（１）の右辺において、Cor_snm(x,y)が、サポート点snmの相関画像の(x,y)における画素値を示している。m_bnは、ベース点bnにおけるサポート点の数を示している。即ち、図１６の例では、m_b1＝２，m_b2＝３，m_b3＝２，m_b4＝１とされている。(bx_n，by_n)は、ベース点bnの座標を示している。(snx_m，sny_m)は、サポート点snmの座標を示している。 On the right side of Expression (1), Cor_snm(x, y) indicates the pixel value at the coordinates (x, y) of the correlation image of the support point snm. m_bn indicates the number of support points of the base point bn; in the example of FIG. 16, m_b1 = 2, m_b2 = 3, m_b3 = 2, and m_b4 = 1. (bx_n, by_n) indicates the coordinates of the base point bn, and (snx_m, sny_m) indicates the coordinates of the support point snm.
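Because the image of Expression (1) is not reproduced in this text, the following is only a reconstruction from the symbol definitions above and from the description of steps S151-2 and S151-3. It is not the published formula. It assumes that all coordinates are absolute pixel coordinates, that the shift moves the position of the support point snm onto the position of the base point bn, and that the base point correlation image is included as a separate term. The symbol Cor_bn for that image is introduced here for illustration and is not defined in the text, and the sign convention of the published expression may differ.

```latex
\mathrm{SumSpCor}_{b_n}(x, y)
  = \mathrm{Cor}_{b_n}(x, y)
  + \sum_{m=1}^{m_{b_n}} \mathrm{Cor}_{s_{nm}}\bigl(x + snx_m - bx_n,\; y + sny_m - by_n\bigr)
```

Under these assumptions, evaluating the sum at (x, y) = (bx_n, by_n) reads each support point correlation image at that support point's own position (snx_m, sny_m), which is what the shift of step S151-2 is meant to achieve.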

そして、図１６のステップＳ１６３の最終的な処理結果は、次の式（２）のように表される。即ち、式（２）の右辺のΣ内の式が、図１６のステップＳ１６２の処理結果を示している。 Then, the final processing result of step S163 in FIG. 16 is expressed as the following equation (2). That is, the expression in Σ on the right side of Expression (2) indicates the processing result of Step S162 in FIG.

[Expression (2): equation image not reproduced in this text]

式（２）の左辺のSumCor(x,y)が、ステップＳ１６３の処理の結果得られる相関和画像の座標（x,y）における画素値を示している。 SumCor(x, y) on the left side of Expression (2) indicates the pixel value at the coordinates (x, y) of the correlation sum image obtained as a result of the processing in step S163.

また、式（２）の右辺において、(cx,cy)が、モデル画像２１の中心座標を示している。 Further, on the right side of Expression (2), (cx, cy) indicates the center coordinates of the model image 21.
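The image of Expression (2) is likewise not reproduced in this text. The following is a reconstruction from the definitions above and from the description of steps S162 and S163, and is not the published formula. It assumes N base points (N = 4 in the example of FIG. 16), absolute pixel coordinates, and a shift in step S162-1 that moves the position of the base point bn onto the center (cx, cy) of the model image. The sign convention of the published expression may differ.

```latex
\mathrm{SumCor}(x, y)
  = \sum_{n=1}^{N} \alpha_n \,
    \mathrm{SumSpCor}_{b_n}\bigl(x + bx_n - cx,\; y + by_n - cy\bigr)
```

Under these assumptions, the term inside the sum is the weighted shift correlation image of step S162 for the base point bn, and evaluating it at (x, y) = (cx, cy) reads SumSpCor_bn at the base point's own position (bx_n, by_n).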

以上説明したように、本発明を適用することで、モデル画像と、クエリ画像の特徴点抽出のリピータビリティーを考慮する必要がなくなり、よりロバストな認識が可能になる。 As described above, by applying the present invention, it is not necessary to consider the repeatability of model image and query image feature point extraction, and more robust recognition is possible.

また、相関画像和の所定画素値（例えば中央付近の画素値）、即ち、相関値の総和の値が、物体の存在推定度を表すので、この値を比較することにより、どの物体がどれ位の確率で存在しているかが分かるようになる。 In addition, a predetermined pixel value of the correlation image sum (for example, a pixel value near the center), that is, the total of the correlation values, represents the estimated degree of presence of the object. By comparing these values, it becomes possible to tell which object is present and with what probability.
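The comparison described above can be sketched as follows. The sketch assumes one correlation sum image per model, held in a dictionary keyed by a model name, and takes the largest pixel value in a small window around the image center as the presence estimate. The function names, the window radius, and the use of the maximum are illustrative assumptions, not details given in the text.

```python
import numpy as np

def presence_score(corr_sum, radius=1):
    """Largest value within `radius` pixels of the image center."""
    h, w = corr_sum.shape
    cy, cx = h // 2, w // 2
    window = corr_sum[max(0, cy - radius):cy + radius + 1,
                      max(0, cx - radius):cx + radius + 1]
    return float(window.max())

def rank_models(corr_sums, radius=1):
    """Sort (model name, score) pairs from most to least likely to be present."""
    scores = {name: presence_score(img, radius) for name, img in corr_sums.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The scores are raw sums of correlation values. Turning them into probabilities would need a normalization that the text does not specify.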

また、自分のモデル画像の他の部分や、他のモデル画像との相関具合を考慮して、特徴量の識別能力値を演算し、その識別能力値に基づいてサポート点の選択もできるので、マッチングの精度が向上する。 In addition, the discrimination capability value of each feature amount is calculated taking into account its correlation with other parts of the same model image and with other model images, and support points can also be selected based on the discrimination capability values, so matching accuracy improves.

上述した一連の処理は、ハードウェアにより実行することもできるし、ソフトウェアにより実行することもできる。一連の処理をソフトウェアにより実行する場合には、そのソフトウェアを構成するプログラムが、専用のハードウェアに組み込まれているコンピュータ、または、各種のプログラムをインストールすることで、各種の機能を実行することが可能な、例えば汎用のパーソナルコンピュータなどに、プログラム記録媒体からインストールされる。 The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a program recording medium into a computer built into dedicated hardware, or into, for example, a general-purpose personal computer that can execute various functions when various programs are installed.

図１７は、上述した一連の処理をプログラムにより実行するパーソナルコンピュータの構成の例を示すブロック図である。CPU（Central Processing Unit）２０１は、ROM（Read Only Memory）２０２、または記憶部２０８に記憶されているプログラムに従って各種の処理を実行する。RAM（Random Access Memory）２０３には、CPU２０１が実行するプログラムやデータなどが適宜記憶される。これらのCPU２０１、ROM２０２、およびRAM２０３は、バス２０４により相互に接続されている。 FIG. 17 is a block diagram showing an example of the configuration of a personal computer that executes the above-described series of processing by a program. A CPU (Central Processing Unit) 201 executes various processes according to a program stored in a ROM (Read Only Memory) 202 or a storage unit 208. A RAM (Random Access Memory) 203 appropriately stores programs executed by the CPU 201 and data. The CPU 201, ROM 202, and RAM 203 are connected to each other via a bus 204.

CPU２０１にはまた、バス２０４を介して入出力インターフェース２０５が接続されている。入出力インターフェース２０５には、キーボード、マウス、マイクロフォンなどよりなる入力部２０６、ディスプレイ、スピーカなどよりなる出力部２０７が接続されている。CPU２０１は、入力部２０６から入力される指令に対応して各種の処理を実行する。そして、CPU２０１は、処理の結果を出力部２０７に出力する。 An input / output interface 205 is also connected to the CPU 201 via the bus 204. Connected to the input / output interface 205 are an input unit 206 made up of a keyboard, mouse, microphone, and the like, and an output unit 207 made up of a display, speakers, and the like. The CPU 201 executes various processes in response to commands input from the input unit 206. Then, the CPU 201 outputs the processing result to the output unit 207.

入出力インターフェース２０５に接続されている記憶部２０８は、例えばハードディスクからなり、CPU２０１が実行するプログラムや各種のデータを記憶する。通信部２０９は、インターネットやローカルエリアネットワークなどのネットワークを介して外部の装置と通信する。 The storage unit 208 connected to the input / output interface 205 includes, for example, a hard disk, and stores programs executed by the CPU 201 and various data. The communication unit 209 communicates with an external device via a network such as the Internet or a local area network.

また、通信部２０９を介してプログラムを取得し、記憶部２０８に記憶してもよい。 Further, a program may be acquired via the communication unit 209 and stored in the storage unit 208.

入出力インターフェース２０５に接続されているドライブ２１０は、磁気ディスク、光ディスク、光磁気ディスク、或いは半導体メモリなどのリムーバブルメディア２１１が装着されたとき、それらを駆動し、そこに記録されているプログラムやデータなどを取得する。取得されたプログラムやデータは、必要に応じて記憶部２０８に転送され、記憶される。 When a removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted, the drive 210 connected to the input/output interface 205 drives it and acquires the programs, data, and the like recorded on it. The acquired programs and data are transferred to the storage unit 208 and stored there as necessary.

コンピュータにインストールされ、コンピュータによって実行可能な状態とされるプログラムを格納するプログラム記録媒体は、図１７に示されるように、磁気ディスク（フレキシブルディスクを含む）、光ディスク（CD-ROM(Compact Disc-Read Only Memory),DVD(Digital Versatile Disc)を含む）、光磁気ディスク、もしくは半導体メモリなどよりなるパッケージメディアであるリムーバブルメディア２１１、または、プログラムが一時的もしくは永続的に格納されるROM２０２や、記憶部２０８を構成するハードディスクなどにより構成される。プログラム記録媒体へのプログラムの格納は、必要に応じてルータ、モデムなどのインターフェースである通信部２０９を介して、ローカルエリアネットワーク、インターネット、デジタル衛星放送といった、有線または無線の通信媒体を利用して行われる。 As shown in FIG. 17, the program recording medium that stores a program to be installed in a computer and made executable by the computer is constituted by the removable medium 211, which is a package medium such as a magnetic disk (including a flexible disk), an optical disk (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory; by the ROM 202 in which the program is stored temporarily or permanently; or by the hard disk constituting the storage unit 208. The program is stored in the program recording medium, as necessary, via the communication unit 209, which is an interface such as a router or a modem, using a wired or wireless communication medium such as a local area network, the Internet, or digital satellite broadcasting.

なお、本明細書において、プログラム記録媒体に格納されるプログラムを記述するステップは、記載された順序に沿って時系列的に行われる処理はもちろん、必ずしも時系列的に処理されなくとも、並列的あるいは個別に実行される処理をも含むものである。 In this specification, the steps describing the program stored in the program recording medium include not only processes performed in time series in the described order but also processes that are executed in parallel or individually without necessarily being processed in time series.

また、本発明の実施の形態は、上述した実施の形態に限定されるものではなく、本発明の要旨を逸脱しない範囲において種々の変更が可能である。例えば、以上においては、本発明を物体認識装置に適用した実施の形態について説明したが、本発明は、例えば、画像内の物体を比較し認識する情報処理装置に適用することができる。 The embodiments of the present invention are not limited to the above-described embodiments, and various modifications can be made without departing from the scope of the present invention. For example, the embodiment in which the present invention is applied to an object recognition apparatus has been described above. However, the present invention can be applied to an information processing apparatus that compares and recognizes objects in an image, for example.

本発明の一実施の形態である物体認識装置の機能の構成を示すブロック図である。 FIG. 1 is a block diagram showing the functional configuration of an object recognition apparatus according to an embodiment of the present invention.
図１のモデル特徴量抽出部の機能の詳細な構成を示すブロック図である。 FIG. 2 is a block diagram showing the detailed functional configuration of the model feature amount extraction unit of FIG. 1.
図２の特徴点抽出部の処理結果の具体例を示す図である。 FIG. 3 is a diagram showing a specific example of the processing result of the feature point extraction unit of FIG. 2.
図２の特徴点抽出部の処理結果の具体例を示す図である。 FIG. 4 is a diagram showing a specific example of the processing result of the feature point extraction unit of FIG. 2.
図２の特徴量記述部の処理手法の例を説明する図である。 FIG. 5 is a diagram explaining an example of the processing method of the feature amount description unit of FIG. 2.
図２の特徴点識別能力値演算部の処理例を説明するフローチャートである。 FIG. 6 is a flowchart explaining a processing example of the feature point discrimination capability value calculation unit of FIG. 2.
図６の処理結果の具体例を示す図である。 FIG. 7 is a diagram showing a specific example of the processing result of FIG. 6.
図６の処理結果の具体例を示す図である。 FIG. 8 is a diagram showing a specific example of the processing result of FIG. 6.
図２のサポート点選択部によるサポート点選択処理例を説明するフローチャートである。 FIG. 9 is a flowchart explaining an example of the support point selection process performed by the support point selection unit of FIG. 2.
図９の処理結果の具体例を示す図である。 FIG. 10 is a diagram showing a specific example of the processing result of FIG. 9.
図１のクエリ画像認識部の機能の詳細な構成を示すブロック図である。 FIG. 11 is a block diagram showing the detailed functional configuration of the query image recognition unit of FIG. 1.
図１１のクエリ画像認識部の処理を説明するためのモデル画像とクエリ画像の具体例を示す図である。 FIG. 12 is a diagram showing specific examples of a model image and a query image for explaining the processing of the query image recognition unit of FIG. 11.
図１１のクエリ画像認識部の処理結果の具体例を示す図である。 FIG. 13 is a diagram showing a specific example of the processing result of the query image recognition unit of FIG. 11.
図１１のクエリ画像認識部の処理結果の具体例を示す図である。 FIG. 14 is a diagram showing a specific example of the processing result of the query image recognition unit of FIG. 11.
図１１のクエリ画像認識部の処理結果の具体例を示す図である。 FIG. 15 is a diagram showing a specific example of the processing result of the query image recognition unit of FIG. 11.
図１１のクエリ画像認識部の処理結果の具体例を示す図である。 FIG. 16 is a diagram showing a specific example of the processing result of the query image recognition unit of FIG. 11.
パーソナルコンピュータの構成の例を示すブロック図である。 FIG. 17 is a block diagram showing an example of the configuration of a personal computer.

Explanation of symbols

１１モデル特徴量抽出部，１２モデル特徴量辞書，１３クエリ画像認識部，２１モデル画像，２２クエリ画像，３１特徴点抽出部，３２特徴量記述部，３３特徴点識別能力値演算部，３４サポート点選択部，５１特徴画像生成部，５２相関画像生成部，５３シフト相関画像生成部，５４相関画像和生成部，５５判定部，２０１ＣＰＵ，２０２ＲＯＭ，２０３ＲＡＭ，２０８記憶部，２１１リムーバブルメディア 11 model feature amount extraction unit, 12 model feature amount dictionary, 13 query image recognition unit, 21 model image, 22 query image, 31 feature point extraction unit, 32 feature amount description unit, 33 feature point discrimination capability value calculation unit, 34 support point selection unit, 51 feature image generation unit, 52 correlation image generation unit, 53 shift correlation image generation unit, 54 correlation image sum generation unit, 55 determination unit, 201 CPU, 202 ROM, 203 RAM, 208 storage unit, 211 removable medium

Claims

In an information processing apparatus that compares a query image and a model image and identifies a subject of the model image and a subject of the query image,
when N feature points (N is an integer equal to or greater than 1) are extracted from the model image and the feature amount of each of the extracted N feature points is described, a model dictionary in which information indicating the N feature points and their feature amounts is registered exists inside or outside the apparatus itself,
the information processing apparatus comprising:
correlation image generation means for generating, for each of the N feature points of the model image registered in the model dictionary, a correlation image between the corresponding feature amount and the query image;
shift correlation image generation means for generating N shift correlation images by shifting, for each of the N correlation images generated by the correlation image generation means, the pixel position of each pixel in accordance with the arrangement position of the corresponding feature point in the model image;
correlation sum image generation means for generating a correlation sum image by adding the pixel values of the pixels of the N shift correlation images generated by the shift correlation image generation means; and
determination means for determining whether the subject of the model image matches the subject of the query image based on the correlation sum image generated by the correlation sum image generation means.

For each of the one or more feature points of the model image registered in the model dictionary, the described feature quantity of the model image, the model image from which the model image is extracted, and one or more different model images When correlation images are respectively generated and, based on the correlation images, a discrimination capability value indicating a degree of contribution for identifying the subject of the model image is calculated, the discrimination capability values also correspond to the corresponding feature It is further registered in the model dictionary along with information indicating points,
The shift correlation image generation means further weights the pixel values of each pixel of the N shift correlation images according to the identification capability value registered in the model dictionary, thereby giving N weighted values. Generate a shift correlation image,
The correlation sum image generation unit generates the correlation sum image by adding pixel values of pixels of the N weighted shift correlation images generated by the shift correlation image generation unit. The information processing apparatus described.

At least one of the one or more feature points of the model image registered in the model dictionary is a base point, and one or more support points from among the feature points existing within a certain range of the base point Is selected, information indicating their base points and support points is also registered in the model dictionary,
The correlation image generation means further includes:
For each of the N feature points of the model image registered in the model dictionary, the feature amount of the corresponding mb support points (mb is an integer value of 0 or more) and the mb of the query image Each of the support point correlation images is generated, and for each of the mb support point correlation images, the pixel position of each pixel is determined according to the arrangement position of the corresponding support point and the base point in the model image. By shifting, mb support point shift correlation images are generated,
For each of the N correlation images, by adding the pixel values of each pixel of itself and the mb support point shift correlation images, N correlation image sums are generated,
The shift correlation image generation means includes
The information processing apparatus according to claim 1, wherein the N shift correlation images are generated from each of the N correlation image sums generated by the correlation image generation unit.

For each of the one or more feature points of the model image registered in the model dictionary, the described feature quantity of the model image, the model image from which the model image is extracted, and one or more different model images When correlation images are respectively generated and, based on the correlation images, a discrimination capability value indicating a degree of contribution for identifying the subject of the model image is calculated, the discrimination capability values also correspond to the corresponding feature It is further registered in the model dictionary along with information indicating points,
At least one of the one or more feature points of the model image registered in the model dictionary is set as a base point, and the discrimination ability value is selected from the feature points existing within a certain range of the base point. When the feature point higher than the base point is selected as a support point, information indicating the base point and the support point is also registered in the model dictionary,
The correlation image generation means further includes:
For each of the N feature points of the model image registered in the model dictionary, the feature amount of the corresponding mb support points (mb is an integer value of 0 or more) and the mb of the query image Each of the support point correlation images is generated, and for each of the mb support point correlation images, the pixel position of each pixel is determined according to the arrangement position of the corresponding support point and the base point in the model image. By shifting, mb support point shift correlation images are generated,
For each of the N correlation images, by adding the pixel values of each pixel of itself and the mb support point shift correlation images, N correlation image sums are generated,
The shift correlation image generation means includes
N weighted shifts are performed by weighting the pixel value of each pixel of the N correlated image sums generated by the correlated image generating means in accordance with the discrimination ability value registered in the model dictionary. Generate correlation images,
The correlation sum image generation unit generates the correlation sum image by adding pixel values of pixels of the N weighted shift correlation images generated by the shift correlation image generation unit. The information processing apparatus described.

In the information processing method of the information processing apparatus for comparing the query image and the model image and identifying the subject of the model image and the subject of the query image,
when N feature points (N is an integer equal to or greater than 1) are extracted from the model image and the feature amount of each of the extracted N feature points is described, a model dictionary in which information indicating the N feature points and their feature amounts is registered exists inside or outside the apparatus itself,
As the step executed by the information processing apparatus,
For each of the N feature points of the model image registered in the model dictionary, a correlation image between the corresponding feature amount and the query image is generated, respectively.
For each of the generated N correlation images, N shift correlation images are generated by shifting the pixel position of each pixel according to the arrangement position of the corresponding feature point in the model image. ,
A correlation sum image is generated by adding pixel values of each pixel of the generated N shift correlation images,
An information processing method including a step of determining whether or not a subject of the model image matches a subject of the query image based on the generated correlation sum image.

A computer that controls an information processing apparatus that compares a query image and a model image and identifies a subject of the model image and a subject of the query image,
wherein, when N feature points (N is an integer equal to or greater than 1) are extracted from the model image and the feature amount of each of the extracted N feature points is described, a model dictionary in which information indicating the N feature points and their feature amounts is registered exists inside or outside the computer itself,
For each of the N feature points of the model image registered in the model dictionary, a correlation image between the corresponding feature amount and the query image is generated, respectively.
For each of the generated N correlation images, N shift correlation images are generated by shifting the pixel position of each pixel according to the arrangement position of the corresponding feature point in the model image. ,
A correlation sum image is generated by adding pixel values of each pixel of the generated N shift correlation images,
A program for executing a step of determining whether or not a subject of the model image matches a subject of the query image based on the generated correlation sum image.