JP5382786B2

JP5382786B2 - Feature quantity generation device, feature quantity generation method and feature quantity generation program, class discrimination device, class discrimination method, and class discrimination program

Info

Publication number: JP5382786B2
Application number: JP2009121244A
Authority: JP
Inventors: 達也原田; 英樹中山; 康夫國吉
Original assignee: University of Tokyo NUC
Current assignee: University of Tokyo NUC
Priority date: 2009-05-19
Filing date: 2009-05-19
Publication date: 2014-01-08
Anticipated expiration: 2029-05-19
Also published as: WO2010134539A1; JP2010271787A

Description

The present invention relates to a feature quantity generation device, a feature quantity generation method, and a feature quantity generation program that generate a feature vector representing the features of an entire piece of data indicating real-world information, using a plurality of high-dimensional local feature vectors extracted from that piece of data. It also relates to a class discrimination device, a class discrimination method, and a class discrimination program that determine to which of a plurality of classes novel data indicating real-world information belongs.

Image data, audio data, and the like represent real-world information such as visual and auditory information. To search such data or to determine the content of novel data, the features of the data as a whole must be captured appropriately. The Bag-of-Keypoints method is a conventionally known technique for representing the global feature of a single piece of image data (see, for example, Non-Patent Document 1). It clusters local feature vectors extracted from the target image data with a given local feature descriptor, obtains the representative vector of each cluster (the “visual words”), and assigns each local feature extracted from the image data to its nearest visual word, thereby expressing the feature of the entire image as a set of local features. Known methods for detecting (selecting) the feature points needed for local feature extraction include “Difference of Gaussian”, random feature point detection (see, for example, Non-Patent Document 2), and grid-based feature point detection called “Dense Sampling” (see, for example, Non-Patent Document 3). Known local feature descriptors include edge histograms and HSV color histograms, and in recent years the “SIFT descriptor” (see, for example, Non-Patent Document 4) has also been used.
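For contrast with the approach proposed later, the assignment step of the Bag-of-Keypoints method described above can be sketched as follows. This is a minimal illustration, not code from the cited work: it assumes the visual words have already been obtained by clustering (the expensive step), and the function name `bag_of_keypoints_histogram` is hypothetical.

```python
import numpy as np

def bag_of_keypoints_histogram(local_features, visual_words):
    """Assign each local feature to its nearest visual word and
    return the normalized histogram of assignments."""
    local_features = np.asarray(local_features, dtype=float)  # shape (p, d)
    visual_words = np.asarray(visual_words, dtype=float)      # shape (K, d)
    # Squared Euclidean distance between every feature and every word.
    diffs = local_features[:, None, :] - visual_words[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)                      # shape (p, K)
    nearest = np.argmin(sq_dists, axis=1)
    counts = np.bincount(nearest, minlength=len(visual_words))
    return counts / counts.sum()

# Toy data (assumed for illustration): two visual words, four local features.
words = [[0.0, 0.0], [10.0, 10.0]]
features = [[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [10.0, 11.0]]
hist = bag_of_keypoints_histogram(features, words)
```

The histogram itself is cheap to compute; the cost the text objects to lies in the clustering that produces `visual_words`, which this sketch takes as given.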

Non-Patent Document 1: G. Csurka, C. R. Dance, L. Fan, J. Willamowski and C. Bray. Visual Categorization with bags of keypoints. In Proc. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
Non-Patent Document 2: E. Nowak, F. Jurie, and B. Triggs. Sampling strategies for bag-of-features image classification. In Proc. European Conference on Computer Vision, pages 490-503, 2006.
Non-Patent Document 3: L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 524-531, 2005.
Non-Patent Document 4: D. G. Lowe. Object recognition from local scale-invariant features. In Proc. IEEE International Conference on Computer Vision, pages 1150-1157, 1999.

Obtaining the feature of an entire piece of data indicating real-world information basically requires the following processes 1) to 3).
1) Detection of characteristic points (feature points) in the image, and normalization of the scale and orientation of those feature points
2) Description of the partial image features (local features) around the feature points
3) Calculation of the final image feature using all the local features
For processes 1) and 2), methods have been proposed that can extract accurate local features, such as the “SIFT descriptor” mentioned above, at lower computational cost. For process 3), the calculation of the final feature quantity from the local features, the problem of computational cost remains unsolved, and the accuracy (descriptive power) of the resulting feature quantity also leaves room for improvement. The Bag-of-Keypoints method, for example, spends an extremely long time on clustering, yet the feature vectors obtained at that cost have not brought a dramatic improvement in image recognition accuracy, so the method scales poorly. Moreover, if the feature quantity representing an entire piece of data is inaccurate, the accuracy of content determination (class discrimination) for novel data inevitably drops as well.

Accordingly, the main object of the feature quantity generation device, feature quantity generation method, and feature quantity generation program according to the present invention is to make it possible to generate accurate feature quantities from data indicating real-world information at lower computational cost. The main object of the class discrimination device, class discrimination method, and class discrimination program according to the present invention is to make it possible to determine with high accuracy to which of a plurality of classes novel data indicating real-world information belongs.

To achieve the main objects described above, the feature quantity generation device, feature quantity generation method, and feature quantity generation program, as well as the class discrimination device, class discrimination method, and class discrimination program of the present invention adopt the following means.

The feature quantity generation device of the present invention is
a feature quantity generation device that generates a feature vector representing the features of an entire piece of data indicating real-world information, using a plurality of high-dimensional local feature vectors extracted from that piece of data, the device comprising:
average acquisition means for acquiring an average vector of the plurality of high-dimensional local feature vectors;
correlation acquisition means for acquiring m-th order correlation vectors, from the first order to the M-th order, among the plurality of high-dimensional local feature vectors (where “M” is an integer of 1 or more and “m” is an integer from 1 to M); and
feature vector acquisition means for acquiring the feature vector based on the elements of the average vector acquired by the average acquisition means and the elements of the m-th order correlation vectors acquired by the correlation acquisition means.

To reduce the computational cost of expressing the global feature of a single piece of data indicating real-world information, the present inventors went back to simply expressing the global feature on the basis of the average vector of the high-dimensional local feature vectors extracted from the data. Using only the average, however, discards all of the distribution information of the local features, which is important for properly expressing the features of the entire data. Obtaining a feature vector with high descriptive power therefore requires expressing this distribution information more appropriately, yet even comparatively complex conventional methods, which take a great deal of computation time to generate the final feature vector, do not express it sufficiently. This is thought to be because the number of local feature vectors extracted from one piece of data, although large from a computational standpoint, is not large (that is, the vectors are sparse) when viewed globally. In light of this, the present inventors focused on the m-th order correlation vectors among the high-dimensional local feature vectors and decided to use them to express the distribution information of the local features. That is, in the feature quantity generation device according to the present invention, a feature vector representing the features of the entire data is acquired based on the elements of the average vector of the high-dimensional local feature vectors extracted from one piece of data indicating real-world information and the elements of the m-th order correlation vectors, from the first order to the M-th order, among those vectors. The m-th order correlation vectors can be obtained with far lighter computation than, for example, clustering, and they represent well the correlations between important feature elements, that is, the distribution information of the local features. As a result, this device can quickly obtain accurate (highly descriptive) feature quantities from data indicating real-world information while greatly reducing computational cost. The maximum order (value M) of the m-th order correlation vectors used in generating the feature vector may be set freely according to the number, dimensionality, and so on of the high-dimensional local feature vectors; it may be first order, second order, third order, or higher. Furthermore, because the average vector of the high-dimensional local feature vectors can also be regarded as the zeroth-order correlation vector among them, the average acquisition means and the correlation acquisition means may be implemented as a single computation module.
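The argument that the average alone loses the distribution of the local features can be illustrated with a small NumPy sketch. It assumes M = 1 and takes the first-order correlation to be the upper triangle of the autocorrelation matrix with a 1/p normalization, which is an assumption for illustration; the function name and the toy data are hypothetical.

```python
import numpy as np

def mean_and_correlation(local_features):
    """Return the average vector and the upper-triangular elements of the
    autocorrelation matrix of a set of local feature vectors (rows)."""
    V = np.asarray(local_features, dtype=float)  # shape (p, d)
    p, d = V.shape
    mu = V.mean(axis=0)
    R = V.T @ V / p                               # assumed 1/p normalization
    return mu, R[np.triu_indices(d)]

# Two toy sets of local features with the same mean but different spread.
tight = np.array([[1.0, 1.0], [1.0, 1.0]])
spread = np.array([[0.0, 2.0], [2.0, 0.0]])

mu_tight, corr_tight = mean_and_correlation(tight)
mu_spread, corr_spread = mean_and_correlation(spread)
```

Both sets have the average vector (1, 1), so the average alone cannot tell them apart, whereas the correlation terms differ: (1, 1, 1) for `tight` and (2, 0, 2) for `spread`.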

When the p d-dimensional local feature vectors extracted from one piece of data I indicating real-world information are denoted V_k = (v_1, …, v_d) (where “p” and “d” are each integers of 2 or more and “k” is an integer from 1 to p), the average acquisition means may acquire the average vector μ of the p d-dimensional local feature vectors V_k according to equation (1), and the correlation acquisition means may acquire the autocorrelation matrix R of the p d-dimensional local feature vectors V_k according to equation (2) and obtain the first-order correlation vector upper(R) by enumerating the elements of the upper triangular part of the autocorrelation matrix R. With the feature vector denoted X, the feature vector acquisition means may acquire the feature vector X by enumerating the elements of the average vector μ and the elements of the first-order correlation vector upper(R) according to equation (3). This makes it possible to quickly generate, from a large number of comparatively high-dimensional local feature vectors, a feature vector that represents the features of the entire data more appropriately.
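Equations (1) to (3) are not reproduced in this text. Based only on the surrounding definitions (average vector, autocorrelation matrix, and enumeration of elements), they plausibly take the following form. This is a reconstruction under assumptions, not the published equations: V_k is treated as a column vector and the 1/p normalization of R is assumed.

```latex
\begin{align}
\mu &= \frac{1}{p}\sum_{k=1}^{p} V_k \tag{1}\\
R &= \frac{1}{p}\sum_{k=1}^{p} V_k V_k^{T} \tag{2}\\
X &= \begin{pmatrix} \mu \\ \mathrm{upper}(R) \end{pmatrix} \tag{3}
\end{align}
```

Under this reading, μ has d elements and upper(R) has d(d+1)/2 elements, so X has d + d(d+1)/2 elements regardless of the number p of local feature vectors.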

Furthermore, the correlation acquisition means may acquire the m-th order correlation vectors together with dimensionality reduction of the high-dimensional local feature vectors by principal component analysis. This reduces the computational cost of acquiring the m-th order correlation vectors when the local feature vectors have a higher dimensionality. Dimensionality reduction also makes it possible to remove local features that are unnecessary for expressing the features of the entire data.

Suppose further that there are N pieces of data I^(j) indicating real-world information (where “N” is an integer of 2 or more and “j” is an integer from 1 to N), and that the p^(j) d-dimensional local feature vectors extracted from one piece of data I^(j) are denoted V_k^(j) = (v_1, …, v_d) (where “p^(j)” and “d” are each integers of 2 or more and “k” is an integer from 1 to p). Let the average vector of the p^(j) d-dimensional local feature vectors V_k acquired by the average acquisition means be μ^(j) as given in equation (4), the autocorrelation matrix of the p^(j) d-dimensional local feature vectors V_k^(j) be R^(j) as given in equation (5), the autocorrelation matrix of all the d-dimensional local feature vectors extracted from the N pieces of data be R_all as given in equation (6), and the novel data be I^(j+1). The correlation acquisition means may then acquire the diagonal matrix U_dl^T R^(j+1) U_dl, based on the projection matrix U_dl onto the dl-dimensional principal component space (dl being lower than d) obtained by solving the eigenvalue problem of equation (7) and on the autocorrelation matrix R^(j+1) of the p^(j+1) d-dimensional local feature vectors V_k^(j+1) extracted from the novel data I^(j+1), and may obtain the first-order correlation vector upper(U_dl^T R^(j+1) U_dl) by enumerating the elements of the upper triangular part of that matrix. The feature vector acquisition means may acquire the feature vector X^(j+1) of the novel data I^(j+1) by enumerating, according to equation (8), the elements of the average vector μ^(j+1) of the p^(j+1) d-dimensional local feature vectors V_k^(j+1) and the elements of the first-order correlation vector upper(U_dl^T R^(j+1) U_dl). In this case, by obtaining the projection matrix U_dl in advance from the N pieces of data I^(j), the feature vector X^(j+1) of the novel data I^(j+1) can be acquired quickly when the novel data I^(j+1) appears.
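The two-stage procedure described above (fit the projection once on existing data, then reuse it for novel data) can be sketched as follows. Equations (4) to (8) are not reproduced in this text, so this is a sketch under assumptions: R_all is taken as the autocorrelation of all local feature vectors pooled over the N pieces of data, the eigenvalue problem is taken as R_all U = U Λ, and the average vector is concatenated unprojected, as the wording of the text suggests. The function names `fit_projection` and `compressed_feature` are hypothetical.

```python
import numpy as np

def fit_projection(feature_sets, dl):
    """Estimate U_dl (d x dl) from the local features of N known data items.

    feature_sets: list of (p_j x d) arrays, one per data item.
    """
    all_V = np.vstack(feature_sets)
    R_all = all_V.T @ all_V / len(all_V)       # assumed form of R_all
    eigvals, eigvecs = np.linalg.eigh(R_all)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dl]     # keep the dl largest
    return eigvecs[:, order]

def compressed_feature(local_features, U_dl):
    """Feature vector of one data item: average vector followed by
    upper(U_dl^T R U_dl)."""
    V = np.asarray(local_features, dtype=float)
    mu = V.mean(axis=0)
    R = V.T @ V / len(V)
    R_low = U_dl.T @ R @ U_dl                  # dl x dl
    return np.concatenate([mu, R_low[np.triu_indices(R_low.shape[0])]])

# Synthetic stand-in data: N = 4 items, d = 5, reduced to dl = 2.
rng = np.random.default_rng(0)
known = [rng.normal(size=(30, 5)) for _ in range(4)]
U_dl = fit_projection(known, dl=2)
x_new = compressed_feature(rng.normal(size=(25, 5)), U_dl)
```

With d = 5 and dl = 2, the correlation part shrinks from 15 elements to 3, so `x_new` has 5 + 3 = 8 elements. The eigendecomposition is paid once in `fit_projection`; each novel item only needs two small matrix products.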

The feature quantity generation method according to the present invention is
a feature quantity generation method for generating a feature vector representing the features of an entire piece of data indicating real-world information, using a plurality of high-dimensional local feature vectors extracted from that piece of data, the method comprising:
acquiring an average vector of the plurality of high-dimensional local feature vectors and m-th order correlation vectors, from the first order to the M-th order, among the plurality of high-dimensional local feature vectors (where “M” is an integer of 1 or more and “m” is an integer from 1 to M); and
acquiring the feature vector based on the elements of the acquired average vector and the elements of the acquired m-th order correlation vectors.

According to this method, accurate (highly descriptive) feature quantities can be generated quickly from data indicating real-world information while greatly reducing computational cost.

The feature quantity generation program according to the present invention is
a feature quantity generation program that causes a computer to function as a device that generates a feature vector representing the features of an entire piece of data indicating real-world information, using a plurality of high-dimensional local feature vectors extracted from that piece of data, the program comprising:
an average acquisition module for acquiring an average vector of the plurality of high-dimensional local feature vectors;
a correlation acquisition module for acquiring m-th order correlation vectors, from the first order to the M-th order, among the plurality of high-dimensional local feature vectors (where “M” is an integer of 1 or more and “m” is an integer from 1 to M); and
a feature vector acquisition module for acquiring the feature vector based on the elements of the average vector acquired by the average acquisition module and the elements of the m-th order correlation vectors acquired by the correlation acquisition module.

With a computer on which this feature quantity generation program is installed, accurate (highly descriptive) feature quantities can be generated quickly from data indicating real-world information while greatly reducing computational cost.

The class discrimination device according to the present invention is
a class discrimination device that determines to which of a plurality of classes, each corresponding to at least one piece of known data, novel data indicating real-world information belongs, the device comprising:
transformation storage means for storing transformations for projecting feature vectors into a latent space, where, with the novel data and each piece of known data being divided into h × h regions in the h-th layer (where “h” is an integer from 1 to H and “H” is an integer of 2 or more), each transformation is derived for each region based on the feature vectors of the regions obtained by dividing each piece of known data in each layer from the first layer to the H-th layer;
local feature extraction means for extracting a plurality of high-dimensional local feature vectors from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer, the novel data being divided into h × h regions in the h-th layer;
feature vector acquisition means for acquiring, for each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer, an average vector of the plurality of high-dimensional local feature vectors extracted from the region by the local feature extraction means and m-th order correlation vectors, from the first order to the M-th order, among those vectors (where “M” is an integer of 1 or more and “m” is an integer from 1 to M), and for acquiring the feature vector of each region based on the elements of the average vector and the elements of the m-th order correlation vectors;
probability derivation means for deriving, for each class, as the probability that the feature vector of the novel data arises from the class, the sum over i = 1 to i = h² and over the first layer to the H-th layer of the probability that the feature vector of the i-th region in the h-th layer of the novel data arises from the feature vectors of the i-th region (where “i” is an integer from 1 to h²) in the h-th layer of the known data, based on the projected points obtained by projecting the feature vectors of the regions obtained by dividing each piece of known data in each layer from the first layer to the H-th layer into the latent space with the transformation corresponding to each region, and on the projected points obtained by projecting the feature vectors of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer into the latent space with the transformation corresponding to each region; and
class setting means for setting the class that maximizes the probability derived by the probability derivation means as the class to which the novel data belongs.

In determining to which of the plurality of classes novel data belongs, this class discrimination device uses the feature vector of each region obtained by dividing the pieces of known data and the novel data into h × h regions in each layer from the first layer to the H-th layer. Each such feature vector is acquired based on the elements of the average vector of the high-dimensional local feature vectors extracted from the region and the elements of the m-th order correlation vectors, from the first order to the M-th order, among those vectors. Such a feature vector can be obtained at low computational cost and expresses the features of the target region well. For each class, the device then derives, as the probability that the feature vector of the novel data arises from that class, the sum over i = 1 to i = h² and over the first layer to the H-th layer of the probability that the feature vector of the i-th region in the h-th layer of the novel data arises from the feature vectors of the i-th region in the h-th layer of the known data. This derivation is based on the projected points obtained by projecting the feature vectors of the regions of each piece of known data, in each layer from the first layer to the H-th layer, into the latent space with the transformation corresponding to each region, and on the projected points obtained by projecting the feature vectors of the regions of the novel data in the same way. By combining feature vectors that are cheap to compute and highly descriptive with a technique that extends probabilistic linear discriminant analysis (see S. Ioffe. Probabilistic linear discriminant analysis. In Proc. European Conference on Computer Vision, pages 531-542, 2006) by multiplexing the latent space, the probability that the feature vector of novel data arises from a given class can be derived more accurately and quickly. This class discrimination device can therefore determine more accurately, from the probabilities derived for each class, to which of the plurality of classes novel data indicating real-world information belongs. Note that known data or novel data divided into 1 × 1 = 1 region in the first layer is simply the data itself, so the high-dimensional local feature vectors and feature vectors extracted and generated in the first layer are extracted and generated directly from the known data or novel data.
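The layered h × h division described above can be sketched for array-shaped data such as an image. This is a minimal illustration: the row-major numbering of the regions and the integer rounding of region boundaries are assumptions, and the function name `pyramid_regions` is hypothetical.

```python
import numpy as np

def pyramid_regions(data, H):
    """Divide `data` into h x h regions for each layer h = 1..H.

    Returns a list of (h, i, region) tuples, where i runs from 1 to h*h
    in row-major order (an assumed numbering).
    """
    rows, cols = data.shape[:2]
    regions = []
    for h in range(1, H + 1):
        row_edges = np.linspace(0, rows, h + 1).astype(int)
        col_edges = np.linspace(0, cols, h + 1).astype(int)
        i = 1
        for r in range(h):
            for c in range(h):
                region = data[row_edges[r]:row_edges[r + 1],
                              col_edges[c]:col_edges[c + 1]]
                regions.append((h, i, region))
                i += 1
    return regions

# A 6 x 6 stand-in "image" divided over H = 3 layers.
image = np.arange(36).reshape(6, 6)
regions = pyramid_regions(image, H=3)
```

For H = 3 this yields 1 + 4 + 9 = 14 regions, and the single region of the first layer is the data itself, as the text notes. A feature vector would then be computed per region from the local features that fall inside it.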

The transformation for the i-th region in the h-th layer may be the projection matrix W^(h,i) obtained by solving the eigenvalue problem of equation (11) (where “Λ^(h,i)” in equation (11) is the diagonal matrix obtained by arranging the eigenvalues, which serve as the discriminant criterion, in order along the diagonal), under the following definitions: the number of classes is G (where “G” is an integer of 2 or more); the classes are C_g (where “g” is an integer from 1 to G); the number of sample data, that is, known data extracted as samples from class C_g, is n (where “n” is an integer of 1 or more); the feature vector of the i-th region in the h-th layer of the j-th sample datum (where “j” is an integer from 1 to n) belonging to class C_g is X_j^g(h,i); the average vector of the feature vectors X_j^g(h,i) of the i-th region in the h-th layer of the sample data belonging to class C_g is X̄^g(h,i); the average vector of the feature vectors of the i-th region in the h-th layer of all sample data belonging to class C_g is μ_x^(h,i); the within-class covariance matrix for the i-th region in the h-th layer is Σ_w^(h,i) given in equation (9); and the between-class covariance matrix for the i-th region in the h-th layer is Σ_b^(h,i) given in equation (10). With the feature vector denoted X, the projection matrix denoted W, and the projected point of the feature vector X denoted u, the projected points of the feature vectors of the regions obtained by dividing each sample datum in each layer from the first layer to the H-th layer, and the projected points of the feature vectors of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer, may be derived according to equation (12). With the feature vector of the novel data denoted X_s, the probability p(X_s | C_g) that the feature vector X_s arises from class C_g may be derived based on equation (13). In equation (13), the index (h, i) indicates origin in the i-th region of the h-th layer, the index s indicates origin in the novel data, the index C_g indicates membership in class C_g, the indices 1…n indicate origin in the first to n-th sample data belonging to class C_g, and “α^h” is the weight assigned to the h-th layer. “Z^(h,i)Cg” and “Θ^(h,i)” in equation (13) are as given in equations (14) and (15); ū^(h,j)Cg in equation (14) is the average of the projected points u^(h,j)Cg of the feature vectors X^(h,j)Cg belonging to class C_g; “Ψ^(h,i)” in equations (14) and (15) is the variance of the latent variable given in equation (16); and Λ^(h,i) in equation (16) is the diagonal matrix obtained by arranging in order along the diagonal the eigenvalues that solve the eigenvalue problem for the i-th region in the h-th layer. This makes it possible to derive with higher accuracy the probability that the feature vector of novel data arises from a given class.

The class discrimination method according to the present invention is
a class discrimination method for determining to which of a plurality of classes, each corresponding to at least one piece of known data, novel data representing real-world information belongs, the method comprising:
assuming that each of the novel data and the known data is divided into h × h regions in the h-th layer (where "h" is an integer from 1 to H, and "H" is an integer of 2 or more), deriving for each region a transformation for projecting a feature vector onto a latent space, based on the feature vectors of the regions obtained by dividing each piece of the known data in each layer from the first layer to the H-th layer;
assuming that the novel data is divided into h × h regions in the h-th layer, extracting a plurality of higher-order local feature vectors from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer;
obtaining the mean vector of the higher-order local feature vectors extracted from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer, and the m-th order correlation vectors from the first order to the M-th order among those higher-order local feature vectors (where "M" is an integer of 1 or more, and "m" is an integer from 1 to M), and obtaining the feature vector of each region based on the elements of the mean vector and the elements of the m-th order correlation vectors;
for each class, based on projection points obtained by projecting the feature vector of each region obtained by dividing each piece of the known data in each layer from the first layer to the H-th layer onto the latent space by the transformation corresponding to that region, and projection points obtained by projecting the feature vector of each region obtained by dividing the novel data in each layer from the first layer to the H-th layer onto the latent space by the transformation corresponding to that region, deriving, as the probability that the feature vector of the novel data appears from the class, the sum over i = 1 to i = h² and over the first layer to the H-th layer of the probability that the feature vector of the i-th region in the h-th layer of the novel data appears from the feature vector of the i-th region (where "i" is an integer from 1 to h²) in the h-th layer of the known data; and
setting the class for which the derived probability is largest as the class to which the novel data belongs.

According to this method, the class to which novel data representing real-world information belongs can be determined with high accuracy from the probabilities derived for each class.
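As an illustration of the final selection step only, the following Python sketch picks the class with the largest score, assuming the per-region log-probabilities have already been computed by some other means. The names `classify`, `log_probs` and `layer_weights` are hypothetical, and weighting each layer is an assumption borrowed from the weights α^h mentioned for equation (13); this is not a transcription of the patented procedure.

```python
def classify(log_probs, layer_weights):
    """Return the class with the largest weighted sum of log-probabilities.

    log_probs: {class_label: {h: [log-probability of each of the h*h regions]}}
    layer_weights: {h: weight}, a hypothetical stand-in for alpha^h.
    """
    scores = {}
    for label, per_layer in log_probs.items():
        total = 0.0
        for h, region_values in per_layer.items():
            # Layer h is expected to contribute exactly h*h regions.
            if len(region_values) != h * h:
                raise ValueError("layer %d must have %d regions" % (h, h * h))
            total += layer_weights[h] * sum(region_values)
        scores[label] = total
    # The class with the maximum score is taken as the class of the novel data.
    best = max(scores, key=scores.get)
    return best, scores
```

For example, with two layers and two classes, `classify({"cat": {1: [-1.0], 2: [-1.0] * 4}, "dog": {1: [-0.5], 2: [-2.0] * 4}}, {1: 1.0, 2: 0.5})` selects `"cat"`, because the second layer outweighs the small advantage of `"dog"` in the first.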

The class discrimination program according to the present invention is
a class discrimination program that causes a computer to function as a class discrimination device for determining to which of a plurality of classes, each corresponding to at least one piece of known data, novel data representing real-world information belongs, the program comprising:
a local feature extraction module that, assuming the novel data is divided into h × h regions in the h-th layer (where "h" is an integer from 1 to H, and "H" is an integer of 2 or more), extracts a plurality of higher-order local feature vectors from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer;
a feature vector acquisition module that obtains the mean vector of the higher-order local feature vectors extracted by the local feature extraction module from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer, and the m-th order correlation vectors from the first order to the M-th order among those higher-order local feature vectors (where "M" is an integer of 1 or more, and "m" is an integer from 1 to M), and obtains the feature vector of each region based on the elements of the mean vector and the elements of the m-th order correlation vectors;
a probability derivation module that, for each class, based on projection points obtained by projecting the feature vector of each region obtained by dividing each piece of the known data in each layer from the first layer to the H-th layer onto a latent space by a predetermined transformation corresponding to that region, and projection points obtained by projecting the feature vector of each region obtained by dividing the novel data in each layer from the first layer to the H-th layer onto the latent space by the transformation corresponding to that region, derives, as the probability that the feature vector of the novel data appears from the class, the sum over i = 1 to i = h² and over the first layer to the H-th layer of the probability that the feature vector of the i-th region in the h-th layer of the novel data appears from the feature vector of the i-th region (where "i" is an integer from 1 to h²) in the h-th layer of the known data; and
a class setting module that sets the class for which the probability derived by the probability derivation module is largest as the class to which the novel data belongs.

A computer on which this class discrimination program is installed can determine with high accuracy, from the probabilities derived for each class, to which of the plurality of classes novel data representing real-world information belongs.

FIG. 1 is a schematic configuration diagram showing a robot apparatus according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram illustrating the process of generating a feature vector that represents the feature of image data as a whole.
FIG. 3 is a flowchart showing an example of the feature vector generation routine.
FIG. 4 is a flowchart showing an example of the class discrimination routine.
FIG. 5 is an explanatory diagram illustrating the procedure for determining to which of a plurality of classes novel data belongs.
FIG. 6 is an explanatory diagram illustrating the procedure for determining to which of a plurality of classes novel data belongs.
FIG. 7 is a chart showing evaluation results for the feature representation given by feature vectors generated by the feature vector generation method of the present invention.
FIG. 8 is a chart showing evaluation results for the effectiveness of the class discrimination method of the present invention.
FIG. 9 is a schematic configuration diagram of an image data processing system according to a modification.

Next, modes for carrying out the present invention will be described using an embodiment.

FIG. 1 is a schematic configuration diagram of a robot apparatus 20 according to an embodiment of the present invention. The robot apparatus 20 shown in the figure is a so-called humanoid robot with artificial intelligence. It includes an imaging unit 21 corresponding to human eyes, a sound collection unit 22 corresponding to human ears, a large number of actuators 23 for moving movable parts such as manipulators corresponding to human hands and legs, a sound generation unit (not shown), a control computer 30 that functions as the artificial intelligence, and the like. The control computer 30 includes a CPU, ROM, RAM, a graphics processor (GPU), graphics memory (VRAM), a system bus, various interfaces, and external storage devices such as a hard disk drive and a flash memory drive (SSD), none of which are shown. In the control computer 30, an input/output processing unit 31, a feature amount processing unit 32, a learning processing unit 33, a discrimination processing unit 34, a retrieval processing unit 35, a main control unit 36 and the like are constructed by either or both of this hardware and software, such as the feature amount generation program and the class discrimination program according to the present invention, working in cooperation. The control computer 30 is also connected to a data storage device 40 that stores image data, audio data and the like, a feature amount storage device 41, and a learning information storage device 42.

The input/output processing unit 31 processes information input to and output from the robot apparatus 20 via the imaging unit 21, the sound collection unit 22 and the like. For example, when a spoken command from a human is acquired by the sound collection unit 22, the input/output processing unit 31 processes the audio data from the sound collection unit 22 as appropriate and passes it to the main control unit 36. The feature amount processing unit 32 detects (selects) feature points (key points) of images or sounds in the image data acquired by the imaging unit 21 or the audio data acquired by the sound collection unit 22, for example by grid-based feature point detection, and describes the feature at each feature point, for example using the SIFT descriptor (see 1) and 2) in FIG. 2). It thereby extracts a plurality of higher-order local feature vectors V_k from the target data and stores them in the feature amount storage device 41. Hereinafter, the dimension of the higher-order local feature vectors V_k is denoted "d" (where "d" is an integer of 2 or more). If the number of higher-order local feature vectors V_k extracted from a given piece of image data is "p" (where "p" is an integer of 2 or more), then "k" is an integer from 1 to p. For a monochrome image, feature description with the SIFT descriptor places feature points at a spacing of L pixels and extracts a 128-dimensional local feature vector (Gray-SIFT) from the P × P pixel region centered on each feature point. For a color image, SIFT feature description is performed independently for each of the R, G and B channels at each feature point, and the local features extracted for the three channels are concatenated into a 384-dimensional local feature vector (RGB-SIFT). In the embodiment, to improve robustness to scale, the local feature vectors extracted from both the P = 16 region and the P = 36 region are concatenated to form the final higher-order local feature vector V_k. The feature amount processing unit 32 further generates a feature vector X_j representing the feature of the image data or audio data as a whole (global feature) based on the extracted higher-order local feature vectors V_k and the like (see 3) in FIG. 2), and stores it in the feature amount storage device 41. In addition, the feature amount processing unit 32 extracts a feature vector representing the feature of the data from metadata about the symbols that are associated with image data or audio data and indicate what appears in the image or what the sound means, and stores it in the feature amount storage device 41.
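A minimal sketch of grid-based feature point placement, assuming points are spaced `spacing` pixels apart and that every P × P window must lie entirely inside the image. The function name, the default patch sizes taken from the P = 16 and P = 36 example, and the boundary handling are illustrative assumptions; the text does not specify them, and the descriptor computation itself is omitted.

```python
def grid_keypoints(width, height, spacing, patch_sizes=(16, 36)):
    """Return (x, y) centres of a regular grid of feature points.

    The margin is half of the largest patch, so that every patch size
    in patch_sizes fits inside the image at every returned point.
    """
    half = max(patch_sizes) // 2
    xs = range(half, width - half + 1, spacing)
    ys = range(half, height - half + 1, spacing)
    # Row-major order: all points of the first grid row, then the next row.
    return [(x, y) for y in ys for x in xs]
```

With a 64 × 48 image and a spacing of 10 pixels this yields a 3 × 2 grid starting at (18, 18); an image smaller than the largest patch yields an empty list.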

The learning processing unit 33 performs principal component analysis and the like using the higher-order local feature vectors V_k and the feature vectors X_j to generate and update the learning information needed for the processing of the discrimination processing unit 34 and the retrieval processing unit 35, and stores it in the learning information storage device 42. The discrimination processing unit 34 determines to which of a plurality of classes novel image data or the like captured by the imaging unit 21 belongs. Here, novel image data is unannotated image data, that is, image data to which no metadata (symbols indicating what appears in the image or what a sound means) has been attached. Each class corresponds to a plurality of pieces of known image data or the like, and is a symbol indicating what a plurality of pieces of image data classified as being of the same kind have in common. The discrimination processing unit 34 also annotates unannotated image data and unannotated audio data using the learning information and the like stored in the learning information storage device 42. The retrieval processing unit 35 performs symbol-based search (retrieval) of unannotated image data and unannotated audio data. The main control unit 36 controls the actuators 23, for example by determining the operating mode of the robot apparatus 20 based on commands from the input/output processing unit 31, the processing results of the discrimination processing unit 34, the processing results of the retrieval processing unit 35, and the like.

Next, the procedure for generating a feature vector X_s representing the feature of novel image data I_s as a whole (global feature), where the novel image data I_s has been captured by the imaging unit 21 of the robot apparatus 20 of the embodiment, will be described. FIG. 3 is a flowchart showing an example of the feature vector generation routine executed by the feature amount processing unit 32 of the control computer 30 to generate the feature vector X_s of the novel image data I_s.

At the start of the feature vector generation routine of FIG. 3, the feature amount processing unit 32 inputs the data needed to generate the feature vector X_s, such as the novel image data I_s and the projection matrix U_{dl}, and stores it in a predetermined storage area (memory) (step S100). The projection matrix U_{dl} is one item of the learning information stored in the learning information storage device 42, and is obtained in advance by the learning processing unit 33 from d-dimensional local feature vectors extracted from the known image data (including training data) stored in the data storage device 40. Specifically, the projection matrix U_{dl} is a projection matrix onto a principal component space of dimension dl, which is lower than the dimension d of the local feature vectors (for example, dl = 30). Suppose that there are N pieces of known image data I^{(j)} (where "N" is an integer of 2 or more and "j" is an integer from 1 to N), that the p^{(j)} d-dimensional local feature vectors extracted from one piece of known image data I^{(j)} are V_k^{(j)} = (v_1, …, v_d), that the mean vector of the p^{(j)} d-dimensional local feature vectors V_k^{(j)} is μ^{(j)} given by equation (17), that the autocorrelation matrix of the p^{(j)} d-dimensional local feature vectors V_k^{(j)} is R^{(j)} given by equation (18), and that the autocorrelation matrix of all the d-dimensional local feature vectors extracted from the N pieces of known image data is R_{all} given by equation (19). The projection matrix U_{dl} is then obtained as the solution of the eigenvalue problem of equation (20).
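Equations (17) to (20) are not reproduced in this text. The following LaTeX block is a hedged reconstruction in the conventional form suggested by the definitions above, not a transcription of the original equations; in particular the normalization factors and the equal weighting of the N images in R_{all} are assumptions.

```latex
% Assumed conventional forms; the original equations (17)-(20) may differ
% in normalization.
\begin{align}
\mu^{(j)} &= \frac{1}{p^{(j)}} \sum_{k=1}^{p^{(j)}} V_k^{(j)}
  && \text{(mean vector, cf. (17))} \\
R^{(j)} &= \frac{1}{p^{(j)}} \sum_{k=1}^{p^{(j)}} V_k^{(j)} {V_k^{(j)}}^{\top}
  && \text{(autocorrelation matrix, cf. (18))} \\
R_{all} &= \frac{1}{N} \sum_{j=1}^{N} R^{(j)}
  && \text{(overall autocorrelation, cf. (19))} \\
R_{all}\, U &= U \Lambda
  && \text{(eigenvalue problem, cf. (20))}
\end{align}
% U_{dl} would then consist of the dl eigenvectors (columns of U)
% with the largest eigenvalues.
```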

After the data input of step S100, the feature amount processing unit 32 initializes to 1 a variable h that indicates the layer number used when the novel image data I_s is divided hierarchically (step S110). In the h-th layer, the layer corresponding to the variable h, the novel image data I_s is assumed to be divided into h × h regions, where "h" is an integer from 1 to H and "H" is an integer of 2 or more. After step S110, the feature amount processing unit 32 performs grid-based feature point detection and feature description at each feature point using the SIFT descriptor. It thereby extracts a plurality of d-dimensional local feature vectors V_k^{(h,i)} (in the embodiment, the same number for each region) from each of the regions obtained by dividing the novel image data I_s into h × h equal parts in the h-th layer, and stores them in a predetermined storage area (in the embodiment, the memory and the external storage device, that is, the feature amount storage device 41) (step S120). Here, "i" is an integer from 1 to h² (the number of a region after the h × h division), and the index (h, i) indicates that a quantity derives from the i-th region in the h-th layer. Because the novel image data I_s is divided into 1 × 1 regions in the first layer, when h = 1 step S120 extracts a plurality of d-dimensional local feature vectors V_k^{(1,1)} from the novel image data I_s as a whole. The feature amount processing unit 32 then performs the same calculation as equation (17) for each of the first to h²-th regions of the h-th layer to derive the mean vector μ_{Xs}^{(h,i)} of the d-dimensional local feature vectors V_k^{(h,i)} extracted from each region, and stores it in a predetermined storage area (memory) (step S130).
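The division into h × h regions and the per-region mean of step S130 can be sketched as follows. This is an illustration under stated assumptions: regions are numbered row by row from the top-left (the text does not give the numbering order), feature points are given as pixel coordinates, and `region_index` and `region_means` are hypothetical names.

```python
def region_index(x, y, width, height, h):
    """1-based index i (1..h*h) of the h x h cell containing pixel (x, y).

    Row-major numbering is an assumption made for this sketch.
    """
    col = min(x * h // width, h - 1)
    row = min(y * h // height, h - 1)
    return row * h + col + 1


def region_means(points, vectors, width, height, h):
    """Mean local feature vector of each region; empty regions map to None."""
    d = len(vectors[0])
    sums = {i: [0.0] * d for i in range(1, h * h + 1)}
    counts = {i: 0 for i in sums}
    for (x, y), v in zip(points, vectors):
        i = region_index(x, y, width, height, h)
        counts[i] += 1
        for a in range(d):
            sums[i][a] += v[a]
    return {i: ([s / counts[i] for s in sums[i]] if counts[i] else None)
            for i in sums}
```

With h = 1 every feature point falls in region 1, which matches the statement that the first layer covers the image as a whole.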

Next, the feature amount processing unit 32 initializes a predetermined variable m to 1 (step S140) and then derives, for each of the first to h²-th regions of the h-th layer, the m-th order correlation vector among the d-dimensional local feature vectors V_k^{(h,i)} extracted from that region (where "m" is an integer from 1 to M and "M" is an integer of 1 or more) (step S150). When m = 1, in step S150 the feature amount processing unit 32 obtains, for each of the first to h²-th regions of the h-th layer, the autocorrelation matrix R^{(h,i)} of the d-dimensional local feature vectors V_k^{(h,i)} extracted from the i-th region of the h-th layer according to equation (18). It then obtains the diagonal matrix U_{dl}^T R^{(h,i)} U_{dl} from this autocorrelation matrix R^{(h,i)}, the projection matrix U_{dl} input in step S100 and its transpose U_{dl}^T, lists the elements of the upper triangular part of U_{dl}^T R^{(h,i)} U_{dl} to obtain the first-order correlation vector upper(U_{dl}^T R^{(h,i)} U_{dl}), and stores it in a predetermined storage area (memory). As described above, the projection matrix U_{dl} is a projection matrix onto a principal component space of dimension dl (for example, dl = 30), which is lower than the dimension d of the local feature vectors (for example, d = 128 or 384).
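The first-order correlation vector upper(U_dl^T R U_dl) can be sketched in plain Python as below. The 1/p normalization of the autocorrelation matrix is an assumption (equation (18) is not reproduced in this text), the function names are hypothetical, and the projection matrix is passed in as a list of d rows of dl values rather than being learned here.

```python
def autocorrelation(vectors):
    """R = (1/p) * sum_k v_k v_k^T for p vectors of dimension d (assumed form)."""
    p, d = len(vectors), len(vectors[0])
    return [[sum(v[a] * v[b] for v in vectors) / p for b in range(d)]
            for a in range(d)]


def project(R, U):
    """Return U^T R U, where U is a d x dl matrix given as a list of rows."""
    d, dl = len(U), len(U[0])
    RU = [[sum(R[a][c] * U[c][b] for c in range(d)) for b in range(dl)]
          for a in range(d)]
    return [[sum(U[c][a] * RU[c][b] for c in range(d)) for b in range(dl)]
            for a in range(dl)]


def upper(M):
    """List the upper-triangular elements, diagonal included, row by row."""
    return [M[a][b] for a in range(len(M)) for b in range(a, len(M))]
```

Because the projected matrix is symmetric, its upper triangle carries all of its distinct entries, so the resulting vector has dl(dl + 1)/2 elements.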

After deriving the m-th order correlation vector upper(U_{dl}^T R^{(h,i)} U_{dl}) in step S150, the feature amount processing unit 32 determines whether the variable m equals its maximum value M (step S160). If m is less than M, it increments m (step S170) and executes step S150 again. When the maximum value M is 2 or more, it is likewise preferable to apply suitable dimensionality reduction to the d-dimensional local feature vectors when deriving the m-th order correlation vectors. When it determines in step S160 that m equals M, the feature amount processing unit 32 generates, for each of the first to h²-th regions of the h-th layer, the feature vector X_s^{(h,i)} of the novel image data I_s by listing in order, according to equation (21), the elements of the mean vector μ_{Xs}^{(h,i)} of the d-dimensional local feature vectors V_k^{(h,i)} and the elements of the m-th order correlation vectors upper(U_{dl}^T R^{(h,i)} U_{dl}), and stores it in a predetermined storage area (in the embodiment, the memory and the external storage device, that is, the feature amount storage device 41) (step S180). After step S180, the feature amount processing unit 32 determines whether the variable h equals its maximum value H (in the embodiment, for example, H = 3) (step S190). If h is less than H, it increments h (step S200) and executes the processing from step S120 onward again. When it is determined in step S190 that h equals H, the feature vectors X_s^{(h,i)} of all the regions obtained by dividing the novel image data I_s into h × h parts in each layer from the first layer to the H-th layer have been obtained, and the routine ends.
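To give a sense of scale, the sketch below enumerates the regions visited by the loop over h and estimates the length of one region's feature vector. It assumes M = 1, and that the feature vector is the d elements of the mean vector followed by the dl(dl + 1)/2 upper-triangular elements of the projected autocorrelation matrix; equation (21) is not reproduced in this text, so this layout is an assumption, and both function names are hypothetical.

```python
def pyramid_regions(H):
    """All (h, i) pairs visited by the routine: layers 1..H, regions 1..h*h."""
    return [(h, i) for h in range(1, H + 1) for i in range(1, h * h + 1)]


def feature_dimension(d, dl):
    """Assumed length of one region's feature vector for M = 1."""
    return d + dl * (dl + 1) // 2
```

Under these assumptions, H = 3 gives 1 + 4 + 9 = 14 regions, and d = 128 with dl = 30 gives 128 + 465 = 593 elements per region.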

Next, the procedure for determining to which of a plurality of classes C_1, …, C_g, …, C_G, each corresponding to a plurality of pieces of known image data, the novel image data I_s captured by the imaging unit 21 of the robot apparatus 20 of the embodiment belongs will be described (where "g" is an integer from 1 to G and "G" is an integer of 2 or more). FIG. 4 is a flowchart showing an example of the class discrimination routine executed by the discrimination processing unit 34 of the control computer 30 to determine to which of the classes C_1 to C_G the novel image data I_s belongs.

The class discrimination routine illustrated in FIG. 4 is built on the framework of probabilistic linear discriminant analysis. In this framework, let n be the number of pieces of known image data extracted as samples from class C_g (hereinafter "sample data"), X_j^g the feature vector of the j-th sample data belonging to class C_g (where "j" is an integer from 1 to n), X̄^g the mean of the feature vectors X_j^g of the sample data belonging to class C_g (in this specification and the claims, a superscript bar denotes an overline, written here as X̄), μ_x the mean of the feature vectors X_j^g of all the sample data, Σ_w the within-class covariance matrix given by equation (22), and Σ_b the between-class covariance matrix given by equation (23). The generalized eigenvalue problem of equation (24) is then formulated. Solving the eigenvalue problem of equation (24) yields the projection matrix W, the transformation that projects the feature vectors X_j^g onto the latent space. Here "Λ" in equation (24) is a diagonal matrix of eigenvalues, which serve as discriminant criteria, arranged in order along the diagonal. When the number of sample data is not sufficiently large relative to the dimension of the feature vectors, a regularization term γI is preferably added to the within-class covariance matrix Σ_w obtained from equation (22), as shown in equation (25), to suppress overfitting (where "γ" is a parameter determined experimentally). With the projection matrix W obtained in this way, the projection point (vector) u^{(h,i)} in the latent space of a feature vector X^{(h,i)} obtained by the feature vector generation routine described above can be derived according to equation (26). Using the structure shown in equations (22) to (26) and so on, the probability p(u_s | C_g) that the projection point u_s of the feature vector X_s of the novel image data I_s appears from a class C_g can then be derived according to equation (27). The index 1…n in equation (27) indicates that a quantity derives from the first to n-th sample data belonging to class C_g, and "Ψ" is the variance of the latent variable given by equation (28).
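Equations (22) to (26) are not reproduced in this text. For orientation, the following LaTeX block gives the conventional linear discriminant analysis forms that match the definitions above. It is a hedged sketch, not a transcription: the normalization constants and the exact form of the projection in equation (26) are assumptions, and the probability of equation (27) and the variance Ψ of equation (28) are not reconstructed here.

```latex
% Assumed conventional forms, up to normalization constants.
\begin{align}
\Sigma_w &= \frac{1}{Gn} \sum_{g=1}^{G} \sum_{j=1}^{n}
  \left(X_j^{g} - \bar{X}^{g}\right)\left(X_j^{g} - \bar{X}^{g}\right)^{\top}
  && \text{(within-class, cf. (22))} \\
\Sigma_b &= \frac{1}{G} \sum_{g=1}^{G}
  \left(\bar{X}^{g} - \mu_x\right)\left(\bar{X}^{g} - \mu_x\right)^{\top}
  && \text{(between-class, cf. (23))} \\
\Sigma_b W &= \Sigma_w W \Lambda
  && \text{(generalized eigenvalue problem, cf. (24))} \\
\Sigma_w &\leftarrow \Sigma_w + \gamma I
  && \text{(regularization, cf. (25))} \\
u &= W^{\top}\left(X - \mu_x\right)
  && \text{(projection point, cf. (26))}
\end{align}
```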

When probabilistic linear discriminant analysis is used, the probability p(u_s | C_g) that the projection point u_s of the feature vector X_s of novel image data I_s appears from a class C_g can be obtained according to equation (27) above. In the robot apparatus 20 of the embodiment, however, an extension that multiplexes the latent space is introduced into probabilistic linear discriminant analysis so that the probability that the novel image data I_s (its feature vector X_s) appears from a class C_g can be derived with higher accuracy. Specifically, let X_j^g(h,i) be the feature vector of the i-th region in the h-th layer of the j-th sample data item belonging to class C_g, let X^-g(h,i) be the mean vector of the feature vectors X_j^g(h,i) of the i-th region in the h-th layer of the sample data belonging to class C_g, let μ_x^(h,i) be the mean vector of the feature vectors of the i-th region in the h-th layer of all sample data belonging to class C_g, let Σ_w^(h,i) be the intra-class covariance matrix for the i-th region in the h-th layer given by equation (29) below, and let Σ_b^(h,i) be the out-of-class covariance matrix for the i-th region in the h-th layer given by equation (30) below. Solving the eigenvalue problem of equation (31) below for each i-th region in the h-th layer yields in advance, for each such region, a projection matrix W^(h,i), which is the transformation that projects the feature vector of the i-th region in the h-th layer into the latent space corresponding to that region (where "Λ^(h,i)" in equation (31) is the diagonal matrix obtained by arranging the eigenvalues, which serve as the discrimination criterion, in order along the diagonal).

Then, for each of the classes C_1 to C_G, the robot apparatus 20 uses the projection points u^(h,i) obtained by projecting into the latent space, with the projection matrix W^(h,i), the feature vector X^(h,i) of each region obtained by dividing each sample data item into h × h regions in each layer from the first to the H-th; the projection points u_s^(h,i) obtained by projecting into the latent space, with the projection matrix W^(h,i), the feature vector X_s^(h,i) of each region obtained by dividing the novel image data I_s into h × h regions in each layer from the first to the H-th; and equation (32) below. From these it derives, as the probability p(X_s | C_g) that the feature vector X_s of the novel image data I_s appears from the class C_g, the sum over i = 1 to i = h² and over the first to H-th layers of the probability that the feature vector X_s^(h,i) of the i-th region in the h-th layer of the novel image data I_s appears from the feature vector X^(h,i) of the i-th region in the h-th layer of the sample data. In equation (32), the subscript (h,i) indicates derivation from the i-th region in the h-th layer, the subscript s indicates derivation from the novel data, the subscript C_g indicates membership in class C_g, and the subscript 1…n indicates derivation from the first to n-th sample data items belonging to class C_g. "α^h" is a weight assigned to the h-th layer and determined experimentally in advance. "Z^(h,i)Cg" and "Θ^(h,i)" in equation (32) are as given by equations (33) and (34) below; u^-(h,i)Cg in equation (33) is the mean of the projection points u^(h,i)Cg of the feature vectors X^(h,i)Cg belonging to class C_g; "Ψ^(h,i)" in equations (33) and (34) is the variance of the latent variable given by equation (35) below; and Λ^(h,i) in equation (35) is the diagonal matrix obtained by arranging in order along the diagonal the eigenvalues that solve the eigenvalue problem for the i-th region in the h-th layer.

FIGS. 5 and 6 schematically show how the probability p(X_s | C_g) is derived in the robot apparatus 20 of the embodiment. Equation (32) above expresses the log-likelihood (weighted log-likelihood) of the probability p(X_s | C_g) that the feature vector X_s of the novel image data I_s appears from class C_g, and transforming equation (32) as shown in equation (36) below makes it possible to derive the probability that the feature vector X_s of the novel image data I_s appears from a class C_g. The class discrimination routine of FIG. 4 is executed to derive the probability p(X_s | C_g) for each of the classes C_1 to C_G and to determine the class C_g for which the probability p(X_s | C_g) is maximal.
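The weighted log-likelihood over layers and regions described above can be sketched as follows. This is an illustration under stated assumptions, not a transcription of equations (32) to (35), which are not reproduced in this text. It assumes the usual latent model of probabilistic linear discriminant analysis (a class centre with per-dimension variance Ψ and unit within-class variance), for which the predictive distribution is a diagonal Gaussian. The function names and the data layout are hypothetical.

```python
import math

def predictive_log_density(u_s, u_mean, psi, n):
    """Log density of the novel projection u_s given n class samples with mean u_mean.

    Assumed model per dimension: centre ~ N(0, psi), sample ~ N(centre, 1).
    The predictive mean is then n*psi/(n*psi+1) * u_mean and the
    predictive variance is 1 + psi/(n*psi+1).
    """
    total = 0.0
    for x, m, v in zip(u_s, u_mean, psi):
        mean = n * v / (n * v + 1.0) * m
        var = 1.0 + v / (n * v + 1.0)
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
    return total

def weighted_log_likelihood(regions, alpha):
    """Sum region log densities within each layer, weight by alpha[h], sum over layers.

    regions[h] holds one (u_s, u_mean, psi, n) tuple per region of layer h + 1,
    so layer h + 1 is expected to hold (h + 1) ** 2 tuples.
    """
    score = 0.0
    for h, layer in enumerate(regions):
        score += alpha[h] * sum(predictive_log_density(*r) for r in layer)
    return score

# Two layers: 1 region in layer 1, 4 regions in layer 2 (1-dimensional toy data).
region = ([0.0], [0.0], [0.0], 1)
score = weighted_log_likelihood([[region], [region] * 4], alpha=[1.0, 0.5])
```

Evaluating this score once per class and taking the largest value corresponds to choosing the class with maximal p(X_s | C_g).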

At the start of the class discrimination routine of FIG. 4, the discrimination processing unit 34 inputs the data needed to determine the class of the novel image data I_s and stores it in a predetermined storage area (memory) (step S300). This data includes the feature vectors X_s^(h,i) of the regions obtained by dividing the novel image data I_s into h × h regions in each layer from the first to the H-th, which were obtained by executing the feature vector generation routine described above and are held in a predetermined storage area, and the learning information stored in the learning information storage device 42. The learning information input in step S300 includes the projection matrices W^(h,i), which the learning processing unit 33 derived for each region obtained by dividing each sample data item in each layer from the first to the H-th and which are stored in the learning information storage device 42; the projection points (vectors) u^(h,i) obtained by projecting into the latent space, with the projection matrix W^(h,i), the feature vector X^(h,i) of each such region; and the weights α^h for the first to H-th layers, which were determined in advance and are stored in the learning information storage device 42. In the embodiment, n sample data items are determined in advance for each of the classes C_1 to C_G as samples for class discrimination, and the projection matrices W^(h,i) are obtained in advance from the feature vectors X_j^g(h,i) of the sample data and the like and stored in the learning information storage device 42. The projection point u^(h,i) of each sample data item is derived by the learning processing unit 33, after it has derived the projection matrix W^(h,i), according to a transformation similar to equation (26) above, and is stored in the learning information storage device 42.

After the data input of step S300, the discrimination processing unit 34 derives the projection points u_s^(h,i) in the latent space of all the feature vectors X_s^(h,i) of the novel image data I_s from the input feature vectors X_s^(h,i), the projection matrices W^(h,i), and the like, according to a transformation similar to equation (26) above, and stores them in a predetermined storage area (memory) (step S310). The discrimination processing unit 34 then initializes a variable g identifying the class to 1 (step S320), initializes a variable h indicating the layer number to 1 (step S330), and initializes a variable i indicating the region number in the h-th layer to 1 (step S340). Next, using the information input in step S300, the discrimination processing unit 34 calculates the value of the term q in equation (36) above (step S350), derives the value Q = Q + q so as to accumulate the values of the term q, and stores it in a predetermined storage area (memory) (step S360). After step S360, the discrimination processing unit 34 determines whether the variable i equals the maximum value h² (the total number of regions in the h-th layer) (step S370). If the variable i is less than h², it increments i (step S380) and executes steps S350 and S360 again.

When it is determined in step S370 that the variable i equals the maximum value h², the sum of the probabilities that the feature vectors X_s^(h,i) of the regions in the h-th layer of the novel image data I_s appear from the feature vectors X^(h,i) of the corresponding regions in the h-th layer of the sample data has been derived. That is, when h = 1, at the point of an affirmative determination in step S370, the value Q represents, as can be seen from FIG. 5, the sum of the probabilities that the novel image data I_s itself (its feature vector X_s) appears from each sample data item itself (its feature vector X_j^g) in a class C_g. When h = 2, at the point of an affirmative determination in step S370, the value Q represents, as can be seen from FIG. 6, the sum of the probabilities that each region in the second layer of the novel image data I_s (its feature vector X_s^(2,i)) appears from the corresponding region in the second layer of each sample data item in a class C_g (its feature vector X^(2,i)). Accordingly, on determining in step S370 that the variable i equals h², the discrimination processing unit 34 derives the value P = P + Q so as to obtain the sum of the values Q from the first to the H-th layer, stores it in a predetermined storage area (memory) (step S390), and determines whether the variable h equals the maximum value H (step S400). If the variable h is less than H, the discrimination processing unit 34 increments h (step S410) and executes steps S340 to S390 again.

When it is determined in step S400 that the variable h equals the maximum value H, the sum over the first to H-th layers of the probability that the feature vector X_s^(h,i) of the i-th region in the h-th layer of the novel image data I_s appears from the feature vector X^(h,i) of the i-th region in the h-th layer of the sample data has been derived for a class C_g. Accordingly, on determining in step S400 that the variable h equals H, the discrimination processing unit 34 sets the probability p(X_s | C_g) that the feature vector X_s of the novel image data appears from class C_g to the value exp(−P/2), stores it in a predetermined storage area (memory) (step S420), and determines whether the variable g equals the maximum value G (step S430). If the variable g is less than G, the discrimination processing unit 34 increments g (step S440) and executes steps S330 to S420 again. Once the variable g is determined in step S430 to equal G, the derivation of the probability p(X_s | C_g) for all classes C_g is complete. The discrimination processing unit 34 then finds the class C_gmax for which the obtained probability p(X_s | C_g) is maximal, attaches metadata such as a symbol to the novel image data I_s and its feature vector X_s so that the novel image data I_s can be identified as belonging to class C_gmax (step S450), and ends the routine. The class discrimination routine described so far makes it possible to determine with high accuracy which of the classes C_1 to C_G the novel image data I_s belongs to.
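The control flow of steps S320 to S450 can be sketched as three nested loops. This is a minimal illustration, not the embodiment's implementation. The callable `q` stands in for the term q of equation (36), which is not reproduced in this text, so its form is hypothetical (any layer weight α^h is assumed to be folded into it). The description does not state where Q and P are reset; resetting Q for each layer and P for each class is an assumption made here so that nothing is counted twice.

```python
import math

def classify(q, num_classes, num_layers):
    """Return (best_class, probabilities) following steps S320 to S450.

    q(g, h, i) gives the term q for class g, layer h, region i (all 1-based).
    """
    probs = {}
    for g in range(1, num_classes + 1):        # S320, S430, S440
        P = 0.0                                 # assumed reset per class
        for h in range(1, num_layers + 1):     # S330, S400, S410
            Q = 0.0                             # assumed reset per layer
            for i in range(1, h * h + 1):      # S340, S370, S380
                Q += q(g, h, i)                 # S350, S360
            P += Q                              # S390
        probs[g] = math.exp(-P / 2.0)           # S420
    best = max(probs, key=probs.get)            # S450: class with maximal probability
    return best, probs

# Toy term: every region of class g contributes g, so class 1 scores highest.
best, probs = classify(lambda g, h, i: float(g), num_classes=3, num_layers=2)
```

With H = 2 each class visits 1 + 4 = 5 regions, which is why the toy term yields P = 5g.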

As described above, in the robot apparatus 20 of the embodiment, the feature vector generation routine of FIG. 3 obtains a feature vector X_s representing the features of the whole of the novel image data I_s, which is data indicating real-world information. The feature vector is built from the elements of the mean vector μ^(j) of the p^(j) d-dimensional local feature vectors V_k^(j) extracted from the novel image data I_s and the elements of the m-th order correlation vectors, from the first order to the M-th order, among those p^(j) d-dimensional local feature vectors V_k^(j). The m-th order correlation vectors can be obtained with far lighter computation than, for example, clustering, and they capture well the correlation between an important feature point and its surrounding feature points, that is, the distribution information of the local features. As a result, the robot apparatus 20 of the embodiment can quickly generate, at greatly reduced computational cost, a feature vector X_s that accurately represents the features of the whole of the novel image data I_s (or known image data), that is, a feature quantity with high feature expressiveness, from the many relatively high-dimensional local feature vectors V_k^(j) extracted from the novel image data I_s (or known image data).
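As a rough illustration of combining a mean vector with a first-order correlation (the case M = 1), the sketch below concatenates the mean of the local feature vectors with the averaged outer product over pairs of local feature vectors. The precise definition of the m-th order correlation vectors, including which feature points are paired and whether redundant elements are dropped, is given earlier in the description and is not reproduced here. The pairing is therefore left to the caller, and all function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def first_order_correlation(pairs):
    """Averaged outer product over (v, w) pairs, flattened row by row."""
    count = len(pairs)
    d = len(pairs[0][0])
    acc = [0.0] * (d * d)
    for v, w in pairs:
        for a in range(d):
            for b in range(d):
                acc[a * d + b] += v[a] * w[b]
    return [x / count for x in acc]

def feature_vector(vectors, pairs):
    """Concatenate the mean vector and the first-order correlation elements."""
    return mean_vector(vectors) + first_order_correlation(pairs)

# Two 2-dimensional local feature vectors, paired with each other once.
local = [[1.0, 2.0], [3.0, 4.0]]
X = feature_vector(local, pairs=[(local[0], local[1])])
```

Both parts are single passes over the local feature vectors, which is consistent with the description's point that no clustering is needed.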

In the robot apparatus 20 of the embodiment, moreover, the d-dimensional local feature vectors V_k^(j) are dimensionally compressed when the m-th order correlation vectors are derived, using a projection matrix U_dl onto a principal component space of dimension dl, which is lower than d. This reduces the computational cost of deriving the m-th order correlation vectors when the dimension d of the local feature vectors V_k^(j) is high, and the compression also removes local features that are unnecessary for representing the features of the data as a whole. If the projection matrix U_dl is obtained in advance from N data items I^(j), then when novel image data I^(j+1) appears, the feature vector X^(j+1) of that novel data I^(j+1) can be obtained quickly.
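The compression step amounts to multiplying each local feature vector by a precomputed matrix. A minimal sketch follows, assuming U_dl is stored as d rows of dl entries and that any mean-centering has already been applied. The function name is hypothetical, and deriving U_dl itself by principal component analysis is not shown.

```python
def compress(v, U):
    """Project a d-dimensional vector v to dl dimensions as U^T v.

    U is a d x dl matrix given as a list of d rows.
    """
    d, dl = len(U), len(U[0])
    if len(v) != d:
        raise ValueError("vector length must match the number of rows of U")
    return [sum(U[r][c] * v[r] for r in range(d)) for c in range(dl)]

# Toy projection that keeps the first two of three components.
U = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
compressed = compress([5.0, 7.0, 9.0], U)
```

If the correlation cost grows with the square of the dimension, then with d = 128 and dl = 30 (values quoted elsewhere in this description) the number of component pairs falls from 128² = 16384 to 30² = 900.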

Furthermore, in determining which of the classes C_1 to C_G the novel image data I_s belongs to, the robot apparatus 20 of the embodiment uses the feature vectors X^(h,i) and X_s^(h,i) of the regions obtained by dividing each of the sample data items and the novel image data I_s into h × h regions in each layer from the first to the H-th. Each of these feature vectors is obtained from the elements of the mean vector of the d-dimensional local feature vectors extracted from the region and the elements of the m-th order correlation vectors, from the first order to the M-th order, among those high-dimensional local feature vectors. As described above, the feature vectors X^(h,i) and X_s^(h,i) can be obtained at low computational cost and represent the features of the target region well. Then, for each of the classes C_1 to C_G, the robot apparatus 20 of the embodiment derives, as the probability that the feature vector X_s of the novel image data I_s appears from the class C_g, the sum over i = 1 to i = h² and over the first to H-th layers of the probability that the feature vector X_s^(h,i) of the i-th region in the h-th layer of the novel image data I_s appears from the feature vector X^(h,i) of the i-th region in the h-th layer of the sample data. This derivation is based on the projection points u^(h,i), obtained by projecting into the latent space with the projection matrix W^(h,i) the feature vector X^(h,i) of each region of each sample data item, and the projection points u_s^(h,i), obtained by projecting into the latent space with the projection matrix W^(h,i) the feature vector X_s^(h,i) of each region of the novel image data I_s.

In this way, by using feature vectors X^(h,i) and X_s^(h,i) that can be obtained at low computational cost and have high feature expressiveness, together with a method that extends probabilistic linear discriminant analysis by multiplexing the latent space, the probability p(X_s | C_g) that the feature vector X_s of the novel image data I_s appears from a class C_g can be derived with higher accuracy. The robot apparatus 20 of the embodiment can therefore determine more accurately, from the probability p(X_s | C_g) derived for each class C_g, which of the classes C_1 to C_G the novel image data I_s belongs to. In the framework of probabilistic linear discriminant analysis, learning is completed by solving the generalized eigenvalue problem once, the computational cost is linear in the number of samples, memory usage is very small, and the computational cost remains essentially unchanged as the number of classes increases. Consequently, the class discrimination method according to the present invention, which extends probabilistic linear discriminant analysis by multiplexing the latent space, enables fast learning and low computational cost in class discrimination even when applied to large-scale problems.

As described above, by storing the necessary learning information in the learning information storage device 42, the robot apparatus 20 of the embodiment can quickly generate the required feature vector X_s and the like from the novel image data I_s acquired by the imaging unit 21, and can determine quickly and with high accuracy which of the classes C_1 to C_G the novel image data I_s belongs to. This allows the robot apparatus 20 to judge quickly and accurately what the acquired real-world information, that is, what it has seen and heard, represents. The autonomous behavior of the robot apparatus 20 can thus be brought closer to human behavior, and its level of intelligence can be raised further.

Here, the feature vector generation method according to the present invention and the Bag-of-Keypoints method, an existing feature vector generation technique, are compared in terms of the computational cost of generating the feature vector of each image data item from p d-dimensional local feature vectors extracted from each of N image data items. The cost of each method is evaluated in two processes: preprocessing and feature vector generation. In the feature vector generation method according to the present invention, deriving the projection matrix needed for the dimensional compression of the d-dimensional local feature vectors by principal component analysis corresponds to preprocessing, and steps S130 to S180 of FIG. 3 correspond to feature vector generation. In the Bag-of-Keypoints method, clustering the d-dimensional local feature vectors and deriving the representative vectors of the clusters (visual words) corresponds to preprocessing, and assigning local features to the nearest visual words corresponds to feature vector generation.

First, consider the order of the computational cost of preprocessing. For the feature vector generation method according to the present invention, the order Oa is Oa ∝ p·N·d², whereas for the Bag-of-Keypoints method, with V denoting the number of representative vectors (visual words), the order Oa_bag is Oa_bag ∝ p·N·V·d. In general, the number V of visual words in the Bag-of-Keypoints method is larger than the dimension d of the local feature vectors and grows with the scale of the problem, so the preprocessing cost of the feature vector generation method according to the present invention can be said to be considerably lower than that of the Bag-of-Keypoints method. The order Om of memory usage in preprocessing is Om ∝ d² for the feature vector generation method according to the present invention, whereas the order Om_bag for the Bag-of-Keypoints method is Om_bag ∝ p·N·d. In general, the number N of image data items is larger than the dimension d of the local feature vectors and grows with the scale of the problem, so the memory usage in preprocessing of the feature vector generation method according to the present invention can be said to be considerably smaller than that of the Bag-of-Keypoints method. With N = 800, p = 600, d = 128, and V = 1500, the inventors ran the preprocessing of both methods on an existing general-purpose personal computer. Preprocessing for the Bag-of-Keypoints method (clustering and derivation of the visual words) sometimes took about 18 hours, whereas preprocessing for the method according to the present invention completed in about 90 seconds.
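The preprocessing orders above can be evaluated for the quoted parameter values as a quick check. The function name is hypothetical, and the figures are orders with proportionality constants omitted, not predicted run times.

```python
def preprocessing_orders(N, p, d, V):
    """Evaluate the stated orders of compute and memory for both methods."""
    return {
        "compute_proposed": p * N * d ** 2,   # Oa     ∝ p·N·d²
        "compute_bag": p * N * V * d,         # Oa_bag ∝ p·N·V·d
        "memory_proposed": d ** 2,            # Om     ∝ d²
        "memory_bag": p * N * d,              # Om_bag ∝ p·N·d
    }

orders = preprocessing_orders(N=800, p=600, d=128, V=1500)
compute_ratio = orders["compute_bag"] / orders["compute_proposed"]  # equals V / d
memory_ratio = orders["memory_bag"] / orders["memory_proposed"]     # equals p·N / d
print(compute_ratio, memory_ratio)  # 11.71875 3750.0
```

The measured gap (about 18 hours against about 90 seconds, a factor of roughly 720) is much larger than the compute ratio of about 11.7. One plausible reason is that the orders omit constants such as the number of clustering iterations, although the description does not break the timing down.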

Next, consider the order of the computational cost of feature vector generation. For the feature vector generation processing of the present invention, the order Of is Of ∝ p·d², whereas for the Bag-of-Keypoints method the order Of_bag is Of_bag ∝ p·V·d. As noted above, the number V of visual words in the Bag-of-Keypoints method is larger than the dimension d of the local feature vectors and grows with the scale of the problem, so the cost of the feature vector generation processing of the present invention can also be said to be considerably lower than that of the Bag-of-Keypoints method. With p = 600, d = 128, and V = 1500, the inventors ran the feature vector generation processing of both methods on an existing general-purpose personal computer. The Bag-of-Keypoints method took about 860 msec to generate the feature vector for one image data item, whereas the feature vector generation processing of the present invention generated the feature vector for one image data item in about 60 msec. These results show that the feature vector generation method according to the present invention has a marked advantage over the existing method in computational cost.
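As a worked check of the feature vector generation figures quoted above (orders only, with proportionality constants omitted):

```latex
\[
\frac{Of_{\mathrm{bag}}}{Of} = \frac{p \cdot V \cdot d}{p \cdot d^{2}} = \frac{V}{d}
  = \frac{1500}{128} \approx 11.7,
\qquad
\frac{860\ \mathrm{msec}}{60\ \mathrm{msec}} \approx 14.3 .
\]
```

The measured ratio of about 14 is of the same magnitude as the order ratio V/d of about 12.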

Next, the feature expressiveness of the feature vectors generated by the feature vector generation method according to the present invention is evaluated. To this end, the inventors performed scene discrimination on a data set called “OT8” using two kinds of combinations. The first combines each of four local feature descriptors (edge histogram, HSV color histogram, “Gray-SIFT”, and “RGB-SIFT”) with the feature vector obtained by the feature vector generation method according to the present invention. The second combines each of the same four descriptors with a technique that uses the mean vector of the high-dimensional local feature vectors (the zeroth-order correlation vector) as the feature vector (hereinafter referred to as “Mean”). The scene discrimination rates of the feature vector generation method according to the present invention and of “Mean” were compared for each local feature descriptor. FIG. 7 shows the comparison. “OT8” is a data set containing eight classes of scenes (color images): “coast”, “forest”, “mountain”, “open country”, “highway”, “inside city”, “tall building”, and “street” (see A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial). A 72-dimensional gradient orientation histogram was used as the edge histogram, with local features extracted from the gray image. The color histogram was a standard 84-dimensional histogram in HSV space (H: 36 dimensions, S: 32 dimensions, V: 16 dimensions). For these local feature descriptors, the local feature extraction window was fixed at 10 × 10 pixels with L = 5. For the SIFT descriptors, the local feature extraction window was fixed at 16 × 16 pixels with L = 5, and because these descriptors are higher-dimensional than the others, dimension compression by principal component analysis was applied (dl = 30). In applying the feature vector generation method according to the present invention, the maximum order M of the correlation vectors was set to M = 1.

As can be seen from FIG. 7, whichever of the four types of local feature descriptors is used, using the feature vector obtained by the feature vector generation method of the present invention greatly improves scene discrimination performance compared with using “Mean”. The evaluation results shown in FIG. 7 also indicate that, by using the m-th order correlation vector when generating the feature vector of image data, the correlation between important feature points of the image and their surrounding feature points, that is, the distribution information of the local features, is well reflected in the feature vector.
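For illustration only, the difference between “Mean” and the M = 1 feature vector compared above can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in the evaluation: the function name `mean_and_first_order_feature` is hypothetical, and normalizing the autocorrelation matrix by the number of local feature vectors p is an assumption.

```python
def mean_and_first_order_feature(vectors):
    """Build a global feature from p local feature vectors of dimension d.

    Returns the mean vector followed by the upper-triangular elements of
    the autocorrelation matrix (the first-order correlation vector).
    Dividing the autocorrelation sums by p is an assumption of this sketch.
    """
    p = len(vectors)
    d = len(vectors[0])
    # 0th-order part: this alone corresponds to the "Mean" baseline.
    mean = [sum(v[a] for v in vectors) / p for a in range(d)]
    # 1st-order part: R[a][b] = (1/p) * sum_k v_k[a] * v_k[b], for a <= b only,
    # because R is symmetric and the lower triangle carries no new information.
    upper = []
    for a in range(d):
        for b in range(a, d):
            upper.append(sum(v[a] * v[b] for v in vectors) / p)
    return mean + upper


# Two toy 2-dimensional local feature vectors (hypothetical values).
local_vectors = [[1.0, 2.0], [3.0, 4.0]]
feature = mean_and_first_order_feature(local_vectors)
print(feature)  # [2.0, 3.0, 5.0, 7.0, 10.0]
```

For local feature vectors of dimension d, the resulting feature vector has d + d(d + 1)/2 elements, so its length is fixed regardless of how many local feature vectors are extracted from the data.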

Next, the effectiveness of the class discrimination method according to the present invention, which uses the feature vector generated by the feature vector generation method according to the present invention, is evaluated. To this end, the present inventors performed scene discrimination on three data sets, namely the above-mentioned “OT8”, “LSP15” and “Caltech-101”, using the class discrimination method according to the present invention with the above feature vector (H = 3), a class discrimination method using only probabilistic linear discriminant analysis with the above feature vector (H = 1: reference), and a plurality of existing methods, and compared the scene discrimination rate of each method. For the existing methods, “NO-SI” variants, which do not include image position information (spatial information), and “SI” variants, which do include it, were prepared as appropriate. Here, “LSP15” is a data set containing a total of 15 classes of scenes (monochrome images), consisting of the eight scene classes of “OT8” plus “bed room”, “kitchen”, “living room”, “store”, “suburb”, “industrial” and “office” (see S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.). “Caltech-101” is a data set containing a total of 102 classes, consisting of 101 object classes and a background class (see L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. In Proc. IEEE CVPR Workshop on Generative-Model Based Vision, 2004.). In the effectiveness evaluation, “RGB-SIFT” was used as the local feature descriptor for “OT8” and “Caltech-101” and “Gray-SIFT” for “LSP15”, and the local feature vectors extracted from both a 16 × 16 region and a 36 × 36 region were enumerated to form the final higher-order local feature vector. As the existing methods, methods [A] to [G] described in the documents listed below were used. Method [A] estimates a “part-based” “generative model” of the image by a CRF (Conditional Random Field) and performs image segmentation and identification simultaneously, but its calculation cost is even higher than that of the bag-of-keypoints method. Methods [B] and [C] perform local feature extraction using the SIFT descriptor and the bag-of-keypoints method, and perform class discrimination by an SVM (Support Vector Machine) or the like. FIG. 8 shows the comparison results of the scene discrimination rates.

Method [A]: Y. Wang and S. Gong. Conditional random field for natural scene categorization. In Proc. British Machine Vision Conference, 2007.
Method [B]: A. Bosch, A. Zisserman, and X. Muñoz. Scene classification using a hybrid generative/discriminative approach. IEEE Trans. Pattern Analysis and Machine Intelligence, pages 712-727, 2008.
Method [C]: S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
Method [D]: O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
Method [E]: H. Zhang, A. C. Berg, M. Maire, and J. Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, volume 2, pages 2126-2136, 2006.
Method [F]: K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research, 8:725-760, 2007.
Method [G]: N. Hervé and N. Boujemaa. Image annotation: which approach for realistic databases? In Proc. ACM International Conference on Image and Video Retrieval, 2007.

As can be seen from the comparison results in FIG. 8, for “OT8” and “LSP15”, the class discrimination method according to the present invention (H = 3 in FIG. 8), which introduces multiplexing of the latent space as an extension to probabilistic linear discriminant analysis, records scene discrimination rates exceeding the scores of the existing methods. For “Caltech-101” as well, the class discrimination method according to the present invention (H = 3 in FIG. 8) records a scene discrimination rate at least comparable to the scores of the existing methods. Furthermore, method [C] with H = 1 (discrimination rate 41.2%) corresponds to the most standard bag-of-keypoints method, and for “Caltech-101” the class discrimination method according to the present invention records a scene discrimination rate far exceeding that score, which supports the conclusion that the class discrimination method according to the present invention is extremely good in practical terms. Methods [D] and [F] each record high scores for “Caltech-101”, but both require matching of local features, demand enormous calculation cost and memory usage, and cannot be put to practical use. As can also be seen from FIG. 8, the class discrimination method using only probabilistic linear discriminant analysis with the feature vector generated by the feature vector generation method according to the present invention (H = 1: reference) falls short of the scores of the existing “SI” methods but records scene discrimination rates exceeding the scores of the existing “NO-SI” methods. This shows both that the feature vector generated by the feature vector generation method according to the present invention has excellent expressive power and that multiplexing of the latent space is highly effective in class discrimination. In any case, it will be understood from the comparison results in FIG. 8 that the class discrimination method according to the present invention, using the feature vector generated by the feature vector generation method according to the present invention, is very simple yet enables high-speed calculation, and achieves results comparable to or exceeding those of the existing methods.
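For illustration, the layered division behind the H = 3 setting (the h-th layer divides the data into h × h regions) and the final choice of the class with the largest combined score can be sketched as follows. This is a minimal sketch under stated assumptions: the names `layered_regions` and `classify` are hypothetical, the per-region scores are taken as given rather than derived by probabilistic linear discriminant analysis, and combining them as a layer-weighted sum is an assumption about how the per-region probabilities and layer weights enter the total.

```python
def layered_regions(width, height, num_layers):
    """List (h, i, box) for every region of every layer h = 1..num_layers.

    Layer h splits the image into h x h regions; i runs from 1 to h*h.
    box is (x0, y0, x1, y1) in pixel coordinates, end-exclusive.
    """
    regions = []
    for h in range(1, num_layers + 1):
        i = 0
        for row in range(h):
            for col in range(h):
                i += 1
                box = (col * width // h, row * height // h,
                       (col + 1) * width // h, (row + 1) * height // h)
                regions.append((h, i, box))
    return regions


def classify(region_scores, layer_weights):
    """Pick the class whose layer-weighted sum of region scores is largest.

    region_scores: {class_name: {(h, i): score}}, scores supplied by the caller.
    layer_weights: {h: weight}; the weighting scheme is an assumption here.
    """
    totals = {}
    for cls, scores in region_scores.items():
        totals[cls] = sum(layer_weights[h] * s for (h, _i), s in scores.items())
    return max(totals, key=totals.get)


regions = layered_regions(36, 36, 3)
print(len(regions))  # 14 regions in total: 1 + 4 + 9

# Hypothetical scores for two classes over the layer-1 region and one layer-2 region.
scores = {
    "coast": {(1, 1): -2.0, (2, 1): -1.0},
    "forest": {(1, 1): -1.5, (2, 1): -3.0},
}
print(classify(scores, {1: 1.0, 2: 1.0}))  # coast
```

With H = 1 only the single whole-image region remains, which corresponds to the reference setting without spatial division.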

The application of the present invention is not limited to the robot apparatus 20 described above; the present invention may also be applied to a subject discrimination device used in a digital camera or video camera, or to an in-vehicle image recognition device that discriminates objects present ahead of a vehicle or the like. The present invention can also be applied to an image data processing system as illustrated in FIG. 9. The image data processing system 200 shown in the figure comprises a data storage device 210 that stores a large number of image data and word group data organized as a database, and a management computer 300 that manages the database on the data storage device 210 and enables annotation of novel image data, search (retrieval) of the database, and the like. The management computer 300 includes a CPU, ROM, RAM, system bus, various interfaces, a storage device and the like (not shown), and can be accessed from a terminal 500 via a network such as the Internet. As shown in FIG. 9, in the management computer 300, a search robot 310, a data reception unit 320, an image feature quantity extraction unit 330, a word feature quantity extraction unit 340, a learning processing unit 350, an annotation processing unit 360, a search query reception unit 370, a retrieval processing unit 380, a result output unit 390 and the like are constructed as functional blocks by one of, or the cooperation of both of, hardware such as the CPU, ROM, RAM, various interfaces and storage device, and various software including the pre-installed feature quantity generation program and class discrimination program according to the present invention. Furthermore, a feature quantity storage device 400 and a learning information storage device 410 are connected to the management computer 300.

The search robot 310 of the management computer 300 collects, via a network or the like, data including images not stored in the database of the data storage device 210, and updates the database. The data reception unit 320 accepts image data entered manually using various input means, as well as word group data that is associated with the image data and represents at least one word (symbol) as metadata indicating what appears in the image of that image data, and stores the accepted data in the data storage device 210. The image feature quantity extraction unit 330 extracts from image data an image feature quantity indicating the features of that data and stores it in the feature quantity storage device 400. That is, the image feature quantity extraction unit 330 acquires a feature vector indicating the features of the entire image data based on the elements constituting the average vector of the higher-order local feature vectors described above and the elements constituting the m-th order correlation vectors between the higher-order local feature vectors. The word feature quantity extraction unit 340 extracts from word group data a word feature quantity indicating the features of that data and stores it in the feature quantity storage device 400. The learning processing unit 350 learns the relationship between image data and word group data using a plurality of combinations of image feature quantities and word feature quantities, acquires the learning information needed for annotation, which assigns a word group as metadata to unannotated image data, and for word-based search (retrieval) of unannotated image data, and stores the acquired learning information in the learning information storage device 410. The learning processing unit 350 also generates the learning information needed for the above feature vector and for class discrimination of novel image data. The annotation processing unit 360 performs annotation of unannotated image data and class discrimination of novel image data. The search query reception unit 370 accepts input of at least one word (symbol) as a search query from the terminal 500 or the like. The retrieval processing unit 380 executes search processing (retrieval) of image data, including unannotated image data, based on the search query accepted by the search query reception unit 370. The result output unit 390 outputs the result of the processing by the retrieval processing unit 380 to the terminal 500 or the like. Applying the present invention to such an image data processing system 200 makes it possible to reduce the calculation cost required for generating image feature quantities (feature vectors) and to improve class discrimination performance for novel image data, thereby improving the performance of the entire system.

Embodiments of the present invention have been described above using examples. However, the present invention is in no way limited to the above examples, and it goes without saying that various modifications can be made without departing from the gist of the present invention.

The present invention is useful in the information processing field, for example in handling a feature vector of one piece of data representing real world information, or in determining to which of a plurality of classes novel data representing real world information belongs.

20 robot apparatus, 21 imaging unit, 22 sound collecting unit, 23 actuator, 30 control computer, 31 input/output processing unit, 32 feature quantity processing unit, 33 learning processing unit, 34 discrimination processing unit, 35 retrieval processing unit, 36 main control unit, 40 data storage device, 41 feature quantity storage device, 42 learning information storage device, 200 image data processing system, 210 data storage device, 300 management computer, 310 search robot, 320 data reception unit, 330 image feature quantity extraction unit, 340 word feature quantity extraction unit, 350 learning processing unit, 360 annotation processing unit, 370 search query reception unit, 380 retrieval processing unit, 390 result output unit, 400 feature quantity storage device, 410 learning information storage device.

Claims

A feature quantity generation device that generates a feature vector representing the features of one entire piece of data, using a plurality of higher-order local feature vectors extracted from the one piece of data representing real world information, the device comprising:
average acquisition means for acquiring an average vector of the plurality of higher-order local feature vectors;
correlation acquisition means for acquiring m-th order correlation vectors, from the first order to the M-th order, between the plurality of higher-order local feature vectors (where “M” is an integer greater than or equal to 1, and “m” is an integer from 1 to M); and
feature vector acquisition means for acquiring the feature vector based on elements constituting the average vector acquired by the average acquisition means and elements constituting the m-th order correlation vectors acquired by the correlation acquisition means.

The feature quantity generation device according to claim 1, wherein,
when the p d-dimensional local feature vectors extracted from one piece of data I representing real world information are denoted V_k = (v_1, ..., v_d) (where “p” and “d” are each integers greater than or equal to 2, and “k” is an integer from 1 to p), the average acquisition means acquires the average vector μ of the p d-dimensional local feature vectors V_k according to the following equation (1); the correlation acquisition means acquires the autocorrelation matrix R of the p d-dimensional local feature vectors V_k according to the following equation (2) and acquires the first-order correlation vector upper(R) by enumerating the elements of the upper triangular matrix of the autocorrelation matrix R; and the feature vector acquisition means, with the feature vector denoted X, acquires the feature vector X by enumerating the elements constituting the average vector μ and the first-order correlation vector upper(R) according to the following equation (3).
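Equations (1) to (3) referred to in this claim are not reproduced in this text. As a hedged reconstruction from the wording of the claim alone, treating each V_k as a column vector and assuming that the autocorrelation matrix is normalized by p (the normalization is an assumption, not confirmed by the text), they presumably take a form such as:

```latex
\begin{align}
\mu &= \frac{1}{p}\sum_{k=1}^{p} V_k \tag{1}\\
R &= \frac{1}{p}\sum_{k=1}^{p} V_k V_k^{\mathsf{T}} \tag{2}\\
X &= \begin{pmatrix} \mu \\ \operatorname{upper}(R) \end{pmatrix} \tag{3}
\end{align}
```

Here upper(R) denotes the vector obtained by enumerating the d(d + 1)/2 elements on and above the diagonal of R, so that X has d + d(d + 1)/2 elements.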

The feature quantity generation device according to claim 1, wherein
the correlation acquisition means acquires the m-th order correlation vectors with dimensional compression of the higher-order local feature vectors by principal component analysis.

The feature quantity generation device according to claim 3, wherein,
assuming that there are N pieces of data I^(j) representing real world information (where “N” is an integer greater than or equal to 2, and “j” is an integer from 1 to N), that the p^(j) d-dimensional local feature vectors extracted from the data I^(j) are V_k^(j) = (v_1, ..., v_d) (where “p^(j)” and “d” are each integers greater than or equal to 2, and “k” is an integer from 1 to p^(j)), that the average vector of the p^(j) d-dimensional local feature vectors V_k^(j) acquired by the average acquisition means is μ^(j) shown in the following equation (4), that the autocorrelation matrix of the p^(j) d-dimensional local feature vectors V_k^(j) is R^(j) shown in the following equation (5), that the autocorrelation matrix of all the d-dimensional local feature vectors extracted from the N pieces of data is R_all shown in the following equation (6), and that the novel data is I^(j+1), the correlation acquisition means uses a projection matrix U_dl onto a d_l-dimensional principal component space of lower dimension than d, obtained by solving the eigenvalue problem of the following equation (7), to acquire the diagonal matrix U_dl^T R^(j+1) U_dl based on the autocorrelation matrix R^(j+1) of the p^(j+1) d-dimensional local feature vectors V_k^(j+1) extracted from the novel data I^(j+1), and acquires the first-order correlation vector upper(U_dl^T R^(j+1) U_dl) by enumerating the elements of the upper triangular matrix of the diagonal matrix U_dl^T R^(j+1) U_dl; and the feature vector acquisition means acquires the feature vector X^(j+1) of the novel data I^(j+1) by enumerating the elements constituting the average vector μ^(j+1) of the p^(j+1) d-dimensional local feature vectors V_k^(j+1) and the elements constituting the first-order correlation vector upper(U_dl^T R^(j+1) U_dl).

A feature quantity generation method for generating a feature vector representing the features of one entire piece of data, using a plurality of higher-order local feature vectors extracted from the one piece of data representing real world information, the method comprising:
acquiring an average vector of the plurality of higher-order local feature vectors and m-th order correlation vectors, from the first order to the M-th order, between the plurality of higher-order local feature vectors (where “M” is an integer greater than or equal to 1, and “m” is an integer from 1 to M); and
acquiring the feature vector based on elements constituting the acquired average vector and elements constituting the acquired m-th order correlation vectors.

A feature quantity generation program that causes a computer to function as a device that generates a feature vector representing the features of one entire piece of data, using a plurality of higher-order local feature vectors extracted from the one piece of data representing real world information, the program comprising:
an average acquisition module for acquiring an average vector of the plurality of higher-order local feature vectors;
a correlation acquisition module for acquiring m-th order correlation vectors, from the first order to the M-th order, between the plurality of higher-order local feature vectors (where “M” is an integer greater than or equal to 1, and “m” is an integer from 1 to M); and
a feature vector acquisition module for acquiring the feature vector based on elements constituting the average vector acquired by the average acquisition module and elements constituting the m-th order correlation vectors of the plurality of higher-order local feature vectors acquired by the correlation acquisition module.

A class discrimination device that determines to which of a plurality of classes, each associated with at least one piece of known data, novel data representing real world information belongs, the device comprising:
transformation storage means for storing transformations for projecting feature vectors into a latent space, the transformations being derived for each region, on the assumption that each of the novel data and the known data is divided into h × h regions in the h-th layer (where “h” is an integer from 1 to H, and “H” is an integer greater than or equal to 2), based on the feature vectors of the regions obtained by dividing each of the known data in each layer from the first layer to the H-th layer;
local feature extraction means for extracting, on the assumption that the novel data is divided into h × h regions in the h-th layer, a plurality of higher-order local feature vectors from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer;
feature vector acquisition means for acquiring an average vector of the plurality of higher-order local feature vectors extracted by the local feature extraction means from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer, and m-th order correlation vectors, from the first order to the M-th order, between the plurality of higher-order local feature vectors (where “M” is an integer greater than or equal to 1, and “m” is an integer from 1 to M), and for acquiring a feature vector of each of the regions based on elements constituting the average vector and elements constituting the m-th order correlation vectors;
probability derivation means for deriving, for each class, based on projection points obtained by projecting the feature vectors of the regions obtained by dividing each of the known data in each layer from the first layer to the H-th layer into the latent space by the transformation corresponding to the region, and on projection points obtained by projecting the feature vectors of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer into the latent space by the transformation corresponding to the region, the sum, from i = 1 to i = h² and from the first layer to the H-th layer, of the probability that the feature vector of the i-th region in the h-th layer of the novel data appears on the basis of the feature vector of the i-th region (where “i” is an integer from 1 to h²) in the h-th layer of the known data, as the probability that the feature vector of the novel data appears from the class; and
class setting means for setting the class for which the probability derived by the probability derivation means is largest as the class to which the novel data belongs.

The class discrimination device according to claim 7, wherein
the transformation for the i-th region in the h-th layer is a projection matrix W^(h,i) obtained by solving the eigenvalue problem of the following equation (11) (where “Λ^(h,i)” is a diagonal matrix obtained by arranging the eigenvalues serving as discrimination criteria in order on the diagonal), where the number of classes is G (where “G” is an integer greater than or equal to 2), the classes are C_g (where “g” is an integer from 1 to G), the number of sample data, which are known data extracted as samples from class C_g, is n (where “n” is an integer greater than or equal to 1), the feature vector of the i-th region in the h-th layer of the j-th sample data (where “j” is an integer from 1 to n) belonging to class C_g is X_j^{g(h,i)}, the average vector of the feature vectors X_j^{g(h,i)} of the i-th region in the h-th layer of the sample data belonging to class C_g is X̄^{g(h,i)}, the average vector of the feature vectors of the i-th region in the h-th layer of all the sample data is μ_x^(h,i), the within-class covariance matrix for the i-th region in the h-th layer is Σ_w^(h,i) shown in the following equation (9), and the between-class covariance matrix for the i-th region in the h-th layer is Σ_b^(h,i) shown in the following equation (10);
with the feature vector denoted X, the projection matrix denoted W and the projection point of the feature vector X denoted u, the projection points of the feature vectors of the regions obtained by dividing each of the sample data in each layer from the first layer to the H-th layer and the projection points of the feature vectors of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer are derived according to the following equation (12); and
with the feature vector of the novel data denoted X_s, the probability p(X_s | C_g) that the feature vector X_s appears from class C_g is derived based on the following equation (13), where the subscript (h, i) in equation (13) indicates derivation from the i-th region in the h-th layer, the subscript s indicates derivation from the novel data, the subscript C_g indicates belonging to class C_g, the subscript 1...n indicates derivation from the 1st to n-th sample data belonging to class C_g, “α^h” is the weight given to the h-th layer, “Z^(h,i)_Cg” and “Θ^(h,i)” in equation (13) are as shown in the following equations (14) and (15), ū^(h,j)_Cg is the average of the projection points u^(h,j)_Cg of the feature vectors X^(h,j)_Cg belonging to class C_g, “Ψ^(h,i)” in equations (14) and (15) is the variance of the latent variables shown in the following equation (16), and Λ^(h,i) in equation (16) is a diagonal matrix obtained by arranging, in order on the diagonal, the eigenvalues that are the solutions of the eigenvalue problem for the i-th region in the h-th layer.

A class discrimination method for discriminating to which of a plurality of classes, each corresponding to at least one item of known data, novel data indicating real-world information belongs, the method comprising:
assuming that each of the novel data and the known data is divided into h × h regions in the h-th layer (where “h” is an integer from 1 to H, and “H” is an integer of 2 or more), deriving, for each region, a transformation into a latent space based on the feature vectors of the regions obtained by dividing each of the known data in each layer from the first layer to the H-th layer;
extracting a plurality of higher-order local feature vectors from each of the regions obtained by dividing the novel data, assumed to be divided into h × h regions in the h-th layer, in each layer from the first layer to the H-th layer;
obtaining an average vector of the plurality of higher-order local feature vectors extracted from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer, and m-th order correlation vectors from the first order to the M-th order among the plurality of higher-order local feature vectors (where “M” is an integer of 1 or more, and “m” is an integer from 1 to M), and obtaining a feature vector of each of the regions based on the elements constituting the average vector and the elements constituting the m-th order correlation vectors;
for each class, deriving, as the probability that the feature vector of the novel data appears from the class, the sum, over i = 1 to i = h² and over the first layer to the H-th layer, of the probability that the feature vector of the i-th region in the h-th layer of the novel data appears given the feature vector of the i-th region in the h-th layer of the known data (where “i” is an integer from 1 to h²), the sum being derived from the projection points obtained by projecting the feature vector of each region, obtained by dividing each of the known data in each layer from the first layer to the H-th layer, onto the latent space by the transformation corresponding to that region, and from the projection points obtained by projecting the feature vector of each region, obtained by dividing the novel data in each layer from the first layer to the H-th layer, onto the latent space by the transformation corresponding to that region; and
setting the class for which the derived probability is maximum as the class to which the novel data belongs.
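
The region division and feature-vector steps of the method above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: it assumes the data is an image with a dense map of local descriptors, takes M = 1, and treats the first-order correlation vector as the upper-triangular elements of the averaged outer product of the local vectors. The names `split_regions`, `region_feature`, and `pyramid_features` are hypothetical.

```python
import numpy as np

def split_regions(height, width, h):
    """Return (row_slice, col_slice) pairs for an h x h grid, numbered row by row."""
    rows = np.linspace(0, height, h + 1).astype(int)
    cols = np.linspace(0, width, h + 1).astype(int)
    return [(slice(rows[r], rows[r + 1]), slice(cols[c], cols[c + 1]))
            for r in range(h) for c in range(h)]

def region_feature(local_vectors):
    """Concatenate the average vector with first-order correlation elements.

    Assumption: the first-order correlation is the averaged outer product of the
    local vectors; only its upper triangle is kept because it is symmetric.
    """
    v = np.asarray(local_vectors, dtype=float)  # shape (n_points, d)
    mean = v.mean(axis=0)
    corr = v.T @ v / len(v)                     # shape (d, d)
    upper = np.triu_indices(v.shape[1])
    return np.concatenate([mean, corr[upper]])

def pyramid_features(local_map, H):
    """Feature vector per region (h, i) for layers h = 1..H.

    local_map: array of shape (height, width, d) holding one local descriptor
    per position (an assumed input layout, not taken from the source).
    """
    height, width, d = local_map.shape
    feats = {}
    for h in range(1, H + 1):
        for i, (rs, cs) in enumerate(split_regions(height, width, h), start=1):
            feats[(h, i)] = region_feature(local_map[rs, cs].reshape(-1, d))
    return feats
```

With H = 2 this yields 1 + 4 = 5 region feature vectors, keyed by (layer, region index); each has d + d(d + 1)/2 elements under the assumptions above.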

A class discrimination program that causes a computer to function as a class discrimination device that discriminates to which of a plurality of classes, each corresponding to at least one item of known data, novel data indicating real-world information belongs, the program comprising:
a local feature extraction module that, assuming that the novel data is divided into h × h regions in the h-th layer (where “h” is an integer from 1 to H, and “H” is an integer of 2 or more), extracts a plurality of higher-order local feature vectors from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer;
a feature vector acquisition module that obtains an average vector of the plurality of higher-order local feature vectors extracted by the local feature extraction module from each of the regions obtained by dividing the novel data in each layer from the first layer to the H-th layer, and m-th order correlation vectors from the first order to the M-th order among the plurality of higher-order local feature vectors (where “M” is an integer of 1 or more, and “m” is an integer from 1 to M), and that obtains a feature vector of each of the regions based on the elements constituting the average vector and the elements constituting the m-th order correlation vectors;
a probability derivation module that, for each class, derives, as the probability that the feature vector of the novel data appears from the class, the sum, over i = 1 to i = h² and over the first layer to the H-th layer, of the probability that the feature vector of the i-th region in the h-th layer of the novel data appears given the feature vector of the i-th region in the h-th layer of the known data (where “i” is an integer from 1 to h²), the sum being derived from the projection points obtained by projecting the feature vector of each region, obtained by dividing each of the known data in each layer from the first layer to the H-th layer, onto the latent space by a predetermined transformation corresponding to that region, and from the projection points obtained by projecting the feature vector of each region, obtained by dividing the novel data in each layer from the first layer to the H-th layer, onto the latent space by the transformation corresponding to that region; and
a class setting module that sets the class for which the probability derived by the probability derivation module is maximum as the class to which the novel data belongs.
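
The roles of the probability derivation module and the class setting module can be illustrated with a minimal Python sketch. It assumes the per-region probabilities have already been computed elsewhere and are supplied as a dictionary keyed by (h, i). The per-layer weights `alpha` follow the description of the weight α^h given with equation (13); setting every weight to 1 gives the plain sum stated in the claim. All function names are hypothetical.

```python
def class_probability(region_probs, alpha):
    """Weighted sum of region probabilities over all layers h and regions i.

    region_probs: dict mapping (h, i) to the probability that the feature vector
                  of region i in layer h of the novel data appears from the class
                  (assumed to be precomputed).
    alpha: dict mapping layer h to its weight.
    """
    return sum(alpha[h] * p for (h, _i), p in region_probs.items())

def set_class(per_class_region_probs, alpha):
    """Return the class with the maximum derived probability, plus all scores."""
    scores = {cls: class_probability(probs, alpha)
              for cls, probs in per_class_region_probs.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

For example, with H = 2 each class supplies five region probabilities (one for layer 1 and four for layer 2), and `set_class` returns the label whose weighted sum is largest.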