JP2013254254A - Identification device, control method and program thereof

Info

Publication number: JP2013254254A
Application number: JP2012128062A
Authority: JP
Inventors: Koichi Umakai; 浩一馬養
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2012-06-05
Filing date: 2012-06-05
Publication date: 2013-12-19

Abstract

PROBLEM TO BE SOLVED: To increase category identification accuracy. SOLUTION: An individual feature extraction unit 102 extracts features from first data. A background model generation unit 103 generates a background model for the features extracted from the first data. A noise determination unit 104 determines the noise distributions in the background model. The individual feature extraction unit 102 also extracts features from second data. A model generation unit 106 generates a model for the features extracted from the second data on the basis of the distributions of the background model other than the noise distributions. A classifier generation unit 108 generates a classifier for identifying a category on the basis of the model.

Description

The present invention relates to a technique for identifying the category of data.

For the task of detecting higher-order features such as "airplane" or "person singing" in image data, a recognition method is known that applies a statistical approach based on Gaussian mixture models (GMMs) of SIFT features and MFCC features (see Non-Patent Document 1). In this method, higher-order features are detected by a GMM Supervector SVM (GS-SVM). In GS-SVM, a GMM is obtained for each piece of shot image data, and learning and identification are performed by an SVM using an RBF kernel defined from the distance between GMMs. To generate a GMM for each piece of shot image data, a model called a universal background model (UBM) is first generated by sampling features from all of the shot image data regardless of category. The parameters for the shot image data in question are then estimated by maximum a posteriori (MAP) adaptation.

Non-Patent Document 1: 井上中順, 斉藤辰彦, 篠田浩一, 古井貞煕, "大規模映像資源のためのマルチモーダル高次特徴検出," 電子情報通信学会論文誌 D, vol. J93-D, no. 12, pp. 2633-2644, Dec. 2010. (Nakamasa Inoue, Tatsuhiko Saito, Koichi Shinoda, Sadaoki Furui, "Multimodal higher-order feature detection for large-scale video resources," IEICE Transactions D, vol. J93-D, no. 12, pp. 2633-2644, Dec. 2010.)

However, in the technique disclosed in Non-Patent Document 1, the universal background model (UBM) is generated from all of the shot image data regardless of category, and a GMM is then built for each category by MAP adaptation from the UBM. Consequently, for any given category, the UBM contains distributions that are not generated from the features extracted from that category. For that category, such a distribution is a noise distribution that does not contribute to recognition. The technique disclosed in Non-Patent Document 1 therefore performs identification with the noise distributions included, and these noise distributions hinder improvement of category identification accuracy.

Accordingly, an object of the present invention is to improve category identification accuracy.

The identification device of the present invention comprises: first data input means for inputting first data; first feature extraction means for extracting features from the first data; background model generation means for generating a background model for the features extracted by the first feature extraction means; determination means for determining a noise distribution in the background model; second data input means for inputting second data; second feature extraction means for extracting features from the second data; model generation means for generating a model for the features extracted by the second feature extraction means on the basis of the distributions of the background model other than the noise distribution; and classifier generation means for generating, on the basis of the model, a classifier for identifying a category.

According to the present invention, category identification accuracy can be improved.

FIG. 1 is a diagram showing the configuration of an identification device according to an embodiment of the present invention.
FIG. 2 is a diagram showing, extracted from the configuration of the identification device of FIG. 1, the components involved in the background model generation process.
FIG. 3 is a flowchart of the background model generation process.
FIG. 4 is a diagram showing, extracted from the configuration of the identification device of FIG. 1, the components involved in the category-specific classifier learning process.
FIG. 5 is a flowchart of the category-specific classifier learning process.
FIG. 6 is a diagram showing, extracted from the configuration of the identification device of FIG. 1, the components involved in the category identification process.
FIG. 7 is a flowchart of the category identification process.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of an identification device according to an embodiment of the present invention. In FIG. 1, reference numeral 100 denotes the identification device. As shown in FIG. 1, the identification device 100 comprises a data input unit 101, an individual feature extraction unit 102, a background model generation unit 103, a noise determination unit 104, a storage device 105, a model generation unit 106, a higher-order feature generation unit 107, a classifier generation unit 108, and an identification unit 109. The identification device 100 is implemented by, for example, a personal computer. That is, the data input unit 101, the individual feature extraction unit 102, the background model generation unit 103, the noise determination unit 104, the model generation unit 106, the higher-order feature generation unit 107, the classifier generation unit 108, and the identification unit 109 are functional components realized when the CPU in the identification device 100 loads the necessary programs and data from a nonvolatile recording medium such as a ROM into RAM and executes them. The storage device 105 corresponds to a rewritable recording medium such as a RAM.

The identification device 100 according to the present embodiment executes (I) a background model generation process, (II) a category-specific classifier learning process, and (III) a category identification process. For clarity, the processing of the identification device 100 is described below in these three parts.

First, (I) the background model generation process is described. The background model generated here is used in both (II) the category-specific classifier learning process and (III) the category identification process.

FIG. 2 is a diagram showing, extracted from the configuration of the identification device 100 of FIG. 1, the components involved in the background model generation process. In the following, these components are collectively referred to as the background model generation device.

In FIG. 2, reference numeral 200 denotes the background model generation device. As shown in FIG. 2, the background model generation device 200 comprises the data input unit 101, the individual feature extraction unit 102, the background model generation unit 103, the noise determination unit 104, and the storage device 105.

Next, the processing of the background model generation device 200 is described with reference to FIG. 3. In step S301, the data input unit 101 inputs the plurality of image data needed to generate the background model, together with annotation data describing the category information to which each piece of image data belongs. The image data input here is shot image data several seconds to several tens of seconds long. Examples of category information include scene information, such as a birthday party or a parade, and subject information, such as an airplane or a car appearing in the image data. The set of category information that may be input here is assumed to be determined in advance. The process of inputting image data in step S301 is an example of the processing of the first data input means.

In step S302, the individual feature extraction unit 102 extracts individual features from each piece of image data input in step S301. Each individual feature is represented by a vector. In the present embodiment, SIFT features (image features) and MFCC features (acoustic features) are extracted as individual features. The individual feature extraction unit 102 then links each extracted individual feature to the category information given by the annotation data input in step S301. Step S302 is an example of the processing of the first feature extraction means.

In step S303, the background model generation unit 103 generates a background model for each type of individual feature, that is, one for the SIFT features and one for the MFCC features. In the present embodiment, a Gaussian mixture model (GMM) is generated as the background model. In the following, the GMM serving as the background model is referred to as the universal background model (UBM). After generating the UBM, the background model generation unit 103 stores in the storage device 105 the parameters of each distribution constituting the UBM, namely its mean vector and variance-covariance matrix.

In step S304, the noise determination unit 104 determines the noise distributions in the UBM for each category, based on the category information linked to the individual features. Specifically, to determine the noise distributions for the "car" category, for example, the noise determination unit 104 considers, among the individual features from which the UBM was generated in step S303, those linked to the "car" category. These are referred to below as "car" feature vectors. For each distribution constituting the UBM, the noise determination unit 104 calculates the frequency or ratio of "car" feature vectors among all the individual features that make up that distribution. The noise determination unit 104 then determines every distribution whose calculated frequency or ratio is equal to or less than a predetermined threshold to be a noise distribution for the "car" category. Several noise distributions may be found, or none at all. Although the "car" category is used as the example here, noise distributions are determined in the same way for the other categories. The noise determination unit 104 stores the noise distributions determined for each category in the storage device 105.
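The per-category noise determination of step S304 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each individual feature is hard-assigned to exactly one UBM distribution (the text does not say how a feature is attributed to a distribution), and the function name, argument names, and sample data are all hypothetical.

```python
from collections import Counter

def noise_components(assignments, labels, category, num_components, threshold):
    """Return indices of UBM distributions judged to be noise for `category`.

    assignments[i] is the index of the distribution that feature i is
    (by assumption) hard-assigned to; labels[i] is the category linked
    to feature i by the annotation data.
    """
    total = Counter(assignments)
    in_category = Counter(
        k for k, label in zip(assignments, labels) if label == category
    )
    noise = []
    for k in range(num_components):
        # A distribution with no assigned features gets ratio 0 by convention.
        ratio = in_category[k] / total[k] if total[k] else 0.0
        if ratio <= threshold:
            noise.append(k)
    return noise

# Ten toy features spread over three distributions.
assignments = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
labels = ["car", "car", "plane", "plane", "plane",
          "car", "plane", "plane", "plane", "plane"]

car_noise = noise_components(assignments, labels, "car", 3, threshold=0.25)
plane_noise = noise_components(assignments, labels, "plane", 3, threshold=0.25)
```

With this toy data, distributions 1 and 2 hold few or no "car" features and are flagged as noise for "car", while no distribution is flagged for "plane", matching the remark that zero or several noise distributions may be found.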

Next, (II) the category-specific classifier learning process is described. Identifying the category of image data requires that a category-specific classifier be trained. For clarity, the description here takes the "car" category as the learning target.

FIG. 4 is a diagram showing, extracted from the configuration of the identification device 100 of FIG. 1, the components involved in the category-specific classifier learning process. In the following, these components are collectively referred to as the category-specific classifier learning device.

In FIG. 4, reference numeral 400 denotes the category-specific classifier learning device. As shown in FIG. 4, the category-specific classifier learning device 400 comprises the data input unit 101, the individual feature extraction unit 102, the model generation unit 106, the higher-order feature generation unit 107, the classifier generation unit 108, and the storage device 105.

Next, the processing of the category-specific classifier learning device 400 is described with reference to FIG. 5. In step S501, the data input unit 101 inputs the plurality of image data needed for the category-specific classifier learning process, together with annotation data describing the category information to which each piece of image data belongs. As in step S301, the image data input here is shot image data several seconds to several tens of seconds long. In step S502, the individual feature extraction unit 102 extracts individual features from the image data input in step S501. As in the background model generation process described above, SIFT features (image features) and MFCC features (acoustic features) are extracted as individual features. The individual feature extraction unit 102 then links each extracted individual feature to the category information given by the annotation data input in step S501. The process of inputting image data in step S501 is an example of the processing of the second data input means, and step S502 is an example of the processing of the second feature extraction means.

In step S503, the model generation unit 106 generates, for each type of individual feature (that is, for the SIFT features and for the MFCC features), a model of the "car" category as the positive-example model and a model of the "non-car" category as the negative-example model. A Gaussian mixture model (GMM) is generated as each model. The probability density function p(x) of the GMM is given by Equation 1 below.

Here, x ∈ R^d is an individual feature of dimension d. K is the number of mixture components (number of distributions), and k is the index identifying a distribution. w_k, μ_k, and Σ_k are the weight coefficient, mean vector, and variance-covariance matrix of the k-th distribution, respectively. The GMM is generated here by maximum a posteriori (MAP) adaptation of the parameters from the UBM. The model generation unit 106 assumes that the GMM weight coefficients w_k and variance-covariance matrices Σ_k are common to all image data, and estimates the mean vector for the set of individual features extracted from one piece of image data in the "car" category using Equation 2 below.

Here, g_k is the probability density function of the Gaussian distribution with mean vector μ_k and variance-covariance matrix Σ_k. τ is a parameter that sets the degree of dependence on the prior distribution. c_ik is the responsibility of the k-th mixture component for the individual feature x_i. The model generation unit 106 estimates the mean vector for the set Y_F of individual features extracted from image data of the "non-car" category in the same way.
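The images for Equations 1 and 2 are not reproduced in this text. A reconstruction consistent with the symbols defined above, assuming the standard GMM density and the usual mean-only MAP update, might read as follows. The hat on the adapted mean, the symbol C_k, and the use of N for the number of individual features in the set are notational assumptions, and μ_k on the right-hand side of Equation 2 is taken to be the UBM mean.

```latex
% Equation 1 (assumed standard form): GMM probability density
p(x) = \sum_{k=1}^{K} w_k \, g_k(x), \qquad
g_k(x) = \mathcal{N}(x \mid \mu_k, \Sigma_k)

% Equation 2 (assumed form): MAP-adapted mean vector of the k-th distribution
\hat{\mu}_k = \frac{\tau \, \mu_k + \sum_{i=1}^{N} c_{ik} \, x_i}{\tau + C_k}, \qquad
c_{ik} = \frac{w_k \, g_k(x_i)}{\sum_{k'=1}^{K} w_{k'} \, g_{k'}(x_i)}, \qquad
C_k = \sum_{i=1}^{N} c_{ik}
```

Under this form, a large τ keeps the adapted mean close to the UBM mean, while a small τ lets the features of the individual shot dominate.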

At this point, the model generation unit 106 reads from the storage device 105 the noise distributions determined for each category in step S304. When generating the GMM by MAP adaptation from the UBM, the model generation unit 106 does not perform MAP adaptation for the noise distributions and performs it only for the distributions other than the noise distributions. That is, if there are m noise distributions, MAP adaptation is performed for K - m distributions.
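A minimal sketch of mean-only MAP adaptation that skips the noise distributions is shown below. It rests on several assumptions not stated in the text: diagonal covariances, responsibilities normalized over the non-noise distributions only, and noise distributions simply keeping their UBM means. All function and variable names are hypothetical.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def map_adapt_means(features, weights, means, variances, noise, tau):
    """MAP-adapt the UBM means to `features`, leaving `noise` indices untouched."""
    num_k, dim = len(means), len(means[0])
    active = [k for k in range(num_k) if k not in noise]  # the K - m distributions
    resp_sum = [0.0] * num_k                        # C_k = sum_i c_ik
    weighted = [[0.0] * dim for _ in range(num_k)]  # sum_i c_ik * x_i
    for x in features:
        dens = {k: weights[k] * gauss_pdf(x, means[k], variances[k]) for k in active}
        norm = sum(dens.values())
        if norm == 0.0:
            continue  # feature is numerically far from every active distribution
        for k in active:
            c = dens[k] / norm  # responsibility c_ik (assumed normalization)
            resp_sum[k] += c
            for j in range(dim):
                weighted[k][j] += c * x[j]
    adapted = [list(m) for m in means]
    for k in active:
        for j in range(dim):
            adapted[k][j] = (tau * means[k][j] + weighted[k][j]) / (tau + resp_sum[k])
    return adapted

# Toy one-dimensional UBM with K = 3; distribution 2 is noise for "car".
ubm_weights = [0.5, 0.3, 0.2]
ubm_means = [[0.0], [5.0], [10.0]]
ubm_vars = [[1.0], [1.0], [1.0]]
car_features = [[0.5], [4.5], [5.5]]

car_means = map_adapt_means(car_features, ubm_weights, ubm_means, ubm_vars,
                            noise={2}, tau=1.0)
```

In this toy run the mean of distribution 0 moves toward the nearby feature at 0.5, distribution 1 stays near 5.0 because its two features are symmetric around it, and distribution 2 keeps its UBM mean because it is excluded from adaptation.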

In step S504, the higher-order feature generation unit 107 generates a higher-order feature φ(X_F) by Equation 3 below, based on the model generated by the model generation unit 106. The higher-order feature generation unit 107 likewise generates a higher-order feature φ(Y_F) based on the model generated by the model generation unit 106. Each of the higher-order features φ(X_F) and φ(Y_F) is represented by a vector. Here, X_F = {x_i}_{i=1}^N is the set of individual features extracted from one piece of image data in the "car" category (F denotes the type of individual feature), and Y_F = {y_i}_{i=1}^N is the set of individual features extracted from one piece of image data in the "non-car" category.
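The image for Equation 3 is not reproduced in this text. One form of GMM supervector commonly used with GS-SVM, offered here only as an assumption about what Equation 3 may look like, stacks the normalized adapted means of the distributions. The tilde notation is hypothetical, and the text does not say whether the m noise distributions are dropped from the stack or kept with their unadapted UBM means.

```latex
% Equation 3 (assumed form): GMM supervector as the higher-order feature
\phi(X_F) =
\begin{pmatrix}
\tilde{\mu}_1 \\ \tilde{\mu}_2 \\ \vdots \\ \tilde{\mu}_K
\end{pmatrix},
\qquad
\tilde{\mu}_k = \sqrt{w_k} \, \Sigma_k^{-1/2} \, \hat{\mu}_k
```

Here \hat{\mu}_k denotes the MAP-adapted mean vector of the k-th distribution estimated by Equation 2 for the set X_F.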

A higher-order feature φ(X_F) or φ(Y_F) is generated for each of the plurality of image data input in step S501. In step S505, the classifier generation unit 108 generates, by learning, a category-specific classifier (discriminant function) f_F for the "car" category. A category-specific classifier is trained for each type of individual feature. The classifier generation unit 108 performs the learning with an SVM that takes the plurality of higher-order features φ(X_F) and the plurality of higher-order features φ(Y_F) as input. The category-specific classifier f_F generated by the classifier generation unit 108 is stored in the storage device 105.

Next, (III) the category identification process is described. For clarity, the description here assumes that the category to be identified is "car". That is, the process identifies whether or not a car appears in the image data to be identified.

FIG. 6 is a diagram showing, extracted from the configuration of the identification device 100 of FIG. 1, the components involved in the category identification process. In the following, these components are collectively referred to as the category identification device 600.

In FIG. 6, reference numeral 600 denotes the category identification device. As shown in FIG. 6, the category identification device 600 comprises the data input unit 101, the individual feature extraction unit 102, the model generation unit 106, the higher-order feature generation unit 107, the identification unit 109, and the storage device 105.

Next, the processing of the category identification device 600 is described with reference to FIG. 7. In step S701, the data input unit 101 inputs the image data to be identified. As in step S301, the image data input here is shot image data several seconds to several tens of seconds long. In general, image data to be identified has no annotation data. In step S702, the individual feature extraction unit 102 extracts individual features from the image data input in step S701. As in the background model generation process and the category-specific classifier learning process described above, SIFT features (image features) and MFCC features (acoustic features) are extracted as individual features. The process of inputting image data in step S701 is an example of the processing of the third data input means.

In step S703, the model generation unit 106 generates a model for each type of individual feature, that is, one for the SIFT features and one for the MFCC features. The model is generated by the same method as in the category-specific classifier learning process. That is, the model generation unit 106 generates a Gaussian mixture model (GMM) as the model. The probability density function of the GMM is given by Equation 1 above. The GMM is generated here by maximum a posteriori (MAP) adaptation of the parameters from the UBM. The model generation unit 106 assumes that the GMM weight coefficients and variance-covariance matrices are common to all image data, and estimates only the mean vectors using Equation 2 above. In addition, the model generation unit 106 reads from the storage device 105 the noise distributions for the "car" category determined in step S304. When generating the GMM by MAP adaptation from the UBM, the model generation unit 106 does not perform MAP adaptation for the noise distributions and performs it only for the distributions other than the noise distributions. That is, if there are m noise distributions, MAP adaptation is performed for K - m distributions.

In step S704, the higher-order feature generation unit 107 generates a higher-order feature φ(Z_F) by Equation 3 above. The higher-order feature φ(Z_F) is represented by a vector. In step S705, the identification unit 109 identifies whether or not a car appears in the image data input in step S701. In this identification process, the identification unit 109 uses the category-specific classifier for the "car" category generated in step S505 and calculates a discriminant function value for each type of individual feature. The identification unit 109 then calculates the weighted sum of the discriminant function values by Equation 4 below to obtain the final score.

Here, F is the type of individual feature (in this case, SIFT feature or MFCC feature). f_F is the category-specific classifier for individual feature type F. α_F is a weight coefficient, chosen so as to maximize Average Precision on a validation set.
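The weighted sum described for Equation 4 (whose image is not reproduced in this text) can be sketched as the sum over feature types F of α_F times the discriminant function value f_F. The function name and the numeric values below are purely illustrative assumptions; in particular, the text does not state that the weights sum to one.

```python
def fused_score(discriminant_values, alphas):
    """Weighted sum of per-feature-type discriminant function values."""
    return sum(alphas[name] * value for name, value in discriminant_values.items())

# Hypothetical outputs of the "car" classifiers for one shot.
discriminant_values = {"SIFT": 0.8, "MFCC": -0.2}
# Hypothetical weights, assumed to have been tuned on a validation set.
alphas = {"SIFT": 0.75, "MFCC": 0.25}

final_score = fused_score(discriminant_values, alphas)
```

How the final score is turned into a car or non-car decision (for example, by comparing it with a threshold) is not specified in the text.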

In the present embodiment, the category-specific classifier is generated, and the category of the shot image data is identified, with the noise distributions excluded, so category identification accuracy can be improved.

The present invention can also be realized by the following process: software (a program) that implements the functions of the embodiment described above is supplied to a system or apparatus via a network or any of various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads out and executes the program.

DESCRIPTION OF SYMBOLS 100: identification device, 101: data input unit, 102: individual feature extraction unit, 103: background model generation unit, 104: noise determination unit, 105: storage device, 106: model generation unit, 107: higher-order feature generation unit, 108: classifier generation unit, 109: identification unit, 200: background model generation device, 400: category-specific classifier learning device, 600: category identification device

Claims

An identification device comprising:
first data input means for inputting first data;
first feature extraction means for extracting features from the first data;
background model generation means for generating a background model for the features extracted by the first feature extraction means;
determination means for determining a noise distribution in the background model;
second data input means for inputting second data;
second feature extraction means for extracting features from the second data;
model generation means for generating a model for the features extracted by the second feature extraction means based on a distribution other than the noise distribution in the background model; and
classifier generation means for generating a classifier for identifying a category based on the model.

The identification device according to claim 1, wherein the determination means determines the noise distribution for each category, and the model generation means generates a model for each category based on the distributions other than the noise distribution of that category.

The identification device according to claim 2, wherein the determination means calculates, for each distribution included in the background model, the frequency or ratio of features of the category to be identified among the features constituting that distribution, and determines a distribution for which the calculated frequency or ratio is equal to or less than a predetermined threshold to be a noise distribution of that category.

The identification device according to claim 1, further comprising:
third data input means for inputting third data; and
identification means for identifying the category of the third data using the classifier.

The identification device according to claim 1, wherein the classifier generation means generates a classifier for each category.

A method for controlling an identification device, comprising:
A first data input step for inputting first data;
A first feature extraction step of extracting features from the first data;
A background model generation step for generating a background model for the features extracted by the first feature extraction step;
A determination step of determining a noise distribution in the background model;
A second data input step for inputting second data;
A second feature extraction step for extracting features from the second data;
A model generation step of generating a model for the feature extracted by the second feature extraction step based on a distribution other than the noise distribution of the background model;
And a classifier generation step of generating a classifier for identifying a category based on the model.

A program for causing a computer to execute:
a first data input step of inputting first data;
a first feature extraction step of extracting features from the first data;
a background model generation step of generating a background model for the features extracted by the first feature extraction step;
a determination step of determining a noise distribution in the background model;
a second data input step of inputting second data;
a second feature extraction step of extracting features from the second data;
a model generation step of generating a model for the features extracted by the second feature extraction step based on a distribution other than the noise distribution of the background model; and
a classifier generation step of generating a classifier for identifying a category based on the model.