JP2012118668A

JP2012118668A - Learning device for pattern classification device and computer program for the same

Info

Publication number: JP2012118668A
Application number: JP2010266448A
Authority: JP
Inventors: Hideyuki Watanabe; 秀行渡辺
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2010-11-30
Filing date: 2010-11-30
Publication date: 2012-06-21
Anticipated expiration: 2030-11-30
Also published as: JP5704692B2

Abstract

PROBLEM TO BE SOLVED: To provide a learning device for a pattern classification device that achieves a high recognition rate by using a loss function directly linked to Bayes error estimation. SOLUTION: A learning device 42 includes a storage device 64 for storing a learning pattern set, and a learning unit 66 for learning, from the learning patterns, a discriminant function defined for each class. The discriminant function is expressed as a linear sum of kernel operations between an input pattern and a plurality of prototypes. Given a feature transformation that maps the input pattern into a space of higher dimension than the input pattern space, the kernel is defined as the inner product between the transformed input pattern and the transformed prototype, and the Gram matrix formed from the kernel operations among the prototypes is a positive definite matrix. The learning unit adjusts the coefficient vectors so as to minimize an average classification error count loss defined, in the high-dimensional space, as a function of the learning patterns and the coefficient vector set.

Description

The present invention relates to a pattern recognition device that determines to which of a plurality of predetermined classes a vector pattern consisting of measured values of some physical quantity belongs, and more particularly to a learning device that trains such a pattern classification device on the basis of learning data.

The minimum classification error (MCE) learning method and the support vector machine (SVM) method are widely used as pattern recognition techniques capable of achieving a high recognition rate. An early example of the former is described in Non-Patent Document 1; it directly pursues Bayes error estimation for a wide variety of patterns, including variable-length patterns. The latter is described in, for example, Non-Patent Document 2; it aims to improve learning robustness by maximizing the geometric margin (the distance between the classification decision boundary and the learning patterns) of a linear discriminant function in the high-dimensional space associated with the kernel.

Non-Patent Document 1: B.-H. Juang and S. Katagiri, "Discriminative learning for minimum error classification," IEEE Trans. Signal Processing, vol. 40, no. 12, pp. 3043-3054, Dec. 1992.
Non-Patent Document 2: J. Wang, X. Wu, and C. Zhang, "Support vector machines based on K-means clustering for real-time business intelligence systems," International Journal of Business Intelligence and Data Mining, vol. 1, no. 1, pp. 54-64, 2005.

In the SVM method, minimizing the loss function is not directly tied to Bayes error estimation, and the classification error probability is not sufficiently minimized when only a finite number of learning patterns is available. In the conventional MCE method, by contrast, the loss function is directly linked to Bayes error estimation, but typical implementations do not consider forming a precise classification decision boundary based on a kernel mapping into a high-dimensional space, so a sufficient recognition rate is again not obtained.

It is therefore an object of the present invention to provide a learning device that uses a loss function directly linked to Bayes error estimation and can train a pattern classification device so as to obtain a higher recognition rate than conventional devices.

According to a first aspect of the present invention, a learning device for a pattern classification device is a learning device for training a classifier that assigns an input pattern to one of a plurality of classes. The learning device includes: storage means for storing a learning pattern set whose elements are learning patterns, each consisting of a vector obtained from observation data of a predetermined physical quantity and the label of the class to which that vector belongs; and learning means for learning, using the learning patterns in the learning pattern set stored in the storage means as learning data, discriminant functions that are defined for the respective classes and that each measure the degree to which an input pattern belongs to the corresponding class. Each discriminant function is a linear sum of kernel operations between the input pattern and a plurality of prototypes that are obtained from the learning pattern set and correspond to the respective classes. The plurality of prototypes forms a prototype set. Given a feature transformation that maps an input pattern into a space of higher dimension than the input pattern space, the kernel operation is defined as the inner product between the transformed input pattern and the transformed prototype, and it is a kernel operation for which the Gram matrix formed from the kernel operations among the prototypes in the prototype set is always positive definite, for any number of prototypes and any choice of prototypes. For each class, the kernel coefficients corresponding to the prototypes in the linear sum form a coefficient vector, and the coefficient vectors formed for the respective classes form a coefficient vector set. The learning means adjusts the coefficient vectors in the coefficient vector set so as to minimize, in the high-dimensional space, an average classification error count loss defined as a function of the learning patterns and the coefficient vector set.

Preferably, the learning device further includes clustering means for calculating the plurality of prototypes by clustering the vectors that constitute the observation data.

More preferably, the learning means includes: initialization means for initializing each coefficient vector in the coefficient vector set by a predetermined initialization method; learning pattern extraction means for extracting one learning pattern from the learning pattern set; coefficient vector adjustment means for adjusting, in response to a learning pattern being extracted by the learning pattern extraction means, the coefficient vectors in the coefficient vector set so as to minimize the average classification error count loss; and first iteration control means for causing the extraction of a learning pattern by the learning pattern extraction means and the adjustment of the coefficient vectors by the coefficient vector adjustment means to be executed repeatedly until every learning pattern in the learning pattern set has been extracted by the learning pattern extraction means.

Still more preferably, the learning means further includes: shuffle means for shuffling the order of the learning patterns in the learning pattern set each time the iteration by the first iteration control means finishes; second iteration control means for restarting the iteration by the first iteration control means in response to completion of the shuffling by the shuffle means; and stop means for stopping the iteration by the second iteration control means when that iteration has been completed a predetermined number of times.

The learning device may further include means for fixing to zero those components of the coefficient vectors obtained by the initialization means whose absolute values are smaller than a predetermined threshold.

Preferably, the prototype set is the learning pattern set, and the initialization means includes: SVM learning means for optimizing, by training a multi-class support vector machine, the coefficient vectors of linear sums of vectors obtained by applying a predetermined transformation to the learning patterns, the linear sums being used to classify the learning patterns in the learning pattern set into the plurality of classes; and initial value setting means for setting the coefficient vectors optimized for the learning pattern set by the SVM learning means as the initial values of the coefficient vectors consisting of the kernel coefficients corresponding to the prototypes in the linear sum.

More preferably, the initialization means further includes prototype selection means for selecting as prototypes, from among the learning patterns corresponding to the coefficient vectors optimized by the SVM learning means, only the support vectors whose coefficient vectors differ from the zero vector by at least a predetermined value, and for constructing the discriminant functions from them.

Still more preferably, the initialization means includes means for setting, as the initial values of the coefficient vector set, the coefficient vectors of a Gaussian mixture model or radial basis function that has been trained in advance to fit the learning pattern set and the prototype set.

The learning means may include: initialization means for initializing each coefficient vector in the coefficient vector set by a predetermined initialization method; learning pattern extraction means for extracting one learning pattern from the learning pattern set; parameter adjustment means for adjusting, in response to a learning pattern being extracted by the learning pattern extraction means, the coefficient vectors in the coefficient vector set and the prototypes in the prototype set so as to minimize the average classification error count loss; and first iteration control means for causing the extraction of a learning pattern by the learning pattern extraction means and the adjustment of the coefficient vectors and prototypes by the parameter adjustment means to be executed repeatedly until every learning pattern in the learning pattern set has been extracted by the learning pattern extraction means.

Preferably, the learning means further includes: shuffle means for shuffling the order of the learning patterns in the learning pattern set each time the iteration by the first iteration control means finishes; second iteration control means for restarting the iteration by the first iteration control means in response to completion of the shuffling by the shuffle means; and stop means for stopping the iteration by the second iteration control means when that iteration has been completed a predetermined number of times.

A computer program according to a second aspect of the present invention causes a computer to function as a learning device for a pattern classification device that classifies an input pattern into one of a plurality of classes. The computer program causes the computer to function as: storage means for storing a learning pattern set whose elements are learning patterns, each consisting of a vector obtained from observation data of a predetermined physical quantity and the label of the class to which that vector belongs; and learning means for learning, using the learning patterns in the learning pattern set stored in the storage means as learning data, discriminant functions that are defined for the respective classes and that each measure the degree to which an input pattern belongs to the corresponding class. Each discriminant function is a linear sum of kernel operations between the input pattern and a plurality of prototypes that are obtained from the learning pattern set and correspond to the respective classes. The plurality of prototypes forms a prototype set. Given a feature transformation that maps an input pattern into a space of higher dimension than the input pattern space, the kernel operation is defined as the inner product between the transformed input pattern and the transformed prototype, and it is a kernel operation for which the Gram matrix formed from the kernel operations among the prototypes in the prototype set is always positive definite, for any number of prototypes and any choice of prototypes. For each class, the kernel coefficients corresponding to the prototypes in the linear sum form a coefficient vector, and the coefficient vectors formed for the respective classes form a coefficient vector set. The learning means adjusts the coefficient vectors in the coefficient vector set so as to minimize, in the high-dimensional space, an average classification error count loss defined as a function of the learning patterns and the coefficient vector set.

As described above, according to the present invention, the known large geometric margin MCE learning method is applied to the linear-sum coefficient parameters of discriminant functions that have the form of a linear sum of kernels. This not only makes it possible to form precise classification decision boundaries using kernels, but also formalizes a learning method that directly aims at both minimizing classification errors and improving robustness to unknown patterns. As a result, the present invention yields a high recognition rate for unknown patterns other than the learning patterns, even when the distribution structure of the patterns is complex.

FIG. 1 is a diagram schematically showing an outline of classification by a pattern recognition device.
FIG. 2 is a diagram schematically showing the relationship between the geometric margin in the N-dimensional feature space B and the geometric margin in the high-dimensional space H.
FIG. 3 is a schematic block diagram showing the configuration of a speaker identification system according to a first embodiment of the present invention.
FIG. 4 is a flowchart showing the control structure of a computer program that implements the discriminant function learning device according to the first embodiment.
FIG. 5 is a flowchart showing the control structure of the part of the computer program that implements the adaptive learning processing of the M-dimensional coefficient vectors τ in the flowchart of FIG. 4.
FIG. 6 is a flowchart showing the control structure of a computer program that implements the discriminant function learning device according to a second embodiment.
FIG. 7 is a flowchart showing the control structure of the part of the computer program that implements the adaptive learning processing of the coefficient vectors α in the flowchart of FIG. 6.
FIG. 8 is a flowchart showing the control structure of a computer program that implements the discriminant function learning device according to a third embodiment.
FIG. 9 is a flowchart showing the control structure of the part of the computer program that implements the adaptive learning processing of the coefficient vectors τ and the prototype vectors p_m in the flowchart of FIG. 8.
FIG. 10 is a front view of a computer system that implements the embodiments of the present invention.
FIG. 11 is a block diagram of the computer system shown in FIG. 10.

In the following description and drawings, identical parts are given identical reference numerals. Detailed descriptions of them are therefore not repeated.

<< First Embodiment >>
[1 Classifier structure]
Referring to FIG. 1, consider the classification problem of assigning an input pattern (observation) vector x ∈ Χ (the entire input pattern space 20 shown in FIG. 1) to one of J classes C_1, C_2, ..., C_J (classes 22, 24, ..., 26 and 28 in FIG. 1). To keep the description simple, an input pattern vector is hereinafter called simply an "input pattern" and, likewise, a "learning pattern vector" is called a "learning pattern".

In the present embodiment, the following function is used as the discriminant function g_j(x; Λ) that measures the degree to which an input pattern x belongs to class C_j.
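The equation itself is not reproduced in this text. Based on the definitions that follow it (prototypes p_m, real coefficients τ_{m,j}, kernel K) and on the description of the discriminant function as a linear sum of kernel operations, a form consistent with them would be:

```latex
g_j(x;\Lambda) = \sum_{m=1}^{M} \tau_{m,j}\, K(x, p_m) \tag{1}
```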

Here, {p_m} (m = 1, ..., M) is the set of prototypes computed from the set of N learning patterns {x_n} (n = 1, ..., N), where M is the total number of prototypes. For example, the prototypes are the representative vectors of the clusters obtained by clustering the learning patterns {x_n}; any clustering method may be used. The prototype set may also be the learning pattern set itself, in which case M = N. τ_{m,j} is a real-valued parameter adjusted by learning. Λ is the set of learning parameters of the classifier; here, Λ = {τ_{m,j}} (m = 1, ..., M; j = 1, ..., J). For two patterns x, x' ∈ Χ, K(x, x') denotes the inner product of the two transformed pattern vectors φ(x) and φ(x') under a suitably chosen feature transformation φ(·) from the input pattern space Χ to a space of very high (often infinite) dimension, here called the space H; K is called a kernel. Many kernels K(·,·) exist. Any of them may be used here, provided that the M × M Gram matrix K built from the M prototypes (Equation (2)), whose (m, m') element is K(p_m, p_m'), is always positive definite whatever the values of M and {p_m}. Such a kernel is called a positive definite kernel. In practice, many kernels satisfy this condition; the Gaussian kernel, for example, satisfies it and is widely used.
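Neither the Gram matrix nor the Gaussian kernel formula is reproduced in this text. Written out from the definitions above, the Gram matrix would be as follows; the Gaussian kernel is shown in one common parametrization, where the width parameter σ is an assumption and not taken from the source:

```latex
K = \begin{bmatrix}
K(p_1,p_1) & \cdots & K(p_1,p_M) \\
\vdots     & \ddots & \vdots     \\
K(p_M,p_1) & \cdots & K(p_M,p_M)
\end{bmatrix} \tag{2}

K(x, x') = \exp\!\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)
```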

The classifier then classifies input patterns according to the following classification decision rule.
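The decision rule is not reproduced in this text. The following is a minimal Python sketch of such a classifier, assuming the rule assigns x to the class whose discriminant function value is largest, and assuming a Gaussian kernel with a hypothetical width parameter `sigma`. The function names and the toy prototypes and coefficients are illustrative only and do not come from the source.

```python
import math

def gaussian_kernel(x, p, sigma=1.0):
    """Gaussian kernel between pattern x and prototype p (sigma is an assumed width)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def discriminants(x, prototypes, tau, sigma=1.0):
    """Return [g_1(x), ..., g_J(x)], where tau[j][m] is the coefficient of prototype m for class j."""
    k = [gaussian_kernel(x, p, sigma) for p in prototypes]
    return [sum(t * k_m for t, k_m in zip(tau_j, k)) for tau_j in tau]

def classify(x, prototypes, tau, sigma=1.0):
    """Assumed decision rule: index of the class with the largest discriminant value."""
    g = discriminants(x, prototypes, tau, sigma)
    return max(range(len(g)), key=lambda j: g[j])

# Toy setup: two prototypes, two classes, one prototype per class.
prototypes = [(0.0, 0.0), (3.0, 3.0)]
tau = [[1.0, 0.0], [0.0, 1.0]]
print(classify((0.2, -0.1), prototypes, tau))  # prints 0
print(classify((2.8, 3.1), prototypes, tau))   # prints 1
```

Storing the coefficients as one vector per class mirrors the coefficient vectors τ_j used in the text, so the same layout can be reused when the coefficients are adjusted by learning.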

[2 Large geometric margin minimum classification error learning for kernel linear-sum discriminant functions]
(2.1 Outline of large geometric margin minimum classification error learning in the feature space)
Equation (1) can be written in the inner product form of M-dimensional vectors as follows.
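The inner product form and the two vector definitions are not reproduced in this text. Assuming Equation (1) is the kernel linear sum over the M prototypes, they would read:

```latex
g_j(x;\Lambda) = \tau_j^{\mathsf T} k(x), \qquad
\tau_j = [\tau_{1,j}, \ldots, \tau_{M,j}]^{\mathsf T}, \qquad
k(x) = [K(x,p_1), \ldots, K(x,p_M)]^{\mathsf T}
```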

Here, the superscript T denotes the transpose of a matrix or vector. If the vector k(x) is regarded as a feature vector obtained by mapping the pattern x into an M-dimensional feature space, g_j(x; Λ) can be regarded as a linear discriminant function on that feature space with the vector τ_j as its coefficient vector.

Furthermore, to establish the correspondence with learning in the high-dimensional space H described later, a linear transformation is applied to the vector k(x). First, consider the Cholesky decomposition (G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed., The Johns Hopkins University Press, 1989) of the Gram matrix K of Equation (2), which is assumed here to be positive definite.

Here, the matrix L is a lower triangular matrix with positive diagonal elements. Using this lower triangular matrix, the M-dimensional coefficient vector τ_j and the M-dimensional feature vector k(x) are transformed by the following equations.
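The decomposition and the two transformations are not reproduced in this text. The transformation β(·) = L⁻¹k(·) is stated explicitly later in the description; the form of α_j shown here is inferred from the requirement that the discriminant function value be unchanged, so it should be read as a reconstruction:

```latex
K = L L^{\mathsf T}

\alpha_j = L^{\mathsf T} \tau_j \tag{9}

\beta(x) = L^{-1} k(x) \tag{10}
```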

The discriminant function can then be rewritten as follows.
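The rewritten equation is not reproduced in this text. Assuming the transformations α_j = Lᵀτ_j and β(x) = L⁻¹k(x), the rewriting follows in one line, since L L⁻¹ is the identity:

```latex
g_j(x;\Lambda) = \tau_j^{\mathsf T} k(x)
             = \tau_j^{\mathsf T} L\, L^{-1} k(x)
             = (L^{\mathsf T}\tau_j)^{\mathsf T} \bigl(L^{-1} k(x)\bigr)
             = \alpha_j^{\mathsf T} \beta(x)
```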

Referring to FIG. 2, the M-dimensional space 32 to which the input pattern space 30 is mapped by the vector transformation β(·) = L^{-1} k(·) is denoted here as the M-dimensional space B. If the vector β(x), rather than k(x), is regarded as the M-dimensional feature vector, the above equation shows that the discriminant function g_j(x; Λ) is a linear discriminant function on the M-dimensional feature space B with the vector α_j as its coefficient vector. In the present embodiment, however, the coefficient vector set {α_j} (j = 1, ..., J) is optimized through adjustment of the coefficient vector set {τ_j} (j = 1, ..., J), by virtue of the relationship in Equation (9).

The set of discriminant functions {g_j(x; Λ)} (j = 1, ..., J) determines the classification decision boundary, and this boundary is formed not only in the pattern space Χ but also in the feature space B. Referring to FIG. 2, consider the classification decision boundary Γ formed in the feature space B. Take one learning pattern that is mapped near the boundary Γ by the vector transformation β(·) of Equation (10) and is correctly classified by the discriminant functions, and denote it x^o. With reference to the literature (H. Watanabe, S. Katagiri, K. Yamada, E. McDermott, A. Nakamura, S. Watanabe, and M. Ohsaki, "Minimum error classification with geometric margin control," in Proc. IEEE ICASSP, pp. 2170-2173, Mar. 2010), the Euclidean distance r (in the feature space B) between the vector β(x^o) and the boundary Γ is given by the following equation.
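The distance formula is not reproduced in this text. For linear discriminant functions g_j = α_jᵀβ(x) in the space B, the distance from β(x^o) to the hyperplane separating the correct class C_y from its strongest competitor C_i has the standard point-to-hyperplane form, so a reconstruction consistent with the text would be:

```latex
r = \frac{g_y(x^{o};\Lambda) - g_i(x^{o};\Lambda)}{\lVert \alpha_y - \alpha_i \rVert}
  = \frac{(\alpha_y - \alpha_i)^{\mathsf T} \beta(x^{o})}{\lVert \alpha_y - \alpha_i \rVert} \tag{12}
```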

Here, C_y is the correct class to which the learning pattern x^o belongs, and C_i is the best-incorrect class for x^o (the incorrect class that gives the largest discriminant function value). The Euclidean distance r is called the geometric margin (in the space B). If the discriminant functions are trained so that this value becomes large, unknown patterns of the same class that are likely to appear in the space B near a correctly classified learning pattern β(x^o) close to the boundary can be classified correctly, and learning robustness improves (Watanabe et al., cited above).

Because Equation (12) contains the coefficient vectors α_y and α_i, using it as is requires computing the Cholesky decomposition of the Gram matrix K and the inverse of the lower triangular matrix. When the number of prototypes M is large, this causes numerical instability (the problem is especially serious when the learning pattern set itself is used as the prototype set). In the present invention, therefore, Equations (9) and (10) are substituted into Equation (12) to rewrite the geometric margin r as a function of the coefficient vector set {τ_j} (j = 1, ..., J), as in the following equation, and this form is used for learning.
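The rewritten margin is not reproduced in this text. Assuming K = L Lᵀ, α_j = Lᵀτ_j and β(x) = L⁻¹k(x), the numerator reduces to (τ_y − τ_i)ᵀk(x^o) and the squared norm in the denominator to (τ_y − τ_i)ᵀ L Lᵀ (τ_y − τ_i), which gives:

```latex
r = \frac{(\tau_y - \tau_i)^{\mathsf T} k(x^{o})}
         {\sqrt{(\tau_y - \tau_i)^{\mathsf T} K\, (\tau_y - \tau_i)}}
  = \frac{g_y(x^{o};\Lambda) - g_i(x^{o};\Lambda)}
         {\sqrt{(\tau_y - \tau_i)^{\mathsf T} K\, (\tau_y - \tau_i)}}
```

Only the Gram matrix K itself appears in this form, which matches the statement that neither the Cholesky factor nor its inverse has to be computed.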

That is, the above equation is the geometric margin in the space B with the coefficient vector set {τ_j} (j = 1, ..., J) as the adjustable parameters; moreover, using it makes the Cholesky decomposition of the Gram matrix K and the inversion of the lower triangular matrix unnecessary.
Accordingly, for each learning pattern x, the quantity D_y(x; Λ) is defined as the geometric margin r with its sign inverted. D_y(x; Λ) is positive for a misclassification and negative for a correct classification, and can be regarded as a kind of misclassification measure in minimum classification error (MCE) learning. Its absolute value represents the Euclidean distance from the classification decision boundary. D_y(x; Λ) is hereinafter called the geometric-margin-type misclassification measure.
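The defining equation is not reproduced in this text. Assuming the geometric margin is expressed through the coefficient vectors τ_j and the Gram matrix K, inverting its sign would give:

```latex
D_y(x;\Lambda) = -\,\frac{g_y(x;\Lambda) - g_i(x;\Lambda)}
                        {\sqrt{(\tau_y - \tau_i)^{\mathsf T} K\, (\tau_y - \tau_i)}}
```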

Next, the smoothed classification error count loss for the geometric-margin-type misclassification measure D_y(x; Λ) is defined by the following equation.
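The loss formula is not reproduced in this text. The description that follows (a logistic sigmoid, monotonically increasing in D_y, with steepness controlled by ζ) is matched by the form below; the symbol ℓ for the loss is an assumption:

```latex
\ell\bigl(D_y(x;\Lambda)\bigr) = \frac{1}{1 + \exp\bigl(-\zeta\, D_y(x;\Lambda)\bigr)} \tag{15}
```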

Equation (15) is a logistic sigmoid function that increases monotonically with D_y(x; Λ); the slope of the sigmoid becomes steeper as the parameter ζ increases. In the limit ζ → ∞, the loss takes the value 1 when D_y(x; Λ) > 0, that is, for a misclassification, and the value 0 when D_y(x; Λ) < 0, that is, for a correct classification. The smoothed classification error count loss is thus not only directly tied to the misclassification count but also differentiable with respect to the learning parameter set Λ. Furthermore, adjusting Λ so as to reduce the value of Equation (15) not only reduces the number of classification errors but also pushes D_y(x; Λ) further in the negative direction, so the geometric margin (in the space B) of correctly classified learning patterns increases and robustness to unknown patterns improves.
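The behavior described above can be illustrated with a small Python sketch. It assumes the misclassification measure is the negated margin computed from τ_j and the Gram matrix K, and that the loss is a logistic sigmoid with steepness ζ. The function names, the 2 × 2 Gram matrix and the kernel vector `k_x` are made-up illustrative values, not data from the source.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def misclassification_measure(k_x, y, K, tau):
    """Negated geometric margin for a pattern with kernel vector k_x and correct class y."""
    g = [dot(t, k_x) for t in tau]
    # Best-incorrect class: the wrong class with the largest discriminant value.
    i = max((j for j in range(len(tau)) if j != y), key=lambda j: g[j])
    d = [a - b for a, b in zip(tau[y], tau[i])]
    Kd = [dot(row, d) for row in K]
    return -(g[y] - g[i]) / math.sqrt(dot(d, Kd))

def smoothed_loss(D, zeta):
    """Logistic sigmoid of zeta * D, evaluated in a numerically stable way."""
    z = zeta * D
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

K = [[1.0, 0.5], [0.5, 1.0]]      # illustrative Gram matrix of two prototypes
tau = [[1.0, 0.0], [0.0, 1.0]]    # one coefficient vector per class
D = misclassification_measure([0.9, 0.4], 0, K, tau)   # correctly classified, so D < 0

# As zeta grows, the loss of a correctly classified pattern approaches 0.
for zeta in (1.0, 10.0, 100.0):
    print(zeta, smoothed_loss(D, zeta))
```

With a large ζ the loss behaves almost like a 0/1 error count, while a moderate ζ keeps a useful gradient for patterns near the boundary.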

In learning, a learning parameter set Λ is sought that minimizes the empirical average loss, given by the following equation, over a learning pattern set Ω_N = {(x_n, y_n)}^N_{n=1} consisting of N learning patterns (y_n is the index of the correct class to which learning pattern x_n belongs).

To minimize the loss L(Λ), not only batch methods such as steepest descent but also adaptive learning methods, which adjust the learning parameter set Λ each time one learning pattern (x_n, y_n) is drawn from the learning pattern set Ω_N, are widely used. In the adaptive method, the adjustment rule for Λ is given by the following equation (t is the iteration number).

In this embodiment, this adaptive learning method is adopted.
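The equation images for the empirical average loss and for the adaptive update are not reproduced in this text. A standard form consistent with the description, given here only as an assumption, is the following, where \ell denotes the smoothed loss of Equation (15) and \epsilon > 0 is a step size (both symbols are introduced here for illustration).

```latex
L(\Lambda) = \frac{1}{N} \sum_{n=1}^{N} \ell_{y_n}(x_n; \Lambda),
\qquad
\Lambda^{(t+1)} = \Lambda^{(t)} - \epsilon \, \nabla_{\Lambda} \, \ell_{y_n}(x_n; \Lambda) \Big|_{\Lambda = \Lambda^{(t)}}
```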

(2.2 System configuration and calculation procedure for learning the discriminant function)
2.2.1 System configuration
Referring to FIG. 3, a system 40 according to the present embodiment identifies which of a plurality of previously known speakers uttered an input speech 46. The system 40 includes: a discriminant function learning device 42 that learns discriminant functions for speaker identification according to the procedure described below; a discriminant function transmission medium 44 that conveys the discriminant functions learned by the discriminant function learning device 42 to a speaker discriminating device 48 in some form; and the speaker discriminating device 48, which identifies the speaker of the input speech 46 using the discriminant functions conveyed by the discriminant function transmission medium 44 and outputs a speaker discrimination result 50. In general, the discriminant function learning device 42 and the speaker discriminating device 48 are separate devices. That is, the discriminant functions learned by the discriminant function learning device 42 are distributed to the speaker discriminating device 48 via a storage medium such as a hard disk or semiconductor memory, or via a communication medium. The speaker discriminating device 48 is therefore not necessarily assumed to be at the same location as the discriminant function learning device 42.

The discriminant function learning device 42 includes: a first storage device 60 that stores utterance data for learning; a feature extraction unit 62 that extracts predetermined feature vectors from the utterance data stored in the first storage device 60 and outputs them as learning patterns for speaker discrimination; a second storage device 64 that stores the learning pattern set extracted by the feature extraction unit 62; and a learning device 66 that, using the learning pattern set stored in the second storage device 64 as sample data for learning, learns the speakers' discriminant functions according to a procedure described later and supplies them to the discriminant function transmission medium 44.

The speaker discriminating device 48, on the other hand, includes: a discriminant function storage unit 80 that stores the per-speaker discriminant functions conveyed by the discriminant function transmission medium 44; a feature extraction unit 82 that extracts predetermined feature vectors from the input speech 46 by the same method as the feature extraction unit 62 of the discriminant function learning device 42; and a speaker discriminating unit 84 that applies the discriminant functions stored in the discriminant function storage unit 80 to the feature vectors extracted by the feature extraction unit 82, determines one of the plurality of speakers to be the speaker of the input speech 46, and outputs the speaker discrimination result 50.

As will be described later, the discriminant function learning device 42 and the speaker discriminating device 48 are each realized by computer hardware capable of exchanging data with the storage devices and the discriminant function transmission medium 44, and by computer software executed on that hardware. The control structure of the computer program that realizes the discriminant function learning device 42 is described below.

2.2.2 Calculation procedure
Referring to FIGS. 4 and 5, the program for realizing learning according to the present embodiment has the following steps.
1. (Step 110)
Prepare a positive definite kernel K(·,·).
2. (Step 112)
Prepare a prototype set {p_m}^M_{m=1}. The prototype set may be prepared in advance, but in this embodiment it is obtained by clustering the learning pattern set {x_n}^N_{n=1}.
3. (Steps 114 and 116)
For each class C_j, initialize the M-dimensional coefficient vector τ⁽⁰⁾_j (j = 1, 2, ..., J).
4. (Step 118)
Initialize to t = 0 the iteration control variable t, which counts the iterations of the adaptive learning of the coefficient vectors τ. Also set the upper limit E of the variable e, which counts epochs, that is, passes over all learning patterns.
5. (Step 120)
Perform adaptive learning of the coefficient vectors τ. The details are described later with reference to FIG. 5. As a result of step 120, the coefficient vector τ_j needed to construct the discriminant function g_j(x) is obtained for each class C_j (j = 1, 2, ..., J).
6. (Steps 122 and 124)
From the coefficient vectors τ_j (j = 1, 2, ..., J) finally obtained in step 120, construct the discriminant function of each class C_j (j = 1, 2, ..., J) according to the following equation.

7. (Step 126)
Store the discriminant functions g_j(x) (j = 1, 2, ..., J) of the classes obtained in steps 122 and 124 in a predetermined storage device, and end the processing.
Referring to FIG. 5, the adaptive learning of the coefficient vectors τ in step 120 of FIG. 4 includes the following steps.
The adaptive learning repeats the following process 152 for the epoch variable e = 0, 1, ..., E (step 150).
Process 152 includes step 160, which repeats the following process 162 for all learning patterns, and step 164, which shuffles the order of the learning patterns in the learning pattern set Ω_N after step 160 has finished.
Process 162 includes the following substeps.
(a) (Substep 170)
Take one learning pattern (x_n, y_n) from the learning pattern set Ω_N.
(b) (Substep 172)
Construct the M-dimensional vector k(x_n) according to Equation (7).
(c) (Substeps 174 and 176)
For each class C_j, compute the discriminant function value g_j according to the following equation (j = 1, 2, ..., J).

(d) (Substep 178)
Find the best-incorrect class C_in for the learning pattern x_n according to g_in = max_{j, j≠y_n} g_j.

(e) (Substep 180)
Compute the functional-margin-type misclassification measure of the early MCE learning formulation as d_yn = -g_yn + g_in.

(f) (Substep 182)
Compute the geometric-margin-type misclassification measure value D_yn according to the following equation.

(g) (Substep 184)
Update the coefficient vectors τ_j according to the following equation (j = 1, 2, ..., J).

(h) (Substep 186)
Update t = t + 1 and end the processing for the current learning pattern.
Executing a computer program having the above control structure on a computer completes the learning of the discriminant functions according to the first embodiment.
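The control structure of FIGS. 4 and 5 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented program itself. The Gaussian kernel form, the step size eps, the plain logistic loss, and the gradient are assumptions, because the equation images for substeps 182 and 184 are not reproduced in this text. The gradient is derived here by the chain rule from D = d / sqrt(Δ^T K Δ) with Δ = τ_y - τ_i, the form suggested by the later discussion of Equations (26) and (27). All function names are hypothetical. The coefficient vectors must be initialized so that the vectors of different classes differ, for example by the MSVM of 2.3.

```python
import math
import random

def kernel(a, b, gamma=1.0):
    """Gaussian kernel; the exact form of Equation (3) is assumed here."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train(X, y, P, tau, zeta=5.0, eps=0.05, epochs=20, seed=0):
    """Adaptive large-geometric-margin MCE sketch; tau[j] is class j's M-vector."""
    rng = random.Random(seed)
    K = [[kernel(p, q) for q in P] for p in P]            # Gram matrix of prototypes
    data = list(zip(X, y))
    for _ in range(epochs):                                # step 150
        for x, yn in data:                                 # step 160, substep 170
            k = [kernel(x, p) for p in P]                  # substep 172
            g = [dot(t, k) for t in tau]                   # substeps 174 and 176
            others = [j for j in range(len(tau)) if j != yn]
            i = max(others, key=lambda j: g[j])            # substep 178
            d = -g[yn] + g[i]                              # substep 180
            delta = [a - b for a, b in zip(tau[yn], tau[i])]
            Kd = [dot(row, delta) for row in K]
            norm = math.sqrt(max(dot(delta, Kd), 1e-12))   # guard against zero norm
            D = d / norm                                   # substep 182 (assumed form)
            loss = 1.0 / (1.0 + math.exp(-zeta * D))
            w = eps * zeta * loss * (1.0 - loss)           # step size times dloss/dD
            # dD/d(tau_y) = -(k + D * K delta / norm) / norm; tau_i gets the opposite sign.
            grad = [-(kk + D * kd / norm) / norm for kk, kd in zip(k, Kd)]
            tau[yn] = [t - w * gr for t, gr in zip(tau[yn], grad)]  # substep 184
            tau[i] = [t + w * gr for t, gr in zip(tau[i], grad)]
        rng.shuffle(data)                                  # step 164
    return tau

def classify(x, P, tau):
    """Return the index of the class with the largest discriminant function value."""
    k = [kernel(x, p) for p in P]
    return max(range(len(tau)), key=lambda j: dot(tau[j], k))
```

Only the correct class and the best-incorrect class are updated for each pattern, which follows from d_yn depending on those two classes alone.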

(2.3 Initialization (Step 116))
In this embodiment, a known multi-class support vector machine (hereinafter abbreviated as "MSVM") is adopted as the initialization method in steps 114 and 116 of the computer program. Other methods are also possible and are described later as modifications.

Referring again to FIG. 2, MSVM introduces a feature transformation φ(·) into a space 34 of very high dimension (called space H), which also appears in the definition of the kernel, and handles linear discriminant functions in space H. This linear discriminant function is given by the following equation.

What MSVM learns is a set of very high-dimensional coefficient vectors {w_j}^J_{j=1}. Learning is carried out by searching for the coefficient vector set {w_j}^J_{j=1} that solves the constrained optimization problem given by the following equation.

Here, 1(p) is an indicator function that returns 1 if proposition p is true and 0 if it is false. The first term of the objective function

serves to constrain the magnitude of the coefficient vectors of the linear discriminant functions. Minimizing this term aims at maximizing the geometric margin in space H, that is, the Euclidean distance r' in space H between the classification decision boundary Γ' in space H determined by the discriminant functions of Equation (18) (see FIG. 2) and the learning pattern closest to it. The ξ_n in the second term of the objective function is called a slack variable. The constraint in the above problem states that "the discriminant function value of the class C_yn to which learning pattern x_n belongs is greater than the discriminant function value of any class other than C_yn plus (1 - ξ_n)". This constraint secures a margin by making the discriminant function value of the correct class C_yn exceed those of the other classes by at least 1. The margin is, however, allowed to shrink by the value of the slack variable ξ_n, and minimizing the second term of the objective function seeks to minimize this allowance. To pursue both goals, minimizing the coefficient norm to maximize the geometric margin and minimizing the slack variables, the MSVM objective function is formed as a weighted sum of the two terms with the hyperparameter β as the weight.
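The role of the slack variable in the constraint described above can be illustrated with a small sketch. It assumes the constraint reads g_yn >= g_j + (1 - ξ_n) for every j other than y_n together with ξ_n >= 0, as paraphrased in the text; the function name slack and its arguments are hypothetical.

```python
def slack(g, y):
    """Smallest xi >= 0 such that g[y] >= g[j] + (1 - xi) for every j != y.

    g is the list of discriminant function values, y the correct class index.
    """
    best_other = max(v for j, v in enumerate(g) if j != y)
    return max(0.0, 1.0 + best_other - g[y])
```

A pattern classified correctly with a margin of at least 1 has slack 0, a correctly classified pattern inside the margin has slack between 0 and 1, and a misclassified pattern has slack above 1. The slack therefore grows with the size of the violation rather than counting errors, which is the point taken up in 2.4.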

The above constrained optimization problem can be solved by the method of Lagrange multipliers, and it reduces to solving the following convex optimization problem with respect to a set of N J-dimensional vectors {τ̃_n}^N_{n=1}. (In the equation image, the tilde is written directly above τ.)

Here, τ̃_n is defined by the following equation.

Whereas the coefficient vector τ_j of Equation (6) is an M-dimensional vector whose components are indexed by the prototype number m, the coefficient vector τ̃_n above is a J-dimensional vector whose components are indexed by the class number j. In Equation (20), "1_yn" is the J-dimensional vector whose y_n-th component is 1 and whose other components are 0, and the boldface "1" is the J-dimensional vector whose components are all 1. The desired coefficient vector w_j is then given by the following equation.

Substituting this into Equation (18), the discriminant function is given by the following equation (the constant factor β^-1 has no effect on the classification decision and can be omitted).

This discriminant function (Equation (23)) is identical to the discriminant function of Equation (1) when the prototype set is {p_m}^M_{m=1} = {x_n}^N_{n=1} (M = N). Therefore, when the prototype set is the learning pattern set itself, the coefficient set {τ_{n,j}} (n = 1, ..., N; j = 1, ..., J) computed by MSVM can be used for the initialization in steps 114 and 116 of the algorithm disclosed in 2.2.

In this embodiment, if the above MSVM is adopted as is for initialization, the prototype set must be the learning pattern set itself. Consequently, especially when the total number of learning patterns is very large, the MCE learning of this embodiment has to be carried out in a very high dimension, which leads to an explosion in computational cost and to numerical instability. To avoid this high-dimensionality problem, either of the following may be applied: (1) a method that uses only the support patterns obtained by the above MSVM as prototypes, or (2) an MSVM method in which the prototype set takes the place of the learning pattern set. These modifications are described in detail below.

<Modification 1>
2.3.1 Method using only support patterns as prototypes
In the MSVM described above, the coefficient vectors in the set {τ̃_n}^N_{n=1} obtained by solving the convex optimization problem of Equation (20) are zero vectors (or vectors close to the zero vector) for some (often many) n. This means that the corresponding learning pattern x_n lies in the region of its correct class far from the classification boundary, and such a learning pattern is considered to contribute little to forming the classification boundary. A learning pattern x_n whose τ̃_n is not close to the zero vector is called a support pattern or support vector. Accordingly, the discriminant function of Equation (1) may be constructed with the set of all support patterns as the prototype set {p_m}^M_{m=1}. In this case, M is the total number of support patterns. Specifically, it suffices to use only those learning patterns whose coefficient vector norm is at or above a certain threshold.
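The thresholding described in Modification 1 can be sketched as follows. The function name select_support_patterns, the default threshold, and the use of the Euclidean norm are assumptions for illustration; the text does not specify a threshold value.

```python
import math

def select_support_patterns(X, tau_tilde, threshold=1e-6):
    """Keep the patterns whose J-dimensional coefficient vector has norm >= threshold.

    X[n] is a learning pattern and tau_tilde[n] its coefficient vector from MSVM.
    Returns the selected patterns and their original indices.
    """
    keep = [n for n, t in enumerate(tau_tilde)
            if math.sqrt(sum(c * c for c in t)) >= threshold]
    return [X[n] for n in keep], keep
```

The returned indices make it possible to carry over the corresponding MSVM coefficients as initial values for the selected prototypes.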

<Modification 2>
2.3.2 MSVM method with the prototype set in place of the learning pattern set
To implement this initialization method, the learning pattern set {x_n}^N_{n=1} is first clustered separately for each class to obtain a prototype set {p_m}^M_{m=1} (because clustering is done per class, each p_m carries the class number y_m of the class it belongs to). Then, in Equations (19), (20), (22), and (23), it suffices to replace {x_n}^N_{n=1} with {p_m}^M_{m=1}, {y_n}^N_{n=1} with {y_m}^M_{m=1}, and N with M. Equation (23), the discriminant function thus obtained, has the same form as Equation (1) adopted in this embodiment. Any clustering method may be used; for example, a method using K-means is proposed in Non-Patent Document 2 (although Non-Patent Document 2 deals with SVMs for two-class classification).
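The per-class clustering of Modification 2 can be sketched as follows. This is a minimal sketch using plain Lloyd iterations with a deterministic seeding; it is not the procedure of Non-Patent Document 2, and the names kmeans and prototypes_per_class, the number of clusters k, and the iteration count are assumptions for illustration.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; the first k points seed the centers (a simplification)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        for idx, grp in enumerate(groups):
            if grp:  # a center with an empty cluster is left where it is
                centers[idx] = [sum(col) / len(grp) for col in zip(*grp)]
    return centers

def prototypes_per_class(X, y, k):
    """Cluster each class separately; each prototype inherits its class number."""
    P, labels = [], []
    for c in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == c]
        for center in kmeans(members, min(k, len(members))):
            P.append(center)
            labels.append(c)
    return P, labels
```

The pair (P, labels) then plays the role of ({p_m}, {y_m}) when the MSVM equations are rewritten with M in place of N.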

(2.4 Theoretical considerations on the effect of this embodiment)
Equation (1), the discriminant function adopted in this embodiment, can also be written, as in the MSVM formulation, in the form of a linear discriminant function on space H using the feature transformation map φ(·) into the high-dimensional space H, as follows.

Note in particular that w_j(τ_j) is a linear combination of the prototype set {φ(p_m)}^M_{m=1} mapped into space H, that its combination coefficients are the components of the coefficient vector τ_j of Equation (6), and that the value of w_j(τ_j) is therefore determined by the coefficient vector τ_j. From Equation (25), the following equation holds for two classes C_i and C_j (i ≠ j).

Here, the M-dimensional coefficient vector τ_j and the M×M matrix K are given by Equations (6) and (2), respectively. Now take a correctly classified learning pattern x^o near the boundary as described in 2.1, and let its correct class and best-incorrect class be C_y and C_i, respectively. Because Equation (26) holds and the discriminant function g_j(x; Λ) can be written in two ways, as Equation (5) and as Equation (24), the following equality holds.

As discussed in 2.1, the left-hand side of Equation (27) represents the geometric margin in the M-dimensional space B. Applying the same argument as in 2.1 to Equation (24), the linear discriminant function in the high-dimensional space H, shows that the right-hand side of Equation (27) represents the geometric margin in space H. That is, Equation (27) shows that the geometric margin in space B, which has at most finite dimension, coincides with the geometric margin in the feature space H of very high (often infinite) dimension. As a result, the large geometric margin MCE learning of this embodiment, which adjusts the finite-dimensional coefficient vector set {τ_j}^J_{j=1}, is guaranteed to also be large geometric margin MCE learning in the very high-dimensional feature space associated with the kernel.
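The coincidence of the two margins can be checked numerically for a kernel whose feature map is known explicitly. The sketch below assumes the margin in space B has the form (τ_y - τ_i)^T k(x) / sqrt((τ_y - τ_i)^T K (τ_y - τ_i)), as suggested by the discussion of Equations (26) and (27), whose images are not reproduced here. The kernel (a·b)^2 on 2-dimensional inputs and all function names are chosen only for illustration.

```python
import math

def phi(x):
    """Explicit feature map of the kernel (a.b)**2 for 2-D inputs."""
    return [x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2]

def kernel(a, b):
    return (a[0] * b[0] + a[1] * b[1]) ** 2

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def margins(x, P, tau_y, tau_i):
    """Return (margin computed in B via K and k(x), margin computed in H via phi)."""
    delta = [a - b for a, b in zip(tau_y, tau_i)]
    K = [[kernel(p, q) for q in P] for p in P]
    k = [kernel(x, p) for p in P]
    norm_B = math.sqrt(dot(delta, [dot(row, delta) for row in K]))
    margin_B = dot(delta, k) / norm_B
    # w_y - w_i = sum_m delta_m * phi(p_m), built explicitly in H.
    feats = [phi(p) for p in P]
    w_diff = [sum(d * f[c] for d, f in zip(delta, feats)) for c in range(3)]
    margin_H = dot(w_diff, phi(x)) / math.sqrt(dot(w_diff, w_diff))
    return margin_B, margin_H
```

For kernels such as the Gaussian kernel the map φ cannot be written out in finite dimension, which is why only the left-hand computation via K is available in practice.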

As can be seen from the right-hand side of Equation (27), reducing the norm ||w_y - w_i|| of the difference between the coefficient vectors of a pair of classes corresponds to increasing the geometric margin in the high-dimensional space H. However, Equation (19), the learning objective function of MSVM, aims to minimize not the norm of a difference but the sum of squares Σ^J_{j=1} ||w_j||² of the norms ||w_j|| of the coefficient vectors themselves. Making this sum small does not guarantee that the norm of each class's coefficient vector becomes small. Nor does it guarantee that the norm of the coefficient vector difference of each class pair, which is what should really be targeted, becomes small. In other words, the increase of the geometric margin in MSVM is insufficient. In this embodiment, by contrast, the relationship of Equation (27) allows the geometric margin in space H to be increased directly by adjusting the coefficient vectors τ_j, which have at most finite dimension. Moreover, in MSVM, the slack variable ξ_n in Equation (19), which is the loss for learning pattern x_n, does not directly represent the number of classification errors, so the optimization of the learning parameters is not direct with respect to the ultimate goal of minimizing classification errors. This embodiment, however, uses MCE learning, so the optimization of the learning parameters is direct with respect to that goal.

That is, because this embodiment uses a kernel, the discriminant function has the same high representational power as MSVM. Moreover, by adjusting finite-dimensional parameters, both minimization of the classification error rate and an increase of the geometric margin in the high-dimensional space associated with the kernel can be achieved. In other words, this embodiment makes it possible to achieve a high recognition rate for unknown patterns in complex classification tasks.

[3 Further modifications]
(3.1 Initialization by Gaussian mixture model and radial basis function network)
When the Gaussian function of Equation (3) is used as the kernel, Equation (1), the discriminant function of the above embodiment, takes the same form as a Gaussian mixture model (GMM) or a radial basis function (RBF) network (C. M. Bishop, Pattern Recognition and Machine Learning, Japanese translation supervised by Hiroshi Motoda, Takio Kurita, Tomoyuki Higuchi, Yuji Matsumoto, and Noboru Murata, Springer Japan, Tokyo, 2007) when the parameter set {τ_{m,j}} (m = 1, ..., M; j = 1, ..., J) and the prototype set {p_m}^M_{m=1} are chosen appropriately. Conventionally, the parameters of a GMM have been estimated by maximum likelihood estimation or by the early MCE learning method, and RBF networks have been trained by the least squares method or by the early MCE learning method.

These already-trained GMMs or RBF networks may therefore be adopted as the initialization in steps 114 and 116 of the algorithm disclosed in 2.2.2. Doing so introduces the concept of increasing the geometric margin in a high-dimensional space into these long-familiar classifier models and improves classification accuracy for complex pattern distributions.

(3.2 Fixing parameters whose value is 0)
In the initialization in steps 114 and 116 of the algorithm disclosed in 2.2.2, some (often many) of the coefficients in the set {τ⁽⁰⁾_{m,j}} (m = 1, ..., M; j = 1, ..., J) take the value 0 (or a value close to 0). In this embodiment, the computer programs shown in FIGS. 4 and 5 may be modified so that such parameters are fixed at 0.
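One way to realize such a modification is to record which initial coefficients are (near) zero and to skip them in every update. The sketch below is only an illustration; the tolerance tol and the names frozen_mask and masked_update are assumptions, and the text does not prescribe how closeness to 0 is decided.

```python
def frozen_mask(tau0, tol=1e-8):
    """True marks a trainable coefficient; False marks one that stays fixed at 0."""
    return [[abs(c) > tol for c in row] for row in tau0]

def masked_update(tau_j, step, mask_j):
    """Apply tau_j <- tau_j + step only where the mask allows; frozen entries are 0."""
    return [t + s if m else 0.0 for t, s, m in zip(tau_j, step, mask_j)]
```

In the adaptive learning loop, masked_update would replace the plain coefficient update of substep 184, with the mask computed once after initialization.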

(3.3 Adjustment of the coefficient vector set {α_j}^J_{j=1})
The algorithm disclosed in 2.2.2 increases the geometric margin of Equation (13) by adjusting the coefficient vector set {τ_j}^J_{j=1}. However, the present invention is not limited to such an embodiment. For example, the quantity of Equation (12) may instead be increased by adjusting the coefficient vector set {α_j}^J_{j=1}. The misclassification measure used in large geometric margin MCE learning in that case is given by the following equation.

The computer program for learning then has a control structure like that of the second embodiment described next.

(3.4 Adjustment of the prototypes)
The computer program having the control structure disclosed in 2.2.2 adjusts only the parameter set {τ_{m,j}} (m = 1, ..., M; j = 1, ..., J) in the discriminant function of Equation (1). However, the present invention is not limited to such an embodiment. For example, not only these coefficients but also the prototype set {p_m}^M_{m=1} may be adjusted based on the large geometric margin MCE learning method. This makes it possible to learn automatically the prototypes best suited to increasing the geometric margin and minimizing the classification error probability, further improving the classification accuracy of the classifier.

The algorithm that also adjusts the prototypes is obtained simply by replacing the prototype set {p_m}^M_{m=1}, k(x_n), and the Gram matrix K in FIGS. 4 and 5 with the prototype set {p^(t)_m}^M_{m=1}, k^(t)(x_n), and the Gram matrix K^(t) indexed by the iteration number t, and by adding an update equation for the prototypes to substep 184 of FIG. 5. Here, k^(t)(x) and K^(t) are Equations (7) and (2), respectively, with p_m = p^(t)_m (m = 1, ..., M). The third embodiment described later adopts, for example, the learning algorithm for the case where the Gaussian kernel of Equation (3) is used.

[Second Embodiment]
Referring to FIG. 6, the computer program for realizing the second embodiment of the present invention has the following control structure.
1. (Step 110)
Prepare a positive definite kernel K(·,·).
2. (Step 112)
Prepare a prototype set {p_m}^M_{m=1}. If necessary, obtain the prototype set by clustering the learning pattern set {x_n}^N_{n=1}.
3. (Step 210)
Construct the Gram matrix K according to Equation (2), and perform the Cholesky decomposition of Equation (8) to obtain the lower triangular matrix L.
4. (Steps 114 and 116)
For each class C_j, initialize the M-dimensional coefficient vector τ⁽⁰⁾_j (j = 1, ..., J).
5. (Steps 212 and 214)
For each class C_j, compute the M-dimensional coefficient vector α⁽⁰⁾_j by α⁽⁰⁾_j = L^T τ⁽⁰⁾_j (j = 1, ..., J).
6. (Step 118)
Set the variable t representing the iteration number to 0. Also set the upper limit E of the epoch count e.
7. (Step 220)
Execute the adaptive learning of the coefficient vectors α for e = 0, 1, ..., E. The details of this processing are described later with reference to FIG. 7.
8. (Steps 222 and 224)
From the finally obtained coefficient vectors α_j, construct the discriminant function g_j(x; Λ) of each class C_j according to the following equation (j = 1, ..., J).

$$g_j(x; \Lambda) = \alpha_j^{T}\,\beta(x)$$

where $\beta(x)$ is the solution, with respect to $\beta$, of the system of linear equations $L\beta = k(x)$.
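The relation between the coefficient vectors $\tau_j$ and $\alpha_j$ in steps 3, 5, and 8 can be checked numerically. The following Python sketch is illustrative only: the use of NumPy, the function names, the toy sizes, and the RBF kernel with `gamma = 0.5` are assumptions made for this example and are not taken from the specification. It builds a Gram matrix, takes its Cholesky factor $L$ (so that $K = LL^{T}$), forms $\alpha_j = L^{T}\tau_j$, and confirms that $\alpha_j^{T}\beta(x)$ equals $\tau_j^{T}k(x)$ when $L\beta = k(x)$.

```python
import numpy as np

def rbf(x, p, gamma=0.5):
    # Assumed positive definite kernel for this sketch; gamma is arbitrary.
    d = x - p
    return float(np.exp(-gamma * np.dot(d, d)))

def gram_matrix(prototypes, kernel):
    # M x M matrix of kernel values between all pairs of prototypes.
    return np.array([[kernel(a, b) for b in prototypes] for a in prototypes])

def kernel_vector(x, prototypes, kernel):
    # M-dimensional vector of kernel values between x and each prototype.
    return np.array([kernel(x, p) for p in prototypes])

rng = np.random.default_rng(0)
P = rng.normal(size=(5, 3))        # M = 5 hypothetical prototypes in 3 dimensions
tau = rng.normal(size=(4, 5))      # J = 4 classes, one row tau_j per class

K = gram_matrix(P, rbf)
L = np.linalg.cholesky(K)          # lower triangular factor, K = L @ L.T
alpha = tau @ L                    # row j equals (L.T @ tau_j).T

x = rng.normal(size=3)             # an arbitrary input pattern
k = kernel_vector(x, P, rbf)
beta = np.linalg.solve(L, k)       # solves L @ beta = k

g_alpha = alpha @ beta             # alpha_j^T beta(x) for every class
g_tau = tau @ k                    # tau_j^T k(x) for every class
print(np.allclose(g_alpha, g_tau))
```

A dedicated triangular solver would exploit the structure of $L$; the general solver is used here only to keep the sketch short.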

9. (Step 126)
Once the discriminant functions $g_j(x)$ for the classes $C_j$ have been obtained in this way, store them in a predetermined storage device and end the process.
Referring to FIG. 7, the program that realizes the adaptive learning process for $\alpha_j$ performed in step 220 of FIG. 6 has the following control structure.
This process includes step 250 of executing the following process 252 for $e = 0, 1, \dots, E$.
Process 252 includes step 260 of executing the following process 262 for every element of the learning pattern set $\{x_n\}_{n=1}^{N}$, and step 264 of shuffling the order of the learning patterns in the learning pattern set $\Omega_N$ after step 260 is completed.
Process 262 includes the following sub-steps.
(a) (Sub-step 170)
Take one learning pattern $\{x_n, y_n\}$ from the learning pattern set $\Omega_N$.
(b) (Sub-step 172)
Construct the $M$-dimensional vector $k(x_n)$ according to equation (7).
(c) (Sub-step 270)
Solve the system of linear equations $L\beta = k(x_n)$ for $\beta$ to obtain the solution $\beta_n$.
(d) (Sub-steps 272 and 274)
For each class $C_j$, compute the discriminant function value $g_j$ as $g_j = \{\alpha_j^{(t)}\}^{T}\beta_n$ ($j = 1, \dots, J$).
(e) (Sub-step 178)
Find the best-incorrect class $C_{i_n}$ for the learning pattern $x_n$ according to $g_{i_n} = \max_{j,\, j \neq y_n} g_j$.
(f) (Sub-step 180)
Compute $d_{y_n} = -g_{y_n} + g_{i_n}$.
(g) (Sub-step 182)
Compute the geometric-margin-type misclassification measure according to the following equation.

(h) (Sub-step 276)
Update the coefficient vectors $\alpha_j$ according to the following equation ($j = 1, \dots, J$), and then increment the value of the variable $t$ by 1.
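Sub-steps (c) to (f) of process 262 can be sketched as follows. This is a minimal illustration, assuming NumPy and a hypothetical function name `best_incorrect_difference`; it stops before sub-steps (g) and (h), because the equations for the geometric-margin-type misclassification measure and for the update are not reproduced in this text. The toy values are arbitrary.

```python
import numpy as np

def best_incorrect_difference(alpha, L, k_x, y):
    """Return (d_y, i, g) for one learning pattern.

    alpha : (J, M) array, one coefficient vector alpha_j per row
    L     : (M, M) lower triangular Cholesky factor of the Gram matrix
    k_x   : (M,) kernel vector k(x_n)
    y     : index of the correct class y_n
    """
    beta = np.linalg.solve(L, k_x)     # sub-step (c): solve L beta = k(x_n)
    g = alpha @ beta                   # sub-step (d): g_j = alpha_j^T beta_n
    masked = g.copy()
    masked[y] = -np.inf                # exclude the correct class from the max
    i = int(np.argmax(masked))         # sub-step (e): best-incorrect class
    d = -g[y] + g[i]                   # sub-step (f): d_y = -g_y + g_i
    return d, i, g

# Toy usage with arbitrary values (J = 3 classes, M = 3 prototypes).
alpha = np.eye(3)
L = 2.0 * np.eye(3)
k_x = np.array([1.0, 2.0, 3.0])
d, i, g = best_incorrect_difference(alpha, L, k_x, y=0)
print(d, i)
```

With this sign convention, a positive $d_{y_n}$ means that some incorrect class scored higher than the correct class for the pattern.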

[Third Embodiment]
The learning algorithm for the case where, for example, the Gaussian kernel of equation (3) is used is the algorithm of the third embodiment described here. The control structure of a computer program that realizes this algorithm is described below with reference to FIGS. 8 and 9.
1. (Step 110)
Prepare the Gaussian kernel $K(\cdot,\cdot)$ of equation (3).
2. (Step 112)
Prepare a prototype set $\{p_m^{(0)}\}_{m=1}^{M}$. If necessary, obtain the prototype set by clustering the learning pattern set $\{x_n\}_{n=1}^{N}$.
3. (Steps 114 and 116)
For each class $C_j$, initialize the $M$-dimensional coefficient vector $\tau_j^{(0)}$ ($j = 1, \dots, J$).
4. (Step 118)
Set the variable $t$ representing the iteration number to 0, and set the upper limit $E$ of the epoch count $e$.
5. (Step 300)
For $e = 0, 1, \dots, E$, execute the adaptive learning process for the coefficient vectors $\tau$ and the prototypes $p_m$. Details of this process are described later with reference to FIG. 9.
6. (Steps 302 and 304)
From the finally obtained coefficient vector set $\{\tau_j\}_{j=1}^{J}$ and prototype set $\{p_m\}_{m=1}^{M}$, construct the discriminant function $g_j(x)$ of class $C_j$ according to the following equation ($j = 1, \dots, J$).

7. (Step 126)
Once the discriminant functions $g_j(x)$ for the classes $C_j$ have been obtained in this way, store them in a predetermined storage device and end the process.
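How a discriminant function of this kind is evaluated for a new input can be sketched as follows. The sketch assumes the form $g_j(x) = \tau_j^{T}k(x)$ used in the learning sub-steps of this embodiment and a Gaussian kernel of the form $\exp(-\gamma\lVert x - p\rVert^{2})$; the exact parameterization of equation (3) is not reproduced in this text, so `gamma`, the function names, and the toy prototypes are all assumptions. Picking the class with the largest discriminant value is likewise shown only as an illustrative decision rule.

```python
import numpy as np

def gaussian_kernel_vector(x, prototypes, gamma):
    # k(x): one Gaussian kernel value per prototype (assumed kernel form).
    diff = prototypes - x
    return np.exp(-gamma * np.sum(diff * diff, axis=1))

def discriminants(x, tau, prototypes, gamma):
    # g_j(x) = tau_j^T k(x) for every class j (rows of tau).
    return tau @ gaussian_kernel_vector(x, prototypes, gamma)

# Toy setting: M = 2 prototypes, J = 2 classes, each class tied to one prototype.
prototypes = np.array([[0.0, 0.0], [3.0, 3.0]])
tau = np.array([[1.0, 0.0], [0.0, 1.0]])

g = discriminants(np.array([0.1, -0.1]), tau, prototypes, gamma=1.0)
predicted = int(np.argmax(g))      # class whose discriminant value is largest
print(predicted)
```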

Referring to FIG. 9, the program that realizes the adaptive learning process for the coefficient vectors $\tau$ and the prototypes $p_m$ performed in step 300 of FIG. 8 has the following control structure.
This process includes step 310 of executing the following process 312 for $e = 0, 1, \dots, E$.
Process 312 includes step 320 of executing the following process 322 for every element of the learning pattern set $\{x_n\}_{n=1}^{N}$, and step 324 of shuffling the order of the learning patterns in the learning pattern set $\Omega_N$ after step 320 is completed.
Process 322 includes the following sub-steps.
(a) (Sub-step 170)
Take one learning pattern $\{x_n, y_n\}$ from the learning pattern set $\Omega_N$.
(b) (Sub-step 330)
Construct the $M$-dimensional vector $k^{(t)}(x_n)$ according to equation (7).
(c) (Sub-steps 332 and 334)
For each class $C_j$, compute the discriminant function value $g_j$ as $g_j = \{\tau_j^{(t)}\}^{T}k^{(t)}(x_n)$ ($j = 1, \dots, J$).
(d) (Sub-step 178)
Find the best-incorrect class $C_{i_n}$ for the learning pattern $x_n$ according to $g_{i_n} = \max_{j,\, j \neq y_n} g_j$.
(e) (Sub-step 180)
Compute $d_{y_n} = -g_{y_n} + g_{i_n}$.
(f) (Sub-step 336)
Compute the geometric-margin-type misclassification measure according to the following equation.

(g) (Sub-step 338)
Update the coefficient vector set $\{\tau_j\}_{j=1}^{J}$ and the prototype set $\{p_m\}_{m=1}^{M}$ according to the following equations ($j = 1, \dots, J$).

(h) (Sub-step 186)
Increment the value of the variable $t$ by 1.
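The outer control structure shared by FIGS. 7 and 9 (epochs $e = 0, 1, \dots, E$, one pass over all learning patterns per epoch, a shuffle after each pass, and an iteration counter $t$ incremented once per pattern) can be sketched as follows. The function name `run_epochs`, the `update` callback, and the fixed random seed are hypothetical; the callback stands in for the per-pattern sub-steps, whose update equations are not reproduced in this text.

```python
import random

def run_epochs(patterns, E, update, seed=0):
    """Run the epoch loop and return the final iteration count t.

    patterns : sequence of (x, y) learning patterns
    E        : upper limit of the epoch index e (epochs run for e = 0..E)
    update   : callable (x, y, t) performing the per-pattern sub-steps
    """
    rng = random.Random(seed)
    order = list(patterns)             # working copy so the caller's data is untouched
    t = 0
    for e in range(E + 1):             # e = 0, 1, ..., E inclusive
        for x, y in order:             # one pass over every learning pattern
            update(x, y, t)
            t += 1                     # iteration number advances once per pattern
        rng.shuffle(order)             # reorder the patterns after each pass
    return t

# Toy usage: record the iteration number seen by each update call.
calls = []
toy_patterns = [(1, "a"), (2, "b"), (3, "c")]
t_final = run_epochs(toy_patterns, E=2, update=lambda x, y, t: calls.append(t))
print(t_final)
```

Because the epoch index runs from 0 to $E$ inclusive, the loop as described makes $E + 1$ passes over the learning pattern set.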

The learning device according to the embodiments of the present invention described above inherits, without modification, the versatility of classifiers that use kernel functions and of classifiers that apply probabilistic models. The learning device is therefore applicable not only to the speaker discrimination device described in the embodiments but also to any pattern recognizer that assigns an input pattern to one of a plurality of predetermined classes. More specific examples include a character recognition device that computes the discriminant function from a distance measure between a pattern and class prototypes, and a speech recognition device that uses hidden Markov models to compute the class membership probability of a pattern and uses that probability as the discriminant function.

[Realization by Computer]
The embodiments described above can be realized by a computer system and a computer program that runs on the computer system. FIG. 10 shows the external appearance of the computer system 530 used in this embodiment, and FIG. 11 is a block diagram of the computer system 530. The computer system 530 shown here is merely an example, and other configurations can also be used.

Referring to FIG. 10, the computer system 530 includes a computer 540, and a monitor 542, a keyboard 546, a mouse 548, a speaker 572, and a microphone 570, all connected to the computer 540. The computer 540 further includes a DVD (Digital Versatile Disc) drive 550 and a semiconductor memory drive 552.

Referring to FIG. 11, the computer 540 further includes a bus 566 connected to the DVD drive 550 and the semiconductor memory drive 552, and the following components, all connected to the bus 566: a CPU (Central Processing Unit) 556; a ROM (Read-Only Memory) 558 that stores the boot-up program of the computer 540 and the like; a RAM (Random Access Memory) 560 that provides a work area for the CPU 556 and a storage area for programs executed by the CPU 556; and a hard disk drive (HDD) 554 that stores the learning pattern set and the discriminant functions.

The software that realizes the system of the embodiments described above may be distributed in the form of object code or scripts recorded on a storage medium such as a DVD 562 or a semiconductor memory 564, supplied to the computer 540 by a reading device such as the DVD drive 550 or the semiconductor memory drive 552, and stored in the HDD 554. When the CPU 556 executes the program, the program is read from the HDD 554 and stored in the RAM 560. An instruction is fetched from the address in the RAM 560 indicated by a program counter (not shown) in the CPU 556, and the instruction is executed. The CPU 556 reads the data to be processed from the hard disk drive 554, the RAM 560, or the like, and stores the processing results back in the hard disk drive 554, the RAM 560, or the like.

The general operation of the computer system 530 is well known and is therefore not described in detail here.

As for the method of distributing the software, it is not essential that the software be fixed on a storage medium. For example, the software may be distributed from another computer connected to a network. Part of the software may be stored in the hard disk drive 554, and the remainder may be loaded onto the hard disk over the network and integrated at execution time.

Typically, a modern computer uses general functions provided by the computer's operating system (OS) and executes them in a controlled manner according to the desired purpose. In addition, computer program toolkits or toolboxes provided by third parties include not only basic programs but also sophisticated ones that provide functions constituting the building blocks of a learning algorithm (for example, numerical processing program kits such as clustering tools and MSVM learning tools). Accordingly, a program that does not itself contain the general functions that can be provided by the OS or a third party, and that merely specifies the combination and execution order of such building-block functions, also falls within the scope of the present invention, provided that the program as a whole achieves the desired purpose.

As described above, according to the present invention, the large geometric margin MCE learning method, which is a known technique, is applied to the linear-sum coefficient parameters of a discriminant function that has the form of a linear sum of kernels. This makes it possible to form fine-grained classification decision boundaries using kernels. Moreover, it formalizes a learning method that directly pursues both the minimization of classification errors and improved robustness to unknown patterns. As a result, the present invention yields a high recognition rate for unknown patterns other than the learning patterns, even when the distribution structure of the patterns is complex.

The embodiments disclosed herein are merely examples, and the present invention is not limited to the embodiments described above. The scope of the present invention is indicated by each claim in the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of the claims.

20, 30 Input pattern space
22, 24, 26, 28 Class
32 M-dimensional space of the mapping destination
34 Very high-dimensional space
40 System for speaker identification
42 Discriminant function learning device
44 Discriminant function transmission medium
46 Input speech
48 Speaker discrimination device
50 Speaker discrimination result
60 First storage device that stores learning utterance data
62, 82 Feature extraction unit
64 Second storage device
66 Learning device
80 Discriminant function storage unit
84 Speaker discrimination unit

Claims

A learning device for a pattern classification device that classifies an input pattern into one of a plurality of classes, the learning device comprising:
storage means for storing a learning pattern set comprising learning patterns, each consisting of a vector obtained from observation data of a predetermined physical quantity and a label of the class to which the vector belongs; and
learning means for learning, using the learning patterns included in the learning pattern set stored in the storage means as learning data, a discriminant function that is defined for each of the plurality of classes and measures the degree to which an input pattern belongs to that class, wherein
the discriminant function is a function represented by a linear sum of kernel operations between the input pattern and a plurality of prototypes obtained from the learning pattern set in correspondence with the plurality of classes,
the plurality of prototypes form a prototype set,
the kernel operation is one that, when a feature transformation that transforms the input pattern into a space of higher dimension than the input pattern space is defined, is given by the inner product between the input pattern after the feature transformation and the prototype after the feature transformation, and for which the Gram matrix composed of the kernel operations between the prototypes included in the prototype set is always a positive definite matrix for any number of prototypes,
for each of the plurality of classes, the kernel coefficients of the linear sum corresponding to the respective prototypes form a coefficient vector,
the coefficient vectors formed for the respective classes form a coefficient vector set, and
the learning means adjusts the coefficient vectors included in the coefficient vector set so as to minimize, in the high-dimensional space, an average classification error number loss defined as a function of the learning patterns and the coefficient vector set.

The learning apparatus for a pattern classification apparatus according to claim 1, further comprising clustering means for calculating the plurality of prototypes by clustering the vectors as the observation data.

The learning device for a pattern classification device according to claim 1, wherein the learning means includes:
initialization means for initializing each coefficient vector included in the coefficient vector set by a predetermined initialization method;
learning pattern extraction means for extracting one learning pattern from the learning patterns included in the learning pattern set;
coefficient vector adjusting means for adjusting, in response to a learning pattern being extracted by the learning pattern extraction means, the coefficient vectors included in the coefficient vector set so that the average classification error number loss is minimized; and
first repetition control means for causing the extraction of a learning pattern by the learning pattern extraction means and the adjustment of the coefficient vectors by the coefficient vector adjusting means to be executed repeatedly until all learning patterns in the learning pattern set have been extracted by the learning pattern extraction means.

The learning device for a pattern classification device according to claim 3, wherein the learning means further includes:
shuffle means for shuffling the order of the learning patterns in the learning pattern set each time the repetition by the first repetition control means is completed;
second repetition control means for restarting the repetition by the first repetition control means in response to completion of the shuffling by the shuffle means; and
stopping means for stopping the repetition by the second repetition control means when the repetition by the second repetition control means has been completed a predetermined number of times.

The learning device for a pattern classification device according to claim 3, further comprising means for fixing to zero those components of the coefficient vectors obtained by the initialization means whose absolute values are smaller than a predetermined threshold.

The learning device for a pattern classification device according to claim 3, wherein
the prototype set is the learning pattern set, and
the initialization means includes:
SVM learning means for optimizing, by training a multi-class support vector machine, the coefficient vectors of a linear sum of vectors obtained by applying a predetermined transformation to the learning patterns, in order to classify the learning patterns included in the learning pattern set into the plurality of classes; and
initial value setting means for setting the coefficient vectors optimized for the learning pattern set by the SVM learning means as the initial values of the coefficient vectors composed of the kernel coefficients corresponding to the respective prototypes of the linear sum.

The learning device for a pattern classification device according to claim 6, wherein the initialization means further includes prototype selection means for selecting as prototypes, from among the learning patterns corresponding to the coefficient vectors optimized by the SVM learning means, only the support vectors whose coefficient vectors differ from the zero vector by a predetermined value or more, and for configuring the discriminant function from the selected prototypes.

The learning device for a pattern classification device according to claim 3, wherein the initialization means includes means for setting, as the initial values of the coefficient vector set, the coefficient vectors of a Gaussian mixture model or a radial basis function that has been trained in advance so as to fit the learning pattern set and the prototype set.

The learning device for a pattern classification device according to claim 1, wherein the learning means includes:
initialization means for initializing each coefficient vector included in the coefficient vector set by a predetermined initialization method;
learning pattern extraction means for extracting one learning pattern from the learning patterns included in the learning pattern set;
parameter adjusting means for adjusting, in response to a learning pattern being extracted by the learning pattern extraction means, the coefficient vectors included in the coefficient vector set and the prototypes included in the prototype set so that the average classification error number loss is minimized; and
first repetition control means for causing the extraction of a learning pattern by the learning pattern extraction means and the adjustment of the coefficient vectors and prototypes by the parameter adjusting means to be executed repeatedly until all learning patterns in the learning pattern set have been extracted by the learning pattern extraction means.

The learning device for a pattern classification device according to claim 9, wherein the learning means further includes:
shuffle means for shuffling the order of the learning patterns in the learning pattern set each time the repetition by the first repetition control means is completed;
second repetition control means for restarting the repetition by the first repetition control means in response to completion of the shuffling by the shuffle means; and
stopping means for stopping the repetition by the second repetition control means when the repetition by the second repetition control means has been completed a predetermined number of times.

A computer program for causing a computer to function as a learning device for a pattern classification device that classifies an input pattern into one of a plurality of classes, the computer program causing the computer to function as:
storage means for storing a learning pattern set comprising learning patterns, each consisting of a vector obtained from observation data of a predetermined physical quantity and a label of the class to which the vector belongs; and
learning means for learning, using the learning patterns included in the learning pattern set stored in the storage means as learning data, a discriminant function that is defined for each of the plurality of classes and measures the degree to which an input pattern belongs to that class, wherein
the discriminant function is a function represented by a linear sum of kernel operations between the input pattern and a plurality of prototypes obtained from the learning pattern set in correspondence with the plurality of classes,
the plurality of prototypes form a prototype set,
the kernel operation is one that, when a feature transformation that transforms the input pattern into a space of higher dimension than the input pattern space is defined, is given by the inner product between the input pattern after the feature transformation and the prototype after the feature transformation, and for which the Gram matrix composed of the kernel operations between the prototypes included in the prototype set is always a positive definite matrix for any number of prototypes,
for each of the plurality of classes, the kernel coefficients of the linear sum corresponding to the respective prototypes form a coefficient vector,
the coefficient vectors formed for the respective classes form a coefficient vector set, and
the learning means adjusts the coefficient vectors included in the coefficient vector set so as to minimize, in the high-dimensional space, an average classification error number loss defined as a function of the learning patterns and the coefficient vector set.