JP5527728B2

JP5527728B2 - Pattern classification learning device

Info

Publication number: JP5527728B2
Application number: JP2010184334A
Authority: JP
Inventors: Hideyuki Watanabe
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2010-08-19
Filing date: 2010-08-19
Publication date: 2014-06-25
Anticipated expiration: 2030-08-19
Also published as: JP2012043221A

Description

The present invention relates to a learning device for pattern classification that classifies measurement data of some kind into one of a set of predetermined classes. More particularly, it relates to a learning device that learns efficiently using MCE (minimum classification error) learning.

[Pattern recognition and learning]
Pattern recognition is an important technology at the interface between humans and machines. It is used in many settings, such as speaker identification, recognition of utterance content, person identification from face images, and character recognition. In short, pattern recognition is the task of classifying patterns of observed values, obtained by observing some physical phenomenon, into a number of classes. Such tasks are relatively easy for humans but hard to get a machine to perform. A device that performs them can broadly be called a pattern recognition device. Before such a device can recognize patterns, it needs a preliminary step called learning, in which training data are processed statistically to obtain the parameters required for classification.

Consider a classification task that assigns an input pattern (observed value) $x \in \mathcal{X}$ to one of $J$ classes $C_1, \ldots, C_J$, where $\mathcal{X}$ denotes the entire input pattern space.

The statistical approach to classifier learning is based on minimizing the classification risk. The classification risk is the expected value, over the entire pattern space, of the loss incurred when classifying individual input patterns. The most natural and basic loss is the classification error count loss (0-1 loss), which assigns 1 to a misclassification and 0 to a correct classification:

$$\ell(C_j \mid C_y) = \begin{cases} 0, & j = y, \\ 1, & j \neq y. \end{cases} \tag{1}$$

Here $\ell(C_j \mid C_y)$ is the classification error count loss incurred when a pattern belonging to class $C_y$ is classified into class $C_j$.

The ultimate goal of classifier learning is to realize a classification decision rule $c: \mathcal{X} \to \{C_j\}_{j=1}^{J}$ that minimizes the classification error count risk

$$R = \sum_{y=1}^{J} P(C_y) \int_{\mathcal{X}} \ell\bigl(c(x) \mid C_y\bigr)\, p(x \mid C_y)\, dx, \tag{2}$$

where $p$ denotes a probability density function and $P(C_y)$ the prior probability of class $C_y$. $R$ equals the classification error probability over the entire input space $\mathcal{X}$.

The following describes minimum classification error (MCE) learning (Non-Patent Document 1), which is particularly relevant to the present invention.

[MCE learning]
<Classification decision rule based on discriminant functions>
MCE learning adopts the following practical classification decision rule, which uses discriminant functions, and aims to minimize $R$ directly:

$$c(x) = C_i \quad \text{if and only if} \quad i = \arg\max_{j} g_j(x, \Lambda). \tag{3}$$

Here $g_j(x, \Lambda)$ is the discriminant function for class $C_j$. It may take any functional form and measures the degree to which $x$ belongs to $C_j$. $\Lambda$ denotes the set of learning (adjustable) parameters of the classifier, and each $g_j(x, \Lambda)$ is assumed to be differentiable with respect to $\Lambda$.
<Misclassification measure>
The decision rule of equation (3) consists of computing all discriminant function values and comparing them. Operations involving comparisons are ill-suited to the numerical computations required in the learning stage, such as parameter optimization. Equation (3) must therefore be replaced by a form suitable for numerical computation. MCE learning achieves this by applying a smooth, Lp-norm-type misclassification measure to the training patterns. The misclassification measure for a pattern $x$ belonging to class $C_y$ is defined as

$$d_y(x, \Lambda) = -g_y(x, \Lambda) + \frac{1}{\psi} \log\left[\frac{1}{J-1} \sum_{j \neq y} \exp\bigl(\psi\, g_j(x, \Lambda)\bigr)\right].$$

Here $\psi$ is a positive real number and $\log$ denotes the natural logarithm. Letting $\psi \to \infty$ gives

$$\lim_{\psi \to \infty} d_y(x, \Lambda) = -g_y(x, \Lambda) + \max_{j \neq y} g_j(x, \Lambda).$$

This limit shows that, for sufficiently large $\psi$, a positive value of $d_y$ indicates a misclassification and a negative value indicates a correct classification. The absolute value of $d_y$ represents the confidence of the classification decision. In addition, $d_y$ is differentiable with respect to $\Lambda$, so the most basic gradient-based optimization methods can be applied to learning.
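The sketch below makes the decision rule (3), the smooth measure and its $\psi \to \infty$ limit concrete in NumPy. It assumes the log-sum-exp form of $d_y$ written above, including the $1/(J-1)$ averaging factor. Class indices are 0-based, and the function names are illustrative only. Subtracting the largest rival value before exponentiating keeps large $\psi$ from overflowing.

```python
import numpy as np

def classify(g):
    """Decision rule (3): 0-based index of the class with the largest discriminant value."""
    return int(np.argmax(g))

def misclassification_measure(g, y, psi):
    """Smooth measure d_y = -g_y + (1/psi) * log[(1/(J-1)) * sum_{j != y} exp(psi * g_j)].

    The largest rival value is subtracted before exponentiating (log-sum-exp trick)
    so that large psi does not overflow.
    """
    g = np.asarray(g, dtype=float)
    rivals = np.delete(g, y)                 # g_j for all j != y
    m = rivals.max()
    return -g[y] + m + np.log(np.mean(np.exp(psi * (rivals - m)))) / psi

def hard_measure(g, y):
    """The psi -> infinity limit: -g_y + max_{j != y} g_j."""
    g = np.asarray(g, dtype=float)
    return -g[y] + np.delete(g, y).max()

g = np.array([0.3, 1.2, 0.9, -0.4])          # hypothetical discriminant values, J = 4
print(classify(g))                           # prints 1: class index 1 wins
for psi in (1.0, 10.0, 100.0):
    print(psi, misclassification_measure(g, 0, psi), hard_measure(g, 0))
```

Each rival term is at most the largest one. The smooth value therefore always lies between $-g_y + \max_{j \neq y} g_j - \log(J-1)/\psi$ and $-g_y + \max_{j \neq y} g_j$, so the gap to the hard measure shrinks like $1/\psi$.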

MCE learning uses the misclassification measure to reformulate the classification error count loss of equation (1) and the risk of equation (2), so that efficient optimization methods can be applied. By the definition of the misclassification measure, the classification error count loss can be rewritten as

$$\ell\bigl(d_y(x, \Lambda)\bigr) = \mathbf{1}\bigl(d_y(x, \Lambda) > 0\bigr).$$

FIG. 1 shows a graph 20 of this function. Here $\mathbf{1}(P)$ is the indicator function, which returns 1 if the proposition $P$ is true and 0 if it is false. The risk is then redefined as a function of $\Lambda$:

$$R(\Lambda) = \sum_{y=1}^{J} P(C_y) \int_{\mathcal{X}} \mathbf{1}\bigl(d_y(x, \Lambda) > 0\bigr)\, p(x \mid C_y)\, dx. \tag{6}$$

<Smoothed classification error count loss>
The classification error count loss $\mathbf{1}(d_y(x, \Lambda) > 0)$ is not differentiable with respect to $\Lambda$. To overcome this computational problem, MCE learning defines a differentiable smoothed classification error count loss and uses it in place of equation (1). For a pattern $x$ belonging to class $C_y$, MCE learning generally uses the logistic sigmoid function as the smoothed classification error count loss:

$$\ell_y\bigl(d_y(x, \Lambda)\bigr) = \frac{1}{1 + \exp\bigl(-\alpha_y\, d_y(x, \Lambda)\bigr)}. \tag{7}$$

FIG. 2 shows a graph 30 of the function given by equation (7). Here the loss smoothness $\alpha_y$ is a positive real number. $\ell_y(d_y(x, \Lambda))$ is a monotonically increasing function of the misclassification measure $d_y(x, \Lambda)$. Its slope grows steeper as $\alpha_y$ increases, and in the limit $\alpha_y \to \infty$ it coincides with $\mathbf{1}(d_y(x, \Lambda) > 0)$. In other words, the smoothed classification error count loss is directly tied to the misclassification count and is also differentiable with respect to $\Lambda$. The final objective of MCE learning is to find the $\Lambda$ that minimizes the expected loss obtained by replacing $\mathbf{1}(d_y(x, \Lambda) > 0)$ in equation (6) with $\ell_y(d_y(x, \Lambda))$:

$$L(\Lambda) = \sum_{y=1}^{J} P(C_y) \int_{\mathcal{X}} \ell_y\bigl(d_y(x, \Lambda)\bigr)\, p(x \mid C_y)\, dx. \tag{8}$$
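The sigmoid loss (7) and its derivative with respect to $d$ can be sketched in NumPy as follows. The derivative is the factor that a gradient-based MCE update multiplies by $\partial d_y / \partial \Lambda$. The names are illustrative, and the clipping only guards against overflow in `exp`. Increasing `alpha` visibly sharpens the curve toward the indicator $\mathbf{1}(d > 0)$.

```python
import numpy as np

def sigmoid_loss(d, alpha):
    """Eq. (7): l_y(d) = 1 / (1 + exp(-alpha_y * d)), vectorised over d."""
    z = np.clip(alpha * np.asarray(d, dtype=float), -500.0, 500.0)  # keep exp finite
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_loss_grad(d, alpha):
    """Derivative dl/dd = alpha * l * (1 - l)."""
    l = sigmoid_loss(d, alpha)
    return alpha * l * (1.0 - l)

d = np.linspace(-2.0, 2.0, 5)
for alpha in (1.0, 5.0, 50.0):
    print(alpha, np.round(sigmoid_loss(d, alpha), 3))
```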

As $\alpha_y \to \infty$, equation (8) coincides with equation (6).

Equation (8) involves an integral over infinitely many input patterns. In practice, however, $\Lambda$ can only be estimated from a finite training sample set $\Omega_N = \{(x_n, y_n)\}_{n=1}^{N}$ of $N$ samples. Here $x_n \in \mathcal{X}$ is the $n$-th training pattern (training sample) and $y_n \in \{1, \ldots, J\}$ is the index of the class to which $x_n$ belongs. The practical criterion for MCE learning is therefore the empirical average loss, which approximates equation (8) over the finite set $\Omega_N$:

$$\hat{L}(\Lambda) = \frac{1}{N} \sum_{n=1}^{N} \ell_{y_n}\bigl(d_{y_n}(x_n, \Lambda)\bigr). \tag{9}$$
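Given precomputed misclassification measures, the empirical average loss (9) is a plain mean of per-sample sigmoid losses, each using the smoothness $\alpha_{y_n}$ of its own class. The minimal sketch below uses a hypothetical helper name and 0-based labels. With a large $\alpha$ the value approaches the empirical error rate, i.e. the fraction of samples with $d > 0$.

```python
import numpy as np

def empirical_average_loss(d_values, labels, alpha):
    """Eq. (9): (1/N) * sum_n l_{y_n}(d_{y_n}(x_n, Lambda)) with the sigmoid loss of eq. (7).

    d_values : d_values[n] = d_{y_n}(x_n, Lambda), precomputed for each sample
    labels   : class index y_n of each sample (0-based)
    alpha    : per-class loss smoothness, alpha[y]
    """
    d = np.asarray(d_values, dtype=float)
    a = np.asarray(alpha, dtype=float)[np.asarray(labels)]
    z = np.clip(a * d, -500.0, 500.0)         # keep exp finite
    return float(np.mean(1.0 / (1.0 + np.exp(-z))))

d = [-1.0, 2.0, 3.0, -0.5]                    # hypothetical measures: two samples misclassified
labels = [0, 1, 1, 0]
print(empirical_average_loss(d, labels, [1.0, 1.0]))     # smooth estimate
print(empirical_average_loss(d, labels, [50.0, 50.0]))   # close to the error rate 0.5
```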

MCE learning seeks the parameters $\Lambda$ that minimize this empirical average loss $\hat{L}(\Lambda)$. Because $\hat{L}(\Lambda)$ is built from a finite number of training samples, it is only an approximation of the classification error count risk (classification error probability) $R(\Lambda)$. That risk also covers all unseen patterns outside the training set, so a $\Lambda$ that minimizes $\hat{L}(\Lambda)$ is not guaranteed to minimize $R(\Lambda)$. However, setting a suitable finite loss smoothness $\alpha_y$ makes $\hat{L}(\Lambda)$ a smooth function and improves robustness to unseen patterns outside the training set. This smoothing makes the loss sensitive not only to each given training sample but also to its neighborhood, which effectively increases the number of training samples. Setting an appropriate loss smoothness $\alpha_y$ is therefore critical for good generalization.

Non-Patent Document 1: B.-H. Juang and S. Katagiri, "Discriminative learning for minimum error classification," IEEE Trans. Signal Processing, vol. 40, no. 12, pp. 3043-3054, Dec. 1992.
Non-Patent Document 2: E. McDermott and S. Katagiri, "A derivation of minimum classification error from the theoretical classification risk using Parzen estimation," Computer Speech and Language, vol. 18, pp. 107-122, Apr. 2004.
Non-Patent Document 3: R. P. W. Duin, "On the choice of smoothing parameters for Parzen estimators of probability density functions," IEEE Trans. Comput., vol. C-25, pp. 1175-1179, Nov. 1976.
Non-Patent Document 4: C. M. Bishop, Pattern Recognition and Machine Learning, Vol. 2 (Japanese translation supervised by H. Motoda, T. Kurita, T. Higuchi, Y. Matsumoto, and N. Murata), Springer Japan, Tokyo, 2007.

Conventionally, no concrete guideline has been given for setting the loss smoothness $\alpha_y$, so it has had to be set ad hoc. A loss smoothness chosen this way may work well on the training data but is unlikely to give high recognition accuracy on unseen data. In other words, conventional learning devices make it difficult to obtain a classifier with high generalization ability.

An object of the present invention is therefore to provide a learning device for a pattern classifier based on MCE learning that can increase the generalization ability of the resulting classifier.

Another object of the present invention is to provide a learning device for a pattern classifier based on MCE learning that computes, according to a concrete setting guideline, a loss smoothness that increases generalization ability.

A learning device according to a first aspect of the present invention is a learning device for a classifier that classifies an input pattern into one of $J$ classes $C_j$ ($j$ is an integer from 1 to $J$). The device includes training sample storage means for storing $N$ training samples ($N$ is a positive integer), each consisting of an input pattern and the class to which it belongs. It also includes initialization means for initializing the learning parameters $\Lambda$ of the classifier by a predetermined method. The misclassification measure value $d_y(x, \Lambda)$ expresses the degree to which the input pattern $x$ of a training sample belonging to class $C_y$ is misclassified into another class. It is defined as

$$d_y(x, \Lambda) = -g_y(x, \Lambda) + \frac{1}{\psi} \log\left[\frac{1}{J-1} \sum_{j \neq y} \exp\bigl(\psi\, g_j(x, \Lambda)\bigr)\right].$$

Here, $\psi$ is a positive real number. $g_y(x, \Lambda)$ is a discriminant function of arbitrary form that determines, for each of the $J$ classes $C_y$, the degree to which the input pattern $x$ of a training sample belongs to that class.

The learning device further includes Parzen distribution estimation means. For each of the $J$ classes $C_y$, this means obtains the misclassification measure value of each training sample belonging to class $C_y$. It then estimates, by cross-validated maximum likelihood estimation, the true probability distribution that generated the samples of that class. The estimate is a Parzen distribution, as a function of the misclassification measure value, composed of Parzen windows of width $h_y$ centered on the individual misclassification measure values in the misclassification measure space. In the cross-validated maximum likelihood estimation, the Parzen distribution estimation means evaluates the likelihood of the Parzen distribution as a function of the Parzen window width $h_y$.

The learning device further includes the following:
- Optimal loss smoothness calculation means, which calculates, for each of the $J$ classes $C_y$, the optimal value $\alpha_y$ of the classifier's loss smoothness for class $C_y$. It does so by applying the following function [equation not reproduced] to the Parzen window width $h_y$ that gives the maximum-likelihood Parzen distribution in the cross-validated maximum likelihood estimation.
- Learning parameter adjustment means, which takes training samples one at a time from the training sample set and sequentially adjusts the learning parameters $\Lambda$ so as to minimize the classification error risk.
- Iteration control means, which operates the Parzen distribution estimation means, the optimal loss smoothness calculation means, and the learning parameter adjustment means repeatedly until a predetermined termination condition is satisfied. It then outputs the learning parameters $\Lambda$ obtained when the termination condition is satisfied.

Preferably, the learning device further includes shuffling means that randomly shuffles the order of the training samples before each iteration by the iteration control means and before the learning parameter adjustment means adjusts the learning parameters $\Lambda$.

More preferably, the predetermined termination condition is that the operations of the Parzen distribution estimation means, the optimal loss smoothness calculation means, and the learning parameter adjustment means have been completed a predetermined number of times.

Still more preferably, the iteration control means periodically skips the operations of the Parzen distribution estimation means and the optimal loss smoothness calculation means, while the learning parameter adjustment means continues to operate.
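The claimed iteration can be sketched end to end, but only under several assumptions that are not in the source:
- Linear discriminants $g_j(x, \Lambda) = w_j \cdot x + b_j$, initialized from the class means.
- A fixed step size.
- A fixed number of passes as the termination condition.
- Parzen and $\alpha$ re-estimation only every `refresh` passes.
- A leave-one-out grid search for $h_y$ in place of the EM procedure.

The mapping from $h_y$ to $\alpha_y$ is a hypothetical slope-matching choice, $\alpha_y = \sqrt{8/\pi}/h_y$, standing in for the function in the claim. All names are illustrative; this is not the patent's implementation.

```python
import numpy as np

def rival_measure(G, y, psi):
    """Smooth measure d_y for each row of G (N x J) and dd/dg_j for every class j.

    For j != y the derivative is a softmax weight over the rival classes;
    the returned entry for the correct class is 0 (its true derivative is -1).
    """
    n, j_count = G.shape
    rows = np.arange(n)
    scaled = psi * G
    scaled[rows, y] = -np.inf                     # drop the correct class from the rivals
    m = scaled.max(axis=1, keepdims=True)
    e = np.exp(scaled - m)
    s = e.sum(axis=1, keepdims=True)
    d = -G[rows, y] + (m[:, 0] + np.log(s[:, 0] / (j_count - 1))) / psi
    return d, e / s

def loo_width(d, grid=None):
    """Leave-one-out maximum-likelihood Gaussian window width, by grid search (not EM)."""
    grid = np.logspace(-3, 1, 80) if grid is None else grid
    diff2 = (d[:, None] - d[None, :]) ** 2
    best_h, best_ll = grid[-1], -np.inf
    for h in grid:
        k = np.exp(-0.5 * diff2 / h ** 2) / h     # Gaussian window; h-independent constants dropped
        np.fill_diagonal(k, 0.0)                  # leave the sample itself out
        with np.errstate(divide="ignore"):
            ll = np.sum(np.log(k.sum(axis=1)))
        if ll > best_ll:
            best_h, best_ll = h, ll
    return best_h

def train(X, y, J, psi=2.0, eta=0.01, epochs=30, refresh=5, seed=0):
    rng = np.random.default_rng(seed)
    # Initialisation (assumed): nearest-class-mean linear discriminants
    mu = np.stack([X[y == c].mean(axis=0) for c in range(J)])
    W, b = mu.copy(), -0.5 * np.sum(mu ** 2, axis=1)
    alpha = np.ones(J)
    for epoch in range(epochs):                   # termination: fixed number of passes
        if epoch % refresh == 0:                  # Parzen/alpha steps skipped on other passes
            d_all, _ = rival_measure(X @ W.T + b, y, psi)
            for c in range(J):
                h = loo_width(d_all[y == c])
                alpha[c] = np.sqrt(8.0 / np.pi) / h   # hypothetical alpha(h) mapping
        for n in rng.permutation(len(y)):         # shuffled sample order
            g = (W @ X[n] + b)[None, :]
            d, r = rival_measure(g, y[n:n + 1], psi)
            a = alpha[y[n]]
            l = 1.0 / (1.0 + np.exp(-np.clip(a * d[0], -500.0, 500.0)))
            dl_dd = a * l * (1.0 - l)
            coef = dl_dd * r[0]                   # dl/dg_j for the rival classes
            coef[y[n]] = -dl_dd                   # dl/dg_y
            W -= eta * np.outer(coef, X[n])       # gradient step on the sigmoid loss
            b -= eta * coef
    return W, b, alpha
```

Call `train(X, y, J)` with a real-valued matrix `X` and 0-based integer labels `y`. Each class needs at least two samples for the leave-one-out width.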

The Parzen windows forming the Parzen distribution may be Gaussian functions. In that case, the Parzen distribution estimation means may include Parzen estimated distribution construction means and window width calculation means. The construction means removes one sample from the samples belonging to class $C_y$ and constructs a Parzen estimated distribution from the remaining samples. The window width calculation means regards the expression defining that Parzen estimated distribution as a Gaussian mixture probability density function with $N'-1$ components and mixture weights $1/(N'-1)$, where $N'$ is the number of samples belonging to class $C_y$. It then calculates, with the EM algorithm, the Parzen window width $h_y$ that maximizes this Gaussian mixture probability density function.
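The EM step described above can be sketched for a single class. Each held-out value $d_n$ is treated as drawn from an equal-weight mixture of the $N'-1$ Gaussians centered on the other values, all sharing the variance $h_y^2$. The E-step computes each window's responsibility for $d_n$, and the M-step sets $h_y^2$ to the responsibility-weighted mean squared distance. Function names are hypothetical. No safeguard is included for repeated measure values, which would drive $h_y$ toward 0.

```python
import numpy as np

def loo_log_likelihood(d, h):
    """Leave-one-out log-likelihood of a Gaussian Parzen estimate with window width h."""
    d = np.asarray(d, dtype=float)
    m = d.shape[0]
    dens = np.exp(-0.5 * ((d[:, None] - d[None, :]) / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)
    np.fill_diagonal(dens, 0.0)                  # exclude the held-out sample's own window
    return float(np.sum(np.log(dens.sum(axis=1) / (m - 1))))

def em_parzen_width(d, h0=1.0, n_iter=500, tol=1e-10):
    """EM estimate of the Parzen window width h_y for one class.

    Each held-out value d_n is modelled as a mixture of N'-1 Gaussians with
    fixed means d_k (k != n), equal weights 1/(N'-1) and a shared variance h^2.
    """
    d = np.asarray(d, dtype=float)
    m = d.shape[0]
    diff2 = (d[:, None] - d[None, :]) ** 2
    h = float(h0)
    for _ in range(n_iter):
        # E-step: posterior weight of window k for held-out point n (k != n);
        # the common factor 1/(sqrt(2*pi)*h) cancels in the normalisation.
        logw = -0.5 * diff2 / h ** 2
        np.fill_diagonal(logw, -np.inf)
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: closed-form update of the shared variance h^2
        h_new = float(np.sqrt(np.sum(w * diff2) / m))
        converged = abs(h_new - h) < tol
        h = h_new
        if converged:
            break
    return h
```

Each EM iteration cannot decrease the leave-one-out log-likelihood. The returned width is therefore at least as likely as the starting value `h0`.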

A computer program according to a second aspect of the present invention is used to classify an input pattern into one of $J$ classes $C_j$ ($j$ is an integer from 1 to $J$). It causes a computer to function as training sample storage means for storing $N$ training samples ($N$ is a positive integer), each consisting of an input pattern and the class to which it belongs. It also causes the computer to function as initialization means for initializing the learning parameters $\Lambda$ of the classifier by a predetermined method. The misclassification measure value $d_y(x, \Lambda)$ expresses the degree to which the input pattern $x$ of a training sample belonging to class $C_y$ is misclassified into another class. It is defined as

$$d_y(x, \Lambda) = -g_y(x, \Lambda) + \frac{1}{\psi} \log\left[\frac{1}{J-1} \sum_{j \neq y} \exp\bigl(\psi\, g_j(x, \Lambda)\bigr)\right].$$

Here, $\psi$ is a positive real number. $g_y(x, \Lambda)$ is a discriminant function of arbitrary form that determines, for each of the $J$ classes $C_y$, the degree to which the input pattern $x$ of a training sample belongs to that class.

The computer program further causes the computer to function as Parzen distribution estimation means. For each of the $J$ classes $C_y$, this means obtains the misclassification measure value of each training sample belonging to class $C_y$. It then estimates, by cross-validated maximum likelihood estimation, the true probability distribution that generated the samples of that class. The estimate is a Parzen distribution, as a function of the misclassification measure value, composed of Parzen windows of width $h_y$ centered on the individual misclassification measure values in the misclassification measure space. In the cross-validated maximum likelihood estimation, the Parzen distribution estimation means evaluates the likelihood of the Parzen distribution as a function of the Parzen window width $h_y$.

The computer program further causes the computer to function as the following:
- Optimal loss smoothness calculation means, which calculates, for each of the $J$ classes $C_y$, the optimal value $\alpha_y$ of the classifier's loss smoothness for class $C_y$. It does so by applying the following function [equation not reproduced] to the Parzen window width $h_y$ that gives the maximum-likelihood Parzen distribution in the cross-validated maximum likelihood estimation.
- Learning parameter adjustment means, which takes training samples one at a time from the training sample set and sequentially adjusts the learning parameters $\Lambda$ so as to minimize the classification error risk.
- Iteration control means, which operates the Parzen distribution estimation means, the optimal loss smoothness calculation means, and the learning parameter adjustment means repeatedly until a predetermined termination condition is satisfied. It then outputs the learning parameters $\Lambda$ obtained when the termination condition is satisfied.

FIG. 1 is a graph of the classification error count loss function.
FIG. 2 is a graph of the smoothed classification error count loss function given by the logistic sigmoid function.
FIG. 3 is a schematic graph illustrating Parzen estimation of the probability density function over the misclassification measure.
FIG. 4 is a graph showing a Parzen window of width $h_y$ centered on $d_y(x_k^y, \Lambda)$.
FIG. 5 is a diagram schematically showing two approximately equal quantities: the integral over the positive region of a Gaussian Parzen window centered on $d_y(x_k^y, \Lambda)$, and the value at $d_y(x_k^y, \Lambda)$ of the logistic sigmoid smoothed classification error count loss function.
FIG. 6 is a diagram illustrating the relationship between the window width of the Parzen window and the smoothness of the smoothed classification error count loss function.
FIG. 7 is a diagram showing the Parzen estimated distribution constructed from the samples remaining after one sample $x_n$ is removed.
FIG. 8 is a flowchart showing the control structure of a program, according to the first embodiment, that realizes MCE learning with automatic control of the loss smoothness, obtained by applying Parzen estimation in the misclassification measure space.
FIG. 9 is a flowchart showing the control structure of a program, executed in step 156 of FIG. 8, that obtains the optimal Parzen window width with the EM algorithm.
FIG. 10 is a diagram showing the external appearance of the hardware of a general-purpose computer system that implements one embodiment of the present invention.
FIG. 11 is a block diagram of the internal structure of the computer system shown in FIG. 10.
FIG. 12 is a flowchart showing the control structure of a program, according to the second embodiment of the present invention, that realizes MCE learning with automatic control of the loss smoothness.

Embodiments of the present invention are described below. In the following description and drawings, identical parts are given identical reference numerals. Their names and functions are also identical, so detailed descriptions of them are not repeated. $\hat{L}(\Lambda)$ can be minimized by batch methods such as steepest descent. Adaptive learning methods, which adjust $\Lambda$ each time a single sample $(x_n, y_n)$ is drawn from $\Omega_N$, are also widely used. In its basic form, the adjustment in such a method is

$$\Lambda \leftarrow \Lambda - \epsilon_n \nabla_{\Lambda}\, \ell_{y_n}\bigl(d_{y_n}(x_n, \Lambda)\bigr),$$

where $\epsilon_n$ is a small positive step size.

The following embodiments adopt this adaptive learning method.
Many MCE learning implementations use a common loss smoothness $\alpha_y$ in equation (7) for all classes $C_y$ ($y = 1, \ldots, J$). In the present embodiments, however, the loss smoothness is controlled automatically for each class, as described later. The loss smoothness $\alpha_y$ is therefore set individually for each class.

<Reformulation of MCE learning based on Parzen estimation>
The MCE learning method was originally formulated in terms of counting classification errors with the 0-1 loss, as described in the previous section. Separately, a formulation that directly estimates the classification error probability has also been given (Non-Patent Document 2). It estimates the classification error probability by applying Parzen estimation, a nonparametric probability density estimation method, in the space of the misclassification measure. Under this formulation, MCE learning can be viewed as optimizing not only over the given set of training samples but also over unseen patterns expected to lie in their neighborhood. The steps of this formulation are described below.

First, in equation (6), which represents the classification error count risk, the integral over the entire pattern space is replaced by an integral over a subset of that space:

$$R(\Lambda) = \sum_{y=1}^{J} P(C_y) \int_{\mathcal{X}_y(\Lambda)} p(x \mid C_y)\, dx, \tag{11}$$

where

$$\mathcal{X}_y(\Lambda) = \{\, x \in \mathcal{X} \mid d_y(x, \Lambda) > 0 \,\}.$$

$\mathcal{X}_y(\Lambda)$ is the set of patterns of class $C_y$ that are misclassified by the discriminant functions $\{g_j(x, \Lambda)\}_{j=1}^{J}$.

Next, the integral in equation (11) over the region $\mathcal{X}_y(\Lambda)$ of the input pattern space is replaced by an integral over the region of the misclassification measure space where the measure is positive. Since $x$ and $\{g_j(x, \Lambda)\}_{j=1}^{J}$ can be regarded as continuous random variables, the misclassification measure $d_y(x, \Lambda)$ is also a continuous random variable. For each class $C_y$, the integral in equation (11) can then be replaced as follows:

$$\int_{\mathcal{X}_y(\Lambda)} p(x \mid C_y)\, dx = \int_{0}^{\infty} p_{\Lambda}(t \mid C_y)\, dt.$$

Here $p_{\Lambda}(t \mid C_y)$ is the probability density function of the misclassification measure $d_y(x, \Lambda)$, regarded as a continuous random variable. Because $d_y(x, \Lambda)$ depends on $\Lambda$, so does this density. With this substitution, the classification error count risk becomes

$$R(\Lambda) = \sum_{y=1}^{J} P(C_y) \int_{0}^{\infty} p_{\Lambda}(t \mid C_y)\, dt. \tag{13}$$

Two points deserve attention. First, the integral is now taken over the one-dimensional misclassification measure space rather than the original pattern space, which is usually high-dimensional. Second, modeling the conditional probability density $p_{\Lambda}(t \mid C_y)$ in this one-dimensional space suggests a new approach to parameter learning.

Based on this observation, the new formulation of MCE learning introduces a Parzen estimated distribution for each class $C_y$ (FIG. 3). It approximates $p_{\Lambda}(t \mid C_y)$ using the finite set of training samples $\{x_k^y\}_{k=1}^{N_y}$:

$$\hat{p}_{\Lambda}(t \mid C_y) = \frac{1}{N_y} \sum_{k=1}^{N_y} \frac{1}{h_y}\, \phi\!\left(\frac{t - d_y(x_k^y, \Lambda)}{h_y}\right).$$

Here $\phi$ is the window function, $x_k^y$ is the $k$-th training sample belonging to class $C_y$, and $N_y$ is the total number of training samples belonging to class $C_y$.

The term $\frac{1}{h_y}\phi\bigl((t - d_y(x_k^y, \Lambda))/h_y\bigr)$ is a Parzen window 50 of width $h_y$ (FIG. 4). It is centered on the data point $d_y(x_k^y, \Lambda)$, i.e. the training sample transformed into the misclassification measure space.

In FIG. 3, the horizontal axis represents the misclassification measure and the vertical axis its probability density. $x_k^y$ is the $k$-th training pattern (of $N_y$ in total) belonging to the $y$-th class $C_y$, and $d_y(x_k^y, \Lambda)$ is its misclassification measure value. As noted above, $\Lambda$ is the set of learning parameters of the recognizer. The Parzen window group 40 in the figure consists of many Parzen windows ($k = 1, \ldots, N_y$), each centered on $d_y(x_k^y, \Lambda)$. Taking the arithmetic mean of these Parzen windows over all training patterns of the $y$-th class $C_y$ yields an approximation (approximate distribution 42) of the distribution of the misclassification measure for class $C_y$. Integrating this approximate distribution 42 over the positive region (hatched in the figure) gives an approximation of the probability that a pattern whose correct class is $C_y$ is misclassified into a class other than $C_y$.
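The construction in FIG. 3 can be reproduced numerically with Gaussian windows: average the windows centered on one class's measure values, then integrate the result over $t > 0$. This sketch uses made-up measure values and a hypothetical width, and simple Riemann sums stand in for the integrals.

```python
import numpy as np

def gaussian_window(u):
    """Standard Gaussian window phi(u)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def parzen_density(t, d_class, h):
    """Parzen estimate p_hat(t | C_y) = mean_k phi((t - d_k) / h) / h."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    u = (t[:, None] - np.asarray(d_class, dtype=float)[None, :]) / h
    return gaussian_window(u).mean(axis=1) / h

d_class = np.array([-2.3, -1.1, -0.7, 0.4])   # hypothetical d_y(x_k^y, Lambda) values
t = np.linspace(-10.0, 10.0, 20001)
dt = t[1] - t[0]
p = parzen_density(t, d_class, h=0.5)
print(p.sum() * dt)             # total mass, approximately 1
print(p[t > 0].sum() * dt)      # hatched area: estimated misclassification probability
```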

Referring to FIG. 4, the Parzen window 50 is usually a positive, unimodal function that is symmetric about its data point. By approximating p_Λ(t | C_y) in equation (13) with the Parzen estimated distribution p̂_Λ(t | C_y), and further approximating P(C_y) by N_y/N, an estimate of the classification error count risk based on the finite learning sample set Ω_N is expressed by the following equation.
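The averaging of windows and the positive-region integral described above can be checked numerically. The following is a minimal sketch, assuming a Gaussian window and that the misclassification measure values of each class are already available; the function names are hypothetical, not taken from the patent. As h shrinks, the estimate reduces to the fraction of samples whose misclassification measure is positive, i.e. the empirical error count.

```python
import math
from typing import Dict, List


def gaussian_window(u: float) -> float:
    """Standard Gaussian window phi(u) (assumed window shape)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_density(t: float, d_values: List[float], h: float) -> float:
    """Parzen estimate of p(t | C_y): arithmetic mean of windows of width h
    centered on the misclassification measure values d_y(x_k^y, Lambda)."""
    return sum(gaussian_window((t - d) / h) / h for d in d_values) / len(d_values)


def positive_mass(d_values: List[float], h: float) -> float:
    """Integral of the Parzen estimate over t > 0. For a Gaussian window each
    sample contributes Phi(d / h), the standard normal CDF."""
    return sum(0.5 * (1.0 + math.erf(d / (h * math.sqrt(2.0)))) for d in d_values) / len(d_values)


def risk_estimate(d_by_class: Dict[int, List[float]], h_by_class: Dict[int, float]) -> float:
    """Classification error count risk estimate with P(C_y) approximated by N_y / N."""
    n_total = sum(len(v) for v in d_by_class.values())
    return sum(len(d) / n_total * positive_mass(d, h_by_class[y])
               for y, d in d_by_class.items())
```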

Importantly, if the loss function ℓ_y(d_y(x, Λ)) is newly defined as the integral of each Parzen window over the positive region, as in the following equation (16), then R_N(Λ) in equation (15) coincides with the empirical average loss L̂(Λ) of equation (9), that is, with the evaluation criterion of MCE learning.
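A short derivation shows why defining the loss this way reproduces the MCE criterion. It assumes, as the surrounding text describes but does not display, that each class's Parzen estimate is the mean of the windows (1/h_y)φ((t − d_y(x_k^y, Λ))/h_y) and that the risk estimate weights each class's positive-region mass by N_y/N. The symbols are a reconstruction from that description, not a transcription of equations (14) to (16).

$$\ell_y(d) = \int_0^{\infty} \frac{1}{h_y}\,\phi\!\left(\frac{t-d}{h_y}\right)dt = \int_{-d/h_y}^{\infty}\phi(u)\,du = \int_{-\infty}^{d/h_y}\phi(u)\,du ,$$

$$R_N(\Lambda) = \sum_{y=1}^{J}\frac{N_y}{N}\int_0^{\infty}\hat p_\Lambda(t\mid C_y)\,dt = \frac{1}{N}\sum_{y=1}^{J}\sum_{k=1}^{N_y}\ell_y\big(d_y(x_k^y,\Lambda)\big).$$

The first line substitutes u = (t − d)/h_y and uses the symmetry φ(u) = φ(−u). In the second, expanding p̂ as a mean of N_y windows makes the class weight N_y/N and the factor 1/N_y cancel, leaving a plain average of per-sample losses over all N samples, which is the form of an empirical average loss.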

Interestingly, important loss functions can be derived from familiar window functions. For example, when the Gaussian window function of equation (17) is adopted, ℓ_y(d_y(x, Λ)) in equation (16) becomes a loss function similar to the logistic sigmoid function of equation (7). In fact, when the loss smoothness α_y is set as in equation (18), equations (7) and (16) become extremely close (FIG. 5). Note, however, that the two are not strictly identical.
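The closeness of the two losses can be illustrated with a small sketch. With a Gaussian window, the positive-region integral is the standard normal CDF evaluated at d/h. The value of α used below is an ASSUMPTION chosen only for illustration: it matches the slopes of both curves at d = 0, giving α = √(8/π)/h. It is not necessarily the patent's equation (18), whose exact form is not reproduced here.

```python
import math


def gaussian_window_loss(d: float, h: float) -> float:
    """Positive-region integral of a Gaussian Parzen window of width h centered at d,
    i.e. the standard normal CDF evaluated at d / h."""
    return 0.5 * (1.0 + math.erf(d / (h * math.sqrt(2.0))))


def sigmoid_loss(d: float, alpha: float) -> float:
    """Logistic sigmoid smoothed classification error loss with smoothness alpha."""
    return 1.0 / (1.0 + math.exp(-alpha * d))


def slope_matched_alpha(h: float) -> float:
    """ASSUMPTION: pick alpha so both losses have equal slope at d = 0,
    1 / (h * sqrt(2 * pi)) == alpha / 4. Illustrative only."""
    return math.sqrt(8.0 / math.pi) / h


def max_gap(h: float, lo: float = -5.0, hi: float = 5.0, steps: int = 1001) -> float:
    """Largest absolute difference between the two losses on a grid of d values."""
    alpha = slope_matched_alpha(h)
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(gaussian_window_loss(d, h) - sigmoid_loss(d, alpha)) for d in grid)


if __name__ == "__main__":
    for h in (0.5, 1.0, 2.0):
        print(f"h={h}: alpha={slope_matched_alpha(h):.3f}, max |gap|={max_gap(h):.4f}")
```

The gap stays small but never reaches zero, consistent with the remark that the two losses are close but strictly different; a larger h also flattens the Gaussian-window loss, which is the smoothing effect discussed next.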

As is apparent from comparing FIG. 5 and FIG. 6, the Parzen window 60 with a wide width h_y yields a smoother loss function than the Parzen window 70 with a narrow width h_y. That is, the width h of the Parzen window expresses the smoothness of the loss function. The integral of each individual Parzen window 60 or 70 over the positive region (hatched on the left side of FIGS. 5 and 6) corresponds to the smoothed classification error loss used in MCE (curve 62 in FIG. 5 and curve 72 in FIG. 6). The arithmetic mean of these losses both approximates the misclassification probability and serves as the evaluation criterion minimized in MCE learning.

The Parzen estimation framework described above gives the original MCE formulation a new mechanism for controlling loss smoothing. The more accurately the Parzen estimated distribution of equation (14) approximates the true probability distribution p_Λ(t | C_y), the better the empirical average loss of equation (15) (equivalently, equation (9)) approximates the classification error count risk of equation (13) (equivalently, equation (6)). In other words, if the window width h_y is estimated so that the Parzen estimated distribution of equation (14) approximates the true distribution well, the MCE criterion computed from a finite number of learning samples approaches the classification error probability over the entire pattern space, including unseen samples. The minimum of the MCE criterion then approaches the minimum of the classification error probability over the entire pattern space, which improves the generalization ability of MCE learning.

<Parzen estimation based on cross-validated maximum likelihood estimation>
This section and the next discuss Parzen estimation in general, not limited to loss smoothness control in MCE learning. We therefore denote the given N′ samples by x_k (k = 1, 2, ..., N′) and the Parzen window width by h, and treat the general problem of Parzen estimation of the true probability distribution that generated {x_k}_{k=1}^{N′}. For loss smoothness control in MCE learning, the following is applied with N′ = N_y and x_k = d_y(x_k^y, Λ) (k = 1, 2, ..., N_y), separately for each class C_y (y = 1, 2, ..., J).

To determine a window width h for which the Parzen estimated distribution approximates the true distribution well, consider estimating h by maximum likelihood. Unlike parametric estimation, which builds a distribution model from characteristic values such as the mean or variance, Parzen estimation is nonparametric and constructs the estimated distribution from all of the samples. Consequently, the likelihood cannot be maximized by substituting those same samples into the estimated distribution (the maximizing window width collapses to 0). Non-Patent Document 3 therefore formulates cross-validated maximum likelihood estimation, in which the Parzen estimated distribution is constructed from the sample set with one sample removed, and the removed sample is substituted into that distribution. An outline is given below.

Referring to FIG. 7, the n-th sample x_n is removed from the given N′ samples x_k (k = 1, 2, ..., N′), and the Parzen estimated distribution 82 is constructed from the remaining samples. The Parzen window 80 drawn with a broken line in FIG. 7 corresponds to the removed sample x_n.

The removed sample x_n is substituted into the above equation, and the likelihood function is defined as the product of these values over all samples.

Then the h that maximizes L(h) in the above equation is obtained. This concludes the outline of cross-validated maximum likelihood estimation.
<Parzen window width determination method based on EM algorithm>
The configuration of the present embodiment is described below. Maximizing equation (20) with respect to the Parzen window width h generally requires a large amount of computation. Because the MCE learning with automatic loss smoothness control according to the present embodiment (described in the next section) optimizes h many times, it is desirable to perform this optimization with little computation. Therefore, in the present embodiment, when the Parzen window φ is the Gaussian function of equation (17), the maximization over h is carried out efficiently by the EM algorithm. This EM algorithm is described below with reference to the flowchart shown in FIG. 9.

Equation (19) is transformed into the following equation.

If φ is the Gaussian function of equation (17), each window term in equation (22) can also be regarded as a Gaussian probability density function with mean y_m^(n) and variance h². Consequently, p_{−n}(t | h) can formally be regarded as a Gaussian mixture probability density function with N′ − 1 components whose mixing weights are fixed to the uniform value 1/(N′ − 1). Since the component index m can then be treated as a latent variable, the EM algorithm, an efficient iterative algorithm for maximum likelihood estimation of probabilistic models that contain latent variables (see, for example, Chapter 9 of Non-Patent Document 4), can be applied to the maximization of equation (20) over the Parzen window width h.

Assume that an estimate ĥ of the Parzen window width h was obtained at the previous iteration step. The responsibilities used in the EM algorithm are then given by the following equation.

The auxiliary function (Q function) defined by the EM algorithm is as follows.

Here, "Const." denotes terms that do not depend on h. Differentiating the above equation with respect to s and finding the s at which the derivative is 0 gives the square h² of the Parzen window width that minimizes the above auxiliary function, as follows.

Further, q′_{m,n} (m = 1, ..., N′; m ≠ n) is defined as follows.

Then q′_{m,n} and the h² that minimizes the auxiliary function are as follows.
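For reference, here is a sketch of the standard EM update for this particular mixture (component means fixed at the other samples, uniform fixed weights, one shared variance), assuming scalar samples x_n and taking s = h². It is written so that Q is maximized, and its notation may not match the patent's q′_{m,n} exactly.

$$Q(h) = \sum_{n=1}^{N'}\sum_{m\neq n} q_{m,n}\left[-\tfrac12\log(2\pi h^2) - \frac{(x_n-x_m)^2}{2h^2}\right] + \text{Const.},\qquad q_{m,n} = \frac{\exp\!\big(-(x_n-x_m)^2/2\hat h^2\big)}{\sum_{m'\neq n}\exp\!\big(-(x_n-x_{m'})^2/2\hat h^2\big)},$$

$$\frac{\partial Q}{\partial s} = \sum_{n=1}^{N'}\sum_{m\neq n} q_{m,n}\left[-\frac{1}{2s} + \frac{(x_n-x_m)^2}{2s^2}\right] = 0 \;\Longrightarrow\; h^2 = s = \frac{1}{N'}\sum_{n=1}^{N'}\sum_{m\neq n} q_{m,n}\,(x_n-x_m)^2 .$$

The factor 1/N′ appears because the responsibilities of each n sum to one, Σ_{m≠n} q_{m,n} = 1.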

To keep the algorithm notation simple, q′_{m,n} is renamed q_{m,n}. The EM algorithm for maximum likelihood estimation of the Parzen window width is then summarized as follows.

(1) Give an initial value h^(0) > 0 and set l = 0. (Step 220 in FIG. 9)
(2) Compute q_{m,n} by the following equation (n = 1, ..., N′; m = 1, ..., N′, m ≠ n). (Steps 222, 224 and 226 in FIG. 9)

(3) Re-estimate the parameter as follows. (Step 228)

(4) If h satisfies the convergence condition (YES in step 230), terminate; otherwise (NO in step 230), set l ← l + 1 (step 232) and return to (2). In the present embodiment, the convergence condition is convergence of the following log-likelihood.
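Steps (1) to (4) can be sketched as follows. This is a minimal illustration, assuming scalar samples, a Gaussian window, and a relative change in the log-likelihood as the convergence test; the function name and its parameters are hypothetical. Each E-step computes the responsibilities q_{m,n}, and the M-step sets h² to their weighted mean of squared differences, so the recorded log-likelihood should never decrease.

```python
import math
from typing import List, Sequence, Tuple


def em_parzen_width(samples: Sequence[float], h0: float,
                    max_iter: int = 100, tol: float = 1e-8) -> Tuple[float, List[float]]:
    """EM re-estimation of a shared Gaussian Parzen window width h. Each leave-one-out
    estimate is treated as an (N'-1)-component mixture with fixed means x_m and fixed
    weights 1/(N'-1); only the common variance h^2 is updated.
    Returns the final h and the log-likelihood computed at each E-step."""
    n_total = len(samples)
    h = h0
    history: List[float] = []
    for _ in range(max_iter):
        var = h * h
        log_norm = -0.5 * math.log(2.0 * math.pi * var) - math.log(n_total - 1)
        weighted_sq = 0.0
        loglik = 0.0
        for n, x_n in enumerate(samples):
            # E-step: log of each component (m != n) evaluated at x_n, up to log_norm.
            logs = [(m, -0.5 * (x_n - x_m) ** 2 / var)
                    for m, x_m in enumerate(samples) if m != n]
            top = max(v for _, v in logs)
            denom = sum(math.exp(v - top) for _, v in logs)
            loglik += log_norm + top + math.log(denom)
            for m, v in logs:
                q_mn = math.exp(v - top) / denom  # responsibility q_{m,n}
                weighted_sq += q_mn * (x_n - samples[m]) ** 2
        history.append(loglik)
        # M-step: responsibilities sum to 1 for each n, so divide by N'.
        h = math.sqrt(weighted_sq / n_total)
        # Convergence test on the log-likelihood; max_iter acts as the cap I.
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol * abs(history[-1]):
            break
    return h, history
```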

Alternatively, an upper limit I on the number of iterations may be set in advance, and the iteration may be terminated when either the iteration count l reaches I or the above log-likelihood is judged to have converged. The initial value h^(0) may be set by any reasonable method; for example, the following algorithm can be applied.
(1) For each sample x_n (n = 1, ..., N′), find the nearest sample other than x_n itself.

(2) Give the initial value of the parameter as follows.
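The equation for step (2) is not reproduced here, so the following sketch uses an ASSUMED, hypothetical formula: the root mean squared distance from each sample to its nearest other sample. It is one reasonable choice consistent with step (1), not necessarily the patent's own formula.

```python
import math
from typing import Sequence


def nearest_neighbor_init(samples: Sequence[float]) -> float:
    """ASSUMED initializer for h^(0): root mean squared distance from each
    sample to its nearest other sample (one reasonable choice only)."""
    total = 0.0
    for n, x_n in enumerate(samples):
        nearest = min(abs(x_n - x_m) for m, x_m in enumerate(samples) if m != n)
        total += nearest ** 2
    h0 = math.sqrt(total / len(samples))
    if h0 <= 0.0:
        raise ValueError("all samples coincide; choose h^(0) > 0 another way")
    return h0
```

For samples 0.0, 1.0 and 3.0 the nearest-neighbour distances are 1, 1 and 2, so this choice gives h^(0) = √((1 + 1 + 4)/3) = √2.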

<MCE learning with automatic loss smoothness control>
[First Embodiment]
By applying the EM-based Parzen window width determination method described in the previous section to Parzen estimation in the misclassification measure space, an MCE learning algorithm with automatic control of loss smoothness can be formulated. Specifically, the algorithm is as follows; it is described with reference to FIG. 8.

(1) Set the initial value Λ^(0) of the classifier learning parameter Λ. Set the upper limit E of the epoch count e, and set e = 0. (Step 140)
(2) For e = 0, 1, ..., E, execute the following step 180. (Step 142) Step 180 includes the following sub-steps (a), (b) and (c).

(a) For every class y = 1, ..., J, repeat step 150, which includes the following sub-steps a1 to a3.
(a1) For all learning samples belonging to class y (N_y samples), compute the discriminant function values g_j(x_k^y, Λ^(e)) for all classes j (j = 1, ..., J; k = 1, ..., N_y). (Steps 144, 146 and 148)
(a2) For class y, execute the following steps i to iii. (Steps 152 to 158)
i. Compute the misclassification measure values d_y(x_k^y, Λ^(e)) (k = 1, ..., N_y). (Step 152)
ii. With N′ = N_y and x_k = d_y(x_k^y, Λ^(e)) (k = 1, ..., N_y) (step 154), execute the EM-based Parzen window width determination method of the previous section to obtain the optimal window width h_y. (Step 156)
iii. Evaluate equation (18) to obtain the optimal loss smoothness α_y. (Step 158)
(b) Take a learning sample (x_n, y_n) out of the learning sample set Ω_N (step 162) and adjust the classifier parameter Λ by equation (10) (step 164). Do this once for each learning sample in turn (step 160). When all samples have been processed, a new classifier parameter Λ^(e+1) is obtained. (Step 166)
(c) Shuffle the order of the learning samples in Ω_N. (Step 168)
Sub-step 2(a) of the above algorithm may be executed at every epoch e; alternatively, an execution interval E′ may be set so that it is executed only when e is an integer multiple of E′.
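The loop of steps (1) and (2)(a) to (c) can be sketched end to end for the prototype classifier described in the experimental section. This is a minimal, hypothetical Python sketch, not the patent's implementation. It ASSUMES g_j(x) = −(squared distance to the nearest prototype of C_j), the ψ → ∞ misclassification measure, a logistic-sigmoid loss for equation (7), a plain gradient step standing in for equation (10), a nearest-neighbour start for the EM, and α_y = √(8/π)/h_y (slope matching at d = 0) standing in for equation (18). Every class needs at least two samples, and lr should be small enough that lr·α_y/2 stays well below 1.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Model = Dict[int, List[Vector]]  # class label -> list of prototype vectors


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(protos: List[Vector], x: Sequence[float]) -> Tuple[int, float]:
    """Index and squared distance of the prototype in `protos` closest to x."""
    return min(((i, sq_dist(p, x)) for i, p in enumerate(protos)), key=lambda t: t[1])


def misclassification(model: Model, x: Sequence[float], y: int) -> Tuple[float, int, int, int]:
    """psi -> infinity measure under the ASSUMED discriminant: d_y = dist_y - dist_i,
    where C_i is the best-incorrect class. Also returns the prototype indices used."""
    iy, dist_y = nearest(model[y], x)
    ci, (ii, dist_i) = min(((c, nearest(p, x)) for c, p in model.items() if c != y),
                           key=lambda t: t[1][1])
    return dist_y - dist_i, iy, ci, ii


def em_width(d: Sequence[float], iters: int = 50) -> float:
    """Compact EM for the leave-one-out Gaussian Parzen width of scalar values d (len >= 2)."""
    n = len(d)
    nn_sq = [min((a - b) ** 2 for j, b in enumerate(d) if j != i) for i, a in enumerate(d)]
    h = math.sqrt(sum(nn_sq) / n) or 1.0  # ASSUMED nearest-neighbour start
    for _ in range(iters):
        acc = 0.0
        for i, a in enumerate(d):
            terms = [(-(a - b) ** 2 / (2.0 * h * h), (a - b) ** 2)
                     for j, b in enumerate(d) if j != i]
            top = max(v for v, _ in terms)
            z = sum(math.exp(v - top) for v, _ in terms)
            acc += sum(math.exp(v - top) / z * s for v, s in terms)  # sum_m q_{m,n} (x_n - x_m)^2
        h = math.sqrt(acc / n) or h
    return h


def train_mce(data: List[Tuple[Vector, int]], model: Model, epochs: int = 10,
              lr: float = 0.05, seed: int = 0) -> Dict[int, float]:
    """Updates `model` in place and returns the last per-class loss smoothness alpha_y."""
    rng = random.Random(seed)
    data = list(data)  # local copy so shuffling does not touch the caller's list
    alpha: Dict[int, float] = {}
    for _ in range(epochs):
        # (a) per class: measures d_y -> EM window width h_y -> smoothness alpha_y
        for y in model:
            d = [misclassification(model, x, c)[0] for x, c in data if c == y]
            alpha[y] = math.sqrt(8.0 / math.pi) / em_width(d)  # ASSUMED stand-in for eq. (18)
        # (b) one sequential pass with the sigmoid loss l(d) = 1 / (1 + exp(-alpha_y * d))
        for x, y in data:
            d, iy, ci, ii = misclassification(model, x, y)
            loss = 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, alpha[y] * d))))
            # lr * dl/dd * 2, because d|x - p|^2 / dp = -2 (x - p)
            step = lr * alpha[y] * loss * (1.0 - loss) * 2.0
            p_y, p_i = model[y][iy], model[ci][ii]
            for k in range(len(x)):
                p_y[k] += step * (x[k] - p_y[k])  # pull correct-class prototype toward x
                p_i[k] -= step * (x[k] - p_i[k])  # push best-incorrect prototype away
        # (c) shuffle the presentation order for the next epoch
        rng.shuffle(data)
    return alpha


def classify(model: Model, x: Sequence[float]) -> int:
    return min(model, key=lambda c: nearest(model[c], x)[1])
```

Running sub-step (a) only every E′ epochs would amount to guarding the per-class block with a check on the epoch counter.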

<Experimental results>
The following experiment was conducted using an apparatus that employs the EM-based Parzen window width determination method according to the above embodiment.

The apparatus according to the above embodiment is, in principle, applicable to a wide variety of discriminant functions. As one example, an experiment was conducted with a classifier whose discriminant function is the Euclidean distance to prototype vectors (abbreviated as "prototypes"). A "prototype vector" is a vector that represents a class, and several prototypes can be assigned to each class. In this experiment, every class was given 8 prototypes. Because distance and probability are closely related, this classifier is highly versatile and extends readily to probability-measure discriminant functions such as the hidden Markov models widely used in speech recognition.

The discriminant function of class C_j is given by the following equation.

Here, p_j is the prototype belonging to C_j that is closest to x, and Λ is the set of all prototypes. Suppose a learning sample x belonging to class C_y is given. With the coefficient ψ set to ∞, the misclassification measure of equation (5) becomes the following, where C_i is the best-incorrect class for x (the incorrect class closest to the correct one).
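The role of ψ can be illustrated with a short sketch. It ASSUMES the commonly used MCE form d_y = −g_y + (1/ψ) log[(1/(J − 1)) Σ_{j≠y} exp(ψ g_j)] for equation (5), and g_j(x) = −(squared Euclidean distance to the nearest prototype of C_j); the patent's exact expressions are not reproduced here, and the function names are hypothetical. As ψ grows, the smoothed measure approaches the best-incorrect form.

```python
import math
from typing import Dict, List, Sequence


def discriminant(prototypes: List[Sequence[float]], x: Sequence[float]) -> float:
    """ASSUMED form: g_j(x) = -min over prototypes p of C_j of |x - p|^2."""
    return -min(sum((a - b) ** 2 for a, b in zip(x, p)) for p in prototypes)


def misclassification_measure(model: Dict[int, List[Sequence[float]]],
                              x: Sequence[float], y: int, psi: float) -> float:
    """d_y = -g_y + (1/psi) * log(mean over j != y of exp(psi * g_j)), computed stably."""
    g = {j: discriminant(p, x) for j, p in model.items()}
    rivals = [psi * g[j] for j in g if j != y]
    top = max(rivals)
    lse = top + math.log(sum(math.exp(r - top) for r in rivals) / len(rivals))
    return -g[y] + lse / psi


def best_incorrect_measure(model: Dict[int, List[Sequence[float]]],
                           x: Sequence[float], y: int) -> float:
    """psi -> infinity limit: d_y = -g_y + g_i, with C_i the best-incorrect class."""
    g = {j: discriminant(p, x) for j, p in model.items()}
    return -g[y] + max(g[j] for j in g if j != y)
```

Because a log-mean-exp never exceeds the maximum, the finite-ψ measure is always at most the ψ → ∞ value, and it is negative whenever x is correctly classified with some margin.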

The experiment used the Glass Identification data set provided by the UCI Machine Learning Repository. This data set consists of 214 glass sample patterns in 6 classes; the contents of nine kinds of oxides in each glass sample are given as a 9-dimensional input vector.
One pattern was removed from the data set as the recognition target, the classifier was trained on the remaining patterns, and the removed pattern was then recognized. This was done for all 214 patterns to compute the recognition rate (open-data recognition rate) by the leave-one-out method. In addition to the open-data recognition rate for each removed pattern, the recognition rate on the 213 patterns used for training (closed-data recognition rate) was also computed. Since the closed-data recognition rate is computed 214 times, their average is taken as the final closed-data recognition rate.
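The leave-one-out protocol for the open-data and closed-data recognition rates can be sketched generically. The `train` callable is a placeholder for any learner; the nearest-class-mean trainer below is a HYPOTHETICAL stand-in used only to exercise the protocol, not the MCE-trained prototype classifier of the experiment.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]
Classifier = Callable[[Sequence[float]], int]


def leave_one_out(data: List[Sample],
                  train: Callable[[List[Sample]], Classifier]) -> Tuple[float, float]:
    """Open-data rate: each held-out pattern classified by a model trained on the rest.
    Closed-data rate: accuracy on the training patterns, averaged over all folds."""
    open_hits = 0
    closed_sum = 0.0
    for n, (x_n, y_n) in enumerate(data):
        rest = data[:n] + data[n + 1:]
        clf = train(rest)
        open_hits += int(clf(x_n) == y_n)
        closed_sum += sum(int(clf(x) == y) for x, y in rest) / len(rest)
    return open_hits / len(data), closed_sum / len(data)


def nearest_mean_trainer(samples: List[Sample]) -> Classifier:
    """HYPOTHETICAL stand-in trainer: one prototype per class at the class mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    means = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    return lambda x: min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))
```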

Table 1 shows the recognition rates of the conventional MCE learning method. In this method, MCE learning is performed with the loss smoothness α_y of the smoothed classification error loss of equation (7) fixed to a predetermined value (a loss smoothness α common to all classes). That is, in the algorithm of the previous section, α_1 = ... = α_J = α is held at a fixed value and sub-step 2(a) is omitted. The table lists the recognition rates for several fixed values of α. The highest open-data recognition rate, 75.23%, is obtained at α = 1.0.

Table 2 shows the recognition rates of the new MCE learning method, in which the loss smoothness α_y is set automatically by the technique of the present invention.

The open-data recognition rate is 74.30%, nearly the same as that of the conventional MCE learning method, although slightly below its best open-data recognition rate. However, the conventional method requires the smoothing parameter that maximizes the open-data recognition rate (1.0 in this experiment) to be tuned ad hoc, which demands a great deal of effort when training the pattern recognizer. In the new MCE learning method, by contrast, the smoothing parameter is computed automatically from the data, which greatly reduces the training effort while achieving an open-data recognition rate comparable to the best performance of conventional MCE.

[Realization by computer]
The pattern classifier learning apparatus according to the first embodiment described above can be realized by a general-purpose computer and a computer program executed on it. FIG. 10 shows the external appearance of a computer system 550 used in this embodiment, and FIG. 11 is a block diagram of the computer system 550. The computer system 550 shown here is merely an example; other configurations can also be used. The core part of this computer program has the control structure shown in the flowcharts of FIGS. 8 and 9.

Referring to FIG. 10, computer system 550 includes a computer 560 and, all connected to computer 560, a monitor 562, a keyboard 566, a mouse 568, a speaker 558 and a microphone 590. Computer 560 further includes a DVD-ROM (Digital Versatile Disc Read-Only Memory) drive 570 and a semiconductor memory drive 572.

Referring to FIG. 11, computer 560 further includes a bus 586 connected to DVD-ROM drive 570 and semiconductor memory drive 572, and, all connected to bus 586, a CPU 576, a ROM 578 that stores the boot-up program of computer 560, a RAM 580 that provides a work area used by CPU 576 and a storage area for programs executed by CPU 576, a hard disk drive 574 for storing observation data (learning data) and the like, and a network interface 596 that provides a connection to a network 552.

The software that realizes the system of the above embodiment is distributed as object code, scripts or source programs recorded on a computer-readable recording medium such as a DVD-ROM 582 or a semiconductor memory 584. It is supplied to computer 560 through a reader such as DVD-ROM drive 570 or semiconductor memory drive 572, and stored on hard disk drive 574. When it is installed on computer 560 as a source program, it must be compiled with a suitable compiler to generate object code. When CPU 576 executes the program, the object program (or script) is read from hard disk drive 574 and stored in RAM 580. Instructions are fetched from the address specified by a program counter (not shown) and executed. CPU 576 reads the data to be processed from hard disk drive 574 and stores the processing results on hard disk drive 574 as well. Speaker 558 and microphone 590 are not directly related to the present invention; speaker 558 is needed when reproducing speech, and microphone 590 is needed to record utterance data when learning data for speech is collected.

The learning data is collected in advance and includes many pairs of an input pattern and the class to which the pattern belongs. It is stored on hard disk drive 574. The parameter set Λ for class classification computed by the above processing is first stored on hard disk drive 574 or the like, and then copied to the classifier over the network or via a USB memory. The classifier uses this parameter set Λ to classify input patterns into one of the classes.

Since the general operation of computer system 550 is well known, its detailed description is not repeated here.

As for the method of distribution, the software need not be fixed on a storage medium. For example, it may be distributed from another computer connected to the network. Part of the software may be stored on hard disk drive 574 while the remaining part is fetched over the network onto hard disk drive 574 and integrated at execution time.

Typically, a modern computer uses general functions provided by its operating system (OS) and executes them in a controlled manner according to the desired purpose. Therefore, even a program that does not itself include the general functions that can be provided by the OS or a third party, and that specifies only a combination of the execution order of those functions, clearly falls within the scope of the present invention as long as the program as a whole has a control structure that achieves the desired purpose.
[Second Embodiment]
A result equivalent to the result obtained by the algorithm shown in FIG. 8 of the first embodiment can be obtained by a different algorithm. FIG. 12 shows a flowchart of a program that realizes such an algorithm.
Referring to FIG. 12, the program that realizes MCE learning with automatic loss smoothness control according to the second embodiment includes steps 140 and 142 similar to those shown in FIG. 8. In step 142, however, step 600 is repeated for all epochs e = 0, ..., E in place of step 180 of FIG. 8.
Step 600 includes the following processing steps.
(a) Compute the discriminant function values g_j(x_n, Λ^(e)) for all learning samples and all classes (j = 1, ..., J; n = 1, ..., N). (Steps 610, 612 and 614)
(b) For each class y = 1, ..., J, execute step 620, which consists of the following steps i to iii. (Step 616)
i. Compute the misclassification measure values d_y(x_k^y, Λ^(e)) (k = 1, ..., N_y). (Step 152)
ii. With N′ = N_y and x_k = d_y(x_k^y, Λ^(e)) (k = 1, ..., N_y) (step 154), execute the same EM-based Parzen window width determination method as in the first embodiment to obtain the optimal window width h_y. (Step 156)
iii. Evaluate equation (18) to obtain the optimal loss smoothness α_y. (Step 158)
When step 620 is executed for all classes y = 1,..., J, control passes to step 160. The processing after step 160 is the same as that shown in FIG.
Even when the algorithm shown in FIG. 12 is used, the same result as that of the first embodiment shown in FIG. 8 can be obtained.

The embodiments disclosed herein are merely examples, and the present invention is not limited to them. The scope of the present invention is defined by each of the claims, taking into account the detailed description of the invention, and includes all modifications within the meaning and range of equivalency of the wording of the claims.

20, 30 Graph
40 Parzen window group
42, 82 Function of the estimate of the classification error count risk
50, 60, 70, 80 Parzen window

Claims

A learning device for a classifier that classifies an input pattern into one of J classes C_j (j is an integer from 1 to J), comprising:
learning sample storage means for storing N (N is a positive integer) learning samples, each including an input pattern and the class to which the input pattern belongs;
initializing means for initializing the learning parameter Λ of the classifier by a predetermined setting method,
wherein the misclassification measure value d_y(x, Λ), which measures the degree to which the input pattern x of a learning sample belonging to class C_y is misclassified into another class, is defined as follows,

where ψ is a positive real number and g_y(x, Λ) is a discriminant function of arbitrary form, defined for each of the J classes C_y, that determines the degree to which the input pattern x of a learning sample belongs to that class;
Parzen distribution estimating means for, for each of the J classes C_y, obtaining the misclassification measure value of each learning sample belonging to class C_y, and estimating, by cross-validated maximum likelihood estimation, the true probability distribution that generated the samples belonging to that class as a Parzen distribution that consists of Parzen windows of width h_y, each centered on one of the misclassification measure values in the misclassification measure space, and that is a function of the misclassification measure value;
wherein the Parzen distribution estimating means evaluates, in the cross-validated maximum likelihood estimation, the likelihood of the Parzen distribution as a function of the Parzen window width h_y;
optimum loss smoothness calculating means for calculating, for each of the J classes C_y, the optimum value α_y of the loss smoothness for class C_y of the classifier from the Parzen window width h_y that gives the Parzen distribution the maximum likelihood in the cross-validated maximum likelihood estimation by the Parzen distribution estimating means, using the following function,

learning parameter adjusting means for taking the learning samples out of the learning sample set one at a time and sequentially adjusting the learning parameter Λ so as to minimize the classification error risk; and
iteration control means for repeatedly operating the Parzen distribution estimating means, the optimum loss smoothness calculating means and the learning parameter adjusting means until a predetermined end condition is satisfied, and outputting the learning parameter Λ at the time the end condition is satisfied.

The learning device according to claim 1, further comprising shuffling means for shuffling the order of the learning samples before each repetition by the iteration control means and before the adjustment of the learning parameter Λ by the learning parameter adjusting means.

3. The learning device according to claim 1, wherein the predetermined end condition is that the operations of the Parzen distribution estimation means, the optimum loss smoothness calculation means, and the learning parameter adjustment means have been repeated a predetermined number of times.

4. The learning device according to claim 1, wherein, of the operations of the Parzen distribution estimation means, the optimum loss smoothness calculation means, and the learning parameter adjustment means, the iterative control means periodically omits the operations of the Parzen distribution estimation means and the optimum loss smoothness calculation means.
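The iterative procedure of claims 1 to 4 (periodic Parzen width and loss smoothness estimation, shuffling, sample-by-sample adjustment of Λ, and a fixed repetition count) can be sketched as below. Everything concrete here is an assumption made for illustration: a linear discriminant g_j(x, Λ) = w_j · x, the max-rival misclassification measure, a sigmoid loss whose slope is α_y, zero initialization, and a leave-one-out grid search for h_y. Because the function that maps h_y to α_y is not reproduced in this text, it is passed in as the hypothetical callable `alpha_of_h`; all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (input pattern x, index y of its class C_y)

def scores(W: List[List[float]], x: Sequence[float]) -> List[float]:
    # Hypothetical linear discriminant g_j(x, Lambda) = w_j . x, with Lambda = W.
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]

def mis_measure(W: List[List[float]], x: Sequence[float], y: int) -> Tuple[float, int]:
    # Assumed max-rival form d_y = -g_y + max_{j != y} g_j; also returns the rival index.
    g = scores(W, x)
    k = max((j for j in range(len(g)) if j != y), key=lambda j: g[j])
    return g[k] - g[y], k

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loo_width(d: Sequence[float], candidates: Sequence[float]) -> float:
    # Leave-one-out ML choice of a Gaussian Parzen window width over a grid.
    # The constant 1/sqrt(2*pi) is dropped because it does not change the argmax.
    def ll(h: float) -> float:
        total = 0.0
        for i, di in enumerate(d):
            s = sum(math.exp(-0.5 * ((di - dj) / h) ** 2)
                    for j, dj in enumerate(d) if j != i)
            total += math.log(max(s, 1e-300) / ((len(d) - 1) * h))
        return total
    return max(candidates, key=ll)

def train(samples: List[Sample], n_classes: int, dim: int,
          alpha_of_h: Callable[[float], float],
          epochs: int = 20, refresh_every: int = 5, lr: float = 0.1,
          candidates: Sequence[float] = (0.05, 0.1, 0.2, 0.5, 1.0, 2.0),
          seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    W = [[0.0] * dim for _ in range(n_classes)]  # assumed initialization of Lambda
    alpha = [1.0] * n_classes                    # loss smoothness alpha_y per class
    order = list(samples)
    for epoch in range(epochs):                  # end condition: fixed repetition count
        if epoch % refresh_every == 0:           # Parzen/alpha steps skipped in other epochs
            for c in range(n_classes):
                ds = [mis_measure(W, x, y)[0] for x, y in samples if y == c]
                if len(ds) >= 2:
                    alpha[c] = alpha_of_h(loo_width(ds, candidates))
        rng.shuffle(order)                       # shuffle before each pass
        for x, y in order:                       # adjust Lambda one sample at a time
            d, k = mis_measure(W, x, y)
            loss = sigmoid(alpha[y] * d)         # smoothed 0-1 loss for this sample
            grad = alpha[y] * loss * (1.0 - loss)  # derivative of the loss w.r.t. d
            for i, xi in enumerate(x):
                W[y][i] += lr * grad * xi        # raise the true-class score
                W[k][i] -= lr * grad * xi        # lower the best rival's score
    return W
```

With `alpha_of_h=lambda h: 1.0 / h` (an arbitrary stand-in, not the patent's function) and six one-dimensional samples `[1.0, v]` with v in {-2, -1.5, -1} for class 0 and {1, 1.5, 2} for class 1, the trained weights classify every sample correctly. Setting `refresh_every` above 1 mirrors the periodic omission of claim 4.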

5. The learning device according to any one of claims 1 to 4, wherein
the Parzen windows constituting the Parzen distribution are Gaussian functions, and
the Parzen distribution estimation means includes: Parzen estimated distribution construction means for removing one sample from the samples belonging to class C_y and constructing a Parzen estimated distribution from the remaining samples; and
window width calculation means for regarding the expression that defines the Parzen estimated distribution constructed by the Parzen estimated distribution construction means as a Gaussian mixture probability density function of N'-1 Gaussians, each with mixing weight 1/(N'-1) (N' being the number of samples belonging to class C_y), and for calculating, by an EM algorithm, the Parzen window width h_y that maximizes that Gaussian mixture probability density function.
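Claim 5 treats each leave-one-out Parzen estimate as a Gaussian mixture with fixed means (the other samples' misclassification measure values), fixed weights 1/(N'-1), and one shared variance h_y², which EM re-estimates. A minimal sketch under those assumptions follows; the function name and the starting width `h0` are hypothetical. Because the weights are equal and the variance is shared, the weights and Gaussian normalizers cancel in the E-step, and the M-step reduces to a responsibility-weighted mean of squared distances.

```python
import math
from typing import Sequence

def em_parzen_width(d: Sequence[float], h0: float = 1.0,
                    iters: int = 100, tol: float = 1e-10) -> float:
    """EM estimate of the Gaussian Parzen window width h_y for one class.

    Each held-out value d_i is modelled by a mixture of the N'-1 other values,
    each a Gaussian with mean d_j, shared std h, and fixed weight 1/(N'-1).
    Only the shared variance h**2 is re-estimated.
    """
    n = len(d)
    if n < 2:
        raise ValueError("need at least two samples")
    h2 = h0 * h0
    for _ in range(iters):
        acc = 0.0
        for i, di in enumerate(d):
            # E-step: responsibilities of the other windows for d_i (log-sum-exp shift).
            logs = [(j, -0.5 * (di - dj) ** 2 / h2) for j, dj in enumerate(d) if j != i]
            m = max(v for _, v in logs)
            w = [(j, math.exp(v - m)) for j, v in logs]
            z = sum(r for _, r in w)
            # M-step accumulator: responsibility-weighted squared distances.
            acc += sum(r / z * (di - d[j]) ** 2 for j, r in w)
        new_h2 = acc / n  # each d_i's responsibilities sum to 1, so divide by N'
        if new_h2 == 0.0:  # degenerate case: all values identical
            return 0.0
        converged = abs(new_h2 - h2) <= tol * h2
        h2 = new_h2
        if converged:
            break
    return math.sqrt(h2)
```

For two values one unit apart, `em_parzen_width([0.0, 1.0], h0=0.3)` converges to 1.0 after a single update, the same width that maximizes the leave-one-out likelihood directly.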

6. A computer program for classifying an input pattern into one of J classes C_j (j being an integer from 1 to J), which causes a computer having
learning sample storage means for storing N (N being a positive integer) learning samples, each including an input pattern and the class to which the input pattern belongs,
to function as initialization means for initializing the learning parameter Λ of the classifier by a predetermined setting method,
wherein the misclassification measure d_y(x, Λ), representing the degree to which the input pattern x of a training sample belonging to class C_y is misclassified into another class, is defined as follows:

where ψ is a positive real number and g_y(x, Λ) is, for each of the J classes C_y, a discriminant function of arbitrary form that determines the degree to which the input pattern x of a training sample belongs to that class;
the computer program further causing the computer to function as
Parzen distribution estimation means which, for each of the J classes C_y, obtains the misclassification measure value of each training sample belonging to class C_y and estimates, by cross-validation maximum likelihood estimation, the true probability distribution that generated the samples of that class as a Parzen distribution in the misclassification measure space; this Parzen distribution is a function of the misclassification measure value and consists of Parzen windows of width h_y, each centered on one of those misclassification measure values;
wherein, in the cross-validation maximum likelihood estimation, the Parzen distribution estimation means evaluates the likelihood of the Parzen distribution as a function of the Parzen window width h_y;
the computer program further causing the computer to function as
Optimum loss smoothness calculation means which, for each of the J classes C_y, applies the Parzen window width h_y that gives the maximum-likelihood Parzen distribution in the cross-validation maximum likelihood estimation by the Parzen distribution estimation means to the following function

to calculate the optimum value α_y of the loss smoothness of the classifier for class C_y;
Learning parameter adjustment means for taking the learning samples out of the learning sample set one at a time and sequentially adjusting the learning parameter Λ so as to minimize the classification error risk; and
Iterative control means for repeatedly operating the Parzen distribution estimation means, the optimum loss smoothness calculation means, and the learning parameter adjustment means until a predetermined end condition is satisfied, and for outputting the learning parameter Λ obtained when the end condition is satisfied.
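The defining formula for d_y(x, Λ) in claim 6 is not reproduced in this text. For illustration only, the sketch below assumes the form commonly used in the MCE literature (Juang and Katagiri), d_y(x, Λ) = -g_y(x, Λ) + (1/ψ) log[(1/(J-1)) Σ_{j≠y} exp(ψ g_j(x, Λ))], which also has a positive real parameter ψ; this may differ from the patent's actual definition. The function name is hypothetical, and the discriminant scores g_j are supplied precomputed.

```python
import math
from typing import Sequence

def misclassification_measure(g: Sequence[float], y: int, psi: float) -> float:
    """Assumed form d_y = -g_y + (1/psi) * log(mean_{j != y} exp(psi * g_j)).

    g: discriminant scores g_j(x, Lambda) for all J classes (J >= 2)
    y: index of the true class C_y
    psi: positive real number; larger psi makes the best rival dominate
    """
    if psi <= 0 or len(g) < 2:
        raise ValueError("psi must be positive and J must be at least 2")
    rivals = [psi * gj for j, gj in enumerate(g) if j != y]
    m = max(rivals)  # log-sum-exp shift for numerical stability
    lse = m + math.log(sum(math.exp(r - m) for r in rivals))
    return -g[y] + (lse - math.log(len(rivals))) / psi
```

A negative value means the true class outscores the (soft-averaged) rivals. For instance, with scores `[2.0, 1.0, 1.0]` and `y=0` the measure is -1.0 for any ψ, and as ψ grows the rival term approaches max_{j≠y} g_j(x, Λ).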