JP2013016006A

JP2013016006A - Learning apparatus for pattern classification

Info

Publication number: JP2013016006A
Application number: JP2011148142A
Authority: JP
Inventors: Hideyuki Watanabe; 秀行渡辺
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2011-07-04
Filing date: 2011-07-04
Publication date: 2013-01-24
Anticipated expiration: 2031-07-04
Also published as: JP5834287B2

Abstract

PROBLEM TO BE SOLVED: To provide a learning apparatus capable of enhancing the accuracy of a pattern classifier. SOLUTION: A misclassification measure value D(x;Λ) of an input pattern is defined by formula (1). The value of each parameter is adjusted so that the value of a minimization target function is minimized.

Description

The present invention relates to a learning apparatus for pattern classification, in which measurement data is classified into one of a set of predetermined classes, and more particularly to a learning apparatus that uses LGM-MCE (Large Geometric Margin Minimum Classification Error) training and can be expected to yield higher classification accuracy.

[Pattern recognition and learning]
Pattern recognition is an important technology for interfaces between humans and machines. It is used in many settings, such as speaker identification, recognition of utterance content, identification of persons from face images, and character recognition. Put simply, pattern recognition is the task of classifying a pattern of observed values, obtained by observing some physical phenomenon, into one of a plurality of classes. Such tasks are relatively easy for humans but not easy to have a machine perform. A device that performs such tasks is referred to generically as a pattern recognition device. Before a pattern recognition device can perform pattern recognition, a preliminary operation called learning (training) is required, in which training data is processed statistically to obtain the parameters needed for classification.

Learning methods for such pattern classification include the LGM-MCE method disclosed in Non-Patent Document 1 and the MCE method disclosed in Non-Patent Document 2. Both methods employ a classification decision rule based on discriminant functions. They are described below.

Consider a classification task that assigns an input pattern (observed value) x ∈ Χ to one of J classes C_1, …, C_J. Here Χ denotes the entire input pattern space. Like the original MCE method (Non-Patent Document 2), the LGM-MCE method (Non-Patent Document 1) employs the following classification decision rule based on discriminant functions.
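The rule itself is not reproduced in this text. Given the discriminant functions g_j(x;Λ) introduced in the next paragraph, a discriminant-function decision rule of the kind described is conventionally written as follows (the notation C(x) for the assigned class is an assumption made here):

```latex
C(x) = C_k \quad \text{iff} \quad k = \arg\max_{j \in \{1,\dots,J\}} g_j(x;\Lambda)
```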

Here g_j(x;Λ) is the discriminant function for class C_j and represents the degree to which x belongs to class C_j. Λ denotes the learning parameters (adjustable parameters) of the classifier, and g_j(x;Λ) (j = 1, …, J) is assumed to be twice differentiable with respect to x and Λ.

Next, consider the classification decision boundary formed by the above decision rule. Let x be a correctly classified training sample near the boundary, and consider the Euclidean distance r between x and the boundary. This r is precisely the geometric margin; making it large increases the likelihood that unknown patterns that are prone to misclassification are classified correctly. Assuming x ∈ C_y, the result of Non-Patent Document 1 gives the geometric margin (in general, approximately) by the following equation.
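As the text states further on, the geometric margin is approximately the sign-inverted misclassification measure d_y(x;Λ) normalized by the norm of its gradient with respect to x. Written out under that description (a reconstruction, not a copy of the original equation):

```latex
r \approx -\,\frac{d_y(x;\Lambda)}{\left\lVert \nabla_x d_y(x;\Lambda) \right\rVert}
```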

Here d_y(x;Λ) is the misclassification measure defined in the original MCE method by the following equation (ψ > 0).

As ψ → ∞, d_y(x;Λ) becomes the following.
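Neither equation survives in this text. A minimal sketch follows, assuming the log-sum-exp form commonly used in the MCE literature, d_y = -g_y + (1/ψ) log[(1/(J-1)) Σ_{j≠y} exp(ψ g_j)], whose limit for ψ → ∞ is -g_y + max_{j≠y} g_j. The function names are hypothetical, and the exact form in the original may differ.

```python
import math

def misclassification_measure(g, y, psi):
    """Functional-margin measure d_y for discriminant values g and true class index y.

    Assumed form: -g[y] + (1/psi) * log(mean over j != y of exp(psi * g[j])).
    """
    others = [gj for j, gj in enumerate(g) if j != y]
    top = max(others)
    # Factor out the largest competitor so the exponentials cannot overflow.
    mean_exp = sum(math.exp(psi * (gj - top)) for gj in others) / len(others)
    return -g[y] + top + math.log(mean_exp) / psi

def misclassification_measure_limit(g, y):
    """Limit psi -> infinity: only the best-incorrect class contributes."""
    return -g[y] + max(gj for j, gj in enumerate(g) if j != y)

g = [2.0, 1.0, 0.5]  # toy discriminant values, true class index 0
finite = misclassification_measure(g, 0, 50.0)
limit = misclassification_measure_limit(g, 0)
```

For larger ψ the finite-ψ value approaches the limit, which depends only on the correct class and the best-incorrect class.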

Here C_i is the best-incorrect class for x. In other words, the geometric margin is approximately equal to the sign-inverted misclassification measure (called the functional margin) normalized by the norm of its gradient.

The LGM-MCE training method adopts the following D_y(x;Λ), which corresponds to the sign-inverted geometric margin, as a new misclassification measure.
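Consistent with substep (b6) of the learning algorithm described later, where D_y = d_y/||∇_x d_y||, this measure can be written as:

```latex
D_y(x;\Lambda) = \frac{d_y(x;\Lambda)}{\left\lVert \nabla_x d_y(x;\Lambda) \right\rVert}
```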

A positive value of D_y(x;Λ) corresponds to misclassification and a negative value to correct classification. This property is shared with the misclassification measure d_y(x;Λ) of the original MCE method. Hereinafter, the conventional misclassification measure d_y(x;Λ) and the new misclassification measure D_y(x;Λ) are called the functional-margin misclassification measure and the geometric-margin misclassification measure, respectively.

The ideal state of Λ is the one that minimizes the following classification error count risk, defined over infinitely many samples (the classification error probability over all patterns).
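Using the probability density p and the indicator function 1(·) explained in the next paragraph, a risk of this kind is typically written as below. This is a reconstruction under those definitions; the joint-density notation p(x, C_y) is an assumption.

```latex
R(\Lambda) = \sum_{y=1}^{J} \int_{\mathrm{X}} 1\bigl(D_y(x;\Lambda) > 0\bigr)\, p(x, C_y)\, dx
```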

Here p denotes a probability density function, and 1(A) is the indicator function that returns 1 if proposition A is true and 0 if it is false. Thus 1(D_y(x;Λ) > 0) is the classification error count loss, which returns 1 for a misclassification and 0 for a correct classification. This function is shown by graph 22 in FIG. 1. However, the classification error count loss is not differentiable with respect to Λ. Moreover, in practice only a finite number of training samples is available. The LGM-MCE method therefore (like the original MCE method) replaces the classification error count loss with a smooth logistic function that is differentiable with respect to Λ, and minimizes its average over the finite training samples. The logistic function is shown by graph 32 in FIG. 2. The smoothed classification error count loss for x ∈ C_y is defined by the following equation (α_y > 0).
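A minimal sketch of such a loss, assuming the standard logistic form 1/(1 + exp(-α_y D_y)), which is what "logistic function" with smoothness parameter α_y > 0 normally denotes. The function names are hypothetical.

```python
import math

def smoothed_loss(D, alpha):
    """Logistic smoothing of the 0-1 loss: 1 / (1 + exp(-alpha * D)).

    Note: math.exp overflows if -alpha * D exceeds roughly 709.
    """
    return 1.0 / (1.0 + math.exp(-alpha * D))

def smoothed_loss_derivative(D, alpha):
    """Derivative with respect to D, equal to alpha * l * (1 - l)."""
    l = smoothed_loss(D, alpha)
    return alpha * l * (1.0 - l)
```

A larger α makes the curve approach the step function of FIG. 1, while a smaller α spreads the loss over a wider neighborhood of the decision boundary.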

The minimization target function of the LGM-MCE training method is the empirical average loss L(Λ) given by the following equation, where Ω_N = {(x_n, y_n)}_{n=1}^N is a supervised training sample set consisting of N samples.
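Written as the average of the smoothed loss over the N training samples, with ℓ_y denoting the loss that the text writes as _l_y, the empirical average loss takes the following form (a reconstruction from the surrounding definitions):

```latex
L(\Lambda) = \frac{1}{N} \sum_{n=1}^{N} \ell_{y_n}\bigl(D_{y_n}(x_n;\Lambda)\bigr)
```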

Minimizing L(Λ) in the above equation does more than aim directly at minimizing the number of classification errors on the finite training samples. Because, as shown in FIG. 2, the loss _l_y(D_y) (an underscore "_" placed immediately before a character indicates that the character is set in italics in the equations) is a monotonically increasing function of D_y, the minimization also drives D_y far in the negative direction. As a result, the sign-inverted D_y, that is, the geometric margin (r in FIG. 2), increases.

The L(Λ) of the above equation is built from only a finite number of training samples, so it is of course only an approximation of the classification error count risk R(Λ), which also covers all unknown patterns outside the training sample set, and the Λ that minimizes L(Λ) does not in general minimize R(Λ). However, setting α_y (see equation (7)) to a moderate finite value makes the criterion L(Λ) a smooth function and improves the robustness of training to unknown patterns not included in the training sample set. That is, the smoothing makes the loss sensitive not only to each given training sample but also to its neighborhood, which has an effect similar to increasing the number of training samples.

L(Λ) can be minimized not only by batch methods such as steepest descent; an adaptive learning method that adjusts Λ each time one sample (x, y) is drawn from Ω_N is also widely used. In that method, Λ is adjusted according to the following equation, where _l'_y is the derivative of the loss function _l_y and the learning coefficient ε may vary at each iteration step.
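A per-sample gradient step of this kind, with the chain rule applied through the loss derivative mentioned above, would read as follows (ℓ_y stands for the loss written _l_y in the text; this is a reconstruction rather than a copy of equation (9)):

```latex
\Lambda \leftarrow \Lambda - \varepsilon\, \nabla_{\Lambda}\, \ell_y\bigl(D_y(x;\Lambda)\bigr)
        = \Lambda - \varepsilon\, \ell'_y\bigl(D_y(x;\Lambda)\bigr)\, \nabla_{\Lambda} D_y(x;\Lambda)
```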

The above is an outline of the LGM-MCE method.

Non-Patent Document 1: H. Watanabe et al., "Minimum error classification with geometric margin control," in Proc. IEEE ICASSP, pp. 2170-2173, Mar. 2010.
Non-Patent Document 2: B.-H. Juang and S. Katagiri, "Discriminative learning for minimum error classification," IEEE Trans. Signal Processing, vol. 40, no. 12, pp. 3043-3054, Dec. 1992.

In the LGM-MCE training method as conventionally implemented, the boundaries between classes are given by linear functions. That is, the boundary between two classes is a straight line in a two-dimensional space, a plane in a three-dimensional space, and a hyperplane of one dimension less than the space in four or more dimensions.

This is because linear functions are used as the discriminant functions. Using linear functions does have the advantage that the processing required for learning is relatively simple. On the other hand, it means that the LGM-MCE training method can be applied only to a limited range of fields. Furthermore, with linear discriminant functions it is difficult to raise the classification accuracy of the LGM-MCE training method.

Accordingly, an object of the present invention is to provide a learning apparatus for pattern classifiers based on LGM-MCE training that can yield a classifier of higher accuracy.

A classifier learning apparatus according to a first aspect of the present invention is a learning apparatus for a classifier that classifies an input pattern into one of J classes C_j (j is an integer from 1 to J). The apparatus includes training sample storage means for storing a training sample set containing N (N is a positive integer) supervised input patterns, and initialization means for initializing the learning parameter set Λ of the classifier by a predetermined setting method. The geometric-margin misclassification measure value D_y(x;Λ), which measures the degree to which an input pattern x in the training sample set belonging to class C_y is misclassified into another class, is defined as follows.

Here ψ is a positive real number. g_y(x;Λ) is, for each of the J classes C_y, a discriminant function of arbitrary form that is twice differentiable with respect to x and the learning parameter set Λ and that determines the degree to which an input pattern x in the training sample set belongs to that class. d_y(x;Λ) is called the functional-margin misclassification measure.

For the vector λ = [λ_1 … λ_k] of the k variables contained in the learning parameter set Λ, the partial derivative of the misclassification measure value D_y(x;Λ) with respect to λ is given by the following equation using the gradient vector ∇_x d_y of the function d_y(x;Λ), where the superscript T denotes matrix transposition.

The learning apparatus further includes parameter adjustment means for adaptively adjusting the value of each parameter in the learning parameter set Λ, using the partial derivatives of the misclassification measure value D_y(x;Λ), so that the value of a predetermined minimization target function L(Λ) of the learning parameter set Λ is minimized over the training sample set.

Preferably, the discriminant function for class C_j (j = 1, …, J) is given by the following equation, where p_{j,1}, …, p_{j,M} are M prototypes belonging to class C_j and A_{j,1}, …, A_{j,M} are the positive definite matrices corresponding to the respective prototypes.

Here, among the prototypes belonging to class C_j, let m(j) be the index of the prototype whose distance to the input pattern x is smallest, the distance being defined by the following equation.

Then p_j = p_{j,m(j)} and A_j = A_{j,m(j)}. The functional-margin misclassification measure d_y(x;Λ) is given by the following equation.
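The prototype selection described above can be sketched as follows. The equations are not reproduced in this text, so the sketch assumes the distance is the quadratic form (x - p_{j,m})^T A_{j,m} (x - p_{j,m}) and that the discriminant function is its negative for the closest prototype; both are assumptions, and the function names are hypothetical.

```python
def quad_form(x, p, A):
    """(x - p)^T A (x - p) for list vectors x, p and a matrix A given as a list of rows."""
    diff = [xi - pi for xi, pi in zip(x, p)]
    size = len(diff)
    return sum(diff[r] * A[r][c] * diff[c] for r in range(size) for c in range(size))

def discriminant(x, prototypes, matrices):
    """Return (g_j, m(j)): the negative distance to the closest prototype and its index."""
    dists = [quad_form(x, p, A) for p, A in zip(prototypes, matrices)]
    m = min(range(len(dists)), key=dists.__getitem__)
    return -dists[m], m

identity = [[1.0, 0.0], [0.0, 1.0]]
g_j, m_j = discriminant([1.0, 0.0], [[0.0, 0.0], [1.0, 0.5]], [identity, identity])
```

With ψ → ∞ the measure d_y then reduces to -g_y + g_i, the difference of the two quadratic forms for the correct class and the best-incorrect class.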

The geometric-margin misclassification measure D_y(x;Λ) and its partial derivatives are given by the following equations.

More preferably, the positive definite matrices A_{j,1}, …, A_{j,M} are diagonal matrices with positive diagonal elements, as follows.

The parameters a_{j,1}, …, a_{j,D} are included in the learning parameter set Λ, and the partial derivatives of the geometric-margin misclassification measure D_y(x;Λ) with respect to the parameters a_{y,d} and a_{i,d} (d = 1, …, D) are expressed by the following equations.

More preferably, the discriminant function for class C_j (j = 1, …, J) may be given as follows.

Here p_{j,1}, …, p_{j,M} are M prototypes belonging to class C_j, and w_{j,m} (m = 1, …, M) is a weight on the Euclidean distance to the m-th prototype. The learning parameter set Λ and the functional-margin misclassification measure d_y(x;Λ) may be given by the following equations.

Here classes C_y and C_i are the correct class and the best-incorrect class for x, respectively. The geometric-margin misclassification measure D_y(x;Λ) and its partial derivatives are expressed by the following equations.

Still more preferably, the classifier is a three-layer feedforward neural network classifier consisting of an input layer, an intermediate layer, and an output layer. The input layer includes D + 1 units. The intermediate layer includes M + 1 units. The m-th unit (m = 1, …, M) of the intermediate layer applies a nonlinear function f_m to the weighted sum of the outputs of the input layer and outputs the result. The output layer includes J units. The j-th unit (j = 1, …, J) outputs the weighted sum of the outputs of the intermediate layer as the discriminant function g_j of class C_j. The discriminant function for class C_j (j = 1, …, J) is given as follows.

Here w_{m,d} (m = 1, …, M; d = 0, 1, …, D) is the weighting coefficient for the connection from the d-th unit of the input layer to the m-th unit of the intermediate layer, and v_{j,m} (j = 1, …, J; m = 0, 1, …, M) is the weighting coefficient for the connection from the m-th unit of the intermediate layer to the j-th unit of the output layer. The learning parameter set Λ includes the weighting coefficients w_{m,d} (m = 1, …, M; d = 0, 1, …, D) and v_{j,m} (j = 1, …, J; m = 0, 1, …, M). The geometric-margin misclassification measure D_y(x;Λ) and its partial derivatives are given by the following equations.
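The forward computation of such a network can be sketched as follows. The sketch assumes that unit 0 of the input layer and unit 0 of the intermediate layer are bias units that output a constant 1, which is one natural reading of the index ranges d = 0, …, D and m = 0, …, M, and it uses tanh for every f_m purely as an illustration.

```python
import math

def forward(x, W, V, f=math.tanh):
    """Compute g_1..g_J for a D-dimensional input x.

    W: M rows of D+1 weights w[m][d] (d = 0 is the assumed bias input).
    V: J rows of M+1 weights v[j][m] (m = 0 is the assumed bias unit).
    """
    x_ext = [1.0] + list(x)  # input unit 0 outputs a constant 1
    hidden = [1.0] + [f(sum(w * xi for w, xi in zip(row, x_ext))) for row in W]
    return [sum(v * h for v, h in zip(row, hidden)) for row in V]

# With all-zero input weights the hidden units output tanh(0) = 0,
# so each g_j equals its bias weight v[j][0].
g = forward([3.0, 4.0], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
            [[0.5, 1.0, 1.0], [-0.5, 2.0, 2.0]])
```

Because every operation here is smooth in both x and the weights, such a network satisfies the requirement that g_j be twice differentiable with respect to x and Λ, provided f_m is itself twice differentiable.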

A computer program according to a second aspect of the present invention causes a computer to function as each of the means of any of the classifier learning apparatuses described above.

FIG. 1 is a graph of the classification error count loss function in the LGM-MCE training method.
FIG. 2 is a graph of the classification error count loss function smoothed by the logistic sigmoid function in the LGM-MCE training method.
FIG. 3 is a block diagram of a character recognition system using a classifier according to one embodiment of the present invention.
FIG. 4 is a flowchart of a program for training a classifier according to the first embodiment of the present invention.
FIG. 5 is a flowchart of a program for training a classifier according to a modification of the first embodiment of the present invention.
FIG. 6 is a flowchart of a program for training a classifier according to the second embodiment of the present invention.
FIG. 7 is a diagram schematically showing the configuration of the neural network used in the system of the third embodiment of the present invention.
FIG. 8 is a flowchart of a program for training a classifier according to the third embodiment of the present invention.
FIG. 9 is a diagram showing the hardware appearance of a general-purpose computer system that implements the embodiments of the present invention.
FIG. 10 is a block diagram of the internal structure of the computer system shown in FIG. 9.

Embodiments of the present invention are described below. In the following description and the drawings, identical components are given identical reference numbers. Their names and functions are also identical, so detailed descriptions of them are not repeated.

In the following embodiments, nonlinear functions are used as the discriminant functions. With nonlinear functions, the classification boundary becomes, conceptually, a hypersurface in a sample space of four or more dimensions, and higher classification accuracy can be expected.

[Partial derivative calculation]
As noted above, when nonlinear functions are adopted as the discriminant functions, computing the partial derivatives is difficult, particularly in a multidimensional space. The calculation of the partial derivatives of the geometric-margin misclassification measure D_y(x;Λ) for nonlinear discriminant functions is discussed below in general terms.

In order to carry out LGM-MCE training for discriminant functions of general form according to equation (9), let the D-dimensional input pattern belonging to class C_y be
x = [x_1 … x_D]^T
and derive the partial derivative ∇_Λ D_y(x;Λ) of the nonlinear geometric-margin misclassification measure D_y(x;Λ) that appears in equation (9). Here the superscript T denotes matrix transposition.

First, let A = ||∇_x d_y(x;Λ)||². Then 1/||∇_x d_y(x;Λ)|| = A^(-1/2). Partially differentiating D_y(x;Λ) with respect to a single variable λ contained in Λ gives the following.

Further, partially differentiating A with respect to λ gives the following.

Substituting this into equation (10) gives the following.
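The three steps just described can be worked out from D_y = d_y A^(-1/2) and A = (∇_x d_y)^T (∇_x d_y) by the product and chain rules. The following is a reconstruction of that derivation, numbered to follow the reference to equation (10) above:

```latex
\begin{align}
\frac{\partial D_y}{\partial \lambda}
  &= A^{-1/2}\,\frac{\partial d_y}{\partial \lambda}
   - \frac{1}{2}\, d_y\, A^{-3/2}\,\frac{\partial A}{\partial \lambda} \tag{10} \\
\frac{\partial A}{\partial \lambda}
  &= 2\,(\nabla_x d_y)^{T}\,\frac{\partial (\nabla_x d_y)}{\partial \lambda} \tag{11} \\
\frac{\partial D_y}{\partial \lambda}
  &= \frac{1}{\lVert \nabla_x d_y \rVert}\,\frac{\partial d_y}{\partial \lambda}
   - \frac{d_y}{\lVert \nabla_x d_y \rVert^{3}}\,
     (\nabla_x d_y)^{T}\,\frac{\partial (\nabla_x d_y)}{\partial \lambda} \tag{12}
\end{align}
```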

For the vector λ = [λ_1 … λ_k]^T of k variables contained in Λ, equation (11) gives the partial derivative vector as follows.

Note that the left-hand side of equation (14) is the Jacobian matrix of the gradient vector ∇_x d_y.

In the end, from equations (10) and (13), the partial derivative of the geometric-margin misclassification measure D_y(x;Λ) with respect to a vector-valued variable is given as follows.
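The vector form can be checked numerically. The sketch below implements ∇_λ D = ∇_λ d / n - (d / n³) J ∇_x d with n = ||∇_x d|| and J[a][b] = ∂(∂d/∂x_b)/∂λ_a, which follows from the scalar derivation above, and compares it with central finite differences on a hypothetical toy measure. The toy measure, its weights, and all names are assumptions made for illustration.

```python
import math

A_Y, A_I = 2.0, 0.5  # hypothetical positive weights for the two classes

def d_and_grad_x(x, p_y, p_i):
    """Toy measure d = A_Y*||x - p_y||^2 - A_I*||x - p_i||^2 and its gradient in x."""
    d = (A_Y * sum((u - v) ** 2 for u, v in zip(x, p_y))
         - A_I * sum((u - w) ** 2 for u, w in zip(x, p_i)))
    gx = [2 * A_Y * (u - v) - 2 * A_I * (u - w) for u, v, w in zip(x, p_y, p_i)]
    return d, gx

def geometric_measure(x, p_y, p_i):
    d, gx = d_and_grad_x(x, p_y, p_i)
    return d / math.sqrt(sum(g * g for g in gx))

def grad_geometric_measure(d, grad_lam_d, grad_x_d, jac):
    """Gradient of D = d / ||grad_x d|| with respect to the parameter vector lambda."""
    n = math.sqrt(sum(g * g for g in grad_x_d))
    return [gl / n - d / n ** 3 * sum(r * g for r, g in zip(row, grad_x_d))
            for gl, row in zip(grad_lam_d, jac)]

x, p_y, p_i = [1.0, 2.0], [0.5, 1.0], [3.0, 0.0]
d, gx = d_and_grad_x(x, p_y, p_i)
grad_lam_d = [-2 * A_Y * (u - v) for u, v in zip(x, p_y)]  # lambda = p_y
jac = [[-2 * A_Y if a == b else 0.0 for b in range(2)] for a in range(2)]
analytic = grad_geometric_measure(d, grad_lam_d, gx, jac)

h = 1e-6
numeric = []
for a in range(2):
    up, down = list(p_y), list(p_y)
    up[a] += h
    down[a] -= h
    numeric.append((geometric_measure(x, up, p_i) - geometric_measure(x, down, p_i)) / (2 * h))
```

The second term is what distinguishes the geometric-margin gradient from the functional-margin gradient: it accounts for the change of the normalizing norm as the parameters move.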

[Application of the LGM-MCE training method to a quadratic discriminant function classifier]
<< Discriminant functions and partial derivatives of the geometric-margin misclassification measure >>
The discriminant function for class C_j (j = 1, …, J) is given by the following equation, where p_{j,1}, …, p_{j,M} are M prototypes belonging to class C_j and A_{j,1}, …, A_{j,M} are the positive definite matrices corresponding to the respective prototypes.

In equation (16), p_j = p_{j,m(j)} and A_j = A_{j,m(j)}, where m(j) is the index of the prototype, among those belonging to class C_j, that gives the minimum distance to x in the sense of the following equation.

Alternatively, one may set A_{j,1} = … = A_{j,M} = A_j. The specific form of A_{j,m} is not restricted. For example, it may be given as the inverse of the covariance matrix of the training samples belonging to the m-th cluster of class C_j (obtained by, for example, the K-means method), or as the inverse of the diagonal covariance matrix of the training samples belonging to the same cluster. The functional-margin misclassification measure is given by the following equation.

Here C_y and C_i are the correct class and the best-incorrect class for x, respectively. The derivatives of the functional-margin misclassification measure are then as follows.

From equation (15), the geometric-margin misclassification measure D_y(x;Λ) and its partial derivatives are given by the following equations.
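The equations of this section are not reproduced in this text. As an illustration only, assume the discriminant function is the negative quadratic form g_j(x;Λ) = -(x - p_j)^T A_j (x - p_j) with symmetric A_j, and take ψ → ∞. Applying the general result for vector-valued parameters then gives the following for the prototypes p_y and p_i (a worked reconstruction under that assumption, not the original equations):

```latex
\begin{align*}
d_y &= (x - p_y)^{T} A_y (x - p_y) - (x - p_i)^{T} A_i (x - p_i) \\
\nabla_x d_y &= 2 A_y (x - p_y) - 2 A_i (x - p_i) \\
\nabla_{p_y} d_y &= -2 A_y (x - p_y), \qquad \nabla_{p_i} d_y = 2 A_i (x - p_i) \\
\nabla_{p_y} D_y &= -\frac{2 A_y (x - p_y)}{\lVert \nabla_x d_y \rVert}
                   + \frac{2\, d_y\, A_y \nabla_x d_y}{\lVert \nabla_x d_y \rVert^{3}} \\
\nabla_{p_i} D_y &= \frac{2 A_i (x - p_i)}{\lVert \nabla_x d_y \rVert}
                   - \frac{2\, d_y\, A_i \nabla_x d_y}{\lVert \nabla_x d_y \rVert^{3}}
\end{align*}
```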

<< First embodiment: system configuration >>
Referring to FIG. 3, a character recognition system 40, given as an example that uses the classifier according to the present embodiment, includes: a classifier 80 that, once trained on supervised sample data, classifies character image data into character categories; a learning unit 50 that trains the classifier 80 using digital character data labeled with teacher character categories; a touch panel 52 for inputting character images; and a character recognition unit 54 that converts the signal output by the touch panel 52 into a character category 56, using the classifier 80 trained by the learning unit 50, and outputs it.

The learning unit 50 includes: a storage unit 70 that stores character image data with teacher character categories; a character feature extraction module 74 that computes and extracts predetermined character features (position information, statistical moments, edge counts, and so on) from the data in the storage unit 70; a training data storage unit 76 that stores the teacher data output by the character feature extraction module 74 as training sample data; and a learning module 78 that trains the classifier 80 on the training sample data stored in the training data storage unit 76 by the learning method described later. In the following description, character features are represented as vectors. That is, with the character feature vector denoted x and the character category serving as teacher data denoted y, each sample can be written in the form (x, y).

The character recognition unit 54 includes: a binarization processing unit 90 that converts the output signal of the touch panel 52 into a digital signal; a character feature extraction module 94 that extracts character features from the signal output by the binarization processing unit 90, by the same method as the character feature extraction module 74, and outputs them; and a decoder 96 that applies the classifier 80 to the series of character features output by the character feature extraction module 94 and outputs the output character category 56.

<< Learning algorithm >>
The algorithm for parameter learning in the present embodiment is described below with reference to FIG. 4. Referring to FIG. 4, a program that implements this algorithm includes the following steps.

1. Initialization step 120. Here the initial values p_{j,m}^(0) of the prototypes p_{j,m} and the positive definite matrices A_{j,m} are set (j = 1, …, J; m = 1, …, M). An upper limit E on the epoch count e is also set.

2. Step 122, in which processing 124 comprising the following substeps (a) to (c) is executed for epoch counts e = 0, 1, …, E.

(a) Substep 140. Here, if necessary, the optimal values α_y (y = 1, …, J) of the loss smoothness parameters are obtained. The values of α_y may be determined empirically, but it is more preferable to use values determined by the automatic loss-function smoothness setting method described in Japanese Patent Application No. 2010-184334, filed earlier by the present inventor.

(b) Substep 142. Here supervised training samples (x, y) are taken from the training sample set Ω_N, and processing 144 comprising the following substeps 160 to 174 is executed for each training sample.

(b1) Compute the discriminant function values g_j (j = 1, …, J) by the following procedure (substep 160).

(b2) Find the index i of the best-incorrect class for x (substep 162).

(b3) Compute the functional-margin misclassification measure as d_y = -g_y + g_i (substep 164).

(b4) Compute the gradient vector of the functional-margin misclassification measure with respect to the input pattern (step 166).

(b5) Compute the norm ||∇_x d_y|| of the gradient vector (substep 168).

(b6) Compute the geometric-margin misclassification measure D_y(x;Λ) as D_y = d_y/||∇_x d_y|| (substep 170).

(b7) Compute the partial derivatives of the geometric-margin misclassification measure D_y(x;Λ) as follows (substep 172).

(b8) Update the parameters by the following equation (substep 174).

Sub-steps (b1) to (b8) above constitute processing 144. Processing 144 is executed once for each learning sample in turn, and when the adjustment for all samples is complete, new prototypes p_{j,m}^{(e+1)} (j = 1, ..., J; m = 1, ..., M) are obtained.

(c) Step 146. The order of the learning samples in the learning sample set Ω_N is shuffled.

Steps (a) to (c) above constitute processing 124. After processing 124 has been performed for epoch e, the epoch is advanced by one (e = e + 1) and the same processing is repeated.

When the number of epochs reaches the predetermined upper limit E and processing 124 ends, the learned parameter set Λ of the classifier 80 is obtained.
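Sub-steps (b1) to (b6) can be sketched in a few lines. The sketch below is illustrative only: the discriminant function of the first embodiment is not reproduced in this part of the text, so the code assumes the simplest Euclidean nearest-prototype form, g_j(x) = -min_m ||x - p_{j,m}||^2. The function name `geometric_misclassification` and the data layout are hypothetical.

```python
import numpy as np

def geometric_misclassification(x, prototypes, y):
    """prototypes: list over classes, each an (M, D) array; y: index of the correct class."""
    # (b1) assumed discriminant: g_j(x) = -min_m ||x - p_{j,m}||^2
    g = np.empty(len(prototypes))
    nearest = []
    for j, P in enumerate(prototypes):
        sq = np.sum((x - P) ** 2, axis=1)
        m = int(np.argmin(sq))
        nearest.append(P[m])
        g[j] = -sq[m]
    # (b2) best-incorrect class: highest-scoring class other than y
    masked = g.copy()
    masked[y] = -np.inf
    i = int(np.argmax(masked))
    # (b3) functional margin type measure
    d = -g[y] + g[i]
    # (b4) gradient w.r.t. x: 2(x - p_y) - 2(x - p_i) = 2(p_i - p_y)
    grad_x = 2.0 * (nearest[i] - nearest[y])
    # (b5) norm of the gradient (zero only if both prototypes coincide)
    norm = np.linalg.norm(grad_x)
    # (b6) geometric margin type measure
    return d / norm, i

prototypes = [np.array([[1.0, 0.0]]), np.array([[3.0, 0.0]])]
D, i = geometric_misclassification(np.array([0.0, 0.0]), prototypes, y=0)
```

In this toy case the decision boundary is the line x_1 = 2, and the returned D has magnitude 2, the Euclidean distance from the origin to that boundary; the negative sign indicates a correctly classified sample.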

<< Modification of First Embodiment >>
Furthermore, in addition to the prototypes {p_{j,m}}, A_j may be taken to be a diagonal matrix and its positive diagonal components may be learned. Here, setting

the components a_{j,d} (d = 1, ..., D) are adjusted. The partial derivatives of the functional margin type misclassification measure with respect to each a_{y,d} and a_{i,d} (d = 1, ..., D) are as follows.

where p_j = [p_{j,1} ... p_{j,D}]^T. The second-order partial derivatives of d_y are

where δ_{i,j} = 1 if i = j and 0 if i ≠ j. From equation (12), the partial derivatives of the geometric margin type misclassification measure D_y with respect to each a_{y,d} and a_{i,d} (d = 1, ..., D) are given by the following equation.

Regarding the learning coefficient ε in equation (9), the value of ε used to update p_{j,m} and the value used to update a_{j,d} may differ from each other.
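The derivatives of D_y used throughout follow from differentiating the quotient D_y = d_y / ||∇_x d_y||. Equation (12) itself is not reproduced in this part of the text, so the form below is a direct quotient-rule derivation for a single scalar parameter λ_k and may differ in layout from the equation as filed:

```latex
\frac{\partial D_y}{\partial \lambda_k}
  = \frac{1}{\lVert \nabla_x d_y \rVert}\,
    \frac{\partial d_y}{\partial \lambda_k}
  \;-\;
    \frac{d_y}{\lVert \nabla_x d_y \rVert^{3}}\,
    \left(\nabla_x d_y\right)^{T}
    \frac{\partial \left(\nabla_x d_y\right)}{\partial \lambda_k}
```

The second term is why the mixed second-order derivatives of d_y (with respect to x and the parameter) are needed in addition to the first-order ones.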

<< Learning Program of the Modification >>
Referring to FIG. 5, a program that implements the algorithm of the modification of the first embodiment includes the following steps.

1. Initialization step 220. The initial values p_{j,m}^{(0)} (j = 1, ..., J; m = 1, ..., M) of the prototypes p_{j,m} and the initial values a_{j,d}^{(0)} (j = 1, ..., J; d = 1, ..., D) of the variables corresponding to the positive definite matrices A_j are set. The upper limit E of the epoch count e is also set.

2. Step 222, which executes processing 224, comprising the following sub-steps (a) to (c), for each epoch e = 0, 1, ..., E.

(a) Sub-step 240. If necessary, the optimal values α_y (y = 1, ..., J) of the loss smoothness parameter are obtained here.

(b) Sub-step 242. Supervised learning samples (x, y) are taken from the learning sample set Ω_N, and processing 244, comprising the following sub-steps 260 to 274, is executed for each learning sample. The superscript "(e)" on a variable denotes the epoch number.

(b1) The discriminant function values g_j (j = 1, ..., J) are calculated according to the following procedure (sub-step 260).

(b2) The index i of the best-incorrect class for x is obtained (sub-step 262).

(b3) The functional margin type misclassification measure is calculated as d_y = -g_y + g_i (sub-step 264).

(b4) The gradient vector of the functional margin type misclassification measure with respect to the input pattern is calculated (step 266).

(b5) The norm ||∇_x d_y|| of the gradient vector is calculated (sub-step 268).

(b6) The geometric margin type misclassification measure D_y(x; Λ) is calculated as D_y = d_y / ||∇_x d_y|| (sub-step 270).

(b7) The partial derivatives of the geometric margin type misclassification measure D_y(x; Λ) are calculated as follows (sub-step 272).

(b8) The parameters are updated by the following equation (sub-step 274).

Sub-steps (b1) to (b8) above constitute processing 244. Processing 244 is executed once for each learning sample in turn, and when the adjustment for all samples is complete, new prototypes p_{j,m}^{(e+1)} (j = 1, ..., J; m = 1, ..., M) and new matrix parameters a_{j,d}^{(e+1)} (j = 1, ..., J; d = 1, ..., D) are obtained.

(c) Step 246. The order of the learning samples in the learning sample set Ω_N is shuffled.

Steps (a) to (c) above constitute processing 224. After processing 224 has been performed for epoch e, the epoch is advanced by one (e = e + 1) and the same processing is repeated.

When the number of epochs reaches the predetermined upper limit E and processing 224 ends, the learned parameter set Λ of the classifier 80 according to this modification is obtained.
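As a rough illustration of sub-steps (b3) to (b7) for the diagonal-matrix case, the sketch below computes D_y and its partial derivatives with respect to the diagonal components. Two assumptions are made that the text here does not confirm: each class has a single prototype p_j, and the discriminant is g_j(x) = -(x - p_j)^T diag(a_j) (x - p_j) with the components a_{j,d} used directly as the diagonal entries. The function names are hypothetical.

```python
import numpy as np

def measure(x, p, a, y, i):
    """Return D_y, d_y, grad_x d_y and its norm under the assumed diagonal form."""
    ry, ri = x - p[y], x - p[i]
    d = np.sum(a[y] * ry**2) - np.sum(a[i] * ri**2)  # d_y = -g_y + g_i
    grad_x = 2.0 * (a[y] * ry - a[i] * ri)           # gradient w.r.t. the input
    norm = np.linalg.norm(grad_x)
    return d / norm, d, grad_x, norm

def grad_diag(x, p, a, y, i):
    """Partial derivatives of D_y w.r.t. a_{y,d} and a_{i,d} (quotient rule)."""
    _, d, gx, n = measure(x, p, a, y, i)
    ry, ri = x - p[y], x - p[i]
    dD_day = ry**2 / n - d * gx * (2.0 * ry) / n**3
    dD_dai = -(ri**2) / n + d * gx * (2.0 * ri) / n**3
    return dD_day, dD_dai

x = np.array([0.5, -1.0, 2.0])
p = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
a = np.array([[1.0, 2.0, 0.5], [1.5, 1.0, 1.0]])
dD_day, dD_dai = grad_diag(x, p, a, y=0, i=1)
```

Comparing these analytic values against central finite differences of `measure` is a quick way to catch sign errors before plugging the derivatives into the update of sub-step (b8).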

[Second Embodiment: Weighted Prototype Classifier]
As an example using nonlinear discriminant functions, consider a weighted prototype classifier. In this case too, as in the first embodiment, the classifier can be trained as follows.

<< Partial Derivatives of the Discriminant Function and the Geometric Margin Type Classification Measure >>
The discriminant function for class C_j (j = 1, ..., J) is given by the following equation.

Here, p_{j,1}, ..., p_{j,M} are the M prototypes belonging to class C_j, and w_{j,m} (m = 1, ..., M) is the weight applied to the Euclidean distance to the m-th prototype. The learning parameter set Λ and the functional margin type misclassification measure are given below.

where classes C_y and C_i are the correct class and the best-incorrect class of x, respectively. In this case,

(where I is the identity matrix), and from equation (15) the geometric margin type misclassification measure D_y(x; Λ) and its partial derivatives are given by the following equations.

Regarding the learning coefficient ε in equation (9), the value of ε used to update p_{j,m} and the value used to update w_{j,m} may differ from each other.
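To make the weighted-prototype case concrete, the sketch below evaluates D_y for one sample. The discriminant equation is not reproduced in this part of the text, so the code assumes g_j(x) = -Σ_m w_{j,m} ||x - p_{j,m}||^2, that is, a weighted sum of squared Euclidean distances. Under that assumption the Hessian of d_y with respect to x is a multiple of the identity matrix, which is at least consistent with the remark about I above. All function names are hypothetical.

```python
import numpy as np

def g(x, P, w):
    """Assumed discriminant: negative weighted sum of squared distances to the prototypes."""
    return -np.sum(w * np.sum((x - P) ** 2, axis=1))

def grad_x_g(x, P, w):
    """Gradient of the assumed discriminant with respect to the input x."""
    return -2.0 * np.sum(w[:, None] * (x - P), axis=0)

def geometric_measure(x, P, W, y):
    """P: (J, M, D) prototypes, W: (J, M) weights, y: index of the correct class."""
    scores = np.array([g(x, P[j], W[j]) for j in range(len(P))])
    masked = scores.copy()
    masked[y] = -np.inf
    i = int(np.argmax(masked))                       # best-incorrect class
    d = -scores[y] + scores[i]                       # functional margin type measure
    grad = -grad_x_g(x, P[y], W[y]) + grad_x_g(x, P[i], W[i])
    return d / np.linalg.norm(grad), i               # geometric margin type measure

P = np.array([[[0.0], [2.0]], [[4.0], [6.0]]])  # two classes, two 1-D prototypes each
W = np.full((2, 2), 0.5)
D, i = geometric_measure(np.array([1.0]), P, W, y=0)
```

With these values d_y reduces to 8x - 24, so the boundary sits at x = 3 and the sample at x = 1 gets D of magnitude 2 with a negative sign.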

<< Learning Program of the Second Embodiment >>
Referring to FIG. 6, a program that implements the algorithm of the second embodiment includes the following steps.

1. Initialization step 320. The initial values p_{j,m}^{(0)} (j = 1, ..., J; m = 1, ..., M) of the prototypes p_{j,m} and the initial values w_{j,m}^{(0)} (j = 1, ..., J; m = 1, ..., M) of the weight coefficients w_{j,m} are set. The upper limit E of the epoch count e is also set.

2. Step 322. Processing 324, comprising the following sub-steps (a) to (c), is executed for each epoch e = 0, 1, ..., E. The sub-steps constituting processing 324 are as follows.

(a) Sub-step 340. If necessary, the optimal values α_y (y = 1, ..., J) of the loss smoothness parameter are obtained here.

(b) Sub-step 342. Supervised learning samples (x, y) are taken from the learning sample set Ω_N, and processing 344, comprising the following sub-steps 360 to 374, is executed for each learning sample. The superscript "(e)" on a variable denotes the epoch number. Items (b1) to (b8) below are the sub-steps constituting processing 344.

(b1) The discriminant function values g_j (j = 1, ..., J) are calculated (sub-step 360).

(b2) The index i of the best-incorrect class for x is obtained (sub-step 362).

(b3) The functional margin type misclassification measure is calculated as d_y = -g_y + g_i (sub-step 364).

(b4) The gradient vector of the functional margin type misclassification measure with respect to the input pattern is calculated (step 366).

(b5) The norm ||∇_x d_y|| of the gradient vector is calculated (sub-step 368).

(b6) The geometric margin type misclassification measure D_y(x; Λ) is calculated as D_y = d_y / ||∇_x d_y|| (sub-step 370).

(b7) The partial derivatives of the geometric margin type misclassification measure D_y(x; Λ) are calculated as follows (m = 1, ..., M) (sub-step 372).

(b8) The parameters are updated by the following equations (m = 1, ..., M) (sub-step 374).

Sub-steps (b1) to (b8) above constitute processing 344. Processing 344 is executed once for each learning sample in turn, and when the adjustment for all samples is complete, new prototypes p_{j,m}^{(e+1)} (j = 1, ..., J; m = 1, ..., M) and new weight coefficients w_{j,m}^{(e+1)} (j = 1, ..., J; m = 1, ..., M) are obtained.

(c) Step 346. The order of the learning samples in the learning sample set Ω_N is shuffled.

The above constitutes processing 324. After processing 324 has been performed for epoch e, the epoch is advanced by one (e = e + 1) and the same processing is repeated.

When the number of epochs reaches the predetermined upper limit E and processing 324 ends, the learned parameter set Λ of the classifier 80 according to the second embodiment is obtained.
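The learning programs described so far share one outer control flow: for each epoch, visit every sample once, update the parameters after each sample, then shuffle the sample order. A minimal sketch of that flow follows. The update rule of equation (9) is not reproduced in this part of the text, so the per-sample gradient is delegated to `grad_fn`, a hypothetical callback assumed to return the gradient of the smoothed loss for each parameter group; `lr` holds one learning coefficient per group, reflecting the remark that the coefficients may differ.

```python
import numpy as np

def train(samples, labels, params, grad_fn, lr, num_epochs, seed=0):
    """Outer loop only; grad_fn(params, x, y) is a hypothetical per-sample gradient callback."""
    rng = np.random.default_rng(seed)
    order = np.arange(len(samples))
    params = {name: np.array(value, dtype=float) for name, value in params.items()}
    for _ in range(num_epochs):
        for n in order:  # sub-step (b): every sample once, in the current order
            grads = grad_fn(params, samples[n], labels[n])
            for name in params:
                # one learning coefficient per parameter group
                params[name] = params[name] - lr[name] * grads[name]
        rng.shuffle(order)  # step (c): reshuffle before the next epoch
    return params

# Toy check: with gradient 2 * (p - x) and lr 0.25, each step moves p halfway toward x.
toy = train(
    samples=np.ones((4, 1)), labels=np.zeros(4, dtype=int),
    params={"p": [4.0]}, grad_fn=lambda prm, x, y: {"p": 2.0 * (prm["p"] - x)},
    lr={"p": 0.25}, num_epochs=5,
)
```

Keeping the loop independent of the classifier means the same skeleton can serve the prototype, weighted-prototype and neural-network variants by swapping `grad_fn`.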

[Third Embodiment: Three-Layer Feedforward Neural Network Classifier]
As another example using nonlinear discriminant functions, consider a classifier consisting of a feedforward neural network. In this case too, as in the first and second embodiments, the classifier can be trained as follows.

<< Partial Derivatives of the Discriminant Function and the Geometric Margin Type Classification Measure >>
FIG. 7 shows a three-layer feedforward neural network classifier 400. The neural network classifier 400 includes an input layer 412, an intermediate layer 414, and an output layer 416.

The input layer 412 includes D + 1 units (d = 0, 1, ..., D). The 0th unit receives the value 1, and each of the other units receives one component of the D-dimensional input pattern x; every unit outputs its input unchanged.

The intermediate layer 414 includes M + 1 units (m = 0, 1, ..., M). The 0th unit receives no input and outputs the value 1. Each of the other units m (m = 1, ..., M) applies a nonlinear function f_m to the weighted sum of the outputs of the input layer 412 and outputs the result.

The output layer 416 includes J units (j = 1, ..., J). The j-th unit (j = 1, ..., J) outputs the weighted sum of the outputs of the intermediate layer 414 as the discriminant function g_j of class C_j.

Note that in implementations based on MCE learning, no nonlinear function is applied at the output layer units.

The discriminant function for class C_j (j = 1, ..., J) is given by the following equation.

Here, w_{m,d} (m = 1, ..., M; d = 0, 1, ..., D) are the weight coefficients of the connections from the input layer 412 to the intermediate layer 414, and v_{j,m} (j = 1, ..., J; m = 0, 1, ..., M) are the weight coefficients of the connections from the intermediate layer 414 to the output layer 416. The learning parameter set Λ is the set of all these weight coefficients. The nonlinear function f_m may be any differentiable function; here the sigmoid function of the following equation is adopted.
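The layer description above translates directly into a forward pass. The sketch below is one possible reading: the array shapes and the function names are assumptions, and since equation (51) is not reproduced here, the standard logistic function 1 / (1 + exp(-u)) is used for f_m (the filed equation may include additional constants).

```python
import numpy as np

def sigmoid(u):
    """Standard logistic function, assumed here for f_m."""
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, W, V):
    """W: (M, D+1) weights w_{m,d}; V: (J, M+1) weights v_{j,m}; x: (D,) input pattern."""
    x_ext = np.concatenate(([1.0], x))        # input unit 0 carries the constant 1
    u = W @ x_ext                             # weighted sums into hidden units 1..M
    h = np.concatenate(([1.0], sigmoid(u)))   # hidden unit 0 outputs the constant 1
    g = V @ h                                 # linear output units: g_j for each class
    return g, u, h

W = np.zeros((2, 3))                          # D = 2 inputs, M = 2 hidden units
V = np.array([[1.0, 2.0, 2.0], [0.0, 0.0, 0.0]])
g, u, h = forward(np.array([0.3, -0.7]), W, V)
```

With all input-to-hidden weights at zero, each hidden unit outputs 0.5, so the first discriminant is 1 + 2(0.5) + 2(0.5) = 3 and the second is 0.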

The partial derivatives of the discriminant function of the k-th class C_k (k = 1, ..., J) with respect to each weight coefficient are given by the following equations.

The partial derivative of the discriminant function of the k-th class (k = 1, ..., J) with respect to the p-th dimension of the input is given by the following equation.

Further, the second-order derivatives of the discriminant function of the k-th class (k = 1, ..., J) are given by the following equations.

For the sigmoid nonlinear function of equation (51), the first and second derivatives are given by the following equations, respectively.

Based on the above, the first- and second-order partial derivatives of the functional margin type misclassification measure d_y = -g_y + g_i are given by the following equations, where classes C_y and C_i are the correct class and the best-incorrect class of the vector x, respectively.

From equation (12), the geometric margin type misclassification measure D_y(x; Λ) and its partial derivatives are then given by the following equations.

Regarding the learning coefficient ε in equation (9), the value of ε used to update v_{j,m} and the value used to update w_{m,d} may differ from each other.
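The input gradient of the network and the resulting geometric margin type measure can be sketched as follows. As before, the standard logistic function is assumed for f_m, for which f' = f(1 - f) and f'' = f(1 - f)(1 - 2f); the shapes and function names are hypothetical, and the filed equations may be arranged differently.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def discriminants(x, W, V):
    """W: (M, D+1), V: (J, M+1); column 0 of each holds the bias weights."""
    h = sigmoid(W[:, 0] + W[:, 1:] @ x)
    return V[:, 0] + V[:, 1:] @ h, h

def input_gradients(x, W, V):
    """Row k is the gradient of g_k w.r.t. x: sum_m v_{k,m} f'(u_m) w_{m,p}."""
    _, h = discriminants(x, W, V)
    fprime = h * (1.0 - h)                    # derivative of the logistic function
    return (V[:, 1:] * fprime) @ W[:, 1:]     # shape (J, D)

def geometric_measure(x, W, V, y):
    g, _ = discriminants(x, W, V)
    masked = g.copy()
    masked[y] = -np.inf
    i = int(np.argmax(masked))                # best-incorrect class
    d = -g[y] + g[i]                          # functional margin type measure
    G = input_gradients(x, W, V)
    grad_d = -G[y] + G[i]                     # gradient of d_y w.r.t. x
    return d / np.linalg.norm(grad_d), i

rng = np.random.default_rng(0)
W, V, x = rng.normal(size=(3, 5)), rng.normal(size=(4, 4)), rng.normal(size=4)
D, i = geometric_measure(x, W, V, y=0)
```

Only the hidden-layer nonlinearity contributes f' here; the output units are linear, as noted above for MCE-based implementations.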

<< Learning Program of the Third Embodiment >>
Referring to FIG. 8, a program that implements the algorithm of the third embodiment includes the following steps.

1. Initialization step 440. The initial values v_{j,m}^{(0)} (j = 1, ..., J; m = 0, ..., M) and w_{m,d}^{(0)} (m = 1, ..., M; d = 0, ..., D) of the weight coefficients v_{j,m} and w_{m,d} are set. The upper limit E of the epoch count e is also set.

2. Step 442. Processing 444, comprising the following sub-steps, is executed for each epoch e = 0, 1, ..., E. The sub-steps constituting processing 444 are as follows.

(a) Sub-step 460. If necessary, the optimal values α_y (y = 1, ..., J) of the loss smoothness parameter are obtained here.

(b) Sub-step 462. Supervised learning samples (x, y) are taken from the learning sample set Ω_N, and processing 464, comprising the following sub-steps 480 to 502, is executed for each learning sample. The superscript "(e)" on a variable denotes the epoch number. Items (b1) to (b12) below are the sub-steps constituting processing 464.

(b1) The input values to the intermediate layer 414 are calculated (m = 1, ..., M) (step 480).

(b2) The discriminant function values g_j are calculated (j = 1, ..., J) (sub-step 482).

where f_m is given by, for example, equation (51).

(b3) The index i of the best-incorrect class for x is obtained (sub-step 484).

(b4) The partial derivatives of the discriminant functions are calculated (k = y, i) (sub-step 486).

where f'_m is given by, for example, equation (61).

(b5) The partial derivatives of the discriminant functions with respect to the input are calculated (k = y, i) (step 488).

(b6) The second-order partial derivatives of the discriminant functions are calculated (k = y, i) (step 490).

where f''_m is given by, for example, equation (62).

(b7) The functional margin type misclassification measure is calculated as d_y = -g_y + g_i (step 492).

(b8) The first- and second-order partial derivatives of the functional margin type misclassification measure are calculated by the following equations (sub-step 494).

(b9) The norm ||∇_x d_y|| of the gradient vector is calculated (sub-step 496).

(b10) The geometric margin type misclassification measure D_y(x; Λ) is calculated as D_y = d_y / ||∇_x d_y|| (sub-step 498).

(b11) The partial derivatives of the geometric margin type misclassification measure D_y(x; Λ) are calculated as follows (sub-step 500).

(b12) The parameters are updated by the following equations (sub-step 502).

Sub-steps (b1) to (b12) above constitute processing 464. Processing 464 is executed once for each learning sample in turn, and when the adjustment for all samples is complete, new weight coefficients v_{j,m}^{(e+1)} (j = 1, ..., J; m = 0, ..., M) and w_{m,d}^{(e+1)} (m = 1, ..., M; d = 0, ..., D) are obtained.

(c) Step 466. The order of the learning samples in the learning sample set Ω_N is shuffled.

The above constitutes processing 444. After processing 444 has been performed for epoch e, the epoch is advanced by one (e = e + 1) and the same processing is repeated.

When the number of epochs reaches the predetermined upper limit E and processing 444 ends, the learned parameter set Λ of the classifier 80 according to the third embodiment is obtained.

[Experimental Results]
To verify the usefulness of a nonlinear discriminant function classifier trained by the LGM-MCE learning method according to the above embodiments, a quadratic discriminant function was adopted as the nonlinear discriminant function, and the conventional MCE learning method using the functional margin type misclassification measure (the FM-MCE method) was compared with the LGM-MCE method using the geometric margin type misclassification measure.

The quadratic discriminant function for class C_j is given by equation (16). In this experiment, A_{j,1} = ... = A_{j,M} = A_j, and the matrix A_j was fixed to the inverse of the diagonal covariance matrix of the learning samples belonging to class C_j. In both the FM-MCE and LGM-MCE methods, the parameters learned are the prototypes p_{j,m} (j = 1, ..., J; m = 1, ..., M), which were initialized by the K-means method. In addition, experiments were conducted in which both methods were applied to a classifier of the Euclidean distance discriminant function type. The Euclidean distance discriminant function is a representative example of a (piecewise) linear discriminant function and is obtained by fixing A_j to the identity matrix in equation (16).

The experiments used the Letter Recognition data set provided by the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). This is a 26-class, 16-dimensional data set consisting of 20,000 samples whose features were extracted from font character images of the English alphabet. Because the number of samples is large, the holdout method, which splits the data set, was used for evaluation. Of the 20,000 samples, 1,000 were used as the learning sample set and the remaining 19,000 as the unknown sample set.
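A holdout split of this size can be sketched as below. The text does not say how the 1,000 learning samples were selected, so drawing them uniformly at random with a fixed seed is an assumption, and `holdout_split` is a hypothetical helper name.

```python
import numpy as np

def holdout_split(num_samples, num_train, seed=0):
    """Return disjoint index arrays for the learning set and the unknown (test) set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)   # random order of all sample indices
    return perm[:num_train], perm[num_train:]

train_idx, test_idx = holdout_split(20000, 1000)
```

The two index arrays can then be used to slice the feature matrix and the label vector consistently.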

Table 1 shows the classification rate (%) on unknown samples for each discriminant function and each learning method. The figures in parentheses are the classification rates on the learning samples. With one prototype, under both the FM-MCE and LGM-MCE methods the quadratic discriminant function classifier achieves a higher classification rate than the Euclidean distance classifier, and for the quadratic discriminant function classifier the LGM-MCE method gives a higher classification rate than the FM-MCE method. With three prototypes, the same holds for the classification rate on unknown samples: under both methods the quadratic discriminant function classifier outperforms the Euclidean distance classifier, and for the quadratic discriminant function classifier the LGM-MCE method gives a higher classification rate than the FM-MCE method. These results confirm that the LGM-MCE learning method gives higher classification accuracy than the conventional FM-MCE method not only for linear discriminant functions but also for quadratic discriminant functions.

[Implementation by Computer]
The classifier learning apparatus according to the embodiments described above can be realized by a general-purpose computer and a computer program executed on it. FIG. 9 shows the external appearance of a computer system 550 used in the above embodiments, and FIG. 10 is a block diagram of the computer system 550. The computer system 550 shown here is merely an example, and other configurations can also be used. The core part of the computer program has the control structure shown in the flowcharts of FIGS. 4 to 6 and 8.

Referring to FIG. 9, the computer system 550 includes a computer 560 and, all connected to the computer 560, a monitor 562, a keyboard 566, a mouse 568, a speaker 558, and a microphone 590. The computer 560 further includes a DVD-ROM (Digital Versatile Disk Read-Only Memory) drive 570 and a semiconductor memory port 572.

Referring to FIG. 10, the computer 560 further includes a bus 586 connected to the DVD-ROM drive 570 and the semiconductor memory port 572 and, all connected to the bus 586, a CPU 576, a ROM 578 that stores a boot-up program for the computer 560, a RAM 580 that provides a work area used by the CPU 576 and a storage area for programs executed by the CPU 576, a hard disk drive 574 for storing learning data and the like, and a network interface 596 that provides a connection to a network 552.

The software that realizes the system of the above embodiments is distributed in the form of object code, a script, or a source program recorded on a computer-readable recording medium such as a DVD-ROM 582 or a semiconductor memory 584, is provided to the computer 560 through a reading device such as the DVD-ROM drive 570 or the semiconductor memory port 572, and is stored in the hard disk drive 574. When the software is introduced into the computer 560 as a source program, it must be compiled with a suitable compiler to generate object code. When the CPU 576 executes the program, the object program (or script) is read from the hard disk drive 574 and stored in the RAM 580. Instructions are fetched from the address specified by a program counter (not shown) and executed. The CPU 576 reads the data to be processed from the hard disk drive 574 or the RAM 580 and stores the processing results in the hard disk drive 574 or the RAM 580. The speaker 558 and the microphone 590 are not used in the present embodiments, but the present invention is also applicable to speech recognition and speaker recognition, in which case they are needed when preparing learning data for speech. The microphone 590 also functions as an input device for the speech to be processed when speech recognition is performed on this computer.

学習用データは、予め収集され、入力パターンとそのパターンの属するクラスとの組を多数含む。これは、図３に示すシステムでは、各文字画像から抽出した文字特徴量と、その文字画像に対応する文字カテゴリである。学習用データは、ハードディスクドライブ５７４（図３に示す記憶部７０及び学習データ記憶部７６）に記憶される。上記した処理により算出される分類用のパラメータセットΛ等は、一旦はハードディスクドライブ５７４等に記憶され、さらにネットワークを介して、又はＵＳＢメモリを介して、分類器にコピーされる。分類器はこれらクラス分類用のパラメータセットΛを用いて入力パターンをいずれかのクラスに分類する。 The learning data is collected in advance and includes a large number of sets of input patterns and classes to which the patterns belong. In the system shown in FIG. 3, this is a character feature amount extracted from each character image and a character category corresponding to the character image. The learning data is stored in the hard disk drive 574 (the storage unit 70 and the learning data storage unit 76 shown in FIG. 3). The classification parameter set Λ or the like calculated by the above processing is temporarily stored in the hard disk drive 574 or the like and further copied to the classifier via the network or the USB memory. The classifier classifies the input pattern into any class using the parameter set Λ for class classification.

コンピュータシステム５５０の一般的動作は周知であるので、詳細な説明はここでは繰返さない。 Since the general operation of computer system 550 is well known, detailed description will not be repeated here.

ソフトウェアの流通の方法に関して、ソフトウェアは必ずしも記憶媒体上に固定されたものでなくても良い。例えば、ソフトウェアはネットワークに接続された別のコンピュータから分配されても良い。ソフトウェアの一部がハードディスクドライブ５７４に記憶され、ソフトウェアの残りの部分をネットワーク上からハードディスクドライブ５７４に取込み、実行の際に統合する様にしても良い。 Regarding the software distribution method, the software does not necessarily have to be fixed on a storage medium. For example, the software may be distributed from another computer connected to the network. A part of the software may be stored in the hard disk drive 574, and the remaining part of the software may be taken into the hard disk drive 574 from the network and integrated at the time of execution.

典型的には、現代のコンピュータはコンピュータのオペレーティングシステム（ＯＳ）によって提供される一般的な機能、及びスクリプト言語を使用する場合にはスクリプト言語の実行系により提供される一般的又は特定の目的に沿った機能を利用し、所望の目的にしたがって制御された態様で機能を達成する。したがって、ＯＳ又はサードパーティから提供されうる一般的な機能を含まず、そのように他のシステムにより提供される機能の実行順序の組合せを指定したプログラムであっても、そのプログラムが全体として所望の目的を達成する制御構造を有する限り、そのプログラムがこの発明の範囲に包含されることは明らかである。 Typically, modern computers serve the general functionality provided by the computer's operating system (OS), and the general or specific purpose provided by the scripting language's execution system if a scripting language is used. The functions along the line are utilized to achieve the functions in a controlled manner according to the desired purpose. Therefore, even if the program does not include a general function that can be provided from the OS or a third party and thus specifies a combination of execution order of functions provided by other systems, the program is desired as a whole. Obviously, the program is included in the scope of the present invention as long as it has a control structure that achieves the object.

The embodiment disclosed herein is merely illustrative, and the present invention is not limited to the embodiment described above. The scope of the present invention is indicated by each claim, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of the claims.

40 character recognition system
50 learning unit
54 character recognition unit
56 output character category
76 learning data storage unit
78 learning module
80 classifier
96 decoder
400 neural network classifier
412 input layer
414 intermediate layer
416 output layer
550 computer system
560 computer

Claims

A learning device for a classifier that classifies an input pattern into one of J classes C_j (j is an integer from 1 to J), the learning device comprising:
learning sample storage means for storing a learning sample set including N (N is a positive integer) supervised input patterns; and
initializing means for initializing the learning parameter set Λ of the classifier by a predetermined setting method,
wherein a geometric-margin-type misclassification measure D_y(x; Λ), which measures the degree to which an input pattern x in the learning sample set belonging to class C_y is misclassified into another class, is defined by the following equation,
where ψ is a positive real number, g_y(x; Λ) is a discriminant function that, for each of the J classes C_y, indicates the degree to which an input pattern x in the learning sample set belongs to that class and that is twice differentiable with respect to x and the learning parameter set Λ, and d_y(x; Λ) is called the functional-margin-type misclassification measure,
wherein, for a vector λ = [λ_1 ... λ_k] formed by arranging k variables included in the learning parameter set Λ, the partial derivative of the misclassification measure D_y(x; Λ) with respect to the vector λ is expressed by the following equation using the function d_y(x; Λ) and its gradient vector ∇_x d_y, where the superscript T denotes the matrix transpose,
the learning device further comprising parameter adjustment means for adaptively adjusting the value of each parameter included in the learning parameter set Λ, using the partial derivative of the misclassification measure D_y(x; Λ), so that the value of a predetermined minimization target function L(Λ) of the learning parameter set Λ is minimized over the learning sample set.
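The equations that this claim refers to are not reproduced in this text. As a rough illustration only, the sketch below assumes the form commonly used for LGM-MCE: d_y(x; Λ) = -g_y(x; Λ) + (1/ψ) log[(1/(J-1)) Σ_{j≠y} exp(ψ g_j(x; Λ))] and D_y(x; Λ) = d_y(x; Λ) / ||∇_x d_y(x; Λ)||. The function names are hypothetical, and the gradient with respect to x is approximated by central differences rather than by the analytic expressions of the claim.

```python
import math

def functional_margin(g, y, psi):
    """Assumed form: d_y = -g_y + (1/psi) * log(mean over j != y of exp(psi * g_j))."""
    others = [gj for j, gj in enumerate(g) if j != y]
    m = max(others)  # shift by the maximum so exp() cannot overflow
    mean_exp = sum(math.exp(psi * (gj - m)) for gj in others) / len(others)
    return -g[y] + m + math.log(mean_exp) / psi

def geometric_margin_measure(discriminants, x, y, psi, eps=1e-6):
    """Assumed form: D_y = d_y / ||grad_x d_y||, gradient by central differences."""
    def d(point):
        return functional_margin([g(point) for g in discriminants], y, psi)

    grad = []
    for k in range(len(x)):
        hi, lo = list(x), list(x)
        hi[k] += eps
        lo[k] -= eps
        grad.append((d(hi) - d(lo)) / (2 * eps))
    norm = math.sqrt(sum(c * c for c in grad))
    # norm is assumed to be non-zero at the evaluated point
    return d(x) / norm
```

With two linear discriminants g_0(x) = x_0 and g_1(x) = x_1, a class-0 pattern at (3, 1) yields a value close to -√2, that is, the negative of its Euclidean distance to the boundary x_0 = x_1. Under this assumed sign convention a negative value indicates a correct classification.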

The classifier learning device according to claim 1, wherein the discriminant function for class C_j (j = 1, ..., J) is given by the following equation, with the M prototypes belonging to class C_j denoted p_{j,1}, ..., p_{j,M} and the positive definite matrices corresponding to the respective prototypes denoted A_{j,1}, ..., A_{j,M},
where p_j and A_j are determined from the distance measure, defined by the following equation, between the input pattern x and the prototypes belonging to class C_j,
such that p_j = p_{j,m(j)} and A_j = A_{j,m(j)}, m(j) being the index of the prototype that minimizes this distance measure,
the functional-margin-type misclassification measure d_y(x; Λ) is given by the following equation,
and the geometric-margin-type misclassification measure D_y(x; Λ) and its partial derivatives are given by the following equations.
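The prototype-based discriminant described in this claim can be pictured with the short sketch below. It is only an assumption-laden illustration: the distance is taken to be the quadratic form (x - p)^T A (x - p), the discriminant is taken to be the negative of the smallest such distance within a class, and the function names are hypothetical. The claim's actual equations are not reproduced in this text.

```python
def quad_distance(x, p, A):
    """Quadratic-form distance (x - p)^T A (x - p) for a square matrix A."""
    diff = [xi - pi for xi, pi in zip(x, p)]
    n = len(diff)
    return sum(diff[r] * A[r][c] * diff[c] for r in range(n) for c in range(n))

def discriminant(x, prototypes, matrices):
    """Assumed form: g_j(x) = -min over m of the distance to prototype p_{j,m}."""
    return -min(quad_distance(x, p, A) for p, A in zip(prototypes, matrices))

def classify(x, classes):
    """classes[j] = (prototypes of C_j, matrices of C_j); returns the best index j."""
    scores = [discriminant(x, protos, mats) for protos, mats in classes]
    return max(range(len(scores)), key=scores.__getitem__)
```

Only the closest prototype of each class contributes to that class's score, which matches the selection of m(j) described in the claim.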

The classifier learning device according to claim 2, wherein each of the positive definite matrices A_{j,1}, ..., A_{j,M} is a diagonal matrix with positive diagonal components as follows,
the parameters a_{j,1}, ..., a_{j,D} are included in the learning parameter set Λ, and the partial derivatives of the geometric-margin-type misclassification measure D_y(x; Λ) with respect to the parameters a_{y,d} and a_{i,d} (d = 1, ..., D) are expressed by the following equations.

The classifier learning device according to claim 1, wherein the discriminant function for class C_j (j = 1, ..., J) is given by the following equation,
where p_{j,1}, ..., p_{j,M} are M prototypes belonging to class C_j, and w_{j,m} (m = 1, ..., M) is a weight for the Euclidean distance to the m-th prototype,
the learning parameter set Λ and the functional-margin-type misclassification measure d_y(x; Λ) are given by the following equations,
where classes C_y and C_i are, respectively, the correct class and the best incorrect class for x,
and the geometric-margin-type misclassification measure and its partial derivatives are expressed by the following equations.

The classifier learning device according to claim 1, wherein the classifier is a three-layer feedforward neural network classifier including an input layer, an intermediate layer, and an output layer,
the input layer includes D + 1 units,
the intermediate layer includes M + 1 units, the m-th unit (m = 1, ..., M) of which outputs the result of applying a nonlinear function f_m to a weighted sum of the outputs from the input layer,
the output layer includes J units,
the j-th unit (j = 1, ..., J) of the output layer outputs a weighted sum of the outputs from the intermediate layer as the discriminant function g_j of class C_j,
the discriminant function for class C_j (j = 1, ..., J) is given by the following equation,
where w_{m,d} (m = 1, ..., M; d = 0, 1, ..., D) is the weighting coefficient for the connection from the d-th unit of the input layer to the m-th unit of the intermediate layer, and v_{j,m} (j = 1, ..., J; m = 0, 1, ..., M) is the weighting coefficient for the connection from the m-th unit of the intermediate layer to the j-th unit of the output layer,
the learning parameter set Λ includes the weighting coefficients w_{m,d} (m = 1, ..., M; d = 0, 1, ..., D) and v_{j,m} (j = 1, ..., J; m = 0, 1, ..., M),
and the geometric-margin-type misclassification measure and its partial derivatives are given by the following equations.
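The network structure in this claim can be illustrated by a forward pass such as the one below. This is a sketch under stated assumptions: the 0-th unit of the input layer and of the intermediate layer is treated as a constant bias unit with output 1, the same logistic sigmoid is used for every f_m, and the function names are hypothetical. The claim itself only requires each f_m to be nonlinear.

```python
import math

def sigmoid(a):
    """Logistic sigmoid, used here as an assumed choice for every f_m."""
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w, v, f=sigmoid):
    """Return [g_1(x), ..., g_J(x)] for a three-layer feedforward network.

    x: D input values (the bias unit d = 0 is implicit, with output 1)
    w: M rows, each [w_{m,0}, w_{m,1}, ..., w_{m,D}]
    v: J rows, each [v_{j,0}, v_{j,1}, ..., v_{j,M}]
    """
    hidden = [f(row[0] + sum(wd * xd for wd, xd in zip(row[1:], x))) for row in w]
    return [vj[0] + sum(vm * hm for vm, hm in zip(vj[1:], hidden)) for vj in v]
```

Each output g_j is a plain weighted sum of the intermediate-layer outputs, so the classifier would pick the class whose g_j is largest.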

A computer program that causes a computer to function as each of the means of the classifier learning device according to any one of claims 1 to 5.