JP2001272994A

JP2001272994A - Device and method for study, device and method for recognizing pattern, and recording medium

Info

Publication number: JP2001272994A
Application number: JP2000090724A
Authority: JP
Inventors: Yoshinaga Kato; 喜永加藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2000-03-27
Filing date: 2000-03-27
Publication date: 2001-10-05

Abstract

PROBLEM TO BE SOLVED: To provide a learning device that brings a predetermined model (for example, a recognition model) to a locally optimal state stably and at an early stage, independently of the type of model handled and with a reduced amount of calculation. SOLUTION: The device comprises a normalizing means 11 that normalizes the discriminant function for each class by converting it into a function that is at least first-order differentiable and subject to a probabilistic constraint; a loss calculating means 12 that calculates the loss with respect to the correct answer using the discriminant function values normalized by the normalizing means 11; an optimal parameter adjustment amount calculating means 13 that calculates the optimal parameter adjustment amount of a predetermined model 100 so as to minimize the loss calculated by the loss calculating means 12; and an adjusting means 14 that adjusts the parameters of the predetermined model 100 by the optimal parameter adjustment amount calculated by the optimal parameter adjustment amount calculating means 13.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a learning device, a learning method, a pattern recognition device, a pattern recognition method, and a recording medium.

[0002]

[Description of the Related Art] Conventionally, for example, The Journal of the Acoustical Society of Japan (E), vol. 13, no. 6, pp. 341-349, Nov. 1992, discloses a technique for classifying an input pattern into one of U classes, numbered 1 to U. In the technique disclosed in this document, the discriminant function for class u, given an input pattern x and recognition-model parameters Λ, is first written g_u(x, Λ) (u = 1, ..., U). The discriminant functions are assumed to be designed so that g_u takes a large value when the input pattern x belongs to class u. In this case, the misclassification measure d_p(x, Λ) for a pattern belonging to class p is expressed by the following equation (Equation 1).

[0003]

[Equation 1]

[0004] Here, ζ is an exponent that controls the comparison operation (that is, an adjustment quantity). From Equation 1, a negative value of the misclassification measure d_p(x, Λ) indicates a correct classification, and a positive value indicates an incorrect one.
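The body of Equation 1 is not reproduced in this text. As an illustration only, the following Python sketch assumes the form often used in minimum-classification-error training, d_p = -g_p + [(1/(U-1)) Σ_{q≠p} g_q^ζ]^(1/ζ), with positive discriminant values. The function name and the sample scores are hypothetical.

```python
def misclassification_measure(g, p, zeta):
    """Assumed form of Equation 1 (an assumption, not taken from the source):
    d_p = -g[p] + (mean over q != p of g[q] ** zeta) ** (1 / zeta).
    All discriminant values must be positive so the powers are well defined."""
    others = [g[q] for q in range(len(g)) if q != p]
    power_mean = (sum(v ** zeta for v in others) / len(others)) ** (1.0 / zeta)
    return -g[p] + power_mean

scores = [0.9, 0.4, 0.2]  # hypothetical g_1..g_3 for one input pattern

# The true class has the highest score: the measure is negative (correct).
d_correct = misclassification_measure(scores, p=0, zeta=2.0)
# The true class has the lowest score: the measure is positive (incorrect).
d_wrong = misclassification_measure(scores, p=2, zeta=2.0)
# A large zeta makes the competing term approach the best rival score.
d_large_zeta = misclassification_measure(scores, p=0, zeta=50.0)
print(d_correct, d_wrong, d_large_zeta)
```

Under this assumed form, increasing ζ moves the second term toward the largest competing score, which matches the limiting case described for Equation 6.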

[0005] Next, the individual loss E_u(x, Λ) for the u-th class when the m-th pattern is input is defined in a smooth form by the following equation (Equation 2).

[0006]

[Equation 2]

[0007] The loss E_u(x, Λ) in Equation 2 is a sigmoid function: it takes a value very close to 0 when the classification is correct and very close to 1 when it is completely wrong. Using the loss E_u(x, Λ) of Equation 2, the empirical loss L(Λ) can be expressed by the following equation (Equation 3).
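Equation 2 is likewise not reproduced here. A minimal Python sketch of a sigmoid loss follows, assuming the parameterization 1 / (1 + exp(-(h·d + b))), where h sets the slope and b the offset. The exact placement of h and b in the original equation is an assumption, and the function name is hypothetical.

```python
import math

def sigmoid_loss(d, h=1.0, b=0.0):
    """Smooth 0-to-1 loss of a misclassification measure d.
    Assumed parameterization: slope h and offset b (not taken from the source)."""
    return 1.0 / (1.0 + math.exp(-(h * d + b)))

loss_correct = sigmoid_loss(-10.0)    # strongly negative measure: loss near 0
loss_wrong = sigmoid_loss(10.0)       # strongly positive measure: loss near 1
loss_boundary = sigmoid_loss(0.0)     # on the decision boundary: loss 0.5
print(loss_correct, loss_wrong, loss_boundary)
```

Because the useful range of d depends on the scale of the discriminant functions, h and b would have to be retuned whenever the model type changes.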

[0008]

[Equation 3]

[0009] Here, x_m is the m-th of the M input patterns. To minimize the empirical loss L(Λ), the correction amount ΔΛ of the model parameters Λ is calculated by the following equation (Equation 4).

[0010]

[Equation 4]

[0011] Here, η is a small positive learning coefficient. According to the probabilistic descent theorem, iterating the following equation (Equation 5) guarantees that a local minimum of Equation 3 is reached.

[0012]

[Equation 5] Λ(t+1) = Λ(t) + ΔΛ(t)

[0013] In Equation 5, Λ(t) denotes the parameters after the iterative calculation has been applied t times.
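The iteration of Equation 5 can be sketched in Python as follows. Equation 4 is not reproduced in this text, so the sketch assumes the usual gradient form ΔΛ = -η ∂L/∂Λ. The quadratic toy loss and all names are hypothetical stand-ins for the empirical loss.

```python
def grad_toy_loss(lam):
    """Gradient of the hypothetical loss L(lam) = (lam - 3) ** 2."""
    return 2.0 * (lam - 3.0)

def iterate(lam0, eta, steps):
    """Apply Equation 5, lam(t+1) = lam(t) + delta(t), for a fixed number of steps."""
    lam = lam0
    for _ in range(steps):
        delta = -eta * grad_toy_loss(lam)  # assumed form of Equation 4
        lam = lam + delta                  # Equation 5
    return lam

lam_final = iterate(lam0=0.0, eta=0.1, steps=100)
print(lam_final)  # approaches the minimizer of the toy loss, lam = 3
```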

[0014]

[Problems to Be Solved by the Invention] According to the technique disclosed in the above document, Equation 1 is calculated for each of the classes 1 to U, so the amount of calculation is large and the overall computational load is heavy.

[0015] On the other hand, the above document also shows that ζ, which controls the comparison operation, can be set to infinity so that the simplified misclassification measure of the following equation (Equation 6) is used.

[0016]

[Equation 6] d_p(x, Λ) = -g_p(x, Λ) + g_q(x, Λ)

[0017] Here, q is the class with the largest discriminant function value other than p. That is, Equation 6 compares only the class p to which the pattern x belongs and the closest competing class q. Using Equation 6 reduces the amount of calculation, but the operation for minimizing the loss is not reflected in the model parameters of classes other than p and q, so reaching the optimal state takes a long time. Furthermore, the range of values that the discriminant function can take varies depending on the model handled (for example, a hidden Markov model or a neural network model). For this reason, Equation 2 uses the variables h and b to control the shape of the loss function. However, these values are determined empirically, and if they are not set appropriately, the time needed to reach the optimal state becomes long.
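The simplified measure of Equation 6 can be sketched in Python as follows. The function name and the sample scores are hypothetical.

```python
def simplified_measure(g, p):
    """Equation 6: d_p = -g[p] + g[q], where q is the best-scoring class other than p.
    Returns the measure together with the index q of the competing class."""
    q = max((u for u in range(len(g)) if u != p), key=lambda u: g[u])
    return -g[p] + g[q], q

scores = [2.0, 0.5, 1.2, -0.3]  # hypothetical discriminant values for four classes

d_ok, q_ok = simplified_measure(scores, p=0)    # true class 0 wins: d < 0
d_bad, q_bad = simplified_measure(scores, p=1)  # true class 1 loses to class 0: d > 0
print(d_ok, q_ok, d_bad, q_bad)
```

Only the scores of p and q enter the result, so a gradient taken through this measure leaves the parameters of every other class unchanged for that pattern, which is the drawback noted above.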

[0018] An object of the present invention is to provide a learning device, a learning method, a pattern recognition device, a pattern recognition method, and a recording medium that can bring a predetermined model (for example, a recognition model) to a locally optimal state stably and at an early stage, regardless of the model handled and with a reduced amount of calculation.

[0019]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 is a learning device that determines the class to which an input pattern belongs by using discriminant functions and adjusts the parameters of a predetermined model in order to minimize the loss with respect to the correct answer, the learning device comprising: normalizing means for normalizing the discriminant function for each class by converting it into a function that is at least first-order differentiable and has a probabilistic constraint; loss calculating means for calculating the loss with respect to the correct answer using the discriminant function values normalized by the normalizing means; optimal parameter adjustment amount calculating means for calculating the optimal parameter adjustment amount of the predetermined model by minimizing the loss calculated by the loss calculating means; and adjusting means for adjusting the parameters of the predetermined model by the optimal parameter adjustment amount calculated by the optimal parameter adjustment amount calculating means.

[0020] The invention according to claim 2 is the learning device according to claim 1, wherein the loss calculating means obtains the loss for the correct class by using a measure that evaluates the discriminant function values normalized by the normalizing means as a distribution of the likelihood that the input pattern belongs to each class.

[0021] The invention according to claim 3 is the learning device according to claim 1, wherein the loss calculating means obtains the loss for the correct class by applying a measure that evaluates, as a distribution of the likelihood that the input pattern belongs to each class, the discriminant function value for the correct answer and at least one other discriminant function value that has a plausible value, among the discriminant function values normalized by the normalizing means.

[0022] The invention according to claim 4 is the learning device according to claim 2 or claim 3, wherein the distribution of the correct answer used in the measure is a distribution in which likelihood exists only for the class to which the correct answer belongs.

[0023] The invention according to claim 5 is the learning device according to any one of claims 1 to 4, wherein, when the input pattern is a variable-length input pattern, the variable-length input pattern is scored by using, as the predetermined model, a state transition model expressed by parameters that evaluate the feature values of the pattern and parameters that evaluate the duration of partial parameters.

[0024] The invention according to claim 6 is the learning device according to claim 5, wherein the state transition model allows adjustment of the ratio between the score given by the parameters that evaluate the feature values and the score given by the parameters that evaluate the duration.

[0025] The invention according to claim 7 is a learning method that determines the class to which an input pattern belongs by using discriminant functions and adjusts the parameters of a predetermined model in order to minimize the loss with respect to the correct answer, wherein the discriminant function for each class is normalized by converting it into a function that is at least first-order differentiable and has a probabilistic constraint, the loss with respect to the correct answer is then obtained, and the optimal parameters of the predetermined model are obtained by minimizing that loss.

[0026] The invention according to claim 8 is the learning method according to claim 7, wherein the parameters of the predetermined model are obtained by the probabilistic descent method in order to minimize the loss with respect to the correct answer.

[0027] The invention according to claim 9 is the learning method according to claim 7, wherein the loss for the correct class is obtained by using a measure that evaluates the normalized discriminant function values as a distribution of the likelihood that the input pattern belongs to each class.

[0028] The invention according to claim 10 is the learning method according to claim 7, wherein the loss for the correct class is obtained by applying a measure that evaluates, as a distribution of the likelihood that the input pattern belongs to each class, the discriminant function value for the correct answer and at least one other discriminant function value that has a plausible value, among the normalized discriminant function values.

[0029] The invention according to claim 11 is the learning method according to claim 9 or claim 10, wherein the distribution of the correct answer used in the measure is a distribution in which likelihood exists only for the class to which the correct answer belongs.

[0030] The invention according to claim 12 is the learning method according to any one of claims 7 to 11, wherein, when the input pattern is a variable-length input pattern, the variable-length input pattern is scored by using, as the predetermined model, a state transition model expressed by parameters that evaluate the feature values of the pattern and parameters that evaluate the duration of partial parameters.

[0031] The invention according to claim 13 is the learning method according to claim 12, wherein the state transition model allows adjustment of the ratio between the score given by the parameters that evaluate the feature values and the score given by the parameters that evaluate the duration.

[0032] The invention according to claim 14 is the learning method according to claim 12 or claim 13, wherein the mean vector of the parameters that evaluate the feature values is adjusted using the learning method according to any one of claims 7 to 11.

[0033] The invention according to claim 15 is the learning method according to claim 12 or claim 13, wherein the variance of the parameters that evaluate the feature values is adjusted using the learning method according to any one of claims 7 to 11.

[0034] The invention according to claim 16 is the learning method according to claim 12 or claim 13, wherein the mean vector of the parameters that evaluate the duration is adjusted using the learning method according to any one of claims 7 to 11.

[0035] The invention according to claim 17 is the learning method according to claim 12 or claim 13, wherein the variance of the parameters that evaluate the duration is adjusted using the learning method according to any one of claims 7 to 11.

[0036] The invention according to claim 18 is the learning method according to claim 13, wherein the ratio between the score given by the parameters that evaluate the feature values and the score given by the parameters that evaluate the duration is adjusted using the learning method according to any one of claims 7 to 11.

[0037] The invention according to claim 19 is a pattern recognition device that determines the class to which an input pattern belongs by using discriminant functions and adjusts the parameters of a recognition model in order to minimize the loss with respect to the correct answer, the pattern recognition device comprising: normalizing means for normalizing the discriminant function for each class by converting it into a function that is at least first-order differentiable and has a probabilistic constraint; loss calculating means for calculating the loss with respect to the correct answer using the discriminant function values normalized by the normalizing means; optimal parameter adjustment amount calculating means for calculating the optimal parameter adjustment amount of the recognition model by minimizing the loss calculated by the loss calculating means; and adjusting means for adjusting the parameters of the recognition model by the optimal parameter adjustment amount calculated by the optimal parameter adjustment amount calculating means, wherein, when the adjusting means has adjusted the parameters of the recognition model to the optimal parameters, pattern recognition of an input pattern is performed using the parameter-adjusted recognition model.

[0038] The invention according to claim 20 is a pattern recognition method that determines the class to which an input pattern belongs by using discriminant functions and adjusts the parameters of a recognition model in order to minimize the loss with respect to the correct answer, wherein the discriminant function for each class is normalized by converting it into a function that is at least first-order differentiable and has a probabilistic constraint, the loss with respect to the correct answer is then obtained, the optimal parameters of the recognition model are obtained by minimizing that loss, and, when the parameters of the recognition model have been adjusted to the optimal parameters, pattern recognition of an input pattern is performed using the parameter-adjusted recognition model.

[0039] The invention according to claim 21 is a computer-readable recording medium on which is recorded a program for causing a computer to execute processing that normalizes the discriminant function for each class by converting it into a function that is at least first-order differentiable and has a probabilistic constraint, then obtains the loss with respect to the correct answer, and obtains the optimal parameters of a predetermined model by minimizing that loss.

[0040]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a diagram showing a configuration example of a learning device according to the present invention. Referring to FIG. 1, this learning device determines the class to which an input pattern belongs by using discriminant functions and adjusts (learns) the parameters of a predetermined model 100 (for example, the recognition model described above or below) in order to minimize the loss with respect to the correct answer. It comprises a normalizing means 11 that normalizes the discriminant function for each class by converting it into a function that is at least first-order differentiable and has a probabilistic constraint; a loss calculating means 12 that calculates the loss with respect to the correct answer using the discriminant function values normalized by the normalizing means 11; an optimal parameter adjustment amount calculating means 13 that calculates the optimal parameter adjustment amount of the predetermined model 100 so as to minimize the loss calculated by the loss calculating means 12; and an adjusting means 14 that adjusts the parameters of the predetermined model 100 by the optimal parameter adjustment amount calculated by the optimal parameter adjustment amount calculating means 13.

[0041] Here, the loss calculating means 12 obtains the loss for the correct class by using a measure that evaluates the discriminant function values normalized by the normalizing means 11 as a distribution of the likelihood that the input pattern belongs to each class.

[0042] Next, the learning method used in a learning device of this configuration is described. This learning device classifies an input pattern x into one of U classes, numbered 1 to U. First, the discriminant function for class u, given the input pattern x and the parameters Λ of the predetermined model 100, is written g_u(x, Λ) (u = 1, ..., U). The discriminant functions are assumed to be designed so that g_u takes a large value when the input pattern x belongs to class u.

[0043] FIG. 2 is a diagram for explaining such a predetermined model 100. Referring to FIG. 2, let the discriminant functions for class 1, class 2, ..., class U be g_1(x, Λ), g_2(x, Λ), ..., g_U(x, Λ); when a pattern x is input, the discriminant function values of the predetermined model 100 are g_1, g_2, ..., g_U. Here, for example, the predetermined model 100 is assumed to be designed so that, when the input pattern x belongs to class 1, the discriminant function value g_1 is the largest, that is, g_1 > g_2, ..., g_U.

[0044] To learn the parameters Λ of this predetermined model 100 (discriminant functions g_1(x, Λ) to g_U(x, Λ)), the learning device of FIG. 1 first derives, in the normalizing means 11, a normalized function value a_i for the i-th discriminant function value g_i by the following equation (Equation 7).

[0045]

[Equation 7]

[0046] The function value a_i given by Equation 7 is expressed with a first-order differentiable function (an exponential function) and satisfies the probabilistic constraint given by the following equation (Equation 8).

[0047]

[Equation 8]
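Equations 7 and 8 are not reproduced in this text. The description states only that the normalization uses an exponential function and satisfies a probabilistic constraint. One function with both properties is the softmax, a_i = exp(g_i) / Σ_j exp(g_j), whose values lie in [0, 1] and sum to 1. The Python sketch below assumes that form; the function name and the sample scores are hypothetical.

```python
import math

def normalize(g):
    """Softmax normalization of discriminant values (assumed form of Equation 7).
    The maximum is subtracted first for numerical stability; this does not
    change the result."""
    m = max(g)
    exps = [math.exp(v - m) for v in g]
    total = sum(exps)
    return [e / total for e in exps]

a = normalize([2.0, 0.5, 1.2, -0.3])                   # hypothetical scores
a_shifted = normalize([1002.0, 1000.5, 1001.2, 999.7]) # same scores plus 1000
print(a, sum(a))
```

Under this assumption the normalized values depend only on the differences between scores, so models whose raw discriminant values live on different scales still produce outputs in the same range.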

[0048] Once the normalized function values a_i have been derived in the normalizing means 11 in this way, the loss calculating means 12 obtains the loss with respect to the correct answer. Because of their probabilistic form, the function values a_i (i = 1 to U) of Equations 7 and 8 can be interpreted as a distribution of the likelihood of the class to which the input pattern x belongs. The loss calculating means 12 can therefore take a measure that evaluates this distribution as the loss and calculate the loss E(x, Λ) by the following equation (Equation 9).

[0049]

[Equation 9]

[0050] Here, t_u is a function value representing the likelihood that the correct answer belongs to the u-th class and, like a_i, satisfies a probabilistic constraint of the same form as Equation 8.
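Equation 9 is not reproduced in this text, so the exact measure is unknown. As one possible instance of a measure that compares the target distribution t with the normalized values a, the Python sketch below assumes the cross-entropy E = -Σ_u t_u log a_u. The function name, the `eps` guard, and the sample distributions are hypothetical.

```python
import math

def distribution_loss(t, a, eps=1e-12):
    """Cross-entropy between target distribution t and normalized scores a.
    This is an assumed stand-in for Equation 9, not the source's definition.
    eps guards against log(0)."""
    return -sum(tu * math.log(au + eps) for tu, au in zip(t, a) if tu > 0.0)

target = [1.0, 0.0, 0.0]  # likelihood placed only on the correct class

loss_good = distribution_loss(target, [0.7, 0.2, 0.1])  # correct class favored
loss_bad = distribution_loss(target, [0.1, 0.2, 0.7])   # correct class disfavored
print(loss_good, loss_bad)
```

With a target that places all likelihood on the correct class, this assumed loss reduces to -log a_p for the correct class p.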

[0051] Once the loss calculating means 12 has calculated the loss E(x, Λ) according to Equation 9, the optimal parameter adjustment amount calculating means 13 calculates the optimal parameter adjustment amount ΔΛ of the predetermined model 100 so as to minimize the calculated loss E(x, Λ). The adjusting means 14 then adjusts the parameters of the predetermined model 100 by applying Equation 5 with the optimal parameter adjustment amount ΔΛ calculated by the optimal parameter adjustment amount calculating means 13.

[0052] In this way, by normalizing the discriminant function value of each class with a function that has a probabilistic constraint, the learning device of FIG. 1 can adjust the model parameters stably regardless of the model handled and bring the predetermined model 100 (for example, a recognition model) stably to a locally optimal state.

[0053] Furthermore, by obtaining the loss for the correct class with a measure that evaluates, as a distribution, the likelihood of the class to which the input pattern belongs, the learning device of FIG. 1 can bring the predetermined model (for example, a recognition model) to a locally optimal state at an early stage. That is, because the discriminant function values of all classes are treated as a single distribution in the calculation, the correction amounts of the model parameters corresponding to each class can be determined simultaneously, and the predetermined model (for example, a recognition model) can be brought to a locally optimal state quickly.
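The simultaneous correction of every class can be sketched in Python under three assumptions that are not taken from the source: linear discriminants g_u = W[u]·x as a toy model, softmax normalization, and a cross-entropy loss with all target likelihood on the correct class. In that setting the gradient with respect to g_u is a_u - t_u, so one step of Equation 5 touches every class row. All names are hypothetical.

```python
import math

def softmax(g):
    m = max(g)
    exps = [math.exp(v - m) for v in g]
    s = sum(exps)
    return [e / s for e in exps]

def loss_and_step(W, x, p, eta):
    """Return the loss for pattern x with correct class p, and the weights
    after one gradient step on hypothetical linear discriminants g_u = W[u] . x."""
    g = [sum(wi * xi for wi, xi in zip(row, x)) for row in W]
    a = softmax(g)
    loss = -math.log(a[p])  # assumed cross-entropy loss with a one-hot target
    # dE/dg_u = a_u - t_u, so the row of every class receives a correction.
    new_W = [[wi - eta * (a[u] - (1.0 if u == p else 0.0)) * xi
              for wi, xi in zip(W[u], x)]
             for u in range(len(W))]
    return loss, new_W

W0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # three classes, two features
x = [1.0, 2.0]
loss0, W1 = loss_and_step(W0, x, p=0, eta=0.1)
loss1, _ = loss_and_step(W1, x, p=0, eta=0.1)
print(loss0, loss1, W1)
```

In this toy setting the row of the correct class moves toward x while every other row moves away from it in the same step, in contrast to the two-class comparison of Equation 6.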

[0054] Although the learning device of FIG. 1 considers all U classes in Equations 7 to 9, the computational load can be reduced by obtaining the loss for the correct class with a measure that evaluates, as a distribution of the likelihood that the input pattern belongs to each class, only the normalized discriminant function value for the correct answer and at least one other normalized discriminant function value that has a plausible value.

[0055] FIG. 3 is a diagram showing a modification of the learning device of FIG. 1. In the learning device of FIG. 3, a class selecting means 15 is additionally provided upstream of the normalizing means 11 of the learning device of FIG. 1 in order to reduce the computational load. The class selecting means 15 selects a predetermined number U' of classes from the classes 1 to U. Specifically, the classes included in U' can be found by examining the magnitudes of the discriminant function values g_1 to g_U and selecting the correct class together with the classes other than the correct class in descending order of score, so that the total number of selected classes is U'.
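The class selection described above can be sketched in Python as follows. The function name, the zero-based class indices, and the sample scores are hypothetical.

```python
def select_classes(g, correct, n_select):
    """Return n_select class indices: the correct class first, followed by the
    other classes in descending order of discriminant value."""
    rivals = sorted((u for u in range(len(g)) if u != correct),
                    key=lambda u: g[u], reverse=True)
    return [correct] + rivals[:n_select - 1]

scores = [0.1, 2.3, 1.7, -0.5, 1.9]  # hypothetical g values for five classes
selected = select_classes(scores, correct=2, n_select=3)
print(selected)  # [2, 1, 4]
```

Normalization and loss calculation would then run over the selected indices only, so their cost scales with U' rather than U.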

[0056] By restricting the loss calculation to the correct class and a few classes close to it in this way, the amount of calculation required to adjust the parameters of the predetermined model 100 (for example, a recognition model) can be reduced. That is, because the number of classes used in the loss calculation can be controlled, the amount of calculation required for parameter adjustment can be reduced.

[0057] In the learning device and learning method of the present invention described above, as described later, a distribution in which likelihood exists only for the class to which the correct answer belongs can also be used as the distribution of the correct answer in the measure.

[0058] With the learning device and learning method of the present invention described above, when the input pattern x is a variable-length pattern, local optimization of a predetermined model (for example, a recognition model) for variable-length patterns can be performed quickly. Specifically, as described later, when the input pattern is a variable-length input pattern, the variable-length input pattern can be scored by using, as the predetermined model, a state transition model expressed by parameters that evaluate the feature values of the pattern and parameters that evaluate the duration of partial parameters. Here, the state transition model allows adjustment of the ratio between the score given by the parameters that evaluate the feature values and the score given by the parameters that evaluate the duration.

【００５９】また、上述した本発明の学習装置および学
習方法は、入力パターンｘが可変長パターンだけでな
く、入力パターンｘが静的なパターンに対する所定のモ
デル(例えば、認識モデル)の設計にも応用できる。Further, the learning apparatus and the learning method of the present invention described above can be applied not only when the input pattern x is a variable-length pattern but also to the design of a predetermined model (for example, a recognition model) for the case where the input pattern x is a static pattern.

【００６０】また、上述した本発明の学習装置および学
習方法は、音声認識，文字認識などのパターン認識に適
用できる。図４，図５は本発明に係るパターン認識装置
の構成例を示す図である。なお、図４，図５のパターン
認識装置は、それぞれ図１，図３の学習装置を適用した
ものとなっている。従って、図４，図５において、図
１，図３に対応する箇所には同じ符号を付している。The above-described learning apparatus and learning method of the present invention can be applied to pattern recognition such as voice recognition and character recognition. FIG. 4 and FIG. 5 are diagrams showing a configuration example of the pattern recognition device according to the present invention. Note that the pattern recognition devices of FIGS. 4 and 5 apply the learning device of FIGS. 1 and 3, respectively. Therefore, in FIG. 4 and FIG. 5, the same reference numerals are given to portions corresponding to FIG. 1 and FIG.

【００６１】図４，図５のパターン認識装置では、図
１，図３の学習装置における所定のモデル１００とし
て、認識モデル(例えば、後述のように、音声認識に用
いられる継続時間長制御型状態遷移（ＤＳＴ）モデル)
が用いられる。In the pattern recognition apparatuses of FIGS. 4 and 5, a recognition model (for example, as described later, a duration-controlled state transition (DST) model used for speech recognition) is used as the predetermined model 100 of the learning apparatuses of FIGS. 1 and 3.

【００６２】すなわち、図４，図５のパターン認識装置
は、判別関数を用いて入力パターンｘが属するクラスを
求め、正解との損失を最小化するために、認識モデル１
００のパラメータを調整(学習)する学習装置を適用した
ものであって、各クラスに対する判別関数を少なくとも
一次微分可能でかつ確率的制約をもつ関数に変換して正
規化する正規化手段１１と、正規化手段１１によって正
規化された判別関数値を用いて正解に対する損失を算出
する損失算出手段１２と、損失算出手段１２で算出され
た損失を最小にすることで、認識モデル１００の最適な
パラメータ調整量を算出する最適パラメータ調整量算出
手段１３と、最適パラメータ調整量算出手段１３によっ
て算出された最適なパラメータ調整量により認識モデル
１００のパラメータを調整する調整手段１４とを有し、
調整手段１４によって認識モデル１００のパラメータが
最適なパラメータに調整されたときに、パラメータ調整
された認識モデル１００を用いて、未知の入力パターン
ｘに対するパターン認識を行なうように構成されてい
る。That is, the pattern recognition apparatuses of FIGS. 4 and 5 apply a learning apparatus that obtains the class to which the input pattern x belongs by using discriminant functions and adjusts (learns) the parameters of the recognition model 100 in order to minimize the loss with respect to the correct answer. Each apparatus has normalizing means 11 that normalizes the discriminant function for each class by converting it into a function that is at least first-order differentiable and has a probabilistic constraint, loss calculating means 12 that calculates the loss for the correct answer using the discriminant function values normalized by the normalizing means 11, optimal parameter adjustment amount calculating means 13 that calculates the optimal parameter adjustment amount of the recognition model 100 by minimizing the loss calculated by the loss calculating means 12, and adjusting means 14 that adjusts the parameters of the recognition model 100 by the optimal parameter adjustment amount calculated by the optimal parameter adjustment amount calculating means 13. When the parameters of the recognition model 100 have been adjusted to the optimal parameters by the adjusting means 14, pattern recognition of an unknown input pattern x is performed using the parameter-adjusted recognition model 100.

【００６３】また、図４，図５のパターン認識装置で
は、認識モデル１００への入力パターンｘとして所定の
特徴パターンを生成する(例えば、入力音声から音声特
徴パターンを抽出する)特徴抽出手段１７が設けられて
いる。The pattern recognition apparatuses of FIGS. 4 and 5 are also provided with feature extraction means 17 that generates a predetermined feature pattern as the input pattern x to the recognition model 100 (for example, extracts a speech feature pattern from input speech).

【００６４】図４，図５のパターン認識装置では、パラ
メータΛで表現可能なモデル(認識モデル)として、音声
パターンのような可変長パターンを扱うことが可能であ
る。具体的に、例えば、特許第２８０４２６５号に記載
の継続時間長制御型状態遷移（ＤＳＴ）モデルを基本モ
デルとして以下に説明する。The pattern recognition apparatuses of FIGS. 4 and 5 can handle variable-length patterns such as speech patterns with a model (recognition model) expressible by the parameters Λ. Specifically, the following description uses, as the basic model, the duration-controlled state transition (DST) model described in, for example, Japanese Patent No. 2804265.

【００６５】ＤＳＴモデルの個数は、クラスの数Ｕと同
じであり、音声パターンｘをｕ番目のモデルｕで測った
ときの得点を判別関数の値ｇ_u（ｘ，Λ）とし、次式(数
１０)で表わす。The number of DST models is the same as the number of classes U, and the score obtained when the speech pattern x is measured with the u-th model u is taken as the discriminant function value g_u(x, Λ), which is expressed by the following equation (Equation 10).

【００６６】[0066]

【数１０】 (Equation 10)

【００６７】数１０において、ｘは、特徴抽出手段１７
により生成された音声特徴パターンである。特徴抽出手
段１７における特徴抽出には、よく知られたＬＰＣ（線
形予測）分析などを用いることができる。例えば、特徴
抽出条件を標本化周波数：８ｋＨｚ、高域強調：一次差
分、２５６点ハミング窓、移動幅：１６ｍｓ、ＬＰＣ分
析次数：２０とし、１０次元メルケプストラム係数＋対
数パワーの一次差分＋対数パワーという特徴量をフレー
ム単位で抽出する。なお、特徴抽出条件としては、上記
のものに限定されるものではなく、周波数分析などの任
意の抽出手法を用いることができる。In Equation 10, x is the speech feature pattern generated by the feature extraction means 17. Well-known LPC (linear prediction) analysis or the like can be used for the feature extraction in the feature extraction means 17. For example, the feature extraction conditions are set to sampling frequency: 8 kHz, high-frequency emphasis: first-order difference, 256-point Hamming window, frame shift: 16 ms, and LPC analysis order: 20, and a feature amount consisting of 10-dimensional mel-cepstral coefficients + the first-order difference of log power + log power is extracted for each frame. The feature extraction conditions are not limited to those above, and any extraction method such as frequency analysis can be used.
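The framing part of the example conditions above can be sketched as follows. This is only an illustration under stated assumptions: the pre-emphasis coefficient is taken to be 1.0 (a plain first-order difference), the LPC analysis and mel-cepstrum computation are omitted, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000                       # Hz, from the example conditions
FRAME_LEN = 256                          # Hamming window length in samples
FRAME_SHIFT = SAMPLE_RATE * 16 // 1000   # 16 ms shift -> 128 samples


def pre_emphasize(signal):
    """First-order difference y[t] = x[t] - x[t-1] (coefficient 1.0 is an assumption)."""
    return [signal[0]] + [signal[t] - signal[t - 1] for t in range(1, len(signal))]


def hamming(n):
    """Standard n-point Hamming window."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def split_frames(signal):
    """Cut a non-empty signal into overlapping, windowed frames."""
    window = hamming(FRAME_LEN)
    emphasized = pre_emphasize(signal)
    frames = []
    for start in range(0, len(emphasized) - FRAME_LEN + 1, FRAME_SHIFT):
        chunk = emphasized[start:start + FRAME_LEN]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# One second of silence at 8 kHz, used only to show the frame count.
print(len(split_frames([0.0] * SAMPLE_RATE)))
```

Each resulting frame would then be passed to the spectral analysis that produces the per-frame feature vector x_m.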

【００６８】また、数１０において、r(・)は、照合に
より得られた音声パターンとモデルの各状態との対応関
係を表し、r(ｎ)は、第ｎ状態と対応した部分パターン
の終了フレーム番号とする。In Equation 10, r (·) represents the correspondence between the voice pattern obtained by the matching and each state of the model, and r (n) represents the end of the partial pattern corresponding to the n-th state. Frame number.

【００６９】また、数１０において、Ｓ_nは、特徴量を
評価するパラメータ(特徴量に関する第ｎ状態の得点)で
あり、次式（数１１）のように定義される。In Equation 10, S_n is the parameter for evaluating the feature amount (the score of the n-th state with respect to the feature amount) and is defined by the following equation (Equation 11).

【００７０】[0070]

【数１１】 [Equation 11]

【００７１】数１１において、Ｔ_nはバイアス値であ
り、Ｄは各状態における局所距離を表わしている。な
お、Ｄには、次式（数１２）の重み付き２乗ユークリッ
ド距離を用いることとする。In Equation 11, T_n is a bias value, and D represents the local distance in each state. The weighted squared Euclidean distance of the following equation (Equation 12) is used for D.

【００７２】[0072]

【数１２】 (Equation 12)

【００７３】数１２において、μ_n＝（μ_nk），σ² _n＝
（σ² _nk），（ｋ＝１，・・・，Ｋ）は、それぞれ、第
ｎ状態の平均，分散である。なお、ｋはＫ次元ベクトル
の要素番号を表わしている。また、パターンｘはＭフレ
ームからなり、ｘ_m＝（ｘ_mk），（ｍ＝１，・・・，
Ｍ）は、フレーム番号ｍの音声特徴量を表わしている。In Equation 12, μ_n = (μ_nk) and σ²_n = (σ²_nk), (k = 1, ..., K), are the mean and the variance of the n-th state, respectively, where k is the element index of a K-dimensional vector. The pattern x consists of M frames, and x_m = (x_mk), (m = 1, ..., M), represents the speech feature amount of frame number m.
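As an illustration of the local distance D, the following sketch computes a variance-weighted squared Euclidean distance between one frame x_m and the mean of the n-th state. The exact form of Equation 12 is not reproduced in this text, so dividing each squared difference by the variance is an assumption, and the function name is hypothetical.

```python
def local_distance(x_m, mu_n, var_n):
    """Variance-weighted squared Euclidean distance (assumed form of D).

    x_m   : K-dimensional feature vector of frame m
    mu_n  : K-dimensional mean vector of state n
    var_n : K-dimensional variance vector of state n
    """
    return sum((x - mu) ** 2 / var for x, mu, var in zip(x_m, mu_n, var_n))


# Two-dimensional toy example: (1 - 0)^2 / 1 + (2 - 0)^2 / 4 = 2.0
print(local_distance([1.0, 2.0], [0.0, 0.0], [1.0, 4.0]))
```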

【００７４】また、数１０において、Ｒ_nは、継続長を
評価するパラメータ(第ｎ状態の継続時間に関する距離)
であり、次式(数１３)のように表わされる。In Equation 10, R_n is the parameter for evaluating the continuation length (the distance with respect to the duration of the n-th state) and is expressed by the following equation (Equation 13).

【００７５】[0075]

【数１３】 (Equation 13)

【００７６】数１３において、Ｊは、重み付き２乗ユー
クリッド距離であり、次式(数１４)のように表わされ
る。In Expression 13, J is a weighted squared Euclidean distance, and is expressed by the following expression (Expression 14).

【００７７】[0077]

【数１４】 [Equation 14]

【００７８】ここで、τ_n，ζ_n ²は、それぞれ、各状態
の継続時間長の平均，分散である。また、ｌ_nは各状態
に対応付けられた部分パターンの時間長である。Here, τ_n and ζ_n² are the mean and the variance of the duration of each state, respectively, and l_n is the time length of the partial pattern associated with each state.

【００７９】また、数１０において、ｚ_n（０≦ｚ_n≦
１）は、Ｓ_nとＲ_nとから得られた得点の割合を調整する
重みであり、ｚ_nの値が大きいほど、継続時間評価に関
する得点の影響が小さくなる。In Equation 10, z_n (0 ≤ z_n ≤ 1) is a weight that adjusts the ratio between the scores obtained from S_n and R_n; the larger the value of z_n, the smaller the influence of the score for duration evaluation.
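The roles of J and z_n can be sketched as follows. Equations 10, 13, and 14 are not reproduced in this text, so both the form of the duration distance and the convex combination z_n·S_n + (1 − z_n)·R_n are assumptions chosen to match the stated behavior (a larger z_n reduces the influence of the duration score). The function names are hypothetical.

```python
def duration_distance(length, tau, zeta2):
    """Assumed form of J: squared deviation of the segment length l_n from the
    mean duration tau_n, weighted by the duration variance zeta_n^2."""
    return (length - tau) ** 2 / zeta2


def state_score(s_n, r_n, z_n):
    """Blend the feature score S_n and the duration score R_n (assumed convex mix)."""
    if not 0.0 <= z_n <= 1.0:
        raise ValueError("z_n must lie in [0, 1]")
    return z_n * s_n + (1.0 - z_n) * r_n


print(duration_distance(5, 3.0, 2.0))   # (5 - 3)^2 / 2
print(state_score(2.0, 10.0, 0.5))      # equal weighting of both scores
print(state_score(2.0, 10.0, 1.0))      # duration score ignored
```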

【００８０】数１０の判別関数値は、特許第２８０４２
６５号に示されているように、動的計画法に継続時間評
価に関する得点を組み込みながら状態探索を行なうこと
により、求めることができる。The discriminant function value of Equation 10 can be obtained, as shown in Japanese Patent No. 2804265, by performing a state search while incorporating the score for duration evaluation into dynamic programming.
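A state search of this kind can be illustrated by the following sketch. It is not the procedure of Japanese Patent No. 2804265; it is a simplified dynamic programming alignment under assumed conditions: states are visited strictly left to right, each state covers at least one frame, the per-state cost is the assumed mix z_n·(feature distance) + (1 − z_n)·(duration distance), and the total is minimized. All names are hypothetical.

```python
def local_distance(x, mu, var):
    """Variance-weighted squared Euclidean distance between a frame and a state mean."""
    return sum((a - b) ** 2 / v for a, b, v in zip(x, mu, var))


def align(frames, means, variances, tau, zeta2, z):
    """Assign M frames to N left-to-right states.

    Returns (total cost, list of end frame numbers r(n), counted from 1).
    """
    n_frames, n_states = len(frames), len(means)
    inf = float("inf")
    # best[n][m]: lowest cost of covering the first m frames with the first n states.
    best = [[inf] * (n_frames + 1) for _ in range(n_states + 1)]
    back = [[0] * (n_frames + 1) for _ in range(n_states + 1)]
    best[0][0] = 0.0
    for n in range(1, n_states + 1):
        for end in range(n, n_frames + 1):
            for start in range(n - 1, end):
                if best[n - 1][start] == inf:
                    continue
                feature = sum(local_distance(frames[m], means[n - 1], variances[n - 1])
                              for m in range(start, end))
                # The duration term is evaluated inside the search, per candidate segment.
                duration = (end - start - tau[n - 1]) ** 2 / zeta2[n - 1]
                cost = (best[n - 1][start]
                        + z[n - 1] * feature + (1.0 - z[n - 1]) * duration)
                if cost < best[n][end]:
                    best[n][end], back[n][end] = cost, start
    # Trace back the end frame r(n) of every state.
    ends, m = [], n_frames
    for n in range(n_states, 0, -1):
        ends.append(m)
        m = back[n][m]
    return best[n_states][n_frames], ends[::-1]


frames = [[0.0], [0.0], [0.0], [5.0], [5.0]]
print(align(frames, [[0.0], [5.0]], [[1.0], [1.0]], [3.0, 2.0], [1.0, 1.0], [0.5, 0.5]))
```

The sketch requires at least as many frames as states. The returned end frames play the role of r(n), the correspondence between the pattern and the model states.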

【００８１】すなわち、数９で定義した損失Ｅ(ｘ，Λ)
の最小化を行うため、上記認識モデルｇ_u（ｘ，Λ）の
パラメータを最適パラメータ調整量算出手段１３と調整
手段１４とを用いて確率的降下法により調整（学習）す
る。以下では、認識モデルｇ _u（ｘ，Λ）の各パラメー
タにクラスを表すｕを加えて説明することにする。That is, in order to minimize the loss E(x, Λ) defined by Equation 9, the parameters of the recognition model g_u(x, Λ) are adjusted (learned) by the stochastic descent method using the optimal parameter adjustment amount calculating means 13 and the adjusting means 14. In the following description, an index u denoting the class is added to each parameter of the recognition model g_u(x, Λ).
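The general shape of one such adjustment step can be sketched as follows. Equation 9 is not reproduced in this part of the text, so the sketch assumes a softmax as the differentiable, probabilistically constrained normalization and a cross-entropy between the correct-answer distribution t and the normalized values y as the loss. For brevity the discriminant values themselves are updated; in the described apparatus the gradient would instead be propagated to the model parameters Λ. All names and the step size `eps` are hypothetical.

```python
import math


def softmax(g):
    """Normalize scores into values that are positive and sum to 1."""
    peak = max(g)
    exps = [math.exp(v - peak) for v in g]
    total = sum(exps)
    return [v / total for v in exps]


def loss_and_grad(g, t):
    """Cross-entropy between target t and softmax(g), with its gradient w.r.t. g."""
    y = softmax(g)
    loss = -sum(tu * math.log(yu) for tu, yu in zip(t, y) if tu > 0.0)
    # For a target that sums to 1, the gradient is y_u - t_u for every class at once.
    grad = [yu - tu for yu, tu in zip(y, t)]
    return loss, grad


def descent_step(g, t, eps=0.5):
    """One descent step on the scores (stand-in for the parameter update)."""
    _, grad = loss_and_grad(g, t)
    return [gu - eps * du for gu, du in zip(g, grad)]


g = [1.0, 2.0, 0.5]          # hypothetical discriminant values for 3 classes
t = [1.0, 0.0, 0.0]          # correct answer is class 0
before, _ = loss_and_grad(g, t)
after, _ = loss_and_grad(descent_step(g, t), t)
print(before > after)
```

Because the gradient has one component per class, a single step moves the score of the correct class up and the scores of all competing classes down together.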

【００８２】まず、モデルパラメータの中から、特徴量
に関する平均ベクトルを対象とする。この場合、数９の
損失関数Ｅ（ｘ，Λ）からパラメータの修正量は次式
（数１５）のように計算できる。First, an average vector related to a feature amount is targeted from among the model parameters. In this case, the amount of parameter correction can be calculated from the loss function E (x, Λ) of Equation 9 as in the following Equation (Equation 15).

【００８３】[0083]

【数１５】 (Equation 15)

【００８４】数１５において、ｘ_c(n)kはｎ番目の状態
に対応付けられた音声特徴量のｋ次元目の要素を示す。In Equation 15, x_c(n)k denotes the k-th dimension element of the speech feature amount associated with the n-th state.

【００８５】同様に、特徴量に関する分散の修正量は次
式（数１６）のように計算できる。Similarly, the variance correction amount related to the feature amount can be calculated as in the following equation (Equation 16).

【００８６】[0086]

【数１６】 (Equation 16)

【００８７】次に、継続長に関するパラメータについて
も、修正量を計算する。継続長に関する平均ベクトル，
分散ベクトルについて求めると、それぞれ、数１７，数
１８のように計算できる。Next, correction amounts are also calculated for the parameters relating to the continuation length. For the mean vector and the variance vector of the continuation length, they can be calculated as in Equations 17 and 18, respectively.

【００８８】[0088]

【数１７】 [Equation 17]

【００８９】[0089]

【数１８】 (Equation 18)

【００９０】続いて、特徴量と継続長の評価割合を決定
するパラメータについても、修正量を求めると、次式
（数１９）のように計算できる。Subsequently, the parameter for determining the evaluation ratio of the feature amount and the duration can also be calculated as in the following equation (Equation 19) by obtaining the correction amount.

【００９１】[0091]

【数１９】 [Equation 19]

【００９２】数１５乃至数１９が計算できれば、各モデ
ルパラメータは、数５を用いて調整できる。この時、正
解の分布をどのように与えるかが問題となるが、正解が
属するクラスｐにのみ可能性が存在する次式（数２０）
の分布ｔ_uを用いることができる。If Equations 15 to 19 can be calculated, each model parameter can be adjusted using Equation 5. At this point, the question is how to give the distribution of the correct answer; the distribution t_u of the following equation (Equation 20), in which a possibility exists only for the class p to which the correct answer belongs, can be used.

【００９３】[0093]

【数２０】 (Equation 20)
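The equation itself is not reproduced in this text. A distribution that has a possibility only for the correct class p would plausibly take the following form, where the value 1 for the correct class is an assumption made so that the distribution sums to one:

```latex
t_u =
\begin{cases}
1, & u = p \\
0, & u \neq p
\end{cases}
\qquad (u = 1, \ldots, U)
```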

【００９４】数１５乃至数２０から、モデルの各パラメ
ータ（特徴量を評価するパラメータの平均ベクトル，特
徴量を評価するパラメータの分散，継続長を評価するパ
ラメータの平均ベクトル，継続長を評価するパラメータ
の分散，特徴量を評価するパラメータによる得点と継続
長を評価するパラメータによる得点との割合）は、前述
した本発明の学習方法を用いて(すなわち、調整手段１
４を用いて)、次のように調整することができる。From Equations 15 to 20, each parameter of the model (the mean vector of the parameter for evaluating the feature amount, the variance of the parameter for evaluating the feature amount, the mean vector of the parameter for evaluating the continuation length, the variance of the parameter for evaluating the continuation length, and the ratio between the score from the parameter for evaluating the feature amount and the score from the parameter for evaluating the continuation length) can be adjusted as follows using the learning method of the present invention described above (that is, using the adjusting means 14).

【００９５】すなわち、特徴量を評価するパラメータの
平均ベクトルは、次式(数２１)によって調整できる。That is, the average vector of the parameter for evaluating the characteristic amount can be adjusted by the following equation (Equation 21).

【００９６】[0096]

【数２１】 (Equation 21)

【００９７】また、特徴量を評価するパラメータの分散
は、次式(数２２)によって調整できる。The variance of the parameter for evaluating the feature can be adjusted by the following equation (Equation 22).

【００９８】[0098]

【数２２】 (Equation 22)

【００９９】また、継続長を評価するパラメータの平均
ベクトルは、次式(数２３)によって調整できる。The average vector of the parameter for evaluating the continuation length can be adjusted by the following equation (Equation 23).

【０１００】[0100]

【数２３】 (Equation 23)

【０１０１】また、継続長を評価するパラメータの分散
は、次式(数２４)によって調整できる。The variance of the parameter for evaluating the continuation length can be adjusted by the following equation (Equation 24).

【０１０２】[0102]

【数２４】 (Equation 24)

【０１０３】また、特徴量を評価するパラメータによる
得点と継続長を評価するパラメータによる得点との割合
は、次式(数２５)によって調整できる。The ratio between the score obtained by the parameter for evaluating the characteristic amount and the score obtained by the parameter for evaluating the continuation length can be adjusted by the following equation (Equation 25).

【０１０４】[0104]

【数２５】 (Equation 25)

【０１０５】このように、本発明のパターン認識装置お
よびパターン認識方法では、入力パターンｘが可変長パ
ターンであるときに、可変長パターンに対する認識モデ
ルの局所最適化を高速に行ない、かつ高精度に入力パタ
ーンｘに対するパターン認識を行なうことができる。ま
た、可変長パターンだけでなく、静的なパターンに対す
る認識モデルの設計にも応用できる。As described above, in the pattern recognition apparatus and the pattern recognition method of the present invention, when the input pattern x is a variable length pattern, local optimization of a recognition model for the variable length pattern is performed at high speed and with high accuracy. Pattern recognition for the input pattern x can be performed. Further, the present invention can be applied to design of a recognition model for not only a variable-length pattern but also a static pattern.

【０１０６】図６は図１，図３，図４あるいは図５の学
習装置あるいはパターン認識装置のハードウェア構成例
を示す図である。図６を参照すると、図１，図３，図４
あるいは図５の学習装置あるいはパターン認識装置は、
例えばワークステーションやパーソナルコンピュータ等
で実現され、全体を制御するＣＰＵ２１と、ＣＰＵ２１
の制御プログラム等が記憶されているＲＯＭ２２と、Ｃ
ＰＵ２１のワークエリア等として使用されるＲＡＭ２３
と、データを記憶するハードディスク２４と、パターン
入力部(または音声入力部)２５とを有している。FIG. 6 is a diagram showing an example of the hardware configuration of the learning apparatus or the pattern recognition apparatus of FIG. 1, FIG. 3, FIG. 4, or FIG. 5. Referring to FIG. 6, the learning apparatus or the pattern recognition apparatus of FIG. 1, FIG. 3, FIG. 4, or FIG. 5 is realized by, for example, a workstation or a personal computer, and has a CPU 21 that controls the whole apparatus, a ROM 22 that stores control programs and the like for the CPU 21, a RAM 23 used as a work area and the like for the CPU 21, a hard disk 24 that stores data, and a pattern input unit (or voice input unit) 25.

【０１０７】ここで、所定のモデル１００は、例えばＲ
ＡＭ２３などに設定され、ＣＰＵ２１によって読み出さ
れ使用されるようになっている。また、ＣＰＵ２１は、
図１，図３，図４あるいは図５の正規化手段１１，損失
算出手段１２，最適パラメータ調整量算出手段１３，調
整手段１４，クラス選択手段１５，特徴抽出手段１７の
機能を有している。Here, the predetermined model 100 is set in, for example, the RAM 23, and is read out and used by the CPU 21. The CPU 21 also has the functions of the normalizing means 11, the loss calculating means 12, the optimal parameter adjustment amount calculating means 13, the adjusting means 14, the class selecting means 15, and the feature extracting means 17 of FIG. 1, FIG. 3, FIG. 4, or FIG. 5.

【０１０８】なお、ＣＰＵ２１におけるこのような正規
化手段１１，損失算出手段１２，最適パラメータ調整量
算出手段１３，調整手段１４，クラス選択手段１５，特
徴抽出手段１７等としての機能は、例えばソフトウェア
パッケージ(具体的には、ＣＤ−ＲＯＭ等の情報記録媒
体)の形で提供することができ、このため、図６の例で
は、情報記録媒体３０がセットさせるとき、これを駆動
する媒体駆動装置３１が設けられている。The functions of the CPU 21 as the normalizing means 11, the loss calculating means 12, the optimal parameter adjustment amount calculating means 13, the adjusting means 14, the class selecting means 15, the feature extracting means 17, and the like can be provided, for example, in the form of a software package (specifically, an information recording medium such as a CD-ROM). For this reason, the example of FIG. 6 is provided with a medium driving device 31 that drives the information recording medium 30 when it is set.

【０１０９】換言すれば、本発明の学習装置，パターン
認識装置は、イメージスキャナ，ディスプレイ等を備え
た汎用の計算機システムにＣＤ−ＲＯＭ等の情報記録媒
体に記録されたプログラムを読み込ませて、この汎用計
算機システムのマイクロプロセッサに学習処理，パター
ン認識処理を実行させる装置構成においても実施するこ
とが可能である。この場合、本発明の学習処理，パター
ン認識処理を実行するためのプログラム(すなわち、ハ
ードウェアシステムで用いられるプログラム)は、媒体
に記録された状態で提供される。プログラムなどが記録
される情報記録媒体としては、ＣＤ−ＲＯＭに限られる
ものではなく、ＲＯＭ，ＲＡＭ，フレキシブルディス
ク，メモリカード等が用いられても良い。媒体に記録さ
れたプログラムは、ハードウェアシステムに組み込まれ
ている記憶装置、例えばハードディスク装置にインスト
ールされることにより、このプログラムを実行して、本
発明の学習機能，パターン認識機能を実現することがで
きる。In other words, the learning apparatus and the pattern recognition apparatus of the present invention can also be implemented in a configuration in which a general-purpose computer system equipped with an image scanner, a display, and the like reads a program recorded on an information recording medium such as a CD-ROM, and the microprocessor of the general-purpose computer system executes the learning process and the pattern recognition process. In this case, the program for executing the learning process and the pattern recognition process of the present invention (that is, the program used in the hardware system) is provided in a state recorded on a medium. The information recording medium on which the program and the like are recorded is not limited to a CD-ROM; a ROM, a RAM, a flexible disk, a memory card, or the like may be used. The program recorded on the medium is installed in a storage device incorporated in the hardware system, for example a hard disk device, and can then be executed to realize the learning function and the pattern recognition function of the present invention.

【０１１０】また、本発明の学習機能，パターン認識機
能を実現するためのプログラムは、媒体の形で提供され
るのみならず、ネットワークからダウンロードされて提
供されるものであっても良い。The program for realizing the learning function and the pattern recognition function of the present invention may be provided not only in the form of a medium but also downloaded from a network.

【０１１１】[0111]

【発明の効果】以上に説明したように、請求項１乃至請
求項１８，請求項２１記載の発明によれば、判別関数を
用いて入力パターンが属するクラスを求め、正解との損
失を最小化するために、所定のモデルのパラメータを調
整する場合に、各クラスに対する判別関数を少なくとも
一次微分可能でかつ確率的制約をもつ関数に変換して正
規化した上で、正解に対する損失を求め、正解に対する
損失を最小にすることで、所定のモデルの最適なパラメ
ータを求めるので、扱うモデルによらずに、安定してモ
デルパラメータの調整を行ない、所定のモデルの局所最
適状態に到達させることができる。As described above, according to the inventions of claims 1 to 18 and claim 21, when the class to which an input pattern belongs is obtained using discriminant functions and the parameters of a predetermined model are adjusted in order to minimize the loss with respect to the correct answer, the discriminant function for each class is normalized by converting it into a function that is at least first-order differentiable and has a probabilistic constraint, the loss for the correct answer is then obtained, and the optimal parameters of the predetermined model are obtained by minimizing that loss. The model parameters can therefore be adjusted stably regardless of the model being handled, and the predetermined model can be brought to a locally optimal state.

【０１１２】特に、請求項２，請求項４，請求項９，請
求項１１記載の発明によれば、正規化された判別関数値
を入力パターンが属するクラスへの可能性の分布として
評価する尺度を用いて正解クラスに対する損失を求める
ので、早い段階で所定のモデル(例えば認識モデル）の
局所最適状態に到達させることができる。すなわち、各
クラスの判別関数値を一つの分布と捉えて計算すること
により、各クラスに対応するモデルパラメータの修正量
を同時に決定することができて、高速に所定のモデル
（例えば、認識モデル）の局所最適状態に到達させるこ
とができる。In particular, according to the inventions of claims 2, 4, 9, and 11, the loss for the correct class is obtained using a scale that evaluates the normalized discriminant function values as a distribution of the possibility of the class to which the input pattern belongs, so that the locally optimal state of the predetermined model (for example, a recognition model) can be reached at an early stage. That is, by treating the discriminant function values of the classes as a single distribution, the correction amounts of the model parameters corresponding to all classes can be determined simultaneously, and the locally optimal state of the predetermined model (for example, a recognition model) can be reached quickly.

【０１１３】また、請求項３，請求項１０記載の発明に
よれば、損失を正解のクラスと正解に近いいくつかのク
ラスとに限定して求めることにより、所定のモデル(例
えば認識モデル)のパラメータ調整に要する計算量を削
減することができる。すなわち、損失の計算に要するク
ラス数を制御できるので、パラメータ調整に要する計算
量に削減できる。Further, according to the third and tenth aspects of the present invention, the loss is obtained by limiting the loss to the correct class and some classes close to the correct one to obtain a predetermined model (for example, a recognition model). The amount of calculation required for parameter adjustment can be reduced. That is, since the number of classes required for calculating the loss can be controlled, the amount of calculation required for parameter adjustment can be reduced.

【０１１４】また、請求項５，請求項６，請求項１２，
請求項１３記載の発明によれば、入力パターンが可変長
入力パターンである場合に、所定のモデルとして、パタ
ーンの特徴量を評価するパラメータと部分パラメータの
継続長を評価するパラメータとで表現される状態遷移モ
デルを用いて、可変長入力パターンを得点化するので、
入力パターンｘが可変長パターンであるときに、可変長
パターンに対する認識モデルの局所最適化を高速に行な
い、かつ高精度に入力パターンｘに対するパターン認識
を行なうことができる。また、可変長パターンだけでな
く、静的なパターンに対する認識モデルの設計にも応用
できる。Further, according to the inventions of claims 5, 6, 12, and 13, when the input pattern is a variable-length input pattern, the variable-length input pattern is scored using, as the predetermined model, a state transition model expressed by a parameter for evaluating the feature amount of the pattern and a parameter for evaluating the continuation length of a partial parameter. Therefore, when the input pattern x is a variable-length pattern, local optimization of the recognition model for the variable-length pattern can be performed at high speed, and pattern recognition of the input pattern x can be performed with high accuracy. The inventions can also be applied to the design of a recognition model not only for variable-length patterns but also for static patterns.

【０１１５】また、請求項１４乃至請求項１８記載の発
明によれば、調整可能なモデルパターンを同一の損失を
用いて調整することにより、効率的にパラメータの調整
を行なうことができる。Further, according to the inventions of claims 14 to 18, the parameters can be adjusted efficiently by adjusting the adjustable model patterns using the same loss.

【０１１６】また、請求項１９，請求項２０記載の発明
によれば、各クラスに対する判別関数を少なくとも一次
微分可能でかつ確率的制約をもつ関数に変換して正規化
した上で、正解に対する損失を求め、正解に対する損失
を最小にすることで、認識モデルの最適なパラメータを
求め、認識モデルのパラメータが最適なパラメータに調
整されたときに、パラメータ調整された認識モデルを用
いて入力パターンに対するパターン認識を行なうように
なっているので、パターン認識精度を向上させることが
できる。According to the inventions of claims 19 and 20, the discriminant function for each class is normalized by converting it into a function that is at least first-order differentiable and has a probabilistic constraint, the loss for the correct answer is then obtained, and the optimal parameters of the recognition model are obtained by minimizing that loss. When the parameters of the recognition model have been adjusted to the optimal parameters, pattern recognition of the input pattern is performed using the parameter-adjusted recognition model, so that pattern recognition accuracy can be improved.

[Brief description of the drawings]

【図１】本発明に係る学習装置の構成例を示す図であ
る。FIG. 1 is a diagram showing a configuration example of a learning device according to the present invention.

【図２】所定のモデルを説明するための図である。FIG. 2 is a diagram for explaining a predetermined model.

【図３】図１の学習装置の変形例を示す図である。FIG. 3 is a diagram showing a modification of the learning device of FIG. 1;

【図４】本発明に係るパターン認識装置の構成例を示す
図である。FIG. 4 is a diagram showing a configuration example of a pattern recognition device according to the present invention.

【図５】図４のパターン認識装置の変形例を示す図であ
る。FIG. 5 is a diagram showing a modification of the pattern recognition device of FIG. 4;

【図６】図１または図３の学習装置あるいは図４または
図５のパターン認識装置のハードウェア構成例を示す図
である。FIG. 6 is a diagram illustrating an example of a hardware configuration of the learning device of FIG. 1 or 3 or the pattern recognition device of FIG. 4 or 5;

[Explanation of symbols]

１１正規化手段１２損失算出手段１３最適パラメータ調整量算出手段１４調整手段１５クラス選択手段１７特徴抽出手段１００所定のモデル（認識モデル）２１ＣＰＵ２２ＲＯＭ２３ＲＡＭ２４ハードディスク２５パターン入力部または音声入力部３０記録媒体３１媒体駆動装置
11 Normalizing means
12 Loss calculating means
13 Optimal parameter adjustment amount calculating means
14 Adjusting means
15 Class selecting means
17 Feature extracting means
100 Predetermined model (recognition model)
21 CPU
22 ROM
23 RAM
24 Hard disk
25 Pattern input unit or voice input unit
30 Recording medium
31 Medium driving device

───────────────────────────────────────────────────── フロントページの続き (51)Int.Cl.⁷ 識別記号ＦＩテーマコート゛(参考）Ｇ１０Ｌ 15/14 Ｇ１０Ｌ 3/00 ５３５Ｚ 15/16 ５３９Ｆターム(参考） 5B049 DD00 DD03 DD05 EE03 EE08 EE31 FF03 FF04 FF09 5B056 BB01 BB71 BB91 5D015 HH23 JJ00 5L096 BA16 BA17 CA22 EA17 FA34 GA09 GA30 KA04 9A001 GG05 HH21 JJ74 KK09 ──────────────────────────────────────────────────続き Continued on the front page (51) Int.Cl. ⁷ Identification symbol FI Theme coat ゛ (Reference) G10L 15/14 G10L 3/00 535Z 15/16 539 F term (Reference) 5B049 DD00 DD03 DD05 EE03 EE08 EE31 FF03 FF04 FF09 5B056 BB01 BB71 BB91 5D015 HH23 JJ00 5L096 BA16 BA17 CA22 EA17 FA34 GA09 GA30 KA04 9A001 GG05 HH21 JJ74 KK09

Claims

[Claims]

1. A learning apparatus that obtains the class to which an input pattern belongs by using discriminant functions and adjusts the parameters of a predetermined model in order to minimize a loss with respect to a correct answer, the learning apparatus comprising: normalizing means for normalizing the discriminant function for each class by converting it into a function that is at least first-order differentiable and has a probabilistic constraint; loss calculating means for calculating a loss for the correct answer using the discriminant function values normalized by the normalizing means; optimal parameter adjustment amount calculating means for calculating an optimal parameter adjustment amount of the predetermined model by minimizing the loss calculated by the loss calculating means; and adjusting means for adjusting the parameters of the predetermined model by the optimal parameter adjustment amount calculated by the optimal parameter adjustment amount calculating means.

2. The learning apparatus according to claim 1, wherein the loss calculating means obtains the loss for the correct class by using a scale that evaluates the discriminant function values normalized by the normalizing means as a distribution of the possibility of the class to which the input pattern belongs.

3. The learning apparatus according to claim 1, wherein the loss calculating means obtains the loss for the correct class by using, among the discriminant function values normalized by the normalizing means, the discriminant function value for the correct answer and at least one discriminant function value, other than that for the correct answer, having a plausible value, together with a scale that evaluates these values as a distribution of the possibility of the class to which the input pattern belongs.

4. The learning apparatus according to claim 2, wherein the distribution of the correct answer used for the scale is a distribution having a possibility only in the class to which the correct answer belongs.

5. The learning apparatus according to claim 1, wherein, when the input pattern is a variable-length input pattern, the variable-length input pattern is scored using, as the predetermined model, a state transition model expressed by a parameter for evaluating the feature amount of the pattern and a parameter for evaluating the continuation length of a partial parameter.

6. The learning apparatus according to claim 5, wherein the state transition model allows adjustment of the ratio between the score from the parameter for evaluating the feature amount and the score from the parameter for evaluating the continuation length.

7. A learning method for obtaining the class to which an input pattern belongs by using discriminant functions and adjusting the parameters of a predetermined model in order to minimize a loss with respect to a correct answer, wherein the discriminant function for each class is normalized by converting it into a function that is at least first-order differentiable and has a probabilistic constraint, a loss for the correct answer is then obtained, and the optimal parameters of the predetermined model are obtained by minimizing the loss for the correct answer.

8. The learning method according to claim 7, wherein a parameter of a predetermined model is obtained by a stochastic descent method in order to minimize a loss from a correct answer.

9. The learning method according to claim 7, wherein the loss for the correct class is obtained by using a scale that evaluates the normalized discriminant function values as a distribution of the possibility of the class to which the input pattern belongs.

10. The learning method according to claim 7, wherein the loss for the correct class is obtained by using, among the normalized discriminant function values, the discriminant function value for the correct answer and at least one discriminant function value, other than that for the correct answer, having a plausible value, together with a scale that evaluates these values as a distribution of the possibility of the class to which the input pattern belongs.

11. The learning method according to claim 9, wherein the distribution of the correct answer used for the scale is a distribution having a possibility only in the class to which the correct answer belongs.

12. The learning method according to claim 7, wherein, when the input pattern is a variable-length input pattern, the variable-length input pattern is scored using, as the predetermined model, a state transition model expressed by a parameter for evaluating the feature amount of the pattern and a parameter for evaluating the continuation length of a partial parameter.

13. The learning method according to claim 12, wherein the state transition model allows adjustment of the ratio between the score from the parameter for evaluating the feature amount and the score from the parameter for evaluating the continuation length.

14. The learning method according to claim 12, wherein the mean vector of the parameter for evaluating the feature amount is adjusted using the learning method according to any one of claims 7 to 11.

15. The learning method according to claim 12, wherein the variance of the parameter for evaluating the feature amount is adjusted using the learning method according to any one of claims 7 to 11.

16. The learning method according to claim 12, wherein the mean vector of the parameter for evaluating the continuation length is adjusted using the learning method according to claim 7.

17. The learning method according to claim 12, wherein the variance of the parameter for evaluating the continuation length is adjusted using the learning method according to any one of claims 7 to 11.

18. The learning method according to claim 13, wherein the ratio between the score from the parameter for evaluating the feature amount and the score from the parameter for evaluating the continuation length is adjusted using the learning method according to any one of claims 7 to 11.

19. A pattern recognition apparatus that obtains the class to which an input pattern belongs by using discriminant functions and adjusts the parameters of a recognition model in order to minimize a loss with respect to a correct answer, the pattern recognition apparatus comprising: normalizing means for normalizing the discriminant function for each class by converting it into a function that is at least first-order differentiable and has a probabilistic constraint; loss calculating means for calculating a loss for the correct answer using the discriminant function values normalized by the normalizing means; optimal parameter adjustment amount calculating means for calculating an optimal parameter adjustment amount of the recognition model by minimizing the loss calculated by the loss calculating means; and adjusting means for adjusting the parameters of the recognition model by the optimal parameter adjustment amount calculated by the optimal parameter adjustment amount calculating means, wherein, when the parameters of the recognition model have been adjusted to the optimal parameters by the adjusting means, pattern recognition is performed on an input pattern using the parameter-adjusted recognition model.

20. A pattern recognition method for obtaining the class to which an input pattern belongs by using discriminant functions and adjusting the parameters of a recognition model in order to minimize a loss with respect to a correct answer, wherein the discriminant function for each class is normalized by converting it into a function that is at least first-order differentiable and has a probabilistic constraint, a loss for the correct answer is then obtained, the optimal parameters of the recognition model are obtained by minimizing the loss for the correct answer, and, when the parameters of the recognition model have been adjusted to the optimal parameters, pattern recognition is performed on an input pattern using the parameter-adjusted recognition model.

21. A computer-readable recording medium on which is recorded a program for causing a computer to execute a process of normalizing the discriminant function for each class by converting it into a function that is at least first-order differentiable and has a probabilistic constraint, then obtaining a loss for a correct answer, and obtaining the optimal parameters of the model by minimizing the loss for the correct answer.