JP3364234B2

JP3364234B2 - Pattern recognition device

Info

Publication number: JP3364234B2
Application number: JP22568491A
Authority: JP
Inventors: 今井 亨 (Toru Imai); 安藤 彰男 (Akio Ando)
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 1991-09-05
Filing date: 1991-09-05
Publication date: 2003-01-08
Anticipated expiration: 2018-01-08
Also published as: JPH0566791A

Description

[Detailed Description of the Invention]

[0001] [Field of Industrial Application] The present invention relates to a pattern recognition device using HMMs (Hidden Markov Models) that can improve the pattern recognition rate.

[0002] [Summary of the Invention] The present invention relates to a pattern recognition device that performs time-series pattern recognition with HMMs. From all the given training data, it obtains a function representing the possibility of recognition errors. It then calculates the HMM parameters that minimize this function. This enables HMM-based pattern recognition with higher recognition ability than before.

[0003] [Prior Art] Conventional HMM parameter learning methods include, for example, the following three.

[0004] (1) Baum-Welch algorithm. Also called the forward-backward algorithm, this is a widely used technique. Given the HMM parameters (the state transition probabilities $A$ and the output probabilities $B$), it estimates new values of $A$ and $B$ by maximum likelihood estimation. The new values maximize the likelihood $L(O \mid A, B)$ that the HMM outputs the training data $O$. Repeating this estimation makes the HMM parameters converge to a local optimum (see, e.g., L. R. Rabiner and B. H. Juang, "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, Jan. 1986, pp. 4-16).

[0005] (2) Error-correction training. This method first estimates the HMM parameters with the Baum-Welch algorithm and then recognizes the training data. The HMM parameters are corrected whenever a sample is misrecognized, or is recognized correctly but with only a small likelihood difference. Specifically, the frequencies of the symbols (labels) in the training sample are added to the output probabilities $B$ of the true category. They are subtracted from those of the wrong category, or of the category with the small likelihood difference. This improves recognition performance on the training data (see, e.g., L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer, "A New Algorithm for the Estimation of Hidden Markov Model Parameters," Proceedings of the 1988 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 493-496).

[0006] (3) HMM training by annealing. This method uses annealing to decrease an energy function. The energy is the negative log likelihood computed only from the training data of the model's own category. The state transition probabilities $A$ and the output probabilities $B$ are selected alternately, and random numbers drawn from a Gaussian distribution are added to them (see, e.g., Douglas B. Paul, "Training of HMM Recognizers by Simulated Annealing," Proceedings of the 1985 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 13-16).

[0007] [Problems to Be Solved by the Invention] However, the Baum-Welch algorithm of (1) and the annealing-based training of (3) ignore the training data of other categories. They are therefore not necessarily learning methods that increase the separation between categories. Moreover, with the Baum-Welch algorithm of (1), the HMM parameters converge to a local optimum that depends on the initial values, and this converged value is not necessarily the optimum.

[0008] The error-correction training of (2), on the other hand, does take the training data of other categories into account. It gives very good results on the training data but little improvement in recognition rate on unseen data. The reason is that error-correction training modifies the HMM parameters only between the true category and the wrong category (or the category with a small likelihood difference). The modifications therefore cannot be said to make the categories as a whole easier to recognize. In addition, because the method over-adapts to the training data, recognition performance on unseen data does not improve when categories with large variability are handled.

[0009] The present invention was made in view of the above circumstances. Its object is to provide an HMM-based pattern recognition device with higher recognition ability than conventional methods. It does so by obtaining, from all the training data for all categories, a function representing the possibility of recognition errors, and by calculating the HMM parameters that minimize this function.

[0010] To achieve this object, the pattern recognition device of the present invention performs pattern recognition with HMMs and comprises the following means:
- setting means for setting training data and initial HMM parameters;
- objective function calculation means for obtaining an initial objective function representing the possibility of recognition errors. It is computed from all the set training data and the initial HMM parameters. For each training sample, the difference in log likelihood between the nearest category and the own category is passed through a sigmoid function, and the results are summed over all categories;
- HMM parameter calculation means for obtaining the HMM parameters that minimize the objective function. It repeats three processes as appropriate: perturbing the HMM parameters to obtain new HMM parameters; obtaining a new objective function using the new HMM parameters; and, when the new objective function has decreased, adopting the new HMM parameters and the new objective function.

Pattern recognition is performed using the HMM parameters so obtained.

[0011] (A) Set the training data and the initial HMM parameters.
[0012] (B) From all the set training data and the initial HMM parameters, obtain an initial objective function representing the possibility of recognition errors.
[0013] (C) Perturb the HMM parameters to obtain new HMM parameters.
[0014] (D) Obtain a new objective function using the new HMM parameters.
[0015] (E) If the objective function has decreased, adopt the new HMM parameters and the new objective function.
[0016] (F) Repeat steps (C), (D), and (E) as appropriate to obtain the HMM parameters that minimize the objective function.

[0017] [Operation] With the configuration described above, the present invention can obtain HMM parameters that minimize an objective function representing the possibility of recognition errors. As a result, the recognition rate improves when the invention is applied to speech recognition.

[0018] [Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0019] First, the notation used in this embodiment is defined as follows.

[0020]
$M$: number of categories of time-series patterns to be recognized
$m, m'$: category numbers ($m, m' = 1, 2, \ldots, M$)
$N$: number of training samples per category
$n$: training sample number ($n = 1, 2, \ldots, N$)
$O_{mn}$: the $n$-th training sample of category $m$
$\Lambda$: set of HMM parameters of all categories, $\Lambda = \{\lambda_m\}$
$\lambda_m$: HMM parameters of category $m$, $\lambda_m = \{a^m_{ij}, b^m_{ijk}\}$
$a^m_{ij}$: probability that the HMM of category $m$ makes a transition from state $i$ to state $j$
$b^m_{ijk}$: for a discrete-output HMM, probability that the HMM of category $m$ outputs symbol $k$ on the transition from state $i$ to state $j$
$L(O_{mn} \mid \lambda_m)$: likelihood that the HMM of category $m$ outputs training sample $O_{mn}$
$E_{mn}$: difference in log likelihood between the nearest category and the own category for training sample $O_{mn}$
$E$: objective function representing the possibility of recognition errors over all categories

The present invention is characterized in that it calculates the set of HMM parameters $\Lambda = \{\lambda_m\}$ that minimizes the objective function $E$.

[0021] The embodiment described here uses the sigmoid function

$$F(x) = \frac{1}{1 + \exp(-x/\mu)} \quad (\mu \text{ a constant}) \qquad (3)$$

as the function $F$ and minimizes $E$ by the steepest descent method. It is described with reference to the flowcharts of FIGS. 1 and 2. In these flowcharts:
- steps ST1 and ST2 constitute the setting means of claim 1;
- step ST3 constitutes the objective function calculation means of claim 1;
- steps ST4 to ST11 constitute the HMM parameter calculation means of claim 1.

[0022] First, the training data are set (step ST1): time-series pattern training data are prepared for every category.

[0023] Next, the set $\Lambda$ of initial HMM parameters is set (step ST2). Arbitrary initial values can be given to the state transition probabilities and output probabilities that make up the HMM parameters. This embodiment uses the HMM parameters obtained after running the Baum-Welch algorithm several times, and this initial set of HMM parameters is denoted $\Lambda$. Alternatively, the time-series training data may be divided equally by the number of HMM states, and the output probabilities obtained statistically for each symbol.

[0024] Next, the initial objective function $E$ is obtained (step ST3). For each training sample $O_{mn}$, the log-likelihood difference $E_{mn}$ between the nearest category and the own category is computed by equation (1). Each $E_{mn}$ is transformed by the monotonically increasing function $F$, and the results are summed over all training samples according to equation (2). This sum is the initial objective function $E$. Log likelihoods can be computed with the forward algorithm or the Viterbi algorithm; this embodiment uses the forward algorithm.

[0025] Next, the HMM parameter to be perturbed is selected (step ST4). A uniform random number is generated to choose exactly one parameter from among all the parameters of all categories.

[0026] Next, the selected HMM parameter is perturbed (step ST5).

[0027] This processing uses a uniform random number to change the value of the HMM parameter selected in step ST4. For example, if the state transition probability $a^m_{ij}$ is selected, a uniform random number $r$ on $[-1, 1]$, multiplied by a constant $\delta$, is added to it. In this embodiment, $\delta = 0.01$.

[0028] When an output probability $b^m_{ijk}$ is selected, it is perturbed in the same way:

[0029] $b^m_{ijk} \leftarrow b^m_{ijk} + \delta r$

[0030] Next, whether to adopt the perturbation is decided (step ST6).

[0031] State transition probabilities and output probabilities are both probabilities. If the perturbed value is less than 0 or greater than 1, the process therefore returns to step ST4, and another HMM parameter is selected for perturbation.

[0032] Next, a new set of HMM parameters $\Lambda'$ is obtained (step ST7).

[0033] The state transition probabilities or output probabilities of the perturbed category $m$ are adjusted so that they satisfy the probability conditions $\sum_j a^m_{ij} = 1$ and $\sum_k b^m_{ijk} = 1$.

[0034] The parameters so adjusted form the new set of HMM parameters $\Lambda'$.

[0035] Next, a new objective function $E'$ is obtained (step ST8).

[0036] The new objective function $E'$ is computed from the new set of HMM parameters $\Lambda'$ in the same way as in step ST3.

[0037] Next, whether the objective function has decreased is determined (step ST9).

[0038] If $E' \le E$, the result of the perturbation is adopted by setting $E = E'$ and $\Lambda = \Lambda'$ (step ST10). Otherwise, that is, if $E' > E$, $E$ and $\Lambda$ are not updated.

[0039] Perturbation is terminated (step ST11) in either of two cases: it has been performed a preset number of times, or the objective function $E$ has become very small, for example smaller than $10^{-8}$.

[0040] The set of HMM parameters $\Lambda$ obtained in this way, which minimizes the objective function, is adopted as the optimal set of HMM parameters.

[0041] The above embodiment uses a discrete-output HMM. The present invention can also be applied to continuous-distribution HMMs, by perturbing the means and variances of the output probability densities with random numbers.

[0042] The above embodiment minimizes $E$ with the steepest descent method, but other optimization techniques, such as annealing, can also be used. Likewise, the sigmoid function used here as the monotonically increasing function $F(x)$ can be replaced by any function satisfying the conditions on $F(x)$. For example, the following function can be considered.

[0043] [Formula 1]

[0044] Next, an experimental example of speech recognition using the method of the present invention is described with reference to the block diagram of FIG. 3.

[0045] Here, the invention is applied to one speech recognition problem: recognizing the Japanese voiced plosive consonants /b/, /d/, /g/ with discrete-distribution HMMs.

[0046] In this speech recognition process, each of the voiced plosives /b/, /d/, /g/ is modeled by one HMM, and the parameters of each HMM are trained by the method of the present invention. Each HMM is a discrete-distribution model with four states, three self-loops, and no skips.

[0047] The speech data for training and recognition are important words from the ATR database, uttered phrase by phrase (in bunsetsu units) by a single speaker. This experiment examined two speakers: MAU (adult male) and FSU (adult female).

[0048] First, the start and end of each input voiced plosive consonant were located using the labels attached to the ATR database. The consonant segments were sampled at a sampling frequency of 15 kHz (block B20), cut out, and analyzed by 18th-order LPC cepstrum analysis (block B21).

[0049] A codebook of size 256 was created (block B22) from the 17 consonants in each speaker's phonetically balanced words: /p/, /t/, /k/, /ts/, /s/, /h/, /z/, /ch/, /sh/, /b/, /d/, /g/, /r/, /w/, /y/, /m/, /n/. The results of the LPC cepstrum analysis were then vector-quantized (block B23).

[0050] Next, the 300 samples of each consonant were divided into three groups of 100 to form data sets 1, 2, and 3 (block B24). In each experiment, the HMMs were trained on one data set and recognition was performed on the other two. The average recognition error rate over the data sets is reported as the result.

[0051] The initial HMM parameters for training were obtained by running the Baum-Welch algorithm until the log likelihood of the own category converged (about 45 iterations).

[0052] Next, the HMM parameters were trained on the training data set by the learning method of the present invention (block B25). Using the resulting HMM parameters (block B26), the log likelihoods of the recognition data set were calculated (block B27). These log likelihoods were compared (block B28), and the category with the largest log likelihood was taken as the recognition result.

[0053] FIG. 4 shows the recognition results for the Japanese voiced plosive consonants of speaker MAU, and FIG. 5 those of speaker FSU. For comparison, the figures also show two baselines: the recognition error rate with the initial HMM parameters of the present learning method (i.e., after Baum-Welch training), and the error rate obtained in the same experiment with error-correction training. In each figure, "open" denotes recognition of unseen data and "close" denotes recognition of the training data themselves.

[0054] As FIG. 4 shows, for speaker MAU's unseen data the recognition error rate was 12.9% with the initial HMM parameters and 10.1% with the present learning method. This is better than the 12.3% obtained with error-correction training. As FIG. 5 shows, for speaker FSU's unseen data the error rate was 11.6% with the initial HMM parameters and 11.5% with error-correction training, against 10.4% with the present learning method. These results confirm a marked reduction in the recognition error rate.

[0055] These experimental results confirm that the HMM parameter learning method of the present invention provides HMM parameters with higher recognition performance than conventional learning methods.

[0056] As described above, the present invention makes it possible to provide a pattern recognition device with a higher pattern recognition rate than conventional devices.
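[0023] mentions an alternative to Baum-Welch initialization. The training sequences are divided equally by the number of HMM states, and output probabilities are estimated from symbol counts. The Python sketch below is one possible reading of that idea, not the patent's procedure. It rests on several assumptions:
- the model is a left-to-right HMM whose symbols are emitted on transitions;
- only the first `n_states - 1` states emit (the last state is the exit);
- the histogram of segment i would serve both arcs leaving state i;
- a small pseudo-count `floor` keeps unseen symbols away from zero probability.

The helper name `equal_segmentation_init` is hypothetical.

```python
def equal_segmentation_init(sequences, n_states, n_symbols, floor=1e-3):
    """Initial output distributions by equal segmentation (hypothetical helper).

    Each VQ symbol sequence is cut into n_states - 1 equal segments, one per
    emitting state of an assumed left-to-right, arc-emission HMM; the symbol
    counts in segment i become the output distribution of state i.
    `floor` is an assumed pseudo-count that avoids zero probabilities.
    """
    n_emit = n_states - 1
    counts = [[floor] * n_symbols for _ in range(n_emit)]
    for seq in sequences:
        T = len(seq)
        for t, k in enumerate(seq):
            counts[t * n_emit // T][k] += 1.0  # segment index of frame t
    return [[c / sum(row) for c in row] for row in counts]


# Four states (three emitting), as in the experiment of [0046]; three symbols.
probs = equal_segmentation_init([[0, 0, 1, 1, 2, 2]], n_states=4, n_symbols=3, floor=0.0)
```

With the toy sequence, each pair of frames falls into its own segment, so `probs` is the 3 × 3 identity. To build the arc-emission table, one would then set $b_{i,i,k} = b_{i,i+1,k}$ = `probs[i][k]`; this tying is also an assumption.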
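Equations (1) and (2), which paragraph [0024] uses to compute $E_{mn}$ and $E$, are not written out in this text. Below is a plausible reconstruction from the definitions in [0020]. It is a reading of the text, not the patent's own formula, and it makes two assumptions:
- the "nearest category" is the competing category $m' \neq m$ with the largest log likelihood;
- the difference is taken as rival minus own, so that $E_{mn} > 0$ exactly when $O_{mn}$ would be misrecognized.

```latex
\begin{align}
E_{mn} &= \max_{m' \neq m} \log L(O_{mn} \mid \lambda_{m'}) - \log L(O_{mn} \mid \lambda_m) \tag{1} \\
E &= \sum_{m=1}^{M} \sum_{n=1}^{N} F(E_{mn}) \tag{2} \\
F(x) &= \frac{1}{1 + \exp(-x/\mu)} \tag{3}
\end{align}
\min_{\Lambda} E \quad \text{subject to} \quad \sum_j a^m_{ij} = 1, \quad \sum_k b^m_{ijk} = 1, \quad 0 \le a^m_{ij},\, b^m_{ijk} \le 1
```

Under this reading, as $\mu \to 0$, $F(E_{mn})$ tends to 1 for a misrecognized sample and to 0 for a correctly recognized one. $E$ then behaves like a smoothed count of training errors; a larger $\mu$ lets samples near the decision boundary contribute as well.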
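The log-likelihood computation of [0024] (forward algorithm) and the decision of blocks B27 and B28 in [0052] can be sketched in Python as below. This is a minimal illustration, not the patent's implementation. It assumes:
- a discrete HMM that emits symbols on transitions ($b^m_{ijk}$ as in [0020]);
- the model starts in state 0 and must end in its last state;
- the forward variables are rescaled at every frame;
- equations (1) and (2) are read so that $E_{mn}$ is the best rival log likelihood minus the own-category log likelihood.

The function names are hypothetical.

```python
import math


def log_likelihood(A, B, obs):
    """log L(O | lambda) for a discrete HMM that emits symbol k on the
    transition i -> j with probability B[i][j][k].

    Assumed (not stated in the patent): the model starts in state 0, must end
    in the last state, and alpha is rescaled each frame to avoid underflow.
    """
    S = len(A)
    alpha = [1.0] + [0.0] * (S - 1)
    log_scale = 0.0
    for k in obs:
        new = [0.0] * S
        for i in range(S):
            if alpha[i] > 0.0:
                for j in range(S):
                    new[j] += alpha[i] * A[i][j] * B[i][j][k]
        total = sum(new)
        if total == 0.0:
            return float("-inf")
        alpha = [x / total for x in new]
        log_scale += math.log(total)
    return log_scale + math.log(alpha[-1]) if alpha[-1] > 0.0 else float("-inf")


def sigmoid(x, mu=1.0):
    """F(x) = 1 / (1 + exp(-x / mu)), eq. (3), evaluated without overflow."""
    z = x / mu
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(models, data, mu=1.0):
    """E = sum over categories m and samples n of F(E_mn), where
    E_mn = (best rival log likelihood) - (own-category log likelihood)."""
    E = 0.0
    for m, samples in enumerate(data):
        for obs in samples:
            ll = [log_likelihood(A, B, obs) for A, B in models]
            rival = max(v for mp, v in enumerate(ll) if mp != m)
            E += sigmoid(rival - ll[m], mu)
    return E


def recognize(models, obs):
    """Blocks B27-B28: the category with the largest log likelihood wins."""
    ll = [log_likelihood(A, B, obs) for A, B in models]
    return max(range(len(models)), key=ll.__getitem__)


# Toy example: two 2-state models, X favouring symbol 0 and Y favouring symbol 1.
A = [[0.5, 0.5], [0.0, 0.0]]
B_X = [[[0.9, 0.1], [0.9, 0.1]], [[0.5, 0.5], [0.5, 0.5]]]
B_Y = [[[0.1, 0.9], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]
models = [(A, B_X), (A, B_Y)]
data = [[[0, 0]], [[1, 1]]]            # one training sequence per category
ll_x = log_likelihood(A, B_X, [0, 0])  # log(0.2025)
E = objective(models, data)            # approximately 2 / 82
```

In the toy example, the only path is 0 → 0 → 1. Model X therefore produces the sequence 0, 0 with likelihood 0.5 × 0.9 × 0.5 × 0.9 = 0.2025, and model Y with 0.0025. Each sample then contributes $F(\log(1/81)) = 1/82$ to $E$.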
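The evaluation protocol of [0050] trains on one data set, tests on the other two, and averages the error rates. It can be sketched as below. The split is applied per consonant; in the experiment, the resulting sets of the three consonants would then be merged.

This sketch rests on assumptions:
- the 100-sample groups are contiguous;
- "average" means the mean over all six (train, test) pairs;
- `train_fn` and `error_rate_fn` are hypothetical caller-supplied callables;
- the helper names are illustrative only.

```python
def split_into_sets(tokens, n_sets=3):
    """Split one consonant's tokens into n_sets equal contiguous groups,
    e.g. 300 tokens -> three sets of 100 (contiguity is an assumption)."""
    size = len(tokens) // n_sets
    return [tokens[s * size:(s + 1) * size] for s in range(n_sets)]


def rotation_error_rate(datasets, train_fn, error_rate_fn):
    """Train on each data set in turn, test on every other one, and return
    the mean error rate over all (train, test) pairs (assumed averaging)."""
    rates = []
    for t, train_set in enumerate(datasets):
        model = train_fn(train_set)
        for s, test_set in enumerate(datasets):
            if s != t:
                rates.append(error_rate_fn(model, test_set))
    return sum(rates) / len(rates)


sets = split_into_sets(list(range(300)))
# Dummy stand-ins: the "model" is the first token of its training set, and
# each model has a fixed error rate on any test set.
dummy_rate = {0: 0.10, 100: 0.20, 200: 0.30}
mean_rate = rotation_error_rate(sets, lambda s: s[0], lambda model, test: dummy_rate[model])
```

With three data sets there are six (train, test) pairs. Each dummy model is tested twice, so `mean_rate` is (2 × 0.10 + 2 × 0.20 + 2 × 0.30) / 6 = 0.2.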
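To put the open-data error rates quoted in [0054] in perspective, they correspond to the following relative reductions. These are simple arithmetic on the reported percentages.

```latex
\begin{align*}
\text{MAU:}\quad & \frac{12.9 - 10.1}{12.9} \approx 21.7\% \text{ (vs. initial)}, & \frac{12.3 - 10.1}{12.3} \approx 17.9\% \text{ (vs. error correction)} \\
\text{FSU:}\quad & \frac{11.6 - 10.4}{11.6} \approx 10.3\% \text{ (vs. initial)}, & \frac{11.5 - 10.4}{11.5} \approx 9.6\% \text{ (vs. error correction)}
\end{align*}
```

In absolute terms, the gains are 2.8 and 2.2 percentage points for MAU and 1.2 and 1.1 points for FSU.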

[Brief Description of the Drawings]
[FIG. 1] A flowchart showing the functions of the device of the present invention.
[FIG. 2] A flowchart showing the functions of the device of the present invention.
[FIG. 3] A block diagram illustrating a speech recognition process to which the present invention is applied.
[FIG. 4] An explanatory diagram showing the speech recognition results of FIG. 3 for speaker MAU.
[FIG. 5] An explanatory diagram showing the speech recognition results of FIG. 3 for speaker FSU.

Continuation of the front page
(56) References cited:
JP-A-4-205389 (JP, A)
JP-A-3-176781 (JP, A)
Tsutomu Matsunaga, Ichiro Abe, Hiromi Kida, "Optimization of a character recognition dictionary using simulated annealing," IEICE Technical Report [Pattern Recognition and Understanding], PRU90-39, pp. 79-84, Japan, July 12, 1990.
Akio Ando, Kazuhiko Ozeki, "A standard pattern learning algorithm that minimizes a misrecognition function," Proceedings of the Spring Meeting of the Acoustical Society of Japan, 1991, pp. 205-206, Japan, March 27, 1991.
Toru Imai, Akio Ando, "An HMM learning method that minimizes an error function based on log-likelihood differences," Proceedings of the Autumn Meeting of the Acoustical Society of Japan, 1991, pp. 79-80, Japan, October 2, 1991.
Toru Imai, Akio Ando, "An HMM learning method that minimizes an error function based on log-likelihood differences," IEICE Technical Report [Speech], SP91-87, pp. 49-56, Japan, December 19, 1991.
Akio Ando, Kazuhiko Ozeki, "A standard pattern learning algorithm that minimizes a misrecognition function," IEICE Transactions A, Vol. J76-A, pp. 580-588, Japan, April 25, 1993.
Shinobu Mizuta, Kunio Nakajima, "A study of optimal discriminative training for mixture continuous-density HMMs," Proceedings of the Spring Meeting of the Acoustical Society of Japan, 1990, pp. 23-24, Japan, March 28, 1990.
(58) Fields searched (Int.Cl.⁷, DB name): G10L 15/14, G10L 15/06

Claims

(57) [Claims]
[Claim 1] A pattern recognition device for performing pattern recognition using HMMs, comprising:
setting means for setting training data and initial HMM parameters;
objective function calculation means for obtaining, from all the set training data and the initial HMM parameters, an initial objective function representing the possibility of recognition errors, by applying a sigmoid function to the difference in log likelihood between the nearest category and the own category for each training sample and summing the results over all categories; and
HMM parameter calculation means for obtaining HMM parameters that minimize the objective function by repeating, as appropriate, a process of perturbing the HMM parameters to obtain new HMM parameters, a process of obtaining a new objective function using the new HMM parameters, and a process of adopting the new HMM parameters and the new objective function when the new objective function has decreased;
wherein pattern recognition is performed using the HMM parameters thus obtained.
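Claim 1, together with steps ST3 to ST11 (paragraphs [0024] to [0040]), describes a loop that perturbs one parameter at a time and keeps the change only if the objective does not increase. The text calls this the steepest descent method, although no gradient is computed; in effect it is a stochastic greedy local search. A self-contained Python sketch follows. Only δ = 0.01, r drawn uniformly from [-1, 1], the acceptance rule E′ ≤ E, and the 10^-8 stopping threshold are taken from the text. Everything else is an assumption:
- the arc-emission forward algorithm;
- the reading of equations (1) and (2);
- restricting perturbations to arcs present in the initial model;
- renormalising the perturbed row by dividing by its sum (the adjustment formula of step ST7 is not given in the text).

All function names are hypothetical.

```python
import copy
import math
import random


def log_likelihood(A, B, obs):
    """Scaled forward algorithm; symbol k is emitted on arc i -> j with
    probability B[i][j][k]. Assumed: start in state 0, end in the last state."""
    S = len(A)
    alpha, log_scale = [1.0] + [0.0] * (S - 1), 0.0
    for k in obs:
        new = [sum(alpha[i] * A[i][j] * B[i][j][k] for i in range(S)) for j in range(S)]
        total = sum(new)
        if total == 0.0:
            return float("-inf")
        alpha, log_scale = [x / total for x in new], log_scale + math.log(total)
    return log_scale + math.log(alpha[-1]) if alpha[-1] > 0.0 else float("-inf")


def sigmoid(x, mu):
    """F(x) = 1 / (1 + exp(-x / mu)), eq. (3), without overflow."""
    z = x / mu
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(models, data, mu):
    """E = sum_m sum_n F(E_mn), E_mn = best rival minus own log likelihood."""
    E = 0.0
    for m, samples in enumerate(data):
        for obs in samples:
            ll = [log_likelihood(A, B, obs) for A, B in models]
            rival = max(v for mp, v in enumerate(ll) if mp != m)
            E += sigmoid(rival - ll[m], mu)
    return E


def train(models, data, mu=1.0, delta=0.01, max_iters=1000, tol=1e-8, seed=0):
    """Steps ST3-ST11: perturb one parameter at a time, keep it if E' <= E."""
    rng = random.Random(seed)
    # Candidate parameters (m, i, j, k); k is None for a_ij. Assumption: only
    # arcs present in the initial model are perturbed, so topology is kept.
    params = [(m, i, j, k)
              for m, (A, B) in enumerate(models)
              for i in range(len(A)) for j in range(len(A)) if A[i][j] > 0.0
              for k in [None] + list(range(len(B[i][j])))]
    E = objective(models, data, mu)                      # ST3
    for _ in range(max_iters):                           # ST11: loop budget
        if E < tol:                                      # ST11: E very small
            break
        while True:
            m, i, j, k = rng.choice(params)              # ST4
            A, B = models[m]
            old = A[i][j] if k is None else B[i][j][k]
            new = old + delta * rng.uniform(-1.0, 1.0)  # ST5: r in [-1, 1]
            if 0.0 <= new <= 1.0:                        # ST6: else reselect
                break
        cand = copy.deepcopy(models)                     # ST7: build Lambda'
        A2, B2 = cand[m]
        row, idx = (A2[i], j) if k is None else (B2[i][j], k)
        row[idx] = new
        total = sum(row)
        if total <= 0.0:
            continue
        row[:] = [x / total for x in row]  # assumed rule: rescale row to sum to 1
        E_new = objective(cand, data, mu)                # ST8
        if E_new <= E:                                   # ST9
            models, E = cand, E_new                      # ST10
    return models, E


def toy_model():
    """Two-state arc-emission model with flat output probabilities."""
    return ([[0.5, 0.5], [0.0, 0.0]],
            [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]])


init = [toy_model(), toy_model()]  # identical models cannot separate the classes
data = [[[0, 0]], [[1, 1]]]        # one training sequence per category
E0 = objective(init, data, 1.0)    # 1.0: each F(E_mn) = F(0) = 0.5
trained, E1 = train(init, data, max_iters=200)
```

The toy models start identical, so every $E_{mn}$ is 0 and E = 1. Any perturbation that makes model 0 prefer symbol 0, or model 1 prefer symbol 1, lowers both sigmoid terms and is accepted. Moves that fall outside [0, 1] are redrawn, as in step ST6.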