JP2006126398A - Method of compressing plurality of probability density function kernels of probability model, and computer program therefor

Info

Publication number: JP2006126398A
Application number: JP2004313470A
Authority: JP
Inventors: Soong Frank; フランク・スーン; Shiyouhei Ri; 小兵李; Satoru Nakamura; 哲中村
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2004-10-28
Filing date: 2004-10-28
Publication date: 2006-05-18

Abstract

PROBLEM TO BE SOLVED: To enable effective compression of a probability model while maintaining performance.

SOLUTION: A method of compressing an HMM model includes the steps of: clustering the means of the probability density function (pdf) kernels 20, 22, 24, 26, 28 and 30 of the HMM model into first centroid kernels 40 and 42; clustering the variances of the pdf kernels 20, 22, 24, 26, 28 and 30 into second centroid kernels 50 and 52; and redefining each of the pdf kernels 20, 22, 24, 26, 28 and 30 by the mean of the one of the first centroid kernels 40 and 42 that is nearest to the original mean of that pdf kernel, and by the variance of the one of the second centroid kernels 50 and 52 that is nearest to the original pdf kernel.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to efficient probabilistic modeling and, more particularly, to compression of probability models such as HMMs (hidden Markov models) in order to reduce storage and computation.

Efficient automatic speech recognition always calls for smaller models that need less storage and less computation while still maintaining good recognition performance. A small model can be built during training by applying state tying [see Non-Patent Document 1] or distribution tying [see Non-Patent Document 2] for so-called semi-continuous HMMs. The basic pdf (probability density function or probability distribution function) kernels of the trained HMM can also be clustered in the feature space after the training procedure. It has been proposed to cluster the pdf kernel means in each feature dimension [see Non-Patent Document 3] or in subspaces [see Non-Patent Document 4].

Non-Patent Document 1: S.J. Young and P.C. Woodland, "The Use of State Tying in Continuous Speech Recognition", Eurospeech-1993, pp. 2203-6, 1993.
Non-Patent Document 2: X.D. Huang, "Phoneme Classification Using Semicontinuous Hidden Markov Models", IEEE Trans. ASSP, Vol. 40, No. 5, pp. 1062-7, 1992.
Non-Patent Document 3: S. Takahashi and S. Sagayama, "Four-Level Tied-Structure for Efficient Representation of Acoustic Modeling", ICASSP-1995, pp. 520-3, 1995.
Non-Patent Document 4: E. Bocchieri and K.W. Mak, "Subspace Distribution Clustering Hidden Markov Model", IEEE Trans. Speech Audio Proc., Vol. 9, No. 3, pp. 264-75, Mar. 2001.
Non-Patent Document 5: T.A. Myrvoll and F.K. Soong, "Optimal Clustering of Multivariate Normal Distributions Using Divergence and Its Application to HMM Adaptation", ICASSP-2003, pp. 552-5, 2003.
Non-Patent Document 6: J. Kim, R. Haimi-Cohen, and F.K. Soong, "Hidden Markov Models with Divergence based Vector Quantization Variances", ICASSP-1999, pp. 125-8, 1999.

In Non-Patent Document 3, only the means are clustered in each feature dimension, which maintains high model resolution. However, a large memory space and a large amount of computation are still required.

In Non-Patent Document 4, the means and variances are clustered jointly. However, because of the resulting quantization error, the jointly clustered centroids cannot guarantee a good model representation. FIG. 9 shows an example that illustrates this point. Here, six Gaussian kernels are clustered into two clusters: kernels 20, 22 and 24 are clustered into one cluster represented by centroid kernel 400, and kernels 26, 28 and 30 are clustered into another cluster represented by centroid kernel 402.

The resulting centroids do not represent the individual members of their clusters well, especially with respect to variance.

Accordingly, one object of the present invention is to provide a method of effectively compressing a probability model while maintaining the performance of the original model.

One aspect of the present invention relates to a method of compressing a plurality of pdf kernels of a probability model. Each pdf kernel is defined by a first parameter and a second parameter. The method includes the steps of: clustering the first parameters of the plurality of pdf kernels into one or more first centroid kernels; clustering the second parameters of the plurality of pdf kernels into one or more second centroid kernels; and redefining each of the plurality of pdf kernels by the first parameter of the first centroid kernel whose first parameter is closest to the first parameter of that pdf kernel, and by the second parameter of the second centroid kernel whose second parameter is closest to that pdf kernel.

The plurality of pdf kernels are clustered into first centroid kernels and second centroid kernels. Each pdf kernel is then redefined by the first parameter of the closest first centroid kernel and the second parameter of the closest second centroid kernel. Because the pdf kernels are redefined by the parameters of a smaller number of centroid kernels, the total storage required by the model is reduced.
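The storage argument can be made concrete with a rough count. The sketch below is only illustrative: the function name `storage_bits`, the 32-bit float width, and the layout (one codebook page per dimension plus one mean index and one variance index per kernel) are assumptions, not details given in this document.

```python
import math

def storage_bits(n_kernels, n_dims, n_mean_clusters, n_var_clusters, float_bits=32):
    """Return (original_bits, compressed_bits) under an assumed storage layout."""
    # Original model: one mean and one variance per kernel and per dimension.
    original = n_kernels * n_dims * 2 * float_bits
    # Compressed model: a per-dimension codebook of centroid means and variances ...
    codebook = n_dims * (n_mean_clusters + n_var_clusters) * float_bits
    # ... plus two small indices per kernel and per dimension.
    index_bits = (math.ceil(math.log2(n_mean_clusters))
                  + math.ceil(math.log2(n_var_clusters)))
    indices = n_kernels * n_dims * index_bits
    return original, codebook + indices
```

Under these assumptions the saving comes from replacing two floating-point values per kernel and dimension with two short indices; the codebook itself is small because it is shared by all kernels.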

Preferably, each of the plurality of pdf kernels includes a Gaussian pdf kernel, and the first and second parameters include the mean and the variance of the Gaussian pdf kernel.

The means and variances of the Gaussian pdf kernels are clustered separately. This reduces both storage and computation.

More preferably, the step of clustering the first parameters includes the step of clustering the means of the Gaussian pdf kernels such that the total error (Kullback-Leibler divergence: KLD) between the redefined Gaussian pdf kernels and their corresponding original pdf kernels is minimized.

Still more preferably, the step of clustering the second parameters includes the step of clustering the variances of the Gaussian pdf kernels such that the total error between the redefined Gaussian pdf kernels and their corresponding original pdf kernels, each taken with zero mean, is minimized.

The error between two given probability density functions may be computed as the symmetric Kullback-Leibler divergence between them.

By clustering the Gaussian pdfs and redefining them with the clustered centroid kernels, the model becomes smaller and requires less computation.

Another aspect of the present invention relates to a computer program which, when executed on a computer, causes the computer to perform all the steps of any of the methods described above.

[Introduction]
To find a good trade-off between model resolution and resource allocation, this embodiment proposes clustering the means and variances separately in each scalar dimension, using the symmetric Kullback-Leibler divergence as the error measure and following the optimal cluster centroid computation [see Non-Patent Document 5]. An approximation of the optimal centroid was already proposed in Non-Patent Document 6.

Specifically, as shown in FIG. 1, the means of kernels 20, 22, 24, 26, 28 and 30 are clustered into two mean centroid kernels (hereinafter "mean cluster kernels") 40 and 42: kernels 20, 22 and 24 are clustered into mean cluster kernel 40, and kernels 26, 28 and 30 into mean cluster kernel 42. In addition, the variances of kernels 20, 22, 24, 26, 28 and 30 are clustered into variance kernels (hereinafter "variance cluster kernels") 50 and 52: kernels 20 and 28 are clustered into variance cluster kernel 50, and kernels 22, 24, 26 and 30 into variance cluster kernel 52. Clustering the means and variances separately yields higher model resolution.

[Optimal centroid based on KLD]
The symmetric Kullback-Leibler divergence used here to measure the error (distance) between two given pdfs f and g is defined by the following equation (1).
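For reference, the standard definition of the symmetric Kullback-Leibler divergence, which equation (1) is assumed to denote, can be written as follows. It is the sum of the two directed divergences between f and g.

```latex
D(f, g) = \int \bigl( f(x) - g(x) \bigr) \, \log \frac{f(x)}{g(x)} \, dx \tag{1}
```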

The optimal cluster centroid f_c is obtained by minimizing the total KLD between the cluster centroid and all kernels in the cluster, according to the following equation.
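Written out, the minimization described here takes the following form. The symbols f_n (the n-th kernel in the cluster) and N (the number of kernels in the cluster) are notation assumed for this sketch.

```latex
f_c = \arg\min_{f} \sum_{n=1}^{N} D(f_n, f)
```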

For multivariate Gaussian distributions, the symmetric KLD of equation (1) has a closed-form expression, given below.
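The well-known closed form for two Gaussians with means μ_f, μ_g and covariances R_f, R_g is sketched below, on the assumption that it matches the expression intended here. The log-determinant terms of the two directed divergences cancel, which is why none appears.

```latex
D(f, g) = \frac{1}{2} \operatorname{tr} \Bigl\{
    \bigl( R_f^{-1} + R_g^{-1} \bigr) (\mu_f - \mu_g)(\mu_f - \mu_g)^{\top}
    + R_f R_g^{-1} + R_g R_f^{-1} - 2I
\Bigr\}
```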

Here, μ and R are the mean and covariance of the corresponding distribution. The optimal centroid is obtained by solving a set of Riccati matrix equations [see Non-Patent Document 5]. In the special case of diagonal covariance, the mean and variance of the optimal centroid in the i-th dimension, μ_ci and σ_ci², are as follows.
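For the diagonal case, setting the derivatives of the summed scalar symmetric KLD with respect to μ_ci and σ_ci² to zero gives the pair of conditions below. This is a derivation under the closed form for Gaussians, not a transcription of the original equations; μ_ni and σ_ni² (the i-th dimension mean and variance of the n-th kernel in the cluster) and N are assumed notation.

```latex
\mu_{ci} = \frac{\sum_{n=1}^{N} \bigl( \sigma_{ni}^{-2} + \sigma_{ci}^{-2} \bigr) \mu_{ni}}
                {\sum_{n=1}^{N} \bigl( \sigma_{ni}^{-2} + \sigma_{ci}^{-2} \bigr)},
\qquad
\sigma_{ci}^{2} = \sqrt{ \frac{\sum_{n=1}^{N} \bigl( \sigma_{ni}^{2} + (\mu_{ci} - \mu_{ni})^{2} \bigr)}
                              {\sum_{n=1}^{N} \sigma_{ni}^{-2}} }
```

The two conditions are coupled, since each right-hand side contains the other unknown, so in practice they would be solved by alternating between them.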

[Why cluster means and variances separately?]
As shown in FIG. 9, the means and variances of Gaussian kernels can be clustered jointly. However, the resulting centroids do not represent the individual members of their clusters well, especially with respect to variance.
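A small numerical sketch may help show why a joint centroid misrepresents the variance. It assumes scalar Gaussians and uses an alternating fixed-point update derived from the closed-form symmetric KLD; the function names and the iteration scheme are illustrative choices, not the procedure of Non-Patent Document 4.

```python
import math

def joint_centroid_1d(means, variances, iters=50):
    """Scalar centroid (mean, variance) minimizing the total symmetric KLD."""
    mu_c = sum(means) / len(means)
    var_c = sum(variances) / len(variances)
    inv_var_sum = sum(1.0 / v for v in variances)
    for _ in range(iters):
        # Mean update: weighted average with weights 1/var_n + 1/var_c.
        weights = [1.0 / v + 1.0 / var_c for v in variances]
        mu_c = sum(w * m for w, m in zip(weights, means)) / sum(weights)
        # Variance update: absorbs the spread of the member means.
        spread = sum(v + (m - mu_c) ** 2 for m, v in zip(means, variances))
        var_c = math.sqrt(spread / inv_var_sum)
    return mu_c, var_c

def zero_mean_variance_centroid(variances):
    """Variance centroid when every kernel mean is fixed at zero."""
    return math.sqrt(sum(variances) / sum(1.0 / v for v in variances))
```

For two kernels with means 0 and 4 and unit variances, the joint centroid has variance √5 (about 2.24), more than twice the variance of either member, whereas the zero-mean variance centroid is exactly 1.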

This model resolution problem can be overcome by clustering the means and variances separately. The variances can be clustered by setting the corresponding means to zero. As shown in the lower part of FIG. 1, the variances of kernels 20 and 28 are clustered into the left-hand variance, represented by variance cluster kernel 50, and the variances of kernels 22, 24, 26 and 30 are clustered into the right-hand variance, represented by variance cluster kernel 52.

The mean of each of the original kernels 20, 22, 24, 26, 28 and 30 is approximated by the mean of its nearest neighbor among mean cluster kernels 40 and 42. Likewise, the variance of each of kernels 20, 22, 24, 26, 28 and 30 is approximated by the variance of its nearest neighbor among variance cluster kernels 50 and 52. Clustering the variances separately from the means clearly allows high model resolution to be maintained.
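The nearest-neighbor approximation can be sketched as follows. The distance measures are assumptions made for illustration: absolute difference for means, and the zero-mean symmetric KLD for variances; the text only says "nearest". The names `redefine` and `zero_mean_skld` are hypothetical.

```python
def zero_mean_skld(var_a, var_b):
    """Symmetric KLD between two zero-mean scalar Gaussians."""
    return 0.5 * (var_a / var_b + var_b / var_a - 2.0)

def redefine(kernels, mean_centroids, var_centroids):
    """Replace each (mean, variance) pair by its nearest centroid values."""
    redefined = []
    for mu, var in kernels:
        # The mean and the variance are looked up independently of each other.
        nearest_mean = min(mean_centroids, key=lambda c: abs(c - mu))
        nearest_var = min(var_centroids, key=lambda c: zero_mean_skld(var, c))
        redefined.append((nearest_mean, nearest_var))
    return redefined
```

Because the two lookups are independent, two kernels can share a mean centroid while using different variance centroids, which is the situation FIG. 1 depicts.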

[Structure]
FIG. 2 shows the structure of an automatic speech recognition (ASR) system 60 according to an embodiment of the present invention. Referring to FIG. 2, ASR system 60 includes: a training corpus 70 containing segmented speech data with phonetic transcriptions; a training module 72 for training an HMM acoustic model 74 using the speech data in training corpus 70 as training data; and a compression module 76 for compressing HMM acoustic model 74 by separately clustering the means and variances of the kernels of each state in HMM acoustic model 74, as described above. The resulting compressed HMM acoustic model 78 is used for ASR.

ASR system 60 further includes a language model 80 and an ASR module 84 that receives the speech data of an input utterance 82, recognizes the input speech using HMM acoustic model 78 and language model 80, and outputs the resulting text 86.

Compression module 76 can be implemented in software. FIG. 3 shows the overall control structure of the software.

Referring to FIG. 3, when started, the program initializes an iteration control variable i to zero at step 100. At step 102, variable i is incremented by one. At step 104, it is determined whether variable i is greater than the number of dimensions N_dim of the HMM parameters. If variable i is greater than N_dim, execution of the program ends. Otherwise, control proceeds to step 106.

At step 106, the means of the kernels in the i-th dimension are clustered. Details of this step are described later with reference to FIG. 4. When the clustering of the means is finished, the variances of the kernels in the i-th dimension are clustered at step 108. Here, the variances are clustered with the kernel means fixed at zero. Details of step 108 are described later with reference to FIG. 5.

After step 108, control proceeds to step 116, where the kernels obtained at steps 106 and 108 are written to the codebook page for the i-th dimension kernels of the HMM model. At step 118, means and variances are assigned to the i-th dimension kernels of the HMM.

Each Gaussian kernel in the i-th dimension is assigned the mean of its nearest neighbor among the clustered means and the variance of its nearest neighbor among the clustered variances. In other words, each original Gaussian kernel is redefined by the mean of the nearest mean cluster kernel and the variance of the nearest variance cluster kernel.

After step 118, control returns to step 102, and the kernels of the (i+1)-th dimension are compressed.
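The per-dimension loop of FIG. 3 can be sketched as below. This is a minimal outline under stated assumptions: the data layout (`kernels_by_dim`), the function name `compress_model`, and the callables `cluster_means` and `cluster_vars` standing in for steps 106 and 108 are all hypothetical, and kernels are stored as codebook indices rather than copied values.

```python
def compress_model(kernels_by_dim, cluster_means, cluster_vars):
    """kernels_by_dim[i] is a list of (mean, variance) pairs for dimension i."""
    codebook = []   # one page per dimension: (mean centroids, variance centroids)
    indices = []    # per dimension: one (mean index, variance index) per kernel
    for kernels in kernels_by_dim:                       # steps 102 and 104
        mean_cents = cluster_means([m for m, _ in kernels])   # step 106
        var_cents = cluster_vars([v for _, v in kernels])     # step 108
        codebook.append((mean_cents, var_cents))              # step 116
        page = []
        for m, v in kernels:                                  # step 118
            mi = min(range(len(mean_cents)),
                     key=lambda k: abs(mean_cents[k] - m))
            # v/c + c/v orders candidates like the zero-mean symmetric KLD.
            vi = min(range(len(var_cents)),
                     key=lambda k: v / var_cents[k] + var_cents[k] / v)
            page.append((mi, vi))
        indices.append(page)
    return codebook, indices
```

Passing the clustering routines in as arguments keeps the loop independent of how the centroids are actually found.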

FIG. 4 shows details of step 106. Referring to FIG. 4, the means of the i-th dimension kernels are clustered by the following steps. At step 130, an iteration control variable j is initialized to zero, and another variable Q_old is initialized to the largest value the computer can handle.

At step 132, one kernel is added to the set of clustering kernels. That is, clustering starts by clustering the means of the i-th dimension Gaussian kernels into a single clustering kernel. At step 134, variable j is incremented by one.

At step 136, the optimal mean cluster kernel (or kernels) is computed using the KLD-based optimization described above. The resulting KLD value is stored as Q_new.

At step 138, ΔQ is computed as the difference between Q_old and Q_new, that is, ΔQ = Q_old - Q_new.

At step 140, it is determined whether ΔQ is smaller than a predetermined threshold δ. If ΔQ is smaller than δ, control proceeds to step 144. Otherwise, control proceeds to step 142.

At step 142, the value of Q_new is stored as Q_old, and control returns to step 132.

At step 144, the kernels obtained in the j-th iteration are selected as the optimal mean cluster kernels. After step 144, control exits this routine and returns to step 108 of FIG. 3.
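The stopping rule of FIG. 4 can be sketched as a loop that adds one cluster at a time until the improvement ΔQ falls below δ. In this sketch `grow_clusters` and `kmeans_1d` are hypothetical names; `kmeans_1d` is a plain squared-error k-means used only as a stand-in for the KLD-based optimization of step 136, and the check against the number of values is an added safeguard that the flowchart does not mention.

```python
import sys

def kmeans_1d(values, k, iters=20):
    """Stand-in clusterer: returns (centroids, total squared error)."""
    vals = sorted(values)
    n = len(vals)
    cents = [vals[(2 * c + 1) * n // (2 * k)] for c in range(k)]  # quantile init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vals:
            nearest = min(range(k), key=lambda c: (v - cents[c]) ** 2)
            groups[nearest].append(v)
        cents = [sum(g) / len(g) if g else cents[c] for c, g in enumerate(groups)]
    q = sum(min((v - c) ** 2 for c in cents) for v in vals)
    return cents, q

def grow_clusters(values, cluster_k, delta):
    q_old = sys.float_info.max          # step 130: largest representable value
    j = 0
    while True:
        j += 1                          # steps 132 and 134: one more cluster
        centroids, q_new = cluster_k(values, j)   # step 136
        dq = q_old - q_new              # step 138
        if dq < delta or j >= len(values):        # step 140, plus safeguard
            return centroids            # step 144: keep the j-th result
        q_old = q_new                   # step 142
```

Note that, as in the flowchart, the result returned is the one from the iteration whose improvement fell below the threshold, so a small δ yields more clusters than a large one.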

FIG. 5 shows details of step 108 of FIG. 3. During this process, the kernel means are fixed at zero. Referring to FIG. 5, the variances of the i-th dimension kernels are clustered by the following steps. At step 160, an iteration control variable j is initialized to zero, and another variable Q_old is initialized to the largest value the computer can handle.

At step 162, one kernel is added to the set of clustering kernels. That is, clustering starts by clustering the variances of the i-th dimension Gaussian kernels into a single clustering kernel. At step 164, variable j is incremented by one.

At step 166, the optimal variance cluster kernel (or kernels) is computed using the KLD-based optimization described above. The resulting KLD value is stored as Q_new.

At step 168, ΔQ is computed as the difference between Q_old and Q_new.

At step 170, it is determined whether ΔQ is smaller than δ. If ΔQ is smaller than δ, control proceeds to step 174. Otherwise, control proceeds to step 172.

At step 172, the value of Q_new is stored as Q_old, and control returns to step 162.

At step 174, the kernels obtained in the j-th iteration are selected as the optimal variance cluster kernels. After step 174, control exits this routine and returns to step 116 of FIG. 3.
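One way step 166 could be realized for a fixed number of clusters is a Lloyd-style alternation under the zero-mean symmetric KLD, sketched below. The alternation scheme, the quantile initialization, and the names `cluster_variances` and `zero_mean_skld` are assumptions; the document states only that a KLD-based optimization is used. The centroid update follows from the diagonal centroid condition with all means set to zero.

```python
import math

def zero_mean_skld(var_a, var_b):
    """Symmetric KLD between two zero-mean scalar Gaussians."""
    return 0.5 * (var_a / var_b + var_b / var_a - 2.0)

def cluster_variances(variances, k, iters=20):
    """Return (variance centroids, total zero-mean symmetric KLD)."""
    vals = sorted(variances)
    n = len(vals)
    cents = [vals[(2 * c + 1) * n // (2 * k)] for c in range(k)]  # quantile init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vals:
            nearest = min(range(k), key=lambda c: zero_mean_skld(v, cents[c]))
            groups[nearest].append(v)
        # Zero-mean centroid: sqrt(sum of variances / sum of inverse variances).
        cents = [math.sqrt(sum(g) / sum(1.0 / x for x in g)) if g else cents[c]
                 for c, g in enumerate(groups)]
    q = sum(min(zero_mean_skld(v, c) for c in cents) for v in vals)
    return cents, q
```

The returned total plays the role of Q_new in the loop of FIG. 5.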

[Computer implementation]
The embodiment described above can be implemented by a computer system and a computer program executed on that system. The control structure of the software has been described with reference to FIGS. 3 to 5. FIG. 6 shows the appearance of computer system 330 of this embodiment, and FIG. 7 is a block diagram of system 330.

Referring to FIG. 6, computer system 330 includes a computer 340 having an FD (flexible disk) drive 352 and a CD-ROM (compact disc read-only memory) drive 350, a keyboard 346, a mouse 348, and a monitor 342.

Referring to FIG. 7, computer 340 further includes: a CPU (central processing unit) 356; a bus 366 connected to CPU 356, CD-ROM drive 350 and FD drive 352; a ROM (read-only memory) 358 storing a boot-up program and the like; a RAM (random access memory) 360 connected to CPU 356 for storing application program instructions, a system program and data; and a printer 344.

Although not shown here, computer 340 may further include a network adapter board providing a connection to a local area network (LAN).

A program that causes computer system 330 to perform the method of compressing an HMM model is stored on a CD-ROM 362, an FD 364 or the like. When the medium is inserted into CD-ROM drive 350 or FD drive 352, the program is transferred to a hard disk 354. Alternatively, the program may be sent to computer 340 over a network (not shown) and stored on hard disk 354. The program is loaded into RAM 360 when executed. The program may also be loaded into RAM 360 directly from CD-ROM 362, FD 364 or the network.

The program described with reference to FIGS. 3 to 5 includes a plurality of instructions that cause computer 340 to perform the method of this embodiment. Some of the basic functions needed to perform the method are provided by the operating system (OS) running on computer 340, by third-party programs, or by modules such as an HMM toolkit installed on computer 340, so the program need not include every basic function needed to implement the embodiment. The program need only include those instructions that perform the compression by calling the appropriate functions or "tools" in a controlled manner so that the desired result is obtained. Since the operation of computer system 330 itself is well known, its description is omitted.

[Operation]
ASR system 60 of this embodiment operates as follows. Referring to FIG. 2, training corpus 70 is prepared. Training corpus 70 contains segmented speech data and the associated phonetic transcriptions.

Training module 72 trains HMM acoustic model 74 using training corpus 70 as training data. Tools for training HMM models are readily available, so the details of training are not described here.

Next, compression module 76 compresses HMM acoustic model 74 as follows. First, compression module 76 clusters the means of the Gaussian kernels in the first parameter dimension of each state of HMM acoustic model 74 so that the KLD is minimized, producing an optimal number of mean cluster kernels (FIG. 3, step 106). Next, compression module 76 clusters the variances of the Gaussian kernels in the first parameter dimension of each state of HMM acoustic model 74 so that the KLD is minimized, producing an optimal number of variance cluster kernels (FIG. 3, step 108). During this step, the kernel means are fixed at zero.

At step 116, the cluster kernels thus obtained are written to the codebook page for the first dimension. Next, each Gaussian kernel of the first dimension is redefined by the mean cluster kernel whose mean is closest to the mean of the original Gaussian kernel and by the variance cluster kernel whose variance is closest to the variance of the original Gaussian kernel.

The above steps are then repeated for the other parameter dimensions of each state of HMM acoustic model 74, and the HMM acoustic model is thereby compressed. When all parameter dimensions of the HMM acoustic model have been compressed, the compressed model is output from compression module 76 as HMM acoustic model 78.

Once HMM acoustic model 78 and language model 80 are available, ASR module 84 is ready to receive input utterance 82. It recognizes the speech data using HMM acoustic model 78 and language model 80, and outputs the recognized text 86.

[Experimental results]
The clustering method of this embodiment was tested on the DARPA (Defense Advanced Research Projects Agency) 991-word Resource Management database. The standard SI-109 training data set (3,990 utterances) was used. Using the CMU (Carnegie Mellon University) 48-phone set, context-independent (CI) phone models were built, each with 3 states and 12 Gaussian mixture components per state. The features were conventional 39-dimensional mel-frequency cepstral coefficients (MFCC): 12 static MFCCs plus log energy, together with their first- and second-order derivatives.

The Feb89 test set was used for evaluation, with the standard word-pair grammar of perplexity 60. The baseline recognition performance of the original, unquantized HMM was 92.82% word accuracy.

Three schemes were compared: (1) separate clustering, in which the means and variances are clustered separately (this embodiment); (2) joint clustering of means and variances (Non-Patent Document 4); and (3) clustering of the means only, with the original variances retained (Non-Patent Document 3). FIG. 8 shows the results.

As can be seen in FIG. 8, clustering the means and variances separately outperformed clustering them jointly. Moreover, when the variances were clustered separately, recognition performance with 16 or more clusters per dimension was equal to (or slightly better than) that obtained with unclustered variances.

[Storage and computational requirements]

Table 1 shows the storage and computation of the optimally clustered model according to this embodiment (separate clustering of means and variances) as percentages of those of the original, unclustered HMM. Storage was reduced to 12% to 24% of the original, depending on the number of clusters. As for computation, the log likelihood of the j-th state is calculated as follows.
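The equation referred to here did not survive in this text. A standard form for the log likelihood of a Gaussian-mixture state with diagonal covariances, consistent with the "third term" quoted in the following paragraph, would be the one below. The mixture-weight symbol $c_{jm}$, the number of mixture components $M$, and the feature dimension $D$ are assumed notation and are not taken from the source.

$$
\log b_j(o_t) = \log \sum_{m=1}^{M} \exp\!\left( \log c_{jm} \;-\; \frac{1}{2}\sum_{i=1}^{D} \log\!\left(2\pi\sigma_{jmi}^{2}\right) \;-\; \sum_{i=1}^{D} \frac{(o_{ti}-\mu_{jmi})^{2}}{2\sigma_{jmi}^{2}} \right)
$$

In this form the first two terms inside the exponential do not depend on the observation $o_t$, so only the third term has to be evaluated for every frame.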

Because the means and variances are clustered, the third term (o_{ti} - μ_{jmi})² / (2σ_{jmi}²) can be computed in advance and shared among different output pdfs. As shown in Table 1, multiplications/divisions were reduced to 2% to 19% of the original and additions/subtractions to 52% to 54%, with only slight degradation in performance.
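The sharing described above can be sketched in Python as follows. For each frame, a table of the third term is filled once per dimension for every (mean centroid, variance centroid) pair; each kernel then needs only lookups and additions. The table layout, the function names, and the toy values are assumptions made for illustration and do not come from the source.

```python
def third_term_table(obs, mean_cb, var_cb):
    """Precompute table[i][a][b] = (obs[i] - mean_cb[i][a])**2 / (2 * var_cb[i][b])
    for every dimension i, mean centroid a and variance centroid b."""
    return [
        [[(o - mu) ** 2 / (2.0 * var) for var in var_cb_i] for mu in mean_cb_i]
        for o, mean_cb_i, var_cb_i in zip(obs, mean_cb, var_cb)
    ]


def kernel_third_term(table, mean_idx, var_idx):
    """Sum the precomputed per-dimension terms for one kernel.
    Only table lookups and additions are needed here."""
    return sum(table[i][a][b] for i, (a, b) in enumerate(zip(mean_idx, var_idx)))


# Toy frame: 2 dimensions, 2 mean centroids and 2 variance centroids per dimension.
obs = [0.5, -1.0]
mean_cb = [[0.0, 1.0], [-1.0, 1.0]]
var_cb = [[0.5, 2.0], [1.0, 4.0]]
table = third_term_table(obs, mean_cb, var_cb)

# A kernel using mean centroid 1 / variance centroid 0 in dimension 0
# and mean centroid 1 / variance centroid 1 in dimension 1.
shared = kernel_third_term(table, mean_idx=[1, 1], var_idx=[0, 1])

# The same quantity computed directly from the kernel's own parameters.
direct = (0.5 - 1.0) ** 2 / (2 * 0.5) + (-1.0 - 1.0) ** 2 / (2 * 4.0)
```

In this sketch the number of divisions per frame is the feature dimension times the number of mean centroids times the number of variance centroids, independent of how many kernels the model contains.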

[Conclusion]
Multivariate, diagonal-covariance Gaussian kernels of an HMM were optimally clustered in each scalar dimension by minimizing the corresponding symmetric Kullback-Leibler divergence. Clustering the means and variances separately preserved the high model resolution of the original HMM. In the evaluation on the Resource Management database, storage and computation were reduced considerably without significant degradation in performance.
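For reference, the symmetric Kullback-Leibler divergence between two scalar Gaussians can be written in closed form. The expression below takes the symmetric divergence as the sum of the two directed divergences, which is an assumption (some authors use the average); the symbols $p$, $q$ and $J$ are notation introduced here.

$$
D(p\,\|\,q) = \frac{1}{2}\left[\log\frac{\sigma_q^{2}}{\sigma_p^{2}} + \frac{\sigma_p^{2} + (\mu_p-\mu_q)^{2}}{\sigma_q^{2}} - 1\right],
\qquad p = \mathcal{N}(\mu_p,\sigma_p^{2}),\; q = \mathcal{N}(\mu_q,\sigma_q^{2})
$$

$$
J(p,q) = D(p\,\|\,q) + D(q\,\|\,p) = \frac{1}{2}\left[\frac{\sigma_p^{2} + (\mu_p-\mu_q)^{2}}{\sigma_q^{2}} + \frac{\sigma_q^{2} + (\mu_p-\mu_q)^{2}}{\sigma_p^{2}}\right] - 1
$$

For Gaussians with diagonal covariance, the divergence of the full vector distribution is the sum of the per-dimension values $J_i$, which is why the clustering can be carried out independently in each scalar dimension.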

The embodiments disclosed herein are merely illustrative, and the present invention is not limited to the embodiments described above. The scope of the present invention is defined by each of the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of the claims.

A diagram schematically showing the clustering of means and variances according to one embodiment of the present invention.
A block diagram showing the structure of an ASR system 60 according to one embodiment of the present invention.
A flowchart of the clustering program according to the embodiment.
A flowchart of mean clustering according to the embodiment.
A flowchart of variance clustering according to the embodiment.
A diagram showing a computer system 330 that executes the clustering program.
A block diagram showing the structure of the computer system 330.
A diagram showing ASR performance according to the embodiment of the present invention.
A diagram schematically showing clustering according to the prior art.

Explanation of symbols

20, 22, 24, 26, 28, 30 Gaussian kernel
40, 42 Mean cluster kernel
50, 52 Variance cluster kernel
60 ASR system
70 Training corpus
72 Training module
74 HMM acoustic model
76 Compression module
78 Compressed HMM acoustic model
80 Language model
84 ASR module

Claims

1. A method of compressing a plurality of probability density function kernels of a probability model, each of the plurality of probability density function kernels being defined by a first parameter and a second parameter, the method comprising the steps of:
clustering the first parameters of the plurality of probability density function kernels into one or more first centroid kernels;
clustering the second parameters of the plurality of probability density function kernels into one or more second centroid kernels; and
redefining each of the plurality of probability density function kernels by the first parameter of that one of the first centroid kernels whose first parameter is nearest to the first parameter of the probability density function kernel, and by the second parameter of that one of the second centroid kernels whose second parameter is nearest to the second parameter of the probability density function kernel.

2. The method of claim 1, wherein each of the plurality of probability density function kernels includes a Gaussian probability density function kernel, and the first and second parameters include the mean and the variance of the Gaussian probability density function kernel.

3. The method of claim 2, wherein the step of clustering the first parameters comprises clustering the means of the Gaussian probability density function kernels such that the error between the redefined Gaussian probability density function kernels and the respective corresponding original probability density function kernels is minimized.

4. The method of claim 2 or claim 3, wherein the step of clustering the second parameters comprises clustering the variances of the Gaussian probability density function kernels such that the sum of errors between the redefined Gaussian probability density function kernels and the respective corresponding original probability density function kernels having zero mean is minimized.

5. The method of claim 3 or claim 4, wherein the error between two given probability density functions is calculated as the symmetric Kullback-Leibler divergence between the two given probability density functions.

6. A computer program that, when executed on a computer, causes the computer to execute all the steps of the method according to any one of claims 1 to 5.