JPH064092A

JPH064092A - HMM generating device, HMM storage device, likelihood calculating device, and recognizing device

Info

Publication number: JPH064092A
Application number: JP4159834A
Authority: JP
Inventors: Hidekazu Tsuboka; 英一坪香
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-06-18
Filing date: 1992-06-18
Publication date: 1994-01-14

Abstract

PURPOSE: To obtain a recognition device using HMMs (Hidden Markov Models) that achieves high recognition accuracy with a small amount of computation. CONSTITUTION: The device comprises continuous probability distribution HMM generating means 101, 102, and 104, which generate a continuous probability distribution HMM; a clustering means 106, which clusters the training vectors; and an occurrence degree calculating means 108, which calculates the degree of occurrence of each cluster in each state of the HMM from the probability density function of each state of the continuous HMM. These values are used as the occurrence degrees of the corresponding cluster labels in each state to generate a discrete probability distribution HMM.

Description

Detailed Description of the Invention

【０００１】[0001]

[Industrial Field of Application] The present invention relates to an HMM (Hidden Markov Model) creating device, an HMM storage device, a likelihood calculating device, and a recognition device applicable to pattern recognition such as speech recognition.

【０００２】[0002]

[Prior Art] HMMs are applicable to the processing of time-series signals in general, but for convenience of explanation, an example applied to speech recognition is described below.

【０００３】First, a speech recognition device using HMMs is described.

【０００４】FIG. 3 is a block diagram of a speech recognition device using HMMs. The speech analysis unit 201 converts the input speech signal into a feature vector at fixed time intervals (called frames), for example every 10 msec, by a well-known method such as a filter bank, Fourier transform, or LPC analysis. The input speech signal is thus converted into a sequence of feature vectors Y = (y(1), y(2), ..., y(T)), where T is the number of frames. The codebook 202 holds labeled representative vectors. The vector quantization unit 203 replaces each vector of the sequence Y with the label of the representative vector registered in the codebook 202 that is closest to it. The HMM creating unit 204 creates, from training data, an HMM for each word of the recognition vocabulary. That is, to create the HMM for word v, the structure of the HMM (the number of states and the transitions permitted between them) is first chosen appropriately; then, from the label sequences obtained as described above from many utterances of word v, the state transition probabilities of the model and the occurrence probabilities of the labels emitted on the state transitions are determined so that the probability of those label sequences is as high as possible. The HMM storage unit 205 stores the HMM thus obtained for each word. The likelihood calculation unit 206 calculates, for the label sequence of an unknown input utterance to be recognized, the likelihood of each model stored in the HMM storage unit 205. The comparison and determination unit 207 selects as the recognition result the word corresponding to the model with the highest likelihood among those obtained by the likelihood calculation unit 206.
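The labeling step performed by the vector quantization unit 203 can be sketched as follows. This is a minimal illustration, assuming squared Euclidean distance as the closeness measure and using integer indices as labels; the function names, the tiny codebook, and the frames are hypothetical and do not come from the patent.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook):
    """Replace each feature vector by the label (index) of its nearest
    representative vector in the codebook."""
    return [
        min(range(len(codebook)), key=lambda m: squared_distance(y, codebook[m]))
        for y in frames
    ]


# Hypothetical 2-dimensional codebook with three representative vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
# Hypothetical feature vector sequence Y = (y(1), ..., y(4)).
frames = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.3), (1.1, 0.8)]

labels = quantize(frames, codebook)  # label sequence O = (o(1), ..., o(4))
print(labels)
```

Once the frames have been mapped to labels, the likelihood calculation only needs table lookups indexed by these labels.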

【０００５】Recognition with these HMMs is carried out as follows. Let the label sequence obtained for the unknown input be O = (o(1), o(2), ..., o(T)), let the model corresponding to word v be λ^v, and let X = (x(1), x(2), ..., x(T)) be an arbitrary state sequence of length T generated by model λ^v. The likelihood of λ^v for the label sequence O is then [Exact solution]

【０００６】[0006]

【数１】 [Equation 1]

【０００７】[Approximate solution]

【０００８】[0008]

【数２】 [Equation 2]

【０００９】or, taking the logarithm,

【００１０】[0010]

【数３】 [Equation 3]

【００１１】as defined by these equations. Here, P(x, y | λ^v) is the joint probability of x and y under model λ^v.
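The equation images for (Equation 1) to (Equation 3) are not reproduced in this text. Judging only from the surrounding definitions (an exact solution, an approximate solution, and its logarithm, all built on the joint probability of O and X), they presumably have the following standard forms. The names L_1, L_2, and L_3 are assumptions introduced here for illustration.

```latex
\begin{align}
L_1(v) &= \sum_{X} P(O, X \mid \lambda^{v})      && \text{(exact solution: sum over all state sequences)} \\
L_2(v) &= \max_{X} P(O, X \mid \lambda^{v})      && \text{(approximate solution: best state sequence only)} \\
L_3(v) &= \max_{X} \log P(O, X \mid \lambda^{v}) && \text{(approximate solution in the log domain)}
\end{align}
```

Under this reading, the logarithmic form changes only the scale of the score, not which state sequence or which word attains the maximum.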

【００１２】Therefore, using (Equation 1) for example, if we set

【００１３】[0013]

【数４】 [Equation 4]

【００１４】then v^ is the recognition result. The same applies when (Equation 2) or (Equation 3) is used.

【００１５】P(O, X | λ) is obtained as follows.

【００１６】Suppose that, for the states q_i (i = 1 to I) of HMM λ, the occurrence probability b_i(o) of label o in each state q_i and the transition probability a_ij from state q_i (i = 1 to I) to state q_j (j = 1 to I+1) are given. Then the joint probability of the state sequence X = (x(1), x(2), ..., x(T+1)) and the label sequence O = (o(1), o(2), ..., o(T)) under HMM λ

【００１７】[0017]

【数５】 [Equation 5]

【００１８】can be defined as in (Equation 5). Here π_x(1) is the initial probability of state x(1). In addition, x(T+1) = I+1 is the final state, in which no label is emitted.
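As an illustration of how the joint probability, the exact likelihood, and the best-path approximation relate, here is a minimal sketch. It assumes that (Equation 5), whose image is not reproduced in this text, is the usual product P(O, X | λ) = π_x(1) × Π_t a_x(t)x(t+1) × b_x(t)(o(t)), with the label at frame t emitted by state x(t) and the final state I+1 emitting nothing. States are numbered from 0 here, and the toy model values are hypothetical.

```python
def joint_probability(pi, a, b, states, labels):
    """P(O, X | lambda) for a state sequence x(1..T+1) and labels o(1..T)."""
    p = pi[states[0]]
    for t, o in enumerate(labels):
        p *= a[states[t]][states[t + 1]] * b[states[t]][o]
    return p


def path_score(pi, a, b, labels, combine):
    """Propagate scores frame by frame; combine=sum gives the exact
    likelihood, combine=max gives the best-path approximation."""
    n = len(pi)                      # number of emitting states I
    score = list(pi) + [0.0]         # index n is the non-emitting final state
    for o in labels:
        score = [
            combine(score[i] * b[i][o] * a[i][j] for i in range(n))
            for j in range(n + 1)
        ]
    return score[n]                  # paths must end in the final state


# Hypothetical left-to-right model: 2 emitting states, 2 labels.
pi = [1.0, 0.0]
a = [[0.5, 0.5, 0.0],               # a[i][j], j = 2 is the final state
     [0.0, 0.5, 0.5]]
b = [[0.8, 0.2],                    # b[i][o]
     [0.3, 0.7]]
labels = [0, 0, 1]

exact = path_score(pi, a, b, labels, sum)    # sum over all state sequences
best = path_score(pi, a, b, labels, max)     # single best state sequence
one_path = joint_probability(pi, a, b, [0, 0, 1, 2], labels)
print(exact, best, one_path)
```

In this toy model only two state sequences have nonzero probability, so the exact value is the sum of their joint probabilities and the approximate value is the larger of the two.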

【００１９】In this example the input feature vector y was converted into a label, but there is also a method that uses the feature vector y directly and gives each state a probability density function of y in place of the label occurrence probabilities. In that case, the probability density b_i(y) of the feature vector y is used in (Equation 5) in place of the occurrence probability b_i(o) of label o in state q_i. (Hereinafter, b_i(z) denotes the probability that z occurs in state i when z is a label, and the probability density of z when z is a vector.) (Equation 1), (Equation 2), and (Equation 3) then become as follows. [Exact solution]

【００２０】[0020]

【数６】 [Equation 6]

【００２１】[Approximate solution]

【００２２】[0022]

【数７】 [Equation 7]

【００２３】Alternatively, taking the logarithm gives the following equation.

【００２４】[0024]

【数８】 [Equation 8]

【００２５】Whichever of the above methods is used, the final recognition result is obtained as follows: if an HMM λ^v has been prepared for each word v, for v = 1 to V, then for the input speech signal Y,

【００２６】[0026]

【数９】 [Equation 9]

【００２７】or

【００２８】[0028]

【数１０】 [Equation 10]

【００２９】is the recognition result for Y. Of course, Y here is the input label sequence, the feature vector sequence, or the like, depending on the method used.

【００３０】[0030]

[Problems to Be Solved by the Invention] In the conventional examples above, the type that converts the input feature vectors into labels is hereinafter called a discrete probability distribution HMM, and the type that uses the input feature vectors directly is called a continuous probability distribution HMM. Their characteristics are as follows.

【００３１】In a discrete probability distribution HMM, when the likelihood of a model for an input label sequence is computed, the degree of occurrence b_i(C_m) of each label in each state is obtained simply by reading it from a storage device in which it has been stored in advance in association with the label. This has the advantage that the amount of computation is very small; on the other hand, recognition accuracy suffers because of the error introduced by quantization. To avoid this, the number of labels (the number of clusters) must be increased, but as it increases, the number of training patterns needed to train the model becomes enormous. If the number of training patterns is insufficient, b_i(C_m) is frequently estimated as 0 and correct estimation becomes impossible. For example, the following can occur.

【００３２】A codebook is created by converting utterances of all the words to be recognized, spoken by many speakers, into feature vector sequences, clustering the set of these feature vectors, and labeling each cluster. Each cluster has a representative vector called its centroid, which is usually the expected value of the vectors assigned to that cluster. The codebook stores these centroids in a form that can be looked up by label.

【００３３】Suppose that the recognition vocabulary contains, for example, the word "Osaka", and consider creating the model for it. Speech samples of the word "Osaka" uttered by many speakers are converted into feature vector sequences, each feature vector is compared with the centroids, and the label of the nearest centroid becomes the quantized value of that feature vector. In this way each speech sample of "Osaka" is converted into a label sequence. By estimating the HMM parameters from the resulting label sequences so that the likelihood of those sequences is maximized, the model for the word "Osaka" is obtained. The well-known Baum-Welch method or the like can be used for this estimation.

【００３４】In this case, some labels in the codebook may not appear in any of the training label sequences for the word "Osaka". The occurrence probability of such a label is estimated as 0 during training. It is therefore quite possible that the label sequence obtained from an utterance of "Osaka" at recognition time contains a label that did not appear in the label sequences used to create the "Osaka" model. In that case, the probability that the label sequence of the "Osaka" uttered at recognition time is generated by the "Osaka" model becomes 0. Even then, although the labels differ, the feature vectors before conversion into labels may be quite close to the speech samples used to train the model, close enough that the utterance should be recognized as "Osaka" when viewed at the vector level. Because the same word is being uttered, the vectors are similar; yet a slight difference can cause them to be converted into entirely different labels, and it is easy to see that this degrades recognition accuracy. The problem arises more often as the number of clusters increases and as the amount of training data decreases.
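The zero-probability effect described above can be reproduced with a very small sketch. It is a deliberate simplification, assuming a single state whose label probabilities are estimated as relative frequencies; the label sequences and function names are hypothetical.

```python
from collections import Counter


def estimate_label_probs(training_labels, num_labels):
    """Relative-frequency estimate of the label occurrence probabilities.
    Labels never seen in training receive probability 0."""
    counts = Counter(training_labels)
    total = len(training_labels)
    return [counts[m] / total for m in range(num_labels)]


def sequence_probability(probs, labels):
    """Probability of a label sequence under a single-state model."""
    p = 1.0
    for o in labels:
        p *= probs[o]
    return p


# Codebook with 4 labels; label 3 never occurs in the training data.
probs = estimate_label_probs([0, 0, 1, 1, 1, 2], num_labels=4)

seen = sequence_probability(probs, [0, 1, 2])     # only labels seen in training
unseen = sequence_probability(probs, [0, 1, 3])   # one label unseen in training
print(seen, unseen)
```

A single unseen label drives the whole sequence probability to 0, regardless of how close the underlying feature vector was to the training data.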

【００３５】To eliminate this problem, measures such as smoothing or interpolation are needed for labels that do not appear in (are not contained in) the training set. Various smoothing and interpolation methods have been proposed, including reducing the number of parameters by means of the concept called "tying", replacing an estimated zero probability with a small value instead of 0, and blurring the cluster boundaries as in fuzzy vector quantization; none of them, however, fundamentally solves the problem. Moreover, they involve elements that must be chosen empirically case by case, and there is no theoretical criterion for choosing them.

【００３６】A continuous probability distribution HMM, on the other hand, specifies the shape of the distribution in advance as a function, such as a normal distribution, and estimates the parameters of that function from the training data. The number of parameters to be estimated is therefore small, the parameters can be estimated accurately from fewer training patterns than in the discrete type, and there is no need to consider smoothing or interpolation. It has been reported that recognition rates higher than those of the discrete type are generally obtained.

【００３７】For reference, the numbers of parameters of the discrete and continuous types can be compared for the 4-state, 3-loop HMM shown in FIG. 4, for example as follows. For the discrete type, with 256 kinds of labels, each model needs 256 × 3 = 768 label occurrence probabilities and 6 transition probabilities, 774 in total. For the continuous type, with a 10-dimensional normal distribution, each model needs 10 × 3 = 30 values for the mean vectors, 55 × 3 = 165 values for the variance-covariance matrices (since they are symmetric), and 6 transition probabilities, 201 in total. The number of parameters to be estimated for the continuous type is thus roughly one quarter of that for the discrete type.
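The parameter counts in this comparison can be checked with a few lines of arithmetic. The sketch assumes three emitting states and six transitions for the 4-state, 3-loop model, and a full symmetric covariance matrix per state; the function names are hypothetical.

```python
def discrete_param_count(num_labels, emitting_states, transitions):
    """One occurrence probability per label per emitting state, plus transitions."""
    return num_labels * emitting_states + transitions


def continuous_param_count(dim, emitting_states, transitions):
    """Mean vector plus the distinct entries of a symmetric covariance
    matrix per emitting state, plus transitions."""
    mean_params = dim
    cov_params = dim * (dim + 1) // 2      # 55 for dim = 10
    return (mean_params + cov_params) * emitting_states + transitions


discrete = discrete_param_count(num_labels=256, emitting_states=3, transitions=6)
continuous = continuous_param_count(dim=10, emitting_states=3, transitions=6)
ratio = continuous / discrete
print(discrete, continuous, round(ratio, 3))
```

Summing the listed items gives 774 parameters for the discrete model and 201 for the continuous one, a ratio of about 0.26.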

【００３８】However, although the continuous type is superior in recognition accuracy, it requires far more computation than the discrete type. If the input feature vector y(t) follows, in state i, a normal distribution with mean vector μ_i and variance-covariance matrix Σ_i, computing the occurrence probability (density) of y(t) in state i requires evaluating (y(t) − μ_i)^T Σ_i^-1 (y(t) − μ_i). For a 10-dimensional continuous HMM, this evaluation alone takes 110 multiplications, and for one model it is repeated (number of states × number of input frames) times. Assuming the above model and 50 input frames, the number of multiplications needed per model to evaluate (y(t) − μ_i)^T Σ_i^-1 (y(t) − μ_i) is 110 × 3 × 50 = 16,500, and with a vocabulary of 500 words this is multiplied by a further 500. In that case, this part alone requires 8.25 million multiplications.
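The figure of 110 multiplications can be verified by counting them while evaluating the quadratic form. This is a sketch under the assumption that the inverse covariance matrix is precomputed and that the form is evaluated as a matrix-vector product followed by a dot product; the function name and test values are hypothetical.

```python
def quadratic_form(y, mu, inv_cov):
    """Evaluate (y - mu)^T inv_cov (y - mu) and count the multiplications."""
    d = [yi - mi for yi, mi in zip(y, mu)]
    multiplications = 0
    projected = []
    for row in inv_cov:                     # inv_cov * d: dim * dim multiplications
        s = 0.0
        for r, dj in zip(row, d):
            s += r * dj
            multiplications += 1
        projected.append(s)
    q = 0.0
    for di, pi_ in zip(d, projected):       # d . (inv_cov * d): dim multiplications
        q += di * pi_
        multiplications += 1
    return q, multiplications


dim = 10
identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
q, per_evaluation = quadratic_form([1.0] * dim, [0.0] * dim, identity)

per_model = per_evaluation * 3 * 50         # 3 states, 50 frames
per_vocabulary = per_model * 500            # 500 words
print(per_evaluation, per_model, per_vocabulary)
```

The count grows with the square of the feature dimension and linearly with the number of states, frames, and words.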

【００３９】In the discrete type, once vector quantization has been completed, it is only necessary to read the occurrence probability of each label from the storage device according to the label, as described above. The computation needed to vector-quantize y(t) is, in the above example, the calculation of the distance or similarity between y(t) and each of the 256 representative vectors. If the distance is the squared Euclidean distance, labeling y(t) requires 256 times the cost of 10 subtractions, 10 multiplications, and 10 additions. Counting multiplications only, 50 frames therefore require 10 × 256 × 50 = 128,000 multiplications. If vector quantization is performed by the method called binary search, 256 is replaced by 2 log₂ 256 = 16, giving 10 × 16 × 50 = 8,000 multiplications.
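The vector quantization costs quoted above follow from simple counting, sketched here. The function names are hypothetical, and the binary search cost assumes two distance computations at each of the log₂ M levels of the search, which is what the figure 2 log₂ 256 = 16 suggests.

```python
import math


def full_search_multiplications(dim, codebook_size, frames):
    """Each frame is compared with every representative vector."""
    return dim * codebook_size * frames


def binary_search_multiplications(dim, codebook_size, frames):
    """Each frame is compared with two candidates at each tree level."""
    comparisons = 2 * int(math.log2(codebook_size))
    return dim * comparisons * frames


full = full_search_multiplications(dim=10, codebook_size=256, frames=50)
binary = binary_search_multiplications(dim=10, codebook_size=256, frames=50)
print(full, binary)
```

Both figures are independent of the vocabulary size, because quantization is done once per input utterance rather than once per word model.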

【００４０】Using the discrete type thus reduces the amount of computation dramatically. In the continuous type the amount of computation grows in proportion to the number of words to be recognized, whereas in the discrete type this computation is needed only once, when the input speech signal is vector-quantized, and does not change even if the number of words increases.

【００４１】In short, the discrete type requires little computation but has a problem with recognition accuracy, while the continuous type has good recognition accuracy but a problem with the amount of computation.

【００４２】In view of these problems of conventional HMMs, an object of the present invention is to provide an HMM creating device, an HMM storage device, a likelihood calculating device, and a recognition device that achieve high recognition accuracy with a small amount of computation.

【００４３】[0043]

[Means for Solving the Problems] The present invention is an HMM creating device comprising: continuous probability density distribution HMM creating means; clustering means for clustering a set of training vectors and assigning a label to each cluster; and label occurrence degree calculating means for calculating the degree of occurrence of each cluster, and hence of each label, in each state of the HMM from the training vectors contained in each cluster and the probability density function of each state of the continuous probability density distribution HMM; wherein a discrete probability distribution HMM is created by taking the output of the label occurrence degree calculating means as the degree of occurrence of each label in each state of the HMM.

【００４４】[0044]

[Operation] In the present invention, the continuous probability distribution HMM creating means obtains the probability density function of each state of the HMM; the clustering means clusters the set of training vectors and assigns a label to each cluster; and the label occurrence degree calculating means calculates the degree of occurrence of each cluster, and hence of each label, in each state of the HMM from the training vectors contained in each cluster and the probability density function of each state of the continuous probability density distribution HMM. A discrete probability distribution HMM is thereby created.

【００４５】[0045]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

【００４６】First, the symbols used hereinafter are defined. For simplicity, states q_i, q_j, and so on are written simply as i, j, and so on wherever no confusion arises. Model training is described for a word v; the superscript v is attached to a parameter where a distinction is necessary and is otherwise omitted. The symbols are as follows.

【００４７】
i = 1, 2, ..., I+1: the i-th state
[a_ij]: transition matrix
a_ij: transition probability from state i to state j
r: training pattern number for word v (r = 1, ..., R)
y^(r)(t): observation vector in frame t of training pattern r
o^(r)(t): observation label in frame t of training pattern r
b_i(y^(r)(t)): probability density in state i of the observation vector y^(r)(t) in frame t of training pattern r
b_i(o^(r)(t)): degree of occurrence (probability, probability density, etc.) in state i of the observation label o^(r)(t) in frame t of training pattern r
Y^(r) = (y^(r)(1), y^(r)(2), ..., y^(r)(T^(r))): vector sequence of training pattern r (r = 1, 2, ..., R)
O^(r) = (o^(r)(1), o^(r)(2), ..., o^(r)(T^(r))): the r-th label sequence for word v (r = 1, 2, ..., R)
X^(r) = (x^(r)(1), x^(r)(2), ..., x^(r)(T^(r)), x^(r)(T^(r)+1)): state sequence corresponding to Y^(r) or O^(r)
x^(r)(t): state in frame t of the r-th training pattern for word v
T^(r): number of frames of the r-th training pattern for word v
μ_i: mean vector of b_i(y)
Σ_i: variance-covariance matrix of b_i(y)
ξ_i: set of parameters defining the probability distribution of the observation vectors in state i (ξ_i = {μ_i, Σ_i})
λ_i = [ξ_i, {a_ij}_{j=1,...,I+1}]: set of parameters of state i
λ = {λ_i}: set of all parameters (the model with parameters λ is also called model λ)
P(Y | λ): probability density with which the observation vector sequence Y is generated by model λ
P(O | λ): probability with which the observation label sequence O is generated by model λ
π_i: probability that state i occurs at t = 1

First, a method of training the continuous probability distribution HMM corresponding to word v is described.

【００４８】The problem is to estimate the parameters λ that maximize the likelihood function P(Y^(1), Y^(2), ..., Y^(R) | λ) for the training patterns r = 1 to R prepared for word v.

【００４９】If the Y^(r) are mutually independent, the likelihood function

【００５０】[0050]

【数１１】 [Equation 11]

【００５１】is given by (Equation 11). Here, the following auxiliary function Q(λ, λ') is defined.

【００５２】[0052]

【数１２】 [Equation 12]

【００５３】The following then holds: if Q(λ, λ') ≥ Q(λ, λ), then P(Y^(1), ..., Y^(R) | λ') ≥ P(Y^(1), ..., Y^(R) | λ), with equality when λ' = λ. Hence, if

【００５４】[0054]

【数１３】 [Equation 13]

【００５５】can be obtained, then by repeatedly applying (Equation 13) with the substitution λ^* → λ, λ converges to a stationary point of P(Y^(1), ..., Y^(R) | λ), that is, a point giving a local maximum or a saddle point of P(Y^(1), ..., Y^(R) | λ). By repeating this operation until the rate of change of P(Y^(1), ..., Y^(R) | λ) falls to or below a predetermined threshold, a locally optimal solution is obtained.

【００５６】Next, a method of estimating the parameters using Q(λ, λ') is described.

【００５７】Rearranging (Equation 12) gives the following equation.

【００５８】[0058]

【数１４】 [Equation 14]

【００５９】From the preceding discussion, if Q(λ, λ') is regarded as a function of λ' and a λ' is found such that Q(λ, λ') > Q(λ, λ), that λ' is an updated version of λ. Since P(Y^(1), ..., Y^(R) | λ) is constant with respect to λ', it can be dropped, and, writing

【００６０】[0060]

【数３０】 [Equation 30]

【００６１】this is equivalent to finding a λ' such that Q'(λ, λ') > Q'(λ, λ). Here we define

【００６２】[0062]

【数１５】 [Equation 15]

【００６３】as in (Equation 15).

【００６４】(Equation 14) can be further rewritten as follows.

【００６５】[0065]

【数１６】 [Equation 16]

【００６６】Maximizing the first term on the right-hand side with respect to π_i' gives the re-estimate π_i^* of π_i:

【００６７】[0067]

【数１７】 [Equation 17]

【００６８】Maximizing the second term on the right-hand side with respect to a_ij' gives the re-estimate a_ij^* of a_ij:

【００６９】[0069]

【数１８】 [Equation 18]

【００７０】Maximizing the third term on the right-hand side with respect to μ_i' and Σ_i' gives the re-estimates μ_i^* and Σ_i^* of μ_i and Σ_i, respectively:

【００７１】[0071]

【数１９】 [Formula 19]

【００７２】[0072]

【数２０】 [Equation 20]

【００７３】Here, ξ^(r)_ij(t) is calculated as follows. Namely, if we set

【００７４】[0074]

【数２１】 [Equation 21]

【００７５】then

【００７６】[0076]

【数２２】 [Equation 22]

【００７７】holds.

【００７８】In this case, the recurrence relations

【００７９】[0079]

【数２３】 [Equation 23]

【００８０】[0080]

【数２４】 [Equation 24]

【００８１】hold. Therefore, (Equation 15) can be calculated by giving the parameters λ suitable initial values and then computing successively: α^(r)_j(t) according to (Equation 23) for t = 1 to T^(r)+1 and j = 1 to I+1, starting from α^(r)_1(1) = 1; and β^(r)_i(t) according to (Equation 24) for t = T^(r)+1 down to 1 and i = I down to 1, starting from β^(r)_I+1(T^(r)+1) = 1.
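The forward and backward passes can be sketched as follows. The images of (Equation 21) to (Equation 24) are not reproduced in this text, so the recurrences used here are assumptions: the standard forms that match the stated boundary conditions α_1(1) = 1 and β_I+1(T+1) = 1, with the frame at time t emitted by the state occupied at time t. Indices start at 0 in the code, and the emission densities are passed in as a precomputed table with hypothetical values.

```python
def forward_backward(a, dens):
    """a[i][j]: transition probability from emitting state i (0..I-1) to
    state j (0..I), where j = I is the non-emitting final state.
    dens[i][t]: density b_i(y(t)) of frame t in state i (t = 0..T-1).
    Returns alpha, beta, and xi[t][i][j] (normalized transition posteriors)."""
    num_states, num_frames = len(a), len(dens[0])
    alpha = [[0.0] * (num_states + 1) for _ in range(num_frames + 1)]
    alpha[0][0] = 1.0                            # start in the first state
    for t in range(1, num_frames + 1):
        for j in range(num_states + 1):
            alpha[t][j] = sum(alpha[t - 1][i] * a[i][j] * dens[i][t - 1]
                              for i in range(num_states))
    beta = [[0.0] * (num_states + 1) for _ in range(num_frames + 1)]
    beta[num_frames][num_states] = 1.0           # end in the final state
    for t in range(num_frames - 1, -1, -1):
        for i in range(num_states):
            beta[t][i] = sum(a[i][j] * dens[i][t] * beta[t + 1][j]
                             for j in range(num_states + 1))
    likelihood = alpha[num_frames][num_states]
    xi = [[[alpha[t][i] * a[i][j] * dens[i][t] * beta[t + 1][j] / likelihood
            for j in range(num_states + 1)]
           for i in range(num_states)]
          for t in range(num_frames)]
    return alpha, beta, xi


# Hypothetical model: 2 emitting states, 3 frames.
a = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]
dens = [[0.8, 0.8, 0.2],      # b_0(y(t)) for t = 0, 1, 2
        [0.3, 0.3, 0.7]]      # b_1(y(t))

alpha, beta, xi = forward_backward(a, dens)
print(alpha[-1][-1], beta[0][0])
```

Both passes yield the same total likelihood, and at every frame the transition posteriors sum to 1, which is a useful sanity check for an implementation.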

【００８２】The actual calculation procedure for parameter estimation is as follows.

【００８３】(1) L₁ = ∞
(2) For i, j = 1 to I, give suitable initial values to λ_i = {(a_ij)_{j=1,...,I+1}, μ_i, Σ_i}.

【００８４】(3) For r = 1 to R, t = 2 to T^(r), and i = 1 to I+1, compute α^(r)_i(t) according to (Equation 23) with λ = {λ_i}.

【００８５】(4) For r = 1 to R, t = 2 to T^(r), and i = 1 to I+1, compute β^(r)_i(t) and ξ^(r)_ij(t) according to (Equation 24) and (Equation 22), respectively, with λ = {λ_i}.

【００８６】(5) For r = 1 to R and i, j = 1 to I+1, compute the numerators of (Equation 18), (Equation 19), and (Equation 20), namely a_ij,num(r), μ_i,num(r), and Σ_i,num(r), and the denominator Den_i(r) = a_ij,denom(r) = μ_i,denom(r) = Σ_i,denom(r).

【００８７】(6) Compute the re-estimates a_ij^*, μ_i^*, Σ_i^* of a_ij, μ_i, Σ_i according to the following equation.

【００８８】[0088]

【数２５】 [Equation 25]

【００８９】(7) For i, j = 1 to I+1, obtain the re-estimated parameter set λ = {λ_i} by making the substitutions a_ij = a_ij^*, μ_i = μ_i^*, Σ_i = Σ_i^*.

【００９０】(8) For r = 1 to R, t = 2 to T^(r), and i = 1 to I+1, using the parameter set λ obtained in step (7), the quantity

【００９１】[0091]

【数２６】 [Equation 26]

【００９２】is computed.

【００９３】(9) If |L₁ − L₂| / L₁ > ε, set L₂ = L₁ and go to step (4); otherwise, stop.

【００９４】The ε in step (9) is a suitably small positive number that determines the tolerance for convergence; a practical value is chosen according to the circumstances.
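The overall control flow of steps (1) to (9) amounts to repeating a re-estimation step until the relative change of the likelihood score falls to ε or below. The sketch below shows only that loop. It assumes that the newly computed score becomes the reference value for the next comparison, and it replaces the actual re-estimation formulas, which are not reproduced in this text, by caller-supplied functions; the toy `reestimate` and `score` functions are purely hypothetical stand-ins.

```python
def train_until_converged(params, reestimate, score, eps=1e-4, max_iter=100):
    """Repeat re-estimation until the relative change of the score is <= eps."""
    previous = float("inf")                    # step (1): L1 = infinity
    current = None
    for _ in range(max_iter):
        params = reestimate(params)            # steps (3) to (7)
        current = score(params)                # step (8)
        converged = (previous != float("inf")
                     and abs(previous - current) <= eps * abs(previous))
        if converged:                          # step (9): stop
            break
        previous = current                     # step (9): keep iterating
    return params, current


# Toy stand-ins: each update moves the parameter halfway toward 3.0, and the
# score is a concave function with its maximum of -1.0 at 3.0.
def reestimate(p):
    return (p + 3.0) / 2.0


def score(p):
    return -((p - 3.0) ** 2) - 1.0


final_params, final_score = train_until_converged(0.0, reestimate, score)
print(final_params, final_score)
```

A `max_iter` cap is added as a practical safeguard; it is not part of the procedure described in the text.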

【００９５】A continuous probability distribution HMM is obtained as described above. The present invention derives a discrete probability distribution HMM from it by the following procedure.

【００９６】(1) Cluster the training vectors into M clusters, named C₁, C₂, ..., C_m, ..., C_M. Let the training vectors belonging to cluster C_m be y_m(1), y_m(2), ..., y_m(K^m).

【００９７】(2) Using the continuous HMM, determine the degree of occurrence of C_m (m = 1, ..., M) in each state of that HMM.

【００９８】The degree of occurrence of each label can be defined in various ways: (a) the probability density of the centroid of C_m in state i; (b) the mean or median of the probability densities of the training vectors belonging to C_m; or either (a) or (b) normalized so that the values sum to 1 over the clusters. When the mean is used in (b), it may be an arithmetic, geometric, or harmonic mean, among others. As one embodiment of the present invention, the case of method (b) with the arithmetic mean and without normalization is described here. The b_i(y) used in the following equation is obtained from the estimated parameters of the continuous HMM. In this case, the degree of occurrence b_im of cluster C_m in state i is given by the following equation.

【００９９】[0099]

【数２７】 [Equation 27]
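With method (b) and the arithmetic mean, the degree of occurrence described above amounts to averaging b_i(y) over the K^m training vectors of cluster C_m. The following is a minimal sketch under stated assumptions: b_i(y) is taken to be a normal density with diagonal covariance (the text only specifies a single normal distribution per state), and all function names are illustrative rather than taken from the source. The optional normalization mentioned in the text is included as a separate helper.

```python
import math
from typing import List, Sequence


def gaussian_density(y: Sequence[float], mean: Sequence[float],
                     var: Sequence[float]) -> float:
    """b_i(y) for one state, assuming a diagonal-covariance normal density."""
    log_p = 0.0
    for y_d, mu_d, var_d in zip(y, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * var_d)
                         + (y_d - mu_d) ** 2 / var_d)
    return math.exp(log_p)


def occurrence_degree(cluster_vectors: Sequence[Sequence[float]],
                      mean: Sequence[float], var: Sequence[float]) -> float:
    """b_im: arithmetic mean of b_i(y) over the vectors of cluster C_m."""
    densities = [gaussian_density(y, mean, var) for y in cluster_vectors]
    return sum(densities) / len(densities)


def normalize_over_clusters(b_i: Sequence[float]) -> List[float]:
    """Optional variant: scale b_i1..b_iM so that they sum to 1."""
    total = sum(b_i)
    return [b / total for b in b_i]
```

Without normalization the values b_im are densities rather than probabilities, so they need not sum to 1 over the clusters; the embodiment in the text uses them in that unnormalized form.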

【０１００】For the clustering in step (1), a well-known method such as the LBG algorithm can be used (a detailed description is omitted here). The data to be clustered can be the entire set of feature vectors that make up the patterns of the word utterances v = 1 to V used to train the HMMs.
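Since the text omits the details of the LBG method, the following sketch shows one common form of it: start from the global centroid, repeatedly split every centroid into two slightly perturbed copies, and refine each codebook with nearest-neighbor assignment and centroid updates. The function name, the additive perturbation `delta`, and the fixed number of refinement passes are assumptions made for illustration, and this simplified version expects the number of clusters to be a power of two.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(v: Sequence[float], codebook: Sequence[Vector]) -> int:
    return min(range(len(codebook)), key=lambda m: _sq_dist(v, codebook[m]))


def lbg_codebook(vectors: Sequence[Sequence[float]], n_clusters: int,
                 delta: float = 0.01,
                 n_passes: int = 20) -> Tuple[List[Vector], List[int]]:
    """Return (centroids, label of each input vector)."""
    codebook = [_centroid(vectors)]
    while len(codebook) < n_clusters:
        # Split every centroid into two perturbed copies.
        codebook = [[c_d + sign * delta for c_d in c]
                    for c in codebook for sign in (1.0, -1.0)]
        # Refine: assign vectors to nearest centroid, then recompute centroids.
        for _ in range(n_passes):
            members: List[List[Sequence[float]]] = [[] for _ in codebook]
            for v in vectors:
                members[_nearest(v, codebook)].append(v)
            # An empty cluster keeps its previous centroid.
            codebook = [_centroid(vs) if vs else c
                        for vs, c in zip(members, codebook)]
    labels = [_nearest(v, codebook) for v in vectors]
    return codebook, labels
```

The returned centroids correspond to the y_0m of the text and the labels to the cluster names C_m, indexed from 0 here rather than from 1.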

【０１０１】FIGS. 1 and 2 show an embodiment of the HMM creating device of the present invention. Its configuration and operation are described together below with reference to the drawings.

【０１０２】In the feature extraction unit 101, the speech signals of the training words r = 1 to R^v, prepared for creating the model corresponding to word v (= 1, ..., V), are each converted by a well-known method, and the feature vector sequence

【０１０３】[0103]

【数２８】 [Equation 28]

【０１０４】is obtained.

【０１０５】The word pattern storage unit 102 is a storage means such as RAM, ROM, or a disk of some kind, and stores the R^v training words used to create model λ^v in the form of the feature vector sequences described above.

【０１０６】The buffer memory 103 retrieves the R^v word patterns for word v from the word pattern storage unit 102 and stores them temporarily.

【０１０７】The parameter estimation unit 104 executes steps (1) to (9) for creating model λ^v and thereby estimates the model λ^v corresponding to word v.

【０１０８】The first parameter storage unit 105 temporarily stores the re-estimated parameter values obtained in step (6). The parameter estimation unit 104 performs re-estimation using the values held in this parameter storage unit 105.

【０１０９】The clustering unit 106 takes the feature vectors stored in the word pattern storage unit 102, numbering

【０１１０】[0110]

【数２９】 [Equation 29]

【０１１１】in total, and clusters this set into M clusters. The label of the m-th cluster is denoted C_m and its centroid y_0m.

【０１１２】The cluster vector storage unit 107 stores the vectors and the centroid of each of the M clusters obtained by the clustering unit 106 in a form that can be looked up by m.

【０１１３】The label occurrence degree calculation unit 108 uses the probability density function of model λ^v stored in the parameter storage unit 105 to calculate the probability densities of the vectors y_m(1), ..., y_m(K^m) of cluster C_m stored in the cluster vector storage unit 107, for v = 1, ..., V, i = 1, ..., I, and m = 1, ..., M. It then calculates, according to (Equation 27), the degree of occurrence b^v_im of C_m in state i of the HMM for word v.

【０１１４】The second parameter storage unit 109 stores the parameters corresponding to words v = 1 to V; the parameters for each word v = 1, ..., V are stored in parameter storage unit 1, ..., parameter storage unit V, respectively. That is, the transition probabilities for each state of each word are read from the first parameter storage unit 105 and stored in a form that can be looked up by v, i, j. The degree of occurrence of each label in each state of each word is read from the label occurrence degree calculation unit 108 and stored in a form that can be looked up by v, i, m.
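The two lookup schemes described for the second parameter storage unit 109 can be pictured as two tables keyed by (v, i, j) and (v, i, m). The class below is a hypothetical illustration of that layout, not a description of the actual storage hardware; the class and method names are invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ParameterStore:
    """Hypothetical stand-in for the second parameter storage unit 109."""
    # Transition probability a^v_ij, keyed by (v, i, j).
    transition: Dict[Tuple[int, int, int], float] = field(default_factory=dict)
    # Degree of occurrence b^v_im, keyed by (v, i, m).
    occurrence: Dict[Tuple[int, int, int], float] = field(default_factory=dict)

    def word_model(self, v: int) -> Tuple[Dict[Tuple[int, int], float],
                                          Dict[Tuple[int, int], float]]:
        """Return the parameters of word v (the 'parameter storage unit v')."""
        a = {(i, j): p for (w, i, j), p in self.transition.items() if w == v}
        b = {(i, m): p for (w, i, m), p in self.occurrence.items() if w == v}
        return a, b
```

Keeping the per-word parameters separable in this way mirrors the text, where each word v has its own parameter storage unit.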

【０１１５】In this way, the set of vectors making up the pattern set used for training is clustered, the degree of occurrence b_im in state i of the HMM of the vectors contained in cluster m is obtained from the probability densities estimated for the continuous probability distribution HMM, and the model is converted into a discrete probability distribution HMM.

【０１１６】Next, the configuration and operation of a device that recognizes actual input speech using the model described above are explained together.

【０１１７】FIG. 5 is a block diagram of the recognition device.

【０１１８】The feature extraction unit 401 has exactly the same configuration and function as the feature extraction unit 101 in FIG. 1.

【０１１９】The codebook 403 stores the centroid of each cluster held in the cluster vector storage unit 107 of the HMM creating device of FIGS. 1 and 2.

【０１２０】The vector quantization unit 402 calculates the distance between the feature vector y(t) output by the feature extraction unit 401 and the representative vector y_0m (m = 1, ..., M) of each cluster stored in the codebook 403, and replaces y(t) with the label of the cluster whose representative vector is closest to y(t), thereby converting the feature vector sequence into a label sequence.
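The operation of the vector quantization unit 402 can be sketched as a nearest-centroid lookup. The text does not say which distance is used, so squared Euclidean distance is assumed here; the function name and the zero-based labels are illustrative.

```python
from typing import List, Sequence


def quantize(sequence: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each feature vector y(t) by the index m of its nearest centroid."""
    labels = []
    for y in sequence:
        # Squared Euclidean distance to every centroid y_0m (assumed metric).
        distances = [sum((a - b) ** 2 for a, b in zip(y, c)) for c in codebook]
        labels.append(distances.index(min(distances)))
    return labels
```

For example, with the codebook `[[0.0, 0.0], [10.0, 10.0]]` the sequence `[[1.0, 1.0], [9.0, 8.0]]` maps to the labels 0 and 1.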

【０１２１】The parameter storage unit 404 has exactly the same configuration and function as the parameter storage unit 109 in FIG. 2; parameter storage unit v holds the parameters of the model corresponding to word v (= 1, ..., V).

【０１２２】The likelihood calculation unit 405 calculates the likelihood of each model for the label sequence output by the vector quantization unit 402, using the contents of the parameter storage unit 404. That is, likelihood calculation unit v uses the contents of parameter storage unit v. Any of (Equation 1), (Equation 2), (Equation 3), and the like may be used to calculate the likelihood.
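As an illustration of how a likelihood calculation unit might score a label sequence, the sketch below uses the standard forward recursion for a discrete HMM. Whether this corresponds exactly to (Equation 1), (Equation 2), or (Equation 3) is an assumption, as are the explicit initial distribution `pi`, the omission of a separate final state, and the function name. Because the b_im may be unnormalized degrees of occurrence, the result is better read as a score than as a true probability.

```python
from typing import Sequence


def forward_likelihood(labels: Sequence[int],
                       pi: Sequence[float],
                       a: Sequence[Sequence[float]],
                       b: Sequence[Sequence[float]]) -> float:
    """Score a label sequence with the forward recursion.

    pi[i]   : assumed initial probability of state i
    a[i][j] : transition probability from state i to state j
    b[i][m] : degree of occurrence of label m in state i
    """
    n = len(pi)
    # Initialization with the first label.
    alpha = [pi[i] * b[i][labels[0]] for i in range(n)]
    # Induction over the remaining labels.
    for o in labels[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return sum(alpha)
```

Each step only multiplies and adds table entries looked up by label, which is where the low computational cost of the discrete model comes from. For long sequences a practical implementation would typically work with scaled or logarithmic values to avoid underflow.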

【０１２３】The comparison and determination unit 406 determines which of the likelihood calculation units 1, ..., V contained in the likelihood calculation unit 405 has the largest output and outputs the corresponding word as the recognition result; it performs the calculation corresponding to (Equation 4).
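The decision made by the comparison and determination unit 406 is a maximum search over the per-word likelihoods. A minimal sketch, with a hypothetical function name and words represented as dictionary keys:

```python
from typing import Dict


def recognize(likelihoods: Dict[str, float]) -> str:
    """Return the word whose model gave the largest likelihood."""
    return max(likelihoods, key=lambda word: likelihoods[word])
```

If two words tie, this sketch returns the one that appears first in the dictionary; the text does not specify tie handling.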

【０１２４】The word recognition result is output from this comparison and determination unit 406.

【０１２５】Although this embodiment has been described in terms of recognizing words, the words may of course be replaced with phonemes, syllables, or the like, and the invention is also applicable to patterns other than speech.

【０１２６】Furthermore, although this embodiment assumes that the feature vectors follow a single normal distribution in each state, the invention can of course use a so-called mixture distribution to obtain more precise degrees of label occurrence.

【０１２７】The present invention is not limited to speech recognition devices and is applicable to other fields of time-series signal processing.

【０１２８】Each means of the present invention may be realized in software on a computer, or by dedicated hardware circuits having the respective functions.

【０１２９】[0129]

[Effects of the Invention] As is apparent from the above, the present invention comprises HMM creating means for creating a continuous probability density distribution HMM, clustering means for clustering a set of training vectors and assigning a label to each cluster, and label occurrence degree calculation means for calculating the degree of occurrence of each cluster, and hence of each label, in each state of the HMM from the training vectors contained in each cluster and the probability density function of each state of the continuous probability density distribution HMM. It therefore resolves the estimation errors caused by insufficient or biased training data, which are a problem with discrete HMMs, and realizes a model that retains the discrete HMM's advantage of a small amount of computation.

[Brief description of drawings]

FIG. 1 is part of a block diagram showing an embodiment of a device that estimates HMM parameters according to the present invention.

FIG. 2 is the remainder of the block diagram showing an embodiment of a device that estimates HMM parameters according to the present invention.

FIG. 3 is a block diagram illustrating a conventional speech recognition device using HMMs.

FIG. 4 is a diagram showing the structure of a continuous probability distribution HMM.

FIG. 5 is a block diagram showing an embodiment of a speech recognition device using HMMs constructed according to the present invention.

[Explanation of symbols]

101: feature extraction unit
102: word pattern storage unit
103: buffer memory
104: parameter estimation unit
105: parameter storage unit
106: clustering unit
107: cluster vector storage unit
108: label occurrence degree calculation unit
109: parameter storage unit

Claims

[Claims]

1. An HMM creating device comprising: HMM creating means for creating a continuous probability density distribution HMM; clustering means for clustering a set of training vectors and assigning a label to each cluster; and label occurrence degree calculation means for calculating the degree of occurrence of each cluster, and hence of each label, in each state of the HMM from the training vectors contained in each cluster and the probability density function of each state of the continuous probability density distribution HMM.

2. An HMM storage device comprising: state transition probability storage means for storing the state transition probabilities obtained by the HMM creating device according to claim 1; and label occurrence degree storage means for storing the degree of occurrence of each label in each state.

3. A likelihood calculation device comprising: vector quantization means for determining which of the clusters of claim 1 each vector of the feature vector sequence constituting an input pattern belongs to and replacing each vector with the label of the cluster to which it belongs, thereby converting the vector sequence into a label sequence; and likelihood calculation means for calculating, from the state transition probabilities and the degrees of label occurrence in each state stored in the HMM storage device according to claim 2, the likelihood, for the input pattern, of the HMM described by the parameters stored in the HMM storage device.

4. A recognition device in which the likelihood calculation device according to claim 3 is provided for each recognition unit, the likelihood of each recognition unit model for an input signal is calculated, and which of the recognition units the input signal corresponds to is determined from the likelihood values.

5. The HMM creating device according to claim 1, wherein, with the clusters denoted C_m (m = 1, ..., M), the label occurrence degree calculation means includes characteristic value calculation means for determining, from the probability density function of state i of the continuous probability density distribution HMM, the probability density of each training vector belonging to C_m and calculating a characteristic value of those probability densities, such as their mean or median, and the characteristic value is taken as the degree of occurrence b_im of C_m in state i.

6. The HMM creating device according to claim 1, wherein, with the clusters denoted C_m (m = 1, ..., M), the label occurrence degree calculation means determines, from the probability density function of state i of the continuous probability density distribution HMM, the probability density of the representative vector of C_m and takes that probability density as the degree of occurrence b_im of C_m in state i.

7. The HMM creating device according to claim 5, wherein the label occurrence degree calculation means further includes occurrence degree normalization means for calculating b_im′ = b_im / (b_i1 + ··· + b_iM) from b_im, and the normalized degree of occurrence b_im′ is taken as the degree of occurrence of C_m in state i.