JPH08248986A

JPH08248986A - Pattern recognition method

Info

Publication number: JPH08248986A
Application number: JP7052391A
Authority: JP
Inventors: Satoshi Takahashi; 敏高橋; Shigeki Sagayama; 茂樹嵯峨山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-03-13
Filing date: 1995-03-13
Publication date: 1996-09-27

Abstract

PURPOSE: To reduce the number of model parameters effectively without degrading recognition performance, by using a single common parameter for similar feature parameters and thereby reducing the total number of parameters in the whole model. CONSTITUTION: For an input vector, the likelihood of each hidden Markov model (HMM), in which the output probability distribution of each state is expressed by a multidimensional continuous distribution, is computed, and the category represented by the model with the highest likelihood is output as the recognition result. Where each HMM state is expressed by a single continuous probability distribution or a mixture of continuous probability distributions, the one-dimensional continuous distribution in each dimension of a constituent multidimensional continuous distribution shares the parameters that define it with the one-dimensional distributions in the dimensions of other multidimensional continuous distributions. The HMM specifies this sharing relationship individually for each dimension. As a result, the parameters are learned efficiently and the cost of computing the output probability is reduced.

Description

Detailed Description of the Invention

【０００１】[0001]

[Industrial Field of Application] The present invention relates to a pattern recognition method that uses hidden Markov models (hereinafter, HMMs) to obtain the likelihood of each model for an input vector and thereby recognize that input vector.

【０００２】[0002]

[Prior Art] The HMM method, which models patterns on the basis of probability and statistics, is a useful technique for recognizing patterns such as speech, characters, and figures. The prior art using the HMM method is described below, taking speech recognition as an example. In conventional speech recognition devices, modeling the speech to be recognized with HMMs gives high performance and is currently the mainstream approach. Details of the HMM method are given in, for example, Reference 1 (Seiichi Nakagawa, "Speech Recognition by Probabilistic Models," Institute of Electronics, Information and Communication Engineers). FIG. 2 shows a configuration example of a conventional HMM-based speech recognition device. Speech input from input terminal 11 is converted to a digital signal by A/D conversion unit 12. Speech feature parameter extraction unit 13 extracts speech feature parameters from the digital signal. HMMs created in advance for each speech unit to be recognized (for example, phoneme, syllable, or word) are read from HMM memory 14, and model likelihood calculation unit 15 computes the likelihood of each model for the input speech. Recognition result output unit 16 outputs, as the recognition result, the speech unit represented by the model with the highest likelihood.

[0003] FIG. 3A shows an example of a three-state HMM. A model of this kind is created for each speech unit (category). Statistical distributions D1 to D3 of the speech feature parameters are assigned to states S1 to S3, respectively. If this is a phoneme model, for example, the first state represents the statistical distribution of the features near the start of the phoneme, the second state near its center, and the third state near its end.

[0004] The feature distribution of each state is often expressed with multiple continuous probability distributions (hereinafter, a mixture continuous distribution) so that complex distribution shapes can be represented. Various continuous probability distributions are possible, but the normal distribution is used most often. Each normal distribution is in turn usually a multidimensional uncorrelated normal distribution with the same number of dimensions as the feature vector. FIG. 3B shows an example of a mixture continuous distribution, here expressed by three normal distributions N(μ_1, σ_1), N(μ_2, σ_2), and N(μ_3, σ_3), where N(μ_1, σ_1) is the normal distribution with mean vector μ_1 and variance σ_1, and likewise for the others. For the input feature vector X_t = (x_{t,1}, x_{t,2}, ..., x_{t,P})^T at time t (P is the total number of dimensions), the output probability b_s(X_t) of state s of a mixture continuous distribution HMM is computed as follows.

【０００５】[0005]

[Equation 1]

$$ b_s(X_t) = \sum_{k} W_k^s \, P_k^s(X_t) \qquad (1) $$

[0006] Here, W_k^s is the weighting coefficient for the k-th multidimensional normal distribution k contained in state s. The probability density P_k^s(X_t) for multidimensional normal distribution k is computed as follows.

【０００７】[0007]

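[Equation 2]

$$ P_k^s(X_t) = \frac{1}{\sqrt{(2\pi)^P \, |\Sigma_k^s|}} \exp\left( -\frac{1}{2} (X_t - \mu_k^s)^T (\Sigma_k^s)^{-1} (X_t - \mu_k^s) \right) \qquad (2) $$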

[0008] Here, μ_k^s is the mean vector of the k-th multidimensional normal distribution k, and Σ_k^s is its covariance matrix. If the covariance matrix has only diagonal components, that is, if it is a diagonal covariance matrix, the logarithm of P_k^s(X_t) can be expressed as follows.

【０００９】[0009]

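[Equation 3]

$$ \log P_k^s(X_t) = -\frac{P}{2}\log 2\pi - \frac{1}{2}\sum_{i=1}^{P}\log \sigma_{k,i}^s - \sum_{i=1}^{P} \frac{(x_{t,i}-\mu_{k,i}^s)^2}{2\,\sigma_{k,i}^s} \qquad (3) $$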

[0010] Here, μ_{k,i}^s is the i-th component of the mean vector of the k-th multidimensional normal distribution of state s, and σ_{k,i}^s is the i-th diagonal component (variance) of the covariance matrix of that distribution. This computation is carried out for each candidate model on the feature vector at each time of the input speech, and the recognition result is output on the basis of the resulting log-likelihoods.
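As a rough illustration of the computation described in this paragraph, the following Python sketch evaluates the log density of a diagonal-covariance normal distribution and combines mixture components into the output probability of a state. The function names and the use of log-sum-exp to combine components are assumptions made for illustration; they do not appear in the patent.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a normal distribution with diagonal covariance.

    x, mean, var are equal-length sequences; var holds the variances
    (the diagonal components of the covariance matrix).
    """
    log_p = -0.5 * len(x) * math.log(2.0 * math.pi)
    for x_i, mu_i, var_i in zip(x, mean, var):
        log_p -= 0.5 * math.log(var_i)
        log_p -= (x_i - mu_i) ** 2 / (2.0 * var_i)
    return log_p


def log_output_prob(x, weights, means, variances):
    """Log of the state output probability: log sum_k W_k * P_k(x).

    Uses log-sum-exp so that very small densities do not underflow.
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Note that each dimension contributes its own additive term to `log_gaussian_diag`, which is the property the per-dimension sharing in this patent relies on.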

【００１１】[0011]

[Problems to Be Solved by the Invention] Improving recognition performance requires a more expressive acoustic model, which in turn requires more model parameters. Training a large number of model parameters requires an enormous amount of data, but in practice only limited data can be collected, so the number of model parameters cannot be increased freely. A model with many parameters trained on a small amount of data depends heavily on the training data and makes recognition errors even on data that differs only slightly from it. With too few model parameters, on the other hand, the model is not expressive enough to give adequate recognition performance. There is thus a trade-off between model precision and robustness at recognition time, and the problem is that a more precise model must be expressed with fewer model parameters.

[0012] In HMM-based pattern recognition devices, computing the output probability of equation (2) is the most expensive step. In a typical speech recognition device, this computation accounts for 45% to 65% of the total speech recognition processing time. Real-time processing is also an important issue from the human-interface standpoint, yet current processing speeds are not fully satisfactory.

[0013] An object of the present invention is therefore to provide a pattern recognition method that reduces the number of model parameters effectively while preserving the expressive power of the model and without degrading recognition performance, that learns parameters efficiently from the same amount of data, and that computes output probabilities cheaply enough for real-time processing.

【００１４】[0014]

[Means for Solving the Problems] According to the present invention, distribution parameters of similar character are used in common as a single parameter, even when they belong to different HMM models or states, so as to reduce the total number of parameters in the whole model. For example, the parameters (mean and variance) that define the normal distribution in each dimension of a multidimensional normal distribution are shared with those of other, similar normal distributions.

【００１５】[0015]

[Embodiments] The method of the present invention is described for the case where the multidimensional continuous distribution is a normal distribution. Consider multidimensional normal distributions A and B that exist in two states of the HMMs in the system, and denote them N(μ_1^A, Σ_1^A) and N(μ_1^B, Σ_1^B), where N(μ, Σ) denotes a normal distribution with mean vector μ and covariance matrix Σ. If the feature parameters have four dimensions and the covariance matrix has only diagonal components, these distributions can be written as

N(μ_1^A, Σ_1^A) = {N(μ_{1,1}^A, σ_{1,1}^A), N(μ_{1,2}^A, σ_{1,2}^A), N(μ_{1,3}^A, σ_{1,3}^A), N(μ_{1,4}^A, σ_{1,4}^A)}
N(μ_1^B, Σ_1^B) = {N(μ_{1,1}^B, σ_{1,1}^B), N(μ_{1,2}^B, σ_{1,2}^B), N(μ_{1,3}^B, σ_{1,3}^B), N(μ_{1,4}^B, σ_{1,4}^B)}

When the second-dimension normal distributions N(μ_{1,2}^A, σ_{1,2}^A) and N(μ_{1,2}^B, σ_{1,2}^B) are similar, they are shared and represented by a single distribution N(μ_{1,2}^C, σ_{1,2}^C), and the two multidimensional normal distributions are replaced by

N(μ_1^A, Σ_1^A) = {N(μ_{1,1}^A, σ_{1,1}^A), N(μ_{1,2}^C, σ_{1,2}^C), N(μ_{1,3}^A, σ_{1,3}^A), N(μ_{1,4}^A, σ_{1,4}^A)}
N(μ_1^B, Σ_1^B) = {N(μ_{1,1}^B, σ_{1,1}^B), N(μ_{1,2}^C, σ_{1,2}^C), N(μ_{1,3}^B, σ_{1,3}^B), N(μ_{1,4}^B, σ_{1,4}^B)}

This reduces the total number of parameters μ and σ from 16 to 14.
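The parameter count in this example can be checked with a small Python sketch. The labels such as "A1" or "C2" are hypothetical names for the one-dimensional distributions (distribution letter plus dimension number); each distinct label stands for one mean and one variance.

```python
def count_free_parameters(distributions):
    """Count means and variances, counting each shared component once.

    `distributions` maps a distribution name to the list of labels of
    its one-dimensional components, one label per dimension.
    """
    unique_components = {
        label for components in distributions.values() for label in components
    }
    return 2 * len(unique_components)  # one mean + one variance per component


# Before sharing: A and B each own all four of their components.
unshared = {"A": ["A1", "A2", "A3", "A4"], "B": ["B1", "B2", "B3", "B4"]}
# After sharing: the second dimension of both is the common component C2.
shared = {"A": ["A1", "C2", "A3", "A4"], "B": ["B1", "C2", "B3", "B4"]}

print(count_free_parameters(unshared), count_free_parameters(shared))
```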

[0016] Next, the advantage of sharing the distributions of individual dimensions is described for the case where likelihoods are computed for the input vector X_t = (x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4}) at time t in the above example. Computing the likelihood for each model requires the computation of equation (3). Suppose the likelihood of distribution A is computed first. When distribution B is computed next, the second-dimension distribution is shared and need not be recomputed; the second-dimension result obtained for distribution A can be reused. In this way, once the computation for a shared parameter has been done for one model, the result can be reused by the other models, which reduces the amount of computation.

[0017] In an actual system, an index table of element parameters such as that shown in FIG. 1A is provided. That is, indices I_1 to I_7 are assigned to the normal distributions of the individual dimensions of multidimensional normal distributions A and B, namely N(μ_{1,1}^A, σ_{1,1}^A), N(μ_{1,3}^A, σ_{1,3}^A), N(μ_{1,4}^A, σ_{1,4}^A), N(μ_{1,1}^B, σ_{1,1}^B), N(μ_{1,3}^B, σ_{1,3}^B), N(μ_{1,4}^B, σ_{1,4}^B), and N(μ_{1,2}^C, σ_{1,2}^C), respectively, and each multidimensional distribution is described in terms of I_1 to I_7: N(μ_1^A, Σ_1^A) = {I_1, I_7, I_2, I_3} and N(μ_1^B, Σ_1^B) = {I_4, I_7, I_5, I_6}. In addition, a calculation result buffer is provided as shown in FIG. 1B, and the result computed for each dimension's distribution against the input vector is stored under its element parameter index I_i. In its initial state the calculation result buffer is set to, for example, -1.

[0018] The computation against input vector X_t starts with, for example, distribution A (N(μ_1^A, Σ_1^A)). Its first element index is I_1, so the entry for index I_1 in the calculation result buffer (FIG. 1B) is checked to see whether it is -1. If it is, the element parameter index table of FIG. 1A is consulted, the parameters μ_{1,1}^A and σ_{1,1}^A are read out, the computation is performed, and the result is stored at index I_1 of the calculation result buffer. The next element index, I_7, is handled in the same way and its result is stored at index I_7 of the buffer, and so on for the remaining elements. When distribution B is computed, the buffer entry for its second index, I_7, already holds the result computed for distribution A and therefore differs from -1, so the stored value is used.
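The index table and calculation result buffer can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the numeric means and variances are made up, the names `ELEMENT_TABLE`, `DIST_A`, `DIST_B`, and `score_frame` are hypothetical, and a dictionary with missing keys stands in for the -1 sentinel, because a log density can legitimately equal -1.

```python
import math

# Element parameter index table (in the spirit of FIG. 1A):
# index -> (mean, variance). All numeric values are invented.
ELEMENT_TABLE = {
    "I1": (0.0, 1.0), "I2": (1.0, 2.0), "I3": (-1.0, 0.5),
    "I4": (0.5, 1.5), "I5": (2.0, 1.0), "I6": (0.0, 3.0),
    "I7": (0.2, 0.8),  # shared second-dimension component C
}
# Each multidimensional distribution lists one index per dimension.
DIST_A = ["I1", "I7", "I2", "I3"]
DIST_B = ["I4", "I7", "I5", "I6"]


def log_density_1d(x, mean, var):
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def score_frame(x, distributions, table):
    """Score one input vector against every distribution.

    Returns (scores, evaluations): the log density per distribution and
    the number of one-dimensional densities actually computed.
    """
    buffer = {}  # calculation result buffer, emptied for each input vector
    evaluations = 0
    scores = {}
    for name, indices in distributions.items():
        total = 0.0
        for dim, idx in enumerate(indices):
            if idx not in buffer:  # not computed yet for this input vector
                mean, var = table[idx]
                buffer[idx] = log_density_1d(x[dim], mean, var)
                evaluations += 1
            total += buffer[idx]  # reuse the stored result when shared
        scores[name] = total
    return scores, evaluations
```

With the two distributions above, eight one-dimensional terms are needed but only seven are computed, because the entry for I7 is reused for distribution B. The buffer is keyed by index alone, which is valid here because each index is only ever used in one dimension.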

[0019] Sharing computed results through parameter sharing in this way is possible regardless of the number of models, the number of states, the number of distributions per state, or the number of feature dimensions. Next, an example is described in which the distribution parameters are linked by a different sharing relationship in each dimension. In a speech feature vector, the amount of information carried by the parameter differs from dimension to dimension. The cepstrum, for example, is one such feature and is often expressed with about 16 dimensions, but most of the information is contained in the lower-order components. For the lower-order dimensions, therefore, fewer parameters are shared, which keeps more degrees of freedom and more expressive power for the distributions. For the higher-order dimensions, parameters are shared aggressively and similar distributions are merged. With the same total number of distributions, allocating the number of distributions unevenly across dimensions in this way gives a more efficient representation. Clearly, the number of distributions need not differ in every dimension; some dimensions may have the same number. Methods of sharing whole multidimensional normal distributions have been proposed, but because they share vectors, the same sharing relationship applies in every dimension. After sharing, each dimension therefore has the same number of distributions, and dimensions carrying much information are represented by the same number of distributions as dimensions carrying little. Consequently, for models with the same number of distributions (and hence the same amount of computation), the sharing method of the present invention can build a higher-performance model by using more distributions in information-rich dimensions and fewer in information-poor dimensions. Put another way, a model of the same performance can be realized with fewer distributions, which reduces the amount of computation.

[0020] In statistical methods such as HMMs, the number of parameters and the amount of training data are closely related. When a model contains many distributions, the number of parameters to be estimated increases and a large amount of training data is needed. With too little training data, the model loses generality. The present invention reduces the number of distributions effectively while retaining the information needed for recognition, so high recognition performance can be obtained with a small amount of training data.

[0021] Speech recognition also uses speaker adaptation, in which speech uttered by a particular speaker is used to adapt an acoustic model built for unspecified speakers so that it fits that speaker. In practice, often only a small amount of adaptation speech is available. When model parameters are trained on a small amount of data and the parameters are set independently in each model, only the subset of model parameters related to the training data can be adapted. With the sharing of the present invention, when some model parameters are adapted, the parameters of the other models that share them are adapted at the same time.

[0022] An example is described next. Sharing in HMMs can take place at four levels:
(1) model-level sharing (first level), in which different phoneme environments share the same model (see, for example, Sagayama, "Clustering of phoneme environments," Proceedings of the Acoustical Society of Japan 1987 Autumn Meeting, 1-5-15, and K-F Lee et al., "Large-vocabulary speaker-independent continuous speech recognition using HMM," Proceedings of 1988 International Conference on Acoustics, Speech and Signal Processing, pp. 123-126);
(2) state-level sharing (second level), in which different models share the same state (see, for example, Takami et al., "Automatic generation of hidden Markov networks by the successive state splitting (SSS) method," Proceedings of the Acoustical Society of Japan 1991 Autumn Meeting, 2-5-13);
(3) basis-distribution-level sharing (third level), in which different states share the same multidimensional normal distribution (see, for example, X.D. Huang, "Unified technique for vector quantization and hidden Markov modeling using semi-continuous models," Proceedings of 1989 International Conference on Acoustics, Speech and Signal Processing, pp. 639-642);
(4) feature-level sharing (fourth level) according to the present invention, in which different multidimensional normal distributions share the same one-dimensional normal distribution (mean and variance).

[0023] Step 1: To realize first-level and second-level sharing, the successive state splitting (SSS) method is adopted. Using the data of one speaker, a 450-state HMnet is created in which each state is expressed by a single normal distribution.
Step 2: Each state of this model is converted to a two-component mixture, and the model is then trained on data from many speakers to create an HMnet for unspecified speakers (two-level shared model).

[0024] Step 3: To realize sharing at the multidimensional normal distribution level, 700 distribution clusters are generated from all the distributions (2 mixture components × 450 states = 900 distributions). For reference, the experiments also covered the cases of 256 and 64 clusters. The distance measure between distribution i and distribution j was defined as follows.

【００２５】[0025]

【数４】 [Equation 4]

[0026] Here, μ and σ denote the mean and the variance, respectively, and P is the total number of dimensions. The representative distribution of each cluster is shared by the distributions in that cluster. The total number of distributions is thus reduced from 900 to 700 while each state keeps two mixture components (three-level shared model).
Step 4: In feature-level sharing, each dimension is represented by a different number of distributions. Here, the average inter-distribution distance among the 900 distributions present in each dimension of the model obtained in Step 2 was computed, and the number of distributions m_p for each dimension (p is the dimension) was determined from the ratio of these distances. The distance measure is the same as in equation (4).
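Step 4 states only that m_p is determined from the ratio of the average distances; the exact rule is not given. One possible reading is a proportional allocation, sketched below in Python. The function name `allocate_counts`, the guarantee of at least one distribution per dimension, and the largest-remainder rounding are all assumptions made for illustration.

```python
def allocate_counts(avg_distances, total):
    """Split `total` distributions across dimensions in proportion to
    each dimension's average inter-distribution distance.

    Every dimension gets at least one distribution; the remaining
    (total - P) are shared out proportionally and rounded so that the
    counts sum exactly to `total`.
    """
    num_dims = len(avg_distances)
    scale = (total - num_dims) / sum(avg_distances)
    raw = [1 + d * scale for d in avg_distances]
    counts = [int(r) for r in raw]  # round down first
    leftover = total - sum(counts)
    # Give the leftover units to the largest fractional remainders.
    by_remainder = sorted(
        range(num_dims), key=lambda p: raw[p] - counts[p], reverse=True
    )
    for p in by_remainder[:leftover]:
        counts[p] += 1
    return counts


# Three dimensions whose average distances are in the ratio 3 : 2 : 1.
print(allocate_counts([3.0, 2.0, 1.0], 12))
```

Dimensions with larger average distances, which the patent associates with the information-rich lower-order components, receive more distributions under this rule.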

[0027] Step 5: Starting from the model obtained in Step 3, m_p distribution clusters are generated independently in each dimension to determine the sharing relationships (four-level shared model). The furthest neighbor method was used for all clustering. Both the three-level and four-level shared models were retrained after the sharing relationships among the distributions had been determined.
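The furthest neighbor method (also known as complete-linkage clustering) applied to the one-dimensional distributions of a single dimension can be sketched as follows. The distance function here is an assumed stand-in, a squared mean difference normalized by the two variances, and is not claimed to be the patent's equation (4); the function names are likewise hypothetical.

```python
import math


def distance(a, b):
    """Assumed distance between two 1-D normals given as (mean, variance)."""
    (mu_a, var_a), (mu_b, var_b) = a, b
    return (mu_a - mu_b) ** 2 / math.sqrt(var_a * var_b)


def furthest_neighbor_clusters(items, n_clusters):
    """Agglomerative clustering with the furthest neighbor criterion.

    The distance between two clusters is the largest distance between
    any pair of their members; the closest pair of clusters is merged
    until n_clusters remain. Returns lists of indices into `items`.
    """
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        best = None  # (cluster distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(
                    distance(items[i], items[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return clusters
```

Running this once per dimension with that dimension's m_p as `n_clusters` yields an independent sharing relationship for each dimension. The nested loops are written for clarity, not speed.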

[0028] The performance of each level of model obtained in this way, and its computation time during recognition, were evaluated in phoneme recognition and word recognition experiments. The experiments used the ATR 5240-important-word set and 216-word set spoken by 10 male speakers, of whom eight served as training speakers and two as evaluation speakers. The models were trained on a set of 10,480 words selected evenly from the even-numbered words of the training speakers' 5240-word sets, together with the 216-word sets of all training speakers. For the phoneme recognition evaluation, 524 words were chosen arbitrarily from the odd-numbered word set. For the word recognition experiment, 1500 words were chosen arbitrarily from the odd-numbered words of the 5240-word set as the recognition vocabulary, and 200 words were actually recognized for evaluation. There are 26 phoneme categories. The feature parameters are 16th-order cepstrum, 16th-order Δcepstrum, and Δpower (33 dimensions in total).

[0029] FIG. 4 shows an example of the number of distributions assigned to each feature dimension in Step 4, for the case of an average of 64 distributions per dimension (total: 64 × 33 dimensions = 2112 distributions). For both the cepstrum and the Δcepstrum, the lower-order components have larger average inter-distribution distances and are assigned more distributions.

[0030] FIG. 5 shows, for each model, its configuration, average phoneme recognition rate, average word recognition rate, and relative average computation time (CPU time). All the three-level and four-level shared models in FIG. 5 apply sharing while always keeping the 450-state, two-mixture (900-distribution) configuration. In the four-level shared models the number of distributions is allocated unevenly across dimensions, so the figures in FIG. 5 are average numbers of distributions. The total parameter count includes the means, the covariance values, and the distribution weighting coefficients. Computation time is the CPU time spent on output probability computation during word recognition, expressed as a ratio with the two-level shared model taken as 1.0. The computer is a SUN SPARC10. For reference, results for a three-state, two-mixture phoneme-environment-independent model are also shown.

[0031] With the three-level shared model, reducing the number of basis distributions to 700 shortens the computation time with no change in performance. In this model, however, similarity is judged over the means and covariance values of all dimensions together, so reducing the number further to 256 or 64 forces unreasonable sharing relationships onto the original 900 distributions, and performance (recognition rate) drops sharply, as FIG. 5 shows. In the four-level shared model of the present invention, by contrast, distributions are allocated effectively to each dimension, their combinations express the 700 basis distributions, and these in turn express the 900 distributions. This hierarchy gives an efficient representation: even with 256 distributions, the phoneme recognition rate is 87.6 and the word recognition rate is 90.0, the same as the three-level shared model with 700 distributions. Even with the number of distributions reduced to 64, the phoneme recognition rate is 86.4 and the word recognition rate is 89.3, only a slight decrease and markedly higher than the three-level shared model with 64 distributions, while the computation time is also reduced.

[0032] This effect is obtained because the normal distributions are shared independently in each dimension. As FIG. 4 shows, the number of distributions then varies with the amount of information each dimension carries: higher-order dimensions, which carry little information, are shared heavily and have few distributions, while lower-order dimensions, which carry much information, are shared less and have many distributions, so that the model as a whole is expressed efficiently with a small number of distributions.

[0033] The present invention is not limited to speech recognition. It can be used in any pattern recognition that uses HMMs, such as HMM-based character recognition and figure recognition.

[0034]

[Effects of the Invention] Sharing model parameters according to the present invention has two effects. One is that the training efficiency of the models can be improved; the other is that the amount of computation at recognition time can be reduced. In general, model parameters are set independently for each model and trained using the data of the corresponding category. However, if parameters with similar properties are shared between different models, each shared parameter can be trained using the data of both categories, so the apparent amount of training data increases.
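The pooling effect described above can be illustrated with a minimal Python sketch. It is a simplification, not the training procedure of the invention: it assumes plain sample statistics rather than HMM re-estimation, and the function name `estimate_shared` and the toy data are hypothetical.

```python
from statistics import fmean, pvariance

def estimate_shared(samples_by_category):
    """Estimate one shared (mean, variance) pair from the data of every
    category whose distributions are tied to that pair.

    Simplifying assumption: plain sample statistics, not HMM re-estimation.
    """
    pooled = [x for samples in samples_by_category.values() for x in samples]
    return fmean(pooled), pvariance(pooled), len(pooled)

# Hypothetical toy data: one feature dimension, two categories tied together.
data = {"category_a": [1.0, 2.0], "category_b": [3.0, 4.0]}
mean, var, n = estimate_shared(data)
print(mean, var, n)  # the shared pair is estimated from all 4 samples
```

Each category alone contributes two samples here, while the shared parameter is estimated from four, which is the sense in which the apparent amount of data increases.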

[0035] For example, a phoneme-environment-dependent model contains a large number of multidimensional normal distributions (for example, 1000 or more), so each dimension contains the same number of one-dimensional normal distributions. Even if the normal distributions in each dimension are merged from 1000 down to m, each multidimensional normal distribution is expressed as a combination of those m distributions, so the number of multidimensional normal distributions that can be expressed is m^P (where P is the number of dimensions). Considerable expressive power is therefore retained even after sharing.
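As a rough illustration of this count, the Python sketch below evaluates m^P. The values m = 64 and P = 33 are assumptions borrowed from the numerical example given later in the text, and the function name is hypothetical.

```python
def representable_distributions(m: int, num_dims: int) -> int:
    """Number of distinct multidimensional distributions obtainable by
    choosing one of m one-dimensional distributions in each dimension."""
    return m ** num_dims

# Assumed example values: m = 64 shared distributions per dimension, P = 33.
count = representable_distributions(64, 33)
print(count > 1000)  # far more combinations than the 1000 original distributions
```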

[0036] Next, consider the advantage in terms of computation. Most current HMMs assume multidimensional uncorrelated normal distributions, so the log-likelihood is calculated as in equation (3). Equation (3) is a sum over the dimensions of the probability density terms of the normal distribution in each dimension, and it involves no computation across dimensions. The computation can therefore be treated independently in each dimension. If distributions are shared, the per-dimension results of equation (3) can be shared between models, which has the advantage of reducing the amount of computation at recognition time.
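The reuse of per-dimension results can be sketched in Python as follows. This is only an illustration under stated assumptions: equation (3) is taken to have the standard diagonal-covariance form (a sum of one-dimensional normal log densities), single distributions are scored rather than mixtures, and the names `shared`, `index_table`, and `buffer` are hypothetical, chosen loosely after the index table and calculation result buffer named for FIG. 1.

```python
import math

def log_gauss_1d(x, mean, var):
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihoods(x, shared, index_table):
    """Log-likelihood of x under every multidimensional distribution.

    shared[p]      : list of (mean, var) pairs shared in dimension p
    index_table[k] : for distribution k, the index into shared[p] for each p
    """
    # Evaluate each shared one-dimensional distribution exactly once per input.
    buffer = [[log_gauss_1d(x[p], m, v) for (m, v) in shared[p]]
              for p in range(len(x))]
    # Each multidimensional log-likelihood is then a sum of buffer lookups.
    return [sum(buffer[p][idx[p]] for p in range(len(x)))
            for idx in index_table]

# Hypothetical toy model: 2 dimensions, 3 multidimensional distributions.
shared = [[(0.0, 1.0), (2.0, 1.0)],  # dimension 0: two shared distributions
          [(1.0, 4.0)]]              # dimension 1: one shared distribution
index_table = [[0, 0], [1, 0], [0, 0]]
scores = log_likelihoods([0.5, 1.0], shared, index_table)
print(scores)
```

In this toy model only three one-dimensional densities are evaluated for three two-dimensional distributions, instead of six without sharing. The number of shared distributions may differ from dimension to dimension, as in dimension 1 here.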

[0037] Sharing only the mean values of the distributions has been proposed. Consider, for example, the case of 1000 multidimensional normal distributions with 33-dimensional feature vectors (realistic values in speech recognition). The number of parameters is 2 (mean, variance) × 33 × 1000 = 66000. If only the means are shared down to 64, this becomes 33 × 1064 = 35112. If both the means and the variances are shared, it becomes 33 × 64 = 2112, a substantial reduction. As for the amount of computation, one further division can be omitted each time, compared with the case where only the means are shared.
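The arithmetic in this paragraph can be checked with a short Python sketch. The function name and argument names are hypothetical, and the reading of "shared down to 64" as 64 shared values in each of the 33 dimensions is an assumption inferred from the product 33 × 1064.

```python
def parameter_counts(num_dims=33, num_dists=1000, num_shared=64):
    """Counts for the three cases discussed in the text."""
    # Independent mean and variance for every distribution in every dimension.
    unshared = 2 * num_dims * num_dists
    # Shared means plus unshared variances in every dimension.
    means_shared = num_dims * (num_dists + num_shared)
    # Shared (mean, variance) pairs in every dimension; each pair holds two scalars.
    shared_pairs = num_dims * num_shared
    return unshared, means_shared, shared_pairs

print(parameter_counts())
```

Note that 33 × 64 counts shared one-dimensional distributions; counting the mean and variance of each pair separately would double that figure.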

[Brief description of drawings]

FIG. 1: A is a diagram showing an example of a multidimensional normal distribution element index table, and B is a diagram showing an example of a calculation result buffer table.

FIG. 2 is a block diagram showing the general configuration of a pattern recognition device using HMMs.

FIG. 3: A is a diagram showing an example of an HMM, and B is a diagram showing an example of a mixture distribution.

FIG. 4 is a diagram showing an experimental example of the number of distributions allotted to each dimension of the feature vector in the present invention.

FIG. 5 is a diagram showing experimental results for the recognition performance and calculation time of the conventional sharing method and of the sharing method according to the present invention.

Claims

[Claims]

1. A pattern recognition method in which, for an input vector, the likelihood of each hidden Markov model whose output probability distribution in each state is represented by a multidimensional continuous distribution is calculated, and the category represented by the model with the highest likelihood is output as the recognition result, the method comprising a hidden Markov model in which, where each state of the hidden Markov model is represented by a single continuous probability distribution or a mixture of continuous probability distributions, a one-dimensional continuous distribution existing in each dimension of the multidimensional continuous distributions that constitute them shares the parameters expressing that distribution with distributions existing in each dimension of other multidimensional continuous distributions, and the sharing relationship of the parameters is specified individually in each dimension.

2. The pattern recognition method according to claim 1, wherein the multidimensional continuous distribution is a multidimensional normal distribution, and the common parameters are the mean value and the variance value.

3. The pattern recognition method according to claim 1, further comprising a hidden Markov model in which the number of common parameters differs depending on the dimension.

4. The pattern recognition method according to claim 1 or 2, wherein, when the hidden Markov model is modified by learning or the like, modifying a common parameter in a given model modifies, in conjunction, that common parameter as contained in all models.