JP2836968B2 - Signal analyzer

Info

Publication number: JP2836968B2
Application number: JP50192896A
Authority: JP
Inventors: 英一坪香; 順一中橋
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-06-13
Filing date: 1995-06-09
Publication date: 1998-12-14
Anticipated expiration: 2013-12-14

Description

TECHNICAL FIELD

The present invention relates to a signal analysis device such as a speech recognition device.

BACKGROUND ART

Methods using HMMs (Hidden Markov Models) and methods using DP matching are known for speech recognition. Both are widely used as basic speech recognition techniques, and reducing their computational cost without degrading performance is one of the key problems in realizing large-vocabulary and continuous speech recognition. The use of vector quantization has already been proposed as one solution to this problem, and the present invention is an improvement on that approach. Before turning to the main subject, therefore, HMMs and DP matching are first described in general, together with the way vector quantization is used in them.

An HMM can be regarded as a model that generates a time-series signal according to certain probabilistic properties. In speech recognition using HMMs, an HMM r is provided for each recognition unit r (= 1, ..., R) to be recognized, such as a word, syllable, or phoneme (hereinafter represented by words). When a vector sequence Y = (y_1, y_2, ..., y_T) is observed (y_t: the vector observed at time t), the degree to which each HMM r generates Y is calculated, and the word corresponding to the HMM with the highest degree is taken as the recognition result.

FIG. 1 shows an example of an HMM. Each circle (○) represents a state of the system to be modeled by the HMM, each arrow (→) the direction of a state transition, and q_i denotes state i. The transition from state i to state j occurs with probability a_ij. A model in which only the states and their transition probabilities are defined is called a Markov chain; in an HMM, a vector is additionally generated with each state transition, and the degree ω_ij(y) to which vector y is generated with the transition q_i → q_j is defined. It is also common to regard y as generated by a state rather than by a transition, setting ω_ij(y) = ω_i(y) or ω_ij(y) = ω_j(y). This description assumes that y is generated by a state. The structure of the HMM, the state transition probabilities, and the vector occurrence probabilities are determined so as to describe as faithfully as possible the behavior of the object being modeled (a speech pattern such as a word, in the case of speech recognition). FIG. 2 shows an example of an HMM configuration often used in speech recognition.

Once an HMM is defined, the degree L(Y|λ) to which the observation vector sequence Y is generated from a model (call it λ) can be calculated as follows (Equation 1).

Here, X = (x_1, x_2, ..., x_{T+1}) is the state sequence, and π_i is the probability of being in state i at t = 1. In this model, x_t ∈ {1, 2, ..., J, J+1}, and x_{T+1} = J+1 is the final state. Only transitions into the final state occur, and no vector is generated there.
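The formula for Equation 1 is not reproduced in this text. As an illustrative sketch only, assume it has the standard form L(Y|λ) = Σ_X π_{x_1} Π_{t=1..T} ω_{x_t}(y_t) a_{x_t x_{t+1}} with x_{T+1} = J+1. Under that assumption, the sum over all state sequences can be evaluated with a forward recursion instead of by enumeration. The function names, argument layout, and toy model values below are hypothetical and are not taken from the patent.

```python
from itertools import product


def forward_likelihood(pi, a, emit, T):
    """Assumed form of L(Y|lambda), evaluated by the forward recursion.

    pi[i]      : probability of being in emitting state i (0 .. J-1) at t = 1
    a[i][j]    : transition probability; column J is the non-emitting final state
    emit(i, t) : degree of occurrence of the t-th observation in state i
    """
    J = len(pi)
    alpha = [pi[i] * emit(i, 0) for i in range(J)]
    for t in range(1, T):
        alpha = [sum(alpha[i] * a[i][j] for i in range(J)) * emit(j, t)
                 for j in range(J)]
    # The sequence must end with a transition into the final state.
    return sum(alpha[i] * a[i][J] for i in range(J))


def brute_force_likelihood(pi, a, emit, T):
    """Same quantity by explicit enumeration of state sequences (check only)."""
    J = len(pi)
    total = 0.0
    for path in product(range(J), repeat=T):
        p = pi[path[0]]
        for t, state in enumerate(path):
            nxt = path[t + 1] if t + 1 < T else J
            p *= emit(state, t) * a[state][nxt]
        total += p
    return total


# Toy two-state left-to-right model with a discrete emission table (made-up values).
PI = [1.0, 0.0]
A = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]
B = [[0.9, 0.1],
     [0.2, 0.8]]
LABELS = [0, 0, 1]


def table_emit(i, t):
    return B[i][LABELS[t]]
```

The enumeration grows as J^T, while the recursion costs on the order of J^2 T operations, which is why the recursion is the usual choice.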

HMMs are broadly divided into continuous and discrete types. In the continuous type, ω_i(y) is a continuous function of y, such as a probability density function, and the degree of occurrence of y_t is given by the value of ω_i(y) at y = y_t. Parameters defining ω_i(y) are specified for each state i, and the degree of occurrence of y_t in state i is calculated by substituting y_t into ω_i(y). For example, if ω_i(y) is given by a multidimensional normal distribution (Equation 2), the parameters of the function specified for state i are μ_i and Σ_i.
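The density formula itself is not reproduced in this text. For reference, the standard n-dimensional normal density with mean vector μ_i and covariance matrix Σ_i is shown below. The symbol n (the dimension of y) is introduced here for illustration, and the notation of the original equation may differ.

```latex
\omega_i(y) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_i|^{1/2}}
  \exp\!\left( -\tfrac{1}{2}\,(y-\mu_i)^{\mathsf{T}}\,\Sigma_i^{-1}\,(y-\mu_i) \right)
```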

In the discrete type, the occurrence probability b_im of each label m ∈ {1, 2, ..., M}, to which y_t is converted by vector quantization, is stored as a table for each state i. When the label into which y_t is converted is m, the degree of occurrence of y_t in state i is b_im.

Vector quantization is performed using a codebook. For a codebook of size M, the feature vectors collected as training samples are clustered into clusters 1, 2, ..., M, and the representative vector μ_m of each cluster m (= 1, 2, ..., M), also called the mean vector, centroid, or code vector, is stored in a form retrievable by label m. The L.B.G. algorithm is a well-known clustering method. Vector quantization of y_t is performed by converting y_t into the label of the centroid closest to it. The degree of occurrence of y_t in state i is therefore given by Equation 3. Here d(y_t, μ_m) is the distance between y_t and μ_m; various distances, including the Euclidean distance, can be used.
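The quantization step and the table lookup described above can be sketched as follows. This is a minimal illustration assuming a Euclidean distance and 0-based labels; the function names and the sample codebook are hypothetical.

```python
import math


def quantize(y, codebook):
    """Return the label (index) of the centroid nearest to feature vector y."""
    return min(range(len(codebook)), key=lambda m: math.dist(y, codebook[m]))


def discrete_emission(b, i, y, codebook):
    """Degree of occurrence of y in state i: the table entry b[i][m],
    where m is the label that y is quantized to."""
    return b[i][quantize(y, codebook)]


# Hypothetical codebook of size M = 3 and a one-state emission table.
CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
B = [[0.7, 0.2, 0.1]]
```

For example, `quantize((0.9, 1.2), CODEBOOK)` selects the second centroid, so the emission score in state 0 is read directly from `B[0][1]` with no further arithmetic.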

FIG. 3 is a block diagram of a speech recognition device using discrete HMMs. Reference numeral 301 denotes a feature extraction unit, which converts the input speech signal into a feature vector at fixed time intervals (called frames), for example every 10 msec, by a well-known method such as a filter bank, Fourier transform, or LPC analysis. The input speech signal is thus converted into a sequence of feature vectors Y = (y_1, y_2, ..., y_T), where T is the number of frames. 302 is a codebook, which holds the representative vector corresponding to each label in a form retrievable by label. 303 is a vector quantization unit, which replaces (encodes) each vector of the sequence Y with the label of the closest representative vector registered in the codebook. 304 is a parameter estimation unit, which estimates from training samples the parameters of the HMM for each word in the recognition vocabulary. To build the HMM for word r, the structure of the HMM (the number of states and the transition rules) is first chosen appropriately; then, from the label sequences obtained from many utterances of word r, the state transition probabilities and the occurrence probabilities of the labels generated by each state are determined so that the degree of occurrence of those label sequences is as high as possible. 305 is an HMM storage unit, which stores the HMM thus obtained for each word. 306 is a likelihood calculation unit, which calculates, for the label sequence of the unknown input speech to be recognized, the likelihood of each model stored in the HMM storage unit 305. 307 is a decision unit, which selects as the recognition result the word corresponding to the model that gives the maximum of the likelihoods obtained by the likelihood calculation unit 306. In FIG. 3, the broken lines show the signal flow when the HMMs are created.

In a continuous HMM, the degree of occurrence of an observation vector in each state is given by the probability density function defined there; accuracy is higher than with the discrete type, but a large amount of computation is required. In a discrete HMM, by contrast, the likelihood of a model for an observed label sequence can be computed simply by reading the occurrence probability b_im of label m (= 1, ..., M) in each state from a storage device in which it has been stored in advance in association with the label, so the computational cost is very low. However, quantization error makes recognition accuracy worse than with the continuous type. Avoiding this requires increasing the number of labels M (enlarging the codebook), but the number of training samples needed to train the model then becomes enormous. When training samples are insufficient, the estimates of b_im frequently become 0, and correct estimation is no longer possible.

This estimation error arises, for example, as follows. Suppose the recognition vocabulary contains the spoken word "Osaka", and consider building a model for it. Speech samples of the word "Osaka" uttered by many speakers are converted into feature vector sequences, and each feature vector is converted into a label as described above. Each speech sample of "Osaka" is thus converted into a corresponding label sequence. A discrete HMM for the word "Osaka" is obtained by estimating the HMM parameters {a_ij, b_im} from these label sequences so that their likelihood is maximized. The well-known Baum-Welch method or the like can be used for this estimation.

In this case, the label sequences of the training samples for the word "Osaka" do not necessarily contain every label in the codebook. The occurrence probability of a label that does not appear in the training label sequences is estimated as 0 during training of the "Osaka" model. Consequently, if the label sequence of an utterance of "Osaka" at recognition time happens to contain a label that was not in the label sequences used to build the "Osaka" model (which is quite possible when the number of training samples is small), the degree to which that label sequence is generated from the trained "Osaka" model becomes 0. Even in such a case, however, although the labels differ, the feature vectors before conversion to labels may be quite close to the speech samples used to train the model, close enough that the utterance ought to be recognized as "Osaka" when viewed at the vector level. Since the same word is being uttered, the vectors should be similar; but when they lie near the boundary between the clusters of the labels to which they are converted, a slight difference at the vector level can easily result in conversion to entirely different labels. It is easy to see that this adversely affects recognition accuracy. The problem occurs more often as the codebook size M increases and as the number of training samples decreases.
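The zero-probability problem can be reproduced with a deliberately simplified model. The sketch below assumes a single-state model, for which the maximum-likelihood estimate of each label probability reduces to its relative frequency; it is not the Baum-Welch procedure, and the label values are made up for illustration.

```python
from collections import Counter


def estimate_label_probs(training_labels, M):
    """Relative-frequency estimate of b_m for a single-state model."""
    counts = Counter(training_labels)
    n = len(training_labels)
    return [counts[m] / n for m in range(M)]


def sequence_likelihood(b, labels):
    """Product of the label probabilities along the sequence."""
    p = 1.0
    for m in labels:
        p *= b[m]
    return p


# Training data never contains label 2, so its estimated probability is 0.
b = estimate_label_probs([0, 0, 1, 0, 1], M=4)
seen_only = sequence_likelihood(b, [0, 1])
with_unseen = sequence_likelihood(b, [0, 1, 2])
```

A single unseen label drives `with_unseen` to exactly 0, no matter how well the rest of the sequence matches the model.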

One way to eliminate this drawback is the HMM based on fuzzy vector quantization (FVQ/HMM). In particular, the multiplicative FVQ/HMM described in IEICE Technical Report SP93-27 (June 1993) is noteworthy for its excellent performance.

FIG. 4 is a block diagram explaining the general principle of the FVQ/HMM. Broken lines in the figure show the signal flow when the HMMs are created. 401 is a feature extraction unit, similar to 301 in FIG. 3. 402 is a codebook, similar to 302 in FIG. 3. 403 is a membership calculation unit, which converts each feature vector into a membership vector. A membership vector is a vector whose elements are the degrees of membership of the feature vector at a given time point in each cluster: if the feature vector at time t is y_t, the clusters are C_1, ..., C_M, and the degree of membership of y_t in C_m is u_tm, then y_t is converted into the membership vector u_t = (u_t1, ..., u_tM)^T. Hereinafter, vectors are column vectors and the superscript T denotes transposition. Various definitions of u_tm are possible; for example, it can be defined by Equation 4 (J. C. Bezdek: "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum Press, New York (1981)). d(x, y) is the distance between vectors x and y. In this equation, F > 1 is called the fuzziness, and Equation 5 holds. Here δ_ij is the Kronecker delta, with δ_ij = 1 when i = j and δ_ij = 0 when i ≠ j. Equation 5 means the following. As F → 1, if o_t is the label of the cluster whose centroid is closest to y_t, the membership of y_t in that cluster becomes 1 and its membership in every other cluster becomes 0, which is ordinary vector quantization. As F → ∞, the membership of y_t in every cluster becomes 1/M, so that ambiguity is maximal. As another definition of membership, when the posterior probability of C_m given y_t can be calculated by a neural network or other means, that posterior probability can be used (hereinafter, both "posterior probability" and "degree of membership" are called "degree of membership").
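Equation 4 is not reproduced in this text. The sketch below assumes the fuzzy c-means form from the cited Bezdek reference, u_tm = 1 / Σ_n (d(y_t, μ_m) / d(y_t, μ_n))^{1/(F-1)}. The exponent is an assumption: with an unsquared Euclidean distance the usual fuzzy c-means convention is 2/(F-1). Either choice reproduces the two limits described above. The function name and sample values are hypothetical.

```python
import math


def memberships(y, codebook, F=2.0):
    """Fuzzy membership of y in every cluster (assumed form of Equation 4).

    Weights are normalised by the smallest distance so that a fuzziness F
    close to 1 (a very large exponent) cannot overflow.
    """
    d = [math.dist(y, mu) for mu in codebook]
    d_min = min(d)
    if d_min == 0.0:
        # y coincides with a centroid: full membership there, none elsewhere.
        hits = [m for m, dm in enumerate(d) if dm == 0.0]
        return [1.0 / len(hits) if m in hits else 0.0 for m in range(len(d))]
    power = 1.0 / (F - 1.0)
    weights = [(d_min / dm) ** power for dm in d]
    total = sum(weights)
    return [w / total for w in weights]


CODEBOOK = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
Y_T = (0.5, 0.0)

nearly_hard = memberships(Y_T, CODEBOOK, F=1.01)   # close to ordinary VQ
nearly_flat = memberships(Y_T, CODEBOOK, F=1.0e6)  # close to 1/M everywhere
```

With F near 1 almost all membership goes to the nearest centroid, and with a very large F the memberships flatten toward 1/M, matching the behaviour attributed to Equation 5.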

For reasons described later, the membership u_tm is in practice not calculated for all clusters, but only for the clusters from the one with the smallest d(y_t, μ_m) to the one with the K-th smallest (the K nearest neighbors). That is, the elements of the membership vector u_t are the values calculated by Equation 4 for the K clusters with the highest membership, and 0 for the rest. 404 is a parameter estimation unit. 405 is an HMM storage unit, which stores the HMM for each recognition unit to be recognized, such as a word or syllable. 406 is a likelihood calculation unit, which calculates, from the membership vector sequence obtained at the output of the vector quantization unit (here, the membership calculation unit 403), the likelihood of each HMM for the input speech, that is, the degree L^r to which the feature vector sequence y_1, ..., y_T is generated from each HMM r (r = 1, ..., R). 407 is a decision unit, which calculates Equation 6 and takes r* as the recognition result.
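The restriction to the K nearest clusters can be sketched as below. As an assumption, the membership formula is the fuzzy c-means form with exponent 1/(F-1), evaluated over the K nearest centroids only, so that the K non-zero memberships sum to 1. The function name and sample codebook are hypothetical.

```python
import math


def top_k_memberships(y, codebook, K, F=1.5):
    """Length-M membership vector that is non-zero only for the K centroids
    nearest to y and sums to 1 over those K clusters."""
    d = [math.dist(y, mu) for mu in codebook]
    nearest = sorted(range(len(d)), key=d.__getitem__)[:K]
    u = [0.0] * len(codebook)
    d_min = d[nearest[0]]
    if d_min == 0.0:
        u[nearest[0]] = 1.0  # y sits exactly on a centroid
        return u
    power = 1.0 / (F - 1.0)
    weights = {m: (d_min / d[m]) ** power for m in nearest}
    total = sum(weights.values())
    for m, w in weights.items():
        u[m] = w / total
    return u


CODEBOOK = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
u_k2 = top_k_memberships((0.5, 0.0), CODEBOOK, K=2)
u_k1 = top_k_memberships((0.5, 0.0), CODEBOOK, K=1)
```

With K = 1 the result is a one-hot vector, which is ordinary vector quantization; larger K spreads the membership over more neighbouring clusters.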

The likelihood calculation unit 406 calculates the likelihood L^r for recognition unit r, for r = 1, ..., R, according to Equation 1; different HMMs are obtained depending on how ω_i(y_t) is defined. The multiplicative FVQ/HMM considered here defines ω_i(y_t) in principle by Equation 7, where b_im is the occurrence probability of cluster m in state i of the HMM.

As noted above, the addition or multiplication over m in Equation 7 is in practice carried out only over the K clusters with the highest membership, in which case Equation 7 becomes Equation 8 (the addition form is used hereinafter). Here h(k) is the name of the cluster in which y_t has the k-th highest membership. When membership is defined by Equation 4, Equation 4 need only be evaluated for the K smallest values of d(y_t, μ_m). In this case, u_{t,h(1)} + ... + u_{t,h(K)} = 1 and u_{t,h(K+1)} = ... = u_{t,h(M)} = 0. The addition in Equation 7 is restricted to the top K clusters, as in Equation 8, partly to reduce computation, of course, but also for the following reason.
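Equations 7 and 8 are not reproduced in this text. The sketch below assumes the addition form log ω_i(y_t) = Σ_{k=1..K} u_{t,h(k)} log b_{i,h(k)}, which is consistent with the statement that the sum runs over the K clusters of highest membership. The function names and table values are hypothetical, and all b_im are assumed to be strictly positive so that the logarithm is defined.

```python
import math


def log_table(b):
    """Precompute log b_im once so that scoring needs no logarithms."""
    return [[math.log(p) for p in row] for row in b]


def log_emission_top_k(log_b_i, u_t, K):
    """Assumed form of Equation 8: membership-weighted sum of log b_im
    over the K clusters in which y_t has the highest membership."""
    top = sorted(range(len(u_t)), key=u_t.__getitem__, reverse=True)[:K]
    return sum(u_t[m] * log_b_i[m] for m in top)


LOG_B = log_table([[0.5, 0.25, 0.25]])
soft_score = log_emission_top_k(LOG_B[0], [0.75, 0.25, 0.0], K=2)
hard_score = log_emission_top_k(LOG_B[0], [0.0, 1.0, 0.0], K=1)
```

With a one-hot membership vector and K = 1 the score reduces to the table entry log b_im of the single label, which is the discrete-HMM case.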

The FVQ type shows a higher recognition rate than the discrete type because of the interpolation effect on the training samples during parameter estimation. This interpolation effect works, for example, as follows. Consider estimating from training samples the probabilities that cluster A and cluster B occur in state i. In the discrete type, a vector to be quantized is classified as A if it lies even slightly on the A side of the boundary, however close it is to B, and as B if it lies even slightly on the B side. Thus, even if A and B occur in roughly equal proportions in the population, the training samples may be biased; in particular, if many of the vectors near the A/B boundary happen to fall in A, the probability of A will be estimated to be larger than that of B. Such bias in the training data becomes more likely when the amount of training data is small relative to the codebook size, and when the training samples and the evaluation data are independent, the bias does not necessarily match the tendency of the evaluation data, so the recognition rate falls.

In the FVQ type, by contrast, the occurrence probabilities are calculated on the assumption that not only A but also B has occurred, in proportion to the vector's degrees of membership. For training samples like those above, the probability of A may still be estimated somewhat higher, but the probability of B is also estimated according to its membership, so the estimation error is not as extreme as in the discrete type. In other words, the FVQ type interpolates the training samples, which approximately amounts to increasing their number. This is why the recognition rate of the FVQ type exceeds that of the discrete type, particularly for large codebook sizes.
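The interpolation effect can be illustrated with a single-state simplification in which each occurrence probability is just an average over the training vectors. This is an assumption made for illustration and is not the Baum-Welch re-estimation used in practice; the membership values are made up.

```python
def hard_estimate(membership_vectors):
    """Discrete type: each vector counts fully for its highest-membership cluster."""
    M = len(membership_vectors[0])
    counts = [0] * M
    for u in membership_vectors:
        counts[max(range(M), key=u.__getitem__)] += 1
    return [c / len(membership_vectors) for c in counts]


def soft_estimate(membership_vectors):
    """FVQ type: each vector contributes to every cluster by its membership."""
    M = len(membership_vectors[0])
    n = len(membership_vectors)
    return [sum(u[m] for u in membership_vectors) / n for m in range(M)]


# Four training vectors near the A/B boundary, three of them slightly on the A side.
NEAR_BOUNDARY = [[0.55, 0.45], [0.60, 0.40], [0.52, 0.48], [0.45, 0.55]]
hard = hard_estimate(NEAR_BOUNDARY)
soft = soft_estimate(NEAR_BOUNDARY)
```

Hard assignment yields a 3:1 split between A and B, whereas the membership-weighted estimate stays close to an even split, which is closer to the population described in the text.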

However, although the FVQ type compensates for a shortage of training samples by interpolation, it only increases the apparent number of training samples, approximately, from the samples actually given; this is somewhat different from actually increasing the number of training samples. Therefore, when the codebook size is reduced so that the number of training samples per cluster increases relatively and the estimation accuracy of b_im becomes sufficiently high, it is quite possible, depending on how the interpolation is done, that the discrete type without interpolation achieves a recognition rate equal to or higher than that of the FVQ type with poorly chosen interpolation.

The degree of interpolation is affected by the value of K as well as by the codebook size and the fuzziness. As K approaches 1, that is, as the model approaches the discrete type, the influence of interpolation decreases; as K increases, it grows. With the fuzziness fixed, the degree of interpolation can therefore be controlled through K. Making K arbitrarily large is counterproductive: in the sense of maximizing the improvement in recognition rate of the FVQ type over the discrete type, there is an optimal value K_0 of K that depends on the codebook size. In an experiment on speaker-independent recognition of 100 city names, the optimal value was K = 6 for a codebook size of 256 and K = 3 for a codebook size of 16.

Thus, compared with the discrete type, the FVQ type requires Equation 8 to be calculated at recognition time, adding K membership calculations and K multiply-accumulate operations. Its recognition rate, however, is higher than that of the discrete type and equal to or better than that of the continuous type, while its computational cost is considerably lower than that of the continuous type.

Equation 1 is calculated by a method called the Forward-Backward method, but to reduce computation the Viterbi method, which calculates the maximum over X as an approximation to Equation 1, is often used, usually in logarithmic form so that products become sums. That is, Equation 9 is calculated and L′ is taken as the likelihood. Equation 9 can be calculated efficiently by dynamic programming: L′ is obtained by setting φ_i(1) = log π_i, evaluating the recurrence of Equation 10 for t = 2, ..., T, and then applying Equation 11. This is called the Viterbi method. Since it makes little difference to the recognition result whether L or L′ is used, it is common to use the Baum-Welch method (Forward-Backward method) for building models and the Viterbi method for recognition. In the multiplicative FVQ/HMM, when the Viterbi method is used for recognition, b_im is used only in the form log b_im; if log b_im is stored instead of b_im itself, Equation 7 or Equation 8 can be calculated with multiply-accumulate operations alone, without any logarithm operations.
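The recurrence formulas are not reproduced in this text. The sketch below assumes that each state's emission score is added when the state is left, so that φ_i(1) = log π_i, φ_j(t) = max_i [φ_i(t-1) + log a_ij + log ω_i(y_{t-1})] for t = 2, ..., T, and L′ = max_i [φ_i(T) + log ω_i(y_T) + log a_{i,J+1}]. This arrangement, the function names, and the toy model are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log p, with log 0 represented as minus infinity."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_log_likelihood(log_pi, log_a, log_emit, T):
    """Log-domain Viterbi score L' under the assumed recurrence.

    log_pi[i]      : log initial probability of emitting state i (0 .. J-1)
    log_a[i][j]    : log transition probability; column J is the final state
    log_emit(i, t) : log degree of occurrence of observation t in state i
    """
    J = len(log_pi)
    phi = list(log_pi)
    for t in range(1, T):
        phi = [max(phi[i] + log_a[i][j] + log_emit(i, t - 1) for i in range(J))
               for j in range(J)]
    return max(phi[i] + log_emit(i, T - 1) + log_a[i][J] for i in range(J))


# Toy two-state left-to-right model with a discrete emission table (made-up values).
LOG_PI = [safe_log(p) for p in (1.0, 0.0)]
LOG_A = [[safe_log(p) for p in row] for row in ([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])]
LOG_B = [[safe_log(p) for p in row] for row in ([0.9, 0.1], [0.2, 0.8])]
LABELS = [0, 0, 1]


def table_log_emit(i, t):
    return LOG_B[i][LABELS[t]]


best_log_score = viterbi_log_likelihood(LOG_PI, LOG_A, table_log_emit, len(LABELS))
```

Only additions and comparisons appear inside the loop, which is the computational advantage of storing the logarithms in advance.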

DP matching is described next. The most basic method is pattern matching between feature vector sequences. FIG. 5 shows a conventional example. 51 is a feature extraction unit, similar to 301 in FIG. 3. 53 is a standard pattern storage unit, in which standard patterns corresponding to words are stored. Each standard pattern is registered in the standard pattern storage unit in advance, for a word to be recognized, as the feature vector sequence produced by the feature extraction unit 51. The broken line in FIG. 5 shows the connection used for this registration; it is disconnected at recognition time. 52 is a pattern matching unit, which performs matching between each standard pattern stored in the standard pattern storage unit 53 and the input pattern, and calculates the distance (or similarity) between the input pattern and each standard pattern. 54 is a decision unit, which finds the word corresponding to the standard pattern that gives the minimum distance (or maximum similarity) between the input pattern and the standard patterns.

In more concrete terms, this works as follows. This example is described in terms of the "distance" between patterns (for a "similarity"-based formulation, replace "distance" with "similarity" and "minimum" with "maximum"). Let y_t be the feature vector output by the feature extraction unit 51 at time t, let the input pattern be the sequence Y = (y_1, y_2, ..., y_T), and let the standard pattern for word r be Y^(r) as given by Equation 12. Let D^(r) be the distance of Y from Y^(r) and d^(r)(t, j) the distance between y_t and y^(r)_j (written D_2^(r) and d_2^(r)(t, j) in the multiplication form, and D_1^(r) and d_1^(r)(t, j) in the addition form). Equation 13 is then calculated, and the r* of Equation 14 is taken as the recognition result. In Equation 13, x(k) = (t(k), j(k)) is the k-th lattice point on the matching path X between Y and Y^(r) in the lattice graph (t, j), and w(x(k)) is the weighting coefficient applied to the distance at lattice point x(k).

Parallel arguments hold for the multiplication and addition forms, and conversion to the multiplication form is easy when required (d_1^(r)(t, j) = log d_2^(r)(t, j), D_1^(r) = log D_2^(r), and so on). Since the addition form is the one generally used, the description below is given mainly in the addition form (the subscripts 1 and 2 are therefore omitted), with the multiplication form noted where necessary.

Let X(k_1, k_2) denote the point sequence x(k_1), ..., x(k_2) from x(k_1) to x(k_2), and let x(K) = (t(K), j(K)) = (T, J). Equation 13 then means that the distance D^(r) between Y and Y^(r) is the minimum, over X(1, K), of the accumulated weighted distances between the feature vectors of the input pattern Y and the standard pattern Y^(r) that are paired along the point sequence X(1, K). If the weighting coefficients w(x(k)) are chosen suitably, Equation 13 can be calculated efficiently by dynamic programming; this is called DP matching.

DPが行えるためには最適性の原理が成り立つ必要があ
る。即ち、「最適方策の部分方策はその部分方策でまた
最適方策である」と言うことが言えなければならない。
これが言えれば、に対して、なる漸化式が成り立ち、計算量が大幅に削減されること
になる。For DP to be applicable, the principle of optimality must hold. That is, it must be possible to say that "a sub-policy of an optimal policy is itself an optimal policy for that sub-problem". If this holds, then a recurrence relation, (Equation 16), holds for the minimum weighted cumulative distance, and the amount of computation is greatly reduced.

点ｘ（１）から、点p₀＝ｘ（ｋ）までの最適方策は、
点列Ｘ（1,k）＝（ｘ（１）,...,x（ｋ）＝p₀）に沿う
重み付き累積距離をφ′（p₀,X（1,k））とするとき、
φ′（p₀,X（1,k））を最小にする点列（最適点列）を
見出すことである。この最適点列をＸ^＊（1,k）＝（ｘ
^＊（１）,...,x^＊（ｋ−１）,x^＊（ｋ）＝p₀）とし、
φ′（p₀,X^＊（1,k））をφ（p₀）とすれば、前記最適
性の原理が成り立つと言うことは、点ｘ（１）から点ｘ
^＊（ｋ−１）までの最適点列は、点列Ｘ^＊（1,k）上
の、点ｘ^＊（１）から点ｘ^＊（ｋ−１）までの点列に一
致するということである。言い換えれば、ｘ（１）を始
端、ｘ（ｋ−１）を終端とする最適点列の中で、φ（ｘ
（ｋ−１））＋ｗ（p₀）d^(r)（p₀）が最小になる点列を
Ｘ^＊（1,k−１）＝（ｘ^＊（１）,...,x^＊（ｋ−１））
とするとき、ｘ（１）からｘ（ｋ）＝p₀までの最適点列
におけるｘ（ｋ−１）までの点列は、Ｘ^＊（1,k−１）
に一致する。故に、種々のｘ（１）を始端とし、種々の
ｘ（ｋ−１）を終端とする最適点列が既知、従って種々
のｘ（ｋ−１）についてφ（ｘ（ｋ−１））が既知であ
れば、種々のｘ（１）から特定のｘ（ｋ）＝p₀までの最
適点列とそれに沿う重み付き累積距離は（数16）によっ
て計算できる。即ち、点ｘ（１）から点ｘ（ｋ）迄の重
み付き最小累積距離φ（ｘ（ｋ））は、重み付き最小累
積距離φ（ｘ（ｋ−１））を用いてその続きとして（数
16）に従って求められると言うことであって、φ（ｘ
（１））＝ｗ（ｘ（１））d^(r)（ｘ（１））を初期値と
してD^(r)＝φ（ｘ（Ｋ））が漸化的に求められるから、
全ての許される径路における累積距離を総当たりで計算
するよりははるかに少ない計算量で重み付き最小累積距
離が求められる。The optimal policy from the point x(1) to the point p_0 = x(k) is to find the point sequence (the optimal point sequence) that minimizes φ′(p_0, X(1, k)), where φ′(p_0, X(1, k)) is the weighted cumulative distance along the point sequence X(1, k) = (x(1), ..., x(k) = p_0). Let this optimal point sequence be X*(1, k) = (x*(1), ..., x*(k−1), x*(k) = p_0), and write φ′(p_0, X*(1, k)) as φ(p_0). That the principle of optimality holds then means that the optimal point sequence from the point x(1) to the point x*(k−1) coincides with the part of the point sequence X*(1, k) running from x*(1) to x*(k−1). In other words, if, among the optimal point sequences that start at x(1) and end at x(k−1), the one minimizing φ(x(k−1)) + w(p_0) d^(r)(p_0) is X*(1, k−1) = (x*(1), ..., x*(k−1)), then the part up to x(k−1) of the optimal point sequence from x(1) to x(k) = p_0 coincides with X*(1, k−1). Hence, if the optimal point sequences starting at the various x(1) and ending at the various x(k−1) are known, and therefore φ(x(k−1)) is known for the various x(k−1), the optimal point sequence from the various x(1) to a specific x(k) = p_0 and the weighted cumulative distance along it can be calculated by (Equation 16). That is, the minimum weighted cumulative distance φ(x(k)) from the point x(1) to the point x(k) is obtained by (Equation 16) as a continuation of the minimum weighted cumulative distance φ(x(k−1)). Since D^(r) = φ(x(K)) is obtained recursively from the initial value φ(x(1)) = w(x(1)) d^(r)(x(1)), the minimum weighted cumulative distance is obtained with far less computation than a brute-force calculation of the cumulative distance over all permitted paths.
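
The recurrence described in this paragraph can be written out as follows. This is a reconstruction from the prose only (the definition of φ, the quantity φ(x(k−1)) + w(p_0) d^(r)(p_0) being minimized, the initial value, and D^(r) = φ(x(K))); it is not the typeset form of the equations in the original publication, which did not survive in this text.

```latex
\begin{align}
\phi(x(k)) &= \min_{X(1,k)} \sum_{n=1}^{k} w(x(n))\, d^{(r)}(x(n)), \\
\phi(x(k)) &= \min_{x(k-1)} \Bigl[ \phi(x(k-1)) + w(x(k))\, d^{(r)}(x(k)) \Bigr], \\
\phi(x(1)) &= w(x(1))\, d^{(r)}(x(1)), \qquad D^{(r)} = \phi(x(K)).
\end{align}
```

The minimum in the second line is taken over the grid points x(k−1) from which x(k) may be reached under the path constraint in use.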

ここで、（数16）を成立させることが出来る重み係数
の例として等の何れかを満足する場合が考えられる。即ち、重み係
数を（数17）等とすれば、最適性の原理が成立し、動的
計画法が適用できる。（１）は重み係数の総和が入力パ
ターンの長さ（フレーム数）に等しくなる場合、（２）
は重み係数の総和が標準パターンの長さに等しくなる場
合、（３）は重み係数の総和が入力パターンと標準パタ
ーンの長さの和に等しくなる場合である。Here, as examples of weight coefficients for which (Equation 16) can hold, cases satisfying any one of the conditions in (Equation 17) can be considered. That is, if the weight coefficients are chosen as in (Equation 17) or the like, the principle of optimality holds and dynamic programming can be applied. Condition (1) is the case where the sum of the weight coefficients equals the length of the input pattern (the number of frames), (2) is the case where the sum equals the length of the standard pattern, and (3) is the case where the sum equals the sum of the lengths of the input pattern and the standard pattern.

（数17）の式（１）を用いれば、（数16）の漸化式の
具体例の１つとして（数18）が考えられる。Using expression (1) of (Equation 17), (Equation 18) can be considered as one concrete example of the recurrence of (Equation 16).

（数18）をｔ＝1,...,T,j＝1,...,Jについて逐次計算
することによって（数13）即ちD^(r)を計算することが出
来る。この場合はｘ（ｋ）につながり得る径路は、図６
のように拘束していることになる。即ち、点（t,j）に
至る径路は、点（ｔ−2,j−１）→点（ｔ−1,j）→点
（t,j）、点（ｔ−1,j−１）→点（t,j）、点（ｔ−1,j
−１）→点（t,j）、の３通りの何れかのみを通るもの
であって、径路上の数値はそれぞれの径路が選ばれたと
きの重み係数を示す。この場合は、ｗ（ｘ（１））＋…
＋ｗ（ｘ（Ｋ））は入力フレーム数Ｔに等しくなる。従
って、この場合は（数14）の分母は標準パターンと関係
なく一定になるので、入力パターンがどの標準パターン
に最も近いかを計算する場合は、ｗ（ｘ（１））＋…＋
ｗ（ｘ（Ｋ））で正規化する必要はない。この場合、d
^(r)（t,j）としては、ユークリッド距離またはより簡略
化されたものとして市街地距離等がよく用いられる。 By evaluating (Equation 18) successively for t = 1, ..., T and j = 1, ..., J, (Equation 13), that is, D^(r), can be calculated. In this case the paths that can lead to x(k) are constrained as shown in FIG. 6. That is, a path reaching the point (t, j) passes through exactly one of three routes: point (t−2, j−1) → point (t−1, j) → point (t, j); point (t−1, j−1) → point (t, j); or point (t−1, j−1) → point (t, j). The numerical values on the paths indicate the weight coefficients applied when the respective path is selected. In this case, w(x(1)) + ... + w(x(K)) equals the number of input frames T. Consequently, the denominator of (Equation 14) is constant irrespective of the standard pattern, so when determining which standard pattern is closest to the input pattern there is no need to normalize by w(x(1)) + ... + w(x(K)). For d^(r)(t, j), the Euclidean distance or, as a more simplified alternative, the city-block distance is often used.
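
As an illustration of DP matching in which the weights sum to the number of input frames T, the following is a minimal Python sketch. It does not reproduce (Equation 18) or the path constraint of FIG. 6, neither of which is available in this text. Instead it assumes a simpler, hypothetical constraint in which the point (t, j) may be reached from (t−1, j), (t−1, j−1) or (t−1, j−2), each with weight 1, so that every step advances the input frame by one. The function names `city_block`, `dp_match` and `recognize` are illustrative only.

```python
import math

def city_block(a, b):
    """City-block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def dp_match(inp, ref, dist=city_block):
    """Minimum accumulated distance between inp (T frames) and ref (J frames).

    Assumed step pattern (not the one in FIG. 6): (t, j) is reached from
    (t-1, j), (t-1, j-1) or (t-1, j-2), each with weight 1.
    """
    T, J = len(inp), len(ref)
    phi = [[math.inf] * J for _ in range(T)]
    phi[0][0] = dist(inp[0], ref[0])          # initial value phi(x(1))
    for t in range(1, T):
        for j in range(J):
            best = min(phi[t - 1][j - s] for s in (0, 1, 2) if j - s >= 0)
            if best < math.inf:
                phi[t][j] = best + dist(inp[t], ref[j])
    return phi[T - 1][J - 1]                  # D = phi(x(K)), x(K) = (T, J)

def recognize(inp, templates):
    """Return the key of the template with the smallest DP distance."""
    return min(templates, key=lambda r: dp_match(inp, templates[r]))

if __name__ == "__main__":
    templates = {"a": [[0.0], [1.0], [2.0]], "b": [[5.0], [5.0], [5.0]]}
    print(recognize([[0.0], [1.0], [2.0]], templates))
```

Because every path accumulates exactly T local distances under this assumed constraint, the totals for different templates are directly comparable without normalization, which is the property the paragraph above relies on.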

前記マッチング計算において最も計算量は多いのは、
特徴ベクトル間の距離計算あるいは類似度計算である。
特に単語数が多くなって来るとこの計算量がそれに比例
して多くなり応答に時間がかかり、実用上問題となって
来る。これを減らすために考え出されたものにベクトル
量子化を用いるいわゆる“SPLIT法”がある（SPLIT:Wor
d Recognition System Using Strings of Phoneme−Lik
e Templates）。（菅村、古井“擬音韻標準パタンによ
る大語彙単語音声認識",信学論（Ｄ）,J65−D,8,pp.104
1−1048（昭57−08）。）図７はその従来例を示すブロック図である。特徴抽出
部71は図３のものと同様である。73はコードブックであ
って、Ｍ個のラベル付けされた代表ベクトルがラベルに
よって検索可能な形で記憶されている。74はベクトル量
子化部であって、特徴抽出部71の出力特徴ベクトルy_tを
コードブック73を用いて、y_tに最も近いセントロイドを
持つクラスタのラベルに変換するものである。77は単語
辞書であって、認識すべき単語音声の標準パターンが上
記の如き操作によってラベル系列に変換されたものとし
て記憶されている。このラベルは別名擬音韻とも呼ばれ
る。標準パターンたる単語ｒの第ｋ番フレームの擬音韻
をs^(r) _kとすれば、同図に示すような形で認識すべき単
語が擬音韻列の形で登録される。J^(r)は単語ｒの標準パ
ターンの最終フレーム（従ってフレーム数）である。同
図における破線は認識単語の登録動作の時にのみ用いら
れる接続を示す。72は距離行列算出部であって、特徴抽
出部71のそれぞれの出力ベクトルの、それぞれのクラス
タのセントロイドに対する距離を求め、それら距離を要
素とするベクトルに変換し、特徴ベクトル系列を距離ベ
クトル系列即ち距離行列に変換する。例えば、距離行列
は75に示すようなもので、フレームｔの特徴ベクトルy_t
の、クラスタC_mのセントロイドμ_ｍとの距離ｄ（y_t,μ
_ｍ）（図７ではd_TMと表記されている）を要素とする距
離ベクトル（ｄ（y_t,μ_１）、ｄ（y₁,μ_２）,...,d
（y₁,μ_Ｍ））^Ｔにy_tに変換される。距離は例えば市街
地距離を用いる場合はと定義できる。ここに、y_tkはベクトルy_tの第ｋ要素、
μ_mkはC_mのセントロイドベクトルμ_ｍの第ｋ要素であ
る。76はマッチング部であって距離行列算出部72の出力
たる距離行列と単語辞書のそれぞれの単語とのマッチン
グをとり、その間の距離を計算するものである。具体的
には、s^(r) _j＝C_mとするとき、y_tとs^(r) _jとの距離d
^(r)（t,j）を（数20） d^(r)（t,j）＝ｄ（y_t,μ_ｍ）として、（数18）を計算することになる。即ち、図７は
図５の従来例におけるd^(r)（t,j）の代わりに、距離行
列を参照することによって前以て計算されているｄ
（y_t,μ_ｍ）を用いる点が異なるのみであって全く同様
にDPを用いて計算できる。78は判定部であって、（数1
4）を計算し、最終的に認識結果を得るものである。こ
の場合、（数14）の分母は図１の場合と同じ値を持ち、
図５の実施例で説明したことと同じ理由でｗ（ｘ
（１））＋…＋ｗ（ｘ（Ｋ））＝Ｔであるからこれで正
規化する必要はない。In the matching computation described above, the largest share of the computation is the distance or similarity calculation between feature vectors. In particular, as the number of words grows, this computation grows in proportion, the response takes longer, and this becomes a practical problem. One technique devised to reduce it is the so-called "SPLIT method", which uses vector quantization (SPLIT: Word Recognition System Using Strings of Phoneme-Like Templates) (Sugamura and Furui, "Large Vocabulary Word Speech Recognition Using Pseudo-Phoneme Standard Patterns", IEICE Transactions (D), J65-D, 8, pp. 1041-1048 (August 1982)). FIG. 7 is a block diagram showing this conventional example. The feature extraction unit 71 is the same as that in FIG. 3. Reference numeral 73 denotes a codebook in which M labeled representative vectors are stored in a form retrievable by label. Reference numeral 74 denotes a vector quantization unit, which uses the codebook 73 to convert the output feature vector y_t of the feature extraction unit 71 into the label of the cluster whose centroid is closest to y_t. Reference numeral 77 denotes a word dictionary, in which the standard patterns of the word utterances to be recognized are stored after being converted into label sequences by the above operation. These labels are also called pseudo-phonemes. If the pseudo-phoneme of the k-th frame of the word r serving as a standard pattern is s^(r)_k, the words to be recognized are registered as pseudo-phoneme strings in the form shown in the figure. J^(r) is the final frame (and hence the number of frames) of the standard pattern of the word r. The broken line in the figure indicates a connection used only when registering the words to be recognized. Reference numeral 72 denotes a distance matrix calculation unit, which obtains the distance of each output vector of the feature extraction unit 71 from the centroid of each cluster, converts each output vector into a vector whose elements are those distances, and thereby converts the feature vector sequence into a distance vector sequence, that is, a distance matrix. For example, the distance matrix is as shown at 75: the feature vector y_t of frame t is converted into the distance vector (d(y_t, μ_1), d(y_t, μ_2), ..., d(y_t, μ_M))^T whose elements are the distances d(y_t, μ_m) (written d_tm in FIG. 7) from the centroid μ_m of each cluster C_m. When the city-block distance is used, for example, the distance can be defined as d(y_t, μ_m) = Σ_k |y_tk − μ_mk|, where y_tk is the k-th element of the vector y_t and μ_mk is the k-th element of the centroid vector μ_m of C_m. Reference numeral 76 denotes a matching unit, which matches the distance matrix output by the distance matrix calculation unit 72 against each word in the word dictionary and calculates the distance between them. Specifically, when s^(r)_j = C_m, the distance d^(r)(t, j) between y_t and s^(r)_j is taken to be (Equation 20) d^(r)(t, j) = d(y_t, μ_m), and (Equation 18) is calculated. That is, FIG. 7 differs from the conventional example of FIG. 5 only in that d(y_t, μ_m), calculated in advance and looked up in the distance matrix, is used in place of d^(r)(t, j); the calculation can be carried out with DP in exactly the same way. Reference numeral 78 denotes a determination unit, which calculates (Equation 14) and finally obtains the recognition result. In this case, the denominator of (Equation 14) has the same value as in the case of FIG. 1, and for the same reason as explained for FIG. 5, w(x(1)) + ... + w(x(K)) = T, so there is no need to normalize by it.
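
The table-lookup idea of the SPLIT method can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the toy codebook, the frames and the helper names `distance_vector`, `quantize` and `local_distance` are hypothetical and chosen only for illustration, and the city-block distance is used as in the example in the text.

```python
def city_block(a, b):
    """City-block distance: sum over k of |a_k - b_k|."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_vector(y_t, codebook):
    """Distances from one feature vector y_t to every centroid (one column of the distance matrix)."""
    return [city_block(y_t, mu) for mu in codebook]

def quantize(y_t, codebook):
    """Label (index) of the cluster whose centroid is closest to y_t."""
    d = distance_vector(y_t, codebook)
    return d.index(min(d))

def local_distance(dist_matrix, word_labels, t, j):
    """d(t, j) for a word stored as a label sequence: a table lookup, no new distance calculation."""
    return dist_matrix[t][word_labels[j]]

if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]          # M = 3 centroids (toy values)
    # Registration: a word's standard pattern becomes a sequence of labels.
    word_labels = [quantize(v, codebook) for v in [[0.0, 0.0], [1.0, 1.0]]]
    # Recognition: each input frame is compared with the M centroids once.
    frames = [[0.1, 0.0], [0.9, 1.2]]
    dist_matrix = [distance_vector(y, codebook) for y in frames]
    print(word_labels, local_distance(dist_matrix, word_labels, 1, 1))
```

The distance matrix is computed once per input, so adding more words to the dictionary adds lookups but no further vector-distance calculations.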

図５の従来例の場合は、y_tとy^(r) _jの距離計算は認識
単語数が増えるとそれにともなって増加するが、図７の
従来例の場合は、距離行列75を一旦計算してしまえば、
y_tと擬音韻との距離は距離行列75を参照するのみでよい
ので、単語がいくら増えてもd^(r)（t,j）の計算量は不
変である。In the conventional example of FIG. 5, the number of distance calculations between y_t and y^(r)_j grows as the number of words to be recognized grows. In the conventional example of FIG. 7, however, once the distance matrix 75 has been calculated, the distance between y_t and a pseudo-phoneme is obtained simply by referring to the distance matrix 75, so the amount of computation for d^(r)(t, j) does not change however many words are added.

例えば、１単語平均50フレーム、特徴ベクトルを10次
元として100単語を認識する場合を考えてみれば、図５
の場合、y_tと距離計算を行うべき標準パターンベクトル
の数は50×100＝5000のオーダーであり、距離をユーク
リッド距離とすればかけ算の回数はこれを10倍して5000
0回となる。図７の場合は、y_tと距離計算を行うのは、
コードブックの各セントロイドベクトルのそれぞれとで
あるから、クラスタ数をＭ＝256とすれば、認識単語数
に関わりなく256回の距離計算で済み、かけ算の回数は2
560となり、後者は前者の約1/20で済むと言うことにな
る。For example, consider recognizing 100 words with an average of 50 frames per word and 10-dimensional feature vectors. In the case of FIG. 5, the number of standard pattern vectors against which the distance from y_t must be calculated is of the order of 50 × 100 = 5000, and if the Euclidean distance is used the number of multiplications is ten times this, 50,000. In the case of FIG. 7, the distance from y_t is calculated against each centroid vector in the codebook, so with M = 256 clusters only 256 distance calculations are needed regardless of the number of words to be recognized, and the number of multiplications is 2560. The latter thus requires only about 1/20 of the former.
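
The counts in this example can be checked with a few lines of Python. The helper name `multiplications` is hypothetical; it simply assumes one multiplication per vector dimension per Euclidean distance, as the text does.

```python
def multiplications(num_vectors, dim):
    """Multiplications per input frame: one per dimension for each vector compared."""
    return num_vectors * dim

direct = multiplications(50 * 100, 10)   # FIG. 5: every frame of every word
split = multiplications(256, 10)         # FIG. 7: one comparison per centroid
print(direct, split, split / direct)
```

The ratio is 2560 / 50,000, which is roughly the 1/20 stated in the text.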

なお、ここでは、入力特徴ベクトル系列は距離マトリ
クスに変換されるとして説明したが、実際には、距離ベ
クトル（d_t1,…,d_tM）^Ｔは標準パターンの擬音韻s^(r) _j
（ｒ＝1,…,R;j＝1,…,J^(r)）それぞれとの一通りの照
合が終わると不要になるから、入力のフレーム毎に距離
ベクトルの算出と累積距離の漸化式の計算を全ての標準
パターンに対して行えば、ｄ（y_t,μ_ｊ）はマトリクス
として記憶する必要はなく、例えば（数18）を用いる場
合は、現フレームと直前のフレームの２フレーム分につ
いて距離ベクトルを記憶しておけば良く、記憶量は実際
にはもっと少なくなる。The input feature vector sequence has been described here as being converted into a distance matrix. In practice, however, the distance vector (d_t1, ..., d_tM)^T is no longer needed once it has been compared against each of the pseudo-phonemes s^(r)_j (r = 1, ..., R; j = 1, ..., J^(r)) of the standard patterns. Therefore, if the calculation of the distance vector and the evaluation of the cumulative-distance recurrence are carried out for all standard patterns frame by frame on the input, d(y_t, μ_j) need not be stored as a matrix. When (Equation 18) is used, for example, it suffices to store the distance vectors for two frames, the current frame and the immediately preceding one, so the actual storage requirement is considerably smaller.

前記FVQ/HMMは、連続型HMMと同等以上の認識率を示
し、計算量は連続型に比べればはるかに少ないが、ワー
ドスポッティングを行う場合は、ω_ｉ（y_t）の定義を前
記FVQ/HMMと同じにすると言うわけには行かない。The FVQ/HMM described above achieves a recognition rate equal to or higher than that of the continuous HMM with far less computation than the continuous type. When word spotting is performed, however, ω_i(y_t) cannot simply be defined in the same way as in the FVQ/HMM described above.

また、前記SPLIT法は、スペクトルを直接マッチング
する方法に比べれば格段に少ない計算量ですむが、認識
精度に劣化をきたす問題がある。The SPLIT method requires much less computation than a method of directly matching spectra, but has a problem in that recognition accuracy is deteriorated.

発明の開示本願の第１の発明はこの問題点を解決したものであ
る。第２の発明は前記SPLIT法の改良に関するものであ
り、前記FVQの考え方をDPマッチングに適用することで
ある。第３の発明は、前記HMMおよびDPにおける記憶
量、計算量の削減に関するものである。第４の発明は、
特に前記HMMにおいて、認識時における計算量をさらに
削減するものである。DISCLOSURE OF THE INVENTION The first invention of the present application solves this problem. The second invention relates to an improvement of the SPLIT method and applies the concept of FVQ to DP matching. The third invention relates to reducing the amount of storage and computation in the HMM and in DP matching. The fourth invention further reduces the amount of computation at recognition time, particularly in the HMM.

（１）第１の発明は、解析の対象とするシステムは複数
の状態をとるとし、特徴ベクトル空間をクラスタリング
し、それぞれのクラスタの代表ベクトルがそのラベルで
検索可能な形で記憶されたコードブックと、各状態にお
ける前記各ラベルの発生確率（従って各クラスタの発生
確率）を記憶するクラスタ発生確率記憶手段と、前記コ
ードブックを用いて観測ベクトルの前記各クラスタへの
帰属度（前記各クラスタの該観測ベクトルに対する事後
確率）を算出する帰属度算出手段と、該算出された各ク
ラスタへの前記観測ベクトルの帰属度の対数値と前記ク
ラスタ発生確率記憶手段に記憶されている各クラスタの
発生確率との積和またはそれに等価な量を算出し、観測
ベクトルの前記システムの各状態における発生度合とす
る観測ベクトル発生度合算出手段とを含む。(1) In the first invention, the system to be analyzed is assumed to take a plurality of states, and the invention includes: a codebook in which the feature vector space is clustered and the representative vector of each cluster is stored in a form retrievable by its label; cluster occurrence probability storage means for storing the occurrence probability of each label (and hence of each cluster) in each state; membership degree calculation means for calculating, using the codebook, the degree of membership of an observation vector to each cluster (the posterior probability of each cluster given the observation vector); and observation vector occurrence degree calculation means for calculating the sum of products of the logarithm of the calculated degree of membership of the observation vector to each cluster and the occurrence probability of each cluster stored in the cluster occurrence probability storage means, or a quantity equivalent thereto, as the degree of occurrence of the observation vector in each state of the system.
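
The sum of products named in the first invention can be sketched in Python as follows. This is only an illustration of the stated quantity, the sum over clusters of (occurrence probability) × log(degree of membership). The function name is hypothetical, the membership vector `u_t` is assumed to have been computed elsewhere, and its entries are assumed to be strictly positive wherever the corresponding probability is non-zero, since the logarithm of zero is undefined.

```python
import math

def occurrence_degree_log_membership(u_t, b_i):
    """Sum over clusters m of b_im * log(u_tm).

    u_t: degrees of membership of the observation vector to each cluster.
    b_i: occurrence probabilities of each cluster in state i.
    Terms with b_im == 0 contribute nothing and are skipped.
    """
    return sum(b * math.log(u) for u, b in zip(u_t, b_i) if b > 0.0)

if __name__ == "__main__":
    print(occurrence_degree_log_membership([0.5, 0.5], [0.25, 0.75]))
```

Note that the logarithm is applied to the membership here, whereas in the fourth invention it is applied to the occurrence probability.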

（２）第２の発明は、特徴ベクトル空間をクラスタリン
グし、それぞれのクラスタの代表ベクトルがそのラベル
で検索可能な形で記憶されたコードブックと、観測ベク
トルの前記各クラスタの帰属度あるいは前記各クラスタ
の前記観測ベクトルに対する事後確率（両方含めて以後
帰属度と呼ぶことにする）を算出し、前記観測ベクトル
の各クラスタに対する帰属度を要素とする帰属度ベクト
ルを算出する帰属度算出手段と、帰属度ベクトルで表現
した標準パターンを記憶する標準パターン記憶手段と、
前記帰属度算出手段の出力として得られる前記観測ベク
トルから変換された帰属度ベクトルからなる入力パター
ンと前記標準パターンとのマッチングを行うマッチング
手段を含む。(2) The second invention includes: a codebook in which the feature vector space is clustered and the representative vector of each cluster is stored in a form retrievable by its label; membership degree calculation means for calculating the degree of membership of an observation vector to each cluster, or the posterior probability of each cluster given the observation vector (both are hereinafter called the degree of membership), and for forming a membership vector whose elements are the degrees of membership of the observation vector to the respective clusters; standard pattern storage means for storing standard patterns expressed as membership vectors; and matching means for matching an input pattern, consisting of the membership vectors into which the observation vectors are converted and which are obtained as the output of the membership degree calculation means, against the standard patterns.

（３）第３の発明は、特徴ベクトル空間をクラスタリン
グし、それぞれのクラスタの代表ベクトルがそのラベル
で検索可能な形で記憶されたコードブックと、HMMの状
態ｉにおけるクラスタｍの発生確率またはDPマッチング
における標準パターンベクトルの第ｉフレームの特徴ベ
クトルのクラスタｍへの帰属度をb_im、クラスタ数をＭ
とするとき、b_i1,...,b_iMの中から大きさの順にとった
Ｎ個ｂ_i,g（i,1）,b_i,g（i,2）,...,b_i,g（i,N）（ｇ
（i,n）はｎ番目に大きいクラスタのラベル）はそのま
まの値またはそれぞれの対数値log b_i,g（i,1）,log b
_i,g（i,2）,...,log b_i,g（i,N）の形で記憶し、残りの
ｂ_{i,g（i,N＋１）},...,b_i,g（i,M）は一定値を記憶する
クラスタ発生確率記憶手段または帰属度標準パターン記
憶手段を含む。(3) The third invention includes: a codebook in which the feature vector space is clustered and the representative vector of each cluster is stored in a form retrievable by its label; and cluster occurrence probability storage means or membership standard pattern storage means in which, where b_im is the occurrence probability of cluster m in state i of the HMM, or the degree of membership to cluster m of the feature vector of the i-th frame of a standard pattern in DP matching, and M is the number of clusters, the N largest of b_i1, ..., b_iM, namely b_{i,g(i,1)}, b_{i,g(i,2)}, ..., b_{i,g(i,N)} (g(i, n) being the label of the n-th largest cluster), are stored either as they are or as their logarithms log b_{i,g(i,1)}, log b_{i,g(i,2)}, ..., log b_{i,g(i,N)}, and a constant value is stored for the remaining b_{i,g(i,N+1)}, ..., b_{i,g(i,M)}.
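
The storage scheme of the third invention can be sketched in Python as follows. This is a minimal sketch: the function names `compress_top_n` and `lookup` are hypothetical, and the choice of constant for the remaining clusters (here, the mean of the discarded values unless a value is supplied) is an assumption made for illustration, since this passage does not say how the constant is chosen.

```python
def compress_top_n(b_i, n, constant=None):
    """Keep the n largest entries of b_i by cluster label; replace the rest by one constant.

    Returns (stored, constant): stored maps cluster label -> value for the top n.
    The default constant (mean of the discarded entries) is an assumption.
    """
    order = sorted(range(len(b_i)), key=lambda m: b_i[m], reverse=True)
    top, rest = order[:n], order[n:]
    if constant is None:
        constant = sum(b_i[m] for m in rest) / len(rest) if rest else 0.0
    return {m: b_i[m] for m in top}, constant

def lookup(stored, constant, m):
    """Value used for cluster m: the stored value if m is in the top n, else the constant."""
    return stored.get(m, constant)

if __name__ == "__main__":
    stored, c = compress_top_n([0.5, 0.3, 0.1, 0.1], n=2)
    print(stored, c, lookup(stored, c, 3))
```

Storing N values plus their labels and one constant per state (or per frame) instead of M values is what reduces the memory requirement when N is much smaller than M.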

（４）第４の発明は、特徴ベクトル空間をクラスタリン
グし、それぞれのクラスタの代表ベクトルがそのラベル
で検索可能な形で記憶されたコードブックと、各状態に
おける前記各ラベルの発生確率（従って各クラスタの発
生確率）を記憶するクラスタ発生確率記憶手段と、前記
コードブックを用いて観測ベクトルの前記各クラスタへ
の帰属度（前記各クラスタの該観測ベクトルに対する事
後確率）を算出する帰属度算出手段と、該算出された各
クラスタへの前記観測ベクトルの帰属度と、前記クラス
タ発生確率記憶手段に記憶されている各クラスタの発生
確率の対数値との積和またはそれに等価な量を算出し、
観測ベクトルの前記システムの各状態における発生度合
を算出する観測ベクトル発生度合算出手段とを含み、前
記各状態における前記各クラスタの発生確率の推定は、
前記観測ベクトル発生度合算出手段を用いて計算し、認
識時は、前記観測ベクトルの帰属度を、最大の帰属度は
１とし、他の帰属度はすべて０になるように算出する手
段を含む。(4) The fourth invention includes: a codebook in which the feature vector space is clustered and the representative vector of each cluster is stored in a form retrievable by its label; cluster occurrence probability storage means for storing the occurrence probability of each label (and hence of each cluster) in each state; membership degree calculation means for calculating, using the codebook, the degree of membership of an observation vector to each cluster (the posterior probability of each cluster given the observation vector); and observation vector occurrence degree calculation means for calculating the sum of products of the calculated degree of membership of the observation vector to each cluster and the logarithm of the occurrence probability of each cluster stored in the cluster occurrence probability storage means, or a quantity equivalent thereto, to obtain the degree of occurrence of the observation vector in each state of the system. The occurrence probability of each cluster in each state is estimated using the observation vector occurrence degree calculation means, and the invention includes means which, at recognition time, calculates the degrees of membership of the observation vector such that the largest degree of membership is 1 and all the others are 0.
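
The effect of the hard membership used at recognition time in the fourth invention can be sketched in Python as follows. The function names `harden` and `occurrence_degree` are hypothetical, and the membership and probability values are toy numbers. The sketch only shows that, with one membership set to 1 and the rest to 0, the sum over clusters of u_tm × log b_im collapses to a single logarithm.

```python
import math

def harden(u_t):
    """Set the largest degree of membership to 1 and all others to 0."""
    best = max(range(len(u_t)), key=lambda m: u_t[m])
    return [1.0 if m == best else 0.0 for m in range(len(u_t))]

def occurrence_degree(u_t, b_i):
    """Sum over clusters m of u_tm * log(b_im); terms with u_tm == 0 are skipped."""
    return sum(u * math.log(b) for u, b in zip(u_t, b_i) if u > 0.0)

if __name__ == "__main__":
    u_t = [0.2, 0.7, 0.1]           # soft memberships (as used for estimation)
    b_i = [0.1, 0.6, 0.3]           # cluster occurrence probabilities in state i
    print(occurrence_degree(u_t, b_i))           # soft sum over all clusters
    print(occurrence_degree(harden(u_t), b_i))   # equals log(0.6): one term only
```

With hard memberships, evaluating a state costs one table lookup per frame instead of a sum over clusters, which is the reduction in recognition-time computation the fourth invention aims at.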

本願発明の作用を次ぎに説明する。 The operation of the present invention will be described below.

（１）第１の発明では、解析の対象とするシステムは複
数の状態をとるとし、特徴ベクトル空間をクラスタリン
グし、それぞれのクラスタの代表ベクトルがそのラベル
で検索可能な形で記憶されたコードブックを備え、クラ
スタ発生確率記憶手段によって各状態における前記各ラ
ベルの発生確率（従って各クラスタの発生確率）を記憶
しておき、帰属度算出手段によって、前記コードブック
を用いて観測ベクトルの前記各クラスタへの帰属度（前
記各クラスタの該観測ベクトルに対する事後確率）を算
出し、該算出された各クラスタへの前記観測ベクトルの
帰属度の対数値と前記クラスタ発生確率記憶手段に記憶
されている各クラスタの発生確率との積和またはそれに
等価な量を観測ベクトル発生度合算出手段により算出
し、前記観測ベクトルの前記システムの各状態における
発生度合を算出する。(1) In the first invention, the system to be analyzed is assumed to take a plurality of states. A codebook is provided in which the feature vector space is clustered and the representative vector of each cluster is stored in a form retrievable by its label. The cluster occurrence probability storage means stores the occurrence probability of each label (and hence of each cluster) in each state. The membership degree calculation means uses the codebook to calculate the degree of membership of an observation vector to each cluster (the posterior probability of each cluster given the observation vector). The observation vector occurrence degree calculation means calculates the sum of products of the logarithm of the calculated degree of membership of the observation vector to each cluster and the occurrence probability of each cluster stored in the cluster occurrence probability storage means, or a quantity equivalent thereto, and thereby obtains the degree of occurrence of the observation vector in each state of the system.

（２）第２の発明では、特徴抽出手段により入力信号を
特徴ベクトルの系列に変換し、帰属度算出手段により、
前記ベクトル系列の各ベクトルを、クラスタ記憶手段に
記憶されている該ベクトルが分類されるべき各クラスタ
への帰属度を算出し、標準パターン記憶手段により、前
記ベクトルの各クラスタに対する帰属度を要素とする帰
属度ベクトルを算出し、認識すべき各認識単位をそれぞ
れ帰属度ベクトル列で表現した標準パターン記憶し、マ
ッチング手段により、前記帰属度算出手段の出力として
得られる帰属度ベクトル列からなる入力パターンと前記
標準パターンとのマッチングを行うものである。(2) In the second invention, the feature extraction means converts the input signal into a sequence of feature vectors. The membership degree calculation means calculates, for each vector of the sequence, its degree of membership to each of the clusters stored in the cluster storage means, into which the vector is to be classified, and forms a membership vector whose elements are the degrees of membership of the vector to the respective clusters. The standard pattern storage means stores standard patterns in which each recognition unit to be recognized is expressed as a sequence of membership vectors. The matching means matches the input pattern, consisting of the sequence of membership vectors obtained as the output of the membership degree calculation means, against the standard patterns.

（３）第３の発明では、HMMは、クラスタ発生確率記憶
手段を備え、クラスタ発生確率記憶手段は、状態ｉにお
けるクラスタｍの発生確率をb_im、クラスタ数をＭとす
るとき、b_i1,...,b_iMの中から大きさの順にとったＲ個
ｂ_i,g（i,1）,b_i,g（i,2）,...,b_i,g（i,R）（ｇ（i,
r）はｒ番目に大きいクラスタのラベル）はそのままの
値またはそれぞれの対数値log b_i,g（i,1）,log b
_i,g（i,2）,...,log b_i,g（i,R）の形で記憶し、残りの
ｂ_{i,g（i,R＋１）},...,b_i,g（i,M）は一定値を記憶し、
特徴抽出手段は、入力信号を特徴ベクトルの系列に変換
し、クラスタ記憶手段は、前記ベクトルが分類されるべ
きクラスタを記憶し、帰属度算出手段は、前記特徴ベク
トル系列の各ベクトルの前記各クラスタへの帰属度を算
出し、特徴ベクトル発生手段は、前記特徴ベクトルの各
クラスタに対する該帰属度と前記HMMの各状態における
前記各クラスタの発生確率とから前記HMMの各状態にお
ける前記特徴ベクトルの発生度合を算出し、ベクトル系
列発生度合算出手段は、前記特徴ベクトル発生度合算出
手段の出力を用いて前記HMMから前記特徴ベクトル系列
の発生する度合を算出し、前記特徴ベクトル発生度合算
出手段は、前記帰属度の上位Ｋ個のクラスタとそれぞれ
に対応する請求項１記載のクラスタの発生確率とから前
記HMMの各状態における前記特徴ベクトルの発生度合を
算出する。(3) In the third invention, the HMM includes cluster occurrence probability storage means. Where b_im is the occurrence probability of cluster m in state i and M is the number of clusters, the cluster occurrence probability storage means stores the R largest of b_i1, ..., b_iM, namely b_{i,g(i,1)}, b_{i,g(i,2)}, ..., b_{i,g(i,R)} (g(i, r) being the label of the r-th largest cluster), either as they are or as their logarithms log b_{i,g(i,1)}, log b_{i,g(i,2)}, ..., log b_{i,g(i,R)}, and stores a constant value for the remaining b_{i,g(i,R+1)}, ..., b_{i,g(i,M)}. The feature extraction means converts the input signal into a sequence of feature vectors; the cluster storage means stores the clusters into which the vectors are to be classified; the membership degree calculation means calculates the degree of membership of each vector of the feature vector sequence to each cluster; the feature vector occurrence degree calculation means calculates the degree of occurrence of the feature vector in each state of the HMM from the degrees of membership of the feature vector to the clusters and the occurrence probabilities of the clusters in each state of the HMM; and the vector sequence occurrence degree calculation means uses the output of the feature vector occurrence degree calculation means to calculate the degree to which the feature vector sequence is generated by the HMM. The feature vector occurrence degree calculation means calculates the degree of occurrence of the feature vector in each state of the HMM from the K clusters with the highest degrees of membership and the corresponding cluster occurrence probabilities according to claim 1.

（４）第４の発明では、特徴ベクトル空間をクラスタリ
ングし、それぞれのクラスタの代表ベクトルがそのラベ
ルで検索可能な形で記憶されたコードブックを備え、ク
ラスタ発生確率記憶手段は、各状態における前記各ラベ
ルの発生確率（従って各クラスタの発生確率）を記憶
し、帰属度算出手段によって、前記コードブックを用い
て観測ベクトルの前記各クラスタへの帰属度（前記各ク
ラスタの該観測ベクトルに対する事後確率）を算出し、
観測ベクトル発生度合算出手段は、該算出された各クラ
スタへの前記観測ベクトルの帰属度と、前記クラスタ発
生確率記憶手段に記憶されている各クラスタの発生確率
の対数値との積和またはそれに等価な量を算出し、前記
観測ベクトルの前記システムの各状態における発生度合
を算出し、前記各状態における前記各クラスタの発生確
率の推定は、前記観測ベクトル発生度合算出手段を用い
て計算し、認識時は、前記観測ベクトルの帰属度を、最
大の帰属度は１とし、他の帰属度はすべて０になるよう
に算出する。(4) In the fourth invention, a codebook is provided in which the feature vector space is clustered and the representative vector of each cluster is stored in a form retrievable by its label. The cluster occurrence probability storage means stores the occurrence probability of each label (and hence of each cluster) in each state. The membership degree calculation means uses the codebook to calculate the degree of membership of an observation vector to each cluster (the posterior probability of each cluster given the observation vector). The observation vector occurrence degree calculation means calculates the sum of products of the calculated degree of membership of the observation vector to each cluster and the logarithm of the occurrence probability of each cluster stored in the cluster occurrence probability storage means, or a quantity equivalent thereto, and thereby obtains the degree of occurrence of the observation vector in each state of the system. The occurrence probability of each cluster in each state is estimated using the observation vector occurrence degree calculation means, and at recognition time the degrees of membership of the observation vector are calculated such that the largest degree of membership is 1 and all the others are 0.

図面の簡単な説明図１は、HMMの説明図である。BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is an explanatory diagram of an HMM.

図２は、音声認識の際によく用いられるHMMの例示図
である。FIG. 2 is an exemplary diagram of an HMM frequently used in speech recognition.

図３は、離散型HMMによる音声認識装置の従来例を示
すブロック図である。FIG. 3 is a block diagram showing a conventional example of a speech recognition device using a discrete HMM.

図４は、ファジィベクトル量子化に基づくHMMによる
音声認識装置の従来例および本願発明の一実施例を示す
ブロック図である。FIG. 4 is a block diagram showing a conventional example of a speech recognition apparatus using an HMM based on fuzzy vector quantization and an embodiment of the present invention.

図５は、パターンマッチングによる音声認識装置の従
来例のブロック図である。FIG. 5 is a block diagram of a conventional example of a speech recognition apparatus using pattern matching.

図６は、入力パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 6 is an explanatory diagram showing an example of a constraint condition of a matching path in the input pattern axis-dependent DP matching.

図７は、ベクトル量子化を用いた音声認識装置の従来
例を示すブロック図である。FIG. 7 is a block diagram showing a conventional example of a speech recognition apparatus using vector quantization.

図８は、ワードスポッティングの一つの方法の説明図
である。FIG. 8 is an explanatory diagram of one method of word spotting.

図９は、ファジィベクトル量子化に基づく本願発明に
よるDPマッチングによる音声認識装置の一実施例を示す
ブロック図である。FIG. 9 is a block diagram showing an embodiment of a speech recognition apparatus using DP matching according to the present invention based on fuzzy vector quantization.

図10は、入力パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 10 is an explanatory diagram showing an example of a constraint condition of a matching path in DP matching of an input pattern axis dependent type.

図11は、入力パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 11 is an explanatory diagram showing an example of a constraint condition of a matching path in the input pattern axis-dependent DP matching.

図12は、入力パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 12 is an explanatory diagram showing an example of a constraint condition of a matching path in the input pattern axis-dependent DP matching.

図13は、入力パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 13 is an explanatory diagram showing an example of a constraint condition of a matching path in the input pattern axis-dependent DP matching.

図14は、入力パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 14 is an explanatory diagram showing an example of a constraint condition of a matching path of the input pattern axis-dependent DP matching.

図15は、標準パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 15 is an explanatory diagram showing an example of a constraint condition of a matching path in the standard pattern axis-dependent DP matching.

図16は、標準パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 16 is an explanatory diagram showing an example of a constraint condition of a matching path in the standard pattern axis-dependent DP matching.

図17は、標準パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 17 is an explanatory diagram showing an example of a constraint condition of a matching path in the standard pattern axis-dependent DP matching.

図18は、標準パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 18 is an explanatory diagram showing an example of a constraint condition of a matching path in the standard pattern axis-dependent DP matching.

図19は、標準パターン軸依存型のDPマッチングのマッ
チング径路の拘束条件の一例を示す説明図である。FIG. 19 is an explanatory diagram showing an example of a constraint condition of a matching path in the standard pattern axis-dependent DP matching.

図20は、よく用いられる標準パターン軸依存型のDPマ
ッチングのマッチング径路の拘束条件の一例を示す説明
図である。FIG. 20 is an explanatory diagram showing an example of a constraint condition of a matching path of a frequently used standard pattern axis-dependent DP matching.

図21は、本願発明によるHMMにおける各状態における
クラスタの発生確率、または、本願発明によるDPマッチ
ングにおける標準パターンにおる特徴ベクトルのクラス
タに対する帰属度の記憶方法を説明する説明図である。FIG. 21 is an explanatory diagram illustrating a method of storing the probability of occurrence of a cluster in each state in the HMM according to the present invention or the degree of membership of a feature vector in a standard pattern in a DP matching according to the present invention.

図22は、本願発明によるHMMにおける各状態における
クラスタの発生確率、または、本願発明によるDPマッチ
ングにおける標準パターンにおける特徴ベクトルのクラ
スタに対する帰属度の記憶方法を説明する説明図であ
る。FIG. 22 is an explanatory diagram for explaining a method of storing the probability of occurrence of a cluster in each state in the HMM according to the present invention or the degree of belonging of a feature vector to a cluster in a standard pattern in DP matching according to the present invention.

図23は、本願発明によるHMM、または、本願発明によ
るDPマッチングにおける入力パターンにおける特徴ベク
トルのクラスタに対する帰属度の記憶方法を説明する説
明図である。FIG. 23 is an explanatory diagram for explaining a method of storing the degree of membership of a feature vector to a cluster in an input pattern in an HMM according to the present invention or in DP matching according to the present invention.

図24は、本願発明によるHMM、または、本願発明によ
るDPマッチングにおける入力パターンにおける特徴ベク
トルのクラスタに対する帰属度の記憶方法を説明する説
明図である。FIG. 24 is an explanatory diagram for explaining a method of storing the degree of membership of a feature vector to a cluster in an input pattern in an HMM according to the present invention or in DP matching according to the present invention.

発明を実施するための最良の形態以下、本発明の実施例について図面を参照して説明す
る。BEST MODE FOR CARRYING OUT THE INVENTION Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(Equation 7) is derived from the Kullback-Leibler divergence (hereinafter abbreviated as KLD) between the distribution u_t = {u_t1, ..., u_tM} and the distribution b_i = {b_i1, ..., b_iM} (reference: IEICE Technical Report SP93-27, June 1993). Writing the divergence of u_t from b_i as D(u_t‖b_i), the KLD is given by

\[ D(u_t \| b_i) = \sum_{m=1}^{M} u_{tm} \log \frac{u_{tm}}{b_{im}} \]

This expresses how unlikely u_t is to arise from the population b_i, in other words, how unlikely u_t is in state i. Therefore, if we set log ψ_i(y_t) = −D(u_t‖b_i), then ψ_i(y_t) expresses how likely u_t is in state i and can be used as ω_i(y_t). Written out,

\[ \psi_i(y_t) = \prod_{m=1}^{M} b_{im}^{\,u_{tm}} \cdot \prod_{m=1}^{M} u_{tm}^{-u_{tm}} \]

Substituting ψ_i(y_t) for ω_i(y_t) in (Equation 1) gives (Equation 22), in which the factor arising from the second product (Equation 23) is determined by the input alone, independently of the model. It can therefore be omitted when the value of (Equation 22) is used to compare which model is most likely to have generated the input pattern. A new likelihood (Equation 24) can then be defined, which amounts to setting

\[ \omega_i(y_t) = \prod_{m=1}^{M} b_{im}^{\,u_{tm}} \]

in (Equation 1). (Equation 7) can be derived in this way. This is the principle of the synergistic (multiplicative-type) FVQ/HMM.
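The relation between the KLD and the per-frame score can be checked numerically. The following Python sketch assumes strictly positive cluster probabilities and uses made-up three-cluster distributions; the function names and the data are illustrative and do not come from the patent.

```python
import math

def kl_divergence(u, b):
    """D(u || b) = sum_m u_m * log(u_m / b_m); terms with u_m == 0 contribute 0."""
    return sum(um * math.log(um / bm) for um, bm in zip(u, b) if um > 0.0)

def log_omega(u, b):
    """Log of the multiplicative score prod_m b_m ** u_m."""
    return sum(um * math.log(bm) for um, bm in zip(u, b) if um > 0.0)

def input_only_term(u):
    """sum_m u_m * log u_m, which does not depend on the model."""
    return sum(um * math.log(um) for um in u if um > 0.0)

u_t = [0.7, 0.2, 0.1]   # hypothetical membership vector of one input frame
b_1 = [0.6, 0.3, 0.1]   # hypothetical cluster probabilities of state 1
b_2 = [0.1, 0.2, 0.7]   # hypothetical cluster probabilities of state 2

for b in (b_1, b_2):
    # -D(u || b) and log_omega differ only by the input-only term,
    # so both rank the states in the same order.
    print(-kl_divergence(u_t, b), log_omega(u_t, b) - input_only_term(u_t))
```

Because the two printed values agree for every state, dropping the input-only term changes no comparison between models for a fixed input frame.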

This argument holds, however, only because in each HMM every vector forming the input pattern Y is assumed to be generated exactly once for any state sequence X. The situation changes when the input pattern is regarded as a concatenation of vector sequences generated by several models and we search for the partial section that is most likely to have been generated by one particular model. In that case, let S(X) be the input pattern frame associated with the first state 1 of the state sequence of the model being matched, and E(X) the input pattern frame associated with the final state J. In principle, (Equation 26) is computed for every X, the optimum X* is obtained from (Equation 27), and S(X*) to E(X*) is taken as the desired partial section of the input speech pattern.

Computing this directly means evaluating every combination of S(X) and E(X), and the amount of computation becomes enormous. Moreover, in this case (Equation 23) varies with X and cannot be omitted. We therefore consider solving the optimization problem of (Equation 27) by dynamic programming. Assume that frame s is the start frame, vary the end frame t over the range t′ ± v, and compute by dynamic programming the degree to which the input partial pattern y_s, ..., y_t is generated by the HMM λ. (Equation 10) applies here, and dynamic programming yields, for each of the input patterns (y_s, ..., y_{t′−v}), ..., (y_s, ..., y_{t′+v}), the optimal state sequence that maximizes the degree of generation. That is, by moving the end frame t over a suitable range and choosing the best among the resulting ends, the optimal end for start s is found. The range of the end can be set in advance from, for example, the average length of the vector sequences generated by the HMM to be matched against start s. In this case the score must be normalized by t − s + 1 each time t changes. Repeating the same operation while varying s eventually gives the optimal start and end, that is, the desired partial section.

For a given s, (Equation 10) need not be recomputed each time t changes. When the model of FIG. 2 is used, the correspondence between input feature vectors and HMM states for a given s is confined to the hatched region of FIG. 8. The range of the input frame m(i) corresponding to state i within that region is given by (Equation 28) and the range of m(i+1) by (Equation 29). According to (Equation 10), φ_{i+1}(m(i+1)) over the range of (Equation 29) is computed as a continuation of φ_i(m(i)) over the range of (Equation 28). Hence, if φ_i(m(i)) is computed over the range of (Equation 28) for each i = 1, ..., J+1, the values φ_{J+1}(m(J+1)) obtained for i = J+1 over the range of (Equation 30) are the matching results for the end frames t′ ± v when the start frame of the input pattern is s. In this way the results for all end frames t′ ± v are obtained at once for one start frame s. Even so, the start must be changed frame by frame and the computation repeated over the hatched region each time, so the amount of computation remains very large. To reduce it further, the start frame too should be determined automatically by dynamic programming. To that end, the recurrence (Equation 10) is modified as follows.

Let X* = x*_s, x*_{s+1}, ..., x*_t be the optimal state sequence corresponding to y_s, ..., y_t. For dynamic programming to be applicable, the following must hold: if in X*, for an input frame m′ with s < m′ < t, x*_{m′−1} = i and x*_{m′} = j (j ≠ i), then the optimal state sequence for the partial pattern y_s, ..., y_{m′} coincides with x*_s, ..., x*_{m′}. That is, let Φ_j(m′) be the degree to which the partial pattern y_s, ..., y_{m′} is generated along the state sequence x*_s, ..., x*_{m′}, let z(i) be the duration (number of frames) of state i at that time, and let dur_i(z), with z = z(i), be the degree to which state i lasts z frames. Defining (Equation 31), we have (Equation 32); and for m with m′ < m < t, when x*_{m−1} = j and x*_m = h, defining (Equation 33), the relation (Equation 34) must hold.

Here W_i, W_i′, w_di, w_ai, w_bi(k), w_di′, w_ai′, w_bi(k)′ (i = 1, ..., J) are weighting coefficients associated with the state sequence, or sums of them, and Ψ_i(m) is the degree to which the input partial pattern y_s, ..., y_{m−z(i)} is generated along a state sequence x_s′, ..., x_{m−z(i)} (≠ x*_s, ..., x*_{m−z(i)}). W_i is the sum of the weighting coefficients along the state sequence x*_s, ..., x*_{m−z(i)}, and W_i′ is the sum along the state sequence x_s′, ..., x_{m−z(i)′}. If these weighting coefficients are chosen well, (Equation 31) to (Equation 34) can be made to hold regardless of the state sequence. For example, the condition is clearly satisfied if W_i = W_i′, w_di′ = w_di, w_ai′ = w_ai, and w_bi(1)′ + ... + w_bi(z(i)′)′ = w_bi(1) + ... + w_bi(z(i)). In other words, for a state sequence that enters state i from another state at input frame m, the sum of the weighting coefficients along the sequence up to that point should be constant regardless of m, the start frame s, and how the states were traversed up to the point (m, i). As concrete values, for i = 1, ..., J one can take w_di = w_ai = 1 and w_bi(k) = 1/z(i).

From the above discussion, suppose the state changes to j at input frame t. The degree of generation of the optimal partial section up to the point (t, j) is then obtained from the recurrence (Equation 35), where z is the number of frames in the partial section of the input corresponding to state i.

Let i*, z* be the i, z that satisfy (Equation 35). If (Equation 36) is stored at the same time, word spotting can be performed by the following steps.

(1) Initialization: Φ_1(t) = 0 for t = 1, ..., T (π_1 = 1, π_i = 0 ∀ i ≠ 1); Φ_i(0) = −∞ for i = 1, ..., J; B_1(t) = 0 for t = 1, ..., T.
(2) Execute (3) and (4) for t = 1, ..., T+1.
(3) Execute (Equation 26) and (Equation 27) for i = 1, ..., J+1.
(4) Φ(t) = Φ_{J+1}(t), B(t) = B_{J+1}(t − z*).
(5) Partial pattern detection. End frame: (Equation 37).

In this way, Φ_i(t) in (Equation 35) needs to be computed only once for each (t, j). The sum of log ω_i(y_{t−z−1+k}) over k = 1 to z also need not be recomputed each time z changes: if s(m) denotes the sum from k = 1 to z(m), the sum up to z(m+1) is obtained as s(m+1) = s(m) + log ω_i(y_{t−z+z(m)}), so the amount of computation is greatly reduced.
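The incremental accumulation of the per-frame log scores over candidate durations can be sketched as follows. This is a minimal Python illustration, not the patent's procedure: it assumes 0-based frame indices, a precomputed list `log_w` of log ω_i(y_n) for one state i, and a hypothetical function name. For an end position t it returns the sums over the last z frames for z = 1, 2, ..., each obtained from the previous one with a single addition.

```python
def duration_sums(log_w, t, z_max):
    """Sums of log_w over the last z frames before position t, for z = 1..z_max.

    log_w[n] is the per-frame log score of frame n (0-based) for one state.
    sums[z - 1] equals log_w[t - z] + ... + log_w[t - 1].
    """
    sums = []
    running = 0.0
    for z in range(1, min(z_max, t) + 1):
        running += log_w[t - z]   # extend the segment one frame further back
        sums.append(running)
    return sums

log_w = [-1.0, -2.0, -0.5, -3.0]     # made-up per-frame log scores
print(duration_sums(log_w, 4, 3))    # sums for durations 1, 2 and 3
```

Each duration therefore costs one addition instead of a full re-summation, which is the saving described in the text above.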

In the present invention, (Equation 38) is used. Corresponding to (Equation 35), the recurrence (Equation 39) is then defined, and when the model of FIG. 2 is used it becomes (Equation 40). Now suppose the optimal state sequence has been obtained, let z(i)* be the duration of state i in it, and for simplicity write (Equation 41), with (Equation 42).

Therefore, when the recurrence of (Equation 31) is used, Φ_{J+1}(t) contains, regardless of the state sequence, the sums over the states of the last term and of the second-to-last term on the right-hand side. These sums do not depend on the input pattern; they are fixed once the model to be matched is fixed, and they are irrelevant to the maximization of this expression, so they are not needed when finding the optimal section of the input pattern. Accordingly, ω_i(y_t) can be redefined as in (Equation 43).

Next, a method of applying the FVQ concept to DP matching, which is the second invention of the present application, is described. FIG. 9 is a block diagram showing the principle of the present invention. Blocks 91 and 93 operate in the same way as 71 and 73 of FIG. 7. Block 92 is a membership matrix calculation unit. It corresponds to the distance matrix calculation unit 72 of FIG. 7, but in this embodiment it calculates a membership matrix; that is, it calculates the degree of membership u_tm of the feature vector y_t to the cluster C_m (m = 1, ..., M; u_t1 + u_t2 + ... + u_tM = 1). As the degree of membership, a membership function as used in fuzzy theory can be used; here too, the same ones as used in the FVQ/HMM above, such as (Equation 4), are used. Block 95 is a word dictionary consisting of membership matrices registered in advance for each word to be recognized. That is, the standard pattern of the r-th word is registered as the membership matrix obtained by the membership matrix calculation unit 92 for an utterance of that word. In FIG. 9, the degree of membership of the j-th frame of the standard pattern of word r to cluster m is denoted b^(r)_jm. Block 96 is the set of membership matrices for words 1, ..., R.

The similarity between frame t of the input pattern at recognition time and frame j of standard pattern r is then given as the similarity between the membership vectors u_t = (u_t1, u_t2, ..., u_tM)^T and b^(r)_j = (b^(r)_j1, b^(r)_j2, ..., b^(r)_jM)^T. Since u_tm ≥ 0, u_t1 + ... + u_tM = 1, b^(r)_jm ≥ 0, and b^(r)_j1 + ... + b^(r)_jM = 1, both vectors can be regarded as probability distribution vectors (when u_tm and b^(r)_jm are posterior probabilities they are exactly probability distributions). As in the HMM case, the similarity can therefore be given by the Kullback-Leibler divergence, which is known as a distance between probability distributions. That is, the divergence between the distributions (q_1, ..., q_M) and (p_1, ..., p_M) is given by

\[ D(q \| p) = \sum_{m=1}^{M} q_m \log \frac{q_m}{p_m} \]

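Using this, the following three definitions of d^(r)(t, j) are possible (Equation 45): (1) the divergence of u_t from b^(r)_j, (2) the divergence of b^(r)_j from u_t, and (3) a symmetric form combining the two.

\[ (1)\;\; d^{(r)}(t,j) = \sum_{m=1}^{M} u_{tm} \log \frac{u_{tm}}{b^{(r)}_{jm}}, \qquad (2)\;\; d^{(r)}(t,j) = \sum_{m=1}^{M} b^{(r)}_{jm} \log \frac{b^{(r)}_{jm}}{u_{tm}} \]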

(Equation 45)(1) is a useful distance definition when the weighting coefficients of (Equation 17)(1) are adopted, (Equation 45)(2) is useful when those of (Equation 17)(2) are adopted, and (Equation 45)(3) is useful when symmetry of the processing matters (when d^(r)(t, j) = d^(r)(j, t) is required). In these definitions one may of course use a constant multiple in the additive form, or a constant power in the multiplicative form.
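The three kinds of frame distance can be illustrated with a short Python sketch. It assumes that form (1) is the divergence of the input membership vector from the reference vector, form (2) the reverse, and form (3) their average; the averaging in particular is an assumption made here for illustration, as are the function names and the sample vectors. All memberships are assumed strictly positive.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence sum_m p_m * log(p_m / q_m)."""
    return sum(pm * math.log(pm / qm) for pm, qm in zip(p, q) if pm > 0.0)

def d_input_side(u, b):
    """Form (1): divergence of the input vector u from the reference vector b."""
    return kl(u, b)

def d_reference_side(u, b):
    """Form (2): divergence of the reference vector b from the input vector u."""
    return kl(b, u)

def d_symmetric(u, b):
    """Form (3), assumed here to be the average of forms (1) and (2)."""
    return 0.5 * (kl(u, b) + kl(b, u))

u = [0.7, 0.2, 0.1]   # made-up input-frame memberships
b = [0.5, 0.3, 0.2]   # made-up reference-frame memberships
print(d_input_side(u, b), d_reference_side(u, b), d_symmetric(u, b))
```

Forms (1) and (2) generally give different values for the same pair of frames, whereas the symmetric form is unchanged when the two arguments are swapped.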

Block 94 is a matching unit that performs DP matching between each of the membership matrices registered in the word dictionary 95 and the membership matrix obtained from the input pattern. That is, based on the inter-frame distance d^(r)(t, j) of (Equation 45), it evaluates the recurrence (Equation 18) and computes the cumulative distance D^(r) defined by (Equation 13)(a). Block 97 is a decision unit that evaluates (Equation 14) and obtains the recognition result.
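The flow from membership matrices to a recognition result can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the recurrence of (Equation 18): it uses a plain symmetric DP step with unit weights and no length normalization, takes the divergence of the input frame from the reference frame as the frame distance, and uses a hypothetical two-word dictionary with strictly positive memberships.

```python
import math

def kl(p, q):
    """Frame distance: divergence of membership vector p from q."""
    return sum(pm * math.log(pm / qm) for pm, qm in zip(p, q) if pm > 0.0)

def dp_match(inp, ref):
    """Cumulative distance between two membership matrices (lists of frames).

    Assumed step set: (t-1, j), (t, j-1), (t-1, j-1), all with weight 1.
    """
    T, J = len(inp), len(ref)
    INF = float("inf")
    D = [[INF] * J for _ in range(T)]
    for t in range(T):
        for j in range(J):
            d = kl(inp[t], ref[j])
            if t == 0 and j == 0:
                D[t][j] = d
                continue
            best = min(
                D[t - 1][j] if t > 0 else INF,
                D[t][j - 1] if j > 0 else INF,
                D[t - 1][j - 1] if t > 0 and j > 0 else INF,
            )
            D[t][j] = best + d
    return D[T - 1][J - 1]

def recognize(inp, dictionary):
    """Return the word whose reference matrix has the smallest cumulative distance."""
    return min(dictionary, key=lambda word: dp_match(inp, dictionary[word]))

dictionary = {  # hypothetical membership matrices, one row per frame
    "a": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    "b": [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]],
}
inp = [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
print(recognize(inp, dictionary))
```

The input here is word "a" with its first frame repeated, so the time warping absorbs the repetition and the cumulative distance to "a" is zero.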

Thus, instead of replacing each feature vector of the feature vector sequence uttered to build the word dictionary with a single pseudo-phoneme, several pseudo-phonemes are associated with each frame together with their degrees of membership, which mitigates the adverse effect of the quantization error in the conventional method. Moreover, as is clear from the above description, a feature of the present invention is that the calculation of these degrees of membership and of the distance between each frame of the standard pattern and the input pattern rests on mathematically well-defined definitions.

Next, further improvements of the present invention are described.

First, the case where (Equation 45)(1) is used as the distance measure is described. The inter-frame distance is then

\[ d^{(r)}(t,j) = \sum_{m=1}^{M} u_{tm} \log \frac{u_{tm}}{b^{(r)}_{jm}} \]

(Equation 46). Substituting it into (Equation 13) with the weighting coefficients of (Equation 17)(1) gives (Equation 47). Suppose there exists n, 1 ≤ n ≤ k−1, such that t(k) − t(k−n) = 1 (the matching path does not skip any input pattern frame) and that, for this n, the weighting coefficients along the path from x(k−n) to x(k) sum to 1, that is, w(k−n+1) + w(k−n+2) + ... + w(k) = 1, as in FIGS. 10 to 14. Then (Equation 47) becomes (Equation 48).

In the examples of FIGS. 10 to 14, x(k) = (t, j) and k−1 ≥ n ≥ 1. In FIG. 10, x(k−1) = (t−1, j) or (t−1, j−n). In FIGS. 11 and 14, x(k−1) = (t−1, j) or (t−1, j−1), and x(k−m) = (t−1, j−m) for m = 2, ..., n. In FIGS. 12 and 13, x(k−1) = (t−1, j), (t−1, j−1) or (t, j−1); x(k−m) = (t, j−m) for m = 2, ..., n−1; and x(k−n) = (t−1, j−n). The numbers written beside the paths in each figure are examples of the weighting coefficients along the paths.

The first term on the right-hand side of (Equation 48) is then independent of both the choice of path and the standard pattern, and is determined by the input pattern alone. It can therefore be omitted when only the relative order of the comparison results between each standard pattern and the input pattern matters. Omitting this term and changing the sign, (Equation 49) can be taken as the similarity between patterns. In this case the inter-frame similarity between input frame t and frame j of standard pattern r can be taken as

\[ s^{(r)}(t,j) = \sum_{m=1}^{M} u_{tm} \log b^{(r)}_{jm} \]

(Equation 50).

If, further, t(k) − t(k−1) = 1 (the matching path neither skips nor repeats any input pattern frame; FIG. 6 and FIG. 10 are such cases), the pattern similarity becomes (Equation 51). Here j = j(t) is the function representing the matching path in the t-j plane, obtained by eliminating k from t = t(k) and j = j(k). When the path of FIG. 6 is used, the matching unit 94 evaluates, for example, the recurrence (Equation 52) based on the inter-frame similarity s^(r)(t, j) of (Equation 50) and computes the cumulative similarity S^(r) defined by (Equation 51). The decision unit 97 evaluates (Equation 53) and obtains the recognition result.
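Why the input-only term can be dropped when every input frame is used exactly once can be checked with a small Python sketch. The step set below (from (t−1, j), (t−1, j−1) or (t−1, j−2), all with weight 1) is an assumption chosen only so that each input frame appears once on every path; it is not taken from FIG. 6. The distance is the divergence of the input frame from the reference frame, the similarity is the sum of u_tm log b_jm, and the data are made up with strictly positive memberships.

```python
import math

def kl(u, b):
    """Frame distance: sum_m u_m * log(u_m / b_m)."""
    return sum(um * math.log(um / bm) for um, bm in zip(u, b))

def sim(u, b):
    """Frame similarity: sum_m u_m * log(b_m)."""
    return sum(um * math.log(bm) for um, bm in zip(u, b))

def dp_input_axis(inp, ref, local, pick):
    """Optimal cumulative score when each input frame is used exactly once.

    pick is min (for distances) or max (for similarities).
    """
    T, J = len(inp), len(ref)
    worst = float("inf") if pick is min else float("-inf")
    f = [[worst] * J for _ in range(T)]
    f[0][0] = local(inp[0], ref[0])
    for t in range(1, T):
        for j in range(J):
            prev = pick(f[t - 1][j - s] for s in (0, 1, 2) if j - s >= 0)
            if prev != worst:          # skip cells that no path can reach
                f[t][j] = prev + local(inp[t], ref[j])
    return f[T - 1][J - 1]

inp = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.1, 0.3, 0.6], [0.1, 0.1, 0.8]]
ref = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]

min_dist = dp_input_axis(inp, ref, kl, min)
max_sim = dp_input_axis(inp, ref, sim, max)
# Input-only term: the same for every path and every reference pattern.
input_term = sum(um * math.log(um) for u in inp for um in u)
print(min_dist, input_term - max_sim)
```

The two printed values agree, so minimizing the cumulative distance and maximizing the cumulative similarity select the same path and rank reference patterns identically.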

The above method, in which there exists n, 1 ≤ n ≤ k−1, with t(k) − t(k−n) = 1 (no input pattern frame is skipped on the matching path) and in which the weighting coefficients along the path from x(k−n) to x(k) sum to 1, that is, w(k−n+1) + w(k−n+2) + ... + w(k) = 1, is useful for continuous word speech recognition and the like. This is because the problem of finding the concatenation of individually registered word standard patterns that best matches a continuously uttered input word speech pattern can then be computed efficiently using, for example, the known two-level DP. The inter-frame similarity proposed here is applicable in such cases and gives high recognition performance with simple computation.

Next, the case where (Equation 45)(2) is used as the distance measure is described. The inter-frame distance is then

\[ d^{(r)}(t,j) = \sum_{m=1}^{M} b^{(r)}_{jm} \log \frac{b^{(r)}_{jm}}{u_{tm}} \]

(Equation 54). Substituting it into (Equation 13) with the weighting coefficients of (Equation 17)(2) gives (Equation 55). Suppose there exists n, 1 ≤ n ≤ k−1, such that j(k) − j(k−n) = 1 (the matching path does not skip any standard pattern frame) and that, for this n, the weighting coefficients along the path from x(k−n) to x(k) sum to 1, that is, w(k−n+1) + w(k−n+2) + ... + w(k) = 1, as in FIGS. 15 to 17. Then (Equation 55) becomes (Equation 56).

In the examples of FIGS. 15 to 19, x(k) = (t, j) and k−1 ≥ n ≥ 1. In FIG. 15, x(k−1) = (t, j−1) or (t−n, j−1). In FIGS. 16 and 19, x(k−1) = (t, j−1) or (t−1, j−1), and x(k−m) = (t−m, j−1) for m = 2, ..., n. In FIGS. 17 and 18, x(k−1) = (t, j−1), (t−1, j−1) or (t−1, j); x(k−m) = (t−m, j) for m = 2, ..., n−1; and x(k−n) = (t−n, j−1). The numbers written beside the paths in each figure are examples of the weighting coefficients along the paths.

The first term on the right-hand side of (Equation 56) is then independent of both the choice of path and the section of the input pattern, and is determined by the standard pattern alone (let this quantity for standard pattern r be C^(r)). It can therefore be omitted when only the relative order of the comparison results between one standard pattern and various sections of an input continuous word speech pattern, or various input patterns, matters. Omitting this term and changing the sign, (Equation 57) can be taken as the similarity between patterns. In this case the inter-frame similarity between input frame t and frame j of standard pattern r can be taken as

\[ s^{(r)}(t,j) = \sum_{m=1}^{M} b^{(r)}_{jm} \log u_{tm} \]

(Equation 58).

When this definition of inter-frame similarity is used to decide which standard pattern the input pattern is closest to, (S^(r) − C^(r))/J^(r) is compared across the standard patterns and the largest is chosen.

If, further, j(k) − j(k−1) = 1 (the matching path neither skips nor repeats any standard pattern frame; FIG. 20 and FIG. 15 are such cases), the pattern similarity becomes (Equation 59). Here t = t(j) is the function representing the matching path in the t-j plane, obtained by eliminating k from t = t(k) and j = j(k). When the paths of FIGS. 15 to 19 are used, the matching unit 94 evaluates the recurrence (Equation 60) based on the inter-frame similarity s^(r)(t, j) of (Equation 58) and computes the cumulative similarity S^(r) defined by (Equation 59).

The above method, in which there exists n, 1 ≤ n ≤ k−1, with j(k) − j(k−n) = 1 (no standard pattern frame is skipped on the matching path) and in which the weighting coefficients along the path from x(k−n) to x(k) sum to 1, that is, w(k−n+1) + w(k−n+2) + ... + w(k) = 1, is useful for so-called word spotting, in which the partial section that best matches a given standard pattern is identified in an input pattern of continuously uttered words. In this case, if r is the standard pattern to be compared, it suffices to compare S^(r) over the sections, regardless of the length of each input section. The word spotting problem can then be computed efficiently by dynamic programming with the following steps. The inter-frame similarity proposed here is applicable in such cases and gives high recognition performance with simple computation. For example, word spotting for a given word using the path constraints of FIG. 20 proceeds as follows.

(1) Initialization: f(0, j) = f(−1, j) = −∞ for j = −1, 0, 1, ..., J; f(0, 0) = 0.
(2) Execute (3) to (6) for t = 1, ..., T+1.
(3) f(t, 0) = −∞
(4) f(t, 1) = s(t, 1)
(5) B(t, 1) = t − 1
(6) Compute the recurrence (Equation 61) for j = 2, ..., J.
(7) D(t) = f(t, J), B(t) = B(t, J)
(8) Partial pattern detection. End frame: (Equation 62).

Next, methods for reducing the amount of storage and computation in the HMM and DP described above, which constitute the third invention, are described.

The basic idea is that, to reduce memory, only the top N < M degrees of membership of the standard pattern are stored, and, to reduce computation, only the top K < M degrees of membership of the input pattern are calculated. A point to note here is that when the similarity between the probability distributions (p_1, ..., p_M) and (q_1, ..., q_M) is defined by (Equation 63), p_i = 0 is possible for some i ∈ {1, ..., M}, but q_i > 0 must hold for all i ∈ {1, ..., M}; q_i can never be 0. Therefore, when only the top N values of q_i are calculated or stored, a common value is assigned to the remaining q_i, chosen so that q_1 + ... + q_M = 1. The storage required for q_i (i = 1, ..., M) is then N for q_g(1), ..., q_g(N) and 1 for q_g(N+1), ..., q_g(M), where g(n) is the index of the n-th largest q in {q_1, ..., q_M}. The p_i can be split into the top K and the rest in the same way as q_i (K need not equal N), but since they may be 0, one can also set p_h(1) + ... + p_h(K) = 1 and p_h(K+1) + ... + p_h(M) = 0, where h(k) is the index of the k-th largest p in {p_1, ..., p_M}. In that case the storage required for p_i (i = 1, ..., M) is only K, for p_h(1), ..., p_h(K).
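The two storage schemes can be sketched in Python as follows. The function names are hypothetical, and rescaling the top K values so that they sum to 1 is only one possible way of meeting the stated condition; the patent text speaks of estimating the values under that condition. The sketch assumes N < M and that some probability mass remains outside the top N, so that the shared value is positive.

```python
def store_top_n_with_floor(q, n):
    """Keep the n largest q values; the other M - n share one value so the total is 1."""
    M = len(q)
    order = sorted(range(M), key=lambda m: q[m], reverse=True)
    kept = {m: q[m] for m in order[:n]}          # n stored values
    floor = (1.0 - sum(kept.values())) / (M - n)  # 1 extra stored value
    return kept, floor

def keep_top_k_renormalized(p, k):
    """Keep the k largest p values, rescaled to sum to 1; the others are treated as 0."""
    order = sorted(range(len(p)), key=lambda m: p[m], reverse=True)[:k]
    total = sum(p[m] for m in order)
    return {m: p[m] / total for m in order}

q = [0.5, 0.05, 0.3, 0.1, 0.05]   # made-up distribution over M = 5 clusters
kept_q, floor_q = store_top_n_with_floor(q, 2)
kept_p = keep_top_k_renormalized(q, 2)
print(kept_q, floor_q)
print(kept_p)
```

The first scheme stores N + 1 numbers (plus the cluster indices) and never yields a zero, which matters for a quantity that appears inside a logarithm; the second stores only K numbers and is suitable where zeros are allowed.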

ω^(r)_i(y_t) in the synergistic FVQ/HMM (a superscript (r) is attached when it is necessary to make explicit that ω_i(y_t), b_im, a_ij and so on relate to word r) and s^(r)(t, j) in the synergistic FVQ/DP both have the form of (Equation 63), and the same holds for both regarding the reduction of memory and computation. The following description therefore gives the embodiment for the synergistic FVQ/HMM, that is, for ω^(r)_i(y_t). If state j of the HMM is read as the j-th frame of the standard pattern in DP, and the occurrence probability b^(r)_jm of cluster m in state j of the HMM as the degree of membership of the j-th frame of standard pattern r to cluster m in DP matching, exactly the same argument holds for DP matching.

As definitions of ω^(r)_j(y) that reduce the storage for u_tm and b^(r)_jm respectively, the following are conceivable. Here the subscript g(r, j, n) denotes the name (number) of the cluster whose occurrence probability is the n-th largest in the j-th state of HMM r, b^(r)_{j,g(r,j,n)} is the occurrence probability of cluster g(r, j, n) in the j-th state of HMM r, h(t, k) denotes the name of the cluster to which the feature vector of the t-th frame of the input pattern has the k-th largest degree of membership, and u_{t,h(t,k)} is the degree of membership of y_t to cluster h(t, k).

[First method] ω^(r)_j(y_t) is defined by (Equation 64). For b^(r)_{j,g(r,j,n)}, the estimated values for n = 1, ..., N are used as they are for 1 ≤ n ≤ N, and the value given by (Equation 65) is used for N+1 ≤ n ≤ M. For u_tm, either (1.1) all estimated values for 1 ≤ m ≤ M are used, or (1.2) u_{t,h(t,k)} may be estimated so that it is given by (Equation 66) for 1 ≤ k ≤ K and u_{t,h(t,k)} = 0 for K+1 ≤ k ≤ M. Case (1.2) also reduces the membership computation (described later).

[Second method] In (Equation 67), b^(r)_{j,g(r,j,n)} is estimated so that it is given by (Equation 68) for 1 ≤ n ≤ N and b^(r)_{j,g(r,j,n)} = 0 for N+1 ≤ n ≤ M. For u_{t,h(t,k)}, either (2.1) all estimated values of u_{t,h(t,k)} for 1 ≤ k ≤ M are used, or (2.2) the same u_{t,h(t,k)} as above is used for 1 ≤ k ≤ K and the value given by (Equation 69) is used for K+1 ≤ k ≤ M. Case (2.2) also reduces the membership computation (described later).

[Third method] In (Equation 70), for u_tm, the estimated values for k = 1, ..., K are used as they are for u_{t,h(t,k)} in 1 ≤ k ≤ K, and the value given by (Equation 71) is used for K+1 ≤ k ≤ M. For b^(r)_{j,g(r,j,n)}, either (3.1) all estimated values for n = 1, ..., M are used, or (3.2) it may be estimated so that it is given by (Equation 72) for 1 ≤ n ≤ N and b^(r)_{j,g(r,j,n)} = 0 for N+1 ≤ n ≤ M. Case (3.2) also reduces the memory requirement.

[Fourth method] In (Equation 73), u_{t,h(t,k)} is estimated so that it is given by (Equation 74) for 1 ≤ k ≤ K and u_{t,h(t,k)} = 0 for K+1 ≤ k ≤ M. For b^(r)_{j,g(r,j,n)}, either (4.1) all estimated values for 1 ≤ n ≤ M are used, or (4.2) the estimated values are used as they are for 1 ≤ n ≤ N and b^(r)_{j,g(r,j,n)} may be defined by (Equation 75) for N+1 ≤ n ≤ M. Case (4.2) also reduces the memory requirement.

The first method, the second method, (3.2) of the third method, and (4.2) of the fourth method do not store the cluster occurrence probabilities in each state of the HMM (in DP, the degrees of membership of the word standard pattern) for all clusters. Instead, for each state of the HMM (each frame of the DP standard pattern), they store the labels and the probabilities (degrees of membership) of only the N clusters with the highest probability (degree of membership). For example, the HMM (standard pattern) for the r-th word is represented as in FIG. 21 or FIG. 22. FIG. 21 can be used when the similarity is defined by (Equation 67) or (Equation 70), and FIG. 22 when it is defined by (Equation 64) or (Equation 73).
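The top-N storage described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `compress_state` and its parameters are hypothetical. The non-zero common value is assumed to be the leftover probability mass divided evenly over the M − N remaining clusters, and the zero-tail variant is assumed to rescale the kept values so that they sum to 1.

```python
def compress_state(b, n_keep, share_residual=True):
    """Keep the n_keep most probable clusters of one HMM state.

    b: list of M cluster occurrence probabilities that sum to 1.
    Returns (labels, probs, common); `common` is the single value that
    stands in for every cluster that is not kept.
    """
    m = len(b)
    order = sorted(range(m), key=lambda i: b[i], reverse=True)
    labels = order[:n_keep]
    probs = [b[i] for i in labels]
    if share_residual:
        # Kept values unchanged; leftover mass spread evenly (non-zero tail).
        common = (1.0 - sum(probs)) / (m - n_keep) if m > n_keep else 0.0
    else:
        # Tail set to zero; kept values rescaled so that they sum to 1.
        total = sum(probs)
        probs = [p / total for p in probs]
        common = 0.0
    return labels, probs, common
```

Per state, only N labels, N values and one common value are held instead of M values.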

(1.2) of the first method, (2.2) of the second method, the third method, and the fourth method do not compute the membership matrix of the input pattern for all clusters. Instead, for each frame of the input pattern, they compute the degrees of membership of only the K clusters with the highest membership. For example, the input pattern is represented as in FIG. 23 or FIG. 24. FIG. 23 can be used when the similarity is defined by (Equation 64) or (Equation 73), and FIG. 24 when it is defined by (Equation 67) or (Equation 70).

In the case of (Equation 64) and (Equation 73), if log b^(r)_{j,g(r,j,n)} is stored in place of b^(r)_{j,g(r,j,n)} in FIG. 22 as the membership matrix of the standard pattern (not shown), the calculation reduces to a product-sum operation. Taking N = 3 in (Equation 64) and K = 3 in (Equation 73), the additional computation relative to the conventional example of FIG. 7 is three multiplications at each lattice point, so the number of multiplications is 2560 + 3 × 50 × 100 = 4060. This is indeed more than in the conventional example of FIG. 7, but far less than in the case of FIG. 5, and the recognition accuracy is higher than that of the conventional example of FIG. 7.
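The product-sum with stored logarithms can be illustrated as below. This is a sketch under assumptions: the exact forms of (Equation 64) and (Equation 73) are not reproduced in this text, so the score is taken to be the sum of membership times log occurrence probability over the K clusters kept for the input frame. The name `frame_score`, the dictionary layout and the fallback to a common log value are illustrative choices.

```python
def frame_score(top_k, log_b_state, log_b_common):
    """Product-sum over the K clusters an input frame belongs to most strongly.

    top_k: list of (cluster_label, membership) pairs for one input frame.
    log_b_state: dict mapping label -> log occurrence probability, holding
                 only the clusters stored for this state.
    log_b_common: log of the common value shared by all other clusters.
    """
    # One multiplication per kept cluster, i.e. K multiplications per lattice point.
    return sum(u * log_b_state.get(label, log_b_common) for label, u in top_k)
```

With K = 3 the function performs three multiplications per lattice point, which is the per-point increase the paragraph above counts.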

In the case of (Equation 67) and (Equation 70), if log u_{t,h(t,k)} is stored in place of u_{t,h(t,k)} in FIG. 24 as the membership matrix of the input pattern (not shown), the calculation reduces to a product-sum operation. Taking N = 3 in (Equation 67) and K = 3 in (Equation 70), the additional computation relative to the conventional example of FIG. 7 is three multiplications at each lattice point, so the number of multiplications is 2560 + 3 × 50 × 100 = 4060. This is indeed more than in the conventional example of FIG. 7, but far less than in the case of FIG. 5, and the recognition accuracy is higher than that of the conventional example of FIG. 7. Unlike the preceding case, in which log b_im is stored in advance, log u_{t,h(t,k)} must here be computed for every frame of the input pattern. With K = 3, however, this takes only three evaluations per frame. Moreover, since u_{t,h(t,k)} takes values only between 0 and 1, log x can be tabulated for 0 ≤ x ≤ 1 so that a table lookup replaces the calculation.
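A table lookup of this kind might look like the following sketch. The table resolution `TABLE_SIZE`, the floor value used for log 0 and the nearest-entry rounding are all assumptions made for illustration; the source does not specify them.

```python
import math

TABLE_SIZE = 1024      # assumed resolution of the table
LOG_FLOOR = -1.0e30    # assumed stand-in for log(0)

# LOG_TABLE[i] approximates log(i / TABLE_SIZE) for i = 0 .. TABLE_SIZE.
LOG_TABLE = [LOG_FLOOR] + [math.log(i / TABLE_SIZE) for i in range(1, TABLE_SIZE + 1)]

def log_lookup(x):
    """Approximate log(x) for 0 <= x <= 1 by the nearest table entry."""
    index = int(round(x * TABLE_SIZE))
    return LOG_TABLE[index]
```

The table is built once, so each frame then costs K lookups rather than K logarithm evaluations.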

When the degree of membership is defined by (Equation 4) with u_{t,h(t,1)} + ... + u_{t,h(t,K)} = 1 and u_{t,h(t,K+1)} = ... = u_{t,h(t,M)} = 0, the descending order of u_tm is the same as the ascending order of d(y_t, μ_m). Therefore d(y_t, μ_m) is first computed for all clusters, and the K largest u_tm need be computed only for the K clusters with the smallest d(y_t, μ_m), which reduces the computation. That is, the degree of membership for 1 ≤ k ≤ K is given by (Equation 77) with the quantity defined in (Equation 76). In this case, the fraction calculations in the denominator of (Equation 76) and the calculations of (Equation 77) are each performed K times. With M = 256 and K = 3 to 6, this computation is reduced to about 1/40 to 1/80.
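The restriction of the membership calculation to the K nearest clusters can be sketched as follows. (Equation 4) is not reproduced in this text, so the sketch assumes the usual fuzzy c-means form u_m = 1 / Σ_n (d_m / d_n)^(1/(F−1)), with the sum taken over the K nearest clusters only. The function name and the `fuzziness` parameter F are illustrative.

```python
def top_k_membership(distances, k, fuzziness=2.0):
    """Memberships for the k nearest clusters; all others are implicitly 0.

    distances: list of M positive distances d(y_t, mu_m).
    Returns a dict label -> membership whose values sum to 1.
    """
    exponent = 1.0 / (fuzziness - 1.0)
    # Distances are needed for all M clusters in order to rank them.
    nearest = sorted(range(len(distances)), key=lambda m: distances[m])[:k]
    # k reciprocal powers, then k divisions, instead of M of each.
    inv = {m: (1.0 / distances[m]) ** exponent for m in nearest}
    norm = sum(inv.values())
    return {m: inv[m] / norm for m in nearest}
```

For M = 256 and K = 3, the normalization touches 3 clusters instead of 256, a ratio of about 1/85.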

When the degree of membership is defined by (Equation 4) with u_{t,h(t,K+1)} = ... = u_{t,h(t,M)} = u_t0 and u_{t,h(t,1)} + ... + u_{t,h(t,M)} = 1, the degree of membership for 1 ≤ k ≤ K can be computed as (Equation 79) with the quantity defined in (Equation 78). The denominator of (Equation 78) requires M fraction calculations. However, since the descending order of u_tm is the same as the ascending order of d(y_t, μ_m), d(y_t, μ_m) is first computed for all clusters, and the u_tm of (Equation 79) i) need be computed only for the K clusters with the smallest d(y_t, μ_m).

Alternatively, the following simplification is also possible. For example, let d_t0 = {d(y_t, μ_h(K+1)) + ... + d(y_t, μ_h(M))} / (M − K) or d_t0 = {d(y_t, μ_h(K+1)) + d(y_t, μ_h(M))} / 2, set d(y_t, μ_h(K+1)) = ... = d(y_t, μ_h(M)) = d_t0, and approximate (Equation 78) by (Equation 80).
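The simplification with a common tail distance d_t0 can be sketched as below. (Equation 78) and (Equation 80) are not reproduced in this text, so the same fuzzy c-means form is assumed as a stand-in for (Equation 4). The function name and the `midpoint` flag, which chooses between the two definitions of d_t0 given above, are illustrative.

```python
def membership_with_common_tail(distances, k, fuzziness=2.0, midpoint=False):
    """Top-k memberships plus one common membership u_t0 for the other M - k.

    distances: list of M positive distances; requires k < M.
    Returns (dict label -> membership for the k nearest clusters, u_t0).
    """
    exponent = 1.0 / (fuzziness - 1.0)
    order = sorted(range(len(distances)), key=lambda m: distances[m])
    head, tail = order[:k], order[k:]
    if midpoint:
        # Average of the smallest and largest distance in the tail.
        d0 = (distances[tail[0]] + distances[tail[-1]]) / 2.0
    else:
        # Mean of all tail distances.
        d0 = sum(distances[m] for m in tail) / len(tail)
    weights = {m: (1.0 / distances[m]) ** exponent for m in head}
    w0 = (1.0 / d0) ** exponent
    # The tail contributes one shared term instead of M - k separate ones.
    norm = sum(weights.values()) + len(tail) * w0
    return {m: w / norm for m, w in weights.items()}, w0 / norm
```

The k returned memberships plus (M − k) times u_t0 sum to 1.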

Alternatively, the membership calculation means calculates the degree of membership from the distances between the observation vector whose membership is to be calculated and the representative vector of each cluster. The clusters are ranked in ascending order of distance, with the smallest distance ranked first. Clusters ranked K+1 and below are assigned a predetermined constant value not exceeding 1/K, and for the K clusters with the smallest distances the memberships are calculated from their individual distances and the constant value so that all memberships sum to 1.

In synergistic DP matching the standard pattern is a sequence of membership vectors, so exactly the same method as applied to u_tm above can be used when registering, in the standard pattern, the memberships of the N clusters with the highest membership. That is, when b^(r)_{j,g(r,j,1)} + ... + b^(r)_{j,g(r,j,N)} = 1 and b^(r)_{j,g(r,j,N+1)} = ... = b^(r)_{j,g(r,j,M)} = 0, b^(r)_jm can be obtained in accordance with (Equation 76) and (Equation 77) with the substitutions K → N, h(t,k) → g(r,j,n), and u_{t,h(t,k)} → b^(r)_{j,g(r,j,n)}. Similarly, when b^(r)_{j,g(r,j,N+1)} = ... = b^(r)_{j,g(r,j,M)} = b^(r)_j0 and b^(r)_{j,g(r,j,1)} + ... + b^(r)_{j,g(r,j,M)} = 1, b^(r)_jm can be obtained in accordance with (Equation 78), (Equation 79), (Equation 80), and so on.

Next, the fourth invention of the present application is described. It is effective in the HMM case described above. The idea exploits the fact that, for u_tm, there is in theory no problem if the K used when estimating b_im differs from the K used when performing recognition. In particular, whatever the cost of creating the model, it is often desirable that recognition require as little computation as possible. The least computation is required by the discrete HMM, which corresponds to computing with K = 1 at recognition time in the FVQ/HMM. Accordingly, the model can be created by the FVQ/HMM method and recognition performed by the discrete HMM method. As noted earlier, the significance of the FVQ type lies less in reducing the quantization distortion of vector quantization by interpolation than in alleviating the shortage of training samples when the HMM parameters are learned, thereby improving the accuracy of parameter estimation. Creating the model with the FVQ type and recognizing with the discrete type therefore gives slightly lower performance than also recognizing with the FVQ type. Nevertheless, it can be confirmed experimentally that the recognition rate is higher than when both model creation and recognition use the discrete type, particularly for large codebook sizes.
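The K = 1 recognition case can be illustrated with a short sketch. It assumes the state score is a product-sum of memberships and log occurrence probabilities, so that a one-hot membership collapses the sum to a single table lookup. The name `discrete_frame_score` is hypothetical.

```python
def discrete_frame_score(distances, log_b_state):
    """Recognition-time score for one state with K = 1.

    The nearest cluster gets membership 1 and all others 0, so the
    product-sum of memberships and log probabilities is one lookup.
    distances: list of M distances d(y_t, mu_m).
    log_b_state: list of M log occurrence probabilities, which may have
                 been estimated with K > 1 during training.
    """
    nearest = min(range(len(distances)), key=lambda m: distances[m])
    return log_b_state[nearest]
```

No multiplication is needed per lattice point, which is why the discrete form is the cheapest at recognition time.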

In matching by linear expansion and contraction of the time axis, too, the input pattern and the standard pattern can be compared on the basis of comparisons between their membership vectors. In this case as well, the similarity definition of (Equation 7) can be used when the standard pattern is stretched or compressed linearly to match the number of frames of the input pattern, and the similarity definition of (Equation 43) can be used when the input pattern is stretched or compressed linearly to match the number of frames of the standard pattern.
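Linear time-axis matching of this kind can be sketched as follows, for the case in which the standard pattern is stretched to the length of the input pattern. The similarity definitions (Equation 7) and (Equation 43) are not reproduced in this text, so the frame similarity is passed in as a function. The name `linear_match` and the integer frame mapping are illustrative assumptions.

```python
def linear_match(input_vecs, ref_vecs, similarity):
    """Accumulate frame similarities under a linear time-axis mapping.

    input_vecs, ref_vecs: sequences of membership vectors.
    similarity: function taking (input vector, reference vector).
    """
    t_len, j_len = len(input_vecs), len(ref_vecs)
    total = 0.0
    for t in range(t_len):
        # Input frame t is paired with the proportionally placed reference frame.
        j = (t * j_len) // t_len
        total += similarity(input_vecs[t], ref_vecs[j])
    return total
```

Swapping the roles of the two patterns gives the other case, in which the input pattern is fitted to the length of the standard pattern.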

INDUSTRIAL APPLICABILITY According to the first invention, an HMM device capable of accurate word spotting with a small amount of computation can be provided by using the distance measure known as the Kullback-Leibler divergence.

According to the second invention, whereas conventional DP matching based on vector quantization treats a feature vector as belonging to exactly one cluster, the present invention treats a feature vector as belonging to multiple clusters in proportions corresponding to its degree of membership to each cluster, or to the posterior probability of each cluster given that feature vector. The similarity between frames is defined by a probabilistic distance measure based on those memberships. This makes it possible to realize a pattern comparison device that is robust to spectral variations arising from various factors and that requires only a slight increase in computation over the conventional example.

According to the third invention, rather than storing the occurrence probabilities of all clusters for each state of the HMM corresponding to each recognition unit, only the probabilities up to the N-th rank in descending order are stored, and the remainder are treated as equal and represented by a single common value. This greatly reduces the required storage.

According to the fourth invention, the model is created as a synergistic FVQ/HMM and recognition is performed as a discrete HMM. A device can thus be realized that reduces the estimation error caused by an insufficient number of training samples during model creation and that requires little computation at recognition time.

Continued from the front page. (58) Fields searched (Int. Cl.⁶, DB name): G10L 3/00 531; G10L 3/00 515; G10L 3/00 541; JICST file (JOIS)

Claims

(57) [Claims]

1. A signal analysis device in which the system to be analyzed takes a plurality of states, comprising: a codebook in which the feature vector space is clustered and the representative vector of each cluster is stored in a form retrievable by its label; cluster occurrence probability storage means for storing the occurrence probability of each label (i.e., of each cluster) in each state; membership calculation means for calculating, using the codebook, the degree of membership of an observation vector to each cluster (i.e., the posterior probability of each cluster given the observation vector); and observation vector occurrence degree calculation means for calculating the product-sum of the calculated degrees of membership of the observation vector to each cluster and the logarithms of the occurrence probabilities of each cluster stored in the cluster occurrence probability storage means, or an equivalent quantity, to obtain the degree of occurrence of the observation vector in each state of the system; wherein the cluster occurrence probability storage means stores, for the clusters whose occurrence probability ranks N+1 or lower, a common non-zero value calculated so that the cluster occurrence probabilities sum to 1.

2. The signal analysis device according to claim 1, wherein the membership calculation means sets the degree of membership to zero for clusters whose membership ranks K+1 or lower and calculates the degrees of membership so that they sum to 1.

3. A signal analysis device in which the system to be analyzed takes a plurality of states, comprising: a codebook in which the feature vector space is clustered and the representative vector of each cluster is stored in a form retrievable by its label; cluster occurrence probability storage means for storing the occurrence probability of each label (i.e., of each cluster) in each state; membership calculation means for calculating, using the codebook, the degree of membership of an observation vector to each cluster (i.e., the posterior probability of each cluster given the observation vector); and observation vector occurrence degree calculation means for calculating the product-sum of the logarithms of the calculated degrees of membership of the observation vector to each cluster and the occurrence probabilities of each cluster stored in the cluster occurrence probability storage means, or an equivalent quantity, to obtain the degree of occurrence of the observation vector in each state of the system.

4. The signal analysis device according to claim 3, wherein the cluster occurrence probability storage means stores, for each state, occurrence probabilities calculated so that those of the N clusters with the highest probability (N being a predetermined number) sum to 1, the occurrence probabilities of the remaining clusters being zero.

5. The signal analysis device according to claim 3, wherein the membership calculation means sets the degree of membership of each observation vector to the clusters whose membership ranks K+1 or lower to a common non-zero value and calculates the degrees of membership so that they sum to 1.

6. The signal analysis device according to claim 1, wherein each state is a state of a hidden Markov model.

7. A signal analysis device comprising: a codebook in which the feature vector space is clustered and the representative vector of each cluster is stored in a form retrievable by its label; cluster occurrence probability storage means for storing the occurrence probability of each label (i.e., of each cluster) in each state; membership calculation means for calculating, using the codebook, the degree of membership of an observation vector to each cluster (i.e., the posterior probability of each cluster given the observation vector); and observation vector occurrence degree calculation means for calculating the product-sum of the calculated degrees of membership of the observation vector to each cluster and the logarithms of the occurrence probabilities of each cluster stored in the cluster occurrence probability storage means, or an equivalent quantity, to obtain the degree of occurrence of the observation vector in each state of the system; wherein the occurrence probability of each cluster in each state is estimated using the observation vector occurrence degree calculation means, and at recognition time the degrees of membership of the observation vector are calculated with the maximum membership set to 1 and all other memberships set to 0.

8. A signal analysis device comprising: cluster storage means holding the clusters into which feature vectors are to be classified; membership calculation means for calculating, for vectors x and y to be compared, the degree of membership of each vector to each cluster or the posterior probability of each cluster given each vector (both hereinafter called the degree of membership), and for forming membership vectors whose elements are the degrees of membership of each vector to each cluster; and similarity calculation means for calculating the distance or similarity between the membership vectors; wherein, letting the membership vectors be a = (a_1, ..., a_M) and b = (b_1, ..., b_M), the similarity calculation means calculates the distance or similarity as one of the following expressions (not reproduced in this text) or an equivalent quantity, and that distance or similarity is taken as the distance or similarity between the feature vectors x and y.

9. The signal analysis device according to claim 8, comprising: cluster storage means holding the clusters into which feature vectors are to be classified; membership calculation means for calculating the degree of membership of each vector of a vector sequence to each cluster and forming membership vectors whose elements are those degrees of membership; standard pattern storage means in which each recognition unit to be matched is likewise represented as a sequence of membership vectors; and matching means for matching an input pattern, consisting of the membership vector sequence obtained as the output of the membership calculation means, against the standard pattern; wherein the similarity or distance between the input pattern and the standard pattern is calculated as a result of the matching.

10. The signal analysis device according to claim 9, comprising: similarity calculation means for linearly or nonlinearly expanding or contracting the time axis of the input pattern, of the standard pattern, or of both, aligning the time axes of the two patterns, and calculating the distance or similarity between corresponding membership vectors; and cumulative similarity calculation means for accumulating the distance or similarity along the time axis of the input pattern, of the standard pattern, or of both; wherein the accumulated value is taken as the distance or similarity between the input pattern and the standard pattern.

11. The signal analysis device according to claim 9, comprising: similarity calculation means for calculating the distance or similarity between membership vectors; and dynamic programming means for optimally associating each membership vector of the input pattern with each membership vector of the standard pattern to be matched, so that the value obtained by accumulating the distance or similarity between associated membership vectors along the time axis of the input pattern, of the standard pattern, or of both is minimized or maximized, and for calculating that minimum or maximum value.

12. The signal analysis device according to claim 11, wherein, letting the membership vector corresponding to frame t of the input pattern be a_t = (a_t1, ..., a_tM), the membership vector corresponding to frame j of the standard pattern be b_j = (b_j1, ..., b_jM), the k-th (t, j) coordinate on the matching path be x(k) = (t(k), j(k)), and the weighting coefficient at x(k) be w(x(k)), the similarity between a_t(k) and b_j(k) and the cumulative similarity along the path between the vector sequences a_t(1), ..., a_t(K) and b_j(1), ..., b_j(K) are given by expressions not reproduced in this text, and, for 1 ≤ n ≤ k−1, when t(k) − t(k−n) = 1, w(x(k−n+1)) + ... + w(x(k)) = 1.

13. The signal analysis device according to claim 12, wherein the matching path satisfies t(k) − t(k−1) = 1 and w(x(k)) = 1.

14. The signal analysis device according to claim 12, wherein, for x(k) = (t, j) and k−1 ≥ n ≥ 1, the matching path is one of: (1) x(k−1) = (t−1, j−n) or x(k−1) = (t−1, j); (2) x(k−1) = (t−1, j−1) or x(k−1) = (t−1, j), and x(k−m) = (t−1, j−m) for m = 2, ..., n; (3) x(k−m) = (t, j−m) for m = 1, ..., n−1, and x(k−n) = (t−1, j−n); (4) x(k−m) = (t, j−m) for m = 1, ..., n−1, and x(k−n) = (t−1, j−n); (5) x(k−1) = (t−1, j−1) or x(k−1) = (t−1, j), and x(k−m) = (t−1, j−m) for m = 2, ..., n; and wherein w(x(k)) = 1 for path (1); w(x(k)) = 1 and w(x(k−m+1)) = 0 for path (2); w(x(k−m+1)) = 0 and w(x(k−n+1)) = 1 for path (3); and w(x(k−m+1)) = 1/n for paths (4) and (5).

15. The signal analysis device according to claim 11, wherein, letting the membership vector corresponding to frame t of the input pattern be a_t = (a_t1, ..., a_tM), the membership vector corresponding to frame j of the standard pattern be b_j = (b_j1, ..., b_jM), the k-th (t, j) coordinate on the matching path be x(k) = (t(k), j(k)), and the weighting coefficient at x(k) be w(x(k)), the similarity between a_t(k) and b_j(k) and the similarity along the path between the vector sequences a_t(1), ..., a_t(K) and b_j(1), ..., b_j(K) are given by expressions not reproduced in this text, and, for 1 ≤ n ≤ k−1, when j(k) − j(k−n) = 1, w(x(k−n+1)) + ... + w(x(k)) = 1.

16. The signal analysis device according to claim 15, wherein the matching path satisfies j(k) − j(k−1) = 1 and w(x(k)) = 1.

17. The signal analysis device according to claim 15, wherein, for x(k) = (t, j) and k−1 ≥ n ≥ 1, the matching path is one of: (1) x(k−1) = (t−n, j−1) or x(k−1) = (t, j−1); (2) x(k−1) = (t−1, j−1) or x(k−1) = (t, j−1), and x(k−m) = (t−m, j−1) for m = 2, ..., n; (3) x(k−m) = (t−m, j) for m = 1, ..., n−1, and x(k−n) = (t−n, j−1); (4) x(k−m) = (t−m, j) for m = 1, ..., n−1, and x(k−n) = (t−n, j−1); (5) x(k−1) = (t−1, j−1) or x(k−1) = (t, j−1), and x(k−m) = (t−m, j−1) for m = 2, ..., n; and wherein w(x(k)) = 1 for path (1); w(x(k)) = 1 and w(x(k−m+1)) = 0 for path (2); w(x(k−m+1)) = 0 and w(x(k−n+1)) = 1 for path (3); and w(x(k−m+1)) = 1/n for paths (4) and (5).

18. The signal analysis device according to claim 8, comprising standard pattern storage means which, where b_jm is the degree of membership of the feature vector of frame j of the standard pattern to cluster m and M is the number of clusters, stores the N values b_{j,g(j,1)}, b_{j,g(j,2)}, ..., b_{j,g(j,N)} taken in descending order from b_j1, ..., b_jM (g(j,n) being the label of the n-th largest cluster in frame j of the standard pattern, N ≤ M) as they are, and the remainder as a constant value b_0 calculated so that b_{j,g(j,1)} + ... + b_{j,g(j,N)} + b_0(M−N) = 1, or stores them in the form of their logarithms log b_{j,g(j,1)}, log b_{j,g(j,2)}, ..., log b_{j,g(j,N)}, log b_0.

19. The signal analysis device according to claim 8, wherein, where b_jm is the degree of membership of the feature vector of frame j of the standard pattern to cluster m and M is the number of clusters, the N values b_{j,g(j,1)}, b_{j,g(j,2)}, ..., b_{j,g(j,N)} taken in descending order from b_j1, ..., b_jM (g(j,n) being the label of the n-th largest cluster in frame j of the standard pattern, N ≤ M) are stored as values calculated so that b_{j,g(j,1)} + ... + b_{j,g(j,N)} = 1, and the remainder are stored as b_{j,g(j,N+1)} = ... = b_{j,g(j,M)} = 0.

20. The signal analysis device according to claim 8, wherein, where u_tm is the degree of membership of the feature vector y_t of frame t of the input pattern to cluster m and M is the number of clusters, the membership vector into which y_t is converted takes the K values u_{t,h(t,1)}, u_{t,h(t,2)}, ..., u_{t,h(t,K)} in descending order from u_t1, ..., u_tM (h(t,k) being the label of the k-th largest cluster in frame t of the input pattern, K ≤ M) as they are, and takes for the remainder a constant value u_0 calculated so that u_{t,h(t,1)} + ... + u_{t,h(t,K)} + u_0(M−K) = 1.

21. The signal analysis device according to claim 8, wherein, where u_tm is the degree of membership of the feature vector y_t of frame t of the input pattern to cluster m and M is the number of clusters, the membership vector into which y_t is converted takes, for the K values u_{t,h(t,1)}, u_{t,h(t,2)}, ..., u_{t,h(t,K)} in descending order from u_t1, ..., u_tM (h(t,k) being the label of the k-th largest cluster in frame t of the input pattern, K ≤ M), values calculated so that u_{t,h(t,1)} + ... + u_{t,h(t,K)} = 1, and takes for the remainder u_{t,h(t,K+1)} = ... = u_{t,h(t,M)} = 0.

22. The signal analysis device according to claim 8, wherein the similarity between the t-th frame of the input pattern and the j-th frame of the standard pattern is given by an expression (not reproduced in this text) in terms of: the N values b_{j,g(j,1)}, b_{j,g(j,2)}, ..., b_{j,g(j,N)} taken in descending order from b_j1, ..., b_jM (g(j,n) being the label of the n-th largest cluster in frame j of the standard pattern, N ≤ M); the value b_0 calculated so that b_{j,g(j,1)} + ... + b_{j,g(j,N)} + b_0(M−N) = 1; and either u_tm calculated for all clusters, or the K values u_{t,h(t,1)}, u_{t,h(t,2)}, ..., u_{t,h(t,K)} taken in descending order from u_t1, ..., u_tM and calculated so that u_{t,h(t,1)} + ... + u_{t,h(t,K)} = 1 (h(t,k) being the label of the k-th largest cluster in frame t of the input pattern, K ≤ M).

23. The signal analysis device according to claim 8, wherein the similarity between the t-th frame of the input pattern and the j-th frame of the standard pattern is given by an expression (not reproduced in this text) in terms of: the N values b_{j,g(j,1)}, b_{j,g(j,2)}, ..., b_{j,g(j,N)} taken in descending order from b_j1, ..., b_jM and calculated so that b_{j,g(j,1)} + ... + b_{j,g(j,N)} = 1 (g(j,n) being the label of the n-th largest cluster in frame j of the standard pattern, N ≤ M); and either u_tm calculated for all clusters, or the K values u_{t,h(t,1)}, u_{t,h(t,2)}, ..., u_{t,h(t,K)} taken in descending order from u_t1, ..., u_tM (h(t,k) being the label of the k-th largest cluster in frame t of the input pattern, K ≤ M) together with the value u_0 calculated so that u_{t,h(t,1)} + ... + u_{t,h(t,K)} + u_0(M−K) = 1.

24. The signal analysis device according to claim 8, wherein the similarity between the t-th frame of the input pattern and the j-th frame of the standard pattern is given by an expression (not reproduced in this text) in terms of: either b_jm calculated for all clusters, or the N values b_{j,g(j,1)}, b_{j,g(j,2)}, ..., b_{j,g(j,N)} taken in descending order from b_j1, ..., b_jM and calculated so that b_{j,g(j,1)} + ... + b_{j,g(j,N)} = 1 (g(j,n) being the label of the n-th largest cluster in frame j of the standard pattern, N ≤ M); the K values u_{t,h(t,1)}, u_{t,h(t,2)}, ..., u_{t,h(t,K)} taken in descending order from u_t1, ..., u_tM (h(t,k) being the label of the k-th largest cluster in frame t of the input pattern, K ≤ M); and the value u_0 calculated so that u_{t,h(t,1)} + ... + u_{t,h(t,K)} + u_0(M−K) = 1.

25. The signal analyzer according to claim 1, wherein the similarity between the t-th frame of the input pattern and the j-th frame of the standard pattern is calculated with respect to: either b_{jm} calculated for all clusters, or N values b_{j,g(j,1)}, b_{j,g(j,2)}, ..., b_{j,g(j,N)} taken from among b_{j1}, ..., b_{jM} (where g(j,n) is the label of the n-th largest cluster in frame j of the standard pattern, N ≦ M) together with a value b_0 calculated so that b_{j,g(j,1)} + ... + b_{j,g(j,N)} + b_0 (M−N) = 1; and K values u_{t,h(t,1)}, u_{t,h(t,2)}, ..., u_{t,h(t,K)} calculated, in correspondence with the order of magnitude among u_{t1}, ..., u_{tM}, so that u_{t,h(t,1)} + ... + u_{t,h(t,K)} = 1 (where h(t,k) is the label of the k-th largest cluster in frame t of the input pattern, K ≦ M).

26. The signal analyzer according to claim 5, wherein the degree of membership is calculated from the distances between the vector for which the degree of membership is to be calculated and the representative vector of each cluster, the distances being used as they are for the clusters ranked up to K or N in ascending order of distance, and a common value being used in place of the distance for the clusters ranked K + 1 or N + 1 and below.

27. The signal analyzer according to claim 26, wherein the common value is the average of the distances to the clusters ranked K + 1 or N + 1 and below.

28. The signal analyzer according to claim 26, wherein the common value is the average of the minimum distance and the maximum distance among the clusters ranked K + 1 or N + 1 and below.
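
The approximation described in claims 26 to 28 can be sketched as follows in Python. This is only an illustration under stated assumptions: the claims do not give the formula that maps distances to degrees of membership, so the sketch assumes a fuzzy-clustering style rule in which the membership is proportional to d^(−2/(F−1)) with a fuzziness parameter F > 1. The function names, the `mode` argument and the parameter `fuzziness` are hypothetical. Distances to the K nearest clusters are used as they are; all remaining clusters share one common distance, so their weight is computed only once.

```python
def common_distance(tail, mode="mean"):
    """Common distance for the clusters ranked K+1 and below (hypothetical helper)."""
    if mode == "mean":       # average of the tail distances (claim 27)
        return sum(tail) / len(tail)
    if mode == "midrange":   # average of the minimum and maximum (claim 28)
        return (min(tail) + max(tail)) / 2.0
    raise ValueError("mode must be 'mean' or 'midrange'")


def memberships_with_common_tail(distances, k, mode="mean", fuzziness=2.0):
    """Degrees of membership with one shared distance for all but the k nearest clusters.

    ASSUMPTION: membership is proportional to d ** (-2 / (fuzziness - 1));
    this rule is not stated in the claims.
    """
    m = len(distances)
    if not 0 < k <= m:
        raise ValueError("k must satisfy 0 < k <= M")
    if fuzziness <= 1.0 or min(distances) <= 0.0:
        raise ValueError("fuzziness must exceed 1 and distances must be positive")
    exponent = -2.0 / (fuzziness - 1.0)
    order = sorted(range(m), key=lambda i: distances[i])  # nearest cluster first
    weights = [0.0] * m
    for i in order[:k]:
        weights[i] = distances[i] ** exponent  # exact distance for ranks 1..k
    if k < m:
        shared = common_distance([distances[i] for i in order[k:]], mode)
        shared_weight = shared ** exponent     # evaluated once for the whole tail
        for i in order[k:]:
            weights[i] = shared_weight
    total = sum(weights)
    return [w / total for w in weights]        # memberships sum to 1


if __name__ == "__main__":
    d = [1.0, 2.0, 4.0, 5.0, 9.0]  # illustrative distances only
    print(memberships_with_common_tail(d, 2, mode="mean"))
    print(memberships_with_common_tail(d, 2, mode="midrange"))
```

The point of the common value is that the power function is evaluated K + 1 times per frame instead of M times, while the degrees of membership still sum to 1 over all clusters.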

29. The signal analyzer according to claim 18, wherein the degree of membership is determined from the distances between the observed vector for which the degree of membership is to be calculated and the representative vector of each cluster such that, in ascending order of distance, the clusters ranked K + 1 or N + 1 and below are assigned a predetermined constant value not greater than 1/K or 1/N, and the degrees of membership of the K or N clusters with the smallest distances are calculated from their respective distances and the constant value so that the degrees of membership sum to 1.