JPH0318983A

JPH0318983A - Pattern collating system

Info

Publication number: JPH0318983A
Application number: JP1153926A
Authority: JP
Inventors: Tetsuya Muroi; 室井　哲也
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-06-15
Filing date: 1989-06-15
Publication date: 1991-01-28

Abstract

PURPOSE: To enable category classification with a single distance measure even when the training samples are uneven, by synthesizing the vector against which an unknown input vector is compared while taking the accuracy of each feature vector into account. CONSTITUTION: A phoneme recognition section 3 is provided with a reference vector synthesis section 4 and a distance calculation section 5. Let Yi be the feature vector representing a large category i, and Zij the feature vector representing a small category j belonging to the large category i. The reference vector C to be compared with an input unknown vector X is synthesized by C = (1 - Wij) Yi + Wij Zij, and the distance between X and C is calculated. Wij is a constant in the range 0 <= Wij <= 1 that indicates the reliability of Zij; it takes a larger value when Zij has been generated from a sufficient number of training data. In this way, even when the reliability of the categories is uneven because of differences in the number of training samples, category classification can be performed with the same distance measure or similarity measure.

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a pattern matching method used in the pattern matching section of a speech recognition device, a character recognition device, or the like.

In vector quantization and category classification problems, the relationship between the number of categories and the number of training samples is always an issue. The larger the number of categories, the finer the classification that can be made. However, the number of training samples per category then becomes smaller, so the indices that characterize a category (representative vectors, standard patterns, etc.) become less accurate, and the degree of membership of an unknown input in a category, or its distance to the category, can no longer be determined accurately. Conversely, if the number of categories is reduced, the indices characterizing each category become statistically reliable, but inherently dissimilar items are assigned to the same category and quantization distortion increases.

Fuzzy vector quantization ("Normalization of spectrograms using fuzzy vector quantization", Journal of the Acoustical Society of Japan, Vol. 45, No. 2 (1989)) improves on this drawback and can keep quantization distortion small even with a small number of categories.

However, when fine classification is required, the above drawback still had not been resolved. Conventional techniques also had the drawback of coping poorly with uneven numbers of training samples. Consider, for example, registering phonemes as standard patterns in a speaker-dependent speech recognition device. Since it is nearly impossible for a person to utter isolated phonemes, the speaker utters, for example, whole words, and these are segmented into phonemes to create the standard patterns. The problem here is the skewed frequency distribution of phonemes. For example, 100 samples of /a/ may be collected while only two samples of /p/ are obtained. As a result, the standard pattern for /a/ is statistically reliable, but an accurate standard pattern cannot be expected for /p/. Moreover, for /p/, recognition by HMM, Bayes decision, Mahalanobis distance, and the like becomes impossible. In an extreme case, there may be no utterance at all for a prepared category (phoneme).
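One reason the Mahalanobis distance fails with so few samples is that the sample covariance matrix becomes singular and cannot be inverted. The sketch below illustrates this in two dimensions with made-up data; the function names and the sample values are assumptions for illustration and do not come from the patent.

```python
def sample_covariance_2d(samples):
    """Unbiased 2x2 sample covariance of a list of (x, y) feature vectors."""
    n = len(samples)
    mean_x = sum(s[0] for s in samples) / n
    mean_y = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mean_x) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - mean_y) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mean_x) * (s[1] - mean_y) for s in samples) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def determinant_2x2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical data: only two training vectors for a rare phoneme.
few_samples = [(1.0, 2.0), (3.0, 5.0)]
det_few = determinant_2x2(sample_covariance_2d(few_samples))

# Hypothetical data: four training vectors spread in both dimensions.
more_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
det_more = determinant_2x2(sample_covariance_2d(more_samples))
```

With two samples the deviations from the mean lie on a single line, so `det_few` is zero and the covariance has no inverse, whereas `det_more` is positive. The same rank deficiency arises whenever the number of samples does not exceed the feature dimension.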

As described above, when the training samples are uneven, category classification with a single distance measure was sometimes impossible.

The present invention was made in view of the above circumstances. Its object is to provide a pattern matching method that enables category classification with the same distance measure or similarity measure even when the reliability of the categories is uneven because of differences in the number of training samples.

To achieve the above object, the present invention provides a pattern matching method that matches an input unknown vector X against feature vectors representing categories. The categories are classified into M large categories, and the feature vector representing large category i is denoted Yi. Each large category i is further classified into N(i) small categories, and the feature vector representing small category j belonging to large category i is denoted Zij. When calculating the degree to which the unknown vector X belongs to small category j within large category i, or the distance between X and small category j within large category i, the method is characterized in that the degree of membership or the distance is calculated with reference to the composite vector

C = (1 - Wij) Yi + Wij Zij,  0 ≦ Wij ≦ 1.

The invention is described below on the basis of an embodiment.

FIG. 1 is a system configuration diagram for explaining an embodiment in which the pattern matching method of the present invention is applied to the pattern matching section in speaker-dependent speech recognition. In the figure, 1 is a microphone, 2 is a feature sequence conversion section, and 3 is a phoneme recognition section.

The speech waveform input from the microphone 1 is converted into a time series of feature vectors by the feature sequence conversion section 2.

Various feature vectors effective for speech recognition, and various means of obtaining them, are known. For example, after A/D conversion at 12 kHz and 12 bits, 14th-order linear prediction coefficients may be computed with a window length of 256 points and a shift width of 128 points.
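The framing implied by these figures can be sketched as follows. The sampling rate, window length, and shift width are taken from the text; the function name, the placeholder signal, and the choice to drop a trailing partial frame are assumptions made for illustration. The linear prediction analysis itself is not shown.

```python
SAMPLE_RATE_HZ = 12_000   # from the text: 12 kHz sampling
WINDOW_LENGTH = 256       # from the text: 256-point window
SHIFT_WIDTH = 128         # from the text: 128-point shift


def split_into_frames(samples, window=WINDOW_LENGTH, shift=SHIFT_WIDTH):
    """Return all full-length analysis frames; a trailing partial frame is dropped (assumption)."""
    frames = []
    start = 0
    while start + window <= len(samples):
        frames.append(samples[start:start + window])
        start += shift
    return frames


one_second = [0] * SAMPLE_RATE_HZ                    # placeholder signal: 1 s of silence
frames = split_into_frames(one_second)               # overlapping frames, 50% overlap
window_ms = 1000 * WINDOW_LENGTH / SAMPLE_RATE_HZ    # duration of one window in ms
shift_ms = 1000 * SHIFT_WIDTH / SAMPLE_RATE_HZ       # frame period in ms
```

Under these settings each window spans roughly 21.3 ms and a new feature vector is produced roughly every 10.7 ms, so one second of speech yields 92 full frames.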

The phoneme recognition section 3 then performs phoneme recognition on the feature vector X. Here X may be the vector of a single frame or a vector formed by grouping several frames.

In phoneme recognition, to avoid the effects of coarticulation, it is desirable to prepare a different standard pattern for each preceding and following phonetic environment. For example, five standard patterns are best prepared for the phoneme /k/, one for each following vowel. However, preparing standard patterns for every phonetic environment, that is, having the speaker register all of them, would require an enormous number of utterances and is not realistic.

FIG. 2 shows the configuration of the phoneme recognition section. In the figure, 4 is a reference vector synthesis section, 5 is a distance calculation section, 6 is the reliability Wij, 7 is the standard pattern Zij, and 8 is the standard pattern Yi. Two kinds of standard patterns are prepared: standard patterns Yi (1 ≦ i ≦ M, where M is the number of phonemes) created for each phoneme, and standard patterns Zij (1 ≦ j ≦ N(i), where N(i) is the number of environments of phoneme i) created for each preceding and following phonetic environment. In the /k/ example above, these are Yi (there may be more than one), created without regard to the phonetic environment, and the five Zij prepared for the respective following vowels (N(i) = 5). The reference vector C to be compared with the input unknown vector X is then synthesized by

C = (1 - Wij) Yi + Wij Zij   (1)

and the distance between X and C is calculated. In equation (1), Wij is a constant in the range 0 ≦ Wij ≦ 1 and is an index of the reliability of Zij.

Wij is given a larger value the more sufficient the training data from which Zij was created. Conversely, when Zij has low reliability, C is synthesized so as to rely more on Yi.
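The synthesis of equation (1) is a per-component linear blend of the two standard patterns. A minimal sketch, assuming feature vectors are plain lists of floats; the function name and the example values are hypothetical and not taken from the patent.

```python
def synthesize_reference(y_i, z_ij, w_ij):
    """Blend the large-category pattern Yi and the small-category pattern Zij per equation (1)."""
    if not 0.0 <= w_ij <= 1.0:
        raise ValueError("Wij must lie in [0, 1]")
    if len(y_i) != len(z_ij):
        raise ValueError("Yi and Zij must have the same dimension")
    return [(1.0 - w_ij) * y + w_ij * z for y, z in zip(y_i, z_ij)]


y_i = [1.0, 2.0]     # hypothetical large-category standard pattern
z_ij = [3.0, 6.0]    # hypothetical small-category standard pattern

c_unreliable = synthesize_reference(y_i, z_ij, 0.0)   # Wij = 0: C equals Yi
c_reliable = synthesize_reference(y_i, z_ij, 1.0)     # Wij = 1: C equals Zij
c_mixed = synthesize_reference(y_i, z_ij, 0.25)       # a quarter of the way from Yi to Zij
```

The two end points show the intended behaviour: an untrusted Zij is ignored entirely, a fully trusted Zij is used as is, and intermediate weights move C along the line segment between the two patterns.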

For example, in the /k/ case above, when little training data is available for /ku/, equation (1) is set so that C ≈ Yi, relying on the standard pattern Yi for /k/ that was created from all of /ka/ through /ko/.
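The text does not say how Wij is derived from the amount of training data, only that it should grow with it. The sketch below shows one hypothetical rule, Wij = n / (n + k), chosen purely for illustration; the function name, the constant `half_point`, and the sample counts are all assumptions.

```python
def reliability_weight(sample_count, half_point=10):
    """Hypothetical rule Wij = n / (n + half_point): 0 with no data, approaching 1 as n grows."""
    if sample_count < 0:
        raise ValueError("sample_count must be non-negative")
    if half_point <= 0:
        raise ValueError("half_point must be positive")
    return sample_count / (sample_count + half_point)


# Hypothetical numbers of training vectors per following-vowel context.
context_counts = {"ka": 90, "ki": 40, "ku": 2, "ke": 10, "ko": 0}
weights = {ctx: reliability_weight(n) for ctx, n in context_counts.items()}
```

With these made-up counts, the sparsely observed /ku/ receives a small weight, so its reference vector stays close to Yi, and a context with no utterances at all receives a weight of zero, which makes C equal to Yi. Any monotonically increasing mapping into the range 0 to 1 would serve the same purpose.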

The distance d between X and C may be calculated, for example, using the Euclidean distance as

d = ||X - C||^2   (2)
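Putting the synthesis and the distance together, a recognizer could compare X against the synthesized reference vector of every small category. The sketch below assumes the squared Euclidean form of the distance and a nearest-reference decision rule; the decision rule, the function names, the dictionary layout, and all values are assumptions for illustration, since the text only states that the distance is calculated.

```python
def squared_euclidean(x, c):
    """Sum of squared component differences between x and c."""
    return sum((xk - ck) ** 2 for xk, ck in zip(x, c))


def classify(x, large_patterns, small_patterns, weights):
    """Return ((i, j), distance) for the synthesized reference vector nearest to x."""
    best_key, best_distance = None, float("inf")
    for (i, j), z_ij in small_patterns.items():
        w = weights[(i, j)]
        y_i = large_patterns[i]
        c = [(1.0 - w) * y + w * z for y, z in zip(y_i, z_ij)]  # reference vector C
        d = squared_euclidean(x, c)
        if d < best_distance:
            best_key, best_distance = (i, j), d
    return best_key, best_distance


# Hypothetical standard patterns and reliabilities.
large_patterns = {"a": [0.0, 0.0], "b": [10.0, 10.0]}
small_patterns = {("a", 0): [1.0, 0.0], ("a", 1): [0.0, 4.0], ("b", 0): [10.0, 12.0]}
weights = {("a", 0): 1.0, ("a", 1): 0.0, ("b", 0): 0.5}

best_key, best_distance = classify([0.9, 0.1], large_patterns, small_patterns, weights)
```

Because every category is scored with the same distance function, well-trained and poorly trained small categories can be compared directly; only the reference vector differs in how much it leans on Yi.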

As is clear from the above description, according to the present invention the reference vector synthesis section synthesizes the vector C, to which the unknown input vector X is to be referred, by C = (1 - Wij) Yi + Wij Zij. Therefore, when Zij has been created from only a small number of training data and its accuracy is poor, setting the weight Wij small allows C to be synthesized as a rough approximation by Yi, which represents large category i.

Conversely, when Zij, which represents small category j within large category i, has been created from a large amount of training data, setting Wij large allows a precise reference vector to be synthesized.

Furthermore, even when the reliability of the individual Zij varies because the number of training data differs among the small categories, the present invention allows the reference vectors to be synthesized by one and the same synthesis method.

[Brief explanation of drawings]

FIG. 1 is a system configuration diagram for explaining an embodiment in which the pattern matching method of the present invention is applied to the pattern matching section in speaker-dependent speech recognition, and FIG. 2 is a configuration diagram of the phoneme recognition section.

1: microphone; 2: feature sequence conversion section; 3: phoneme recognition section; 4: reference vector synthesis section; 5: distance calculation section; 6: reliability; 7, 8: standard patterns.

Claims

1. A pattern matching method for matching an input unknown vector X against feature vectors representing categories, wherein the categories are classified into M large categories, the feature vector representing large category i is denoted Yi, each large category i is further classified into N(i) small categories, and the feature vector representing small category j belonging to large category i is denoted Zij, the method being characterized in that, when calculating the degree to which the unknown vector X belongs to small category j within large category i, or the distance between X and small category j within large category i, the degree of membership or the distance is calculated with reference to a composite vector C given by

C = (1 - Wij) Yi + Wij Zij,  0 ≦ Wij ≦ 1.