JPH0574838B2


Info

Publication number: JPH0574838B2
Application number: JP59147189A
Authority: JP
Inventors: Yuriko Ishigaki; Yasuo Sato
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-07-16
Filing date: 1984-07-16
Publication date: 1993-10-19
Also published as: JPS6126095A

Description

[Detailed description of the invention]

[Industrial Application Field] The present invention relates to a method for automatically calculating the distance between words, and is effective for evaluating in advance the suitability of the recognition target word set of a speech recognition device.

[Prior Art] Some speech recognition devices have a predetermined group of words that may be input by voice (referred to as the recognition target word set). To improve the recognition rate of such a device, it is important that the word set contain no words with similar pronunciations (readings). The words in the set are determined by the purpose of the device and are expected to be difficult to change, but the reading of a word can be changed without particular difficulty, so the reading of each word in the set should be chosen so that no two are confusable. In some cases it is easy to tell whether readings are confusable.
For example, the numeral "7" can be read as either "shichi" or "nana", but when it is pronounced "shichi" it is easily confused with the numeral "1", that is, "ichi", and the two are liable to be misrecognized. If "7" is pronounced "nana" instead, the distinction from "ichi" becomes clear, as is also known from experience. As the number of words to be recognized grows, however, it is no longer easy to tell which pairs are confusable, and replacing one reading of a confusable pair with another reading may in turn make it confusable with some other word. The word set is therefore currently determined by cut-and-try: for each word that has two or three readings, one suitable reading is selected and the readings of the set are fixed; speech recognition is actually performed to test whether misrecognition occurs; and if it does, the offending word is given another reading and the test is repeated. This method imposes a very heavy burden in time and labor. It is therefore effective to check the suitability of the readings of the words in the set in advance, at the character stage, rather than by actual speech recognition tests. Since confusability arises between any two words in the set, it is effective to examine, for every word in the set, the degree of confusability with all the remaining words, and to reject the word set if it contains a word pair (pair of readings) liable to be misrecognized. The inter-word distance is a numerical measure of the similarity or dissimilarity between words. The DP matching method (Velichko et al., Int. J. Man-Machine Studies, vol. 2, p. 223, 1970) obtains this inter-word distance as a distance between character strings. Put simply, the local distances between the syllables of the two words are accumulated; the syllables can be paired in various ways and the accumulated value differs from pairing to pairing, and the minimum among these values is taken as the inter-word distance.

[Problem to be solved by the invention] However, this DP method does not reflect the number of syllables in each word. For example, suppose that the distance between a two-syllable word A and a three-syllable word B turns out to be equal to the distance between a two-syllable word C and a two-syllable word D. The misrecognition rates of the pair A, B and the pair C, D would then be regarded as the same, yet experience suggests that the pair A, B, whose syllable counts differ, should be recognized more accurately than the pair C, D, whose character counts are equal. Distance calculation by the conventional DP matching method does not reflect this. The present invention compensates for this shortcoming of the DP method described above.
The aim is to make the prior evaluation of recognition target words more practical.

[Means for Solving the Problems] The present invention is a method for calculating the distance between the words of a word set to be speech recognized, characterized by comprising: a step of decomposing each word into individual syllables; a step of obtaining the distance between a word with M syllables and a word with N syllables by the DP matching method; and a step of obtaining the inter-word distance by multiplying the obtained distance by a normalization constant that applies a correction satisfying the following three conditions: (1) if two words have the same number of syllables, the distance is smaller than when they do not; (2) if two words have the same word length, the normalized distance is the distance to the end point obtained by the DP method divided by the word length; (3) the distance is always symmetric with respect to the two words. The structure and operation are described in detail below with reference to an embodiment.

[Embodiment] Fig. 1 is a schematic diagram of a system in which two words A and B written in kana are decomposed into syllables in processing blocks 1 and 2, and the distance between the words is obtained in processing block 3 by the DP method. Reference numeral 4 denotes the kana-to-syllable correspondence table used in the decomposition, and 5 denotes the phoneme distance matrix used in obtaining the distance. The kana notation of words A and B consists of any one or a combination of the 50 sounds (46 sounds), voiced sounds (dakuon), nasal voiced sounds (bidakuon), semi-voiced sounds (handakuon), the geminate consonant (sokuon), the moraic nasal (hatsuon), contracted sounds (yoon), loanword sounds such as "sui" and "tei", and the long-vowel forms of these. The syllables produced by processing blocks 1 and 2 consist of a consonant plus a vowel, where the vowels are a, i, u, ... and the consonants are s, k, t, .... Part of the romaji notation table for kana is shown below.

[Table] Kana-to-romaji notation table (surviving entry: あ = a)
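The decomposition and DP matching steps described in the embodiment can be illustrated with a minimal Python sketch. It splits romaji readings into consonant+vowel syllables and accumulates local distances along the cheapest alignment. The names `split_syllables`, `local_distance`, and `dp_distance`, the three local distance values (0, 0.5, 1), and the permitted path steps are all assumptions made for illustration. The kana-to-syllable table (4) and the phoneme distance matrix (5) are not reproduced in this text, so this should not be read as the patented procedure itself.

```python
def split_syllables(romaji: str) -> list:
    """Split a lowercase romaji reading into consonant(s)+vowel syllables.

    Hypothetical simplification: every vowel closes a syllable, and a
    trailing consonant (such as the moraic nasal "n") forms its own unit.
    """
    vowels = "aiueo"
    syllables, current = [], ""
    for ch in romaji:
        current += ch
        if ch in vowels:
            syllables.append(current)
            current = ""
    if current:
        syllables.append(current)
    return syllables


def local_distance(a: str, b: str) -> float:
    """Assumed stand-in for the phoneme distance matrix (5)."""
    if a == b:
        return 0.0
    if a[-1] == b[-1]:  # same final sound (usually the vowel)
        return 0.5
    return 1.0


def dp_distance(a: list, b: list) -> float:
    """Minimum accumulated local distance over all monotonic alignments.

    Assumed path steps: diagonal, vertical, horizontal, as in plain
    dynamic time warping. No normalization is applied here.
    """
    m, n = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best_prev = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = local_distance(a[i - 1], b[j - 1]) + best_prev
    return cost[m][n]


ichi = split_syllables("ichi")      # ['i', 'chi']
shichi = split_syllables("shichi")  # ['shi', 'chi']
nana = split_syllables("nana")      # ['na', 'na']
print(dp_distance(shichi, ichi))    # 0.5
print(dp_distance(nana, ichi))      # 2.0
```

Under these assumed local distances, "shichi" comes out much closer to "ichi" than "nana" does, which agrees with the example of the numeral 7 given in the prior-art discussion.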

Claims

[Scope of Claims] 1. A method for automatically calculating distances between words, being a method for calculating the distance between the words of a word set to be speech recognized, characterized by comprising: a step of decomposing each word into individual syllables; a step of obtaining the distance between a word with M syllables and a word with N syllables by the DP matching method; and a step of obtaining the inter-word distance by multiplying the obtained distance by a normalization constant that applies a correction satisfying three conditions: (1) if two words have the same number of syllables, the distance is smaller than when they do not; (2) if two words have the same word length, the normalized distance is the distance to the end point obtained by the DP method divided by the word length; (3) the distance is always symmetric with respect to the two words.
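The three conditions in the claim can be checked against a candidate normalization constant. The Python sketch below is only an illustration under stated assumptions: the constant c(M, N) = max(M, N) / min(M, N)^2 and the names `normalization_constant` and `normalized_distance` are hypothetical, chosen because this form satisfies the three conditions. The constant actually used in the patent is not given in this text.

```python
def normalization_constant(m: int, n: int) -> float:
    """Hypothetical constant c(M, N) = max(M, N) / min(M, N)**2.

    Assumed form, not taken from the patent. It gives:
      - c(M, M) = 1 / M, i.e. division by the word length (condition 2);
      - c(M, N) = c(N, M), so the result is symmetric (condition 3);
      - c(M, N) > 1 / min(M, N) when M != N, so a pair with unequal
        syllable counts gets a larger distance than an equal-length
        pair with the same raw DP distance (condition 1).
    """
    if m < 1 or n < 1:
        raise ValueError("syllable counts must be positive")
    return max(m, n) / (min(m, n) ** 2)


def normalized_distance(raw_dp_distance: float, m: int, n: int) -> float:
    """Multiply a raw DP matching distance by the assumed constant."""
    return raw_dp_distance * normalization_constant(m, n)


# Same raw DP distance for both pairs, as in the A, B versus C, D example.
raw = 1.0
pair_cd = normalized_distance(raw, 2, 2)  # 2 and 2 syllables
pair_ab = normalized_distance(raw, 2, 3)  # 2 and 3 syllables
print(pair_cd, pair_ab)  # 0.5 0.75
```

With equal raw distances, the pair with different syllable counts (A, B) ends up farther apart than the equal-length pair (C, D), which is the behavior the description says the unnormalized DP distance fails to capture.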