JPS5872995A

JPS5872995A - Word voice recognition

Info

Publication number: JPS5872995A
Application number: JP56171365A
Authority: JP
Inventors: 入間野　孝雄
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1981-10-28
Filing date: 1981-10-28
Publication date: 1983-05-02

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a word speech recognition method that first performs phoneme recognition on input speech and then recognizes words by matching the recognized phoneme sequence against a word dictionary written in phoneme notation. In particular, its object is to improve the word recognition rate by distinguishing devoiced vowels from voiced vowels.

First, a conventional word speech recognition method of this type will be described with reference to FIG. 1.

As shown in FIG. 1, the input word speech is analyzed, its features are extracted, and the phonemes that make up the input word speech are recognized. The recognized phoneme sequence is then matched against each word in the phoneme-notated word dictionary using a confusion matrix (hereinafter abbreviated as C.M.), a likelihood is calculated, and the word with the highest likelihood is taken as the recognized word.
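The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the method of the publication: it assumes the recognized sequence and each dictionary entry are already aligned one-to-one (the text does not say how sequences of different length are aligned), and the names `CM`, `DICTIONARY`, `word_log_likelihood`, `recognize`, the `floor` parameter, and all probability values are hypothetical.

```python
import math

# cm[D][W] = P(W/D): probability that dictionary phoneme D is recognized as W.
# All values are illustrative assumptions, not taken from the patent's figures.
CM = {
    "N": {"N": 0.8, "M": 0.2},
    "A": {"A": 0.9, "O": 0.1},
    "R": {"R": 0.7, "H": 0.3},
    "H": {"H": 0.7, "R": 0.3},
}

# Hypothetical phoneme-notated dictionary of city names.
DICTIONARY = {
    "NARA": ["N", "A", "R", "A"],
    "NAHA": ["N", "A", "H", "A"],
}


def word_log_likelihood(recognized, dictionary_phonemes, cm, floor=1e-4):
    """Sum of log P(W/D) over position-aligned phoneme pairs.

    Assumes equal-length sequences; returns -inf otherwise.
    """
    if len(recognized) != len(dictionary_phonemes):
        return float("-inf")
    total = 0.0
    for w, d in zip(recognized, dictionary_phonemes):
        p = cm.get(d, {}).get(w, 0.0)
        # Floor unseen confusions so one missing entry does not give log(0).
        total += math.log(max(p, floor))
    return total


def recognize(recognized, dictionary, cm):
    """Return the dictionary word with the highest likelihood."""
    return max(
        dictionary,
        key=lambda word: word_log_likelihood(recognized, dictionary[word], cm),
    )
```

With these illustrative values, `recognize(["M", "A", "R", "A"], DICTIONARY, CM)` selects `"NARA"`, because P(R/R) is larger than P(R/H) while the other positions score identically.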

Table 1 shows an example of the phoneme-notated word dictionary (city names).

Table 1

Table 2 shows an example of the phoneme notation used in the word dictionary.

As shown in Table 1, the phoneme notation in the word dictionary was produced mechanically, as if writing in Roman letters (romaji), with no consideration given to devoicing. In the conventional example, the phoneme recognition stage (FIG. 1) recognizes phonemes from physical parameters such as the speech spectrum. As a result, when a voiceless consonant of very long duration was recognized, that consonant was regarded as a sequence of three phonemes (a voiceless consonant phoneme, a devoiced vowel phoneme, and a voiceless consonant phoneme), and the phoneme sequence was corrected accordingly. For example, when the C in CIBU was recognized as a voiceless consonant of long duration, it was regarded as the sequence of three phonemes C, UI, C, and the phoneme sequence was corrected to CUICIBU. Since the vowels that are usually devoiced are I or U, the vowel added by the correction is UI, an intermediate vowel between I and U.
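The correction described above can be sketched as follows. This is only an illustration under stated assumptions: the publication gives no duration threshold, so `LONG_MS`, the `VOICELESS` set, the `(phoneme, duration)` input format, and the function name `correct_long_voiceless` are all hypothetical.

```python
# Assumed set of voiceless consonant symbols; the actual inventory
# (Table 2 of the publication) is not reproduced in the text.
VOICELESS = {"K", "S", "T", "H", "C", "P"}

# Hypothetical duration threshold in milliseconds.
LONG_MS = 150


def correct_long_voiceless(segments, long_ms=LONG_MS, vowel="UI"):
    """Expand each very long voiceless consonant into consonant-vowel-consonant.

    segments: list of (phoneme, duration_ms) pairs from phoneme recognition.
    Returns the corrected phoneme sequence as a list of symbols.
    """
    corrected = []
    for phoneme, duration_ms in segments:
        if phoneme in VOICELESS and duration_ms >= long_ms:
            # A long voiceless consonant is taken to hide a devoiced vowel.
            corrected.extend([phoneme, vowel, phoneme])
        else:
            corrected.append(phoneme)
    return corrected
```

For instance, `correct_long_voiceless([("C", 200), ("I", 80), ("B", 60), ("U", 90)])` yields the symbols of CUICIBU, matching the example in the text. The `vowel` parameter is exposed so that a different inserted symbol can be tried.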

FIG. 2 shows part of the C.M. in the conventional example.

The numbers in this C.M. give, as percentages, the probability P(W/D) that each phoneme D in the word dictionary is recognized as phoneme W. The recognized phoneme UI in the C.M. covers both voiced vowels whose physical properties lie between those of U and I, and devoiced vowels inserted by the correction made when a voiceless consonant of very long duration is recognized.
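The publication does not say how the C.M. entries are obtained. One plausible way, shown here purely as an assumption, is to count aligned (dictionary phoneme, recognized phoneme) pairs from labeled data and normalize each dictionary row to percentages. The function name `confusion_matrix_percent` and the input format are hypothetical.

```python
from collections import Counter, defaultdict


def confusion_matrix_percent(pairs):
    """Estimate P(W/D) in percent from aligned (D, W) phoneme pairs.

    pairs: iterable of (dictionary_phoneme, recognized_phoneme) tuples.
    Returns a nested dict cm[D][W] whose rows each sum to 100.
    """
    counts = defaultdict(Counter)
    for d, w in pairs:
        counts[d][w] += 1

    cm = {}
    for d, row in counts.items():
        total = sum(row.values())
        cm[d] = {w: 100.0 * n / total for w, n in row.items()}
    return cm
```

Given three observations of dictionary U recognized as U and one recognized as UI, the row for U becomes 75% U and 25% UI.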

In Japanese, as a rule, every consonant other than "n" (ん) is followed by a voiced vowel or semivowel, as in NARA (Nara) and KJOOTO (Kyoto). As long as this principle holds, the conventional method described above should not misrecognize words. In addition, the conventional example did include a provisional countermeasure against devoicing in the form of the correction.

In practice, however, about 20% of word-medial I and U are observed to be devoiced. Because the countermeasures against devoicing in the conventional example were insufficient, it had the drawback of misrecognizing words.

The present invention eliminates the above drawback of the conventional example. An embodiment of the present invention is described below.

In the word dictionary of this embodiment, vowels that are easily devoiced are marked in advance with a devoicing symbol, giving them a phoneme notation different from that of the ordinary voiced vowels. For example, the city name Fuchu (フチュー) was written (HUCJUU) in the conventional word dictionary, as shown in Table 1. Because the "fu" is easily devoiced, this embodiment writes the word as (HU-CJUU) and treats U- and U as different phonemes.
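The publication marks devoiceable vowels in the dictionary in advance but does not state a marking rule. As a rough sketch, the following applies a commonly cited tendency of Japanese, that I and U tend to devoice between voiceless consonants. That rule, the `VOICELESS` set, the split of HUCJUU into single-letter symbols, and the name `mark_devoiced` are all assumptions made for illustration.

```python
# Assumed voiceless consonant symbols; the real inventory is in Table 2,
# which is not reproduced in the text.
VOICELESS = {"K", "S", "T", "H", "C", "P"}

# Vowels the text names as usually devoiced.
DEVOICEABLE = {"I", "U"}


def mark_devoiced(phonemes):
    """Append '-' to I or U when it sits between two voiceless consonants.

    phonemes: list of phoneme symbols for one dictionary word.
    Returns a new list; the input is not modified.
    """
    marked = list(phonemes)
    for i in range(1, len(phonemes) - 1):
        if (
            phonemes[i] in DEVOICEABLE
            and phonemes[i - 1] in VOICELESS
            and phonemes[i + 1] in VOICELESS
        ):
            marked[i] = phonemes[i] + "-"
    return marked
```

Under these assumptions, `mark_devoiced(["H", "U", "C", "J", "U", "U"])` marks only the first U, giving the symbols of HU-CJUU as in the example above. A hand-curated dictionary could of course mark vowels differently.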

In addition, in this embodiment, the vowel inserted when the phoneme sequence is corrected as a devoicing countermeasure in the phoneme recognition stage is written UI-, to distinguish it from the voiced UI.

FIG. 3 shows part of the C.M. in this embodiment. Comparing how the dictionary phonemes I and I-, and U and U-, are each recognized in FIG. 3 shows that voiced and devoiced sounds are clearly separated. FIG. 3 also shows that vowels that are easily devoiced tend to be dropped during phoneme recognition.

In the conventional method, for example, the phoneme recognition result for the city name Fuchu (フチュー), written (HUCJUU) in the word dictionary, was often (SUISEU), and in that case the word recognition result was not Fuchu but Shinjuku (シンジュク), written (SIN=ZJUKU) in the word dictionary. In contrast, according to this embodiment, the phoneme recognition result for Fuchu, written (HU-CJUU) in the word dictionary, on the same data as the conventional example was (SUI-SEU), and the word recognition result was correctly Fuchu. This is because the conventional recognized phoneme (UI) had about the same likelihood against both U and I in the word dictionary, whereas the recognized phoneme (UI-) of this embodiment has a high likelihood against the (U-) of the "fu" in Fuchu and only a low likelihood against the (I) of the "shi" in Shinjuku.

In the above embodiment, only I- and U- were added as phonemes in the word dictionary, but the phonemes may be divided more finely according to how frequently each vowel is devoiced. Although it is rare, E, A, and O can also be devoiced, and these devoiced vowels may be added as well. Also, in the above embodiment only UI- was added as a recognized phoneme. If it becomes possible to separate U- from I-, adding U- and I- as recognized phonemes would further improve the word recognition rate.

As described above, the present invention distinguishes devoiced vowels from the ordinary voiced vowels and therefore has the advantage of improving the word recognition rate.

[Brief Description of the Drawings]

FIG. 1 is a diagram outlining the word speech recognition method, FIG. 2 is a diagram showing part of the C.M. in the conventional example, and FIG. 3 is a diagram showing part of the C.M. used in the word speech recognition method according to an embodiment of the present invention.

Claims

[Claims]

(1) A word speech recognition method in which phoneme recognition is performed on input speech to obtain a recognized phoneme sequence, and words are recognized by using a confusion matrix to calculate the likelihood between the recognized phonemes and the phonemes of a word dictionary written in phoneme notation, characterized in that the phoneme symbols representing vowels in the word dictionary are written so as to distinguish normally produced voiced vowels from vowels that are easily devoiced, and the dictionary phoneme entries of the confusion matrix likewise provide separate entries for voiced vowels and for vowels that are easily devoiced.

(2) The word speech recognition method according to claim 1, wherein voiced vowels and devoiced vowels are distinguished among the types of recognized phonemes produced by phoneme recognition, and the recognized phoneme entries of the confusion matrix provide separate entries for voiced vowels and devoiced vowels to match those types.