JPS5872995A - Word voice recognition - Google Patents
Info
- Publication number
- JPS5872995A
- Authority
- JP
- Japan
- Prior art keywords
- phoneme
- vowels
- word
- recognized
- voiced
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.
Description
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a word speech recognition method that first performs phoneme recognition on input speech and then recognizes words by matching the recognized phoneme sequence against a word dictionary written in phoneme notation. In particular, it aims to improve the word recognition rate by distinguishing devoiced vowels from voiced vowels.
First, a conventional word speech recognition method of this type is described with reference to Fig. 1.
As shown in Fig. 1, the input word speech is analyzed and its features are extracted to recognize the phonemes that make up the input word speech. The recognized phoneme sequence is then matched against each word in a phoneme-notated word dictionary using a confusion matrix (hereinafter abbreviated as C.M.), a likelihood is calculated for each word, and the word with the highest likelihood is taken as the recognized word.
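The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the patent's procedure: it assumes the recognized sequence and the dictionary entry have the same length and are aligned position by position, it stores P(W/D) as fractions rather than percentages, and the names `TOY_CM`, `TOY_DICTIONARY`, `word_log_likelihood`, `recognize_word` and every probability value are hypothetical.

```python
import math

# Toy confusion matrix: cm[dictionary_phoneme][recognized_phoneme] = P(W/D).
# All values are invented for illustration only.
TOY_CM = {
    "N": {"N": 0.9, "M": 0.1},
    "A": {"A": 0.9, "O": 0.1},
    "R": {"R": 0.8, "D": 0.2},
    "H": {"H": 0.9, "K": 0.1},
}

# Toy phoneme-notated word dictionary (two city names).
TOY_DICTIONARY = {
    "NARA": ["N", "A", "R", "A"],
    "NAHA": ["N", "A", "H", "A"],
}


def word_log_likelihood(recognized, dictionary_phonemes, cm, floor=1e-4):
    """Sum of log P(W/D) over position-wise aligned phonemes."""
    if len(recognized) != len(dictionary_phonemes):
        # Simplification: insertions and deletions are not modelled here.
        return float("-inf")
    return sum(
        math.log(cm.get(d, {}).get(w, floor))
        for w, d in zip(recognized, dictionary_phonemes)
    )


def recognize_word(recognized, dictionary, cm):
    """Return the dictionary word with the highest likelihood."""
    return max(
        dictionary,
        key=lambda word: word_log_likelihood(recognized, dictionary[word], cm),
    )


# "R" misrecognized as "D": NARA still scores higher than NAHA.
best = recognize_word(["N", "A", "D", "A"], TOY_DICTIONARY, TOY_CM)
```

Working in log probabilities avoids underflow when many small probabilities are multiplied; a practical matcher would also need to handle sequences of different lengths.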
Table 1 shows an example of the phoneme-notated word dictionary (city names).
Table 1
Table 2 shows an example of the phoneme notation used in the word dictionary.
As Table 1 shows, the phoneme notation in the word dictionary was produced mechanically, as if simply writing out letters, with no consideration of devoicing. In the conventional example, the phoneme recognition stage (Fig. 1) recognizes phonemes from physical parameters such as the speech spectrum. As a result, when a voiceless consonant with a very long duration was recognized, that consonant was regarded as a sequence of three phonemes (voiceless consonant, devoiced vowel, voiceless consonant) and the phoneme sequence was corrected accordingly. For example, when the C of CIBU was recognized as a long-duration voiceless consonant, it was regarded as the three consecutive phonemes C UI C, and the phoneme sequence was corrected to CUICIBU. Because the vowels that are usually devoiced are I or U, the vowel inserted by the correction is UI, a vowel intermediate between I and U.
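The conventional correction described above might be sketched as follows. The duration threshold, the frame-based duration unit, the set of voiceless consonant symbols and the function name `expand_long_voiceless` are all assumptions made for illustration; the text does not give these details.

```python
# Hypothetical set of voiceless consonant symbols (Table 2 is not reproduced).
VOICELESS_CONSONANTS = frozenset({"P", "T", "K", "S", "C", "H"})

# Hypothetical threshold: durations at or above this count as "very long".
LONG_DURATION_FRAMES = 20


def expand_long_voiceless(segments, inserted_vowel="UI"):
    """Replace each long voiceless consonant X with X, inserted_vowel, X.

    segments is a list of (phoneme, duration_in_frames) pairs.
    """
    corrected = []
    for phoneme, frames in segments:
        if phoneme in VOICELESS_CONSONANTS and frames >= LONG_DURATION_FRAMES:
            corrected.extend([phoneme, inserted_vowel, phoneme])
        else:
            corrected.append(phoneme)
    return corrected


# The CIBU example from the text, with a long initial C.
recognized = [("C", 30), ("I", 8), ("B", 6), ("U", 9)]
corrected = expand_long_voiceless(recognized)
# corrected == ["C", "UI", "C", "I", "B", "U"]
```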
Fig. 2 shows part of the C.M. in the conventional example.
The numbers in this C.M. are the probabilities P(W/D), expressed in percent, that each phoneme D in the word dictionary is recognized as phoneme W. The recognized phoneme UI in the C.M. covers both a voiced sound whose physical properties lie between U and I and the devoiced vowel inserted by the correction when a voiceless consonant with a very long duration is recognized.
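One plausible way to obtain such a table of P(W/D) percentages is to count how often each dictionary phoneme D was recognized as each phoneme W in already aligned training data. The text does not say how the C.M. was built, so the counting approach, the function name `estimate_confusion_matrix` and the sample pairs below are assumptions.

```python
from collections import Counter, defaultdict


def estimate_confusion_matrix(aligned_pairs):
    """Estimate P(W/D) in percent from (dictionary_phoneme, recognized_phoneme) pairs."""
    counts = defaultdict(Counter)
    for dictionary_phoneme, recognized_phoneme in aligned_pairs:
        counts[dictionary_phoneme][recognized_phoneme] += 1

    matrix = {}
    for dictionary_phoneme, row in counts.items():
        total = sum(row.values())
        # Each row sums to 100 because it is conditioned on the dictionary phoneme.
        matrix[dictionary_phoneme] = {
            recognized: 100.0 * n / total for recognized, n in row.items()
        }
    return matrix


# Invented sample: dictionary U recognized as U three times and as UI once.
sample_pairs = [("U", "U"), ("U", "U"), ("U", "U"), ("U", "UI")]
cm = estimate_confusion_matrix(sample_pairs)
```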
In Japanese, as a rule, every consonant other than "n" (ん) is followed by a voiced vowel or semivowel, as in NARA (Nara) and KJOOTO (Kyoto). As long as this rule holds, words are not misrecognized by the conventional method. In addition, the conventional example did include a countermeasure against devoicing in the form of the correction described above.
In practice, however, about 20% of the I and U vowels within words are devoiced. Because the conventional countermeasure against devoicing is insufficient, the conventional example has the drawback of misrecognizing words.
The present invention eliminates the above drawback of the conventional example. An embodiment of the invention is described below.
In the word dictionary of this embodiment, vowels that are easily devoiced are marked in advance with a devoicing symbol, giving them a phoneme notation different from that of the ordinary voiced vowels. For example, the city name Fuchu (フチュー) was written (HUCJUU) in the conventional word dictionary, as shown in Table 1. Because the "fu" is easily devoiced, this embodiment writes it as (HU-CJUU), and U- and U are treated as different phonemes.
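A dictionary of this kind could in principle be produced by marking vowels automatically. The sketch below assumes one common rule of thumb, namely that I and U between two voiceless consonants are easily devoiced. The text does not state which rule was used to choose the marked vowels, and the consonant set and the name `mark_devoiced` are hypothetical.

```python
# Hypothetical set of voiceless consonant symbols (Table 2 is not reproduced).
VOICELESS_CONSONANTS = frozenset({"P", "T", "K", "S", "C", "H"})
HIGH_VOWELS = frozenset({"I", "U"})


def mark_devoiced(phonemes):
    """Append "-" to I or U when it sits between two voiceless consonants."""
    marked = list(phonemes)
    for i in range(1, len(phonemes) - 1):
        if (
            phonemes[i] in HIGH_VOWELS
            and phonemes[i - 1] in VOICELESS_CONSONANTS
            and phonemes[i + 1] in VOICELESS_CONSONANTS
        ):
            marked[i] = phonemes[i] + "-"
    return marked


# Fuchu: the U between H and C is marked; the U after the semivowel J is not.
fuchu = mark_devoiced(["H", "U", "C", "J", "U", "U"])
# fuchu == ["H", "U-", "C", "J", "U", "U"]
```

Under this assumed rule, an entry such as S I N Z J U K U is left unchanged, because none of its I or U vowels lies between two voiceless consonants.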
In this embodiment, as the devoicing countermeasure at the phoneme recognition stage, the vowel inserted to correct the phoneme sequence is written UI- to distinguish it from the voiced UI.
Fig. 3 shows part of the C.M. in this embodiment. Comparing in Fig. 3 how the dictionary phonemes I and I-, and U and U-, are each recognized shows that voiced and devoiced sounds are clearly separated. Fig. 3 also shows that vowels that are easily devoiced tend to be dropped during phoneme recognition.
In the conventional method, the phoneme recognition result for the city name Fuchu (フチュー; (HUCJUU) in the word dictionary), for example, was often (SUISEU), and in that case the word recognition result was not Fuchu but Shinjuku (シンジュク; (SINZJUKU) in the word dictionary). In contrast, with this embodiment the phoneme recognition result for the same data as in the conventional example (Fuchu is (HU-CJUU) in the word dictionary) was (SUI-SEU), and the word recognition result was correctly Fuchu. This is because the conventional recognized phoneme (UI) had about the same likelihood for both U and I in the word dictionary, whereas the recognized phoneme (UI-) in this embodiment has a high likelihood for the (U-) of the "fu" in Fuchu and only a low likelihood for the (I) of the "shi" in Shinjuku.
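The effect on a single phoneme can be illustrated with a small comparison. The probabilities below are invented, since the actual entries of Figs. 2 and 3 are not reproduced in this text; only their qualitative relationship (about equal in the conventional C.M., clearly unequal in the embodiment) follows the description. The names `CONVENTIONAL_CM`, `EMBODIMENT_CM` and `preference_ratio` are hypothetical.

```python
# Keys are (recognized_phoneme, dictionary_phoneme); values are P(W/D).
# All numbers are invented for illustration.
CONVENTIONAL_CM = {("UI", "U"): 0.20, ("UI", "I"): 0.20}
EMBODIMENT_CM = {("UI-", "U-"): 0.50, ("UI-", "I"): 0.02}


def preference_ratio(cm, recognized, dict_a, dict_b):
    """How many times more likely dict_a is than dict_b, given the recognized phoneme."""
    return cm[(recognized, dict_a)] / cm[(recognized, dict_b)]


# Conventional: recognized UI does not help choose between U (Fuchu) and I (Shinjuku).
conventional_ratio = preference_ratio(CONVENTIONAL_CM, "UI", "U", "I")

# Embodiment: recognized UI- strongly favours the devoiced U- of Fuchu.
embodiment_ratio = preference_ratio(EMBODIMENT_CM, "UI-", "U-", "I")
```

With these toy values the conventional ratio is exactly 1, so this phoneme contributes nothing to the decision, while the embodiment's ratio is well above 1 and pushes the word score towards Fuchu.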
In the above embodiment, only I- and U- were added as word dictionary phonemes, but they may be subdivided further according to how frequently the vowel is devoiced. Although it is rare, E, A and O are also sometimes devoiced, and these devoiced vowels may be added as well. Also, the above embodiment adds only UI- as a recognized phoneme, but if U- and I- can be separated, adding U- and I- as recognized phonemes would improve the word recognition rate further.
As described above, the present invention distinguishes devoiced vowels from the ordinary voiced vowels and therefore has the advantage of improving the word recognition rate.
Fig. 1 is a diagram outlining the word speech recognition method, Fig. 2 is a diagram showing part of the C.M. in the conventional example, and Fig. 3 is a diagram showing part of the C.M. used in the word speech recognition method in an embodiment of the present invention.
Fig. 1
Claims (2)
(1) A word speech recognition method in which phoneme recognition is performed on input speech to obtain a recognized phoneme sequence, and words are recognized by using a confusion matrix to calculate the likelihood between the recognized phonemes and the phonemes of a phoneme-notated word dictionary, characterized in that the phoneme symbols representing vowels in the word dictionary distinguish normally uttered voiced vowels from vowels that are easily devoiced, and in that the voiced vowels and the easily devoiced vowels are provided as separate dictionary phoneme entries in the confusion matrix.
(2) The word speech recognition method according to claim (1), characterized in that voiced vowels and devoiced vowels are distinguished as types of recognized phoneme resulting from phoneme recognition, and in that separate recognized phoneme entries for voiced vowels and devoiced vowels are provided in the confusion matrix in accordance with the types of recognized phoneme.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP56171365A JPS5872995A (en) | 1981-10-28 | 1981-10-28 | Word voice recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP56171365A JPS5872995A (en) | 1981-10-28 | 1981-10-28 | Word voice recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
JPS5872995A true JPS5872995A (en) | 1983-05-02 |
Family
ID=15921825
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP56171365A Pending JPS5872995A (en) | 1981-10-28 | 1981-10-28 | Word voice recognition |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS5872995A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6078495A (en) * | 1983-10-05 | 1985-05-04 | 松下電器産業株式会社 | Big vocabulary word recognition equipment |
JPS6146995A (en) * | 1984-08-11 | 1986-03-07 | 富士通株式会社 | Voice recognition system |
JP2006522370A (en) * | 2003-03-31 | 2006-09-28 | ノヴォーリス テクノロジーズ リミテッド | Phonetic-based speech recognition system and method |
1981
- 1981-10-28 JP JP56171365A patent/JPS5872995A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3542026B2 (en) | Speech recognition system, speech recognition method, and computer-readable recording medium | |
JP2001296880A (en) | Method and device to generate plural plausible pronunciation of intrinsic name | |
Chen | Speech recognition with automatic punctuation | |
JP2005517216A (en) | Transcription method and apparatus assisted in fast and pattern recognition of spoken and written words | |
JPS5872995A (en) | Word voice recognition | |
JP4840051B2 (en) | Speech learning support apparatus and speech learning support program | |
JPS6316766B2 (en) | ||
JP2002278579A (en) | Voice data retrieving device | |
JP2004294542A (en) | Speech recognition device and program therefor | |
Rao et al. | Word boundary hypothesization in Hindi speech | |
Daly | Recognition of words from their spellings: Integration of multiple knowledge sources | |
KR20010085219A (en) | Speech recognition device including a sub-word memory | |
JPS5968795A (en) | Recognition of word voice | |
JPS5958493A (en) | Recognition system | |
JP2615643B2 (en) | Word speech recognition device | |
JP2004301968A (en) | Utterance processing apparatus, utterance processing method, and program for utterance processing | |
JP2006113269A (en) | Phonetic sequence recognition device, phonetic sequence recognition method and phonetic sequence recognition program | |
JPS5872996A (en) | Word voice recognition | |
KR20030097309A (en) | Method of korean utterance recognition using spelling pronunciation | |
JPS6073697A (en) | Preparation of phoneme dictionary | |
JPS63153596A (en) | Voice sentence input device | |
Dhanjal et al. | PUNJARPAbet: A NEW PHONETIC ALPHABET FOR SPEECH PROCESSING IN THE PUNJABI LANGUAGE | |
GB2292235A (en) | Word syllabification. | |
JPS5849996A (en) | Average phonemic pattern preparation system | |
JPS617896A (en) | Word voice recognition method |