JPH07302098A

JPH07302098A - Word voice recognition device

Info

Publication number: JPH07302098A
Application number: JP6113572A
Authority: JP
Inventors: Yoshinori Morimoto (森本吉則)
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 1994-04-28
Filing date: 1994-04-28
Publication date: 1995-11-14

Abstract

PURPOSE: To output, as the recognized word, the word with the highest degree of similarity by comparing the input word data with the data of a plurality of words that are classified by basic vowel. CONSTITUTION: A vowel cutout section 4 receives word-unit digital data, composed of a combination of syllables, from an A/D conversion section 2, cuts out the vowel of the last syllable, of the frontmost syllable, or of both the last and the frontmost syllables, and outputs it to a first-stage voice recognition section 7. The section 7 compares the cut-out vowel portion of the last syllable of the input word with the vowel comparison samples registered beforehand in a vowel storage section 5, and takes the vowel whose comparison sample has the highest degree of similarity as the recognition result. A second-stage voice recognition section 8 reads, as comparison samples, the words belonging to the vowel group selected by the section 7, compares them with the input word recorded in a backup memory 3, and outputs the word with the highest degree of similarity as the recognition result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a word voice recognition device, and more particularly to a word voice recognition device capable of highly accurate word voice recognition.

[0002]

[Prior Art] Voice recognition devices, which recognize speech uttered by humans, are expected to be used in a wide range of fields, such as sorting packages by delivery destination and wireless control of various devices. On the other hand, as the number of words to be recognized increases, the processing time required for recognition increases and the recognition rate falls until the device is no longer practical. The number of words that can be recognized is therefore limited.

[0003] Conventionally, many attempts have been made to make word voice recognition less sensitive to an increase in the number of target words by dividing the word voice into syllables, each a combination of the minimum units of vowels and consonants, and performing recognition syllable by syllable. JP-A-59-160197 discloses recognition in syllable units each composed of a consonant plus a vowel. In that process, the vowel of the syllable is recognized first, and recognition is then performed on the syllables that contain the recognized vowel in their rear part.

[0004] However, although the above method would be an extremely versatile word voice recognition method if its recognition accuracy were sufficiently high, it is very difficult to divide actually uttered word speech into syllable units, and recognition on syllable units that are not correctly delimited is meaningless. Consequently, a recognition method that first recognizes syllable units and then recognizes the word from the combination of recognized syllables, extremely versatile as it is in theory, also has the problem of a complicated configuration and has not been put to practical use.

[0005] For word voice recognition with a large number of words to be recognized, the following recognition process has therefore been put to practical use. First, the words are divided in advance into a plurality of groups on the basis of some predetermined association, and each group is given a name. Before the word itself is recognized, the group name is uttered and voice recognition is performed on the uttered group name. Next, only the words belonging to the group obtained as the recognition result are prepared as registered samples, that is, as the candidate words. The word that is actually to be recognized is then uttered and compared with the registered samples belonging to that group.

[0006] For example, when recognizing by voice the several hundred station names in the Tokyo area, the stations are first grouped by railway line, such as "Yamanote Line", "Chuo Line", "Sobu Line", "Saikyo Line" and "Nanbu Line". Next, the name of the group to which the station belongs is uttered. For example, if the station to be recognized is "Osaki", which belongs to the "Yamanote Line", the user first says "Yamanote Line". Once the input voice has been recognized as "Yamanote Line", the station names belonging to the "Yamanote Line" group become the registered samples for the station name that follows. When the user then says "Osaki", the uttered station name is compared with the station names belonging to the "Yamanote Line" group (which are limited in number), and the station name of the registered sample with the highest degree of similarity is taken as the recognition result.

[0007]

[Problems to be Solved by the Invention] As described above, although voice recognition devices are in demand in many fields, the number of words that can be recognized is limited once the recognition rate is taken into account, and this currently restricts their practical use in various ways. Recognizing a large number of words beyond these limits requires, as described above, grouping the words in advance and then performing utterance and recognition in two stages: first the group name, then the word to be recognized. This is laborious, and the user is burdened with having to remember the group names. Furthermore, for place names, personal names and the like, it can be difficult to find an association on which to base the grouping.

[0008] Accordingly, an object of the present invention is to provide a word voice recognition device that enables high-speed, highly accurate recognition of a large number of word voices with a relatively simple configuration.

[0009]

[Means for Solving the Problems] To solve the problems described above, the word voice recognition device of the present invention comprises: vowel data storage means for storing data on predetermined vowels; vowel group storage means for storing the data of a plurality of words classified into a plurality of vowel groups according to the type of vowel contained in a predetermined syllable among the syllables forming each word; vowel extraction means for extracting, as a basic vowel, the vowel contained in the predetermined syllable of an input word voice; first recognition means for comparing the data of the basic vowel extracted by the vowel extraction means with the data stored in the vowel data storage means and recognizing the vowel with the highest degree of similarity as the basic vowel; and second recognition means for comparing the data of the input word voice with the data of the words classified under the basic vowel recognized by the first recognition means and outputting the word with the highest degree of similarity as the recognized word. Here, the predetermined syllable can be the last syllable, the frontmost syllable, or both the frontmost and the last syllables of the syllables forming the word.

[0010]

[Operation] In the present invention, the data of a plurality of words are classified into a plurality of vowel groups according to the type of vowel contained in a predetermined syllable among the syllables forming each word, and are stored in that form. For an input word voice, the predetermined syllable is the last syllable, the frontmost syllable, or both the frontmost and the last syllables of the word. The vowel contained in this predetermined syllable is extracted as the basic vowel and compared with the stored vowel data, and the vowel with the highest degree of similarity is recognized as the basic vowel. The input word voice is then compared with the data of the words classified under the recognized basic vowel, and the word with the highest degree of similarity is taken as the recognized word.

[0011]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawings. In the word voice recognition device according to one embodiment of the present invention, all target words are first classified into predetermined groups in order to reduce the number of candidate words in any single recognition pass. The classification uses as its key the vowel portion of a syllable at a defined position in the word (for example, the last syllable, the frontmost syllable, or both of these syllables). Recognition is performed with "a", "i", "u", "e", "o" and "n" as the registered samples to determine which group the word to be recognized belongs to: the "a" group, the "i" group, the "u" group, the "e" group, the "o" group or the "n" group. Recognition is then performed using only the candidate words in the recognized group as registered samples.
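The grouping step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each word is available as a simplified romanized reading (a hypothetical representation, since the patent operates on voice data), and the helper names `last_vowel_key` and `build_groups`, as well as the station list beyond the names mentioned in the text, are assumptions made for the example.

```python
from collections import defaultdict

VOWELS = "aiueo"


def last_vowel_key(reading: str) -> str:
    """Return the group key for the last syllable of a romanized reading.

    Assumption: a reading that ends in 'n' ends in the syllabic nasal and
    falls in the "n" group; otherwise the final letter is the vowel.
    """
    reading = reading.lower()
    if reading.endswith("n"):
        return "n"
    if reading and reading[-1] in VOWELS:
        return reading[-1]
    raise ValueError(f"reading does not end in a vowel or 'n': {reading!r}")


def build_groups(readings):
    """Classify every reading into its last-syllable vowel group."""
    groups = defaultdict(list)
    for reading in readings:
        groups[last_vowel_key(reading)].append(reading)
    return dict(groups)


# Illustrative station readings (simplified romanization, chosen for the example).
STATIONS = ["shibuya", "shinagawa", "yokohama", "osaki", "shinjuku", "ueno", "meguro"]
GROUPS = build_groups(STATIONS)
```

With this list, "shibuya", "shinagawa" and "yokohama" land in the "a" group, so a word whose last vowel is recognized as "a" only has to be compared against those three entries.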

[0012] Because this embodiment focuses on the vowel of a predetermined syllable within the word, there is no need, as there was conventionally, to classify the candidate words by some association and assign group names in advance. Nor is there any need to utter a group name and have it recognized before uttering the word itself, and the user is spared the burden of remembering group names.

[0013] That is, in this embodiment, in a voice recognition device that holds a large number of candidate words as registered samples, the first-stage recognition unit automatically assigns the word to be recognized to a group using its vowel as the key, and recognition is then performed using the candidate words in the group identified by the first-stage recognition unit as the registered samples.

[0014] FIG. 1 is a block diagram showing the configuration of an embodiment of the word voice recognition device according to the present invention. A voice signal input word by word to the voice output unit 1 is converted into digital data by the A/D conversion unit 2 and supplied both to the backup memory 3, which temporarily stores the word-unit data for the second-stage voice recognition, and to the vowel cutout unit 4, which cuts out the vowel of the final syllable of the word. The vowel storage unit (first group registered voice unit) 5 and the word storage unit (second group registered voice unit) 6 hold the comparison samples, registered in advance, against which the word to be recognized is compared during voice recognition. The vowel storage unit 5 stores the vowel comparison samples ("a", "i", "u", "e", "o" and "n") for the first-stage voice recognition. The word storage unit 6 stores, classified into groups, the word comparison samples against which the word to be recognized is compared in the second-stage voice recognition.

[0015] The vowel cutout unit 4 receives from the A/D conversion unit 2 the word-unit digital data, which is composed of a combination of syllables, cuts out the vowel of the last (final) syllable, of the frontmost (first) syllable, or of both the last and the frontmost syllables (in this example, the last syllable), and outputs it to the first-stage voice recognition unit 7.

[0016] The first-stage voice recognition unit 7 compares the vowel portion of the last syllable of the input word, cut out by the vowel cutout unit 4, with the vowel comparison samples registered in advance in the vowel storage unit 5, and takes the vowel of the comparison sample with the highest degree of similarity as the recognition result.
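As a rough illustration of this first stage, the sketch below picks the registered vowel whose sample is most similar to the cut-out segment. The patent does not specify the feature representation or the similarity measure; the fixed-length feature vectors, the use of cosine similarity, and the names `cosine_similarity`, `recognize_vowel` and `VOWEL_TEMPLATES` are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Assumed similarity measure: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize_vowel(segment_features, vowel_templates):
    """Return the vowel whose registered sample is most similar to the segment."""
    return max(
        vowel_templates,
        key=lambda vowel: cosine_similarity(segment_features, vowel_templates[vowel]),
    )


# Hypothetical 3-dimensional comparison samples for the six registered vowels.
VOWEL_TEMPLATES = {
    "a": [1.0, 0.0, 0.0],
    "i": [0.0, 1.0, 0.0],
    "u": [0.0, 0.0, 1.0],
    "e": [0.7, 0.7, 0.0],
    "o": [0.7, 0.0, 0.7],
    "n": [0.0, 0.7, 0.7],
}
```

Any other similarity measure could be substituted for `cosine_similarity` without changing the selection logic, since only the ranking of the six samples matters.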

[0017] The second-stage voice recognition unit 8 reads, as comparison samples, the words registered in advance under the vowel group selected by the first-stage voice recognition unit 7, compares them with the input word previously recorded in the backup memory 3, and outputs the word of the comparison sample with the highest degree of similarity as the recognition result.

[0018] The signal processing unit 9 outputs the recognition result obtained by the second-stage voice recognition unit 8 as the corresponding character control signal.

[0019] In the vowel cutout unit 4, the vowel of the last syllable of a word is extracted as shown in FIG. 2(A): starting from the last time point at which the input signal falls below the threshold level Th, a portion of duration tn1 is cut out going backward in time, short enough that it does not reach the consonant portion. This yields vowel information sufficient for practical use. Experiments confirmed that setting tn1 to 100 to 150 msec is adequate even for fast speech. When the word is spoken slowly, the length of the consonant portion hardly changes and only the vowel portion becomes longer, so it was also confirmed that tn1 can be set to 100 to 150 msec regardless of speaking rate.
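A minimal sketch of this last-syllable cutout is given below. It assumes the signal is a list of digitized samples and that the threshold Th is compared against the absolute sample value; a practical device would more likely compare against a smoothed level, which the patent does not detail. The function name `cut_last_vowel` and the default of 120 msec (inside the stated 100 to 150 msec range) are choices made for the example.

```python
def cut_last_vowel(samples, sample_rate_hz, threshold, tn1_ms=120):
    """Return the tn1-long segment ending where the signal last reaches the threshold.

    samples: digitized word data; threshold: level Th, compared with abs(sample).
    """
    last = None
    # Scan backward for the last sample that is still at or above the threshold.
    for index in range(len(samples) - 1, -1, -1):
        if abs(samples[index]) >= threshold:
            last = index
            break
    if last is None:
        return []  # the signal never reached the threshold
    length = int(sample_rate_hz * tn1_ms / 1000)
    start = max(0, last + 1 - length)
    return samples[start:last + 1]


# Toy signal at 1 kHz: 50 ms silence, 300 ms of speech, 50 ms silence.
SIGNAL = [0] * 50 + [5] * 300 + [0] * 50
SEGMENT = cut_last_vowel(SIGNAL, sample_rate_hz=1000, threshold=1)
```

Here the speech ends at sample 349, so the segment covers samples 230 to 349, that is, the final 120 msec of the utterance.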

[0020] The vowel of the frontmost syllable of a word is extracted as shown in FIG. 2(B): after a delay time ts1 (normally about 30 msec), needed to skip the consonant portion, has elapsed from the first time point at which the signal exceeds the threshold level Th, a portion of duration ts2 is cut out. It is desirable to cut out as long a vowel portion as possible without running into the consonant portion of the next syllable. With fast speech, however, the consonant-plus-vowel portion shortens to about 200 msec, so ts2 is set to 100 to 150 msec.
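The frontmost-syllable cutout can be sketched in the same way. As before, comparing the threshold Th against the absolute sample value is an assumption, and the function name `cut_first_vowel` and the default ts2 of 120 msec (within the stated 100 to 150 msec) are chosen for the example; ts1 defaults to the roughly 30 msec mentioned in the text.

```python
def cut_first_vowel(samples, sample_rate_hz, threshold, ts1_ms=30, ts2_ms=120):
    """Skip ts1 after the first threshold crossing, then return a ts2-long segment."""
    first = next(
        (index for index, value in enumerate(samples) if abs(value) > threshold),
        None,
    )
    if first is None:
        return []  # the signal never exceeded the threshold
    start = first + int(sample_rate_hz * ts1_ms / 1000)  # skip the consonant portion
    end = start + int(sample_rate_hz * ts2_ms / 1000)
    return samples[start:end]


# Toy signal at 1 kHz: 20 ms silence, 30 ms "consonant", 200 ms "vowel", 50 ms silence.
SIGNAL = [0] * 20 + [1] * 30 + [5] * 200 + [0] * 50
SEGMENT = cut_first_vowel(SIGNAL, sample_rate_hz=1000, threshold=0.5)
```

The first crossing is at sample 20, the 30 msec delay moves the start to sample 50, and the returned segment spans samples 50 to 169, which lies entirely inside the vowel portion of the toy signal.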

[0021] As a concrete example of the operation of this embodiment, the recognition of the station name "Shibuya" in voice recognition of station names in the Tokyo area is described below.

[0022] In FIG. 1, when the word voice "Shibuya" is input from the voice output unit 1 via a microphone or the like, the input signal is converted into digital data by the A/D conversion unit 2, and the digital data corresponding to "Shibuya" is first stored in the backup memory 3. The vowel cutout unit 4 then cuts out, for example, the vowel of the last syllable of the word. In this way the vowel "a" of the final syllable "ya" of the word "Shibuya" (shi-bu-ya) is cut out and supplied to the first-stage voice recognition unit 7.

[0023] The first-stage voice recognition unit 7 performs voice recognition using the pre-registered data for "a", "i", "u", "e", "o" and "n" as samples, and recognizes the vowel in question as "a".

[0024] The second-stage voice recognition unit 8 selects the "a" group from among the registered words, which are grouped and registered in advance as shown in FIG. 3. Using only the 20 station names registered in the "a" group, from "Shinagawa" to "Yokohama", as comparison sample data, it performs recognition by comparing them with the data of the word "Shibuya" previously stored in the backup memory 3.

[0025] As shown in FIG. 3, the pre-registered "a" group contains only 20 words, and the word "Shibuya", which has the highest degree of similarity, is identified from among this small number of samples to give the final recognition result. The two rounds of utterance and recognition required conventionally, in which the group name is uttered first and the word itself second, therefore become unnecessary, and the word can be recognized from a single utterance.

[0026] In this embodiment the words are grouped according to the vowel contained in the last syllable of each word, but the grouping may instead be based on the vowel contained in the frontmost syllable, or on the vowels contained in both the frontmost and the last syllables. Grouping by the vowels of both syllables sharply reduces the number of words in each group, which enables even more accurate word recognition. For example, whereas grouping by the vowel of the last syllable gives about 20 words per vowel, as shown in FIG. 3, grouping by the vowels of both the frontmost and the last syllables sharply reduces the number of words belonging to each group, as shown in FIG. 4.

[0027]

[Effect of the Invention] As described above, the word voice recognition device of the present invention can recognize a word at high speed and with high accuracy from a single utterance of the word to be recognized.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the word voice recognition device according to the present invention.

FIG. 2 is a diagram for explaining the principle of vowel cutout in the embodiment shown in FIG. 1.

FIG. 3 is a list of station names grouped by the predetermined vowels for the last syllable in the embodiment shown in FIG. 1.

FIG. 4 is a list of station names grouped by the predetermined vowels for the frontmost and last syllables in the embodiment shown in FIG. 1.

[Explanation of symbols]

1: voice output unit
2: A/D conversion unit
3: backup memory
4: vowel cutout unit
5: vowel storage unit
6: word storage unit
7: first-stage voice recognition unit
8: second-stage voice recognition unit
9: signal processing unit

Claims

[Claims]

1. A word voice recognition device comprising: vowel data storage means for storing data on predetermined vowels; vowel group storage means for storing the data of a plurality of words classified into a plurality of vowel groups according to the type of vowel contained in a predetermined syllable among the syllables forming each word; vowel extraction means for extracting, as a basic vowel, the vowel contained in the predetermined syllable of an input word voice; first recognition means for comparing the data of the basic vowel extracted by the vowel extraction means with the data stored in the vowel data storage means and recognizing the vowel with the highest degree of similarity as the basic vowel; and second recognition means for comparing the data of the input word voice with the data of the words classified under the basic vowel recognized by the first recognition means and outputting the word with the highest degree of similarity as the recognized word.

2. The word voice recognition device according to claim 1, wherein the predetermined syllable is the last syllable of the syllables forming the word.

3. The word voice recognition device according to claim 1, wherein the predetermined syllable is the frontmost syllable of the syllables forming the word.

4. The word voice recognition device according to claim 1, wherein the predetermined syllables are the two syllables at the frontmost and last positions among the syllables forming the word.