JPH09120296A - Device and method for speech recognition, device and method for dictionary generation, and information storage medium

Info

Publication number: JPH09120296A
Application number: JP8165447A
Authority: JP
Inventors: Masako Hirose; 雅子広瀬
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1995-08-22
Filing date: 1996-06-26
Publication date: 1997-05-06
Anticipated expiration: 2016-06-26
Also published as: JP3865149B2

Abstract

PROBLEM TO BE SOLVED: To speed up speech recognition. SOLUTION: A word notation correspondence dictionary 5 stores the candidate words in advance, classified by the reading of their leading part. For example, the leading-part reading of both '100' and '150' is 'hyaku'. When speech is input to a speech input means 3, a speech recognition means 4 detects, in the word notation correspondence dictionary 5, the words whose leading-part reading matches the leading part of the speech. Because the readings processed for speech recognition are limited to leading parts, the number of readings is reduced and the processing completes quickly.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition device and method for recognizing speech, to a dictionary generation device and method for storing words and their readings in the word notation correspondence dictionary of a speech recognition device, and to an information storage medium on which a computer program is written in advance.

[0002]

[Prior Art] There is currently a demand for practical speech recognition devices. A typical speech recognition device has a dictionary that stores a reading for each candidate word. When speech is input, the device searches the readings in the dictionary and recognizes the speech as the word whose reading matches.

[0003]

[Problem to Be Solved by the Invention] A speech recognition device of the kind described above can recognize speech as a word by searching the dictionary for a reading that matches the speech.

[0004] However, because the number of words to be recognized is enormous, searching their readings one by one slows the processing, and it becomes difficult to recognize continuously input speech in real time.

[0005] One technique for addressing this problem is disclosed in "A two-stage LR parser that takes meaning into account in free-utterance speech recognition" (Minami et al., Proceedings of the Acoustical Society of Japan, 3-4-10, 1993.3). It assumes a telephone directory inquiry task. Noting that in such inquiries the items carry meaning while the phrasing does not, it reduces the number of candidate words by taking meaning into account. More specifically, the LR table is built by classifying entries at the level of meaning, so that, for example, multiple words with equivalent meaning are merged into one.

[0006] However, although the candidate words can be reduced in this way when the task is severely restricted, a general task may contain many words with equivalent meaning, and in such cases the candidate words cannot be reduced effectively.

[0007] Another technique for reducing the candidate words is disclosed in "Menu-based spoken natural language input system" (Yamamoto et al., 47th National Convention of the Information Processing Society of Japan, 7M-2, 1993.19). It takes the phrase as the unit of speech input and reduces the words to be recognized by restricting the input to the range that the system can process with high probability.

[0008] However, human utterances tend to be less distinct at the end of a sentence than at the beginning, so misrecognition is likely when the input unit is a syllable, as described above. Moreover, utterances of consecutive numerals often begin identically and differ only at the end, as in "sanjū, sanjūni, sanbyaku" (さんじゅう, さんじゅうに, さんびゃく). Recognizing such speech with the technique described above takes time while the recognition rate remains low.

[0009] In such cases, it is preferable to output several recognition candidates that include the correct answer after a short processing time rather than a single wrong result after a long one. That is, even if several candidates are output, as long as the correct answer is among them it can be narrowed down to a single answer by another technique, and in that case the first stage of processing is required to be fast. The techniques described above do not take this into account.

[0010]

[Means for Solving the Problem] The speech recognition device of the invention according to claim 1 comprises: speech input means to which the speech to be recognized is input; a word notation correspondence dictionary in which the candidate words are stored in advance, grouped by the reading of their leading part; and speech recognition means that detects, from the word notation correspondence dictionary, the words whose leading-part reading matches the leading part of the speech input through the speech input means. When the speech to be recognized is input to the speech input means, the speech recognition means detects, from the word notation correspondence dictionary, the words whose leading-part reading matches the leading part of the speech. Because the readings subject to recognition are limited to leading parts, and their number is thereby reduced, this operation runs quickly. Human utterances tend to be clear at the beginning, so the misrecognition rate also falls. There are cases in which multiple results are returned, but these include the correct answer with high probability, so the device is more practical than one that outputs a single wrong result after long processing. The leading part referred to here is a fixed portion measured from the beginning, so for a short word, for example, the entire reading may be the leading part.

[0011] The dictionary generation device of the invention according to claim 2 comprises a general language dictionary in which various words are stored in advance together with their readings, and reading generation means that stores the words taken from the general language dictionary in the word notation correspondence dictionary, grouped by the leading part of the reading. Because the words are stored by leading part, multiple words that share a beginning but differ at the end are consolidated under a single reading. When a speech recognition device recognizes speech using such a word notation correspondence dictionary, the number of readings it must process is reduced. The general language dictionary referred to here need only be an ordinary dictionary in which various words are stored in advance together with their readings; an existing language database, for example, can be used.
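
The grouping performed by the reading generation means can be illustrated with a minimal Python sketch. The function name `build_head_dictionary`, the sample entries, and the fixed head length of three characters are assumptions made for illustration; the claim itself does not prescribe a data structure or a length.

```python
from collections import defaultdict


def build_head_dictionary(general_dictionary, head_length=3):
    """Group words by the leading part of their reading.

    general_dictionary maps a word to its full reading (hypothetical layout).
    head_length is an assumed fixed number of characters.
    """
    table = defaultdict(list)
    for word, reading in general_dictionary.items():
        # A reading shorter than head_length is kept whole.
        table[reading[:head_length]].append(word)
    return dict(table)


# Hypothetical entries of a general language dictionary.
general_dictionary = {
    "100": "ひゃく",
    "110": "ひゃくじゅう",
    "115": "ひゃくじゅうご",
    "30": "さんじゅう",
}

head_table = build_head_dictionary(general_dictionary)
```

In this sketch three of the four words collapse onto the single reading "ひゃく", which is the consolidation the paragraph describes.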

[0012] In the dictionary generation device of the invention according to claim 3, a phonetic unit dictionary is provided in which the phonetic units of word readings are stored in advance, and the reading generation means generates, as the leading part of the reading, the portion made up of a fixed number of phonetic units from the beginning of the word. The leading part therefore has a constant length with simple processing, and the readings processed by the speech recognition device reflect the characteristics of human utterance well.

[0013] In the dictionary generation device of the invention according to claim 4, the reading generation means generates, as the leading part of the reading, the portion made up of a fixed number of notation units from the beginning of the word. The leading part therefore has a constant length with simple processing, and the readings processed by the speech recognition device reflect the characteristics of human utterance well.

[0014] In the dictionary generation device of the invention according to claim 5, the length of the leading part of the reading is set in advance in a length setting dictionary for each word category, and the reading generation means varies the length of the leading part of the generated reading for each category. For example, if the reading is lengthened only for particular words, the recognition accuracy for particular speech improves without increasing the overall processing time of the speech recognition device.
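
A per-category head length can be sketched as a simple lookup. The category names, the lengths, the sample reading "しんよこはま", and the function `head_of` are all hypothetical; the source only states that a length is preset per category.

```python
# Hypothetical length setting dictionary: category -> head length in characters.
LENGTH_SETTINGS = {"numeral": 3, "place_name": 5}


def head_of(reading, category, length_settings, default_length=3):
    """Return the leading part of a reading, using the length preset for its category."""
    length = length_settings.get(category, default_length)
    return reading[:length]


numeral_head = head_of("ひゃくじゅうご", "numeral", LENGTH_SETTINGS)
place_head = head_of("しんよこはま", "place_name", LENGTH_SETTINGS)
```

Only the categories given a longer setting pay the cost of longer readings; all other entries keep the short default.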

[0015] In the dictionary generation device of the invention according to claim 6, the reading of each digit position of a numeral is stored in advance in a digit correspondence dictionary, organized by number of digits. When the reading generation means generates a leading-part reading for a multi-digit numeral treated as a word, it detects the reading of the numeral in the prescribed leading digits from the general language dictionary, detects the reading of the prescribed leading digit positions from the digit correspondence dictionary, and combines the two. When the general language dictionary is an ordinary database, single-digit numerals are likely to be stored while multi-digit numerals are likely to be missing; even in that case, readings of multi-digit numerals are generated with simple processing.

[0016] In the dictionary generation device of the invention according to claim 7, the readings of digits that change depending on the numeral they are combined with are stored in advance in a reading change dictionary, and when the reading generation means generates a reading for a multi-digit numeral treated as a word, it refers to the reading change dictionary and corrects the reading of the corresponding numeral. Readings that would take an unnatural form under simple combination are corrected to a natural form.
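
The combination of a digit reading with a digit-position reading, followed by correction through a reading change dictionary, might look like the following sketch. The three tables are small hypothetical excerpts, the function name is invented, and the rule that a leading 1 is left unvoiced (so that 100 reads "ひゃく") is an assumption of this sketch rather than something stated in the source.

```python
# Hypothetical excerpt of single-digit readings from a general language dictionary.
DIGIT_READINGS = {1: "いち", 2: "に", 3: "さん", 4: "よん", 5: "ご",
                  6: "ろく", 7: "なな", 8: "はち", 9: "きゅう"}

# Hypothetical digit correspondence dictionary, keyed by number of digits.
PLACE_READINGS = {2: "じゅう", 3: "ひゃく", 4: "せん"}

# Hypothetical reading change dictionary for combinations that are not simple concatenations.
READING_CHANGES = {
    ("さん", "ひゃく"): "さんびゃく",
    ("ろく", "ひゃく"): "ろっぴゃく",
    ("はち", "ひゃく"): "はっぴゃく",
    ("さん", "せん"): "さんぜん",
    ("はち", "せん"): "はっせん",
}


def leading_digit_reading(numeral):
    """Reading of the leading digit of a 2- to 4-digit numeral, e.g. "315" -> "さんびゃく"."""
    digit = int(numeral[0])
    place = PLACE_READINGS[len(numeral)]
    if digit == 1:
        # Assumption: a leading 1 is not voiced, so 100 reads "ひゃく".
        return place
    digit_reading = DIGIT_READINGS[digit]
    # Use the corrected form when one is registered, else concatenate.
    return READING_CHANGES.get((digit_reading, place), digit_reading + place)
```

For example, `leading_digit_reading("250")` falls back to plain concatenation, while `leading_digit_reading("315")` picks up the corrected form from the reading change table.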

[0017] In the dictionary generation device of the invention according to claim 8, word categories are set in advance in a language classification dictionary. The reading generation means classifies the word corresponding to each generated reading according to the settings of the language classification dictionary, and if the number of words in a category does not exceed a preset reference value, it does not limit the readings to the leading part. For example, if readings are limited to the leading part for words that occur frequently in speech recognition and are left unrestricted for words that occur rarely, the processing time of the speech recognition device is shortened and the recognition accuracy improves.

[0018] In the dictionary generation device of the invention according to claim 9, word categories are set in advance in a language classification dictionary. The reading generation means classifies the word corresponding to each generated reading according to the settings of the language classification dictionary, and varies the length of the leading part of the generated readings so that the number of readings in a category does not exceed a preset reference value. Even when a category contains many words the number of readings stays constant, and when a category contains few words the readings are not limited to the leading part.
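
One way to vary the head length against a reference value is to pick the longest length at which the number of distinct leading parts in the category still fits. This is only a sketch of that idea; the function name and the sample readings are hypothetical, and the source does not say how the length is searched.

```python
def choose_head_length(readings, max_readings):
    """Longest head length whose distinct leading parts number at most max_readings.

    Returns the full length of the longest reading when no truncation is needed,
    and 1 when even single-character heads exceed the limit.
    """
    longest = max(len(reading) for reading in readings)
    # The count of distinct prefixes never grows as the length shrinks,
    # so the first length that fits while counting down is the longest one.
    for length in range(longest, 0, -1):
        if len({reading[:length] for reading in readings}) <= max_readings:
            return length
    return 1


# Hypothetical readings belonging to one category.
category_readings = ["ひゃく", "ひゃくじゅう", "ひゃくじゅうご",
                     "さんじゅう", "さんじゅうに", "さんびゃく"]
```

With a generous reference value the function returns the full length, so a small category is left unrestricted, which matches the behavior described for categories with few words.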

[0019] In the dictionary generation device of the invention according to claim 10, the conditions under which multiple words occur consecutively are set in advance in a condition setting dictionary. The reading generation means refers to the condition setting dictionary to generate the reading of multiple consecutive words, and limits only the reading of the last word to its leading part. Multiple words that are expected to occur consecutively are combined in advance and handled in the same way as a single word.

[0020] In the dictionary generation device of the invention according to claim 11, the reading generation means limits the length of the leading part of the last word's reading so that the overall reading of the consecutive words does not exceed a preset reference value. The overall length therefore stays constant even if the first of the combined words is long.

[0021] In the dictionary generation device of the invention according to claim 12, the condition setting dictionary has the reading length of the last word set in advance together with the conditions for consecutive words, and the reading generation means limits the reading of the last word to the set length. Because the reading length of the last word varies according to the classification, it is possible, for example, to make the last word shorter the longer the first word is.
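
The handling of consecutive words, with only the last word truncated and the overall length capped, can be sketched as follows. The function name, the sample word sequences, and the limit of five characters are hypothetical, and the sketch drops the last word entirely if the earlier words already fill the limit, a case the source does not discuss.

```python
def combined_reading(readings, max_total_length):
    """Join the readings of consecutive words, truncating only the last one.

    The earlier words keep their full readings; the last word is cut so that
    the total does not exceed max_total_length where possible.
    """
    *leading, last = readings
    prefix = "".join(leading)
    remaining = max(max_total_length - len(prefix), 0)
    return prefix + last[:remaining]


short_first = combined_reading(["せん", "きゅうひゃく"], 5)
long_first = combined_reading(["さんぜん", "きゅうひゃく"], 5)
```

The longer the first word, the fewer characters remain for the last one, which is the trade-off described for the combined readings.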

[0022] The speech recognition method of the invention according to claim 13 stores the candidate words in advance in a word notation correspondence dictionary, grouped by the reading of their leading part, and detects from that dictionary the words whose leading-part reading matches the leading part of the speech to be recognized. When the speech to be recognized is input, the words whose leading-part reading matches it are detected from the word notation correspondence dictionary. Because the readings subject to recognition are limited to leading parts, and their number is thereby reduced, this operation runs quickly. Human utterances tend to be clear at the beginning, so the misrecognition rate also falls. There are cases in which multiple results are returned, but these include the correct answer with high probability, so the method is more practical than one that outputs a single wrong result after long processing.

[0023] The dictionary generation method of the invention according to claim 14 takes words from a general language dictionary in which various words are stored in advance together with their readings, and stores the words so taken in a word notation correspondence dictionary, grouped by the leading part of the reading. Because the words are stored by leading part, multiple words that share a beginning but differ at the end are consolidated under a single reading. When a speech recognition device recognizes speech using such a word notation correspondence dictionary, the number of readings it must process is reduced.

[0024] The information storage medium of the invention according to claim 15 is an information storage medium on which computer-readable software is written in advance. Written on it are a word notation correspondence dictionary in which the candidate words are stored in advance, grouped by the reading of their leading part, and a program that causes the computer to detect from that dictionary the words whose leading-part reading matches the leading part of the speech to be recognized. When a computer reads and runs the software on this medium, the computer functions as a speech recognition device that detects, from the word notation correspondence dictionary, the words whose leading-part reading matches the leading part of the speech. Because the readings subject to recognition are limited to leading parts, and their number is thereby reduced, this operation runs quickly. Human utterances tend to be clear at the beginning, so the misrecognition rate also falls. There are cases in which multiple results are returned, but these include the correct answer with high probability, so this is more practical than outputting a single wrong result after long processing.

[0025] The information storage medium of the invention according to claim 16 is an information storage medium on which a program that a computer reads in order to perform the corresponding operations is written in advance. The program causes the computer to take words from a general language dictionary in which various words are stored in advance together with their readings, and to store the words so taken in a word notation correspondence dictionary, grouped by the leading part of the reading. When a computer reads and runs the program on this medium, the computer functions as a dictionary generation device that creates the word notation correspondence dictionary used by a speech recognition device. Because the words taken from the general language dictionary are stored by leading part, multiple words that share a beginning but differ at the end are consolidated under a single reading. When a speech recognition device recognizes speech using such a word notation correspondence dictionary, the number of readings it must process is reduced.

[0026]

[Embodiments of the Invention] A first embodiment of the present invention is described below with reference to FIGS. 1 to 6. First, as shown in FIG. 1, the speech recognition device 1 and the dictionary generation device 2 illustrated here are formed as a single unit and, as shown in FIGS. 2 and 3, have as their hardware a computer system 100, which is a data processing device. The computer system 100 has a CPU (Central Processing Unit) 101 as the core of the computer. Connected to the CPU 101 through a bus line 102 are a ROM (Read Only Memory) 103, a RAM (Random Access Memory) 104, an HDD (HD Drive) 105 containing an HD (Hard Disk, not shown), an FDD (FD Drive) 107 into which an FD (Floppy Disk) 106 is loaded, a CD-ROM drive 109 into which a CD (Compact Disk)-ROM 108 is loaded, a keyboard 111 to which a mouse 110 is connected, a display 112, a microphone 113, a communication I/F (Interface) 114, and so on.

[0027] In the computer system 100, programs and the like that cause the CPU 101 to perform various processing operations are set in advance. This software is written in advance on an information storage medium such as the RAM 104 or the HD (not shown) of the HDD 105. In the computer system 100, the CPU 101 performs the various processing operations in accordance with the programs stored in the RAM 104 or the like, and the speech recognition device 1 and the dictionary generation device 2 of this embodiment are realized in this way.

[0028] As shown in FIG. 1, the speech recognition device 1 of this embodiment has a speech input unit 3 serving as the speech input means, a speech recognition unit 4 serving as the speech recognition means, a word notation correspondence table 5 serving as the word notation correspondence dictionary, a result output unit 6 serving as result output means, and a result selection unit 7 serving as result selection means. The speech input unit 3, the word notation correspondence table 5, the result output unit 6, and the result selection unit 7 are connected to the speech recognition unit 4. The dictionary generation device 2 of this embodiment has a word dictionary 8 serving as the general language dictionary and a reading generation unit 9 serving as the reading generation means. The word dictionary 8 and the word notation correspondence table 5 are connected to the reading generation unit 9.

[0029] The speech input unit 3 has the microphone 113 and other hardware, and converts speech uttered by a person into an electrical signal. The word notation correspondence table 5 and the word dictionary 8 reside on an information storage medium such as the RAM 104, and here numerals are stored in advance as the candidate words. The word dictionary 8 consists, for example, of an existing database of numerals suited to the speech recognition task, and as shown in FIG. 4, the various candidate numerals are stored in it in advance together with their readings.

[0030] In the word notation correspondence table 5, as shown in FIG. 5, the candidate numerals are stored in advance grouped by the reading of their leading part. Here the leading part is generated as a fixed number of notation units from the beginning of the numeral's reading. Specifically, it is limited to at most three characters, the character being the notation unit. As a result, the reading "hyakujūgo" (ひゃくじゅうご) of the numeral "115" is shortened to its leading part "hyaku" (ひゃく), while the reading "hyaku" of "100" remains "hyaku".

[0031] The speech recognition unit 4 has the CPU 101 and so on. When the speech to be recognized is input to the speech input unit 3, it detects from the word notation correspondence table 5 the numerals whose leading-part reading matches the leading part of the speech. Because the leading parts stored in the word notation correspondence table 5 are three characters long, as described above, only the first three characters of the input speech are processed.

[0032] The result output unit 6 has the display 112 and so on, and outputs the recognition result of the speech recognition unit 4. The result selection unit 7 has the keyboard 111 and so on, and when the result output unit 6 outputs more than one result, it selects one of them in response to the user's manual operation.

[0033] The reading generation unit 9 has the CPU 101 and so on, and stores the numerals taken from the word dictionary 8 in the word notation correspondence table 5 of the speech recognition device 1, grouped by the leading part of the reading. Because the word notation correspondence table 5 stores the readings of numerals by their first three characters, as described above, the reading generation unit 9 generates as the leading part the three-character portion, a fixed number of notation units, from the beginning of the reading of each numeral taken from the word dictionary 8.

[0034] The units of the speech recognition device 1 and the dictionary generation device 2 described above use hardware such as the keyboard 111, the display 112, and the microphone 113 where necessary, but they are realized mainly by the CPU 101 operating in accordance with the software written in the RAM 104 or the like.

[0035] The software written in the RAM 104 consists of: the word dictionary 8, in a form readable by the CPU 101; a control program that causes the CPU 101 to take words from the word dictionary 8 and store them in the word notation correspondence table 5, grouped by the leading part of the reading; the word notation correspondence table 5, in a form readable by the CPU 101; a control program that causes the CPU 101, when the speech to be recognized is input to the speech input unit 3, to detect from the word notation correspondence table 5 the words whose leading-part reading matches the leading part of that speech; and so on.

[0036] In this configuration, the speech recognition device 1 recognizes speech uttered by a person. More specifically, as shown in FIG. 6, when speech uttered by a person is input to the speech input unit 3, the speech recognition unit 4 extracts the first three characters of the speech and, by spotting with the start point fixed at the beginning, detects from the word notation correspondence table 5 the numerals whose reading matches. The detected numerals are output from the result output unit 6, and if more than one numeral is detected, one is selected by manual operation of the result selection unit 7.

【００３７】例えば、音声として“ひゃくじゅう”が入
力されると、読みが“ひゃく”の数詞である“１００，
１１０，１１５”の三つが出力されるので、ユーザは所
望により“１１０”を選択することになる。For example, when "hyakujuu" (ひゃくじゅう) is input as speech, the three numerals whose reading is "hyaku" (ひゃく), namely "100, 110, 115", are output, and the user selects "110" as desired.
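The lookup described in paragraphs [0036] and [0037] can be sketched as follows. This is only an illustration under stated assumptions: the table contents, the name `WORD_NOTATION_TABLE`, and the function `recognize` are hypothetical, and the input is assumed to be already transcribed into kana, whereas the device itself matches against the speech signal.

```python
# Hypothetical contents of the word notation correspondence table 5:
# each three-character leading reading maps to every numeral that shares it.
WORD_NOTATION_TABLE = {
    "ひゃく": ["100", "110", "115"],
    "にひゃ": ["200", "210"],
}

PREFIX_CHARS = 3  # the first embodiment uses the first three characters


def recognize(reading: str) -> list[str]:
    """Return all candidate numerals whose leading reading matches the input."""
    head = reading[:PREFIX_CHARS]  # start point fixed at the beginning
    return list(WORD_NOTATION_TABLE.get(head, []))


candidates = recognize("ひゃくじゅう")
print(candidates)  # the user would then pick one candidate, e.g. "110"
```

Because only the leading part is compared, the function returns a short candidate list rather than a single answer, which mirrors the manual selection step performed with the result selection unit 7.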

【００３８】上述した音声認識装置１は、語表記対応表
５に三文字の読み毎に数詞が格納されているので、処理
対象となる読みの個数が削減されており、処理負担が軽
減されて所要時間が短縮されている。しかも、このよう
に読みの検索処理を三文字だけで実行するので、このこ
とでも処理負担が軽減されて所要時間が短縮されてい
る。In the voice recognition device 1 described above, the word notation correspondence table 5 stores the numerals under three-character readings, so the number of readings to be processed is reduced, which lightens the processing load and shortens the required time. Moreover, because the reading search is executed on only three characters, the processing load and the required time are reduced further.

【００３９】人間の発声は文頭に比較して文末が曖昧に
なる傾向があるが、上述した音声認識装置１は、音声の
先頭部のみを処理対象とするので、誤認識が発生しにく
い。この場合、上述のように認識結果が複数となること
が多発するが、この複数の認識候補には高確率で正解が
含まれており、短時間の処理で出力されるので、これを
一つに選定する第二の処理を実行しても全体の所要時間
は短く、長時間の処理で一つの間違った認識結果が出力
されるものより実用的である。Human utterances tend to become vaguer at the end of a sentence than at its beginning, but because the voice recognition device 1 described above processes only the beginning of the speech, misrecognition is unlikely to occur. In this case, multiple recognition results frequently arise as described above, but these candidates contain the correct answer with high probability and are output after only a short processing time. Even when a second process that narrows them down to one is executed, the overall required time is short, which is more practical than a device that outputs a single wrong recognition result after lengthy processing.

【００４０】音声認識装置１の語表記対応表５は、上述
のように数詞が特殊な読み毎に格納されているが、これ
は辞書作成装置２により機械的に作成される。つまり、
単語辞書８には、図４に示すように、認識候補となる各
種の数詞が読みと共に予め格納されているので、読み生
成部９が、単語辞書８から取り出した数詞を、読みの先
頭部毎に音声認識装置１の語表記対応表５に格納する。In the word notation correspondence table 5 of the voice recognition device 1, the numerals are stored under special readings as described above, and this table is created mechanically by the dictionary creation device 2. That is, as shown in FIG. 4, the word dictionary 8 stores in advance the various numerals that are recognition candidates together with their readings, and the reading generation unit 9 stores the numerals taken out of the word dictionary 8 in the word notation correspondence table 5 of the voice recognition device 1, grouped by the leading part of their readings.

【００４１】より具体的には、最初に単語辞書８から数
詞“１００”が取り出された場合、その読みは“ひゃ
く”なので、この“ひゃく”が読みの先頭部として“１
００”が語表記対応表５に格納される。つぎに、数詞
“１１０”が取り出された場合、その読みは“ひゃくじ
ゅう”なので先頭部は“ひゃく”であり、この数詞“１
１０”は上述した“１００”と共に語表記対応表５の
“ひゃく”の読みの位置に格納される。More specifically, when the numeral "100" is taken out of the word dictionary 8 first, its reading is "hyaku" (ひゃく), so "100" is stored in the word notation correspondence table 5 with "hyaku" as the leading part of its reading. Next, when the numeral "110" is taken out, its reading is "hyakujuu" (ひゃくじゅう), so its leading part is "hyaku", and this numeral "110" is stored at the "hyaku" reading position of the word notation correspondence table 5 together with the "100" described above.

【００４２】このため、語表記対応表５には、多数の数
詞が少数の読みに割り当てられて格納され、先頭が同一
でも末尾が相違して誤認識が発生しやすい複数の数詞が
一つの読みに集約される。このような音声認識装置１の
語表記対応表５が、辞書作成装置２により既存の単語辞
書８から機械的に作成されるので、この作業を人間が実
行する必要がない。このように数詞の読みを表記単位で
ある文字の個数により先頭部に制限するので、簡易な処
理で読みの先頭部の長さを一定に共通化することができ
る。As a result, the word notation correspondence table 5 stores a large number of numerals assigned to a small number of readings, and multiple numerals that share a beginning but differ at the end, and are therefore prone to misrecognition, are consolidated under a single reading. Because the dictionary creation device 2 mechanically creates this word notation correspondence table 5 of the voice recognition device 1 from the existing word dictionary 8, a human does not need to perform this work. Furthermore, since the reading of each numeral is limited to its leading part by a count of characters, which are the notation units, the length of the leading part can be made uniform by simple processing.
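The mechanical table creation of paragraphs [0040] to [0042] amounts to grouping numerals by a fixed-length prefix of their readings. The sketch below assumes a hypothetical `WORD_DICTIONARY` list of (numeral, reading) pairs standing in for the word dictionary 8; the function name `build_table` is likewise an illustration, not taken from the patent.

```python
from collections import defaultdict

# Hypothetical stand-in for the word dictionary 8 (numeral, full reading).
WORD_DICTIONARY = [
    ("100", "ひゃく"),
    ("110", "ひゃくじゅう"),
    ("115", "ひゃくじゅうご"),
    ("200", "にひゃく"),
]


def build_table(entries, prefix_chars=3):
    """Group numerals under the first `prefix_chars` characters of their reading."""
    table = defaultdict(list)
    for numeral, reading in entries:
        head = reading[:prefix_chars]  # leading part limited by character count
        table[head].append(numeral)
    return dict(table)


table = build_table(WORD_DICTIONARY)
print(table)
```

With this sample data, "100", "110" and "115" all land under the single key "ひゃく", which is the consolidation effect the paragraph describes.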

【００４３】なお、本発明は上記した実施の形態に限定
されるものではなく、各種の変形を許容する。例えば、
ここでは音声認識装置１と辞書作成装置２とが一体であ
ることを例示したが、これを別体の装置とし、音声認識
装置１の開発時に辞書作成装置２を使用し、製品として
出荷される音声認識装置１には辞書作成装置２を搭載し
ないことも可能である。The present invention is not limited to the embodiment described above and allows various modifications. For example, although the voice recognition device 1 and the dictionary creation device 2 are illustrated here as integrated, they may instead be separate devices, with the dictionary creation device 2 used during development of the voice recognition device 1 and not installed in the voice recognition device 1 that is shipped as a product.

【００４４】また、本実施の形態では、ＲＡＭ１０４等
にソフトウェアとして格納されているプログラムに従っ
てＣＰＵ１０１が動作することにより、音声認識装置１
や辞書作成装置２の各部が実現されることを例示した。
しかし、このような各部の各々を固有のハードウェアと
して製作することも可能であり、一部をソフトウェアと
してＲＡＭ１０４等に格納するとともに一部をハードウ
ェアとして製作することも可能である。また、所定のソ
フトウェアが格納されたＲＡＭ１０４等や各部のハード
ウェアを、例えば、ファームウェアとして製作すること
も可能である。In the present embodiment, each unit of the voice recognition device 1 and the dictionary creation device 2 is realized by the CPU 101 operating according to a program stored as software in the RAM 104 or the like. However, each of these units can also be built as dedicated hardware, or some can be stored as software in the RAM 104 or the like while others are built as hardware. It is also possible to build the RAM 104 or the like storing the predetermined software, and the hardware of each unit, as firmware, for example.

【００４５】また、本実施の形態では、コンピュータシ
ステム１００の起動時に、ＨＤＤ１０５に格納されてい
るソフトウェアがＲＡＭ１０４に複写され、このように
ＲＡＭ１０４に格納されたソフトウェアをＣＰＵ１０１
が読み取ることを想定したが、このようなソフトウェア
をＨＤＤ１０５に格納したままＣＰＵ１０１に利用させ
ることや、ＲＯＭ１０３やＲＡＭ１０４に予め書き込ん
でおくことも可能である。In the present embodiment, it is assumed that when the computer system 100 starts up, the software stored in the HDD 105 is copied to the RAM 104 and the CPU 101 reads the software thus stored in the RAM 104. However, the CPU 101 may also use such software while it remains stored in the HDD 105, or the software may be written in advance to the ROM 103 or the RAM 104.

【００４６】さらに、単体で取り扱える情報記憶媒体で
あるＦＤ１０６やＣＤ−ＲＯＭ１０９にソフトウェアを
書き込んでおき、このＦＤ１０６等からＲＡＭ１０４等
にソフトウェアをインストールすることも可能であり、
このようなインストールを実行することなくＦＤ１０６
等に書き込まれたソフトウェアをＣＰＵ１０１が適宜読
み取ってデータ処理を実行することも可能である。Furthermore, the software can be written to the FD 106 or the CD-ROM 109, which are information storage media that can be handled on their own, and installed from the FD 106 or the like into the RAM 104 or the like. It is also possible for the CPU 101 to read the software written on the FD 106 or the like as needed and execute data processing without performing such an installation.

【００４７】また、このような音声認識装置１や辞書作
成装置２の各部を実現するプログラムを、複数のソフト
ウェアの組み合わせにより実現することも可能であり、
その場合、単体の製品となる情報記憶媒体には必要最小
限のソフトウェアのみを格納しておけば良い。例えば、
オペレーティングシステムが実装されているコンピュー
タシステム１００に、ＣＤ−ＲＯＭ１０８等の情報記憶
媒体によりアプリケーションソフトを提供するような場
合、音声認識装置１や辞書作成装置２の各部を実現する
ソフトウェアは、アプリケーションソフトとオペレーテ
ィングシステムとの組み合わせで実現されるので、オペ
レーティングシステムに依存する部分のソフトウェアは
アプリケーションソフトの情報記憶媒体から省略するこ
とができる。Further, it is also possible to realize a program for realizing each part of the voice recognition device 1 and the dictionary creation device 2 by combining a plurality of software,
In that case, only the minimum necessary software needs to be stored in the information storage medium which is a single product. For example,
When the application software is provided to the computer system 100 in which the operating system is installed by an information storage medium such as the CD-ROM 108, the software that realizes each unit of the voice recognition device 1 and the dictionary creation device 2 is the application software. Since it is realized in combination with the operating system, the software of the part that depends on the operating system can be omitted from the information storage medium of the application software.

【００４８】また、このように情報記憶媒体に書き込ん
だソフトウェアをコンピュータに供給する手法は、その
情報記憶媒体をコンピュータに直接に装填することに限
定されない。例えば、上述のようなソフトウェアをホス
トコンピュータの情報記憶媒体に書き込み、このホスト
コンピュータを通信ネットワークにより端末コンピュー
タに接続し、ホストコンピュータからデータ通信により
端末コンピュータにソフトウェアを供給することも可能
である。The method of supplying the software thus written in the information storage medium to the computer is not limited to loading the information storage medium directly into the computer. For example, it is also possible to write the above-mentioned software on an information storage medium of a host computer, connect the host computer to a terminal computer via a communication network, and supply the software to the terminal computer by data communication from the host computer.

【００４９】この場合、端末コンピュータが自身の情報
記憶媒体にソフトウェアをダウンロードした状態でスタ
ンドアロンのデータ処理を実行することも可能である
が、ソフトウェアをダウンロードすることなくホストコ
ンピュータとのリアルタイムのデータ通信によりデータ
処理を実行することも可能である。この場合、ホストコ
ンピュータと端末コンピュータとを通信ネットワークに
より接続したシステム全体が、本発明の音声認識装置１
や辞書作成装置２に相当することになる。In this case, the terminal computer can execute stand-alone data processing with the software downloaded to its own information storage medium, but it can also execute data processing through real-time data communication with the host computer without downloading the software. In the latter case, the entire system in which the host computer and the terminal computer are connected by the communication network corresponds to the voice recognition device 1 and the dictionary creation device 2 of the present invention.

【００５０】つぎに、本発明の実施の第二の形態を図７
ないし図９に基づいて以下に説明する。なお、この実施
の第二の形態に関し、上述した第一の形態と同一の部分
は、同一の名称および符号を用いて詳細な説明は省略す
る。Next, a second embodiment of the present invention will be described below with reference to FIGS. 7 to 9. For this second embodiment, parts identical to those of the first embodiment described above are given the same names and reference numerals, and their detailed description is omitted.

【００５１】まず、図７に示すように、ここで例示する
音声認識装置１１と辞書作成装置１２も一体に形成され
ており、この辞書作成装置１２には、表音単位辞書であ
る音節表１３が付加されている。この音節表１３は、Ｒ
ＡＭなどの記憶デバイスを有しており、図８に示すよう
に、数詞の読みの表音単位である音節が予め格納されて
いる。読み生成部９は、単語辞書８から取り出した数詞
を読みの先頭部毎に語表記対応表５に格納する際、前記
音節表１３を参照して数詞の読みの先頭から二つの音節
の部分を先頭部として生成する。First, as shown in FIG. 7, the voice recognition device 11 and the dictionary creation device 12 illustrated here are also formed as one unit, and a syllable table 13, which is a phonetic unit dictionary, is added to the dictionary creation device 12. The syllable table 13 has a storage device such as a RAM and, as shown in FIG. 8, stores in advance the syllables that are the phonetic units of the numerals' readings. When the reading generation unit 9 stores the numerals taken out of the word dictionary 8 in the word notation correspondence table 5 grouped by the leading part of their readings, it refers to the syllable table 13 and generates the first two syllables of each numeral's reading as the leading part.

【００５２】このような構成において、本実施の形態の
音声認識装置１１も、人間が発声する音声を認識する。
この時、音声認識部４は、この音声の先頭部の二音節を
抽出し、これに二音節の読みが一致する数詞を語表記対
応表５から検出する。この音声認識装置１１は、音声を
認識する処理を表音単位である音節に従って実行するの
で、音声認識の処理動作に人間の発声の特徴を良好に反
映させることができる。With such a configuration, the voice recognition device 11 of the present embodiment also recognizes a voice uttered by a human.
At this time, the voice recognition unit 4 extracts the two syllables at the beginning of the voice, and detects from the word notation correspondence table 5 a number that matches the reading of the two syllables. Since the voice recognition device 11 executes the process of recognizing a voice in accordance with the syllable which is a phonetic unit, it is possible to favorably reflect the characteristics of human utterance in the process of voice recognition.

【００５３】そして、辞書作成装置１２も、上述のよう
な音声認識装置１１の語表記対応表５を作成する。その
読み生成部９は、単語辞書８から取り出した数詞を読み
の先頭部毎に語表記対応表５に格納する際、音節表１３
を参照して読みを二音節に制限する。このように数詞の
読みを表音単位である音節の個数により先頭部に制限す
るので、簡易な処理で読みの先頭部の長さを一定に共通
化することができる。The dictionary creation device 12 likewise creates the word notation correspondence table 5 of the voice recognition device 11 described above. When its reading generation unit 9 stores the numerals taken out of the word dictionary 8 in the word notation correspondence table 5 grouped by the leading part of their readings, it refers to the syllable table 13 and limits each reading to two syllables. Since the reading of each numeral is limited to its leading part by a count of syllables, which are the phonetic units, the length of the leading part can be made uniform by simple processing.
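Limiting a reading to its first two syllables, as in paragraphs [0051] to [0053], requires segmenting the kana string into syllables first. The sketch below assumes a hypothetical, tiny `SYLLABLE_TABLE` in place of the syllable table 13 and uses greedy longest-match segmentation, which is one possible way to consult such a table and is not stated in the patent.

```python
# Hypothetical excerpt of the syllable table 13; a real table would list
# every syllable that can occur in the readings.
SYLLABLE_TABLE = {"ひゃ", "じゅ", "ひ", "じ", "く", "う", "に", "ご"}


def split_syllables(reading, table=SYLLABLE_TABLE):
    """Segment a kana reading into syllables by greedy longest match."""
    max_len = max(len(s) for s in table)
    units = []
    i = 0
    while i < len(reading):
        for n in range(max_len, 0, -1):
            piece = reading[i:i + n]
            if piece in table:
                units.append(piece)
                i += len(piece)
                break
        else:
            raise ValueError(f"no syllable in table matches at position {i}")
    return units


def head_part(reading, n_syllables=2):
    """Return the first `n_syllables` syllables of the reading as its leading part."""
    return "".join(split_syllables(reading)[:n_syllables])


print(split_syllables("ひゃくじゅう"))
print(head_part("ひゃくじゅう"))
```

Counting syllables rather than characters keeps a contracted sound such as "ひゃ" together as one unit, so the leading part always ends on a syllable boundary.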

【００５４】つぎに、本発明の実施の第三の形態を図１
０ないし図１３に基づいて以下に説明する。なお、この
実施の第三の形態に関し、上述した第二の形態と同一の
部分は、同一の名称および符号を用いて詳細な説明は省
略する。Next, a third embodiment of the present invention will be described below with reference to FIGS. 10 to 13. For this third embodiment, parts identical to those of the second embodiment described above are given the same names and reference numerals, and their detailed description is omitted.

【００５５】まず、図１０に示すように、ここで例示す
る音声認識装置２１と辞書作成装置２２も一体に形成さ
れており、この辞書作成装置２２には、言語分類辞書で
ある認識単語表２３が付加されている。First, as shown in FIG. 10, a voice recognition device 21 and a dictionary creation device 22 illustrated here are integrally formed, and the dictionary creation device 22 has a recognition word table 23 which is a language classification dictionary. Has been added.

【００５６】この認識単語表２３は、ＲＡＭなどの記憶
デバイスを有しており、図１１に示すように、言語であ
る数詞の分類が“商品Ａ，商品Ｂ”として設定され、こ
れらの分類毎に読みの先頭部の長さが“２，３”として
設定されている。ここでは商品Ａがコピーマシンで商品
Ｂがファクシミリなどと想定しており、これに対応する
数詞は商品の型式番号を想定している。The recognition word table 23 has a storage device such as a RAM and, as shown in FIG. 11, the classifications of the numerals, which are the words here, are set as "product A, product B", and the length of the leading part of the reading is set as "2, 3" for these classifications respectively. Here product A is assumed to be a copier and product B a facsimile machine or the like, and the corresponding numerals are assumed to be product model numbers.

【００５７】そして、読み生成部９は、単語辞書８から
取り出した数詞を語表記対応表５に設定する場合に読み
を先頭部に制限する際、生成する読みの先頭部の長さを
前記認識単語表２３の設定内容に対応して分類毎に可変
する。このため、図１２に示すように、前記単語辞書８
に格納された数詞の各々にも“商品Ａ，商品Ｂ”の分類
が設定されており、図１３に示すように、商品Ａの数詞
の読みは二音節からなるが、商品Ｂの数詞の読みは三音
節からなる。When the reading generation unit 9 limits readings to their leading parts while setting the numerals taken out of the word dictionary 8 in the word notation correspondence table 5, it varies the length of the generated leading part for each classification according to the settings of the recognition word table 23. For this purpose, as shown in FIG. 12, the classification "product A" or "product B" is also set for each numeral stored in the word dictionary 8, and as shown in FIG. 13, the readings of product A numerals consist of two syllables whereas the readings of product B numerals consist of three syllables.

【００５８】このような構成において、本実施の形態の
音声認識装置２１も、人間が発声する音声を認識する。
この時、音声認識部４は、この音声の先頭部の一音節ず
つ抽出し、これに読みが一致する数詞を語表記対応表５
から検出する。このため、商品Ａに分類される数詞“ひ
ゃく”は二音節目で認識されるが、商品Ｂに分類される
数詞“にひゃく”は三音節目で認識される。With this configuration, the voice recognition device 21 of the present embodiment also recognizes speech uttered by a human. In doing so, the voice recognition unit 4 extracts the beginning of the speech one syllable at a time and detects from the word notation correspondence table 5 the numerals whose reading matches. Consequently, the numeral "hyaku" (ひゃく) classified as product A is recognized at the second syllable, whereas the numeral "nihyaku" (にひゃく) classified as product B is recognized at the third syllable.

【００５９】音声認識装置２１は、認識する音声の読み
の長さが言語の分類に従って可変されるので、例えば、
特定の言語のみ読みの長さを延長して処理全体の所要時
間は増加させることなく特定の音声の認識精度を向上さ
せるようなことができ、音声認識の精度や速度を言語の
分類に従って調節することができる。In the voice recognition device 21, the length of the reading used for recognition varies according to the classification of the word. It is therefore possible, for example, to lengthen the reading only for specific words, improving the recognition accuracy for that particular speech without increasing the time required for the processing as a whole, so that the accuracy and speed of speech recognition can be adjusted by word classification.

【００６０】そして、辞書作成装置２２は、上述のよう
な音声認識装置２１の語表記対応表５を作成する。その
読み生成部９は、単語辞書８から取り出した数詞を読み
の先頭部毎に語表記対応表５に格納する際、認識単語表
２３を参照して読みの長さを分類毎に可変する。このよ
うに数詞の読みの長さを分類毎に可変するので、上述の
ように音声認識の精度や速度が言語の分類に従って調節
された音声認識装置２１の語表記対応表５を、簡易な処
理で作成することができる。The dictionary creation device 22 creates the word notation correspondence table 5 of the voice recognition device 21 described above. When its reading generation unit 9 stores the numerals taken out of the word dictionary 8 in the word notation correspondence table 5 grouped by the leading part of their readings, it refers to the recognition word table 23 and varies the length of the reading for each classification. Because the reading length of the numerals is varied per classification in this way, the word notation correspondence table 5 of a voice recognition device 21 whose recognition accuracy and speed are adjusted by word classification, as described above, can be created by simple processing.
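The third embodiment can be sketched as a table build with a per-classification head length, followed by a recognizer that extends the head one syllable at a time. All data and names here (`HEAD_LENGTH_BY_CLASS`, `WORD_DICTIONARY`, `build_table`, `recognize`) are hypothetical, and the syllable splitter uses a simplified rule (a small ゃ, ゅ or ょ joins the preceding kana) rather than a full syllable table.

```python
SMALL_KANA = set("ゃゅょ")

# Hypothetical stand-in for the recognition word table 23 (FIG. 11).
HEAD_LENGTH_BY_CLASS = {"product A": 2, "product B": 3}

# Hypothetical stand-in for the word dictionary 8 with classifications (FIG. 12).
WORD_DICTIONARY = [
    ("100", "ひゃく", "product A"),
    ("110", "ひゃくじゅう", "product A"),
    ("200", "にひゃく", "product B"),
    ("210", "にひゃくじゅう", "product B"),
]


def split_syllables(reading):
    """Simplified segmentation: a small kana attaches to the preceding kana."""
    units = []
    for ch in reading:
        if ch in SMALL_KANA and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def build_table(entries, head_lengths):
    """Group numerals under a head whose syllable count depends on the class."""
    table = {}
    for numeral, reading, cls in entries:
        head = "".join(split_syllables(reading)[:head_lengths[cls]])
        table.setdefault(head, []).append(numeral)
    return table


def recognize(reading, table):
    """Extend the head one syllable at a time until a table entry matches."""
    syllables = split_syllables(reading)
    for n in range(1, len(syllables) + 1):
        head = "".join(syllables[:n])
        if head in table:
            return table[head]
    return []


table = build_table(WORD_DICTIONARY, HEAD_LENGTH_BY_CLASS)
print(recognize("ひゃくじゅう", table))    # matched at the second syllable
print(recognize("にひゃくじゅう", table))  # matched at the third syllable
```

Raising the head length for one classification only lengthens the keys for that classification, which is how accuracy can be traded against speed per classification.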

【００６１】つぎに、本発明の実施の第四の形態を図１
４ないし図１８に基づいて以下に説明する。なお、この
実施の第四の形態に関し、前述した第一の形態と同一の
部分は、同一の名称および符号を用いて詳細な説明は省
略する。Next, a fourth embodiment of the present invention will be described below with reference to FIGS. 14 to 18. For this fourth embodiment, parts identical to those of the first embodiment described above are given the same names and reference numerals, and their detailed description is omitted.

【００６２】まず、図１４に示すように、ここで例示す
る音声認識装置３１と辞書作成装置３２も一体に形成さ
れている。この辞書作成装置３２には、認識単語辞書で
ある認識単語表３３と桁対応辞書である桁対応表３４と
が付加されている。First, as shown in FIG. 14, the voice recognition device 31 and the dictionary creation device 32 illustrated here are also integrally formed. A recognition word table 33, which is a recognition word dictionary, and a digit correspondence table 34, which is a digit correspondence dictionary, are added to the dictionary creating device 32.

【００６３】前記認識単語表３３は、ＲＡＭなどの記憶
デバイスを有しており、図１５に示すように、認識する
言語として桁数が複数の数詞が“４００，４１０，…”
などと設定されているが、これらの数詞には読みは設定
されていない。一方、単語辞書８には、数詞が読みと共
に格納されているが、これは記憶内容が簡素化されてお
り、図１６に示すように、一桁の数詞“４，５，…”な
どは格納されているが、複数桁の数詞は格納されていな
い。前記桁対応表３４も、ＲＡＭなどの記憶デバイスを
有しており、図１７に示すように、数詞の各桁の読みが
桁数毎に予め格納されている。The recognition word table 33 has a storage device such as a RAM and, as shown in FIG. 15, multi-digit numerals such as "400, 410, ..." are set in it as the words to be recognized, but no readings are set for these numerals. The word dictionary 8, on the other hand, stores numerals together with their readings, but its contents are simplified: as shown in FIG. 16, single-digit numerals such as "4, 5, ..." are stored, whereas multi-digit numerals are not. The digit correspondence table 34 also has a storage device such as a RAM and, as shown in FIG. 17, stores in advance the reading of each digit position of a numeral for each number of digits.

【００６４】そして、読み生成部９は、前記認識単語表
３３から取り出した複数桁の数詞を言語として読みを生
成する場合、言語の先頭から一定の表音単位の部分を読
みの先頭部として生成するため、ここでは複数桁の数詞
の読みを先頭の一桁の読みから生成する。数詞の先頭部
の一桁の数詞と桁数とを判断し、この数詞の読みを単語
辞書８から検出すると共に、桁数の読みを前記桁対応表
３４から検出し、これらを組み合わせて語表記対応表５
に格納する。このため、音声認識装置１の語表記対応表
５は、先頭部の一桁のみに対応した読みで数詞が格納さ
れており、音声認識部４は、数詞の音声を先頭部の一桁
のみで認識する。When the reading generation unit 9 generates a reading for a multi-digit numeral taken out of the recognition word table 33 as a word, it generates a fixed number of phonetic units from the beginning of the word as the leading part of the reading; here, the reading of a multi-digit numeral is generated from the reading of its first digit. The unit determines the first digit of the numeral and the number of digits, detects the reading of that digit from the word dictionary 8 and the reading for the number of digits from the digit correspondence table 34, and combines them for storage in the word notation correspondence table 5. Consequently, the word notation correspondence table 5 of the voice recognition device 1 stores each numeral under a reading that corresponds only to its first digit, and the voice recognition unit 4 recognizes the speech of a numeral from its first digit alone.

【００６５】このような構成において、本実施の形態の
音声認識装置３１も、人間が発声する音声を語表記対応
表５に格納された数詞として認識する。この語表記対応
表５には、数詞が先頭部の一桁のみに対応した読みで格
納されているので、音声認識部４は、数詞の音声を先頭
部の一桁のみで認識する。In such a configuration, the voice recognition device 31 of the present embodiment also recognizes a voice uttered by a human as a number stored in the word notation correspondence table 5. In this word notation correspondence table 5, since the numbers are stored in the reading corresponding to only the first digit of the head part, the voice recognition unit 4 recognizes the voice of the number only in the first digit of the head part.

【００６６】辞書作成装置３２は、上述のような音声認
識装置３１の語表記対応表５を作成する。その読み生成
部９は、語表記対応表５に格納する複数桁の数詞を認識
単語表３３から取り出し、この複数桁の数詞の先頭部の
一桁の数詞と桁数とを判断し、この数詞の読みを単語辞
書８から検出すると共に桁数の読みを桁対応表３４から
検出して組み合わせる。例えば、複数桁の数詞として
“４００”が取り出されると、先頭部の一桁は数詞が
“４”で桁数が“３”なので、この数詞の読み“よん”
と桁数の読み“ひゃく”とが組み合わされ、先頭部の読
みは“よんひゃく”となる。The dictionary creation device 32 creates the word notation correspondence table 5 of the voice recognition device 31 described above. Its reading generation unit 9 takes from the recognition word table 33 a multi-digit numeral to be stored in the word notation correspondence table 5, determines the first digit of that numeral and its number of digits, detects the reading of the digit from the word dictionary 8 and the reading for the number of digits from the digit correspondence table 34, and combines them. For example, when "400" is taken out as a multi-digit numeral, its first digit is "4" and its number of digits is "3", so the digit reading "yon" (よん) and the digit-count reading "hyaku" (ひゃく) are combined, and the leading part of the reading becomes "yonhyaku" (よんひゃく).
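The composition step of paragraphs [0064] and [0066] can be sketched as two small lookups joined together. The dictionaries `DIGIT_READINGS` and `PLACE_READINGS` below are hypothetical stand-ins for the simplified word dictionary 8 (FIG. 16) and the digit correspondence table 34 (FIG. 17), with only a few illustrative entries.

```python
# Hypothetical single-digit readings, standing in for the word dictionary 8.
DIGIT_READINGS = {"4": "よん", "5": "ご"}

# Hypothetical readings per number of digits, standing in for the digit
# correspondence table 34.
PLACE_READINGS = {2: "じゅう", 3: "ひゃく", 4: "せん"}


def head_reading(numeral: str) -> str:
    """Compose the leading reading from the first digit and the digit count."""
    first_digit = numeral[0]
    n_digits = len(numeral)
    return DIGIT_READINGS[first_digit] + PLACE_READINGS[n_digits]


print(head_reading("400"))
print(head_reading("410"))  # same leading reading as "400"
```

Since only the first digit and the digit count are used, "400" and "410" share one leading reading, so both are stored under the same key in the table.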

【００６７】前述のように商品の型式番号などを認識対
象の言語とする場合、このような数詞は桁数が多数であ
る場合が一般的である。しかし、単語辞書８が一般的な
データベースなどからなる場合、一桁の数詞は格納され
ていても複数桁の数詞は格納されていない可能性が高
い。このような場合でも、上述した辞書作成装置３２は
複数桁の数詞の読みを簡易な処理で生成することがで
き、音声認識装置３１は、複数桁の数詞を的確な読みと
共に語表記対応表５に獲得することができる。As described above, when product model numbers and the like are the words to be recognized, such numerals generally have many digits. However, when the word dictionary 8 consists of a general-purpose database or the like, it is likely to contain single-digit numerals but not multi-digit numerals. Even in such a case, the dictionary creation device 32 described above can generate the readings of multi-digit numerals by simple processing, and the voice recognition device 31 can acquire multi-digit numerals in the word notation correspondence table 5 together with accurate readings.

【００６８】しかも、上述のように辞書作成装置３２
は、複数桁の数詞の先頭の一桁から読みを生成するの
で、簡易な処理で読みの長さを一定にすることができ、
この処理が言語の表音単位で実行されているので、音声
認識装置３１の処理動作に人間の発声の特徴を良好に反
映させることができる。つまり、数字や漢字などの表意
文字は、一文字に複数の音節が設定されるが、その発声
は表意文字の表記単位で区切られることが一般的なの
で、これを読みに反映させれば認識精度を向上させるこ
とができる。Moreover, because the dictionary creation device 32 generates the reading from the first digit of a multi-digit numeral as described above, the length of the reading can be made uniform by simple processing, and because this processing is performed in the phonetic units of the word, the characteristics of human speech can be well reflected in the processing operation of the voice recognition device 31. That is, ideographic characters such as digits and kanji have multiple syllables assigned to a single character, but they are generally uttered with breaks at the notation units of the ideographic characters, so reflecting this in the readings can improve recognition accuracy.

【００６９】つぎに、本発明の実施の第五の形態を図１
９ないし図２４に基づいて以下に説明する。なお、この
実施の第五の形態に関し、上述した第四の形態と同一の
部分は、同一の名称および符号を用いて詳細な説明は省
略する。Next, a fifth embodiment of the present invention will be described below with reference to FIGS. 19 to 24. For this fifth embodiment, parts identical to those of the fourth embodiment described above are given the same names and reference numerals, and their detailed description is omitted.

【００７０】まず、図１９に示すように、ここで例示す
る音声認識装置４１と辞書作成装置４２も一体に形成さ
れており、この辞書作成装置４２には、読み変化辞書で
ある読み変化表４３が付加されている。First, as shown in FIG. 19, the voice recognition device 41 and the dictionary creating device 42 illustrated here are also integrally formed, and the dictionary creating device 42 has a reading change table 43 which is a reading change dictionary. Has been added.

【００７１】この読み変化表４３は、ＲＡＭなどの記憶
デバイスを有しており、図２３に示すように、組み合わ
される数詞により変化する桁の読みが予め格納されてい
る。そして、読み生成部９は、認識単語表３３から取り
出した複数桁の数詞を言語として読みを生成する場合、
その数詞の先頭部の一桁の数詞の読みを単語辞書８から
検出すると共に桁数の読みを前記桁対応表３４から検出
して組み合わせるが、この場合に前記読み変化表４３を
参照して対応する数詞の読みを修正する。The reading change table 43 has a storage device such as a RAM and, as shown in FIG. 23, stores in advance the digit-position readings that change depending on the numeral they are combined with. When the reading generation unit 9 generates a reading for a multi-digit numeral taken out of the recognition word table 33 as a word, it detects the reading of the numeral's first digit from the word dictionary 8 and the reading for the number of digits from the digit correspondence table 34 and combines them, and in doing so it refers to the reading change table 43 and corrects the reading of any numeral to which the table applies.

【００７２】このような構成において、辞書作成装置４
２は音声認識装置４１の語表記対応表５を作成する。そ
の読み生成部９は、複数桁の数詞の読みを先頭部の一桁
の数詞と桁数との読みの組み合わせで生成する場合に、
読み変化表４３を参照して対応する数詞の読みは修正す
る。例えば、複数桁の数詞として“３００”が取り出さ
れると、先頭部の一桁は数詞が“４”で桁数が“３”な
ので、この数詞の読み“さん”と桁数の読み“ひゃく”
とが単純に組み合わされると“さんひゃく”となるが、
これは自然な読みである“さんびゃく”に修正される。In this configuration, the dictionary creation device 42 creates the word notation correspondence table 5 of the voice recognition device 41. When its reading generation unit 9 generates the reading of a multi-digit numeral by combining the reading of the first digit with the reading for the number of digits, it refers to the reading change table 43 and corrects the reading of any numeral to which the table applies. For example, when "300" is taken out as a multi-digit numeral, its first digit is "3" and its number of digits is "3", so simply combining the digit reading "san" (さん) with the digit-count reading "hyaku" (ひゃく) would give "sanhyaku" (さんひゃく), but this is corrected to the natural reading "sanbyaku" (さんびゃく).
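The correction step of paragraphs [0071] and [0072] can be sketched as an override lookup applied before the two readings are joined. The table `READING_CHANGES` is a hypothetical stand-in for the reading change table 43, keyed here by first digit and digit count; the patent does not specify the table's layout, so this keying is an assumption.

```python
# Hypothetical stand-ins for the word dictionary 8 and digit correspondence table 34.
DIGIT_READINGS = {"3": "さん", "4": "よん"}
PLACE_READINGS = {3: "ひゃく", 4: "せん"}

# Hypothetical stand-in for the reading change table 43:
# (first digit, number of digits) -> place reading to use instead of the default.
READING_CHANGES = {
    ("3", 3): "びゃく",  # 300: さんひゃく -> さんびゃく
    ("3", 4): "ぜん",    # 3000: さんせん -> さんぜん
}


def head_reading(numeral: str) -> str:
    """Compose the leading reading, applying a sound change where one is listed."""
    first_digit = numeral[0]
    n_digits = len(numeral)
    place = READING_CHANGES.get((first_digit, n_digits), PLACE_READINGS[n_digits])
    return DIGIT_READINGS[first_digit] + place


print(head_reading("300"))
print(head_reading("400"))  # no entry in the change table, so left uncorrected
```

Numerals without an entry in the change table fall through to the plain combination, so the correction affects only the readings that would otherwise sound unnatural.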

【００７３】上述した辞書作成装置４２は、複数桁の数
詞の読みを簡易な処理で生成することができ、単純な組
み合わせでは不自然な形態となる読みは自然な形態に修
正することができ、音声認識装置４１は、複数桁の数詞
を的確な読みと共に語表記対応表５に獲得することがで
きる。The dictionary creating device 42 described above can generate a reading of a plural-digit number by a simple process, and can correct a reading that is unnatural with a simple combination into a natural form. The voice recognition device 41 can acquire a plural-digit number in the word notation correspondence table 5 together with an accurate reading.

【００７４】つぎに、本発明の実施の第六の形態を図２
５ないし図２７に基づいて以下に説明する。なお、この
実施の第六の形態に関し、前述した第三の形態と同一の
部分は、同一の名称および符号を用いて詳細な説明は省
略する。Next, a sixth embodiment of the present invention will be described below with reference to FIGS. 25 to 27. For this sixth embodiment, parts identical to those of the third embodiment described above are given the same names and reference numerals, and their detailed description is omitted.

【００７５】図２５に示すように、言語分類辞書となる
認識単語表２３には、言語である数詞の分類が“商品
Ａ，商品Ｂ”として設定されているが、ここでは読みの
先頭部の長さは設定されていない。そして、読み生成部
９は、前述のように単語辞書８から取り出した数詞を語
表記対応表５に設定する場合に、その読みを音節数など
により先頭部に制限する際、生成する読みに対応する言
語を前記認識単語表２３の設定に従って分類し、この分
類された言語の個数が、予め設定された“５”などの基
準値を超過しなければ、その読みは先頭部に制限しな
い。このため、図２６に示すように、前記単語辞書８に
格納された数詞の各々にも“商品Ａ，商品Ｂ”の分類が
設定されており、図２７に示すように、語表記対応表５
に設定された商品Ａの数詞の読みは先頭部の二音節に制
限されているが、商品Ｂの数詞の読みは制限されていな
い。As shown in FIG. 25, the recognition word table 23 serving as the language classification dictionary has the classifications of the numerals set as "product A, product B", but here the length of the leading part of the reading is not set. When the reading generation unit 9 sets the numerals taken out of the word dictionary 8 in the word notation correspondence table 5 as described above and would limit their readings to a leading part by syllable count or the like, it classifies the words corresponding to the generated readings according to the settings of the recognition word table 23, and if the number of words in a classification does not exceed a preset reference value such as "5", it does not limit those readings to the leading part. For this purpose, as shown in FIG. 26, the classification "product A" or "product B" is also set for each numeral stored in the word dictionary 8, and as shown in FIG. 27, the readings of product A numerals set in the word notation correspondence table 5 are limited to the leading two syllables whereas the readings of product B numerals are not limited.

【００７６】このような構成において、音声認識装置の
語表記対応表５を辞書作成装置が作成するため、その読
み生成部９は、単語辞書８から取り出した数詞を読みの
先頭部毎に語表記対応表５に格納する。この時、生成す
る読みに対応する言語を認識単語表２３の設定に従って
分類し、この分類された言語の個数が予め設定された基
準値を超過しなければ、その読みは先頭部に制限しな
い。In this configuration, the dictionary creation device creates the word notation correspondence table 5 of the voice recognition device, and to do so its reading generation unit 9 stores the numerals taken out of the word dictionary 8 in the word notation correspondence table 5 grouped by the leading part of their readings. At this time, it classifies the words corresponding to the generated readings according to the settings of the recognition word table 23, and if the number of words in a classification does not exceed the preset reference value, it does not limit those readings to the leading part.

【００７７】例えば、数詞として“１００”が取り出さ
れると、この数詞の分類が認識単語表２３から“商品
Ａ”として検出され、この分類の数詞は七個であること
も検出される。これは基準値である五個より多数なの
で、“商品Ａ”の分類の数詞は読みが先頭部の二音節に
制限されることになり、ここでは全部が“ひゃく”とし
て設定される。一方、数詞として“２００”が取り出さ
れて分類が“商品Ｂ”として検出されると、この分類の
個数である二個は基準値である五個より少数なので、
“商品Ｂ”の分類の数詞は読みが先頭部に制限されな
い。For example, when "100" is taken out as a numeral, its classification is detected as "product A" from the recognition word table 23, and it is also detected that this classification contains seven numerals. Since this is more than the reference value of five, the readings of the numerals classified as "product A" are limited to the leading two syllables, and here all of them are set as "hyaku" (ひゃく). On the other hand, when "200" is taken out as a numeral and its classification is detected as "product B", the two numerals in this classification are fewer than the reference value of five, so the readings of the numerals classified as "product B" are not limited to the leading part.
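The rule of paragraphs [0075] to [0077] can be sketched as a count per classification followed by conditional truncation. The sample dictionary, the function names, and the simplified syllable splitter (a small ゃ, ゅ or ょ joins the preceding kana) are hypothetical; the reference value of five and the two-syllable head follow the example in the text.

```python
from collections import Counter

SMALL_KANA = set("ゃゅょ")

# Hypothetical stand-in for the classified word dictionary 8 (FIG. 26):
# seven numerals in product A, two in product B.
WORD_DICTIONARY = [
    ("100", "ひゃく", "product A"),
    ("110", "ひゃくじゅう", "product A"),
    ("115", "ひゃくじゅうご", "product A"),
    ("120", "ひゃくにじゅう", "product A"),
    ("130", "ひゃくさんじゅう", "product A"),
    ("140", "ひゃくよんじゅう", "product A"),
    ("150", "ひゃくごじゅう", "product A"),
    ("200", "にひゃく", "product B"),
    ("210", "にひゃくじゅう", "product B"),
]


def split_syllables(reading):
    """Simplified segmentation: a small kana attaches to the preceding kana."""
    units = []
    for ch in reading:
        if ch in SMALL_KANA and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def build_table(entries, reference_value=5, head_syllables=2):
    """Truncate readings only for classes with more members than the reference value."""
    class_sizes = Counter(cls for _, _, cls in entries)
    table = {}
    for numeral, reading, cls in entries:
        if class_sizes[cls] > reference_value:
            key = "".join(split_syllables(reading)[:head_syllables])
        else:
            key = reading  # small class: keep the full reading
        table.setdefault(key, []).append(numeral)
    return table


for key, numerals in build_table(WORD_DICTIONARY).items():
    print(key, numerals)
```

With this data the seven product A numerals collapse under "ひゃく", while the two product B numerals keep their full readings as separate keys.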

【００７８】上述のように数詞が設定された音声認識装
置は、同一の分類が多数の言語は読みが先頭部に制限さ
れているので、出現頻度が高い言語は読みの先頭部で認
識されることになり、処理時間を短縮することができ
る。一方、同一の分類が少数の数詞は読みが先頭部に制
限されないので、出現頻度が低い言語は読みの全体で認
識されることになり、認識精度を向上させることができ
る。そして、本実施の形態の辞書作成装置は、上述のよ
うな言語を簡易な処理で語表記対応表５に設定すること
ができるので、高性能な音声認識装置を作成することが
できる。In a voice recognition device whose numerals are set as described above, words in a classification with many members have their readings limited to the leading part, so frequently occurring words are recognized from the leading part of the reading and the processing time can be shortened. On the other hand, numerals in a classification with few members do not have their readings limited, so infrequently occurring words are recognized from the entire reading and the recognition accuracy can be improved. The dictionary creation device of the present embodiment can set such words in the word notation correspondence table 5 by simple processing, so a high-performance voice recognition device can be created.

【００７９】つぎに、本発明の実施の第七の形態を図２
８ないし図３０に基づいて以下に説明する。なお、この
実施の第七の形態に関し、上述した第六の形態と同一の
部分は、同一の名称および符号を用いて詳細な説明は省
略する。Next, a seventh embodiment of the present invention will be described below with reference to FIGS. 28 to 30. For this seventh embodiment, parts identical to those of the sixth embodiment described above are given the same names and reference numerals, and their detailed description is omitted.

[0080] As shown in FIG. 28, the recognition word table 23, which serves as the word classification dictionary, has the categories of the numerals set as "product A, product B". As shown in FIG. 29, each numeral stored in the word dictionary 8 is likewise assigned the category "product A" or "product B". When the reading generation unit 9 limits to the head part the readings of the numerals that it takes from the word dictionary 8 and sets in the word notation correspondence table 5, it classifies the words corresponding to the generated readings according to the settings of the recognition word table 23. It then varies the length of the head part of the generated readings so that the number of readings in each category does not exceed a preset reference value such as "2".

[0081] More specifically, all the words of one category are first taken out of the word dictionary 8 according to the recognition word table 23, and their readings are extended one character at a time from the beginning. When the number of distinct readings exceeds the reference value, the readings at the immediately preceding length are adopted. As a result, as shown in FIG. 30, in the word notation correspondence table 5 the readings of the numerals of product A are limited to the leading two syllables, whereas the readings of the numerals of product B are not limited.

[0082] In this configuration, the dictionary creation device creates the word notation correspondence table 5 of the speech recognition device. To do so, its reading generation unit 9 stores the numerals taken from the word dictionary 8 in the word notation correspondence table 5, grouped by the head part of their readings. At this time, it classifies the words corresponding to the generated readings according to the settings of the recognition word table 23 and varies the length of the readings so that the number of readings of the classified words does not exceed the reference value.

[0083] For example, when "100, 110, 120, 125, 127, 130, 170" are taken out as the numerals of the "product A" category, a length of one character yields only one reading, "hi" (ひ), which is fewer than the reference value of two. As the length is increased one character at a time, the count remains one, below the reference value, up to "hyaku" (ひゃく). When the length is increased to four characters, however, there are five readings, "hyaku" (ひゃく), "hyakuji" (ひゃくじ), "hyakuni" (ひゃくに), "hyakusa" (ひゃくさ), and "hyakuna" (ひゃくな), which exceeds the reference value of two. The state immediately before the count exceeds the reference value is therefore adopted, and the numerals of the "product A" category share the single reading "hyaku" (ひゃく).

[0084] By contrast, when "200, 220" are taken out as the numerals of the "product B" category, there are two of them, equal to the reference value, so the number of readings never exceeds the reference value however many characters are added. These readings are therefore not limited to the head part, and "nihyaku" (にひゃく) and "nihyakunijuu" (にひゃくにじゅう) are set for the respective numerals.
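The prefix-growing procedure of the seventh embodiment can be sketched as follows. This is a minimal sketch, assuming readings are plain hiragana strings and that "one character" means one kana; the function name `limit_readings` is hypothetical. The source does not say what happens when even a one-character prefix exceeds the reference value, so the sketch simply falls back to one character in that case.

```python
def limit_readings(readings, reference_value=2):
    """Grow the prefix length one kana at a time and adopt the last length
    whose number of distinct prefixes does not exceed reference_value."""
    longest = max(len(r) for r in readings)
    adopted = 1  # assumed fallback if a single character already exceeds the limit
    for n in range(1, longest + 1):
        if len({r[:n] for r in readings}) > reference_value:
            break  # adopt the state immediately before the limit is exceeded
        adopted = n
    return sorted({r[:adopted] for r in readings}), adopted

# Readings of the "product A" numerals 100, 110, 120, 125, 127, 130, 170.
product_a = ["ひゃく", "ひゃくじゅう", "ひゃくにじゅう", "ひゃくにじゅうご",
             "ひゃくにじゅうなな", "ひゃくさんじゅう", "ひゃくななじゅう"]
# Readings of the "product B" numerals 200, 220.
product_b = ["にひゃく", "にひゃくにじゅう"]

a_readings, a_length = limit_readings(product_a)
b_readings, b_length = limit_readings(product_b)
```

For "product A" the loop stops at four characters, where five distinct prefixes appear, and adopts the three-character reading ひゃく. For "product B" the count never exceeds two, so both full readings survive.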

[0085] In a speech recognition device whose numerals are set as described above, the number of readings per category is limited, so the number of readings stays constant even when a category contains many words, and processing time can be shortened. Conversely, when a category contains few words, their readings are not limited to the head part, so infrequently occurring words are recognized from the whole reading, and recognition accuracy can be improved. Because the dictionary creation device of this embodiment can set such words in the word notation correspondence table 5 by simple processing, a high-performance speech recognition device can be created.

[0086] Next, an eighth embodiment of the present invention will be described with reference to FIGS. 31 to 33. In this eighth embodiment, parts identical to those of the third embodiment described earlier are given the same names and reference numerals, and their detailed description is omitted.

[0087] As shown in FIG. 31, the recognition word table 23, which serves as the condition setting dictionary, holds conditions under which multiple words occur consecutively. In each condition, a product model name ("A type, B type") and a category of product model numbers ("product A, product B") are joined by "+", which indicates that the words are consecutive, giving entries such as "A type + product A, …". As shown in FIG. 32, the word dictionary 8 holds a category ("product A, product B") and a reading for each numeral, and also holds a reading for each of the model-name words "A type, B type".

[0088] When the reading generation unit 9 limits to the head part the readings of the numerals that it takes from the word dictionary 8 and sets in the word notation correspondence table 5, it refers to the recognition word table 23 to generate the reading of multiple consecutive words and limits only the reading of the last word to its head part. More specifically, because the condition "A type + product A, …" is set in the recognition word table 23, a reading of consecutive words that matches it is, for example, "eetaipuhyaku" (えーたいぷひゃく), and only the reading of the trailing numeral "hyaku" (ひゃく) is limited to its head part, by number of syllables or the like. Consequently, as shown in FIG. 33, the word notation correspondence table 5 holds consecutive words such as "A type 100", but in their readings the trailing numeral is limited to two syllables.

[0089] In this configuration, the dictionary creation device creates the word notation correspondence table 5 of the speech recognition device. To do so, its reading generation unit 9 stores the numerals taken from the word dictionary 8 in the word notation correspondence table 5, grouped by the head part of their readings. At this time, the reading generation unit 9 refers to the recognition word table 23 to determine the conditions under which multiple words occur consecutively and generates the readings of the consecutive words according to those conditions.

[0090] For example, three readings of consecutive words are generated that match the condition "A type + product A, …": "eetaipuhyaku" (えーたいぷひゃく), "eetaipuhyakujuu" (えーたいぷひゃくじゅう), and "eetaipuhyakunijuu" (えーたいぷひゃくにじゅう). However, the readings of the trailing numerals "hyaku, hyakujuu, hyakunijuu" are limited to the leading two syllables, so all of them become "hyaku" (ひゃく), and the consecutive words beginning with "A type" share the single reading "eetaipuhyaku" (えーたいぷひゃく).
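The eighth embodiment's combination step can be sketched as below. This is an illustrative sketch only: the names `head_syllables`, `combined_readings`, `CONDITIONS`, `NAMES`, and `NUMERALS` are hypothetical, and the syllable segmentation (one kana plus any following small kana) is an assumed simplification.

```python
SMALL_KANA = set("ゃゅょぁぃぅぇぉ")  # small kana attach to the preceding kana

def head_syllables(reading, n):
    """Return the first n syllable-like units of a hiragana reading."""
    units = []
    for ch in reading:
        if ch in SMALL_KANA and units:
            units[-1] += ch
        else:
            units.append(ch)
    return "".join(units[:n])

def combined_readings(conditions, names, numerals, n_syllables=2):
    """Join each model name with each numeral of the paired category,
    shortening only the trailing numeral's reading."""
    table = {}
    for name, category in conditions:
        for notation, (cat, reading) in numerals.items():
            if cat != category:
                continue
            key = names[name] + head_syllables(reading, n_syllables)
            table.setdefault(key, []).append(f"{name} {notation}")
    return table

CONDITIONS = [("A type", "product A")]          # "A type + product A"
NAMES = {"A type": "えーたいぷ"}                  # model name readings
NUMERALS = {"100": ("product A", "ひゃく"),
            "110": ("product A", "ひゃくじゅう"),
            "120": ("product A", "ひゃくにじゅう")}
table = combined_readings(CONDITIONS, NAMES, NUMERALS)
```

All three combinations collapse onto the one stored reading えーたいぷひゃく, with the model name left unshortened.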

[0091] Consecutive words such as "A type 100" and "B type 200" described above are assumed to have the form "product model + model number", and such words are often uttered continuously, in a single breath, in specific combinations. The speech recognition device of this embodiment therefore improves recognition accuracy by combining such words in advance and processing them as a single word, and improves processing speed by limiting only the reading of the last word to its head part. Because the dictionary creation device of this embodiment can set such words in the word notation correspondence table 5 by simple processing, a high-performance speech recognition device can be created.

[0092] Next, a ninth embodiment of the present invention will be described with reference to FIGS. 34 to 36. In this ninth embodiment, parts identical to those of the eighth embodiment described above are given the same names and reference numerals, and their detailed description is omitted.

[0093] As shown in FIG. 34, the recognition word table 23, which serves as the condition setting dictionary, holds conditions under which multiple words occur consecutively. In each condition, a category of product model names ("product name A, product name B") and a category of product model numbers ("product A, product B") are joined by "+", which indicates that the words are consecutive. As shown in FIG. 35, the word dictionary 8 holds a category ("product name A, product name B") and a reading for each of the model-name words "copy, facsimile", and a category ("product A, product B") and a reading for each of the model-number numerals "100, …".

[0094] When the reading generation unit 9 limits to the head part the readings of the numerals that it takes from the word dictionary 8 and sets in the word notation correspondence table 5, it refers to the recognition word table 23 to generate the reading of multiple consecutive words. It then limits the length of the head part of the last word's reading so that the whole reading does not exceed a preset reference value, such as eight characters. More specifically, because the condition "product name A + product A, …" is set in the recognition word table 23, a reading of consecutive words that matches it is, for example, "kopiihyaku" (こぴーひゃく), and the reading of the trailing numeral "hyaku" (ひゃく) is limited to its head part so that the whole reading does not exceed eight characters. Consequently, as shown in FIG. 36, in the word notation correspondence table 5 the reading of "copy 100" remains "kopiihyaku" (こぴーひゃく), whereas the reading of "facsimile 100" is limited to the eight characters "fakushimirihya" (ふぁくしみりひゃ).

[0095] In this configuration, the dictionary creation device creates the word notation correspondence table 5 of the speech recognition device. To do so, its reading generation unit 9 stores the numerals taken from the word dictionary 8 in the word notation correspondence table 5, grouped by the head part of their readings. At this time, the reading generation unit 9 refers to the recognition word table 23 to determine the conditions under which multiple words occur consecutively and generates the readings of the consecutive words according to those conditions.

[0096] For example, three readings of consecutive words are generated that match the condition "product name A + product A": "kopiihyaku" (こぴーひゃく), "kopiisanbyaku" (こぴーさんびゃく), and "kopiisanbyakunijuu" (こぴーさんびゃくにじゅう). However, the reading of the trailing numeral is limited to its head part so that the whole does not exceed eight characters, so the resulting readings of the consecutive words are the two readings "kopiihyaku" (こぴーひゃく) and "kopiisanbyaku" (こぴーさんびゃく).
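The eight-character cap of the ninth embodiment can be sketched as follows. This is a minimal sketch, assuming each kana counts as one character; the function name `capped_reading` and its parameters are hypothetical. The source does not cover a model name that is itself longer than the cap, so the sketch leaves the name intact and drops the numeral in that case.

```python
def capped_reading(name_reading, numeral_reading, max_chars=8):
    """Keep the model name whole and trim only the trailing numeral so that
    the combined reading does not exceed max_chars kana."""
    room = max(max_chars - len(name_reading), 0)  # assumed: never trim the name
    return name_reading + numeral_reading[:room]

copy_readings = sorted({
    capped_reading("こぴー", numeral)
    for numeral in ["ひゃく", "さんびゃく", "さんびゃくにじゅう"]
})
fax_100 = capped_reading("ふぁくしみり", "ひゃく")
```

The three "copy" readings reduce to two, because こぴーさんびゃくにじゅう is trimmed to the same eight characters as こぴーさんびゃく, and "facsimile 100" becomes ふぁくしみりひゃ.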

[0097] The speech recognition device of this embodiment improves recognition accuracy by combining in advance multiple words that are uttered continuously, and improves processing speed by limiting only the reading of the last word to its head part. Because the reading of the combined words does not exceed the reference value, the overall length stays constant even when the leading word is long, and processing speed improves stably. Because the dictionary creation device of this embodiment can set such words in the word notation correspondence table 5 by simple processing, a high-performance speech recognition device can be created.

[0098] Next, a tenth embodiment of the present invention will be described with reference to FIGS. 37 to 39. In this tenth embodiment, parts identical to those of the ninth embodiment described above are given the same names and reference numerals, and their detailed description is omitted.

[0099] As shown in FIG. 37, the recognition word table 23, which serves as the condition setting dictionary, holds conditions under which multiple words occur consecutively, such as "product name A + product A, …", and additionally holds, for the trailing category of each condition, the length of the head part of the reading, set as "2, …" and so on. As shown in FIG. 38, the word dictionary 8 holds a category ("product name A, …") and a reading for each of the model-name words "copy, …", and a category ("product A, …") and a reading for each of the model-number numerals "100, …".

[0100] When the reading generation unit 9 limits to the head part the readings of the numerals that it takes from the word dictionary 8 and sets in the word notation correspondence table 5, it refers to the recognition word table 23 to generate the reading of multiple consecutive words and limits the head part of the last word's reading to the length set in the recognition word table 23. More specifically, because the condition "product name A + product A" is set in the recognition word table 23, a reading of consecutive words that matches it is, for example, "kopiinihyaku" (こぴーにひゃく). The length of the trailing numeral's reading is set to three characters, so this reading is limited to "kopiinihya" (こぴーにひゃ). Under the condition "product name B + product B", on the other hand, the length of the trailing numeral's reading is set to one character, so the reading "fakushimirinihyaku" (ふぁくしみりにひゃく) is limited to "fakushimirini" (ふぁくしみりに).

[0101] Consequently, as shown in FIG. 39, in the word notation correspondence table 5 the reading of "copy 100" remains "kopiihyaku" (こぴーひゃく), whereas the reading of "copy 200" is limited to "kopiinihya" (こぴーにひゃ) and the reading of "facsimile 200" is limited to "fakushimirini" (ふぁくしみりに).

[0102] In this configuration, the dictionary creation device creates the word notation correspondence table 5 of the speech recognition device. To do so, its reading generation unit 9 stores the numerals taken from the word dictionary 8 in the word notation correspondence table 5, grouped by the head part of their readings. At this time, the reading generation unit 9 refers to the recognition word table 23 to determine the conditions under which multiple words occur consecutively and generates the readings of the consecutive words according to those conditions.

[0103] For example, three readings of consecutive words are generated that match the condition "product name A + product A": "kopiihyaku" (こぴーひゃく), "kopiinihyaku" (こぴーにひゃく), and "kopiinihyakunijuu" (こぴーにひゃくにじゅう). However, the trailing numeral is limited to its head part so as not to exceed three characters, so the resulting readings are the two readings "kopiihyaku" (こぴーひゃく) and "kopiinihya" (こぴーにひゃ). Similarly, five readings of consecutive words are generated that match the condition "product name B + product B", such as "fakushimirinihyakujuu" (ふぁくしみりにひゃくじゅう), "fakushimirinihyakunijuu" (ふぁくしみりにひゃくにじゅう), and "fakushimirigohyakugojuu" (ふぁくしみりごひゃくごじゅう). Because the trailing numeral is limited to its leading one character, the resulting readings are the two readings "fakushimirini" (ふぁくしみりに) and "fakushimirigo" (ふぁくしみりご).
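The per-condition tail length of the tenth embodiment can be sketched as below. This is an illustration under assumptions: the names `TAIL_LENGTH` and `tail_limited` are hypothetical, each kana counts as one character, and the lengths three and one are those stated in the worked example rather than the "2, …" shown for FIG. 37.

```python
# Head-part length of the trailing numeral, set per condition (values from the example).
TAIL_LENGTH = {
    ("product name A", "product A"): 3,
    ("product name B", "product B"): 1,
}

def tail_limited(name_reading, numeral_reading, condition):
    """Keep the model name whole and cut the trailing numeral's reading
    to the length registered for the condition."""
    return name_reading + numeral_reading[:TAIL_LENGTH[condition]]

copy_readings = sorted({
    tail_limited("こぴー", numeral, ("product name A", "product A"))
    for numeral in ["ひゃく", "にひゃく", "にひゃくにじゅう"]
})
fax_readings = sorted({
    tail_limited("ふぁくしみり", numeral, ("product name B", "product B"))
    for numeral in ["にひゃくじゅう", "にひゃくにじゅう", "ごひゃくごじゅう"]
})
```

Each group collapses to two stored readings: こぴーひゃく and こぴーにひゃ for "copy", and ふぁくしみりに and ふぁくしみりご for "facsimile".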

[0104] The speech recognition device of this embodiment improves recognition accuracy by combining in advance multiple words that are uttered continuously, and improves processing speed by limiting only the reading of the last word to its head part. Because the length of the last word's reading varies with its category, the last word can, for example, be made shorter the longer the leading word is, and processing speed improves stably. Because the dictionary creation device of this embodiment can set such words in the word notation correspondence table 5 by simple processing, a high-performance speech recognition device can be created.

[0105]

[Effects of the Invention] In the speech recognition device of claim 1, recognition-candidate words are stored in advance in the word notation correspondence dictionary, grouped by the reading of their head part. When speech to be recognized is input to the speech input means, the speech recognition means detects from the word notation correspondence dictionary the words whose head-part reading matches the head part of the speech. Because the readings subject to recognition processing are limited to the head part, their number is reduced and recognition can be executed at high speed. Human utterances tend to be clear only at the head part, and only this head part is processed, so the rate of misrecognition can be lowered. Even when multiple recognition results are returned, the correct answer is included with high probability, so the device is more practical than one that outputs a single wrong result after lengthy processing.
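The lookup described for claim 1 can be pictured with a small sketch. It is not the patent's recognizer: real input is audio, whereas here a kana transcription stands in for the recognized head part, and the names `HEAD_TABLE`, `recognize`, and `head_len` are hypothetical. The "100" and "150" entries follow the example in the abstract; the other entries are sample data.

```python
# Word notation correspondence dictionary keyed by head-part reading (illustrative data).
HEAD_TABLE = {
    "ひゃく": ["100", "150"],
    "にひゃ": ["200", "250"],
}

def recognize(utterance_kana, head_len=3):
    """Return every candidate whose stored head reading matches the head of the utterance."""
    return HEAD_TABLE.get(utterance_kana[:head_len], [])

candidates = recognize("ひゃくごじゅう")  # spoken "150"
```

The lookup returns both "100" and "150", so the correct answer is among the candidates even though only the head part was compared.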

[0106] In the dictionary creation device of claim 2, various words are stored in advance together with their readings in the general word dictionary, and the reading generation means stores the words taken from the general word dictionary in the word notation correspondence dictionary, grouped by the head part of their readings. Multiple words that share a beginning but differ at the end, and are therefore prone to misrecognition, can thus be consolidated under a single reading, and a small number of readings can be assigned to a large number of words to reduce the number of items subject to recognition processing. This contributes to improving both the recognition accuracy and the processing speed of the speech recognition device. Because such words can be created mechanically from a general word dictionary consisting of an existing database or the like, no one has to perform this work by hand.

[0107] In the dictionary creation device of claim 3, a phonetic unit dictionary is provided in which the phonetic units of word readings are stored in advance, and the reading generation means generates, as the head part of a reading, the portion spanning a fixed number of phonetic units from the beginning of the word. The length of the head part can thus be made uniform by simple processing. Because this processing is performed in phonetic units of the word, the characteristics of human utterance can be well reflected in the processing of the speech recognition device.

[0108] In the dictionary creation device of claim 4, the reading generation means generates, as the head part of a reading, the portion spanning a fixed number of notation units from the beginning of the word. The length of the head part can thus be made uniform by simple processing. Because this processing is performed in notation units of the word, the characteristics of human utterance can be well reflected in the processing of the speech recognition device.

[0109] In the dictionary creation device of claim 5, a length setting dictionary is provided in which the length of the head part of the reading is set in advance for each word category, and the reading generation means varies the length of the head part of the generated readings by category. This makes it possible, for example, to lengthen the readings of specific words only, improving the recognition accuracy for specific speech without increasing the overall processing time of the speech recognition device. The accuracy and speed of speech recognition can thus be adjusted according to word category.

[0110] In the dictionary creation device of claim 6, a digit correspondence dictionary is provided in which the reading of each digit position of a numeral is stored in advance for each number of digits. When generating the head-part reading of a multi-digit numeral as a word, the reading generation means detects the reading of the numeral in the predetermined leading digits from the general word dictionary, detects the reading of those leading digit positions from the digit correspondence dictionary, and combines the two. When the general word dictionary consists of an ordinary database or the like, single-digit numerals are likely to be stored while multi-digit numerals are not. Even in that case, the readings of multi-digit numerals can be generated by simple processing, so multi-digit numbers can be set in the speech recognition device.

[0111] In the dictionary creation device of claim 7, a reading change dictionary is provided in which the readings of digit positions that change depending on the numeral they are combined with are stored in advance. When generating the head-part reading of a multi-digit numeral as a word, the reading generation means refers to the reading change dictionary and corrects the reading of the corresponding numeral. Readings that would be unnatural under simple combination can thus be corrected to their natural form, so multi-digit numbers can be set in the speech recognition device together with appropriate readings.
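The combination and correction described for claims 6 and 7 can be sketched as follows. The dictionary names and layout (`DIGIT_READING`, `PLACE_READING`, `READING_CHANGE`) and the function `head_reading` are hypothetical and are not taken from the patent's figures; the readings themselves are standard Japanese (for example, 300 is read さんびゃく rather than さんひゃく).

```python
# Single-digit readings, as a general dictionary would store them (subset).
DIGIT_READING = {"1": "いち", "2": "に", "3": "さん", "5": "ご", "6": "ろく", "8": "はち"}

# Reading of the leading digit position, keyed by the number of digits.
PLACE_READING = {2: "じゅう", 3: "ひゃく"}

# Combinations whose reading changes when digit and position are joined.
READING_CHANGE = {
    ("1", 2): "じゅう",      # 10 is not read いちじゅう
    ("1", 3): "ひゃく",      # 100 is not read いちひゃく
    ("3", 3): "さんびゃく",
    ("6", 3): "ろっぴゃく",
    ("8", 3): "はっぴゃく",
}

def head_reading(numeral):
    """Build the reading of the leading digit plus its position, applying
    the reading change dictionary before falling back to plain concatenation."""
    lead, places = numeral[0], len(numeral)
    changed = READING_CHANGE.get((lead, places))
    if changed is not None:
        return changed
    return DIGIT_READING[lead] + PLACE_READING[places]
```

For instance, `head_reading("200")` is a plain concatenation, while `head_reading("300")` and `head_reading("100")` come from the reading change dictionary.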

[0112] In the dictionary creation device of claim 8, a word classification dictionary is provided in which word categories are set in advance. The reading generation means classifies the words corresponding to the generated readings according to the settings of the word classification dictionary and does not limit the readings to the head part unless the number of words in the category exceeds a preset reference value. This makes it possible, for example, to limit to the head part the readings of words that occur frequently in speech recognition while leaving the readings of infrequent words unlimited, shortening the processing time of the speech recognition device while also improving its recognition accuracy.

[0113] In the dictionary creation device of claim 9, a word classification dictionary is provided in which word categories are set in advance. The reading generation means classifies the words corresponding to the generated readings according to the settings of the word classification dictionary and varies the length of the head part of the generated readings so that the number of readings in each category does not exceed a preset reference value. The number of readings therefore stays constant even when a category contains many words, so the processing time of the speech recognition device can be shortened. When a category contains few words, their readings are not limited to the head part, so the recognition accuracy of the speech recognition device can be improved.

[0114] In the dictionary creation device of claim 10, a condition setting dictionary is provided in which the conditions under which multiple words occur consecutively are set in advance. The reading generation means refers to the condition setting dictionary to generate the readings of multiple consecutive words and limits only the reading of the last word to its head part. Combining in advance the words that are expected to occur consecutively improves the recognition accuracy of the speech recognition device, and limiting only the last word's reading to its head part improves its processing speed.

[0115] In the dictionary creation device of claim 11, the reading generation means limits the length of the head part of the last word's reading so that the whole reading of the multiple consecutive words does not exceed a preset reference value. The overall length therefore stays constant even when the leading word of the combination is long, so the processing speed of the speech recognition device can be improved stably.

[0116] In the dictionary creation device according to claim 12, the condition setting dictionary holds, together with the conditions for consecutive languages, a preset reading length for the language located at the end, and the reading generation means limits the reading of that language to the set length. Because the reading length of the last language varies according to the classification, the last language can, for example, be made shorter the longer the first language is, so the processing speed of the voice recognition device is improved stably.

[0117] In the voice recognition method according to claim 13, recognition candidate languages are stored in advance in a word notation corresponding dictionary for each head-portion reading, and a language whose head-portion reading matches the head portion of the speech to be recognized is detected from that dictionary. Because the readings subject to recognition are limited to head portions, their number is reduced and the recognition process runs quickly. Human utterances tend to be clear only at the beginning, and since only this head portion is processed, the misrecognition rate is lowered. Even when there are several recognition results, the correct answer is included with high probability, which is more practical than a method that outputs a single wrong result after lengthy processing.
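The lookup described here can be illustrated with a short sketch. It assumes the speech has already been converted to a sequence of syllables and uses exact matching in place of acoustic scoring, which is a simplification; the function name `recognize` and the table layout are hypothetical. The entry for "100" and "150" follows the example in the abstract, where both share the head reading "hyaku".

```python
def recognize(heard_syllables, head_dictionary, head_length):
    """Return every candidate whose head reading equals the head of the input.

    Only the first head_length syllables of the input are examined, so the
    remainder of the utterance has no effect on the result.
    """
    head = tuple(heard_syllables[:head_length])
    return head_dictionary.get(head, [])


# Candidates stored per head reading (hypothetical layout).
head_dictionary = {
    ("hya", "ku"): ["100", "150"],
    ("ni", "hya"): ["200"],
}

print(recognize(["hya", "ku", "go", "ju", "u"], head_dictionary, 2))
```

When several candidates share a head reading, all of them are returned, and one would then be chosen by a separate selection step.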

[0118] In the dictionary creation method according to claim 14, languages are extracted from a general language dictionary in which various languages are stored in advance together with their readings, and the extracted languages are stored in a word notation corresponding dictionary for each head portion of the reading. Languages that share the same beginning but differ at the end, and are therefore prone to misrecognition, are aggregated under one reading, and a small number of readings is assigned to a large number of languages, which reduces the number of items to be processed during recognition. This contributes to both the recognition accuracy and the processing speed of the voice recognition device. Because such entries can be generated mechanically from a general language dictionary, such as an existing database, no human needs to perform this work.
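The dictionary creation step can be sketched as a grouping operation. This is an illustration only: the general language dictionary is modeled as a mapping from notation to a list of syllables, the head length is a fixed number of syllables, and the name `build_head_dictionary` is hypothetical.

```python
from collections import defaultdict


def build_head_dictionary(general_dictionary, head_length):
    """Group notations by the first head_length syllables of their reading."""
    table = defaultdict(list)
    for notation, syllables in general_dictionary.items():
        table[tuple(syllables[:head_length])].append(notation)
    return dict(table)


# Hypothetical general language dictionary: notation -> syllables of the reading.
general_dictionary = {
    "100": ["hya", "ku"],
    "150": ["hya", "ku", "go", "ju", "u"],
    "200": ["ni", "hya", "ku"],
}

print(build_head_dictionary(general_dictionary, 2))
```

Three languages collapse into two head readings here, which is the reduction in recognition targets that the paragraph describes.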

[0119] In the information storage medium according to claim 15, there are written a word notation corresponding dictionary in which recognition candidate languages are stored in advance for each head-portion reading, and a program that causes a computer to detect, from that dictionary, a language whose head-portion reading matches the head portion of the speech to be recognized. When a computer reads and runs the software on this information storage medium, the computer functions as a voice recognition device. In this voice recognition device, the readings subject to recognition are limited to head portions, so their number is reduced and the recognition process runs quickly. Human utterances tend to be clear only at the beginning, and since only this head portion is processed, the misrecognition rate is lowered. Even when there are several recognition results, the correct answer is included with high probability, which is more practical than a device that outputs a single wrong result after lengthy processing.

[0120] In the information storage medium according to claim 16, there is written a program that causes a computer to extract languages from a general language dictionary in which various languages are stored in advance together with their readings, and to store the extracted languages in a word notation corresponding dictionary for each head portion of the reading. When a computer reads and runs the program on this information storage medium, the computer functions as a dictionary creation device. This dictionary creation device aggregates under one reading the languages that share the same beginning but differ at the end and are therefore prone to misrecognition, and assigns a small number of readings to a large number of languages, which reduces the number of items to be processed during recognition. This contributes to both the recognition accuracy and the processing speed of the voice recognition device. Because such entries can be generated mechanically from a general language dictionary, such as an existing database, no human needs to perform this work.

[Brief description of the drawings]

FIG. 1 is a schematic block diagram showing a voice recognition device and a dictionary creation device according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing the hardware of a computer system that implements the voice recognition device and the dictionary creation device.

FIG. 3 is a perspective view showing the appearance of the computer system.

FIG. 4 is a schematic diagram showing the stored contents of a word dictionary, which is the general language dictionary.

FIG. 5 is a schematic diagram showing the stored contents of a word notation correspondence table, which is the word notation corresponding dictionary.

FIG. 6 is a flowchart showing the voice recognition method performed by the voice recognition device.

FIG. 7 is a block diagram showing a voice recognition device and a dictionary creation device according to a second embodiment of the present invention.

FIG. 8 is a schematic diagram showing the stored contents of a syllable table, which is the phonetic unit dictionary.

FIG. 9 is a schematic diagram showing the stored contents of the word notation correspondence table.

FIG. 10 is a block diagram showing a voice recognition device and a dictionary creation device according to a third embodiment of the present invention.

FIG. 11 is a schematic diagram showing the stored contents of a recognition word table, which is the length setting dictionary.

FIG. 12 is a schematic diagram showing the stored contents of the word dictionary.

FIG. 13 is a schematic diagram showing the stored contents of the word notation correspondence table.

FIG. 14 is a block diagram showing a voice recognition device and a dictionary creation device according to a fourth embodiment of the present invention.

FIG. 15 is a schematic diagram showing the stored contents of the recognition word table.

FIG. 16 is a schematic diagram showing the stored contents of the word dictionary.

FIG. 17 is a schematic diagram showing the stored contents of a digit correspondence table, which is the digit corresponding dictionary.

FIG. 18 is a schematic diagram showing the stored contents of the word notation correspondence table.

FIG. 19 is a block diagram showing a voice recognition device and a dictionary creation device according to a fifth embodiment of the present invention.

FIG. 20 is a schematic diagram showing the stored contents of the recognition word table.

FIG. 21 is a schematic diagram showing the stored contents of the word dictionary.

FIG. 22 is a schematic diagram showing the stored contents of the digit correspondence table.

FIG. 23 is a schematic diagram showing the stored contents of a reading change table, which is the reading change dictionary.

FIG. 24 is a schematic diagram showing the stored contents of the word notation correspondence table.

FIG. 25 is a schematic diagram showing the stored contents of a recognition word table, which is the language classification dictionary of a dictionary creation device according to a sixth embodiment of the present invention.

FIG. 26 is a schematic diagram showing the stored contents of the word dictionary.

FIG. 27 is a schematic diagram showing the stored contents of the word notation correspondence table.

FIG. 28 is a schematic diagram showing the stored contents of the recognition word table of a dictionary creation device according to a seventh embodiment of the present invention.

FIG. 29 is a schematic diagram showing the stored contents of the word dictionary.

FIG. 30 is a schematic diagram showing the stored contents of the word notation correspondence table.

FIG. 31 is a schematic diagram showing the stored contents of a recognition word table, which is the condition setting dictionary of a dictionary creation device according to an eighth embodiment of the present invention.

FIG. 32 is a schematic diagram showing the stored contents of the word dictionary.

FIG. 33 is a schematic diagram showing the stored contents of the word notation correspondence table.

FIG. 34 is a schematic diagram showing the stored contents of the recognition word table of a dictionary creation device according to a ninth embodiment of the present invention.

FIG. 35 is a schematic diagram showing the stored contents of the word dictionary.

FIG. 36 is a schematic diagram showing the stored contents of the word notation correspondence table.

FIG. 37 is a schematic diagram showing the stored contents of the recognition word table of a dictionary creation device according to a tenth embodiment of the present invention.

FIG. 38 is a schematic diagram showing the stored contents of the word dictionary.

FIG. 39 is a schematic diagram showing the stored contents of the word notation correspondence table.

[Explanation of symbols]

1, 11, 21, 31, 41: Voice recognition device
2, 12, 22, 32, 42: Dictionary creation device
3: Voice input means
4: Voice recognition means
5: Word notation corresponding dictionary
8: General language dictionary
9: Reading generation means
13: Phonetic unit dictionary
23: Length setting dictionary, language classification dictionary, condition setting dictionary
34: Digit corresponding dictionary
43: Reading change dictionary
101: Computer
103 to 106, 108: Information storage medium

─────────────────────────────────────────────────────

[Procedure amendment]

[Submission date] August 1, 1996

[Procedure amendment 1]

[Document name to be amended] Specification

[Item to be amended] 0036

[Amendment method] Change

[Amendment contents]

[0036] In this configuration, the voice recognition device 1 recognizes speech uttered by a person. More specifically, as shown in FIG. 6, when speech uttered by a person is input to the voice input unit 3, the voice recognition unit 4 extracts the first three characters of the speech and, by spotting with the start point fixed at the beginning, matches the head portion of the speech against the plurality of readings stored in the word notation correspondence table 5 to calculate scores, and detects the reading with the highest score. The numerals detected in this way are output from the result output unit 6, and if more than one numeral is detected, one is selected by manual operation of the result selection unit 7.
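The score-based matching in this paragraph can be illustrated with a deliberately simple stand-in. The sketch assumes the head of the speech is already available as a romanized string and scores each stored reading by counting positions that agree with it from the start; real acoustic scoring is not modeled, and the function name `best_readings` is hypothetical.

```python
def best_readings(heard_head, stored_readings):
    """Return the stored readings with the highest start-anchored score."""

    def score(reading):
        # Compare position by position from the fixed start point.
        return sum(1 for a, b in zip(heard_head, reading) if a == b)

    top = max(score(reading) for reading in stored_readings)
    return [reading for reading in stored_readings if score(reading) == top]


print(best_readings("hya", ["hya", "nih", "san"]))
print(best_readings("xya", ["hya", "kya"]))
```

The second call returns two readings with equal scores, which corresponds to the case where several results are presented and one is chosen manually.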

[Procedure amendment 2]

[Document name to be amended] Specification

[Item to be amended] 0052

[Amendment method] Change

[Amendment contents]

[0052] In this configuration, the voice recognition device 11 of the present embodiment also recognizes speech uttered by a person. The voice recognition unit 4 matches the head portion of the speech against the plurality of readings stored in the word notation correspondence table 5 to calculate scores, and detects the reading with the highest score. Because the voice recognition device 11 uses the syllable, which is a phonetic unit, as the unit of matching, the recognition process reflects the characteristics of human speech well.

[Procedure amendment 3]

[Document name to be amended] Specification

[Item to be amended] 0058

[Amendment method] Change

[Amendment contents]

[0058] In this configuration, the voice recognition device 21 of the present embodiment also recognizes speech uttered by a person. The voice recognition unit 4 matches the head portion of the speech against the plurality of readings stored in the word notation correspondence table 5 to calculate scores, and detects the reading with the highest score. As a result, the numeral "hyaku" classified under product A is recognized at the second syllable, whereas the numeral "nihyaku" classified under product B is recognized at the third syllable.

[Procedure amendment 4]

[Document name to be amended] Drawings

[Item to be amended] FIG. 6

[Amendment method] Change

[Amendment contents]

FIG. 6

Claims

[Claims]

1. A voice recognition device comprising: voice input means for inputting speech to be recognized; a word notation corresponding dictionary in which recognition candidate languages are stored in advance for each head-portion reading; and voice recognition means for detecting, from the word notation corresponding dictionary, a language whose head-portion reading matches the head portion of the speech input by the voice input means.

2. A dictionary creation device comprising: a general language dictionary in which various languages are stored in advance together with their readings; and reading generation means for storing languages extracted from the general language dictionary in a word notation corresponding dictionary for each head portion of the reading.

3. The dictionary creation device according to claim 2, further comprising a phonetic unit dictionary in which the phonetic units of language readings are stored in advance, wherein the reading generation means generates, as the head portion of the reading, a portion of a fixed number of phonetic units from the beginning of the language.

4. The dictionary creation device according to claim 2, wherein the reading generation means generates, as the head portion of the reading, a portion of a fixed number of notation units from the beginning of the language.

5. The dictionary creation device according to claim 2, further comprising a length setting dictionary in which the length of the head portion of the reading is preset for each language classification, wherein the reading generation means varies the length of the head portion of the generated reading for each classification.

6. The dictionary creation device according to claim 2, further comprising a digit corresponding dictionary in which the reading of each digit position of a numeral is stored in advance for each number of digits, wherein, when generating a head-portion reading with a multi-digit numeral as the language, the reading generation means detects the reading of the numeral in the leading predetermined digits from the general language dictionary, detects the reading of the leading predetermined digit positions from the digit corresponding dictionary, and combines the two.

7. The dictionary creation device according to claim 6, further comprising a reading change dictionary in which readings of digits that change depending on the numeral they are combined with are stored in advance, wherein, when generating a head-portion reading with a multi-digit numeral as the language, the reading generation means corrects the reading of the corresponding numeral by referring to the reading change dictionary.

8. The dictionary creation device according to claim 2, further comprising a language classification dictionary in which the classification of languages is preset, wherein the reading generation means classifies each language for which a reading is generated according to the language classification dictionary, and does not limit the reading to the head portion unless the number of languages in the classification exceeds a preset reference value.

9. The dictionary creation device according to claim 2, further comprising a language classification dictionary in which the classification of languages is preset, wherein the reading generation means classifies each language for which a reading is generated according to the language classification dictionary, and changes the length of the head portion of the generated reading so that the number of readings in the classification does not exceed a preset reference value.

10. The dictionary creation device according to claim 2, further comprising a condition setting dictionary in which conditions for consecutive languages are preset, wherein the reading generation means refers to the condition setting dictionary to generate the readings of a plurality of consecutive languages and limits only the reading of the language located at the end to its head portion.

11. The dictionary creation device according to claim 10, wherein the reading generation means limits the length of the head portion of the reading of the language located at the end so that the total reading of the plurality of consecutive languages does not exceed a preset reference value.

12. The dictionary creation device according to claim 10, wherein the condition setting dictionary holds, together with the conditions for consecutive languages, a preset reading length for the language located at the end, and the reading generation means limits the reading of the language located at the end to the set length.

13. A voice recognition method comprising: storing recognition candidate languages in advance in a word notation corresponding dictionary for each head-portion reading; and detecting, from the word notation corresponding dictionary, a language whose head-portion reading matches the head portion of the speech to be recognized.

14. A dictionary creation method comprising: extracting languages from a general language dictionary in which various languages are stored in advance together with their readings; and storing the extracted languages in a word notation corresponding dictionary for each head portion of the reading.

15. An information storage medium in which computer-readable software is written in advance, the software comprising: a word notation corresponding dictionary in which recognition candidate languages are stored in advance for each head-portion reading; and a program that causes the computer to detect, from the word notation corresponding dictionary, a language whose head-portion reading matches the head portion of the speech to be recognized.

16. An information storage medium in which a program that a computer reads in order to execute corresponding operations is written in advance, wherein the program causes the computer to: extract a language from a general language dictionary in which various languages are stored in advance together with their readings; and store the extracted language in a word notation corresponding dictionary for each head portion of the reading.
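The combination of a digit corresponding dictionary and a reading change dictionary recited in claims 6 and 7 can be sketched as follows. The table layouts, the names `NUMERAL_READINGS`, `DIGIT_READINGS`, `READING_CHANGES`, and `head_reading`, and the restriction to the leading digit of two- to four-digit numerals are assumptions made for illustration; the actual tables of the embodiments are not reproduced here. The romanized readings are standard Japanese numeral readings.

```python
# Readings of single numerals (a leading "1" is silent before hyaku, sen, juu).
NUMERAL_READINGS = {
    "1": "", "2": "ni", "3": "san", "4": "yon", "5": "go",
    "6": "roku", "7": "nana", "8": "hachi", "9": "kyuu",
}

# Reading of the leading digit position, keyed by the number of digits.
DIGIT_READINGS = {2: "juu", 3: "hyaku", 4: "sen"}

# Irregular combinations, keyed by (leading numeral, number of digits).
READING_CHANGES = {
    ("3", 3): "sanbyaku", ("6", 3): "roppyaku", ("8", 3): "happyaku",
    ("3", 4): "sanzen", ("8", 4): "hassen",
}


def head_reading(numeral):
    """Build the head reading of a multi-digit numeral from its leading digit."""
    key = (numeral[0], len(numeral))
    if key in READING_CHANGES:
        return READING_CHANGES[key]
    return NUMERAL_READINGS[numeral[0]] + DIGIT_READINGS[len(numeral)]


for n in ["100", "150", "200", "300", "3500"]:
    print(n, head_reading(n))
```

Checking the change table before concatenating keeps the regular case a simple lookup and join, while irregular forms such as "sanbyaku" are handled by a single table entry.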