JPH11153998A

JPH11153998A - Audio response equipment and its method, and computer readable memory

Info

Publication number: JPH11153998A
Application number: JP31868897A
Authority: JP
Inventors: Kenichiro Nakagawa; 賢一郎中川; Tetsuo Kosaka; 哲夫小坂
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1997-11-19
Filing date: 1997-11-19
Publication date: 1999-06-08

Abstract

PROBLEM TO BE SOLVED: To easily and efficiently determine the correct speech recognition result from a speech recognition result that includes homonyms. SOLUTION: A speech recognition unit 105 recognizes an input voice signal. A replacement dictionary 110, in which words are associated with information about their homonyms, is stored. When the speech recognition result includes homonyms, a response is prepared based on the information about those homonyms, synthesized into speech by a speech synthesis unit 108, and output to the source of the input voice signal. The speech recognition result is then determined based on the source's answer to the response.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice response apparatus and method that recognize an input voice signal and respond based on the recognition, and to a computer-readable memory.

[0002]

2. Description of the Related Art

In a voice response system such as a CTI (Computer Telephony Integration) system, speech received over the telephone can be recognized and used as a voice command. By incorporating an e-mail function into the voice response system, e-mail can also be read aloud by speech synthesis. One way of designating a specific e-mail is to select a Subject that contains a spoken keyword. When the input speech has several homonym candidates, the following methods have been used to handle them.

[0003]

1. The voice response system uniquely determines one of the homonyms as the speech recognition result.
2. The appropriate homonym is selected as the speech recognition result from the context before and after it.
3. The voice response system does not choose among the homonyms. Instead, the user or an operator later accesses the voice response system and determines the speech recognition result from among the homonyms using a display or the like.

[Problems to be solved by the invention]

However, the conventional voice response systems described above have the following problems. With method 1, confusion arises when the homonyms later have to be distinguished. Method 2 requires a large amount of computation and training data, and it cannot distinguish proper nouns such as names. Method 3 increases the burden on the operator and the user, and it is difficult for an operator other than the user to decide among homonym candidates such as names.

[0004] The present invention has been made in view of the above problems. Its object is to provide a voice response apparatus and method, and a computer-readable memory, that can easily and efficiently determine the correct speech recognition result from a speech recognition result that includes homonyms.

[0005]

[Means for solving the problems]

A voice response apparatus according to the present invention for achieving the above object has the following configuration. That is, it is an apparatus that recognizes an input voice signal and responds based on the recognition, comprising: speech recognition means for recognizing the input voice signal; storage means for storing a dictionary that associates words with information about homonyms of those words; execution means for, when the speech recognition result obtained by the speech recognition means includes homonyms, issuing a response based on the information about the homonyms to the source of the input voice signal; and determination means for determining the speech recognition result based on a reply from the source to the response.

[0006] Preferably, the execution means comprises creation means for creating, based on the information about the homonyms, a question sentence for determining one of the homonyms as the speech recognition result, and speech synthesis means for synthesizing speech from the question sentence created by the creation means. The execution means outputs the synthesized question sentence to the source of the input voice signal.

[0007] Preferably, the information about the homonyms includes at least the reading of each homonym, information indicating a characteristic of the homonym, and information belonging to the homonym.

[0008] Preferably, the apparatus further comprises management means for managing the transmission and reception of e-mail via a network line.

[0009] Preferably, when the speech recognition result of the input voice signal is an instruction to read aloud, from among the e-mails managed by the management means, an e-mail containing a predetermined character string, and the character string includes homonyms, the execution means issues a response based on the information about the homonyms to the source of the input voice signal. The determination means determines the character string based on a reply from the source to the response and reads the e-mail containing that character string aloud to the source.

[0010] Preferably, when the speech recognition result of the input voice signal is an instruction to read aloud, from among the e-mails managed by the management means, an e-mail from a predetermined sender, and the sender includes homonyms, the execution means issues a response based on the information about the homonyms to the source of the input voice signal. The determination means determines the sender based on a reply from the source to the response and reads the e-mail from that sender aloud to the source.

[0011] A voice response method according to the present invention for achieving the above object has the following configuration. That is, it is a method that recognizes an input voice signal and responds based on the recognition, comprising: a speech recognition step of recognizing the input voice signal; a storage step of storing, in a storage medium, a dictionary that associates words with information about homonyms of those words; an execution step of, when the speech recognition result obtained in the speech recognition step includes homonyms, issuing a response based on the information about the homonyms to the source of the input voice signal; and a determination step of determining the speech recognition result based on a reply from the source to the response.

[0012] A computer-readable memory according to the present invention for achieving the above object has the following configuration. That is, it is a computer-readable memory storing program code for speech recognition that recognizes an input voice signal and responds based on the recognition, comprising: program code for a speech recognition step of recognizing the input voice signal; program code for a storage step of storing, in a storage medium, a dictionary that associates words with information about homonyms of those words; program code for an execution step of, when the speech recognition result obtained in the speech recognition step includes homonyms, issuing a response based on the information about the homonyms to the source of the input voice signal; and program code for a determination step of determining the speech recognition result based on a reply from the source to the response.

[0013]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention will be described below in detail with reference to the drawings.

[Embodiment 1]

FIG. 1 is a block diagram showing the configuration of a voice response system according to Embodiment 1 of the present invention.

[0014] As shown in FIG. 1, the voice response system consists broadly of a voice response apparatus 103 and a telephone 101 used to make telephone input to the voice response apparatus 103. In Embodiment 1 the telephone 101 is connected to the voice response apparatus 103 through a telephone line 102, but the connection may also be wireless. The telephone 101 is an ordinary telephone.

[0015] Next, the components of the voice response apparatus 103 will be described.

[0016] Reference numeral 104 denotes a telephone management unit, which accepts telephone input from the telephone 101 and returns a response corresponding to that input to the telephone 101. Reference numeral 105 denotes a speech recognition unit, which recognizes the speech waveform received by telephone using a grammar dictionary 106. Reference numeral 106 denotes the grammar dictionary, a dictionary for speech recognition that describes the words to be recognized and the connection information between words. Reference numeral 107 denotes a homonym extraction unit, which determines from the speech recognition result of the speech recognition unit 105 whether homonyms are present.

[0017] Reference numeral 108 denotes a speech synthesis unit, which creates text data based on the telephone input and synthesizes a speech waveform from that text data. It also synthesizes a speech waveform from text data supplied by the replacement sentence creation unit 109. Reference numeral 109 denotes the replacement sentence creation unit. When the homonym extraction unit 107 finds homonyms in the speech recognition result, this unit refers to the replacement dictionary 110 and creates question sentences in which the homonym portion is given a different reading. Reference numeral 110 denotes the replacement dictionary, in which data pairs consisting of a character used in the words subject to speech recognition and its readings are registered. Reference numeral 111 denotes an e-mail management unit, which controls the sending and receiving of e-mail via the network 112. Reference numeral 112 denotes a network, which connects to external devices and is used to send and receive data such as e-mail and FAX.

[0018] Next, the processing executed by the voice response system of Embodiment 1 will be described with reference to FIG. 2.

[0019] FIG. 2 is a flowchart showing the processing executed by the voice response system according to Embodiment 1 of the present invention.

[0020] Here, the processing executed by the voice response system is described as an exchange between the telephone 101 and the voice response apparatus 103.

[0021] First, in step S201, the user uses the telephone 101 to call the voice response apparatus 103 over the telephone line 102. In step S202, the voice response apparatus 103 accepts the incoming call as telephone input at the telephone management unit 104. In step S203, the user speaks into the telephone 101. Here, for example, the user utters "hashi wo kau" ("buy hashi", where "hashi" can mean either chopsticks or bridge).

[0022] In step S204, the speech waveform of the user's telephone input is recognized by the speech recognition unit 105 using the grammar dictionary 106. In step S205, the homonym extraction unit 107 compares the speech recognition result with the grammar dictionary 106 to determine whether homonyms exist in the speech recognition result. If no homonyms exist (NO in step S205), the process proceeds to step S209. If homonyms exist (YES in step S205), the process proceeds to step S206. In the determination of step S205, homonyms are judged to exist in the speech recognition result when several homonyms are registered in the grammar dictionary 106 and the connection information alone cannot remove the ambiguity. For example, suppose the speech recognition result is "hashi wo kau", the words 箸 (chopsticks) and 橋 (bridge) are both registered in the grammar dictionary 106 for "hashi", and both 箸を買う ("buy chopsticks") and 橋を買う ("buy a bridge") are connections permitted by the grammar. In that case 箸 and 橋 are regarded as homonyms.

[0023] If it is determined in step S205 that homonyms exist, then in step S206 the replacement sentence creation unit 109 uses a replacement dictionary 110 such as the one shown in FIG. 3 to create replacement sentences in which each homonym is given a different reading. Specifically, readings that do not overlap between the homonyms are taken from the replacement dictionary 110, and a sentence (text data) of the form "(alternative reading) no (common reading) de no (phrase containing the homonym) desu ka", that is, "Is it (phrase) with the (common reading) that is also read (alternative reading)?", is created. For the example above, using the replacement dictionary 110 of FIG. 3, the replacement sentences "(cho) no (hashi) de no hashi wo kau desu ka" and "(kyō) no (hashi) de no hashi wo kau desu ka" are created. In step S207, the text data created by the replacement sentence creation unit 109 is converted into a speech waveform by the speech synthesis unit 108 and presented to the user through the telephone 101.
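Steps S205 and S206 can be sketched roughly as follows. This is only an illustration under stated assumptions: the contents of `LEXICON`, `ALLOWED`, and `READINGS` are hypothetical stand-ins for the grammar dictionary 106 and the replacement dictionary 110 (FIG. 3 is not reproduced in the text), and the function names are invented for the example rather than taken from the patent.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for grammar dictionary 106: reading -> written words.
LEXICON: Dict[str, List[str]] = {"hashi": ["箸", "橋"], "kau": ["買う"]}
# Hypothetical connection information: (object, verb) pairs the grammar permits.
ALLOWED: Set[Tuple[str, str]] = {("箸", "買う"), ("橋", "買う")}
# Hypothetical replacement dictionary 110 entries: character -> readings.
READINGS: Dict[str, List[str]] = {"箸": ["hashi", "cho"], "橋": ["hashi", "kyō"]}


def homonym_candidates(obj_reading: str, verb_reading: str) -> List[str]:
    """S205: written forms of obj_reading that the connection check cannot rule out."""
    verbs = LEXICON.get(verb_reading, [])
    return [w for w in LEXICON.get(obj_reading, [])
            if any((w, v) in ALLOWED for v in verbs)]


def distinct_reading(word: str, rivals: List[str]) -> str:
    """Pick the first reading of word that none of the rival homonyms share."""
    taken = {r for rival in rivals for r in READINGS[rival]}
    for reading in READINGS[word]:
        if reading not in taken:
            return reading
    raise ValueError(f"no distinguishing reading for {word}")


def replacement_sentences(candidates: List[str], common: str, phrase: str) -> Dict[str, str]:
    """S206: one question per homonym, following the template quoted in the text."""
    questions: Dict[str, str] = {}
    for word in candidates:
        rivals = [c for c in candidates if c != word]
        alt = distinct_reading(word, rivals)
        questions[word] = f"({alt}) no ({common}) de no {phrase} desu ka"
    return questions


candidates = homonym_candidates("hashi", "kau")
if len(candidates) > 1:  # homonyms remain, so the user has to be asked
    for word, question in replacement_sentences(candidates, "hashi", "hashi wo kau").items():
        print(word, question)
```

Keeping the connection check separate from the question builder mirrors the division between the homonym extraction unit 107 and the replacement sentence creation unit 109, although a real recognizer would work on lattices rather than on two fixed slots.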

[0024] In step S208, the user listens to the replacement sentences spoken by the voice response apparatus 103 and, by voice or by the push tones of the telephone 101, selects which homonym was intended in the original utterance. In step S209, the homonym selected by the user is determined as the speech recognition result and passed to the higher-level application. As examples of the higher-level application, e-mail matching the keyword or sender given by the speech recognition result is read aloud, or the speech recognition result is sent back as an e-mail.

[0025] In step S210, it is determined whether the user's telephone input indicates the end of the session. If it does not (NO in step S210), the process returns to step S203 to accept further telephone input. If it does (YES in step S210), the process proceeds to step S211, where the telephone management unit 104 disconnects the call.
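The overall loop of FIG. 2 (steps S203 to S211) might be outlined as below. This is a hedged sketch, not the patented implementation: `run_call` and its callback parameters are hypothetical names, recognition and synthesis are injected as plain functions, and the end of the call is modelled simply as the end of the utterance sequence.

```python
from typing import Callable, Iterable, List


def run_call(
    utterances: Iterable[str],
    recognize: Callable[[str], List[str]],
    make_questions: Callable[[List[str]], List[str]],
    ask_user: Callable[[List[str]], int],
) -> List[str]:
    """Process one call and return the confirmed result for each utterance."""
    confirmed: List[str] = []
    for utterance in utterances:                    # S203; the loop ends with the input (S210)
        candidates = recognize(utterance)           # S204: assumed to return at least one candidate
        if len(candidates) > 1:                     # S205: homonyms are present
            questions = make_questions(candidates)  # S206: build the replacement sentences
            choice = ask_user(questions)            # S207-S208: index of the question the user picks
            result = candidates[choice]
        else:
            result = candidates[0]
        confirmed.append(result)                    # S209: hand the result to the application
    return confirmed                                # S211: the call is disconnected afterwards


results = run_call(
    ["hashi wo kau", "mēru wo yome"],
    recognize=lambda u: ["箸を買う", "橋を買う"] if u.startswith("hashi") else [u],
    make_questions=lambda cands: [f"{c}?" for c in cands],
    ask_user=lambda questions: 1,  # stand-in for a voice or push-tone answer
)
print(results)
```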

[0026] As described above, according to Embodiment 1, when homonyms exist in the speech recognition result for the user's telephone input, the homonyms can be confirmed with the user by presenting mutually different readings. An accurate speech recognition result can therefore be determined without burdening the user.

[Embodiment 2]

Embodiment 1 uses a replacement dictionary 110 such as that shown in FIG. 3, but the dictionary may also register feature information describing characteristics of a word other than its reading, as shown in FIG. 4. In that case, the homonym can be determined by presenting the user with feature information that does not overlap between the homonyms. This configuration is also effective for disambiguating keyword designations for e-mail.

[0027] A specific example follows.

[0028] User (telephone 101): Speaks "hashi ni tsuite no mēru wo yome" ("Read the mail about hashi") to the voice response apparatus 103 by telephone.

[0029] Voice response apparatus 103: When an e-mail titled 橋の開通工事 ("Bridge opening construction") and an e-mail titled 塗箸の販売網 ("Sales network for lacquered chopsticks") are managed by the e-mail management unit 111, the "hashi" in the input "hashi ni tsuite no mēru wo yome" is regarded as a homonym. Readings or feature information of 橋 and 箸 that do not overlap each other are selected from the replacement dictionary 110 shown in FIG. 4, and the resulting replacement sentences are presented to the user. In this case, the user is presented with "(burijji) no (hashi) de no hashi desu ka" ("Is it the hashi meaning bridge?") and "(choppusutikku) no (hashi) de no hashi desu ka" ("Is it the hashi meaning chopstick?").

[0030] User (telephone 101): If, for example, the utterance was intended to mean "(burijji) no (hashi)", the user answers "burijji no hashi" by telephone.

[0031] Voice response apparatus 103: Determines the homonym to be 橋 and reads aloud the e-mail titled 橋の開通工事.
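The exchange in Embodiment 2 can be illustrated with a small sketch. The `FEATURES` table is a hypothetical approximation of the FIG. 4 dictionary (the figure is not reproduced in the text), the function names are invented for the example, and matching the user's answer by substring is a simplification chosen for brevity, not something the patent specifies.

```python
from typing import Dict, List

# Hypothetical entries in the spirit of FIG. 4: character -> readings and feature words.
FEATURES: Dict[str, List[str]] = {
    "橋": ["hashi", "kyō", "burijji"],
    "箸": ["hashi", "cho", "choppusutikku"],
}
SUBJECTS = ["橋の開通工事", "塗箸の販売網"]  # Subjects held by e-mail management unit 111


def keyword_candidates(reading: str, subjects: List[str]) -> List[str]:
    """Characters read as `reading` that occur in at least one managed Subject."""
    return [ch for ch, feats in FEATURES.items()
            if reading in feats and any(ch in s for s in subjects)]


def resolve_keyword(answer: str, candidates: List[str]) -> str:
    """Return the candidate whose non-shared feature appears in the user's answer."""
    for ch in candidates:
        others = {f for c in candidates if c != ch for f in FEATURES[c]}
        if any(f in answer for f in FEATURES[ch] if f not in others):
            return ch
    raise LookupError("answer does not identify a candidate")


candidates = keyword_candidates("hashi", SUBJECTS)
chosen = resolve_keyword("burijji no hashi", candidates)
print([s for s in SUBJECTS if chosen in s])  # the Subjects that would be read aloud
```

Restricting the candidates to characters that actually occur in a managed Subject means the user is asked only when the stored mail itself is ambiguous.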

[0032] As described above, according to Embodiment 2, feature information about homonyms other than their readings is registered in the replacement dictionary 110. In addition to the effect obtained in Embodiment 1, homonyms can therefore be confirmed with the user more flexibly.

[Embodiment 3]

In the replacement dictionary 110 of Embodiments 1 and 2, readings and feature information are managed character by character. Alternatively, as shown in FIG. 5, a replacement dictionary 110 may be used in which feature information for distinguishing homonyms is registered in advance for each word. Such a replacement dictionary 110 can also be used to designate the sender of an e-mail.

[0033] An example of the operation follows.

[0034] User (telephone 101): Speaks "mēru ga todoite iru ka" ("Has any mail arrived?") to the voice response apparatus 103 by telephone.

[0035] Voice response apparatus 103: When e-mails from 加藤 (Katō) and from 加東 (also read Katō) are managed by the e-mail management unit 111, responds to the user with "Katō-san kara mēru ga todoite imasu" ("Mail has arrived from Katō-san").

[0036] User (telephone 101): In reply to this response, speaks "yome" ("Read it") to the voice response apparatus 103 by telephone.

[0037] Voice response apparatus 103: Responds to the user with "dono Katō-san" ("Which Katō-san?").

[0038] User (telephone 101): Treating the names as homonyms, instructs the voice response apparatus 103 to select from the replacement dictionary 110 feature information for the words 加藤 and 加東 that does not overlap.

[0039] Voice response apparatus 103: Based on the user's instruction, selects from the replacement dictionary 110 shown in FIG. 5 feature information for the words 加藤 and 加東 that does not overlap, and presents the resulting replacement sentences to the user. In this case, the user is presented with "(○○ Kensetsu) no (Katō) de no Katō desu ka" ("Is it the Katō of ○○ Construction?") and "(×× Jidōsha) no (Katō) de no Katō desu ka" ("Is it the Katō of ×× Motors?").

[0040] User (telephone 101): If, for example, "(○○ Kensetsu) no (Katō)" was intended, answers "○○ Kensetsu no Katō" by telephone.

[0041] Voice response apparatus 103: Determines the homonym to be 加藤 and reads aloud the e-mail from 加藤.
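A word-level dictionary like the one described for Embodiment 3 could be modelled as below. This is a sketch under assumptions: `WORDS` is a hypothetical approximation of FIG. 5, the third entry (鈴木) is invented only to show a sender that needs no question, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical word-level entries in the spirit of FIG. 5: word -> (reading, feature).
WORDS: Dict[str, Tuple[str, str]] = {
    "加藤": ("Katō", "○○ Kensetsu"),
    "加東": ("Katō", "×× Jidōsha"),
    "鈴木": ("Suzuki", "△△ Shōji"),  # invented entry with a unique reading
}


def ambiguous_senders(senders: List[str]) -> Dict[str, List[str]]:
    """Group senders by reading and keep only readings shared by several senders."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in senders:
        groups[WORDS[name][0]].append(name)
    return {reading: names for reading, names in groups.items() if len(names) > 1}


def sender_question(name: str) -> str:
    """Question that distinguishes one sender by its registered feature."""
    reading, feature = WORDS[name]
    return f"({feature}) no ({reading}) de no {reading} desu ka"


for reading, names in ambiguous_senders(["加藤", "加東", "鈴木"]).items():
    for name in names:
        print(sender_question(name))
```

Because the feature is attached to the whole word rather than to individual characters, the same lookup works for proper nouns, which is the case the related-art discussion identifies as hard to resolve from context.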

[0042] As described above, according to Embodiment 3, feature information about homonyms other than their readings is registered in the replacement dictionary 110. In addition to the effect obtained in Embodiment 1, homonyms can therefore be confirmed with the user more flexibly.

[0043] As described above, according to Embodiments 1 to 3, the ambiguity of homonyms can be resolved by voice, with the user answering questions from the voice response apparatus 103. The user can therefore resolve homonym ambiguity with the telephone 101 alone, without an operator or special equipment.

[0044] The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copying machine or a facsimile machine).

[0045] It goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a storage medium on which is recorded the program code of software that implements the functions of the embodiments described above, and having a computer (or CPU or MPU) of that system or apparatus read and execute the program code stored in the storage medium.

[0046] In this case, the program code read from the storage medium itself implements the functions of the embodiments described above, and the storage medium storing the program code constitutes the present invention.

[0047] As the storage medium for supplying the program code, for example, a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, nonvolatile memory card, or ROM can be used.

[0048] It also goes without saying that the invention covers not only the case where the functions of the embodiments described above are implemented by the computer executing the program code it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the embodiments are implemented by that processing.

[0049] Furthermore, it goes without saying that the invention also covers the case where the program code read from the storage medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, a CPU or the like provided on that function expansion board or function expansion unit then performs part or all of the actual processing based on the instructions of the program code, and the functions of the embodiments described above are implemented by that processing.

[0050]

[Effects of the invention]

As described above, according to the present invention, it is possible to provide a speech recognition apparatus and method, and a computer-readable memory, that can easily and efficiently determine the correct speech recognition result from a speech recognition result that includes homonyms.

[0051]

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the voice response system according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart showing the processing executed by the voice response system according to Embodiment 1 of the present invention.

FIG. 3 is a diagram showing the structure of the replacement dictionary according to Embodiment 1 of the present invention.

FIG. 4 is a diagram showing the structure of the replacement dictionary according to Embodiment 2 of the present invention.

FIG. 5 is a diagram showing the structure of the replacement dictionary according to Embodiment 3 of the present invention.

[Explanation of symbols]

101 Telephone
102 Telephone line
103 Speech recognition apparatus
104 Telephone management unit
105 Speech recognition unit
106 Grammar dictionary
107 Homonym extraction unit
108 Speech synthesis unit
109 Replacement sentence creation unit
110 Replacement dictionary
111 E-mail management unit
112 Network

Claims

1. A voice response apparatus that recognizes an input voice signal and responds based on the recognition, comprising: speech recognition means for recognizing the input voice signal; storage means for storing a dictionary that associates words with information about homonyms of those words; execution means for, when a speech recognition result obtained by the speech recognition means includes homonyms, issuing a response based on the information about the homonyms to the source of the input voice signal; and determination means for determining the speech recognition result based on a reply from the source to the response.

2. The voice response apparatus according to claim 1, wherein the execution means comprises: creation means for creating, based on the information about the homonyms, a question sentence for determining one of the homonyms as the speech recognition result; and speech synthesis means for synthesizing speech from the question sentence created by the creation means, and wherein the execution means outputs the question sentence synthesized by the speech synthesis means to the source of the input voice signal.

3. The voice response apparatus according to claim 1, wherein the information about the homonyms includes at least the reading of each homonym, information indicating a characteristic of the homonym, and information belonging to the homonym.

4. The voice response apparatus according to claim 1, further comprising management means for managing transmission and reception of electronic mail via a network line.

5. The voice response apparatus according to claim 4, wherein, when the speech recognition result of the input voice signal is an instruction to read aloud, from among the e-mails managed by the management means, an e-mail containing a predetermined character string, and the character string includes homonyms, the execution means issues a response based on the information about the homonyms to the source of the input voice signal, and the determination means determines the character string based on a reply from the source to the response and reads the e-mail containing that character string aloud to the source.

6. The voice response apparatus according to claim 4, wherein, when the voice recognition result of the input voice signal is an instruction to read out an e-mail from a predetermined sender from among the e-mails managed by the management means and the sender includes a homonym, the execution means executes a response based on the information relating to the homonym to the input source of the input voice signal, the determination means determines the sender based on an answer from the input source to the response, and an e-mail from the determined sender is read out to the input source.

7. A voice response method for recognizing an input voice signal and performing a response based on the recognition, comprising: a voice recognition step of recognizing the input voice signal; a storage step of storing, in a storage medium, a dictionary in which a word and information relating to homonyms of the word are associated with each other; an execution step of executing, when the voice recognition result of the voice recognition step includes a homonym, a response based on the information relating to the homonym to the input source of the input voice signal; and a determination step of determining the voice recognition result based on an answer from the input source to the response.

8. The voice response method according to claim 7, wherein the execution step includes: a creating step of creating, based on the information relating to the homonym, a question sentence for determining one of the homonyms as the voice recognition result; and a voice synthesis step of voice-synthesizing the created question sentence, and wherein the question sentence synthesized in the voice synthesis step is output to the input source of the input voice signal.

9. The voice response method according to claim 7, wherein the information relating to the homonym includes at least the reading of the homonym, information indicating characteristics of the homonym, and information belonging to the homonym.

10. The voice response method according to claim 7, further comprising a management step of managing transmission and reception of electronic mail via a network line.

11. The voice response method according to claim 10, wherein, when the voice recognition result of the input voice signal is an instruction to read out an e-mail including a predetermined character string from among the e-mails managed in the management step and the character string includes a homonym, the execution step executes a response based on the information relating to the homonym to the input source of the input voice signal, the determination step determines the character string based on an answer from the input source to the response, and an e-mail including the determined character string is read out to the input source.

12. The voice response method according to claim 10, wherein, when the voice recognition result of the input voice signal is an instruction to read out an e-mail from a predetermined sender from among the e-mails managed in the management step and the sender includes a homonym, the execution step executes a response based on the information relating to the homonym to the input source of the input voice signal, the determination step determines the sender based on an answer from the input source to the response, and an e-mail from the determined sender is read out to the input source.

13. A computer-readable memory storing program code for a voice response that recognizes an input voice signal and performs a response based on the recognition, the memory comprising: program code of a voice recognition step of recognizing the input voice signal; program code of a storage step of storing, in a storage medium, a dictionary in which a word and information relating to homonyms of the word are associated with each other; program code of an execution step of executing, when the voice recognition result of the voice recognition step includes a homonym, a response based on the information relating to the homonym to the input source of the input voice signal; and program code of a determination step of determining the voice recognition result based on an answer from the input source to the response.
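
The flow recited in the claims above (dictionary lookup, question creation, determination from the answer, and selection of the e-mail to be read out) can be sketched roughly as follows. This is only an illustrative sketch and not the implementation of the embodiment: the names Homonym, DICTIONARY, lookup, create_question, determine and select_mail, the English example words, and the numbered-answer scheme are all assumptions introduced here, and speech recognition and speech synthesis are replaced by plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Homonym:
    word: str      # written form that cannot be told apart by sound alone
    reading: str   # pronunciation shared by all candidates
    feature: str   # short characteristic used to build the question


# Hypothetical dictionary contents: reading -> words sharing that reading.
DICTIONARY = {
    "nait": [
        Homonym("night", "nait", "the time after sunset"),
        Homonym("knight", "nait", "a soldier in armor"),
    ],
}


def lookup(reading):
    """Return the candidate words stored for a recognized reading."""
    return DICTIONARY.get(reading, [])


def create_question(candidates):
    """Build a question sentence that lets the caller pick one candidate."""
    options = "; ".join(
        f"say {i} for '{c.word}', {c.feature}"
        for i, c in enumerate(candidates, start=1)
    )
    return f"Which word do you mean? {options}."


def determine(candidates, answer):
    """Fix the recognition result from the answer; None if it is unusable."""
    text = answer.strip()
    if text.isdecimal():
        index = int(text) - 1
        if 0 <= index < len(candidates):
            return candidates[index]
    return None


def select_mail(subjects, word):
    """Pick the e-mail subjects that contain the determined character string."""
    return [s for s in subjects if word in s]


if __name__ == "__main__":
    candidates = lookup("nait")
    print(create_question(candidates))
    chosen = determine(candidates, "2")
    print(select_mail(["A knight's tale", "Good night"], chosen.word))
```

In this sketch the question is generated only when lookup returns more than one candidate in practice, and determine returns None for an unusable answer so that the caller can repeat the question; both choices are assumptions rather than features stated in the claims.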