JP2004145014A

JP2004145014A - Apparatus and method for automatic vocal answering

Info

Publication number: JP2004145014A
Application number: JP2002310066A
Authority: JP
Inventors: Kiyouko Okuyama; 奥山　鏡子
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-10-24
Filing date: 2002-10-24
Publication date: 2004-05-20
Anticipated expiration: 2022-10-24
Also published as: JP4206253B2

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus and a method for automatic vocal answering that facilitate the management of grammars and dictionaries and can answer faithfully to input speech.

SOLUTION: The automatic vocal answering apparatus comprises a voice recognition processing part 1, an interaction processing part 2 that generates prompt information, a voice synthesis processing part 3 that performs voice synthesis processing, and a common reading database 12 in which a plurality of words are registered together with readings for voice recognition and readings for voice synthesis. The voice recognition processing part 1 generates recognition information including an identifier that specifies the reading for voice synthesis of a word registered in the common reading database 12. The interaction processing part 2 extracts the reading for voice synthesis of the recognized word according to the identifier and generates prompt information including the extracted reading for voice synthesis.

COPYRIGHT: (C)2004,JPO

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an automatic voice response apparatus and an automatic voice response method that perform voice recognition processing and voice synthesis processing to automatically respond to input voice.
[0002]
[Prior art]
In recent years, advances in speech recognition and speech synthesis technology have brought into practical use automatic voice response devices that respond automatically to a user's speech (see, for example, Patent Document 1). Such devices are used in various fields that require an automatic response service, such as voice portals and car navigation systems.
[0003]
FIG. 4 is a diagram showing a configuration of a conventional automatic voice response device. As shown in FIG. 4, the automatic voice response apparatus mainly includes a voice recognition processing unit 21, a dialog processing unit 22, and a voice synthesis processing unit 23.
[0004]
When speech is input, the speech recognition processing unit 21 performs speech recognition using an acoustic model 24, which holds acoustic information, and a grammar information section 25. The speech recognition processing unit 21 then outputs the recognized word and its corresponding slot information to the dialog processing unit 22 as recognition information. Speech recognition based on the acoustic model 24 and the grammar information section 25 is only one example of speech recognition processing, but it is the approach most widely used in recent years.
[0005]
The acoustic model 24 is a database that registers the acoustic information consulted to determine which characters a given sound corresponds to. The grammar information section 25 is a grammar dictionary in which grammars for speech recognition are registered; it lists the words or word strings that the user is expected to utter. When word strings are registered, the grammar information section 25 also registers the word order as state transition information.
[0006]
The grammar information section 25 also registers slot information set for each word. The slot information is the return value produced as the speech recognition result when a given word is uttered, and it is used as an identifier into the application database 26, described later.
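The slot mechanism can be sketched in Python as follows. This is a minimal illustration, not the apparatus itself. The class and function names and the slot values are hypothetical, and exact string matching stands in for acoustic recognition against the acoustic model 24.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GrammarEntry:
    """One word accepted by the recognizer, with its slot (return value)."""
    word: str  # reading the user is expected to utter, e.g. in katakana
    slot: str  # value handed to the dialog processing unit on a match


def recognize(grammar: List[GrammarEntry], utterance: str) -> Optional[str]:
    """Toy recognizer: return the slot of the entry whose word equals the utterance."""
    for entry in grammar:
        if entry.word == utterance:
            return entry.slot
    return None  # the utterance is not covered by the grammar


# Hypothetical grammar contents and slot values.
grammar = [
    GrammarEntry(word="ナガタ", slot="nagata"),
    GrammarEntry(word="オサダ", slot="osada"),
]
print(recognize(grammar, "ナガタ"))  # nagata
```

In the conventional apparatus, the returned slot value would then serve as the key into the application database 26.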
[0007]
When the slot information is input, the dialog processing unit 22 determines a prompt according to the recognition result of the speech recognition processing unit 21. It then extracts the prompt sentence corresponding to the determined prompt from the prompt database 27. Next, the dialog processing unit 22 refers to the application database 26 and replaces the variables in the prompt sentence with concrete values corresponding to the recognition result. The prompt sentence is text data and is input to the speech synthesis processing unit 23.
[0008]
The speech synthesis processing unit 23 includes a user reading dictionary 28, a basic reading dictionary 29, a waveform dictionary 30, and a language dictionary 31. When a prompt sentence is input, the speech synthesis processing unit 23 creates output speech. It proceeds in three steps:
1. It morphologically analyzes the prompt sentence using the language dictionary 31.
2. It refers to the basic reading dictionary 29 and the user reading dictionary 28 to add intonation, pause, and accent information to the sentence.
3. It extracts the speech data registered in the waveform dictionary 30 to create the output speech.
[0009]
As shown in Table 5 below, the basic reading dictionary 29 and the user reading dictionary 28 store text (mainly words), pronunciations with accent marks corresponding to the texts, and part of speech information.
[0010]
[Table 5]

[0011]
The basic reading dictionary 29 is a dictionary in which general texts (mainly words) are registered, and is provided in advance by a speech synthesis developer. The user reading dictionary 28 is a dictionary in which texts that are not registered in the basic reading dictionary are registered, and is created by the user according to individual circumstances. Normally, the speech synthesis processing unit 23 refers to the user reading dictionary 28 with higher priority than the basic reading dictionary 29.
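Below is a minimal sketch of the lookup order described above. It assumes each dictionary can be modelled as a plain mapping from text to one reading, with accent marks omitted. The function name and dictionary contents are hypothetical.

```python
from typing import Dict, Optional


def lookup_reading(text: str,
                   user_dict: Dict[str, str],
                   basic_dict: Dict[str, str]) -> Optional[str]:
    """Return the reading for `text`, consulting the user dictionary first.

    Each mapping holds exactly one reading per text, mirroring the
    one-reading-per-text structure of dictionaries 28 and 29.
    """
    if text in user_dict:
        return user_dict[text]
    return basic_dict.get(text)  # None if neither dictionary knows the text


basic_reading_dict = {"長田": "ナガタ"}  # hypothetical contents
user_reading_dict = {"長田": "オサダ"}

print(lookup_reading("長田", user_reading_dict, basic_reading_dict))  # オサダ
print(lookup_reading("長田", {}, basic_reading_dict))                 # ナガタ
```

Each mapping holds one reading per text, so the sketch cannot keep both ナガタ and オサダ for 長田. This is the limitation taken up in [0016].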
[0012]
[Patent Document 1]
JP 2000-242289 A
[0013]
[Problems to be solved by the invention]
However, the automatic response service using the automatic voice response apparatus shown in FIG. 4 recognizes input speech and then outputs it as speech. To do so, every word to be recognized must be registered in two places: in the grammar information section 25 for speech recognition, and in the basic reading dictionary 29 or the user reading dictionary 28 for speech synthesis.
[0014]
For example, suppose "富士通研" (Fujitsuken) is to be recognized. Its reading "フジツウケン" must be registered as a word in the grammar information section 25 for speech recognition. In addition, the characters "富士通研" must be registered in the user reading dictionary 28 together with the reading "フジツーケン". If "富士通研" is already registered in the basic reading dictionary 29, registering it in the user reading dictionary 28 might seem unnecessary. However, to improve the accuracy of the speech output by the speech synthesis processing unit 23, it must be registered in the user reading dictionary 28 as well.
[0015]
As described above, providing an automatic response service with the automatic voice response apparatus shown in FIG. 4 has an operational drawback: the dictionaries must be maintained in duplicate.
[0016]
The automatic response service using the apparatus shown in FIG. 4 also misreads kanji. For example, the personal name "長田さん" has two readings, "Nagata-san" and "Osada-san". Because of its structure, however, the apparatus shown in FIG. 4 allows only one reading per kanji string to be registered in the basic reading dictionary 29 or the user reading dictionary 28. Moreover, even if different readings are registered in the basic reading dictionary 29 and the user reading dictionary 28, the user reading dictionary takes priority.
[0017]
Consider a user who says "長田さん" as "Nagata-san", which the speech recognition processing unit 21 recognizes as "Nagata-san". If the user reading dictionary 28 registers "長田さん" with the reading "Osada-san", the speech synthesis processing unit 23 nevertheless outputs "Osada-san".
[0018]
Further, in the automatic voice response apparatus shown in FIG. 4, the speech recognition processing unit 21 and the speech synthesis processing unit 23 do not exchange information directly; the dialog processing unit 22 sits between them.
[0019]
Therefore, suppose the user says "Nagata-san" and speech recognition yields "Nagata-san". The dialog processing unit 22 then refers to the application database 26 to replace the variables in the prompt sentence. If it substitutes the kanji "長田さん" at that point, "Osada-san" may be output.
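The failure described in [0017] to [0019] can be reproduced with a toy model of the FIG. 4 pipeline. Everything here is a hypothetical stand-in, including the dictionaries, the slot value "nagata", and the prompt wording. The point is only that the recognized reading is discarded once the kanji text is substituted into the prompt.

```python
from typing import Dict

# Hypothetical stand-ins for the FIG. 4 components.
GRAMMAR_SLOTS: Dict[str, str] = {"ナガタ": "nagata"}  # grammar information section 25
APPLICATION_DB: Dict[str, Dict[str, str]] = {         # application database 26
    "nagata": {"name": "長田", "phone": "○○○-△△△△-□□□□"},
}
USER_READING_DICT: Dict[str, str] = {"長田": "オサダ"}  # user reading dictionary 28


def to_spoken_form(prompt: str) -> str:
    """Stand-in for unit 23: rewrite each known text as its dictionary reading."""
    for text, reading in USER_READING_DICT.items():
        prompt = prompt.replace(text, reading)
    return prompt


def respond(heard_reading: str) -> str:
    slot = GRAMMAR_SLOTS[heard_reading]  # unit 21: reading -> slot value
    record = APPLICATION_DB[slot]        # unit 22: slot -> application record
    # Unit 22 substitutes the kanji name; the heard reading is no longer available.
    prompt = f"はい、{record['name']}さんの連絡先は、{record['phone']}です。"
    return to_spoken_form(prompt)        # unit 23 re-derives a reading from text


print(respond("ナガタ"))  # はい、オサダさんの連絡先は、○○○-△△△△-□□□□です。
```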
[0020]
SUMMARY OF THE INVENTION An object of the present invention is to provide an automatic voice response apparatus and an automatic voice response method capable of easily managing a grammar and a dictionary and performing a voice response faithfully to an input voice.
[0021]
[Means for Solving the Problems]
To achieve the above object, an automatic voice response apparatus according to the present invention comprises at least the following:
- a speech recognition processing unit that performs speech recognition processing and outputs recognition information;
- a dialog processing unit that determines a prompt corresponding to the recognition information and creates prompt information from the recognition information and the determined prompt;
- a speech synthesis processing unit that performs speech synthesis processing based on the prompt information;
- a common reading database in which one or more words are registered together with a speech recognition reading and a speech synthesis reading.

When the speech recognition processing unit recognizes a word registered in the common reading database, it outputs recognition information that includes an identifier for locating the speech synthesis reading of the recognized word in the common reading database. Based on that identifier, the dialog processing unit extracts the speech synthesis reading of the recognized word and creates prompt information containing the extracted reading.
[0022]
The automatic voice response apparatus according to the present invention may also operate as follows. The speech recognition processing unit outputs recognition information containing the speech synthesis reading of the recognized word itself, instead of an identifier for locating that reading in the common reading database. Correspondingly, the dialog processing unit creates prompt information containing the speech synthesis reading of the word included in the recognition information, instead of extracting the speech synthesis reading and creating prompt information from it.
[0023]
The automatic voice response apparatus according to the present invention may also include a grammar generation unit. This unit creates a grammar for speech recognition from the speech recognition readings registered in the common reading database. It then adds to the grammar an identifier for locating, in the common reading database, the speech synthesis reading corresponding to each speech recognition reading. Using the grammar generated by the grammar generation unit, the speech recognition processing unit creates recognition information containing the identifier for locating the speech synthesis reading of the recognized word in the common reading database.
[0024]
Alternatively, the apparatus may include a grammar generation unit that creates a grammar for speech recognition from the speech recognition readings registered in the common reading database. This unit adds to the grammar the speech synthesis reading corresponding to each speech recognition reading. Using the grammar generated by the grammar generation unit, the speech recognition processing unit creates recognition information containing the speech synthesis reading of the recognized word.
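The sketch below contrasts the two slot designs of [0023] and [0024], under stated assumptions. The rows, entry identifiers, and readings are illustrative; only the pairing of user-1-1 with ナガタ follows the worked example later in this document. The grammar is reduced to a mapping from recognition reading to slot value.

```python
from typing import Dict, List, Tuple

# Illustrative rows: (entry identifier, text, recognition readings, synthesis reading).
COMMON_READINGS: List[Tuple[str, str, Tuple[str, ...], str]] = [
    ("user-1-1", "長田", ("ながた",), "ナガタ"),
    ("user-1-2", "長田", ("おさだ",), "オサダ"),  # hypothetical second entry
]


def build_grammar(embed_reading: bool) -> Dict[str, str]:
    """Map each recognition reading to the slot the recognizer should return.

    embed_reading=False: the slot is an identifier into the common reading database.
    embed_reading=True:  the slot carries the speech synthesis reading itself.
    """
    grammar: Dict[str, str] = {}
    for entry_id, _text, rec_readings, synth_reading in COMMON_READINGS:
        for reading in rec_readings:
            grammar[reading] = synth_reading if embed_reading else entry_id
    return grammar


print(build_grammar(embed_reading=False))  # {'ながた': 'user-1-1', 'おさだ': 'user-1-2'}
print(build_grammar(embed_reading=True))   # {'ながた': 'ナガタ', 'おさだ': 'オサダ'}
```

With identifiers, the dialog processing unit must still consult the common reading database at run time. Embedding the reading removes that lookup, at the cost of storing readings in the grammar itself.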
[0025]
Further, the automatic voice response device according to the present invention preferably has editing means for editing the contents of the common reading database.
[0026]
Next, to achieve the above object, the automatic voice response method according to the present invention works as follows. It performs speech recognition processing and determines a prompt corresponding to the recognition information obtained by that processing. It then creates prompt information from the recognition information and the determined prompt, and performs speech synthesis processing based on the prompt information. The method comprises at least the following steps:
(a) When a word recognized in the speech recognition processing is registered in a common reading database, output the recognition information including an identifier for locating the speech synthesis reading of that word in the common reading database. In the common reading database, one or more words are registered together with a speech recognition reading and a speech synthesis reading.
(b) Based on the identifier, extract the speech synthesis reading of the word recognized in the speech recognition processing, and create prompt information containing the extracted speech synthesis reading.
[0027]
In the automatic voice response method according to the present invention, steps (a) and (b) may be varied as follows. Step (a) may output recognition information containing the speech synthesis reading of the word recognized in the speech recognition processing, instead of an identifier for locating that reading in the common reading database. Step (b) may then create prompt information containing the speech synthesis reading of the word included in the recognition information, instead of extracting the speech synthesis reading of the recognized word and creating prompt information from it.
[0028]
The present invention may also take the form of a program for implementing the automatic voice response method according to the present invention described above. By installing and executing this program on a computer, the automatic voice response method according to the present invention can be carried out.
[0029]
As described above, the automatic voice response apparatus and method according to the present invention solve the above problems by providing a table that unifies the grammar for speech recognition and the dictionary for speech synthesis, namely the common reading database. The common reading database is described here; an example of its contents is shown in Table 1 below.
[0030]
[Table 1]

[0031]
The common reading database shown in Table 1 consists of a plurality of names organized by identification number. Each name i (1 ≦ i ≦ N) consists of a text (1), a speech recognition reading (2), and a speech synthesis reading (3). The speech recognition reading (2) is the reading of text (1) as written in the grammar for speech recognition. The speech synthesis reading (3) is the reading of text (1) used for speech synthesis.
[0032]
In the case of Japanese, the voice recognition reading (2) is described in hiragana or katakana. In addition, the reading (3) for speech synthesis is described in hiragana or katakana together with accent symbols. In the case of English, the speech recognition reading (2) is described in IPA (International Phonetic Alphabet) or the like. The reading (3) for speech synthesis is described in IPA together with accent marks.
[0033]
As identification numbers 1 to 3 show, entries whose text (1) is the same but whose speech recognition readings (2) differ are described as separate names in the common reading database. On the other hand, identification number 3 shows that several speech recognition readings (2) can be registered (e.g., ぶちょう (buchō) and ぶっちょう (butchō)). Only one speech synthesis reading (3) (ブチョー) can be registered, however.
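One way to model a single entry of the common reading database is sketched below. The field names are hypothetical. The text 部長 for the ぶちょう/ぶっちょう example is an assumption, since Table 1 is not reproduced here. The types encode the rules above: a tuple allows several speech recognition readings, while a single optional string allows at most one speech synthesis reading.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReadingEntry:
    """One entry of the common reading database (hypothetical field names)."""
    text: str                              # (1) written form
    recognition_readings: Tuple[str, ...]  # (2) one or more readings for the grammar
    synthesis_reading: Optional[str]       # (3) at most one reading; None means blank

    def __post_init__(self) -> None:
        if not self.recognition_readings:
            raise ValueError("at least one speech recognition reading is required")


# Same text, different recognition readings: kept as separate entries.
nagata = ReadingEntry("長田", ("ながた",), "ナガタ")
osada = ReadingEntry("長田", ("おさだ",), "オサダ")

# Several recognition readings, one synthesis reading (text 部長 is assumed).
buchou = ReadingEntry("部長", ("ぶちょう", "ぶっちょう"), "ブチョー")
```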
[0034]
Normally, both a speech recognition reading (2) and a speech synthesis reading (3) are registered for each word. Automatic voice response apparatuses generally include a basic reading dictionary, however. If the speech synthesis reading (3) would be identical to the reading registered there, the speech synthesis reading (3) field is left blank, as for name 2 of identification number 1. The reading registered in the basic reading dictionary is then treated as the registered reading.
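The blank-field rule can be expressed as a fallback, sketched here with hypothetical names and dictionary contents. Treating an empty string the same as a missing value is an assumption.

```python
from typing import Dict, Optional


def resolve_synthesis_reading(text: str,
                              synthesis_reading: Optional[str],
                              basic_reading_dict: Dict[str, str]) -> Optional[str]:
    """Return the reading to pass to speech synthesis for one entry.

    A blank field (None or an empty string) is read as "same as the basic
    reading dictionary", so the basic dictionary is consulted instead.
    """
    if synthesis_reading:
        return synthesis_reading
    return basic_reading_dict.get(text)


basic_reading_dict = {"長田": "ナガタ"}  # hypothetical contents, accent marks omitted

print(resolve_synthesis_reading("長田", None, basic_reading_dict))      # ナガタ
print(resolve_synthesis_reading("長田", "オサダ", basic_reading_dict))  # オサダ
```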
[0035]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an example of the automatic voice response apparatus and the automatic voice response method of the present invention will be described with reference to the drawings. First, the configuration of the automatic voice response apparatus of the present invention will be described with reference to FIG. 1. FIG. 1 is a configuration diagram showing an example of the automatic voice response apparatus of the present invention.
[0036]
As shown in FIG. 1, the automatic voice response apparatus according to the present invention includes a speech recognition processing unit 1, a dialog processing unit 2, and a speech synthesis processing unit 3, as the conventional apparatus does. Like the conventional apparatus, it also includes an acoustic model 4, a grammar information section 5, an application database 6, a prompt database 7, a waveform dictionary 9, a basic reading dictionary 8, and a language dictionary 10.
[0037]
However, the automatic voice response apparatus according to the present invention differs from the conventional apparatus in one respect. In place of the user reading dictionary, it has a common reading database 12, an editing means 11, and a grammar generation unit 13. For this reason, the processing in the speech recognition processing unit 1 and the dialog processing unit 2 also differs from that of the conventional apparatus.
[0038]
As shown in Table 1 above, the common reading database 12 consists of texts, speech recognition readings, and speech synthesis readings. The editing means 11 is a means for editing the contents of the common reading database 12. The administrator of the automatic voice response apparatus anticipates likely dialogues and uses the editing means 11 to add, delete, and modify the contents of the common reading database 12.
[0039]
The grammar generation unit 13 generates a grammar based on the contents registered in the common reading database 12 and registers it in the grammar information section 5. The processing performed by the grammar generation unit 13 is described in detail with reference to FIG. 2 below.
[0040]
Next, the processing in the automatic voice response apparatus of the present invention, and the automatic voice response method of the present invention, will be described using a concrete dialogue example. The automatic voice response method of the present invention can be carried out by operating the automatic voice response apparatus shown in FIG. 1, which is referred to as appropriate in the following description.
[0041]
An example of the dialogue is as follows.
[Example of dialogue]
Automatic voice response device: "Whose contact information would you like? Please say the name."
User: "Nagata-san" (長田さん, read ながた)
Automatic voice response device: "Yes, the contact information for Nagata-san (長田さん) is ○○○-△△△△-□□□□."
To realize this dialogue, the administrator of the automatic voice response apparatus first uses the editing means 11 to enter into the common reading database 12 the text (1), speech recognition reading (2), and speech synthesis reading (3) shown in Table 2 below.
[0042]
[Table 2]

[0043]
The grammar generation unit 13 creates a grammar based on the contents shown in Table 1 and registers the created grammar in the grammar information section 5. This is described with reference to FIG. 2. FIG. 2 is a diagram illustrating an example of the processing in the grammar generation unit of the automatic voice response apparatus according to the present invention.
[0044]
As shown in FIG. 2, the grammar generation unit 13 first extracts from the common reading database 12 the speech recognition readings corresponding to names 1 and 2 and identification numbers 1 to 3. It then sets entry information for them (step S1).
[0045]
The entry information is an identifier for locating, in the common reading database, the speech synthesis reading that corresponds to the extracted speech recognition reading. In this example, the entry information combines the name of the common reading database 12, the name number, and the identification number.
[0046]
Next, the grammar generation unit 13 registers each extracted speech recognition reading in the grammar information section 5 as a word to be used in speech recognition. It also registers the entry information in the grammar information section 5 as the slot information (return value) to be returned when that reading is recognized (step S2). Table 3 below shows an example of the words and slot information registered in the grammar information section 5.
[0047]
[Table 3]

[0048]
In this example, the common reading database 12 is named "user". For example, "(tableno user-1-1)" therefore denotes identification number 1 of name 1 in the common reading database "user".
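Steps S1 and S2 might look like this in Python. Several details are assumptions made for the sketch. These are the nested layout of the database (name number, then identification number), its contents beyond the user-1-1 entry, and the exact slot spelling "tableno <database>-<name>-<id>" with a single space.

```python
from typing import Dict, List, Tuple

# {name number: {identification number: (recognition readings, synthesis reading)}}
CommonDB = Dict[int, Dict[int, Tuple[Tuple[str, ...], str]]]

# Hypothetical contents of the common reading database named "user".
COMMON_DB: CommonDB = {
    1: {
        1: (("ながた",), "ナガタ"),
        2: (("おさだ",), "オサダ"),
    },
}


def make_entry_info(db_name: str, name_no: int, id_no: int) -> str:
    """Step S1: combine database name, name number and identification number."""
    return f"{db_name}-{name_no}-{id_no}"


def generate_grammar(db_name: str, db: CommonDB) -> List[Tuple[str, str]]:
    """Steps S1-S2: produce (word, slot) pairs for the grammar information section 5."""
    grammar: List[Tuple[str, str]] = []
    for name_no, rows in sorted(db.items()):
        for id_no, (recognition_readings, _synthesis) in sorted(rows.items()):
            entry_info = make_entry_info(db_name, name_no, id_no)
            for reading in recognition_readings:
                # Every recognition reading of an entry returns the same entry information.
                grammar.append((reading, f"tableno {entry_info}"))
    return grammar


for word, slot in generate_grammar("user", COMMON_DB):
    print(word, slot)
# ながた tableno user-1-1
# おさだ tableno user-1-2
```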
[0049]
As described above, in the present invention the grammar generation unit 13 generates a grammar for each word newly registered in the common reading database 12. Speech recognition processing can therefore be performed on newly registered words, which makes the above dialogue example possible. The process that the automatic voice response apparatus performs to realize this dialogue example is described below with reference to FIG. 3.
[0050]
FIG. 3 is a diagram showing an example of a process in the automatic voice response apparatus of the present invention. As shown in FIG. 3, when the input voice from the above dialogue example (user: “Mr. Nagata”) is received (step S11), the speech recognition processing unit 1 performs speech recognition processing and outputs the recognition information shown in Table 4 below to the dialog processing unit 2 (step S12).
[0051]
[Table 4]

[0052]
As can be seen from Table 3, the word recognized by the speech recognition processing unit 1 is “Nagata”, which is registered in the common reading database 12, so the slot information output as part of the recognition information includes the entry information described above. If the recognized word is not registered in the common reading database 12, the same kind of return value as in the conventional apparatus is output as slot information.
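The distinction drawn here, between slot information that carries entry information and a conventional return value, can be sketched as a small parser that also performs the extraction later done in step S15. The regular expression is an assumption that accepts both spellings used in this example, “(tableno user-1-1)” and “tableno user-1-1”; a result of `None` means the word is not in the common reading database.

```python
from __future__ import annotations
import re

# Optional parentheses around "tableno <entry information>".
_TABLENO = re.compile(r"^\(?tableno ([^)\s]+)\)?$")


def extract_entry_info(slot: str) -> str | None:
    """Return entry information such as 'user-1-1', or None for a conventional return value."""
    m = _TABLENO.match(slot)
    return m.group(1) if m else None
```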
[0053]
Next, the dialog processing unit 2, having received the recognition information, determines a prompt corresponding to it (step S13). Specifically, the dialog processing unit 2 obtains from the slot information “tableno user-1-1” the name “Nagata” of the person whose telephone number the user requested, and refers to the application database 6 to obtain Nagata's telephone number. The dialog processing unit 2 then determines the prompt based on this information.
[0054]
Next, the dialog processing unit 2 extracts a prompt sentence corresponding to the determined prompt from the prompt database 7 (step S14). In this example, a plurality of template prompts are registered in the prompt database 7, and the corresponding one is extracted from them. Alternatively, the dialog processing unit 2 may construct the Japanese prompt sentence from scratch. The prompt sentence extracted from the prompt database 7 here is "Yes, the contact information of [user name] is [telephone number]", in which [user name] and [telephone number] are written as variables.
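Filling the template's variables can be sketched with a regular-expression substitution, assuming variables are always written in square brackets as in this example. The function name `fill_template` is hypothetical. Variables with no supplied value are left in place so that a later step, such as the speech synthesis reading replacement, can fill them.

```python
from __future__ import annotations
import re

TEMPLATE = "Yes, the contact information of [user name] is [telephone number]"


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [variable] that has a value; leave the others untouched."""
    return re.sub(r"\[([^\]]+)\]", lambda m: values.get(m.group(1), m.group(0)), template)
```

For instance, supplying only a telephone number fills [telephone number] and leaves [user name] for step S16.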
[0055]
Next, in this example, because the slot information includes entry information of the common reading database 12, the dialog processing unit 2 extracts the entry information from the slot information (step S15). The dialog processing unit 2 also refers to the application database 6 and replaces [telephone number] in the prompt sentence with the actual value.
[0056]
Further, based on the extracted entry information “user-1-1”, the dialog processing unit 2 extracts from the common reading database “user” the speech synthesis reading “Nagata” registered under name number 1 and identification number 1, and adds it to the prompt sentence (step S16).
[0057]
In this case, the prompt sentence becomes, for example, "Yes, the contact information of Nagata (utterance: Nagata) is ○○○-△△△△-□□□□.", in which [user name] has been replaced by the corresponding speech synthesis reading (speech synthesis reading replacement processing). This prompt sentence is output to the speech synthesis processing unit 3 as prompt information. Note that the notation “Nagata (utterance: Nagata)” depends on how the speech synthesis engine in use specifies the reading of a word, so it should be written in whatever form that engine expects.
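Because the reading notation depends on the engine, the replacement can be sketched with one formatter per engine. The first formatter reproduces this example's “(utterance: reading)” notation; the second uses the `sub` element with an `alias` attribute from W3C SSML 1.0 as one real-world alternative that the source does not mention. The written form “長田” is an assumed notation for illustration, and all function names are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def example_markup(notation: str, reading: str) -> str:
    """Notation used in this example, e.g. 'Nagata (utterance: Nagata)'."""
    return f"{notation} (utterance: {reading})"


def ssml_markup(notation: str, reading: str) -> str:
    """SSML 1.0 <sub>: the engine speaks the alias instead of the written text."""
    return f"<sub alias={quoteattr(reading)}>{escape(notation)}</sub>"


FORMATTERS = {"example": example_markup, "ssml": ssml_markup}


def replace_user_name(prompt: str, notation: str, reading: str, engine: str) -> str:
    """Speech synthesis reading replacement for the [user name] variable."""
    return prompt.replace("[user name]", FORMATTERS[engine](notation, reading))
```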
[0058]
Thereafter, the speech synthesis processing unit 3 performs speech synthesis processing based on the prompt information (step S17) and sends out the output voice (step S18), completing the response to the user.
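The whole flow of FIG. 3 can be tied together in one function. This is a sketch only: recognition and synthesis are stubs (a real system would decode and produce audio), all data is hypothetical, keying the application database by the written name is an assumption, and the lookups run in a slightly different order from the step numbers because the name must be known before the telephone number can be fetched.

```python
from __future__ import annotations

# Hypothetical data mirroring the running example only.
COMMON_READING_DB = {"user": {(1, 1): ("Nagata", "Nagata")}}  # (text, synthesis reading)
GRAMMAR = {"nagata san": "(tableno user-1-1)"}                 # word -> slot information
APPLICATION_DB = {"Nagata": "000-1111-2222"}                   # text -> telephone number
PROMPT_DB = {"contact": "Yes, the contact information of [user name] is [telephone number]"}


def recognize(utterance: str) -> dict[str, str]:
    """Stub for steps S11-S12; a real recognizer would decode audio using the grammar."""
    return {"word": utterance, "slot": GRAMMAR[utterance]}


def synthesize(prompt_info: str) -> str:
    """Stub for steps S17-S18; a real engine would produce audio, not a tagged string."""
    return f"<speech>{prompt_info}</speech>"


def respond(utterance: str) -> str:
    info = recognize(utterance)                                      # S11-S12
    entry_info = info["slot"][len("(tableno "):-1]                   # S15: 'user-1-1'
    db_name, name_no, id_no = entry_info.rsplit("-", 2)
    notation, reading = COMMON_READING_DB[db_name][(int(name_no), int(id_no))]
    phone = APPLICATION_DB[notation]                                 # S13: application database
    prompt = PROMPT_DB["contact"]                                    # S14: template prompt
    prompt = prompt.replace("[telephone number]", phone)
    prompt = prompt.replace("[user name]", f"{notation} (utterance: {reading})")  # S16
    return synthesize(prompt)                                        # S17-S18
```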
[0059]
As described above, in the present invention the grammar for speech recognition and the dictionary for speech synthesis are unified, so the dictionary does not need to be managed twice; only the common reading database needs to be managed. In addition, even when both "Nagata-san" and "Osada-san" are registered in the common reading database, the dialog processing unit creates the prompt information based on the entry information that specifies the speech synthesis reading (3) corresponding to the input voice, so the apparatus can respond faithfully to the input voice.
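The benefit can be sketched with two hypothetical entries that share the written form “長田” (assumed here for illustration) but differ in reading. Because the grammar returns entry information rather than the written form, the reply reading follows what the user actually said.

```python
from __future__ import annotations

# Hypothetical entries: (text, recognition reading, synthesis reading).
COMMON_READING_DB = {
    "user": {
        (1, 1): ("長田", "nagata", "Nagata"),
        (2, 1): ("長田", "osada", "Osada"),
    }
}

# Grammar: recognition reading -> entry information, as steps S1-S2 would produce.
GRAMMAR = {
    recognition: f"user-{name_no}-{id_no}"
    for (name_no, id_no), (_text, recognition, _synthesis) in COMMON_READING_DB["user"].items()
}


def reply_reading(recognized_word: str) -> str:
    """Pick the synthesis reading via the entry information, never via the written form."""
    db_name, name_no, id_no = GRAMMAR[recognized_word].rsplit("-", 2)
    return COMMON_READING_DB[db_name][(int(name_no), int(id_no))][2]
```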
[0060]
In this example, slot information including the entry information is output as the recognition information. In the present invention, however, slot information including the speech synthesis reading registered in the common reading database 12 may be output as the recognition information instead of the entry information. In that case, the dialog processing unit 2 can add the speech synthesis reading to the prompt sentence without accessing the common reading database 12 as it does in step S16 of FIG. 3, which improves processing speed. The grammar generation unit 13 then sets slot information including the speech synthesis reading in step S1 of FIG. 2.
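The variant can be sketched by letting the grammar's slot carry the speech synthesis reading itself, so that building the prompt needs no further database lookup. The dictionary-based slot format, the data, and the function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical database "user": (name number, identification number) -> (recognition, synthesis)
COMMON_READING_DB = {"user": {(1, 1): ("nagata", "Nagata"), (2, 1): ("osada", "Osada")}}


def generate_grammar_with_reading(db_name: str) -> dict[str, dict[str, str]]:
    """Variant of step S1: the slot holds the speech synthesis reading, not entry information."""
    return {
        recognition: {"synthesis_reading": synthesis}
        for recognition, synthesis in COMMON_READING_DB[db_name].values()
    }


def build_prompt(template: str, slot: dict[str, str]) -> str:
    """Fill [user name] straight from the slot; no database access is needed (cf. step S16)."""
    return template.replace("[user name]", slot["synthesis_reading"])
```

The trade-off is that the grammar now duplicates the synthesis readings, so it must be regenerated whenever a reading in the common reading database changes.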
[0061]
The automatic voice response system of the present invention can be realized by installing in a computer a program that implements steps S1 and S2 shown in FIG. 2 and steps S11 to S18 shown in FIG. 3, and executing that program. In this case, the processing of the speech recognition processing unit 1, the dialog processing unit 2, the speech synthesis processing unit 3, and the grammar generation unit 13 is performed by the CPU (central processing unit) of the computer.
[0062]
Further, in the present invention, the acoustic model 4, the grammar information unit 5, the application database 6, the prompt database 7, the basic reading dictionary 8, the waveform dictionary 9, the language dictionary 10, and the common reading database 12 can be realized by storing the data files that constitute them in a storage device, such as a hard disk, provided in the computer, or by mounting a recording medium storing those data files in a reading device connected to the computer.
[0063]
Since the data structure of the common reading database 12 is not complicated, it can be written as a text file. In this case, an ordinary text editor can be used as the editing means 11.
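Loading such a text file can be sketched with Python's standard `csv` module. The tab-separated layout (name number, identification number, text, speech recognition reading, speech synthesis reading, with `#` comment lines) is an assumption; the source does not specify a file format.

```python
from __future__ import annotations
import csv
import io

SAMPLE = (
    "# name_no\tid_no\ttext\trecognition\tsynthesis\n"
    "1\t1\tNagata\tnagata\tNagata\n"
    "2\t1\tOsada\tosada\tOsada\n"
)


def load_common_reading_db(text: str) -> dict[tuple[int, int], tuple[str, str, str]]:
    """Parse tab-separated lines into (name no, id no) -> (text, recognition, synthesis)."""
    db: dict[tuple[int, int], tuple[str, str, str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        name_no, id_no, notation, recognition, synthesis = row
        db[(int(name_no), int(id_no))] = (notation, recognition, synthesis)
    return db
```

A real file would be opened with `open(path, encoding="utf-8", newline="")` and its contents passed in, which keeps the database editable in any ordinary text editor.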
[0064]
【The invention's effect】
As described above, according to the present invention, the dictionary for speech synthesis and the grammar for speech recognition can be managed centrally, which reduces management cost. It is also possible to provide an automatic voice response apparatus that responds faithfully to the input voice, for example answering "Mr. Nagata" when the user says "Mr. Nagata" and "Mr. Osada" when the user says "Mr. Osada".
[Brief description of the drawings]
FIG. 1 is a configuration diagram illustrating an example of an automatic voice response device according to the present invention.
FIG. 2 is a diagram illustrating an example of a process in a grammar generation unit included in the automatic voice response device according to the present invention.
FIG. 3 is a diagram showing an example of processing in the automatic voice response device of the present invention.
FIG. 4 is a diagram showing a configuration of a conventional automatic voice response device.
[Explanation of symbols]
1 Speech recognition processing part
2 Dialog processing part
3 Speech synthesis processing part
4 Acoustic model
5 Grammar information part
6 Application database
7 Prompt database
8 Basic reading dictionary
9 Waveform dictionary
10 Language dictionary
11 Editing means
12 Common reading database
13 Grammar generator

Claims

An automatic voice response device comprising: a speech recognition processing unit that performs speech recognition processing and outputs recognition information; a dialog processing unit that determines a prompt corresponding to the recognition information and creates prompt information from the recognition information and the determined prompt; a speech synthesis processing unit that performs speech synthesis processing based on the prompt information; and a common reading database in which at least one word is registered together with a speech recognition reading and a speech synthesis reading, wherein
when the speech recognition processing unit recognizes a word registered in the common reading database, it outputs the recognition information including an identifier for specifying, in the common reading database, the speech synthesis reading of the recognized word, and
the dialog processing unit extracts the speech synthesis reading of the recognized word based on the identifier and creates prompt information including the extracted speech synthesis reading.

The automatic voice response device according to claim 1, wherein the speech recognition processing unit, instead of outputting recognition information including an identifier for specifying, in the common reading database, the speech synthesis reading of the recognized word, outputs recognition information including the speech synthesis reading of the recognized word, and
the dialog processing unit, instead of extracting the speech synthesis reading of the recognized word and creating prompt information including the extracted speech synthesis reading, creates prompt information including the speech synthesis reading of the word contained in the recognition information.

The automatic voice response device according to claim 1, further comprising a grammar generation unit that creates a grammar for speech recognition using the speech recognition readings registered in the common reading database and adds to the grammar an identifier for specifying, in the common reading database, the speech synthesis reading corresponding to each speech recognition reading, wherein
the speech recognition processing unit uses the grammar generated by the grammar generation unit to create the recognition information including the identifier for specifying, in the common reading database, the speech synthesis reading of the recognized word.

The automatic voice response device according to claim 2, further comprising a grammar generation unit that creates a grammar for speech recognition using the speech recognition readings registered in the common reading database and adds to the grammar the speech synthesis reading corresponding to each speech recognition reading, wherein
the speech recognition processing unit uses the grammar generated by the grammar generation unit to create the recognition information including the speech synthesis reading of the recognized word.

The automatic voice response device according to claim 1, further comprising editing means for editing the contents of the common reading database.

An automatic voice response method in which speech recognition processing is performed, a prompt corresponding to the recognition information obtained by the speech recognition processing is determined, prompt information is created from the recognition information and the determined prompt, and speech synthesis processing is performed based on the prompt information, the method comprising:
(a) when a word recognized in the speech recognition processing is registered in a common reading database in which at least one word is registered together with a speech recognition reading and a speech synthesis reading, outputting the recognition information including an identifier for specifying, in the common reading database, the speech synthesis reading of the recognized word; and
(b) extracting the speech synthesis reading of the word recognized in the speech recognition processing based on the identifier, and creating prompt information including the extracted speech synthesis reading.

The automatic voice response method according to claim 6, wherein, in step (a), instead of outputting recognition information including an identifier for specifying, in the common reading database, the speech synthesis reading of the word recognized in the speech recognition processing, recognition information including the speech synthesis reading of the recognized word is output, and
in step (b), instead of extracting the speech synthesis reading of the recognized word and creating prompt information including the extracted speech synthesis reading, prompt information including the speech synthesis reading of the word contained in the recognition information is created.

A program for causing a computer to execute an automatic voice response method in which speech recognition processing is performed, a prompt corresponding to the recognition information obtained by the speech recognition processing is determined, prompt information is created from the recognition information and the determined prompt, and speech synthesis processing is performed based on the prompt information, the program causing the computer to execute:
(a) a step of, when a word recognized in the speech recognition processing is registered in a common reading database in which at least one word is registered together with a speech recognition reading and a speech synthesis reading, outputting the recognition information including an identifier for specifying, in the common reading database, the speech synthesis reading of the recognized word; and
(b) a step of extracting the speech synthesis reading of the word recognized in the speech recognition processing based on the identifier, and creating prompt information including the extracted speech synthesis reading.

The program according to claim 8, wherein, in step (a), instead of outputting recognition information including an identifier for specifying, in the common reading database, the speech synthesis reading of the word recognized in the speech recognition processing, recognition information including the speech synthesis reading of the recognized word is output, and
in step (b), instead of extracting the speech synthesis reading of the recognized word and creating prompt information including the extracted speech synthesis reading, prompt information including the speech synthesis reading of the word contained in the recognition information is created.