JP3576511B2

JP3576511B2 - Voice interaction device

Info

Publication number: JP3576511B2
Application number: JP2001284377A
Authority: JP
Inventors: 和也野村
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2001-09-19
Filing date: 2001-09-19
Publication date: 2004-10-13
Anticipated expiration: 2021-09-19
Also published as: JP2003091297A

Description

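【０００１】
[Technical Field of the Invention]
The present invention relates to a voice interaction device that uses voice recognition and voice synthesis technologies.
【０００２】
[Prior Art]
In recent years, voice interaction devices, which achieve a desired purpose by recognizing speech and conducting a dialog, have been installed in a variety of equipment. For example, they are installed in navigation devices to assist operation without manual input.
【０００３】
When a voice interaction device of this kind is installed in a navigation device, for example, it can display or set a destination by continuing a dialog with the user, as shown in FIG. 23.
【０００４】
[Problems to Be Solved by the Invention]
However, with such a conventional voice interaction device, when the item search function for destination setting in a navigation device is used to search for ○○ Golf Course in Chiba Prefecture, as shown in FIG. 23, the dialog cannot continue unless the user knows that ○○ Golf Course is in Chiba Prefecture, and the search fails.
【０００５】
In short, when the user has no answer to a question asked by the device, the dialog is interrupted; and when the user can give only an ambiguous answer, a wrong answer prevents the appropriate voice recognition dictionary from being selected, so the user's purpose cannot be achieved.
【０００６】
The present invention has been made to solve these problems, and provides a voice interaction device that can continue the dialog and achieve the purpose even when the user cannot accurately answer a question asked by the device.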
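【０００７】
[Means for Solving the Problems]
The voice interaction device of the present invention conducts a dialog in response to speech uttered by a user, and comprises: voice recognition means for recognizing the input speech; dictionary storage means in which a voice recognition dictionary is stored for each level of the dialog; dictionary preparation means for preparing a dictionary suited to the dialog; response voice output means for outputting a response voice that prompts the user to speak; and dialog control means that creates the next response voice using the input speech and, when the speech has content indicating that something is unknown, causes the dictionary preparation means to prepare the voice recognition dictionary so that the next response voice is created using a past response voice or previously input speech, and causes the response voice output means to output the response voice.
【０００８】
With this configuration, instructions are issued according to the recognized words (speech) of the user, so that a voice recognition dictionary is prepared from the dictionary storage means, speech prompting the user to speak (a question voice, a response voice, or the like) is output, and the dialog continues. When the content of the recognized speech does not allow the next instruction to be determined, a voice recognition dictionary effective for continuing the dialog is prepared from the dictionary storage means, and a question voice or response voice is then output so that the dialog continues. Therefore, even when the user cannot respond or can give only an ambiguous answer, the dialog can continue without interruption and the purpose can be achieved.
【０００９】
In the voice interaction device of the present invention, the dictionary preparation means prepares the dictionary needed for the dialog by selecting and combining voice recognition dictionaries in the dictionary storage means, and, when the speech recognized by the voice recognition means has content indicating that something is unknown, the dialog control means causes the dictionary preparation means to select, combine, and prepare the voice recognition dictionaries in the dictionary storage means for the levels to which the dialog may transition next.
【００１０】
With this configuration, when the recognized speech means that the user cannot answer because something is unknown, so that the next instruction cannot be determined, all voice recognition dictionaries that may be used if the dialog continues are selected from the dictionary storage means and combined, and a question voice or response voice is then output so that the dialog continues. Therefore, even when the user cannot respond because the user does not know the answer, the dialog can continue without interruption and the purpose can be achieved.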
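【００１１】
In the voice interaction device of the present invention, the dictionary preparation means prepares the dictionary needed for the dialog by selecting a voice recognition dictionary in the dictionary storage means, and, when the speech recognized by the voice recognition means has content indicating that something is unknown, the dialog control means causes the response voice output means to output speech asking a different question and causes the dictionary preparation means to select and prepare, from the dictionary storage means, the voice recognition dictionary needed for the dialog.
【００１２】
With this configuration, when the recognized speech means that the user cannot answer because something is unknown, so that the next instruction cannot be determined, a different question is asked and the voice recognition dictionary corresponding to its answer is selected from the dictionary storage means and prepared, after which a question voice or response voice is output and the dialog continues. Therefore, even when the user cannot respond because the user does not know the answer, the dialog can continue without interruption and the purpose can be achieved.
【００１３】
In the voice interaction device of the present invention, the dictionary preparation means prepares the dictionary needed for the dialog by selecting and combining voice recognition dictionaries in the dictionary storage means. When the speech recognized by the voice recognition means has, for the first time, content meaning that something is unknown, the dialog control means causes the response voice output means to output a different response voice and causes the dictionary preparation means to select and prepare, from the dictionary storage means, the voice recognition dictionary needed for the dialog. When the user's recognized speech again has content indicating that something is unknown, the dialog control means causes the dictionary preparation means to select, combine, and prepare the voice recognition dictionaries in the dictionary storage means for all levels to which the dialog may transition next.
【００１４】
With this configuration, the first time the device recognizes user speech whose meaning is that the user cannot answer because something is unknown, so that the next instruction cannot be determined, a different question is asked, the voice recognition dictionary corresponding to its answer is selected from the dictionary storage means and prepared, and a question voice or response voice is output so that the dialog continues. If the next recognition result is again unknown and the instruction still cannot be determined, all voice recognition dictionaries that may be used if the dialog continues are selected from the dictionary storage means and combined, and a question voice or response voice is then output so that the dialog continues. Therefore, even when the user repeatedly cannot respond because the user does not know the answer, the dialog can continue without interruption and the purpose can be achieved.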
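【００１５】
In the voice interaction device of the present invention, the dictionary preparation means prepares the dictionary needed for the dialog by selecting and combining voice recognition dictionaries in the dictionary storage means, and, when the speech recognized by the voice recognition means has content indicating ambiguity, the dialog control means causes the dictionary preparation means to prepare the voice recognition dictionary so that the next response voice is created using a past response voice or previously input speech.
【００１６】
With this configuration, when the recognized speech is an ambiguous answer, so that the next instruction cannot be determined, the voice recognition dictionary that corresponds to the answer with the words expressing ambiguity removed is selected from the dictionary storage means together with conceptually close voice recognition dictionaries, these are combined, and a question voice or response voice is then output so that the dialog continues. Therefore, even when the user does not know the exact answer, the dialog can continue without interruption and the purpose can be achieved.
【００１７】
In the voice interaction device of the present invention, the dictionary preparation means prepares the dictionary needed for the dialog by selecting and combining voice recognition dictionaries in the dictionary storage means, and, when the speech recognized by the voice recognition means contains plural contents, the dialog control means causes the dictionary preparation means to select, combine, and prepare the plural voice recognition dictionaries in the dictionary storage means that correspond to the respective contents.
【００１８】
With this configuration, when the recognized speech contains plural contents, so that the next instruction cannot be determined, the voice recognition dictionaries corresponding to the respective contents are selected from the dictionary storage means and combined, and a question voice or response voice is then output so that the dialog continues. Therefore, even when the user does not know the exact answer, the dialog can continue without interruption and the purpose can be achieved.
【００１９】
In the voice interaction device of the present invention, the dictionary preparation means prepares the dictionary needed for the dialog by selecting one voice recognition dictionary in the dictionary storage means. When the speech recognized by the voice recognition means contains plural contents, the dialog control means causes the dictionary preparation means to select and prepare the voice recognition dictionary in the dictionary storage means that corresponds to one of those contents, causes the response voice output means to output speech asking a question, and checks from the speech then recognized by the voice recognition means whether the dialog is correct. If it is wrong, the dialog control means causes the dictionary preparation means to select and prepare the voice recognition dictionary in the dictionary storage means that corresponds to another content contained in the speech.
【００２０】
With this configuration, when the recognized speech contains plural contents, so that the next instruction cannot be determined, the voice recognition dictionary corresponding to one content is first selected from the dictionary storage means and prepared, and a further question is asked to confirm whether that content was the right one. If it was wrong, the voice recognition dictionary corresponding to another content is selected from the dictionary storage means and prepared, and a question voice or response voice is then output so that the dialog continues. Therefore, even when the user does not know the exact answer, the dialog can continue without interruption and the purpose can be achieved.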
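【００２１】
[Embodiments of the Invention]
The present invention is described below with reference to the drawings. FIGS. 1 to 6 show a first embodiment of the voice interaction device of the present invention.
【００２２】
First, the device configuration is described. In FIG. 1, the voice interaction device 10 comprises: a voice recognition unit 11 that recognizes speech input by the user; a dialog control unit 12 that controls the spoken dialog with the user; a dictionary storage unit 13 that stores, for all dialog levels, the voice recognition dictionaries needed at each dialog level (type, progress, and so on); a dictionary selection/combination unit (dictionary preparation means) 14 that, on command from the dialog control unit 12, creates the voice recognition dictionary used by the voice recognition unit 11 by selecting and combining one or more of the voice recognition dictionaries stored in the dictionary storage unit 13; a response voice output unit 15 that, on command from the dialog control unit 12, utters a question voice or response voice prompting the user to speak; a response voice storage unit 16 that stores the plural voices used by the response voice output unit 15; an unknown-expression word dictionary 17 in which words meaning that something is unknown are registered as entries; and an unknown-expression word determination unit 18 that, in response to a query from the dialog control unit 12, refers to the unknown-expression word dictionary 17 and determines whether the voice recognition result expresses that something is unknown. The device is installed in a navigation device and assists operations such as searching and destination setting through voice input.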
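【００２３】
Next, the processing operation of the voice interaction device of the present invention is described using the dialog flow diagram shown in FIG. 2.
【００２４】
First, when a voice dialog is started at the instruction of the user (the user of the navigation device), the dialog control unit 12 commands the dictionary selection/combination unit 14 to create a dictionary containing words that represent search genres. In response, the dictionary selection/combination unit 14 creates, from the voice recognition dictionary storage unit 13, a voice recognition dictionary containing words that represent search genres, as shown in FIG. 3.
【００２５】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to speak. In response, the response voice output unit 15 selects the message "How may I help you?" from the response voice storage unit 16 and presents it to the user.
【００２６】
Next, the dialog control unit 12 commands the voice recognition unit 11 to perform voice recognition using the dictionary created by the dictionary selection/combination unit 14. The user, having heard "How may I help you?", says "Facility search." to the voice interaction device 10 in order to search for a facility. The input speech is recognized by the voice recognition unit 11, and "Facility search." is selected as the command and output to the dialog control unit 12. Based on this result, the dialog control unit 12 commands the dictionary selection/combination unit 14 to select a dictionary that contains, in addition to words representing search genres, words such as "I don't know." that the user may say when the user does not know the type of facility. In response, the dictionary selection/combination unit 14 selects (creates), from the voice recognition dictionary storage unit 13, a voice recognition dictionary containing words representing search genres and words such as "I don't know.", as shown in FIG. 4.
【００２７】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the type of facility. In response, the response voice output unit 15 selects the message "Please say the type of facility." from the response voice storage unit 16 and presents it to the user.
【００２８】
Next, the dialog control unit 12 commands the voice recognition unit 11 to perform voice recognition using the dictionary created by the dictionary selection/combination unit 14. The user, having heard "Please say the type of facility.", says "Golf course." to the voice interaction device 10 as the genre to search. The input speech is recognized by the voice recognition unit 11, and "Golf course." is selected as the search genre.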
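【００２９】
Next, to narrow down the location of the golf course, the dialog control unit 12 commands the dictionary selection/combination unit 14 to create a dictionary made up of prefecture names and words such as "I don't know." that the user may say when the user does not know the prefecture in which the golf course is located. In response, the dictionary selection/combination unit 14 creates, from the voice recognition dictionary storage unit 13, a voice recognition dictionary made up of prefecture names and words such as "I don't know.", as shown in FIG. 5.
【００３０】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the prefecture in which the golf course is located. In response, the response voice output unit 15 selects the message "Please say the name of the prefecture where the golf course is located." from the response voice storage unit 16 and presents it to the user. The user, not knowing the prefecture, says "I don't know." to the voice interaction device 10. The input speech is recognized by the voice recognition unit 11, and "I don't know." is selected as the recognition result.
【００３１】
Next, the dialog control unit 12 outputs this result to the unknown-expression word determination unit 18. The unknown-expression word determination unit 18 receives the result, refers to the unknown-expression word dictionary 17, whose entries are words indicating that something is unknown as shown in FIG. 6, determines whether the result is a word expressing that something is unknown, and outputs the determination to the dialog control unit 12. In this case, "I don't know." is determined to be such a word, so the dialog control unit 12 commands the dictionary selection/combination unit 14 to create a dictionary that combines all of the golf course dictionaries, which are divided by prefecture. In response, the dictionary selection/combination unit 14 retrieves all of the per-prefecture golf course dictionaries from the voice recognition dictionary storage unit 13 and combines them into one voice recognition dictionary.
【００３２】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the golf course. In response, the response voice output unit 15 selects the message "Please say the name of the golf course." from the response voice storage unit 16 and presents it to the user. The user says the golf course name "○○ Golf Course." to the voice interaction device 10. The input speech is recognized by the voice recognition unit 11, "○○ Golf Course." is selected as the recognition result, and the search target is determined.
【００３３】
Next, the dialog control unit 12 commands the response voice output unit 15 to present the determined search target "○○ Golf Course." to the user. In response, the response voice output unit 15 combines content stored in the response voice storage unit 16 with "○○ Golf Course." to create the message "Displaying the map of ○○ Golf Course." and presents it to the user.
【００３４】
Through the above operations, a map of the searched destination or the like can be shown on the display screen of the navigation device.
【００３５】
Thus, in the first embodiment, the dictionary selection/combination unit 14 and the unknown-expression word determination unit 18 are provided. Even when the next instruction cannot be determined because the user, not knowing, for example, the prefecture in which the golf course is located, answers a question in the dialog with "I don't know." or the like, the device retrieves and combines all of the per-prefecture golf course dictionaries that would be used if the dialog continued, creates a voice recognition dictionary from them, and performs voice recognition. The search target can therefore be determined without interrupting the flow of the voice dialog. Consequently, even when the user responds without knowing the answer, the dialog can continue without interruption and the purpose can be achieved.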
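The fallback of the first embodiment can be illustrated with a minimal Python sketch. It is not the patented implementation: the facility names, the English stand-ins for the unknown-expression entries, and the function names `is_unknown` and `prepare_name_dictionary` are all hypothetical, and plain string lists stand in for real voice recognition dictionaries.

```python
# Hypothetical per-prefecture golf course dictionaries (placeholder names).
GOLF_BY_PREFECTURE = {
    "Chiba": ["Alpha Golf Club"],
    "Tokyo": ["Beta Golf Club"],
    "Saitama": ["Gamma Golf Club"],
}

# Stand-in for the unknown-expression word dictionary 17 (cf. FIG. 6).
UNKNOWN_EXPRESSIONS = {"i don't know", "not sure", "no idea"}


def normalize(utterance: str) -> str:
    """Trim whitespace and the trailing period of a recognition result."""
    return utterance.strip().rstrip(".")


def is_unknown(utterance: str) -> bool:
    """Plays the role of the unknown-expression word determination unit 18."""
    return normalize(utterance).lower() in UNKNOWN_EXPRESSIONS


def prepare_name_dictionary(prefecture_answer: str) -> list:
    """Plays the role of the dictionary selection/combination unit 14."""
    if is_unknown(prefecture_answer):
        # The prefecture is unknown: combine every per-prefecture dictionary.
        merged = []
        for names in GOLF_BY_PREFECTURE.values():
            merged.extend(names)
        return sorted(merged)
    # The prefecture is known: use only that prefecture's dictionary.
    return sorted(GOLF_BY_PREFECTURE.get(normalize(prefecture_answer), []))
```

Calling `prepare_name_dictionary("I don't know.")` yields the merged list of all three placeholder courses, while `prepare_name_dictionary("Chiba.")` yields only the Chiba entry. The merged dictionary is larger, so the dialog keeps going at the cost of a wider recognition vocabulary.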
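【００３６】
Next, FIGS. 7 to 9 show a second embodiment of the voice interaction device of the present invention. Because the second embodiment is configured in substantially the same way as the first embodiment described above, the drawings are reused, like components are given the same reference numerals, and only the distinctive parts are described.
【００３７】
First, the device configuration is described. In FIG. 7, the voice interaction device 20 comprises the voice recognition unit 11, the dialog control unit 12, the dictionary storage unit 13, the response voice output unit 15, the response voice storage unit 16, the unknown-expression word dictionary 17, and the unknown-expression word determination unit 18, and, in place of the dictionary selection/combination unit 14 of the first embodiment, is provided with a dictionary selection unit (dictionary preparation means) 24. On command from the dialog control unit 12, the dictionary selection unit 24 creates the voice recognition dictionary used by the voice recognition unit 11 by selecting one of the voice recognition dictionaries stored in the dictionary storage unit 13.
【００３８】
Next, the processing operation of the voice interaction device of the present invention is described using the dialog flow diagram shown in FIG. 8.
【００３９】
First, as in the first embodiment, a voice dialog is started at the user's instruction. In response to the message "How may I help you?", the user says "Facility search.", one of the search genres shown in FIG. 3. When "Facility search." is selected as the command, a voice recognition dictionary containing words representing search genres and words such as "I don't know.", as shown in FIG. 4, is created, and the message "Please say the type of facility." is presented to the user.
【００４０】
The user, having heard "Please say the type of facility." but not knowing the word for the genre to search, says "I don't know." to the voice interaction device 20. The input speech is recognized by the voice recognition unit 11, and "I don't know." is selected as the recognition result.
【００４１】
Next, the dialog control unit 12 outputs this result to the unknown-expression word determination unit 18. The unknown-expression word determination unit 18 receives the result, refers to the unknown-expression word dictionary 17, whose entries are words indicating that something is unknown as shown in FIG. 6, determines whether the result is a word expressing that something is unknown, and outputs the determination to the dialog control unit 12. In this case, "I don't know." is determined to be such a word, so, to narrow down the location of the facility, the dialog control unit 12 commands the dictionary selection unit 24 to select a dictionary made up of prefecture names. In response, the dictionary selection unit 24 selects, from the voice recognition dictionary storage unit 13, a voice recognition dictionary made up of prefecture names, as shown in FIG. 9.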
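【００４２】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the prefecture in which the facility is located. In response, the response voice output unit 15 selects the message "Please say the name of the prefecture where the facility is located." from the response voice storage unit 16 and presents it to the user. The user says the prefecture name "Chiba Prefecture." to the voice interaction device 20. The input speech is recognized by the voice recognition unit 11, and "Chiba Prefecture." is selected as the recognition result. Based on this result, the dialog control unit 12 commands the dictionary selection unit 24 to select a dictionary made up of the facilities of all genres in Chiba Prefecture. In response, the dictionary selection unit 24 selects that dictionary from the voice recognition dictionary storage unit 13.
【００４３】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the facility. In response, the response voice output unit 15 selects the message "Please say the name of the facility in Chiba Prefecture." from the response voice storage unit 16 and presents it to the user. The user says the facility name "○○ Golf Course." to the voice interaction device 20. The input speech is recognized by the voice recognition unit 11, "○○ Golf Course." is selected as the recognition result, and the search target is determined.
【００４４】
Next, the dialog control unit 12 commands the response voice output unit 15 to present the determined search target "○○ Golf Course." to the user. In response, the response voice output unit 15 combines content stored in the response voice storage unit 16 with "○○ Golf Course." to create the message "Displaying the map of ○○ Golf Course." and presents it to the user.
【００４５】
Through the above operations, a map of the searched destination or the like can be shown on the display screen of the navigation device.
【００４６】
Thus, in the second embodiment, the dictionary selection unit 24 and the unknown-expression word determination unit 18 are provided. Even when the next instruction cannot be determined because the user, not knowing, for example, the name of the facility genre, answers a question in the dialog with "I don't know." or the like, if the user knows the prefecture, voice recognition is performed using a per-prefecture dictionary that contains facilities of all genres. The search target can therefore be determined without interrupting the flow of the voice dialog. Consequently, even when the user responds without knowing the answer, the dialog can continue without interruption and the purpose can be achieved.
【００４７】
As another mode of the second embodiment, as shown in FIG. 10, when "I don't know." is input repeatedly, first in response to "Please say the type of facility." and then also in response to "Please say the name of the prefecture where the facility is located.", the dictionary selection unit 24 is made to select a dictionary made up of facilities of all genres, and the message "Please say the name of the facility." is presented to the user. The user then says the facility name "○○ Golf Course." to the voice interaction device 20, "○○ Golf Course." is determined as the search target, and the message "Displaying the map of ○○ Golf Course." is presented to the user.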
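The two behaviours of the second embodiment (ask a different question after the first "I don't know.", then fall back to an all-genre dictionary if the user is again unsure) can be sketched as follows. This is a minimal illustration under stated assumptions: the facility table, the prompt helper `next_prompt`, and `name_dictionary` are hypothetical, and filtering one table stands in for selecting a single pre-built dictionary from the dictionary storage unit 13.

```python
# Hypothetical facility table keyed by (prefecture, genre); names are placeholders.
FACILITIES = {
    ("Chiba", "golf course"): ["Alpha Golf Club"],
    ("Chiba", "zoo"): ["Delta Zoo"],
    ("Tokyo", "golf course"): ["Beta Golf Club"],
}

# Stand-in for the unknown-expression word dictionary 17.
UNKNOWN_EXPRESSIONS = {"i don't know"}


def normalize(utterance: str) -> str:
    return utterance.strip().rstrip(".")


def is_unknown(utterance: str) -> bool:
    return normalize(utterance).lower() in UNKNOWN_EXPRESSIONS


def next_prompt(genre_answer: str) -> str:
    """Choose the follow-up question after the genre question."""
    if is_unknown(genre_answer):
        # Genre unknown: ask a different question instead of stopping.
        return "Please say the name of the prefecture where the facility is located."
    genre = normalize(genre_answer).lower()
    return f"Please say the name of the prefecture where the {genre} is located."


def name_dictionary(genre_answer: str, prefecture_answer: str) -> list:
    """Pick the facility-name vocabulary; unknown answers widen the selection."""
    genre = None if is_unknown(genre_answer) else normalize(genre_answer).lower()
    prefecture = None if is_unknown(prefecture_answer) else normalize(prefecture_answer)
    return sorted(
        name
        for (pref, gen), names in FACILITIES.items()
        if (prefecture is None or pref == prefecture) and (genre is None or gen == genre)
        for name in names
    )
```

With the genre unknown and the prefecture given as "Chiba.", the vocabulary covers every genre in Chiba; with both answers unknown, it covers every facility in the table, which corresponds to the variant shown in FIG. 10.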
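【００４８】
Next, FIGS. 11 to 14 show a third embodiment of the voice interaction device of the present invention. Because the third embodiment is configured in substantially the same way as the first embodiment described above, the drawings are reused, like components are given the same reference numerals, and only the distinctive parts are described.
【００４９】
First, the device configuration is described. In FIG. 11, the voice interaction device 30 comprises the voice recognition unit 11, the dialog control unit 12, the dictionary storage unit 13, the dictionary selection/combination unit 14, the response voice output unit 15, and the response voice storage unit 16. In place of the unknown-expression word dictionary 17 and the unknown-expression word determination unit 18 of the first embodiment, it is provided with an ambiguous-expression word dictionary 37 and an ambiguous-expression word determination unit 38, and it is additionally provided with a distance calculation unit 39.
【００５０】
The ambiguous-expression word dictionary 37 has words meaning ambiguity registered as entries. The ambiguous-expression word determination unit 38, in response to a query from the dialog control unit 12, refers to the ambiguous-expression word dictionary 37 and determines whether the voice recognition result expresses ambiguity.
【００５１】
The distance calculation unit 39 calculates whether a region lies within a specific distance and selects the range to be searched. For example, when Tokyo is specified, it selects the neighboring prefectures of Chiba, Saitama, Kanagawa, and Yamanashi.
【００５２】
Next, the processing operation of the voice interaction device of the present invention is described using the dialog flow diagram shown in FIG. 12.
【００５３】
First, as in the first embodiment, a voice dialog is started at the user's instruction. In response to the message "How may I help you?", the user says "Facility search.", one of the search genres shown in FIG. 3. A voice recognition dictionary of search genres as shown in FIG. 4 is then created, and the message "Please say the type of facility." is presented to the user. The user says "Golf course." to the voice interaction device 30, the input speech is recognized by the voice recognition unit 11, and "Golf course." is selected as the search genre.
【００５４】
Then, to narrow down the location of the golf course, the dialog control unit 12 commands the dictionary selection/combination unit 14 to create a dictionary made up of prefecture names and words that follow a prefecture name, such as "I think" (kanaa) and "or thereabouts" (no atari), which the user may say when the user knows the prefecture only vaguely. In response, the dictionary selection/combination unit 14 selects from the voice recognition dictionary storage unit 13 and creates a voice recognition dictionary in which, as shown in FIG. 13, the prefecture names form the main dictionary and the words that follow them, such as "I think" and "or thereabouts", form a connected dictionary.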
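【００５５】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the prefecture in which the golf course is located. In response, the response voice output unit 15 selects the message "Please say the name of the prefecture where the golf course is located." from the response voice storage unit 16 and presents it to the user. At the N-th dialog level (where N is a natural number; the same applies below), the user, being unsure of the prefecture, says "Tokyo, I think." to the voice interaction device 30. The input speech is recognized by the voice recognition unit 11, and "Tokyo, I think." is selected as the recognition result.
【００５６】
Next, the dialog control unit 12 outputs this result to the ambiguous-expression word determination unit 38. The ambiguous-expression word determination unit 38 receives the result, refers to the ambiguous-expression word dictionary 37, whose entries are words indicating ambiguity as shown in FIG. 14, determines whether the result contains a word expressing ambiguity, and outputs the determination to the dialog control unit 12. In this case, "I think" is determined to be a word expressing ambiguity. Based on this determination, the dialog control unit 12 commands the distance calculation unit 39 to calculate the distance of each prefecture from "Tokyo.", which is the recognition result with the ambiguity word removed, and to select the prefectures close to Tokyo. In response, the distance calculation unit 39 selects Chiba, Saitama, Kanagawa, and Yamanashi, and commands the dictionary selection/combination unit 14 to combine the golf course dictionaries of these four prefectures plus Tokyo. The dictionary selection/combination unit 14 then retrieves the dictionaries of golf courses in Chiba, Saitama, Kanagawa, Yamanashi, and Tokyo from the voice recognition dictionary storage unit 13, combines them, and creates the voice recognition dictionary for the (N+1)-th dialog level.
【００５７】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the golf course. In response, the response voice output unit 15 selects the message "Please say the name of the golf course." from the response voice storage unit 16 and presents it to the user. The user says the golf course name "○○ Golf Course." to the voice interaction device 30. The input speech is recognized by the voice recognition unit 11, "○○ Golf Course." is selected as the recognition result, and the search target is determined.
【００５８】
Next, the dialog control unit 12 commands the response voice output unit 15 to present the determined search target "○○ Golf Course." to the user. In response, the response voice output unit 15 combines content stored in the response voice storage unit 16 with "○○ Golf Course." to create the message "Displaying the map of ○○ Golf Course." and presents it to the user.
【００５９】
Through the above operations, a map of the searched destination or the like can be shown on the display screen of the navigation device.
【００６０】
Thus, in the third embodiment, the dictionary selection/combination unit 14 and the ambiguous-expression word determination unit 38 are provided. Even when the next instruction cannot be determined because the user, knowing the prefecture of the golf course only vaguely, answers a question in the dialog with "Tokyo, I think." or the like, the device creates a voice recognition dictionary that combines the golf course dictionaries of the prefectures geographically close to Tokyo and performs voice recognition with it. The search target can therefore be determined without interrupting the flow of the voice dialog. Consequently, even when the user gives an ambiguous response without knowing the exact answer, the dialog can continue without interruption and the purpose can be achieved.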
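The third embodiment's handling of a hedged answer can be sketched in Python as below. This is an illustration only: the hedge suffixes are English stand-ins for the entries of the ambiguous-expression word dictionary 37, the distance figures are rough illustrative values rather than measured data, the 150 km threshold is an arbitrary assumption, and `split_hedge` and `prefectures_to_search` are hypothetical names.

```python
# English stand-ins for hedging words such as "kanaa" and "no atari".
HEDGE_SUFFIXES = (", I think", " or thereabouts")

# Illustrative (not measured) distances in km from Tokyo to other prefectures.
DISTANCE_KM = {
    "Tokyo": {"Chiba": 30, "Saitama": 25, "Kanagawa": 30, "Yamanashi": 100, "Osaka": 400},
}


def split_hedge(utterance: str):
    """Return (core answer, True if a hedging suffix was removed)."""
    text = utterance.strip().rstrip(".")
    for suffix in HEDGE_SUFFIXES:
        if text.endswith(suffix):
            return text[: -len(suffix)], True
    return text, False


def prefectures_to_search(utterance: str, limit_km: float = 150) -> list:
    """Plays the role of units 38 and 39: widen the search only for hedged answers."""
    core, hedged = split_hedge(utterance)
    if not hedged:
        return [core]
    nearby = [p for p, d in DISTANCE_KM.get(core, {}).items() if d <= limit_km]
    return sorted(nearby + [core])
```

For "Tokyo, I think." the function returns Tokyo plus the four nearby prefectures, whose golf course dictionaries would then be combined for the next dialog level; for a plain "Tokyo." it returns Tokyo alone.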
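【００６１】
Next, FIGS. 15 to 18 show a fourth embodiment of the voice interaction device of the present invention. Because the fourth embodiment is configured in substantially the same way as the third embodiment described above, the drawings are reused, like components are given the same reference numerals, and only the distinctive parts are described.
【００６２】
First, the device configuration is described. In FIG. 15, the voice interaction device 40 comprises the voice recognition unit 11, the dialog control unit 12, the dictionary storage unit 13, the dictionary selection/combination unit 14, the response voice output unit 15, the response voice storage unit 16, the ambiguous-expression word dictionary 37, and the ambiguous-expression word determination unit 38, and, in addition to the configuration of the third embodiment, is provided with a concept dictionary table 47 and a similar-concept selection unit 48.
【００６３】
The concept dictionary table 47 is set in advance and associates similar concepts with one another, as shown in FIG. 17. The similar-concept selection unit 48 refers to the concept dictionary table 47 and decides which concept to adopt.
【００６４】
Next, the processing operation of the voice interaction device of the present invention is described using the dialog flow diagram shown in FIG. 16.
【００６５】
First, as in the third embodiment, a voice dialog is started at the user's instruction. In response to the message "How may I help you?", the user says "Facility search.", one of the search genres shown in FIG. 3, and "Facility search." is selected as the command.
【００６６】
When "Facility search." is specified, the dialog control unit 12, to narrow down the type of facility, commands the dictionary selection/combination unit 14 to create a dictionary made up of facility names and words that follow them, such as "I think" and "or thereabouts", which the user may say when the user knows the type of facility only vaguely. In response, the dictionary selection/combination unit 14 selects from the voice recognition dictionary storage unit 13 and creates a voice recognition dictionary in which, as shown in FIG. 18, the facility names form the main dictionary and the words that follow them, such as "I think" and "or thereabouts", form a connected dictionary.
【００６７】
Next, the dialog control unit 12 commands the voice recognition unit 11 to perform voice recognition using the dictionary created by the dictionary selection/combination unit 14, and commands the response voice output unit 15 to output a message prompting the user to say the type of facility. In response, the response voice output unit 15 selects the message "Please say the type of facility.", which is the message for the N-th dialog level, from the response voice storage unit 16 and presents it to the user. At the N-th dialog level, the user says "Zoo, I think." to the voice interaction device 40 as the genre to search. The input speech is recognized by the voice recognition unit 11, and "Zoo, I think." is selected as the recognition result.
【００６８】
Next, the dialog control unit 12 outputs this result to the ambiguous-expression word determination unit 38. The ambiguous-expression word determination unit 38 receives the result, refers to the ambiguous-expression word dictionary 37, whose entries are words indicating ambiguity as shown in FIG. 14, determines whether the result contains a word expressing ambiguity, and outputs the determination to the dialog control unit 12. In this case, "I think" is determined to be a word expressing ambiguity. Based on this determination, the dialog control unit 12 has the similar-concept selection unit 48 find a search genre conceptually close to "Zoo.", which is the recognition result with the ambiguity word removed. The similar-concept selection unit 48 refers to the concept dictionary table 47 shown in FIG. 17, decides on "Amusement park.", and outputs this result to the dialog control unit 12. The dialog control unit 12 stores this result together with the recognition result "Zoo.".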
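【００６９】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the prefecture in which the facility is located. In response, the response voice output unit 15 selects the message "Please say the name of the prefecture where the facility is located." from the response voice storage unit 16 and presents it to the user. The user says "Osaka Prefecture." to the voice interaction device 40. The input speech is recognized by the voice recognition unit 11, and "Osaka Prefecture." is selected as the recognition result.
【００７０】
Next, the dialog control unit 12 commands the dictionary selection/combination unit 14 to combine the dictionaries of the two previously stored genres, "Zoo." and "Amusement park.". In response, the dictionary selection/combination unit 14 retrieves the facility dictionaries for zoos and amusement parks in Osaka Prefecture from the voice recognition dictionary storage unit 13, combines them, and creates the voice recognition dictionary for the (N+1)-th dialog level.
【００７１】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the facility. In response, the response voice output unit 15 selects the message "Please say the name of the facility." from the response voice storage unit 16 and presents it to the user. The user says "○○ Park.", the name of an amusement park that the user vaguely remembered as a zoo, to the voice interaction device 40. The input speech is recognized by the voice recognition unit 11, "○○ Park." is selected as the recognition result, and the search target is determined.
【００７２】
Next, the dialog control unit 12 commands the response voice output unit 15 to present the determined search target "○○ Park." to the user. In response, the response voice output unit 15 combines content stored in the response voice storage unit 16 with "○○ Park." to create the message "Displaying the map of ○○ Park." and presents it to the user.
【００７３】
Through the above operations, a map of the searched destination or the like can be shown on the display screen of the navigation device.
【００７４】
Thus, in the fourth embodiment, the dictionary selection/combination unit 14, the ambiguous-expression word determination unit 38, and the similar-concept selection unit 48 are provided. Even when the next instruction cannot be determined because the user, knowing the type of facility only vaguely, answers a question in the dialog with "Zoo, I think." or the like, the device creates a voice recognition dictionary that also includes the dictionary for amusement parks, a facility type similar to zoos, and performs voice recognition with it. The search target can therefore be determined without interrupting the flow of the voice dialog. Consequently, even when the user gives an ambiguous response without knowing the exact answer, the dialog can continue without interruption and the purpose can be achieved.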
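A minimal sketch of the fourth embodiment's concept lookup follows. The table contents, facility names, and the functions `genres_to_search` and `name_dictionary` are hypothetical; only the zoo/amusement park pairing comes from the description, and the real layout of the concept dictionary table 47 (FIG. 17) is not reproduced here.

```python
# English stand-ins for hedging words registered in the ambiguous-expression dictionary.
HEDGE_SUFFIXES = (", I think", " or thereabouts")

# Hypothetical concept dictionary table: genre -> conceptually close genres.
CONCEPT_TABLE = {"zoo": ["amusement park"], "amusement park": ["zoo"]}

# Hypothetical facility dictionaries keyed by (prefecture, genre).
FACILITIES = {
    ("Osaka", "zoo"): ["Alpha Zoo"],
    ("Osaka", "amusement park"): ["Omega Park"],
    ("Osaka", "golf course"): ["Sigma Golf Club"],
}


def split_hedge(utterance: str):
    """Return (core answer, True if a hedging suffix was removed)."""
    text = utterance.strip().rstrip(".")
    for suffix in HEDGE_SUFFIXES:
        if text.endswith(suffix):
            return text[: -len(suffix)], True
    return text, False


def genres_to_search(genre_utterance: str) -> list:
    """For a hedged genre, add the similar genres from the concept table."""
    core, hedged = split_hedge(genre_utterance)
    core = core.lower()
    return ([core] + CONCEPT_TABLE.get(core, [])) if hedged else [core]


def name_dictionary(genre_utterance: str, prefecture: str) -> list:
    """Combine the facility dictionaries of every selected genre in one prefecture."""
    names = []
    for genre in genres_to_search(genre_utterance):
        names.extend(FACILITIES.get((prefecture, genre), []))
    return sorted(names)
```

Here `name_dictionary("Zoo, I think.", "Osaka")` returns both the zoo and the amusement park entries, so a park that the user misremembered as a zoo can still be recognized.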
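【００７５】
Next, FIGS. 19 and 20 show a fifth embodiment of the voice interaction device of the present invention. Because the fifth embodiment is configured in substantially the same way as the first embodiment described above, the drawings are reused, like components are given the same reference numerals, and only the distinctive parts are described.
【００７６】
First, the device configuration is described. In FIG. 19, the voice interaction device 50 comprises the voice recognition unit 11, the dialog control unit 12, the dictionary storage unit 13, the dictionary selection/combination unit 14, the response voice output unit 15, and the response voice storage unit 16, and, in place of the unknown-expression word dictionary 17 and the unknown-expression word determination unit 18 of the first embodiment, is provided with a multiple-result determination unit 58. In response to a query from the dialog control unit 12, the multiple-result determination unit 58 determines whether the voice recognition result contains more than one result.
【００７７】
Next, the processing operation of the voice interaction device of the present invention is described using the dialog flow diagram shown in FIG. 20. The example here assumes that the user wants to search for ○○ Golf Course in Chiba Prefecture but remembers its location only vaguely and believes it is in either Tokyo or Chiba Prefecture.
【００７８】
First, as in the first embodiment, a voice dialog is started at the user's instruction. In response to the message "How may I help you?", the user says "Facility search.", one of the search genres shown in FIG. 3. A voice recognition dictionary of search genres as shown in FIG. 4 is then created, and the message "Please say the type of facility." is presented to the user. The user says "Golf course." to the voice interaction device 50, the input speech is recognized by the voice recognition unit 11, and "Golf course." is selected as the search genre.
【００７９】
Then, to narrow down the location of the golf course, the dialog control unit 12 commands the dictionary selection/combination unit 14 to create a dictionary made up of words representing prefecture names. In response, the dictionary selection/combination unit 14 creates, from the voice recognition dictionary storage unit 13, a voice recognition dictionary made up of prefecture names, as shown in FIG. 9.
【００８０】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the prefecture in which the golf course is located. In response, the response voice output unit 15 selects the message "Please say the name of the prefecture where the golf course is located." from the response voice storage unit 16 and presents it to the user. The user, being unsure of the prefecture, says "Tokyo or Chiba Prefecture." to the voice interaction device 50. The input speech is recognized by the voice recognition unit 11, and "Tokyo or Chiba Prefecture." is obtained as the recognition result.
【００８１】
Next, the dialog control unit 12 outputs this result to the multiple-result determination unit 58. The multiple-result determination unit 58 receives the result, determines that the recognition result contains two prefecture names, Tokyo and Chiba Prefecture, and outputs this determination to the dialog control unit 12. Based on the determination, the dialog control unit 12 commands the dictionary selection/combination unit 14 to create a dictionary that combines the dictionary of golf courses in Tokyo with the dictionary of golf courses in Chiba Prefecture. In response, the dictionary selection/combination unit 14 retrieves these two dictionaries from the voice recognition dictionary storage unit 13 and combines them to create the voice recognition dictionary.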
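【００８２】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the golf course. In response, the response voice output unit 15 selects the message "Please say the name of the golf course." from the response voice storage unit 16 and presents it to the user. The user says the golf course name "○○ Golf Course." to the voice interaction device 50. The input speech is recognized by the voice recognition unit 11, "○○ Golf Course." is selected as the recognition result, and the search target is determined.
【００８３】
Next, the dialog control unit 12 commands the response voice output unit 15 to present the determined search target "○○ Golf Course." to the user. In response, the response voice output unit 15 combines content stored in the response voice storage unit 16 with "○○ Golf Course." to create the message "Displaying the map of ○○ Golf Course." and presents it to the user.
【００８４】
Through the above operations, a map of the searched destination or the like can be shown on the display screen of the navigation device.
【００８５】
Thus, in the fifth embodiment, the dictionary selection/combination unit 14 and the multiple-result determination unit 58 are provided. Even when the next instruction cannot be determined because the user, not knowing the prefecture of the golf course well, answers a question in the dialog with "Tokyo or Chiba Prefecture.", the device retrieves the dictionary of golf courses in Tokyo and the dictionary of golf courses in Chiba Prefecture, combines them into a voice recognition dictionary, and performs voice recognition. The search target can therefore be determined without interrupting the flow of the voice dialog. Consequently, even when the user gives an ambiguous response without knowing the exact answer, the dialog can continue without interruption and the purpose can be achieved.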
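The fifth embodiment's treatment of an answer that names two prefectures can be sketched as follows. The parsing on the word "or" is a simplification chosen for this example (the description does not say how the multiple-result determination unit 58 detects plural results), and the dictionary contents and function names are hypothetical.

```python
# Hypothetical per-prefecture golf course dictionaries (placeholder names).
GOLF_BY_PREFECTURE = {
    "Tokyo": ["Beta Golf Club"],
    "Chiba": ["Alpha Golf Club"],
    "Saitama": ["Gamma Golf Club"],
}


def prefectures_in(utterance: str) -> list:
    """Stand-in for the multiple-result determination unit 58."""
    parts = utterance.strip().rstrip(".").split(" or ")
    # Keep only the parts that are known prefecture names, in spoken order.
    return [p.strip() for p in parts if p.strip() in GOLF_BY_PREFECTURE]


def name_dictionary(utterance: str) -> list:
    """Combine the dictionaries of every prefecture named in the answer."""
    names = []
    for prefecture in prefectures_in(utterance):
        names.extend(GOLF_BY_PREFECTURE[prefecture])
    return sorted(names)
```

For "Tokyo or Chiba." the combined vocabulary contains the Tokyo and Chiba entries but not the Saitama one, which keeps the recognition vocabulary smaller than the merge-everything fallback of the first embodiment.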
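【００８６】
Next, FIGS. 21 and 22 show a sixth embodiment of the voice interaction device of the present invention. Because the sixth embodiment is configured in substantially the same way as the fifth embodiment described above, the drawings are reused, like components are given the same reference numerals, and only the distinctive parts are described.
【００８７】
First, the device configuration is described. In FIG. 21, the voice interaction device 60 comprises the voice recognition unit 11, the dialog control unit 12, the dictionary storage unit 13, the response voice output unit 15, the response voice storage unit 16, and the multiple-result determination unit 58. In place of the dictionary selection/combination unit 14 of the fifth embodiment, it uses the dictionary selection unit 24 adopted in the second embodiment, and it is additionally provided with an input storage unit 61.
【００８８】
Here, the voice recognition unit 11 recognizes the speech input by the user and at the same time outputs the result of acoustic analysis of that input speech to the input storage unit 61. The input storage unit 61 stores the input speech output from the voice recognition unit 11, or the acoustic analysis result of that input speech.
【００８９】
On command from the dialog control unit 12, the dictionary selection unit 24 creates the voice recognition dictionary used by the voice recognition unit 11 by selecting one of the voice recognition dictionaries stored in the dictionary storage unit 13.
【００９０】
Next, the processing operation of the voice interaction device of the present invention is described using the dialog flow diagram shown in FIG. 22.
【００９１】
First, as in the fifth embodiment, a voice dialog is started at the user's instruction. In response to the message "How may I help you?", the user says "Facility search.", one of the search genres shown in FIG. 3. Then, in response to the message "Please say the type of facility.", the user says "Golf course.", one of the search genres shown in FIG. 4. Because the search genre is "Golf course.", a voice recognition dictionary as shown in FIG. 9 is created and the message "Please say the name of the prefecture where the golf course is located." is presented to the user. At the N-th dialog level, the user, being unsure of the prefecture, says "Tokyo or Chiba Prefecture." to the voice interaction device 60. The input speech is recognized by the voice recognition unit 11, and "Tokyo or Chiba Prefecture." is obtained as the recognition result.
【００９２】
The multiple-result determination unit 58 then determines that the recognition result from the voice recognition unit 11 contains two prefecture names, Tokyo and Chiba Prefecture. Based on this determination, the dialog control unit 12 first commands the dictionary selection unit 24 to select and create the dictionary of golf courses in Tokyo. In response, the dictionary selection unit 24 retrieves the dictionary of golf courses in Tokyo from the voice recognition dictionary storage unit 13 and creates the voice recognition dictionary for the (N+1)-th dialog level.
【００９３】
Next, the dialog control unit 12 commands the response voice output unit 15 to output a message prompting the user to say the name of the golf course. In response, the response voice output unit 15 selects the message "Please say the name of the golf course." from the response voice storage unit 16 and presents it to the user.
【００９４】
Next, the dialog control unit 12 commands the voice recognition unit 11 to perform voice recognition using the dictionary created by the dictionary selection unit 24. The user, having heard the preceding message, says the golf course name "○○ Golf Course." to the voice interaction device 60. The input speech is recognized by the voice recognition unit 11, and "×× Golf Course." is obtained as the recognition result. At the same time, the user's utterance of "○○ Golf Course." is output to the input storage unit 61 and stored there, either as the input speech itself or as the result of acoustic analysis of that input speech.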
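【００９５】
Next, the dialog control unit 12 commands the response voice output unit 15 to present the recognition result "×× Golf Course." to the user. In response, the response voice output unit 15 combines content stored in the response voice storage unit 16 with "×× Golf Course." to create the message "Is it ×× Golf Course?" and presents it to the user.
【００９６】
Next, the dialog control unit 12 commands the voice recognition unit 11 to perform voice recognition using the dictionary created by the dictionary selection unit 24. Because this is not the expected result, the user says "No." to the voice interaction device 60. The input speech is recognized by the voice recognition unit 11, and "No." is obtained as the recognition result.
【００９７】
Next, because the earlier dialog based on selecting Tokyo was wrong, the dialog control unit 12 commands the dictionary selection unit 24 to create the dictionary of golf courses in Chiba Prefecture. In response, the dictionary selection unit 24 retrieves the dictionary of golf courses in Chiba Prefecture from the voice recognition dictionary storage unit 13 and creates the voice recognition dictionary.
【００９８】
Next, the dialog control unit 12 commands the voice recognition unit 11 to retrieve the earlier input from the input storage unit 61 and perform voice recognition on it using the dictionary created by the dictionary selection unit 24, and "○○ Golf Course." is obtained as the recognition result.
【００９９】
Next, the dialog control unit 12 commands the response voice output unit 15 to present "○○ Golf Course." to the user. In response, the response voice output unit 15 combines content stored in the response voice storage unit 16 with "○○ Golf Course." to create the message "Displaying the map of ○○ Golf Course." and presents it to the user.
【０１００】
Through the above operations, a map of the searched destination or the like can be shown on the display screen of the navigation device.
【０１０１】
Thus, in the sixth embodiment, the dictionary selection unit 24, the multiple-result determination unit 58, and the input storage unit 61 are provided. Even when the next instruction cannot be determined because the user, not knowing the prefecture of the golf course well, answers a question in the dialog with "Tokyo or Chiba Prefecture.", the device obtains, in turn, the voice recognition result using the dictionary of golf courses in Tokyo and the voice recognition result using the dictionary of golf courses in Chiba Prefecture, and presents each result. The search target can therefore be determined without interrupting the flow of the voice dialog. Consequently, even when the user gives an ambiguous response without knowing the exact answer, the dialog can continue without interruption and the purpose can be achieved.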
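The sequential retry of the sixth embodiment can be sketched as below. Several simplifications are assumptions of this example: a text string stands in for the stored input speech or its acoustic analysis result, `difflib.get_close_matches` from the Python standard library stands in for the voice recognizer, the user's "No." is modeled by a `confirm` callback, and the course names are placeholders.

```python
import difflib

# Hypothetical per-prefecture golf course dictionaries (placeholder names).
GOLF_BY_PREFECTURE = {
    "Tokyo": ["Maple Golf Club"],
    "Chiba": ["Cedar Golf Club"],
}


def recognize(stored_input: str, dictionary: list) -> str:
    """Toy recognizer: return the dictionary entry closest to the stored input."""
    # cutoff=0.0 makes the best-scoring entry come back even for a poor match,
    # mimicking a recognizer that must pick something from its dictionary.
    return difflib.get_close_matches(stored_input, dictionary, n=1, cutoff=0.0)[0]


def resolve(stored_input: str, candidate_prefectures: list, confirm):
    """Try one prefecture dictionary at a time, reusing the stored input."""
    for prefecture in candidate_prefectures:
        candidate = recognize(stored_input, GOLF_BY_PREFECTURE[prefecture])
        if confirm(candidate):  # e.g. the user answers "Yes." to "Is it ...?"
            return candidate
    return None  # every candidate was rejected
```

A possible use: pass the stored utterance, the prefectures `["Tokyo", "Chiba"]`, and a callback that asks the user "Is it ...?". The user is asked to say the name only once; when the Tokyo candidate is rejected, the same stored input is recognized again against the Chiba dictionary.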
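【０１０２】
[Effects of the Invention]
As described above, according to the present invention, when the content of the recognized user speech does not allow the next instruction to be determined, a voice recognition dictionary effective for continuing the dialog is prepared from the dictionary storage means, so that a question voice or response voice can then be output and the dialog continued. The invention therefore provides a voice interaction device with the excellent effect that, even when the user cannot respond or can give only an ambiguous answer, the dialog can continue without interruption and the purpose can be achieved.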
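[Brief Description of the Drawings]
[FIG. 1] Block diagram showing the schematic overall configuration of the first embodiment of the voice interaction device of the present invention
[FIG. 2] Dialog flow diagram explaining the processing operation of the first embodiment of the voice interaction device of the present invention
[FIG. 3] Conceptual diagram showing a voice recognition dictionary used by the first embodiment of the voice interaction device of the present invention
[FIG. 4] Conceptual diagram showing a voice recognition dictionary used by the first embodiment of the voice interaction device of the present invention
[FIG. 5] Conceptual diagram showing a voice recognition dictionary used by the first embodiment of the voice interaction device of the present invention
[FIG. 6] Conceptual diagram showing a voice recognition dictionary used by the first embodiment of the voice interaction device of the present invention
[FIG. 7] Block diagram showing the schematic overall configuration of the second embodiment of the voice interaction device of the present invention
[FIG. 8] Dialog flow diagram explaining the processing operation of the second embodiment of the voice interaction device of the present invention
[FIG. 9] Conceptual diagram showing a voice recognition dictionary used by the second embodiment of the voice interaction device of the present invention
[FIG. 10] Dialog flow diagram explaining the processing operation of another mode of the second embodiment of the voice interaction device of the present invention
[FIG. 11] Block diagram showing the schematic overall configuration of the third embodiment of the voice interaction device of the present invention
[FIG. 12] Dialog flow diagram explaining the processing operation of the third embodiment of the voice interaction device of the present invention
[FIG. 13] Conceptual diagram showing a voice recognition dictionary used by the third embodiment of the voice interaction device of the present invention
[FIG. 14] Conceptual diagram showing a voice recognition dictionary used by the third embodiment of the voice interaction device of the present invention
[FIG. 15] Block diagram showing the schematic overall configuration of the fourth embodiment of the voice interaction device of the present invention
[FIG. 16] Dialog flow diagram explaining the processing operation of the fourth embodiment of the voice interaction device of the present invention
[FIG. 17] Conceptual diagram showing a voice recognition dictionary used by the fourth embodiment of the voice interaction device of the present invention
[FIG. 18] Conceptual diagram showing a voice recognition dictionary used by the fourth embodiment of the voice interaction device of the present invention
[FIG. 19] Block diagram showing the schematic overall configuration of the fifth embodiment of the voice interaction device of the present invention
[FIG. 20] Dialog flow diagram explaining the processing operation of the fifth embodiment of the voice interaction device of the present invention
[FIG. 21] Block diagram showing the schematic overall configuration of the sixth embodiment of the voice interaction device of the present invention
[FIG. 22] Dialog flow diagram explaining the processing operation of the sixth embodiment of the voice interaction device of the present invention
[FIG. 23] Dialog flow diagram explaining the processing operation of the prior art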
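[Description of Reference Numerals]
10 to 60 Voice interaction device
11 Voice recognition unit
12 Dialog control unit
13 Voice recognition dictionary storage unit
14 Dictionary selection/combination unit
15 Response voice output unit
16 Response voice storage unit
17 Unknown-expression word dictionary
18 Unknown-expression word determination unit
24 Dictionary selection unit
37 Ambiguous-expression word dictionary
38 Ambiguous-expression word determination unit
39 Distance calculation unit
47 Concept dictionary table
48 Similar-concept selection unit
58 Multiple-result determination unit
61 Input storage unit
[0001]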
[Technical Field of the Invention]
The present invention relates to a voice interaction device that uses voice recognition and voice synthesis technologies.
[0002]
[Prior Art]
In recent years, voice interaction devices, which achieve a desired purpose by recognizing speech and conducting a dialog, have been installed in a variety of equipment. For example, they are installed in navigation devices to assist operation without manual input.
[0003]
When a voice interaction device of this kind is installed in a navigation device, for example, it can display or set a destination by continuing a dialog with the user, as shown in FIG. 23.
[0004]
[Problems to Be Solved by the Invention]
However, with such a conventional voice interaction device, when the item search function for destination setting in a navigation device is used to search for XX Golf Course in Chiba Prefecture, as shown in FIG. 23, the dialog cannot continue unless the user knows that XX Golf Course is in Chiba Prefecture, and the search fails.
[0005]
In short, when the user has no answer to a question asked by the device, the dialog is interrupted; and when the user can give only an ambiguous answer, a wrong answer prevents the appropriate voice recognition dictionary from being selected, so the user's purpose cannot be achieved.
[0006]
The present invention has been made to solve these problems, and provides a voice interaction device that can continue the dialog and achieve the purpose even when the user cannot accurately answer a question asked by the device.
[0007]
[Means for Solving the Problems]
The voice interaction device of the present invention conducts a dialog in response to speech uttered by a user, and comprises: voice recognition means for recognizing the input speech; dictionary storage means in which a voice recognition dictionary is stored for each level of the dialog; dictionary preparation means for preparing a dictionary suited to the dialog; response voice output means for outputting a response voice that prompts the user to speak; and dialog control means that creates the next response voice using the input speech and, when the speech has content indicating that something is unknown, causes the dictionary preparation means to prepare the voice recognition dictionary so that the next response voice is created using a past response voice or previously input speech, and causes the response voice output means to output the response voice.
[0008]
With this configuration, instructions are issued according to the recognized words (speech) of the user, so that a voice recognition dictionary is prepared from the dictionary storage means, speech prompting the user to speak (a question voice, a response voice, or the like) is output, and the dialog continues. When the content of the recognized speech does not allow the next instruction to be determined, a voice recognition dictionary effective for continuing the dialog is prepared from the dictionary storage means, and a question voice or response voice is then output so that the dialog continues. Therefore, even when the user cannot respond or can give only an ambiguous answer, the dialog can continue without interruption and the purpose can be achieved.
[0009]
In the voice interaction device of the present invention, the dictionary preparation means prepares the dictionary needed for the dialog by selecting and combining voice recognition dictionaries in the dictionary storage means, and, when the speech recognized by the voice recognition means has content indicating that something is unknown, the dialog control means causes the dictionary preparation means to select, combine, and prepare the voice recognition dictionaries in the dictionary storage means for the levels to which the dialog may transition next.
[0010]
With this configuration, when the recognized speech means that the user cannot answer because something is unknown, so that the next instruction cannot be determined, all voice recognition dictionaries that may be used if the dialog continues are selected from the dictionary storage means and combined, and a question voice or response voice is then output so that the dialog continues. Therefore, even when the user cannot respond because the user does not know the answer, the dialog can continue without interruption and the purpose can be achieved.
[0011]
In the voice interaction device of the present invention, the dictionary preparation means prepares the dictionary needed for the dialog by selecting a voice recognition dictionary in the dictionary storage means, and, when the speech recognized by the voice recognition means has content indicating that something is unknown, the dialog control means causes the response voice output means to output speech asking a different question and causes the dictionary preparation means to select and prepare, from the dictionary storage means, the voice recognition dictionary needed for the dialog.
[0012]
With this configuration, when the recognized speech means that the user cannot answer because something is unknown, so that the next instruction cannot be determined, a different question is asked and the voice recognition dictionary corresponding to its answer is selected from the dictionary storage means and prepared, after which a question voice or response voice is output and the dialog continues. Therefore, even when the user cannot respond because the user does not know the answer, the dialog can continue without interruption and the purpose can be achieved.
[0013]
In the voice interaction device of the present invention, the dictionary preparation means prepares the dictionary needed for the dialog by selecting and combining voice recognition dictionaries in the dictionary storage means. When the speech recognized by the voice recognition means has, for the first time, content meaning that something is unknown, the dialog control means causes the response voice output means to output a different response voice and causes the dictionary preparation means to select and prepare, from the dictionary storage means, the voice recognition dictionary needed for the dialog. When the user's recognized speech again has content indicating that something is unknown, the dialog control means causes the dictionary preparation means to select, combine, and prepare the voice recognition dictionaries in the dictionary storage means for all levels to which the dialog may transition next.
[0014]
With this configuration, the first time the device recognizes user speech whose meaning is that the user cannot answer because something is unknown, so that the next instruction cannot be determined, a different question is asked, the voice recognition dictionary corresponding to its answer is selected from the dictionary storage means and prepared, and a question voice or response voice is output so that the dialog continues. If the next recognition result is again unknown and the instruction still cannot be determined, all voice recognition dictionaries that may be used if the dialog continues are selected from the dictionary storage means and combined, and a question voice or response voice is then output so that the dialog continues. Therefore, even when the user repeatedly cannot respond because the user does not know the answer, the dialog can continue without interruption and the purpose can be achieved.
[0015]
The dictionary preparing means of the voice dialogue apparatus of the present invention prepares a dictionary necessary for dialogue by selectively combining the voice recognition dictionaries in the dictionary storage means, and the dialogue control means is recognized by the voice recognition means. When the content indicates that the sound is ambiguous, The voice recognition dictionary is provided to the dictionary preparing means so as to create a next response voice using the past response voice or the voice input in the past. It has a configuration to prepare.
[0016]
With such a configuration, when the content of the voice of the user whose speech has been recognized cannot be determined with the meaning of the ambiguous answer, the corresponding speech recognition dictionary is removed when the words representing the ambiguity are removed. In addition, a speech recognition dictionary that is conceptually similar is prepared by being selected from the dictionary storage means and combined, and then the dialogue is continued by outputting a question voice, a response voice, and the like. Therefore, even when the user does not know the correct answer, the dialog can be continued without interruption, and the object can be achieved.
[0017]
The dictionary preparing means of the voice dialogue apparatus of the present invention prepares a dictionary necessary for dialogue by selectively combining the voice recognition dictionaries in the dictionary storage means, and the dialogue control means is recognized by the voice recognition means. When the voice includes a plurality of contents, a plurality of the speech recognition dictionaries in the dictionary storage means corresponding to the plurality of contents are selectively coupled to the dictionary preparation means to prepare.
[0018]
With such a configuration, when the next instruction cannot be determined with the meaning of the user whose voice has been recognized and includes a plurality of contents, the voice recognition dictionary corresponding to each content is selected from the dictionary storage means and combined. Then, the dialogue is continued by outputting a question voice, a response voice, and the like. Therefore, even when the user does not know the correct answer, the dialog can be continued without interruption, and the object can be achieved.
[0019]
The dictionary preparing means of the voice dialogue apparatus of the present invention selects one of the voice recognition dictionaries in the dictionary storage means to prepare a dictionary necessary for a dialogue, and the dialogue control means recognizes the speech by the voice recognition means. When the input speech includes a plurality of contents, the dictionary preparation unit selects and prepares the speech recognition dictionary in the dictionary storage unit corresponding to one content included in the contents, and asks the response speech output unit to answer the question. By confirming the correctness of the dialogue with the voice recognized by the voice recognition unit, if the voice is recognized by the voice recognition unit, and if it is wrong, in the dictionary storage unit corresponding to other contents included in the voice The voice recognition dictionary is selected and prepared by the dictionary preparation means.
[0020]
With such a configuration, when the recognized voice of the user has a meaning that includes a plurality of contents and the next instruction therefore cannot be determined, a voice recognition dictionary corresponding to one of the contents is first selected and prepared from the dictionary storage means. A further question is then asked to confirm whether that content is correct. If it is incorrect, a voice recognition dictionary corresponding to another content is selected and prepared from the dictionary storage means, and the dialogue is continued by outputting a question voice, a response voice, and the like. Therefore, even when the user does not know the correct answer, the dialogue can be continued without interruption, and the object can be achieved.
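The one-at-a-time confirmation described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `select_by_confirmation`, the `confirm` callback (standing in for the question and the recognized yes/no answer), and the sample dictionary contents are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional


def select_by_confirmation(
    candidates: List[str],
    storage: Dict[str, List[str]],
    confirm: Callable[[str], bool],
) -> Optional[List[str]]:
    """Return the dictionary for the first candidate content the user confirms."""
    for content in candidates:
        if content not in storage:
            continue  # no dictionary is stored for this content
        # Select the single dictionary for this content, then ask a
        # confirmation question; fall through to the next content if
        # the user says it is wrong.
        if confirm(content):
            return storage[content]
    return None


# Hypothetical data: the user named two contents and the second is correct.
storage = {"Tokyo": ["AA golf course"], "Chiba": ["XX golf course"]}
asked: List[str] = []


def confirm(content: str) -> bool:
    asked.append(content)  # record the order in which questions were asked
    return content == "Chiba"


chosen = select_by_confirmation(["Tokyo", "Chiba"], storage, confirm)
```

In this sketch only one dictionary is ever active at a time, which matches a preparing means that selects rather than combines.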
[0021]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, the present invention will be described with reference to the drawings. FIG. 1 to FIG. 6 are views showing a first embodiment of the voice interaction apparatus of the present invention.
[0022]
First, the device configuration will be described. In FIG. 1, a voice interaction device 10 includes: a voice recognition unit 11 that recognizes a voice input by a user; a dialogue control unit 12 that controls the voice dialogue with the user; a dictionary storage unit 13 that stores the voice recognition dictionaries required for every dialogue layer, organized by dialogue layer (type, progress, and the like); a dictionary selection / combination unit (dictionary preparation means) 14 that creates the voice recognition dictionary used by the voice recognition unit 11 by selecting and combining one or more of the voice recognition dictionaries stored in the dictionary storage unit 13 according to a command from the dialogue control unit 12; a response voice output unit 15 that outputs a question voice or a response voice prompting the user to speak according to a command from the dialogue control unit 12; a response voice storage unit 16 that stores a plurality of voices used by the response voice output unit 15; an unknown expression word dictionary 17 in which words meaning "unknown" are registered as items; and an unknown expression word determination unit 18 that, in response to an inquiry from the dialogue control unit 12, refers to the unknown expression word dictionary 17 to determine whether or not the voice recognition result expresses that something is unknown. The voice interaction device 10 is mounted on a navigation device to assist operations such as search and destination setting by voice input.
[0023]
Next, the processing operation of the speech dialogue apparatus of the present invention will be described with reference to the flowchart of the dialogue shown in FIG.
[0024]
First, when a voice dialogue is started by an instruction of a user (a user of the navigation device), the dialogue control unit 12 instructs the dictionary selection / combination unit 14 to create a dictionary including words representing a genre of search. In response to this command, the dictionary selection / combination unit 14 creates, from the dictionary storage unit 13, a voice recognition dictionary including words representing a search genre as shown in FIG.
[0025]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to output a message prompting the user to speak. In response to this command, the response voice output unit 15 selects the message "What is your need?" from the response voice storage unit 16 and presents it to the user.
[0026]
Next, the dialogue control unit 12 instructs the voice recognition unit 11 to execute voice recognition using the dictionary created by the dictionary selection / combination unit 14. When the user who has heard the message "What is your need?" utters "Facility search." in order to search for a facility and inputs it to the voice interaction device 10, the input voice is recognized by the voice recognition unit 11, "Facility search." is selected as a command as the recognition result, and the result is output to the dialogue control unit 12. Based on this result, the dialogue control unit 12 instructs the dictionary selection / combination unit 14 to create a dictionary that includes words indicating the search genre together with words, such as "I do not know.", that the user may utter when the type of facility is not known. In response to this command, the dictionary selection / combination unit 14 creates, from the dictionary storage unit 13, a voice recognition dictionary including words indicating the search genre and words such as "I do not know." as shown in FIG.
[0027]
Next, the dialog control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the type of facility. In response to this command, the response voice output unit 15 selects a message “Please tell us the type of facility” from the response voice storage unit 16 and presents it to the user.
[0028]
Next, the dialogue control unit 12 instructs the voice recognition unit 11 to execute voice recognition using the dictionary created by the dictionary selection / combination unit 14. When the user who hears the message "Please tell us the type of facility." utters "Golf course.", the input voice is recognized by the voice recognition unit 11, and "Golf course." is selected as the search genre as the recognition result.
[0029]
Next, in order to narrow down the location of the golf course, the dialogue control unit 12 instructs the dictionary selection / combination unit 14 to create a dictionary composed of prefecture names and words, such as "I do not know.", that the user may utter when the name of the prefecture where the golf course is located is not known. In response to this command, the dictionary selection / combination unit 14 creates, from the dictionary storage unit 13, a voice recognition dictionary composed of prefecture names and words such as "I do not know." as shown in FIG. 5.
[0030]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the prefecture where the golf course is located. In response to this command, the response voice output unit 15 selects the message "Please tell us the name of the prefecture where the golf course is located." from the response voice storage unit 16 and presents it to the user. When the user who hears this message does not know the name of the prefecture where the golf course is located, the user utters "I do not know." and inputs it to the voice interaction device 10; the input voice is recognized by the voice recognition unit 11, and "I do not know." is selected as the recognition result.
[0031]
Next, the dialogue control unit 12 outputs this result to the unknown expression word determination unit 18. The unknown expression word determination unit 18 receives this result, refers to the unknown expression word dictionary 17, which has words indicating unknown as items as shown in FIG. 6, determines whether or not the result is a word representing unknown, and outputs the determination result to the dialogue control unit 12. In this case, "I do not know." is determined to be a word representing unknown, so in response to this determination result, the dialogue control unit 12 instructs the dictionary selection / combination unit 14 to create a dictionary that combines all the golf course dictionaries divided by prefecture name. In response to this command, the dictionary selection / combination unit 14 takes out all the golf course dictionaries classified by prefecture name from the dictionary storage unit 13 and creates a voice recognition dictionary by combining them.
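The unknown-word check and the fallback to a combined dictionary can be sketched as follows. This is only an illustration under assumptions: the entries of `UNKNOWN_WORDS`, the function names, and the sample golf course names are hypothetical and are not the contents of FIG. 6 or of the stored dictionaries.

```python
from typing import Dict, List, Set

# Hypothetical entries standing in for the unknown expression word dictionary.
UNKNOWN_WORDS: Set[str] = {"I do not know.", "I do not understand."}


def is_unknown(recognized: str, unknown_words: Set[str] = UNKNOWN_WORDS) -> bool:
    """Stand-in for the unknown expression word determination."""
    return recognized in unknown_words


def prepare_golf_dictionary(
    recognized: str, by_prefecture: Dict[str, List[str]]
) -> List[str]:
    """Combine every per-prefecture dictionary when the answer is 'unknown';
    otherwise return only the dictionary of the named prefecture."""
    if is_unknown(recognized):
        combined: List[str] = []
        for names in by_prefecture.values():
            combined.extend(names)
        return combined
    return list(by_prefecture.get(recognized, []))


# Hypothetical per-prefecture golf course dictionaries.
golf = {"Tokyo": ["AA golf course"], "Chiba": ["XX golf course"]}
all_courses = prepare_golf_dictionary("I do not know.", golf)
chiba_only = prepare_golf_dictionary("Chiba", golf)
```

The combined dictionary is larger than any single prefecture dictionary, so recognition of the course name runs against more candidates, but the dialogue does not stall.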
[0032]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the golf course. In response to this command, the response voice output unit 15 selects the message "Please tell us the name of the golf course." from the response voice storage unit 16 and presents it to the user. When the user who hears this message utters the name of the golf course, "XX golf course.", and inputs it to the voice interaction device 10, the input voice is recognized by the voice recognition unit 11, "XX golf course." is selected as the recognition result, and the search target is determined.
[0033]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to present the determined search target, "XX golf course.", to the user. In response to this command, the response voice output unit 15 combines the content stored in the response voice storage unit 16 with "XX golf course." to create the message "Display a map of the XX golf course." and presents it to the user.
[0034]
With the above operation, a map such as a destination to be searched can be displayed on the display screen of the navigation device.
[0035]
As described above, in the first embodiment, the dictionary selection / combination unit 14 and the unknown expression word determination unit 18 are provided. Thus, even when the next instruction cannot be determined because the user, not knowing for example the name of the prefecture where the golf course is located, answers a question during the dialogue with "I do not know.", all the golf course dictionaries divided by prefecture name that could be used when the dialogue is continued are taken out and combined into a voice recognition dictionary, and voice recognition is performed with it, so that the search target can be determined without interrupting the flow of the voice dialogue. Therefore, even when the user responds that the answer is not known, the dialogue can be continued without interruption, and the object can be achieved.
[0036]
Next, FIGS. 7 to 9 are diagrams showing a second embodiment of the voice interaction apparatus of the present invention. Since the second embodiment is configured in substantially the same manner as the first embodiment described above, the same reference numerals are given to the same configurations, and only the characteristic portions will be described.
[0037]
First, the device configuration will be described. In FIG. 7, the voice interaction device 20 includes a voice recognition unit 11, a dialogue control unit 12, a dictionary storage unit 13, a response voice output unit 15, a response voice storage unit 16, an unknown expression word dictionary 17, and an unknown expression word determination unit 18, and is provided with a dictionary selection unit (dictionary preparation means) 24 in place of the dictionary selection / combination unit 14 of the first embodiment described above. The dictionary selection unit 24 creates the voice recognition dictionary used by the voice recognition unit 11 by selecting one of the voice recognition dictionaries stored in the dictionary storage unit 13 according to a command from the dialogue control unit 12.
[0038]
Next, the processing operation of the speech dialogue apparatus of the present invention will be described with reference to the flowchart of the dialogue shown in FIG.
[0039]
First, in the same manner as in the first embodiment described above, a voice dialogue is started by a user's instruction. When the user utters "Facility search." in response to the message "What is your need?" and this "Facility search." is selected as a command, a voice recognition dictionary containing words indicating the search genre and words such as "I do not know." is created, and the message "Please tell us the type of facility." is presented to the user.
[0040]
Then, when the user who hears the message "Please tell us the type of facility." does not know the word indicating the genre to be searched, the user utters "I do not know."; the input voice is recognized by the voice recognition unit 11, and "I do not know." is selected as the recognition result.
[0041]
Next, the dialogue control unit 12 outputs this result to the unknown expression word determination unit 18. The unknown expression word determination unit 18 receives this result, refers to the unknown expression word dictionary 17, which has words indicating unknown as items as shown in FIG. 6, determines whether or not the result is a word representing unknown, and outputs the determination result to the dialogue control unit 12. In this case, "I do not know." is determined to be a word representing unknown, and in response to this determination result, the dialogue control unit 12 instructs the dictionary selection unit 24 to select a dictionary for narrowing down the location of the facility. In response to this command, the dictionary selection unit 24 selects a voice recognition dictionary composed of prefecture names as shown in FIG. 9 from the dictionary storage unit 13.
[0042]
Next, the dialog control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the prefecture where the facility is located. In response to this command, the response voice output unit 15 selects a message "Please tell us the name of the prefecture where the facility is located" from the response voice storage unit 16 and presents it to the user. The user who hears this message utters the name of the prefecture where the facility is located, "Chiba prefecture", and inputs it to the voice interaction device 20. The input voice is recognized by the voice recognition unit 11, and "Chiba Prefecture" is selected. In response to this result, the dialog control unit 12 instructs the dictionary selection unit 24 to select a dictionary composed of facilities of all genres in Chiba Prefecture. In response to this command, the dictionary selection unit 24 selects a dictionary composed of facilities of all genres in Chiba from the speech recognition dictionary storage unit 13.
[0043]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the facility. In response to this command, the response voice output unit 15 selects the message "Please tell us the name of the facility in Chiba Prefecture." from the response voice storage unit 16 and presents it to the user. When the user who hears this message utters the name of the facility, "XX golf course.", and inputs it to the voice interaction device 20, the input voice is recognized by the voice recognition unit 11, "XX golf course." is selected as the recognition result, and the search target is determined.
[0044]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to present the determined search target, "XX golf course.", to the user. In response to this command, the response voice output unit 15 combines the content stored in the response voice storage unit 16 with "XX golf course." to create the message "Display a map of the XX golf course." and presents it to the user.
[0045]
With the above operation, a map such as a destination to be searched can be displayed on the display screen of the navigation device.
[0046]
As described above, in the second embodiment, the dictionary selection unit 24 and the unknown expression word determination unit 18 are provided. Thus, even when the next instruction cannot be determined because the user, not knowing for example the name of the facility genre, answers a question during the dialogue with "I do not know.", if the user knows the prefecture name, voice recognition is performed using a dictionary containing facilities of all genres divided by prefecture name, so that the search target can be determined without interrupting the flow of the voice dialogue. Therefore, even when the user responds that the answer is not known, the dialogue can be continued without interruption, and the object can be achieved.
[0047]
As another mode of the second embodiment, as shown in FIG. 10, when an answer indicating that the content is unknown is input in response to the message "Please tell us the type of facility." and is input again in response to the subsequent message, the dictionary selection unit 24 selects a dictionary composed of facilities of all genres, and a message prompting the user to utter the name of the facility is presented. When the user who hears this message utters the name of the facility, "XX golf course.", the search target is determined, and the message "Display a map of the XX golf course." is presented to the user.
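Because the second embodiment selects a single stored dictionary rather than combining several, the selection can be pictured as a key lookup. The following Python sketch is an assumption-laden illustration: the wildcard `ALL`, the tuple keys, the function `select_dictionary_key`, and the sample entries are hypothetical, and it presumes that all-genre dictionaries are stored in advance.

```python
from typing import Dict, List, Tuple

UNKNOWN = "I do not know."
ALL = "*"  # hypothetical wildcard meaning "all genres" or "all prefectures"


def select_dictionary_key(genre_answer: str, prefecture_answer: str) -> Tuple[str, str]:
    """Map the two answers to the key of the one dictionary to select.
    An 'unknown' answer widens that dimension to ALL."""
    genre = ALL if genre_answer == UNKNOWN else genre_answer
    prefecture = ALL if prefecture_answer == UNKNOWN else prefecture_answer
    return (genre, prefecture)


# Hypothetical stored dictionaries, including pre-built all-genre ones,
# since a selection-only unit cannot build them by combination.
storage: Dict[Tuple[str, str], List[str]] = {
    ("golf course", "Chiba"): ["XX golf course"],
    (ALL, "Chiba"): ["XX golf course", "YY museum"],
    (ALL, ALL): ["XX golf course", "YY museum", "ZZ zoo"],
}

chiba_all_genres = storage[select_dictionary_key(UNKNOWN, "Chiba")]
everything = storage[select_dictionary_key(UNKNOWN, UNKNOWN)]
```

The trade-off in this sketch is storage: each widened dictionary must exist ahead of time, whereas a combining unit could assemble it on demand.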
[0048]
Next, FIG. 11 to FIG. 14 are views showing a third embodiment of the voice interaction apparatus of the present invention. Since the third embodiment is configured in substantially the same manner as the first embodiment described above, the same reference numerals are given to the same configurations, and only the characteristic portions will be described.
[0049]
First, the device configuration will be described. In FIG. 11, the voice interaction device 30 includes a voice recognition unit 11, a dialogue control unit 12, a dictionary storage unit 13, a dictionary selection / combination unit 14, a response voice output unit 15, and a response voice storage unit 16. In addition, an ambiguous expression word dictionary 37 and an ambiguous expression word determination unit 38 are provided in place of the unknown expression word dictionary 17 and the unknown expression word determination unit 18 of the first embodiment described above, and a distance calculation unit 39 is further provided.
[0050]
In the ambiguous expression word dictionary 37, words expressing ambiguity are registered as items, and the ambiguous expression word determination unit 38 refers to the ambiguous expression word dictionary 37 in response to an inquiry from the dialogue control unit 12 and determines whether or not the voice recognition result expresses something ambiguous.
[0051]
The distance calculation unit 39 calculates whether or not each candidate lies within a specific distance of a designated place and thereby selects the range to be searched. For example, when Tokyo is designated, it selects the neighboring prefectures of Chiba, Saitama, Kanagawa, and Yamanashi.
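One way to picture such a distance check is a great-circle distance between representative points with a fixed threshold. The specification does not say how distance is measured, so everything in this Python sketch is an assumption: the use of the haversine formula, the approximate coordinates of prefectural capitals, the 120 km limit, and the function names.

```python
import math
from typing import Dict, List, Tuple

# Approximate (latitude, longitude) of prefectural capitals, in degrees.
# Illustrative values only; a real device would hold its own map data.
CAPITALS: Dict[str, Tuple[float, float]] = {
    "Tokyo": (35.69, 139.69),
    "Chiba": (35.61, 140.12),
    "Saitama": (35.86, 139.65),
    "Kanagawa": (35.45, 139.64),
    "Yamanashi": (35.66, 138.57),
    "Osaka": (34.69, 135.52),
}


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearby_prefectures(
    center: str, limit_km: float, coords: Dict[str, Tuple[float, float]] = CAPITALS
) -> List[str]:
    """Return, in alphabetical order, the other prefectures within limit_km."""
    origin = coords[center]
    return sorted(
        name for name, point in coords.items()
        if name != center and haversine_km(origin, point) <= limit_km
    )


neighbours = nearby_prefectures("Tokyo", 120.0)
```

With this small table and threshold, the four prefectures named in the text fall inside the limit and Osaka does not; a fuller table would need a threshold or an adjacency rule chosen to give the intended neighbours.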
[0052]
Next, the processing operation of the speech dialogue apparatus of the present invention will be described with reference to the flowchart of the dialogue shown in FIG.
[0053]
First, in the same manner as in the first embodiment described above, a voice dialogue is started by a user's instruction. When the user utters "Facility search." in response to the message "What is your need?", a voice recognition dictionary of search genres as shown in FIG. 4 is created, and the message "Please tell us the type of facility." is presented to the user. When the user utters "Golf course." and inputs it to the voice interaction device 30, the input voice is recognized by the voice recognition unit 11, and "Golf course." is selected as the search genre.
[0054]
Then, in order to narrow down the location of the golf course, the dialogue control unit 12 instructs the dictionary selection / combination unit 14 to create a dictionary composed of prefecture names and words, such as "Kanaa." (a Japanese sentence-ending expression roughly meaning "I wonder"), that the user may utter when the name of the prefecture is known only vaguely. In response to this command, the dictionary selection / combination unit 14 selects from the dictionary storage unit 13 and creates a voice recognition dictionary that, as shown in FIG. 13, uses the prefecture names as the main dictionary and also includes words such as "Kanaa."
[0055]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the prefecture where the golf course is located. In response to this command, the response voice output unit 15 selects the message "Please tell us the name of the prefecture where the golf course is located." from the response voice storage unit 16 and presents it to the user. In the Nth conversation hierarchy (where N is a natural number; the same applies hereinafter), the user who hears this message remembers the name of the prefecture where the golf course is located only vaguely and therefore utters "Tokyo kanaa." and inputs it to the voice interaction device 30; the input voice is recognized by the voice recognition unit 11, and "Tokyo kanaa." is selected as the recognition result.
[0056]
Next, the dialogue control unit 12 outputs this result to the ambiguous expression word determination unit 38. The ambiguous expression word determination unit 38 receives this result, refers to the ambiguous expression word dictionary 37, which has words indicating ambiguity as items as shown in FIG. 14, determines whether or not the result includes a word representing ambiguity, and outputs the determination result to the dialogue control unit 12. In this case, "Kanaa." is determined to be a word representing ambiguity. In response to this determination result, the dialogue control unit 12 instructs the distance calculation unit 39 to calculate the distance from each prefecture to "Tokyo.", which is the recognition result with the word indicating ambiguity removed, and to select the prefectures that are close to Tokyo. In response to this command, the distance calculation unit 39 selects Chiba, Saitama, Kanagawa, and Yamanashi prefectures and instructs the dictionary selection / combination unit 14 to combine the golf course dictionaries of these four prefectures and Tokyo. According to this command, the dictionary selection / combination unit 14 takes out the golf course dictionaries for Chiba, Saitama, Kanagawa, Yamanashi, and Tokyo from the dictionary storage unit 13 and combines them to create a voice recognition dictionary for the (N+1)th conversation hierarchy.
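The step of detecting the ambiguity word, removing it, and widening the search to neighbouring prefectures might be sketched as below. The marker list, the trailing-marker matching rule, the function names, and the neighbour table are assumptions for illustration; the neighbour table simply hard-codes the result that the distance calculation is described as producing.

```python
from typing import Dict, List, Tuple

# Hypothetical entry standing in for the ambiguous expression word dictionary.
AMBIGUOUS_WORDS: Tuple[str, ...] = ("kanaa",)


def split_ambiguity(
    recognized: str, markers: Tuple[str, ...] = AMBIGUOUS_WORDS
) -> Tuple[str, bool]:
    """Return (content without the ambiguity word, whether one was found).
    Assumes the ambiguity word trails the content, as in "Tokyo kanaa."."""
    text = recognized.strip().rstrip(".").strip()
    lowered = text.lower()
    for marker in markers:
        if lowered.endswith(marker):
            return text[: len(text) - len(marker)].strip(), True
    return text, False


def prefectures_to_combine(
    recognized: str, neighbours: Dict[str, List[str]]
) -> List[str]:
    """List the prefectures whose dictionaries should be combined."""
    base, ambiguous = split_ambiguity(recognized)
    if ambiguous:
        # Ambiguous answer: the named prefecture plus those close to it.
        return [base] + list(neighbours.get(base, []))
    return [base]


# Hypothetical neighbour table mirroring the example in the text.
NEIGHBOURS = {"Tokyo": ["Chiba", "Saitama", "Kanagawa", "Yamanashi"]}
wide = prefectures_to_combine("Tokyo kanaa.", NEIGHBOURS)
narrow = prefectures_to_combine("Tokyo.", NEIGHBOURS)
```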
[0057]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the golf course. In response to this command, the response voice output unit 15 selects the message "Please tell us the name of the golf course." from the response voice storage unit 16 and presents it to the user. When the user who hears this message utters the name of the golf course, "XX golf course.", and inputs it to the voice interaction device 30, the input voice is recognized by the voice recognition unit 11, "XX golf course." is selected as the recognition result, and the search target is determined.
[0058]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to present the determined search target, "XX golf course.", to the user. In response to this command, the response voice output unit 15 combines the content stored in the response voice storage unit 16 with "XX golf course." to create the message "Display a map of the XX golf course." and presents it to the user.
[0059]
With the above operation, a map such as a destination to be searched can be displayed on the display screen of the navigation device.
[0060]
As described above, in the third embodiment, the dictionary selection / combination unit 14 and the ambiguous expression word determination unit 38 are provided. Thus, even when the next instruction cannot be determined because the user, knowing for example the name of the prefecture where the golf course is located only vaguely, gives an ambiguous answer to a question during the dialogue, a voice recognition dictionary is created by combining the golf course dictionaries of the prefectures close to Tokyo, and voice recognition is performed with it, so that the search target can be determined without interrupting the flow of the voice dialogue. Therefore, even when the user gives an ambiguous response without knowing the correct answer, the dialogue can be continued without interruption, and the object can be achieved.
[0061]
Next, FIG. 15 to FIG. 18 are views showing a fourth embodiment of the voice interaction apparatus of the present invention. Since the fourth embodiment is configured in substantially the same manner as the third embodiment described above, the same reference numerals are given to the same configurations, and only the characteristic portions will be described.
[0062]
First, the device configuration will be described. In FIG. 15, the voice interaction device 40 includes a voice recognition unit 11, a dialogue control unit 12, a dictionary storage unit 13, a dictionary selection / combination unit 14, a response voice output unit 15, a response voice storage unit 16, an ambiguous expression word dictionary 37, and an ambiguous expression word determination unit 38, and, in addition to this configuration of the third embodiment, is provided with a concept dictionary table 47 and a similar concept selection unit 48.
[0063]
The concept dictionary table 47 is set in advance by associating similar concepts with one another as shown in FIG. 17, and the similar concept selection unit 48 refers to the concept dictionary table 47 to determine which concept to adopt.
[0064]
Next, the processing operation of the speech dialogue apparatus of the present invention will be described with reference to the flowchart of the dialogue shown in FIG.
[0065]
First, in the same manner as in the third embodiment described above, a voice dialogue is started by a user's instruction, the user utters "Facility search." in response to the message "What is your need?", and this "Facility search." is selected as a command.
[0066]
Then, when this "Facility search." is instructed, in order to narrow down the type of facility, the dialogue control unit 12 instructs the dictionary selection / combination unit 14 to create a dictionary composed of facility type names and words, such as "Kanaa.", that the user may utter when the type of facility is known only vaguely. In response to this command, the dictionary selection / combination unit 14 selects from the dictionary storage unit 13 and creates a voice recognition dictionary that, as shown in FIG. 18, uses the facility type names as the main dictionary and also includes words such as "Kanaa."
[0067]
Next, the dialogue control unit 12 instructs the voice recognition unit 11 to execute voice recognition using the dictionary created by the dictionary selection / combination unit 14, and instructs the response voice output unit 15 to output a message prompting the user to utter the type of facility. In response to this command, the response voice output unit 15 selects the message "Please tell us the type of facility." for the Nth conversation hierarchy from the response voice storage unit 16 and presents it to the user. In the Nth conversation hierarchy, the user who hears the message "Please tell us the type of facility." utters "Zoo kanaa." as the words indicating the genre to be searched and inputs it to the voice interaction device 40; the input voice is recognized by the voice recognition unit 11, and "Zoo kanaa." is selected as the recognition result.
[0068]
Next, the dialogue control unit 12 outputs this result to the ambiguous expression word determination unit 38. The ambiguous expression word determination unit 38 receives this result, refers to the ambiguous expression word dictionary 37, which has words indicating ambiguity as items as shown in FIG. 14, determines whether or not the result includes a word representing ambiguity, and outputs the determination result to the dialogue control unit 12. In this case, "Kanaa." is determined to be a word representing ambiguity. In response to this determination result, the dialogue control unit 12 causes the similar concept selection unit 48 to refer to the concept dictionary table 47 as shown in FIG. and select a concept similar to "zoo.", and the result is output to the dialogue control unit 12. The dialogue control unit 12 stores this result together with the recognition result "zoo."
[0069]
Next, the dialog control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the prefecture where the facility is located. In response to this command, the response voice output unit 15 selects a message "Please tell us the name of the prefecture where the facility is located" from the response voice storage unit 16 and presents it to the user. The user who hears this message utters "Osaka prefecture." And inputs it to the voice interaction device 40. The input voice is recognized by the voice recognition unit 11, and as a recognition result, "Osaka prefecture." Is selected.
[0070]
Next, the dialogue control unit 12 instructs the dictionary selection / combination unit 14 to combine the dictionaries of the two previously stored genres, "zoo." and "amusement park." According to this command, the dictionary selection / combination unit 14 takes out the zoo and amusement park facility dictionaries for Osaka Prefecture from the dictionary storage unit 13 and combines them to create a voice recognition dictionary for the (N+1)th conversation hierarchy.
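The similar-concept lookup and the subsequent combination can be sketched in Python as follows. The table contents, the function names, and the facility names are hypothetical; the sketch only mirrors the zoo and amusement park example and does not reproduce the concept dictionary table of FIG. 17.

```python
from typing import Dict, List, Tuple

# Hypothetical concept table: each genre maps to the genres treated as similar.
CONCEPT_TABLE: Dict[str, List[str]] = {"zoo": ["amusement park"]}


def genres_to_search(
    genre: str, ambiguous: bool, table: Dict[str, List[str]] = CONCEPT_TABLE
) -> List[str]:
    """Widen the genre with its similar concepts only for an ambiguous answer."""
    if ambiguous:
        return [genre] + list(table.get(genre, []))
    return [genre]


def combine_facility_dictionaries(
    genres: List[str],
    prefecture: str,
    storage: Dict[Tuple[str, str], List[str]],
) -> List[str]:
    """Concatenate the stored (genre, prefecture) dictionaries in order."""
    combined: List[str] = []
    for genre in genres:
        combined.extend(storage.get((genre, prefecture), []))
    return combined


# Hypothetical stored dictionaries keyed by (genre, prefecture).
storage = {
    ("zoo", "Osaka"): ["PP zoo"],
    ("amusement park", "Osaka"): ["OO park"],
}
genres = genres_to_search("zoo", ambiguous=True)
next_layer = combine_facility_dictionaries(genres, "Osaka", storage)
```

Because the combined dictionary contains "OO park", a user who misremembers an amusement park as a zoo can still have its name recognized in the next layer.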
[0071]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the facility. In response to this command, the response voice output unit 15 selects the message "Please tell us the name of the facility." from the response voice storage unit 16 and presents it to the user. When the user who hears this message utters "OO park.", the name of an amusement park that the user vaguely remembered as a zoo, and inputs it to the voice interaction device 40, the input voice is recognized by the voice recognition unit 11, "OO park." is selected as the recognition result, and the search target is determined.
[0072]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to present the determined search target, "OO park.", to the user. In response to this command, the response voice output unit 15 combines the content stored in the response voice storage unit 16 with "OO park." to create the message "Display a map of OO park." and presents it to the user.
[0073]
With the above operation, a map such as a destination to be searched can be displayed on the display screen of the navigation device.
[0074]
As described above, in the fourth embodiment, the dictionary selection / combination unit 14, the ambiguous expression word determination unit 38, and the similar concept selection unit 48 are provided. Thus, even when the next instruction cannot be determined because the user, knowing for example the type of facility only vaguely, answers a question during the dialogue with a reply such as "Zoo kanaa.", a voice recognition dictionary is created that also combines the dictionary of amusement parks, which are facilities similar to zoos, and voice recognition is performed with it, so that the search target can be determined without interrupting the flow of the voice dialogue. Therefore, even when the user gives an ambiguous response without knowing the correct answer, the dialogue can be continued without interruption, and the object can be achieved.
[0075]
Next, FIG. 19 and FIG. 20 are views showing a fifth embodiment of the voice interaction apparatus of the present invention. Since the fifth embodiment is configured in substantially the same manner as the first embodiment described above, the same reference numerals are given to the same configurations, and only the characteristic portions will be described.
[0076]
First, the device configuration will be described. In FIG. 19, the voice interaction device 50 includes a voice recognition unit 11, a dialogue control unit 12, a dictionary storage unit 13, a dictionary selection / combination unit 14, a response voice output unit 15, and a response voice storage unit 16. In addition, a multiple result determination unit 58 is provided in place of the unknown expression word dictionary 17 and the unknown expression word determination unit 18 of the first embodiment. The multiple result determination unit 58 determines, in response to an inquiry from the dialogue control unit 12, whether or not the voice recognition result contains a plurality of results.
[0077]
Next, the processing operation of the voice dialogue apparatus of the present invention will be described with reference to the flowchart of the dialogue shown in FIG. Here, an example will be described in which the purpose is to search for the XX golf course in Chiba Prefecture, but the user's memory of its location is ambiguous and the user believes that it is in either Tokyo or Chiba Prefecture.
[0078]
First, in the same manner as in the first embodiment described above, a voice dialogue is started by a user's instruction. When the user utters "Facility search." in response to the message "What is your need?", a voice recognition dictionary of search genres as shown in FIG. 4 is created, and the message "Please tell us the type of facility." is presented to the user. When the user utters "Golf course." and inputs it to the voice interaction device 50, the input voice is recognized by the voice recognition unit 11, and "Golf course." is selected as the search genre.
[0079]
Then, in order to narrow down the location of the golf course, the dialogue control unit 12 instructs the dictionary selection/combination unit 14 to create a dictionary composed of words representing prefecture names. In response to this command, the dictionary selection/combination unit 14 creates a speech recognition dictionary composed of prefecture names, as shown in FIG. 9, from the dictionary storage unit 13.
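The dictionary preparation step above can be pictured with a small Python sketch. It models the dictionary storage unit 13 as a mapping keyed by dialogue hierarchy and lets a hypothetical `prepare_dictionary` function stand in for the dictionary selection/combination unit 14. The hierarchy names, the word lists, and the function name are assumptions made for illustration only and do not appear in the specification.

```python
# Hypothetical contents of the dictionary storage unit 13, keyed by dialogue hierarchy.
# The hierarchy names and word lists are illustrative placeholders.
DICTIONARY_STORAGE = {
    "genre": ["Golf course", "Restaurant", "Station"],
    "prefecture": ["Tokyo", "Chiba Prefecture", "Saitama Prefecture"],
}


def prepare_dictionary(hierarchy, storage=DICTIONARY_STORAGE):
    """Return a copy of the word list for one dialogue hierarchy.

    Stands in for the dictionary selection/combination unit 14 acting on a
    command from the dialogue control unit 12.
    """
    if hierarchy not in storage:
        raise KeyError(f"no speech recognition dictionary for hierarchy: {hierarchy}")
    return list(storage[hierarchy])


# After "Golf course" is selected as the genre, the prefecture dictionary is prepared.
prefecture_dictionary = prepare_dictionary("prefecture")
```

Returning a copy keeps the stored dictionaries unchanged when a prepared dictionary is later combined or modified for a particular dialogue turn.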
[0080]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the prefecture where the golf course is located. In response to this command, the response voice output unit 15 selects the message "Please talk about the name of the prefecture where the golf course is located" from the response voice storage unit 16 and presents it to the user. The user who hears this message, whose memory of the prefecture where the golf course is located is ambiguous, utters "Tokyo or Chiba Prefecture" and inputs it to the voice interaction device 50. The input voice is recognized by the voice recognition unit 11, and "Tokyo or Chiba Prefecture" is obtained as the recognition result.
[0081]
Next, the dialogue control unit 12 outputs this result to the multiple result determination unit 58. The multiple result determination unit 58 determines that the recognition result contains two words representing prefecture names, Tokyo and Chiba, and outputs this determination to the dialogue control unit 12. In response to this determination, the dialogue control unit 12 instructs the dictionary selection/combination unit 14 to create a dictionary in which the dictionary of golf courses in Tokyo and the dictionary of golf courses in Chiba are combined. In response to this command, the dictionary selection/combination unit 14 extracts the dictionary of golf courses in Tokyo and the dictionary of golf courses in Chiba from the dictionary storage unit 13 and combines them to create a speech recognition dictionary.
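The determination and combination described above might be sketched in Python as follows. This is a minimal illustration under stated assumptions: the recognition result is treated as plain text, the per-prefecture word lists are invented placeholders, and `find_prefectures` and `combine_dictionaries` are hypothetical names standing in for the multiple result determination unit 58 and the dictionary selection/combination unit 14.

```python
# Hypothetical per-prefecture golf course dictionaries (placeholder entries).
GOLF_COURSE_DICTIONARIES = {
    "Tokyo": ["Tokyo A Golf Course", "Tokyo B Golf Course"],
    "Chiba": ["Chiba A Golf Course", "XX Golf Course"],
    "Saitama": ["Saitama A Golf Course"],
}


def find_prefectures(recognition_result, known_prefectures):
    """Return every known prefecture named in the result, in order of appearance."""
    hits = [
        (recognition_result.find(name), name)
        for name in known_prefectures
        if name in recognition_result
    ]
    return [name for _, name in sorted(hits)]


def combine_dictionaries(prefectures, dictionaries):
    """Concatenate the dictionaries of the given prefectures, dropping duplicates."""
    combined = []
    for prefecture in prefectures:
        for word in dictionaries[prefecture]:
            if word not in combined:
                combined.append(word)
    return combined


prefectures = find_prefectures("Tokyo or Chiba Prefecture", GOLF_COURSE_DICTIONARIES)
combined_dictionary = combine_dictionaries(prefectures, GOLF_COURSE_DICTIONARIES)
```

With this input, `prefectures` holds both Tokyo and Chiba, so the combined dictionary covers the golf courses of both and the next question can be asked without forcing the user to choose one prefecture.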
[0082]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the golf course. In response to this command, the response voice output unit 15 selects the message "Please tell us the name of the golf course" from the response voice storage unit 16 and presents it to the user. When the user who hears this message utters the name of the golf course, "XX golf course.", and inputs it to the voice interaction device 50, the input voice is recognized by the voice recognition unit 11, "XX golf course." is selected as the recognition result, and the search target is determined.
[0083]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to present the determined search target "XX golf course." to the user. In response to this command, the response voice output unit 15 combines the content stored in the response voice storage unit 16 with "XX golf course." to create a message to the effect that a map of the golf course will be displayed, and presents it to the user.
[0084]
With the above operation, a map of the searched destination or the like can be displayed on the display screen of the navigation device.
[0085]
As described above, in the fifth embodiment, the dictionary selection/combination unit 14 and the multiple result determination unit 58 are provided. Even when the user does not know exactly which prefecture the golf course is in, and therefore answers a question during the dialogue in a way that does not allow the next instruction to be determined, the dictionary of golf courses in Tokyo and the dictionary of golf courses in Chiba are extracted and combined into one speech recognition dictionary, and voice recognition is performed with it. The search target can thus be determined without interrupting the flow of the voice dialogue. Therefore, even if the user gives an ambiguous response without knowing the correct answer, the dialogue can be continued without interruption and the object can be achieved.
[0086]
Next, FIGS. 21 and 22 show a sixth embodiment of the voice interaction device of the present invention. Since the sixth embodiment is configured in substantially the same manner as the fifth embodiment described above, the same components are denoted by the same reference numerals, and only the characteristic portions will be described.
[0087]
First, the device configuration will be described. In FIG. 21, the voice interaction device 60 includes a voice recognition unit 11, a dialogue control unit 12, a dictionary storage unit 13, a response voice output unit 15, a response voice storage unit 16, and a multiple result determination unit 58. In addition, the dictionary selection unit 24 employed in the second embodiment is used in place of the dictionary selection/combination unit 14 of the fifth embodiment, and an input storage unit 61 is provided.
[0088]
Here, the voice recognition unit 11 recognizes the voice input by the user and, at the same time, outputs the input voice or the result of acoustic analysis of the input voice to the input storage unit 61. The input storage unit 61 stores the input voice, or the acoustic analysis result of the input voice, output from the voice recognition unit 11.
[0089]
The dictionary selection unit 24 creates the speech recognition dictionary used by the voice recognition unit 11 by selecting one of the speech recognition dictionaries stored in the dictionary storage unit 13 in accordance with a command from the dialogue control unit 12.
[0090]
Next, the processing operation of the voice interaction device of the present invention will be described with reference to the dialogue flowchart shown in FIG. 22.
[0091]
First, as in the fifth embodiment described above, a voice dialogue is started by the user's instruction, and the user utters "Facility search." in response to the message "What is your need?". Then, in response to the message "Please tell us the type of facility.", the user utters "Golf course", which is recognized using the search-genre dictionary shown in FIG. 4. Since the genre is "Golf course", a speech recognition dictionary of prefecture names as shown in FIG. 9 is created, and the message "Please speak the name of the prefecture where the golf course is located" is presented to the user. In the Nth conversation hierarchy, the user, whose memory of the prefecture where the golf course is located is ambiguous, utters "Tokyo or Chiba Prefecture" and inputs it to the voice interaction device 60. The input voice is recognized by the voice recognition unit 11, and "Tokyo or Chiba Prefecture" is obtained as the recognition result.
[0092]
Then, the multiple result determination unit 58 determines that the recognition result from the voice recognition unit 11 contains two words representing prefecture names, Tokyo and Chiba. The dialogue control unit 12 first instructs the dictionary selection unit 24 to select and create a dictionary of golf courses in Tokyo. In response to this command, the dictionary selection unit 24 extracts the dictionary of golf courses in Tokyo from the dictionary storage unit 13 and creates a speech recognition dictionary for the (N+1)th conversation hierarchy.
[0093]
Next, the dialog control unit 12 instructs the response voice output unit 15 to output a message prompting the user to utter the name of the golf course. In response to this command, the response voice output unit 15 selects a message “Please tell us the name of the golf course” from the response voice storage unit 16 and presents it to the user.
[0094]
Next, the dialogue control unit 12 instructs the voice recognition unit 11 to execute voice recognition using the dictionary created by the dictionary selection unit 24. When the user who has heard the preceding message utters the name of the golf course, "XX golf course.", and inputs it to the voice interaction device 60, the input voice is recognized by the voice recognition unit 11 and "XX golf course." is obtained as the recognition result. At the same time, the input "OO golf course." spoken by the user is output to the input storage unit 61, either as the input voice itself or as the result of acoustic analysis of the input voice, and is stored there.
[0095]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to present the recognition result "XX golf course." to the user. In response to this command, the response voice output unit 15 combines the content stored in the response voice storage unit 16 with "XX golf course." to create the message "XX golf course?" and presents it to the user.
[0096]
Next, the dialogue control unit 12 instructs the voice recognition unit 11 to execute voice recognition using the dictionary created by the dictionary selection unit 24. Because the presented result is not the one expected, the user utters "No" and inputs it to the voice interaction device 60. The input voice is recognized by the voice recognition unit 11, and "No" is obtained as the recognition result.
[0097]
Next, because the preceding dialogue, in which Tokyo was selected, turned out to be incorrect, the dialogue control unit 12 instructs the dictionary selection unit 24 to create a dictionary of golf courses in Chiba Prefecture. In response to this command, the dictionary selection unit 24 extracts the dictionary of golf courses in Chiba from the dictionary storage unit 13 and creates a speech recognition dictionary.
[0098]
Next, the dialogue control unit 12 instructs the voice recognition unit 11 to retrieve the previous input from the input storage unit 61 and to execute voice recognition on it using the dictionary created by the dictionary selection unit 24. As a result, "XX golf course." is obtained as the recognition result.
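The sequence above, in which one stored utterance is recognized again with the next candidate dictionary after the user rejects the first result, might be sketched in Python as follows. This is only an illustration: text similarity computed with the standard-library `difflib` stands in for matching against an acoustic analysis result, the facility names are invented placeholders, and `recognize`, `search_sequentially`, and `confirm` are hypothetical names that do not appear in the specification.

```python
import difflib

# Hypothetical per-prefecture dictionaries; the facility names are placeholders.
GOLF_COURSE_DICTIONARIES = {
    "Tokyo": ["Maruyama Hills Golf Course", "Sakura Golf Course"],
    "Chiba": ["Maruyama Golf Course", "Umi Golf Course"],
}


def recognize(stored_input, dictionary):
    """Toy recognizer: return the dictionary entry closest to the stored input.

    Text similarity stands in for matching against an acoustic analysis result.
    """
    matches = difflib.get_close_matches(stored_input, dictionary, n=1, cutoff=0.0)
    return matches[0] if matches else None


def search_sequentially(stored_input, candidates, dictionaries, confirm):
    """Try each candidate prefecture's dictionary in turn on the same stored input.

    `confirm` models the user's yes/no answer to the question "<result>?".
    Returns (prefecture, result), or (None, None) if every candidate is rejected.
    """
    for prefecture in candidates:
        result = recognize(stored_input, dictionaries[prefecture])
        if result is not None and confirm(result):
            return prefecture, result
    return None, None


# The utterance is stored once (input storage unit 61) and reused for each dictionary.
stored = "Maruyama Golf Course"
asked = []


def confirm(result):
    asked.append(result)
    return result == "Maruyama Golf Course"  # the user accepts only the intended course


found = search_sequentially(stored, ["Tokyo", "Chiba"], GOLF_COURSE_DICTIONARIES, confirm)
```

Because the utterance is kept in storage, the user is asked for the golf course name only once; the second attempt costs one extra confirmation rather than a repeated question.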
[0099]
Next, the dialogue control unit 12 instructs the response voice output unit 15 to present "XX golf course." to the user. In response to this command, the response voice output unit 15 combines the content stored in the response voice storage unit 16 with "XX golf course." to create a message to the effect that a map of the golf course will be displayed, and presents it to the user.
[0100]
With the above operation, a map of the searched destination or the like can be displayed on the display screen of the navigation device.
[0101]
As described above, in the sixth embodiment, the dictionary selection unit 24, the multiple result determination unit 58, and the input storage unit 61 are provided. Even when the user does not know exactly which prefecture the golf course is in and answers a question during the dialogue with, for example, "Tokyo or Chiba Prefecture", so that the next instruction cannot be determined, voice recognition is performed sequentially with the dictionary of golf courses in Tokyo and the dictionary of golf courses in Chiba Prefecture, and each recognition result is presented. The search target can thus be determined without interrupting the flow of the voice dialogue. Therefore, even when the user gives an ambiguous response without knowing the correct answer, the dialogue can be continued without interruption, and the object can be achieved.
[0102]
[Effects of the invention]
As described above, according to the present invention, when the content of the user's recognized voice does not allow the next instruction to be determined, a speech recognition dictionary effective for continuing the dialogue is prepared from the dictionary storage means, and the dialogue is continued by outputting a question voice or a response voice. It is therefore possible to provide a voice interaction device having the excellent effect that, even if the user cannot respond or can give only an ambiguous answer, the dialogue can be continued without interruption and the object can be achieved.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic overall configuration of a first embodiment of the voice interaction device of the present invention.
FIG. 2 is a dialogue flow diagram illustrating a processing operation of the first embodiment of the voice interaction device of the present invention.
FIG. 3 is a conceptual diagram showing a speech recognition dictionary used in the first embodiment of the voice interaction device of the present invention.
FIG. 4 is a conceptual diagram showing a speech recognition dictionary used in the first embodiment of the voice interaction device of the present invention.
FIG. 5 is a conceptual diagram showing a speech recognition dictionary used in the first embodiment of the voice interaction device of the present invention.
FIG. 6 is a conceptual diagram showing a speech recognition dictionary used in the first embodiment of the voice interaction device of the present invention.
FIG. 7 is a block diagram showing a schematic overall configuration of a second embodiment of the voice interaction device of the present invention.
FIG. 8 is a dialogue flow diagram illustrating a processing operation of the second embodiment of the voice interaction device of the present invention.
FIG. 9 is a conceptual diagram showing a speech recognition dictionary used in the second embodiment of the voice interaction device of the present invention.
FIG. 10 is a dialogue flow diagram illustrating a processing operation of another aspect of the second embodiment of the voice interaction device of the present invention.
FIG. 11 is a block diagram showing a schematic overall configuration of a third embodiment of the voice interaction device of the present invention.
FIG. 12 is a dialogue flow diagram illustrating a processing operation of the third embodiment of the voice interaction device of the present invention.
FIG. 13 is a conceptual diagram showing a speech recognition dictionary used in the third embodiment of the voice interaction device of the present invention.
FIG. 14 is a conceptual diagram showing a speech recognition dictionary used in the third embodiment of the voice interaction device of the present invention.
FIG. 15 is a block diagram showing a schematic overall configuration of a fourth embodiment of the voice interaction device of the present invention.
FIG. 16 is a dialogue flow diagram illustrating a processing operation of the fourth embodiment of the voice interaction device of the present invention.
FIG. 17 is a conceptual diagram showing a speech recognition dictionary used in the fourth embodiment of the voice interaction device of the present invention.
FIG. 18 is a conceptual diagram showing a speech recognition dictionary used in the fourth embodiment of the voice interaction device of the present invention.
FIG. 19 is a block diagram showing a schematic overall configuration of a fifth embodiment of the voice interaction device of the present invention.
FIG. 20 is a dialogue flow diagram illustrating a processing operation of the fifth embodiment of the voice interaction device of the present invention.
FIG. 21 is a block diagram showing a schematic overall configuration of a sixth embodiment of the voice interaction device of the present invention.
FIG. 22 is a dialogue flow diagram illustrating a processing operation of the sixth embodiment of the voice interaction device of the present invention.
FIG. 23 is a dialogue flow diagram illustrating a processing operation according to the related art.
[Explanation of symbols]
10 to 60 Voice interaction device
11 Voice recognition unit
12 Dialogue control unit
13 Dictionary storage unit
14 Dictionary selection/combination unit
15 Response voice output unit
16 Response voice storage unit
17 Unknown expression word dictionary
18 Unknown expression word determination unit
24 Dictionary selection unit
37 Ambiguous expression word dictionary
38 Ambiguous expression word determination unit
39 Distance calculation unit
47 Concept dictionary table
48 Similar concept selection unit
58 Multiple result determination unit
61 Input storage unit

Claims

Voice recognition means for recognizing words corresponding to the utterance,
Response voice output means for outputting a response voice for urging the user to utter;
Dictionary storage means for storing a speech recognition dictionary that is distinguished for each dialog hierarchy and has relevance between the hierarchies;
Dictionary preparation means for preparing a speech recognition dictionary used for a dialogue among the speech recognition dictionaries stored in the dictionary storage means;
Dialogue control means for, when the words recognized by the speech recognition means in the Nth conversation hierarchy are a combination of an ambiguous expression word and another word that is not an ambiguous expression word, controlling the dictionary preparation means so as to prepare, as the speech recognition dictionary used in the (N+1)th conversation hierarchy, speech recognition dictionaries corresponding to the other word and to a word having a predetermined relationship with the other word, and controlling the response voice output means so as to output a response voice for shifting to the (N+1)th conversation hierarchy.

Voice recognition means for recognizing words corresponding to the utterance,
Response voice output means for outputting a response voice for urging the user to utter;
Dictionary storage means for storing a speech recognition dictionary that is distinguished for each dialog hierarchy and has relevance between the hierarchies;
Dictionary preparation means for preparing a speech recognition dictionary used for conversation among speech recognition dictionaries stored in the dictionary storage means,
Dialogue control means for, when there are a plurality of words recognized by the voice recognition means in the Nth conversation hierarchy, controlling the response voice output means so as to output a response voice for shifting to the (N+1)th conversation hierarchy, and controlling the dictionary preparation means so as to prepare a voice recognition dictionary corresponding to one of the plurality of words as the voice recognition dictionary used in the (N+1)th conversation hierarchy and, when voice recognition performed by the voice recognition means using the prepared voice recognition dictionary fails, to newly prepare a voice recognition dictionary corresponding to another word.