JP2004258233A

JP2004258233A - Adaptive speech interactive system and method

Info

Publication number: JP2004258233A
Application number: JP2003048021A
Authority: JP
Inventors: Ryosuke Miyata; 亮介宮田; Toshiyuki Fukuoka; 俊之福岡; Hideshi Kitagawa; 英志北川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-02-25
Filing date: 2003-02-25
Publication date: 2004-09-16
Anticipated expiration: 2023-02-25
Also published as: JP4223832B2

Abstract

PROBLEM TO BE SOLVED: To provide an adaptive speech interactive system and method that judge a user's degree of proficiency with an application while taking the user's state into account, and that output appropriate guidance.

SOLUTION: The adaptive speech interactive system receives a user's speech, recognizes it and extracts a command, and generates a system response based on the command either at a first timing, when the user inputs speech, or at a second timing, a specified time after the user's speech input. Speech is output in response to an instruction from an interaction control part. The generated system response includes guidance on what the user should input. The output state of the guidance and the state of the user's speech input are recorded in a state recording database, and the guidance is changed according to the state by referring to that database.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、人間とコンピュータが音声を用いて対話する適応型音声対話システム及び方法に関する。特に、使用する人間のシステムに対する慣れに応じてガイダンス出力を変化させる適応型音声対話システム及び方法に関する。
【０００２】
【従来の技術】
近年におけるＩＴ化の急速な進展に伴って、音声を用いた対話インタフェースが各種アプリケーションで活用されている。特に、車の運転中のように、操作のために手を使うことができず、かつ視線をそらすことができないような状況下においては、アプリケーション操作は、アイズ・フリーあるいはハンズフリーであることが重要な要素となる。つまり、アプリケーション機器の操作に手や目を奪われて、運転操作自体が妨げられるようなことは、安全上必ず回避しなければならない。
【０００３】
そこで、かかる状況下で使用される可能性の高いアプリケーションには、積極的に音声を用いたインターフェースが採用されてきている。音声による対話インタフェースは、たとえ運転中であっても比較的注意を集中しやすく、運転操作を妨げることなくアプリケーション操作を行うことが可能である。
【０００４】
しかし、音声を用いるインタフェースであっても、出力される音声の内容によっては、注意深く聞いたり、あるいは記憶しておく必要が生じる。このような場合、例えば運転者は集中力や注意力を削がれることになり、安全上好ましくない。
【０００５】
そこで、このような問題を解決するために、様々な工夫がなされている。例えば、（特許文献１）においては、ユーザのアプリケーションに対する慣れを、当該アプリケーションへのアクセス頻度から推定し、慣れの程度に応じて音声案内の内容を自動的に変更する音声案内装置が開示されている。
【０００６】
あるいは、（特許文献２）では、ユーザの使用アプリケーションに対する熟練度を評価し、当該熟練度に基づいてガイダンスを選択する音声対話装置が開示されている。熟練度の判断基準としては、ユーザによる音声入力がなされるまでの応答時間や入力結果に対する修正回数等が用いられている。
【０００７】
また（特許文献３）では、応答時間のみで判断できない要素を加味するために、不要語や言いよどみ等を検出することで、ユーザがアプリケーションに対して不慣れであるか否かを判断し、ガイダンスを変更する音声応答装置が開示されている。
【０００８】
【特許文献１】
特開２００１−２２３７０号公報
【０００９】
【特許文献２】
特開平１０−２０８８４号公報
【００１０】
【特許文献３】
特開２００１−３３１１９６号公報
【００１１】
【発明が解決しようとする課題】
しかし、上述したような音声対話システムにおいては、ユーザによる応答入力があることを前提としており、手が離せない等の緊急の状況に遭遇したユーザについても、応答時間がかかってしまったために習熟度が低いユーザであると判断され、習熟度が高いユーザに対して初心者用のガイダンスが出力されてしまうという問題点があった。
【００１２】
また、従来の音声対話インターフェースでは、応答音声の出力中に割り込んでユーザが応答を音声入力することができず、習熟者にとっては使いにくいインタフェースとなっているという問題点もあった。
【００１３】
本発明は、上記問題点を解決するために、ユーザの状況を判断しながらアプリケーションに対する習熟度を判定でき、適切なガイダンスを出力することができる適応型音声対話システム及び方法を提供することを目的とする。
【００１４】
【課題を解決するための手段】
上記目的を達成するために本発明にかかる適応型音声対話システムは、ユーザの音声を入力する音声入力部と、入力された音声を認識してコマンドを抽出するコマンド抽出部と、ユーザが音声を入力した第一のタイミングにおいて、コマンドに基づいてシステムが出力するシステム応答を生成する対話制御部と、対話制御部からの指示によって音声による出力を行う音声出力部とを含む適応型音声対話システムであって、生成されたシステム応答にユーザが応答するべき内容に関するガイダンスを含み、ガイダンスの出力状況及びユーザによる音声入力状況を状況記録データベースに記録し、状況記録データベースを参照することで、ガイダンスを状況に応じて変化させることを特徴とする。
【００１５】
かかる構成により、ユーザのガイダンスに対する応答状況に基づいて、次回のガイダンスをどのような内容で出力するのかを制御することができ、ユーザの個々の状況に応じたガイダンスを含む応答を行うことが可能となる。
【００１６】
また、本発明にかかる適応型音声対話システムは、対話制御部において、ユーザからの応答無しに所定時間経過した第二のタイミングにおいても、コマンドに基づいてシステムが出力するシステム応答を生成することが好ましい。
【００１７】
また、本発明にかかる適応型音声対話システムは、コマンドが主部と引数とで構成され、状況記録データベースに引数の利用頻度も記録し、ユーザが音声入力したコマンドにおいて引数が省略されている場合に、抽出されたコマンドの主部に対して、引数の利用頻度に応じて引数を補うことが好ましい。引数が入力されていない場合であっても、引数の利用頻度に応じて補うことができ、無駄な応答やガイダンスを省略することができるからである。
【００１８】
また、本発明にかかる適応型音声対話システムは、複数のユーザに関する状況記録データベースの記録内容を集計する記録集計部をさらに含み、記録集計部における集計結果に応じて、対話制御部が、ガイダンス出力の有無、及びガイダンスの内容を決定することが好ましい。音声対話システムを使い始めたばかりのユーザであっても、ある程度有効なガイダンスを出力することができるからである。
【００１９】
また、本発明は、上記のような適応型音声対話システムの機能をコンピュータの処理ステップとして実行するソフトウェアを特徴とするものであり、具体的には、ユーザの音声を入力する工程と、入力された音声を認識してコマンドを抽出する工程と、ユーザが音声を入力した第一のタイミングにおいて、コマンドに基づいてシステムが出力するシステム応答を生成する工程と、音声による出力を行う工程とを含む適応型音声対話方法であって、生成されたシステム応答にユーザが応答するべき内容に関するガイダンスを含み、ガイダンスの出力状況及びユーザによる音声入力状況を状況記録データベースに記録し、状況記録データベースを参照することで、ガイダンスを状況に応じて変化させる適応型音声対話方法並びにそのような工程を具現化するコンピュータ実行可能なプログラムであることを特徴とする。
【００２０】
かかる構成により、コンピュータ上へ当該プログラムをロードさせ実行することで、ユーザのガイダンスに対する応答状況に基づいて、次回のガイダンスをどのような内容で出力するのかを制御することができ、ユーザの個々の状況に応じたガイダンスを含む応答を行うことができる適応型音声対話システムを実現することが可能となる。
【００２１】
また、本発明にかかる適応型音声対話システムを具現化するコンピュータ実行可能なプログラムは、上述した適応型音声対話方法におけるシステム応答を生成する工程において、ユーザからの応答無しに所定時間経過した第二のタイミングにおいても、コマンドに基づいてシステムが出力するシステム応答を生成することが好ましい。
【００２２】
【発明の実施の形態】
以下、本発明の実施の形態にかかる適応型音声対話システムについて、図面を参照しながら説明する。図１は本発明の実施の形態１にかかる適応型音声対話システムの構成図である。
【００２３】
図１において、１１はユーザの音声を入力する音声入力部を示しており、入力媒体としてはマイクロホン等が考えられる。１２は、入力された音声を認識してコマンドを抽出するコマンド抽出部を示しており、入力された音声を認識することによって入力された内容を認識して、含まれているアプリケーションを制御するためのコマンドを抽出する。
【００２４】
認識できるコマンドと、当該コマンドがどのような引数を求められるか、といった「文法」に関する情報は、あらかじめ登録してあるコマンド／文法データベース１４を、コマンド抽出部１２が参照する。コマンド抽出部１２は、認識結果と合致するコマンド、及び引数と考えられる内容を対話制御部１３へと送る。
【００２５】
またコマンド抽出部１２は、認識結果と合致するコマンドがコマンド／文法データベース１４に存在しない場合にはその旨を対話制御部１３に通知する。また、認識開始時に設定された所定のタイムアウト時間を過ぎてもユーザからの入力が無かった場合には、無入力であった旨についても対話制御部１３に通知する。
【００２６】
なお、音声入力の認識方法については、既存の音声認識方法であれば何でも良く、特に限定されるものではない。
【００２７】
そして、対話制御部１３では、ユーザが音声を入力した第一のタイミングもしくはユーザによる音声入力から所定のタイムアウト時間が経過した第二のタイミングにおいて、抽出されたコマンドに基づいて、ユーザに提示するべき応答を生成する。
【００２８】
ユーザにより音声が入力される第一のタイミングとしては、音声出力の開始と同時、あるいは音声出力開始後しばらくしてから、又は音声出力の完了と同時等、様々なタイミングが考えられる。
【００２９】
例えば、システム側のガイダンスとしての音声出力の開始と同時にユーザが音声の入力を開始した場合、システムによる音声出力の途中であってもユーザからの音声入力を受け付ける必要が生じる。このように、音声出力途上においてユーザからの音声入力を受け付けた場合、音声出力を中断することも考えられるし、音声入力の認識結果に応じて出力を続行することも考えられる。
【００３０】
また、音声入力の開始時には、通常タイムアウト時間が設定されている。したがって、ユーザによる応答が全く無い場合であっても、タイムアウト時間経過時という第二のタイミングにおいて、ユーザに提示するべき応答を生成することができる。
【００３１】
そして、ユーザからの入力があったら、あるいはユーザからの入力が無いまま当該タイムアウト時間が経過したら、対話制御部１３は、入力されたコマンドあるいは無入力であったという情報に基づいて、以下の応答を生成する。
【００３２】
次に、生成される応答には、ユーザが入力するべき音声のガイダンス情報が含まれている。そして、ガイダンスの出力状況や、当該ガイダンスに対するユーザの音声入力状況を状況記録データベース１５に記録する。
【００３３】
コマンド／文法データベース１４には、各コマンドごとに、コマンドの文法に関するガイダンスを登録しておく。ガイダンスは録音した音声であっても良いし、音声合成に用いられるテキストデータであっても良い。図２に、テキストデータを用いたコマンド／文法データベース１４におけるガイダンス登録例を示す。
【００３４】
また、状況記録データベース１５には、コマンドごとにガイダンス出力状況とコマンド呼び出し状況が記録される。図３は、状況記録データベース１５の一例である。
【００３５】
図３の例において、まずガイダンス出力状況については、出力した回数を記録しておき、対話制御部１３が当該コマンドについてのガイダンスを出力するたびに回数を累積する。そして、当該ガイダンスの出力回数が所定のしきい値を超えた場合には、ユーザが当該コマンドに習熟したものと判断して、ガイダンスの出力を停止する。
【００３６】
また、最後に出力を行った日時を記録しておき、過去一定期間内に出力を行っている場合には、ガイダンスを出力しないようにすることも考えられる。この場合、設定されている一定の期間が経過すると、再びガイダンスを行うようにすることが好ましい。
【００３７】
また、ガイダンス出力回数の累積値と最終出力日時を併用することで、ガイダンスの出力回数が一定の回数を超えたらガイダンスを止めるが、最終出力から一定期間経過したら累積カウンタをリセットし、再びガイダンスを行うようにすることも可能となる。
【００３８】
次に、コマンド呼び出し状況としては、コマンド呼び出し回数を記録している。ユーザが当該コマンドを呼び出すたびにコマンド呼び出し回数を累積する。ユーザによるコマンド呼び出しが所定の回数を超えた場合には、ユーザが当該コマンドに習熟したものと判断して、ガイダンスの出力を停止する。
【００３９】
さらに、当該コマンドの最終呼び出し日時を記録しておき、最終呼び出しから一定の期間が経過したら呼び出し回数の累積カウンタをリセットし、再びガイダンスを行うようにすることも考えられる。
【００４０】
また、全てのコマンドについてガイダンスを行うとシステムの発話量が多くなり過ぎるという問題を避けるため、ガイダンスを行う数を制限することも考えられる。例えば、一回の発話、あるいは一定期間、の中でガイダンスするコマンドの数を一定数に制限することで実現可能となる。
【００４１】
さらに、状況記録データベース１５の内容をユーザごとに保存しておくことによって、ユーザが再び対話を開始したときに前回の状況を引き継いで対話を行うこともできる。
【００４２】
最後に音声出力部１６では、対話制御部１３からの指示によって音声による出力を行う。テキストデータを与えられ、当該テキストデータを音声合成によって音声に変換して出力する場合もあるし、ファイル識別子を与えられ、当該ファイル識別子に対応する音声ファイルを再生する場合もある。
【００４３】
次に、本発明の実施の形態１にかかる適応型音声対話システムを実現するプログラムの処理の流れについて説明する。図４に本発明の実施の形態１にかかる適応型音声対話システムを実現するプログラムの処理の流れ図を示す。
【００４４】
図３において、まずユーザによる音声が入力され（ステップＳ３０１）、入力された音声を認識してコマンド／文法データベース１４を照会する（ステップＳ３０２）。そして、認識結果と合致するコマンドが存在した場合には（ステップＳ３０３：Ｙｅｓ）、当該コマンド及び引数と考えられる内容が抽出され対話制御部１３に渡される（ステップＳ３０４）。
【００４５】
認識結果と合致するコマンドが存在しない場合には（ステップＳ３０３：Ｎｏ）、その旨を対話制御部１３に通知し、ユーザの再入力待ちとなる。
【００４６】
そして、ユーザが音声を入力した第一のタイミングもしくはユーザによる音声入力から所定のタイムアウト時間が経過した第二のタイミングにおいて、抽出されたコマンドに基づいて、状況記録データベース１５を照会する（ステップＳ３０５）。
【００４７】
そして、状況記録データベース１５におけるコマンドごとのガイダンス出力状況及びコマンド呼び出し状況に応じて、ガイダンスを含めた応答が生成され（ステップＳ３０６）、合成音声として出力される（ステップＳ３０７）。
【００４８】
以上のように本実施の形態１によれば、システムによる応答出力の中で、ユーザが入力すべきコマンドの文法、すなわち呼び出し方がガイダンスされ、ユーザは応答出力音声を聞きながら、当該音声対話システムに対するコマンドの呼び出し方を修得することができる。
【００４９】
また、同じガイダンス出力は、一定頻度に抑制することができ、延々と繰り返されることを回避することができる。また、ユーザが当該コマンドの呼び出し方を覚えて直接当該コマンドを呼び出す場合には、当該コマンドのガイダンス出力は行われない。さらに、ガイダンス出力が行われなくなってから、あるいはコマンドを呼び出さなくなってから、所定の時間経過すると、再びガイダンスが行われるようになる。このように、ユーザのコマンド習得の状況に対して適応的にガイダンス出力を行うことができるようになる。
【００５０】
ガイダンスの選択方法には様々な方法が考えられる。例えば、コマンドに対して優先順位を付けておき、優先順位が上位のコマンドから順にガイダンスを行うかどうかの判定を行い、上位のコマンドのガイダンスを行わない場合にのみ下位のコマンドのガイダンスを行うようにする方法が考えられる。
【００５１】
このようにすることで、基本的なコマンドほど優先順位を上位に設定しておくことで、最初は基本的なコマンドについてのガイダンスを行い、当該ガイダンスがユーザにとって不必要であると判断されたら、より高度なコマンドのガイダンスを行う、というように段階的にガイダンスを行うことが可能となる。なお、優先順位が同じコマンドが複数設定されていても良い。
【００５２】
具体的には、例えば「停止」コマンドの方が「再生」コマンドよりも優先順位が上位に設定されている場合には、まず「停止」コマンドのガイダンスが優先されて出力される。その後、ユーザが実際に「停止」コマンドを呼び出し、「停止」コマンドのガイダンスが必要なくなった後に、「再生」コマンドのガイダンスが行われるようになる。
【００５３】
また、コマンドを階層化しておくことも考えられる。例えば、「カーナビ」コマンドに対して、サブコマンド「目的地設定」、「渋滞情報」、「所要時間情報」を準備しておく。この場合の対話の状況を図５に示す。図５において、‘Ｕ’はユーザによる音声入力を、‘Ｓ’はシステムによる応答出力（ガイダンス）を示している。
【００５４】
図５のように、まずユーザが「カーナビ」コマンドを呼び出す。対話制御部１３は、各サブコマンドについて問い合わせを行う。すなわち、次にサブコマンド「目的地設定」を利用するかどうかについての問い合わせを出力する。かかるガイダンスでは、ユーザが音声入力として「はい」もしくは「いいえ」等の肯定もしくは否定の回答のみを受け付けるようガイダンスする。
【００５５】
「いいえ」等の否定回答をユーザから受け付けた、あるいはガイダンスを出力してから所定の時間が経過した場合には、図５（ａ）のように、次のサブコマンド「渋滞情報」について同様の処理を行う。
【００５６】
「はい」等の肯定回答をユーザから受け付けた場合には、図５（ｂ）のように、当該サブコマンドに基づいて応答生成処理を行う。なお、図５（ｂ）のように、サブコマンド「目的地設定」が直接呼び出されたのではなく、コマンド「カーナビ」から間接的に呼び出された場合には、状況記録データベース１５に、コマンド間接呼び出し状況として呼び出された回数の累積値を記録する。このときの状況記録データベース１５の例を図６に示す。
【００５７】
そして、状況記録データベース１５に累積されている間接呼び出しの回数が所定の回数に到達したら、直接呼び出しを行うためのガイダンスを応答として生成する。当該ユーザは、間接呼び出しされたコマンドに習熟していると判断できるからである。
【００５８】
そして、状況記録データベース１５において、「カーナビ」のコマンド呼び出し状況が更新されると同時に、「目的地設定」のコマンド間接呼び出し状況も更新される。間接呼び出しの判断回数が‘３’とすると、コマンド「目的地設定」については所定の回数に到達していることから、システムはコマンド「目的地設定」の直接呼出し（通常のコマンド呼び出し）のガイダンスを行うことになる。
【００５９】
さらにコマンド「目的地設定」の直接呼び出しが所定の回数行われたら、当該コマンドのガイダンスを止めることも考えられる。ユーザにとっての習熟度がかなり高いと判断できるからである。
【００６０】
また、ガイダンスも直接呼出しも何度か行われているにも関わらず、再度間接呼び出しが行われた場合は、ユーザが当該コマンドを忘れたものと判断できる。したがって、かかる場合には、ガイダンス出力状況の累積値をリセットし、再びガイダンスを行うようにすることも考えられる。
【００６１】
一方、同じコマンドであっても、ガイダンスの内容自体を変化させることも考えられる。例えば、一つのコマンドに対してガイダンスを丁寧なものから簡略なものまで複数用意しておき、ガイダンスの出力回数が増えるに伴い、より簡略なものへとガイダンスを切り替えて出力する処理も考えられる。
【００６２】
例えば、１回目のガイダンスとしては「天気予報をご利用の際は『天気』または『天気予報』と言って下さい。」と、２回目のガイダンスとしては「天気予報をご利用の際は『天気』と言って下さい。」と、３回目のガイダンスとしては「天気予報は『天気』で呼び出せます。」というように、回数を重ねるごとに簡略にガイダンスを出力するよう内容を変化させる。
【００６３】
あるいは、一つのコマンドに対して別の呼び出し方をガイダンスすることもできる。例えば、コマンドの呼び出し回数が増えたら、より複雑な呼び出し方をガイダンスするものである。具体的には、１回目あるいは２回目のガイダンスでは、「天気予報をご利用の際は『天気』と言って下さい。」とガイダンス出力するのに対し、３回目以降のガイダンスでは、「天気予報をご利用の際は『大阪の天気』のように言って下さい。」とガイダンス出力する。なお、ガイダンスの文面を切り替えることに特に限定されるものではなく、例えばガイダンスを読み上げる速度を変化させるものであっても良い。
【００６４】
（実施の形態２）
以下、本発明の実施の形態２にかかる適応型音声対話システムについて、図面を参照しながら説明する。本発明の実施の形態２にかかる適応型音声対話システムの構成図は実施の形態１と同様に図１に示す構成となる。
【００６５】
本実施の形態２においては、状況記録データベース１５の記録データ構成が相違する。図７に本発明の実施の形態２にかかる適応型音声対話システムにおける状況記録データベース１５のデータ構成例示図を示す。
【００６６】
図７に示すように、コマンドが引数を持つコマンドである場合、当該引数の呼び出し頻度についても状況記録データベース１５における記録対象としている点に特徴を有する。
【００６７】
例えば、コマンド「天気」は、場所や日時の情報を引数として持つことができる。したがって、「天気」、「明石の天気」、「明日の神戸の天気」等は、全てコマンド「天気」の呼び出し文法であり、それぞれ場所や日時という引数が与えられている。
【００６８】
そして、実施の形態１と同様にコマンドごとのガイダンス出力とコマンド呼び出しの状況を記録しているのに加えて、引数を持つコマンドそれぞれについて引数の頻度も記録する。例えば図７においては、引数を持つコマンド「天気」について、単純に与えられた引数の回数を記録している。
【００６９】
そして、例えばコマンド「天気」が引数なしで呼び出された場合において、引数の頻度に応じて自動的に引数を補う。すなわち、特定の引数が用いられる頻度が特に高いと判断される場合等に、当該引数を自動的に設定してガイダンスを出力する。逆に特に頻度の高い引数が無い場合には、デフォルトの引数をあらかじめ設定しておくことで、ガイダンスにデフォルトの引数を含めることが可能となる。
【００７０】
例えば、ユーザが「明石の天気」という呼び出しを頻繁に行い、かつ「今日」以外の日時を指定することがほとんどなかった場合、状況記録データベース１５においては、引数「明石」の呼び出し回数が所定のしきい値よりも大きく、引数「今日」についても同様の状況となっている。この場合には、ユーザによるコマンド「天気」だけの呼び出しに対して、引数「今日」と「明石」を補って、ガイダンスとして「今日の明石の天気は晴れです」というように出力されることになる。
【００７１】
以上のように本実施の形態２によれば、ユーザによるコマンドごとの引数の呼び出し回数についても累積値を記録しておくことで、ユーザが引数なしでコマンドを呼び出した場合であっても、効果的なガイダンスを行うことが可能となる。
【００７２】
（実施の形態３）
以下、本発明の実施の形態３にかかる適応型音声対話システムについて、図面を参照しながら説明する。図８に本発明の実施の形態３にかかる適応型音声対話システムの構成図を示す。図８においては、複数のユーザが用いる音声対話システム８１ごとに状況記録データベース１５を形成し、それぞれの累積値を記録集計部８２で集約する点に特徴を有している。
【００７３】
すなわち、本実施の形態３においては、ユーザごとではなく、複数のユーザの値が集計された状況記録データベースを用いて対話制御部１３がガイダンスを生成することになる。
【００７４】
例えば、ほとんどのユーザで利用されていないコマンドのガイダンス出力頻度を低くしたり、利用頻度の低いコマンドのガイダンス出力は簡易化したり、あるいは多くのユーザが利用するコマンドの優先順位を高くする等の制御を行うことによって、ガイダンス出力の有無やガイダンスの内容等を変化させる。このようにすることで、特に当該音声対話システムを使い始めたばかりのユーザについても、比較的有効なガイダンス出力を行うことが可能となる。
【００７５】
なお、記録集計部８２において、複数のユーザの管理下にある状況記録データベース１５の内容を集計するためには、図８の構成のように状況記録データベース１５と記録集計部８２をサーバに設置し、各々の対話制御部１３からネットワークを介して状況記録データベース１５を更新するものであっても良いし、記録集計部８２のみをサーバに設置し、各状況記録データベース１５の内容自体をネットワークを介して集計するものであっても良い。また、ユーザを識別して状況記録データベース１５を切り替えることによって、一つの音声対話システムに対して複数の状況記録データベースを持たせる構成であっても良い。
【００７６】
なお、本発明の実施の形態にかかる適応型音声対話システムを実現するプログラムは、図９に示すように、ＣＤ−ＲＯＭ９２−１やフレキシブルディスク９２−２等の可搬型記録媒体９２だけでなく、通信回線の先に備えられた他の記憶装置９１や、コンピュータ９３のハードディスクやＲＡＭ等の記録媒体９４のいずれに記憶されるものであっても良く、プログラム実行時には、プログラムはローディングされ、主メモリ上で実行される。
【００７７】
また、本発明の実施の形態にかかる適応型音声対話システムにより生成された状況記録データベース等についても、図９に示すように、ＣＤ−ＲＯＭ９２−１やフレキシブルディスク９２−２等の可搬型記録媒体９２だけでなく、通信回線の先に備えられた他の記憶装置９１や、コンピュータ９３のハードディスクやＲＡＭ等の記録媒体９４のいずれに記憶されるものであっても良く、例えば本発明にかかる適応型音声対話システムを利用する際にコンピュータ９３により読み取られる。
【００７８】
【発明の効果】
以上のように本発明にかかる適応型音声対話システムによれば、ユーザのガイダンスに対する応答状況に基づいて、次回のガイダンスをどのような内容で出力するのかを制御することができ、ユーザの個々の状況に応じたガイダンスを含む応答を行うことが可能となる。
【図面の簡単な説明】
【図１】本発明の実施の形態１にかかる適応型音声対話システムの構成図
【図２】本発明の実施の形態１にかかる適応型音声対話システムにおけるコマンド／文法データベースのデータ構成例示図
【図３】本発明の実施の形態１にかかる適応型音声対話システムにおける処理の流れ図
【図４】本発明の実施の形態１にかかる適応型音声対話システムにおける状況記録データベースのデータ構成例示図
【図５】本発明の実施の形態１にかかる適応型音声対話システムにおけるガイダンス出力の例示図
【図６】本発明の実施の形態１にかかる適応型音声対話システムにおける状況記録データベースの他のデータ構成例示図
【図７】本発明の実施の形態２にかかる適応型音声対話システムにおける状況記録データベースのデータ構成例示図
【図８】本発明の実施の形態３にかかる適応型音声対話システムの構成図
【図９】コンピュータ環境の例示図
【符号の説明】
１１音声入力部
１２コマンド抽出部
１３対話制御部
１４コマンド／文法データベース
１５状況記録データベース
１６音声出力部
８１音声対話システム
８２記録集計部
９１回線先の記憶装置
９２ＣＤ−ＲＯＭやフレキシブルディスク等の可搬型記録媒体
９２−１ＣＤ−ＲＯＭ
９２−２フレキシブルディスク
９３コンピュータ
９４コンピュータ上のＲＡＭ／ハードディスク等の記録媒体
[0001]
[Technical field of the invention]
The present invention relates to an adaptive spoken dialogue system and method in which humans and a computer interact with each other using speech. In particular, the present invention relates to an adaptive spoken dialogue system and method for changing a guidance output according to a user's familiarity with a system.
[0002]
[Prior art]
With the rapid progress of IT in recent years, voice-based interactive interfaces have come to be used in various applications. In particular, in situations where the user cannot use their hands or look away, such as while driving a car, it is important that application operation be eyes-free or hands-free. In other words, for safety, the driver's hands and eyes must never be taken up by operating an application device to the point that driving itself is hindered.
[0003]
Therefore, an interface using voice has been actively adopted for an application likely to be used in such a situation. The voice-based interactive interface makes it easier to concentrate attention even during driving, and can perform application operations without hindering driving operations.
[0004]
However, even with an interface using voice, depending on the content of the output voice, it is necessary to listen carefully or memorize it. In such a case, for example, the driver loses concentration and attention, which is not preferable for safety.
[0005]
Various approaches have therefore been devised to solve this problem. For example, Patent Document 1 discloses a voice guidance device that estimates a user's familiarity with an application from how frequently the user accesses it, and automatically changes the content of the voice guidance according to the degree of familiarity.
[0006]
Alternatively, Patent Document 2 discloses a voice interaction apparatus that evaluates the user's skill level with the application in use and selects guidance based on that skill level. The criteria for judging skill level include the response time until the user makes a voice input and the number of corrections made to the input result.
[0007]
Further, Patent Document 3 discloses a voice response device that, in order to account for factors that response time alone cannot capture, detects filler words, hesitations, and the like to determine whether the user is unfamiliar with the application, and changes the guidance accordingly.
[0008]
[Patent Document 1]
JP 2001-22370 A
[0009]
[Patent Document 2]
JP-A-10-20884
[0010]
[Patent Document 3]
JP 2001-331196 A
[0011]
[Problems to be solved by the invention]
However, the voice interaction systems described above presuppose that the user will input a response. As a result, a user who runs into an urgent situation, such as having their hands full, is judged to have low proficiency simply because the response took a long time, and beginner-level guidance is output to a user who is in fact highly proficient.
[0012]
In addition, conventional voice interaction interfaces do not allow the user to interrupt the response speech while it is being output and speak a response, which makes the interface hard to use for proficient users.
[0013]
To solve the above problems, an object of the present invention is to provide an adaptive voice interaction system and method that can determine the user's proficiency with an application while taking the user's situation into account, and can output appropriate guidance.
[0014]
[Means for Solving the Problems]
To achieve the above object, an adaptive voice interaction system according to the present invention includes a voice input unit that inputs a user's voice, a command extraction unit that recognizes the input voice and extracts a command, a dialogue control unit that generates, at a first timing at which the user inputs voice, a system response to be output by the system based on the command, and a voice output unit that performs voice output according to an instruction from the dialogue control unit. The system is characterized in that the generated system response includes guidance on what the user should say in reply, the guidance output status and the user's voice input status are recorded in a status recording database, and the guidance is changed according to the situation by referring to the status recording database.
[0015]
With this configuration, it is possible to control what the next guidance will contain based on how the user has responded to guidance, and thus to respond with guidance suited to each user's individual situation.
[0016]
Further, in the adaptive voice interaction system according to the present invention, the dialogue control unit preferably also generates a system response to be output by the system based on the command at a second timing, at which a predetermined time has elapsed without a response from the user.
[0017]
In the adaptive voice interaction system according to the present invention, it is also preferable that the command be composed of a main part and arguments, that the usage frequency of the arguments also be recorded in the status recording database, and that, when an argument is omitted from the command the user inputs by voice, the main part of the extracted command be supplemented with an argument according to the usage frequency of the arguments. This is because, even when an argument is not input, it can be filled in according to its usage frequency, and unnecessary responses and guidance can be omitted.
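The argument supplementation described in this paragraph can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the function name `complete_arguments`, the slot names, the threshold value, and the fallback place "Tokyo" are hypothetical and do not come from the specification, which only requires that omitted arguments be filled in according to recorded usage frequency.

```python
from collections import Counter

def complete_arguments(given, usage, defaults, threshold=3):
    """Fill in argument slots the user omitted (illustrative sketch).

    given:    slot -> value actually spoken by the user
    usage:    slot -> Counter of values the user has used before
    defaults: slot -> fallback value when no value is used often enough
    """
    completed = dict(given)
    for slot, default in defaults.items():
        if slot in completed:
            continue  # the user supplied this argument explicitly
        counts = usage.get(slot, Counter())
        if counts:
            value, count = counts.most_common(1)[0]
            if count > threshold:  # used more often than the assumed threshold
                completed[slot] = value
                continue
        completed[slot] = default
    return completed

# Hypothetical usage record for a "weather" command.
usage = {"place": Counter({"Akashi": 5, "Kobe": 1}), "day": Counter({"today": 6})}
defaults = {"place": "Tokyo", "day": "today"}
```

With this record, calling `complete_arguments({}, usage, defaults)` fills in both slots from the usage history, while an explicitly spoken place is left untouched.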
[0018]
In addition, the adaptive voice interaction system according to the present invention preferably further includes a record aggregation unit that aggregates the recorded contents of the status recording databases of a plurality of users, and the dialogue control unit preferably determines whether to output guidance, and the content of the guidance, according to the aggregation result from the record aggregation unit. This is because reasonably effective guidance can then be output even for a user who has just started using the voice interaction system.
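As a rough sketch of how aggregated records might drive guidance decisions, the following Python sums command call counts over several users and orders commands by overall popularity. The function names, the record layout, and the policy of dropping rarely used commands are assumptions made for illustration; the specification leaves the aggregation policy open.

```python
from collections import Counter

def aggregate_call_counts(per_user_records):
    """Sum command call counts over every user's status record."""
    total = Counter()
    for record in per_user_records:
        total.update(record)  # Counter.update adds counts from a mapping
    return total

def guidance_order(per_user_records, min_total_calls=1):
    """Order commands for guidance: most widely used first.

    Commands called fewer than min_total_calls times overall are left out,
    which is one possible way to lower guidance for rarely used commands.
    """
    total = aggregate_call_counts(per_user_records)
    popular = [(cmd, n) for cmd, n in total.items() if n >= min_total_calls]
    popular.sort(key=lambda item: (-item[1], item[0]))
    return [cmd for cmd, _ in popular]

# Hypothetical per-user call counts.
records = [
    {"weather": 4, "news": 1},
    {"weather": 2, "car navigation": 3},
    {"weather": 1},
]
```

A new user with an empty record of their own could then be guided through commands in the order returned by `guidance_order(records, min_total_calls=2)`.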
[0019]
Further, the present invention is characterized by software that executes the functions of the adaptive voice interaction system described above as processing steps of a computer. Specifically, it is an adaptive voice interaction method including a step of inputting a user's voice, a step of recognizing the input voice and extracting a command, a step of generating, at a first timing at which the user inputs voice, a system response to be output by the system based on the command, and a step of performing voice output, in which the generated system response includes guidance on what the user should say in reply, the guidance output status and the user's voice input status are recorded in a status recording database, and the guidance is changed according to the situation by referring to the status recording database; and it is a computer-executable program that embodies these steps.
[0020]
With this configuration, loading the program onto a computer and executing it makes it possible to control what the next guidance will contain based on how the user has responded to guidance, and thus to realize an adaptive voice interaction system that can respond with guidance suited to each user's individual situation.
[0021]
In addition, in the computer-executable program that embodies the adaptive voice interaction system according to the present invention, the step of generating a system response in the adaptive voice interaction method described above preferably also generates a system response to be output by the system based on the command at a second timing, at which a predetermined time has elapsed without a response from the user.
[0022]
[Embodiments of the invention]
Hereinafter, an adaptive voice interaction system according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a configuration diagram of the adaptive voice interaction system according to the first embodiment of the present invention.
[0023]
In FIG. 1, reference numeral 11 denotes a voice input unit that inputs the user's voice; a microphone or the like can be used as the input medium. Reference numeral 12 denotes a command extraction unit that recognizes the input voice and extracts a command. By recognizing the input voice, it determines what was said and extracts the command it contains for controlling the application.
[0024]
The command extraction unit 12 refers to a command / grammar database 14 registered in advance for information on “grammar” such as a recognizable command and what argument is required by the command. The command extraction unit 12 sends the command that matches the recognition result and the contents considered as arguments to the dialog control unit 13.
[0025]
If a command matching the recognition result does not exist in the command / grammar database 14, the command extracting unit 12 notifies the dialog control unit 13 of the fact. Further, if there is no input from the user even after a predetermined time-out period set at the start of recognition, the dialog control unit 13 is also notified that there is no input.
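The three outcomes that the command extraction unit reports to the dialogue control unit (a matched command with candidate arguments, no matching command, or no input before the timeout) could be modeled as in the sketch below. The grammar table, the `ExtractionResult` type, and the naive suffix matching are all illustrative assumptions; a real system would use its speech recognizer's grammar facilities instead.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar: command name -> phrases that invoke it.
GRAMMAR = {
    "weather": ["weather", "weather forecast"],
    "stop": ["stop"],
}

@dataclass
class ExtractionResult:
    status: str                     # "command", "no_match", or "no_input"
    command: Optional[str] = None
    argument_text: str = ""

def extract_command(recognized: Optional[str]) -> ExtractionResult:
    """Map recognized text to a command; None stands for a timeout with no speech."""
    if recognized is None:
        return ExtractionResult("no_input")
    text = recognized.strip().lower()
    for command, phrases in GRAMMAR.items():
        # Try longer phrases first so "weather forecast" wins over "weather".
        for phrase in sorted(phrases, key=len, reverse=True):
            if text.endswith(phrase):
                leftover = text[: len(text) - len(phrase)].strip()
                return ExtractionResult("command", command, leftover)
    return ExtractionResult("no_match")
```

Here whatever precedes the command phrase is passed along as the text that may contain arguments, mirroring the "contents considered as arguments" sent to the dialogue control unit.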
[0026]
Note that the method of recognizing the voice input is not particularly limited as long as it is an existing voice recognition method.
[0027]
Then, the dialogue control unit 13 generates a response to be presented to the user, based on the extracted command, at the first timing, when the user inputs voice, or at the second timing, when a predetermined timeout time has elapsed from the voice input by the user.
[0028]
Various timings can be considered as the first timing at which the voice is input by the user, such as at the same time as the start of the voice output, some time after the start of the voice output, or at the same time as the completion of the voice output.
[0029]
For example, when the user starts inputting voice simultaneously with the start of voice output as guidance on the system side, it is necessary to accept voice input from the user even during voice output by the system. As described above, when a voice input from the user is received during the voice output, the voice output may be interrupted, or the output may be continued according to the recognition result of the voice input.
[0030]
A timeout time is normally set when voice input starts. Therefore, even if there is no response from the user at all, a response to be presented to the user can be generated at the second timing, when the timeout time has elapsed.
[0031]
Then, when there is input from the user, or when the timeout time elapses with no input from the user, the dialogue control unit 13 generates the next response based on the input command or on the information that there was no input.
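The distinction between the first timing (the user spoke) and the second timing (the timeout elapsed with no input) can be illustrated with a blocking wait that has a timeout. The sketch below assumes recognized utterances arrive on a standard-library queue; the function name `wait_for_input` and the returned tags are hypothetical.

```python
import queue

def wait_for_input(inputs: "queue.Queue[str]", timeout_seconds: float):
    """Wait for the user's next utterance.

    Returns ("input", text) when speech arrives (first timing), or
    ("timeout", None) when the timeout elapses with no input (second timing).
    """
    try:
        return ("input", inputs.get(timeout=timeout_seconds))
    except queue.Empty:
        return ("timeout", None)
```

Either result can then be handed to the dialogue control logic, so that a silent user still receives a next response instead of leaving the dialogue stalled.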
[0032]
The generated response includes guidance on the speech that the user should input. The output status of the guidance and the user's voice input status in response to that guidance are then recorded in the status recording database 15.
[0033]
Guidance on command grammar is registered in the command / grammar database 14 for each command. The guidance may be recorded speech or text data used for speech synthesis. FIG. 2 shows an example of guidance registration in the command / grammar database 14 using text data.
[0034]
The status record database 15 records a guidance output status and a command call status for each command. FIG. 3 is an example of the situation record database 15.
[0035]
In the example of FIG. 3, for the guidance output status, the number of outputs is recorded, and the count is incremented each time the dialog control unit 13 outputs the guidance for the command. If the number of guidance outputs exceeds a predetermined threshold, it is determined that the user has mastered the command, and output of the guidance is stopped.
[0036]
It is also conceivable to record the date and time of the last output, and not output the guidance if the output has been performed within a certain period in the past. In this case, it is preferable that the guidance be performed again after a set period elapses.
[0037]
Also, by using the cumulative guidance output count together with the date and time of the last output, it is possible to stop the guidance once the output count exceeds a certain number, and then, once a certain period has elapsed since the last output, to reset the cumulative counter and provide the guidance again.
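The combined use of the output counter and the last output time might look like the following sketch. The threshold of three outputs and the 30-day reset period are assumed values chosen only for illustration, as are the names `GuidanceStatus`, `should_output_guidance`, and `record_guidance_output`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_GUIDANCE_OUTPUTS = 3            # assumed threshold
RESET_AFTER = timedelta(days=30)    # assumed period after which guidance resumes

@dataclass
class GuidanceStatus:
    output_count: int = 0
    last_output: Optional[datetime] = None

def should_output_guidance(status: GuidanceStatus, now: datetime) -> bool:
    """Decide whether to give guidance, resetting the counter after a long gap."""
    if status.last_output is not None and now - status.last_output >= RESET_AFTER:
        status.output_count = 0  # enough time has passed; start guiding again
    return status.output_count < MAX_GUIDANCE_OUTPUTS

def record_guidance_output(status: GuidanceStatus, now: datetime) -> None:
    """Update the record each time guidance for the command is output."""
    status.output_count += 1
    status.last_output = now
```

In this sketch the guidance is given three times, suppressed afterwards, and given again once 30 days have passed since the last output.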
[0038]
Next, the number of command calls is recorded as the command call status. Each time the user calls the command, the command call count is accumulated. If the number of command calls by the user exceeds a predetermined number, it is determined that the user has mastered the command, and the output of guidance is stopped.
[0039]
Further, it is conceivable to record the date and time of the last call of the command, reset a cumulative counter of the number of calls after a certain period has elapsed since the last call, and provide guidance again.
[0040]
Further, in order to avoid the problem that the amount of utterance of the system becomes too large if guidance is provided for all commands, the number of guidances may be limited. For example, it can be realized by limiting the number of commands to be guided in one utterance or a certain period to a certain number.
[0041]
Further, by storing the contents of the situation record database 15 for each user, it is possible to carry out the conversation by taking over the previous situation when the user starts the conversation again.
[0042]
Finally, the audio output unit 16 performs audio output according to an instruction from the dialog control unit 13. In some cases it is given text data and converts it into speech by speech synthesis; in other cases it is given a file identifier and plays back the audio file corresponding to that identifier.
[0043]
Next, a description will be given of a processing flow of a program for realizing the adaptive voice interaction system according to the first embodiment of the present invention. FIG. 4 shows a flowchart of the processing of a program for realizing the adaptive voice interaction system according to the first embodiment of the present invention.
[0044]
In FIG. 3, first, a voice is input by the user (step S301), and the input voice is recognized and the command / grammar database 14 is queried (step S302). If there is a command that matches the recognition result (step S303: Yes), the command and the contents considered as arguments are extracted and passed to the dialog control unit 13 (step S304).
[0045]
If there is no command that matches the recognition result (step S303: No), the dialog control unit 13 is notified of this, and the system waits for the user to re-input.
[0046]
Then, at the first timing when the user inputs the voice or at the second timing when the predetermined timeout time has elapsed from the voice input by the user, the status recording database 15 is queried based on the extracted command (step S305).
[0047]
Then, a response including the guidance is generated according to the guidance output status and the command call status for each command in the status recording database 15 (step S306), and output as a synthesized voice (step S307).
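Steps S301 to S307 can be condensed into a single function for illustration. The sketch below assumes speech recognition has already produced text, represents both databases as plain dictionaries, and uses a made-up response format; the function name `handle_utterance` and the `max_outputs` parameter are hypothetical.

```python
def handle_utterance(recognized, grammar, status_db, guidance_texts, max_outputs=3):
    """One pass of steps S301-S307 for already-recognized text (illustrative).

    Returns the response text, or None when no command matched and the
    system should wait for the user to re-input.
    """
    command = grammar.get(recognized)          # S302: query the grammar database
    if command is None:                        # S303: No
        return None
    # S304/S305: pass the command on and look up its status record
    record = status_db.setdefault(command, {"guidance_outputs": 0, "calls": 0})
    record["calls"] += 1
    response = f"Running {command}."           # S306: build the response
    if record["guidance_outputs"] < max_outputs:
        response += " " + guidance_texts[command]
        record["guidance_outputs"] += 1
    return response                            # S307: handed to speech output

# Hypothetical databases for a single "weather" command.
grammar = {"weather": "weather", "weather forecast": "weather"}
guidance_texts = {"weather": "You can also say 'Osaka weather'."}
```

Calling the function repeatedly with the same `status_db` shows the adaptive behavior: the guidance sentence is appended only until the output count reaches `max_outputs`.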
[0048]
As described above, according to the first embodiment, the system's response output provides guidance on the grammar of the commands the user should input, that is, on how to call them, so the user can learn how to call commands in the voice interaction system while listening to the response output.
[0049]
Further, output of the same guidance can be limited to a certain frequency, which prevents it from being repeated endlessly. When the user has learned how to call a command and calls it directly, guidance for that command is not output. Moreover, once a predetermined time has elapsed since guidance output stopped, or since the command was last called, the guidance is given again. In this way, guidance can be output adaptively according to how well the user has learned the commands.
[0050]
There are various possible methods for selecting guidance. For example, priorities can be assigned to commands, the decision whether to give guidance can be made in order from the highest-priority command, and guidance for a lower-priority command can be given only when guidance for the higher-priority commands is not given.
[0051]
By assigning higher priority to more basic commands in this way, guidance can be provided in stages: guidance on the basic commands is given first, and once that guidance is judged unnecessary for the user, guidance on more advanced commands is given. Note that multiple commands may be assigned the same priority.
[0052]
Specifically, for example, when the “stop” command has a higher priority than the “playback” command, first, the guidance of the “stop” command is output with priority. After that, the user actually calls the “stop” command, and after the guidance of the “stop” command becomes unnecessary, the guidance of the “playback” command is performed.
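The priority-based selection in the "stop" and "playback" example can be sketched as below. The representation of priorities as integers (lower number meaning higher priority), the `needs_guidance` callback, and the `limit` on how many commands are guided at once are assumptions made for this illustration.

```python
def select_guidance(commands, needs_guidance, limit=1):
    """Pick the commands to guide in this response (illustrative sketch).

    commands:       list of (priority, name); a lower number means higher priority
    needs_guidance: function name -> bool, True while the user still needs guidance
    limit:          maximum number of commands guided at once
    """
    levels = sorted({priority for priority, _ in commands})
    for level in levels:
        pending = [name for priority, name in commands
                   if priority == level and needs_guidance(name)]
        if pending:
            # Lower-priority commands are considered only when nothing at a
            # higher priority still needs guidance.
            return pending[:limit]
    return []

commands = [(1, "stop"), (2, "playback")]
learned = set()  # commands the user no longer needs guidance for
```

While `learned` is empty, `select_guidance(commands, lambda name: name not in learned)` returns only "stop"; after "stop" is added to `learned`, the same call moves on to "playback".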
[0053]
It is also conceivable to arrange commands hierarchically. For example, subcommands “destination setting”, “congestion information”, and “required time information” are prepared for the “car navigation” command. FIG. 5 shows the situation of the dialogue in this case. In FIG. 5, “U” indicates a voice input by the user, and “S” indicates a response output (guidance) by the system.
[0054]
As shown in FIG. 5, the user first calls the "car navigation" command. The dialogue control unit 13 then inquires about each subcommand in turn. That is, it next outputs an inquiry about whether to use the subcommand “destination setting”. This guidance tells the user that only an affirmative or negative answer, such as “Yes” or “No”, will be accepted as voice input.
[0055]
If a negative response such as “No” is received from the user, or if a predetermined time has elapsed since the guidance was output, the same processing is performed for the next subcommand, “congestion information”, as shown in FIG. 5(a).
[0056]
When an affirmative answer such as “Yes” is received from the user, response generation processing is performed based on that subcommand, as shown in FIG. 5(b). Note that when, as in FIG. 5(b), the subcommand “destination setting” is not called directly but is called indirectly from the command “car navigation”, the cumulative number of such calls is recorded in the status recording database 15 as the command indirect call status. FIG. 6 shows an example of the situation record database 15 at this time.
[0057]
Then, when the number of indirect calls accumulated in the situation record database 15 reaches a predetermined number, guidance prompting a direct call is generated as the response. This is because the user can be judged to have become familiar with the indirectly called command.
[0058]
Then, in the situation record database 15, the command call status of “car navigation” is updated, and at the same time the indirect call status of the command “destination setting” is updated. Assuming that the threshold number of indirect calls is “3”, the command “destination setting” has reached the predetermined number, so the system outputs guidance for calling the command “destination setting” directly (a normal command call).
[0059]
Further, when the command “destination setting” has been called a predetermined number of times, guidance for that command may be stopped. This is because the user's proficiency with the command can be judged to be quite high.
[0060]
In addition, when an indirect call is made again even though the guidance has been output and direct calls have been made several times, it can be judged that the user has forgotten the command. In such a case, it is conceivable to reset the cumulative value of the guidance output status and to output the guidance again.
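A minimal sketch of the indirect-call bookkeeping in paragraphs [0056] to [0060] follows. The `CallStatus` fields, the returned action strings, and the value of `DIRECT_THRESHOLD` are hypothetical; only the indirect-call threshold of 3 comes from the example in the text. Treating "at least one earlier guidance output and at least one direct call" as the sign of a forgotten command, and clearing the direct-call count on reset, are also assumptions.

```python
from dataclasses import dataclass

INDIRECT_THRESHOLD = 3  # the "3" used in the example in the text
DIRECT_THRESHOLD = 3    # hypothetical; the text only says "predetermined"


@dataclass
class CallStatus:
    indirect_calls: int = 0
    direct_calls: int = 0
    guidance_outputs: int = 0


def record_call(status, direct):
    """Record one call of a subcommand and return the resulting action:
    "guide_direct_call", "stop_guidance", or "none"."""
    if direct:
        status.direct_calls += 1
        if status.direct_calls >= DIRECT_THRESHOLD:
            return "stop_guidance"  # proficiency judged to be high
        return "none"

    status.indirect_calls += 1
    # Indirect call despite earlier guidance and direct calls: the command
    # is treated as forgotten, so the guidance record is reset.
    forgotten = status.guidance_outputs > 0 and status.direct_calls > 0
    if forgotten:
        status.guidance_outputs = 0
        status.direct_calls = 0  # assumption: direct-call count also resets
    if forgotten or status.indirect_calls >= INDIRECT_THRESHOLD:
        status.guidance_outputs += 1
        return "guide_direct_call"
    return "none"


status = CallStatus()
actions = [record_call(status, direct=False) for _ in range(3)]
```

With the thresholds above, the third indirect call is the first one to trigger guidance for the direct call.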
[0061]
On the other hand, even with the same command, the content of the guidance itself may be changed. For example, it is also conceivable to prepare a plurality of guidances from a polite one to a simple one for one command, and to switch and output the guidance to a simpler one as the number of guidance outputs increases.
[0062]
For example, the content is changed so that the guidance becomes simpler as the number of outputs increases: the first guidance is “When using the weather forecast, please say ‘weather’ or ‘weather forecast’.”, the second guidance is “When using the weather forecast, use ‘weather’.”, and the third guidance is “The weather forecast can be called with ‘weather’.”
[0063]
Alternatively, the guidance for one command may lead the user to another calling method. For example, as the number of command calls increases, guidance for a more complex calling method is provided. Specifically, the first and second guidance outputs are “When using the weather forecast, please say ‘weather’.”, while the third and subsequent outputs are “When using the weather forecast, please say something like ‘weather in Osaka’.” Moreover, the change is not limited to switching the guidance text; for example, the speed at which the guidance is read out may be changed.
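Switching the guidance text according to the number of outputs could look like the following sketch. The table `GUIDANCE_VARIANTS` and the function `select_guidance` are hypothetical names, the strings are paraphrases of the weather examples above, and reusing the last variant once the list is exhausted is an assumption.

```python
# Guidance texts ordered from the most detailed to the simplest.
GUIDANCE_VARIANTS = {
    "weather": [
        'When using the weather forecast, please say "weather" or "weather forecast".',
        'When using the weather forecast, use "weather".',
        'The weather forecast can be called with "weather".',
    ],
}


def select_guidance(command, outputs_so_far):
    """Pick the guidance text for the next output. Later outputs get
    simpler wording; the simplest text is reused once the list runs out."""
    variants = GUIDANCE_VARIANTS[command]
    index = min(outputs_so_far, len(variants) - 1)
    return variants[index]


first_text = select_guidance("weather", 0)
later_text = select_guidance("weather", 7)
```

The same lookup could hold texts that introduce a more complex calling method instead of simpler wording; only the contents of the table would change.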
[0064]
(Embodiment 2)
Hereinafter, an adaptive voice interaction system according to a second embodiment of the present invention will be described with reference to the drawings. The configuration of the adaptive voice interaction system according to the second embodiment is the same as that of the first embodiment, shown in FIG. 1.
[0065]
In the second embodiment, the recording data configuration of the situation recording database 15 is different. FIG. 7 shows an example of a data configuration of the situation recording database 15 in the adaptive voice interaction system according to the second embodiment of the present invention.
[0066]
As shown in FIG. 7, the second embodiment is characterized in that, for a command that takes arguments, the frequency with which each argument is used is also recorded in the situation record database 15.
[0067]
For example, the command “weather” can take a place and a date and time as arguments. Therefore, “weather”, “Akashi weather”, “tomorrow's Kobe weather”, and so on are all call grammars of the command “weather”, each given arguments such as a place and a date and time.
[0068]
As in the first embodiment, the guidance output status and the command call status are recorded for each command; in addition, for each command that takes arguments, the frequency of each argument is also recorded. For example, in FIG. 7, for the command “weather”, the number of times each argument has been given is simply recorded.
[0069]
Then, for example, when the command “weather” is called without an argument, the argument is automatically supplemented according to the frequency of the argument. That is, when it is determined that the frequency of using a particular argument is particularly high, the argument is automatically set and guidance is output. Conversely, if there is no particularly frequent argument, setting the default argument in advance enables the guidance to include the default argument.
[0070]
For example, if the user frequently calls “Akashi weather” and rarely specifies a date and time other than “today”, then in the situation record database 15 the number of calls with the argument “Akashi” exceeds a predetermined threshold, and the same holds for the argument “today”. In this case, when the user calls only the command “weather”, the arguments “today” and “Akashi” are supplemented, and the guidance “Today's Akashi weather is fine” is output.
[0071]
As described above, according to the second embodiment, the cumulative number of times the user has used each argument is also recorded for each command, so that guidance can be provided even when the user calls a command without an argument.
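The argument supplementation of the second embodiment can be illustrated with the sketch below. The slot names `place` and `date`, the threshold value, and the default arguments are assumptions introduced for the example; the specification only states that a threshold and defaults may be set in advance.

```python
from collections import Counter

ARGUMENT_THRESHOLD = 5                         # hypothetical threshold
DEFAULTS = {"place": "Kobe", "date": "today"}  # hypothetical defaults


def complete_arguments(given, usage):
    """Fill in omitted argument slots from recorded usage frequencies.

    `given` maps slot name -> value spoken by the user (may be empty).
    `usage` maps slot name -> Counter of values the user has used so far.
    A slot is filled with its most frequent value when that value's count
    exceeds the threshold, and with the default argument otherwise.
    """
    completed = dict(given)
    for slot, default in DEFAULTS.items():
        if slot in completed:
            continue
        top = usage.get(slot, Counter()).most_common(1)
        if top and top[0][1] > ARGUMENT_THRESHOLD:
            completed[slot] = top[0][0]
        else:
            completed[slot] = default
    return completed


usage = {
    "place": Counter({"Akashi": 8, "Kobe": 1}),
    "date": Counter({"today": 9}),
}
result = complete_arguments({}, usage)  # the user said only "weather"
```

Arguments that the user did speak are never overwritten, so an explicit "Osaka weather" would keep "Osaka" and only have the date supplemented.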
[0072]
(Embodiment 3)
Hereinafter, an adaptive voice interaction system according to a third embodiment of the present invention will be described with reference to the drawings. FIG. 8 shows a configuration diagram of the adaptive voice interaction system according to the third embodiment of the present invention. The configuration in FIG. 8 is characterized in that a situation record database 15 is provided for each of the voice dialogue systems 81 used by a plurality of users, and the accumulated values are aggregated by a record aggregation unit 82.
[0073]
That is, in the third embodiment, the dialogue control unit 13 generates the guidance using a situation record database in which the values of a plurality of users are tabulated, not for each user.
[0074]
For example, the presence or absence of guidance output, the content of the guidance, and the like are changed by control such as reducing the frequency of guidance output for commands that most users do not use, simplifying the guidance output for commands that are not frequently used, or raising the priority of commands used by many users. By doing so, relatively effective guidance can be output, particularly to a user who has just started using the voice interaction system.
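One way the aggregated records might drive such control is sketched below. The functions `aggregate_records` and `guidance_policy`, the returned policy names, and the share thresholds `low` and `high` are all assumptions for illustration; the specification does not define concrete criteria.

```python
from collections import Counter


def aggregate_records(per_user_counts):
    """Aggregate per-user command call counts.

    Returns (total calls per command, number of users who used each command).
    """
    total_calls = Counter()
    users_using = Counter()
    for counts in per_user_counts:
        for command, n in counts.items():
            if n > 0:
                total_calls[command] += n
                users_using[command] += 1
    return total_calls, users_using


def guidance_policy(command, users_using, n_users, low=0.2, high=0.8):
    """Choose a guidance policy from the share of users using the command.
    The thresholds `low` and `high` are hypothetical."""
    share = users_using[command] / n_users
    if share < low:
        return "reduce_guidance"   # most users do not use the command
    if share >= high:
        return "raise_priority"    # the command is used by many users
    return "default"


records = [
    {"weather": 4, "playback": 2},
    {"weather": 1},
    {"weather": 6, "playback": 1},
    {"weather": 2},
    {"weather": 3},
]
totals, users = aggregate_records(records)
```

Because the policy depends only on aggregated values, it can be applied to a new user whose own situation record database is still empty.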
[0075]
In order for the record aggregation unit 82 to aggregate the contents of the situation record databases 15 managed for a plurality of users, the situation record database 15 and the record aggregation unit 82 may be installed on a server as shown in FIG. 8, with the situation record database 15 updated from each dialogue control unit 13 via a network. Alternatively, only the record aggregation unit 82 may be installed on a server, and the contents of each situation record database 15 may be aggregated via the network. Further, a single voice interaction system may hold a plurality of situation record databases and switch among them by identifying the user.
[0076]
As shown in FIG. 9, the program that realizes the adaptive voice interaction system according to the embodiments of the present invention may be stored not only on a portable recording medium 92 such as a CD-ROM 92-1 or a flexible disk 92-2, but also in another storage device 91 provided at the far end of a communication line, or on a recording medium 94 such as a hard disk or RAM of a computer 93. When the program is executed, it is loaded and run in main memory.
[0077]
Also, as shown in FIG. 9, the situation record database and other data generated by the adaptive voice interaction system according to the embodiments of the present invention may be stored not only on a portable recording medium 92 such as a CD-ROM 92-1 or a flexible disk 92-2, but also in another storage device 91 provided at the far end of a communication line, or on a recording medium 94 such as a hard disk or RAM of a computer 93. Such data is read by the computer 93 when the adaptive voice interaction system is used.
[0078]
[Effect of the invention]
As described above, according to the adaptive voice interaction system of the present invention, the content of the next guidance can be controlled based on how the user has responded to previous guidance, so that a response including guidance suited to each individual user's situation can be given.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an adaptive voice interaction system according to a first embodiment of the present invention;
FIG. 2 is a diagram showing an example of a data configuration of a command / grammar database in the adaptive voice interaction system according to the first embodiment of the present invention;
FIG. 3 is a flowchart of a process in the adaptive voice interaction system according to the first embodiment of the present invention;
FIG. 4 is a diagram illustrating an example of a data configuration of a situation recording database in the adaptive voice interaction system according to the first embodiment of the present invention;
FIG. 5 is a view showing an example of guidance output in the adaptive voice dialogue system according to the first embodiment of the present invention;
FIG. 6 is another example of the data configuration of the situation record database in the adaptive voice dialogue system according to the first embodiment of the present invention;
FIG. 7 is a diagram illustrating an example of a data configuration of a situation recording database in the adaptive voice interaction system according to the second embodiment of the present invention;
FIG. 8 is a configuration diagram of an adaptive voice interaction system according to a third embodiment of the present invention;
FIG. 9 is an exemplary diagram of a computer environment.
[Explanation of symbols]
11 Voice input unit
12 Command extraction unit
13 Dialogue control unit
14 Command / grammar database
15 Situation record database
16 Audio output unit
81 Voice dialogue system
82 Record aggregation unit
91 Storage device at the end of a communication line
92 Portable recording media such as CD-ROM and flexible disk
92-1 CD-ROM
92-2 Flexible disk
93 Computer
94 Recording media such as RAM / hard disk on computer

Claims

An adaptive voice dialogue system comprising:
a voice input unit for inputting a user's voice;
a command extraction unit for recognizing the input voice and extracting a command;
a dialogue control unit that generates, at a first timing when the user inputs a voice, a system response to be output by the system based on the command; and
an audio output unit that performs audio output in accordance with an instruction from the dialogue control unit,
wherein the generated system response includes guidance on the content that the user should input, the output status of the guidance and the voice input status of the user are recorded in a situation record database, and the guidance is changed according to the situation by referring to the situation record database.

The adaptive voice dialogue system according to claim 1, wherein the dialogue control unit generates a system response output by the system based on the command even at a second timing after a predetermined time has passed without a response from a user.

The adaptive voice dialogue system according to claim 1, wherein the command is composed of a main part and an argument, the frequency of use of the argument is also recorded in the situation record database, and when the argument is omitted in a command input by the user by voice, the argument is supplemented to the main part of the extracted command in accordance with the frequency of use of the argument.

The adaptive voice dialogue system according to claim 1, further comprising a record aggregation unit that aggregates the recorded contents of the situation record database for a plurality of users,
wherein the dialogue control unit determines the presence or absence of the guidance output and the content of the guidance in accordance with the result of the aggregation by the record aggregation unit.

An adaptive voice dialogue method comprising the steps of:
inputting a user's voice;
recognizing the input voice and extracting a command;
generating, at a first timing when the user inputs a voice, a system response to be output by the system based on the command; and
performing audio output,
wherein the generated system response includes guidance on the content that the user should input, the output status of the guidance and the voice input status of the user are recorded in a situation record database, and the guidance is changed according to the situation by referring to the situation record database.

The adaptive voice dialogue method according to claim 5, wherein, in the step of generating the system response, a system response to be output by the system based on the command is generated even at a second timing after a predetermined time has passed without a response from the user.

A computer-executable program embodying an adaptive voice dialogue method comprising the steps of:
inputting a user's voice;
recognizing the input voice and extracting a command;
generating, at a first timing when the user inputs a voice, a system response to be output by the system based on the command; and
performing audio output,
wherein the generated system response includes guidance on the content that the user should input, the output status of the guidance and the voice input status of the user are recorded in a situation record database, and the guidance is changed according to the situation by referring to the situation record database.

The computer-executable program according to claim 5, wherein, in the step of generating the system response, a system response to be output by the system based on the command is generated even at a second timing after a predetermined time has passed without a response from the user.