JP2004271909A

JP2004271909A - Voice interactive system and method, and voice interactive program and its recording medium

Info

Publication number: JP2004271909A
Application number: JP2003062552A
Authority: JP
Inventors: Kouji Dousaka; 浩二堂坂; Yoshihito Yasuda; 宜仁安田; Kiyoaki Aikawa; 清明相川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-03-07
Filing date: 2003-03-07
Publication date: 2004-09-30
Anticipated expiration: 2023-03-07
Also published as: JP3883066B2

Abstract

PROBLEM TO BE SOLVED: To make it possible to omit confirmation of part or all of the content of a user's inquiry, and thereby avoid the increase in user dissatisfaction that accompanies a growing number of output confirmation sentences.
SOLUTION: A voice interactive system comprises: a means 100 for inputting an inquiry sentence, an approval sentence in reply to a confirmation, and a correction sentence in reply to a confirmation; a means 110 for generating and updating the system understanding state at each point of the dialogue; a means 120 for determining the types of information to be provided; a means 130 for generating definitive confirmation-type, trial confirmation-type, definitive immediate-response-type, and trial immediate-response-type dialogue procedures; a means 140 for computing the cost of each dialogue procedure; a means 150 for generating every combination of not-yet-confirmed attributes as a confirmation candidate; a means 160 for computing the cost of each confirmation candidate from the costs of the dialogue procedures that include its confirmation; and a means 170 for comparing the costs of the confirmation candidates with the costs of the immediate-response-type dialogue procedures and outputting the confirmation sentence or response sentence with the minimum cost.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

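[0001]
[Technical Field of the Invention]
The present invention relates to an interactive information-providing system, and more particularly to a voice interaction system and method, a voice interaction program, and a recording medium storing the program. In this system a user inputs by voice an inquiry about the contents of a database, and the system, after exchanging utterances with the user to confirm the content of the inquiry where necessary, provides the relevant database contents to the user by voice according to the recognized inquiry.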
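[0002]
[Description of the Prior Art]
Consider a database storing various kinds of information such as weather information, television program listings, schedules, or transportation timetables. When a user inputs by voice, through a speech recognizer, a sentence expressing an inquiry about the contents of the database, a voice interaction system holds the inquiry content it has recognized as a system understanding state. The system understanding state is expressed as a set of triples, each consisting of an attribute, the value of that attribute, and a flag indicating whether the value has been confirmed.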
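[0003]
In such a voice interaction system, the result of recognizing the user's inquiry is not always correct because of speech recognition errors. Therefore, when necessary, the system outputs a confirmation sentence to the user by voice to confirm all or part of the system understanding state. If the user inputs an approval sentence accepting the confirmation (e.g., "Yes"), the system marks the confirmed part of the system understanding state as confirmed. If the user inputs a correction sentence in reply to the confirmation, the system updates the system understanding state according to the content of the correction. After the confirmation exchange, the system determines the content of the user's inquiry from the system understanding state and outputs a response sentence that provides information to the user.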
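[0004]
If no confirmation is performed at all, then immediately after the user inputs the inquiry, the system determines the content of the inquiry from the system understanding state at that time and outputs a response sentence that provides information to the user.
[0005]
The advantage of confirmation is that, when the user inputs an approval sentence in reply to a confirmation sentence for an attribute value, the probability that the value is correct increases, so the system can convey the information the user needs more accurately. However, confirmation lengthens the dialogue, which may lower user satisfaction.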
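[0006]
Conventional techniques by which a voice interaction system confirms the content of a user's inquiry include the following four. The first technique generates a response sentence using only attribute values that have been confirmed by an approval sentence from the user. The second technique performs no confirmation at all. The third technique records in the system understanding state the confidence of the speech recognition result obtained when each attribute value was recognized, and does not confirm an attribute value whose confidence exceeds a certain threshold (see, e.g., Non-Patent Document 1). The fourth technique generates a response sentence on the assumption that not only the confirmed attribute values but also the unconfirmed ones are correct; it presupposes that the user will utter a rejection if the response is wrong, and once such a rejection is received, it next generates a response sentence treating only the confirmed attribute values as correct (see, e.g., Non-Patent Documents 2 and 3).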
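[0007]
[Non-Patent Document 1]
Kazunori Komatani and Tatsuya Kawahara, "Dialogue Management for Handling Speech Recognition Errors in Mixed-Initiative Dialogue," Proceedings of the 6th Annual Meeting of the Association for Natural Language Processing, pp. 336-339 (2000).
[Non-Patent Document 2]
Kouji Dousaka, Yoshihito Yasuda, and Kiyoaki Aikawa, "Efficient Spoken Dialogue Control under Limited System Knowledge," Journal of Natural Language Processing, Vol. 9, No. 1, pp. 43-63 (2002).
[Non-Patent Document 3]
Kouji Dousaka, Yoshihito Yasuda, and Kiyoaki Aikawa, "A Spoken Dialogue Control Method Based on Maximizing Information Transfer Efficiency," Proceedings of the 8th Annual Meeting of the Association for Natural Language Processing, pp. 260-263 (2002).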
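[0008]
[Problems to Be Solved by the Invention]
Because of speech recognition errors, a voice interaction system's recognition of a user's inquiry is not necessarily correct. Techniques have therefore been developed that allow a voice interaction system to answer the user's inquiry as well as possible even in the presence of speech recognition errors.
[0009]
In all of the conventional first, second, third, and fourth techniques described above, the content of the user's inquiry as recognized by the system at each point in the dialogue is represented as a set of triples, each consisting of an attribute, its value, and a flag indicating whether the value has been confirmed. This set of triples is called the system understanding state.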
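[0010]
In the first technique, the system outputs to the user a confirmation sentence for a recognized attribute value, and when an approval sentence from the user (for example, "Yes") is recognized, it records in the system understanding state that the attribute value is now confirmed. The system searches the database using only confirmed attribute values and outputs a response sentence conveying the search result to the user. Because this method responds only on the basis of values the user has approved, the system is unlikely to give a wrong response based on a misrecognized inquiry. Although confirmation sentences make the system's responses more reliable, many confirmations increase the time the user needs to obtain the required information, which raises user dissatisfaction.
[0011]
The conventional first technique confirms every attribute value needed to create a response sentence and has no mechanism for reducing the number of confirmation sentences, so user dissatisfaction increases.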
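[0012]
The conventional second technique performs no confirmation at all. Unlike the first technique, it never raises user dissatisfaction through confirmation sentences, since it outputs none. However, because nothing is confirmed, the system may output a response sentence based on a misrecognized inquiry. The user then has to repeat the inquiry, and the dialogue needed to obtain the required information becomes longer rather than shorter.
[0013]
The conventional third technique records the confidence of the speech recognition result for each attribute value, skips confirmation of attribute values whose confidence exceeds a threshold, and confirms values whose confidence is low. The threshold is chosen by the criterion of improving the rate at which the user's inquiry is correctly understood. The problem is that improving the understanding rate does not necessarily minimize the length of the dialogue the user needs to obtain the required information.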
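[0014]
The conventional fourth technique enumerates, at each point in the dialogue, the possible dialogue procedures up to the output of a response sentence, and determines the system's behavior at that point according to the procedure that yields the shortest dialogue. Three kinds of dialogue procedure are considered: a definitive confirmation-type procedure, which performs confirmation and then outputs a response sentence based only on the confirmed part of the system understanding state; a definitive immediate-response-type procedure, which performs no confirmation and outputs a response sentence based only on the currently confirmed part of the system understanding state; and a confirmation-necessity-search-type procedure, which outputs a response sentence based on the system understanding state including currently unconfirmed attribute values and, if the user utters a rejection of that response, restarts the dialogue and, in the restarted dialogue, responds based only on the confirmed part of the system understanding state. The fourth technique estimates the dialogue length for each of these procedures, selects the one with the shortest dialogue, and determines the system's next action according to the selected procedure.
[0015]
In the fourth technique, when it can be judged that the confirmation-necessity-search-type procedure lets the user obtain the required information in a shorter dialogue than the other kinds of procedure, that procedure is selected, and confirmation of some attribute values can be omitted. It is also guaranteed that the overall dialogue then does not become longer than under the other procedures.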
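[0016]
Compared with the conventional first technique, the fourth technique has the advantage that some confirmations can be omitted, and it overcomes the problem of the second technique, in which repeatedly outputting wrong response sentences lengthens the dialogue. Whereas the third technique chooses its threshold by the criterion of improving the understanding rate of user inquiries, the fourth technique uses the criterion of shortening the dialogue that results from following each procedure.
[0017]
However, the fourth technique presupposes that the user utters a rejection when the system responds wrongly, and it cannot be applied when this assumption does not hold. In real dialogues, when the system responds wrongly, the user may utter a rejection such as "That's wrong," but may equally well simply repeat the inquiry and restart the dialogue without any explicit rejection. Moreover, forcing the user always to utter a rejection after a wrong response burdens the user and increases dissatisfaction.
[0018]
Thus, while the fourth technique overcomes the problems of the first, second, and third techniques, it presupposes that the user utters a rejection in reply to a wrong system response, and this assumption does not always hold in real dialogues.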
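[0019]
The present invention has been made in view of these problems of conventional voice interaction systems. Its object is to reduce as far as possible the confirmation sentences output to confirm the system understanding state, while at the same time suppressing the increase in dialogue length caused by restarting the dialogue, without presupposing that the user utters a rejection when the system responds wrongly because of misrecognition.
[0020]
[Means for Solving the Problems]
The present invention drops the confirmation-necessity-search-type procedure considered in the conventional fourth technique, because that procedure presupposes rejection utterances by the user. Instead, it considers two procedures. The first is a trial confirmation-type procedure, which outputs a confirmation sentence for all or some of the attribute values in the system understanding state and, after the user inputs an approval sentence, outputs a response sentence providing information on the assumption that not only the confirmed attribute values but also the unconfirmed ones are correct. The second is a trial immediate-response-type procedure, which performs no confirmation and immediately outputs a response sentence providing information to the user on the assumption that not only the confirmed attribute values but also the unconfirmed ones are correct. Neither procedure presupposes a rejection utterance from the user when the system responds wrongly.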
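[0021]
In addition to these trial confirmation-type and trial immediate-response-type procedures, the present invention also considers a definitive confirmation-type procedure, which outputs a confirmation sentence for all or some of the attribute values in the system understanding state and, after the user inputs an approval sentence, outputs a response sentence providing information according to the confirmed attribute values; and a definitive immediate-response-type procedure, which performs no confirmation and immediately outputs a response sentence providing information according to the confirmed attribute values.
[0022]
At each point in the dialogue, all four kinds of procedure (definitive confirmation-type, trial confirmation-type, definitive immediate-response-type, and trial immediate-response-type) are considered, and the procedure that yields the shortest dialogue is selected. If the length of the dialogue carried out according to a procedure is called the cost of that procedure, the procedure with the minimum cost is selected.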
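[0023]
The dialogue cost of each procedure can be computed as follows, counting content words (jiritsugo, i.e., independent words):
Definitive confirmation-type procedure: the expected number of content words exchanged between the system and the user when the dialogue follows the procedure.
Definitive immediate-response-type procedure: the expected number of content words in the system's response sentence when the dialogue follows the procedure.
Trial confirmation-type procedure: the sum of the expected number of content words exchanged between the system and the user when the dialogue follows the procedure, and the expected number of content words exchanged in the subsequent dialogue if the system's response sentence in the procedure is wrong.
Trial immediate-response-type procedure: the sum of the expected number of content words in the system's response sentence when the dialogue follows the procedure, and the expected number of content words exchanged between the system and the user in the subsequent dialogue if the system's response is wrong.
[0024]
According to the present invention, all four kinds of procedure are considered and the one with the minimum dialogue cost, an objective criterion, is selected. This reduces as far as possible the confirmation sentences output to confirm the system understanding state while suppressing the increase in dialogue length caused by restarting the dialogue, without presupposing that the user utters a rejection when the system responds wrongly because of misrecognition.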
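[0025]
[Embodiments of the Invention]
Embodiments of the invention are described in detail below with reference to the drawings.
Fig. 1 shows the system environment of the voice interaction system according to the present invention. The voice interaction system 10 includes a database 20 and a memory device 30, and is connected to many user terminals (computer terminals) 50 via a communication line 40 or the like. When a user inputs by voice, from a user terminal 50, a sentence inquiring about information, the voice interaction system 10 exchanges utterances with the user to confirm all or part of the inquiry and then provides the user by voice, via the communication line 40 and the user terminal 50, with the information in the database 20 that matches the inquiry. The voice interaction system 10 is implemented as a computer system. The memory device 30 is the working memory of the voice interaction system 10 and, as described later, stores the system understanding state, dialogue procedures, dialogue costs, confirmation candidates, confirmation-candidate costs, and so on at each point of the dialogue.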
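[0026]
Fig. 2 is a functional block diagram showing a configuration example of the voice interaction system 10 according to the present invention. In the figure, the input unit 100 and the output unit 170 are connected to the user terminals 50 via the communication line 40 or the like.
[0027]
The input unit 100 receives a sentence that the user inputs as speech (a speech signal) through the speech recognizer of the user terminal 50 serving as a computer terminal: a sentence inquiring about information, a sentence correcting the inquiry, or a sentence approving a confirmation.
[0028]
The sentence understanding unit 110 recognizes the content of the user inquiry input as speech (a speech signal) from the input unit 100 and holds the recognized inquiry content as the system understanding state; in practice, the state is held in the memory device 30. The system understanding state is expressed as a set of triples, each consisting of an attribute, its value, and a flag indicating whether the value has been confirmed. When the sentence understanding unit 110 recognizes that the user has input a sentence inquiring about information, it generates a system understanding state from the recognition result of the inquiry. When the user inputs a correction of the inquiry, it updates the current system understanding state according to the recognized content of the correction. When the system has output a confirmation sentence for all or some of the attribute values in the system understanding state and then recognizes that the user has input an approval sentence, it records in the current system understanding state that the confirmed values are now confirmed.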
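The triple representation and the three update rules just described (generate from an inquiry, overwrite on a correction, mark as confirmed on an approval) can be sketched as a small Python data structure. This is a minimal illustration under our own assumptions: the class and method names (`UnderstandingState`, `from_query`, `correct`, `approve`) are hypothetical, recognized sentences are assumed to arrive already parsed into attribute/value pairs, and a corrected value is treated as unconfirmed until it is approved, as in the worked example later in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    value: str
    confirmed: bool = False  # the flag of the triple: confirmed / unconfirmed


@dataclass
class UnderstandingState:
    """Hypothetical system understanding state: attribute -> Slot(value, flag)."""
    slots: dict = field(default_factory=dict)

    @classmethod
    def from_query(cls, recognized):
        """Inquiry sentence: build a new state; every value starts unconfirmed."""
        return cls({attr: Slot(val) for attr, val in recognized.items()})

    def correct(self, recognized):
        """Correction sentence: overwrite values; they stay unconfirmed until approved."""
        for attr, val in recognized.items():
            self.slots[attr] = Slot(val)

    def approve(self, attrs):
        """Approval sentence: mark the attributes that were just confirmed."""
        for attr in attrs:
            self.slots[attr].confirmed = True

    def unconfirmed(self):
        return [a for a, s in self.slots.items() if not s.confirmed]

    def triples(self):
        return [(a, s.value, s.confirmed) for a, s in self.slots.items()]


# Replaying the worked example: "Kanagawa" is first misheard as "Kagawa".
state = UnderstandingState.from_query(
    {"place": "Kagawa", "day": "tomorrow", "info_type": "weather"})
state.approve(["info_type"])        # user said "Yes" to "Weather?"
after_first_round = state.unconfirmed()
state.correct({"place": "Kanagawa"})  # user said "Kanagawa Prefecture"
state.approve(["place"])            # user said "Yes" to "Kanagawa Prefecture?"
```

At the end of the replay only the day attribute is still unconfirmed, which is the situation the text reaches before its final response.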
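[0029]
The provided-information-type determination unit 120 judges, from the system understanding state generated and updated by the sentence understanding unit 110, all types of information that can be provided to the user, determines them as the provided-information types, and also computes the probability of each provided-information type. Although omitted from Fig. 1, these provided-information types and their probabilities are also held in the memory device 30.
[0030]
For each provided-information type determined by the provided-information-type determination unit 120, the dialogue procedure generation unit 130 generates four kinds of dialogue procedure:
(A) a definitive confirmation-type procedure, which exchanges utterances to confirm all or some of the attribute values in the system understanding state and then outputs a response sentence providing information to the user on the assumption that only the confirmed attribute values are correct;
(B) a trial confirmation-type procedure, which exchanges utterances to confirm all or some of the attribute values in the system understanding state and then outputs a response sentence on the assumption that not only the confirmed attribute values but also the unconfirmed ones are correct;
(C) a definitive immediate-response-type procedure, which performs no confirmation and immediately outputs a response sentence on the assumption that only the confirmed attribute values are correct; and
(D) a trial immediate-response-type procedure, which performs no confirmation and immediately outputs a response sentence on the assumption that not only the confirmed attribute values but also the unconfirmed ones are correct.
Each generated dialogue procedure (dialogue plan) is held in the memory device 30.
[0031]
The dialogue procedure cost calculation unit 140 calculates the cost (dialogue cost) of each procedure generated by the dialogue procedure generation unit 130. For a definitive confirmation-type procedure (A), the cost is the expected number of content words exchanged between the system and the user when the dialogue follows the procedure. For a trial confirmation-type procedure (B), the cost is the sum of the expected number of content words exchanged when the dialogue follows the procedure and the expected number of content words exchanged in the subsequent dialogue if the system's response sentence in the procedure is wrong. For a definitive immediate-response-type procedure (C), the cost is the expected number of content words in the system's response sentence when the system responds according to the procedure. For a trial immediate-response-type procedure (D), the cost is the sum of the expected number of content words in the system's response sentence and the expected number of content words exchanged in the subsequent dialogue if the system's response is wrong. The cost of each procedure is held in the memory device 30 in association with that procedure.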
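[0032]
The confirmation candidate generation unit 150 generates, as confirmation candidates, all combinations of the attributes in the system understanding state that have not yet been confirmed. Each generated confirmation candidate is held in the memory device 30.
[0033]
For each confirmation candidate and each provided-information type, the confirmation candidate cost generation unit 160 selects, among the procedures that include confirmation of that candidate, the one with the minimum cost. It then computes the expected cost, weighted by the probabilities of the provided-information types, and takes this expected value as the cost of the confirmation candidate. The cost of each confirmation candidate is held in the memory device 30 in association with that candidate.
[0034]
The output unit 170 compares the costs of all confirmation candidates with the costs of all immediate-response-type procedures. If an immediate-response-type procedure is cheaper, the output unit generates and outputs a response sentence providing information according to the minimum-cost immediate-response-type procedure; if a confirmation candidate is cheaper, it generates and outputs a confirmation sentence for confirming the minimum-cost confirmation candidate.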
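[0035]
The control unit 180 controls units 100 to 170 so that they cooperate as described above.
[0036]
Fig. 3 is a processing flowchart of the voice interaction system 10, and Fig. 4 is a detailed flowchart of the cost judgment step 210 in Fig. 3.
[0037]
The detailed operation of the embodiment is described below with a concrete example.
Consider a voice interaction system that provides weather information. There are two possible provided-information types: weather and warnings. A user inquiry is represented by three attributes: place, day, and information type. The place attribute takes prefecture or city names such as Kanagawa Prefecture or Kagawa Prefecture as values; the day attribute takes the values today and tomorrow; and the information type attribute takes the values weather and warning. The database 20 contains 100 places and stores, for each place, the forecast weather category and the types of warnings that have been issued.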
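[0038]
Suppose the user first inputs to the input unit 100 the following sentence inquiring about information (step 201):
"Please tell me tomorrow's weather for Kanagawa Prefecture." (1)
[0039]
The sentence understanding unit 110 recognizes the content of the user's inquiry and generates a system understanding state (step 202). The system understanding state is expressed as a set of triples, each consisting of an attribute, its value, and a flag indicating whether the value has been confirmed. Suppose the sentence understanding unit 110 generates the following system understanding state:
{<place, Kagawa Prefecture, unconfirmed>, <day, tomorrow, unconfirmed>, <information type, weather, unconfirmed>} (2)
[0040]
In the above expression, a triple <attribute, value, unconfirmed> indicates that the attribute value has not been confirmed. In inquiry (1), the place value is Kanagawa Prefecture, the day value is tomorrow, and the information type value is weather, none of which has been confirmed. In the generated system understanding state (2), however, the place value has been misrecognized as "Kagawa Prefecture," while the day and information type values have been recognized correctly.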
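[0041]
The control unit 180 determines whether the system understanding state contains any unconfirmed attribute value (step 204). If so, processing proceeds to step 205; if not, to step 212. At this point no attribute has been confirmed, so processing proceeds to step 205, but the description of this first pass through steps 205 to 211 is omitted.
[0042]
Assume that the voice interaction system 10 then outputs a confirmation sentence such as "Weather?" to confirm that the information type is weather, and the user replies with an approval sentence such as "Yes."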
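[0043]
When the sentence understanding unit 110 recognizes the approval sentence from the user, it records in the system understanding state that the confirmed value is now confirmed (step 203). The system understanding state is now:
{<place, Kagawa Prefecture, unconfirmed>, <day, tomorrow, unconfirmed>, <information type, weather, confirmed>} (3)
In the above expression, a triple <attribute, value, confirmed> indicates that the attribute value has been confirmed. The information type is now confirmed as weather. Because the place and day values are still unconfirmed, processing proceeds to step 205.
[0044]
The provided-information-type determination unit 120 determines, from the system understanding state, all types of information that can be provided to the user, and computes the probability of each (step 205). Here, judging from system understanding state (3), it generates weather as the provided-information type. Since there is only one possible provided-information type, the probability of the weather type is 1. When there are several provided-information types, their probabilities can be computed, for example, by assuming that they are equally probable.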
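[0045]
The dialogue procedure generation unit 130 generates the dialogue procedures (dialogue plans) that are possible under the current system understanding state (3) (step 206). The possible procedures are listed exhaustively below.
[0046]
(1) Definitive confirmation-type procedures A
(i) Confirm only the place value, then answer with the weather for every day (today, tomorrow) at the confirmed place.
(ii) Confirm only the day value, then answer with the weather at every place (100 places) on the confirmed day.
(iii) Confirm the place value, then the day value, then answer with the weather for the confirmed place and day.
(iv) Confirm the day value, then the place value, then answer with the weather for the confirmed day and place.
(v) Confirm the place and day values simultaneously, then answer with the weather for the confirmed place and day.
[0047]
(2) Trial confirmation-type procedures B
(i) Confirm the place attribute, then answer with tomorrow's weather at the confirmed place. The value "tomorrow" is unconfirmed, but the response is generated on the assumption that it is correct.
(ii) Confirm the day attribute, then answer with the weather in "Kagawa Prefecture" on the confirmed day. The place value "Kagawa Prefecture" is unconfirmed, but the response is generated on the assumption that it is correct.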
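[0048]
(3) Definitive immediate-response-type procedure C
Without confirming anything, answer with the weather for every place (100 places) and every day (today, tomorrow).
[0049]
(4) Trial immediate-response-type procedures D
(i) Without confirming anything, answer with the weather in Kagawa Prefecture for every day (today, tomorrow). The place value "Kagawa Prefecture" is unconfirmed, but the response is generated on the assumption that it is correct.
(ii) Without confirming anything, answer with tomorrow's weather at every place (100 places). The day value "tomorrow" is unconfirmed, but the response is generated on the assumption that it is correct.
(iii) Without confirming anything, answer with tomorrow's weather in Kagawa Prefecture. The place value "Kagawa Prefecture" and the day value "tomorrow" are unconfirmed, but the response is generated on the assumption that they are correct.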
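The four lists above follow a regular pattern. As a sketch under a simplifying assumption of our own, give each unconfirmed attribute one of three roles: confirm it, assume its current value without confirmation, or leave it open so that the answer covers all of its values. Whether any attribute is confirmed then decides confirmation-type versus immediate-response-type, and whether any is assumed decides trial versus definitive. The function names are hypothetical, and the sketch ignores the order and grouping of confirmations, so A(iii) to A(v) collapse into a single entry.

```python
from itertools import product

ROLES = ("confirm", "assume", "open")  # "open": answer for every value of the attribute


def classify(assignment):
    """Map one role assignment over the unconfirmed attributes to type A-D."""
    confirms = "confirm" in assignment.values()
    assumes = "assume" in assignment.values()
    if confirms:
        return "B" if assumes else "A"  # trial vs definitive confirmation type
    return "D" if assumes else "C"      # trial vs definitive immediate response


def enumerate_procedures(unconfirmed):
    """Group all 3**n role assignments by the procedure type they produce."""
    plans = {"A": [], "B": [], "C": [], "D": []}
    for roles in product(ROLES, repeat=len(unconfirmed)):
        assignment = dict(zip(unconfirmed, roles))
        plans[classify(assignment)].append(assignment)
    return plans


plans = enumerate_procedures(["place", "day"])
counts = {kind: len(items) for kind, items in plans.items()}
```

For the unconfirmed attributes place and day this yields three A entries, two B, one C, and three D, matching the lists above up to the collapse of the confirmation orderings.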
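[0050]
For simplicity, suppose that the following four procedures have been generated: for (1) definitive confirmation-type procedure A, case (v), which confirms the place and day simultaneously and then answers with the weather for the confirmed place and day; for (2) trial confirmation-type procedure B, case (i), which confirms the place and then answers with tomorrow's weather at the confirmed place; for (3) definitive immediate-response-type procedure C, the procedure above, which answers with the weather for every place and every day; and for (4) trial immediate-response-type procedure D, case (iii), which answers with tomorrow's weather in Kagawa Prefecture.
[0051]
The dialogue procedure cost calculation unit 140 calculates the cost of each procedure (step 207). Here the only provided-information type is weather, with probability 1.
First, consider the dialogue cost of the definitive immediate-response-type procedure C. Procedure C responds using only the currently confirmed value, weather. Because it does not use the unconfirmed place and day values, it must answer with the weather for every place and day. In the situation assumed here, the database 20 contains 100 places, and the day attribute has two values, today and tomorrow. Suppose that answering with the weather for one place on one day requires three content words, as in "Tomorrow, Kanagawa Prefecture will be sunny" ("tomorrow," "Kanagawa Prefecture," "sunny"). Since the procedure gives today's and tomorrow's weather at 100 places, it outputs a response containing 100 × 2 × 3 = 600 content words. The cost of procedure C is therefore 600.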
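The cost of procedure C is a simple product: the answer must cover every combination of values of the unconfirmed attributes. A minimal sketch, assuming (as in the example) a fixed number of content words per individual answer; the function name is hypothetical.

```python
from math import prod


def definitive_immediate_cost(value_counts, words_per_answer=3):
    """Procedure C: one answer per combination of unconfirmed attribute values."""
    return words_per_answer * prod(value_counts.values())


cost_c = definitive_immediate_cost({"place": 100, "day": 2})  # 3 * 100 * 2
```

The product form makes clear why C becomes prohibitive as soon as an unconfirmed attribute has many possible values.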
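[0052]
Next, consider the cost of the definitive confirmation-type procedure A. Procedure A first confirms the values of the two attributes place and day. The expected number of content words exchanged in this confirmation dialogue is computed by the method used in the conventional fourth technique (see Non-Patent Document 2). Here, if r is the recognition accuracy of the attributes being confirmed and m is the number of attributes being confirmed, the number of content words exchanged until their values are confirmed is:
$$\frac{2m}{r} - m + 1 \qquad (4)$$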
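[0053]
Now assume the recognition accuracy is 0.60 for the place attribute and 0.95 for the day attribute. If the accuracy of correctly recognizing both attributes at once is the product of the individual accuracies, it is 0.60 × 0.95 = 0.57. The expected number of content words exchanged until the place and day values are confirmed simultaneously is therefore, from equation (4) with m = 2 and r = 0.57, 2·2/0.57 − 2 + 1 = 6.02.
[0054]
After confirming the place and day values, procedure A answers with the weather for the confirmed place and day. Assume this response outputs three content words, as in "Today, Kanagawa Prefecture will be sunny." Confirmation takes 6.02 content words and the subsequent response 3, for a total of 9.02. The dialogue cost of procedure A is therefore 9.02.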
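Equation (4) and the cost of procedure A can be checked numerically. A sketch assuming, as the text does, that the joint accuracy of confirming several attributes together is the product of their individual accuracies; `confirmation_words` is a hypothetical name for equation (4).

```python
def confirmation_words(m, r):
    """Equation (4): expected content words exchanged to confirm m attributes
    that are jointly recognized correctly with accuracy r."""
    return 2 * m / r - m + 1


r_joint = 0.60 * 0.95                 # place and day both correct: 0.57
confirm = confirmation_words(2, r_joint)
cost_a = confirm + 3                  # plus the 3-word answer for the confirmed place/day
print(round(confirm, 2), round(cost_a, 2))  # 6.02 9.02
```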
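[0055]
Next, consider the cost of the trial immediate-response-type procedure D. Procedure D responds according to the values Kagawa Prefecture and tomorrow, which are unconfirmed in the current system understanding state (3), and the confirmed value weather. Here the system responds "Tomorrow, Kagawa Prefecture will be sunny." Of course, since Kagawa Prefecture is a misrecognized value, this response does not give the user the required information.
[0056]
The cost of a trial immediate-response-type procedure is now explained in general form. Let p be the probability that the current system understanding state is correct, and consider separately the cases where it is and is not correct. First, if the current system understanding state is correct, the system's response matches the user's inquiry, the user obtains the required information, and the dialogue ends. Let L1 be the number of content words in the system response in this case.
[0057]
Next, consider the case where the current system understanding state is incorrect, which occurs with probability 1 − p. The system's response is then wrong, and the user does not obtain the required information. To obtain it, the user restarts the current inquiry from the beginning and gets the information after several further dialogues.
[0058]
To simplify the cost calculation, assume that in the dialogues following a wrong response, system responses are output with the attributes confirmed in the current system understanding state still confirmed and the unconfirmed attributes still unconfirmed. Let L2 be the expected number of content words exchanged in one such subsequent dialogue. L2 is obtained from previously collected dialogue data as the average number of content words in dialogues about the information type currently being asked about. If each subsequent dialogue produces a correct response with probability q, the expected number of dialogues until a correct response is 1/q.
[0059]
In summary, with probability p the current system understanding state is correct, the system responds correctly, and the user obtains the required information; the system response then contains L1 content words. With probability 1 − p the state is incorrect: a wrong system response containing L1 content words is output, followed on average by 1/q dialogues of L2 content words each, after which the user obtains the required information. The dialogue cost of a trial immediate-response-type procedure is therefore defined in advance as:
$$\mathrm{Cost}_D = pL_1 + (1-p)\left(L_1 + \frac{L_2}{q}\right) = L_1 + \frac{(1-p)\,L_2}{q} \qquad (5)$$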
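[0060]
Returning to the example, assume that the probability that the current system understanding state (3) is correct is the product of the place accuracy 0.60 and the day accuracy 0.95, so p = 0.60 × 0.95 = 0.57. Assume also that the probability q of a correct system response in each dialogue after a wrong response equals p, so q = p = 0.57. The wrong response "Tomorrow, Kagawa Prefecture will be sunny" contains three content words, so L1 = 3. For the expected number L2 of content words exchanged in each dialogue after the wrong response, take the average number of content words in weather-inquiry dialogues in previously collected dialogue data, and suppose this gives L2 = 10. From equation (5), the cost of procedure D is 3 + 0.43·10/0.57 = 3 + 7.54 = 10.54.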
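Equation (5) is an expectation, so it can be sanity-checked by simulation. The sketch below computes procedure D's cost in closed form and compares it with a Monte Carlo run of the same process: one response of L1 words and, if the state was wrong, follow-up dialogues of L2 words each until one succeeds with probability q. The function names and the simulation itself are our illustration, not part of the patent.

```python
import random


def trial_immediate_cost(p, q, l1, l2):
    """Equation (5): expected content words for a trial immediate response."""
    return l1 + (1 - p) * l2 / q


def simulate(p, q, l1, l2, trials=200_000, seed=0):
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        words = l1                    # the trial response is always spoken
        if rng.random() >= p:         # state was wrong (probability 1 - p)
            words += l2               # the user restarts: one more dialogue...
            while rng.random() >= q:  # ...repeated until one succeeds (mean 1/q dialogues)
                words += l2
        total += words
    return total / trials


p = 0.60 * 0.95                       # place and day both correct
cost_d = trial_immediate_cost(p, q=p, l1=3, l2=10)
print(round(cost_d, 2))               # 10.54
```

The simulated average lands within a fraction of a word of 10.54, which supports reading equation (5) as the expected dialogue length under the stated assumptions.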
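[0061]
Next, consider the dialogue cost of the trial confirmation-type procedure B, which here confirms the place attribute and then answers with tomorrow's weather at the confirmed place. First, the expected number of content words exchanged until the place value is confirmed is, from equation (4) with m = 1 and r = 0.65, 2·1/0.65 − 1 + 1 = 3.08.
[0062]
Once the place attribute has been confirmed, only the day value is unconfirmed. Since the day accuracy is assumed to be 0.95, the probability that the system understanding state is correct at that point is 0.95. This situation is the same as following a trial immediate-response-type procedure that responds on the assumption that the day value is correct. The expected number of content words in the dialogue after the place value is confirmed is therefore, from equation (5) with p = q = 0.95, L1 = 3, and L2 = 10, 3 + 0.05·10/0.95 = 3 + 0.53 = 3.53. The dialogue cost of procedure B is thus 3.08 + 3.53 = 6.61.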
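Procedure B combines the two formulas: equation (4) for confirming the place, then equation (5) for the trial answer that assumes the day. A sketch reproducing the text's figures (helper names hypothetical). The text uses r = 0.65 for this step although the place accuracy was given as 0.60 above; with 0.60 the first term would be 2/0.60 ≈ 3.33 and B would cost about 6.86, still the cheapest of the four. The text also rounds each term before adding, giving 6.61; the unrounded sum is about 6.60.

```python
def confirmation_words(m, r):
    return 2 * m / r - m + 1                 # equation (4)


def trial_immediate_cost(p, q, l1, l2):
    return l1 + (1 - p) * l2 / q             # equation (5)


confirm_place = confirmation_words(1, 0.65)  # r as used in the text for this step
answer_after = trial_immediate_cost(0.95, 0.95, 3, 10)  # day assumed correct
cost_b = confirm_place + answer_after

# Same computation with the place accuracy stated earlier (0.60).
cost_b_060 = confirmation_words(1, 0.60) + answer_after
```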
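[0063]
In summary, the dialogue costs are:
(1) definitive confirmation-type procedure A: 9.02
(2) trial confirmation-type procedure B: 6.61
(3) definitive immediate-response-type procedure C: 600
(4) trial immediate-response-type procedure D: 10.54
[0064]
After the dialogue procedure cost calculation unit 140 has calculated these costs, the confirmation candidate generation unit 150 generates, as confirmation candidates, all combinations of the unconfirmed attributes in the current system understanding state (step 208). From system understanding state (3), there are three combinations: (1) confirm the place, (2) confirm the day, and (3) confirm the place and day. For simplicity, suppose two confirmation candidates are generated here: confirming only the place attribute, and confirming the place and day attributes simultaneously.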
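Step 208 is a plain enumeration of non-empty subsets. A minimal sketch with `itertools.combinations`; the function name is hypothetical. For state (3) it produces the three combinations listed above, of which the text keeps two for brevity.

```python
from itertools import combinations


def confirmation_candidates(unconfirmed):
    """Step 208: every non-empty combination of unconfirmed attributes."""
    return [frozenset(c)
            for k in range(1, len(unconfirmed) + 1)
            for c in combinations(unconfirmed, k)]


cands = confirmation_candidates(["place", "day"])
```

With n unconfirmed attributes this gives 2^n − 1 candidates, which stays small for slot-filling dialogues with a handful of attributes.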
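[0065]
Next, for each confirmation candidate, the confirmation candidate cost generation unit 160 selects the minimum-cost procedure among those that include confirmation of the candidate, computes the expected cost weighted by the probabilities of the provided-information types, and takes this expected value as the cost of the confirmation candidate (step 209). Here the only possible provided-information type is weather, with probability 1.
The only procedure that includes confirming the place attribute alone is the trial confirmation-type procedure B, so the cost of the candidate that confirms only the place attribute is 6.61.
The only procedure that includes confirming the place and day attributes simultaneously is the definitive confirmation-type procedure A, so the cost of the candidate that confirms them simultaneously is 9.02.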
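Step 209 can be sketched as follows. The `Procedure` record and the rule that a procedure "includes" a candidate when the candidate is exactly the set of attributes it confirms first are our assumptions; they reproduce the two costs above. With several provided-information types, the per-type minima are weighted by the type probabilities from step 205.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Procedure:
    name: str
    info_type: str                           # provided-information type it answers
    first_confirmation: Optional[frozenset]  # attributes confirmed next; None = immediate
    cost: float


def candidate_cost(candidate, procedures, type_probs):
    """Expected cost of a confirmation candidate over provided-information types."""
    expected = 0.0
    for info_type, prob in type_probs.items():
        if prob == 0:
            continue
        costs = [p.cost for p in procedures
                 if p.info_type == info_type and p.first_confirmation == candidate]
        # No procedure confirms this candidate for this type: treat it as unusable.
        expected += prob * (min(costs) if costs else math.inf)
    return expected


# The four procedures generated under state (3), with the costs from the text.
procs = [
    Procedure("A(v)", "weather", frozenset({"place", "day"}), 9.02),
    Procedure("B(i)", "weather", frozenset({"place"}), 6.61),
    Procedure("C", "weather", None, 600.0),
    Procedure("D(iii)", "weather", None, 10.54),
]
probs = {"weather": 1.0}
```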
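[0066]
The output unit 170 then performs the cost judgment (step 210) and, based on the result, generates and outputs a confirmation sentence or a response sentence (step 211). Specifically, it first compares the costs of all confirmation candidates (here, confirming the place only, and confirming the place and day) with the costs of all immediate-response-type procedures (here, procedures C and D) (step 2110). If an immediate-response-type procedure is cheaper, it determines the minimum-cost immediate-response-type procedure (definitive or trial) (step 2111) and generates a response sentence providing information according to that procedure (step 2112). If a confirmation candidate is cheaper, it determines the minimum-cost confirmation candidate (step 2113) and generates a confirmation sentence for confirming it (here, confirming the place, or confirming the place and day) (step 2114).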
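The cost judgment of Fig. 4 (steps 2110 to 2114) reduces to comparing two minima. A sketch with hypothetical names; the text does not say how ties are broken, and this version asks the confirmation question on a tie.

```python
def decide(candidate_costs, immediate_costs):
    """Return ("confirm", candidate) or ("respond", procedure name)."""
    # Step 2110: compare the best confirmation candidate with the best immediate response.
    best_cand = min(candidate_costs, key=candidate_costs.get, default=None)
    best_resp = min(immediate_costs, key=immediate_costs.get)
    if best_cand is None or immediate_costs[best_resp] < candidate_costs[best_cand]:
        return ("respond", best_resp)  # steps 2111-2112: output a response sentence
    return ("confirm", best_cand)      # steps 2113-2114: output a confirmation sentence


first = decide({frozenset({"place"}): 6.61, frozenset({"place", "day"}): 9.02},
               {"C": 600.0, "D(iii)": 10.54})
second = decide({frozenset({"day"}): 5.11}, {"D": 3.53})
```

With the costs from the two rounds of the worked example, the first call chooses to confirm the place and the second chooses to respond immediately with procedure D.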
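[0067]
In the current example, the minimum cost turns out to be 6.61, the cost of the candidate that confirms only the place attribute. The output unit 170 therefore outputs, according to the current system understanding state (3), the confirmation sentence "Kagawa Prefecture?" to confirm only the place value.
[0068]
Suppose the user then inputs the correction sentence "Kanagawa Prefecture," which the system recognizes correctly; the system next outputs the confirmation sentence "Kanagawa Prefecture?" to confirm the place value, the user inputs the approval sentence "Yes," and the system recognizes the approval correctly. The system understanding state is then:
{<place, Kanagawa Prefecture, confirmed>, <day, tomorrow, unconfirmed>, <information type, weather, confirmed>} (6)
Only the day attribute is unconfirmed, so processing again proceeds to step 205 and beyond.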
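[0069]
The dialogue procedure generation unit 130 again generates all possible procedures (step 206). For simplicity, the description continues with the following two:
(5) Definitive confirmation-type procedure A:
confirm the day value, then answer with the weather in Kanagawa Prefecture on the confirmed day.
(6) Trial immediate-response-type procedure D:
without confirming anything, answer with tomorrow's weather in Kanagawa Prefecture. The value "tomorrow" is unconfirmed, but the response is output on the assumption that it is correct.
[0070]
The dialogue procedure cost calculation unit 140 then calculates the cost of each procedure (step 207). Again, the only possible provided-information type is weather, with probability 1.
First, consider the dialogue cost of procedure A in (5). Since the day accuracy is 0.95, the expected number of content words needed to confirm the day attribute is, from equation (4) with m = 1 and r = 0.95, 2·1/0.95 − 1 + 1 = 2.11. The response generated after the day is confirmed contains three content words. The dialogue cost of procedure A is therefore 2.11 + 3 = 5.11.
Next, consider the dialogue cost of procedure D in (6). Since the day accuracy is 0.95, the probability that the current system understanding state (6) is correct is 0.95. Weather-inquiry dialogues are assumed to exchange 10 content words on average, so the dialogue cost, from equation (5) with p = q = 0.95, L1 = 3, and L2 = 10, is 3 + 0.05·10/0.95 = 3 + 0.53 = 3.53.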
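[0071]
In summary:
definitive confirmation-type procedure A: 5.11
trial immediate-response-type procedure D: 3.53
[0072]
Next, the confirmation candidate generation unit 150 generates a single confirmation candidate here, confirming only the day attribute (step 208). The cost of this candidate equals the cost of procedure A, namely 5.11.
[0073]
The output unit 170 compares the cost 5.11 of the candidate that confirms only the day attribute with the cost 3.53 of the trial immediate-response-type procedure D, and selects procedure D, which has the smaller cost (steps 210 and 211). Accordingly, the response sentence "Tomorrow, Kanagawa Prefecture will be sunny" is output without any confirmation sentence for the day attribute.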
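The second round can be reproduced with the same two formulas. A self-contained sketch (helper names hypothetical) that recomputes the costs of procedures A and D under state (6) and picks the cheaper one.

```python
def confirmation_words(m, r):
    return 2 * m / r - m + 1                      # equation (4)


def trial_immediate_cost(p, q, l1, l2):
    return l1 + (1 - p) * l2 / q                  # equation (5)


# State (6): only the day ("tomorrow", accuracy 0.95) is still unconfirmed.
cost_a = confirmation_words(1, 0.95) + 3          # confirm the day, then a 3-word answer
cost_d = trial_immediate_cost(0.95, 0.95, 3, 10)  # answer now, assuming "tomorrow"
choice = "D" if cost_d < cost_a else "A"
print(f"A={cost_a:.2f} D={cost_d:.2f} -> {choice}")  # A=5.11 D=3.53 -> D
```

Because the day attribute is recognized reliably, the expected penalty for guessing it wrong (about 0.53 words) is smaller than the cost of asking (about 2.11 words), so the system answers directly.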
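[0074]
The dialogue above can be summarized as follows:
(Sentence 1) User: "Please tell me tomorrow's weather for Kanagawa Prefecture."
(Sentence 2) System: "Weather?"
(Sentence 3) User: "Yes."
(Sentence 4) System: "Kagawa Prefecture?"
(Sentence 5) User: "Kanagawa Prefecture."
(Sentence 6) System: "Kanagawa Prefecture?"
(Sentence 7) User: "Yes."
(Sentence 8) System: "Tomorrow's weather in Kanagawa Prefecture will be sunny."
The explanation above focuses on the system's behavior from Sentence 4 onward.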
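[0075]
In the processing flow of Fig. 3, if at some point step 204 determines that no unconfirmed attribute remains in the system understanding state, the output unit 170 immediately generates and outputs a response sentence based on that system understanding state.
[0076]
The voice interaction system 10 thus confirms the value of the place attribute, whose recognition accuracy is low, but not the value of the day attribute, whose recognition accuracy is high. Which attribute values to confirm is decided by an objective criterion, the cost of the dialogue procedures. This method lets the user obtain the required information in a short dialogue.
[0077]
Some or all of the processing functions of the units of the voice interaction system 10 shown in Fig. 2 can of course be implemented as a computer program executed on a computer to realize the present invention, and the processing procedures shown in Figs. 3 and 4 can likewise be implemented as a computer program executed by a computer. A program that realizes these processing functions on a computer, or causes a computer to execute these processing procedures, can be recorded, stored, and provided on a computer-readable recording medium such as an FD, MO, ROM, memory card, CD, DVD, or removable disk, and can also be distributed over a network such as the Internet.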
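[0078]
[Effects of the Invention]
As described above, the voice interaction system of the present invention can omit confirmation of part or all of the content of a user's inquiry whenever it judges that doing so does not increase the overall length of the dialogue, and thus avoids the increase in user dissatisfaction that comes with outputting more confirmation sentences. Moreover, because it does not presuppose that the user utters a rejection when the system responds wrongly because of misrecognition, there is no need to force the user to utter rejections, which increases user satisfaction.
[Brief Description of the Drawings]
[Fig. 1] An overall configuration diagram showing the system environment of the present invention.
[Fig. 2] A functional block diagram of an embodiment of the voice interaction system according to the present invention.
[Fig. 3] A processing flowchart of an embodiment of the present invention.
[Fig. 4] A detailed processing flowchart of the cost judgment in Fig. 3.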
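[Description of Reference Numerals]
10 Voice interaction system
20 Database
30 Memory device
40 Communication line
50 User terminal
100 Input unit
110 Sentence understanding unit
120 Provided-information-type determination unit
130 Dialogue procedure generation unit
140 Dialogue procedure cost calculation unit
150 Confirmation candidate generation unit
160 Confirmation candidate cost generation unit
170 Output unit
180 Control unit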
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an interactive information providing system, and more specifically, when a user inputs an inquiry about the contents of a database using voice, the system exchanges, if necessary, for confirming the contents of the user's inquiry. The present invention relates to a voice interaction system and method, a voice interaction program, and a recording medium for providing the contents of a database to a user by voice in accordance with the contents of a query that has been recognized.
[0002]
[Prior art]
For example, when there is a database that stores various information such as weather information, a television program guide, a schedule table, and the time of transportation, when a user has a database that stores various information such as weather information, a TV program guide, a schedule table, and the like, the user can recognize a sentence meaning an inquiry about the contents of the database by voice recognition. When input by voice through the device, the contents of the query recognized by the system are represented as a system understanding state expressed as a set of triples indicating attributes, attribute values, and flags indicating whether the attribute values have been determined. Hold.
[0003]
In such a speech dialogue system, the result of recognizing the contents of the user inquiry is not always correct due to a speech recognition error. Therefore, if necessary, a confirmation sentence is output to the user as a voice in order to confirm all or a part of the system understanding state. When the user inputs an approval statement (eg, “Yes”) indicating approval of the confirmation, the system records the contents of the confirmed system understanding state as determined. When the user inputs a correction sentence indicating a correction in response to the confirmation, the system updates the system understanding state according to the content of the correction sentence. After the exchange for confirmation, the system determines the contents of the user inquiry according to the system understanding state and outputs a response sentence for providing information to the user.
[0004]
On the other hand, if no confirmation is made, immediately after the user inputs the inquiry, the system determines the contents of the user inquiry according to the system understanding state at that time and outputs a response sentence for providing information to the user. It becomes.
[0005]
The advantage of performing the confirmation is that if the user enters an approval statement in response to the confirmation statement for confirming the attribute value, the probability that the attribute value is correct increases, so the system can more accurately convey the information required by the user. It is possible to do. However, the confirmation increases the length of the dialogue and may reduce user satisfaction.
[0006]
Conventionally, as a technology for the voice dialogue system to confirm the contents of the user inquiry, the first technology for generating a response sentence only according to the value of the attribute determined by the approval sentence from the user, the confirmation The second technique of not performing, the reliability of the voice recognition result obtained when recognizing the value of each attribute is recorded in the system understanding state, and if the reliability is larger than a certain threshold, the Assuming that the third technique of not confirming the value of the attribute (for example, see Non-Patent Document 1) and that the value of the attribute that has not been determined as well as the value of the attribute that has not been determined are correct. When the response sentence is generated, if the response sentence is incorrect, it is assumed that there is a rejection utterance from the user, and if the rejection utterance from the user is received, the value of the determined attribute is next. Correct As a value, the fourth technique of generating a response text (e.g., Non-Patent Documents 2 and 3, reference) is.
[0007]
[Non-patent document 1]
Kazunori Komatani, Tatsuya Kawahara, "Dialogue Management for Addressing Speech Recognition Errors in Mixed Initiative Dialogue",
Proceedings of the 6th Annual Conference of the Language Processing Society of Japan, 336-339 (2000)
[Non-patent document 2]
Koji Dosaka, Yoshihito Yasuda, Kiyoaki Aikawa, "Efficient Spoken Dialogue Control under System Knowledge Restriction",
Natural language processing, Vo1.9, No. 1 (2002) 43-63
[Non-Patent Document 3]
Koji Dosaka, Nobuhito Yasuda, Kiyoaki Aikawa, "Speech Dialogue Control Method Based on Maximizing Information Transmission Efficiency",
Proceedings of the 8th Annual Conference of the Language Processing Society of Japan, 260-263 (2002)
[0008]
[Problems to be solved by the invention]
The speech dialogue system does not guarantee that the result of recognizing the content of the user inquiry is correct due to a speech recognition error. Therefore, in the voice interaction system, a technology for responding to the user's inquiry as much as possible even if a voice recognition error exists has been developed.
[0009]
In any of the first, second, third, and fourth technologies of the above-described conventional technology, the content of the user's inquiry recognized by the system at each point of the dialogue is attribute and attribute information. It is represented as a set of triples of values and flags indicating whether the values have been determined. This set of three is called a system understanding state.
[0010]
In the first technique, the system outputs a confirmation sentence for confirming the value of the recognized attribute to the user, and when the approval sentence from the user, for example, the sentence "Yes" is recognized, the system confirms the sentence. The fact that the attribute value has been determined is recorded in the system understanding state. The system searches the database according to only the determined attribute values, and outputs a response sentence for transmitting the search result to the user. Since this method outputs a response sentence only according to the value approved by the user, there is an advantage that the system is less likely to give an erroneous response according to the result of erroneously recognizing the contents of the user's inquiry. The output of a confirmation sentence is effective to increase the reliability of the response sentence of the system, but the number of confirmations increases the time required for the user to obtain necessary information, thereby increasing user dissatisfaction. It also causes the problem.
[0011]
The first conventional technique checks all attribute values necessary for creating a response sentence, and does not reduce the number of check sentence, thereby increasing user dissatisfaction. is there.
[0012]
The second conventional technique is a method in which no confirmation is performed. Unlike the first conventional technique, this method does not increase user dissatisfaction in that it does not output a confirmation sentence at all. However, since the confirmation is not performed at all, the system may output a response sentence based on the result of incorrect recognition of the content of the user inquiry. As a result, the user has to make an inquiry again, and on the contrary, there is a problem that the length of the dialogue until obtaining the information required by the user increases.
[0013]
The third conventional technique records the reliability of a speech recognition result of an attribute value, and does not check the attribute value having a reliability greater than a certain threshold value, and the reliability of the speech recognition result is low. This is a method of checking at times. In this method, the threshold is determined based on a criterion for improving the understanding rate of the content of the user inquiry. There is a problem that the criterion of improving the understanding rate of the content of the inquiry of the user does not necessarily result in shortening the length of the dialog required for obtaining the information required by the user as much as possible.
[0014]
The conventional fourth technique enumerates possible procedures at each point in the dialogue until the output of a response sentence is completed, and follows the dialogue procedure that minimizes the length of the dialogue. The method is to determine the behavior of the system at each point in time. As a dialogue procedure, after confirming, a deterministic confirmation-type interactive procedure in which a response sentence is output only according to the confirmed system understanding state. Deterministic immediate response type dialogue procedure that outputs a response sentence only according to the state, a response sentence is output according to the system understanding state including attribute values that have not been finalized at this time, and the user rejects the response sentence In the case of, the dialogue is re-executed, and in the dialogue that has been redone, a confirmation necessity search type dialogue procedure in which a response sentence is output only according to the determined system understanding state is considered. In the fourth technique, for each of these interaction procedures, the length of the interaction is estimated, and the interaction procedure with the shortest interaction length is selected. According to the selected interaction procedure, the next action of the system is determined.
[0015]
In the fourth technique, if the necessity of searching for confirmation type dialogues can be determined that the user can obtain necessary information in a shorter dialogue than other types of dialogues, this confirmation necessity is required. A gender-search-type interaction procedure is selected, and the confirmation of some attribute values can be omitted. At this time, it is also guaranteed that the length of the entire dialogue does not increase compared to other dialogue procedures.
[0016]
The conventional fourth technique has an advantage that some confirmations can be omitted as compared with the conventional first technique, and outputs an incorrect response sentence many times in the conventional second technique. This can overcome the problem that the length of the dialogue increases. In the third technique, a criterion for selecting a threshold is a criterion of improving the understanding rate of a user inquiry, while in the fourth technique, a length of a dialog when each dialog procedure is followed is shortened. Standards are used.
[0017]
However, the fourth technique is based on the premise that the user makes a rejection utterance when the system makes an incorrect response, and there is a problem that it cannot be applied when this premise is not satisfied. In an actual dialogue, when the system responds incorrectly, the user may give a denial utterance such as "No," but the user will be asked again without explicitly making the denial utterance. It is also possible to repeat the dialog repeatedly. Also, forcing the user to make sure that a rejection utterance must be made when the system responds incorrectly places a burden on the user and increases the user's dissatisfaction.
[0018]
Thus, although the fourth technique has the advantage of overcoming the problems of the first, second, and third techniques, it is premised that the user makes a rejection utterance for an incorrect response of the system. Therefore, there is a problem that this premise is not always satisfied in an actual dialogue.
[0019]
The present invention has been made in view of the above-described problems of the conventional technology in a voice interaction system, and is based on the premise that a user makes a rejection utterance when the system responds incorrectly due to erroneous recognition. It is an object of the present invention to reduce the output of a confirmation sentence for confirming a system understanding state as much as possible, and to suppress an increase in the length of a dialogue due to a restart of the dialogue.
[0020]
[Means for Solving the Problems]
The present invention does not use the confirmation-necessity search type dialogue procedure considered in the fourth conventional technique, because that procedure is premised on the user making a rejection utterance. Instead, the invention considers two other procedures:
- a trial confirmation-type dialogue procedure, in which a confirmation sentence is output to confirm all or some of the attribute values included in the system understanding state. After the user inputs an approval sentence, a response sentence providing information is output on the assumption that both the confirmed and the unconfirmed attribute values are correct;
- a trial immediate-response-type dialogue procedure, in which no confirmation is made. A response sentence providing information is output immediately on the assumption that both the confirmed and the unconfirmed attribute values are correct.
Neither procedure assumes a rejection utterance from the user in reply to an erroneous system response.
[0021]
In addition to these two trial procedures, the present invention also considers two definitive procedures:
- a definitive confirmation-type dialogue procedure, in which a confirmation sentence is output to confirm all or some of the attribute values included in the system understanding state. After the user inputs an approval sentence, a response sentence providing information is output according to the confirmed attribute values only;
- a definitive immediate-response-type dialogue procedure, in which no confirmation is made. A response sentence providing information is output immediately according to the confirmed attribute values only.
[0022]
At each point in the dialogue, all four types of dialogue procedure are considered: definitive confirmation-type, trial confirmation-type, definitive immediate-response-type, and trial immediate-response-type. Among the possible procedures, the one that minimizes the length of the dialogue is selected. If the length of the dialogue carried out according to a procedure is called the cost of that procedure, this amounts to selecting the dialogue procedure with the minimum cost.
[0023]
The dialogue cost of each type of procedure is calculated as follows.
Definitive confirmation-type dialogue procedure: the expected number of independent words exchanged between the system and the user when the dialogue follows the procedure.
Definitive immediate-response-type dialogue procedure: the expected number of independent words in the system response sentence when the dialogue follows the procedure.
Trial confirmation-type dialogue procedure: the sum of two terms. The first is the expected number of independent words exchanged between the system and the user when the dialogue follows the procedure. The second, for the case where the system response sentence in the procedure is incorrect, is the expected number of independent words exchanged between the system and the user in the subsequent dialogue.
Trial immediate-response-type dialogue procedure: the sum of two terms. The first is the expected number of independent words in the system response sentence when the dialogue follows the procedure. The second, for the case where the system response is incorrect, is the expected number of independent words exchanged between the system and the user in the subsequent dialogue.
[0024]
According to the present invention, all four types of dialogue procedure are considered and the one with the minimum dialogue cost is selected under this objective criterion. This reduces the output of confirmation sentences for confirming the system understanding state as far as possible and suppresses the increase in dialogue length caused by restarting the dialogue. It does so without assuming that the user makes a rejection utterance when the system gives an incorrect response due to misrecognition.
[0025]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 shows the system environment of a voice interaction system according to the present invention. The voice interaction system 10 includes a database 20 and a memory device 30. It is connected to a large number of user terminals (computer terminals) 50 via a communication line 40 or the like. The user uses a user terminal 50 to input, by voice, a sentence meaning an inquiry about information. The voice interaction system 10 then exchanges utterances with the user to confirm all or part of the inquiry. After that, it provides the information held in the database 20 that matches the inquiry to the user by voice via the communication line 40 and the user terminal 50. The voice interaction system 10 is implemented as a computer system. The memory device 30 is the working memory of the voice interaction system 10. As described later, it stores the system understanding state, dialogue procedures, dialogue costs, confirmation candidates, confirmation candidate costs, and the like at each point in the dialogue.
[0026]
FIG. 2 is a functional block diagram showing a configuration example of the voice interaction system 10 according to the present invention. In the figure, an input unit 100 and an output unit 170 are connected to a user terminal 50 via a communication line 40 or the like.
[0027]
The input unit 100 takes in the sentences that the user inputs by voice (as a voice signal) through the speech recognition device of the user terminal 50, which is a computer terminal. Such a sentence may represent an information inquiry, correct the inquiry content, or approve a confirmation.
[0028]
The sentence understanding unit 110 recognizes the content of the user inquiry input by voice through the input unit 100. It holds the inquiry content as recognized by the system as the system understanding state, which in practice is stored in the memory device 30. The system understanding state is expressed as a set of triples. Each triple consists of an attribute, its value, and a flag indicating whether that value has been confirmed. The sentence understanding unit 110 updates this state in three situations:
- When it recognizes that the user has input a sentence representing an information inquiry, it generates a system understanding state from the recognition result of that sentence.
- When the user inputs a sentence correcting the inquiry content, it updates the current system understanding state according to the recognized content of the correction.
- When the system has output a confirmation sentence asking the user to confirm all or some of the attribute values in the system understanding state, and the unit then recognizes that the user has input an approval sentence, it records in the current system understanding state that the confirmed values have been finalized.
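The following Python sketch illustrates the triple representation and the three update rules described above. It is a minimal illustration, not the patent's implementation. The class and method names (`UnderstandingState`, `correct`, `approve`, and so on) are hypothetical. The rule that a corrected value becomes unconfirmed again is an assumption, taken from the later worked example in which a correction is followed by a fresh confirmation.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """Value and confirmation flag of one attribute (one triple)."""
    value: str
    confirmed: bool = False


@dataclass
class UnderstandingState:
    """System understanding state: a set of <attribute, value, flag> triples."""
    slots: dict = field(default_factory=dict)

    @classmethod
    def from_inquiry(cls, recognized):
        # A newly recognized inquiry: every value starts out unconfirmed.
        return cls({attr: Slot(val) for attr, val in recognized.items()})

    def correct(self, attr, new_value):
        # A correction replaces the value; the new value is not yet confirmed.
        self.slots[attr] = Slot(new_value)

    def approve(self, attrs):
        # An approval finalizes every attribute the confirmation sentence asked about.
        for attr in attrs:
            self.slots[attr].confirmed = True

    def unconfirmed(self):
        return [a for a, s in self.slots.items() if not s.confirmed]

    def triples(self):
        return [(a, s.value, "done" if s.confirmed else "not yet")
                for a, s in self.slots.items()]


# Replaying the start of the worked example given later in the text.
state = UnderstandingState.from_inquiry(
    {"location": "Kagawa Prefecture", "day": "tomorrow", "information type": "weather"})
state.approve(["information type"])
print(state.unconfirmed())  # ['location', 'day']
```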
[0029]
The provided information type determining unit 120 examines the system understanding state generated or updated by the sentence understanding unit 110. From it, the unit determines every type of information that can be provided to the user (the provided information types) and calculates the probability of each. Although omitted in FIG. 1, the provided information types and their probabilities are also stored in the memory device 30.
[0030]
For each provided information type determined by the provided information type determining unit 120, the dialogue procedure generation unit 130 generates the following dialogue procedures:
(A) A definitive confirmation-type dialogue procedure. It performs an exchange confirming all or some of the attribute values included in the system understanding state. It then outputs a response sentence providing information on the assumption that only the confirmed attribute values are correct.
(B) A trial confirmation-type dialogue procedure. It performs the same kind of confirmation exchange and then outputs a response sentence providing information on the assumption that both the confirmed and the unconfirmed attribute values are correct.
(C) A definitive immediate-response-type dialogue procedure. Without confirming anything with the user, it immediately outputs a response sentence providing information on the assumption that only the confirmed attribute values are correct.
(D) A trial immediate-response-type dialogue procedure. Without confirming anything with the user, it immediately outputs a response sentence providing information on the assumption that both the confirmed and the unconfirmed attribute values are correct.
Each generated dialogue procedure (dialogue plan) is held in the memory device 30.
[0031]
The dialogue procedure cost calculation unit 140 calculates the cost (dialogue cost) of each dialogue procedure generated by the dialogue procedure generation unit 130, as follows.
- Definitive confirmation-type procedure (A): the cost is the expected number of independent words exchanged between the system and the user when the dialogue follows the procedure.
- Trial confirmation-type procedure (B): the cost is the sum of two terms. The first is the expected number of independent words exchanged between the system and the user when the dialogue follows the procedure. The second, for the case where the system response sentence in the procedure is incorrect, is the expected number of independent words exchanged in the subsequent dialogue.
- Definitive immediate-response-type procedure (C): the cost is the expected number of independent words in the system response sentence when the system responds according to the procedure.
- Trial immediate-response-type procedure (D): the cost is the sum of two terms. The first is the expected number of independent words in the system response sentence. The second, for the case where the system response is incorrect, is the expected number of independent words exchanged between the system and the user in the subsequent dialogue.
The cost of each generated dialogue procedure is stored in the memory device 30 together with that procedure.
[0032]
The confirmation candidate generation unit 150 generates, as confirmation candidates, all combinations of unconfirmed attributes among the attributes included in the system understanding state. Each of the generated confirmation candidates is held in the memory device 30.
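Generating every combination of unconfirmed attributes is a power-set enumeration that leaves out the empty set. Below is a minimal sketch using the standard library's `itertools.combinations`. The function name is hypothetical. Excluding the empty combination follows the worked example later in the text, which lists only non-empty candidates.

```python
from itertools import combinations


def confirmation_candidates(unconfirmed):
    """Return every non-empty combination of unconfirmed attributes."""
    attrs = list(unconfirmed)
    return [combo
            for size in range(1, len(attrs) + 1)
            for combo in combinations(attrs, size)]


print(confirmation_candidates(["location", "day"]))
# [('location',), ('day',), ('location', 'day')]
```

For n unconfirmed attributes this yields 2^n - 1 candidates. The set stays small for the few attributes an inquiry typically has.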
[0033]
For each confirmation candidate, the confirmation candidate cost generation unit 160 first selects, for each provided information type, the minimum-cost dialogue procedure among those that include confirmation of that candidate. It then computes the expected value of these minimum costs, weighted by the probabilities of the provided information types, and takes this expected value as the cost of the candidate. The cost of each confirmation candidate is held in the memory device 30 together with the candidate.
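The candidate cost is an expectation, taken over the provided information types, of the cheapest procedure that includes the candidate's confirmation. The sketch below rests on several assumptions:
- Each procedure is recorded as a hypothetical tuple `(info_type, first_confirmation, cost)`.
- "Includes confirmation of the candidate" is read as "its confirmation step confirms exactly the candidate's attributes".
- Type probabilities are equal, which is one method mentioned in [0044].
- The function returns infinity when some type has no matching procedure.

```python
import math


def uniform_type_probs(info_types):
    """Equal probability for every provided information type."""
    return {t: 1.0 / len(info_types) for t in info_types}


def candidate_cost(candidate, procedures, type_probs):
    """Expected minimum cost of the procedures that confirm `candidate`.

    procedures: list of (info_type, first_confirmation, cost), where
    first_confirmation is a frozenset of attributes, or None for an
    immediate-response procedure.
    """
    target = frozenset(candidate)
    expected = 0.0
    for info_type, prob in type_probs.items():
        costs = [cost for t, confirms, cost in procedures
                 if t == info_type and confirms == target]
        if not costs:
            return math.inf  # no procedure for this type confirms the candidate
        expected += prob * min(costs)
    return expected


# Costs of procedures A-D from the worked example later in the text.
procedures = [
    ("weather", frozenset({"location", "day"}), 9.02),  # A: confirm both at once
    ("weather", frozenset({"location"}), 6.61),         # B: confirm location only
    ("weather", None, 600.0),                           # C: no confirmation
    ("weather", None, 10.54),                           # D: no confirmation
]
probs = uniform_type_probs(["weather"])
print(candidate_cost(["location"], procedures, probs))  # 6.61
```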
[0034]
The output unit 170 compares the costs of all confirmation candidates with the costs of all immediate-response-type dialogue procedures. If an immediate-response-type procedure is cheaper, it generates and outputs a response sentence providing information according to the minimum-cost immediate-response-type procedure. If a confirmation candidate is cheaper, it generates and outputs a confirmation sentence confirming the minimum-cost confirmation candidate.
[0035]
The control unit 180 controls the operation of each unit so that the units 100 to 170 operate in cooperation as described above.
[0036]
FIG. 3 shows a processing flowchart of the voice interaction system 10. FIG. 4 shows a detailed flowchart of the cost determination step 210 in FIG. 3.
[0037]
Hereinafter, detailed operations of the embodiment of the present invention will be described together with specific examples.
As an example, consider a voice interaction system that provides weather information. Two types of information can be provided: weather and warnings. The content of a user inquiry is represented by three attributes: location, day, and information type.
- The location attribute takes the name of a prefecture or city, such as Kanagawa Prefecture or Kagawa Prefecture.
- The day attribute takes the value today or tomorrow.
- The information type attribute takes the value weather or warning.
The database 20 covers 100 locations. For each location, it stores the forecast weather category and the types of warnings that have been issued.
[0038]
Suppose that the input unit 100 first receives the following sentence, representing an information inquiry, from the user (step 201).
"Please tell me about the weather in Kanagawa tomorrow." (1)
[0039]
The sentence understanding unit 110 recognizes the content of the user's inquiry sentence and generates a system understanding state (step 202). The system understanding state is a set of triples, each consisting of an attribute, its value, and a flag indicating whether the value has been confirmed. Suppose the system understanding state generated by the sentence understanding unit 110 is as follows.
[0040]
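{<location, Kagawa Prefecture, not yet>, <day, tomorrow, not yet>, <information type, weather, not yet>} (2)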
In this notation, the triple <attribute, value, not yet> indicates that the value of the attribute has not been confirmed. In inquiry (1), the intended value of the location attribute is Kanagawa Prefecture, that of the day attribute is tomorrow, and that of the information type attribute is weather. None of these has been confirmed yet. In the generated system understanding state (2), however, the location value has been misrecognized as "Kagawa Prefecture". The day and information type values are recognized correctly.
[0041]
The control unit 180 determines whether any unconfirmed attribute value exists in the system understanding state (step 204). If one exists, the process proceeds to step 205. At this point no attribute has been confirmed, so the process proceeds to step 205. The description of this first pass through steps 205 to 211 is omitted.
[0042]
Suppose that the voice interaction system 10 then outputs a confirmation sentence such as "Is it the weather?" to confirm that the value of the information type is weather, and the user replies with an approval sentence such as "Yes."
[0043]
When the sentence understanding unit 110 recognizes the user's approval sentence, it records in the system understanding state that the confirmed value has been finalized (step 203). The system understanding state is now as follows.
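{<location, Kagawa Prefecture, not yet>, <day, tomorrow, not yet>, <information type, weather, done>} (3)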
In this notation, the triple <attribute, value, done> indicates that the value of the attribute has been confirmed. Here the value of the information type has been confirmed as weather. The values of the location and day attributes are still unconfirmed, so the process proceeds to step 205.
[0044]
The provided information type determining unit 120 uses the system understanding state to determine every type of information that can be provided to the user, and calculates the probability of each (step 205). From the system understanding state (3), weather is the only provided information type, so its probability is 1. When there are several provided information types, one possible method is to assume that they are equally probable.
[0045]
The dialogue procedure generation unit 130 generates the dialogue procedures (dialogue plans) that are possible under the current system understanding state (3) (step 206). All possible dialogue procedures are listed below.
[0046]
(1) Definitive confirmation-type dialogue procedure A
(i) Confirm only the value of the location attribute, then respond with the weather for all days (today and tomorrow) at the confirmed location.
(ii) Confirm only the value of the day attribute, then respond with the weather at all locations (100 locations) on the confirmed day.
(iii) Confirm the value of the location attribute, then the value of the day attribute, then respond with the weather for the confirmed location and day.
(iv) Confirm the value of the day attribute, then the value of the location attribute, then respond with the weather for the confirmed day and location.
(v) Confirm the values of the location and day attributes at the same time, then respond with the weather for the confirmed location and day.
[0047]
(2) Trial confirmation-type dialogue procedure B
(i) Confirm the location attribute, then respond with tomorrow's weather at the confirmed location. The value "tomorrow" has not been confirmed, but the response is generated on the assumption that it is correct.
(ii) Confirm the day attribute, then respond with the weather in "Kagawa Prefecture" on the confirmed day. The value "Kagawa Prefecture" has not been confirmed, but the response is generated on the assumption that it is correct.
[0048]
(3) Definitive immediate-response-type dialogue procedure C
Without confirmation, respond with the weather for all locations (100 locations) and all days (today and tomorrow).
[0049]
(4) Trial immediate-response-type dialogue procedure D
(i) Without confirmation, respond with the weather in Kagawa Prefecture for all days (today and tomorrow). The location value "Kagawa Prefecture" has not been confirmed, but the response is generated on the assumption that it is correct.
(ii) Without confirmation, respond with tomorrow's weather at all locations (100 locations). The day value "tomorrow" has not been confirmed, but the response is generated on the assumption that it is correct.
(iii) Without confirmation, respond with tomorrow's weather in Kagawa Prefecture. The location value "Kagawa Prefecture" and the day value "tomorrow" have not been confirmed, but the response is generated on the assumption that they are correct.
[0050]
For simplicity, only the following four dialogue procedures are generated here:
- from (1), procedure (v) of the definitive confirmation-type procedure A, which confirms the location and day at the same time and then responds with the weather for the confirmed location and day;
- from (2), procedure (i) of the trial confirmation-type procedure B, which confirms the location and then responds with tomorrow's weather at the confirmed location;
- from (3), the definitive immediate-response-type procedure C described above, which responds with the weather for all locations and all days;
- from (4), procedure (iii) of the trial immediate-response-type procedure D, which responds with tomorrow's weather in Kagawa Prefecture.
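The exhaustive listing in (1) to (4) above follows a regular pattern. The sketch below reproduces it for the two unconfirmed attributes of state (3). Its generalization rules are an interpretation; the text does not state them:
- Type A confirms any non-empty subset of the unconfirmed attributes as an ordered sequence of groups, so "location then day", "day then location" and "both at once" are distinct.
- Type B confirms a non-empty proper subset and assumes the remaining unconfirmed values.
- Type C confirms and assumes nothing.
- Type D assumes a non-empty subset without confirming.
Each plan is a hypothetical tuple `(kind, confirmation_sequence, assumed_attributes)`.

```python
from collections import Counter
from itertools import combinations


def nonempty_subsets(attrs):
    attrs = sorted(attrs)
    return [frozenset(c) for k in range(1, len(attrs) + 1)
            for c in combinations(attrs, k)]


def ordered_groupings(attrs):
    """All ways to confirm `attrs` as a sequence of non-empty groups."""
    attrs = frozenset(attrs)
    if not attrs:
        return [()]
    return [(first,) + rest
            for first in nonempty_subsets(attrs)
            for rest in ordered_groupings(attrs - first)]


def enumerate_procedures(unconfirmed):
    unconfirmed = frozenset(unconfirmed)
    plans = []
    for subset in nonempty_subsets(unconfirmed):
        for seq in ordered_groupings(subset):
            # A: respond from confirmed values only
            plans.append(("A", seq, frozenset()))
            if subset != unconfirmed:
                # B: also assume the values that stay unconfirmed
                plans.append(("B", seq, unconfirmed - subset))
    # C: no confirmation, nothing assumed
    plans.append(("C", (), frozenset()))
    # D: no confirmation, assume a non-empty subset of unconfirmed values
    for subset in nonempty_subsets(unconfirmed):
        plans.append(("D", (), subset))
    return plans


plans = enumerate_procedures({"location", "day"})
print(sorted(Counter(kind for kind, _, _ in plans).items()))
# [('A', 5), ('B', 2), ('C', 1), ('D', 3)]
```

The counts match the five A, two B, one C and three D procedures listed in [0046] to [0049]. With more attributes the number of type A orderings grows quickly, which is one reason to prune candidates as the text does.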
[0051]
The dialogue procedure cost calculation unit 140 calculates the cost of each dialogue procedure (step 207). The only provided information type is weather, with probability 1.
First, consider the dialogue cost of the definitive immediate-response-type procedure C. Procedure C responds using only the confirmed value, weather. Because it does not use the unconfirmed values of the location and day attributes, it must respond with the weather for every location and every day. In the situation assumed here, 100 locations are registered in the database 20, and the day attribute has two values, today and tomorrow. A response giving the weather for one location on one day, such as "Tomorrow's Kanagawa is sunny.", is assumed to need three independent words ("tomorrow", "Kanagawa", "sunny"). Procedure C therefore outputs a response sentence containing 100 × 2 × 3 = 600 independent words, and its cost is 600.
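The figure of 600 comes from enumerating every combination of the values a response cannot rely on. Below is a minimal sketch, assuming three independent words per one-location, one-day answer, as the text does. `response_words` and its parameters are hypothetical names. The same rule gives the response length L1 = 3 used later for procedure D(iii), which relies on both values. The figure of 6 words for D(i) is only an illustration of the same rule.

```python
import math


def response_words(domain_sizes, relied_on, words_per_answer=3):
    """Independent words in a response that must cover every value of each
    attribute it does not rely on (neither confirmed nor assumed)."""
    open_sizes = [n for attr, n in domain_sizes.items() if attr not in relied_on]
    return math.prod(open_sizes) * words_per_answer


domains = {"location": 100, "day": 2}
print(response_words(domains, set()))                # 600 (procedure C)
print(response_words(domains, {"location"}))         # 6   (procedure D(i))
print(response_words(domains, {"location", "day"}))  # 3   (procedure D(iii))
```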
[0052]
Next, consider the cost of the definitive confirmation-type procedure A. Procedure A first confirms the values of two attributes, location and day. The expected number of independent words exchanged in the confirmation dialogue is calculated by the method of the fourth conventional technique (see Non-Patent Document 2). Let r be the recognition accuracy of the attributes to be confirmed and m the number of attributes to be confirmed. The number of independent words exchanged until their values are confirmed is then given by the following equation.
2m/r - m + 1 (4)
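Equation (4) can be reproduced from a simple turn model. This model is an assumption, not something the text spells out. In each confirmation round, the system names the m values (m independent words). With probability r all of them were recognized correctly, and the user approves with one word ("Yes"). Otherwise the user restates the m values (m words) and the round repeats. The expectation E then satisfies E = m + r · 1 + (1 - r)(m + E), whose solution is E = 2m/r - m + 1. The sketch checks the closed form against a Monte Carlo simulation of that model. The function names are hypothetical.

```python
import random


def confirmation_words(m, r):
    """Equation (4): expected independent words until m attributes are confirmed."""
    return 2 * m / r - m + 1


def simulate_confirmation(m, r, trials=100_000, seed=0):
    """Average word count under the assumed confirm/approve/correct turn model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        while True:
            total += m                # system reads back the m values
            if rng.random() < r:      # all m values recognized correctly
                total += 1            # user approves with one word
                break
            total += m                # user restates the m values
    return total / trials


print(round(confirmation_words(2, 0.57), 2))  # 6.02
print(round(confirmation_words(1, 0.95), 2))  # 2.11
```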
[0053]
Suppose the recognition accuracy is 0.60 for the location attribute and 0.95 for the day attribute. If the accuracy of recognizing both attributes correctly at the same time is the product of the individual accuracies, it is 0.60 × 0.95 = 0.57. The expected number of independent words exchanged until the location and day values are confirmed together therefore follows from equation (4) with m = 2 and r = 0.57: 2 · 2 / 0.57 - 2 + 1 = 6.02.
[0054]
After confirming the values of the location and day attributes, procedure A responds with the weather for the confirmed location and day. This response is assumed to contain three independent words, as in "Today Kanagawa is sunny." Confirmation requires 6.02 independent words and the subsequent response requires 3, for a total of 9.02. The dialogue cost of procedure A is therefore 9.02.
[0055]
Next, consider the cost of the trial immediate-response-type procedure D. Procedure D responds according to the confirmed value weather together with the values Kagawa Prefecture and tomorrow, which are still unconfirmed in the current system understanding state (3). The system therefore responds, "Tomorrow's Kagawa Prefecture is fine." Since Kagawa Prefecture is a misrecognized value, this response does not provide the information the user requested.
[0056]
The cost of a trial immediate-response-type procedure can be derived in general form. Let p be the probability that the current system understanding state is correct, and consider the correct and incorrect cases separately. First, suppose the system understanding state is correct. The system response then matches the user's inquiry, the user obtains the necessary information, and the dialogue ends. Let L1 be the number of independent words in the system response.
[0057]
Next, consider the case where the current system understanding state is incorrect, which occurs with probability 1 - p. The system response is then incorrect, and the user cannot obtain the necessary information. To obtain it, the user must restart the inquiry from the beginning and go through several further dialogues.
[0058]
To simplify the cost calculation, assume that the dialogues following an erroneous response keep the current confirmation status: attributes already confirmed in the current system understanding state remain confirmed, and unconfirmed attributes remain unconfirmed. System responses are then output in that state. Let L2 be the expected number of independent words exchanged in each such subsequent dialogue. L2 is set to the average number of independent words in dialogues for the relevant type of information inquiry, computed from dialogue data collected in advance. If each subsequent dialogue produces a correct response with probability q, the expected number of dialogues needed until a correct response is 1/q.
[0059]
In summary, there are two cases. With probability p, the current system understanding state is correct, the system responds correctly with L1 independent words, and the user obtains the necessary information. With probability 1 - p, the state is incorrect. An erroneous system response of L1 independent words is output, followed on average by 1/q dialogues of L2 independent words each, after which the user obtains the necessary information. The dialogue cost of a trial immediate-response-type procedure is therefore defined as follows.
p · L1 + (1 - p) · (L1 + L2 / q) = L1 + (1 - p) · L2 / q (5)
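Equation (5) follows from the two cases. The L1-word trial response is always spoken. With probability 1 - p it is wrong, and it is followed by a geometrically distributed number of restarted dialogues (mean 1/q) of L2 words each. The sketch below checks the closed form against a simulation of exactly that process. The function names are hypothetical. Treating every restarted dialogue as costing exactly L2 words is the same simplification the text makes.

```python
import random


def trial_cost(p, q, L1, L2):
    """Equation (5): p*L1 + (1 - p)*(L1 + L2/q) = L1 + (1 - p)*L2/q."""
    return L1 + (1 - p) * L2 / q


def simulate_trial(p, q, L1, L2, trials=100_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += L1                  # the trial response is always output
        if rng.random() >= p:        # the understanding state was wrong
            while True:              # restarted dialogues of L2 words each
                total += L2
                if rng.random() < q:  # this dialogue ends with a correct response
                    break
    return total / trials


print(round(trial_cost(0.57, 0.57, 3, 10), 2))  # 10.54
```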
[0060]
Returning to the example, the parameters of equation (5) are set as follows:
- The probability that the current system understanding state (3) is correct is the product of the location recognition accuracy 0.60 and the day recognition accuracy 0.95, so p = 0.60 × 0.95 = 0.57.
- The probability q of a correct system response in each dialogue after an erroneous response is assumed to equal p, so q = p = 0.57.
- The erroneous response "Tomorrow's Kagawa Prefecture is fine." contains three independent words, so L1 = 3.
- For the expected number L2 of independent words in each dialogue after the erroneous response, the average number of independent words in weather-inquiry dialogues in the previously collected dialogue data is assumed to be 10, so L2 = 10.
From equation (5), the cost of procedure D is 3 + 0.43 · 10 / 0.57 = 3 + 7.54 = 10.54.
[0061]
Next, consider the dialogue cost of the trial confirmation-type procedure B, which confirms the location attribute and then responds with tomorrow's weather at the confirmed location. First, the expected number of independent words exchanged until the location value is confirmed follows from equation (4) with m = 1 and r = 0.65: 2 · 1 / 0.65 - 1 + 1 = 3.08.
[0062]
After the location attribute is confirmed, only the day attribute remains unconfirmed. Since the recognition accuracy of the day attribute is 0.95, the probability that the system understanding state is correct at this point is 0.95. This situation is the same as following a trial immediate-response-type procedure that responds on the assumption that the day value is correct. The expected number of independent words in the dialogue after the location value is confirmed therefore follows from equation (5) with p = q = 0.95, L1 = 3, and L2 = 10: 3 + 0.05 · 10 / 0.95 = 3 + 0.53 = 3.53. The dialogue cost of procedure B is thus 3.08 + 3.53 = 6.61.
[0063]
To summarize:
(1) Dialogue cost of the definitive confirmation-type procedure A: 9.02
(2) Dialogue cost of the trial confirmation-type procedure B: 6.61
(3) Dialogue cost of the definitive immediate-response-type procedure C: 600
(4) Dialogue cost of the trial immediate-response-type procedure D: 10.54
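The four figures and the choice of the cheapest procedure can be recomputed from equations (4) and (5), as sketched below. The constants are the example's assumptions: 100 locations, 2 days, three words per single answer, and L2 = 10. The text evaluates the confirmation part of procedure B with r = 0.65, although the location accuracy was given earlier as 0.60. The sketch keeps the printed 0.65. With 0.60, that part would be about 3.33, giving roughly 6.86 for B, which is still the minimum. At full precision B comes to about 6.60; the 6.61 in the text is the sum of the rounded parts 3.08 and 3.53.

```python
def confirmation_words(m, r):
    """Equation (4)."""
    return 2 * m / r - m + 1


def trial_cost(p, q, L1, L2):
    """Equation (5)."""
    return L1 + (1 - p) * L2 / q


R_LOCATION = 0.60          # location recognition accuracy
R_DAY = 0.95               # day recognition accuracy
R_LOCATION_IN_B = 0.65     # value the text plugs into (4) for procedure B
WORDS_PER_ANSWER = 3       # e.g. "Tomorrow's Kanagawa is sunny."
L2 = 10                    # average words per restarted weather dialogue
p_both = R_LOCATION * R_DAY  # 0.57

costs = {
    "A": confirmation_words(2, p_both) + WORDS_PER_ANSWER,
    "B": confirmation_words(1, R_LOCATION_IN_B)
         + trial_cost(R_DAY, R_DAY, WORDS_PER_ANSWER, L2),
    "C": 100 * 2 * WORDS_PER_ANSWER,
    "D": trial_cost(p_both, p_both, WORDS_PER_ANSWER, L2),
}
for name, cost in costs.items():
    print(name, round(cost, 2))
print("cheapest:", min(costs, key=costs.get))  # cheapest: B
```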
[0064]
After the dialogue procedure cost calculation unit 140 has calculated the cost of each dialogue procedure, the confirmation candidate generation unit 150 generates confirmation candidates (step 208). These are all combinations of the unconfirmed attributes included in the current system understanding state. From the system understanding state (3), there are three combinations: (1) confirm the location, (2) confirm the day, and (3) confirm the location and day. For simplicity, assume here that only two confirmation candidates are generated: one that confirms only the location attribute, and one that confirms the location and day attributes at the same time.
[0065]
Next, for each confirmation candidate, the confirmation candidate cost generation unit 160 selects the minimum-cost dialogue procedure among those that include confirmation of the candidate. It computes the expected cost, taking the probabilities of the provided information types into account, and takes this expected value as the cost of the candidate (step 209). Here the only provided information type is weather, with probability 1.
The only dialogue procedure that confirms the location attribute alone is the trial confirmation-type procedure B. The cost of the candidate that confirms only the location attribute is therefore 6.61.
The only dialogue procedure that confirms the location and day attributes at the same time is the definitive confirmation-type procedure A. The cost of the candidate that confirms both at once is therefore 9.02.
[0066]
The output unit 170 then performs the cost determination (step 210) and, based on its result, generates and outputs a confirmation sentence or a response sentence (step 211). The details are as follows.
- It first compares the costs of all confirmation candidates (here, confirming only the location, and confirming the location and day) with the costs of all immediate-response-type dialogue procedures (here, procedures C and D) (step 2110).
- If an immediate-response-type procedure is cheaper, it selects the minimum-cost procedure among the immediate-response-type procedures, whether definitive or trial (step 2111). It then generates a response sentence providing information according to that procedure (step 2112).
- If a confirmation candidate is cheaper, it selects the minimum-cost confirmation candidate (step 2113). It then generates a confirmation sentence that confirms that candidate, here either the location or the location and day (step 2114).
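Steps 2110 to 2114 reduce to comparing two minima. Below is a minimal sketch. The function name and the dictionary inputs are assumptions. Ties are broken in favour of confirmation; this too is an assumption, since the text does not say what happens when the costs are equal.

```python
def decide(candidate_costs, immediate_costs):
    """Return ("respond", procedure) or ("confirm", candidate)."""
    best_candidate = min(candidate_costs, key=candidate_costs.get)  # cheapest candidate
    best_response = min(immediate_costs, key=immediate_costs.get)   # cheapest immediate procedure
    # Step 2110: respond at once only if that is strictly cheaper.
    if immediate_costs[best_response] < candidate_costs[best_candidate]:
        return ("respond", best_response)   # steps 2111-2112
    return ("confirm", best_candidate)      # steps 2113-2114


print(decide({("location",): 6.61, ("location", "day"): 9.02},
             {"C": 600.0, "D": 10.54}))
# ('confirm', ('location',))
```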
[0067]
In this example, the minimum cost is therefore 6.61, which is the cost of the confirmation candidate that confirms only the location attribute. The output unit 170 outputs the confirmation sentence "Is it Kagawa Prefecture?" to confirm only the value of the location attribute in the current system understanding state (3).
[0068]
Suppose that the user then inputs the correction sentence "It's Kanagawa Prefecture," which the system recognizes correctly. The system outputs the confirmation sentence "Is it Kanagawa Prefecture?" to confirm the value of the location attribute. The user then inputs the approval sentence "Yes," which the system also recognizes correctly. At this point, the system understanding state is as follows.

Only the day attribute is undetermined. Therefore, in this case as well, the process proceeds to step 205 and subsequent steps.
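The triple representation of the system understanding state (attribute, value, determined flag) and its updates on correction and approval can be sketched as below. The class and attribute names, including `info_type`, are hypothetical. Treating a corrected value as still undetermined is inferred from the example, in which the system confirms "Kanagawa Prefecture" after the correction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SystemUnderstandingState:
    # attribute -> (value, determined flag); each entry is one triple
    slots: Dict[str, Tuple[str, bool]] = field(default_factory=dict)

    def correct(self, attribute: str, value: str) -> None:
        # A correction replaces the value; it still has to be confirmed.
        self.slots[attribute] = (value, False)

    def approve(self, attribute: str) -> None:
        # Approval of a confirmation marks the confirmed value as determined.
        value, _ = self.slots[attribute]
        self.slots[attribute] = (value, True)

    def undetermined(self) -> List[str]:
        return [a for a, (_, done) in self.slots.items() if not done]

    def triples(self) -> List[Tuple[str, str, bool]]:
        return [(a, v, done) for a, (v, done) in self.slots.items()]


# Hypothetical state after sentences 1-3, with "Kanagawa" misrecognized as "Kagawa".
state = SystemUnderstandingState({
    "info_type": ("weather", True),
    "location": ("Kagawa", False),
    "day": ("tomorrow", False),
})
state.correct("location", "Kanagawa")  # correction sentence
state.approve("location")              # approval sentence
print(state.undetermined())            # ['day']
```

Because "day" is the only undetermined attribute left, the flow continues from step 205, as described above.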
[0069]
The dialogue procedure generation unit 130 again generates all possible interaction procedures (step 206). For simplicity, the description continues with the following two procedures.
(5) Deterministic confirmation type interaction procedure A:
A procedure that first confirms and determines the value of the day attribute and then responds with the weather in Kanagawa for the determined day.
(6) Trial immediate response type interaction procedure D:
A procedure that responds with tomorrow's weather in Kanagawa without confirmation. The value "tomorrow" has not been determined, but the response is output on the assumption that it is correct.
[0070]
Next, the interaction procedure cost calculator 140 calculates the cost of each interaction procedure (step 207). Again, the only provided information type is weather, with probability 1.
First, consider the interaction cost of procedure A in (5). The recognition accuracy of the day attribute is 0.95, so setting m = 1 and r = 0.95 in Expression (4) gives the expected number of independent words needed until the day attribute is determined as $2 \cdot 1 / 0.95 - 1 + 1 \approx 2.11$. The response generated after the day attribute is determined contains 3 independent words. Therefore, the interaction cost of procedure A is $2.11 + 3 = 5.11$.
Next, consider the interaction cost of procedure D in (6). Because the recognition accuracy of the day attribute is 0.95, the probability that the current system understanding state (6) is correct is 0.95. In addition, a dialogue inquiring about the weather exchanges 10 independent words on average. Substituting $p = q = 0.95$, $L_1 = 3$, and $L_2 = 10$ into Expression (5) gives the interaction cost of procedure D as $3 + 0.05 \cdot 10 / 0.95 = 3 + 0.53 = 3.53$.
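The arithmetic for procedures A and D can be checked with a short Python sketch. Expressions (4) and (5) are not reproduced in this section, so the general forms coded below are assumptions chosen only to agree with the numerical instances above: 2m/r - m + 1 for Expression (4), and L1 + (1 - p)·L2/q for Expression (5). In particular, because p = q = 0.95, the text does not settle which of p and q plays which role.

```python
def deterministic_confirmation_cost(m: int, r: float, response_words: int) -> float:
    # ASSUMED reading of Expression (4): expected independent words until the
    # attribute is determined, reconstructed from the m = 1, r = 0.95 instance only.
    words_to_determine = 2 * m / r - m + 1
    return words_to_determine + response_words


def trial_immediate_cost(p: float, q: float, l1: int, l2: int) -> float:
    # ASSUMED reading of Expression (5): response words L1 plus the expected
    # extra words L2 needed when the unconfirmed response turns out to be wrong.
    return l1 + (1 - p) * l2 / q


cost_a = deterministic_confirmation_cost(m=1, r=0.95, response_words=3)
cost_d = trial_immediate_cost(p=0.95, q=0.95, l1=3, l2=10)
print(f"A: {cost_a:.2f}, D: {cost_d:.2f}")  # A: 5.11, D: 3.53
```

The sketch does not round intermediate values. The text rounds 2.105 to 2.11 before adding 3, but both approaches give 5.11 to two decimal places.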
[0071]
In summary:
Cost of deterministic confirmation type interaction procedure A: 5.11
Cost of trial immediate response type interaction procedure D: 3.53
[0072]
Next, the confirmation candidate generation unit 150 generates the only confirmation candidate available here, namely one that confirms only the day attribute (step 208). The cost of this confirmation candidate equals the cost of interaction procedure A, that is, 5.11.
[0073]
The output unit 170 compares the cost of the confirmation candidate that confirms only the day attribute (5.11) with the cost of trial immediate response type interaction procedure D (3.53). It selects procedure D, which has the smaller cost (steps 210 and 211). Accordingly, the response sentence "Tomorrow's weather in Kanagawa is sunny." is output without any confirmation sentence for the day attribute.
[0074]
The dialogue above can be summarized as follows.
(Sentence 1) User: "Please tell me tomorrow's weather in Kanagawa Prefecture."
(Sentence 2) System: "Is it the weather?"
(Sentence 3) User: "Yes."
(Sentence 4) System: "Is it Kagawa Prefecture?"
(Sentence 5) User: "It's Kanagawa Prefecture."
(Sentence 6) System: "Is it Kanagawa Prefecture?"
(Sentence 7) User: "Yes."
(Sentence 8) System: "Tomorrow's weather in Kanagawa is sunny."
The description above focuses on the system's behavior from sentence (4) onward.
[0075]
In the processing flow of FIG. 3, if step 204 determines that the system understanding state contains no undetermined attributes, the output unit 170 immediately generates and outputs a response sentence based on the system understanding state.
[0076]
As this example shows, the voice dialogue system 10 confirms the value of the location attribute, whose recognition accuracy is low, but does not confirm the value of the day attribute, whose recognition accuracy is high. Which attribute values to confirm is decided by an objective criterion, namely the cost of the interaction procedures. As a result, the user can obtain the required information in a short dialogue.
[0077]
Part or all of the processing functions of each unit in the voice interaction system 10 shown in FIG. 2 may be implemented as a computer program and executed on a computer to realize the present invention. Likewise, the processing procedures shown in FIGS. 3 and 4 may be implemented as a computer program executed by a computer. A program that realizes these processing functions on a computer, or that causes a computer to execute these processing procedures, can be recorded on a computer-readable recording medium and then stored or provided. Suitable media include an FD, an MO, a ROM, a memory card, a CD, a DVD, and a removable disk. The program can also be distributed over a network such as the Internet.
[0078]
【The invention's effect】
As described above, the spoken dialogue system of the present invention can omit confirmation of some or all of the user's inquiry content when it judges that doing so will not lengthen the dialogue as a whole. This avoids the growth in user dissatisfaction that accompanies an increasing number of confirmation sentences. Further, the system does not rely on the user to utter a rejection when it gives an erroneous response due to misrecognition. The user is therefore not forced to make rejection utterances, and user satisfaction increases.
[Brief description of the drawings]
FIG. 1 is an overall configuration diagram showing a system environment of the present invention.
FIG. 2 is a functional block diagram of an embodiment of a voice interaction system according to the present invention.
FIG. 3 is a processing flowchart of one embodiment of the present invention.
FIG. 4 is a detailed processing flowchart of the cost determination in FIG. 3.
[Explanation of symbols]
10 Spoken dialogue system
20 Database
30 Memory device
40 Communication line
50 User terminal
100 Input unit
110 Sentence understanding unit
120 Provided information type determination unit
130 Dialogue procedure generator
140 Dialogue procedure cost calculator
150 Confirmation candidate generation unit
160 Confirmation candidate cost generation unit
170 Output unit
180 Control unit

Claims

In a voice dialogue system with the following behavior: when a user inputs by voice a sentence requesting information, the system exchanges utterances with the user to confirm all or part of the recognized inquiry content. It then provides the user by voice with information held in a database according to the inquiry content.
Means for generating the recognized inquiry content as a system understanding state, and for updating that state. The state is represented by a set of triples, each consisting of an attribute, a value of the attribute, and a flag indicating whether the value has been determined;
Means for generating the following four interaction procedures:
a deterministic confirmation type interaction procedure (A): after an exchange that confirms all or part of the attribute values included in the system understanding state, the user inquiry content is determined on the assumption that only the determined attribute values are correct, and a response sentence providing information to the user is output;
a trial confirmation type interaction procedure (B): after an exchange that confirms all or part of the attribute values included in the system understanding state, the user inquiry content is determined on the assumption that the undetermined attribute values, as well as the determined ones, are correct, and a response sentence providing information to the user is output;
a deterministic immediate response type interaction procedure (C): without any confirmation with the user, the user inquiry content is determined on the assumption that only the determined attribute values are correct, and a response sentence providing information to the user is output immediately;
a trial immediate response type interaction procedure (D): without any confirmation with the user, the user inquiry content is determined on the assumption that the undetermined attribute values, as well as the determined ones, are correct, and a response sentence providing information to the user is output;
Means for calculating the cost of the deterministic confirmation type interaction procedure (A), the cost of the trial confirmation type interaction procedure (B), the cost of the deterministic immediate response type interaction procedure (C), and the cost of the trial immediate response type interaction procedure (D);
Means for generating, as confirmation candidates, all or part of the values of the attributes that have not been determined in the system understanding state;
Means for calculating, for each confirmation candidate, the cost of that candidate based on the cost of the interaction procedures that include confirmation of that candidate;
Means for comparing the costs of all confirmation candidates with the costs of all immediate response type interaction procedures. If an immediate response type interaction procedure has the minimum cost, the means generates a response sentence that provides information according to that procedure. If a confirmation candidate has the minimum cost, it generates a confirmation sentence that confirms that candidate;
A speech dialogue system comprising:

The speech dialogue system according to claim 1,
wherein the means for calculating the costs of the interaction procedures calculates each cost as follows:
the cost of the deterministic confirmation type interaction procedure is the expected number of independent words exchanged between the system and the user in a dialogue that follows that procedure;
the cost of the deterministic immediate response type interaction procedure is the expected number of independent words contained in the system response sentence when the system responds according to that procedure;
the cost of the trial confirmation type interaction procedure is the sum of (i) the expected number of independent words exchanged between the system and the user in a dialogue that follows that procedure and (ii) the expected number of independent words they exchange in the subsequent dialogue if the system response sentence included in that procedure is incorrect;
the cost of the trial immediate response type interaction procedure is the sum of (i) the expected number of independent words contained in the system response sentence when the system responds according to that procedure and (ii) the expected number of independent words exchanged between the system and the user in the subsequent dialogue if that system response is incorrect.

The speech dialogue system according to claim 1 or 2,
wherein the means for calculating the cost of the confirmation candidates selects, for each confirmation candidate, the interaction procedure with the minimum cost from among the interaction procedures that include confirmation of that candidate. It calculates the expected value of that cost, taking into account the probability of each provided information type, and uses the expected value as the cost of the confirmation candidate.

In a voice interaction method with the following behavior: when a user inputs by voice a sentence requesting information, the system exchanges utterances with the user to confirm all or part of the recognized inquiry content. It then provides the user by voice with information held in a database according to the inquiry content.
Receiving, from the user, an inquiry sentence requesting information, a correction sentence correcting the inquiry content, and an approval sentence approving a confirmation;
Maintaining a system understanding state, represented by a set of triples of an attribute, a value of the attribute, and a flag indicating whether the value has been determined, as follows:
when an inquiry sentence is input by the user, generating the system understanding state according to the result of recognizing the inquiry content;
when a correction sentence is input by the user, updating the system understanding state according to the result of recognizing the correction sentence;
when the user's approval sentence is recognized after the system has output a confirmation sentence confirming all or part of the attribute values included in the system understanding state, recording in the system understanding state that the confirmed values have been determined;
Determining, from the system understanding state, every type of information that can be provided to the user as a provided information type, and calculating the probability of each provided information type;
Generating, for each provided information type, the following four interaction procedures:
a deterministic confirmation type interaction procedure (A): after a confirmation exchange that confirms all or part of the attribute values included in the system understanding state, the user inquiry content is determined on the assumption that only the determined attribute values are correct, and a response sentence providing information to the user is output;
a trial confirmation type interaction procedure (B): after an exchange that confirms all or part of the attribute values included in the system understanding state, the user inquiry content is determined on the assumption that the undetermined attribute values, as well as the determined ones, are correct, and a response sentence providing information to the user is output;
a deterministic immediate response type interaction procedure (C): without any confirmation with the user, the user inquiry content is determined on the assumption that only the determined attribute values are correct, and a response sentence providing information to the user is output immediately;
a trial immediate response type interaction procedure (D): without any confirmation with the user, the user inquiry content is determined on the assumption that the undetermined attribute values, as well as the determined ones, are correct, and a response sentence providing information to the user is output;
Calculating, for each provided information type, the cost of the deterministic confirmation type interaction procedure, the cost of the deterministic immediate response type interaction procedure, the cost of the trial confirmation type interaction procedure, and the cost of the trial immediate response type interaction procedure;
Generating, as confirmation candidates, all or part of the values of the attributes that have not been determined in the system understanding state;
Calculating, for each confirmation candidate and for each provided information type, the cost of the confirmation candidate based on the cost of the interaction procedures that include confirmation of that candidate;
Comparing the costs of all confirmation candidates with the costs of all immediate response type interaction procedures. If an immediate response type interaction procedure has the minimum cost, generating a response sentence that provides information according to that procedure. If a confirmation candidate has the minimum cost, generating and outputting a confirmation sentence that confirms that candidate;
A speech dialogue method comprising:

The voice interaction method according to claim 4,
wherein the step of calculating the costs of the interaction procedures calculates each cost as follows:
the cost of the deterministic confirmation type interaction procedure is the expected number of independent words exchanged between the system and the user in a dialogue that follows that procedure;
the cost of the deterministic immediate response type interaction procedure is the expected number of independent words contained in the system response sentence when the system responds according to that procedure;
the cost of the trial confirmation type interaction procedure is the sum of (i) the expected number of independent words exchanged between the system and the user in a dialogue that follows that procedure and (ii) the expected number of independent words they exchange in the subsequent dialogue if the system response sentence included in that procedure is incorrect;
the cost of the trial immediate response type interaction procedure is the sum of (i) the expected number of independent words contained in the system response sentence when the system responds according to that procedure and (ii) the expected number of independent words exchanged between the system and the user in the subsequent dialogue if that system response is incorrect;
and wherein the step of calculating the cost of the confirmation candidates selects, for each confirmation candidate, the interaction procedure with the minimum cost from among the interaction procedures that include confirmation of that candidate. It calculates the expected value of that cost, taking into account the probability of each provided information type, and uses the expected value as the cost of the confirmation candidate.

A program for causing a computer to execute the processing steps of the voice interaction method according to claim 4.

A computer-readable recording medium storing a program for causing a computer to execute the processing steps of the voice interaction method according to claim 4.