JP3949356B2

JP3949356B2 - Spoken dialogue system

Info

Publication number: JP3949356B2
Application number: JP2000211551A
Authority: JP
Inventors: 明人永井; 泰石川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2000-07-12
Filing date: 2000-07-12
Publication date: 2007-07-25
Anticipated expiration: 2020-07-12
Also published as: JP2002024212A

Description

【０００１】
【発明の属する技術分野】
この発明は、利用者の自然言語入力を理解して情報提供サービスの自動応答を行なう音声対話システムに関するものである。
【０００２】
【従来の技術】
情報システムにおけるマンマシンインタフェース技術として、従来から、利用者と自然言語による対話を行なって自動応答システムを実現する対話技術があり、特に、利用者と音声による対話を行なって実現する音声対話技術への要求が高まっている。音声対話技術の応用システムとして、例えば、受付、注文、予約などの各種サービス代行や、利用者が要求する情報の提供を行なう電話音声自動応答装置が知られており、２４時間サービス化、業務の効率化、省力化などの点で有用性が高い。
【０００３】
このような電話系サービスの分野では、ＣＴＩ（ＣｏｍｐｕｔｅｒＴｅｌｅｐｈｏｎｙＩｎｔｅｇｒａｔｉｏｎ）システムの導入が最近急速に進んでいる。この分野では、顧客の満足度を向上させるために、発信呼通知によって顧客を特定し、過去の顧客情報を利用して、顧客個人にあった情報提供や応対のサービスが試みられている。特に、音声自動応答装置を用いて業務の自動化を図るＣＴＩシステムでは、人間のオペレータ代行に伴うサービスの質の低下に対し、いかにして顧客の満足度を向上させるかが大きな課題となっており、顧客個人に適応した応対を実現する音声対話技術が必要となる。
【０００４】
音声対話技術により構築される音声対話システムでは、一般的な構成として、利用者の発話を認識する音声認識部、認識された発話文をシステムへのコマンドへ翻訳する音声理解部、コマンドで表現された利用者の要求に応じて、データベース検索や予約などを行なうアプリケーションを制御し、利用者とシステムとの対話を管理して、システムの応答を決定する対話管理部、システムの応答を音声で通知する音声合成部を備えている。
【０００５】
個人性を考慮した応対を実現する音声対話技術としては、従来から、音声合成部からのシステムの応答に対する利用者の入力のタイミングにより、システムに対する利用者の習熟度を推定し、音声ガイダンスの内容を習熟度に合わせて変更する技術（特開平４−３４４９３０号）、利用者の発話に対する音声認識部での音響尤度と、対話管理部による認識結果の確認対話で判明する認識失敗回数とを用いて、利用者の音声が認識しやすいか否かを推定し、認識のしやすさに応じて確認対話の制御方法を変更する技術（特開平７−１８１９９４号）、発信呼の電話番号により利用者を特定した後に、利用者の年齢（大人、子供）や国籍（言語）に合わせて、ガイダンスの文体や言語を変更する技術（特開平８−１１６５７２号）などがある。
【０００６】
【発明が解決しようとする課題】
しかし、上記のような音声対話システムでは、利用者の発話文をシステムへのコマンドへ翻訳する音声理解技術において、個人性が考慮された翻訳がなされてなく、利用者から入力された発話文の翻訳結果は、全ての利用者に関して差異のない翻訳結果となっていた。
【０００７】
例えば、ホテルの検索のような情報検索型のサービスにおける対話では、利用者が希望条件に合うホテルを探すときに、「横浜で安いホテルを教えて下さい」といった、漠然と料金の希望を指定する発話が頻繁に生じる。このような「安い」という曖昧な単語に対しては、一般的に、設計者が予め想定した固定の値、例えば６０００円以下という値を一律に用いてコマンドへ翻訳する。
【０００８】
このために、１００００円程度が安くて手頃だと思って探している利用者に対して、システムは「横浜の安いホテルは、Ａホテル４５００円、Ｂホテル５５００円、Ｃホテル６０００円、があります」のような応答を行ない、利用者は再度、「もう少し高めのホテルが良いのですが」といった発話が必要になるため、検索が効率的でないという課題があった。また、利用者の料金に対する感覚に一致していないために、利用者に違和感を生じさせるという課題があった。
【０００９】
この発明は、上記課題を解決するためになされたもので、利用者の曖昧語を含む自然言語の入力に対して、効率的な検索ができる音声対話システムを提供することを目的とする。
また、この発明は、利用者から入力される自然言語に含まれている曖昧語に対応する意味を推定して、効率的かつ柔軟な検索ができる音声対話システムを提供することを目的とする。
また、この発明は、利用者が対話システムを利用した回数が少ない場合でも、利用者の感覚に合致した自然な情報提示を行なうことができ、情報検索の効率化、及び利用者の利便性を向上させることができる音声対話システムを提供することを目的とする。
また、この発明は、曖昧な語が表わす値を利用者の発話履歴から学習して、利用者に応じて自動的に設定して翻訳できるようにし、情報検索の効率化、及び利用者の感覚に合致した自然な情報提示を行なうことで、利用者の利便性を向上させる音声対話システムを提供することを目的とする。
【００１０】
【課題を解決するための手段】
この発明に係る音声対話システムは、対話システム動作に対応して定義されたコマンド意図、対話システム動作のパラメータの種類を定義した項目、及び項目に対応する値である項目値からなる表現を対話システムのコマンドとし、自然言語をコマンドへ変換するための変換知識をコマンド知識として記憶するコマンド知識記憶手段と、自然言語において項目値へ一意に変換できない語を曖昧語とし、曖昧語、曖昧語の項目、及び曖昧語に対応する意味標識を曖昧語辞書として記憶する曖昧語辞書記憶手段と、曖昧語辞書記憶手段に記憶された曖昧語辞書を参照して、利用者が入力した自然言語に含まれる曖昧語を曖昧語に対応する意味標識に置換して、曖昧語の項目と意味標識の対を作成し、コマンド知識記憶手段に記憶されたコマンド知識を参照して、入力された自然言語を、曖昧語の項目と意味標識の対を含んだコマンドに変換するコマンド変換手段と、曖昧語に対応する意味標識の値を推定するための推定値情報を利用者を特定する利用者識別子とともに記憶する推定値情報記憶手段と、コマンド変換手段から入力される曖昧語の項目と意味標識の対を含んだコマンドに対し、推定値情報記憶手段に記憶された利用者識別子に対応した推定値情報を参照して、曖昧語に対応する意味標識の推定値を決定してコマンドとともに出力する曖昧語翻訳手段と、前記意味標識に対応する推定値同士の関係を関数として規定し、補間モデルとして記憶した補間モデル記憶手段と、曖昧語翻訳手段からの利用者識別子及び曖昧語の意味標識を入力とし、利用者識別子に対応した推定値情報における曖昧語のうち、入力された意味標識の推定値情報が未学習である曖昧語に対して、上記補間モデル記憶手段の補間モデルを用いて、学習済の曖昧語の意味標識に対する推定値情報から、未学習の意味標識の推定値を算出して曖昧語翻訳手段へ出力する推定値補間手段とを備えたものである。
【００１２】
この発明に係る音声対話システムは、対話システム動作に対応して定義されたコマンド意図、対話システム動作のパラメータの種類を定義した項目、及び項目に対応する値である項目値からなる表現を対話システムのコマンドとし、自然言語をコマンドへ変換するための変換知識をコマンド知識として記憶するコマンド知識記憶手段と、自然言語において項目値へ一意に変換できない語を曖昧語とし、曖昧語、曖昧語の項目、及び曖昧語に対応する意味標識を曖昧語辞書として記憶する曖昧語辞書記憶手段と、上記曖昧語辞書記憶手段に記憶された曖昧語辞書を参照して、利用者が入力した自然言語に含まれる曖昧語を曖昧語に対応する意味標識に置換して、曖昧語の項目と意味標識の対を作成し、上記コマンド知識記憶手段に記憶されたコマンド知識を参照して、入力された自然言語を、上記曖昧語の項目と意味標識の対を含んだコマンドに変換するコマンド変換手段と、曖昧語に対応する意味標識の値を推定するための推定値情報を利用者を特定する利用者識別子とともに記憶する推定値情報記憶手段と、上記コマンド変換手段から入力される曖昧語の項目と意味標識の対を含んだコマンドに対し、上記推定値情報記憶手段に記憶された利用者識別子に対応した推定値情報を参照して、曖昧語に対応する意味標識の推定値を決定してコマンドとともに出力する曖昧語翻訳手段と、全ての利用者に対する推定値情報を記憶する全利用者推定値情報記憶手段と、曖昧語翻訳手段からの利用者識別子及び曖昧語の意味標識を入力とし、利用者識別子に対応した推定値情報における曖昧語のうち、入力された意味標識の推定値情報が未学習である曖昧語に対して、上記全利用者推定値情報記憶手段に記憶された全ての利用者に対する推定値情報を参照して、学習済の曖昧語の意味標識に対する推定値情報との一致度が高い他の利用者の推定値情報を利用し、未学習の意味標識の推定値を選択して曖昧語翻訳手段へ出力する推定値選択手段とを備えたものである。
【００１３】
この発明に係る音声対話システムにおいて、項目及び項目値が付与された検索対象データの集合を記憶するデータベースと、入力されたコマンドに対応して、所定の対話システム動作を実行してシステムと利用者との対話を管理するとともにデータベースを検索し、利用者へ通知する応答文の意味内容を表わす応答意味表現を生成する対話管理手段とを備えたものである。
【００１５】
【発明の実施の形態】
以下、この発明の実施の一形態を説明する。
実施の形態１．
図１は、この発明の実施の形態１における音声対話システムの機能ブロック構成図であり、図において、１はコマンド知識記憶部（コマンド知識記憶手段）、２はコマンド変換部（コマンド変換手段）、３はデータベース、４は応答履歴記憶部（応答履歴記憶手段）、５は対話管理部（対話管理手段）、６は曖昧語辞書記憶部（曖昧語辞書記憶手段）、７は推定値情報記憶部（推定値情報記憶手段）、８は曖昧語翻訳部（曖昧語翻訳手段）、９は曖昧語記憶部（曖昧語記憶手段）、１０は推定知識記憶部（推定知識記憶手段）、１１は推定値適応部（推定値適応手段）である。
【００１６】
次に動作について説明する。
まず、利用者からの自然言語がコマンド変換部２へ入力される。入力される自然言語は、利用者の発話を音声認識した結果のテキストである。ただし入力可能なものとしては利用者からの自然言語に限定するものではない。キーボードやＧＵＩなどの別の手段から入力されたテキストであっても構わない。さらに、入力された自然言語に対し、コマンド変換の前段階として、形態素解析や構文解析、意味解析などの言語処理が施された結果の、意味的な構造を持った表現形式である意味表現であってもよい。
【００１７】
次に、コマンド変換部２は、コマンド知識記憶部１に記憶されたコマンド知識に従って、入力された自然言語を対話システムへのコマンドに変換する。コマンド知識記憶部１には、自然言語とコマンドとの対応を記述したコマンド知識が記憶されている。コマンドの定義の一例としては、コマンド＝意図：（項目１、項目値１）、（項目２、項目値２）、…、（項目ｎ、項目値ｎ）、のように表現し、コマンドを、意図と、そのパラメータとなる項目及び項目値の対の組み合わせで表現する。ここで、意図は対話システムの動作に対応して定義し、項目は検索対象データに関する検索条件の種類に対応して定義する。項目値は項目に属する具体的な値である。例えば、ホテル予約の場合、意図としては、＜意図：検索要求＞、＜意図：予約要求＞、＜意図：項目質問＞、＜意図：項目値確認＞、＜意図：項目値表明＞、＜意図：肯定＞、＜意図：否定＞、などであり、項目及び項目値の対としては、（＜場所＞、横浜）、（＜料金＞、６０００≧）、（＜部屋タイプ＞、シングル）、（＜人数＞、２）、（＜対象＞、ホテル）、などである。
【００１８】
コマンド知識記憶部１に記憶されているコマンド知識は、上記のコマンド表現と、自然言語との対応関係を規定するための知識である。例えば、「教えて下さい」、「ありますか」、「探しているのですが」、などの自然言語に対しては、＜意図：検索要求＞が対応し、「どこですか」、「いくらですか」、に対しては、＜意図：項目質問＞が対応する。また、例えば、「横浜で」に対しては、項目及び項目値の対として、（＜場所＞、横浜）が対応し、「６０００円以下の」に対しては、（＜料金＞、６０００≧）が対応する。コマンド知識は、これらの対応関係について、自然言語に関する形態素情報や助詞、助動詞などの意味的な情報を用いて、対応表や変換規則などの形式で表現する。
【００１９】
さらに、コマンド変換部２は、「安い」、「近い」などのような項目値が一意に決定できない曖昧語に対して、曖昧語辞書記憶部６に記憶された、曖昧語と、曖昧語の項目と、曖昧語に対応する意味標識の対応関係を参照し、入力に含まれる曖昧語を、曖昧語に対応する意味標識に置換して、曖昧語の項目と意味標識の対を作成する。さらに、コマンド知識を参照して、入力された自然言語を、上記曖昧語の項目と意味標識の対を含んだコマンドの表現に変換する。
【００２０】
図２に曖昧語辞書記憶部６に記憶する対応関係の例を示す。例えば、項目＜料金＞に関しては、曖昧語「自立語（安い）」に対して意味標識「＄ｃｈｅａｐ１」が対応している。コマンド変換部２は、入力された自然言語に曖昧語「安い」が含まれていれば、上記の対応関係を参照して、（＜料金＞、＄ｃｈｅａｐ１）に変換する。以上より、例えば、「横浜で安いホテルを教えて下さい」という自然言語は、コマンド変換部２により、「＜意図：検索要求＞：（＜場所＞、横浜）、（＜料金＞、＄ｃｈｅａｐ１）、（＜対象＞、ホテル）」というコマンドに変換される。
【００２１】
曖昧語翻訳部８は、コマンド変換部２から入力されたコマンド中に曖昧語の意味標識が含まれている場合には、推定値情報記憶部７で記憶されている利用者毎の推定値情報から、現在システムを対話している利用者の利用者識別子に対応する推定値情報を参照し、曖昧語の推定値を決定してコマンド中の曖昧語の意味標識を決定された推定値に置き換え、対話管理部５へ出力する。
【００２２】
推定値情報記憶部７は、曖昧語と曖昧語に対応する意味標識の推定値情報を利用者毎に記憶する。推定値情報は、利用者が過去の対話で曖昧語をどんな値として用いたかの情報を記録したものであり、利用者とシステムとの対話の履歴を利用して、後述する推定値適応部１１により学習される。なお、利用者が初めてシステムと対話する場合には、曖昧語の意味標識に対して初期に設定された値が推定値情報として用いられる。
【００２３】
対話管理部５は、コマンドが入力されると、設定された所定の対話手順に基づいて、コマンドに対応したシステムの動作を実行し、システムと利用者との対話を管理する。所定の対話手順の一例としては、例えば、コマンドの意図が検索要求であれば、対話管理部５は、コマンドのパラメータである項目及び項目値の対を用いて検索式を作成してデータベース３の検索を行ない、検索結果を利用者へ通知するための応答の意味表現を出力する。データベース３は、項目及び項目値が付与された検索対象データの集合を記憶する。図３はデータベース３に記憶される検索対象データの例であり、各対象名に対し、項目と項目値のデータが与えられている。
【００２４】
あるいは、所定の対話手順についての他の例としては、コマンドの意図が予約要求である場合、予約に必須の項目、例えば、＜対象名＞、＜予約日＞、＜人数＞、＜部屋タイプ＞、などに対する項目値が全て得られていれば、予約動作の確認を利用者に行なって、確認後に予約動作の実行を行ない、全て得られていない場合には、不足している項目の項目値を利用者に質問するための応答の意味表現を出力する。
【００２５】
応答の意味表現は、システムが利用者へ通知する応答文を生成するための表現形式である。一般的な音声自動応答装置では、応答の意味表現から応答文を生成する文生成手段と、文生成手段から受け取った応答文を合成音声へ変換する音声合成手段とを備えており、対話管理部５から出力される応答の意味表現は音声として利用者に通知される。
【００２６】
例えば、この応答の意味表現としては、利用者の「横浜駅で６０００円以下のホテルを教えて下さい」という入力に対してシステムがデータベース検索を行なった結果が、Ａホテル４５００円、Ｂホテル５５００円、Ｃホテル６０００円、の３件である場合、その応答の意味表現は、「＜検索結果提示＞：（対象名Ａホテル（＜料金＞４５００円））、（対象名Ｂホテル（＜料金＞５５００円））、（対象名Ｃホテル（＜料金＞６０００円））」のような形式となる。
【００２７】
さらに、対話管理部５は、利用者が対話を開始してからの応答の意味表現を、対話の開始から応答順に付与される応答識別番号とともに応答履歴記憶部４に記録する。
以上が、推定値情報記憶部７に記憶された利用者個人に対応した推定値情報を利用して、曖昧語の推定値を決定する場合の動作例である。
【００２８】
次に、推定値情報記憶部７に記憶される推定値情報を学習する場合の動作例について説明する。
曖昧語記憶部９は、曖昧語の意味標識を曖昧語の項目とともに記憶するものであり、対話の開始時からの入力識別番号が付与された形式で記憶する。例えば、対話の３番目の発話で入力されたコマンドが、「＜意図：検索要求＞：（＜場所＞、横浜）、（＜料金＞、＄ｃｈｅａｐ１）、（＜対象＞、ホテル）」の場合、（３：＜料金＞、＄ｃｈｅａｐ１）という形式のデータが、曖昧語記憶部９に登録される。推定値適応部１１は、コマンド変換部２から入力されたコマンドに曖昧語の意味標識が含まれる場合に、まず、曖昧語記憶部９へ曖昧語の意味標識を上記の形式で登録する。
【００２９】
次に、推定値適応部１１は、上記コマンドに対する対話管理部５の応答が利用者に対して通知された後に、この応答に対する利用者の発話内容から、曖昧語記憶部９に登録された曖昧語の意味標識に対する推定値を推定する。次に推定の方法を具体例とともに説明する。例えば、利用者の発話が「横浜で安いホテルを教えて下さい」であって、これに対する対話管理部５の応答が、「横浜の安いホテルは、Ａホテル４５００円、Ｂホテル５５００円、Ｃホテル６０００円、があります」であったとする。この応答に対する利用者の発話は、例えば、以下の３通りが考えられる。
（１）「１００００円くらいが良いのですが」
（２）「もう少し高くても構いません」
（３）「Ｃホテルの最寄駅はどこですか」
【００３０】
（１）は、応答中に示された金額を受け入れられず、利用者が明示的に自分が想定している金額を表明している場合である。このときは、「安い」という曖昧語の推定値は、入力されたコマンド中の１００００円程度であると推定できる。
（２）は、応答中に示された金額を受け入れられず、利用者がシステムに対し再度、検索要求の意図の発話をしている場合である。このときは、入力されたコマンド中の「高い」という別の曖昧語により、「安い」という曖昧語の推定値は、提示した金額の最高値である６０００円より高い金額であると推定できる。
（３）は、発話の意図が、項目＜料金＞以外の項目＜最寄駅＞を尋ねる＜項目質問＞の意図であることから、応答中に示された金額のうち、Ｃホテルの金額を受け入れたと考えられる。そこで、「安い」という曖昧語の推定値は、６０００円程度であると推定できる。
【００３１】
以上のような推定を行なうために、推定知識記憶部１０は、応答履歴記憶部４に記憶された応答意味表現と、コマンド変換部２から入力されたコマンドとの関係から判定するための推定知識を曖昧語の推定値として記憶する。推定値適応部１１は、入力されたコマンド及び応答履歴記憶部４の応答の意味表現を参照して、推定知識記憶部１０の推定知識に基づいて、曖昧語の推定値を決定する。これより、利用者識別子に対応した推定値情報記憶部７の推定値情報を更新して学習し、学習の対象とした曖昧語の意味標識を曖昧語記憶部９から削除する。図４に推定値情報記憶部７に記憶されたデータ構造を示す。
【００３２】
推定知識記憶部１０に記憶された推定知識は、例えば、上記（１）〜（３）の場合分けができるような条件判定部を持つ知識として、ｉｆ〜ｔｈｅｎ〜形式のルールで以下のように記述する。
（１）ｉｆ（応答履歴記憶部４の応答中に＜検索結果提示＞の項目値Ａが存在するａｎｄ現在のコマンド中に＜意図：項目値表明＞とともに項目値Ｂが存在する）ｔｈｅｎ（推定値を項目値Ｂとする）
（２）ｉｆ（応答履歴記憶部４の応答中に＜検索結果提示＞の項目値Ａが存在するａｎｄ現在のコマンド中に＜意図：検索要求＞とともに項目値Ａに対応する項目に関する曖昧語の意味標識が存在する）ｔｈｅｎ（次のコマンド入力を待つ）
（３）ｉｆ（応答履歴記憶部４の応答中に＜検索結果提示＞の項目値Ａが存在するａｎｄ現在のコマンド中に項目値Ａと対応しない項目に関する＜意図：項目質問＞とともに直前の応答中の対象名が存在する）ｔｈｅｎ（推定値を直前の応答中の対象名に対応する項目値Ａとする）
【００３３】
推定値適応部１１は、上記のようにして求めた推定値を推定値情報記憶部７における推定値情報として記録する。推定値情報は、例えば、各項目の各曖昧語の意味標識に関して、各推定値の頻度情報を記録しておけばよい。
【００３４】
以上のように上記実施の形態１によれば、利用者の曖昧語を含む自然言語の入力に対して、意図、項目、及び項目値からなる表現でコマンドに変換し、曖昧語に対応する意味標識の推定値を決定してコマンドとともに出力することにより、効率的かつ柔軟な検索ができるという効果が得られる。
また、曖昧な語が表わす値を利用者の発話履歴から学習して、利用者に応じて自動的に設定して翻訳することにより、情報検索の効率化、及び利用者の感覚に合致した自然な情報提示を行なうことができ、利用者の利便性を向上させることができるという効果が得られる。
【００３５】
なお、上記実施の形態１において、複数の検索対象がある場合には、項目＜料金＞をそれぞれ別な項目として定義する。例えば、検索項目がホテル及びレストランである場合には、ホテルは、Ｓｅａｒｃｈ［ｈｏｔｅｌ］：＜料金（ホテル）＞＝＄ｃｈｅａｐｈｏｔｅｌとなり、レストランは、Ｓｅａｒｃｈ［ｒｅｓｔａｕｒａｎｔ］：＜料金（レストラン）＞＝＄ｃｈｅａｐｒｅｓｔａｕｒａｎｔとなる。
【００３６】
実施の形態２．
図５はこの発明の実施の形態２における音声対話システムの機能ブロック構成図であり、図において、１２は補間モデル記憶部（補間モデル記憶手段）、１３は推定値補間部（推定値補間手段）である。他の構成は図１に示した実施の形態１の構成と同じであり、同一の符号で表されている。
次に動作について説明する。
この実施の形態２は、利用者が対話システムを利用した回数が少ない場合に、推定値情報記憶部７に記憶される推定値情報の学習において、推定値情報が未学習である曖昧語の意味標識に対して、他の学習済の曖昧語の推定値情報を用いて、未学習の曖昧語の意味標識の推定値を補間して算出するものである。
【００３７】
補間モデル記憶部１２は、曖昧語の意味標識と、該意味標識に対応する推定値との関係を関数として規定し、補間モデルとして記憶する。補間モデルとして用いる上記関数は、曖昧語の意味標識が与えられたときにその推定値を補間して算出できるものであればよい。例えば、図２に示すように、＜料金＞という同一項目に対する複数の曖昧語の意味標識として、＄ｃｈｅａｐｅｓｔ（曖昧語：できるだけ安い）、＄ｃｈｅａｐ１（曖昧語：安い）、＄ｃｈｅａｐ２（曖昧語：できれば安い）、＄ｎｏｔ＿ｓｏ＿ｅｘｐ（曖昧語：あまり高くない）、＄ｅｘｐ（曖昧語：少し高くても良い）、などが定義されている場合、これらの推定値を順に、ｖ１、ｖ２、ｖ３、ｖ４、ｖ５、とすれば、ｖ１＝ｖ２−１０００、ｖ１＝ｖ３−２０００、ｖ１＝ｖ４−３０００、ｖ１＝ｖ５−４０００、などのように、推定値同士の差分を規定する関数を記憶しておく。
【００３８】
推定値補間部１３は、曖昧語翻訳部８からの利用者識別子及び曖昧語の意味標識を入力とし、推定値情報記憶部７に記憶されている利用者識別子に対応した推定値情報を参照して、入力された曖昧語の意味標識に対する推定値情報が未学習の場合に、補間モデル記憶部１２の補間モデルを用いて、学習済の曖昧語の意味標識に対する推定値情報から、未学習の該意味標識の推定値を算出して曖昧語翻訳部８へ出力する。例えば、曖昧語の意味標識＄ｃｈｅａｐｅｓｔ（曖昧語：できるだけ安い）の推定値ｖ１が未学習であり、＄ｃｈｅａｐ１（曖昧語：安い）の推定値ｖ２が学習済であって、ｖ２＝６０００であるとする。このとき、推定値補間部１３は、補間モデル記憶部１２に記憶された上記推定値同士の差分を規定する関数を参照して、ｖ１＝ｖ２−１０００＝５０００、のように、未学習の推定値ｖ１を算出する。
【００３９】
以上のように、上記実施の形態２によれば、未学習の曖昧語の推定値を学習済の曖昧語の推定値情報から補間して算出できるようにしたので、利用者が対話システムを利用した回数が少ない場合でも、曖昧語の項目値を推定して、情報検索の効率化、及び利用者の感覚に合致した自然な情報提示を行なうことができ、利用者の利便性を向上させることができるという効果が得られる。
【００４０】
実施の形態３．
図６はこの発明の実施の形態３における音声対話システムの機能ブロック構成図であり、図において、１４は全利用者推定値情報記憶部（全利用者推定値情報記憶手段）、１５は推定値選択部（推定値選択手段）である。他の構成については図１に示した実施の形態１の構成と同じであり、同一の符号で表されている。
【００４１】
次に動作について説明する。
この実施の形態３は、利用者が対話システムを利用した回数が少ない場合に、推定値情報記憶部７に記憶される推定値情報が未学習である曖昧語の意味標識に対して、推定値情報の一致度が高い他の利用者の学習済の曖昧語の推定値情報を用いて、未学習の曖昧語の意味標識を推定して算出するものである。
【００４２】
全利用者推定値情報記憶部１４は、全ての利用者に対する推定値情報を利用者識別子に対応して記憶する。全利用者推定値情報記憶部１４におけるデータ構造は、図４に示したデータ構造にさらに利用者識別子を付加したものになる。推定値選択部１５は、曖昧語翻訳部８からの利用者識別子及び曖昧語の意味標識を入力とし、推定値情報記憶部７に記憶されている利用者識別子に対応した推定値情報を参照して、入力された曖昧語の意味標識に対する推定値情報が未学習の場合に、全利用者推定値情報記憶部１４を参照する。そして、現在システムを利用している利用者Ａの推定値情報と、他の利用者Ｂの推定値情報との、推定値情報の一致度を算出する。一致度は、例えば、利用者Ａ、利用者Ｂともに学習済の曖昧語の推定値を比較し、推定値の差がある一定の範囲内であれば、その曖昧語の推定値が一致しているとし、一致した曖昧語の数を一致度として定義する。
【００４３】
推定値選択部１５は、全利用者推定値情報記憶部１４に記憶された全ての利用者に対する推定値情報を参照して、利用者Ａで未学習である曖昧語の推定値情報を有する利用者の内、利用者Ａとの一致度が最も高い利用者Ｃを選択し、利用者Ｃの学習済の曖昧語の推定値情報を、利用者Ａの未学習の意味標識の推定値として曖昧語翻訳部８へ出力する。
【００４４】
以上のように、上記実施の形態３によれば、未学習の曖昧語の推定値を、推定値情報の一致度が高い他の利用者の推定値で代用するようにしたので、利用者が対話システムを利用した回数が少ない場合でも、曖昧語の推定値を推定して、情報検索の効率化、及び利用者の感覚に合致した自然な情報提示を行なうことができ、利用者の利便性を向上させることができるという効果が得られる。
【００４５】
なお、上記各実施の形態においては、音声対話システムの発明について説明したが、この発明の音声対話システム及び電話回線を含む統合的なコンピュータシステムを構築して、電話回線を介して入力された利用者すなわち顧客の曖昧語を含む自然言語を理解して、顧客が要望する情報を安い料金で提供するビジネスを展開することができる。その他、例えば、受付、注文、予約などの各種サービス代行や、利用者が要求する情報の提供を行なう電話音声自動応答装置にもこの発明の音声対話システムを適用することにより著しい効果が得られる。あるいは、発明の音声対話システムを適用することにより、顧客の曖昧語を含む自然言語を理解する自動販売機を実現できるという効果が得られる。
【００４６】
【発明の効果】
以上のように、この発明によれば、音声対話システムを、対話システム動作に対応して定義されたコマンド意図、対話システム動作のパラメータの種類を定義した項目、及び項目に対応する値である項目値からなる表現を対話システムのコマンドとし、自然言語をコマンドへ変換するための変換知識をコマンド知識として記憶するコマンド知識記憶手段と、自然言語において項目値へ一意に変換できない語を曖昧語とし、曖昧語、曖昧語の項目、及び曖昧語に対応する意味標識を曖昧語辞書として記憶する曖昧語辞書記憶手段と、曖昧語辞書記憶手段に記憶された曖昧語辞書を参照して、利用者が入力した自然言語に含まれる曖昧語を曖昧語に対応する意味標識に置換して、曖昧語の項目と意味標識の対を作成し、コマンド知識記憶手段に記憶されたコマンド知識を参照して、入力された自然言語を、曖昧語の項目と意味標識の対を含んだコマンドに変換するコマンド変換手段と、曖昧語に対応する意味標識の値を推定するための推定値情報を利用者を特定する利用者識別子とともに記憶する推定値情報記憶手段と、コマンド変換手段から入力される曖昧語の項目と意味標識の対を含んだコマンドに対し、推定値情報記憶手段に記憶された利用者識別子に対応した推定値情報を参照して、曖昧語に対応する意味標識の推定値を決定してコマンドとともに出力する曖昧語翻訳手段と、前記意味標識に対応する推定値同士の関係を関数として規定し、補間モデルとして記憶した補間モデル記憶手段と、曖昧語翻訳手段からの利用者識別子及び曖昧語の意味標識を入力とし、利用者識別子に対応した推定値情報における曖昧語のうち、入力された意味標識の推定値情報が未学習である曖昧語に対して、上記補間モデル記憶手段の補間モデルを用いて、学習済の曖昧語の意味標識に対する推定値情報から、未学習の意味標識の推定値を算出して曖昧語翻訳手段へ出力する推定値補間手段とを備えるように構成したので、利用者の曖昧語を含む自然言語の入力に対して、意図、項目、及び項目値からなる表現でコマンドに変換し、曖昧語に対応する意味標識の推定値を決定してコマンドとともに出力することにより、効率的かつ柔軟な検索ができるという効果がある。また、利用者が対話システムを利用した回数が少ない場合でも、曖昧語の項目値を推定して、情報検索の効率化、及び利用者の感覚に合致した自然な情報提示を行なうことができ、利用者の利便性を向上させることができるという効果がある。
【００４８】
この発明における音声対話システムは、対話システム動作に対応して定義されたコマンド意図、対話システム動作のパラメータの種類を定義した項目、及び項目に対応する値である項目値からなる表現を対話システムのコマンドとし、自然言語をコマンドへ変換するための変換知識をコマンド知識として記憶するコマンド知識記憶手段と、自然言語において項目値へ一意に変換できない語を曖昧語とし、曖昧語、曖昧語の項目、及び曖昧語に対応する意味標識を曖昧語辞書として記憶する曖昧語辞書記憶手段と、上記曖昧語辞書記憶手段に記憶された曖昧語辞書を参照して、利用者が入力した自然言語に含まれる曖昧語を曖昧語に対応する意味標識に置換して、曖昧語の項目と意味標識の対を作成し、上記コマンド知識記憶手段に記憶されたコマンド知識を参照して、入力された自然言語を、上記曖昧語の項目と意味標識の対を含んだコマンドに変換するコマンド変換手段と、曖昧語に対応する意味標識の値を推定するための推定値情報を利用者を特定する利用者識別子とともに記憶する推定値情報記憶手段と、上記コマンド変換手段から入力される曖昧語の項目と意味標識の対を含んだコマンドに対し、上記推定値情報記憶手段に記憶された利用者識別子に対応した推定値情報を参照して、曖昧語に対応する意味標識の推定値を決定してコマンドとともに出力する曖昧語翻訳手段と、全ての利用者に対する推定値情報を記憶する全利用者推定値情報記憶手段と、曖昧語翻訳手段からの利用者識別子及び曖昧語の意味標識を入力とし、利用者識別子に対応した推定値情報における曖昧語のうち、入力された意味標識の推定値情報が未学習である曖昧語に対して、上記全利用者推定値情報記憶手段に記憶された全ての利用者に対する推定値情報を参照して、学習済の曖昧語の意味標識に対する推定値情報との一致度が高い他の利用者の推定値情報を利用し、未学習の意味標識の推定値を選択して曖昧語翻訳手段へ出力する推定値選択手段とを備えるように構成したので、利用者が対話システムを利用した回数が少ない場合でも、曖昧語の推定値を推定して、情報検索の効率化、及び利用者の感覚に合致した自然な情報提示を行なうことができ、利用者の利便性を向上させることができる効果がある。
【００４９】
この発明における音声対話システムにおいて、項目及び項目値が付与された検索対象データの集合を記憶するデータベースと、入力されたコマンドに対応して、所定の対話システム動作を実行してシステムと利用者との対話を管理するとともにデータベースを検索し、利用者へ通知する応答文の意味内容を表わす応答意味表現を生成する対話管理手段とを備えるように構成したので、利用者の曖昧語を含む自然言語の入力に対して、意図、項目、及び項目値からなる表現でコマンドに変換してデータベースを検索し、利用者の入力に適応した応答ができるという効果がある。
【図面の簡単な説明】
【図１】この発明の実施の形態１における音声対話システムの機能ブロック構成図である。
【図２】この発明の各実施の形態における曖昧語辞書記憶部に記憶される項目、曖昧語、及び意味標識の対応関係の例を示す図である。
【図３】この発明の各実施の形態におけるデータベースに記憶される検索対象データの例を示す図である。
【図４】この発明の実施の形態１における推定情報記憶部に記憶されるデータ構造を示す図である。
【図５】この発明の実施の形態２における音声対話システムの機能ブロック構成図である。
【図６】この発明の実施の形態３における音声対話システムの機能ブロック構成図である。
【符号の説明】
１コマンド知識記憶部（コマンド知識記憶手段）、２コマンド変換部（コマンド変換手段）、３データベース、４応答履歴記憶部（応答履歴記憶手段）、５対話管理部（対話管理手段）、６曖昧語辞書記憶部（曖昧語辞書記憶手段）、７推定値情報記憶部（推定値情報記憶手段）、８曖昧語翻訳部（曖昧語翻訳手段）、９曖昧語記憶部（曖昧語記憶手段）、１０推定知識記憶部（推定知識記憶手段）、１１推定値適応部（推定値適応手段）、１２補間モデル記憶部（補間モデル記憶手段）、１３推定値補間部（推定値補間手段）、１４全利用者推定値情報記憶部（全利用者推定値情報記憶手段）、１５推定値選択部（推定値選択手段）。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a spoken dialogue system that understands a user's natural language input and responds automatically as part of an information providing service.
[0002]
[Prior art]
Conventionally, as a man-machine interface technology for information systems, there has been dialogue technology that realizes an automatic response system by conversing with the user in natural language. In particular, demand is growing for spoken dialogue technology, which does so by voice. Known applications of spoken dialogue technology include automatic telephone voice response devices that act as agents for services such as reception, ordering, and reservation, or that provide information requested by the user. Such devices are highly useful for enabling 24-hour service, improving operational efficiency, and saving labor.
[0003]
In the field of such telephone-based services, computer telephony integration (CTI) systems have recently been adopted rapidly. To improve customer satisfaction, such systems attempt to identify the customer from caller-number notification and to use past customer information to provide information and handle calls in a way suited to the individual customer. In particular, in a CTI system that automates work with an automatic voice response device, a major issue is how to improve customer satisfaction despite the decline in service quality that comes with replacing human operators, so spoken dialogue technology that adapts its responses to the individual customer is needed.
[0004]
A spoken dialogue system built with spoken dialogue technology generally comprises: a speech recognition unit that recognizes the user's utterance; a speech understanding unit that translates the recognized utterance into a command for the system; a dialogue management unit that, in response to the user's request expressed as a command, controls applications that perform database searches, reservations, and the like, manages the dialogue between the user and the system, and determines the system's response; and a speech synthesis unit that conveys the system's response by voice.
[0005]
Conventional spoken dialogue technologies that take individual differences into account include the following: a technique that estimates the user's proficiency with the system from the timing of the user's input relative to the system response from the speech synthesis unit, and changes the content of the voice guidance to match that proficiency (Japanese Patent Laid-Open No. 4-344930); a technique that uses the acoustic likelihood computed by the speech recognition unit for the user's utterance, together with the number of recognition failures revealed by the dialogue management unit's confirmation dialogue, to estimate whether the user's voice is easy to recognize, and changes how the confirmation dialogue is controlled accordingly (Japanese Patent Laid-Open No. 7-181994); and a technique that identifies the user from the calling party's telephone number and then changes the style and language of the guidance according to the user's age (adult or child) and nationality (language) (Japanese Patent Laid-Open No. 8-116572).
[0006]
[Problems to be solved by the invention]
However, in spoken dialogue systems such as those described above, the speech understanding technology that translates the user's utterance into a command for the system does not take individual differences into account, so a given utterance is translated into the same result for every user.
[0007]
For example, in a dialogue with an information retrieval service such as hotel search, users looking for a hotel that meets their conditions frequently make utterances that specify the desired price only vaguely, such as “Please tell me a cheap hotel in Yokohama.” For an ambiguous word such as “cheap,” systems generally translate the utterance into a command using a single fixed value assumed in advance by the designer, for example 6000 yen or less, for all users.
[0008]
As a result, to a user who regards around 10000 yen as cheap and affordable, the system responds with something like “Cheap hotels in Yokohama are A Hotel at 4500 yen, B Hotel at 5500 yen, and C Hotel at 6000 yen,” and the user must then speak again, for example “I would prefer a slightly more expensive hotel,” so the search is inefficient. In addition, because the response does not match the user's sense of price, it feels unnatural to the user.
[0009]
The present invention has been made to solve the above problems, and an object of the present invention is to provide a spoken dialogue system that can search efficiently in response to a user's natural language input containing ambiguous words.
Another object of the present invention is to provide a spoken dialogue system that can search efficiently and flexibly by estimating the meaning corresponding to an ambiguous word contained in the natural language input by the user.
Another object of the present invention is to provide a spoken dialogue system that, even when the user has used the dialogue system only a few times, can present information naturally in a way that matches the user's sense, thereby improving the efficiency of information retrieval and the user's convenience.
A further object of the present invention is to provide a spoken dialogue system that learns the value an ambiguous word represents from the user's utterance history and sets it automatically for each user when translating, thereby making information retrieval more efficient, presenting information naturally in a way that matches the user's sense, and improving the user's convenience.
[0010]
[Means for Solving the Problems]
A spoken dialogue system according to the present invention comprises: command knowledge storage means that, taking as a command of the dialogue system an expression consisting of a command intention defined in correspondence with a dialogue system operation, an item defining a type of parameter of the dialogue system operation, and an item value that is a value corresponding to the item, stores as command knowledge the conversion knowledge for converting natural language into commands; ambiguous word dictionary storage means that, taking as ambiguous words those words in natural language that cannot be uniquely converted into item values, stores the ambiguous words, the items of the ambiguous words, and the semantic markers corresponding to the ambiguous words as an ambiguous word dictionary; command conversion means that refers to the ambiguous word dictionary stored in the ambiguous word dictionary storage means, replaces each ambiguous word contained in the natural language input by the user with the corresponding semantic marker to create a pair of the ambiguous word's item and the semantic marker, and, referring to the command knowledge stored in the command knowledge storage means, converts the input natural language into a command containing that pair; estimated value information storage means that stores estimated value information for estimating the value of the semantic marker corresponding to an ambiguous word, together with a user identifier that identifies the user; ambiguous word translation means that, for a command containing a pair of an ambiguous word's item and a semantic marker input from the command conversion means, refers to the estimated value information corresponding to the user identifier stored in the estimated value information storage means, determines the estimated value of the semantic marker corresponding to the ambiguous word, and outputs it together with the command; interpolation model storage means that defines the relationship between the estimated values corresponding to the semantic markers as a function and stores it as an interpolation model; and estimated value interpolation means that takes as input the user identifier and the semantic marker of the ambiguous word from the ambiguous word translation means and, for an ambiguous word whose estimated value information for the input semantic marker is unlearned among the ambiguous words in the estimated value information corresponding to the user identifier, uses the interpolation model of the interpolation model storage means to calculate the estimated value of the unlearned semantic marker from the estimated value information for the semantic markers of learned ambiguous words, and outputs it to the ambiguous word translation means.
[0012]
A spoken dialogue system according to the present invention comprises: command knowledge storage means that, taking as a command of the dialogue system an expression consisting of a command intention defined in correspondence with a dialogue system operation, an item defining a type of parameter of the dialogue system operation, and an item value that is a value corresponding to the item, stores as command knowledge the conversion knowledge for converting natural language into commands; ambiguous word dictionary storage means that, taking as ambiguous words those words in natural language that cannot be uniquely converted into item values, stores the ambiguous words, the items of the ambiguous words, and the semantic markers corresponding to the ambiguous words as an ambiguous word dictionary; command conversion means that refers to the ambiguous word dictionary stored in the ambiguous word dictionary storage means, replaces each ambiguous word contained in the natural language input by the user with the corresponding semantic marker to create a pair of the ambiguous word's item and the semantic marker, and, referring to the command knowledge stored in the command knowledge storage means, converts the input natural language into a command containing that pair; estimated value information storage means that stores estimated value information for estimating the value of the semantic marker corresponding to an ambiguous word, together with a user identifier that identifies the user; ambiguous word translation means that, for a command containing a pair of an ambiguous word's item and a semantic marker input from the command conversion means, refers to the estimated value information corresponding to the user identifier stored in the estimated value information storage means, determines the estimated value of the semantic marker corresponding to the ambiguous word, and outputs it together with the command; all-user estimated value information storage means that stores estimated value information for all users; and estimated value selection means that takes as input the user identifier and the semantic marker of the ambiguous word from the ambiguous word translation means and, for an ambiguous word whose estimated value information for the input semantic marker is unlearned among the ambiguous words in the estimated value information corresponding to the user identifier, refers to the estimated value information for all users stored in the all-user estimated value information storage means, uses the estimated value information of another user whose estimated value information for the semantic markers of learned ambiguous words shows a high degree of agreement, selects the estimated value of the unlearned semantic marker, and outputs it to the ambiguous word translation means.
[0013]
The spoken dialogue system according to the present invention further comprises: a database that stores a set of search target data to which items and item values are assigned; and dialogue management means that, in response to an input command, executes a predetermined dialogue system operation to manage the dialogue between the system and the user, searches the database, and generates a response semantic representation expressing the semantic content of the response sentence to be conveyed to the user.
[0015]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described below.
Embodiment 1.
FIG. 1 is a functional block diagram of a spoken dialogue system according to Embodiment 1 of the present invention. In the figure, 1 is a command knowledge storage unit (command knowledge storage means), 2 is a command conversion unit (command conversion means), 3 is a database, 4 is a response history storage unit (response history storage means), 5 is a dialogue management unit (dialogue management means), 6 is an ambiguous word dictionary storage unit (ambiguous word dictionary storage means), 7 is an estimated value information storage unit (estimated value information storage means), 8 is an ambiguous word translation unit (ambiguous word translation means), 9 is an ambiguous word storage unit (ambiguous word storage means), 10 is an estimation knowledge storage unit (estimation knowledge storage means), and 11 is an estimated value adaptation unit (estimated value adaptation means).
[0016]
Next, the operation will be described.
First, natural language from the user is input to the command conversion unit 2. The input natural language is the text obtained by speech recognition of the user's utterance. The input is not limited to this, however: it may be text entered by other means such as a keyboard or GUI. It may also be a semantic representation, that is, a semantically structured form obtained by applying language processing such as morphological, syntactic, and semantic analysis to the input natural language as a stage preceding command conversion.
[0017]
Next, the command conversion unit 2 converts the input natural language into a command for the dialogue system according to the command knowledge stored in the command knowledge storage unit 1, which describes the correspondence between natural language and commands. As one example, a command is defined as command = intention: (item 1, item value 1), (item 2, item value 2), ..., (item n, item value n); that is, a command is expressed as an intention combined with the item and item value pairs that serve as its parameters. The intention is defined in correspondence with an operation of the dialogue system, and each item is defined in correspondence with a type of search condition on the search target data. An item value is a specific value belonging to an item. For hotel reservation, for example, the intentions include <intention: search request>, <intention: reservation request>, <intention: item question>, <intention: item value confirmation>, <intention: item value statement>, <intention: affirmation>, and <intention: denial>, and the item and item value pairs include (<location>, Yokohama), (<charge>, 6000≧), (<room type>, single), (<number of people>, 2), and (<target>, hotel).
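The command form defined above can be sketched as a small data structure. This is only an illustrative sketch in Python: the class name `Command`, its field names, and the `render` helper are assumptions made for this example and do not appear in the patent, which describes the representation only abstractly.

```python
from dataclasses import dataclass
from typing import Tuple, Union

ItemValue = Union[str, int]


@dataclass(frozen=True)
class Command:
    """command = intention: (item 1, item value 1), ..., (item n, item value n)."""

    intention: str
    pairs: Tuple[Tuple[str, ItemValue], ...] = ()

    def render(self) -> str:
        # Print the command in the notation used in the description.
        body = ", ".join(f"(<{item}>, {value})" for item, value in self.pairs)
        return f"<intention: {self.intention}>: {body}"


# The hotel example from the text: a search request with three parameters.
example = Command(
    intention="search request",
    pairs=(("location", "Yokohama"), ("charge", "6000≧"), ("target", "hotel")),
)
print(example.render())
```

Keeping the item and item value pairs as an ordered tuple mirrors the flat list in the definition; a dictionary keyed by item would work equally well if each item appears at most once.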
[0018]
The command knowledge stored in the command knowledge storage unit 1 defines the correspondence between the above command representation and natural language. For example, natural language expressions such as “please tell me,” “is there,” and “I am looking for” correspond to <intention: search request>, and “where is it” and “how much is it” correspond to <intention: item question>. Likewise, “in Yokohama” corresponds to the item and item value pair (<location>, Yokohama), and “6000 yen or less” corresponds to (<charge>, 6000≧). Command knowledge expresses these correspondences in forms such as correspondence tables and conversion rules, using morphological information about the natural language and semantic information such as particles and auxiliary verbs.
[0019]
Further, for ambiguous words such as “cheap” and “near,” whose item values cannot be determined uniquely, the command conversion unit 2 refers to the correspondence among ambiguous words, their items, and their semantic markers stored in the ambiguous word dictionary storage unit 6, replaces each ambiguous word in the input with its corresponding semantic marker, and creates a pair of the ambiguous word's item and the semantic marker. It then refers to the command knowledge and converts the input natural language into a command representation that contains this pair.
[0020]
FIG. 2 shows an example of the correspondence stored in the ambiguous word dictionary storage unit 6. For example, for the item <charge>, the semantic marker “$cheap1” corresponds to the ambiguous word “content word (cheap).” If the input natural language contains the ambiguous word “cheap,” the command conversion unit 2 refers to this correspondence and converts it to (<charge>, $cheap1). Thus, for example, the natural language input “Please tell me a cheap hotel in Yokohama” is converted by the command conversion unit 2 into the command “<intention: search request>: (<location>, Yokohama), (<charge>, $cheap1), (<target>, hotel).”
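The conversion step described above can be sketched as follows. This is a deliberately simplified illustration, not the patent's method: the patent expresses command knowledge with morphological and semantic information, whereas this sketch assumes English input and plain substring matching, and all table names and their contents are hypothetical.

```python
# Hypothetical command knowledge: phrase -> intention, phrase -> (item, value).
INTENTION_PHRASES = {
    "tell me": "search request",
    "how much": "item question",
}
VALUE_PHRASES = {
    "in yokohama": ("location", "Yokohama"),
    "hotel": ("target", "hotel"),
}
# Hypothetical ambiguous word dictionary: word -> (item, semantic marker).
AMBIGUOUS_WORDS = {
    "cheap": ("charge", "$cheap1"),
}


def convert(utterance):
    """Convert an utterance into (intention, list of (item, value) pairs)."""
    text = utterance.lower()
    intention = next(
        (name for phrase, name in INTENTION_PHRASES.items() if phrase in text),
        None,
    )
    pairs = [pair for phrase, pair in VALUE_PHRASES.items() if phrase in text]
    # Ambiguous words are not resolved here; they become semantic markers.
    pairs += [pair for word, pair in AMBIGUOUS_WORDS.items() if word in text]
    return intention, pairs


intention, pairs = convert("Tell me about a cheap hotel in Yokohama")
print(intention, pairs)
```

Substring matching is crude (for instance, “cheap” would also match “cheapest”), which is one reason a real system would rely on morphological analysis as the description indicates.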
[0021]
When the command input from the command conversion unit 2 contains the semantic marker of an ambiguous word, the ambiguous word translation unit 8 refers to the per-user estimated value information stored in the estimated value information storage unit 7. From the estimated value information corresponding to the user identifier of the user currently interacting with the system, it determines the estimated value of the ambiguous word, replaces the semantic marker in the command with that estimated value, and outputs the command to the dialogue management unit 5.
[0022]
The estimated value information storage unit 7 stores, for each user, the ambiguous words and the estimated value information for their semantic markers. The estimated value information records what values the user has meant by each ambiguous word in past dialogues. It is recorded and learned by the estimated value adaptation unit 11, described later, from the history of dialogue between the user and the system. When a user interacts with the system for the first time, the values initially set for the semantic markers are used as the estimated value information.
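A minimal sketch of this per-user translation step is shown below. The function name `translate`, the dictionary layout, and the initial value of 6000 for `$cheap1` (taken from the 6,000 yen example in the text) are assumptions made for illustration.

```python
# Hypothetical initial values used when a user has no learned estimate yet.
DEFAULT_ESTIMATES = {"$cheap1": 6000}


def translate(pairs, user_id, estimates_by_user):
    """Replace semantic markers in (item, value) pairs with per-user estimates."""
    user_estimates = estimates_by_user.get(user_id, {})
    resolved = []
    for item, value in pairs:
        if isinstance(value, str) and value.startswith("$"):
            # Prefer the learned value for this user, else the initial value.
            value = user_estimates.get(value, DEFAULT_ESTIMATES[value])
        resolved.append((item, value))
    return resolved
```

With this layout, a first-time user receives the initial value, while a user whose history suggests a higher budget receives that learned value instead.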
[0023]
When a command is input, the dialogue management unit 5 executes the system operation corresponding to the command according to a predetermined dialogue procedure, and manages the dialogue between the system and the user. As one example of the predetermined dialogue procedure, if the intention of the command is a search request, the dialogue management unit 5 creates a search expression from the item and item value pairs that are the parameters of the command, performs a search, and outputs a semantic expression of the response that notifies the user of the search results. The database 3 stores a set of search target data to which items and item values are assigned. FIG. 3 shows an example of the search target data stored in the database 3, in which item and item value data are given for each target name.
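The search-request procedure might look like the following sketch. The records for A Hotel, B Hotel, and C Hotel use the charges quoted in the text; D Hotel, the field names, and the treatment of <charge> as an upper bound are assumptions, and FIG. 3 itself is not reproduced here.

```python
# Hypothetical search target data; D Hotel is invented for illustration.
DATABASE = [
    {"target name": "A Hotel", "location": "Yokohama", "charge": 4500},
    {"target name": "B Hotel", "location": "Yokohama", "charge": 5500},
    {"target name": "C Hotel", "location": "Yokohama", "charge": 6000},
    {"target name": "D Hotel", "location": "Yokohama", "charge": 9800},
]


def search(pairs):
    """Filter the database with the command's (item, item value) pairs."""
    results = DATABASE
    for item, value in pairs:
        if item == "charge":
            # Assumption: a charge value is an upper bound ("6000 or less").
            results = [r for r in results if r["charge"] <= value]
        else:
            # Items a record does not carry are ignored rather than rejected.
            results = [r for r in results if r.get(item, value) == value]
    return [(r["target name"], r["charge"]) for r in results]
```

The returned list of (target name, charge) tuples stands in for the semantic expression of the response.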
[0024]
As another example of the predetermined dialogue procedure, suppose the intention of the command is a reservation request. If item values have been obtained for all items required for the reservation, for example <target name>, <reservation date>, <number>, and <room type>, the dialogue management unit 5 asks the user to confirm the reservation and executes it after confirmation. If some item values have not been obtained, it outputs a semantic expression of a response that asks the user for the item value of a missing item.
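The reservation procedure is essentially a check for missing items, which could be sketched as below. The item list follows the text; the function name `next_reservation_step` and the returned tuple format are assumptions, as is the choice to ask about missing items one at a time.

```python
# Items required for a reservation, as listed in the text.
REQUIRED_ITEMS = ["target name", "reservation date", "number", "room type"]


def next_reservation_step(item_values: dict):
    """Decide whether to ask for a missing item or to confirm the reservation."""
    missing = [item for item in REQUIRED_ITEMS if item not in item_values]
    if missing:
        # Ask about the first missing item only.
        return ("ask item value", missing[0])
    return ("confirm reservation", dict(item_values))
```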
[0025]
The semantic expression of the response is an expression format from which the response sentence that the system presents to the user is generated. A typical automatic voice response apparatus includes a sentence generation unit that generates a response sentence from the semantic expression of the response, and a speech synthesis unit that converts the response sentence received from the sentence generation unit into synthesized speech. Through these, the semantic expression of the response output from the dialogue management unit 5 is conveyed to the user as speech.
[0026]
For example, if the database search performed in response to the user's input "Tell me a hotel of 6,000 yen or less at Yokohama Station" yields three results, A Hotel at 4,500 yen, B Hotel at 5,500 yen, and C Hotel at 6,000 yen, the semantic expression of the response is "<search result presentation>: (target name A Hotel (<charge> 4500 yen)), (target name B Hotel (<charge> 5500 yen)), (target name C Hotel (<charge> 6000 yen))".
[0027]
Furthermore, the dialogue management unit 5 records the semantic expression of the response after the user starts the dialogue in the response history storage unit 4 together with the response identification numbers given in the response order from the start of the dialogue.
The above is an operation example in the case where the estimated value of the ambiguous word is determined using the estimated value information corresponding to the individual user stored in the estimated value information storage unit 7.
[0028]
Next, an operation example when learning estimated value information stored in the estimated value information storage unit 7 will be described.
The ambiguous word storage unit 9 stores the semantic marker of an ambiguous word together with the item of the ambiguous word, in a format tagged with the input identification number counted from the start of the dialogue. For example, when the command input for the third utterance of the dialogue is "<intention: search request>: (<location>, Yokohama), (<charge>, $ cheap1), (<target>, hotel)", the entry (3: <charge>, $ cheap1) is registered in the ambiguous word storage unit 9. When the command input from the command conversion unit 2 contains the semantic marker of an ambiguous word, the estimated value adaptation unit 11 first registers that semantic marker in the ambiguous word storage unit 9 in this format.
[0029]
Next, after the response of the dialogue management unit 5 to the command has been presented to the user, the estimated value adaptation unit 11 uses the content of the user's utterance made in reply to that response to estimate the estimated value for the semantic marker of the ambiguous word registered in the ambiguous word storage unit 9. The estimation method is described below with a specific example. Suppose the user's utterance is "Tell me about a cheap hotel in Yokohama" and the response of the dialogue management unit 5 is "Cheap hotels in Yokohama are A Hotel at 4,500 yen, B Hotel at 5,500 yen, and C Hotel at 6,000 yen". The following three types of user utterance in reply to this response can be considered, for example.
(1) "I'd like about 10,000 yen"
(2) "It doesn't matter if it is a little higher"
(3) "Where is the nearest station of C Hotel"
[0030]
(1) is a case where the user cannot accept the amounts presented in the response and explicitly states the amount he or she has in mind. Here, from the input command, the estimated value of the ambiguous word "cheap" can be estimated to be about 10,000 yen.
(2) is a case where the user cannot accept the amounts presented in the response and utters a search request to the system again. Here, from the other ambiguous word "high" in the input command, the estimated value of the ambiguous word "cheap" can be estimated to be higher than 6,000 yen, the maximum of the presented amounts.
(3) is a case where the intention of the utterance is <item question>, asking about the item <nearest station> rather than the item <charge>, so the user has probably accepted the presented amount. The estimated value of the ambiguous word "cheap" can therefore be estimated to be about 6,000 yen.
[0031]
To perform such estimation, the estimated knowledge storage unit 10 stores, as estimated knowledge, knowledge for estimating the estimated value of an ambiguous word from the relationship between the semantic expression of the response stored in the response history storage unit 4 and the command input from the command conversion unit 2. The estimated value adaptation unit 11 refers to the input command and to the semantic expression of the response in the response history storage unit 4, and determines the estimated value of the ambiguous word based on the estimated knowledge in the estimated knowledge storage unit 10. It then updates (learns) the estimated value information in the estimated value information storage unit 7 corresponding to the user identifier, and deletes the learned semantic marker from the ambiguous word storage unit 9. FIG. 4 shows the data structure stored in the estimated value information storage unit 7.
[0032]
The estimated knowledge stored in the estimated knowledge storage unit 10 is described, for example, as if-then rules whose condition parts distinguish cases (1) to (3) above:
(1) if (an item value A of <search result presentation> exists in the response in the response history storage unit 4, and an item value B exists together with <intention: item value assertion> in the current command) then (the estimated value is item value B)
(2) if (an item value A of <search result presentation> exists in the response in the response history storage unit 4, and the semantic marker of an ambiguous word relating to the item of item value A exists together with <intention: search request> in the current command) then (wait for the next command input)
(3) if (an item value A of <search result presentation> exists in the response in the response history storage unit 4, and a target name from the previous response exists together with <intention: item question> relating to an item other than that of item value A in the current command) then (the estimated value is the item value A corresponding to that target name in the previous response)
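The three rules can be illustrated with a short sketch. It assumes a simplified command represented as a dictionary with hypothetical keys (`intent`, `item`, `value`, `target name`) and a previous response reduced to a mapping from target name to the presented charge; neither representation comes from the specification.

```python
def estimate(presented, command, item="charge"):
    """Apply rules (1) to (3); return an estimate, or None to keep waiting.

    presented: target name -> item value A shown in the previous response.
    command:   the user's follow-up command (hypothetical dictionary form).
    """
    if not presented:
        return None  # no search result was presented, so no rule applies
    intent = command["intent"]
    if intent == "item value assertion" and command.get("item") == item:
        return command["value"]  # rule (1): the user stated item value B
    if intent == "search request":
        return None  # rule (2): wait for the next command input
    if intent == "item question" and command.get("item") != item:
        # rule (3): the user moved on, so the presented value was accepted
        return presented.get(command.get("target name"))
    return None
```

For the example dialogue, "I'd like about 10,000 yen" yields 10000, "It doesn't matter if it is a little higher" yields no estimate yet, and "Where is the nearest station of C Hotel" yields 6000.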
[0033]
The estimated value adaptation unit 11 records the estimated value obtained as described above as estimated value information in the estimated value information storage unit 7. As the estimated value information, for example, frequency information of each estimated value may be recorded with respect to the meaning marker of each ambiguous word of each item.
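One way to hold such frequency information is sketched below. The class name `EstimatedValueStore` and its methods are hypothetical, and choosing the most frequent value as the estimate is an assumption; the text only says that frequencies may be recorded.

```python
from collections import Counter, defaultdict


class EstimatedValueStore:
    """Per-user frequency counts of estimated values for each semantic marker."""

    def __init__(self):
        # (user id, item, marker) -> Counter mapping estimated value -> frequency
        self._counts = defaultdict(Counter)

    def record(self, user_id, item, marker, value):
        self._counts[(user_id, item, marker)][value] += 1

    def estimate(self, user_id, item, marker, default=None):
        counts = self._counts.get((user_id, item, marker))
        if not counts:
            return default  # nothing learned yet for this marker
        # Assumption: the most frequently observed value is used.
        return counts.most_common(1)[0][0]
```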
[0034]
As described above, according to the first embodiment, a user's natural-language input containing an ambiguous word is converted into a command expressed by an intention, items, and item values, and the estimated value of the semantic marker corresponding to the ambiguous word is determined and output together with the command, so that efficient and flexible searches can be performed.
In addition, because the value meant by an ambiguous word is learned from the user's utterance history and set automatically for each user at translation time, information retrieval becomes more efficient, natural information matching the user's sense can be presented, and user convenience is improved.
[0035]
In the first embodiment, when there are a plurality of search targets, the item <charge> is defined as a separate item for each target. For example, when the search targets are hotels and restaurants, a hotel search uses Search [hotel]: <charge (hotel)> = $ cheap hotel, and a restaurant search uses Search [restaurant]: <charge (restaurant)> = $ cheap restaurant.
[0036]
Embodiment 2.
FIG. 5 is a functional block configuration diagram of the spoken dialogue system according to Embodiment 2 of the present invention. In the figure, 12 is an interpolation model storage unit (interpolation model storage means), and 13 is an estimated value interpolation unit (estimated value interpolation means). The other components are the same as those of the first embodiment shown in FIG. 1 and are denoted by the same reference numerals.
Next, the operation will be described.
In the second embodiment, when the user has used the dialogue system only a few times, so that the estimated value information stored in the estimated value information storage unit 7 has not yet been learned for some ambiguous words, the estimated value of the semantic marker of such an unlearned ambiguous word is calculated by interpolation from the estimated value information of other, learned ambiguous words.
[0037]
The interpolation model storage unit 12 defines the relationship between the semantic markers of ambiguous words and their estimated values as a function, and stores it as an interpolation model. The function used as the interpolation model may be any function from which the estimated value can be calculated by interpolation when the semantic marker of an ambiguous word is given. For example, suppose that, as shown in FIG. 2, several semantic markers are defined for the same item <charge>: $ cheapest (ambiguous word: as cheap as possible), $ cheap1 (ambiguous word: cheap), $ cheap2, $ Not_so_exp (ambiguous word: not so high), and $ exp (ambiguous word: may be a little high), and that their estimated values are v1, v2, v3, v4, and v5, respectively. The interpolation model then stores functions that define the differences between the estimated values, such as v1 = v2 - 1000, v1 = v3 - 2000, v1 = v4 - 3000, and v1 = v5 - 4000.
[0038]
The estimated value interpolation unit 13 receives the user identifier and the semantic marker of an ambiguous word from the ambiguous word translation unit 8, and refers to the estimated value information corresponding to that user identifier in the estimated value information storage unit 7. If the estimated value information for the input semantic marker has not been learned, it uses the interpolation model in the interpolation model storage unit 12 to calculate the estimated value of the unlearned semantic marker from the estimated value information for the semantic markers of learned ambiguous words, and outputs the result to the ambiguous word translation unit 8. For example, suppose that the estimated value v1 of the semantic marker $ cheapest (ambiguous word: as cheap as possible) is unlearned, while the estimated value v2 of $ cheap1 (ambiguous word: cheap) is learned and v2 = 6000. The estimated value interpolation unit 13 then refers to the functions defining the differences between the estimated values stored in the interpolation model storage unit 12 and calculates the unlearned estimated value as v1 = v2 - 1000 = 5000.
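The difference functions of the example can be expressed as fixed offsets from v1, as in the following sketch. The table `OFFSETS` encodes v1 = v2 - 1000 through v1 = v5 - 4000 from the text; the function name `interpolate`, the dictionary layout, and the choice to use the first learned marker found are assumptions.

```python
# Offsets from v1 ($cheapest), derived from v1 = v2 - 1000, ..., v1 = v5 - 4000.
OFFSETS = {
    "$cheapest": 0,
    "$cheap1": 1000,
    "$cheap2": 2000,
    "$Not_so_exp": 3000,
    "$exp": 4000,
}


def interpolate(marker, learned):
    """Estimate an unlearned marker from any learned marker of the same item."""
    if marker in learned:
        return learned[marker]  # already learned: no interpolation needed
    for known, value in learned.items():
        if known in OFFSETS:
            # Recover v1 from the learned value, then shift to the target marker.
            return value - OFFSETS[known] + OFFSETS[marker]
    return None  # nothing learned for this item
```

With v2 = 6000 learned, the sketch reproduces the worked example: v1 = 6000 - 1000 = 5000.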
[0039]
As described above, according to the second embodiment, the estimated value of an unlearned ambiguous word can be calculated by interpolation from the estimated value information of learned ambiguous words. Therefore, even when the user has used the dialogue system only a few times, the item value of an ambiguous word can be estimated, information retrieval becomes more efficient, natural information matching the user's sense can be presented, and user convenience is improved.
[0040]
Embodiment 3.
FIG. 6 is a functional block configuration diagram of the spoken dialogue system according to Embodiment 3 of the present invention. In the figure, 14 is an all-user estimated value information storage unit (all-user estimated value information storage means), and 15 is an estimated value selection unit (estimated value selection means). The other components are the same as those of the first embodiment shown in FIG. 1 and are denoted by the same reference numerals.
[0041]
Next, the operation will be described.
In the third embodiment, when the user has used the dialogue system only a few times, the estimated value for the semantic marker of an ambiguous word whose estimated value information in the estimated value information storage unit 7 has not yet been learned is obtained from the learned estimated value information of another user whose estimated value information has a high degree of coincidence with that of the current user.
[0042]
The all-user estimated value information storage unit 14 stores the estimated value information of all users in association with their user identifiers. Its data structure is obtained by adding a user identifier to the data structure of the estimated value information. The estimated value selection unit 15 receives the user identifier and the semantic marker of an ambiguous word from the ambiguous word translation unit 8 and refers to the estimated value information corresponding to that user identifier in the estimated value information storage unit 7. If the estimated value information for the input semantic marker has not been learned, it refers to the all-user estimated value information storage unit 14 and calculates the degree of coincidence between the estimated value information of user A, who is currently using the system, and that of another user B. The degree of coincidence is defined, for example, as follows: the estimated values of the ambiguous words learned by both user A and user B are compared; if the difference between the two estimated values is within a certain range, the estimated values of that ambiguous word are regarded as matching; and the number of matching ambiguous words is taken as the degree of coincidence.
[0043]
The estimated value selection unit 15 refers to the estimated value information of all users stored in the all-user estimated value information storage unit 14 and, from among the users who have learned estimated value information for the ambiguous word that user A has not learned, selects the user C with the highest degree of coincidence with user A. It then outputs user C's learned estimated value for that ambiguous word to the ambiguous word translation unit 8 as the estimated value of user A's unlearned semantic marker.
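A rough sketch of the degree of coincidence and the selection of user C follows. The tolerance of 1000 stands in for the unspecified "certain range", and the function names, the dictionary layout (user identifier to marker to estimated value), and the tie-breaking by iteration order are all assumptions.

```python
def coincidence(a, b, tolerance=1000):
    """Count the markers learned by both users whose estimates nearly agree."""
    return sum(1 for m in a if m in b and abs(a[m] - b[m]) <= tolerance)


def select_estimate(user_id, marker, all_estimates, tolerance=1000):
    """Borrow the estimate of the most similar user who has learned the marker."""
    own = all_estimates.get(user_id, {})
    if marker in own:
        return own[marker]  # already learned by this user
    best_value, best_score = None, -1
    for other_id, other in all_estimates.items():
        if other_id == user_id or marker not in other:
            continue  # only users who have learned the marker are candidates
        score = coincidence(own, other, tolerance)
        if score > best_score:
            best_value, best_score = other[marker], score
    return best_value
```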
[0044]
As described above, according to the third embodiment, the estimated value of an unlearned ambiguous word is substituted with the estimated value of another user whose estimated value information has a high degree of coincidence. Therefore, even when the user has used the dialogue system only a few times, the estimated value of an ambiguous word can be estimated, information retrieval becomes more efficient, natural information matching the user's sense can be presented, and user convenience is improved.
[0045]
Each of the above embodiments has described the invention as a spoken dialogue system. By building an integrated computer system that combines the spoken dialogue system of the present invention with a telephone line, it becomes possible to run a business that understands natural language, including ambiguous words, input over the telephone line by users (that is, customers) and provides the information they request at low cost. A significant effect can also be obtained by applying the spoken dialogue system of the present invention to, for example, agents for various services such as reception, ordering, and reservation, and to automatic telephone voice response devices that provide information requested by the user. Furthermore, applying the spoken dialogue system of the present invention makes it possible to realize a vending machine that understands a customer's natural language including ambiguous words.
[0046]
[Effects of the invention]
  As described above, according to the present invention, the spoken dialogue system comprises: command knowledge storage means that defines, as a command of the dialogue system, an expression consisting of a command intention defined in correspondence with a dialogue system operation, an item defining a parameter type of the dialogue system operation, and an item value that is a value corresponding to the item, and that stores, as command knowledge, conversion knowledge for converting natural language into commands; ambiguous word dictionary storage means that defines, as an ambiguous word, a word that cannot be uniquely converted into an item value in natural language, and that stores, as an ambiguous word dictionary, the ambiguous word, the item of the ambiguous word, and a semantic marker corresponding to the ambiguous word; command conversion means that refers to the ambiguous word dictionary stored in the ambiguous word dictionary storage means, replaces an ambiguous word contained in the natural language input by the user with the corresponding semantic marker to create a pair of the item of the ambiguous word and the semantic marker, and, referring to the command knowledge stored in the command knowledge storage means, converts the input natural language into a command containing that pair; estimated value information storage means that stores estimated value information for estimating the value of the semantic marker corresponding to an ambiguous word, together with a user identifier that identifies the user; ambiguous word translation means that, for a command input from the command conversion means and containing a pair of the item of an ambiguous word and a semantic marker, refers to the estimated value information corresponding to the user identifier stored in the estimated value information storage means, determines the estimated value of the semantic marker corresponding to the ambiguous word, and outputs it together with the command; interpolation model storage means that defines the relationship between the estimated values corresponding to the semantic markers as a function and stores it as an interpolation model; and estimated value interpolation means that receives the user identifier and the semantic marker of an ambiguous word from the ambiguous word translation means and, for an ambiguous word whose input semantic marker has no learned estimated value information among the ambiguous words in the estimated value information corresponding to the user identifier, uses the interpolation model of the interpolation model storage means to calculate the estimated value of the unlearned semantic marker from the estimated value information for the semantic markers of learned ambiguous words and outputs it to the ambiguous word translation means. Consequently, a user's natural-language input containing an ambiguous word is converted into a command expressed by an intention, items, and item values, and the estimated value of the semantic marker corresponding to the ambiguous word is determined and output together with the command, so that efficient and flexible searches can be performed. In addition, even when the user has used the dialogue system only a few times, the item value of an ambiguous word can be estimated, information retrieval becomes more efficient, natural information matching the user's sense can be presented, and user convenience is improved.
[0048]
  The spoken dialogue system according to the present invention comprises: command knowledge storage means that defines, as a command of the dialogue system, an expression consisting of a command intention defined in correspondence with a dialogue system operation, an item defining a parameter type of the dialogue system operation, and an item value that is a value corresponding to the item, and that stores, as command knowledge, conversion knowledge for converting natural language into commands; ambiguous word dictionary storage means that defines, as an ambiguous word, a word that cannot be uniquely converted into an item value in natural language, and that stores, as an ambiguous word dictionary, the ambiguous word, the item of the ambiguous word, and a semantic marker corresponding to the ambiguous word; command conversion means that refers to the ambiguous word dictionary stored in the ambiguous word dictionary storage means, replaces an ambiguous word contained in the natural language input by the user with the corresponding semantic marker to create a pair of the item of the ambiguous word and the semantic marker, and, referring to the command knowledge stored in the command knowledge storage means, converts the input natural language into a command containing that pair; estimated value information storage means that stores estimated value information for estimating the value of the semantic marker corresponding to an ambiguous word, together with a user identifier that identifies the user; ambiguous word translation means that, for a command input from the command conversion means and containing a pair of the item of an ambiguous word and a semantic marker, refers to the estimated value information corresponding to the user identifier stored in the estimated value information storage means, determines the estimated value of the semantic marker corresponding to the ambiguous word, and outputs it together with the command; all-user estimated value information storage means that stores estimated value information for all users; and estimated value selection means that receives the user identifier and the semantic marker of an ambiguous word from the ambiguous word translation means and, for an ambiguous word whose input semantic marker has no learned estimated value information among the ambiguous words in the estimated value information corresponding to the user identifier, refers to the estimated value information for all users stored in the all-user estimated value information storage means, selects the estimated value of the unlearned semantic marker using the estimated value information of another user whose estimated value information for the semantic markers of learned ambiguous words has a high degree of coincidence, and outputs it to the ambiguous word translation means. Consequently, even when the user has used the dialogue system only a few times, the estimated value of an ambiguous word can be estimated, information retrieval becomes more efficient, natural information matching the user's sense can be presented, and user convenience is improved.
[0049]
The spoken dialogue system according to the present invention further comprises a database that stores a set of search target data to which items and item values are assigned, and dialogue management means that executes a predetermined dialogue system operation in response to an input command, manages the dialogue between the system and the user, searches the database, and generates a response semantic expression representing the semantic content of the response sentence presented to the user. Consequently, a user's natural-language input containing an ambiguous word is converted into a command expressed by an intention, items, and item values, the database is searched, and a response adapted to the user's input is obtained.
[Brief description of the drawings]
FIG. 1 is a functional block configuration diagram of a voice interaction system according to a first embodiment of the present invention.
FIG. 2 is a diagram showing an example of a correspondence relationship between items, ambiguous words, and meaning markers stored in an ambiguous word dictionary storage unit in each embodiment of the present invention.
FIG. 3 is a diagram showing an example of search target data stored in a database according to each embodiment of the present invention.
FIG. 4 is a diagram showing a data structure stored in an estimated value information storage unit in Embodiment 1 of the present invention.
FIG. 5 is a functional block configuration diagram of a voice interaction system according to Embodiment 2 of the present invention.
FIG. 6 is a functional block configuration diagram of a voice interaction system according to Embodiment 3 of the present invention.
[Explanation of symbols]
1 command knowledge storage unit (command knowledge storage means), 2 command conversion unit (command conversion means), 3 database, 4 response history storage unit (response history storage means), 5 dialogue management unit (dialogue management means), 6 ambiguous word dictionary storage unit (ambiguous word dictionary storage means), 7 estimated value information storage unit (estimated value information storage means), 8 ambiguous word translation unit (ambiguous word translation means), 9 ambiguous word storage unit (ambiguous word storage means), 10 estimated knowledge storage unit (estimated knowledge storage means), 11 estimated value adaptation unit (estimated value adaptation means), 12 interpolation model storage unit (interpolation model storage means), 13 estimated value interpolation unit (estimated value interpolation means), 14 all-user estimated value information storage unit (all-user estimated value information storage means), 15 estimated value selection unit (estimated value selection means).

Claims

A spoken dialogue system comprising:
command knowledge storage means for defining, as a command of the dialogue system, an expression consisting of a command intention defined in correspondence with a dialogue system operation, an item defining a parameter type of the dialogue system operation, and an item value that is a value corresponding to the item, and for storing, as command knowledge, conversion knowledge for converting natural language into commands;
ambiguous word dictionary storage means for defining, as an ambiguous word, a word that cannot be uniquely converted into an item value in natural language, and for storing, as an ambiguous word dictionary, the ambiguous word, the item of the ambiguous word, and a semantic marker corresponding to the ambiguous word;
command conversion means for referring to the ambiguous word dictionary stored in the ambiguous word dictionary storage means, replacing an ambiguous word contained in the natural language input by the user with the semantic marker corresponding to the ambiguous word to create a pair of the item of the ambiguous word and the semantic marker, and, with reference to the command knowledge stored in the command knowledge storage means, converting the input natural language into a command containing the pair of the item of the ambiguous word and the semantic marker;
estimated value information storage means for storing estimated value information for estimating the value of the semantic marker corresponding to an ambiguous word, together with a user identifier that identifies the user;
ambiguous word translation means for, with respect to a command input from the command conversion means and containing a pair of the item of an ambiguous word and a semantic marker, referring to the estimated value information corresponding to the user identifier stored in the estimated value information storage means, determining the estimated value of the semantic marker corresponding to the ambiguous word, and outputting the estimated value together with the command;
interpolation model storage means for defining the relationship between the estimated values corresponding to the semantic markers as a function and storing it as an interpolation model; and
estimated value interpolation means for receiving the user identifier and the semantic marker of an ambiguous word from the ambiguous word translation means and, for an ambiguous word whose input semantic marker has no learned estimated value information among the ambiguous words in the estimated value information corresponding to the user identifier, using the interpolation model of the interpolation model storage means to calculate the estimated value of the unlearned semantic marker from the estimated value information for the semantic markers of learned ambiguous words, and outputting the calculated estimated value to the ambiguous word translation means.

A spoken dialogue system comprising:
command knowledge storage means for defining, as a command of the dialogue system, an expression consisting of a command intention defined in correspondence with a dialogue system operation, an item defining a parameter type of the dialogue system operation, and an item value that is a value corresponding to the item, and for storing, as command knowledge, conversion knowledge for converting natural language into commands;
ambiguous word dictionary storage means for defining, as an ambiguous word, a word that cannot be uniquely converted into an item value in natural language, and for storing, as an ambiguous word dictionary, the ambiguous word, the item of the ambiguous word, and a semantic marker corresponding to the ambiguous word;
command conversion means for referring to the ambiguous word dictionary stored in the ambiguous word dictionary storage means, replacing an ambiguous word contained in the natural language input by the user with the semantic marker corresponding to the ambiguous word to create a pair of the item of the ambiguous word and the semantic marker, and, with reference to the command knowledge stored in the command knowledge storage means, converting the input natural language into a command containing the pair of the item of the ambiguous word and the semantic marker;
estimated value information storage means for storing estimated value information for estimating the value of the semantic marker corresponding to an ambiguous word, together with a user identifier that identifies the user;
ambiguous word translation means for, with respect to a command input from the command conversion means and containing a pair of the item of an ambiguous word and a semantic marker, referring to the estimated value information corresponding to the user identifier stored in the estimated value information storage means, determining the estimated value of the semantic marker corresponding to the ambiguous word, and outputting the estimated value together with the command;
all-user estimated value information storage means for storing estimated value information for all users; and
estimated value selection means for receiving the user identifier and the semantic marker of an ambiguous word from the ambiguous word translation means and, for an ambiguous word whose input semantic marker has no learned estimated value information among the ambiguous words in the estimated value information corresponding to the user identifier, referring to the estimated value information for all users stored in the all-user estimated value information storage means, selecting the estimated value of the unlearned semantic marker using the estimated value information of another user whose estimated value information for the semantic markers of learned ambiguous words has a high degree of coincidence, and outputting the selected estimated value to the ambiguous word translation means.

The spoken dialogue system according to claim 1 or claim 2, further comprising:
a database for storing a set of search target data to which items and item values are assigned; and
dialogue management means for executing a predetermined dialogue system operation in response to an input command to manage the dialogue between the system and the user, searching the database, and generating a response semantic expression representing the semantic content of the response sentence presented to the user.