JP2004045900A - Voice interaction device and program

Info

Publication number: JP2004045900A
Application number: JP2002204895A
Authority: JP
Inventors: Tsukasa Shimizu; 清水　司; Toshihiro Wakita; 脇田　敏裕
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2002-07-12
Filing date: 2002-07-12
Publication date: 2004-02-12

Abstract

PROBLEM TO BE SOLVED: To obtain a voice interaction device that can determine the targeted slot value even when the user, on receiving a guidance-based question or confirmation from the device, speaks freely without regard to the guidance.

SOLUTION: From an answer sentence, the device can extract the slot value targeted by the guidance sentence, slot values other than the targeted one, and command vocabulary, and it can judge which slot value the command vocabulary refers to. When not all of the answers targeted by the device have been obtained, a guidance sentence for obtaining the targeted answer is output by voice to ask the speaker for another answer sentence.

COPYRIGHT: (C)2004, JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、発話者から目的とする回答を得るためにガイダンス文を音声出力し、このガイダンスに応答して発話者から得られる回答文を解析して目的とする回答を確定する音声認識装置に関する。従って、本発明は、例えば、車載用のカーナビゲーション・システム等に応用可能で、例えば目的地の施設名称や住所などの、所謂「スロット」に該当する情報を埋めていくような対話を行う音声対話装置等に適用することができる。
【０００２】
【従来の技術】
発話者から目的とする回答を得るためにガイダンス文を音声出力し、このガイダンスに応答して発話者から得られる回答文を解析して目的とする回答を確定する音声認識装置としては、一例として、公開特許公報「特開平１０−３１４９７」に記載されている音声対話装置が知られている。「特開平１０−３１４９７」に記載されている音声対話装置においては、複数のスロットを埋めるようなタスクにおいて、実施されている。直前の認識結果の確認と、次の質問を同時に行うことができるように、直前に認識したスロットの値を次に質問するべきスロットを尋ねるガイダンス文に挿入する。ガイダンスの提示後には、認識対象語句として質問したスロットに関するスロット語句と、認識に対する否定を表すコマンド語句（「はい」、「違います」などの諾否を表現する語句）に制限する。その後ユーザ発話の認識結果がスロット語句であればそのスロット語句を用いて次のガイダンスを生成し、コマンド語句であれば、その前に認識したスロット語句を棄却し、同じガイダンスを繰り返す。このように対話を行うことにより、効率よく円滑な対話を実現できる。このように、認識対象語彙を制限することにより、確実に対象とするスロットを確定することが期待できる。
【０００３】
以下に対話の例を示す。
例１）
システム：相手の所属をどうぞ。
ユーザ　：資材課です。
システム：資材課の誰ですか？
ユーザ　：佐藤。
システム：佐藤さんに電話をつなぎます。
例２）
システム：相手の所属をどうぞ。
ユーザ　：資材課。
システム：資料課の誰ですか？
ユーザ　：違います。
システム：もう一度、相手の所属をどうぞ。
……
このように直前の認識語句を次のガイダンスに挿入することにより、発話内容を確認しながら、対話の円滑な実現を図っている。
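[0004]
[Problems to be solved by the invention]
Although the conventional invention described above is intended to realize an efficient and smooth dialogue with the user, at each point in the dialogue (immediately after each guidance) it restricts the recognition vocabulary to, for example, phrases belonging to an assumed category. That is, the dialogue proceeds on the assumption that the user responds only to what the guidance asked. The user must therefore always be aware of the intent of the system's guidance and return the response the system assumes (expects). Such a dialogue places a heavy mental burden on the user, and a smooth dialogue is not necessarily achieved for every user.
[0005]
For example, in place of the user utterance “That is wrong.” in Example 2 above, the user might respond “Materials Section.” However, because the system assumes a vocabulary of person-name keywords and command words as the recognition target, an utterance of “Materials Section” is inevitably misrecognized as a person's name or a command phrase, and the dialogue goes astray. In other words, when the response the system assumes is not given, the utterance is classified into, and recognized as, a category entirely different from what was said. The present invention has been made to solve this problem. Its object is to realize a smooth voice dialogue with a voice interaction device and obtain the target answer without the user having to stay conscious of the system's guidance and of the response it assumes.
[0006]
Note that no single one of the inventions described above should be understood as achieving all of the above objects at once; rather, each individual invention should be understood as achieving its respective object.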
【０００７】
【課題を解決するための手段】
上記課題を解決するために、請求項１の発明は、異なる複数のカテゴリー毎に設けられた少なくとも１つのスロットに発話者から目的とする回答を得るために少なくとも１つのカテゴリーを対象としたガイダンス文を音声出力し、このガイダンスに応答して発話者から得られる回答文を解析して目的とする回答を確定する音声対話装置において、回答文から、ガイダンス文で問われたカテゴリーのスロットについてのスロット値の候補を含むスロット語句及び、ガイダンス文で問われた以外の他のカテゴリーのスロットについてのスロット値の候補を含むスロット語句及び、「肯定」又は「否定」を表す単語を含むコマンド語句を検出するキーワード検出部を有するようにした。つまり、ユーザが、ガイダンスを意識しないで、発話をしても良いようにするためには、ガイダンスが前提としているカテゴリーを前提としないで、ユーザの回答文を解析する必要があるためである。同時に、全てのスロットが確定していない場合は、確認すべきカテゴリーのスロットに対して回答を要求するガイダンス文の制御を行う対話制御部とを有することを特徴とする。ここで、スロット語句とは、スロット値の候補を含む語句のことである。広くユーザの発話をカバーし、検出できるようにした。
【０００８】
さらに、請求項２の発明の音声対話装置は、スロット語句は、スロット値の候補となる単語であるスロット単語、及びスロット値の候補となる単語が助詞などを伴ったスロット単語列であり、コマンド語句は、「肯定」又は「否定」を表す単語であるコマンド単語、及び「肯定」又は「否定」を表す単語を含む単語列であるコマンド単語列であることを特徴とする。つまり、スロット語句とは、スロット値の候補となるスロット単語、及びスロット単語が助詞などを伴った単語列のことである。例えば、スロット単語が、「名古屋市」である場合、検出する過程において、助詞「の」を伴って「名古屋市の」を検出した場合においても、以後の処理の過程において扱うことを可能とするものである。さらに、コマンド語句としては、「否定」の意味を検出できる語句としては、「いいえ」、「違う」等のように、単語として扱えるものもあれば、「で　ない」、「じゃ　ない」、「違い　ます」等のように助詞等を伴った単語列の場合が多い。このような単語列からなる語句を含めて広く対話装置が目的とするスロット値、コマンドを検出できるようにするものである。
【０００９】
さらに、請求項３の発明は、ユーザがガイダンスにとらわれずに発話した場合、ユーザの回答文が、どのスロットに付いて言及し、その意味は何なのか判断する必要がある。従って、音声対話装置において、検出したスロット語句から、スロット値を検出し、回答文が、何れのスロット、又は検出したスロット値について言及しているか解析し、言及しているスロット、又はスロット値についてそれぞれコマンド語句の有無、及びその意味により「肯定」、「否定」の度合い、又は「何れでもない」を判定し、スロット毎、又はスロット値毎に対してそれぞれの判定結果により発話の意味を解析し、スロット毎、又はスロット値毎にそれぞれの判定結果の組からなる意味情報を生成する意味情報生成部と、意味情報により、各スロットにおけるスロット値とスロット状態の組からなるスロット情報を更新するスロット情報管理部とをさらに有することを特徴とする。
【００１０】
さらに、音声対話装置は、ユーザの回答文を解析し、音声対話装置が認識した結果得られたスロット情報が、ユーザの希望している情報であるか確認を取り、確定していく必要がある。従って、請求項４の発明は、音声対話装置において、意味情報により、スロット状態は、「未知」＜「未確認」＜「確定」の順に順位付けられてスロット情報管理部により管理されることを特徴とする。
【００１１】
さらに、請求項５の発明の音声対話装置は、得られたスロット値の組み合わせにより複数のスロット値により構成されたデータベース内を検索し、検索の結果、検索の対象外であるスロット値が得られていないスロットについてスロット値が一意的に定まる場合、検索により得られたスロット値を、該当するスロットのスロット値とし、スロット状態を「データベースによる設定」とするデーターベース検索部をさらに有するようにした。又、スロット情報管理部は、「データベースによる設定」状態を加えた４種類の状態により、「未知」＜「データベースによる設定」＜「未確認」＜「確定」の順に順位付けて管理することを特徴とする。
【００１２】
さらに、請求項６の発明の音声対話装置は、自由に発話してそれを誤り無く認識できるようにするため、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは、陽に肯定、又は否定を表すことを発話することにより、意味情報生成部は、あるスロットについて「肯定」、又は「否定」の意味情報を生成することができることを特徴とする。
【００１３】
さらに、請求項７の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは陽に肯定、又は否定を表すことを発話する代わりに、１つ、又は複数のスロットについて発話することで、意味情報生成部は、１つ、又は複数のスロットについての意味情報を生成することができることを特徴とする。
【００１４】
さらに、請求項８の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは陽に肯定、又は否定を表すことを発話する代わりに、確認を求められているスロット語句を発することにより、意味情報生成部は、あるスロットについてスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができることを特徴とする。
【００１５】
さらに、請求項９の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは、陽に肯定を表すことを発話する代わりに、１つ又は、複数の新たなスロットについて発話することにより、意味情報生成部は、あるスロットについて「肯定」の意味情報を生成し、さらに、１つ又は複数のスロットについてそれぞれ意味情報を生成することができることを特徴とする。
【００１６】
さらに、請求項１０の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは、陽に否定を表すことを発話する代わりに、確認を求められているスロット語句と「否定」を表す単語を含むコマンド語句を用いた発話により、意味情報生成部は、あるスロットについてスロット語句から検出されるスロット値と「否定」の組からなる意味情報を生成するすることができることを特徴とする。
【００１７】
さらに、請求項１１の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対してユーザは、陽に否定を表すことを発話する代わりに、正しいスロット語句を発話することによって、意味情報生成部は、あるスロットについて、保持するスロット値と「否定」の組からなる意味情報を生成し、さらに、発話された正しいスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができることを特徴とする。
【００１８】
さらに、請求項１２の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対してユーザは、陽に否定を表すことを発話する代わりに、確認を求められているスロット語句と「否定」を表す単語を含むコマンド語句を用いた発話をし、さらに、１つ又は、複数の新たなスロットについて発話することにより、意味情報生成部は、あるスロットについて、スロット語句から検出されるスロット値と「否定」の組からなる意味情報を生成し、さらに、１つ、又は複数のスロットについてそれぞれ意味情報を生成することができることを特徴とする。
【００１９】
さらに、請求項１３の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対してユーザは、陽に否定を表すことを発話する代わりに、正しいスロット語句を発話し、さらに、１つ又は複数の新たなスロットについて発話することにより、意味情報生成部は、あるスロットの保持するスロット値について「否定」の意味情報を生成し、さらに、あるスロットについて、発話された正しいスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成し、さらに、１つ、又は複数のスロットについてそれぞれ意味情報を生成することができることを特徴とする。
【００２０】
さらに、請求項１４の発明の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対してユーザは、陽に否定を表すことを発話する代わりに、確認を求められているスロット語句から検出されるスロット値と「否定」を表す単語を含むコマンド語句を用いた発話をし、さらに、１つ又は複数のスロットについて発話することにより、意味情報生成部は、あるスロットについて、スロット語句から検出されるスロット値と「否定」の組からなる意味情報を生成し、さらに、１つ、又は複数のスロットについてそれぞれ意味情報を生成することができることを特徴とする。
【００２１】
さらに、請求項１５の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットのスロット値を尋ねるガイダンスに対して、ユーザは、あるスロットについて直接スロット語句を用いて発話ことにより、意味情報生成部は、あるスロットについてスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することを特徴とする。
【００２２】
さらに、請求項１６の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットのスロット値を尋ねるガイダンスに対して、ユーザは、あるスロット以外のスロットについてスロット語句を用いて発話することにより、意味情報生成部は、あるスロット以外のスロットについて、発話されたスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができることを特徴とする。
【００２３】
さらに、請求項１７の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットのスロット値を尋ねるガイダンスに対して、ユーザは、あるスロット以外の１つ又は、複数のスロットについてスロット語句を用いて答えることにより、意味情報生成部は、あるスロット以外の１つ又は、複数のスロットについて、それぞれのスロット毎に発話されたスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができることを特徴とする。
【００２４】
さらに、請求項１８の発明の音声対話装置は、ユーザは音声対話装置から発せられたガイダンスを意識せずに発話しても、正しく音声対話装置が、認識可能であるようにするために、あるスロットのスロット値を尋ねるガイダンスに対して、ユーザは、あるスロットについて又は／及び尋ねられたスロット以外の１つ又は、複数のスロットについてスロット語句を用いて答えることにより、意味情報生成部は、あるスロットについて発話されたスロット語句から検出されるスロット値と「肯定」の組による意味情報又は／及び尋ねられたスロット以外の１つ又は、複数のスロットについて、それぞれのスロット毎に発話されたスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができることを特徴とする。
【００２５】
さらに、請求項１９の発明は、より人対人の対話に近い円滑な音声対話を実現するために、目的の回答を得るために、発話者に回答文を促す文であるガイダンス文は、明示的ガイダンス文、暗黙的ガイダンス文、又は暗黙的な確認と、質問を同時に行うガイダンス文であることを特徴とする。
【００２６】
さらに、請求項２０の発明の音声対話プログラムは、音声対話装置のコンピュータにおいて、異なる複数のカテゴリー毎に設けられた少なくとも１つのスロットに発話者から目的とする回答を得るために少なくとも１つのカテゴリーを対象としたガイダンス文を音声出力する手順と、このガイダンスに応答して発話者から得られる回答文を解析する手順と、解析した結果を用いて、目的とする回答を確定する手順と、回答文から、ガイダンス文で問われたカテゴリーのスロットのスロット値の候補となる、スロット単語及びスロット単語が助詞などを伴った単語列からなるスロット語句及び、ガイダンス文で問われた以外の他のカテゴリーのスロットについてのスロット値の候補となるスロット語句及び、「肯定」又は「否定」を表す単語を含むコマンド語句を検出するキーワード検出手順と、全てのスロットが確定していない場合は、確認すべきカテゴリーのスロットに対して回答を要求するガイダンス文の制御を行う対話制御手順とを有することを特徴とする。
【００２７】
さらに、請求項２１の発明の音声対話プログラムは、音声対話装置のコンピュータにおいて、検出したスロット語句から、スロット値を検出し、回答文が、何れのスロット、又は検出したスロット値について言及しているか解析し、言及しているスロット、又はスロット値について「肯定」、「否定」の度合い、又は「何れでもない」を判定し、スロット毎、又はスロット値毎の判定結果により発話の意味を解析し、スロット値とそれぞれの判定結果の組からなる意味情報を生成する意味情報生成手順と、意味情報により、各スロットにおけるスロット語句とスロット状態の組からなるスロット情報を更新するスロット情報管理手順とをさらに有することを特徴とする。
【００２８】
さらに、請求項２２の発明の音声対話プログラムは、音声対話装置のコンピュータにおいて、スロット情報管理手順は、意味情報により、スロット状態を、「未知」＜「未確認」＜「確定」の順に順位付けて管理する手順をさらに有することを特徴とする。
【００２９】
さらに、請求項２３の発明の音声対話プログラムは、得られたスロット値の組み合わせにより、カテゴリー毎に複数のスロット値により構成されたデータベース内を検索し、検索の結果、検索の対象外であるスロット値が得られていないスロットについてスロット値が一意的に定まる場合、検索により得られたスロット値を、該当するスロットのスロット値とし、スロット状態を「データベースによる設定」とするデーターベース検索手順をさらに有し、「データベースによる設定」状態を加えた４種類の状態により、「未知」＜「データベースによる設定」＜「未確認」＜「確定」の順に順位付けて管理する手順をさらに有することを特徴とする。
【００３０】
【発明の作用、効果】
本欄では、各請求項に記載の発明に関して、主としてその作用及び効果を記載する。発明の理解を容易にするために、例示的に具体化して説明しているが、請求項の構成を限定するものではない。そして、例示的に具体化して説明した部分は、発明の実施の形態の説明でもある。
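[0031]
Here, a dialogue whose target task is destination setting in a car navigation system is described as an example. Four slots are considered necessary for destination setting: “store name”, “city/town/village name”, “detailed address”, and “business type”. A database is also used from which the longitude and latitude of the destination store can be retrieved from these slot values.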
【００３２】
まず、請求項１の発明は、異なる複数のカテゴリー毎に設けられた複数のスロットに発話者から目的とする回答を得るために少なくとも１つのカテゴリーを対象としたガイダンス文を音声出力し、このガイダンスに応答して発話者から得られる回答文を解析して目的とする回答を確定する際に、回答文から、ガイダンス文で問われたカテゴリーのスロットのスロット値の候補を含むスロット語句及び、ガイダンス文で問われた以外の他のカテゴリーのスロットについてのスロット値の候補を含むスロット語句及び、「肯定」又は「否定」を表す単語を含むコマンド語句を検出するキーワード検出部を有するようにした。これにより、従来、音声対話装置から問い掛けたガイダンスのカテゴリー以外の回答は、誤認識されるか、無視されていたが、正確に認識できるようになる。従って、ユーザは、ガイダンスの意図するカテゴリーを意識せずに自由に発話しても、音声対話装置が期待するスロット値を確定することが可能となる。同時に、音声対話装置は、対話を繰り返し、回答文から、スロット値を確定していくときに、全てのスロットが確定していない場合は、確認すべきスロットに対して回答を要求するガイダンス文の制御を行う対話制御部とを有するようにしたので、すべてのスロットが確定するまで、繰り返し、確認すべきスロットに対して、回答を要求するガイダンス文をユーザに発することができるようになる。つまり、ユーザが、ガイダンスが意図するカテゴリーに対しての発話を行わずに、他のカテゴリーに対する発話を自由に行っても、音声対話装置は、他のカテゴリーに対する発話としてスロット値を確定することが可能となる。さらに、回答を必要とするカテゴリーのスロットについての明示的なガイダンスをユーザに発することが可能な音声対話装置を得ることができるようになる。
【００３３】
例えば、カーナビゲーション装置において、目的地設定のための対話において「都道府県名を言ってください。」という都道府県カテゴリーを問うガイダンスに対して、本来、想定しているカテゴリーは、都道府県であるので、都道府県名を答えるべきであるのに、「長久手町です。」と答えることがある。また、「タンポポはどこにありますか？」という質問のガイダンスに対して、「名古屋市に有るレストランです。」などと、住所だけを答えるのではなく、さらに業種等も付け加えて答える場合もある。つまり、この場合、住所のカテゴリーについてのみを問い掛けているのに、住所だけでなく、業種カテゴリーに属する回答も同時に発話している。さらに、「名古屋市ですか？」という確認のガイダンスに対して、「はい。」「いいえ。」で答えるだけではなく、「そう、名古屋市のレストラン。」や「違う、豊田市。」などのように、新たな情報の追加や、修正を行う場合もある。
【００３４】
このような対話は、人対人の対話では良く見受けられるものであり、極めて自然な対話であり、本発明によれば、このような対話を可能とする装置を得ることができるようになる。これにより、ユーザは、必ずしも、装置からのガイダンスを意識して対話を進める必要が無くなり、言いたいことや、思いついたことを自由に話すことができるようになる。また、ユーザは、自由に話すことができるが、対話装置からは、明示的に、あるカテゴリーの特定のスロットについて、質問／確認を行うことも可能であるので、これにより、ユーザの発話を促すことができる。特に初心者にとっては、例えば、「どうぞお話ください。」などの明示的にどのカテゴリーについてのスロットについての質問か、指示されないガイダンスを提示した場合には、何を話してよいのか、又、どの様に話してよいのか惑わせることになり、対話が迷走する原因となる。従って、カテゴリーを明示的に表すガイダンスは、ガイダンスを手がかりに、ユーザに適確な発話を促す効果がある。上述した対話の例では、説明を分かりやすくするため、質問の対象となっているカテゴリーは１つであるが、実際の装置では、例えば、「都道府県名と、市町村名を言って下さい。」等のように複数のカテゴリーに付いての質問を行うことも可能である。
【００３５】
さらに、本発明のキーワード検出部においては、認識結果文字列から、ガイダンスで尋ねた内容にかかわらずに、スロット値を含むスロット語句を検出することが可能となる。同時に、諾否をあらわす単語を含むコマンド語句についても検出することができるようになる。このとき、例えば、対象としている目的地設定に関わるユーザの発話を広くカバーすることができる単一の単語辞書を用いることにより、ユーザが、対話のどの時点においても、言いたいことを言っても、スロット語句を検出できるようになる。さらに、例えば、この単一の単語辞書によれば、検出されたスロット語句がどのカテゴリーに属するか判断できるようになっている。
【００３６】
さらに、請求項２の発明によれば、音声対話装置において、スロット語句は、スロット値の候補となる単語であるスロット単語、及びスロット値の候補となる単語が助詞などを伴ったスロット単語列であり、コマンド語句は、「肯定」又は「否定」を表す単語であるコマンド単語、及び「肯定」又は「否定」を表す単語を含む単語列であるコマンド単語列であるので、スロット単語を含むスロット語句を検出することが可能となる。同時に、「肯定」又は「否定」を表す単語を含むコマンド語句についても検出することが可能となる。これにより、自由に行われるユーザの発話を広く認識することが可能となり、より使い勝手の良い対話装置を得ることが可能となる。
【００３７】
さらに、請求項３の発明によれば、音声対話装置は、意味情報生成部において検出したスロット語句から、スロット値を検出し、回答文が、何れのスロット、又は検出したスロット値について言及しているか解析し、言及しているスロット、又はスロット値について「肯定」、「否定」の度合い、又は「何れでもない」を判定し、スロット毎、又はスロット値毎に対してそれぞれの判定結果により発話の意味を解析し、スロット、又はスロット値とそれぞれの判定結果の組からなる意味情報を生成することができるようになる。さらに、スロット情報管理部において、意味情報により、各スロットにおけるスロット値とスロット状態の組からなるスロット情報を更新することができるようになる。これにより、ユーザの回答文から検出されたスロット語句をスロット値として、確定することができるようになる。
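[0038]
That is, the semantic information generation unit detects slot values from the slot phrases containing slot values that the keyword detection unit of claim 1 detected in the recognition-result character string, and further determines which category of slot is being referred to. It also checks for command phrases containing words that express negation, such as “ない” (not) or “違う” (wrong), examines which slot or slot value each command phrase refers to, and judges affirmation or negation for every slot and slot value. Semantic information is generated for each slot from the category determined for each detected slot value and from the presence or absence of negation words.
[0039]
For example, for “Not Naka Ward, a restaurant in Chikusa Ward.” the following semantic information is generated.
Store name slot: “affirmative”
City/town/village name slot: “affirmative”
Detailed address slot: “Naka Ward negative, Chikusa Ward”
Business type slot: “restaurant”
When the user simply says “That is wrong.” or the like and it cannot be determined which slot is being referred to, “weak negative” is given to all slots as the degree of negation. The slot information management unit then updates the slot information, consisting of a “slot value” and a “state” for each slot, according to the semantic information and, for example, the slot-information update rules shown in FIGS. 2, 3, and 4. This makes it possible to manage slot states and determine slot values as the dialogue progresses.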
【００４０】
さらに、請求項４の発明によれば、音声対話装置において、意味情報により、スロット状態は、「未知」＜「未確認」＜「確定」の順に順位付けられてスロット情報管理部により管理できるようになるので、意味情報により、スロットの状態に順位をつけてスロット状態を更新することが可能となる。つまり、スロット又は、スロット語句から検出されるスロット値に対する肯定語／否定語の有無により、スロット状態を順位付けて管理更新することができるようになる。
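[0041]
Further, according to the invention of claim 5, the database search unit searches a database composed of a plurality of slot values for each category, using the combination of slot values obtained so far. When the search result uniquely determines the slot value of a slot that was not used as a search key because no value had yet been obtained for it, the value obtained by the search can be set as the slot value of that slot, and the slot state of that slot can be set to “set by database”.
[0042]
For example, suppose the database is searched with the “store name” “Piccolo” and the address “Nagoya City”, and the search finds several “Piccolo” stores in “Nagoya City”. If the “business type” of every one of those stores is “restaurant”, then “restaurant” can be set as the slot value of the “business type” slot, and the slot state at that time is set to “set by database”. As a result, for a well-known, unique facility such as “Nagoya City Hall”, simply answering the “store name” causes all the other slots to be set, in the “set by database” state, by the database search. For a slot in the “set by database” state, the voice interaction device inserts the set slot value into a guidance sentence and asks for confirmation. It thus offers the detailed address, business type, and so on as guidance without waiting for the user to supply them, and the target slots can be set easily through confirmation. From the user's side, answering only the key slots is enough: given the device's explicit confirmation guidance, the target slots can be set merely by uttering words of affirmation or negation such as “Yes” or “No”.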
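The database fill-in described in this paragraph can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout, the slot names (`store_name`, `city`, `detailed_address`, `business_type`), the state strings, and the function name `fill_from_database` are all assumptions introduced here.

```python
def fill_from_database(slots, records):
    """Fill empty slots whose value is uniquely determined by a search.

    slots:   dict mapping slot name -> (value or None, state string)
    records: list of dicts, one per store, keyed by the same slot names
    Both layouts are assumptions made for this sketch.
    """
    # Search keys are the slots that already hold a value.
    known = {name: value for name, (value, _) in slots.items() if value is not None}
    hits = [r for r in records if all(r[k] == v for k, v in known.items())]

    updated = dict(slots)
    if not hits:
        return updated
    for name, (value, _) in slots.items():
        if value is None:
            candidates = {r[name] for r in hits}
            # Only a slot with exactly one possible value is filled in.
            if len(candidates) == 1:
                updated[name] = (candidates.pop(), "set_by_database")
    return updated


# Hypothetical data mirroring the "Piccolo in Nagoya City" example.
RECORDS = [
    {"store_name": "ピッコロ", "city": "名古屋市",
     "detailed_address": "中区", "business_type": "レストラン"},
    {"store_name": "ピッコロ", "city": "名古屋市",
     "detailed_address": "千種区", "business_type": "レストラン"},
    {"store_name": "タンポポ", "city": "名古屋市",
     "detailed_address": "中区", "business_type": "レストラン"},
]
SLOTS = {
    "store_name": ("ピッコロ", "unconfirmed"),
    "city": ("名古屋市", "unconfirmed"),
    "detailed_address": (None, "unknown"),
    "business_type": (None, "unknown"),
}
RESULT = fill_from_database(SLOTS, RECORDS)
```

With this data two stores match, so the detailed address stays open, while the business type is the same for every match and is therefore filled in with the "set by database" state.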
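[0043]
Furthermore, the slot information management unit can manage and update slots using four states, adding the “set by database” state, ranked in the order “unknown” < “set by database” < “unconfirmed” < “confirmed”. A slot value set from the database can therefore be treated as not “unknown”, yet weaker than “unconfirmed”, which denotes a value obtained from the user's answer.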
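One way to picture the four ranked slot states is as an ordered enumeration. The sketch below is only an illustration; the class name `SlotState`, the member names, and the helper `needs_confirmation` are assumptions introduced here, while the ordering itself follows the text.

```python
from enum import IntEnum


class SlotState(IntEnum):
    """Slot states ranked as: unknown < set by database < unconfirmed < confirmed."""
    UNKNOWN = 0          # 「未知」: no value yet, the system must ask
    SET_BY_DATABASE = 1  # 「データベースによる設定」: value came from a unique search result
    UNCONFIRMED = 2      # 「未確認」: value came from the user's answer, not yet confirmed
    CONFIRMED = 3        # 「確定」: value has been confirmed


def needs_confirmation(state):
    """True for states that hold a value which still has to be confirmed."""
    return state in (SlotState.SET_BY_DATABASE, SlotState.UNCONFIRMED)
```

Because the members are integers, the ranking can be checked directly with comparison operators, for example `SlotState.SET_BY_DATABASE < SlotState.UNCONFIRMED`.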
【００４４】
さらに、請求項６の発明によれば、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは、陽に肯定、又は否定を表すことを発話することにより、意味情報生成部は、あるスロットについて「肯定」、又は「否定」の意味情報を生成することができるようになる。
【００４５】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市ですか？」
ユーザ　：「いいえ。」
【００４６】
さらに、請求項７の発明によれば、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは陽に肯定、又は否定を表すことを発話する代わりに、１つ、又は複数のスロットについて発話することで、意味情報生成部は、１つ、又は複数のスロットについての意味情報を生成することができるようになる。
【００４７】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市ですか？」
ユーザ　：「図書館です。」
【００４８】
さらに、請求項８の発明によれば、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは陽に肯定、又は否定を表すことを発話する代わりに、確認を求められているスロット語句を発話することにより、意味情報生成部は、あるスロットについてスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができるようになる。
【００４９】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市ですね？」
ユーザ　：「名古屋市です。」
【００５０】
さらに、請求項９の発明によれば、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは、陽に肯定を表すことを発話する代わりに、１つ又は複数のスロットについて発話することにより、意味情報生成部は、あるスロットについて「肯定」の意味情報を生成し、さらに、１つ又は複数のスロットについてそれぞれ意味情報を生成することができるようになる。
【００５１】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市ですね？」
ユーザ　：「名古屋市のレストランピッコロです。」
【００５２】
さらに、請求項１０の発明によれば、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対して、ユーザは、陽に否定を表すことを発話する代わりに、確認を求められているスロット語句と「否定」を表す単語を含むコマンド語句を用いた発話により、意味情報生成部は、あるスロットについて、スロット語句から検出されるスロット値と「否定」の組からなる意味情報を生成することができるようになる。
【００５３】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市ですね？」
ユーザ　：「名古屋市じゃない。」
【００５４】
さらに、請求項１１の発明によれば、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対してユーザは、陽に否定を表すことを発話する代わりに、正しいスロット語句を発話することによって、意味情報生成部は、あるスロットについて、保持するスロット値と「否定」の組からなる意味情報を生成し、さらに、発話された正しいスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができるようになる。
【００５５】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市ですね？」
ユーザ　：「豊田市です。」
【００５６】
さらに、請求項１２の発明によれば、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対してユーザは、陽に否定を表すことを発話する代わりに、確認を求められているスロット語句と「否定」を表す単語を含むコマンド語句を用いた発話をし、さらに、１つ又は複数のスロットについて発話をすることにより、意味情報生成部は、あるスロットについて、スロット語句から検出されるスロット値と「否定」の組からなる意味情報を生成し、さらに、１つ、又は複数のスロットについてそれぞれ意味情報を生成することができるようになる。
【００５７】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市ですね？」
ユーザ　：「名古屋市じゃなくて、豊田市です。」
【００５８】
さらに、請求項１３の発明によれば、ああるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対してユーザは、陽に否定を表すことを発話する代わりに、正しいスロット語句を発話し、さらに、１つ又は複数の新たなスロットについて発話することにより、意味情報生成部は、あるスロットの保持するスロット値について「否定」の意味情報を生成し、さらに、あるスロットについて、発話された正しいスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成し、さらに、１つ、又は複数のスロットについてそれぞれ意味情報を生成することができるようになる。
【００５９】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市ですね？」
ユーザ　：「豊田市のレストラン。」
【００６０】
さらに、請求項１４の発明によれば、あるスロットについて「肯定言語」による回答、又は「否定言語」による回答を要求するガイダンスに対してユーザは、陽に否定を表すことを発話する代わりに、確認を求められているスロット語句と「否定」を表す単語を含むコマンド語句を用いた発話をし、さらに、１つ又は複数のスロットについて発話することにより、意味情報生成部は、あるスロットについて、スロット語句から検出されるスロット値と「否定」の組からなる意味情報を生成し、さらに、１つ、又は複数のスロットについてそれぞれ意味情報を生成することができるようになる。
【００６１】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市ですね？」
ユーザ　：「名古屋市じゃなくて、豊田市のレストラン。」
【００６２】
さらに、請求項１５の発明によれば、あるスロットのスロット値を尋ねるガイダンスに対して、ユーザは、あるスロットについて直接スロット語句を用いて発話ことにより、意味情報生成部は、あるスロットについてスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができるようになる。
【００６３】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「住所を言ってください。」
ユーザ　：「名古屋市です。」
【００６４】
さらに、請求項１６の発明によれば、あるスロットのスロット値を尋ねるガイダンスに対して、ユーザは、あるスロット以外のスロットについてスロット語句を用いて発話することにより、意味情報生成部は、あるスロット以外のスロットについて、発話されたスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができるようになる。
【００６５】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「住所を言ってください。」
ユーザ　：「ピッコロというレストランです。」
【００６６】
さらに、請求項１７の発明によれば、あるスロットのスロット値を尋ねるガイダンスに対して、ユーザは、あるスロット以外の１つ又は、複数のスロットについてスロット語句を用いて答えることにより、意味情報生成部は、あるスロット以外の１つ又は、複数のスロットについて、それぞれのスロット毎に発話されたスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができるようになる。
【００６７】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「住所を言ってください。」
ユーザ　：「名古屋市のピッコロというレストランです。」
【００６８】
さらに、請求項１８の発明によれば、あるスロットのスロット値を尋ねるガイダンスに対して、ユーザは、あるスロットについて又は／及び尋ねられたスロット以外の１つ又は、複数のスロットについてスロット語句を用いて答えることにより、意味情報生成部は、あるスロットについて発話されたスロット語句から検出されるスロット値と「肯定」の組による意味情報又は／及び尋ねられたスロット以外の１つ又は、複数のスロットについて、それぞれのスロット毎に発話されたスロット語句から検出されるスロット値と「肯定」の組からなる意味情報を生成することができるようになる。
【００６９】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市の何のお店でしょうか？」
ユーザ　：「名古屋市じゃなくて、豊田市のレストラン、ピッコロです。」
【００７０】
さらに、請求項１９の発明によれば、目的の回答を得るために、発話者に回答文を促す文であるガイダンス文は、明示的ガイダンス文、暗黙的ガイダンス文、又は暗黙的な確認と、質問を同時に行うガイダンス文を用いることができるようになる。これにより、ユーザに発話を促すガイダンス文として、人対人の対話により近いガイダンス文を提示することができるようになる。
【００７１】
例えば、対話の一例として、以下のような対話が可能となる。
対話装置：「名古屋市の何のお店でしょうか？」
ユーザ　：「名古屋市じゃなくて、豊田市のレストランピッコロです。」
【００７２】
さらに、請求項２０の発明のプログラムは、上述した音声対話装置にインストールして用いると、上述した音声対話装置において異なる複数のカテゴリー毎に設けられた少なくとも１つのスロットに発話者から目的とする回答を得るためにガイダンス文を音声出力ことができるようになり、ガイダンスに応答して発話者から得られる回答文を解析することが可能となる。さらに、解析した結果を用いて、目的とする回答を確定することができるようになる。さらに、回答文から、ガイダンス文で問われたカテゴリーのスロットのスロット値の候補となるスロット単語及びスロット単語が助詞などを伴った単語列からなるスロット語句、及びガイダンス文で問われた以外の他のカテゴリーのスロットについてのスロット値の候補となるスロット語句、及び「肯定」又は「否定」を表す単語を含むコマンド語句をキーワード検出手順により検出ことができるようになる。ガイダンスに対する回答を解析して、全てのスロットが確定していない場合は、確認すべきカテゴリーのスロットに対して回答を要求するガイダンス文をユーザに提示することができるようになる。これにより、ユーザが自由に回答しても、装置の目的とするスロット値を確定することができるようになる。
【００７３】
さらに、請求項２１の発明による音声対話プログラムをインストールして用いることにより、意味情報生成手順により、検出されたスロット語句からスロット値を検出し、回答文が、何れのスロット、又は検出したスロット値について言及しているか解析し、言及しているスロット、又はスロット値について「肯定」、「否定」の度合い、又は「何れでもない」をそれぞれ判定し、スロット毎又は、スロット値毎に対してそれぞれの判定結果により発話の意味を解析し、スロット毎、又はスロット値毎それぞれの判定結果の組からなる意味情報を生成するすることができるようになる。さらに、スロット情報更新手順により、意味情報により、各スロットにおけるスロット値とスロット状態の組からなるスロット情報を更新することができるようになる。これにより、ユーザは、ガイダンス文により求められた回答以外を発話しても、スロット語句からスロット値の検出、認識、暗黙の確認、「肯定」、「否定」を含めて、人対人の対話のように、ガイダンスを意識せずに対話を行うことが可能となる。
【００７４】
さらに、請求項２２の発明によれば、スロット情報管理手順により、意味情報により、スロット状態を、「未知」＜「未確認」＜「確定」の順に順位付けて管理することができるようになる。
【００７５】
さらに、請求項２３の発明によれば、音声対話プログラムは、データーベース検索手順をさらに有し、得られたスロット値の組み合わせにより複数のスロット値により構成されたデータベース内を検索し、検索の結果、検索の対象外であるスロット値が得られていないスロットについてスロット値が一意的に定まる場合、検索により得られたスロット値を、該当するスロットのスロット値とし、スロット状態を「データベースによる設定」とすることができるようになる。同時に、「データベースによる設定」状態を加えた４種類の状態により、「未知」＜「データベースによる設定」＜「未確認」＜「確定」の順に順位付けて管理することができるようになる。
【００７６】
上述した発明により、ユーザは、必ずしも、装置からのガイダンスを意識して対話を進める必要が無く、言いたいことや、思いついたことを自由に話すことができる。また、ユーザは、自由に話すことができるが、対話装置からは、明示的に、特定のスロットについて、質問／確認を行うことにより、ユーザの発話を促すことができるようになる。例えば、「お店の名前をどうぞ。」といった明示的にカテゴリーを問うガイダンスは、ガイダンスを手がかりに、ユーザに適確な発話を促す効果がある。
【００７７】
尚、以上の本発明の作用・効果は、日本語処理に限定されることなく、任意の自然言語処理に対して有効である。
【００７８】
また、本発明は、上記の作用原理からも判るように、言語処理における目的語句の検出、検出語句の意味判断に特徴を有するものであり、必ずしも音声入力や音声出力を前提とするものではない。即ち、本発明は、使用者に対して実時間応答や対話型応答をすることを前提としない自動翻訳装置や自動議事録生成装置、或いは、音声入力することを前提としない例えば利用者がキーボード等から入力する文字列入力型の言語処理装置等に応用することも可能である。
【００７９】
【発明の実施の形態】
以下、本発明を具体的な実施例に基づいて説明する。ただし、本発明は、以下に示す実施例に限定されるものではない。
【００８０】
ここでは、カーナビゲーションシステムにおける目的地設定を対象タスクとした対話を例に説明する。
【００８１】
目的地設定に必要なスロットとしては、「店名」、「市町村名」、「詳細住所」、「業種」の４つを考える。又、これらのスロット値から目的地となる、店舗（施設）の経度、緯度を検索できるデータベースを用いる。図１の各部の具体的な構成を以下に述べる。
【００８２】
図１は、本発明の実施例に係わる音声対話装置１００の論理的な構成を例示する構成図である。
音声対話装置１００は、物理的なハードウエア構成としては、周知の音声対話装置と同様に、音声入力部１１０が有するマイクや、音声出力部１６０が有するスピーカー等のマンマシン・インターフェイス部を備えたコンピュータ・システムにより具現されている。具体的には、マイクロフォン及び周辺機器からなる音声入力部１１０と、音声認識部１２０と、意味理解部１３０、スロット情報管理部１４０、ガイダンス生成部１５０、データベース検索部１７０、対話制御部１８０の演算制御を行い、その処理プログラムを実行するＣＰＵ、処理プログラム、辞書、データベース等を記憶したＲＯＭ、データを一時的に記憶したり、処理プログラムを一時的に記憶するＲＡＭ等により構成される制御装置、さらに、スピーカ及びその周辺機器からなる音声出力部１６０により構成されている。さらに、必要に応じて、表示装置、入力ボタン等も備えている。
【００８３】
音声入力部１１０は、マイクロフォンから発話者の音声を音声情報として取り込む。マイクロフォン、及びその周辺機器は、音声をデジタル信号に変換し、音声認識部１２０へ出力する。
【００８４】
さらに、後述する辞書類、データベース等は、外部記憶装置であるハードディスク、ＣＤ−ＲＯＭ等に記憶されていても良く、その場合は、音声対話装置は、ハードディスク装置、ＣＤーＲＯＭ装置を備えたコンピュータシステムにより構成されている。さらに、初期処理として、ハードディスク装置に記憶されている場合は、ＲＡＭへ、ＣＤ−ＲＯＭに記憶されている場合は、ＲＡＭ、ハードディスク装置等のアクセス速度の速い、メディアへ読み込み利用するようにしても良い。
【００８５】
音声認識部１２０は、発話者の発話音声を文字列として認識する。即ち、マイク（音声入力部１１０）から入力された音声情報を、一旦、ＲＡＭに記憶させ、音声認識用辞書（認識用言語辞書や認識用音響辞書等）を用いた音声認識処理を例えば、音声認識処理を専用に行うＣＰＵを備えた専用チップにより文字列に変換し出力する。
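[0086]
The language model in the speech recognition unit 120 is a probabilistic language model (N-gram) trained on a corpus collected from human-to-human dialogues about destination setting. This language model broadly covers utterances related to destination setting. It is stored on, for example, a CD-ROM and, as needed, read into RAM or the like when the voice interaction device starts processing. The present invention does not select a language model or word dictionary according to the state of the dialogue. Instead, a single language model and a single word dictionary are used for the target destination-setting task. Because these broadly cover user utterances related to that task, the user can say whatever he or she wants at any point in the dialogue.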
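As a rough illustration of what a single N-gram model trained on a dialogue corpus means, the sketch below estimates bigram (N = 2) probabilities by maximum likelihood from a tiny tokenized corpus. The corpus, the function name `train_bigram`, and the absence of smoothing are assumptions made for brevity; the source does not state the order of its N-gram model or how it is estimated.

```python
from collections import Counter


def train_bigram(corpus):
    """Return a function prob(w1, w2) estimating P(w2 | w1) from tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        for w1, w2 in zip(sentence, sentence[1:]):
            history_counts[w1] += 1        # times w1 occurs as a history word
            bigram_counts[(w1, w2)] += 1   # times w2 follows w1

    def prob(w1, w2):
        # Unsmoothed maximum-likelihood estimate; unseen histories give 0.0.
        if history_counts[w1] == 0:
            return 0.0
        return bigram_counts[(w1, w2)] / history_counts[w1]

    return prob


# Hypothetical miniature corpus of destination-setting utterances.
CORPUS = [
    ["名古屋市", "の", "レストラン"],
    ["名古屋市", "の", "タンポポ"],
    ["豊田市", "の", "レストラン"],
]
bigram_prob = train_bigram(CORPUS)
```

The point of the single model is that the same probabilities are used after every guidance, so a city name, a store name, or a business type remains recognizable regardless of what the system just asked.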
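[0087]
The semantic understanding unit 130 consists of a keyword detection unit 131 and a semantic information generation unit 132. The keyword detection unit 131 detects slot phrases containing slot values in the recognition-result character string and determines which category of slot the user is referring to. To do so, it identifies the category of a detected phrase using, for example, the word dictionary or the slot-value candidate word list stored in RAM, and decides whether the phrase is a slot phrase by checking whether it belongs to a category assumed by destination setting. The dictionaries need not be placed in RAM; processing can also proceed with them left on the hard disk. The recognition-result character string, detected phrases, and the like are temporarily held in their respective predetermined memory areas while the device processes them. Any attribute can be defined as a category, for example address, place name, facility type, store name, business type, facility name, landmark name, or a user-defined name. A phrase judged to be a slot phrase is detected as a slot phrase together with the category obtained. The semantic information generation unit 132 then detects a slot value from the slot phrase and treats it as a slot value candidate. It also checks for command phrases containing words that express negation, such as “ない” (not) or “違う” (wrong), to judge whether the slot value is affirmed or negated. It determines which category of slot the obtained slot value belongs to and generates semantic information for each slot according to the presence or absence of negation words. This processing may run on a dedicated CPU and memory provided in the semantic understanding unit 130, or on the CPU and memory that control the voice interaction device as a whole. The CPU and memory are provisioned according to the processing load of the semantic understanding unit 130 and the processing capacity of the CPU.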
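[0088]
For example, for “Not Naka Ward, a restaurant in Chikusa Ward.” the following semantic information is generated.
Store name slot: “affirmative”
City/town/village name slot: “affirmative”
Detailed address slot: “Naka Ward negative, Chikusa Ward”
Business type slot: “restaurant”
[0089]
When the user simply says “That is wrong.” or the like and it cannot be determined which slot is being referred to, “weak negative” is given to all slots. In this embodiment, semantic information consists of a slot value, or the slot being referred to, and a semantic element. There are four kinds of semantic element: “affirmative”, “weak negative”, “negative”, and “neither”.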
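The keyword detection and semantic information generation described above can be sketched roughly as follows. The word list, the negation words, and in particular the rule that a slot word is negated when a negation word appears within the next two tokens are assumptions introduced for this illustration; the source does not specify how the scope of a negation is decided. Recognition results are assumed to be space-separated token strings, as in the examples in the text.

```python
# Hypothetical slot-value candidate word list: word -> category.
LEXICON = {
    "長野市": "city", "名古屋市": "city", "豊田市": "city",
    "中区": "detailed_address", "千種区": "detailed_address",
    "レストラン": "business_type", "居酒屋": "business_type",
    "タンポポ": "store_name", "ピッコロ": "store_name",
}
# Hypothetical words that signal negation inside a command phrase.
NEGATIVE_WORDS = {"ない", "なく", "違う", "違い", "いいえ"}
SLOT_NAMES = ["store_name", "city", "detailed_address", "business_type"]


def semantic_info(recognized):
    """Map a tokenized recognition result to per-slot (value, element) pairs."""
    tokens = recognized.split()  # also splits on full-width spaces
    info = {slot: [] for slot in SLOT_NAMES}
    for i, token in enumerate(tokens):
        category = LEXICON.get(token)
        if category is None:
            continue
        # Assumed heuristic: a negation word right after the slot word negates it.
        following = set(tokens[i + 1:i + 3])
        element = "negative" if NEGATIVE_WORDS & following else "affirmative"
        info[category].append((token, element))

    mentioned = any(info.values())
    has_negative = bool(NEGATIVE_WORDS & set(tokens))
    for slot in SLOT_NAMES:
        if not info[slot]:
            # A bare denial that names no slot weakly negates every slot.
            weak = has_negative and not mentioned
            info[slot].append((None, "weak_negative" if weak else "affirmative"))
    return info


CORRECTION = semantic_info("長野市　じゃ　なく　て　名古屋市　の　タンポポ　です")
BARE_DENIAL = semantic_info("違い　ます")
```

For the correction utterance this yields a negated 長野市 (Nagano City) and an affirmed 名古屋市 (Nagoya City) in the city slot, and for the bare denial it yields "weak negative" for all four slots, in line with the behavior the text describes.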
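[0090]
The slot information management unit 140 updates the slot information, consisting of a “slot value” and a “state”, for each slot according to the semantic information and the rules shown in FIGS. 2, 3, and 4. The slots are held as a table defined in a predetermined memory area.
[0091]
There are four states: “unknown”, “unconfirmed”, “confirmed”, and “set by database”. The “unknown” state means the slot has no value and marks a slot the system needs to ask the user about. The “unconfirmed” state means the slot holds some value that still needs to be confirmed. The “confirmed” state means the slot holds some value and that value has been confirmed. The “set by database” state denotes a value set from a database search result when that result determines the slot value uniquely. It is not a value obtained from the user's answer, but the database search yields no other value for it, and it must still be confirmed with the user. The database search unit 170 searches the database 171 on the basis of the values of the slots and updates each slot's information according to the search result and the rule shown in FIG. 5.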
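[0092]
The database 171 mainly consists of a speech recognition dictionary, a word dictionary, a slot-value candidate word list, a speech synthesis dictionary, a database composed of a plurality of slot values for each category, guidance sentence templates, and so on. It is stored on, for example, a CD-ROM and, as needed, read into faster memory or media for use. Depending on the CPU's processing capacity and the processing speed required, each database search may be implemented as firmware with its own dedicated CPU and memory, or may run on the main CPU of the voice interaction device. Either way, the configuration depends on the processing capacity available.
(a) Speech recognition dictionary
Consists of a recognition language dictionary, a recognition acoustic dictionary, and the like.
(b) Word dictionary
Holds categories, related categories, other attributes, pronunciation information, and the like.
(c) Slot-value candidate word list
A list of candidate words: a table of pairs, each consisting of a word (or a phrase containing the word) and the category of that word.
(d) Speech synthesis dictionary
Holds pronunciation rules for speech synthesis concerning intonation, word concatenation, pauses, and the like.
(e) Database composed of a plurality of slot values for each category
A database of the slot values that exist in each category, in which slot values are linked to one another according to the combinations that actually exist across categories.
(f) Guidance sentence templates
Consists of a plurality of guidance sentence templates corresponding to pairs of a slot to confirm and a slot to ask about.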
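[0093]
The guidance generation unit 150 determines, at the start, continuation, or end of the dialogue and on the basis of each slot's value and state, the categories and slots to ask the user about and the categories and slots to confirm. It then generates a guidance sentence, that is, a response sentence to the speaker (user) such as a confirmation or a question, using, for example, the guidance sentence templates. Here, the values of slots in the “unconfirmed” state are inserted into guidance that asks about slots in the “unknown” state, so that they are confirmed implicitly. When several slots are in the “unknown” state, at most two slots can be asked about in one guidance. The priority for selecting them is “store name” > “city/town/village name” > “business type” > “detailed address”. As the dialogue progresses, the slot information is consulted. While not all slot information has been filled, the dialogue continues and explicit guidance is given about the slot to be asked about or confirmed.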
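The slot selection performed by the guidance generation unit can be sketched as below. The priority order and the limit of two questions per guidance come from the text; the slot names, the state strings, the function name `plan_guidance`, and the choice to always ask about as many as two unknown slots are assumptions made for this illustration. Template selection and sentence rendering are left out.

```python
# Priority from the text: store name > city/town/village name > business type > detailed address.
PRIORITY = ["store_name", "city", "business_type", "detailed_address"]
MAX_QUESTIONS = 2  # at most two unknown slots are asked about per guidance


def plan_guidance(slots):
    """Return (slots to ask about, slots whose values are inserted for confirmation).

    slots maps slot name -> (value or None, state string); this layout is
    an assumption of the sketch.
    """
    ask = [name for name in PRIORITY if slots[name][1] == "unknown"][:MAX_QUESTIONS]
    confirm = [name for name in PRIORITY
               if slots[name][1] in ("unconfirmed", "set_by_database")]
    return ask, confirm


# Slot table at the start of the dialogue and after the first recognized answer.
INITIAL = {name: (None, "unknown") for name in PRIORITY}
AFTER_FIRST_TURN = {
    "store_name": (None, "unknown"),
    "city": ("長野市", "unconfirmed"),
    "detailed_address": (None, "unknown"),
    "business_type": ("レストラン", "unconfirmed"),
}
FIRST_PLAN = plan_guidance(INITIAL)
SECOND_PLAN = plan_guidance(AFTER_FIRST_TURN)
```

Since the text only sets an upper limit of two, a real implementation could equally ask about a single slot per guidance; the sketch simply takes the first two unknown slots in priority order.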
【００９４】
更に、その応答文（単語列）を音響的なデジタル信号（音声情報）に変換・合成する。ただし、この変換・合成処理は、音声出力部１６０が行うようにしても良い。これらの処理は、例えば、それぞれ、対話の開始、継続、終了時に応じて、各スロットの値、状態に基づきユーザに尋ねるべきカテゴリー、スロット及び、確認すべきカテゴリー、スロット等が決められたメモリ上のテーブルにセットされ、必要なガイダンステンプレートを選択することによりＣＰＵにより実行される。
【００９５】
音声出力部１６０は、生成されたガイダンス文が音響的なデジタル信号（音声情報）に変換・合成され、スピーカーに音声として出力する。これらの処理は、例えば、音声出力の専用チップ、メモリにより構成されるファームウエアにより実行され、音声としてスピーカより出力される。
【００９６】
対話制御部１８０は、次に尋ねるべき質問項目又は確認項目を決定し、対話の流れを制御し、対話の進展により、保持されたスロット値に対して、スロット状態が、推移し、すべてのスロットが、「確定」　状態になるまで、対話を行う。その処理は、一般的には、音声対話装置の主ＣＰＵにより行わるが、必要とされる処理速度によっては、単独のＣＰＵ、メモリにより構成され、音声対話装置の主ＣＰＵにより管理制御される。
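[0097]
FIGS. 7 and 8 are flowcharts illustrating the procedure executed by the voice interaction device 100. In this procedure, initial processing is first executed in step 400, which initializes the memory that stores the slot information, the database search results, and the semantic information. All four slots are initialized with no “slot value” and with the “state” set to “unknown”.
[0098]
In this processing, initial steps may also be executed such as loading programs and data from the database 171 that are expected to be used frequently into memory with relatively fast access. If the voice interaction device 100 has a display device (not shown), other initial processing such as displaying an initial menu screen may also be performed. Through the dialogue below, a table in memory consisting of slot values and slot states, as shown for example in FIG. 6, is filled in. In addition, where necessary, the slot phrase recognized from the immediately preceding utterance is kept in memory as the previous recognition result.
[0099]
Next, in step 402, the guidance generation unit 150 generates and synthesizes the opening guidance. In step 404, the voice output unit 160 outputs the guidance by voice. An example of the voice output:
System: “Setting the destination.”
[0100]
Next, in step 406, because all slots are in the “unknown” state, a guidance sentence is generated that asks about two slots, “store name” and “address”, taken in order from the top of the priority described above. In step 408, the generated guidance sentence is output by voice. An example of the voice output:
System: “Please tell me the name of the store and its address.”
[0101]
In step 410, the user's spoken utterance is input from the voice input unit 110.
As a dialogue example, suppose the user says the following.
User: “It's a restaurant in Nagoya City.”
In step 412, the speech recognition unit 120 recognizes the input speech as a character string. Suppose here that the user's utterance is misrecognized as follows.
Recognition result: “長野市　の　レストラン　な　ん　だけど” (“It's a restaurant in Nagano City.”)
[0102]
In step 414, the keyword detection unit 131 of the semantic understanding unit 130 detects phrases, and in the following step 416 the semantic information generation unit 132 generates the following semantic information for each slot.
Store name slot semantic information: “affirmative”
City/town/village name slot semantic information: “Nagano City, affirmative”
Detailed address slot semantic information: “affirmative”
Business type slot semantic information: “restaurant, affirmative”
[0103]
In step 418, the slot information management unit updates the slot information from the semantic information for each slot according to the rules of FIGS. 2, 3, and 4 (the slot information management procedure).
[0104]
Here, the semantic information for the “store name” slot is just “affirmative”, so the rule in FIG. 2 for slot value “blank”, “affirmative” is used. The “store name” slot is in its initial state, with slot value “blank” and state “unknown”, so applying the update rule in the column for current slot state “unknown” gives store name slot information: “?, unknown”.
[0105]
Next, the semantic information for the “city/town/village name” slot is “Nagano City, affirmative”, so the rule in FIG. 3 for “slot value A”, “affirmative” is used. The current slot state is “unknown”, so applying the update rule in the column for current slot state “unknown” gives city/town/village name slot information: “Nagano City, unconfirmed”. Applying the rules to the remaining semantic information in the same way gives detailed address slot information: “?, unknown” and business type slot information: “restaurant, unconfirmed”.
[0106]
Next, in step 420, the database search unit 170 searches the database 171 using the slot information that has been set and updated. Suppose that searching for “Nagano City” and “restaurant” returns several results that do not determine the “store name” or “detailed address” uniquely. The rule shown in FIG. 5 therefore does not apply, and the slot information is not updated here.
[0107]
Next, in step 422, not all slots have been filled (that is, not every slot's information is “confirmed”), so processing returns to step 406, where, for example, the next guidance sentence is generated and synthesized. Steps 406 to 422 are then repeated until all slots are filled. The following dialogue example is used to explain the rules for updating slot information from semantic information and from database search results, and how slot values are determined.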
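[0108]
As an example of continuing the dialogue, the following guidance sentence is generated and output.
System: “What is the name of the restaurant in Nagano City?”
Suppose the user answers
User: “Not Nagano City; it's Tanpopo in Nagoya City.”
and the system recognizes
Recognition result: “長野市　じゃ　なく　て　名古屋市　の　タンポポ　です”
The semantic understanding unit analyzes this and generates the following semantic information.
Store name slot semantic information: “Tanpopo”
City/town/village name slot semantic information: “Nagano City negative, Nagoya City affirmative”
Detailed address slot semantic information: “affirmative”
Business type slot semantic information: “affirmative”
[0109]
The slot information management unit updates each slot's information according to the rules of FIGS. 2, 3, and 4, which gives:
Store name slot information: “Tanpopo, unconfirmed”
City/town/village name slot information: “Nagoya City, unconfirmed”
Detailed address slot information: “?, unknown”
Business type slot information: “restaurant, confirmed”
[0110]
Suppose the database search unit's search returns several results that do not determine the detailed address uniquely. The rule of FIG. 5 therefore does not apply, and the slot values are not updated here.
[0111]
Next, because not all slots have been filled yet, the following guidance is generated.
System: “Please tell me the detailed address of Tanpopo in Nagoya City.”
Suppose the user's answer is as follows.
User: “Um, I think it was Naka Ward.”
Recognition result: “えと　中区　だ　居酒屋　ます　が”
In the semantic understanding unit, step 414, which executes the keyword detection procedure, and the keyword detection unit 131 detect the keywords, and step 416, which executes the semantic information generation procedure, and the semantic information generation unit 132 generate the following semantic information.
Store name slot semantic information: “affirmative”
City/town/village name slot semantic information: “affirmative”
Detailed address slot semantic information: “Naka Ward”
Business type slot semantic information: “izakaya (Japanese-style pub), affirmative”
Updating the slot information by applying the semantic-information update rules to this semantic information gives:
Store name slot information: “Tanpopo, confirmed”
City/town/village name slot information: “Nagoya City, confirmed”
Detailed address slot information: “Naka Ward, unconfirmed”
Business type slot information: “restaurant, confirmed”
The business type slot was already confirmed, so it is unaffected by the misrecognized “izakaya”.
[0112]
Suppose that searching for “Tanpopo”, “Nagoya City”, “Naka Ward”, and “restaurant” now returns a single result. “Naka Ward, unconfirmed” is not affected by this, however, so not all slots are filled yet, and step 406 generates the following guidance sentence.
System: “Restaurant Tanpopo in Naka Ward, Nagoya City, correct?”
Suppose the user's answer is as follows.
User: “That's right.”
Recognition result: “そう　です”
The semantic understanding unit 130 generates the following semantic information.
Store name slot semantic information: “affirmative”
City/town/village name slot semantic information: “affirmative”
Detailed address slot semantic information: “affirmative”
Business type slot semantic information: “affirmative”
The slot information management unit 140 updates the slot information to
Store name slot information: “Tanpopo, confirmed”
City/town/village name slot information: “Nagoya City, confirmed”
Detailed address slot information: “Naka Ward, confirmed”
Business type slot information: “restaurant, confirmed”
and all the information is confirmed.
[0113]
Further, the database search unit 170 searches for “Tanpopo”, “Nagoya City”, “Naka Ward”, and “restaurant”, obtains a single result, and all slot information is confirmed.
[0114]
Processing proceeds to step 424, where the following guidance sentence is generated; it is output by voice in step 426, and the dialogue ends.
System: “Then I will set Tanpopo in Nagoya City as the destination.”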
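[0115]
The rules for updating slot information from semantic information are now explained further.
The case shown in FIG. 3, where the semantic information is “slot value A, affirmative/none”, is described.
[0116]
When the current slot state is “unknown”, slot value A is set as the slot value and the slot state becomes “unconfirmed”.
[0117]
When the current slot state is “unconfirmed”, suppose the slot holds slot value B. If slot value A obtained from the answer is the same as the held slot value B, the slot value is kept as is and the slot state becomes “confirmed”. If slot value A differs from the held slot value B, slot value A is set as the slot value and the slot state becomes “unconfirmed”.
[0118]
When the current slot state is “confirmed”, suppose the slot holds slot value B and the slot value recognized immediately before was slot value C. If slot value A obtained from the answer is the same as the held slot value B, the slot value is kept as is and the slot state remains “confirmed”. If slot value A differs from the held slot value B, the immediately preceding slot value is checked. When a slot value different from the held one is input twice in a row and the input phrase is the same both times, the slot value is updated with the input phrase. That is, if slot value A = slot value C, slot value A is set as the slot value and the slot state becomes “unconfirmed”. If slot value A ≠ slot value C, the slot value is kept as is and the slot state remains “confirmed”.
[0119]
When the current slot state is “set by database”, suppose the slot holds slot value B. If slot value A obtained from the answer is the same as the held slot value B, the slot value is kept as is and the slot state becomes “confirmed”. If slot value A differs from the held slot value B, slot value A is set as the slot value and the slot state becomes “unconfirmed”.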
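The four cases above (semantic information "slot value A, affirmative/none") can be written as one small transition function. This is a sketch of the rule as described in the text, not the figure itself; the function name `update_on_affirmed_value`, the state strings, and the argument names are assumptions introduced here.

```python
def update_on_affirmed_value(held, state, a, previous=None):
    """Apply the "slot value A, affirmative/none" update rule.

    held:     slot value B currently held (None when the slot is unknown)
    state:    current slot state
    a:        slot value A obtained from the latest answer
    previous: slot value C recognized in the immediately preceding turn
    Returns the new (slot value, slot state) pair.
    """
    if state == "unknown":
        return a, "unconfirmed"
    if state in ("unconfirmed", "set_by_database"):
        # Repeating the held value confirms it; a different value replaces it.
        return (held, "confirmed") if a == held else (a, "unconfirmed")
    if state == "confirmed":
        if a == held:
            return held, "confirmed"
        # A confirmed value is only overridden when the same differing
        # value is heard twice in a row.
        if a == previous:
            return a, "unconfirmed"
        return held, "confirmed"
    raise ValueError(f"unexpected slot state: {state!r}")
```

In the worked dialogue, the confirmed business type レストラン (restaurant) survives the misrecognized 居酒屋 (izakaya) because of the last branch: `update_on_affirmed_value("レストラン", "confirmed", "居酒屋")` keeps the held value and the "confirmed" state.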
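[0120]
Implementing the voice interaction device in this way yields a device that can determine slot values correctly even when the user speaks freely without regard to the guidance sentence, and that is very easy for the user to use.
[0121]
The slot information update rules described above are only an example; other rules may be used as long as they allow slot values to be determined efficiently.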
【０１２２】
請求項１のキーワード検出部は、具体的には、ＣＰＵ及びその処理手順であるステップ４１４及び音声認識用辞書、単語辞書等のデータベースを検索することにより実現される。
【０１２３】
請求項１の対話制御部は、具体的には、ＣＰＵ及びその処理手順であるステップ４００からステップ４２６により全ての対話を制御し実現する。
【０１２４】
請求項３、請求項６、請求項７、請求項８、請求項９、請求項１０、請求項１１、請求項１２、請求項１３、請求項１４、請求項１５、請求項１６、請求項１７、請求項１８に記載の意味情報生成部は、具体的には、ＣＰＵ及びその処理手順であるステップ４１６及び単語辞書、スロット値候補単語リスト等のデータベースを用いて実現される。
【０１２５】
請求項３、請求項４、請求項５に記載のスロット情報管理部は、具体的には、ＣＰＵ及びその処理手順である４１８により実現される。
【０１２６】
請求項５に記載のデータベース検索部は、具体的には、ＣＰＵ及びその処理手順であるステップ４２０において、カテゴリー毎に複数のスロット値により構成されたデータベースを検索することにより実現される。
【０１２７】
さらに、上記実施例では、「肯定」、「否定」の度合いとしては、「肯定」、「否定」、「弱否定」、「何れでもない」の４種類において説明したが、必要に応じて、「弱肯定」、「強肯定」、「強否定」などを加えても良い。又これ以外の「肯定」「否定」段階を設けても良い。さらに、それぞれの、「肯定」「否定」の度合いに応じたスロット情報の更新の規則を設定しても良い。
【０１２８】
さらに、上記実施例では、スロット値を確定する処理の説明は、ユーザの発話から検出されたスロット値を用いた例を示したが、一旦検出されたスロット語句及びスロット値は、そのものを使わずにスロット語句に個別に付けられたスロット語句個別認識符号であるスロット語句ＩＤ（ＩＤｅｎｔｉｆｉｃａｔｉｏｎ）　、スロット単語ＩＤなどを用いて内部的に処理を行っても良い。
【０１２９】
さらに、上記スロット語句は、スロット単語単体であっても、又はスロット単語が助詞などを伴った語句であっても良い。
【０１３０】
さらに、上記実施例では、分かりやすく、簡単のために、ガイダンスにより質問を行う場合、１つのカテゴリーに対しての質問の例を用いたが、複数のカテゴリーを対象としたガイダンスであっても良い。例えば、「目的地の都道府県名とお店の名前をどうぞ。」のように、「都道府県名」と、「店名」を一度に質問するガイダンスであっても良い。その他のカテゴリーに付いても同様であり、同時に質問するカテゴリーについての制限はない。
【０１３１】
尚、請求項５、請求項６、請求項７、請求項８、請求項９、請求項１０、請求項１１、請求項１２、請求項１３、請求項１４、請求項１５、請求項１６、請求項１７の発明を実現する音声対話プログラムは、請求項１８、請求項１９、請求項２０、請求項２１の発明を個別にまたは、組み合わせて用いることにより可能であるが、それぞれの発明を個別に実現するプログラムを用いても良い。
【０１３２】
尚、本発明は、音声対話装置においての発明であるが、上述した、言語の認識方法は、音声以外の対話方式においても可能である。
【０１３３】
上述した実施形態は、本発明の一例であって、これに限定されるものではなく、発明の本質に照らして、様々な変形例が考えられる。
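[Brief description of the drawings]
[FIG. 1] Configuration diagram illustrating the logical configuration of the voice interaction device 100 according to an embodiment of the present invention.
[FIG. 2] Explanatory diagram of the rules for updating slot information from semantic information (part 1).
[FIG. 3] Explanatory diagram of the rules for updating slot information from semantic information (part 2).
[FIG. 4] Explanatory diagram of the rules for updating slot information from semantic information (part 3).
[FIG. 5] Explanatory diagram of the rules for updating slot information from database search results.
[FIG. 6] Example of the table constituting the slot information to be set.
[FIG. 7] Flowchart illustrating the procedure executed by the voice interaction device 100 (part 1).
[FIG. 8] Flowchart illustrating the procedure executed by the voice interaction device 100 (part 2).
[Explanation of reference signs]
100 ... Voice interaction device
110 ... Voice input unit
120 ... Speech recognition unit
130 ... Semantic understanding unit
131 ... Keyword detection unit
132 ... Semantic information generation unit
140 ... Slot information management unit
150 ... Guidance generation unit
160 ... Voice output unit
170 ... Database search unit
171 ... Database
180 ... Dialogue control unit
[0001]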
[Technical field of the invention]
The present invention relates to a voice recognition device that outputs a guidance sentence by voice in order to obtain a target answer from a speaker, analyzes the answer sentence obtained from the speaker in response to the guidance, and determines the target answer. The invention is therefore applicable to, for example, an in-vehicle car navigation system, and in particular to a voice interaction device that conducts a dialogue to fill in information corresponding to so-called “slots”, such as the facility name or address of a destination.
[0002]
[Prior art]
As one example of a voice recognition device that outputs a guidance sentence by voice to obtain a target answer from a speaker, analyzes the answer sentence obtained from the speaker in response to the guidance, and determines the target answer, the voice interaction device described in Japanese Patent Laid-Open Publication No. Hei 10-31497 is known. That device is applied to tasks in which a plurality of slots are filled. So that the immediately preceding recognition result can be confirmed and the next question asked at the same time, the value of the slot recognized immediately before is inserted into the guidance sentence that asks about the next slot. After the guidance is presented, the recognition target phrases are limited to slot phrases for the slot that was asked about and command phrases expressing rejection of the recognition result (phrases expressing acceptance or refusal, such as "Yes" and "That is wrong"). If the recognition result of the user's utterance is a slot phrase, the next guidance is generated using that slot phrase. If it is a command phrase, the previously recognized slot phrase is rejected and the same guidance is repeated. Conducting the dialogue in this way realizes an efficient and smooth dialogue, and limiting the recognition target vocabulary in this way can be expected to determine the target slot reliably.
[0003]
The following is an example of the dialogue.
Example 1)
System: Please state the other party's department.
User: The Materials Section.
System: Who in the Materials Section?
User: Sato.
System: Connecting you to Mr. Sato.
Example 2)
System: Please state the other party's department.
User: Materials Section.
System: Who is in the Materials Division?
User: No.
System: Once again, please state the department.
......
In this way, by inserting the immediately preceding recognized word into the next guidance, the dialogue is smoothly realized while confirming the utterance content.
[0004]
[Problems to be solved by the invention]
However, in the conventional invention described above, in order to realize an efficient and smooth dialogue with the user, the recognition vocabulary at each point in the dialogue (immediately after each guidance) is restricted, for example, to words belonging to the category the system expects. That is, the dialogue proceeds on the assumption that the user responds only to what was asked in the guidance. The user must therefore always be aware of the intent of the system's guidance and return the response the system assumes (expects). Such a dialogue places a heavy mental burden on the user, and a smooth dialogue is not necessarily realized for every user.
[0005]
For example, at the point where the user uttered "No" in Example 2) above, the user may instead respond with "Materials Section". However, since the system assumes only the person-name keyword vocabulary and the command vocabulary as recognition targets, the utterance "Materials Section" is always misrecognized as a person's name or a command phrase, and the dialogue breaks down. That is, when the response the system assumes is not given, the utterance is recognized as belonging to a category entirely different from what was actually said. The present invention has been made to solve the above problems. Its object is to realize a smooth voice dialogue with a voice interaction device and obtain the target answer even when the user is not conscious of the guidance issued by the system and does not always give the response the system assumes.
[0006]
It should be understood that no single invention described here is intended to achieve all of the above objects at once. The individual inventions are intended to achieve the respective objects.
[0007]
[Means for Solving the Problems]
In order to solve the above problem, the invention according to claim 1 is a voice interaction device that, to obtain a target answer from a speaker for at least one slot provided for each of a plurality of different categories, outputs by voice a guidance sentence targeting at least one category, analyzes the answer sentence obtained from the speaker in response to the guidance, and determines the target answer. The device has a keyword detection unit that detects, from the answer sentence, slot phrases containing slot value candidates for the slot of the category asked about in the guidance sentence, slot phrases containing slot value candidates for slots of categories other than the one asked about, and command phrases containing a word expressing "affirmation" or "negation". That is, for the user to be able to speak without being conscious of the guidance, the user's answer sentence must be analyzed without presupposing the category the guidance assumes. The device also has a dialogue control unit that, when not all slots have been determined, controls a guidance sentence requesting an answer for the slot of a category to be confirmed. Here, a slot phrase is a phrase that includes a slot value candidate, so that the user's utterances are covered broadly and can be detected.
[0008]
Further, in the voice interaction device according to the invention of claim 2, the slot phrases are slot words, which are candidate words for slot values, and slot word strings, in which a candidate word for a slot value is accompanied by a particle or the like. The command phrases are command words, which are words expressing "affirmation" or "negation", and command word strings, which are word strings containing a word expressing "affirmation" or "negation". For example, when the slot word is "Nagoya City", the phrase can still be handled in subsequent processing even if it is detected together with the Japanese particle "no". Likewise, among command phrases, some expressions of "negation", such as "no" and "different", can be treated as single words, but negation is often expressed by word strings accompanied by particles, such as "not". The aim is to enable the interaction device to detect target slot values and commands from phrases composed of such word strings as well.
[0009]
Further, when the user speaks without being bound by the guidance, it is necessary to judge which slot the user's answer sentence refers to and what it means. Therefore, the invention of claim 3 is the voice interaction device further comprising a semantic information generation unit and a slot information management unit. The semantic information generation unit detects slot values from the detected slot phrases, analyzes which slot or detected slot value the answer sentence refers to, determines for each referenced slot or slot value the degree of "affirmation", "negation", or "neither" based on the presence or absence of a command phrase and its meaning, analyzes the meaning of the utterance from these determinations, and generates semantic information consisting of pairs of a slot or slot value and its determination result. The slot information management unit updates, according to the semantic information, the slot information consisting of a pair of slot value and slot state for each slot.
[0010]
Further, the voice interaction device needs to analyze the user's answer sentence, confirm whether the slot information obtained by recognition is what the user intends, and then determine it. Therefore, the invention of claim 4 is characterized in that, in the voice interaction device, the slot information management unit ranks and manages the slot states in the order "unknown" < "unconfirmed" < "confirmed" according to the semantic information.
[0011]
Further, the voice interaction device according to the invention of claim 5 further comprises a database search unit that searches a database composed of a plurality of slot values for each category using the combination of slot values obtained so far. When the search uniquely determines a slot value for a slot that was not used in the search and has no value yet, the unit sets the value obtained by the search as the slot value of that slot and sets its slot state to "set by database". The slot information management unit ranks and manages the slot states in the order "unknown" < "set by database" < "unconfirmed" < "confirmed", across the four states that include "set by database".
[0012]
Further, so that the user can speak freely and be recognized without error, the voice interaction device according to the invention of claim 6 is characterized in that, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, the user speaks so as to express affirmation or negation explicitly, whereby the semantic information generation unit can generate semantic information of "affirmation" or "negation" for that slot.
[0013]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 7 is characterized in that, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, the user speaks about one or more slots instead of explicitly affirming or negating, whereby the semantic information generation unit can generate semantic information for the one or more slots.
[0014]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 8 is characterized in that, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, the user utters the slot phrase being confirmed instead of explicitly affirming or negating, whereby the semantic information generation unit can generate, for that slot, semantic information consisting of the slot value detected from the slot phrase paired with "affirmation".
[0015]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 9 is characterized in that, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, the user speaks about one or more new slots instead of explicitly affirming, whereby the semantic information generation unit can generate semantic information of "affirmation" for that slot and, in addition, semantic information for each of the one or more new slots.
[0016]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 10 is characterized in that, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, the user, instead of explicitly negating, speaks using the slot phrase being confirmed together with a command phrase containing a word expressing "negation", whereby the semantic information generation unit can generate, for that slot, semantic information consisting of the slot value detected from the slot phrase paired with "negation".
[0017]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 11 is characterized in that, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, the user utters the correct slot phrase instead of explicitly negating, whereby the semantic information generation unit can generate, for that slot, semantic information consisting of the currently held slot value paired with "negation" and, in addition, semantic information consisting of the slot value detected from the correct slot phrase paired with "affirmation".
[0018]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 12 is characterized in that, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, the user, instead of explicitly negating, speaks using the slot phrase being confirmed together with a command phrase containing a word expressing "negation", and further speaks about one or more new slots, whereby the semantic information generation unit can generate, for that slot, semantic information consisting of the slot value detected from the slot phrase paired with "negation" and, in addition, semantic information for each of the one or more new slots.
[0019]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 13 is characterized in that, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, the user, instead of explicitly negating, utters the correct slot phrase and further speaks about one or more new slots, whereby the semantic information generation unit can generate semantic information of "negation" for the slot value held by that slot, semantic information consisting of the slot value detected from the correct slot phrase paired with "affirmation" for that slot, and semantic information for each of the one or more new slots.
[0020]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 14 is characterized in that, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, the user, instead of explicitly negating, speaks using a command phrase containing a word expressing "negation" together with the slot value detected from the slot phrase being confirmed, and further speaks about one or more slots, whereby the semantic information generation unit can generate, for that slot, semantic information consisting of the slot value detected from the slot phrase paired with "negation" and, in addition, semantic information for each of the one or more slots.
[0021]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 15 is characterized in that, in response to guidance asking for the slot value of a certain slot, the user answers directly with a slot phrase for that slot, whereby the semantic information generation unit can generate, for that slot, semantic information consisting of the slot value detected from the slot phrase paired with "affirmation".
[0022]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 16 is characterized in that, in response to guidance asking for the slot value of a certain slot, the user speaks using a slot phrase for a slot other than that slot, whereby the semantic information generation unit can generate, for the other slot, semantic information consisting of the slot value detected from the uttered slot phrase paired with "affirmation".
[0023]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 17 is characterized in that, in response to guidance asking for the slot value of a certain slot, the user answers with slot phrases for one or more slots other than that slot, whereby the semantic information generation unit can generate, for each of the one or more other slots, semantic information consisting of the slot value detected from the slot phrase spoken for it paired with "affirmation".
[0024]
Further, so that utterances are recognized correctly even when the user speaks without being conscious of the guidance issued by the voice interaction device, the voice interaction device according to the invention of claim 18 is characterized in that, in response to guidance asking for the slot value of a certain slot, the user answers with slot phrases for that slot and/or for one or more slots other than the one asked about, whereby the semantic information generation unit can generate semantic information consisting of the slot value detected from the slot phrase spoken for that slot paired with "affirmation", and/or, for each of the one or more other slots, semantic information consisting of the slot value detected from the slot phrase spoken for it paired with "affirmation".
[0025]
Further, in order to realize a smooth voice dialogue closer to a person-to-person dialogue, the invention of claim 19 is characterized in that the guidance sentence, which prompts the speaker to answer in order to obtain the target answer, is an explicit guidance sentence, an implicit guidance sentence, or a guidance sentence that performs an implicit confirmation and a question at the same time.
[0026]
Further, the voice interaction program according to the invention of claim 20 causes the computer of the voice interaction device to execute the following: a procedure of outputting by voice a guidance sentence targeting at least one category, in order to obtain a target answer from a speaker for at least one slot provided for each of a plurality of different categories; a procedure of analyzing the answer sentence obtained from the speaker in response to the guidance; a procedure of determining the target answer using the analysis result; a keyword detection procedure of detecting, from the answer sentence, slot phrases consisting of slot words that are slot value candidates for the slot of the category asked about in the guidance sentence or of word strings in which such a slot word is accompanied by a particle, slot phrases that are slot value candidates for slots of categories other than the one asked about, and command phrases containing a word expressing "affirmation" or "negation"; and a dialogue control procedure of controlling, when not all slots have been determined, a guidance sentence requesting an answer for the slot of a category to be confirmed.
[0027]
Further, the voice interaction program according to the invention of claim 21 further causes the computer of the voice interaction device to execute a semantic information generation procedure and a slot information management procedure. The semantic information generation procedure detects slot values from the detected slot phrases, analyzes which slot or detected slot value the answer sentence refers to, determines the degree of "affirmation", "negation", or "neither" for each referenced slot or slot value, analyzes the meaning of the utterance from these determinations, and generates semantic information consisting of pairs of a slot value and its determination result. The slot information management procedure updates, according to the semantic information, the slot information consisting of a pair of slot phrase and slot state for each slot.
[0028]
Further, the voice interaction program according to the present invention is characterized in that, in the computer of the voice interaction device, the slot information management procedure further has a procedure of ranking and managing the slot states in the order "unknown" < "unconfirmed" < "confirmed" according to the semantic information.
[0029]
Further, the voice interaction program according to the invention of claim 23 further has a database search procedure and a ranking procedure. The database search procedure searches a database composed of a plurality of slot values for each category using the combination of slot values obtained so far and, when the search uniquely determines a slot value for a slot that was not used in the search and has no value yet, sets the value obtained by the search as the slot value of that slot and sets its slot state to "set by database". The ranking procedure ranks and manages the slot states in the order "unknown" < "set by database" < "unconfirmed" < "confirmed", across the four states that include "set by database".
[0030]
Actions and effects of the present invention
This section mainly describes the actions and effects of the invention according to each claim. To aid understanding, concrete examples are used in the description, but they do not limit the configuration of the claims. The parts described concretely as examples also serve as a description of embodiments of the invention.
[0031]
Here, a description will be given of an example of a dialogue in which a destination setting in a car navigation system is a target task. As the slots required for setting the destination, four types of “shop name”, “city name”, “detailed address”, and “business type” are considered. Further, a database that can search the longitude and latitude of the store, which is the destination, from these slot values is used.
[0032]
First, according to the invention of claim 1, a guidance sentence targeting at least one category is output by voice to obtain a target answer from a speaker for the slots provided for each of a plurality of different categories, and the answer sentence obtained in response is analyzed to determine the target answer. For this analysis, the keyword detection unit detects from the answer sentence slot phrases containing slot value candidates for the slot of the category asked about in the guidance sentence, slot phrases containing slot value candidates for slots of other categories, and command phrases containing a word expressing "affirmation" or "negation". As a result, answers outside the category asked about in the guidance, which were previously misrecognized or ignored, can now be recognized accurately. Therefore, even if the user speaks freely without being conscious of the category the guidance targets, the slot values the voice interaction device expects can be determined. In addition, because the device has a dialogue control unit that controls a guidance sentence requesting an answer for a slot to be confirmed when not all slots have been determined, it can repeat the dialogue and keep issuing such guidance until all slots are determined. In other words, even if the user speaks freely about another category instead of the category the guidance targets, the device can determine the slot value as an utterance about that other category. It is also possible to obtain a voice interaction device that can issue explicit guidance to the user about the slot of a category for which an answer is required.
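The dialogue control described above can be illustrated with a minimal sketch. The slot table layout, the function name `next_guidance`, and the wording of the guidance sentences are assumptions made for illustration and do not appear in the specification. The sketch only shows the idea of repeating guidance until every slot is determined.

```python
from typing import Dict, Optional, Tuple

# Categories follow the destination-setting example; the fixed asking order
# is an assumption of this sketch.
SLOT_ORDER = ["shop name", "city name", "detailed address", "business type"]

# Hypothetical slot table: category -> (slot value or None, slot state)
SlotTable = Dict[str, Tuple[Optional[str], str]]


def next_guidance(slots: SlotTable) -> Optional[str]:
    """Return a guidance sentence for the first slot that is not yet
    confirmed, or None when all slots have been determined."""
    for category in SLOT_ORDER:
        value, state = slots.get(category, (None, "unknown"))
        if state == "confirmed":
            continue
        if value is None:
            # Nothing is known yet: ask for the slot value explicitly.
            return f"Please state the {category}."
        # A candidate exists: ask the user to confirm it.
        return f"Is the {category} {value}?"
    return None
```

Calling `next_guidance` after every slot information update would yield either a question, a confirmation, or `None` once the dialogue can end.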
[0033]
For example, in a destination-setting dialogue on a car navigation device, in response to guidance that asks about the prefecture category, such as "Please state the name of the prefecture.", the user ought to answer with a prefecture name but may instead answer "It is Nagakute-machi." Likewise, in response to the question guidance "Where is Dandelion?", the user may answer with the business type as well, as in "A restaurant in Nagoya City". In this case only the address category was asked about, yet the user utters an answer belonging to the business type category at the same time. Further, in response to confirmation guidance such as "Is it Nagoya City?", the user may not only say "Yes" or "No" but also add or correct information, as in "Yes, a restaurant in Nagoya City" or "No, Toyota City."
[0034]
Such exchanges are common in person-to-person conversation and are entirely natural. According to the present invention, a device that allows such a dialogue can be obtained. As a result, the user need not always proceed with the dialogue while being conscious of the guidance from the device, and can freely say whatever he or she wants to say or happens to think of. In addition, while the user can speak freely, the interaction device can also explicitly ask about or confirm a specific slot of a given category, thereby prompting the user to speak. Beginners in particular, when given guidance such as "Please tell us." that neither asks explicitly about the slot of any category nor gives any instruction, may not know what to say, and the dialogue may stall. Guidance that explicitly indicates the category therefore has the effect of prompting the user to make an appropriate utterance. In the dialogue examples above, only one category is asked about at a time for ease of explanation, but an actual device may also ask about a plurality of categories at once, as in "Please state the prefecture and municipality names."
[0035]
Further, the keyword detection unit of the present invention can detect slot phrases containing slot values in the recognition result character string regardless of what was asked in the guidance. At the same time, it can detect command phrases containing words that express acceptance or refusal. For example, by using a single word dictionary that broadly covers user utterances related to the target task of destination setting, a slot phrase can be detected whatever the user chooses to say at any point in the dialogue. This single word dictionary can also be used, for example, to determine which category a detected slot phrase belongs to.
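As a rough illustration of keyword detection with a single dictionary, the following sketch scans a recognition result for slot phrases of every category and for command words. The dictionary entries, the function name `detect_keywords`, and the simple string matching are assumptions for illustration. The specification does not state how matching is implemented.

```python
import re
from typing import List, Tuple

# Toy single-word dictionary covering all categories (entries are assumed).
SLOT_DICTIONARY = {
    "Piccolo": "shop name",
    "Nagoya City": "city name",
    "Toyota City": "city name",
    "Chikusa-ku": "detailed address",
    "restaurant": "business type",
}
# Command words and the meaning they express (entries are assumed).
COMMAND_DICTIONARY = {"yes": "affirmation", "no": "negation", "not": "negation"}


def detect_keywords(recognized: str) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (slot phrases, command phrases) found in the recognition result,
    regardless of which category the guidance asked about."""
    text = recognized.lower()
    found = []
    for phrase, category in SLOT_DICTIONARY.items():
        position = text.find(phrase.lower())
        if position >= 0:
            found.append((position, phrase, category))
    found.sort()  # keep slot phrases in utterance order
    slot_phrases = [(phrase, category) for _, phrase, category in found]

    command_phrases = []
    for match in re.finditer(r"[a-z]+", text):  # whole words only
        meaning = COMMAND_DICTIONARY.get(match.group())
        if meaning is not None:
            command_phrases.append((match.group(), meaning))
    return slot_phrases, command_phrases
```

Because the dictionary is not restricted to the category that was asked about, an answer such as "No, a restaurant in Toyota City" yields a command word and slot phrases from two categories in one pass.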
[0036]
Further, according to the invention of claim 2, in the voice interaction device, the slot phrases are slot words, which are candidate words for slot values, and slot word strings, in which a candidate word is accompanied by a particle or the like, and the command phrases are command words expressing "affirmation" or "negation" and command word strings containing such a word. Slot phrases that contain a slot word can therefore be detected, and so can command phrases that contain a word expressing "affirmation" or "negation". As a result, the user's free utterances can be recognized broadly, and a more convenient interaction device is obtained.
[0037]
Further, according to the invention of claim 3, in the voice interaction device the semantic information generation unit detects slot values from the detected slot phrases, analyzes which slot or detected slot value the answer sentence refers to, judges each referenced slot or slot value as "affirmation", "negation", or "neither", analyzes the meaning of the utterance from these judgments, and generates semantic information consisting of pairs of a slot or slot value and its judgment. The slot information management unit can then update the slot information, consisting of a pair of slot value and slot state for each slot, according to the semantic information. As a result, a slot value can be determined from the slot phrases detected in the user's answer sentence.
[0038]
In other words, the semantic information generation unit detects the slot value from each slot phrase that the keyword detection unit of claim 1 detected in the recognition result character string, and determines which category of slot it refers to. It also checks, for example, for command phrases containing a negative word such as "not" or "different", checks which slot or slot value each command phrase refers to, and makes an affirmation/negation judgment for every slot and slot value. Based on the category of each detected slot value and the presence or absence of negative words, semantic information is generated for each slot.
[0039]
For example, the following semantic information is generated for “Restaurant in Chikusa-ku, not Naka-ku”.
Store name slot: "Yes"
Municipal slot: "Yes"
Detailed address slot: "Naka Ward Negative Chikusa Ward"
Industry Slot: "Restaurant"
If the utterance is simply "No" and it cannot be determined which slot it refers to, "weak negation" is assigned to all slots as the degree of negation. The slot information management unit then updates the slot information, which consists of a "slot value" and a "state" for each slot, from the semantic information in accordance with, for example, the slot information update rules shown in FIGS. 2, 3, and 4. As a result, the slot states can be managed and the slot values determined as the dialogue progresses.
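The generation of semantic information can be sketched as follows. The token format, the function name `generate_semantic_info`, and the rule that a negation word applies to the slot phrase that immediately follows it (English word order, as in "not Naka-ku") are assumptions for illustration. The actual rules of the embodiment are those of the figures and are not reproduced here. A bare "Yes" is not modeled, because it depends on which slot the guidance was confirming.

```python
from typing import Dict, List, Optional, Tuple

SLOTS = ["shop name", "city name", "detailed address", "business type"]

# A token is ("slot", category, value) or ("command", meaning); this format
# is hypothetical and stands in for the keyword detection result.
Token = tuple
SemanticInfo = Dict[str, List[Tuple[Optional[str], str]]]


def generate_semantic_info(tokens: List[Token]) -> SemanticInfo:
    """Pair each mentioned slot value with "affirmation" or "negation"."""
    has_slot = any(token[0] == "slot" for token in tokens)
    if not has_slot and ("command", "negation") in tokens:
        # A bare "No": the negated slot cannot be identified, so every slot
        # receives "weak negation".
        return {slot: [(None, "weak negation")] for slot in SLOTS}

    info: SemanticInfo = {slot: [] for slot in SLOTS}
    pending_negation = False
    for token in tokens:
        if token[0] == "command":
            pending_negation = token[1] == "negation"
        else:
            _, category, value = token
            judgment = "negation" if pending_negation else "affirmation"
            info[category].append((value, judgment))
            pending_negation = False
    return info
```

For tokens corresponding to "Restaurant in Chikusa-ku, not Naka-ku", this sketch pairs "Chikusa-ku" with "affirmation" and "Naka-ku" with "negation" in the detailed address slot, and leaves the unmentioned slots with no judgment.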
[0040]
Furthermore, according to the invention of claim 4, in the voice interaction device the slot information management unit ranks the slot states in the order "unknown" < "unconfirmed" < "confirmed" and manages them according to the semantic information, so the slot state can be updated along this ranking. That is, the slot state can be ranked and updated according to whether an affirmative or negative word is present for a slot or for a slot value detected from a slot phrase.
[0041]
Further, according to the invention of claim 5, the database search unit searches the database, which is composed of a plurality of slot values for each category, using the combination of slot values obtained so far. When the search uniquely determines a slot value for a slot that was not used in the search and has no value yet, the value obtained by the search can be set as the slot value of that slot, and the slot state of that slot can be set to "set by database".
[0042]
For example, suppose that the database is searched with "Piccolo" as the "store name" and "Nagoya City" as the address, and that the search finds several "Piccolo" stores in Nagoya City. If the "business type" of every one of those stores is "restaurant", then "restaurant" can be set as the slot value of the "business type" slot, and the slot state at this time is "set by database". Thus, for a well-known unique facility such as "Nagoya City Hall", the user only has to answer the "store name"; all the other slots are set to "set by database" by searching the database. For a slot in the "set by database" state, the voice interaction device inserts the set slot value into the guidance sentence and asks for confirmation. In this way the device itself can present and confirm the detailed address, the business type, and so on without waiting for the user to supply them, so the target slots can be set easily. From the user's point of view, once the key slot has been answered, the target slots can be set merely by replying to the device's explicit confirmation guidance with words expressing "affirmation" or "negation", such as "yes" or "no".
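As a non-limiting illustration, the database search described above may be sketched as follows. The sample rows, the dictionary layout, and the function name fill_from_database are assumptions introduced for this sketch; the actual database structure is not specified here.

```python
# Hypothetical sample rows; the real database contents are not given in the text.
RECORDS = [
    {"store name": "Piccolo", "city name": "Nagoya City",
     "detailed address": "Naka-ku", "business type": "restaurant"},
    {"store name": "Piccolo", "city name": "Nagoya City",
     "detailed address": "Chikusa-ku", "business type": "restaurant"},
    {"store name": "Piccolo", "city name": "Toyota City",
     "detailed address": "Area X", "business type": "cafe"},
]


def fill_from_database(records, known):
    """Return {slot: (value, "set by database")} for every slot that was not
    used as a search key and whose value is identical in all matching rows."""
    matches = [r for r in records
               if all(r.get(slot) == value for slot, value in known.items())]
    if not matches:
        return {}
    filled = {}
    for slot in matches[0]:
        if slot in known:
            continue  # search keys are not overwritten
        values = {r[slot] for r in matches}
        if len(values) == 1:  # uniquely determined by the search
            filled[slot] = (values.pop(), "set by database")
    return filled


print(fill_from_database(RECORDS, {"store name": "Piccolo",
                                   "city name": "Nagoya City"}))
```

In this sketch the two matching stores differ in their detailed address, so only the "business type" slot is filled in.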
[0043]
Furthermore, the slot information management unit ranks the four states, including the "set by database" state, in the order "unknown" < "set by database" < "unconfirmed" < "confirmed", and manages and updates them accordingly. A slot value set from the database is therefore not "unknown", but is handled as a weaker state than "unconfirmed", which denotes a value obtained from the user's answer.
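The four-level ranking can be represented, for example, by an ordered enumeration. The sketch below is illustrative only; the class name SlotState and the helper is_weaker are hypothetical and are not taken from the disclosure.

```python
from enum import IntEnum


class SlotState(IntEnum):
    """Slot states in ascending order of rank, as stated in the text."""
    UNKNOWN = 0
    SET_BY_DATABASE = 1
    UNCONFIRMED = 2
    CONFIRMED = 3


def is_weaker(a, b):
    """True if state a ranks below state b."""
    return a < b


# A value set from the database ranks above "unknown" but below "unconfirmed".
print(is_weaker(SlotState.SET_BY_DATABASE, SlotState.UNCONFIRMED))
```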
[0044]
Furthermore, according to the invention of claim 6, in response to guidance that requests an answer in "affirmative words" or "negative words" for a certain slot, the user makes an utterance that explicitly expresses affirmation or negation, and the semantic information generation unit can thereby generate "affirmation" or "negation" semantic information for that slot.
[0045]
For example, the following dialogue is possible.
Dialogue device: "Is it Nagoya City?"
User: "No."
[0046]
Further, according to the invention of claim 7, in response to guidance that requests an answer in "affirmative words" or "negative words" for a certain slot, the user, instead of making an utterance that explicitly expresses affirmation or negation, speaks about one or more slots, and the semantic information generation unit can thereby generate semantic information for each of those slots.
[0047]
For example, the following dialogue is possible.
Dialogue device: "Is it Nagoya City?"
User: "It is a library."
[0048]
Furthermore, according to the invention of claim 8, in response to guidance that requests an answer in "affirmative words" or "negative words" for a certain slot, the user, instead of making an utterance that explicitly expresses affirmation, utters the slot phrase whose confirmation is requested. The semantic information generation unit can then generate, for that slot, semantic information consisting of a pair of the slot value detected from the slot phrase and "affirmation".
[0049]
For example, the following dialogue is possible.
Dialogue device: "Is it Nagoya?"
User: "It's Nagoya City."
[0050]
Further, according to the ninth aspect of the present invention, in response to guidance that requests an answer in "affirmative words" or "negative words" for a certain slot, the user, instead of making an utterance that explicitly expresses affirmation, speaks about one or more slots. The semantic information generation unit can then generate "affirmation" semantic information for that slot and, in addition, semantic information for each of the one or more slots.
[0051]
For example, the following dialogue is possible.
Dialogue device: "Is it Nagoya?"
User: "It's the restaurant Piccolo in Nagoya City."
[0052]
Further, according to the invention of claim 10, in response to guidance that requests an answer in "affirmative words" or "negative words" for a certain slot, the user, instead of making an utterance that explicitly expresses negation, makes an utterance that uses the slot phrase whose confirmation is requested together with a command phrase including a word expressing "negation". The semantic information generation unit can then generate, for that slot, semantic information consisting of a pair of the slot value detected from the slot phrase and "negation".
[0053]
For example, the following dialogue is possible.
Dialogue device: "Is it Nagoya?"
User: "Not Nagoya City."
[0054]
Furthermore, according to the invention of claim 11, in response to guidance that requests an answer in "affirmative words" or "negative words" for a certain slot, the user, instead of making an utterance that explicitly expresses negation, utters the correct slot phrase. The semantic information generation unit can then generate, for that slot, semantic information consisting of a pair of the held slot value and "negation", and also semantic information consisting of a pair of the slot value detected from the uttered correct slot phrase and "affirmation".
[0055]
For example, the following dialogue is possible.
Dialogue device: "Is it Nagoya?"
User: "Toyota City."
[0056]
Further, according to the invention of claim 12, in response to guidance that requests an answer in "affirmative words" or "negative words" for a certain slot, the user, instead of making an utterance that explicitly expresses negation, makes an utterance that uses the slot phrase whose confirmation is requested together with a command phrase including a word expressing "negation", and also speaks about one or more slots. The semantic information generation unit can then generate, for that slot, semantic information consisting of a pair of the slot value detected from the slot phrase and "negation", and further generate semantic information for each of the one or more slots.
[0057]
For example, the following dialogue is possible.
Dialogue device: "Is it Nagoya?"
User: "Not Nagoya City, but Toyota City."
[0058]
Further, according to the invention of claim 13, in response to guidance that requests an answer in "affirmative words" or "negative words" for a certain slot, the user, instead of making an utterance that explicitly expresses negation, utters the correct slot phrase and also speaks about one or more new slots. The semantic information generation unit can then generate "negation" semantic information for the slot value held by that slot, generate for that slot semantic information consisting of a pair of the slot value detected from the uttered correct slot phrase and "affirmation", and further generate semantic information for each of the one or more new slots.
[0059]
For example, the following dialogue is possible.
Dialogue device: "Is it Nagoya?"
User: "A restaurant in Toyota City."
[0060]
Further, according to the invention of claim 14, in response to guidance that requests an answer in "affirmative words" or "negative words" for a certain slot, the user, instead of making an utterance that explicitly expresses negation, makes an utterance that uses the slot phrase whose confirmation is requested together with a command phrase including a word expressing "negation", and also speaks about one or more slots. The semantic information generation unit can then generate, for that slot, semantic information consisting of a pair of the slot value detected from the slot phrase and "negation", and further generate semantic information for each of the one or more slots.
[0061]
For example, the following dialogue is possible.
Dialogue device: "Is it Nagoya?"
User: "It's a restaurant in Toyota City, not Nagoya City."
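The handling of replies to a confirmation guidance described for claims 6 to 14 can be illustrated by the following sketch. It assumes that keyword detection has already produced a list of (slot, value, negated) tuples; the function name, the tuple layout, and the rule that a different value for the confirmed slot implies negation of the held value (as in the "Toyota City." example) are assumptions made for illustration, not the claimed implementation.

```python
def interpret_confirmation_reply(slot, held_value, mentions, explicit=None):
    """Build semantic information for a reply to "Is it <held_value>?".

    slot       -- the slot being confirmed, e.g. "city name"
    held_value -- the value currently held for that slot
    mentions   -- list of (slot, value, negated) tuples from keyword detection
    explicit   -- "affirmation" or "negation" for a bare yes/no, else None
    Returns a list of (slot, value, element) tuples.
    """
    info = []

    def add(item):
        if item not in info:  # avoid emitting the same element twice
            info.append(item)

    if explicit is not None:
        add((slot, None, explicit))
    for name, value, negated in mentions:
        if negated:
            add((name, value, "negation"))
            continue
        if name == slot and value != held_value:
            # A different value for the confirmed slot negates the held one.
            add((slot, held_value, "negation"))
        add((name, value, "affirmation"))
    return info


# "Is it Nagoya?" -> "A restaurant in Toyota City."
print(interpret_confirmation_reply(
    "city name", "Nagoya City",
    [("business type", "restaurant", False),
     ("city name", "Toyota City", False)]))
```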
[0062]
Further, according to the invention of claim 15, in response to guidance asking for the slot value of a certain slot, the user answers directly with a slot phrase for that slot. The semantic information generation unit can then generate, for that slot, semantic information consisting of a pair of the slot value detected from the slot phrase and "affirmation".
[0063]
For example, the following dialogue is possible.
Dialogue device: "Please tell me the address."
User: "It's Nagoya City."
[0064]
Furthermore, according to the invention of claim 16, in response to guidance asking for the slot value of a certain slot, the user speaks about a slot other than that slot using a slot phrase. The semantic information generation unit can then generate, for the other slot, semantic information consisting of a pair of the slot value detected from the uttered slot phrase and "affirmation".
[0065]
For example, the following dialogue is possible.
Dialogue device: "Please tell me the address."
User: "It's the restaurant Piccolo."
[0066]
Furthermore, according to the seventeenth aspect of the present invention, in response to guidance asking for the slot value of a certain slot, the user answers with slot phrases for one or more slots other than that slot. The semantic information generation unit can then generate, for each of those other slots, semantic information consisting of a pair of the slot value detected from the slot phrase uttered for that slot and "affirmation".
[0067]
For example, the following dialogue is possible.
Dialogue device: "Please tell me the address."
User: "It's the restaurant Piccolo in Nagoya."
[0068]
Further, according to the eighteenth aspect of the present invention, in response to guidance asking for the slot value of a certain slot, the user answers with slot phrases for that slot and/or for one or more slots other than the one asked about. The semantic information generation unit can then generate, for the asked slot and/or for each of the other slots, semantic information consisting of a pair of the slot value detected from the slot phrase uttered for that slot and "affirmation".
[0069]
For example, the following dialogue is possible.
Dialogue device: "What shop in Nagoya?"
User: "Piccolo, a restaurant in Toyota City, not Nagoya City."
[0070]
Further, according to the invention of claim 19, the guidance sentence, which is a sentence urging the speaker to answer in order to obtain the target answer, can be an explicit guidance sentence, an implicit guidance sentence, or a guidance sentence that makes an implicit confirmation and asks a question at the same time. Thus, the guidance that prompts the user to speak can be presented in a form closer to person-to-person conversation.
[0071]
For example, the following dialogue is possible.
Dialogue device: "What shop in Nagoya?"
User: "It's not Nagoya City, it's the restaurant Piccolo in Toyota City."
[0072]
Further, when the program according to the twentieth aspect of the present invention is installed and used in the above-described voice interaction device, the device can output guidance sentences by voice in order to obtain from the speaker the target answer for at least one slot provided for each of a plurality of different categories, analyze the answer sentence obtained from the speaker in response to the guidance, and determine the target answer using the analysis result. Furthermore, the keyword detection procedure can detect, from the answer sentence, a slot phrase consisting of a slot word that is a candidate slot value for the slot of the category asked about in the guidance sentence together with an attached word string such as a particle, slot phrases for categories other than the one asked about in the guidance sentence, and command phrases including a word expressing "affirmation" or "negation". If the answer to the guidance has been analyzed and not all the slots have been determined, a guidance sentence requesting an answer for the slot of the category to be confirmed can be presented to the user. As a result, even if the user answers freely, the slot values the device is aiming for can be determined.
[0073]
Further, by installing and using the voice dialogue program according to the invention of claim 21, the semantic information generation procedure detects slot values from the detected slot phrases, analyzes which slot or which detected slot value the answer sentence refers to, and determines, for each slot or each slot value referred to, whether the degree is "affirmation", "negation", or "none". The meaning of the utterance is analyzed based on these determinations, and semantic information consisting of a set of determination results for each slot or each slot value can be generated. Further, the slot information updating procedure can use the semantic information to update the slot information, which consists of a pair of slot value and slot state for each slot. As a result, even if the user utters something other than the answer requested by the guidance sentence, the device can recognize slot values from the slot phrases, confirm them implicitly, and handle "affirmation" and "negation", so that the dialogue can proceed as in person-to-person conversation, without the user being conscious of the guidance.
[0074]
Furthermore, according to the invention of claim 22, the slot information management procedure can rank and manage the slot states in the order "unknown" < "unconfirmed" < "confirmed" on the basis of the semantic information.
[0075]
Further, according to the invention of claim 23, the voice dialogue program further has a database search procedure, which searches a database composed of a plurality of slot values using a combination of the slot values obtained so far. If the search uniquely determines the value of a slot for which no slot value has been obtained, the value found by the search is set as the slot value of that slot and its slot state is set to "set by database". In addition, the four states including the "set by database" state can be managed in the order "unknown" < "set by database" < "unconfirmed" < "confirmed".
[0076]
According to the above-described invention, the user does not always have to proceed with the dialogue while being conscious of the guidance from the device, and can freely speak what he wants to say or what he has come up with. Also, the user can speak freely, but the dialogue device can prompt the user to speak by explicitly asking / confirming a specific slot. For example, guidance that explicitly asks for a category such as “please name the store” has the effect of prompting the user to speak appropriately using the guidance.
[0077]
The operation and effect of the present invention described above are not limited to Japanese language processing, but are effective for arbitrary natural language processing.
[0078]
Further, as can be seen from the above operating principle, the present invention is characterized by detecting target words in language processing and determining the meaning of the detected words, and does not necessarily presuppose voice input or voice output. That is, the present invention can also be applied to an automatic translation device or an automatic minutes generation device that is not expected to respond to the user in real time or interactively, or to a character-string-input language processing device that does not assume voice input from the user.
[0079]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, the present invention will be described based on specific examples. However, the present invention is not limited to the embodiments described below.
[0080]
Here, a description will be given of an example of a dialogue in which a destination setting in a car navigation system is a target task.
[0081]
As the slots required for setting the destination, four types of “shop name”, “city name”, “detailed address”, and “business type” are considered. In addition, a database that can search the longitude and latitude of a store (facility) as a destination from these slot values is used. The specific configuration of each unit in FIG. 1 will be described below.
[0082]
FIG. 1 is a configuration diagram illustrating a logical configuration of a voice interaction device 100 according to an embodiment of the present invention.
As a physical hardware configuration, the voice interaction device 100 is embodied by a computer system that, like a known voice interaction device, has man-machine interface units such as the microphone of the voice input unit 110 and the speaker of the voice output unit 160. More specifically, it includes: the voice input unit 110, consisting of a microphone and its peripheral devices; a control device comprising a CPU that controls the operation of the voice recognition unit 120, the meaning understanding unit 130, the slot information management unit 140, the guidance generation unit 150, the database search unit 170, and the dialog control unit 180 and executes the processing programs, a ROM that stores the processing programs, dictionaries, databases, and the like, and a RAM that temporarily stores data and processing programs; and the voice output unit 160, consisting of a speaker and its peripheral devices. A display device, input buttons, and the like are provided as necessary.
[0083]
The voice input unit 110 takes in the voice of the speaker from the microphone as voice information. The microphone and its peripheral devices convert the voice into a digital signal and output the digital signal to the voice recognition unit 120.
[0084]
Further, the dictionaries, databases, and the like described later may be stored in an external storage device such as a hard disk or a CD-ROM. In this case, the voice interaction device is configured as a computer system including a hard disk device and a CD-ROM device. As an initial process, data stored on a hard disk device may be read into a faster-access medium such as RAM, and data stored on a CD-ROM may be read into a faster-access medium such as RAM or a hard disk device, before use.
[0085]
The voice recognition unit 120 recognizes the voice of the speaker as a character string. That is, the speech information input from the microphone (voice input unit 110) is temporarily stored in the RAM, speech recognition processing is performed using a speech recognition dictionary (a recognition language dictionary, a recognition acoustic dictionary, and the like), and the result is converted into a character string and output, for example by a dedicated chip having a CPU dedicated to the recognition processing.
[0086]
As the language model in the voice recognition unit 120, a probabilistic language model (N-gram) trained on a corpus collected from dialogues with people about destination setting is used. This language model broadly covers utterances related to destination setting. The language model is stored, for example, on a CD-ROM or the like, and is read out and used. In the present invention, instead of selecting a language model and a word dictionary according to the situation of the dialogue, a single language model and a single word model are used for the target destination-setting task. Since the user's utterances related to destination setting are broadly covered, the user can say what he or she wants to say at any point in the dialogue.
[0087]
The meaning understanding unit 130 includes a keyword detecting unit 131 and a semantic information generating unit 132. The keyword detecting unit 131 detects slot phrases containing slot values from the recognition result character string and determines which slot each phrase refers to. To make this determination, for example, the category of a detected phrase is identified using a word dictionary or a slot value candidate word list stored in the RAM, and the phrase is judged to be a slot phrase if it belongs to one of the categories assumed for destination setting. It is also possible to carry out the processing with each dictionary left on the hard disk rather than loaded into the RAM. The recognition result character string, the detected words, and the like are temporarily stored in a predetermined memory while the device performs its processing. As a category, any attribute can be defined, such as an address, a place name, a facility type, a shop name, a business type, a facility name, a landmark name, or a user-defined name. If the phrase is judged to be a slot phrase, it is detected as a slot phrase together with the category obtained. The semantic information generation unit 132 then detects a slot value from the slot phrase and treats it as a candidate for the slot value. By checking for a command phrase that includes a negation word such as "not" or "different", it determines whether the slot value is affirmed or negated. The category of the obtained slot value is determined, and semantic information is generated for each slot based on the presence or absence of a negation word. These processes may be performed on a dedicated CPU and memory provided in the meaning understanding unit 130, or on the CPU and memory that control the entire voice interaction device. The CPU and memory are provisioned according to the processing load of the meaning understanding unit 130 and the processing capacity of the CPU.
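By way of illustration only, keyword detection against a slot value candidate word list might look like the following sketch. The candidate list contents, the function name detect_slot_phrases, and the rule that a phrase is negated when it directly follows the word "not" are all assumptions chosen to fit the English rendering of the example; the negation patterns of the original Japanese utterances are not reproduced here.

```python
import re

# Hypothetical slot value candidate word list: phrase -> category.
CANDIDATES = {
    "restaurant": "business type",
    "Chikusa-ku": "detailed address",
    "Naka-ku": "detailed address",
    "Nagoya City": "city name",
}


def detect_slot_phrases(text):
    """Return (category, phrase, negated) tuples in order of appearance."""
    lowered = text.lower()
    found = []
    for phrase, category in CANDIDATES.items():
        for match in re.finditer(re.escape(phrase.lower()), lowered):
            # Assumed rule: negated if the word "not" directly precedes it.
            before = lowered[:match.start()]
            negated = re.search(r"\bnot\s*$", before) is not None
            found.append((match.start(), category, phrase, negated))
    found.sort()
    return [(category, phrase, negated)
            for _, category, phrase, negated in found]


for item in detect_slot_phrases("Restaurant in Chikusa-ku, not Naka-ku"):
    print(item)
```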
[0088]
For example, the following semantic information is generated for “Restaurant in Chikusa-ku, not Naka-ku”.
Store name slot: "affirmation"
City name slot: "affirmation"
Detailed address slot: "Naka-ku negation, Chikusa-ku"
Business type slot: "restaurant"
[0089]
If it is not possible to determine which slot is referred to simply by saying "No", "weak negation" is given to all slots. In the present embodiment, the semantic information includes a slot value or a referred slot and a semantic element. The semantic element is composed of four types of elements: “affirmation”, “weak negation”, “negation”, and “neither”.
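The four semantic elements and the "weak negation" rule can be sketched as follows. The enumeration name, the slot names, and the choice to return only the referenced slot when it can be identified are assumptions made for this illustration.

```python
from enum import Enum


class SemanticElement(Enum):
    AFFIRMATION = "affirmation"
    WEAK_NEGATION = "weak negation"
    NEGATION = "negation"
    NEITHER = "neither"


SLOTS = ("store name", "city name", "detailed address", "business type")


def negation_only_info(referenced_slot=None, slots=SLOTS):
    """Semantic elements for an utterance that only negates, such as "No".

    If the referenced slot cannot be determined (None), every slot
    receives "weak negation"; otherwise only that slot is negated.
    """
    if referenced_slot is None:
        return {slot: SemanticElement.WEAK_NEGATION for slot in slots}
    return {referenced_slot: SemanticElement.NEGATION}


print(negation_only_info())
```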
[0090]
The slot information management unit 140 updates, in accordance with the rules shown in FIGS. 2, 3, and 4, the slot information consisting of a "slot value" and a "state" for each slot, which is held as a table defined in a predetermined memory.
[0091]
There are four states: "unknown", "unconfirmed", "confirmed", and "set by database". The "unknown" state means that the slot has no value, and indicates a slot that the system needs to ask the user about. The "unconfirmed" state means that the slot has some value, but that value still needs to be confirmed. The "confirmed" state means that the slot has a value and that value has been confirmed. The "set by database" state means that the value was set from the database search result because the search determined the slot value uniquely. Although it was not obtained from the user's answer, the search result leaves no other possible value; the value still needs to be confirmed with the user. The database search unit 170 searches the database 171 based on the value of each slot, and updates each slot information according to the search result and the rule shown in FIG.
[0092]
The database 171 mainly includes a speech recognition dictionary, a word dictionary, a slot value candidate word list, a speech synthesis dictionary, a database composed of a plurality of slot values for each category, guidance sentence templates, and the like. These are stored, for example, on a CD-ROM or the like, and are read as needed into a memory or other faster medium for use. Depending on the processing capacity of the CPU and the required processing speed, the search of each database may be performed by firmware consisting of its own dedicated CPU and memory. In any case, the configuration differs depending on the processing capacity.
(A) Dictionary for speech recognition
It consists of a language dictionary for recognition, a sound dictionary for recognition, and the like.
(B) Word dictionary
It has categories, related categories, other attributes, pronunciation information, and the like.
(C) Slot value candidate word list
A candidate word list: a table of pairs, each consisting of a word (or a phrase containing the word) and the category of that word.
(D) Speech synthesis dictionary
It has pronunciation rules for speech synthesis regarding utterance inflection, word connection, spacing, and the like.
(E) A database composed of a plurality of slot values for each category
A database composed of slot values that exist for each category, and a database based on slot values that are associated with combinations of categories that exist in each slot value.
(F) Guidance sentence template
It consists of a plurality of guidance sentence templates corresponding to a pair of confirmation slot and question slot.
[0093]
The guidance generating unit 150 determines, according to the start, continuation, or end of the dialogue and based on the value and state of each slot, the category and slot to ask the user about and the category and slot to confirm. It then generates a guidance sentence, that is, a response sentence (a confirmation response sentence, a question response sentence, or the like), using, for example, a guidance sentence template. Here, a slot value in the "unconfirmed" state is included in the guidance that asks about a slot in the "unknown" state, so that it is confirmed implicitly. When there are several slots in the "unknown" state, up to two of them can be asked about in one guidance. The priority for selecting them is "store name" > "city name" > "business type" > "detailed address". As the dialogue progresses, the slot information is consulted; if not all the slot information has been filled, the dialogue continues, and explicit guidance is given for the slot to be asked about or confirmed.
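The slot selection performed before a guidance sentence is generated could be sketched as below. Only the selection step is shown, not the template-based sentence generation; the function name plan_guidance, the (value, state) tuple layout, and the max_questions parameter are hypothetical.

```python
# Priority order stated in the text for choosing which "unknown" slots to ask.
PRIORITY = ("store name", "city name", "business type", "detailed address")


def plan_guidance(slot_info, max_questions=2):
    """Choose slots to ask about and slot values to confirm implicitly.

    slot_info maps slot name -> (value, state).
    Returns (ask, confirm): up to max_questions "unknown" slots in priority
    order, and the (slot, value) pairs currently in the "unconfirmed" state.
    """
    ask = [slot for slot in PRIORITY
           if slot_info[slot][1] == "unknown"][:max_questions]
    confirm = [(slot, slot_info[slot][0]) for slot in PRIORITY
               if slot_info[slot][1] == "unconfirmed"]
    return ask, confirm


initial = {slot: (None, "unknown") for slot in PRIORITY}
print(plan_guidance(initial))
```

With every slot in the "unknown" state, the sketch selects the two highest-priority slots, the store name and the city name.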
[0094]
Further, the response sentence (word string) is converted and synthesized into an acoustic digital signal (voice information); this conversion and synthesis may instead be performed by the voice output unit 160. These processes are executed by the CPU, for example, by setting in a table in memory the category and slot to ask the user about and the category and slot to confirm, as determined from the value and state of each slot according to the start, continuation, or end of the dialogue, and then selecting the necessary guidance template.
[0095]
The audio output unit 160 converts and synthesizes the generated guidance sentence into an acoustic digital signal (audio information), and outputs it to the speaker as audio. These processes are executed by, for example, firmware configured by a dedicated chip and a memory for audio output, and are output from a speaker as audio.
[0096]
The dialog control unit 180 determines the question item or confirmation item to be asked next and controls the flow of the dialogue. The slot states of the held slot values change as the dialogue progresses, and the dialogue continues until all of them reach the "confirmed" state. This processing is generally performed by the main CPU of the voice interaction device, but depending on the required processing speed it may be implemented with its own CPU and memory, managed and controlled by the main CPU of the voice interaction device.
[0097]
FIG. 7 and FIG. 8 are flowcharts illustrating the procedure of the process executed by the above-described voice interactive device 100. In this procedure, first, an initial process is executed in step 400, and a memory for storing slot information, a database search result, and semantic information is initialized. All four slots have no "slot value" and the "state" is initialized to an "unknown" state.
[0098]
In this process, an initial process such as loading programs and data expected to be frequently used from the database 171 onto a memory having a relatively high access speed may be executed. For example, when the voice interactive device 100 has a display device (not shown), other initial processing such as displaying an initial menu screen may be performed. By performing the following dialogue, for example, a table on the memory including the slot values and the slot states as shown in FIG. 6 is set. Further, if necessary, the slot phrase as the recognition result of the immediately preceding utterance may be stored in the memory as the immediately preceding recognition result.
[0099]
Next, in step 402, the guidance generation unit 150 generates and synthesizes the start guidance. In step 404, the voice output unit 160 outputs the guidance by voice. An example of the voice output is:
System: "Set the destination."
[0100]
Next, in step 406, since all slots are in the "unknown" state, first, two slots, "store name" and "address" are asked in order from the one with the highest priority. A guidance sentence is generated. In step 408, the generated guidance sentence is output as voice. As an example of audio output,
System: "Please tell me the name and address of the store."
[0101]
In step 410, a voice utterance from the user is input from the voice input unit 110.
As an example of the dialog, it is assumed that the next utterance is performed by the user.
User: "It's a restaurant in Nagoya City."
In step 412, the speech recognition unit 120 recognizes the input speech as a character string. Here, it is assumed that the utterance of the previous user is erroneously recognized as follows.
Recognition result: "It's a restaurant in Nagano city."
[0102]
In step 414, keywords are detected by the keyword detection unit 131 of the meaning understanding unit 130, and in the following step 416 the semantic information generation unit 132 generates the following semantic information for each slot.
Store name slot semantic information: "affirmation"
City name slot semantic information: "Nagano City affirmation"
Detailed address slot semantic information: "affirmation"
Business type slot semantic information: "restaurant affirmation"
[0103]
In step 418, the slot information management unit updates the slot information from the semantic information for each slot in accordance with the rules (slot information management procedure) shown in FIGS.
[0104]
Here, first, since the semantic information of the "store name" slot is only "affirmation", the rule in FIG. 2 for slot value "blank" and "affirmation" is used. Since the "store name" slot is in its initial state, with a "blank" slot value and the state "unknown", applying the rule in the "currently unknown" column of the update rules sets the store name slot information to "? unknown".
[0105]
Next, since the semantic information of the "city name" slot is "Nagano City affirmation", the rule in FIG. 2 for slot value "slot value A" and "affirmation" is used. Since the current slot state is "unknown", applying the rule in the "currently unknown" column of the update rules sets the city name slot information to "Nagano City unconfirmed". Similarly, applying the rules to the remaining semantic information sets the detailed address slot information to "? unknown" and the business type slot information to "restaurant unconfirmed".
[0106]
Next, in step 420, the database search unit 170 searches the database 171 using the slot information set and updated so far. Here, assume that searching for "Nagano City" and "restaurant" returns a plurality of results, so that the "store name" and "detailed address" are not uniquely determined. The condition of the rule shown in FIG. 5 is therefore not met, and the slot information is not updated here.
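The database check in step 420 can be sketched as follows. This is a rough illustration, not the rule table of FIG. 5 itself: `DATABASE`, its records, and `search_and_fill` are hypothetical, and the sketch only returns the values that the search uniquely determines for slots that are still empty.

```python
# Hypothetical facility records (assumption); one value per category.
DATABASE = [
    {"store name": "Dandelion", "municipal name": "Nagoya City",
     "detailed address": "Naka Ward", "industry": "restaurant"},
    {"store name": "Store A", "municipal name": "Nagano City",
     "detailed address": "Address 1", "industry": "restaurant"},
    {"store name": "Store B", "municipal name": "Nagano City",
     "detailed address": "Address 2", "industry": "restaurant"},
]


def search_and_fill(slots, database):
    """slots maps slot name -> value, or None when no value is obtained yet.

    Returns the values uniquely determined for the empty slots; a caller
    would set those slots to the database-set state.
    """
    known = {k: v for k, v in slots.items() if v is not None}
    hits = [r for r in database if all(r[k] == v for k, v in known.items())]
    filled = {}
    for slot, value in slots.items():
        if value is None:
            candidates = {r[slot] for r in hits}
            if len(candidates) == 1:  # uniquely determined by the search
                filled[slot] = candidates.pop()
    return filled


ambiguous = search_and_fill(
    {"store name": None, "municipal name": "Nagano City",
     "detailed address": None, "industry": "restaurant"}, DATABASE)
unique = search_and_fill(
    {"store name": None, "municipal name": "Nagoya City",
     "detailed address": None, "industry": "restaurant"}, DATABASE)
print(ambiguous, unique)
```

With the toy data, the "Nagano City" query matches two records and fills nothing, which corresponds to the case described above, while the "Nagoya City" query matches one record and fills both empty slots.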
[0107]
Next, in step 422, since not all the slots are filled (that is, the information of not every slot is "determined"), the process returns to step 406, where the next guidance sentence is generated and synthesized. Thereafter, steps 406 to 422 are repeatedly executed until all slots are filled. In the following, the rules for updating the slot information based on the semantic information and on the database search result, and the method of determining the slot values, are described using an example dialogue.
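The repetition of steps 406 to 422 can be sketched as a simple loop. The function `run_dialogue`, its callback parameters, and the toy stand-ins below are all assumptions made for illustration; the turn limit is a safety measure added here and is not part of the described procedure.

```python
def run_dialogue(slots, ask, update, max_turns=10):
    """Repeat guidance, answer, and update until every slot is 'determined'.

    slots:  dict of slot name -> (value, state)
    ask:    callable(slots) -> recognized answer text   (steps 406 to 412)
    update: callable(slots, answer) -> new slots        (steps 414 to 420)
    """
    turns = 0
    # Step 422: continue while some slot is not yet determined.
    while any(state != "determined" for _, state in slots.values()):
        if turns >= max_turns:
            break  # safety limit (assumption)
        answer = ask(slots)
        slots = update(slots, answer)
        turns += 1
    return slots, turns


# Toy stand-ins: every answer promotes each slot by one state.
NEXT_STATE = {"unknown": "unconfirmed", "unconfirmed": "determined"}


def toy_ask(slots):
    return "Nagoya City"


def toy_update(slots, answer):
    return {name: (answer, NEXT_STATE.get(state, state))
            for name, (value, state) in slots.items()}


final_slots, turns = run_dialogue(
    {"municipal name": (None, "unknown")}, toy_ask, toy_update)
print(final_slots, turns)
```

In the toy run, the single slot passes from "unknown" through "unconfirmed" to "determined" in two turns, after which the loop ends.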
[0108]
As a continuation of the dialogue, the following guidance sentence is generated and output.
System: "What is the name of the restaurant in Nagano City?"
Assume that the user answers as follows,
User: "It's not Nagano City, it's Dandelion in Nagoya City."
and that the system recognizes the answer as follows.
Recognition result: "It's not Nagano City but Nagoya Dandelion"
The semantic understanding unit generates the following semantic information as a result of the analysis.
Store name slot meaning information: "Dandelion"
Municipal name slot meaning information: "Nagano City denial, Nagoya City affirmation"
Detailed address slot meaning information: "affirmation"
Industry slot meaning information: "affirmation"
[0109]
The slot information management unit updates each piece of slot information according to the rules shown in FIGS.
Store name slot information: "Dandelion unconfirmed"
Municipal name slot information: "Nagoya City unconfirmed"
Detailed address slot information: "? unknown"
Industry slot information: "Restaurant confirmed"
The slot information above is thereby set.
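The update in this turn can be replayed with a small sketch. The function `update_slot` is hypothetical, and its behavior is inferred from the worked example rather than taken from the rule tables in the figures: a denied held value is dropped, a bare affirmation confirms an unconfirmed value, and a newly affirmed value is stored as "unconfirmed".

```python
def update_slot(slot, affirmed=None, denied=None):
    """slot is (value, state); returns the updated (value, state)."""
    value, state = slot
    if denied is not None and denied == value:
        # The held value was rejected; drop it before applying a new value.
        value, state = None, "unknown"
    if affirmed is None:
        # Affirmation only: an unconfirmed value is implicitly confirmed.
        if state == "unconfirmed":
            state = "confirmed"
        return value, state
    if state == "unconfirmed" and affirmed == value:
        return value, "confirmed"
    if state == "confirmed":
        return value, state  # simplification: confirmed slots are left alone
    return affirmed, "unconfirmed"


# Slot information after the first turn of the example dialogue.
before = {
    "store name": (None, "unknown"),
    "municipal name": ("Nagano City", "unconfirmed"),
    "detailed address": (None, "unknown"),
    "industry": ("Restaurant", "unconfirmed"),
}
after = {
    "store name": update_slot(before["store name"], affirmed="Dandelion"),
    "municipal name": update_slot(before["municipal name"],
                                  affirmed="Nagoya City", denied="Nagano City"),
    "detailed address": update_slot(before["detailed address"]),
    "industry": update_slot(before["industry"]),
}
for name, (value, state) in after.items():
    print(f"{name}: {value or '?'} {state}")
```

Under these assumptions, the sketch reproduces the four slot information values listed for this turn.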
[0110]
Here, assume that the search by the database search unit returns a plurality of results, so that the detailed address is not uniquely determined. The condition of the rule in FIG. 5 is therefore not met, and the slot value is not updated here.
[0111]
Next, the following guidance is generated because not all slots have been filled yet.
System: "Please tell me the detailed address of Dandelion in Nagoya City."
Suppose the user's answer is as follows:
User: "Um, I think it was Naka Ward."
Recognition result: "Uto Naka-ku, Izakaya Masuga"
In the semantic understanding unit 130, the keyword detection unit 131 detects keywords in step 414, which executes the keyword detection procedure, and the semantic information generation unit 132 generates the following semantic information in step 416, which executes the semantic information generation procedure.
Store name slot meaning information: "affirmation"
Municipal name slot meaning information: "affirmation"
Detailed address slot meaning information: "Naka Ward"
Industry slot meaning information: "Izakaya affirmation"
When the slot information is updated from this semantic information using the rules for updating by semantic information, the following is set.
Store name slot information: "Dandelion confirmed"
Municipal name slot information: "Nagoya City confirmed"
Detailed address slot information: "Naka Ward unconfirmed"
Industry slot information: "Restaurant confirmed"
Since the industry slot has already been determined, it is not affected by the misrecognized "Izakaya".
[0112]
Here, assume that searching for "Dandelion", "Nagoya City", "Naka Ward", and "restaurant" returns exactly one result. However, the "Naka Ward unconfirmed" slot information is not affected by this result, so not all slots have been filled yet, and the next guidance sentence is generated in step 406.
System: "That is the restaurant Dandelion in Naka Ward, Nagoya City, correct?"
Assume that the user's answer is as follows.
User: "Yes."
Recognition result: "Yes"
In the semantic understanding unit 130, the following semantic information is generated.
Store name slot meaning information: "affirmation"
Municipal name slot meaning information: "affirmation"
Detailed address slot meaning information: "affirmation"
Industry slot meaning information: "affirmation"
In the slot information management unit 140, the slot information is updated as follows,
Store name slot information: "Dandelion confirmed"
Municipal name slot information: "Nagoya City confirmed"
Detailed address slot information: "Naka Ward confirmed"
Industry slot information: "Restaurant confirmed"
and all slot information is determined.
[0113]
Further, the database search unit 170 obtains exactly one result when searching for "Dandelion", "Nagoya City", "Naka Ward", and "restaurant", and all slot information is determined.
[0114]
The process then proceeds to step 424, where the next guidance sentence is generated; it is output by voice in step 426, and the dialogue ends.
System: "Then, Dandelion in Nagoya City will be set."
[0115]
Here, a supplementary description is given of the rules for updating the slot information based on the semantic information.
The case where the semantic information shown in FIG. 3 is "slot value A affirmation / none" will be described.
[0116]
When the current slot state is “unknown”, the slot value A is set as the slot value, and the slot state is set to “unconfirmed”.
[0117]
When the current slot state is "unconfirmed", let the held slot value be slot value B. If the slot value A obtained from the answer is the same as the held slot value B, the slot value is held as it is and the slot state is set to "determined". If the slot value A obtained from the answer differs from the held slot value B, slot value A is set as the slot value and the slot state is set to "unconfirmed".
[0118]
When the current slot state is "determined", let the held slot value be slot value B, and let the slot value recognized immediately before be slot value C. If the slot value A obtained from the answer is the same as the held slot value B, the slot value is held as it is and the slot state remains "determined". If the slot value A differs from the held slot value B, the immediately preceding slot value C is checked: the held value is replaced only when a value different from it has been input twice in succession and the input word was the same both times. That is, if slot value A = slot value C, slot value A is set as the slot value and the slot state is set to "unconfirmed". If slot value A ≠ slot value C, the slot value is held as it is and the slot state remains "determined".
[0119]
When the current slot state is "set by database", let the held slot value be slot value B. If the slot value A obtained from the answer is the same as the held slot value B, the slot value is held as it is and the slot state is set to "determined". If the slot value A obtained from the answer differs from the held slot value B, slot value A is set as the slot value and the slot state is set to "unconfirmed".
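The four cases described above for semantic information of the form "slot value A affirmation" can be summarized in a short sketch. The function name `apply_affirmed_value` and the string encoding of the states are assumptions; the database-set state is written here as "set by database".

```python
def apply_affirmed_value(state, held, new, previous=None):
    """Apply the 'slot value A affirmation' rule.

    state:    current slot state
    held:     held slot value B (None when the state is 'unknown')
    new:      slot value A obtained from the answer
    previous: slot value C recognized immediately before
              (only consulted when the state is 'determined')
    Returns (slot value, slot state).
    """
    if state == "unknown":
        return new, "unconfirmed"
    if state in ("unconfirmed", "set by database"):
        if new == held:
            return held, "determined"
        return new, "unconfirmed"
    if state == "determined":
        if new == held:
            return held, "determined"
        # A differing value replaces a determined one only when the same
        # value was also input immediately before (A == C).
        if new == previous:
            return new, "unconfirmed"
        return held, "determined"
    raise ValueError(f"unexpected slot state: {state}")


# A single misrecognized "Izakaya" does not disturb a determined slot.
once = apply_affirmed_value("determined", "Restaurant", "Izakaya",
                            previous="Restaurant")
# The same differing value twice in succession replaces the held value.
twice = apply_affirmed_value("determined", "Restaurant", "Izakaya",
                             previous="Izakaya")
print(once, twice)
```

The first call corresponds to the earlier example in which the misrecognized "Izakaya" leaves the determined industry slot unchanged; the second shows how a repeated correction reopens the slot as "unconfirmed".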
[0120]
As described above, this embodiment provides a voice interaction device that can correctly determine the slot values even if the user speaks freely regardless of the guidance sentence, and that is therefore very easy to use.
[0121]
The rules for updating the slot information described above are merely examples, and any other rules may be used as long as the rules allow the slot value to be determined efficiently.
[0122]
More specifically, the keyword detection unit according to claim 1 is realized by the CPU and step 414, which is its processing procedure, searching databases such as a speech recognition dictionary and a word dictionary.
[0123]
The dialogue control unit according to claim 1 is specifically realized by the CPU and steps 400 to 426, which are its processing procedure, controlling the dialogue as a whole.
[0124]
The semantic information generation unit according to claims 3 and 6 to 18 is specifically realized by the CPU and step 416, which is its processing procedure, using databases such as a word dictionary and a slot value candidate word list.
[0125]
The slot information management unit according to claims 3, 4, and 5 is specifically realized by the CPU and step 418, which is its processing procedure.
[0126]
The database search unit according to claim 5 is specifically realized by the CPU and step 420, which is its processing procedure, searching a database composed of a plurality of slot values for each category.
[0127]
Furthermore, in the above embodiment, the degree of "affirmation" and "negation" was described using four types, "affirmation", "negation", "weak negation", and "neither", but "weak affirmation", "strong affirmation", "strong negation", and the like may be added as necessary. Other gradations of "affirmation" and "negation" may also be provided, and rules for updating the slot information according to the degree of "affirmation" or "negation" may be set.
[0128]
Further, in the above embodiment, the processing for determining the slot value was described using an example in which the slot value detected from the user's utterance is used directly. However, instead of using the detected slot phrase and slot value themselves, the processing may be performed internally using a slot phrase ID (identification), which is an individual identification code attached to each slot phrase, a slot word ID, or the like.
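One possible way to process slot values by ID rather than by surface string is sketched below. The table `SLOT_WORD_IDS`, the ID format, and the function `to_slot_word_id` are purely illustrative assumptions; the text does not specify how IDs are assigned.

```python
# Hypothetical ID table (assumption): several surface forms share one ID.
SLOT_WORD_IDS = {
    "nagoya city": "CITY_001",
    "nagoya": "CITY_001",
    "nagano city": "CITY_002",
}


def to_slot_word_id(phrase):
    """Normalize a detected slot phrase and look up its ID (None if unknown)."""
    return SLOT_WORD_IDS.get(phrase.strip().lower())


# Comparing IDs treats different surface forms of one value as equal.
same = to_slot_word_id("Nagoya") == to_slot_word_id("Nagoya City")
print(same)
```

Comparing IDs in this way would let the update rules treat "Nagoya" and "Nagoya City" as the same slot value when checking whether an answer matches the held value.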
[0129]
Further, the slot phrase may be a single slot word or a phrase in which the slot word is accompanied by a particle.
[0130]
Further, in the above embodiment, for ease of understanding and simplicity, the guidance asks about one category at a time, but guidance that asks about a plurality of categories may also be used. An example is guidance that asks for the "prefecture name" and the "store name" at once, such as "Please say the prefecture name and the store name of the destination." The same applies to other categories, and there is no restriction on which categories may be asked about at the same time.
[0131]
In addition, the voice dialogue program for realizing the inventions of claims 5 to 17 may be realized by using the inventions of claims 18, 19, 20, and 21 individually or in combination.
[0132]
Although the present invention relates to a voice interaction device, the language recognition method described above can also be applied to dialogue methods other than speech.
[0133]
The embodiment described above is an example of the present invention, and the present invention is not limited to the embodiment. Various modifications can be considered in light of the essence of the present invention.
[Brief description of the drawings]
FIG. 1 is a configuration diagram illustrating a logical configuration of a voice interaction device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram (part 1) of a rule for updating slot information based on semantic information.
FIG. 3 is an explanatory diagram (part 2) of a rule for updating slot information based on semantic information.
FIG. 4 is an explanatory diagram (part 3) of a rule for updating slot information based on semantic information.
FIG. 5 is an explanatory diagram of a rule for updating slot information based on a database search result.
FIG. 6 is an example of a table constituting slot information to be set.
FIG. 7 is a flowchart (part 1) illustrating a procedure of a process executed by the voice interaction device 100.
FIG. 8 is a flowchart (part 2) illustrating a procedure of a process executed by the voice interaction device 100.
[Explanation of symbols]
100 ... voice interaction device
110 ... voice input unit
120 ... voice recognition unit
130 ... semantic understanding unit
131 ... keyword detection unit
132 ... semantic information generation unit
140 ... slot information management unit
150 ... guidance generation unit
160 ... voice output unit
170 ... database search unit
171 ... database
180 ... dialogue control unit

Claims

A voice interaction device that, for at least one slot provided for each of a plurality of different categories, outputs by voice a guidance sentence for at least one category in order to obtain a target answer from a speaker, analyzes an answer sentence obtained from the speaker in response to the guidance, and determines the target answer, the voice interaction device comprising:
a keyword detection unit that detects, from the answer sentence, a slot phrase including a slot value candidate for the slot of the category asked about in the guidance sentence, a slot phrase including a slot value candidate for a slot of a category other than the one asked about in the guidance sentence, and a command phrase including a word representing "affirmation" or "negation"; and
a dialogue control unit that, when not all of the slots have been determined, controls output of a guidance sentence requesting an answer for the slot of a category to be confirmed.

The voice interaction device according to claim 1, wherein
the slot phrase is a slot word that is a candidate word for a slot value, or a slot word string in which the candidate word for a slot value is accompanied by a particle or the like, and
the command phrase is a command word that is a word representing "affirmation" or "negation", or a command word string that is a word string including a word representing "affirmation" or "negation".

The voice interaction device according to claim 2, further comprising:
a semantic information generation unit that detects a slot value from the detected slot phrase, analyzes which slot or which detected slot value the answer sentence refers to, determines the degree of "affirmation", "negation", or "neither" from the presence or absence of a command phrase for the referenced slot or slot value and from the meaning of that command phrase, analyzes the meaning of the utterance based on each determination result for each slot or each slot value, and generates semantic information consisting of a set of the determination results for each slot or each slot value; and
a slot information management unit that updates, based on the semantic information, slot information consisting of a set of a slot value and a slot state for each slot.

The voice interaction device according to claim 3, wherein the slot information management unit manages the slot state in the order "unknown" < "unconfirmed" < "confirmed" according to the semantic information.

The voice interaction device according to claim 4, further comprising a database search unit that searches a database composed of a plurality of slot values for each category using the combination of slot values obtained so far and, when the search uniquely determines the slot value of a slot that was not used as a search key and whose slot value has not been obtained, sets the slot value obtained by the search as the slot value of that slot and sets its slot state to "set by database", wherein
the slot information management unit manages the slot information using four states including the "set by database" state,
in the order "unknown" < "set by database" < "unconfirmed" < "confirmed".

The voice interaction device according to any one of claims 3 to 5, wherein, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, when the user explicitly utters affirmation or negation, the semantic information generation unit can generate "affirmation" or "negation" semantic information for the certain slot.

The voice interaction device according to any one of claims 3 to 5, wherein, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, when the user, instead of explicitly uttering affirmation or negation, speaks about one or more slots, the semantic information generation unit can generate semantic information for the one or more slots.

The voice interaction device according to claim 1, wherein, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, when the user, instead of explicitly uttering affirmation or negation, utters the slot phrase to be confirmed, the semantic information generation unit can generate, for the certain slot, semantic information consisting of a set of the slot value detected from the slot phrase and "affirmation".

The voice interaction device according to claim 1, wherein, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, when the user, instead of explicitly uttering affirmation, speaks about one or more slots, the semantic information generation unit can generate "affirmation" semantic information for the certain slot and can further generate semantic information for each of the one or more slots.

The voice interaction device according to claim 7, wherein, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, when the user, instead of explicitly uttering negation, makes an utterance using the slot phrase to be confirmed together with a command phrase including a word representing "negation", the semantic information generation unit can generate, for the certain slot, semantic information consisting of a set of the slot value detected from the slot phrase and "negation".

The voice interaction device according to claim 7, wherein, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, when the user, instead of explicitly uttering negation, utters the correct slot phrase, the semantic information generation unit can generate, for the certain slot, semantic information consisting of a set of the held slot value and "negation", and can further generate semantic information consisting of a set of the slot value detected from the uttered correct slot phrase and "affirmation".

The voice interaction device according to claim 7, wherein, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, when the user, instead of explicitly uttering negation, makes an utterance using the slot phrase to be confirmed together with a command phrase including a word representing "negation", and further speaks about one or more slots, the semantic information generation unit can generate, for the certain slot, semantic information consisting of a set of the slot value detected from the slot phrase and "negation", and can further generate semantic information for each of the one or more slots.

In response to a guidance requesting an answer in a "positive language" or an answer in a "negative language" for a slot, instead of explicitly saying negative, the user speaks the correct slot phrase and By speaking about one or a plurality of new slots, the semantic information generating unit generates semantic information of “negation” for the slot value held by the certain slot, and further, for the certain slot, 8. The method according to claim 7, wherein semantic information including a set of a slot value detected from a correct slot word and "positive" is generated, and further, semantic information can be generated for each of the one or more slots. The voice interaction device according to claim 1.

The voice interaction device according to claim 7, wherein, in response to guidance requesting an answer with an "affirmative word" or a "negative word" for a certain slot, when the user, instead of explicitly uttering negation, makes an utterance using the slot phrase to be confirmed together with a command phrase including a word representing "negation", and further speaks about one or more slots, the semantic information generation unit can generate, for the certain slot, semantic information consisting of a set of the slot value detected from the slot phrase and "negation", and can further generate semantic information for each of the one or more slots.

The voice interaction device according to claim 3, wherein, in response to guidance asking for the slot value of a certain slot, when the user answers directly using a slot phrase for the certain slot, the semantic information generation unit can generate, for the certain slot, semantic information consisting of a set of the slot value detected from the slot phrase and "affirmation".

The voice interaction device according to claim 3, wherein, in response to guidance asking for the slot value of a certain slot, when the user makes an utterance using a slot phrase for a slot other than the certain slot, the semantic information generation unit can generate, for the slot other than the certain slot, semantic information consisting of a set of the slot value detected from the uttered slot phrase and "affirmation".

The voice interaction device according to claim 16, wherein, in response to guidance asking for the slot value of a certain slot, when the user answers using slot phrases for one or more slots other than the certain slot, the semantic information generation unit can generate, for each of the one or more slots, semantic information consisting of a set of the slot value detected from the slot phrase uttered for that slot and "affirmation".

The voice interaction device according to claim 16, wherein, in response to guidance asking for the slot value of a certain slot, when the user answers using slot phrases for the certain slot and/or for one or more slots other than the asked slot, the semantic information generation unit can generate, for each of those slots, semantic information consisting of a set of the slot value detected from the slot phrase uttered for that slot and "affirmation".

The voice interaction device according to claim 1, wherein the guidance sentence that prompts the speaker for an answer sentence in order to obtain the target answer is an explicit guidance sentence, an implicit guidance sentence, or a guidance sentence that asks a question while making an implicit confirmation.

A voice dialogue program that causes a computer to execute:
a procedure of outputting, for at least one slot provided for each of a plurality of different categories, a guidance sentence for at least one category in order to obtain a target answer from a speaker;
a procedure of analyzing an answer sentence obtained from the speaker in response to the guidance;
a procedure of determining the target answer using the analysis result;
a keyword detection procedure of detecting, from the answer sentence, a slot phrase consisting of a slot word that is a slot value candidate for the slot of the category asked about in the guidance sentence or of a word string in which the slot word is accompanied by a particle or the like, a slot phrase consisting of a slot word that is a slot value candidate for a slot of a category other than the one asked about in the guidance sentence or of a word string in which the slot word is accompanied by a particle or the like, and a command phrase including a word representing "affirmation" or "negation"; and
a dialogue control procedure of controlling, when not all of the slots have been determined, output of a guidance sentence requesting an answer for the slot of a category to be confirmed.

The voice dialogue program according to claim 20, further comprising:
a semantic information generation procedure of detecting a slot value from the detected slot phrase, analyzing which slot or which detected slot value the answer sentence refers to, determining the degree of "affirmation", "negation", or "neither" for the referenced slot or slot value, analyzing the meaning of the utterance based on the determination result for each slot or each slot value, and generating semantic information consisting of a set of the determination results for each slot or each slot value; and
a slot information management procedure of updating, according to the semantic information, slot information consisting of a set of a slot value and a slot state for each slot.

The voice dialogue program according to claim 21, wherein the slot information management procedure includes
a procedure of managing the slot state in the order "unknown" < "unconfirmed" < "confirmed" according to the semantic information.

The voice dialogue program according to claim 22, further comprising
a database search procedure of searching a database composed of a plurality of slot values using the combination of slot values obtained so far and, when the search uniquely determines the slot value of a slot that was not used as a search key and whose slot value has not been obtained, setting the slot value obtained by the search as the slot value of that slot and setting its slot state to "set by database", and
a procedure of managing the slot information using four states including the "set by database" state,
in the order "unknown" < "set by database" < "unconfirmed" < "confirmed".